Tutorial: The step two to making a ‘Talking’ iPhone app, when to record and when to stop recording

This post is related to the following posts:

The ‘Talking’ app, you say something and an animal repeats what you say in a cute voice.

Well, we can’t really ask the player to tap the animal to make it record, we want the animal to simply record something when the player say something, and then stop recording when the player stopped talking, and then play it. So how do we detect if the player stopped talking?

How to start recording when detecting sound, and stop recording when detect silence?

From Stack Overflow:

Perhaps you could use the AVAudioRecorder’s support for audio level metering to keep track of the audio levels and enable recording when the levels are above a given threshold. You’d need to enable metering with:
[anAVAudioRecorder setMeteringEnabled:YES];
and then you could periodically call:
[anAVAudioRecorder updateMeters];
power = [anAVAudioRecorder averagePowerForChannel:0];
if (power > threshold && anAVAudioRecorder.recording==NO) {
    [anAVAudioRecorder record];
} else if (power < threshold && anAVAudioRecorder.recording==YES) {
    [anAVAudioRecorder stop];
}
Or something like that.

Source: http://stackoverflow.com/questions/3855919/ios-avaudiorecorder-how-to-record-only-when-audio-input-present-non-silence

According to the API, averagePowerForChannel returns the average power of the sound being recorded. If it returns 0 that means that recording is at its full scale, the maximum power (like when someone shouts really really loudly into the mic?), while -160 is the minimum power or near silence (which is what we want right, near silence?).

Another tutorial (Tutorial: Detecting When a User Blows into the Mic by Dan Grigsby), you can also use peakPowerForChannel. He made an algorithm to get the lowPassResults of the audio input:

From the tutorial:

Each time the timer’s callback method is triggered the lowPassResults level variable is recalculated. As a convenience, it’s converted to a 0-1 scale, where zero is complete quiet and one is full volume.

We’ll recognize someone as having blown into the mic when the low pass filtered level crosses a threshold. Choosing the threshold number is somewhat of an art. Set it too low and it’s easily triggered; set it too high and the person has to breath into the mic at gale force and at length. For my app’s need, 0.95 works.
- (void)listenForBlow:(NSTimer *)timer {
	[recorder updateMeters];

	const double ALPHA = 0.05;
	double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
	lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;

if (lowPassResults > 0.95)
		NSLog(@"Mic blow detected");
}

Source: http://mobileorchard.com/tutorial-detecting-when-a-user-blows-into-the-mic/

So I am using this Dan’s algorithm, except the the threshold number, I’m still testing it out, it really is somewhat of an art.

Okay, now we know when the player STOPS talking, what about when the user starts talking? We wouldn’t be able to know that since we stopped recording after the player stops talking, right? We won’t be able to get the power for the channel with a stopped recorder.

And StackOverflow comes to the rescue again, I read somewhere that you should have TWO AVAudioRecorders, instead of ONE. One AVAudioRecorder to monitor the power for channel at all times and one to actually record your player’s voice.

So we have:

NSURL *monitorTmpFile;
NSURL *recordedTmpFile;
AVAudioRecorder *recorder;
AVAudioRecorder *audioMonitor;

And some booleans to keep track of when it is recording or playing:

BOOL isRecording;
BOOL isPlaying;

We have to initialize both controllers, somewhere in your init add:

[self initAudioMonitor];
[self initRecorder];

The functions:

-(void) initAudioMonitor
{    NSMutableDictionary* recordSetting = [[NSMutableDictionary alloc] init];
    [recordSetting setValue :[NSNumber numberWithInt:kAudioFormatAppleIMA4] forKey:AVFormatIDKey];
    [recordSetting setValue:[NSNumber numberWithFloat:44100.0] forKey:AVSampleRateKey];
    [recordSetting setValue:[NSNumber numberWithInt: 1] forKey:AVNumberOfChannelsKey];

    NSArray* documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString* fullFilePath = [[documentPaths objectAtIndex:0] stringByAppendingPathComponent: @”monitor.caf”];
    monitorTmpFile = [NSURL fileURLWithPath:fullFilePath];

    audioMonitor = [[ AVAudioRecorder alloc] initWithURL: monitorTmpFile settings:recordSetting error:&error];

    [audioMonitor setMeteringEnabled:YES];

    [audioMonitor setDelegate:self];

    [audioMonitor record];
}

-(void) initRecorder
{    NSMutableDictionary* recordSetting = [[NSMutableDictionary alloc] init];
    [recordSetting setValue :[NSNumber numberWithInt:kAudioFormatAppleIMA4] forKey:AVFormatIDKey];
    [recordSetting setValue:[NSNumber numberWithFloat:44100.0] forKey:AVSampleRateKey];
    [recordSetting setValue:[NSNumber numberWithInt: 1] forKey:AVNumberOfChannelsKey];

    NSArray* documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString* fullFilePath = [[documentPaths objectAtIndex:0] stringByAppendingPathComponent: @”in.caf”];
    recordedTmpFile = [NSURL fileURLWithPath:fullFilePath];

    recorder = [[ AVAudioRecorder alloc] initWithURL: recordedTmpFile settings:recordSetting error:&error];

    [recorder setMeteringEnabled:YES];

    [recorder setDelegate:self];

    [recorder prepareToRecord];
}

And then we have a function that will be called all the time, to monitor your AVAudioRecorders, call it somewhere in your update:

-(void) monitorAudioController: (ccTime) dt
{
    if(!isPlaying)
    {   [audioMonitor updateMeters];

        // a convenience, it’s converted to a 0-1 scale, where zero is complete quiet and one is full volume
        const double ALPHA = 0.05;
        double peakPowerForChannel = pow(10, (0.05 * [audioMonitor peakPowerForChannel:0]));
        double audioMonitorResults = ALPHA * peakPowerForChannel + (1.0 – ALPHA) * audioMonitorResults;

        NSLog(@”audioMonitorResults: %f”, audioMonitorResults);

        if (audioMonitorResults > AUDIOMONITOR_THRESHOLD)
        {    NSLog(@”Sound detected”);

            if(!isRecording)
            {   [audioMonitor stop];
                [self startRecording];
            }
        }   else
        {   NSLog(@”Silence detected”);
            if(isRecording)
            {   if(silenceTime > MAX_SILENCETIME)
                {   NSLog(@”Next silence detected”);
                    [audioMonitor stop];
                     [self stopRecordingAndPlay];
                    silenceTime = 0;
                }   else
                {   silenceTime += dt;
                }
            }
        }

        if([audioMonitor currentTime] > MAX_MONITORTIME)
        {   [audioMonitor stop];
            [audioMonitor record];
        }
    }
}

Okay, lemme explain…

You have to call [audioMonitor updateMeters], because (according to AVAudioRecorder class reference):

Refreshes the average and peak power values for all channels of an audio recorder.

And then, do you see Dan’s algorithm?

const double ALPHA = 0.05;
double peakPowerForChannel = pow(10, (0.05 * [audioMonitor peakPowerForChannel:0]));
double audioMonitorResults = ALPHA * peakPowerForChannel + (1.0 – ALPHA) * audioMonitorResults;

NSLog(@”audioMonitorResults: %f”, audioMonitorResults);

If audioMonitorResults is greater than our threshold AUDIOMONITOR_THRESHOLD (to get this value, requires many hours of testing and monitoring, that’s why I have a NSLog there), that means we have detected sound. And we start recording!

if(!isRecording)
{ [audioMonitor stop];
[self startRecording];
}

If it isn’t already recording, we stop the audio monitor and start recording:

-(void) startRecording
{   NSLog(@”startRecording”);

    isRecording = YES;
    [recorder record];
}

Okay then, if the audioMonitorResults is less than the AUDIOMONITOR_THRESHOLD and we are recording, it means that silence has been detected, but but but, we do not stop the recording at once. Why…? Because when people are speaking, we speak like this: “Hello, how are you?” instead of “Hellohowareyou”, you see the spaces between each word are also detected as silences, which is why:

if(isRecording)
{   if(silenceTime > MAX_SILENCETIME)
     {   NSLog(@”Next silence detected”);
         [audioMonitor stop];
         [self stopRecordingAndPlay];
         silenceTime = 0;
     }   else
     {   silenceTime += dt;
}

MAX_SILENCETIME is threshold for the silence time between words.

And then to make sure the size of our audioMonitor output will not explode:

if([audioMonitor currentTime] > MAX_MONITORTIME)
{ [audioMonitor stop];
[audioMonitor record];
}

It saves the file after MAX_MONITORTIME.

And then stopRecordingAndPlay:

-(void) stopRecordingAndPlay
{    NSLog(@”stopRecording Record time: %f”, [recorder currentTime]);

    if([recorder currentTime] > MIN_RECORDTIME)
    {   isRecording = NO;
        [recorder stop];

        isPlaying = YES;
        // insert code for playing the audio here
    }   else
    {   [audioMonitor record];
    }
}

After the audio is played, call:

-(void) stopPlaying
{ isPlaying = NO;
[audioMonitor record];
}

And there we go! 🙂

To summarize:

1 AVAudioRecorder to monitor when the player starts talking and stops talking
1 AVAudioRecorder to record the player’s voice
Use peakPowerForChannel to detect talking or silence

And that’s about it!

Tutorial: The step two to making a ‘Talking’ iPhone app, when to record and when to stop recording

Published by Michelle Chen

Leave a comment Cancel reply

Share this:

Related

Published by Michelle Chen

Leave a comment Cancel reply