Intel 52377002 8.9.2 Enabling, 8.9.3 Encoding Methods Supported

96 Voice API Programming Guide — June 2005

Recording and Playback

8.9.2 Enabling

The modes related to the voice activity detector (VAD) are specified in the mode parameter of the dx_reciottdata( ) function. They are:

RM_VADNOTIFY

generates an event, TDX_VAD, on detection of voice energy during the recording operation.

Note: TDX_VAD does not indicate function termination; it is an unsolicited event. Do not confuse this event with the TEC_VAD event, which is used in the continuous speech processing (CSP) library.
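The distinction drawn in the note can be sketched as a small dispatch helper. This is an illustration only: classify_rec_event and the rec_action enum are hypothetical names, TDX_RECORD is assumed to be the termination event delivered when the recording ends (it is not named in this section), and the numeric fallback values are placeholders that apply only when the Dialogic voice library header is not included.

```c
#include <assert.h>

/* Placeholder values so the sketch compiles on its own. In a real
 * application these macros come from the Dialogic voice library
 * header, and the fallbacks below are never used. */
#ifndef TDX_RECORD
#define TDX_RECORD 0x1001L   /* placeholder, NOT the real value */
#endif
#ifndef TDX_VAD
#define TDX_VAD    0x1002L   /* placeholder, NOT the real value */
#endif

/* Hypothetical result type: what the application should conclude. */
typedef enum {
    REC_VOICE_STARTED,   /* unsolicited TDX_VAD: recording continues */
    REC_FINISHED,        /* termination event: recording has ended   */
    REC_UNRELATED        /* some other event: not handled here       */
} rec_action;

/* Map an event type (as obtained from the event queue) to an action. */
rec_action classify_rec_event(long evt_type)
{
    /* The two events must be distinguishable for this to make sense. */
    assert(TDX_VAD != TDX_RECORD);

    if (evt_type == TDX_VAD)
        return REC_VOICE_STARTED;
    if (evt_type == TDX_RECORD)
        return REC_FINISHED;
    return REC_UNRELATED;
}
```

The point of the sketch is that a handler should not release buffers or close the output file on TDX_VAD; under the assumption above, that cleanup belongs to the termination event.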

RM_ISCR

adds initial silence compression to the VAD capability. Initial silence here refers to the amount of silence on the line before voice activity is detected. When using RM_ISCR, the default amount of initial silence allowed is 3 seconds; any initial silence beyond that amount is removed, so that at most the allowed amount remains. This default value can be changed by modifying a parameter in the .config file for the board and then generating a new .fcd file. The 0x416 parameter must be added in the [encoder] section of the .config file. For details on using this parameter, see the DM3 Configuration Guide.

Note: The RM_ISCR mode can only be used in conjunction with RM_VADNOTIFY.

When these two modes are used together, no data is recorded as output until voice activity is detected on the line. The TDX_VAD event indicates the initiation of voice. The output file remains empty until voice activity is detected, although the recording may then begin with some initial silence, as specified in the .fcd file.
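The effect of initial silence compression on the leading silence can be illustrated with simple arithmetic. The sketch below is only a model of the behavior described for RM_ISCR, not the firmware's algorithm; kept_initial_silence_ms is a hypothetical helper, and the 3000 ms constant is the documented default allowance.

```c
#include <assert.h>

/* Default allowable initial silence, per the text: 3 seconds. */
#define DEFAULT_INITIAL_SILENCE_MS 3000UL

/* Model only: returns how much leading silence remains in the
 * recording when silence_ms of silence preceded the first voice
 * activity and allowed_ms is the configured allowance. */
unsigned long kept_initial_silence_ms(unsigned long silence_ms,
                                      unsigned long allowed_ms)
{
    unsigned long kept = (silence_ms > allowed_ms) ? allowed_ms
                                                   : silence_ms;

    /* Postcondition: never more than the allowance. */
    assert(kept <= allowed_ms);
    return kept;
}
```

Under this model and the default allowance, a caller who waits 10 seconds before speaking contributes 3 seconds of leading silence to the file, while a caller who speaks after 1 second contributes the full 1 second.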

To enable these modes, OR them into the mode parameter. For example:

/* VAD notification only */
t_Return = dx_reciottdata(DevHandle, Iott, Tpt, &t_Xpb, EV_ASYNC | RM_VADNOTIFY);

/* VAD notification with initial silence compression */
t_Return = dx_reciottdata(DevHandle, Iott, Tpt, &t_Xpb, EV_ASYNC | RM_VADNOTIFY | RM_ISCR);
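Because RM_ISCR is valid only together with RM_VADNOTIFY, an application may want to check the mode word before starting the recording. The helper below is a sketch of such a check: vad_mode_is_valid is a hypothetical name, and the numeric fallbacks are placeholders used only when the Dialogic voice library header is not included.

```c
#include <assert.h>

/* Placeholder bit values so the sketch compiles on its own. The real
 * definitions come from the Dialogic voice library header. */
#ifndef RM_VADNOTIFY
#define RM_VADNOTIFY 0x0100U   /* placeholder, NOT the real value */
#endif
#ifndef RM_ISCR
#define RM_ISCR      0x0200U   /* placeholder, NOT the real value */
#endif

/* Returns 1 if the VAD-related bits of mode form a legal combination,
 * 0 otherwise. RM_ISCR requires RM_VADNOTIFY; RM_VADNOTIFY may be
 * used alone, and neither bit needs to be set. */
int vad_mode_is_valid(unsigned int mode)
{
    if ((mode & RM_ISCR) && !(mode & RM_VADNOTIFY))
        return 0;
    return 1;
}
```

A caller could run this check on the mode value it is about to pass to dx_reciottdata( ) and report a configuration error instead of issuing the call.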

Note: The dx_reciottdata( ) function does not perform echo-cancelled streaming. For automatic speech recognition applications, use the record or streaming functions in the continuous speech processing (CSP) API library. For more information, see the Continuous Speech Processing API Programming Guide.

8.9.3 Encoding Methods Supported

The following encoding algorithms and sampling rates are supported for recording with the voice activity detector:

• OKI ADPCM, 6 kHz with 4-bit samples (24 kbps) and 8 kHz with 4-bit samples (32 kbps), VOX and WAVE file formats

• Linear PCM, 8 kHz with 8-bit samples (64 kbps) and 8 kHz with 16-bit samples (128 kbps), VOX and WAVE file formats

• G.711 PCM, 6 kHz with 8-bit samples (48 kbps) and 8 kHz with 8-bit samples (64 kbps) using A-law or mu-law coding, VOX and WAVE file formats

• G.721 at 8 kHz with 4-bit samples (32 kbps), VOX and WAVE file formats
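The bit rates quoted in the list follow directly from the sampling rate multiplied by the sample size. A minimal sketch of that arithmetic, using a hypothetical helper name, bit_rate_bps:

```c
#include <assert.h>

/* Bit rate in bits per second for a single-channel stream:
 * samples per second times bits per sample. */
unsigned long bit_rate_bps(unsigned long sample_rate_hz,
                           unsigned int bits_per_sample)
{
    assert(bits_per_sample > 0);
    return sample_rate_hz * bits_per_sample;
}
```

For instance, bit_rate_bps(6000, 4) gives 24000 (24 kbps, OKI ADPCM at 6 kHz) and bit_rate_bps(8000, 16) gives 128000 (128 kbps, 16-bit linear PCM). Dividing the result by 8 gives the storage consumed per second of recorded audio in bytes.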