96 Voice API Programming Guide — June 2005
Recording and Playback
8.9.2 Enabling
The modes related to the voice activity detector are specified in the mode parameter of the
dx_reciottdata( ) function. They are:
RM_VADNOTIFY
generates an event, TDX_VAD, when voice energy is detected during the recording operation.
Note: TDX_VAD is an unsolicited event; it does not indicate function termination. Do not
confuse it with the TEC_VAD event, which is used in the continuous speech
processing (CSP) library.
RM_ISCR
adds initial silence compression to the VAD capability. Initial silence here refers to the amount
of silence on the line before voice activity is detected. When using RM_ISCR, the default
amount of initial silence allowed is 3 seconds. Any initial silence longer than that is
shortened to the allowed amount. To change this default, add the 0x416 parameter to the
[encoder] section of the .config file for the board and then generate a new .fcd file. For
details on using this parameter, see the DM3 Configuration Guide.
Note: The RM_ISCR mode can only be used in conjunction with RM_VADNOTIFY.
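The constraint in the note above can be checked before a mode value is passed to
dx_reciottdata( ). The following is a minimal sketch, not part of the voice library:
vad_mode_is_valid( ) and checked_record_mode( ) are hypothetical helpers, and the numeric
flag values are placeholders that apply only when the real definitions from the voice
library header are not present.

```c
#include <assert.h>

/* Placeholder bit values so the sketch compiles on its own.
 * They are NOT the real values. In an application the definitions
 * come from the voice library header and these are skipped. */
#ifndef RM_VADNOTIFY
#define RM_VADNOTIFY 0x0001u
#endif
#ifndef RM_ISCR
#define RM_ISCR 0x0002u
#endif

/* Hypothetical helper: returns 1 if the VAD-related bits in mode form a
 * legal combination, 0 if RM_ISCR is set without RM_VADNOTIFY. */
int vad_mode_is_valid(unsigned int mode)
{
    if ((mode & RM_ISCR) && !(mode & RM_VADNOTIFY)) {
        return 0;
    }
    return 1;
}

/* Hypothetical debug-time guard: aborts on an illegal combination,
 * otherwise returns mode unchanged. */
unsigned int checked_record_mode(unsigned int mode)
{
    assert(vad_mode_is_valid(mode));
    return mode;
}
```

A guard of this kind catches an RM_ISCR flag that was set without RM_VADNOTIFY before
the recording is started, rather than leaving the error to be diagnosed at run time.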
When these two modes are used together, no data is written to the output file until voice activity is
detected on the line, and the TDX_VAD event marks the onset of voice. Once recording starts, the
recorded data may begin with some initial silence, up to the amount specified in the .fcd file.
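The effect of initial silence compression on the recorded data can be pictured with a small
sketch. This is an illustration only, not the firmware's implementation:
leading_frames_to_drop( ) is a hypothetical helper, the per-frame voice flags are assumed
inputs, and the sketch assumes that the silence retained is the portion immediately before
the first voiced frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. is_voice[i] is nonzero if frame i contains voice.
 * Returns the number of leading frames to discard so that at most
 * max_silence_frames of silence precede the first voiced frame.
 * If no voiced frame is present, every frame is discarded, because
 * nothing is recorded until voice activity is detected. */
size_t leading_frames_to_drop(const unsigned char *is_voice,
                              size_t n_frames,
                              size_t max_silence_frames)
{
    size_t first_voice = 0;

    assert(is_voice != NULL || n_frames == 0);

    /* Find the index of the first voiced frame. */
    while (first_voice < n_frames && !is_voice[first_voice]) {
        first_voice++;
    }

    if (first_voice == n_frames) {
        return n_frames;                          /* no voice yet */
    }
    if (first_voice > max_silence_frames) {
        return first_voice - max_silence_frames;  /* trim the excess */
    }
    return 0;                                     /* within the allowance */
}
```

For example, with five silent frames before the first voiced frame and an allowance of two
frames, the first three frames are dropped and two frames of silence remain.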
To enable these modes, OR them into the mode parameter. For example:
t_Return=dx_reciottdata(DevHandle, Iott, Tpt, &t_Xpb, EV_ASYNC|RM_VADNOTIFY);
t_Return=dx_reciottdata(DevHandle, Iott, Tpt, &t_Xpb, EV_ASYNC|RM_VADNOTIFY|RM_ISCR);
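Because TDX_VAD is unsolicited and does not end the recording, an asynchronous application
keeps waiting after it arrives and treats only TDX_RECORD as completion. The following is a
minimal sketch of that decision logic. record_event_is_final( ) is a hypothetical helper; the
event type would normally be obtained from the Standard Runtime Library (for example, with
sr_getevttype( )), and the numeric event codes are placeholders that apply only when the real
definitions are not present.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder event codes so the sketch compiles on its own.
 * They are NOT the real values from the voice library header. */
#ifndef TDX_RECORD
#define TDX_RECORD 1L
#endif
#ifndef TDX_VAD
#define TDX_VAD 2L
#endif

/* Hypothetical helper. Returns 1 when the event ends the recording,
 * 0 when the application should keep waiting for further events.
 * *voice_detected is set to 1 when TDX_VAD is seen. */
int record_event_is_final(long evt_type, int *voice_detected)
{
    assert(voice_detected != NULL);

    if (evt_type == TDX_VAD) {
        *voice_detected = 1;   /* voice has started; recording continues */
        return 0;
    }
    if (evt_type == TDX_RECORD) {
        return 1;              /* termination event for the record call */
    }
    return 0;                  /* other events are not handled in this sketch */
}
```

An event loop would call a helper like this for each event received on the channel and stop
waiting only when it returns 1.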
Note: The dx_reciottdata( ) function does not perform echo-cancelled streaming. For automatic speech
recognition applications, use record or streaming functions in the continuous speech processing
(CSP) API library. For more information, see the Continuous Speech Processing API Programming
Guide.
8.9.3 Encoding Methods Supported
The following encoding algorithms and sampling rates are supported for recording with the voice
activity detector:
• OKI ADPCM, 6 kHz with 4-bit samples (24 kbps) and 8 kHz with 4-bit samples (32 kbps),
  VOX and WAVE file formats
• linear PCM, 8 kHz with 8-bit samples (64 kbps) and 8 kHz with 16-bit samples (128 kbps),
  VOX and WAVE file formats
• G.711 PCM, 6 kHz with 8-bit samples (48 kbps) and 8 kHz with 8-bit samples (64 kbps) using
  A-law or mu-law coding, VOX and WAVE file formats
• G.721 at 8 kHz with 4-bit samples (32 kbps), VOX and WAVE file formats
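The bit rates in this list follow from multiplying the sampling rate by the sample size. A
short sketch of the arithmetic is shown below. bit_rate_bps( ) and payload_bytes( ) are
hypothetical helpers, and the byte count covers raw audio data only (it ignores any WAVE
file header).

```c
#include <assert.h>

/* Bit rate in bits per second = samples per second x bits per sample. */
unsigned long bit_rate_bps(unsigned long sample_rate_hz,
                           unsigned int bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return sample_rate_hz * bits_per_sample;
}

/* Raw audio size in bytes for a recording of the given length,
 * rounded up to a whole byte. Intended for short durations; very
 * long durations could overflow unsigned long on some platforms. */
unsigned long payload_bytes(unsigned long sample_rate_hz,
                            unsigned int bits_per_sample,
                            unsigned long seconds)
{
    unsigned long bits = bit_rate_bps(sample_rate_hz, bits_per_sample) * seconds;
    return (bits + 7) / 8;
}
```

For example, 3 seconds of 6 kHz, 4-bit OKI ADPCM (the default initial silence allowance for
RM_ISCR) corresponds to 9000 bytes of audio data.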