Skip to main content



About Voice Activity Detection and Comfort Noise Generation.


VAD Overview

VAD stands for Voice Activity Detection. Its role is to distinguish between a voice and anything else, including silence. In VoIP applications it may be used as a tool to minimize the number of audio packets transmitted. If nobody is speaking it is possible either to stop the flow of audio packets, or at least change to a much lower rate of comfort noise packets. In a typical phone conversation there are short periods when both parties are talking, but most of the time only one party is talking. With VAD, transmission in each direction can be greatly reduced, or even halted, for nearly 50% of the call. VAD is usually a function within the endpoints of a VoIP path.

There are two things to note here, which often confuse people. VAD is not the same as silence detection. Loud music is certainly not silence, but it is also not voice, and a good VAD will declare "no voice present". Secondly, the use of VAD to minimize packet flow is often described as a bandwidth reduction measure. This is only the case for network links carrying large numbers of concurrent calls, when the likelihood of everyone talking at once is low. For most customer premises applications the bandwidth required of the network will be the peak when all conversations are declared to be voice, and packets are being transmitted at the normal rate of their voice codecs. What VAD allows in these cases is a reduction in the average data rate, freeing up lots of capacity for data which is not real-time data and therefore can be queued..

CNG Overview

CN stands for Comfort Noise. This is simulated background noise, synthesized at the receiving end of a VoIP path. This function is called comfort noise generation (CNG). In a crude form it may be simple simulation of general room "mush" (e.g. Gaussian noise with a Hoth spectral weighting). In a more sophisticated form, noise parameters received from the sender may contain noise modelling parameters. These may be used to produce noise which closely matches the amplitude and spectral qualities of the noise currently being picked up in the sender's environment.

CN also refers to the CN RTP packets specified by RFC 3389. CN packets are sent when the VAD function declares there is no voice present. A CN packet can convey the noise modelling parameters described above, but frequently this information is missing. Ideally CN packets should be sent as the noise in the sender's environment changes, so the CNG function at the receiver can update the noise effectively, and avoid abrupt changes in the noise when the voice signal resumes. More typically, just a single CN packet is sent as the flow of voice packets ceases.


VAD can be set in endpoint profiles and can have 4 values:

  • in - turn on VAD for incoming media,
  • out - turn on VAD for outgoing media,
  • both - turn on VAD for both incoming and outgoing media,
  • none - VAD is completely turned off.

When FreeSWITCH does not detect speech, it stops transmitting RTP. FreeSWITCH also supports per call VAD handling with the following channel variables:


In FreeSWITCH the CNG options select whether or not FreeSWITCH will generate CN RTP packets. The suppress-cng sofia profile option and suppress_cng channel variable control this setting. When both sides support RFC 3389 (they agree in SDP message exchange, rtpmap:13), FreeSWITCH will send CN packets.

Allowing CNG in FreeSWITCH does not mean it will generate any comfort noise into the media channel.

In case one of the call legs does not handle VAD and asynchronous RTP media, it's possible that the listening caller might think that hearing perfect silence means the connection has been dropped. For handling these endpoints, there is a channel variable: bridge_generate_comfort_noise which will generate fake audio.

Applicable Settings

Channel Variables

Profile Parameters

suppress-cng — Suppress Comfort Noise Generator (CNG) on this profile or per call with the 'suppress_cng' variable.

Silence File Type

To assign silence as a source of music on hold or ringback use this syntax:


The higher the level, the lower the volume. Default value is approximately 400. Set the value in the appropriate channel variable:

Playing silence as a file type

 <action application="set" data="hold_music=silence"/>
<action application="set" data="ringback=silence"/>
<action application="set" data="transfer_ringback=silence"/>

silence can be used anywhere a filename would normally be used.