
<Stream>

The <Stream> instruction sends the raw audio of an active call over a WebSocket connection, in near real time, to a URL you specify. Each audio frame is base64-encoded and embedded in a JSON message together with metadata such as a sequence number and a timestamp. Typical uses include feeding call audio to speech-to-text systems and other real-time audio processing services.

Attributes

An example of how to use <Stream>:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream" />
  </Start>
</Response>

This cXML instructs SignalWire to copy the audio frames of the current call and send them in near real time over a WebSocket connection to wss://your-application.com/audiostream.

<Stream> starts the audio stream asynchronously: execution continues immediately with the next cXML instruction. If there is no next instruction, SignalWire disconnects the call.

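A similar response can be generated with the Node.js SDK (@signalwire/compatibility-api):

const { RestClient } = require("@signalwire/compatibility-api");

// Build a cXML response that starts a stream of the call's audio.
const response = new RestClient.LaML.VoiceResponse();
const start = response.start();
start.stream({
  name: "Example Audio Stream", // can be used later to stop the stream by name
  url: "wss://your-application.com/audiostream",
});

// Print the generated cXML document.
console.log(response.toString());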

Attribute | Description
url (required) | Absolute or relative URL. SignalWire establishes a WebSocket connection to this URL and starts sending audio to the WebSocket server. The only supported protocol is wss; for security reasons, ws is NOT supported.
authBearerToken (optional) | An authentication Bearer token supplied when starting a stream. The remote server can use it to authenticate the WebSocket connection request. See the WebSocket connection section for details.
codec (optional) | The audio codec for the stream. Default: PCMU@8000h. See Supported Codecs for the full list.
name (optional) | A unique name for the Stream, per call. Used to stop the Stream by name.
realtime (optional) | If true, and the stream is bidirectional, the stream offers a real-time experience to the call parties by managing packet delays and bursts. If false, the call parties benefit from buffered audio, which may be played out with some delay. Default: false.
track (optional) | One of inbound_track, outbound_track, or both_tracks. Default: inbound_track. With both_tracks, events are sent for both the inbound and the outbound track.
statusCallback (optional) | Absolute or relative URL. SignalWire makes an HTTP GET or POST request to this URL when a Stream is started or stopped, or when an error occurs.
statusCallbackMethod (optional) | GET or POST. The HTTP method used for the statusCallback request. Default: POST.
Looking to use our REST APIs?

You can also use our REST API to start and stop streams.

StatusCallback parameters

For a statusCallback, SignalWire will send a request with the following parameters:

Parameter | Type | Description
AccountSid | string | The unique ID of the Account this call is associated with.
CallSid | string | A unique identifier for the call. Can be used later to retrieve this call from the REST API.
StreamSid | string | The unique identifier for this Stream.
StreamName | string | If defined, the unique name of the Stream. Defaults to the StreamSid.
StreamEvent | string | One of stream-started, stream-stopped, or stream-error.
StreamTrack | string | The track configuration for this stream: inbound_track, outbound_track, or both_tracks.
StreamError | string | If an error has occurred, a detailed error message. See StreamError Values for possible values.
ConferenceSid | string | If the stream is part of a conference, the unique identifier of the conference.
Unique-ID | string | The unique call identifier.
Timestamp | string | The time of the event in ISO 8601 format.
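A minimal sketch of a statusCallback handler body, in the same JavaScript as the SDK sample above. It assumes statusCallbackMethod is POST and that the parameters arrive form-encoded (application/x-www-form-urlencoded); check this against the requests you actually receive. The function name parseStreamStatus is hypothetical.

```javascript
// Hypothetical helper: turn a statusCallback request body into an object.
// Assumption: POST with an application/x-www-form-urlencoded body. For GET
// callbacks, pass the request's query string instead; URLSearchParams
// ignores a leading "?".
function parseStreamStatus(rawBody) {
  const params = new URLSearchParams(rawBody);
  const status = {
    callSid: params.get("CallSid"),
    streamSid: params.get("StreamSid"),
    // StreamName defaults to the StreamSid when no name was set.
    streamName: params.get("StreamName") || params.get("StreamSid"),
    event: params.get("StreamEvent"), // stream-started | stream-stopped | stream-error
    track: params.get("StreamTrack"),
    error: params.get("StreamError"), // null unless an error occurred
    timestamp: params.get("Timestamp"),
  };
  if (status.event === "stream-error") {
    console.error(`Stream ${status.streamName} failed: ${status.error}`);
  }
  return status;
}
```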

StreamError Values

When StreamEvent is stream-error, the StreamError field will contain one of the following values:

Error Value | Meaning | Common Causes
invalid_url | Invalid WebSocket URL format | The URL doesn't use the wss:// protocol
missing_url | No URL provided | The url attribute was not specified
invalid_track | Invalid track configuration | The track value is not one of inbound_track, outbound_track, or both_tracks
codec_error | Codec not supported or misconfigured | The requested codec is not enabled for your account, or multi-channel audio was requested (only mono is supported)
connection_refused | Remote endpoint rejected the connection | Your WebSocket server refused the connection
connection_refused_timeout_or_ssl_error | Connection timeout or SSL failure | The WebSocket server is unreachable, or there are SSL certificate issues
general_error | Internal stream initialization failed | An internal error occurred during stream setup; contact support with the StreamSid
Duplicated stream ID | Stream ID already in use | The conference already has a stream with this ID
Duplicated stream name | Stream name already in use | The conference already has a stream with this name
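One way to act on these values is to separate configuration problems, which will recur until the cXML or account setup changes, from connection problems on your own WebSocket server. The grouping below is an illustrative reading of the table above, not an official classification, and classifyStreamError is a hypothetical name.

```javascript
// Hypothetical grouping of StreamError values from the table above.
// Configuration errors recur until the cXML (or account setup) is fixed.
const CONFIG_ERRORS = new Set([
  "invalid_url",
  "missing_url",
  "invalid_track",
  "codec_error",
  "Duplicated stream ID",
  "Duplicated stream name",
]);
// Connection errors depend on the state of your WebSocket server.
const CONNECTION_ERRORS = new Set([
  "connection_refused",
  "connection_refused_timeout_or_ssl_error",
]);

function classifyStreamError(value) {
  if (CONFIG_ERRORS.has(value)) return "fix-configuration";
  if (CONNECTION_ERRORS.has(value)) return "check-websocket-server";
  // general_error and any unrecognized value: contact support with the StreamSid.
  return "contact-support";
}
```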

Supported Codecs

The codec attribute allows you to control the audio codec used for the stream. The following codecs are supported:

Codec Value | Sample Rates Available
PCMU (default) | 8000h
L16 | 16000h, 24000h

Codec Format Examples:

  • PCMU@8000h (default if no codec specified)
  • L16@24000h
  • L16@16000h
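The codec string also determines the media format announced in the Start message and how many payload bytes correspond to a given duration of audio. The sketch below assumes 1 byte per sample for PCMU (8-bit mu-law) and 2 bytes per sample for L16 (16-bit linear PCM), both mono; describeCodec is a hypothetical helper. For example, 20 ms of L16@24000h audio is 24 samples/ms × 2 bytes × 20 ms = 960 bytes.

```javascript
// Supported codec strings, with the encoding names used in the Start message.
const SUPPORTED_CODECS = {
  "PCMU@8000h": { encoding: "audio/x-mulaw", sampleRate: 8000, bytesPerSample: 1 },
  "L16@16000h": { encoding: "audio/x-L16", sampleRate: 16000, bytesPerSample: 2 },
  "L16@24000h": { encoding: "audio/x-L16", sampleRate: 24000, bytesPerSample: 2 },
};

// Hypothetical helper: describe the media format for a codec attribute value.
function describeCodec(codec = "PCMU@8000h") {
  const info = SUPPORTED_CODECS[codec];
  if (!info) throw new Error(`Unsupported codec: ${codec}`);
  return {
    encoding: info.encoding,
    sampleRate: info.sampleRate,
    channels: 1, // only mono is supported
    // Raw (pre-base64) payload bytes per millisecond of audio.
    bytesPerMs: (info.sampleRate / 1000) * info.bytesPerSample,
  };
}
```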

WebSocket connection

When establishing a stream, SignalWire initiates a WebSocket connection to your specified URL endpoint. The connection begins with an HTTP upgrade request containing the following headers:

Header | Description
Host | The destination server hosting the WebSocket endpoint (e.g., "example.com")
Upgrade | Protocol upgrade request indicating a switch to WebSocket (value: "websocket")
Connection | Connection type for the upgrade (value: "Upgrade")
Sec-WebSocket-Key | Base64-encoded random value used for the WebSocket handshake
Sec-WebSocket-Version | WebSocket protocol version (value: "13")
Authorization | Bearer token for authentication, if the authBearerToken attribute is provided (format: "Bearer token_here")
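If you set authBearerToken, your server can check the Authorization header before accepting the upgrade. A minimal Node.js sketch, assuming headers is the lower-cased header object Node provides (for example req.headers in an http.Server "upgrade" listener) and that your server already knows the expected token; isAuthorized is a hypothetical name. A constant-time comparison avoids leaking how much of the token matched.

```javascript
const crypto = require("crypto");

// Hypothetical check of the Authorization header on the upgrade request.
function isAuthorized(headers, expectedToken) {
  const header = headers["authorization"] || "";
  const prefix = "Bearer ";
  if (!header.startsWith(prefix)) return false;
  const supplied = Buffer.from(header.slice(prefix.length));
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return supplied.length === expected.length &&
    crypto.timingSafeEqual(supplied, expected);
}
```

In an "upgrade" listener, a failed check could be rejected by writing "HTTP/1.1 401 Unauthorized\r\n\r\n" to the socket and destroying it before completing the WebSocket handshake.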

Once the WebSocket connection is established, SignalWire will send various events throughout the stream's lifecycle. These events are delivered as JSON-formatted WebSocket messages, each containing an event property that identifies the message type.

SignalWire sends the following event types to your WebSocket server:

  • Connected - Initial handshake message confirming the connection
  • Start - Stream metadata and configuration details
  • Media - Audio data packets
  • DTMF - Touch-tone digit events
  • Mark - Audio playback completion acknowledgments (echoed back when you send a mark)
  • Stop - Stream termination notification

When using bidirectional streams with <Connect><Stream>, you can also send messages to SignalWire:

  • Media - Send audio data into the call
  • Mark - Request playback completion acknowledgment
  • Clear - Flush the audio buffer
  • DTMF - Inject DTMF tones into the call
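Because every message carries an event property, a server can route incoming messages with a simple lookup table. A sketch, assuming each WebSocket text frame holds one JSON message and using the lowercase event names shown in the examples below (connected, start, media, dtmf, mark, stop); handleStreamMessage and the handlers object are hypothetical.

```javascript
// Hypothetical dispatcher for messages received from SignalWire.
// `handlers` maps event names ("connected", "start", "media", "dtmf",
// "mark", "stop") to functions that receive the parsed message.
function handleStreamMessage(rawText, handlers) {
  const message = JSON.parse(rawText);
  const known = Object.prototype.hasOwnProperty.call(handlers, message.event);
  if (known) {
    handlers[message.event](message);
  } else {
    console.warn(`Unhandled stream event: ${message.event}`);
  }
  return message.event;
}
```

With the widely used ws package, for example, this could be wired up as socket.on("message", (data) => handleStreamMessage(data.toString(), handlers)).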

Connected message

SignalWire sends the Connected message immediately after establishing the WebSocket connection. It identifies the protocol and protocol version used for all subsequent messages.

Property | Description
event | The string value connected.
protocol | The protocol used for the lifetime of the WebSocket connection. Value: Call
version | Semantic version of the protocol. Current version: 0.2.0

Example Connected message

{
  "event": "connected",
  "protocol": "Call",
  "version": "0.2.0"
}

Start message

SignalWire delivers this message right after the Connected event, providing essential stream configuration details. This message appears only once when the stream initializes.

Property | Description
event | The string value start.
sequenceNumber | Message sequence counter, starting at "1" and incrementing with each message.
start | Container holding the stream configuration and metadata.
start.streamSid | The unique identifier of the Stream.
start.accountSid | The identifier of the Account that created the Stream.
start.callSid | The identifier of the call session where the stream originated.
start.tracks | Array specifying which audio directions will be transmitted. Possible values: ["inbound"], ["outbound"], or ["inbound", "outbound"].
start.customParameters | Object containing the custom key-value pairs configured when the stream was created. Only present when custom parameters are defined.
start.mediaFormat | Configuration details for the audio data format.
start.mediaFormat.encoding | Audio codec format. Possible values: audio/x-mulaw (PCMU), audio/x-L16 (L16).
start.mediaFormat.sampleRate | Audio sampling frequency in Hz. Possible values: 8000 (PCMU), 16000 (L16), or 24000 (L16).
start.mediaFormat.channels | Audio channel count. Always 1 (mono); multi-channel audio is not supported.

Example Start message

{
  "event": "start",
  "sequenceNumber": "1",
  "start": {
    "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
    "accountSid": "b08dacad-2f6c-4de1-93d6-cc732e0c69c5",
    "callSid": "76ac3c36-56da-4a3e-a0d6-b5f8df6da9ad",
    "tracks": [
      "inbound"
    ],
    "customParameters": {},
    "mediaFormat": {
      "encoding": "audio/x-L16",
      "sampleRate": 24000,
      "channels": 1
    }
  }
}
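Since the Start message arrives once per stream, it is a natural place to capture per-stream state. A sketch with a hypothetical readStartMessage helper; it falls back to an empty object because customParameters is only present when parameters were defined.

```javascript
// Hypothetical helper: extract per-stream settings from a parsed start message.
function readStartMessage(message) {
  if (message.event !== "start") throw new Error("Expected a start message");
  const { streamSid, callSid, tracks, mediaFormat } = message.start;
  return {
    streamSid,
    callSid,
    tracks, // e.g. ["inbound"] or ["inbound", "outbound"]
    encoding: mediaFormat.encoding,
    sampleRate: mediaFormat.sampleRate,
    // Values from nested <Parameter> nouns, if any were supplied.
    customParameters: message.start.customParameters || {},
  };
}
```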

Media message

Media messages deliver the actual audio content from the call as it flows through the stream.

Property | Description
event | The string value media.
sequenceNumber | Sequential message counter for ordering, starting at "1" and incrementing with each message.
media | Container with the audio data and associated metadata.
media.track | Audio track identifier. One of: inbound or outbound.
media.chunk | Chunk counter for this track. Starts at "1" and increments with each chunk.
media.timestamp | Presentation timestamp in milliseconds from the start of the stream.
media.payload | Base64-encoded raw audio data.

Example Media message

{
  "event": "media",
  "sequenceNumber": "42",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "0",
    "payload": "<base64-encoded-audio>"
  }
}
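To process inbound audio, base64-decode media.payload. With the default PCMU@8000h codec each byte is one G.711 mu-law sample, which can be expanded to 16-bit linear PCM with the standard G.711 decoding below. This sketch handles PCMU only (L16 payloads are already linear PCM, and their byte order is not specified here); decodeMediaMessage is a hypothetical helper.

```javascript
// Decode one G.711 mu-law byte to a 16-bit linear PCM sample.
function mulawToLinear(byte) {
  const u = ~byte & 0xff; // mu-law bytes are transmitted bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Hypothetical helper: decode a parsed media message (assumes PCMU@8000h).
function decodeMediaMessage(message) {
  const bytes = Buffer.from(message.media.payload, "base64");
  const samples = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    samples[i] = mulawToLinear(bytes[i]);
  }
  return {
    track: message.media.track,
    timestampMs: Number(message.media.timestamp), // sent as a string
    durationMs: bytes.length / 8, // 8000 samples/s at 1 byte per sample
    samples,
  };
}
```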

Stop message

SignalWire transmits a stop message when the stream terminates or the associated call concludes.

Property | Description
event | The string value stop.
sequenceNumber | Message sequence counter.

Example Stop message

{
  "event": "stop",
  "sequenceNumber": "999"
}

DTMF message

SignalWire generates DTMF messages whenever touch-tone key presses are detected in the audio stream.

Property | Description
event | Event type identifier, set to dtmf.
sequenceNumber | Message sequence counter.
streamSid | Stream identifier. Only included for bidirectional streams.
dtmf | Container holding the detected touch-tone details.
dtmf.digit | The digit that was pressed. Values: 0-9, *, #, A-D.
dtmf.duration | Duration of the key press in milliseconds.

Example DTMF message

{
  "event": "dtmf",
  "sequenceNumber": "123",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "dtmf": {
    "digit": "5",
    "duration": 2000
  }
}
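A common pattern is to accumulate received digits until a terminator key. A minimal sketch, assuming "#" ends the entry (an application choice, not SignalWire behavior); DigitCollector is hypothetical.

```javascript
// Hypothetical helper: accumulate DTMF digits until "#" is pressed.
class DigitCollector {
  constructor(onComplete) {
    this.digits = "";
    this.onComplete = onComplete;
  }

  // Call with each parsed message whose event is "dtmf".
  handle(message) {
    const digit = message.dtmf.digit;
    if (digit === "#") {
      this.onComplete(this.digits); // deliver the collected entry
      this.digits = "";
    } else {
      this.digits += digit;
    }
  }
}
```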

Mark message

SignalWire sends mark messages to acknowledge that audio playback has completed or that the audio buffer was cleared. Each one carries the name of a mark you previously sent to SignalWire.

Property | Description
event | Event type, set to mark.
streamSid | Stream identifier. Only included for bidirectional streams.
mark | Container with the mark acknowledgment details.
mark.name | The mark name echoed back from your original message.

Example Mark message

{
  "event": "mark",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "mark": {
    "name": "my-custom-mark"
  }
}

Sending WebSocket Messages

When you create a Stream inside a <Connect> element (<Connect><Stream>), the connection becomes bidirectional. Your application can then send WebSocket messages to SignalWire to inject audio into the active call and control the stream's behavior.

The messages that your WebSocket server can send back to SignalWire are:

  • Media - Send audio data back into the call
  • Mark - Track when audio playback completes
  • Clear - Interrupt buffered audio
  • DTMF - Inject DTMF tones into the call

Send a media message

To send audio to SignalWire, send a media message with the structure described below.

The payload encoding depends on the codec specified in your Stream configuration:

  • Default (PCMU/mulaw): audio/x-mulaw with 8000 Hz sample rate
  • L16@16000h: Linear PCM with 16000 Hz sample rate
  • L16@24000h: Linear PCM with 24000 Hz sample rate

All audio must be base64 encoded. SignalWire queues incoming media messages and plays them sequentially. To stop playback and clear the queue, transmit a clear message.

Warning

Ensure your media.payload contains only raw audio data, without file format headers (for example, a WAV header). Including format headers results in corrupted audio playback.

Property | Description
event | The message type. Set to "media" for audio data.
streamSid | Identifier of the stream where the audio should be played.
media | Container object holding the audio payload.
media.payload | Base64-encoded audio data (format depends on the codec configuration).

Example media message (payload abbreviated):

{
  "event": "media",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "media": {
    "payload": "a3242sa..."
  }
}
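A sketch of turning a buffer of raw audio into outbound media messages. It assumes the audio is already in the stream's codec with no file header, and it splits the audio into 160-byte chunks (20 ms of PCMU@8000h); the chunk size and the buildMediaMessages name are assumptions, not SignalWire requirements.

```javascript
// Hypothetical helper: split raw (headerless) audio into media messages.
// chunkBytes is an assumption: 160 bytes = 20 ms of PCMU@8000h audio.
function buildMediaMessages(streamSid, rawAudio, chunkBytes = 160) {
  const messages = [];
  for (let offset = 0; offset < rawAudio.length; offset += chunkBytes) {
    const chunk = rawAudio.subarray(offset, offset + chunkBytes);
    messages.push(JSON.stringify({
      event: "media",
      streamSid,
      media: { payload: chunk.toString("base64") },
    }));
  }
  return messages; // send in order; SignalWire queues and plays them sequentially
}
```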

Send a mark message

Send a mark message after your media messages to be notified when their playback finishes. SignalWire echoes a mark message with the same name once the audio has finished playing, or immediately if no audio is queued.

You'll also receive mark confirmations when the audio queue is cleared via a clear message.

Property | Description
event | The message type. Set to "mark" for completion tracking.
streamSid | Identifier of the stream for the mark operation.
mark | Container object with the mark details.
mark.name | Custom identifier used to track specific audio segments or playback events.

Example mark message:

{
  "event": "mark",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "mark": {
    "name": "my label"
  }
}
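Because SignalWire echoes each mark once the audio queued before it has played (or been cleared), an application can tell whether audio it sent is still pending by tracking unacknowledged mark names. A sketch with a hypothetical MarkTracker class; it assumes mark names are unique within a stream.

```javascript
// Hypothetical tracker pairing sent marks with SignalWire's echoed marks.
class MarkTracker {
  constructor(streamSid) {
    this.streamSid = streamSid;
    this.pending = new Set();
  }

  // Returns the JSON text to send after the media messages for a segment.
  markMessage(name) {
    this.pending.add(name);
    return JSON.stringify({ event: "mark", streamSid: this.streamSid, mark: { name } });
  }

  // Call with each parsed "mark" event from SignalWire.
  // Returns true if the name matched a mark this tracker sent.
  acknowledge(message) {
    return this.pending.delete(message.mark.name);
  }

  // True while at least one sent mark has not been echoed back yet.
  get hasPendingAudio() {
    return this.pending.size > 0;
  }
}
```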

Send a clear message

Send a clear message to stop audio playback and flush the audio queue. SignalWire then returns any pending mark messages for the cleared audio.

Property | Description
event | The message type. Set to "clear" to interrupt audio.
streamSid | Identifier of the stream where audio should be stopped.

Example clear message:

{
  "event": "clear",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390"
}

Send a DTMF message

Transmit a DTMF message to inject touch-tone digits into the call. This allows you to programmatically send DTMF tones as if they were pressed on a keypad.

Property | Description
event | The message type. Set to "dtmf" for DTMF injection.
streamSid | Identifier of the stream where DTMF should be sent.
dtmf | Container object with the DTMF details.
dtmf.digit | The digit to send. Valid values: 0-9, *, #, A-D.

Example DTMF message:

{
  "event": "dtmf",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "dtmf": {
    "digit": "5"
  }
}
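A sketch of building outbound dtmf messages, validating digits against the values listed above; buildDtmfMessage and buildDtmfSequence are hypothetical. Whether consecutive digits need pacing between messages is not covered here, so the sketch only builds the messages.

```javascript
// Valid digits: 0-9, *, #, A-D (one character per message).
const VALID_DTMF = /^[0-9*#A-D]$/;

// Hypothetical helper: build a dtmf message that injects one digit.
function buildDtmfMessage(streamSid, digit) {
  if (!VALID_DTMF.test(digit)) {
    throw new Error(`Invalid DTMF digit: ${digit}`);
  }
  return JSON.stringify({ event: "dtmf", streamSid, dtmf: { digit } });
}

// Build one message per digit of a sequence such as "123#".
function buildDtmfSequence(streamSid, digits) {
  return [...digits].map((d) => buildDtmfMessage(streamSid, d));
}
```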

Examples

Conference stream

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial trim="do-not-trim">
    <Conference beep="false" startConferenceOnEnter="true" trim="do-not-trim" streamUrl="wss://206.189.19.130:8765/">test
      <Stream name="my_conference_stream"
              url="wss://206.189.19.130:8765/"
              streamStartConferenceOnEnter="true"
              bidir="true">
        <Parameter name="foo1" value="bar1"/>
        <Parameter name="foo2" value="bar2"/>
      </Stream>
    </Conference>
  </Dial>
</Response>

Bidirectional stream

The <Stream> instruction can also be used to send audio into the call. In this case, the stream must be bidirectional: the external service (e.g., an AI agent) can then both hear the call and play audio into it.

To initialize a bidirectional stream, wrap the <Stream> instruction in <Connect> instead of <Start>:

<Connect>
  <Stream url="wss://mystream.ngrok.io/audiostream" />
</Connect>

Starting and stopping streams

You can stop a stream at any time by name. For instance, if you name the Stream "mystream", you can later stop it by referencing that unique name:

<Start>
  <Stream name="mystream" url="wss://mystream.ngrok.io/audiostream" />
</Start>
<Stop>
  <Stream name="mystream" />
</Stop>

Custom parameters

To pass custom key-value pairs to your WebSocket server, nest <Parameter> cXML nouns inside <Stream>. These parameters are included in the Start message as JSON, in the start.customParameters object.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream">
      <Parameter name="Cookie" value="948f9938-299a-d43e-0df4-af3a7eccb0ac" />
      <Parameter name="Type" value="SIP" />
    </Stream>
  </Start>
</Response>

Notes on usage

  • The url does not support query string parameters. To pass custom key-value pairs to the WebSocket server, use Custom Parameters instead.
  • Each stream maps to exactly one WebSocket connection, so at most one call is streamed over any single connection. The Start message identifies the stream (streamSid), so you can handle multiple inbound connections and keep track of which stream belongs to which connection.
  • Every call has an inbound and an outbound track: inbound is the audio SignalWire receives from the call, and outbound is the audio SignalWire generates for the call.
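Because each connection carries exactly one stream, an application serving many calls can key its state by the streamSid from the Start message. A sketch with a hypothetical StreamRegistry; the connection object is whatever your WebSocket library provides. The Stop message does not carry a streamSid, so the sketch expects you to unregister using the streamSid remembered for that connection.

```javascript
// Hypothetical registry: one WebSocket connection carries exactly one stream,
// so the streamSid from the start message can key per-connection state.
class StreamRegistry {
  constructor() {
    this.byStreamSid = new Map();
  }

  // Call when a start message arrives on `connection`.
  register(startMessage, connection) {
    const { streamSid, callSid } = startMessage.start;
    this.byStreamSid.set(streamSid, { callSid, connection });
    return streamSid; // remember this on the connection for later cleanup
  }

  // Call when that connection receives stop or closes.
  unregister(streamSid) {
    return this.byStreamSid.delete(streamSid);
  }

  connectionFor(streamSid) {
    const entry = this.byStreamSid.get(streamSid);
    return entry ? entry.connection : undefined;
  }
}
```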