<Stream>

The <Stream> instruction sends the raw audio of a running phone call over a WebSocket connection to a specified URL in near real time. The audio frames are base64 encoded and embedded in a JSON string together with metadata such as a sequence number and timestamp. The feature can be used with Speech-To-Text systems and similar audio-processing services.

Attributes

An example of how to use <Stream>:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream" />
  </Start>
</Response>

This cXML instructs SignalWire to make a copy of the audio frames of the current call and send them in near real time over a WebSocket connection to wss://your-application.com/audiostream.

<Stream> starts the audio stream asynchronously and immediately continues with the next cXML instruction. If there is no further instruction, SignalWire disconnects the call.

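A comparable response can be generated with the Node.js compatibility SDK:

const { RestClient } = require("@signalwire/compatibility-api");
const response = new RestClient.LaML.VoiceResponse();

// <Start> wraps the <Stream> noun, as in the XML example above.
const start = response.start();
start.stream({
  name: "Example Audio Stream",
  url: "wss://your-application.com/audiostream",
});

// Print the generated cXML document.
console.log(response.toString());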
| Attribute | Description |
| --- | --- |
| url (required) | Absolute or relative URL. A WebSocket connection to the URL will be established and audio will start flowing towards the WebSocket server. The only supported protocol is wss. For security reasons, ws is NOT supported. |
| authBearerToken (optional) | An authentication Bearer token that can be supplied when starting a stream. The remote server can then authenticate the WebSocket connection request using the supplied token. More information can be found in the WebSocket connection section. |
| codec (optional) | Sets the codec used on the stream. Possible values: L16@24000h and L16@16000h. |
| name (optional) | Unique name for the Stream, per call. It is used to stop a Stream by name. |
| realtime (optional) | If true, and the stream is bidirectional, the stream offers a realtime experience to the call parties by managing packet delays and bursts. If false, the user benefits from buffered audio, which can be played out with delay. Default: false. |
| track (optional) | One of inbound_track, outbound_track, or both_tracks. Defaults to inbound_track. For both_tracks, there will be both inbound_track and outbound_track events. |
| statusCallback (optional) | Absolute or relative URL. SignalWire will make an HTTP GET or POST request to this URL when a Stream is started, is stopped, or encounters an error. |
| statusCallbackMethod (optional) | GET or POST. The type of HTTP request to use when requesting a statusCallback. Default is POST. |
Looking to use our REST APIs?

You can utilize our REST API to both start and stop streams.

StatusCallback parameters

For a statusCallback, SignalWire will send a request with the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| AccountSid | string | The unique ID of the Account this call is associated with. |
| CallSid | string | A unique identifier for the call. May be used to later retrieve this call from the REST API. |
| StreamSid | string | The unique identifier for this Stream. |
| StreamName | string | If defined, this is the unique name of the Stream. Defaults to the StreamSid. |
| StreamEvent | string | One of stream-started, stream-stopped, or stream-error. |
| StreamError | string | If an error has occurred, this will contain a detailed error message. |
| Timestamp | string | The time of the event in ISO 8601 format. |
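
A handler for the statusCallback mostly needs to read StreamEvent and surface StreamError. The following is a minimal sketch, assuming the parameters arrive as a form-encoded body (the encoding is an assumption, not something stated above); `parseStreamStatus` is a hypothetical helper name.

```javascript
// Parse a statusCallback request body into a plain object.
// Assumption: the parameters are application/x-www-form-urlencoded.
function parseStreamStatus(body) {
  const params = new URLSearchParams(body);
  const event = params.get("StreamEvent");
  return {
    accountSid: params.get("AccountSid"),
    callSid: params.get("CallSid"),
    streamSid: params.get("StreamSid"),
    // StreamName defaults to the StreamSid when no name was set.
    streamName: params.get("StreamName") || params.get("StreamSid"),
    event,
    // StreamError is only meaningful for stream-error events.
    error: event === "stream-error" ? params.get("StreamError") : null,
    timestamp: params.get("Timestamp"),
  };
}

const status = parseStreamStatus(
  "CallSid=c1&StreamSid=s1&StreamEvent=stream-error&StreamError=connection%20refused"
);
console.log(status.event, status.error);
```

The same function works for a GET callback if it is given the query string instead of the body.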

WebSocket connection

When establishing a stream, SignalWire initiates a WebSocket connection to your specified URL endpoint. The connection begins with an HTTP upgrade request containing the following headers:

| Header | Description |
| --- | --- |
| Host | The destination server hosting the WebSocket endpoint (e.g., "example.com") |
| Upgrade | Protocol upgrade request indicating a switch to WebSocket (value: "websocket") |
| Connection | Connection type for the upgrade (value: "Upgrade") |
| Sec-WebSocket-Key | Base64-encoded random value used for the WebSocket handshake |
| Sec-WebSocket-Version | WebSocket protocol version (value: "13") |
| Authorization | Bearer token for authentication if the authBearerToken attribute is provided (format: "Bearer token_here") |
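
When authBearerToken is set, the receiving server can reject upgrade requests whose Authorization header does not match. The sketch below shows one way to do that check with Node's standard library; `isAuthorized` and the token value are hypothetical, and the headers object is assumed to use lower-cased names, as Node's http module provides them.

```javascript
const { timingSafeEqual } = require("crypto");

// Check the Authorization header of the WebSocket upgrade request.
// `headers` is a plain object with lower-cased header names.
function isAuthorized(headers, expectedToken) {
  const header = headers["authorization"] || "";
  const prefix = "Bearer ";
  if (!header.startsWith(prefix)) return false;
  const supplied = Buffer.from(header.slice(prefix.length));
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return supplied.length === expected.length && timingSafeEqual(supplied, expected);
}

console.log(isAuthorized({ authorization: "Bearer s3cret" }, "s3cret")); // true
console.log(isAuthorized({ authorization: "Bearer wrong" }, "s3cret")); // false
console.log(isAuthorized({}, "s3cret")); // false
```

A constant-time comparison is used so that response timing does not leak how much of the token matched.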

Once the WebSocket connection is established, SignalWire will send various events throughout the stream's lifecycle. These events are delivered as JSON-formatted WebSocket messages, each containing an event property that identifies the message type.

The stream supports seven distinct event types:

  • Connected - Initial handshake message confirming the connection
  • Start - Stream metadata and configuration details
  • Media - Audio data packets
  • DTMF - Touch-tone digit events
  • Mark - Audio playback completion acknowledgments
  • Clear - Buffer flush acknowledgments
  • Stop - Stream termination notification
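
Because every message identifies its type, a receiver can route frames through a small dispatcher. This is a sketch only: it assumes the type is available as a top-level `eventType` property (as in the example messages on this page), falling back to `rawEvent.event`; `dispatch` and the handler names are hypothetical.

```javascript
// Route one incoming WebSocket text frame to a handler keyed by event type.
// Returns the handled type, "ignored" when no handler exists, or "invalid" for bad JSON.
function dispatch(frame, handlers) {
  let message;
  try {
    message = JSON.parse(frame);
  } catch {
    return "invalid";
  }
  const type = message.eventType || (message.rawEvent && message.rawEvent.event);
  // Only call handlers the caller actually registered (not inherited properties).
  if (!Object.prototype.hasOwnProperty.call(handlers, type)) return "ignored";
  handlers[type](message);
  return type;
}

const seen = [];
const handlers = {
  connected: (m) => seen.push(`protocol ${m.rawEvent.protocol}`),
  stop: (m) => seen.push(`stopped ${m.streamSid}`),
};
dispatch('{"eventType":"connected","rawEvent":{"event":"connected","protocol":"Call","version":"0.2.0"}}', handlers);
dispatch('{"eventType":"media","rawEvent":{"event":"media"}}', handlers); // no handler registered
console.log(seen);
```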

Connected message

SignalWire sends the Connected event immediately after establishing the WebSocket connection. This initial message outlines the communication protocol for all subsequent interactions.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "inbound"). |
| eventType | The event type. In this case, connected. |
| rawEvent | Wrapper containing the core event information. |
| rawEvent.event | The string value of "connected". |
| rawEvent.protocol | Defines the protocol for the WebSocket connection's lifetime, e.g., "Call". |
| rawEvent.version | Semantic version of the protocol. |

Example Connected message

{
  "timestamp": "2025-09-26T16:25:04.792Z",
  "direction": "inbound",
  "eventType": "connected",
  "rawEvent": {
    "event": "connected",
    "protocol": "Call",
    "version": "0.2.0"
  }
}

Start message

SignalWire delivers this message right after the Connected event, providing essential stream configuration details. This message appears only once when the stream initializes.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "inbound"). |
| eventType | The event type. In this case, start. |
| streamSid | The unique identifier of the stream as a string. |
| callSid | The call identifier as a string. |
| sequenceNumber | Message ordering counter, beginning at "1" and advancing with each communication. |
| rawEvent | Container for the core event information. |
| rawEvent.event | The string value of start. |
| rawEvent.sequenceNumber | Message sequence tracking within the raw event, starting from "1". |
| rawEvent.start | Container holding stream configuration and metadata details. |
| rawEvent.start.streamSid | The unique identifier of the Stream. |
| rawEvent.start.accountSid | The Account identifier that created the Stream. |
| rawEvent.start.callSid | Call session identifier where the stream originated. |
| rawEvent.start.tracks | Array specifying which audio directions will be transmitted in future messages. Options: "inbound", "outbound", or both. |
| rawEvent.start.customParameters | Container for custom key-value pairs configured during stream creation. Appears only when parameters are defined. |
| rawEvent.start.mediaFormat | Configuration details for audio data formatting in subsequent transmissions. |
| rawEvent.start.mediaFormat.encoding | Audio compression format for transmitted data. Varies by codec: "audio/x-mulaw", "audio/x-L16", etc. |
| rawEvent.start.mediaFormat.sampleRate | Audio sampling frequency in Hz for stream data. Configuration-dependent: 8000, 16000, 24000, etc. |
| rawEvent.start.mediaFormat.channels | Audio channel count for stream data. Typically 1 (mono), or 2 for both_tracks configurations. |

Example Start message

{
  "timestamp": "2025-09-26T16:25:04.794Z",
  "direction": "inbound",
  "eventType": "start",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "callSid": "76ac3c36-56da-4a3e-a0d6-b5f8df6da9ad",
  "sequenceNumber": "1",
  "rawEvent": {
    "event": "start",
    "sequenceNumber": "1",
    "start": {
      "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
      "accountSid": "b08dacad-2f6c-4de1-93d6-cc732e0c69c5",
      "callSid": "76ac3c36-56da-4a3e-a0d6-b5f8df6da9ad",
      "tracks": [
        "inbound"
      ],
      "mediaFormat": {
        "encoding": "audio/x-L16",
        "sampleRate": 24000,
        "channels": 1
      }
    }
  }
}
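
Since the Start message arrives only once, it is a natural place to capture the stream's audio format before any Media messages are decoded. The following sketch pulls out the fields a consumer typically needs; `readStart` is a hypothetical helper, and the sample message is a shortened version of the example above.

```javascript
// Extract stream identity and audio format from a parsed Start message.
function readStart(message) {
  const start = message.rawEvent.start;
  const { encoding, sampleRate, channels } = start.mediaFormat;
  return {
    streamSid: start.streamSid,
    callSid: start.callSid,
    tracks: start.tracks,
    encoding,
    sampleRate,
    channels,
    // customParameters appears only when parameters were defined on the stream.
    customParameters: start.customParameters || {},
  };
}

const info = readStart({
  eventType: "start",
  rawEvent: {
    event: "start",
    start: {
      streamSid: "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
      callSid: "76ac3c36-56da-4a3e-a0d6-b5f8df6da9ad",
      tracks: ["inbound"],
      mediaFormat: { encoding: "audio/x-L16", sampleRate: 24000, channels: 1 },
    },
  },
});
console.log(info.encoding, info.sampleRate, info.tracks);
```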

Media message

Media messages deliver the actual audio content from the call as it flows through the stream.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "inbound"). |
| eventType | The event type. In this case, media. |
| streamSid | The unique identifier of the stream as a string. |
| sequenceNumber | Sequential message counter for ordering, starting at "1" and incrementing per transmission. |
| rawEvent | Wrapper for the core event information. |
| rawEvent.event | The string value of media. |
| rawEvent.sequenceNumber | Sequential counter within the raw event data, beginning at "1". |
| rawEvent.media | Container with audio data and associated metadata. |
| rawEvent.media.track | One of the strings inbound or outbound. |
| rawEvent.media.chunk | The chunk number for the message. The first message begins with "1", and the number increments with each subsequent message. |
| rawEvent.media.timestamp | Presentation timestamp in milliseconds from the start of the stream. |
| rawEvent.media.payload | Raw audio encoded in base64. |

Example Media message

{
  "timestamp": "2025-09-26T16:25:24.821Z",
  "direction": "inbound",
  "eventType": "media",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "sequenceNumber": "1002",
  "rawEvent": {
    "event": "media",
    "sequenceNumber": "1002",
    "media": {
      "track": "inbound",
      "chunk": "1001",
      "timestamp": "20041",
      "payload": "[AUDIO_DATA_1280_BYTES]"
    }
  }
}
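
Decoding a Media payload is a base64 decode followed by interpreting the bytes according to the stream's codec. The sketch below handles the L16 case with Node's Buffer. The byte order of the samples is an assumption here (little-endian by default, switchable through a parameter), and `decodeL16` and `chunkDurationMs` are hypothetical helper names. The payload is synthetic silence, not real call audio.

```javascript
// Decode rawEvent.media.payload into signed 16-bit samples.
// Assumption: the stream uses an L16 codec. Byte order is not confirmed here,
// so pass littleEndian = false if the decoded audio sounds like noise.
function decodeL16(payloadBase64, littleEndian = true) {
  const bytes = Buffer.from(payloadBase64, "base64");
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = littleEndian ? bytes.readInt16LE(i * 2) : bytes.readInt16BE(i * 2);
  }
  return samples;
}

// Duration of a decoded chunk in milliseconds.
function chunkDurationMs(sampleCount, sampleRate, channels = 1) {
  return (sampleCount * 1000) / (sampleRate * channels);
}

// Synthetic payload: 960 bytes of silence, i.e. 480 mono samples.
const payload = Buffer.alloc(960).toString("base64");
const samples = decodeL16(payload);
console.log(payload.length, samples.length, chunkDurationMs(samples.length, 24000));
// prints: 1280 480 20
```

The arithmetic is worth noting: 960 raw bytes become 1280 base64 characters, and 480 samples at 24000 Hz correspond to 20 ms of audio.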

Stop message

SignalWire transmits a stop message when the stream terminates or the associated call concludes.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "inbound"). |
| eventType | The event type. In this case, stop. |
| streamSid | The unique identifier of the stream as a string. |
| sequenceNumber | Transmission order tracking number, incrementing from "1". |
| rawEvent | Container for the core event information. |
| rawEvent.event | The string value of stop. |
| rawEvent.sequenceNumber | Order tracking within the raw event payload, starting at "1". |

Example Stop message

{
  "timestamp": "2025-09-26T16:25:24.863Z",
  "direction": "inbound",
  "eventType": "stop",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "sequenceNumber": "1002",
  "rawEvent": {
    "event": "stop",
    "sequenceNumber": "1002"
  }
}

DTMF message

SignalWire generates DTMF messages whenever touch-tone key presses are detected in the audio stream.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "inbound"). |
| eventType | The event type. In this case, dtmf. |
| streamSid | Stream identifier for the connection. |
| sequenceNumber | Message ordering number, starting from "1" and incrementing with each transmission. |
| rawEvent | Container for the core event information. |
| rawEvent.event | Event type identifier set to dtmf. |
| rawEvent.streamSid | Stream identifier within the raw event data. |
| rawEvent.dtmf | Container holding the detected touch-tone details. |
| rawEvent.dtmf.duration | Time span of the key press in milliseconds. |
| rawEvent.dtmf.digit | The key that was pressed on the handset, as a string (e.g., "8"). |

Example DTMF message

{
  "timestamp": "2025-09-26T16:25:15.123Z",
  "direction": "inbound",
  "eventType": "dtmf",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "sequenceNumber": "25",
  "rawEvent": {
    "event": "dtmf",
    "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
    "dtmf": {
      "duration": 2700,
      "digit": "8"
    }
  }
}
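
DTMF messages arrive one key press at a time, so an application that wants a multi-digit entry has to accumulate them itself. The sketch below is one way to do that; the "#" terminator is an application-level choice rather than part of the protocol, and `createDigitCollector` is a hypothetical name.

```javascript
// Accumulate DTMF digits per stream until a terminator key arrives.
function createDigitCollector(terminator = "#") {
  const buffers = new Map(); // streamSid -> digits collected so far
  return function onDtmf(message) {
    const { digit } = message.rawEvent.dtmf;
    const sid = message.streamSid;
    if (digit === terminator) {
      const entered = buffers.get(sid) || "";
      buffers.delete(sid);
      return entered; // completed entry
    }
    buffers.set(sid, (buffers.get(sid) || "") + digit);
    return null; // still collecting
  };
}

const onDtmf = createDigitCollector();
const press = (digit) =>
  onDtmf({ streamSid: "s1", rawEvent: { event: "dtmf", dtmf: { duration: 120, digit } } });
press("8");
press("1");
const entry = press("#");
console.log(entry); // 81
```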

Mark message

SignalWire delivers mark messages as acknowledgments for completed audio playback or cleared buffer operations. These responses match the mark identifiers from your earlier transmissions to SignalWire.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "inbound"). |
| eventType | Event classification. Set to mark for completion tracking. |
| streamSid | Stream connection identifier. |
| rawEvent | Wrapper containing the core event information. |
| rawEvent.event | Event type designation set to mark. |
| rawEvent.streamSid | Stream connection identifier within the event payload. |
| rawEvent.mark | Container with the mark acknowledgment details. |
| rawEvent.mark.name | The mark identifier echoed back from your original transmission. |

Example Mark message

{
  "timestamp": "2025-09-26T16:25:24.740Z",
  "direction": "inbound",
  "eventType": "mark",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "rawEvent": {
    "event": "mark",
    "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
    "mark": {
      "name": "item_CK5XvgANb1jVorf5VQlKQ:43500"
    }
  }
}

Clear message

SignalWire transmits clear messages to acknowledge buffer flush operations, usually triggered by clear commands from your application.

| Property | Description |
| --- | --- |
| timestamp | ISO 8601 timestamp of when the message was sent. |
| direction | Direction of the message (e.g., "outbound"). |
| eventType | Event classification. Set to clear for buffer operations. |
| streamSid | Stream connection identifier. |
| rawEvent | Wrapper containing the core event information. |
| rawEvent.event | Event type designation set to clear. |
| rawEvent.streamSid | Stream connection identifier within the event payload. |

Example Clear message

{
  "timestamp": "2025-09-26T16:25:12.018Z",
  "direction": "outbound",
  "eventType": "clear",
  "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b",
  "rawEvent": {
    "event": "clear",
    "streamSid": "7d56cc11-536d-4a45-b4fb-ed3d55be843b"
  }
}

Sending WebSocket Messages

When you create a Stream within a <Connect><Stream> element, the connection becomes bidirectional. Your application can transmit WebSocket messages to SignalWire, enabling you to inject audio into the active call and manage the stream's behavior.

The messages that your WebSocket server can send back to SignalWire are:

  • Media - Send audio data back into the call
  • Mark - Track when audio playback completes
  • Clear - Interrupt buffered audio

Send a media message

Transmitting audio to SignalWire requires constructing a valid media message with the correct structure.

The payload encoding depends on the codec specified in your Stream configuration:

  • Default (PCMU/mulaw): audio/x-mulaw with 8000 Hz sample rate
  • L16@16000h: Linear PCM with 16000 Hz sample rate
  • L16@24000h: Linear PCM with 24000 Hz sample rate

All audio must be base64 encoded. SignalWire queues incoming media messages and plays them sequentially. To stop playback and clear the queue, transmit a clear message.

Warning

Ensure your media.payload contains only raw audio data without file format headers. Including format headers will result in corrupted audio playback.

| Property | Description |
| --- | --- |
| event | Specifies the message type. Set to "media" for audio data. |
| streamSid | Target stream identifier for audio playback. |
| media | Container object holding the audio payload. |
| media.payload | Base64-encoded audio data (format varies by codec configuration). |

Example media message (payload abbreviated):

{
  "event": "media",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "media": {
    "payload": "a3242sa..."
  }
}
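
Outbound audio is usually produced as one long buffer and then split into several media messages. The sketch below shows that step, assuming the buffer already holds raw, header-free audio in the stream's codec. `buildMediaMessages` and the 160-byte chunk size are illustrative choices, not requirements stated above.

```javascript
// Split raw audio (no WAV or other file header) into serialized media messages.
function buildMediaMessages(streamSid, rawAudio, chunkBytes) {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const messages = [];
  for (let offset = 0; offset < rawAudio.length; offset += chunkBytes) {
    const chunk = rawAudio.subarray(offset, offset + chunkBytes);
    messages.push(
      JSON.stringify({
        event: "media",
        streamSid,
        media: { payload: chunk.toString("base64") },
      })
    );
  }
  return messages;
}

// 400 bytes of mu-law silence (0xFF). At 8000 Hz, 160 bytes is 20 ms of audio.
const silence = Buffer.alloc(400, 0xff);
const messages = buildMediaMessages("c0c7d59b-df06-435e-afbc-9217ce318390", silence, 160);
console.log(messages.length); // 3
```

Each returned string can be written to the WebSocket as one text frame.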

Send a mark message

Transmit a mark message following your media messages to receive confirmation when audio playback finishes. SignalWire responds with a matching mark identifier once the audio completes playing (or immediately if no audio is queued).

You'll also receive mark confirmations when the audio queue is cleared via a clear message.

| Property | Description |
| --- | --- |
| event | Message type identifier. Set to "mark" for completion tracking. |
| streamSid | Target stream identifier for the mark operation. |
| mark | Container object with mark details. |
| mark.name | Custom identifier to track specific audio segments or playback events. |

Example mark message:

{
  "event": "mark",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "mark": {
    "name": "my label"
  }
}

Send a clear message

Transmit a clear message to halt audio playback and flush the audio queue. This action triggers SignalWire to return any pending mark messages for the cleared audio segments.

| Property | Description |
| --- | --- |
| event | Message type identifier. Set to "clear" for audio interruption. |
| streamSid | Target stream identifier where audio should be stopped. |

Example clear message:

{
  "event": "clear",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390"
}
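
Mark and clear messages work together: marks are sent after media, and each one is echoed back either when playback reaches it or when the queue is cleared. The sketch below tracks outstanding marks on the application side. `PlaybackTracker` is a hypothetical class, and `send` stands in for whatever function writes a text frame to your WebSocket.

```javascript
// Track marks sent after media so the application knows what is still outstanding.
class PlaybackTracker {
  constructor(streamSid, send) {
    this.streamSid = streamSid;
    this.send = send; // function that writes a string to the WebSocket
    this.pending = new Set(); // mark names not yet echoed back
  }

  mark(name) {
    this.pending.add(name);
    this.send(JSON.stringify({ event: "mark", streamSid: this.streamSid, mark: { name } }));
  }

  clear() {
    // Pending marks are returned after a clear, so they stay pending until echoed.
    this.send(JSON.stringify({ event: "clear", streamSid: this.streamSid }));
  }

  // Call with each incoming Mark message; returns true once nothing is pending.
  onMark(message) {
    this.pending.delete(message.rawEvent.mark.name);
    return this.pending.size === 0;
  }
}

const sent = [];
const tracker = new PlaybackTracker("stream-1", (text) => sent.push(text));
tracker.mark("greeting");
tracker.clear();
const done = tracker.onMark({ rawEvent: { event: "mark", mark: { name: "greeting" } } });
console.log(sent.length, done); // 2 true
```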

Examples

Conference stream

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial trim="do-not-trim">
    <Conference beep="false" startConferenceOnEnter="true" trim="do-not-trim" streamUrl="wss://206.189.19.130:8765/">test
      <Stream name="my_conference_stream"
              url="wss://206.189.19.130:8765/"
              streamStartConferenceOnEnter="true"
              bidir="true">
        <Parameter name="foo1" value="bar1"/>
        <Parameter name="foo2" value="bar2"/>
      </Stream>
    </Conference>
  </Dial>
</Response>

Bidirectional stream

The <Stream> instruction can also send audio into the call. In this case, the stream must be bidirectional. The external service (e.g., an AI agent) will then be able to both hear the call and play audio.

To initialize a bidirectional stream, wrap the <Stream> instruction in <Connect> instead of <Start>:

<Connect>
  <Stream url="wss://mystream.ngrok.io/audiostream" />
</Connect>

Starting and stopping streams

A stream can be stopped at any time by name. For instance, if you name the Stream "mystream", you can later use that unique name to stop it.

<Start>
  <Stream name="mystream" url="wss://mystream.ngrok.io/audiostream" />
</Start>
<Stop>
  <Stream name="mystream" />
</Stop>

Custom parameters

To pass parameters to the wss server, include additional key-value pairs using the nested <Parameter> cXML noun. These parameters are added to the Start message as JSON.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream">
      <Parameter name="Cookie" value="948f9938-299a-d43e-0df4-af3a7eccb0ac"/>
      <Parameter name="Type" value="SIP"/>
    </Stream>
  </Start>
</Response>

Notes on usage

  • The url does not support query string parameters. To pass custom key-value pairs to the WebSocket server, use Custom Parameters instead.
  • There is a one-to-one mapping between a stream and a WebSocket connection, so at most one call is streamed over a single WebSocket connection. SignalWire provides the information you need to handle multiple inbound connections and to associate the unique stream identifier (StreamSid) with its connection.
  • Any given call has inbound and outbound tracks: inbound is the audio SignalWire receives from the call, and outbound is the audio SignalWire generates for the call.
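
The one-to-one mapping between streams and connections can be kept in a small registry keyed by stream identifier. This is a sketch under the assumption that messages have the shape shown in the examples on this page; `StreamRegistry` and the connection IDs are hypothetical.

```javascript
// Keep per-stream state, created on Start and dropped on Stop.
class StreamRegistry {
  constructor() {
    this.streams = new Map(); // streamSid -> per-stream state
  }

  // Feed every parsed message from a connection through here.
  handle(connectionId, message) {
    const sid = message.streamSid;
    switch (message.eventType) {
      case "start":
        this.streams.set(sid, { connectionId, callSid: message.callSid, chunks: 0 });
        break;
      case "media": {
        const state = this.streams.get(sid);
        if (state) state.chunks += 1;
        break;
      }
      case "stop":
        this.streams.delete(sid);
        break;
      default:
        break; // other event types carry no registry state
    }
  }

  activeCount() {
    return this.streams.size;
  }
}

const registry = new StreamRegistry();
registry.handle("conn-1", { eventType: "start", streamSid: "s1", callSid: "c1" });
registry.handle("conn-2", { eventType: "start", streamSid: "s2", callSid: "c2" });
registry.handle("conn-1", { eventType: "media", streamSid: "s1" });
registry.handle("conn-2", { eventType: "stop", streamSid: "s2" });
console.log(registry.activeCount(), registry.streams.get("s1").chunks); // 1 1
```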