
<Stream>

The <Stream> instruction sends the raw audio of a running phone call over a WebSocket connection, in near real time, to a URL you specify. The audio frames are base64-encoded and embedded in a JSON string together with other information such as a sequence number and a timestamp. The feature can be used with Speech-To-Text systems, among others.

Attributes

An example of how to use <Stream>:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream" />
  </Start>
</Response>

This cXML instructs SignalWire to make a copy of the audio frames of the current call and send them in near real time over a WebSocket connection to wss://your-application.com/audiostream.

<Stream> starts the audio stream asynchronously, so execution continues with the next cXML instruction at once. If there is no further instruction, SignalWire disconnects the call.

const { RestClient } = require("@signalwire/compatibility-api");
const response = new RestClient.LaML.VoiceResponse();

// <Start> wraps <Stream>, so the call moves on while the audio is streamed.
const start = response.start();
start.stream({
  name: "Example Audio Stream",
  url: "wss://your-application.com/audiostream",
});

// Print the generated cXML document.
console.log(response.toString());

Attribute
url: Absolute or relative URL. A WebSocket connection to the url will be established and audio will start flowing towards the WebSocket server. The only supported protocol is wss. For security reasons, ws is NOT supported.
name (optional): Unique name for the Stream, per Call. It is used to stop a Stream by name.
track (optional): One of inbound_track, outbound_track, or both_tracks. Defaults to inbound_track. For both_tracks there will be both inbound_track and outbound_track events.
statusCallback (optional): Absolute or relative URL. SignalWire will make an HTTP GET or POST request to this URL when a Stream is started, is stopped, or encounters an error.
statusCallbackMethod (optional): GET or POST. The type of HTTP request to use when requesting a statusCallback. Default is POST.
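
The attributes above can also be assembled into cXML without an SDK. The following is a minimal sketch, not part of any SignalWire library: buildStartStream and escapeXmlAttr are hypothetical helper names, and the validation only mirrors the rules stated in the table (wss only, three allowed track values).

```javascript
// Hypothetical helpers, not part of any SignalWire SDK.
const STREAM_ATTRIBUTES = ["url", "name", "track", "statusCallback", "statusCallbackMethod"];
const TRACKS = ["inbound_track", "outbound_track", "both_tracks"];

// Escape the characters that are special inside an XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a <Response><Start><Stream .../></Start></Response> document.
function buildStartStream(options) {
  if (!options.url) throw new Error("url is required");
  if (/^ws:\/\//i.test(options.url)) throw new Error("ws is not supported; use wss");
  if (options.track !== undefined && !TRACKS.includes(options.track)) {
    throw new Error(`invalid track: ${options.track}`);
  }
  // Emit only the attributes that were actually provided.
  const attrs = STREAM_ATTRIBUTES
    .filter((key) => options[key] !== undefined)
    .map((key) => `${key}="${escapeXmlAttr(options[key])}"`)
    .join(" ");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Start>",
    `    <Stream ${attrs} />`,
    "  </Start>",
    "</Response>",
  ].join("\n");
}

console.log(buildStartStream({
  url: "wss://your-application.com/audiostream",
  track: "both_tracks",
}));
```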

StatusCallback Parameters

For a statusCallback, SignalWire will send a request with the following parameters:

Parameter
AccountSid (string): The unique ID of the Account this call is associated with.
CallSid (string): A unique identifier for the call. May be used to later retrieve this call from the REST API.
StreamSid (string): The unique identifier for this Stream.
StreamName (string): If defined, this is the unique name of the Stream. Defaults to the StreamSid.
StreamEvent (string): One of stream-started, stream-stopped, or stream-error.
StreamError (string): If an error has occurred, this will contain a detailed error message.
Timestamp (string): The time of the event in ISO 8601 format.
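
As an illustration, a statusCallback endpoint could read these parameters as sketched below. This assumes the POST body arrives form-encoded (application/x-www-form-urlencoded), which is not stated above; parseStreamStatus and describeStreamStatus are hypothetical names.

```javascript
// Assumption: the callback body is form-encoded, e.g.
// "StreamSid=...&StreamEvent=stream-started&...".
function parseStreamStatus(body) {
  const params = new URLSearchParams(body);
  return {
    accountSid: params.get("AccountSid"),
    callSid: params.get("CallSid"),
    streamSid: params.get("StreamSid"),
    streamName: params.get("StreamName"),
    streamEvent: params.get("StreamEvent"),
    streamError: params.get("StreamError"),
    timestamp: params.get("Timestamp"),
  };
}

// Turn the three documented StreamEvent values into a log line.
function describeStreamStatus(status) {
  switch (status.streamEvent) {
    case "stream-started":
      return `Stream ${status.streamSid} started`;
    case "stream-stopped":
      return `Stream ${status.streamSid} stopped`;
    case "stream-error":
      return `Stream ${status.streamSid} failed: ${status.streamError}`;
    default:
      return `Unknown stream event: ${status.streamEvent}`;
  }
}

const status = parseStreamStatus("StreamSid=abc&StreamEvent=stream-error&StreamError=boom");
console.log(describeStreamStatus(status));
```

For a GET callback, the same parsing could be applied to the query string of the request URL.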

Custom Parameters

To pass parameters to the WebSocket server, you can include additional key-value pairs by nesting <Parameter> cXML nouns inside <Stream>. These parameters are added to the Start message as JSON.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream">
      <Parameter name="Cookie" value="948f9938-299a-d43e-0df4-af3a7eccb0ac" />
      <Parameter name="Type" value="SIP" />
    </Stream>
  </Start>
</Response>

Stopping a Stream

A stream can be stopped at any time by name. For instance, if you name the Stream "mystream", you can later use that unique name to stop it.

<Start>
  <Stream name="mystream" url="wss://mystream.ngrok.io/audiostream" />
</Start>
<Stop>
  <Stream name="mystream" />
</Stop>

Bidirectional Stream

The <Stream> instruction can also be used to send audio into the call. In this case, the stream must be bidirectional. The external service (e.g., an AI agent) will then be able to both hear the call and play audio.

To initialize a bidirectional stream, wrap the <Stream> instruction in <Connect> instead of <Start>:

<Connect>
  <Stream url="wss://mystream.ngrok.io/audiostream" />
</Connect>

WebSocket Messages

Five types of events occur during the Stream's life cycle, each represented by a WebSocket message: Connected, Start, Media, DTMF, and Stop. Each message sent is a JSON string. The type of event can be identified from the event property of every JSON object.
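
One way to act on the event property is a small dispatcher, sketched here without any WebSocket library so that it stays self-contained. routeStreamMessage and the handlers object are hypothetical names; in a real server the raw string would come from the connection's message callback.

```javascript
// Parse one WebSocket text frame and call the handler registered for its event.
// Returns true if a handler ran, false if the event has no handler.
function routeStreamMessage(raw, handlers) {
  const message = JSON.parse(raw);
  // Own-property check so inherited names such as "toString" are never treated as handlers.
  if (!Object.prototype.hasOwnProperty.call(handlers, message.event)) {
    return false;
  }
  handlers[message.event](message);
  return true;
}

// Example handlers keyed by the documented event names.
const handlers = {
  connected: (m) => console.log("protocol", m.protocol, m.version),
  start: (m) => console.log("stream", m.start.streamSid),
  media: (m) => console.log("audio on", m.media.track),
  dtmf: (m) => console.log("digit", m.dtmf.digit),
  stop: () => console.log("stream ended"),
};

routeStreamMessage('{"event":"connected","protocol":"Call","version":"0.2.0"}', handlers);
```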

Connected Message

The first message sent once a WebSocket connection is established is the Connected event. This message describes the protocol to expect in the following messages.

Event
event: The string value of "connected".
protocol: Defines the protocol for the WebSocket connection's lifetime, e.g. "Call".
version: Semantic version of the protocol.

Example Connected Message

{
  "event": "connected",
  "protocol": "Call",
  "version": "0.2.0"
}

Start Message

This message contains important information about the Stream and is sent immediately after the Connected message. It is only sent once at the start of the Stream.

Event
event: The string value of start.
sequenceNumber: Number used to keep track of message sending order. The first message starts with number "1", and the number is then incremented.
start: An object containing Stream metadata.
start.streamSid: The unique identifier of the Stream.
start.accountSid: The Account identifier that created the Stream.
start.callSid: The Call identifier from where the Stream was started.
start.tracks: An array of values that indicates what media flows to expect in subsequent messages. Values are "inbound", "outbound", or both.
start.customParameters: An object that represents the Custom Parameters that were set when defining the Stream.
start.mediaFormat: An object containing the format of the payload in the Media Messages.
start.mediaFormat.encoding: The encoding of the data in the upcoming payload. Default is "audio/x-mulaw".
start.mediaFormat.sampleRate: The sample rate in Hertz of the upcoming audio data. Default value is 8000, which is the rate of PCMU.
start.mediaFormat.channels: The number of channels in the input audio data. Default value is 1. For both_tracks it will be 2.

Example Start Message

{
  "event": "start",
  "sequenceNumber": "2",
  "start": {
    "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
    "accountSid": "123abc",
    "callSid": "a30d16a5-0368-4104-afbf-14247e76a63d",
    "tracks": ["inbound", "outbound"],
    "customParameters": {
      "FirstName": "Jane",
      "LastName": "Doe",
      "RemoteParty": "Bob"
    },
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  }
}
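
A receiver will typically keep the Start message's metadata for the rest of the stream. The sketch below flattens the fields described above into one object; summarizeStart is a hypothetical name, and the sample input is a shortened version of the example message.

```javascript
// Extract the fields a receiver usually needs from a parsed Start message.
function summarizeStart(message) {
  if (message.event !== "start") {
    throw new Error(`expected a start message, got ${message.event}`);
  }
  const { streamSid, accountSid, callSid, tracks, mediaFormat } = message.start;
  return {
    streamSid,
    accountSid,
    callSid,
    tracks,
    // Fall back to an empty object in case no <Parameter> nouns were sent.
    customParameters: message.start.customParameters || {},
    encoding: mediaFormat.encoding,
    sampleRate: mediaFormat.sampleRate,
    channels: mediaFormat.channels,
  };
}

const startMessage = JSON.parse(`{
  "event": "start",
  "sequenceNumber": "2",
  "start": {
    "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
    "accountSid": "123abc",
    "callSid": "a30d16a5-0368-4104-afbf-14247e76a63d",
    "tracks": ["inbound", "outbound"],
    "customParameters": { "FirstName": "Jane" },
    "mediaFormat": { "encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1 }
  }
}`);

console.log(summarizeStart(startMessage));
```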

Media Message

This message type encapsulates the raw audio data.

Event
event: The string value of media.
sequenceNumber: Number used to keep track of message sending order. The first message starts with number "1", and the number is then incremented for each message.
media: An object containing media metadata and payload.
media.track: One of the strings inbound or outbound.
media.chunk: The chunk for the message. The first message will begin with number "1" and increment with each subsequent message.
media.timestamp: Presentation timestamp in milliseconds from the start of the stream.
media.payload: Raw audio encoded in base64.

Example Media Messages

Outbound

{
  "event": "media",
  "sequenceNumber": "3",
  "media": {
    "track": "outbound",
    "chunk": "1",
    "payload": "iY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//jw=="
  }
}

Inbound

{
  "event": "media",
  "sequenceNumber": "4",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "5",
    "payload": "/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JDw=="
  }
}
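
To work with the audio, the payload first has to be base64-decoded. The sketch below also expands the bytes to 16-bit linear PCM using the standard G.711 mu-law decoding formula, assuming the Start message announced the default "audio/x-mulaw" encoding. muLawToPcm16 and decodeMediaPayload are hypothetical names.

```javascript
// Expand one G.711 mu-law byte to a signed 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode the payload of a parsed Media message into PCM samples.
// Assumption: mediaFormat.encoding is "audio/x-mulaw" (one byte per sample).
function decodeMediaPayload(message) {
  const muLaw = Buffer.from(message.media.payload, "base64");
  const pcm = new Int16Array(muLaw.length);
  for (let i = 0; i < muLaw.length; i++) {
    pcm[i] = muLawToPcm16(muLaw[i]);
  }
  return pcm;
}

// Three hand-picked bytes: silence, the most negative and the most positive value.
const sample = {
  event: "media",
  media: { track: "inbound", payload: Buffer.from([0xff, 0x00, 0x80]).toString("base64") },
};
console.log(Array.from(decodeMediaPayload(sample))); // [ 0, -32124, 32124 ]
```

At the default 8000 Hz sample rate, each decoded sample represents 1/8000 of a second of audio.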

Stop Message

A stop message will be sent either when the Stream is stopped or when the Call has ended.

Example Stop Message

{
  "event": "stop",
  "sequenceNumber": "5"
}

Event
event: The string value of stop.
sequenceNumber: Number used to keep track of message sending order. The first message starts with number "1", and the number is then incremented for each message.

DTMF Message

A DTMF message will be sent when the Stream receives a DTMF tone.

Event
event: The string value of dtmf.
sequence_number: Number, as a string, used to keep track of message-sending order. The first message starts with "1" and then is incremented for each message.
streamSid: The unique identifier of the stream as a string.
dtmf: An object containing the details of the detected DTMF.
dtmf.duration: The duration of the DTMF in milliseconds.
dtmf.digit: The digit, as a string, that corresponds to the DTMF.

Example DTMF Message

{
  "event": "dtmf",
  "sequence_number": "1",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "dtmf": {
    "duration": 2700,
    "digit": "8"
  }
}
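
Because each DTMF message carries a single digit, a receiver that wants multi-digit input has to accumulate the digits itself. A minimal sketch follows; createDigitCollector is a hypothetical name, and using "#" as a terminator is an assumption of this example, not something the protocol defines.

```javascript
// Returns a handler that accumulates dtmf.digit values and hands back the
// collected string once the terminator key is received.
function createDigitCollector(terminator = "#") {
  let digits = "";
  return function onDtmf(message) {
    const digit = message.dtmf.digit;
    if (digit === terminator) {
      const entered = digits;
      digits = ""; // reset for the next entry
      return entered;
    }
    digits += digit;
    return null; // entry not finished yet
  };
}

const onDtmf = createDigitCollector();
onDtmf({ event: "dtmf", dtmf: { duration: 2700, digit: "8" } });
onDtmf({ event: "dtmf", dtmf: { duration: 2500, digit: "1" } });
console.log(onDtmf({ event: "dtmf", dtmf: { duration: 2600, digit: "#" } })); // "81"
```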

Clear Message

Send the clear event message to interrupt audio that you have already sent in media event messages. This empties all buffered audio.

Event
event: The string value of clear.
streamSid: The unique identifier of the stream as a string.

Example Clear Message

{
  "event": "clear",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390"
}
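
Building the clear message in code is a one-liner. The sketch below only produces the JSON string; buildClearMessage is a hypothetical name, and sending the string is left to whatever WebSocket connection object your server uses.

```javascript
// Serialize a clear message for the given stream.
function buildClearMessage(streamSid) {
  if (!streamSid) throw new Error("streamSid is required");
  return JSON.stringify({ event: "clear", streamSid });
}

console.log(buildClearMessage("c0c7d59b-df06-435e-afbc-9217ce318390"));
```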

Notes on Usage

  • The url does not support query string parameters. To pass custom key-value pairs to the WebSocket server, use Custom Parameters instead.
  • There is a one-to-one mapping between a stream and a WebSocket connection, so at most one call is streamed over a single WebSocket connection. Information will be provided so that you can handle multiple inbound connections and manage the association between the unique stream identifier (StreamSid) and the connection.
  • On any given call there are inbound and outbound tracks: inbound represents the audio SignalWire receives from the call, and outbound represents the audio generated by SignalWire for the call.
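
The one-to-one mapping between streams and connections can be tracked with a small registry. This is a sketch under the assumption that the server sees a Start message on each connection before any media; StreamRegistry is a hypothetical class, and the connection can be any object your WebSocket library provides.

```javascript
// Associates each streamSid with the connection it arrived on.
class StreamRegistry {
  constructor() {
    this.bySid = new Map();
  }

  // Call when a Start message arrives on a connection.
  register(startMessage, connection) {
    this.bySid.set(startMessage.start.streamSid, connection);
  }

  // Find the connection for a streamSid, or undefined if unknown.
  lookup(streamSid) {
    return this.bySid.get(streamSid);
  }

  // Call when a connection closes or its Stop message arrives.
  remove(connection) {
    for (const [sid, conn] of this.bySid) {
      if (conn === connection) this.bySid.delete(sid);
    }
  }

  get size() {
    return this.bySid.size;
  }
}

const registry = new StreamRegistry();
const connection = { id: "conn-1" }; // stand-in for a real WebSocket object
registry.register({ event: "start", start: { streamSid: "stream-a" } }, connection);
console.log(registry.lookup("stream-a") === connection); // true
```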