<Stream>
The <Stream> instruction sends the raw audio of a running phone call over WebSockets, in near real time, to a specified URL. The audio frames are base64-encoded and embedded in a JSON string together with metadata such as a sequence number and timestamp. The feature can be used with speech-to-text systems, among others.
Attributes
An example of how to use <Stream>:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream" />
  </Start>
</Response>
This cXML instructs SignalWire to make a copy of the audio frames of the current call and send them in near real time over WebSocket to wss://your-application.com/audiostream.
<Stream> starts the audio stream asynchronously and immediately continues with the next cXML instruction. If there is no further instruction, SignalWire disconnects the call.
JavaScript
const { RestClient } = require("@signalwire/compatibility-api");
const response = new RestClient.LaML.VoiceResponse();
const start = response.start();
start.stream({
  name: "Example Audio Stream",
  url: "wss://your-application.com/audiostream",
});
console.log(response.toString());
C#
using System;
using Twilio.TwiML;
using Twilio.TwiML.Voice;
class Example
{
    static void Main()
    {
        var response = new VoiceResponse();
        var start = new Start();
        start.Stream(name: "Example Audio Stream", url: "wss://your-application.com/audiostream");
        response.Append(start);
        Console.WriteLine(response.ToString());
    }
}
Python
from twilio.twiml.voice_response import Parameter, VoiceResponse, Start, Stream
response = VoiceResponse()
start = Start()
stream = Stream(url='wss://your-application.com/audiostream')
stream.parameter(name='FirstName', value='Jane')
stream.parameter(name='LastName', value='Doe')
start.append(stream)
response.append(start)
print(response)
Ruby
require 'signalwire/sdk'
response = Signalwire::Sdk::VoiceResponse.new
response.start do |start|
  start.stream(url: 'wss://your-application.com/audiostream') do |stream|
    stream.parameter(name: 'FirstName', value: 'Jane')
    stream.parameter(name: 'LastName', value: 'Doe')
  end
end
puts response
Attribute | Description |
---|---|
url | Absolute or relative URL. A WebSocket connection to the url will be established and audio will start flowing towards the WebSocket server. The only supported protocol is wss. For security reasons, ws is NOT supported. |
name optional | Unique name for the Stream, per Call. It is used to stop a Stream by name. |
track optional | One of inbound_track, outbound_track, or both_tracks. Defaults to inbound_track. For both_tracks, there will be both inbound_track and outbound_track events. |
statusCallback optional | Absolute or relative URL. SignalWire will make an HTTP GET or POST request to this URL when a Stream is started, stopped, or there is an error. |
statusCallbackMethod optional | GET or POST. The type of HTTP request to use when requesting a statusCallback. Default is POST. |
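The attributes above can also be assembled without an SDK. The following is a minimal sketch that uses only Python's standard library; build_stream_cxml is a hypothetical helper name, not part of any SignalWire library, and the validation simply mirrors the constraints listed in the table (it rejects any explicit scheme other than wss and leaves relative URLs alone).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

VALID_TRACKS = {"inbound_track", "outbound_track", "both_tracks"}


def build_stream_cxml(url, name=None, track=None,
                      status_callback=None, status_callback_method=None):
    """Return a cXML string that starts a <Stream> (hypothetical helper)."""
    # The table states that wss is the only supported protocol.
    if urlparse(url).scheme not in ("", "wss"):
        raise ValueError("Stream url must use the wss scheme")
    if track is not None and track not in VALID_TRACKS:
        raise ValueError(f"track must be one of {sorted(VALID_TRACKS)}")
    if status_callback_method not in (None, "GET", "POST"):
        raise ValueError("statusCallbackMethod must be GET or POST")

    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    attrs = {"url": url}
    optional = {
        "name": name,
        "track": track,
        "statusCallback": status_callback,
        "statusCallbackMethod": status_callback_method,
    }
    # Omit unset attributes so the documented defaults apply.
    attrs.update({key: value for key, value in optional.items() if value is not None})
    ET.SubElement(start, "Stream", attrs)
    return ET.tostring(response, encoding="unicode")
```

Leaving optional attributes out, rather than writing their defaults explicitly, keeps the generated document close to the hand-written examples.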
StatusCallback Parameters
For a statusCallback, SignalWire will send a request with the following parameters:
Parameter | Description |
---|---|
AccountSid string | The unique ID of the Account this call is associated with. |
CallSid string | A unique identifier for the call. May be used to later retrieve this call from the REST API. |
StreamSid string | The unique identifier for this Stream. |
StreamName string | If defined, this is the unique name of the Stream. Defaults to the StreamSid. |
StreamEvent string | One of stream-started, stream-stopped, or stream-error. |
StreamError string | If an error has occurred, this will contain a detailed error message. |
Timestamp string | The time of the event in ISO 8601 format. |
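A statusCallback handler mostly needs to read StreamEvent and react to it. The sketch below assumes the parameters arrive as a form-encoded string (a POST body or a GET query string); that wire format is an assumption, not something stated above. parse_stream_status and describe_stream_status are hypothetical helper names.

```python
from urllib.parse import parse_qs

STREAM_EVENTS = {"stream-started", "stream-stopped", "stream-error"}


def parse_stream_status(body):
    """Flatten a form-encoded statusCallback payload into a plain dict."""
    # parse_qs returns a list per key; each parameter is expected once.
    params = {key: values[0] for key, values in parse_qs(body).items()}
    event = params.get("StreamEvent")
    if event not in STREAM_EVENTS:
        raise ValueError(f"unexpected StreamEvent: {event!r}")
    return params


def describe_stream_status(params):
    """Build a one-line summary suitable for logging."""
    label = params.get("StreamName") or params.get("StreamSid", "unknown")
    if params["StreamEvent"] == "stream-error":
        return f"{label}: error: {params.get('StreamError', 'no details')}"
    return f"{label}: {params['StreamEvent']}"
```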
Custom Parameters
To pass parameters to the wss server, you can include additional key-value pairs by nesting the <Parameter> cXML noun inside <Stream>. These parameters are added to the Start message as JSON.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Stream url="wss://your-application.com/audiostream">
      <Parameter name="Cookie" value="948f9938-299a-d43e-0df4-af3a7eccb0ac" />
      <Parameter name="Type" value="SIP" />
    </Stream>
  </Start>
</Response>
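When the key-value pairs come from application data, the nested <Parameter> elements can be generated from a dictionary. This is a minimal sketch using Python's standard library; build_stream_with_parameters is a hypothetical helper, not an SDK function.

```python
import xml.etree.ElementTree as ET


def build_stream_with_parameters(url, parameters):
    """Return cXML with one <Parameter> per dictionary entry (hypothetical helper)."""
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    stream = ET.SubElement(start, "Stream", {"url": url})
    for name, value in parameters.items():
        # Attribute values must be strings; ElementTree escapes them on output.
        ET.SubElement(stream, "Parameter", {"name": name, "value": str(value)})
    return ET.tostring(response, encoding="unicode")
```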
Stopping a Stream
A stream can be stopped at any time by name. For instance, if you name the Stream "mystream", you can later use that unique name to stop it.
<Start>
  <Stream name="mystream" url="wss://mystream.ngrok.io/audiostream" />
</Start>
<Stop>
  <Stream name="mystream" />
</Stop>
Bidirectional Stream
The <Stream> instruction can also carry audio back into the call. In this case, the stream must be bidirectional. The external service (e.g., an AI agent) will then be able to both hear the call and play audio.
To initialize a bidirectional stream, wrap the <Stream> instruction in <Connect> instead of <Start>:
<Connect>
  <Stream url="wss://mystream.ngrok.io/audiostream" />
</Connect>
WebSocket Messages
Five types of events occur during the Stream's life cycle. They are represented as WebSocket messages: Connected, Start, Media, DTMF, and Stop.
Each message is a JSON string. The event property of every JSON object identifies the type of event that is occurring.
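Routing on the event property can be sketched as a small dispatcher. The example below works on the JSON text only and deliberately leaves out the WebSocket server itself; dispatch_message and the handlers mapping are hypothetical names used for illustration.

```python
import json


def dispatch_message(raw, handlers):
    """Parse one WebSocket message and call the handler registered for its event.

    handlers maps an event name ("connected", "start", "media", "dtmf",
    "stop") to a callable that takes the decoded message dict. Events
    without a handler are ignored and yield None.
    """
    message = json.loads(raw)
    handler = handlers.get(message.get("event"))
    if handler is None:
        return None
    return handler(message)
```

Ignoring unknown events, rather than raising, keeps a server tolerant of message types it does not handle yet.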
Connected Message
The first message sent once a WebSocket connection is established is the Connected event. This message describes the protocol to expect in the following messages.
Property | Description |
---|---|
event | The string value of "connected". |
protocol | Defines the protocol for the WebSocket connection's lifetime, e.g. "Call". |
version | Semantic version of the protocol. |
Example Connected Message
{
  "event": "connected",
  "protocol": "Call",
  "version": "0.2.0"
}
Start Message
This message contains important information about the Stream and is sent immediately after the Connected message. It is sent only once, at the start of the Stream.
Property | Description |
---|---|
event | The string value of start. |
sequenceNumber | Number used to keep track of message sending order. The first message starts with number "1", and the number is then incremented for each message. |
start | An object containing Stream metadata. |
start.streamSid | The unique identifier of the Stream. |
start.accountSid | The Account identifier that created the Stream. |
start.callSid | The Call identifier from where the Stream was started. |
start.tracks | An array of values that indicates what media flows to expect in subsequent messages. Values are one of "inbound" or "outbound" or both. |
start.customParameters | An object that represents the Custom Parameters that were set when defining the Stream. |
start.mediaFormat | An object containing the format of the payload in the Media Messages. |
start.mediaFormat.encoding | The encoding of the data in the upcoming payload. Default is "audio/x-mulaw". |
start.mediaFormat.sampleRate | The sample rate in hertz of the upcoming audio data. Default value is 8000, which is the rate of PCMU. |
start.mediaFormat.channels | The number of channels in the input audio data. Default value is 1. For both_tracks it will be 2. |
Example Start Message
{
  "event": "start",
  "sequenceNumber": "2",
  "start": {
    "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
    "accountSid": "123abc",
    "callSid": "a30d16a5-0368-4104-afbf-14247e76a63d",
    "tracks": ["inbound", "outbound"],
    "customParameters": {
      "FirstName": "Jane",
      "LastName": "Doe",
      "RemoteParty": "Bob"
    },
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  }
}
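A receiver typically keeps the Start metadata around for the rest of the connection. The sketch below turns a Start message into a small dataclass, falling back to the defaults listed in the table when a mediaFormat field is missing; StreamInfo and parse_start_message are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StreamInfo:
    stream_sid: str
    account_sid: str
    call_sid: str
    tracks: list
    encoding: str
    sample_rate: int
    channels: int
    custom_parameters: dict = field(default_factory=dict)


def parse_start_message(raw):
    """Extract Stream metadata from a Start message (hypothetical helper)."""
    message = json.loads(raw)
    if message.get("event") != "start":
        raise ValueError("not a start message")
    start = message["start"]
    media_format = start.get("mediaFormat", {})
    return StreamInfo(
        stream_sid=start["streamSid"],
        account_sid=start["accountSid"],
        call_sid=start["callSid"],
        tracks=list(start.get("tracks", [])),
        # Defaults follow the documented mediaFormat defaults.
        encoding=media_format.get("encoding", "audio/x-mulaw"),
        sample_rate=int(media_format.get("sampleRate", 8000)),
        channels=int(media_format.get("channels", 1)),
        custom_parameters=dict(start.get("customParameters") or {}),
    )
```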
Media Message
This message type encapsulates the raw audio data.
Property | Description |
---|---|
event | The string value of media. |
sequenceNumber | Number used to keep track of message sending order. The first message starts with number "1", and the number is then incremented for each message. |
media | An object containing media metadata and payload. |
media.track | One of the strings inbound or outbound. |
media.chunk | The chunk for the message. The first message will begin with number "1" and increment with each subsequent message. |
media.timestamp | Presentation timestamp in milliseconds from the start of the stream. |
media.payload | Raw audio encoded in base64. |
Example Media Messages
Outbound
{
  "event": "media",
  "sequenceNumber": "3",
  "media": {
    "track": "outbound",
    "chunk": "1",
    "payload": "iY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//jw=="
  }
}
Inbound
{
  "event": "media",
  "sequenceNumber": "4",
  "media": {
    "track": "inbound",
    "chunk": "1",
    "timestamp": "5",
    "payload": "/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JD/+PiY//DwkP/4+Jj/8PCQ//j4mP/w8JDw=="
  }
}
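To work with the audio, the payload has to be base64-decoded and, for the default "audio/x-mulaw" encoding, expanded from 8-bit mu-law to linear PCM. The sketch below assumes the default format from the Start message (mu-law, 8000 Hz, one channel, so one byte per sample) and uses the standard G.711 mu-law expansion; ulaw_to_pcm16 and decode_media_message are hypothetical helper names.

```python
import base64
import json


def ulaw_to_pcm16(byte):
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF          # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_message(raw, sample_rate=8000):
    """Decode a Media message, assuming single-channel mu-law audio."""
    message = json.loads(raw)
    if message.get("event") != "media":
        raise ValueError("not a media message")
    media = message["media"]
    audio = base64.b64decode(media["payload"])
    return {
        "track": media["track"],
        "chunk": int(media["chunk"]),
        "samples": [ulaw_to_pcm16(b) for b in audio],
        # One byte per sample, so duration follows from the byte count.
        "duration_ms": len(audio) * 1000 / sample_rate,
    }
```

If the Start message reports a different encoding, sample rate, or channel count, this decoding does not apply as written.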
Stop Message
A Stop message is sent either when the Stream is stopped or when the Call has ended.
Example Stop Message
{
  "event": "stop",
  "sequenceNumber": "5"
}
Property | Description |
---|---|
event | The string value of stop. |
sequenceNumber | Number used to keep track of message sending order. The first message starts with number "1", and the number is then incremented for each message. |
DTMF Message
A DTMF message will be sent when the Stream receives a DTMF tone.
Property | Description |
---|---|
event | The string value of dtmf. |
sequence_number | Number, as a string, used to keep track of message-sending order. The first message starts with "1" and then is incremented for each message. |
streamSid | The unique identifier of the stream as a string. |
dtmf | An object containing the details of the detected DTMF. |
dtmf.duration | The duration of the DTMF in milliseconds. |
dtmf.digit | The digit, as a string, that corresponds to the DTMF. |
Example DTMF Message
{
  "event": "dtmf",
  "sequence_number": "1",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390",
  "dtmf": {
    "duration": 2700,
    "digit": "8"
  }
}
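DTMF messages arrive one digit at a time, so an application that expects a multi-digit entry has to accumulate them. The sketch below collects digits until a terminator key is pressed; DtmfCollector and the choice of "#" as terminator are illustrative assumptions, not part of the protocol.

```python
import json


class DtmfCollector:
    """Accumulate DTMF digits until a terminator digit arrives."""

    def __init__(self, terminator="#"):
        self.terminator = terminator
        self.digits = []

    def feed(self, raw):
        """Return the collected digits when the terminator arrives, else None."""
        message = json.loads(raw)
        if message.get("event") != "dtmf":
            return None  # Other message types are not this collector's concern.
        digit = message["dtmf"]["digit"]
        if digit == self.terminator:
            collected = "".join(self.digits)
            self.digits.clear()
            return collected
        self.digits.append(digit)
        return None
```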
Clear Message
Send the clear event message if you would like to interrupt the audio that has been sent in previous media event messages. This will empty all buffered audio.
Property | Description |
---|---|
event | The string value of clear. |
streamSid | The unique identifier of the stream as a string. |
Example Clear Message
{
  "event": "clear",
  "streamSid": "c0c7d59b-df06-435e-afbc-9217ce318390"
}
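Unlike the messages above, the clear message is one your server sends. A minimal sketch of building it follows; build_clear_message is a hypothetical helper, and actually sending the string is left to whichever WebSocket library you use.

```python
import json


def build_clear_message(stream_sid):
    """Return the JSON text of a clear message for the given stream."""
    if not stream_sid:
        raise ValueError("stream_sid is required")
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```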
Notes on Usage
- The url does not support query string parameters. To pass custom key-value pairs to the WebSocket, make use of Custom Parameters instead.
- There is a one-to-one mapping between a stream and a WebSocket connection, so at most one call is streamed over a single WebSocket connection. The Start message carries the unique stream identifier (streamSid), so you can handle multiple inbound connections and manage the association between the StreamSid and the connection.
- On any given call there are inbound and outbound tracks: inbound represents the audio SignalWire receives from the call, and outbound represents the audio generated by SignalWire for the Call.
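The one-to-one mapping between streams and connections can be tracked with a small registry. The sketch below assumes your WebSocket server gives each connection some hashable identifier (connection_id here is a stand-in for that); StreamRegistry is a hypothetical class, not part of any SignalWire library.

```python
import json


class StreamRegistry:
    """Track which StreamSid belongs to which WebSocket connection."""

    def __init__(self):
        self._by_connection = {}
        self._by_stream = {}

    def handle(self, connection_id, raw):
        """Update the mapping from one message; return the connection's StreamSid."""
        message = json.loads(raw)
        event = message.get("event")
        if event == "start":
            stream_sid = message["start"]["streamSid"]
            self._by_connection[connection_id] = stream_sid
            self._by_stream[stream_sid] = connection_id
        elif event == "stop":
            # The stream is over, so forget both directions of the mapping.
            stream_sid = self._by_connection.pop(connection_id, None)
            self._by_stream.pop(stream_sid, None)
        return self._by_connection.get(connection_id)

    def connection_for(self, stream_sid):
        """Look up the connection that carries a given stream, if any."""
        return self._by_stream.get(stream_sid)
```

Keeping the lookup in both directions lets a statusCallback handler, which only knows the StreamSid, find the matching connection.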