Supported Voices and Languages
SignalWire's cloud platform integrates with leading third-party text-to-speech (TTS) providers. This guide describes supported engines, voices, and languages. Refer to each provider's documentation for up-to-date model details and service information.
Compare providers and models
SignalWire's TTS providers offer a wide range of voice engines optimized for various applications. Select a provider, model, and voice according to the following considerations:
Cost: When cost-efficiency is the top priority, select a Standard-tier voice from Google Cloud or Amazon Polly. Review our pricing information to learn more.
Language support: Amazon Polly, ElevenLabs, Google Cloud, and OpenAI offer a wide range of supported languages. In addition, all ElevenLabs and OpenAI voices are fully multilingual.
Model quality and realism: Every supported provider offers a high-quality engine. Google Cloud's WaveNet and Neural2, Amazon Polly's Neural, ElevenLabs' Multilingual v2, and Deepgram's Aura are all optimized for voice quality, and OpenAI's voices balance low latency with good quality.
SSML support: Google Cloud and Amazon Polly support SSML (Speech Synthesis Markup Language) as a string wrapped in <speak> tags. Consult Google Cloud's SSML docs for details. Refer to the Amazon Polly docs for more information on using SSML and supported SSML tags.
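As a sketch, an SSML string wrapped in <speak> tags might be passed to SWML's play method like this. The voice, pause length, and wording are illustrative; consult the SSML docs linked above for the tags each provider supports.

```yaml
version: 1.0.0
sections:
  main:
    - set:
        say_voice: "polly.Joanna-Neural"
    - play: "say:<speak>Hello. <break time='500ms'/> That pause was half a second.</speak>"
```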
Use voice identifier strings
Compose voice identifier strings using the <engine>.<voice id> format. First, select your engine using the gcloud, polly, elevenlabs, deepgram, or openai identifier. Append a period (.), followed by the specific voice ID from the TTS provider.

Voice identifier strings are case-insensitive. For example, gcloud.en-US-Neural2-A, gcloud.en-us-neural2-a, and GCLOUD.EN-US-NEURAL2-A are equivalent.
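The format above can be sketched as a small parser. This helper is illustrative only and not part of any SignalWire SDK; the engine list and case-insensitive handling follow the rules just described.

```javascript
// Illustrative sketch: split a "<engine>.<voice id>" string into its parts.
// Not a SignalWire API; the engine names come from the table below.
const ENGINES = new Set(["gcloud", "polly", "elevenlabs", "deepgram", "openai"]);

function parseVoiceId(identifier) {
  // Split on the first period only, since voice IDs may contain none.
  const dot = identifier.indexOf(".");
  if (dot === -1) {
    throw new Error(`Missing "." in voice identifier: ${identifier}`);
  }
  // Identifiers are case-insensitive, so the engine can be normalized freely.
  const engine = identifier.slice(0, dot).toLowerCase();
  const voice = identifier.slice(dot + 1);
  if (!ENGINES.has(engine)) {
    throw new Error(`Unknown engine: ${engine}`);
  }
  return { engine, voice };
}
```

For example, `parseVoiceId("GCLOUD.EN-US-NEURAL2-A")` yields the engine `gcloud` with the voice ID left as typed, which is fine because the platform treats the whole string as case-insensitive.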
For detailed instructions for each provider, consult the voice ID references linked in the Usage column of the table below.
TTS provider | Engine code | Sample voice ID string | Usage |
---|---|---|---|
Amazon Polly | polly | polly.Joanna-Neural | Reference |
Deepgram | deepgram | deepgram.aura-asteria-en | Reference |
ElevenLabs | elevenlabs | elevenlabs.thomas | Reference |
Google Cloud | gcloud | gcloud.en-US-Casual-K | Reference |
OpenAI | openai | openai.alloy | Reference |
TTS providers
Amazon Polly
Amazon Web Services' Polly TTS engine includes several models to accommodate different use cases. SignalWire supports the Standard, Neural, and Generative models:
- Standard is a traditional, cost-effective, and reliable TTS model. It is less natural-sounding but more budget-friendly than Polly Neural. Example voice identifier string: polly.Emma
- Neural is an advanced model designed to produce speech that is more natural and closer to human-like pronunciation and intonation. Example voice identifier string: polly.Emma-Neural
- Generative is Amazon's most human-like and emotionally engaged model. Example voice identifier string: polly.Amy-Generative
Set language for Amazon Polly voices
Most Amazon Polly voices support a single language. Select a language by choosing from the list of supported voices.

All Amazon Polly voices support accented bilingual pronunciation through the use of the SSML lang tag. Amazon Polly also offers some fully bilingual voices designed to fluently speak two languages.
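As an illustration of accented bilingual pronunciation, an SSML lang tag can mark a foreign phrase inside the spoken text. The voice, phrase, and placement inside a CXML <Say> verb below are examples, not a prescribed pattern:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="polly.Joanna-Neural">
    <speak>
      Her favorite phrase is
      <lang xml:lang="fr-FR">c'est la vie</lang>.
    </speak>
  </Say>
</Response>
```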
Amazon Polly voice IDs
Polly voices are identified by voice name only (like Amy, Matthew, Mia, or Zhiyu), except when the voice exists in multiple models. In that case, append a model code after a dash (-), like Neural or Generative, to specify the model. If no model code is specified, the Standard model is used.
Example string | Model used |
---|---|
polly.Amy | Standard |
polly.Amy-Neural | Neural |
polly.Amy-Generative | Generative |
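The naming rules above can be captured in a small helper. The function below is illustrative only, not part of any SignalWire SDK; it composes a Polly voice identifier string from a voice name and an optional model code.

```javascript
// Illustrative helper: build a "polly.<Name>[-<Model>]" identifier.
// Omitting the model code selects the Standard model, per the rules above.
function pollyVoiceId(name, model) {
  if (!model || model.toLowerCase() === "standard") {
    return `polly.${name}`;
  }
  // Normalize the model code to the capitalized form used in the table.
  const code = model[0].toUpperCase() + model.slice(1).toLowerCase();
  return `polly.${name}-${code}`;
}
```

For instance, `pollyVoiceId("Amy", "neural")` produces `polly.Amy-Neural`, while `pollyVoiceId("Amy")` produces `polly.Amy` and selects the Standard model.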
Use Amazon Polly voices on the SignalWire platform
- SWML
- RELAY Realtime SDK
- Call Flow Builder
- CXML
Use the languages SWML method to set one or more voices for an AI agent.
version: 1.0.0
sections:
main:
- ai:
prompt:
text: Have an open-ended conversation about flowers.
languages:
- name: English
code: en-US
voice: polly.Ruth-Neural
Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.
version: 1.0.0
sections:
main:
- set:
say_voice: "polly.Ruth-Neural"
- play: "say:Greetings. This is the Ruth voice from Amazon Polly's Neural text-to-speech model."
// This example uses the Node.js SDK for SignalWire's RELAY Realtime API.
const playback = await call.playTTS({
text: "Greetings. This is the Ruth voice from Amazon Polly's Neural text-to-speech model.",
voice: "polly.Ruth-Neural",
});
await playback.ended();
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="polly.Ruth-Neural">
Greetings. This is the Ruth voice from Amazon Polly's Neural text-to-speech model.
</Say>
</Response>
Deepgram
Deepgram offers a range of English-speaking voices for its text-to-speech API, each designed to produce natural-sounding speech output in an array of different accents and speaking styles.
Deepgram states that its voices deliver human-like tone, rhythm, and emotion with latency under 250 ms, and are optimized for high-throughput applications.
Consult Deepgram's TTS models guide for more information and samples for supported voices.
Deepgram voice IDs
Copy the voice ID from the Values column of Deepgram's Voice Selection reference. Prepend deepgram. and the string is ready for use. For example: deepgram.aura-athena-en
Use Deepgram voices on the SignalWire platform
- SWML
- RELAY Realtime SDK
- Call Flow Builder
- CXML
Use the languages SWML method to set one or more voices for an AI agent.
version: 1.0.0
sections:
main:
- ai:
prompt:
text: Have an open-ended conversation about flowers.
languages:
- name: English
code: en-US
voice: deepgram.aura-asteria-en
Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.
version: 1.0.0
sections:
main:
- set:
say_voice: "deepgram.aura-asteria-en"
- play: "say:Greetings. This is the Asteria voice from Deepgram's Aura text-to-speech model."
// This example uses the Node.js SDK for SignalWire's RELAY Realtime API.
const playback = await call.playTTS({
text: "Greetings. This is the Asteria voice from Deepgram's Aura text-to-speech model.",
voice: "deepgram.aura-asteria-en",
});
await playback.ended();
Deepgram voices are not yet supported in Call Flow Builder.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="deepgram.aura-asteria-en">
Greetings. This is the Asteria voice from Deepgram's Aura text-to-speech model.
</Say>
</Response>
ElevenLabs
ElevenLabs voices offer expressive, human-like pronunciation and an extensive list of supported languages.
SignalWire supports the following voices in the Multilingual v2 model:
Voices | Languages |
---|---|
rachel , clyde , domi , dave , fin , antoni , thomas , charlie , emily , elli , callum , patrick , harry , liam , dorothy , josh , arnold , charlotte , matilda , matthew , james , joseph , jeremy , michael , ethan , gigi , freya , grace , daniel , serena , adam , nicole , jessie , ryan , sam , glinda , giovanni , mimi | 🇺🇸 English (USA), 🇬🇧 English (UK), 🇦🇺 English (Australia), 🇨🇦 English (Canada), 🇯🇵 Japanese, 🇨🇳 Chinese, 🇩🇪 German, 🇮🇳 Hindi, 🇫🇷 French (France), 🇨🇦 French (Canada), 🇰🇷 Korean, 🇧🇷 Portuguese (Brazil), 🇵🇹 Portuguese (Portugal), 🇮🇹 Italian, 🇪🇸 Spanish (Spain), 🇲🇽 Spanish (Mexico), 🇮🇩 Indonesian, 🇳🇱 Dutch, 🇹🇷 Turkish, 🇵🇭 Filipino, 🇵🇱 Polish, 🇸🇪 Swedish, 🇧🇬 Bulgarian, 🇷🇴 Romanian, 🇸🇦 Arabic (Saudi Arabia), 🇦🇪 Arabic (UAE), 🇨🇿 Czech, 🇬🇷 Greek, 🇫🇮 Finnish, 🇭🇷 Croatian, 🇲🇾 Malay, 🇸🇰 Slovak, 🇩🇰 Danish, 🇮🇳 Tamil, 🇺🇦 Ukrainian, 🇷🇺 Russian |
Language selection with ElevenLabs voices
Multilingual v2 voices are designed to be interchangeably compatible with all supported languages. Rather than enforcing language selection with a language code, this TTS model automatically uses the appropriate language for the input text. Consult ElevenLabs' supported languages resource for an up-to-date list of supported languages.
ElevenLabs voice IDs
Copy the voice ID from our list of supported ElevenLabs voices.
Prepend elevenlabs.
and the string is ready for use.
For example: elevenlabs.sam
Use ElevenLabs voices on the SignalWire platform
- SWML
- RELAY Realtime SDK
- Call Flow Builder
- CXML
Use the languages SWML method to set one or more voices for an AI agent.
version: 1.0.0
sections:
main:
- ai:
prompt:
text: Have an open-ended conversation about flowers.
languages:
- name: English
code: en-US
voice: elevenlabs.rachel
Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.
version: 1.0.0
sections:
main:
- set:
say_voice: "elevenlabs.rachel"
- play: "say:Greetings. This is the Rachel voice, speaking in English, from ElevenLabs' Multilingual v2 text-to-speech model."
// This example uses the Node.js SDK for SignalWire's RELAY Realtime API.
const playback = await call.playTTS({
text: "Greetings. This is the Rachel voice, speaking in English, from ElevenLabs' Multilingual v2 text-to-speech model.",
voice: "elevenlabs.rachel",
});
await playback.ended();
ElevenLabs voices are not yet supported in Call Flow Builder.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="elevenlabs.rachel">
Greetings. This is the Rachel voice, speaking in English, from ElevenLabs' Multilingual v2 text-to-speech model.
</Say>
</Response>
Google Cloud
Google Cloud offers a number of robust text-to-speech voice models. SignalWire supports all Google Cloud voices in both General Availability and Preview launch stages, except for the Studio model.
- Standard is a basic, reliable, and budget-friendly text-to-speech model. The Standard model is less natural-sounding than WaveNet and Neural2, but more cost-effective.
- WaveNet is powered by deep learning technology and offers more natural and lifelike speech output.
- Neural2 is based on the same technology used to create Custom Voices and prioritizes natural and human-like pronunciation and intonation.
- Polyglot voices have variants in multiple languages. For example, at the time of writing, the polyglot-1 voice has variants for English (Australia), English (US), French, German, Spanish (Spain), and Spanish (US).
Set language for Google Cloud voices
Sample all available voices with Google's supported voices and languages reference. Copy the voice identifier string in whole from the Voice name column.
Unlike the other supported engines, Google Cloud voice identifier strings include both voice and language keys, following the pattern <language>-<model>-<variant>. For example:
- English (UK) WaveNet female voice: en-GB-Wavenet-A
- Spanish (Spain) Neural2 male voice: es-ES-Neural2-B
- Mandarin Chinese Standard female voice: cmn-CN-Standard-D
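As an illustration, the pattern above can be split back into its parts. The function name and regex below are assumptions for demonstration, not part of any SignalWire SDK; the regex simply follows the documented <language>-<model>-<variant> layout.

```javascript
// Illustrative sketch: split a Google Cloud voice name into its
// <language>-<model>-<variant> parts, per the pattern described above.
function parseGcloudVoice(voiceName) {
  // Language codes can themselves contain a dash (e.g. "en-GB"),
  // so anchor the model and single-letter variant on the right.
  const match = voiceName.match(/^(.+)-([A-Za-z0-9]+)-([A-Z])$/);
  if (!match) {
    throw new Error(`Unexpected voice name: ${voiceName}`);
  }
  const [, language, model, variant] = match;
  return { language, model, variant };
}
```

For example, `parseGcloudVoice("en-GB-Wavenet-A")` separates the language `en-GB` from the model `Wavenet` and variant `A`.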
Google Cloud voice IDs
Copy the voice ID in whole from the Voice name column of Google's table of supported voices. Google Cloud voice IDs encode language and model information, so no modification is needed to make these selections. Prepend gcloud. and the string is ready for use. For example: gcloud.en-GB-Wavenet-A
Use Google Cloud voices on the SignalWire platform
- SWML
- RELAY Realtime SDK
- Call Flow Builder
- CXML
Use the languages SWML method to set one or more voices for an AI agent.
version: 1.0.0
sections:
main:
- ai:
prompt:
text: Have an open-ended conversation about flowers.
languages:
- name: English
code: en-US
voice: gcloud.en-US-Neural2-A
Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.
version: 1.0.0
sections:
main:
- set:
say_voice: "gcloud.en-US-Neural2-A"
- play: "say:Greetings. This is the 2-A US English voice from Google Cloud's Neural2 text-to-speech model."
// This example uses the Node.js SDK for SignalWire's RELAY Realtime API.
const playback = await call.playTTS({
text: "Greetings. This is the 2-A US English voice from Google Cloud's Neural2 text-to-speech model.",
voice: "gcloud.en-US-Neural2-A",
});
await playback.ended();
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="gcloud.en-US-Neural2-A">
Greetings. This is the 2-A US English voice from Google Cloud's Neural2 text-to-speech model.
</Say>
</Response>
OpenAI
OpenAI offers versatile multilingual voices that balance low latency with good quality. While the voices are optimized for English, they perform well across all supported languages.
Consult OpenAI's Text-to-Speech documentation for more information and audio samples for available voices.
OpenAI voice IDs
Copy the voice ID from OpenAI's Voice Options reference. Prepend openai. and the string is ready for use. For example: openai.alloy
Use OpenAI voices on the SignalWire platform
- SWML
- RELAY Realtime SDK
- Call Flow Builder
- CXML
Use the languages SWML method to set one or more voices for an AI agent.
version: 1.0.0
sections:
main:
- ai:
prompt:
text: Have an open-ended conversation about flowers.
languages:
- name: English
code: en-US
voice: openai.alloy
Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.
version: 1.0.0
sections:
main:
- set:
say_voice: "openai.alloy"
- play: "say:Greetings. This is the Alloy voice from OpenAI's text-to-speech model."
// This example uses the Node.js SDK for SignalWire's RELAY Realtime API.
const playback = await call.playTTS({
text: "Greetings. This is the Alloy voice from OpenAI's text-to-speech model.",
voice: "openai.alloy",
});
await playback.ended();
OpenAI voices are not yet supported in Call Flow Builder.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="openai.alloy">
Greetings. This is the Alloy voice from OpenAI's text-to-speech model.
</Say>
</Response>
Pricing
Voices are priced according to model in three tiers. Consult our Voice API Pricing for up-to-date pricing information.
Standard
- Google Cloud Standard
- Amazon Polly Standard
Premium
- Google Cloud Neural2, WaveNet, and Journey
- Amazon Polly Neural and Generative
- Deepgram Aura
ElevenLabs voices have their own tier.