Supported Voices and Languages

SignalWire's cloud platform integrates with leading third-party text-to-speech (TTS) providers. This guide describes supported engines, voices, and languages. Refer to each provider's documentation for up-to-date model details and service information.

Compare providers and models

SignalWire's TTS providers offer a wide range of voice engines optimized for various applications. Select a provider, model, and voice according to the following considerations:

Cost: When cost-efficiency is the top priority, select a Standard-tier voice from Google Cloud or Amazon Polly. Review our pricing information to learn more.

Language support: Amazon Polly, ElevenLabs, Google Cloud, and OpenAI offer a wide range of supported languages. In addition, all ElevenLabs and OpenAI voices are fully multilingual.

Model quality and realism: Every supported provider offers high-quality engines. Google Cloud's WaveNet and Neural2, Amazon Polly Neural and Generative, ElevenLabs' Multilingual v2, and Deepgram's Aura are all optimized for voice quality, while OpenAI's voices balance quality with low latency.

SSML support: Google Cloud and Amazon Polly support SSML (Speech Synthesis Markup Language) as a string wrapped in <speak> tags. Consult Google Cloud's SSML docs for details. Refer to the Amazon Polly docs for more information on using SSML and supported SSML tags.
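As a minimal sketch of what this can look like (assuming the engine receives the say text as SSML when it is wrapped in <speak> tags, as described above, and using the <break> tag documented by both engines; the say_voice pattern is explained later on this page):

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "polly.Joanna-Neural"
    - play: "say:<speak>Thanks for calling.<break time='500ms'/>How can I help you today?</speak>"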

Use voice identifier strings

Compose voice identifier strings using the <engine>.<voice id> format.

First, select your engine using the gcloud, polly, elevenlabs, deepgram, or openai identifier. Append a period (.), followed by the specific voice ID from the TTS provider.

Case insensitivity

Voice identifier strings are case insensitive. For example, gcloud.en-US-Neural2-A, gcloud.en-us-neural2-a, and GCLOUD.EN-US-NEURAL2-A are equivalent.

For detailed instructions for each provider, consult the voice ID references linked in the Usage column of the table below.

TTS provider  | Engine code | Sample voice ID string    | Usage
Amazon Polly  | polly       | polly.Joanna-Neural       | Reference
Deepgram      | deepgram    | deepgram.aura-asteria-en  | Reference
ElevenLabs    | elevenlabs  | elevenlabs.thomas         | Reference
Google Cloud  | gcloud      | gcloud.en-US-Casual-K     | Reference
OpenAI        | openai      | openai.alloy              | Reference


TTS providers

Amazon Polly

Amazon Web Services' Polly TTS engine includes several models to accommodate different use cases. SignalWire supports the Standard, Neural, and Generative models:

  • Standard is a traditional, cost-effective, and reliable TTS model. It is less natural-sounding but more budget-friendly than Polly Neural. Example voice identifier string: polly.Emma
  • Neural is an advanced model designed to produce speech that is more natural and closer to human-like pronunciation and intonation. Example voice identifier string: polly.Emma-Neural
  • Generative is Amazon Polly's most human-like model, designed to produce expressive, conversational speech. Example voice identifier string: polly.Amy-Generative

Set language for Amazon Polly voices

Most Amazon Polly voices support a single language. Select a language by choosing a voice from the list of supported voices.

All Amazon Polly voices support accented bilingual pronunciation through the use of the SSML lang tag.
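For example, a minimal sketch (assuming, per the SSML support note earlier on this page, that an SSML string wrapped in <speak> tags can be passed as the say text) that uses the lang tag to have a US English voice pronounce a French phrase:

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "polly.Joanna-Neural"
    - play: "say:<speak>The phrase <lang xml:lang='fr-FR'>bonne journée</lang> means have a nice day.</speak>"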

Amazon Polly also offers some fully bilingual voices designed to fluently speak two languages.

Amazon Polly voice IDs

Polly voices are identified by voice name only (such as Amy, Matthew, Mia, or Zhiyu), except when the voice exists in multiple models. In that case, append a dash and a model code, such as -Neural or -Generative, to specify the model variant. If no model code is specified, the Standard model is used.

Example string        | Model used
polly.Amy             | Standard
polly.Amy-Neural      | Neural
polly.Amy-Generative  | Generative
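
As a quick illustration of this naming rule, the following sketch (reusing the say_voice pattern from the examples below) plays the same line with the Standard and Neural variants of the Amy voice:

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "polly.Amy"
    - play: "say:This sentence uses the Standard model."
    - set:
        say_voice: "polly.Amy-Neural"
    - play: "say:This sentence uses the Neural model."
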
Use Amazon Polly voices on the SignalWire platform

Use the languages SWML method to set one or more voices for an AI agent.

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-US
            voice: polly.Ruth-Neural

Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "polly.Ruth-Neural"
    - play: "say:Greetings. This is the Ruth voice from Amazon Polly's Neural text-to-speech model."

Deepgram

Deepgram offers a range of English-speaking voices for its text-to-speech API, each designed to produce natural-sounding speech output in an array of different accents and speaking styles.

Deepgram states that its voices deliver human-like tone, rhythm, and emotion with latency under 250 ms, and that they are optimized for high-throughput applications.

Consult Deepgram's TTS models guide for more information and samples for supported voices.

Deepgram voice IDs

Copy the voice ID from the Values column of Deepgram's Voice Selection reference. Prepend deepgram. and the string is ready for use. For example: deepgram.aura-athena-en

Use Deepgram voices on the SignalWire platform

Use the languages SWML method to set one or more voices for an AI agent.

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-US
            voice: deepgram.aura-asteria-en

Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "deepgram.aura-asteria-en"
    - play: "say:Greetings. This is the Asteria voice from Deepgram's Aura text-to-speech model."

ElevenLabs

ElevenLabs voices offer expressive, human-like pronunciation and an extensive list of supported languages. SignalWire supports the following voices in the Multilingual v2 model:

Voices: rachel, clyde, domi, dave, fin, antoni, thomas, charlie, emily, elli, callum, patrick, harry, liam, dorothy, josh, arnold, charlotte, matilda, matthew, james, joseph, jeremy, michael, ethan, gigi, freya, grace, daniel, serena, adam, nicole, jessie, ryan, sam, glinda, giovanni, mimi

Languages: 🇺🇸 English (USA), 🇬🇧 English (UK), 🇦🇺 English (Australia), 🇨🇦 English (Canada), 🇯🇵 Japanese, 🇨🇳 Chinese, 🇩🇪 German, 🇮🇳 Hindi, 🇫🇷 French (France), 🇨🇦 French (Canada), 🇰🇷 Korean, 🇧🇷 Portuguese (Brazil), 🇵🇹 Portuguese (Portugal), 🇮🇹 Italian, 🇪🇸 Spanish (Spain), 🇲🇽 Spanish (Mexico), 🇮🇩 Indonesian, 🇳🇱 Dutch, 🇹🇷 Turkish, 🇵🇭 Filipino, 🇵🇱 Polish, 🇸🇪 Swedish, 🇧🇬 Bulgarian, 🇷🇴 Romanian, 🇸🇦 Arabic (Saudi Arabia), 🇦🇪 Arabic (UAE), 🇨🇿 Czech, 🇬🇷 Greek, 🇫🇮 Finnish, 🇭🇷 Croatian, 🇲🇾 Malay, 🇸🇰 Slovak, 🇩🇰 Danish, 🇮🇳 Tamil, 🇺🇦 Ukrainian, 🇷🇺 Russian

Language selection with ElevenLabs voices

Multilingual v2 voices are designed to work interchangeably across all supported languages. Rather than requiring language selection with a language code, the model automatically uses the language of the input text.

Consult ElevenLabs' supported languages resource for an up-to-date list of supported languages.
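Because voices are not tied to a single language, the same voice can back multiple languages for an AI agent. A minimal sketch (the language names and codes here are illustrative):

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-US
            voice: elevenlabs.rachel
          - name: French
            code: fr-FR
            voice: elevenlabs.rachel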

ElevenLabs voice IDs

Copy the voice ID from our list of supported ElevenLabs voices. Prepend elevenlabs. and the string is ready for use. For example: elevenlabs.sam

Use ElevenLabs voices on the SignalWire platform

Use the languages SWML method to set one or more voices for an AI agent.

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-US
            voice: elevenlabs.rachel

Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "elevenlabs.rachel"
    - play: "say:Greetings. This is the Rachel voice, speaking in English, from ElevenLabs' Multilingual v2 text-to-speech model."

Google Cloud

Google Cloud offers a number of robust text-to-speech voice models. SignalWire supports all Google Cloud voices in both General Availability and Preview launch stages, except for the Studio model.

  • Standard is a basic, reliable, and budget-friendly text-to-speech model. The Standard model is less natural-sounding than WaveNet and Neural2, but more cost-effective.
  • WaveNet is powered by deep learning technology and offers more natural and lifelike speech output.
  • Neural2 is based on the same technology used to create Custom Voices and prioritizes natural and human-like pronunciation and intonation.
  • Polyglot voices have variants in multiple languages. For example, at the time of writing, the polyglot-1 voice has variants for English (Australia), English (US), French, German, Spanish (Spain), and Spanish (US).

Set language for Google Cloud voices

Sample all available voices with Google's supported voices and languages reference. Copy the voice identifier string in whole from the Voice name column.

Unlike the other supported engines, Google Cloud voice identifier strings include both voice and language keys, following the pattern <language>-<model>-<variant>. For example:

  • English (UK) WaveNet female voice: en-GB-Wavenet-A
  • Spanish (Spain) Neural2 male voice: es-ES-Neural2-B
  • Mandarin Chinese Standard female voice: cmn-CN-Standard-D
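
For example, a minimal sketch (reusing two of the voice IDs above; the language names and codes are illustrative) that gives an AI agent both an English and a Spanish Google Cloud voice:

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-GB
            voice: gcloud.en-GB-Wavenet-A
          - name: Spanish
            code: es-ES
            voice: gcloud.es-ES-Neural2-B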

Google Cloud voice IDs

Copy the voice ID in whole from the Voice name column of Google's table of supported voices. Google Cloud voice IDs encode language and model information, so no modification is needed to make these selections. Prepend gcloud. and the string is ready for use. For example: gcloud.en-GB-Wavenet-A

Use Google Cloud voices on the SignalWire platform

Use the languages SWML method to set one or more voices for an AI agent.

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-US
            voice: gcloud.en-US-Neural2-A

Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "gcloud.en-US-Neural2-A"
    - play: "say:Greetings. This is the 2-A US English voice from Google Cloud's Neural2 text-to-speech model."

OpenAI

OpenAI offers versatile multilingual voices balancing low latency and good quality. While voices are optimized for English, they perform well across all supported languages.

Consult OpenAI's Text-to-Speech documentation for more information and audio samples for available voices.

OpenAI voice IDs

Copy the voice ID from OpenAI's Voice Options reference.

Prepend openai. and the string is ready for use. For example: openai.alloy

Use OpenAI voices on the SignalWire platform

Use the languages SWML method to set one or more voices for an AI agent.

version: 1.0.0
sections:
  main:
    - ai:
        prompt:
          text: Have an open-ended conversation about flowers.
        languages:
          - name: English
            code: en-US
            voice: openai.alloy

Alternatively, use the say_voice parameter of the play SWML method to select a voice for basic TTS.

version: 1.0.0
sections:
  main:
    - set:
        say_voice: "openai.alloy"
    - play: "say:Greetings. This is the Alloy voice from OpenAI's text-to-speech model."

Pricing

Voices are priced according to model in three tiers. Consult our Voice API Pricing for up-to-date pricing information.

Standard

  • Google Cloud Standard
  • Amazon Polly Standard

Premium

  • Google Cloud Neural2, WaveNet, and Journey
  • Amazon Polly Neural and Generative
  • Deepgram Aura

ElevenLabs voices have their own tier.