Voice & Language

Summary: Configure Text-to-Speech voices, languages, and pronunciation to create natural-sounding agents.

Voice Configuration Overview

Language Configuration

Parameter	Description	Example
`name`	Human-readable name	`"English"`
`code`	Language code for STT	`"en-US"`
`voice`	TTS voice identifier	`"rime.spore"` or `"elevenlabs.josh:eleven_turbo_v2_5"`

Fillers (Natural Speech)

Parameter	Description	Example
`speech_fillers`	Used during natural conversation pauses	`["Um", "Well", "So"]`
`function_fillers`	Used while executing a function	`["Let me check...", "One moment..."]`

Adding a Language

Basic Configuration

from signalwire_agents import AgentBase


class MyAgent(AgentBase):
    def __init__(self):
        super().__init__(name="my-agent")

        # Basic language setup
        self.add_language(
            name="English",       # Display name
            code="en-US",         # Language code for STT
            voice="rime.spore"    # TTS voice
        )

Voice Format

The `voice` parameter uses the format `engine.voice:model`, where the `:model` suffix is optional:

# Simple voice (engine.voice)
self.add_language("English", "en-US", "rime.spore")

# With model (engine.voice:model)
self.add_language("English", "en-US", "elevenlabs.josh:eleven_turbo_v2_5")
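To make the format concrete, the sketch below splits a voice string into its engine, voice, and optional model. It is an illustration only: `parse_voice` and `VoiceSpec` are hypothetical names, not part of the SDK, and the sketch assumes the engine code never contains a dot and the voice name never contains a colon.

```python
from typing import NamedTuple, Optional


class VoiceSpec(NamedTuple):
    engine: str
    voice: str
    model: Optional[str]


def parse_voice(spec: str) -> VoiceSpec:
    """Split 'engine.voice[:model]' into its parts (hypothetical helper)."""
    # The engine is everything before the first dot.
    engine, dot, rest = spec.partition(".")
    # The model, if present, follows the first colon after the engine.
    voice, _, model = rest.partition(":")
    if not dot or not engine or not voice:
        raise ValueError(f"expected 'engine.voice[:model]', got {spec!r}")
    return VoiceSpec(engine, voice, model or None)


simple = parse_voice("rime.spore")
with_model = parse_voice("elevenlabs.josh:eleven_turbo_v2_5")
```

A check like this can catch a missing engine prefix in a configuration file before the string is passed to `add_language`.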

Available TTS Engines

Provider	Engine Code	Example Voice	Reference
Amazon Polly	`amazon`	`amazon.Joanna-Neural`	Voice IDs
Cartesia	`cartesia`	`cartesia.a167e0f3-df7e-4d52-a9c3-f949145efdab`	Voice IDs
Deepgram	`deepgram`	`deepgram.aura-asteria-en`	Voice IDs
ElevenLabs	`elevenlabs`	`elevenlabs.thomas`	Voice IDs
Google Cloud	`gcloud`	`gcloud.en-US-Casual-K`	Voice IDs
Microsoft Azure	`azure`	`azure.en-US-AvaNeural`	Voice IDs
OpenAI	`openai`	`openai.alloy`	Voice IDs
Rime	`rime`	`rime.luna:arcana`	Voice IDs

Filler Phrases

Add natural pauses and filler words:

self.add_language(
    name="English",
    code="en-US",
    voice="rime.spore",
    speech_fillers=[
        "Um",
        "Well",
        "Let me think",
        "So"
    ],
    function_fillers=[
        "Let me check that for you",
        "One moment please",
        "I'm looking that up now",
        "Bear with me"
    ]
)

Speech fillers: Used during natural conversation pauses

Function fillers: Used while the AI is executing a function

Multi-Language Support

Use code="multi" for automatic language detection and matching:

class MultilingualAgent(AgentBase):
    def __init__(self):
        super().__init__(name="multilingual-agent")

        # Multi-language support (auto-detects and matches caller's language)
        self.add_language(
            name="Multilingual",
            code="multi",
            voice="rime.spore"
        )

        self.prompt_add_section(
            "Language",
            "Automatically detect and match the caller's language without "
            "prompting or asking them to verify. Respond naturally in whatever "
            "language they speak."
        )

The multi code supports: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.

Note: Speech recognition hints do not work when using code="multi". If you need hints for specific terms, use individual language codes instead.
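The trade-off in the note above can be written as a small decision helper. This is a sketch, not SDK behavior: `pick_language_code` is a hypothetical function, and `MULTI_SUPPORTED` simply restates the ten languages listed above using their base codes from the Language Codes Reference.

```python
# Base codes for the languages the multi code is documented to support.
MULTI_SUPPORTED = {"en", "es", "fr", "de", "hi", "ru", "pt", "ja", "it", "nl"}


def pick_language_code(specific_code: str, needs_hints: bool) -> str:
    """Return 'multi' only when it covers the language and hints are not needed."""
    base = specific_code.split("-")[0].lower()
    if needs_hints or base not in MULTI_SUPPORTED:
        # Hints do not work with "multi", and unsupported languages
        # need their own code anyway.
        return specific_code
    return "multi"
```

For example, `pick_language_code("en-US", needs_hints=True)` keeps `"en-US"` so that speech recognition hints remain usable.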

For more control, configure each language individually with its own voice and fillers:

class CustomMultilingualAgent(AgentBase):
    def __init__(self):
        super().__init__(name="custom-multilingual")

        # English (primary)
        self.add_language(
            name="English",
            code="en-US",
            voice="rime.spore",
            speech_fillers=["Um", "Well", "So"],
            function_fillers=["Let me check that"]
        )

        # Spanish
        self.add_language(
            name="Spanish",
            code="es-MX",
            voice="rime.luna",
            speech_fillers=["Eh", "Pues", "Bueno"],
            function_fillers=["Déjame verificar", "Un momento"]
        )

        # French
        self.add_language(
            name="French",
            code="fr-FR",
            voice="rime.claire",
            speech_fillers=["Euh", "Alors", "Bon"],
            function_fillers=["Laissez-moi vérifier", "Un instant"]
        )

        self.prompt_add_section(
            "Language",
            "Automatically detect and match the caller's language without "
            "prompting or asking them to verify."
        )
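When the list of languages grows, the repeated calls can be driven from data instead. The sketch below assumes `add_language` accepts the keyword arguments used in the example above. `StubAgent` is a hypothetical stand-in that only records calls so the snippet runs without the SDK; in a real agent you would pass `self` instead.

```python
LANGUAGES = [
    {
        "name": "English",
        "code": "en-US",
        "voice": "rime.spore",
        "speech_fillers": ["Um", "Well", "So"],
        "function_fillers": ["Let me check that"],
    },
    {
        "name": "Spanish",
        "code": "es-MX",
        "voice": "rime.luna",
        "speech_fillers": ["Eh", "Pues", "Bueno"],
        "function_fillers": ["Un momento"],
    },
]


class StubAgent:
    """Hypothetical stand-in for AgentBase; it only records add_language calls."""

    def __init__(self):
        self.languages = []

    def add_language(self, name, code, voice,
                     speech_fillers=None, function_fillers=None):
        self.languages.append({
            "name": name,
            "code": code,
            "voice": voice,
            "speech_fillers": speech_fillers or [],
            "function_fillers": function_fillers or [],
        })


def register_languages(agent, languages):
    """Call agent.add_language once per configuration entry."""
    for config in languages:
        agent.add_language(**config)
    return agent


agent = register_languages(StubAgent(), LANGUAGES)
```

Keeping the language table in one place makes it easier to review fillers with native speakers and to add a language without touching agent logic.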

Pronunciation Rules

Fix pronunciation of specific words:

class AgentWithPronunciation(AgentBase):
    def __init__(self):
        super().__init__(name="pronunciation-agent")
        self.add_language("English", "en-US", "rime.spore")

        # Fix brand names
        self.add_pronunciation(
            replace="ACME",
            with_text="Ack-me"
        )

        # Fix technical terms
        self.add_pronunciation(
            replace="SQL",
            with_text="sequel"
        )

        # Case-insensitive matching
        self.add_pronunciation(
            replace="api",
            with_text="A P I",
            ignore_case=True
        )

        # Fix names
        self.add_pronunciation(
            replace="Nguyen",
            with_text="win"
        )

Set Multiple Pronunciations

# Set all pronunciations at once
self.set_pronunciations([
    {"replace": "ACME", "with": "Ack-me"},
    {"replace": "SQL", "with": "sequel"},
    {"replace": "API", "with": "A P I", "ignore_case": True},
    {"replace": "CEO", "with": "C E O"},
    {"replace": "ASAP", "with": "A sap"}
])
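Before deploying a rule set, it can help to preview locally what text the rules would produce. The sketch below is an approximation only: `preview_pronunciations` is a hypothetical helper, and it assumes plain substring replacement applied in list order, which may differ from how the platform actually matches.

```python
import re

RULES = [
    {"replace": "ACME", "with": "Ack-me"},
    {"replace": "SQL", "with": "sequel"},
    {"replace": "API", "with": "A P I", "ignore_case": True},
]


def preview_pronunciations(text, rules):
    """Apply each rule in order and return the rewritten text."""
    for rule in rules:
        flags = re.IGNORECASE if rule.get("ignore_case") else 0
        # Escape the search term so characters such as "." match literally.
        pattern = re.escape(rule["replace"])
        replacement = rule["with"]
        # A function replacement avoids backslash processing in the new text.
        text = re.sub(pattern, lambda _m, r=replacement: r, text, flags=flags)
    return text


print(preview_pronunciations("ACME exposes a SQL api", RULES))
# prints: Ack-me exposes a sequel A P I
```

Because this preview matches substrings, a case-insensitive rule for `API` would also rewrite part of a longer word that contains those letters. Running sample sentences through a preview like this is a cheap way to spot such over-matching in your own rules.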

Voice Selection Guide

Choosing the right TTS engine and voice significantly impacts caller experience. Consider these factors:

Use Case Recommendations

Use Case	Recommended Voice Style
Customer Service	Warm, friendly (`rime.spore`)
Technical Support	Clear, professional (`rime.marsh`)
Sales	Energetic, persuasive (ElevenLabs voices)
Healthcare	Calm, reassuring
Legal/Finance	Formal, authoritative

TTS Engine Comparison

Engine	Latency	Quality	Cost	Best For
Rime	Very fast	Good	Low	Production, low-latency needs
ElevenLabs	Medium	Excellent	Higher	Premium experiences, emotion
Google Cloud	Medium	Very good	Medium	Multilingual, SSML features
Amazon Polly	Fast	Good	Low	AWS integration, Neural voices
OpenAI	Medium	Excellent	Medium	Natural conversation style
Azure	Medium	Very good	Medium	Microsoft ecosystem
Deepgram	Fast	Good	Medium	Speech-focused applications
Cartesia	Fast	Good	Medium	Specialized voices

Choosing an Engine

Prioritize latency (Rime, Polly, Deepgram):

- Interactive conversations where quick response matters
- High-volume production systems
- Cost-sensitive deployments

Prioritize quality (ElevenLabs, OpenAI):

- Premium customer experiences
- Brand-sensitive applications
- When voice quality directly impacts business outcomes

Prioritize features (Google Cloud, Azure):

- Need SSML for fine-grained control
- Complex multilingual requirements
- Specific enterprise integrations
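The groupings above can be captured as a lookup that produces a shortlist of engine codes to test. This is a sketch: `shortlist_engines` is a hypothetical helper, and the mapping only restates the recommendations on this page using the engine codes from the Available TTS Engines table.

```python
ENGINES_BY_PRIORITY = {
    "latency": ["rime", "amazon", "deepgram"],
    "quality": ["elevenlabs", "openai"],
    "features": ["gcloud", "azure"],
}


def shortlist_engines(priorities):
    """Return engine codes for the given priorities, in order, without duplicates."""
    shortlist = []
    for priority in priorities:
        if priority not in ENGINES_BY_PRIORITY:
            raise ValueError(f"unknown priority: {priority!r}")
        for engine in ENGINES_BY_PRIORITY[priority]:
            if engine not in shortlist:
                shortlist.append(engine)
    return shortlist


candidates = shortlist_engines(["latency", "quality"])
```

The result is a starting point for evaluation, not a ranking. Listen to each candidate with your own content before committing to one.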

Testing and Evaluation Process

Before selecting a voice for production:

- Create test content with domain-specific terms, company names, and typical phrases
- Test multiple candidates from your shortlisted engines
- Evaluate each voice:
  - Pronunciation accuracy (especially brand names)
  - Natural pacing and rhythm
  - Emotional appropriateness
  - Handling of numbers, dates, prices
- Test with real users if possible, such as internal team members or beta callers
- Measure latency in your deployment environment
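For the latency step, a small timing harness is often enough to compare candidates. The sketch below is generic. `synthesize` stands for whatever callable triggers speech generation in your setup, and `fake_synthesize` is a hypothetical placeholder that just sleeps; neither is an SDK function.

```python
import statistics
import time


def measure_latency(synthesize, phrases, runs=3):
    """Time synthesize(phrase) for every phrase and summarize the samples."""
    samples = []
    for phrase in phrases:
        for _ in range(runs):
            start = time.perf_counter()
            synthesize(phrase)
            samples.append(time.perf_counter() - start)
    return {
        "calls": len(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def fake_synthesize(text):
    """Placeholder that simulates a short synthesis delay."""
    time.sleep(0.005)


result = measure_latency(
    fake_synthesize,
    ["Your order total is $42.50", "One moment please"],
    runs=2,
)
```

Run the harness from the same network location as your deployment, since round-trip time to the TTS provider is part of what callers experience.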

Voice Personality Considerations

Match voice to brand:

- Formal brands → authoritative, measured voices
- Friendly brands → warm, conversational voices
- Tech brands → clear, modern-sounding voices

Consider your audience:

- Older demographics may prefer clearer, slower voices
- Technical audiences tolerate more complex terminology
- Regional preferences may favor certain accents

Test edge cases:

- Long monologues (product descriptions)
- Lists and numbers (order details, account numbers)
- Emotional content (apologies, celebrations)

Dynamic Voice Selection

Change voice based on context:

class DynamicVoiceAgent(AgentBase):
    DEPARTMENT_VOICES = {
        "support": {"voice": "rime.spore", "name": "Alex"},
        "sales": {"voice": "rime.marsh", "name": "Jordan"},
        "billing": {"voice": "rime.coral", "name": "Morgan"}
    }

    def __init__(self):
        super().__init__(name="dynamic-voice")

    def on_swml_request(self, request_data=None, callback_path=None, request=None):
        # Determine department from called number
        call_data = (request_data or {}).get("call", {})
        called_num = call_data.get("to", "")

        if "555-1000" in called_num:
            dept = "support"
        elif "555-2000" in called_num:
            dept = "sales"
        else:
            dept = "billing"

        config = self.DEPARTMENT_VOICES[dept]

        self.add_language("English", "en-US", config["voice"])

        self.prompt_add_section(
            "Role",
            f"You are {config['name']}, a {dept} representative."
        )
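The substring check in the example above only matches when the called number arrives with that exact hyphenation. Numbers are often delivered without punctuation (E.164 form, for instance, has none), so a more tolerant lookup strips non-digits first. This is a sketch under that assumption: `department_for` is a hypothetical helper and the number suffixes are placeholders.

```python
# Placeholder suffixes; replace with the digits of your own numbers.
DEPARTMENT_BY_SUFFIX = {
    "5551000": "support",
    "5552000": "sales",
}
DEFAULT_DEPARTMENT = "billing"


def department_for(called_number: str) -> str:
    """Map a called number to a department, ignoring punctuation and spacing."""
    digits = "".join(ch for ch in called_number if ch.isdigit())
    for suffix, department in DEPARTMENT_BY_SUFFIX.items():
        if digits.endswith(suffix):
            return department
    return DEFAULT_DEPARTMENT
```

Keeping the routing in a plain function also makes it testable without placing a call, and the result can index `DEPARTMENT_VOICES` exactly as `dept` does above.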

Language Codes Reference

Supported language codes:

Language	Codes
Multilingual	`multi` (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch)
Bulgarian	`bg`
Czech	`cs`
Danish	`da`, `da-DK`
Dutch	`nl`
English	`en`, `en-US`, `en-AU`, `en-GB`, `en-IN`, `en-NZ`
Finnish	`fi`
French	`fr`, `fr-CA`
German	`de`
Hindi	`hi`
Hungarian	`hu`
Indonesian	`id`
Italian	`it`
Japanese	`ja`
Korean	`ko`, `ko-KR`
Norwegian	`no`
Polish	`pl`
Portuguese	`pt`, `pt-BR`, `pt-PT`
Russian	`ru`
Spanish	`es`, `es-419`
Swedish	`sv`, `sv-SE`
Turkish	`tr`
Ukrainian	`uk`
Vietnamese	`vi`

Complete Voice Configuration Example

from signalwire_agents import AgentBase


class FullyConfiguredVoiceAgent(AgentBase):
    def __init__(self):
        super().__init__(name="voice-configured")

        # Primary language with all options
        self.add_language(
            name="English",
            code="en-US",
            voice="rime.spore",
            speech_fillers=[
                "Um",
                "Well",
                "Let me see",
                "So"
            ],
            function_fillers=[
                "Let me look that up for you",
                "One moment while I check",
                "I'm searching for that now",
                "Just a second"
            ]
        )

        # Secondary language
        self.add_language(
            name="Spanish",
            code="es-MX",
            voice="rime.luna",
            speech_fillers=["Pues", "Bueno"],
            function_fillers=["Un momento", "Déjame ver"]
        )

        # Pronunciation fixes
        self.set_pronunciations([
            {"replace": "ACME", "with": "Ack-me"},
            {"replace": "www", "with": "dub dub dub"},
            {"replace": ".com", "with": "dot com"},
            {"replace": "@", "with": "at"}
        ])

        self.prompt_add_section(
            "Role",
            "You are a friendly customer service agent."
        )
