The Voice of AI: Integrating Text-to-Speech with OpenClaw
AI agents are evolving beyond simple text interfaces. With OpenClaw, you can give your agent a literal voice, transforming interactions from silent text exchanges into dynamic, audible conversations. Whether you want your agent to send voice notes on Telegram, narrate stories, or simply speak its mind, OpenClaw's Text-to-Speech (TTS) integration makes it possible.
In this guide, we’ll explore how OpenClaw handles TTS, the providers you can use, and how to configure it for your digital workforce.
Why Give Your Agent a Voice?
Text is efficient, but voice is personal. Enabling TTS unlocks new ways to interact with your agent:
- On-the-Go Access: Interacting with your agent via voice notes on Telegram or Signal is often easier than typing while walking or driving.
- Enhanced Personality: A carefully selected voice from ElevenLabs can give your agent a distinct character—authoritative, whimsical, or helpful.
- Accessibility: Audio output makes your agent more accessible to users who prefer listening over reading.
Supported Providers
OpenClaw supports three primary TTS engines, letting you balance voice quality against cost:
- ElevenLabs: The gold standard for realistic, emotive AI voices. Requires an API key.
- OpenAI: Excellent quality with familiar voices like "Alloy" and "Nova". Requires an API key.
- Microsoft Edge TTS: A free, high-quality option that uses Microsoft's neural voices without needing an API key. This is the default fallback if no other keys are configured.
Configuration: Setting the Stage
To enable TTS, edit your openclaw.json configuration file. You can make the agent speak on every reply, or only when you send it a voice message first.
Basic Setup (Edge TTS)
If you just want to try it out without spending money on API credits, OpenClaw defaults to Edge TTS. You just need to enable the feature:
```jsonc
{
  "messages": {
    "tts": {
      "auto": "always", // or "inbound" to only reply to voice notes
      "provider": "edge"
    }
  }
}
```
Premium Voice Setup (ElevenLabs)
For the best experience, you can configure ElevenLabs:
```jsonc
{
  "messages": {
    "tts": {
      "auto": "always",
      "provider": "elevenlabs",
      "elevenlabs": {
        "voiceId": "YOUR_VOICE_ID", // grab this from the ElevenLabs voice library
        "modelId": "eleven_multilingual_v2"
      }
    }
  }
}
```
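OpenAI voices can be configured in the same way. The sketch below assumes an openai block that mirrors the ElevenLabs structure; the exact key names (openai, voice) are an assumption, so verify them against the documentation:

```jsonc
{
  "messages": {
    "tts": {
      "auto": "always",
      "provider": "openai",
      // Assumed shape mirroring the ElevenLabs block above.
      "openai": {
        "voice": "alloy"
      }
    }
  }
}
```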
Modes of Operation
OpenClaw offers smart "auto" modes to control when your agent speaks:
- always: Every reply is converted to audio. Great for voice-first channels.
- inbound: The agent only replies with audio if you sent a voice note first. This is the most natural "hybrid" mode.
- tagged: The agent remains silent unless it explicitly decides to speak using internal tags.
- off: TTS is disabled globally.
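The four modes above boil down to a simple per-reply decision. Here's a minimal sketch (a hypothetical helper, not OpenClaw's actual source), assuming the runtime knows whether the inbound message was a voice note and whether the agent's reply carries a speak tag:

```typescript
type TtsAuto = "always" | "inbound" | "tagged" | "off";

// Decide whether a given reply should be rendered as audio.
function shouldSpeak(
  auto: TtsAuto,
  inboundWasVoice: boolean,
  replyHasSpeakTag: boolean
): boolean {
  switch (auto) {
    case "always":
      return true; // every reply becomes audio
    case "inbound":
      return inboundWasVoice; // mirror the user's medium
    case "tagged":
      return replyHasSpeakTag; // agent opts in explicitly
    case "off":
      return false; // TTS disabled globally
  }
}
```

The "inbound" branch is what makes the hybrid mode feel natural: the agent simply answers in the same medium you used.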
Controlling TTS on the Fly
You don't always need to edit config files. You can control your agent's voice directly in chat using slash commands:
- /tts on: Turn on auto-TTS for the current session.
- /tts off: Silence the agent.
- /tts inbound: Switch to "speak when spoken to" mode.
- /tts provider openai: Switch providers instantly.
- /tts audio Hello world: Force a one-off audio message.
The Agent's Perspective
Under the hood, agents have access to a tts tool. This allows them to generate audio files programmatically. For example, if you ask your agent to "Read this poem out loud," it can use the tool to generate the audio file and send it to you, even if auto-TTS is turned off.
```javascript
// Agent tool usage example
tts.convert({
  text: "Once upon a midnight dreary...",
  channel: "telegram" // optimizes format for voice bubbles
});
```
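Conceptually, a call like this has to resolve to whichever engine is configured, falling back to the free Edge TTS engine when no API key is present (as described in the providers section). The dispatcher below is a hypothetical sketch of that resolution logic, not OpenClaw's actual internals:

```typescript
type Provider = "elevenlabs" | "openai" | "edge";

// Hypothetical provider resolution: honor the configured provider only if
// its API key is available; otherwise fall back to Edge TTS, which is free
// and needs no key.
function pickProvider(
  configured: Provider | undefined,
  keys: Record<string, string | undefined>
): Provider {
  if (configured === "elevenlabs" && keys.ELEVENLABS_API_KEY) return "elevenlabs";
  if (configured === "openai" && keys.OPENAI_API_KEY) return "openai";
  return "edge"; // default fallback, no key required
}
```

This fallback behavior is why a fresh install can speak out of the box: with no keys configured, everything routes to Edge TTS.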
Conclusion
Adding a voice to your OpenClaw agent bridges the gap between a command-line tool and a true digital companion. Whether you're building a storyteller, a personal assistant, or just want to hear your code talk back, OpenClaw's TTS integration is ready to speak up.
Ready to listen? Check out the full TTS documentation for advanced configuration options.