Text to speech using the default macOS "say" command. No need for 3rd party APIs or models. Supports many languages. Also, Trinoids!
v0.0.2
Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages
Say + FFmpeg TTS Pipeline
Use say (macOS native TTS) + ffmpeg to generate Opus voice messages for Telegram/Matrix.
Why not just say?

- Telegram/Matrix require Opus codec voice messages
- say outputs AIFF/m4a; must convert to .ogg (Opus) before sending
- Telegram accepts OGG/MP3/M4A as voice, but Opus OGG is the native format

Workflow

    say -v "" -o /.aiff ""
    ffmpeg -i /.aiff -acodec libopus /.ogg -y
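The two workflow steps can be wrapped in one function. This is a minimal sketch: `tts_to_ogg` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and the function assumes `say` and `ffmpeg` are on the PATH.

```shell
# Sketch: run say, then ffmpeg, and print the path of the resulting .ogg file.
# tts_to_ogg and DRY_RUN are illustrative names, not part of OpenClaw.
tts_to_ogg() {
  local voice="$1" text="$2" out_base="$3"   # out_base: output path without extension
  local aiff="${out_base}.aiff" ogg="${out_base}.ogg"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the commands instead of running them.
    printf 'say -v "%s" -o "%s" "%s"\n' "$voice" "$aiff" "$text"
    printf 'ffmpeg -i "%s" -acodec libopus "%s" -y\n' "$aiff" "$ogg"
    return 0
  fi

  command -v say >/dev/null 2>&1 || { echo "say not found (macOS only)" >&2; return 1; }
  command -v ffmpeg >/dev/null 2>&1 || { echo "ffmpeg not found" >&2; return 1; }

  say -v "$voice" -o "$aiff" "$text" || return 1
  ffmpeg -i "$aiff" -acodec libopus "$ogg" -y || return 1
  rm -f "$aiff"          # the intermediate AIFF is no longer needed
  printf '%s\n' "$ogg"   # path of the finished voice file
}

# Usage: tts_to_ogg "Trinoids" "Hello!" "/tmp/hello"
```

Setting `DRY_RUN=1` prints the two commands without executing them, which is handy for checking quoting before granting exec permission.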
Send with the message tool:
{ "action": "send", "channel": "telegram", "media": "/.ogg", "asVoice": true, "target": "" }
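When the payload is assembled from shell variables, a small helper keeps the quoting in one place. The sketch below is an assumption about how one might build it: `voice_payload` and `json_escape` are hypothetical helpers, and only the field names are taken from the example above. The escaping covers backslashes and double quotes only, which should be enough for ordinary file paths and chat IDs.

```shell
# json_escape: escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# voice_payload CHANNEL MEDIA TARGET: print the message-tool payload as one line.
voice_payload() {
  local channel="$1" media="$2" target="$3"
  printf '{ "action": "send", "channel": "%s", "media": "%s", "asVoice": true, "target": "%s" }\n' \
    "$(json_escape "$channel")" "$(json_escape "$media")" "$(json_escape "$target")"
}

# Usage: voice_payload telegram "/tmp/hello.ogg" "12345"
```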
Recommended workspace directory
~/.OpenClaw/workspace/tmp/audio/
(Whitelist this path in exec permissions for faster approval)
Voice selection
Use say -v '?' to list available voices. Notable ones:

- Trinoids: robotic/electronic voice (popular for bots)
- Samantha: warm US female voice
- Alex: US male voice
- Fred: neutral US male voice
- Karen: Australian female voice
Note: pass just the voice name (e.g. "Trinoids"), not the full en_US suffix.
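Before using a voice it can help to confirm that it is installed. The sketch below reads the listing on stdin and assumes each line has the shape `Name<spaces>xx_YY<spaces># sample sentence`; `voice_names` and `voice_exists` are hypothetical helper names.

```shell
# voice_names: read `say -v '?'` output on stdin, print one voice name per line.
# Everything from the locale column onward is stripped, so multi-word names survive.
voice_names() {
  sed -E 's/[[:space:]]+[a-z]{2,3}_[A-Za-z0-9]+[[:space:]]+#.*$//'
}

# voice_exists NAME: succeed if NAME matches a listed voice name exactly.
voice_exists() {
  voice_names | grep -Fxq -- "$1"
}

# Usage: say -v '?' 2>&1 | voice_exists "Trinoids" && echo "Trinoids is installed"
```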
Example: send a hello voice message

    VOICE="Trinoids"
    TEXT="Hello!"
    DIR="$HOME/.OpenClaw/workspace/tmp/audio"
    mkdir -p "$DIR"
    say -v "$VOICE" -o "$DIR/hello.aiff" "$TEXT"
    ffmpeg -i "$DIR/hello.aiff" -acodec libopus "$DIR/hello.ogg" -y
    # Then send via message tool with asVoice: true
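After converting, it may be worth confirming that the file really contains an Opus stream before sending it. A minimal sketch, assuming `ffprobe` (shipped with ffmpeg) is installed; `is_opus` and `check_voice_file` are hypothetical names.

```shell
# is_opus: read ffprobe key=value output on stdin, succeed if the codec is Opus.
is_opus() {
  grep -qx 'codec_name=opus'
}

# check_voice_file FILE: inspect the first audio stream of FILE with ffprobe.
check_voice_file() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate \
    -of default=noprint_wrappers=1 "$1" | is_opus
}

# Usage: check_voice_file "$DIR/hello.ogg" || echo "not an Opus file" >&2
```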
Format notes

- Input to ffmpeg: AIFF (.aiff) works reliably; avoid .m4a with say
- Output: Opus in Ogg container (libopus codec), required for Telegram voice messages
- Telegram sendVoice accepts OGG, MP3, M4A, but native is Opus OGG
- Sample rate: say outputs 24kHz AIFF; ffmpeg re-encodes to Opus at 24kHz

Integration with OpenClaw TTS
OpenClaw's built-in messages.tts only supports: ElevenLabs, Microsoft Edge, MiniMax, OpenAI.
This say+ffmpeg pipeline is a workaround for local-only TTS without API keys or cloud services. It's not auto-triggered by OpenClaw; call it manually via exec + message tool.
Language Detection → Voice Mapping
When responding to a voice message, detect the language from the STT output (Parakeet auto-detects). Then pick the matching say voice using i18n locale codes.
Finding voices by language:
say -v '?' 2>&1 | grep -E "cs_CZ|en_US|de_DE|fr_FR|it_IT|es_ES"
Language → voice selection priority:
1. Use (Premium) if available
2. Fall back to (Enhanced) if available
3. Fall back to base name
4. Never use a voice that doesn't match the language

| Language | i18n code | Preferred Voice |
|---|---|---|
| Czech | cs_CZ | Zuzana (Premium) |
| English (US) | en_US | Trinoids (no Premium/Enhanced available) |
| German | de_DE | Grandma (Premium) if available |
| French | fr_FR | Grandma (Premium) if available |
| Spanish | es_ES | Grandma (Premium) if available |
| Italian | it_IT | Grandma (Premium) if available |
Key: Always use just the voice name (e.g. "Trinoids", "Zuzana"), not the full locale suffix. The locale suffix in say -v '?' output is for grepping/identification only.
Example workflow:
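The step from a detected language to one of the locale codes in the table can be sketched as a lookup. This assumes the STT step reports a short code such as "cs" or "en-US"; that format is an assumption, and `locale_for_language` is a hypothetical helper name.

```shell
# locale_for_language CODE: map a language code (e.g. "cs", "EN", "en-US")
# to the locale codes used in the table. Fails for languages not listed there.
locale_for_language() {
  local code
  # Normalize: lowercase, keep only the first two characters.
  code=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | cut -c1-2)
  case "$code" in
    cs) echo "cs_CZ" ;;
    en) echo "en_US" ;;
    de) echo "de_DE" ;;
    fr) echo "fr_FR" ;;
    es) echo "es_ES" ;;
    it) echo "it_IT" ;;
    *)  return 1 ;;   # unknown language: let the caller decide what to do
  esac
}

# Usage: LOCALE=$(locale_for_language "cs") || echo "no matching locale" >&2
```

Returning a non-zero status for unknown languages lets the caller skip the reply rather than speak with a mismatched voice.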
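    LOCALE="cs_CZ"   # avoid LANG: that variable controls the shell's own locale
    # Find best available voice for this language (Premium > Enhanced > base)
    VOICES=$(say -v '?' 2>&1 | grep "$LOCALE")
    LINE=$(echo "$VOICES" | grep -m1 "(Premium)" || echo "$VOICES" | grep -m1 "(Enhanced)" || echo "$VOICES" | head -1)
    VOICE=$(echo "$LINE" | awk '{print $1}')   # first word only, i.e. the bare voice name
    say -v "$VOICE" -o reply.aiff "Česká odpověď"
    ffmpeg -i reply.aiff -acodec libopus reply.ogg -y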
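The selection priority can also be written as a function that reads the voice listing on stdin, which makes it testable without macOS. This is a sketch under the same assumption about the listing format (`Name<spaces>xx_YY<spaces># sample`); `pick_voice` is a hypothetical name. Following the note above, it strips the (Premium)/(Enhanced) suffix and prints only the bare voice name.

```shell
# pick_voice LOCALE: read `say -v '?'` output on stdin and print the bare name of
# the best voice for LOCALE, preferring (Premium), then (Enhanced), then any match.
# Returns non-zero if no voice matches the locale.
pick_voice() {
  local locale="$1" lines line
  lines=$(grep -E "[[:space:]]${locale}[[:space:]]") || return 1
  line=$(printf '%s\n' "$lines" | grep -m1 -F '(Premium)') ||
    line=$(printf '%s\n' "$lines" | grep -m1 -F '(Enhanced)') ||
    line=$(printf '%s\n' "$lines" | head -n 1)
  # Drop the locale column and sample text, then the quality suffix.
  printf '%s\n' "$line" | sed -E \
    -e 's/[[:space:]]+[a-z]{2,3}_[A-Za-z0-9]+[[:space:]]+#.*$//' \
    -e 's/ \((Premium|Enhanced)\)$//'
}

# Usage: VOICE=$(say -v '?' 2>&1 | pick_voice "cs_CZ") || echo "no voice for locale" >&2
```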
TODOs

- Detect language from STT transcription and auto-select appropriate say voice
- Explore integrating into OpenClaw via custom TTS provider plugin
- Investigate if OpenClaw supports post-processing TTS output via a hook
- Test Matrix channel voice message format compatibility