Yummy Gen Voice

v1.1.0

Use when the user wants to synthesise speech or text-to-speech (TTS) audio with Gemini through yummycli, including single-speaker narration, multi-speaker dialogue (up to 2 speakers), and listing available voices.


by @yummysource·MIT-0


License

MIT-0

Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install yummy-gen-voice

Mirror: npx clawhub@latest install yummy-gen-voice --registry https://cn.longxiaskill.com


Skill documentation

Synthesise Speech

Generate spoken audio with yummycli gemini speak using Google Gemini TTS.

When to Use

Load this skill when the user asks to synthesise speech, convert text to audio, generate a voiceover, create a narration, or produce a spoken dialogue, including single-speaker TTS and multi-speaker conversation.

Prerequisite: apply the yummy-shared skill first.

This skill covers three modes with a single command:

- Single-speaker narration (one voice, any language)
- Multi-speaker dialogue (up to 2 speakers, each with their own voice)
- Listing available prebuilt voices

Command Contract

Two equivalent entry points are available:

| Entry point | When to use |
| --- | --- |
| yummycli gemini speak | Default: human-friendly, Gemini TTS presets applied |
| yummycli audio speak --provider gemini | Scripting / automation: explicit, provider-agnostic form |

Both share the same flags and defaults. Prefer gemini speak unless the task explicitly requires the provider-agnostic form.

Basic usage:

yummycli gemini speak --text ""

With an explicit voice and output path:

yummycli gemini speak \
  --text "" \
  --voice Kore \
  --output narration.wav

Optional controls:

--output --model <model> --voice --language

Default values when omitted: --model gemini-3.1-flash-tts-preview, --voice Aoede.

Speaker Routing Rules

The presence of --speaker flags determines the synthesis path automatically:

| Input | Behaviour |
| --- | --- |
| No --speaker | Single-speaker synthesis. --voice selects the prebuilt voice. |
| 1–2 --speaker flags | Multi-speaker dialogue. Each flag maps a speaker name to a voice. --voice must not be used together with --speaker. |

--voice and --speaker are mutually exclusive. Never pass both.
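
The routing rules above can be checked before the command is run. The following is a minimal sketch in POSIX sh, not part of the CLI itself: check_speak_args is a hypothetical helper name, and the handling of a `--flag=value` spelling is an assumption, since the documentation only shows flags and values as separate words.

```shell
# Hypothetical pre-flight check (not provided by the CLI).
# Scans a prospective argument list, prints the synthesis path that would
# be taken, and returns non-zero if the list breaks a routing rule.
check_speak_args() {
  voice_count=0
  speaker_count=0
  for arg in "$@"; do
    case "$arg" in
      # The "=value" spelling is an assumption; only the bare flag is documented.
      --voice|--voice=*)     voice_count=$((voice_count + 1)) ;;
      --speaker|--speaker=*) speaker_count=$((speaker_count + 1)) ;;
    esac
  done
  if [ "$voice_count" -gt 0 ] && [ "$speaker_count" -gt 0 ]; then
    echo "error: --voice and --speaker are mutually exclusive" >&2
    return 1
  fi
  if [ "$speaker_count" -gt 2 ]; then
    echo "error: at most 2 --speaker flags are allowed" >&2
    return 1
  fi
  if [ "$speaker_count" -gt 0 ]; then
    echo "multi-speaker"
  else
    echo "single-speaker"
  fi
}
```

The check is purely lexical: a text value that happens to equal `--voice` or `--speaker` would be miscounted, so treat it as a guard rather than a full argument parser.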

Model Selection

Default model: gemini-3.1-flash-tts-preview.

| User says | Use |
| --- | --- |
| 3.1, 3.1 flash, or no preference | gemini-3.1-flash-tts-preview (default) |
| 2.5 flash or flash 2.5 | gemini-2.5-flash-preview-tts |
| 2.5 pro or pro 2.5 | gemini-2.5-pro-preview-tts |

Do not switch models based on vague quality words alone.
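
As an illustration of the mapping above, here is a small POSIX sh sketch. resolve_model is a hypothetical helper name, and the substring matching is an assumption about how a request might be phrased; only the three model identifiers come from this document.

```shell
# Hypothetical helper: maps a free-form user request to a model identifier.
# Anything that does not name 2.5 flash or 2.5 pro falls back to the default,
# so vague quality words never switch the model.
resolve_model() {
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *2.5*pro*|*pro*2.5*)     echo "gemini-2.5-pro-preview-tts" ;;
    *2.5*flash*|*flash*2.5*) echo "gemini-2.5-flash-preview-tts" ;;
    *)                       echo "gemini-3.1-flash-tts-preview" ;;
  esac
}
```

For example, `resolve_model "Flash 2.5"` prints gemini-2.5-flash-preview-tts, while `resolve_model "highest quality"` prints the default model.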

Available Voices

30 prebuilt voices are available. Run yummycli gemini voices to list them all.

| Voice | Style |
| --- | --- |
| Aoede | Breezy |
| Kore | Firm |
| Charon | Informative |
| Puck | Upbeat |
| Fenrir | Excitable |
| Zephyr | Bright |
| Leda | Youthful |
| Orus | Firm |
| Callirrhoe | Easy-going |
| Autonoe | Bright |
| Enceladus | Breathy |
| Iapetus | Clear |
| Umbriel | Easy-going |
| Algieba | Smooth |
| Despina | Smooth |
| Erinome | Clear |
| Algenib | Gravelly |
| Rasalghul | Informative |
| Achird | Friendly |
| Zubenelgenubi | Casual |
| Vindemiatrix | Gentle |
| Sadachbia | Lively |
| Sadaltager | Knowledgeable |
| Sulafat | Warm |
| Schedar | Even |
| Gacrux | Mature |
| Pulcherrima | Forward |
| Laomedeia | Upbeat |
| Achernar | Soft |
| Alnilam | Firm |

When the user does not specify a voice, use the default (Aoede). Only apply a different voice when the user explicitly names one or describes a style that clearly maps to a specific voice.
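
Before passing a user-supplied voice name to the command, it can be checked against the 30 names listed above. This is a sketch only: is_known_voice is a hypothetical helper, and the exact-case comparison is an assumption, since this document does not say whether the CLI matches voice names case-insensitively.

```shell
# Hypothetical helper: succeeds only if the argument is one of the 30
# prebuilt voice names listed in this document (exact case assumed).
is_known_voice() {
  case "$1" in
    Aoede|Kore|Charon|Puck|Fenrir|Zephyr|Leda|Orus|Callirrhoe|Autonoe) return 0 ;;
    Enceladus|Iapetus|Umbriel|Algieba|Despina|Erinome|Algenib|Rasalghul) return 0 ;;
    Achird|Zubenelgenubi|Vindemiatrix|Sadachbia|Sadaltager|Sulafat|Schedar) return 0 ;;
    Gacrux|Pulcherrima|Laomedeia|Achernar|Alnilam) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might fall back to the default when the check fails, for example `is_known_voice "$voice" || voice=Aoede`.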

Language

- Language is auto-detected from the input text when --language is omitted.
- Pass --language only when the user explicitly specifies a language or when the text could be ambiguous (e.g. romanised transliteration).
- Use BCP-47 codes: en-US, zh-CN, ja-JP, ko-KR, fr-FR, etc.

Intent to Parameters

Voice guidance:

- For neutral or general-purpose narration, use the default Aoede (Breezy).
- For formal or instructional content, consider Charon (Informative) or Kore (Firm).
- For energetic or promotional content, consider Puck (Upbeat) or Fenrir (Excitable).
- For warm conversational content, consider Sulafat (Warm) or Achird (Friendly).
- Only switch from the default when the user's intent clearly maps to a specific style.

Output path guidance:

- If --output is omitted, yummycli generates a timestamped .wav filename in the current working directory. Do not invent your own filename unless the user provides one.
- The output path must end in .wav. Reject or correct any other extension.
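
The "reject or correct" rule for the output path can be sketched in POSIX sh as follows. Both function names are hypothetical, and treating only a lowercase `.wav` suffix as valid is an assumption.

```shell
# Hypothetical helper: succeeds only if the path ends in ".wav".
is_wav_path() {
  case "$1" in
    *.wav) return 0 ;;
    *)     return 1 ;;
  esac
}

# Hypothetical helper: prints the path with its extension corrected to ".wav".
# Only the final path component is inspected, so dots in directory names
# are left alone.
to_wav_path() {
  path=$1
  base=${path##*/}          # final component, e.g. "clip.mp3"
  prefix=${path%"$base"}    # everything before it, e.g. "out/"
  case "$base" in
    *.wav) printf '%s\n' "$path" ;;
    ?*.*)  printf '%s%s.wav\n' "$prefix" "${base%.*}" ;;
    *)     printf '%s.wav\n' "$path" ;;
  esac
}
```

Whether to reject or silently correct is a judgment call; correcting is convenient, but telling the user about the change avoids surprises about where the file ends up.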

Multi-speaker prompt format:

- Each speaker's lines must be tagged with their name in square brackets: [Alice]: Hello! [Bob]: Hi there!
- Speaker names in --speaker flags must exactly match the names used in --text.

Output Contract

Speak commands return JSON on stdout. Read the response and use the output field as the generated file path.

Single-speaker example:

{
  "provider": "gemini",
  "output": "tts_20260420_142301_047.wav",
  "model": "gemini-3.1-flash-tts-preview",
  "voice": "Aoede",
  "elapsed_seconds": 3
}

Multi-speaker example:

{
  "provider": "gemini",
  "output": "dialogue_20260420_143010_112.wav",
  "model": "gemini-3.1-flash-tts-preview",
  "speakers": [
    {"name": "Alice", "voice": "Aoede"},
    {"name": "Bob", "voice": "Kore"}
  ],
  "elapsed_seconds": 4
}
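
Reading the generated file path out of the response could look like the sketch below. It assumes the path field is literally named `output` in the JSON and that its value contains no escaped quotes; extract_output_path is a hypothetical helper, and the sed expression is a lightweight stand-in for a real JSON parser.

```shell
# Hypothetical helper: reads a JSON response on stdin and prints the string
# value of its "output" key. Newlines are removed first so that
# pretty-printed JSON is handled the same way as single-line JSON.
extract_output_path() {
  tr -d '\n' |
    sed -n 's/.*"output"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Piping the single-speaker response shown above through extract_output_path prints tts_20260420_142301_047.wav. Where jq is installed, `jq -r .output` is a more robust alternative under the same assumption about the field name.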

Execution Rules

- Check yummycli auth status --provider gemini before running if credentials may not be configured.
- Never pass --voice and --speaker together.
- Never pass more than 2 --speaker flags; the API rejects it.
- Speaker names in --speaker flags must match the names used in the --text prompt exactly.
- If the command returns a validation error, fix the arguments before retrying. Do not retry with the same invalid argu
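
The exact-match rule for speaker names can be verified before the command is run. This POSIX sh sketch checks that every intended speaker name appears in the prompt as a `[Name]:` tag. speakers_match_text is a hypothetical helper; it takes bare names only, because this document does not show the value syntax of the --speaker flag.

```shell
# Hypothetical helper: the first argument is the prompt text and the
# remaining arguments are speaker names. Fails if any name has no
# matching "[Name]:" tag in the text (comparison is case-sensitive).
speakers_match_text() {
  text=$1
  shift
  for name in "$@"; do
    case "$text" in
      *"[$name]:"*) ;;  # tag found, keep checking the remaining names
      *)
        echo "error: no [$name]: tag found in text" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

For the prompt `[Alice]: Hello! [Bob]: Hi there!`, the names Alice and Bob pass, while bob fails. The check runs in one direction only: tags in the text that no name refers to are not reported.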

Data source: ClawHub · Chinese localisation: 龙虾技能库