tts

Name: tts
Rating: 1

Use this skill whenever the user wants to convert text into speech, generate audio from text, or produce voiceovers. Triggers include: any mention of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', 'dubbing', or requests to turn written content into spoken audio. Also use when converting EPUB/PDF/SRT/articles to audio, cloning voices from reference audio, controlling emotion or speed in speech, aligning speech to subtitle timelines, or producing per-segment voice-mapped audio.

by @ksuriuri (kusuriuri)·MIT-0

File processing, CI/CD, DevOps

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install noizai-tts

Mirror (accelerated): npx clawhub@latest install noizai-tts --registry https://cn.longxiaskill.com

Skill documentation

tts

Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.

Triggers

text to speech / tts / speak / say
voice clone / dubbing
epub to audio / srt to audio / convert to audio
语音 / 说 / 讲 / 说话

Simple Mode: text to audio

speak is the default — the subcommand can be omitted:

# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world"  # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3

# Voice cloning: local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav

# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg

Third-party integration (Feishu/Telegram/Discord) is documented in ref_3rd_party.md.

Timeline Mode — SRT to time-aligned audio

For precise per-segment timing (dubbing, subtitles, video narration).

Step 1: Get or create an SRT

If the user doesn't have one, generate it from text:

python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500

--cps = characters per second (default 4, good for Chinese; ~15 for English). The agent can also write the SRT manually.
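To make the effect of --cps concrete, here is a minimal sketch of how a characters-per-second rate could translate into cue timings. It is not the script's actual implementation: the function names are hypothetical, and it assumes each cue lasts len(text) / cps seconds and that --gap is a pause in milliseconds between cues.

```python
def estimate_cues(sentences, cps=4.0, gap_ms=0):
    """Assign (start_ms, end_ms, text) to each sentence from a chars-per-second rate.

    Assumption: duration = len(text) / cps, with gap_ms of silence between cues.
    """
    cues, cursor = [], 0
    for text in sentences:
        duration_ms = round(len(text) / cps * 1000)
        cues.append((cursor, cursor + duration_ms, text))
        cursor += duration_ms + gap_ms
    return cues


def fmt(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


if __name__ == "__main__":
    sentences = ["This is the first sentence.", "Here is another one."]
    for number, (start, end, text) in enumerate(estimate_cues(sentences, cps=15, gap_ms=500), 1):
        print(f"{number}\n{fmt(start)} --> {fmt(end)}\n{text}\n")
```

Under these assumptions, a 30-character English sentence at --cps 15 gets a 2-second cue, while the same sentence at the default of 4 would get 7.5 seconds, which is why the higher value suits English.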

Step 2: Create a voice map

A JSON file controlling default and per-segment voice settings. Keys under segments support a single index ("3") or a range ("5-8").

Kokoro voice map:

{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}

Noiz voice map (adds emo and reference_audio support). reference_audio can be a local path or a URL (the user's own audio; Noiz only):

{
  "default": { "voice_id": "voice_123", "target_lang": "zh" },
  "segments": {
    "1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
    "2-4": { "reference_audio": "./refs/guest.wav" }
  }
}
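The segment keys described above ("3" for one cue, "5-8" for an inclusive range) can be resolved with a few lines of Python. This is an illustrative sketch rather than the script's own logic: resolve_voice is a hypothetical helper, and it assumes per-segment entries are merged over the defaults instead of replacing them.

```python
def resolve_voice(voice_map, index):
    """Return settings for one SRT cue: defaults overlaid with any matching segment entry.

    Assumption: segment overrides are merged onto "default"; ranges are inclusive.
    """
    settings = dict(voice_map.get("default", {}))
    for key, overrides in voice_map.get("segments", {}).items():
        start, _, end = key.partition("-")
        low = int(start)
        high = int(end) if end else low
        if low <= index <= high:
            settings.update(overrides)
    return settings


voice_map = {
    "default": {"voice": "zf_xiaoni", "lang": "cmn"},
    "segments": {
        "1": {"voice": "zm_yunxi"},
        "5-8": {"voice": "af_sarah", "lang": "en-us", "speed": 0.9},
    },
}

print(resolve_voice(voice_map, 1))  # {'voice': 'zm_yunxi', 'lang': 'cmn'}
print(resolve_voice(voice_map, 6))  # {'voice': 'af_sarah', 'lang': 'en-us', 'speed': 0.9}
print(resolve_voice(voice_map, 3))  # {'voice': 'zf_xiaoni', 'lang': 'cmn'}
```

With merging, segment 1 changes only the voice and inherits lang from the default, so a map needs to state only what differs per segment.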

Dynamic Reference Audio Slicing: When translating or dubbing a video, each sentence can automatically use the audio from the original video at the same timestamp as its reference audio. To enable this, pass the --ref-audio-track argument instead of setting reference_audio in the map:

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav
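Slicing reference audio "at the same timestamp" comes down to reading each cue's start and end from the SRT. The sketch below shows one way to turn an SRT timing line into a window in seconds; cue_window is a hypothetical helper, and how tts.py actually extracts the slice from the track is not documented here.

```python
import re

# Matches lines such as "00:00:01,500 --> 00:00:04,000" (comma or dot before milliseconds).
CUE_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def cue_window(timing_line):
    """Convert an SRT timing line into (start_seconds, end_seconds)."""
    match = CUE_TIMING.search(timing_line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(group) for group in match.groups())
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    return start, end


if __name__ == "__main__":
    start, end = cue_window("00:00:01,500 --> 00:00:04,000")
    print(f"reference slice: {start:.3f}s to {end:.3f}s")
```

The resulting window is what a slicing step would cut from the original track to serve as the per-sentence reference.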

See examples/ for full samples.

Step 3: Render

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav

When to Choose Which

| Need | Recommended |
| --- | --- |
| Just read text aloud, no fuss | Kokoro (default) |
| EPUB/PDF audiobook with chapters | Kokoro (native support) |
| Voice blending ("v1:60,v2:40") | Kokoro |
| Voice cloning from reference audio | Noiz |
| Emotion control (emo param) | Noiz |
| Exact server-side duration per segment | Noiz |

When the user needs emotion control, voice cloning, and precise duration together, Noiz is the only backend that supports all three.
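The selection guidance above can be expressed as a small decision rule. This is only a sketch of that rule: choose_backend and the feature labels are hypothetical names for illustration and are not part of tts.py.

```python
# Feature labels are illustrative; they mirror the needs for which Noiz is recommended.
NOIZ_FEATURES = {"voice_cloning", "emotion_control", "exact_duration"}


def choose_backend(needs):
    """Return "noiz" if any requested feature is Noiz-recommended, else "kokoro" (the default)."""
    return "noiz" if NOIZ_FEATURES & set(needs) else "kokoro"


if __name__ == "__main__":
    print(choose_backend([]))                                    # kokoro
    print(choose_backend(["voice_blending"]))                    # kokoro
    print(choose_backend(["emotion_control", "voice_cloning"]))  # noiz
```

The resulting name corresponds to the value passed to --backend in the render commands.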

Guest Mode (no API key)

When no API key is configured, tts.py automatically falls back to guest mode, a limited Noiz endpoint that requires no authentication. Guest mode supports only --voice-id, --speed, and --format; voice cloning, emotion, duration, and timeline rendering are not available.

# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav

# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro

Available guest voices (15 built-in):

| voice_id | name | lang | gender | tone |
| --- | --- | --- | --- | --- |
| 063a4491 | 販売員（なおみ） | ja | F | 喜び |
| 4252b9c8 | 落ち着いた女性 | ja | F | 穏やか |
| 578b4be2 | 熱血漢（たける） | ja | M | 怒り |
| a9249ce7 | 安らぎ（みなと） | ja | M | 穏やか |
| f00e45a1 | 旅人（かいと） | ja | M | 穏やか |
| b4775100 | 悦悦｜社交分享 | zh | F | Joyful |
| 77e15f2c | 婉青｜情绪抚慰 | zh | F | Calm |
| ac09aeb4 | 阿豪｜磁性主持 | zh | M | Calm |
| 87cb2405 | 建国｜知识科普 | zh | M | Calm |
| 3b9f1e27 | 小明｜科技达人 | zh | M | Joyful |
| 95814add | Science Narration | en | M | Calm |
| 883b6b7c | The Mentor (Alex) | en | M | Joyful |
| a845c7de | The Naturalist (Silas) | en | M | Calm |
| 5a68d66b | The Healer (Serena) | en | F | Calm |
| 0e4ab6ec | The Mentor (Maya) | en | F | Calm |

Security & data disclosure

This skill performs the following file and network operations at runtime:

Credential storage: When you run config --set-api-key, the key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported as an alternative.

Legacy key migration: If ~/.noiz_api_key exists and ~/.config/noiz/api_key does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.

Network calls (Noiz backend): Text and optional reference audio are uploaded to https://noiz.ai/v1/ for synthesis. No data is
