Qwen3-tts

Name: Qwen3-tts
Rating: 9

v1.0.0

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after the initial model download.


by @paki81·MIT-0


License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command


Official: npx clawhub@latest install qwen-tts

Mirror (faster in China): npx clawhub@latest install qwen-tts --registry https://cn.longxiaskill.com


Skill documentation

Qwen TTS

Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model from Hugging Face.

Quick Start

Generate speech from text:

scripts/tts.py "Ciao, come va?" -l Italian -o output.wav

With voice instruction (emotion/style):

scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav

Installation

First-time setup (one-time):

cd skills/public/qwen-tts
bash scripts/setup.sh

This creates a local virtual environment and installs the qwen-tts package (~500MB).

Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.

Usage

scripts/tts.py [options] "Text to speak"

Options

-o, --output PATH - Output file path (default: qwen_output.wav)
-s, --speaker NAME - Speaker voice (default: Vivian)
-l, --language LANG - Language (default: Auto)
-i, --instruct TEXT - Voice instruction (emotion, style, tone)
--list-speakers - Show available speakers
--model NAME - Model name (default: CustomVoice 1.7B)

Examples

Basic Italian speech:

scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav

With emotion/instruction:

scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav

List available speakers:

scripts/tts.py --list-speakers
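When the script is called from another program rather than a shell, the examples above can be assembled as an argument list. The following is a minimal sketch, not part of the skill itself: `build_tts_command` is a hypothetical helper, and it assumes only the short flags documented in the options list (`-o`, `-s`, `-l`, `-i`). Flags left unset are omitted so the script's own defaults apply.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output: Optional[str] = None,
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    script: str = "scripts/tts.py",
) -> List[str]:
    """Build an argv list for scripts/tts.py (hypothetical helper).

    Only flags that were explicitly given are added, so the script
    falls back to its documented defaults for everything else.
    """
    cmd = [script, text]
    if output:
        cmd += ["-o", output]
    if speaker:
        cmd += ["-s", speaker]
    if language:
        cmd += ["-l", language]
    if instruct:
        cmd += ["-i", instruct]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list avoids shell quoting problems with text that contains quotes or punctuation.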

Available Speakers

The CustomVoice model includes 9 premium voices:

Speaker | Language | Description
Vivian | Chinese | Bright, slightly edgy young female
Serena | Chinese | Warm, gentle young female
Uncle_Fu | Chinese | Seasoned male, low mellow timbre
Dylan | Chinese (Beijing) | Youthful Beijing male, clear
Eric | Chinese (Sichuan) | Lively Chengdu male, husky
Ryan | English | Dynamic male, rhythmic
Aiden | English | Sunny American male
Ono_Anna | Japanese | Playful female, light nimble
Sohee | Korean | Warm female, rich emotion

Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
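The recommendation can be expressed as a small lookup. This is an illustrative sketch only: the mapping is copied from the speaker list above (dialect variants are folded into "Chinese"), and `speakers_for` is a hypothetical helper that falls back to the documented default speaker when no voice is native to the requested language.

```python
# Native language of each speaker, copied from the speaker list above.
SPEAKER_NATIVE_LANGUAGE = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",    # Beijing dialect
    "Eric": "Chinese",     # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def speakers_for(language, default="Vivian"):
    """Return speakers whose native language matches, else the default speaker."""
    matches = [
        name
        for name, native in SPEAKER_NATIVE_LANGUAGE.items()
        if native == language
    ]
    return matches or [default]
```

For a language with no native voice, such as Italian, the helper returns the default speaker; any of the nine voices can still be selected explicitly with `-s`.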

Voice Instructions

Use -i, --instruct to control emotion, tone, and style:

Italian examples:

"Parla con entusiasmo"
"Tono serio e professionale"
"Voce calma e rilassante"
"Leggi come un narratore"

English examples:

"Speak with excitement"
"Very happy and energetic"
"Calm and soothing voice"
"Read like a narrator"

Integration with OpenClaw

The script outputs the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:

# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null)
# OUTPUT = /tmp/audio.wav
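The same capture can be done from Python. The sketch below assumes only what is stated above, namely that the audio path is the last line written to stdout; `last_stdout_line` and `synthesize` are hypothetical names, and stderr is discarded to mirror the `2>/dev/null` in the shell version.

```python
import subprocess


def last_stdout_line(stdout: str) -> str:
    """Return the last non-empty line of the script's stdout."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output path found on stdout")
    return lines[-1]


def synthesize(cmd):
    """Run a scripts/tts.py command (as an argv list) and return the audio path."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        text=True,
        check=True,                 # raise if the script exits non-zero
    )
    return last_stdout_line(result.stdout)
```

Taking the last non-empty line rather than the whole output keeps the caller robust if the script prints progress messages to stdout before the path.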

Performance

GPU (CUDA): ~1-3 seconds for short phrases
CPU: ~10-30 seconds for short phrases
Model size: ~1.7GB (auto-downloads on first run)
Venv size: ~500MB (installed dependencies)

Troubleshooting

Setup fails:

# Ensure Python 3.10-3.12 is available
python3.12 --version

# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
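The Python version requirement can also be checked programmatically before running setup. This is a minimal sketch under the assumption that the supported range is exactly 3.10 through 3.12 inclusive, as stated above; `is_supported` is a hypothetical helper and not part of the skill.

```python
import sys

# Supported range taken from the troubleshooting note above (inclusive).
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range.

    `version` is any tuple-like such as (3, 11, 4); it defaults to the
    running interpreter's version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX
```

Tuple comparison handles the range check directly, so patch releases such as 3.12.9 are accepted while 3.9 and 3.13 are rejected.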

Model download slow/fails:

# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
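When the script is launched from Python, the mirror can be set for the child process only, without changing the parent environment. The sketch below relies on the `HF_ENDPOINT` variable read by the Hugging Face Hub client; `mirror_env` is a hypothetical helper name.

```python
import os


def mirror_env(endpoint="https://hf-mirror.com", base=None):
    """Return a copy of the environment with HF_ENDPOINT pointing at a mirror.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env
```

The result can be passed as `env=mirror_env()` to `subprocess.run` so that only that invocation of `scripts/tts.py` downloads through the mirror.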

Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.

Audio quality issues:

Try a different speaker: --list-speakers
Add an instruction: -i "Speak clearly and slowly"
Check that the language matches the text: -l Italian for Italian text

Model Details

Model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Source: Hugging Face (https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)
License: Check the model card for current license terms
Sample Rate: 16kHz
Output format: WAV (uncompressed)
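The sample rate and format listed above can be verified on a generated file with Python's standard `wave` module, which reads uncompressed PCM WAV headers. This is a small sketch; `wav_info` is a hypothetical helper, and it reports whatever the file actually contains rather than assuming a particular rate.

```python
import wave


def wav_info(path):
    """Read basic header fields from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `wav_info("test.wav")` on a synthesized file shows the real sample rate and duration, which is useful when downstream tools expect a specific rate.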
