Qwen3-tts
v1.0.0
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). An alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after the initial model download.
Qwen TTS
Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, downloaded from Hugging Face.
Quick Start
Generate speech from text:
scripts/tts.py "Ciao, come va?" -l Italian -o output.wav
With voice instruction (emotion/style):
scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav
Different speaker:
scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav
Installation
First-time setup (one-time):
cd skills/public/qwen-tts
bash scripts/setup.sh
This creates a local virtual environment and installs the qwen-tts package (~500MB).
Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.
Usage
scripts/tts.py [options] "Text to speak"
Options
-o, --output PATH - Output file path (default: qwen_output.wav)
-s, --speaker NAME - Speaker voice (default: Vivian)
-l, --language LANG - Language (default: Auto)
-i, --instruct TEXT - Voice instruction (emotion, style, tone)
--list-speakers - Show available speakers
--model NAME - Model name (default: CustomVoice 1.7B)
Examples
Basic Italian speech:
scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav
With emotion/instruction:
scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav
Different speaker:
scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav
List available speakers:
scripts/tts.py --list-speakers
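When the script is driven from other Python code, the documented options can be assembled into an argument list instead of a hand-built shell string. The following is a minimal sketch, not part of the skill itself: the helper name `build_tts_command` is hypothetical, and it assumes the flags behave exactly as listed under Options.

```python
import shlex
from typing import List, Optional


def build_tts_command(
    text: str,
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    output: Optional[str] = None,
    script: str = "scripts/tts.py",
) -> List[str]:
    """Build an argv list for scripts/tts.py (hypothetical helper)."""
    cmd = [script, text]
    # Pass only the flags that were set, so the script's own defaults
    # (speaker Vivian, language Auto, qwen_output.wav) apply otherwise.
    if speaker is not None:
        cmd += ["-s", speaker]
    if language is not None:
        cmd += ["-l", language]
    if instruct is not None:
        cmd += ["-i", instruct]
    if output is not None:
        cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command("Ciao, come va?", language="Italian", output="output.wav")
    # Print the shell-quoted equivalent; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting problems with apostrophes and quotes in the text to be spoken.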
Available Speakers
The CustomVoice 模型 includes 9 premium voices:
| Speaker | Language | Description |
| --- | --- | --- |
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Uncle_Fu | Chinese | Seasoned male, low mellow timbre |
| Dylan | Chinese (Beijing) | Youthful Beijing male, clear |
| Eric | Chinese (Sichuan) | Lively Chengdu male, husky |
| Ryan | English | Dynamic male, rhythmic |
| Aiden | English | Sunny American male |
| Ono_Anna | Japanese | Playful female, light nimble |
| Sohee | Korean | Warm female, rich emotion |
Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
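The recommendation can be applied programmatically by looking up which speakers are native to the requested language. This is an illustrative sketch only: the `SPEAKERS` mapping is transcribed from the speaker list in this section, the function names are hypothetical, and falling back to Vivian simply mirrors the script's documented default speaker.

```python
from typing import Dict, List

# Speaker -> native language, transcribed from the speaker list above.
SPEAKERS: Dict[str, str] = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",  # Beijing dialect
    "Eric": "Chinese",   # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}


def native_speakers(language: str) -> List[str]:
    """Return the speakers whose native language matches (case-insensitive)."""
    wanted = language.strip().lower()
    return [name for name, lang in SPEAKERS.items() if lang.lower() == wanted]


def pick_speaker(language: str, default: str = "Vivian") -> str:
    """Prefer a native speaker; otherwise use the default voice."""
    matches = native_speakers(language)
    return matches[0] if matches else default
```

For languages without a native voice, such as Italian, any speaker can be used, so the choice comes down to the preferred timbre.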
Voice Instructions
Use -i, --instruct to control emotion, tone, and style:
Italian examples:
"Parla con entusiasmo"
"Tono serio e professionale"
"Voce calma e rilassante"
"Leggi come un narratore"
English examples:
"Speak with excitement"
"Very happy and energetic"
"Calm and soothing voice"
"Read like a narrator"
Integration with OpenClaw
The script prints the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:
# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null)
# OUTPUT = /tmp/audio.wav
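The same capture can be done from Python. The sketch below assumes only what is stated above, namely that the path is the last non-empty line on stdout; the function names are hypothetical and this is not OpenClaw's actual integration code.

```python
import subprocess
from typing import List


def last_stdout_line(stdout: str) -> str:
    """Return the last non-empty line of captured stdout."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output path found on stdout")
    return lines[-1]


def synthesize(cmd: List[str]) -> str:
    """Run a scripts/tts.py command and return the audio path it reports."""
    # stderr is captured separately, so progress or log messages written
    # there do not mix with the path on stdout.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return last_stdout_line(result.stdout)
```

Taking the last line rather than the whole captured string keeps the result usable even if the script writes other messages to stdout before the path.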
Performance
GPU (CUDA): ~1-3 seconds for short phrases
CPU: ~10-30 seconds for short phrases
Model size: ~1.7GB (auto-downloads on first run)
Venv size: ~500MB (installed dependencies)
Troubleshooting
Setup fails:
# Ensure Python 3.10-3.12 is available
python3.12 --version
# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
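The version requirement can also be checked from within Python before setup is re-run. This is a small sketch assuming the supported range is exactly 3.10 through 3.12 as stated above; `is_supported` is a hypothetical helper, not something the setup script provides.

```python
import sys
from typing import Optional, Sequence

MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Check whether a (major, minor, ...) version lies in the 3.10-3.12 range."""
    # Compare only major and minor, so patch releases such as 3.12.4 pass.
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {current} supported: {is_supported()}")
```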
Model download slow/fails:
# Use mirror (mainland China)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
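When the script is launched from Python rather than a shell, the mirror can be set for the child process only, leaving the parent environment untouched. A minimal sketch, with `mirror_env` as a hypothetical helper name; `HF_ENDPOINT` is the same variable exported above.

```python
import os
from typing import Dict, Mapping, Optional


def mirror_env(
    endpoint: str = "https://hf-mirror.com",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT set to the mirror."""
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env


# Example (not executed here):
#   subprocess.run(["scripts/tts.py", "Test", "-o", "test.wav"],
#                  env=mirror_env(), check=True)
```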
Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.
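The general shape of such a fallback can be sketched as follows. This is not the script's actual implementation, which is not shown here; it is a generic pattern that assumes out-of-memory failures surface as a `RuntimeError` whose message contains "out of memory" (as PyTorch's CUDA errors typically do). The callable `fn` and the device names are placeholders.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_with_fallback(
    fn: Callable[[str], T],
    devices: Sequence[str] = ("cuda", "cpu"),
) -> T:
    """Try fn(device) on each device in order, moving on after an OOM error."""
    last_error: Optional[BaseException] = None
    for device in devices:
        try:
            return fn(device)
        except RuntimeError as exc:
            # Retry on the next device only for memory errors; anything
            # else is a genuine failure and is re-raised unchanged.
            if "out of memory" not in str(exc).lower():
                raise
            last_error = exc
    raise RuntimeError("all devices ran out of memory") from last_error
```

Here `fn` would wrap model loading and synthesis for a given device, so a GPU that is too small costs one failed attempt before the slower CPU path runs.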
Audio quality issues:
Try a different speaker: --list-speakers
Add an instruction: -i "Speak clearly and slowly"
Check that the language matches the text: -l Italian for Italian text
Model Details
Model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Source: Hugging Face (https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)
License: Check the model card for current license terms
Sample Rate: 16kHz
Output Format: WAV (uncompressed)
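To confirm the sample rate and duration of a generated file, the WAV header can be read with Python's standard `wave` module. A minimal sketch, assuming the output is PCM-encoded WAV (the `wave` module does not read other encodings); `wav_info` is a hypothetical helper.

```python
import wave
from typing import Dict, Union


def wav_info(path: str) -> Dict[str, Union[int, float]]:
    """Read basic header fields from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


# Example (not executed here):
#   print(wav_info("qwen_output.wav"))
```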