iFlytek Ultra-Realistic TTS
v1

iFlytek Ultra-Realistic TTS (超拟人语音合成): synthesize natural, expressive speech from text using iFlytek's ultra-realistic voice synthesis API. Supports 50+ voices (male/female/child, Chinese/English/dialect), adjustable speed/volume/pitch, and mp3/pcm/opus output. Use when the user wants to convert text to speech, generate audio narration, or create voice content. Pure Python stdlib, no pip dependencies.
xfyun-tts
Synthesize natural, expressive speech from text using iFlytek's Ultra-Realistic Voice Synthesis (超拟人语音合成) WebSocket API. Features human-like breathing, pauses, and emotional expression across 50+ voices.
Setup

1. Create an app at the iFlytek console (讯飞控制台) with the Ultra-Realistic TTS (超拟人语音合成) service enabled.
2. Enable the desired voice(s) in the console (default: x5_lingyuzhao_flow / 聆玉昭).
3. Set the environment variables:

    export XFYUN_APP_ID="your_app_id"
    export XFYUN_API_KEY="your_api_key"
    export XFYUN_API_SECRET="your_api_secret"
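
Before running the script, it can help to confirm that all three variables are actually set. The following is a minimal sketch; the helper name `missing_credentials` is hypothetical and not part of the skill itself.

```python
import os

# The three variables named in the setup steps.
REQUIRED_VARS = ("XFYUN_APP_ID", "XFYUN_API_KEY", "XFYUN_API_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All iFlytek credentials are set.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to test in isolation.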
Usage

Basic synthesis

    python3 scripts/tts.py "你好,欢迎使用科大讯飞语音合成。"
    # → saves to output.mp3

Specify output file

    python3 scripts/tts.py "Hello, this is a test." --output hello.mp3

Use a different voice

    python3 scripts/tts.py "大家好" --voice x6_lingfeiyi_pro --output greeting.mp3

Read from file

    python3 scripts/tts.py --file article.txt --output article.mp3

Pipe from stdin

    echo "流式文本输入测试" | python3 scripts/tts.py --output speech.mp3

Adjust parameters

    python3 scripts/tts.py "语速快一点" --speed 70 --volume 80 --pitch 60

Output PCM format

    python3 scripts/tts.py "测试" --format pcm --sample-rate 16000 --output test.pcm

List all available voices

    python3 scripts/tts.py --list-voices
Options

| Flag | Short | Default | Description |
|---|---|---|---|
| text | | | Text to synthesize (positional) |
| --file | -f | | Read text from a file |
| --output | -o | output.mp3 | Output audio file path |
| --voice | -v | x5_lingyuzhao_flow | Voice name (vcn) |
| --format | | mp3 | Audio format: mp3, pcm, speex, opus |
| --sample-rate | | 24000 | Sample rate: 8000, 16000, 24000 |
| --speed | | 50 | Speed 0–100 (50=normal, 100=2x) |
| --volume | | 50 | Volume 0–100 (50=normal) |
| --pitch | | 50 | Pitch 0–100 (50=normal) |
| --bgs | | 0 | Background sound: 0=none, 1=bg1, 2=bg2 |
| --reg | | 0 | English pronunciation: 0=auto, 1=spell, 2=letter |
| --rdn | | 0 | Number reading: 0=auto, 1=value, 2=string, 3=string-prefer |
| --list-voices | | | Print voice list and exit |

Popular Voices

| VCN | Name | Gender | Language | Scene |
|---|---|---|---|---|
| x5_lingyuzhao_flow | 聆玉昭 | Female | Chinese | Interactive chat |
| x5_lingxiaotang_flow | 聆小糖 | Female | Chinese | Voice assistant |
| x6_lingfeiyi_pro | 聆飞逸 | Male | Chinese | Interactive chat |
| x6_lingxiaoli_pro | 聆小璃 | Female | Chinese | Interactive chat |
| x6_pangbainan1_pro | 旁白男声 | Male | Chinese | Narration |
| x6_pangbainv1_pro | 旁白女声 | Female | Chinese | Narration |
| x6_lingfeihan_pro | 聆飞瀚 | Male | Chinese | Documentary |
| x5_EnUs_Grant_flow | Grant | Female | English | Interactive chat |
| x5_EnUs_Lila_flow | Lila | Female | English | Interactive chat |
| x4_zijin_oral | 子津 | Male | Tianjin dialect | Interactive chat |
| x4_ziyang_oral | 子阳 | Male | Northeastern dialect | Interactive chat |
Run --list-voices for the complete list (50+ voices).
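
When calling the script from another Python program, the options above map naturally onto an argument list. The sketch below is one possible wrapper, not part of the skill: the names `build_tts_command` and `synthesize` are hypothetical, and `synthesize` assumes the script prints only the saved file path to stdout.

```python
import subprocess
import sys


def build_tts_command(text, output="output.mp3", voice="x5_lingyuzhao_flow",
                      speed=50, volume=50, pitch=50, script="scripts/tts.py"):
    """Build the argument list for scripts/tts.py, validating the 0-100 ranges."""
    for name, value in (("speed", speed), ("volume", volume), ("pitch", pitch)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return [
        sys.executable, script, text,
        "--output", output,
        "--voice", voice,
        "--speed", str(speed),
        "--volume", str(volume),
        "--pitch", str(pitch),
    ]


def synthesize(text, **options):
    """Run the script and return whatever it prints on stdout (assumed: the saved path)."""
    result = subprocess.run(build_tts_command(text, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the arguments as a list rather than a shell string avoids quoting problems with Chinese punctuation or embedded quotes in the text.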
Text Features

Silent pauses

Insert [p500] in text for a 500 ms pause:
你好[p500]科大讯飞
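
For longer passages, pause markers can be added programmatically. This is a small sketch, assuming a pause is wanted after each sentence-ending punctuation mark that is followed by more text; the helper `add_pauses` is hypothetical, and the accepted range of pause lengths is not stated here, so check it against the service documentation.

```python
import re

# One or more sentence-ending marks (Chinese or ASCII) followed by further text.
# The ASCII full stop is left out on purpose so decimals like 3.14 are untouched.
_SENTENCE_END = re.compile(r"([。！？!?]+)(?=\s*[^\s。！？!?])")


def add_pauses(text, ms=500):
    """Insert a [p<ms>] marker after each sentence end that has text after it."""
    if ms <= 0:
        raise ValueError("pause length must be positive")
    return _SENTENCE_END.sub(lambda m: f"{m.group(1)}[p{ms}]", text)


print(add_pauses("你好。科大讯飞"))  # 你好。[p500]科大讯飞
```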
Specify pronunciation
Use [=pinyin] after a character to force pronunciation:
着[=zhuo2]手
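
The same annotation can be applied automatically to every occurrence of a word with an ambiguous character. A minimal sketch, with a hypothetical helper name `force_pronunciation`:

```python
def force_pronunciation(text, word, char_index, pinyin):
    """Annotate one character of every occurrence of `word` with [=pinyin]."""
    if not 0 <= char_index < len(word):
        raise IndexError("char_index is outside the word")
    # Place the marker directly after the chosen character.
    annotated = word[:char_index + 1] + f"[={pinyin}]" + word[char_index + 1:]
    return text.replace(word, annotated)


print(force_pronunciation("我们着手准备", "着手", 0, "zhuo2"))  # 我们着[=zhuo2]手准备
```

Matching on the whole word rather than the single character keeps other uses of the same character (with a different reading) unchanged.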
Notes

- Endpoint: wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6
- Protocol: WebSocket (RFC 6455) with HMAC-SHA256 signed URL auth
- Text limit: max 64KB total per session
- Session timeout: 60 seconds
- Text input speed: must exceed 15 chars/sec for streaming (not relevant for single-shot mode)
- No pip dependencies: uses a built-in minimal WebSocket client on pure Python stdlib
- Env vars: XFYUN_APP_ID, XFYUN_API_KEY, XFYUN_API_SECRET
- Output: prints the absolute path of the saved audio to stdout (for easy piping to other tools)
- x4 series voices (x4_*_oral) support oral-style (口语化) configuration parameters; x5/x6 voices do not
- Voices must be enabled in the console before use
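
To illustrate what HMAC-SHA256 signed URL auth can look like, here is a sketch using only the standard library. The layout of the signing string and the `authorization`, `date`, and `host` query parameters follow the scheme commonly used by iFlytek WebSocket APIs; that layout is an assumption here and may differ from what scripts/tts.py actually does. The function name `build_signed_url` is hypothetical.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import urlencode, urlparse

ENDPOINT = "wss://cbm01.cn-huabei-1.xf-yun.com/v1/private/mcd9m97e6"


def build_signed_url(api_key, api_secret, endpoint=ENDPOINT, date=None):
    """Return the endpoint with HMAC-SHA256 auth query parameters appended."""
    parsed = urlparse(endpoint)
    host, path = parsed.netloc, parsed.path
    # RFC 1123 date in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    date = date or formatdate(usegmt=True)
    # Assumed signing string: host, date, and the HTTP request line.
    signing_string = f"host: {host}\ndate: {date}\nGET {path} HTTP/1.1"
    digest = hmac.new(api_secret.encode("utf-8"),
                      signing_string.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # Assumed authorization layout, base64-encoded before being sent.
    authorization = (
        f'api_key="{api_key}", algorithm="hmac-sha256", '
        f'headers="host date request-line", signature="{signature}"'
    )
    query = urlencode({
        "authorization": base64.b64encode(authorization.encode("utf-8")).decode("ascii"),
        "date": date,
        "host": host,
    })
    return f"{endpoint}?{query}"
```

Because the date is part of the signed string, a URL built this way is only valid for a short window, so it should be generated immediately before opening the connection.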