Moark TTS
v0.5
Text-to-Speech (TTS) and voice-feature skill for Gitee AI. It lets the user choose audiofly, chattts, cosyvoice2, cosyvoice3, cosyvoice-300m, fish-speech-1.2-sft, index-tts-1.5, index-tts-2, glm-tts, megatts3, moss-ttsd-v0.5, qwen-tts, spark-tts-0.5b, step-audio-tts-3b, or vibevoice-large, then fills in only the model-specific parameters for speech synthesis or voice-feature extraction, including multi-item Qwen3-TTS inputs with built-in or custom voices.
Text-to-Speech (TTS)
This skill supports Gitee AI TTS plus CosyVoice voice-feature extraction workflows. It offers fifteen user-facing model choices for TTS:
audiofly chattts cosyvoice2 cosyvoice3 cosyvoice-300m fish-speech-1.2-sft index-tts-1.5 index-tts-2 glm-tts megatts3 moss-ttsd-v0.5 qwen-tts spark-tts-0.5b step-audio-tts-3b vibevoice-large
When the user does not specify a model, ask them to choose one. Once a model is chosen, ask only for the parameters that are relevant to that model.
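The rule of asking only for model-relevant parameters can be sketched as a lookup table. This is a hypothetical, partial mapping inferred from the option descriptions in this document; the names `MODELS`, `MODEL_OPTIONS`, `relevant_options`, and `filter_options` are illustrative and are not taken from `perform_tts.py`. Models whose parameters the options list does not spell out map to an empty tuple, and generic options such as `--temperature`, `--top-p`, `--top-k`, `--gender`, `--pitch`, and `--voice` are left out because the list does not tie them to specific models.

```python
# Hypothetical, partial mapping inferred from the option descriptions in this
# document; it is NOT read from perform_tts.py.
MODELS = (
    "audiofly", "chattts", "cosyvoice2", "cosyvoice3", "cosyvoice-300m",
    "fish-speech-1.2-sft", "index-tts-1.5", "index-tts-2", "glm-tts",
    "megatts3", "moss-ttsd-v0.5", "qwen-tts", "spark-tts-0.5b",
    "step-audio-tts-3b", "vibevoice-large",
)

# Model-specific options named in the Options list (--text is needed in
# general, except in Qwen3-TTS multi-input mode).
MODEL_OPTIONS = {
    "audiofly": ("num-inference-steps", "guidance-scale", "output-format"),
    "chattts": ("prompt", "voice-url"),
    "cosyvoice2": ("prompt-wav-url", "instruct-text", "seed"),
    "cosyvoice3": ("prompt-wav-url", "instruct-text", "seed", "speed"),
    "fish-speech-1.2-sft": ("voice-url",),
    "index-tts-2": ("emo-audio-prompt-url", "emo-alpha", "emo-text", "use-emo-text"),
    "megatts3": ("prompt-language", "intelligibility-weight", "similarity-weight"),
    "moss-ttsd-v0.5": (
        "audio-mode", "prompt-audio-single-url", "prompt-text-single",
        "prompt-audio-1-url", "prompt-text-1", "prompt-audio-2-url",
        "prompt-text-2", "use-normalize",
    ),
    "qwen-tts": (
        "qwen-inputs-json", "speaker", "language", "instruction",
        "speed", "output-format",
    ),
    "spark-tts-0.5b": ("speed",),
    "vibevoice-large": ("prompt-audio-urls",),
}


def relevant_options(model: str) -> tuple[str, ...]:
    """Return the options worth asking about for `model` (empty if none are documented)."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; ask the user to choose one of {MODELS}")
    return MODEL_OPTIONS.get(model, ())


def filter_options(model: str, provided: dict[str, object]) -> dict[str, object]:
    """Keep only the user-supplied options that apply to the chosen model."""
    allowed = set(relevant_options(model))
    return {name: value for name, value in provided.items() if name in allowed}


if __name__ == "__main__":
    print(relevant_options("cosyvoice2"))
    print(filter_options("cosyvoice2", {"seed": 42, "emo-alpha": 0.5}))
```

A table keeps the "ask only what is relevant" rule in one place, so adding a model means adding one entry rather than another branch of conditionals.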
Usage
Use the bundled script to generate speech:
python {baseDir}/scripts/perform_tts.py --model cosyvoice2 --text "你好,我是模力方舟。" --voice alloy --api-key YOUR_API_KEY
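When the command is built from code rather than typed by hand, an argv list avoids shell-quoting problems with Chinese text and JSON-valued options. The helper below is a hypothetical sketch, not part of the skill: it assumes flags follow the `--kebab-case` names in the options list, that true/false options take the literal strings `true`/`false`, and that JSON-valued options such as `--prompt-audio-urls`, `--qwen-inputs-json`, and `--extra-body-json` take a JSON string.

```python
import json
import shlex


def build_tts_argv(base_dir: str, model: str, api_key: str, **options: object) -> list[str]:
    """Assemble an argv list for perform_tts.py (hypothetical helper, not part of the skill).

    Keyword names use underscores and become --kebab-case flags (emo_alpha -> --emo-alpha).
    Booleans become "true"/"false", lists and dicts become JSON strings,
    and None values are skipped, so the corresponding flag is omitted.
    """
    argv = [
        "python", f"{base_dir}/scripts/perform_tts.py",
        "--model", model,
        "--api-key", api_key,
    ]
    for name, value in options.items():
        if value is None:
            continue  # option not supplied by the user
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, dict)):
            value = json.dumps(value, ensure_ascii=False)
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


if __name__ == "__main__":
    argv = build_tts_argv(
        "/path/to/skill", "cosyvoice2", "YOUR_API_KEY",
        text="你好,我是模力方舟。", voice="alloy",
    )
    print(shlex.join(argv))  # shell-quoted preview of the command
```

The returned list can be passed directly to `subprocess.run(argv, check=True)`; `--text` is passed as an ordinary keyword so it can be left out in Qwen3-TTS multi-input mode.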
For CosyVoice-300M voice-feature extraction (voice-cloning preparation), use:
python {baseDir}/scripts/perform_voice_feature_extraction.py --model FunAudioLLM-CosyVoice-300M --prompt "提供用于声纹提取的提示文本" --file-url "https://example.com/sample.mp3" --api-key YOUR_API_KEY
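The extraction script takes its sample through `--file-url`, which accepts a URL only, so a caller may want to reject local paths before invoking it. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and restricting the URL to the `http`/`https` schemes is an assumption rather than documented behavior of `perform_voice_feature_extraction.py`.

```python
from urllib.parse import urlparse

# Default model stated in the perform_voice_feature_extraction.py option list.
DEFAULT_VOICE_FEATURE_MODEL = "FunAudioLLM-CosyVoice-300M"


def is_remote_url(value: str) -> bool:
    """True for http(s) URLs with a host; False for local paths or other schemes."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_voice_feature_argv(
    base_dir: str,
    prompt: str,
    file_url: str,
    api_key: str,
    model: str = DEFAULT_VOICE_FEATURE_MODEL,
) -> list[str]:
    """Assemble an argv list for perform_voice_feature_extraction.py (hypothetical helper)."""
    if not is_remote_url(file_url):
        # --file-url is URL-only, so fail early instead of sending a local path.
        raise ValueError(f"--file-url must be an http(s) URL, got {file_url!r}")
    return [
        "python", f"{base_dir}/scripts/perform_voice_feature_extraction.py",
        "--model", model,
        "--prompt", prompt,
        "--file-url", file_url,
        "--api-key", api_key,
    ]


if __name__ == "__main__":
    print(build_voice_feature_argv(
        "/path/to/skill", "提供用于声纹提取的提示文本",
        "https://example.com/sample.mp3", "YOUR_API_KEY",
    ))
```

If the user only has a local recording, it has to be uploaded somewhere reachable first; the sketch deliberately does not try to guess a hosting location.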
Options
--model required: one of audiofly, chattts, cosyvoice2, cosyvoice3, cosyvoice-300m, fish-speech-1.2-sft, index-tts-1.5, index-tts-2, glm-tts, megatts3, moss-ttsd-v0.5, qwen-tts, spark-tts-0.5b, step-audio-tts-3b, or vibevoice-large
--text required in general: text to synthesize. In Qwen3-TTS multi-input mode (--qwen-inputs-json), --text is optional
--mode optional: auto, sync, or async
--prompt optional: model-specific style prompt, such as ChatTTS tags
--prompt-text optional: reference transcript for style-conditioned models
--prompt-audio-url optional: reference audio URL for style-conditioned models
--qwen-inputs-json optional: structured Qwen3-TTS inputs as JSON (array or object); supports mixing built-in and custom voice items
--speaker optional: Qwen3-TTS built-in speaker for a single input (Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee)
--language optional: Qwen3-TTS language for a single input (Chinese or English)
--instruction optional: Qwen3-TTS style instruction for a single input
--prompt-audio-urls optional: vibevoice-large reference audio; accepts one URL or a JSON array string such as ["https://a.wav","https://b.wav"]
--emo-audio-prompt-url optional: emotion reference audio URL for IndexTTS-2
--emo-alpha optional: emotion mixing weight for IndexTTS-2 audio emotion control
--emo-text optional: emotion control text for IndexTTS-2
--use-emo-text optional: enable or disable emo_text for IndexTTS-2 (true/false)
--prompt-wav-url optional: reference prompt WAV URL for CosyVoice2 or CosyVoice3
--voice-url optional: reference voice audio URL for ChatTTS or fish-speech-1.2-sft cloning
--instruct-text optional: model-specific instruction text, such as speaking-style guidance for CosyVoice2 or CosyVoice3
--seed optional: model-specific seed value, for example for CosyVoice2 or CosyVoice3
--audio-mode optional: single or role for moss-ttsd-v0.5 (required when the mode cannot be inferred from the other fields)
--prompt-audio-single-url optional: single-speaker reference audio URL for moss-ttsd-v0.5 single mode
--prompt-text-single optional: single-speaker reference transcript for moss-ttsd-v0.5 single mode
--prompt-audio-1-url optional: speaker-1 reference audio URL for moss-ttsd-v0.5 role mode
--prompt-text-1 optional: speaker-1 reference transcript for moss-ttsd-v0.5 role mode
--prompt-audio-2-url optional: speaker-2 reference audio URL for moss-ttsd-v0.5 role mode
--prompt-text-2 optional: speaker-2 reference transcript for moss-ttsd-v0.5 role mode
--use-normalize optional: enable or disable use_normalize for moss-ttsd-v0.5 (true/false)
--prompt-language optional: prompt language hint for models such as MegaTTS3
--intelligibility-weight optional: pronunciation intelligibility weight for models such as MegaTTS3
--similarity-weight optional: timbre similarity weight for models such as MegaTTS3
--temperature optional: model-specific sampling temperature
--top-p optional: model-specific top-p sampling value
--top-k optional: model-specific top-k sampling value
--gender optional: gender hint for async TTS
--pitch optional: pitch hint for async TTS
--speed optional: speed hint for async TTS (for example CosyVoice3, Spark-TTS-0.5B, or Qwen3-TTS)
--num-inference-steps optional: AudioFly generation step count
--guidance-scale optional: AudioFly classifier-free guidance scale
--output-format optional: AudioFly or Qwen3-TTS output format, such as mp3 or wav
--voice optional: OpenAI-compatible voice field, when the target model supports it
--extra-body-json optional: JSON object for undocumented fields that the user explicitly requests
--response-data-format optional: url or blob for sync TTS
--output optional: output file path when sync TTS returns binary audio
--failover-enabled optional: value of the X-Failover-Enabled request header; defaults to true
perform_voice_feature_extraction.py options: --prompt, --file-url (URL only), --model (default FunAudioLLM-CosyVoice-300M), --failover-enabled, --output, --api-key
Workflow
Determine whether the user wants speech synthesis or CosyVoice voice-feature extraction.
For speech synthesis: ask the user to choose on