SenseAudio Video Narrator
Create professional narration audio for videos with timing-aware segmentation, natural delivery, and editor-friendly exports.

What This Skill Does

- Generate narration audio synchronized to script timestamps
- Match narration style to video genre, such as documentary or tutorial
- Control pacing with official TTS parameters and text break markers
- Create multiple narration takes with different voices or styles
- Export audio segments and merged narration tracks for editing workflows

Credential and Dependency Rules

- Read the API key from SENSEAUDIO_API_KEY.
- Send the key only in the header Authorization: Bearer <API key>.
- Do not place API keys in query parameters, logs, or saved examples.
- If Python helpers are used, this skill expects python3, requests, and pydub.
- pydub is used only for optional local audio assembly and mixing.

Official TTS Constraints

Use the official SenseAudio TTS rules summarized below:

- HTTP endpoint: POST https://api.senseaudio.cn/v1/t2a_v2
- Model: SenseAudio-TTS-1.0
- Max text length per request: 10000 characters
- voice_setting.voice_id is required
- voice_setting.speed range: 0.5-2.0
- voice_setting.pitch range: -12 to 12
- Optional audio formats: mp3, wav, pcm, flac
- Optional sample rates: 8000, 16000, 22050, 24000, 32000, 44100
- Optional MP3 bitrates: 32000, 64000, 128000, 256000
- Optional channels: 1 or 2
- extra_info.audio_length returns segment duration in milliseconds
- Inline break markup is supported in text

Recommended Workflow

1. Prepare the script:
   - Split narration into timestamped segments.
   - Keep each segment comfortably below the 10000 character limit.
2. Choose a voice and pacing profile:
   - Pick a voice_id and tune speed, pitch, and optional vol.
   - Use shorter segments when timing precision matters.
3. Generate audio segments:
   - Call the TTS API for each segment.
   - Decode data.audio from hex before saving.
   - Capture extra_info.audio_length for timeline metadata.
4. Assemble the narration track locally:
   - Use pydub to position clips on a silent master track.
   - Keep per-segment files for easier editor import and retiming.
5. Validate timing against the video:
   - Leave small gaps when natural pacing is needed.
   - Adjust segment boundaries instead of overusing extreme speed values.

Minimal Timed Narration Helper

import binascii
import os
import re
import requests

# The key is read from the environment and never hardcoded.
API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def parse_timed_script(script):
    # Each segment starts with a [HH:MM:SS] marker and runs until the next marker or the end of the text.
    pattern = r"\[(\d{2}):(\d{2}):(\d{2})\]\s(.+?)(?=\n\[|\Z)"
    segments = []
    for match in re.finditer(pattern, script, re.DOTALL):
        hours, minutes, seconds, text = match.groups()
        # Convert the marker to a start offset in milliseconds.
        timestamp_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000
        segments.append({"timestamp": timestamp_ms, "text": text.strip()})
    return segments


def synthesize_segment(text, voice_id, speed=1.0, pitch=0, vol=1.0):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,
                "pitch": pitch,
                "vol": vol,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    return {
        # data.audio arrives hex-encoded; decode it to raw bytes before saving.
        "audio_bytes": binascii.unhexlify(data["data"]["audio"]),
        # Segment duration in milliseconds, used for timeline metadata.
        "duration_ms": data["extra_info"]["audio_length"],
        "trace_id": data.get("trace_id"),
    }

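The limits listed earlier (text length, speed range, pitch range) can be checked locally before a request is sent, which avoids spending a call on input that is out of range. The following is a minimal sketch, not part of the SenseAudio API: validate_tts_params and split_text are hypothetical helper names, and splitting on sentence-ending punctuation is an assumption about how a script would best be divided.

```python
import re

MAX_TEXT_CHARS = 10000        # max text length per request
SPEED_RANGE = (0.5, 2.0)      # voice_setting.speed
PITCH_RANGE = (-12, 12)       # voice_setting.pitch


def validate_tts_params(text, speed=1.0, pitch=0):
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
        problems.append(f"speed {speed} is outside {SPEED_RANGE}")
    if not PITCH_RANGE[0] <= pitch <= PITCH_RANGE[1]:
        problems.append(f"pitch {pitch} is outside {PITCH_RANGE}")
    return problems


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Assumption: sentences end with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Calling validate_tts_params on each segment before synthesis, and split_text only on segments that exceed the limit, keeps the timestamped structure of the script intact for everything else.
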
Local Assembly Pattern

from pydub import AudioSegment


def create_synced_narration(audio_segments, video_duration_ms):
    # Start from a silent master track as long as the video.
    narration_track = AudioSegment.silent(duration=video_duration_ms)
    for segment in audio_segments:
        clip = AudioSegment.from_file(segment["file"])
        # Place each clip at its script timestamp (milliseconds).
        narration_track = narration_track.overlay(clip, position=segment["timestamp"])
    return narration_track

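Because overlay mixes audio rather than pushing later clips back, a clip that runs past the start of the next one produces two voices at once. The sketch below checks for that before assembly and also serializes timing metadata. It assumes each segment dict carries the script timestamp plus the duration_ms value captured from extra_info.audio_length; find_overlaps, timing_metadata_json, and the min_gap_ms parameter are hypothetical names introduced for illustration.

```python
import json


def find_overlaps(segments, video_duration_ms, min_gap_ms=0):
    """Report segments that run into the next segment or past the end of the video.

    Each segment is a dict with "timestamp" (start, ms) and "duration_ms".
    """
    ordered = sorted(segments, key=lambda s: s["timestamp"])
    issues = []
    for index, segment in enumerate(ordered):
        end_ms = segment["timestamp"] + segment["duration_ms"]
        if index + 1 < len(ordered):
            # Must finish before the next segment starts, minus the desired gap.
            limit_ms = ordered[index + 1]["timestamp"] - min_gap_ms
        else:
            # The last segment must finish before the video ends.
            limit_ms = video_duration_ms
        if end_ms > limit_ms:
            issues.append({
                "timestamp": segment["timestamp"],
                "overrun_ms": end_ms - limit_ms,
            })
    return issues


def timing_metadata_json(segments):
    """Serialize start, duration, and end of each segment for editor import."""
    rows = [
        {
            "start_ms": s["timestamp"],
            "duration_ms": s["duration_ms"],
            "end_ms": s["timestamp"] + s["duration_ms"],
        }
        for s in sorted(segments, key=lambda s: s["timestamp"])
    ]
    return json.dumps(rows, indent=2)
```

A reported overrun is usually better resolved by trimming the script or moving the segment boundary than by raising speed, in line with the workflow guidance on avoiding extreme speed values.
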
Style Presets

- Documentary: slower speed such as 0.95, neutral pitch
- Tutorial: speed near 1.0, slightly warmer pitch
- Commercial: modestly faster speed, slightly higher pitch

Prefer conservative tuning and script editing over extreme voice parameter changes.

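One way to keep presets consistent across takes is a small lookup table that feeds the voice_setting payload. This is only a sketch: the speed values 0.95 and 1.0 come from the list above, while every other number is an assumed placeholder to be tuned by ear, and voice_setting_for and the "demo_voice" id are hypothetical names.

```python
# Illustrative presets. Only 0.95 and 1.0 come from the preset list; the
# remaining numbers are assumptions, not documented recommendations.
STYLE_PRESETS = {
    "documentary": {"speed": 0.95, "pitch": 0},
    "tutorial": {"speed": 1.0, "pitch": -1},    # assumption: "warmer" read as slightly lower
    "commercial": {"speed": 1.1, "pitch": 1},   # assumption: "modestly faster", "slightly higher"
}


def voice_setting_for(style, voice_id, vol=1.0):
    """Build a voice_setting dict for a preset, clamped to the documented ranges."""
    preset = STYLE_PRESETS[style.lower()]
    return {
        "voice_id": voice_id,
        # Clamp to speed 0.5-2.0 and pitch -12 to 12 in case a preset is edited.
        "speed": min(max(preset["speed"], 0.5), 2.0),
        "pitch": min(max(preset["pitch"], -12), 12),
        "vol": vol,
    }


# "demo_voice" is a placeholder, not a real voice_id.
settings = voice_setting_for("Documentary", "demo_voice")
```

Keeping the values close to neutral, as here, leaves room to fix pacing through script edits first.
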
Output Options

- Per-segment narration clips in mp3 or wav
- Timing metadata in json
- Merged narration track for video editors
- Optional alternate takes with different styles

Safety Notes

- Do not hardcode credentials.
- Do not assume local media tooling exists beyond what is declared here.
- Treat returned trace_id and generated narration assets as potentially sensitive production data.