Speech Video
v1.0.0
Skip the learning curve of professional editing software. Describe what you want, for example "turn my speech recording into a video with visuals and captions", and get speech-driven videos back in 1-2 minutes. Upload MP4, MOV, MP3, or WAV files up to 500MB, and the AI handles speech-to-video generation automatically. Ideal for presenters, educators, and content creators who want to turn a speech or voice recording into a watchable video without manual editing.
Getting Started
Share your audio or video and I'll get started on speech-to-video generation. Or just tell me what you're thinking.
Try saying:
"convert my audio or video"
"export 1080p MP4"
"turn my speech recording into a"
Getting Connected
Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
1. Generate a UUID as the client identifier.
2. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header.
3. The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN.
Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
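The connection steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the response field names `token` and `session_id` are taken from the prose, but the surrounding response envelope is not documented here, so the hypothetical helper `find_key` simply searches the decoded JSON for them. `post_json` and `connect` are also illustrative names, and the attribution headers that every call needs are left out for brevity.

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def post_json(url, headers, body=None):
    """POST an optional JSON body and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method="POST")
    for name, value in headers.items():
        req.add_header(name, value)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def find_key(payload, key):
    """Search nested dicts/lists for `key` (the envelope shape is an assumption)."""
    if isinstance(payload, dict):
        if key in payload:
            return payload[key]
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def connect(env=os.environ, post=post_json):
    """Return (token, session_id), acquiring an anonymous token if needed."""
    token = env.get("NEMO_TOKEN")
    if not token:
        # No token in the environment: request a free starter token.
        client_id = str(uuid.uuid4())
        resp = post(f"{API_BASE}/api/auth/anonymous-token",
                    {"X-Client-Id": client_id})
        token = find_key(resp, "token")
    session = post(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        {"Authorization": f"Bearer {token}"},
        {"task_name": "project", "language": "en"},
    )
    return token, find_key(session, "session_id")
```

Passing `post` as a parameter keeps the flow testable without network access; in normal use, `connect()` with no arguments reads NEMO_TOKEN from the process environment.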
Speech Video: Convert Speech Into Shareable Video
Send me your audio or video and describe the result you want. The speech-to-video generation runs on remote GPU nodes, so there is nothing to install on your machine.
A quick example: upload a 2-minute recorded speech or voice memo, type "turn my speech recording into a video with visuals and captions", and you'll get a 1080p MP4 back in roughly 1-2 minutes. All rendering happens server-side.
Worth noting: cleaner audio with less background noise produces more accurate captions and better sync.
Matching Input to Actions
User prompts referencing speech video, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.
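A minimal sketch of the keyword half of this routing, using the trigger phrases from this section's routing table. The function name `route`, the action labels, and the precedence order (an attached file wins, then the first matching keyword group) are assumptions for illustration; the intent classification the text mentions is not attempted here.

```python
# Keyword groups in the order they are checked; each maps to an action label.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message, has_file=False):
    """Pick an action label for a user message; anything unmatched goes to SSE."""
    if has_file:
        # A user sending a file is treated as an upload regardless of wording.
        return "upload"
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    # Generate, edit, add BGM and similar requests go through the SSE endpoint.
    return "sse"
```

Substring matching is deliberately crude: a message such as "do not export yet" would still route to export, which is why a real router would pair it with intent classification.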
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |

Cloud Render Pipeline Details
Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.
Every API call needs Authorization: Bearer plus the three attribution headers (X-Skill-Source, X-Skill-Version, X-Skill-Platform). If any header is missing, exports return 402.
Headers are derived from this file's YAML frontmatter. X-Skill-Source is speech-video, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).
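The header derivation can be sketched as below. The function names `detect_platform` and `attribution_headers` are hypothetical, and the exact casing of the platform values is an assumption (the path check is case-insensitive for that reason). The version string would come from the frontmatter's version field; here it is simply passed in.

```python
from pathlib import PurePosixPath


def detect_platform(install_path):
    """Map an install path to a platform label, falling back to 'unknown'."""
    parts = [part.lower() for part in PurePosixPath(install_path).parts]
    if ".clawhub" in parts:
        return "clawhub"
    # Cursor installs live under a ".cursor/skills" directory pair.
    for current, following in zip(parts, parts[1:]):
        if current == ".cursor" and following == "skills":
            return "cursor"
    return "unknown"


def attribution_headers(token, version, install_path):
    """Build the Authorization header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "speech-video",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building all four headers in one place makes it harder to send a request with one missing, which the text says causes exports to return 402.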
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":""}; returns task_id, session_id.
Send message (SSE): POST /run_sse with body {"app_name":"nemo_agent","user_id":"me","session_id":"","new_message":{"parts":[{"text":""}]}} and Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/ with either a file (multipart -F "files=@/path") or a URL ({"urls":[""],"source_type":"url"}).
Credits: GET /api/credits/balance/simple; returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me//latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda with body {"id":"render_","sessionId":"","draft":,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/ every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
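The export polling step in the endpoint list above can be sketched as a small loop. This assumes the poll response is a JSON object with a top-level `status` field and the download link at `output.url`, as the export entry suggests. `fetch_status` is a hypothetical callable that performs the GET for a given render id, and `max_polls` is an illustrative cap that the source does not specify. Failure statuses are not documented here, so the sketch only distinguishes completed from not yet completed.

```python
import time


def wait_for_export(fetch_status, render_id, interval_s=30, max_polls=20,
                    sleep=time.sleep):
    """Poll a render job until it completes and return its download URL."""
    for attempt in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        # Wait between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

Injecting `sleep` and `fetch_status` keeps the loop easy to exercise without waiting on a real render.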
Error Codes
0: success, continue normally
1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
1002: session not found; create a new one
2001: out of credits; anonymous users get a registration link with ?bind=, registered users top up
4001: unsupported file type; show accepted formats
4002: file too large; suggest compressing or trimming
400: missing X-Client-Id; generate one and retry
402: free plan export blocked; this is a subscription tier restriction, not a credit issue
429: rate limited; wait 30s and retry once
Translating GUI Instructions
The backend responds as if there's a visual interface. Map its instructions to API calls:
"click" or "点击" → execute the action via the relevant endpoint
"open" or "打开" → query session state to get the data
"drag/drop" or "拖拽" → send the edit command through SSE
"preview in timeline" → show a text summary of current tracks
"export" or "导出" → run the export workflow
Reading the SSE Stream
Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still workin