🎙️ Audio To Subtitle: Skill Tool

v1.0.0

Turn a 3-minute podcast audio clip into 1080p captioned videos just by typing what you need. Whether it's adding subtitles to audio recordings or videos or q...

by @vcarolxhberger·MIT-0
License
MIT-0
Last updated
2026/4/18
Security scan
VirusTotal
Pending
OpenClaw
Safe
high confidence
The skill's requests and runtime instructions line up with its stated purpose (upload audio to a cloud render/transcribe service) and it only asks for a single service token; nothing requested appears disproportionate or unrelated.
Security is layered; review the code before running.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest: v1.0.0 (2026/4/18)

Audio to Subtitle: Initial Release

  • Launches the ability to convert audio or video files into 1080p captioned videos via a streamlined cloud pipeline.
  • Automated setup with anonymous token generation and session management; notifies the user when ready.
  • Supports audio uploads, subtitle generation, and export in under a minute for typical clips.
  • Handles user prompts for export, credits, status, and more through keyword and intent matching.
  • Enforces strict attribution headers and displays clear error messages for credits, file types, and other common issues.
  • Provides robust handling of SSE events and GUI-like instruction mapping for smooth, conversational workflows.

Pending

Install command

Official: npx clawhub@latest install audio-to-subtitle
🇨🇳 Mirror (faster in China): npx clawhub@latest install audio-to-subtitle --registry https://cn.longxiaskill.com

Skill documentation

Getting Started

Share your audio files and I'll get started on subtitle generation. Or just tell me what you're thinking.

Try saying:

  • "convert my audio files"
  • "export 1080p MP4"
  • "transcribe the audio and add subtitles"

Quick Start Setup

This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Generate a UUID as the client identifier
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header
  • Extract data.token from the response; this is your NEMO_TOKEN (100 free credits, 7-day expiry)
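The token check above can be sketched in Python as follows. This is a minimal sketch, not the skill's own code: it assumes the localized URL reads `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token`, that the header is spelled `X-Client-Id`, and that the response is JSON with the token at `data.token`. The function names are hypothetical.

```python
import json
import os
import urllib.request
import uuid

# Assumed spelling of the base URL after undoing the page's localization damage.
BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the anonymous-token request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def extract_token(body: bytes) -> str:
    """Read data.token from the JSON response body."""
    return json.loads(body)["data"]["token"]


def get_token() -> str:
    """Prefer NEMO_TOKEN from the environment; otherwise request an anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    request = build_token_request(str(uuid.uuid4()))
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_token(response.read())
```

Keeping request building and response parsing separate from the network call makes both easy to check without contacting the backend.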

Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
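A hedged sketch of the session request described above. The endpoint path is the assumed de-localized spelling (`/api/tasks/me/with-session/nemo_agent`), and `build_session_request` is a hypothetical helper. The location of session_id inside the response is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Assumed spelling of the base URL after undoing the page's localization damage.
BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_session_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the session-creation request with Bearer auth."""
    payload = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```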

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

# Audio to Subtitle: Convert Audio into Captioned Videos

This tool takes your audio files and runs subtitle generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.

Say you have a 3-minute podcast audio clip and want to transcribe the audio and add subtitles in English: the backend processes it in about 30-60 seconds and hands you a 1080p MP4.

Tip: Cleaner audio with less background noise produces more accurate subtitles.

Matching Input to Actions

User prompts referencing audio to subtitle, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | |
| "status" / "状态" / "show tracks" | → §3.4 Status | |
| "upload" / "上传" / user sends file | → §3.2 Upload | |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | |
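The routing table can be approximated with a small keyword matcher. This is only a sketch: the source describes "keyword and intent classification", and plain substring matching stands in for that here. It also assumes the first keyword in each duplicated pair was originally the English word (export, status, upload). `ROUTES` and `route` are hypothetical names.

```python
# Ordered (keywords, action) pairs; the first match wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "status"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_attachment: bool = False) -> str:
    """Return the action name for a user message; fall back to the SSE path."""
    if has_attachment:
        # A message that carries a file is treated as an upload.
        return "upload"
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"
```

For example, `route("Export 1080p MP4")` yields `"export"`, while `route("add BGM")` falls through to `"sse"`.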

Cloud Render Pipeline Details

Each export job queues on a cloud GPU node that composites video layers, applies platform-specific compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.

Base URL: https://mega-api-prod.nemovideo.ai

| Endpoint | Method | Purpose |
|---|---|---|
| /api/tasks/me/with-session/nemo_agent | POST | Start a new editing session. Body: {"task_name":"project","language":""}. Returns session_id. |
| /run_sse | POST | Send a user message. Body includes app_name, session_id, new_message. Stream the response with Accept: text/event-stream. Timeout: 15 min. |
| /api/upload-video/nemo_agent/me/ | POST | Upload a file (multipart) or URL. |
| /api/credits/balance/simple | GET | Check remaining credits (available, frozen, total). |
| /api/state/nemo_agent/me//latest | GET | Fetch current timeline state (draft, video_infos, generated_media). |
| /api/render/proxy/lambda | POST | Start export. Body: {"id":"render_","sessionId":"","draft":,"output":{"format":"mp4","quality":"high"}}. Poll status every 30s. |

Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
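The export step (start a render, then poll every 30 seconds) might look roughly like this. It is a sketch under assumptions: the body keys are the assumed de-localized spellings (`sessionId`, `draft`, `output`, `format`, `quality`). The source shows only the `render_` prefix of the job id, so the caller supplies the full id. The polling endpoint and the "finished" condition are not documented here, so both are passed in as callables.

```python
import json
import time
from typing import Any, Callable


def build_export_body(render_id: str, session_id: str, draft: Any) -> bytes:
    """Serialize the export request body; render_id is chosen by the caller."""
    body = {
        "id": render_id,
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }
    return json.dumps(body).encode("utf-8")


def poll_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 30.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until is_done(result) is true, sleeping between attempts."""
    for attempt in range(max_polls):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError("render did not finish within the polling budget")
```

Injecting `sleep` keeps the helper testable without real waiting.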

Headers are derived from this file's YAML frontmatter. X-Skill-Source is audio-to-subtitle, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).

All requests must include: Authorization (Bearer token), X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
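A minimal sketch of assembling those headers. The header names and platform values (`X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`, `clawhub`, `cursor`) are the assumed de-localized spellings, and both function names are hypothetical. Reading the version from the YAML frontmatter is left to the caller.

```python
def detect_platform(install_path: str) -> str:
    """Map the install path to a platform label, defaulting to 'unknown'."""
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str) -> dict:
    """Return the auth and attribution headers required on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "audio-to-subtitle",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```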

Error Codes

  • 0: success, continue normally
  • 1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
  • 1002: session not found; create a new one
  • 2001: out of credits; anonymous users get a registration link with ?bind=, registered users top up
  • 4001: unsupported file type; show accepted formats
  • 4002: file too large; suggest compressing or trimming
  • 400: missing X-Client-Id; generate one and retry
  • 402: free plan export blocked; not a credit issue, but a subscription tier restriction
  • 429: rate limited; wait 30s and retry once
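One way to keep these codes in a single place is a lookup table plus a tiny retry rule for 429. This is an illustrative sketch; the action labels and function names are hypothetical and not part of the backend's API.

```python
from typing import Optional

# Code -> short label for what the caller should do next.
ERROR_ACTIONS = {
    0: "continue",
    1001: "refresh_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_and_retry_once",
}


def action_for(code: int) -> str:
    """Look up the follow-up action; unknown codes get a generic label."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def retry_delay_s(code: int, attempts_so_far: int) -> Optional[float]:
    """Return 30.0 for the first 429 only; None means do not retry."""
    if code == 429 and attempts_so_far == 0:
        return 30.0
    return None
```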

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| Heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
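The event handling above can be sketched as a line-level parser for the standard `text/event-stream` framing. Two simplifications are assumed: events with empty data are treated as heartbeats and skipped, and a trailing event without a closing blank line is dropped. The payload layout inside each data field is not documented here, so the sketch yields raw strings. Function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event, skipping heartbeats."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment lines are commonly used as keep-alive pings.
            continue
        field, _, value = line.partition(":")
        if field == "data":
            if value.startswith(" "):
                value = value[1:]
            if value:
                buffer.append(value)


def needs_state_check(chunks: Iterable[str]) -> bool:
    """True when the stream produced no text, so session state should be polled."""
    return not any(chunk.strip() for chunk in chunks)
```

When `needs_state_check` returns true, the caller would fetch the session state and summarize the applied edit instead of relaying stream text.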

Translating GUI Instructions

The backend responds as if there's a visual interface. Map its instructions to API calls:

  • "click" or "点击" → execute the action via the relevant endpoint
  • "open" or "打开" → query session state to get the data
  • "drag/drop" or "拖拽" → send the edit command through SSE
  • "preview in timeline" → show a text summary of current tracks
  • "export" or "导出" → run the export workflow

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
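A rough sketch of turning a draft into a text summary like the one above. The nesting is an assumption: the field mapping names the keys but not their layout, so this sketch supposes a top-level `t` list of tracks, each with a `tt` type and an `sg` list of segments that carry `d` in milliseconds. `summarize_draft` is a hypothetical helper and omits clip names and volume.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> str:
    """Produce a short text summary of tracks and their total durations."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        # Sum segment durations (milliseconds) for the whole track.
        total_ms = sum(segment.get("d", 0) for segment in track.get("sg", []))
        lines.append(f"{index}. {kind}: {total_ms / 1000:g}s")
    return "\n".join(lines)
```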

Tips and Tricks

The backend processes requests faster when you're specific. Instead of "make it look better", try "transcribe the audio and add subtitles in English"; concrete instructions get better results.

Max file size is 500MB. Stick to MP3, MP4, WAV, M4A for the smoothest experience.
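A local pre-check can catch the two upload failures before any bytes are sent. This is a sketch under assumptions: 500MB is read as 500 × 1024 × 1024 bytes (the source does not say which definition applies), and the function returns the documented codes 4001 and 4002 purely as a convenience. `check_upload` is a hypothetical helper.

```python
from pathlib import Path

ACCEPTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}
MAX_BYTES = 500 * 1024 * 1024  # assumption: MB interpreted as MiB


def check_upload(filename: str, size_bytes: int) -> int:
    """Return 0 if the file looks uploadable, else 4001 (type) or 4002 (size)."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ACCEPTED_EXTENSIONS:
        return 4001
    if size_bytes > MAX_BYTES:
        return 4002
    return 0
```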

Export as MP4 for widest compatibility across platforms.

Common Workflows

Quick edit: Upload → "transcribe the audio and add subtitles in English" → Download MP4. Takes 30-60 seconds for a 30-second clip.

Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.

Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.

Source: ClawHub · Chinese localization: 龙虾技能库