
Assembly Large Audio Transcriber: Assembly Tool

v1.0.0

[AI-assisted] Transcribe large audio files (100MB+, up to 1GB/12 hours) with speaker diarization. Uses AssemblyAI API with direct HTTP calls. Supports MP3, WAV, M4A, FLAC,...

by @jiadong0723·MIT-0
License
MIT-0
Last updated
2026/4/9
Security scan
VirusTotal: Harmless
OpenClaw: Safe (medium confidence)
The skill does what it says (transcribes large audio via AssemblyAI and only requests an AssemblyAI API key); there are minor implementation inconsistencies and privacy/usability notes to consider before installing.
Assessment recommendations
This skill appears to be what it claims: it uploads audio to AssemblyAI and polls for a transcript, and it only needs your ASSEMBLYAI_API_KEY. Before installing or running it: 1) Do not share your API key with anyone—SKILL.md's suggestion to 'tell 许霸天 your API Key' is a social-engineering prompt and unnecessary. 2) Be aware audio is uploaded to AssemblyAI (read their privacy/TOS) — do not upload sensitive audio you don't want processed/stored by a third party. 3) Implementation notes: the bundle...
Detailed analysis
Purpose and capability
Name/description match the code and SKILL.md: both call AssemblyAI endpoints and require ASSEMBLYAI_API_KEY. The requested credential is appropriate for the stated purpose.
Instruction scope
Instructions stay within transcription scope (upload, submit, poll, format, archive). Two small scope concerns: SKILL.md suggests writing archives to /workspace/memory/meetings/... but the included script writes a .transcript.json next to the input file (inconsistent); SKILL.md also includes conversational text asking the user to 'tell 许霸天 your API Key'—do not hand your API key to third parties.
Install mechanism
No install spec (instruction-only + one script) so nothing is written by an installer. SKILL.md suggests 'pip install requests' but the bundled script uses urllib (no requests required). This is an inconsistency but not a malicious install mechanism.
Credential requirements
Only ASSEMBLYAI_API_KEY is required, which is proportional. Note: SKILL.md's text encouraging users to give their API key to the operator is a potential social-engineering risk—you should keep your key private and provision it yourself.
Persistence and privileges
always is false and the skill does not request system-wide changes or other skills' credentials. Autonomous invocation is enabled (default) but not combined with other red flags.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.0 · 2026/4/9

- Initial release of assembly-large-audio-transcriber skill.
- Transcribes large audio files (100MB–1GB, up to 12 hours) with speaker diarization using AssemblyAI API via HTTP calls.
- Supports MP3, WAV, M4A, FLAC, OGG, and WEBM formats with no SDK dependencies (only requires `requests`).
- Provides word-level timestamps, multi-language support (auto-detection), and detailed output including speaker and time segmentation.
- Includes robust error handling and guidance for API limits and file splitting.
- Output can be archived to a specified directory with structured Markdown or JSON formatting.


Install command

Official: npx clawhub@latest install jiadong-assembly-large-audio
Mirror (faster in China): npx clawhub@latest install jiadong-assembly-large-audio --registry https://cn.clawhub-mirror.com

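Skill documentation

A dedicated solution for transcribing very large audio files (100 MB to 1 GB), with zero SDK dependencies; it calls the HTTP API directly.

Features

  • Very large files: up to 1 GB / 12 hours of audio
  • Speaker diarization (Speaker A/B/C…)
  • Word-level timestamps
  • 100+ languages, detected automatically
  • MP3 / WAV / M4A / FLAC / OGG / WEBM support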

Install dependencies

Run on the server (only needed once):

pip install requests

Set the API key

Set it as an environment variable:

export ASSEMBLYAI_API_KEY="your-key"

Or tell 许霸天 your AssemblyAI API Key and I will configure it.

Free tier: 100 minutes per month; paid usage costs roughly $0.01 per minute.
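To make the pricing above concrete, here is a minimal cost-estimate sketch. It assumes the two figures quoted in this document (100 free minutes per month, about $0.01 per minute) and further assumes that billing rounds up to whole minutes, which the source does not state. The name `estimate_cost_usd` is hypothetical.

```python
import math

FREE_MINUTES_PER_MONTH = 100   # free tier quoted in this document
PRICE_PER_MINUTE_USD = 0.01    # approximate paid rate quoted in this document


def estimate_cost_usd(duration_s, free_minutes_left=FREE_MINUTES_PER_MONTH):
    """Rough cost of one transcription job.

    Assumption: usage is billed in whole minutes once the free tier is used up.
    """
    minutes = math.ceil(duration_s / 60)
    billable = max(0, minutes - free_minutes_left)
    return round(billable * PRICE_PER_MINUTE_USD, 2)


# A 12-hour recording (720 minutes) with the full free tier still available
print(estimate_cost_usd(12 * 3600))  # 6.2
```

Treat the result as a ballpark only; actual billing depends on the current AssemblyAI price list.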

Usage

Tell 许霸天:

Transcribe [file path] with AssemblyAI

Both local files and URLs are supported.
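Because both local files and URLs are accepted, the caller has to decide which path to take. The sketch below shows one way to make that decision; `is_remote_url` and `pick_route` are hypothetical helper names, and the rule (only `http`/`https` with a host counts as a URL) is an assumption, not something the skill documents.

```python
from urllib.parse import urlparse


def is_remote_url(source):
    """Return True only for http(s) URLs that include a host."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def pick_route(source):
    """'url' means submit the address directly; 'upload' means upload the file first."""
    return "url" if is_remote_url(source) else "upload"


print(pick_route("https://example.com/meeting.mp3"))  # url
print(pick_route("/path/to/meeting.mp3"))             # upload
```

A source routed to "url" would go to `transcribe_url()`, and one routed to "upload" would go to `transcribe_large_audio()`.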

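Technical approach

Step 1: Upload the file (for large files)

AssemblyAI requires you to upload the file first to obtain an upload_url, and then submit the transcription job: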

import os
import time

import requests

API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
HEADERS = {"authorization": API_KEY}

# 1. Upload the file to obtain an upload_url
def upload_file(file_path):
    with open(file_path, "rb") as f:
        # Passing the file object streams the body instead of reading it into memory
        response = requests.post(
            "https://api.assemblyai.com/v2/upload",
            headers=HEADERS, data=f, timeout=300
        )
    response.raise_for_status()
    return response.json()["upload_url"]

# 2. Submit the transcription job
def transcribe(upload_url, language="zh"):
    payload = {
        "audio_url": upload_url,
        "speaker_labels": True,
        "format_text": True,
    }
    # Send either a fixed language code or the auto-detection flag, never a null code
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS, json=payload, timeout=30
    )
    response.raise_for_status()
    return response.json()["id"]

# 3. Poll for the result
def wait_for_result(transcript_id, poll_interval=5, max_wait=3600):
    start = time.time()
    while True:
        result = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
            headers=HEADERS, timeout=30
        )
        result.raise_for_status()
        data = result.json()
        status = data["status"]
        elapsed = time.time() - start
        if status == "completed":
            return data
        elif status == "error":
            raise Exception(f"Transcription error: {data.get('error')}")
        elif elapsed > max_wait:
            raise TimeoutError(f"Timeout after {max_wait}s")
        else:
            print(f"[{elapsed:.0f}s] Status: {status}...")
            time.sleep(poll_interval)

# 4. Full pipeline
def transcribe_large_audio(file_path, language="auto"):
    print(f"Uploading: {file_path}")
    upload_url = upload_file(file_path)
    print("Submitting transcription job...")
    tid = transcribe(upload_url, language)
    print(f"Job ID: {tid}")
    print("Waiting for transcription to finish (this may take several minutes)...")
    result = wait_for_result(tid)
    return result
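The document advertises zero SDK dependencies, yet the code above imports `requests`. For readers who want to avoid that package, the submit step can also be sketched with the standard library alone. This is an illustrative alternative, not the skill's bundled script; `build_transcript_request` and `submit` are hypothetical names, and the payload fields simply mirror the ones used above.

```python
import json
import urllib.request

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key, audio_url, language="auto"):
    """Build (but do not send) the POST request that submits a transcription job."""
    payload = {"audio_url": audio_url, "speaker_labels": True}
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and return the job id (performs a real network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["id"]
```

Separating request construction from sending keeps the payload logic easy to inspect without touching the network.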

Processing the result

result = transcribe_large_audio("/path/to/meeting.mp3", language="zh")

# Print the transcript with speaker labels
for utt in result.get("utterances", []):
    speaker = utt.get("speaker", "?")
    text = utt.get("text", "")
    start = utt.get("start", 0) / 1000  # milliseconds → seconds
    print(f"[{speaker}] {start:.1f}s: {text}")

# Or print the plain text
print(result.get("text", ""))
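For recordings that run for hours, raw second counts are hard to scan. The following sketch renders utterances as Markdown lines with HH:MM:SS timestamps. It assumes the result shape used above (`utterances` entries with `speaker`, `start` in milliseconds, and `text`); the function names are hypothetical.

```python
def format_timestamp(ms):
    """Convert milliseconds to HH:MM:SS."""
    total_s = int(ms // 1000)
    hours, rem = divmod(total_s, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def utterances_to_markdown(result):
    """One Markdown bullet per utterance, in the order the API returned them."""
    lines = []
    # `or []` also covers a result where the field is present but null
    for utt in result.get("utterances") or []:
        stamp = format_timestamp(utt.get("start", 0))
        speaker = utt.get("speaker", "?")
        lines.append(f"- [{stamp}] Speaker {speaker}: {utt.get('text', '')}")
    return "\n".join(lines)


sample = {"utterances": [{"speaker": "A", "start": 3723500, "text": "Hello"}]}
print(utterances_to_markdown(sample))  # - [01:02:03] Speaker A: Hello
```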

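Transcribing from a URL (if the file is already online)

If the file is reachable over the public internet, submitting the URL directly is simpler: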

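def transcribe_url(audio_url, language="zh"):
    # A public URL can be submitted as-is; no upload step is needed
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,
    }
    # Honor the language argument; fall back to detection only for "auto"
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS, json=payload, timeout=30
    )
    response.raise_for_status()
    tid = response.json()["id"]
    result = wait_for_result(tid)
    return result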

Complete usage example

import json
import sys

# Usage: python script.py [file_path] [language]
file_path = sys.argv[1] if len(sys.argv) > 1 else "meeting.mp3"
language = sys.argv[2] if len(sys.argv) > 2 else "zh"

result = transcribe_large_audio(file_path, language)

output = {
    "file": file_path,
    "language": result.get("language_code"),
    "duration_s": result.get("audio_duration"),
    "transcript": result.get("text"),
    "utterances": [
        {
            "speaker": u.get("speaker"),
            "start_s": round(u.get("start", 0) / 1000, 2),  # milliseconds → seconds
            "end_s": round(u.get("end", 0) / 1000, 2),
            "text": u.get("text"),
        }
        for u in result.get("utterances", [])
    ],
}

print(json.dumps(output, ensure_ascii=False, indent=2))

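Large-file workflow (for 许霸天 only)

When a user submits a very large audio file, follow these steps:

  • Confirm the file path and size
  • Confirm that ASSEMBLYAI_API_KEY is configured
  • Run the transcribe_large_audio() pipeline above
  • Poll until the job completes
  • Tidy the output: list every utterance in chronological order, with speaker and timestamp
  • Archive to a file: /workspace/memory/meetings/{日期}-{会议名}_原始转录.md (that is, {date}-{meeting name}_raw-transcript.md)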
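The first two steps of this workflow can be checked before anything is uploaded. Below is a minimal preflight sketch. It takes 1 GB to mean 1024³ bytes and treats the six extensions listed under Features as the allowed set; both are assumptions, and `preflight` is a hypothetical helper, not part of the skill.

```python
import os

MAX_BYTES = 1024 ** 3  # assumption: "1 GB" interpreted as 1024^3 bytes
SUPPORTED = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def preflight(file_path, env=os.environ):
    """Return a list of problems; an empty list means the job can be submitted."""
    problems = []
    if not env.get("ASSEMBLYAI_API_KEY"):
        problems.append("ASSEMBLYAI_API_KEY is not set")
    if not os.path.isfile(file_path):
        problems.append(f"file not found: {file_path}")
        return problems  # size and extension checks need an existing file
    if os.path.splitext(file_path)[1].lower() not in SUPPORTED:
        problems.append("unsupported file extension")
    if os.path.getsize(file_path) > MAX_BYTES:
        problems.append("file exceeds 1 GB; split it first")
    return problems
```

Returning a list rather than raising on the first failure lets the operator see every issue in one pass.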

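Error handling

Error | Cause | Solution
401 Unauthorized | API key is invalid or not set | Check ASSEMBLYAI_API_KEY
413 Payload Too Large | File exceeds 1 GB | Split the file
422 Unprocessable Entity | Audio format is not supported | Convert the format with ffmpeg
429 Rate Limit | Concurrency limit exceeded | Wait and retry; poll less frequently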
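The "wait and retry" advice for 429 can be sketched as exponential backoff. In the example below, the advice strings restate the table above, while `advice_for`, `call_with_backoff`, the retry count, and the delay schedule are all illustrative assumptions. The wrapped callable is expected to return a `(status_code, body)` pair.

```python
import time

ADVICE = {
    401: "check ASSEMBLYAI_API_KEY",
    413: "split the file",
    422: "convert the format with ffmpeg",
    429: "wait and retry; poll less frequently",
}
RETRYABLE = {429}  # the only status in the table that waiting can fix


def advice_for(status):
    """Look up the suggested fix for an HTTP status code."""
    return ADVICE.get(status, "unexpected status; inspect the response body")


def call_with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on retryable statuses, doubling the delay each time."""
    for attempt in range(retries + 1):
        status, body = func()
        if status not in RETRYABLE or attempt == retries:
            return status, body
        sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the schedule can be exercised without real waiting.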

Splitting files (if a single file exceeds 1 GB)

If you hit the 1 GB limit, split the file as follows:

ffmpeg -i large.mp3 -ss 00:00:00 -to 01:00:00 -c copy part1.mp3
ffmpeg -i large.mp3 -ss 01:00:00 -c copy part2.mp3

Then transcribe each part separately and concatenate the results.
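Concatenating the results means shifting each part's timestamps by the position where that part starts in the original recording. Here is a minimal sketch, assuming the result shape shown earlier (`text`, plus `utterances` with `start`/`end` in milliseconds). `merge_parts` is a hypothetical helper. Since each part is transcribed as an independent job, speaker "A" in one part is not necessarily speaker "A" in the next, so the sketch prefixes labels with the part number instead of merging them.

```python
def merge_parts(parts):
    """Merge per-part results into one.

    parts: list of (offset_s, result) tuples in playback order, where offset_s
    is the start of that part in the original file (the -ss value used above).
    """
    merged_utts, texts = [], []
    for index, (offset_s, result) in enumerate(parts, start=1):
        offset_ms = int(offset_s * 1000)
        texts.append(result.get("text") or "")
        for utt in result.get("utterances") or []:
            merged_utts.append({
                # Keep labels distinct per part; they are not comparable across jobs
                "speaker": f"P{index}-{utt.get('speaker', '?')}",
                "start": utt.get("start", 0) + offset_ms,
                "end": utt.get("end", 0) + offset_ms,
                "text": utt.get("text", ""),
            })
    return {"text": " ".join(t for t in texts if t), "utterances": merged_utts}
```

For the two ffmpeg commands above, the offsets would be 0 and 3600 seconds.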

Data source: ClawHub · Chinese localization: 龙虾技能库