Assembly Large Audio Transcriber - Assembly Tool

Name: Assembly Large Audio Transcriber - Assembly Tool
Author: jiadong0723

v1.0.0

[AI-assisted] Transcribe large audio files (100MB+, up to 1GB/12 hours) with speaker diarization. Uses AssemblyAI API with direct HTTP calls. Supports MP3, WAV, M4A, FLAC,...

by @jiadong0723·MIT-0

Tags: API tools, AI models, network access, tools, file processing, developer tools

License

MIT-0

Last updated

2026/4/9

Security scan

VirusTotal

Benign

OpenClaw

Safe

medium confidence

The skill does what it says (transcribes large audio via AssemblyAI and only requests an AssemblyAI API key); there are minor implementation inconsistencies and privacy/usability notes to consider before installing.

Assessment recommendations

This skill appears to be what it claims: it uploads audio to AssemblyAI and polls for a transcript, and it only needs your ASSEMBLYAI_API_KEY. Before installing or running it: 1) Do not share your API key with anyone—SKILL.md's suggestion to 'tell 许霸天 your API Key' is a social-engineering prompt and unnecessary. 2) Be aware audio is uploaded to AssemblyAI (read their privacy/TOS) — do not upload sensitive audio you don't want processed/stored by a third party. 3) Implementation notes: the bundle...

Detailed analysis

✓ Purpose and capabilities

Name/description match the code and SKILL.md: both call AssemblyAI endpoints and require ASSEMBLYAI_API_KEY. The requested credential is appropriate for the stated purpose.

ℹ Instruction scope

Instructions stay within transcription scope (upload, submit, poll, format, archive). Two small scope concerns: SKILL.md suggests writing archives to /workspace/memory/meetings/... but the included script writes a .transcript.json next to the input file (inconsistent); SKILL.md also includes conversational text asking the user to 'tell 许霸天 your API Key'—do not hand your API key to third parties.

✓ Install mechanism

No install spec (instruction-only + one script) so nothing is written by an installer. SKILL.md suggests 'pip install requests' but the bundled script uses urllib (no requests required). This is an inconsistency but not a malicious install mechanism.

✓ Credential requirements

Only ASSEMBLYAI_API_KEY is required, which is proportional. Note: SKILL.md's text encouraging users to give their API key to the operator is a potential social-engineering risk—you should keep your key private and provision it yourself.

✓ Persistence and permissions

always is false and the skill does not request system-wide changes or other skills' credentials. Autonomous invocation is enabled (default) but not combined with other red flags.

Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Version

latest · v1.0.0 · 2026/4/9

- Initial release of assembly-large-audio-transcriber skill.
- Transcribes large audio files (100MB–1GB, up to 12 hours) with speaker diarization using AssemblyAI API via HTTP calls.
- Supports MP3, WAV, M4A, FLAC, OGG, and WEBM formats with no SDK dependencies (only requires `requests`).
- Provides word-level timestamps, multi-language support (auto-detection), and detailed output including speaker and time segmentation.
- Includes robust error handling and guidance for API limits and file splitting.
- Output can be archived to a specified directory with structured Markdown or JSON formatting.

Install command

Official: npx clawhub@latest install jiadong-assembly-large-audio

Mirror (faster): npx clawhub@latest install jiadong-assembly-large-audio --registry https://cn.clawhub-mirror.com

Skill documentation

A dedicated solution for transcribing very large audio files (100MB to 1GB), with zero SDK dependencies: it calls the HTTP API directly.

Features

Very large files: up to 1GB / 12 hours of audio
Speaker diarization (Speaker A/B/C…)
Word-level timestamps
100+ languages with automatic detection
MP3 / WAV / M4A / FLAC / OGG / WEBM supported
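
The limits in the list above can be checked locally before any upload is attempted. The following is a minimal sketch, not part of the skill itself: `check_audio_file`, `MAX_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names, it assumes "1GB" means 1024**3 bytes (the source does not say), and it trusts the file extension rather than inspecting the audio container.

```python
import os

# Assumption: "1GB" is treated as 1024**3 bytes; the real server-side limit may differ.
MAX_BYTES = 1024 ** 3
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}


def check_audio_file(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    # size_bytes can be passed in explicitly; otherwise it is read from disk.
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

Calling `check_audio_file("/path/to/meeting.mp3")` before uploading gives an early, local failure instead of a rejected request after a long transfer.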

Install dependencies

Run on the server (one time only):

pip install requests

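Set the API Key

Set it as an environment variable:

export ASSEMBLYAI_API_KEY="your-key"

Or tell 许霸天 your AssemblyAI API Key and I will configure it.

Free tier: 100 minutes per month; paid usage is roughly $0.01 per minute.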
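
To make the pricing figures above concrete, here is a rough cost estimate as a sketch. It assumes the free allowance is consumed first and that billing is linear per minute at the approximate rate quoted above; actual AssemblyAI pricing may differ, and `estimate_cost_usd` is a hypothetical helper.

```python
FREE_MINUTES_PER_MONTH = 100  # figure quoted in this document
PRICE_PER_MINUTE_USD = 0.01   # approximate figure quoted in this document


def estimate_cost_usd(audio_minutes, free_minutes_left=FREE_MINUTES_PER_MONTH):
    """Estimate the charge for one file after subtracting any remaining free minutes."""
    billable = max(0, audio_minutes - free_minutes_left)
    return round(billable * PRICE_PER_MINUTE_USD, 2)
```

Under these assumptions, a 12-hour recording (720 minutes) with the full free allowance still available has 620 billable minutes, or about $6.20.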

Usage

Tell 许霸天:

Transcribe [file path] with AssemblyAI

Both local files and URLs are supported.
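
Since both local files and URLs are accepted, the caller has to decide which path to take. One possible way to tell them apart is sketched below; `classify_source` is a hypothetical helper, and treating only `http` and `https` links as URLs is an assumption.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Return "url" for http(s) links and "file" for everything else."""
    parsed = urlparse(source)
    # A Windows path such as C:\audio\a.wav parses with scheme "c", so it stays a file.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"
```

A "url" result can be submitted directly as the audio URL, while a "file" result needs the upload step first.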

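Technical approach

Step 1: Upload the file (for large files)

For a local file, AssemblyAI requires uploading it first to obtain an upload_url, and then submitting the transcription job: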

import os
import time

import requests

API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
HEADERS = {"authorization": API_KEY}

# 1. Upload the file to obtain an upload_url
def upload_file(file_path):
    # Passing the open file object lets requests stream the body
    # instead of loading the whole file into memory.
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://api.assemblyai.com/v2/upload",
            headers=HEADERS,
            data=f,
            timeout=300
        )
    response.raise_for_status()
    return response.json()["upload_url"]

# 2. Submit the transcription job
def transcribe(upload_url, language="zh"):
    payload = {
        "audio_url": upload_url,
        "speaker_labels": True,
        "format_text": True,
    }
    # Send either language_detection or language_code, never a null language_code.
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS,
        json=payload,
        timeout=30
    )
    response.raise_for_status()
    return response.json()["id"]

# 3. Poll for the result
def wait_for_result(transcript_id, poll_interval=5, max_wait=3600):
    start = time.time()
    while True:
        result = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
            headers=HEADERS,
            timeout=30
        )
        result.raise_for_status()
        data = result.json()
        status = data["status"]
        elapsed = time.time() - start
        if status == "completed":
            return data
        elif status == "error":
            raise RuntimeError(f"Transcription error: {data.get('error')}")
        elif elapsed > max_wait:
            raise TimeoutError(f"Timeout after {max_wait}s")
        else:
            print(f"[{elapsed:.0f}s] Status: {status}...")
            time.sleep(poll_interval)

# 4. Full pipeline
def transcribe_large_audio(file_path, language="auto"):
    print(f"Uploading: {file_path}")
    upload_url = upload_file(file_path)
    print("Submitting transcription job...")
    tid = transcribe(upload_url, language)
    print(f"Job ID: {tid}")
    print("Waiting for transcription to finish (this may take several minutes)...")
    result = wait_for_result(tid)
    return result

Processing the result

result = transcribe_large_audio("/path/to/meeting.mp3", language="zh")
# Print the transcript with speaker labels
for utt in result.get("utterances") or []:
    speaker = utt.get("speaker", "?")
    text = utt.get("text", "")
    start = utt.get("start", 0) / 1000  # milliseconds to seconds
    print(f"[{speaker}] {start:.1f}s: {text}")
# Or print the plain text
print(result.get("text", ""))
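
For recordings that run for hours, offsets in raw seconds are hard to scan. The sketch below renders the same utterance fields used above (`speaker`, `start` in milliseconds, `text`) with HH:MM:SS timestamps; `format_timestamp` and `format_utterances` are hypothetical helpers, not part of the skill.

```python
def format_timestamp(ms):
    """Convert a millisecond offset to HH:MM:SS, dropping the fractional second."""
    total_seconds = int(ms) // 1000
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def format_utterances(utterances):
    """Return one line per utterance, ordered by start time."""
    lines = []
    for utt in sorted(utterances, key=lambda u: u.get("start", 0)):
        stamp = format_timestamp(utt.get("start", 0))
        speaker = utt.get("speaker", "?")
        lines.append(f"[{stamp}] Speaker {speaker}: {utt.get('text', '')}")
    return "\n".join(lines)
```

With a completed result dictionary, `print(format_utterances(result.get("utterances") or []))` would produce the speaker-labelled transcript in chronological order.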

Transcribing from a URL (if the file is already online)

If the file is publicly accessible, submitting the URL directly is simpler:

def transcribe_url(audio_url, language="zh"):
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,
    }
    # Honor the language argument; fall back to detection only for "auto".
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS, json=payload, timeout=30
    )
    response.raise_for_status()
    tid = response.json()["id"]
    result = wait_for_result(tid)
    return result

Complete usage example

import json
import sys

# Usage: python script.py <file_path> [language]
file_path = sys.argv[1] if len(sys.argv) > 1 else "meeting.mp3"
language = sys.argv[2] if len(sys.argv) > 2 else "zh"
result = transcribe_large_audio(file_path, language)
output = {
    "file": file_path,
    "language": result.get("language_code"),
    "duration_s": result.get("audio_duration"),
    "transcript": result.get("text"),
    "utterances": [
        {
            "speaker": u.get("speaker"),
            "start_s": round(u.get("start", 0) / 1000, 2),  # milliseconds to seconds
            "end_s": round(u.get("end", 0) / 1000, 2),
            "text": u.get("text"),
        }
        for u in result.get("utterances") or []
    ]
}
print(json.dumps(output, ensure_ascii=False, indent=2))

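Large-file workflow (for 许霸天 only)

When a user submits a very large audio file, follow these steps:

1. Confirm the file path and size
2. Confirm that ASSEMBLYAI_API_KEY is configured
3. Run the transcribe_large_audio() pipeline above
4. Poll until the job completes
5. Organize the output: print each utterance in chronological order, with speaker and timestamp
6. Write the archive file: /workspace/memory/meetings/{日期}-{会议名}_原始转录.md (that is, {date}-{meeting name}_raw transcript.md)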
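
The last two steps of the workflow above (organizing the output and writing the archive) could look roughly like the sketch below. `archive_transcript` is a hypothetical helper; the ISO date format and the Markdown layout are assumptions, and the base directory is a parameter so the path from the workflow is not hard-coded.

```python
import os
from datetime import date


def archive_transcript(utterances, meeting_name, base_dir, day=None):
    """Write utterances to {base_dir}/{date}-{meeting_name}_原始转录.md and return the path."""
    day = day or date.today()
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{day.isoformat()}-{meeting_name}_原始转录.md")
    lines = [f"# {meeting_name} ({day.isoformat()})", ""]
    # One bullet per utterance, in chronological order.
    for utt in sorted(utterances, key=lambda u: u.get("start", 0)):
        seconds = utt.get("start", 0) / 1000  # milliseconds to seconds
        lines.append(f"- [{seconds:.1f}s] {utt.get('speaker', '?')}: {utt.get('text', '')}")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

For the workflow described here, `base_dir` would be `/workspace/memory/meetings`.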

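Error handling

Error	Cause	Fix
401 Unauthorized	API Key is invalid or not set	Check ASSEMBLYAI_API_KEY
413 Payload Too Large	File exceeds 1GB	Split the file
422 Unprocessable Entity	Unsupported audio format	Convert the format with ffmpeg
429 Rate Limit	Concurrency limit exceeded	Wait and retry; poll less frequently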
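
The table above can be turned into a small lookup plus a retry schedule. This is only a sketch: the table says to wait and retry with less frequent polling, and exponential backoff is one common way to do that, not something the source prescribes. `describe_http_error` and `backoff_delays` are hypothetical helpers, and the default delays are arbitrary.

```python
ERROR_HINTS = {
    401: "API key invalid or not set: check ASSEMBLYAI_API_KEY",
    413: "file exceeds 1GB: split the file",
    422: "unsupported audio format: convert it with ffmpeg",
    429: "rate limited: wait, retry, and poll less often",
}


def describe_http_error(status_code):
    """Map an HTTP status code to the fix suggested in the table."""
    return ERROR_HINTS.get(status_code, f"unexpected HTTP status {status_code}")


def backoff_delays(base=5.0, factor=2.0, cap=60.0, attempts=5):
    """Return the wait (in seconds) before each retry, growing geometrically up to cap."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays
```

With the defaults, the retry waits are 5, 10, 20, 40, and 60 seconds.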

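Splitting files (if a single file exceeds 1GB)

If you hit the 1GB limit, split the file as follows:

ffmpeg -i large.mp3 -ss 00:00:00 -to 01:00:00 -c copy part1.mp3
ffmpeg -i large.mp3 -ss 01:00:00 -c copy part2.mp3

Then transcribe each part separately and stitch the results together.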
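
Stitching the parts back together mainly means shifting each part's timestamps by the offset at which it was cut. A minimal sketch follows, assuming each part's utterances carry `start` and `end` in milliseconds relative to that part and that the cut points are exact (with `-c copy` the actual cut may land slightly off the requested time). `merge_parts` is a hypothetical helper. Speaker labels are left untouched, and since each part is a separate transcription job, "A" in one part may not be the same person as "A" in another.

```python
def merge_parts(parts):
    """Merge utterances from split files.

    parts: list of (offset_ms, utterances) tuples, where offset_ms is the
    position of that part's first sample in the original recording.
    """
    merged = []
    for offset_ms, utterances in parts:
        for utt in utterances:
            shifted = dict(utt)  # copy so the caller's data is not modified
            shifted["start"] = utt.get("start", 0) + offset_ms
            shifted["end"] = utt.get("end", 0) + offset_ms
            merged.append(shifted)
    merged.sort(key=lambda u: u["start"])
    return merged
```

For the two ffmpeg commands above, the offsets would be 0 for part1.mp3 and 3,600,000 ms (one hour) for part2.mp3.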

Data source: ClawHub · Chinese localization: 龙虾技能库
