Assembly Large Audio Transcriber — Assembly工具
v1.0.0[AI辅助] Transcribe large audio files (100MB+, up to 1GB/12 hours) with speaker diarization. Uses AssemblyAI API with direct HTTP calls. Supports MP3, WAV, M4A, FLAC,...
详细分析 ▾
运行时依赖
版本
- Initial release of assembly-large-audio-transcriber skill. - Transcribes large audio files (100MB–1GB, up to 12 hours) with speaker diarization using AssemblyAI API via HTTP calls. - Supports MP3, WAV, M4A, FLAC, OGG, and WEBM formats with no SDK dependencies (only requires `requests`). - Provides word-level timestamps, multi-language support (auto-detection), and detailed output including speaker and time segmentation. - Includes robust error handling and guidance for API limits and file splitting. - Output can be archived to a specified directory with structured Markdown or JSON formatting.
安装命令 点击复制
技能文档
Transcribe超大音频文件(100MB~1GB)专用方案,零SDK依赖,直接调HTTP API。
功能
- 支持超大文件:最高 1GB / 12小时音频
- 说话人分离(Speaker /B/C…)
- 词级时间戳
- 100+语言,自动检测
- MP3 / WAV / M4A / FLAC / OGG / WEBM 支持
安装依赖
服务器执行(只需一次):
pip install requests
设置 API 键
在环境变量中设置:
export ASSEMBLYAI_API_KEY="your-key"
或告知许霸天你的 AssemblyAI API Key,我来配置。
免费额度:每月100分钟;付费约 $0.01/分钟。
使用方式
告诉许霸天:
用 AssemblyAI 转录 [文件路径]
支持本地文件和 URL。
技术方案
第一步:上传文件(针对大文件)
AssemblyAI 要求先上传获取upload_url,再提交转录任务:import requests, os, timeAPI_KEY = os.getenv("ASSEMBLYAI_API_KEY")
HEADERS = {"authorization": API_KEY}
# 1. 上传文件获取 upload_url
def upload_file(file_path):
with open(file_path, "rb") as f:
response = requests.post(
"https://api.assemblyai.com/v2/upload",
headers=HEADERS,
data=f,
timeout=300
)
response.raise_for_status()
return response.json()["upload_url"]
# 2. 提交转录任务
def transcribe(upload_url, language="zh"):
payload = {
"audio_url": upload_url,
"speaker_labels": True,
"format_text": True,
"language_code": language if language != "auto" else None,
}
if language == "auto":
payload["language_detection"] = True
response = requests.post(
"https://api.assemblyai.com/v2/transcript",
headers=HEADERS,
json=payload,
timeout=30
)
response.raise_for_status()
return response.json()["id"]
# 3. 轮询结果
def wait_for_result(transcript_id, poll_interval=5, max_wait=3600):
start = time.time()
while True:
result = requests.get(
f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
headers=HEADERS,
timeout=30
)
result.raise_for_status()
data = result.json()
status = data["status"]
elapsed = time.time() - start
if status == "completed":
return data
elif status == "error":
raise Exception(f"Transcription error: {data.get('error')}")
elif elapsed > max_wait:
raise TimeoutError(f"Timeout after {max_wait}s")
else:
print(f"[{elapsed:.0f}s] Status: {status}...")
time.sleep(poll_interval)
# 4. 完整流程
def transcribe_large_audio(file_path, language="auto"):
print(f"上传中: {file_path}")
upload_url = upload_file(file_path)
print(f"提交转录任务...")
tid = transcribe(upload_url, language)
print(f"任务ID: {tid}")
print("等待转录完成(可能需要数分钟)...")
result = wait_for_result(tid)
return result
处理结果
result = transcribe_large_audio("/path/to/meeting.mp3", language="zh")# 打印带说话人的转录
for utt in result.get("utterances", []):
speaker = utt.get("speaker", "?")
text = utt.get("text", "")
start = utt.get("start", 0) / 1000 # 毫秒→秒
print(f"[{speaker}] {start:.1f}s: {text}")
# 或打印纯文本
print(result.get("text", ""))
通过 URL 转录(如果文件已在网上)
如果文件可通过公网访问,直接提交 URL 更简单:def transcribe_url(audio_url, language="zh"):
payload = {
"audio_url": audio_url,
"speaker_labels": True,
"language_detection": True,
}
response = requests.post(
"https://api.assemblyai.com/v2/transcript",
headers=HEADERS, json=payload, timeout=30
)
response.raise_for_status()
tid = response.json()["id"]
result = wait_for_result(tid)
return result
完整使用示例
import json, sysfile_path = sys.argv[1] if len(sys.argv) > 1 else "meeting.mp3"
language = sys.argv[2] if len(sys.argv) > 2 else "zh"
result = transcribe_large_audio(file_path, language)
output = {
"file": file_path,
"language": result.get("language_code"),
"duration_s": result.get("audio_duration"),
"transcript": result.get("text"),
"utterances": [
{
"speaker": u.get("speaker"),
"start_s": round(u.get("start", 0) / 1000, 2),
"end_s": round(u.get("end", 0) / 1000, 2),
"text": u.get("text"),
}
for u in result.get("utterances", [])
]
}
print(json.dumps(output, ensure_ascii=False, indent=2))
大文件处理流程(许霸天专用)
当用户提交超大音频文件时,按以下步骤执行:
- 确认文件路径和大小
- 确认 ASSEMBLYAI_API_KEY 已配置
- 执行上面的
transcribe_large_audio()流程 - 轮询直到完成
- 整理输出:按时间顺序输出每句话,带说话人和时间戳
- 写文件存档:
/workspace/memory/meetings/{日期}-{会议名}_原始转录.md
错误处理
| 错误 | 原因 | 解决 |
|---|---|---|
| 401 Unauthorized | API Key 无效或未设置 | 检查 ASSEMBLYAI_API_KEY |
| 413 Payload Too Large | 文件超 1GB | 需分割文件 |
| 422 Unprocessable Entity | 音频格式不支持 | 用 ffmpeg 转换格式 |
| 429 Rate Limit | 超出并发限制 | 等待后重试,降低轮询频率 |
文件分割(如果单文件超过1GB)
如遇 1GB 限制,用以下方式分割:
ffmpeg -i large.mp3 -ss 00:00:00 -to 01:00:00 -c copy part1.mp3
ffmpeg -i large.mp3 -ss 01:00:00 -c copy part2.mp3
再分别转录,最后拼接结果。
免费技能或插件可能存在安全风险,如需更匹配、更安全的方案,建议联系付费定制