MAI Transcribe

Name: MAI Transcribe
Author: robotsbuildrobots

v0.1.1

Transcribe audio with Microsoft's MAI-Transcribe-1 model via Azure AI Speech.

License

MIT-0

Last updated

2026/4/7

Security scan

VirusTotal

Harmless

OpenClaw

Safe

high confidence

The skill is internally consistent: it implements a small Node CLI that uploads audio to the Azure Speech endpoint and requires only the Azure speech endpoint and key that are appropriate for that purpose.

Assessment recommendations

This skill is coherent and implements a straightforward transcription CLI. Before installing, confirm you are comfortable with audio being uploaded to Microsoft (the script posts audio to the Azure Speech endpoint). Use a Speech resource key with the least privilege possible, and rotate or revoke it if needed. Make sure your runtime has a compatible Node version (the FormData, Blob, and fetch usage may require a modern Node release). Avoid uploading highly sensitive recordings unless your Azure policy allows it.
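
The Node compatibility point above can be checked before running the skill. The following is a minimal sketch, not part of the skill itself: `missingGlobals` is a hypothetical helper name, and the list of required globals is taken from the assessment text (fetch, FormData, Blob).

```javascript
// Hypothetical helper (not from the skill): list which of the web-platform
// globals mentioned in the assessment are absent from a given scope.
function missingGlobals(scope = globalThis) {
  const required = ["fetch", "FormData", "Blob"];
  return required.filter((name) => typeof scope[name] === "undefined");
}

const missing = missingGlobals();
if (missing.length > 0) {
  console.error(`Missing globals: ${missing.join(", ")}. Try a newer Node release.`);
} else {
  console.log("Runtime provides fetch, FormData, and Blob.");
}
```

If any name is reported as missing, upgrading Node is the likely fix; to my knowledge Node 18 and later expose all three as globals.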

Detailed analysis

✓ Purpose and capability

Name/description (MAI Transcribe) match the requested resources and code. The skill only asks for AZURE_SPEECH_ENDPOINT and AZURE_SPEECH_KEY, requires node, and contains a small CLI that posts audio to the documented Speech API. Nothing requested appears unrelated to transcription.

✓ Instruction scope

SKILL.md and scripts instruct the agent to run a local Node script that reads a single audio file, uploads it to the configured AZURE_SPEECH_ENDPOINT, and writes a transcript file. The instructions do not request unrelated files, other environment variables, or unexpected external endpoints. The README and SKILL.md explicitly note that audio is uploaded to Microsoft.

✓ Install mechanism

This is an instruction-only skill with no install spec (lowest risk). The included code files are small, documented, and use standard Node runtime behavior; there are no downloads from arbitrary URLs or extraction steps.

✓ Credential requirements

Required env vars are AZURE_SPEECH_ENDPOINT and AZURE_SPEECH_KEY (primaryEnv). Those are appropriate and sufficient for calling Azure Speech. No unrelated secrets or config paths are requested. An optional AZURE_SPEECH_API_VERSION is allowed for compatibility.

✓ Persistence and privileges

The always flag is set to false, and the skill does not request persistent or global agent privileges or modify other skills' configs. Autonomous invocation is allowed by default but is not combined with broad or unrelated credential access.

⚠ scripts/transcribe.js:51

File read combined with network send (possible exfiltration).

Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.1 · 2026/4/7

Add Azure Speech key and endpoint setup instructions

● Harmless

Install command

Official: npx clawhub@latest install mai-transcribe

Mirror: npx clawhub@latest install mai-transcribe --registry https://cn.clawhub-mirror.com

Skill documentation

Transcribe an audio file via Azure AI Speech using Microsoft's MAI-Transcribe-1 model.

Quick start

node {baseDir}/scripts/transcribe.js /path/to/audio.m4a

Defaults:

Model: mai-transcribe-1
Output: .txt
API version: 2025-10-15

Useful flags

node {baseDir}/scripts/transcribe.js /path/to/audio.ogg --out /tmp/transcript.txt
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --language en-GB
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --json --out /tmp/transcript.json
node {baseDir}/scripts/transcribe.js /path/to/audio.wav --model mai-transcribe-1
node {baseDir}/scripts/transcribe.js --help
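
For readers curious how flags like these might be handled, here is a rough sketch using Node's built-in `util.parseArgs`. It is an assumption about one reasonable approach, not the skill's actual parser; `parseCli` and the returned field names are hypothetical.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical parser mirroring the documented flags; the real script may
// parse its arguments differently.
function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: {
      out: { type: "string" },
      language: { type: "string" },
      model: { type: "string" },
      json: { type: "boolean" },
      help: { type: "boolean" },
    },
  });
  return {
    audioPath: positionals[0],
    out: values.out,
    language: values.language,
    // Defaults follow the "Defaults" list in this document.
    model: values.model ?? "mai-transcribe-1",
    json: values.json ?? false,
    help: values.help ?? false,
  };
}
```

In a script this would typically be called as `parseCli(process.argv.slice(2))`. By default `parseArgs` throws on unknown flags, which surfaces typos early.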

Required env vars

export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"

How to get the API key

Go to the Azure portal and open your Speech or Foundry Speech resource.
Open Keys and Endpoint.
Copy:

- the resource endpoint, for example https://your-resource.cognitiveservices.azure.com
- one of the resource keys

Export them:

export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"

If the values get mixed up while copying and pasting, the most important point is that this skill expects the Speech resource endpoint, not a generic Foundry project URL.

Optional:

export AZURE_SPEECH_API_VERSION="2025-10-15"

API shape

The script calls:

POST {AZURE_SPEECH_ENDPOINT}/speechtotext/transcriptions:transcribe?api-version=2025-10-15
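
As an illustration of how that request URL can be assembled from the endpoint and API version, consider the following sketch. `buildTranscribeUrl` is a hypothetical helper; only the path and the `api-version` query parameter come from this document.

```javascript
// Hypothetical helper: joins the Speech resource endpoint with the documented
// path and api-version query parameter.
function buildTranscribeUrl(endpoint, apiVersion = "2025-10-15") {
  const base = endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
  const url = new URL(`${base}/speechtotext/transcriptions:transcribe`);
  url.searchParams.set("api-version", apiVersion);
  return url.toString();
}

console.log(buildTranscribeUrl("https://YOUR-RESOURCE.cognitiveservices.azure.com"));
```

To honor the optional override, a caller could pass `process.env.AZURE_SPEECH_API_VERSION || "2025-10-15"` as the second argument.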

Headers:

Ocp-Apim-Subscription-Key: {AZURE_SPEECH_KEY}

Multipart form fields:

audio
definition

Example definition payload:

{
  "enhancedMode": {
    "enabled": true,
    "model": "mai-transcribe-1"
  }
}
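
Putting the header, the two form fields, and the definition payload together, a request could look roughly like the sketch below. This is an assumption about how such a call might be written with Node's built-in `fetch`, `FormData`, and `Blob`; `buildForm` and `postTranscription` are hypothetical names, and the real script may differ.

```javascript
// Hypothetical helper: builds the multipart body with the two documented
// fields, "audio" and "definition".
function buildForm(audioBytes, fileName, model = "mai-transcribe-1") {
  const definition = { enhancedMode: { enabled: true, model } };
  const form = new FormData();
  form.append("audio", new Blob([audioBytes]), fileName);
  form.append("definition", JSON.stringify(definition));
  return form;
}

// Hypothetical helper: sends the form with the documented key header.
// The multipart Content-Type header is set by fetch from the FormData body.
async function postTranscription(url, key, form) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Ocp-Apim-Subscription-Key": key },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

The audio bytes would typically come from `fs.promises.readFile(audioPath)`, with the file's base name passed as `fileName`.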

Notes

This is the same style of skill as the Whisper one: a small documented script wrapper, not a built-in OpenClaw media pipeline.
Tested successfully against a live Azure Speech resource.
--json writes the raw Azure response for debugging or downstream processing.
Audio is uploaded to Microsoft for processing.
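
The difference between the default text output and `--json` can be sketched as follows. The `combinedPhrases` field used here is an assumption about the response shape, not something this document confirms; inspect a real `--json` output before relying on it. `renderOutput` is a hypothetical name.

```javascript
// Hypothetical formatter: returns either the whole response as JSON or a
// plain-text transcript.
function renderOutput(response, asJson) {
  if (asJson) {
    return JSON.stringify(response, null, 2);
  }
  // ASSUMPTION: the transcript text lives in combinedPhrases[].text.
  const combined = Array.isArray(response.combinedPhrases)
    ? response.combinedPhrases
    : [];
  return combined.map((phrase) => phrase.text).join("\n");
}
```

Falling back to an empty string when the assumed field is absent avoids a crash, though a real tool would probably warn in that case.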

Data source: ClawHub · Chinese localization: 龙虾技能库
