MiniMax - Multimodal API Tool (Text, Speech, Video, Music)

Name: MiniMax - Multimodal API Tool (Text, Speech, Video, Music)
Rating: 1 (1 review)
Author: Iván



v1.0.0

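MiniMax is a multimodal API tool for the MiniMax text, speech, video, and music APIs, offering model routing, compatible-SDK guidance, and safer multimodal workflows. It helps developers operate MiniMax workflows safely, including model routing, API selection, speech generation, media job queues, MCP boundaries, and retry patterns for production.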


by @ivangdavila (Iván) · MIT-0

Tags: Developer Tools · API Tools · Automation · AI Model Access · File Processing

License

MIT-0

Last updated

2026/3/13

Security scan

VirusTotal: harmless

OpenClaw: safe (high confidence)

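The skill's requested resources and runtime instructions are consistent with a MiniMax multimodal integration: it requires a MiniMax API key, persists small operator state under ~/minimax/, provides operating guidance, and has no hidden installs or unrelated credentials.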

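Assessment recommendations

Provide only an appropriately scoped MINIMAX_API_KEY (use a dedicated key or a workspace account whose key can be rotated).
Expect the skill to create and read small files under ~/minimax/. Confirm you are comfortable with that path, and that it will not be used to store large private assets unless you explicitly approve.
The skill promises to ask for explicit approval before private media uploads, voice cloning, enabling remote MCP hosts, or launching paid or long-running jobs. Verify that the agent actually asks for your consent in these flows.
Before production use, review the MiniMax provider documentation and any organizational policy on external API keys. For higher assurance, ask the skill author for minimal examples of the exact API calls it will make (base URL and request shape), or run it first with a low-privilege test key.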

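Detailed analysis

✓ Purpose & Capability

The name and description (MiniMax multimodal API) match the declared requirements: a single MINIMAX_API_KEY and a config path for persistent operator state. What it requests (no unrelated cloud credentials or extra binaries) is proportionate to its stated purpose of platform integration.

✓ Instruction Scope

SKILL.md instructs the agent to create, read, and write small operational files (memory, routing, defaults) under ~/minimax/ and to use MINIMAX_API_KEY. These file reads and writes are declared in the metadata and match the skill's stated behavior. The instructions explicitly require user approval before uploading private media, cloning voices, enabling remote MCP hosts, or launching paid jobs.

✓ Install Mechanism

Instruction-only skill with no install spec and no code files, which is the lowest-risk delivery model. There are no downloads, extraction steps, or external install URLs.

✓ Credentials

Only one credential (MINIMAX_API_KEY) is required, and it is the declared primary credential. The skill does not request unrelated secrets or multiple unrelated environment variables. The declared config path (~/minimax/) matches the skill's stated need for persistent storage.

✓ Persistence & Privilege

The skill is not always-on and does not request elevated platform privileges. It keeps its own small operational files in a user-scoped directory (after explicit consent, per setup.md). Autonomous model invocation is allowed by default, which is normal for skills and not a concern on its own.

Security is layered; review the code before running it.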

License

MIT-0

Free to use, modify, and redistribute without attribution.

Runtime requirements

🖥️ OS: Linux · macOS · Windows

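Version

latest: v1.0.0 (2026/3/13)

Initial release, covering model routing, text compatibility guidance, speech and media workflows, MCP boundaries, and failure recovery.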


Install command

Official: npx clawhub@latest install minimax

Mirror: npx clawhub@latest install minimax --registry https://cn.clawhub-mirror.com

Skill Documentation


When to Use

User wants to work with MiniMax as a real multimodal platform, not as a vague brand mention. Agent handles model routing, API selection, compatible SDK caveats, speech generation, queued media jobs, MCP boundaries, and production-safe retry patterns.

Use this when the blocker is operational: wrong interface, wrong model tier, ignored parameters, broken polling loop, unsafe media upload, or poor routing across text, speech, video, and music tasks.

Architecture

Memory lives in ~/minimax/. If ~/minimax/ does not exist, run setup.md. See memory-template.md for structure.

~/minimax/
|-- memory.md          # Durable context, activation boundaries, and approved defaults
|-- routing.md         # Model and interface choices that worked in practice
|-- text-defaults.md   # Text model pins, SDK compatibility notes, and parsing rules
|-- speech-defaults.md # Voice, format, latency, and consent-sensitive speech notes
|-- media-jobs.md      # Async video or music job patterns, polling, and output handling
|-- mcp-notes.md       # Approved MCP hosts, scopes, and rejection reasons
`-- incidents.md       # Rate limits, failed jobs, bad prompts, and recovery notes
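If ~/minimax/ is missing, setup.md is the entry point. As a minimal sketch, assuming setup only needs the directory and the empty note files listed above (the real setup.md may ask more questions, and per the skill it runs only after the user agrees), a hypothetical `ensure_memory_dir` helper could look like this:

```python
from pathlib import Path

# File names taken from the ~/minimax/ layout above.
MEMORY_FILES = [
    "memory.md",
    "routing.md",
    "text-defaults.md",
    "speech-defaults.md",
    "media-jobs.md",
    "mcp-notes.md",
    "incidents.md",
]


def ensure_memory_dir(root: Path) -> list[Path]:
    """Create root and any missing note files; never overwrite existing notes.

    Returns the files that were newly created.
    """
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in MEMORY_FILES:
        path = root / name
        if not path.exists():
            # Seed each file with a title line so it is valid Markdown.
            path.write_text(f"# {name}\n", encoding="utf-8")
            created.append(path)
    return created
```

Because it skips files that already exist, rerunning it is safe; call it with `Path.home() / "minimax"` only after the user has consented to setup.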

Quick Reference

Load only the file needed for the current blocker.

Topic	File
Setup guide	setup.md
Memory template	memory-template.md
Model selection and routing	model-routing.md
Native, Anthropic-compatible, and OpenAI-compatible text flows	text-interfaces.md
Speech generation and audio delivery	speech-workflows.md
Video, music, and async media jobs	media-generation.md
MCP boundaries and orchestration choices	mcp-and-orchestration.md
Failure recovery and debugging	troubleshooting.md
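To keep to one file per blocker, a lookup table that mirrors the Quick Reference is enough. This sketch assumes the reference files sit next to SKILL.md; the topic keys and the `reference_file_for` helper are hypothetical.

```python
from pathlib import Path

# Topic keys are illustrative; file names come from the Quick Reference table.
REFERENCE_FILES = {
    "setup": "setup.md",
    "memory": "memory-template.md",
    "routing": "model-routing.md",
    "text": "text-interfaces.md",
    "speech": "speech-workflows.md",
    "media": "media-generation.md",
    "mcp": "mcp-and-orchestration.md",
    "debug": "troubleshooting.md",
}


def reference_file_for(topic: str, skill_dir: Path) -> Path:
    """Return the single reference file to load for a blocker topic."""
    try:
        return skill_dir / REFERENCE_FILES[topic]
    except KeyError:
        known = sorted(REFERENCE_FILES)
        raise ValueError(f"unknown topic {topic!r}; expected one of {known}") from None
```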


Requirements

MINIMAX_API_KEY for direct MiniMax API usage.
A client surface of choice: raw HTTP, an approved SDK, or an existing Anthropic-compatible or OpenAI-compatible integration.
Explicit user approval before uploading private media, cloning or imitating a real person's voice, enabling remote MCP servers, or launching long-running paid generation jobs.
Current model names, compatibility limits, and endpoint behavior must be verified against official MiniMax docs when the task depends on exact product surface.
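The key requirement and the approval gates can be enforced in code before any request goes out. A minimal sketch: the action names in `APPROVAL_REQUIRED` are hypothetical labels for the four approval cases listed above, and `get_api_key` only reads the variable, never prints it.

```python
import os
from typing import Mapping, Optional

# Hypothetical labels for the actions that need explicit user approval.
APPROVAL_REQUIRED = {
    "upload_private_media",
    "clone_or_imitate_voice",
    "enable_remote_mcp",
    "launch_paid_long_job",
}


class ApprovalError(RuntimeError):
    """Raised when a gated action is attempted without user approval."""


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read MINIMAX_API_KEY from the environment (or a given mapping)."""
    env = os.environ if env is None else env
    key = env.get("MINIMAX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("MINIMAX_API_KEY is not set")
    return key


def check_approval(action: str, approved: set[str]) -> None:
    """Raise unless a gated action was explicitly approved by the user."""
    if action in APPROVAL_REQUIRED and action not in approved:
        raise ApprovalError(f"{action} needs explicit user approval first")
```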

Operating Coverage

This skill treats MiniMax as an execution platform, not as a one-line provider swap. It covers:

text generation through native MiniMax APIs and compatible SDK interfaces
model routing across current text families such as MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, and MiniMax-M2
speech generation with synchronous HTTP and lower-latency endpoint choices
queued media workflows for video and music where submit, poll, and fetch are separate phases
MCP-aware workflows where tool access, host trust, and data scope must be explicit
debugging around ignored parameters, malformed payloads, long queue times, rate limits, and output reproducibility
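For the rate-limit part of that debugging, bounded retries with exponential backoff and jitter are a common pattern. This sketch assumes 429 and 5xx responses are the retryable ones; `send` is a hypothetical zero-argument callable returning `(status, body)`, which keeps the sketch independent of any HTTP client or MiniMax response format.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 502, 503, 504}  # assumed retryable HTTP statuses


def call_with_retries(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it succeeds, hits a non-retryable status, or runs out of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            # 401s and malformed payloads fail fast: retrying will not fix them.
            raise RuntimeError(f"request failed with HTTP {status} after {attempt} attempt(s)")
        # Exponential backoff with full jitter: wait 0..base*2^(attempt-1) seconds.
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")  # the last attempt always returns or raises
```

Passing `sleep` in makes the backoff testable without real waiting; errors that retries cannot fix, such as a 401, surface on the first attempt.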

Data Storage

Keep only durable MiniMax operating context in ~/minimax/:

which modalities the user actually uses: text, speech, video, music, or MCP-backed flows
approved models, speed tiers, and compatibility interfaces that worked for real tasks
output defaults such as JSON parsing rules, audio formats, polling intervals, and retry posture
media safety rules, consent requirements, and budget boundaries the user explicitly approved
repeated failures such as 401s, ignored params, queue stalls, or bad prompt templates
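A small append-only helper keeps these notes durable and easy to diff. The one-line `- date [kind] text` format and the `append_note` name are assumptions; the skill does not prescribe a format. The API key itself should never be written into these files.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_note(notes_file: Path, kind: str, text: str, today: Optional[date] = None) -> str:
    """Append one dated line such as '- 2026-03-13 [401] key rotated' and return it."""
    if "\n" in text:
        raise ValueError("keep each note to a single line")
    day = (today or date.today()).isoformat()
    line = f"- {day} [{kind}] {text}\n"
    notes_file.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: earlier notes are never rewritten.
    with notes_file.open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```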

Core Rules

1. Lock the Modality and Deliverable First
Start by naming the actual output: structured text, chat reply, narration audio, short video, song draft, or tool-augmented workflow.
MiniMax is not one surface. The wrong modality choice creates wrong endpoints, wrong latency expectations, and wrong retry logic.
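Naming the deliverable first can be made explicit by resolving it to a modality and a job pattern before any endpoint is chosen. In this sketch the deliverable names come from Rule 1 and the sync/async split follows Rule 4 (which says "usually", so confirm per endpoint); treating tool-augmented workflows as a separate MCP pattern is an assumption.

```python
from enum import Enum


class Modality(Enum):
    STRUCTURED_TEXT = "structured_text"
    CHAT_REPLY = "chat_reply"
    NARRATION_AUDIO = "narration_audio"
    SHORT_VIDEO = "short_video"
    SONG_DRAFT = "song_draft"
    TOOL_WORKFLOW = "tool_workflow"


# Job pattern per deliverable; verify each against the endpoint you actually call.
JOB_PATTERN = {
    Modality.STRUCTURED_TEXT: "sync",
    Modality.CHAT_REPLY: "sync",
    Modality.NARRATION_AUDIO: "sync",
    Modality.SHORT_VIDEO: "async",
    Modality.SONG_DRAFT: "async",
    Modality.TOOL_WORKFLOW: "mcp",
}


def plan_for(deliverable: str) -> tuple[Modality, str]:
    """Resolve a named deliverable to its modality and job pattern, or fail loudly."""
    modality = Modality(deliverable)  # raises ValueError for unknown names
    return modality, JOB_PATTERN[modality]
```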
2. Choose Native Versus Compatible APIs Deliberately
Use native MiniMax APIs when you need MiniMax-specific features or exact behavior.
Use Anthropic-compatible or OpenAI-compatible interfaces only when the surrounding app already depends on those SDKs and the supported subset is good enough.
Treat compatibility layers as narrower surfaces, not as feature-complete copies.
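The native-versus-compatible decision can be written as a check of the features a task needs against what the compatibility layer supports. The feature names and support sets below are placeholders, not MiniMax's actual compatibility matrix; fill them in from text-interfaces.md or the official docs.

```python
from typing import Optional

# Placeholder support sets: replace with what the current MiniMax docs list.
COMPATIBLE_SUPPORT = {
    "openai": {"chat", "streaming", "tools"},
    "anthropic": {"chat", "streaming", "tools"},
}


def choose_interface(needed: set[str], existing_sdk: Optional[str]) -> str:
    """Return 'native' unless the app already uses a compatible SDK covering every needed feature."""
    if existing_sdk is None:
        return "native"
    supported = COMPATIBLE_SUPPORT.get(existing_sdk)
    if supported is None:
        raise ValueError(f"unknown SDK {existing_sdk!r}")
    if needed - supported:
        # A compatibility layer may silently ignore unsupported features, so go native.
        return "native"
    return existing_sdk
```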
3. Pin the Exact Model Family and Speed Tier
Choose quality-first, speed-first, or fallback models explicitly instead of saying "use MiniMax."
Current text routing should start with MiniMax-M2.5 or MiniMax-M2.5-highspeed, then step down only if latency, cost, or compatibility requires it.
Re-check live docs before shipping hardcoded model lists because MiniMax updates its public surface frequently.
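Pinning can take the form of explicit, ordered fallback chains. The model IDs come from the Operating Coverage list; the ordering within each chain is an assumption, and the list itself should be re-checked against live docs as the rule says.

```python
# Ordered fallback chains; the first entry is the pinned default.
# Re-verify these IDs against the live MiniMax docs before shipping.
CHAINS = {
    "quality": ["MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
    "speed": ["MiniMax-M2.5-highspeed", "MiniMax-M2.1-highspeed", "MiniMax-M2.5"],
}


def pick_model(priority: str, unavailable: frozenset = frozenset()) -> str:
    """Return the first model in the chain for `priority` that is not marked unavailable."""
    for model in CHAINS[priority]:
        if model not in unavailable:
            return model
    raise LookupError(f"no model left in the {priority!r} chain")
```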

4. Separate Sync From Async Media Work
Synchronous text and speech flows can often return in one request.
Video and music generation usually need submit, poll, timeout, and fetch logic.
Do not design a blocking one-shot workflow for media jobs that are inherently queued.
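The submit, poll, timeout, and fetch phases can be kept separate as below. This is a minimal sketch: `submit`, `get_status`, and `fetch` are hypothetical callables standing in for the real MiniMax job endpoints, and the terminal status strings `"Success"` and `"Fail"` are assumptions to verify against media-generation.md and the official docs.

```python
import time
from typing import Callable


class JobTimeout(TimeoutError):
    """Raised when a queued job has not finished before the deadline."""


def run_queued_job(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    fetch: Callable[[str], bytes],
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Submit once, poll until a terminal status or the deadline, then fetch the output."""
    job_id = submit()  # phase 1: submit exactly once; never resubmit inside the loop
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)  # phase 2: poll
        if status == "Success":
            return fetch(job_id)  # phase 3: fetch is a separate call
        if status == "Fail":
            raise RuntimeError(f"job {job_id} failed")
        if clock() >= deadline:
            # The job ID is kept in the message so the job can be polled again later.
            raise JobTimeout(f"job {job_id} still {status!r} after {timeout}s")
        sleep(poll_interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. On timeout the job ID is preserved, so the job can be polled again later instead of being resubmitted (and paid for) twice.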
5. Validate Media Rights, Inputs, and Formats Before Generation
Confirm the user has rights to upload or transform any voice, lyrics, reference media, or branded assets.
Validate format, duration, language, and output expectations before generating.
Bad asset assumptions waste spend faster than bad prompts.
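Those checks can run as a pre-flight gate that reports every problem at once. The accepted formats and the 300-second limit are placeholders, not MiniMax's published limits; `MediaInput` and `validate_media` are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder limits: take the real formats and durations from the
# MiniMax docs for the specific endpoint you call.
ALLOWED_FORMATS = {"mp3", "wav", "m4a"}
MAX_SECONDS = 300.0


@dataclass
class MediaInput:
    path: str
    fmt: str
    seconds: float
    language: str
    user_has_rights: bool


def validate_media(item: MediaInput) -> list[str]:
    """Return a list of problems; an empty list means the input may be submitted."""
    problems = []
    if not item.user_has_rights:
        problems.append("rights or consent not confirmed")
    if item.fmt.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format {item.fmt!r}")
    if not 0 < item.seconds <= MAX_SECONDS:
        problems.append(f"duration {item.seconds}s outside (0, {MAX_SECONDS}]")
    if not item.language:
        problems.append("language not specified")
    return problems
```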
6. Make Cost and Trust Boundaries Explicit
Multimodal runs can send prompts, media, and metadata off machine and can accumulate cost quickly.
State which endpoint will receive which payload, and stop before remote MCP or large media uploads unless the user approved that path.
Never normalize remote execution just because the API supports it.
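One way to state which endpoint receives which payload is to build the plan as data, show it to the user, and refuse gated steps that lack approval. The `Step` fields, the 20 MB threshold for a "large" upload, and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass

LARGE_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed threshold for a "large" upload


@dataclass(frozen=True)
class Step:
    endpoint: str
    payload: str
    upload_bytes: int = 0
    remote_mcp: bool = False


def needs_approval(step: Step) -> bool:
    """Remote MCP calls and large uploads are gated behind user approval."""
    return step.remote_mcp or step.upload_bytes >= LARGE_UPLOAD_BYTES


def describe_plan(steps: list[Step]) -> list[str]:
    """One human-readable line per step, flagging the ones that need approval."""
    return [
        f"{s.payload} -> {s.endpoint}" + ("  [needs approval]" if needs_approval(s) else "")
        for s in steps
    ]


def blocked_steps(steps: list[Step], approved: set[Step]) -> list[Step]:
    """Steps that must not run yet because they are gated and unapproved."""
    return [s for s in steps if needs_approval(s) and s not in approved]
```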
7. Finish With a Reproducible Recipe
A successful MiniMax run ends with the exact model, interface, key parameters, asset inputs, and polling behavior recorded clearly enough to rerun.
If the output is fragile, capture the narrowest reproducible payload before changing prompts or models again.
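A recipe can be a small dataclass serialized with sorted keys, so reruns and diffs stay stable. The field names mirror Rule 7; the `Recipe` class and the example parameter are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Recipe:
    model: str
    interface: str  # e.g. "native", "openai", "anthropic"
    params: dict = field(default_factory=dict)
    asset_inputs: list = field(default_factory=list)
    poll_interval_s: Optional[float] = None  # None for synchronous calls

    def to_json(self) -> str:
        """Stable serialization (sorted keys) suitable for routing.md or incidents.md."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Recipe":
        return cls(**json.loads(text))
```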
MiniMax Traps
Treating every MiniMax feature as available through every SDK shim -> parameters get ignored and debugging starts from a false premise.
Saying "use the MiniMax model" without pinning family or speed tier -> latency, quality, and cost drift across runs.
Building media flows as one request and one response -> queued jobs hang or fail without usable recovery.
Uploading sensitive media before clarifying rights or consent -> the technical workflow succeeds but the usage is unsafe.
Assuming text defaults work for speech, video, or music -> prompts, payload shape, and validation rules diverge quickly.
Blaming the model before checking payload schema, queue state, or output fetch logic -> operational bugs get mislabeled as generation quality problems.
Letting MCP servers touch broad data without host review -> tool convenience becomes a trust leak.
External Endpoints
Only these endpoint categories are allowed unless the user explicitly approves more:
Endpoint	Data Sent	Purpose
https://api.minimax.io	prompts, approved media inputs, generation parameters, and polling requests	Native MiniMax text, speech, media, and related API workflows
https://api-uw.minimax.io	approved speech payloads and generation parameters	Optional lower-TTFA speech endpoint when the user wants faster first audio
https://platform.minimax.io/docs	doc queries only	Verify current models, compatibility notes, and API behavior
https://{user-approved-mcp-host}	request payloads required by the approved MCP server	Optional MCP tool access beyond the local machine
No other data is sent externally unless the user explicitly approves additional hosts or provider routes.
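The allowlist can be enforced before any request is sent. This sketch matches exact hostnames from the table over https only, restricts platform.minimax.io to /docs paths, and allows an MCP host only once the user has approved it; the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Hosts from the External Endpoints table; MCP hosts are passed in only after approval.
DEFAULT_ALLOWED_HOSTS = {"api.minimax.io", "api-uw.minimax.io", "platform.minimax.io"}


def is_allowed(url: str, approved_mcp_hosts: frozenset = frozenset()) -> bool:
    """True only for https URLs whose exact host is allowlisted."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    host = parts.hostname  # urlsplit already lowercases the hostname
    if host == "platform.minimax.io":
        # The table allows only documentation lookups on this host.
        return parts.path == "/docs" or parts.path.startswith("/docs/")
    return host in DEFAULT_ALLOWED_HOSTS or host in approved_mcp_hosts
```

Exact host matching (rather than a substring or suffix check) rejects look-alike hosts such as `api.minimax.io.evil.example`.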
Security & Privacy
Data that leaves your machine:
prompts and parameters sent to MiniMax API endpoints
approved media assets or reference files only for the generation workflow the user requested
optional MCP payloads only for user-approved MCP hosts
optional documentation lookups against official MiniMax docs
Data that stays local:
durable operating notes under ~/minimax/
local prompt drafts, routing choices, and incident notes unless the user exports them
any rejected or unused assets that never get uploaded

This skill does NOT:
treat compatible SDKs as exact feature matches without verification
upload private media, voice references, or lyrics without explicit user intent
enable remote MCP or broad tool access without explicit approval
claim that every MiniMax modality is synchronous or instantly available
modify its own skill files

Trust

By using this skill, prompts and approved media may be sent to MiniMax services, plus any optional user-approved MCP hosts. Only install if you trust those services with that data.

Scope

This skill ONLY:

helps operate MiniMax text, speech, video, music, and MCP-related workflows safely
routes tasks to the right model family, interface, and job pattern
keeps durable notes for approved defaults, budget boundaries, and recurring failures

This skill NEVER:

treats MiniMax as a generic provider drop-in without checking interface limits
suggests voice imitation or media transformation without rights and consent checks
blurs the line between local orchestration and remote MCP execution
promises that queued media jobs behave like low-latency text calls

Related Skills

Install with clawhub install if the user confirms:

ai - Compare MiniMax against other model providers before locking the stack.

api - Reuse structured HTTP, retry, and payload-debugging patterns around the MiniMax APIs.

models - Choose the right model family and fallback chain for quality, latency, and cost.

video-generation - Extend MiniMax video work into broader multi-provider video routing.

music - Strengthen prompt and arrangement decisions when the task is specifically music-first.

Feedback

If useful: clawhub star minimax

Stay updated: clawhub sync

Source: ClawHub · Chinese localization: 龙虾技能库 (Lobster Skills Library)
