When to Use
User wants to work with MiniMax as a real multimodal platform, not as a vague brand mention. Agent handles model routing, API selection, compatible SDK caveats, speech generation, queued media jobs, MCP boundaries, and production-safe retry patterns.
Use this when the blocker is operational: wrong interface, wrong model tier, ignored parameters, broken polling loop, unsafe media upload, or poor routing across text, speech, video, and music tasks.
Architecture
Memory lives in ~/minimax/. If ~/minimax/ does not exist, run setup.md. See memory-template.md for structure.
~/minimax/
|-- memory.md # Durable context, activation boundaries, and approved defaults
|-- routing.md # Model and interface choices that worked in practice
|-- text-defaults.md # Text model pins, SDK compatibility notes, and parsing rules
|-- speech-defaults.md # Voice, format, latency, and consent-sensitive speech notes
|-- media-jobs.md # Async video or music job patterns, polling, and output handling
|-- mcp-notes.md # Approved MCP hosts, scopes, and rejection reasons
`-- incidents.md # Rate limits, failed jobs, bad prompts, and recovery notes
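A minimal sketch of the "does the memory directory exist" check, assuming the file names in the layout above. The helper name `missing_memory_files` is hypothetical and not part of the skill; the contents of setup.md are not reproduced here.

```python
from pathlib import Path

# File names taken from the ~/minimax/ layout described above.
EXPECTED_FILES = [
    "memory.md",
    "routing.md",
    "text-defaults.md",
    "speech-defaults.md",
    "media-jobs.md",
    "mcp-notes.md",
    "incidents.md",
]


def missing_memory_files(root: Path) -> list[str]:
    """Return expected file names absent from root (all of them if root is missing)."""
    if not root.is_dir():
        return list(EXPECTED_FILES)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_memory_files(Path.home() / "minimax")` and getting a non-empty list back would be the signal to run setup.md before relying on stored defaults.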
Quick Reference
Load only the file needed for the current blocker.
| Topic | File |
|---|---|
| Setup guide | setup.md |
| Memory template | memory-template.md |
| Model selection and routing | model-routing.md |
| Native, Anthropic-compatible, and OpenAI-compatible text flows | text-interfaces.md |
| Speech generation and audio delivery | speech-workflows.md |
| Video, music, and async media jobs | media-generation.md |
| MCP boundaries and orchestration choices | mcp-and-orchestration.md |
| Failure recovery and debugging | troubleshooting.md |
Requirements
- MINIMAX_API_KEY for direct MiniMax API usage.
- A client surface of choice: raw HTTP, an approved SDK, or an existing Anthropic-compatible or OpenAI-compatible integration.
- Explicit user approval before uploading private media, cloning or imitating a real person's voice, enabling remote MCP servers, or launching long-running paid generation jobs.
- Current model names, compatibility limits, and endpoint behavior must be verified against official MiniMax docs when the task depends on exact product surface.
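As a small illustration of the first requirement, the following sketch fails early when the key is absent instead of letting a request return a 401 later. The function name `require_api_key` is an assumption for this example; only the `MINIMAX_API_KEY` variable name comes from the list above.

```python
import os


def require_api_key(env=None) -> str:
    """Return MINIMAX_API_KEY, or raise with an actionable message if it is unset."""
    env = os.environ if env is None else env
    key = env.get("MINIMAX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MINIMAX_API_KEY is not set; export it before calling MiniMax APIs."
        )
    return key
```

Passing a mapping as `env` keeps the check testable without touching the real environment.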
Operating Coverage
This skill treats MiniMax as an execution platform, not as a one-line provider swap. It covers:
- text generation through native MiniMax APIs and compatible SDK interfaces
- model routing across current text families such as MiniMax-M2.5, MiniMax-M2.5-highspeed, MiniMax-M2.1, MiniMax-M2.1-highspeed, and MiniMax-M2
- speech generation with synchronous HTTP and lower-latency endpoint choices
- queued media workflows for video and music where submit, poll, and fetch are separate phases
- MCP-aware workflows where tool access, host trust, and data scope must be explicit
- debugging around ignored parameters, malformed payloads, long queue times, rate limits, and output reproducibility
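The model-routing item above could be made explicit with a small fallback table. This is a sketch under stated assumptions: the model names are the ones listed above and should be re-checked against live docs, while the `quality`/`speed` labels, the fallback order, and the `pick_model` helper are illustrative choices, not documented MiniMax behavior.

```python
# Model names come from the list above; the ordering is an assumption.
ROUTES = {
    "quality": ["MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
    "speed": ["MiniMax-M2.5-highspeed", "MiniMax-M2.1-highspeed", "MiniMax-M2"],
}


def pick_model(priority: str, unavailable: frozenset = frozenset()) -> str:
    """Return the first model for this priority that is not marked unavailable."""
    if priority not in ROUTES:
        raise ValueError(f"unknown priority: {priority!r}")
    for model in ROUTES[priority]:
        if model not in unavailable:
            return model
    raise LookupError(f"no model available for priority {priority!r}")
```

Keeping the chain in data rather than in scattered conditionals makes it easier to record the chosen model in routing.md and to update the list when the docs change.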
Data Storage
Keep only durable MiniMax operating context in ~/minimax/:
- which modalities the user actually uses: text, speech, video, music, or MCP-backed flows
- approved models, speed tiers, and compatibility interfaces that worked for real tasks
- output defaults such as JSON parsing rules, audio formats, polling intervals, and retry posture
- media safety rules, consent requirements, and budget boundaries the user explicitly approved
- repeated failures such as 401s, ignored params, queue stalls, or bad prompt templates
Core Rules
1. Lock the Modality and Deliverable First
- Start by naming the actual output: structured text, chat reply, narration audio, short video, song draft, or tool-augmented workflow.
- MiniMax is not one surface. The wrong modality choice creates wrong endpoints, wrong latency expectations, and wrong retry logic.
2. Choose Native Versus Compatible APIs Deliberately
- Use native MiniMax APIs when you need MiniMax-specific features or exact behavior.
- Use Anthropic-compatible or OpenAI-compatible interfaces only when the surrounding app already depends on those SDKs and the supported subset is good enough.
- Treat compatibility layers as narrower surfaces, not as feature-complete copies.
3. Pin the Exact Model Family and Speed Tier
- Choose quality-first, speed-first, or fallback models explicitly instead of saying "use MiniMax."
- Current text routing should start with MiniMax-M2.5 or MiniMax-M2.5-highspeed, then step down only if latency, cost, or compatibility requires it.
- Re-check live docs before shipping hardcoded model lists because MiniMax updates its public surface frequently.
4. Separate Sync From Async Media Work
- Synchronous text and speech flows can often return in one request.
- Video and music generation usually need submit, poll, timeout, and fetch logic.
- Do not design a blocking one-shot workflow for media jobs that are inherently queued.
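The submit, poll, timeout, and fetch phases can be sketched as a loop over caller-supplied functions. Everything here is an assumption for illustration: `run_queued_job`, the `"done"` and `"failed"` status strings, and the default interval and timeout are hypothetical, and the real MiniMax job endpoints and status values must be taken from the official docs.

```python
import time


class JobFailed(Exception):
    """The service reported the job as failed."""


class JobTimeout(Exception):
    """The job did not finish before the deadline; the job id is kept for later recovery."""


def run_queued_job(submit, get_status, fetch, *, poll_interval=5.0, timeout=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Submit a job, poll until it finishes or the deadline passes, then fetch the output."""
    job_id = submit()                      # phase 1: submit
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)        # phase 2: poll
        if status == "done":               # assumed status string
            return fetch(job_id)           # phase 3: fetch
        if status == "failed":             # assumed status string
            raise JobFailed(job_id)
        if clock() >= deadline:
            raise JobTimeout(job_id)
        sleep(poll_interval)
```

Because `JobTimeout` carries the job id, a caller can note it in media-jobs.md and resume polling later instead of resubmitting a paid job.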
5. Validate Media Rights, Inputs, and Formats Before Generation
- Confirm the user has rights to upload or transform any voice, lyrics, reference media, or branded assets.
- Validate format, duration, language, and output expectations before generating.
- Bad asset assumptions waste spend faster than bad prompts.
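A pre-flight check along these lines might look as follows. The `MediaAsset` fields, the `validation_errors` helper, and the idea of passing allowed formats and a duration cap as arguments are all assumptions for this sketch; actual format and duration limits depend on the MiniMax endpoint and should come from its docs.

```python
from dataclasses import dataclass


@dataclass
class MediaAsset:
    name: str
    fmt: str               # file format such as "mp3", without the dot
    duration_s: float
    rights_confirmed: bool  # the user confirmed rights or consent for this asset


def validation_errors(asset: MediaAsset, allowed_formats: set, max_duration_s: float) -> list:
    """Collect every problem up front so nothing is uploaded or billed on a bad asset."""
    errors = []
    if not asset.rights_confirmed:
        errors.append("rights or consent not confirmed")
    if asset.fmt.lower() not in allowed_formats:
        errors.append(f"unsupported format: {asset.fmt}")
    if not 0 < asset.duration_s <= max_duration_s:
        errors.append(f"duration out of range: {asset.duration_s}s")
    return errors
```

Returning a list rather than raising on the first problem lets the agent report all blockers to the user in one message.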
6. Make Cost and Trust Boundaries Explicit
- Multimodal runs can send prompts, media, and metadata off the machine, and costs can accumulate quickly.
- State which endpoint will receive which payload, and stop before remote MCP or large media uploads unless the user approved that path.
- Never normalize remote execution just because the API supports it.
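One way to make the "stop unless approved" rule concrete is a gate that names the endpoint and refuses risky actions without a recorded approval. The action labels mirror the approval requirements listed earlier, but the labels themselves and the `ensure_approved` helper are hypothetical.

```python
# Actions that need explicit user approval (labels are illustrative).
RISKY_ACTIONS = {"upload_private_media", "voice_clone", "remote_mcp", "long_paid_job"}


def ensure_approved(action: str, endpoint: str, approvals: set) -> str:
    """Return a one-line statement of what goes where, or refuse if approval is missing."""
    if action in RISKY_ACTIONS and action not in approvals:
        raise PermissionError(
            f"'{action}' would send data to {endpoint}; ask the user before proceeding"
        )
    return f"{action} -> {endpoint}"
```

The returned string doubles as the statement of which endpoint receives which payload, so it can be shown to the user or logged.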
7. Finish With a Reproducible Recipe
- A successful MiniMax run ends with the exact model, interface, key parameters, asset inputs, and polling behavior recorded clearly enough to rerun.
- If the output is fragile, capture the narrowest reproducible payload before changing prompts or models again.
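A recipe record could be as simple as a dataclass serialized to JSON. The field names, the `"native"` interface label, and the example parameter below are assumptions for illustration, not a schema defined by the skill or by MiniMax.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunRecipe:
    model: str                                   # exact pinned model name
    interface: str                               # e.g. "native" or a compatible SDK label
    params: dict = field(default_factory=dict)   # key request parameters
    asset_inputs: list = field(default_factory=list)
    poll_interval_s: Optional[float] = None      # only for queued media jobs
    timeout_s: Optional[float] = None


def to_json(recipe: RunRecipe) -> str:
    """Serialize with sorted keys so two recipes can be compared with a plain diff."""
    return json.dumps(asdict(recipe), sort_keys=True, indent=2)
```

For example, `to_json(RunRecipe(model="MiniMax-M2.5", interface="native", params={"max_tokens": 512}))` yields text that can be pasted into routing.md and loaded back with `RunRecipe(**json.loads(text))`.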
MiniMax Traps
- Treating every MiniMax feature as available through every SDK shim -> parameters get ignored and debugging starts from a false premise.
- Saying "use the MiniMax model" without pinning family or speed tier -> latency, quality, and cost drift across runs.
- Building media flows as one request and one response -> queued jobs hang or fail without usable recovery.
- Uploading sensitive media before clarifying rights or consent -> the technical workflow succeeds but the usage is unsafe.
- Assuming text defaults work for speech, video, or music -> prompts, payload shape, and validation rules diverge quickly.
- Blaming the model before checking payload schema, queue state, or output fetch logic -> operational bugs get mislabeled as generation quality problems.
- Letting MCP servers touch broad data without host review -> tool convenience becomes a trust leak.
External Endpoints
Only these endpoint categories are allowed unless the user explicitly approves more:
| Endpoint | Data Sent | Purpose |
|---|---|---|
| https://api.minimax.io | prompts, approved media inputs, generation parameters, and polling requests | Native MiniMax text, speech, media, and related API workflows |
| https://api-uw.minimax.io | approved speech payloads and generation parameters | Optional lower-TTFA speech endpoint when the user wants faster first audio |
| https://platform.minimax.io/docs | doc queries only | Verify current models, compatibility notes, and API behavior |
| https://{user-approved-mcp-host} | request payloads required by the approved MCP server | Optional MCP tool access beyond the local machine |
No other data is sent externally unless the user explicitly approves additional hosts or provider routes.
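The table above can be enforced with a small allowlist check before any request leaves the machine. This is a sketch: the hosts are copied from the table, while `is_allowed`, the HTTPS-only rule, and the `approved_mcp_hosts` argument are assumptions made for the example.

```python
from urllib.parse import urlparse

API_HOSTS = {"api.minimax.io", "api-uw.minimax.io"}
DOCS_HOST = "platform.minimax.io"  # only the /docs path is listed in the table


def is_allowed(url: str, approved_mcp_hosts: frozenset = frozenset()) -> bool:
    """Return True only for HTTPS URLs on listed hosts or user-approved MCP hosts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname is None:
        return False
    host = parsed.hostname
    if host in API_HOSTS or host in approved_mcp_hosts:
        return True
    return host == DOCS_HOST and parsed.path.startswith("/docs")
```

Matching on the exact hostname, rather than on a substring of the URL, avoids accepting look-alike hosts such as `api.minimax.io.evil.example`.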
Security & Privacy
Data that leaves your machine:
- prompts and parameters sent to MiniMax API endpoints
- approved media assets or reference files only for the generation workflow the user requested
- optional MCP payloads only for user-approved MCP hosts
- optional documentation lookups against official MiniMax docs
Data that stays local:
- durable operating notes under ~/minimax/
- local prompt drafts, routing choices, and incident notes unless the user exports them
- any rejected or unused assets that never get uploaded
This skill does NOT:
- treat compatible SDKs as exact feature matches without verification
- upload private media, voice references, or lyrics without explicit user intent
- enable remote MCP or broad tool access without explicit approval
- claim that every MiniMax modality is synchronous or instantly available
- modify its own skill files
Trust
By using this skill, prompts and approved media may be sent to MiniMax services, plus any optional user-approved MCP hosts.
Only install if you trust those services with that data.
Scope
This skill ONLY:
- helps operate MiniMax text, speech, video, music, and MCP-related workflows safely
- routes tasks to the right model family, interface, and job pattern
- keeps durable notes for approved defaults, budget boundaries, and recurring failures
This skill NEVER:
- treats MiniMax as a generic provider drop-in without checking interface limits
- suggests voice imitation or media transformation without rights and consent checks
- blurs the line between local orchestration and remote MCP execution
- promises that queued media jobs behave like low-latency text calls
Related Skills
Install with clawhub install if the user confirms:
- ai - Compare MiniMax against other model providers before locking the stack.
- api - Reuse structured HTTP, retry, and payload-debugging patterns around the MiniMax APIs.
- models - Choose the right model family and fallback chain for quality, latency, and cost.
- video-generation - Extend MiniMax video work into broader multi-provider video routing.
- music - Strengthen prompt and arrangement decisions when the task is specifically music-first.
Feedback
- Star the skill: clawhub star minimax
- Stay updated: clawhub sync