📊 ClawBrain Benchmark: Skill Tool

v1.0.2

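Test how your OpenClaw setup performs across 205 real-world scenarios, and compare it with the improvement delivered by the ClawBrain v1.0 orchestration engine.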

by @michaelfeng · MIT-0

AI model access

Use case: AI model access with ClawBrain Benchmark.

License

MIT-0

Last updated

2026/4/8

Security scan

VirusTotal

Harmless

OpenClaw

Suspicious

medium confidence

The skill's description claims to run a 205-scenario benchmark, but the runtime instructions are vague while the skill metadata allows shell execution (exec) and lists curl. This mismatch and the open-ended behavior increase risk.

Recommendations

This skill is ambiguous about what it will actually run. Before installing or invoking it: 1) Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact. 2) If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths. 3) If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first. 4) Avoid providing credentials or sen...

Detailed analysis

⚠ Purpose and capabilities

The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.

⚠ Instruction scope

The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.

✓ Install mechanism

No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.

✓ Credentials

No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.

ℹ Persistence and privileges

The always flag is false, and there are no special persistence requests. However, the skill is configured for exec-style command dispatch, and model invocation is allowed (the platform default). Combined with the skill's vagueness, an autonomous or poorly constrained invocation could perform wide-ranging actions if the agent decides to run shell commands.

Security is layered: review the code before running it.

License

MIT-0

Free to use, modify, and redistribute without attribution.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.2 · 2026/3/23

Fix display name

● Harmless

Install command

Official: npx clawhub@latest install clawbrain-pro-benchmark

Mirror (faster downloads): npx clawhub@latest install clawbrain-pro-benchmark --registry https://cn.longxiaskill.com (mirror available)

Localization notes

Installation: npx clawhub@latest install clawbrain-pro-benchmark

Skill documentation

Test how your AI actually performs in OpenClaw: whether it handles simple tasks reliably, and whether it falls apart on complex ones.

Usage

Just say "run the benchmark" or "test the model's performance".

What it tests

10 categories and 205 real-world scenarios, including:

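Category	What it tests	Why it matters
File operations	Reading, writing, and editing files	Basic skills
Search	Looking up information, fetching web pages	Everyday needs
Messaging	Sending messages via WeChat and DingTalk	Communication and collaboration
Terminal	Running commands, managing services	Development and operations
Multi-step tasks	Search → organize → save → notify	Ability to actually get work done
Error recovery	What it does when something goes wrong	Reliability
Ambiguous instructions	"Help me get things ready"	How smart it is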

Evaluation results

Model	Overall	Error recovery	Ambiguous instructions
ClawBrain Pro (orchestration engine)	~90%	90%+	75%+
GLM-5	83%	80%	65%
MiniMax-M2.5	81%	76%	55%
Qwen3-Coder-Plus	79%	76%	25%
DeepSeek-V3	73%	56%	65%

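ClawBrain Pro's orchestration engine plans the task, coordinates multiple models, and verifies the result (planning → multi-model collaboration → result verification). In these results, its overall score is higher than that of every single model tested.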

Full report: https://clawbrain.dev/blog/openclaw-model-comparison