📊 ClawBrain Benchmark: Skill Tool

v1.0.2

Test how your OpenClaw performs across 205 real-world scenarios, and compare the improvement delivered by the ClawBrain v1.0 orchestration engine.

by @michaelfeng · MIT-0
License: MIT-0
Last updated: 2026/4/8
Security scan
VirusTotal: benign
OpenClaw: suspicious (medium confidence)
The skill's description claims to run a 205-scenario benchmark, but the runtime instructions are vague while the skill metadata allows shell execution (exec) and lists curl. This mismatch and the open-ended behavior increase the risk.
Assessment recommendations
This skill is ambiguous about what it will actually run. Before installing or invoking it: 1) Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact. 2) If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths. 3) If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first. 4) Avoid providing credentials or sen...
Detailed analysis
Purpose and capabilities
The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.
Instruction scope
The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.
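One way to address the missing path constraints noted above is an explicit allow-list check before any file operation. The following is a minimal sketch and not part of the skill: the function name `is_path_allowed` and the sandbox root `/tmp/clawbench` are hypothetical.

```python
from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True only if `candidate` resolves to a location inside an allowed root."""
    # resolve() normalizes ".." segments and follows symlinks where they exist,
    # so "/tmp/clawbench/../etc" is not mistaken for a path under the root.
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False


# Hypothetical sandbox root; the skill itself declares no allowed paths.
ALLOWED_ROOTS = ["/tmp/clawbench"]

print(is_path_allowed("/tmp/clawbench/report.txt", ALLOWED_ROOTS))
print(is_path_allowed("/tmp/clawbench/../etc/passwd", ALLOWED_ROOTS))
```

Comparing resolved paths rather than raw strings also rejects sibling directories such as `/tmp/clawbench-evil`, which a plain prefix check would accept.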
Install mechanism
No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.
Credential requirements
No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.
Persistence and permissions
always is false and there are no special persistence requests. However the skill is configured for exec-style command dispatch and model-invocation is allowed (platform default). Combined with the skill's vagueness, autonomous or poorly constrained invocations could perform wide-ranging actions if the agent decides to run shell commands.
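To make the concern about exec-style dispatch concrete, a restricted test agent could screen each command line before running it. This is a rough sketch under stated assumptions: the program allow-list and the name `is_command_allowed` are hypothetical, and a filter like this is a coarse first line of defense, not a replacement for a real sandbox.

```python
import shlex

# Hypothetical allow-list; the skill does not document which programs it runs.
ALLOWED_PROGRAMS = {"ls", "cat", "echo"}

# Characters that let one command line chain, redirect, or substitute commands.
SHELL_METACHARACTERS = set(";&|`$<>\n")


def is_command_allowed(command_line: str) -> bool:
    """Accept only a single allow-listed program with plain arguments."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    return bool(tokens) and tokens[0] in ALLOWED_PROGRAMS


for line in ["ls -la", "ls && curl http://example.com", "curl http://example.com"]:
    print(line, "->", is_command_allowed(line))
```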
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.2 · 2026/3/23

Fix display name

Benign

Install command

Official: npx clawhub@latest install clawbrain-pro-benchmark
Mirror (accelerated): npx clawhub@latest install clawbrain-pro-benchmark --registry https://cn.longxiaskill.com

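Skill documentation

Test how your AI really performs in OpenClaw: see whether it handles simple tasks well and whether it falls apart on complex ones.

Usage

Just say "run the benchmark" (跑一下 benchmark) or "test how the model performs" (测试一下模型效果).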

What it tests

10 categories, 205 real-world scenarios:

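| Category | What it tests | Why it matters |
| --- | --- | --- |
| File operations | Reading, writing, and editing files | Fundamentals |
| Search | Looking up information, fetching web pages | Everyday needs |
| Messaging | Sending messages via WeChat and DingTalk | Communication and collaboration |
| Terminal | Running commands, managing services | Development and operations |
| Multi-step tasks | Search → organize → save → notify | The ability to actually get things done |
| Error recovery | What to do when something goes wrong | Reliability |
| Ambiguous instructions | "Help me get ready" (帮我准备下) | Intelligence |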
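The skill does not document how per-category scores are computed. As an illustration only, assuming each scenario is recorded as a simple pass or fail, a per-category pass rate could be aggregated as follows. The `ScenarioResult` type and the sample outcomes are hypothetical and are not real benchmark data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    category: str  # e.g. "file operations", "error recovery"
    passed: bool   # assumption: each scenario is scored pass/fail


def pass_rate_by_category(results: list[ScenarioResult]) -> dict[str, float]:
    """Return the fraction of passed scenarios for each category."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for result in results:
        totals[result.category] += 1
        if result.passed:
            passes[result.category] += 1
    return {category: passes[category] / totals[category] for category in totals}


# Made-up sample outcomes for illustration only.
sample = [
    ScenarioResult("file operations", True),
    ScenarioResult("file operations", True),
    ScenarioResult("error recovery", True),
    ScenarioResult("error recovery", False),
]
print(pass_rate_by_category(sample))
```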

Benchmark results

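| Model | Overall | Error recovery | Ambiguous instructions |
| --- | --- | --- | --- |
| ClawBrain Pro (orchestration engine) | ~90% | 90%+ | 75%+ |
| GLM-5 | 83% | 80% | 65% |
| MiniMax-M2.5 | 81% | 76% | 55% |
| Qwen3-Coder-Plus | 79% | 76% | 25% |
| DeepSeek-V3 | 73% | 56% | 65% |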
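The gap between ClawBrain Pro and the strongest single model in each column can be worked out directly from the figures above. The sketch below transcribes the table and assumes the approximate ClawBrain Pro entries ("~90%", "90%+", "75%+") can be read as 90, 90, and 75, so the margins it prints are approximate.

```python
# Scores transcribed from the results table, in percent.
# Tuple order: (overall, error recovery, ambiguous instructions)
single_models = {
    "GLM-5": (83, 80, 65),
    "MiniMax-M2.5": (81, 76, 55),
    "Qwen3-Coder-Plus": (79, 76, 25),
    "DeepSeek-V3": (73, 56, 65),
}

# Assumption: "~90%", "90%+", "75%+" are read as 90, 90, and 75.
clawbrain_pro = (90, 90, 75)

columns = ("overall", "error recovery", "ambiguous instructions")
margins = {}
for i, column in enumerate(columns):
    # Best score any single model achieves in this column.
    best_single = max(scores[i] for scores in single_models.values())
    margins[column] = clawbrain_pro[i] - best_single
    print(f"{column}: best single model {best_single}%, margin {margins[column]} points")
```

Under that reading, the margin is about 7 points overall and about 10 points on both error recovery and ambiguous instructions.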
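ClawBrain Pro uses its orchestration engine to run a plan → multi-model collaboration → result verification pipeline, and its overall score exceeds that of every single model tested.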
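The source does not describe how the orchestration engine is built. Purely as an illustration of the plan, multi-model, verify shape, here is a toy sketch in which a simple fallback chain stands in for multi-model collaboration. Every name here (`plan`, `verify`, `orchestrate`, and the stand-in models) is hypothetical and is not ClawBrain's implementation.

```python
from typing import Callable

Model = Callable[[str], str]


def plan(task: str) -> list[str]:
    """Hypothetical planner: split a task into steps on the '→' separator."""
    return [step.strip() for step in task.split("→") if step.strip()]


def verify(step: str, answer: str) -> bool:
    """Hypothetical verifier: accept any non-empty answer."""
    return bool(answer.strip())


def orchestrate(task: str, models: list[Model]) -> list[str]:
    outputs = []
    for step in plan(task):
        # Try each model in turn and keep the first answer that verifies.
        for model in models:
            answer = model(step)
            if verify(step, answer):
                outputs.append(answer)
                break
        else:
            raise RuntimeError(f"no model produced a verified answer for: {step}")
    return outputs


# Stand-in "models" for illustration; real ones would call an LLM API.
def flaky_model(step: str) -> str:
    return ""  # always fails verification


def echo_model(step: str) -> str:
    return f"done: {step}"


print(orchestrate("search → organize → save → notify", [flaky_model, echo_model]))
```

The verification step is what lets a weak answer from one model be replaced by another's, which is one plausible reason an orchestrated setup could score higher on error recovery than any single model.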

Full report: https://clawbrain.dev/blog/openclaw-model-comparison

Data source: ClawHub · Chinese localization: 龙虾技能库