📊 ClawBrain Benchmark: Skill Tool
v1.0.2 · Test how your OpenClaw performs across 205 real-world scenarios, and compare the improvement delivered by the ClawBrain v1.0 orchestration engine.
Security scan
OpenClaw
Suspicious
Medium confidence. The skill's description claims to run a 205-scenario benchmark, but the runtime instructions are vague while the skill metadata allows shell execution (exec) and lists curl. This mismatch and the open-ended behavior increase risk.
Assessment recommendations
This skill is ambiguous about what it will actually run. Before installing or invoking it: 1) Ask the developer for the exact commands/scripts the skill will execute and any network endpoints it will contact. 2) If the skill must run benchmarks that interact with your system (files, shell, messaging), require a safe, sandboxed mode and explicit allowed paths. 3) If you allow it to run, disable autonomous invocation or run it in a restricted/test agent first. 4) Avoid providing credentials or sen...
⚠ Purpose and capability
The skill claims to run extensive benchmarks across file, terminal, messaging, and multi-step scenarios. The metadata lists curl and sets command-dispatch to exec, but SKILL.md contains no concrete commands, no scripts, and no explanation of what will actually be executed. Requiring a shell exec capability and curl is disproportionate unless the skill documents what network calls or shell commands it will run.
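One way a reviewer might make the exec capability proportionate is to gate every command behind an explicit allow-list. The following is a minimal sketch under stated assumptions, not part of the skill or of OpenClaw: the names `ALLOWED_EXECUTABLES` and `is_command_allowed` are hypothetical, and the allow-list contains only curl because that is the sole dependency the metadata declares.

```python
import shlex

# Hypothetical allow-list (an assumption for illustration): only the
# executable that the skill metadata declares.
ALLOWED_EXECUTABLES = {"curl"}

# Characters that let a shell chain, redirect, or substitute commands.
SHELL_METACHARACTERS = ";|&><`$\n"


def is_command_allowed(command: str, allowed=ALLOWED_EXECUTABLES) -> bool:
    """Return True only for a single invocation of an approved executable."""
    # Conservative: reject anything containing shell metacharacters,
    # even inside quotes (so URLs containing '&' are rejected too).
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar parse errors.
        return False
    return bool(tokens) and tokens[0] in allowed
```

The check is deliberately strict: it prefers rejecting a legitimate command over letting a chained one through, which suits a first run in a restricted or test agent.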
⚠ Instruction scope
The SKILL.md is high-level and open-ended: it tells the agent to 'run the benchmark' but provides no step-by-step commands, no allowed file paths, and no constraints. The benchmark categories (file ops, terminal commands, messaging) imply actions that could read/write files, run arbitrary shell commands, or send messages — yet the skill does not limit or document those actions. That vagueness grants broad discretion to any agent invocation.
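The missing "allowed file paths" constraint could look roughly like the check below. This is an illustrative sketch only: `is_path_allowed` and the example sandbox directory are assumptions, not anything the skill or SKILL.md defines.

```python
from pathlib import Path


def is_path_allowed(candidate: str, allowed_roots: list[str]) -> bool:
    """Return True if candidate resolves to an allowed root or a path beneath one."""
    # resolve() normalizes '..' segments and follows symlinks, so a path
    # cannot escape the sandbox by traversal.
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

For example, with a hypothetical sandbox root of `/tmp/bench`, the path `/tmp/bench/out.txt` passes, while `/tmp/bench/../secret` and `/tmp/benchmark/x` do not, because the comparison is made on resolved path components rather than on string prefixes.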
✓ Install mechanism
No install spec and no code files—this is instruction-only, so nothing is written to disk by an installer. That keeps install risk low.
✓ Credential requirements
No environment variables, credentials, or config paths are requested. The only declared dependency is curl, which could be reasonable for fetching reports — but because commands are unspecified, it's unclear why curl is required.
ℹ Persistence and privileges
always is false and there are no special persistence requests. However the skill is configured for exec-style command dispatch and model-invocation is allowed (platform default). Combined with the skill's vagueness, autonomous or poorly constrained invocations could perform wide-ranging actions if the agent decides to run shell commands.
Security is layered; review the code before running.
Runtime dependencies
No special dependencies
Version
latest · v1.0.2 · 2026/3/23
Fix display name
● Benign
Install command
Official: npx clawhub@latest install clawbrain-pro-benchmark
Mirror (accelerated): npx clawhub@latest install clawbrain-pro-benchmark --registry https://cn.longxiaskill.com
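Skill documentation
Test how your AI really performs in OpenClaw. See whether it can handle simple tasks, and whether it falls apart on complex ones.
Usage
Just say "run the benchmark" ("跑一下 benchmark") or "test the model's performance" ("测试一下模型效果").
What is tested
10 categories, 205 real-world scenarios: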
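| Category | What is tested | Why it matters |
|---|---|---|
| File operations | Reading, writing, and editing files | The fundamentals |
| Search | Looking up information, fetching web pages | Everyday needs |
| Messaging | Sending messages via WeChat and DingTalk | Communication and collaboration |
| Terminal | Running commands, managing services | Development and operations |
| Multi-step tasks | Search → organize → save → notify | The ability to actually get work done |
| Error recovery | What to do when something goes wrong | Reliability |
| Ambiguous instructions | "Help me get ready" | Intelligence |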
Benchmark results
| Model | Overall | Error recovery | Ambiguous instructions |
|---|---|---|---|
| ClawBrain Pro (orchestration engine) | ~90% | 90%+ | 75%+ |
| GLM-5 | 83% | 80% | 65% |
| MiniMax-M2.5 | 81% | 76% | 55% |
| Qwen3-Coder-Plus | 79% | 76% | 25% |
| DeepSeek-V3 | 73% | 56% | 65% |
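To read the table above at a glance, it can help to compute how far each model's ambiguous-instruction score falls below its overall score. The sketch below uses only the four rows with exact figures (the ClawBrain Pro row is approximate and is left out); the dictionary keys and the `ambiguity_gap` helper are hypothetical names introduced for illustration.

```python
# Scores copied from the results table (percentages). Key names are
# assumptions made for this sketch, not identifiers from the benchmark.
scores = {
    "GLM-5": {"overall": 83, "error_recovery": 80, "ambiguous": 65},
    "MiniMax-M2.5": {"overall": 81, "error_recovery": 76, "ambiguous": 55},
    "Qwen3-Coder-Plus": {"overall": 79, "error_recovery": 76, "ambiguous": 25},
    "DeepSeek-V3": {"overall": 73, "error_recovery": 56, "ambiguous": 65},
}


def ambiguity_gap(row: dict) -> int:
    """Percentage points by which the ambiguous-instruction score trails the overall score."""
    return row["overall"] - row["ambiguous"]


gaps = {model: ambiguity_gap(row) for model, row in scores.items()}
widest = max(gaps, key=gaps.get)

for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {gap} points")
```

On these figures, Qwen3-Coder-Plus shows the widest gap (54 points) and DeepSeek-V3 the narrowest (8 points), which is the kind of difference the overall column alone hides.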
Full report: https://clawbrain.dev/blog/openclaw-model-comparison