Agent Benchmark: Skill Tool

v0.1.1

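Evaluates AI agent capability on 12 standardized tasks covering file operations, data processing, system operations, robustness, and code quality, with automatic scoring and report generation.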

by @yuyonghao-123·MIT-0
License: MIT-0
Last updated: 2026/3/31
Security scan
VirusTotal: harmless
OpenClaw: suspicious (medium confidence)
The skill mostly does what a benchmark runner would, but the documentation and the shipped code disagree in several places. It also executes arbitrary task code and writes reports outside its own folder, so review it before installing it or running it with sensitive data.
Assessment recommendations
What to check before installing or running this skill:
- Do not run the skill with sensitive environment variables present. The runner forwards your process.env into any executed task code, and tasks can read files and environment variables.
- The SKILL.md instructs running a PowerShell runner that is not present in the package; a Node-based runner (index.js) is included instead. Confirm which runner you intend to use and why the documentation and code differ.
- Review index.js and any task files (tasks.json, de...
Detailed analysis
Purpose and capability
The README/SKILL.md describes a PowerShell-based 'benchmark-runner.ps1' and instructs running PowerShell scripts, but the package actually contains a Node CLI (index.js) and Node test harness. The SKILL metadata lists no required binaries, yet index.js spawns external runtimes ('python', 'node', 'go run'). The repository also ships multiple task sets (PowerShell-style tasks and Python tasks) so it's unclear which runner is authoritative. These inconsistencies mean the claimed purpose (PowerShell edition runner) doesn't fully align with the actual code and runtime assumptions.
Instruction scope
SKILL.md tells users to run a local PowerShell runner path that is not present in the file manifest. The instructions encourage providing custom task files; index.js will write task code to disk and execute it with child processes, forwarding full process.env to children and allowing those tasks to read environment variables and the filesystem. The runtime behavior (create temp dirs, execute arbitrary code from tasks, and write reports) is broader than the missing/mismatched documentation implies and could run arbitrary user-supplied code.
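One way to narrow what a task can read is to pass an explicit allowlist of environment variables to the child process instead of the full process.env. The sketch below is illustrative and is not the code shipped in index.js; the names `ENV_ALLOWLIST`, `buildChildEnv`, and `runNodeTask`, and the allowlist contents, are assumptions made for this example.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical allowlist: only variables an interpreter typically needs to start.
const ENV_ALLOWLIST = ["PATH", "HOME", "TEMP", "TMP", "SystemRoot"];

// Copy only allowlisted variables from the parent environment.
function buildChildEnv(parentEnv, allowlist = ENV_ALLOWLIST) {
  const childEnv = {};
  for (const name of allowlist) {
    if (parentEnv[name] !== undefined) {
      childEnv[name] = parentEnv[name];
    }
  }
  return childEnv;
}

// Run a Node task script with the reduced environment and a timeout.
function runNodeTask(scriptPath, parentEnv = process.env) {
  return spawnSync(process.execPath, [scriptPath], {
    env: buildChildEnv(parentEnv),
    encoding: "utf8",
    timeout: 10000,
  });
}
```

An allowlist only limits environment exposure. A task started this way can still read any file the user account can read, so it does not replace reviewing task definitions.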
Install mechanism
There is no install spec (instruction-only), which is low-risk. However, the Node program expects external interpreters (python/node/go) to be available. Those required binaries are not declared in the skill metadata or SKILL.md, so runtime failures or hidden execution of local interpreters are possible if binaries exist. No remote downloads or unusual install steps are present.
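Because the required interpreters are not declared anywhere, a preflight check can make the dependency explicit before any task runs. The following is a minimal sketch, assuming the three runtimes named in the analysis; `RUNTIME_PROBES`, `isAvailable`, and `findMissingRuntimes` are hypothetical helpers, and the probe arguments are choices made for this example rather than anything taken from index.js.

```javascript
const { spawnSync } = require("node:child_process");

// Version probes for the runtimes the analysis says index.js spawns.
// The probe arguments are assumptions chosen for this sketch.
const RUNTIME_PROBES = {
  python: ["--version"],
  node: ["--version"],
  go: ["version"],
};

// Return true if the command starts and exits with status 0.
function isAvailable(command, args) {
  const result = spawnSync(command, args, { stdio: "ignore", timeout: 5000 });
  return !result.error && result.status === 0;
}

// List the runtimes that could not be started, so a run can fail early
// with a clear message instead of failing inside a task.
function findMissingRuntimes(probes = RUNTIME_PROBES, check = isAvailable) {
  return Object.entries(probes)
    .filter(([command, args]) => !check(command, args))
    .map(([command]) => command);
}
```

Calling `findMissingRuntimes()` before a run returns an array such as the names of any interpreters not found on PATH; an empty array means all three probes succeeded.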
Credential requirements
The skill requests no explicit credentials, but index.js passes the agent's full environment into spawned child processes and some default tasks read environment variables (e.g., task-011). The runner also writes reports to '../../memory/benchmark-results.md' (outside the skill folder), which persists data into an agent memory area. Executing arbitrary task code therefore has the ability to read environment variables and exfiltrate data if malicious task definitions are provided — this capability is proportional if you only run trusted tasks, but risky otherwise.
Persistence and privileges
always: false (good), but index.js writes a generated report into a path two levels up ('../../memory/benchmark-results.md'), which is outside the skill directory and likely into the agent's persistent memory area. The skill therefore persists output in a global location without declaring that behavior in metadata or prompting the user.
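The escape from the skill directory can be checked mechanically by resolving the report path against the skill folder and testing whether the result still lies inside it. This is a sketch of that check, not code from the package; `isInsideDir` and the example directory `/skills/agent-benchmark` are assumptions.

```javascript
const path = require("node:path");

// Return true if targetPath, resolved against baseDir, stays inside baseDir.
function isInsideDir(baseDir, targetPath) {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, targetPath);
  const relative = path.relative(base, resolved);
  // A relative path that is ".." or begins with "../" climbs out of baseDir;
  // an absolute result means a different root (for example another drive).
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  return !escapes;
}

// Hypothetical skill location; the report path is the one quoted in the analysis.
const skillDir = "/skills/agent-benchmark";
const reportPath = "../../memory/benchmark-results.md";
console.log(isInsideDir(skillDir, reportPath)); // false
```

The check is purely lexical and does not follow symlinks, so a stricter version would compare real paths after the target's parent directory exists.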
index.js:89
Shell command execution detected (child_process).
Security is layered; review the code before running.

License

MIT-0

Free to use, modify, and redistribute, with no attribution required.

Runtime dependencies

No special dependencies

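Versions

latest · v0.1.1 · 2026/3/30

- Updated package.json: version bumped from 0.1.0 to 0.1.1
- No changes to SKILL.md or the core feature documentation
- Minor package metadata update only; no functional or documentation changes

Harmless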

Install command

Official: npx clawhub@latest install yuyonghao-agent-benchmark
Mirror: npx clawhub@latest install yuyonghao-agent-benchmark --registry https://cn.clawhub-mirror.com
Data source: ClawHub · Chinese localization: 龙虾技能库