📊 Benchmark Model Provider — AI 模型基准测试与评估
v1.0.5根据用户的具体用途、领域和使用频率,构建基准测试套件,评估和排名 AI 提供商/模型。帮助用户选择最适合其工作流的模型,提供可审阅、可分享的报告。
0· 115·0 当前·0 累计
安全扫描
OpenClaw
安全
high confidence该技能内部一致:通过向用户配置的 OpenAI 兼容端点发送提示来基准测试模型,仅要求 python3 和 BENCHMARK_API_KEY,其脚本和指令与此目的相符。
评估建议
该技能看似合理用于模型基准测试,但会将提示、输出和 BENCHMARK_API_KEY 发送到配置的 base_url。运行前,请(1)验证 base_url 为信任的 OpenAI 兼容端点,(2)先使用非敏感提示进行测试,(3)在隔离环境中运行并从 requirements.txt 安装 PyYAML/reportlab,(4)仅在明确需要自动发布时提供 Vercel/Netlify/GitHub 令牌。如果需要更严格的防护,请查看 run_benchmark.py 和 publish_report.py 以确认凭证和工件的使用/存储方式。...详细分析 ▾
✓ 用途与能力
Name/description, required binary (python3), required env (BENCHMARK_API_KEY), example specs, and scripts all align with a benchmarking tool that calls OpenAI‑compatible endpoints. The listed optional publishing helpers (Vercel/Netlify) are consistent with the report-publishing feature.
ℹ 指令范围
SKILL.md and scripts explicitly perform network I/O to the base_url from a benchmark spec and use the BENCHMARK_API_KEY for auth. This is expected for the stated purpose, but means prompts, model outputs, and the API key will be sent to whichever endpoint the user configures — the skill warns about this. The instructions do not ask for unrelated secrets or arbitrary system files.
ℹ 安装机制
There is no platform install spec (no remote downloads). The repo includes Python scripts and a small requirements.txt (PyYAML, reportlab). This is low risk; packages are standard and the code is shipped with the skill. Users should still install dependencies in an isolated environment before running.
✓ 凭证需求
Only BENCHMARK_API_KEY is required (declared as primary). References mention an optional VERCEL_TOKEN for non-interactive publishing, but that is not required by default. No unrelated credentials or excessive env requests are present.
✓ 持久化与权限
The skill does not request always:true or system-wide privileges. It stores run artifacts (raw outputs, metrics, reports) locally for audit/reranking — consistent with its purpose. Publishing to web hosts is explicit and documented; it only occurs when the user chooses that step.
安全有层次,运行前请审查代码。
运行时依赖
无特殊依赖
版本
latestv1.0.52026/4/1
安全强化:移除 Vercel 自动部署;添加 base_url 安全检查;更新文档以包含静态托管建议。
● 无害
安装命令
点击复制官方npx clawhub@latest install benchmark-model-provider
镜像加速npx clawhub@latest install benchmark-model-provider --registry https://cn.longxiaskill.com
技能文档
(由于原始内容中 SKILL.md 文档部分已经包含中文说明和英文原文,以下仅提供必要的中文摘要,如果需要完整的中文 SKILL.md 请另外提供原始英文 SKILL.md 文件)
中文说明 当用户想知道“哪个模型更聪明、更便宜、更适合日常工作流、更适合研究/写报告/编程”时,使用这个技能。它不会给出泛泛而谈的“最佳模型”建议,而是根据用户自己的实际任务构建基准测试,保留原始结果、重新排序,并生成可审阅、可分享的报告。