📊 Benchmark Model Provider — AI 模型基准测试与评估

v1.0.5

根据用户的具体用途、领域和使用频率，构建基准测试套件，评估和排名 AI 提供商/模型。帮助用户选择最适合其工作流的模型，提供可审阅、可分享的报告。

0· 115·0 当前·0 累计

by @tankisstank·MIT-0

数据与API AI模型访问

使用场景：使用Benchmark Model Provider — AI 模型基准测试与评估进行数据与API使用Benchmark Model Provider — AI 模型基准测试与评估

下载技能包

License

MIT-0

最后更新

2026/4/2

安全扫描

VirusTotal

无害

查看报告

OpenClaw

安全

high confidence

该技能内部一致：通过向用户配置的 OpenAI 兼容端点发送提示来基准测试模型，仅要求 python3 和 BENCHMARK_API_KEY，其脚本和指令与此目的相符。

评估建议

该技能看似合理用于模型基准测试，但会将提示、输出和 BENCHMARK_API_KEY 发送到配置的 base_url。运行前，请（1）验证 base_url 为信任的 OpenAI 兼容端点，（2）先使用非敏感提示进行测试，（3）在隔离环境中运行并从 requirements.txt 安装 PyYAML/reportlab，（4）仅在明确需要自动发布时提供 Vercel/Netlify/GitHub 令牌。如果需要更严格的防护，请查看 run_benchmark.py 和 publish_report.py 以确认凭证和工件的使用/存储方式。...

详细分析 ▾

✓ 用途与能力

Name/description, required binary (python3), required env (BENCHMARK_API_KEY), example specs, and scripts all align with a benchmarking tool that calls OpenAI‑compatible endpoints. The listed optional publishing helpers (Vercel/Netlify) are consistent with the report-publishing feature.

ℹ 指令范围

SKILL.md and scripts explicitly perform network I/O to the base_url from a benchmark spec and use the BENCHMARK_API_KEY for auth. This is expected for the stated purpose, but means prompts, model outputs, and the API key will be sent to whichever endpoint the user configures — the skill warns about this. The instructions do not ask for unrelated secrets or arbitrary system files.

ℹ 安装机制

There is no platform install spec (no remote downloads). The repo includes Python scripts and a small requirements.txt (PyYAML, reportlab). This is low risk; packages are standard and the code is shipped with the skill. Users should still install dependencies in an isolated environment before running.

✓ 凭证需求

Only BENCHMARK_API_KEY is required (declared as primary). References mention an optional VERCEL_TOKEN for non-interactive publishing, but that is not required by default. No unrelated credentials or excessive env requests are present.

✓ 持久化与权限

The skill does not request always:true or system-wide privileges. It stores run artifacts (raw outputs, metrics, reports) locally for audit/reranking — consistent with its purpose. Publishing to web hosts is explicit and documented; it only occurs when the user chooses that step.

安全有层次，运行前请审查代码。

License

MIT-0

可自由使用、修改和再分发，无需署名。

查看条款 ↗

运行时依赖

无特殊依赖

版本

latestv1.0.52026/4/1

安全强化：移除 Vercel 自动部署；添加 base_url 安全检查；更新文档以包含静态托管建议。

● 无害

安装命令

点击复制

官方npx clawhub@latest install benchmark-model-provider

镜像加速npx clawhub@latest install benchmark-model-provider --registry https://cn.longxiaskill.com 镜像可用

本土化适配说明

Benchmark Model Provider — AI 模型基准测试与评估安装说明：安装命令：npx clawhub@latest install benchmark-model-provider

需要定制？告诉我你的需求 →

技能文档

（由于原始内容中 SKILL.md 文档部分已经包含中文说明和英文原文，以下仅提供必要的中文摘要，如果需要完整的中文 SKILL.md 请另外提供原始英文 SKILL.md 文件）

中文说明 当用户想知道“哪个模型更聪明、更便宜、更适合日常工作流、更适合研究/写报告/编程”时，使用这个技能。它不会给出泛泛而谈的“最佳模型”建议，而是根据用户自己的实际任务构建基准测试，保留原始结果、重新排序，并生成可审阅、可分享的报告。

License

运行时依赖

版本

安装命令

本土化适配说明

技能文档

相关技能推荐