Security Scan
OpenClaw
Security
High confidence. The skill is internally coherent: its instructions, lack of installs, and lack of required credentials line up with its stated purpose of providing evaluation patterns and prompt templates for LLM-as-judge workflows.
Evaluation Recommendations
This skill is a focused playbook for building LLM-as-judge systems and appears coherent and low-risk. Before installing or using it: (1) Review whether you want the evaluator to produce chain-of-thought justifications; these can reveal internal reasoning or sensitive prompt/context, and can be disabled if you want only scores. (2) When evaluating private or sensitive content, ensure your evaluation pipeline does not send that data to external models or services you don't control. (3) Prefer using a different model as the evaluator than the one that generated the outputs, to reduce self-enhancement bias.
✓ Purpose & Capabilities
Name and description match the SKILL.md content: the doc is a detailed playbook for LLM-based evaluation (direct scoring, pairwise comparison, rubrics, bias mitigation). There are no requests for unrelated binaries, credentials, or system access that would be out of scope for an evaluation skill.
ℹ Instruction Scope
The instructions are detailed and focused on evaluation techniques, prompting patterns, bias mitigation, calibration, and statistical analysis. One notable instruction pattern is to require justifications (chain-of-thought) before giving scores; this is appropriate for reliability but can expose evaluator reasoning that some users may prefer to keep private. The skill does not instruct the agent to read local files, environment variables, or send results to external endpoints beyond whatever evaluation pipeline the user implements.
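The justification-before-score pattern described above can be sketched as a small prompt-and-parse helper. This is an illustrative example, not taken from the skill itself: the prompt wording, the `Score:` line format, and the `parse_judgment` helper are all assumptions; a real pipeline would send `JUDGE_PROMPT` to a model rather than hard-code a reply.

```python
import re

# Hypothetical judge prompt (illustrative wording): the justification is
# requested *before* the score so the model commits to its reasoning first.
JUDGE_PROMPT = """You are evaluating a response for helpfulness on a 1-5 scale.

Response to evaluate:
{response}

First write a short justification, then give the score on its own line as:
Score: <1-5>"""


def parse_judgment(judge_output: str) -> tuple[str, int]:
    """Split a judge reply into (justification, score).

    The score is expected on the final line as 'Score: N'; everything
    before it is treated as the chain-of-thought justification, which
    callers can discard if they want to store only numeric scores.
    """
    text = judge_output.strip()
    match = re.search(r"Score:\s*([1-5])\s*$", text)
    if not match:
        raise ValueError("no score found in judge output")
    justification = text[: match.start()].strip()
    return justification, int(match.group(1))


# Example judge reply (illustrative, not real model output):
reply = "The answer is accurate but omits edge cases.\nScore: 4"
justification, score = parse_judgment(reply)
```

Keeping parsing separate from prompting makes it easy to drop the justification at storage time when exposing evaluator reasoning is undesirable.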
✓ Installation Mechanism
Instruction-only skill with no install spec and no code files. This minimizes disk writes and arbitrary code execution; there are no download URLs, packages, or binaries installed by this skill.
✓ Credential Requirements
The skill declares no required environment variables, credentials, or config paths. The guidance to use separate models for generation vs evaluation is sensible but not enforced by any hidden credential requests.
✓ Persistence & Permissions
`always` is false and the skill is user-invocable. It does not request persistent system presence or modify other skills' configuration. Autonomous invocation is allowed (the platform default) but is not combined with any other privilege escalation here.
Security is layered: review the code before running it.
Runtime Dependencies
No special dependencies
Versions
latest (v1.0.0, 2026/4/1)
Initial release of advanced-evaluation, a comprehensive skill for building robust LLM evaluation systems.
- Provides actionable guidance for implementing LLM-as-judge in automated pipelines.
- Explains evaluation methods: direct scoring vs. pairwise comparison, with reliability and bias considerations.
- Details systemic LLM biases (e.g., position, length, self-enhancement) and mitigation strategies.
- Outlines metric selection frameworks for different evaluation tasks.
- Supplies prompt templates and protocols for direct scoring, pairwise comparison, and rubric creation.
- Offers practical patterns for evaluation pipeline design and rubric adaptation by domain.
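One of the bias mitigations the release notes mention, position bias in pairwise comparison, is commonly handled by judging each pair twice with the candidates' order swapped and only counting consistent verdicts. The sketch below illustrates that idea; the `judge` function is a toy stand-in (it deliberately mimics length bias so the example runs without a model call), and all names are assumptions rather than the skill's actual API.

```python
def judge(first: str, second: str) -> str:
    """Placeholder judge returning 'first' or 'second'.

    A real pipeline would call an LLM here; this toy version prefers the
    longer answer (mimicking length bias) purely so the example executes.
    """
    return "first" if len(first) >= len(second) else "second"


def compare_debiased(a: str, b: str) -> str:
    """Judge (a, b) and (b, a); declare a winner only if both orders agree."""
    v1 = judge(a, b)  # a shown in the first position
    v2 = judge(b, a)  # b shown in the first position
    a_wins = v1 == "first" and v2 == "second"
    b_wins = v1 == "second" and v2 == "first"
    if a_wins:
        return "a"
    if b_wins:
        return "b"
    return "tie"  # inconsistent verdicts are treated as a tie
```

Treating order-inconsistent verdicts as ties is a conservative choice: it trades some statistical power for protection against the judge's positional preference contaminating win rates.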
● Harmless
Install Command
Official: npx clawhub@latest install advanced-evaluation
Mirror (accelerated): npx clawhub@latest install advanced-evaluation --registry https://cn.longxiaskill.com