Installation
Skylv Prompt Evaluation
Install command:
openclaw skills install skylv-prompt-evaluation
For faster downloads within mainland China, add the --registry https://cn.longxiaskill.com parameter to use the domestic mirror.
Skill Documentation
Prompt Evaluation
Evaluate and benchmark AI prompts for quality, consistency, and performance. Score, compare, and optimize your prompts systematically.
Overview
A prompt evaluation framework that helps agents measure prompt quality across multiple dimensions: clarity, specificity, robustness, cost-efficiency, and output consistency. Compare prompt variants and find the optimal version.
Capabilities
- Quality Scoring
Scores prompts on clarity (0-10), specificity (0-10), robustness (0-10), and cost-efficiency (0-10).
- A/B Comparison
Runs statistical A/B tests between prompt variants with significance analysis.
- Consistency Check
Measures output consistency across multiple runs to find the most stable prompts.
- Regression Testing
Detects quality regressions between prompt versions using golden test sets.
- Cost Analysis
Estimates token usage and costs for different prompt variants and models.
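The A/B comparison capability can be illustrated with a small significance check. The source does not say which statistical test the skill uses, so the sketch below assumes a two-proportion z-test on pass/fail outcomes. The function names `two_proportion_z_test` and `is_significant` are hypothetical and not part of the skill.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Return (z, two-sided p-value) for the difference in pass rates.

    Assumption: each trial is scored as pass/fail, and the trial counts are
    large enough for the normal approximation to be reasonable.
    """
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        # Both variants passed (or failed) every trial: no detectable difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


def is_significant(p_value, significance_level=0.05):
    """Compare a p-value with the chosen significance level."""
    return p_value < significance_level


# Hypothetical data: variant A passes 30 of 50 trials, variant B passes 42 of 50.
z, p = two_proportion_z_test(30, 50, 42, 50)
print(f"z={z:.2f}, significant={is_significant(p)}")
```

If scores are continuous (for example, the 0-10 dimension scores) rather than pass/fail, a t-test or a permutation test would likely be a better fit than this sketch.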
Configuration
{
  "evaluation": {
    "dimensions": ["clarity", "specificity", "robustness", "cost"],
    "scoringModel": "gpt-4",
    "abTest": { "trials": 50, "significanceLevel": 0.05 },
    "consistency": { "runs": 100, "varianceThreshold": 0.15 },
    "regression": { "degradationThreshold": "5%", "goldenSet": "./golden-set.jsonl" }
  }
}
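As a rough illustration of how the two thresholds in this configuration might be applied, the sketch below reads `varianceThreshold` and `degradationThreshold` and checks some scores against them. The source does not define either threshold precisely. The sketch assumes that `varianceThreshold` is a cap on the population variance of per-run scores and that `degradationThreshold` is a relative drop against a baseline score. All helper names are hypothetical.

```python
import json
import statistics

# Only the two threshold settings from the configuration above are reproduced here.
CONFIG_TEXT = """
{
  "evaluation": {
    "consistency": { "varianceThreshold": 0.15 },
    "regression": { "degradationThreshold": "5%" }
  }
}
"""


def parse_percent(text):
    """Convert a string such as "5%" into the fraction 0.05."""
    return float(text.strip().rstrip("%")) / 100.0


def is_consistent(scores, variance_threshold):
    """Assumed rule: stable when the population variance stays within the threshold."""
    return statistics.pvariance(scores) <= variance_threshold


def has_regressed(baseline_score, candidate_score, degradation_threshold):
    """Assumed rule: regression when the relative drop exceeds the threshold."""
    if baseline_score <= 0:
        return False  # no meaningful baseline to compare against
    drop = (baseline_score - candidate_score) / baseline_score
    return drop > degradation_threshold


config = json.loads(CONFIG_TEXT)["evaluation"]
variance_threshold = config["consistency"]["varianceThreshold"]
degradation_threshold = parse_percent(config["regression"]["degradationThreshold"])

# Hypothetical scores, for illustration only.
print(is_consistent([0.80, 0.82, 0.79, 0.81], variance_threshold))
print(has_regressed(8.0, 7.5, degradation_threshold))
```

Storing the degradation threshold as a string like "5%" means it has to be parsed before any comparison, which is why the sketch converts it to a fraction first.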
Use Cases
- Prompt Engineering: Systematically improve prompt quality
- Quality Assurance: Ensure prompts meet quality standards before production
- Cost Optimization: Find prompts that achieve goals with fewer tokens
- Version Control: Track prompt quality across versions
- Agent Tuning: Optimize agent system prompts for consistency