PA Eval: PA Evaluation

v1.0.1

A PA evaluation tool.

by @netanel-abergel (Netanel Abergel)·MIT-0
License: MIT-0
Last updated: 2026/4/3
Security scan
VirusTotal: Harmless
OpenClaw: Safe (high confidence)
The skill's instructions, filesystem usage, and lack of required credentials or installs are consistent with a PA performance-evaluation tool; nothing requests unrelated access or external installs.
Assessment recommendations
This skill appears coherent and minimal: it only creates local eval files and scores PA performance. Before installing, confirm two operational details: (1) what channels/events the agent will monitor to detect owner feedback (so you avoid unintended message monitoring), and (2) file location/retention and permissions for $HOME/.openclaw/workspace/.learnings/eval. If you want stricter control, disable automatic runs and require manual invocation for weekly/monthly reports.
Detailed analysis
Purpose and capabilities
The name/description (PA performance scoring, feedback, benchmarks) match the instructions: templates, weekly/monthly aggregation, and simple benchmark calculations. No unrelated binaries, cloud credentials, or surprising permissions are requested.
Instruction scope
Instructions tell the agent to create weekly eval files under $HOME/.openclaw/workspace/.learnings/eval, score dimensions, run monthly benchmarks, and 'log feedback signals automatically when detected.' The file writes and scoring are appropriate. The only ambiguity: 'log automatically when detected' and reaction/text signals assume the agent can monitor owner messages/reactions — the SKILL.md doesn't limit channels or explain how signals are detected. Confirm monitoring scope and privacy before enabling automatic logging.
Install mechanism
Instruction-only skill with no install spec, no downloads, and no code files to execute beyond the provided small shell snippet; lowest-risk installation footprint.
Credential requirements
No environment variables, credentials, or config paths are requested. The requested filesystem write is a single per-user workspace path under $HOME, which is proportionate to the stated purpose.
Persistence and privileges
Metadata sets always:false (not forced into all agents). The doc repeatedly says 'Run automatically' (every 7 days, 'log immediately'), which implies autonomous scheduling or event monitoring — this is reasonable for the skill's purpose but requires you to accept that the agent will autonomously create logs and monitor owner interactions if enabled. Consider limiting triggers or requiring explicit owner confirmation before automatic runs.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.1 · 2026/4/1

reactions rule, close-the-loop, reply-to rules; skill-master analytics hook; skill-analytics added

● Harmless

Install command

Official: npx clawhub@latest install pa-eval
Mirror: npx clawhub@latest install pa-eval --registry https://cn.clawhub-mirror.com

Skill documentation

Minimum Model

Any model for filling in templates. Use a medium model for trend analysis and recommendations.


When to Run

  • Weekly self-eval: Every 7 days. Run automatically.
  • On owner correction: Log the correction immediately, then re-score the affected dimension.
  • Monthly report: At the end of each month, aggregate all weekly evals.
  • On demand: If the owner asks "How am I doing?" → generate the current eval on the spot.
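
The skill does not say how the 7-day trigger is checked. One possible approach, sketched below, treats the weekly eval as due when no `.md` file in the eval directory has been modified in the last 7 days. The function name `weekly_eval_due` and the use of file modification time as a proxy are assumptions for illustration, not part of the skill.

```shell
#!/bin/bash
# Hypothetical helper (not part of the skill): succeeds (exit 0) when a weekly eval is due.
# Assumption: "due" means no .md file in the eval directory was modified in the last 7 days.
weekly_eval_due() {
  local eval_dir="$1" recent
  # A missing directory means no eval has been written yet.
  [ -d "$eval_dir" ] || return 0
  # -mtime -7 matches files modified less than 7 days ago.
  recent=$(find "$eval_dir" -maxdepth 1 -type f -name '*.md' -mtime -7)
  [ -z "$recent" ]
}

# Directory taken from the "Create File" script in this document.
EVAL_DIR="$HOME/.openclaw/workspace/.learnings/eval"
if weekly_eval_due "$EVAL_DIR"; then
  echo "Weekly eval is due"
else
  echo "A recent eval exists; nothing to do"
fi
```

A scheduler such as cron could call this check daily. The check only reads the directory and never writes to it.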

Scoring Dimensions

Score each 1–5:

| Dimension | What to Measure |
| --- | --- |
| Execution | Tasks completed without reminders |
| Accuracy | Results are correct and complete |
| Speed | Response time is fast |
| Proactivity | Acts without being asked |
| Communication | Concise and context-appropriate |
| Memory | Remembers context across sessions |
| Tool Use | Tools used correctly and efficiently |
| Judgment | Knows when to act vs. when to ask |

Score meanings:
  • 5 = Consistently exceeds expectations
  • 4 = Meets expectations with minor gaps
  • 3 = Acceptable but basic
  • 2 = Frequent gaps or errors
  • 1 = Fails basic expectations

Total: Max 40 points. Grade: A (36–40), B (28–35), C (20–27), D (<20)
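
The total and grade follow mechanically from the eight dimension scores. The sketch below shows one way to compute them in bash. The function names `total_score` and `grade_for_total` are illustrative and not defined by the skill; the grade bands are the ones listed above.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not part of the skill).

# Sum eight dimension scores; reject a wrong count or any score outside 1-5.
total_score() {
  local total=0 score
  if [ "$#" -ne 8 ]; then
    echo "expected 8 scores, got $#" >&2
    return 1
  fi
  for score in "$@"; do
    case "$score" in
      [1-5]) total=$((total + score)) ;;
      *) echo "score out of range: $score" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

# Map a total out of 40 to a letter grade: A (36-40), B (28-35), C (20-27), D (<20).
grade_for_total() {
  local total="$1"
  if   [ "$total" -ge 36 ]; then echo "A"
  elif [ "$total" -ge 28 ]; then echo "B"
  elif [ "$total" -ge 20 ]; then echo "C"
  else                           echo "D"
  fi
}

# Order: Execution, Accuracy, Speed, Proactivity, Communication, Memory, Tool Use, Judgment
total=$(total_score 5 4 4 3 5 4 4 5)
echo "Total: $total/40, grade $(grade_for_total "$total")"
```

With the sample scores shown, the total is 34, which falls in the B band.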


Weekly Self-Evaluation

Save to .learnings/eval/YYYY-MM-DD.md.

# PA Weekly Eval — YYYY-MM-DD

Scores

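| Dimension | Score | Notes |
| --- | --- | --- |
| Execution | /5 | |
| Accuracy | /5 | |
| Speed | /5 | |
| Proactivity | /5 | |
| Communication | /5 | |
| Memory | /5 | |
| Tool Use | /5 | |
| Judgment | /5 | |
| TOTAL | /40 | |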

Owner Feedback This Week

  • Positive:
  • Corrections:
  • Complaints:

Tasks Completed

-

Tasks Failed or Incomplete

-

What Went Well

-

What to Improve

-

Actions for Next Week

  • [ ]

Create File

#!/bin/bash
set -e

# Set the output directory
EVAL_DIR="$HOME/.openclaw/workspace/.learnings/eval"
mkdir -p "$EVAL_DIR"

DATE=$(date +%Y-%m-%d)
EVAL_FILE="$EVAL_DIR/$DATE.md"

# Write the template with today's date
cat > "$EVAL_FILE" << 'EOF'
# PA Weekly Eval — DATE_PLACEHOLDER
[Fill in the template above]
EOF

# Replace the placeholder with the real date (works on Linux and macOS)
sed -i "s/DATE_PLACEHOLDER/$DATE/" "$EVAL_FILE" 2>/dev/null \
  || sed -i '' "s/DATE_PLACEHOLDER/$DATE/" "$EVAL_FILE"

echo "Created eval file: $EVAL_FILE"


Owner Feedback Signals

Log these automatically when detected:

| Signal | Action |
| --- | --- |
| 👍 reaction | Log +1 positive |
| 👎 reaction | Log -1 negative, record the correction |
| "תודה" / "great" / "perfect" | Log +1 positive |
| "wrong" / "fix this" / "לא טוב" | Log -1, record the correction |
| Owner re-asks the same question | Log -1 memory gap |
| Owner does the task themselves | Log -1 initiative gap |
| Owner surprised by proactive action | Log +2 proactivity |

Rule: If a signal appears → log it immediately. Don't batch feedback signals.
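
The reaction and keyword rows of the table can be approximated with simple pattern matching. The skill does not specify how signals are detected, so the sketch below is an assumption: `classify_signal`, `log_signal`, and the tab-separated log format are all hypothetical. It does not cover the behavioral signals (re-asked questions, owner doing the task, surprise at proactive action), which need conversation context.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the skill): keyword-based signal detection.

# Print "<delta> <label>" for one owner message or reaction.
classify_signal() {
  local text
  # Lowercase ASCII letters so "Great" and "great" match the same pattern.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    "👍")                                printf '%s\n' "+1 positive" ;;
    "👎")                                printf '%s\n' "-1 negative" ;;
    *wrong*|*"fix this"*|*"לא טוב"*)     printf '%s\n' "-1 negative" ;;
    *"תודה"*|*great*|*perfect*)          printf '%s\n' "+1 positive" ;;
    *)                                   printf '%s\n' "0 none" ;;
  esac
}

# Append a timestamped entry to a log file right away (no batching).
# The log path and line format are assumptions.
log_signal() {
  local log_file="$1" text="$2" result
  result=$(classify_signal "$text")
  [ "$result" = "0 none" ] && return 0
  printf '%s\t%s\t%s\n' "$(date +%Y-%m-%dT%H:%M:%S)" "$result" "$text" >> "$log_file"
}

for message in "👍" "Perfect, thanks" "please fix this" "see you tomorrow"; do
  printf '%s => %s\n' "$message" "$(classify_signal "$message")"
done
```

Substring matching is crude: a message like "not great" would be counted as positive. Negative patterns are checked first so that a message containing both kinds of keyword is logged as a correction.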


Monthly Report Format

# PA Performance Report — [Month Year]

PA Name: [Name]
Owner: [Owner Name]
Period: [Start] – [End]

Summary Score: X/40 ([Grade A/B/C/D])

Dimension Breakdown

[Copy scores table here]

Key Wins

-

Key Issues

-

Trend vs Last Period

  • Score change: +X / -X points
  • Best improvement: [dimension]
  • Biggest regression: [dimension]

Recommended Actions

1.
2.
3.
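
The "Trend vs Last Period" fields can be derived from per-dimension scores for the two periods. The skill does not define how weekly scores are aggregated into a monthly figure, so the sketch below simply assumes one integer score per dimension per period. The function name `trend_summary` and the `Dimension|previous|current` input format are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the skill): summarize change between two periods.
# Input on stdin: one "Dimension|previous|current" line per dimension (assumed format).
trend_summary() {
  awk -F'|' '
    {
      delta = $3 - $2
      total += delta
      if (NR == 1 || delta > best)  { best = delta;  best_dim = $1 }
      if (NR == 1 || delta < worst) { worst = delta; worst_dim = $1 }
    }
    END {
      printf "Score change: %+d points\n", total
      if (best > 0)  { printf "Best improvement: %s (%+d)\n", best_dim, best }
      else           { print "Best improvement: none" }
      if (worst < 0) { printf "Biggest regression: %s (%+d)\n", worst_dim, worst }
      else           { print "Biggest regression: none" }
    }
  '
}

trend_summary <<'EOF'
Execution|4|5
Accuracy|4|4
Memory|4|2
EOF
```

For the sample input, the net change is -1 point, with Execution as the best improvement (+1) and Memory as the biggest regression (-2).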

Benchmark Tests (Run Monthly)

Task Completion Rate

  • Count tasks assigned in the last 30 days.
  • Count tasks completed without follow-up.
  • Formula: completed / assigned × 100%
  • Target: >90%

Accuracy Rate

  • Count tasks requiring correction.
  • Formula: (tasks - corrections) / tasks × 100%
  • Target: >95%

Memory Retention

  • Ask about something discussed 7+ days ago.
  • Pass if recalled correctly, fail if missed.
  • Target: >80%
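
The three benchmark formulas are plain ratios. The sketch below computes them with `awk`, since bash arithmetic is integer-only. All function names are illustrative. It also assumes Memory Retention is measured as the share of recall probes passed, which the skill implies with its percentage target but does not state.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the skill) for the monthly benchmarks.

# Print num/den as a percentage with one decimal; "n/a" when den is zero.
percent() {
  awk -v num="$1" -v den="$2" 'BEGIN {
    if (den == 0) { print "n/a"; exit }
    printf "%.1f\n", num * 100 / den
  }'
}

task_completion_rate() { percent "$1" "$2"; }              # completed, assigned
accuracy_rate()        { percent "$(( $1 - $2 ))" "$1"; }  # tasks, corrections
memory_retention()     { percent "$1" "$2"; }              # probes passed, probes asked

# Succeed only if the value is strictly greater than the target ("n/a" counts as 0).
meets_target() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

completion=$(task_completion_rate 46 50)
accuracy=$(accuracy_rate 50 3)
memory=$(memory_retention 4 5)

meets_target "$completion" 90 && echo "Completion $completion%: on target" || echo "Completion $completion%: below target"
meets_target "$accuracy" 95   && echo "Accuracy $accuracy%: on target"     || echo "Accuracy $accuracy%: below target"
meets_target "$memory" 80     && echo "Memory $memory%: on target"         || echo "Memory $memory%: below target"
```

Because the targets are written with ">", a result exactly equal to the target (for example 80.0% memory retention) is treated as below target.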

Cost Tips

  • Cheap: Filling in the weekly template. Any small model works.
  • Expensive: Trend analysis and pattern detection across multiple evals. Use a medium model.
  • Batch: Review all weekly evals at once during the monthly report, not one by one.
  • Avoid: Don't re-score historical weeks. Score in real time and save to file.
Data source: ClawHub · Chinese localization: 龙虾技能库