PA Eval — PA 评估

Name: PA Eval — PA 评估
Author: Netanel Abergel

Netanel Abergel

PA Eval — PA 评估

v1.0.1

PA 评估工具。

0· 91·1 当前·1 累计

by @netanel-abergel (Netanel Abergel)·MIT-0

开发工具数据分析

下载技能包

License

MIT-0

最后更新

2026/4/3

安全扫描

VirusTotal

无害

查看报告

OpenClaw

安全

high confidence

The skill's instructions, filesystem usage, and lack of required credentials or installs are consistent with a PA performance-evaluation tool; nothing requests unrelated access or external installs.

评估建议

This skill appears coherent and minimal: it only creates local eval files and scores PA performance. Before installing, confirm two operational details: (1) what channels/events the agent will monitor to detect owner feedback (so you avoid unintended message monitoring), and (2) file location/retention and permissions for $HOME/.openclaw/workspace/.learnings/eval. If you want stricter control, disable automatic runs and require manual invocation for weekly/monthly reports.

详细分析 ▾

✓ 用途与能力

The name/description (PA performance scoring, feedback, benchmarks) match the instructions: templates, weekly/monthly aggregation, and simple benchmark calculations. No unrelated binaries, cloud credentials, or surprising permissions are requested.

ℹ 指令范围

Instructions tell the agent to create weekly eval files under $HOME/.openclaw/workspace/.learnings/eval, score dimensions, run monthly benchmarks, and 'log feedback signals automatically when detected.' The file writes and scoring are appropriate. The only ambiguity: 'log automatically when detected' and reaction/text signals assume the agent can monitor owner messages/reactions — the SKILL.md doesn't limit channels or explain how signals are detected. Confirm monitoring scope and privacy before enabling automatic logging.

✓ 安装机制

Instruction-only skill with no install spec, no downloads, and no code files to execute beyond the provided small shell snippet; lowest-risk installation footprint.

✓ 凭证需求

No environment variables, credentials, or config paths are requested. The requested filesystem write is a single per-user workspace path under $HOME, which is proportionate to the stated purpose.

ℹ 持久化与权限

Metadata sets always:false (not forced into all agents). The doc repeatedly says 'Run automatically' (every 7 days, 'log immediately'), which implies autonomous scheduling or event monitoring — this is reasonable for the skill's purpose but requires you to accept that the agent will autonomously create logs and monitor owner interactions if enabled. Consider limiting triggers or requiring explicit owner confirmation before automatic runs.

安全有层次，运行前请审查代码。

License

MIT-0

可自由使用、修改和再分发，无需署名。

查看条款 ↗

运行时依赖

无特殊依赖

版本

latestv1.0.12026/4/1

reactions rule, close-the-loop, reply-to rules; skill-master analytics hook; skill-analytics added

● 无害

安装命令点击复制

官方npx clawhub@latest install pa-eval

镜像加速npx clawhub@latest install pa-eval --registry https://cn.clawhub-mirror.com

技能文档

Minimum 模型

Any model for filling in templates. Use a medium model for trend analysis and recommendations.

当...时到 Run

Weekly self-eval: Every 7 days. Run automatically.
在...上 owner correction: Log correction immediately, 然后 re-score affected dimension.
Monthly 举报: 在 end 的每个 month, aggregate 所有 weekly evals.
在...上 demand: 如果 owner asks "如何 am I 正在做?" → generate current eval 在...上 spot.

Scoring Dimensions

Score each 1–5:

Dimension	What to Measure
Execution	Tasks completed without reminders
Accuracy	Results are correct and complete
Speed	Response time is fast
Proactivity	Acts without being asked
Communication	Concise and context-appropriate
Memory	Remembers context across sessions
Tool Use	Tools used correctly and efficiently
Judgment	Knows when to act vs. when to ask

Score meanings:

5 = Consistently exceeds expectations
4 = Meets expectations 带有 minor gaps
3 = Acceptable 但是 basic
2 = Frequent gaps 或 errors
1 = Fails basic expectations

总计: Max 40 points. Grade: A (36–40), B (28–35), C (20–27), D (<20)

Weekly Self-Evaluation

Save to .learnings/eval/YYYY-MM-DD.md.

# PA Weekly Eval — YYYY-MM-DD
Scores
Dimension Score Notes
Execution /5
Accuracy /5
Speed /5
Proactivity /5
Communication /5
Memory /5
Tool Use /5
Judgment /5
TOTAL /40
Owner Feedback This Week
Positive:
Corrections:
Complaints:
Tasks Completed
-
Tasks Failed or Incomplete
-
What Went Well
-
What to Improve
-
Actions for Next Week
[ ]

创建 File

#!/bin/bash
set -e
# Set the output directory
EVAL_DIR="$HOME/.openclaw/workspace/.learnings/eval"
mkdir -p "$EVAL_DIR"
DATE=$(date +%Y-%m-%d)
EVAL_FILE="$EVAL_DIR/$DATE.md"
# Write the template with today's date
cat > "$EVAL_FILE" << 'EOF'
# PA Weekly Eval — DATE_PLACEHOLDER
[Fill in the template above]
EOF
# Replace the placeholder with the real date (works on Linux and macOS)
sed -i "s/DATE_PLACEHOLDER/$DATE/" "$EVAL_FILE" 2>/dev/null \
  || sed -i '' "s/DATE_PLACEHOLDER/$DATE/" "$EVAL_FILE"echo "Created eval file: $EVAL_FILE"

Owner Feedback Signals

Log these automatically when detected:

Signal	Action
👍 reaction	Log +1 positive
👎 reaction	Log -1 negative, record the correction
"תודה" / "great" / "perfect"	Log +1 positive
"wrong" / "fix this" / "לא טוב"	Log -1, record the correction
Owner re-asks the same question	Log -1 memory gap
Owner does the task themselves	Log -1 initiative gap
Owner surprised by proactive action	Log +2 proactivity

Rule: 如果 signal appears → log immediately. Don't batch feedback signals.

Monthly 举报格式

# PA Performance Report — [Month Year]
PA Name: [Name]
Owner: [Owner Name]
Period: [Start] – [End]
Summary Score: X/40 ([Grade A/B/C/D])
Dimension Breakdown
[Copy scores table here]
Key Wins
-
Key Issues
-
Trend vs Last Period
Score change: +X / -X points
Best improvement: [dimension]
Biggest regression: [dimension]
Recommended Actions
1.
2.
3.

Benchmark Tests (Run Monthly)

Task Completion Rate

计数 tasks assigned 在...中最后的 30 days.
计数已完成没有关注-up.
Formula: 已完成 / assigned × 100%
Target: >90%

Accuracy Rate

计数 tasks 必填 correction.
Formula: (tasks - corrections) / tasks × 100%
Target: >95%

Memory Retention

Ask 关于 something discussed 7+ days ago.
Pass 如果 recalled correctly, 失败如果 missed.
Target: >80%

Cost Tips

Cheap: Filling 在...中 weekly 模板 — 任何 small 模型 works.
Expensive: Trend analysis 和 pattern detection 穿过 multiple evals — 使用 medium 模型.
Batch: Review 所有 weekly evals 在 once 期间 monthly 举报, 不 one 由 one.
Avoid: Don't re-score historical weeks — score 在...中 real 时间和保存到 file.

Minimum Model

Any model for filling in templates. Use a medium model for trend analysis and recommendations.

When to Run

Weekly self-eval: Every 7 days. Run automatically.
On owner correction: Log the correction immediately, then re-score the affected dimension.
Monthly report: At the end of each month, aggregate all weekly evals.
On demand: If owner asks "how am I doing?" → generate current eval on the spot.

Scoring Dimensions

Score each 1–5:

Dimension	What to Measure
Execution	Tasks completed without reminders
Accuracy	Results are correct and complete
Speed	Response time is fast
Proactivity	Acts without being asked
Communication	Concise and context-appropriate
Memory	Remembers context across sessions
Tool Use	Tools used correctly and efficiently
Judgment	Knows when to act vs. when to ask

Score meanings:

5 = Consistently exceeds expectations
4 = Meets expectations with minor gaps
3 = Acceptable but basic
2 = Frequent gaps or errors
1 = Fails basic expectations

Total: Max 40 points. Grade: A (36–40), B (28–35), C (20–27), D (<20)

Weekly Self-Evaluation

Save to .learnings/eval/YYYY-MM-DD.md.

# PA Weekly Eval — YYYY-MM-DD
Scores
Dimension Score Notes
Execution /5
Accuracy /5
Speed /5
Proactivity /5
Communication /5
Memory /5
Tool Use /5
Judgment /5
TOTAL /40
Owner Feedback This Week
Positive:
Corrections:
Complaints:
Tasks Completed
-
Tasks Failed or Incomplete
-
What Went Well
-
What to Improve
-
Actions for Next Week
[ ]

Create the File

#!/bin/bash
set -e
# Set the output directory
EVAL_DIR="$HOME/.openclaw/workspace/.learnings/eval"
mkdir -p "$EVAL_DIR"
DATE=$(date +%Y-%m-%d)
EVAL_FILE="$EVAL_DIR/$DATE.md"
# Write the template with today's date
cat > "$EVAL_FILE" << 'EOF'
# PA Weekly Eval — DATE_PLACEHOLDER
[Fill in the template above]
EOF
# Replace the placeholder with the real date (works on Linux and macOS)
sed -i "s/DATE_PLACEHOLDER/$DATE/" "$EVAL_FILE" 2>/dev/null \
  || sed -i '' "s/DATE_PLACEHOLDER/$DATE/" "$EVAL_FILE"echo "Created eval file: $EVAL_FILE"

Owner Feedback Signals

Log these automatically when detected:

Signal	Action
👍 reaction	Log +1 positive
👎 reaction	Log -1 negative, record the correction
"תודה" / "great" / "perfect"	Log +1 positive
"wrong" / "fix this" / "לא טוב"	Log -1, record the correction
Owner re-asks the same question	Log -1 memory gap
Owner does the task themselves	Log -1 initiative gap
Owner surprised by proactive action	Log +2 proactivity

Rule: If a signal appears → log it immediately. Don't batch feedback signals.

Monthly Report Format

# PA Performance Report — [Month Year]
PA Name: [Name]
Owner: [Owner Name]
Period: [Start] – [End]
Summary Score: X/40 ([Grade A/B/C/D])
Dimension Breakdown
[Copy scores table here]
Key Wins
-
Key Issues
-
Trend vs Last Period
Score change: +X / -X points
Best improvement: [dimension]
Biggest regression: [dimension]
Recommended Actions
1.
2.
3.

Benchmark Tests (Run Monthly)

Task Completion Rate

Count tasks assigned in last 30 days.
Count completed without follow-up.
Formula: completed / assigned × 100%
Target: >90%

Accuracy Rate

Count tasks that required correction.
Formula: (tasks - corrections) / tasks × 100%
Target: >95%

Memory Retention

Ask about something discussed 7+ days ago.
Pass if recalled correctly, Fail if missed.
Target: >80%

Cost Tips

Cheap: Filling in the weekly template — any small model works.
Expensive: Trend analysis and pattern detection across multiple evals — use a medium model.
Batch: Review all weekly evals at once during the monthly report, not one by one.
Avoid: Don't re-score historical weeks — score in real time and save to file.

数据来源：ClawHub ↗ · 中文优化：龙虾技能库

OpenClaw 技能定制 / 插件定制 / 私有工作流定制

免费技能或插件可能存在安全风险，如需更匹配、更安全的方案，建议联系付费定制

了解定制服务

Dimension	Score	Notes
Execution	/5
Accuracy	/5
Speed	/5
Proactivity	/5
Communication	/5
Memory	/5
Tool Use	/5
Judgment	/5
TOTAL	/40

License

运行时依赖

版本

安装命令 点击复制

技能文档

Minimum 模型

当...时 到 Run

Scoring Dimensions

Weekly Self-Evaluation

Scores

Owner Feedback This Week

Tasks Completed

Tasks Failed or Incomplete

What Went Well

What to Improve

Actions for Next Week

创建 File

Owner Feedback Signals

Monthly 举报 格式

Summary Score: X/40 ([Grade A/B/C/D])

Dimension Breakdown

Key Wins

Key Issues

Trend vs Last Period

Recommended Actions

Benchmark Tests (Run Monthly)

Task Completion Rate

Accuracy Rate

Memory Retention

Cost Tips

Minimum Model

When to Run

Scoring Dimensions

Weekly Self-Evaluation

Scores

Owner Feedback This Week

Tasks Completed

Tasks Failed or Incomplete

What Went Well

What to Improve

Actions for Next Week

Create the File

Owner Feedback Signals

Monthly Report Format

Summary Score: X/40 ([Grade A/B/C/D])

Dimension Breakdown

Key Wins

Key Issues

Trend vs Last Period

Recommended Actions

Benchmark Tests (Run Monthly)

Task Completion Rate

Accuracy Rate

Memory Retention

Cost Tips

安装命令点击复制

当...时到 Run

Monthly 举报格式