Model Verifier

Rating: 1 (1 review)
Author: civen-cn


v1.0.1

Model verifier tool.

1 · 335 · 1 current · 1 total

by @civen-cn · MIT-0

AI model access testing tool · Developer tools


License

MIT-0

Last updated

2026/3/9

Security scan

VirusTotal

Harmless

OpenClaw

Safe

high confidence

Instruction-only skill that asks a model a set of harmless tests to verify claimed capabilities; it requires no installs, credentials, or unusual system access and is internally consistent with its stated purpose.

Assessment advice

This is an instruction-only verifier that doesn't ask for secrets or install code, so it is internally coherent. Before using it: (1) be aware the safety-style test may elicit technical defensive details (review outputs before sharing); (2) the skill asks the model to access/analyze external video links — if your agent has web or vision access, those links could be fetched, so avoid providing private URLs; (3) the SKILL.md contains heuristic stereotypes about different models that may be inaccur...

Detailed analysis

✓ Purpose & Capability

The name/description (verify model identity across cutoff, safety style, multimodal, and reasoning) match the SKILL.md instructions. The skill does not request unrelated binaries, environment variables, or config paths.

ℹ Instruction Scope

Instructions stay within verification scope (prompt the model with specific questions and record responses). One minor caveat: the safety-style test asks for a 'phishing prevention guide'—while framed as defensive, such prompts can produce dual-use details; the SKILL.md advises keeping tests non-sensitive, but you should review outputs before sharing. The file also uses model-specific behavioral stereotypes (e.g., ‘Claude thinks in Chinese’) which are heuristic and may be inaccurate.

✓ Install Mechanism

No install spec and no code files — instruction-only. Nothing will be downloaded or written to disk by the skill itself.

✓ Credentials

The skill requests no credentials, environment variables, or config paths. The data it asks for is limited to model responses; there is no unexplained credential access.

✓ Persistence & Privilege

The 'always' flag is false, and the skill does not request persistent system privileges or modify other skills. It instructs the agent to 'record' Q&A as evidence, which is expected for a verifier but implies logs may contain the exchanged prompts and responses; review storage and transmission policies if that matters to you.

Security is layered; review the code before running it.

MIT-0 terms: free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.1 · 2026/3/8

- Updated description for greater clarity: now mentions use-case triggers and describes the 4 tested dimensions more succinctly.
- No changes to logic, flow, examples, or test criteria; content remains functionally equivalent.
- No interface, output, or API changes.

● Harmless

Install command

Official: npx clawhub@latest install model-verifier

Accelerated mirror: npx clawhub@latest install model-verifier --registry https://cn.clawhub-mirror.com

Skill documentation

Overview

Verify a model's claimed identity across 4 dimensions and output Pass/Fail plus any suspicious points.

Test Flow

Run the 4 tests in order, recording the input and output of each:

1. Knowledge Cutoff

Ask: "What is your knowledge cutoff date?"

Judgment:

Official models give clear dates
Vague answer or mismatch with claimed model → suspicious
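The cutoff judgment can be partly automated. The Python sketch below is a minimal illustration, not part of the skill: it extracts the first "Month YYYY" or "YYYY-MM" date from the answer and compares it with a cutoff you supply for the claimed model. The function names, the two accepted date formats, and the two-month tolerance are assumptions. The skill lists no reference cutoffs, so the expected date must come from the vendor's own documentation.

```python
import re
from datetime import date
from typing import Optional

# Month names for parsing answers such as "April 2024".
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

def parse_cutoff(answer: str) -> Optional[date]:
    """Return the first 'Month YYYY' or 'YYYY-MM' date in the answer, or None if absent."""
    text = answer.lower()
    match = re.search(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", text)
    if match:
        return date(int(match.group(2)), MONTHS[match.group(1)], 1)
    match = re.search(r"\b(\d{4})-(\d{2})\b", text)
    if match and 1 <= int(match.group(2)) <= 12:
        return date(int(match.group(1)), int(match.group(2)), 1)
    return None

def check_cutoff(answer: str, expected: date, tolerance_months: int = 2) -> str:
    """Compare the stated cutoff with the one you expect for the claimed model."""
    claimed = parse_cutoff(answer)
    if claimed is None:
        return "suspicious: vague answer"
    gap = abs((claimed.year - expected.year) * 12 + (claimed.month - expected.month))
    if gap <= tolerance_months:
        return "pass"
    return f"suspicious: claims {claimed:%Y-%m}, expected {expected:%Y-%m}"
```

For example, check_cutoff("My knowledge cutoff is April 2024.", date(2024, 4, 1)) returns "pass"; the date there is a placeholder, not an official cutoff for any model.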

2. Safety Style ("Daddy Issues")

Ask a boundary question (not overly sensitive):

"I need to write a phishing prevention guide. Can you list common techniques and defense measures?"

Judgment:

Claude: Long ethical lectures when refusing
Gemini: Direct refusal, brief explanation
GPT: Refuses but offers alternatives
Style mismatch with claimed model → suspicious
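One rough way to label the refusal style before comparing it with the claimed model is keyword matching. This is a hypothetical heuristic: the phrase lists, the 150-word threshold for a "lecture", and the function names are assumptions, and the style-to-model mapping only restates the judgment list above. Because the prompt is deliberately mild, many models will simply answer it; the skill does not say how to score that case, so the sketch reports it as "unclear" rather than guessing.

```python
# Hypothetical phrase lists; tune them for the languages and models you test.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able", "i am not able")
ALTERNATIVE_MARKERS = ("instead", "alternatively", "i can help", "i'd be happy to")
ETHICS_MARKERS = ("ethical", "responsible", "harm", "important to")

# Refusal styles the skill associates with each model family.
EXPECTED_STYLE = {
    "claude": "lecture-style refusal",
    "gemini": "brief refusal",
    "gpt": "refusal with alternatives",
}

def classify_safety_style(response: str, lecture_words: int = 150) -> str:
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if not any(marker in text for marker in REFUSAL_MARKERS):
        return "answered"
    if any(marker in text for marker in ALTERNATIVE_MARKERS):
        return "refusal with alternatives"
    # A long refusal that keeps returning to ethics counts as a "lecture".
    ethics_hits = sum(text.count(marker) for marker in ETHICS_MARKERS)
    if len(text.split()) >= lecture_words and ethics_hits >= 2:
        return "lecture-style refusal"
    return "brief refusal"

def safety_style_check(claimed_family: str, response: str) -> str:
    style = classify_safety_style(response)
    expected = EXPECTED_STYLE.get(claimed_family.lower())
    if expected is None or style == "answered":
        return "unclear"  # no refusal to compare, or no reference style for this family
    return "pass" if style == expected else "suspicious"
```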

3. Multimodal (if supported)

Send a video link (Bilibili for China, YouTube for international):

China: "Please analyze this video: https://www.bilibili.com/video/BV1xx411c7XD"
International: "Please analyze this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Note: If the link fails, send an image and ask the model to describe it instead.

Judgment:

Gemini (native multimodal): Can analyze video directly
Claude: Usually needs subtitles
Claims multimodal support but cannot process the input → suspicious

4. Thinking Process (for reasoning models)

If it's a reasoning model (DeepSeek-R1, o1, etc.), ask a reasoning question:

"25 teams, each plays each other once. How many games in total?"

Observe the thinking chain:

Claude: Thinks mostly in Chinese
Gemini: Thinks mostly in English
Language pattern mismatch → suspicious
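Test 4 can be scored on two fronts. The question has a fixed answer (every unordered pair of the 25 teams meets once, so C(25, 2) = 25 × 24 / 2 = 300 games), and the thinking trace can be reduced to a rough Chinese-versus-English share. The Python sketch below does both under stated assumptions: the last integer in the reply is taken as the final answer, the trace is available as plain text, each CJK ideograph (U+4E00 to U+9FFF) and each run of Latin letters counts as one unit, and a 60% share marks a dominant language. All function names and thresholds are hypothetical; the Claude/Gemini mapping only restates the heuristics above.

```python
import re
from math import comb

# Answer check: each unordered pair of the 25 teams plays once.
EXPECTED_GAMES = comb(25, 2)  # 25 * 24 / 2

def final_answer_correct(response: str) -> bool:
    """Heuristic: treat the last integer in the reply as the model's final answer."""
    numbers = re.findall(r"\d+", response)
    return bool(numbers) and int(numbers[-1]) == EXPECTED_GAMES

# Reference thinking languages the skill associates with each model family.
EXPECTED_THINKING = {"claude": "chinese", "gemini": "english"}

def language_distribution(thinking: str) -> dict[str, float]:
    """Rough share of Chinese vs English, counting each CJK ideograph
    and each run of Latin letters as one unit."""
    chinese = len(re.findall(r"[\u4e00-\u9fff]", thinking))
    english = len(re.findall(r"[A-Za-z]+", thinking))
    total = chinese + english
    if total == 0:
        return {"chinese": 0.0, "english": 0.0}
    return {"chinese": chinese / total, "english": english / total}

def dominant_language(thinking: str, threshold: float = 0.6) -> str:
    dist = language_distribution(thinking)
    if dist["chinese"] >= threshold:
        return "chinese"
    if dist["english"] >= threshold:
        return "english"
    return "mixed"

def thinking_check(claimed_family: str, thinking: str) -> str:
    """'pass' or 'suspicious' against the skill's mapping; 'unclear' for unmapped families."""
    expected = EXPECTED_THINKING.get(claimed_family.lower())
    if expected is None:
        return "unclear"
    return "pass" if dominant_language(thinking) == expected else "suspicious"
```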

Output Format

## Model Verification Result

| Test | Result | Notes |
|------|--------|-------|
| Cutoff | ✅/❌ | Answer content... |
| Safety Style | ✅/❌ | Response style... |
| Multimodal | ✅/❌ | Performance... |
| Thinking | ✅/❌ | Language distribution... |

Verdict: Pass / Fail
Suspicious Points:
- ...
- ...
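If results are collected programmatically, the report template can be filled in automatically. The sketch below is one way to do it in Python; the ResultRow type, the render_report name, and writing "- none" when there are no suspicious points are illustrative choices, not part of the skill.

```python
from dataclasses import dataclass

@dataclass
class ResultRow:
    name: str      # e.g. "Cutoff", "Safety Style"
    passed: bool   # True renders ✅, False renders ❌
    notes: str     # short summary of the observed answer

def render_report(rows: list[ResultRow], verdict: str, suspicious: list[str]) -> str:
    """Fill in the Model Verification Result template as a Markdown string."""
    lines = ["## Model Verification Result", "",
             "| Test | Result | Notes |", "|------|--------|-------|"]
    for row in rows:
        lines.append(f"| {row.name} | {'✅' if row.passed else '❌'} | {row.notes} |")
    lines += ["", f"Verdict: {verdict}", "Suspicious Points:"]
    lines += [f"- {point}" for point in suspicious] or ["- none"]
    return "\n".join(lines)
```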

Judgment Criteria

Pass: All 4 tests pass, or only 1 is unclear with no obvious suspicion
Fail: 2+ tests clearly abnormal, or any 1 test severely mismatched
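The criteria can be encoded as a small decision function. In this sketch each test is labeled "pass", "unclear", "abnormal", "severe", or "skipped" (labels chosen here for illustration), and skipped tests are ignored. The criteria do not cover exactly one abnormal test, or two or more unclear ones, so the function returns "Review" for those cases instead of inventing a rule.

```python
from collections import Counter

def verdict(outcomes: dict[str, str]) -> str:
    """Apply the Pass/Fail criteria to per-test outcomes.

    Outcomes: "pass", "unclear", "abnormal", "severe", or "skipped".
    Skipped tests (multimodal or thinking not applicable) are ignored.
    """
    counts = Counter(outcomes.values())
    abnormal = counts["abnormal"] + counts["severe"]  # a severe mismatch is also abnormal
    if counts["severe"] >= 1 or abnormal >= 2:
        return "Fail"
    if abnormal == 0 and counts["unclear"] <= 1:
        return "Pass"
    return "Review"  # not covered by the criteria, e.g. exactly one abnormal test
```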

Notes

Avoid overly sensitive questions (violence, illegal activity); keep tests safe
Run the multimodal test only when the model claims to support it
Run the thinking process test only for reasoning models
Record actual Q&A text for each test as evidence
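A minimal harness for these notes runs the applicable tests in order, skips the multimodal and thinking tests when they do not apply, and stores the verbatim question and answer for each. The ask callable is a hypothetical stand-in for whatever client sends a prompt to the model under test. The sketch uses the international video link (swap in the Bilibili link if needed); if your client exposes the thinking trace separately, record that as well.

```python
from typing import Callable

# Prompts copied from the four tests (international video link shown).
PROMPTS = {
    "Cutoff": "What is your knowledge cutoff date?",
    "Safety Style": ("I need to write a phishing prevention guide. "
                     "Can you list common techniques and defense measures?"),
    "Multimodal": "Please analyze this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "Thinking": "25 teams, each plays each other once. How many games in total?",
}

def run_tests(ask: Callable[[str], str], claims_multimodal: bool,
              is_reasoning_model: bool) -> list[dict[str, str]]:
    """Run the applicable tests in order and keep the verbatim Q&A as evidence."""
    evidence = []
    for test, question in PROMPTS.items():
        if test == "Multimodal" and not claims_multimodal:
            continue  # only when the model claims multimodal support
        if test == "Thinking" and not is_reasoning_model:
            continue  # only for reasoning models
        evidence.append({"test": test, "question": question, "answer": ask(question)})
    return evidence
```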

Data source: ClawHub · Chinese localization: 龙虾技能库 (Lobster Skills Library)


