📦 long-run-harness: Long-Running Harness

v1.0.0

Use when building a Planner→Generator→Evaluator multi-agent harness with the Claude SDK. Triggers: "build a harness", "multi-agent pipeline", "agent loop", ...


by @is-xins-xiaobai (小白)



Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install long-run-harness

Mirror: npx clawhub@latest install long-run-harness --registry https://cn.longxiaskill.com


Skill documentation

Long-Running App Harness: SDK Implementation

Produces a runnable harness that orchestrates Claude agents via claude_agent_sdk. You are writing the harness, not running inside it.

Use query() + ClaudeAgentOptions for agentic loops; tool() + create_sdk_mcp_server() for structured output. Never call anthropic.Anthropic() directly.

    pip install claude-agent-sdk

Output structure:

    harness/
      harness.py; config.yaml; config.py; log.py
      agents/   planner.py; generator.py; evaluator.py
      models/   state.py
      prompts/  planner.md; generator.md; evaluator.md

Routing

| User signal | Route |
| --- | --- |
| "build a harness / pipeline" | Start at Phase 1 |
| "add an evaluator" | Jump to Phase 4 |
| "add state / handoff" | Jump to Phase 5 |
| "looping forever / broken" | Check feedback loop termination in Phase 5 |
| "just explain what a harness does" | Explain concept, don't write code |

Phase 1: Design the Harness

Load: $SKILL_DIR/instructions/planner-questions.md

⚠️ HARD GATE: Ask the design questions. Get answers to 1–3 before writing any code:

1. What does the harness build? (Sets Generator tools + Evaluator rubric)
2. Python or TypeScript? (default: Python)
3. Models per agent? (default: all claude-opus-4-7; non-defaults → config.yaml)

Create skeleton:

    mkdir -p harness/agents harness/models harness/prompts harness/harness-logs
    touch harness/harness.py harness/log.py harness/agents/__init__.py harness/models/__init__.py

config.yaml + config.py: all tunable parameters live here; never hardcode them in agent files. Load: $SKILL_DIR/instructions/config.md for the full HarnessConfig dataclass.

    cfg = HarnessConfig.load(Path(__file__).parent / "config.yaml")
    # Always: cfg.agents.generator_model; never: "claude-opus-4-7"
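The full HarnessConfig lives in config.md, which is not reproduced here. As a rough sketch of the idea only, the loader could look like the following. The section names `agents` and `loop` follow the attribute paths used in this document; `planner_model`, `evaluator_model`, the default iteration counts, and the `from_dict` helper are assumptions made for illustration, and PyYAML is assumed as the YAML parser.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class AgentsConfig:
    # One model ID per agent; the default is the one named in the design questions.
    planner_model: str = "claude-opus-4-7"    # hypothetical field name
    generator_model: str = "claude-opus-4-7"
    evaluator_model: str = "claude-opus-4-7"  # hypothetical field name


@dataclass
class LoopConfig:
    min_iterations: int = 1  # hypothetical default
    max_iterations: int = 5  # hypothetical default


@dataclass
class HarnessConfig:
    agents: AgentsConfig = field(default_factory=AgentsConfig)
    loop: LoopConfig = field(default_factory=LoopConfig)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "HarnessConfig":
        # Missing or empty sections fall back to the dataclass defaults.
        return cls(
            agents=AgentsConfig(**(raw.get("agents") or {})),
            loop=LoopConfig(**(raw.get("loop") or {})),
        )

    @classmethod
    def load(cls, path: Path) -> "HarnessConfig":
        import yaml  # PyYAML; imported lazily so from_dict works without it

        with open(path, encoding="utf-8") as fh:
            return cls.from_dict(yaml.safe_load(fh) or {})
```

Keeping `from_dict` separate from `load` makes the parsing rules testable without a file on disk, and unknown keys in config.yaml fail loudly as a `TypeError` instead of being silently ignored.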

models/state.py: write this first; all other files import from it. Load: $SKILL_DIR/instructions/context-handoff.md (HandoffState, EvalResult, format_handoff_for_prompt). Load: $SKILL_DIR/instructions/sprint-contracts.md (SprintContract + negotiation protocol).

log.py: dual stdout + timestamped file under harness-logs/. Load: $SKILL_DIR/instructions/logging.md for full implementation.

    log.setup(PROJECT_DIR, label="run")  # once in main()
    logger = log.get()                   # in every agent

Phase 2: Planner Agent

Load: $SKILL_DIR/instructions/planner-questions.md for system prompt template. Load: $SKILL_DIR/instructions/agent-patterns.md for full run_planner implementation.

run_planner(brief, session_id, cfg) → (reply, new_session_id). ClaudeAgentOptions(resume=session_id) continues the session without resending history.

    spec, session_id = "", None
    while "SPEC_COMPLETE" not in spec:
        user_input = input("[Planner asks]: ").strip() if session_id else initial_brief
        spec, session_id = run_planner(user_input, session_id, cfg)
    SPEC_PATH.write_text(spec.replace("SPEC_COMPLETE", "").strip())
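To exercise this loop without calling the SDK, the planner can be injected as a callable. The sketch below assumes that shape: `collect_spec`, `ask_user`, and `fake_planner` are hypothetical names, the `cfg` argument is dropped for brevity, and printing the planner's question before prompting is an assumption about how the question reaches the user.

```python
from typing import Callable, Optional, Tuple

# (user_input, session_id) -> (reply, new_session_id)
PlannerFn = Callable[[str, Optional[str]], Tuple[str, str]]


def collect_spec(
    initial_brief: str,
    run_planner: PlannerFn,
    ask_user: Callable[[str], str] = input,
) -> str:
    """Drive the planner until its reply contains SPEC_COMPLETE."""
    spec, session_id = "", None
    while "SPEC_COMPLETE" not in spec:
        if session_id is None:
            user_input = initial_brief  # first turn: send the brief
        else:
            print(spec)  # assumption: surface the planner's question here
            user_input = ask_user("[Planner asks]: ").strip()
        spec, session_id = run_planner(user_input, session_id)
    return spec.replace("SPEC_COMPLETE", "").strip()


def fake_planner(user_input: str, session_id: Optional[str]) -> Tuple[str, str]:
    """Stand-in for run_planner: asks one question, then finishes."""
    if session_id is None:
        return "Which database should the app use?", "session-1"
    return f"# Spec\nDatabase: {user_input}\nSPEC_COMPLETE", session_id
```

Calling `collect_spec("Build a todo app", fake_planner, ask_user=lambda _: "sqlite")` walks one question-and-answer round and returns the spec text with the sentinel stripped.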

Phase 3: Generator Agent

Load: $SKILL_DIR/instructions/agent-patterns.md for run_generator + self_assess implementations.

    def run_generator(
        spec, contract, project_dir,
        handoff=None, strategic_framing=None, cfg=None,
    ) -> str: ...

    ClaudeAgentOptions(
        model=cfg.agents.generator_model,
        allowed_tools=["Write", "Read", "Edit", "Bash", "Glob"],
        cwd=str(project_dir),
        permission_mode="bypassPermissions",
    )

After generation, call self_assess(), which catches gaps before the Evaluator via the submit_assessment MCP tool. If not confident → extra pass with concerns as strategic_framing.

Phase 4: Evaluator Agent

Load: $SKILL_DIR/instructions/agent-patterns.md for full implementation. Load: $SKILL_DIR/instructions/evaluation-rubrics.md for system prompt + rubric criteria.

Two roles: run_evaluator() (post-generation gate) + review_contract() (pre-sprint criteria review).

    # submit_grade schema: contract_results[{id, status, evidence}], rubric_scores{id: 1–5}, feedback
    def run_evaluator(spec, contract, app_url, rubric_track="A", cfg=None) -> EvalResult: ...

⚠️ Deterministic verdict: Never trust the verdict from the LLM. Recompute it in _build_eval_result() from contract_results + rubric_scores using cfg.verdict.* thresholds.
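One way to recompute the verdict is sketched below. The real thresholds and the EvalResult fields are defined in the instruction files, so everything named here is illustrative: `VerdictThresholds` and its two fields stand in for cfg.verdict.*, the EvalResult fields are hypothetical, and the code assumes each contract result carries a `status` key whose passing value is the string "pass".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VerdictThresholds:
    # Hypothetical stand-ins for the cfg.verdict.* values.
    min_rubric_score: int = 3        # every rubric score must reach this
    min_rubric_average: float = 3.5  # and the mean must reach this


@dataclass
class EvalResult:
    passed: bool
    failed_contract_ids: List[str]
    rubric_average: float
    feedback: str


def build_eval_result(grade: dict, thresholds: VerdictThresholds) -> EvalResult:
    """Recompute the verdict from raw grades, ignoring any LLM-supplied verdict."""
    failed = [
        item["id"]
        for item in grade["contract_results"]
        if item["status"] != "pass"  # assumed passing value
    ]
    scores: Dict[str, int] = grade["rubric_scores"]
    average = sum(scores.values()) / len(scores) if scores else 0.0
    lowest = min(scores.values(), default=0)
    passed = (
        not failed
        and lowest >= thresholds.min_rubric_score
        and average >= thresholds.min_rubric_average
    )
    return EvalResult(passed, failed, average, grade.get("feedback", ""))
```

The function never reads a `verdict` key, so a grade payload that claims a pass while listing a failed contract item still comes back with `passed=False`.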

Phase 5: Harness Loop

Load: $SKILL_DIR/instructions/iteration-loop.md for run_sprint, strategic_decision, git_commit.

    def main():
        cfg = HarnessConfig.load(Path(__file__).parent / "config.yaml")
        log.setup(PROJECT_DIR, label="run")

    def run_sprint(spec, contract, project_dir, handoff=None, cfg=None):
        while iteration < cfg.loop.max_iterations:
            # 1. Generate: try/except; crash is a valid (poor) outcome
            # 2. Self-assess: extra pass if not confident
            # 3. git_commit("wip: sprint N iter I")
            # 4. Evaluate → EvalResult
            # 5a. Pass + iteration < min_iterations → quality-improvement continue
            #     Pass + min_iterations met → git_commit("feat") + return
            # 5b. Fail → strategic_decision() → REFINE or PIVOT → set strategic_framing
        # Exhausted: input() if isatty() else return last result
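The control flow of the loop, including its termination conditions, can be sketched with the agents injected as callables. This is a simplified illustration, not the implementation from iteration-loop.md: `run_sprint_loop`, `Outcome`, and the framing strings are hypothetical, and the self-assessment pass, git commits, and the interactive prompt on exhaustion are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    passed: bool
    feedback: str = ""


def run_sprint_loop(
    generate: Callable[[Optional[str]], None],
    evaluate: Callable[[], Outcome],
    decide: Callable[[Outcome], str],
    min_iterations: int,
    max_iterations: int,
) -> Optional[Outcome]:
    """Generate and evaluate until a pass after min_iterations, or the budget runs out."""
    framing: Optional[str] = None
    last: Optional[Outcome] = None
    for iteration in range(1, max_iterations + 1):
        try:
            generate(framing)
        except Exception as exc:
            # A crash is a valid (poor) outcome; evaluation still runs on what exists.
            print(f"generator crashed on iteration {iteration}: {exc}")
        last = evaluate()
        if last.passed and iteration >= min_iterations:
            return last
        if last.passed:
            framing = "Passing; use this pass to improve quality."
        else:
            # decide() returns "REFINE" or "PIVOT"; either one shapes the next pass.
            framing = f"{decide(last)}: {last.feedback}"
    return last  # budget exhausted: the caller decides what to do with the last result
```

Bounding the loop with `range` rather than a hand-maintained counter guarantees termination, which is the first thing to check when a harness is reported as looping forever.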

Git checkpoints (see iteration-loop.md for the git_commit() helper):

| Event | Message |
| --- | --- |
| SPEC written | feat: generate SP |

Source: ClawHub · Chinese localization: 龙虾技能库