📦 long-run-harness: Long-Running Harness
v1.0.0
Use when building a Planner→Generator→Evaluator multi-agent harness with the Claude SDK. Triggers: "build a harness", "multi-agent pipeline", "agent loop", ...
Long-Running App Harness: SDK Implementation
Produces a runnable harness that orchestrates Claude agents via claude_agent_sdk. You are writing the harness, not running inside it.
Use query() + ClaudeAgentOptions for agentic loops; tool() + create_sdk_mcp_server() for structured output. Never call anthropic.Anthropic() directly.

    pip install claude-agent-sdk
Output structure:

    harness/
        harness.py
        config.yaml
        config.py
        log.py
        agents/
            planner.py
            generator.py
            evaluator.py
        models/
            state.py
        prompts/
            planner.md
            generator.md
            evaluator.md
Routing

| User Signal | Route |
| --- | --- |
| "build a harness / pipeline" | Start at Phase 1 |
| "add an evaluator" | Jump to Phase 4 |
| "add state / handoff" | Jump to Phase 5 |
| "looping forever / broken" | Check feedback loop termination in Phase 5 |
| "just explain what a harness does" | Explain concept, don't write code |

Phase 1: Design the Harness
Load: $SKILL_DIR/instructions/planner-questions.md
⚠️ HARD GATE: Ask the design questions. Get answers to 1–3 before writing any code:

1. What does the harness build? (Determines the Generator tools and the Evaluator rubric.)
2. Python or TypeScript? (default: Python)
3. Models per agent? (default: all claude-opus-4-7; non-defaults → config.yaml)
Create skeleton:

    mkdir -p harness/agents harness/models harness/prompts harness/harness-logs
    touch harness/harness.py harness/log.py harness/agents/__init__.py harness/models/__init__.py
config.yaml + config.py: all tunable parameters live here; never hardcode them in agent files. Load: $SKILL_DIR/instructions/config.md for the full HarnessConfig dataclass.

    cfg = HarnessConfig.load(Path(__file__).parent / "config.yaml")
    # Always: cfg.agents.generator_model; never: "claude-opus-4-7"
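To make the "all tunables in one place" rule concrete, here is a minimal sketch of a nested config dataclass. The real HarnessConfig is defined in config.md; `HarnessConfigSketch`, `from_dict`, the `planner_model` and `evaluator_model` fields, and the numeric defaults are assumptions for illustration. A plain dict stands in for the parsed config.yaml so the sketch needs no YAML library.

```python
from dataclasses import dataclass, field

DEFAULT_MODEL = "claude-opus-4-7"  # default named in the design questions


@dataclass
class AgentsConfig:
    planner_model: str = DEFAULT_MODEL    # assumed field name
    generator_model: str = DEFAULT_MODEL
    evaluator_model: str = DEFAULT_MODEL  # assumed field name


@dataclass
class LoopConfig:
    min_iterations: int = 1  # assumed default
    max_iterations: int = 5  # assumed default


@dataclass
class HarnessConfigSketch:
    agents: AgentsConfig = field(default_factory=AgentsConfig)
    loop: LoopConfig = field(default_factory=LoopConfig)

    @classmethod
    def from_dict(cls, raw: dict) -> "HarnessConfigSketch":
        """Build a config from a parsed-YAML-style dict; missing keys keep defaults."""
        return cls(
            agents=AgentsConfig(**raw.get("agents", {})),
            loop=LoopConfig(**raw.get("loop", {})),
        )


# Only the overridden value needs to appear in the config source.
cfg = HarnessConfigSketch.from_dict({"loop": {"max_iterations": 8}})
print(cfg.agents.generator_model, cfg.loop.max_iterations)
```

Agent files then read `cfg.agents.generator_model` and never see a model string of their own.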
models/state.py: write this first; all other files import from it.
Load: $SKILL_DIR/instructions/context-handoff.md (HandoffState, EvalResult, format_handoff_for_prompt).
Load: $SKILL_DIR/instructions/sprint-contracts.md (SprintContract + negotiation protocol).
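The shapes below are a rough sketch of what a shared state module could look like. The authoritative definitions are in context-handoff.md; every field name here (`summary`, `open_issues`, `last_eval`, and so on) and the "pass"/"fail" verdict values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalResult:
    verdict: str                       # "pass" or "fail" (assumed values)
    feedback: str = ""
    rubric_scores: dict = field(default_factory=dict)


@dataclass
class HandoffState:
    sprint: int
    iteration: int
    summary: str                       # what the last pass built (assumed field)
    open_issues: list = field(default_factory=list)
    last_eval: Optional[EvalResult] = None


def format_handoff_for_prompt(handoff: HandoffState) -> str:
    """Render the handoff as plain text to prepend to the next agent prompt."""
    lines = [
        f"Sprint {handoff.sprint}, iteration {handoff.iteration}",
        f"Summary: {handoff.summary}",
    ]
    if handoff.open_issues:
        lines.append("Open issues:")
        lines.extend(f"- {issue}" for issue in handoff.open_issues)
    if handoff.last_eval is not None:
        lines.append(f"Last verdict: {handoff.last_eval.verdict}")
        if handoff.last_eval.feedback:
            lines.append(f"Evaluator feedback: {handoff.last_eval.feedback}")
    return "\n".join(lines)


handoff_text = format_handoff_for_prompt(
    HandoffState(
        sprint=1,
        iteration=2,
        summary="Login page built",
        open_issues=["no logout"],
        last_eval=EvalResult("fail", "Add a logout flow"),
    )
)
print(handoff_text)
```

Keeping these types in one module means the Generator, Evaluator, and loop all agree on what a handoff contains.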
log.py: dual output to stdout and a timestamped file under harness-logs/. Load: $SKILL_DIR/instructions/logging.md for the full implementation.

    log.setup(PROJECT_DIR, label="run")  # once in main()
    logger = log.get()                   # in every agent
Phase 2: Planner Agent
Load: $SKILL_DIR/instructions/planner-questions.md for the system prompt template.
Load: $SKILL_DIR/instructions/agent-patterns.md for the full run_planner implementation.
run_planner(brief, session_id, cfg) → (reply, new_session_id). ClaudeAgentOptions(resume=session_id) continues the session without resending history.

    spec, session_id = "", None
    while "SPEC_COMPLETE" not in spec:
        user_input = input("[Planner asks]: ").strip() if session_id else initial_brief
        spec, session_id = run_planner(user_input, session_id, cfg)
    SPEC_PATH.write_text(spec.replace("SPEC_COMPLETE", "").strip())
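To see how the loop above threads the session ID and terminates on the sentinel, the sketch below drives it with a stub planner. `fake_run_planner`, `plan`, and the scripted answers are hypothetical stand-ins; the real run_planner calls the SDK and is defined in agent-patterns.md.

```python
from typing import Optional, Tuple


def fake_run_planner(brief: str, session_id: Optional[str], cfg=None) -> Tuple[str, str]:
    """Stand-in planner: asks one question, then emits a spec ending in the sentinel."""
    if session_id is None:
        return "What platform should the app target?", "session-1"
    return f"# Spec\nTarget: {brief}\nSPEC_COMPLETE", session_id


def plan(initial_brief: str, answers: list, run_planner=fake_run_planner, cfg=None) -> str:
    """Loop until the planner's reply contains SPEC_COMPLETE, then strip the sentinel."""
    pending = iter(answers)  # scripted replies instead of input()
    spec, session_id = "", None
    while "SPEC_COMPLETE" not in spec:
        # First turn sends the brief; later turns send the user's answer.
        user_input = next(pending) if session_id else initial_brief
        spec, session_id = run_planner(user_input, session_id, cfg)
    return spec.replace("SPEC_COMPLETE", "").strip()


spec_text = plan("Build a todo app", ["web"])
print(spec_text)
```

The first call has no session, so it sends the brief; every later call reuses the returned session ID, which is what lets the planner continue without the history being sent again.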
Phase 3: Generator Agent
Load: $SKILL_DIR/instructions/agent-patterns.md for the run_generator + self_assess implementations.

    def run_generator(
        spec, contract, project_dir,
        handoff=None, strategic_framing=None, cfg=None,
    ) -> str: ...

    ClaudeAgentOptions(
        model=cfg.agents.generator_model,
        allowed_tools=["Write", "Read", "Edit", "Bash", "Glob"],
        cwd=str(project_dir),
        permission_mode="bypassPermissions",
    )

After generation, call self_assess(). It catches gaps before the Evaluator runs, reporting through the submit_assessment MCP tool. If the Generator is not confident → run an extra pass with its concerns passed as strategic_framing.
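The self-assessment step can be sketched as a wrapper around the generator call. The assessment shape (`{"confident": bool, "concerns": [...]}`), the wrapper name, and the single-retry policy are assumptions; the real self_assess and its submit_assessment schema are in agent-patterns.md.

```python
from typing import Callable, Optional


def generate_with_self_check(
    run_generator: Callable[[Optional[str]], str],
    self_assess: Callable[[str], dict],
) -> str:
    """Run one generation pass, plus one extra pass if the self-assessment flags concerns."""
    output = run_generator(None)
    assessment = self_assess(output)  # assumed shape: {"confident": bool, "concerns": [str]}
    if not assessment["confident"]:
        framing = "Address these concerns: " + "; ".join(assessment["concerns"])
        output = run_generator(framing)
    return output


framings_seen = []


def demo_generator(framing: Optional[str]) -> str:
    """Hypothetical generator stub that records the framing it was given."""
    framings_seen.append(framing)
    return "draft v2" if framing else "draft v1"


final_output = generate_with_self_check(
    demo_generator,
    lambda output: {"confident": False, "concerns": ["no error handling"]},
)
print(final_output)
```

Capping the retry at one extra pass keeps self-assessment from becoming a second, unbounded loop inside the sprint loop.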
Phase 4: Evaluator Agent
Load: $SKILL_DIR/instructions/agent-patterns.md for the full implementation.
Load: $SKILL_DIR/instructions/evaluation-rubrics.md for the system prompt + rubric criteria.
Two roles: run_evaluator() (post-generation gate) + review_contract() (pre-sprint criteria review).

    # submit_grade schema: contract_results[{id, status, evidence}], rubric_scores{id: 1–5}, feedback
    def run_evaluator(spec, contract, app_url, rubric_track="A", cfg=None) -> EvalResult: ...

⚠️ Deterministic verdict: Never trust the verdict from the LLM. Recompute it in _build_eval_result() from contract_results + rubric_scores using the cfg.verdict.* thresholds.
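One way to recompute the verdict deterministically is sketched below. The threshold names (`min_rubric_score`, `min_contract_pass_rate`), the "pass" status value, the rubric IDs, and the `failed_ids` field are all assumptions; the actual cfg.verdict.* fields and EvalResult shape are defined in the skill's instruction files.

```python
from dataclasses import dataclass, field


@dataclass
class VerdictThresholds:
    min_rubric_score: int = 3            # assumed: every rubric score must reach this
    min_contract_pass_rate: float = 1.0  # assumed: fraction of criteria that must pass


@dataclass
class EvalResult:
    verdict: str
    failed_ids: list = field(default_factory=list)
    feedback: str = ""


def build_eval_result(grade: dict, thresholds: VerdictThresholds) -> EvalResult:
    """Recompute the verdict from raw grades; any 'verdict' key in grade is ignored."""
    results = grade.get("contract_results", [])
    scores = grade.get("rubric_scores", {})
    failed = [r["id"] for r in results if r.get("status") != "pass"]
    # An empty contract cannot pass: there is no evidence to grade.
    pass_rate = (len(results) - len(failed)) / len(results) if results else 0.0
    low = [key for key, score in scores.items() if score < thresholds.min_rubric_score]
    ok = pass_rate >= thresholds.min_contract_pass_rate and not low
    return EvalResult(
        verdict="pass" if ok else "fail",
        failed_ids=failed + low,
        feedback=grade.get("feedback", ""),
    )


grade = {
    "verdict": "pass",  # the model's own claim, deliberately not consulted
    "contract_results": [
        {"id": "C1", "status": "pass", "evidence": "page loads"},
        {"id": "C2", "status": "fail", "evidence": "save button missing"},
    ],
    "rubric_scores": {"design": 4, "functionality": 2},
    "feedback": "Save flow is incomplete.",
}
result = build_eval_result(grade, VerdictThresholds())
print(result.verdict, result.failed_ids)
```

Here the model claims a pass, but the recomputed verdict is a fail because one contract criterion failed and one rubric score is below the threshold.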
Phase 5: Harness Loop
Load: $SKILL_DIR/instructions/iteration-loop.md for run_sprint, strategic_decision, git_commit.

    def main():
        cfg = HarnessConfig.load(Path(__file__).parent / "config.yaml")
        log.setup(PROJECT_DIR, label="run")

    def run_sprint(spec, contract, project_dir, handoff=None, cfg=None):
        iteration = 0
        while iteration < cfg.loop.max_iterations:
            # 1. Generate: try/except; a crash is a valid (poor) outcome
            # 2. Self-assess: extra pass if not confident
            # 3. git_commit("wip: sprint N iter I")
            # 4. Evaluate → EvalResult
            # 5a. Pass + iteration < min_iterations → quality-improvement continue
            #     Pass + min_iterations met → git_commit("feat") + return
            # 5b. Fail → strategic_decision() → REFINE or PIVOT → set strategic_framing
            iteration += 1
        # Exhausted: input() if isatty() else return last result
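The control flow outlined above can be sketched with the agents injected as plain callables, which makes the termination behavior easy to check in isolation. This is a simplified illustration, not the run_sprint from iteration-loop.md: the function name, the callable signatures, and the framing strings are assumptions, and the self-assess and git_commit steps are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalResult:
    verdict: str        # "pass" or "fail" (assumed values)
    feedback: str = ""


def run_sprint_sketch(
    generate: Callable[[Optional[str]], None],
    evaluate: Callable[[], EvalResult],
    decide: Callable[[EvalResult], str],
    min_iterations: int = 1,
    max_iterations: int = 5,
) -> Optional[EvalResult]:
    """Generate and evaluate until a pass after min_iterations, or until the cap is hit."""
    framing: Optional[str] = None
    last: Optional[EvalResult] = None
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        try:
            generate(framing)
        except Exception:
            pass  # a crash is a valid (poor) outcome; evaluation still runs
        last = evaluate()
        if last.verdict == "pass":
            if iteration >= min_iterations:
                return last
            framing = "Passed early; improve quality."
            continue
        # Fail: choose REFINE or PIVOT and carry the feedback forward.
        framing = f"{decide(last)}: {last.feedback}"
    return last  # exhausted; the caller decides whether to prompt the user


verdicts = iter(["fail", "pass", "pass"])
framings = []
sprint_result = run_sprint_sketch(
    generate=lambda framing: framings.append(framing),
    evaluate=lambda: EvalResult(next(verdicts), "tighten layout"),
    decide=lambda res: "REFINE",
    min_iterations=3,
)
print(len(framings), sprint_result.verdict)
```

The hard cap on `max_iterations` is what guarantees termination; a harness that loops forever usually has a path through the loop body that skips the counter increment or never reaches the cap check.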
Git checkpoints (see iteration-loop.md for the git_commit() helper):

| Event | Message |
| --- | --- |
| SPEC written | feat: generate SP