Security Scan
OpenClaw
Safe
High confidence: the skill's code, instructions, and requirements match its stated purpose (planning and artifact scaffolding for SOTA CV/DS campaigns), and it requests no unrelated credentials and performs no silent network installs.
Assessment Recommendations
The package appears coherent and focused on campaign scaffolding, but before running: (1) inspect the scripts to confirm the URL/path/reference sanitization behavior; (2) run the scripts in a dedicated campaign workspace; (3) avoid running in environments that expose secrets; (4) if paired with real browser/VM automation, review the execution-lane code to prevent accidental network egress or credential use.
✓ Purpose and Capabilities
The name and description (SOTA campaign planning across notebooks/VMs) match the bundled scripts and references: the tools create campaign, candidate, leaderboard, run-card, and VM bootstrap records, and render summaries. The declared binary requirements (python/python3) are appropriate.
ℹ Instruction Scope
SKILL.md instructs running local Python scripts to create and manage campaign artifacts, explicitly recommends keeping outputs inside the campaign workspace, and pairs with an execution lane for actual runs. Scope is limited to planning and recordkeeping. Note: the runtime relies on the sanitize_* helpers (sota_public_safety) to redact and sanitize paths and URLs; review that file to confirm the sanitization behaves as expected.
✓ Installation Mechanism
No install spec or external downloads; the skill is instructions plus local Python scripts run from the skill package. No network fetches or archive extraction in the manifest.
✓ Credential Requirements
The skill declares no required environment variables or credentials, and even encodes an OAuth-preferred policy in its program template. The scripts read and sanitize file paths but need no tokens or keys.
✓ Persistence and Permissions
always:false and user-invokable; the skill requests no persistent elevated permissions and does not modify other skills. It writes artifacts to user-specified paths (the campaign root), as its purpose would suggest.
Security is layered; review the code before running.
Runtime Dependencies
No special dependencies
Version
latest · v1.4.1 · 2026/3/19
Polished the wording of the public skill documentation.
● Harmless
Install Command
Official: npx clawhub@latest install sota-agent
Mirror (CN): npx clawhub@latest install sota-agent --registry https://cn.clawhub-mirror.com
Skill Documentation
Goal
Turn a vague "beat the benchmark" request into a disciplined campaign:
- fixed target metric and split
- explicit literature and leaderboard snapshot
- bounded reproduction plan
- explicit browser, notebook, or VM execution lane
- GUI evidence when notebook or browser state matters
- ablations that answer one question at a time
- promotion only when the claim survives review
This skill is the frontier-planning and candidate-selection layer.
For browser evidence, VM execution, and promotion artifacts, pair it with
data-science-cv-repro-lab instead of letting the campaign drift into ad hoc runs.
Use This Skill When
- the user wants a CV or DS system pushed toward state-of-the-art results
- the task involves reproducing or surpassing recent papers
- the workflow needs paper triage, leaderboard tracking, or claim review
- the workflow includes OpenClaw, Colab, Kaggle, browser-only notebook actions, or GUI-heavy pages
- the user needs experiment management across browser research, notebooks, local runs, and long GPU jobs
- the user wants GPU VM or notebook watchdog logic, artifact pulls, or browser evidence for a SOTA candidate
- the question is whether a candidate is a real SOTA step or only noise, leakage, or benchmark overfitting
If the campaign includes serious execution or release review, use this skill to choose and rank candidates,
then use data-science-cv-repro-lab as the execution lane.
Quick Start
- Freeze the claim target before touching recipes.
- Initialize the campaign records immediately.
python3 {baseDir}/scripts/init_sota_campaign.py --root <root> --campaign-id <id> --title <title>.
- Use python3 {baseDir}/scripts/init_sota_leaderboard_snapshot.py --out <out> --task <task> --dataset <dataset> --metric <metric> --split <split>.
- Use python3 {baseDir}/scripts/init_sota_paper_triage.py --out <out> --campaign-id <id> --task <task>.
- Use python3 {baseDir}/scripts/init_sota_program.py --out <out> --campaign-id <id> --task <task> --dataset <dataset> --metric <metric> --split <split> when you need one machine-readable benchmark, rerun, delegation, and auth plan.
- Use python3 {baseDir}/scripts/init_sota_candidate_card.py --out <out> --candidate-id <id> --campaign-id <id> --objective <objective>.
- If execution review depends on synced QA runs, runtime sweeps, or benchmark panels, store the paired data-science-cv-repro-lab review dashboard path in the program and candidate records before the claim review starts.
- If the execution path depends on a real browser or notebook UI, use python3 {baseDir}/scripts/init_sota_browser_run_card.py --out <out> --target-url <url>.
- If the browser or notebook surface needs manual or visual QA, use python3 {baseDir}/scripts/init_sota_validation_scorecard.py --out <out> --scorecard-id <id> --surface <surface>.
- If a Colab, Kaggle, or notebook export bundle matters, use python3 {baseDir}/scripts/init_sota_artifact_manifest.py --out <out> --bundle-root <bundle-root>.
- If a long GPU VM run is involved, use python3 {baseDir}/scripts/init_sota_vm_bootstrap_manifest.py --out <out> --output-root <output-root> --model-family <family> --command "python train.py --epochs 40".
- Separate the campaign roles even if one agent performs all of them.
- Pick the execution lane explicitly.
- Keep file writes inside one campaign workspace.
- Keep every --out, --bundle-root, and --output-root path under it.
- Do not point the bundled scripts at unrelated home-directory or system paths.
- Treat scripts/sota_public_safety.py as the canonical public-redaction layer for URLs, refs, and paths.
- Work the SOTA ladder in order.
- Claim only on full-surface wins.
python3 {baseDir}/scripts/render_sota_claim_summary.py --candidate-card <card> --out <out>.
Operating Rules
Campaign rules
- One campaign has one target benchmark contract.
- Do not let the target metric or split drift midstream.
- Keep a short hypothesis backlog and kill low-information ideas quickly.
- Record why each experiment exists before running it.
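The "one campaign, one benchmark contract" rule can be sketched as a frozen record, so any mid-campaign attempt to move the metric or split fails loudly. This is an illustrative sketch, not part of the bundled scripts; the field names are assumptions.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class BenchmarkContract:
    """One campaign's fixed target; frozen so it cannot drift midstream."""
    task: str
    dataset: str
    metric: str
    split: str
    target: float

contract = BenchmarkContract(
    task="image-classification",
    dataset="cifar10",
    metric="top1-accuracy",
    split="test",
    target=0.995,
)

# Any attempt to move the goalposts raises FrozenInstanceError.
try:
    contract.metric = "top5-accuracy"
except FrozenInstanceError:
    print("contract drift rejected")
```

Subagents can read the contract freely; only a deliberate new campaign gets a new contract object.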
Codex multi-agent rules
- Main thread owns the benchmark contract, stop conditions, and final claim decision.
- Subagents should do bounded work only: scout, reproduce, ablate, or review.
- Do not let one exploratory thread silently rewrite the campaign contract.
- For repeated claim checks or literature extraction, prefer manifest-driven fanout over conversational drift.
Literature rules
- Read only the papers or repos that change the candidate plan.
- Extract the minimum useful fields: task, metric, split, data, compute, architecture, augmentations, training tricks, and caveats.
- Prefer a reproduced strong baseline over copying five tricks from five papers without control.
- Do not treat leaderboard rows as ground truth without checking task definition and split rules.
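The "minimum useful fields" in the literature rules map naturally onto a small triage record that rejects incomplete entries. A hedged sketch: the field names follow the list above, but the helper is hypothetical, not the bundled init_sota_paper_triage.py.

```python
REQUIRED_FIELDS = (
    "task", "metric", "split", "data", "compute",
    "architecture", "augmentations", "training_tricks", "caveats",
)

def triage_entry(**fields):
    """Build a paper-triage record; reject entries missing required fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise ValueError(f"incomplete triage entry, missing: {missing}")
    return dict(fields)

entry = triage_entry(
    task="semantic-segmentation",
    metric="mIoU",
    split="val",
    data="ade20k",
    compute="8xA100, 160k iters",
    architecture="ViT-L + mask decoder",
    augmentations="scale jitter, color jitter",
    training_tricks="layer-wise lr decay",
    caveats="uses extra pretraining data",
)
```

Forcing the caveats field is the point: a leaderboard row without its caveats is exactly the "ground truth without checking" trap.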
Ablation rules
- Change one meaningful variable at a time when the goal is causal understanding.
- If several knobs move together, label the run as a package change, not an ablation.
- Keep one canonical baseline recipe alive for comparison.
- Require the first winning candidate to survive at least one rerun or adjacent-seed check before escalating the claim.
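The one-variable rule can be enforced mechanically: diff a candidate recipe against the canonical baseline and label the run accordingly. A sketch with illustrative names, not bundled code:

```python
def classify_run(baseline: dict, candidate: dict) -> str:
    """Label a run by how many knobs differ from the canonical baseline."""
    changed = [k for k in candidate if candidate[k] != baseline.get(k)]
    if not changed:
        return "baseline-rerun"
    if len(changed) == 1:
        return "ablation"        # one meaningful variable changed
    return "package-change"      # several knobs moved together

baseline = {"lr": 1e-3, "aug": "randaug", "epochs": 90}
print(classify_run(baseline, {"lr": 3e-3, "aug": "randaug", "epochs": 90}))
print(classify_run(baseline, {"lr": 3e-3, "aug": "trivialaug", "epochs": 90}))
```

Package changes are fine for exploration; the label just stops them from being reported as causal ablations.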
Compute rules
- Spend cheap compute on reproduction and short falsification first.
- Do not push a long run unless the hypothesis would matter if it wins.
- Record training cost, wall time, and hardware for every serious candidate.
- Cut branches that cannot plausibly clear the target with the remaining budget.
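The last two compute rules, record cost and cut hopeless branches, can be sketched as a run record plus a budget check. Field names and the linear plausible-gain model are assumptions for illustration:

```python
def should_cut(best_so_far: float, target: float,
               plausible_gain_per_run: float, remaining_runs: int) -> bool:
    """Cut a branch that cannot plausibly clear the target with the budget left."""
    best_plausible = best_so_far + plausible_gain_per_run * remaining_runs
    return best_plausible < target

# Every serious candidate gets cost, wall time, and hardware recorded.
run_record = {
    "candidate_id": "cand-007",
    "train_cost_usd": 42.0,
    "wall_time_h": 6.5,
    "hardware": "1xA100-80GB",
    "metric": 0.912,
}

# 0.912 + 0.002 * 3 = 0.918 < 0.95 -> this branch cannot clear the target
assert should_cut(run_record["metric"], 0.95, 0.002, 3)
```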
OAuth and auth rules
- Use ChatGPT or Codex OAuth-backed sessions as the default and preferred path.
- Prefer Codex multi-agent or app-server workflows over orchestrators that require paid API keys.
- Do not require or recommend OPENAI_API_KEY, other vendor API keys, or paid inference APIs as the default campaign runtime path.
- If a third-party framework only works through paid API keys, treat it as reference material unless it can run fully through local tools and OAuth-backed Codex sessions.
OpenClaw browser rules
- Use OpenClaw for public papers, leaderboards, docs, notebook-only steps, and GUI-heavy flows when the browser lane adds evidence.
- Prefer direct public URLs over uploads or private sessions.
- Capture leaderboard, notebook, or GUI evidence as notes, screenshots, and exact URLs when they are part of the claim path.
- Fail hard on dead browser attach, missing notebook readiness, or unavailable requested model or runtime mode.
- Treat screenshots and GUI evidence as supporting artifacts, not the claim itself.
- Do not use browser-only summaries as the claim itself; claims still require benchmark artifacts.
Colab and notebook GPU rules
- Select the accelerator explicitly before running expensive cells.
- Run a smoke cell that proves imports, runtime, data mounts, and export paths all work.
- Keep one stable export root and pull the artifact manifest plus at least one preview back locally.
- Add the browser run card and validation scorecard when the notebook GUI is part of the evaluation story.
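The smoke cell above can be as small as one function: prove imports resolve, the data mount exists, and the export root is writable before any expensive cell runs. Paths here are stand-ins; on Colab they would be /content/... mounts.

```python
import importlib
import pathlib
import tempfile

def smoke_check(required_modules, data_root, export_root):
    """Fail fast if imports, data mounts, or the export path are broken."""
    for mod in required_modules:
        importlib.import_module(mod)          # proves imports resolve
    data_root = pathlib.Path(data_root)
    if not data_root.exists():
        raise RuntimeError(f"data mount missing: {data_root}")
    export_root = pathlib.Path(export_root)
    export_root.mkdir(parents=True, exist_ok=True)
    probe = export_root / "smoke.ok"
    probe.write_text("ok")                    # proves export path is writable
    return True

# Stand-in paths; a real notebook would also import torch etc. here.
tmp = tempfile.mkdtemp()
assert smoke_check(["json", "csv"], tmp, f"{tmp}/exports")
```

An accelerator check (e.g. querying the framework for a GPU device) would slot into the same function; it is omitted here to keep the sketch framework-agnostic.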
GPU VM rules
- Create a named run root before launch.
- Write a machine-readable VM bootstrap manifest before long runs.
- Run long jobs under a heartbeat, session, or supervisor so liveness is explicit.
- Sync metrics, summaries, and checkpoints back to a trusted store on a schedule.
- Do not promote directly from live VM state; promote from synced artifacts and review evidence.
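The heartbeat rule can be sketched as a tiny liveness file the training loop touches on a schedule; a supervisor then treats a stale timestamp as a dead run. The file name, fields, and staleness window are illustrative assumptions:

```python
import json
import pathlib
import tempfile
import time

def write_heartbeat(run_root, step):
    """Record liveness so a supervisor can detect a stalled run."""
    hb = {"step": step, "ts": time.time()}
    pathlib.Path(run_root, "heartbeat.json").write_text(json.dumps(hb))

def is_alive(run_root, max_stale_s=600):
    """True if the last heartbeat is recent enough to count as live."""
    path = pathlib.Path(run_root, "heartbeat.json")
    if not path.exists():
        return False
    hb = json.loads(path.read_text())
    return (time.time() - hb["ts"]) < max_stale_s

root = tempfile.mkdtemp()
write_heartbeat(root, step=1000)
assert is_alive(root)
```

Syncing this file to the trusted store alongside metrics gives the promotion review an explicit liveness trail instead of live VM state.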
Claim safety rules
- No SOTA claim without a fixed metric, split, and baseline.
- No SOTA claim on a contaminated benchmark or hidden train-on-test path.
- If the execution story depends on a dashboard or synced review surface, keep the dashboard path, source audit, and leakage audit in the claim packet.
- If a candidate wins only on one slice while regressing important surfaces, hold it.
- Report uncertainty honestly: "best internal result so far" is not the same as "new SOTA".
- Small deltas need rerun or adjacent-seed support before they become claim language.
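The claim-safety rules compose into a simple gate over a candidate review packet. A hedged sketch: the threshold and field names are assumptions, not the bundled review-packet format.

```python
def claim_allowed(packet: dict, small_delta: float = 0.005):
    """Apply the claim-safety rules to a candidate review packet."""
    for field in ("metric", "split", "baseline_score", "candidate_score"):
        if field not in packet:
            return False, f"missing {field}: no claim without a fixed contract"
    if packet.get("contaminated", False):
        return False, "contaminated benchmark: claim blocked"
    delta = packet["candidate_score"] - packet["baseline_score"]
    if delta <= 0:
        return False, "no win over baseline"
    if delta < small_delta and not packet.get("rerun_confirmed", False):
        return False, "small delta needs rerun or adjacent-seed support"
    return True, "claim may proceed to review"

ok, reason = claim_allowed({
    "metric": "top1", "split": "test",
    "baseline_score": 0.934, "candidate_score": 0.936,
})
# delta 0.002 < 0.005 without rerun support -> held
assert not ok
```

Passing the gate still only yields "best internal result so far"; the human review packet, not this function, decides the claim language.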
References
Read only the reference that matches the task:
references/sota-campaign-playbook.md
references/sota-program-rules.md
references/campaign-harness-and-oauth-stack.md
references/benchmark-discipline.md
references/paper-triage.md
references/openclaw-research-lane.md
references/openclaw-browser-lane.md
references/colab-vm-operations.md
references/claim-safety.md
references/public-safety.md
Bundled Scripts
scripts/sota_public_safety.py
scripts/init_sota_campaign.py
scripts/init_sota_program.py
scripts/init_sota_leaderboard_snapshot.py
scripts/init_sota_paper_triage.py
scripts/init_sota_browser_run_card.py
scripts/init_sota_validation_scorecard.py
scripts/init_sota_artifact_manifest.py
scripts/init_sota_candidate_card.py
scripts/init_sota_candidate.py
scripts/init_sota_ablation_queue.py
scripts/init_sota_vm_bootstrap_manifest.py
scripts/update_sota_scoreboard.py
scripts/init_sota_review_packet.py
scripts/render_sota_claim_summary.py
scripts/render_sota_program_summary.py
Data source: ClawHub ↗ · Chinese localization: 龙虾技能库