Durable Workflow
v1.0.1
Patterns and procedures for building AI agent workflows that survive real-world failures. Use when asked to build a multi-step automation, pipeline, or agent workflow, especially when it involves external APIs, file operations, long-running tasks, or anything that must not lose state. Triggers on: "automate", "pipeline", "workflow", "agent loop", "multi-step", "background task", "retry", "error handling", "won't lose progress", "keeps failing", "handle errors", "resilient", or any request to build an automation that must stay running.
Durable Workflow Patterns
Build automations that survive API failures, timeouts, and unexpected state, without rebuilding from scratch every time something breaks.
Core Principle
Every step in a multi-step workflow must answer three questions:
1. What did I finish? (Checkpoint)
2. What do I do if this step fails? (Recovery)
3. Who finds out if something goes wrong? (Alerting)
Skip any of these and the workflow will eventually fail silently.
Scripts
Ready-to-use implementations in scripts/:
Script                 Purpose
workflow-template.js   Complete workflow skeleton with checkpoints, retry, DLQ, exit handler
lock.js                File-based process lock; prevents concurrent runs
workflow-template.js
Copy and fill in the step TODOs:
cp scripts/workflow-template.js my-workflow.js
node my-workflow.js                                           # Run (or re-run; resumes from last checkpoint)
WORKFLOW_STATE_PATH=/tmp/state.json node my-workflow.js       # Custom state path
Features: atomic state saves, exponential backoff, timeout wrapper, DLQ, abnormal-exit logging.
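The timeout wrapper in the feature list is not shown in this document. A minimal sketch of one way to build such a wrapper follows; the name `withTimeout` and its signature are assumptions for illustration, not necessarily what the template script uses.

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'step') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins. Always clear the timer so it cannot
  // keep the process alive after the step has finished.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that this only stops waiting; it does not cancel the underlying operation. Where the API supports it, passing an `AbortSignal` to the call is the more thorough option.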
lock.js
Prevent two instances of the same workflow from running at once:
const { withLock, LockError } = require('./lock');

withLock('/tmp/my-workflow.lock', async () => {
  // Only one process runs this block at a time
  await runWorkflow();
}).catch(e => {
  if (e.name === 'LockError') {
    console.error('Already running:', e.message);
  } else {
    throw e;
  }
});
Pattern 1: Checkpoint State
Save progress after every meaningful step. Never trust in-memory state across network calls.
// checkpoint.js pattern (runs inside an async function)
const state = loadState('workflow-id') || { step: 0, results: [] };

if (state.step < 1) {
  state.results.push(await fetchData());
  state.step = 1;
  saveState('workflow-id', state);
}
if (state.step < 2) {
  state.results.push(await processData(state.results[0]));
  state.step = 2;
  saveState('workflow-id', state);
}
// Restart from any step; already-done steps are skipped
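The snippet above calls two state helpers without defining them. A minimal sketch of how they could be written with atomic saves is shown here; the names `loadState` and `saveState`, the storage directory, and the one-JSON-file-per-workflow layout are all assumptions for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: one JSON file per workflow id, kept in the OS temp directory.
const STATE_DIR = path.join(os.tmpdir(), 'workflow-state');

function statePath(id) {
  return path.join(STATE_DIR, `${id}.json`);
}

// Returns the saved state, or null if no checkpoint exists yet.
function loadState(id) {
  try {
    return JSON.parse(fs.readFileSync(statePath(id), 'utf8'));
  } catch (e) {
    if (e.code === 'ENOENT') return null;
    throw e; // corrupt or unreadable state should not be silently ignored
  }
}

// Write to a temp file, then rename it over the real one. A rename within
// one filesystem is atomic on POSIX systems, so a crash mid-write leaves
// the previous checkpoint intact instead of a half-written file.
function saveState(id, state) {
  fs.mkdirSync(STATE_DIR, { recursive: true });
  const tmp = `${statePath(id)}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  fs.renameSync(tmp, statePath(id));
}
```

Returning `null` for a missing file is what lets the `|| { step: 0, results: [] }` fallback start a fresh run.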
Pattern 2: Circuit Breaker
Stop hammering a failing service. Open the circuit after N failures, half-open after a cooldown.
class CircuitBreaker {
  constructor(threshold = 3, cooldownMs = 30000) {
    this.failures = 0;
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long the circuit stays open
    this.state = 'closed';
    this.nextRetry = 0;
  }

  async call(fn) {
    if (this.state === 'open') {
      if (Date.now() < this.nextRetry) throw new Error('Circuit open');
      this.state = 'half-open'; // cooldown elapsed: allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (e) {
      this.failures++;
      if (this.failures >= this.threshold) {
        this.state = 'open';
        this.nextRetry = Date.now() + this.cooldownMs;
      }
      throw e;
    }
  }
}
Pattern 3: Exponential Backoff with Jitter
async function withRetry(fn, maxAttempts = 4, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt === maxAttempts - 1) throw e; // out of attempts
      // Double the delay each attempt and add up to 500 ms of random jitter
      const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 500;
      await new Promise(r => setTimeout(r, delay));
    }
  }
}
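To make the wait schedule of withRetry concrete, the delay calculation can be pulled out into a small function. This is only an illustrative sketch: `backoffDelay` and its injectable `random` parameter are hypothetical names added so the schedule can be checked deterministically.

```javascript
// Delay before the retry that follows failed attempt number `attempt` (0-based):
// base * 2^attempt, plus up to `jitterMs` of random jitter.
function backoffDelay(attempt, baseDelayMs = 1000, jitterMs = 500, random = Math.random) {
  return baseDelayMs * Math.pow(2, attempt) + random() * jitterMs;
}

// With maxAttempts = 4 there are three waits (none after the final attempt).
const schedule = [0, 1, 2].map(a => backoffDelay(a, 1000, 500, () => 0));
console.log(schedule); // [ 1000, 2000, 4000 ] with jitter forced to 0
```

With the defaults, a step that fails all four attempts therefore waits between 7 and 8.5 seconds in total. The jitter is there so that many workers retrying after the same outage do not all hit the service at the same instant.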
Pattern 4: Dead Letter Queue
When a step fails after all retries, don't silently drop it. Route it somewhere reviewable.
const fs = require('fs');

async function processWithDLQ(items, processFn, dlqPath) {
  const failed = [];
  for (const item of items) {
    try {
      await withRetry(() => processFn(item));
    } catch (e) {
      failed.push({ item, error: e.message, failedAt: new Date() });
    }
  }
  if (failed.length) {
    // Append to any entries already in the dead letter file
    const existing = fs.existsSync(dlqPath)
      ? JSON.parse(fs.readFileSync(dlqPath, 'utf8'))
      : [];
    fs.writeFileSync(dlqPath, JSON.stringify([...existing, ...failed], null, 2));
  }
}
Pattern 5: Idempotent Operations
Design every step so running it twice produces the same result as running it once.
// BAD: running twice creates two records
await db.insert({ id: uuid(), data });

// GOOD: upsert on natural key
await db.upsert({ id: deterministicId(data), data }, { onConflict: 'update' });
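The upsert example relies on a `deterministicId` helper that is not defined here. One possible sketch, assuming the record carries a natural key made of identifiable fields (the field names `source` and `externalId` are hypothetical), hashes that key so the same input always yields the same id:

```javascript
const crypto = require('crypto');

// Hypothetical implementation: derive the id from the fields that identify
// the record, not from the whole payload, so a re-run with refreshed data
// still targets the same row.
function deterministicId(data, keyFields = ['source', 'externalId']) {
  const parts = keyFields.map(field => {
    if (data[field] === undefined) {
      throw new Error(`Missing key field: ${field}`);
    }
    return String(data[field]);
  });
  // Join with a NUL separator so ('ab', 'c') and ('a', 'bc') hash differently.
  return crypto
    .createHash('sha256')
    .update(parts.join('\u0000'))
    .digest('hex')
    .slice(0, 32);
}
```

Hashing only the key fields is a deliberate choice: if the id covered every field, any change in the payload would produce a new id and the second run would insert a duplicate again.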
Pattern 6: Instance Lock
Prevent duplicate runs (e.g. cron overlap, manual re-trigger while running).
const { withLock, LockError } = require('./scripts/lock');

const LOCK_PATH = '/tmp/my-workflow.lock';

async function main() {
  await withLock(LOCK_PATH, async () => {
    // Safe: only one instance reaches here at a time
    await runWorkflow();
  });
}

main().catch(e => {
  if (e.name === 'LockError') {
    // Not an error: another instance is already running
    console.log(`Skipping: ${e.message}`);
    process.exit(0);
  }
  console.error('Fatal:', e.message);
  process.exit(1);
});
The lock uses PID detection: stale locks from crashed processes are automatically reclaimed.
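The internals of lock.js are not shown in this document. As a rough sketch of how PID-based stale-lock detection can work in Node.js (the function names `isProcessAlive` and `tryAcquire` are hypothetical, and the real script may differ), the lock file stores the holder's PID and a later run checks whether that process still exists:

```javascript
const fs = require('fs');

// True if a process with this PID currently exists. Signal 0 performs the
// existence and permission check without actually delivering a signal.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (e) {
    return e.code === 'EPERM'; // exists, but owned by another user
  }
}

// Returns true if the lock was taken, false if a live process holds it.
function tryAcquire(lockPath) {
  try {
    // The 'wx' flag fails with EEXIST if the file already exists,
    // so creating the lock file is an exclusive operation.
    fs.writeFileSync(lockPath, String(process.pid), { flag: 'wx' });
    return true;
  } catch (e) {
    if (e.code !== 'EEXIST') throw e;
  }
  const holder = parseInt(fs.readFileSync(lockPath, 'utf8'), 10);
  if (Number.isInteger(holder) && holder > 0 && isProcessAlive(holder)) {
    return false;
  }
  // Stale lock: the recorded PID no longer exists. Remove it and try again.
  fs.unlinkSync(lockPath);
  return tryAcquire(lockPath);
}
```

This sketch has known limits: two processes reclaiming the same stale lock at the same moment can race, and a PID recycled by an unrelated process will make a stale lock look live. The holder is also expected to delete the lock file on exit.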
Workflow Design Checklist
Before shipping any multi-step automation:
- Each step saves state before moving to the next
- External API calls wrapped in retry + backoff
- Circuit breaker on services called more than once per run
- Failed items go to a dead letter file/queue, not /dev/null
- The workflow can rest