完美排版ocr
v1.0.0
Full OCR pipeline for scanned PDFs with layout preservation. Use this skill whenever the user wants to OCR a PDF, convert a scanned document to searchable text, or preserve the original layout of a scanned book/document. Triggers on: "OCR this PDF", "用PaddleOCR处理", "识别这个PDF", "扫描版PDF转文字", "把这个PDF做OCR", or when a PDF path is provided alongside any mention of OCR, text recognition, or layout preservation.
PDF OCR with Layout Preservation
Automated pipeline: Split → OCR API → Layout PDF → Merge
Each original page becomes one PDF page, with text placed at exact bounding-box positions and font sizes calibrated to fill the original block dimensions.
Quick Start

    python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py "/path/to/input.pdf"

Output: input_ocr.pdf in the same directory. Intermediate files in input_ocr_work/.

Full Options

    python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py \
      "/path/to/input.pdf" \
      --output "/path/to/output.pdf" \
      --work-dir "/path/to/workdir" \
      --chunk-size 90
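As a rough illustration of the Split step, the sketch below shows one way a page count could be divided into chunks of at most `--chunk-size` pages. The function name `chunk_ranges` and the half-open index convention are assumptions made for this example; the actual script may split differently.

```python
def chunk_ranges(total_pages: int, chunk_size: int = 90) -> list[tuple[int, int]]:
    """Return (start, end) page-index pairs, end exclusive, covering every page.

    Hypothetical helper: it only illustrates the idea of fixed-size chunking.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the document chunk_size pages at a time; the last chunk
    # is clipped so it never runs past the final page.
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


if __name__ == "__main__":
    for start, end in chunk_ranges(200, 90):
        print(f"pages {start + 1}-{end}")
```

Under these assumptions, a 200-page input with a chunk size of 90 gives three chunks: two of 90 pages and a final one of 20.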
Steps for Claude

1. Ask for the PDF path if not already provided in the conversation.
2. Check dependencies (install only what's missing):

       pip install pypdf reportlab Pillow requests -q

3. Run the pipeline and stream output to the user:

       python ~/.claude/skills/pdf-ocr-layout/scripts/pipeline.py "{input_pdf}"

4. Monitor progress: the script prints step-by-step progress including API polling. API jobs typically take 1–5 minutes per 90-page chunk.
5. Report the output path when done.

Resume / Retry
The pipeline saves state to the work directory and is fully resumable:

- jobs.json: API job IDs (prevents re-submitting already-queued chunks)
- chunk__results.jsonl: cached OCR results (skip re-downloading)
- chunk__ocr.pdf: completed chunk PDFs (skip re-rendering)

If interrupted, simply re-run the same command. It picks up where it left off.
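The resume behaviour can be pictured as a check for which artifacts already exist for each chunk. The following is a minimal sketch and not the script's actual code. The per-chunk file naming (`chunk_{i}_results.jsonl`, `chunk_{i}_ocr.pdf`), the layout of `jobs.json` as a mapping from chunk index to job ID, and the function name `next_stage` are all assumptions made for illustration.

```python
import json
from pathlib import Path


def next_stage(work_dir: Path, chunk: int) -> str:
    """Return the first stage that still has to run for one chunk.

    Hypothetical helper. Assumed file names: chunk_{i}_ocr.pdf and
    chunk_{i}_results.jsonl. Assumed jobs.json layout: {"<chunk>": "<job id>"}.
    """
    # A finished chunk PDF means nothing is left to do for this chunk.
    if (work_dir / f"chunk_{chunk}_ocr.pdf").exists():
        return "done"
    # Cached OCR results mean the download can be skipped; only render.
    if (work_dir / f"chunk_{chunk}_results.jsonl").exists():
        return "render"
    # A recorded job ID means the chunk was already submitted; just poll it.
    jobs_file = work_dir / "jobs.json"
    jobs = json.loads(jobs_file.read_text()) if jobs_file.exists() else {}
    return "poll" if str(chunk) in jobs else "submit"
```

Checking the most advanced artifact first means a re-run never repeats a stage whose output is already on disk.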
Common Issues

| Problem | Fix |
| --- | --- |
| ModuleNotFoundError | Run the pip install command above |
| API 4xx error | Check the PDF isn't password-protected |
| Job stuck in running | Normal for large chunks; wait up to 10 min |
| Missing images in output | Images left blank per design (API images are optional) |
| Font too small/large | The font size auto-calibrates; the first page may look different if it's a cover |

Output Quality

- Block positions: exact (scaled from 812×1269px OCR space to A4)
- Font sizes: auto-calibrated using fs = min(√(h×w / n×0.65), h×0.72), verified to recover original ~13–14pt body text
- Page numbers, headers, footers: included (all block types preserved)
- Images: embedded if URL accessible, blank if not
- 1 OCR page = 1 PDF page: always maintained
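To make the position scaling and the font-size formula concrete, here is a small sketch under several assumptions that the text does not confirm: h and w are taken to be the block height and width, and n the number of characters in the block; the formula is read with plain left-to-right precedence, i.e. (h×w / n) × 0.65; each axis is scaled independently; and the y axis is flipped because PDF coordinates start at the bottom-left. The names `scale_bbox` and `font_size` are hypothetical.

```python
import math

OCR_W, OCR_H = 812, 1269       # OCR coordinate space quoted above, in pixels
A4_W, A4_H = 595.28, 841.89    # A4 page size in PDF points (rounded)


def scale_bbox(x0: float, y0: float, x1: float, y1: float) -> tuple[float, float, float, float]:
    """Map a top-left-origin OCR box to a bottom-left-origin A4 box.

    Assumption: each axis is scaled independently and y is flipped.
    """
    sx, sy = A4_W / OCR_W, A4_H / OCR_H
    return (x0 * sx, A4_H - y1 * sy, x1 * sx, A4_H - y0 * sy)


def font_size(h: float, w: float, n: int) -> float:
    """fs = min(sqrt(h*w / n * 0.65), h * 0.72), read left to right.

    Assumption: h, w are block height and width, n is the character count.
    """
    if n <= 0:
        # No text to fit: fall back to the height-based cap alone.
        return h * 0.72
    area_based = math.sqrt(h * w / n * 0.65)
    return min(area_based, h * 0.72)
```

The second term, h×0.72, acts as a cap: for a short single-line block the area-based estimate would exceed the block height, so the height limit wins.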