📦 OCR Locally

v1.0.0

[macOS only] Use this skill when the user requests OCR (Optical Character Recognition) or image/PDF text extraction. Uses macOS native Vision/PDFKit frameworks...


by @ltryee

File Processing · Storage & Deployment · System Tools · Image Processing


Runtime Dependencies

No special dependencies

Install Command

Official: npx clawhub@latest install ocr-locally

Mirror: npx clawhub@latest install ocr-locally --registry https://cn.longxiaskill.com


Skill Documentation

Local OCR (macOS Only)

Overview

⚠️ Platform Requirement: This skill is macOS only. It requires macOS 10.15+ (Catalina or later) and uses these native macOS frameworks:

- Vision framework - for OCR text recognition
- PDFKit framework - for PDF processing
- Core Graphics - for image rendering

This skill provides OCR (Optical Character Recognition) capabilities using the macOS native Vision framework. It extracts text from images and PDFs without requiring any third-party libraries or an internet connection.

Platform Requirements

⚠️ macOS Only - This skill cannot run on Linux, Windows, or other operating systems.

Required:

- macOS 10.15+ (Catalina or later)
- Vision framework (pre-installed on macOS)
- PDFKit framework (pre-installed on macOS)

Why macOS Only?

- Uses the Vision framework for OCR (macOS/iOS only)
- Uses the PDFKit framework for PDF processing (macOS/iOS only)
- Uses AppKit/Core Graphics for image handling (macOS only)

When to Use This Skill

Trigger this 技能 when the user:

- Requests OCR or image text extraction
- Mentions extracting text from images, screenshots, PDF files, or scanned documents
- Uses keywords like: "识别图片", "OCR", "提取文字", "提取PDF文字", "识别PDF", "extract text from image", "PDF OCR"
- Provides an image file or PDF file and asks to read or extract its content

Core Capabilities

Text Extraction from Images

Use scripts/ocr_vision_pro.swift for comprehensive OCR with the following features:

- Multi-language support (Chinese, English, Japanese, Korean, and more)
- Two output modes (mutually exclusive):
  - Text Mode (-t): output only the extracted text (default)
  - JSON Mode (-j): output complete raw information, including text, position, and confidence, as JSON
- Confidence scores for each detected text block
- Bounding box information (text position in the image)
- Output to console or file
- Precise or fast recognition modes

Basic usage:

swift scripts/ocr_vision_pro.swift <image_path>

With options:

swift scripts/ocr_vision_pro.swift <image_path> -l zh-Hans,en -o output.txt -f
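The script's source is not shown here, so the following is only a minimal sketch of how text recognition with the Vision framework typically looks, not the actual contents of ocr_vision_pro.swift. The function name `recognizeText` and its parameters are hypothetical; the Vision calls (`VNRecognizeTextRequest`, `VNImageRequestHandler`, `topCandidates`) are the framework's public API. The `fast` parameter is assumed to correspond to the -f option and `languages` to the -l option.

```swift
import Foundation
import AppKit
import Vision

// Hypothetical helper: runs Vision text recognition on one image file and
// returns each recognized line together with its confidence (0.0 to 1.0).
func recognizeText(atPath path: String,
                   languages: [String],
                   fast: Bool) throws -> [(text: String, confidence: Float)] {
    // Load the image through AppKit and obtain a CGImage for Vision.
    guard let image = NSImage(contentsOfFile: path),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
        throw NSError(domain: "OCRSketch", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "Cannot load image at \(path)"])
    }

    // Configure the request: accurate mode by default, fast mode on demand.
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = fast ? .fast : .accurate
    request.recognitionLanguages = languages
    request.usesLanguageCorrection = true

    // Perform the request synchronously on the image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Keep the best candidate string of every text observation.
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { observation -> (text: String, confidence: Float)? in
        guard let best = observation.topCandidates(1).first else { return nil }
        return (text: best.string, confidence: best.confidence)
    }
}
```

Each observation also exposes a `boundingBox` in normalized coordinates, which is presumably where the position data in JSON mode comes from.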

Text Extraction from PDF Files

Use scripts/pdf_ocr.swift to extract text from PDF files with the following features:

- Extract text from specific pages or all pages
- Page range specification (e.g., 1-5 or 1,3,5)
- Two output modes (mutually exclusive):
  - Text Mode (-t): output only the extracted text (default)
  - JSON Mode (-j): output complete raw information as JSON
- Same multi-language support as image OCR
- Precise or fast recognition modes

Basic usage (all pages):

swift scripts/pdf_ocr.swift <pdf_path>

With page specification:

# Single page
swift scripts/pdf_ocr.swift document.pdf -p 1

# Multiple pages
swift scripts/pdf_ocr.swift document.pdf -p 1,3,5

# Page range
swift scripts/pdf_ocr.swift document.pdf -p 1-5

# JSON mode
swift scripts/pdf_ocr.swift document.pdf -p 1 -j
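To illustrate the three forms the -p option accepts (single page, comma-separated list, and range), here is a minimal sketch of a page-specification parser. It is an assumption about how such parsing could work, not the code in pdf_ocr.swift; the function name `parsePageSpec` is hypothetical, and rejecting out-of-range pages (rather than clamping them) is a design choice made for this sketch.

```swift
import Foundation

/// Hypothetical helper: parses a page specification such as "1", "1,3,5",
/// or "1-5" into sorted, de-duplicated 1-based page numbers.
/// Returns nil when the specification is malformed or out of range.
func parsePageSpec(_ spec: String, pageCount: Int) -> [Int]? {
    guard pageCount >= 1 else { return nil }
    var pages = Set<Int>()

    for part in spec.split(separator: ",", omittingEmptySubsequences: false) {
        let token = part.trimmingCharacters(in: .whitespaces)
        let bounds = token.split(separator: "-", omittingEmptySubsequences: false)

        switch bounds.count {
        case 1:
            // Single page, e.g. "3"
            guard let page = Int(bounds[0]), (1...pageCount).contains(page) else { return nil }
            pages.insert(page)
        case 2:
            // Inclusive range, e.g. "1-5"
            guard let lower = Int(bounds[0]), let upper = Int(bounds[1]),
                  lower >= 1, lower <= upper, upper <= pageCount else { return nil }
            pages.formUnion(lower...upper)
        default:
            // More than one hyphen, e.g. "1-2-3"
            return nil
        }
    }
    return pages.sorted()
}

// Usage with a 10-page document (an assumed page count).
print(parsePageSpec("1,3,5", pageCount: 10) ?? [])
print(parsePageSpec("1-5", pageCount: 10) ?? [])
```

Returning nil on any invalid token keeps the caller's error handling in one place; a real tool might instead report which token failed.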

Output Modes (Mutually Exclusive)

The script supports two output modes that cannot be used simultaneously:

Text Mode (Default, -t)

Outputs only the extracted text:

- Console output: plain text
- File output (-o or -t): saves text to a file, optionally with a separate confidence file

JSON Mode (-j)

Outputs complete raw information as JSON:

- Contains: image path, total blocks, average confidence, and per-block details
- Per-block information: index, text, confidence, bounding box (x, y, width, height)
- Outputs to stdout only (no file output options in JSON mode)

JSON 输出 structure:

{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": {
        "x": 0.10,
        "y": 0.20,
        "width": 0.30,
        "height": 0.05
      }
    }
  ]
}
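A consumer of JSON mode could decode this structure with Codable. The sketch below is illustrative only: the type names (`OCRResult`, `TextBlock`, `BoundingBox`) and the helper `confidentText` are hypothetical, the sample payload is made-up data, and the per-block key is assumed to be spelled `index`.

```swift
import Foundation

// Hypothetical Codable types mirroring the JSON output structure.
struct BoundingBox: Codable {
    let x: Double
    let y: Double
    let width: Double
    let height: Double
}

struct TextBlock: Codable {
    let index: Int
    let text: String
    let confidence: Double
    let boundingBox: BoundingBox
}

struct OCRResult: Codable {
    let imagePath: String
    let totalBlocks: Int
    let averageConfidence: Double
    let blocks: [TextBlock]
}

/// Decodes JSON-mode output and returns the text of every block whose
/// confidence is at or above the given threshold.
func confidentText(from json: Data, minimumConfidence: Double) throws -> [String] {
    let result = try JSONDecoder().decode(OCRResult.self, from: json)
    return result.blocks
        .filter { $0.confidence >= minimumConfidence }
        .map { $0.text }
}

// Made-up sample payload with two blocks, for illustration only.
let sample = Data("""
{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 2,
  "averageConfidence": 0.75,
  "blocks": [
    { "index": 1, "text": "recognized text", "confidence": 0.95,
      "boundingBox": { "x": 0.10, "y": 0.20, "width": 0.30, "height": 0.05 } },
    { "index": 2, "text": "blurry text", "confidence": 0.55,
      "boundingBox": { "x": 0.10, "y": 0.30, "width": 0.25, "height": 0.05 } }
  ]
}
""".utf8)

if let lines = try? confidentText(from: sample, minimumConfidence: 0.9) {
    print(lines)
}
```

Filtering on confidence is one plausible use of the per-block data; the bounding box values in the sample look normalized to the 0 to 1 range, so a consumer would likely scale them by the image size before drawing overlays.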

Supported File Formats

Image formats (ocr_vision_pro.swift):

- PNG (.png)
- JPEG (.jpg, .jpeg)
- TIFF (.tiff, .tif)
- BMP (.bmp)

PDF format (pdf_ocr.swift):

- PDF (.pdf) - supports a single page, multiple pages, or page ranges
- Specify pages with the -p option: 1, 1,3,5, or 1-5

Command-Line Options

Option            Description
-h, --help        Show help information
-t, --text        Text mode (default; output only extracted text)
-j, --json        JSON mode (output complete raw information as JSON)
-l, --language    Specify recognition languages (comma-separated)
-o, --output      Output text to file and auto-generate a confidence file (_confidence.txt)
-t, --text        Output only complete text to the specified file (text mode)
-c, --confidence  Output only confidence details to the specified file (text mode)
-f, --fast        Use fast mode (default: precise mode)

Note: -t (text mode) and -j (JSON mode) are mutually exclusive. JSON mode outputs to stdout only.

Supported languages:

- zh-Hans - Simplified Chinese
- zh-Hant - Traditional Chinese
- en - English
- ja - Japanese
- ko - Korean
- fr - French
- de - German
- es - Spanish
- it - Italian
- pt - Portuguese
- ru - Russian

Workflow

Step 1: Identify the Image Path

When the user requests OCR:

- Ask for the image path if not provided
- Accept common path formats: absolute paths, ~/path, or relative paths
- Verify that the file exists before pr

Data source: ClawHub · Chinese localization: 龙虾技能库