Local OCR (macOS Only)
Overview
⚠️ Platform Requirement: This skill is macOS only. It requires macOS 10.15+ (Catalina or later) and uses macOS native frameworks:
Vision framework - For OCR text recognition
PDFKit framework - For PDF processing
Core Graphics - For image rendering
This skill provides OCR (Optical Character Recognition) capabilities using the macOS native Vision framework. It extracts text from images and PDFs without requiring any third-party libraries or an internet connection.
Platform Requirements
⚠️ macOS Only - This skill cannot run on Linux, Windows, or other operating systems.
Required:
macOS 10.15+ (Catalina or later)
Vision framework (pre-installed on macOS)
PDFKit framework (pre-installed on macOS)
Why macOS Only?
Uses Vision framework for OCR (macOS/iOS only)
Uses PDFKit framework for PDF processing (macOS/iOS only)
Uses AppKit/Core Graphics for image handling (macOS only)
When to Use This Skill
Trigger this skill when the user:
Requests OCR or image text extraction
Mentions extracting text from images, screenshots, PDF files, or scanned documents
Uses keywords like: "识别图片", "OCR", "提取文字", "提取PDF文字", "识别PDF", "extract text from image", "PDF OCR"
Provides an image file or PDF file and asks to read or extract its content
Core Capabilities
- Text Extraction from Images
Use scripts/ocr_vision_pro.swift for comprehensive OCR with the following features:
Multi-language support (Chinese, English, Japanese, Korean, and more)
Two output modes (mutually exclusive):
  Text Mode (-t): output only the extracted text (default)
  JSON Mode (-j): output complete raw information, including text, position, and confidence, as JSON
Confidence scores for each detected text block
Bounding box information (text position in image)
Output to console or file
Precise or fast recognition modes
Basic usage:
swift scripts/ocr_vision_pro.swift
With options:
swift scripts/ocr_vision_pro.swift -l zh-Hans,en -o output.txt -f
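The internals of ocr_vision_pro.swift are not shown here, so the following is only a minimal sketch of how text recognition with the Vision framework typically looks. The helper name `recognizeText(atPath:languages:fast:)` is hypothetical, and mapping the precise/fast modes onto Vision's `.accurate`/`.fast` recognition levels is an assumption about the script, not a confirmed detail.

```swift
import Foundation
#if canImport(Vision) && canImport(AppKit)
import Vision
import AppKit

// Hypothetical helper, not taken from ocr_vision_pro.swift.
// Returns one (text, confidence) pair per detected text block.
func recognizeText(atPath path: String,
                   languages: [String],
                   fast: Bool) throws -> [(text: String, confidence: Float)] {
    // Load the image and obtain a CGImage that Vision can process.
    guard let image = NSImage(contentsOfFile: path),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
        throw NSError(domain: "OCRSketch", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "Cannot load image at \(path)"])
    }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = fast ? .fast : .accurate  // assumed mapping of the two modes
    request.recognitionLanguages = languages             // e.g. ["zh-Hans", "en"]
    request.usesLanguageCorrection = true

    // perform(_:) runs synchronously; results are available once it returns.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations.compactMap { observation -> (text: String, confidence: Float)? in
        // Keep only the top-ranked candidate string for each block.
        guard let best = observation.topCandidates(1).first else { return nil }
        return (text: best.string, confidence: best.confidence)
    }
}
#endif
```

Each `VNRecognizedTextObservation` also exposes a normalized `boundingBox`, which is presumably where position data of the kind described above would come from.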
- Text Extraction from PDF Files
Use scripts/pdf_ocr.swift to extract text from PDF files with the following features:
Extract text from specific pages or all pages
Support for page range specification (e.g., 1-5, 1,3,5)
Two output modes (mutually exclusive):
  Text Mode (-t): output only the extracted text (default)
  JSON Mode (-j): output complete raw information as JSON
Same multi-language support as image OCR
Precise or fast recognition modes
Basic usage (all pages):
swift scripts/pdf_ocr.swift document.pdf
With page specification:
# Single page
swift scripts/pdf_ocr.swift document.pdf -p 1
# Multiple pages
swift scripts/pdf_ocr.swift document.pdf -p 1,3,5
# Page range
swift scripts/pdf_ocr.swift document.pdf -p 1-5
# JSON mode
swift scripts/pdf_ocr.swift document.pdf -p 1 -j
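To make the -p page specification concrete, here is a small sketch of how a spec such as 1, 1,3,5, or 1-5 could be expanded into page numbers. The function `parsePageSpec` is hypothetical and is not taken from pdf_ocr.swift; it also accepts mixed forms like 1-3,7, which the actual script may or may not support.

```swift
import Foundation

// Hypothetical helper: expands a page spec into sorted, unique 1-based page numbers.
// Returns nil when the spec is malformed or refers to a page outside 1...pageCount.
func parsePageSpec(_ spec: String, pageCount: Int) -> [Int]? {
    guard pageCount >= 1 else { return nil }
    var pages = Set<Int>()
    for part in spec.split(separator: ",", omittingEmptySubsequences: false) {
        let token = part.trimmingCharacters(in: .whitespaces)
        let bounds = token.split(separator: "-", omittingEmptySubsequences: false)
        switch bounds.count {
        case 1:
            // Single page, e.g. "3"
            guard let page = Int(bounds[0]), page >= 1, page <= pageCount else { return nil }
            pages.insert(page)
        case 2:
            // Inclusive range, e.g. "1-5"
            guard let start = Int(bounds[0]), let end = Int(bounds[1]),
                  start >= 1, start <= end, end <= pageCount else { return nil }
            pages.formUnion(start...end)
        default:
            return nil
        }
    }
    return pages.sorted()
}
```

Returning nil for any invalid part, rather than silently skipping it, keeps a typo in the spec from producing a partial result.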
- Output Modes (Mutually Exclusive)
The script supports two output modes that cannot be used simultaneously:
Text Mode (Default, -t)
Outputs only the extracted text:
Console output: Pure text
File output (-o or -t): Saves text to file, optionally with a separate confidence file
JSON Mode (-j)
Outputs complete raw information as JSON:
Contains: image path, total blocks, average confidence, and per-block details
Per-block information: index, text, confidence, bounding box (x, y, width, height)
Outputs to stdout only (no file output options in JSON mode)
JSON 输出 structure:
{
  "imagePath": "/path/to/image.png",
  "totalBlocks": 25,
  "averageConfidence": 0.85,
  "blocks": [
    {
      "index": 1,
      "text": "recognized text",
      "confidence": 0.95,
      "boundingBox": { "x": 0.10, "y": 0.20, "width": 0.30, "height": 0.05 }
    }
  ]
}
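A consumer of the JSON mode output could decode it with `Codable` types that mirror the structure above. This is a sketch under assumptions: the type names and the helpers `decodeOCRResult` and `lowConfidenceBlocks` are hypothetical, and the per-block position key is assumed to be `index`.

```swift
import Foundation

// Hypothetical types mirroring the documented JSON fields.
struct BoundingBox: Codable {
    let x, y, width, height: Double
}

struct Block: Codable {
    let index: Int            // assumed key name for the block's position
    let text: String
    let confidence: Double
    let boundingBox: BoundingBox
}

struct OCRResult: Codable {
    let imagePath: String
    let totalBlocks: Int
    let averageConfidence: Double
    let blocks: [Block]
}

// Decode the JSON text that the script prints to stdout in -j mode.
func decodeOCRResult(_ json: String) throws -> OCRResult {
    return try JSONDecoder().decode(OCRResult.self, from: Data(json.utf8))
}

// Pick out blocks whose confidence falls below a caller-chosen threshold.
func lowConfidenceBlocks(in result: OCRResult, below threshold: Double) -> [Block] {
    return result.blocks.filter { $0.confidence < threshold }
}
```

Filtering by confidence is one possible use of the per-block data, for example to flag text that may need manual review.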
- Supported File Formats
Image formats (ocr_vision_pro.swift):
PNG (.png)
JPEG (.jpg, .jpeg)
TIFF (.tiff, .tif)
BMP (.bmp)
PDF format (pdf_ocr.swift):
PDF (.pdf) - supports a single page, multiple pages, or page ranges
Specify pages with the -p option: 1, 1,3,5, or 1-5
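Since the two scripts handle different formats, a caller needs to pick one based on the file extension. The following sketch shows one way to do that using the extensions listed above; the enum `OCRScript` and the function `script(forFileAt:)` are hypothetical names introduced for illustration.

```swift
import Foundation

// Hypothetical mapping from input type to the script that handles it.
enum OCRScript: String {
    case image = "scripts/ocr_vision_pro.swift"
    case pdf = "scripts/pdf_ocr.swift"
}

// Choose a script from the file extension (case-insensitive).
// Returns nil for extensions that are not in the supported list.
func script(forFileAt path: String) -> OCRScript? {
    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    switch ext {
    case "png", "jpg", "jpeg", "tiff", "tif", "bmp":
        return .image
    case "pdf":
        return .pdf
    default:
        return nil
    }
}
```

Returning nil for unknown extensions lets the caller report an unsupported format instead of passing the file to a script that cannot read it.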
- Command-Line Options
Note: -t (text mode) and -j (JSON mode) are mutually exclusive. JSON mode outputs to stdout only.
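The mutual exclusivity rule can be sketched as a small argument check. The names `OutputMode`, `ModeError`, and `resolveOutputMode` are hypothetical, and treating -j combined with -o as an error is an assumption; the actual scripts may handle that combination differently, for example by ignoring -o.

```swift
import Foundation

enum OutputMode: Equatable {
    case text
    case json
}

enum ModeError: Error, Equatable {
    case conflictingModes       // both -t and -j were given
    case fileOutputInJSONMode   // -j was combined with -o (assumed to be rejected)
}

// Decide the output mode from raw command-line arguments.
// Text mode is the default when neither flag is present.
func resolveOutputMode(arguments: [String]) throws -> OutputMode {
    let wantsText = arguments.contains("-t")
    let wantsJSON = arguments.contains("-j")
    if wantsText && wantsJSON {
        throw ModeError.conflictingModes
    }
    if wantsJSON && arguments.contains("-o") {
        throw ModeError.fileOutputInJSONMode
    }
    return wantsJSON ? .json : .text
}
```

Checking the flags before any OCR work starts means a conflicting invocation fails quickly, without processing the input first.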
Supported languages:
zh-Hans - Simplified Chinese
zh-Hant - Traditional Chinese
en - English
ja - Japanese
ko - Korean
fr - French
de - German
es - Spanish
it - Italian
pt - Portuguese
ru - Russian
Workflow
Step 1: Identify the Image Path
When the user requests OCR:
Ask for the image path if not provided
Accept common path formats: absolute paths, ~/path, or relative paths
Verify that the file exists before pr