PDF OCR Using Gemini LLM
v0.1.7
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
Purpose
Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).
Data and privacy
Full page images/files are sent to Google's API. PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.
Setup (venv installation)
Before first use, create and activate the virtual environment:
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).
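A missing key is easy to overlook until the first API call fails, so a wrapper script may want to check for it up front. The following is a minimal sketch, not part of geminipdfocr itself; the helper name require_api_key is hypothetical and only the GOOGLE_API_KEY variable name comes from this document.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY, or raise if it is missing or blank.

    Hypothetical helper: geminipdfocr's own handling of a missing key
    is not documented here and may differ.
    """
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; export it before running geminipdfocr."
        )
    return key
```

Calling require_api_key() at the start of a script fails fast with a readable message instead of surfacing an authentication error partway through a multi-page document.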
How to use
When requested to extract text or perform OCR on a PDF:
Run:
cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr pdf_path [--json] [--output FILE]
Use --json for structured data. Use --max-pages N for testing or very long documents. Use --quiet to suppress progress logs.
Requirements
A valid PDF file path.
GOOGLE_API_KEY set in the process environment (e.g. export GOOGLE_API_KEY=your-key).
CLI options
Option          Description
pdf_path        One or more PDF file paths (positional)
--max-pages N   Limit pages per PDF
--json          Output structured JSON instead of plain text
--output FILE   Write result to file (default: stdout)
--quiet         Suppress info/debug logs
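When the tool is driven from another Python script rather than typed by hand, the documented options can be assembled into an argument list and passed to subprocess. This is a rough sketch under assumptions: build_command and run_ocr are hypothetical helper names, and it assumes the script runs inside the activated venv from the geminipdfocr directory so that python -m geminipdfocr resolves. Only the flags listed in this document are used.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_command(
    pdf_paths: Sequence[str],
    max_pages: Optional[int] = None,
    as_json: bool = False,
    output: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble the geminipdfocr command line from the documented options."""
    if not pdf_paths:
        raise ValueError("at least one PDF path is required")
    # sys.executable is the venv's interpreter when the venv is active.
    cmd = [sys.executable, "-m", "geminipdfocr", *pdf_paths]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if as_json:
        cmd.append("--json")
    if output is not None:
        cmd += ["--output", output]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(pdf_paths: Sequence[str], **options) -> str:
    """Run the CLI and return whatever it wrote to stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_command(pdf_paths, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_ocr(["scan.pdf"], max_pages=2, quiet=True) would process the first two pages of a hypothetical scan.pdf and return the extracted text. Passing arguments as a list avoids shell quoting problems with file names that contain spaces.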