PDF OCR Using Gemini LLM

v0.1.7

Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.

by @ashtonizmev (Issam El Alaoui)·MIT-0

Document Tools · Data & APIs · Databases · File Processing · AI Model Access

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install geminipdfocr

Mirror: npx clawhub@latest install geminipdfocr --registry https://cn.longxiaskill.com

Skill documentation

Purpose

Use geminipdfocr to extract text from PDF documents via OCR with Google Gemini.

Data and privacy

Full page images/files are sent to Google's API: each PDF is split into single-page files, and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints and no other data collection. Do not use it with highly sensitive documents unless you accept that their content is sent to Google.

Setup (venv installation)

Before first use, create and activate the virtual environment, then install the requirements:

cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt

Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).
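A wrapper script may want to fail fast when the key is missing instead of letting the OCR run error out later. The following is a minimal Python sketch that assumes only what is stated above, namely that the tool reads GOOGLE_API_KEY from the process environment; `require_api_key` is a hypothetical helper and not part of geminipdfocr.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the environment, or raise if it is unset/blank.

    Hypothetical helper: geminipdfocr is only documented as reading
    GOOGLE_API_KEY from the process environment.
    """
    if env is None:
        env = os.environ  # default to the real process environment
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; run e.g. `export GOOGLE_API_KEY=your-key` first"
        )
    return key
```

Passing an explicit mapping (for example a dict) keeps the check easy to exercise without touching the real environment.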

How to use

When asked to extract text from a PDF or perform OCR on it:

Run:

cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr pdf_path [--json] [--output FILE]

Use --json for structured data. Use --max-pages N for testing or for very long documents. Use --quiet to suppress progress logs.

Requirements

A valid PDF file path.
GOOGLE_API_KEY set in the process environment (e.g. export GOOGLE_API_KEY=your-key).

CLI options

Option           Description
pdf_path         One or more PDF file paths (positional)
--max-pages N    Limit pages per PDF
--json           Output structured JSON instead of plain text
--output FILE    Write results to file (default: stdout)
--quiet          Suppress INFO/DEBUG logs
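For scripted use, the command can be driven from Python. This sketch only assembles the documented options into an argument list and runs it with `subprocess`; `build_ocr_command` and `run_ocr` are hypothetical names. It assumes geminipdfocr is importable by the chosen interpreter (e.g. the activated venv's `python`) and that GOOGLE_API_KEY is already exported, since a child process inherits the parent's environment.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_ocr_command(
    pdf_paths: Sequence[str],
    *,
    as_json: bool = False,
    output: Optional[str] = None,
    max_pages: Optional[int] = None,
    quiet: bool = False,
    python: str = sys.executable,
) -> List[str]:
    """Assemble argv for `python -m geminipdfocr` from the documented options.

    Hypothetical helper; the flag names are the ones listed under CLI options.
    """
    if not pdf_paths:
        # The CLI takes one or more positional PDF paths.
        raise ValueError("at least one PDF path is required")
    cmd = [python, "-m", "geminipdfocr"]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]  # limit pages per PDF
    if as_json:
        cmd.append("--json")  # structured JSON instead of plain text
    if output is not None:
        cmd += ["--output", output]  # write results to a file instead of stdout
    if quiet:
        cmd.append("--quiet")  # suppress INFO/DEBUG logs
    # Options are placed before the positional paths.
    cmd.extend(pdf_paths)
    return cmd


def run_ocr(pdf_paths: Sequence[str], **options) -> str:
    """Run the CLI and return its stdout (presumably empty when --output is used)."""
    cmd = build_ocr_command(pdf_paths, **options)
    # The child inherits os.environ, including GOOGLE_API_KEY.
    # check=True raises CalledProcessError on a non-zero exit status.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `run_ocr(["scan.pdf"], as_json=True, max_pages=2)` would return whatever JSON the tool prints on stdout. The JSON layout is not documented on this page, so inspect it before relying on specific fields.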

Source: ClawHub · Chinese localization: 龙虾技能库 (Longxia Skill Library)