PDF Analysis
v0.4.0
Analyze the structure, layout, and content of PDF documents using MinerU. Returns structured output that preserves headings, tables, images, formulas, and document hierarchy.
Features: comprehensive PDF analysis; detects document structure (headings, paragraphs, tables, images, formulas); multiple output formats (Markdown, HTML, JSON, LaTeX, DOCX); OCR and VLM modes for scanned or complex PDFs; page range selection.
Use when you need to: analyze a PDF document, understand PDF structure, inspect PDF content and layout, or get a detailed breakdown of a PDF.
Use when asked: 'how do I analyze this PDF', 'what is inside this PDF', 'I want to understand this PDF structure', 'can my agent analyze PDF files', 'break down this PDF for me', 'inspect this PDF document'.
Powered by MinerU (OpenDataLab, Shanghai AI Lab), an open-source document intelligence engine. The most comprehensive PDF analysis tool in this collection. Ideal for researchers, data analysts, document processing pipelines, and anyone who needs deep insight into PDF document structure and content.
PDF Analysis
Analyze and extract structured content from PDF files using MinerU. Returns Markdown with layout, headings, and structure preserved.
Install
    npm install -g mineru-open-api
    # or via Go (macOS/Linux):
    go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Quick Start
    # Quick analysis, no token required (max 10 MB / 20 pages)
    mineru-open-api flash-extract report.pdf

    # Save to directory
    mineru-open-api flash-extract report.pdf -o ./out/

    # From URL
    mineru-open-api flash-extract https://example.com/report.pdf

    # With language hint
    mineru-open-api flash-extract report.pdf --language en

    # Full analysis with tables and formulas (requires token)
    mineru-open-api extract report.pdf -o ./out/
Authentication
No token is needed for flash-extract. A token is required for extract:
    mineru-open-api auth                  # Interactive token setup
    export MINERU_TOKEN="your-token"      # Or via environment variable
Create a token at: https://mineru.net/apiManage/token
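Since only extract needs a token, a script can check for it before calling the full pipeline. The following is a minimal sketch, assuming the environment variable is named MINERU_TOKEN and the binary is mineru-open-api. The helper names have_token and run_full_extract are hypothetical, and a token saved by the interactive auth command would not be detected by this check.

```shell
#!/bin/sh
# Hypothetical pre-flight check: inspects only the MINERU_TOKEN environment
# variable (assumed name). A token stored by "mineru-open-api auth" is not seen.
have_token() {
    [ -n "${MINERU_TOKEN:-}" ]
}

# Hypothetical wrapper: refuse to call "extract" when no token is exported.
run_full_extract() {
    if ! have_token; then
        echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it first" >&2
        return 1
    fi
    mineru-open-api extract "$@"
}
```

With the functions sourced, run_full_extract report.pdf -o ./out/ fails early with a message on stderr instead of sending a request without credentials.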
Capabilities
- Supported input: .pdf (local file or URL)
- flash-extract: quick, no token, max 10 MB / 20 pages, Markdown output only
- extract: token required, full features (tables, formulas, OCR, multi-format output)
- Language hint with --language (default: ch; use en for English)
- Page range with --pages (e.g. 1-10)
Notes
- Use flash-extract for quick reads; use extract for tables, formulas, or files over 10 MB
- Output goes to stdout by default; use -o to save to a file or directory
- All progress/status messages go to stderr; document content goes to stdout
- MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
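The 10 MB rule in the notes above can be automated for local files. This is a rough sketch, not part of the tool: pick_command is a hypothetical helper, and it assumes "10 MB" means 10 * 1024 * 1024 bytes, which the source does not specify. It does not check the 20-page limit, does not handle URLs, and does not consider whether tables or formulas are needed.

```shell
#!/bin/sh
# Assumption: the flash-extract size limit is 10 * 1024 * 1024 bytes.
FLASH_MAX_BYTES=$((10 * 1024 * 1024))

# Hypothetical helper: print the subcommand that fits a local file's size.
pick_command() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "pick_command: not a file: $file" >&2
        return 1
    fi
    # wc -c counts bytes; tr removes the padding some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -le "$FLASH_MAX_BYTES" ]; then
        echo "flash-extract"
    else
        echo "extract"
    fi
}
```

A possible use, relying on the stdout/stderr split described in the notes: mineru-open-api "$(pick_command report.pdf)" report.pdf > report.md 2> progress.log keeps the document content in report.md and the progress messages in progress.log.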