Version
- Skill renamed from "xmind-doc-parser" to "baidu-doc-parser" to accurately reflect functionality.
- Now parses documents using Baidu Document Parser API, supporting 18+ formats (PDF, Word, Excel, PowerPoint, images, and more).
- Offers comprehensive extraction: text, tables, layout analysis, OCR for scanned docs, and document chunking for RAG.
- Enhanced documentation: details on API parameters, environment setup, file/format/language support, error codes, and usage examples.
- Adds a command-line script for easy testing and reference links to official resources.
Skill Documentation
Parse documents using Baidu Intelligent Document Analysis Platform API.
Overview
This skill provides document parsing capabilities through Baidu's Document Parser API, supporting:
- 18+ document formats (PDF, Word, Excel, PowerPoint, images, etc.)
- Text extraction
- Table recognition and extraction
- Layout analysis (titles, paragraphs, headers/footers, etc.)
- OCR for scanned documents
- Document chunking for RAG applications
- Multi-language support (Chinese, English, Japanese, Korean, French, German, etc.)
When to Use
Use this skill when users need to:
- Parse PDF, Word, Excel, or other document formats
- Extract text content from documents
- Recognize and extract tables
- Analyze document structure (titles, sections, layout)
- Process scanned documents with OCR
- Chunk documents for RAG applications
API Configuration
Environment Variables (Required)
Set these before using the skill:
export BAIDU_DOC_AI_API_KEY="your_api_key"
export BAIDU_DOC_AI_SECRET_KEY="your_secret_key"
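A minimal sketch of how a client might read these two variables and fail early when one is missing. The helper name `load_credentials` is hypothetical and is not taken from the skill's script.

```python
import os


def load_credentials(env=None):
    """Return (api_key, secret_key), or raise if either variable is unset or empty."""
    # `env` defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    names = ("BAIDU_DOC_AI_API_KEY", "BAIDU_DOC_AI_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Checking both variables before any network call keeps a configuration mistake from surfacing later as an authentication error.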
Authentication
The skill uses OAuth 2.0 to obtain an access token automatically. The token is valid for 30 days.
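Because the token lasts 30 days, a client can cache it instead of requesting a new one for every call. The sketch below shows one possible cache. `TokenCache`, the injected `fetch_token` callable, and the one-hour refresh margin are all assumptions for illustration; how the skill's script actually requests and stores the token is not described here.

```python
import time

TOKEN_LIFETIME_SECONDS = 30 * 24 * 3600  # 30 days, as stated above


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time, margin=3600):
        self._fetch_token = fetch_token  # hypothetical callable returning a token string
        self._clock = clock              # injectable so the cache can be tested
        self._margin = margin            # refresh this many seconds before expiry (assumed)
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + TOKEN_LIFETIME_SECONDS
        return self._token
```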
Supported Formats
Documents: pdf, doc, docx, xls, xlsx, ppt, pptx, wps, et, dps, csv, txt, html, mhtml, ofd
Images: jpg, jpeg, png, bmp, tiff, tif
Total: 18+ formats
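A file can be checked against the extensions listed above before it is submitted. This is a sketch only: `is_supported` is a hypothetical helper, and it looks at the file name, not the file contents.

```python
from pathlib import Path

# Extensions copied from the two lists above.
DOCUMENT_EXTS = {"pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "wps",
                 "et", "dps", "csv", "txt", "html", "mhtml", "ofd"}
IMAGE_EXTS = {"jpg", "jpeg", "png", "bmp", "tiff", "tif"}


def is_supported(file_name):
    """Return True if the file name ends in one of the listed extensions."""
    ext = Path(file_name).suffix.lower().lstrip(".")
    return ext in DOCUMENT_EXTS or ext in IMAGE_EXTS
```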
Supported Languages
Chinese, English, Japanese, Korean, French, German, Italian, Portuguese, Spanish, Russian, Dutch, Swedish, Finnish, Danish, Norwegian, Hungarian, Turkish, Polish, Czech, Greek, and more (20+ languages)
Usage
Basic Usage
python3 scripts/baidu_doc_parser.py --file_data <base64-encoded file content>
python3 scripts/baidu_doc_parser.py --file_url <file URL>
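The value for `--file_data` is the file's bytes encoded as Base64. A minimal way to produce it in Python, with `encode_file` as a hypothetical helper name:

```python
import base64


def encode_file(path):
    """Read a file in binary mode and return its contents as a Base64 ASCII string."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("ascii")
```

For large files the encoded string can be long, so passing a URL with `--file_url` may be more practical where the file is publicly reachable.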
API Parameters
File Parameters (Required, choose one)
- file_url(string): Document URL (publicly accessible)
- file_data(string): Base64-encoded file data
- file_name(string, required): File name with extension
Core Function Parameters
- recognize_formula(bool): Recognize formulas in documents (default: false)
- analysis_chart(bool): Parse statistical charts (default: false)
- angle_adjust(bool): Auto-rotate images (default: false)
- parse_image_layout(bool): Return image position info (default: false)
Language and Format Parameters
- language_type(string): Recognition language (default: "CHN_ENG")
- switch_digital_width(string): Convert number width (default: "auto")
- html_table_format(bool): Return tables in HTML format (default: true)
Advanced Parameters
- version(string): API version (default: "v2")
- need_inner_image_data(bool): Include internal image data
- merge_tables(bool): Merge related tables
- relevel_titles(bool): Restructure title hierarchy
- recognize_seal(bool): Recognize document seals/stamps
- return_span_boxes(bool): Return span bounding boxes
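The file and option parameters above can be assembled into a single dictionary before submission. The sketch below enforces the "choose one" rule for `file_url` and `file_data`. `build_request` is a hypothetical helper; how the real script serializes these values for the API is not covered here.

```python
def build_request(file_name, file_url=None, file_data=None, **options):
    """Assemble request parameters; exactly one of file_url / file_data must be given."""
    if (file_url is None) == (file_data is None):
        raise ValueError("Provide exactly one of file_url or file_data")
    params = {"file_name": file_name}
    if file_url is not None:
        params["file_url"] = file_url
    else:
        params["file_data"] = file_data
    # Optional parameters (for example recognize_formula, language_type) pass through unchanged.
    params.update(options)
    return params
```

For example, `build_request("a.pdf", file_url="https://example.com/a.pdf", recognize_formula=True)` yields a dictionary with `file_name`, `file_url`, and `recognize_formula`. Parameters that are omitted are left out so the documented defaults apply.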
Document Chunking Parameters
- return_doc_chunks(dict): Document chunking configuration
  - switch (bool): Enable chunking (default: false)
  - split_type (string): Chunking method - "chunk" (by size) or "mark" (by punctuation)
  - separators (list): Punctuation marks for splitting (default: ['。', ';', '!', '?', ';', '!', '?'])
  - chunk_size (int): Chunk size in characters (default: -1 for auto)
Return Structure
Page Object
Each page contains:
- page_id: Page identifier
- page_num: Page number
- text: All text content on the page
- layouts: Layout elements (titles, paragraphs, tables, images, etc.)
- tables: Extracted tables
- images: Extracted images
Layout Types
- title: Title (with sub_type: title_1, title_2, title_3, etc.)
- para: Paragraph
- table: Table
- image: Image
- head_tail: Header/footer
- contents: Table of contents
- seal: Seal/stamp
- formula: Mathematical formula
Table Object
- layout_id: Table identifier
- markdown: Table content in Markdown format
- position: Bounding box [x, y, width, height]
- cells: Cell information
- matrix: Cell index matrix (for merged cells)
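As an illustration of reading the page and table objects, the sketch below collects each table's Markdown by page number. It assumes the parsed result has already been loaded as a list of page dictionaries with the fields listed above; `tables_by_page` is a hypothetical helper.

```python
def tables_by_page(pages):
    """Map page_num -> list of Markdown strings for the tables on that page."""
    result = {}
    for page in pages:
        # Skip tables that carry no Markdown content.
        markdowns = [t["markdown"] for t in page.get("tables", []) if t.get("markdown")]
        if markdowns:
            result[page["page_num"]] = markdowns
    return result
```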
Chunk Object
- chunk_id: Chunk identifier
- content: Chunk content
- type: Chunk type ("text" or "table")
- meta: Metadata (titles, position, page number)
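For RAG pipelines, chunking is switched on through `return_doc_chunks` and the returned chunks are then filtered by type. The sketch below is illustrative: the `chunk_size` of 500 is an arbitrary example value, `contents_of_type` is a hypothetical helper, and the chunks are assumed to arrive as a list of dictionaries with the fields listed above.

```python
# Example chunking configuration; 500 characters is an arbitrary illustrative size.
chunk_config = {"switch": True, "split_type": "chunk", "chunk_size": 500}


def contents_of_type(chunks, chunk_type):
    """Return the content of every chunk whose type matches ("text" or "table")."""
    return [c["content"] for c in chunks if c.get("type") == chunk_type]
```

Text and table chunks are often embedded or indexed differently, which is why the example separates them by `type`.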
API Characteristics
Asynchronous Processing
Document parsing is asynchronous:
- Submit request → Get task_id
- Poll for results using task_id
Polling Recommendations
- Start polling 5-10 seconds after submission
- Polling interval: 5 seconds
- Maximum polling time: 300 seconds
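The recommendations above can be sketched as a polling loop. `poll_result` and the `query` callable are hypothetical: `query(task_id)` is assumed to return `None` while the task is still running and the parsed result once it is done. The real response format may differ.

```python
import time


def poll_result(query, task_id, initial_delay=5, interval=5, timeout=300, sleep=time.sleep):
    """Call query(task_id) until it returns a non-None result or the time budget is used up."""
    sleep(initial_delay)
    waited = initial_delay  # counts time spent sleeping, not time spent inside query()
    while True:
        result = query(task_id)
        if result is not None:
            return result
        if waited + interval > timeout:
            raise TimeoutError("Task %s not finished after %s seconds" % (task_id, waited))
        sleep(interval)
        waited += interval
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.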
QPS Limits
- Submit request API: 2 QPS
- Query result API: 10 QPS
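One simple way to stay under these limits on the client side is to space calls evenly. The `Throttle` class below is a hypothetical sketch, not part of the skill; even spacing is slightly stricter than a per-second quota, which errs on the safe side.

```python
import time


class Throttle:
    """Space calls so that consecutive calls start at least 1/qps seconds apart."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self._min_gap = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._min_gap


submit_throttle = Throttle(qps=2)   # submit request API: 2 QPS
query_throttle = Throttle(qps=10)   # query result API: 10 QPS
```

Calling `submit_throttle.wait()` before each submission and `query_throttle.wait()` before each query keeps a single process within the stated rates.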
File Limits
- File size:
- Page limit: Up to 2000 pages for PDF, 200 for others
- Formats: 18+ supported formats
Error Handling
Common error codes:
| Code | Message | Solution |
|---|---|---|
| 110/111 | Access token invalid/expired | Re-obtain access token |
| 216200 | Empty file or URL | Provide file_data or file_url |
| 216201 | File format error | Check file format |
| 216202 | File size error | Reduce file size |
| 282000 | Internal error | Retry or contact support |
| 282003 | Missing parameters | Check required parameters |
| 282007 | Task not exist | Check task_id |
| 282018 | Service busy | Reduce request frequency |
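The table suggests three broad reactions: refresh the token, retry, or fix the request. A hedged sketch of that classification follows. The grouping and the action names are assumptions for illustration, and `next_action` is a hypothetical helper.

```python
TOKEN_ERRORS = {110, 111}      # access token invalid or expired
RETRYABLE = {282000, 282018}   # internal error, service busy


def next_action(error_code):
    """Suggest how a client might react to an error code from the table above."""
    if error_code in TOKEN_ERRORS:
        return "refresh_token"
    if error_code in RETRYABLE:
        return "retry"
    # Remaining codes (empty file, bad format, missing parameters, unknown task)
    # point at the request itself, so retrying unchanged is unlikely to help.
    return "fix_request"
```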
Full list: references/error_codes.md
Scripts
The skill includes Python scripts for document parsing:
- scripts/baidu_doc_parser.py: Main client library
- Command-line interface for quick testing
References
- references/api_reference.md: Complete API documentation
- references/error_codes.md: Full error code reference