xmind-doc-parser: Skill Tool

v1.0.1

Parse documents in 18+ formats using the Baidu Document Parser API: extract text, tables, and layout, OCR scanned images, and produce document chunks for RAG.

by @maglanyulan (Maglanyulan)·MIT-0
License
MIT-0
Last updated
2026/3/25
Security scan
VirusTotal
Pending
OpenClaw
Suspicious
high confidence
The skill's code matches its stated purpose (using Baidu's Document Parser) but the package metadata and instructions disagree about required credentials and config changes, and the skill recommends storing API keys in a global OpenClaw config — these inconsistencies and the potential for broad credential exposure warrant caution.
Assessment recommendations
What to consider before installing:
  • Metadata mismatch: The registry says no env vars required, but SKILL.md and the included script require BAIDU_DOC_AI_API_KEY and BAIDU_DOC_AI_SECRET_KEY. Ask the author to fix the metadata or update the registry entry before trusting the skill.
  • Credentials: The skill needs your Baidu API key/secret (reasonable for this purpose). Avoid placing these secrets in a global config if you care about limiting access: do not paste keys into ~/.openclaw/openclaw.jso...
Detailed analysis
Purpose and capability
The code and SKILL.md implement a Baidu Document Parser client, which matches the skill description. However, the registry metadata claims no required environment variables or config paths, while SKILL.md and the references clearly require BAIDU_DOC_AI_API_KEY and BAIDU_DOC_AI_SECRET_KEY and suggest editing ~/.openclaw/openclaw.json. This mismatch should be corrected.
Instruction scope
Runtime instructions are focused on document parsing and polling Baidu's APIs (expected). But ancillary documentation instructs editing the global OpenClaw config file (~/.openclaw/openclaw.json) and restarting the gateway to inject credentials — this references a system path outside the skill's declared scope and effectively centralizes credentials for other skills, increasing blast radius.
Install mechanism
There is no install spec or remote download; the skill is instruction-only with an included Python script. No external archives or unknown URLs are fetched by the installer. The client uses standard requests to call Baidu endpoints (expected).
Credential requirements
SKILL.md and the Python client require BAIDU_DOC_AI_API_KEY and BAIDU_DOC_AI_SECRET_KEY, which is proportionate to calling Baidu's API. However, the registry metadata declares 'required env vars: none' and 'required config paths: none', a clear inconsistency. The references also encourage placing these secrets in a global OpenClaw config, which would expose them to other skills.
Persistence and privileges
The skill does not request always:true and does not modify other skills. However references instruct the operator to place API keys into a shared ~/.openclaw/openclaw.json and restart the gateway; that is a form of persistent credential placement (administrative action) that could increase exposure if other skills or users can read that file.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v1.0.1 · 2026/3/24

  • Skill renamed from "xmind-doc-parser" to "baidu-doc-parser" to accurately reflect functionality.
  • Now parses documents using Baidu Document Parser API, supporting 18+ formats (PDF, Word, Excel, PowerPoint, images, and more).
  • Offers comprehensive extraction: text, tables, layout analysis, OCR for scanned docs, and document chunking for RAG.
  • Enhanced documentation: details on API parameters, environment setup, file/format/language support, error codes, and usage examples.
  • Adds command-line script for easy testing and reference links to official resources.

● Pending

Install command

Official: npx clawhub@latest install xmind-doc-parser
Mirror: npx clawhub@latest install xmind-doc-parser --registry https://cn.clawhub-mirror.com

Skill documentation

Parse documents using Baidu Intelligent Document Analysis Platform API.

Overview

This skill provides document parsing capabilities through Baidu's Document Parser API, supporting:

  • 18+ document formats (PDF, Word, Excel, PowerPoint, images, etc.)
  • Text extraction
  • Table recognition and extraction
  • Layout analysis (titles, paragraphs, headers/footers, etc.)
  • OCR for scanned documents
  • Document chunking for RAG applications
  • Multi-language support (Chinese, English, Japanese, Korean, French, German, etc.)

When to Use

Use this skill when users need to:

  • Parse PDF, Word, Excel, or other document formats
  • Extract text content from documents
  • Recognize and extract tables
  • Analyze document structure (titles, sections, layout)
  • Process scanned documents with OCR
  • Chunk documents for RAG applications

API Configuration

Environment Variables (Required)

Set these before using the skill:

export BAIDU_DOC_AI_API_KEY="your_api_key"
export BAIDU_DOC_AI_SECRET_KEY="your_secret_key"
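
A client may want to fail early when either variable is unset, rather than failing later during authentication. The following is a minimal sketch of such a check; `load_credentials` is a hypothetical helper for illustration, not a function taken from the skill's script.

```python
import os

# Variable names as given in the section above.
REQUIRED_VARS = ("BAIDU_DOC_AI_API_KEY", "BAIDU_DOC_AI_SECRET_KEY")


def load_credentials(env=None):
    """Return (api_key, secret_key), or raise if either variable is missing or empty.

    Hypothetical helper: `env` defaults to os.environ and can be replaced in tests.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]]
```

Reading the keys from the process environment keeps them out of shared configuration files.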

Authentication

The skill uses OAuth 2.0 to obtain an access token automatically. The token is valid for 30 days.
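
Because the token stays valid for 30 days, a long-running client can reuse it instead of requesting a new one per call. The sketch below shows one possible cache. `fetch_token` is a hypothetical callable that performs the actual OAuth request, and the one-hour safety margin is an assumption rather than something the skill documents.

```python
import time

TOKEN_TTL_SECONDS = 30 * 24 * 3600  # 30 days, as stated above


class TokenCache:
    """Reuse an access token until shortly before it expires (illustrative only)."""

    def __init__(self, fetch_token, clock=time.time, safety_margin=3600):
        self._fetch = fetch_token      # hypothetical: () -> token string
        self._clock = clock            # injectable for testing
        self._margin = safety_margin   # assumed refresh margin, in seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Refresh when there is no token yet or the margin has been reached.
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + TOKEN_TTL_SECONDS
        return self._token
```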

Supported Formats

Documents: pdf, doc, docx, xls, xlsx, ppt, pptx, wps, et, dps, csv, txt, html, mhtml, ofd

Images: jpg, jpeg, png, bmp, tiff, tif

Total: 18+ formats
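
A caller could reject unsupported files before submitting them. This is a small sketch that checks a file name against the two lists above. The helper name `is_supported` is an assumption, and the check looks only at the extension, not at the file contents.

```python
from pathlib import Path

# Extensions copied from the lists above.
DOCUMENT_FORMATS = {
    "pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "wps",
    "et", "dps", "csv", "txt", "html", "mhtml", "ofd",
}
IMAGE_FORMATS = {"jpg", "jpeg", "png", "bmp", "tiff", "tif"}


def is_supported(file_name):
    """Return True if the file extension appears in either list (case-insensitive)."""
    ext = Path(file_name).suffix.lower().lstrip(".")
    return ext in DOCUMENT_FORMATS or ext in IMAGE_FORMATS
```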

Supported Languages

Chinese, English, Japanese, Korean, French, German, Italian, Portuguese, Spanish, Russian, Dutch, Swedish, Finnish, Danish, Norwegian, Hungarian, Turkish, Polish, Czech, Greek, and more (20+ languages)

Usage

Basic Usage

python3 scripts/baidu_doc_parser.py --file_data <base64-encoded file content>
python3 scripts/baidu_doc_parser.py --file_url <file URL>
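
The value for `--file_data` is the Base64 encoding of the file's raw bytes. The following sketch produces that string with the standard library. The helper names are illustrative and not part of the skill's script.

```python
import base64
from pathlib import Path


def encode_bytes(data):
    """Base64-encode raw bytes and return the result as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path):
    """Read a file from disk and return its Base64 encoding."""
    return encode_bytes(Path(path).read_bytes())
```

For large files the encoded string may exceed the operating system's command-line length limit, in which case the URL mode is likely the more practical choice.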

API Parameters

File Parameters (file_name is always required; supply exactly one of file_url or file_data)

  • file_url (string): Document URL (publicly accessible)
  • file_data (string): Base64-encoded file data
  • file_name (string, required): File name with extension
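
The rule above (a file name plus exactly one source) can be enforced before a request is sent. This sketch uses the parameter names from the list. The function itself is hypothetical, and how the script actually assembles its request is not shown in this document.

```python
def build_file_params(file_name, file_url=None, file_data=None):
    """Return the file-related request parameters, validating the either/or rule."""
    if not file_name:
        raise ValueError("file_name is required")
    # Exactly one of the two sources must be given.
    if (file_url is None) == (file_data is None):
        raise ValueError("Provide exactly one of file_url or file_data")
    params = {"file_name": file_name}
    if file_url is not None:
        params["file_url"] = file_url
    else:
        params["file_data"] = file_data
    return params
```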

Core Function Parameters

  • recognize_formula (bool): Recognize formulas in documents (default: false)
  • analysis_chart (bool): Parse statistical charts (default: false)
  • angle_adjust (bool): Auto-rotate images (default: false)
  • parse_image_layout (bool): Return image position info (default: false)

Language and Format Parameters

  • language_type (string): Recognition language (default: "CHN_ENG")
      - Options: CHN_ENG, JAP, KOR, FRE, SPA, POR, GER, ITA, RUS, DAN, DUT, MAL, SWE, IND, POL, ROM, TUR, GRE, HUN, THA, VIE, ARA, HIN
  • switch_digital_width (string): Convert number width (default: "auto")
      - Options: "auto" (no conversion), "half" (half-width), "full" (full-width)
  • html_table_format (bool): Return tables in HTML format (default: true)

Advanced Parameters

  • version (string): API version (default: "v2")
  • need_inner_image_data (bool): Include internal image data
  • merge_tables (bool): Merge related tables
  • relevel_titles (bool): Restructure title hierarchy
  • recognize_seal (bool): Recognize document seals/stamps
  • return_span_boxes (bool): Return span bounding boxes

Document Chunking Parameters

  • return_doc_chunks (dict): Document chunking configuration
      - switch (bool): Enable chunking (default: false)
      - split_type (string): Chunking method, either "chunk" (by size) or "mark" (by punctuation)
      - separators (list): Punctuation marks for splitting (default: ['。', ';', '!', '?', ';', '!', '?'])
      - chunk_size (int): Chunk size in characters (default: -1 for auto)
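
As an illustration, the chunking configuration can be assembled as a plain dictionary from the fields listed above. The builder function is hypothetical. It leaves `separators` out unless the caller supplies a list, on the assumption that the service then applies its default.

```python
def build_chunk_config(split_type, chunk_size=-1, separators=None):
    """Build a return_doc_chunks dict with chunking enabled."""
    if split_type not in ("chunk", "mark"):
        raise ValueError('split_type must be "chunk" or "mark"')
    config = {
        "switch": True,            # enable chunking (the default is false)
        "split_type": split_type,
        "chunk_size": chunk_size,  # -1 lets the service choose the size
    }
    if separators is not None:
        config["separators"] = list(separators)
    return config
```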

Return Structure

Page Object

Each page contains:

  • page_id: Page identifier
  • page_num: Page number
  • text: All text content on the page
  • layouts: Layout elements (titles, paragraphs, tables, images, etc.)
  • tables: Extracted tables
  • images: Extracted images

Layout Types

  • title: Title (with sub_type: title_1, title_2, title_3, etc.)
  • para: Paragraph
  • table: Table
  • image: Image
  • head_tail: Header/footer
  • contents: Table of contents
  • seal: Seal/stamp
  • formula: Mathematical formula
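
One common use of the layout types is to rebuild a document outline from the title elements. The sketch below assumes that each layout element is a dict with `type`, `sub_type`, and `text` keys. Only `sub_type` is named in this document, so the other two key names are assumptions to check against references/api_reference.md.

```python
def extract_outline(pages):
    """Collect (page_num, sub_type, text) for every title layout, in page order.

    Assumes layout elements carry "type" and "text" keys (not confirmed here).
    """
    outline = []
    for page in pages:
        for layout in page.get("layouts", []):
            if layout.get("type") == "title":
                outline.append(
                    (page.get("page_num"), layout.get("sub_type"), layout.get("text", ""))
                )
    return outline
```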

Table Object

  • layout_id: Table identifier
  • markdown: Table content in Markdown format
  • position: Bounding box [x, y, width, height]
  • cells: Cell information
  • matrix: Cell index matrix (for merged cells)
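
Many drawing and cropping libraries expect corner coordinates rather than the [x, y, width, height] form used by `position`. A short conversion sketch, with a hypothetical helper name:

```python
def to_corners(position):
    """Convert [x, y, width, height] to (left, top, right, bottom)."""
    x, y, width, height = position
    return (x, y, x + width, y + height)
```

For example, `to_corners([10, 20, 100, 50])` returns `(10, 20, 110, 70)`.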

Chunk Object

  • chunk_id: Chunk identifier
  • content: Chunk content
  • type: Chunk type ("text" or "table")
  • meta: Metadata (titles, position, page number)

API Characteristics

Asynchronous Processing

Document parsing is asynchronous:

  • Submit request → Get task_id
  • Poll for results using task_id

Polling Recommendations

  • Start polling 5-10 seconds after submission
  • Polling interval: 5 seconds
  • Maximum polling time: 300 seconds
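
The recommendations above translate into a simple polling loop. In this sketch, `query` is a hypothetical callable that returns the finished result, or `None` while the task is still running. How the real query endpoint signals completion is not described in this document.

```python
import time


def poll_result(query, task_id, initial_delay=5.0, interval=5.0, timeout=300.0,
                sleep=time.sleep, clock=time.monotonic):
    """Poll query(task_id) until it returns a result or the timeout elapses.

    `query` is hypothetical: it returns None while the task is pending.
    """
    deadline = clock() + timeout
    sleep(initial_delay)  # give the service time before the first poll
    while True:
        result = query(task_id)
        if result is not None:
            return result
        if clock() + interval > deadline:
            raise TimeoutError("task %s did not finish in %s seconds" % (task_id, timeout))
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.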

QPS Limits

  • Submit request API: 2 QPS
  • Query result API: 10 QPS
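
To stay under these limits, a client can space out its calls. The following is a minimal single-threaded throttle sketch. It enforces a minimum gap between calls, which is one simple way to respect a QPS cap and is not necessarily how the skill's script behaves.

```python
import time


class Throttle:
    """Block so that wait() returns at most max_qps times per second."""

    def __init__(self, max_qps, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / max_qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        now = self._clock()
        if now < self._next_allowed:
            # Too early: sleep until the next permitted slot.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
```

A client would create `Throttle(2)` for submissions and `Throttle(10)` for result queries, calling `wait()` before each request.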

File Limits

  • File size:
      - URL mode: PDF up to 300MB, others up to 50MB
      - Base64 mode: Up to 50MB
  • Page limit: Up to 2000 pages for PDF, 200 for others
  • Formats: 18+ supported formats
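
The size limits can be checked locally before uploading. This sketch assumes that 1 MB means 1024 * 1024 bytes and that the limit applies to the raw file size. Neither point is stated in this document, so treat the thresholds as approximate.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def max_size_bytes(file_name, mode):
    """Return the size limit for a file; mode is "url" or "base64"."""
    if mode not in ("url", "base64"):
        raise ValueError('mode must be "url" or "base64"')
    is_pdf = file_name.lower().endswith(".pdf")
    # Only PDFs submitted by URL get the larger limit.
    return 300 * MB if (mode == "url" and is_pdf) else 50 * MB
```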

Error Handling

Common error codes:

Code      Message                        Solution
110/111   Access token invalid/expired   Re-obtain access token
216200    Empty file or URL              Provide file_data or file_url
216201    File format error              Check file format
216202    File size error                Reduce file size
282000    Internal error                 Retry or contact support
282003    Missing parameters             Check required parameters
282007    Task not exist                 Check task_id
282018    Service busy                   Reduce request frequency
For complete error codes, see references/error_codes.md
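
The common codes fall into three groups: token problems, transient server conditions, and request problems. One way a client might act on them is sketched below. The grouping follows the Solution column above, and the action names are illustrative only.

```python
TOKEN_ERRORS = {110, 111}
RETRYABLE_ERRORS = {282000, 282018}
REQUEST_ERRORS = {216200, 216201, 216202, 282003, 282007}


def next_action(error_code):
    """Map a common error code to a coarse handling strategy."""
    if error_code in TOKEN_ERRORS:
        return "refresh_token"   # re-obtain the access token, then retry
    if error_code in RETRYABLE_ERRORS:
        return "retry_later"     # back off and retry
    if error_code in REQUEST_ERRORS:
        return "fix_request"     # the request itself must change
    return "unknown"             # not in the table; consult error_codes.md
```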

Scripts

The skill includes Python scripts for document parsing:

  • scripts/baidu_doc_parser.py: Main client library
  • Command-line interface for quick testing

References

  • references/api_reference.md: Complete API documentation
  • references/error_codes.md: Full error code reference

Related Links

Data source: ClawHub · Chinese localization: 龙虾技能库