xmind-doc-parser (skill tool)

Name: xmind-doc-parser
Rating: 1

v1.0.1

Parse documents in 18+ formats using the Baidu API to extract text, tables, and layout, OCR scanned images, and produce document chunks for RAG.


by @maglanyulan (Maglanyulan)·MIT-0

Categories: Data & APIs, AI Model Access

Use case: use xmind-doc-parser for data and API tasks.


License

MIT-0

Last updated

2026/3/25

Security scan

VirusTotal: Pending

OpenClaw: Suspicious (high confidence)

The skill's code matches its stated purpose (using Baidu's Document Parser) but the package metadata and instructions disagree about required credentials and config changes, and the skill recommends storing API keys in a global OpenClaw config — these inconsistencies and the potential for broad credential exposure warrant caution.

Assessment recommendations

What to consider before installing:

- Metadata mismatch: The registry says no env vars are required, but SKILL.md and the included script require BAIDU_DOC_AI_API_KEY and BAIDU_DOC_AI_SECRET_KEY. Ask the author to fix the metadata or update the registry entry before trusting the skill.
- Credentials: The skill needs your Baidu API key/secret (reasonable for this purpose). Avoid placing these secrets in a global config if you care about limiting access: do not paste keys into ~/.openclaw/openclaw.jso...

Detailed analysis

⚠ Purpose and capability

The code and SKILL.md implement a Baidu Document Parser client and this matches the skill description. However the registry metadata claims no required environment variables or config paths, while SKILL.md and references clearly require BAIDU_DOC_AI_API_KEY and BAIDU_DOC_AI_SECRET_KEY and suggest editing ~/.openclaw/openclaw.json. That mismatch is incoherent and should be corrected.

⚠ Instruction scope

Runtime instructions are focused on document parsing and polling Baidu's APIs (expected). But ancillary documentation instructs editing the global OpenClaw config file (~/.openclaw/openclaw.json) and restarting the gateway to inject credentials — this references a system path outside the skill's declared scope and effectively centralizes credentials for other skills, increasing blast radius.

✓ Install mechanism

There is no install spec or remote download; the skill is instruction-only with an included Python script. No external archives or unknown URLs are fetched by the installer. The client uses standard requests to call Baidu endpoints (expected).

⚠ Credential requirements

The SKILL.md (and the Python client) require BAIDU_DOC_AI_API_KEY and BAIDU_DOC_AI_SECRET_KEY, which are proportionate to calling Baidu's API. However, the registry metadata declares 'required env vars: none' and 'required config paths: none', a clear inconsistency. The references also encourage placing these secrets into a global OpenClaw config, which would expose them to other skills.

ℹ Persistence and privileges

The skill does not request always:true and does not modify other skills. However, the references instruct the operator to place API keys into a shared ~/.openclaw/openclaw.json and restart the gateway; that is a form of persistent credential placement (an administrative action) that could increase exposure if other skills or users can read that file.

Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest: v1.0.1 (2026/3/24)

- Skill renamed from "xmind-doc-parser" to "baidu-doc-parser" to accurately reflect functionality.
- Now parses documents using Baidu Document Parser API, supporting 18+ formats (PDF, Word, Excel, PowerPoint, images, and more).
- Offers comprehensive extraction: text, tables, layout analysis, OCR for scanned docs, and document chunking for RAG.
- Enhanced documentation: details on API parameters, environment setup, file/format/language support, error codes, and usage examples.
- Adds command-line script for easy testing and reference links to official resources.


Install command

Official: npx clawhub@latest install xmind-doc-parser

Mirror: npx clawhub@latest install xmind-doc-parser --registry https://cn.longxiaskill.com (mirror available)

Localization notes

Installation instructions for xmind-doc-parser. Install command: npx clawhub@latest install xmind-doc-parser


Skill documentation

Parse documents using Baidu Intelligent Document Analysis Platform API.

Overview

This skill provides document parsing capabilities through Baidu's Document Parser API, supporting:

18+ document formats (PDF, Word, Excel, PowerPoint, images, etc.)
Text extraction
Table recognition and extraction
Layout analysis (titles, paragraphs, headers/footers, etc.)
OCR for scanned documents
Document chunking for RAG applications
Multi-language support (Chinese, English, Japanese, Korean, French, German, etc.)

When to Use

Use this skill when users need to:

Parse PDF, Word, Excel, or other document formats
Extract text content from documents
Recognize and extract tables
Analyze document structure (titles, sections, layout)
Process scanned documents with OCR
Chunk documents for RAG applications

API Configuration

Environment Variables (Required)

Set these before using the skill:

export BAIDU_DOC_AI_API_KEY="your_api_key"
export BAIDU_DOC_AI_SECRET_KEY="your_secret_key"
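
A client can fail fast when either variable is missing instead of sending an unauthenticated request. The sketch below is illustrative only: `load_credentials` is a hypothetical helper, not a function from the skill's script; only the two variable names come from the text above.

```python
import os

# Variable names as documented above.
REQUIRED_VARS = ("BAIDU_DOC_AI_API_KEY", "BAIDU_DOC_AI_SECRET_KEY")


def load_credentials(env=None):
    """Return (api_key, secret_key); raise if either variable is unset or empty.

    Hypothetical helper for illustration, not part of the skill's script.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["BAIDU_DOC_AI_API_KEY"], env["BAIDU_DOC_AI_SECRET_KEY"]
```

Reading the keys from the process environment at call time, rather than from a shared config file, keeps them scoped to the process that actually runs the parser.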

Authentication

The skill uses OAuth 2.0 to obtain an access token automatically. The token is valid for 30 days.
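
Because the token lasts 30 days, it makes sense to cache it rather than request a new one per call. The following is a minimal sketch of such a cache, not the skill's actual implementation: `TokenCache`, the one-hour refresh margin, and the injected `fetch_token` callable (which would perform the real OAuth request) are all assumptions made for illustration.

```python
import time

TOKEN_LIFETIME_SECONDS = 30 * 24 * 60 * 60  # 30 days, per the text above


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, lifetime=TOKEN_LIFETIME_SECONDS,
                 margin=3600, clock=time.time):
        self._fetch_token = fetch_token  # hypothetical callable returning a token string
        self._lifetime = lifetime
        self._margin = margin            # refresh this many seconds early (assumed value)
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one if missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token

    def invalidate(self):
        """Force a refresh on the next get(), e.g. after an invalid-token error."""
        self._token = None
```

Injecting the fetcher and the clock keeps the cache testable without network access.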

Supported Formats

Documents: pdf, doc, docx, xls, xlsx, ppt, pptx, wps, et, dps, csv, txt, html, mhtml, ofd

Images: jpg, jpeg, png, bmp, tiff, tif

Total: 18+ formats

Supported Languages

Chinese, English, Japanese, Korean, French, German, Italian, Portuguese, Spanish, Russian, Dutch, Swedish, Finnish, Danish, Norwegian, Hungarian, Turkish, Polish, Czech, Greek, and more (20+ languages)

Usage

Basic Usage

python3 scripts/baidu_doc_parser.py --file_data <Base64-encoded file contents>
python3 scripts/baidu_doc_parser.py --file_url <file URL>
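
The value passed to `--file_data` is the Base64 encoding of the file's raw bytes. As a small sketch (the helper name `encode_file` is hypothetical), it can be produced with the standard library:

```python
import base64
from pathlib import Path


def encode_file(path):
    """Return the file's bytes as an ASCII Base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

For large files the encoded string may exceed the shell's argument-length limit, in which case the `--file_url` form is likely the more practical choice.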

API Parameters

File Parameters (Required, choose one)

file_url (string): Document URL (publicly accessible)
file_data (string): Base64-encoded file data
file_name (string, required): File name with extension
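
The rule above (one source, plus a file name) can be checked before any request is sent. This is a sketch under the assumption that exactly one of `file_url` and `file_data` should be present; `build_file_params` is a hypothetical helper, and how the real script serializes these fields is not shown in this document.

```python
def build_file_params(file_name, file_url=None, file_data=None):
    """Validate and assemble the file-related request parameters.

    Assumes exactly one of file_url / file_data may be given.
    """
    if not file_name or "." not in file_name:
        raise ValueError("file_name must include an extension, e.g. 'report.pdf'")
    if (file_url is None) == (file_data is None):
        raise ValueError("provide exactly one of file_url or file_data")
    params = {"file_name": file_name}
    if file_url is not None:
        params["file_url"] = file_url
    else:
        params["file_data"] = file_data
    return params
```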

Core Function Parameters

recognize_formula (bool): Recognize formulas in documents (default: false)
analysis_chart (bool): Parse statistical charts (default: false)
angle_adjust (bool): Auto-rotate images (default: false)
parse_image_layout (bool): Return image position info (default: false)

Language and Format Parameters

language_type (string): Recognition language (default: "CHN_ENG")
- Options: CHN_ENG, JAP, KOR, FRE, SPA, POR, GER, ITA, RUS, DAN, DUT, MAL, SWE, IND, POL, ROM, TUR, GRE, HUN, THA, VIE, ARA, HIN
switch_digital_width (string): Convert number width (default: "auto")
- Options: "auto" (no conversion), "half" (half-width), "full" (full-width)
html_table_format (bool): Return tables in HTML format (default: true)
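
Since both string parameters accept a closed set of values, a typo can be caught locally. The sketch below uses the option lists and defaults given above; `build_format_options` is a hypothetical helper, and whether the service itself rejects unknown values is not stated here.

```python
# Option lists copied from the parameter descriptions above.
LANGUAGE_TYPES = {
    "CHN_ENG", "JAP", "KOR", "FRE", "SPA", "POR", "GER", "ITA", "RUS", "DAN",
    "DUT", "MAL", "SWE", "IND", "POL", "ROM", "TUR", "GRE", "HUN", "THA",
    "VIE", "ARA", "HIN",
}
DIGIT_WIDTHS = {"auto", "half", "full"}


def build_format_options(language_type="CHN_ENG", switch_digital_width="auto",
                         html_table_format=True):
    """Return the language/format parameters, rejecting undocumented values."""
    if language_type not in LANGUAGE_TYPES:
        raise ValueError(f"unsupported language_type: {language_type!r}")
    if switch_digital_width not in DIGIT_WIDTHS:
        raise ValueError(f"unsupported switch_digital_width: {switch_digital_width!r}")
    return {
        "language_type": language_type,
        "switch_digital_width": switch_digital_width,
        "html_table_format": bool(html_table_format),
    }
```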

Advanced Parameters

version (string): API version (default: "v2")
need_inner_image_data (bool): Include internal image data
merge_tables (bool): Merge related tables
relevel_titles (bool): Restructure title hierarchy
recognize_seal (bool): Recognize document seals/stamps
return_span_boxes (bool): Return span bounding boxes

Document Chunking Parameters

return_doc_chunks (dict): Document chunking configuration

- switch (bool): Enable chunking (default: false)
- split_type (string): Chunking method, either "chunk" (by size) or "mark" (by punctuation)
- separators (list): Punctuation marks for splitting (default: ['。', '；', '！', '？', ';', '!', '?'])
- chunk_size (int): Chunk size in characters (default: -1 for auto)
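
As an illustration of how the four chunking fields fit together, the sketch below assembles a `return_doc_chunks` dictionary with chunking enabled. `build_chunk_config` is a hypothetical helper; the field names and defaults come from the list above, while the validation rules (for example, rejecting a `chunk_size` of 0) are assumptions.

```python
# Default separators as documented above.
DEFAULT_SEPARATORS = ["。", "；", "！", "？", ";", "!", "?"]


def build_chunk_config(split_type, separators=None, chunk_size=-1):
    """Return a return_doc_chunks dict with chunking switched on."""
    if split_type not in ("chunk", "mark"):
        raise ValueError("split_type must be 'chunk' or 'mark'")
    if chunk_size != -1 and chunk_size <= 0:
        # Assumed rule: only -1 (auto) or a positive size makes sense.
        raise ValueError("chunk_size must be -1 (auto) or a positive integer")
    return {
        "switch": True,
        "split_type": split_type,
        "separators": list(DEFAULT_SEPARATORS if separators is None else separators),
        "chunk_size": chunk_size,
    }
```

For example, `build_chunk_config("mark")` requests punctuation-based splitting with the default separators and automatic chunk size.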

Return Structure

Page Object

Each page contains:

page_id: Page identifier
page_num: Page number
text: All text content on the page
layouts: Layout elements (titles, paragraphs, tables, images, etc.)
tables: Extracted tables
images: Extracted images

Layout Types

title: Title (with sub_type: title_1, title_2, title_3, etc.)
para: Paragraph
table: Table
image: Image
head_tail: Header/footer
contents: Table of contents
seal: Seal/stamp
formula: Mathematical formula

Table Object

layout_id: Table identifier
markdown: Table content in Markdown format
position: Bounding box [x, y, width, height]
cells: Cell information
matrix: Cell index matrix (for merged cells)

Chunk Object

chunk_id: Chunk identifier
content: Chunk content
type: Chunk type ("text" or "table")
meta: Metadata (titles, position, page number)
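
To make the return structure concrete, here is a sketch that walks a list of page objects. The sample data is invented and shaped only after the field names listed above; in particular, the `"type"` and `"sub_type"` keys on layout elements are assumptions, and a real response may nest or name things differently.

```python
from collections import Counter

# Hypothetical sample; values are made up for illustration.
sample_pages = [
    {
        "page_id": "page-2", "page_num": 2, "text": "Revenue grew.",
        "layouts": [{"type": "para"}, {"type": "table"}],
        "tables": [{"layout_id": "t-1", "markdown": "| Q | Rev |\n|---|---|\n| 1 | 10 |"}],
        "images": [],
    },
    {
        "page_id": "page-1", "page_num": 1, "text": "Annual Report",
        "layouts": [{"type": "title", "sub_type": "title_1"}],
        "tables": [],
        "images": [],
    },
]


def full_text(pages):
    """Join page texts in page_num order, separated by blank lines."""
    ordered = sorted(pages, key=lambda page: page["page_num"])
    return "\n\n".join(page["text"] for page in ordered)


def layout_type_counts(pages):
    """Count layout elements by type across all pages."""
    return Counter(layout["type"] for page in pages for layout in page["layouts"])


def table_markdown(pages):
    """Collect the Markdown rendering of every extracted table."""
    return [table["markdown"] for page in pages for table in page["tables"]]
```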

API Characteristics

Asynchronous Processing

Document parsing is asynchronous:

Submit request → Get task_id
Poll for results using task_id

Polling Recommendations

Start polling 5-10 seconds after submission
Polling interval: 5 seconds
Maximum polling time: 300 seconds
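
The recommendations above translate into a simple polling loop. This is a sketch rather than the skill's own code: `query` is a hypothetical callable that returns the finished result, or `None` while the task is still pending (how the real API signals "pending" is not described here), and the 5-second initial delay is one value from the recommended 5-10 second range.

```python
import time


def poll_until_done(query, task_id, initial_delay=5.0, interval=5.0,
                    timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll query(task_id) until it returns a non-None result or time runs out."""
    deadline = clock() + timeout
    sleep(initial_delay)              # give the task time to start
    while True:
        result = query(task_id)
        if result is not None:
            return result
        if clock() + interval > deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters lets the loop be exercised in tests without real waiting.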

QPS Limits

Submit request API: 2 QPS
Query result API: 10 QPS
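
One simple way to stay under these limits on the client side is to enforce a minimum interval between calls. The sketch below is a single-threaded illustration, not part of the skill; `MinIntervalLimiter` is a hypothetical name, and spacing calls evenly is stricter than a true per-second quota.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1/max_per_second seconds apart (single-threaded)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


# Limits as documented above: 2 QPS for submit, 10 QPS for query.
submit_limiter = MinIntervalLimiter(2)
query_limiter = MinIntervalLimiter(10)
```

Calling `submit_limiter.wait()` before each submit request, and `query_limiter.wait()` before each result query, keeps a single client within both limits.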

File Limits

File size:
- URL mode: PDF up to 300MB, others up to 50MB
- Base64 mode: Up to 50MB

Page limit: Up to 2000 pages for PDF, 200 for others
Formats: 18+ supported formats
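
These limits can be checked locally before uploading. The sketch below encodes the figures above; the helper names are hypothetical, and treating 1MB as 1024 × 1024 bytes is an assumption, since the document does not say how the service counts megabytes.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def max_size_bytes(file_name, mode):
    """Return the size limit in bytes; mode is 'url' or 'base64'."""
    if mode not in ("url", "base64"):
        raise ValueError("mode must be 'url' or 'base64'")
    is_pdf = file_name.lower().endswith(".pdf")
    if mode == "url" and is_pdf:
        return 300 * MB
    return 50 * MB


def max_pages(file_name):
    """Return the page limit: 2000 for PDF, 200 for other formats."""
    return 2000 if file_name.lower().endswith(".pdf") else 200


def within_limits(file_name, size_bytes, mode):
    """True if the file size is within the documented limit for this mode."""
    return size_bytes <= max_size_bytes(file_name, mode)
```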

Error Handling

Common error codes:

Code	Message	Solution
110/111	Access token invalid/expired	Re-obtain access token
216200	Empty file or URL	Provide file_data or file_url
216201	File format error	Check file format
216202	File size error	Reduce file size
282000	Internal error	Retry or contact support
282003	Missing parameters	Check required parameters
282007	Task not exist	Check task_id
282018	Service busy	Reduce request frequency

For complete error codes, see references/error_codes.md
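
The table above suggests three broad responses: refresh the token, retry, or fix the request. A minimal sketch of that grouping follows; the category names and `classify_error` are hypothetical, and only the codes listed in the table are covered.

```python
# Groupings derived from the "Solution" column of the table above.
REFRESH_TOKEN_CODES = {110, 111}
RETRYABLE_CODES = {282000, 282018}
REQUEST_ERROR_CODES = {216200, 216201, 216202, 282003, 282007}


def classify_error(code):
    """Map an error code to a coarse action: refresh, retry, fix, or unknown."""
    if code in REFRESH_TOKEN_CODES:
        return "refresh_token"
    if code in RETRYABLE_CODES:
        return "retry"
    if code in REQUEST_ERROR_CODES:
        return "fix_request"
    return "unknown"
```

For 282018 (service busy), a retry should come with a lower request rate rather than an immediate resend.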

Scripts

The skill includes Python scripts for document parsing:

scripts/baidu_doc_parser.py: Main client library
Command-line interface for quick testing

References

references/api_reference.md: Complete API documentation
references/error_codes.md: Full error code reference
