
📄 HTML Extract

v0.4.0

Extract content from HTML pages and files using MinerU. Converts HTML to clean, structured Markdown preserving headings, lists, tables, and text hierarchy. F...

by @mzlzyca (mzlzyCA)·MIT-0
License: MIT-0
Last updated: 2026/4/3
Security scan
VirusTotal: harmless
OpenClaw: safe (high confidence)
The skill's declared binary requirement (mineru-open-api) and single MINERU_TOKEN credential match its HTML extraction purpose and the SKILL.md instructions; nothing appears disproportionate or unrelated.
Assessment and recommendations
This skill is internally consistent with its stated purpose, but you should verify the mineru-open-api package before installing: check the npm package page and the GitHub repo linked from the MinerU homepage (https://mineru.net / https://github.com/opendatalab). Treat MINERU_TOKEN as a secret (do not reuse highly privileged credentials), create a token with least privilege if possible, and rotate it if you later stop using the skill. If you're cautious, install the CLI in an isolated environmen...
Detailed analysis
Purpose and capability
The name/description (HTML extraction via MinerU) align with the declared runtime requirement (mineru-open-api) and the single required env var (MINERU_TOKEN). Requiring a MinerU CLI and token is expected for this functionality.
Instruction scope
SKILL.md contains explicit commands using mineru-open-api (extract, crawl) and only references local HTML files, URLs, and the MINERU_TOKEN. It does not instruct reading unrelated system files, other environment variables, or exfiltrating data to unexpected endpoints.
Installation mechanism
Installers are npm (mineru-open-api) and go install from the GitHub repo — these are standard package sources. Installing third-party packages runs remote code at install/runtime, so verify the npm package and GitHub repository are the legitimate MinerU project before installing.
Credential requirements
Only one credential (MINERU_TOKEN) is required and is declared as primaryEnv. This is proportionate to a CLI that calls a remote MinerU API. No unrelated secrets or broad filesystem config paths are requested.
Persistence and privileges
The skill does not request always:true or other elevated persistence. It is user-invocable and allows normal autonomous invocation, which is the platform default and reasonable for this capability.
Security is layered; review the code before running it.

MIT-0 permits free use, modification, and redistribution without attribution.

Runtime dependencies

No special dependencies

Versions

v0.4.0 (latest) · 2026/3/27

SEO: expand description for better ClawHub vector search discovery


Install command

Official:
npx clawhub@latest install html-extract

Mirror (accelerated):
npx clawhub@latest install html-extract --registry https://cn.clawhub-mirror.com

Skill documentation

Extract text and content from local HTML files (or HTML files at a remote URL) to Markdown using MinerU. For live web pages, use mineru-open-api crawl.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
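Either install route only helps if the resulting binary ends up on PATH. A minimal post-install check is sketched below; the helper name mineru_check_installed and the MINERU_BIN override are hypothetical, not part of MinerU. The hints rely on standard tool behavior: go install writes binaries to $GOBIN, or to $(go env GOPATH)/bin when GOBIN is unset, and npm places global binaries in the bin/ directory under the prefix printed by npm prefix -g (macOS/Linux).

```shell
#!/usr/bin/env bash
# Hypothetical post-install check: is the MinerU CLI reachable on PATH?
# MINERU_BIN is an illustrative override, not a MinerU setting.
mineru_check_installed() {
  local bin="${MINERU_BIN:-mineru-open-api}"
  if command -v "$bin" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$bin' was not found on PATH" >&2
  # go install puts binaries in $GOBIN, or in $(go env GOPATH)/bin if GOBIN is unset.
  echo "hint (Go): add \"\$(go env GOPATH)/bin\" to PATH" >&2
  # npm -g puts binaries in <prefix>/bin on macOS/Linux; 'npm prefix -g' prints <prefix>.
  echo "hint (npm): add \"\$(npm prefix -g)/bin\" to PATH" >&2
  return 1
}
```

Run mineru_check_installed right after installing; a non-zero exit status means the shell cannot find the binary yet.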

Quick Start

# Extract from a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/

# Extract from a remote HTML URL (requires token)
mineru-open-api extract https://example.com/page.html -o ./out/

# Extract web page content via crawl (requires token)
mineru-open-api crawl https://example.com/article -o ./out/

# With language hint
mineru-open-api extract page.html --language en -o ./out/
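The Quick Start commands handle one input at a time. A minimal sketch for a folder of saved pages, assuming bash and using only the documented extract subcommand and -o flag (each call still requires a token). The names extract_dir and MINERU_BIN are hypothetical, and giving every input its own output directory is an illustrative choice, not a MinerU convention.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper: run `extract` on every .html file in a directory,
# writing each result to OUT_ROOT/<file-name-without-.html>/.
extract_dir() {
  local src_dir="$1" out_root="${2:-./out}"
  local bin="${MINERU_BIN:-mineru-open-api}"
  local f name failed=0 restore_glob
  restore_glob="$(shopt -p nullglob)"   # remember the caller's nullglob setting
  shopt -s nullglob                     # an empty directory yields no iterations
  for f in "$src_dir"/*.html; do
    name="$(basename "$f" .html)"
    if ! "$bin" extract "$f" -o "$out_root/$name/"; then
      echo "failed: $f" >&2             # report and keep going with the next file
      failed=1
    fi
  done
  eval "$restore_glob"                  # put nullglob back the way it was
  return "$failed"
}
```

For example, extract_dir ./saved-pages ./out sends ./saved-pages/a.html to ./out/a/. A failing file is reported on stderr without stopping the batch, and the function returns 1 if any file failed.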

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create a token at: https://mineru.net/apiManage/token
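Because every command needs a token, a script can fail fast before calling the API. A minimal sketch; require_mineru_token is a hypothetical helper. It only inspects MINERU_TOKEN, and a token configured through mineru-open-api auth may be stored elsewhere by the CLI, in which case this check would report a false negative.

```shell
#!/usr/bin/env bash
# Hypothetical guard: stop early if MINERU_TOKEN is missing or empty.
# It checks only the environment variable and never prints the token value.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set." >&2
    echo "Create a token at https://mineru.net/apiManage/token, then run" >&2
    echo "'mineru-open-api auth' or: export MINERU_TOKEN=\"your-token\"" >&2
    return 1
  fi
  return 0
}
```

Typical use: require_mineru_token && mineru-open-api extract page.html -o ./out/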

Capabilities

  • Supported input: local .html file or remote HTML URL
  • HTML input requires extract (token required); flash-extract does not support HTML
  • For live web pages, use mineru-open-api crawl (also requires token)
  • Language hint with --language (default: ch, use en for English)
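These rules can be folded into a small dispatcher. A sketch under stated assumptions: pick_mineru_subcommand, run_mineru, and MINERU_BIN are hypothetical names, and treating an http(s) URL whose path ends in .html or .htm as a remote HTML file (extract) while sending any other URL to crawl is a heuristic, not documented MinerU behavior. flash-extract is never selected, and --language is passed only to extract because that is the only subcommand the documentation shows it with.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: choose the MinerU subcommand for an HTML input.
# Assumption: an http(s) URL whose path ends in .html/.htm is a remote HTML file
# (extract); any other http(s) URL is treated as a live page (crawl).
# flash-extract is never chosen because it does not support HTML.
pick_mineru_subcommand() {
  local input="$1"
  case "$input" in
    http://*|https://*)
      local url_path="${input%%[?#]*}"   # drop any query string or fragment
      case "$url_path" in
        *.html|*.htm) echo "extract" ;;
        *)            echo "crawl" ;;
      esac
      ;;
    *)
      if [ -f "$input" ]; then
        echo "extract"
      else
        echo "error: '$input' is not an existing file or an http(s) URL" >&2
        return 1
      fi
      ;;
  esac
}

# Hypothetical wrapper: run the chosen subcommand; lang defaults to "ch" like the CLI.
run_mineru() {
  local input="$1" lang="${2:-ch}" out="${3:-./out/}"
  local bin="${MINERU_BIN:-mineru-open-api}"
  local sub
  sub="$(pick_mineru_subcommand "$input")" || return 1
  if [ "$sub" = "extract" ]; then
    "$bin" extract "$input" --language "$lang" -o "$out"
  else
    # The documentation shows --language only with extract, so crawl gets no hint.
    "$bin" crawl "$input" -o "$out"
  fi
}
```

For example, run_mineru https://example.com/article runs mineru-open-api crawl https://example.com/article -o ./out/, while run_mineru page.html en runs mineru-open-api extract page.html --language en -o ./out/.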

Notes

  • HTML is NOT supported by flash-extract — always use extract or crawl
  • Output goes to stdout by default; use -o to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
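Since document content and progress messages use separate streams, a script can keep the Markdown clean while still logging progress, without using -o. A minimal sketch; extract_to_markdown and MINERU_BIN are hypothetical names, and writing to a temporary file first (so a failed run never leaves a partial .md behind) is an illustrative design choice.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: Markdown (stdout) goes to md_file, progress (stderr) to log_file.
extract_to_markdown() {
  local input="$1" md_file="$2" log_file="${3:-/dev/null}"
  local bin="${MINERU_BIN:-mineru-open-api}"
  local tmp rc
  tmp="$(mktemp)" || return 1
  if "$bin" extract "$input" >"$tmp" 2>"$log_file"; then
    # Only replace md_file once the extraction has succeeded.
    mv "$tmp" "$md_file"
  else
    rc=$?
    rm -f "$tmp"   # discard any partial output
    return "$rc"
  fi
}
```

Because stderr is redirected on its own, progress lines never end up inside the Markdown file, and the CLI's exit status is passed through on failure.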
Source: ClawHub · Chinese localization: 龙虾技能库