📄 HTML to HTML

v0.4.0

Clean and restructure HTML documents using MinerU. Takes messy or complex HTML and produces clean, well-formatted HTML output with proper structure preserved...

by @mzlzyca (mzlzyCA)·MIT-0
License
MIT-0
Last updated
2026/4/3
Security scan
VirusTotal
Harmless
OpenClaw
Safe
high confidence
The skill's requirements, instructions, and install steps match its stated purpose (running the mineru-open-api CLI with a MINERU_TOKEN to clean HTML).
Assessment recommendations
This skill appears coherent: it runs the mineru-open-api CLI and needs a MINERU_TOKEN from mineru.net. Before installing, verify the npm package and GitHub repo are legitimate (check publisher, recent commits, and npm download counts). Treat MINERU_TOKEN like any API credential: only provide a token with the minimal needed scopes, avoid using it with highly sensitive local HTML unless you accept sending content to the MinerU service, and rotate/delete the token if you stop using the skill.
Detailed analysis
Purpose and capabilities
Name/description (HTML cleanup via MinerU) align with required binary (mineru-open-api) and required env var (MINERU_TOKEN). The primary credential and declared binaries are exactly what the CLI needs to function.
Instruction scope
SKILL.md only instructs the agent to run mineru-open-api commands against remote URLs or local HTML files, use the auth flow, and write output to stdout or files. It does not ask the agent to read unrelated system files, other credentials, or post data to unexpected endpoints beyond MinerU's API.
Installation mechanism
Installation options are standard package installs (npm package and Go install from a GitHub repo). These are expected for a CLI; no arbitrary download URLs, extract steps, or personal servers are used.
Credential requirements
Only MINERU_TOKEN is required and declared as the primary credential, which is proportionate for a hosted extraction/processing service. No unrelated secrets or config paths are requested.
Persistence and permissions
Skill is not forced-always; it is user-invocable and does not request elevated persistent presence or modifications to other skills or system-wide configs.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.4.0 · 2026/3/27

SEO: expand description for better ClawHub vector search discovery

● Harmless

Install command

Official: npx clawhub@latest install html-to-html
Mirror: npx clawhub@latest install html-to-html --registry https://cn.clawhub-mirror.com

Skill documentation

Fetch a remote web page or local HTML file and convert it to clean structured HTML using MinerU. Strips noise and preserves semantic content.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Crawl a web page and output clean HTML (requires token)
mineru-open-api crawl https://example.com/article -f html -o ./out/

# Re-extract a local HTML file to clean HTML (requires token)
mineru-open-api extract page.html -f html -o ./out/

# Batch crawl multiple URLs to HTML (requires token)
mineru-open-api crawl url1 url2 -f html -o ./pages/

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
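
Since every command in this skill needs a token, a wrapper script may want to fail early when none is configured. The following is a minimal sketch, not part of the CLI: `require_mineru_token` is a hypothetical helper name, and it only inspects the `MINERU_TOKEN` environment variable. A token stored through the interactive `mineru-open-api auth` flow would not be detected by it.

```shell
# Hypothetical helper (not part of mineru-open-api):
# fail early if MINERU_TOKEN is unset or empty.
# Only the environment variable is checked; a token saved via
# `mineru-open-api auth` is not visible to this check.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    # Report on stderr and never print the token value itself.
    echo "MINERU_TOKEN is not set; create one at https://mineru.net/apiManage/token" >&2
    return 1
  fi
  return 0
}
```

A script could then guard its work with `require_mineru_token || exit 1` before calling `crawl` or `extract`.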

Capabilities

  • Input: remote web page URL or local .html file
  • Output: clean structured HTML (-f html)
  • For remote URLs: use crawl -f html
  • For local HTML files: use extract -f html
  • Requires token — not available in flash-extract
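
The split between `crawl` for remote URLs and `extract` for local files can be sketched as a small dispatcher. This is an illustration under stated assumptions: the helper names `mineru_subcommand` and `mineru_html_command` are hypothetical, and treating anything that starts with `http://` or `https://` as a URL is an assumption of this sketch, not a rule taken from the CLI. The second helper only prints the command line instead of running it.

```shell
# Hypothetical helper: choose the subcommand for a given input.
# Inputs starting with http:// or https:// map to `crawl`;
# everything else is treated as a local file and maps to `extract`.
mineru_subcommand() {
  case "$1" in
    http://*|https://*) echo "crawl" ;;
    *) echo "extract" ;;
  esac
}

# Hypothetical helper: print (do not run) the command for clean HTML output.
# $1 is the URL or file path, $2 is the output location for -o.
# Plain echo does not quote arguments, so paths with spaces need extra care.
mineru_html_command() {
  echo "mineru-open-api $(mineru_subcommand "$1") $1 -f html -o $2"
}
```

For example, `mineru_html_command https://example.com/article ./out/` prints `mineru-open-api crawl https://example.com/article -f html -o ./out/`, while passing `page.html` selects `extract` instead.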

Notes

  • HTML output (-f html) requires token; not available in flash-extract
  • crawl supports output formats: md, html, json
  • extract supports output formats: md, html, latex, docx, json
  • Output goes to stdout by default; use -o to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
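
Two of the notes above lend themselves to small wrapper helpers. The sketch below is illustrative only: `format_supported` and `extract_html_to` are hypothetical names, the format lists are copied from the notes rather than queried from the CLI, and the second helper assumes that omitting `-o` sends the document to stdout as described.

```shell
# Hypothetical helper: succeed only if the subcommand supports the format,
# based on the format lists given in the notes above.
format_supported() {
  case "$1:$2" in
    crawl:md|crawl:html|crawl:json) return 0 ;;
    extract:md|extract:html|extract:latex|extract:docx|extract:json) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: write the clean HTML to one file and the
# progress/status messages to another, relying on the stdout/stderr split.
# Usage: extract_html_to page.html clean.html progress.log
extract_html_to() {
  mineru-open-api extract "$1" -f html > "$2" 2> "$3"
}
```

A caller might check `format_supported crawl latex || echo "crawl cannot produce latex" >&2` before building a command, since `latex` and `docx` are listed for `extract` only.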
Data source: ClawHub · Chinese localization: 龙虾技能库