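🕸️ Scrapling Web Extractor: Web Page Scraper and Markdown Converter
v1.0.0
Uses Scrapling to fetch one or more public web pages, extract the main content, and convert the HTML to Markdown with html2text. Supports static HTTP, concurrent async, stealth anti-bot (Camoufox/Firefox), and dynamic Playwright Chromium fetch modes, with production-grade auto-match.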
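Security scan
OpenClaw
Safe
High confidence. The skill's code, run instructions, and inputs are consistent with a web-page-to-Markdown scraper: it does not request unrelated credentials or system-wide privileges, and its behavior matches its description.
Assessment recommendations
The skill is internally consistent, but check the following before installing:
1. The script dynamically imports and depends on the external 'scrapling' package and Playwright; audit these packages, or make sure you trust them, before installing.
2. Use stealth mode and proxies only for legitimate access to pages behind anti-bot protection, never to bypass login walls, CAPTCHAs, or paywalls, or to reach restricted content (as stated in SKILL.md).
3. Installing Playwright downloads a Chromium binary; make sure you accept this download.
4. Proxy credentials passed at runtime are used to route requests; keep them secure and avoid supplying untrusted credentials.
5. The tool writes Markdown files and an auto-match database to the output directory; review and manage these local files as needed.
Detailed analysis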
✓ Purpose and capability
Name, description, README, SKILL.md and the included Python script all align: they implement fetching public web pages (static or JS), extracting main content and converting HTML to Markdown. Features like stealth, proxies, Playwright, and automatch are legitimate for robust scraping and are consistent with the stated purpose.
ℹ Instruction scope
SKILL.md and the script limit network calls to user-supplied URLs and an optional proxy. The skill provides flags to enable stealth, proxying, and Playwright rendering; these are powerful but described and constrained (rules state not to bypass logins/paywalls). The code dynamically imports the 'scrapling' package at runtime, so actual fetching behavior depends on that external dependency.
✓ Install mechanism
No install spec is included (instruction-only); the README suggests installing third-party Python packages (scrapling, html2text, Playwright). That is a normal, low-risk pattern for an instruction-only Python skill, but it does mean the fetched packages and Playwright binaries will be installed separately by the user.
✓ Credential requirements
The skill declares no required environment variables or credentials. Proxy credentials can be supplied as runtime flags (appropriate for a scraper). The script's security manifest claims it reads only user-provided URL/file inputs and writes only to the chosen output directory and the Scrapling-managed local DB—no unexpected secrets are requested.
ℹ Persistence and privileges
always is false and the skill is user-invocable. It writes local output files and (per its manifest) a Scrapling automatch SQLite DB; this is reasonable for its functionality but does create persistent local artifacts that a user should be aware of.
Security is layered; review the code before running it.
Runtime dependencies
No special dependencies
Version
latest · v1.0.0 · 2026/3/10
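Initial release.
- 4 fetch modes: http, async, stealth (Camoufox), dynamic (Playwright)
- CSS-selector-based content extraction with auto_save / auto_match
- Proxy support, plus humanize, geoip, and block-webrtc options
- --disable-resources and --block-images for faster fetching
- --retry N with exponential backoff
- Structured JSON output with per-page title, Markdown, and status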
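The "--retry N with exponential backoff" behavior listed above can be illustrated with a minimal sketch. This is not the skill's actual implementation: the function name `retry_with_backoff`, its parameters, the 10% jitter, and the delay cap are all assumptions made for illustration, and `fetch` stands in for whatever call performs the real page request.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each failure wait base_delay * 2**attempt, then retry.

    Hypothetical helper: `retries` plays the role of N in `--retry N`.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            # Delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Assumed jitter of up to 10% so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With `retries=3` and `base_delay=1.0`, the waits between attempts are roughly 1 s, 2 s, and 4 s. The `sleep` parameter is injectable only so the helper can be exercised without real delays.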
● Harmless
Install command
Official: npx clawhub@latest install web-markdown-scraper
Mirror: npx clawhub@latest install web-markdown-scraper --registry https://cn.clawhub-mirror.com
Skill documentation
Use this skill when the user wants to:
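- Scrape one or more public web pages (static or JavaScript-rendered)
- Convert HTML pages into clean Markdown
- Extract article/body text for summarization, analysis, or indexing
- Get past anti-bot protection (Cloudflare, Datadome, etc.) using stealth mode
- Fetch multiple URLs concurrently (async mode)
- Track page elements reliably as a site's design changes (auto-match)
- Save the extracted results as .md files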
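As a rough illustration of the concurrent-fetch and save-as-.md use cases above, the sketch below runs several fetches in parallel with a concurrency cap and writes each result to a Markdown file. It does not reproduce the skill's script: `scrape_all`, `md_filename`, the file-naming scheme, and the result fields are hypothetical, and `fetch_markdown` is a placeholder for whatever Scrapling plus html2text call actually produces the Markdown.

```python
import asyncio
import re
from pathlib import Path
from typing import Awaitable, Callable
from urllib.parse import urlparse


def md_filename(url: str) -> str:
    """Derive a filesystem-safe .md name from a URL (assumed naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'page'}.md"


async def scrape_all(
    urls: list[str],
    fetch_markdown: Callable[[str], Awaitable[str]],
    out_dir: Path,
    max_concurrency: int = 5,
) -> list[dict]:
    """Fetch URLs concurrently and write each page to out_dir as Markdown."""
    out_dir.mkdir(parents=True, exist_ok=True)
    limit = asyncio.Semaphore(max_concurrency)  # cap simultaneous fetches

    async def one(url: str) -> dict:
        async with limit:
            try:
                markdown = await fetch_markdown(url)
            except Exception as exc:
                # One failing page should not abort the whole batch.
                return {"url": url, "status": "error", "error": str(exc)}
        path = out_dir / md_filename(url)
        path.write_text(markdown, encoding="utf-8")
        return {"url": url, "status": "ok", "file": str(path)}

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))
```

A caller would run it with something like `asyncio.run(scrape_all(urls, fetch_markdown, Path("out")))`, passing in a coroutine that returns the Markdown for one URL. Recording failures per URL keeps a single bad page from discarding the rest of the batch.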
Data source: ClawHub · Chinese localization: 龙虾技能库