
🌐 Website Scraper Pro — Skill Tool

v0.1.0

Run a local script to scrape a single web page into clean markdown or deterministic JSON with Crawl4AI. Use when: user needs direct page retrieval from a URL...

by @youpele52 · MIT-0
License
MIT-0
Last updated
2026/3/10
Security scan
VirusTotal
Harmless
OpenClaw
Safe
high confidence
The skill's code, instructions, and requirements are coherent with its stated purpose (single-page, JS-aware scraping) and do not request unrelated credentials or system access.
Assessment advice
This skill appears to do what it says: run a local Crawl4AI scraper for a single URL. Before installing or running it, check that the 'uv' binary on your system is the trusted tool you expect (it will execute the inline script block and install Python packages), and confirm you are comfortable having the 'crawl4ai' package and (optionally) Playwright download browser binaries (chromium) at runtime. Avoid scraping sensitive internal URLs. If you need stricter control, review the included Python f...
Detailed analysis
Purpose and capabilities
Name/description, CLI flags, required binary 'uv', and included Python code all align: the skill runs a local Crawl4AI-based scraper for a single URL. No unexpected credentials, system paths, or unrelated binaries are requested.
Instruction scope
SKILL.md instructs running the bundled script via 'uv run'. The code fetches only the target URL, processes page HTML into markdown/JSON, and normalizes links. It does not read arbitrary host files or environment variables, nor does it post scraped data to external endpoints. It does suppress crawler stdout/stderr and handles browser-setup errors as expected.
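Link normalization of the kind described above is usually done with the standard library. A minimal sketch, not the skill's actual code, showing how relative links on a page resolve against the page URL:

```python
from urllib.parse import urljoin

# Resolve relative hrefs against the scraped page's URL, as a scraper
# that "normalizes links" typically would. Illustrative only; the
# skill's real implementation may differ.
base = "https://example.com/docs/page.html"
for href in ["../about", "guide/", "https://other.site/x"]:
    print(urljoin(base, href))
```

Absolute URLs pass through unchanged; relative paths are resolved against the page location.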
Install mechanism
Registry has no install spec, but main.py contains an inline script header that 'uv run' will use to install the 'crawl4ai' dependency into an isolated environment. This is coherent for a bundled script, but it means network package installation will occur at runtime — verify you trust the package source and the 'uv' tool that performs the install.
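The inline script header mentioned above is a PEP 723 metadata block that `uv run` reads to resolve dependencies into an isolated environment. A sketch of what such a header looks like; main.py's actual block may differ (e.g. pinned versions):

```python
# PEP 723 inline script metadata, consumed by 'uv run'.
# Illustrative; not copied from the skill's main.py.
# /// script
# requires-python = ">=3.10"
# dependencies = ["crawl4ai"]
# ///
```

Because `uv run` installs these packages at invocation time, the first run performs network package installation, which is the behavior the assessment flags.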
Credential requirements
No environment variables, credentials, or config paths are requested. The tool asks only for a URL and optional flags, which is proportionate to its function.
Persistence and permissions
The skill is not always-enabled and does not request elevated or persistent system privileges or modify other skills' configurations. Autonomous invocation by the agent is allowed (default) and consistent with normal skill behavior.
Security is layered; review the code before running.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.0 · 2026/3/10

Initial release of Website Scraper Pro.

  • Scrape a single web page into clean markdown or deterministic JSON using Crawl4AI.
  • Supports JS-aware scraping for client-side rendered pages.
  • Deterministic, query-focused narrowing of content without internal AI processing.
  • Outputs either markdown or structured JSON including title, links, and metadata.
  • Usage is limited to single-page extraction; no site-wide crawling or web search.

● Harmless

Install command

Official: npx clawhub@latest install website-scraper-pro
Mirror: npx clawhub@latest install website-scraper-pro --registry https://cn.clawhub-mirror.com

Skill documentation

When to use

  • The user wants the content of a single web page from a specific URL.
  • The user wants clean markdown extracted from an article, docs page, blog post, or landing page.
  • The user wants a JS-aware scrape for a page that depends on client-side rendering.
  • The user wants deterministic query-focused narrowing of one page without using an AI model inside the skill.
  • The user wants structured JSON output with markdown, title, links, and metadata.

When NOT to use

  • The user wants a broad web search across multiple sources.
  • The user wants a site-wide crawl, recursive crawl, or multi-page extraction workflow.
  • The user wants AI summarization, synthesis, or answer generation inside the scraper itself.
  • The user wants authenticated browser automation or interactive form submission.

Commands

Scrape a page to markdown

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <url>

Scrape a JS-heavy page

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <url> --js

Scrape a page and narrow by query

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <url> --query "<query>"

Return deterministic JSON

uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py <url> --format json

Examples

# Default markdown scrape
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com

# JS-aware scrape
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com --js

# Query-focused retrieval
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com --query "documentation examples"

# JSON output
uv run /root/.openclaw/workspace/skills/website-scraper-pro/src/main.py https://example.com --format json

Output

  • Default output is clean markdown for a single page.
  • --query keeps the output deterministic and non-LLM.
  • --format json returns deterministic JSON with fields such as title, url, markdown, links, and metadata when available.
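Assuming the JSON shape described above (the field names come from this documentation; the values below are invented for illustration), the output is plain JSON and can be consumed with the standard library:

```python
import json

# Hypothetical --format json output. Field names (title, url, markdown,
# links, metadata) are from the docs; values here are made up.
sample = """{
  "title": "Example Domain",
  "url": "https://example.com",
  "markdown": "# Example Domain",
  "links": ["https://www.iana.org/domains/example"],
  "metadata": {"description": null}
}"""

page = json.loads(sample)
print(page["title"])       # page title
print(len(page["links"]))  # number of extracted links
```

Fields marked "when available" in the docs may be null or absent, so consumers should use `page.get(...)` rather than direct indexing for optional data.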

Notes

  • This v1 does not use AI models internally. It is a deterministic retrieval tool only.
  • The skill is single-page only. It does not do deep crawling, site maps, schema extraction, or RAG.
  • uv run reads the inline # /// script dependency block in main.py and installs crawl4ai in an isolated environment.
  • If browser setup is missing, run one-time setup commands such as:
    - uv run --with crawl4ai crawl4ai-setup
    - uv run --with crawl4ai python -m playwright install chromium
  • Do NOT use web search for this workflow when a direct URL is available.
  • Call uv run src/main.py directly as shown above.