
Scrapling — Skill Tool

v1.0.3

Web scraping and data extraction using the Python Scrapling library. Use to scrape static HTML pages, JavaScript-rendered pages (Playwright), and anti...

by @piyushzinc (PiyushZinc) · MIT-0
License: MIT-0
Last updated: 2026/3/6
Security scan: VirusTotal — harmless
OpenClaw: safe (medium confidence)
The skill's code, instructions, and requirements line up with a web-scraping helper that wraps a Python library (scrapling); nothing in the package appears to request unrelated credentials or perform hidden exfiltration, but installing third-party packages and using stealth fetchers has normal operational and legal risks to review.
Assessment Recommendations
This skill is coherent for web scraping, but before installing: 1) Review the third-party Python package 'scrapling' (PyPI, source repo, maintainers) to ensure it is trustworthy; 2) Be aware that pip installing extras and running Playwright will download and execute external code and browser binaries — do so in a controlled environment if unsure; 3) Scraping stealth/anti-bot protected sites can violate terms of service or laws — only use against sites you are authorized to scrape; 4) The example...
Detailed Analysis
Purpose and Capabilities
Name and description match the included SKILL.md and the Python helper. The skill documents static, dynamic, and stealthy fetchers and includes a matching CLI/py script. There are no environment variables, config paths, or unrelated binaries requested that would be inconsistent with a scraping tool.
Instruction Scope
SKILL.md and the script stay within scraping scope: they instruct installing scrapling and Playwright, choosing fetchers, running the included CLI, and optionally using sessions (including a login example). The instructions do show examples that post login forms (session.post) which implies handling credentials, but the skill does not request or capture secrets itself. The doc also recommends respecting site terms and adding safety controls.
Installation Mechanism
This is an instruction-only skill with no install spec; it tells users to pip install 'scrapling' and optional extras and to run Playwright installer. Installing Python packages and Playwright is expected for this functionality, but it does entail downloading and executing third-party code (PyPI packages and browser drivers), which is normal but should be reviewed before installation.
Credential Requirements
The skill declares no required environment variables, credentials, or config paths. Example code demonstrates how to post credentials for login flows, which is appropriate for session-based scraping, but the skill itself does not request or attempt to exfiltrate secrets.
Persistence and Permissions
The skill is not always-included and allows normal autonomous invocation. It does not request permanent system-wide privileges or modify other skills' configurations. There is no install-time behavior in the bundle that persists state beyond normal use.
Security is layered; review the code before running.

License

MIT-0

Free to use, modify, and redistribute, with no attribution required.

Runtime Dependencies

No special dependencies

Versions

latest · v1.0.3 · 2026/3/5



Install Command

Official: npx clawhub@latest install scrapling-extract
Mirror: npx clawhub@latest install scrapling-extract --registry https://cn.clawhub-mirror.com

Skill Documentation

Extract structured website data with resilient selection patterns, adaptive relocation, and the right Scrapling fetcher mode for each target.

Workflow

  • Identify target type before writing code:
    - Use Fetcher for static pages and API-like HTML responses.
    - Use DynamicFetcher when JavaScript rendering is required.
    - Use StealthyFetcher when anti-bot protection or browser fingerprinting issues are likely.
  • Choose output contract first:
    - Return JSON for pipelines/automation.
    - Return Markdown/text for summarization or RAG ingestion.
    - Keep stable field names even if selector strategy changes.
  • Implement selectors in this order:
    - Start with CSS selectors and pseudo-elements (for example ::text, ::attr(href)).
    - Fall back to XPath for ambiguous DOM structure.
    - Enable adaptive relocation for brittle or changing pages.
  • Add safety controls:
    - Respect target site terms and legal boundaries.
    - Add timeouts, retries, and explicit error handling.
    - Log status code, URL, and selector misses for debugging.
  • Validate on at least 2 pages:
    - Test one happy path and one edge case page.
    - Confirm required fields are non-empty.
    - Keep extraction deterministic (no hidden random choices).
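The output-contract and validation steps above can be sketched without Scrapling itself. This is a minimal illustration, assuming a hypothetical record shape (`REQUIRED_FIELDS` and the field names are not part of the skill):

```python
import json

# Hypothetical stable output contract: field names stay fixed even if
# the selector strategy that fills them changes later.
REQUIRED_FIELDS = ("url", "title")

def to_record(url, title, body=None):
    """Normalize one scraped page into the stable JSON contract."""
    return {"url": url, "title": title, "body": body}

def validate(record):
    """Confirm required fields are non-empty (workflow step 5)."""
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"empty required fields: {missing}")
    return record

record = validate(to_record("https://example.com", "Example Domain"))
# sort_keys keeps the serialized output deterministic for pipelines
print(json.dumps(record, sort_keys=True))
# → {"body": null, "title": "Example Domain", "url": "https://example.com"}
```

A page with an empty title fails validation immediately instead of propagating a half-filled record downstream.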

Quick Setup

  • Install base package:
    - pip install scrapling
  • Install fetchers when browser-based fetching is needed:
    - pip install "scrapling[fetchers]"
    - scrapling install
    - python3 -m playwright install (required for DynamicFetcher and StealthyFetcher)
  • Install optional extras as needed:
    - pip install "scrapling[shell]" for shell + extract commands
    - pip install "scrapling[ai]" for MCP capabilities
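A quick preflight check can report which fetcher tiers the current environment supports before any scraping runs. This sketch assumes the base package imports as `scrapling` and the browser layer as `playwright`:

```python
import importlib.util

def has_module(name):
    """True if the module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

def preflight():
    """Map each fetcher tier to whether its dependencies are installed."""
    base = has_module("scrapling")
    # Dynamic and stealthy fetchers additionally need Playwright browsers.
    browsers = base and has_module("playwright")
    return {"static": base, "dynamic": browsers, "stealthy": browsers}

print(preflight())
```

Running this before a job fails fast with a readable report instead of an ImportError mid-scrape.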

Execution Patterns

Pattern: One-off terminal extraction

Use Scrapling CLI for fastest no-code extraction:

scrapling extract get "https://example.com" content.md --css-selector "main"

Pattern: Python extraction script

Use the bundled helper:

# Static page (default)
python scripts/extract_with_scrapling.py --url "https://example.com" --css "h1::text"

# JavaScript-rendered page
python scripts/extract_with_scrapling.py --url "https://example.com" --fetcher dynamic --css "h1::text"

# Anti-bot protected page
python scripts/extract_with_scrapling.py --url "https://example.com" --fetcher stealthy --css "h1::text"
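The three invocations differ only in the --fetcher flag, so automation can build the argv list programmatically. The helper below is a hypothetical wrapper, not part of the bundled script:

```python
def build_cmd(url, css, fetcher="static"):
    """Build the argv list for the bundled extraction script.

    `fetcher` mirrors the script's --fetcher flag: "static" (the
    default, so the flag is omitted), "dynamic", or "stealthy".
    """
    cmd = ["python", "scripts/extract_with_scrapling.py", "--url", url, "--css", css]
    if fetcher != "static":
        cmd += ["--fetcher", fetcher]
    return cmd

# Pass the list to subprocess.run(cmd, check=True) to actually execute it.
print(build_cmd("https://example.com", "h1::text", fetcher="dynamic"))
```

Keeping the command as a list (rather than a shell string) avoids quoting bugs when URLs contain `&` or spaces.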

Pattern: Session-based scraping

Use session classes when cookies/state must persist across requests.

from scrapling.fetchers import FetcherSession

session = FetcherSession()
login_page = session.post("https://example.com/login", data={"user": "...", "pass": "..."})
protected_page = session.get("https://example.com/dashboard")
headline = protected_page.css_first("h1::text")

Use StealthySession or DynamicSession as drop-in replacements for anti-bot or JS-rendered targets.
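The safety-controls step (timeouts, retries, logging) applies to any session call. This generic stdlib retry wrapper is a sketch demonstrated with a stand-in fetch function rather than a live Scrapling session:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scrape")

def fetch_with_retry(fetch, url, attempts=3, backoff=0.1):
    """Call fetch(url), retrying on failure with exponential backoff."""
    for i in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # narrow this to network errors in real code
            log.warning("attempt %d/%d failed for %s: %s", i, attempts, url, exc)
            if i == attempts:
                raise
            time.sleep(backoff * 2 ** (i - 1))

# Stand-in fetcher that times out once, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("simulated timeout")
    return f"<h1>{url}</h1>"

print(fetch_with_retry(flaky_fetch, "https://example.com"))
# → <h1>https://example.com</h1>
```

In real use, `fetch` would be `session.get` and the logged misses feed the debugging step from the workflow.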

Pattern: DOM change resilience

Use auto_save=True on initial capture and retry with adaptive selection on later runs when selectors break.

from scrapling.fetchers import Fetcher

# First run: saves DOM snapshot so adaptive relocation can work later
page = Fetcher.auto_match("https://example.com", auto_save=True, disable_adaptive=False)
price = page.css_first(".price::text")

# Later runs: automatically relocates the selector even if the DOM changed
page = Fetcher.auto_match("https://example.com", auto_save=False, disable_adaptive=False)
price = page.css_first(".price::text")

References

Data source: ClawHub ↗ · Chinese localization: 龙虾技能库