Scrapeless Webunlocker Skill
v1.0.0
Bypass website blocks and scrape web content using the Scrapeless Universal Scraping API.
Web Unlocker OpenClaw Skill
Use this skill to bypass website blocks and scrape web content using the Scrapeless Universal Scraping API. It supports JavaScript rendering, CAPTCHA solving, IP rotation, and intelligent request retries.
Authentication: Set X_API_TOKEN in your environment or in a .env file in the repo root.
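A preflight check can catch a missing token before any request is made. The following is a minimal sketch, not the skill's own loader: the helper name `find_token`, the simple `KEY=value` parsing, and the rule that the environment takes precedence over the .env file are all assumptions made for illustration.

```python
import os
from pathlib import Path


def find_token(env=None, dotenv_path=".env", name="X_API_TOKEN"):
    """Return the token from the environment, else from a .env file, else None.

    Hypothetical helper: the precedence (environment first) is an assumption.
    """
    env = os.environ if env is None else env
    if env.get(name):
        return env[name]
    path = Path(dotenv_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"") or None
    return None
```

Calling `find_token()` from the repo root returns `None` when neither source defines the token, which is a convenient point to stop with a clear message.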
Errors: On failure the script writes a JSON error to stderr and exits with code 1.
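When calling the script from other Python code, the exit code and stderr can be turned into an exception. This is a sketch under stated assumptions: the script path `scripts/webunlocker.py`, the idea that successful results arrive as JSON on stdout, and the names `UnlockerError`, `interpret`, and `run_unlocker` are illustrative and not part of the skill.

```python
import json
import subprocess
import sys


class UnlockerError(RuntimeError):
    """Raised when the script exits non-zero; .details holds the parsed error."""

    def __init__(self, details):
        super().__init__(str(details))
        self.details = details


def interpret(returncode, stdout, stderr):
    """Map a finished process to a parsed result or an UnlockerError."""
    if returncode != 0:
        try:
            details = json.loads(stderr)
        except json.JSONDecodeError:
            # stderr was not JSON (for example a Python traceback); keep it raw.
            details = {"raw_stderr": stderr.strip()}
        raise UnlockerError(details)
    return json.loads(stdout)  # assumption: the JSON result is printed to stdout


def run_unlocker(url, extra_args=(), script="scripts/webunlocker.py"):
    """Run the script for one URL. The script path is an assumption."""
    proc = subprocess.run(
        [sys.executable, script, "--url", url, *extra_args],
        capture_output=True,
        text=True,
        timeout=190,  # slightly above the 180-second global timeout under Notes
    )
    return interpret(proc.returncode, proc.stdout, proc.stderr)
```

Keeping `interpret` separate from the subprocess call makes the error handling easy to exercise without network access.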
Usage
Command:
python3 scripts/webunlocker.py --url "https://example.com"
Examples:
# Scrape HTML content
python3 scripts/webunlocker.py --url "https://httpbin.io/get"
# Scrape plain text
python3 scripts/webunlocker.py --url "https://example.com" --response-type plaintext
# Scrape as Markdown
python3 scripts/webunlocker.py --url "https://example.com" --response-type markdown
# Take a screenshot
python3 scripts/webunlocker.py --url "https://example.com" --response-type png
# Capture network requests
python3 scripts/webunlocker.py --url "https://example.com" --response-type network
# Extract specific content types
python3 scripts/webunlocker.py --url "https://example.com" --response-type content --content-types emails,links,images
# Use a specific country proxy
python3 scripts/webunlocker.py --url "https://example.com" --country US
# Use POST method
python3 scripts/webunlocker.py --url "https://httpbin.org/post" --method POST --data '{"key": "value"}'
# Add custom headers
python3 scripts/webunlocker.py --url "https://example.com" --headers '{"User-Agent": "Mozilla/5.0"}'
# Use custom proxy
python3 scripts/webunlocker.py --url "https://example.com" --proxy-url "http://your-proxy-url:port"
# Enable JavaScript rendering
python3 scripts/webunlocker.py --url "https://example.com" --js-render
# Enable JavaScript rendering with headless mode
python3 scripts/webunlocker.py --url "https://example.com" --js-render --headless
# Enable JavaScript rendering and wait for a specific element
python3 scripts/webunlocker.py --url "https://example.com" --js-render --wait-selector "body > div > p:nth-child(3) > a"
# Bypass Cloudflare protection with JavaScript rendering
python3 scripts/webunlocker.py --url "https://example.com" --js-render
# Bypass Cloudflare Turnstile challenge
python3 scripts/webunlocker.py --url "https://2captcha.com/demo/cloudflare-turnstile-challenge" --js-render --headless --response-type markdown
Summary
| Argument | Description | Default |
| --- | --- | --- |
| --url | Target URL | Required |
| --method | HTTP method | GET |
| --redirect | Allow redirects | False |
| --headers | Custom headers as JSON string | None |
| --data | Request data as JSON string | None |
| --response-type | Response type (html, plaintext, markdown, png, jpeg, network, content) | html |
| --content-types | Content types to extract (comma-separated) | None |
| --country | Country code for proxy | ANY |
| --proxy-url | Custom proxy URL | None |
| --js-render | Enable JavaScript rendering | False |
| --headless | Run browser in headless mode | False |
| --wait-selector | Wait for element with this selector to appear | None |
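The arguments above can be assembled programmatically, which avoids shell-quoting mistakes with the JSON values for headers and data. The helper below is a hypothetical sketch: the function name `build_args` is invented here, and it assumes that --redirect, --js-render, and --headless are value-less switches, which the examples show only for the latter two.

```python
import json


def build_args(url, *, method=None, redirect=False, headers=None, data=None,
               response_type=None, content_types=None, country=None,
               proxy_url=None, js_render=False, headless=False,
               wait_selector=None):
    """Build the script's argument list; options left unset are omitted."""
    args = ["--url", url]
    valued = {
        "--method": method,
        # Dicts are serialized to the JSON strings the script expects.
        "--headers": json.dumps(headers) if headers is not None else None,
        "--data": json.dumps(data) if data is not None else None,
        "--response-type": response_type,
        "--content-types": ",".join(content_types) if content_types else None,
        "--country": country,
        "--proxy-url": proxy_url,
        "--wait-selector": wait_selector,
    }
    for flag, value in valued.items():
        if value is not None:
            args += [flag, value]
    # Assumption: boolean options are plain switches with no value.
    switches = {"--redirect": redirect, "--js-render": js_render,
                "--headless": headless}
    args += [flag for flag, enabled in switches.items() if enabled]
    return args
```

The returned list can be passed directly to `subprocess.run` after the interpreter and script path, so no shell quoting is involved.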
Output: All commands return JSON objects with the scraped content or Cloudflare bypass results.
Response Types
HTML
Returns the HTML content of the page as an escaped string.
Plaintext
Returns the plain text content of the page, removing all HTML tags.
Markdown
Returns the page content formatted as Markdown for better readability.
PNG/JPEG
Returns a base64 encoded string of the page screenshot.
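A base64 string has to be decoded before it can be opened as an image. The sketch below assumes you have already pulled the encoded string out of the JSON output; which field holds it is not stated here, so the helper (a hypothetical `save_screenshot`) takes the string directly.

```python
import base64
from pathlib import Path


def save_screenshot(encoded, out_path):
    """Decode a base64 screenshot string, write it to out_path, return its size."""
    # Tolerate a data-URI prefix such as "data:image/png;base64," if present.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    # Remove whitespace and line breaks so strict validation does not reject them.
    encoded = "".join(encoded.split())
    raw = base64.b64decode(encoded, validate=True)
    Path(out_path).write_bytes(raw)
    return len(raw)
```

Pick the output extension to match the requested response type (png or jpeg); the function writes the bytes as-is and does not convert between formats.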
Network
Returns all network requests made during page load, including URLs, methods, status codes, and headers.
Content
Returns specific content types extracted from the page, such as emails, phone numbers, headings, images, audios, videos, links, hashtags, metadata, tables, and favicon.
Notes
⚠️ Timeout Policy:
Page load timeout: 30 seconds
Global execution timeout: 180 seconds
⚠️ Supported CAPTCHAs:
reCAPTCHA v2
Cloudflare Turnstile
Cloudflare Challenge
⚠️ Rate Limits:
429 errors indicate that the rate limit has been exceeded. Reduce request frequency or upgrade your plan.
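One common way to reduce request frequency is to retry with exponential backoff. The following is a generic sketch, not behavior built into the skill: how a 429 surfaces in the script's error output is not specified here, so the caller supplies an `is_rate_limited` predicate, and the function name, attempt count, and delays are illustrative assumptions.

```python
import random
import time


def retry_on_rate_limit(call, is_rate_limited, attempts=5, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call `call()` and retry with exponential backoff while rate limited.

    `is_rate_limited(exc)` decides whether an exception means a 429.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_rate_limited(exc):
                raise
            # Delay doubles each round (1s, 2s, 4s, ...) up to max_delay,
            # plus random jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` is fine.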
⚠️ Billing:
Charges are applied on a per-request basis.
Only successful requests are billed.