Crawlee Web Scraper
v1.0.0
Resilient web scraper with bot-detection evasion using the Crawlee library. Use when web_fetch is blocked by rate limits or bot detection. Supports single URLs, bulk file input, and automatic fallback from requests to Crawlee on 403/429 responses.
crawlee-web-scraper
Drop-in replacement for web_fetch when sites block automated requests. Crawlee handles session management, retry logic, and bot-detection evasion automatically.
Scripts
crawlee_fetch.py: main scraper; accepts a single URL or a file of URLs; returns JSON
crawlee_http.py: library helper; tries requests first, falls back to Crawlee on 403/429/503

Usage
# Single URL, return HTML preview
python3 scripts/crawlee_fetch.py --url "https://example.com"

# Single URL, extract text (strips HTML tags)
python3 scripts/crawlee_fetch.py --url "https://example.com" --extract-text

# Bulk scrape from file
python3 scripts/crawlee_fetch.py --urls-file urls.txt --output results.json
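
The layout of urls.txt is not spelled out here. The following is a minimal sketch for checking such a file before a bulk run, assuming one URL per line and assuming that blank lines and lines starting with `#` should be ignored. Both are assumptions, and `load_urls` is a hypothetical helper, not part of the scripts above.

```python
from pathlib import Path
from urllib.parse import urlparse


def load_urls(path):
    """Read URLs from a text file, one per line (assumed format).

    Blank lines and lines starting with '#' are skipped; duplicates are
    dropped while keeping first-seen order.
    """
    seen = set()
    urls = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parsed = urlparse(line)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {line!r}")
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls
```

Running a check like this first may catch malformed entries early, before any requests are sent.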

Library usage
from crawlee_http import fetch_with_fallback

resp = fetch_with_fallback("https://example.com")
print(resp.status_code, resp.text[:500])
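
The fallback behaviour described for crawlee_http.py (try requests first, switch to Crawlee on 403/429/503) can be illustrated with a small control-flow sketch. This is not the actual implementation of `fetch_with_fallback`: the names `fetch_with_fallback_sketch`, `SimpleResponse`, and the two fetcher callables are hypothetical, and the fetchers are passed in as plain functions so the logic runs without network access.

```python
from dataclasses import dataclass
from typing import Callable

# Status codes treated as "blocked", taken from the helper description.
BLOCKED_STATUSES = frozenset({403, 429, 503})


@dataclass
class SimpleResponse:
    """Minimal stand-in for a response object (hypothetical type)."""
    status_code: int
    text: str


Fetcher = Callable[[str], SimpleResponse]


def fetch_with_fallback_sketch(url: str, primary: Fetcher, fallback: Fetcher) -> SimpleResponse:
    """Call the cheap fetcher first; use the fallback only when blocked."""
    resp = primary(url)
    if resp.status_code in BLOCKED_STATUSES:
        # The first attempt was rejected, so hand the URL to the heavier fetcher.
        return fallback(url)
    return resp
```

In a real setup, `primary` would presumably wrap a plain requests call and `fallback` a Crawlee crawler run; keeping them as injected callables makes the decision logic easy to test in isolation.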

Output
JSON array with one object per URL:
[
  {
    "url": "https://example.com",
    "status": 200,
    "fetched_at": "2026-01-01T00:00:00Z",
    "length": 12345,
    "text": "Page content..."
  }
]
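
A results file in this shape can be post-processed with the standard library alone. The sketch below assumes the keys are named `url`, `status`, `fetched_at`, `length`, and `text`, and treats any 2xx status as a success. `split_results` is a hypothetical helper, and the second sample entry is made up for illustration.

```python
import json


def split_results(results):
    """Partition result objects into successes (2xx) and everything else."""
    ok, failed = [], []
    for item in results:
        status = item.get("status")
        if isinstance(status, int) and 200 <= status < 300:
            ok.append(item)
        else:
            failed.append(item)
    return ok, failed


# Sample data in the documented shape; in practice this would be read from
# the file passed to --output.
sample = json.loads("""
[
  {"url": "https://example.com", "status": 200,
   "fetched_at": "2026-01-01T00:00:00Z", "length": 12345, "text": "Page content..."},
  {"url": "https://example.org", "status": 429,
   "fetched_at": "2026-01-01T00:00:05Z", "length": 0, "text": ""}
]
""")

ok, failed = split_results(sample)
print("fetched:", [r["url"] for r in ok])
print("to retry:", [r["url"] for r in failed])
```

The URLs in the failed list could then be written back to a new URLs file for a second pass.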

Installation
pip install crawlee requests

When to use
web_fetch returns 403 / 429 / an empty response
Bulk scraping of 10+ URLs
Sites using Cloudflare or similar bot protection