Web Fetcher: Smart Web Content Fetcher

Name: Web Fetcher (Smart Web Content Fetcher)
Author: alexxiong

v0.1.1

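Smart fetching of web content (articles, videos), with support for WeChat, Feishu, Bilibili, Zhihu, Toutiao, YouTube, and other platforms. Triggers: "fetch article", "download web page", "save article", "fetch URL", etc.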

by @alexxxiong (alexxiong) · MIT-0

Tags: network tools · browser automation · AI model access · system tools

License: MIT-0

Last updated: 2026/4/12

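Security scan

VirusTotal: Suspicious

OpenClaw: Error

Static analysis: 1 pattern detected

Assessment

These patterns may indicate risky behavior. Before installing, review the VirusTotal and OpenClaw results above for context-aware analysis.

Detailed analysis

⚠ lib/article.py:178

Dynamic code execution detected

Security is layered; review the code before running it.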
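As a starting point for reviewing the flagged line yourself, the sketch below uses Python's standard `ast` module to list direct calls to `eval`, `exec`, `compile`, and `__import__` in a source file. It is a hypothetical helper (`find_dynamic_exec` and `scan_file` are not part of the skill), and the set of names it checks is an assumption: attribute calls, such as an `evaluate` method on a browser page object, are not matched, so read line 178 of `lib/article.py` directly as well.

```python
import ast
from pathlib import Path

# Built-in names whose direct calls count as "dynamic code execution" here.
# This set is an assumption for the sketch; real scanners check many more patterns.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def find_dynamic_exec(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, function name) pairs for suspicious direct calls."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # Only plain-name calls like eval(...), not obj.eval(...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


def scan_file(path: str) -> list[tuple[int, str]]:
    """Scan one Python file, e.g. lib/article.py from the unpacked skill."""
    return find_dynamic_exec(Path(path).read_text(encoding="utf-8"))
```

Run from the unpacked skill directory, `scan_file("lib/article.py")` returns `(line, name)` pairs to inspect; an empty list only means no direct call by those names was found.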

MIT-0: free to use, modify, and redistribute, with no attribution required.

Runtime dependencies: no special dependencies

Versions

v0.1.1 (latest, 2026/3/10): Initial release, a smart web content fetcher for articles and videos

● Suspicious

Install command

Official: `npx clawhub@latest install web-fetcher`

Mirror (accelerated): `npx clawhub@latest install web-fetcher --registry https://cn.clawhub-mirror.com`

Skill Documentation

Smart Web Fetcher

A smart web content fetcher for Claude Code. It automatically detects the platform from the URL and uses the best strategy for it to fetch articles or download videos.

Quick Start

```bash
# Fetch an article
python3 {SKILL_DIR}/fetcher.py "URL" -o ~/docs/
# Download a video
python3 {SKILL_DIR}/fetcher.py "https://b23.tv/xxx" -o ~/videos/
# Batch fetch from file
python3 {SKILL_DIR}/fetcher.py --urls-file urls.txt -o ~/docs/
```
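The batch command reads `urls.txt`, which the CLI reference describes as one URL per line with `#` for comments. A minimal sketch of parsing such a file, assuming `#` marks a whole-line comment and blank lines are skipped; `parse_url_lines` and `read_urls` are hypothetical names, and the skill's own parser may treat edge cases (such as trailing comments) differently.

```python
from pathlib import Path


def parse_url_lines(lines: list[str]) -> list[str]:
    """Keep non-empty, non-comment lines, stripped of surrounding whitespace."""
    urls = []
    for raw in lines:
        line = raw.strip()
        # Assumption: a '#' comment occupies the whole line
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


def read_urls(path: str) -> list[str]:
    """Read a urls.txt-style file and return its URLs in file order."""
    text = Path(path).read_text(encoding="utf-8")
    return parse_url_lines(text.splitlines())
```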

Install Dependencies

Install only what you need — dependencies are checked at runtime:

| Dependency | Purpose | Install |
|------------|---------|---------|
| scrapling | Article fetching (HTTP + browser) | `pip install scrapling` |
| yt-dlp | Video download | `pip install yt-dlp` |
| camoufox | Anti-detection browser (Xiaohongshu, Weibo) | `pip install camoufox && python3 -m camoufox fetch` |
| html2text | HTML to Markdown conversion | `pip install html2text` |
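Checking dependencies at runtime usually means probing for a package only when a code path actually needs it. A minimal sketch using `importlib.util.find_spec`, which locates a top-level module without importing it. `missing_for` is a hypothetical helper (not the skill's `check_dependency`), and the import names (`scrapling`, `yt_dlp`, `camoufox`, `html2text`) are the modules these packages normally install.

```python
import importlib.util
from typing import Dict, List

# Import name -> install hint, mirroring the dependency table
DEPENDENCIES: Dict[str, str] = {
    "scrapling": "pip install scrapling",
    "yt_dlp": "pip install yt-dlp",
    "camoufox": "pip install camoufox && python3 -m camoufox fetch",
    "html2text": "pip install html2text",
}


def missing_for(modules: List[str], deps: Dict[str, str] = DEPENDENCIES) -> List[str]:
    """Return install hints for requested modules that are not installed."""
    hints = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            hints.append(deps.get(name, f"pip install {name}"))
    return hints
```

For example, a video download path could call `missing_for(["yt_dlp"])` and print any returned hint before giving up.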

Smart Routing

The fetcher automatically detects the platform from the URL:

| Platform | Method | Notes |
|----------|--------|-------|
| mp.weixin.qq.com | scrapling | Extracts `data-src` images, handles SVG placeholders |
| .feishu.cn | Virtual scroll | Collects all blocks via scrolling, downloads images with cookies |
| zhuanlan.zhihu.com | scrapling | `.Post-RichText` selector |
| www.zhihu.com | scrapling | `.RichContent` selector |
| www.toutiao.com | scrapling | Handles `toutiaoimg.com` base64 placeholders |
| www.xiaohongshu.com | camoufox | Anti-bot protection requires a stealth browser |
| www.weibo.com | camoufox | Anti-bot protection requires a stealth browser |
| bilibili.com / b23.tv | yt-dlp | Video download, supports quality selection |
| youtube.com / youtu.be | yt-dlp | Video download |
| douyin.com | yt-dlp | Video download |
| Unknown URLs | scrapling | Generic fetch with fallback tiers |
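The routing table amounts to an ordered list of hostname rules with a generic fallback. A minimal sketch of that lookup using `urllib.parse`; `route_sketch` is a hypothetical stand-in for `lib.router.route`. The result keys follow the `route()` output shown under Manual Usage, the method names follow the CLI's `--method` values, and the `type` for Feishu plus the `None` selectors are assumptions where the table gives no detail.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

Route = Dict[str, Optional[str]]


def _r(kind: str, method: str, selector: Optional[str] = None) -> Route:
    return {"type": kind, "method": method, "selector": selector}


# (domain, route) pairs in priority order; first match wins
RULES: List[Tuple[str, Route]] = [
    ("mp.weixin.qq.com", _r("article", "scrapling", "#js_content")),
    ("feishu.cn", _r("article", "feishu")),  # type is an assumption
    ("zhuanlan.zhihu.com", _r("article", "scrapling", ".Post-RichText")),
    ("www.zhihu.com", _r("article", "scrapling", ".RichContent")),
    ("www.toutiao.com", _r("article", "scrapling")),
    ("www.xiaohongshu.com", _r("article", "camoufox")),
    ("www.weibo.com", _r("article", "camoufox")),
    ("bilibili.com", _r("video", "ytdlp")),
    ("b23.tv", _r("video", "ytdlp")),
    ("youtube.com", _r("video", "ytdlp")),
    ("youtu.be", _r("video", "ytdlp")),
    ("douyin.com", _r("video", "ytdlp")),
]
GENERIC: Route = _r("article", "scrapling")


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def route_sketch(url: str) -> Route:
    host = (urlparse(url).hostname or "").lower()
    for domain, config in RULES:
        if host_matches(host, domain):
            return dict(config)  # copy so callers cannot mutate the rules
    return dict(GENERIC)
```

Matching on the parsed hostname, rather than on a substring of the whole URL, keeps a lookalike host such as `notbilibili.com` from being routed to yt-dlp.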

CLI Reference

```text
python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]

Arguments:
  url                    URL to fetch

Options:
  -o, --output DIR       Output directory (default: current)
  -q, --quality N        Video quality, e.g. 1080, 720 (default: 1080)
  --method METHOD        Force method: scrapling, camoufox, ytdlp, feishu
  --selector CSS         Force CSS selector for content extraction
  --urls-file FILE       File with URLs (one per line, # for comments)
  --audio-only           Extract audio only (video downloads)
  --no-images            Skip image download (articles)
  --cookies-browser NAME Browser for cookies (e.g., chrome, firefox)
```
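The option list maps directly onto `argparse`. This hypothetical reconstruction is for checking how the flags and defaults fit together; it is not the skill's own `fetcher.py`. Making `url` optional with `nargs="?"` is an assumption so that `--urls-file` can be used on its own, as in the batch example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="fetcher.py")
    # Optional so that --urls-file can replace the positional URL (assumption)
    p.add_argument("url", nargs="?", help="URL to fetch")
    p.add_argument("-o", "--output", default=".", metavar="DIR",
                   help="Output directory (default: current)")
    p.add_argument("-q", "--quality", default="1080", metavar="N",
                   help="Video quality, e.g. 1080, 720")
    p.add_argument("--method", choices=["scrapling", "camoufox", "ytdlp", "feishu"],
                   help="Force a fetch method")
    p.add_argument("--selector", metavar="CSS",
                   help="Force CSS selector for content extraction")
    p.add_argument("--urls-file", metavar="FILE",
                   help="File with URLs (one per line, # for comments)")
    p.add_argument("--audio-only", action="store_true",
                   help="Extract audio only (video downloads)")
    p.add_argument("--no-images", action="store_true",
                   help="Skip image download (articles)")
    p.add_argument("--cookies-browser", metavar="NAME",
                   help="Browser to read cookies from, e.g. chrome")
    return p
```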

Platform Notes

WeChat (mp.weixin.qq.com)

- Images use the `data-src` attribute with `mmbiz.qpic.cn` URLs
- The visible image tags contain SVG placeholders (lazy loading)
- Image downloads require a `Referer: https://mp.weixin.qq.com/` header
- A plain Scrapling GET usually works; no browser is needed
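These notes describe two steps: read the real image URL from `data-src` instead of the placeholder, then request it with the WeChat `Referer`. A minimal standard-library sketch of both; the class and function names are hypothetical, and keeping only `mmbiz.qpic.cn` URLs is an assumption based on the first note.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import Request

WECHAT_REFERER = "https://mp.weixin.qq.com/"


class DataSrcCollector(HTMLParser):
    """Collect data-src URLs from <img> tags that point at mmbiz.qpic.cn."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # The visible src may be an SVG placeholder; the real URL is in data-src
        data_src = dict(attrs).get("data-src")
        if data_src and "mmbiz.qpic.cn" in data_src:
            self.urls.append(data_src)


def wechat_image_urls(html: str) -> List[str]:
    parser = DataSrcCollector()
    parser.feed(html)
    return parser.urls


def image_request(url: str) -> Request:
    """Build (but do not send) a request carrying the Referer WeChat expects."""
    return Request(url, headers={"Referer": WECHAT_REFERER})
```

Passing the result to `urllib.request.urlopen(image_request(url))` would download the image bytes; the sketch itself makes no network calls.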

Feishu (.feishu.cn)

- Uses virtual scrolling: content blocks are rendered on demand
- The fetcher scrolls through the entire document, collecting `[data-block-id]` elements
- Images require an authenticated fetch (cookies), so they are downloaded via the browser's `fetch` API
- The page may contain "Unable to print" artifacts; these are cleaned up automatically
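Because virtual scrolling only renders the blocks near the viewport, each scroll step exposes just part of the document. The bookkeeping can be sketched without a browser: merge successive snapshots of `(data-block-id, html)` pairs, keep first-seen order, and stop once scrolling adds nothing new. `merge_snapshots` and the snapshot format are assumptions for illustration; in practice each snapshot would come from querying `[data-block-id]` in the page after a scroll.

```python
from typing import Dict, Iterable, List, Tuple

Block = Tuple[str, str]  # (data-block-id, rendered html)


def merge_snapshots(snapshots: Iterable[List[Block]],
                    max_idle_steps: int = 1) -> List[Block]:
    """Merge scroll snapshots into one ordered, de-duplicated block list.

    Each snapshot is the list of blocks visible after one scroll step.
    Collection stops after `max_idle_steps` consecutive steps that
    contribute no new block ids.
    """
    seen: Dict[str, str] = {}  # dicts keep insertion order (Python 3.7+)
    idle = 0
    for snapshot in snapshots:
        added = 0
        for block_id, html in snapshot:
            if block_id not in seen:
                seen[block_id] = html
                added += 1
        idle = 0 if added else idle + 1
        if idle >= max_idle_steps:
            break
    return list(seen.items())
```

Because the loop stops consuming snapshots once nothing new appears, a lazy generator that scrolls one step per iteration would also stop scrolling at that point.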

Bilibili

- Short links (`b23.tv`) are resolved automatically
- For premium/member content, use `--cookies-browser chrome`
- Default quality is 1080p; adjust it with `-q`
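The quality, audio-only, and cookie options map naturally onto yt-dlp's embedding options. A minimal sketch, assuming the skill drives yt-dlp through its Python API (it may call the CLI instead); `build_ytdlp_opts` is hypothetical, while `format`, `outtmpl`, and `cookiesfrombrowser` are real `YoutubeDL` option keys.

```python
import os
from typing import Optional


def build_ytdlp_opts(output_dir: str, quality: str = "1080",
                     audio_only: bool = False,
                     cookies_browser: Optional[str] = None) -> dict:
    """Return a yt_dlp.YoutubeDL options dict for the -q / --audio-only / --cookies-browser flags."""
    if audio_only:
        # Best audio-only stream, falling back to the best single file
        fmt = "bestaudio/best"
    else:
        # Best video up to the requested height plus best audio,
        # or the best single file within that height as a fallback
        fmt = f"bestvideo[height<={quality}]+bestaudio/best[height<={quality}]"
    opts = {
        "format": fmt,
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
    }
    if cookies_browser:
        # Same effect as yt-dlp's --cookies-from-browser NAME
        opts["cookiesfrombrowser"] = (cookies_browser,)
    return opts
```

These options would be used as `with yt_dlp.YoutubeDL(opts) as ydl: ydl.download([url])`. Merging separate video and audio streams requires ffmpeg to be installed.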

Troubleshooting

| Problem | Solution |
|---------|----------|
| `scrapling not found` | `pip install scrapling` |
| `yt-dlp not found` | `pip install yt-dlp` |
| Article content too short | Try `--method camoufox` for JS-heavy pages |
| Feishu returns a login page | The document may require authentication |
| Bilibili 403 | Use `--cookies-browser chrome` |
| Image download fails | Check your network connection; WeChat images need a Referer header (handled automatically) |
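The "content too short" fix suggests a tiered strategy: try the cheap HTTP fetch first and escalate to a browser only when the extracted text looks truncated. A minimal sketch of such a ladder with the fetchers passed in as plain callables; `fetch_with_fallback`, the tier order, and the 200-character threshold are all hypothetical, not values taken from the skill.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Fetcher = Callable[[str], str]


def fetch_with_fallback(url: str,
                        tiers: Sequence[Tuple[str, Fetcher]],
                        min_chars: int = 200) -> Tuple[str, str]:
    """Try each (name, fetcher) tier in order; return the first long-enough result.

    A tier that raises, or returns fewer than `min_chars` characters of
    stripped text, is skipped. If no tier is long enough, the longest text
    seen is returned so the caller can still inspect it.
    """
    best: Optional[Tuple[str, str]] = None
    errors: List[str] = []
    for name, fetch in tiers:
        try:
            text = fetch(url)
        except Exception as exc:  # network errors, missing dependencies, ...
            errors.append(f"{name}: {exc}")
            continue
        if len(text.strip()) >= min_chars:
            return name, text
        if best is None or len(text) > len(best[1]):
            best = (name, text)
    if best is None:
        raise RuntimeError("all tiers failed: " + "; ".join(errors))
    return best
```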

Manual Usage

When the CLI doesn't fit your needs, use the modules directly:

```python
from lib.router import route, check_dependency
from lib.article import fetch_article
from lib.video import fetch_video
from lib.feishu import fetch_feishu

# Route a URL
url = "https://mp.weixin.qq.com/s/xxx"
r = route(url)
# {'type': 'article', 'method': 'scrapling', 'selector': '#js_content', 'post': 'wx_images'}

# Fetch article, reusing the routing result
fetch_article(url, output_dir="/tmp/out", route_config=r)

# Download video (placeholder Bilibili short link)
video_url = "https://b23.tv/xxx"
fetch_video(video_url, output_dir="/tmp/out", quality="720")

# Fetch Feishu doc (placeholder URL; replace with a real Feishu document)
feishu_url = "https://xxx.feishu.cn/xxx"
fetch_feishu(feishu_url, output_dir="/tmp/out")
```

Data source: ClawHub · Chinese localization: 龙虾技能库
