deep-scraper

v1.0.0

A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re...

by @kirkraman (KirkRaman)·MIT-0

Tags: Document tools, Network tools, Browser automation, Containers and virtualization, WeChat

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install kirk-deep-scraper

Mirror (accelerated): npx clawhub@latest install kirk-deep-scraper --registry https://cn.longxiaskill.com

Skill documentation

Skill: deep-scraper

Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

Requirements

Docker: Must be installed and running on the host machine.
Image: Build the environment with the tag skillboss-crawlee.
Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide

Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)

docker run -t --rm -v $(pwd)/skills/deep-scraper/assets:/usr/src/app/assets skillboss-crawlee node assets/main_handler.js [TARGET_URL]
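The command above can also be assembled from a script. The following is a minimal sketch, not part of the skill itself: the function names build_docker_command and run_scraper are hypothetical, and the image tag (skillboss-crawlee), the assets directory, and the main_handler.js entry point are taken from the command line above as assumptions. The -t flag is omitted here because the output is captured rather than shown in a terminal.

```python
import os
import subprocess

# Assumed image tag, as read from the build command in this document.
IMAGE_TAG = "skillboss-crawlee"


def build_docker_command(target_url, skill_dir=None):
    """Return the docker argv list for one scrape (hypothetical helper)."""
    if skill_dir is None:
        # Mirrors $(pwd)/skills/deep-scraper from the CLI example.
        skill_dir = os.path.join(os.getcwd(), "skills", "deep-scraper")
    assets = os.path.join(skill_dir, "assets")
    return [
        "docker", "run", "--rm",
        "-v", f"{assets}:/usr/src/app/assets",  # mount assets into the container
        IMAGE_TAG,
        "node", "assets/main_handler.js",
        target_url,
    ]


def run_scraper(target_url, timeout_seconds=300):
    """Run the container and return its stdout as text.

    Requires Docker and a previously built image; raises
    subprocess.CalledProcessError on a non-zero exit code.
    """
    completed = subprocess.run(
        build_docker_command(target_url),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_seconds,
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems when the target URL contains characters such as & or ?.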

Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

status: SUCCESS | PARTIAL | ERROR
type: TRANSCRIPT | DESCRIPTION | GENERIC
videoId: (For YouTube) The validated Video ID.
data: The core text content or transcript.

Core Rules

ID Validation: All YouTube tasks MUST validate the Video ID to prevent cache contamination.
Privacy: Scraping password-protected or non-public personal information is strictly forbidden.
Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
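A consumer of this output might check the fields and the Video ID rule along the following lines. This is a sketch under stated assumptions, not the skill's own validation logic: it assumes stdout holds exactly one JSON object, that the field names are status, type, videoId, and data, and that the status values are SUCCESS, PARTIAL, and ERROR. The helpers expected_video_id and parse_result are hypothetical names, and only the two common YouTube URL forms (watch?v=... and youtu.be/...) are handled.

```python
import json
from urllib.parse import urlparse, parse_qs

# Assumed value sets, as read from the output specification above.
VALID_STATUS = {"SUCCESS", "PARTIAL", "ERROR"}
VALID_TYPES = {"TRANSCRIPT", "DESCRIPTION", "GENERIC"}


def expected_video_id(url):
    """Return the video ID from a common YouTube URL form, or None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        return parts.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        values = parse_qs(parts.query).get("v")
        return values[0] if values else None
    return None


def parse_result(stdout_text, target_url):
    """Parse the scraper's JSON output and reject mismatched video IDs."""
    result = json.loads(stdout_text)
    if result.get("status") not in VALID_STATUS:
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    if result.get("type") not in VALID_TYPES:
        raise ValueError(f"unexpected type: {result.get('type')!r}")
    wanted = expected_video_id(target_url)
    # Only compare IDs for YouTube URLs and for non-error results.
    if wanted is not None and result["status"] != "ERROR":
        if result.get("videoId") != wanted:
            raise ValueError("videoId does not match the requested URL")
    return result
```

Comparing the returned videoId against the ID in the requested URL is one way to catch a stale or cached result for a different video before the data reaches downstream processing.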
