deep-scraper
v1.0.0
A Docker-based tool using Crawlee and Playwright to deeply scrape complex sites like YouTube, extracting verified raw transcripts or descriptions with ads re...
Skill: deep-scraper

Overview
A high-performance engineering tool for deep web scraping. It runs Crawlee (Playwright) inside a Docker container to get past the protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.
Requirements
Docker: Must be installed and running on the host machine.
Image: Build the environment with the tag skillboss-crawlee.
Build command: docker build -t skillboss-crawlee skills/deep-scraper/

Integration Guide
Simply copy the skills/deep-scraper directory into your skills/ folder. Ensure the Dockerfile remains within the skill directory for self-contained deployment.

Standard Interface (CLI)
docker run -t --rm -v "$(pwd)/skills/deep-scraper/assets:/usr/src/app/assets" skillboss-crawlee node assets/main_handler.js [TARGET_URL]
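
For callers that want to launch the container from a Node.js script rather than a shell, the command above could be wrapped roughly as follows. This is a minimal sketch, not part of the skill itself: buildDockerArgs and runScraper are hypothetical helper names, and the image tag, mount path, and script name are assumed to match the CLI line above.

```javascript
const path = require("node:path");
const { spawnSync } = require("node:child_process");

// Assumption: the image was built with this tag, as in the build command.
const IMAGE_TAG = "skillboss-crawlee";

// Hypothetical helper: builds the argument vector for `docker run`,
// mirroring the documented CLI. Passing arguments as an array avoids
// shell quoting problems with URLs that contain `&` or `?`.
function buildDockerArgs(targetUrl, projectRoot = process.cwd()) {
  const parsed = new URL(targetUrl); // throws TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const assetsDir = path.join(projectRoot, "skills", "deep-scraper", "assets");
  return [
    "run", "-t", "--rm",
    "-v", `${assetsDir}:/usr/src/app/assets`,
    IMAGE_TAG,
    "node", "assets/main_handler.js",
    parsed.href,
  ];
}

// Hypothetical helper: runs the container and returns whatever it
// printed to stdout. Requires Docker to be running on the host.
function runScraper(targetUrl) {
  const result = spawnSync("docker", buildDockerArgs(targetUrl), {
    encoding: "utf8",
  });
  if (result.error) throw result.error; // e.g. docker binary not found
  return result.stdout;
}
```

Calling runScraper with a video URL would return the raw stdout text of the container, which the caller then has to parse.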

Output Specification (JSON)
The scraping results are printed to stdout as a JSON string:
status: SUCCESS | PARTIAL | ERROR
type: TRANSCRIPT | DESCRIPTION | GENERIC
videoId: (For YouTube) The validated Video ID.
data: The core text content or transcript.

Core Rules
ID Validation: All YouTube tasks MUST validate the Video ID to prevent cache contamination.
Privacy: Scraping password-protected or non-public personal information is strictly forbidden.
Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
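
On the consuming side, the output fields and the ID validation rule above could be checked along these lines. This is a sketch under stated assumptions: the JSON keys are assumed to be status, type, videoId, and data exactly as listed, YouTube IDs are assumed to be 11 characters from [A-Za-z0-9_-], and extractYouTubeId and parseScraperOutput are hypothetical helpers rather than anything shipped with the skill. Whether type and videoId are present on an ERROR result is not stated, so the sketch skips those checks in that case.

```javascript
const STATUSES = new Set(["SUCCESS", "PARTIAL", "ERROR"]);
const TYPES = new Set(["TRANSCRIPT", "DESCRIPTION", "GENERIC"]);

// Hypothetical helper: pulls the video ID out of two common YouTube URL
// shapes (watch?v=... and youtu.be/...). Returns null for anything else.
function extractYouTubeId(targetUrl) {
  const url = new URL(targetUrl);
  const host = url.hostname.replace(/^(www|m)\./, "");
  let id = null;
  if (host === "youtu.be") {
    id = url.pathname.slice(1);
  } else if (host === "youtube.com") {
    id = url.searchParams.get("v");
  }
  // Assumed ID format: 11 characters of letters, digits, "_" or "-".
  return id && /^[\w-]{11}$/.test(id) ? id : null;
}

// Hypothetical helper: parses the scraper's stdout and rejects results
// whose videoId does not match the URL that was requested, which is one
// way a stale cached result would show up.
function parseScraperOutput(stdout, targetUrl) {
  const result = JSON.parse(stdout.trim());
  if (!STATUSES.has(result.status)) {
    throw new Error(`Unexpected status: ${result.status}`);
  }
  if (result.status === "ERROR") return result;
  if (!TYPES.has(result.type)) {
    throw new Error(`Unexpected type: ${result.type}`);
  }
  const expectedId = extractYouTubeId(targetUrl);
  if (expectedId !== null && result.videoId !== expectedId) {
    throw new Error(
      `videoId mismatch: expected ${expectedId}, got ${result.videoId}`
    );
  }
  return result;
}
```

Comparing the returned videoId against the one in the requested URL keeps the check on the caller's side, so a mismatched result is rejected before its data reaches any downstream LLM step.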