Scrape Emails By URL
v0.1.5
Crawl websites locally with crawl4ai to extract contact emails. Accepts multiple URLs and outputs domain-grouped results for clear attribution. Uses deep crawling with URL filters (contact, about, support) to find emails on relevant pages. Use when extracting emails from websites, finding contact information, or crawling for email addresses.
Find Emails
Command-line tool for crawling websites locally via crawl4ai and extracting contact emails from pages likely to contain them (contact, about, support, team, etc.).
Setup
Install dependencies: pip install crawl4ai
Run the script: python scripts/find_emails.py https://example.com
Quick Start
# Crawl a site
python scripts/find_emails.py https://example.com
# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com
# JSON output
python scripts/find_emails.py https://example.com -j
# Save to file
python scripts/find_emails.py https://example.com -o emails.txt
Script
find_emails.py: Crawl and Extract Emails
python scripts/find_emails.py [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md
Arguments:
Argument         Description
urls             One or more URLs to crawl (positional)
-o, --output     Write results to file
-j, --json       JSON output ({"emails": {"email": ["path", ...]}})
-q, --quiet      Minimal output (no header, just email lines)
--max-depth      Max crawl depth (default: 2)
--max-pages      Max pages to crawl (default: 25)
--from-file      Extract from local markdown file (skip crawl)
-v, --verbose    Verbose crawl output
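To make the argument table concrete, here is a minimal sketch of an argparse parser that mirrors it. This is not the script's actual source: the function name `build_parser` and the choice of `nargs="*"` for `urls` (so that `--from-file` can be used without a URL) are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        prog="find_emails.py",
        description="Crawl websites and extract contact emails.",
    )
    # Assumption: zero URLs is allowed so that --from-file works on its own.
    parser.add_argument("urls", nargs="*", help="One or more URLs to crawl")
    parser.add_argument("-o", "--output", help="Write results to file")
    parser.add_argument("-j", "--json", action="store_true", help="JSON output")
    parser.add_argument("-q", "--quiet", action="store_true", help="Minimal output")
    parser.add_argument("--max-depth", type=int, default=2, help="Max crawl depth")
    parser.add_argument("--max-pages", type=int, default=25, help="Max pages to crawl")
    parser.add_argument("--from-file", help="Extract from local markdown file (skip crawl)")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose crawl output")
    return parser


args = build_parser().parse_args(["https://example.com", "-j", "--max-pages", "10"])
print(args.urls, args.json, args.max_depth, args.max_pages)
```

Unspecified options fall back to the documented defaults, so `--max-depth` stays at 2 in the call above.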
Output format (human-readable):
Emails are grouped by domain, which keeps multi-URL runs easy to read:
Found 3 unique email(s) across 2 domain(s)
example.com
• contact@example.com Found on: /contact, /about
• support@example.com Found on: /support
other.com
• info@other.com Found on: /contact-us
Output format (JSON):
LLM-friendly structure with summary and per-domain breakdown:
{
  "summary": {
    "domains_crawled": 2,
    "total_unique_emails": 3
  },
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "contact@example.com": ["/contact", "/about"],
        "support@example.com": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {
        "info@other.com": ["/contact-us"]
      },
      "count": 1
    }
  }
}
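As a sketch of how a downstream script might consume this JSON, the snippet below flattens the per-domain structure into (domain, email, paths) rows. The payload is the sample from this section; the helper name `iter_emails` is hypothetical.

```python
import json

# Sample payload copied from the documented JSON output format.
raw = """
{
  "summary": {"domains_crawled": 2, "total_unique_emails": 3},
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "contact@example.com": ["/contact", "/about"],
        "support@example.com": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {"info@other.com": ["/contact-us"]},
      "count": 1
    }
  }
}
"""
report = json.loads(raw)


def iter_emails(report: dict):
    """Yield (domain, email, paths) for every email in the report."""
    for domain, entry in report["emails_by_domain"].items():
        for email, paths in entry["emails"].items():
            yield domain, email, paths


rows = list(iter_emails(report))
for domain, email, paths in rows:
    print(f"{domain}\t{email}\t{', '.join(paths)}")
```

In practice `raw` would come from a file written with `-j -o`, for example `Path("results.json").read_text()`.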
Configuration
Edit scripts/url_patterns.json to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:
{
  "url_patterns": [
    "contact",
    "support",
    "about",
    "team",
    "email",
    "reach",
    "staff",
    "inquiry",
    "enquir",
    "get-in-touch",
    "contact-us",
    "about-us"
  ]
}
If the file is missing or invalid, default patterns are used.
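The fallback behaviour can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the helper names are hypothetical, the default list is assumed to equal the sample file, and each bare pattern is assumed to match anywhere in the URL (implemented here by wrapping it in `*` wildcards before glob matching).

```python
import json
from fnmatch import fnmatch
from pathlib import Path

# Assumption: the built-in defaults are the same as the sample file's patterns.
DEFAULT_PATTERNS = [
    "contact", "support", "about", "team", "email", "reach",
    "staff", "inquiry", "enquir", "get-in-touch", "contact-us", "about-us",
]


def load_patterns(path: str = "scripts/url_patterns.json") -> list[str]:
    """Load patterns from JSON, falling back to defaults if missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        patterns = data["url_patterns"]
        if not isinstance(patterns, list) or not all(isinstance(p, str) for p in patterns):
            raise ValueError("url_patterns must be a list of strings")
        return patterns
    except (OSError, ValueError, KeyError, TypeError):
        # Covers a missing file, malformed JSON, and an unexpected structure.
        return list(DEFAULT_PATTERNS)


def url_matches(url: str, patterns: list[str]) -> bool:
    """Return True if any pattern occurs in the URL (case-insensitive)."""
    lowered = url.lower()
    return any(fnmatch(lowered, f"*{p.lower()}*") for p in patterns)


patterns = load_patterns("does/not/exist.json")
print(url_matches("https://example.com/about-us", patterns))
print(url_matches("https://example.com/pricing", patterns))
```

With the assumed defaults, the first URL is kept and the second is filtered out.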
Workflow
Crawl a site:
python scripts/find_emails.py https://example.com -o emails.json
Extract from local file (e.g., cached markdown):
python scripts/find_emails.py --from-file crawled.md -j
Customize URL filters by editing scripts/url_patterns.json.
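For the local-file step, extraction amounts to scanning markdown text for email-shaped strings. The sketch below shows one simple way to do that. The regular expression is deliberately basic and is an assumption for illustration; the script's actual pattern and deduplication rules are not documented here, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Simple illustrative pattern; real-world email syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased emails in order of first appearance."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def extract_from_file(path: str) -> list[str]:
    """Read a local markdown file and extract emails from its text."""
    return extract_emails(Path(path).read_text(encoding="utf-8"))


sample = (
    "Reach us at Contact@Example.com or "
    "[support](mailto:support@example.com). "
    "Contact@example.com again."
)
print(extract_emails(sample))
```

Lowercasing before deduplication treats `Contact@Example.com` and `Contact@example.com` as the same address, which is a design choice of this sketch.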
Dependencies
pip install crawl4ai
playwright install
Requires a browser (Playwright) for local crawling.
Batch Processing
# Crawl multiple sites; results are grouped by domain for clear attribution
python scripts/find_emails.py https://site1.com https://site2.com -j -o combined.json
# Extract from multiple local files
for f in crawled/*.md; do
  echo "=== $f ==="
  python scripts/find_emails.py --from-file "$f" -q
done
Multiple URLs are fully supported; output clearly associates each email with its source domain. Domains are normalized (e.g. www.techbullion.com and techbullion.com merge into one) so duplicate sites are not listed separately.
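The domain normalization described above can be sketched with the standard library. Only the `www.` merge is documented; lowercasing and ignoring the port are assumptions of this sketch, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Reduce a URL to a lowercase host without a leading 'www.'."""
    # urlsplit only fills in hostname when "//" is present, so bare inputs
    # such as "example.com" get a default scheme first.
    if "//" not in url:
        url = "https://" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def group_by_domain(urls: list[str]) -> dict[str, list[str]]:
    """Group input URLs under their normalized domain."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(normalize_domain(url), []).append(url)
    return groups


groups = group_by_domain([
    "https://www.techbullion.com/contact",
    "https://techbullion.com/about",
    "https://other.com",
])
print(groups)
```

Both techbullion.com URLs land under a single key, so the site appears once in the grouped output.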