Scrape Emails By URL

v0.1.5

Crawl websites locally with crawl4ai to extract contact emails. Accepts multiple URLs and outputs domain-grouped results for clear attribution. Uses deep crawling with URL filters (contact, about, support) to find emails on relevant pages. Use when extracting emails from websites, finding contact information, or crawling for email addresses.


by @lukem121 (Luke)·MIT-0


License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install find-emails

Mirror: npx clawhub@latest install find-emails --registry https://cn.longxiaskill.com

Skill documentation

Find Emails

Command-line tool for crawling websites locally via crawl4ai and extracting contact emails from pages likely to contain them (contact, about, support, team, etc.).

Setup

Install dependencies:

pip install crawl4ai

Run the script:

python scripts/find_emails.py https://example.com

Quick Start

# Crawl a site
python scripts/find_emails.py https://example.com

# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com

# JSON output
python scripts/find_emails.py https://example.com -j

# Save to file
python scripts/find_emails.py https://example.com -o emails.txt

Script

find_emails.py: Crawl and Extract Emails

python scripts/find_emails.py [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md

Arguments:

Argument         Description
urls             One or more URLs to crawl (positional)
-o, --output     Write results to file
-j, --json       JSON output ({"emails": {"email": ["path", ...]}})
-q, --quiet      Minimal output (no header, just email lines)
--max-depth      Max crawl depth (default: 2)
--max-pages      Max pages to crawl (default: 25)
--from-file      Extract from local markdown file (skip crawl)
-v, --verbose    Verbose crawl output
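
For readers who want to see how the flags in the table fit together, here is a minimal argparse sketch. It is a hypothetical reconstruction built only from the documented argument names and defaults, not the code in find_emails.py; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the script's actual code)."""
    parser = argparse.ArgumentParser(
        prog="find_emails.py",
        description="Crawl websites and extract contact emails.",
    )
    parser.add_argument("urls", nargs="*", help="One or more URLs to crawl")
    parser.add_argument("-o", "--output", help="Write results to file")
    parser.add_argument("-j", "--json", action="store_true", help="JSON output")
    parser.add_argument("-q", "--quiet", action="store_true", help="Minimal output")
    parser.add_argument("--max-depth", type=int, default=2, help="Max crawl depth")
    parser.add_argument("--max-pages", type=int, default=25, help="Max pages to crawl")
    parser.add_argument("--from-file", help="Extract from a local markdown file (skip crawl)")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose crawl output")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["https://example.com", "-j", "--max-pages", "10"])
print(args.urls, args.json, args.max_depth, args.max_pages)
```

Declaring urls with nargs="*" lets the same parser accept zero URLs when --from-file is used; whether the real script does this is not stated in the documentation.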

Output format (human-readable):

Emails are grouped by domain, which keeps multi-URL runs easy to read:

Found 3 unique email(s) across 2 domain(s)

example.com

• contact@example.com
  Found on: /contact, /about
• support@example.com
  Found on: /support

other.com

• info@other.com
  Found on: /contact-us

Output format (JSON):

LLM-friendly structure with summary and per-domain breakdown:

{
  "summary": {
    "domains_crawled": 2,
    "total_unique_emails": 3
  },
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "contact@example.com": ["/contact", "/about"],
        "support@example.com": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {
        "info@other.com": ["/contact-us"]
      },
      "count": 1
    }
  }
}
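
The JSON structure above can be consumed with a few lines of Python. The sketch below flattens it into (domain, email, paths) rows; the sample payload is the documented example, and the helper name flatten is an assumption made for illustration.

```python
import json

# Sample payload copied from the documented JSON output format.
sample = """
{
  "summary": {"domains_crawled": 2, "total_unique_emails": 3},
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "contact@example.com": ["/contact", "/about"],
        "support@example.com": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {"info@other.com": ["/contact-us"]},
      "count": 1
    }
  }
}
"""


def flatten(report: dict) -> list[tuple[str, str, list[str]]]:
    """Return (domain, email, paths) rows from the documented structure."""
    rows = []
    for domain, entry in report["emails_by_domain"].items():
        for email, paths in entry["emails"].items():
            rows.append((domain, email, paths))
    return rows


rows = flatten(json.loads(sample))
for domain, email, paths in rows:
    print(f"{domain}\t{email}\t{', '.join(paths)}")
```

In practice the payload would come from a file written with -j -o, for example json.load(open("results.json")).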

Configuration

Edit scripts/url_patterns.json to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:

{
  "url_patterns": [
    "contact",
    "support",
    "about",
    "team",
    "email",
    "reach",
    "staff",
    "inquiry",
    "enquir",
    "get-in-touch",
    "contact-us",
    "about-us"
  ]
}

If the file is missing or invalid, default patterns are used.
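
The fallback behavior described above might look roughly like the following. This is a sketch under stated assumptions: the names load_patterns, url_matches, and DEFAULT_PATTERNS are hypothetical, the defaults are assumed to equal the list shown above, and treating each keyword as the glob *keyword* is a guess at how "glob-style" matching is applied.

```python
import json
from fnmatch import fnmatch
from pathlib import Path

# Assumed defaults: the pattern list shown in the documentation above.
DEFAULT_PATTERNS = [
    "contact", "support", "about", "team", "email", "reach",
    "staff", "inquiry", "enquir", "get-in-touch", "contact-us", "about-us",
]


def load_patterns(path: str) -> list[str]:
    """Read url_patterns from a JSON file, falling back to defaults on any problem."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        patterns = data["url_patterns"]
        if isinstance(patterns, list) and all(isinstance(p, str) for p in patterns):
            return patterns
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, invalid JSON, or unexpected shape: use the defaults.
        pass
    return list(DEFAULT_PATTERNS)


def url_matches(url: str, patterns: list[str]) -> bool:
    """Treat each keyword as the glob *keyword* (an assumption about the matching rule)."""
    lowered = url.lower()
    return any(fnmatch(lowered, f"*{p.lower()}*") for p in patterns)


patterns = load_patterns("scripts/url_patterns.json")
print(url_matches("https://example.com/contact-us", patterns))
print(url_matches("https://example.com/blog/post-1", patterns))
```

Catching ValueError covers json.JSONDecodeError, so a malformed file and a missing file take the same fallback path.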

Workflow

Crawl a site:

python scripts/find_emails.py https://example.com -o emails.json

Extract from a local file (e.g., cached markdown):

python scripts/find_emails.py --from-file crawled.md -j

Customize URL filters by editing scripts/url_patterns.json.
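
To illustrate what the extraction step of the workflow involves, whether the text comes from a crawl or from --from-file, here is a minimal regex-based sketch. The pattern is deliberately simple and is an assumption, not the regex used by find_emails.py; the names EMAIL_RE and extract_emails are hypothetical.

```python
import re

# Deliberately simple pattern; an assumption, not the script's actual regex.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased addresses in first-seen order."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


# Markdown links repeat the address (link text plus mailto target); both collapse to one entry.
sample = """
# Contact
Mail us at Contact@Example.com or [sales@example.com](mailto:sales@example.com).
"""
emails = extract_emails(sample)
print(emails)  # ['contact@example.com', 'sales@example.com']
```

A pattern this loose will also match strings such as image names of the form name@2x.png, so a real implementation would likely filter by file extension or validate the domain.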

Dependencies

pip install crawl4ai
playwright install

Requires a browser (Playwright) for local crawling.

Batch Processing

# Crawl multiple sites; results are grouped by domain for clear attribution
python scripts/find_emails.py https://site1.com https://site2.com -j -o combined.json

# Extract from multiple local files
for f in crawled/*.md; do
  echo "=== $f ==="
  python scripts/find_emails.py --from-file "$f" -q
done

Multiple URLs are fully supported; the output associates each email with its source domain. Domains are normalized (e.g. www.techbullion.com and techbullion.com merge into one) so the same site is not listed twice.
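
The domain normalization and grouping described above could be sketched as follows. The exact rule used by the script is not documented, so stripping a leading "www." and lowercasing the host is an assumption, and normalize_domain and group_by_domain are hypothetical names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Lowercase the host and drop a leading 'www.' (assumed normalization rule)."""
    # Prefix scheme-less input with // so urlsplit treats it as a host, not a path.
    host = urlsplit(url if "://" in url else f"//{url}").hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_by_domain(found: list[tuple[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group (page_url, email) pairs into {domain: {email: [paths]}}."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for page_url, email in found:
        path = urlsplit(page_url).path or "/"
        paths = grouped[normalize_domain(page_url)].setdefault(email, [])
        if path not in paths:
            paths.append(path)
    return dict(grouped)


found = [
    ("https://www.example.com/contact", "contact@example.com"),
    ("https://example.com/about", "contact@example.com"),
]
grouped = group_by_domain(found)
print(grouped)
```

Both URLs land under the single key example.com, which matches the merged listing the documentation describes.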

