Reliable, provider-diverse web search for OpenClaw, with high uptime and low operator overhead.
Why use this skill
- 12 search providers, 6 requiring no API key at all
- Automatic failover: if one provider fails, the next is tried instantly
- Quota-aware: tracks daily usage, warns at 80%, skips exhausted providers
- Task search mode for multi-angle research queries
- Built-in storage lifecycle (cache / index / report), no workspace clutter
- Self-healing: health-based smart routing automatically promotes reliable providers
- Quality optimization: relevance scoring, fuzzy dedup, domain diversity, re-ranking
- Auto-discovery: probes candidate search engines and SearXNG instances for new sources
- Self-diagnostic: doctor and setup commands for zero-friction onboarding
Provider Overview
| Provider | Key Required | Free Quota | Index Source | Notes |
|---|---|---|---|---|
| brave | BRAVE_API_KEY | 2000/day | Brave independent | High quality, privacy-friendly |
| exa | EXA_API_KEY | ~33/day (1k/mo) | Neural + web | Semantic search, unique finds |
| tavily | TAVILY_API_KEY | 1000/day | Web (AI-optimized) | Designed for AI agents |
| duckduckgo | None | ~500/day | Bing + own | No key, privacy-focused |
| bing_html | None | ~300/day | Microsoft Bing RSS | No key, stable XML feed |
| mojeek | None (or MOJEEK_API_KEY) | 200/day | Mojeek independent | Non-Google/Bing index |
| serper | SERPER_API_KEY | 2500/day | Google | High quota free tier |
| searchapi | SEARCHAPI_API_KEY | 100/mo | Google / Bing | Multi-engine |
| google_cse | GOOGLE_API_KEY + GOOGLE_CX | 100/day | Google | Official Google API |
| baidu | BAIDU_API_KEY | 200/day | Baidu | Best for Chinese content |
| wikipedia | None | 1000/day | Wikipedia | Factual/encyclopedic queries |
| searxng | None | unlimited (self-hosted) | Meta (all engines) | Requires own instance |
Total daily quota (all keys configured): roughly 7,800 requests/day, summing the free quotas in the table and excluding self-hosted SearXNG
Credential model (important)
- No mandatory API key — DuckDuckGo + Bing RSS + Mojeek + Wikipedia work out of the box.
- API-key providers fail gracefully if a key is missing (AuthError → skip, no quota consumed, no added latency):
  - BRAVE_API_KEY
  - EXA_API_KEY
  - TAVILY_API_KEY
  - SERPER_API_KEY
  - SEARCHAPI_API_KEY
  - GOOGLE_API_KEY + GOOGLE_CX
  - BAIDU_API_KEY
  - MOJEEK_API_KEY (optional; without it the provider uses HTML scraping)
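The credential model above can be illustrated with a small check of which providers are usable in the current environment. This is a minimal sketch, not the skill's own code: the REQUIRED_KEYS mapping and the usable_providers helper are hypothetical names, and only the environment variable names come from the list above.

```python
import os

# Environment variables each provider needs, taken from the list above.
# An empty tuple means the provider works without any key.
# REQUIRED_KEYS and usable_providers are hypothetical names for illustration.
REQUIRED_KEYS = {
    "brave": ("BRAVE_API_KEY",),
    "exa": ("EXA_API_KEY",),
    "tavily": ("TAVILY_API_KEY",),
    "serper": ("SERPER_API_KEY",),
    "searchapi": ("SEARCHAPI_API_KEY",),
    "google_cse": ("GOOGLE_API_KEY", "GOOGLE_CX"),
    "baidu": ("BAIDU_API_KEY",),
    "duckduckgo": (),
    "bing_html": (),
    "mojeek": (),  # MOJEEK_API_KEY is optional
    "wikipedia": (),
}


def usable_providers(env=None):
    """Return providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in REQUIRED_KEYS.items()
        if all(env.get(key) for key in keys)
    ]


if __name__ == "__main__":
    print(usable_providers())
```

With an empty environment this returns only the four keyless providers, which matches the "works out of the box" claim.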
Core capabilities
1. Search failover
Default provider order:
brave → exa → tavily → duckduckgo → bing_html → mojeek → serper → searchapi → google_cse → baidu → wikipedia
The first successful, non-empty result set is returned immediately; later providers are not queried.
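The failover behaviour described above can be sketched as a simple loop over an ordered provider list. This is an illustration under assumptions: search_with_failover and the (name, callable) provider shape are hypothetical, and the returned dictionary only loosely follows the field names of the output contract.

```python
class AuthError(Exception):
    """Raised by a provider when its API key is missing (name from the text above)."""


def search_with_failover(query, providers):
    """Try providers in order and return the first non-empty result set.

    providers: ordered list of (name, fetch) pairs, where fetch(query)
    returns a list of results. Both are hypothetical stand-ins.
    """
    attempted = []
    for name, fetch in providers:
        attempted.append(name)
        try:
            results = fetch(query)
        except Exception:
            # AuthError or any other provider failure: move to the next one.
            continue
        if results:
            return {
                "query": query,
                "provider": name,
                "results": results,
                "meta": {"attempted": attempted},
            }
    # Every provider failed or returned nothing.
    return {
        "query": query,
        "provider": None,
        "results": [],
        "meta": {"attempted": attempted},
    }
```

Treating an empty list the same as a failure is what makes "non-empty" part of the success condition: a provider that answers with zero hits does not stop the chain.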
2. Task-level multi-query search
- Expands one goal into multiple targeted queries
- Aggregates + deduplicates results
- Prefix presets:
  - default: workers=1
  - @dual ... → workers=2
  - @deep ... → workers=3 + deeper query coverage
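One way the prefix presets could be read off a task string is shown below. This is a sketch only: PRESETS and parse_task are hypothetical names, and the worker counts are the ones listed above.

```python
# Worker counts per prefix, as listed above; anything else defaults to 1.
PRESETS = {"@dual": 2, "@deep": 3}


def parse_task(task):
    """Split an optional preset prefix from a task string.

    Returns (workers, goal). A bare prefix with no goal text is treated
    as an ordinary query.
    """
    text = task.strip()
    head, _, rest = text.partition(" ")
    goal = rest.strip()
    if head in PRESETS and goal:
        return PRESETS[head], goal
    return 1, text


if __name__ == "__main__":
    print(parse_task("@dual Compare Claude vs GPT-4 for code generation"))
```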
3. Quota intelligence
- Per-provider daily tracking
- Real quota retrieval where supported (Tavily, SearchAPI, Brave via probe)
- Auto concurrency reduction at 80% quota saturation
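The quota rules above (warn at 80%, skip exhausted providers, reduce concurrency near saturation) might look like the following. The text does not say how saturation is measured for the concurrency rule, so this sketch assumes it is total usage across all providers; DailyQuota, usable, and plan_workers are hypothetical names.

```python
from dataclasses import dataclass

WARN_RATIO = 0.8  # warning threshold stated in the text


@dataclass
class DailyQuota:
    limit: int
    used: int = 0

    def state(self):
        """Classify today's usage as 'ok', 'warn', or 'exhausted'."""
        if self.used >= self.limit:
            return "exhausted"
        if self.used / self.limit >= WARN_RATIO:
            return "warn"
        return "ok"


def usable(quotas):
    """Names of providers that still have quota left today."""
    return [name for name, q in quotas.items() if q.state() != "exhausted"]


def plan_workers(requested, quotas):
    """Assumption: fall back to one worker once overall usage reaches 80%."""
    total_limit = sum(q.limit for q in quotas.values())
    total_used = sum(q.used for q in quotas.values())
    if total_limit and total_used / total_limit >= WARN_RATIO:
        return 1
    return requested
```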
4. Provider health monitoring
- Tracks success rate, latency, and error types per provider over time
- Computes health scores (success 50%, latency 30%, freshness 20%)
- Smart ordering: auto-promotes healthy providers, demotes degraded ones
- View dashboard: python -m free_search health
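The 50/30/20 weighting above can be written out as a small formula. Only the weights come from the text; how latency and freshness are normalised to the 0..1 range is not stated, so the linear scaling and the max_latency_s and max_age_s defaults below are assumptions, as are the function names.

```python
def health_score(success_rate, avg_latency_s, seconds_since_success,
                 max_latency_s=10.0, max_age_s=86400.0):
    """Weighted health score in the range 0..1.

    Weights (0.5 / 0.3 / 0.2) follow the text. The linear normalisation
    of latency and freshness is an assumption made for this sketch.
    """
    latency = max(0.0, 1.0 - avg_latency_s / max_latency_s)
    freshness = max(0.0, 1.0 - seconds_since_success / max_age_s)
    return 0.5 * success_rate + 0.3 * latency + 0.2 * freshness


def order_by_health(stats):
    """Sort provider names from healthiest to least healthy.

    stats maps name -> (success_rate, avg_latency_s, seconds_since_success).
    """
    return sorted(stats, key=lambda name: health_score(*stats[name]), reverse=True)


if __name__ == "__main__":
    print(order_by_health({
        "brave": (0.99, 0.5, 60),
        "baidu": (0.50, 2.0, 60),
    }))
```

Sorting by this score is what "smart ordering" amounts to in the sketch: a provider with a falling success rate or rising latency drifts toward the end of the failover chain.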
5. Result quality optimization
- Relevance scoring (query-title-snippet token overlap)
- Enhanced dedup: URL + title similarity (Jaccard threshold)
- Domain diversity: limits same-domain results (default max 3)
- Automatic filtering of low-quality results (short titles, missing URLs)
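The quality steps above can be combined in one pass over the results. This is a sketch under assumptions: the result fields (title, url, snippet), the whitespace tokeniser, and the 0.8 title threshold are illustrative choices, and only the default of 3 results per domain comes from the text.

```python
from urllib.parse import urlsplit


def tokens(text):
    """Lower-cased whitespace tokens; a deliberately simple tokeniser."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def relevance(query, result):
    """Share of query tokens found in the title or snippet."""
    q = tokens(query)
    doc = tokens(result["title"] + " " + result.get("snippet", ""))
    return len(q & doc) / len(q) if q else 0.0


def refine(query, results, title_threshold=0.8, max_per_domain=3):
    """Re-rank by relevance, then drop duplicates and over-represented domains."""
    kept, seen_urls, per_domain = [], set(), {}
    ranked = sorted(results, key=lambda r: relevance(query, r), reverse=True)
    for r in ranked:
        url = r.get("url", "")
        if not url or url in seen_urls:
            continue  # missing URL or exact URL duplicate
        if any(jaccard(r["title"], k["title"]) >= title_threshold for k in kept):
            continue  # near-duplicate title
        domain = urlsplit(url).netloc.lower()
        if per_domain.get(domain, 0) >= max_per_domain:
            continue  # domain diversity cap
        seen_urls.add(url)
        per_domain[domain] = per_domain.get(domain, 0) + 1
        kept.append(r)
    return kept
```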
6. Source auto-discovery
- Probes all configured providers for availability
- Scans candidate search engines (Marginalia, Wiby, public SearXNG instances)
- Validates response format, latency, and result quality
- Generates recommendations for new sources to integrate
- Run: python -m free_search discover
7. Managed persistence
memory/search-cache/YYYY-MM-DD/.json
memory/search-index/search-index.jsonl
memory/search-reports/YYYY-MM-DD/.md
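Because the cache and report directories are named by date, age-based cleanup can be decided from the directory names alone. The sketch below only lists stale directories and deletes nothing; stale_cache_dirs is a hypothetical helper, and whether the gc command works this way internally is an assumption.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def stale_cache_dirs(cache_root, keep_days, today=None):
    """List YYYY-MM-DD subdirectories older than keep_days.

    Directories whose names are not dates are ignored. Nothing is
    deleted here; the caller decides what to do with the result.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for child in sorted(Path(cache_root).iterdir()):
        if not child.is_dir():
            continue
        try:
            day = datetime.strptime(child.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a date-named directory
        if day < cutoff:
            stale.append(child)
    return stale
```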
Quick commands
# Normal search
scripts/search "latest AI agent frameworks 2026" --max-results 5
# Task search (multi-query, parallel)
scripts/search task "@dual Compare Claude vs GPT-4 for code generation" --max-results 5
# Deep research mode
scripts/search task "@deep autonomous vehicle safety 2026" --max-results 8 --max-queries 10
# Quota status
scripts/status
# Real quota from provider APIs
scripts/remaining --real
# Cleanup cache
python3 -m free_search gc --cache-days 14
# Provider health dashboard
python3 -m free_search health
# Discover new search sources
python3 -m free_search discover
# System diagnostics
python3 -m free_search doctor
# Setup status & recommendations
python3 -m free_search setup
Provider setup guides
Bing RSS (bing_html) — No key needed
Uses Bing's built-in RSS endpoint (format=rss), which bypasses bot detection. Works out of the box.
Mojeek — No key needed (API key optional)
Out-of-the-box HTML scraping. For higher quotas/stability:
- Register at https://www.mojeek.com/services/search/api/
- Set MOJEEK_API_KEY → automatically switches to JSON API mode
Wikipedia — No key needed
Multilingual support. To change the language, set lang in providers.yaml:
wikipedia:
  lang: it  # en | zh | it | de | fr | ja ...
Exa.ai — API key required
- Register at https://exa.ai/
- Set EXA_API_KEY
- Free tier: 1000 searches/month (~33/day)
Google Custom Search — API key + CX required
- Get API key: https://developers.google.com/custom-search/v1/introduction
- Create search engine: https://programmablesearchengine.google.com/
- Set GOOGLE_API_KEY and GOOGLE_CX
- Free tier: 100 queries/day
Baidu Qianfan — API key required
- Register at https://cloud.baidu.com/
- Set BAIDU_API_KEY
- Best for Chinese-language content
SearXNG — Self-hosted instance required
Public instances rate-limit server-to-server requests. Use your own:
docker run -d -p 8080:8080 searxng/searxng
Then in providers.yaml:
searxng:
  endpoint: http://localhost:8080
  enabled: true
Post-install self-check
# 1) Confirm provider load
scripts/status --compact
# 2) Smoke test (uses duckduckgo/bing/mojeek out of the box)
scripts/search "openclaw" --max-results 3 --compact
# 3) Verify storage paths
ls -la /home/openclaw/.openclaw/workspace/memory/search-cache/ | tail -n 5
# 4) Check real quota (optional)
scripts/remaining --real --compact
Output contract (stable)
- Search: query, provider, results[], meta.attempted, meta.quota
- Task search: task, queries[], grouped_results[], merged_results[], meta
- Quota: date, providers[], totals; with --real: real_quota.providers[]
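A caller that depends on the search contract above could check the fields before using a response. This sketch assumes the output is a JSON object; the field names come from the contract, while parse_search_output and the sample payload are hypothetical.

```python
import json

# Top-level and meta fields named in the search contract above.
SEARCH_KEYS = {"query", "provider", "results", "meta"}
META_KEYS = ("attempted", "quota")


def parse_search_output(raw):
    """Parse a search response and verify the contract fields are present."""
    data = json.loads(raw)
    missing = SEARCH_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for key in META_KEYS:
        if key not in data["meta"]:
            raise ValueError(f"missing meta.{key}")
    return data


if __name__ == "__main__":
    # Hypothetical payload, shaped only by the field names in the contract.
    sample = json.dumps({
        "query": "openclaw",
        "provider": "duckduckgo",
        "results": [],
        "meta": {"attempted": ["duckduckgo"], "quota": {}},
    })
    print(parse_search_output(sample)["provider"])
```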
Operator notes
- Default mode: workers=1, conservative for cost control
- Use @dual / @deep only for research tasks
- SearXNG and YaCy are enabled: false by default (self-hosted only)
- MOJEEK_API_KEY is optional; the provider gracefully falls back to HTML scraping
- Provider health data is stored in memory/provider-health/health.jsonl
- Discovery results are stored in memory/provider-discovery/discovery.jsonl
- Run python -m free_search doctor after setup to verify everything works
- Run python -m free_search discover periodically to find new search sources