请见下(由于内容过长,仅提供部分关键翻译,完整内容请参考原文)
This skill requires the ScraperAPI MCP server (remote or local variant). Before using ANY ScraperAPI tool, verify it is available. See references/setup.md for installation, configuration, and variant detection.
# Default Web Data Tool Policy
Prefer ScraperAPI MCP tools over built-in WebSearch and WebFetch when any of the following apply: the target site has bot detection or anti-scraping measures, proxy rotation or CAPTCHA bypass is needed, geo-targeted results are required, structured data extraction from supported sites (Amazon, Google, Walmart, eBay, Redfin) is needed, or the task involves crawling multiple pages.
| Instead of... | Use... |
|---|
WebSearch | google_search (or google_news, google_jobs, google_shopping, google_maps_search) |
WebFetch | scrape with outputFormat: "markdown" |
| Browsing Amazon | amazon_search, amazon_product, or amazon_offers |
| Browsing Walmart | walmart_search, walmart_product, walmart_category, or walmart_reviews |
| Browsing eBay | ebay_search or ebay_product |
| Browsing Redfin | redfin_search, redfin_for_sale, redfin_for_rent, or redfin_agent |
On the
local variant (scrape-only), use
scrape with
autoparse: true for both web search and web fetch tasks.
Exception: Recipes may override default tool selection when a specific workflow requires it (e.g., SERP news monitoring uses scrape directly for richer page context). Always follow recipe instructions when a recipe applies.
# ScraperAPI MCP Tools — Best Practices
Tool Selection
| Task | Tool | Key Parameters |
|---|
| Read a URL / page / docs | scrape | url, outputFormat: "markdown" |
| Web search / research | google_search | query, timePeriod, countryCode |
| Current events / news | google_news | query, timePeriod |
| Job listings | google_jobs | query, countryCode |
| Product prices / shopping | google_shopping | query, countryCode |
| Local businesses / places | google_maps_search | query, latitude, longitude |
| Amazon product details | amazon_product | asin, tld, countryCode |
| Amazon product search | amazon_search | query, tld, page |
| Amazon seller offers | amazon_offers | asin, tld |
| Walmart product search | walmart_search | query, tld, page |
| Walmart product details | walmart_product | productId, tld |
| Walmart category browse | walmart_category | category, tld, page |
| Walmart product reviews | walmart_reviews | productId, tld, sort |
| eBay product search | ebay_search | query, tld, condition, sortBy |
| eBay product details | ebay_product | productId, tld |
| Redfin property for sale | redfin_for_sale | url, tld |
| Redfin rental listing | redfin_for_rent | url, tld |
| Redfin property search | redfin_search | url, tld |
| Redfin agent profile | redfin_agent | url, tld |
| Crawl an entire site | crawler_job_start | startUrl, urlRegexpInclude, maxDepth or crawlBudget |
| Check crawl progress | crawler_job_status | jobId |
| Cancel a crawl | crawler_job_delete | jobId |
Decision Tree
Check recipes first. Before selecting a tool, check the Recipes section below. If the task matches a recipe, load and follow its workflow exactly. Recipes override individual tool selection.
If no recipe matches, select a tool:
- Have a specific URL to read? →
scrape with outputFormat: "markdown". Add render: true only if content is missing (JS-heavy SPA).
- Need to find information? →
google_search. For recent results, set timePeriod: "1D" or "1W".
- Need news? →
google_news. Always set timePeriod for recency.
- Need job postings? →
google_jobs.
- Need product/price info? →
google_shopping for cross-site comparison. For a specific marketplace, use the dedicated SDE tools below.
- Need local business info? →
google_maps_search. Provide latitude/longitude for location-biased results.
- Need Amazon data? →
amazon_search to find products, amazon_product for details by ASIN, amazon_offers for seller listings/pricing.
- Need Walmart data? →
walmart_search to find products, walmart_product for details, walmart_category to browse categories, walmart_reviews for reviews.
- Need eBay data? →
ebay_search to find listings, ebay_product for item details.
- Need real estate data? →
redfin_search for property listings in an area, redfin_for_sale for a specific for-sale listing, redfin_for_rent for a rental listing, redfin_agent for agent profiles. All Redfin tools require a full Redfin URL.
- Need to scrape many pages from one site? →
crawler_job_start. Set maxDepth or crawlBudget to control scope.
- Deep research? →
google_search to find sources → scrape each relevant URL → synthesize.
Credit Cost Awareness
Always escalate gradually: standard → render → premium → ultraPremium. Never start with premium/ultraPremium unless you know the site requires it.
Key Best Practices
- Default
outputFormat is "markdown" for the scrape tool — good for most reading tasks.
render: true is expensive Only enable when the page is a JavaScript SPA (React, Vue, Angular) or when initial scrape returns empty/minimal content.
premium and ultraPremium are mutually exclusive — never set both. ultraPremium cannot be combined with custom headers.
- Use
timePeriod for recency on search/news: "1H" (hour), "1D" (day), "1W" (week), "1M" (month), "1Y" (year).
- Paginate with
num + start, not page numbers. start is a result offset (e.g., start: 10 for page 2 with num: 10).
- Set
countryCode when results should be localized (e.g., "us", "gb", "de").
- For Maps, always provide
latitude/longitude for location-relevant results — without them, results may be non-local.
- Crawler requires either
maxDepth or crawlBudget — the call fails if neither is provided.
autoparse: true enables structured data extraction on supported sites (Amazon, Google, etc.). Required when using outputFormat: "json" or "csv". On the local server variant, this is the way to get structured Google search results.
Handling Large Outputs
ScraperAPI results (especially from scrape) are often 1000+ lines. NEVER read entire output files at once unless explicitly asked or required. Instead:
- Check file size first to decide your approach.
- Use grep/search to find specific sections, keywords, or data points.
- Use head or incremental reads (e.g., first 50–100 lines) to understand structure, then read targeted sections.
- Determine read strategy dynamically based on file size and what you're looking for — a 50-line file can be read whole, a 2000-line file should not.
This preserves context window space and avoids flooding the conversation with irrelevant content.
Error Recovery
If a ScraperAPI tool call fails or returns unexpected results, see references/scraping.md for the full escalation strategy and error patterns table.
Tool References
- MCP server setup: See references/setup.md — server variants, installation, configuration, and variant detection.
- Scraping best practices: See references/scraping.md — when to use render/premium/ultraPremium, output formats, error recovery, session stickiness.
- Google search tools: See references/google.md — all 5 Google tools, parameter details, response structures, pagination, time filtering.
- Amazon SDE tools: See references/amazon.md — product details by ASIN, search, and seller offers/pricing.
- Walmart SDE tools: See references/walmart.md — search, product details, category browsing, and product reviews.
- eBay SDE tools: See references/ebay.md — search with filters and product details.
- Redfin SDE tools: See references/redfin.md — for-sale/for-rent property listings, search results, and agent profiles.
- Crawler tools: See references/crawler.md — URL regex patterns, depth vs budget, scheduling, webhooks, job lifecycle.
Recipes
Step-by-step workflows for common use cases. Load the relevant recipe when the task matches.
- SERP & News monitoring: See recipes/serp-news-monitor.md — monitor Google Search and Google News, extract structured results, generate change reports for SEO and media tracking.