
Browserbase Scraper — Skill Tool

v0.1.0

[Auto-translated] Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot prot...

0 · 422 · 0 current · 0 cumulative
by @wirelessjoe · MIT-0
Download skill package
License
MIT-0
Last updated
2026/3/10
Security scan
VirusTotal
Harmless
View report
OpenClaw
Suspicious
medium confidence
The skill's runtime instructions legitimately require Browserbase and an LLM API key, but the registry metadata does not declare these credentials and the package references missing example files — this mismatch is incoherent and worth caution.
Evaluation advice
Do not install blindly. The key points to weigh before using this skill are listed in full under the pre-installation notes below.
Detailed analysis
Purpose and capabilities
The SKILL.md describes scraping Cloudflare-protected sites with Browserbase/Stagehand and optionally Gemini — those env vars (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID, GOOGLE_GENERATIVE_AI_API_KEY) are coherent with the purpose. However, the registry metadata claims no required env vars or primary credential, which is inconsistent with the instructions and could mislead users about what secrets are needed.
Instruction scope
The instructions stay within scraping/scraper operation (npm install, Stagehand init, page navigation, waiting, scrolling, extracting and parsing). They do not request unrelated system data. Minor issues: the docs reference a local file (scripts/example_scraper.js) that is not present in the package, and the SKILL.md suggests using 'OpenClaw cron' without providing the example script — this leaves gaps a user would need to fill.
Installation mechanism
This is instruction-only (no install spec) and recommends installing @browserbasehq/stagehand via npm. That is a proportionate, standard install recommendation for the described functionality; nothing in the SKILL.md instructs downloading arbitrary executables or third-party archives.
Credential requirements
The SKILL.md requires two Browserbase credentials and an LLM API key — reasonable for a cloud-browser + AI extraction flow — but the published registry metadata declares no required environment variables or primary credential. The omission is a material mismatch: users may not realize they must provide API keys. Also the skill example uses process.env directly; verify you will supply only scoped/test keys and not high-privilege production credentials.
Persistence and permissions
The skill is not always-enabled and does not request system config paths or persistent privileges. There are no install hooks or indications it will modify other skills or system-wide settings.
Pre-installation notes
  1. The SKILL.md requires BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID and GOOGLE_GENERATIVE_AI_API_KEY, but the registry metadata lists none; ask the publisher to correct the metadata so required credentials are visible.
  2. Use scoped or disposable API keys (a test account) rather than production credentials.
  3. Confirm the source/owner (the skill has no homepage and an unknown source); since no code files ship with the package, you rely entirely on the instructions, so request the example scripts referenced (scripts/example_scraper.js) before running.
  4. Be aware that scraping Cloudflare-protected sites can violate terms of service or laws; ensure you have permission.
  5. Run initial tests in an isolated environment and rotate keys if you expose them during testing. If the publisher responds and the metadata is fixed (or the example scripts are provided), this looks coherent; until then, treat it cautiously.
Security works in layers; review the code before running.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.0 · 2026/3/10
● Harmless

Install command

Official: npx clawhub@latest install browserbase-scraper-skill
Mirror: npx clawhub@latest install browserbase-scraper-skill --registry https://cn.clawhub-mirror.com

Skill documentation

Bypass Cloudflare and bot protection using Stagehand + Browserbase cloud browsers with AI-powered extraction.

When to Use

  • Website blocks curl/fetch with Cloudflare "Just a moment..." page
  • Playwright headless gets detected and blocked
  • Need structured data extraction from dynamic content
  • Scraping auction sites, marketplaces, or other protected pages

Prerequisites

npm install @browserbasehq/stagehand zod

Required environment variables:

  • BROWSERBASE_API_KEY — from browserbase.com dashboard
  • BROWSERBASE_PROJECT_ID — from browserbase.com
  • GOOGLE_GENERATIVE_AI_API_KEY — for Gemini extraction (or use OpenAI)
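Assuming a POSIX shell, the three variables might be exported like this before running the scraper (all values below are placeholders, not real key formats):

```shell
# Placeholder values -- substitute your own keys from the Browserbase
# and Google AI dashboards. A .env file loaded via dotenv is a safer
# alternative to exporting secrets in your shell profile.
export BROWSERBASE_API_KEY="your-browserbase-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
export GOOGLE_GENERATIVE_AI_API_KEY="your-gemini-key"
```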

Quick Start

import { Stagehand } from '@browserbasehq/stagehand';

const stagehand = new Stagehand({
  env: 'BROWSERBASE',
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  model: {
    modelName: 'google/gemini-3-flash-preview',
    apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.context.pages()[0];

// Navigate (Cloudflare bypass is automatic)
await page.goto('https://protected-site.com/search?q=term');
await page.waitForTimeout(5000); // Let the page fully load

// AI-powered extraction (instruction-only works best)
const data = await stagehand.extract(
  `Extract all product listings as a JSON array:
   [{ "title": "...", "price": 123, "url": "..." }]
   Return ONLY the JSON array.`
);

await stagehand.close();

Key Patterns

1. Instruction-Only Extraction (Recommended)

Schema-based extraction often returns empty. Use natural language instructions instead:

const extraction = await stagehand.extract(
  `Look at this page and extract:
   - All item titles
   - Prices as numbers
   - URLs
   Return as a JSON array.`
);

2. Handle Cloudflare Delays

Sometimes the challenge takes longer:

const title = await page.title();
if (title.toLowerCase().includes('moment')) {
  await page.waitForTimeout(10000); // Wait for challenge
}
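The wait-and-check step above can be generalized into a small polling helper. This is a sketch, not part of the skill; `probe` stands for any async check you supply (for example, `async () => !(await page.title()).toLowerCase().includes('moment')`):

```javascript
// Poll an async predicate until it returns true or the timeout elapses.
// Resolves true on success and false on timeout, so the caller can
// decide whether to retry the navigation or abort.
async function waitUntil(probe, { timeoutMs = 15000, intervalMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await probe()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Returning `false` instead of throwing keeps the control flow in the scraper simple: a timed-out Cloudflare challenge is an expected condition, not an exceptional one.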

3. Scroll to Load More

Many sites lazy-load content:

for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
  await page.waitForTimeout(800);
}

4. Parse Extraction Results

The extraction returns a string that needs parsing:

let listings = [];
try {
  const jsonMatch = extraction?.extraction?.match(/\[[\s\S]*\]/);
  if (jsonMatch) listings = JSON.parse(jsonMatch[0]);
} catch (e) {
  console.log('Parse error:', e.message);
}
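The same parsing logic can be wrapped in a reusable helper. A sketch, assuming the extraction text has already been pulled out of the result object:

```javascript
// Extract the first JSON array embedded anywhere in a text blob.
// Returns [] when the input is not a string, no array is found,
// or the matched span is not valid JSON.
function parseListings(text) {
  if (typeof text !== 'string') return [];
  const match = text.match(/\[[\s\S]*\]/); // greedy: first '[' to last ']'
  if (!match) return [];
  try {
    return JSON.parse(match[0]);
  } catch (e) {
    console.log('Parse error:', e.message);
    return [];
  }
}
```

Note the greedy match spans from the first `[` to the last `]`, which is what you want when the model returns one array surrounded by chatter, but it will mis-parse output containing several separate arrays.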

Browserbase Free Tier Limits

  • 1 concurrent session — cron jobs can conflict with interactive use
  • Sessions auto-close after inactivity
  • Use stagehand.close() to release session immediately
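Since the free tier allows only one concurrent session, it is worth guaranteeing the release even when extraction throws. A try/finally wrapper, sketched here for any object exposing an async `close()` (the Stagehand usage in the comment is illustrative):

```javascript
// Run `work` against a session-like object and always close it
// afterwards, even if `work` throws. The error is re-raised after
// close() completes, so failures stay visible to the caller.
async function withSession(session, work) {
  try {
    return await work(session);
  } finally {
    await session.close();
  }
}

// Illustrative usage (sketch):
// await stagehand.init();
// const data = await withSession(stagehand, async (sh) => {
//   const page = sh.context.pages()[0];
//   await page.goto('https://protected-site.com');
//   return sh.extract('Extract the page title as JSON.');
// });
```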

Cron Integration

For scheduled scraping, use OpenClaw cron with isolated sessions:

openclaw cron add \
  --name "Daily Scrape" \
  --cron "0 6   " \
  --session isolated \
  --message "Run: node ~/scripts/scraper.js"

Troubleshooting

  • Empty extraction: use instruction-only extraction (no schema) and increase the wait time
  • Cloudflare loop: wait 10-15 s and check whether the title contains "moment"
  • Session limit: close other Browserbase sessions and check the dashboard
  • 429 errors: wait for the session to complete; don't retry immediately
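For the 429 case, exponential backoff is safer than immediate retries. A minimal sketch; the `isRetryable` predicate is an assumption about how your client surfaces the HTTP status (adapt it to the actual error shape):

```javascript
// Retry an async operation with exponential backoff.
// `isRetryable` decides which errors deserve another attempt,
// e.g. (err) => err.status === 429 for rate-limit responses.
async function retryWithBackoff(
  op,
  { retries = 3, baseDelayMs = 2000, isRetryable = () => true } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delayMs = baseDelayMs * 2 ** attempt; // 2s, 4s, 8s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```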

Example: Full Scraper

See scripts/example_scraper.js for a complete working example.

Data source: ClawHub · Chinese localization: 龙虾技能库