Crawlbase
Crawlbase is a web crawling API that helps developers extract data from websites. It handles proxies, CAPTCHAs, and JavaScript rendering, so users can reliably scrape data at scale. It is used by data scientists, researchers, and businesses needing web data for analysis or other applications.
Official docs: https://crawlbase.com/docs/
When to use which actions: Use action names and parameters as needed.
Working with Crawlbase
This skill uses the Membrane CLI to interact with Crawlbase. Membrane handles authentication and credential refresh automatically — so you can focus on the integration logic rather than auth plumbing.
Installing the CLI
Install the Membrane CLI so you can run membrane from the terminal:
npm install -g @membranehq/cli@latest
Authentication
membrane login --tenant --clientName=<agentType>
This will either open a browser for authentication or print an authorization URL to the console, depending on whether interactive mode is available.
Headless environments: The command will print an authorization URL. Ask the user to open it in a browser. When they see a code after completing login, finish with:
membrane login complete
Add --json to any command for machine-readable JSON output.
Agent Types: claude, OpenClaw, codex, warp, windsurf, etc. These are used to adjust tooling to work best with your harness.
Connecting to Crawlbase
Use membrane connection ensure to find or create a connection by app URL or domain:
membrane connection ensure "https://crawlbase.com/" --json
The user completes authentication in the browser. The output contains the new connection id.
This is the fastest way to get a connection. The URL is normalized to a domain and matched against known apps. If no app is found, one is created and a connector is built automatically.
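In a script, the new connection id can be pulled out of the JSON output. The following is a sketch, not part of the official CLI docs: it assumes the --json output includes a top-level "id" field and parses it with sed rather than a JSON tool.

```shell
# ensure_connection: find or create a connection for an app URL and print its id.
# Assumes (not documented here) that the --json output has a top-level "id" field.
ensure_connection() {
  membrane connection ensure "$1" --json |
    sed -n 's/.*"id" *: *"\([^"]*\)".*/\1/p'
}
```

Usage: `CONNECTION_ID=$(ensure_connection "https://crawlbase.com/")`.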
If the returned connection has status: "READY", skip to Step 2.
1b. Wait for the connection to be ready
If the connection is in BUILDING status, poll until it's ready:
npx @membranehq/cli connection get --wait --json
The --wait flag long-polls (up to --timeout seconds, default 30) until the status changes. Keep polling until status is no longer BUILDING.
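The polling step can be sketched as a small shell function. This is a sketch under assumptions: it takes the top-level "status" field of the --json output on faith and extracts it with sed instead of a JSON parser.

```shell
# wait_for_ready: poll the connection until it leaves the BUILDING status,
# then print the final status. Each call long-polls via --wait.
wait_for_ready() {
  while :; do
    status=$(membrane connection get --wait --json |
      sed -n 's/.*"status" *: *"\([A-Za-z_]*\)".*/\1/p')
    if [ "$status" != "BUILDING" ]; then
      echo "$status"
      return 0
    fi
    sleep 2
  done
}
```

The caller then branches on the printed status (READY, CLIENT_ACTION_REQUIRED, etc.) as described below.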
The resulting status tells you what to do next:
READY — connection is fully set up. Skip to Step 2.
CLIENT_ACTION_REQUIRED — the user or agent needs to do something. The clientAction object describes the required action:
clientAction.type — the kind of action needed:
"connect" — user needs to authenticate (OAuth, API key, etc.). This covers initial authentication and re-authentication for disconnected connections.
"provide-input" — more information is needed (e.g. which app to connect to).
clientAction.description — human-readable explanation of what's needed.
clientAction.uiUrl (optional) — URL to a pre-built UI where the user can complete the action. Show this to the user when present.
clientAction.agentInstructions (optional) — instructions for the AI agent on how to proceed programmatically.
After the user completes the action (e.g. authenticates in the browser), poll again with membrane connection get --json to check if the status moved to READY.
CONFIGURATION_ERROR or SETUP_FAILED — something went wrong. Check the error field for details.
Searching for actions
Search using a natural language description of what you want to do:
membrane action list --connectionId=CONNECTION_ID --intent "query" --limit 10 --json
You should always search for actions in the context of a specific connection.
Each result includes id, name, description, inputSchema (what parameters the action accepts), and outputSchema (what it returns).
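A script might consume the search results like this. The exact JSON shape is an assumption; of the fields listed above, only id is extracted, using grep/sed rather than a JSON parser.

```shell
# list_action_ids: search actions for a connection and print each result's id,
# one per line. $1 = connection id, $2 = natural-language intent.
list_action_ids() {
  membrane action list --connectionId="$1" --intent "$2" --limit 10 --json |
    grep -o '"id" *: *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}
```

Usage: `list_action_ids "$CONNECTION_ID" "crawl a page and return its HTML"`.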
Popular actions
Get Storage Total Count (get-storage-total-count): Get the total count of items stored in Crawlbase Cloud Storage.
Delete Stored Results in Bulk (delete-stored-results-bulk): Delete multiple stored crawl results from Crawlbase Cloud Storage in a single request.
List Stored Request IDs (list-stored-rids): Get a list of request IDs (RIDs) stored in Crawlbase Cloud Storage.
Get Stored Results in Bulk (get-stored-results-bulk): Retrieve multiple stored crawl results from Crawlbase Cloud Storage in a single request (max 100 RIDs).
Delete Stored Result (delete-stored-result): Delete a stored crawl result from Crawlbase Cloud Storage by request ID (RID).
Get Stored Result (get-stored-result): Retrieve a previously crawled page from Crawlbase Cloud Storage by request ID (RID) or URL.
Get Account Stats (get-account-stats): Get account usage statistics including successful/failed requests, credits remaining, and domain-level stats for the ...
Crawl URL with POST (crawl-url-post): Crawl a web page using the POST method, useful for submitting forms or API requests that require POST data.
Crawl URL (crawl-url): Crawl a web page and retrieve its HTML content using Crawlbase's proxy network.
Running actions
membrane action run --connection
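The run command above is truncated in the source. Purely as an illustration — the --id and --input flag names and the input payload shape are assumptions, not documented flags — a full invocation of the crawl-url action might be wrapped like this:

```shell
# Hypothetical sketch: run the crawl-url action against a connection.
# The --id and --input flags and the input JSON shape are assumed, not documented.
# $1 = connection id, $2 = URL to crawl.
run_crawl_url() {
  membrane action run \
    --connectionId="$1" \
    --id=crawl-url \
    --input "{\"url\": \"$2\"}" \
    --json
}
```

Check the CLI's own help (`membrane action run --help`) for the actual flag names before relying on this.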