Version
Initial release of browser-scraper.

- Enables scraping of websites using a real Chrome browser and user Chrome profile to bypass bot detection and access authenticated content.
- Supports both default system Chrome profiles and custom named profiles for isolated sessions.
- Offers optional features: headless mode, adjustable wait times for dynamic content, and an interactive mode that keeps the browser open.
- Outputs extracted data as JSON and saves page screenshots.
- Requires Playwright and a local Chrome/Chromium installation.
- Includes troubleshooting and usage tips for avoiding profile/lock conflicts and improving scrape results.
Skill Documentation
Scrapes web pages using Playwright with a real Chrome/Chromium binary and an existing user profile. Bypasses bot detection by reusing the profile's existing cookies, fingerprint, and session.
Profiles

The scraper supports multiple Chrome profiles:

- Default (no `--profile` flag): uses the system's default Chrome profile
  - macOS: `~/Library/Application Support/Google/Chrome/Default`
  - Linux: `~/.config/google-chrome/Default`
  - Windows: `%LOCALAPPDATA%\Google\Chrome\User Data\Default`
- Named profile (`--profile <name>`): uses `profiles/<name>/` under the skill directory
  - To create one: run Chrome with `--profile-directory="Profile 1"` or similar, then point the scraper at that folder
  - Useful for: isolating logins, avoiding conflicts with your main Chrome session, scraping without auth

Script
```bash
# Default profile (system Chrome)
node scripts/scrape.mjs <url> [css_selector]

# Named profile (profiles/<name>/)
node scripts/scrape.mjs <url> [css_selector] --profile <name>

# Headless mode (faster, higher block risk)
node scripts/scrape.mjs <url> --headless --profile <name>

# Keep browser open after scraping (for interactive use)
node scripts/scrape.mjs <url> --profile <name> --keep-open

# Extra wait for lazy-loaded content (default: 3000ms)
node scripts/scrape.mjs <url> --profile <name> --wait 6000
```
Run from the skill directory:

```bash
cd ~/.openclaw-yekeen/workspace/skills/browser-scraper/
node scripts/scrape.mjs https://www.reddit.com/
```
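The flags shown above could be handled by a small parser like the sketch below. The function name, defaults, and structure are assumptions for illustration; the actual parsing inside `scrape.mjs` may differ.

```javascript
// Hypothetical CLI flag parser matching the usage above (not the skill's
// actual code). Positional args: <url>, then an optional [css_selector].
function parseCliArgs(argv) {
  const opts = {
    url: null,
    selector: null,
    profile: null,
    headless: false,
    keepOpen: false,
    wait: 3000, // default extra wait in ms, per the usage comment above
  };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--headless') opts.headless = true;
    else if (arg === '--keep-open') opts.keepOpen = true;
    else if (arg === '--profile') opts.profile = argv[++i];
    else if (arg === '--wait') opts.wait = Number(argv[++i]);
    else positional.push(arg);
  }
  opts.url = positional[0] ?? null;
  opts.selector = positional[1] ?? null;
  return opts;
}
```

For example, `parseCliArgs(process.argv.slice(2))` on `node scripts/scrape.mjs <url> --profile work --wait 6000` would yield `profile: 'work'` and `wait: 6000`.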
Output

- JSON to stdout: matched elements or a page preview
- Screenshot saved to `/tmp/browser-scraper-last.png`
Key Design

- `channel: 'chrome'`: launches real Chrome when available, falls back to system Chromium
- `launchPersistentContext` with the profile directory
- `--disable-blink-features=AutomationControlled` plus a `navigator.webdriver` patch
- `headless: false` by default to avoid SingletonLock conflicts
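The design notes above correspond roughly to a Playwright launch like this minimal sketch. The helper names (`buildLaunchOptions`, `scrapeWithProfile`) are illustrative, not the skill's actual code, and running it requires Playwright plus a local Chrome install.

```javascript
// Launch options matching the design notes above, factored out so the
// configuration can be inspected separately from the launch itself.
function buildLaunchOptions({ headless = false } = {}) {
  return {
    channel: 'chrome', // prefer a real Chrome install over bundled Chromium
    headless,          // headed by default, matching the SingletonLock note
    args: ['--disable-blink-features=AutomationControlled'],
  };
}

// Illustrative scrape flow (requires `npm install playwright` and Chrome).
async function scrapeWithProfile(profileDir, url) {
  const { chromium } = await import('playwright');
  const context = await chromium.launchPersistentContext(
    profileDir,
    buildLaunchOptions()
  );
  // Patch navigator.webdriver so it runs before page scripts on navigation.
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  });
  const page = context.pages()[0] ?? (await context.newPage());
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  const html = await page.content();
  await context.close();
  return html;
}
```

`launchPersistentContext` (rather than `launch`) is what makes the existing profile's cookies and sessions available to the scrape.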
Requirements

- Playwright installed: `npm install playwright`
- Chrome or Chromium installed on the system
- On macOS/Linux: the `channel: 'chrome'` option requires Chrome (not Chromium) to be installed
Tips

- Chrome must not already be open with the target profile (SingletonLock error). Close Chrome first, or use a named profile to avoid conflicts.
- If you get a SingletonLock error with a named profile, delete the `SingletonLock` file in that profile directory and try again.
- Use `--keep-open` to leave the browser open for interactive use after scraping; press Ctrl+C to close.
- For sites with lazy-loaded content: use the `--wait` flag or modify the script to increase `waitForTimeout`.
- For Reddit: use the selector `shreddit-post` and read its attributes (`post-title`, `author`, `score`, `permalink`).
- To create a fresh isolated profile: run Chrome from the terminal with `--profile-directory="Profile X"` and log in, then point the scraper at that directory.
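As an example of the Reddit tip above, each `shreddit-post` element carries its metadata as attributes, so a scrape can map them to plain records. The mapper below is illustrative; it accepts anything exposing `getAttribute`, which is how DOM elements behave inside the page.

```javascript
// Maps one element-like object (e.g. a DOM element matched by the
// `shreddit-post` selector) to a plain post record, per the tip above.
function postFromElement(el) {
  return {
    title: el.getAttribute('post-title'),
    author: el.getAttribute('author'),
    score: Number(el.getAttribute('score')), // attributes arrive as strings
    permalink: el.getAttribute('permalink'),
  };
}

// Inside Playwright the same mapping runs in the page context, e.g.:
//   const posts = await page.$$eval('shreddit-post', (els) =>
//     els.map((el) => ({
//       title: el.getAttribute('post-title'),
//       author: el.getAttribute('author'),
//       score: Number(el.getAttribute('score')),
//       permalink: el.getAttribute('permalink'),
//     })));
```

Note that the callback passed to `$$eval` is serialized into the page, so the mapping must be written inline there rather than referencing `postFromElement` from the Node side.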