Version
Initial release of browser-scraper.

- Enables scraping of websites using a real Chrome browser and user Chrome profile to bypass bot detection and access authenticated content.
- Supports both default system Chrome profiles and custom named profiles for isolated sessions.
- Offers optional features: headless mode, adjustable wait times for dynamic content, and an interactive mode that keeps the browser open.
- Outputs extracted data as JSON and saves page screenshots.
- Requires Playwright and a local Chrome/Chromium installation.
- Includes troubleshooting and usage tips for avoiding profile/lock conflicts and improving scrape results.
Skill Documentation
Scrapes web pages using Playwright with a real Chrome/Chromium binary and an existing user profile. Bypasses bot detection by reusing the profile's existing cookies, fingerprint, and session.
Profiles

The scraper supports multiple Chrome profiles:

- Default (no `--profile` flag): uses the system's default Chrome profile
  - macOS: `~/Library/Application Support/Google/Chrome/Default`
  - Linux: `~/.config/google-chrome/Default`
  - Windows: `%LOCALAPPDATA%\Google\Chrome\User Data\Default`
- Named profile (`--profile <name>`): uses `profiles/<name>/` under the skill directory
  - To create one: run Chrome with `--profile-directory="Profile 1"` or similar, then point the scraper at that folder
  - Useful for: isolating logins, avoiding conflicts with your main Chrome session, scraping without auth

Script
```bash
# Default profile (system Chrome)
node scripts/scrape.mjs <url> [css_selector]

# Named profile (profiles/<name>/)
node scripts/scrape.mjs <url> [css_selector] --profile <name>

# Headless mode (faster, higher block risk)
node scripts/scrape.mjs <url> --headless --profile <name>

# Keep browser open after scraping (for interactive use)
node scripts/scrape.mjs <url> --profile <name> --keep-open

# Extra wait for lazy-loaded content (default: 3000ms)
node scripts/scrape.mjs <url> --profile <name> --wait 6000
```
Run from the skill directory:

```bash
cd ~/.openclaw-yekeen/workspace/skills/browser-scraper/
node scripts/scrape.mjs https://www.reddit.com/
```
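The flags shown above could be handled by a small parser like the sketch below. The function name, defaults, and structure are assumptions for illustration; the actual parsing inside `scrape.mjs` may differ.

```javascript
// Hypothetical CLI flag parser matching the usage above (not the skill's
// actual code). Positional args: <url>, then an optional [css_selector].
function parseCliArgs(argv) {
  const opts = {
    url: null,
    selector: null,
    profile: null,
    headless: false,
    keepOpen: false,
    wait: 3000, // default extra wait in ms, per the usage comment above
  };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--headless') opts.headless = true;
    else if (arg === '--keep-open') opts.keepOpen = true;
    else if (arg === '--profile') opts.profile = argv[++i];
    else if (arg === '--wait') opts.wait = Number(argv[++i]);
    else positional.push(arg);
  }
  opts.url = positional[0] ?? null;
  opts.selector = positional[1] ?? null;
  return opts;
}
```

For example, `parseCliArgs(process.argv.slice(2))` on `node scripts/scrape.mjs <url> --profile work --wait 6000` would yield `profile: 'work'` and `wait: 6000`.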
Output

- JSON to stdout: matched elements or a page preview
- Screenshot saved to `/tmp/browser-scraper-last.png`
Key Design

- `channel: 'chrome'`: launches real Chrome when available, falls back to system Chromium
- `launchPersistentContext` with the profile directory
- `--disable-blink-features=AutomationControlled` plus a `navigator.webdriver` patch
- `headless: false` by default to avoid SingletonLock conflicts
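The design notes above correspond roughly to a Playwright launch like this minimal sketch. The helper names (`buildLaunchOptions`, `scrapeWithProfile`) are illustrative, not the skill's actual code, and running it requires Playwright plus a local Chrome install.

```javascript
// Launch options matching the design notes above, factored out so the
// configuration can be inspected separately from the launch itself.
function buildLaunchOptions({ headless = false } = {}) {
  return {
    channel: 'chrome', // prefer a real Chrome install over bundled Chromium
    headless,          // headed by default, matching the SingletonLock note
    args: ['--disable-blink-features=AutomationControlled'],
  };
}

// Illustrative scrape flow (requires `npm install playwright` and Chrome).
async function scrapeWithProfile(profileDir, url) {
  const { chromium } = await import('playwright');
  const context = await chromium.launchPersistentContext(
    profileDir,
    buildLaunchOptions()
  );
  // Patch navigator.webdriver so it runs before page scripts on navigation.
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  });
  const page = context.pages()[0] ?? (await context.newPage());
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  const html = await page.content();
  await context.close();
  return html;
}
```

`launchPersistentContext` (rather than `launch`) is what makes the existing profile's cookies and sessions available to the scrape.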
Requirements

- Playwright installed: `npm install playwright`
- Chrome or Chromium installed on the system
- On macOS/Linux: the `channel: 'chrome'` option requires Chrome (not Chromium) to be installed
Tips

- Chrome must not already be open with the target profile (SingletonLock error). Close Chrome first, or use a named profile to avoid conflicts.
- If you get a SingletonLock error with a named profile, delete the `SingletonLock` file in that profile directory and try again.
- Use `--keep-open` to leave the browser open for interactive use after scraping; press Ctrl+C to close.
- For sites with lazy-loaded content: use the `--wait` flag or modify the script to increase `waitForTimeout`.
- For Reddit: use the selector `shreddit-post` and read its attributes (`post-title`, `author`, `score`, `permalink`).
- To create a fresh isolated profile: run Chrome from the terminal with `--profile-directory="Profile X"` and log in, then point the scraper at that directory.
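As an example of the Reddit tip above, each `shreddit-post` element carries its metadata as attributes, so a scrape can map them to plain records. The mapper below is illustrative; it accepts anything exposing `getAttribute`, which is how DOM elements behave inside the page.

```javascript
// Maps one element-like object (e.g. a DOM element matched by the
// `shreddit-post` selector) to a plain post record, per the tip above.
function postFromElement(el) {
  return {
    title: el.getAttribute('post-title'),
    author: el.getAttribute('author'),
    score: Number(el.getAttribute('score')), // attributes arrive as strings
    permalink: el.getAttribute('permalink'),
  };
}

// Inside Playwright the same mapping runs in the page context, e.g.:
//   const posts = await page.$$eval('shreddit-post', (els) =>
//     els.map((el) => ({
//       title: el.getAttribute('post-title'),
//       author: el.getAttribute('author'),
//       score: Number(el.getAttribute('score')),
//       permalink: el.getAttribute('permalink'),
//     })));
```

Note that the callback passed to `$$eval` is serialized into the page, so the mapping must be written inline there rather than referencing `postFromElement` from the Node side.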