Social Media Data Collector
v1.0.0
Multi-platform social media data collection and aggregation for content performance tracking. Use when: (1) collecting engagement metrics (views/likes/comments/shares) across multiple platforms, (2) filling a bitable/spreadsheet with social media performance data, (3) tracking content distribution results across 10+ platforms, (4) you need to scrape platforms that have no API. Covers: Douyin, Weibo, Kuaishou, Bilibili, Toutiao, Xiaohongshu, WeChat Video (视频号), Autohome (汽车之家), Yiche (易车), Baijiahao (百家号), Douyu (斗鱼), Pipixia (皮皮虾), Dongchedi (懂车帝), TikTok, YouTube. NOT for: posting content, account management, or social listening/monitoring.
Install command: openclaw skills install social-media-data-collector
Skill Documentation
Social Media Data Collector

Overview
Collect engagement metrics from 13+ platforms and aggregate them into a structured format (Feishu Bitable (飞书多维表格) or CSV). Three-tier approach: API first → browser scrape fallback → manual flag.
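The three-tier approach can be sketched roughly as below. This is a minimal illustration under assumptions, not the skill's actual code: `collect`, `CollectResult`, and the two fetcher callables are hypothetical names, and "no data" is assumed to mean `None` or an empty dict.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Metrics = dict  # hypothetical shape, e.g. {"views": 1200, "likes": 30}
Fetcher = Callable[[str], Optional[Metrics]]


@dataclass
class CollectResult:
    url: str
    tier: str  # "api", "browser", or "manual"
    metrics: Optional[Metrics]


def collect(url: str, api_fetch: Fetcher, browser_fetch: Fetcher) -> CollectResult:
    """Try the API tier, then the browser tier, else flag for manual check."""
    for tier, fetch in (("api", api_fetch), ("browser", browser_fetch)):
        try:
            metrics = fetch(url)
        except Exception:
            metrics = None  # treat any failure in this tier as "no data"
        if metrics:  # None or an empty dict falls through to the next tier
            return CollectResult(url, tier, metrics)
    # Neither tier produced data: flag the item instead of guessing values.
    return CollectResult(url, "manual", None)
```

Passing the fetchers in as arguments keeps the fallback order in one place and lets each tier be swapped or stubbed per platform.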
Execution Flow

1. Classify platforms by data access method (see references/platform-guide.md)
2. API tier: call APIs for platforms with programmatic access
3. Browser tier: Playwright render + text extraction for the remaining platforms
4. Aggregate: normalize data, write to target (bitable/CSV)
5. Cleanup: remove screenshots, temp files, browser cache

Platform Tiers

| Tier | Platforms | Method |
|---|---|---|
| API-first | Douyin, Weibo, Kuaishou, Bilibili, Toutiao, Xiaohongshu | TikHub API / BlueAI crawler |
| Browser-scrape | Baijiahao, Autohome, Yiche, WeChat Video, Douyu, Pipixia | Playwright headless |
| API+scrape | Dongchedi | TikHub (limited) + scrape |

Model Strategy (Token Optimization)

Problem
Using opus/sonnet for the entire pipeline wastes tokens on mechanical tasks.
Recommended Model Split

| Phase | Model | Why |
|---|---|---|
| Planning & classification | opus/sonnet | Needs reasoning |
| API calls & JSON parsing | haiku/flash | Mechanical, no reasoning needed |
| Browser text extraction | Code (no LLM) | Pure Python, no model call |
| Data normalization | haiku/flash | Simple mapping |
| Report/summary | sonnet | Needs synthesis |

Implementation

- Use scripts/collect_api.py for the API tier: zero LLM tokens (pure code)
- Use scripts/collect_browser.py for the browser tier: zero LLM tokens (pure code)
- Only invoke an LLM for: planning which platforms to hit, handling errors, writing summaries

Token Budget Estimate (per 13-platform run)

- With the current approach (all-opus): ~80k tokens
- With the optimized approach (code scripts + haiku routing): ~5k tokens
- Savings: ~94%

Key Commands

    # Full collection run
    python3 scripts/collect_api.py --config /tmp/sm-collect/config.json

    # Browser scrape specific platforms
    python3 scripts/collect_browser.py --platforms "百家号,汽车之家,视频号"

    # Write to bitable
    python3 scripts/write_bitable.py --app-token XXX --table-id YYY --data /tmp/sm-collect/results.json

    # Cleanup
    rm -rf /tmp/sm-collect/ /tmp/screenshots/
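For orchestration code that decides when to call a model at all, the Recommended Model Split could be held in a small lookup table. The sketch below is illustrative only: the phase keys and function names are hypothetical, the model tiers are copied from the table in the Model Strategy section, and the savings figure simply reproduces the token budget arithmetic.

```python
from typing import Optional

# Phase -> model tier, copied from the Recommended Model Split table.
# None means the phase runs as plain code with no model call.
# The phase keys are hypothetical labels chosen for this sketch.
MODEL_BY_PHASE = {
    "planning": "opus/sonnet",
    "api_parsing": "haiku/flash",
    "browser_extraction": None,
    "normalization": "haiku/flash",
    "report": "sonnet",
}


def model_for(phase: str) -> Optional[str]:
    """Return the model tier for a phase, or None if no LLM is needed."""
    if phase not in MODEL_BY_PHASE:
        raise ValueError(f"unknown phase: {phase}")
    return MODEL_BY_PHASE[phase]


def savings(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens saved when moving from `before` to `after`."""
    return 1 - after_tokens / before_tokens


# 80k -> 5k tokens is a saving of 0.9375, which rounds to the quoted 94%.
print(f"{savings(80_000, 5_000):.0%}")
```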
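Bitable Field Mapping

| Bitable field | Type | Notes |
|---|---|---|
| 播放量 (views) | text | Text with a "万" (10,000) suffix |
| 点赞 (likes) | number | Plain number |
| 评论 (comments) | number | Plain number |
| 分享 (shares) | number | Plain number |
| 收藏 (favorites) | number | Plain number |
| 互动量合计 (total interactions) | text | Text with a "万" suffix |
| 数据统计日期 (statistics date) | text | Format "2026.5.15" |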
⚠️ Note: 播放量 and 互动量合计 are text fields, not number fields! Passing a number raises TextFieldConvFail.
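One way to respect these field types is to format each record before writing it. The sketch below makes several assumptions that the source does not confirm: the input keys (`views`, `likes`, and so on) are hypothetical, the "万" text uses one decimal place with a trailing ".0" dropped, counts below 10,000 are written as plain digits, and 互动量合计 is taken to be the sum of likes, comments, shares, and favorites. Adjust all of these to match the target table.

```python
from datetime import date


def to_wan_text(n: int) -> str:
    """Render a count as text, using a 万 (10,000) suffix for large values.

    Assumption: one decimal place, trailing ".0" dropped; counts below
    10,000 are written as plain digits.
    """
    if n < 10_000:
        return str(n)
    text = f"{n / 10_000:.1f}"
    if text.endswith(".0"):
        text = text[:-2]
    return text + "万"


def to_bitable_fields(m: dict, day: date) -> dict:
    """Map normalized metrics (hypothetical keys) onto the bitable fields."""
    # Assumption: total interactions = likes + comments + shares + favorites.
    total = m["likes"] + m["comments"] + m["shares"] + m["favorites"]
    return {
        "播放量": to_wan_text(m["views"]),   # text field, never a number
        "点赞": m["likes"],                  # number field
        "评论": m["comments"],               # number field
        "分享": m["shares"],                 # number field
        "收藏": m["favorites"],              # number field
        "互动量合计": to_wan_text(total),    # text field, never a number
        # Unpadded month and day, matching the "2026.5.15" format.
        "数据统计日期": f"{day.year}.{day.month}.{day.day}",
    }
```

Keeping the text conversion in one helper makes it harder to pass a raw number to a text field by accident, which is the case that triggers TextFieldConvFail.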
Cleanup Protocol

After each collection run, delete:
- /tmp/sm-collect/ (intermediate JSON)
- /tmp/screenshots/ (browser screenshots)
- /tmp/subagent-out/ (if sub-agents were spawned)
- Any .json temp files in the workspace

Error Handling

- API 403/401 → token expired; refresh and retry once
- Browser timeout → increase to 25s, retry with wait_until="domcontentloaded"
- Platform redirects → check that the URL is correct (Yiche (易车): hao vs sv domain!)
- Empty data → flag for manual check, don't guess

Platform-Specific Notes
See references/platform-guide.md for detailed per-platform experience, including:

- Authentication requirements
- URL patterns and gotchas
- Data extraction selectors
- Known limitations
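As a closing illustration of two Error Handling rules (refresh and retry once on 401/403, and flag empty data rather than guess), here is a minimal sketch. `AuthError`, `fetch_with_refresh`, and the injected callables are hypothetical names; the skill's real API client is not shown in this document.

```python
from typing import Callable, Optional


class AuthError(Exception):
    """Raised by the (hypothetical) API client on HTTP 401/403."""


def fetch_with_refresh(
    fetch: Callable[[str], dict],
    refresh_token: Callable[[], str],
    token: str,
) -> Optional[dict]:
    """Call fetch(token); on an auth error, refresh the token and retry once.

    Returns None when no data could be obtained, so the caller can flag
    the item for manual check instead of guessing.
    """
    for attempt in range(2):  # first try plus exactly one retry
        try:
            data = fetch(token)
        except AuthError:
            if attempt == 1:
                return None  # still failing after the refresh
            token = refresh_token()
            continue
        return data or None  # an empty payload also becomes a manual flag
    return None
```

The browser-tier rule is analogous: a single retry with a 25-second timeout and wait_until="domcontentloaded".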