📦 YouTube and Bilibili Subtitle Extraction and Summary
v1.0.0 Extract subtitles from YouTube or Bilibili, save the raw transcript, and generate a structured summary. Triggers: ANY URL containing youtube.com, youtu.be, bilib...
Video Subtitle Extractor (YouTube + Bilibili)
Detect platform → download subtitles → clean → save raw → generate summary.
Step 1 — Ensure yt-dlp is available
if ! command -v yt-dlp &>/dev/null; then
  echo "yt-dlp not found, installing..."
  pip install -q yt-dlp || pip3 install -q yt-dlp
fi
yt-dlp -U --quiet 2>/dev/null || true
If installation fails, stop and tell the user to install yt-dlp manually (pip install yt-dlp or brew install yt-dlp).
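The availability check in Step 1 can be factored into a small reusable helper. This is a minimal sketch: the function name `have_cmd` is hypothetical and not part of the skill, and the message text is only illustrative.

```shell
#!/usr/bin/env bash
# have_cmd is a hypothetical helper name, not part of the original skill.
# It returns 0 when the named command is found on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd yt-dlp; then
  echo "yt-dlp found"
else
  echo "yt-dlp missing: install it with 'pip install yt-dlp' or 'brew install yt-dlp'"
fi
```

Checking with `command -v` keeps the test independent of how yt-dlp was installed (pip, Homebrew, or a standalone binary).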
Step 2 — Detect platform and download subtitles
Detect whether the URL is Bilibili or YouTube, then use the appropriate strategy.
URL=""
TMPDIR=$(mktemp -d)
SUB_FILE=""
SUBTITLE_LANG=""
# Detect platform
if echo "$URL" | grep -qE '(bilibili\.com|b23\.tv)'; then
  PLATFORM="bilibili"
  SITE_NAME="Bilibili"
  SITE_DOMAIN="bilibili.com"
else
  PLATFORM="youtube"
  SITE_NAME="YouTube"
  SITE_DOMAIN="youtube.com"
fi
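To see which URLs each branch receives, the same grep test can be wrapped in a function and tried on a few inputs. This is a sketch only: `detect_platform` is a hypothetical name, and the sample URLs are made up for illustration.

```shell
#!/usr/bin/env bash
# detect_platform is a hypothetical wrapper around the grep test used above.
# It prints "bilibili" for bilibili.com and b23.tv URLs and "youtube" otherwise.
detect_platform() {
  if printf '%s\n' "$1" | grep -qE '(bilibili\.com|b23\.tv)'; then
    echo "bilibili"
  else
    echo "youtube"
  fi
}

# Sample URLs for illustration only.
detect_platform "https://www.bilibili.com/video/BV1xx411c7mD"   # prints: bilibili
detect_platform "https://b23.tv/abc123"                         # prints: bilibili
detect_platform "https://youtu.be/dQw4w9WgXcQ"                  # prints: youtube
```

Note that anything that is not recognized as Bilibili falls through to the YouTube branch, so the caller is expected to pass only URLs that matched the skill's triggers.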
Bilibili branch
Bilibili subtitles require login cookies. Always use a cookies file, refreshing it from Chrome if it is missing or stale (older than 30 days):
if [ "$PLATFORM" = "bilibili" ]; then
  BILI_COOKIES="${BILIBILI_COOKIES_FILE:-$HOME/bilibili_cookies.txt}"
  NEED_REFRESH=false
  if [ ! -f "$BILI_COOKIES" ]; then
    NEED_REFRESH=true
  elif [ "$(find "$BILI_COOKIES" -mtime +30 2>/dev/null | wc -l | tr -d ' ')" -gt 0 ]; then
    echo "Bilibili cookies older than 30 days, refreshing..."
    NEED_REFRESH=true
  fi
  if [ "$NEED_REFRESH" = true ]; then
    echo "Reading cookies from Chrome (one-time keychain prompt)..."
    yt-dlp --cookies-from-browser chrome --cookies "$BILI_COOKIES" \
      --skip-download -i "https://www.bilibili.com/" 2>/dev/null
  fi
  COOKIE_ARGS="--cookies $BILI_COOKIES"
  # List available subtitle langs — capture stderr to detect login failure
  LIST_OUTPUT=$(yt-dlp --list-subs $COOKIE_ARGS "$URL" 2>&1)
  if echo "$LIST_OUTPUT" | grep -qi "login\|not logged\|需要登录\|please log"; then
    echo ""
    echo "❌ Bilibili cookies expired or invalid."
    echo " Fix: delete the cookies file and retry — it will re-read from Chrome."
    echo " rm \"$BILI_COOKIES\""
    rm -rf "$TMPDIR"
    exit 1
  fi
  AVAIL_LANGS=$(echo "$LIST_OUTPUT" | awk '/^[a-z]/{print $1}' | grep -v "^Language$")
  # Try ai-zh first, then any zh variant, then en
  for lang in ai-zh zh-Hans zh-CN zh en; do
    if echo "$AVAIL_LANGS" | grep -q "^${lang}$"; then
      yt-dlp \
        --write-sub \
        --sub-langs "$lang" \
        --skip-download \
        --retries 3 \
        -o "$TMPDIR/bili_%(id)s" \
        $COOKIE_ARGS \
        "$URL" 2>/dev/null
      SUB_FILE=$(ls "$TMPDIR"/*.${lang}.* 2>/dev/null | head -1)
      if [ -n "$SUB_FILE" ]; then
        SUBTITLE_LANG="$lang"
        break
      fi
    fi
  done
fi
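The cookie freshness rule in the Bilibili branch (refresh when the file is missing or older than 30 days) can be exercised on its own without touching Chrome or the network. The sketch below assumes a hypothetical helper name, `cookies_need_refresh`, and uses throwaway files in a temporary directory instead of a real cookies file.

```shell
#!/usr/bin/env bash
# cookies_need_refresh is a hypothetical helper mirroring the check above:
# it succeeds (exit 0) when the file is missing or matches find's -mtime +30.
cookies_need_refresh() {
  local file="$1"
  if [ ! -f "$file" ]; then
    return 0
  fi
  [ -n "$(find "$file" -mtime +30 2>/dev/null)" ]
}

# Throwaway files for illustration only.
demo_dir=$(mktemp -d)
touch "$demo_dir/fresh.txt"                  # modified just now
touch -t 200001010000 "$demo_dir/stale.txt"  # modified in January 2000

for f in fresh.txt stale.txt missing.txt; do
  if cookies_need_refresh "$demo_dir/$f"; then
    echo "$f: refresh"
  else
    echo "$f: ok"
  fi
done
rm -rf "$demo_dir"
```

Because `find -mtime +30` counts whole 24-hour periods and requires more than 30 of them, a file has to be at least 31 days old before it is treated as stale.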
YouTube branch
if [ "$PLATFORM" = "youtube" ]; then
  for lang in zh-Hans zh-CN zh en; do
    yt-dlp \
      --write-subs \
      --write-auto-subs \
      --sub-langs "$lang" \
      --skip-download \
      --sub-format vtt \
      --retries 3 \
      --sleep-requests 1 \
      -o "$TMPDIR/yt_%(id)s" \
      "$URL" 2>/dev/null
    SUB_FILE=$(ls "$TMPDIR"/*.${lang}.vtt 2>/dev/null | head -1)
    if [ -n "$SUB_FILE" ]; then
      SUBTITLE_LANG="$lang"
      break
    fi
    sleep 1
  done
fi
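Both branches walk a priority list of language codes and stop at the first one that yields a subtitle file. The selection logic alone can be sketched as below. `pick_lang` is a hypothetical helper, the list of available codes is sample data rather than real yt-dlp output, and the exact-line match uses `grep -xF` in place of the anchored pattern in the skill.

```shell
#!/usr/bin/env bash
# pick_lang is a hypothetical helper: given a newline-separated list of
# available language codes and a priority list, it prints the first priority
# code that appears as a whole line. It returns 1 when none match.
pick_lang() {
  local available="$1"
  shift
  local lang
  for lang in "$@"; do
    if printf '%s\n' "$available" | grep -qxF -- "$lang"; then
      echo "$lang"
      return 0
    fi
  done
  return 1
}

# Sample data for illustration only, not real yt-dlp output.
available=$(printf '%s\n' en zh-CN ja)
pick_lang "$available" ai-zh zh-Hans zh-CN zh en   # prints: zh-CN
```

Matching whole lines matters here: a plain substring match on `zh` would also accept `zh-Hans` or `zh-CN`, which would defeat the priority order.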
Fail if no subtitles
if [ -z "$SUB_FILE" ]; then
  echo "No subtitles found for this video."
  echo " - No manually uploaded subtitles"
  echo " - No auto-generated subtitles"
  echo "Cannot proceed without a transcript."
  rm -rf "$TMPDIR"
  exit 1
fi
Step 3 — Clean subtitle file → plain text
Detect format (SRT vs VTT) and clean accordingly:
EXT="${SUB_FILE##*.}"
if [ "$EXT" = "srt" ]; then
  # SRT: remove sequence numbers, timestamps, HTML tags, deduplicate
  grep -v "^[0-9]*$" "$SUB_FILE" \
    | grep -v "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9]* --> " \
    | sed 's/<[^>]*>//g' \
    | grep -v "^$" \
    | python3 -c "
import sys, html
seen = set()
for line in sys.stdin:
    line = html.unescape(line).strip()
    if line and line not in seen:
        seen.add(line)
        print(line)
print()
" > "$TMPDIR/cleaned.txt"
else
  # VTT: remove tags, header lines, timestamps, then deduplicate
  sed 's/<[^>]*>//g' "$SUB_FILE" \
    | grep -v "^WEBVTT" \
    | grep -v "^NOTE" \
    | grep -v "^Kind:" \
    | grep -v "^Language:" \
    | grep -v "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]" \
    | grep -v "^$" \
    | python3 -c "
import sys, html
seen = set()
for line in sys.stdin:
    line = html.unescape(line).strip()
    if line and line not in seen:
        seen.add(line)
        print(line)
print()
" > "$TMPDIR/cleaned.txt"
fi
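The effect of the VTT cleanup is easiest to see on a small sample. The sketch below is a simplified variant, not the skill's pipeline: `clean_vtt` is a hypothetical helper, it deduplicates with awk instead of the python3 step, it does not unescape HTML entities, and the cue data is invented for illustration.

```shell
#!/usr/bin/env bash
# clean_vtt is a hypothetical helper that reads WebVTT on stdin and prints
# deduplicated plain text. It swaps the python3 step for awk and skips
# HTML entity unescaping.
clean_vtt() {
  sed 's/<[^>]*>//g' \
    | grep -v -e '^WEBVTT' -e '^NOTE' -e '^Kind:' -e '^Language:' \
    | grep -v '^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | awk 'NF && !seen[$0]++'
}

# Sample cue data for illustration only.
clean_vtt <<'EOF'
WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000
hello <c>world</c>

00:00:02.000 --> 00:00:04.000
hello world
second line
EOF
# prints:
# hello world
# second line
```

Deduplication is done across the whole file rather than only between neighbouring cues, which suits auto-generated captions that repeat the previous line in each new cue, at the cost of also dropping a sentence the speaker genuinely repeats later.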
Step 4 — Resolve output directory and set filename
OUTPUT_DIR="${YOUTUBE_SUBTITLES_DIR:-.}"
mkdir -p "$OUTPUT_DIR"
Use the original video title as the filename. Only strip characters illegal on macOS (/ and ASCII :); preserve all other characters including fullwidth punctuation (:、《》、、). Truncate to 100 chars:
SLUG=$(echo "