📦 Baoyu Youtube Transcript · Skill Tool

v1.103.1

Downloads YouTube video transcripts/subtitles and cover images by URL or video ID. Supports multiple languages, translation, chapters, and speaker identifica...

by @jimliu (Jim Liu 宝玉)·MIT-0
License
MIT-0
Last updated
2026/4/13
Security scan
VirusTotal
Benign
OpenClaw
Safe
high confidence
The skill's code, runtime instructions, and resource usage line up with its description: it fetches YouTube metadata/transcripts, caches them to disk, and falls back to yt-dlp when needed.
Recommendation
This skill appears to do exactly what it says: fetch YouTube transcripts and thumbnails, cache them locally, and optionally fall back to yt-dlp. Things to consider before installing/running:
  • It will perform network requests to YouTube and write files under the output directory (default ./youtube-transcript). If you care about disk location or multi-user privacy, set --output-dir to a suitable path.
  • The code may spawn yt-dlp as a fallback (child_process.spawnSync is present). If yt-dlp is i...
Detailed analysis
Purpose & Capability
The name/description (download YouTube transcripts and cover images) matches the included scripts and runtime instructions. Required binaries (bun or npx) are only for running the provided TypeScript scripts; no unrelated credentials or config paths are requested.
Instruction Scope
Instructions stay within the stated purpose but explicitly perform network requests to YouTube (InnerTube) and write output to a local cache/output directory (default: ./youtube-transcript). They also describe a fallback to yt-dlp and the ability to pass browser cookies to yt-dlp. These behaviors are expected for a transcript downloader but you should be aware the skill will: fetch HTML, extract an InnerTube API key from the page, call YouTube endpoints, download thumbnails, and create files under the chosen output directory.
Install Mechanism
There is no install spec (instruction-only in the registry), and the included source is executed via bun or npx. No remote archives or arbitrary downloads are performed by the installer. This is a low-risk install model; runtime network activity occurs when you run the script.
Credentials
The skill declares no required environment variables. It documents a single optional env var (YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER) used for yt-dlp cookies-from-browser fallback; this is reasonable and proportional to its stated fallback behavior. No unrelated secrets or cloud credentials are requested.
Persistence & Privilege
The skill sets always: false and uses normal autonomous invocation. It writes cached files and thumbnails into a local directory it controls (youtube-transcript by default) and updates a local index (.index.json). It does not request system-wide privileges or modify other skills.
scripts/youtube.ts:293
Shell command execution detected (child_process).
scripts/youtube.ts:377
Environment variable access combined with network send.
Security findings are layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest v1.103.1 · 2026/3/22

## 1.103.1 - 2026-04-13

### Fixes

- `baoyu-markdown-to-html`: decode HTML entities and strip tags from article summary
- `baoyu-post-to-weibo`: decode HTML entities and strip tags from article summary


Install command

Official: npx clawhub@latest install baoyu-youtube-transcript
Mirror (accelerated): npx clawhub@latest install baoyu-youtube-transcript --registry https://cn.longxiaskill.com

Skill documentation

Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly and automatically falls back to yt-dlp when YouTube blocks the direct API path.

Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.

Script Directory

Scripts live in the scripts/ subdirectory. {baseDir} is the path of the directory containing this SKILL.md. Resolve the ${BUN_X} runtime as follows: if bun is installed, use bun; otherwise, if npx is available, use npx -y bun; otherwise, suggest installing bun. Replace {baseDir} and ${BUN_X} with the actual values.
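The runtime rule above can be sketched in TypeScript. This is a minimal illustration, not the skill's own code: commandExists, resolveBunX and expandCommand are hypothetical helper names, and probing with `--version` assumes a POSIX-style PATH lookup (on Windows the npx.cmd shim may not be found without a shell).

```typescript
import { spawnSync } from "node:child_process";

// Returns true when `<cmd> --version` starts and exits with status 0.
function commandExists(cmd: string): boolean {
  const result = spawnSync(cmd, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Chooses what stands in for ${BUN_X}: bun if installed, else `npx -y bun`.
// Returns null when neither is available, so the caller can suggest installing bun.
function resolveBunX(has: (cmd: string) => boolean = commandExists): string | null {
  if (has("bun")) return "bun";
  if (has("npx")) return "npx -y bun";
  return null;
}

// Substitutes the two documented placeholders in a command template.
function expandCommand(template: string, baseDir: string, bunX: string): string {
  return template.split("{baseDir}").join(baseDir).split("${BUN_X}").join(bunX);
}
```

Injecting the `has` predicate keeps the selection logic testable without touching the real PATH.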

| Script | Purpose |
|---|---|
| scripts/main.ts | Transcript download CLI |

Usage

# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID'

# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --languages zh,en,ja

# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --no-timestamps

# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --chapters

# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --speakers

# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --format srt

# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --translate zh-Hans

# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --list

# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts 'https://www.youtube.com/watch?v=ID' --refresh

Options

| Option | Description | Default |
|---|---|---|
| (positional) | YouTube URL or video ID (multiple allowed) | Required |
| --languages | Language codes, comma-separated, in priority order | en |
| --format | Output format: text or srt | text |
| --translate | Translate to the specified language code | |
| --list | List available transcripts instead of fetching | |
| --timestamps | Include [HH:MM:SS → HH:MM:SS] timestamps per paragraph | on |
| --no-timestamps | Disable timestamps | |
| --chapters | Chapter segmentation from the video description | |
| --speakers | Raw transcript with metadata for speaker identification | |
| --exclude-generated | Skip auto-generated transcripts | |
| --exclude-manually-created | Skip manually created transcripts | |
| --refresh | Force re-fetch, ignoring cached data | |
| -o, --output | Save to a specific file path | auto-generated |
| --output-dir | Base output directory | youtube-transcript |

Optional Environment Variables

| Variable | Description |
|---|---|
| YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER | Passed to yt-dlp --cookies-from-browser during fallback, e.g. chrome, safari, firefox, or chrome:Profile 1 |

Input Formats

Accepts any of these as video input:

  • Full URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  • Short URL: https://youtu.be/dQw4w9WgXcQ
  • Embed URL: https://www.youtube.com/embed/dQw4w9WgXcQ
  • Shorts URL: https://www.youtube.com/shorts/dQw4w9WgXcQ
  • Video ID: dQw4w9WgXcQ
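All of the accepted inputs reduce to a single video ID. A hedged sketch of that normalization follows; extractVideoId and parseUrl are hypothetical names, and the 11-character [A-Za-z0-9_-] ID shape is an assumption based on current YouTube IDs rather than something the skill documents.

```typescript
// Assumed shape of a YouTube video ID.
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function parseUrl(s: string): URL | null {
  try {
    return new URL(s);
  } catch {
    return null; // not an absolute URL
  }
}

// Normalizes watch, youtu.be, embed and shorts URLs (or a bare ID) to the video ID.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (ID_PATTERN.test(trimmed)) return trimmed; // bare video ID

  const url = parseUrl(trimmed);
  if (url === null) return null;

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1).split("/")[0]; // youtu.be/<id>
  } else if (host === "youtube.com") {
    const [, first, second] = url.pathname.split("/");
    if (first === "watch") candidate = url.searchParams.get("v"); // /watch?v=<id>
    else if (first === "embed" || first === "shorts") candidate = second ?? null; // /embed/<id>, /shorts/<id>
  }
  return candidate !== null && ID_PATTERN.test(candidate) ? candidate : null;
}
```

Each of the five example inputs listed above maps to dQw4w9WgXcQ; anything on another host returns null.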

Output Formats

| Format | Extension | Description |
|---|---|---|
| text | .md | Markdown with frontmatter (incl. description), title heading, summary, and optional TOC/cover/timestamps/chapters/speakers |
| srt | .srt | SubRip subtitle format for video players |

Output Directory

youtube-transcript/
├── .index.json                          # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
    ├── meta.json                        # Video metadata (title, channel, description, duration, chapters, etc.)
    ├── transcript-raw.json              # Raw transcript snippets from YouTube API (cached)
    ├── transcript-sentences.json        # Sentence-segmented transcript (split by punctuation, merged across snippets)
    ├── imgs/
    │   └── cover.jpg                    # Video thumbnail
    ├── transcript.md                    # Markdown transcript (generated from sentences)
    └── transcript.srt                   # SRT subtitle (generated from raw snippets, if --format srt)
  • {channel-slug}: Channel name in kebab-case
  • {title-full-slug}: Full video title in kebab-case
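The skill does not spell out its slug rules beyond "kebab-case". One plausible sketch, assuming Unicode-aware slugging so that CJK channel names and titles are not reduced to empty strings; toKebabSlug and videoDir are hypothetical names.

```typescript
// Runs of anything that is not a Unicode letter or digit (built via RegExp to avoid target-specific literal checks).
const NON_WORD_RUN = new RegExp("[^\\p{L}\\p{N}]+", "gu");

// Approximates kebab-case: fold compatibility forms, lowercase, collapse separators to "-".
function toKebabSlug(name: string): string {
  return name
    .normalize("NFKC")
    .toLowerCase()
    .replace(NON_WORD_RUN, "-")
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// Builds the {channel-slug}/{title-full-slug} directory under the output root.
function videoDir(outputDir: string, channel: string, title: string): string {
  return `${outputDir}/${toKebabSlug(channel)}/${toKebabSlug(title)}`;
}
```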

The --list mode outputs to stdout only (no file saved).

Caching

On first fetch, the script saves:

  • meta.json: video metadata, chapters, cover image path, language info
  • transcript-raw.json: raw transcript snippets from the YouTube API ({ text, start, duration }[])
  • transcript-sentences.json: sentence-segmented transcript ({ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]). Text is split at sentence-ending punctuation (.?!…。？！ etc.) and merged across snippets with CJK-aware spacing; timestamps are allocated proportionally by character length.
  • imgs/cover.jpg: video thumbnail
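The segmentation described for transcript-sentences.json can be sketched as follows. It is an approximation under stated assumptions, not the script's implementation: raw start/duration are taken to be seconds, each punctuation-delimited piece gets a share of its snippet's duration proportional to its character count, and pieces are joined without a space when either side of the join is a CJK character. In this simple version a decimal point (e.g. "3.14") also ends a sentence.

```typescript
interface RawSnippet { text: string; start: number; duration: number } // seconds (assumed)
interface Sentence { text: string; start: string; end: string } // "HH:mm:ss"

const TERMINALS = ".?!…。？！";
// A piece is text ending in a run of terminal punctuation, or trailing text without one.
const PIECE = /[^.?!…。？！]*[.?!…。？！]+|[^.?!…。？！]+$/g;
// CJK symbols, kana, ideographs, Hangul and full-width forms: joined without a space.
const CJK = /[\u3000-\u30ff\u3400-\u9fff\uac00-\ud7af\uff00-\uffef]/;

function toHms(seconds: number): string {
  const t = Math.floor(seconds);
  const parts = [Math.floor(t / 3600), Math.floor((t % 3600) / 60), t % 60];
  return parts.map((n) => String(n).padStart(2, "0")).join(":");
}

function joinText(a: string, b: string): string {
  if (a === "") return b;
  return CJK.test(a.slice(-1)) || CJK.test(b.charAt(0)) ? a + b : `${a} ${b}`;
}

function segmentSentences(snippets: RawSnippet[]): Sentence[] {
  const out: Sentence[] = [];
  let buffer = "";
  let bufferStart = 0;
  let lastEnd = 0;
  for (const snip of snippets) {
    const pieces = snip.text.match(PIECE) ?? [];
    const totalChars = pieces.reduce((n, p) => n + p.length, 0) || 1;
    let charsBefore = 0;
    for (const piece of pieces) {
      // Time share proportional to this piece's length within the snippet.
      const pieceStart = snip.start + (snip.duration * charsBefore) / totalChars;
      charsBefore += piece.length;
      const pieceEnd = snip.start + (snip.duration * charsBefore) / totalChars;
      const text = piece.trim();
      if (text === "") continue;
      if (buffer === "") bufferStart = pieceStart;
      buffer = joinText(buffer, text);
      lastEnd = pieceEnd;
      if (TERMINALS.includes(text.slice(-1))) {
        out.push({ text: buffer, start: toHms(bufferStart), end: toHms(pieceEnd) });
        buffer = "";
      }
    }
  }
  // Flush any trailing text that never reached terminal punctuation.
  if (buffer !== "") out.push({ text: buffer, start: toHms(bufferStart), end: toHms(lastEnd) });
  return out;
}
```

A sentence that starts mid-snippet inherits an interpolated start time, which is why "This is a test!" below begins at 2.4 s rather than at the snippet boundary.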

Subsequent runs for the same video use cached data (no network calls). Use --refresh to force re-fetch. If a different language is requested, the cache is automatically refreshed.
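A minimal sketch of that cache rule. The meta.json field name (language) is hypothetical, and "a different language" is read here as "the cached language is not in the requested --languages list", which is one interpretation of the text above.

```typescript
// Hypothetical subset of meta.json; the real field names are not documented here.
interface CachedMeta { language: string }
interface FetchOptions { refresh: boolean; languages: string[] } // from --refresh / --languages

// Reuse the cache unless --refresh is set, nothing is cached yet,
// or the cached transcript's language is not among the requested ones.
function shouldFetch(cached: CachedMeta | null, opts: FetchOptions): boolean {
  if (opts.refresh || cached === null) return true;
  return !opts.languages.includes(cached.language);
}
```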

When YouTube returns anti-bot / blocked responses on the direct InnerTube path, the script retries with alternate client identities and then falls back to yt-dlp if available. If fallback is needed but yt-dlp is unavailable, the agent should decide how to make yt-dlp available and continue rather than pushing the installation decision to the user.
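To illustrate the yt-dlp fallback together with the optional cookie variable, the sketch below builds a subtitles-only yt-dlp argument list and runs it without a shell. Every flag is a real yt-dlp option, but which flags scripts/youtube.ts actually passes is an assumption; buildYtDlpArgs and runYtDlp are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Builds argv for a subtitles-only yt-dlp run (flag choice is illustrative).
function buildYtDlpArgs(
  videoUrl: string,
  languages: string[],
  outDir: string,
  env: Record<string, string | undefined>,
): string[] {
  const args = [
    "--skip-download", // subtitles only, no media download
    "--write-subs", // manually created subtitles
    "--write-auto-subs", // auto-generated captions
    "--sub-langs", languages.join(","),
    "-o", `${outDir}/%(id)s.%(ext)s`,
  ];
  const browser = env.YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER;
  if (browser) args.push("--cookies-from-browser", browser); // e.g. "chrome:Profile 1"
  args.push(videoUrl);
  return args;
}

// Runs yt-dlp without a shell; false if it is missing (spawn error) or exits non-zero.
function runYtDlp(args: string[]): boolean {
  const result = spawnSync("yt-dlp", args, { stdio: "inherit" });
  return result.error === undefined && result.status === 0;
}
```

Passing an argv array keeps a value such as chrome:Profile 1 intact as one argument, with no shell quoting; a call would look like runYtDlp(buildYtDlpArgs(url, ["en"], dir, process.env)).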

SRT output (--format srt) is generated from transcript-raw.json. Text/markdown output uses transcript-sentences.json for natural sentence boundaries.
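Rendering SRT from transcript-raw.json is mostly timestamp formatting: each SubRip cue is a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text, with cues separated by blank lines. A minimal sketch, assuming one cue per raw snippet with start/duration in seconds; toSrt and srtTime are hypothetical names.

```typescript
interface RawSnippet { text: string; start: number; duration: number } // seconds (assumed)

// SRT timestamps use HH:MM:SS,mmm with a comma before the milliseconds.
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// One numbered cue per raw snippet, separated by blank lines.
function toSrt(snippets: RawSnippet[]): string {
  return snippets
    .map((snip, i) => `${i + 1}\n${srtTime(snip.start)} --> ${srtTime(snip.start + snip.duration)}\n${snip.text.trim()}\n`)
    .join("\n");
}
```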

Workflow

When the user provides a YouTube URL and wants the transcript:

  • Run with --list first if the user hasn't specified a language, to show available options
  • Always single-quote the URL when running the script — zsh treats ? as a glob wildcard, so an unquoted YouTube URL causes "no matches found": use 'https://www.youtube.com/watch?v=ID'
  • Default: run with --chapters --speakers for the richest output (chapters + speaker identification)
  • The script auto-saves cached data + output file and prints the file path
  • For --speakers mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels

When the user only wants a cover image or metadata, running the script with any option also caches meta.json and imgs/cover.jpg.

When re-formatting the same video (e.g., first text then SRT), the cached data is reused — no re-fetch needed.

Chapter & Speaker Workflow

Chapters (--chapters)

The script parses chapter timestamps from the video description (e.g., 0:00 Introduction), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves as .md with a Table of Contents. No further processing needed.

If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
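A sketch of the chapter step, assuming description lines of the form M:SS Title or H:MM:SS Title (optionally with a "-" or ":" separator). parseChapters and groupByChapter are hypothetical names; requiring the first chapter to start at 0:00 mirrors YouTube's own chapter requirement but is an assumption about this script.

```typescript
interface Chapter { title: string; start: number } // start in seconds

// Matches lines like "0:00 Introduction", "1:30 - Main topic" or "1:02:03 Wrap-up".
const CHAPTER_LINE = /^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s*[-:]?\s*(.+?)\s*$/;

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split(/\r?\n/)) {
    const m = CHAPTER_LINE.exec(line);
    if (!m) continue;
    const [, h, min, sec, title] = m;
    chapters.push({ title, start: Number(h ?? 0) * 3600 + Number(min) * 60 + Number(sec) });
  }
  // Assumption: only treat the list as chapters if it starts at 0:00 and keeps increasing.
  const ordered = chapters.every((c, i) => i === 0 || c.start > chapters[i - 1].start);
  return chapters.length > 0 && chapters[0].start === 0 && ordered ? chapters : [];
}

// Assigns each item (sorted by start time) to the last chapter that began at or before it.
function groupByChapter<T extends { start: number }>(items: T[], chapters: Chapter[]) {
  const groups = chapters.map((chapter) => ({ chapter, items: [] as T[] }));
  let idx = 0;
  for (const item of items) {
    while (idx + 1 < chapters.length && item.start >= chapters[idx + 1].start) idx++;
    groups[idx]?.items.push(item);
  }
  return groups;
}
```

An empty result from parseChapters corresponds to the no-chapter case above, where the transcript falls back to plain grouped paragraphs.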

Speaker Identification (--speakers)

Speaker identification requires AI processing. The script outputs a raw .md file containing:

  • YAML frontmatter with video metadata (title, channel, date, cover, description, language)
  • Video description (for speaker name extraction)
  • Chapter list from description (if available)
  • Raw transcript in SRT format (pre-computed start/end timestamps, token-efficient)

After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:

  • Read the saved .md file
  • Read the prompt template at {baseDir}/prompts/speaker-transcript.md
  • Process the raw transcript following the prompt:
    - Identify speakers using video metadata (title → guest, channel → host, description → names)
    - Detect speaker turns from conversation flow, question-answer patterns, and contextual cues
    - Segment into chapters (use description chapters if available, else create them from topic shifts)
    - Format with Speaker Name: labels, paragraph grouping (2-4 sentences), and [HH:MM:SS → HH:MM:SS] timestamps
  • Overwrite the .md file with the processed transcript (keep the YAML frontmatter)
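The target paragraph shape can be illustrated with a small renderer. This sketch assumes the sub-agent has already attributed each sentence to a speaker; the exact layout is defined by prompts/speaker-transcript.md, so putting the timestamp range before the label is an assumption, as are the LabeledSentence type and the renderParagraphs name. It only caps paragraph size, and it joins sentences with a space, so it is not CJK-aware.

```typescript
// Hypothetical input: sentences already attributed to a speaker.
interface LabeledSentence { speaker: string; text: string; start: string; end: string } // "HH:MM:SS"

// Groups consecutive same-speaker sentences into paragraphs of at most `maxSentences`,
// then renders "[start → end] Speaker Name: text" blocks separated by blank lines.
function renderParagraphs(sentences: LabeledSentence[], maxSentences = 4): string {
  const paragraphs: LabeledSentence[][] = [];
  for (const s of sentences) {
    const current = paragraphs[paragraphs.length - 1];
    if (current && current[0].speaker === s.speaker && current.length < maxSentences) {
      current.push(s);
    } else {
      paragraphs.push([s]); // speaker turn, or the size cap was reached
    }
  }
  return paragraphs
    .map((p) => `[${p[0].start} → ${p[p.length - 1].end}] ${p[0].speaker}: ${p.map((x) => x.text).join(" ")}`)
    .join("\n\n");
}
```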

When --speakers is used, --chapters is implied — the processed output always includes chapter segmentation.

Error Cases

| Error | Meaning |
|---|---|
| Transcripts disabled | The video has no captions at all |
| No transcript found | The requested language is not available |
| Video unavailable | The video is deleted, private, or region-locked |
| IP blocked | Too many requests; try again later |
| Age restricted | The video requires login for age verification |
| bot detected | The script retries alternate clients and then yt-dlp. If the fallback tooling is missing, the agent should resolve that itself. If it still fails, try YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER=safari (or your browser). |
Source: ClawHub · Chinese localization: 龙虾技能库