Video Learning Notes Overview
Convert a video URL or local video file into a complete Markdown learning note. The note should be structured from the STT subtitle content and include selected key screenshots. Use this skill for requests such as “turn this video into learning notes”, “download this video and transcribe/analyze it”, or similar video-to-learning-note tasks.
Required output
Create a self-contained output directory containing:
The downloaded original video file.
transcript.srt generated by qwen-audio/STT.
Timestamped frames extracted by ffmpeg under frames/.
Manually selected key screenshots under selected_frames/.
The final Markdown file, usually named video_learning_notes.md, using relative paths for the source video and images, and citing screenshot timestamps.
Workflow
- Create a workspace
Create a dedicated output directory for each video note. Prefer the current task directory or a stable path such as .//. Keep all generated files inside this directory; do not scatter outputs into shared default folders.
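The workspace layout described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function name `create_workspace` and the `note_name` argument are hypothetical, and only the `frames/` and `selected_frames/` subdirectory names come from this document.

```python
from pathlib import Path


def create_workspace(base_dir, note_name):
    """Create <base_dir>/<note_name>/ with frames/ and selected_frames/ inside.

    Hypothetical helper: the subdirectory names follow this document,
    everything else is an assumption made for illustration.
    """
    workspace = Path(base_dir) / note_name
    for sub in ("frames", "selected_frames"):
        # exist_ok=True makes the call safe to repeat for the same note.
        (workspace / sub).mkdir(parents=True, exist_ok=True)
    return workspace
```

Calling `create_workspace(".", "my_video_note")` would keep every generated file for that note under one directory.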
- Download the original video
If the source is an online video, use the yt-dlp-downloader skill/workflow to download the user-provided video URL. Preserve the original or best available quality when possible, and write the video into the current workspace.
Check dependencies when needed before downloading:
which yt-dlp || echo "yt-dlp not installed. Install with: pip install yt-dlp"
which ffmpeg || echo "ffmpeg not installed. Install with: brew install ffmpeg"
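If the check is driven from Python rather than the shell, the same idea can be sketched with `shutil.which`. The helper name `missing_tools` is hypothetical and chosen only for this example.

```python
import shutil


def missing_tools(tools=("yt-dlp", "ffmpeg")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"{name} not installed")
```

An empty list means both tools were found, so the download step can proceed.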
Recommended commands:
# Generic: download best quality into the workspace
yt-dlp -P "/path/to/workspace" -o "%(title)s.%(ext)s" "VIDEO_URL"
# YouTube: use browser cookies by default to reduce 403 errors
yt-dlp -P "/path/to/workspace" --cookies-from-browser chrome -o "%(title)s.%(ext)s" "YOUTUBE_URL"
# Download subtitles when available; still run qwen-audio/STT unless the user only wants official subtitles
yt-dlp -P "/path/to/workspace" --write-subs --sub-langs all -o "%(title)s.%(ext)s" "VIDEO_URL"
Platform handling principles:
YouTube / YouTube Music: use --cookies-from-browser chrome by default. Supported browser cookie sources include chrome, firefox, safari, edge, brave, and opera.
Bilibili, Twitter/X, TikTok, Douyin, Vimeo, Twitch, and most other platforms: try direct download first.
Playlist URLs: ask the user whether to process the entire playlist, one specific video, or a specific range.
Quality selection: default to the best available quality. If the user specifies a quality, use format selectors such as bestvideo[height<=1080]+bestaudio/best[height<=1080].
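The platform rules above can be expressed as a small argument builder. This is a sketch under stated assumptions: the function name `build_ytdlp_args`, the `max_height` parameter, and the `YOUTUBE_HOSTS` set are hypothetical, and the host list is not exhaustive. The yt-dlp flags themselves (`-P`, `-o`, `--cookies-from-browser`, `-f`) are the ones already used in this document.

```python
from urllib.parse import urlparse

# Assumed, non-exhaustive list of hosts treated as YouTube / YouTube Music.
YOUTUBE_HOSTS = {
    "youtube.com",
    "www.youtube.com",
    "m.youtube.com",
    "music.youtube.com",
    "youtu.be",
}


def build_ytdlp_args(url, workspace, max_height=None, browser="chrome"):
    """Build a yt-dlp argument list following the platform principles."""
    args = ["yt-dlp", "-P", str(workspace), "-o", "%(title)s.%(ext)s"]
    host = (urlparse(url).hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        # YouTube: use browser cookies by default to reduce 403 errors.
        args += ["--cookies-from-browser", browser]
    if max_height is not None:
        # Only constrain the format when the user asked for a quality.
        selector = (
            f"bestvideo[height<={max_height}]+bestaudio"
            f"/best[height<={max_height}]"
        )
        args += ["-f", selector]
    args.append(url)
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Passing a list rather than a shell string avoids quoting problems with the brackets in the format selector.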
After downloading, identify the actual video file path, such as .mp4, .mkv, .mov, .webm, etc. If multiple files are produced, choose the main video as the source for the learning note, while keeping subtitles, thumbnails, and other files as supporting assets.
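One way to locate the main video after a download is sketched below. Picking the largest file is an assumption made for this example, not a rule stated by the skill, and the name `pick_main_video` is hypothetical. The extension set mirrors the examples above and may need extending.

```python
from pathlib import Path

# Extensions named in this document; extend as needed.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm"}


def pick_main_video(workspace):
    """Return the largest video file directly inside `workspace`, or None.

    Assumption: when several video files exist, the largest one is the
    main video. Subtitles and thumbnails are ignored, not deleted.
    """
    candidates = [
        path
        for path in Path(workspace).iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_size)
```

If the heuristic picks the wrong file, for example a long trailer next to a short lecture, confirm the choice with the user instead.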
Troubleshooting:
HTTP 403 Forbidden: retry with --cookies-from-browser chrome or another browser where the user is logged in.
Video unavailable, private videos, or geo-restricted videos: ask the user for login access, cookies, or an accessible environment; do not bypass access restrictions.
Format not available: run yt-dlp -F "VIDEO_URL" to list available formats, then choose one.
Interrupted downloads: retry; yt-dlp can usually resume partial downloads.
yt-dlp: command not found: install yt-dlp or ask the user to install it.
If yt-dlp-downloader / yt-dlp is unavailable, or if the video requires login/authentication, stop and ask the user to provide the missing access requirement instead of silently switching to unreliable tools.
- Transcribe with qwen-audio
Run qwen-audio/STT on the downloaded video or extracted audio, and save the result as transcript.srt.
For large videos, first use ffmpeg to extract compressed mono audio, then transcribe the smaller audio file:
ffmpeg -y -i input.mp4 -vn -ac 1 -ar 16000 -b:a 32k audio_for_stt.mp3
Preserve timestamp information as much as possible. Prefer SRT format. If STT only produces plain text, create transcript.txt and clearly note in the final output that exact subtitle timing is unavailable.
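Once transcript.srt exists, its timing can be read back when structuring the note and citing screenshot timestamps. The parser below is a minimal sketch that assumes standard SRT cue blocks (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, then text). The function names are hypothetical, and it is not a description of what qwen-audio emits.

```python
import re

# Matches SRT timestamps such as 00:01:02,500 (a dot is also tolerated).
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp string to seconds as a float."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # Skip blocks without a timing line.
        start, end = lines[timing].split("-->", 1)
        # Ignore optional position settings after the end timestamp.
        end = end.split()[0]
        cues.append((to_seconds(start), to_seconds(end), " ".join(lines[timing + 1:])))
    return cues
```

Each cue's start time can then be matched against frame timestamps to decide which screenshot belongs next to which passage.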
- Extract timestamped candidate frames with ffmpeg
After confirming the video path, use scripts/prepare_video_learning_assets.py. The script generates timestamped candidate screenshots and a manifest file:
python3 "$SKILL_DIR/scripts/prepare_video_learning_assets.py" \
  --video /path/to/video.mp4 \
  --out /path/to/workspace \
  --scene-threshold 0.3
By default, the script extracts frames only from ffmpeg scene changes; it does not take one screenshot every 30 seconds. Use --interval only when regular interval screenshots are explicitly needed.
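For orientation, scene-change extraction with plain ffmpeg is commonly written with the `select` filter and its `scene` score. The sketch below only builds such a command and is an assumption about the general technique, not the actual implementation of the script above. The function name `scene_change_args` and the `scene_%04d.jpg` output pattern are hypothetical.

```python
def scene_change_args(video, out_dir, threshold=0.3):
    """Build an ffmpeg command that keeps only frames at scene changes.

    Illustrative only: the skill's script may use different options.
    """
    # select keeps frames whose scene score exceeds the threshold;
    # showinfo logs each kept frame's pts_time to stderr.
    video_filter = f"select='gt(scene,{threshold})',showinfo"
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", video_filter,
        # Prevents duplicated frames; newer ffmpeg builds spell this
        # option "-fps_mode vfr".
        "-vsync", "vfr",
        f"{out_dir}/scene_%04d.jpg",
    ]
```

Running the list with `subprocess.run(args, check=True)` writes the selected frames, and the `pts_time` values in the `showinfo` log give the timestamp of each one.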
For most learning videos, the recommended --scene-threshold range is 0.1–0.3:
Lower thresholds produce more frames and capture smaller visual changes.
Higher thresholds produce fewer frames and keep only more obvious scene changes.
After running the script, check the frame count in fram