Automatically convert Markdown technical documents into professional videos with voice-over narration

v1.0.0

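Automatically converts Markdown technical documents into professional videos with voice-over narration. It uses edge-tts to generate natural-sounding speech, Remotion to render the visual scenes, and FFmpeg to merge audio and video, producing 1920×1080 Full HD output. Use cases: turning project documentation into video, producing tutorials, and sharing knowledge.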


by @mengbin92·MIT-0

Tags: documentation tools, video processing, WeChat


License

MIT-0


Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command


Official: npx clawhub@latest install doc-to-video

Mirror (accelerated): npx clawhub@latest install doc-to-video --registry https://cn.longxiaskill.com (mirror available)


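Skill documentation

🎬 Doc to Video: Markdown Documents to Professional Video

Skill name: doc-to-video

Compatible with: OpenClaw / QClaw

Skill type: document → video automation

Output format: 1920×1080 MP4, H.264 video + AAC audio

Converts Markdown technical documents into professional videos with natural-sounding narration in one step. The whole pipeline is automated: content analysis, narration writing, voice generation, visual rendering, and audio/video merging.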

📌 Results Preview

This skill has been validated on three real projects:

| Video | Duration | Scenes | File size |
|---|---|---|---|
| Docker Registry usage guide | ~153 s | 9 | ~3.2 MB |
| Docker Registry setup notes | ~207 s | 16 | ~5.1 MB |
| Solidity Nomad multisig tutorial | ~210 s | 11 | ~6.2 MB |

🔧 Core Tech Stack

    Markdown document
             │
             ▼
    ┌─────────────────┐
    │ edge-tts        │ ← natural Chinese voices (Tingting/XiaoxiaoNeural)
    │ Python TTS      │
    └────────┬────────┘
             │ .m4a audio files
             ▼
    ┌─────────────────┐
    │ FFmpeg atempo   │ ← speeds up the narration to match the target duration
    └────────┬────────┘
             │
             ▼
    ┌─────────────────┐
    │ Remotion        │ ← React scene components, TypeScript
    │ scene rendering │   30 fps, 1920×1080
    └────────┬────────┘
             │ MP4 video (silent)
             ▼
    ┌─────────────────┐
    │ FFmpeg merge    │ ← strips the original audio and embeds the narration
    └────────┬────────┘
             │
             ▼
    Narrated MP4 video ✅

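📦 Installation

Option 1: one-command install (recommended)

    skillhub install doc-to-video

SkillHub automatically installs the Python dependency (edge-tts) and the Node dependency (Remotion).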

Option 2: manual install

    # 1. Install the Python dependency
    pip3 install edge-tts

    # 2. Install FFmpeg
    brew install ffmpeg    # macOS
    apt install ffmpeg     # Ubuntu/Debian

    # 3. Confirm that Remotion is installed in the workspace
    ls /Users/mac/.qclaw-oversea/workspace/node_modules/.bin/remotion

🚀 Quick Start

Step 1: Create the working directory

    mkdir my-video-project && cd my-video-project
    mkdir -p src audio out

Step 2: Write generate_audio.py

    #!/usr/bin/env python3
    """Generate narration for each scene (edge-tts, XiaoxiaoNeural)."""
    import asyncio
    import os

    import edge_tts

    # (scene id, narration text); the text stays in Chinese to match the zh-CN voice
    SCENES = [
        ("00_title", "欢迎观看本教程。本节介绍主要内容..."),
        ("01_chapter1", "第一章，首先介绍背景知识..."),
        ("02_chapter2", "第二章，讲解核心概念..."),
        # more scenes...
    ]

    VOICE = "zh-CN-XiaoxiaoNeural"
    os.makedirs("audio", exist_ok=True)


    async def gen(scene_id: str, text: str) -> None:
        m4a = f"audio/{scene_id}.m4a"
        if os.path.exists(m4a):  # skip scenes that were already generated
            print(f"  [skip] {scene_id}")
            return
        print(f"  → {scene_id}...")
        await edge_tts.Communicate(text, VOICE).save(m4a)
        print(f"  done {scene_id}")


    async def main() -> None:
        # gather() takes awaitables as separate arguments, so unpack the generator
        await asyncio.gather(*(gen(sid, txt) for sid, txt in SCENES))
        print("\nAll done!")


    asyncio.run(main())

Step 3: Generate the narration

    python3 generate_audio.py

Step 4: Measure the duration of each audio file

    for f in audio/*.m4a; do
      dur=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$f")
      echo "$f: ${dur}s"
    done

Step 5: Concatenate and speed up the audio

    # Generate the file list. The concat demuxer resolves relative paths
    # against the directory of the list file, so entries omit the audio/ prefix.
    cat > audio/file_list.txt << 'EOF'
    file '00_title.m4a'
    file '01_chapter1.m4a'
    file '02_chapter2.m4a'
    # ...all files
    EOF

    # Concatenate
    ffmpeg -y -f concat -safe 0 -i audio/file_list.txt \
      -codec:a libmp3lame -qscale:a 2 audio/combined_raw.mp3

    # Speed up (example: 360 s raw → 210 s target, speed-up ratio 1.714)
    # Two atempo stages, each sqrt(1.714) ≈ 1.31
    ffmpeg -y -i audio/combined_raw.mp3 \
      -filter:a "atempo=1.31,atempo=1.31" \
      -codec:a aac -b:a 128k audio/combined_final.m4a
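The per-stage factor in the example above is the square root of the overall speed-up ratio. As a minimal sketch of that arithmetic, the filter string could be derived as follows. The helper name `atempo_chain` and the cap of 2.0 per stage are assumptions made for illustration; they are not part of the skill.

```python
def atempo_chain(ratio: float, max_stage: float = 2.0) -> str:
    """Build an FFmpeg atempo filter chain whose stages multiply to `ratio`.

    Hypothetical helper: it handles speed-up only (ratio >= 1) and always
    uses at least two equal stages, mirroring the two-stage example above.
    """
    if ratio < 1:
        raise ValueError("this sketch only handles speed-up (ratio >= 1)")
    stages = 2
    # Add stages until each one stays at or below the assumed per-stage cap.
    while ratio ** (1 / stages) > max_stage:
        stages += 1
    per_stage = ratio ** (1 / stages)
    return ",".join(f"atempo={per_stage:.4f}" for _ in range(stages))


if __name__ == "__main__":
    raw_seconds, target_seconds = 360.0, 210.0
    print(atempo_chain(raw_seconds / target_seconds))
```

The resulting string can be passed to `-filter:a` in place of the hand-computed value. Keeping four decimals instead of two reduces the gap between the sped-up audio and the target duration.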

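Step 6: Write the Remotion scene component

    // src/Scene.tsx
    import React from "react";
    import { useCurrentFrame } from "remotion";

    // Progress (clamped to 0..1) at frame t of an animation that starts at frame s and lasts d frames
    function prog(t: number, s: number, d: number): number {
      return Math.min(1, Math.max(0, (t - s) / d));
    }

    // Exact frame boundaries (render once, confirm the actual frame count, then fill these in)
    const F = [0, 266, 1096, 1780, 2730, 3545, 4093, 4610, 5215, 5715, 6130];

    export const Scene: React.FC = () => {
      const f = useCurrentFrame();
      // The JSX element returned for each scene is missing in the source; null is a placeholder.
      if (f < F[1]) return null;
      if (f < F[2]) return null;
      // ... more scenes
      return null;
    };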

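Step 7: Entry file src/index.tsx

    import React from "react";
    import { Composition, registerRoot } from "remotion";
    import { Scene } from "./Scene";

    // The JSX body is missing in the source. The element below is reconstructed
    // from values stated elsewhere in this document (composition id MyVideo,
    // 30 fps, 1920×1080); durationInFrames is a placeholder taken from the last
    // entry of F[] and must be replaced with the measured frame count.
    registerRoot(() => (
      <Composition
        id="MyVideo"
        component={Scene}
        durationInFrames={6130}
        fps={30}
        width={1920}
        height={1080}
      />
    ));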

Step 8: Render and merge

    cd /path/to/workspace

    # First render: confirm the actual frame count
    ./node_modules/.bin/remotion render \
      my-project/src/index.tsx MyVideo \
      out/temp.mp4

    # Confirm the actual frame count with ffprobe
    ffprobe -v error -select_streams v:0 \
      -show_entries stream=nb_frames -of csv=p=0 out/temp.mp4
    # → suppose this prints 6295; update F[] and durationInFrames with that value

    # Render again (with the exact frame count)
    ./node_modules/.bin/remotion render \
      my-project/src/index.tsx MyVideo \
      out/final_video.mp4

    # Merge audio and video
    ffmpeg -y -i out/final_video.mp4 -an -c:v copy /tmp/noaudio.mp4
    ffmpeg -y -i /tmp/noaudio.mp4 -i audio/combined_final.m4a \
      -c:v copy -c:a aac -b:a 128k -shortest \
      out/final_with_audio.mp4

    # Verify
    ffprobe -v error -show_streams out/final_with_audio.mp4 \
      | grep -E "codec_type|duration"
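The verification step amounts to comparing the video and audio stream durations. One way to automate that comparison is sketched below. It assumes ffprobe is run with `-of json` instead of the grep pipeline, and the function names, the 0.5 s tolerance, and the sample JSON values are all illustrative assumptions rather than output from the skill.

```python
import json


def stream_durations(ffprobe_json: str) -> dict:
    """Map codec_type ("video", "audio") to duration in seconds.

    Expects the text printed by:
      ffprobe -v error -show_streams -of json <file>
    """
    data = json.loads(ffprobe_json)
    return {
        s["codec_type"]: float(s["duration"])
        for s in data.get("streams", [])
        if "duration" in s
    }


def in_sync(durations: dict, tolerance: float = 0.5) -> bool:
    """True when video and audio lengths differ by at most `tolerance` seconds."""
    return abs(durations["video"] - durations["audio"]) <= tolerance


if __name__ == "__main__":
    # Illustrative sample only; real values come from ffprobe.
    sample = (
        '{"streams": ['
        '{"codec_type": "video", "duration": "209.833333"},'
        '{"codec_type": "audio", "duration": "209.792000"}]}'
    )
    d = stream_durations(sample)
    print(d, "in sync" if in_sync(d) else "out of sync")
```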

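🔑 Key Lesson: Audio/Video Sync Pitfalls and Fixes

❌ Wrong approach (leads to desync)

    estimate durations → compute frame boundaries → render → merge audio
                                   ↑
             uses the estimated frame count; the actual rendered count may differ

The actual number of frames in a Remotion render does not necessarily equal the configured durationInFrames. The author attributes this to Remotion deciding the frame count from the content, and notes that CSS animation durations also have an effect.

✅ Correct approach (two-step confirmation)

    estimate durations → render the video once → confirm the actual frame count with ffprobe
                                                              ↓
                                  recompute the frame boundaries from the actual frame count
                                  update F[] + durationInFrames → render again → merge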

Frame boundary formula:

    scene start frame = round(cumulative seconds before the scene / total audio seconds × actual total rendered frames)
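As a worked sketch of this formula, the boundaries for the F[] array can be computed from the per-scene audio durations measured in Step 4 and the frame count reported by ffprobe. The function name `frame_boundaries` and the sample numbers are assumptions made for illustration.

```python
def frame_boundaries(scene_seconds: list, total_frames: int) -> list:
    """Return the start frame of every scene, plus the final end frame.

    Each start frame is round(seconds before the scene / total seconds
    * total_frames), as in the formula above.
    """
    total_seconds = sum(scene_seconds)
    if total_seconds <= 0:
        raise ValueError("scene durations must sum to a positive value")
    bounds = []
    elapsed = 0.0
    for seconds in scene_seconds:
        bounds.append(round(elapsed / total_seconds * total_frames))
        elapsed += seconds
    bounds.append(total_frames)  # end of the last scene
    return bounds


if __name__ == "__main__":
    # Illustrative values: three scenes of 10 s, 20 s and 30 s, 1800 frames.
    print(frame_boundaries([10.0, 20.0, 30.0], 1800))
```

Because only the ratio of durations matters, the raw per-scene durations can be used even though the combined audio is sped up afterwards, as long as the speed-up is uniform across the whole track.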

Why speed up the audio with FFmpeg atempo instead of Remotion's built-in features?

Remotion built-in

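🎨 Scene Component Design Guidelines

Layout principles

Background: dark gradient (#0b1d3a → #1a3a6b) or code style (#0d1117)

Fonts: titles 40–52px, body 15–17px, monospace 13–14px

Spacing: 80–100px horizontal padding, vertically centered

Animation principles

    // animation progress 0→1 (about 1–1.5 seconds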
