📦 Voice Recognition

v1.1.0

Intelligent speech-to-text using local OpenAI Whisper (no API key needed, fully private). Use when you need to transcribe audio files, convert voice messages...


by @08jacky04

Tags: API development · File processing · Instant messaging · AI model access


Runtime dependencies

No special dependencies

Install command

Official: npx clawhub@latest install smart-voice-recognition

Mirror (accelerated): npx clawhub@latest install smart-voice-recognition --registry https://cn.longxiaskill.com (mirror syncing)


Skill documentation

🎤 Voice Recognition: Smart Auto-Model Selection

Transcribe audio to text using local OpenAI Whisper. No API keys, 100% private, and no internet connection required once the model has been downloaded on first run.

Smart auto-selection picks the best model based on your audio's characteristics, so you never have to choose a model yourself.

Quick Start

# Auto mode: analyzes audio, picks the best model automatically
scripts/transcribe.py voice.ogg

# Force a specific model
scripts/transcribe.py voice.ogg --model small

# Specify language (auto-detected if omitted)
scripts/transcribe.py voice.ogg --language zh   # Chinese (Mandarin)
scripts/transcribe.py voice.ogg --language en   # English
scripts/transcribe.py voice.ogg --language yue  # Cantonese

# Show segment timestamps
scripts/transcribe.py voice.ogg --segments

# Save transcript to file
scripts/transcribe.py voice.ogg -o transcript.txt
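If the script needs to be driven from another Python program, the flags shown above can be assembled programmatically. The following is only a sketch: `build_command` and `transcribe` are hypothetical helper names, not part of the skill, and it assumes the script is reachable at `scripts/transcribe.py` relative to the working directory and prints the transcript to stdout.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(audio: str, model: Optional[str] = None,
                  language: Optional[str] = None, segments: bool = False,
                  output: Optional[str] = None) -> List[str]:
    """Assemble the argument list for scripts/transcribe.py (hypothetical helper)."""
    cmd = [sys.executable, "scripts/transcribe.py", audio]
    if model:
        cmd += ["--model", model]        # omit to let the script auto-select
    if language:
        cmd += ["--language", language]  # omit to let the script auto-detect
    if segments:
        cmd.append("--segments")
    if output:
        cmd += ["-o", output]
    return cmd


def transcribe(audio: str, **options) -> str:
    """Run the script and return whatever it writes to stdout."""
    result = subprocess.run(build_command(audio, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.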

Smart Auto-Selection

The script analyzes the audio's duration and complexity, then selects a model automatically:

| Audio Characteristic | Model Used | Why |
|---|---|---|
| Short (<10s), clean speech | base | Fast (2-3s). Accurate enough for simple content. |
| Short (<10s), mixed languages | small | Better multilingual handling for code-switching. |
| Medium (10-60s), clean | base | Balanced speed and accuracy. |
| Medium (10-60s), mixed | small | Handles accents and language transitions. |
| Long (1-2min) | small | Maintains context, still fast enough. |
| Very long (2min+) | medium | Maximum accuracy for extended recordings. |
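The selection rules in the table can be expressed as a small decision function. This is a sketch of the table only, not the script's actual code: `pick_model` is a hypothetical name, the `mixed` flag stands in for whatever complexity analysis the script performs, and the treatment of the exact 60s and 120s boundaries is an assumption.

```python
def pick_model(duration_s: float, mixed: bool = False) -> str:
    """Map audio duration (seconds) and a mixed-language flag to a model name.

    Mirrors the selection table; boundary handling at exactly 60s and 120s
    is an assumption, since the table does not specify it.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    if duration_s >= 120:   # very long (2min+)
        return "medium"
    if duration_s >= 60:    # long (1-2min)
        return "small"
    # short (<10s) and medium (10-60s) follow the same clean/mixed rule
    return "small" if mixed else "base"
```

For instance, a 32-second clean recording maps to `base` and a 90-second recording maps to `small`.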

You don't need to think about models. Just send audio.

Installation

Prerequisites
Python 3.10+
pip (Python package manager)

Via bundled installer
python3 scripts/install.py

Manual
pip install openai-whisper soundfile numpy
pip install torch --index-url https://download.pytorch.org/whl/cpu

Using requirements.txt
pip install -r requirements.txt
pip install torch --index-url https://download.pytorch.org/whl/cpu

Note: The first run downloads the Whisper model (~139MB for base, ~461MB for small). Subsequent runs use the cached model (~/.cache/whisper/) and load without downloading again.
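To find out ahead of time whether a run will trigger a download, the cache directory mentioned in the note can be inspected. A minimal sketch follows; `cached_model_path` is a hypothetical helper, and it assumes checkpoints are stored as `<name>.pt` (for example `base.pt`) inside `~/.cache/whisper/`.

```python
from pathlib import Path
from typing import Optional

# Cache location from the note above.
DEFAULT_CACHE = Path.home() / ".cache" / "whisper"


def cached_model_path(name: str, cache_dir: Path = DEFAULT_CACHE) -> Optional[Path]:
    """Return the cached checkpoint path, or None if a download would be needed.

    Assumes the checkpoint file is named <name>.pt.
    """
    candidate = cache_dir / f"{name}.pt"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    for model_name in ("base", "small"):
        status = "cached" if cached_model_path(model_name) else "will download on first use"
        print(f"{model_name}: {status}")
```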

Model Reference

| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 72MB | ⚡⚡⚡ | ⭐⭐ | Real-time preview, very short clips |
| base | 139MB | ⚡⚡ | ⭐⭐⭐ | General use (auto-select default for short audio) |
| small | 461MB | ⚡ | ⭐⭐⭐⭐ | Mixed languages, accents (auto-select for long/complex) |
| medium | 1.5GB | 🐢 | ⭐⭐⭐⭐⭐ | Maximum accuracy, long recordings |
| large | 2.9GB | 🐢 | ⭐⭐⭐⭐⭐ | Research-grade transcription |

Language Support

Whisper supports 99 languages including:

🇨🇳 Chinese (Mandarin, Cantonese) 🇺🇸 English 🇪🇸 Spanish 🇯🇵 Japanese 🇰🇷 Korean 🇫🇷 French 🇩🇪 German

The language is auto-detected by default. Use --language to provide a hint for better accuracy.

Features

| Feature | Description |
|---|---|
| 🔒 100% Private | Everything runs locally. No data leaves your machine. |
| 🆓 No API Costs | Free unlimited transcription. No quotas, no keys. |
| 🌐 99 Languages | Supports virtually all major world languages. |
| 🧠 Smart Auto-Model | Analyzes audio → picks optimal model automatically. |
| ⚡ Fast by Default | Short clips → base model (2-3s). Long clips → small/medium. |
| 🎯 Accurate When Needed | Complex/mixed audio automatically upgrades the model. |
| 📊 Segment Timestamps | Sentence-level timing for long recordings. |
| 📁 Multiple Formats | OGG, WAV, MP3, M4A, FLAC, OPUS and more. |

Supported Audio Formats

| Format | Extension | Notes |
|---|---|---|
| OGG Opus | .ogg | Common voice message format ✅ |
| WAV | .wav | Uncompressed, high quality |
| MP3 | .mp3 | Compressed audio |
| M4A | .m4a | Apple/MPEG-4 audio |
| FLAC | .flac | Lossless compressed |
| OPUS | .opus | Pure Opus stream |

Usage Examples

Quick transcription (auto model)

$ scripts/transcribe.py meeting.ogg
📂 Loading audio: meeting.ogg
⏱ Duration: 32.0s | Sample rate: 16000Hz
🧠 Auto-selected model: BASE
✓ Model loaded (1.0s)
🎯 Transcribing...
✅ Done (4.1s total)
Meeting notes: Today we discuss three topics. First, project progress...

Transcription in context

# Chinese
scripts/transcribe.py voice.ogg --language zh

# English lecture with timestamps
scripts/transcribe.py lecture.m4a --language en --segments

# Mixed Chinese-English interview (auto complexity detection)
scripts/transcribe.py interview.ogg

# Save to file
scripts/transcribe.py podcast.mp3 -o transcript.txt

# Force high accuracy
scripts/transcribe.py important.wav --model medium

Output with segments

$ scripts/transcribe.py message.ogg --segments
📂 Loading audio: message.ogg
⏱ Duration: 7.5s | Sample rate: 16000Hz
🧠 Auto-selected model: BASE
✓ Model loaded (1.0s)
🎯 Transcribing...
✅ Done (2.4s total)
Now I'm sending this voice message to XiaoA, can you recognize what I said?

📝 Segments:
[0.0s - 3.6s] Now I'm sending this voice message
[3.6s - 7.4s] to XiaoA, can you recognize what I said?
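The segment lines above can be produced from the segment list that the openai-whisper library returns under the `segments` key of its transcription result, where each entry carries `start`, `end`, and `text`. The sketch below only illustrates that formatting step; `format_segments` is a hypothetical name and not necessarily how the script implements it.

```python
from typing import Iterable, List, Mapping


def format_segments(segments: Iterable[Mapping]) -> List[str]:
    """Render segments as '[start - end] text' lines, one decimal place each."""
    lines = []
    for seg in segments:
        # Whisper segment text usually begins with a space, so strip it.
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {text}")
    return lines


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 3.6, "text": " Now I'm sending this voice message"},
        {"start": 3.6, "end": 7.4, "text": " to XiaoA, can you recognize what I said?"},
    ]
    print("\n".join(format_segments(example)))
```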

Troubleshooting

| Problem | Solution |
|---|---|
| No module error | Use the venv Python: python3 scripts/transcribe.py or run scripts/install.py |
| Slow transcription | First download caches the model (~139-461MB). Normal for first run. |
| Wrong language detected | Pass --language en or --language zh for a hint |
| Background noise | Use --model small or --model medium for noisy environments |

Token Savings Examples

Scenario Clou

Data source: ClawHub · Chinese localization: 龙虾技能库