audio-quality-check
v0.1.1
Analyze audio recording quality: echo detection, loudness, speech intelligibility, SNR, spectral analysis. Use when the user wants to check a recording's quality, detect echo or duplication in audio files, measure speech clarity, compare original vs processed audio, diagnose why a recording sounds bad, or analyze audio tracks from Blackbox or any call recording app. Triggers on audio quality, recording analysis, echo detection, check recording, sound quality, analyze audio, speech quality, PESQ, STOI, loudness, SNR, audio diagnostics, recording sounds bad, echo in recording, audio duplication.
Audio Recording Quality Analyzer
Comprehensive audio quality analysis for call recordings. Handles dual-track M4A files (system audio + mic), single-track recordings, and AEC-processed files.
Quick Start
Run the bundled analysis script on a recording directory:
python <skill-path>/scripts/analyze_recording.py "/path/to/recording/directory"
Modes for focused analysis:
python <skill-path>/scripts/analyze_recording.py /path --tracks   # track info only
python <skill-path>/scripts/analyze_recording.py /path --echo     # echo detection only
python <skill-path>/scripts/analyze_recording.py /path --quality  # quality metrics (skip echo)
For Blackbox recordings, the directory is typically: ~/Library/Application Support/Blackbox/Recordings//
Dependencies
System: ffmpeg, ffprobe (brew install ffmpeg)
Python: numpy, soundfile, scipy, pyloudnorm, pesq, pystoi, librosa
Install all Python deps:
pip3 install numpy soundfile scipy pyloudnorm pesq pystoi librosa
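Before running the script, it can help to confirm that everything listed above is importable and on PATH. The following is a minimal sketch and not part of the bundled script. The function name `missing_dependencies` is hypothetical, and it assumes each pip package is imported under the same name it is installed with.

```python
import importlib.util
import shutil

# Names taken from the dependency list above.
PYTHON_DEPS = ["numpy", "soundfile", "scipy", "pyloudnorm", "pesq", "pystoi", "librosa"]
SYSTEM_TOOLS = ["ffmpeg", "ffprobe"]


def missing_dependencies(modules=PYTHON_DEPS, tools=SYSTEM_TOOLS):
    """Return (missing_modules, missing_tools) without importing anything."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    modules, tools = missing_dependencies()
    if modules:
        print("Missing Python packages:", ", ".join(modules))
    if tools:
        print("Missing system tools:", ", ".join(tools))
    if not modules and not tools:
        print("All dependencies found.")
```

`find_spec` only locates a module, so the check stays fast even for heavy packages such as librosa.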
What Each Metric Tells You

EBU R128 Loudness (pyloudnorm)
What: Perceptual loudness in LUFS (Loudness Units relative to Full Scale)
Target: -16 to -24 LUFS for speech
Watch for: AEC/post-processed tracks being significantly louder than originals (indicates the processing is amplifying without normalizing)

Echo Detection - Autocorrelation
What: Detects delayed copies of the signal within a single track by correlating the signal with itself at various time offsets
How to read: Peaks in the 20-100ms range with correlation > 0.3 indicate signal duplication. The lag tells you the delay of the duplicate copy
Key insight: If you see a consistent peak at the same lag across multiple time segments, that's a systematic duplication (e.g., a virtual audio processor like Krisp introducing a delayed copy at ~53ms)
Normal values: Peaks below 0.15 are typically speech pitch harmonics (harmless). Peaks above 0.3 at consistent lags are echo

Cross-Track Correlation
What: Measures how much one track's content appears in another (e.g., system audio bleeding into the mic track)
How to read: Values near 0 mean no bleed. Values above 0.1 indicate the mic is picking up system audio
Coherence: Frequency-domain version of the same test. Voice-band coherence (300-3400Hz) is most relevant for speech echo

PESQ - Speech Quality (requires reference + degraded)
What: ITU-T P.862 standard. Gives a MOS (Mean Opinion Score) comparing a degraded signal against a reference
Scale: 1.0 (bad) to 4.5 (excellent). NB = narrowband (phone quality), WB = wideband
Use for: Comparing AEC-processed mic vs original mic to see if processing helps or hurts
Thresholds: 4.0+ excellent, 3.0+ good, 2.5-3.0 fair, <2.5 poor

STOI - Speech Intelligibility (requires reference + degraded)
What: Short-Time Objective Intelligibility. Measures how understandable speech remains after processing
Scale: 0.0 to 1.0
Thresholds: >0.8 good, >0.6 fair, <0.6 poor
Key insight: If STOI drops significantly between original and processed, the processing is degrading intelligibility

Spectral Analysis (librosa)
Centroid: Average frequency weighted by amplitude. Higher = brighter/harsher audio
Rolloff (85%): Frequency below which 85% of spectral energy sits. Lower = more bass-heavy
Zero-crossing rate: How often the signal crosses zero. Higher = noisier signal. Speech is typically 0.05-0.20; values above 0.30 suggest significant noise

SNR - Signal-to-Noise Ratio
What: Ratio of speech energy to background noise energy (estimated via energy-based VAD)
Thresholds: >20dB excellent, >15dB good, >10dB fair, <10dB poor
Note: This measures background noise, not echo. A recording can have excellent SNR but still have echo problems

Per-Minute Energy
What: RMS energy and voice-band energy per minute of recording
Use for: Spotting segments that went silent (mic cut out), got unexpectedly loud (clipping risk), or had activity patterns that help identify when speakers were active

Manual Analysis Recipes
When you need analysis beyond what the script provides, these patterns are useful.
Extract individual tracks from dual-track M4A
ffmpeg -y -i audio.m4a -map 0:0 -ac 1 -ar 16000 /tmp/system.wav
ffmpeg -y -i audio.m4a -map 0:1 -ac 1 -ar 16000 /tmp/mic.wav
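Once the two tracks are extracted, the cross-track correlation described earlier can be approximated by hand. The sketch below is an assumption about how such a measure could be computed, not the bundled script's implementation. The function name `max_cross_correlation` and the 200ms search window are hypothetical. It returns the strongest normalized correlation between the two tracks and the lag at which it occurs.

```python
import numpy as np


def max_cross_correlation(system, mic, sr, max_lag_ms=200.0):
    """Return (r, lag_ms): strongest normalized cross-correlation within
    +/- max_lag_ms. A positive lag means the mic trails the system track."""
    n = min(len(system), len(mic))
    a = np.array(system[:n], dtype=float)
    b = np.array(mic[:n], dtype=float)
    a -= a.mean()
    b -= b.mean()

    # FFT-based cross-correlation; zero-padding to 2n avoids circular wrap-around.
    nfft = 2 * n
    xcorr = np.fft.irfft(np.conj(np.fft.rfft(a, nfft)) * np.fft.rfft(b, nfft), nfft)
    xcorr /= np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-12

    # Non-negative lags sit at the start of xcorr, negative lags at the end.
    max_lag = min(int(round(sr * max_lag_ms / 1000.0)), n - 1)
    lags = np.concatenate([np.arange(0, max_lag + 1), np.arange(-max_lag, 0)])
    values = np.concatenate([xcorr[: max_lag + 1], xcorr[nfft - max_lag:]])
    best = int(np.argmax(np.abs(values)))
    return float(values[best]), float(lags[best]) / sr * 1000.0
```

With the files produced by the ffmpeg commands above, the inputs would come from `sf.read('/tmp/system.wav')` and `sf.read('/tmp/mic.wav')`. Under the threshold given earlier, a value above 0.1 would suggest the mic is picking up system audio.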
Quick loudness check with sox
sox audio.wav -n stat 2>&1
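For a rough noise-floor check without the full script, the SNR metric described earlier can be approximated with a simple energy-based split between loud and quiet frames. This is a minimal sketch under stated assumptions, not the script's actual VAD. The names `estimate_snr_db` and `rate_snr`, the 20ms frame length, and the percentile-midpoint threshold are all illustrative choices. It expects a mono signal.

```python
import numpy as np


def estimate_snr_db(x, sr, frame_ms=20.0):
    """Rough SNR in dB: frames above an energy threshold count as speech,
    the rest as background noise. Expects a 1-D (mono) signal."""
    frame_len = max(1, int(round(sr * frame_ms / 1000.0)))
    n_frames = len(x) // frame_len
    if n_frames < 2:
        raise ValueError("signal too short for a frame-based SNR estimate")
    frames = np.asarray(x[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-12
    energy_db = 10.0 * np.log10(energy)

    # Assumption: threshold halfway (in dB) between the quiet and loud frames.
    low, high = np.percentile(energy_db, [10, 90])
    threshold = (low + high) / 2.0
    speech = energy[energy_db > threshold]
    noise = energy[energy_db <= threshold]
    if speech.size == 0 or noise.size == 0:
        raise ValueError("could not separate speech frames from noise frames")
    return float(10.0 * np.log10(speech.mean() / noise.mean()))


def rate_snr(snr_db):
    """Map an SNR value onto the thresholds listed in this document."""
    if snr_db > 20:
        return "excellent"
    if snr_db > 15:
        return "good"
    if snr_db > 10:
        return "fair"
    return "poor"
```

The speech frames still contain background noise, so the estimate is slightly optimistic at low SNR. As noted earlier, a good result here says nothing about echo.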
Check specific time range for echo (Python)
import numpy as np
import soundfile as sf
from scipy import signal

data, sr = sf.read('/tmp/system.wav')

# Analyze 5 seconds starting at 2 minutes
start = 120 * sr
seg = data[start:start + 5 * sr]
seg_norm = seg / (np.max(np.abs(seg)) + 1e-10)

# Normalized autocorrelation; zero lag sits at index `mid`
autocorr = np.correlate(seg_norm, seg_norm, mode='full')
mid = len(seg_norm) - 1
autocorr = autocorr / autocorr[mid]

# Check 20-100ms range for echo peaks
min_lag = int(0.020 * sr)
max_lag = int(0.100 * sr)
region = autocorr[mid + min_lag:mid + max_lag]
peaks, props = signal.find_peaks(region, height=0.1)
for i, p in enumerate(peaks[:5]):
    lag_ms = (p + min_lag) / sr * 1000
    print(f"  Peak at {lag_ms:.1f}ms, r={props['peak_heights'][i]:.3f}")
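One way to build confidence in the 0.3 threshold is to run the same kind of measurement on a synthetic signal with a known echo. The sketch below is illustrative only. It uses an FFT-based autocorrelation and a plain maximum instead of `np.correlate` and `find_peaks`, so it needs only numpy. The helper names `strongest_echo_peak` and `add_delayed_copy` are hypothetical, and the 0.6 gain is an arbitrary choice. The 53ms delay mirrors the Krisp example given earlier.

```python
import numpy as np


def strongest_echo_peak(seg, sr, min_ms=20.0, max_ms=100.0):
    """Return (lag_ms, r) for the largest normalized autocorrelation value
    between min_ms and max_ms."""
    x = np.array(seg, dtype=float)
    x -= x.mean()
    n = len(x)
    nfft = 2 * n  # zero-padding avoids circular wrap-around
    spectrum = np.fft.rfft(x, nfft)
    autocorr = np.fft.irfft(spectrum * np.conj(spectrum), nfft)[:n]
    autocorr /= autocorr[0] + 1e-12  # normalize so that lag 0 equals 1

    min_lag = int(round(sr * min_ms / 1000.0))
    max_lag = min(int(round(sr * max_ms / 1000.0)), n - 1)
    region = autocorr[min_lag : max_lag + 1]
    best = int(np.argmax(region))
    return (best + min_lag) / sr * 1000.0, float(region[best])


def add_delayed_copy(x, sr, delay_ms, gain):
    """Mix a delayed, scaled copy of x back into itself (a synthetic echo)."""
    delay = int(round(sr * delay_ms / 1000.0))
    if delay <= 0:
        raise ValueError("delay must be at least one sample")
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[delay:] += gain * x[:-delay]
    return y


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    dry = rng.standard_normal(2 * sr)  # white noise stands in for speech
    wet = add_delayed_copy(dry, sr, delay_ms=53.0, gain=0.6)
    for name, sig in (("dry", dry), ("wet", wet)):
        lag_ms, r = strongest_echo_peak(sig, sr)
        print(f"{name}: strongest peak at {lag_ms:.1f}ms, r={r:.3f}")
```

White noise has no pitch structure, so the dry signal should stay well below the 0.15 level the document treats as harmless, while the wet signal should peak near 53ms. Real speech will show additional, smaller peaks from pitch harmonics.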
Common