PDF to OFD High-Fidelity Converter
🎯 Purpose
A specialized skill for converting PDF documents into the Chinese National Standard OFD format (GB/T 33190-2016). Optimized for electronic invoices (OFD版式发票, i.e. OFD fixed-layout invoices), with rendering capabilities that go beyond those of standard conversion libraries.
✨ Key Features
High-Fidelity Text Placement: Uses character-level positioning (DeltaX arrays) and baseline origin data extracted via rawdict so that the text layout matches the source PDF.
Advanced Vector Graphics: Directly extracts original stroke colors, fill colors, and line widths. Supports complex path types and fill instructions.
Transparency Preservation: Fully supports Alpha and FillOpacity for vector paths and SMask transparency for images (e.g., electronic seals and signatures).
Cross-Platform Font Mapping: Maps macOS-specific (STSong, STKaiti) and Windows-specific font names to standardized OFD font names (宋体, 楷体, 黑体).
In-Memory Packaging: Generates the final OFD zip structure entirely in memory, which avoids temporary file clutter and keeps document contents off disk.
Color Snapping: Heuristic "Invoice Red" correction (128 0 0) for financial documents, while preserving non-standard colors.
🛠️ Usage Instructions
When a user asks to convert a PDF or a "High-Fidelity" invoice to OFD:
Direct Execution:
python3 pdf2ofd.py <input_path.pdf> [output_path.ofd]
Plugin Integration: The script implements a PDF2OFDConverter class that can be imported and used in other Python workflows.
Example Output:
Success: /path/to/invoice.ofd
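For callers that prefer not to import the script, the direct-execution form above can be wrapped in a small helper. This is a minimal sketch that relies only on the documented command line; the helper names `build_command` and `convert`, and the assumption that `pdf2ofd.py` sits in the current working directory, are illustrative and not part of the skill.

```python
import subprocess
import sys


def build_command(pdf_path, ofd_path=None, script="pdf2ofd.py"):
    """Build the argument list for the documented CLI call.

    The output path is optional, mirroring `[output_path.ofd]` in the usage line.
    `script` is a hypothetical default location for pdf2ofd.py.
    """
    cmd = [sys.executable, script, str(pdf_path)]
    if ofd_path is not None:
        cmd.append(str(ofd_path))
    return cmd


def convert(pdf_path, ofd_path=None, script="pdf2ofd.py"):
    """Run the converter and return whatever it printed to stdout.

    check=True raises CalledProcessError if the script exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(pdf_path, ofd_path, script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Show the arguments that would be passed to the interpreter (interpreter path omitted).
print(build_command("invoice.pdf", "invoice.ofd")[1:])
```

Passing the arguments as a list rather than a shell string means that paths containing spaces or non-ASCII characters need no quoting.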
📦 Requirements
Dependencies required in the environment:
PyMuPDF (fitz): For advanced PDF parsing and raw character data extraction.
Pillow: For image processing and transparency handling.
easyofd: The base library for OFD structure (extended via internal monkey patches).
xmltodict: For XML manipulation.
💡 Notes
This skill applies deep monkey-patching to easyofd to work around known library limitations in character positioning and resource ID tracking.
The conversion process assumes standard Chinese fonts (SimSun, KaiTi, SimHei) are available on the viewing system.
Zero-copy resource handling: Images are extracted and re-compressed as PNG/JPG only when necessary to preserve quality.
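To illustrate the character-level positioning mentioned above, the sketch below derives a start position and a DeltaX list from per-character baseline origins. It assumes a span shaped like those returned by PyMuPDF's `page.get_text("rawdict")` (each character carries `c` and `origin`), and it assumes OFD coordinates are expressed in millimetres. The function name `span_to_text_code` and the returned dictionary layout are hypothetical and do not reflect the skill's internal code.

```python
PT_TO_MM = 25.4 / 72.0  # PDF points to millimetres (assumed OFD unit)


def span_to_text_code(span):
    """Convert one rawdict-style span into a start position plus DeltaX offsets.

    DeltaX holds the horizontal distance between each character's origin and
    the next one, so N characters yield N - 1 values.
    """
    chars = span["chars"]
    if not chars:
        return None
    xs = [ch["origin"][0] for ch in chars]
    x0, y0 = chars[0]["origin"]
    delta_x = [round((b - a) * PT_TO_MM, 4) for a, b in zip(xs, xs[1:])]
    return {
        "text": "".join(ch["c"] for ch in chars),
        "x": round(x0 * PT_TO_MM, 4),
        "y": round(y0 * PT_TO_MM, 4),
        "delta_x": delta_x,
    }


# Hand-written sample standing in for real rawdict output:
# three glyphs spaced 12 pt apart on a baseline at y = 144 pt.
sample_span = {
    "chars": [
        {"c": "发", "origin": (72.0, 144.0)},
        {"c": "票", "origin": (84.0, 144.0)},
        {"c": "号", "origin": (96.0, 144.0)},
    ]
}
text_code = span_to_text_code(sample_span)
print(text_code)
```

Working from each glyph's own origin, rather than from font metrics, is what lets the output reproduce kerning and justified spacing from the source without the original font being available at conversion time.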