PDF Invoice Parser
Use when: You need to extract structured data from PDF invoices, receipts, or financial documents.

Capabilities
Digital PDFs: Direct text extraction from searchable PDFs
Scanned PDFs: OCR via Tesseract for image-based PDFs
Invoice fields: Vendor name, invoice number, invoice date, due date, line items, subtotal, tax, total
Output formats: CSV, JSON, or Excel-ready TSV

Quick Start
# Install dependencies
pip install --break-system-packages PyPDF2 pymupdf pillow pytesseract
# Parse a single invoice
python3 scripts/parse-invoice.py invoice.pdf --output invoice_data.csv
# Parse multiple invoices
python3 scripts/parse-invoices.py ./invoices/ --output consolidated.csv

Usage
Parse a single invoice
python3 scripts/parse-invoice.py path/to/invoice.pdf --output output.csv

Parse a directory of invoices
python3 scripts/parse-invoices.py ./invoice_directory/ --output consolidated.xlsx

With OCR (for scanned PDFs)
python3 scripts/parse-invoice.py scanned_invoice.pdf --ocr --output output.csv
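
The split between direct text extraction and OCR can be sketched as follows. This is a minimal illustration, not the code inside the scripts: the names `needs_ocr`, `extract_page_texts`, and `MIN_TEXT_CHARS`, the 20-character threshold, and the 300 DPI render setting are all assumptions made for the example. It uses PyMuPDF to read each page's text layer and falls back to Tesseract when a page has almost no text.

```python
import io

# Hypothetical threshold: a page whose text layer has fewer characters
# than this is treated as image-only and sent to OCR.
MIN_TEXT_CHARS = 20


def needs_ocr(page_text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Return True when a page's text layer is too sparse to be useful."""
    return len(page_text.strip()) < min_chars


def extract_page_texts(pdf_path: str, force_ocr: bool = False) -> list:
    """Return one text string per page, using OCR for image-only pages."""
    # Imported lazily so needs_ocr() works without these packages installed.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    texts = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text = page.get_text()
            if force_ocr or needs_ocr(text):
                # Render the page to a PNG in memory and run Tesseract on it.
                pixmap = page.get_pixmap(dpi=300)
                image = Image.open(io.BytesIO(pixmap.tobytes("png")))
                text = pytesseract.image_to_string(image)
            texts.append(text)
    return texts
```

Here `force_ocr=True` plays the role that the `--ocr` flag plays on the command line, under the assumption that the flag forces OCR on every page; checking per page lets a mixed document use the faster text layer wherever one exists.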

Extracted Fields
Field            Description
vendor_name      Company/issuer name
invoice_number   Invoice ID/reference
invoice_date     Date of invoice
due_date         Payment due date
line_items       Array of {description, quantity, unit_price, total}
subtotal         Pre-tax total
tax              Tax amount
total            Grand total
currency         Detected currency (USD, EUR, etc.)

Output Format
CSV columns:
vendor_name,invoice_number,invoice_date,due_date,description,quantity,unit_price,line_total,subtotal,tax,total,currency
Each line item becomes a row, with invoice-level fields repeated.
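
The row layout described above can be sketched with the standard `csv` module. This is an illustration under assumptions, not the scripts' actual code: the function names `invoice_to_rows` and `rows_to_csv` and the sample invoice values are hypothetical. Only the column names and their order come from the documented CSV header, and each line item's `total` is written to the `line_total` column.

```python
import csv
import io

# Column order taken from the documented CSV header.
CSV_COLUMNS = [
    "vendor_name", "invoice_number", "invoice_date", "due_date",
    "description", "quantity", "unit_price", "line_total",
    "subtotal", "tax", "total", "currency",
]

# Fields that are repeated on every row of the same invoice.
INVOICE_FIELDS = [
    "vendor_name", "invoice_number", "invoice_date", "due_date",
    "subtotal", "tax", "total", "currency",
]


def invoice_to_rows(invoice: dict) -> list:
    """Flatten one invoice into one row per line item."""
    shared = {key: invoice.get(key, "") for key in INVOICE_FIELDS}
    rows = []
    for item in invoice.get("line_items", []):
        row = dict(shared)
        row["description"] = item["description"]
        row["quantity"] = item["quantity"]
        row["unit_price"] = item["unit_price"]
        row["line_total"] = item["total"]
        rows.append(row)
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text with the documented header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample invoice, for illustration only.
sample_invoice = {
    "vendor_name": "Example Vendor",
    "invoice_number": "INV-0001",
    "invoice_date": "2026-01-05",
    "due_date": "2026-02-04",
    "line_items": [
        {"description": "Item A", "quantity": 2, "unit_price": "10.00", "total": "20.00"},
        {"description": "Item B", "quantity": 1, "unit_price": "5.00", "total": "5.00"},
    ],
    "subtotal": "25.00",
    "tax": "2.50",
    "total": "27.50",
    "currency": "USD",
}

csv_text = rows_to_csv(invoice_to_rows(sample_invoice))
```

One consequence of this layout is that an invoice with no line items produces no rows in this sketch; a real implementation would need to decide whether to emit a single row with empty item columns instead.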
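
Dependencies
PyPDF2: Digital PDF text extraction
PyMuPDF (fitz): Advanced PDF rendering
Pillow: Image processing for OCR
pytesseract: Python wrapper for the Tesseract OCR engine (requires tesseract-ocr to be installed)
openpyxl: Excel output support (not part of the Quick Start install command; install it with pip if Excel output is needed)

Install system dependencies:
# Ubuntu/Debian
sudo apt-get install -y tesseract-ocr
# macOS
brew install tesseract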

Limitations
Complex table layouts may need manual review
Handwritten text is not supported
Very low-quality scans may have reduced accuracy
Multi-page invoices: each page is parsed separately

Example
Input: invoice_1234.pdf
Output (output.csv):
vendor_name,invoice_number,invoice_date,due_date,description,quantity,unit_price,line_total,subtotal,tax,total,currency
Acme Corp,INV-2026-0042,2026-03-15,2026-04-14,Widget A,10,25.00,250.00,250.00,25.00,275.00,USD
Acme Corp,INV-2026-0042,2026-03-15,2026-04-14,Widget B,5,40.00,200.00,250.00,25.00,275.00,USD
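
Because invoice-level fields repeat on every row, a consumer of the CSV usually needs to regroup rows into invoices. A minimal sketch of that step, assuming rows belong to the same invoice when they share both `vendor_name` and `invoice_number` (the function name `group_rows_by_invoice` and this grouping key are assumptions, not part of the scripts):

```python
import csv
import io

# Columns that describe a single line item; every other column is invoice-level.
LINE_ITEM_COLUMNS = ("description", "quantity", "unit_price", "line_total")


def group_rows_by_invoice(csv_text: str) -> dict:
    """Group CSV rows into invoices keyed by (vendor_name, invoice_number)."""
    invoices = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["vendor_name"], row["invoice_number"])
        if key not in invoices:
            # First row for this invoice: copy the invoice-level fields once.
            invoices[key] = {
                name: value
                for name, value in row.items()
                if name not in LINE_ITEM_COLUMNS
            }
            invoices[key]["line_items"] = []
        invoices[key]["line_items"].append(
            {name: row[name] for name in LINE_ITEM_COLUMNS}
        )
    return invoices


# The two rows from the example output above.
example_csv = (
    "vendor_name,invoice_number,invoice_date,due_date,description,quantity,"
    "unit_price,line_total,subtotal,tax,total,currency\n"
    "Acme Corp,INV-2026-0042,2026-03-15,2026-04-14,Widget A,10,25.00,250.00,"
    "250.00,25.00,275.00,USD\n"
    "Acme Corp,INV-2026-0042,2026-03-15,2026-04-14,Widget B,5,40.00,200.00,"
    "250.00,25.00,275.00,USD\n"
)

invoices = group_rows_by_invoice(example_csv)
```

All values stay as strings here; converting amounts to `decimal.Decimal` would be a natural next step before doing any arithmetic on them.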
Integration with MoltyWork
For MoltyWork projects requiring PDF data extraction:
1. Download PDFs from the project
2. Run parse-invoices.py on the directory
3. Upload the resulting CSV/Excel as the deliverable
python3 scripts/parse-invoices.py ./project_pdfs/ --output deliverable.xlsx