Data Report Generator: CSV/Excel to Word/PDF with Charts
v1.0.0
Automatically analyze CSV or Excel files and generate professional data analysis reports with charts, summaries, and insights, output as Word (.docx) or PDF. Use this skill whenever the user uploads or mentions a data file (CSV, Excel, .xlsx, .xls) and wants a report, analysis, summary, dashboard, or visualization of that data. Also trigger when users say things like "analyze my data", "make a report from this spreadsheet", "visualize this CSV", "generate charts from my sales data", "weekly/monthly report", "data summary", "turn this into a report", or ask for any kind of automated reporting from tabular data. Ideal for operations, sales, marketing, and finance teams.
Data Report Generator
Transform raw CSV or Excel files into professional, insight-rich reports with charts, output as Word (.docx) or PDF.
When This Skill Activates
Trigger when the user provides:
- A CSV, Excel (.xlsx/.xls), or TSV file and asks for a report or analysis
- A request to "visualize", "summarize", or "analyze" tabular data
- Any mention of automated weekly/monthly reporting
If no file has been uploaded yet, ask the user to upload their data file first.
Step 1: Gather Inputs
Ask for (or infer from context):
- Data file: CSV, Excel, or TSV (required)
- Report format: Word (.docx) or PDF? (default: Word)
- Report purpose: sales analysis? operations? marketing? financial? (affects framing)
- Key questions: what does the user most want to understand from this data?
- Audience: internal team? executive summary? client-facing?
- Language: Chinese or English? (default: match the user's language)
If purpose/questions aren't specified, proceed with a comprehensive general analysis.
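Step 2: Read and Profile the Data
import pandas as pd
import numpy as np

# Support both CSV and Excel (plus TSV)
def load_data(filepath):
    ext = filepath.rsplit('.', 1)[-1].lower()
    if ext in ['xlsx', 'xls']:
        return pd.read_excel(filepath)
    if ext == 'tsv':
        return pd.read_csv(filepath, sep='\t')
    if ext == 'csv':
        # Try common encodings; utf-8-sig goes first so a leading BOM is stripped
        for enc in ['utf-8-sig', 'utf-8', 'gbk', 'gb2312']:
            try:
                return pd.read_csv(filepath, encoding=enc)
            except UnicodeDecodeError:
                continue
        raise ValueError(f'Could not decode {filepath} with the encodings tried')
    # Fail loudly instead of returning an undefined df
    raise ValueError(f'Unsupported file type: .{ext}')
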
df = load_data('/path/to/file')
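Once the file is loaded, the profiling pass this step calls for (shape, column types, missing values, basic stats, unique counts, date range) might look like the sketch below. The helper name `profile_data` and the keys of the returned dictionary are illustrative assumptions, not part of the skill. It also assumes date columns have already been parsed to datetime, for example with `pd.to_datetime`.

```python
import pandas as pd


def profile_data(df: pd.DataFrame) -> dict:
    """Collect a basic profile of a DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    datetimes = df.select_dtypes(include="datetime")
    # Anything that is neither numeric nor datetime is treated as categorical/text.
    other = df.drop(columns=[*numeric.columns, *datetimes.columns])

    missing = df.isna().sum()
    profile = {
        "rows": len(df),
        "columns": df.shape[1],
        "numeric_columns": list(numeric.columns),
        "datetime_columns": list(datetimes.columns),
        "other_columns": list(other.columns),
        "missing_count": missing.to_dict(),
        # max(..., 1) avoids division by zero on an empty frame
        "missing_pct": (missing / max(len(df), 1) * 100).round(1).to_dict(),
        "unique_counts": {c: int(other[c].nunique()) for c in other.columns},
        "date_ranges": {c: (datetimes[c].min(), datetimes[c].max())
                        for c in datetimes.columns},
    }
    if not numeric.empty:
        # One row per numeric column, one column per statistic
        profile["numeric_stats"] = numeric.agg(
            ["mean", "median", "min", "max", "std"]).T
    return profile
```

Columns with a high missing percentage or a single unique value are natural candidates to flag as data quality issues in the report.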
Data Profile to Extract
- Shape: row count, column count
- Column types: numeric, categorical, date/time, text
- Missing values: count and % per column
- Basic stats: mean, median, min, max, std for numeric columns
- Unique value counts for categorical columns
- Date range if time columns exist
- Obvious data quality issues
Step 3: Auto-Detect Report Type
Based on column names and data types, auto-select the analysis approach:

| Data Pattern | Report Type | Reference File |
| --- | --- | --- |
| Date column + numeric values | Time Series / Trend | references/time-series.md |
| Category + numeric (sales/revenue) | Sales / Performance | references/sales-analysis.md |
| Multiple numeric columns | Correlation / Distribution | references/statistical.md |
| Category breakdowns only | Segmentation | references/segmentation.md |
| Mixed / unknown | General Analysis | references/general.md |

Read the relevant reference file for chart selection and narrative guidance.
If the user specifies a report type, use that. Otherwise, auto-detect.
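One way to express the detection table as code is sketched here. The function name `detect_report_type`, the keyword list, and the order in which the patterns are checked are assumptions made for illustration; the source only gives the patterns themselves. Date columns are assumed to be parsed to datetime before the call.

```python
import pandas as pd

# Reference files as listed in the detection table.
REFERENCES = {
    "time_series": "references/time-series.md",
    "sales": "references/sales-analysis.md",
    "statistical": "references/statistical.md",
    "segmentation": "references/segmentation.md",
    "general": "references/general.md",
}

# Assumed keyword list; the table only mentions "sales/revenue".
SALES_KEYWORDS = ("sales", "revenue")


def detect_report_type(df: pd.DataFrame) -> str:
    """Return a key of REFERENCES based on column names and dtypes."""
    numeric = list(df.select_dtypes(include="number").columns)
    dates = list(df.select_dtypes(include="datetime").columns)
    categorical = [c for c in df.columns if c not in numeric and c not in dates]

    if dates and numeric:
        return "time_series"
    has_sales_column = any(
        keyword in str(col).lower() for col in numeric for keyword in SALES_KEYWORDS
    )
    if categorical and has_sales_column:
        return "sales"
    if len(numeric) >= 2:
        return "statistical"
    if categorical and not numeric:
        return "segmentation"
    return "general"
```

A report type named by the user would simply bypass this function, and `REFERENCES[detect_report_type(df)]` gives the reference file to read otherwise.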
Step 4: Generate Charts
Install dependencies first:
pip install matplotlib seaborn pandas openpyxl --break-system-packages --quiet
Chart Generation Rules
import matplotlib
matplotlib.use('Agg')  # Non-interactive backend: ALWAYS set this
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns

# Style setup for a professional look
plt.style.use('seaborn-v0_8-whitegrid')
COLORS = ['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#3B1F2B', '#44BBA4']

def save_chart(fig, filename):
    # Save at print-friendly resolution, then close the figure to free memory
    fig.savefig(filename, dpi=150, bbox_inches='tight', facecolor='white')
    plt.close(fig)

Chart Selection Guide
Time series data → Line chart (trend) + bar chart (period comparison)
import matplotlib.dates as mdates  # required for DateFormatter

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df['date'], df['value'], color=COLORS[0], linewidth=2, marker='o', markersize=4)
ax.set_title('Trend Over Time', fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.xticks(rotation=45)
save_chart(fig, 'chart_trend.png')

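The line chart covers the trend; the period-comparison bar chart named in the same rule is not shown, so here is a minimal sketch of one. It assumes the same `date` and `value` column names as the line chart and uses calendar months as the comparison period, which is an assumption. `plot_period_bars` is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd


def plot_period_bars(df, date_col="date", value_col="value"):
    """Bar chart of totals per calendar month (hypothetical helper)."""
    dates = pd.to_datetime(df[date_col])
    # Group rows by the month they fall in and sum the values
    monthly = df.groupby(dates.dt.to_period("M"))[value_col].sum()

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(monthly.index.astype(str), monthly.values, color="#2E86AB")
    ax.set_title("Total by Month", fontsize=14, fontweight="bold")
    ax.tick_params(axis="x", rotation=45)
    return fig
```

The returned figure can be passed to the save helper defined under the chart generation rules, which also closes it.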
Category comparison → Horizontal bar chart (easier to read labels)
fig, ax = plt.subplots(figsize=(10, 6))
df_sorted = df.sort_values('value', ascending=True)
bars = ax.barh(df_sorted['category'], df_sorted['value'], color=COLORS[0])
ax.bar_label(bars, fmt='%.1f', padding=3)
ax.set_title('Performance by Category', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_category.png')

Distribution → Histogram + optional KDE
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(df['value'].dropna(), bins=30, color=COLORS[0], edgecolor='white', alpha=0.8)
ax.set_title('Value Distribution', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_dist.png')

Composition/share → Pie or stacked bar (prefer stacked bar for >5 categories)
fig, ax = plt.subplots(figsize=(8, 6))
sizes = df['value']
labels = df['category']
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%', colors=COLORS, startangle=90)
ax.set_title('Composition', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_pie.png')

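For the more-than-five-categories case, where the rule prefers a stacked bar, a single 100% stacked horizontal bar is one possible rendering. The sketch below is an illustration under that assumption; `plot_composition_bar` and the `tab20` palette are choices made here, not taken from the skill.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt


def plot_composition_bar(labels, sizes):
    """Draw one 100% stacked horizontal bar (hypothetical helper)."""
    total = float(sum(sizes))
    cmap = plt.get_cmap("tab20")  # assumed palette with 20 distinct colors
    fig, ax = plt.subplots(figsize=(10, 2.5))
    left = 0.0
    for i, (label, size) in enumerate(zip(labels, sizes)):
        share = size / total * 100
        # Each segment starts where the previous one ended
        ax.barh(0, share, left=left, color=cmap(i % 20), edgecolor="white",
                label=f"{label} ({share:.1f}%)")
        left += share
    ax.set_xlim(0, 100)
    ax.set_yticks([])
    ax.set_xlabel("Share of total (%)")
    ax.set_title("Composition", fontsize=14, fontweight="bold")
    ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.4), ncol=4, frameon=False)
    return fig
```

A caller could choose between the two forms with a check such as `len(labels) > 5`, keeping the pie for short category lists.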
Correlation → Heatmap
fig, ax = plt.subplots(figsize=(10, 8))
# Restrict the correlation matrix to numeric columns
numeric_cols = df.select_dtypes(include='number').columns
corr = df[numeric_cols].corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='RdBu_r', center=0, ax=ax, square=True, cbar_kws={'shrink': 0.8})
ax.set_title('Correlation Matrix', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_corr.png')

Generate 3–6 charts total, enough to be compreh