K8s Cost Optimizer

v1.0.0

Find and rank Kubernetes cost-saving opportunities from kubectl, metrics-server, kube-state-metrics, and cloud billing. Identifies overprovisioned CPU/memory requests and limits, idle namespaces and workloads, oversized PersistentVolumes, unused LoadBalancer services, expensive node types, missing HorizontalPodAutoscalers, and clusters that haven't adopted spot/preemptible/Graviton nodes. Outputs a ranked list of recommendations with $/month savings estimates and ready-to-apply YAML patches. Covers EKS, GKE, and AKS specifics including instance pricing, savings plans, committed-use discounts, and reservation strategies. Use when asked to cut a Kubernetes cloud bill, right-size workloads, plan a spot migration, build a FinOps report, or tune HPA settings. Triggers on "kubernetes cost", "k8s cost", "eks cost", "gke cost", "aks cost", "right-size", "rightsize", "kubecost", "opencost", "vpa", "hpa", "spot instances", "preemptible", "savings plan", "node pool", "pod requests", "finops".


by @charlie-morrison·MIT-0



License

MIT-0

Free to use, modify, and redistribute; no attribution required.


Runtime dependencies

No special dependencies

Install command


Official: npx clawhub@latest install k8s-cost-optimizer

Mirror (accelerated): npx clawhub@latest install k8s-cost-optimizer --registry https://cn.longxiaskill.com (mirror available)


Skill documentation

Kubernetes Cost Optimizer

Audit a Kubernetes cluster (or fleet) and produce a ranked list of cost-saving actions with concrete dollar estimates. Looks at requests/limits vs actual usage, idle workloads, expensive node types, missing autoscaling, public LBs, oversized PVs, and unused capacity. Acts as a senior FinOps engineer who has cut six- and seven-figure cloud bills without breaking workloads.

Usage

Invoke this skill when a Kubernetes bill is too high, when a quarterly FinOps review is due, or when leadership has asked for "30% off the cloud."

Basic invocation:

Audit my EKS cluster for cost savings
Cut my GKE bill: here's kubectl top + node list
What's the highest-ROI optimization I can ship this week?

With context:

Here's metrics-server data for 30 days, the node list, and the AWS bill
I have 14 namespaces. Which ones are idle?
We're 100% on-demand m5 nodes. What's the spot migration plan?

The agent produces a ranked recommendation list (highest $/month savings first), per-recommendation YAML patches or commands, and a four-week implementation plan that respects production safety.

How It Works

Step 1: Data Collection

Cost optimization without data is guesswork. The agent collects from four sources and joins them:

Source | What It Provides | How To Pull
kubectl + metrics-server | Real CPU/memory usage per pod, per node | kubectl top pods -A, kubectl top nodes
kube-state-metrics / Prometheus | Requests, limits, replicas, deployment-level history | PromQL: kube_pod_container_resource_requests, 30-day window
Cloud billing | $/node-hour, instance type, region, sustained-use | AWS Cost Explorer, GCP billing export, Azure Cost Management
Cluster object inventory | Namespaces, services, PVCs, ingress, jobs, cronjobs | kubectl get all,pvc,svc -A -o json
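To make the first source concrete, here is a minimal sketch that turns the text output of kubectl top pods -A --no-headers into numeric rows. The function names and the row layout are assumptions for illustration, and the sample text in the usage note is made up rather than taken from a real cluster.

```python
# Unit factors for converting Kubernetes memory quantities to MiB.
_MEM_UNITS = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def parse_cpu(value):
    """Convert a CPU quantity such as '250m' or '2' to cores."""
    if value.endswith("m"):
        return float(value[:-1]) / 1000
    return float(value)


def parse_memory_mib(value):
    """Convert a memory quantity such as '512Mi' or '2Gi' to MiB."""
    for suffix, factor in _MEM_UNITS.items():
        if value.endswith(suffix):
            return float(value[:-2]) * factor
    return float(value) / (1024 * 1024)  # no suffix: treat as plain bytes


def parse_top_pods(text):
    """Parse header-less 'kubectl top pods -A' output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Expected columns: NAMESPACE NAME CPU(cores) MEMORY(bytes)
        namespace, name, cpu, memory = line.split()
        rows.append({
            "namespace": namespace,
            "pod": name,
            "cpu_cores": parse_cpu(cpu),
            "memory_mib": parse_memory_mib(memory),
        })
    return rows
```

Calling parse_top_pods("default api-7d9f 250m 512Mi") would give one row with cpu_cores 0.25 and memory_mib 512.0. A single snapshot like this is only a point-in-time reading; the 30-day percentiles used later would still come from Prometheus.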

Data window matters. The agent prefers 30 days, uses 7 days for fast-moving clusters, and uses 90 days for capacity planning. Anything under 7 days is too short: it does not cover a full weekly cycle, so diurnal and weekly patterns cannot be told apart from noise.

If Kubecost or OpenCost is installed, the agent uses the cluster's per-namespace cost allocation directly. Otherwise it computes allocations from node price × pod-share-of-node.
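The fallback allocation (node price × pod-share-of-node) could be sketched roughly as follows. Using CPU requests as the share key, the input shapes, and the function name are all assumptions made for illustration; a real allocation would likely blend CPU and memory and treat unrequested capacity separately.

```python
from collections import defaultdict

HOURS_PER_MONTH = 24 * 30  # 30-day month, matching the savings formulas


def allocate_namespace_cost(pods, node_price_per_hour):
    """Split each node's hourly price across its pods by share of CPU requests.

    pods: list of dicts with keys 'namespace', 'node', 'cpu_request' (cores).
    node_price_per_hour: dict mapping node name -> $/hour.
    Returns a dict mapping namespace -> estimated $/month.
    """
    # Total CPU requested on each node, used as the denominator of the share.
    requested_per_node = defaultdict(float)
    for pod in pods:
        requested_per_node[pod["node"]] += pod["cpu_request"]

    monthly = defaultdict(float)
    for pod in pods:
        total = requested_per_node[pod["node"]]
        if total == 0:
            continue  # nothing requested on this node, nothing to apportion
        share = pod["cpu_request"] / total
        price = node_price_per_hour[pod["node"]]
        monthly[pod["namespace"]] += share * price * HOURS_PER_MONTH
    return dict(monthly)
```

Note that this version spreads the whole node price over the pods that requested CPU, so idle headroom on a node is charged to its tenants instead of being reported on its own.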

Step 2: The Cost Recommendation Catalog

The agent checks the cluster against a fixed set of recommendation types, each with a detection rule and a savings formula.

C1. Overprovisioned CPU requests

Detection: for each container, p99(cpu_usage over 30d) < 0.50 × cpu_request AND container has >7 days of data AND deployment is not a known-bursty type (cron, batch, init)

Savings estimate: ($/cpu-hour for the node pool) × (request - p99_usage) × 24 × 30 × replicas

Action: patch container.resources.requests.cpu down to ceil(p95 × 1.3)
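The C1 rule, savings formula, and action can be sketched together in a few lines. The function name and return shape are hypothetical, and rounding the new request up to the next millicore is an assumption, since the text only says ceil.

```python
import math

HOURS_PER_MONTH = 24 * 30


def cpu_rightsize(request, p95, p99, days_of_data, replicas,
                  usd_per_cpu_hour, bursty=False):
    """Return a C1 recommendation dict, or None if the rule does not fire.

    request, p95 and p99 are in cores, measured over the 30-day window.
    """
    if bursty or days_of_data <= 7:
        return None  # cron/batch/init workloads and thin data are excluded
    if p99 >= 0.50 * request:
        return None  # usage is at least half the request: not overprovisioned
    # ceil(p95 x 1.3), expressed in millicores (assumed rounding unit).
    # round() first so float noise does not push the ceiling up by one.
    new_request_m = math.ceil(round(p95 * 1.3 * 1000, 6))
    savings = usd_per_cpu_hour * (request - p99) * HOURS_PER_MONTH * replicas
    return {
        "new_request": f"{new_request_m}m",
        "monthly_savings_usd": round(savings, 2),
    }
```

For a container requesting 2 cores with p95 = 0.3 and p99 = 0.5, 4 replicas, and an assumed $0.04 per CPU-hour, this suggests a 390m request and roughly $172.80/month in savings.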

C2. Overprovisioned memory requests

Detection: p99(memory_working_set over 30d) < 0.50 × memory_request

Savings: ($/GiB-hour for the node pool) × (request - p99_usage) × 24 × 30 × replicas

Action: patch container.resources.requests.memory down to ceil(p99 × 1.25)

NOTE: never set requests below the working-set p99. OOMKills kill the savings.
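The memory case mostly differs in units: usage is typically tracked in MiB while the price is per GiB-hour. A small sketch, with a hypothetical function name and MiB rounding assumed:

```python
import math

HOURS_PER_MONTH = 24 * 30
MIB_PER_GIB = 1024


def memory_rightsize(request_mib, p99_mib, replicas, usd_per_gib_hour):
    """Return a C2 recommendation dict, or None if the rule does not fire."""
    if p99_mib >= 0.50 * request_mib:
        return None
    # ceil(p99 x 1.25) in MiB; the 25% headroom keeps the new request
    # above the working-set p99, as the NOTE requires.
    new_request_mib = math.ceil(round(p99_mib * 1.25, 6))
    reclaimed_gib = (request_mib - p99_mib) / MIB_PER_GIB
    savings = usd_per_gib_hour * reclaimed_gib * HOURS_PER_MONTH * replicas
    return {
        "new_request": f"{new_request_mib}Mi",
        "monthly_savings_usd": round(savings, 2),
    }
```

With a 4096 MiB request, a 1024 MiB p99, 2 replicas, and an assumed $0.005 per GiB-hour, the suggestion is 1280Mi and about $21.60/month.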

C3. Limits == requests (no burst)

Detection: cpu_limit == cpu_request for stateless workloads (typical anti-pattern: "treat limits as guaranteed quota")

Savings: None directly, but removing the CPU limit unblocks the C1 request reduction, which is where the savings come from

Action: raise or remove the CPU limit; keep limits for memory
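One way to express the C3 action as a patch is sketched below: a strategic merge patch body for a Deployment that sets the CPU limit of one container to null (which deletes the field) and leaves the memory limit untouched. The function name and the container name in the usage note are placeholders, not part of the skill.

```python
import json


def remove_cpu_limit_patch(container_name):
    """Build a strategic merge patch that drops one container's CPU limit.

    Containers are merged by 'name', and a null value deletes the field,
    so the memory limit and everything else on the container is preserved.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "resources": {"limits": {"cpu": None}},
                        }
                    ]
                }
            }
        }
    }


def to_patch_json(container_name):
    """Serialize the patch for use with kubectl patch -p."""
    return json.dumps(remove_cpu_limit_patch(container_name))
```

The string from to_patch_json("api") could then be passed to kubectl patch deployment on the target Deployment with -p, ideally after a dry run against a non-production namespace.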

C4. Idle namespace

Detection: sum(p95 cpu over 30d) across all pods in ns < 0.05 cores AND sum(p95 memory) < 200 MiB AND no recent kubectl apply (last modified more than 30 days ago)

Savings: All allocated capacity (request × node $)

Action: warn → tag → archive (Helm release deleted, namespace archived)
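The idle-namespace rule is a conjunction of three thresholds, which a short predicate makes explicit. The function name and argument layout are assumptions; the per-pod p95 values are expected to come from the 30-day window.

```python
def is_idle_namespace(pod_p95_cpu_cores, pod_p95_memory_mib,
                      days_since_last_apply):
    """Apply the C4 thresholds to per-pod p95 values for one namespace.

    pod_p95_cpu_cores: iterable of p95 CPU per pod, in cores.
    pod_p95_memory_mib: iterable of p95 memory per pod, in MiB.
    days_since_last_apply: age of the most recent change, in days.
    """
    return (
        sum(pod_p95_cpu_cores) < 0.05
        and sum(pod_p95_memory_mib) < 200
        and days_since_last_apply > 30
    )
```

All three conditions must hold, so a namespace that is quiet but was modified last week is not flagged.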

C5. Idle Deployment / StatefulSet

Detection: replicas > 0 AND p99(cpu) < 0.02 cores AND request_count == 0 over 30d (request_count from ingress controller or service mesh)

Savings: replicas × pod_cost / month

Action: scale to zero (KEDA cron, or just kubectl scale --replicas=0)

C6. Oversized PersistentVolume

Detection: for each PVC, kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes < 0.3 AND age > 30 days

Savings: ($/GB-month for storage class) × (capacity - used × 1.5)

Action:
- On EKS gp3: shrink not supported. Migrate via snapshot → smaller PV.
- On GKE pd-balanced: same, snapshot migration.
- On AKS managed-disks: same. Plan downtime.
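The PV rule and its savings formula can be worked through as below. The function name, the GB units, and the example price are assumptions; the target size used × 1.5 is read directly from the savings formula.

```python
def pv_savings(capacity_gb, used_gb, age_days, usd_per_gb_month):
    """Return a C6 recommendation dict, or None if the rule does not fire."""
    if capacity_gb <= 0 or age_days <= 30:
        return None
    if used_gb / capacity_gb >= 0.3:
        return None  # utilisation is 30% or more: leave the volume alone
    target_gb = used_gb * 1.5  # keep 50% headroom over current usage
    savings = usd_per_gb_month * (capacity_gb - target_gb)
    return {
        "target_gb": target_gb,
        "monthly_savings_usd": round(savings, 2),
    }
```

A 500 GB volume holding 100 GB for 90 days, at an assumed $0.08 per GB-month, would be resized to 150 GB for about $28/month. The saving per volume is often small, so it is worth weighing against the migration downtime.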

C7. Unused LoadBalancer Service

Detection: Service type=LoadBalancer AND no NetworkPolicy hits AND no ingress traffic in 30d (cloud LB metrics)

Savings:
AWS NLB: ~$22/mo + $0.006/LCU-hr → $25-50/mo typical
GCP LB: ~$18/mo per forwarding rule
Azure LB: ~$25/mo standard tier

Action: delete the Service or convert to ClusterIP behind a shared ingress
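As a worked example of the AWS figure, the sketch below combines the approximate base charge and the per-LCU-hour rate quoted above. The function name is hypothetical and the default prices are the rounded numbers from this section, not current list prices.

```python
HOURS_PER_MONTH = 24 * 30


def nlb_monthly_cost(avg_lcu, base_usd=22.0, usd_per_lcu_hour=0.006):
    """Estimate one NLB's monthly cost from its average LCU consumption."""
    return round(base_usd + usd_per_lcu_hour * avg_lcu * HOURS_PER_MONTH, 2)


def unused_lb_savings(avg_lcus):
    """Sum the monthly cost of a list of load balancers flagged as unused."""
    return round(sum(nlb_monthly_cost(lcu) for lcu in avg_lcus), 2)
```

At an average of 1 LCU the estimate is $26.32/month, while a truly idle load balancer sits near the $22 base charge, so savings from this rule scale with the number of forgotten Services more than with traffic.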

C8. Expensive node type

Detection: Node pool uses x86 on a workload that's arch-independent AND no GPU/specialized requirement AND newer-gen / Graviton / Tau alternative is cheaper per CPU-hour

Savings:
AWS: m5 → m7g (Graviton) ~20% cheaper, similar perf
GCP: n2 → t2d (Tau AMD) ~28%
