Package: Ollama Manager — Ollama tooling
v1.3.1 [AI-assisted] Manage Ollama models across your machines — see what's loaded, what's eating disk, what's never used, and what you should pull next. Get AI-powered recommendations.
Version notes
Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.
Skill documentation
You're helping someone wrangle their Ollama models. They've got models scattered across machines — some loaded, some sitting cold on disk, some they forgot they pulled six months ago. This skill gives you the tools to see every model, clean up the mess, and figure out what they actually need.
The Ollama problem
Ollama makes it too easy to pull models. ollama pull this, ollama pull that — suddenly you've got 200GB of models across three machines and no idea which ones you actually use. No way to see disk usage across machines. No way to compare which model is faster on which hardware. No "hey, you haven't touched this 40GB model in two weeks, maybe delete it?"
That's what Ollama Manager is for.
Getting started with Ollama Manager
pip install ollama-herd # install the Ollama management toolkit
herd # start the Ollama router (tracks all your Ollama machines)
herd-node # run on each Ollama machine you want to manage
Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
Connect to your Ollama fleet
The Ollama manager talks to an Ollama Herd router at http://localhost:11435. This router already knows about all your Ollama machines — it tracks heartbeats, loaded Ollama models, disk usage, and Ollama performance history.
See what models you've got
Every model available across all machines
# ollama_all_models — list every Ollama model on every node
curl -s http://localhost:11435/api/tags | python3 -m json.tool
Shows every Ollama model on every machine with sizes and which nodes have them.
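As a sketch of what you can do with that output: the snippet below sums model count and total size from a tags payload. It assumes the stock Ollama /api/tags shape ({"models": [{"name": ..., "size": <bytes>}]}); the herd router's aggregated payload may differ.

```python
# Assumed shape: {"models": [{"name": ..., "size": <bytes>}, ...]}
# (the stock Ollama /api/tags response; the herd aggregate may differ).
def summarize_tags(tags: dict) -> dict:
    """Return model count and total size in GB from a tags payload."""
    models = tags.get("models", [])
    total_bytes = sum(m.get("size", 0) for m in models)
    return {"count": len(models), "total_gb": round(total_bytes / 1e9, 1)}

sample = {"models": [
    {"name": "llama3.1:8b", "size": 4_700_000_000},
    {"name": "qwen2.5:14b", "size": 9_000_000_000},
]}
print(summarize_tags(sample))  # {'count': 2, 'total_gb': 13.7}
```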
What models are actually loaded in GPU memory right now
# ollama_hot_models — Ollama models ready to serve instantly
curl -s http://localhost:11435/api/ps | python3 -m json.tool
These are the "hot" models — ready to serve instantly. Everything else is cold on disk and needs load time.
Per-machine breakdown with disk usage
# ollama_disk_usage — per-node Ollama model sizes
curl -s http://localhost:11435/dashboard/api/model-management | python3 -m json.tool
The real picture: model sizes, last-used timestamps, which machines have which models, and how much disk each is eating.
Figure out which models to keep
Which models actually get used?
sqlite3 ~/.fleet-manager/latency.db "SELECT model, COUNT(*) as requests, SUM(COALESCE(completion_tokens,0)) as tokens_generated, ROUND(AVG(latency_ms)/1000.0, 1) as avg_secs FROM request_traces WHERE status='completed' GROUP BY model ORDER BY requests DESC"
Which models haven't been touched?
sqlite3 ~/.fleet-manager/latency.db "SELECT model, MAX(datetime(timestamp, 'unixepoch', 'localtime')) as last_used, COUNT(*) as total_requests FROM request_traces GROUP BY model ORDER BY last_used ASC"
If a model's last request was weeks ago, it's a candidate for deletion.
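Once the query returns last-used timestamps, a small helper can flag deletion candidates. A minimal sketch, assuming rows of (model, last_used_epoch); the 14-day threshold is arbitrary.

```python
import time

# Rows mirror the query output: (model, last_used_epoch).
def stale_models(rows, days=14, now=None):
    """Return models whose last request is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return [model for model, last_used in rows if last_used < cutoff]

now = 1_700_000_000
rows = [("llama3.3:70b", now - 30 * 86400),  # idle for a month
        ("qwen2.5:14b", now - 3600)]         # used an hour ago
print(stale_models(rows, days=14, now=now))  # ['llama3.3:70b']
```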
How much disk does each model use?
curl -s http://localhost:11435/dashboard/api/model-management | python3 -c "
import sys, json
data = json.load(sys.stdin)
for node in data:
    print(f\"\\n{node['node_id']}:\")
    ollama_total = 0
    for m in node.get('models', []):
        size = m.get('size_gb', 0)
        ollama_total += size
        print(f\" {m['name']:40s} {size:6.1f} GB\")
    print(f\" {'OLLAMA TOTAL':40s} {ollama_total:6.1f} GB\")
"
Which models are fast and which are slow?
sqlite3 ~/.fleet-manager/latency.db "SELECT model, node_id, ROUND(AVG(latency_ms)/1000.0, 1) as avg_secs, COUNT(*) as n FROM request_traces WHERE status='completed' GROUP BY model, node_id HAVING n > 5 ORDER BY avg_secs"
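Given rows of (model, node_id, avg_secs) from the query above, a helper can pick the fastest node per model. A minimal sketch:

```python
# Rows mirror the query output: (model, node_id, avg_secs).
def fastest_node(rows):
    """Map each model to the (node, avg_secs) with the lowest latency."""
    best = {}
    for model, node_id, avg_secs in rows:
        if model not in best or avg_secs < best[model][1]:
            best[model] = (node_id, avg_secs)
    return best

rows = [("llama3.1:8b", "mac-studio", 1.2),
        ("llama3.1:8b", "linux-box", 0.8),
        ("qwen2.5:14b", "mac-studio", 2.5)]
print(fastest_node(rows))
# {'llama3.1:8b': ('linux-box', 0.8), 'qwen2.5:14b': ('mac-studio', 2.5)}
```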
Get Ollama recommendations
What models should I be running?
# ollama_recommendations — optimal Ollama model mix per node
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool
AI-powered recommendations based on your actual hardware — RAM, cores, GPU memory. Tells you which models fit, which are too big, and the optimal model mix for your machines. Includes estimated RAM requirements and benchmark data.
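The fit check behind such recommendations can be approximated: a model roughly fits when its on-disk size plus runtime overhead is under available RAM. The 1.2x overhead factor below is an assumption for illustration, not the router's actual rule.

```python
# The 1.2x overhead factor (KV cache, runtime buffers) is an assumption.
def fits(model_gb: float, free_ram_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: does the model fit in available RAM with headroom?"""
    return model_gb * overhead <= free_ram_gb

print(fits(40.0, 64.0))  # True: a 40 GB model needs roughly 48 GB
print(fits(40.0, 32.0))  # False: too big for 32 GB free
```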
Pull and delete models
Pull a model to a specific machine
# ollama_pull — download an Ollama model to a node
curl -s -X POST http://localhost:11435/dashboard/api/pull \
-H "Content-Type: application/json" \
-d '{"model": "llama3.3:70b", "node_id": "mac-studio"}'
The Ollama router picks the machine with the most free disk and memory if you're not sure which node to target.
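That selection might look like the sketch below: prefer the node with the most free disk, breaking ties on free memory. The field names (node_id, free_disk_gb, free_ram_gb) are assumptions, not the router's documented node schema.

```python
# Field names (node_id, free_disk_gb, free_ram_gb) are assumptions,
# not the router's documented node schema.
def pick_node(nodes):
    """Pick the node with the most free disk, then the most free RAM."""
    best = max(nodes, key=lambda n: (n["free_disk_gb"], n["free_ram_gb"]))
    return best["node_id"]

nodes = [{"node_id": "mac-studio", "free_disk_gb": 120, "free_ram_gb": 64},
         {"node_id": "linux-box", "free_disk_gb": 500, "free_ram_gb": 32}]
print(pick_node(nodes))  # linux-box
```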
Delete a model from a machine
# ollama_delete — remove an Ollama model from a node
curl -s -X POST http://localhost:11435/dashboard/api/delete \
-H "Content-Type: application/json" \
-d '{"model": "old-model:7b", "node_id": "mac-studio"}'
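Per the guardrails at the end of this document, deletes should be gated on explicit confirmation. A minimal sketch of such a gate, showing the model and the disk it frees:

```python
# A deletion should only proceed on an explicit "yes" from the user.
def confirm_delete(model: str, size_gb: float, answer: str):
    """Return the confirmation prompt and whether the user approved."""
    prompt = f"Delete {model}? Frees {size_gb:.1f} GB [yes/no]"
    return prompt, answer.strip().lower() == "yes"

prompt, ok = confirm_delete("old-model:7b", 4.1, "yes")
print(prompt)  # Delete old-model:7b? Frees 4.1 GB [yes/no]
print(ok)      # True
```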
Auto-pull (when enabled)
If a client requests a model that doesn't exist anywhere, the router can automatically pull it to the best machine. Toggle this:
# Check the current setting
curl -s http://localhost:11435/dashboard/api/settings | python3 -c "import sys,json; print(json.load(sys.stdin)['config']['toggles'])"
# Toggle auto-pull off
curl -s -X POST http://localhost:11435/dashboard/api/settings \
-H "Content-Type: application/json" \
-d '{"auto_pull": false}'
Check Ollama fleet health
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
Automated checks for: model thrashing (models loading/unloading frequently — a sign of memory pressure), disk pressure, and underutilized nodes that could take more models.
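A thrashing check like the one this endpoint runs can be sketched as: a model that loads more than N times within a window is probably being evicted and reloaded under memory pressure. The thresholds below are assumptions.

```python
# Thresholds (1-hour window, more than 3 loads) are assumptions.
def is_thrashing(load_times, window_secs=3600, max_loads=3):
    """True if any `window_secs` window holds more than `max_loads` loads."""
    load_times = sorted(load_times)
    for i, start in enumerate(load_times):
        in_window = [t for t in load_times[i:] if t - start <= window_secs]
        if len(in_window) > max_loads:
            return True
    return False

t0 = 1_700_000_000
print(is_thrashing([t0, t0 + 600, t0 + 1200, t0 + 1800]))  # True: 4 loads in 30 min
print(is_thrashing([t0, t0 + 7200]))                       # False
```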
Ollama Dashboard
Open http://localhost:11435/dashboard and go to the Recommendations tab for a visual Ollama model management interface. One-click pull for recommended Ollama models. The Fleet Overview tab shows which Ollama models are loaded where in real time.
Ollama Guardrails
- Never delete models without explicit user confirmation. Always show which model will be deleted and how much disk it frees.
- Never pull models without user confirmation. Ollama downloads can be 10-100+ GB.
- Never modify files in ~/.fleet-manager/ (contains Ollama data).
- If the Ollama router isn't running, suggest herd or uv run herd to start it.