You are managing an Ollama Herd fleet — a smart Ollama multimodal router that distributes Ollama AI workloads across multiple devices. Ollama Herd handles 4 model types: Ollama LLM inference, image generation (mflux), speech-to-text (Qwen3-ASR), and Ollama embeddings. The Ollama scoring engine evaluates nodes on 7 signals (thermal state, memory fit, queue depth, latency history, role affinity, availability trend, context fit) and routes each Ollama request to the optimal device.
Install Ollama Herd
pip install ollama-herd # install Ollama Herd from PyPI
herd # start the Ollama router
herd-node # start an Ollama node agent (run on each device)
PyPI: ollama-herd | Source: github.com/geeks-accelerator/ollama-herd
Ollama Router endpoint
The Ollama Herd router runs at http://localhost:11435 by default. If the user has specified a different Ollama URL, use that instead.
Ollama API endpoints
Use curl to interact with the Ollama fleet:
Ollama fleet status — overview of all Ollama nodes and queues
# ollama_fleet_status — check Ollama node health
curl -s http://localhost:11435/fleet/status | python3 -m json.tool
Returns:
fleet.nodes_total / fleet.nodes_online — how many Ollama devices are in the fleet
fleet.models_loaded — total Ollama models currently loaded across all nodes
fleet.requests_active — total in-flight Ollama requests
nodes[] — per-node details: Ollama status, hardware, memory, CPU, disk, loaded Ollama models with context lengths
queues — per Ollama node:model queue depths (pending, in-flight, done, failed)
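For programmatic checks, the status payload can be condensed in a few lines of Python. This is a sketch: the `fleet` field names follow the list above, but the per-queue keys (`pending`, etc.) are an assumption and may differ in real responses.

```python
def summarize_fleet(status: dict) -> dict:
    """Condense a /fleet/status payload into the numbers that matter.

    Field names follow the docs above; the per-queue "pending" key is
    assumed and may differ in real responses.
    """
    fleet = status.get("fleet", {})
    return {
        "online": f'{fleet.get("nodes_online", 0)}/{fleet.get("nodes_total", 0)}',
        "models_loaded": fleet.get("models_loaded", 0),
        "requests_active": fleet.get("requests_active", 0),
        # Deepest pending queue across all node:model pairs — a quick
        # bottleneck indicator.
        "deepest_queue": max(
            (q.get("pending", 0) for q in status.get("queues", {}).values()),
            default=0,
        ),
    }
```

Feed it the parsed JSON from the curl call above (e.g. `json.load(urllib.request.urlopen(...))`).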
List all Ollama models available across the fleet
# ollama_model_list — all Ollama models on all nodes
curl -s http://localhost:11435/api/tags | python3 -m json.tool
Pull an Ollama model onto the fleet
# ollama_pull_model — pull a model (auto-selects best node, streams progress)
curl -N http://localhost:11435/api/pull -d '{"name": "codestral"}'
# pull to a specific node
curl -N http://localhost:11435/api/pull -d '{"name": "llama3.3:70b", "node_id": "mac-studio"}'
# non-streaming (blocks until complete)
curl http://localhost:11435/api/pull -d '{"name": "phi4", "stream": false}'
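The streaming pull endpoint emits one JSON object per line. A small parser can turn each line into a readable progress string, assuming the usual Ollama progress shape (a `status` string plus optional `completed`/`total` byte counts):

```python
import json


def pull_progress(line: str) -> str:
    """Format one NDJSON line from a streaming /api/pull response.

    Assumes the upstream Ollama progress shape: a "status" string plus
    optional "completed"/"total" byte counts.
    """
    event = json.loads(line)
    status = event.get("status", "")
    total = event.get("total")
    if total:
        pct = 100.0 * event.get("completed", 0) / total
        return f"{status}: {pct:.1f}%"
    return status
```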
List Ollama models currently loaded in memory
# ollama_loaded_models — hot Ollama models in GPU memory
curl -s http://localhost:11435/api/ps | python3 -m json.tool
OpenAI-compatible Ollama model list
curl -s http://localhost:11435/v1/models | python3 -m json.tool
Ollama usage statistics (per-node, per-model daily aggregates)
curl -s http://localhost:11435/dashboard/api/usage | python3 -m json.tool
Recent Ollama request traces
# ollama_traces — recent Ollama routing decisions
curl -s "http://localhost:11435/dashboard/api/traces?limit=20" | python3 -m json.tool
Returns the last N Ollama routing decisions with: model requested, node selected, score, latency, tokens, retry/fallback status, tags.
Ollama fleet health analysis
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
Returns 15 automated Ollama health checks: offline/degraded nodes, memory pressure, underutilized nodes, VRAM fallbacks, KV cache bloat (OLLAMA_NUM_PARALLEL too high), version mismatch, context protection, zombie reaper, Ollama model thrashing, request timeouts, error rates, retry rates, client disconnects, and incomplete streams.
Ollama model recommendations
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool
Returns AI-powered Ollama model mix recommendations per node based on hardware capabilities, Ollama usage patterns, and curated benchmark data.
Ollama settings
# View current Ollama config and node versions
curl -s http://localhost:11435/dashboard/api/settings | python3 -m json.tool
# Toggle Ollama runtime settings (auto_pull, vram_fallback)
curl -s -X POST http://localhost:11435/dashboard/api/settings \
-H "Content-Type: application/json" \
-d '{"auto_pull": false}'
Ollama model management
# View per-node Ollama model details with sizes and usage
curl -s http://localhost:11435/dashboard/api/model-management | python3 -m json.tool
# Pull an Ollama model onto a specific node
curl -s -X POST http://localhost:11435/dashboard/api/pull \
-H "Content-Type: application/json" \
-d '{"model": "llama3.3:70b", "node_id": "mac-studio"}'
# Delete an Ollama model from a specific node
curl -s -X POST http://localhost:11435/dashboard/api/delete \
-H "Content-Type: application/json" \
-d '{"model": "old-model:7b", "node_id": "mac-studio"}'
Ollama model insights (summary statistics)
curl -s http://localhost:11435/dashboard/api/models | python3 -m json.tool
Per-app Ollama analytics (requires request tagging)
curl -s http://localhost:11435/dashboard/api/apps | python3 -m json.tool
Ollama Dashboard
The Ollama web dashboard is at http://localhost:11435/dashboard. It has eight tabs:
- Fleet Overview — live Ollama node cards, queue depths, and request counts via SSE
- Trends — Ollama requests per hour, average latency, and token throughput charts (24h–7d)
- Model Insights — per-Ollama-model latency, tokens/sec, usage comparison
- Apps — per-tag Ollama analytics with request volume, latency, tokens, error rates
- Benchmarks — Ollama capacity growth over time with per-run throughput and latency percentiles
- Health — 15 automated Ollama fleet health checks with severity levels
- Recommendations — Ollama model mix recommendations per node with one-click pull
- Settings — Ollama runtime toggle switches, read-only config tables, and node version tracking
Direct the user to open this URL in their browser for visual Ollama monitoring.
Ollama Resilience features
- Auto-retry — if an Ollama node fails before the first response chunk, re-scores and retries on the next-best Ollama node (up to 2 retries)
- Ollama model fallbacks — clients specify backup Ollama models; tries alternatives when the primary is unavailable
- Context protection — strips num_ctx from Ollama requests when unnecessary to prevent Ollama model reload hangs; auto-upgrades to a larger loaded model
- VRAM-aware fallback — routes to an already-loaded Ollama model in the same category instead of cold-loading
- Zombie reaper — background task detects and cleans up stuck in-flight Ollama requests
- Auto-pull — automatically pulls missing Ollama models onto the best available node
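The retry and fallback behavior above runs inside the router, but the same flow can be sketched client-side. This is a hypothetical helper — `send` stands for whatever function issues the actual request:

```python
def try_with_fallbacks(models, send, max_retries=2):
    """Try each model in order, retrying transient failures up to max_retries
    extra times per model — a client-side sketch of the router's
    auto-retry + model-fallback flow described above.
    """
    last_err = None
    for model in models:
        for _ in range(max_retries + 1):
            try:
                return send(model)
            # Treat a connection error as "node failed before the first chunk".
            except ConnectionError as err:
                last_err = err
    raise last_err
```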
Common Ollama tasks
Check if the Ollama fleet is healthy
- Hit /fleet/status and verify nodes_online > 0
- Hit /dashboard/api/health for automated Ollama health checks with severity levels
- Look at Ollama queue depths — deep queues may indicate a bottleneck
Find which Ollama node has a specific model
- Hit /fleet/status and inspect each Ollama node's ollama.models_loaded and ollama.models_available
- Or hit /api/tags for a flat list of all available Ollama models with which nodes have them
Check if an Ollama model is loaded (hot) or cold
- Hit /api/ps — Ollama models listed here are currently loaded in memory (hot)
- Models in /api/tags but not in /api/ps are on disk but not loaded (cold)
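Combining the two endpoints makes the hot/cold split mechanical. A sketch, assuming both payloads use Ollama's usual `{"models": [{"name": ...}]}` shape:

```python
def hot_and_cold(tags: dict, ps: dict) -> tuple:
    """Return (hot, cold) model-name sets: loaded in memory vs. on disk only.

    tags is the parsed /api/tags response; ps is the parsed /api/ps response.
    """
    available = {m["name"] for m in tags.get("models", [])}
    hot = {m["name"] for m in ps.get("models", [])}
    return hot, available - hot
```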
View recent Ollama inference activity
- Hit /dashboard/api/traces?limit=10 to see the last 10 Ollama requests
- Each trace shows: Ollama model, node, score, latency, tokens, retry/fallback status
Diagnose slow Ollama responses
- Check /dashboard/api/traces for high latency Ollama entries
- Check /fleet/status for Ollama nodes with high queue depths or memory pressure
- Check if the Ollama model had to cold-load (look for low scores in trace)
- Check if num_ctx is being sent — Ollama context protection logs show if requests triggered reloads
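The first of those checks can be scripted against the traces payload. The `latency_ms` key is an assumption, inferred from the request_traces column used in the sqlite queries below:

```python
def slow_traces(traces, threshold_ms=5000):
    """Return traces slower than threshold_ms, slowest first.

    Assumes each trace dict carries a latency_ms field (matching the
    request_traces column in the trace database).
    """
    slow = [t for t in traces if t.get("latency_ms", 0) > threshold_ms]
    return sorted(slow, key=lambda t: t["latency_ms"], reverse=True)
```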
Query the Ollama trace database directly
# Recent Ollama failures
sqlite3 ~/.fleet-manager/latency.db "SELECT request_id, model, status, error_message FROM request_traces WHERE status='failed' ORDER BY timestamp DESC LIMIT 10"
# Slowest Ollama requests
sqlite3 ~/.fleet-manager/latency.db "SELECT model, node_id, latency_ms/1000.0 as secs FROM request_traces WHERE status='completed' ORDER BY latency_ms DESC LIMIT 10"
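The same queries work from Python's sqlite3 module, which is handier for post-processing. Column names are taken from the one-liners above; the rest of the schema is not documented here:

```python
import sqlite3


def recent_failures(db_path, limit=10):
    """Fetch recent failed request traces from the latency database.

    Columns come from the sqlite3 one-liners above; the actual schema
    may contain more fields.
    """
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            "SELECT request_id, model, status, error_message "
            "FROM request_traces WHERE status='failed' "
            "ORDER BY timestamp DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        con.close()
```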
Test Ollama inference through the fleet
# Ollama via OpenAI format
curl -s http://localhost:11435/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Hello from Ollama"}],"stream":false}'
# Ollama native format
curl -s http://localhost:11435/api/chat \
-d '{"model":"llama3.3:70b","messages":[{"role":"user","content":"Hello from Ollama"}],"stream":false}'
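The same test can be issued from Python. The payload builder below is the testable core; the `chat` helper posts it to the OpenAI-compatible endpoint shown above (a sketch — response parsing assumes the standard OpenAI `choices` shape):

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "llama3.3:70b") -> dict:
    """OpenAI-format, non-streaming chat body for the router's /v1 endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt: str, base: str = "http://localhost:11435") -> str:
    """POST the request through the router and return the first choice's text."""
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```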
Ollama Guardrails
- Never restart or stop the Ollama Herd router or Ollama node agents without explicit user confirmation.
- Never delete or modify files in ~/.fleet-manager/ (contains Ollama latency data, traces, and logs).
- Do not pull Ollama models onto nodes without user confirmation — Ollama model downloads can be large (10-100+ GB).
- Do not delete Ollama models without user confirmation.
- If an Ollama node shows as offline, report it to the user rather than attempting to SSH into the machine.
Ollama Failure handling
- If curl to the Ollama router fails with connection refused, tell the user the Ollama Herd router may not be running and suggest herd to start it.
- If the Ollama fleet status shows 0 nodes online, suggest starting Ollama node agents with herd-node on their devices.
- If Ollama mDNS discovery fails, suggest using --router-url http://router-ip:11435 for explicit connection.
- If Ollama requests hang with 0 bytes returned, check if the client is sending num_ctx — Ollama context protection should strip it.
- If a specific Ollama API endpoint returns an error, show the user the full error response and suggest checking the Ollama JSONL logs at ~/.fleet-manager/logs/herd.jsonl.