Smart Model Routing for Z.AI

v1.0.0

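Automatically routes each task to the cheapest z.ai (GLM) model that can handle it. Uses a three-tier escalation strategy (Flash → Standard → Plus/32B) that escalates by task complexity to reduce API cost.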

by @princnl · MIT-0

Tags: AI model access, API tools, automation, data analysis, file processing

License

MIT-0

Last updated

2026/2/26

Security scan

VirusTotal: benign

OpenClaw: safe (high confidence)

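This skill is an instruction-only routing policy. It requires no credentials, installs nothing, and its instructions match its stated purpose: it only provides selection rules for z.ai (GLM) models (Flash → Standard → Plus/32B) and does not itself call z.ai or incur charges.

Assessment and recommendations

This skill is a policy document (routing heuristics). It does not call z.ai directly, incur charges, or request any secrets. Before using it, confirm that your agent platform supports model selection (sessions_spawn or an equivalent) and honors these rules. Test the escalation rules on non-critical tasks first to avoid unnecessary cost increases. If you need z.ai models to be invoked automatically, make sure the platform configuration and credentials are set up elsewhere. Monitor logs and set budget and timeout guards so that automatic escalation or retries do not cause unexpected API charges. ...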
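One way to keep automatic escalation from running up unexpected charges is to cap the number of attempts. The sketch below is a minimal illustration under stated assumptions, not part of the skill: `runWithEscalation`, the `tryTier` callback, and the `maxAttempts` option are hypothetical names, and how a platform actually decides that a tier "failed" is left to the caller.

```javascript
// Escalation ladder, cheapest first, following Flash → Standard → Plus/32B.
const LADDER = ["flash", "standard", "plus"];

// Hypothetical helper: run a task on the cheapest tier and climb the ladder
// only on failure, never exceeding maxAttempts.
// tryTier(tier) is an assumed callback that returns true when the tier succeeded.
function runWithEscalation(tryTier, { startTier = "flash", maxAttempts = 3 } = {}) {
  let index = LADDER.indexOf(startTier);
  if (index === -1) {
    throw new Error(`Unknown tier: ${startTier}`);
  }
  const attempted = [];
  while (index < LADDER.length && attempted.length < maxAttempts) {
    const tier = LADDER[index];
    attempted.push(tier);
    if (tryTier(tier)) {
      return { ok: true, tier, attempted };
    }
    index += 1; // escalate one tier at a time
  }
  // Ladder or attempt budget exhausted: stop instead of retrying forever.
  return { ok: false, tier: null, attempted };
}
```

Returning the list of attempted tiers makes it easy to log how often tasks escalate, which is the kind of monitoring the recommendation above asks for.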

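Detailed analysis

ℹ Purpose and capability

The name and description (automatic routing to the cheapest working GLM model) match the SKILL.md guidance. However, the skill is an instruction document only: it contains rules and example sessions_spawn calls, but no code, no integration, and no credentials. It therefore cannot contact z.ai or enforce routing on its own; it only tells the agent how to decide which model to pick. This is consistent, but important to understand.

✓ Instruction scope

SKILL.md is limited to classification and escalation rules for model selection, including example usage (sessions_spawn). It does not instruct the agent to read unrelated files, send data to external endpoints, or collect secrets. The guidance assumes the agent or platform provides a sessions_spawn capability.

✓ Install mechanism

No install spec and no code files. This reduces disk writes and third-party installs, lowers risk, and is consistent with an instruction-only policy.

✓ Credential requirements

No environment variables, credentials, or config paths are requested, which is proportionate for an instruction-only routing policy.

✓ Persistence and privileges

The skill does not request always:true, does not modify other skills, and keeps no persistent privileges. The platform allows autonomous invocation by default, but the skill itself needs no elevated presence or cross-skill access.

Security is layered; review the code before running.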

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

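Runtime dependencies

No special dependencies

Version

latest · v1.0.0 · 2026/2/9

smart-model-routing-for-zai 1.0.0 introduces an auto-escalating model routing policy for z.ai (GLM) that handles tasks accurately and cost-effectively:
- Implements three-tier task routing (Flash → Standard → Plus/32B), starting with the cheapest model and escalating only when necessary.
- Provides clear rules and a decision tree for when to escalate, based on task complexity and the reasoning required.
- Includes detailed examples and a quick reference guide for choosing the right model tier.
- Aims to minimize API cost while still giving correct, efficient responses across a wide range of user tasks.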

Install command

Official: npx clawhub@latest install smart-model-routing-for-zai

Mirror: npx clawhub@latest install smart-model-routing-for-zai --registry https://cn.clawhub-mirror.com

Skill documentation

Three-tier z.ai (GLM) routing: Flash → Standard → Plus / 32B

Start with the cheapest model. Escalate only when needed. Designed to minimize API cost without sacrificing correctness.

The Golden Rule

If a human would need more than 30 seconds of focused thinking, escalate from Flash to Standard.
If the task involves architecture, complex tradeoffs, or deep reasoning, escalate to Plus / 32B.

Model Reality (Relative)

Tier	Example Models	Purpose
Flash	GLM-4.5-Flash, GLM-4.7-Flash	Fastest & cheapest
Standard	GLM-4.6, GLM-4.7	Strong reasoning & code
Plus / 32B	GLM-4-Plus, GLM-4-32B-128K	Heavy reasoning & architecture

Bottom line: Wrong model selection wastes money OR time. Flash for simple, Standard for normal work, Plus/32B for complex decisions.
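The tier table can be captured as a small lookup so that routing code refers to tiers rather than hard-coded model names. This is a sketch under assumptions: the `TIERS` object and `modelForTier` helper are hypothetical, and the per-tier default simply reuses the model names that appear in this document's own sessions_spawn examples.

```javascript
// Tier → example models, copied from the table above.
// "default" is an assumption: it mirrors the models used in the document's
// own sessions_spawn examples (GLM-4.5-Flash, GLM-4.7, GLM-4-Plus).
const TIERS = {
  flash: { models: ["GLM-4.5-Flash", "GLM-4.7-Flash"], default: "GLM-4.5-Flash" },
  standard: { models: ["GLM-4.6", "GLM-4.7"], default: "GLM-4.7" },
  plus: { models: ["GLM-4-Plus", "GLM-4-32B-128K"], default: "GLM-4-Plus" },
};

// Hypothetical helper: resolve a tier name to the model ID to request.
function modelForTier(tier) {
  const entry = TIERS[tier];
  if (!entry) {
    throw new Error(`Unknown tier: ${tier}`);
  }
  return entry.default;
}
```

Keeping the mapping in one place means a model rename only touches the table, not every call site.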

💚 FLASH — Default for Simple Tasks

Stay on Flash for:

Factual Q&A — “what is X”, “who is Y”, “when did Z”
Quick lookups — definitions, unit conversions, short translations
Status checks — monitoring, file reads, session state
Heartbeats — periodic checks, OK responses
Memory & reminders
Casual conversation — greetings, acknowledgments
Simple file ops — read, list, basic writes
One-liner tasks — anything answerable in 1–2 sentences
Cron jobs (always Flash by default)

NEVER do these on Flash

❌ Write code longer than 10 lines
❌ Create comparison tables
❌ Write more than 3 paragraphs
❌ Do multi-step analysis
❌ Write reports or proposals
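The Flash limits above can be expressed as a pre-flight check. The sketch below assumes the caller can estimate what a task's output will look like; the `expected` object and its field names (`codeLines`, `paragraphs`, and so on) are hypothetical and not defined by the skill.

```javascript
// Hypothetical check: list the reasons a task should NOT stay on Flash.
// `expected` is an assumed, caller-supplied estimate of the required output.
function flashViolations(expected) {
  const reasons = [];
  if ((expected.codeLines ?? 0) > 10) reasons.push("code longer than 10 lines");
  if (expected.comparisonTable) reasons.push("comparison table");
  if ((expected.paragraphs ?? 0) > 3) reasons.push("more than 3 paragraphs");
  if (expected.multiStepAnalysis) reasons.push("multi-step analysis");
  if (expected.reportOrProposal) reasons.push("report or proposal");
  return reasons;
}

// A task may stay on Flash only when none of the limits is exceeded.
function canStayOnFlash(expected) {
  return flashViolations(expected).length === 0;
}
```

Returning the reasons, rather than a bare boolean, makes the escalation decision easy to log and audit.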

💛 STANDARD — Core Workhorse

Escalate to Standard for:

Code & Technical

Code generation — functions, scripts, features
Debugging — normal bug investigation
Code review — PRs, refactors
Documentation — README, comments, guides

Analysis & Planning

Comparisons and evaluations
Planning — roadmaps, task breakdowns
Research synthesis
Multi-step reasoning

Writing & Content

Long-form writing (>3 paragraphs)
Summaries of long documents
Structured output — tables, outlines

Most real user conversations belong here.

❤️ PLUS / 32B — Complex Reasoning Only

Escalate to Plus / 32B for:

Architecture & Design

System and service architecture
Database schema design
Distributed or multi-tenant systems
Major refactors across multiple files

Deep Analysis

Complex debugging (race conditions, subtle bugs)
Security reviews
Performance optimization strategy
Root cause analysis

Strategic & Judgment-Based Work

Strategic planning
Nuanced judgment and ambiguity
Deep or multi-source research
Critical production decisions

🔄 Implementation

For Subagents

```javascript
// Routine monitoring
sessions_spawn(task="Check backup status", model="GLM-4.5-Flash")

// Standard code work
sessions_spawn(task="Build the REST API endpoint", model="GLM-4.7")

// Architecture decisions
sessions_spawn(task="Design the database schema for multi-tenancy", model="GLM-4-Plus")
```

For Cron Jobs

```json
{
  "payload": {
    "kind": "agentTurn",
    "model": "GLM-4.5-Flash"
  }
}
```

Always use Flash for cron unless the task genuinely needs reasoning.

📊 Quick Decision Tree

```
Is it a greeting, lookup, status check, or 1–2 sentence answer?
  YES → FLASH
  NO ↓

Is it code, analysis, planning, writing, or multi-step?
  YES → STANDARD
  NO ↓

Is it architecture, deep reasoning, or a critical decision?
  YES → PLUS / 32B
  NO → Default to STANDARD, escalate if struggling
```

📋 Quick Reference Card

```
┌─────────────────────────────────────────────────────────────┐
│                    SMART MODEL SWITCHING                    │
│                Flash → Standard → Plus / 32B                │
├─────────────────────────────────────────────────────────────┤
│  💚 FLASH (cheapest)                                        │
│  • Greetings, status checks, quick lookups                  │
│  • Factual Q&A, reminders                                   │
│  • Simple file ops, 1–2 sentence answers                    │
├─────────────────────────────────────────────────────────────┤
│  💛 STANDARD (workhorse)                                    │
│  • Code > 10 lines, debugging                               │
│  • Analysis, comparisons, planning                          │
│  • Reports, long writing                                    │
├─────────────────────────────────────────────────────────────┤
│  ❤️ PLUS / 32B (complex)                                    │
│  • Architecture decisions                                   │
│  • Complex debugging, multi-file refactoring                │
│  • Strategic planning, deep research                        │
├─────────────────────────────────────────────────────────────┤
│  💡 RULE: >30 sec human thinking → escalate                 │
│  💰 START CHEAP → SCALE ONLY WHEN NEEDED                    │
└─────────────────────────────────────────────────────────────┘
```

Built for z.ai (GLM) setups.
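The three questions of the quick decision tree can be sketched as a single function. This is an illustration only: `chooseTier` and the boolean flags on `task` are hypothetical, and the skill does not say how a task gets classified, so the flags are assumed to be set by the agent beforehand.

```javascript
// Hypothetical sketch of the quick decision tree.
// `task` carries assumed boolean flags produced by some upstream classifier.
function chooseTier(task) {
  // Question 1: greeting, lookup, status check, or 1–2 sentence answer?
  if (task.isGreeting || task.isLookup || task.isStatusCheck || task.isShortAnswer) {
    return "FLASH";
  }
  // Question 2: code, analysis, planning, writing, or multi-step?
  if (task.isCode || task.isAnalysis || task.isPlanning || task.isWriting || task.isMultiStep) {
    return "STANDARD";
  }
  // Question 3: architecture, deep reasoning, or a critical decision?
  if (task.isArchitecture || task.needsDeepReasoning || task.isCriticalDecision) {
    return "PLUS / 32B";
  }
  // Final "NO" branch: default to STANDARD and escalate if it struggles.
  return "STANDARD";
}
```

The questions are checked in the order the tree lists them, so a task flagged as both code and architecture resolves to STANDARD here. A classifier that wants architecture work to reach Plus / 32B should set only the architecture flag for such tasks.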

Data source: ClawHub · Chinese localization: 龙虾技能库
