
Speech — Skill Tool

v0.1.0

[Auto-translated] Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio ...

by @patches429 (Parker) · MIT-0
License
MIT-0
Last updated
2026/4/11
Security scan
VirusTotal
Harmless
OpenClaw
Suspicious
medium confidence
The skill is a coherent text-to-speech integration, but the registry metadata omits a required API credential (OPENAI_API_KEY) declared in SKILL.md and the bundled script — this mismatch should be resolved before installing.
Assessment recommendations
This appears to be a legitimate TTS skill that uses the OpenAI Audio API and a bundled Python CLI. Before installing: (1) confirm how your agent platform expects required credentials to be declared and stored — SKILL.md and the script require OPENAI_API_KEY but the registry metadata does not list it; (2) provide the API key via an environment variable or platform secret store (do not paste the key into chat); (3) review the bundled script locally (it supports --dry-run which prints payloads) and...
Detailed analysis
Purpose and capabilities
Name, description, SKILL.md, references, and the bundled CLI script all align: this is a TTS skill that uses the OpenAI Audio API and built-in voices. However, the registry metadata claims no required environment variables or primary credential while the runtime instructions and script require OPENAI_API_KEY — an inconsistency in declared requirements.
Instruction scope
SKILL.md instructs the agent to use the bundled CLI (scripts/text_to_speech.py), collect inputs, optionally write transient JSONL under tmp/, and write outputs under output/speech/. It requires an API key for live network calls and explicitly discourages pasting the key in chat. The instructions do not ask for unrelated files, additional credentials, or external endpoints beyond the OpenAI API.
Installation mechanism
There is no install spec (instruction-only), and the one bundled script relies on the public openai Python package. Installation guidance recommends pip (or uv pip). There are no downloads from arbitrary URLs or archive extraction steps in the repo.
Credential requirements
The runtime requires OPENAI_API_KEY for live API calls (and the script checks env). The skill metadata, however, lists no required env vars or primary credential — this omission is a red flag because the agent platform may not surface or protect the API key as expected. No other unrelated credentials or sensitive config paths are requested.
Persistence and permissions
The skill does not request always:true, does not modify other skills' configs, and has normal ephemeral behavior (writes outputs and temporary JSONL). Autonomous invocation is allowed by default (platform normal) but not combined with other elevated privileges here.
Security is layered; review the code before running it.

License

MIT-0

Free to use, modify, and redistribute; no attribution required.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.0 · 2026/3/9

Initial release of the speech skill.

  • Provides text-to-speech narration, voiceover, and batch speech generation using the OpenAI Audio API and bundled CLI.
  • Supports single or batch audio generation workflows with clear decision logic.
  • Covers default voice/model selection, instruction formatting, file conventions, and dependency setup.
  • Enforces environment checks, API key requirements, and output organization.
  • Includes comprehensive instruction on user input augmentation and delivery customization.
  • References sample templates and modules for common use cases (narration, IVR, accessibility, etc.).

● Harmless

Install command

Official: npx clawhub@latest install speech
CN mirror: npx clawhub@latest install speech --registry https://cn.clawhub-mirror.com

Skill documentation

Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.

When to use

  • Generate a single spoken clip from text
  • Generate a batch of prompts (many lines, many files)

Decision tree (single vs batch)

  • If the user provides multiple lines/prompts or wants many outputs -> batch
  • Else -> single

Workflow

  • Decide intent: single vs batch (see decision tree above).
  • Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
  • If batch: write a temporary JSONL under tmp/ (one job per line), run once, then delete the JSONL.
  • Augment instructions into a short labeled spec without rewriting the input text.
  • Run the bundled CLI (scripts/text_to_speech.py) with sensible defaults (see references/cli.md).
  • For important clips, validate: intelligibility, pacing, pronunciation, and adherence to constraints.
  • Iterate with a single targeted change (voice, speed, or instructions), then re-check.
  • Save/return final outputs and note the final text + instructions + flags used.

Temp and output conventions

  • Use tmp/speech/ for intermediate files (for example JSONL batches); delete when done.
  • Write final artifacts under output/speech/ when working in this repo.
  • Use --out or --out-dir to control output paths; keep filenames stable and descriptive.
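The batch path of the workflow and the tmp/output conventions above can be sketched in Python. The JSONL field names come from the batch example later in this document, and `--out-dir`, `--rpm`, and `--dry-run` are flags this document mentions; exactly how the JSONL path is passed to the bundled CLI is an assumption, so check references/cli.md for the real invocation.

```python
import json
from pathlib import Path

# Two hypothetical jobs; each dict becomes one JSONL line, using the field
# names from the batch example in this document.
jobs = [
    {"input": "Thank you for calling. Please hold.", "voice": "cedar",
     "response_format": "mp3", "out": "hold.mp3"},
    {"input": "For sales, press 1. For support, press 2.", "voice": "marin",
     "response_format": "wav"},
]

# Intermediate files go under tmp/speech/, per the conventions above.
tmp_dir = Path("tmp/speech")
tmp_dir.mkdir(parents=True, exist_ok=True)
batch_file = tmp_dir / "batch.jsonl"
batch_file.write_text("\n".join(json.dumps(j) for j in jobs) + "\n")

# --out-dir, --rpm, and --dry-run are documented; the positional JSONL
# argument is an assumption — see references/cli.md.
cmd = ["python3", "scripts/text_to_speech.py", str(batch_file),
       "--out-dir", "output/speech", "--rpm", "50", "--dry-run"]
# subprocess.run(cmd, check=True)   # run once, then:
# batch_file.unlink()               # delete the transient JSONL when done
```

The run itself is left commented out since it requires the bundled script and (without `--dry-run`) a live API key.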

Dependencies (install if missing)

Prefer uv for dependency management.

Python packages:

uv pip install openai

If uv is unavailable:

python3 -m pip install openai

Environment

  • OPENAI_API_KEY must be set for live API calls.

If the key is missing, give the user these steps:

  • Create an API key in the OpenAI platform UI: https://platform.openai.com/api-keys
  • Set OPENAI_API_KEY as an environment variable in their system.
  • Offer to guide them through setting the environment variable for their OS/shell if needed.
  • Never ask the user to paste the full key in chat. Ask them to set it locally and confirm when ready.
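A minimal pre-flight check matching the steps above (the bundled script already checks the environment itself, per the analysis section; this sketch just shows the intended behavior when the key is absent):

```python
import os

def check_api_key() -> bool:
    """Return True if OPENAI_API_KEY is set; otherwise print the setup
    steps instead of asking the user to paste the key in chat."""
    if os.environ.get("OPENAI_API_KEY"):
        return True
    print("OPENAI_API_KEY is not set. To fix:")
    print("  1. Create a key at https://platform.openai.com/api-keys")
    print("  2. Set OPENAI_API_KEY as an environment variable for your OS/shell")
    print("  3. Confirm here once it is set (never paste the key in chat)")
    return False
```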

If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.

Defaults & rules

  • Use gpt-4o-mini-tts-2025-12-15 unless the user requests another model.
  • Default voice: cedar. If the user wants a brighter tone, prefer marin.
  • Built-in voices only. Custom voices are out of scope for this skill.
  • instructions are supported for GPT-4o mini TTS models, but not for tts-1 or tts-1-hd.
  • Input length must be <= 4096 characters per request. Split longer text into chunks.
  • Enforce 50 requests/minute. The CLI caps --rpm at 50.
  • Require OPENAI_API_KEY before any live API call.
  • Provide a clear disclosure to end users that the voice is AI-generated.
  • Use the OpenAI Python SDK (openai package) for all API calls; do not use raw HTTP.
  • Prefer the bundled CLI (scripts/text_to_speech.py) over writing new one-off scripts.
  • Never modify scripts/text_to_speech.py. If something is missing, ask the user before doing anything else.
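The 4096-character limit above means longer text must be split before submission. A minimal chunking sketch; the break-at-sentence-end heuristic is an illustration, not something the skill prescribes:

```python
def chunk_text(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring
    to break after sentence-ending punctuation, then at word boundaries."""
    chunks = []
    while len(text) > limit:
        window = text[:limit]
        # Break at the last sentence end inside the window, if any.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut == -1:
            cut = window.rfind(" ")  # fall back to a word boundary
        if cut == -1:
            cut = limit - 1          # hard cut as a last resort
        chunks.append(text[:cut + 1].rstrip())
        text = text[cut + 1:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```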

Instruction augmentation

Reformat user direction into a short, labeled spec. Only make implicit details explicit; do not invent new requirements.

Quick clarification (augmentation vs invention):

  • If the user says "narration for a demo", you may add implied delivery constraints (clear, steady pacing, friendly tone).
  • Do not introduce a new persona, accent, or emotional style the user did not request.

Template (include only relevant lines):

Voice Affect: 
Tone: 
Pacing: 
Emotion: 
Pronunciation: 
Pauses: 
Emphasis: 
Delivery: 

Augmentation rules:

  • Keep it short; add only details the user already implied or provided elsewhere.
  • Do not rewrite the input text.
  • If any critical detail is missing and blocks success, ask a question; otherwise proceed.
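One way to assemble the labeled spec under the rules above: emit only the template fields the user actually implied, in template order, and never invent a value for a missing field. The field list mirrors the template; the function name is illustrative.

```python
# Fields in the order given by the instruction template above.
TEMPLATE_FIELDS = ["Voice Affect", "Tone", "Pacing", "Emotion",
                   "Pronunciation", "Pauses", "Emphasis", "Delivery"]

def build_instructions(details: dict[str, str]) -> str:
    """Render only the template lines present in `details`;
    fields the user did not imply are omitted, not invented."""
    lines = [f"{field}: {details[field]}"
             for field in TEMPLATE_FIELDS if field in details]
    return "\n".join(lines)
```

For example, a request implying only tone and pacing yields a two-line spec, keeping it short as the rules require.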

Examples

Single example (narration)

Input text: "Welcome to the demo. Today we'll show how it works."
Instructions:
Voice Affect: Warm and composed.
Tone: Friendly and confident.
Pacing: Steady and moderate.
Emphasis: Stress "demo" and "show".

Batch example (IVR prompts)

{"input":"Thank you for calling. Please hold.","voice":"cedar","response_format":"mp3","out":"hold.mp3"}
{"input":"For sales, press 1. For support, press 2.","voice":"marin","instructions":"Tone: Clear and neutral. Pacing: Slow.","response_format":"wav"}

Instruction best practices (short list)

  • Structure directions as: affect -> tone -> pacing -> emotion -> pronunciation/pauses -> emphasis.
  • Keep 4 to 8 short lines; avoid conflicting guidance.
  • For names/acronyms, add pronunciation hints (e.g., "enunciate A-I") or supply a phonetic spelling in the text.
  • For edits/iterations, repeat invariants (e.g., "keep pacing steady") to reduce drift.
  • Iterate with single-change follow-ups.

More principles: references/prompting.md. Copy/paste specs: references/sample-prompts.md.

Guidance by use case

Use these modules when the request is for a specific delivery style. They provide targeted defaults and templates.

  • Narration / explainer: references/narration.md
  • Product demo / voiceover: references/voiceover.md
  • IVR / phone prompts: references/ivr.md
  • Accessibility reads: references/accessibility.md

CLI + environment notes

  • CLI commands + examples: references/cli.md
  • API parameter quick reference: references/audio-api.md
  • Instruction patterns + examples: references/voice-directions.md
  • If network approvals / sandbox settings are getting in the way: references/codex-network.md

Reference map

  • references/cli.md: how to run speech generation/batches via scripts/text_to_speech.py (commands, flags, recipes).
  • references/audio-api.md: API parameters, limits, voice list.
  • references/voice-directions.md: instruction patterns and examples.
  • references/prompting.md: instruction best practices (structure, constraints, iteration patterns).
  • references/sample-prompts.md: copy/paste instruction recipes (examples only; no extra theory).
  • references/narration.md: templates + defaults for narration and explainers.
  • references/voiceover.md: templates + defaults for product demo voiceovers.
  • references/ivr.md: templates + defaults for IVR/phone prompts.
  • references/accessibility.md: templates + defaults for accessibility reads.
  • references/codex-network.md: environment/sandbox/network-approval troubleshooting.
Data source: ClawHub · Chinese localization: 龙虾技能库