## Version

Initial release of the aliyun-wan-digital-human skill.

- Generates talking, singing, or presentation videos from a character image and audio using Alibaba Cloud Model Studio digital-human models.
- Supports image-validation and video-generation workflows with distinct model names: `wan2.2-s2v-detect` for validation and `wan2.2-s2v` for video.
- Exposes a normalized interface for detection and video-creation requests.
- Requires API key setup and the China (Beijing) region.
- Writes all requests, responses, and task snapshots to a dedicated directory for traceability.
Skill Documentation
Category: provider
# Model Studio Digital Human
## Validation

```shell
mkdir -p output/aliyun-wan-digital-human
python -m py_compile skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py && echo "py_compile_ok" > output/aliyun-wan-digital-human/validate.txt
```

Pass criteria: the command exits 0 and `output/aliyun-wan-digital-human/validate.txt` is generated.
## Output And Evidence

- Save normalized request payloads, the chosen resolution, and task polling snapshots under `output/aliyun-wan-digital-human/`.
- Record the image/audio URLs and whether the input image passed detection.
Use this skill to drive speaking, singing, or presenting characters from an image plus audio.
## Critical model names

Use these exact model strings:

- `wan2.2-s2v-detect`
- `wan2.2-s2v`
Selection guidance:

- Run `wan2.2-s2v-detect` first to validate the image.
- Use `wan2.2-s2v` for the actual video generation job.
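The detect-then-generate ordering can be sketched as a small gate. This is illustrative only: `run_detect` and `run_generate` are hypothetical placeholders, not this skill's actual client calls.

```python
# Sketch of the detect-then-generate gate. The model names come from this
# skill's docs; the two run_* functions are hypothetical stand-ins for the
# real Model Studio calls.
DETECT_MODEL = "wan2.2-s2v-detect"
GENERATE_MODEL = "wan2.2-s2v"

def run_detect(image_url: str) -> bool:
    """Placeholder: call the detect model and return pass/fail."""
    return image_url.startswith("http")

def run_generate(image_url: str, audio_url: str) -> dict:
    """Placeholder: submit the video generation job."""
    return {"model": GENERATE_MODEL, "image_url": image_url, "audio_url": audio_url}

def create_video(image_url: str, audio_url: str) -> dict:
    # Gate generation on a passing detection result, as the docs require.
    if not run_detect(image_url):
        raise ValueError("image failed detection; do not submit a generation job")
    return run_generate(image_url, audio_url)
```

The point of the wrapper is that generation is never reachable when detection fails, so the ordering rule cannot be skipped by accident.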
## Prerequisites

- Available only in the China mainland (Beijing) region.
- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
- Input audio should contain clear speech or singing, and the input image should depict a clear subject.
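Resolving the key from the environment with a credentials-file fallback might look like the following sketch; note that the INI section/key layout assumed for `~/.alibabacloud/credentials` is an assumption, not a documented format.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

def resolve_api_key(credentials_path: str = "~/.alibabacloud/credentials") -> Optional[str]:
    """Return the API key from DASHSCOPE_API_KEY, falling back to the
    credentials file. Assumes an INI-style file with a dashscope_api_key
    option in some section."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = Path(credentials_path).expanduser()
    if path.exists():
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")
    return None
```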
## Normalized interface (video.digital_human)

### Detect Request

- `model` (string, optional): default `wan2.2-s2v-detect`
- `image_url` (string, required)

### Generate Request

- `model` (string, optional): default `wan2.2-s2v`
- `image_url` (string, required)
- `audio_url` (string, required)
- `resolution` (string, optional): `480P` or `720P`
- `scenario` (string, optional): `talk`, `sing`, or `perform`

### Response

- `task_id` (string)
- `task_status` (string)
- `video_url` (string, when finished)
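The request shapes above can be expressed as small builder functions; this is a sketch of the normalized interface, not the skill's actual script. Optional fields are omitted when not supplied, since the docs do not state server-side defaults for them.

```python
from typing import Optional

def build_detect_request(image_url: str, model: str = "wan2.2-s2v-detect") -> dict:
    """Detect request: optional model (defaulted per the docs) plus image_url."""
    return {"model": model, "image_url": image_url}

def build_generate_request(image_url: str, audio_url: str,
                           model: str = "wan2.2-s2v",
                           resolution: Optional[str] = None,
                           scenario: Optional[str] = None) -> dict:
    """Generate request; optional fields are validated and omitted when unset."""
    if resolution is not None and resolution not in ("480P", "720P"):
        raise ValueError("resolution must be 480P or 720P")
    if scenario is not None and scenario not in ("talk", "sing", "perform"):
        raise ValueError("scenario must be talk, sing, or perform")
    payload = {"model": model, "image_url": image_url, "audio_url": audio_url}
    if resolution is not None:
        payload["resolution"] = resolution
    if scenario is not None:
        payload["scenario"] = scenario
    return payload
```

Validating the enumerated fields at build time surfaces typos like `720p` before a request is ever submitted.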
## Quick start

```shell
python skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py \
  --image-url "https://example.com/anchor.png" \
  --audio-url "https://example.com/voice.mp3" \
  --resolution 720P \
  --scenario talk
```
## Operational guidance
- Use a portrait, half-body, or full-body image with a clear face and stable framing.
- Match audio length to the desired output duration; the output follows the audio length up to the model limit.
- Keep image and audio as public HTTP/HTTPS URLs.
- If the image fails detection, do not proceed directly to video generation.
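Since the response exposes `task_id` and `task_status`, generation jobs are polled until they reach a terminal state. A generic polling loop might look like this sketch; the terminal status names and the caller-supplied `query_task` function are assumptions, not the documented API.

```python
import time

# Assumed terminal statuses; check your actual task API for the real set.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED"}

def poll_task(query_task, task_id: str, interval: float = 5.0, max_polls: int = 60) -> dict:
    """Poll until the task reaches a terminal status.

    `query_task` is a caller-supplied function that takes a task_id and
    returns the normalized response dict (task_id, task_status, video_url).
    Each snapshot should also be saved under the skill's output directory
    for traceability.
    """
    for _ in range(max_polls):
        snapshot = query_task(task_id)
        if snapshot.get("task_status") in TERMINAL_STATUSES:
            return snapshot
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

A bounded `max_polls` keeps a stuck job from polling forever; the interval can be tuned to the expected video length.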
## Output location

- Default output: `output/aliyun-wan-digital-human/request.json`
- Override the base directory with `OUTPUT_DIR`.
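Honoring the `OUTPUT_DIR` override might look like the following sketch; it assumes `OUTPUT_DIR` replaces the `output` base while the skill keeps its dedicated subdirectory.

```python
import json
import os
from pathlib import Path

def save_request(payload: dict, filename: str = "request.json") -> Path:
    """Write a request payload under OUTPUT_DIR (default 'output'),
    inside the skill's dedicated subdirectory, and return the path."""
    base = Path(os.environ.get("OUTPUT_DIR", "output"))
    out_dir = base / "aliyun-wan-digital-human"
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    target.write_text(json.dumps(payload, indent=2))
    return target
```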
## References

- references/sources.md