## Version

Initial release of the aliyun-wan-digital-human skill.

- Generates talking, singing, or presentation videos from a character image and audio using Alibaba Cloud Model Studio digital-human models.
- Supports image-validation and video-generation workflows with distinct model names: `wan2.2-s2v-detect` for validation and `wan2.2-s2v` for video.
- Exposes a normalized interface for detection and video-creation requests.
- Requires API key setup and the China (Beijing) region.
- Writes all requests, responses, and task snapshots to a dedicated directory for traceability.
Skill Documentation
Category: provider
# Model Studio Digital Human
## Validation

```shell
mkdir -p output/aliyun-wan-digital-human
python -m py_compile skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py && echo "py_compile_ok" > output/aliyun-wan-digital-human/validate.txt
```

Pass criteria: the command exits 0 and `output/aliyun-wan-digital-human/validate.txt` is generated.
## Output And Evidence

- Save normalized request payloads, the chosen resolution, and task polling snapshots under `output/aliyun-wan-digital-human/`.
- Record the image/audio URLs and whether the input image passed detection.
Use this skill to drive speaking, singing, or presenting characters from an image plus audio.
## Critical model names

Use these exact model strings:

- `wan2.2-s2v-detect`
- `wan2.2-s2v`
Selection guidance:

- Run `wan2.2-s2v-detect` first to validate the image.
- Use `wan2.2-s2v` for the actual video generation job.
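The detect-then-generate ordering can be sketched as a small gate. This is illustrative only: `run_detect` and `run_generate` are hypothetical placeholders, not this skill's actual client calls.

```python
# Sketch of the detect-then-generate gate. The model names come from this
# skill's docs; the two run_* functions are hypothetical stand-ins for the
# real Model Studio calls.
DETECT_MODEL = "wan2.2-s2v-detect"
GENERATE_MODEL = "wan2.2-s2v"

def run_detect(image_url: str) -> bool:
    """Placeholder: call the detect model and return pass/fail."""
    return image_url.startswith("http")

def run_generate(image_url: str, audio_url: str) -> dict:
    """Placeholder: submit the video generation job."""
    return {"model": GENERATE_MODEL, "image_url": image_url, "audio_url": audio_url}

def create_video(image_url: str, audio_url: str) -> dict:
    # Gate generation on a passing detection result, as the docs require.
    if not run_detect(image_url):
        raise ValueError("image failed detection; do not submit a generation job")
    return run_generate(image_url, audio_url)
```

The point of the wrapper is that generation is never reachable when detection fails, so the ordering rule cannot be skipped by accident.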
## Prerequisites

- Available only in the China mainland (Beijing) region.
- Set `DASHSCOPE_API_KEY` in your environment, or add `dashscope_api_key` to `~/.alibabacloud/credentials`.
- Input audio should contain clear speech or singing, and the input image should depict a clear subject.
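Resolving the key from the environment with a credentials-file fallback might look like the following sketch; note that the INI section/key layout assumed for `~/.alibabacloud/credentials` is an assumption, not a documented format.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

def resolve_api_key(credentials_path: str = "~/.alibabacloud/credentials") -> Optional[str]:
    """Return the API key from DASHSCOPE_API_KEY, falling back to the
    credentials file. Assumes an INI-style file with a dashscope_api_key
    option in some section."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    path = Path(credentials_path).expanduser()
    if path.exists():
        parser = configparser.ConfigParser()
        parser.read(path)
        for section in parser.sections():
            if parser.has_option(section, "dashscope_api_key"):
                return parser.get(section, "dashscope_api_key")
    return None
```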
## Normalized interface (video.digital_human)

### Detect Request

- `model` (string, optional): default `wan2.2-s2v-detect`
- `image_url` (string, required)

### Generate Request

- `model` (string, optional): default `wan2.2-s2v`
- `image_url` (string, required)
- `audio_url` (string, required)
- `resolution` (string, optional): `480P` or `720P`
- `scenario` (string, optional): `talk`, `sing`, or `perform`

### Response

- `task_id` (string)
- `task_status` (string)
- `video_url` (string, when finished)
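The request shapes above can be expressed as small builder functions; this is a sketch of the normalized interface, not the skill's actual script. Optional fields are omitted when not supplied, since the docs do not state server-side defaults for them.

```python
from typing import Optional

def build_detect_request(image_url: str, model: str = "wan2.2-s2v-detect") -> dict:
    """Detect request: optional model (defaulted per the docs) plus image_url."""
    return {"model": model, "image_url": image_url}

def build_generate_request(image_url: str, audio_url: str,
                           model: str = "wan2.2-s2v",
                           resolution: Optional[str] = None,
                           scenario: Optional[str] = None) -> dict:
    """Generate request; optional fields are validated and omitted when unset."""
    if resolution is not None and resolution not in ("480P", "720P"):
        raise ValueError("resolution must be 480P or 720P")
    if scenario is not None and scenario not in ("talk", "sing", "perform"):
        raise ValueError("scenario must be talk, sing, or perform")
    payload = {"model": model, "image_url": image_url, "audio_url": audio_url}
    if resolution is not None:
        payload["resolution"] = resolution
    if scenario is not None:
        payload["scenario"] = scenario
    return payload
```

Validating the enumerated fields at build time surfaces typos like `720p` before a request is ever submitted.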
## Quick start

```shell
python skills/ai/video/aliyun-wan-digital-human/scripts/prepare_digital_human_request.py \
  --image-url "https://example.com/anchor.png" \
  --audio-url "https://example.com/voice.mp3" \
  --resolution 720P \
  --scenario talk
```
## Operational guidance
- Use a portrait, half-body, or full-body image with a clear face and stable framing.
- Match audio length to the desired output duration; the output follows the audio length up to the model limit.
- Keep image and audio as public HTTP/HTTPS URLs.
- If the image fails detection, do not proceed directly to video generation.
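Since the response exposes `task_id` and `task_status`, generation jobs are polled until they reach a terminal state. A generic polling loop might look like this sketch; the terminal status names and the caller-supplied `query_task` function are assumptions, not the documented API.

```python
import time

# Assumed terminal statuses; check your actual task API for the real set.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED"}

def poll_task(query_task, task_id: str, interval: float = 5.0, max_polls: int = 60) -> dict:
    """Poll until the task reaches a terminal status.

    `query_task` is a caller-supplied function that takes a task_id and
    returns the normalized response dict (task_id, task_status, video_url).
    Each snapshot should also be saved under the skill's output directory
    for traceability.
    """
    for _ in range(max_polls):
        snapshot = query_task(task_id)
        if snapshot.get("task_status") in TERMINAL_STATUSES:
            return snapshot
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

A bounded `max_polls` keeps a stuck job from polling forever; the interval can be tuned to the expected video length.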
## Output location

- Default output: `output/aliyun-wan-digital-human/request.json`
- Override the base directory with `OUTPUT_DIR`.
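Honoring the `OUTPUT_DIR` override might look like the following sketch; it assumes `OUTPUT_DIR` replaces the `output` base while the skill keeps its dedicated subdirectory.

```python
import json
import os
from pathlib import Path

def save_request(payload: dict, filename: str = "request.json") -> Path:
    """Write a request payload under OUTPUT_DIR (default 'output'),
    inside the skill's dedicated subdirectory, and return the path."""
    base = Path(os.environ.get("OUTPUT_DIR", "output"))
    out_dir = base / "aliyun-wan-digital-human"
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    target.write_text(json.dumps(payload, indent=2))
    return target
```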
## References

- references/sources.md