📦 Dataset Finder: Dataset Search

v0.1.0

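With a one-sentence instruction, search and download datasets from major repositories such as Kaggle, Hugging Face, UCI, and Data.gov, automatically preview their statistics, and generate data cards, so the data a machine learning project needs is ready quickly. OpenClawCLI must be installed first.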

1 · 1.6k · 7 current · 7 total
by @anisafifi (Anis Afifi)·Creative
License
Creative
Last updated
2026/3/1
Security scan
VirusTotal
Benign
OpenClaw
Suspicious
medium confidence
The skill mostly does what it says (search/download datasets), but there are mismatches in the metadata/instructions (an unexplained requirement for OpenClawCLI and some incomplete/placeholder behaviors) that warrant caution before installing or running it.
Assessment recommendations
What to check before installing/using this skill:
- Source & provenance: the package's 'Homepage' and 'Source' are empty and SKILL.md claims a dependency on 'OpenClawCLI (clawhub.ai)' that the code does not use. Ask the publisher why OpenClawCLI is required and verify the project's origin before running.
- Review the code yourself: the included scripts/dataset.py performs network requests, scraping, and writes files to datasets/. Inspect it if you have sensitive local data or policies about do...
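As a starting point for the code review suggested above, the following is a minimal sketch that lists which network-related modules a Python script imports. It is not part of the skill: the function names and the `NETWORK_MODULES` list are illustrative assumptions, and a static import scan is only a first pass, not a substitute for reading the script.

```python
import ast

# Illustrative, non-exhaustive list of modules associated with network access (an assumption).
NETWORK_MODULES = {"requests", "urllib", "http", "socket", "httpx", "aiohttp", "ftplib"}


def imported_modules(source: str) -> set:
    """Return the top-level names of all absolutely imported modules in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) point inside the package, so they are skipped.
            found.add(node.module.split(".")[0])
    return found


def network_imports(source: str) -> set:
    """Return the imported modules that appear in NETWORK_MODULES."""
    return imported_modules(source) & NETWORK_MODULES


if __name__ == "__main__":
    # Hypothetical sample source; for a real review, read the script's text from disk instead.
    sample = "import requests\nfrom bs4 import BeautifulSoup\nimport os.path\n"
    print(sorted(network_imports(sample)))
```

The scan only parses the source and never executes it, which makes it safe to run on code that has not been reviewed yet.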
Detailed analysis
Purpose and capabilities
The skill's code and SKILL.md implement dataset search/download/preview across Kaggle, Hugging Face, UCI and saving results locally — which matches the stated purpose. However the SKILL.md and README repeatedly say 'Requires OpenClawCLI installation from clawhub.ai' even though neither the code nor install instructions use or call an OpenClawCLI binary or service. That claimed dependency is unexplained and inconsistent with the included Python script and requirements.
Instruction scope
Runtime instructions are limited to running the included Python script, installing listed Python packages, and placing Kaggle/Hugging Face credentials where their respective CLIs/APIs expect them. The script downloads files from public repositories, scrapes UCI (placeholder implementation), and writes datasets under local 'datasets/' directories. There are no instructions to read arbitrary unrelated system files or to exfiltrate data to unknown endpoints.
Install mechanism
There is no automated install spec (instruction-only for pip install), and the included requirements.txt references standard public Python packages (kaggle, datasets, pandas, requests, beautifulsoup4, etc.). No downloads from obscure URLs or archive extraction steps are present. This is a low-risk install mechanism but requires installing networked packages from PyPI as usual.
Credential requirements
The registry metadata lists no required environment variables, which aligns with the code. The SKILL.md correctly documents Kaggle credentials (kaggle.json in ~/.kaggle) and optional HF_TOKEN for Hugging Face. That is proportionate to dataset downloads. The anomalous claim that OpenClawCLI is required (and a reference to clawhub.ai) is not justified by the code and should be clarified before use.
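The credential locations described above can be checked before running anything. The sketch below assumes only what the analysis states (a `kaggle.json` file in `~/.kaggle` and an optional `HF_TOKEN` environment variable); the function name `credential_status` and its parameters are hypothetical and do not come from the skill.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credential_status(home: Optional[Path] = None,
                      env: Optional[Mapping[str, str]] = None) -> dict:
    """Report whether the Kaggle credential file and the HF_TOKEN variable are present."""
    home = Path.home() if home is None else home
    env = os.environ if env is None else env
    return {
        # Kaggle credentials: kaggle.json inside the .kaggle directory of the home folder.
        "kaggle_json": (home / ".kaggle" / "kaggle.json").is_file(),
        # Hugging Face token: optional, counted as present only when non-empty.
        "hf_token": bool(env.get("HF_TOKEN")),
    }


if __name__ == "__main__":
    print(credential_status())
```

The check only reports presence and never reads or prints the credential values.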
Persistence and privileges
The skill does not request persistent/global privileges, does not set always:true, and contains no installation steps that would modify other skills or system-wide agent configuration. It writes downloaded datasets into local directories (datasets/...), which is expected behavior for its purpose.
Security is layered; review the code before running it.

License

Creative

See the license terms for details.

Runtime dependencies

No special dependencies

Versions

latest · v0.1.0 · 2026/2/8

Initial public release of Dataset Finder.
- Search, download, and explore datasets from Kaggle, Hugging Face, UCI ML Repository, and Data.gov.
- Preview datasets (stats, columns, types, missing values, sample rows).
- Generate data cards with schema, usage, license, and citation details.
- Manage and list local datasets.
- Requires OpenClawCLI for core functionality.
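To illustrate what a preview of columns, types, missing values, and sample rows might look like, here is a minimal sketch using only the Python standard library. It is not the skill's implementation (which may well rely on pandas, listed in its requirements); the function name `preview_csv` and the two-way "numeric"/"text" type guess are simplifying assumptions.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def preview_csv(text: str, sample_size: int = 3) -> dict:
    """Summarize CSV text: row count, columns, guessed types, missing counts, sample rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    missing, types = {}, {}
    for col in columns:
        # Treat absent or whitespace-only cells as missing.
        values = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in values if v]
        missing[col] = len(values) - len(present)
        # Crude type guess: numeric only if every non-missing value parses as a float.
        types[col] = "numeric" if present and all(_is_number(v) for v in present) else "text"
    return {
        "rows": len(rows),
        "columns": columns,
        "types": types,
        "missing": missing,
        "sample": rows[:sample_size],
    }


if __name__ == "__main__":
    # Hypothetical inline data; a real preview would read a downloaded file instead.
    print(preview_csv("a,b\n1,x\n,y\n3,\n"))
```

Working on text rather than a file path keeps the sketch easy to try on a few pasted lines before pointing it at a full dataset.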

Benign

Install command

Official: npx clawhub@latest install dataset-finder
Mirror (accelerated): npx clawhub@latest install dataset-finder --registry https://cn.longxiaskill.com
Data source: ClawHub · Chinese localization: 龙虾技能库