Bohrium Dataset Management
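v1. Manage Bohrium datasets via the bohr CLI or the open.bohrium.com API. Use when: the user asks about creating, listing, or deleting Bohrium datasets, uploading data, or managing dataset versions. Not for: file management, job submission, or node management.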
Skill: Bohrium Dataset Management
Overview
Manage datasets on the Bohrium platform. Prefer the bohr CLI; fall back to the API for version management, quota checks, and other operations the CLI does not cover.
Compared with uploading through the web UI, bohr dataset create has no size limit and supports resumable upload.
Datasets solve common pain points:
- Repeated file upload on every job submission -> mount datasets to avoid re-uploading
- Large input files with slow upload -> datasets support resumable upload
- Need to share data with collaborators -> datasets support project-level sharing
Authentication
    "bohrium-dataset": {
      "enabled": true,
      "apiKey": "YOUR_ACCESS_KEY",
      "env": { "ACCESS_KEY": "YOUR_ACCESS_KEY" }
    }
Prerequisites: install the bohr CLI
    # macOS
    /bin/bash -c "$(curl -fsSL https://dp-public.oss-cn-beijing.aliyuncs.com/bohrctl/1.0.0/install_bohr_mac_curl.sh)"
    # Linux
    /bin/bash -c "$(curl -fsSL https://dp-public.oss-cn-beijing.aliyuncs.com/bohrctl/1.0.0/install_bohr_linux_curl.sh)"
    source ~/.bashrc && export PATH="$HOME/.bohrium:$PATH"
    export OPENAPI_HOST=https://open.bohrium.com
List Datasets
    bohr dataset list                  # Default: 50 most recent
    bohr dataset list -n 10 --json     # JSON output, top 10
    bohr dataset list -p 154           # Filter by project ID
    bohr dataset list -t "my-dataset"  # Search by title
JSON fields: id, title, path (mount path such as /bohr/my-dataset/v1), projectName, creatorName, updateTime, desc
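As a rough illustration of consuming the --json output, the sketch below shells out to bohr dataset list and keeps only a few of the fields listed above. It assumes the command prints a single JSON array of objects to stdout, which this document does not state; the helper names parse_datasets and list_datasets are hypothetical.

```python
import json
import subprocess


def parse_datasets(raw: str) -> list[dict]:
    """Reduce raw JSON text to id/title/path per dataset.

    Assumption: the text is a JSON array of objects carrying the
    documented fields (id, title, path, ...).
    """
    records = json.loads(raw)
    return [
        {"id": r.get("id"), "title": r.get("title"), "path": r.get("path")}
        for r in records
    ]


def list_datasets(limit: int = 10) -> list[dict]:
    """Run `bohr dataset list -n <limit> --json` and parse its stdout."""
    proc = subprocess.run(
        ["bohr", "dataset", "list", "-n", str(limit), "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_datasets(proc.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to check the field handling against a saved sample without the CLI installed.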
Create Dataset (Upload Data)
    bohr dataset create \
      -n "my-dataset" \
      -p "my-dataset" \
      -i 154 \
      -l "/path/to/local/data"
Parameter  Short  Required  Description
--name     -n     Yes       Dataset name
--path     -p     Yes       Dataset path identifier (alphanumeric)
--pid      -i     Yes       Project ID
--lp       -l     Yes       Local data directory path
--comment  -m     No        Description
Resumable upload: if the upload is interrupted (network issues, etc.), re-run the same command and enter y to resume from the breakpoint.
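When the upload is scripted, the command can be assembled from the parameters in the table above. The following is a minimal sketch with hypothetical helper names (build_create_command, create_dataset); it only uses the short flags documented here and runs the CLI in the foreground so that the resume prompt, if one appears, can still be answered interactively.

```python
import subprocess


def build_create_command(name, path_id, project_id, local_dir, comment=None):
    """Assemble the argv list for `bohr dataset create` using the short flags."""
    cmd = [
        "bohr", "dataset", "create",
        "-n", name,             # dataset name
        "-p", path_id,          # dataset path identifier
        "-i", str(project_id),  # project ID
        "-l", local_dir,        # local data directory
    ]
    if comment:
        cmd += ["-m", comment]  # optional description
    return cmd


def create_dataset(*args, **kwargs) -> int:
    """Run the upload with inherited stdin/stdout and return the exit code."""
    return subprocess.run(build_create_command(*args, **kwargs)).returncode
```

Passing the arguments as a list avoids shell quoting problems when the local path contains spaces.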
Using Datasets
Mount in Compute Jobs
Add dataset_path to job.json:
    {
      "job_name": "DeePMD-kit test",
      "command": "cd se_e2_a && dp train input.json",
      "project_id": 154,
      "machine_type": "c4_m15_1 NVIDIA T4",
      "job_type": "container",
      "image_address": "registry.dp.tech/dptech/deepmd-kit:2.1.5-cuda11.6",
      "dataset_path": ["/bohr/my-dataset/v1", "/bohr/another-dataset/v2"]
    }
dataset_path and -p (input directory) can be used together in the same job.
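If job configurations are generated programmatically, the dataset mounts can be added in one place. This is a sketch only: with_datasets is a hypothetical helper, and the /bohr/ prefix check assumes every mount path follows the /bohr/<identifier>/<version> form used in the examples here.

```python
import json


def with_datasets(job: dict, dataset_paths: list[str]) -> dict:
    """Return a copy of a job config with dataset_path set.

    Assumption: each entry is a mount path that starts with /bohr/.
    """
    for p in dataset_paths:
        if not p.startswith("/bohr/"):
            raise ValueError(f"not a dataset mount path: {p}")
    updated = dict(job)  # shallow copy so the caller's dict is untouched
    updated["dataset_path"] = list(dataset_paths)
    return updated


# Other job fields (machine_type, image_address, ...) are omitted for brevity.
job = {
    "job_name": "DeePMD-kit test",
    "command": "cd se_e2_a && dp train input.json",
    "project_id": 154,
}
job = with_datasets(job, ["/bohr/my-dataset/v1", "/bohr/another-dataset/v2"])
print(json.dumps(job, indent=2))
```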
Mount on Dev Nodes
Select datasets when creating a container node; access them via their mount path (e.g. /bohr/my-dataset/v1).
- Adds a 2-4s boot delay (regardless of how many datasets are mounted)
- Use df -a | grep bohr to view mount points
Use in Notebooks
- Expand the side panel in the Notebook editor -> select existing datasets
- Hover over a dataset name -> click copy to get its path
- Use in code: cd /bohr/testdataset-6xwt/v1/
Datasets must be added before connecting to the node. Adding one afterward requires a node restart.
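To confirm from inside a node that the expected datasets are mounted, the df -a check can be wrapped in a few lines of Python. The sketch below assumes the mount point is the last whitespace-separated column of each df row and that paths contain no spaces; bohr_mounts and current_bohr_mounts are hypothetical names.

```python
import subprocess


def bohr_mounts(df_output: str) -> list[str]:
    """Return the mount points under /bohr/ found in `df -a` text."""
    mounts = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if cols and cols[-1].startswith("/bohr/"):
            mounts.append(cols[-1])
    return mounts


def current_bohr_mounts() -> list[str]:
    """Run `df -a` on this machine and filter its output."""
    out = subprocess.run(["df", "-a"], capture_output=True, text=True).stdout
    return bohr_mounts(out)
```

An empty result after connecting suggests the dataset was not selected when the node was created.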
Version Management
Datasets support multiple versions. Files within a version are immutable once the version is created.
Create New Version
Via web UI: Dataset details -> "New Version" -> the system imports the files of the latest version -> add/remove files -> create.
Via API:
    requests.post(f"{BASE}/{dataset_id}/version", headers=HEADERS_JSON, json={"versionDesc": "v2 update"})
Preparation time depends on file size and count.
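The API call for a new version can also be written without third-party packages. The sketch below swaps requests for the standard library's urllib.request and separates building the request from sending it. The base URL is passed in rather than hard-coded, the function names are hypothetical, and it assumes the endpoint replies with a JSON body, which this document does not confirm.

```python
import json
import urllib.request


def new_version_request(base, dataset_id, access_key, desc):
    """Build (without sending) the POST that creates a new dataset version."""
    body = json.dumps({"versionDesc": desc}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{dataset_id}/version",
        data=body,
        headers={"accessKey": access_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_version(base, dataset_id, access_key, desc, timeout=30):
    """Send the request; urlopen raises HTTPError on 4xx/5xx responses."""
    req = new_version_request(base, dataset_id, access_key, desc)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumption: the response body is JSON.
        return json.loads(resp.read().decode("utf-8"))
```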
Delete Datasets
    bohr dataset delete 138201          # Single
    bohr dataset delete 138201 108601   # Batch
Deleted versions cannot be recovered.
Permission Model
Permission  Description                    Default holders
Manageable  Edit, delete, create versions  Dataset creator, project creator/admin
Usable      View and use                   All project members
"Usable" permission can be granted to other projects or users by editing the dataset.
API Supplement (Not Supported by the CLI)
    import os
    import requests
    AK = os.environ.get("ACCESS_KEY", "")
    BASE = "https://open.bohrium.com/openapi/v1/ds"
    HEADERS = {"accessKey": AK}
    HEADERS_JSON = {**HEADERS, "Content-Type": "application/json"}
    # dataset_id and version_id below are placeholders for real IDs
    # Dataset details
    r = requests.get(f"{BASE}/{dataset_id}", headers=HEADERS)
    # Version list
    r = requests.get(f"{BASE}/{dataset_id}/version", headers=HEADERS)
    # Returns: [{version, totalCount, totalSize, downloadUri, datasetPath, ...}]
    # Specific version
    r = requests.get(f"{BASE}/{dataset_id}/version/{version_id}", headers=HEADERS)
    # Create via API
    r = requests.post(f"{BASE}/", headers=HEADERS_JSON, json={
        "title": "my-dataset",
        "projectId": 154,
        "identifier": "my-dataset",  # Required, unique ID
    })
    # Returns: {datasetId, tiefbluePath, requestId}
    # Then upload files via tiefblue, then call commit
    # Commit
    requests.put(f"{BASE}/commit", headers=HEADERS_JSON, json={"datasetId": dataset_id})
    # New version
    requests.post(f"{BASE}/{dataset_id}/version", headers=HEADERS_JSON, json={"versionDesc": "v2 update"})
    # Update info
    requests.put(f"{BASE}/{dataset_id}", headers=HEADERS_JSON, json={"title": "new-title"})
    # Delete version
    requests.delete(f"{BASE}/{dataset_id}/version/{version_id}", headers=HEADERS)
    # Check quota
    r = requests.get(f"{BASE}/quota/check", headers=HEADERS, params={"projectId": 154})
    # Returns: {result