Local Storage 本地存储

基本介绍

local storage 在 pod 所在的物理机节点分配一块 ephemeral 存储空间。该空间为临时性存储，Pod 重建、删除或调度时数据会丢失。适用场景为业务需要大容量本地存储，且无需持久化保存数据。

注意：
专属节点池、预约节点池中的工作负载，开启本地存储不会再额外收取费用。

不同规格支持的 local storage 容量上限

目前仅A800，H800，H100支持 local storage。

H800支持的 local storage 上限

规格名称	容量上限
bob-eci.h800.5xlarge	768 GB
bob-eci.h800.10xlarge	1536 GB
bob-eci.h800.20xlarge	3072 GB
bob-eci.h800.40xlarge	6144 GB

A800支持的 local storage 上限

规格名称	容量上限
bob-eci.a800.5large	768 GB
bob-eci.a800.10large	1536 GB
bob-eci.a800.20large	3072 GB
bob-eci.a800.40large	6144 GB
bob-eci.a800.5xlarge	768 GB
bob-eci.a800.10xlarge	1536 GB
bob-eci.a800.20xlarge	3072 GB
bob-eci.a800.40xlarge	6144 GB

H100支持的 local storage 上限

规格名称	容量上限
bob-eci.h100.5xlarge	768 GB
bob-eci.h100.10xlarge	1536 GB
bob-eci.h100.20xlarge	3072 GB
bob-eci.h100.40xlarge	6144 GB

基于开发机使用本地存储

具体示例如下：

基于 k8s 工作负责使用本地存储

Storage Class

ephemeral-els-premium

注意：
不可以直接用这个sc创建pvc，需要与pod一同使用。
参考下面的示例。

Deployment 使用本地存储 yaml 示例

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test1
  namespace: default
  labels:
    app: test1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test1
  template:
    metadata:
      labels:
        app: test1
    spec:
      schedulerName: volcano
      terminationGracePeriodSeconds: 1
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cloud.ebtech.com/gpu
                    operator: In
                    values: ["A800_NVLINK_80GB"]
      containers:
        - name: server
          image: registry-cn-huabei1-internal.ebcloud.com/ebsys/pytorch:2.7.0-cuda12.9-python3.12-ubuntu24.04-v01
          imagePullPolicy: IfNotPresent
          command:
            - bash
            - -c
            - |
              # 捕获 SIGTERM 信号并立即退出
              trap 'echo "Received SIGTERM, exiting..."; exit 0' SIGTERM

              # 使用一个简单的循环，而不是 sleep infinity
              # 这样可以更快地响应信号
              while true; do
                sleep 1
              done
          resources:
            limits:
              cpu: "10"
              memory: 100Gi
              nvidia.com/gpu: "1"
          securityContext:
            capabilities:
              add: ["IPC_LOCK"]
          volumeMounts:
            - mountPath: /mnt/tmp-test1
              name: tmp-test1
      volumes:
        - name: tmp-test1
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                resources:
                  requests:
                    storage: 500Gi
                storageClassName: ephemeral-els-premium
                volumeMode: Filesystem

StatefulSet 使用本地存储 yaml 示例

yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test2
  namespace: default
  labels:
    app: test2
spec:
  replicas: 1
  serviceName: test2
  selector:
    matchLabels:
      app: test2
  template:
    metadata:
      labels:
        app: test2
    spec:
      schedulerName: volcano
      terminationGracePeriodSeconds: 1
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cloud.ebtech.com/gpu
                    operator: In
                    values: ["A800_NVLINK_80GB"]
      containers:
        - name: server
          image: registry-cn-huabei1-internal.ebcloud.com/ebsys/pytorch:2.7.0-cuda12.9-python3.12-ubuntu24.04-v01
          imagePullPolicy: IfNotPresent
          command:
            - bash
            - -c
            - |
              # 捕获 SIGTERM 信号并立即退出
              trap 'echo "Received SIGTERM, exiting..."; exit 0' SIGTERM

              # 使用一个简单的循环，而不是 sleep infinity
              # 这样可以更快地响应信号
              while true; do
                sleep 1
              done
          resources:
            limits:
              cpu: "10"
              memory: 100Gi
              nvidia.com/gpu: "1"
          securityContext:
            capabilities:
              add: ["IPC_LOCK"]
          volumeMounts:
            - mountPath: /mnt/tmp-test1
              name: tmp-test1
      volumes:
        - name: tmp-test1
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: [ "ReadWriteOnce" ]
                resources:
                  requests:
                    storage: 500Gi
                storageClassName: ephemeral-els-premium
                volumeMode: Filesystem

Pod 使用本地存储 yaml 示例

yaml

apiVersion: v1
kind: Pod
metadata:
  name: test3
  namespace: default
  labels:
    app: test3
spec:
  schedulerName: volcano
  terminationGracePeriodSeconds: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud.ebtech.com/gpu
                operator: In
                values: ["A800_NVLINK_80GB"]
  containers:
    - name: server
      image: registry-cn-huabei1-internal.ebcloud.com/ebsys/pytorch:2.7.0-cuda12.9-python3.12-ubuntu24.04-v01
      imagePullPolicy: IfNotPresent
      command:
        - bash
        - -c
        - |
          # 捕获 SIGTERM 信号并立即退出
          trap 'echo "Received SIGTERM, exiting..."; exit 0' SIGTERM

          # 使用一个简单的循环，而不是 sleep infinity
          # 这样可以更快地响应信号
          while true; do
            sleep 1
          done
      resources:
        limits:
          cpu: "10"
          memory: 100Gi
          nvidia.com/gpu: "1"
      securityContext:
        capabilities:
          add: ["IPC_LOCK"]
      volumeMounts:
        - mountPath: /mnt/tmp-test1
          name: tmp-test1
  volumes:
    - name: tmp-test1
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: [ "ReadWriteOnce" ]
            resources:
              requests:
                storage: 500Gi
            storageClassName: ephemeral-els-premium
            volumeMode: Filesystem

Local Storage 本地存储 ​

基本介绍 ​

不同规格支持的 local storage 容量上限 ​

H800支持的 local storage 上限 ​

A800支持的 local storage 上限 ​

H100支持的 local storage 上限 ​

基于开发机使用本地存储 ​

基于 k8s 工作负责使用本地存储 ​

Storage Class ​

Deployment 使用本地存储 yaml 示例 ​

StatefulSet 使用本地存储 yaml 示例 ​

Pod 使用本地存储 yaml 示例 ​

Local Storage 本地存储

基本介绍

不同规格支持的 local storage 容量上限

H800支持的 local storage 上限

A800支持的 local storage 上限

H100支持的 local storage 上限

基于开发机使用本地存储

基于 k8s 工作负责使用本地存储

Storage Class

Deployment 使用本地存储 yaml 示例

StatefulSet 使用本地存储 yaml 示例

Pod 使用本地存储 yaml 示例