Submitting Workloads

Kubernetes Workload Types

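Native Kubernetes workloads, including Deployment, StatefulSet, and Pod, can all be submitted as YAML manifests.

英博云 supports the native Kubernetes YAML syntax for specifying the node type and the instance specification that a workload runs on.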

A Deployment Example

The following example deploys nginx:

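Note:
The nodeAffinity field pins the node type to CPU through the label cloud.ebtech.com/cpu=amd-epyc-milan. You can also omit the affinity section entirely; 英博云 schedules to CPU nodes by default.
The instance specification is set to 125 millicores of CPU and 256 MiB of memory, which is the smallest CPU specification on 英博云.
For more about node types and specifications, see: Node Types and Specifications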

# nginx.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:  # Node type is specified here; CPU in this example
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.ebtech.com/cpu
                operator: In
                values:
                - amd-epyc-milan
      containers:
      - name: nginx
        image: nginx:1.21.6
        ports:
        - containerPort: 80
        resources:
          limits:  # Instance specification; CPU nodes need both memory and CPU
            memory: "256Mi" # memory requirement
            cpu: "125m"     # CPU requirement
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
      restartPolicy: Always

Save the file above as nginx.yaml, then run the following command to submit the workload:

kubectl apply -f nginx.yaml
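
The note above says the affinity section can be omitted, in which case the workload is scheduled to a CPU node by default. As a minimal sketch of that case, the manifest below submits a bare Pod with no affinity at all. The file name and the Pod name are hypothetical; the image and the resource values are copied from the Deployment example.

```yaml
# pod-default-cpu.yaml (hypothetical file name)
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-default-cpu   # hypothetical name
  labels:
    app: nginx
spec:
  # No affinity section: per the note above, the Pod should land on a CPU node by default.
  containers:
  - name: nginx
    image: nginx:1.21.6
    ports:
    - containerPort: 80
    resources:
      limits:
        memory: "256Mi"   # smallest CPU specification named in this document
        cpu: "125m"
```

It is submitted the same way, with kubectl apply -f followed by the file name.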

Submitting GPU Workloads

To submit a GPU workload, you must specify the GPU node type through nodeAffinity and also specify the number of GPUs in the resources field. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-chat
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: deepseek
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: deepseek
    spec:
      affinity:  # Node type is specified here; RTX 4090D GPU nodes in this example
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.ebtech.com/gpu
                operator: In
                values:
                - RTX_4090D
      containers:
      - image: registry-cn-huabei1-internal.ebcloud.com/tenant-61616664/chat-inference:latest
        imagePullPolicy: Always
        name: deepseek
        ports:
        - containerPort: 8000
          protocol: TCP
        resources:
          limits:      # Instance specification; GPU nodes need GPU count, memory, and CPU
            nvidia.com/gpu: 1 # number of GPUs
            memory: "100Gi"   # memory
            cpu: "10"         # CPU
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

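Specifying the GPU Driver Version

When you select a GPU node type, you can additionally pin the GPU driver version by adding a driver expression to the nodeAffinity field. For example: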

    ...
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.ebtech.com/gpu
                operator: In
                values:
                - H800_NVLINK_80GB     # node type: H800 nodes
              - key: cloud.ebtech.com/gpu-driver
                operator: In
                values:
                - 580.65.06            # driver version: 580.65.06
    ...
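
To show where the fragment above sits in a complete manifest, here is a minimal sketch of a Pod that requests one GPU on an H800 node with the pinned driver. The Pod name and the image are hypothetical, and the memory and CPU values are placeholders rather than a verified H800 specification; check Node Types and Specifications for valid values. Both expressions sit in the same matchExpressions list, which Kubernetes evaluates as a logical AND, so a node must carry both labels.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-driver-demo            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:        # expressions in one term are ANDed
          - key: cloud.ebtech.com/gpu
            operator: In
            values:
            - H800_NVLINK_80GB
          - key: cloud.ebtech.com/gpu-driver
            operator: In
            values:
            - "580.65.06"          # quoted so it is unambiguously a string
  containers:
  - name: app
    image: example.com/your-team/cuda-app:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
        memory: "100Gi"            # placeholder value
        cpu: "10"                  # placeholder value
  restartPolicy: Never
```

Listing several driver versions under values would accept any one of them, since the In operator matches when the node label equals any listed value.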

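Enabling Gang Scheduling for a Workload

Gang scheduling is an all-or-nothing scheduling mechanism for workloads that need multiple Pods working together.

With gang scheduling enabled, 英博云 schedules the Pods only once resources for all of them are available. This prevents a partial set of Pods from starting, occupying resources, and incurring cost when there are too few of them to actually complete the task.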

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    eks.ebcloud.com/gang-min-member: "2"  # Enables gang scheduling: at least 2 Pods are scheduled as a group and are placed or rejected together (annotation values must be strings)
    ...
  name: pod-xxx
  namespace: pod-xxx
spec:
    ...
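
The fragment above elides most of the manifest. As a rough sketch of a complete one, the Deployment below places the annotation on the workload's metadata, as the fragment does, and sets replicas to match the gang size. The name gang-demo is hypothetical, and the nginx container is only a stand-in for a real multi-Pod job; the resource values reuse the smallest CPU specification from earlier in this document.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gang-demo                            # hypothetical name
  annotations:
    eks.ebcloud.com/gang-min-member: "2"     # gang size, quoted because annotations are strings
spec:
  replicas: 2                                # assumed to be at least the gang size
  selector:
    matchLabels:
      app: gang-demo
  template:
    metadata:
      labels:
        app: gang-demo
    spec:
      containers:
      - name: worker
        image: nginx:1.21.6                  # stand-in image
        resources:
          limits:
            memory: "256Mi"
            cpu: "125m"
```

Keeping replicas at or above the gang-min-member value is an assumption about how the group is formed; with fewer replicas than the minimum, the group would presumably never be complete.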

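Submitting Spot Instance Workloads

英博云 offers idle capacity as Spot instances. Spot instances are priced at a discount, but they may be stopped at any time to release resources.

Native Kubernetes resources, including Deployment, Pod, and StatefulSet, as well as workloads of user-defined resource types, can all enable Spot. To enable it, set the annotation eks.ebcloud.com/enable-spot: "true". For example: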

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    eks.ebcloud.com/enable-spot: "true"  # Spot annotation
    ...
  name: pod-xxx
  namespace: pod-xxx
spec:
    ...
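
For reference, a complete manifest with the Spot annotation might look like the sketch below. The name spot-demo is hypothetical, the annotation is placed on the workload's metadata as in the fragment above, and the container and resource values are reused from the nginx example. Because a Spot instance may be stopped at any time, this kind of setup suits workloads that tolerate interruption.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-demo                            # hypothetical name
  annotations:
    eks.ebcloud.com/enable-spot: "true"      # must be the string "true", not a YAML boolean
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spot-demo
  template:
    metadata:
      labels:
        app: spot-demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.6
        resources:
          limits:
            memory: "256Mi"
            cpu: "125m"
      restartPolicy: Always
```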