GPU on FPT Cloud K8s
FPT Cloud provides Kubernetes with NVIDIA GPU support, featuring the following key functionalities:
- Flexible GPU configuration with the option to choose GPU types and GPU memory for each Worker Group.
- Automatic GPU resource management and allocation in Kubernetes with the NVIDIA GPU Operator.
- Visualization and monitoring of GPUs through NVIDIA DCGM (Data Center GPU Manager).
- Automatic scaling of containers and nodes with the Autoscaler as the GPU resource demand of applications rises or falls.
- Support for GPU sharing through the Multi-Instance GPU (MIG) mechanism, which improves resource utilization and lowers GPU usage cost.
FPT Cloud uses the NVIDIA GPU Operator to automate the management of all software components needed to use GPUs on Kubernetes (K8s). With the GPU Operator, users can consume GPU resources in a K8s cluster in much the same way as they consume CPUs. The software components include:
- NVIDIA Drivers (CUDA, MIG, ...)
- NVIDIA Device Plugin
- NVIDIA Container Toolkit
- NVIDIA GPU Feature Discovery
- NVIDIA Data Center GPU Manager (Monitoring)
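To illustrate what using GPUs "similar to CPUs" can look like in practice, the following is a minimal sketch that builds a Pod manifest requesting one GPU through the `nvidia.com/gpu` resource advertised by the NVIDIA Device Plugin. The function name, Pod name, and container image are illustrative assumptions, not values taken from FPT Cloud; the image in particular is only a placeholder and should be replaced with one available to your cluster.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests whole GPUs via the device plugin.

    `name` and `image` are caller-supplied placeholders (assumptions).
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # nvidia-smi is normally made available inside the
                    # container by the NVIDIA Container Toolkit.
                    "command": ["nvidia-smi"],
                    # GPUs are requested under limits, like CPU or memory.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image tag (assumption); substitute your own.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f <file>`.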
The GPU Operator automatically configures Multi-Instance GPU (MIG) on workers. To apply a MIG configuration, a worker must be labeled with one of the profiles supported by NVIDIA. The available MIG configurations are listed in the ConfigMap 'default-mig-parted-config' in the 'gpu-operator' namespace.
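As a small sketch of how the available MIG configurations might be inspected, the helper below assembles the `kubectl` command that prints the ConfigMap named above. The helper name is hypothetical, and the sketch assumes `kubectl` is installed and pointed at the target cluster; it only prints the command and does not run it.

```python
import shlex
from typing import List

# Names taken from the text above.
NAMESPACE = "gpu-operator"
CONFIGMAP = "default-mig-parted-config"


def mig_configmap_cmd(namespace: str = NAMESPACE, name: str = CONFIGMAP) -> List[str]:
    """Return the kubectl command that prints the MIG ConfigMap as YAML."""
    return ["kubectl", "get", "configmap", name, "-n", namespace, "-o", "yaml"]


if __name__ == "__main__":
    # Print the command so it can be reviewed or pasted into a shell.
    print(shlex.join(mig_configmap_cmd()))
```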
The MIG Manager on Kubernetes is designed as a controller. It watches the label 'nvidia.com/mig.config' on each worker and applies the MIG configuration requested by the user. When the label changes, the MIG Manager first stops all GPU pods, including the device plugin, GFD (GPU Feature Discovery), and the DCGM exporter. If the driver is pre-installed on the worker, it then stops all GPU-related systemd services on that worker; these services are listed in the ConfigMap named 'default-gpu-clients'. Finally, it applies the new MIG configuration and restarts the GPU pods (and, where applicable, the GPU systemd services on the worker). Enabling MIG mode requires restarting the worker.
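The label change that triggers this sequence can be sketched as follows. The helpers below build the `kubectl` commands for setting 'nvidia.com/mig.config' on a worker and for reading back the worker's labels. The function names and the node name `gpu-worker-1` are hypothetical, and `all-disabled` is used only as an example profile from NVIDIA's upstream default configuration; it is an assumption that it matches an entry in your cluster's ConfigMap, so substitute a profile that actually appears there.

```python
import shlex
from typing import List

MIG_LABEL = "nvidia.com/mig.config"  # label key named in the text above


def label_node_cmd(node: str, profile: str) -> List[str]:
    """Return the kubectl command that sets the MIG profile label on a node."""
    if not node or not profile:
        raise ValueError("node and profile must be non-empty")
    # --overwrite lets the command replace a previously set profile.
    return ["kubectl", "label", "node", node, f"{MIG_LABEL}={profile}", "--overwrite"]


def node_labels_cmd(node: str) -> List[str]:
    """Return the kubectl command that prints all labels of a node."""
    return ["kubectl", "get", "node", node, "-o", "jsonpath={.metadata.labels}"]


if __name__ == "__main__":
    # Hypothetical node name and example profile (assumptions).
    print(shlex.join(label_node_cmd("gpu-worker-1", "all-disabled")))
    print(shlex.join(node_labels_cmd("gpu-worker-1")))
```

Building the commands as argument lists rather than strings keeps them safe to pass to `subprocess.run` later without shell quoting issues.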
FPT Cloud currently supports the NVIDIA A30 GPU and the following MIG profile labels:

