GCP - Kubernetes
This guide shows you how to set up Lumeo Gateways to run in a GCP Kubernetes cluster.
Overview
This guide contains instructions to run Lumeo Gateway containers in a Kubernetes cluster using Google Kubernetes Engine (GKE) on GCP.
Kubernetes Cluster Setup
Set default project
This guide assumes you are using a GCP project named lumeo-kubernetes.
gcloud config set project lumeo-kubernetes
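To double-check which project is active before creating resources, you can run:
gcloud config get-value project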
Set up the Kubernetes cluster
Helpful link: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus
Private vs Public GKE Cluster
If you plan to process RTSP streams with the Lumeo Gateways in your cluster, you will need to set up your GKE cluster as a private cluster (i.e., nodes are not auto-assigned public IPs). Doing so lets you use it with Google Cloud NAT, which is required for RTSP streaming to work (Cloud NAT does not work with nodes that have public IPs).
Note that a GKE cluster cannot be switched between private and public after it has been created.
Private cluster (use this variant if you will process RTSP streams; see the note above):
gcloud beta container clusters create "lumeo-gateways" \
--zone "us-central1-a" \
--machine-type "e2-medium" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-standard" --disk-size "75" \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--max-pods-per-node "48" \
--num-nodes "1" \
--enable-private-nodes --master-ipv4-cidr "172.17.0.0/28" --enable-master-global-access \
--enable-ip-alias \
--network "projects/lumeo-kubernetes/global/networks/default" \
--subnetwork "projects/lumeo-kubernetes/regions/us-central1/subnetworks/default" \
--no-enable-intra-node-visibility \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--enable-autoupgrade --enable-autorepair \
--enable-shielded-nodes \
--node-locations "us-central1-a"
Public cluster (nodes receive public IPs; not compatible with Cloud NAT):
gcloud beta container clusters create "lumeo-gateways" \
--zone "us-central1-a" \
--machine-type "e2-medium" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-standard" --disk-size "75" \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--max-pods-per-node "48" \
--num-nodes "1" \
--enable-ip-alias \
--network "projects/lumeo-kubernetes/global/networks/default" \
--subnetwork "projects/lumeo-kubernetes/regions/us-central1/subnetworks/default" \
--no-enable-intra-node-visibility \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--enable-autoupgrade --enable-autorepair \
--enable-shielded-nodes \
--node-locations "us-central1-a"
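If you want to confirm which mode the cluster ended up in, describing it should show the private-cluster setting (this field is empty for a public cluster):
gcloud container clusters describe lumeo-gateways \
    --zone us-central1-a \
    --format="value(privateClusterConfig.enablePrivateNodes)"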
Set up Google Cloud NAT
Required only if you created a private GKE cluster in the previous step.
Reference: https://cloud.google.com/nat/docs/gke-example (skip forward to Step 6)
Create a Cloud Router
gcloud compute routers create lumeo-gateways-nat-router --network default --region us-central1
Configure the Router
gcloud compute routers nats create lumeo-gateways-nat-config \
--router-region us-central1 \
--router lumeo-gateways-nat-router \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips \
--enable-dynamic-port-allocation
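As a quick sanity check, you can describe the NAT configuration you just created:
gcloud compute routers nats describe lumeo-gateways-nat-config \
    --router lumeo-gateways-nat-router \
    --region us-central1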
Create a node pool with GPUs
Make sure you have sufficient GPU quota approved by GCP.
gcloud container node-pools create "gpu-pool" \
--zone us-central1-a --cluster lumeo-gateways \
--machine-type "n1-standard-8" \
--accelerator "type=nvidia-tesla-t4,count=1" \
--disk-type "pd-standard" --disk-size "75" \
--enable-autoupgrade --enable-autorepair \
--enable-autoscaling --num-nodes 1 --min-nodes 1 --max-nodes 5
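To verify that the pool was created and its autoscaling limits registered, you can list the cluster's node pools:
gcloud container node-pools list --cluster lumeo-gateways --zone us-central1-a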
Set up kubectl
gcloud container clusters get-credentials lumeo-gateways --zone us-central1-a
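A quick way to confirm kubectl is pointed at the new cluster:
kubectl config current-context
kubectl get nodes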
Set up the cluster to install GPU drivers
Install GCP's Default Drivers (v535)
This is Google's recommended approach.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
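Driver installation can take a few minutes per GPU node. You can watch the installer pods and then check that the nodes advertise GPUs (the label below assumes the stock daemonset from the URL above):
kubectl get pods -n kube-system -l k8s-app=nvidia-driver-installer
kubectl describe nodes | grep "nvidia.com/gpu"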
Install v525 Nvidia Drivers
Use this approach only if GCP's default drivers do not work.
Download the following file and run:
kubectl apply -f daemonset-nvidia-driver-installer.yaml
# Copyright 2022 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The Dockerfile and other source for this daemonset are in
# https://cos.googlesource.com/cos/tools/+/refs/heads/master/src/cmd/cos_gpu_installer/
#
# This is the same as ../../daemonset.yaml except that it assumes that the
# docker image is present on the node instead of downloading from GCR. This
# allows easier upgrades because GKE can preload the correct image on the
# node and the daemonset can just use that image.
# Lumeo Updates: Update the Nvidia Driver version to 515.86.01 or 525.60.13
# Original file from : https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
# https://storage.googleapis.com/nvidia-drivers-us-public/tesla/525.60.13/NVIDIA-Linux-x86_64-525.60.13.run
# https://storage.googleapis.com/nvidia-drivers-us-public/tesla/515.86.01/NVIDIA-Linux-x86_64-515.86.01.run
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-driver-installer
  namespace: kube-system
  labels:
    k8s-app: nvidia-driver-installer
spec:
  selector:
    matchLabels:
      k8s-app: nvidia-driver-installer
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-driver-installer
        k8s-app: nvidia-driver-installer
    spec:
      priorityClassName: system-node-critical
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-accelerator
                operator: Exists
      tolerations:
      - operator: "Exists"
      hostNetwork: true
      hostPID: true
      volumes:
      - name: dev
        hostPath:
          path: /dev
      - name: vulkan-icd-mount
        hostPath:
          path: /home/kubernetes/bin/nvidia/vulkan/icd.d
      - name: nvidia-install-dir-host
        hostPath:
          path: /home/kubernetes/bin/nvidia
      - name: root-mount
        hostPath:
          path: /
      - name: cos-tools
        hostPath:
          path: /var/lib/cos-tools
      - name: nvidia-config
        hostPath:
          path: /etc/nvidia
      initContainers:
      - image: "gcr.io/cos-cloud/cos-gpu-installer:latest" # "cos-nvidia-installer:fixed"
        imagePullPolicy: IfNotPresent
        name: nvidia-driver-installer
        resources:
          requests:
            cpu: "0.15"
        securityContext:
          privileged: true
        env:
        - name: NVIDIA_DRIVER_VERSION
          value: "525.60.13" # or 515.86.01
        - name: NVIDIA_INSTALL_DIR_HOST
          value: /home/kubernetes/bin/nvidia
        - name: NVIDIA_INSTALL_DIR_CONTAINER
          value: /usr/local/nvidia
        - name: VULKAN_ICD_DIR_HOST
          value: /home/kubernetes/bin/nvidia/vulkan/icd.d
        - name: VULKAN_ICD_DIR_CONTAINER
          value: /etc/vulkan/icd.d
        - name: ROOT_MOUNT_DIR
          value: /root
        - name: COS_TOOLS_DIR_HOST
          value: /var/lib/cos-tools
        - name: COS_TOOLS_DIR_CONTAINER
          value: /build/cos-tools
        volumeMounts:
        - name: nvidia-install-dir-host
          mountPath: /usr/local/nvidia
        - name: vulkan-icd-mount
          mountPath: /etc/vulkan/icd.d
        - name: dev
          mountPath: /dev
        - name: root-mount
          mountPath: /root
        - name: cos-tools
          mountPath: /build/cos-tools
        # command: ['/cos-gpu-installer', 'install', '--allow-unsigned-driver', '--nvidia-installer-url=https://storage.googleapis.com/nvidia-drivers-us-public/tesla/525.60.13/NVIDIA-Linux-x86_64-525.60.13.run']
      - image: "gcr.io/gke-release/nvidia-partition-gpu@sha256:c54fd003948fac687c2a93a55ea6e4d47ffbd641278a9191e75e822fe72471c2"
        name: partition-gpus
        env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib64
        resources:
          requests:
            cpu: "0.15"
        securityContext:
          privileged: true
        volumeMounts:
        - name: nvidia-install-dir-host
          mountPath: /usr/local/nvidia
        - name: dev
          mountPath: /dev
        - name: nvidia-config
          mountPath: /etc/nvidia
      containers:
      - image: "gcr.io/google-containers/pause:2.0"
        name: pause
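After applying the manifest, you can follow the installer rollout (the daemonset name matches metadata.name in the manifest above):
kubectl rollout status daemonset/nvidia-driver-installer -n kube-system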
Deploy Lumeo
Create a secret with your Lumeo App ID and Access Token.
The App ID and Access Token can be found in the Workspace settings in the Lumeo Console. See API for details.
Warning: Access Tokens start with a '$', so ensure you use single quotes below to prevent shell substitution.
kubectl create secret generic 'replace-with-lumeo-app-id' --from-literal=LUMEO_APP_ID='replace-with-lumeo-app-id' --from-literal=LUMEO_API_KEY='replace-with-lumeo-access-token'
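To confirm the secret stores what you expect, you can decode one of its keys (replace the secret name with your App ID):
kubectl get secret 'replace-with-lumeo-app-id' -o jsonpath='{.data.LUMEO_APP_ID}' | base64 --decode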
Replace the App ID in lumeo-gateway.yaml
Search for replace-with-lumeo-app-id in the yaml template below.
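If you prefer not to edit by hand, a sed one-liner works once you have saved the template below as lumeo-gateway.yaml (your-actual-app-id is a placeholder for your own App ID):
sed -i 's/replace-with-lumeo-app-id/your-actual-app-id/g' lumeo-gateway.yaml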
# This Service is required just for K8S to run the Stateful set.
# See https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
# Note that Lumeo gateways do not expose any local APIs that can be used directly via a Kubernetes "Service".
apiVersion: v1
kind: Service
metadata:
  name: lumeod
  labels:
    app: lumeod
spec:
  ports:
  clusterIP: None
  selector:
    app: lumeod
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lumeo-gateway
spec:
  selector:
    matchLabels:
      app: lumeod
  serviceName: "lumeod"
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: lumeod
    spec:
      initContainers:
      - name: lumeo-temp-fix-for-gateway-config-paths
        image: busybox
        command:
        - sh
        - -c
        - |
          if [ ! -d "/var/lib/lumeo/models" ]; then
            chmod -R 777 /var/lib/lumeo && mkdir -p /var/lib/lumeo/upload && mkdir -p /var/lib/lumeo/models && mkdir -p /var/lib/lumeo/media && chmod -R 777 /var/lib/lumeo
          fi
          if [ ! -d "/var/lib/lumeo/tracker_configs" ]; then
            mkdir -p /var/lib/lumeo/tracker_configs && chmod -R 777 /var/lib/lumeo/tracker_configs
          fi
        volumeMounts:
        - name: lumeo-gateway-config
          mountPath: /var/lib/lumeo
      containers:
      - name: lumeo
        image: 'lumeo/gateway-nvidia-dgpu:latest'
        imagePullPolicy: Always
        envFrom:
        - secretRef:
            name: 'replace-with-lumeo-app-id'
        env:
        - name: CONTAINER_MODEL
          value: 'Kubernetes'
        volumeMounts:
        - name: lumeo-gateway-config
          mountPath: /var/lib/lumeo
        resources:
          limits:
            nvidia.com/gpu: 1
  volumeClaimTemplates:
  - metadata:
      name: lumeo-gateway-config
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi # Increase this if you intend to deploy large models or generate local media.
Create the StatefulSet.
This creates Lumeo gateways in the Lumeo Workspace specified by the App ID above.
kubectl apply -f lumeo-gateway.yaml
Monitor StatefulSet creation with:
kubectl get statefulsets
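You can also watch the gateway pods themselves and tail a gateway's logs (pod names follow the StatefulSet ordinal; the container is named lumeo in the manifest above):
kubectl get pods -l app=lumeod
kubectl logs lumeo-gateway-0 -c lumeo --tail=50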
Once created, these gateways will appear in your Lumeo account with names starting from lumeo-gateway-0, lumeo-gateway-1, and so on.
Scaling the Cluster
kubectl scale statefulsets lumeo-gateway --replicas=2
Note:
- Scaling up will create new gateways with consecutive names (lumeo-gateway-0, lumeo-gateway-1, ...) if they did not exist previously.
- Scaling down will remove the highest-numbered gateway from the set. That gateway will go offline in the Lumeo Console.
- When scaling up after scaling down, new gateways will NOT be created for instances that existed before; those gateways will simply come back online in the Lumeo Console.
Updating lumeod versions
Replace the version below with the version you wish to deploy.
kubectl set image statefulset/lumeo-gateway --selector app=lumeod lumeo=lumeo/gateway-nvidia-dgpu:1.3.29
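The StatefulSet's RollingUpdate strategy replaces pods one at a time; you can watch the update progress with:
kubectl rollout status statefulset/lumeo-gateway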