# Kubernetes

UM-Bridge provides a Kubernetes-based solution for running any UM-Bridge model container on cloud platforms at HPC scale. These instructions show how to use the UM-Bridge Kubernetes configuration. They assume that you have a Kubernetes cluster available, for example by following our instructions for setting up Google Kubernetes Engine (GKE).

## Step 1: Clone UM-Bridge

First, clone the UM-Bridge repository by running:

```
git clone https://github.com/UM-Bridge/umbridge.git
```

In the cloned repository, navigate to the folder `kubernetes`. It contains everything needed in the following steps.

## Step 2: Set up load balancer

First, retrieve HAProxy:

```
helm repo add haproxytech https://haproxytech.github.io/helm-charts
```

```
helm repo update
```

```
helm install kubernetes-ingress haproxytech/kubernetes-ingress \
  --create-namespace \
  --namespace haproxy-controller \
  --set controller.service.type=LoadBalancer \
  --set controller.replicaCount=1 \
  --set defaultBackend.replicaCount=1 \
  --set controller.logging.level=debug \
  --set controller.ingressClass=haproxy
```

Then, start HAProxy with the configuration provided by UM-Bridge:

```
kubectl apply -f setup/svc-ingress.yml
```

## Step 3: Run model instances

To start model instances, run:

```
kubectl create -f model.yaml
```

You can check the status of the model instances by running:

```
kubectl get pods
```

They can be deleted again via:

```
kubectl delete -f model.yaml
```

The default `model.yaml` may be adjusted to run your own model by changing:

- `image`: The docker image containing a model with UM-Bridge support. Any image provided by the UM-Bridge project will work right away.
- `replicas`: The number of model instances to run.
- `limits`/`requests`: The CPU and memory resources your model should receive. Keep `limits` equal to `requests` to ensure that your model has exclusive access to a fixed amount of resources.
- `env`: Environment variables that should be passed to the model.

## Step 4: Calling the model

The model instances are now available through your load balancer's IP address, which you can determine from:

```
kubectl get services --namespace=haproxy-controller
```

The model instances may be accessed from any UM-Bridge client, and up to `replicas` requests will be handled in parallel.
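For illustration, a minimal Python client using the `umbridge` package might look like the sketch below. The IP address `1.2.3.4` is a placeholder for your load balancer's external IP, and the model name `"forward"` as well as the input vector are assumptions that depend on the image configured in `model.yaml`:

```
# Minimal UM-Bridge client sketch (requires: pip install umbridge).
# Replace 1.2.3.4 with the EXTERNAL-IP reported by the command above;
# the model name "forward" and the input vector are placeholders that
# depend on the image configured in model.yaml.
import umbridge

model = umbridge.HTTPModel("http://1.2.3.4", "forward")

# UM-Bridge inputs are a list of parameter vectors.
print(model([[0.0, 10.0]]))
```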
## Multinode MPI on Kubernetes

The instructions above work for any UM-Bridge model container, even ones that are MPI parallel. However, a single container is naturally limited to a single physical node. In order to parallelize across nodes (and therefore across containers) via MPI, the additional steps below are needed.

### Step 1: mpi-operator base image

The multinode MPI configuration makes use of the [mpi-operator](https://github.com/kubeflow/mpi-operator) from kubeflow. This implies that the model base image has to be built from one of the following base images, depending on the MPI implementation:

- `mpioperator/openmpi-builder`
- `mpioperator/intel-builder`

When separating the builder from the final image, the corresponding base images may be used for the latter:

- `mpioperator/openmpi`
- `mpioperator/intel`

### Step 2: Deploy mpi-operator

In addition to choosing a suitable base image for the model, the mpi-operator needs to be deployed on the cluster:

```
kubectl apply -f https://raw.githubusercontent.com/kubeflow/mpi-operator/master/deploy/v2beta1/mpi-operator.yaml
```

### Step 3: Setting up NFS

The multinode MPI setup mounts a shared (NFS) file system on the `/shared` directory of your model container, replicating a traditional HPC setup. The NFS server is set up via:

```
kubectl apply -f setup/nfs.yaml
```

Note: This assumes that a disk named `gce-nfs-disk` has been set up in GCE!

In order to finish the setup, we need the NFS server's IP address, which we can get from:

```
kubectl describe service
```

In `setup/nfs-pv-pvc.yaml`, change the IP address to the one you just retrieved and (if needed) adjust the storage capacity to match the attached disk. Then run:

```
kubectl apply -f setup/nfs-pv-pvc.yaml
```

### Step 4: Running a job on the new cluster

The job configuration is located in `multinode-mpi-model.yaml`. It is largely analogous to `model.yaml`, except that both launcher and worker containers are configured. The relevant additional config options are:

- `slotsPerWorker`: The number of MPI ranks per worker container.
- `mpiImplementation`: By default set to OpenMPI, but can be changed to `Intel`.
- `command`: The launcher is expected to run the UM-Bridge server, which then should call `mpirun` for model evaluations. Workers should only execute the (pre-defined) `sshd` command in order to listen for requests from the launcher.
- `replicas`: Expected to be 1 for the launcher, and the number of model instances you want to run for the workers.

The `multinode-mpi-model.yaml` file describes a single launcher with a number of workers assigned to it. In order to run multiple jobs (each consisting of a launcher and multiple MPI parallel workers), run the following script:

```
bash launch-multinode-mpi-model.sh
```

It creates a number of jobs from the `multinode-mpi-model.yaml` file, each time substituting a unique index for `JOB_INDEX`. Just as before, access to the model instances is now available via the load balancer's IP address, as described in Step 4 of the single-node setup.

All jobs can be shut down via:

```
kubectl delete MPIJob --all
```
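As in the single-node case, the parallel jobs can be exploited by sending concurrent requests to the load balancer. The following Python sketch illustrates this; the IP address `1.2.3.4`, the model name `"forward"`, and the input values are placeholders that depend on your deployment:

```
# Concurrent evaluation sketch (requires: pip install umbridge).
# IP address, model name, and inputs are placeholders; adjust to your deployment.
from concurrent.futures import ThreadPoolExecutor

import umbridge

model = umbridge.HTTPModel("http://1.2.3.4", "forward")

# Eight example parameter sets; each entry is one UM-Bridge input (a list of vectors).
inputs = [[[float(i), 10.0]] for i in range(8)]

# Each call is an independent HTTP request, so the load balancer can spread
# them across the running jobs; additional requests are served as instances
# become available.
with ThreadPoolExecutor(max_workers=8) as pool:
    outputs = list(pool.map(model, inputs))

for x, y in zip(inputs, outputs):
    print(x, "->", y)
```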