Running A More Recent Triton Helm Chart

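If you try to run Triton on your Kubernetes cluster, be it with the Helm chart from the GitHub repo or the one from NVIDIA NGC, you'll notice that it doesn't quite work out of the box (at least at the time of writing): the deployment manifest needs a fix, and the chart points to an older Docker image.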

Here are a few hints on how you can get it to work in an initial, proof-of-concept (POC) kind of way.

Fix The Helm Chart

Apart from operability tweaks, there's an issue with the securityContext section in the deployment manifest file: fsGroup is a pod-level setting, so it doesn't belong in the container block. The quick fix is easy: create a securityContext entry directly under the pod's spec line instead of inside the container block, and move the fsGroup line there. You can see an example in the official k8s docs.
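As a rough sketch of where the entry ends up, the pod template could look something like this. The container name and the group ID are placeholders I made up for illustration, not values taken from the chart, so adapt them to what your copy of the manifest contains.

```yaml
# Sketch of the pod template inside the chart's Deployment manifest.
# The container name and the ID value are placeholders, not taken from the chart.
spec:
  template:
    spec:
      securityContext:        # pod-level entry, directly under the pod's spec line
        fsGroup: 1000         # moved here from the container block
      containers:
        - name: triton-inference-server
          # any container-level securityContext stays here, minus fsGroup
```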

Use A Newer Docker Image

You can find a list of available Triton Inference Server Docker images over here (there might be a newer one, take a look at the table or the left-hand side menu). The table there also shows which Triton Inference Server version and which other dependencies each image version contains.

You can also see a list of image tags over here at NVIDIA NGC. Copy the pull command for your desired version and pass the image name from it (everything after docker pull) to the Helm chart.

Here's what the pull command looks like for version 20.10:

docker pull nvcr.io/nvidia/tritonserver:20.10-py3
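To hand that image to the chart, a small values override is probably the least invasive route. The sketch below assumes the chart reads the image from an image.imageName key; that key name is an assumption on my part, so check the chart's values.yaml and adjust if it differs.

```yaml
# Values override sketch. The key name image.imageName is an assumption;
# look it up in the chart's values.yaml before relying on it.
image:
  imageName: nvcr.io/nvidia/tritonserver:20.10-py3
```

Saved to a file, this can be passed to helm install with the -f flag.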

Maybe: Backwards Incompatible Changes

If the Triton Inference Server Helm chart is still as old as it is at the time of writing (the one from NGC seems to be a bit older), you'll need to do the following two things:

  1. Adjust the command that the container executes, as defined in the args: block. Make sure to change trtserver to tritonserver.

  2. Adjust the URLs for the liveness and readiness probes to point to /v2/... instead of /api/...
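Put together, the two changes could look roughly like the container block below. Treat it as a sketch under assumptions: the container name and the model repository path are placeholders, port 8000 is Triton's default HTTP port, and the probe paths are the health endpoints of the v2 protocol as I understand them, so verify them against the Triton version you're actually running.

```yaml
# Sketch of the container block after both changes.
# Name and model repository path are placeholders, not taken from the chart.
containers:
  - name: triton-inference-server
    args:
      - tritonserver                  # was: trtserver
      - --model-repository=/models    # placeholder path
    livenessProbe:
      httpGet:
        path: /v2/health/live         # assumed v2 liveness endpoint
        port: 8000                    # Triton's default HTTP port
    readinessProbe:
      httpGet:
        path: /v2/health/ready        # assumed v2 readiness endpoint
        port: 8000
```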

That's It!

Now you should be able to run a first POC Helm installation on your k8s cluster. It's not perfect, but it should be good enough to get started.

If you don't have a well-tested model repository, here are instructions on how to get one going quickly.

Hi! I'm Vladislav, I help companies deploy GPU-heavy AI products to Kubernetes.