Running A More Recent Triton Helm Chart
2020-10-30

If you try to run Triton on your Kubernetes cluster, be it with the Helm chart from the GitHub repo or from NVIDIA NGC, you'll notice that it (at least at the time of writing):
- Doesn't work on newer k8s versions out of the box,
- Uses an old Docker image of the Triton Inference Server.
Here are a few hints on how to get it working in an initial, POC kind of way.
Fix The Helm Chart
Apart from operability tweaks, there's an issue with the `securityContext` section in the deployment manifest file. The quick fix is easy: create a `securityContext` entry directly under the pod-level `spec` line, instead of as part of the container block, and move the `fsGroup` line there. You can see an example in the official k8s docs.
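For orientation, here's a minimal sketch of the relevant part of the deployment manifest after the fix. `fsGroup` is only valid in the pod-level security context, which is why it has to move; the names and the `fsGroup` value below are placeholders, not values from the actual chart:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
spec:
  template:
    spec:
      # fsGroup belongs in the pod-level securityContext,
      # so it moves up here, out of the container block.
      securityContext:
        fsGroup: 1000  # placeholder value
      containers:
        - name: tritonserver
          image: nvcr.io/nvidia/tritonserver:20.10-py3
          # No container-level securityContext with fsGroup anymore.
```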
Use A Newer Docker Image
You can find a list of available Triton Inference Server Docker images over here (there might be a newer one by now; take a look at the table or the left-hand side menu). The table also shows which image versions contain which Triton Inference Server versions and other dependencies.
You can also see a list of image tags over here at NVIDIA NGC. Copy the pull command for your desired version and pass the image name to the Helm chart.
Here's what the pull command looks like for the above version:

```sh
docker pull nvcr.io/nvidia/tritonserver:20.10-py3
```
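To wire the image into the chart, you can override the corresponding value at install time. The value key `image.imageName` is an assumption here; check the chart's values.yaml for the actual name:

```sh
# "image.imageName" is assumed; verify against the chart's values.yaml
helm install triton-inference-server ./tritonserver \
  --set image.imageName=nvcr.io/nvidia/tritonserver:20.10-py3
```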
Maybe: Backwards-Incompatible Changes
Depending on how old your copy of the Triton Inference Server Helm chart is (the one from NGC seems to be a bit older), you may need to do the following two things:
- Adjust the command which the container executes, as defined in the `args:` block. Make sure to change `trtserver` to `tritonserver`.
- Adjust the URLs for the liveness and readiness probes to point to `/v2/...` instead of `/api/...`. See the sketch after this list.
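Put together, the adjusted container spec might look roughly like this. The exact template structure of the chart may differ, and the model repository path is a placeholder; the `/v2/health/...` paths are Triton's v2 health endpoints:

```yaml
containers:
  - name: tritonserver
    image: nvcr.io/nvidia/tritonserver:20.10-py3
    args:
      # "trtserver" was renamed to "tritonserver"
      - tritonserver
      - --model-repository=gs://your-model-repository  # placeholder
    ports:
      - containerPort: 8000  # Triton's default HTTP port
        name: http
    livenessProbe:
      httpGet:
        path: /v2/health/live  # was /api/health/live
        port: http
    readinessProbe:
      httpGet:
        path: /v2/health/ready  # was /api/health/ready
        port: http
```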
That's It!
Now you should be able to run a first POC Helm installation on your k8s cluster. It's not perfect, but it should be good enough to get started.
If you don't have a well-tested model repository, here are instructions on how to get one going quickly.