Running Triton With An Example Model Repository

Getting Triton to run on Kubernetes requires a bit of fiddling at the moment. Providing a model repository is one more step and one more possible source of errors.

When setting up a first proof-of-concept (POC) Triton Inference Server installation on Kubernetes, you're better off taking small steps and getting things right one by one.

Here's a quick way to get an example model repository going without much effort and without risking annoying mistakes.

One Weird Trick

Instead of reading the docs and trying to get the expected format right, head over to docs/examples in the Triton GitHub repository.

Clone the repository locally and run ./fetch_models.sh in that directory. That's it! Your example model repository is almost good to go.
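
In concrete terms, that's two commands. A minimal sketch, assuming the examples still live under docs/examples in the main triton-inference-server/server repository:

    # Clone the Triton Inference Server repository and move into the examples directory
    git clone https://github.com/triton-inference-server/server.git
    cd server/docs/examples

    # Download the example models into the model_repository directory
    ./fetch_models.sh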

All that's left to do is upload it to S3 or Google Cloud Storage (as described in the docs). If you're on Azure and lucky, the PR adding support for the Azure storage filesystem may already have been merged, and you can use that. If not, baking the example model repository into a temporary Docker image is a valid approach for a POC.
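
For the cloud storage route, the upload is a single recursive copy. A sketch for S3 and Google Cloud Storage; the bucket names are placeholders:

    # Upload the example model repository to S3 (bucket name is a placeholder)
    aws s3 cp --recursive model_repository s3://my-triton-models/model_repository

    # Or, for Google Cloud Storage
    gsutil cp -r model_repository gs://my-triton-models/model_repository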

Depending on where you've placed the model repository, you'll need to adjust the second entry of the args: array in the deployment.yaml manifest file, which is done by setting the image.modelRepositoryPath variable.
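
If you're installing with Helm, you don't have to edit the file at all; you can override that value at install time. A sketch, assuming you're in the chart directory and using placeholder release and bucket names:

    # Install the chart with Triton pointed at the uploaded model repository
    helm install triton-example . \
      --set image.modelRepositoryPath=s3://my-triton-models/model_repository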

What Now?

If you're deploying to Kubernetes, you might want to adjust the Helm chart to work with a newer version of Triton. You can read more about it here.
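
As a starting point, you can override the image tag the same way. A sketch with a placeholder tag, assuming the chart exposes the image under image.imageName as it does at the time of writing; a newer Triton release may need further chart changes beyond the tag:

    # Try a newer Triton release by overriding the image (check NGC for current tags)
    helm install triton-example . \
      --set image.imageName=nvcr.io/nvidia/tritonserver:23.10-py3 \
      --set image.modelRepositoryPath=s3://my-triton-models/model_repository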

Hi! I'm Vladislav, and I help companies deploy GPU-heavy AI products to Kubernetes.