Deep Learning Hardware

Get an NVIDIA DGX A100 for just $199,000 USD. You're good to go!

Honestly. Oh my. What a piece of hardware. Wow.

But seriously, here's my current understanding of what information regarding hardware for deep learning seems helpful:

There's an awesome (awe-inspiring, truly) article by Tim Dettmers in which he goes into a lot of depth on deep learning GPUs, answering questions like "when do you need more than 11GB of memory?" or "is it worth upgrading your RTX 20 cards to RTX 30 ones?". There's also some TL;DR advice at the bottom if you need a quick overview.
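On that memory question, a rough way to reason about it: training in fp32 with Adam stores the weights, the gradients, and two optimizer states per parameter, so roughly 16 bytes per parameter before activations. Here's a minimal back-of-envelope sketch (the function name is mine, and the numbers are a rule of thumb, not a precise accounting — real usage adds activations and framework overhead):

```python
def estimate_training_memory_gb(n_params, bytes_per_param=4):
    """Rule-of-thumb memory for fp32 training with Adam, excluding activations."""
    weights = n_params * bytes_per_param        # model parameters
    grads = n_params * bytes_per_param          # one gradient per parameter
    optimizer = n_params * 2 * bytes_per_param  # Adam keeps two moment estimates
    return (weights + grads + optimizer) / 1e9

# A 350M-parameter model: ~16 bytes/param -> ~5.6 GB before activations,
# which is why an 11GB card can get tight quickly on larger models.
print(estimate_training_memory_gb(350e6))
```

This is just the static state; activation memory scales with batch size and model depth, which is exactly the part Dettmers' article digs into.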

I also found Reddit discussions useful to see what opinions and viewpoints exist around this topic.

Lambda Labs also has a helpful article comparing different NVIDIA RTX 30 cards - I feel there's a lot for me to learn from it, not only about the hardware itself but also about what other considerations matter when approaching this choice.

Of course, when talking hardware there's also another option looming: does it make sense to use a GPU cloud offering for your particular use case? I found this comment to be a good starting point.

That's about the extent of my deep learning hardware knowledge at this point. The needs of a casual practitioner, a high-throughput AI research lab, and a productized service operating under stricter economic and SLO constraints are different, though. I wouldn't build a multi-GPU machine for my own AI learning and experimenting, but it would make sense for a team. Likewise, going to the cloud rarely makes sense for an application that doesn't need to scale, but there are scenarios where building a k8s cluster for your product and using cloud GPUs is a very good decision.
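For the k8s-plus-cloud-GPUs scenario, the core mechanic is pleasantly small: with NVIDIA's device plugin installed on the cluster, a pod requests GPUs as an ordinary resource limit under `nvidia.com/gpu`. A minimal sketch (the pod name, image tag, and entrypoint are illustrative, not from any real deployment):

```yaml
# Minimal pod spec requesting one GPU via the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                                  # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3   # example image tag
      command: ["python", "train.py"]           # hypothetical entrypoint
      resources:
        limits:
          nvidia.com/gpu: 1                      # schedules onto a node with a free GPU
```

The scheduler then places the pod on a node that has an unallocated GPU, which is what makes mixing autoscaled cloud GPU nodes into a product cluster workable.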

Hi! I'm Vladislav, I help companies deploy GPU-heavy AI products to Kubernetes.