
Why does Machine Learning require GPUs?

One of the biggest breakthroughs behind the current Artificial Intelligence boom is related to GPUs: Graphics Processing Units, the same hardware that lets you run graphics-intensive video games on your laptop.

Machine Learning workflows, like training and inference, require a LOT of processing power. For example, Alpaca has 7 billion parameters. ChatGPT runs on something like 175 billion at the time of writing this. Bloomberg’s model has 50 billion. That’s a lot of data to run through. What if we could do it in parallel?

That’s essentially what we use GPUs for in ML — running training and inference workloads at a massive scale of parallelization.
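
To put rough numbers on “a lot of data,” here’s a back-of-the-envelope sketch in Python (my own arithmetic, with illustrative numbers; the 2-FLOPs-per-parameter figure is a common approximation, not an exact count):

    # Back-of-the-envelope: what a parameter count means in practice.
    params = 7_000_000_000   # e.g. a 7B-parameter model
    bytes_per_param = 4      # 32-bit floats

    memory_gb = params * bytes_per_param / 1e9
    print(f"~{memory_gb:.0f} GB just to hold the weights")  # ~28 GB

    # A forward pass does roughly 2 floating-point ops per parameter
    # per token (one multiply + one add).
    flops_per_token = 2 * params
    print(f"~{flops_per_token / 1e9:.0f} GFLOPs per token")  # ~14 GFLOPs

Every one of those operations is a tiny multiply or add, and huge swaths of them don’t depend on each other.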

So why GPUs, not CPUs?

GPUs are not categorically better than CPUs at everything; they each have things they’re good at. GPUs are GREAT for parallel computing. Why? CPUs typically have 4, 8, or 16 cores, while GPUs can have thousands. Literally 3,000 to 4,000 cores sometimes.

It just so happens that neural network workloads are embarrassingly parallel. That is to say, by the very nature of their structure, neural network workloads are highly suitable for parallelization: they can easily be broken down into independent tasks. So: GPUs + embarrassingly parallel workloads = a perfect match. Here’s a sketch of what that independence looks like:
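
This is a minimal sketch using NumPy, with a handful of Python processes standing in for GPU cores (the layer sizes and worker count are made up for illustration):

    from concurrent.futures import ProcessPoolExecutor

    import numpy as np

    rng = np.random.default_rng(0)             # seeded so every process agrees
    WEIGHTS = rng.standard_normal((512, 256))  # one toy layer: 512 in, 256 out

    def forward_chunk(chunk):
        # Each worker computes its slice of the batch independently;
        # no communication needed. That independence is the whole trick.
        return chunk @ WEIGHTS

    if __name__ == "__main__":
        batch = rng.standard_normal((1000, 512))
        chunks = np.array_split(batch, 4)      # pretend we have 4 cores
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(forward_chunk, chunks))
        outputs = np.vstack(results)
        print(outputs.shape)                   # (1000, 256)

A GPU does the same thing, except with thousands of cores and without the overhead of spinning up OS processes.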

What is a “TPU”?

The industry may be slowly transitioning to something called TPUs for processing machine learning workloads. TPU stands for Tensor Processing Unit; it’s like a GPU, but built specifically for processing ML tensors. TPUs actually originated at Google as a hardware architecture optimized for the tensor operations at the core of machine learning. More specifically, TPUs are optimized for matrix multiplication operations and parallel computation.
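
Why the focus on matrix multiplication? Because most of a neural network’s work reduces to it. A quick sketch (NumPy again; the sizes are arbitrary):

    import numpy as np

    # The core "tensor op": a big multiply-accumulate over matrices.
    m, k, n = 1024, 1024, 1024
    a = np.random.randn(m, k)
    b = np.random.randn(k, n)

    c = a @ b  # the kind of operation TPUs are built to chew through

    # Each of the m*n outputs is a k-length dot product:
    # k multiplies + k adds, so ~2*m*k*n FLOPs total.
    flops = 2 * m * k * n
    print(f"{flops / 1e9:.1f} GFLOPs for one matmul")  # ~2.1

Hardware that does nothing but pump multiply-accumulates through a grid of units wins big on exactly this workload.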

Wtf is “CUDA”?

The industry-leading manufacturer of GPUs is a company called NVIDIA. Their GPUs are in SUCH high demand (basically everyone wants one) that it’s difficult to even get cloud compute access to them.

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA for doing massively parallel computing on their GPUs. It’s basically a C/C++ API layer on top of the GPU that allows other languages (such as Python) to utilize the underlying GPU compute resources. Without a layer like this, we wouldn’t be able to run our ML stuff on GPUs.
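
In practice you rarely touch CUDA directly; a library like PyTorch (just one example) calls into it for you. A minimal sketch, assuming a machine with an NVIDIA GPU and a CUDA-enabled PyTorch install:

    import torch

    # PyTorch talks to the GPU through the CUDA layer underneath.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"running on: {device}")

    x = torch.randn(1000, 512, device=device)  # tensor lives in GPU memory
    w = torch.randn(512, 256, device=device)
    y = x @ w                                   # matmul executes on the GPU
    print(y.shape)                              # torch.Size([1000, 256])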

Here’s an ASCII chart of the architecture:

                                                                                                                               
 +--------------------------------------------------------------+
 |                          python 🐍                           |
 +--------------------------------------------------------------+
            |                                     |
            v                                     v
 +----------------------------+      +--------------------------+
 |      CUDA libs (C++)       |      |                          |
 +----------------------------+      |           CPU            |
 |  CUDA (C++/GPU-specific)   |      |                          |
 +----------------------------+      +--------------------------+
 |            GPU             |
 +----------------------------+
                                                                                                         
Last modified: April 27, 2023