How to Optimize GPU Utilization

For decades, CPUs have been the workhorse of the IT industry. Newer generations have added more processing power, but their basic architecture has remained unchanged: they’re optimized for serial processing, completing one task before moving on to the next.

More recently, the introduction of GPUs has completely redefined what enterprises can accomplish with their compute hardware. Unlike CPUs, GPUs execute many operations in parallel across thousands of smaller cores. This helps them provide the very high throughput needed to process massive datasets. AI model training is one example of a data-driven use case that’s well-suited to run on GPUs. Much of the AI progress we’ve seen over the last several years simply would not have been possible without GPUs.
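To make the contrast concrete, here’s a minimal sketch (assuming PyTorch is installed; the matrix size is an arbitrary example) that times one large matrix multiplication, the kind of operation a GPU spreads across thousands of cores at once:

```python
# Minimal sketch: measure throughput on one large batched operation.
# Falls back to CPU if no GPU is present, so the same script shows
# the serial-vs-parallel gap on either device.
import time

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.perf_counter()
c = a @ b  # one matmul: billions of multiply-adds executed in parallel
if device == "cuda":
    torch.cuda.synchronize()  # wait until the GPU actually finishes
elapsed = time.perf_counter() - start

flops = 2 * 4096**3  # multiply-add count for a square matmul
print(f"{device}: {flops / elapsed / 1e9:.1f} GFLOP/s")
```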

As enterprises build out their AI strategies, they often worry about how to get the GPU capacity those strategies require. It’s a valid concern, but there are steps they can take to use GPUs to their full potential while avoiding bottlenecks caused by limited availability:

Choose the right hardware for your use cases

Despite how transformative GPUs have become, CPUs still have a role to play in enterprise IT. It’s just that their role has evolved and become more specialized. While GPUs are best for high-throughput use cases, CPUs are well-suited for low-latency use cases.

Let’s consider modern manufacturing. It’s an industry that relies heavily on data, but how and where that data is created also matters: it comes from many distributed sensors working independently, often in facilities at the digital edge, far away from centralized compute infrastructure.

Deploying commodity CPUs closer to the data sources can help keep latency low, thus ensuring the manufacturer always has access to timely insights. In addition, CPUs can help companies balance other priorities, such as keeping costs low and using energy efficiently.

From an AI standpoint, it’s true that model training relies on GPUs, due to the massive size of the datasets involved. Training workloads are also less sensitive to network latency, because the data is moved to the GPUs before the job starts; latency affects that one-time transfer, not the ongoing computation. This means that even though GPUs are often deployed far away from data sources, whether in core data centers or in the public cloud, performance degradation caused by external network latency is generally less of an issue.

However, model training is only one aspect of the AI workflow, and it would be a mistake to think that AI can only be done on GPUs. In contrast to training workloads, AI inference workloads are highly sensitive to latency. Since they can be deployed quickly and cost-efficiently in edge locations, CPUs may be the best choice for some inference workloads.
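As a concrete illustration, here’s a minimal sketch of CPU-only inference with PyTorch; the tiny model and the sensor-style input are hypothetical placeholders for a real edge workload:

```python
# Minimal sketch: low-latency inference on a commodity CPU with PyTorch.
import time

import torch
import torch.nn as nn

torch.set_num_threads(4)  # match the thread count to the edge box's cores

model = nn.Sequential(    # placeholder model; swap in your trained network
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
model.eval()  # disable training-only behavior such as dropout

sensor_reading = torch.randn(1, 32)  # one simulated sensor sample

with torch.inference_mode():  # skip autograd bookkeeping for lower latency
    start = time.perf_counter()
    scores = model(sensor_reading)
    latency_ms = (time.perf_counter() - start) * 1000

print(f"prediction={scores.argmax(dim=1).item()} latency={latency_ms:.2f} ms")
```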

Hardware requirements can also differ based on how far along an enterprise is in its AI journey. An organization that’s still early in the training process will likely dedicate a lot of hardware resources to prepping data. A model’s accuracy depends on both the quantity and quality of the data used to train it, and ensuring data quality requires prepping raw data. Unlike the training itself, data prep is typically something that CPUs can easily support.
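For instance, a typical prep pass over raw sensor logs might look like this minimal pandas sketch; the file names, column names and thresholds are all hypothetical:

```python
# Minimal sketch: CPU-friendly data prep with pandas.
import pandas as pd

df = pd.read_csv("sensor_logs.csv")        # hypothetical raw training data

df = df.drop_duplicates()                  # remove repeated records
df = df.dropna(subset=["temperature"])     # drop rows missing key fields
df["temperature"] = df["temperature"].clip(lower=-40.0, upper=150.0)  # cap sensor glitches
df["reading_time"] = pd.to_datetime(df["reading_time"])  # normalize timestamps

df.to_csv("training_data_clean.csv", index=False)  # hand off for training
```

None of these steps needs a GPU; they’re bound by I/O and ordinary CPU work.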

In addition, businesses that are still in the proof-of-concept stage of their AI strategy often start with a smaller OEM solution that includes both CPU and GPU capabilities on the same platform. As they progress, they may decide to graduate to a larger dedicated GPU platform.

It’s also important for businesses to consider the balance between cloud GPUs and on-premises GPUs. Many companies look to the cloud to help them get started with GPUs, because they want to avoid the long deployment process for physical hardware. However, not all these companies stay in the cloud—at least not for all their workloads. Instead, they’ll often choose a hybrid model that allows them to capitalize on the benefits of on-premises GPUs, including cost savings, lower latency and better security controls.

There’s not always a binary right or wrong choice when it comes to selecting compute hardware for your different workloads. It’s about making thoughtful decisions to help you balance different priorities, optimize resources and put yourself in the best position to succeed with AI.

Pair the right hardware with the right infrastructure and services

Another key aspect of an effective AI hardware strategy is to make more efficient use of the GPUs you already have. Investing in the right network infrastructure to support your compute hardware can help you achieve that. Massive parallel processing jobs for AI training are often spread across multiple GPUs, and the processing itself is only one part of the job. The GPUs must also exchange their intermediate results (such as gradients) and synchronize them across all the different GPUs.

The total job completion time includes both the GPU processing time and the network transmission time to notify and synchronize the results. Therefore, GPUs can be sensitive to both network latency and compute latency. Investing in better network infrastructure can remove performance bottlenecks and help keep GPU utilization rates high. This could help you use fewer GPUs to run the same training workloads.
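For a sense of what that synchronization step looks like, here’s a minimal sketch using PyTorch’s collective communication API. It simulates the gradient averaging (all-reduce) that synchronous data-parallel training performs after every step; it’s meant to be launched with torchrun, and the tensor here stands in for a real gradient:

```python
# Minimal sketch of the notify-and-synchronize step in data-parallel training.
# Launch with: torchrun --nproc_per_node=2 allreduce_sketch.py
import time

import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo")  # use "nccl" on real GPU clusters
rank = dist.get_rank()

grad = torch.full((1024,), float(rank))  # stand-in for a local gradient

start = time.perf_counter()
dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # every rank sends and receives
grad /= dist.get_world_size()                # average, as in synchronous SGD
comm_ms = (time.perf_counter() - start) * 1000

# Step time = compute time + this communication time, so a slow network
# leaves GPUs idle and drags utilization down.
print(f"rank {rank}: all-reduce took {comm_ms:.2f} ms")

dist.destroy_process_group()
```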

Investing in your network can be helpful no matter how far along you are in your GPU strategy. Even if you don’t have any GPUs yet, you don’t have to wait to build a GPU-ready network. You can start optimizing your existing network today, and then add GPUs as you acquire them.

You also need to consider the power and cooling needs of your AI workloads, and how they could impact your choice of compute hardware. Legacy data centers likely won’t be able to support power-dense GPUs. To get the full value of your GPUs, you’ll need to host them in a modernized data center capable of meeting their needs for high power density.

In particular, it can help to work with a colocation provider that’s invested heavily in liquid cooling to provide the energy-efficient cooling capabilities needed to support high-performance hardware such as GPUs. Equinix has begun rolling out liquid cooling at 100 of our Equinix IBX® colocation data centers in 45 different metros.

Equinix and NVIDIA can help meet your AI hardware requirements

In addition to our work pioneering liquid cooling technology to support high-density hardware for AI, our partnership with NVIDIA prioritizes GPU availability for our mutual customers, enabling deployment within six months. Our joint offering, Equinix Private AI with NVIDIA DGX, makes it easy to get the GPU capacity you need, packaged with interconnection and managed services to keep those GPUs running at their full potential.

To learn more, read our Equinix Private AI with NVIDIA DGX solution brief. Or, watch the video below featuring Jon Lin, Equinix EVP and General Manager of Data Center Services, and Matt Hull, NVIDIA VP of Global AI Solutions.

You may also be interested in

Check out the ebook Unleash new possibilities with private AI to learn more about impactful private AI use cases with NVIDIA and Equinix.