Deep learning is a subset of machine learning that can build accurate predictive models without relying on structured data. It uses artificial neural networks, loosely modeled on the neurons of the brain, to distill and correlate large amounts of data. In general, the more data you feed the network, the more accurate the model becomes.
It is technically possible to train deep learning models with sequential processing. However, the amount of data required and the length of processing time make it impractical, if not impossible, to train models without parallel processing. Parallel processing enables multiple data objects to be processed at the same time, drastically reducing training time. This parallel processing is typically accomplished with graphics processing units (GPUs).
GPUs are specialized processors designed for highly parallel work. For deep learning workloads they can provide significant advantages over traditional CPUs, including speedups of up to 10x. Typically, multiple GPUs are built into a system alongside the CPUs. The CPUs handle more complex or general tasks, while the GPUs handle specific, highly repetitive processing tasks.
This is part of an extensive series of guides about machine learning.
Once multiple GPUs are added to your systems, you need to build parallelism into your deep learning processes. There are two main ways to add parallelism: model parallelism and data parallelism.
Model parallelism is a method you can use when your model's parameters are too large to fit within your memory constraints. With this method, you split the model's training processes across multiple GPUs and run them in parallel (as illustrated in the image below) or in series. Model parallelism feeds the same dataset to each portion of the model and requires synchronizing data between the splits.
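For illustration, here is a minimal sketch of model parallelism in PyTorch (not taken from this article). It assumes two GPUs are visible as cuda:0 and cuda:1, and the layer sizes are arbitrary placeholders; the point is that each stage of the model lives on a different device and the activations are moved between them:

    import torch
    import torch.nn as nn

    class ModelParallelNet(nn.Module):
        # Toy model split across two GPUs: the first stage lives on cuda:0, the second on cuda:1.
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Linear(1024, 512), nn.ReLU()).to("cuda:0")
            self.stage2 = nn.Linear(512, 10).to("cuda:1")

        def forward(self, x):
            x = self.stage1(x.to("cuda:0"))
            # Move the intermediate activations to the second GPU before the next stage.
            return self.stage2(x.to("cuda:1"))

    model = ModelParallelNet()
    output = model(torch.randn(64, 1024))  # the output tensor ends up on cuda:1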
Data parallelism is a method that replicates your model across GPUs. It is useful when the batch size used by your model is too large to fit on a single machine, or when you want to speed up training. With data parallelism, each copy of the model is trained on a different subset of the dataset simultaneously. Once done, the results from the copies are combined and training continues as normal.
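As a minimal sketch of this idea (again using PyTorch with placeholder layer and batch sizes), the built-in nn.DataParallel wrapper replicates a model on every visible GPU and splits each batch among the replicas:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))
    # Replicate the model on every visible GPU; each replica handles a slice of the batch.
    model = nn.DataParallel(model).to("cuda")

    batch = torch.randn(256, 1024).to("cuda")
    output = model(batch)  # the 256 samples are split across the available GPUs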
TensorFlow is an open source framework, created by Google, that you can use to perform machine learning operations. The library includes a variety of machine learning and deep learning algorithms and models that you can use as a base for your training. It also includes built-in methods for distributed training using GPUs.
Through the API, you can use tf.distribute.Strategy to distribute your operations across GPUs, TPUs, or machines. This API is designed to support multiple user segments, such as researchers and ML engineers, and lets you switch between distribution strategies with relatively few code changes.
Two strategies that implement this API are MirroredStrategy and TPUStrategy. Both enable you to distribute your workloads, the former across multiple GPUs and the latter across multiple Tensor Processing Units (TPUs). TPUs are units available through Google Cloud Platform that are specifically optimized for TensorFlow training.
Both of these strategies use roughly the same synchronous data-parallel process, summarized as follows: the model's variables are mirrored on every device, each device processes a different slice of each input batch, the resulting gradients are aggregated across devices with an all-reduce operation, and the identical update is then applied to every replica.
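For example, here is a minimal sketch of synchronous data-parallel training with MirroredStrategy and a simple Keras model. The model, dataset, and batch size are illustrative placeholders rather than recommendations from this article:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs by default
    print("Number of replicas:", strategy.num_replicas_in_sync)

    # Variables created inside the strategy scope are mirrored on every device.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(784,)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )

    # Each global batch is split across the replicas; gradients are all-reduced automatically.
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
    model.fit(x_train, y_train, batch_size=64 * strategy.num_replicas_in_sync, epochs=1)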
Learn more in our guides to TensorFlow multiple GPU and Keras multiple GPU.
PyTorch is an open source scientific computing framework based on Python. You can use it to train machine learning models using tensor computations and GPUs. The framework supports distributed training through its torch.distributed package.
With PyTorch, there are three main classes of parallelism (or distribution) you can use with GPUs: data parallelism (DataParallel), distributed data parallelism (DistributedDataParallel), and model parallelism.
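Below is a minimal, single-node sketch of the distributed data parallel approach using torch.distributed and DistributedDataParallel. It assumes one process per visible GPU, and the model and data are placeholders:

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def train(rank, world_size):
        # One process per GPU; NCCL is the usual backend for GPU training.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("nccl", rank=rank, world_size=world_size)

        model = nn.Linear(1024, 10).to(rank)
        ddp_model = DDP(model, device_ids=[rank])
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        # Each process trains on its own shard of data; DDP averages gradients across processes.
        inputs = torch.randn(64, 1024).to(rank)
        loss = ddp_model(inputs).sum()
        loss.backward()
        optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = torch.cuda.device_count()
        mp.spawn(train, args=(world_size,), nprocs=world_size)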
There are three main deployment models you can use when implementing machine learning operations that use multiple GPUs. The model you use depends on where your resources are hosted and the size of your operations.
GPU servers are servers that incorporate GPUs in combination with one or more CPUs. When workloads are assigned to these servers, the CPUs act as a central management hub for the GPUs, distributing tasks and collecting outputs as available.
GPU clusters are computing clusters with nodes that contain one or more GPUs. These clusters can be formed from duplicates of the same GPU (homogeneous) or from different GPUs (heterogeneous). Each node in a cluster is connected via an interconnect to enable the transmission of data.
Kubernetes is an open source platform you can use to orchestrate and automate container deployments. This platform offers support for the use of GPUs in clusters to enable workload acceleration, including for deep learning.
When using GPUs with Kubernetes, you can deploy heterogeneous clusters and specify your resources, such as memory requirements. You can also monitor these clusters to ensure reliable performance and optimize GPU utilization. Learn about Kubernetes architecture and how it can be used to support Deep Learning.
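As an illustrative sketch only (the pod name, container image, and resource sizes are placeholders, and the cluster is assumed to run the NVIDIA device plugin), a pod that requests one GPU and a memory limit can be created with the official Kubernetes Python client:

    from kubernetes import client, config

    config.load_kube_config()  # assumes a local kubeconfig with access to the cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-training-pod"),  # placeholder name
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="tensorflow/tensorflow:latest-gpu",  # placeholder image
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1", "memory": "16Gi"},
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)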
Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many deep learning experiments as needed on multi-GPU infrastructure.
Here are some of the capabilities you gain when using Run:AI:
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists accelerate their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.
Check out the following articles to learn more about working with multi GPU infrastructure:
Tensorflow with Multiple GPUs: Strategies and Tutorials
TensorFlow is one of the most popular frameworks for machine learning and deep learning training. It includes a range of built-in functionalities and tools to help you train efficiently, including providing methods for distributed training with GPUs.
In this article you’ll learn what TensorFlow is and how you can perform distributed training with TensorFlow methods. You’ll also see two brief tutorials that show how to use TensorFlow distributed with estimators and Horovod.
Read more: Tensorflow with Multiple GPUs: How to Perform Distributed Training
Keras Multi GPU: A Practical Guide
Keras is a deep learning API you can use to perform fast distributed training with multiple GPUs. Distributed training with GPUs enables you to perform training tasks in parallel, distributing your model training tasks over multiple resources. You can do that via model parallelism or via data parallelism. This article explains how Keras multi GPU works and examines tips for managing the limitations of multi GPU training with Keras.
Learn the basics of distributed training, how to use Keras Multi GPU, and tips for managing the limitations of Keras with multiple GPUs.
Read more: Keras Multi GPU: A Practical Guide
PyTorch Multi GPU: 4 Techniques Explained
PyTorch provides a Python-based library package and a deep learning platform for scientific computing tasks. Learn four techniques you can use to accelerate tensor computations with PyTorch multi GPU techniques—data parallelism, distributed data parallelism, model parallelism, and elastic training.
Learn how to accelerate deep learning tensor computations with 4 multi GPU techniques: data parallelism, distributed data parallelism, model parallelism, and elastic training.
Read more: PyTorch Multi GPU: 4 Techniques Explained
How to Build Your GPU Cluster: Process and Hardware Options
A GPU cluster is a group of computers that have a graphics processing unit (GPU) on every node. Multiple GPUs provide accelerated computing power for specific computational tasks, such as image and video processing and training neural networks and other machine learning algorithms.
Learn how to build a GPU cluster for AI/ML research, and discover hardware options including data center grade GPUs and massive scale GPU servers.
Read more: How to Build Your GPU Cluster: Process and Hardware Options
Kubernetes GPU: Scheduling GPUs On-Premises or on EKS, GKE, and AKS
Kubernetes is a highly popular container orchestrator, which can be deployed on-premises, in the cloud, and in hybrid environments.
Learn how to schedule GPU resources with Kubernetes, which now supports NVIDIA and AMD GPUs. Self-host Kubernetes GPUs or tap into GPU resources on cloud-based managed Kubernetes services.
Read more: Kubernetes GPU: Scheduling GPUs On-Premises or on EKS, GKE, and AKS
GPU Scheduling: What are the Options?
A graphics processing unit (GPU) is an electronic chip that renders graphics by quickly performing mathematical calculations. GPUs use parallel processing to enable several processors to handle different parts of one task.
Learn the challenges of GPU scheduling and how to schedule workloads on GPUs with Kubernetes, Hashicorp Nomad, and Microsoft Windows 10 DirectX.
Read more: GPU Scheduling: What are the Options?
CPU vs GPU: Architecture, Pros and Cons, and Special Use Cases
A graphics processing unit (GPU) is a computer processor that performs rapid calculations to render images and graphics. A CPU is a processor consisting of logic gates that handle the low-level instructions in a computer system.
Learn about CPU vs GPU architecture, pros and cons, and using CPUs/GPUs for special use cases like machine learning and high performance computing (HPC).
Read more: CPU vs GPU: Architecture, Pros and Cons, and Special Use Cases
Automate Hyperparameter Tuning Across Multiple GPU
In this post, we will review how hyperparameters and hyperparameter tuning play an important role in the design and training of machine learning networks. Choosing the optimal hyperparameter values directly influences the architecture and quality of the model. This crucial process also happens to be one of the most difficult, tedious, and complicated tasks in machine learning training.
Read more: Automate Hyperparameter Tuning Across Multiple GPU
Want to learn more about Machine Learning Operations?
Machine learning operations (MLOps) are the pipelines and practices that enable you to plan and perform machine learning training and implementation. These operations directly impact the effectiveness of your models and your time to market.
Want to learn more about Deep Learning GPUs?
Graphics processing units (GPUs) are an essential component of fast, efficient deep learning processes. Deep learning GPUs enable you to significantly speed up training and operation times, allowing for innovative experiments and applications.
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of AI Technology.