HPC on Azure: A Quick Guide

High-Performance Computing (HPC) enables you to leverage cloud-based supercomputing with the scalability of the cloud. Most major cloud vendors offer HPC services, including Azure, Google, and AWS. In this article, you’ll learn what HPC is and how it works on Azure. You’ll also learn how to deploy and manage Azure-based HPC.

 

What Is HPC?

High-Performance Computing (HPC) uses aggregated processing power to perform computations. HPC systems are composed of nodes (computer clusters) that use parallel processing with distributed workloads. This parallel, distributed design enables you to perform analyses and solve complex problems that you cannot with traditional computers.

While traditional computers usually only have one or two Central Processing Units (CPUs), HPC systems typically 16 to 64 nodes, each with two or more CPUs. Also, while traditional computers only have one set of memory and storage, HPC nodes each contribute storage and memory. This drastically increases the amount of data you can store and process.

In addition to CPUs, Graphics Processing Units (GPUs) are common in HPC systems. GPUs are designed for specific tasks, unlike CPUs which are multipurpose. This specialization enables GPUs to perform that specific task faster than a CPU can. HPC systems use GPUs as co-processors to boost computational performance. When GPUs and CPUs are combined, it’s called hybrid computing.

HPC systems can provide organizations with smaller budgets access to some of the computing power that supercomputers can provide. Although supercomputers are more powerful, these machines can cost $20 million or more. This price makes them accessible only to the largest enterprises.

Typical use cases of HPC systems include:

  • Financial services—supports automated trading, analysis of stock trends, and identifying fraud.
  • Research labs—supports drug research and development, modeling of physical properties like fluid dynamics, and training of complex AI algorithms.
  • Oil and gas companies—supports the identification of natural resource stores and can be used to optimize the productivity of existing operations.
  • Media and entertainment—supports special effects rendering, film editing, and live streaming. 

HPC is commonly created with Linux systems, although Windows machines can also be used. These machines can be either physical or virtual and it is becoming common for HPC systems to be built on cloud infrastructures. Currently, all major cloud vendors provide support and resources for HPC deployments, including services for managing workloads and orchestration. There are also a number of third-party integrations supported by vendors to help with the deployment and migration of HPC workloads.

 

HPC on Azure

Cloud-based HPC enables you to scale resources on demand. This enables you to maximize cost efficiency by only provisioning resources when needed. When you create a workload, you can provision as many cores as you need and as tasks are completed, you can drop resources. Flexible management of resources enables you to optimize your spending and gain access to computing power that might otherwise be unaffordable.

Azure HPC deployments are well-suited to:

  • High volumes of small tasks
  • Complex, lengthy problems
  • Memory-intensive simulations
  • Simulations that require parallel processing

HPC Deployments in Azure

You can deploy HPC in Azure entirely in the cloud or as a hybrid deployment. Hybrid deployments enable you to simultaneously run workloads on-premise and in the cloud. You can also use Azure HPC deployments to provide burst capacity for workloads that exceed your on-premise capabilities.

Azure HPC enables you to speed processing time by running workloads on numerous, parallel resources. You also have the option of using high-speed interconnects to boost processing for workloads with tightly coupled tasks that cannot run in parallel. Azure offers both Remote Direct Memory Access (RDMA) and Infiniband connections to ensure low-latency connections between nodes.

When configuring nodes in Azure, you can select from GPU-enabled or compute-intensive virtual machines (VMs). You should make this decision based on a combination of budget and computational needs.

HPC deployments in Azure typically include:

  • HPC head node—serves as a central managing server for your cluster. You use the head node to manage and schedule tasks to your worker nodes.
  • Virtual Network—connects your head node to your worker nodes and storage services. This is a private network in which you control the infrastructure, traffic between subnets, and can assign IP addresses and DNS servers. Your connections are secured through ExpressRoute or IPsec VPN.
  • Virtual Machine Scale Sets—enables you to create thousands of VMs. This service includes features for autoscaling and load balancing and supports deployment across availability zones. On Scale Sets VMs, you can run a variety of databases including MongoDB, Cassandra, and Hadoop.
  • Storage—you can use any available storage service, including disk, file, or blob storage. Hybrid storage and Data Lake solutions are also available. The Avere vFXT service can help you improve storage performance. It includes an intelligent cache feature that you can use to run low-latency file system workloads.
  • Azure Resource Manager—enables you to deploy your applications using script files or templates.

Managing HPC Deployments in Azure

Azure provides several services to help you manage your HPC deployments. These services are native, integrate fully with additional Azure services, and include Microsoft support.

Azure Batch

Azure Batch is a managed HPC application service. It manages VM provisioning, task assignment and runtime, and monitoring. When using Batch, you are responsible for configuring your VM pool and uploading your workloads. You can also use Batch to set policies for job scheduling and autoscale your deployments.

Microsoft HPC Pack

HPC Pack is a service that helps you manage VM clusters and enables you to monitor and schedule jobs. It is useful for moving on-premise workloads to your cloud deployment or creating a hybrid deployment.

Azure CycleCloud

Azure CycleCloud is a service you can use to manage and orchestrate workloads, set access controls with Active Directory, and create governance policies for clusters. You can use it with several popular scheduling tools, including HPC Pack, Slurm, Symphony, and Grid Engine.

 

Conclusion

HPC can be a valuable asset for workloads that require fast and scalable parallel processing. Azure is known for its enterprise-level services for cloud, on-premise, and hybrid environments. Azure’s HPC offerings include a balanced level of automation processes and manual controls, which ensure that you can easily manage the operation while maintaining a cost-effective budget.

 

Author Bio

Gilad David Maayan author image

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.

LinkedIn: https://www.linkedin.com/in/giladdavidmaayan/