High-performance computing (HPC) has become an indispensable tool for scientific discovery, engineering innovation, and business transformation. However, deploying and managing HPC environments can be complex and time-consuming, often requiring specialized expertise.
To address these challenges, Google Cloud has developed the Cloud HPC Toolkit, an open-source toolkit that simplifies the deployment and management of HPC workloads and environments on Google Cloud. The toolkit uses the infrastructure-as-code (IaC) approach, where the environments are described in human-readable YAML blueprints. It also provides a set of tools that automate many of the tasks involved in setting up and managing HPC clusters, making it easier for users to get started with HPC on Google Cloud.
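To make the blueprint idea concrete, here is a minimal Python sketch. It models a blueprint as a nested dictionary that mirrors the YAML layout used in the Toolkit's examples (`blueprint_name`, `vars`, `deployment_groups`, and modules with `id`, `source`, `use`, and `settings`). It then checks that every `use` reference names a module declared earlier. The field names reflect our understanding of the blueprint schema and should be checked against the current documentation. The module `source` paths, the project ID, and the `check_blueprint` helper are hypothetical illustrations, not part of the Toolkit, which applies its own checks when it processes a blueprint.

```python
from typing import Any

# A blueprint modeled as plain Python data. The keys mirror the assumed YAML
# layout; the module sources and project ID are illustrative placeholders.
EXAMPLE_BLUEPRINT: dict[str, Any] = {
    "blueprint_name": "hpc-slurm-example",
    "vars": {
        "project_id": "my-project",  # hypothetical project ID
        "deployment_name": "hpc-demo",
        "region": "us-central1",
        "zone": "us-central1-a",
    },
    "deployment_groups": [
        {
            "group": "primary",
            "modules": [
                {"id": "network1", "source": "modules/network/vpc"},
                {
                    "id": "homefs",
                    "source": "modules/file-system/filestore",
                    "use": ["network1"],  # wire the file system to the network
                    "settings": {"local_mount": "/home"},
                },
            ],
        }
    ],
}


def check_blueprint(blueprint: dict[str, Any]) -> list[str]:
    """Hypothetical checker: return a list of problems (empty if none found)."""
    problems: list[str] = []
    for key in ("blueprint_name", "vars", "deployment_groups"):
        if key not in blueprint:
            problems.append(f"missing top-level key: {key}")
    seen_ids: set[str] = set()
    for group in blueprint.get("deployment_groups", []):
        for module in group.get("modules", []):
            module_id = module.get("id")
            if module_id is None:
                problems.append(f"module without id in group {group.get('group')}")
                continue
            if module_id in seen_ids:
                problems.append(f"duplicate module id: {module_id}")
            # Simplifying assumption of this sketch: each 'use' entry must name
            # a module declared earlier in the blueprint.
            for dep in module.get("use", []):
                if dep not in seen_ids:
                    problems.append(f"{module_id} uses unknown module: {dep}")
            seen_ids.add(module_id)
    return problems


print(check_blueprint(EXAMPLE_BLUEPRINT))  # → []
```

Because the blueprint is plain data, mistakes such as a misspelled module reference can be caught before any cloud resources are created. This is one practical benefit of the infrastructure-as-code approach.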
Our customers and Google teams have been using the Toolkit to provision a diverse set of HPC environments, from a simple auto-scaling environment with the Slurm scheduler to elaborate HPC clusters that serve diverse workloads for entire organizations. These range from environments that use Google Batch or Google Kubernetes Engine (GKE) to environments built on partner technology available on Google Cloud. For AI/ML workloads, we’ve used the Cloud HPC Toolkit to provision tailored environments with NVIDIA GPUs, ready to fine-tune and train AI/ML models.
The Cloud HPC Toolkit Blueprints Catalog
The Cloud HPC Toolkit Blueprints Catalog was built to make it easy for anyone to use a wide range of HPC environments as a starting point. For each blueprint, the catalog gives an easy-to-understand breakdown of the key information and technologies it uses, and it lets you filter the list down to a blueprint that meets your needs.
The Blueprints Catalog is a collection of pre-configured blueprints that provide everything needed to deploy common HPC workloads. These blueprints serve as templates that encode best practices and configurations for various HPC scenarios, simplifying the deployment process: the Cloud HPC Toolkit takes a blueprint as input and provisions the corresponding infrastructure in the cloud.
With the Cloud HPC Toolkit Blueprints Catalog, you can quickly deploy and configure HPC environments tailored to your specific needs, without manually setting up and configuring each component from scratch. The blueprints are also extensible for user-specific requirements, such as installing a specific piece of software. This not only saves time and effort but also reduces the risk of errors and inconsistencies in the deployment process.
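As an illustration of extending a catalog blueprint, the following Python sketch copies a base blueprint and appends one module that runs a software-installation script. It assumes the Toolkit's startup-script module is referenced as `modules/scripts/startup-script` and accepts a `runners` list with `type`, `destination`, and `content` fields. These names reflect our understanding and should be verified against the current module documentation. The `add_install_step` helper, the base blueprint, and the install command are all hypothetical.

```python
import copy
from typing import Any


def add_install_step(
    blueprint: dict[str, Any],
    group_name: str,
    module_id: str,
    commands: list[str],
) -> dict[str, Any]:
    """Hypothetical helper: return a copy of `blueprint` with one extra
    startup-script module appended to the named deployment group."""
    extended = copy.deepcopy(blueprint)  # leave the catalog blueprint untouched
    for group in extended["deployment_groups"]:
        if group["group"] == group_name:
            group["modules"].append(
                {
                    "id": module_id,
                    "source": "modules/scripts/startup-script",  # assumed path
                    "settings": {
                        "runners": [  # assumed runner fields
                            {
                                "type": "shell",
                                "destination": f"{module_id}.sh",
                                "content": "\n".join(["#!/bin/bash", *commands]),
                            }
                        ]
                    },
                }
            )
            return extended
    raise KeyError(f"no deployment group named {group_name!r}")


# Hypothetical base blueprint standing in for one taken from the catalog.
base = {
    "blueprint_name": "catalog-base",
    "vars": {"deployment_name": "demo"},
    "deployment_groups": [
        {"group": "primary", "modules": [{"id": "network1", "source": "modules/network/vpc"}]}
    ],
}
custom = add_install_step(base, "primary", "install-tools", ["echo 'install custom software here'"])
print([m["id"] for m in custom["deployment_groups"][0]["modules"]])  # → ['network1', 'install-tools']
print(len(base["deployment_groups"][0]["modules"]))  # → 1
```

Copying rather than mutating keeps the catalog blueprint reusable as a template. In practice you would more likely make the same change by editing the blueprint's YAML file directly.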
New blueprints for diverse use cases
The Cloud HPC Toolkit’s Blueprint Catalog now includes a set of use-case blueprints tailored to specific industries and applications, in addition to the general-purpose and partner-oriented blueprints. These blueprints provide a more streamlined, optimized starting point for deploying HPC workloads in specific domains.
The Blueprint Catalog includes blueprints for deploying clusters based on popular HPC schedulers, such as Slurm, HTCondor, and PBS Pro, which manage and allocate resources within an HPC cluster. The catalog also includes blueprints that feature common storage options, such as Filestore, Google Cloud Storage FUSE, and DDN EXAScaler. These blueprints provide a ready-to-use setup for popular HPC schedulers and storage solutions, helping you use resources efficiently and deploy simply and reliably.
As part of our HPC for Life Sciences solution, we provide pre-configured blueprints for running genomics and drug-discovery workloads, such as GROMACS. The Computer-Aided Engineering (CAE) solution (see diagram above) is tailored to running simulations and design-optimization tasks and includes blueprints for popular CAE applications, including Siemens Star-CCM+, OpenFOAM, and ANSYS Fluent. The Weather Forecasting solution is optimized for climate models and low-latency, tightly coupled workloads, and includes a blueprint for WRFV3. Each of these blueprints provides a pre-configured environment for running its application, including the necessary software dependencies and optimized settings.
For machine learning workloads, we support the latest GPU machine types and provide general-purpose, ML-enabled environments both on Compute Engine with Slurm, including support for GPUs and TPUs, and on Google Kubernetes Engine. We also provide a quantum computing simulation blueprint that uses QSim.
We’re continuously expanding the Blueprint Catalog to cover a growing range of HPC scenarios, so that users have access to the latest and most effective configurations for their workloads. This ongoing work reflects our commitment to keeping the Cloud HPC Toolkit comprehensive and up to date for HPC deployments.
Recent Cloud HPC Toolkit improvements
In addition to the new use-case blueprints, we continue to improve the Cloud HPC Toolkit itself. Recent releases have added new features and improvements, including:
- Support for H3 and A3 VMs
- Support for Shielded VMs
- Spack support, public build cache, and module redesign
- Improved error messages and handling
- Improved Chrome Remote Desktop support
- Support for the most recent DDN EXAScaler release
- Improved support for HTCondor
- Support for the most recent release of Slurm on Google Cloud
- Native Filestore and Google Cloud Storage support for GKE
In short, the Cloud HPC Toolkit is a powerful tool that can help organizations of all sizes deploy and manage HPC workloads on Google Cloud. The Toolkit's Blueprint Catalog, along with the recent enhancements, makes it easier to get started with HPC on Google Cloud.
You can read more about using the Cloud HPC Toolkit in the HPC Toolkit documentation, including our quickstart guides. You can start exploring the code by checking out our GitHub repo. We would love to hear how the Cloud HPC Toolkit is working for you through our support channels. You can read more about Google Cloud’s HPC solutions on our HPC Solution page, and contact us to learn more.