The demand for scalable and high-performance infrastructure continues to grow exponentially as the AI landscape advances. Our customers rely on Azure AI infrastructure to develop innovative AI-driven solutions, which is why we are delivering new cloud-based AI supercomputing clusters built with Azure ND H200 v5 series virtual machines (VMs) today. These VMs are now generally available and have been tailored to handle the growing complexity of advanced AI workloads, from foundational model training to generative inferencing. The scale, efficiency, and enhanced performance of our ND H200 v5 VMs are already driving adoption from customers and Microsoft AI services such as Azure Machine Learning and Azure OpenAI Service.
“We’re excited to adopt Azure’s new H200 VMs. We’ve seen that H200 offers improved performance with minimal porting effort, and we are looking forward to using these VMs to accelerate our research, improve the ChatGPT experience, and further our mission.” —Trevor Cai, head of infrastructure, OpenAI.
The Azure ND H200 v5 VMs are architected with Microsoft’s systems approach to enhance efficiency and performance, and feature eight NVIDIA H200 Tensor Core GPUs. Specifically, they address the gap created as GPUs grow in raw computational capability at a much faster rate than the attached memory and memory bandwidth. The Azure ND H200 v5 series VMs deliver a 76% increase in High Bandwidth Memory (HBM) to 141GB and a 43% increase in HBM bandwidth to 4.8 TB/s over the previous generation of Azure ND H100 v5 VMs. This increase in HBM bandwidth enables GPUs to access model parameters faster, helping reduce overall application latency, which is a critical metric for real-time applications such as interactive agents. The ND H200 v5 VMs can also accommodate more complex Large Language Models (LLMs) within the memory of a single VM, improving performance by helping users avoid the overhead of running distributed jobs over multiple VMs.
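As a quick sanity check after provisioning, you can confirm the GPU count and per-device HBM capacity from inside the VM. The following is a minimal sketch, assuming PyTorch with CUDA support is installed on an ND H200 v5 VM; the expected values (eight devices, roughly 141 GB each) come from the specifications above.

```python
# Minimal sketch: verify visible GPUs and per-GPU HBM capacity on an
# ND H200 v5 VM. Assumes PyTorch built with CUDA support is installed.
import torch

assert torch.cuda.is_available(), "No CUDA devices visible"
print(f"GPUs visible: {torch.cuda.device_count()}")  # expect 8 on ND H200 v5

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    hbm_gb = props.total_memory / 1e9
    print(f"GPU {i}: {props.name}, {hbm_gb:.0f} GB HBM")  # expect ~141 GB
```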
The design of our H200 supercomputing clusters also enables more efficient management of GPU memory for model weights, key-value cache, and batch sizes, all of which directly impact throughput, latency, and cost-efficiency in LLM-based generative AI inference workloads. With its larger HBM capacity, the ND H200 v5 VM can support higher batch sizes, driving better GPU utilization and throughput compared to the ND H100 v5 series for inference workloads on both small language models (SLMs) and LLMs. In early tests, we observed up to 35% higher throughput with ND H200 v5 VMs compared to the ND H100 v5 series for inference workloads running the Llama 3.1 405B model (with world size 8, input length 128, output length 8, and maximum batch sizes of 32 for H100 and 96 for H200). For more details on Azure’s high-performance computing benchmarks, please read more here or visit our AI Benchmarking Guide on the Azure GitHub repository.
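To see why the larger HBM translates into larger batch sizes, a rough back-of-the-envelope estimate helps. The sketch below uses assumed numbers for illustration (FP8 weights and an even split of weights across the eight GPUs); it is not the benchmark methodology used for the results above.

```python
# Back-of-the-envelope sketch of per-GPU memory headroom for the KV cache
# when serving Llama 3.1 405B across 8 GPUs (world size 8). The FP8 weight
# precision and the simple "weights / world size" split are illustrative
# assumptions, not the configuration used in the benchmark above.
PARAMS = 405e9          # Llama 3.1 405B parameter count
BYTES_PER_PARAM = 1     # assume FP8 weights
WORLD_SIZE = 8          # tensor-parallel degree across the 8 GPUs in one VM

weights_per_gpu_gb = PARAMS * BYTES_PER_PARAM / WORLD_SIZE / 1e9  # ~50.6 GB

for name, hbm_gb in [("H100", 80), ("H200", 141)]:
    headroom_gb = hbm_gb - weights_per_gpu_gb
    print(f"{name}: ~{headroom_gb:.0f} GB of HBM left for KV cache")
```

Under these assumptions, the H100 has roughly 29 GB of headroom per GPU versus roughly 90 GB for the H200, about three times as much, which is consistent with the threefold larger maximum batch size (96 versus 32) noted above.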
The ND H200 v5 VMs come pre-integrated with Azure Batch, Azure Kubernetes Service, Azure OpenAI Service, and Azure Machine Learning to help businesses get started right away. Please visit here for more detailed technical documentation of the new Azure ND H200 v5 VMs.
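For programmatic workflows, one way to check where the new VMs are offered is to query the sizes available in a region with the Azure SDK for Python. This is a minimal sketch, assuming the azure-identity and azure-mgmt-compute packages are installed and you are already authenticated (for example via `az login`); the region name and the "H200" substring filter are assumptions, so consult the ND H200 v5 documentation for the exact size names and supported regions.

```python
# Minimal sketch: list VM sizes in a region and filter for the ND H200 v5
# series. Region and the "H200" filter string are illustrative assumptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for size in client.virtual_machine_sizes.list(location="eastus"):
    if "H200" in size.name:
        print(size.name, f"{size.memory_in_mb / 1024:.0f} GiB RAM")
```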