Accelerating industry-wide innovations in datacenter infrastructure and security

Microsoft drives innovation and contributes to the broader AI and datacenter community, benefiting the entire industry.

To provide the cloud infrastructure necessary to deliver in the era of AI, accelerated technological transformation has never been more important than it is today. To deliver for our customers while moving innovation forward, we can learn from technological shifts of the past and recognize the critical role of community-led innovation and industry standardization. For the past decade, Microsoft has driven this kind of deep collaboration through cross-industry organizations like Open Compute Project (OCP). As a result, we continue to advance hardware innovation at every layer of the computing stack, from server and rack architecture, networking and storage, and reliability, availability, and serviceability (RAS) designs to new supply chain appraisal frameworks that ensure security,1 sustainability,2 and reliability3 across the cloud value chain.

As we continue to innovate in the era of AI, we are excited to return to the OCP Global Summit this year with more contributions to support ecosystem innovation, from new power and cooling solutions that address the changing profile of AI datacenters to new hardware security frameworks that support trust and resiliency at the core of our infrastructure for accelerated computing.

Evolving datacenter cooling with modular systems designed for global deployability

As AI demands grow, we are reimagining our datacenters with a focus on increasing rack density and enhancing cooling efficiency. Last fall, when we announced the Azure Maia 100 system, we also introduced a dedicated liquid cooling “sidekick,” a closed-loop design that uses recirculated fluid to reduce heat. We’ve continued down the path of cooling innovation since then, working with partners to develop new datacenter cooling techniques that can solve for growing AI power profiles while addressing ease of deployability. We’re pleased to be contributing the designs for an advanced liquid cooling heat exchanger unit to OCP so that the entire community can benefit from learnings in liquid cooling and keep the pace of innovation to accommodate rapidly evolving AI systems. For more information, read the Tech Community blog.

Disaggregated power architectures for next-generation systems

The evolution of AI systems has also driven increased power densities in hyperscale datacenters. As these systems grow, we have uncovered new opportunities for flexibility and modularity in system design. While compute and storage systems for cloud typically have power density below 20 kW, AI systems have driven power densities to hundreds of kW. We are solving the increased power infrastructure demands in the age of AI with Mt. Diablo, our latest collaboration with Meta. This is a new disaggregated rack design to address critical space and power constraints. The solution features a disaggregated 400 VDC (high-voltage direct current) unit that scales from hundreds of kW up to 1 MW, enabling 15% to 35% more AI accelerators in each server rack. This modular approach allows for power adjustments in the disaggregated power rack to meet the changing demands of different inferencing and training SKUs. We are excited to continue our engineering collaboration with Meta on this contribution to the OCP community. Read the Tech Community blog to learn more.
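
To make the scale of these figures concrete, here is a rough back-of-the-envelope sketch in Python. It applies the ideal DC relation I = P / V to a 400 V bus and scales a rack's accelerator count by the quoted 15% to 35%. The function names and the baseline of 64 accelerators per rack are hypothetical, chosen only for illustration, and the calculation ignores conversion losses; it does not describe the Mt. Diablo design itself.

```python
def bus_current_amps(power_watts: float, bus_volts: float = 400.0) -> float:
    """Ideal DC current drawn at a given power: I = P / V (losses ignored)."""
    if bus_volts <= 0:
        raise ValueError("bus voltage must be positive")
    return power_watts / bus_volts


def accelerators_with_uplift(baseline: int, uplift_percent: int) -> int:
    """Accelerator count after a percentage uplift, rounded down."""
    return baseline * (100 + uplift_percent) // 100


if __name__ == "__main__":
    # Rack power levels mentioned in the text: below 20 kW, hundreds of kW, 1 MW.
    for kilowatts in (20, 200, 1000):
        amps = bus_current_amps(kilowatts * 1000)
        print(f"{kilowatts:>5} kW at 400 V -> {amps:,.0f} A")

    baseline = 64  # hypothetical accelerators per rack, not from the article
    for pct in (15, 35):
        print(f"+{pct}% of {baseline} -> {accelerators_with_uplift(baseline, pct)}")
```

Under these idealized assumptions, a 1 MW load on a 400 V bus draws 2,500 A, compared with 50 A for a 20 kW load at the same voltage.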

Advancing a secure AI future with new confidential computing solutions

Last month, Microsoft detailed our vision for Trustworthy AI and Azure Confidential Inferencing, where security is rooted in hardware-based Trusted Execution Environments (TEEs) and transparency of the Confidential Trust Boundary. Today, we expand on this vision with new open-source silicon innovation: the Adams Bridge quantum resilient accelerator and its integration into Caliptra 2.0, the next generation open-source silicon root of trust (RoT).

The growing capabilities of quantum computers present challenges to hardware security, as classical asymmetric cryptographic algorithms used pervasively throughout hardware security can be easily defeated by a powerful enough quantum computer. In recognizing this risk, the National Institute of Standards and Technology (NIST) has published standards for the new quantum resilient algorithms.

These new quantum resilient algorithms are significantly different from their classical counterparts. Hardware device manufacturers need to pay immediate attention to these changes as they impact foundational hardware security capabilities such as immutable root-of-trust anchors for both code integrity and hardware identity. Currently, the challenges facing silicon components are more significant than for software, due to longer development times and the immutability of hardware. Therefore, immediate action is needed for new hardware designs.

As part of Microsoft’s commitment to our Secure Future Initiative (SFI), and to accelerate the adoption of quantum resilient algorithms, Microsoft and the Caliptra consortium are open-sourcing Adams Bridge, a new silicon block for accelerating quantum resilient cryptography. For more information about Adams Bridge, and how we make our future quantum safe, please visit the Tech Community blog.

In addition to Caliptra 2.0 and Adams Bridge, Microsoft is taking further steps to advance security in hardware supply chains with the OCP-SAFE (OCP Security Appraisal Framework and Enablement) initiative. Co-founded by Microsoft, OCP-SAFE calls for systematic and consistent security audits on hardware and firmware. Combined with Caliptra, OCP-SAFE advances transparency and security assurance on the path towards hardware Supply Chain Integrity, Transparency, and Trust (SCITT). Read the Tech Community blog for more information.

Bottlenecks to breakthroughs: Optimizations at every layer in the era of AI

For the past few years, Microsoft has been on this journey to expand our supercomputing scale, enabling individuals and organizations all over the world to reap the benefits of generative AI across domains, from education to healthcare to business and beyond. Along the way, we’ve continued to evolve and enhance our infrastructure, building some of the world’s largest supercomputers with our growing fleet of high-performance accelerators for AI workloads of all shapes and sizes. As we’ve encountered increasing demands for AI innovation, we’ve unlocked performance improvements and efficiencies through system-level optimizations, many of which have been contributed back to the open-source community.

Through the development of our own custom silicon and system with Azure Maia, we’ve invested in performance-per-watt efficiency through algorithmic codesign of hardware and software. We invested in low-precision math to achieve this through an early implementation of the MX data format, a standard we contributed to OCP through our support of the Microscaling (MX) Alliance together with AMD, Arm, Intel, Meta, NVIDIA, and Qualcomm.
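
As a rough illustration of the idea behind block-scaled low-precision formats, the Python sketch below quantizes a small block of values to 8-bit integers that share a single power-of-two scale. This is a simplified stand-in, not the MX specification's actual encoding: the function names and the choice of signed 8-bit elements are assumptions made for the example.

```python
import math


def quantize_block(values, bits=8):
    """Quantize one block to signed integers that share one power-of-two scale.

    Returns (exponent, ints) where each value is approximated by int * 2**exponent.
    Illustrative only; this is not the MX wire format.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit elements
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    # Smallest exponent whose scale keeps the largest magnitude within qmax.
    exponent = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** exponent
    # Clamp defensively in case of floating-point edge cases.
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return exponent, ints


def dequantize_block(exponent, ints):
    """Reconstruct approximate values from the shared exponent and integers."""
    scale = 2.0 ** exponent
    return [i * scale for i in ints]


if __name__ == "__main__":
    block = [0.5, -1.0, 0.25, 2.0]
    exponent, ints = quantize_block(block)
    print("shared exponent:", exponent)
    print("integers:", ints)
    print("reconstructed:", dequantize_block(exponent, ints))
```

Sharing one scale across a block means each element only needs to store a small integer, which is the general source of the storage and bandwidth savings in block-scaled formats.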

Next, we tackled the challenge of scaling and wide deployment with our liquid-cooled server design. This innovation ensures that our datacenters worldwide can utilize this technology, and we are contributing the design to the industry to enable broader adoption.

Finally, we recognized that traditional Ethernet was not built for AI performance and scaling. By making significant contributions to the Ultra Ethernet Consortium (UEC), we have extended Ethernet into a fabric capable of delivering the necessary performance, scalability, and reliability for AI applications.

Through these efforts, Microsoft continues to drive innovation and contribute to the broader AI and datacenter community, ensuring that our advancements benefit the entire industry.

We invite attendees of this year’s OCP Global Summit to visit Microsoft at booth #B35 to explore our latest cloud hardware demonstrations featuring contributions with partners in the OCP community.

Connect with Microsoft at the OCP Global Summit 2024 and beyond:


1. Delivering consistency and transparency for cloud hardware security, Rani Borkar. October 18, 2022.

2. Learn how Microsoft Azure is accelerating hardware innovations for a sustainable future, Zaid Kahn. November 9, 2021.

3. Fostering AI infrastructure advancements through standardization, Rani Borkar and Reynold D’Sa. October 17, 2023.
