Thursday, September 11, 2025

Sustainable by design: Innovating for energy efficiency in AI, part 1


Learn more about how we're making progress toward our sustainability commitments through the Sustainable by design blog series, starting with Sustainable by design: Advancing the sustainability of AI.

Earlier this summer, my colleague Noelle Walsh published a blog detailing how we're working to conserve water in our datacenter operations, Sustainable by design: Transforming datacenter water efficiency, as part of our commitment to our sustainability goals of becoming carbon negative, water positive, and zero waste, and protecting biodiversity.

At Microsoft, we design, build, and operate cloud computing infrastructure spanning the whole stack, from datacenters to servers to custom silicon. This creates unique opportunities to orchestrate how the elements work together to enhance both performance and efficiency. We consider the work to optimize power and energy efficiency a critical path to meeting our pledge to be carbon negative by 2030, alongside our work to advance carbon-free electricity and carbon removal.

The rapid growth in demand for AI innovation to fuel the next frontiers of discovery has given us an opportunity to redesign our infrastructure systems, from datacenters to servers to silicon, with efficiency and sustainability at the forefront. In addition to sourcing carbon-free electricity, we're innovating at every level of the stack to reduce the energy intensity and power requirements of cloud and AI workloads. Even before the electrons enter our datacenters, our teams are focused on how to maximize the compute power we can generate from each kilowatt-hour (kWh) of electricity.

In this blog, I'd like to share some examples of how we're advancing the power and energy efficiency of AI. These include a whole-systems approach to efficiency and the application of AI, specifically machine learning, to the management of cloud and AI workloads.

Driving efficiency from datacenters to servers to silicon

Maximizing hardware utilization through smart workload management

True to our roots as a software company, one of the ways we drive power efficiency within our datacenters is through software that schedules workloads in real time, so we can maximize the utilization of existing hardware to meet cloud service demand. For example, we might see higher demand when people are starting their workday in one part of the world, and lower demand on the other side of the globe, where others are winding down for the evening. In many cases, we can align internal resource needs with that availability, such as running AI training workloads during off-peak hours on existing hardware that would otherwise be idle during that timeframe. This also helps us improve power utilization.
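
To make the off-peak idea concrete, here is a minimal sketch of placing a deferrable job into the hours with the most idle capacity. It assumes a known 24-hour demand forecast for a single cluster; `DeferrableJob`, `schedule_off_peak`, and the demand figures are hypothetical illustrations, not Microsoft's scheduler.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical illustration only; not Microsoft's production scheduler.


@dataclass
class DeferrableJob:
    """An internal job (for example, AI training) that can run at any hour."""
    name: str
    gpus_needed: int
    hours_needed: int


def schedule_off_peak(demand_by_hour: List[int], capacity: int,
                      job: DeferrableJob) -> List[int]:
    """Pick the hours (0-23) with the most idle GPUs that can fit the job.

    demand_by_hour: assumed forecast of GPUs used by customer-facing services.
    capacity: total GPUs in the (assumed) cluster.
    """
    # Idle GPUs per hour, paired with the hour index.
    idle = [(capacity - used, hour) for hour, used in enumerate(demand_by_hour)]
    # Keep only hours with enough spare GPUs for the whole job.
    fits = [(spare, hour) for spare, hour in idle if spare >= job.gpus_needed]
    if len(fits) < job.hours_needed:
        raise ValueError(f"not enough idle capacity for {job.name}")
    # Most spare capacity first (off-peak), earliest hour as a tie-breaker.
    fits.sort(key=lambda item: (-item[0], item[1]))
    return sorted(hour for _, hour in fits[:job.hours_needed])


# Hypothetical demand that peaks during the local workday (hours 9-16).
demand = [20] * 9 + [90] * 8 + [40] * 7
print(schedule_off_peak(demand, capacity=100,
                        job=DeferrableJob("nightly-training", 50, 6)))
# prints [0, 1, 2, 3, 4, 5]
```

A greedy choice of the emptiest hours is the simplest possible policy; a production scheduler would also weigh forecast uncertainty, job deadlines, and many competing jobs.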

We use the power of software to drive energy efficiency at every level of the infrastructure stack, from datacenters to servers to silicon.

Historically across the industry, executing AI and cloud computing workloads has relied on assigning central processing units (CPUs), graphics processing units (GPUs), and processing power to each team or workload, delivering CPU and GPU utilization rates of around 50% to 60%. This leaves some CPUs and GPUs with underutilized capacity that could ideally be harnessed for other workloads. To address the utilization challenge and improve workload management, we've transitioned Microsoft's AI training workloads into a single pool managed by a machine learning technology called Project Forge.
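
A toy model shows why pooling raises utilization: when each team is provisioned for its own peak, capacity sits idle whenever that team's demand dips, while a shared pool only needs to cover the peak of combined demand. The functions and the two-team demand series below are hypothetical and illustrate the principle only; they do not model Project Forge.

```python
from typing import Dict, List

# Toy model with hypothetical demand; it illustrates pooling, not Project Forge.


def dedicated_utilization(team_demand: Dict[str, List[int]]) -> float:
    """Each team owns enough GPUs for its own peak hour."""
    hours = len(next(iter(team_demand.values())))
    provisioned = sum(max(series) for series in team_demand.values())
    used = sum(sum(series) for series in team_demand.values())
    return used / (provisioned * hours)


def pooled_utilization(team_demand: Dict[str, List[int]]) -> float:
    """All teams share one pool sized for the peak of their combined demand."""
    combined = [sum(hour) for hour in zip(*team_demand.values())]
    return sum(combined) / (max(combined) * len(combined))


# GPUs busy per hour: the two hypothetical teams peak at different times.
demand = {"team_a": [8, 8, 2, 2], "team_b": [2, 2, 8, 8]}
print(f"dedicated: {dedicated_utilization(demand):.1%}")  # dedicated: 62.5%
print(f"pooled:    {pooled_utilization(demand):.1%}")     # pooled:    100.0%
```

The 100% result comes from perfectly complementary toy demand curves and should not be read as a realistic target.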

Project Forge's global scheduler uses machine learning to virtually schedule training and inferencing workloads so they can run during timeframes when hardware has available capacity, improving utilization rates to 80% to 90% at scale.

Currently in production across Microsoft services, this software uses AI to virtually schedule training and inferencing workloads, along with transparent checkpointing that saves a snapshot of an application or model's current state so it can be paused and restarted at any time. Whether running on partner silicon or on Microsoft's custom silicon such as Maia 100, Project Forge has consistently increased our efficiency across Azure to 80% to 90% utilization at scale.
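
Project Forge's checkpointing is transparent, meaning the workload does not have to save its own state. The simpler, application-level sketch below only illustrates the underlying idea: save a snapshot after each step so that a paused job resumes where it left off. The JSON file format, `train`, and the toy `weight` update are assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Optional

# Application-level checkpointing sketch; Project Forge does this transparently.


def save_checkpoint(path: str, state: dict) -> None:
    """Write the snapshot to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # a crash mid-write leaves the old snapshot intact


def load_checkpoint(path: str) -> dict:
    """Resume from the last snapshot, or start fresh if there is none."""
    if not os.path.exists(path):
        return {"step": 0, "weight": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, stop_after: Optional[int] = None) -> dict:
    """Toy training loop that a scheduler can pause and later restart."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        state["weight"] += 0.5  # stand-in for one optimizer update
        state["step"] += 1
        save_checkpoint(path, state)  # snapshot after every step
        steps_this_run += 1
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulate the scheduler preempting the job
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.json")
    print(train(ckpt, total_steps=10, stop_after=4))  # paused after 4 steps
    print(train(ckpt, total_steps=10))                # resumed to completion
```

Pausing after 4 steps and resuming ends in the same state as an uninterrupted 10-step run, which is what lets a scheduler preempt a job without discarding completed work.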

Safely harvesting unused power across our datacenter fleet

Another way we improve power efficiency is by placing workloads intelligently across a datacenter to safely harvest any unused power. Power harvesting refers to practices that enable us to maximize the use of our available power. For example, if a workload is not consuming the full amount of power allocated to it, that excess power can be borrowed by, or even reassigned to, other workloads. Since 2019, this work has recovered approximately 800 megawatts (MW) of electricity from existing datacenters, enough to power approximately 2.8 million miles driven by an electric car.1
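
The borrowing idea can be sketched as a simple reallocation pass: sum the unused portion of each workload's power allocation, hold back a safety margin, and grant the remainder to workloads that could use more. The `Workload` type, `harvest_and_lend`, and the safety-margin policy are hypothetical, and the sketch ignores the per-rack and per-circuit limits that a real system would also have to respect.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch of power harvesting; not Microsoft's implementation.


@dataclass
class Workload:
    name: str
    allocated_kw: float        # power budget assigned to the workload
    drawing_kw: float          # power it is actually drawing right now
    requested_kw: float = 0.0  # extra power it could put to use


def harvest_and_lend(workloads: List[Workload],
                     safety_margin: float = 0.2) -> Dict[str, float]:
    """Grant unused allocated power to workloads that request more.

    safety_margin (assumed policy): fraction of each donor's unused power held
    back in case the donor's own demand rises again.
    """
    pool_kw = sum(max(w.allocated_kw - w.drawing_kw, 0.0) * (1 - safety_margin)
                  for w in workloads)
    grants: Dict[str, float] = {}
    # Serve the largest requests first until the harvested pool runs out.
    for w in sorted(workloads, key=lambda item: item.requested_kw, reverse=True):
        if w.requested_kw <= 0 or pool_kw <= 0:
            continue
        grant = min(w.requested_kw, pool_kw)
        grants[w.name] = grant
        pool_kw -= grant
    return grants


fleet = [
    Workload("inference", allocated_kw=100.0, drawing_kw=50.0),
    Workload("training", allocated_kw=100.0, drawing_kw=100.0, requested_kw=30.0),
    Workload("batch", allocated_kw=50.0, drawing_kw=50.0, requested_kw=20.0),
]
print(harvest_and_lend(fleet))
```

Here the inference workload leaves 50 kW unused; after the assumed 20% margin, 40 kW is lent out, fully covering the training request (30 kW) and part of the batch request (10 kW).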

Over the past year, even as customer AI workloads have increased, our rate of improvement in power savings has doubled. We're continuing to implement these best practices across our datacenter fleet to recover and reallocate unused power without impacting performance or reliability.

Driving IT hardware efficiency through liquid cooling

In addition to power management of workloads, we're focused on reducing the energy and water requirements of cooling the chips and the servers that house them. The powerful processing of modern AI workloads generates more heat, and liquid-cooled servers require significantly less electricity for thermal management than air-cooled servers. The transition to liquid cooling also lets us get more performance out of our silicon, because the chips run more efficiently within an optimal temperature range.

A significant engineering challenge we faced in rolling out these solutions was how to retrofit existing datacenters designed for air-cooled servers to accommodate the latest advancements in liquid cooling. With custom solutions such as the "sidekick," a component that sits adjacent to a rack of servers and circulates fluid like a car radiator, we're bringing liquid cooling into existing datacenters, reducing the energy required for cooling while increasing rack density. This in turn increases the compute power we can generate from each square foot of our datacenters.

Learn more and explore resources for cloud and AI efficiency

Stay tuned to learn more on this topic, including how we're working to bring promising efficiency research out of the lab and into commercial operations. You can also read more about how we're advancing sustainability through our Sustainable by design blog series, starting with Sustainable by design: Advancing the sustainability of AI and Sustainable by design: Transforming datacenter water efficiency.

For architects, lead developers, and IT decision makers who want to learn more about cloud and AI efficiency, we recommend exploring the sustainability guidance in the Azure Well-Architected Framework. This documentation set aligns with the design principles of the Green Software Foundation and is designed to help customers plan for and meet evolving sustainability requirements and regulations around the development, deployment, and operation of IT capabilities.


1 Equivalency assumptions are based on estimates that an electric car can travel on average about 3.5 miles per kilowatt-hour (kWh) x 1 hour x 800 MW.
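
The footnote's equivalency can be checked with a few lines of arithmetic, using only the figures it states. Because 800 MW is a rate of power, the footnote multiplies by 1 hour to get an amount of energy (800,000 kWh) before converting that energy to miles.

```python
# Figures taken from the footnote above.
POWER_RECOVERED_MW = 800   # recovered power
DURATION_HOURS = 1         # sustained for one hour
EV_MILES_PER_KWH = 3.5     # average electric-car efficiency

energy_kwh = POWER_RECOVERED_MW * 1_000 * DURATION_HOURS  # 1 MW = 1,000 kW
ev_miles = energy_kwh * EV_MILES_PER_KWH

print(f"{energy_kwh:,} kWh -> {ev_miles:,.0f} miles")  # 800,000 kWh -> 2,800,000 miles
```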
