Intel, AMD just created a headache for datacenters

Server silos didn't see today's watt-gobbling, space-heater chips coming


In pursuit of ever-higher compute density, chipmakers are juicing their chips with more and more power, and according to the Uptime Institute, this could spell trouble for many legacy datacenters ill-equipped to handle new, higher-wattage systems.

AMD's Epyc 4 Genoa server processors, announced late last year, and Intel's long-awaited fourth-gen Xeon Scalable silicon, released earlier this month, are the duo's most powerful and power-hungry chips to date, sucking down 400W and 350W respectively, at least at the upper end of the product stack.

The higher TDPs arrive in lockstep with higher core counts and clock speeds than either vendor's previous generation. It's now possible to cram up to 192 x86 cores into your typical 2U dual-socket system, something that just five years ago would have required at least three nodes.

However, as Uptime noted, many legacy datacenters were not designed to accommodate systems this power dense. A single dual-socket system from either vendor can easily exceed a kilowatt, and depending on the kinds of accelerators being deployed in these systems, boxen can consume well in excess of that figure.
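As a back-of-the-envelope illustration of how quickly a single box blows past a kilowatt (the component counts and wattages below are illustrative assumptions, not vendor specifications):

```python
# Rough power budget for a dual-socket, accelerator-equipped server.
# Every figure here is an illustrative assumption, not a vendor spec.
cpu_tdp_w = 400       # per-socket TDP at the top of the stack
sockets = 2
gpus = 4              # hypothetical accelerator count
gpu_w = 300           # assumed per-accelerator draw
other_w = 200         # rough allowance for memory, storage, NICs, fans

total_w = sockets * cpu_tdp_w + gpus * gpu_w + other_w
print(f"Estimated draw: {total_w} W ({total_w / 1000:.1f} kW)")
# -> Estimated draw: 2200 W (2.2 kW)
```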

The rapid trend towards hotter, more power-dense systems upends decades-old assumptions about datacenter capacity planning, according to Uptime, which added: "This trend will soon reach a point when it starts to destabilize existing facility design assumptions."

A typical rack remains under 10kW of design capacity, the analysts note. But with modern systems trending toward higher compute density and by extension power density, that's no longer adequate.

While Uptime notes that datacenter operators can optimize new builds for higher rack power densities, they still need to account for 10 to 15 years of headroom. As a result, operators must speculate about long-term power and cooling demands, which invites the risk of under- or over-building.

With that said, Uptime estimates that within a few years a quarter of a rack will reach 10kW of consumption. That works out to approximately 1kW per rack unit in a standard 42U rack.
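The per-unit arithmetic behind that estimate is simple; a minimal sketch, assuming a standard 42U rack:

```python
# Uptime's projection: a quarter of a rack drawing 10kW.
rack_units = 42
quarter_rack = rack_units / 4   # 10.5U
power_kw = 10.0

print(f"{power_kw / quarter_rack:.2f} kW per rack unit")
# -> 0.95 kW per rack unit, i.e. roughly 1kW per U
```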

Keeping cool

Powering these systems isn't the only challenge facing datacenter operators. All computers are essentially space heaters that convert electricity into computational work, with thermal energy as the byproduct.

According to Uptime, high-performance computing applications offer a glimpse of the thermal challenges to come for more mainstream parts. One of the bigger challenges is substantially lower case temperatures compared to prior generations: these have fallen from 80C-82C just a few years ago to as low as 55C for a growing number of models.

"This is a key problem: removing greater volumes of lower-temperature heat is thermodynamically challenging," the analysts wrote. "Many 'legacy' facilities are limited in their ability to supply the necessary airflow to cool high-density IT."

To mitigate this, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) has issued revised operating recommendations [PDF] for datacenters, including provisions for dedicated low-temperature areas.

Liquid cooling has also gained considerable attention as chips have grown ever hotter. During the Supercomputing Conference last year, we took a deeper dive into the various technologies available to cool emerging systems.

But while these technologies have matured in recent years, Uptime notes they still suffer from a general lack of standardization, "raising fears of vendor lock-in and supply chain constraints for key parts as well as reduced choice in server configurations."

Efforts to remedy these challenges have been underway for years: Intel and the Open Compute Project are both working on liquid and immersion cooling reference designs to improve compatibility across vendors.

Early last year, Intel announced a $700 million "mega lab" that would oversee the development of immersion and liquid cooling standards. Meanwhile, OCP's advanced cooling solutions sub-project has been working on the problem since 2018.

Despite these challenges, Uptime notes that the flux in datacenter technologies also opens doors for operators to get a leg up on their competition, if they're willing to take the risk.

Power is getting more expensive

And there may be good reason to do just that, according to Uptime's research, which shows that energy prices are expected to continue their upward trajectory over the next few years.

"Power prices were on an upward trajectory before Russias' invasion of Ukraine. Wholesale forward prices for electricity were already shutting up — in both the European and US markets — in 2021," Uptime noted.

While not directly addressed in the institute's report, it's no secret that direct liquid cooling and immersion cooling can achieve considerably lower power usage effectiveness (PUE) than air cooling. The metric is the ratio of a facility's total power draw to the power consumed by its compute, storage, and networking equipment. The closer the PUE is to 1.0, the more efficient the facility.

Immersion cooling has among the lowest PUE ratings of any thermal management regime. Vendors like Submer often claim efficiency ratings as low as 1.03.
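As a rough illustration of how the metric works, a minimal sketch of the PUE calculation; the facility figures are invented to reproduce a 1.03 ratio:

```python
# PUE = total facility power / IT equipment power.
# The figures below are hypothetical, chosen to match a 1.03 ratio.
it_load_kw = 1000.0   # compute, storage, networking gear
overhead_kw = 30.0    # cooling, power conversion losses, lighting

pue = (it_load_kw + overhead_kw) / it_load_kw
print(f"PUE: {pue:.2f}")  # -> PUE: 1.03
```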

The cost of electricity isn't the only concern facing datacenter operators, Uptime analysts noted. They also face regulatory and environmental hurdles from municipalities concerned about the space and power consumption of neighboring datacenter operations.

The European Commission is expected to adopt new regulations under the Energy Efficiency Directive which, Uptime says, will force datacenters to reduce both energy consumption and carbon emissions. Similar regulation has been floated stateside: most recently, a bill was introduced in the Oregon assembly that would require datacenters and cryptocoin mining operations to curb carbon emissions or face fines.

Uptime expects the opportunities for efficiency gains to become more evident as these regulations force regular reporting of power consumption and carbon emissions.

"Every watt saved by IT reduces pressures elsewhere," the analysts wrote. "Reporting requirements will sooner or later shed light on the vast potential for greater energy efficiency currently hidden in IT." ®
