Evolving the Data Center to Enable AI Computing


The modern data center is undergoing a major transformation. With the exponential rise of artificial intelligence (AI) adoption across industries and enterprises, the traditional data center must adapt its physical infrastructure to meet the demands of powering AI applications and technologies inside the building envelope.

During a panel discussion at this year’s Compu Dynamics Thought Leadership Summit, NVIDIA’s Andrew Miller explained that a data center’s physical infrastructure is a critical element in enabling AI computing within the white space. “…data centers, and all of the infrastructure that goes into [them]…[is] supporting those workloads just the same way that data centers have supported traditional computing for years,” he said.

However, the differences between AI computing and traditional computing will require a radical change to the physical environment. AI workloads inherently draw more power, produce more thermal energy, and require that compute resources be densely packed close together. Unfortunately, there is no go-to guidebook for retrofitting traditional data centers – or even for designing and constructing new AI data centers.

“There is no reference architecture,” CoreWeave’s Jim Julson said at the Summit. “All of this has a cost, and the cost is infrastructure. It’s extremely challenging where we’re at today, because we’re kind of building the wings on the plane while it’s plummeting to the earth.”


“The basic construction of the fiber has to change…Everything is now smaller and more dense so that we can support all the interconnects. From a basic connectivity perspective, that’s the major impact on fiber infrastructure.”
– Nathan Benton, viaPhoton

Accommodating AI workloads requires maximizing every inch of a data center’s physical space. Every aspect, from ceiling support for hanging loads to a cable’s fiber density, must be carefully considered to make the most of that space and create an environment that can fully support operations.

One major aspect of the white space where AI workloads require a change in infrastructure is fiber cable density. AI data centers can require thousands of fiber connections in a single panel – far more, and far denser, than traditional data centers have ever needed. “It’s all about getting as much connectivity to every cabinet as possible,” said viaPhoton’s Nathan Benton.

Benton explained that the fiber cabling found in traditional data centers still employs LC and MPO connectors, which cannot support the fiber densities that AI workloads demand. According to Benton, data centers will have to swap out these legacy connectors for smaller form factor connectors to support the thousands of fiber connections. But it is not just the connectors that will have to be reconsidered; the design of the fiber itself will have to be reimagined as well.

As Miller noted, a single rack in a traditional data center typically requires eight to 16 fibers to uplink to the network. He reported that the current generation of AI workloads requires around 48 fibers per rack, if not more. “It’s just going to keep growing as the speeds continue to grow and as the rack densities continue to increase – just so that we can get enough network connectivity into that footprint.”
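To put those fiber counts in context, here is a minimal back-of-the-envelope sketch in Python. The link counts and fibers-per-link figures below are illustrative assumptions chosen to be consistent with the numbers quoted above, not specifications from any particular deployment.

```python
# Rough per-rack fiber-count arithmetic. The values below are
# illustrative assumptions, not vendor specifications.

def rack_fibers(links_per_rack: int, fibers_per_link: int) -> int:
    """Total fiber strands a rack needs for its network uplinks."""
    return links_per_rack * fibers_per_link

# Traditional rack: a handful of duplex (2-fiber) uplinks.
traditional = rack_fibers(links_per_rack=6, fibers_per_link=2)  # 12, within the 8-16 range cited

# AI rack: assume parallel optics (e.g., 8-fiber MPO-style links),
# with several such links landing in each cabinet.
ai = rack_fibers(links_per_rack=6, fibers_per_link=8)  # 48, matching the figure cited

print(f"Traditional rack: ~{traditional} fibers")
print(f"AI rack:          ~{ai} fibers")
```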

According to Benton, the construction of the fiber cable and connectors is going to be a key factor in accommodating higher fiber densities. “The basic construction of the fiber has to change,” said Benton. “We’re trying to make the cable smaller because it all goes up in the cable conveyance between racks. Everything is now smaller and more dense so that we can support all the interconnects. From a basic connectivity perspective, that’s the major impact on fiber infrastructure.”


“For AI to be successful at scale, there needs to be tighter integration between the data centers themselves and the facilities, hardware, and software that’s actually running on top of them.”
– Andrew Miller, NVIDIA

But smaller form factors and denser fiber cables are just one challenge in an AI data center. Power requirements increase drastically with AI workloads, and that increase in power brings a corresponding increase in thermal load. “The power and thermal requirements that are coming down the pipe…are just continuing to increase,” said Miller. “When I started, a 5 kW rack was pretty dense. Now we’re talking 20 to 40 kW racks. In the next generation, we’re talking multiples of that.”
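A rough sense of what those rack densities mean for cooling comes from the standard conversion between electrical load and refrigeration capacity: essentially all power drawn by IT equipment becomes heat that must be removed. The sketch below uses the rack densities Miller cites; the 120 kW figure is a hypothetical stand-in for the “multiples of that” he describes.

```python
# Rack power -> cooling load, using the standard refrigeration
# conversion (1 ton of cooling removes ~3.517 kW of heat).

KW_PER_TON = 3.517

def cooling_tons(rack_kw: float) -> float:
    """Cooling capacity (tons of refrigeration) needed to remove a rack's heat."""
    return rack_kw / KW_PER_TON

# 5 kW legacy, 20-40 kW current, and 120 kW as a hypothetical
# stand-in for the "multiples of that" next generation.
for rack_kw in (5, 20, 40, 120):
    print(f"{rack_kw:>4} kW rack -> ~{cooling_tons(rack_kw):.1f} tons of cooling")
```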

But cooling the data center to meet those thermal requirements produces a Catch-22 for owners and operators: the sheer power required to run AI workloads can create voltage sags that affect the cooling systems, and a faltering cooling system can completely disrupt data center operations through overheating and malfunction.

“A lot more coordination needs to be brought into this entire infrastructure stack,” said Miller. “For AI to be successful at scale, there needs to be tighter integration between the data centers themselves and the facilities, hardware, and software that’s actually running on top of them.”

Miller explained that a holistic approach to AI data center design, one that views the entire facility as a single ecosystem, would yield infrastructure better able to accommodate power, thermal, and cooling requirements.

“Every facet of every side of data center infrastructure, electrical and mechanical systems, structured cabling, networking, hardware, firmware…is all critical to delivering performance,” said Miller.

To learn more about how AI and ML are influencing the data center industry, click here to download a complimentary copy of the eBook, “The Impact of AI: How the Latest Tech Obsession is Changing the Data Center as We Know It.”
