Building the AI factory grid of tomorrow

December 1, 2025 | Sunil Sadani, GVP, Product Management, Routing Infrastructure Solutions, HPE

HPE and NVIDIA reveal a new blueprint for connecting the AI Factory Grid with intelligent high-speed networking

In this article
  • NVIDIA and HPE advance the AI Factory Grid vision by combining intelligent AI fabric with high-scale routing to interconnect next-generation AI factories
  • NVIDIA Spectrum-X and ConnectX-8 SuperNICs enable low-latency, RDMA-aware interconnects that scale AI workloads across distributed factories into a grid
  • HPE Juniper PTX platforms provide secure, long-distance multi-cloud transport that seamlessly interconnects AI factories into a grid, while HPE Juniper MX platforms deliver a secure, high-scale, multi-tenant onramp to the AI Factory Grid

The artificial intelligence (AI) landscape is evolving at an unprecedented pace, driving rapid adoption across industries. To support this transformation, existing data centers—originally built to handle generic workloads—must undergo a fundamental shift. The emergence of AI workloads demands specialized, high-performance infrastructure capable of supporting the full AI lifecycle, from data ingestion and model training to optimization and real-time inference.

Enter the AI factory: a next-generation, mega/giga-scale data center purpose-built for AI. These AI factories serve as centralized hubs, integrating compute, storage, and networking resources to deliver scalable, high-speed performance for complex AI tasks. But as these hubs hit physical limits in power, cooling, and space, the AI factories must be connected together. This interconnected network of AI factories, a.k.a. the AI Factory Grid, provides a unified, optimized environment designed to overcome scalability challenges. The AI Factory Grid enables seamless deployment of AI solutions on a truly global scale, unlocking new levels of performance, efficiency, and innovation. It demands a new kind of network: one that is intelligent, secure, and scalable across geographies.

The recent announcement of the seamless integration of NVIDIA's AI fabric with HPE Juniper Networking's high-speed data-center interconnect and high-scale AI on-ramp solutions made me reflect on how the two industry leaders, each bringing an essential element of the AI factory, have embarked on a shared mission to build the AI factories of tomorrow. Whether it is a passion for innovation, a rich history of custom silicon design, or experience building today's (and tomorrow's) large-scale, complex interconnected systems, the common traits in their DNA bind them in more ways than one.

Today we explore how the collaboration unlocks a unified architecture for building the AI Factory Grid. That brings us to a pertinent question.

Why the AI Factory needs to scale across, not just up or out

The most common strategy for scaling is two-dimensional: scale up (more powerful servers and bigger, densely integrated vertical systems) and scale out (more servers). That is no longer sufficient. The NVIDIA Spectrum-XGS Ethernet solution, which combines NVIDIA Spectrum-X Ethernet switches and NVIDIA ConnectX-8 SuperNICs, introduces a third dimension: scale across. This dimension interconnects multiple geographically distributed AI factories into the AI Factory Grid, enabling massive training and inference workloads to operate as one.

Inside an AI factory are racks of the latest NVIDIA AI infrastructure connected through ConnectX-8 SuperNICs. Together with NCCL optimizations that support interconnecting clusters over longer distances, the racks are linked by a low-latency Ethernet fabric built on NVIDIA Spectrum-X Ethernet switches, purpose-built for AI workloads. Because the Spectrum-X switches are RDMA-aware, they can switch traffic across the fabric for optimized GPU-to-GPU communication. This is the first building block of the AI Factory Grid.
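As a rough illustration of what "NCCL optimization over larger distances" can mean in practice: longer links have a larger round-trip time and bandwidth-delay product, so collectives are typically tuned with longer RDMA timeouts and bigger buffers. The variable names below are documented NCCL environment variables, but the values are illustrative assumptions, not an NVIDIA-published profile:

```shell
# Illustrative NCCL tuning for clusters linked over longer distances.
# Variable names are real NCCL knobs; values are assumptions for illustration.
export NCCL_SOCKET_IFNAME=eth0        # out-of-band bootstrap interface (site-specific)
export NCCL_IB_HCA=mlx5               # select the RDMA-capable NICs
export NCCL_IB_TIMEOUT=22             # longer transport timeout to tolerate higher RTT
export NCCL_IB_QPS_PER_CONNECTION=4   # spread each connection across queue pairs
export NCCL_BUFFSIZE=8388608          # 8 MiB buffers to cover a larger bandwidth-delay product
export NCCL_DEBUG=INFO                # log transport/topology choices for verification
```

In a real deployment these values would be derived from measured link latency and validated with collective benchmarks rather than set by hand.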

Connecting AI Factories together

An AI Factory Grid addresses the problem of unifying AI clusters across multiple AI factories that may be tens or even hundreds of kilometers apart. In some cases, this requires purpose-built systems that can securely interconnect the AI factories across a metro area or between cities, using coherent optics to spatially reuse expensive resources like optical fiber. Moreover, the infrastructure of most leading AI providers spans more than one cloud, and multi-cloud interconnection requires support for more complex WAN networking protocols at large scale. This creates the need for routing platforms that provide massive interconnection bandwidth, scale, and port density. Combined with the ability to power coherent optics at 400G/800G speeds, these systems must offer zero trade-offs in port density (an often-cited drawback of legacy systems).

Next-generation, high-performance, energy-efficient routing platforms like the HPE Juniper PTX series routers (PTX10002, PTX10004, and PTX10008), built on the latest-generation Express 5 custom silicon, have been designed to address these challenges. These deep-buffer routing platforms support the latest routing protocols for reliable IP/MPLS transport, carrying elephant flows between these destinations. With support for 800G ZR+ coherent optics for long-distance, high-speed links, and without any trade-offs in port density or performance, these routers bring down TCO by up to 45%. Security for data in transit, with encryption at line rate, is of utmost importance: all 800G-capable PTX10000 series routers support line-rate MACsec at 800GE speeds for secure WAN connections over dark fiber or IP over DWDM. Deployed at scale by hyperscalers, neoclouds, CSPs, and large enterprises alike for data-center WAN, these platforms are the foundation of interconnection across AI factories.
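To make the MACsec piece concrete, the Junos-style fragment below sketches a static-CAK connectivity association applied to a WAN-facing port. This is a hedged illustration, not configuration from the article: the interface name and key placeholders are invented, and exact syntax and platform support should be verified against Juniper documentation.

```
# Illustrative Junos set-style MACsec configuration (static CAK mode).
# Interface name and key material are placeholders.
set security macsec connectivity-association wan-ca security-mode static-cak
set security macsec connectivity-association wan-ca pre-shared-key ckn <hex-ckn>
set security macsec connectivity-association wan-ca pre-shared-key cak <hex-cak>
set security macsec interfaces et-0/0/0 connectivity-association wan-ca
```

Because MACsec operates hop-by-hop at Layer 2, this model fits the dark-fiber and IP-over-DWDM links described above, where both ends of the encrypted segment are routers under the operator's control.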

Connecting to the AI Factory Grid

Growing AI consumption is an equally important aspect of the AI Factory Grid. With increasing adoption of AI, up to 96% of enterprises across different verticals plan to double or quadruple their connection bandwidth to the cloud. Onboarding thousands of customers, tens of thousands of endpoints, and hundreds of thousands of tunnels that point directly to workload hosts is a daunting challenge. This demands an intelligent edge for customer on-ramp to an AI Factory Grid.

Onboarding customers requires segmentation between multiple tenants, dynamic load balancing of customers across different AI factories based on workload placement, agility to support different interconnection speeds, and flexibility to support complex routing protocols with reliable IP/MPLS transport. It is equally important to provide a choice of overlays and secure underlays using MACsec or IPsec, based on the choice of connectivity and transport.

The HPE Juniper MX series routers (MX301, MX304, and the modular MX10000), powered by fully programmable, custom-built Trio 6 silicon, provide industry-leading logical scale for multi-tenant connectivity. The programmable networking silicon supports line-rate MACsec, inline IPsec, and a feature-rich routing protocol stack. With unparalleled logical scale supporting tens of thousands of customers and millions of tunnels, these routers eliminate the need for intermediate gateways, providing express connectivity to workload hosts and cutting the cost of additional sites and devices by over 25%. Unique capabilities such as arbitrary header match, combined with traffic-engineering capabilities, allow steering of network traffic that is aware of workload placement. A testimony to the flexibility and scale of these platforms is that they are already proven in some of the largest hyperscalers as virtual private cloud gateways, as well as with service providers that offer direct connectivity to the cloud. Ideally suited for multicloud routing, the MX series platforms are the final building blocks that complete the AI Factory Grid networking stack.
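The standard building block for this kind of tenant segmentation on IP/MPLS edge routers is a per-tenant VRF. The Junos-style fragment below is a minimal sketch of one tenant instance; the tenant name, interface, and AS/community numbers are placeholders chosen for illustration, and a production onramp would layer policy, QoS, and secure underlay configuration on top.

```
# Illustrative Junos set-style per-tenant VRF (placeholders throughout).
set routing-instances tenant-a instance-type vrf
set routing-instances tenant-a interface et-0/0/1.100
set routing-instances tenant-a route-distinguisher 65000:100
set routing-instances tenant-a vrf-target target:65000:100
set routing-instances tenant-a vrf-table-label
```

Each onboarded customer gets its own routing instance with a distinct route distinguisher and route target, which is what keeps thousands of tenants' routes and tunnels isolated while sharing the same physical onramp.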


The 3-2-1 Integration

The seamless integration of 3 high-performance silicon designs (NVIDIA Spectrum-X Ethernet, HPE Juniper Express 5, and Trio 6) from 2 industry leaders delivers 1 solution: a highly scalable, agile network fabric interconnecting several AI factories into a Factory Grid. That is what makes this partnership formidable.

Summary

The future of AI infrastructure lies in distributed, interconnected AI factories. By combining NVIDIA’s intelligent, low-latency fabric with Juniper’s robust, long-distance transport and enterprise edge solutions, organizations can build scalable, secure, and high-performance AI Factory Grids. This hybrid model isn’t just a technical solution—it’s a strategic imperative for any enterprise or cloud provider looking to lead in the AI era.
