Cisco, Nvidia expand collab to push Ethernet into AI clusters

At Cisco Live in Amsterdam on Tuesday, the enterprise networking giant announced a series of hardware and software platforms developed in collaboration with Nvidia and tailored to everyone's favourite buzzword these days: AI/ML.

A key focus of the collaboration is making AI systems easier to deploy and manage using standard Ethernet, something we're sure all those who've gone through the trouble of getting their CCNA and/or CCNP certificates will appreciate.

While the GPUs that power AI clusters tend to dominate the conversation, the high-performance, low-latency networks required to support them can be rather complex. While it's true that modern GPU nodes benefit heavily from speedy 200Gb/s, 400Gb/s, and soon 800Gb/s networking, this is only part of the equation, particularly when it comes to training. Because these workloads often have to be distributed across multiple servers containing four or eight GPUs apiece, any additional latency can lead to extended training times.
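To make the latency point concrete, here's a minimal multi-node training sketch using PyTorch's DistributedDataParallel. The node counts, rendezvous address, and stand-in model are our own illustrative assumptions, not anything Cisco or Nvidia ship; the detail that matters is that every training step ends in a gradient all-reduce over the network fabric.

```python
# Minimal multi-node data-parallel training sketch (PyTorch DDP).
# Hypothetical setup: two 4-GPU servers, launched on each node with
#   torchrun --nnodes=2 --nproc_per_node=4 --rdzv_backend=c10d \
#            --rdzv_endpoint=10.0.0.1:29500 train.py
# (the rendezvous address is an assumption for illustration)
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL handles the collectives, riding InfiniBand or RoCE/Ethernet
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in model
    ddp_model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 4096, device=f"cuda:{local_rank}")
        loss = ddp_model(x).square().mean()
        opt.zero_grad()
        loss.backward()   # gradient all-reduce across all 8 GPUs happens here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Because that all-reduce sits on the critical path of every step, any extra microseconds the fabric adds are paid on every single iteration of a run that may span weeks.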

Because of this, Nvidia's InfiniBand continues to dominate AI networking deployments. In a recent interview, Dell'Oro Group analyst Sameh Boujelbene estimated that about 90 percent of deployments are using Nvidia/Mellanox's InfiniBand, not Ethernet.

That's not to say Ethernet isn't gaining traction. Emerging technologies, like smartNICs and AI-optimized switch ASICs with deep packet buffers, have helped to curb packet loss, making Ethernet at least behave more like InfiniBand.
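Whether a job like the one above runs over InfiniBand or Ethernet is often a matter of transport configuration rather than code changes. As a rough sketch, NCCL (the collective library underneath most of these jobs) can be steered between the two fabrics with a few environment variables; the interface and HCA names below are assumptions for illustration.

```python
# Steering NCCL between InfiniBand and plain Ethernet without touching
# training code. Device names (mlx5_0, eth0) are illustrative only.
import os

fabric = "ethernet"  # or "infiniband"

if fabric == "infiniband":
    os.environ["NCCL_IB_DISABLE"] = "0"    # use the IB verbs transport
    os.environ["NCCL_IB_HCA"] = "mlx5_0"   # which HCA to bind (assumed name)
else:
    # For plain TCP/Ethernet, disable the IB transport and pin the
    # socket interface. (RoCE, by contrast, keeps the verbs transport
    # but runs it over Ethernet.)
    os.environ["NCCL_IB_DISABLE"] = "1"
    os.environ["NCCL_SOCKET_IFNAME"] = "eth0"  # assumed interface name

# These must be set before torch.distributed/NCCL initializes; from
# here, the DDP sketch above runs unchanged.
```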

For instance, Cisco's Silicon One G200 switch ASIC, which we looked at last summer, boasts a number of features beneficial to AI networks, including advanced congestion management, packet-spraying techniques, and link failover. But it's important to note these features aren't unique to Cisco, and Nvidia and Broadcom have both announced similarly capable switches in recent years.
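A toy simulation shows why packet spraying matters, without claiming to model any vendor's ASIC: conventional ECMP hashes each flow onto a single uplink, so a few heavy "elephant" flows, common in AI training traffic, can pile onto one link, while per-packet spraying spreads the same traffic almost evenly.

```python
# Toy comparison of ECMP flow hashing vs per-packet spraying across
# four uplinks. Illustrative only; real switch ASICs are more subtle.
import random
import zlib

LINKS = 4
random.seed(7)
# A handful of heavy "elephant" flows, measured in packets per flow.
flows = [random.randint(800, 1200) for _ in range(6)]

# ECMP: hash each flow's ID onto one link; the whole flow sticks to it.
ecmp = [0] * LINKS
for flow_id, packets in enumerate(flows):
    link = zlib.crc32(f"flow-{flow_id}".encode()) % LINKS
    ecmp[link] += packets

# Packet spraying: distribute every flow's packets across all links.
spray = [0] * LINKS
next_link = 0
for packets in flows:
    for _ in range(packets):
        spray[next_link] += 1
        next_link = (next_link + 1) % LINKS

print("ECMP per-link load:    ", ecmp, " max =", max(ecmp))
print("Spraying per-link load:", spray, " max =", max(spray))
# Hashing tends to land multiple elephants on one link, while spraying
# keeps the maximum per-link load close to the average.
```

The catch is that sprayed packets can arrive out of order, which is part of why the technique tends to be paired with the deep buffers and congestion management mentioned above.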

Dell'Oro predicts Ethernet's role in AI networks will capture about 20 points of revenue share by 2027. One of the reasons for this is the industry's familiarity with Ethernet. While AI deployments may still require specific tuning, enterprises already know how to deploy and manage Ethernet infrastructure.

This fact alone makes collaborations with networking vendors like Cisco an attractive prospect for Nvidia. While it may cut into sales of Nvidia's own InfiniBand or Spectrum Ethernet switches, the pay-off is the ability to put more GPUs into the hands of enterprises that might otherwise have balked at the prospect of deploying an entirely separate network stack.

Cisco plays the enterprise AI angle

To support these efforts, Cisco and Nvidia have rolled out reference designs and systems, which aim to ensure compatibility and help address knowledge gaps around deploying networking, storage, and compute infrastructure in support of AI workloads.

These reference designs target platforms that enterprises are likely to have already invested in, including kit from Pure Storage, NetApp, and Red Hat. Unsurprisingly, they also serve to push Cisco's GPU-accelerated systems. These include reference designs and automation scripts for applying its FlexPod and FlashStack frameworks to AI inferencing workloads. Inferencing, particularly on small domain-specific models, is expected by many to make up the bulk of enterprise AI deployments, since such models are relatively cheap to run and train.
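For a sense of what small, domain-specific inferencing looks like in practice, here's a minimal sketch using Hugging Face's transformers library; the model and task are our own example, not something prescribed by the CVDs.

```python
# Minimal small-model inferencing sketch using Hugging Face transformers.
# pip install transformers torch
# The model choice (~67M parameters) is illustrative: it fits easily on
# a single modest GPU, or even a CPU.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # first GPU; use device=-1 to fall back to CPU
)

tickets = [
    "The new switch firmware resolved our packet loss issues.",
    "Latency spiked again after the last maintenance window.",
]
for ticket, result in zip(tickets, classifier(tickets)):
    print(f"{result['label']:>8} ({result['score']:.2f})  {ticket}")
```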

The FlashStack AI Cisco Verified Design (CVD) is essentially a playbook for deploying Cisco's networking and GPU-accelerated UCS systems alongside Pure Storage's flash storage arrays. The FlexPod AI CVD, meanwhile, appears to follow a similar pattern, but swaps Pure for NetApp's storage platform. Cisco says these will be ready to roll out later this month, with more Nvidia-backed CVDs coming in the future.

Speaking of Cisco's UCS compute platform, the networking giant has also rolled out an edge-focused version of its X-Series blade systems, which can be equipped with Nvidia's latest GPUs.

The X Direct chassis features eight slots that can be populated with a combination of dual or quad-socket compute blades, or PCIe expansion nodes for GPU compute. Additional X-Fabric modules can also be used to expand the system's GPU capacity.

However, it's worth noting that unlike many of the GPU nodes we've seen from Supermicro, Dell, HPE, and others, which employ Nvidia's most powerful SXM modules, Cisco's UCS X Direct system only appears to support lower-TDP, PCIe-based GPUs.

According to the data sheet, each server can be equipped with up to six compact GPUs, or up to two dual-slot, full-length, full-height GPUs.

This will likely be limiting for those looking to run massive large language models that consume hundreds of gigabytes of GPU memory. However, it's probably more than adequate for running smaller inference workloads, or things like data preprocessing, at the edge.
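Some back-of-the-envelope arithmetic, using assumed GPU SKUs and FP16 weights, shows roughly where that ceiling sits.

```python
# Rough GPU memory math for the two X Direct configurations mentioned
# above. GPU SKUs and model sizes are assumptions for illustration.

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone (FP16 = 2 bytes/param);
    activations, KV cache, and framework overheads come on top."""
    return params_billion * 1e9 * bytes_per_param / 1e9

configs = {
    "6x compact GPU (assume 24 GB each, e.g. Nvidia L4)":   6 * 24,
    "2x dual-slot GPU (assume 48 GB each, e.g. Nvidia L40)": 2 * 48,
}
models = {"7B model": 7, "13B model": 13, "70B model": 70, "175B model": 175}

for cfg, vram in configs.items():
    print(f"{cfg}: {vram} GB total")
    for name, size in models.items():
        need = weights_gb(size)
        verdict = "fits (weights only)" if need <= vram else "does not fit"
        print(f"  {name}: ~{need:.0f} GB of FP16 weights -> {verdict}")
```

On those assumed numbers, even the six-GPU configuration tops out around 144 GB of aggregate memory, comfortable for models in the tens of billions of parameters but well short of the frontier-scale stuff.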

Cisco is targeting the platform at manufacturing, healthcare, and those running small datacenters. ®