Key Idea
Use a HyperX topology for photonic wafer-scale interposer NoCs.
Wafer-scale integration (WSI) has grown in popularity over the past few years, with a number of players (Cerebras, Tesla Dojo, UCLA NanoCAD, etc.) building wafer-scale systems.
An alternative to electrical WSI is photonic (optical) WSI, which promises lower latency and energy for long-distance connections across the wafer.
However, no good network topology has yet been proposed for such implementations.
Background on Photonic WSI
Photonic WSI approaches ([1], [2], [3]) use on-chip waveguides and reticle stitching in order to achieve wafer-scale optical interconnects.
A variety of technology platforms are possible for light generation, modulation and waveguides (e.g. external lasers, micro-ring resonators and silicon-on-insulator waveguides, respectively).
However, there is still no settled network topology choice for this use case.
Figure from [3].
The figure from [3] shows two possible options, a fully connected mesh and a 2-D torus, neither of which is ideal. The fully connected mesh consumes excessive routing resources and quickly stops scaling as the node count grows.
The 2-D torus fails to take advantage of the near-zero difference in cost and latency between long-haul and short-haul connections in photonic WSI.
Amusingly, this mirrors the evolution of HPC networking topologies, where tori were preferred (due to short-haul copper cables) until long-haul optical cables became cheaper, which motivated the creation of, and migration to, topologies like Dragonfly or Megafly.
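As a rough illustration of the trade-off between the fully connected mesh and the 2-D torus, the sketch below computes link count, diameter and per-node radix from standard closed-form expressions. The function names are hypothetical, and the 16-node (4x4) size is an assumption chosen only for illustration.

```python
def full_mesh_stats(n):
    """Links, diameter and per-node radix of an all-to-all network of n nodes."""
    return {"links": n * (n - 1) // 2, "diameter": 1, "radix": n - 1}


def torus_2d_stats(k):
    """Same metrics for a k x k 2-D torus.

    Requires k >= 3 so that the two wraparound neighbours in each
    dimension are distinct nodes.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    return {"links": 2 * k * k, "diameter": 2 * (k // 2), "radix": 4}


if __name__ == "__main__":
    print(full_mesh_stats(16))  # 120 links, diameter 1, radix 15
    print(torus_2d_stats(4))    # 32 links, diameter 4, radix 4
```

For 16 nodes, the full mesh needs 120 links to reach diameter 1, while the torus needs only 32 links but has diameter 4 regardless of how cheap long links are.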
Key Points
I argue the HyperX topology is an excellent fit for this use case.
A 2-D HyperX has the following advantages:
Low diameter (2 for a 2-D HyperX) reduces hop count. This exploits the fact that long-distance optical connections are cheap while electrical-to-optical conversions are expensive, and it also reduces latency and energy.
Regular layout eliminates waveguide bends, maximizes bandwidth density and limits the layer count to 2, especially compared to all-to-all or Dragonfly topologies.
Low-diameter topologies achieve high link utilization, which reduces cost. This is important because links are more expensive in optical WSI than in electrical WSI, so higher utilization squeezes more out of the aggregate bandwidth.
Low switch radix
Optical circuit switches can be used along each dimension of the HyperX for traffic shaping
Smaller advantages:
Allows for a fully passive optical wafer-scale interposer
The 2-D HyperX eliminates the need for optical circuit switches and therefore allows material platforms like SiN that are fully passive, lower loss and cheaper.
Allows for optical technologies that don’t have optical circuit switch capability, e.g. directly modulated VCSEL or micro-LED platforms.
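The diameter and radix claims in the list above can be checked with a minimal sketch that builds a 2-D HyperX (every node linked directly to all other nodes in its row and in its column) and measures it by breadth-first search. The function names and the 4x4 size are assumptions made for illustration, not part of any cited design.

```python
from collections import deque
from itertools import product


def hyperx_2d(rows, cols):
    """Adjacency sets of a rows x cols 2-D HyperX.

    Each node (r, c) links to every other node in row r and in column c.
    """
    adj = {node: set() for node in product(range(rows), range(cols))}
    for r, c in adj:
        adj[(r, c)].update((r, c2) for c2 in range(cols) if c2 != c)
        adj[(r, c)].update((r2, c) for r2 in range(rows) if r2 != r)
    return adj


def diameter(adj):
    """Longest shortest-path length in hops, via BFS from every node."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    net = hyperx_2d(4, 4)
    links = sum(len(nbrs) for nbrs in net.values()) // 2
    radix = max(len(nbrs) for nbrs in net.values())
    print(links, radix, diameter(net))  # 48 6 2
```

For 16 nodes this gives 48 links, a radix of 6 and a diameter of 2.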
Comparison to Dragonfly
How does HyperX compare to other low-diameter networks (e.g. Dragonfly) for the photonic WSI use case?
Reference [4] works through a cost model example comparing Dragonfly and HyperX, which suggests that as the costs of local and global connections become more similar, Dragonfly loses to HyperX. In photonic WSI, there is essentially no cost difference between local and global connections, which tilts the comparison in favor of HyperX.
Fig. 1 (from [5]).
Fig. 1 shows an example Dragonfly topology for 16 chips/nodes. Note that an actual layout generally can’t use diagonal waveguides, so it will require bends. In any case, the layout for Dragonfly will require more layers and be significantly more complex than the layout for HyperX.
Example 4x4 Sketch
Note that the 4x4 HyperX is amenable to a very dense layout that uses almost all the available space for waveguides and minimizes waveguide crossings.
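One hedged way to quantify the density claim is to count how many waveguides must pass through each gap between adjacent nodes in a single row (or column). The sketch below assumes, purely for illustration, that all links of a row are routed in one straight channel alongside that row. The function name is hypothetical.

```python
def channel_cut_widths(s):
    """Waveguides crossing each gap between adjacent nodes in one fully
    connected row (or column) of s nodes placed on a line.

    Gap i sits between positions i and i + 1. A link (a, b) with a < b
    crosses it when a <= i < b, which gives (i + 1) * (s - i - 1) links.
    """
    return [(i + 1) * (s - i - 1) for i in range(s - 1)]


if __name__ == "__main__":
    print(channel_cut_widths(4))  # [3, 4, 3]
```

Under that assumption, a row of 4 nodes needs between 3 and 4 parallel waveguides at every gap, so the channel is close to uniformly filled along its length.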
Evaluation
Evaluation is challenging because HyperX, like other low-diameter topologies, traditionally relies on adaptive routing.
Adaptive routing techniques like progressive adaptive routing may be too expensive to use in what are effectively NoC routers.
However, random placement of work on nodes can alleviate some of these issues, particularly when combined with oblivious routing techniques (e.g. Valiant routing) [6], [7], [8].
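To make the oblivious option concrete, here is a minimal sketch of Valiant routing on a 2-D HyperX: each packet is first sent to a uniformly random intermediate node and then on to its destination. Using dimension-order routing (column first, then row) for the two minimal phases is an assumption of this sketch, as are the function names. It is not taken from [6], [7] or [8].

```python
import random


def dor_path(src, dst):
    """Minimal dimension-order route on a 2-D HyperX: one hop to fix the
    column, then one hop to fix the row (hops are skipped if already aligned)."""
    path = [src]
    r, c = src
    if c != dst[1]:
        c = dst[1]
        path.append((r, c))
    if r != dst[0]:
        r = dst[0]
        path.append((r, c))
    return path


def valiant_path(src, dst, rows, cols, rng=random):
    """Route src -> random intermediate -> dst, using DOR for both phases.

    The route depends only on src, dst and the random draw, never on
    network load, so the router needs no congestion state.
    """
    mid = (rng.randrange(rows), rng.randrange(cols))
    return dor_path(src, mid) + dor_path(mid, dst)[1:]


if __name__ == "__main__":
    print(valiant_path((0, 0), (3, 3), 4, 4, random.Random(1)))
```

A route takes at most 4 hops (2 per phase), i.e. up to twice the minimal hop count, which is the usual price Valiant routing pays for load balancing without adaptive logic in the router.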
References
Lightelligence Hummingbird Low-Latency Optical Connection Engine - ServeTheHome
Practical and Efficient Incremental Adaptive Routing for HyperX Networks
Evaluation of novel interconnect technologies for ASC applications
Evaluating Trade-offs in Potential Exascale Interconnect Topologies
HyperX: Topology, Routing, and Packaging of Efficient Large-Scale Networks
https://sc19.supercomputing.org/proceedings/tech_paper/tech_paper_files/pap113s5.pdf