NVLink
NVLink is a wire-based, serial, multi-lane, near-range communications link developed by Nvidia.
Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect.
Principle
NVLink is a wire-based communications protocol for near-range semiconductor communications developed by Nvidia that can be used for data and control code transfers in processor systems between CPUs and GPUs and solely between GPUs. NVLink specifies a point-to-point connection with data rates of 20 and 25 Gbit/s per differential pair. Eight differential pairs form a "sub-link" and two "sub-links", one for each direction, form a "link". The total data rate for a sub-link is 25 GByte/s and the total data rate for a link is 50 GByte/s. Each V100 GPU supports up to six links; thus, each GPU is capable of supporting up to 300 GByte/s of total bi-directional bandwidth. NVLink products introduced to date focus on the high-performance application space. Announced May 14, 2020, NVLink 3.0 increases the data rate per differential pair from 25 Gbit/s to 50 Gbit/s while halving the number of pairs per NVLink from 8 to 4. With 12 links for an Ampere-based A100 GPU, this brings the total bandwidth to 600 GByte/s.
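As a quick cross-check of the figures above, the following Python sketch reproduces the bandwidth arithmetic (pair rate × pairs per sub-link, two sub-links per link, links per GPU); the numbers are taken from the paragraph above and are peak signaling rates, not measured throughput:

```python
# Back-of-the-envelope NVLink bandwidth arithmetic using the figures quoted
# above (Gbit/s per differential pair, pairs per sub-link, links per GPU).
# Results are peak signaling rates, not achievable payload throughput.

def link_bandwidth_gbyte_s(gbit_per_pair: float, pairs_per_sublink: int) -> float:
    """One sub-link per direction -> bidirectional link bandwidth in GByte/s."""
    sublink_gbit = gbit_per_pair * pairs_per_sublink   # one direction
    return 2 * sublink_gbit / 8                        # both directions, bits -> bytes

# NVLink 2.0 (V100): 25 Gbit/s per pair, 8 pairs per sub-link, 6 links per GPU
v100_link = link_bandwidth_gbyte_s(25, 8)
print(v100_link, v100_link * 6)    # 50.0 GByte/s per link, 300.0 GByte/s per GPU

# NVLink 3.0 (A100): 50 Gbit/s per pair, 4 pairs per sub-link, 12 links per GPU
a100_link = link_bandwidth_gbyte_s(50, 4)
print(a100_link, a100_link * 12)   # 50.0 GByte/s per link, 600.0 GByte/s per GPU
```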
Performance
The following table shows a basic metrics comparison based upon standard specifications:
Interconnect | Transfer Rate | Line Code | Eff. Payload Rate per Lane per Direction | Max Total Lane Length |
PCIe 1.x | 2.5 GT/s | 8b/10b | ~0.25 GB/s | 20" = ~51 cm |
PCIe 2.x | 5 GT/s | 8b/10b | ~0.5 GB/s | 20" = ~51 cm |
PCIe 3.x | 8 GT/s | 128b/130b | ~1 GB/s | 20" = ~51 cm |
PCIe 4.0 | 16 GT/s | 128b/130b | ~2 GB/s | 8−12" = ~20−30 cm |
PCIe 5.0 | 32 GT/s | 128b/130b | ~4 GB/s | |
NVLink 1.0 | 20 Gbit/s | | ~2.5 GB/s | |
NVLink 2.0 | 25 Gbit/s | | ~3.125 GB/s | |
NVLink 3.0 | 50 Gbit/s | | ~6.25 GB/s | |
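The "Eff. Payload Rate per Lane" column follows from the transfer rate reduced by the line-code overhead (8b/10b carries 8 payload bits per 10 transmitted bits, 128b/130b carries 128 per 130). A minimal Python sketch of that calculation, ignoring all other protocol overhead:

```python
# Effective payload rate per lane per direction, accounting only for the
# line code (8b/10b or 128b/130b). Real links lose additional bandwidth to
# packet headers, flow control, and other protocol overhead.

def payload_rate_gbyte_s(transfer_rate_gt_s: float, payload_bits: int, coded_bits: int) -> float:
    efficiency = payload_bits / coded_bits
    return transfer_rate_gt_s * efficiency / 8   # GT/s -> GByte/s of payload

print(payload_rate_gbyte_s(2.5, 8, 10))    # PCIe 1.x -> 0.25 GB/s
print(payload_rate_gbyte_s(8, 128, 130))   # PCIe 3.x -> ~0.98 GB/s (~1 GB/s)
print(payload_rate_gbyte_s(16, 128, 130))  # PCIe 4.0 -> ~1.97 GB/s (~2 GB/s)
```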
The following table shows a comparison of relevant bus parameters for real-world semiconductors that all offer NVLink as one of their options:
Semiconductor | Board/Bus Delivery Variant | Interconnect | Transmission Rate (per lane) | Lanes per Sub-Link (out + in) | Sub-Link Data Rate (per direction) | Sub-Link or Unit Count | Total Data Rate (out + in) | Total Lanes (out + in) | Total Data Rate (bidirectional) |
Nvidia GP100 | P100 SXM, P100 PCI-E | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GByte/s | 1 | 16 + 16 GByte/s | 32 Ⓒ | 32 GByte/s |
Nvidia GV100 | V100 SXM2, V100 PCI-E | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GByte/s | 1 | 16 + 16 GByte/s | 32 Ⓒ | 32 GByte/s |
Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GByte/s | 1 | 16 + 16 GByte/s | 32 Ⓒ | 32 GByte/s |
Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GByte/s | 1 | 16 + 16 GByte/s | 32 Ⓒ | 32 GByte/s |
Nvidia Xavier | | PCIe 4.0 Ⓓ (2 units: x8 / 1 unit: x4 / 3 units: x1) | 16 GT/s | 8 + 8 Ⓑ / 4 + 4 Ⓑ / 1 + 1 | 128 Gbit/s = 16 GByte/s / 64 Gbit/s = 8 GByte/s / 16 Gbit/s = 2 GByte/s | Ⓓ 2 / 1 / 3 | Ⓓ 32 + 32 GByte/s / 8 + 8 GByte/s / 6 + 6 GByte/s | 40 Ⓑ | 80 GByte/s |
IBM Power9 | | PCIe 4.0 | 16 GT/s | 16 + 16 Ⓑ | 256 Gbit/s = 32 GByte/s | 3 | 96 + 96 GByte/s | 96 | 192 GByte/s |
Nvidia GA100 | Ampere A100 | PCIe 4.0 | 16 GT/s | 16 + 16 Ⓑ | 256 Gbit/s = 32 GByte/s | 1 | 32 + 32 GByte/s | 32 Ⓒ | 64 GByte/s |
Nvidia GP100 | P100 SXM | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GByte/s | 4 | 80 + 80 GByte/s | 64 | 160 GByte/s |
Nvidia Xavier | | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GByte/s | | | | |
IBM Power8+ | | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GByte/s | 4 | 80 + 80 GByte/s | 64 | 160 GByte/s |
Nvidia GV100 | V100 SXM2 | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 6 | 150 + 150 GByte/s | 96 | 300 GByte/s |
IBM Power9 | | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 6 | 150 + 150 GByte/s | 96 | 300 GByte/s |
NVSwitch | | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 2 * 8 + 2 = 18 | 450 + 450 GByte/s | 288 | 900 GByte/s |
Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 1 | 25 + 25 GByte/s | 16 | 50 GByte/s |
Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GByte/s | 2 | 50 + 50 GByte/s | 32 | 100 GByte/s |
Nvidia GA100 | Ampere A100 | NVLink 3.0 | 50 GT/s | 8 + 8 Ⓐ | 400 Gbit/s = 50 GByte/s | 6 | 300 + 300 GByte/s | 96 | 600 GByte/s |
Note: the data-rate columns are rounded approximations based on the transmission rate; see the real-world performance paragraph below.
Real-world performance can be estimated by applying various encapsulation overheads as well as the link usage rate. These overheads come from several sources:
- 128b/130b line code
- Link control characters
- Transaction header
- Buffering capabilities
- DMA usage on computer side
NVLink benchmarks show an achievable transfer rate of about 35.3 Gbit/s for a 40 Gbit/s NVLink connection towards a P100 GPU in a system that is driven by a set of IBM Power8 CPUs.
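Treating the listed overheads as one combined efficiency factor derived from that benchmark (roughly 35.3/40 ≈ 88%) gives a first-order way to estimate achievable rates for other nominal link speeds; the sketch below is only that back-of-the-envelope calculation, not a measurement:

```python
# Rough efficiency estimate from the benchmark figures quoted above:
# ~35.3 achieved out of a nominal 40 on the measured NVLink connection.
nominal = 40.0
achieved = 35.3
efficiency = achieved / nominal
print(f"observed efficiency: {efficiency:.1%}")   # ~88%

# Applying the observed efficiency to other nominal NVLink rates gives a
# first-order estimate of achievable throughput (same units as the input).
for link_rate in (20, 25, 50):
    print(link_rate, "->", round(link_rate * efficiency, 1))
```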
Usage with Plug-In Boards
For plug-in boards that expose extra connectors for joining them into an NVLink group, a number of slightly varying, relatively compact, PCB-based interconnection plugs exist. Typically only boards of the same type will mate together, due to their physical and logical design. For some setups, two identical plugs must be applied to achieve the full data rate. The typical plug is U-shaped, with a fine-pitch edge connector on each of the end strokes of the shape facing away from the viewer. The width of the plug determines how far apart the plug-in cards must be seated on the main board of the hosting computer system; the slot spacing for card placement is commonly dictated by the matching plug. The interconnect is often referred to as SLI (from 2004) because of its structural design and appearance, even though the modern NVLink-based design is of a quite different technical nature, with different features at its basic levels compared to the former design. Reported real-world devices are:
- Quadro GP100
- Quadro GV100
- GeForce RTX 2080 based on TU104
- GeForce RTX 2080 Ti based on TU102
- Quadro RTX 5000 based on TU104
- Quadro RTX 6000 based on TU102
- Quadro RTX 8000 based on TU102
Service Software and Programming
History
On 5 April 2016, Nvidia announced that NVLink would be implemented in the Pascal-microarchitecture-based GP100 GPU, as used in, for example, Nvidia Tesla P100 products. With the introduction of the DGX-1 high-performance computer it became possible to have up to eight P100 modules in a single rack system connected to up to two host CPUs. The carrier board allows for a dedicated board for routing the NVLink connections – each P100 requires 800 pins, 400 for PCIe + power and another 400 for the NVLinks, adding up to nearly 1600 board traces for NVLinks alone. Each CPU has a direct connection to four P100 units via PCIe, and each P100 has one NVLink to each of the three other P100s in the same CPU group plus one more NVLink to one P100 in the other CPU group. Each NVLink offers a bidirectional 20 GB/s up and 20 GB/s down, with four links per GP100 GPU, for an aggregate bandwidth of 80 GB/s up and another 80 GB/s down. NVLink supports routing, so that in the DGX-1 design every P100 can reach four of the other seven P100s directly and the remaining three with only one hop. According to depictions in Nvidia's blog-based publications from 2014, NVLink allows bundling of individual links for increased point-to-point performance, so that for example a design with two P100s and all links established between the two units would allow the full NVLink bandwidth of 80 GB/s between them.
At GTC 2017, Nvidia presented its Volta generation of GPUs and indicated the integration of a revised version 2.0 of NVLink that would allow total I/O data rates of 300 GB/s for a single chip for this design, and further announced the option for pre-orders, with a delivery promise for Q3/2017, of the DGX-1 and DGX-Station high-performance computers that would be equipped with GPU modules of type V100 and have NVLink 2.0 realized in either a networked or a fully interconnected fashion of one group of four V100 modules.
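The reachability property of the DGX-1 layout described above (four of the seven peer P100s directly connected, the remaining three one hop away) can be illustrated with a small sketch; the adjacency below is a simplified model built only from the textual description (three intra-group links plus one link to the corresponding GPU in the other group), not from the actual board netlist:

```python
from collections import deque

# Adjacency for eight P100s following the description above: GPUs 0-3 and
# 4-7 form two CPU groups; each GPU links to the three others in its group
# plus the same-positioned GPU in the other group (a simplified model).
links = {g: set() for g in range(8)}
for g in range(8):
    group = range(0, 4) if g < 4 else range(4, 8)
    links[g].update(p for p in group if p != g)   # 3 intra-group links
    links[g].add((g + 4) % 8)                     # 1 inter-group link

def hops(src: int, dst: int) -> int:
    """Minimum number of NVLink hops between two GPUs (breadth-first search)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return -1

# GPU 0 reaches 4 peers directly and the other 3 peers with one extra hop.
print(sorted(hops(0, d) for d in range(1, 8)))    # [1, 1, 1, 1, 2, 2, 2]
```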
In 2017-2018, IBM and Nvidia delivered two supercomputers for the US Department of Energy named "Summit" and "Sierra", which combine IBM's POWER9 family of CPUs and Nvidia's Volta architecture, using NVLink 2.0 for the CPU-GPU and GPU-GPU interconnects and InfiniBand EDR for the system interconnects.