by Ryan Smith on June 22, 2020 3:30 AM EST
- Posted in Deep Learning
With the launch of their Ampere architecture and new A100 accelerator barely a month behind them, NVIDIA this morning is announcing the PCIe version of their accelerator as part of the start of the now-virtual ISC Digital conference for high performance computing. The more straight-laced counterpart to NVIDIA’s flagship SXM4 version of the A100 accelerator, the PCIe version of the A100 is designed to offer the A100 in a more traditional form factor for customers who need something they can plug into standardized servers. Overall the PCIe A100 offers the same peak performance as the SXM4 A100; however, with a lower 250 Watt TDP, real-world performance won’t be quite as high.
The traditional counterpart to NVIDIA’s SXM form factor accelerators, NVIDIA’s PCIe accelerators serve to flesh out the other side of NVIDIA’s accelerator lineup. While NVIDIA would gladly sell everyone SXM-based accelerators – which typically come with the expensive NVIDIA HGX carrier board – there are still plenty of customers who need to be able to use GPU accelerators in standard, PCIe-based rackmount servers. Or, for smaller workloads, customers don’t need the kind of 4-way and greater scalability offered by SXM-form-factor accelerators. So with their PCIe cards, NVIDIA can serve the rest of the accelerator market that their SXM products can’t reach.
The PCIe A100, in turn, is a fully-fledged A100, just in a different form factor and with a more modest TDP. In terms of peak performance, the PCIe A100 is just as fast as its SXM4 counterpart; NVIDIA this time isn’t shipping this as a cut-down configuration with lower clockspeeds or fewer functional blocks than the flagship SXM4 version. As a result the PCIe card brings everything the A100 offers to the table, with the same heavy focus on tensor operations, including the new higher-precision TF32 and FP64 formats, as well as even faster integer inference.
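To give a feel for what the TF32 format actually does: it keeps FP32’s 8-bit exponent (and thus its numeric range) while rounding the 23-bit mantissa down to the 10 bits that FP16 uses. The sketch below is purely illustrative – it is not NVIDIA code, and the function name is mine – but it emulates that rounding step on an ordinary Python float:

```python
import struct

def round_to_tf32(x: float) -> float:
    """Emulate TF32 rounding of an FP32 value (valid for normal, finite
    inputs): keep the 8-bit exponent, round the 23-bit mantissa to the
    10 bits that TF32 retains."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 1 << 12              # round-to-nearest on the 13 dropped bits
    bits &= ~((1 << 13) - 1)     # clear the 13 dropped mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

print(round_to_tf32(1.0))        # 1.0 -- exactly representable
print(round_to_tf32(1.0 / 3.0))  # ~0.3333, good to about 3 decimal digits
```

The upshot is that TF32 inputs carry FP16-class precision but FP32-class range, which is why NVIDIA can run them through the tensor cores so much faster than true FP32.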
**NVIDIA Accelerator Specification Comparison**

| | A100 (PCIe) | A100 (SXM4) | V100 | P100 |
|---|---|---|---|---|
| FP32 CUDA Cores | 6912 | 6912 | 5120 | 3584 |
| Memory Clock | 2.4Gbps HBM2 | 2.4Gbps HBM2 | 1.75Gbps HBM2 | 1.4Gbps HBM2 |
| Memory Bus Width | 5120-bit | 5120-bit | 4096-bit | 4096-bit |
| Single Precision | 19.5 TFLOPs | 19.5 TFLOPs | 14.1 TFLOPs | 9.3 TFLOPs |
| Double Precision | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 7 TFLOPs (1/2 FP32 rate) | 4.7 TFLOPs (1/2 FP32 rate) |
| INT8 Tensor | 624 TOPs | 624 TOPs | N/A | N/A |
| FP16 Tensor | 312 TFLOPs | 312 TFLOPs | 112 TFLOPs | N/A |
| TF32 Tensor | 156 TFLOPs | 156 TFLOPs | N/A | N/A |
| Relative Performance (vs. SXM Version) | 90% | 100% | N/A | N/A |
| Interconnect | 6 Links? (300GB/sec?) | 12 Links (600GB/sec) | 4 Links (200GB/sec) | 4 Links (160GB/sec) |
| Manufacturing Process | TSMC 7N | TSMC 7N | TSMC 12nm FFN | TSMC 16nm FinFET |
| Interface | PCIe 4.0 | SXM4 | PCIe 3.0 | SXM |
But because the dual-slot add-in card form factor is designed for lower-TDP products – offering less room for cooling and often less access to power as well – the PCIe version of the A100 has to ratchet its TDP down from 400W to 250W. That’s a sizable 38% reduction in power consumption, and it means the PCIe A100 isn’t going to be able to match the sustained performance figures of its SXM4 counterpart – that’s the benefit of going with a form factor with greater power and cooling budgets. All told, the PCIe version of the A100 should deliver about 90% of the performance of the SXM4 version on single-GPU workloads, which, for such a large drop in TDP, isn’t a bad trade-off.
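As a back-of-the-envelope check on those numbers (using only the figures quoted in this article, including NVIDIA’s own ~90% single-GPU estimate), the trade actually favors the PCIe card on efficiency:

```python
# Same GA100 silicon, different power budgets; figures from the article.
SXM4_TDP_W, PCIE_TDP_W = 400, 250
PEAK_FP32_TFLOPS = 19.5        # identical peak rate for both versions
PCIE_RELATIVE_PERF = 0.90      # NVIDIA's quoted sustained figure

tdp_cut = 1 - PCIE_TDP_W / SXM4_TDP_W
sxm4_eff = PEAK_FP32_TFLOPS / SXM4_TDP_W                       # TFLOPs/W
pcie_eff = PEAK_FP32_TFLOPS * PCIE_RELATIVE_PERF / PCIE_TDP_W  # TFLOPs/W

print(f"TDP reduction: {tdp_cut:.1%}")                        # 37.5%
print(f"Perf-per-watt gain: {pcie_eff / sxm4_eff - 1:.0%}")   # 44%
```

In other words, giving up ~10% of sustained performance for a 37.5% power cut nets the PCIe card roughly 44% better performance per watt – exactly the kind of trade a density-constrained rackmount deployment wants.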
And on this note, I should give NVIDIA credit where credit is due: unlike with the PCIe version of the V100 accelerator, NVIDIA is doing a much better job of documenting these performance differences. This time around NVIDIA is explicitly noting the 90% figure in their specification sheets and related marketing materials, so there should be much less confusion about how the PCIe version of the accelerator compares to the SXM version.
Other than the form factor and TDP changes, the only other notable way the PCIe A100 deviates from the SXM version is in the number of NVLink-connected GPUs supported. For their PCIe card NVIDIA is once again using NVLink bridges connected across the top of A100 cards, allowing two (and only two) cards to be linked together. NVIDIA’s product sheet doesn’t list the total bandwidth available, but as the PCIe V100 supported up to 100GB/sec in each direction using two links, the PCIe A100 and its 3 NVLink connectors should be able to reach 150GB/sec, if not more.
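That estimate is simple per-link scaling from the V100 figures – a sketch of the arithmetic, under the assumption that per-link bandwidth has not regressed:

```python
# Per-link scaling from the PCIe V100 figures cited above: 100 GB/sec
# in each direction over two links implies 50 GB/sec per link.
V100_LINKS = 2
V100_BW_GB_S = 100
A100_CONNECTORS = 3

per_link = V100_BW_GB_S / V100_LINKS        # 50 GB/sec per link
a100_estimate = per_link * A100_CONNECTORS  # 150 GB/sec -- a floor, since
                                            # newer links may be faster
print(a100_estimate)  # 150.0
```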
Otherwise the PCIe A100 comes with the usual trimmings of the form factor. The card is entirely passively cooled, designed for use in servers with powerful chassis fans. And though not pictured in NVIDIA’s official renders, there are sockets for PCIe power connectors. Meanwhile, with the reduced use of NVLink in this version of the card, the A100’s native PCIe 4.0 support will certainly be of greater importance here, underscoring the advantage that an AMD Epyc + NVIDIA A100 pairing has right now, since AMD is currently the only x86 server vendor with PCIe 4.0 support.
Wrapping things up, while NVIDIA isn’t announcing specific pricing or availability information today, the new PCIe A100 cards should be shipping soon. The broader compatibility of the PCIe card has helped NVIDIA line up over 50 server wins at this point, with 30 of those servers set to ship this summer.