4x RTX 2080 Ti with Quadro NVLink | Performance Test

TensorFlow CNN: ResNet-50 FP16 & FP32

Deep learning benchmark 2019 / TensorFlow, NVIDIA, deep learning workstation, Threadripper

Convolutional Neural Nets – Docker container image tensorflow:18.03-py2 from NGC


Hardware used:

CPU – AMD Threadripper 1900

32 GB DDR4 RAM

4x RTX 2080 Ti with 2x Quadro NVLink bridges

EVGA 1600 W PSU

MSI X399 Carbon motherboard


Test configurations: FP32 & FP16

1- 2x 2080 Ti without NVLink

2- 2x 2080 Ti with NVLink

3- 4x 2080 Ti without NVLink

4- 4x 2080 Ti with NVLink

5- 2x 2080 Ti with NVLink + 2x 2080 Ti without NVLink

This way we cover every possible configuration.


Checking the NVLink status:

techno@dl:~$ nvidia-smi nvlink --status -i 0
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
techno@dl:~$ nvidia-smi nvlink --status 
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-c8aa2ad3-943c-665e-90fc-c9af727289cc)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-31f2f22f-b288-01f6-c102-c9990658aebe)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 2: GeForce RTX 2080 Ti (UUID: GPU-6be7a8ec-bc7f-9347-6d5c-5557e23d4b37)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
GPU 3: GeForce RTX 2080 Ti (UUID: GPU-7a82b7e5-96b1-11aa-5413-82fcdca4554f)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
techno@dl:~$

Working well – about 25 GB/s per direction per link, so roughly 50 GB/s bidirectional – OK.
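
If you want to double-check which GPU pairs are actually joined by a bridge, nvidia-smi can also print the topology matrix (we did not include that output above; NVLink-connected pairs show up as NV# entries, where # is the number of links):

nvidia-smi topo -m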

Downloading the Docker container for the test (an NGC container; this requires a Docker installation and an NGC account – log in from the terminal to pull the image):
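
For reference, the terminal login uses the nvcr.io registry with the special $oauthtoken username and your NGC API key as the password (the key is a placeholder here – use your own):

sudo docker login nvcr.io
Username: $oauthtoken
Password: <your NGC API key>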

sudo docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.03-py2
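
To reproduce the 2-GPU test configurations without pulling cards out of the machine, only a subset of GPUs can be exposed to the container through the NVIDIA_VISIBLE_DEVICES variable supported by the NVIDIA runtime (the indices 0,1 below are just an example – pick the pair that shares a bridge):

sudo docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.03-py2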


Frameworks and Model used:

TensorFlow 1.4.0

CUDA 9

Multi-GPU support utilizing the NCCL communication library for the CNN code
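
The exact benchmark command is not shown here; a typical invocation with the widely used tf_cnn_benchmarks.py script from the tensorflow/benchmarks repository looks like the lines below, with NCCL handling the gradient all-reduce (the batch size and other flag values are illustrative, not necessarily the exact values we ran):

# ResNet-50, 4 GPUs, FP16
python tf_cnn_benchmarks.py --model=resnet50 --num_gpus=4 --batch_size=64 --use_fp16=True --variable_update=replicated --all_reduce_spec=nccl

# Same run in FP32
python tf_cnn_benchmarks.py --model=resnet50 --num_gpus=4 --batch_size=64 --variable_update=replicated --all_reduce_spec=nccl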


Benchmark Results:


Conclusions:

According to our tests, using the Quadro NVLink bridges increases the number of images processed per second. The greatest impact is seen in the 4-card system, where the two NVLink bridges connect the cards in pairs.

In our opinion, the best configuration is a workstation with 4x RTX 2080 Ti and two Quadro NVLink bridges, since we see an increase of about 13% when using NVLink.


DLBT is our Deep Learning Benchmark Tool; we make benchmarking easy. To download our free app for Linux, check here: https://technopremium.com/

Comments

  1. Doug Holland

    I’m wondering why you used Quadro RTX bridges instead of the GeForce bridge? I ask as I have two Quadro RTX 5000s and I need a 4-slot bridge. I have seen conflicting information online as to whether the GeForce or Titan 4-slot bridges would work with Quadro cards, as there is currently no 4-slot bridge for Quadro.

    Any information you can share would be very welcome. I’d imagine that if a Quadro bridge works with GeForce cards, then a GeForce bridge should work with Quadro cards.

    • technql1

      Hi,

      That was one of our tests. The problem we had using the regular GeForce bridge on the RTX 2080 Ti and TITAN was that the gap between the cards was too big: with 4 of these cards in a normal case it was impossible to use NVLink. So we decided to use the Quadro bridge and it worked, but the GeForce NVLink bridge does not work on Quadro cards.

      Also, there is no 4-way NVLink like there was with SLI, but using the Quadro bridges we can link the cards in pairs and see about a 15% performance increase per pair.
