
Strange mapping: example

In the following example, the first column is the cuda device chosen in the code, and the second column is the GPU that actually does the work instead:

0:0 1234 MiB
1:2 1234 MiB
2:7 1234 MiB

3:5 2341 MiB

4:1 3412 MiB
5:3 3412 MiB
6:4 3412 MiB
7:6 3412 MiB

Thus, to get GPUs 0,4,5,6,7, you code: 0,6,3,7,2.
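As a sketch, this lookup can also be put into code; the dict below just encodes the example mapping above, and the names gpu_to_cuda and wanted_gpus are made up for illustration:

# Example mapping from above: physical GPU (nvidia-smi) -> cuda index (code)
gpu_to_cuda = {0: 0, 1: 4, 2: 1, 3: 5, 4: 6, 5: 3, 6: 7, 7: 2}

wanted_gpus = [0, 4, 5, 6, 7]                         # GPUs as shown by nvidia-smi
cuda_indices = [gpu_to_cuda[g] for g in wanted_gpus]
print(cuda_indices)                                   # [0, 6, 3, 7, 2]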

How to check the mapping

I have this strange mapping all the time. You can test it like this:

  • build a tiny dummy model or
  • load a pretrained model.

Then move this model to each cuda device, one after the other, and after each step check the change in !nvidia-smi|tail (in a Jupyter cell; plain nvidia-smi in a shell) to see which GPU the cuda device got mapped to. This mapping does not change for the whole session, and it even stays the same after a server relaunch. Thus, it seems to be set by the technical hierarchy of the GPUs, which does not change unless you change the hardware.
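As a sketch, the whole check can also be scripted instead of done by hand (assuming nvidia-smi is on the PATH; the -12 tail length is only a rough guess at where the Processes table starts):

import subprocess
import torch

model = torch.Tensor(10, 10)                      # tiny dummy model, see below

for i in range(torch.cuda.device_count()):
    model = model.to(f'cuda:{i}')                 # occupy memory on cuda:i
    out = subprocess.run(['nvidia-smi'], capture_output=True, text=True).stdout
    print(f'--- after cuda:{i} ---')
    print('\n'.join(out.splitlines()[-12:]))      # roughly the Processes table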

Code to build some model

Dummy model (quick check, take this)

This tiny dummy tensor is taken from the PyTorch thread "How to delete PyTorch objects correctly from memory":

import torch
model = torch.Tensor(10, 10)  # tiny 10x10 tensor; moving it to a device is enough to make the process show up in nvidia-smi

This builds a tiny tensor "model". It is better than downloading a full pretrained model (as in the next heading) if you only want to check the cuda-to-GPU alignment as a test.

Full model (do not take this, take the dummy model instead)

from transformers import (AutoTokenizer, AutoModelForCausalLM, TextDataset,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments)

def get_model(model_name):
    # Load pre-trained model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return tokenizer, model

model_name = "dbmdz/german-gpt2"
tokenizer, model = get_model(model_name)

Code to check the devices

And this is the code that I ran for each of the devices, from 0 to 7. After each step, check which of the GPUs was filled with a new memory entry.

# Example for 0:
model = model.to('cuda:0')

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
+-----------------------------------------------------------------------------+

Thus, 0->0.

Example for 1:

model = model.to('cuda:1')

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    2   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
+-----------------------------------------------------------------------------+

Thus, 1->2.

...

Example for 2:

model = model.to('cuda:2')

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    2   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    7   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
+-----------------------------------------------------------------------------+

Thus, 2->7.

Further checks:

  • 3->5
  • 4->1
  • 5->3
  • 6->4
  • 7->6

Thus, to get GPUs 0,1,2,3,4,5,6,7 you need to take cuda devices 0,4,1,5,6,3,7,2.

So by the time you reach cuda:7, you know the whole mapping just from checking at each step what has changed:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    1   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    2   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    3   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    4   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    5   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    6   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
|    7   N/A  N/A   1073165      C   .../miniconda3/bin/python3.9     1000MiB |
+-----------------------------------------------------------------------------+

Further setup

The environment variable CUDA_VISIBLE_DEVICES is empty, and changing it does not change the mapping. The strange mapping has not changed since the beginning of the project, even though I have changed this variable many times.

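One possible explanation, stated here only as an assumption: CUDA reads CUDA_VISIBLE_DEVICES once, when the CUDA context is first initialized, so changing the variable later in a running session has no effect. A minimal sketch of setting it early enough:

import os
# Must be set before the first CUDA call, ideally before importing torch:
os.environ['CUDA_VISIBLE_DEVICES'] = '2,0'   # example value; cuda:0 -> device 2, cuda:1 -> device 0 (in CUDA's own enumeration order)

import torch
print(torch.cuda.device_count())             # only the two listed devices are visible now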

Question

What can be done to get rid of the strange mapping between the coded GPU (first column in the first example above) and the chosen GPU (second column), that is, the mismatch you see when you check the code (model.to('cuda:MY_GPU_NUMBER')) against the output of nvidia-smi?

questionto42

1 Answer


You can try setting the environment variable CUDA_DEVICE_ORDER to the value PCI_BUS_ID to get the ordering to be the same as the physical arrangement of the PCI lanes.
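A minimal sketch of how this could look, assuming the variable must be set before CUDA is initialized (e.g. at the very top of the script or notebook):

import os
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'   # enumerate GPUs in PCI bus order, like nvidia-smi

import torch
model = torch.Tensor(10, 10)
model = model.to('cuda:3')                       # should now show up on GPU 3 in nvidia-smi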

noe