Strange mapping: example
In the following example, the first column is the cuda device index chosen in the code, and the second column is the GPU that actually does the work instead:
0:0 1234 MiB
1:2 1234 MiB
2:7 1234 MiB
3:5 2341 MiB
4:1 3412 MiB
5:3 3412 MiB
6:4 3412 MiB
7:6 3412 MiB
Thus, to get GPUs 0,4,5,6,7, you have to code: 0,6,3,7,2
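To avoid doing this lookup by hand, you can store the observed mapping in a small dict and invert it. This is only a sketch using the numbers from the example above:

# Observed mapping: cuda index used in code -> physical GPU (nvidia-smi index).
code_to_gpu = {0: 0, 1: 2, 2: 7, 3: 5, 4: 1, 5: 3, 6: 4, 7: 6}

# Invert it to look up which cuda index to code for a desired physical GPU.
gpu_to_code = {gpu: code for code, gpu in code_to_gpu.items()}

wanted_gpus = [0, 4, 5, 6, 7]
print([gpu_to_code[g] for g in wanted_gpus])  # -> [0, 6, 3, 7, 2]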
How to check the mapping
I have this strange mapping all the time. You can test it like this:
- build a tiny dummy model or
- load a pretrained model.
Then move this model to each device, one after the other, and at each step check the change in the output of !nvidia-smi|tail to see which GPU the cuda device got mapped to. This mapping does not change for the whole session, and it even stays the same after a server relaunch. Thus, it seems to be determined by the technical hierarchy of the GPUs, which does not change unless you change the hardware.
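A minimal sketch of this check as a loop, assuming one fresh Python process per session as above (the threshold "memory grew" for detecting the affected GPU is an assumption, not part of the original procedure):

import subprocess
import torch

def used_mib():
    # Per-GPU memory usage in MiB, one value per GPU, ordered by nvidia-smi index.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True)
    return [int(x) for x in out.split()]

model = torch.Tensor(10, 10)
before = used_mib()
for code_idx in range(torch.cuda.device_count()):
    model = model.to(f'cuda:{code_idx}')
    after = used_mib()
    # The GPU whose memory grew is the one cuda:code_idx got mapped to.
    changed = [i for i, (b, a) in enumerate(zip(before, after)) if a > b]
    print(f"cuda:{code_idx} -> GPU {changed}")
    before = after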
Code to build some model
Dummy model (quick check, take this)
This tiny dummy tensor is taken from PyTorch: How to delete PyTorch objects correctly from memory:
import torch

# Moving this tiny tensor to a device is enough to make that device show up in nvidia-smi.
model = torch.Tensor(10, 10)
This builds a tiny tensor model. If you only want to check the cuda-to-GPU alignment as a test, this is better than downloading a full pretrained model as in the next heading.
Full model (do not take this, take the dummy model instead)
from transformers import (AutoTokenizer, AutoModelForCausalLM, TextDataset,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

def get_model(model_name):
    # Load pre-trained model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return tokenizer, model

model_name = "dbmdz/german-gpt2"
tokenizer, model = get_model(model_name)
Code to check the devices
And this is the code that I ran for each of the devices, from 0 to 7. After each step, check which GPU got a new memory entry.
# Example for 0:
model = model.to('cuda:0')
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
+-----------------------------------------------------------------------------+
Thus, 0->0.
Example for 1:
model = model.to('cuda:1')
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 2 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
+-----------------------------------------------------------------------------+
Thus, 1->2.
...
Example for 2:
model = model.to('cuda:2')
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 2 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 7 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
+-----------------------------------------------------------------------------+
Thus, 2->7.
Further checks:
- 3->5
- 4->1
- 5->3
- 6->4
- 7->6
Thus, to get the GPUs
> 0,1,2,3,4,5,6,7
you need to code the devices:
> 0,4,1,5,6,3,7,2
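If you do not want to remember this permutation, you could wrap it in a tiny helper. This is just a sketch; CODE_FOR_GPU and to_physical_gpu are illustrative names, not part of any library:

# Physical GPU index (nvidia-smi) -> cuda index to use in the code (the list above).
CODE_FOR_GPU = [0, 4, 1, 5, 6, 3, 7, 2]

def to_physical_gpu(module, gpu):
    # Move a module (or tensor) to the cuda device that runs on physical GPU `gpu`.
    return module.to(f'cuda:{CODE_FOR_GPU[gpu]}')

# Example: put the model on the GPU that nvidia-smi shows as 5 (internally cuda:3).
model = to_physical_gpu(model, 5)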
After the final step (cuda:7), all eight GPUs are in use, and you have recovered the whole mapping just by checking what changed at each step:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 1 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 2 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 3 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 4 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 5 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 6 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
| 7 N/A N/A 1073165 C .../miniconda3/bin/python3.9 1000MiB |
+-----------------------------------------------------------------------------+
Further setup
The environment variable CUDA_VISIBLE_DEVICES is empty, and changing it does not change the mapping. The strange mapping has not changed since the beginning of the project, even though I have changed this variable a lot.
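For reference, a minimal sketch of one way this variable was changed from Python (the value shown is just an example; the variable generally has to be set before the first CUDA call in the process to have any effect):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"  # example value only

import torch
print(torch.cuda.device_count())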
Question
What can be done to get rid of the strange mapping between the coded GPU (first column in the first example above) and the chosen GPU (second column), that is, between the code model.to('cuda:MY_GPU_NUMBER') and the output of nvidia-smi?
