The selected device can be changed with a torch.cuda.device
context manager.
Cross-GPU operations are not allowed by default, with the exception of copy_(). Below is a small example showcasing this:
    x = torch.cuda.FloatTensor(1)
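    # Continuing the sketch above: torch.cuda.device temporarily changes the
    # current device, and copy_() is the one operation permitted across GPUs.
    y = torch.FloatTensor(1).cuda()    # on GPU 0, the current device
    with torch.cuda.device(1):
        a = torch.cuda.FloatTensor(1)  # allocated on GPU 1
        a.copy_(y)                     # cross-GPU copy is allowed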
Memory management
Calling empty_cache() can release all unused cached memory from PyTorch so that it can be used by other GPU applications.
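As a minimal sketch (the tensor shape is arbitrary), freeing a tensor returns its memory to PyTorch's cache rather than to the GPU driver until the cache is emptied:

    import torch

    x = torch.cuda.FloatTensor(1024, 1024)  # held by PyTorch's caching allocator
    del x                     # returned to the cache; nvidia-smi still shows it as used
    torch.cuda.empty_cache()  # releases unused cached memory for other GPU applications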
Best practices
Device-agnostic code
A common pattern is to use Python's argparse module to read in user arguments, and have a flag that can be used to disable CUDA, in combination with is_available(). In the following, args.cuda results in a flag that can be used to cast tensors and modules to CUDA if desired:
    import argparse
    import torch
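    # A sketch of the flag pattern described above; the flag name
    # '--disable-cuda' is illustrative rather than prescribed.
    parser = argparse.ArgumentParser(description='PyTorch Example')
    parser.add_argument('--disable-cuda', action='store_true',
                        help='Disable CUDA')
    args = parser.parse_args()
    args.cuda = not args.disable_cuda and torch.cuda.is_available()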
If modules or tensors need to be sent to the GPU, args.cuda can be used as follows:
    x = torch.Tensor(8, 42)
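    # Network stands in for any torch.nn.Module subclass; it is not
    # defined in this document.
    net = Network()
    if args.cuda:
        x = x.cuda()
        net.cuda()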
When creating tensors, an alternative to the if statement is to define a default datatype and cast all tensors using it:

    dtype = torch.cuda.FloatTensor
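    # Any newly created tensor can then be cast with the chosen datatype.
    x = torch.rand(3, 3).type(dtype)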
When working with multiple GPUs on a system, the CUDA_VISIBLE_DEVICES environment flag can be used to manage which GPUs are available to PyTorch. To manually control which GPU a tensor is created on, the best practice is to use a torch.cuda.device context manager:
1 | print("Outside device is 0") # On device 0 (default in most scenarios) |
If you have a tensor and would like to create a new tensor of the same type on the same device, you can use the torch.Tensor.new() method, which preserves the device and other attributes of the tensor:

    x_cpu = torch.FloatTensor(1)
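    # A sketch of new(): the result preserves the type and device of its source.
    x_gpu = torch.cuda.FloatTensor(1)
    y_cpu = x_cpu.new(8, 10, 10).fill_(0.3)    # CPU tensor of shape (8, 10, 10)
    y_gpu = x_gpu.new(x_gpu.size()).fill_(-5)  # GPU tensor sized like x_gpu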
If you want to create a tensor of the same type and size as another tensor, and fill it with either ones or zeros, ones_like() and zeros_like() are provided as convenient helper functions (which also preserve the device):

    x_cpu = torch.FloatTensor(1)
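    # Both helpers preserve the size, type, and device of their argument.
    x_gpu = torch.cuda.FloatTensor(1)
    y_cpu = torch.ones_like(x_cpu)
    y_gpu = torch.zeros_like(x_gpu)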
Use pinned memory buffers
Host-to-GPU copies are much faster when they originate from pinned (page-locked) memory. CPU tensors and storages expose a pin_memory() method that returns a copy of the object with its data placed in a pinned region.

Once you pin a tensor or storage, you can use asynchronous GPU copies by passing an additional async=True argument to a cuda() call. This can be used to overlap data transfers with computation.

You can also make the DataLoader return batches placed in pinned memory by passing pin_memory=True to its constructor.
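A minimal sketch combining these pieces (the dataset and batch size are illustrative; later PyTorch releases renamed the async argument to non_blocking):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    x = torch.FloatTensor(8, 42).pin_memory()  # copy into page-locked memory
    y = x.cuda(async=True)                     # asynchronous host-to-GPU copy

    dataset = TensorDataset(torch.randn(100, 42), torch.randn(100))
    loader = DataLoader(dataset, batch_size=8, pin_memory=True)  # batches come back pinned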