🤔 PyTorch Understanding
squeeze and unsqueeze
Simply put, unsqueeze() “adds” a superficial dimension of size 1 to a tensor (at the specified position), while squeeze() removes all superficial dimensions of size 1 from a tensor.
squeeze
torch.squeeze(input, dim=None, out=None) → Tensor
Documentation: torch.squeeze
- Returns a tensor with all the dimensions of input of size 1 removed. For example, if input is of shape $(A \times 1 \times B \times C \times 1 \times D)$, then the output tensor will be of shape $(A \times B \times C \times D)$.
- When dim is given, a squeeze operation is done only in the given dimension. If input is of shape $(A \times 1 \times B)$, squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape $(A \times B)$.
Example
>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()  # alternative: x.shape
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)  # specify dimension
>>> y.size()
torch.Size([2, 2, 1, 2])
unsqueeze
torch.unsqueeze(input, dim) → Tensor
Documentation: torch.unsqueeze
Returns a new tensor with a dimension of size one inserted at the specified position.
Example
x = torch.tensor([1, 2, 3, 4])
x.shape
torch.Size([4])
torch.unsqueeze(x, 0), torch.unsqueeze(x, 0).shape
(tensor([[1, 2, 3, 4]]), torch.Size([1, 4]))
torch.unsqueeze(x, 1), torch.unsqueeze(x, 1).shape
(tensor([[1], [2], [3], [4]]), torch.Size([4, 1]))
We can also achieve the same effect with view():
y = x.view(-1, 4) # same as torch.unsqueeze(x, 0)
y, y.shape
(tensor([[1, 2, 3, 4]]), torch.Size([1, 4]))
y = x.view(4, -1) # same as torch.unsqueeze(x, 1)
y, y.shape
(tensor([[1],
[2],
[3],
[4]]), torch.Size([4, 1]))
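Another equivalent shortcut (not used elsewhere in this post, but standard PyTorch/NumPy-style indexing) is to index with None, which inserts a new dimension of size 1 at that position:
x[None, :], x[None, :].shape  # same as torch.unsqueeze(x, 0) -> torch.Size([1, 4])
x[:, None], x[:, None].shape  # same as torch.unsqueeze(x, 1) -> torch.Size([4, 1])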
unsqueeze() is particularly useful when we feed a single sample to our neural network. PyTorch expects mini-batch input, say with dimensions [batch_size, channels, w, h]. However, a single sample only has dimensions [channels, w, h], which would cause a dimension error when we feed it to the network. We can use unsqueeze() to change its dimensions to [1, channels, w, h]. This mocks up a mini-batch with only one sample and won’t cause any error.
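For instance, here is a minimal sketch, assuming a hypothetical model that expects input of shape [batch_size, 3, 32, 32]:
import torch
import torch.nn as nn

# hypothetical model expecting a 4-D mini-batch [batch_size, 3, 32, 32]
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)

sample = torch.rand(3, 32, 32)   # a single sample: [channels, w, h]
batch = sample.unsqueeze(0)      # mock mini-batch: [1, channels, w, h]
model(batch).shape               # torch.Size([1, 10])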
Casting Tensor’s Type
Casting in PyTorch is as simple as typing the name of the type you wish to cast to, and treating it as a method.
Example
x = torch.ones([2, 3], dtype=torch.int32)
x
tensor([[1, 1, 1],
[1, 1, 1]], dtype=torch.int32)
y = x.float() # cast to float
y, y.dtype
(tensor([[1., 1., 1.],
[1., 1., 1.]]), torch.float32)
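The same cast can also be written with .to() or .type() (both are standard tensor methods), which is handy when the target dtype is stored in a variable:
x.to(torch.float32)    # same result as x.float()
x.type(torch.float32)  # older style, still works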
Require Gradient
To track the gradient of a tensor, its requires_grad attribute should be True. There are two ways to achieve this:
Create a tensor, then call requires_grad_()
Example:
import torch
x = torch.ones(2, 3)
x
tensor([[1., 1., 1.],
        [1., 1., 1.]])
x.requires_grad_()
x
tensor([[1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
Specify requires_grad=True when creating the tensor
y = torch.ones((2, 3), requires_grad=True)
y
tensor([[1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
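Once requires_grad is True, autograd records the operations on the tensor so that .backward() can fill in .grad. Continuing with the y defined above:
z = (y * 3).sum()  # a scalar built from y
z.backward()       # compute dz/dy
y.grad             # tensor of 3s, same shape as y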
Not Taking Gradient
During training, we will update the weights and biases based on the gradient and learning rate.
When we do so, we have to tell PyTorch not to take the gradient of this step too—otherwise things will get very confusing when we try to compute the derivative at the next batch!
Assign to tensor’s data attribute
If we assign to the data attribute of a tensor, then PyTorch will not take the gradient of that step.
E.g.
for p in model.parameters():
    p.data -= p.grad * lr
Wrap in torch.no_grad()
E.g.
with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
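Putting the pieces together, a hand-rolled SGD step typically looks like this (a minimal sketch, assuming model, loss_fn, xb, yb, and lr are already defined):
pred = model(xb)
loss = loss_fn(pred, yb)
loss.backward()            # gradients are accumulated into p.grad

with torch.no_grad():      # the update itself must not be tracked
    for p in model.parameters():
        p -= p.grad * lr
        p.grad.zero_()     # see "Zero the Gradient" below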
Enable/Disable gradient tracking dynamically
Use torch.set_grad_enabled(mode: bool)
(Documentation)
with torch.set_grad_enabled(flag):
    # do something
    pass
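For example, the same forward pass can run with or without gradient tracking depending on a boolean (a small sketch, assuming model and xb are defined elsewhere):
is_train = False
with torch.set_grad_enabled(is_train):
    out = model(xb)    # tracked only when is_train is True
out.requires_grad      # False here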
Zero the Gradient
In PyTorch, after updating the parameters using their gradients in one iteration, the gradients need to be zeroed. Otherwise they’ll be accumulated in the next iteration!
zero_()
E.g.
for p in model.parameters():
    p.grad.zero_()
Set grad to None
E.g.
for p in model.parameters():
    p.grad = None
Use optimizer.zero_grad()
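With a torch.optim optimizer, the idiomatic training step handles both the update and the zeroing (a minimal sketch, assuming optimizer, model, loss_fn, xb, and yb are defined):
optimizer.zero_grad()          # clear gradients from the previous iteration
loss = loss_fn(model(xb), yb)
loss.backward()                # populate p.grad for every parameter
optimizer.step()               # apply the update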
Running loss
loss.item() returns the average loss per sample within the batch (assuming the default reduction='mean'), i.e. the loss of the entire mini-batch divided by the batch size. Therefore, to accumulate the running loss over the mini-batches, we need to do:
running_loss += loss.item() * batch_size
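In an epoch loop this typically looks like the following (a small sketch, assuming model, loss_fn, and a DataLoader train_loader, with the default mean reduction):
running_loss = 0.0
for xb, yb in train_loader:
    loss = loss_fn(model(xb), yb)
    running_loss += loss.item() * xb.size(0)  # undo the per-sample averaging

epoch_loss = running_loss / len(train_loader.dataset)  # average loss per sample over the epoch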
Reference
Number of model’s parameters
Use model.parameters()
Sum the number of elements for every parameter group:
def get_num_params(model):
    return sum(p.numel() for p in model.parameters())
Calculate only the trainable parameters:
def get_num_trainable_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
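As a quick sanity check, a hypothetical tiny model whose count is easy to verify by hand:
import torch.nn as nn

tiny = nn.Linear(10, 2)  # weight: 10 * 2 = 20, bias: 2
get_num_params(tiny)     # 22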
Reference: How do I check the number of parameters of a model?
Use model.named_parameters()
To get the parameter count of each layer, PyTorch has model.named_parameters(), which returns an iterator over both the parameter name and the parameter itself.
Example
from prettytable import PrettyTable

def count_parameters(model):
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if parameter.requires_grad:
            param = parameter.numel()
            table.add_row([name, param])
            total_params += param
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params

count_parameters(net)
Reference: Check the total number of parameters in a PyTorch model
Split tensors
torch.split(): Splits the tensor into chunks of specified sizes.
torch.chunk(): Attempts to split a tensor into the specified number of chunks.
torch.split()
Split tensor evenly:
import torch
tensor = torch.rand((4, 6, 2, 2))
tensor.shape
torch.Size([4, 6, 2, 2])
tensor_split = tensor.split(1, dim=1) # split tensor into chunks of size 1 along dimension 1
print(f"#chunks: {len(tensor_split)}; chunk size: {tensor_split[0].shape}")
#chunks: 6; chunk size: torch.Size([4, 1, 2, 2])
You can also split the tensor into chunks of different specific sizes (i.e. not evenly):
chunk_sizes = [1, 2, 3]
tensor_split = tensor.split(chunk_sizes, dim=1) # split tensor into chunks of sizes 1, 2, and 3 along dimension 1
print(f"#chunks: {len(tensor_split)}")
for idx, chunk in enumerate(tensor_split):
print(f"Shape of chunk {idx}: {chunk.shape}")
#chunks: 3
Shape of chunk 0: torch.Size([4, 1, 2, 2])
Shape of chunk 1: torch.Size([4, 2, 2, 2])
Shape of chunk 2: torch.Size([4, 3, 2, 2])
torch.chunk()
Split a tensor into the specified number of chunks
tensor_chunks = tensor.chunk(3, dim=1) # split tensor into 3 chunks along dimension 1
print(f"#chunks: {len(tensor_chunks)}")
for idx, chunk in enumerate(tensor_chunks):
print(f"Shape of chunk {idx}: {chunk.shape}")
#chunks: 3
Shape of chunk 0: torch.Size([4, 2, 2, 2])
Shape of chunk 1: torch.Size([4, 2, 2, 2])
Shape of chunk 2: torch.Size([4, 2, 2, 2])
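Note that when the size along dim is not divisible by the requested number of chunks, torch.chunk() may return chunks of unequal size, and possibly fewer chunks than requested. Continuing with the same tensor of shape [4, 6, 2, 2]:
tensor_chunks = tensor.chunk(4, dim=1) # 6 is not divisible by 4, so each chunk gets ceil(6/4) = 2 channels
print(f"#chunks: {len(tensor_chunks)}")
for idx, chunk in enumerate(tensor_chunks):
    print(f"Shape of chunk {idx}: {chunk.shape}")
#chunks: 3
Shape of chunk 0: torch.Size([4, 2, 2, 2])
Shape of chunk 1: torch.Size([4, 2, 2, 2])
Shape of chunk 2: torch.Size([4, 2, 2, 2])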