🤔 PyTorch Understanding
squeeze and unsqueeze
Simply put, unsqueeze() “adds” a superficial dimension of size 1 to a tensor (at the specified position), while squeeze() removes all superficial dimensions of size 1 from a tensor.
squeeze
torch.squeeze(input, dim=None, out=None) → Tensor
Documentation: torch.squeeze
- Returns a tensor with all the dimensions of input of size 1 removed. For example, if input is of shape $(A \times 1 \times B \times C \times 1 \times D)$, then the output tensor will be of shape $(A \times B \times C \times D)$.
- When dim is given, a squeeze operation is done only in the given dimension. If input is of shape $(A \times 1 \times B)$, squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape $(A \times B)$.
Example
>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()  # alternative: x.shape
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)  # specify dimension
>>> y.size()
torch.Size([2, 2, 1, 2])
unsqueeze
torch.unsqueeze(input, dim) → Tensor
Documentation: torch.unsqueeze
Returns a new tensor with a dimension of size one inserted at the specified position.
Example
x = torch.tensor([1, 2, 3, 4])
x.shape
torch.Size([4])
torch.unsqueeze(x, 0), torch.unsqueeze(x, 0).shape
(tensor([[1, 2, 3, 4]]), torch.Size([1, 4]))
torch.unsqueeze(x, 1), torch.unsqueeze(x, 1).shape
(tensor([[1], [2], [3], [4]]), torch.Size([4, 1]))
We can also achieve the same effect with view():
y = x.view(-1, 4) # same as torch.unsqueeze(x, 0)
y, y.shape
(tensor([[1, 2, 3, 4]]), torch.Size([1, 4]))
y = x.view(4, -1) # same as torch.unsqueeze(x, 1)
y, y.shape
(tensor([[1],
[2],
[3],
[4]]), torch.Size([4, 1]))
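Another equivalent shortcut (not used elsewhere in this post, but standard PyTorch/NumPy-style indexing) is to index with None, which inserts a new dimension of size 1 at that position:
x[None, :], x[None, :].shape  # same as torch.unsqueeze(x, 0) -> torch.Size([1, 4])
x[:, None], x[:, None].shape  # same as torch.unsqueeze(x, 1) -> torch.Size([4, 1])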
unsqueeze() is particularly useful when we feed a single sample to our neural network. PyTorch expects mini-batch input, say with dimensions [batch_size, channels, w, h]. However, a single sample only has dimensions [channels, w, h], which would cause a dimension error when we feed it to the network. We can use unsqueeze() to change its dimensions to [1, channels, w, h]. This mocks up a mini-batch with only one sample and won’t cause any error.
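For instance, here is a minimal sketch, assuming a hypothetical model that expects input of shape [batch_size, 3, 32, 32]:
import torch
import torch.nn as nn

# hypothetical model expecting a 4-D mini-batch [batch_size, 3, 32, 32]
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)

sample = torch.rand(3, 32, 32)   # a single sample: [channels, w, h]
batch = sample.unsqueeze(0)      # mock mini-batch: [1, channels, w, h]
model(batch).shape               # torch.Size([1, 10])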
Casting Tensor’s Type
Casting in PyTorch is as simple as typing the name of the type you wish to cast to, and treating it as a method.
Example
x = torch.ones([2, 3], dtype=torch.int32)
x
tensor([[1, 1, 1],
[1, 1, 1]], dtype=torch.int32)
y = x.float() # cast to float
y, y.dtype
(tensor([[1., 1., 1.],
[1., 1., 1.]]), torch.float32)
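The same cast can also be written with .to() or .type() (both are standard tensor methods), which is handy when the target dtype is stored in a variable:
x.to(torch.float32)    # same result as x.float()
x.type(torch.float32)  # older style, still works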
Require Gradient
To track the gradient of a tensor, its requires_grad attribute should be True. There are two ways to achieve this:
Create a tensor, then call requires_grad_()
Example:
import torch
x = torch.ones(2, 3)
x
tensor([[1., 1., 1.],
        [1., 1., 1.]])
x.requires_grad_()
x
tensor([[1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
Specify requires_grad=True when creating the tensor
y = torch.ones((2, 3), requires_grad=True)
y
tensor([[1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
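Once requires_grad is True, autograd records the operations on the tensor so that .backward() can fill in .grad. Continuing with the y defined above:
z = (y * 3).sum()  # a scalar built from y
z.backward()       # compute dz/dy
y.grad             # tensor of 3s, same shape as y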
Not Taking Gradient
During training, we will update the weights and biases based on the gradient and learning rate.
When we do so, we have to tell PyTorch not to take the gradient of this step too—otherwise things will get very confusing when we try to compute the derivative at the next batch!
Assign to tensor’s data attribute
If we assign to the data attribute of a tensor, then PyTorch will not take the gradient of that step.
E.g.
for p in model.parameters():
    p.data -= p.grad * lr
Wrap in torch.no_grad()
E.g.
with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr
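Putting the pieces together, a hand-rolled SGD step typically looks like this (a minimal sketch, assuming model, loss_fn, xb, yb, and lr are already defined):
pred = model(xb)
loss = loss_fn(pred, yb)
loss.backward()            # gradients are accumulated into p.grad

with torch.no_grad():      # the update itself must not be tracked
    for p in model.parameters():
        p -= p.grad * lr
        p.grad.zero_()     # see "Zero the Gradient" below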
Enable/Disable gradient tracking dynamically
Use torch.set_grad_enabled(mode: bool)
(Documentation)
with torch.set_grad_enabled(flag):
    # do something
    pass
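For example, the same forward pass can run with or without gradient tracking depending on a boolean (a small sketch, assuming model and xb are defined elsewhere):
is_train = False
with torch.set_grad_enabled(is_train):
    out = model(xb)    # tracked only when is_train is True
out.requires_grad      # False here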
Zero the Gradient
In PyTorch, after updating the parameters using their gradients in one iteration, the gradients need to be zeroed. Otherwise they’ll be accumulated in the next iteration!
zero_()
E.g.
for p in model.parameters():
    p.grad.zero_()
Set grad to None
E.g.
for p in model.parameters():
    p.grad = None
Use optimizer.zero_grad()
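With a torch.optim optimizer, the idiomatic training step handles both the update and the zeroing (a minimal sketch, assuming optimizer, model, loss_fn, xb, and yb are defined):
optimizer.zero_grad()          # clear gradients from the previous iteration
loss = loss_fn(model(xb), yb)
loss.backward()                # populate p.grad for every parameter
optimizer.step()               # apply the update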
Running loss
loss.item() returns the average loss per sample within the batch (assuming the default reduction='mean'), i.e. the loss of the entire mini-batch divided by the batch size. Therefore, to accumulate the running loss over the mini-batches, we need to do:
running_loss += loss.item() * batch_size
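In an epoch loop this typically looks like the following (a small sketch, assuming model, loss_fn, and a DataLoader train_loader, with the default mean reduction):
running_loss = 0.0
for xb, yb in train_loader:
    loss = loss_fn(model(xb), yb)
    running_loss += loss.item() * xb.size(0)  # undo the per-sample averaging

epoch_loss = running_loss / len(train_loader.dataset)  # average loss per sample over the epoch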
Reference
Number of model’s parameters
Use model.parameters()
Sum the number of elements for every parameter group:
def get_num_params(model):
    return sum(p.numel() for p in model.parameters())
Calculate only the trainable parameters:
def get_num_trainable_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
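As a quick sanity check, a hypothetical tiny model whose count is easy to verify by hand:
import torch.nn as nn

tiny = nn.Linear(10, 2)  # weight: 10 * 2 = 20, bias: 2
get_num_params(tiny)     # 22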
Reference: How do I check the number of parameters of a model?
Use model.named_parameters()
To get the parameter count of each layer, PyTorch has model.named_parameters(), which returns an iterator over both the parameter name and the parameter itself.
Example
from prettytable import PrettyTable

def count_parameters(model):
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if parameter.requires_grad:
            param = parameter.numel()
            table.add_row([name, param])
            total_params += param
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params

count_parameters(net)
Reference: Check the total number of parameters in a PyTorch model
Split tensors
torch.split(): Splits the tensor into chunks of specified sizes.
torch.chunk(): Attempts to split a tensor into the specified number of chunks.
torch.split()
Split tensor evenly:
import torch
tensor = torch.rand((4, 6, 2, 2))
tensor.shape
torch.Size([4, 6, 2, 2])
tensor_split = tensor.split(1, dim=1) # split tensor into chunks of size 1 along dimension 1
print(f"#chunks: {len(tensor_split)}; chunk size: {tensor_split[0].shape}")
#chunks: 6; chunk size: torch.Size([4, 1, 2, 2])
You can also split the tensor into chunks of different specific sizes (i.e. not evenly):
chunk_sizes = [1, 2, 3]
tensor_split = tensor.split(chunk_sizes, dim=1) # split tensor into chunks of sizes 1, 2, and 3 along dimension 1
print(f"#chunks: {len(tensor_split)}")
for idx, chunk in enumerate(tensor_split):
print(f"Shape of chunk {idx}: {chunk.shape}")
#chunks: 3
Shape of chunk 0: torch.Size([4, 1, 2, 2])
Shape of chunk 1: torch.Size([4, 2, 2, 2])
Shape of chunk 2: torch.Size([4, 3, 2, 2])
torch.chunk()
Split a tensor into the specified number of chunks
tensor_chunks = tensor.chunk(3, dim=1) # split tensor into 3 chunks along dimension 1
print(f"#chunks: {len(tensor_chunks)}")
for idx, chunk in enumerate(tensor_chunks):
print(f"Shape of chunk {idx}: {chunk.shape}")
#chunks: 3
Shape of chunk 0: torch.Size([4, 2, 2, 2])
Shape of chunk 1: torch.Size([4, 2, 2, 2])
Shape of chunk 2: torch.Size([4, 2, 2, 2])
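Note that when the size along dim is not divisible by the requested number of chunks, torch.chunk() may return chunks of unequal size, and possibly fewer chunks than requested. Continuing with the same tensor of shape [4, 6, 2, 2]:
tensor_chunks = tensor.chunk(4, dim=1) # 6 is not divisible by 4, so each chunk gets ceil(6/4) = 2 channels
print(f"#chunks: {len(tensor_chunks)}")
for idx, chunk in enumerate(tensor_chunks):
    print(f"Shape of chunk {idx}: {chunk.shape}")
#chunks: 3
Shape of chunk 0: torch.Size([4, 2, 2, 2])
Shape of chunk 1: torch.Size([4, 2, 2, 2])
Shape of chunk 2: torch.Size([4, 2, 2, 2])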