PyTorch: Machine Learning Library

PyTorch is one of the modern, open-source deep learning libraries out there, and it is what we will use in this workshop. Other popular libraries include TensorFlow, Keras, MXNet, Spark ML, etc.

All of these libraries work very similarly in terms of implementing your neural network architecture. If you are new, any of PyTorch/Keras/TensorFlow would probably work well for you, with lots of guides, examples, and discussion forums available online! Common things you have to learn include:

  1. Data types (typically an arbitrary-dimension matrix, or tensor)

  2. Data loading tools (to streamline preparing data from input files into the appropriate types)

  3. Chaining operations = a computation graph

In this notebook, we cover the basics of each of the topics above.

1. Tensor data types in PyTorch

In PyTorch, we use the torch.Tensor object to represent a data matrix. It is a lot like a numpy array, but not quite the same: torch provides APIs to easily convert data between a numpy array and a torch.Tensor. Let’s play with it a little bit.

from __future__ import print_function
import numpy as np
import torch
SEED=123
np.random.seed(SEED)
torch.manual_seed(SEED)
<torch._C.Generator at 0x7fdd47da8fb0>

… yep, that’s how we set the PyTorch random number seed! (see Python-03-Numpy if you don’t know about a seed)
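
To convince ourselves that seeding really makes results reproducible, here is a small sanity check that is not part of the original notebook. It uses a separate torch.Generator so that the global seed state set above is left untouched.

# A quick reproducibility check using a dedicated random number generator
g = torch.Generator().manual_seed(SEED)
a = torch.randn(2,3,generator=g)
g.manual_seed(SEED)
b = torch.randn(2,3,generator=g)
print('identical?',torch.equal(a,b))   # True: same seed, same numbers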

Creating a torch.Tensor

PyTorch provides constructors similar to numpy (and named the same way where possible, so that users don’t have to look up new function names). Here are some examples.

# Tensor of 0s = numpy.zeros
t=torch.zeros(2,3)
print('torch.zeros:\n',t)

# Tensor of 1s = numpy.ones
t=torch.ones(2,3)
print('\ntorch.ones:\n',t)

# Tensor from sequential integers = numpy.arange
t=torch.arange(0,6,1).reshape(2,3).float()
print('\ntorch.arange:\n',t)

# Normal distribution centered at 0.0 and sigma=1.0 = numpy.random.randn
t=torch.randn(2,3)
print('\ntorch.randn:\n',t)
torch.zeros:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

torch.ones:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

torch.arange:
 tensor([[0., 1., 2.],
        [3., 4., 5.]])

torch.randn:
 tensor([[-0.1115,  0.1204, -0.3696],
        [-0.2404, -1.1969,  0.2093]])
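
As a small aside (not shown in the original notebook), these constructors also accept a dtype argument in case you want something other than the default 32-bit float.

# Explicitly requesting a data type
t64 = torch.zeros(2,3,dtype=torch.float64)
ti32 = torch.ones(2,3,dtype=torch.int32)
print(t64.dtype,ti32.dtype)   # torch.float64 torch.int32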

You can also create a tensor from a simple list, a tuple, or a numpy array.

# Create numpy array
data_np = np.zeros([10,10],dtype=np.float32)
# Fill something
np.fill_diagonal(data_np,1.)
print('Numpy data\n',data_np)

# Create torch.Tensor
data_torch = torch.Tensor(data_np)
print('\ntorch.Tensor data\n',data_torch)

# One can also make one from a list
data_list = [1,2,3]
data_list_torch = torch.Tensor(data_list)
print('\nPython list :',data_list)
print('torch.Tensor:',data_list_torch)
Numpy data
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

torch.Tensor data
 tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

Python list : [1, 2, 3]
torch.Tensor: tensor([1., 2., 3.])

Converting a torch.Tensor back into a numpy array can easily be done

# Bringing back into numpy array
data_np = data_torch.numpy()
print('\nNumpy data (converted back from torch.Tensor)\n',data_np)
Numpy data (converted back from torch.Tensor)
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

Ordinary array operations also exist, just like in numpy.

# mean & std
print('mean',data_torch.mean(),'std',data_torch.std(),'sum',data_torch.sum())
mean tensor(0.1000) std tensor(0.3015) sum tensor(10.)

We see that the return values of these functions (mean, std, sum) are tensor objects. If you would like a single scalar value, you can call the item function.

# mean & std
print('mean',data_torch.mean().item(),'std',data_torch.std().item(),'sum',data_torch.sum().item())
mean 0.10000000149011612 std 0.30151134729385376 sum 10.0
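
These reductions can also be taken along a specific dimension by passing a dim argument, which is handy when you want per-row or per-column statistics. A quick illustration (not in the original notebook) using the identity matrix from above:

# Reduce along one dimension instead of over the whole tensor
print('column sums:',data_torch.sum(dim=0))    # each column of the identity sums to 1
print('row means  :',data_torch.mean(dim=1))   # each row of the identity averages to 0.1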

Tensor addition and multiplication

Common operations include element-wise multiplication, matrix multiplication, and reshaping. Read the documentation to find the right function for what you want to do!

# Two matrices 
data_a = np.zeros([3,3],dtype=np.float32)
data_b = np.zeros([3,3],dtype=np.float32)
np.fill_diagonal(data_a,1.)
data_b[0,:]=1.
# print them
print('Two numpy matrices')
print(data_a)
print(data_b,'\n')

# Make torch.Tensor
torch_a = torch.Tensor(data_a)
torch_b = torch.Tensor(data_b)

print('torch.Tensor element-wise multiplication:')
print(torch_a*torch_b)

print('\ntorch.Tensor matrix multiplication:')
print(torch_a.matmul(torch_b))

print('\ntorch.Tensor matrix subtraction:')
print(torch_a-torch_b)

print('\nadding a scalar 1:')
print(torch_a+1)
Two numpy matrices
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 1. 1.]
 [0. 0. 0.]
 [0. 0. 0.]] 

torch.Tensor element-wise multiplication:
tensor([[1., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

torch.Tensor matrix multiplication:
tensor([[1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.]])

torch.Tensor matrix subtraction:
tensor([[ 0., -1., -1.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])

adding a scalar 1:
tensor([[2., 1., 1.],
        [1., 2., 1.],
        [1., 1., 2.]])

Reshaping

You can access the tensor shape via the .shape attribute, just like in numpy

print('torch_a shape:',torch_a.shape)
print('The 0th dimension size:',torch_a.shape[0])
torch_a shape: torch.Size([3, 3])
The 0th dimension size: 3

Similarly, there is a reshape function

torch_a.reshape(1,9).shape
torch.Size([1, 9])

… and you can also use -1 in the same way you would with numpy

torch_a.reshape(-1,3).shape
torch.Size([3, 3])

Indexing (Slicing)

We can use a similar indexing trick to the one we tried with a numpy array

torch_a[0,:]
tensor([1., 0., 0.])

or generate a boolean mask

mask = torch_a == 0.
mask
tensor([[False,  True,  True],
        [ True, False,  True],
        [ True,  True, False]])

… and slice with it using the masked_select function

torch_a.masked_select(~mask)
tensor([1., 1., 1.])
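
As a short aside (not in the original notebook), you can also index directly with a boolean mask, much like numpy fancy indexing, and even assign through it. Here we clone torch_a first so the original tensor is left unchanged.

# Boolean indexing returns the same flattened selection as masked_select
print(torch_a[~mask])      # tensor([1., 1., 1.])

# ... and can be used for in-place assignment on a copy
tmp = torch_a.clone()
tmp[mask] = -1.
print(tmp)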

2. Data loading tools in Pytorch

In Python-02-Python, we covered an iterable class and how it can be useful for generalizing the design of data access tools. PyTorch (like any other ML library out there) provides a generalized tool to interface with such iterable data instances, called DataLoader. Desired capabilities of such a tool include the ability to choose a random vs. ordered subset of the data, parallelized workers to simultaneously prepare multiple batches of data, etc.

Let’s practice the use of DataLoader.

First, we define the same iterable class mentioned in the Python-02-Python notebook.

class dataset:
    
    def __init__(self):
        self._data = tuple(range(100))
        
    def __len__(self):
        return len(self._data)
    
    def __getitem__(self,index):
        return self._data[index]
    
data = dataset()

Here is how you can instantiate a DataLoader. We construct an instance called loader that automatically packs 10 elements of data into each batch (batch_size=10), selected at random (shuffle=True), using 1 parallel worker to prepare the data (num_workers=1).

from torch.utils.data import DataLoader
loader = DataLoader(data,batch_size=10,shuffle=True,num_workers=1)

The DataLoader itself is an iterable object. We created it with a batch size of 10, and the dataset instance has a length of 100. This means that, if we iterate over the DataLoader instance, we get 10 separate batches of data.

for index, batch_data in enumerate(loader):
    print('Batch entry',index,'... batch data',batch_data)
Batch entry 0 ... batch data tensor([23, 14, 64, 51, 94, 25, 38, 44, 70, 28])
Batch entry 1 ... batch data tensor([37, 57, 66, 43, 53, 13, 72, 48, 74, 62])
Batch entry 2 ... batch data tensor([89,  3, 40, 92, 86, 65, 63, 95, 21, 97])
Batch entry 3 ... batch data tensor([ 9, 42, 45, 54, 31, 87, 99, 46, 98, 26])
Batch entry 4 ... batch data tensor([41, 80, 36, 90,  0, 59, 52, 69, 17, 56])
Batch entry 5 ... batch data tensor([16, 61, 82, 30, 77, 73, 96, 33,  6, 83])
Batch entry 6 ... batch data tensor([39,  5, 24, 32, 85, 35, 50, 60,  1, 78])
Batch entry 7 ... batch data tensor([18,  2, 71,  7, 34, 20, 49, 10,  8, 84])
Batch entry 8 ... batch data tensor([76, 93, 12, 81, 22, 55,  4, 19, 11, 27])
Batch entry 9 ... batch data tensor([29, 15, 47, 88, 75, 68, 67, 58, 79, 91])

We can see that data elements are chosen randomly since we set shuffle=True. Does this cover all data elements in the dataset? Let’s check by combining all of the iterated data.

data_collection = []
for index,batch_data in enumerate(loader):
    data_collection += [int(v) for v in batch_data]
    
import numpy as np
np.unique(data_collection)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

This covers the minimal DataLoader concepts you need to know in order to follow the workshop. You can read more about DataLoader in the PyTorch documentation here, and find a more extended example in their tutorial, if you are interested in exploring it yourself.
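
In a real training workflow, a dataset typically returns a (data, label) pair, and the DataLoader batches each component for you. Below is a minimal sketch of that pattern; the class name labeled_dataset and the even/odd label are made up for illustration, and a dedicated generator is passed so the global random seed state is not disturbed.

class labeled_dataset:

    def __init__(self):
        self._data  = tuple(range(100))
        # a made-up label: 1 if the value is even, 0 otherwise
        self._label = tuple(int(v % 2 == 0) for v in self._data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self,index):
        return self._data[index], self._label[index]

loader = DataLoader(labeled_dataset(),batch_size=10,shuffle=True,num_workers=1,
                    generator=torch.Generator().manual_seed(SEED))
batch_data, batch_label = next(iter(loader))
print('data :',batch_data)
print('label:',batch_label)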

3. Computation graph

The last point to cover is how to chain modularized mathematical operations.

To get started, let’s introduce a few commonly used mathematical operations in PyTorch.

  • torch.nn.ReLU (link) … a function that takes an input tensor and outputs a tensor of the same shape, where each element is 0 if the corresponding input element is below 0, and is otherwise unchanged.

  • torch.nn.Softmax (link) … a function that applies a softmax along the specified dimension of the input data.

  • torch.nn.MaxPool2d (link) … a function that down-samples the input matrix by taking the maximum value over sub-matrices of a specified shape.

Let’s first see what each of these functions does, using a simple 2D matrix.

# Create a tensor of shape (1,5,5) with some negative and positive values
data = torch.randn(25).reshape(1,5,5)
data
tensor([[[ 1.5810,  1.3010,  1.2753, -0.2010, -0.1606],
         [-0.4015,  0.6957, -1.8061, -1.1589, -0.4210],
         [-0.9620,  1.2825,  0.8768,  1.6221, -1.4779],
         [ 1.1331, -1.2203, -1.1285,  0.4135,  0.2892],
         [ 2.2473, -0.8036, -0.2808,  0.7697, -0.6596]]])

Here’s how ReLU works

op0 = torch.nn.ReLU()
op0(data)
tensor([[[1.5810, 1.3010, 1.2753, 0.0000, 0.0000],
         [0.0000, 0.6957, 0.0000, 0.0000, 0.0000],
         [0.0000, 1.2825, 0.8768, 1.6221, 0.0000],
         [1.1331, 0.0000, 0.0000, 0.4135, 0.2892],
         [2.2473, 0.0000, 0.0000, 0.7697, 0.0000]]])

Here’s how Softmax works

op1 = torch.nn.Softmax(dim=2)
op1(data)
tensor([[[0.3526, 0.2665, 0.2597, 0.0593, 0.0618],
         [0.1757, 0.5264, 0.0431, 0.0824, 0.1723],
         [0.0327, 0.3086, 0.2057, 0.4334, 0.0195],
         [0.4725, 0.0449, 0.0492, 0.2301, 0.2032],
         [0.7093, 0.0336, 0.0566, 0.1618, 0.0388]]])

Here’s how MaxPool2d works with a kernel shape (1,5)

op2 = torch.nn.MaxPool2d(kernel_size=(1,5))
op2(data)
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
tensor([[[1.5810],
         [0.6957],
         [1.6221],
         [1.1331],
         [2.2473]]])

So if we want to define a computation graph that applies these operations in a sequential order, we could try:

op2(op1(op0(data)))
tensor([[[0.3444],
         [0.3339],
         [0.3874],
         [0.3905],
         [0.6472]]])

PyTorch provides tools called containers to make this easy. Let’s try torch.nn.Sequential (see the different types of containers here).

myop = torch.nn.Sequential(op0,op1,op2)
myop(data)
tensor([[[0.3444],
         [0.3339],
         [0.3874],
         [0.3905],
         [0.6472]]])

We might wonder, “Can I add a custom operation to this graph?” Yes, we can add any module that inherits from the torch.nn.Module class. Let’s define one ourselves.

class AddOne(torch.nn.Module):

    # Always call the base class constructor when defining your torch.nn.Module subclass!
    def __init__(self):
        super().__init__()
        
    # forward must be defined. It is invoked when the module instance is called with "()".
    def forward(self,input):
        
        return input + 1

Now let’s add our operation

myop = torch.nn.Sequential(op0,op1,op2,AddOne())
myop(data)
tensor([[[1.3444],
         [1.3339],
         [1.3874],
         [1.3905],
         [1.6472]]])

Of course, you can also embed op0, op1, and op2 inside one module.

class MyOp(torch.nn.Module):
    
    def __init__(self):
        super().__init__()
        self._sequence = torch.nn.Sequential(torch.nn.ReLU(), 
                                             torch.nn.Softmax(dim=2), 
                                             torch.nn.MaxPool2d(kernel_size=(1,5)),
                                             AddOne(),
                                            )
        
    def forward(self,input):
        
        return self._sequence(input)

Let’s try using it.

myop = MyOp()
myop(data)
tensor([[[1.3444],
         [1.3339],
         [1.3874],
         [1.3905],
         [1.6472]]])
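
One practical reason to organize operations this way: any sub-module with learnable weights (for example a torch.nn.Linear layer, which is not used elsewhere in this notebook) is automatically registered, so its parameters show up via parameters() for an optimizer to train later. A small sketch under that assumption:

class MyOpWithWeights(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self._sequence = torch.nn.Sequential(torch.nn.Linear(5,5),
                                             torch.nn.ReLU(),
                                            )

    def forward(self,input):
        return self._sequence(input)

net = MyOpWithWeights()
print('parameter tensors:',len(list(net.parameters())))   # 2 (the Linear weight and bias)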

Extra: GPU acceleration

This section only works if you run this notebook on a GPU-enabled machine (not on the binder unfortunately)

Putting a torch.Tensor on a GPU is as easy as calling the .cuda() function (and if you want to bring it back to the CPU, call .cpu() on the CUDA tensor). Let’s do a simple speed comparison.
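
If you want code that runs whether or not a GPU is present, a common pattern (a sketch beyond what this notebook shows) is to pick a torch.device once and move tensors with .to(), which is equivalent to .cuda()/.cpu() but guarded by torch.cuda.is_available().

# Device-agnostic placement: falls back to the CPU when no GPU is present
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t = torch.zeros(2,3).to(device)
print('tensor lives on:',t.device)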

Create two arrays with an identical data type, shape, and values.

# Create 1000x1000 matrix
data_np=np.zeros([1000,1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()
#data_gpu = torch.Tensor(data_np).cuda()

Time the fifth power of the matrix on the CPU

%%timeit
mean = (data_cpu ** 5).mean().item()
6.09 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

… and next on GPU

%%timeit
mean = (data_gpu ** 5).mean().item()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_46365/2823993350.py in <module>
----> 1 get_ipython().run_cell_magic('timeit', '', 'mean = (data_gpu ** 5).mean().item()\n')

/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2401             with self.builtin_trap:
   2402                 args = (magic_arg_s, cell)
-> 2403                 result = fn(*args, **kwargs)
   2404             return result
   2405 

/usr/local/lib/python3.8/dist-packages/decorator.py in fun(*args, **kw)
    230             if not kwsyntax:
    231                 args, kw = fix(args, kw, sig)
--> 232             return caller(func, *(extras + args), **kw)
    233     fun.__name__ = func.__name__
    234     fun.__doc__ = func.__doc__

/usr/local/lib/python3.8/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/usr/local/lib/python3.8/dist-packages/IPython/core/magics/execution.py in timeit(self, line, cell, local_ns)
   1167             for index in range(0, 10):
   1168                 number = 10 ** index
-> 1169                 time_number = timer.timeit(number)
   1170                 if time_number >= 0.2:
   1171                     break

/usr/local/lib/python3.8/dist-packages/IPython/core/magics/execution.py in timeit(self, number)
    167         gc.disable()
    168         try:
--> 169             timing = self.inner(it, self.timer)
    170         finally:
    171             if gcold:

<magic-timeit> in inner(_it, _timer)

NameError: name 'data_gpu' is not defined

… which, on a GPU-enabled machine, is more than 10x faster than the CPU counterpart :) (here the cell fails with a NameError because data_gpu was never created on this CPU-only run).

But there’s a catch you should be aware of! Preparing data on the GPU takes extra time because the data first needs to be transferred to the GPU. Let’s compare the time it takes to create a tensor on the CPU vs. the GPU.

%%timeit
data_np=np.zeros([1000,1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()
165 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
#data_np=np.zeros([1000,1000],dtype=np.float32)
#data_gpu = torch.Tensor(data_np).cuda()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_46365/3126707622.py in <module>
----> 1 get_ipython().run_cell_magic('timeit', '', '#data_np=np.zeros([1000,1000],dtype=np.float32)\n#data_gpu = torch.Tensor(data_np).cuda()\n')

/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2401             with self.builtin_trap:
   2402                 args = (magic_arg_s, cell)
-> 2403                 result = fn(*args, **kwargs)
   2404             return result
   2405 

/usr/local/lib/python3.8/dist-packages/decorator.py in fun(*args, **kw)
    230             if not kwsyntax:
    231                 args, kw = fix(args, kw, sig)
--> 232             return caller(func, *(extras + args), **kw)
    233     fun.__name__ = func.__name__
    234     fun.__doc__ = func.__doc__

/usr/local/lib/python3.8/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/usr/local/lib/python3.8/dist-packages/IPython/core/magics/execution.py in timeit(self, line, cell, local_ns)
   1144 
   1145         t0 = clock()
-> 1146         code = self.shell.compile(timeit_ast, "<magic-timeit>", "exec")
   1147         tc = clock()-t0
   1148 

/usr/lib/python3.8/codeop.py in __call__(self, source, filename, symbol)
    141 
    142     def __call__(self, source, filename, symbol):
--> 143         codeob = compile(source, filename, symbol, self.flags, 1)
    144         for feature in _features:
    145             if codeob.co_flags & feature.compiler_flag:

ValueError: empty body on For

As you can see on a GPU-enabled machine (here the timed cell errors out because its body is entirely commented out), it takes nearly 10 times longer to create this particular data tensor on the GPU. This speed depends on many factors, including your hardware configuration (e.g. CPU-GPU communication via PCIe or NVLink). It makes sense to move a computation onto the GPU only when it takes longer than this data-transfer time.
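
In other words, transfer the data once and reuse it for many operations so that the one-time copy cost is amortized. A rough sketch of that pattern (only meaningful on a GPU-enabled machine):

# Pay the CPU->GPU transfer cost once, then run many operations on the device
if torch.cuda.is_available():
    data_gpu = torch.Tensor(data_np).cuda()   # one transfer
    for _ in range(100):                      # many cheap GPU operations
        mean = (data_gpu ** 5).mean().item()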