Pytorch: Machine Learning library

Pytorch is one of the modern, open-source deep learning libraries out there, and it is what we will use in this workshop. Other popular libraries include Tensorflow, Keras, MXNet, Spark ML, etc.

All of those libraries work very similarly in terms of how you implement your neural network architecture. If you are new, any of Pytorch/Keras/Tensorflow would likely work well, with lots of guidance/examples/discussion forums online! Common things you have to learn include:

  1. Data types (typically an arbitrary-dimension matrix, or tensor)

  2. Data loading tools (to streamline prepping data into the appropriate types from input files)

  3. Chaining operations = a computation graph

In this notebook, we cover the basics of each of the topics above.

1. Tensor data types in PyTorch

In pytorch, we use the torch.Tensor object to represent a data matrix. It is a lot like a numpy array, but not quite the same. torch provides APIs to easily convert data between a numpy array and a torch.Tensor. Let’s play a little bit.

from __future__ import print_function
import numpy as np
import torch
SEED=123
np.random.seed(SEED)
torch.manual_seed(SEED)
<torch._C.Generator at 0x7fdd47da8fb0>

… yep, that’s how we set the pytorch random number seed! (see Python-03-Numpy if you don’t know about a seed)

Creating a torch.Tensor

Pytorch provides constructors similar to numpy (and named the same way where possible, to avoid users having to look up function names). Here are some examples.

# Tensor of 0s = numpy.zeros
t=torch.zeros(2,3)
print('torch.zeros:\n',t)

# Tensor of 1s = numpy.ones
t=torch.ones(2,3)
print('\ntorch.ones:\n',t)

# Tensor from sequential integers = numpy.arange
t=torch.arange(0,6,1).reshape(2,3).float()
print('\ntorch.arange:\n',t)

# Normal distribution centered at 0.0 with sigma=1.0 = numpy.random.randn
t=torch.randn(2,3)
print('\ntorch.randn:\n',t)
torch.zeros:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

torch.ones:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

torch.arange:
 tensor([[0., 1., 2.],
        [3., 4., 5.]])

torch.randn:
 tensor([[-0.1115,  0.1204, -0.3696],
        [-0.2404, -1.1969,  0.2093]])
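
All of these constructors also accept a dtype argument if you need something other than the default 32-bit float. A quick sketch:

# Specify the data type explicitly (the default is torch.float32)
t=torch.zeros(2,3,dtype=torch.float64)
print(t.dtype)   # torch.float64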

… or you can create one from a simple list, a tuple, or a numpy array.

# Create numpy array
data_np = np.zeros([10,10],dtype=np.float32)
# Fill something
np.fill_diagonal(data_np,1.)
print('Numpy data\n',data_np)

# Create torch.Tensor
data_torch = torch.Tensor(data_np)
print('\ntorch.Tensor data\n',data_torch)

# One can make also from a list
data_list = [1,2,3]
data_list_torch = torch.Tensor(data_list)
print('\nPython list :',data_list)
print('torch.Tensor:',data_list_torch)
Numpy data
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

torch.Tensor data
 tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

Python list : [1, 2, 3]
torch.Tensor: tensor([1., 2., 3.])

Converting back from a torch.Tensor to a numpy array can be done easily

# Bringing back into numpy array
data_np = data_torch.numpy()
print('\nNumpy data (converted back from torch.Tensor)\n',data_np)
Numpy data (converted back from torch.Tensor)
 [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
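
A note on memory: torch.from_numpy creates a tensor that shares memory with the numpy array (no copy), while torch.tensor always copies; likewise, calling .numpy() on a CPU tensor returns an array that shares memory with the tensor. A minimal sketch:

# torch.from_numpy shares memory with the numpy array; torch.tensor makes a copy
arr = np.arange(4,dtype=np.float32)
shared = torch.from_numpy(arr)
copied = torch.tensor(arr)
arr[0] = 100.
print(shared)   # tensor([100., 1., 2., 3.]) ... reflects the change to arr
print(copied)   # tensor([0., 1., 2., 3.])   ... unaffected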

Ordinary operations on an array also exist, just like in numpy.

# mean & std
print('mean',data_torch.mean(),'std',data_torch.std(),'sum',data_torch.sum())
mean tensor(0.1000) std tensor(0.3015) sum tensor(10.)

We see that the return values of those functions (mean, std, sum) are tensor objects. If you would like a single scalar value, you can call the item function.

# mean & std
print('mean',data_torch.mean().item(),'std',data_torch.std().item(),'sum',data_torch.sum().item())
mean 0.10000000149011612 std 0.30151134729385376 sum 10.0

Tensor addition and multiplication

Common operations include element-wise multiplication, matrix multiplication, and reshaping. Read the documentation to find the right function for what you want to do!

# Two matrices 
data_a = np.zeros([3,3],dtype=np.float32)
data_b = np.zeros([3,3],dtype=np.float32)
np.fill_diagonal(data_a,1.)
data_b[0,:]=1.
# print them
print('Two numpy matrices')
print(data_a)
print(data_b,'\n')

# Make torch.Tensor
torch_a = torch.Tensor(data_a)
torch_b = torch.Tensor(data_b)

print('torch.Tensor element-wise multiplication:')
print(torch_a*torch_b)

print('\ntorch.Tensor matrix multiplication:')
print(torch_a.matmul(torch_b))

print('\ntorch.Tensor matrix subtraction:')
print(torch_a-torch_b)

print('\nadding a scalar 1:')
print(torch_a+1)
Two numpy matrices
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 1. 1.]
 [0. 0. 0.]
 [0. 0. 0.]] 

torch.Tensor element-wise multiplication:
tensor([[1., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

torch.Tensor matrix multiplication:
tensor([[1., 1., 1.],
        [0., 0., 0.],
        [0., 0., 0.]])

torch.Tensor matrix subtraction:
tensor([[ 0., -1., -1.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])

adding a scalar 1:
tensor([[2., 1., 1.],
        [1., 2., 1.],
        [1., 1., 2.]])
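
Broadcasting also works like it does in numpy, beyond just adding a scalar; for example, a length-3 vector is broadcast across each row of a 3x3 matrix. A quick sketch:

# broadcasting: the (3,) vector is added to every row of the (3,3) matrix
row = torch.Tensor([10.,20.,30.])
print(torch_a + row)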

Reshaping

You can access the tensor shape via the .shape attribute, just like in numpy

print('torch_a shape:',torch_a.shape)
print('The 0th dimension size:',torch_a.shape[0])
torch_a shape: torch.Size([3, 3])
The 0th dimension size: 3

Similarly, there is a reshape function

torch_a.reshape(1,9).shape
torch.Size([1, 9])

… and you can also use -1 in the same way you would in numpy

torch_a.reshape(-1,3).shape
torch.Size([3, 3])
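
A closely related function is view, which also changes the shape but never copies the data; it returns a view that shares memory with the original tensor (reshape returns a view when it can, and copies otherwise). A minimal sketch:

# view returns a tensor that shares memory with the original (no copy)
base = torch.zeros(6)
v = base.view(2,3)
v[0,0] = 1.
print(base)   # tensor([1., 0., 0., 0., 0., 0.]) ... the change is visible through base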

Indexing (Slicing)

We can use the same indexing tricks we tried with a numpy array

torch_a[0,:]
tensor([1., 0., 0.])

or generate a boolean mask

mask = torch_a == 0.
mask
tensor([[False,  True,  True],
        [ True, False,  True],
        [ True,  True, False]])

… and slice with it using the masked_select function

torch_a.masked_select(~mask)
tensor([1., 1., 1.])
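
Like in numpy, you can also index directly with a boolean mask, which can be used both to select elements (equivalent to masked_select) and to assign to them. A quick sketch:

# boolean-mask indexing: selection and assignment
print(torch_a[~mask])   # tensor([1., 1., 1.]) ... same as torch_a.masked_select(~mask)
tmp = torch_a.clone()   # work on a copy so torch_a is untouched
tmp[mask] = -1.         # set all off-diagonal elements to -1
print(tmp)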

2. Data loading tools in Pytorch

In Python-02-Python, we covered an iterable class and how it can be useful for generalizing the design of data access tools. Pytorch (like other ML libraries out there) provides a generalized tool, called DataLoader, to interface with such an iterable data instance. Desired capabilities of such a tool include the ability to choose a random vs. ordered subset of the data, parallelized workers to simultaneously prepare multiple batches of data, etc.

Let’s practice the use of DataLoader.

First, we define the same iterable class mentioned in the Python-02-Python notebook.

class dataset:
    
    def __init__(self):
        self._data = tuple(range(100))
        
    def __len__(self):
        return len(self._data)
    
    def __getitem__(self,index):
        return self._data[index]
    
data = dataset()

Here is how you can instantiate a DataLoader. We construct an instance called loader that automatically packs 10 randomly selected elements of data into each batch (batch_size=10, shuffle=True), using 1 parallel worker to prepare the data (num_workers=1).

from torch.utils.data import DataLoader
loader = DataLoader(data,batch_size=10,shuffle=True,num_workers=1)

The dataloader itself is an iterable object. We created a dataloader with batch size 10, and the dataset instance has length 100. This means that, if we iterate over the dataloader instance, we get 10 separate batches of data.

for index, batch_data in enumerate(loader):
    print('Batch entry',index,'... batch data',batch_data)
Batch entry 0 ... batch data tensor([23, 14, 64, 51, 94, 25, 38, 44, 70, 28])
Batch entry 1 ... batch data tensor([37, 57, 66, 43, 53, 13, 72, 48, 74, 62])
Batch entry 2 ... batch data tensor([89,  3, 40, 92, 86, 65, 63, 95, 21, 97])
Batch entry 3 ... batch data tensor([ 9, 42, 45, 54, 31, 87, 99, 46, 98, 26])
Batch entry 4 ... batch data tensor([41, 80, 36, 90,  0, 59, 52, 69, 17, 56])
Batch entry 5 ... batch data tensor([16, 61, 82, 30, 77, 73, 96, 33,  6, 83])
Batch entry 6 ... batch data tensor([39,  5, 24, 32, 85, 35, 50, 60,  1, 78])
Batch entry 7 ... batch data tensor([18,  2, 71,  7, 34, 20, 49, 10,  8, 84])
Batch entry 8 ... batch data tensor([76, 93, 12, 81, 22, 55,  4, 19, 11, 27])
Batch entry 9 ... batch data tensor([29, 15, 47, 88, 75, 68, 67, 58, 79, 91])

We can see that data elements are chosen randomly since we set “shuffle=True”. Does this cover all data elements in the dataset? Let’s check by combining all iterated data.

data_collection = []
for index,batch_data in enumerate(loader):
    data_collection += [int(v) for v in batch_data]
    
import numpy as np
np.unique(data_collection)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
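
In practice, __getitem__ usually returns more than a single number, for instance a (data, label) pair; the DataLoader’s default collation then stacks each component across the batch for you. Below is a minimal sketch with a hypothetical dataset_xy class (the names are made up for illustration):

class dataset_xy:
    
    def __init__(self):
        # 100 entries, each with 3 input features and one integer label
        self._x = np.random.randn(100,3).astype(np.float32)
        self._y = np.arange(100)
        
    def __len__(self):
        return len(self._x)
    
    def __getitem__(self,index):
        return self._x[index], self._y[index]

loader_xy = DataLoader(dataset_xy(),batch_size=10,shuffle=True)
x, y = next(iter(loader_xy))
print(x.shape, y.shape)   # torch.Size([10, 3]) torch.Size([10])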

This covers the minimal concept of DataLoader you need to know in order to follow the workshop. You can read more about DataLoader in the pytorch documentation here, and there is a more extended example in their tutorial if you are interested in exploring further on your own.

3. Computation graph

The last point to cover is how to chain modularized mathematical operations.

To get started, let’s introduce a few commonly used mathematical operations in pytorch.

  • torch.nn.ReLU (link) … a function that takes an input tensor and outputs a tensor of the same shape, where each element is 0 if the corresponding input element is below 0, and otherwise keeps the same value.

  • torch.nn.Softmax (link) … a function that applies a softmax along the specified dimension of the input data.

  • torch.nn.MaxPool2d (link) … a function that down-samples the input matrix by taking the maximum value over sub-matrices of a specified shape.

Let’s first see what each of these functions does, using a simple 2D matrix.

# Create a tensor of shape (1,5,5) (a 5x5 matrix with a leading dimension of 1) with some negative and positive values
data = torch.randn(25).reshape(1,5,5)
data
tensor([[[ 1.5810,  1.3010,  1.2753, -0.2010, -0.1606],
         [-0.4015,  0.6957, -1.8061, -1.1589, -0.4210],
         [-0.9620,  1.2825,  0.8768,  1.6221, -1.4779],
         [ 1.1331, -1.2203, -1.1285,  0.4135,  0.2892],
         [ 2.2473, -0.8036, -0.2808,  0.7697, -0.6596]]])

Here’s how ReLU works

op0 = torch.nn.ReLU()
op0(data)
tensor([[[1.5810, 1.3010, 1.2753, 0.0000, 0.0000],
         [0.0000, 0.6957, 0.0000, 0.0000, 0.0000],
         [0.0000, 1.2825, 0.8768, 1.6221, 0.0000],
         [1.1331, 0.0000, 0.0000, 0.4135, 0.2892],
         [2.2473, 0.0000, 0.0000, 0.7697, 0.0000]]])

Here’s how Softmax works

op1 = torch.nn.Softmax(dim=2)
op1(data)
tensor([[[0.3526, 0.2665, 0.2597, 0.0593, 0.0618],
         [0.1757, 0.5264, 0.0431, 0.0824, 0.1723],
         [0.0327, 0.3086, 0.2057, 0.4334, 0.0195],
         [0.4725, 0.0449, 0.0492, 0.2301, 0.2032],
         [0.7093, 0.0336, 0.0566, 0.1618, 0.0388]]])

Here’s how MaxPool2d works with a kernel shape (1,5)

op2 = torch.nn.MaxPool2d(kernel_size=(1,5))
op2(data)
tensor([[[1.5810],
         [0.6957],
         [1.6221],
         [1.1331],
         [2.2473]]])

So if we want to define a computation graph that applies these operations in a sequential order, we could try:

op2(op1(op0(data)))
tensor([[[0.3444],
         [0.3339],
         [0.3874],
         [0.3905],
         [0.6472]]])

Pytorch provides tools called containers to make this easy. Let’s try torch.nn.Sequential (see the different types of containers here).

myop = torch.nn.Sequential(op0,op1,op2)
myop(data)
tensor([[[0.3444],
         [0.3339],
         [0.3874],
         [0.3905],
         [0.6472]]])

We might wonder “Can I add a custom operation to this graph?” Yes, we can add any module that inherits from the torch.nn.Module class. Let’s define one ourselves.

class AddOne(torch.nn.Module):

    # always call the base class constructor when defining your own torch.nn.Module subclass!
    def __init__(self):
        super().__init__()
        
    # forward needs to be defined. This is what gets called when you apply the module with "()".
    def forward(self,input):
        
        return input + 1

Now let’s add our operation

myop = torch.nn.Sequential(op0,op1,op2,AddOne())
myop(data)
tensor([[[1.3444],
         [1.3339],
         [1.3874],
         [1.3905],
         [1.6472]]])

Of course, you can also embed op0, op1, and op2 inside one module.

class MyOp(torch.nn.Module):
    
    def __init__(self):
        super().__init__()
        self._sequence = torch.nn.Sequential(torch.nn.ReLU(), 
                                             torch.nn.Softmax(dim=2), 
                                             torch.nn.MaxPool2d(kernel_size=(1,5)),
                                             AddOne(),
                                            )
        
    def forward(self,input):
        
        return self._sequence(input)

Let’s try using it.

myop = MyOp()
myop(data)
tensor([[[1.3444],
         [1.3339],
         [1.3874],
         [1.3905],
         [1.6472]]])
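
None of the operations above have trainable parameters, but the same torch.nn.Module pattern is how you would add them. For example (a minimal sketch, not used elsewhere in this notebook), swapping in a torch.nn.Linear layer gives the module learnable weights that are automatically registered and show up in parameters():

class MyOpWithWeights(torch.nn.Module):
    
    def __init__(self):
        super().__init__()
        # a trainable linear layer mapping 5 input features to 2 outputs
        self._linear = torch.nn.Linear(5,2)
        
    def forward(self,input):
        return self._linear(input)

m = MyOpWithWeights()
print(sum(p.numel() for p in m.parameters()))   # 12 = 5*2 weights + 2 biases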

Extra: GPU acceleration

This section only works if you run this notebook on a GPU-enabled machine (not on Binder, unfortunately)

Putting a torch.Tensor on the GPU is as easy as calling the .cuda() function (and if you want to bring it back to the CPU, call .cpu() on the tensor). Let’s do a simple speed comparison.
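
As an aside: if you are not sure whether a GPU is available, a common pattern is to pick a device at runtime with torch.cuda.is_available() and move tensors with .to(device). The following sketch runs on both CPU-only and GPU machines:

# pick a device at runtime so the same code works with or without a GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:',device)
t = torch.zeros(2,3).to(device)   # a no-op move on a CPU-only machine
print(t.device)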

Create two tensors with identical data type, shape, and values (the GPU copy is commented out below, since there is no GPU on this machine).

# Create 1000x1000 matrix
data_np=np.zeros([1000,1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()
#data_gpu = torch.Tensor(data_np).cuda()

Time the fifth power of the matrix on the CPU

%%timeit
mean = (data_cpu ** 5).mean().item()
6.09 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

… and next on the GPU

%%timeit
mean = (data_gpu ** 5).mean().item()
NameError: name 'data_gpu' is not defined

… which, when actually run on a GPU-enabled machine, is more than 10x faster than the CPU counterpart :) (here the cell fails with a NameError, because data_gpu was never created on this CPU-only machine).

But there’s a catch you should be aware of! Preparing data on the GPU takes extra time, because the data needs to be copied to the GPU first. Let’s compare the time it takes to create a tensor on the CPU vs. the GPU.

%%timeit
data_np=np.zeros([1000,1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()
165 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
#data_np=np.zeros([1000,1000],dtype=np.float32)
#data_gpu = torch.Tensor(data_np).cuda()
ValueError: empty body on For

When run on a GPU-enabled machine, it takes nearly 10 times longer to create this particular data tensor on the GPU than on the CPU (here the %%timeit cell fails because its body is entirely commented out). The exact speed depends on many factors, including your hardware configuration (e.g. CPU-GPU communication via PCIe or NVLink). It only makes sense to move a computation onto the GPU if it takes longer than this data transfer time.