Pytorch: Machine Learning library¶
Pytorch is one of the modern, open-source deep learning libraries out there, and it is what we will use in this workshop. Other popular libraries include Tensorflow, Keras, MXNet, Spark ML, etc.
All of those libraries work very similarly when it comes to implementing your neural network architecture. If you are new, any of Pytorch/Keras/Tensorflow will likely work well, with lots of guidance, examples, and discussion forums online! Common things you have to learn include:
Data types (typically an arbitrary-dimension matrix, or tensor)
Data loading tools (streamline prepping data into appropriate types from input files)
Chaining operations = a computation graph
In this notebook, we cover the basics of each of the topics above.
1. Tensor data types in PyTorch¶
In pytorch, we use a torch.Tensor object to represent a data matrix. It is a lot like a numpy array but not quite the same. torch provides APIs to easily convert data between a numpy array and a torch.Tensor. Let's play a little bit.
from __future__ import print_function
import numpy as np
import torch
SEED=123
np.random.seed(SEED)
torch.manual_seed(SEED)
<torch._C.Generator at 0x7fdd47da8fb0>
… yep, that’s how we set the pytorch random number seed! (see Python-03-Numpy if you don’t know about a seed)
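To convince ourselves the seed works, here is a quick sketch (an extra cell, not in the original notebook): drawing a random tensor twice after re-setting the same seed should produce identical values.
# Re-setting the seed reproduces the same "random" numbers
torch.manual_seed(SEED)
a = torch.randn(2,3)
torch.manual_seed(SEED)
b = torch.randn(2,3)
print(torch.equal(a,b))  # True ... identical draws after identical seeding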
Creating a torch.Tensor¶
Pytorch provides constructors similar to numpy (and named the same way where possible, so users don’t have to look up new function names). Here are some examples.
# Tensor of 0s = numpy.zeros
t=torch.zeros(2,3)
print('torch.zeros:\n',t)
# Tensor of 1s = numpy.ones
t=torch.ones(2,3)
print('\ntorch.ones:\n',t)
# Tensor from sequential integers = numpy.arange
t=torch.arange(0,6,1).reshape(2,3).float()
print('\ntorch.arange:\n',t)
# Normal distribution centered at 0.0 and sigma=1.0 = numpy.random.randn
t=torch.randn(2,3)
print('\ntorch.randn:\n',t)
torch.zeros:
tensor([[0., 0., 0.],
[0., 0., 0.]])
torch.ones:
tensor([[1., 1., 1.],
[1., 1., 1.]])
torch.arange:
tensor([[0., 1., 2.],
[3., 4., 5.]])
torch.randn:
tensor([[-0.1115, 0.1204, -0.3696],
[-0.2404, -1.1969, 0.2093]])
… or you can create one from a simple list, a tuple, or a numpy array.
# Create numpy array
data_np = np.zeros([10,10],dtype=np.float32)
# Fill something
np.fill_diagonal(data_np,1.)
print('Numpy data\n',data_np)
# Create torch.Tensor
data_torch = torch.Tensor(data_np)
print('\ntorch.Tensor data\n',data_torch)
# One can also make one from a list
data_list = [1,2,3]
data_list_torch = torch.Tensor(data_list)
print('\nPython list :',data_list)
print('torch.Tensor:',data_list_torch)
Numpy data
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
torch.Tensor data
tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
Python list : [1, 2, 3]
torch.Tensor: tensor([1., 2., 3.])
Converting a torch.Tensor back to a numpy array can be easily done
# Bringing back into numpy array
data_np = data_torch.numpy()
print('\nNumpy data (converted back from torch.Tensor)\n',data_np)
Numpy data (converted back from torch.Tensor)
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
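One extra detail worth knowing (an aside added here): for a CPU tensor, the numpy array returned by .numpy() shares memory with the tensor, so modifying one is visible in the other.
# The CPU tensor and the converted numpy array share the same memory
data_torch[0,0] = 123.
print(data_np[0,0])   # prints 123.0 as well
data_torch[0,0] = 1.  # restore the original value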
Ordinary operations on an array also exist, just like in numpy.
# mean & std
print('mean',data_torch.mean(),'std',data_torch.std(),'sum',data_torch.sum())
mean tensor(0.1000) std tensor(0.3015) sum tensor(10.)
We see that the return values of those functions (mean, std, sum) are tensor objects. If you would like a single scalar value, you can call the item function.
# mean & std
print('mean',data_torch.mean().item(),'std',data_torch.std().item(),'sum',data_torch.sum().item())
mean 0.10000000149011612 std 0.30151134729385376 sum 10.0
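As in numpy, these reductions can also be taken along a chosen dimension by passing a dim argument. A quick sketch added here for illustration:
# Reduce along a specific dimension instead of the whole tensor
print('column-wise mean:', data_torch.mean(dim=0))
print('row-wise sum    :', data_torch.sum(dim=1))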
Tensor addition and multiplication¶
Common operations include element-wise multiplication, matrix multiplication, and reshaping. Read the documentation to find the right function for what you want to do!
# Two matrices
data_a = np.zeros([3,3],dtype=np.float32)
data_b = np.zeros([3,3],dtype=np.float32)
np.fill_diagonal(data_a,1.)
data_b[0,:]=1.
# print them
print('Two numpy matrices')
print(data_a)
print(data_b,'\n')
# Make torch.Tensor
torch_a = torch.Tensor(data_a)
torch_b = torch.Tensor(data_b)
print('torch.Tensor element-wise multiplication:')
print(torch_a*torch_b)
print('\ntorch.Tensor matrix multiplication:')
print(torch_a.matmul(torch_b))
print('\ntorch.Tensor matrix subtraction:')
print(torch_a-torch_b)
print('\nadding a scalar 1:')
print(torch_a+1)
Two numpy matrices
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[[1. 1. 1.]
[0. 0. 0.]
[0. 0. 0.]]
torch.Tensor element-wise multiplication:
tensor([[1., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
torch.Tensor matrix multiplication:
tensor([[1., 1., 1.],
[0., 0., 0.],
[0., 0., 0.]])
torch.Tensor matrix subtraction:
tensor([[ 0., -1., -1.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
adding a scalar 1:
tensor([[2., 1., 1.],
[1., 2., 1.],
[1., 1., 2.]])
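torch.Tensor also supports numpy-style broadcasting, so operands with compatible shapes are expanded automatically. A small sketch added here for illustration:
# Broadcasting: a (3,) vector is added to every row of a (3,3) matrix
vec = torch.arange(3).float()        # tensor([0., 1., 2.])
print(torch_a + vec)                 # vec added to each row
print(torch_a * vec.reshape(3,1))    # a column vector scales each row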
Reshaping¶
You can access the tensor shape via the .shape attribute, just like in numpy
print('torch_a shape:',torch_a.shape)
print('The 0th dimension size:',torch_a.shape[0])
torch_a shape: torch.Size([3, 3])
The 0th dimension size: 3
Similarly, there is a reshape function
torch_a.reshape(1,9).shape
torch.Size([1, 9])
… and you can also use -1 in the same way you did with numpy
torch_a.reshape(-1,3).shape
torch.Size([3, 3])
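Two related operations you may also find handy (a small addition, not in the original notebook): flatten collapses all dimensions into one, and unsqueeze inserts a new dimension of size 1.
print(torch_a.flatten().shape)      # torch.Size([9])
print(torch_a.unsqueeze(0).shape)   # torch.Size([1, 3, 3])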
Indexing (Slicing)¶
We can use the same indexing tricks we tried with a numpy array
torch_a[0,:]
tensor([1., 0., 0.])
… or generate a boolean mask
mask = torch_a == 0.
mask
tensor([[False, True, True],
[ True, False, True],
[ True, True, False]])
… and slice with it using the masked_select function
torch_a.masked_select(~mask)
tensor([1., 1., 1.])
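Equivalently (added here for comparison), you can index directly with the boolean mask, just as you would with a numpy array:
# Direct boolean indexing gives the same result as masked_select
torch_a[~mask]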
2. Data loading tools in Pytorch¶
In Python-02-Python, we covered an iterable class and how it can be useful to generalize the design of data access tools. Pytorch (like other ML libraries out there) provides a generalized tool called DataLoader to interface with such an iterable data instance. Desired capabilities of such a tool include the ability to choose a random vs. ordered subset of the data, parallelized workers to simultaneously prepare multiple batches of data, etc.
Let’s practice the use of DataLoader.
First, we define the same iterable class mentioned in the Python-02-Python notebook.
class dataset:
    def __init__(self):
        self._data = tuple(range(100))
    def __len__(self):
        return len(self._data)
    def __getitem__(self,index):
        return self._data[index]
data = dataset()
Here is how you can instantiate a DataLoader. We construct an instance called loader that automatically packs 10 elements of data (batch_size=10), randomly selected (shuffle=True), using 1 parallel worker to prepare the data (num_workers=1).
from torch.utils.data import DataLoader
loader = DataLoader(data,batch_size=10,shuffle=True,num_workers=1)
The dataloader itself is an iterable object. We created a dataloader with batch size 10, and the dataset instance has length 100. This means that if we iterate over the dataloader instance, we get 10 separate batches.
for index, batch_data in enumerate(loader):
    print('Batch entry',index,'... batch data',batch_data)
Batch entry 0 ... batch data tensor([23, 14, 64, 51, 94, 25, 38, 44, 70, 28])
Batch entry 1 ... batch data tensor([37, 57, 66, 43, 53, 13, 72, 48, 74, 62])
Batch entry 2 ... batch data tensor([89, 3, 40, 92, 86, 65, 63, 95, 21, 97])
Batch entry 3 ... batch data tensor([ 9, 42, 45, 54, 31, 87, 99, 46, 98, 26])
Batch entry 4 ... batch data tensor([41, 80, 36, 90, 0, 59, 52, 69, 17, 56])
Batch entry 5 ... batch data tensor([16, 61, 82, 30, 77, 73, 96, 33, 6, 83])
Batch entry 6 ... batch data tensor([39, 5, 24, 32, 85, 35, 50, 60, 1, 78])
Batch entry 7 ... batch data tensor([18, 2, 71, 7, 34, 20, 49, 10, 8, 84])
Batch entry 8 ... batch data tensor([76, 93, 12, 81, 22, 55, 4, 19, 11, 27])
Batch entry 9 ... batch data tensor([29, 15, 47, 88, 75, 68, 67, 58, 79, 91])
We can see that data elements are chosen randomly because we set shuffle=True. Does this cover all data elements in the dataset? Let’s check by combining all iterated data.
data_collection = []
for index,batch_data in enumerate(loader):
    data_collection += [int(v) for v in batch_data]
import numpy as np
np.unique(data_collection)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
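In practice a dataset often returns more than one item per entry, e.g. a (data, label) pair. Below is a small sketch (not part of the original notebook; the dataset_with_label class is made up for illustration) showing that DataLoader batches each field separately in that case.
class dataset_with_label:
    def __init__(self):
        self._data  = tuple(range(100))
        self._label = tuple(v % 2 for v in range(100))  # a dummy even/odd label
    def __len__(self):
        return len(self._data)
    def __getitem__(self,index):
        return self._data[index], self._label[index]

pair_loader = DataLoader(dataset_with_label(), batch_size=5, shuffle=True)
batch_data, batch_label = next(iter(pair_loader))
print('data :', batch_data)
print('label:', batch_label)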
This covers the minimal concept of DataLoader you need to know in order to follow the workshop. You can read more about DataLoader in the pytorch documentation here, and see a more extended example in their tutorial if you are interested in exploring on your own.
3. Computation graph¶
The last point to cover is how to chain modularized mathematical operations.
To get started, let’s introduce a few widely used mathematical operations in pytorch.
torch.nn.ReLU (link) … a function that takes an input tensor and outputs a tensor of the same shape, where each element is 0 if the corresponding input element is below 0, and the same value otherwise.
torch.nn.Softmax (link) … a function that applies a softmax along the specified dimension of the input data.
torch.nn.MaxPool2d (link) … a function that down-samples the input matrix by taking the maximum value from sub-matrices of a specified shape.
Let’s see what each of these functions do first using a simple 2D matrix data.
# Create a 5x5 matrix with a leading channel dimension, i.e. shape (1,5,5), with some negative and positive values
data = torch.randn(25).reshape(1,5,5)
data
tensor([[[ 1.5810, 1.3010, 1.2753, -0.2010, -0.1606],
[-0.4015, 0.6957, -1.8061, -1.1589, -0.4210],
[-0.9620, 1.2825, 0.8768, 1.6221, -1.4779],
[ 1.1331, -1.2203, -1.1285, 0.4135, 0.2892],
[ 2.2473, -0.8036, -0.2808, 0.7697, -0.6596]]])
Here’s how ReLU works
op0 = torch.nn.ReLU()
op0(data)
tensor([[[1.5810, 1.3010, 1.2753, 0.0000, 0.0000],
[0.0000, 0.6957, 0.0000, 0.0000, 0.0000],
[0.0000, 1.2825, 0.8768, 1.6221, 0.0000],
[1.1331, 0.0000, 0.0000, 0.4135, 0.2892],
[2.2473, 0.0000, 0.0000, 0.7697, 0.0000]]])
Here’s how Softmax works
op1 = torch.nn.Softmax(dim=2)
op1(data)
tensor([[[0.3526, 0.2665, 0.2597, 0.0593, 0.0618],
[0.1757, 0.5264, 0.0431, 0.0824, 0.1723],
[0.0327, 0.3086, 0.2057, 0.4334, 0.0195],
[0.4725, 0.0449, 0.0492, 0.2301, 0.2032],
[0.7093, 0.0336, 0.0566, 0.1618, 0.0388]]])
Here’s how MaxPool2d works with a kernel shape (1,5)
op2 = torch.nn.MaxPool2d(kernel_size=(1,5))
op2(data)
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
tensor([[[1.5810],
[0.6957],
[1.6221],
[1.1331],
[2.2473]]])
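For comparison (an added example), pooling with the transposed kernel shape (5,1) takes the maximum of each column instead, giving an output of shape (1,1,5).
op2_col = torch.nn.MaxPool2d(kernel_size=(5,1))
op2_col(data)   # maximum of each column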
So if we want to define a computation graph that applies these operations in a sequential order, we could try:
op2(op1(op0(data)))
tensor([[[0.3444],
[0.3339],
[0.3874],
[0.3905],
[0.6472]]])
Pytorch provides tools called containers to make this easy. Let’s try torch.nn.Sequential (see the different types of containers here).
myop = torch.nn.Sequential(op0,op1,op2)
myop(data)
tensor([[[0.3444],
[0.3339],
[0.3874],
[0.3905],
[0.6472]]])
We might wonder, “Can I add a custom operation to this graph?” Yes, we can add any module that inherits from the torch.nn.Module class. Let’s define one ourselves.
class AddOne(torch.nn.Module):
    # always call the base class constructor when defining a class that inherits from torch.nn.Module!
    def __init__(self):
        super().__init__()
    # forward needs to be defined. This is what gets executed when the instance is called with "()".
    def forward(self,input):
        return input + 1
Now let’s add our operation
myop = torch.nn.Sequential(op0,op1,op2,AddOne())
myop(data)
tensor([[[1.3444],
[1.3339],
[1.3874],
[1.3905],
[1.6472]]])
Of course, you can also embed op0, op1, and op2 inside one module.
class MyOp(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self._sequence = torch.nn.Sequential(torch.nn.ReLU(),
                                             torch.nn.Softmax(dim=2),
                                             torch.nn.MaxPool2d(kernel_size=(1,5)),
                                             AddOne(),
                                            )
    def forward(self,input):
        return self._sequence(input)
Let’s try using it.
myop = MyOp()
myop(data)
tensor([[[1.3444],
[1.3339],
[1.3874],
[1.3905],
[1.6472]]])
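The same container mechanism is how you chain learnable layers. As a brief sketch going slightly beyond this notebook’s example: if you put a module with weights (e.g. torch.nn.Linear) inside a Sequential, its parameters are registered automatically and appear in .parameters(), which is what an optimizer consumes later.
# A Sequential with learnable layers: parameters are tracked automatically
model = torch.nn.Sequential(torch.nn.Linear(5, 3),
                            torch.nn.ReLU(),
                            torch.nn.Linear(3, 1))
print('number of parameter tensors:', len(list(model.parameters())))   # 4 (two weights, two biases)
print('output shape:', model(torch.randn(4, 5)).shape)                 # torch.Size([4, 1])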
Extra: GPU acceleration¶
This section only works if you run this notebook on a GPU-enabled machine (not on Binder, unfortunately).
Putting a torch.Tensor on a GPU is as easy as calling the .cuda() function (and if you want to bring it back to the CPU, call .cpu() on the cuda tensor). Let’s do a simple speed comparison.
Create two tensors with identical data type, shape, and values.
# Create 1000x1000 matrix
data_np=np.zeros([1000,1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()
#data_gpu = torch.Tensor(data_np).cuda()
Time the fifth power of the matrix on the CPU
%%timeit
mean = (data_cpu ** 5).mean().item()
6.09 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
… and next on GPU
%%timeit
mean = (data_gpu ** 5).mean().item()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_46365/2823993350.py in <module>
----> 1 get_ipython().run_cell_magic('timeit', '', 'mean = (data_gpu ** 5).mean().item()\n')
NameError: name 'data_gpu' is not defined
… which, when run on a GPU-enabled machine, is more than 10x faster than the CPU counterpart :)
But there’s a catch you should be aware of! Preparing data on a GPU takes extra time, because the data needs to be transferred to the GPU first. Let’s compare the time it takes to create a tensor on the CPU vs. the GPU.
%%timeit
data_np=np.zeros([1000,1000],dtype=np.float32)
data_cpu = torch.Tensor(data_np).cpu()
165 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
#data_np=np.zeros([1000,1000],dtype=np.float32)
#data_gpu = torch.Tensor(data_np).cuda()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_46365/3126707622.py in <module>
----> 1 get_ipython().run_cell_magic('timeit', '', '#data_np=np.zeros([1000,1000],dtype=np.float32)\n#data_gpu = torch.Tensor(data_np).cuda()\n')
ValueError: empty body on For
As you can see (when run with a GPU available), it can take nearly 10 times longer to create this particular data tensor on the GPU than on the CPU. The exact overhead depends on many factors, including your hardware configuration (e.g. CPU-GPU communication via PCI-e or NVLink). It therefore only makes sense to move a computation onto the GPU if the computation itself takes longer than this data transfer time.
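A common, device-agnostic pattern (added here as a sketch; it works whether or not a GPU is present) is to pick the device once and move data and modules to it with .to(device):
# Choose GPU if available, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data_dev = torch.Tensor(data_np).to(device)
op_dev   = torch.nn.ReLU().to(device)
print('running on', device, '... sum =', op_dev(data_dev).sum().item())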