Up and Running with Python3¶
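In this notebook we cover numpy, a very widely used Python package for scientific computing. It is so popular that most other scientific packages are either built on top of numpy or at least support its data types. This notebook covers the basics: creating arrays, datatypes, indexing, reshaping, slicing and masking, saving and reading arrays, linear algebra, and random numbers.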
This notebook is based on a tutorial by Marco Del Tutto for an introductory workshop.
NumPy: Getting Started¶
NumPy is a fundamental package for scientific computing with Python. It is a Python library that adds support for multi-dimensional arrays and matrices, as well as many useful mathematical functions that operate on these arrays.
import numpy as np
print(np.__version__)
1.21.1
In the previous notebook, we saw how to construct lists. Now, we will start from lists, and see how we can construct numpy arrays from them.
masses_list = [0.511, 105.66, 1.78e3]
masses_array = np.array(masses_list)
print(masses_array)
[5.1100e-01 1.0566e+02 1.7800e+03]
Multiply every element by a number:
masses_array_gev = masses_array * 1e-3
print(masses_array_gev)
[5.1100e-04 1.0566e-01 1.7800e+00]
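Elementwise arithmetic is one of the main differences between arrays and plain lists. The short sketch below contrasts the two; the names values_list and values_array are made up for this illustration.

```python
import numpy as np

# Hypothetical values, used only for illustration
values_list = [1, 2, 3]
values_array = np.array(values_list)

# For a list, * repeats the list
repeated = values_list * 2
print(repeated)

# For an array, * multiplies every element
doubled = values_array * 2
print(doubled)
```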
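To get the size of the array, you can use the len() function or the .size attribute. The two agree for a 1-D array; for a multi-dimensional array, len() returns only the length of the first axis.
You can get the shape of the array with the .shape attribute.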
# EXERCISE: Check that len and size give the same result. What does shape return?
print('len is ', len(masses_array))
print('size is ', masses_array.size)
print('shape is ', masses_array.shape)
len is 3
size is 3
shape is (3,)
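For arrays with more than one dimension, len and .size no longer agree. A minimal sketch, using a hypothetical 2-D array called grid:

```python
import numpy as np

# Hypothetical 2-D array with 2 rows and 3 columns
grid = np.zeros((2, 3))

print('len is ', len(grid))      # length of the first axis (number of rows)
print('size is ', grid.size)     # total number of elements
print('shape is ', grid.shape)   # length of every axis
```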
# EXERCISE: Try np.linspace, np.zeros, np.ones
np.linspace(5, 15, 9)
np.zeros(9)
np.ones(9)
np.zeros((5, 4))
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
NumPy DataTypes¶
Up until this point, we have been using the default datatypes that NumPy selects for arrays. arange returns integers when all of its arguments are integers, while linspace returns floating point values by default.
In the case of zeros and ones, the default type is also floating point. Each of these functions has a dtype parameter. For example, the documentation of linspace shows a dtype parameter whose default value is None. You can use this parameter to set the datatype of the elements in an array. Remember that every element of an array has the same datatype.
The NumPy documentation lists all the NumPy datatypes.
In the previous examples, we saw that the ones
function and the zeros
function return arrays that contain floating point values.
You can change this and select the datatype that you want by setting a value for the dtype
parameter.
For example you can do np.ones(9, dtype='int64')
.
# EXERCISE: Create an array with zeros that has 11 elements, each of which is a 64-bit integer
a = np.zeros(11, dtype='int64')
print(a, type(a))
print(a.dtype)
[0 0 0 0 0 0 0 0 0 0 0] <class 'numpy.ndarray'>
int64
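To see the default datatypes side by side, here is a small sketch. The variable names are hypothetical, and the exact integer type returned by arange can depend on the platform, so no output is shown.

```python
import numpy as np

ints = np.arange(5)                    # integer arguments give an integer array
floats = np.linspace(0, 1, 5)          # linspace returns floats by default
small = np.ones(3, dtype=np.float32)   # dtype can also be a NumPy type object
converted = ints.astype(np.float64)    # astype returns a converted copy

print(ints.dtype, floats.dtype, small.dtype, converted.dtype)
```

Note that astype does not change ints itself; it builds a new array with the requested datatype.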
And…there is also the complex data type!
You can specify a complex number in Python using j as the imaginary unit, as in 1+2j.
# EXERCISE: Try to add an imaginary number to a numpy array and print the array
masses_list = [0.511, 105.66, 1.78e3]
masses_list.append(1+2j)
masses_array = np.array(masses_list)
print(masses_array)
[5.1100e-01+0.j 1.0566e+02+0.j 1.7800e+03+0.j 1.0000e+00+2.j]
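Because every element shares one datatype, a single complex entry turns the whole array complex. The sketch below, with a hypothetical array z, shows how to get the real part, the imaginary part, and the modulus back out.

```python
import numpy as np

# One complex value is enough to make the whole array complex
z = np.array([0.511, 1 + 2j])

print(z.dtype)     # datatype of the array
print(z.real)      # real parts
print(z.imag)      # imaginary parts
print(np.abs(z))   # modulus of each element
```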
Array Indexing, Reshaping, Slicing, Masking¶
masses_array = np.array([2.2, 4.7, 0.511, 1.28, 96, 105.66, 173e3, 4.18e3, 1.78e3, 0, 0, 91.19e3, 80.39e3, 124.97e3])
You can use a negative index to count from the end of the array.
For example, to select the last element:
print(masses_array[-1])
124970.0
Or to select the penultimate element:
print(masses_array[-2])
80390.0
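As a quick check of how negative indices map onto ordinary ones, the sketch below uses a short hypothetical array called masses. Index -k picks the same element as index n - k, where n is the length of the array.

```python
import numpy as np

# Hypothetical short array, used only for illustration
masses = np.array([0.511, 105.66, 1.78e3])
n = len(masses)

# Index -k refers to the same element as index n - k
for k in range(1, n + 1):
    print(-k, n - k, masses[-k] == masses[n - k])

# Going past the start of the array raises IndexError
try:
    masses[-(n + 1)]
    out_of_range = False
except IndexError:
    out_of_range = True
print('out of range:', out_of_range)
```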
Reshape¶
The above array masses_array
is a 1-D array with 14 elements in it.
NumPy makes it easy to reshape. For example, we can transform it into a 2-D array with 7 rows and 2 columns.
masses_array_2d = np.reshape(masses_array, (7, 2))
print(masses_array_2d)
[[2.2000e+00 4.7000e+00]
[5.1100e-01 1.2800e+00]
[9.6000e+01 1.0566e+02]
[1.7300e+05 4.1800e+03]
[1.7800e+03 0.0000e+00]
[0.0000e+00 9.1190e+04]
[8.0390e+04 1.2497e+05]]
reshape also exists as an array method:
masses_array_2d = masses_array.reshape((7,2))
print(masses_array_2d)
[[2.2000e+00 4.7000e+00]
[5.1100e-01 1.2800e+00]
[9.6000e+01 1.0566e+02]
[1.7300e+05 4.1800e+03]
[1.7800e+03 0.0000e+00]
[0.0000e+00 9.1190e+04]
[8.0390e+04 1.2497e+05]]
Exercise: try to reshape into (7,3)
.
#Exercise: try to reshape into (7,3)
The reshape function allows you to set one dimension of the shape to -1, which means “go figure out what it should be.”
masses_array_2d = masses_array_2d.reshape((-1,7))
print(masses_array_2d.shape)
(2, 7)
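Reshaping only works when the new shape holds exactly the same number of elements. A minimal sketch, assuming a stand-in array values with 14 elements like masses_array:

```python
import numpy as np

values = np.arange(14)  # 14 elements, standing in for masses_array

# -1 lets NumPy infer the missing dimension: 14 / 2 = 7
inferred = values.reshape((2, -1))
print(inferred.shape)

# 7 * 3 = 21 does not match 14 elements, so NumPy raises ValueError
try:
    values.reshape((7, 3))
    error_message = None
except ValueError as err:
    error_message = str(err)
    print('reshape failed:', error_message)
```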
Slicing¶
A basic slice syntax is i:j:k
where i
is the starting index, j
is the stopping index, and k
is the step:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(x[1:7:2])
[1 3 5]
Now, if i is not given, it defaults to 0.
If j is not given, it defaults to the length of the array (call it n).
If k is not given, it defaults to 1.
Example with i = 3, and j and k defaulted to n and 1:
print(x[3:])
[3 4 5 6 7 8 9]
Example with i defaulted to 0, j = 4, and k defaulted to 1:
print(x[:4])
[0 1 2 3]
Example with i defaulted to 0, j = 4, and k = 2:
print(x[:4:2])
[0 2]
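Two more slicing behaviours are worth knowing: negative values count from the end or walk backwards, and a basic slice is a view that shares memory with the original array. The sketch below uses hypothetical names (window, safe) to show both.

```python
import numpy as np

x = np.arange(10)

reversed_x = x[::-1]   # negative step walks backwards
last_three = x[-3:]    # negative start counts from the end
print(reversed_x)
print(last_three)

window = x[2:5]        # a view: shares memory with x
window[0] = 99         # also changes x[2]
print(x)

safe = x[5:8].copy()   # an independent copy
safe[0] = -1           # x is not affected
print(x[5])
```

If you need to modify a slice without touching the original array, call .copy() first.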
Masking¶
Let’s start with a numpy array
vector = np.array([26, 14, 1, -28, 8, 7])
Then, we create a “mask”: a boolean array that contains True where the element of vector is divisible by 7 and False otherwise.
mask = 0 == (vector % 7)
print(mask)
[False True False True False True]
Finally, we can apply this mask to our vector in order to select only the elements that are divisible by 7:
print(vector[mask])
[ 14 -28 7]
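Masks can be combined with & (and), | (or) and ~ (not), counted, and used for assignment. A short sketch on the same vector, with hypothetical names for the intermediate masks:

```python
import numpy as np

vector = np.array([26, 14, 1, -28, 8, 7])

divisible = vector % 7 == 0
positive = vector > 0

# Combine two masks elementwise
both = divisible & positive
print(vector[both])

# True counts as 1, so sum() counts the selected elements
count = divisible.sum()
print(count)

# A mask can also select the elements to overwrite
clipped = vector.copy()
clipped[clipped < 0] = 0
print(clipped)
```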
Saving & Reading an array¶
It’s useful to be able to save your arrays, that is, your data! Here’s how you can save multiple arrays in a file.
np.savez('erase.npz', kazu=masses_array, daniel=masses_array_2d)
The above command saves masses_array
and masses_array_2d
data with keywords “kazu” and “daniel”.
Confirm that the file erase.npz has been created. The filename is chosen to remind our future selves that it can be removed :)
! ls -lht erase.npz
-rw-r--r-- 1 ldomine ki 730 Jun 30 16:24 erase.npz
Now let’s re-read data from the file.
data = np.load('erase.npz')
type(data)
numpy.lib.npyio.NpzFile
You can access the contents of erase.npz by the keywords you defined when saving the data.
print('shape of daniel',data['daniel'].shape)
print('contents of kazu')
print(data['kazu'])
shape of daniel (2, 7)
contents of kazu
[2.2000e+00 4.7000e+00 5.1100e-01 1.2800e+00 9.6000e+01 1.0566e+02
1.7300e+05 4.1800e+03 1.7800e+03 0.0000e+00 0.0000e+00 9.1190e+04
8.0390e+04 1.2497e+05]
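If you do not remember the keywords, the loaded object lists them in its .files attribute. The sketch below is a self-contained round trip; the file name example.npz and the keywords first and second are made up, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import numpy as np

first = np.arange(3)
second = np.ones((2, 2))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.npz')  # hypothetical file name
    np.savez(path, first=first, second=second)

    # The context manager closes the file when the block ends
    with np.load(path) as data:
        keys = sorted(data.files)    # keywords stored in the file
        restored = data['second']    # read the array while the file is open

print(keys)
print(restored.shape)
```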
Linear Algebra¶
The np.matrix function returns a matrix from an array-like object, or from a string of data.
A matrix is a specialized 2D array that retains its 2D nature through operations.
It has special operators, such as the asterisk for matrix multiplication and the double asterisk for matrix power.
Let’s construct a CKM matrix:
ckm_matrix = np.matrix([[0.97427, 0.22534, 0.00351 ],
[0.22520, 0.97344, 0.0412 ],
[0.00867, 0.0404, 0.999146]])
print(ckm_matrix)
print(type(ckm_matrix))
[[0.97427 0.22534 0.00351 ]
[0.2252 0.97344 0.0412 ]
[0.00867 0.0404 0.999146]]
<class 'numpy.matrix'>
Again, we can use the .shape attribute to see the shape of this matrix:
print(ckm_matrix.shape)
(3, 3)
And also ndim
to see the number of dimensions:
print(ckm_matrix.ndim)
2
Let’s use the help function to see what operations are available:
#help(np.matrix)
The attribute .T gives the transpose of this matrix.
Another attribute, .I, gives the inverse of this matrix. Notice that ckm_matrix.I is the inverse of the original matrix, not of its transpose.
For example, this is the transpose of the CKM matrix:
ckm_matrix.T
matrix([[0.97427 , 0.2252 , 0.00867 ],
[0.22534 , 0.97344 , 0.0404 ],
[0.00351 , 0.0412 , 0.999146]])
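Let’s multiply the CKM matrix by the transpose of its inverse. This product equals the identity only for a symmetric matrix, so the result below is close to the identity because this matrix of magnitudes is nearly symmetric. A full unitarity test would instead compare ckm_matrix * ckm_matrix.H with the identity, and it needs the complex CKM elements rather than only their magnitudes: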
result = ckm_matrix * ckm_matrix.I.T
print(result)
[[ 9.99931003e-01 3.78984306e-04 -5.17913575e-03]
[-1.46579176e-04 9.99999968e-01 8.01956999e-04]
[ 5.79675730e-03 -2.16445093e-03 1.00003722e+00]]
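The NumPy documentation no longer recommends np.matrix for new code; the same operations work on plain arrays with the @ operator and np.linalg. As a minimal sketch of an orthogonality check (the real-valued case of unitarity), the example below uses a 2-D rotation matrix with an arbitrarily chosen angle instead of the CKM values.

```python
import numpy as np

theta = 0.3  # hypothetical rotation angle in radians

# A rotation matrix is orthogonal: its inverse equals its transpose
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# @ is matrix multiplication for plain arrays
product = rotation @ rotation.T
is_orthogonal = np.allclose(product, np.eye(2))
print(is_orthogonal)

# np.linalg.inv plays the role of the .I attribute
inverse = np.linalg.inv(rotation)
print(np.allclose(inverse, rotation.T))
```

np.allclose compares within a small tolerance, which avoids false negatives from floating point rounding.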
Random numbers¶
A random number generator is handy for simulations, among other things. Here’s an example that draws flat (uniformly distributed) random numbers between 0 and 1.
flat_random = np.random.random(100000)
print('shape',flat_random.shape)
print('mean',flat_random.mean(),'std',flat_random.std())
print('min',flat_random.min(),'max',flat_random.max())
shape (100000,)
mean 0.500900215150849 std 0.28911272784081987
min 3.5732453503523054e-05 max 0.9999933452524224
… and there are others, like a normal distribution:
normal_random = np.random.randn(100)
print('shape',normal_random.shape)
print('mean',normal_random.mean(),'std',normal_random.std())
print('min',normal_random.min(),'max',normal_random.max())
shape (100,)
mean -0.051172243085185205 std 0.8910350550591932
min -2.5038000033125143 max 2.158881960944597
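Recent NumPy versions also provide a Generator interface, created with np.random.default_rng, which the NumPy documentation recommends for new code. The sketch below repeats the two draws with it; the variable names are hypothetical, and no output is shown because the values differ on every run.

```python
import numpy as np

rng = np.random.default_rng()  # unseeded generator

# Uniform samples in the half-open interval [0, 1)
uniform = rng.random(100000)
print('mean', uniform.mean(), 'std', uniform.std())

# Normal samples with mean 0 and standard deviation 1
normal = rng.normal(loc=0.0, scale=1.0, size=100000)
print('mean', normal.mean(), 'std', normal.std())
```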
Random seed¶
Reproducible behavior is important for many things, including debugging your code. For a random number generator, this is controlled by what’s called a seed. If you set the random number seed, then the values sampled from a distribution are predictable even though they may appear random. Let’s give it a shot!
SEED=123
np.random.seed(SEED)
Now let’s sample 3 random values from a normal distribution.
print(np.random.randn(3))
[-1.0856306 0.99734545 0.2829785 ]
… and try again.
print(np.random.randn(3))
[-1.50629471 -0.57860025 1.65143654]
OK, so I don’t know what values will come out if we try yet another time. However, if we reset the seed, we can expect the exact same values to be drawn.
SEED=123
np.random.seed(SEED)
print(np.random.randn(3))
[-1.0856306 0.99734545 0.2829785 ]
Voila!
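The same idea carries over to the Generator interface: two generators created with the same seed produce the same sequence. A minimal sketch, assuming np.random.default_rng is available in your NumPy version (the drawn values differ from those of np.random.seed, so none are shown here):

```python
import numpy as np

SEED = 123

# Two independent generators built from the same seed
first = np.random.default_rng(SEED).normal(size=3)
second = np.random.default_rng(SEED).normal(size=3)

# A generator built from a different seed
other = np.random.default_rng(SEED + 1).normal(size=3)

print(np.array_equal(first, second))  # same seed, same values
print(np.array_equal(first, other))   # different seed, different values
```

Keeping the generator in its own variable, rather than seeding the global state, makes it easier to see which part of a program controls which random numbers.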