Up and Running with Python3¶

In this notebook we cover NumPy, one of the most widely used scientific Python packages. It is so popular that most other scientific Python libraries are either built on top of NumPy or at least support its data types. We cover the basics:

Introduction
Data types
Array indexing, reshaping, slicing, masking
Saving array data
Linear algebra
Random numbers
Conclusion

This notebook is based on a tutorial by Marco Del Tutto for an introductory workshop.

NumPy: Getting Started¶

NumPy is a fundamental package for scientific computing with Python. It is a Python library that adds support for multi-dimensional arrays and matrices, as well as many useful mathematical functions that operate on these arrays.

import numpy as np
print(np.__version__)

1.21.1

In the previous notebook, we saw how to construct lists. Now, we will start from lists, and see how we can construct numpy arrays from them.

masses_list = [0.511, 105.66, 1.78e3]
masses_array = np.array(masses_list)
print(masses_array)

[5.1100e-01 1.0566e+02 1.7800e+03]

Multiply every element by a number:

masses_array_gev = masses_array * 1e-3
print(masses_array_gev)

[5.1100e-04 1.0566e-01 1.7800e+00]

For a 1-D array, you can get the number of elements with the len() function or the .size attribute.

You can get the shape of the object by using the .shape attribute.

# EXERCISE: Check that len and size give the same result. What does shape return?
print('len is ', len(masses_array))
print('size is ', masses_array.size)
print('shape is ', masses_array.shape)

len is  3
size is  3
shape is  (3,)
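For a 1-D array len and .size agree, but they differ once an array has more than one dimension: len counts the entries along the first axis, while .size counts every element. A minimal sketch (the name grid is only for illustration and is not used elsewhere in this notebook):

```python
import numpy as np

# Hypothetical 2-D array, used only for illustration
grid = np.zeros((5, 4))

print('len is ', len(grid))      # entries along the first axis (rows)
print('size is ', grid.size)     # total number of elements
print('shape is ', grid.shape)   # length of every axis
```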

# EXERCISE: Try np.linspace, np.zeros, np.ones
np.linspace(5, 15, 9)
np.zeros(9)
np.ones(9)
np.zeros((5, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

NumPy DataTypes¶

Up until this point, we have been using the default datatypes that NumPy selects for arrays. For arange, integer arguments produce an array of integers.

In the case of linspace, zeros and ones, the default type is floating point. Each of these functions has a dtype parameter. For example, the documentation of linspace shows a dtype parameter whose default value is None. You can use this parameter to set the datatype of the elements in an array. Remember that every element of an array has the same datatype.

The NumPy documentation lists all the NumPy datatypes.

In the previous examples, we saw that the ones function and the zeros function return arrays that contain floating point values.

You can change this and select the datatype that you want by setting a value for the dtype parameter. For example you can do np.ones(9, dtype='int64').

# EXERCISE: Create an array with zeros that has 11 elements, each of which is a 64-bit integer
a = np.zeros(11, dtype='int64')
print(a, type(a))
print(a.dtype)

[0 0 0 0 0 0 0 0 0 0 0] <class 'numpy.ndarray'>
int64
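To see which datatype each constructor picks by default, it can help to print the .dtype attribute directly. The sketch below is only an illustration; the variable names are made up for this example, and the exact integer type returned by arange can depend on the platform.

```python
import numpy as np

# Inspect the default dtype chosen by each constructor
int_range = np.arange(5)               # integer arguments give an integer dtype
float_range = np.arange(0.0, 5.0)      # float arguments give float64
grid_points = np.linspace(0, 1, 5)     # linspace defaults to floating point
ones_int = np.ones(9, dtype='int64')   # an explicit dtype overrides the default

print(int_range.dtype, float_range.dtype, grid_points.dtype, ones_int.dtype)
```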

And…there is also the complex data type!

You can specify a complex number in Python using j as the imaginary unit, as in 1+2j.

# EXERCISE: Try to add an imaginary number to a numpy array and print the array
masses_list = [0.511, 105.66, 1.78e3]
masses_list.append(1+2j)
masses_array = np.array(masses_list)
print(masses_array)

[5.1100e-01+0.j 1.0566e+02+0.j 1.7800e+03+0.j 1.0000e+00+2.j]
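The output above shows that a single complex entry turns the whole array into a complex array, because all elements share one datatype. As a small sketch (the name values is only for illustration), the real and imaginary parts can be read back through the .real and .imag attributes:

```python
import numpy as np

# One complex entry is enough to make every element complex
values = np.array([0.511, 105.66, 1 + 2j])

print(values.dtype)   # the common datatype of all elements
print(values.real)    # real parts
print(values.imag)    # imaginary parts
```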

Array Indexing, Reshaping, Slicing, Masking¶

masses_array = np.array([2.2, 4.7, 0.511, 1.28, 96, 105.66, 173e3, 4.18e3, 1.78e3, 0, 0, 91.19e3, 80.39e3, 124.97e3])

You can use a negative index to count from the end of the array.

For example, to select the last element:

print(masses_array[-1])

124970.0

Or to select the penultimate element:

print(masses_array[-2])

80390.0

Reshape¶

The above array masses_array is a 1-D array with 14 elements in it. NumPy allows you to reshape it easily. For example, we can transform it into a 2-D array with 7 rows and 2 columns.

masses_array_2d = np.reshape(masses_array, (7, 2))
print(masses_array_2d)

[[2.2000e+00 4.7000e+00]
 [5.1100e-01 1.2800e+00]
 [9.6000e+01 1.0566e+02]
 [1.7300e+05 4.1800e+03]
 [1.7800e+03 0.0000e+00]
 [0.0000e+00 9.1190e+04]
 [8.0390e+04 1.2497e+05]]

reshape also exists as an array method:

masses_array_2d = masses_array.reshape((7,2))
print(masses_array_2d)

[[2.2000e+00 4.7000e+00]
 [5.1100e-01 1.2800e+00]
 [9.6000e+01 1.0566e+02]
 [1.7300e+05 4.1800e+03]
 [1.7800e+03 0.0000e+00]
 [0.0000e+00 9.1190e+04]
 [8.0390e+04 1.2497e+05]]

Exercise: try to reshape into (7,3).

#Exercise: try to reshape into (7,3)

The reshape function allows you to set one of the shape dimensions to -1, which means “go figure out what it should be.”

masses_array_2d = masses_array_2d.reshape((-1,7))
print(masses_array_2d.shape)

(2, 7)
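The -1 placeholder works in any position, as long as the remaining dimensions divide the total number of elements. The following sketch uses a stand-in array called data (a name chosen only for this example) with 14 elements:

```python
import numpy as np

data = np.arange(14)   # 14 elements, the same count as masses_array

print(data.reshape((2, -1)).shape)   # -1 is inferred from 14 / 2
print(data.reshape((-1, 2)).shape)   # -1 is inferred from 14 / 2

# 14 is not divisible by 4, so NumPy cannot infer the missing dimension
try:
    data.reshape((4, -1))
except ValueError as err:
    print('reshape failed:', err)
```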

Slicing¶

A basic slice syntax is i:j:k where i is the starting index, j is the stopping index, and k is the step:

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(x[1:7:2])

[1 3 5]

Now, if i is not given, it defaults to 0.

If j is not given, it defaults to the length of the array (call it n).

If k is not given it defaults to 1.

Example i = 3, j and k defaulted to n and 1:

print(x[3:])

[3 4 5 6 7 8 9]

Example i defaulted to 0, j = 4 and k defaulted to 1:

print(x[:4])

[0 1 2 3]

Example i defaulted to 0, j = 4 and k = 2:

print(x[:4:2])

[0 2]
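Two further points about slices are worth a quick sketch: the step may be negative, and a basic slice is a view into the original array rather than a copy. The names first_half and independent below are only for illustration.

```python
import numpy as np

x = np.arange(10)

print(x[::-1])   # a negative step walks through the array backwards
print(x[-3:])    # negative indices also work in slices: the last three elements

# A basic slice is a view, so writing to it changes x as well
first_half = x[:5]
first_half[0] = 100
print(x[0])

# Use .copy() when an independent array is needed
independent = x[:5].copy()
independent[1] = -1
print(x[1])
```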

Masking¶

Let’s start with a numpy array

vector = np.array([26, 14, 1, -28, 8, 7])

Then, we create a “mask”: a Boolean array that contains True where an element of vector is divisible by 7 and False otherwise.

mask = 0 == (vector % 7)
print(mask)

[False  True False  True False  True]

Finally, we can apply this mask to our vector in order to select only the elements that are divisible by 7:

print(vector[mask])

[ 14 -28   7]
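Masks can also be combined and used for assignment. The sketch below is one possible illustration; the names positive_multiples and clipped are invented for this example.

```python
import numpy as np

vector = np.array([26, 14, 1, -28, 8, 7])

# Combine conditions element-wise with & (and), | (or), ~ (not);
# the parentheses are needed because & binds tighter than == and >
positive_multiples = vector[(vector % 7 == 0) & (vector > 0)]
print(positive_multiples)

# A mask can also select the elements to overwrite
clipped = vector.copy()
clipped[clipped < 0] = 0
print(clipped)
```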

Saving & Reading an array¶

It’s useful to be able to save your arrays (that is, your data)! Here’s how you can save multiple arrays in one file.

np.savez('erase.npz', kazu=masses_array, daniel=masses_array_2d)

The above command saves masses_array and masses_array_2d data with keywords “kazu” and “daniel”.

Confirm the file erase.npz is created. The choice of this filename is to remind our future-self that this can be removed :)

! ls -lht erase.npz

-rw-r--r-- 1 ldomine ki 730 Jun 30 16:24 erase.npz

Now let’s re-read data from the file.

data = np.load('erase.npz')
type(data)

numpy.lib.npyio.NpzFile

You can access erase.npz file contents by the keywords you defined upon saving the data.

print('shape of daniel',data['daniel'].shape)
print('contents of kazu')
print(data['kazu'])

shape of daniel (2, 7)
contents of kazu
[2.2000e+00 4.7000e+00 5.1100e-01 1.2800e+00 9.6000e+01 1.0566e+02
 1.7300e+05 4.1800e+03 1.7800e+03 0.0000e+00 0.0000e+00 9.1190e+04
 8.0390e+04 1.2497e+05]
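The object returned by np.load for an .npz file also lists its keywords in the .files attribute and can be used in a with statement so the file is closed afterwards. A minimal sketch, assuming a temporary directory and a made-up file name example.npz so nothing is left behind:

```python
import os
import tempfile

import numpy as np

first = np.arange(3)
second = np.ones((2, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.npz')   # hypothetical file name
    np.savez(path, first=first, second=second)

    # The with statement closes the file once the block ends
    with np.load(path) as data:
        keys = sorted(data.files)     # keywords chosen when saving
        restored = data['second']     # the array is read when accessed

print(keys)
print(restored.shape)
```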

Linear Algebra¶

The np.matrix function returns a matrix from an array-like object, or from a string of data.

A matrix is a specialized 2D array that retains its 2D nature through operations.

It has special operators such as asterisk for matrix multiplication, and a double asterisk for matrix power or matrix exponentiation operations.

Let’s construct a CKM matrix:

ckm_matrix = np.matrix([[0.97427, 0.22534, 0.00351 ],
                        [0.22520, 0.97344, 0.0412  ],
                        [0.00867, 0.0404,  0.999146]])
print(ckm_matrix)
print(type(ckm_matrix))

[[0.97427  0.22534  0.00351 ]
 [0.2252   0.97344  0.0412  ]
 [0.00867  0.0404   0.999146]]
<class 'numpy.matrix'>

Again, we can use the .shape attribute to see what is the shape of this matrix:

print(ckm_matrix.shape)

(3, 3)

And also ndim to see the number of dimensions:

print(ckm_matrix.ndim)

Let’s use the help function to see what operations are available:

#help(np.matrix)

Use the transpose attribute .T to calculate the transpose of this matrix.

Next we’ll use another attribute, .I, to calculate the inverse of this matrix. Notice that the inverse is calculated from the original matrix, not from its transpose.

For example, this is the transpose of the CKM matrix:

ckm_matrix.T

matrix([[0.97427 , 0.2252  , 0.00867 ],
        [0.22534 , 0.97344 , 0.0404  ],
        [0.00351 , 0.0412  , 0.999146]])

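Let’s multiply the CKM matrix by the transpose of its inverse. The result is close to the identity because this matrix of magnitudes is nearly symmetric; a strict unitarity check would instead compare the matrix times its conjugate transpose with the identity, which needs the complex phases that are omitted here: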

result = ckm_matrix * ckm_matrix.I.T
print(result)

[[ 9.99931003e-01  3.78984306e-04 -5.17913575e-03]
 [-1.46579176e-04  9.99999968e-01  8.01956999e-04]
 [ 5.79675730e-03 -2.16445093e-03  1.00003722e+00]]
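The NumPy documentation no longer recommends the np.matrix class and suggests regular arrays instead. As a hedged sketch of the same operations with a plain array (the names ckm, inverse and product are chosen only for this example), the @ operator does matrix multiplication and np.linalg.inv replaces the .I attribute:

```python
import numpy as np

ckm = np.array([[0.97427, 0.22534, 0.00351],
                [0.22520, 0.97344, 0.0412],
                [0.00867, 0.0404, 0.999146]])

transpose = ckm.T              # .T works for arrays too
inverse = np.linalg.inv(ckm)   # replaces the .I attribute of np.matrix
product = ckm @ inverse        # @ is matrix multiplication for arrays

# Compare with the identity up to floating point rounding
print(np.allclose(product, np.eye(3)))
```

Note that for plain arrays the asterisk is element-wise multiplication, so @ is needed for the matrix product.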

Random numbers¶

A random number generator is handy for simulations and similar tasks. Here’s an example of drawing flat (uniformly distributed) random numbers between 0 and 1.

flat_random = np.random.random(100000)
print('shape',flat_random.shape)
print('mean',flat_random.mean(),'std',flat_random.std())
print('min',flat_random.min(),'max',flat_random.max())

shape (100000,)
mean 0.500900215150849 std 0.28911272784081987
min 3.5732453503523054e-05 max 0.9999933452524224

… and there are others, like a normal distribution

# Samples from a standard normal distribution (mean 0, standard deviation 1)
normal_random = np.random.randn(100)
print('shape',normal_random.shape)
print('mean',normal_random.mean(),'std',normal_random.std())
print('min',normal_random.min(),'max',normal_random.max())

shape (100,)
mean -0.051172243085185205 std 0.8910350550591932
min -2.5038000033125143 max 2.158881960944597
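randn always samples a normal distribution with mean 0 and standard deviation 1. To get a different mean and width, the samples can be shifted and scaled, or np.random.normal can be called with its loc and scale parameters. The values of mu and sigma below are made up purely for illustration.

```python
import numpy as np

mu, sigma = 91.19, 2.5   # hypothetical mean and width, for illustration only

# Two equivalent ways to sample a normal distribution with mean mu and width sigma
scaled = mu + sigma * np.random.randn(100000)
direct = np.random.normal(loc=mu, scale=sigma, size=100000)

print('mean', scaled.mean(), 'std', scaled.std())
print('mean', direct.mean(), 'std', direct.std())
```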

Random seed¶

Reproducible behavior is important for many things, including debugging your code. For a random number generator, this is controlled by what’s called a seed. If you set the random number seed, then the values sampled from a distribution are predictable even though they may appear random. Let’s give it a shot!

SEED=123
np.random.seed(SEED)

Now let’s sample 3 random values from a normal distribution.

print(np.random.randn(3))

[-1.0856306   0.99734545  0.2829785 ]

… and try again.

print(np.random.randn(3))

[-1.50629471 -0.57860025  1.65143654]

OK, so I don’t know what values will come out if we try yet another time. However, if we reset the seed, we can expect the exact same values to be drawn.

SEED=123
np.random.seed(SEED)
print(np.random.randn(3))

[-1.0856306   0.99734545  0.2829785 ]

Voila!
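NumPy also offers a newer interface, np.random.default_rng, which returns a generator object with its own seed instead of relying on the global state set by np.random.seed. The sketch below is only an illustration of that alternative; the names rng_a and rng_b are invented here, and the values drawn differ from those of np.random.randn with the same seed.

```python
import numpy as np

SEED = 123

# Each generator carries its own state, independent of np.random.seed
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

draw_a = rng_a.standard_normal(3)
draw_b = rng_b.standard_normal(3)

print(draw_a)
print(np.array_equal(draw_a, draw_b))   # two generators built from the same seed
```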
