import numpy as np

Vectors and matrices#

You can think of a numpy.ndarray as an n-dimensional (or multi-dimensional) array or, if you are more mathematically inclined, as a vector or matrix.

A vector is simply a 1-dimensional array. In numpy there are multiple ways to create one.

Static creation#

To practice using numpy.ndarray you can create one using hard-coded values. The example below creates an array using two parameters: a list of float values and an explicit data type, np.float64. We can review the dimensions of the array using .shape. This returns a tuple; in this case (3,), which tells us the array has 1 dimension of length 3.

my_arr = np.array([88.9, 45.6, 20.4], dtype=np.float64)
print(my_arr)
print(my_arr.dtype)
print(my_arr.shape)
print(len(my_arr))
[88.9 45.6 20.4]
float64
(3,)
3

Note that the object passed to np.array() is a list. In general, casting from a list to an ndarray is simple.

lst = [88.9, 45.6, 20.4]
arr = np.array(lst)
print(arr)
[88.9 45.6 20.4]

Tip

In general it’s unlikely that you will do much hard coding of array values in real data science. I’ve occasionally used it to help debug code, although I normally default to np.arange, which we will see next.

Dynamic creation#

Sequences of numbers can be created using np.arange. This is particularly useful when testing code. The parameters work in a similar manner to Python’s built-in range, i.e. start, stop, step, with the stop value excluded from the result.

my_arr = np.arange(10)
print(my_arr)
[0 1 2 3 4 5 6 7 8 9]
my_arr = np.arange(5, 10)
print(my_arr)
[5 6 7 8 9]
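
The examples above use only the start and stop parameters. As a brief sketch of the third parameter, step, the snippet below builds one increasing and one decreasing sequence. The variable names are just for illustration.

```python
import numpy as np

# start=0, stop=10 (excluded), step=2
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# a negative step counts down; the stop value (0) is still excluded
countdown = np.arange(10, 0, -2)
print(countdown.tolist())  # [10, 8, 6, 4, 2]
```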

A use case I often encounter is the need to create an array of a fixed size that is empty. There are a number of ways to do this. Using np.empty will just allocate memory and you will get whatever is already there. For example:

my_arr = np.empty(shape=8, dtype=np.int64)
print(my_arr)
[94505342737536              0              0              0
              0              0              0              0]

I occasionally find ‘empty’ arrays confusing, particularly when debugging code, as np.empty can reuse previously allocated memory that contains values used earlier in the algorithm or model I’m running. I find it easier (and less confusing) to create an array and fill it with a fixed known value. There are some easy and efficient ways to do this as well.

Assume you need to create a vector of length 5 that will hold integers in the range -128 to 127 (signed 8-bit integers).

zeros = np.zeros(shape=5, dtype=np.int8)
ones = np.ones(shape=5, dtype=np.int8)
neg_ones = -np.ones(shape=5, dtype=np.int8)
neg_fives = np.full(shape=5, fill_value=-5, dtype=np.int8)
print(zeros)
print(ones)
print(neg_ones)
print(neg_fives)
[0 0 0 0 0]
[1 1 1 1 1]
[-1 -1 -1 -1 -1]
[-5 -5 -5 -5 -5]
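
As a quick check on what a signed 8-bit integer can hold, np.iinfo reports the smallest and largest values a given integer type can represent. This is a small sketch to accompany the example above, not a required step.

```python
import numpy as np

# query the limits of the signed 8-bit integer type
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127
```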

Loading data from file#

In many health data science applications, data will be held in an external file, e.g. a Comma Separated Values (CSV) file, where data fields are delimited by a comma. numpy has several built-in functions for loading this data. If your data contain no missing values then np.loadtxt is very simple to use.

The file minor_illness_ed_attends.csv contains the rate of attendance per 10,000 of population. The first row is a header and will be skipped on read in.

file_name = 'data/minor_illness_ed_attends.csv'
ed_data = np.loadtxt(file_name, skiprows=1, delimiter=',')
ed_data.shape
(74,)

There are 74 elements in our vector. The first 10 are:

ed_data[:10]
array([2.11927795, 3.49057545, 3.98922908, 2.36860477, 3.24124863,
       2.8672584 , 3.11658522, 2.74259499, 3.61523885, 3.61523885])
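
If a file does contain missing values, np.genfromtxt is one option worth knowing about. The sketch below uses a small, made-up in-memory CSV (not the ED attendance file) with hypothetical columns week and rate. By default an empty field in a float column is read as nan.

```python
import io
import numpy as np

# hypothetical CSV text; the rate for week 2 is missing
csv_text = "week,rate\n1,2.1\n2,\n3,3.5\n"

# skip_header=1 skips the header row; the empty field becomes nan
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(data.shape)            # (3, 2)
print(np.isnan(data[1, 1]))  # True
```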

Saving arrays to file

I once set a piece of university MSc coursework where students were required to save the contents of an array to file. Shortly afterwards, I received a pretty extensive telling off from a student as “I hadn’t taught them how to save arrays and a ‘friend’ had spent several hours attempting the task”. I felt about 5 inches tall after this and, to avoid future pain for learners, I now reveal the method I had inadvertently kept secret from my class. I believe this book is perhaps the only place in the universe where it is documented.

np.savetxt('my_array.csv', ed_data)
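
As a rough sketch of a full round trip, the snippet below saves a small array with a comma delimiter and a header, then reads it back. The array values, the header text and the temporary directory are assumptions made for illustration. Note that np.savetxt writes the header as a comment line beginning with '# ', which np.loadtxt skips by default.

```python
import os
import tempfile
import numpy as np

arr = np.array([2.5, 3.0, 4.25])  # hypothetical values

# write to a temporary directory so no files are left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_array.csv')
    # the header is written as a comment line ('# rate')
    np.savetxt(path, arr, delimiter=',', header='rate')
    # comment lines starting with '#' are ignored on load
    loaded = np.loadtxt(path, delimiter=',')

print(np.array_equal(arr, loaded))  # True
```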

Matrices#

Recall that a 1-dimensional array in numpy is a vector. It is trivial to extend what we have learnt to a 2D matrix. Let’s start with a simple \(2 \times 2\) matrix \(A\).

\( A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}\)

To create the equivalent numpy 2D array:

a = np.array([[1, 2], [3, 4]])
print(a)
[[1 2]
 [3 4]]

If we now inspect the .shape property of the array we see that the dimensions are represented in a tuple of length 2.

print(a.shape)
(2, 2)
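
For a non-square matrix the shape tuple is ordered (rows, columns). As a small illustration, using a hypothetical \(2 \times 3\) matrix b, the .ndim and .size properties report the number of dimensions and the total number of elements.

```python
import numpy as np

# a hypothetical matrix with 2 rows and 3 columns
b = np.array([[1, 2, 3], [4, 5, 6]])

print(b.shape)  # (2, 3)
print(b.ndim)   # 2
print(b.size)   # 6
```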

Note that numpy defaulted to int64 as the data type. The default integer type can differ by platform and numpy version.

To be more explicit about type we can specify it.

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
print(a.dtype)
uint8

To access the element \(ij\) in a 2D matrix use array[i, j] notation. For example, the element i=1, j=1 contains the value 4.

The main thing to remember here is that, like other collection types in Python, arrays are zero indexed. So a[0, 0] would return 1 in our example.

a[1, 1]
np.uint8(4)
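
To make the zero-based indexing concrete, here is a short sketch that reads every element of the same \(2 \times 2\) matrix in turn. The first index selects the row and the second selects the column.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)

print(a[0, 0])  # 1 (first row, first column)
print(a[0, 1])  # 2 (first row, second column)
print(a[1, 0])  # 3 (second row, first column)
print(a[1, 1])  # 4 (second row, second column)
```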

The next section will explore how you can slice arrays and use advanced boolean and fancy indexing.