import numpy as np
Vectors and matrices#
You can think of a numpy.ndarray
as an n-dimensional (or multi-dimensional) array or, if you are more mathematically inclined, as a vector or matrix.
A vector is simply a 1-dimensional array. In numpy
there are multiple ways to create one.
Static creation#
To practice using numpy.ndarray
you can create one using hard-coded values. The example below creates an array using two parameters: a list of float values and an explicitly declared data type of np.float64
. We can review the dimensions of the array using .shape
. This returns a tuple. In this case it is (3,), which tells us the array has 1 dimension of length 3.
my_arr = np.array([88.9, 45.6, 20.4], dtype=np.float64)
print(my_arr)
print(my_arr.dtype)
print(my_arr.shape)
print(len(my_arr))
[88.9 45.6 20.4]
float64
(3,)
3
Note that the object passed to np.array()
is a list
. In general, casting from a list
to an ndarray
is simple.
lst = [88.9, 45.6, 20.4]
arr = np.array(lst)
print(arr)
[88.9 45.6 20.4]
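When no dtype is given, numpy infers one from the contents of the list. The short sketch below illustrates this with two hypothetical lists (the names ints and mixed are only for illustration) and shows that .tolist() converts an array back to a plain Python list.

```python
import numpy as np

# All-integer list: numpy picks its default integer type (platform dependent)
ints = np.array([1, 2, 3])
print(ints.dtype)

# A single float in the list promotes the whole array to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Convert back to a standard Python list
back = mixed.tolist()
print(back)
```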
Tip
In general it’s unlikely that you will do much hard coding of array values in real data science. I occasionally use it to help debug code, although I normally defer to np.arange
, which we will see next.
Dynamic creation#
Sequences of numbers can be created using np.arange
. This is particularly useful when testing code. The parameters work in a similar manner to range
, i.e. start, stop (excluded from the result) and step.
my_arr = np.arange(10)
print(my_arr)
[0 1 2 3 4 5 6 7 8 9]
my_arr = np.arange(5, 10)
print(my_arr)
[5 6 7 8 9]
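The examples above use only the start and stop parameters. As a minimal sketch, the third parameter, step, controls the gap between values and can be negative to count down. The variable names here are illustrative only.

```python
import numpy as np

# start=0, stop=10 (excluded), step=2
evens = np.arange(0, 10, 2)
print(evens)

# A negative step counts down; the stop value is still excluded
countdown = np.arange(10, 0, -3)
print(countdown)
```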
A use case I often encounter is the need to create an array of a fixed size that is empty. There are a number of ways to do this. Using np.empty
will just allocate memory without initialising it, so you will get whatever values are already there. For example:
my_arr = np.empty(shape=8, dtype=np.int64)
print(my_arr)
[94505342737536 0 0 0
0 0 0 0]
I occasionally find ‘empty’ arrays confusing, particularly when debugging code, as they can reuse previously allocated memory that contains values used earlier in the algorithm or model I’m running. I find it easier (and less confusing) to create an array and fill it with a fixed known value. There are some easy and efficient ways to do this as well.
Assume you need to create a vector of length 5 and it will hold integers in the range -128 to 127 (signed 8-bit integers).
zeros = np.zeros(shape=5, dtype=np.int8)
ones = np.ones(shape=5, dtype=np.int8)
neg_ones = -np.ones(shape=5, dtype=np.int8)
neg_fives = np.full(shape=5, fill_value=-5, dtype=np.int8)
print(zeros)
print(ones)
print(neg_ones)
print(neg_fives)
[0 0 0 0 0]
[1 1 1 1 1]
[-1 -1 -1 -1 -1]
[-5 -5 -5 -5 -5]
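If you are unsure what range of values an integer type can hold, np.iinfo reports its limits. The sketch below checks the limits of np.int8 and uses the maximum as a fill value. The variable names are illustrative only.

```python
import numpy as np

# Query the smallest and largest values an int8 can represent
info = np.iinfo(np.int8)
print(info.min, info.max)

# Fill an array with the largest representable int8 value
max_vals = np.full(shape=5, fill_value=info.max, dtype=np.int8)
print(max_vals)
```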
Loading data from file#
In many health data science applications, data will be held in an external file, e.g. a Comma Separated Values (CSV) file, where data fields are delimited by a comma. numpy
has several built-in functions for loading this data. If your data contain no missing values then loadtxt
is very simple to use.
The file minor_illness_ed_attends.csv
contains the rate of attendance per 10,000 of population. The first row is a header and will be skipped on read in.
file_name = 'data/minor_illness_ed_attends.csv'
ed_data = np.loadtxt(file_name, skiprows=1, delimiter=',')
ed_data.shape
(74,)
There are 74 elements in our vector. The first 10 are:
ed_data[:10]
array([2.11927795, 3.49057545, 3.98922908, 2.36860477, 3.24124863,
2.8672584 , 3.11658522, 2.74259499, 3.61523885, 3.61523885])
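If a file does contain missing values, one option is np.genfromtxt, which by default converts empty fields to nan when reading floats. The sketch below uses a small made-up CSV held in memory (the column names and values are hypothetical, not taken from the ED dataset) so that it runs without any external file.

```python
import io
import numpy as np

# Hypothetical CSV text: a header row, then week number and rate.
# The rate for week 2 is missing.
csv_text = "week,rate\n1,2.1\n2,\n3,3.5\n"

# skip_header plays the same role as skiprows in loadtxt
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(data)

# Count how many rates are missing
n_missing = np.isnan(data[:, 1]).sum()
print(n_missing)
```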
Saving arrays to file#
I once set a piece of university MSc coursework where students were required to save the contents of an array to file. Shortly afterwards, I received a pretty extensive telling off from a student as “I hadn’t taught them how to save arrays and a ‘friend’ had spent several hours attempting the task”. I felt about 5 inches tall after this and to avoid future pain for learners I now reveal the method I had inadvertently kept secret from my class. I believe this book is perhaps the only place in the universe where it is documented.
np.savetxt('my_array.csv', ed_data)
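np.savetxt also accepts optional parameters such as delimiter, fmt and header. The sketch below saves a small array of hypothetical values to a temporary directory and reads it back with np.loadtxt to confirm the round trip. The array values and the 'rate' header are made up for illustration.

```python
import os
import tempfile
import numpy as np

# Hypothetical values standing in for real data
arr = np.array([2.119, 3.491, 3.989])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_array.csv')

    # fmt controls the number format; by default the header is written
    # as a comment line prefixed with '# '
    np.savetxt(path, arr, delimiter=',', fmt='%.3f', header='rate')

    # loadtxt ignores lines starting with '#', so skiprows is not needed
    loaded = np.loadtxt(path, delimiter=',')

print(loaded)
```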
Matrices#
Recall that a 1-dimensional array in numpy
is a vector. It is trivial to extend what we have learnt to a 2D matrix. Let’s start with a simple \(2 \times 2\) matrix \(A\).
\( A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}\)
To create the equivalent numpy 2D array:
a = np.array([[1, 2], [3, 4]])
print(a)
[[1 2]
[3 4]]
If we now inspect the .shape
property of the array we see that the dimensions are represented in a tuple of length 2.
print(a.shape)
(2, 2)
Note that numpy
defaulted to int64
as the data type (the default integer type is platform dependent).
To be more explicit about type we can specify it.
a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
print(a.dtype)
uint8
To access the element \(ij\) in a 2D matrix use array[i, j]
notation. For example, the element i=1
, j=1
contains the value 4.
The main thing to remember here is that, like other collection types in Python, arrays are zero indexed. So
a[0, 0]
would return 1 in our example.
a[1, 1]
np.uint8(4)
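As a minimal sketch of the array[i, j] notation, the first index selects the row and the second selects the column, so swapping them gives a different element. The example also prints .ndim and .size, which report the number of dimensions and the total number of elements.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Row 0, column 1
top_right = a[0, 1]
print(top_right)

# Row 1, column 0
bottom_left = a[1, 0]
print(bottom_left)

# Number of dimensions and total number of elements
print(a.ndim, a.size)
```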
The next section will explore how you can slice arrays and use advanced boolean and fancy indexing.