Using Numpy

Numpy is mainly used for numerical calculations. It is built around arrays (somewhat like matrices and tensors), which store the data.

Installation

Conda

conda install numpy

Pip

pip install numpy

Documentation

Documentation for numpy can be found here.

Importing

import numpy as np

Creating arrays

Numpy arrays can be created from Python lists:

arr = np.array([1, 2, 3])

a_list = [1, 2, 3]
arr = np.array(a_list)

However, Numpy has several functions for creating arrays which you will use quite often. The idea is to create an array, then perform manipulations on it as you wish.

You can create an array containing only zeros (more useful than you might think):

# Create a one dimensional array of zeros
X = np.zeros((2,))
# Create a two dimensional array of zeros
X2 = np.zeros((2,10))
# and 3 dimensional...
X3 = np.zeros((2,10,3))

You can also do the same thing for arrays containing ones:

# Create a one dimensional array of ones
X = np.ones((2,))
# Create a two dimensional array of ones
X2 = np.ones((2,10))
# and 3 dimensional...
X3 = np.ones((2,10,3))

If you would like a sequence of numbers, there are two ways to create one. Both of the functions below create one-dimensional arrays containing the numbers from 0 to 100 in steps of one. Note how they specify the construction in different ways: np.linspace takes the number of points and includes the end value, while np.arange takes the step size and excludes the end value.

Y_1 = np.linspace(0, 100, num=101)
Y_2 = np.arange(0, 101, 1)
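
The end point is easy to get wrong with these two functions, so here is a small check you can run yourself. It is only a sketch; the variable names (Y_lin, Y_ar, Y_100) are made up for illustration.

```python
import numpy as np

# linspace: you give the number of points; the stop value is included by default
Y_lin = np.linspace(0, 100, num=101)
# arange: you give the step size; the stop value is excluded
Y_ar = np.arange(0, 101, 1)

print(Y_lin[:3], Y_lin[-1])  # [0. 1. 2.] 100.0
print(Y_ar[:3], Y_ar[-1])    # [0 1 2] 100

# linspace returns floats, arange with integer arguments returns integers
print(Y_lin.dtype, Y_ar.dtype)

# With 100 points instead of 101, the step is no longer a whole number
Y_100 = np.linspace(0, 100, num=100)
print(Y_100[1])  # 100 / 99, roughly 1.0101
```

A rough rule of thumb: reach for np.linspace when you know how many points you want, and np.arange when you know the step size.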

Many more array creation routines can be found in the documentation.

Inspection

You can print numpy arrays, and if the array is small, that's fine...

A = np.array([1, 2, 3])
print(A)

But if the array is large, it is hard to look at!

You can look at the shape of the array:

print(A.shape)
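
Besides the shape, a few other attributes help when an array is too large to print. The sketch below uses ndim, size and dtype, which are not mentioned above, and prints only a small slice of the data.

```python
import numpy as np

# A larger array that would be awkward to print in full
A = np.zeros((1000, 3))

print(A.shape)  # (1000, 3)  length of each dimension
print(A.ndim)   # 2          number of dimensions
print(A.size)   # 3000       total number of elements
print(A.dtype)  # float64    type of the stored values

# Look at the first few rows only
print(A[:5])
```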

Array Indexing

Using indices to access subsets of arrays is one of the most useful features of numpy. For one-dimensional arrays, indexing is quite similar to native Python lists.

X = np.linspace(0, 100, num=101)
print(X[0]) # 0
print(X[-1]) # 100
print(X[1:]) # 1 - 100
print(X[1:10]) # 1 - 9

Things get more interesting for two-dimensional and higher arrays.

X = np.array([[1, 2, 3],[4, 5, 6]])

print(X[0,0]) # 1

print(X[0]) # [1, 2, 3]
print(X[1]) # [4, 5, 6]

print(X[:,0]) # [1, 4]
print(X[:,1]) # [2, 5]

print(X[:,1:])
# [[2, 3],
#  [5, 6]]
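
A few more two-dimensional cases, as a sketch using the same array. The variable names are made up for illustration. Note the difference between indexing with an integer, which drops that dimension, and indexing with a slice, which keeps it.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

# Negative indices count from the end, as with lists
last_row = X[-1]      # [4 5 6]
last_col = X[:, -1]   # [3 6]

# Slices on both axes select a rectangular block
block = X[0:2, 0:2]
print(block)
# [[1 2]
#  [4 5]]

# An integer index removes the axis, a slice keeps it
col_1d = X[:, 0]      # shape (2,)
col_2d = X[:, 0:1]    # shape (2, 1)
print(col_1d.shape, col_2d.shape)  # (2,) (2, 1)
```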

Combining Arrays

Arrays can be combined in various ways. Here are two examples.

A = np.array([1, 2, 3]) # shape: 3,
B = np.array([1, 2, 3]) # shape: 3,
C = np.hstack((A, B)) # shape: 6,

A = np.array([1, 2, 3]) # shape: 3,
B = np.array([1, 2, 3]) # shape: 3,
C = np.vstack((A, B)) # shape: 2,3

Note that creating arrays in this manner can slow your program down if you do it too often. Behind the scenes, numpy has to ask your computer for more memory (RAM) each time it creates a new array, and the computer has to find and allocate that memory. This is a computationally expensive process. If you find these operations inside for loops, another option is to create the container beforehand and modify it in the loop. This way, the expensive memory allocation happens only once, and only the data inside that memory is modified.

Below is a silly example. Hopefully it communicates the idea of creating an array in memory once, before using it over and over:

A = np.array([1, 2, 3]) # shape: 3,
B = np.array([1, 2, 3]) # shape: 3,

C = np.zeros((2,3)) # shape: 2,3
for x in range(0, 1000):
  C[0] = A
  C[1] = B
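
To make the contrast concrete, here is a sketch that builds the same array both ways. The names (n_rows, grown, filled) and the row contents are made up for illustration. Both loops give the same result, but the first one creates a brand new array on every iteration.

```python
import numpy as np

n_rows = 1000

# Option 1: grow the array by stacking.
# Every call to vstack allocates a new, slightly larger array.
grown = np.zeros((0, 3))
for i in range(n_rows):
    row = np.array([i, 2 * i, 3 * i])
    grown = np.vstack((grown, row))

# Option 2: allocate once, then fill the rows in place.
filled = np.zeros((n_rows, 3))
for i in range(n_rows):
    filled[i] = [i, 2 * i, 3 * i]

print(grown.shape, filled.shape)      # (1000, 3) (1000, 3)
print(np.array_equal(grown, filled))  # True
```

If you want to see the difference in speed on your own machine, you could time the two loops, for example with the standard library timeit module.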

Maths

You can perform maths on individual arrays:

A = np.array([1, 2, 3])
A_summed = np.sum(A) # Sum
A_mean = np.mean(A) # Mean
A_std = np.std(A, ddof=1) # Sample standard deviation (ddof=1 divides by N - 1)
A_max = np.max(A) # Maximum value
A_min = np.min(A) # Minimum value
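
For two-dimensional arrays, these functions also accept an axis argument (not used above) that chooses the direction of the calculation. The sketch below also shows what ddof changes in np.std. Variable names are made up for illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(X)               # 21, over all elements
col_sums = np.sum(X, axis=0)    # [5 7 9], one value per column
row_sums = np.sum(X, axis=1)    # [ 6 15], one value per row
row_means = np.mean(X, axis=1)  # [2. 5.]

A = np.array([1, 2, 3])
std_population = np.std(A)      # divides by N, roughly 0.8165
std_sample = np.std(A, ddof=1)  # divides by N - 1, gives 1.0
print(std_population, std_sample)
```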

Arrays can be multiplied element-wise:

A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
C = A * B
# Answer: [1, 4, 9]

You can take the dot product:

A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
C = np.dot(A, B)
# Answer: 14
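
For one-dimensional arrays, the dot product is simply the sum of the element-wise products. The sketch below checks this, and also uses the @ operator, which is not mentioned above but performs the same operation. The matrix M is made up for illustration.

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.array([1, 2, 3])

dot = np.dot(A, B)        # 1*1 + 2*2 + 3*3 = 14
manual = np.sum(A * B)    # same thing, written out
with_operator = A @ B     # same thing, using the @ operator
print(dot, manual, with_operator)  # 14 14 14

# With a two-dimensional array, @ performs a matrix-vector product
M = np.array([[1, 2, 3], [4, 5, 6]])  # shape: 2,3
Mv = M @ A                            # shape: 2,
print(Mv)  # [14 32]
```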

Numpy can calculate a great deal more than this. Search the documentation for the function you need.

Finding Things in Arrays

You can find subsets of arrays (such as where the min/max indices are) in a variety of ways:

A = np.array([1, 2, 3])
min_idx = np.argmin(A) # 0
max_idx = np.argmax(A) # 2

Sometimes you want the index of the value with the largest or smallest magnitude (absolute value), regardless of sign:

A = np.array([1, 2, 3, -1, -2, -3])
min_idx = np.argmin(A) # 5
max_idx = np.argmax(A) # 2
# Numpy returns the index of the first min/max element
min_abs_idx = np.argmin(np.abs(A)) # 0
max_abs_idx = np.argmax(np.abs(A)) # 2
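
On a two-dimensional array, np.argmin and np.argmax return an index into the flattened array by default. One way to turn that into a row and column is np.unravel_index, which is not mentioned above. This is a sketch with a made-up array.

```python
import numpy as np

X = np.array([[1, 9, 3], [4, 5, 6]])

flat_idx = np.argmax(X)  # 1, position in the flattened array
row, col = np.unravel_index(flat_idx, X.shape)
print(row, col)          # 0 1
print(X[row, col])       # 9

# With an axis argument you get one index per column or per row
max_per_row = np.argmax(X, axis=1)  # [1 2]
```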

You can also find regions of arrays matching a query using masks:

A = np.array([1, 2, 3])
mask = A > 1 # [False, True, True]
A[mask] # [2, 3]
mask = (A > 1) & (A < 3) # [False, True, False]
A[mask] # [2]
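
Masks can be combined with | (or) and inverted with ~ (not), in the same way as & above. They can also be used to change values, or to count matches. A short sketch, with made-up variable names:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

# Combine conditions with | (or); each condition needs its own brackets
outside = (A < 2) | (A > 4)  # [True, False, False, False, True]
print(A[outside])            # [1 5]

# Invert a mask with ~
print(A[~outside])           # [2 3 4]

# Assign through a mask (on a copy, so A is left untouched)
B = A.copy()
B[B > 3] = 0
print(B)                     # [1 2 3 0 0]

# True counts as 1, so summing a mask counts the matches
n_matches = np.sum(A > 2)    # 3
```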

Masks can also be used to find indices using np.where():

A = np.array([1, 2, 3])
idx = np.where(A > 1)[0] # [1,2]
idx = np.where((A > 1) & (A < 3))[0] # [1]
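
The [0] is needed because np.where, when given only a condition, returns a tuple with one index array per dimension. The sketch below shows this for a two-dimensional array, along with the three-argument form of np.where, which chooses between two values element by element. Variable names are made up for illustration.

```python
import numpy as np

A = np.array([1, 2, 3])

# One-dimensional input: a tuple containing a single index array
result = np.where(A > 1)
print(len(result))   # 1
print(result[0])     # [1 2]

# Two-dimensional input: one index array per dimension
X = np.array([[1, 2, 3], [4, 5, 6]])
rows, cols = np.where(X > 4)
print(rows, cols)     # [1 1] [1 2]
print(X[rows, cols])  # [5 6]

# Three arguments: keep A where the condition holds, otherwise use 0
replaced = np.where(A > 1, A, 0)
print(replaced)       # [0 2 3]
```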