NumPy Tutorial, Part 2: Beginner's Guide to Data Analysis

This is part two of the NumPy tutorial series. If you haven’t read my previous tutorial on NumPy, I’d recommend doing so first. In this tutorial, I’m going to cover the parts of NumPy that matter most for data analysis, rather than everything that is possible with NumPy.

Let’s proceed by first pointing out the way we import NumPy, which you've already seen as follows:

import numpy as np

Indexing and Slicing of NumPy array

In the previous tutorial, we have seen how to create a NumPy array and how to play around with its shape. In this tutorial, we will see how to extract specific values from the array using indexing and slicing.

Slicing 1-D NumPy arrays

Slicing means retrieving the elements from one index up to another index. All we have to do is pass the starting and ending points inside the index like this: [start:end].

You can take it up a notch by also passing a step-size. Suppose you wanted every other element from the array: you would set the step-size to 2, meaning each selected element is 2 places away from the previous one.

Incorporating all this into a single index would look something like this: [start:end:step-size].

In:  a = np.array([1,2,3,4,5,6])
print(a[1:5:2])
Out: [2 4]

Notice that the element at index 5 (the value 6) was not included. This is because slicing includes the start index but excludes the end index.

If you don’t specify the start or end index, it is taken as 0 or array size, respectively, as default. And the step-size by default is 1.

In:  a = np.array([1,2,3,4,5,6])
print(a[:6:2])
print(a[1::2])
print(a[1:6:])
Out: [1 3 5]
[2 4 6]
[2 3 4 5 6]
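
To make the defaults concrete, here is a small sketch that compares each shorthand slice with its fully spelled-out form. The variable names (no_start, no_end, no_step) are just my own labels for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Omitted start defaults to 0, omitted end to the array size, omitted step to 1
no_start = a[:6:2]   # same as a[0:6:2]
no_end = a[1::2]     # same as a[1:6:2]
no_step = a[1:6:]    # same as a[1:6:1]

print(np.array_equal(no_start, a[0:6:2]))      # True
print(np.array_equal(no_end, a[1:len(a):2]))   # True
print(np.array_equal(no_step, a[1:6:1]))       # True
```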

Slicing 2-D NumPy arrays

Now, a 2-D array has rows and columns, so slicing it can get a little tricky. But once you understand it, you can slice an array of any dimension!

Before learning how to slice a 2-D array, let’s have a look at how to retrieve an element from a 2-D array:

In:  a = np.array([[1,2,3],[4,5,6]])
print(a[0,0])
print(a[1,2])
print(a[1,0])
Out: 1
6
4

Here, we provided the row index and the column index to identify the element we wanted to extract. In a 1-D array, we provided only a single index because there was only one axis.

So, to slice a 2-D array, you need to mention the slices for both the rows and the columns:

In:  a = np.array([[1,2,3],[4,5,6]])
# print first row values
print('First row values :','\n',a[0:1,:])
# with step-size for columns
print('Alternate values from first row :','\n',a[0:1,::2])
print('Second column values :','\n',a[:,1::2])
print('Arbitrary values :','\n',a[0:1,1:3])
Out: First row values :
[[1 2 3]]
Alternate values from first row :
[[1 3]]
Second column values :
[[2]
[5]]
Arbitrary values :
[[2 3]]
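
One detail worth checking for yourself: a slice such as 0:1 keeps the axis, while a plain integer index drops it. The sketch below, with variable names I made up for illustration, prints the resulting shapes so you can see the difference.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

row_as_2d = a[0:1, :]   # slice on the row axis keeps that axis
row_as_1d = a[0, :]     # integer on the row axis drops it
col_as_1d = a[:, 1]     # integer on the column axis drops it

print(row_as_2d.shape)  # (1, 3)
print(row_as_1d.shape)  # (3,)
print(col_as_1d.shape)  # (2,)
```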

So far we haven’t seen a 3-D array. Let’s first visualize what a 3-D array looks like:

In:  a = np.array([[[1,2],[3,4],[5,6]],
[[7,8],[9,10],[11,12]],
[[13,14],[15,16],[17,18]]])
print(a)
Out: [[[ 1 2]
[ 3 4]
[ 5 6]]

[[ 7 8]
[ 9 10]
[11 12]]

[[13 14]
[15 16]
[17 18]]]

In addition to the rows and columns of a 2-D array, a 3-D array also has a depth axis where it stacks one 2-D array behind the other. So, when you are slicing a 3-D array, you also need to mention which 2-D array you are slicing. This usually comes as the first value in the index:

In:  print('First array, first row, first column value :','\n',a[0,0,0])
print('First array last column :','\n',a[0,:,1])
print('First two rows for second and third arrays :','\n',a[1:,0:2,0:2])
Out: First array, first row, first column value :
1
First array last column :
[2 4 6]
First two rows for second and third arrays :
[[[ 7 8]
[ 9 10]]

[[13 14]
[15 16]]]
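
If the index order feels confusing, printing the shapes can help. This is only a sketch: I rebuild the same values with np.arange() and reshape() instead of typing the nested lists, and the name sub_block is my own.

```python
import numpy as np

# Same values as the 3-D array above: 3 stacked 2-D arrays, each 3 rows x 2 columns
a = np.arange(1, 19).reshape(3, 3, 2)

print(a.shape)      # (3, 3, 2) -> (which 2-D array, row, column)
print(a[0].shape)   # (3, 2)    -> the first 2-D array on its own

sub_block = a[1:, 0:2, 0:2]   # second and third arrays, first two rows
print(sub_block.shape)        # (2, 2, 2)
```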

If you want the values as a one-dimensional array, you can always use the flatten() method to do the job!

In:  print('Printing as a single array :','\n',a[1:,0:2,0:2].flatten())
Out: Printing as a single array :
[ 7 8 9 10 13 14 15 16]

Negative slicing of NumPy arrays

An interesting way to slice your array is to use negative slicing. Negative indices count from the end of the array rather than the beginning. Have a look below:

In:  a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(a[:,-1])
Out: [ 5 10]

Here, the last value of each row was printed. If, however, we want to walk backwards from the end, we have to explicitly provide a negative step-size; otherwise the result would be an empty array.

In: print(a[:,-1:-3:-1])
Out:[[ 5 4]
[10 9]]
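
Here is a small sketch of what happens with and without the negative step-size. The names forward, backward and last_two are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Default step is +1: starting at the last column and moving right
# never reaches column -3, so no columns are selected
forward = a[:, -1:-3]
print(forward.shape)   # (2, 0)

# A step of -1 walks from the last column back towards column -3
backward = a[:, -1:-3:-1]
print(backward)

# To get the last two columns in their original order, slice from -2 onwards
last_two = a[:, -2:]
print(last_two)
```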

An interesting use of negative slicing is to reverse the original array.

In: a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print('Original array :','\n',a)
print('Reversed array :','\n',a[::-1,::-1])
Out:Original array :
[[ 1 2 3 4 5]
[ 6 7 8 9 10]]
Reversed array :
[[10 9 8 7 6]
[ 5 4 3 2 1]]

You can also use the np.flip() function to reverse an ndarray along a chosen axis.

In: a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print('Original array :','\n',a)
print('Reversed array horizontally :','\n',np.flip(a,axis=1))
print('Reversed array vertically :','\n',np.flip(a,axis=0))
Out:Original array :
[[ 1 2 3 4 5]
[ 6 7 8 9 10]]
Reversed array horizontally :
[[ 5 4 3 2 1]
[10 9 8 7 6]]
Reversed array vertically :
[[ 6 7 8 9 10]
[ 1 2 3 4 5]]
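
To connect np.flip() with the slicing approach, the sketch below checks that flipping along an axis gives the same result as a negative step on that axis. np.fliplr() and np.flipud() are NumPy shortcuts for the two cases; the variable names are my own.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

flip_cols = np.flip(a, axis=1)   # reverses the column order within each row
flip_rows = np.flip(a, axis=0)   # reverses the order of the rows
flip_all = np.flip(a)            # no axis given: reverses along every axis

print(np.array_equal(flip_cols, a[:, ::-1]))    # True
print(np.array_equal(flip_cols, np.fliplr(a)))  # True
print(np.array_equal(flip_rows, a[::-1, :]))    # True
print(np.array_equal(flip_rows, np.flipud(a)))  # True
print(np.array_equal(flip_all, a[::-1, ::-1]))  # True
```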

Stacking and Concatenating NumPy arrays

Stacking ndarrays

You can create a new array by combining existing arrays. This you can do in two ways:

  • Either combine the arrays vertically (i.e. along the rows) using the np.vstack() function, thereby increasing the number of rows in the resulting array
  • Or combine the arrays horizontally (i.e. along the columns) using the np.hstack() function, thereby increasing the number of columns in the resulting array

In: a = np.arange(0,5) 
b = np.arange(5,10)
print('Array 1 :','\n',a)
print('Array 2 :','\n',b)
print('Vertical stacking :','\n',np.vstack((a,b)))
print('Horizontal stacking :','\n',np.hstack((a,b)))
Out:Array 1 :
[0 1 2 3 4]
Array 2 :
[5 6 7 8 9]
Vertical stacking :
[[0 1 2 3 4]
[5 6 7 8 9]]
Horizontal stacking :
[0 1 2 3 4 5 6 7 8 9]

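A point to note here is that the arrays must have the same size along every axis except the one you are combining along, otherwise you are bound to get an error! (np.vstack() treats each 1-D array as a single row, so 1-D inputs need the same length.)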
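
Here is a small sketch of the size requirement with 1-D inputs: np.vstack() needs arrays of equal length, while np.hstack() simply joins them end to end. The array named short and the flag stacked_ok are hypothetical names for this illustration.

```python
import numpy as np

a = np.arange(0, 5)       # 5 elements
short = np.arange(5, 9)   # only 4 elements

# Stacking as rows fails because the rows would have different lengths
try:
    np.vstack((a, short))
    stacked_ok = True
except ValueError as err:
    stacked_ok = False
    print('vstack failed:', err)

# Joining 1-D arrays end to end works even when the lengths differ
joined = np.hstack((a, short))
print(joined)
```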

Another interesting way to combine arrays is by using the np.dstack() method. It combines array elements index by index and stacks them along the depth axis:

In: a = [[1,2],[3,4]]
b = [[5,6],[7,8]]
c = np.dstack((a,b))
print('Array 1 :','\n',a)
print('Array 2 :','\n',b)
print('Dstack :','\n',c)
print(c.shape)
Out:Array 1 :
[[1, 2], [3, 4]]
Array 2 :
[[5, 6], [7, 8]]
Dstack :
[[[1 5]
[2 6]]

[[3 7]
[4 8]]]
(2, 2, 2)

Concatenating ndarrays

While stacking arrays is one way of combining old arrays to get a new one, you could also use the np.concatenate() function, where the passed arrays are joined along an existing axis:

In: a = np.arange(0,5).reshape(1,5)
b = np.arange(5,10).reshape(1,5)
print('Array 1 :','\n',a)
print('Array 2 :','\n',b)
print('Concatenate along rows :','\n',np.concatenate((a,b),axis=0))
print('Concatenate along columns :','\n',np.concatenate((a,b),axis=1))
Out:Array 1 :
[[0 1 2 3 4]]
Array 2 :
[[5 6 7 8 9]]
Concatenate along rows :
[[0 1 2 3 4]
[5 6 7 8 9]]
Concatenate along columns :
[[0 1 2 3 4 5 6 7 8 9]]

The drawback of this method is that every array you pass must already have the axis along which you want to combine. Otherwise, get ready to be greeted by an error.
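
To see that error for yourself, here is a sketch that tries to join two 1-D arrays along axis=1, which they do not have, and then repeats the call after reshaping them to 2-D. The names failed and fixed are only for illustration.

```python
import numpy as np

a = np.arange(0, 5)    # 1-D, so the only axis is axis 0
b = np.arange(5, 10)

try:
    np.concatenate((a, b), axis=1)
    failed = False
except ValueError as err:
    # NumPy raises AxisError here, which is a subclass of ValueError
    failed = True
    print('concatenate failed:', err)

# After reshaping to shape (1, 5), axis 1 exists and the call works
fixed = np.concatenate((a.reshape(1, 5), b.reshape(1, 5)), axis=1)
print(fixed.shape)   # (1, 10)
```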

Another very useful function is np.append(), which adds new elements to the end of an ndarray. This is obviously useful when you already have an existing ndarray but want to add new values to it. Note that it returns a new array and leaves the original unchanged.

In: a = np.array([[1,2],
[3,4]])
np.append(a,[[5,6]],axis=0)
Out:array([[1, 2],
[3, 4],
[5, 6]])
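
Two things about np.append() are easy to miss, so here is a short sketch of both: the original array is left untouched, and leaving out the axis argument flattens everything into 1-D. The names with_row and flat are my own.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

with_row = np.append(a, [[5, 6]], axis=0)   # new array with an extra row
flat = np.append(a, [[5, 6]])               # no axis: inputs are flattened first

print(a.shape)          # (2, 2), the original is unchanged
print(with_row.shape)   # (3, 2)
print(flat)             # [1 2 3 4 5 6]
```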

Broadcasting in NumPy arrays — A class apart

Broadcasting is one of the best features of ndarrays. It lets you perform arithmetic operations between ndarrays of different sizes or between an ndarray and a simple number!

Broadcasting essentially stretches the smaller ndarray so that it matches the shape of the larger ndarray:

In: a = np.arange(10,20,2)
b = np.array([[2],[2]])
print('Adding two different sized arrays:','\n',a+b)
print('Multiplying an ndarray and a number:','\n',a*2)
Out:Adding two different sized arrays:
[[12 14 16 18 20]
[12 14 16 18 20]]
Multiplying an ndarray and a number:
[20 24 28 32 36]

You can think of it as stretching, or making copies of, the smaller operand (here the scalar 2 becomes [2,2,2,2,2]) to match the shape of the ndarray, and then operating element-wise. But no such copies are actually made. It is just a way of thinking about how broadcasting works.

This is very useful because it is more efficient to multiply an array by a scalar value than by another full-sized array! It is important to note that two ndarrays can broadcast together only when they are compatible.

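Ndarrays are compatible when their shapes are compared axis by axis, starting from the last axis, and each pair of sizes meets one of these conditions:

  1. Both sizes are the same
  2. Either of the sizes is 1. The axis of size 1 is broadcast to meet the size of the larger ndarray.

An ndarray with fewer dimensions is treated as if its shape had extra 1s added on the left.
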
In: a = np.ones((3,3))
b = np.array([2])
a+b
Out:array([[3., 3., 3.],
[3., 3., 3.],
[3., 3., 3.]])
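
Here is a sketch of the compatibility check in action. NumPy compares the shapes starting from the last axis, so a shape of (3,) or (2, 1) works against (2, 3), but (2,) does not. The arrays row, col and bad are hypothetical examples.

```python
import numpy as np

a = np.ones((2, 3))
row = np.array([10, 20, 30])   # shape (3,): last axis matches 3
col = np.array([[1], [2]])     # shape (2, 1): the size-1 axis is stretched
bad = np.array([1, 2])         # shape (2,): last axis is 2, not 3 or 1

print((a + row).shape)     # (2, 3)
print((a + col).shape)     # (2, 3)
print((row + col).shape)   # (2, 3), both operands are stretched

try:
    a + bad
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print('broadcast failed:', err)
```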

NumPy Ufuncs — The secret of its success!

Python is a dynamically typed language. This means the data type of a variable does not need to be known at the time of the assignment. Python will automatically determine it at run-time. While this means cleaner code that is easier to write, it also makes Python sluggish.

This problem manifests itself when Python has to do many operations repeatedly, like the addition of two arrays. This is so because each time an operation needs to be performed, Python has to check the data type of the element. NumPy overcomes this problem with ufuncs. Ufuncs are universal functions in NumPy: mathematical functions that perform fast element-wise operations on whole arrays. They are called automatically when you perform simple arithmetic operations on NumPy arrays, because the arithmetic operators act as wrappers for NumPy ufuncs.
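
As a quick check of the wrapper idea, the sketch below confirms that np.add is a ufunc and that a + b gives the same result as calling np.add(a, b) directly. The plain Python loop is only there for comparison and is not how NumPy does it internally.

```python
import numpy as np

a = np.arange(1, 6)
b = np.arange(6, 11)

print(isinstance(np.add, np.ufunc))   # True

operator_sum = a + b        # the + operator
ufunc_sum = np.add(a, b)    # the ufunc called by name
loop_sum = np.array([x + y for x, y in zip(a, b)])   # element by element in Python

print(np.array_equal(operator_sum, ufunc_sum))   # True
print(np.array_equal(operator_sum, loop_sum))    # True
```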

Maths with NumPy arrays

Here are some of the most important and useful operations that you will need to perform on your NumPy array.

Basic arithmetic operations on NumPy arrays

The basic arithmetic operations can easily be performed on NumPy arrays. The important thing to remember is that these simple arithmetic operator symbols just act as wrappers for NumPy ufuncs.

In: a = np.arange(1,6)
b = np.arange(6,11)

print('Addition :', a+5)
print('Subtract :' ,a-5)
print('Multiply :', a*5)
print('Divide :', a/5)
print('Power :', a**2)
print('Remainder :', a%5)
Out:Addition : [ 6 7 8 9 10]
Subtract : [-4 -3 -2 -1 0]
Multiply : [ 5 10 15 20 25]
Divide : [0.2 0.4 0.6 0.8 1. ]
Power : [ 1 4 9 16 25]
Remainder : [1 2 3 4 0]
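
The same operators also work between two arrays of the same shape, element by element. Here is a short sketch using the arrays a and b defined above; the result names are my own.

```python
import numpy as np

a = np.arange(1, 6)    # [1 2 3 4 5]
b = np.arange(6, 11)   # [ 6  7  8  9 10]

total = a + b          # element-wise addition
difference = b - a     # element-wise subtraction
product = a * b        # element-wise multiplication
floor_div = b // a     # element-wise integer division

print('Add :', total)
print('Subtract :', difference)
print('Multiply :', product)
print('Floor divide :', floor_div)
```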

Mean, Median and Standard deviation

To find the mean, median, and standard deviation of a NumPy array, use the np.mean(), np.median(), and np.std() functions:

In: a = np.arange(5,15,2)
print('Mean :', np.mean(a))
print('Median :', np.median(a))
print('Standard deviation', np.std(a))
Out:Mean : 9.0
Median : 9.0
Standard deviation 2.8284271247461903
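
One thing to keep in mind for data analysis: by default np.std() divides by the number of elements (the population formula). If you want the sample standard deviation, you can pass ddof=1, as in this sketch. The names population_std and sample_std are only for illustration.

```python
import numpy as np

a = np.arange(5, 15, 2)   # [ 5  7  9 11 13]

population_std = np.std(a)       # divides by N (ddof=0 is the default)
sample_std = np.std(a, ddof=1)   # divides by N - 1

print('Population std :', population_std)
print('Sample std :', sample_std)
```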

Min-Max values and their indexes

Min and max values in an array can be easily found using the np.min() and np.max() functions. With axis=0 you get one result per column, and with axis=1 one result per row:

In: a = np.array([[1,6],[4,3]])
print('Min :', np.min(a,axis=0))
print('Max :', np.max(a,axis=1))
Out:Min : [1 3]
Max : [6 4]

You can also easily determine the index of the minimum or maximum value in the ndarray along a particular axis using the argmin() and argmax() methods:

In: a = np.array([[1,6,5], 
[4,3,7]])
# minimum along a column
print('Min :',np.argmin(a,axis=0))
# maximum along a row
print('Max :',np.argmax(a,axis=1))
Out:Min : [0 1 0]
Max : [1 2]
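
If you leave out the axis, argmax() returns a single index into the flattened array. Here is a sketch of how you could turn that back into a (row, column) position with np.unravel_index(); the names flat_index and position are my own.

```python
import numpy as np

a = np.array([[1, 6, 5],
              [4, 3, 7]])

flat_index = np.argmax(a)   # index into the flattened array
position = np.unravel_index(flat_index, a.shape)   # convert to (row, column)

print('Flat index :', flat_index)
print('Row and column :', position)
print('Max value :', a[position])
```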

In this article, I gave a basic idea about NumPy array slicing, concatenating, broadcasting, and mathematical operations.

The Google Colab code is available — here.


I am a Python programming student and learning different concepts. Here I am publishing what I learnt. I hope these articles are helpful.