What is NumPy? Explaining how it works in Python

Stephen J. Bigelow

Published: Jul 19, 2024

NumPy is an open source mathematical and scientific computing library for Python programming tasks. The name NumPy is shorthand for Numerical Python. The NumPy library offers a collection of high-level mathematical functions including support for multi-dimensional arrays, masked arrays and matrices. NumPy also includes various logical and mathematical capabilities for those arrays such as shape manipulation, sorting, selection, linear algebra, statistical operations, random number generation and discrete Fourier transforms.

As with any programming library, NumPy only needs to be added to an existing Python installation; programmers can then write Python code that calls NumPy functions and exchanges data with them. The NumPy library was first released in 2006. Today, the scientific computing community supports the open source library, and its source code is available through GitHub. Development of the NumPy library is active and ongoing.

How are NumPy arrays different from Python lists?

In simplest terms, a Python list is most suitable for general data storage and is not designed for math tasks, while a NumPy array can easily support math-intensive tasks.

Python provides a basic data structure called a list. A Python list supports a changeable -- or mutable -- ordered sequence of data elements or values called items. A single list can also contain many different data types. This makes lists handy for storing multiple data items as a single variable -- such as customer contact information and account numbers. However, lists are potentially inefficient: they can use significant amounts of memory, and mathematical operations become problematic when the items have varied types.

By comparison, NumPy is built around the idea of a homogeneous data array. NumPy supports various data types, but each array holds elements of only one data type -- a different array can be made for a different data type. This approach requires less memory and allows more efficient system performance when processing mathematical operations on array elements.
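The difference can be seen in a short sketch. The variable names below are illustrative and not part of any NumPy API, and the example assumes NumPy is installed and imported as np.

```python
import numpy as np

# A Python list can mix types, and + concatenates rather than adds.
mixed = [1, "two", 3.0]
prices_list = [1.5, 2.5, 3.5]
doubled_list = prices_list + prices_list   # six items, not doubled values

# A NumPy array holds one data type, so math applies element by element.
prices = np.array([1.5, 2.5, 3.5])
doubled = prices + prices

print(len(doubled_list))   # 6
print(doubled)             # [3. 5. 7.]
print(prices.dtype)        # float64
print(prices.itemsize)     # 8 (bytes used by each element)
```

Because every element shares one type and size, NumPy can store the values in a compact block of memory and apply an operation to the whole array at once.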

What are ndarrays and how to use them

The primary data structure in NumPy is the N-dimensional array -- called an ndarray or simply an array. Every ndarray is a fixed-size array that is kept in memory and contains the same type of data such as integer or floating-point numbers.

An ndarray can have any number of dimensions; one-, two- and three-dimensional arrays -- for example, length, width and height or layers -- are the most common. Ndarrays use the shape attribute to return a tuple (an ordered sequence of numbers) giving the size of the array along each dimension. The data type of the array's elements is reported by the dtype attribute and can be set when the array is created. Data types include integers, floating-point numbers, strings and so on.

As a simple example, the following code snippet creates a two-dimensional array with two rows of three elements, each stored as a 4-byte (32-bit) integer, and then uses the shape and dtype attributes to inspect it:

import numpy as np

# Two rows of three 32-bit integers
g = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
type(g)    # <class 'numpy.ndarray'>
g.shape    # (2, 3) -- two rows, three columns
g.dtype    # dtype('int32')

Several NumPy functions can easily create arrays that are empty or prefilled with either zeros or ones. Different NumPy arrays can also be stacked (combined) either vertically or horizontally.
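As a minimal sketch of those creation and stacking functions, the following uses numpy.zeros, numpy.ones, numpy.empty, numpy.vstack and numpy.hstack. The variable names are illustrative only.

```python
import numpy as np

zeros = np.zeros((2, 3))     # 2 x 3 array filled with 0.0
ones = np.ones((2, 3))       # 2 x 3 array filled with 1.0
scratch = np.empty((2, 3))   # uninitialized; contents are arbitrary

stacked_v = np.vstack((zeros, ones))   # rows of ones placed below the zeros
stacked_h = np.hstack((zeros, ones))   # columns of ones placed to the right

print(stacked_v.shape)   # (4, 3)
print(stacked_h.shape)   # (2, 6)
```

Note that numpy.empty only reserves memory, so its values should be overwritten before they are read.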

An established ndarray can be indexed to access its contents. For example, to read the element in the second row and third column of the array above, use an index expression such as:

g[1,2]

The return value is 6. NumPy indexes are zero-based, so the first row and column is [0,0] and the second row and third column is [1,2].
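Indexing also extends to whole rows, columns and ranges through slicing. The sketch below rebuilds the same two-row array so that it runs on its own; the variable names are illustrative.

```python
import numpy as np

g = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

first_row = g[0]          # entire first row: 1, 2, 3
last_column = g[:, 2]     # third column of every row: 3, 6
bottom_right = g[1, 1:]   # second row, from the second column on: 5, 6
last_item = g[-1, -1]     # negative indexes count from the end: 6

print(first_row, last_column, bottom_right, last_item)
```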

Once created, programmers can perform mathematical operations on the contents of an ndarray. As examples, simple math functions include the following:

Addition. Add the corresponding elements of two arrays: numpy.add(x, y)
Subtraction. Subtract the elements of one array from the corresponding elements of another: numpy.subtract(x, y)
Multiplication. Multiply the corresponding elements of two arrays: numpy.multiply(x, y)
Division. Divide the elements of one array by the corresponding elements of another: numpy.divide(x, y)
Power. Raise each element of one array to the power given by the corresponding element of another: numpy.power(x, y)
Matrix multiply. Compute the matrix product of two arrays: numpy.matmul(x, y)

The following simple example creates two one-dimensional arrays and then adds the elements of one array to the elements of a second array:

import numpy

array1 = numpy.array([1, 2, 3])
array2 = numpy.array([4, 5, 6])
result = numpy.add(array1, array2)  # element-wise addition
print(result)

The printed output is [5 7 9].
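The other functions in the list above work the same way. The following sketch applies them to two small 2 x 2 arrays chosen for illustration, which also shows how element-wise multiplication differs from matrix multiplication.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[2, 2], [2, 2]])

difference = np.subtract(x, y)   # [[-1, 0], [1, 2]]
product = np.multiply(x, y)      # element-wise: [[2, 4], [6, 8]]
quotient = np.divide(x, y)       # [[0.5, 1.0], [1.5, 2.0]]
squared = np.power(x, y)         # [[1, 4], [9, 16]]
matrix_product = np.matmul(x, y) # rows times columns: [[6, 6], [14, 14]]

print(product)
print(matrix_product)
```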

Over 60 basic math functions and many other complex functions support logic, algebra, trigonometry and calculus. NumPy documentation provides detailed lists of available functions and code examples for programmers to learn and implement NumPy capabilities.

Common NumPy applications and uses

The NumPy mathematical library can be used by any software developer (at any experience level) seeking to integrate complex numerical computing functions into their Python codebase. NumPy is also routinely used in many different data science, machine learning (ML) and scientific Python software packages including the following:

Matplotlib.
Pandas.
scikit-image.
scikit-learn.
SciPy.

NumPy is regularly applied in a wide range of use cases including the following:

Data manipulation and analysis. NumPy can be used for data cleaning, transformation and aggregation. The data can then be readily processed through varied NumPy mathematical operations such as statistical analysis, Fourier analysis and linear algebra -- all vital for advanced data analytics and data science tasks.
Scientific computing. NumPy handles advanced mathematical operations such as matrix multiplication and eigenvalue calculation, and it supplies the array operations that libraries such as SciPy build on to solve differential equations. This makes NumPy particularly valuable across a vast range of simulation, modeling, visualization, computational processing and other scientific computing tasks.
Machine learning. ML tasks are math-intensive, and ML libraries -- such as TensorFlow and scikit-learn -- readily use NumPy for mathematical computations needed to support ML algorithms and model training.
Signal and image processing. Signals and images can readily be represented as data arrays, and NumPy can provide tools needed to perform important processing tasks on that data for purposes such as enhancement and filtering.
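A few of these use cases can be sketched in a handful of lines. The data below is made up for illustration (the readings, the matrix and the 100-samples-per-second rate are all assumptions), and the sketch is a starting point rather than a complete analysis workflow.

```python
import numpy as np

# Data analysis: aggregate a small set of hypothetical readings.
readings = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = readings.mean()   # 5.0
std = readings.std()     # 2.0 (population standard deviation)

# Scientific computing: eigenvalues of a symmetric 2 x 2 matrix.
matrix = np.array([[2.0, 0.0], [0.0, 3.0]])
eigenvalues = np.linalg.eigvalsh(matrix)   # ascending order: 2.0, 3.0

# Signal processing: find the dominant frequency of a sampled sine wave.
sample_rate = 100                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate   # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)         # 5 Hz sine wave
spectrum = np.abs(np.fft.rfft(signal))     # magnitude of each frequency bin
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)
dominant = freqs[np.argmax(spectrum)]      # 5.0 Hz

print(mean, std, eigenvalues, dominant)
```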

Limitations of NumPy

Although NumPy offers some noteworthy advantages for mathematical software designers, several limitations must be considered. Common disadvantages include the following:

Lower flexibility. The focus on numerical and homogeneous data types is key to NumPy performance and efficiency, but it can also limit the flexibility of NumPy compared to other storage mechanisms where heterogeneous data types must be supported. Additionally, NumPy has no general-purpose marker for missing values; they are typically represented as NaN in floating-point arrays or handled with masked arrays, so the data set requires careful validation and vetting.
Non-numerical data types. NumPy can support many different data types, but its primary focus is on numerical data types such as floating-point numbers. Non-numerical data types, such as text strings, might see little benefit from NumPy array storage compared to other storage mechanisms such as Python lists.
Demands of change. The contents of both NumPy arrays and Python lists are mutable, and both can be appended to, extended and combined. However, NumPy is inefficient at tasks that change an array's size: because an ndarray has a fixed size, routines that add, combine or delete data typically allocate a new array and copy the data into it, which can limit performance.
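These limitations are easy to observe in practice. The following sketch uses made-up values and illustrative variable names; it shows numpy.append producing a new array, NaN standing in for a missing floating-point value, and mixed input being converted to a single type.

```python
import numpy as np

# Appending: np.append returns a new array; the original is unchanged.
base = np.array([1, 2, 3])
grown = np.append(base, 4)
print(base.size, grown.size)   # 3 4

# Missing values: NaN propagates unless a NaN-aware function is used.
temps = np.array([20.5, np.nan, 22.5])
print(np.mean(temps))          # nan
print(np.nanmean(temps))       # 21.5

# Mixed input: NumPy converts everything to one common type (here, strings).
mixed = np.array([1, "two", 3.0])
print(mixed.dtype.kind)        # U (Unicode string)
```

When many items must be appended one at a time, collecting them in a Python list and converting to an array once at the end is a common way to avoid the repeated copying.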

Ultimately, NumPy provides a powerful platform for scientific computation, but it is not a replacement for all array programming tasks. Software designers must carefully evaluate the nature and requirements of the array before selecting NumPy versus Python lists or other array mechanisms.

How to install and import NumPy

The only prerequisite for NumPy is a Python distribution. The Anaconda distribution of Python already includes Python and NumPy and might be easier for users just getting started with NumPy and scientific computing projects.

For users who already have Python installed, NumPy can be installed using conda (a package, dependency and environment management tool) or pip (a package manager for Python modules). The installation command for each package manager is:

conda install numpy

pip install numpy

Once installed, the NumPy library can be made available to Python code using the Python import statement:

import numpy

The module name ("numpy" in this case) can also be given an alias in the import statement to make it more relevant, readable or easier to reference. As an example, to import NumPy as "np", use the as keyword:

import numpy as np

Thus, any calls made to NumPy can be made using the abbreviation np instead.
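A quick way to confirm that the installation and the alias both work is a short check such as the following sketch, where the variable name values is illustrative.

```python
import numpy as np

print(np.__version__)    # version string of the installed NumPy
values = np.arange(5)    # array holding 0, 1, 2, 3, 4
print(values.sum())      # 10
```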