Reading and writing files — NumPy v2.3.dev0 Manual

This page tackles common applications; for the full collection of I/O routines, see Input and output.

Reading text and CSV files#

With no missing values#

Use numpy.loadtxt.
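A minimal sketch of numpy.loadtxt on complete data. The in-memory io.StringIO objects stand in for files on disk and their contents are made up for illustration; a filename works the same way.

```python
import io

import numpy as np

# In-memory stand-in for a whitespace-delimited text file (made-up data).
text = io.StringIO("1 2 3\n4 5 6\n7 8 9\n")

# By default, fields are split on whitespace and parsed as float64.
arr = np.loadtxt(text)
print(arr.shape, arr.dtype)

# For comma-separated data, pass delimiter=","; dtype is optional.
csv = io.StringIO("1,2,3\n4,5,6\n")
arr_csv = np.loadtxt(csv, delimiter=",", dtype=np.int64)
print(arr_csv)
```

If any field is empty or cannot be converted, numpy.loadtxt raises an error, which is the cue to switch to numpy.genfromtxt.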

With missing values#

Use numpy.genfromtxt.

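numpy.genfromtxt will either return a masked array that masks out the missing values (if usemask=True), or fill in each missing value with the value specified in filling_values (the default is np.nan for float, -1 for int).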

With non-whitespace delimiters#

>>> with open("csv.txt", "r") as f:
...     print(f.read())
1, 2, 3
4,, 6
7, 8, 9

Masked-array output#

>>> np.genfromtxt("csv.txt", delimiter=",", usemask=True)
masked_array(
  data=[[1.0, 2.0, 3.0],
        [4.0, --, 6.0],
        [7.0, 8.0, 9.0]],
  mask=[[False, False, False],
        [False,  True, False],
        [False, False, False]],
  fill_value=1e+20)

Array output#

>>> np.genfromtxt("csv.txt", delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7.,  8.,  9.]])

Array output, specified fill-in value#

>>> np.genfromtxt("csv.txt", delimiter=",", dtype=np.int8, filling_values=99)
array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7,  8,  9]], dtype=int8)

Whitespace-delimited#

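numpy.genfromtxt can also parse whitespace-delimited data files that have missing values if one of the following holds: each field has a fixed width (pass the width as the delimiter argument), rows with missing values can be skipped (set invalid_raise=False), or the whitespace character that delimits columns differs from the whitespace that marks missing data. The examples below cover each case in turn.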

File with width=4. The data does not have to be justified (for example, the 2 in row 1), the last column can be less than width (for example, the 6 in row 2), and no delimiting character is required (for instance 8888 and 9 in row 3):

>>> with open("fixedwidth.txt", "r") as f:
...     data = f.read()
>>> print(data)
1   2      3
44      6
7   88889

Showing spaces as ^

>>> print(data.replace(" ","^"))
1^^^2^^^^^^3
44^^^^^^6
7^^^88889

>>> np.genfromtxt("fixedwidth.txt", delimiter=4)
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

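To skip rows with missing values, set invalid_raise=False. In this example, line 2 of skip.txt has two fields instead of three, and the last line of the file is:

7 888 9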

>>> np.genfromtxt("skip.txt", invalid_raise=False)
__main__:1: ConversionWarning: Some errors were detected !
    Line #2 (got 2 columns instead of 3)
array([[  1.,   2.,   3.],
       [  7., 888.,   9.]])

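If the whitespace character that delimits columns differs from the whitespace that marks missing data, pass both explicitly. For instance, if columns are delimited by \t, a field consisting of one or more spaces is recognized as missing:

>>> with open("tabs.txt", "r") as f:
...     data = f.read()
>>> print(data)
1       2       3
44              6
7       888     9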

Tabs vs. spaces

>>> print(data.replace("\t","^"))
1^2^3
44^ ^6
7^888^9

>>> np.genfromtxt("tabs.txt", delimiter="\t", missing_values=" +")
array([[  1.,   2.,   3.],
       [ 44.,  nan,   6.],
       [  7., 888.,   9.]])
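The missing_values argument used above can also name a special marker string. A minimal sketch, assuming a hypothetical file in which "x" stands for a missing field (the data and the marker are made up for illustration):

```python
import io

import numpy as np

# In-memory stand-in for a whitespace-delimited file; "x" marks a missing field.
text = io.StringIO("1 2 3\n44 x 6\n7 8888 9\n")

# With the default float dtype, the marked field comes back as nan.
arr = np.genfromtxt(text, missing_values="x")
print(arr)
```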

Read a file in .npy or .npz format#

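Choices:

Use numpy.load. It can read files generated by any of numpy.save, numpy.savez, or numpy.savez_compressed.

Use memory mapping. See numpy.lib.format.open_memmap.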
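A short sketch of reading both formats with numpy.load. The file names and array contents are hypothetical, and the files are written to a temporary directory first so the example is self-contained.

```python
import os
import tempfile

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")        # hypothetical file names
    npz_path = os.path.join(tmp, "arrays.npz")
    np.save(npy_path, a)
    np.savez(npz_path, a=a, b=b)

    # A .npy file loads directly as an ndarray.
    a_loaded = np.load(npy_path)

    # A .npz file loads as a dict-like archive; close it when done.
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        b_loaded = archive["b"]

    # mmap_mode maps a .npy file instead of reading it all into memory.
    a_mapped = np.load(npy_path, mmap_mode="r")
    first_row = np.array(a_mapped[0])  # copy the slice out of the map
    del a_mapped  # release the mapping before the directory is removed
```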

Write to a file to be read back by NumPy#

Binary#

Use numpy.save, or to store multiple arrays numpy.savez or numpy.savez_compressed.

For security and portability, set allow_pickle=False unless the dtype contains Python objects, which requires pickling.

Masked arrays can't currently be saved, nor can other arbitrary array subclasses.
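As an illustration of the binary writers, the sketch below saves to in-memory io.BytesIO buffers rather than files; the array names and contents are made up. A path string can be passed in place of the buffer.

```python
import io

import numpy as np

x = np.arange(10, dtype=np.int32)
y = np.random.default_rng(0).random((4, 4))

# Single array: np.save writes the .npy format to a path or file object.
buf = io.BytesIO()
np.save(buf, x, allow_pickle=False)

# Several arrays: keyword names become the keys of the .npz archive.
zbuf = io.BytesIO()
np.savez_compressed(zbuf, x=x, y=y)

# Rewind and read back; dtype and shape are stored in the file.
buf.seek(0)
zbuf.seek(0)
x_back = np.load(buf, allow_pickle=False)
with np.load(zbuf, allow_pickle=False) as archive:
    y_back = archive["y"]
```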

Human-readable#

numpy.save and numpy.savez create binary files. To write a human-readable file, use numpy.savetxt. The array can only be 1- or 2-dimensional, and there’s no savetxtz for multiple files.
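A minimal round trip through numpy.savetxt, assuming made-up data, a comma delimiter, and a two-decimal format; an io.StringIO buffer stands in for a file on disk.

```python
import io

import numpy as np

arr = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

# fmt controls how each number is written; header is emitted as a comment line.
buf = io.StringIO()
np.savetxt(buf, arr, fmt="%.2f", delimiter=",", header="a,b,c")
text = buf.getvalue()
print(text)

# loadtxt ignores lines starting with "#" by default, so the header is skipped.
back = np.loadtxt(io.StringIO(text), delimiter=",")
```

The precision of the text file is whatever fmt writes, so values that need more digits than the format allows will not round-trip exactly.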

Large arrays#

See Write or read large arrays.

Read an arbitrarily formatted binary file (“binary blob”)#

Use a structured array.

Example:

The .wav file header is a 44-byte block preceding data_size bytes of the actual sound data:

chunk_id          "RIFF"
chunk_size        4-byte unsigned little-endian integer
format            "WAVE"
fmt_id            "fmt "
fmt_size          4-byte unsigned little-endian integer
audio_fmt         2-byte unsigned little-endian integer
num_channels      2-byte unsigned little-endian integer
sample_rate       4-byte unsigned little-endian integer
byte_rate         4-byte unsigned little-endian integer
block_align       2-byte unsigned little-endian integer
bits_per_sample   2-byte unsigned little-endian integer
data_id           "data"
data_size         4-byte unsigned little-endian integer

The .wav file header as a NumPy structured dtype:

wav_header_dtype = np.dtype([
    ("chunk_id", (bytes, 4)),   # flexible-sized scalar type, item size 4
    ("chunk_size", "<u4"),      # little-endian unsigned 32-bit integer
    ("format", "S4"),           # 4-byte string, alternate spelling of (bytes, 4)
    ("fmt_id", "S4"),
    ("fmt_size", "<u4"),
    ("audio_fmt", "<u2"),       #
    ("num_channels", "<u2"),    # .. more of the same ...
    ("sample_rate", "<u4"),     #
    ("byte_rate", "<u4"),
    ("block_align", "<u2"),
    ("bits_per_sample", "<u2"),
    ("data_id", "S4"),
    ("data_size", "<u4"),
    #
    # the sound data itself cannot be represented here:
    # it does not have a fixed size
])

# f is a file object opened in binary mode
header = np.fromfile(f, dtype=wav_header_dtype, count=1)[0]

This .wav example is for illustration; to read a .wav file in real life, use Python’s built-in module wave.

(Adapted from Pauli Virtanen, Advanced NumPy, licensed under CC BY 4.0.)
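To see the structured-dtype approach end to end, the following sketch writes a small synthetic file with the header layout above and reads it back. The sample values, header field values, and file name are made up for illustration; this is not a complete .wav reader.

```python
import os
import tempfile

import numpy as np

wav_header_dtype = np.dtype([
    ("chunk_id", "S4"), ("chunk_size", "<u4"), ("format", "S4"),
    ("fmt_id", "S4"), ("fmt_size", "<u4"), ("audio_fmt", "<u2"),
    ("num_channels", "<u2"), ("sample_rate", "<u4"), ("byte_rate", "<u4"),
    ("block_align", "<u2"), ("bits_per_sample", "<u2"),
    ("data_id", "S4"), ("data_size", "<u4"),
])
assert wav_header_dtype.itemsize == 44  # packed layout matches the 44-byte header

# Synthetic 16-bit mono samples and a matching header (made-up values).
samples = np.arange(8, dtype="<i2")
header = np.zeros(1, dtype=wav_header_dtype)
header["chunk_id"] = b"RIFF"
header["chunk_size"] = 36 + samples.nbytes
header["format"] = b"WAVE"
header["fmt_id"] = b"fmt "
header["fmt_size"] = 16
header["audio_fmt"] = 1
header["num_channels"] = 1
header["sample_rate"] = 8000
header["byte_rate"] = 16000
header["block_align"] = 2
header["bits_per_sample"] = 16
header["data_id"] = b"data"
header["data_size"] = samples.nbytes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tone.wav")  # hypothetical file name
    with open(path, "wb") as f:
        header.tofile(f)
        samples.tofile(f)

    with open(path, "rb") as f:
        # Read one header record, then the sound data that follows it.
        parsed = np.fromfile(f, dtype=wav_header_dtype, count=1)[0]
        n_samples = int(parsed["data_size"]) // 2  # 2 bytes per 16-bit sample
        data = np.fromfile(f, dtype="<i2", count=n_samples)

print(parsed["chunk_id"], parsed["sample_rate"], data)
```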

Write or read large arrays#

Arrays too large to fit in memory can be treated like ordinary in-memory arrays using memory mapping.
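A sketch of memory mapping with a .npy file. The small shape here is a stand-in for an array too large for memory, and the file name is hypothetical; numpy.lib.format.open_memmap creates the file and numpy.load with mmap_mode reopens it.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.npy")  # hypothetical file name

    # Create a .npy file on disk and fill it block by block.
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(1000, 100)
    )
    for start in range(0, 1000, 250):
        out[start:start + 250] = start
    out.flush()
    del out  # close the mapping

    # Reopen read-only; slices are read from disk on demand.
    view = np.load(path, mmap_mode="r")
    shape = view.shape
    block_mean = float(view[250:500].mean())
    del view  # release the mapping before the directory is removed
```

Working in blocks, as in the loop above, keeps only one slice of the data resident at a time.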

Memory mapping lacks features like data chunking and compression; more full-featured formats and libraries usable with NumPy include HDF5 (through h5py or PyTables), Zarr, and NetCDF (through scipy.io.netcdf_file).

For tradeoffs among memmap, Zarr, and HDF5, see pythonspeed.com.

Write files for reading by other (non-NumPy) tools#

Formats for exchanging data with other tools include HDF5, Zarr, and NetCDF (see Write or read large arrays).

Write or read a JSON file#

NumPy arrays and most NumPy scalars are not directly JSON serializable. Instead, use a custom json.JSONEncoder for NumPy types, which can be found using your favorite search engine.
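One possible shape for such an encoder is sketched below. The class name NumpyEncoder and the payload are hypothetical, and the sketch covers only arrays and scalars whose .tolist() or .item() result is itself JSON serializable (complex numbers, for example, are not).

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts NumPy types to built-in Python types."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()   # nested lists of Python scalars
        if isinstance(obj, np.generic):
            return obj.item()     # matching Python scalar
        return super().default(obj)


payload = {"values": np.arange(3), "scale": np.float32(0.5), "n": np.int64(7)}
text = json.dumps(payload, cls=NumpyEncoder)
print(text)

# Reading back yields plain Python objects; rebuild arrays explicitly.
restored = json.loads(text)
values = np.asarray(restored["values"])
```

The JSON text keeps the nesting of the array but not its dtype, so the dtype has to be supplied again when rebuilding if it matters.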

Save/restore using a pickle file#

Avoid when possible; pickles are not secure against erroneous or maliciously constructed data.

Use numpy.save and numpy.load. Set allow_pickle=False, unless the array dtype includes Python objects, in which case pickling is required.
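The sketch below shows how allow_pickle behaves for an object-dtype array, using a made-up array and in-memory buffers. Both saving and loading refuse object arrays unless pickling is explicitly allowed.

```python
import io

import numpy as np

obj_arr = np.array([{"a": 1}, [1, 2, 3], "text"], dtype=object)

# Saving an object array is refused when pickling is disabled.
try:
    np.save(io.BytesIO(), obj_arr, allow_pickle=False)
    save_refused = False
except ValueError:
    save_refused = True

# With pickling enabled the array can be written.
buf = io.BytesIO()
np.save(buf, obj_arr, allow_pickle=True)

# Loading must opt in as well: allow_pickle defaults to False in numpy.load.
buf.seek(0)
try:
    np.load(buf)
    load_refused = False
except ValueError:
    load_refused = True

buf.seek(0)
restored = np.load(buf, allow_pickle=True)  # only do this for trusted files
```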

numpy.load and the pickle submodule also support unpickling files created with NumPy 1.26.

Convert from a pandas DataFrame to a NumPy array#

See pandas.DataFrame.to_numpy (or pandas.Series.to_numpy for a single column).

Save/restore using tofile and fromfile#

In general, prefer numpy.save and numpy.load.

numpy.ndarray.tofile and numpy.fromfile lose information on endianness and precision and so are unsuitable for anything but scratch storage.
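A small sketch of what gets lost, assuming a made-up big-endian array and a hypothetical scratch file name. The raw file holds only the bytes, so the reader must already know the dtype, byte order, and shape.

```python
import os
import tempfile

import numpy as np

original = np.arange(6, dtype=">i2").reshape(2, 3)  # big-endian 16-bit integers

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch.bin")  # hypothetical file name
    original.tofile(path)

    # Guessing the wrong byte order silently yields wrong values,
    # and the result is always flat: the 2x3 shape is not stored.
    wrong = np.fromfile(path, dtype="<i2")

    # Correct only because dtype and shape are supplied by hand.
    right = np.fromfile(path, dtype=">i2").reshape(2, 3)

print(wrong)
print(right)
```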