Binary data without lazy loading#
Author: Aureliana Barghini (B-Open)
BackendEntrypoint#
Implement a subclass of BackendEntrypoint that exposes an open_dataset method:
from xarray.backends import BackendEntrypoint


class MyBackendEntrypoint(BackendEntrypoint):
    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
    ):
        # my_open_dataset is a placeholder for your own function returning an xarray.Dataset
        return my_open_dataset(filename_or_obj, drop_variables=drop_variables)
BackendEntrypoint integration#
Declare this class as an external plugin in your setup.py:
import setuptools

setuptools.setup(
    entry_points={
        # the entry point must reference the class by its exact name
        'xarray.backends': ['engine_name=package.module:MyBackendEntrypoint'],
    },
)
or pass the class directly to xr.open_dataset:
xr.open_dataset(filename, engine=MyBackendEntrypoint)
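Once a package that declares the entry point is installed, the engine name should appear among the engines xarray discovers. A quick check, assuming only that xarray is installed; `engine_name` is the placeholder name from the setup.py snippet above:

```python
import xarray as xr

# list_engines() returns a dict mapping engine names to backend instances.
engines = xr.backends.list_engines()
print(sorted(engines))

# "engine_name" is the placeholder used above; it is only listed after the
# package declaring it has been installed into the current environment.
print("engine_name" in engines)
```

Passing the class itself through `engine=` needs no installation step, which makes it convenient while developing a backend.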
Example backend for binary files#
Create sample files#
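The cell that created the files is not shown here. The following is a minimal sketch, assuming NumPy is available. The helper name `create_sample_files` and the file names `foo.bin` and `foo_float.bin` are illustrative assumptions; the default of 30,000,000 values is chosen to match the array length in the outputs shown later in this section.

```python
from pathlib import Path

import numpy as np


def create_sample_files(directory=".", n_values=30_000_000):
    """Write one int64 and one float64 raw binary file and return their paths."""
    directory = Path(directory)
    int_path = directory / "foo.bin"          # hypothetical file name
    float_path = directory / "foo_float.bin"  # hypothetical file name

    # ndarray.tofile writes the raw bytes only: no header, shape, or dtype.
    np.arange(n_values, dtype=np.int64).tofile(int_path)
    np.arange(n_values, dtype=np.float64).tofile(float_path)
    return int_path, float_path
```

With the defaults, each file holds 30,000,000 values of 8 bytes, that is 240 MB on disk. Because the files carry no metadata, whatever reads them has to be told the dtype.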
Define the entrypoint#
An example of a backend that opens binary files.
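The cell defining the backend is not shown here. The following is a minimal sketch consistent with the outputs in this section (a variable named `foo` on a dimension `x` whose coordinate advances in steps of 10). The class name `BinaryBackend` and the backend-specific `dtype` parameter are assumptions, not necessarily the author's exact code.

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class BinaryBackend(BackendEntrypoint):
    """Hypothetical backend for headerless binary files of a single dtype."""

    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
        dtype=np.int64,  # backend-specific parameter (an assumption)
    ):
        # np.fromfile reads the entire file into memory at once: no lazy loading.
        data = np.fromfile(filename_or_obj, dtype=dtype)

        variable = xr.Variable(dims=("x",), data=data)
        coords = {"x": np.arange(data.size) * 10}
        ds = xr.Dataset({"foo": variable}, coords=coords)
        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds
```

Under these assumptions, `xr.open_dataarray("foo.bin", engine=BinaryBackend)` would open an int64 file, adding `dtype=np.float64` would open a float64 file, and `.sel(x=slice(0, 100))` on the result would select the 11 values whose `x` label lies between 0 and 100. The file name is hypothetical.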
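It works.
But it may be memory demanding: the whole file is read into memory when it is opened (30000000 values of 8 bytes each, 240MB), even when only a few values are selected afterwards.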
<xarray.DataArray 'foo' (x: 30000000)> Size: 240MB
[30000000 values with dtype=int64]
Coordinates:
    x        (x) int64 240MB 0 10 20 30 ... 299999970 299999980 299999990
<xarray.DataArray 'foo' (x: 30000000)> Size: 240MB
[30000000 values with dtype=float64]
Coordinates:
    x        (x) int64 240MB 0 10 20 30 ... 299999970 299999980 299999990
<xarray.DataArray 'foo' (x: 11)> Size: 88B
[11 values with dtype=float64]
Coordinates:
    x        (x) int64 88B 0 10 20 30 40 50 60 70 80 90 100