add support for opening file from memory by thehesiod · Pull Request #652 · Unidata/netcdf4-python (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation40 Commits12 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

thehesiod

@thehesiod

@thehesiod

btw unittests are failing because it can't find the nc_open_mem method, not sure if this should be dynamically looked up or bump up the minimum netCDF version required

@jswhit

This is great, thanks!

To fix the issue with the tests, take a look at the check_api function in setup.py. You can add a check there to see if the include file has nc_open_mem. setup.py should then modify include/constants.pyx to add DEF HAS_NC_OPEN_MEM = 1. Thenc_open_mem calls in _netCDF4.pyx can then be wrapped with IF HAS_NC_OPEN_MEM:.

@thehesiod

btw, why is netCDF4/_netCDF4.c checked in? it's generated when setup.py is run.

@jswhit

We'll also need a test.

_netCD4.c has to be manually generated. It should be created with all the values in include/constants.pyx set to 0. It's only used if cython is not installed.

@jswhit

Probably we should just require cython and get rid of _netCDF.c in the repo.

@matthew-brett

Cython is super-easy to install these days - there are wheels for many versions for OSX, Windows, Linux.

@thehesiod

working on updates, btw mind if I PEP'ify some of the files? get a lot of warnings in PyCharm

@thehesiod

@thehesiod

ok, made nc_open_mem lookup dynamic and added a test.

@thehesiod

btw, I'm not even sure how the .c file here helps because isn't it specific to a python version / platform / cython version? Not sure how that ever worked correctly. The one in this PR is definitely wrong so need to figure out how to create a "correct" one.

@thehesiod

btw let me know how I should fix this new unittest error. I'm guessing it's compiling with all the settings off. Further it won't work until there's a version of netcdf-c with the fix

@shoyer

Probably we should just require cython and get rid of _netCDF.c in the repo.

The standard way to do this is to generate C files for upload to pypi, but not include them in the source.
cytoolz has a relatively sane version of this setup: https://github.com/pytoolz/cytoolz/blob/master/setup.py

shoyer

@@ -1712,7 +1721,7 @@ references to the parent Dataset or Group.
the parent Dataset or Group."""
def __init__(self, filename, mode='r', clobber=True, format='NETCDF4',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make filename optional if memory is provided?

Even if netCDF-C needs a bogus filename argument, I imagine we could generate something at random.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets reflected in the Dataset.filepath attribute so I think we should keep it. My unittest actually validates this

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I guess so. It just seems pretty pointless to me :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, weirdness :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some text should be added to the docstring noting that filename is only used to set the value of filepath when memory is specified.

@jswhit

Looks like the _netCDF.c still uses nc_open_mem, so that will cause a failing test. To generate one that will work with the minimum version of the netcdf library, edit constants.pyx and set all the elements to zero, then cd _netCDF; cython -I../include _netCDF4.pyx, then commit the new version of _netCDF.c.

I'd like to remove _netCDF.c from the repo, but that should be a separate pull request.

Are the other failing tests due to the bug in Unidata/netcdf-c#394?

@thehesiod

@thehesiod

updated, so I think the tests are failing now because it's running the mem test with the feature disabled?

@jswhit

The test fails on my machine even though I have 4.4.1 (with nc_open_mem). I guess this is because of Unidata/netcdf-c#394? If so, a solution would be to disable the test in run_all.py if __netcdf4libversion__ < 4.4.1.2 (assuming that the fix gets in the next release of netcdf-c).

@thehesiod

jswhit

"""
cdef int grpid, ierr, numgrps, numdims, numvars
cdef char *path
cdef char namstring[NC_MAX_NAME+1]
memset(&self._buffer, 0, sizeof(self._buffer))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ought to add _buffer to _private_atts so it doesn't get mistaken for a netcdf attribute.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@jswhit

Don't see any real benefit to using memoryviews, since we're not slicing or manipulating the buffer in any way.

jswhit

# so it will compile with versions <= 4.2.
NC_DISKLESS = 0x0008
NC_INMEMORY = 0x8000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like NC_INMEMORY is ever used

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

jswhit

class TestOpenMem(unittest.TestCase):
def test_mem_open(self):
# Needs: https://github.com/Unidata/netcdf-c/pull/400
if netCDF4.__netcdf4libversion__ >= '4.4.1.2':

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just add this check in run_all.py so the test is skipped altogether?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually updated the test to test different scenarios of this

@jswhit

Merging now - thanks again for this contribution!

@thehesiod

thank you for the quick responses! pleasure!

@TWellman

Is there a working example/documentation based on this discussion for pulling a dataset via a url into memory using the newest netCDF4 version? I've read material on adding a memory=bytes construct in the init function to use the in-memory read capability from nc_open_mem(). However, my initial attempts have produced file location errors - "No such file or directory". I also ran a portion of the test script provided at https://github.com/Unidata/netcdf4-python/blob/master/test/tst_open_mem.py. An error was raised for netCDF4.netcdf4libversion = '4.4.1.1' (< 4.4.1.2). Do I need to pull from Unidata/netcdf-c#400 and integrate netcdf library changes directly into the package? Clarification is appreciated!

@TWellman

I am testing on a Jupyter notebook using Python 3.5 and netCDF4 version 1.3.0, but could switch if needed.

@jswhit

@dopplershift

Another option is to pass it a filename to an empty file, like from tempfile.NamedTemporaryFile

@TWellman

Great, substituting a completely unrelated local netCDF file as the "filename" but also passing a memory file in byte format proved to be successful. The returned file dimensions were consistent with the memory file rather than the unrelated local netCDF file with different dimensions/variables. So that checks out.

I also attempted unsuccessfully to use the temp file option using FILE_NAME = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name, mainly because it is much cleaner in my view. However, using a temp file as prescribed returned an OSError: NetCDF: Unknown file format. Should I format the temp file in a specific way? Thank you.

@thehesiod

I think "filename" just needs to be non-empty string

@TWellman

If I just input a text string the code returns an OSError: No such file or directory, e.g. using nc_file = netCDF4.Dataset('temp_file', mode = 'r', memory=bytefile)

@thehesiod

what version of netcdf-c ?

@dopplershift

@TWellman I needed it to be a netCDF file (apparently--I guess I forgot about it). This is what I've done for now until netCDF 4.5 is released:

with NamedTemporaryFile(suffix='.nc') as temp:
    # Create a temporary netCDF file to work around bug in netCDF C 4.4.1.1
    # We shouldn't actually need any file on disk.
    nc_temp = Dataset(temp.name, 'w').close()
    nc =  Dataset(temp.name, memory=mem_buffer)

@TWellman

I should probably upgrade as suggested earlier. I am using netCDF4.netcdf4libversion = '4.4.1.1', netCDF4 version 1.3.0.

@TWellman