add support for opening file from memory by thehesiod · Pull Request #652 · Unidata/netcdf4-python (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation40 Commits12 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
btw unittests are failing because it can't find the nc_open_mem
method, not sure if this should be dynamically looked up or bump up the minimum netCDF version required
This is great, thanks!
To fix the issue with the tests, take a look at the check_api
function in setup.py
. You can add a check there to see if the include file has nc_open_mem
. setup.py should then modify include/constants.pyx
to add DEF HAS_NC_OPEN_MEM = 1
. Thenc_open_mem
calls in _netCDF4.pyx
can then be wrapped with IF HAS_NC_OPEN_MEM:
.
btw, why is netCDF4/_netCDF4.c checked in? it's generated when setup.py is run.
We'll also need a test.
_netCD4.c has to be manually generated. It should be created with all the values in include/constants.pyx set to 0. It's only used if cython is not installed.
Probably we should just require cython and get rid of _netCDF.c in the repo.
Cython is super-easy to install these days - there are wheels for many versions for OSX, Windows, Linux.
working on updates, btw mind if I PEP'ify some of the files? get a lot of warnings in PyCharm
ok, made nc_open_mem lookup dynamic and added a test.
btw, I'm not even sure how the .c file here helps because isn't it specific to a python version / platform / cython version? Not sure how that ever worked correctly. The one in this PR is definitely wrong so need to figure out how to create a "correct" one.
btw let me know how I should fix this new unittest error. I'm guessing it's compiling with all the settings off. Further it won't work until there's a version of netcdf-c with the fix
Probably we should just require cython and get rid of _netCDF.c in the repo.
The standard way to do this is to generate C files for upload to pypi, but not include them in the source.
cytoolz has a relatively sane version of this setup: https://github.com/pytoolz/cytoolz/blob/master/setup.py
@@ -1712,7 +1721,7 @@ references to the parent Dataset or Group. |
---|
the parent Dataset or Group.""" |
def __init__(self, filename, mode='r', clobber=True, format='NETCDF4', |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make filename
optional if memory
is provided?
Even if netCDF-C needs a bogus filename
argument, I imagine we could generate something at random.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets reflected in the Dataset.filepath
attribute so I think we should keep it. My unittest actually validates this
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I guess so. It just seems pretty pointless to me :)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, weirdness :)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some text should be added to the docstring noting that filename
is only used to set the value of filepath
when memory
is specified.
Looks like the _netCDF.c
still uses nc_open_mem
, so that will cause a failing test. To generate one that will work with the minimum version of the netcdf library, edit constants.pyx
and set all the elements to zero, then cd _netCDF; cython -I../include _netCDF4.pyx
, then commit the new version of _netCDF.c
.
I'd like to remove _netCDF.c from the repo, but that should be a separate pull request.
Are the other failing tests due to the bug in Unidata/netcdf-c#394?
updated, so I think the tests are failing now because it's running the mem test with the feature disabled?
The test fails on my machine even though I have 4.4.1 (with nc_open_mem). I guess this is because of Unidata/netcdf-c#394? If so, a solution would be to disable the test in run_all.py if __netcdf4libversion__
< 4.4.1.2 (assuming that the fix gets in the next release of netcdf-c).
""" |
---|
cdef int grpid, ierr, numgrps, numdims, numvars |
cdef char *path |
cdef char namstring[NC_MAX_NAME+1] |
memset(&self._buffer, 0, sizeof(self._buffer)) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably ought to add _buffer
to _private_atts
so it doesn't get mistaken for a netcdf attribute.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
Don't see any real benefit to using memoryviews, since we're not slicing or manipulating the buffer in any way.
# so it will compile with versions <= 4.2. |
---|
NC_DISKLESS = 0x0008 |
NC_INMEMORY = 0x8000 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like NC_INMEMORY
is ever used
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
class TestOpenMem(unittest.TestCase): |
---|
def test_mem_open(self): |
# Needs: https://github.com/Unidata/netcdf-c/pull/400 |
if netCDF4.__netcdf4libversion__ >= '4.4.1.2': |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just add this check in run_all.py so the test is skipped altogether?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually updated the test to test different scenarios of this
Merging now - thanks again for this contribution!
thank you for the quick responses! pleasure!
Is there a working example/documentation based on this discussion for pulling a dataset via a url into memory using the newest netCDF4 version? I've read material on adding a memory=bytes construct in the init function to use the in-memory read capability from nc_open_mem(). However, my initial attempts have produced file location errors - "No such file or directory". I also ran a portion of the test script provided at https://github.com/Unidata/netcdf4-python/blob/master/test/tst_open_mem.py. An error was raised for netCDF4.netcdf4libversion = '4.4.1.1' (< 4.4.1.2). Do I need to pull from Unidata/netcdf-c#400 and integrate netcdf library changes directly into the package? Clarification is appreciated!
I am testing on a Jupyter notebook using Python 3.5 and netCDF4 version 1.3.0, but could switch if needed.
Another option is to pass it a filename to an empty file, like from tempfile.NamedTemporaryFile
Great, substituting a completely unrelated local netCDF file as the "filename" but also passing a memory file in byte format proved to be successful. The returned file dimensions were consistent with the memory file rather than the unrelated local netCDF file with different dimensions/variables. So that checks out.
I also attempted unsuccessfully to use the temp file option using FILE_NAME = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name, mainly because it is much cleaner in my view. However, using a temp file as prescribed returned an OSError: NetCDF: Unknown file format. Should I format the temp file in a specific way? Thank you.
I think "filename" just needs to be non-empty string
If I just input a text string the code returns an OSError: No such file or directory, e.g. using nc_file = netCDF4.Dataset('temp_file', mode = 'r', memory=bytefile)
what version of netcdf-c ?
@TWellman I needed it to be a netCDF file (apparently--I guess I forgot about it). This is what I've done for now until netCDF 4.5 is released:
with NamedTemporaryFile(suffix='.nc') as temp:
# Create a temporary netCDF file to work around bug in netCDF C 4.4.1.1
# We shouldn't actually need any file on disk.
nc_temp = Dataset(temp.name, 'w').close()
nc = Dataset(temp.name, memory=mem_buffer)
I should probably upgrade as suggested earlier. I am using netCDF4.netcdf4libversion = '4.4.1.1', netCDF4 version 1.3.0.