Issue 5202: wave.py cannot write wave files into a shell pipeline

Created on 2009-02-10 11:45 by drj, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
wave-20090210.patch drj,2009-02-10 11:51 wave.py patch
writeheader_without_tell.diff gpolo,2009-02-10 16:56 review
wave_write_unseekable.patch serhiy.storchaka,2013-09-06 21:40 review
wave_write_unseekable_2.patch serhiy.storchaka,2013-11-11 18:16 review
Messages (20)
msg81539 - (view) Author: David Jones (drj) * Date: 2009-02-10 11:44
When using the wave module to output wave files, the output file cannot be a Unix pipeline.

Example. The following program outputs a (trivial) wave file on stdout:

    #!/usr/bin/env python
    import sys
    import wave
    w = wave.open(sys.stdout, 'w')
    w.setnchannels(1)
    w.setsampwidth(1)
    w.setframerate(32000)
    w.setnframes(0)
    w.close()

It can create a wave file like this:

    $ ./bugex > foo.wav

When used in a pipeline we get:

    $ ./bugex | wc
    Traceback (most recent call last):
      File "./bugex", line 9, in <module>
        w.close()
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/wave.py", line 437, in close
        self._ensure_header_written(0)
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/wave.py", line 458, in _ensure_header_written
        self._write_header(datasize)
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/wave.py", line 465, in _write_header
        self._form_length_pos = self._file.tell()
    IOError: [Errno 29] Illegal seek
    Exception exceptions.IOError: (29, 'Illegal seek') in <bound method Wave_write.__del__ of <wave.Wave_write instance at 0x71418>> ignored
           0       1       8

The wave module has almost all it needs to work around this problem. The wave module will only seek the output if it needs to patch the header. If you use setnframes to write the correct number of frames before writing them with writeframesraw then the header will not be patched upon calling close. However...

The problem is that the "tell" method is invoked on the output stream (to record where the header is, in the event that we need to patch it); the "tell" method fails with an exception when the output is a pipeline (see example, above).

Exceptions from "tell" when writing the header initially (in _write_header) should be ignored. If _patchheader is later invoked it will fail due to lack of pos.
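The proposal above (ignore a failing "tell" while writing the header, and fail only if the header later needs patching) can be sketched roughly as follows. This is an illustration written for Python 3, not the attached patch and not the code in wave.py; `PipeLike` and `write_wav` are hypothetical names, and the writer is limited to 8-bit mono PCM to keep it short.

```python
import io
import struct


class PipeLike(io.BytesIO):
    """Hypothetical stand-in for a pipe: write() works, tell()/seek() fail."""

    def tell(self):
        raise io.UnsupportedOperation("tell on unseekable stream")

    def seek(self, *args):
        raise io.UnsupportedOperation("seek on unseekable stream")


def write_wav(stream, frames, declared_nframes, framerate=32000):
    """Write 8-bit mono PCM; patch the header only if the declared count was wrong."""

    def header(datalen):
        # Canonical 44-byte PCM header: RIFF chunk, fmt chunk, data chunk tag.
        return struct.pack("<4sL4s4sLHHLLHH4sL",
                           b"RIFF", 36 + datalen, b"WAVE", b"fmt ", 16,
                           1, 1, framerate, framerate, 1, 8,
                           b"data", datalen)

    try:
        start = stream.tell()      # only needed if we have to patch later
    except (AttributeError, OSError):
        start = None               # remember that patching is impossible
    stream.write(header(declared_nframes))
    stream.write(frames)

    if len(frames) != declared_nframes:
        if start is None:
            raise OSError("header needs patching but stream is unseekable")
        end = stream.tell()
        stream.seek(start)
        stream.write(header(len(frames)))
        stream.seek(end)


# Correct frame count declared up front: no seek is needed, so a pipe works.
out = PipeLike()
write_wav(out, b"\x80" * 10, declared_nframes=10)
```

With a wrong declared count, the same call on `PipeLike` raises `OSError`, while a seekable `io.BytesIO` gets its header patched.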
msg81541 - (view) Author: David Jones (drj) * Date: 2009-02-10 11:51
Attached is a patch which is a diff from this version of wave.py : http://svn.python.org/view/*checkout*/python/trunk/Lib/wave.py?rev=54394
msg81542 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-02-10 12:28
Wouldn't it be better if you only ignored the 'illegal seek' error instead of ignoring any IOError (should it even be always discarded)? I get a 'bad file descriptor' under Windows 7, but, again, can it be always discarded?

You can also reproduce the problem without using wave:

    >>> import sys
    >>> sys.stdout.tell()

I'm really unsure about the proposed patch.
msg81546 - (view) Author: David Jones (drj) * Date: 2009-02-10 13:02
On 10 Feb 2009, at 12:28, Guilherme Polo wrote:
>
> Guilherme Polo <ggpolo@gmail.com> added the comment:
>
> Wouldn't it be better if you only ignored the 'illegal seek' error
> instead of ignoring any ioerror (should it even be always discarded) ?

No.

> I get a 'bad file descriptor' under Windows 7, but, again, can it be
> always discarded ?

Yes.

To expand: Observe that the exception is raised when we are writing the header for the first time. The exception is not raised when we attempt to seek to patch the header; it is raised when we are recording the file position so that we can seek to it later. We record the file position even though we might not use it later (the file position is only needed if we need to patch the header). So if we don't need to patch the header, we do not need the file position. So we can clearly ignore any error in attempting to get the file position.

If we do need to patch the header, then we need the file position. If we do not have the file position (because the earlier attempt to get it failed), then patching the header will fail when it attempts a seek. This seems reasonable to me.

> You can also reproduce the problem without using wave:
>
> >>> import sys
> >>> sys.stdout.tell()

That does not reproduce the problem. The problem is not that tell raises an exception; the problem is that tell raises an exception while it is only being used to get some information that may not be needed later. Therefore the exception should be ignored, and an error should only be raised if it turns out that we did need the information that we couldn't get.

> I'm really unsure about the proposed patch.

Noted.

I also note that my patch can be improved by removing its last 11 lines.
msg81547 - (view) Author: David Jones (drj) * Date: 2009-02-10 13:09
On 10 Feb 2009, at 12:28, Guilherme Polo wrote:
>
> Guilherme Polo <ggpolo@gmail.com> added the comment:
>
> I'm really unsure about the proposed patch.

Perhaps my example was too trivial. The point is that if you call setnframes then you can get wave.py to avoid patching the header; so it does not need to seek on the output file. However, that _still_ doesn't let you pipe the output, because of the "tell" problem. That's what the patch is for.

Here is a (slightly) less trivial example:

    #!/usr/bin/env python
    import sys
    import wave
    w = wave.open(sys.stdout, 'w')
    w.setnchannels(1)
    w.setsampwidth(1)
    w.setframerate(2000)
    w.setnframes(100)
    for _ in range(50):
        w.writeframesraw('\x00\xff')
    w.close()

(The wave file that it outputs is 100ms of 1000 Hz sine wave by the way)

Note the call to setnframes _before_ the data is written. That's what means the header does not need to be patched. With my patch applied the output of this program can be fed to a pipe. If you remove the call to setnframes then the header will need to be patched, and this still (correctly, usefully) raises an error with my patch applied.
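For readers on a current interpreter, the same experiment can be tried in Python 3 against a stream that refuses tell() and seek(). This is a sketch that assumes a Python version in which the wave module accepts unseekable output (taken here to be 3.4 or later); `PipeLike` and `write_tone` are hypothetical names used only for this illustration.

```python
import io
import struct
import wave


class PipeLike(io.BytesIO):
    """Hypothetical stand-in for a pipe: write() works, tell()/seek() fail."""

    def tell(self):
        raise io.UnsupportedOperation("tell on unseekable stream")

    def seek(self, *args):
        raise io.UnsupportedOperation("seek on unseekable stream")


FRAMES = b"\x00\xff" * 50  # same 100 one-byte frames as the example above


def write_tone(stream):
    """Write FRAMES as 8-bit mono, declaring the frame count in advance."""
    w = wave.open(stream, "wb")
    w.setnchannels(1)
    w.setsampwidth(1)
    w.setframerate(2000)
    w.setnframes(len(FRAMES))   # declared up front, so no header patching
    w.writeframesraw(FRAMES)
    w.close()


out = PipeLike()
write_tone(out)
data = out.getvalue()
# The data-chunk length field in the 44-byte header matches what was written.
print(len(data), struct.unpack("<L", data[40:44])[0])
```

Dropping the setnframes call would make the header wrong at close time, which is the case that still has to fail on an unseekable stream.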
msg81553 - (view) Author: David Jones (drj) * Date: 2009-02-10 14:36
On 10 Feb 2009, at 13:02, David Jones wrote:
>
> I also note that my patch can be improved by removing its last 11
> lines.

Er, no it can't. What was I thinking?
msg81563 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-02-10 16:56
I see what you want to do, but I feel really uncomfortable about totally ignoring IOError. I could get a bad file descriptor under Linux too, and I wouldn't like to see it discarded for no reason.

Now, is there some problem if we remove the calls to the "tell" method in _write_header? See patch attached (tests are very welcome too).
msg81601 - (view) Author: David Jones (drj) * Date: 2009-02-10 21:15
On 10 Feb 2009, at 16:57, Guilherme Polo wrote:
>
> Guilherme Polo <ggpolo@gmail.com> added the comment:
>
> Now, is there some problem if we remove the calls to the "tell" method
> in _write_header ? See patch attached (tests are very welcome too).

Yes
msg81665 - (view) Author: David Jones (drj) * Date: 2009-02-11 20:34
On 10 Feb 2009, at 21:15, David Jones wrote:
>
> David Jones <drj@pobox.com> added the comment:
>
> On 10 Feb 2009, at 16:57, Guilherme Polo wrote:
>
>> Guilherme Polo <ggpolo@gmail.com> added the comment:
>>
>> Now, is there some problem if we remove the calls to the "tell"
>> method in _write_header ? See patch attached (tests are very
>> welcome too).
>
> Yes

Ahem. Pardon me for answering you without reading your patch.

I have now read your patch and it does more than just remove the calls to "tell". In fact it looks very fine. It makes wave.py more like sunau.py in that it "just knows" what the offsets into the header are. I think I like that (especially the way you use the struct format string to compute the second offset). It also removes that nagging question at the back of my mind: "why does wave.py use tell when it could simply just know the offsets, which are constant anyway?".

And it works. How cool is that? I had changed my project to use sunau anyway, because that worked with pipes already.

Tests, you say...
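The idea of "just knowing" the offsets can be illustrated with struct.calcsize. The format string below describes the usual 44-byte PCM WAVE header and is an assumption made for this sketch; it is not copied from the attached patch, and the constant names are hypothetical.

```python
import struct

# Assumed layout of a canonical PCM WAVE header (little-endian, no padding):
# "RIFF", riff length, "WAVE", "fmt ", fmt length, format tag, channels,
# frame rate, byte rate, block align, bits per sample, "data", data length.
HEADER_FORMAT = "<4sL4s4sLHHLLHH4sL"

# The RIFF length field sits right after the 4-byte "RIFF" tag.
FORM_LENGTH_OFFSET = struct.calcsize("<4s")

# The data length field is the last item, so its offset is the size of
# everything that precedes it in the format string.
DATA_LENGTH_OFFSET = struct.calcsize(HEADER_FORMAT[:-1])

HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

print(FORM_LENGTH_OFFSET, DATA_LENGTH_OFFSET, HEADER_SIZE)  # 4 40 44
```

Because these values are constants of the format, they can be computed once instead of being discovered with tell() while the header is written.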
msg81676 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-02-11 21:36
Nice. I said tests in the hope that wave gets more tests, since right now there is only a single one. I will see if I can produce something.
msg81735 - (view) Author: David Jones (drj) * Date: 2009-02-12 09:00
The following program does a very basic do-i-get-back-what-i-wrote test. sunau can't cope; I am investigating.

    #!/usr/bin/env python
    # IdIdId
    # Audio File Tests

    import aifc
    import sunau
    import wave

    import struct
    import sys
    from StringIO import StringIO

    frames = struct.pack('256B', *range(256))
    log = sys.stderr

    # Basic test of reproducibility.
    # We test that a set of frames (an entirely artificial set, see `frames`,
    # above) can be written to an audio file and read back again to get the
    # same set of frames.
    # We test mono/stereo, 8-bit/16-bit, and a few framerates.
    # As of 2009-02-12 sunau does not pass these tests, so I recommend that
    # you remove it.
    for af in (aifc, sunau, wave):
        for nchannels in (1, 2):
            for sampwidth in (1, 2):
                for framerate in (11000, 44100, 96000):
                    print >> log, "%s %d/%d/%d" % (af.__name__,
                        nchannels, sampwidth, framerate)
                    f = StringIO()
                    w = af.open(f, 'w')
                    w.setnchannels(nchannels)
                    w.setsampwidth(sampwidth)
                    w.setframerate(framerate)
                    w.writeframesraw(frames)
                    w.close()
                    s = f.getvalue()
                    f = StringIO(s)
                    w = af.open(f)
                    assert w.getnchannels() == nchannels
                    assert w.getsampwidth() == sampwidth
                    assert w.getframerate() == framerate
                    assert w.readframes(len(frames)//nchannels//sampwidth) == frames
                    assert w.readframes(1) == ''
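A Python 3 rendering of the same round-trip idea, restricted to the wave module and using io.BytesIO in place of StringIO, might look like the sketch below. It is not part of the original test program; `round_trip` is a hypothetical helper name.

```python
import io
import itertools
import wave

FRAMES = bytes(range(256))  # same artificial frame data as the test above


def round_trip(nchannels, sampwidth, framerate):
    """Write FRAMES to an in-memory WAVE file and read everything back."""
    buf = io.BytesIO()
    w = wave.open(buf, "wb")
    w.setnchannels(nchannels)
    w.setsampwidth(sampwidth)
    w.setframerate(framerate)
    w.writeframesraw(FRAMES)
    w.close()

    r = wave.open(io.BytesIO(buf.getvalue()), "rb")
    params = (r.getnchannels(), r.getsampwidth(), r.getframerate())
    got = r.readframes(len(FRAMES) // nchannels // sampwidth)
    rest = r.readframes(1)      # nothing should be left after the frames
    r.close()
    return params, got, rest


# Mono/stereo, 8-bit/16-bit, and a few frame rates, as in the original.
for combo in itertools.product((1, 2), (1, 2), (11000, 44100, 96000)):
    params, got, rest = round_trip(*combo)
    assert params == combo
    assert got == FRAMES
    assert rest == b""
```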
msg81903 - (view) Author: David Jones (drj) * Date: 2009-02-13 10:45
On 12 Feb 2009, at 09:00, David Jones wrote:
>
> David Jones <drj@pobox.com> added the comment:
>
> The following program does a very basic do-i-get-back-what-i-wrote
> test.
> sunau can't cope; I am investigating.

I see. sunau uses mu-law compression by default which makes it non-invertible. How stupid. Inserting:

    w.setcomptype('NONE', 'Pointless Argument')

just after setframerate fixes the tests so that all 3 modules pass.

Of course, this is still only the most very basic test that one might want to do. And it doesn't cover the case mentioned in this bug report anyway.

(drat, just found this, should've sent it yesterday)
msg112649 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-03 18:39
Is this still a problem with 2.7-3.2? GP, what state do you think either patch is in?
msg182113 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-14 17:58
> Now, is there some problem if we remove the calls to the "tell" method
> in _write_header ? See patch attached (tests are very welcome too).

Yes, there is a problem. A user can pass an already open file to wave.open(), and the file position may not be 0 at the start of the WAVE file. But you can make do with only one tell().

Note the magic number 36 in many places in the code. This is struct.calcsize(wave_header_format).

A test is needed as well as a documentation change. I think this is rather a new feature and should be added only in 3.4. Actually the current behavior is documented: "If *file* is a string, open the file by that name, otherwise treat it as a seekable file-like object."
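The objection can be made concrete with a header that does not start at position 0. The sketch below records the starting position with a single tell() and derives every other offset from it; the format string, `patch_lengths`, and the `PREFIX` bytes are assumptions for illustration and do not come from any of the attached patches.

```python
import io
import struct

# Assumed canonical 44-byte PCM WAVE header layout (little-endian).
HEADER_FORMAT = "<4sL4s4sLHHLLHH4sL"
DATA_LENGTH_OFFSET = struct.calcsize(HEADER_FORMAT[:-1])  # 40


def patch_lengths(stream, start, datalen):
    """Rewrite both length fields of a header that begins at `start`."""
    end = stream.tell()
    stream.seek(start + 4)                      # RIFF chunk length
    stream.write(struct.pack("<L", 36 + datalen))
    stream.seek(start + DATA_LENGTH_OFFSET)     # data chunk length
    stream.write(struct.pack("<L", datalen))
    stream.seek(end)                            # restore the write position


buf = io.BytesIO()
buf.write(b"PREFIX")    # the caller already wrote something to this file
start = buf.tell()      # the single tell(): where the WAVE header begins

# Header written with a data length of 0, as if nframes were unknown.
buf.write(struct.pack(HEADER_FORMAT, b"RIFF", 36, b"WAVE", b"fmt ", 16,
                      1, 1, 8000, 8000, 1, 8, b"data", 0))
buf.write(b"\x80" * 5)  # five 8-bit mono frames
patch_lengths(buf, start, 5)
```

Absolute offsets 4 and 40 would have overwritten the prefix and part of the header here; offsets relative to `start` land on the intended fields.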
msg196848 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-03 14:47
Here is corrected patch (it uses relative seek()) with a lot of tests.
msg197110 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-06 21:39
Oh, I forgot to attach a patch. In any case it is already slightly outdated. After looking at other audio modules I think David's approach is better. It is also used in the chunk module. Here is an updated patch with tests (tests are not final, provides better tests).
msg197111 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-06 21:40
Oh, I forgot to attach a patch. In any case it is already slightly outdated. After looking at other audio modules I think David's approach is better. It is also used in the chunk module. Here is an updated patch with tests (tests are not final, provides better tests).
msg202636 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-11 18:16
Here is a simplified patch, updated to tip.
msg203025 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-16 11:04
New changeset 6a599249e8b7 by Serhiy Storchaka in branch 'default': Issue #5202: Added support for unseekable files in the wave module. http://hg.python.org/cpython/rev/6a599249e8b7
msg212938 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-03-08 17:54
New changeset b861c7717c79 by R David Murray in branch 'default': whatsnew: Wave_write handles unseekable files. (#5202) http://hg.python.org/cpython/rev/b861c7717c79
History
Date User Action Args
2022-04-11 14:56:45 admin set github: 49452
2014-03-08 17:54:16 python-dev set messages: +
2013-11-16 11:57:36 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2013-11-16 11:04:27 python-dev set nosy: + python-dev; messages: +
2013-11-11 18:16:08 serhiy.storchaka set files: + wave_write_unseekable_2.patch; messages: +
2013-09-06 21:40:45 serhiy.storchaka set assignee: serhiy.storchaka; stage: needs patch -> patch review
2013-09-06 21:40:09 serhiy.storchaka set files: + wave_write_unseekable.patch; messages: +
2013-09-06 21:39:23 serhiy.storchaka set dependencies: + Unify audio modules tests; messages: +
2013-09-03 14:47:34 serhiy.storchaka set messages: +
2013-02-14 17:58:51 serhiy.storchaka set versions: + Python 3.4, - Python 3.1, Python 2.7, Python 3.2; nosy: + serhiy.storchaka; messages: +; type: behavior -> enhancement; stage: patch review -> needs patch
2010-08-03 18:39:04 terry.reedy set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 2.5; nosy: + terry.reedy; messages: +; stage: patch review
2009-02-13 10:45:52 drj set messages: +
2009-02-12 09:00:10 drj set messages: +
2009-02-11 21:36:52 gpolo set messages: +
2009-02-11 20:34:15 drj set messages: +
2009-02-10 21:15:50 drj set messages: +
2009-02-10 16:57:00 gpolo set files: + writeheader_without_tell.diff; messages: +
2009-02-10 14:36:30 drj set messages: +
2009-02-10 13:09:29 drj set messages: +
2009-02-10 13:02:16 drj set messages: +
2009-02-10 12:28:41 gpolo set nosy: + gpolo; messages: +
2009-02-10 11:51:51 drj set files: + wave-20090210.patch; keywords: + patch; messages: +
2009-02-10 11:45:00 drj create