Issue 22915: sax.parser cannot get xml data from a subprocess pipe (original) (raw)

Created on 2014-11-21 23:54 by JocelynJ, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test-22915.diff JocelynJ,2014-11-22 22:52 test review
toto.py JocelynJ,2014-11-22 22:57 testcase
sax_non_str_file_name.patch serhiy.storchaka,2014-11-23 10:47 review
Messages (12)
msg231504 - (view) Author: Jocelyn (JocelynJ) * Date: 2014-11-21 23:54
With the attached code, and an xml file, I got the following error with python 3.4.2: $ cat toto.xml $ python3.4 toto.py Traceback (most recent call last): File "toto.py", line 10, in parse(proc.stdout, ContentHandler()) File "/usr/lib/python3.4/xml/sax/__init__.py", line 33, in parse parser.parse(source) File "/usr/lib/python3.4/xml/sax/expatreader.py", line 107, in parse xmlreader.IncrementalParser.parse(self, source) File "/usr/lib/python3.4/xml/sax/xmlreader.py", line 119, in parse self.prepareParser(source) File "/usr/lib/python3.4/xml/sax/expatreader.py", line 111, in prepareParser self._parser.SetBase(source.getSystemId()) TypeError: must be str, not int The same test works with python2, and I would expect the code to work with python 3 too. Thanks, Jocelyn
msg231505 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-22 00:10
My guess is that this is similar to issue 21044.
msg231508 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-22 02:29
I've looked at the sax code, and this does indeed have the same root cause: in python2 a dummy string was used for the 'name' attribute of io objects that are based on file descriptors, whereas in python3 the name is the integer value of the file descriptor. In test_sax we can see getSystemId being tested that it returns a filename (see test_expat_locator_withinfo). The fix should be analogous to the issue 21044 fix: check that 'name' is a string and if not don't use it. I'm marking this as easy; hopefully someone will want to tackle figuring out exactly where in the code it needs to be fixed and adding tests for it.
msg231537 - (view) Author: Jocelyn (JocelynJ) * Date: 2014-11-22 22:52
Here is a patch to add a test to test_sax.py. I'm not sure on the fix. Is the following one a correct one ? def prepareParser(self, source): if source.getSystemId() is not None: - self._parser.SetBase(source.getSystemId()) + self._parser.SetBase(str(source.getSystemId())) Or should I remove the call to SetBase if getSystemId() is not a string ?
msg231538 - (view) Author: Jocelyn (JocelynJ) * Date: 2014-11-22 22:57
Forgot to attach the testcase when opening the bug.
msg231545 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-23 05:46
Has anyone investigated what exactly sax uses SystemId/SetBase *for*? I think think we need that info in order to decide what to do, and I'm not familiar with sax.
msg231553 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-23 10:47
This bug should be fixed in other place. Here is a patch.
msg231568 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-23 16:27
Serhiy's patch looks correct to me. Given that if the source doesn't have a name attribute it is simply not set in the existing code, this change should be "safe" (backward compatible). Elsewhere the possibility was raised of converting the int to a string (<fdopen: N>), but that issue is a more global one and would apply at the io module level...and if implemented this fix would automatically take advantage of it. So I think this should be committed.
msg231572 - (view) Author: Jocelyn (JocelynJ) * Date: 2014-11-23 17:35
The only explicit documentation I found on SystemId is from the java specification (it is my understanding that python sax implementation is adapted from Java one): http://www.saxproject.org/apidoc/org/xml/sax/InputSource.html#setSystemId%28java.lang.String%29 The documentation says that "The system identifier is optional if there is a byte stream or a character stream". So, I agree that Serhiy's patch looks correct. Note that I'm not sure that my testcase with a subprocess is covered by Serhiy's tests, as these tests call parser() with a file object.
msg231777 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-11-27 20:17
New changeset 27ae1a476ef7 by Serhiy Storchaka in branch '3.4': Issue #22915: SAX parser now supports files opened with file descriptor or https://hg.python.org/cpython/rev/27ae1a476ef7 New changeset ce9881eecfb4 by Serhiy Storchaka in branch 'default': Issue #22915: SAX parser now supports files opened with file descriptor or https://hg.python.org/cpython/rev/ce9881eecfb4
msg231780 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-27 20:53
Jocelyn, in both cases the argument of parse() is a file object with integer name. Tests use one of simplest way to create such object.
msg231782 - (view) Author: Jocelyn (JocelynJ) * Date: 2014-11-27 21:10
Ok, I understand your tests now. Thanks for the fixes !
History
Date User Action Args
2022-04-11 14:58:10 admin set github: 67104
2014-11-27 21:10:16 JocelynJ set messages: +
2014-11-27 20:53:22 serhiy.storchaka set status: open -> closedresolution: fixedmessages: + stage: commit review -> resolved
2014-11-27 20:17:05 python-dev set nosy: + python-devmessages: +
2014-11-23 17:35:31 JocelynJ set messages: +
2014-11-23 16:27:17 r.david.murray set messages: + stage: patch review -> commit review
2014-11-23 10:47:15 serhiy.storchaka set files: + sax_non_str_file_name.patchnosy: + christian.heimes, serhiy.storchakamessages: + assignee: serhiy.storchakastage: needs patch -> patch review
2014-11-23 05:46:36 r.david.murray set messages: +
2014-11-22 22:57:39 JocelynJ set files: + toto.pymessages: +
2014-11-22 22:52:19 JocelynJ set files: + test-22915.diffkeywords: + patchmessages: +
2014-11-22 09:33:53 Avneesh.Chadha set nosy: + Avneesh.Chadha
2014-11-22 08:02:54 serhiy.storchaka set stage: needs patch
2014-11-22 02:29:52 r.david.murray set keywords: + easymessages: +
2014-11-22 00:11:13 r.david.murray set versions: + Python 3.5
2014-11-22 00:10:02 r.david.murray set nosy: + r.david.murraymessages: +
2014-11-21 23:54:11 JocelynJ create