Issue 9922: subprocess.getstatusoutput can fail with utf8 UnicodeDecodeError (original) (raw)

Created on 2010-09-22 22:06 by debatem1, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue9922-py3k.patch	ned.deily,2010-09-23 10:19
issue9922-31-rev1.patch	ned.deily,2010-12-12 08:16	31 backport (revision 1)
issue9922.34.diff	tim.golden,2013-11-03 17:23	review

Messages (12)
msg117156 - (view)	Author: geremy condra (debatem1)	Date: 2010-09-22 22:06
It looks like subprocess.getstatusoutput on 3.2a1 tries to coerce to UTF-8, which fails when dealing with bytes. This demonstrates the behavior nearly all the time for me on Ubuntu 10.04: >>> import subprocess >>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1') Here's the tracebacks from a few runs: >>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput text = pipe.read() File "/usr/local/lib/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8' codec can't decode byte 0xcb in position 3: invalid continuation byte >>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput text = pipe.read() File "/usr/local/lib/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 2: invalid continuation byte >>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput text = pipe.read() File "/usr/local/lib/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8' codec can't decode byte 0xac in position 0: invalid start byte >>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput text = pipe.read() File "/usr/local/lib/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8' codec can't decode byte 0xf1 in position 0: invalid continuation byte >>> And here's the version info: Python 3.2a1 (r32a1:83318, Aug 13 2010, 22:32:03) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information.
msg117176 - (view)	Author: Ned Deily (ned.deily) *	Date: 2010-09-23 10:19
In Python 3, subprocess.Popen returns stdout as "bytes" rather than "string" so it seems reasonable that subprocess.getstatusoutput should do the same. >>> subprocess.Popen(['dd if=/dev/random bs=1024 count=1'], shell=True, stdout=subprocess.PIPE).communicate()[0] 1+0 records in 1+0 records out 1024 bytes transferred in 0.000142 secs (7218432 bytes/sec) b'\x11\xfb\xe1w ... The problem is reproducible on 3.1 and py3k. The attached patch for py3k (with a backport to 3.1) corrects getstatusoutput to return bytes. It also includes a test case and updates the docs to show byte output.
msg123831 - (view)	Author: Ned Deily (ned.deily) *	Date: 2010-12-12 08:16
Update 31 backport patch to reflect revert of unittest method names prior to 3.1.3 release.
msg129976 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-03-03 15:48
I think that it is now too late to change getstatusoutput() output type (str->bytes). I prefer Unicode and I think that most users will have to decode bytes to Unicode anyway. So the right solution is to be able to configure encoding and errors used by TextIOWrapper: see issue #6135.
msg130006 - (view)	Author: Ned Deily (ned.deily) *	Date: 2011-03-03 22:08
I don't have a strong feeling about it but it seems to me that getstatusoutput is broken now so something should needs to be changed. If I understand your suggestion, adding a universal_newlines option to getstatusoutput similar to Popen, with a True (default) to return str False to return bytes, should provide an upwards compatible compromise. And whatever solution is adopted for Issue6135 should be able to be applied here as well. On the other hand, getstatusoutput was a carryover from the commands module in Python 2 and I'm not sure why it was not just removed in Python 3 as it seems to duplicate what can be done with Popen. Perhaps it should just be deprecated and removed?
msg150072 - (view)	Author: Ross Lagerwall (rosslagerwall)	Date: 2011-12-22 06:07
getstatusoutput() is broken given that it doesn't work on windows yet it should. I'd also recommend leaving the behavior as is and deprecating the function (and getoutput() while we're at it). A related bug (#10197) also recommends deprecating getstatusoutput.
msg150091 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-12-22 12:11
Well, getoutput and getstatusoutput are arguably ugly. However, since they are very high-level functions meant to quickly execute commands, returning str makes sense. You can't do anything smart with the output anyway, since stdout and stderr are intermingled: any binary data will be ruined by accompanying stderr output (e.g. warnings). If you want to process the output data instead of displaying it to the user, use check_call() instead. We could perhaps use the "surrogateescape" error handler in getoutput and getstatusoutput, but that's really putting lipstick on a pig.
msg150093 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2011-12-22 14:04
It is indeed unfortunate that these two functions weren't killed off in the Py3k transition instead of being migrated from the commands module to subprocess (their current implementation runs counter to the entire subprocess security design by implicitly invoking the shell). Probably the most straightforward way to make them better behaved is to move them over to being based on subprocess.Popen: def getstatusoutput(cmd): try: data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT) status = 0 except CalledProcessError as ex: data = ex.output status = ex.returncode return status, data def getoutput(cmd): return getstatusoutput(cmd)[1]
msg202035 - (view)	Author: Tim Golden (tim.golden) *	Date: 2013-11-03 16:28
This has nearly been fixed by the rewrite of the get[status]output code under . There is an issue, though, with the use there of universal_newlines mode, which forces check_output to return a string rather than bytes.
msg202039 - (view)	Author: Tim Golden (tim.golden) *	Date: 2013-11-03 17:19
The code I've just committed to means that the get[status]output functions now pass their (few) tests on all platforms. More by chance than judgement, that change employed universal_newlines which has the effect of forcing the output of check_output to string rather than bytes. Having just re-read all the comments, we have three positions: a) Do nothing: these are outdated functions and anyone who has a problem with undecodable bytes will have to use one of the other functions where they have more control. b) Use the surrogateescape encoding in every case to produce some kind of output rather than an exception. c) Tweak the code to produce bytes in every case. This is actually simple: removing universal_newlines support will do this. (I already have working code for this). I think (b) is more trouble than it's worth. So (a) Do Nothing; or (c) Produce Bytes. Going by the law of "Status Quo wins", if no-one chimes in with a strong case for switching to bytes in a few days, I propose to close this as Won't Fix.
msg202045 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-11-03 17:41
If anybody using them in Python 3.3 is already depending upon them returning strings, changing it to return bytes will break their code... so that ship as unfortunately sailed. Changing that only in 3.4 would just cause churn and make code using this more difficult to be compatible across both. Updating the documentation to state the utf-8 decoding behavior and the caveats of that is important. That documentation note should also mention what people should use instead if they want binary data instead of text.
msg202631 - (view)	Author: Tim Golden (tim.golden) *	Date: 2013-11-11 15:13
Closing this as won't fix. The code has been reimplemented and additional documentation has been added over at . Given that these are legacy functions, I don't propose to do any more here.

History
Date	User	Action	Args
2022-04-11 14:57:06	admin	set	github: 54131
2013-11-11 15:13:58	tim.golden	set	status: open -> closedresolution: wont fixmessages: +
2013-11-03 17:41:31	gregory.p.smith	set	messages: +
2013-11-03 17:23:51	tim.golden	set	files: + issue9922.34.diff
2013-11-03 17:19:48	tim.golden	set	assignee: tim.goldenversions: + Python 3.3, Python 3.4, - Python 3.1, Python 3.2
2013-11-03 17:19:21	tim.golden	set	messages: +
2013-11-03 16:28:18	tim.golden	set	nosy: + tim.goldenmessages: +
2011-12-27 19:07:00	cvrebert	set	nosy: + cvrebert
2011-12-22 14:04:44	ncoghlan	set	messages: +
2011-12-22 12:11:29	pitrou	set	nosy: + pitrou, ncoghlanmessages: +
2011-12-22 06:07:02	rosslagerwall	set	nosy: + rosslagerwallmessages: +
2011-03-03 22:08:14	ned.deily	set	nosy:gregory.p.smith, astrand, vstinner, ned.deily, debatem1messages: +
2011-03-03 15:48:49	vstinner	set	nosy: + vstinnermessages: +
2010-12-12 08:16:40	ned.deily	set	files: - issue9922-31.patch
2010-12-12 08:16:28	ned.deily	set	files: + issue9922-31-rev1.patchmessages: +
2010-09-23 11:03:25	pitrou	set	nosy: + gregory.p.smith
2010-09-23 10:20:03	ned.deily	set	files: + issue9922-31.patch
2010-09-23 10:19:42	ned.deily	set	files: + issue9922-py3k.patchtitle: subprocess.getstatusoutput and bytes -> subprocess.getstatusoutput can fail with utf8 UnicodeDecodeErrorkeywords: + patchnosy: + astrand, ned.deilyversions: + Python 3.1messages: + stage: patch review
2010-09-22 22:06:38	debatem1	create