Issue 4785: json.JSONDecoder() strict argument undocumented and potentially confusing
Created on 2008-12-30 17:42 by beazley, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (6)
Author: David M. Beazley (beazley)
Date: 2008-12-30 17:42
The strict parameter to JSONDecoder() is undocumented and is confusing because someone might assume it has something to do with the encoding parameter or the general handling of parsing errors (which it doesn't).
As far as I can determine by reading the source, strict determines whether JSON strings are allowed to contain literal newlines. For example (note: loads() passes its parameters to JSONDecoder):
>>> s = '{"test":"Hello\nWorld"}'
>>> print(s)
{"test":"Hello
World"}
>>> json.loads(s)
Traceback (most recent call last):
  ...
  File "/tmp/lib/python3.0/json/decoder.py", line 159, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
ValueError: Invalid control character at: line 1 column 14 (char 14)
>>> json.loads(s,strict=False)
{'test': 'Hello\nWorld'}
Note in this last example how the result has the literal newline embedded in it when strict is set to False.
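The behaviour described above can be checked with a short script. This is only a sketch for a current Python 3 interpreter: the helper name try_decode is made up for illustration, and it calls JSONDecoder directly rather than going through loads(). It also checks that strict=False does not relax parsing errors in general.

```python
import json

# A JSON document whose string value contains a literal (unescaped) newline.
raw = '{"test":"Hello\nWorld"}'


def try_decode(text, **kwargs):
    """Hypothetical helper: return the decoded value, or the ValueError raised."""
    try:
        return json.JSONDecoder(**kwargs).decode(text)
    except ValueError as exc:
        return exc


strict_result = try_decode(raw)                  # default is strict=True
lenient_result = try_decode(raw, strict=False)   # literal newline tolerated

# An escaped newline (backslash followed by n) is valid JSON in either mode.
escaped_result = try_decode('{"test":"Hello\\nWorld"}')

# strict=False does not affect other parsing rules: a missing value is still an error.
malformed_result = try_decode('{"test": }', strict=False)

print(repr(strict_result))
print(lenient_result)
print(escaped_result)
print(repr(malformed_result))
```

Only the first and last calls should produce an exception; the two middle calls should decode to the same dictionary.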
Author: Tal Einat (taleinat) *
Date: 2010-06-04 15:10
This goes down into _json.scanstring. Looking at the C code for scanstring_unicode, the strict parameter controls whether control characters are allowed inside strings: "if strict is zero then literal control characters are allowed". From the code itself (current py3k head, r81032), it seems this means any character <= 0x1f. See scanstring_unicode in http://svn.python.org/view/python/branches/py3k/Modules/_json.c?revision=81032&view=markup for details.
Documentation should be updated accordingly.
Author: Tal Einat (taleinat) *
Date: 2010-06-04 15:13
This goes down into _json.scanstring. Looking at the C code for scanstring_unicode, strict=False allows control characters inside strings: "if strict is zero then literal control characters are allowed". From the code itself (current py3k head, r81032), it seems this means any character <= 0x1f. See scanstring_unicode in http://svn.python.org/view/python/branches/py3k/Modules/_json.c?revision=81032&view=markup for details.
Documentation should be updated accordingly.
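The "any character <= 0x1f" reading can be probed empirically. The sketch below assumes a current Python 3 interpreter and uses a made-up helper named accepted. It tries every code point from 0 through 0x1f in both modes, plus DEL (0x7f) as a character just outside that range.

```python
import json


def accepted(ch, strict):
    """Hypothetical helper: True if a JSON string holding raw `ch` decodes intact."""
    try:
        return json.loads('"a' + ch + 'b"', strict=strict) == "a" + ch + "b"
    except ValueError:
        return False


control_chars = [chr(code) for code in range(0x20)]  # U+0000 through U+001F

# Every character in the 0-31 range should be rejected in strict mode...
rejected_when_strict = all(not accepted(ch, True) for ch in control_chars)
# ...and accepted as a literal character when strict=False.
accepted_when_lenient = all(accepted(ch, False) for ch in control_chars)

# DEL (0x7F) is outside the 0-31 range, so strict mode does not reject it.
del_accepted_when_strict = accepted("\x7f", True)

print(rejected_when_strict, accepted_when_lenient, del_accepted_when_strict)
```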
Author: Tal Einat (taleinat) *
Date: 2010-06-05 09:18
Documentation patch attached against py3k branch.
Changes are:
- Added to documentation of JSONDecoder: If strict is False (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
- Added clarification in documentation of json.load and json.dump that unless the cls kwarg is specified, the JSONEncoder/JSONDecoder class will be used.
- Mirrored these additions in the relevant doc-strings (JSONDecoder.__init__, json.load, json.loads, json.dump, json.dumps).
- Copied description of the object_pairs_hook kwarg from the documentation to the relevant doc-strings, which otherwise fully mirrored the documentation (json.load, json.loads, JSONDecoder.__init__).
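To illustrate the cls and object_pairs_hook keyword arguments mentioned in the list above, here is a minimal sketch for a current Python 3 interpreter. LenientDecoder is a hypothetical subclass invented for this example, not something in the json module. It shows one possible way to make strict=False the default when a decoder class is passed via cls.

```python
import json


class LenientDecoder(json.JSONDecoder):
    """Hypothetical subclass: like JSONDecoder, but strict defaults to False."""

    def __init__(self, **kwargs):
        kwargs.setdefault("strict", False)
        super().__init__(**kwargs)


raw = '{"test":"Hello\nWorld"}'

# Without cls, json.loads uses JSONDecoder; with cls, it uses the given class.
via_cls = json.loads(raw, cls=LenientDecoder)

# An explicit strict=True still wins over the subclass default.
try:
    json.loads(raw, cls=LenientDecoder, strict=True)
    explicit_strict_raised = False
except ValueError:
    explicit_strict_raised = True

# object_pairs_hook receives each JSON object as a list of (key, value) pairs.
pairs = json.loads('{"b": 1, "a": 2}', object_pairs_hook=list)

print(via_cls, explicit_strict_raised, pairs)
```

Subclassing is just one option; passing strict=False directly to json.loads has the same effect for a single call.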
Author: Tal Einat (taleinat) *
Date: 2010-06-05 09:19
Similar patch against trunk; same changes as for the py3k branch.
Author: Georg Brandl (georg.brandl) *
Date: 2010-10-15 17:04
Thanks, applied in r85543 and r85544.
History
Date                 User          Action  Args
2022-04-11 14:56:43  admin         set     github: 49035
2010-10-15 17:04:56  georg.brandl  set     status: open -> closed; resolution: fixed; messages: +
2010-06-05 09:20:11  taleinat      set     versions: + Python 2.7
2010-06-05 09:19:44  taleinat      set     files: + json_docs_trunk.diff; messages: +
2010-06-05 09:18:23  taleinat      set     files: + json_docs_py3k.diff; keywords: + patch; messages: +; versions: + Python 3.2, - Python 2.6, Python 3.0
2010-06-04 15:13:47  taleinat      set     messages: +
2010-06-04 15:10:12  taleinat      set     nosy: + taleinat; messages: +
2008-12-30 17:42:13  beazley       create