Issue 4785: json.JSONDecoder() strict argument undocumented and potentially confusing

Created on 2008-12-30 17:42 by beazley, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)

msg78550

Author: David M. Beazley (beazley)

Date: 2008-12-30 17:42

The strict parameter to JSONDecoder() is undocumented and is confusing because someone might assume it has something to do with the encoding parameter or the general handling of parsing errors (which it doesn't).

As far as I can determine by reading the source, strict determines whether JSON strings are allowed to contain literal newlines. For example (note: loads() passes its parameters to JSONDecoder):

>>> import json
>>> s = '{"test":"Hello\nWorld"}'
>>> print(s)
{"test":"Hello
World"}
>>> json.loads(s)
Traceback (most recent call last):
  ...
  File "/tmp/lib/python3.0/json/decoder.py", line 159, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
ValueError: Invalid control character at: line 1 column 14 (char 14)
>>> json.loads(s,strict=False)
{'test': 'Hello\nWorld'}

Note that in this last example the result has the literal newline embedded in it when strict is set to False.
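The behaviour described above can be checked with a short script. This is only a sketch against a current Python 3 json module. The helper name try_loads is made up for illustration. It shows that strict affects literal control characters only, that an escaped \n is accepted either way, and that malformed JSON is still rejected with strict=False.

```python
import json


def try_loads(text, **kwargs):
    """Hypothetical helper: return the parsed value, or the exception raised."""
    try:
        return json.loads(text, **kwargs)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return exc


# The JSON text contains a *literal* newline inside the string value.
raw = '{"test":"Hello\nWorld"}'

strict_result = try_loads(raw)                  # default is strict=True
lenient_result = try_loads(raw, strict=False)   # literal newline tolerated

# An escaped newline (backslash + n in the JSON text) is always valid JSON.
escaped_result = try_loads('{"test":"Hello\\nWorld"}')

# strict=False does not relax general syntax checking.
malformed_result = try_loads('{"test": }', strict=False)

# loads() forwards strict to JSONDecoder, so this is equivalent.
decoder_result = json.JSONDecoder(strict=False).decode(raw)

print(type(strict_result).__name__, lenient_result, escaped_result)
```

Both lenient_result and escaped_result end up as the same dictionary; the difference lies only in which input text the parser is willing to accept.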

msg107067

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-04 15:10

This goes down into _json.scanstring. Looking at the C code for scanstring_unicode, the strict parameter controls whether control characters are allowed inside strings: "if strict is zero then literal control characters are allowed". From the code itself (current py3k head, r81032), it seems this means any character <= 0x1f. See scanstring_unicode in http://svn.python.org/view/python/branches/py3k/Modules/_json.c?revision=81032&view=markup for details.

Documentation should be updated accordingly.

msg107068

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-04 15:13

This goes down into _json.scanstring. Looking at the C code for scanstring_unicode, strict=False allows control characters inside strings: "if strict is zero then literal control characters are allowed". From the code itself (current py3k head, r81032), it seems this means any character <= 0x1f. See scanstring_unicode in http://svn.python.org/view/python/branches/py3k/Modules/_json.c?revision=81032&view=markup for details.

Documentation should be updated accordingly.

msg107125

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-05 09:18

Documentation patch attached against py3k branch.

Changes are:

If strict is False (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
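As a rough check of the wording in the patch, the sketch below probes every character code in the 0-31 range, plus two neighbours just outside it. The helper accepted is a hypothetical name used here for illustration, and the results reflect a current Python 3 json module rather than the r81032 sources discussed above.

```python
import json


def accepted(ch, strict):
    """Hypothetical helper: True if ch may appear literally inside a JSON string."""
    doc = '"a' + ch + 'b"'
    try:
        return json.loads(doc, strict=strict) == "a" + ch + "b"
    except ValueError:
        return False


# Character codes 0-31 (0x00-0x1f), the range named in the documentation patch.
control_chars = [chr(code) for code in range(0x20)]

rejected_when_strict = [ch for ch in control_chars if not accepted(ch, strict=True)]
accepted_when_lenient = [ch for ch in control_chars if accepted(ch, strict=False)]

# Characters just outside the range are unaffected by strict.
space_ok = accepted(" ", strict=True)      # 0x20
del_ok = accepted("\x7f", strict=True)     # 0x7f (DEL) is above 0x1f

print(len(rejected_when_strict), len(accepted_when_lenient), space_ok, del_ok)
```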

msg107126

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-05 09:19

Similar patch against trunk; same changes as for the py3k branch.

msg118806

Author: Georg Brandl (georg.brandl) * (Python committer)

Date: 2010-10-15 17:04

Thanks, applied in r85543 and r85544.

History

Date                 User          Action  Args
2022-04-11 14:56:43  admin         set     github: 49035
2010-10-15 17:04:56  georg.brandl  set     status: open -> closed
                                           resolution: fixed
                                           messages: +
2010-06-05 09:20:11  taleinat      set     versions: + Python 2.7
2010-06-05 09:19:44  taleinat      set     files: + json_docs_trunk.diff
                                           messages: +
2010-06-05 09:18:23  taleinat      set     files: + json_docs_py3k.diff
                                           keywords: + patch
                                           messages: +
                                           versions: + Python 3.2, - Python 2.6, Python 3.0
2010-06-04 15:13:47  taleinat      set     messages: +
2010-06-04 15:10:12  taleinat      set     nosy: + taleinat
                                           messages: +
2008-12-30 17:42:13  beazley       create