Issue 4783: document that json.load/dump can’t be used twice on the same stream

Created on 2008-12-30 17:16 by beazley, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: issue4783.diff
Uploaded: ezio.melotti, 2011-04-13 08:24
Description: Patch to add a note against 2.7
Messages (10)
msg78547 - Author: David M. Beazley (beazley) Date: 2008-12-30 17:16
The json module is described as having an interface similar to pickle:

    json.dump()
    json.dumps()
    json.load()
    json.loads()

I think it would be a WISE idea to add a huge warning message to the documentation that these functions should *NOT* be used to serialize or unserialize multiple objects on the same file stream like pickle. For example:

    f = open("stuff","w")
    json.dump(obj1, f)
    json.dump(obj2, f)     # NO! FLAMING DEATH!

    f = open("stuff","r")
    obj1 = json.load(f)
    obj2 = json.load(f)    # NO! EXTRA CRISPY FLAMING DEATH!

For one, it doesn't work. load() actually reads the whole file into a big string and tries to parse it as a single object. If there are multiple objects in the file, you get a nasty exception.

Second, I'm not even sure this is technically allowed by the JSON spec. As far as I can tell, concatenating JSON objects together in the same file falls into the same category as concatenating two HTML documents together in the same file (something you just don't do).

Related: json.load() should probably not be used on any streaming input source such as a file wrapped around a socket. The first thing it does is consume the entire input by calling f.read(), which is probably not what someone is expecting (and it might even cause the whole program to hang).
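The failure described in this message can be reproduced with a short sketch. It uses an in-memory `io.StringIO` in place of the "stuff" file, which is an assumption made so the example is self-contained, and the helper name `dump_twice_then_load` is hypothetical. The sketch catches `ValueError`; on Python 3.5 and later the error raised is `json.JSONDecodeError`, which is a subclass of `ValueError`.

```python
import io
import json


def dump_twice_then_load():
    """Write two JSON documents back to back, then try to load them.

    Returns the raw text that was written and the exception raised by
    json.load() (or None if, unexpectedly, no exception was raised).
    """
    buf = io.StringIO()  # stand-in for a real file (assumption)
    json.dump({"a": 1}, buf)
    json.dump({"b": 2}, buf)  # nothing separates this from the first document

    buf.seek(0)
    try:
        # json.load() reads the whole stream and parses it as ONE document.
        json.load(buf)
    except ValueError as exc:
        return buf.getvalue(), exc
    return buf.getvalue(), None


text, error = dump_twice_then_load()
print(repr(text))
print(type(error).__name__, error)
```

The two documents end up concatenated with no delimiter, so the parser finds extra data after the first object and refuses the input.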
msg78555 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2008-12-30 18:42
You're the first person to ever raise any of these issues in the slightly more than 3 years that the package has been around (by other names), so I'm not sure such a warning needs to be that big. JSON doesn't really have any framing, so serializing more than one document to or from the same place doesn't work so well. It's not even talked about in the spec, and I've never seen someone try it before.
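Since JSON itself has no framing, any framing has to be supplied by the caller. One common convention, sketched here as an illustration only, is to write one document per line and split on newlines when reading. This is not an API of the json module; the helper names `dump_lines` and `load_lines` are hypothetical. It relies on `json.dumps` producing no literal newlines when `indent` is left at its default of `None`.

```python
import io
import json


def dump_lines(objs, fp):
    """Write each object as one JSON document on its own line."""
    for obj in objs:
        # With the default indent=None, json.dumps() emits a single line;
        # newlines inside strings are escaped as \n.
        fp.write(json.dumps(obj) + "\n")


def load_lines(fp):
    """Yield one decoded object per non-empty line."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


buf = io.StringIO()  # stand-in for a real file (assumption)
dump_lines([{"a": 1}, [1, 2], "x\ny"], buf)
buf.seek(0)
decoded = list(load_lines(buf))
```

Because the reader consumes one line at a time, this approach also avoids reading the entire stream before decoding the first object.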
msg78556 - Author: David M. Beazley (beazley) Date: 2008-12-30 19:02
Just consider me to be an impartial outside reviewer. Hypothetically, let's say I'm a Python programmer who knows a thing or two about standard library modules (like pickle), but I'm new to JSON so I come looking at the json module documentation. The documentation tells me it uses the same interface as pickle and marshal (even naming those two modules right off the bat). So, right away, I'm thinking the module probably does all of the usual things that pickle and marshal can do. For instance, serializing multiple objects to the same stream. However, it doesn't work this way and the only way to find out that it doesn't work is to either try it and get an error, or to read the source code and figure it out. I'm not reporting this as an end-user of the json module, but as a Python book author who is trying to get things right and to be precise. I think if you're going to keep the pickle and marshal reference I would add the warning message. Otherwise, I wouldn't mention pickle or marshal at all.
msg78557 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2008-12-30 20:30
Ok, I've added some notes to the trunk of simplejson's documentation. I'm not sure when or if that'll hit the Python trunks; I've been having a hard time getting my other patches to sync up with simplejson through http://bugs.python.org/issue4136
msg78559 - Author: David M. Beazley (beazley) Date: 2008-12-30 20:49
Thanks! Hopefully I'm not giving you too much work to do :-). Cheers, Dave
msg114850 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-24 23:22
Bob, what is the status of this bug?
msg133650 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-13 08:24
Attached patch adds a note about the effects of using dump several times on the same file.
msg133688 - Author: Bryce Verdier (louiscipher) Date: 2011-04-13 20:25
Not to nitpick, but what about the wording used in the simplejson documentation that Bob wrote almost 3 years ago?

    Note: JSON is not a framed protocol so unlike pickle or marshal it does not make sense to serialize more than one JSON document without some container protocol to delimit them.

I also feel that it sounds a little bit cleaner.

http://simplejson.github.com/simplejson/#simplejson.dump
msg133705 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-14 03:01
I saw that and found it unclear, which is why I rephrased it. To understand it, one has to know what a "framed protocol" is, what can be considered a "JSON document" (is a single object a JSON document? or does it need to be serialized first?), and what a "container protocol" is (can I use one? where can I find it? is there a default one for JSON?). I think it's clearer to just say that you can't do json.dump(obj1, f); json.dump(obj2, f). I also omitted the note on `load`, because if you can't add more objects to the same file using json.dump you won't even try to use json.load to extract them one by one.
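One simple reading of "container protocol", offered here as an illustration rather than as what the documentation prescribes, is to wrap the objects in a single JSON array so that the stream holds exactly one document. The helper names `dump_all` and `load_all` are hypothetical, and `io.StringIO` stands in for a real file.

```python
import io
import json


def dump_all(objs, fp):
    """Write all objects as ONE JSON array, i.e. a single document."""
    json.dump(list(objs), fp)


def load_all(fp):
    """Read the single array back and return its elements as a list."""
    return json.load(fp)


obj1 = {"name": "first"}
obj2 = {"name": "second"}

buf = io.StringIO()  # stand-in for a real file (assumption)
dump_all([obj1, obj2], buf)  # one json.dump() call, one document
buf.seek(0)
restored1, restored2 = load_all(buf)  # one json.load() call
```

The trade-off is that all objects must be available when writing and are all decoded at once when reading, since the whole array is one document.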
msg133784 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-15 04:40
New changeset 8dbf072556b9 by Ezio Melotti in branch '2.7':
#4783: document that is not possible to use json.dump twice on the same stream.
http://hg.python.org/cpython/rev/8dbf072556b9

New changeset 2ec08aa2c566 by Ezio Melotti in branch '3.1':
#4783: document that is not possible to use json.dump twice on the same stream.
http://hg.python.org/cpython/rev/2ec08aa2c566

New changeset 1e315794ac8c by Ezio Melotti in branch '3.2':
#4783: Merge with 3.1.
http://hg.python.org/cpython/rev/1e315794ac8c

New changeset 91881e304e13 by Ezio Melotti in branch 'default':
#4783: Merge with 3.2.
http://hg.python.org/cpython/rev/91881e304e13
History
Date User Action Args
2022-04-11 14:56:43 admin set github: 49033
2011-04-15 04:41:56 ezio.melotti set status: open -> closed; assignee: bob.ippolito -> ezio.melotti; resolution: fixed; stage: patch review -> resolved
2011-04-15 04:40:55 python-dev set nosy: + python-dev; messages: +
2011-04-14 03:01:34 ezio.melotti set messages: +
2011-04-13 20:25:42 louiscipher set nosy: + louiscipher; messages: +
2011-04-13 08:24:17 ezio.melotti set files: + issue4783.diff; nosy: + ezio.melotti; messages: +; keywords: + patch, easy, needs review; stage: patch review
2010-08-24 23:22:06 eric.araujo set nosy: + docs@python, eric.araujo, - georg.brandl; title: json documentation needs a BAWM (Big A** Warning Message) -> document that json.load/dump can’t be used twice on the same stream; messages: +; versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 3.0
2008-12-30 20:49:56 beazley set messages: +
2008-12-30 20:30:36 bob.ippolito set messages: +
2008-12-30 19:02:16 beazley set messages: +
2008-12-30 18:42:33 bob.ippolito set priority: low; messages: +
2008-12-30 17:23:10 benjamin.peterson set assignee: georg.brandl -> bob.ippolito; nosy: + bob.ippolito
2008-12-30 17:16:49 beazley create