Issue 5381: json needs object_pairs_hook

Created on 2009-02-27 08:37 by rhettinger, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
json_hook.diff rhettinger,2009-02-27 08:37 proof-of-concept patch: object_pair_hook()
json_hook.diff rhettinger,2009-03-18 03:55 pairs hook patch with tests and docs
Messages (15)
msg82825 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 08:37
If PEP372 goes through, Python is going to gain an ordered dict soon. The json module's encoder works well with it:

    >>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
    >>> json.dumps(OrderedDict(items))
    '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

But the decoder doesn't fare so well. The existing object_hook for the decoder passes in a dictionary instead of a list of pairs. So, all the ordering information is lost:

    >>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
    >>> json.loads(jtext, object_hook=OrderedDict)
    OrderedDict({u'four': 4, u'three': 3, u'five': 5, u'two': 2, u'one': 1})

A solution is to provide an alternate hook that emits a sequence of pairs. If present, that hook should run instead of object_hook. A rough proof-of-concept patch is attached.

FWIW, sample ordered dict code is at: http://code.activestate.com/recipes/576669/
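For readers on a current Python, here is a minimal sketch of the behaviour requested above, written against the object_pairs_hook parameter as it exists in the standard library json module today. The helper name show_pairs is hypothetical and only illustrates what the hook receives.

```python
import json
from collections import OrderedDict

jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'


def show_pairs(pairs):
    # Hypothetical helper: the hook is called with a list of
    # (key, value) tuples in the order they appear in the document.
    return pairs


# The raw pairs, exactly as the decoder hands them to the hook.
pairs = json.loads(jtext, object_pairs_hook=show_pairs)

# Passing OrderedDict as the hook keeps the document order.
ordered = json.loads(jtext, object_pairs_hook=OrderedDict)

# When both hooks are supplied, object_pairs_hook takes priority
# and object_hook is not called.
both = json.loads(
    jtext,
    object_hook=lambda d: "object_hook ran",
    object_pairs_hook=OrderedDict,
)

# Encoding the ordered result reproduces the original text.
roundtrip = json.dumps(ordered)
```

Because the hook sees pairs rather than a finished dict, the caller decides what container to build from them.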
msg82860 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-02-27 18:48
Why? According to RFC (emphasis mine): An object is an *unordered* collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
msg82864 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 19:59
Same reason as for config files and yaml files. Sometimes those files represent human-edited input, and if a machine re-edits, filters, or copies them, it is nice to keep the original order (though it may make no semantic difference to the computer).

For example, jsonrpc method invocations are done with objects having three properties (method, params, id). The machine doesn't care about the order of the properties, but a human reader prefers the order listed:

    --> {"method": "postMessage", "params": ["Hello all!"], "id": 99}
    <-- {"result": 1, "error": null, "id": 99}

If you're testing a program that filters json data (like a typical xml task), it is nice to write out data in the same order received (failing to do that is a common complaint about misdesigned xml filters):

    --> [{"title": "awk", "author":"aho", "isbn":"123456789X"}, {"title": "taocp", "author":"knuth", "isbn":"987654321X"}]
    <-- [{"title": "awk", "author":"aho"}, {"title": "taocp", "author":"knuth"}]

Semantically, those entries can be scrambled; however, someone reading the filtered result wants the input and output to correspond visually as much as possible. An object_pairs_hook makes this possible.
msg82865 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 20:11
FWIW, here's the intended code for the filter in the last post:

    import json
    from collections import OrderedDict

    # infile and outfile are open file objects
    books = json.load(infile, object_pairs_hook=OrderedDict)
    for book in books:
        del book['isbn']
    json.dump(books, outfile)
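A self-contained version of that filter, sketched with in-memory strings so it can be run directly. The function name strip_isbn and the sample data layout (a JSON array of book objects) are assumptions made for illustration.

```python
import json
from collections import OrderedDict

source = (
    '[{"title": "awk", "author": "aho", "isbn": "123456789X"}, '
    '{"title": "taocp", "author": "knuth", "isbn": "987654321X"}]'
)


def strip_isbn(text):
    # Hypothetical helper: decode with order preserved, drop the
    # "isbn" key from every book, and encode the result again.
    books = json.loads(text, object_pairs_hook=OrderedDict)
    for book in books:
        book.pop("isbn", None)  # tolerate books without an isbn
    return json.dumps(books)


filtered = strip_isbn(source)
```

The remaining keys come out in the order they went in, so the output lines up visually with the input.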
msg82870 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-02-27 20:48
Fair enough, but the patch isn't usable because the decoder was rewritten in a later version of simplejson. There's another issue, with a patch, to backport those changes into Python: http://bugs.python.org/issue4136. Alternatively, you could just use the simplejson source here: http://code.google.com/p/simplejson/
msg82872 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 20:57
Thanks. I'll write up a patch against http://code.google.com/p/simplejson/ and assign it back to you for review.
msg82885 - (view) Author: Armin Ronacher (aronacher) * (Python committer) Date: 2009-02-27 23:38
Motivation: Yes. JSON says it's unordered. However, hashes in Ruby have been ordered since 1.9, and they have been ordered since the very beginning in JavaScript and PHP.
msg83164 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-04 23:39
After enhancing namedtuple and ConfigParser, I found a simpler approach that doesn't involve extending the API. The simple way is to use ordered dictionaries directly. With a small tweak to OD's repr, it is fully substitutable for a dict without changing any client code or doctests (the OD loses its own eval/repr order-preserving roundtrip, but that is what json already gives now). See attached patch.
msg83165 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-04 23:46
Unfortunately this is a patch for the old json lib... the new one has a C API and an entirely different method of parsing documents (for performance reasons).
msg83166 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-05 00:15
When do you expect the new C version to go in? I'm looking forward to it.
msg83170 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-05 00:29
Whenever someone applies the patch for http://bugs.python.org/issue4136 -- I don't know when that will happen.
msg83733 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-18 03:55
Bob, would you please take a look at the attached patch?
msg83819 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-19 18:56
This patch looks good to me. My only comment is that it mixes tabs and spaces in the C code, in a file that previously had no tabs.
msg83820 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-19 19:19
Thanks for looking at this. Fixed the tab/space issue. Committed in r70471
msg84441 - (view) Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-29 22:37
I fixed two problems with this that didn't show up in the test suite: this feature didn't work in load(), and there was a problem with the pure Python code path because the Python scanner needed a small change. Unfortunately, I'm not sure how best to test the pure Python code path with Python's test suite, but I ran across it when backporting to simplejson. Fixed in r70702.
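For reference, a quick check that the hook is honoured by load() as well as loads() in a current Python. This is only a sketch: it uses io.StringIO objects as stand-ins for real files, and it does not exercise the pure Python scanner specifically.

```python
import io
import json
from collections import OrderedDict

text = '{"method": "postMessage", "params": ["Hello all!"], "id": 99}'

# io.StringIO stands in for an open file here.
infile = io.StringIO(text)
request = json.load(infile, object_pairs_hook=OrderedDict)

# Writing the decoded object back out keeps the key order.
outfile = io.StringIO()
json.dump(request, outfile)
written = outfile.getvalue()
```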
History
Date User Action Args
2022-04-11 14:56:46 admin set github: 49631
2009-03-29 22:37:55 bob.ippolito set messages: +
2009-03-22 05:51:36 cheeaun set nosy: + cheeaun
2009-03-19 19:19:28 rhettinger set status: open -> closed; resolution: accepted; messages: +
2009-03-19 18:56:55 bob.ippolito set messages: +
2009-03-18 03:55:07 rhettinger set priority: normal -> high; assignee: rhettinger -> bob.ippolito; messages: +; files: + json_hook.diff
2009-03-18 02:01:21 rhettinger set files: - json_ordered.diff
2009-03-05 04:06:04 rhettinger set title: json need object_pairs_hook -> json needs object_pairs_hook
2009-03-05 00:29:21 bob.ippolito set messages: +
2009-03-05 00:15:07 rhettinger set messages: +
2009-03-04 23:46:19 bob.ippolito set messages: +
2009-03-04 23:39:56 rhettinger set files: + json_ordered.diff; messages: +
2009-02-27 23:38:55 aronacher set nosy: + aronacher; messages: +
2009-02-27 20:57:26 rhettinger set assignee: bob.ippolito -> rhettinger; messages: +
2009-02-27 20:48:23 bob.ippolito set resolution: not a bug -> (no value); messages: +
2009-02-27 20:11:16 rhettinger set messages: +
2009-02-27 19:59:12 rhettinger set messages: +
2009-02-27 18:48:08 bob.ippolito set resolution: not a bug; messages: +
2009-02-27 08:37:54 rhettinger create