Issue 31377: remove *_INTERNED opcodes from marshal

Created on 2017-09-07 04:54 by benjamin.peterson, last changed 2022-04-11 14:58 by admin.

Messages (8)
msg301569 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 04:54
The *_INTERN opcodes inform the marshal reader to intern the encoded string after deserialization. I believe this is pointless for pycs because PyCode_New ends up interning all strings that are interesting to intern. Writing these opcodes makes pycs non-deterministic because the intern state may be inconsistent in the writer. See https://bugzilla.opensuse.org/show_bug.cgi?id=1049186
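A minimal sketch of the effect described above, assuming CPython and the default marshal version: the leading type byte written for a string depends on whether that particular string object was interned at write time, while the payload is unchanged. The helper name `type_code` and the sample string are illustrative only. The `0x7F` mask drops the reference flag bit so that only the type code is compared.

```python
import marshal
import sys


def type_code(obj):
    """Return the marshal type code of obj with the reference flag (0x80) masked off."""
    return chr(marshal.dumps(obj)[0] & 0x7F)


parts = ["spam", "_", "eggs"]
plain = "".join(parts)                 # built at runtime, so not interned
interned = sys.intern("".join(parts))  # a separate object, explicitly interned

plain_bytes = marshal.dumps(plain)
interned_bytes = marshal.dumps(interned)

# Two equal strings, two different type codes, identical remaining bytes.
print(type_code(plain), type_code(interned))
print(plain_bytes[1:] == interned_bytes[1:])
```

Both byte strings load back to the same value, so the difference only matters when the serialized bytes themselves are compared.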
msg301571 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-07 06:25
Marshal is used not only in pyc files. It is used for fast data serialization, faster than pickle, json, etc.
msg301572 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 06:41
Used but not really supported. Anyway, I doubt intern round-tripping is particularly important.
msg301576 - Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-07 08:17
w_ref() depends on refcnt already. I don't think removing the *_INTERN opcodes makes pycs reproducible.
https://github.com/python/cpython/blob/1f06a680de465be0c24a78ea3b610053955daa99/Python/marshal.c#L269-L271
I think "intern one string, then share it 10 times" is faster than "share one string 10 times, then intern each of 10 references".
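The "share it 10 times" part can be observed from Python. This is a small sketch, assuming CPython with marshal version 3 or later. A string object referenced ten times from one list is written once and then referred to by back-references, and the loaded list contains a single shared object. The variable names are illustrative.

```python
import marshal

name = "".join(["shared", "_", "identifier"])  # one runtime string object
data = marshal.dumps([name] * 10)              # the same object referenced 10 times
loaded = marshal.loads(data)

# Count how often the string payload appears in the serialized bytes.
payload_copies = data.count(name.encode("ascii"))
# Check whether every loaded element is the very same object.
all_same_object = all(item is loaded[0] for item in loaded)

print(payload_copies, all_same_object)
```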
msg301592 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 16:36
On Thu, Sep 7, 2017, at 01:17, INADA Naoki wrote:
> INADA Naoki added the comment:
> w_ref() depends on refcnt already.
> I don't think removing *_INTERN opcode makes PYC reproducible.
> https://github.com/python/cpython/blob/1f06a680de465be0c24a78ea3b610053955daa99/Python/marshal.c#L269-L271
I know, we're going to have to do something about that, too. In practice, though, the interning behavior seems to be a bigger reproducibility problem.
> I think "intern one string, then share it 10 times" is faster than
> "share one string 10 times, then intern each of 10 references".
We end up interning each reference individually currently.
msg301593 - Author: Inada Naoki (methane) * (Python committer) Date: 2017-09-07 16:46
> We end up interning each reference individually currently.
But interning an already interned string is much faster: it only checks a flag. Interning a normal string requires a dict lookup.
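The two cases can be told apart from Python with `sys.intern`. The sketch below assumes CPython and shows only the observable result of each path, not its cost. The variable names are illustrative.

```python
import sys

parts = ["marshal", "_", "memo"]
canonical = sys.intern("".join(parts))  # now the interned, canonical object
duplicate = "".join(parts)              # equal value, separate non-interned object

# Already interned: the same object comes straight back.
already_interned_result = sys.intern(canonical)
# Not interned: the equal canonical object is looked up and returned instead.
lookup_result = sys.intern(duplicate)

print(already_interned_result is canonical, lookup_result is canonical)
```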
msg301594 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-09-07 16:54
On Thu, Sep 7, 2017, at 09:46, INADA Naoki wrote:
> INADA Naoki added the comment:
> > We end up interning each reference individually currently.
> But interning interned string is much faster. It only checks flag.
> Interning normal string requires dict lookup.
We could make sure the version in the internal marshal memo is interned if appropriate.
msg321413 - Author: Inada Naoki (methane) * (Python committer) Date: 2018-07-11 07:58
I doubt that interning causes the reproducibility problem. AFAIK, all strings in a code object are interned or not interned deterministically.
https://bugzilla.opensuse.org/show_bug.cgi?id=1049186
This issue seems to be caused by w_ref() being based on the object refcnt, not by interning.
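The refcount dependence can be sketched as follows. This assumes a CPython version whose `w_ref()` skips the reference bookkeeping for objects with a reference count of 1, as in the marshal.c lines linked earlier in the thread. Two lists with equal contents then serialize to different bytes, depending only on whether the element is also referenced from elsewhere. The variable names are illustrative.

```python
import marshal

held = float("1.5")                           # referenced by the name `held`
with_extra_ref = marshal.dumps([held])        # element referenced by name and list
only_in_list = marshal.dumps([float("1.5")])  # element referenced by the list only

# The loaded values compare equal, but the serialized bytes need not.
same_value = marshal.loads(with_extra_ref) == marshal.loads(only_in_list)
same_bytes = with_extra_ref == only_in_list

print(same_value, same_bytes)
```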
History
Date User Action Args
2022-04-11 14:58:52 admin set github: 75558
2018-07-11 10:37:20 methane link issue34033 dependencies
2018-07-11 07:58:22 methane set messages: +
2017-09-07 16:54:17 benjamin.peterson set messages: +
2017-09-07 16:46:03 methane set messages: +
2017-09-07 16:36:52 benjamin.peterson set messages: +
2017-09-07 08:17:39 methane set nosy: + methane; messages: +
2017-09-07 06:41:54 benjamin.peterson set messages: +
2017-09-07 06:25:43 serhiy.storchaka set nosy: + serhiy.storchaka; messages: +
2017-09-07 04:54:12 benjamin.peterson create