msg301569 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2017-09-07 04:54 |
The *_INTERN opcodes tell the marshal reader to intern the encoded string after deserialization. I believe this is pointless for pycs because PyCode_New ends up interning all strings that are interesting to intern. Writing these opcodes makes pycs non-deterministic because the intern state may be inconsistent in the writer. See https://bugzilla.opensuse.org/show_bug.cgi?id=1049186 |
|
|
msg301571 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-09-07 06:25 |
Marshal is not used only in pyc files. It is also used for fast data serialization, faster than pickle, json, etc. |
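For readers unfamiliar with that use, here is a minimal sketch of marshal as a general-purpose serializer. The `record` contents are made up for illustration. The marshal documentation notes that the format is Python-specific and may change between versions, so this shows the round trip only and is not a recommendation for persistent storage.

```python
import marshal

# Hypothetical payload; marshal handles the core built-in types only.
record = {"id": 7, "tags": ["a", "b"], "ratio": 0.5, "blob": b"\x00\x01"}

payload = marshal.dumps(record)    # serialize to a bytes object
restored = marshal.loads(payload)  # deserialize back to Python objects

print(type(payload).__name__, len(payload))
print(restored == record)
```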
|
|
msg301572 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2017-09-07 06:41 |
Used but not really supported. Anyway, I doubt intern round-tripping is particularly important. |
|
|
msg301576 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2017-09-07 08:17 |
w_ref() already depends on refcnt. I don't think removing the *_INTERN opcodes makes PYC reproducible. https://github.com/python/cpython/blob/1f06a680de465be0c24a78ea3b610053955daa99/Python/marshal.c#L269-L271 I think "intern one string, then share it 10 times" is faster than "share one string 10 times, then intern each of 10 references". |
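The refcount dependence can be probed from Python with a rough sketch like the one below. It assumes the behavior at the linked lines, where w_ref() skips the reference memo for an object with a single reference. Whether the two byte streams actually differ depends on the interpreter version and on how the call holds its argument, so the script reports the comparison instead of asserting it. The helper `build` and the payload value are hypothetical.

```python
import marshal

def build():
    # Construct a fresh bytes object at run time so no cached constant is reused.
    return bytes(bytearray(b"payload-not-cached"))

# Case 1: the object is referenced only by the call machinery.
single = marshal.dumps(build())

# Case 2: the same value, but with extra references held by local names.
kept = build()
alias = kept
shared = marshal.dumps(kept)

# Both streams must decode to the same value regardless of refcounts.
assert marshal.loads(single) == marshal.loads(shared)

# The encodings may or may not be byte-identical; report without assuming.
same_bytes = single == shared
print("identical byte streams:", same_bytes)
```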
|
|
msg301592 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2017-09-07 16:36 |
On Thu, Sep 7, 2017, at 01:17, INADA Naoki wrote:
> INADA Naoki added the comment:
>
> w_ref() depends on refcnt already.
> I don't think removing *_INTERN opcode makes PYC reproducible.
> https://github.com/python/cpython/blob/1f06a680de465be0c24a78ea3b610053955daa99/Python/marshal.c#L269-L271
I know; we're going to have to do something about that, too. In practice, though, the interning behavior seems to be a bigger reproducibility problem.
> I think "intern one string, then share it 10 times" is faster than
> "share one string 10 times, then intern each of 10 references".
We end up interning each reference individually currently. |
|
|
msg301593 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2017-09-07 16:46 |
> We end up interning each reference individually currently.
But interning an already-interned string is much faster: it only checks a flag. Interning a normal string requires a dict lookup. |
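The flag check itself is not visible from Python, but the observable effect of interning can be sketched with `sys.intern`. The helper `make` and the string it builds are hypothetical. The sketch shows only that equal strings map to one canonical object and that interning the canonical object again returns it unchanged.

```python
import sys

def make():
    # Build the string at run time so the compiler does not intern it for us.
    return "".join(["intern", "_", "demo", "_", "key"])

first = make()
second = make()

# The first intern of a value typically needs a lookup in the intern table.
canon_first = sys.intern(first)
canon_second = sys.intern(second)

# Interning an already-interned string returns the same object again.
again = sys.intern(canon_first)

print(canon_first is canon_second)
print(again is canon_first)
```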
|
|
msg301594 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2017-09-07 16:54 |
On Thu, Sep 7, 2017, at 09:46, INADA Naoki wrote:
> INADA Naoki added the comment:
>
> > We end up interning each reference individually currently.
>
> But interning interned string is much faster. It only checks flag.
> Interning normal string requires dict lookup.
We could make sure the version in the internal marshal memo is interned if appropriate. |
|
|
msg321413 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2018-07-11 07:58 |
I doubt that interning causes the reproducibility problem. AFAIK, all strings in a code object are interned or not interned deterministically. https://bugzilla.opensuse.org/show_bug.cgi?id=1049186 This issue seems to be caused by w_ref() being based on object refcnt, not by interning. |
|
|