Safer data serialization with marshal module · Issue #113626 · python/cpython (original) (raw)

Feature or enhancement

The main purpose of the marshal module -- serialization of precompiled module code objects. This requires support of code objects, strings for names, and other primitive Python types and simple collection types referred by the code object.

It allows to use it as more data generic serialization tool -- more limited than pickle, but less limited than JSON. The marshal module supports different versions of the format and backward compatible with all earlier versions. But only if the data does not contain code objects. The format of the code objects changed with every Python version, and this is not reflected in marshal format version. Loading marshal data created in different Python version has undefined behavior if the data contains a code object.

I propose to add a keyword-only parameter allow_code with default value True in marshal functions. Specifying allow_code=False forbid saving and loading code objects. It allows to be safer when you load external data and to guarantee that the output can be safely loaded in other Python.

Linked PRs