Issue 28973: [doc] The fact that multiprocess.Queue uses serialization should be documented.

Created on 2016-12-14 14:38 by Bernhard10, last changed 2022-04-11 14:58 by admin.

Files
File name | Uploaded | Description
mwe.py | Bernhard10, 2016-12-14 14:38 | Minimal working example to reproduce this bug/surprising behaviour.
Messages (8)
msg283192 - Author: Bernhard10 (Bernhard10) Date: 2016-12-14 14:38
When I did some tests involving unittest.mock.sentinel and multiprocessing.Queue, I noticed that multiprocessing.Queue changes the id of the sentinel. This behaviour is definitely surprising and not documented.
msg283193 - Author: Bernhard10 (Bernhard10) Date: 2016-12-14 15:05
Apparently multiprocessing.Queue uses pickle to serialize the objects in the queue (see http://stackoverflow.com/a/925241/5069869), which explains the change of identity, but this is not at all clear from the documentation.
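A minimal sketch of the behaviour described above (not the attached mwe.py, whose contents are not shown here): an object put on a `multiprocessing.Queue` and read back in the same process arrives as an equal but distinct object, because it travels through the queue in pickled form. The helper name `queue_round_trip` is hypothetical, and a plain list stands in for the `unittest.mock.sentinel` used in the report.

```python
import multiprocessing


def queue_round_trip(obj):
    """Put obj on a multiprocessing.Queue and read it back in this process (hypothetical helper)."""
    q = multiprocessing.Queue()
    try:
        q.put(obj)               # obj is pickled on its way into the queue
        return q.get(timeout=5)  # ...and unpickled into a new object here
    finally:
        q.close()
        q.join_thread()          # wait for the background feeder thread to finish


if __name__ == "__main__":
    original = [1, 2, 3]
    received = queue_round_trip(original)
    print(received == original)  # True: same value
    print(received is original)  # False: a different object
```

Comparing with `==` rather than `is` (or `id()`) keeps such code independent of the transport.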
msg283195 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-12-14 15:12
The fact that this is so is implicit in the name multi*process*ing and the documented restrictions of the id function. That is, the purpose of the module is to manage computation across multiple processes. Since different processes have distinct memory spaces, you cannot depend on object identity between processes, by the definition of object identity (it is constant only for the lifetime of the object in memory, and the different processes have different memory spaces, therefore the object id may be different in the different processes). By construction this also applies to any multiprocessing mechanism that is used to transmit objects, even if the transmission turns out to be to the same process in a particular case. You can't *depend* on the id in that case, because the transmission mechanism must be free to change the object identity in order to work in the general case. Should we document this explicitly? Perhaps so. Maybe in the multiprocessing introduction?
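Since identity cannot be relied on across a multiprocessing transport, one possible approach (an illustration, not something proposed in this thread) is to attach an explicit identifier to each message and match on that identifier instead of `id()`. In the sketch below, `send_tagged`, `receive_tagged` and the `registry` dict are hypothetical names, a `uuid4` hex string is an arbitrary choice of key, and both ends of a `multiprocessing.Pipe` are used in one process for brevity. In a real two-process setup the registry would stay in the sending process and the receiver would send back only the key.

```python
import multiprocessing
import uuid


def send_tagged(conn, obj, registry):
    """Send (key, obj) and remember obj locally under key (hypothetical helper)."""
    key = uuid.uuid4().hex   # the key's value survives pickling unchanged
    registry[key] = obj      # the original stays in the sender's memory
    conn.send((key, obj))    # the receiver gets a pickled copy of obj
    return key


def receive_tagged(conn, registry):
    """Receive (key, copy) and find the sender's original by key (hypothetical helper)."""
    key, received = conn.recv()
    return key, received, registry.get(key)


if __name__ == "__main__":
    sender, receiver = multiprocessing.Pipe()
    registry = {}
    original = ["some", "payload"]
    send_tagged(sender, original, registry)
    key, received, looked_up = receive_tagged(receiver, registry)
    print(received is original)   # False: the pipe delivered a copy
    print(looked_up is original)  # True: the key recovers the original
    sender.close()
    receiver.close()
```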
msg283198 - Author: Bernhard10 (Bernhard10) Date: 2016-12-14 15:18
My first thought was that Queue was implemented using shared memory. I guess from the fact that the "Shared memory" section is separate in the multiprocessing documentation I should have known better, though. So I guess some clarification in the documentation would be helpful.
msg283199 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-12-14 15:37
Yeah, that's why I said "in the general case". Making it clear in the overview seems reasonable to me.
msg283206 - Author: Davin Potts (davin) * (Python committer) Date: 2016-12-14 16:33
All communication between processes in multiprocessing has consistently used pickle to serialize the data being communicated (this includes what is described in the "Shared memory" section of the docs). The documentation has not done a great job of making this clear, instead only describing the requirement that data be pickleable in select places. For example, in the section on Queues: "Note: When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe." Though it only applies to 3.6+, still needs its own documentation improvement to make clear that the mechanism for communicating data defaults to serialization by pickle but that this can be replaced by alternatives. I agree that the documentation around the use of pickle in multiprocessing deserves improvement.
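The pickling requirement can be observed directly with `multiprocessing.Pipe`: `Connection.send()` pickles its argument before writing it, so an object that cannot be pickled, such as a lambda, is rejected at the call site. The helper `try_send` below is hypothetical, and the tuple of caught exceptions reflects the errors pickle raises for common unpicklable objects in recent CPython versions; it may need adjusting elsewhere. With a `Queue`, by contrast, the note quoted above describes a background thread doing the flushing, so a failure may not surface at the `put()` call in the same way.

```python
import multiprocessing
import pickle


def try_send(conn, obj):
    """Try to send obj over a multiprocessing Connection (hypothetical helper).

    Returns True on success and False if obj could not be pickled.
    Connection.send() pickles obj before writing anything, so a
    failure here leaves nothing half-written in the pipe.
    """
    try:
        conn.send(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # PicklingError: e.g. lambdas (looked up by name, not found)
        # AttributeError: local objects on some older Python versions
        # TypeError: objects such as locks that refuse to be pickled
        return False
    return True


if __name__ == "__main__":
    sender, receiver = multiprocessing.Pipe()
    print(try_send(sender, {"ok": 1}))    # True
    print(receiver.recv())                # {'ok': 1}
    print(try_send(sender, lambda x: x))  # False
    sender.close()
    receiver.close()
```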
msg398809 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-08-02 23:08
There is a note mentioning pickle in this section: https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues It starts with "When an object is put on a queue, the object is pickled and..." A comment about the object ids can be added there.
msg399179 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2021-08-07 13:17
Mentioning ids would be pretty much redundant with mentioning pickle. If it is pickled its id is going to change. I think Davin was suggesting that while the use of serialization is documented, it is not documented *consistently*. Everywhere serialization happens it should be mentioned in the docs. Regardless, a proposed doc PR is the way forward here.
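A plain `pickle` round trip, the serialization step that multiprocessing transports rely on, shows the point in isolation. For ordinary data such as containers and class instances the result is equal but not identical. Objects that pickle stores by reference, such as `None` or built-in and module-level functions, are looked up again on unpickling and so come back as the same object within one process. The helper name `round_trip` is hypothetical.

```python
import pickle


def round_trip(obj):
    """Pickle and unpickle obj within the current process (hypothetical helper)."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    data = {"numbers": [1, 2, 3]}
    restored = round_trip(data)
    print(restored == data, restored is data)  # True False: equal value, new object
    print(round_trip(None) is None)            # True: the None singleton is restored
    print(round_trip(len) is len)              # True: built-ins are pickled by name
```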
History
Date | User | Action | Args
2022-04-11 14:58:40 | admin | set | github: 73159
2021-08-07 13:17:33 | r.david.murray | set | messages: +
2021-08-02 23:08:41 | iritkatriel | set | nosy: + iritkatriel; versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.5, Python 3.6, Python 3.7; messages: +; keywords: + easy; title: The fact that multiprocess.Queue uses serialization should be documented. -> [doc] The fact that multiprocess.Queue uses serialization should be documented.
2016-12-14 16:33:49 | davin | set | versions: + Python 3.6, Python 3.7; nosy: + davin; messages: +; type: behavior -> enhancement; stage: needs patch
2016-12-14 15:37:09 | r.david.murray | set | messages: +
2016-12-14 15:18:24 | Bernhard10 | set | messages: +
2016-12-14 15:12:31 | r.david.murray | set | nosy: + r.david.murray; messages: +
2016-12-14 15:05:57 | Bernhard10 | set | title: multiprocess.Queue changes objects identity -> The fact that multiprocess.Queue uses serialization should be documented.; nosy: + docs@python; messages: +; assignee: docs@python; components: + Documentation
2016-12-14 14:38:29 | Bernhard10 | create |