Issue 29694: race condition in pathlib mkdir with flags parents=True (original) (raw)

Created on 2017-03-02 13:53 by whitespacer, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
x1.diff arigo,2017-03-07 06:39 review
x2.diff arigo,2017-03-28 14:27
Pull Requests
URL Status Linked Edit
PR 1089 merged python-dev,2017-04-12 09:40
PR 1126 merged Mariatta,2017-04-14 02:02
PR 1127 merged Mariatta,2017-04-14 02:03
Messages (14)
msg288801 - (view) Author: whitespacer (whitespacer) Date: 2017-03-02 13:53
When pathlib mkdir is called with parents=True and some parent doesn't exists it recursively calls self.parent.mkdir(parents=True) after catching OSError. However after catching of OSError and before call to self.parent.mkdir(parents=True) somebody else can create parent dir, which will lead to FileExistsError exception.
msg289155 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-03-07 06:39
A different bug in the same code: if someone creates the directory itself between the two calls to ``self._accessor.mkdir(self, mode)``, then the function will fail with an exception even if ``exist_ok=True``. Attached is a patch that fixes both cases.
msg290087 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-24 13:05
Can you provide tests?
msg290102 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-03-24 17:45
It's a mess to write a test, because the exact semantics of .mkdir() are not defined as far as I can tell. This patch is a best-effort attempt at making .mkdir() work in the presence of common parallel filesystem changes, that is, other processes that would create the same directories at the same time. This patch is by no means an attempt at being a complete solution for similar problems. The exact semantics have probably never been discussed at all. For example, what should occur if a parent directory is removed just after .mkdir() created it? I'm not suggesting to discuss these issues now, but to simply leave them open. I'm trying instead to explain why writing a test is a mess (more than "just" creating another thread and creating/removing directories very fast while the main thread calls .mkdir()), because we have no exact notion of what should work and what shouldn't.
msg290477 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-25 10:49
You can monkey-patch os.mkdir and pathlib._NormalAccessor.mkdir. Without tests we can't be sure that the patch fixes the issue and that the bug will be not reintroduced by future changes. Tests are required for this change.
msg290723 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-03-28 14:27
Changes including a test. The test should check all combinations of "concurrent" creation of the directory. It hacks around at pathlib._normal_accessor.mkdir (patching "os.mkdir" has no effect, as the built-in function was already extracted and stored inside pathlib._normal_accessor).
msg291386 - (view) Author: Christoph Böddeker (Christoph Böddeker) Date: 2017-04-09 18:40
I had a problem that can be solved with the presented change. But I had also problems to reproduce it in a small example. I am not sure if a test is allowed to depend on external libraries. The code at the end executed with mpirun -np 3 python test.py always breaks with the current code of pathlib and works with the fix. Maybe this helps to write a test and shows a usecase where this fix is necessary. P.S.: I was not able to produce the error with multiprosessing. from mpi4py import MPI from pathlib import Path import tempfile comm = MPI.COMM_WORLD rank = comm.rank size = comm.size with tempfile.TemporaryDirectory() as tmp_dir: tmp_dir = comm.bcast(tmp_dir, root=0) p = Path(tmp_dir) / 'a' / 'b' comm.barrier() p.mkdir(parents=True, exist_ok=True)
msg291419 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-04-10 07:51
Maybe we should review pathlib.py for this kind of issues and first apply the fixes and new tests inside PyPy. That sounds like a better way to get things done for these rare issues, where CPython is understandably reluctant to do much changes. Note that the PyPy version of the stdlib already contains fixes that have not been merged back to CPython (or only very slowly), though so far they are the kind of issues that trigger more often on PyPy than on CPython, like GC issues.
msg291527 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-04-12 08:16
Update: a review didn't show any other similar problems (pathlib.py is a thin layer after all). Applied the fix and test (x2.diff) inside PyPy.
msg291539 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2017-04-12 09:48
https://github.com/python/cpython/pull/1089 (I fixed the problem with my CLA check. Now https://cpython-devguide.readthedocs.io/pullrequest.html#licensing says "you can ask for the CLA check to be run again" but doesn't tell how to do that, so as far as I can tell, I have to ask e.g. here.)
msg291625 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-13 18:08
New changeset 22a594a0047d7706537ff2ac676cdc0f1dcb329c by Mariatta (Armin Rigo) in branch 'master': bpo-29694: race condition in pathlib mkdir with flags parents=True (GH-1089) https://github.com/python/cpython/commit/22a594a0047d7706537ff2ac676cdc0f1dcb329c
msg291637 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-14 02:26
New changeset cbc46afa59dcc43c2c8c90ae7a0a0dc404325a89 by Mariatta in branch '3.6': [3.6] bpo-29694: race condition in pathlib mkdir with flags parents=True (GH-1089). (GH-1126) https://github.com/python/cpython/commit/cbc46afa59dcc43c2c8c90ae7a0a0dc404325a89
msg291638 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-14 02:26
New changeset d7abeb7024b9755c291c29bdc8c4494246e975ad by Mariatta in branch '3.5': [3.5] bpo-29694: race condition in pathlib mkdir with flags parents=True (GH-1089). (GH-1127) https://github.com/python/cpython/commit/d7abeb7024b9755c291c29bdc8c4494246e975ad
msg291639 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-14 02:27
I merged Armin's PR and backported tp 3.5 and 3.6. Closing this now. Thanks all :)
History
Date User Action Args
2022-04-11 14:58:43 admin set github: 73880
2019-11-12 12:50:16 xtreak link issue35192 superseder
2017-04-14 02:27:57 Mariatta set status: open -> closedresolution: fixedmessages: + stage: backport needed -> resolved
2017-04-14 02:26:26 Mariatta set messages: +
2017-04-14 02:26:18 Mariatta set messages: +
2017-04-14 02:03:58 Mariatta set pull_requests: + <pull%5Frequest1262>
2017-04-14 02:02:33 Mariatta set pull_requests: + <pull%5Frequest1261>
2017-04-13 18:58:04 Mariatta set stage: test needed -> backport needed
2017-04-13 18:08:17 Mariatta set nosy: + Mariattamessages: +
2017-04-12 09:48:43 arigo set messages: +
2017-04-12 09:40:59 python-dev set pull_requests: + <pull%5Frequest1232>
2017-04-12 08:16:22 arigo set messages: +
2017-04-10 07:51:43 arigo set messages: +
2017-04-09 18:40:58 Christoph Böddeker set nosy: + Christoph Böddekermessages: +
2017-03-28 14:27:12 arigo set files: + x2.diffmessages: +
2017-03-25 10:49:48 serhiy.storchaka set messages: +
2017-03-24 17:45:12 arigo set messages: +
2017-03-24 13:05:22 serhiy.storchaka set nosy: + serhiy.storchakamessages: + stage: patch review -> test needed
2017-03-18 16:53:10 berker.peksag set nosy: + berker.peksagcomponents: + Library (Lib)stage: patch review
2017-03-07 06:39:04 arigo set files: + x1.diffnosy: + arigomessages: + keywords: + patch
2017-03-02 13:53:33 whitespacer create