Because POSIX readdir() does not guarantee any particular order, glob often returns results in an unexpected, effectively random order.

Some background: for openSUSE Linux we build packages in the Open Build Service (OBS), which tracks dependencies. When, e.g., a new glibc is submitted, all packages depending on glibc are rebuilt, and if the resulting binaries changed, the new version is pushed to the mirrors. Many Python modules build their .so files from a glob.glob(os.path.join(path, "*.cpp")) call. With the unsorted glob behaviour, the linker would often order functions in the resulting object files randomly, so we could not automatically detect that a package had not actually changed. This wastes bandwidth for distribution mirrors and users. See also https://reproducible-builds.org/ on that topic.

There are plenty of affected packages out there:
https://github.com/pytries/datrie/blob/master/setup.py#L10
https://github.com/jonashaag/bjoern/blob/master/setup.py#L6
https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/setup.py#L28
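A minimal sketch of a deterministic source list for a setup.py, assuming all sources live in one directory. `collect_sources` is a hypothetical helper name, not taken from any of the linked projects; it just wraps glob.glob() in sorted(), which orders paths by plain string comparison regardless of the order the directory listing yields them.

```python
import glob
import os
import tempfile


def collect_sources(src_dir, pattern="*.cpp"):
    """Return matching source files in a stable, sorted order.

    glob.glob() returns paths in whatever order the directory listing
    yields them, so the result is sorted explicitly to keep the
    compiler/linker input order identical across builds.
    """
    return sorted(glob.glob(os.path.join(src_dir, pattern)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create files in non-alphabetical order; notes.txt must not match.
        for name in ("zeta.cpp", "alpha.cpp", "mid.cpp", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        for path in collect_sources(tmp):
            print(os.path.basename(path))
```

Run as a script, this prints alpha.cpp, mid.cpp and zeta.cpp in that order. The resulting list could then be passed as the `sources` argument of a setuptools `Extension`, so the compiler sees the same file order on every build.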
From my performance measurements, the overhead of sorting was negligible (even before counting the processing done on the files glob returns). Also, glob in C, bash, and Perl all sort by default; these are generally fast languages, yet they still chose consistency over performance. I updated my PR to update the documentation accordingly as well.
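To check the overhead claim on a given machine, a rough timing sketch like the following could be used. It is not the benchmark from the PR; `compare_glob_overhead`, the 1000-file directory and the repeat count are illustrative assumptions, and the numbers depend heavily on the filesystem and cache state.

```python
import glob
import os
import tempfile
import timeit


def compare_glob_overhead(directory, pattern="*", repeat=200):
    """Time glob.glob() with and without sorting its result.

    Returns (unsorted_seconds, sorted_seconds), each the total time
    for `repeat` calls.
    """
    full_pattern = os.path.join(directory, pattern)
    unsorted_time = timeit.timeit(lambda: glob.glob(full_pattern), number=repeat)
    sorted_time = timeit.timeit(lambda: sorted(glob.glob(full_pattern)), number=repeat)
    return unsorted_time, sorted_time


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Populate a scratch directory with many small files.
        for i in range(1000):
            open(os.path.join(tmp, f"file{i:04d}.cpp"), "w").close()
        plain, ordered = compare_glob_overhead(tmp, "*.cpp")
        print(f"glob: {plain:.4f}s  sorted(glob): {ordered:.4f}s")
```

The sort runs in memory on the already collected list, while glob itself has to read the directory, so the difference between the two timings is typically small.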
Sorry, we're going to reject this patch for the reasons discussed in the two other referenced patches. If a user wants sorted order, they can effortlessly specify that with sorted(glob('*.cpp')).
History
Date                 User         Action  Args
2022-04-11 14:58:46  admin        set     github: 74646
2017-06-04 17:23:33  rhettinger   set     status: open -> closed; nosy: + rhettinger; messages: +; resolution: not a bug -> rejected
2017-06-04 09:02:38  bmwiedemann  set     status: closed -> open; messages: +
2017-05-25 00:56:05  rhettinger   set     status: open -> closed; resolution: not a bug; stage: resolved