Issue 32040: Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths

Created on 2017-11-15 19:48 by QbLearningPython, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg306304 - Author: (QbLearningPython) Date: 2017-11-15 19:48
While testing a module, I found some weird behaviour in the pathlib package. I have a list of pathlib.Paths and I sorted() it. I assumed that the order obtained by sorting a list of Paths would be the same as the order obtained by sorting the list of their corresponding (string) filenames. But that is not the case.

I ran the following example:

==========================================================================

from pathlib import Path

# order string filenames

filenames_for_testing = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/binary.bin',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
    '/spam/spams/spam02.txt',
    '/spam/spams/spam03.ppp',
    '/spam/spams/spam04.doc',
)

sorted_filenames = sorted(filenames_for_testing)

# output ordered list of string filenames

print()
print("Ordered list of string filenames:")
print()
[print(f'\t{element}') for element in sorted_filenames]
print()

# order paths (built from the same string filenames)

paths_for_testing = [
    Path(filename)
    for filename in filenames_for_testing
]

sorted_paths = sorted(paths_for_testing)

# output ordered list of pathlib.Paths

print()
print("Ordered list of pathlib.Paths:")
print()
[print(f'\t{element}') for element in sorted_paths]
print()

# compare

print()

if sorted_filenames == [str(path) for path in sorted_paths]:
    print('Ordered lists of string filenames and pathlib.Paths are EQUAL.')

else:
    print('Ordered lists of string filenames and pathlib.Paths are DIFFERENT.')

    for element in range(0, len(sorted_filenames)):

        if sorted_filenames[element] != str(sorted_paths[element]):

            print()
            print('First different element:')
            print(f'\tElement #{element}')
            print(f'\t{sorted_filenames[element]} != {sorted_paths[element]}')
            break

print()

==========================================================================

The output of this script was:

==========================================================================

Ordered list of string filenames:

    /spam/another.txt
    /spam/binary.bin
    /spam/spam.txt
    /spam/spams.txt
    /spam/spams/spam.ttt
    /spam/spams/spam01.txt
    /spam/spams/spam02.txt
    /spam/spams/spam03.ppp
    /spam/spams/spam04.doc


Ordered list of pathlib.Paths:

    /spam/another.txt
    /spam/binary.bin
    /spam/spam.txt
    /spam/spams/spam.ttt
    /spam/spams/spam01.txt
    /spam/spams/spam02.txt
    /spam/spams/spam03.ppp
    /spam/spams/spam04.doc
    /spam/spams.txt


Ordered lists of string filenames and pathlib.Paths are DIFFERENT.

First different element:
    Element #3
    /spam/spams.txt != /spam/spams/spam.ttt

==========================================================================

As you can see, '/spam/spams.txt' ends up in a different place when sorting pathlib.Paths than when sorting string filenames.

I think it is weird that sorting pathlib.Paths yields a different result than sorting their string filenames. I think that pathlib.Paths should be ordered by the alphabetical order of their corresponding filenames.

Thank you.
msg306306 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-11-15 20:09
Paths are ordered by the lexicographical order of their components. Paths are not strings, and this order is more natural for them.

The alphabetical order of Path strings:

SPAMS.txt
SPAM\file.txt
spam\file.txt
spams.txt
spam-file.txt
spam/file.txt
spam_file.txt

The lexicographical order of Path components:

SPAM\file.txt
SPAMS.txt
spam\file.txt
spams.txt
spam/file.txt
spam-file.txt
spam_file.txt
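The difference between the two orders can be reproduced with three of the names from the original report. This is a minimal sketch, not part of the original message; it assumes POSIX-style paths and uses PurePosixPath so the result does not depend on the platform running it.

```python
from pathlib import PurePosixPath

# Three names taken from the report above.
names = ['/spam/spams.txt', '/spam/spams/spam.ttt', '/spam/spam.txt']

# Plain string comparison, character by character.
by_string = sorted(names)

# Path comparison, component by component.
by_path = [str(p) for p in sorted(PurePosixPath(n) for n in names)]

# As strings, '.' (0x2E) sorts before '/' (0x2F), so 'spams.txt' precedes
# 'spams/spam.ttt'. As paths, the component 'spams' is a proper prefix of
# 'spams.txt', so the directory 'spams' and its contents come first.
print(by_string)
print(by_path)
```

The two lists differ only in where '/spam/spams.txt' lands, which matches the output shown in the report.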
msg306367 - Author: (QbLearningPython) Date: 2017-11-16 16:12
Thanks, serhiy.storchaka, for your answer. I am not fully convinced. You have described the current behaviour of the pathlib package. But let me ask: should this be the desired behaviour? Since string filenames and pathlib.Paths are different ways to refer to the same object (a path in a filesystem), should they not behave the same way when sorted? You pointed out that the current order is "more natural" for pathlib.Paths. I am not sure about that. Can you please provide a citation or additional information? Thank you.
msg306394 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-11-16 18:40
It is "obvious by inspection". Paths are paths rather than strings because they are made up of discrete path components. If you sorted the directories in the paths from the top down, then sorted the subdirectories, and then sorted the filenames, you would get the same order as sorting by component. It's the same order you would get out of an ls -R.
msg306395 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-11-16 18:41
To put it another way, think about sorting a list of tuples. Same behavior.
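The tuple analogy can be checked directly. This is a small sketch under the same assumption of POSIX-style paths (PurePosixPath); the variable names are illustrative only. It also shows that a caller who does want plain string order can request it with key=str.

```python
from pathlib import PurePosixPath

paths = [
    PurePosixPath(name)
    for name in ('/spam/spams.txt', '/spam/spams/spam.ttt', '/spam/another.txt')
]

# Default path ordering.
path_order = sorted(paths)

# Ordering by each path's tuple of components, e.g. ('/', 'spam', 'spams.txt').
tuple_order = sorted(paths, key=lambda p: p.parts)

# Ordering by the string form of each path, as the reporter expected.
string_order = sorted(paths, key=str)

print(path_order == tuple_order)
print([str(p) for p in string_order])
```

For these examples the default order and the tuple order agree, while key=str gives the alphabetical order of the filenames.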
History
Date User Action Args
2022-04-11 14:58:54 admin set github: 76221
2017-11-16 18:41:02 r.david.murray set messages: +
2017-11-16 18:40:03 r.david.murray set status: open -> closed; nosy: + r.david.murray; messages: +; stage: resolved
2017-11-16 16:12:43 QbLearningPython set messages: +
2017-11-15 20:09:37 serhiy.storchaka set resolution: not a bug; messages: +; nosy: + serhiy.storchaka
2017-11-15 19:48:55 QbLearningPython create