Use version-sorting for all sorting by joshtriplett · Pull Request #115046 · rust-lang/rust
We discussed this at length in today's @rust-lang/style meeting.
For context on my original writeup of this algorithm, I have no attachment to being anywhere close to the `strverscmp` algorithm (and I do think `strverscmp` itself has some historical issues and bugs). I started out wanting to solve the problem of sorting x8 < x16 < x32 < x64 < x128 in that order, and similar, and knew version-sorting would get there, so I looked at `strverscmp`. Then I found its quirks (from the manpage) as well as historical bugs (notably, musl's version history has some notes on glibc implementation issues that musl reworked to be bug-compatible with), so I reworked this algorithm description to make more sense, because we have no need to be bug-compatible with `strverscmp`.
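To make the motivating problem concrete, here is a minimal sketch of how the default byte-wise string comparison misorders those names, next to a deliberately naive fix that only handles names of the form `x<number>`. The helper names `sort_plain` and `sort_by_numeric_suffix` are hypothetical and appear nowhere in the proposal.

```rust
/// Sort names with Rust's default byte-wise string comparison.
fn sort_plain(mut names: Vec<&str>) -> Vec<&str> {
    names.sort();
    names
}

/// Sort names by the numeric value of their trailing digits.
/// Assumption: every name is a non-digit prefix followed by one number
/// (for example `x16`); names without a parsable number get key 0.
fn sort_by_numeric_suffix(mut names: Vec<&str>) -> Vec<&str> {
    names.sort_by_key(|name| {
        name.trim_start_matches(|c: char| !c.is_ascii_digit())
            .parse::<u64>()
            .unwrap_or(0)
    });
    names
}
```

With the input `x8, x16, x32, x64, x128`, `sort_plain` yields `x128, x16, x32, x64, x8`, because `'1'` compares less than `'8'` byte by byte, while `sort_by_numeric_suffix` yields the intended `x8, x16, x32, x64, x128`.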
What we discussed and agreed on in the style meeting was that we shouldn't be path-dependent and end up with "`strverscmp` with some tweaks/fixes". We should decide on the properties we want, come up with an algorithm that provides those properties, and not worry about how closely that aligns with `strverscmp` or any other version-sorting algorithm.
We're also mindful of the fact that this is mostly a sort for Rust identifiers (and possibly future Rust style needs like `Cargo.toml`, but not yet), not a general sorting algorithm, and that sorting algorithms are context-dependent. We want to prioritize making the mental model simple, but we're more flexible about the implementation being somewhat more complicated.
We do need to cover more corner cases in the document (it included some, but not all of the ones raised here), by listing a series of identifiers and how they sort. (We don't think it's critical to document every difference between this and other version-sorting algorithms, but we can give a high-level difference like "This differs from other version-sorting algorithms, notably in the treatment of underscores and leading zeroes.")
Properties we want to achieve, as agreed in the style meeting:
- We want to sort numeric values without regard for leading zeroes, so `x0a` < `x00b` < `x0c`.
- We have to provide a total order, so we have to have a well-defined order between any non-equal strings.
- If identifiers are identical all the way to the end, other than the number of leading zeroes in their numeric components (e.g. `x005a09b` and `x5a009b`), we should sort them based on the first such difference.
  - (We didn't try to figure out in the meeting whether we had consensus on which direction to sort in that case. Speaking for myself only, I have a mild preference for putting more leading zeroes first, but if there's a concrete use case that argues the other direction I'd prioritize the concrete use case.)
- We agreed that `_` makes sense to special-case, since it acts like a space between words. (We noted that for a fully general sort algorithm, we'd need to specify things like how `_` and ` ` sort relative to each other, but for characters allowed in Rust identifiers, we don't.)
I'm going to revise this proposal to support these properties.
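As a rough illustration of the properties listed above, and explicitly not the algorithm the revised proposal will specify, a comparison function along these lines could satisfy them. All names here (`Chunk`, `chunks`, `char_rank`, `version_cmp`) are hypothetical. Two behaviours are assumptions rather than decisions from the meeting: more leading zeroes sort first when everything else is equal, and `_` sorts before every other character.

```rust
use std::cmp::Ordering;

/// One piece of an identifier: a single non-digit character,
/// or a maximal run of ASCII digits.
#[derive(Debug)]
enum Chunk<'a> {
    Text(char),
    Number(&'a str),
}

impl Chunk<'_> {
    /// First character of the chunk, used when a digit run meets a non-digit.
    fn lead(&self) -> char {
        match self {
            Chunk::Text(c) => *c,
            Chunk::Number(n) => n.chars().next().unwrap_or('0'),
        }
    }
}

/// Split an identifier into chunks.
fn chunks(s: &str) -> Vec<Chunk<'_>> {
    let mut out = Vec::new();
    let mut rest = s;
    while let Some(c) = rest.chars().next() {
        if c.is_ascii_digit() {
            let end = rest
                .find(|ch: char| !ch.is_ascii_digit())
                .unwrap_or(rest.len());
            out.push(Chunk::Number(&rest[..end]));
            rest = &rest[end..];
        } else {
            out.push(Chunk::Text(c));
            rest = &rest[c.len_utf8()..];
        }
    }
    out
}

/// Assumption: `_` ranks below every other character, like a word separator.
fn char_rank(c: char) -> (u8, char) {
    if c == '_' { (0, c) } else { (1, c) }
}

/// Hypothetical version-sort comparison over identifiers.
fn version_cmp(a: &str, b: &str) -> Ordering {
    let (ca, cb) = (chunks(a), chunks(b));
    // Remembers the first leading-zero difference between equal numbers.
    let mut zero_tiebreak = Ordering::Equal;
    for (x, y) in ca.iter().zip(cb.iter()) {
        let ord = match (x, y) {
            (Chunk::Number(m), Chunk::Number(n)) => {
                // Compare by value without parsing: strip zeroes, then a
                // longer digit string is larger, else compare digit by digit.
                let ms = m.trim_start_matches('0');
                let ns = n.trim_start_matches('0');
                let by_value = ms.len().cmp(&ns.len()).then_with(|| ms.cmp(ns));
                if by_value == Ordering::Equal && zero_tiebreak == Ordering::Equal {
                    // Assumption: more leading zeroes sort first.
                    zero_tiebreak = n.len().cmp(&m.len());
                }
                by_value
            }
            _ => char_rank(x.lead()).cmp(&char_rank(y.lead())),
        };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // A strict prefix sorts first; otherwise fall back to the zero tiebreak,
    // which is `Equal` only when the two strings are identical.
    ca.len().cmp(&cb.len()).then(zero_tiebreak)
}
```

Under these assumptions, `names.sort_by(|a, b| version_cmp(a, b))` orders `x0a`, `x00b`, `x0c` as written, places `x005a09b` before `x5a009b`, and places `a_b` before `a1`, `aB`, and `ab`. Flipping the operands in the `zero_tiebreak` line reverses the leading-zero direction without affecting the other properties.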