[Python-Dev] Re: Re: lists v. tuples (original) (raw)

Tim Peters tim_one@email.msn.com
Thu, 20 Mar 2003 00:58:55 -0500

Previous message: [Python-Dev] Re: Re: lists v. tuples
Next message: [Python-Dev] Re: Re: lists v. tuples
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[ark@research.att.com]

The binary-search routines in the C++ standard library mostly avoid having to do != comparisons by defining their interfaces in the following clever way:

binarysearch returns a boolean that indicates whether the value sought is in the sequence. It does not say where that value is. lowerbound returns the first position ahead of which the given value could be inserted without disrupting the ordering of the sequence. upperbound returns the last position ahead of which the given value could be inserted without disrupting the ordering of the sequence.

These last two are quite like Python's bisect.bisect_{left, right}, which are implemented using only lt element comparison.

equalrange returns (lowerbound, upperbound) as a pair.

In Python terms: binarysearch([3, 5, 7], 6) would yield False binarysearch([3, 5, 7], 7) would yield True lowerbound([1, 3, 5, 7, 9, 11], 9) would yield 4 lowerbound([1, 3, 5, 7, 9, 11], 8) would also yield 4 upperbound([1, 3, 5, 7, 9, 11], 9) would yield 5

import bisect x = [1, 3, 5, 7, 9, 11] bisect.bisectleft(x, 9) 4 bisect.bisectleft(x, 8) 4 bisect.bisectright(x, 9) 5

We conclude that C++ did something right .

equalrange([1, 1, 3, 3, 3, 5, 5, 5, 7], 3) would yield (2, 5).

If you like, equalrange(seq, x) returns (l, h) such that all the elements of seq[l:h] are equal to x. If l == h, the subsequence is the empty sequence between the two adjacent elements with values that bracket x. These definitions turn out to be useful in practice, and are also easy to implement efficiently using only < comparisons.

I think Python got the most valuable of these, and they're useful in Python too. Nevertheless, if you're coding an explicit conventional binary search tree (nodes containing a value, a reference to "a left" node, and a reference to "a right" node), cmp() is more convenient; and even more so if you're coding a ternary search tree.

Sometimes cmp allows for more compact code. Python's previous samplesort implementation endured a little clumsiness to infer equality (a == b) from not (ab). The current adaptive mergesort feels the restriction to < more acutely and in more places. For example, when merging two runs A and B, part of the adaptive strategy is to precompute, via a form of binary search, where A[0] belongs in B, and where B[-1] belongs in A. This sounds like two instances of the same task, but they're maddeningly different because-- in order to preserve stability --the first search needs to be of the bisect_left flavor and the second of bisect_right. Combining both modes of operation in a single search routine with a flag argument, and sticking purely to lt, leads to horridly obscure code, so these searches are actually implemented by distinct functions. If it were able to use cmp() instead, folding them into one routine would have been unobjectionable (if < is needed, check for cmp < 0; if <= is needed, check for cmp <= 0 same-as cmp < 1; so 0 or 1 could be passed in to select between < and <= very efficiently and reasonably clearly).

Previous message: [Python-Dev] Re: Re: lists v. tuples
Next message: [Python-Dev] Re: Re: lists v. tuples
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]