Add key argument to tuple.index() (original) (raw)

January 6, 2024, 9:03pm 1

It would be useful to search a tuple in a functional-style manner

# tuple with structured data
data = ((1, 2), (2, 3))

# find index of 2 in classic manner:
for idx, item in enumerate(data):
    if item[1] == 2:
        break

# functional-style search:
idx = data.index(2, key=lambda i: i[1])

This manner is same to list.sort for example

data = [(1, 2), (2, 3)]
data.sort(key=lambda g: g[1])

I think that in some cases this will speed up Python scripts and improve code readability

GH Issue

Implementation

pf_moore (Paul Moore) January 6, 2024, 9:58pm 2

Is the proposal to add this feature just to the tuple type, or to lists as well? Or to make it required for all sequence types (by adding it to the Sequence ABC)?

Making it required for all sequences would break a lot of code. And conversely, making it only apply to tuples (or even tuples and lists) would make it rather useless, as a lot of code is written to accept generic sequences, either via duck typing, or via explicit type annotations accepting Sequence values.

I’m -1 on this - it doesn’t add enough benefit to be worthwhile, and if it’s not added to the general sequence protocol, I’d avoid it in favour of more general code even if it was available on built in types.

There is a list issue too.

I would like to implement this only for tuples

I think only a small part of production code uses the Sequence ABC interface. Basically, in the production code I see e.g tuple annotations but not sequence

But in general, we can implement this for all sequences

Stefan2 (Stefan) January 6, 2024, 10:36pm 4

Seems odd to enhance tuple.index. Have you ever used that at all? I don’t think I have. And I have used list.index many times. I think tuples are usually not used for data where you’d want to search for an index.

(But if we do get this, or get this for lists, I might suggest special-casing key=id so we get an ultra-fast search with just pointer comparisons, no object comparisons. That might btw also be good for some heapremove use cases, to optimize the linear search time.)

cameron (Cameron Simpson) January 6, 2024, 11:04pm 5

We can write this:

 idx = [ item[1] for item in data ].index(2)

and I’d think it’d be easy to construct a lazy version with enumerated
and a generator expression.

Cheers,
Cameron Simpson cs@cskk.id.au

jeff5 (Jeff Allen) January 7, 2024, 9:14am 6

And also, the rule of thumb is that lists are for elements of homogeneous type (must be, because you can insert or sort them), while tuples structure mixed data (like a C struct). I really think the outer structure in the motivating example is logically a list, and should have been.

However, my “like” on your post only extends to the first part.

I think it would be a source of confusion for users if the comparison function were magically to switch from __eq__ to is. I’m not even sure a method could tell when this is safe.

If you needed it to be fast for value keys you have built an index, making the outer container a dict. This API is conceived for elegance not speed.

Stefan2 (Stefan) January 7, 2024, 10:42am 7

Why would it be confusing and magically? Remember I’m talking about the users explicitly asking for that by passing key=id. Who would ask for that and then be confused to get what they asked for?

jeff5 (Jeff Allen) January 7, 2024, 2:46pm 8

Maybe I misunderstand the intent. Literally id, as in the built-in function? But used to modify the semantics … to what equivalent lambda expression or for-loop? Not I think the same as it would mean without special treatment.

MegaIng (Cornelius Krupp) January 7, 2024, 2:51pm 9

Assuming the value x can be found in tup, tup.find(x) is the same as next(i for i, val in enumerate(tup) if val == x)

By definition of id, id(a) == id(b) if and only if a is b. So tup.find(x, key=id) would mean the example same thing as next(i for i, val in enumerate(tup) if val is x).

I am still not in favor of adding key to the find function, but this operation could actually be useful.

xitop (Xitop) January 7, 2024, 3:02pm 10

If this functionality will be ever added to Python, I would expect a broadly usable solution, i.e. a complete set of utilities working with any sequence, e.g.: find and rfind; index and rindex with optional start and end arguments:
index(predicate, sequence [,start [,stop]])

jeff5 (Jeff Allen) January 7, 2024, 3:16pm 11

So in English: index in the tuple of the object I am already holding. That might be useful, but I think it is tup.find(id(x), key=id).

Unless, that is, the use of id as the key function changes the handling of the first argument (magically).

MegaIng (Cornelius Krupp) January 7, 2024, 3:20pm 12

Right, yes, this is an amazing counterpoint to adding key to find: I implicitly read it as also applying to the thing to look for, which conceptually matches my understanding of what key does in stuff like sort.

Stefan2 (Stefan) January 7, 2024, 3:56pm 13

Yes. Sorry that was unclear. I meant the optimization just as an implementation detail, not something deviating from the general suggestion of this topic.

Stefan2 (Stefan) January 7, 2024, 4:04pm 14

sort doesn’t have that issue, as you don’t give it an extra value. The OP’s example data.index(2, key=lambda i: i[1]) makes it clear the key function wouldn’t be applied to the search value. It’s like the bisect functions, where bisect(a, x, key=...) doesn’t apply the key function to the x value.

kknechtel (Karl Knechtel) January 7, 2024, 10:15pm 15

Here’s an itertools recipe off the top of my head:

from itertools import groupby

def keyed_index(iter, key, value):
    groups = groupby(map(key, iter), value.__eq__)
    try:
        does_match, elements = next(groups)
    except StopIteration:
        raise ValueError('not found') from None
    if does_match:
        return 0
    # have to do this before the next() call
    result = len(list(elements))
    try:
        next(groups)
    except StopIteration:
        raise ValueError('not found') from None
    return result

Stefan2 (Stefan) January 7, 2024, 10:33pm 16

That looks rather awful :-P. Especially compared to

from operator import indexOf

def keyed_index(iterable, key, value):
    return indexOf(map(key, iterable), value)

Stefan2 (Stefan) January 7, 2024, 10:40pm 17

And if that mixed data contains elements of different types, having a key function seems particularly odd, as that function would have to handle inputs of different types.

Stefan2 (Stefan) January 7, 2024, 10:44pm 18

Fun: for keyed_index(((1.0, 2.0), (2.0, 3.0)), lambda i: i[1], 42) you return index 0 instead of raising a “not found” error.

kknechtel (Karl Knechtel) January 7, 2024, 11:34pm 19

Ah, __eq__ won’t work like that with heterogeneous types, I keep forgetting. I’m also surprised that operator.indexOf apparently works with lazy iterators. Nice find, even better than an enumerate trick :wink: