Issue 33214: join method for list and tuple


Created on 2018-04-03 14:33 by Javier Dehesa, last changed 2022-04-11 14:58 by admin.

Messages (9)
msg314881 - Author: Javier Dehesa (Javier Dehesa) Date: 2018-04-03 14:33
It is pretty trivial to concatenate a sequence of strings:

    ''.join([str1, str2, ...])

Concatenating a sequence of lists is for some reason significantly more convoluted. Some current options include:

    sum([lst1, lst2, ...], [])
    [x for y in [lst1, lst2, ...] for x in y]
    list(itertools.chain(lst1, lst2, ...))

The first one is the least recommendable but the most intuitive, and the third one is the fastest but the most cumbersome (see https://stackoverflow.com/questions/49631326/why-is-itertools-chain-faster-than-a-flattening-list-comprehension ). None of these looks like "the one obvious way to do it" to me. Furthermore, I feel a dedicated concatenation method could be more efficient than any of these approaches.

If we accept that ''.join(...) is an intuitive idiom, why not provide the syntax:

    [].join([lst1, lst2, ...])

And while we are at it:

    ().join([tpl1, tpl2, ...])

Like with str, these methods should only accept sequences of objects of their own class (e.g. we could do [].join(list(s) for s in seqs) if seqs contains lists, tuples and generators).

The use case for non-empty joiners would probably be less frequent than for strings, but it also solves a problem that has no clean solution with the current tools. Here is what I would probably do to join a sequence of lists with [None, 'STOP', None]:

    lsts = [lst1, lst2, ...]
    joiner = [None, 'STOP', None]
    lsts_joined = list(itertools.chain.from_iterable(lst + joiner for lst in lsts))[:-len(joiner)]

Which is awful and inefficient (I am not saying this is the best or only possible way to solve it; it is just what I, a self-described experienced Python developer, might write).
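For illustration, the behaviour proposed above can be approximated today with a small helper. This is only a sketch: `join_lists` is a hypothetical name, not an existing or planned API, and it assumes the joiner and every item are iterables whose elements should be copied into a new list.

```python
def join_lists(joiner, lists):
    """Hypothetical helper: concatenate `lists`, inserting `joiner` between items."""
    result = []
    for i, lst in enumerate(lists):
        if i:  # no joiner before the first item
            result.extend(joiner)
        result.extend(lst)
    return result


lsts = [[1, 2], [3], [4, 5]]
joined = join_lists([None, 'STOP', None], lsts)
flat = join_lists([], lsts)  # empty joiner: plain concatenation
```

Unlike the slicing one-liner in the message, this version also handles an empty joiner: with `joiner = []`, the slice `[:-len(joiner)]` becomes `[:-0]`, which returns an empty list.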
msg314882 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-04-03 14:40
join() is a bad choice, because new developers will confuse list.join with str.join. We could turn list.extend(iterable) into list.extend(*iterable). Or you could just use extend with a chain iterator:

    >>> import itertools
    >>> l = []
    >>> l.extend(itertools.chain([1], [2], [3]))
    >>> l
    [1, 2, 3]
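The varargs form of extend suggested here does not exist in current Python; list.extend accepts exactly one iterable. A rough stand-in, using the hypothetical name `extend_all`, might look like this:

```python
from itertools import chain


def extend_all(target, *iterables):
    """Hypothetical stand-in for the suggested list.extend(*iterables)."""
    # chain.from_iterable walks each iterable in turn without building
    # intermediate lists.
    target.extend(chain.from_iterable(iterables))
    return target


l = []
extend_all(l, [1], [2], [3])

# A variable number of inputs can be passed with argument unpacking.
parts = [[4, 5], (6,), range(7, 9)]
extend_all(l, *parts)
```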
msg314883 - Author: Javier Dehesa (Javier Dehesa) Date: 2018-04-03 15:06
Thanks Christian. I thought of join precisely because it performs conceptually the same function as with str, so the parallel between ''.join(), [].join() and ().join() looked more obvious. Also there is os.path.join and PurePath.joinpath, so the verb seemed well-established. As for shared method names, index and count are present both in sequences and str, although it is true that these do return the same kind of object in all cases.

I'm not saying your points aren't valid, though. Your proposed way with extend is, I guess, about the same as list(itertools.chain(...)), which could be considered to be enough. I just feel that it is not particularly convenient, especially for newer developers, who will probably gravitate towards sum(...) more than itertools or a nested generator expression, but I may be wrong.
msg314885 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-03 15:23
String concatenation: f'{a}{b}{c}'
List concatenation: [*a, *b, *c]
Tuple concatenation: (*a, *b, *c)
Set union: {*a, *b, *c}
Dict merging: {**a, **b, **c}
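As a quick check of the unpacking forms listed above, here is a short sketch with made-up sample values. It relies only on the unpacking generalizations available since Python 3.5; the variable names are illustrative.

```python
a = [1, 2]
b = (3, 4)
c = range(5, 7)  # any iterable can be unpacked, not only lists

as_list = [*a, *b, *c]
as_tuple = (*a, *b, *c)
as_set = {*a, *b, *c}

d1 = {"x": 1, "y": 2}
d2 = {"y": 20, "z": 30}
merged = {**d1, **d2}  # on duplicate keys, the later mapping wins
```

The result type is decided by the enclosing brackets, not by the types of the operands.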
msg352387 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-09-13 18:35
Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it's more naturally suited to a variable number of things, and the unpacking generalizations haven't reached the point where:

    [*seq for seq in allsequences]

is allowed.

    list(itertools.chain.from_iterable(allsequences))

handles that just fine, but I could definitely see it being convenient to be able to do:

    [].join(allsequences)

That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.:

    # For not-really-csv-but-people-do-it-anyway
    ','.join(row_strings)
    # Separate words with spaces
    ' '.join(words)
    # Separate lines with newlines
    '\n'.join(lines)

I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.:

    list.concat(allsequences)

which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own).

Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable).

I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but don't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution.
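The alternate-constructor idea can be tried out on subclasses, since the built-in list and tuple types cannot be given new methods from Python code. The sketch below is illustrative only: `ConcatList`, `ConcatTuple`, and `concat` are hypothetical names, and the built-in types have no such method.

```python
from itertools import chain


class ConcatList(list):
    @classmethod
    def concat(cls, iterables):
        """Hypothetical alternate constructor: flatten one level into a new list."""
        return cls(chain.from_iterable(iterables))


class ConcatTuple(tuple):
    @classmethod
    def concat(cls, iterables):
        """Same idea for tuples; no throwaway empty instance is needed."""
        return cls(chain.from_iterable(iterables))


allsequences = [[1, 2], (3,), range(4, 6)]
as_list = ConcatList.concat(allsequences)
as_tuple = ConcatTuple.concat(allsequences)
```

Because `concat` is a classmethod, it is called on the type itself, which is the point made above about avoiding an otherwise unused empty list.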
msg352530 - Author: Александр Семенов (iamsav) Date: 2019-09-16 09:33
In JavaScript, join() works the other way around:

    ['1','2','3'].join(', ')

so [].join() may confuse some people.
msg352531 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-09-16 09:46
> in javascript join() is made the other way around
> ['1','2','3'].join(', ')
> so, [].join() may confuse some peoples.

It would be too confusing to have two different approaches to join strings in Python. Besides, ECMAScript 1 came out in 1997, 5 years after Python was first released. By that argument, it is JavaScript that should change.
msg352532 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-16 09:53
How common is the case of a variable number of things to concatenate/union/merge? From my experience, in most cases this looks like:

    result = []
    for ...:
        # many complex statements
        # may include continue and break
        result.extend(items)
        # may be intermixed with result.append(item)

So concatenating purely lists from some sequence is a very special case. And there are several ways to perform it.

    result = []
    for items in seq:
        result.extend(items)
    # nothing wrong with this simple code, really

    result = [x for items in seq for x in items]
    # may be less efficient for really long sublists,
    # but looks simple

    result = list(itertools.chain.from_iterable(seq))
    # if you are an itertools addict ;-)
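A small self-contained check that the three approaches listed above produce the same result. The sample data is made up, and the variable names are only for illustration.

```python
import itertools

seq = [[1, 2], [], [3], [4, 5, 6]]

# 1. Explicit loop with extend
by_loop = []
for items in seq:
    by_loop.extend(items)

# 2. Nested list comprehension
by_comprehension = [x for items in seq for x in items]

# 3. itertools.chain.from_iterable
by_chain = list(itertools.chain.from_iterable(seq))
```

All three flatten exactly one level of nesting, and empty sublists contribute nothing.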
msg352534 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-16 10:04
It is history, but in 1997 Python had the same order of arguments as ECMAScript: string.join(words [, sep]). str.join() was added only in 1999 (226ae6ca122f814dabdc40178c7b9656caf729c2).
History
Date                 User              Action  Args
2022-04-11 14:58:59  admin             set     github: 77395
2019-09-16 10:04:49  serhiy.storchaka  set     messages: +
2019-09-16 09:53:43  serhiy.storchaka  set     messages: +
2019-09-16 09:46:11  christian.heimes  set     messages: +
2019-09-16 09:33:29  iamsav            set     nosy: + iamsav; messages: +
2019-09-13 18:35:07  josh.r            set     nosy: + josh.r; messages: +
2018-04-06 16:29:01  eric.araujo       set     nosy: + eric.araujo
2018-04-03 15:23:56  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: +
2018-04-03 15:06:33  Javier Dehesa     set     messages: +
2018-04-03 14:40:42  christian.heimes  set     nosy: + christian.heimes; messages: +; versions: + Python 3.8
2018-04-03 14:33:53  Javier Dehesa     create