Python Group list by first character of string (original) (raw)

Last Updated : 23 Mar, 2023

Sometimes, we have a use case in which we need to perform the grouping of strings by various factors, like first letter or any other factor. These types of problems are typical to database queries and hence can occur in web development while programming. This article focuses on one such grouping by the first letter of the string. Let’s discuss certain ways in which this can be performed.

Method #1: Using next() + lambda + loop

The combination of the above 3 functions is used to solve this particular problem by the naive method. The lambda function performs the task of finding like initial character, and the next function helps in forwarding iteration.

Python3

test_list = [ 'an' , 'a' , 'geek' , 'for' , 'g' , 'free' ]

print ( "The original list : " + str (test_list))

def util_func(x, y): return x[ 0 ] = = y[ 0 ]

res = []

for sub in test_list:

`` ele = next ((x for x in res if util_func(sub, x[ 0 ])), [])

`` if ele = = []:

`` res.append(ele)

`` ele.append(sub)

print ( "The list after Categorization : " + str (res))

Output :

The original list : ['an', 'a', 'geek', 'for', 'g', 'free'] The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time Complexity: O(n^2), where n is the number of elements in the list.
Auxiliary Space: O(n), where n is the number of elements in the list.

Method #2: Using sorted() + groupby()

This particular task can also be solved using the groupby function which offers a convenient method to solve this problem. The sorted function sorts the elements by initial character to be feed to groupby for the relevant grouping.

Python3

from itertools import groupby

test_list = [ 'an' , 'a' , 'geek' , 'for' , 'g' , 'free' ]

print ( "The original list : " + str (test_list))

def util_func(x): return x[ 0 ]

temp = sorted (test_list, key = util_func)

res = [ list (ele) for i, ele in groupby(temp, util_func)]

print ( "The list after Categorization : " + str (res))

Output :

The original list : ['an', 'a', 'geek', 'for', 'g', 'free'] The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time complexity: O(nlogn), where n is the length of the input list.
Auxiliary space: O(n), where n is the length of the input list.

Method #3: Using for loop

Python3

test_list = [ 'an' , 'a' , 'geek' , 'for' , 'g' , 'free' ]

print ( "The original list : " + str (test_list))

res = []

x = []

for i in test_list:

`` if i[ 0 ] not in x:

`` x.append(i[ 0 ])

for i in x:

`` p = []

`` for j in test_list:

`` if j[ 0 ] = = i:

`` p.append(j)

`` res.append(p)

print ( "The list after Categorization : " + str (res))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free'] The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time complexity: O(n^2), where n is the length of the input list.

Auxiliary space: O(n), where n is the length of the input list.

Approach 4: Using defaultdict

Python3

from collections import defaultdict

test_list = [ 'an' , 'a' , 'geek' , 'for' , 'g' , 'free' ]

print ( "The original list : " + str (test_list))

res = defaultdict( list )

for i in test_list:

`` res[i[ 0 ]].append(i)

print ( "The list after Categorization : " + str ( list (res.values())))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free'] The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]

Time Complexity: O(n) where n is the number of elements in test_list
Auxiliary Space: O(n) as we use a defaultdict to store the result
Explanation:
We use a defaultdict to store the result where the key is the first character of each string in the list and value is the list of all strings with the same first character.
The defaultdict automatically initializes the value as an empty list if the key is not present. So, we don’t have to check if the key is present or not.
We iterate through the test_list and add the elements to the corresponding key in the defaultdict.
Finally, we convert the defaultdict values to a list to get the final result.\

Method #5: Using dictionary comprehension

Use dictionary comprehension to categorize the words based on their first character.

First, we create a set of unique first characters in the list using set comprehension: set([word[0] for word in test_list]).
Next, we create a dictionary comprehension where the keys are the first characters and the values are the list of words starting with that character: {char: [word for word in test_list if word.startswith(char)] for char in set([word[0] for word in test_list])}.
Finally, we print the result.

Python3

test_list = [ 'an' , 'a' , 'geek' , 'for' , 'g' , 'free' ]

print ( "The original list : " + str (test_list))

res = {char: [word for word in test_list if word.startswith(char)] for char in set ([word[ 0 ] for word in test_list])}

print ( "The list after Categorization : " + str (res))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free'] The list after Categorization : {'g': ['geek', 'g'], 'f': ['for', 'free'], 'a': ['an', 'a']}

Time complexity: O(n^2), where n is the length of the input list. This is because we use the startswith() method inside the list comprehension, which has a time complexity of O(n).

Auxiliary space: O(n), where n is the length of the input list. This is because we create a dictionary where each key has a list of words starting with that character.

Method #6: Using itertools.groupby() with sorted()

Use the itertools.groupby() function in combination with sorted() to group the words based on their first character.

Step-by-step approach:

First, sort the input list using sorted().
Then, use itertools.groupby() to group the words based on their first character. groupby() returns an iterator of (key, group) pairs, where key is the first character and group is an iterator of words starting with that character.
Iterate over the (key, group) pairs, convert the group iterator to a list using list(), and append it to the result list res.
Finally, print the result.

Below is the implementation of the above approach:

Python3

import itertools

test_list = [ 'an' , 'a' , 'geek' , 'for' , 'g' , 'free' ]

print ( "The original list : " + str (test_list))

res = []

for k, g in itertools.groupby( sorted (test_list), key = lambda x: x[ 0 ]):

`` res.append( list (g))

print ( "The list after Categorization : " + str (res))

Output

The original list : ['an', 'a', 'geek', 'for', 'g', 'free'] The list after Categorization : [['a', 'an'], ['for', 'free'], ['g', 'geek']]

Time complexity: O(n log n), where n is the length of the input list. This is because we use sorted() which has a time complexity of O(n log n).
Auxiliary space: O(n), where n is the length of the input list. This is because we create a list of lists, where each inner list contains the words starting with the same character.