pandas.Series.str.split — pandas 2.2.3 documentation (original) (raw)

Series.str.split(pat=None, *, n=-1, expand=False, regex=None)[source]#

Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string.

Parameters:

patstr or compiled regex, optional

String or regular expression to split on. If not specified, split on whitespace.

nint, default -1 (all)

Limit number of splits in output.None, 0 and -1 will be interpreted as return all splits.

expandbool, default False

Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.
If False, return Series/Index, containing lists of strings.

regexbool, default None

Determines if the passed-in pattern is a regular expression:

If True, assumes the passed-in pattern is a regular expression
If False, treats the pattern as a literal string.
If None and pat length is 1, treats pat as a literal string.
If None and pat length is not 1, treats pat as a regular expression.
Cannot be set to False if pat is a compiled regex

Added in version 1.4.0.

Returns:

Series, Index, DataFrame or MultiIndex

Type matches caller unless expand=True (see Notes).

Raises:

ValueError

if regex is False and pat is a compiled regex

See also

Series.str.split

Split strings around given separator/delimiter.

Series.str.rsplit

Splits string around given separator/delimiter, starting from the right.

Series.str.join

Join lists contained as elements in the Series/Index with passed delimiter.

str.split

Standard library version for split.

str.rsplit

Standard library version for rsplit.

Notes

The handling of the n keyword depends on the number of found splits:

If found splits > n, make first n splits only
If found splits <= n, make all splits
If for a certain row the number of found splits < n, append None for padding up to n if expand=True

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.

Use of regex =False with a pat as a compiled regex will raise an error.

Examples

s = pd.Series( ... [ ... "this is a regular sentence", ... "https://docs.python.org/3/tutorial/index.html", ... np.nan ... ] ... ) s 0 this is a regular sentence 1 https://docs.python.org/3/tutorial/index.html 2 NaN dtype: object

In the default setting, the string is split by whitespace.

s.str.split() 0 [this, is, a, regular, sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object

Without the n parameter, the outputs of rsplit and splitare identical.

s.str.rsplit() 0 [this, is, a, regular, sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object

The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.

s.str.split(n=2) 0 [this, is, a regular sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object

s.str.rsplit(n=2) 0 [this is a, regular, sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object

The pat parameter can be used to split by other characters.

s.str.split(pat="/") 0 [this is a regular sentence] 1 [https:, , docs.python.org, 3, tutorial, index... 2 NaN dtype: object

When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.

s.str.split(expand=True) 0 1 2 3 4 0 this is a regular sentence 1 https://docs.python.org/3/tutorial/index.html None None None None 2 NaN NaN NaN NaN NaN

For slightly more complex use cases like splitting the html document name from a url, a combination of parameter settings can be used.

s.str.rsplit("/", n=1, expand=True) 0 1 0 this is a regular sentence None 1 https://docs.python.org/3/tutorial index.html 2 NaN NaN

Remember to escape special characters when explicitly using regular expressions.

s = pd.Series(["foo and bar plus baz"]) s.str.split(r"and|plus", expand=True) 0 1 2 0 foo bar baz

Regular expressions can be used to handle urls or file names. When pat is a string and regex=None (the default), the given pat is compiled as a regex only if len(pat) != 1.

s = pd.Series(['foojpgbar.jpg']) s.str.split(r".", expand=True) 0 1 0 foojpgbar jpg

s.str.split(r".jpg", expand=True) 0 1 0 foojpgbar

When regex=True, pat is interpreted as a regex

s.str.split(r".jpg", regex=True, expand=True) 0 1 0 foojpgbar

A compiled regex can be passed as pat

import re s.str.split(re.compile(r".jpg"), expand=True) 0 1 0 foojpgbar

When regex=False, pat is interpreted as the string itself

s.str.split(r".jpg", regex=False, expand=True) 0 0 foojpgbar.jpg