split
Split strings at delimiters
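Syntax
newStr = split(str)
newStr = split(str,delimiter)
newStr = split(str,delimiter,dim)
[newStr,match] = split(___)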
Description
newStr = split(str) divides str at whitespace characters and returns the result as the output array newStr. The input array str can be a string array, character vector, or cell array of character vectors. If str is a string array, then so is newStr. Otherwise, newStr is a cell array of character vectors. newStr does not include the whitespace characters from str.
If str is a string array or cell array of character vectors, and has multiple elements, then each element must be divisible into the same number of substrings.
- If str is a string scalar or character vector, then newStr is an N-by-1 string array or cell array of character vectors, where N is the number of substrings.
- If str is an M-by-1 string array or cell array, then newStr is an M-by-N array.
- If str is a 1-by-M string array or cell array, then newStr is a 1-by-M-by-N array.
For a string array or cell array of any size, split orients the N substrings along the first trailing dimension with a size of 1.
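The shape rules above can be checked with a short sketch. The inputs here are hypothetical and chosen only so that every element splits into the same number of substrings.

```matlab
% Hypothetical inputs chosen only to illustrate the output shapes
scalarIn = "a b c";                 % string scalar, N = 3 substrings
colIn    = ["a b"; "c d"; "e f"];   % 3-by-1 string array, N = 2
rowIn    = ["a b", "c d", "e f"];   % 1-by-3 string array, N = 2

szScalar = size(split(scalarIn))    % N-by-1
szCol    = size(split(colIn))       % M-by-N
szRow    = size(split(rowIn))       % 1-by-M-by-N
```

In each case the substrings run along the first dimension that follows the non-singleton dimensions of the input.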
If the number of substrings is not the same for every element of str, then call split in a for-loop to divide the elements of str one at a time.
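A minimal sketch of that loop follows. It assumes the results are collected in a cell array (the variable names str and parts are illustrative), because the pieces have different lengths and cannot be stored in one rectangular string array.

```matlab
% Hypothetical input: the elements split into 3 and 2 substrings
str = ["one two three"; "four five"];

% Collect each result in a cell, since the lengths differ
parts = cell(numel(str),1);
for k = 1:numel(str)
    parts{k} = split(str(k));   % column string array for element k
end
```

After the loop, parts{1} holds three substrings and parts{2} holds two.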
newStr = split(str,delimiter) divides each element of str at the delimiters specified by delimiter. The output newStr does not include the delimiters.
newStr = split(str,delimiter,dim) divides each element of str into a vector oriented along the dimension specified by dim.
[newStr,match] = split(___) additionally returns an array, match, that contains all occurrences of delimiters at which the split function splits str. You can use this syntax with any of the input arguments of the previous syntaxes.
Examples
Split Strings at Whitespace and Rejoin Them
Split names in a string array at whitespace characters. Then reorder the strings and join them so that the last names precede the first names.
Create a 3-by-1 string array containing names.
names = ["Mary Butler"; "Santiago Marquez"; "Diana Lee"]
names = 3x1 string
"Mary Butler"
"Santiago Marquez"
"Diana Lee"
Split names at whitespace characters, making it a 3-by-2 string array.
names = split(names)
names = 3x2 string
"Mary" "Butler"
"Santiago" "Marquez"
"Diana" "Lee"
Switch the columns of names so that the last names are in the first column. Add a comma after each last name.
names = [names(:,2) names(:,1)];
names(:,1) = names(:,1) + ','
names = 3x2 string
"Butler," "Mary"
"Marquez," "Santiago"
"Lee," "Diana"
Join the last and first names. The join function places a space character between the strings it joins. After the join, names is a 3-by-1 string array.
names = join(names)
names = 3x1 string
"Butler, Mary"
"Marquez, Santiago"
"Lee, Diana"
Split String at Delimiter and Join with New Delimiter
Create a string that contains the path to a folder.
myPath = "/Users/jdoe/My Documents/Examples"
myPath = "/Users/jdoe/My Documents/Examples"
Split the path at the / character. split returns myFolders as a 5-by-1 string array. The first string is "" because myPath starts with the / character.
myFolders = split(myPath,"/")
myFolders = 5x1 string
""
"Users"
"jdoe"
"My Documents"
"Examples"
Join myFolders into a new path with \ as the delimiter. Add C: as the beginning of the path.
myNewPath = join(myFolders,"\");
myNewPath = 'C:' + myNewPath
myNewPath = "C:\Users\jdoe\My Documents\Examples"
Split String Using Pattern as Delimiter
Since R2020b
Get the numbers from a string by treating text as a delimiter. Use a pattern to match the text. Then add up the numbers.
First, create a string that has numbers in it.
str = "10 apples 3 bananas and 5 oranges"
str = "10 apples 3 bananas and 5 oranges"
Then, create a pattern that matches a space character or letters.
pat = " " | lettersPattern
pat = pattern Matching:
" " | lettersPattern
Split the string using pat as the delimiter. The empty strings represent splits between spaces and sequences of letters that had nothing else between them. For example, in "10 apples", there is a split before the delimiter " ", and then between " " and "apples". Since there is nothing between the delimiters " " and "apples", the split function returns an empty string to indicate there is nothing between them.
N = split(str,pat)
N = 11x1 string
"10"
""
""
"3"
""
""
""
""
"5"
""
""
Discard the empty strings and keep the substrings that represent numbers.
N = N(strlength(N) > 0)
N = 3x1 string
"10"
"3"
"5"
Finally, convert N to a numeric array and sum over it.
N = str2double(N);
sum(N)
ans = 18
For a list of functions that create pattern objects, see pattern.
Split String at Multiple Delimiters
Create a string.
str = "A horse! A horse! My kingdom for a horse!"
str = "A horse! A horse! My kingdom for a horse!"
Split str at exclamation points and at whitespace characters. newStr is a 12-by-1 string array. The last string is an empty string, "", because the last character in str is a delimiter.
newStr = split(str,[" ","!"])
newStr = 12x1 string
"A"
"horse"
""
"A"
"horse"
""
"My"
"kingdom"
"for"
"a"
"horse"
""
Split String Array with Missing Data Between Delimiters
Create a string array in which each element contains comma-delimited data about a patient.
patients = ["LastName,Age,Gender,Height,Weight"; "Adams,47,F,64,123"; "Jones,,,68,175"; "King,,M,66,180"; "Smith,38,F,63,118"]
patients = 5x1 string
"LastName,Age,Gender,Height,Weight"
"Adams,47,F,64,123"
"Jones,,,68,175"
"King,,M,66,180"
"Smith,38,F,63,118"
Split the string array. A pair of commas with nothing between them indicates missing data. When split divides on repeated delimiters, it returns empty strings as corresponding elements of the output array.
patients = split(patients,",")
patients = 5x5 string
"LastName" "Age" "Gender" "Height" "Weight"
"Adams" "47" "F" "64" "123"
"Jones" "" "" "68" "175"
"King" "" "M" "66" "180"
"Smith" "38" "F" "63" "118"
Orient Strings Along Specified Dimension
Create a 3-by-1 string array containing names.
names = ["Mary Butler"; "Santiago Marquez"; "Diana Lee"]
names = 3x1 string
"Mary Butler"
"Santiago Marquez"
"Diana Lee"
Split the array at whitespace characters. By default, split orients the output substrings along the first trailing dimension with a size of 1. Because names is a 3-by-1 string array, split orients the substrings along the second dimension of splitNames, that is, the columns.
splitNames = split(names)
splitNames = 3x2 string
"Mary" "Butler"
"Santiago" "Marquez"
"Diana" "Lee"
To orient the substrings along the rows, or first dimension, specify the dimension after you specify the delimiter. splitNames is now a 2-by-3 string array, with the first names in the first row and the last names in the second row.
splitNames = split(names," ",1)
splitNames = 2x3 string
"Mary" "Santiago" "Diana"
"Butler" "Marquez" "Lee"
Split String and Return Delimiters
Create a string.
str = "bacon, lettuce, and tomato"
str = "bacon, lettuce, and tomato"
Split str on delimiters. Return the results of the split in a string array, and the delimiters in a second string array. When there is no text between consecutive delimiters, split returns an empty string.
[newStr,match] = split(str,["and",","," "])
newStr = 7x1 string
"bacon"
""
"lettuce"
""
""
""
"tomato"
match = 6x1 string
","
" "
","
" "
"and"
" "
Join newStr and match back together with the join function.
originalStr = join(newStr,match)
originalStr = "bacon, lettuce, and tomato"
Input Arguments
str — Input text
string array | character vector | cell array of character vectors
Input text, specified as a string array, character vector, or cell array of character vectors.
delimiter — Delimiting substrings
string array | character vector | cell array of character vectors | pattern array (since R2020b)
Delimiting substrings, specified as one of the following:
- String array
- Character vector
- Cell array of character vectors
- pattern array (since R2020b)
The substrings specified in delimiter do not appear in the output newStr.
Specify multiple delimiters in a string array, cell array of character vectors, or pattern array. The split function splits str on the elements of delimiter. The order in which delimiters appear in delimiter does not matter unless multiple delimiters begin a match at the same character in str. In that case, the split function splits on the first matching delimiter in delimiter.
Example: split(str,{' ',',','--'}) splits str on spaces, commas, and pairs of consecutive dashes.
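As a small illustration of the example above, the sketch below applies that delimiter set to a hypothetical input in which no two delimiters can start a match at the same character, so the order of the delimiters does not affect the result.

```matlab
% Hypothetical input containing each of the three delimiters once
str = "a,b--c d";

% Split on spaces, commas, and pairs of consecutive dashes
newStr = split(str,{' ',',','--'})   % 4-by-1 string array: "a" "b" "c" "d"
```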
dim — Dimension along which to split strings
positive integer
Dimension along which to split strings, specified as a positive integer. If dim is not specified, then the default is the first trailing dimension of str with a size of 1. For example, the default is 2 for an M-by-1 array and 3 for a 1-by-M array.
Output Arguments
newStr — Substrings split out of original array
string array | cell array of character vectors
Substrings split out of original array, returned as a string array or cell array of character vectors. If the input array str is a string array, then so is newStr. Otherwise, newStr is a cell array of character vectors.
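The dependence of the output type on the input type can be seen in a short sketch. The variable names are illustrative only.

```matlab
% Same text, supplied once as a string scalar and once as a character vector
fromString = split("red green blue");   % string array
fromChar   = split('red green blue');   % cell array of character vectors

isStr  = isstring(fromString)   % logical 1
isCell = iscellstr(fromChar)    % logical 1
```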
match — Identified delimiters
string array | cell array of character vectors
Identified delimiters, returned as a string array or cell array of character vectors. If the input array str is a string array, then so is match. Otherwise, match is a cell array of character vectors.
match always contains one fewer element than output newStr contains.
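The one-fewer relationship is what allows join to interleave the two outputs and recover the input. A minimal sketch with a hypothetical string scalar:

```matlab
% Hypothetical input with two different delimiters
str = "a,b;c";
[newStr,match] = split(str,[",",";"]);

nSub   = numel(newStr)          % 3 substrings
nMatch = numel(match)           % 2 delimiters, one fewer than nSub

% Interleave substrings and delimiters to rebuild the input
rebuilt = join(newStr,match)    % "a,b;c"
```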
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
The split function fully supports tall arrays. For more information, see Tall Arrays.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
Distributed Arrays
Partition large arrays across the combined memory of your cluster using Parallel Computing Toolbox™. (since R2024b)
This function fully supports distributed arrays. For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).
Version History
Introduced in R2016b