extractBetween - Extract substrings between start and end points - MATLAB
Extract substrings between start and end points
Syntax
Description
newStr = extractBetween(str,startPat,endPat)
extracts the substring from str that occurs between the substrings startPat and endPat. The extracted substring does not include startPat and endPat.
newStr is a string array if str is a string array. Otherwise, newStr is a cell array of character vectors.
If str is a string array or a cell array of character vectors, then extractBetween extracts substrings from each element of str.
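The type rule above can be checked with a minimal sketch. The sample text and the variable names (s, c, outStr, outCell) are illustrative assumptions, not part of the original page.

```matlab
% String input: the result is a string array.
s = "alpha[beta]gamma";
outStr = extractBetween(s,"[","]");

% Character vector input: the result is a 1-by-1 cell array
% that holds a character vector.
c = 'alpha[beta]gamma';
outCell = extractBetween(c,'[',']');

disp(class(outStr))    % string
disp(class(outCell))   % cell

% Unwrap the cell to get the character vector itself.
word = outCell{1};
```

Indexing with braces, as in outCell{1}, is the usual way to recover a plain character vector from the cell array result.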
newStr = extractBetween(str,startPos,endPos)
extracts the substring from str that occurs between the positions startPos and endPos, including the characters at those positions. extractBetween returns the substring as newStr.
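As a small illustration of the position-based syntax, the sketch below splits a date string by character position. The date value and the variable names are assumptions chosen for the example.

```matlab
str = "2024-03-15";                      % hypothetical sample text

% Positions are 1-based, and both end points are included by default.
yearPart  = extractBetween(str,1,4);     % characters 1 through 4
monthPart = extractBetween(str,6,7);     % characters 6 through 7
dayPart   = extractBetween(str,9,10);    % characters 9 through 10
```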
newStr = extractBetween(___,'Boundaries',bounds)
forces the starts and ends specified in any of the previous syntaxes to be either inclusive or exclusive. They are inclusive when bounds is 'inclusive', and exclusive when bounds is 'exclusive'. For example, extractBetween(str,startPat,endPat,'Boundaries','inclusive') returns startPat, endPat, and all the text between them as newStr.
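The effect of the 'Boundaries' option on text delimiters can be sketched as follows. The sample string and the names inner and outer are illustrative assumptions.

```matlab
str = "key=<value>;";                    % hypothetical sample text

% With text delimiters, the delimiters are excluded by default.
inner = extractBetween(str,"<",">");

% Forcing inclusive boundaries keeps the delimiters in the result.
outer = extractBetween(str,"<",">",'Boundaries','inclusive');
```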
Examples
Create string arrays and select text that occurs between substrings.
str = "The quick brown fox"
str = "The quick brown fox"
Select the text that occurs between the substrings "quick " and " fox". The extractBetween function selects the text but does not include "quick " or " fox" in the output.
newStr = extractBetween(str,"quick "," fox")
newStr = "brown"
Select substrings from each element of a string array. When you specify different substrings as start and end indicators, they must be contained in a string array or a cell array that is the same size as str.
str = ["The quick brown fox jumps";"over the lazy dog"]
str = 2×1 string "The quick brown fox jumps" "over the lazy dog"
newStr = extractBetween(str,["quick ";"the "],[" fox";" dog"])
newStr = 2×1 string "brown" "lazy"
Since R2020b
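Create a string array of text enclosed by tags.
str = ["<courseName>Calculus I</courseName>"; "<semester>Fall 2020</semester>"; "<schedule>MWF 8:00-8:50</schedule>"]
str = 3×1 string "<courseName>Calculus I</courseName>" "<semester>Fall 2020</semester>" "<schedule>MWF 8:00-8:50</schedule>"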
Extract the text enclosed by tags. First create patterns that match any start tag and end tag by using the wildcardPattern function.
startPat = "<" + wildcardPattern + ">"
startPat = pattern Matching:
"<" + wildcardPattern + ">"
endPat = "</" + wildcardPattern + ">"
endPat = pattern Matching:
"</" + wildcardPattern + ">"
Then call the extractBetween function.
newStr = extractBetween(str,startPat,endPat)
newStr = 3×1 string "Calculus I" "Fall 2020" "MWF 8:00-8:50"
For a list of functions that create pattern objects, see pattern.
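Patterns can also be combined with literal text as a start marker. The following is a minimal sketch, assuming R2020b or later and using digitsPattern, one of the pattern-creating functions; the sample text and variable names are hypothetical.

```matlab
% Requires R2020b or later for pattern support.
str = "item 123: widget;";           % hypothetical sample text

% Start marker: one or more digits followed by a colon and a space.
startPat = digitsPattern + ": ";

% End marker: a literal semicolon.
label = extractBetween(str,startPat,";");
```

Building the start marker from a pattern avoids hard-coding the number, so the same call works for any numeric identifier in that position.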
Create string arrays and select substrings between start and end positions that are specified as numbers.
str = "Edgar Allen Poe"
str = "Edgar Allen Poe"
Select the middle name. Specify the seventh and 11th positions in the string.
newStr = extractBetween(str,7,11)
newStr = "Allen"
Select substrings from each element of a string array. When you specify different start and end positions with numeric arrays, they must be the same size as the input string array.
str = ["Edgar Allen Poe";"Louisa May Alcott"]
str = 2×1 string "Edgar Allen Poe" "Louisa May Alcott"
newStr = extractBetween(str,[7;8],[11;10])
newStr = 2×1 string "Allen" "May"
Select text from string arrays with boundaries that are forced to be inclusive or exclusive. extractBetween includes the boundaries with the selected text when the boundaries are inclusive. extractBetween does not include the boundaries with the selected text when the boundaries are exclusive.
str1 = "small|medium|large"
str1 = "small|medium|large"
Select the text between the sixth and 13th positions, but do not include the characters at those positions.
newStr = extractBetween(str1,6,13,'Boundaries','exclusive')
newStr = "medium"
Select the text between two substrings, and also the substrings themselves.
str2 = "The quick brown fox jumps over the lazy dog"
str2 = "The quick brown fox jumps over the lazy dog"
newStr = extractBetween(str2," brown","jumps",'Boundaries','inclusive')
newStr = " brown fox jumps"
Create a character vector and select text between start and end positions.
chr = 'mushrooms, peppers, and onions'
chr = 'mushrooms, peppers, and onions'
newChr = extractBetween(chr,12,18)
newChr = 1×1 cell array {'peppers'}
Select text between substrings.
newChr = extractBetween(chr,'mushrooms, ',', and')
newChr = 1×1 cell array {'peppers'}
Input Arguments
str - Input text, specified as a string array, character vector, or cell array of character vectors.
startPat - Text or pattern that marks the start position of the text to extract, specified as one of the following:
- String array
- Character vector
- Cell array of character vectors
- pattern array (since R2020b)
If str is a string array or cell array of character vectors, then you can extract substrings from every element of str. You can specify that the substrings either all have the same start or have different starts in each element of str.
- To specify the same start, specify startPat as a character vector, string scalar, or pattern object.
- To specify different starts, specify startPat as a string array, cell array of character vectors, or pattern array.
Example: extractBetween(str,"AB","YZ") extracts the substrings between AB and YZ in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,["AB";"FG"],["YZ";"ST"]) extracts the substrings between AB and YZ in str(1), and between FG and ST in str(2).
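The two ways of specifying delimiters described above can be sketched side by side. The sample strings and variable names are illustrative assumptions.

```matlab
% Same start and end for every element: pass scalars.
str1 = ["name: Ada;"; "name: Alan;"];            % hypothetical sample text
sameMarkers = extractBetween(str1,"name: ",";");

% Different start and end for each element: pass arrays
% that are the same size as the input.
str2 = ["[x] done"; "(y) open"];                 % hypothetical sample text
perElement = extractBetween(str2,["[";"("],["]";")"]);
```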
endPat - Text or pattern that marks the end position of the text to extract, specified as one of the following:
- String array
- Character vector
- Cell array of character vectors
- pattern array (since R2020b)
If str is a string array or cell array of character vectors, then you can extract substrings from every element of str. You can specify that the substrings either all have the same end or have different ends in each element of str.
- To specify the same end, specify endPat as a character vector, string scalar, or pattern object.
- To specify different ends, specify endPat as a string array, cell array of character vectors, or pattern array.
Example: extractBetween(str,"AB","YZ") extracts the substrings between AB and YZ in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,["AB";"FG"],["YZ";"ST"]) extracts the substrings between AB and YZ in str(1), and between FG and ST in str(2).
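The same delimiters work with a cell array of character vectors, in which case the result is also a cell array. A minimal sketch, with hypothetical file names:

```matlab
files = {'report_2021.pdf'; 'notes_2022.txt'};   % hypothetical file names

% Character vector delimiters apply to every element of the cell array.
years = extractBetween(files,'_','.');

disp(class(years))   % cell
```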
startPos - Start position, specified as a numeric array.
If str is an array with multiple pieces of text, then startPos can be a numeric scalar or numeric array of the same size as str.
Example: extractBetween(str,5,9) extracts the substrings from the fifth through the ninth positions in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,[5;10],[9;21]) extracts the substring from the fifth through the ninth positions in str(1), and from the 10th through the 21st positions in str(2).
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
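Positions are often computed rather than typed in. The sketch below pairs a scalar start position with per-element end positions derived from strlength, to drop the last character of each element. The sample strings and variable names are assumptions.

```matlab
str = ["alpha;"; "be;"];                  % hypothetical sample text

% One end position per element: the length minus the trailing character.
endPos = strlength(str) - 1;              % [5; 2]

% The scalar start position 1 applies to every element.
trimmed = extractBetween(str,1,endPos);
```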
endPos - End position, specified as a numeric array.
If str is an array with multiple pieces of text, then endPos can be a numeric scalar or numeric array of the same size as str.
Example: extractBetween(str,5,9) extracts the substrings from the fifth through the ninth positions in each element of str.
Example: If str is a 2-by-1 string array, then extractBetween(str,[5;10],[9;21]) extracts the substrings from the fifth through the ninth positions in str(1), and from the 10th through the 21st positions in str(2).
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
bounds - Boundary behavior, specified as 'inclusive' or 'exclusive'. When the boundary behavior is inclusive, the start and end specified by the previous arguments are included in the extracted text. When the boundary behavior is exclusive, the start and end are not included.
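For numeric positions the default is inclusive, so the option matters mainly when the end points should be dropped. A short sketch, reusing the "small|medium|large" text from the examples; the names withBars and noBars are illustrative.

```matlab
str = "small|medium|large";

% Default for positions: the characters at positions 6 and 13 are kept.
withBars = extractBetween(str,6,13);

% Exclusive boundaries drop the characters at positions 6 and 13.
noBars = extractBetween(str,6,13,'Boundaries','exclusive');
```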
Output Arguments
newStr - Output text, returned as a string array or cell array of character vectors.
Extended Capabilities
The extractBetween function supports tall arrays with the following usage notes and limitations:
- Expansion in the first dimension is not supported with tall arrays.
For more information, see Tall Arrays.
Version History
Introduced in R2016b