setdiff - Difference of two sets of data - MATLAB
Difference of two sets of data
Syntax
Description
`C = setdiff(A,B)` returns the data in `A` that is not in `B`, with no repetitions. `C` is in sorted order.
- If `A` and `B` are tables or timetables, then `setdiff` returns the rows from `A` that are not in `B`. For timetables, `setdiff` takes row times into account to determine equality, and sorts the output timetable `C` by row times.
`C = setdiff(A,B,setOrder)` returns `C` in a specific order. `setOrder` can be `'sorted'` or `'stable'`.
`C = setdiff(A,B,___,'rows')` and `C = setdiff(A,B,'rows',___)` treat each row of `A` and each row of `B` as single entities and return the rows from `A` that are not in `B`, with no repetitions. You must specify `A` and `B` and optionally can specify `setOrder`.
The `'rows'` option does not support cell arrays, unless one of the inputs is either a categorical array or a datetime array.
`[C,ia] = setdiff(___)` also returns the index vector `ia` using any of the previous syntaxes.
- Generally, `C = A(ia)`.
- If the `'rows'` option is specified, then `C = A(ia,:)`.
- If `A` and `B` are tables or timetables, then `C = A(ia,:)`.
`[C,ia] = setdiff(A,B,'legacy')` and `[C,ia] = setdiff(A,B,'rows','legacy')` preserve the behavior of the `setdiff` function from R2012b and prior releases.
The `'legacy'` option does not support categorical arrays, datetime arrays, duration arrays, tables, or timetables.
Examples
Define two vectors with values in common.
A = [3 6 2 1 5 1 1]; B = [2 4 6];
Find the values in `A` that are not in `B`.
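A minimal sketch of the call for these vectors. The `ismember`-based expression is not part of the original example; it is an assumed equivalent for numeric vectors without `NaN` values, included only to make the semantics concrete.

```matlab
A = [3 6 2 1 5 1 1];
B = [2 4 6];

% Values of A that are not in B, sorted and without repetitions.
C = setdiff(A,B)                  % expected: 1 3 5

% Assumed equivalent for NaN-free numeric vectors: drop the members of B,
% then let unique sort the remainder and remove duplicates.
C2 = unique(A(~ismember(A,B)));
isequal(C,C2)                     % expected: logical 1
```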
Define two tables with rows in common.
A = table([1:5]',['A';'B';'C';'D';'E'],logical([0;1;0;1;0]))
A=5×3 table
    Var1    Var2    Var3
    ____    ____    _____
    1       A       false
    2       B       true
    3       C       false
    4       D       true
    5       E       false
B = table([1:2:10]',['A';'C';'E';'G';'I'],logical(zeros(5,1)))
B=5×3 table
    Var1    Var2    Var3
    ____    ____    _____
    1       A       false
    3       C       false
    5       E       false
    7       G       false
    9       I       false
Find the rows in `A` that are not in `B`.
C = setdiff(A,B)
C=2×3 table
    Var1    Var2    Var3
    ____    ____    ____
    2       B       true
    4       D       true
Define two vectors with values in common.
A = [3 6 2 1 5 1 1]; B = [2 4 6];
Find the values in `A` that are not in `B` as well as the index vector `ia`, such that `C = A(ia)`.
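A short sketch of this two-output call, assuming the default (non-legacy) behavior in which `ia` is a column vector pointing at the first occurrence of each retained value. The `isequal` check is an illustrative addition.

```matlab
A = [3 6 2 1 5 1 1];
B = [2 4 6];

% C holds the sorted, unique values of A that are not in B.
% ia holds, for each value in C, the index of its first occurrence in A.
[C,ia] = setdiff(A,B)             % expected: C = 1 3 5, ia = [4; 1; 5]

% Indexing A with ia reproduces C.
isequal(C,A(ia))                  % expected: logical 1
```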
Define a table, `A`, of gender, age, and height for five people.
A = table(['M';'M';'F';'M';'F'],[27;52;31;46;35],[74;68;64;61;64],...
    'VariableNames',{'Gender' 'Age' 'Height'},...
    'RowNames',{'Ted' 'Fred' 'Betty' 'Bob' 'Judy'})
A=5×3 table
             Gender    Age    Height
             ______    ___    ______
    Ted      M         27     74
    Fred     M         52     68
    Betty    F         31     64
    Bob      M         46     61
    Judy     F         35     64
Define a table, `B`, with the same variables as `A`.
B = table(['F';'M';'F';'F'],[64;68;62;58],[31;47;35;23],...
    'VariableNames',{'Gender' 'Height' 'Age'},...
    'RowNames',{'Meg' 'Joe' 'Beth' 'Amy'})
B=4×3 table
            Gender    Height    Age
            ______    ______    ___
    Meg     F         64        31
    Joe     M         68        47
    Beth    F         62        35
    Amy     F         58        23
Find the rows in `A` that are not in `B`, as well as the index vector `ia`, such that `C = A(ia,:)`.
[C,ia] = setdiff(A,B)
C=4×3 table
            Gender    Age    Height
            ______    ___    ______
    Judy    F         35     64
    Ted     M         27     74
    Bob     M         46     61
    Fred    M         52     68
The rows of `C` are in sorted order first by `Gender` and next by `Age`.
Define two matrices with rows in common.
A = [7 9 7; 0 0 0; 7 9 7; 5 5 5; 1 4 5]; B = [0 0 0; 5 5 5];
Find the rows from `A` that are not in `B` as well as the index vector `ia`, such that `C = A(ia,:)`.
[C,ia] = setdiff(A,B,'rows')
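The following sketch repeats the call with the results written as comments, assuming the default sorted order. The `isequal` check is an illustrative addition that confirms the relationship `C = A(ia,:)`.

```matlab
A = [7 9 7; 0 0 0; 7 9 7; 5 5 5; 1 4 5];
B = [0 0 0; 5 5 5];

% Treat each row as a single entity. The duplicate row [7 9 7] appears once
% in C, and ia refers to its first occurrence in A.
[C,ia] = setdiff(A,B,'rows');
% expected: C = [1 4 5; 7 9 7], ia = [5; 1]

isequal(C,A(ia,:))                % expected: logical 1
```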
Use the `setOrder` argument to specify the ordering of the values in `C`.
Specify `'stable'` or `'sorted'` when the order of the values in `C` is important.
A = [3 6 2 1 5 1 1]; B = [2 4 6]; [C,ia] = setdiff(A,B,'stable')
Alternatively, you can specify `'sorted'` order.
[C,ia] = setdiff(A,B,'sorted')
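A side-by-side sketch of the two orderings for the same inputs. The variable names `Cstable`, `iaStable`, `Csorted`, and `iaSorted` are chosen here for illustration only.

```matlab
A = [3 6 2 1 5 1 1];
B = [2 4 6];

% 'stable' keeps the order in which the values first appear in A.
[Cstable,iaStable] = setdiff(A,B,'stable');   % expected: 3 1 5 and [1; 4; 5]

% 'sorted' (the default) returns the values in ascending order.
[Csorted,iaSorted] = setdiff(A,B,'sorted');   % expected: 1 3 5 and [4; 1; 5]

% Both outputs contain the same values; only the order differs.
isequal(sort(Cstable),Csorted)                % expected: logical 1
```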
Define two vectors containing `NaN`.
A = [5 NaN NaN]; B = [5 NaN];
Find the set difference of `A` and `B`.
`setdiff` treats `NaN` values as distinct.
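A minimal sketch of this case. Because no `NaN` compares equal to any other `NaN`, both `NaN` values in `A` are expected to survive even though `B` also contains a `NaN`.

```matlab
A = [5 NaN NaN];
B = [5 NaN];

% 5 is removed because it is in B. Neither NaN in A matches the NaN in B,
% and the two NaN values are not merged with each other.
C = setdiff(A,B)                  % expected: NaN NaN

numel(C)                          % expected: 2
all(isnan(C))                     % expected: logical 1
```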
Create a cell array of character vectors, `A`.
A = {'dog','cat','fish','horse'};
Create a cell array of character vectors, `B`, where some of the vectors have trailing white space.
B = {'dog ','cat','fish ','horse'};
Find the character vectors in `A` that are not in `B`.
C = setdiff(A,B)
C = 1×2 cell
    {'dog'}    {'fish'}
`setdiff` treats trailing white space in cell arrays of character vectors as distinct characters.
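If trailing white space should not count as a difference, one possible approach is to normalize the inputs before the call. The sketch below uses `strtrim` for that purpose; this preprocessing step is an assumption about the desired behavior and is not part of the original example.

```matlab
A = {'dog','cat','fish','horse'};
B = {'dog ','cat','fish ','horse'};

% Without normalization, 'dog' and 'dog ' are different character vectors.
Craw = setdiff(A,B);              % expected: {'dog'} {'fish'}

% strtrim removes leading and trailing white space from each element,
% so every element of A now has a match in the trimmed B.
Ctrim = setdiff(A,strtrim(B));
isempty(Ctrim)                    % expected: logical 1
```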
Create a character array, `A`.
A = ['cat';'dog';'fox';'pig']; class(A)
Create a cell array of character vectors, `B`.
B={'dog','cat','fish','horse'}; class(B)
Find the character vectors in `A` that are not in `B`.
C = setdiff(A,B)
C = 2×1 cell
    {'fox'}
    {'pig'}
The result, `C`, is a cell array of character vectors.
Use the `'legacy'` flag to preserve the behavior of `setdiff` from R2012b and prior releases in your code.
Find the difference of `A` and `B` with the current behavior.
A = [3 6 2 1 5 1 1]; B = [2 4 6]; [C1,ia1] = setdiff(A,B)
Find the difference of `A` and `B`, and preserve the legacy behavior.
[C2,ia2] = setdiff(A,B,'legacy')
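The sketch below places the two calls next to each other. The expected values in the comments assume that the legacy behavior returns `ia` as a row vector holding the index of the last occurrence of each value, whereas the current behavior returns a column vector holding the first occurrence.

```matlab
A = [3 6 2 1 5 1 1];
B = [2 4 6];

% Current behavior: ia1 is a column vector of first occurrences.
[C1,ia1] = setdiff(A,B);            % expected: C1 = 1 3 5, ia1 = [4; 1; 5]

% Legacy behavior: ia2 is a row vector of last occurrences.
[C2,ia2] = setdiff(A,B,'legacy');   % expected: C2 = 1 3 5, ia2 = [7 1 5]

% The values agree; only the index vectors differ.
isequal(C1,C2)                      % expected: logical 1
```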
Input Arguments
`setOrder`: Order flag, specified as `'sorted'` or `'stable'`, that indicates the order of the values (or rows) in `C`.
Flag | Description |
---|---|
'sorted' | The values (or rows) in `C` return in sorted order, as returned by `sort`. Example: `C = setdiff([4 1 3 2 5],[2 1],'sorted')` returns `C = 3 4 5`. |
'stable' | The values (or rows) in `C` return in the same order as in `A`. Example: `C = setdiff([4 1 3 2 5],[2 1],'stable')` returns `C = 4 3 5`. |
Data Types: `char` | `string`
Output Arguments
`C`: Difference of `A` and `B`, returned as a vector, matrix, table, or timetable. If the inputs `A` and `B` are tables or timetables, then the order of the variables in `C` is the same as the order of the variables in `A`.
The following describes the shape of `C` when the inputs are vectors or matrices and when the `'legacy'` flag is not specified:
- If the `'rows'` flag is not specified and `A` is a row vector, then `C` is a row vector.
- If the `'rows'` flag is not specified and `A` is not a row vector, then `C` is a column vector.
- If the `'rows'` flag is specified, then `C` is a matrix containing the rows of `A` that are not in `B`.
- If all the values (or rows) of `A` are also in `B`, then `C` is an empty matrix.
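The shape rules above can be checked with a few small inputs. This is an illustrative sketch; the variable names are hypothetical.

```matlab
% A is a row vector and 'rows' is not specified: C is a row vector.
Crow = setdiff([1 2 3],2);                 % expected: [1 3]

% A is a column vector and 'rows' is not specified: C is a column vector.
Ccol = setdiff([1;2;3],2);                 % expected: [1; 3]

% A is a matrix and 'rows' is not specified: C is still a column vector.
Cmat = setdiff([1 2; 3 4],2);              % expected: [1; 3; 4]

% 'rows' is specified: C is a matrix of the rows of A that are not in B.
Crows = setdiff([1 2; 3 4],[1 2],'rows');  % expected: [3 4]

% Every value of A is also in B: C is empty.
Cnone = setdiff([1 2],[1 2]);
isempty(Cnone)                             % expected: logical 1
```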
The class of `C` is the same as the class of `A`, unless:
- `A` is a character array and `B` is a cell array of character vectors, in which case `C` is a cell array of character vectors.
- `A` is a character vector, cell array of character vectors, or string, and `B` is a categorical array, in which case `C` is a categorical array.
- `A` is a cell array of character vectors or single character vector and `B` is a datetime array, in which case `C` is a datetime array.
- `A` is a character vector or cell array of character vectors and `B` is a string array, in which case `C` is a string array.
`ia`: Index to `A`, returned as a column vector when the `'legacy'` flag is not specified. `ia` identifies the values (or rows) in `A` that are not in `B`. If there is a repeated value (or row) appearing exclusively in `A`, then `ia` contains the index to the first occurrence of the value (or row).
Tips
- To find the set difference with respect to a subset of variables from a table or timetable, you can use column subscripting. For example, you can use `setdiff(A(:,vars),B(:,vars))`, where `vars` is a positive integer, a vector of positive integers, a variable name, a cell array of variable names, or a logical vector. Alternatively, you can use `vartype` to create a subscript that selects variables of a specified type.
Extended Capabilities
The `setdiff` function supports tall arrays with the following usage notes and limitations:
- The `'stable'` and `'legacy'` options are not supported.
- Inputs of type `'char'` are not supported.
- Ordinal categorical arrays are not supported.
For more information, see Tall Arrays.
Usage notes and limitations for code generation:
- Code generation does not support cell arrays for the first or second arguments.
- When you do not specify the `'rows'` option:
  - Inputs `A` and `B` must be vectors. If you specify the `'legacy'` option, then inputs `A` and `B` must be row vectors.
  - The first dimension of a variable-size row vector must have fixed length 1. The second dimension of a variable-size column vector must have fixed length 1.
  - Do not use `[]` to represent the empty set. Use a 1-by-0 or 0-by-1 input, for example, `zeros(1,0)`, to represent the empty set.
  - If you specify the `'legacy'` option, then empty outputs are row vectors, 1-by-0. They are never 0-by-0.
- When you specify both the `'legacy'` and `'rows'` options, the output `ia` is a column vector. If `ia` is empty, then it is 0-by-1. It is never 0-by-0, even if the output `C` is 0-by-0.
- When the `setOrder` is not `'stable'` or when you specify the `'legacy'` option, the inputs must already be sorted in ascending order. The first output, `C`, is sorted in ascending order.
- Complex inputs must be `single` or `double`.
- When one input is complex and the other input is real, do one of the following:
  - Set `setOrder` to `'stable'`.
  - Sort the real input in complex ascending order (by absolute value). Suppose the real input is `x`. Use `sort(complex(x))` or `sortrows(complex(x))`.
- See Code Generation for Complex Data with Zero-Valued Imaginary Parts (MATLAB Coder).
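Two of the input-preparation steps from the list above can be sketched in ordinary MATLAB. This sketch runs interpreted and has not been verified through code generation; the variable names `emptySet`, `x`, and `xs` are illustrative.

```matlab
% Represent the empty set with a 1-by-0 array rather than [].
A = [1 2 3];
emptySet = zeros(1,0);
C = setdiff(A,emptySet);          % expected: 1 2 3

% Sort a real input in complex ascending order (by absolute value)
% before pairing it with a complex input.
x = [3 -1 2];
xs = sort(complex(x));            % ordered by absolute value: -1, 2, 3
real(xs)                          % expected: -1 2 3
```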
The `setdiff` function supports GPU array input with these usage notes and limitations:
- The `'legacy'` flag is not supported.
- 64-bit integers are not supported.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Usage notes and limitations for distributed arrays:
- The `'legacy'` flag is not supported.
- Table, timetable, categorical, datetime, and duration inputs are not supported.
- Inputs of type `char` and `string` are not supported when `A` or `B` is a cell array of character vectors. Convert input arguments that are cell arrays of character vectors to string arrays instead.
For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).
Version History
Introduced before R2006a