summary - Data summary - MATLAB
Syntax
Description
summary(A) displays a summary that includes the properties of and statistics for the input data.
summary(A,dim) operates along dimension dim. For example, you can summarize each row in a matrix A using summary(A,2).
summary(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input combinations in the previous syntaxes. For example, summary(A,Statistics="std") includes only the standard deviation of the input data A. (since R2024b)
s = summary(___) returns a structure that contains a summary of the input data.
Examples
Create a matrix of type double, and display a summary of the matrix that includes the default statistics for each matrix column.
A = rand(5,3);
summary(A)
A: 5×3 double
NumMissing 0 0 0
Min 0.1270 0.0975 0.1576
Median 0.8147 0.5469 0.8003
Max 0.9134 0.9649 0.9706
Mean 0.6786 0.5691 0.6742
Std 0.3285 0.3921 0.3487
Display a summary that includes statistics for each matrix row.
summary(A,2)
A: 5×3 double
NumMissing Min Median Max Mean Std
0 0.09754 0.15761 0.81472 0.35663 0.39786
0 0.2785 0.90579 0.97059 0.71829 0.38225
0 0.12699 0.54688 0.95717 0.54368 0.4151
0 0.48538 0.91338 0.95751 0.78542 0.26078
0 0.63236 0.80028 0.96489 0.79918 0.16627
Create a categorical vector containing three categories.
A = categorical(["A";"B";"C";"A";"C"])
A = 5×1 categorical
A
B
C
A
C
Display a summary of the vector that includes the number of occurrences of each category.
summary(A)
A: 5×1 categorical
A 2
B 1
C 2
<undefined> 0
Create a matrix of type double, and display a summary of the matrix that includes the variance and sum of each matrix column in addition to the default statistics.
A = rand(5,3);
summary(A,Statistics=["default" "var" "sum"])
A: 5×3 double
NumMissing 0 0 0
Min 0.1270 0.0975 0.1576
Median 0.8147 0.5469 0.8003
Max 0.9134 0.9649 0.9706
Mean 0.6786 0.5691 0.6742
Std 0.3285 0.3921 0.3487
Var 0.1079 0.1537 0.1216
Sum 3.3932 2.8453 3.3710
Create a table with four variables of different data types.
num = rand(6,1);
num2 = single(rand(6,1));
cat = categorical(["a";"a";"b";"a";"b";"c"]);
dt = datetime(2016:2021,1,1)';
T = table(num,num2,cat,dt)
T=6×4 table
num num2 cat dt
_______ _______ ___ ___________
0.81472 0.2785 a 01-Jan-2016
0.90579 0.54688 a 01-Jan-2017
0.12699 0.95751 b 01-Jan-2018
0.91338 0.96489 a 01-Jan-2019
0.63236 0.15761 b 01-Jan-2020
0.09754 0.97059 c 01-Jan-2021
Display a summary of the table.
summary(T)
T: 6×4 table
Variables:
num: double
num2: single
cat: categorical (3 categories)
dt: datetime
Statistics for applicable variables:
NumMissing Min Median Max Mean Std
num 0 0.0975 0.7235 0.9134 0.5818 0.3776
num2 0 0.1576 0.7522 0.9706 0.6460 0.3708
cat 0
dt 0 01-Jan-2016 02-Jul-2018 12:00:00 01-Jan-2021 02-Jul-2018 12:00:00 16401:17:23
Load a table of data from the provided file.
Display a summary of the table with additional table and variable metadata, including custom metadata. Omit statistics from the summary.
summary(T,Detail="high",Statistics="none")
T: 100×4 table
Description: Simulated patient data
Variables:
Status: categorical
Instrument: [1×1 cell]
Age: double (Yrs)
Instrument: height rod
Smoker: logical
Instrument: [1×1 cell]
BloodPressure: 2-column double (mm Hg)
Description: Systolic/Diastolic
Instrument: bloodp pressure cuff
The summary includes the metadata properties that describe the table and its variables. Access the properties.
T.Properties
ans = TableProperties with properties:
Description: 'Simulated patient data'
UserData: []
DimensionNames: {'Row' 'Variables'}
VariableNames: {'Status' 'Age' 'Smoker' 'BloodPressure'}
VariableTypes: ["categorical" "double" "logical" "double"]
VariableDescriptions: {'' '' '' 'Systolic/Diastolic'}
VariableUnits: {'' 'Yrs' '' 'mm Hg'}
VariableContinuity: []
RowNames: {100×1 cell}
Custom Properties (access using t.Properties.CustomProperties.<name>):
Instrument: {'' 'height rod' '' 'bloodp pressure cuff'}
Create a timetable.
MeasurementTime = datetime(["2024-01-01";"2024-02-01";"2024-03-01"]);
Temp = [37;39;42];
TT = timetable(MeasurementTime,Temp)
TT=3×1 timetable
MeasurementTime Temp
_______________ ____
01-Jan-2024 37
01-Feb-2024 39
01-Mar-2024 42
Return a summary of the timetable.
s = summary(TT)
s = struct with fields:
MeasurementTime: [1×1 struct]
Temp: [1×1 struct]
The MeasurementTime field of the structure contains a summary of the row times.
s.MeasurementTime
ans = struct with fields:
Size: [3 1]
Type: 'datetime'
TimeZone: ''
SampleRate: NaN
StartTime: 01-Jan-2024
NumMissing: 0
Min: 01-Jan-2024
Median: 01-Feb-2024
Max: 01-Mar-2024
Mean: 31-Jan-2024 08:00:00
Std: 720:07:59
TimeStep: 1mo
The Temp field of the structure contains a summary of the Temp variable. Access the median.
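The access step can be sketched as follows. This is a minimal sketch that rebuilds the timetable from this example. The names medTemp and spanTemp are illustrative, and the Min and Max fields are assumed to be present because the default statistics are computed.

```matlab
% Rebuild the example timetable.
MeasurementTime = datetime(["2024-01-01";"2024-02-01";"2024-03-01"]);
Temp = [37;39;42];
TT = timetable(MeasurementTime,Temp);

% Return the summary as a structure instead of displaying it.
s = summary(TT);

% Each variable has its own nested structure of properties and statistics.
medTemp = s.Temp.Median

% Assumption: Min and Max exist because the default statistics were computed.
spanTemp = s.Temp.Max - s.Temp.Min
```

Because the statistics are ordinary structure fields, you can use them directly in later computations instead of reading them from the displayed summary.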
Input Arguments
Input data, specified as an array, table, or timetable.
Note
Prior to R2024b, summarizing numeric, logical, datetime, duration, and calendarDuration types was not supported.
Operating dimension for array, specified as a positive integer scalar, a vector of positive integers, or "all". If you do not specify dim, then the default is the first array dimension whose size does not equal 1.
If the input array is categorical, then dim must be a scalar.
Consider an input matrix, A:
- summary(A,1) displays statistics for each column of A.
- summary(A,2) displays statistics for each row of A.
Specifying dim is not supported when the input data is a table or timetable.
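As a rough illustration of the dim argument, the following sketch summarizes a small matrix along different dimensions (requires R2024b or later for numeric arrays). The matrix magic(4) and the name rowMin are illustrative choices, and the Min field name is assumed to match the label shown in the displayed summaries.

```matlab
A = magic(4);          % 4-by-4 matrix with no missing values

summary(A,1)           % statistics for each column
summary(A,2)           % statistics for each row
summary(A,"all")       % statistics over all elements

% Return the row-wise summary as a structure and compare with min.
s = summary(A,2);
rowMin = s.Min         % assumed field name; one value per row
```

For data without missing values, the row-wise minimum in the structure should agree with min(A,[],2).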
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: summary(A,Statistics="allstats")
Since R2024b
Level of detail to display for table or timetable input data, specified as one of these values:
- "low": Provide a concise summary. Display the variable name, type, unit, and description for each table variable.
- "high": Provide a verbose summary. Display all table and variable metadata in addition to details in "low". For categorical variables, "high" also displays the categories and counts.
summary accesses metadata that describes a table and its variables through the Properties property of the table.
The Detail name-value argument does not configure the summary when you return the summary as a scalar structure. The summary structure always includes all table and variable metadata.
Example: summary(A,Detail="high") displays table and variable metadata in addition to the variable names, types, units, and descriptions.
Since R2024b
Statistics to compute, specified as one or more of the following values. For table and timetable data, the specified statistics are computed for all applicable variables, including row times for timetable data.
For the "default" value, the statistics to compute depend on the data type of the input data.
Data Type | Statistics to Compute |
---|---|
double, single, duration, datetime, other types | "nummissing", "min", "median", "max", "mean", "std" |
Integer | "nummissing", "min", "median", "max", "mean" |
logical | "counts" |
Non-ordinal categorical | "counts", "nummissing" |
Ordinal categorical | "counts", "nummissing", "min", "median", "max" |
string, char, cell array of character vectors | "nummissing" |
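To illustrate how the default statistics depend on the data type, this sketch compares the summary structures for a double vector and an integer vector (requires R2024b or later). The variable names are illustrative, and the Std field name is assumed to match the label used in the displayed summaries.

```matlab
% Default statistics for double data include the standard deviation.
sDouble = summary([1.5; 2.5; 4]);

% Default statistics for integer data omit the standard deviation.
sInt = summary(int8([1; 2; 4]));

% Assumption: the standard deviation is stored in a field named Std.
hasStdDouble = isfield(sDouble,"Std")
hasStdInt = isfield(sInt,"Std")
```

To compute the standard deviation for integer data anyway, you can request it explicitly, for example with Statistics=["default" "std"].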
To compute a different set of statistics, you can specify one or more of these values. To specify multiple statistics, list the options in a string array or cell array.
Statistic | Description |
---|---|
"nummissing" | Number of missing elements |
"min" | Minimum |
"median" | Median |
"max" | Maximum |
"q1" | First quartile or 25th percentile |
"q3" | Third quartile or 75th percentile |
"mean" | Mean |
"std" | Standard deviation |
"var" | Variance |
"mode" | Mode |
"range" | Maximum minus minimum |
"sum" | Sum |
"numunique" | Number of distinct nonmissing elements |
"nnz" | Number of nonzero and nonmissing elements |
"counts" | Number of occurrences of each category |
"allstats" | All statistics previously listed |
"none" | No statistics |
You can also specify Statistics as a function handle that must:
- Accept one input data argument.
- Return one output that is scalar or has the same size as the input data in all dimensions except for a size of 1 along the first dimension.
- For table or timetable input data, operate along each variable separately.
When summary computes a statistic:
- If the function encounters an error, the summary does not include that statistic.
- If the function encounters missing values, it omits those values from the computation, with the exception of the "nummissing" statistic. To include missing values, use a function handle, such as @sum instead of "sum".
Example: summary(A,Statistics=["mean" "var" "mode"]) computes the mean, variance, and mode.
Example: summary(A,Statistics={"default",myFun1}) computes the result of myFun1 in addition to the default statistics.
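The difference between a named statistic and a function handle when data contains missing values can be sketched as follows (requires R2024b or later). The input vector and the names sNamed, namedSum, and plainSum are illustrative, and the Sum field name is assumed to match the label shown in the displayed summary.

```matlab
A = [1; 2; NaN; 4];    % one missing value

% Named statistic: missing values are omitted from the computation.
summary(A,Statistics=["nummissing" "sum"])
sNamed = summary(A,Statistics="sum");
namedSum = sNamed.Sum  % assumed field name

% Function handle: the function receives the data including NaN,
% so the result follows the behavior of sum itself.
plainSum = sum(A)
summary(A,Statistics=@sum)
```

If you want a custom function to ignore missing values, handle them inside the function, for example with @(x) sum(x,"omitnan").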
Since R2024b
Table or timetable variables to summarize, specified as one of the values in this table.
Variables in the table or timetable not specified by the DataVariables name-value argument are not included in the summary.
Indexing Scheme | Values to Specify | Examples |
---|---|---|
Variable name | A string scalar or character vector; a string array or cell array of character vectors; a pattern object | "A" or 'A' (a variable named A); ["A" "B"] or {'A','B'} (two variables named A and B); "Var"+digitsPattern(1) (variables named "Var" followed by a single digit) |
Variable index | An index number that refers to the location of a variable in the table; a vector of numbers; a logical vector. Typically, the logical vector is the same length as the number of variables, but you can omit trailing 0 (false) values. | 3 (the third variable from the table); [2 3] (the second and third variables from the table); [false false true] (the third variable) |
Function handle | A function handle that takes a table variable as input and returns a logical scalar | @isnumeric (all the variables containing numeric values) |
Variable type | A vartype subscript that selects variables of a specified type | vartype("numeric") (all the variables containing numeric values) |
Example: summary(A,DataVariables=["Var1" "Var2" "Var4"]) displays a summary of Var1, Var2, and Var4.
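The indexing schemes in the table can be sketched on a small table as follows (requires R2024b or later). The table contents and the variable names num, num2, and grp are illustrative. The final check assumes that the returned structure, like the displayed summary, contains only the selected variables.

```matlab
num = rand(6,1);
num2 = single(rand(6,1));
grp = categorical(["a";"a";"b";"a";"b";"c"]);
T = table(num,num2,grp);

summary(T,DataVariables=["num" "grp"])        % by variable name
summary(T,DataVariables=[1 2])                % by variable index
summary(T,DataVariables=@isnumeric)           % by function handle
summary(T,DataVariables=vartype("numeric"))   % by variable type

% Return a summary of a single variable as a structure.
s = summary(T,DataVariables="num");
selected = fieldnames(s)
```

The function handle and vartype forms are convenient when the variable names are not known in advance.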
Output Arguments
Summary of input data, returned as a scalar structure.
- If the input data is a table or timetable, then each field in s contains a summary of one of the variables. If A is a timetable, s also contains a field with the summary of the row times.
- If the input data is an array, then each field in s contains a property or statistic.
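For array input, the structure layout can be inspected as in this sketch (requires R2024b or later). The matrix and the name colMean are illustrative, and the Mean field name is assumed to match the label shown in the displayed summary.

```matlab
A = [1 2; 3 4];

% For array input, the fields hold properties and statistics directly.
s = summary(A);
fieldnames(s)          % list the available properties and statistics

% Default operating dimension is 1, so there is one mean per column.
colMean = s.Mean       % assumed field name
```

Listing the field names first is a simple way to confirm which statistics are available before you access them.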
Extended Capabilities
The summary function supports tall arrays with the following usage notes and limitations:
- Only tall tables and tall timetables are supported.
- Name-value arguments Detail, Statistics, and DataVariables are not supported.
- Some calculations, such as the median and standard deviation, might be slow to complete with large data sets and are not included in the summary.
For more information, see Tall Arrays.
Usage notes and limitations:
- Only distributed tables are supported.
- Name-value arguments Detail, Statistics, and DataVariables are not supported.
- Some calculations, such as the median and standard deviation, might be slow to complete with large data sets and are not included in the summary.
For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).
Version History
Introduced in R2013b
You can now summarize array data, including numeric, logical, datetime, duration, and calendarDuration types. Previously, the function supported array data only when it was categorical.
You can configure the summary contents using one or more name-value arguments:
- Statistics: Specify which statistics to compute.
- Detail: For table or timetable data only, specify the level of table metadata detail to display in the summary.
- DataVariables: For table or timetable data only, specify the variables to summarize.
When you display a summary of a categorical array, the summary now always includes the number of undefined elements. Previously, the summary omitted the number of undefined elements if the array contained no missing values.
If you want to omit the number of undefined elements from the summary, specify the Statistics name-value argument. For example, summary(A,Statistics="counts") displays only the number of elements in each category.
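The change in behavior can be sketched with the categorical vector from the earlier example (requires R2024b or later). The cross-check with countcats and the name counts are additions for illustration.

```matlab
A = categorical(["A";"B";"C";"A";"C"]);

% Default summary: category counts plus the number of undefined elements.
summary(A)

% Counts only: the number of undefined elements is omitted.
summary(A,Statistics="counts")

% Cross-check the per-category counts with countcats.
counts = countcats(A)
```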