summary - Data summary - MATLAB

Syntax

summary(A)
summary(A,dim)
summary(___,Name=Value)
s = summary(___)

Description

summary(A) displays a summary that includes the properties of and statistics for the input data.

summary(A,dim) operates along dimension dim. For example, you can summarize each row in a matrix A using summary(A,2).

summary(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input combinations in the previous syntaxes. For example, summary(A,Statistics="std") includes only the standard deviation of the input data A. (since R2024b)

s = summary(___) returns a structure that contains a summary of the input data.

Examples

Create a matrix of type double, and display a summary of the matrix that includes the default statistics for each matrix column.

A = rand(5,3);
summary(A)

A: 5×3 double

NumMissing           0             0             0  
Min             0.1270        0.0975        0.1576  
Median          0.8147        0.5469        0.8003  
Max             0.9134        0.9649        0.9706  
Mean            0.6786        0.5691        0.6742  
Std             0.3285        0.3921        0.3487  

Display a summary that includes statistics for each matrix row.

summary(A,2)

A: 5×3 double

NumMissing      Min      Median       Max       Mean        Std  

    0         0.09754    0.15761    0.81472    0.35663    0.39786
    0          0.2785    0.90579    0.97059    0.71829    0.38225
    0         0.12699    0.54688    0.95717    0.54368     0.4151
    0         0.48538    0.91338    0.95751    0.78542    0.26078
    0         0.63236    0.80028    0.96489    0.79918    0.16627
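The row-wise statistics can be cross-checked against the core reduction functions. The following is a minimal sketch, assuming R2024b or later (needed for summary on numeric arrays). The call to rng and the variable name rowStats are illustrative additions, not part of the original example.

```matlab
rng("default")                 % assumption: fix the seed so the run is repeatable
A = rand(5,3);

summary(A,2)                   % one row of statistics per row of A

% Recompute Min, Median, Max, Mean, and Std along dimension 2 directly.
rowStats = [min(A,[],2) median(A,2) max(A,[],2) mean(A,2) std(A,0,2)]
```

Each row of rowStats should line up with the Min, Median, Max, Mean, and Std columns of the displayed summary.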

Create a categorical vector containing three categories.

A = categorical(["A";"B";"C";"A";"C"])

A = 5×1 categorical
     A 
     B 
     C 
     A 
     C 

Display a summary of the vector that includes the number of occurrences of each category.

summary(A)

A: 5×1 categorical

 A                2 
 B                1 
 C                2 
 <undefined>      0 

Create a matrix of type double and display a summary of the matrix that includes the variance and sum of each matrix column in addition to the default statistics.

A = rand(5,3);
summary(A,Statistics=["default" "var" "sum"])

A: 5×3 double

NumMissing           0             0             0  
Min             0.1270        0.0975        0.1576  
Median          0.8147        0.5469        0.8003  
Max             0.9134        0.9649        0.9706  
Mean            0.6786        0.5691        0.6742  
Std             0.3285        0.3921        0.3487  
Var             0.1079        0.1537        0.1216  
Sum             3.3932        2.8453        3.3710  

Create a table with four variables of different data types.

num = rand(6,1);
num2 = single(rand(6,1));
cat = categorical(["a";"a";"b";"a";"b";"c"]);
dt = datetime(2016:2021,1,1)';
T = table(num,num2,cat,dt)

T=6×4 table
  num        num2      cat        dt
_______    _______    ___    ___________

0.81472     0.2785     a     01-Jan-2016
0.90579    0.54688     a     01-Jan-2017
0.12699    0.95751     b     01-Jan-2018
0.91338    0.96489     a     01-Jan-2019
0.63236    0.15761     b     01-Jan-2020
0.09754    0.97059     c     01-Jan-2021

Display a summary of the table.

summary(T)

T: 6×4 table

Variables:

num: double
num2: single
cat: categorical (3 categories)
dt: datetime

Statistics for applicable variables:

        NumMissing          Min                   Median                   Max                    Mean                    Std      

num         0                0.0975                      0.7235             0.9134                      0.5818             0.3776  
num2        0                0.1576                      0.7522             0.9706                      0.6460             0.3708  
cat         0                                                                                                                      
dt          0           01-Jan-2016        02-Jul-2018 12:00:00        01-Jan-2021        02-Jul-2018 12:00:00        16401:17:23  

Load a table of data from the provided file.

Display a summary of the table with additional table and variable metadata, including custom metadata. Omit statistics from the summary.

summary(T,Detail="high",Statistics="none")

T: 100×4 table

Description: Simulated patient data

Variables:

Status: categorical
    Instrument:  [1×1 cell]
Age: double (Yrs)
    Instrument:  height rod
Smoker: logical
    Instrument:  [1×1 cell]
BloodPressure: 2-column double (mm Hg)
    Description:  Systolic/Diastolic
    Instrument:  bloodp pressure cuff

The summary includes the metadata properties that describe the table and its variables. Access the properties.

T.Properties

ans = TableProperties with properties:

         Description: 'Simulated patient data'
            UserData: []
      DimensionNames: {'Row'  'Variables'}
       VariableNames: {'Status'  'Age'  'Smoker'  'BloodPressure'}
       VariableTypes: ["categorical"    "double"    "logical"    "double"]
VariableDescriptions: {''  ''  ''  'Systolic/Diastolic'}
       VariableUnits: {''  'Yrs'  ''  'mm Hg'}
  VariableContinuity: []
            RowNames: {100×1 cell}

   Custom Properties (access using t.Properties.CustomProperties.<name>):
          Instrument: {''  'height rod'  ''  'bloodp pressure cuff'}

Create a timetable.

MeasurementTime = datetime(["2024-01-01";"2024-02-01";"2024-03-01"]);
Temp = [37;39;42];
TT = timetable(MeasurementTime,Temp)

TT=3×1 timetable
MeasurementTime    Temp
_______________    ____

01-Jan-2024         37 
01-Feb-2024         39 
01-Mar-2024         42 

Return a summary of the timetable.

s = summary(TT)

s = struct with fields:
    MeasurementTime: [1×1 struct]
               Temp: [1×1 struct]

The MeasurementTime field of the structure contains a summary of the row times.

s.MeasurementTime

ans = struct with fields:
          Size: [3 1]
          Type: 'datetime'
      TimeZone: ''
    SampleRate: NaN
     StartTime: 01-Jan-2024
    NumMissing: 0
           Min: 01-Jan-2024
        Median: 01-Feb-2024
           Max: 01-Mar-2024
          Mean: 31-Jan-2024 08:00:00
           Std: 720:07:59
      TimeStep: 1mo

The Temp field of the structure contains a summary of the Temp variable. Access the median.

s.Temp.Median
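As a hedged illustration of working with the returned structure, the sketch below rebuilds the timetable from this example and compares the stored median with one computed directly from the variable. The variable names m and tf are illustrative, and the field layout is assumed to match the structure display shown above.

```matlab
MeasurementTime = datetime(["2024-01-01";"2024-02-01";"2024-03-01"]);
Temp = [37;39;42];
TT = timetable(MeasurementTime,Temp);

s = summary(TT);               % one field for the row times and one per variable
m = s.Temp.Median;             % median stored in the summary structure

% Compare with the median computed directly from the variable.
tf = isequal(m, median(TT.Temp))
```

Returning the structure instead of displaying the summary makes the statistics available to later code without parsing printed output.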

Input Arguments

Input data, specified as an array, table, or timetable.

Note

Prior to R2024b, summarizing numeric, logical, datetime, duration, and calendarDuration types was not supported.

Operating dimension for an array, specified as a positive integer scalar, a vector of positive integers, or "all". If you do not specify dim, then the default is the first array dimension whose size does not equal 1.

If the input array is categorical, then dim must be a scalar.

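Consider an input matrix, A:

summary(A,1) summarizes each column of A.
summary(A,2) summarizes each row of A.
summary(A,"all") summarizes all elements of A together.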

Specifying dim is not supported when the input data is a table or timetable.
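The forms that dim can take can be sketched as follows. This is an illustrative example only, assuming R2024b or later; the matrix from magic, the row vector v, and the name defaultDim are not part of the original page.

```matlab
A = magic(4);                  % 4-by-4 numeric matrix (illustrative data)

summary(A,1)                   % scalar dim: statistics for each column
summary(A,2)                   % scalar dim: statistics for each row
summary(A,[1 2])               % vector dim: operate along both dimensions
summary(A,"all")               % operate over all elements

% Default dimension: the first dimension whose size does not equal 1.
v = [4 8 15 16];               % 1-by-4 row vector
defaultDim = find(size(v) ~= 1, 1)   % 2, so summary(v) operates along the row
```

For a categorical array, only the scalar forms apply, because dim must then be a scalar.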

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: summary(A,Statistics="allstats")

Detail (since R2024b)

Level of detail to display for table or timetable input data, specified as one of these values:

summary accesses metadata that describes a table and its variables through the [Properties](table.html#mw%5Fcb8d6608-0cf9-4164-b1f8-7bce6caa23ce) property of the table.

The Detail name-value argument does not configure the summary when you return the summary as a scalar structure. The summary structure always includes all table and variable metadata.

Example: summary(A,Detail="high") displays table and variable metadata in addition to the variable names, types, units, and descriptions.
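To show where the metadata displayed at a high level of detail comes from, here is a minimal sketch that attaches standard and custom metadata to a small table before summarizing it. The table contents and the custom property name Instrument are hypothetical values chosen for illustration.

```matlab
% Hypothetical two-variable table; names and values are illustrative only.
Age = [38;43];
Smoker = [true;false];
T = table(Age,Smoker);

% Standard metadata is stored in T.Properties.
T.Properties.Description = 'Simulated patient data';
T.Properties.VariableUnits = {'Yrs',''};

% Custom per-variable metadata is added with addprop.
T = addprop(T,"Instrument","variable");
T.Properties.CustomProperties.Instrument = {'height rod',''};

summary(T,Detail="high",Statistics="none")   % metadata only, no statistics
```

Because the returned structure always includes all metadata, Detail matters only when the summary is displayed.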

Statistics (since R2024b)

Statistics to compute, specified as one or more of the following values. For table and timetable data, the specified statistics are computed for all applicable variables, including row times for timetable data.

For the "default" value, the statistics to compute depend on the data type of the input data.

Data Type                                          Statistics to Compute
double, single, duration, datetime, other types    "nummissing", "min", "median", "max", "mean", "std"
Integer                                            "nummissing", "min", "median", "max", "mean"
logical                                            "counts"
Non-ordinal categorical                            "counts", "nummissing"
Ordinal categorical                                "counts", "nummissing", "min", "median", "max"
string, char, cell array of character vectors      "nummissing"

To compute a different set of statistics, you can specify one or more of these values. To specify multiple statistics, list the options in a string array or cell array.

Statistic Description
"nummissing" Number of missing elements
"min" Minimum
"median" Median
"max" Maximum
"q1" First quartile or 25th percentile
"q3" Third quartile or 75th percentile
"mean" Mean
"std" Standard deviation
"var" Variance
"mode" Mode
"range" Maximum minus minimum
"sum" Sum
"numunique" Number of distinct nonmissing elements
"nnz" Number of nonzero and nonmissing elements
"counts" Number of occurrences of each category
"allstats" All statistics previously listed
"none" No statistics

You can also specify Statistics as a function handle that must:

When summary computes a statistic:

Example: summary(A,Statistics=["mean" "var" "mode"]) computes the mean, variance, and mode.
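As a further sketch of selecting non-default statistics, the code below requests the quartiles and range, then recomputes the range from its stated definition (maximum minus minimum). The call to rng and the name colRange are illustrative assumptions.

```matlab
rng("default")                 % assumption: fix the seed for repeatability
A = rand(5,3);

% Request quartiles and range instead of the default statistics.
summary(A,Statistics=["q1" "q3" "range"])

% The "range" statistic is described as maximum minus minimum.
colRange = max(A,[],1) - min(A,[],1)
```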

Example: summary(A,Statistics={"default",myFun1}) computes the result of myFun1 in addition to the default statistics.
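A custom statistic might look like the following sketch. The requirements for the function handle are not reproduced on this page, so the example assumes that the handle receives the data as its only input and returns one value for the summarized column. The handle myFun1 here is a hypothetical "half spread" statistic, and a column vector is used to keep the operating dimension unambiguous.

```matlab
% Hypothetical custom statistic: half the spread of the data.
myFun1 = @(x) (max(x) - min(x))/2;

A = [2;4;4;10];                % column vector (illustrative data)
summary(A,Statistics={"default",myFun1})

halfSpread = myFun1(A)         % the same handle applied directly: 4
```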

DataVariables (since R2024b)

Table or timetable variables to summarize, specified as one of the values in this table.

Variables in the table or timetable not specified by the DataVariables name-value argument are not included in the summary.

Indexing scheme: Variable name
    Values to specify: a string scalar or character vector; a string array or cell array of character vectors; a pattern object
    Examples: "A" or 'A' (a variable named A); ["A" "B"] or {'A','B'} (two variables named A and B); "Var"+digitsPattern(1) (variables named "Var" followed by a single digit)

Indexing scheme: Variable index
    Values to specify: an index number that refers to the location of a variable in the table; a vector of numbers; a logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 (false) values.
    Examples: 3 (the third variable from the table); [2 3] (the second and third variables from the table); [false false true] (the third variable)

Indexing scheme: Function handle
    Values to specify: a function handle that takes a table variable as input and returns a logical scalar
    Examples: @isnumeric (all the variables containing numeric values)

Indexing scheme: Variable type
    Values to specify: a vartype subscript that selects variables of a specified type
    Examples: vartype("numeric") (all the variables containing numeric values)

Example: summary(A,DataVariables=["Var1" "Var2" "Var4"]) displays a summary of Var1, Var2, and Var4.
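The indexing schemes can be compared side by side on the four-variable table from the Examples section. This is a sketch under the assumption that each call below selects the same two numeric variables; the call to rng and the helper names isNum and numericNames are illustrative.

```matlab
rng("default")                 % assumption: fix the seed for repeatability
num = rand(6,1);
num2 = single(rand(6,1));
cat = categorical(["a";"a";"b";"a";"b";"c"]);
dt = datetime(2016:2021,1,1)';
T = table(num,num2,cat,dt);

summary(T,DataVariables=["num" "num2"])        % by variable name
summary(T,DataVariables=[1 2])                 % by variable index
summary(T,DataVariables=@isnumeric)            % by function handle
summary(T,DataVariables=vartype("numeric"))    % by variable type

% For reference, the variables for which isnumeric returns true:
isNum = varfun(@isnumeric,T,OutputFormat="uniform");
numericNames = T.Properties.VariableNames(isNum)
```

The function-handle and vartype forms avoid hard-coding names, which helps when the set of variables changes between tables.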

Output Arguments

Summary of input data, returned as a scalar structure.

Extended Capabilities

The summary function supports tall arrays with the following usage notes and limitations:

For more information, see Tall Arrays.

Usage notes and limitations:

For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).

Version History

Introduced in R2013b

You can now summarize array data, including numeric, logical, datetime, duration, and calendarDuration types. Previously, the function supported array data only when it was categorical.

You can configure the summary contents using one or more name-value arguments: Detail, Statistics, and DataVariables.

When you display a summary of a categorical array, the summary now always includes the number of undefined elements. Previously, the summary omitted the number of undefined elements if the array contained no missing values.

If you want to omit the number of undefined elements from the summary, specify the Statistics name-value argument. For example, summary(A,Statistics="counts") displays only the number of elements in each category.
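The difference between the two displays can be sketched with the categorical vector from the Examples section. The direct counts computed with countcats and isundefined are an illustrative cross-check, and the names counts and numUndefined are assumptions.

```matlab
A = categorical(["A";"B";"C";"A";"C"]);

summary(A)                         % counts plus the <undefined> row
summary(A,Statistics="counts")     % counts for each category only

% The per-category counts can also be obtained directly.
counts = countcats(A)              % [2;1;2] for categories A, B, C
numUndefined = nnz(isundefined(A)) % 0 for this vector
```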