histcounts - Histogram bin counts - MATLAB
Syntax
Description
[N,edges] = histcounts(X) partitions the X values into bins and returns the bin counts and the bin edges. The histcounts function uses an automatic binning algorithm that returns uniform bins chosen to cover the range of elements in X and reveal the underlying shape of the distribution.
[N,edges] = histcounts(X,nbins) uses the number of bins specified by the scalar nbins.
[N,edges] = histcounts(X,edges) sorts X into bins with the bin edges specified by the vector edges.
[N,edges,bin] = histcounts(___) also returns an index array, bin, using any of the previous syntaxes. bin is an array of the same size as X whose elements are the bin indices for the corresponding elements in X. The number of elements in the kth bin is nnz(bin==k), which is the same as N(k).
N = histcounts(C), where C is a categorical array, returns a vector, N, that indicates the number of elements in C whose value is equal to each of C's categories. N has one element for each category in C.
N = histcounts(C,Categories) counts only the elements in C whose value is equal to the subset of categories specified by Categories.
[N,Categories] = histcounts(___) also returns the categories that correspond to each count in N using either of the previous syntaxes for categorical arrays.
[___] = histcounts(___,Name,Value) specifies additional parameters using one or more name-value arguments. For example, you can specify BinWidth as a scalar to adjust the width of the bins for numeric data.
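As a rough illustration of the name-value syntax, the sketch below requests a fixed bin width. The sample data and the width of 5 are arbitrary choices made for this example, and the exact edge positions that histcounts picks are deliberately not asserted.

```matlab
% Illustrative data (assumed for this sketch, not from the reference page)
X = 1:20;

% Ask for bins that are 5 units wide
[N,edges] = histcounts(X,'BinWidth',5);

% Width of each returned bin; these should all equal the requested width
widths = diff(edges);

% The bins cover the data, so the counts add up to numel(X)
total = sum(N);
```

Checking diff(edges) rather than the edges themselves keeps the example independent of where histcounts chooses to place the first edge.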
Examples
Distribute 100 random values into bins. histcounts automatically chooses an appropriate bin width to reveal the underlying distribution of the data.
X = randn(100,1);
[N,edges] = histcounts(X)
N = 1×7
2 17 28 32 16 3 2
edges = 1×8
-3 -2 -1 0 1 2 3 4
Distribute 10 numbers into 6 equally spaced bins.
X = [2 3 5 7 11 13 17 19 23 29];
[N,edges] = histcounts(X,6)
N = 1×6
2 2 2 2 1 1
edges = 1×7
0 4.9000 9.8000 14.7000 19.6000 24.5000 29.4000
Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin, and the last element is the right edge of the last bin.
X = randn(1000,1);
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts(X,edges)
N = 1×10
0 24 149 142 195 200 154 111 25 0
Distribute all of the prime numbers less than 100 into bins. Specify 'Normalization' as 'probability' to normalize the bin counts so that sum(N) is 1. That is, each bin count represents the probability that an observation falls within that bin.
X = primes(100);
[N,edges] = histcounts(X,'Normalization','probability')
N = 1×4
0.4000 0.2800 0.2800 0.0400
edges = 1×5
0 30 60 90 120
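The relationship between raw counts and 'probability' normalization can be checked directly. This is a minimal sketch using the same primes data; the variable names counts and p are chosen here for illustration only.

```matlab
X = primes(100);

% Raw counts and probability-normalized counts use the same automatic bins
counts = histcounts(X);
p = histcounts(X,'Normalization','probability');

% Each probability is the raw count divided by the number of observations
expected = counts / numel(X);
maxDiff = max(abs(p - expected));

% The probabilities sum to 1 (up to floating-point rounding)
totalProb = sum(p);
```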
Distribute 100 random integers between -5 and 5 into bins, and specify 'BinMethod' as 'integers' to use unit-width bins centered on integers. Specify a third output for histcounts to return a vector representing the bin indices of the data.
X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');
Find the bin count for the third bin by counting the occurrences of the number 3 in the bin index vector, bin. The result is the same as N(3).
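The step described above can be sketched as follows. Because the data is random, no specific count is shown; the names count3 and rebuilt are introduced here for illustration, and the accumarray line is an optional extra that rebuilds all counts from the indices at once.

```matlab
X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');

% Count for the third bin, computed from the bin indices
count3 = nnz(bin==3);

% Rebuild every bin count from the indices; index 0 (unbinned) is skipped
rebuilt = accumarray(bin(bin>0),1,[numel(N) 1]).';
```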
Create a categorical vector that represents votes. The categories in the vector are 'yes', 'no', or 'undecided'.
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})
C = 1×27 categorical
no no yes yes yes no no no no undecided undecided yes no no no yes no yes no yes no no no yes yes yes yes
Determine the number of elements that fall into each category.
[N,Categories] = histcounts(C)
N = 1×3
11 14 2
Categories = 1×3 cell
{'yes'} {'no'} {'undecided'}
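To count only a subset of the categories, pass the subset as a second argument. The sketch below reuses the vote data and restricts the count to the decided votes; the choice of subset is an assumption made for this illustration.

```matlab
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'});

% Count only the elements whose category is 'yes' or 'no'
[N,Categories] = histcounts(C,{'yes','no'});
```

The two 'undecided' votes are left out, so sum(N) is 25 rather than 27.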
Input Arguments
X: Data to distribute among bins, specified as a vector, matrix, or multidimensional array. If X is not a vector, then histcounts treats it as a single column vector, X(:).
histcounts ignores all NaN values. Similarly, histcounts ignores Inf and -Inf values unless the bin edges explicitly specify Inf or -Inf as a bin edge.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | datetime | duration
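The handling of matrix input, NaN, and Inf described above can be seen in a small sketch. The matrix M and the edge vectors are made up for this example.

```matlab
% Matrix input containing one NaN and one Inf (illustrative values)
M = [1 2; NaN 4; Inf 6];

% A matrix is binned as if it were the column vector M(:)
N_matrix = histcounts(M,[0 3 7]);
N_column = histcounts(M(:),[0 3 7]);

% With finite edges, NaN and Inf are ignored: bins hold {1,2} and {4,6}
% With Inf as the last edge, Inf lands in the last bin: {1,2} and {4,6,Inf}
N_withInf = histcounts(M,[0 3 Inf]);
```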
C: Categorical data, specified as a categorical array. histcounts ignores undefined categorical values.
Data Types: categorical
nbins: Number of bins, specified as a positive integer. If you do not specify nbins, then histcounts automatically calculates how many bins to use based on the values in X.
Example: [N,edges] = histcounts(X,15) uses 15 bins.
edges: Bin edges, specified as a vector. edges(1) is the leading edge of the first bin, and edges(end) is the trailing edge of the last bin.
Each bin includes the leading edge, but does not include the trailing edge, except for the last bin which includes both edges.
For datetime and duration data, edges must be a datetime or duration vector in monotonically increasing order.
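The edge-inclusion rule is easiest to see when data values sit exactly on the edges. In this small sketch (values chosen for illustration), the bins are [1,2) and [2,3].

```matlab
X = [1 2 3];
edges = [1 2 3];

[N,~,bin] = histcounts(X,edges);

% 1 falls in the first bin, [1,2)
% 2 sits on a shared edge and goes to the bin it leads, [2,3]
% 3 sits on the trailing edge of the last bin, which is inclusive
```

Here N is [1 2] and bin is [1 2 2].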
Categories: Categories included in count, specified as a string vector, cell vector of character vectors, pattern scalar, or categorical vector. By default, histcounts uses a bin for each category in categorical array C. Use Categories to specify a unique subset of the categories instead.
Example: h = histcounts(C,["Large","Small"]) counts only the categorical data in the categories Large and Small.
Example: h = histcounts(C,"Y" + wildcardPattern) counts categorical data in all the categories whose names begin with the letter Y.
Data Types: string | cell | pattern | categorical
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [N,edges] = histcounts(X,'Normalization','probability') normalizes the bin counts in N, such that sum(N) is 1.
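The two ways of writing a name-value argument are interchangeable. The sketch below assumes R2021a or later, since the Name=Value form is not available in earlier releases; the variable names are chosen for illustration.

```matlab
X = primes(100);

% Name=Value form (R2021a and later)
N_new = histcounts(X,Normalization="probability");

% Quoted, comma-separated form (also valid in earlier releases)
N_old = histcounts(X,'Normalization','probability');

% Both calls describe the same computation
same = isequal(N_new,N_old);
```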
Output Arguments
N: Bin counts, returned as a row vector.
edges: Bin edges, returned as a vector. The first element is the leading edge of the first bin. The last element is the trailing edge of the last bin.
bin: Bin indices, returned as an array of the same size as X. Each element in bin describes which numbered bin contains the corresponding element in X.
A value of 0 in bin indicates an element which does not belong to any of the bins (for example, a NaN value).
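A short sketch of the zero index, using made-up values: one element is NaN and one lies outside the range covered by the edges, so both receive a bin index of 0 and neither contributes to N.

```matlab
X = [0.5 NaN 5 1.5];
edges = [0 1 2];

[N,~,bin] = histcounts(X,edges);

% 0.5 is in bin 1 and 1.5 is in bin 2
% NaN and 5 belong to no bin, so their index is 0
unbinned = nnz(bin==0);
```

Here bin is [1 0 0 2], N is [1 1], and unbinned is 2.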
Categories: Categories included in count, returned as a cell vector of character vectors. Categories contains the categories in C that correspond to each count in N.
Tips
- The behavior of histcounts is similar to that of the discretize function. Use histcounts to find the number of elements in each bin. On the other hand, use discretize to find which bin each element belongs to (without counting).
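A side-by-side sketch of the two functions, with data and edges chosen for illustration. For elements that fall inside the bins, the third output of histcounts matches the output of discretize; for an element outside every bin, histcounts reports 0 while discretize reports NaN.

```matlab
X = [1 3 5 9];
edges = [0 2 4 6];

% Counts per bin, plus the bin index of each element
[N,~,binH] = histcounts(X,edges);

% Bin index of each element only, with no counting
binD = discretize(X,edges);

% The value 9 lies outside every bin
inRange = X >= edges(1) & X <= edges(end);
```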
Extended Capabilities
The histcounts function supports tall arrays with the following usage notes and limitations:
- Some input options are not supported. The allowed options are:
  - BinWidth
  - BinLimits
  - Normalization
  - BinMethod: The 'auto' and 'scott' bin methods are the same. The 'fd' bin method is not supported.
For more information, see Tall Arrays.
Usage notes and limitations:
- Code generation does not support sparse matrix inputs for this function.
- If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
- The Categories input argument does not support pattern expressions.
The histcounts function supports GPU array input with these usage notes and limitations:
- 64-bit integers are not supported.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2014b
You can normalize histogram values as percentages by specifying the Normalization name-value argument as 'percentage'.
The histcounts function shows improved performance for numeric and logical data due to faster input parsing. The performance improvement is more significant when input parsing is a greater portion of the computation time. This situation occurs when the size of the data to distribute among bins is smaller than 2000 elements.
For example, this code calculates histogram bin counts for a 1000-element vector. The code is about 3x faster than in the previous release.
function timingHistcounts
X = rand(1,1000);
for k = 1:3e3
    histcounts(X,"BinMethod","auto");
end
end
The approximate execution times are:
R2022b: 0.62 s
R2023a: 0.21 s
The code was timed on a Windows® 10, Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz test system using the timeit function.
timeit(@timingHistcounts)