Detection of epileptiform activity in EEG signals based on time-frequency and non-linear analysis (original) (raw)

Materials and methods

Materials

The EEG signals used to design and test the new technique were recorded at the University Hospital Bonn, Germany with the same 128-channel amplifier system (Andrzejak et al., ). After 12 bit analog-to-digital conversion the EEG signals were saved in a data acquisition system at a sampling rate of 173.61 Hz. The amplifier range was adjusted well so that the recordings could be made with 12 bits. The recorded EEG signals were further passed through a low pass filter with the finite impulse response and bandwidth of 0–60 Hz. The frequencies higher than 60 Hz mostly present noise and are a very small part of the signal total energy in the frequency band up to 86.8 Hz saved by the acquisition system. We used 100 segments of epileptic and 200 segments of non-epileptic EEG signals to design and test our new technique. The epileptic EEG signals were recorded using cortical electrodes from 5 epileptic patients during seizure from within the seizure focus, i.e., the region of unhealthy brain tissue that was later removed by surgery. The first 100 segments of non-epileptic EEG signals were also recorded using cortical electrodes from the same epileptic patients and the same unhealthy brain tissue but during seizure-free interval. The remaining 100 segments of non-epileptic EEG signals were recorded using scalp electrodes from 5 healthy volunteers and of course their healthy brain tissue. So, there was a total of three groups with 100 segments of the EEG signals. All the segments have duration of 4096 samples, i.e., 23.6 s, and were additionally tested on the weak stationarity (Andrzejak et al., ) in order to perform non-linear analysis. Since the EEG signals were recorded from different patients and with different electrodes, all extracted EEG signal segments were also additionally normalized in order to have the same zero mean and unit variance as shown in Figure 1. In this way, we wanted to design a detection technique that is not dependent on patient and the EEG recording system either.

Methods

There are five broad sub-bands of the EEG signal which are generally of clinical interest: delta (0–4 Hz), theta (4–8 Hz), alpha (8–16 Hz), beta (16–32 Hz), and gamma waves (32–64 Hz). Higher frequencies are often more common in abnormal brain states such as epilepsy, i.e., there is a shift of EEG signal energy from lower to higher frequency bands before and during a seizure (Gajić et al., ). These five frequency sub-bands provide more accurate information about neuronal activities underlying the problem. Consequently, some changes in the EEG signal, which are not so obvious in the original full-spectrum signal, can be amplified when each sub-band is considered independently. Thus, we extract features from each sub-band separately and also in time, frequency and time-frequency domain as well as by non-linear analysis. After the feature extraction we reduce dimension of the feature space to two. Finally, two quadratic classifiers able to separate all three groups of the EEG signals from each other are designed. The entire structure of the technique is shown in Figure 2.

Time-frequency domain analysis

Since the segments of the EEG signals have already been normalized and all have zero mean and unit variance, additional extraction of these two features as well as coefficient of variation as function of mean value and variance, does not make any sense. However, we extracted the total variation as another measure of signal variability in the time domain even after normalization since it counts number of signal sign changes or signal polarity. In the case of a signal segment _x_[_n_] of N samples, i.e., n = 1, 2 ··· N, the total variation is given by:

where the signal is essentially normalized by the difference between its maximum and minimum values in the segment of interest. Obviously, the value of the total variation is located in the range between 1/(N − 1) for slower signals and 1 for signals with very high and frequent changes.

EEG signals, as the outcome of events with different repetition periods, contain signals whose different frequencies cannot be identified in the time domain, since all these signals are shown together. Thus, signal transformation from the time domain to the frequency domain is necessary, which in the case of a signal segment _x_[_n_] of N samples is achieved using the fast Fourier transform (FFT) defined by:

where ω = 2π_f_/fs represents the angular frequency discretized in N samples (Proakis and Manolakis, ). In order to avoid discontinuities between the end and beginning of the segments and thus spurious spectral frequency components the beginning of each segment was chosen in such a way that the amplitude difference of the last and first data points was within the range of amplitude differences of consecutive data points, and the slopes at the end and beginning of each segment had the same sign. This procedure reduces edge effects that result in spectral leakage in the FFT spectrum. In order to further minimize spectral leakage windowing of signal segments by the Hamming window (the sum of a rectangle and a Hanning window) is used before application of the FFT. Considering the fact that by transforming the signal into the frequency domain we do not lose any original information from the time domain, the signal can completely be reconstructed using the inverse Fourier transform by:

Clearly, the longer the segment _x_[_n_], i.e., the larger N, the greater the frequency resolution.

Power spectral density is also one of the most important features of the signal in the frequency domain and represents the contribution of each individual frequency component to the power of the whole signal segment _x_[_n_]. In practice, power spectral density is usually estimated using the coefficients of the fast Fourier transform, i.e., the periodogram (Welch, 1967) given by:

which is an unbiased and inconsistent estimator. Thus, with the increase in the length of the signal segment, the mean of the estimation tends toward the actual value of power spectral density, which is actually an advantage, unlike variance estimation, which is not reduced, i.e., which does not have a tendency toward zero with the increase in segment length. A periodogram can be further normalized by the total signal power, i.e.,:

where we obtain the relative contribution of each frequency component to the total power of the signal. If the original signal segment _x_[_n_] is further divided into P sub-segments of the N/P samples, the periodogram can be calculated as follows:

where _fftp_[ω] is the fast Fourier transform of each of the sub-segments of the N/P sample. In this way, the periodogram is actually an averaged one with a smaller variance, but clearly with a lower resolution in the frequency domain. Based on the periodogram we extracted relative power of all five previously mentioned sub-bands, i.e., delta (0–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30–60 Hz), as features of interest in frequency domain.

By analyzing the EEG signals solely in the time domain, extracted features do not contain any information on frequencies, which are, as we will later show, also very important for the proper detection of epileptic EEG signals. On the other hand, by transforming the signals from the time into the frequency domain, any information on time is completely lost, except of course in the case of sequential application on sufficiently short and stationary sub-segments, which also has its disadvantage in terms of the correct choice of the length of these sub-segments which would enable the simultaneous achievement of the desired resolution in both domains. In addition, once selected, the sub-segment length, i.e., the resolution in the time domain, remains fixed throughout the entire frequency bands and cannot be adjusted to the dominant signal frequencies at a specific time. Signal processing using wavelets very accurately resolves this deficiency and results in sufficient information on non-stationary signals, both in the time and frequency domain. We are already familiar with the fact that a signal can be presented as a linear combination of its basic functions. A unit impulse function whose power is limited and whose mean differs from zero is the basic function of the signal in the time domain, whereas in the frequency domain, this role is assigned to the sinusoidal function that has infinite power, and a zero mean. In the time-frequency domain, the basic function is the wavelet, which is actually a function of limited power, i.e., duration, and a zero mean (Rao and Bopardikar, ), and for which the following is valid:

The wavelet that is moved, or translated, in time for b samples and scaled by the so-called dilation parameter a is given by:

By changing the dilation parameter, the basic wavelet (a = 1) changes its width, that is, it spreads (a > 1) and contracts (0 ≤ a < 1) in the time domain. In the analysis of non-stationary signals, the possibility of changing the width of the wavelet represents a significant advantage of this analysis technique, considering the fact that wider wavelets can be used to extract slower changes, i.e., lower signal frequencies, and narrower wavelets can be used to extract faster changes, i.e., higher frequencies. Following the selection of the values of parameters a and b it is possible to transform segments of the signal _x_[_k_] of N samples, that is, to calculate the wavelet transform coefficients in the following way:

Thus, what is actually being extracted from the signal are only those frequencies that are within the wavelet frequency band ψ_ab_[_n_], i.e., the signals are filtrated by the wavelet ψ_ab_[_n_]. As previously indicated, based on the coefficients obtained in this way, the original signal can be reconstructed using an inverse wavelet transform. Of course, if necessary, it is possible to also independently reconstruct the part of the signal which is filtered, as well as the part that was rejected by the wavelet ψ_ab_[_n_] on the basis of the so-called detail coefficients and approximation coefficients respectively, which are of course a function of the transformation coefficients ψ_ab_[_n_].

Parameters a and b can continuously change, which is not so practical especially bearing in mind that the signal can be completely and accurately transformed and reconstructed by using a smaller and finite number of wavelets, that is, by using a limited number of discrete values of parameters a and b, which is also known as the discrete wavelet transform (DWT). In this case, parameters a and b are the powers of 2, which gives us the dyadic orthogonal wavelet network with frequency bands which do not overlap each other. The dilation parameter a, as the power of 2, at each subsequent higher level of transformation, doubles in value in comparison to the value from the previous level, which means that the wavelet becomes twice as wide in the time domain, and has a frequency band that is half as narrow and twice as low. This actually decreases the resolution of the transformed signal in the time domain two-fold, increasing it twice as much in the frequency domain. Thus, the signal frequency band from the previous level is split into two halves at every next level, into a higher band which contains higher frequencies and describes the finer changes, or details, and a lower band that contains lower frequencies and actually represents an approximation of the signal from the previous level. This technique is also known as wavelet decomposition of the signal.

Before the application of DWT, it is necessary to choose the type of the basic wavelet as well as the number of levels into which the signal will be decompose. After analysis of several types of the basic wavelets, the fourth-order Daubechies wavelet (Rao and Bopardikar, ) was selected for further analysis within this work since it has good localizing properties both in the time and frequency domains (Kalayci and Özdamar, ; Petrosian et al., ) Due to its shape and smoothing feature this type of the basic wavelet has already shown good capabilities in the field of EEG signal processing. The discrete wavelet decomposition was performed at four levels that resulted into five sub-bands of clinical interest. The standard deviation and the average relative power of the DWT coefficients in each of the sub-bands were extracted as representative features in time-frequency domain.

Non-linear analysis

EEG signals, as the result of the activities of an extremely complex and non-linear system, in addition to the fairly well-known and previously described linear techniques, can also be analyzed using some of the non-linear techniques. By using linear techniques, any non-linearity that can be found in the signal is only approximated, which can result in the loss of certain pieces of potentially relevant information. If that is the case, the use of non-linear techniques is preferred since they are more reliable for non-linear analyses, despite the fact that they imply weak signal stationarity (Varsavsky et al., 2011), and the fact that they need somewhat longer segments, which leads to their being computationally more demanding than linear techniques.

Let _x_[_n_] again represent the signal segment which is to be analyzed, where n = 1 ··· N. Also, let m denote the lag for which we can define two new sub-segments _x_[n_], the first xk containing samples starting from k up to N − m and the second x k + m with samples starting from k + m to N. Both of these sub-segments contain N − k − m + 1 samples and can be represented opposite one another in the phase space with a lag m and the so-called embedding dimension 2. In case of three sub-segments: x k + 2_m, x k + m and xk, the embedding dimension of the phase space would be 3. The lagged phase space provides a completely different view of signal evolution in time, where we can note that the signal gravitates to a certain part of the phase space, known as the attractor. With the aim of constructing lagged phase space, i.e., the signal attractor, it is necessary to previously define the values of the lag and the embedding dimension, which although significantly smaller than the real dimension of the non-linear system space, provides an approximation of the signal complexity and non-linearity (Andrzejak et al., ). The lag m should be large enough so that these sub-segments would overlap as little as possible, that is, share as little mutual information as possible, but at the same time sufficiently small so that the sub-segments could be long enough for any further useful analysis. An optimal lag is obtained by determining the mutual information coefficient the sub-segments for different values of the lag m. The mutual information coefficient is defined by Williams (1997):

where Ns represents the number of areas in which the signal is discretized based on the amplitude and p is the corresponding probability that the sub-segment belongs to a certain area. The first local minimum shown in the graph representing the dependence of the mutual information coefficient on lag determines the optimal lag mo.

After determining the optimal lag, the minimum embedding dimension of the lagged phase space is estimated using Cao's technique (Cao, ). In the phase space with a lag mo and embedding dimension d, the original segment is represented by its phase portraits, which all together make up the attractor defined by the following points in the lagged phase space:

where i = 1, 2, ···, N − mo(d − 1). According to the technique developed by Cao, if d is the right dimension, then the two points are also close to each other in phase space dimension d, as well as in the phase space of dimension d + 1 and are referred to as real neighbors (Cao, ). Dimension increases gradually until the number of false neighbors reaches zero, that is, until the Cao's embedding function defined by:

becomes constant, where i = 1, 2, ···, N − mod and _yd_[n i, _d_] represents the nearest neighbor of _yd_[_i_] in the _d_-dimensional phase space with a lag mo. In fact, the minimum embedding dimension dmin is determined when the ratio between the e d + 1/ed approaches the value of 1. Since this ratio may approach 1 in some other cases, e.g., for completely random signals, an additional check is also carried out where the Cao's embedding function is redefined and given by:

where _x_[n i, d + _mod_] is the nearest neighbor of _x_[i + _mod_]. The constant value of the ratio e*d + 1/e*d for different values of the embedding dimension indicates that we are dealing with a random signal. The signal is not random, i.e., it is deterministic if this ratio differs from 1 for at least one value of the embedding dimension, which in that case is also the minimum value.

The correlation dimension is a measure of the complexity of the signal attractor in the lagged phase space. This dimension, unlike most others better known dimensions, may have a fractional value and could thus characterize the dimension, that is, the complexity of the attractors with more precision than the embedding dimension; however, it is always less than or equal to the embedding dimension.

Let _C_ε be the correlational sum of the signal segment with N samples within the radius ε in its phase space with a lag mo and minimum embedding dimension dmin, i.e., M = N − modmin points ydmin given by Williams (1997):

where H is the Heaviside step function that results in 1 if _ydmin_[_j_] is within the radius ε of _ydmin_[_i_], i.e.,:

otherwise it is 0. The correlation dimension dcorr is the approximated slope of the natural logarithm of the correlation sum as a function of ε. Given that the total number of possible distances between two points in a lagged phase space equals M(M − 1)/2, the correlation dimension could directly be obtained by the Takens estimator (Takens, 1981; Cao, ) using:

The largest Lyapunov exponent λ_max_ represents a measure of both chaotic behavior of the attractor and the divergence of the trajectories in phase space, i.e., the predictability of the signal. Attractor divergence is the distance between two closely positioned points in a phase space after a certain period of time of k samples, which is also known as the prediction length. Based on chaos theory, i.e., the so-called butterfly effect, two points close in the phase space of a chaotic system may have completely different trajectories. Thus, the divergence of the trajectories implies a chaotic system, and vice versa. The Lyapunov exponent actually characterizes the exponential growth of that divergence. The number of Lyapunov exponents is equal to the embedding dimension, and each of these Lyapunov exponents represents the rate of a contracting (λ <0) or expanding attractor (λ >0) in a certain direction of the phase space. In the case of a chaotic system, the trajectories must diverge in at least one dimension, which means that at least one Lyapunov exponent must be greater than zero, when it is, at the same time, the largest Lyapunov exponent. If several Lyapunov exponents are positive, then the largest among them indicates the direction of the maximum expansion of the attractor and its chaotic behavior. The mean of the trajectory divergence after k samples and a sampling period Ts can be calculated by the Wolf's technique (Wolf et al., 1985; Rosenstein et al., 1993) using:

where _ydmin_[_i_] and _ydmin_[ni_] represent two close points on different trajectories in the phase space. The largest Lyapunov exponent λ_max is in this case an approximation of the slope of the natural logarithmic trajectory divergence as a function of the number of samples k, i.e., dT = d_0_e kT s_λ_max where _d_0 stands for the initial divergence. In addition, there is another very similar more practical technique for the evaluation of the largest Lyapunov exponent proposed by Sato et al. where we first calculate the prediction error for several different values of the number of samples k using:

after which the λ_max_ is determined as the slope of the middle and approximately linear part of the prediction error pk as a function of kTs.

We extract both the correlation dimension and the largest Lyapunov exponent as features that describe complexity and chaotic behavior of the attractor in the lagged phase space. By choosing the radius ε, the phase space is divided into parts of the dimension ε. While the correlation dimension shows how many points can be found in the surrounding areas of the phase space, the Lyapunov exponent describes the distance between each of the trajectories that terminate in different parts of the phase space but start from the same one. In other words, both of these features give us an idea of how complex and predictable EEG signal is, which, of course, they both interpret and quantify in their own characteristic way.

Dimension reduction in feature space

Let an _n_-dimensional random vector X be transformed through the application of a certain linear transformation into an n_-dimensional random vector Y = ATX where A is the transformational square matrix of the dimension n. Then the mean vector and the covariance matrix of the random vector Y are MY = ATMX and Σ_Y = AT_Σ_X A. Based on that, the distance function is:

that is, the distance function does not change with the linear transformation. If we were to perform the translation of the coordinate system for the mean vector MX we would obtain the random vector Z = X − MX whose mean vector is zero and its covariance matrix is the same as Σ_X_. If we wanted to determine the random vector Z which maximizes the distance function d_2_Z(Z) = ZT_Σ−1_Z under the condition that ZTZ = 1, it is necessary to minimize the following criterion:

where μ is the Lagrange multiplier. By using a partial derivate ∂J/∂Z and by equating it with zero, we obtain the following:

where λ = 1/μ. With the aim of obtaining a non-zero solution which satisfies the equation:

it is further necessary to find such a parameter λ which satisfies the following so-called characteristic equation of a matrix Σ:

Every λ which satisfies this characteristic equation is known as eigenvalue of the matrix Σ while the vector Z related to specific eigenvalue is known as an eigenvector. When Σ is a symmetric n × n matrix, then there are n real eigenvalues λ1, λ2, …, λ_n_ and n real eigenvectors Φ1, Φ2, …, Φ_n_ which are mutually orthogonal and for which Σ Φ = Φ Λ and Φ_T_Φ = I where Φ = [Φ1 Φ2 ··· Φ_n_] is the square matrix of the eigenvectors, Λ the diagonal matrix of the eigenvalues:

while I is the identity matrix.

If the matrix Φ is used as a transformation matrix during the linear transformation Y = Φ_TX_, then the covariance matrix of the random vector Y will be Σ_Y_ = Φ_T_Σ_X_Φ = Λ. This kind of transformation is orthonormal since for the transformation matrix Φ holds Φ_T_Φ = I. In addition, during all these orthonormal transformations, the Euclidean distance does not change, that is ||Y||2 = YTY = XT_Φ_T_Φ_X = XTX = ||X||2.

Let X be an _n_-dimensional random vector of the extracted features which could be represented using n linear independent vectors in the following way:

where Φ = [Φ1 Φ2 ··· Φ_n_] and Y = [y_1_y_2 ··· yn_] that is Φ_i are the basis vectors of the new n_-dimensional space, and the new coordinates yi are the scalar products of the basis vectors Φ_i and the random vector X. Assuming that the columns of the matrix Φ or in other words the basis vectors Φ_i are orthogonal, the coordinates of the random vector X in the new space can be obtained in the following way:

Thus, Y represents a mapped random vector and the orthonormal transformation of the original random vector X. The random vector X approximated using only the m (m < n) basis vectors, i.e., the mapped features, could be represented in the following way:

where the approximation error becomes:

and the mean squared error:

has its own minimal value for bi = E{yi} = Φ_TiE_{X}. The optimal mean squared error can then be presented in the following form:

where Σ_X_ is the covariance matrix of the random vector X and λ_i_ are its eigenvalues. Thus, the minimal mean squared error of approximation is also equal to the sum of the eigenvalues of the leftout coordinates, which actually means that we should leave out coordinates with the smallest eigenvalues. The mapping of the random vector X into the space made up by the eigenvectors of its covariance matrix Σ_X_ is known as the Karhunen-Loeve (KL) expansion. When reducing the dimension of the feature space using the KL expansion technique we should bear in mind that the performance of each feature is characterized by its eigenvalue. Thus, by rejecting features we should first reject those with the smallest eigenvalue, i.e., with the smallest variance in the new feature space. For example, in the case of dimension reduction from two to one shown in Figure 3 the feature _y_2 would be rejected as less informative even though it has better discriminatory potential than _y_1. Also the coordinates yi are mutually uncorrelated considering that the covariance matrix of the random vector Y is diagonal, i.e.,:

Unlike the previously outlined method, the reduction of dimension based on scatter matrices (Fukunaga, ; Djurovic, ) is of special significance for the new detection technique since it takes into consideration the very purpose of the reduction, that is, the classification of the random vectors. Let L be the number of classes which should be classified and Mi and Σ_i_, i = 1 ··· L the mean vectors and the covariance matrices of these classes, respectively. Then the within-class scatter matrix can be defined by:

and the between-class scatter matrix as:

where _M_0 is the joint vector of mathematical expectation for all the classes together, that is:

In addition the mixed scatter matrix can be defined by:

Then the problem of dimension reduction is reduced to the identification of the n × m transformation matrix A which maps the random vector X of dimension n onto the random vector Y = ATX of dimension m and at the same time maximizes the criteria J = tr(S_−1_W SB). This criteria is invariant to non-singular linear transformations and results into transformation matrix that takes the following form:

where Ψ_i_, i = 1, …, m are the eigenvectors of the matrix S_−12_S_1 which correspond to the greatest eigenvalues, i.e., (S_−1_W SB)Ψ_i = λ_i_Ψ_i_, i = 1, …, n, λ1 ≥ λ2 ≥ ··· ≥ λ_n_. Dimension reduction based on scatter matrices applied to the case shown in Figure 3 would result into selection of the feature _y_2 that is much better choice than the feature _y_1 selected by the KL expansion technique, of course in terms of more accurate classification.

Design of quadratic classifiers

Quadratic classifiers are already known to be very good robust solutions to the problems of classification of random vectors whose statistical features are either unknown or change over time. Additionally, quadratic classifiers allow visual insight into the classification results. We design a piecewise quadratic classifier for detection of epileptiform activity, i.e., two quadratic classifiers, able to separate all three classes of the EEG signals of interest as shown in Figure 2. The quadratic classifiers have the same structure defined by the following equation:

where _y_1 and _y_2 are two features in the reduced feature space. The matrix Q, the vector V and scalar ν0 are the unknowns which are also need to be determined optimally. The quadratic equation (37) can be represented in a linear form as:

In order to also achieve the largest possible between-class and shortest within-class scattering during the dimension reduction in the feature space, for the optimization criterion we have selected the following function (Fukunaga, ):

where _P_1 and _P_2 are probabilities and

Ml and Σ_l_ are the mean vectors and covariance matrices, respectively, of the random vector Z for each of the two classes l that need to be classified. By optimizing the function f, for the optimal vector Vz, i.e., matrix Q and vector V from Equation (37), we have:

and for the optimal scalar:

which finishes the design of the quadratic classifiers as well as the new technique for detection of epileptiform activity.

Statistical performances such as sensitivity, specificity and accuracy of the designed piecewise quadratic classifier, i.e., the new technique for detection of epileptiform activity, is estimated based on the classification results. The sensitivity is defined as a ratio between the number of correctly classified segments and the total number of the segments for each of the classes separately. The specificity is also calculated for each of these three classes separately and represents the ratio between the number of correctly classified features of the other two classes and the total number of the segments of these two classes. The accuracy is calculated as the ratio between the total number of correctly classified segments and the total number of the segments in all three classes together.

Results

Feature extraction

In total 30 features for each of 300 analyzed segments of the EEG signals were extracted. All the features together with their mean values and standard deviations for all three different classes of EEG signals of interest are presented in Table 1. The extracted features refer to the adequate clinical sub-bands since these sub-bands had better discrimination characteristics compared with the whole frequency band between 0 and 60 Hz. The separability index as a measure of the discriminatory potential was also calculated for all the extracted features. In this case, the separability index is the criteria J = tr(S_−1_W SB) where SW and SB are earlier defined within- and between-class scatter matrices, respectively. Based on these matrices, a higher separability index corresponds to better separability between different classes of the EEG signals. Based on these 30 features, each original segment of the EEG signals from time domain can be presented now by its feature vector X = [_x_1_x_2 ··· _x_30]T, i.e., by the point in the feature space with dimension of 30.

Index	Feature	Non-epileptic of healthy tissue	Non-epileptic of unhealthy tissue	Epileptic	Separa-bility
μ	σ	μ	σ	μ	σ	J
_x_1	Total variation—delta	0.011	0.002	0.011	0.003	0.019	0.005	1.253
_x_2	Total variation—theta	0.027	0.004	0.022	0.006	0.028	0.006	0.300
_x_3	Total variation—àlpha	0.044	0.005	0.034	0.011	0.042	0.011	0.215
_x_4	Total variation—beta	0.075	0.008	0.057	0.024	0.062	0.023	0.150
_x_5	Total variation—gamma	0.149	0.019	0.102	0.047	0.103	0.041	0.335
_x_6	Relative power FFT—delta	0.446	0.090	0.628	0.147	0.267	0.220	0.720
_x_7	Relative power FFT—theta	0.159	0.049	0.236	0.119	0.390	0.224	0.417
_x_8	Relative power FFT—alpha	0.162	0.043	0.086	0.066	0.134	0.057	0.316
_x_9	Relative power FFT—beta	0.221	0.075	0.046	0.024	0.205	0.151	0.641
_x_10	Relative power FFT—gamma	0.012	0.010	0.004	0.003	0.004	0.005	0.264
_x_11	St. dev. coeff. DWT—delta	2.825	0.275	3.362	0.290	2.507	0.549	0.810
_x_12	St. dev. coeff. DWT—theta	1.795	0.180	1.709	0.366	2.181	0.505	0.300
_x_13	St. dev. coeff. DWT—alpha	1.266	0.140	0.766	0.175	1.275	0.288	1.276
_x_14	St. dev. coeff. DWT—beta	0.556	0.122	0.267	0.072	0.466	0.146	1.057
_x_15	St. dev. coeff. DWT—gamma	0.154	0.039	0.085	0.028	0.115	0.040	0.596
_x_16	Relative power DWÒ—delta	0.501	0.097	0.708	0.118	0.408	0.175	0.873
_x_17	Relative power DWÒ—theta	0.203	0.039	0.190	0.081	0.311	0.132	0.347
_x_18	Relative power DWÒ—alpha	0.202	0.043	0.077	0.035	0.213	0.097	0.913
_x_19	Relative power DWÒ—beta	0.081	0.038	0.020	0.011	0.060	0.039	0.613
_x_20	Relative power DWÒ—gamma	0.013	0.007	0.005	0.003	0.008	0.006	0.291
_x_21	Correlation dimension—delta	6.979	3.443	6.494	1.605	5.763	1.489	0.045
_x_22	Correlation dimension—theta	4.621	0.594	4.288	0.925	4.206	0.884	0.048
_x_23	Correlation dimension—alpha	4.184	0.442	3.701	0.886	3.230	0.833	0.272
_x_24	Correlation dimension—beta	3.635	0.359	3.097	0.940	2.348	0.832	0.490
_x_25	Correlation dimension—gamma	6.729	1.248	6.374	1.838	4.003	1.994	0.493
_x_26	Largest Lyapunov exp.—delta	3.282	0.873	2.910	0.856	4.203	1.102	0.327
_x_27	Largest Lyapunov exp.—theta	8.213	1.935	8.188	1.914	8.286	1.933	0.000
_x_28	Largest Lyapunov exp.—alpha	17.58	2.165	17.57	2.160	17.58	2.377	0.000
_x_29	Largest Lyapunov exp.—beta	32.91	5.991	32.65	5.977	33.04	5.091	0.001
_x_30	Largest Lyapunov exp.—gamma	11.71	2.985	11.62	2.965	11.89	5.210	0.001

Normalized features extracted from different frequency sub-bands.

The total variation is the only one feature that we extracted in the time domain. In Table 1, it can be noticed that the total variation has a certain potential for the detection of epileptiform activity in EEG signals. However, the total variation is not that much reliable despite the fact that is a pretty well estimated having in mind the duration of each of the analyzed segments.

The periodogram represents a very important feature of the signal in the frequency domain given that based on it we can get a relative contribution of either any individual frequency or a specific frequency band to the total power of the analyzed signal. The periodograms of one epileptic and two non-epileptic (from both unhealthy and healthy tissue) segments of the EEG signals are shown in Figure 4 where it can be noticed that the EEG signal power of is shifting from lower to higher frequencies in the presence of epileptiform activity.

Using the discrete wavelet transform (DWT) we can completely and independently extract higher and lower frequencies from the signal. All that can be done with different resolution in the time domain, i.e., higher resolution in the time domain for higher frequencies and lower resolution in the time domain for lower frequencies. The EEG signal segments were analyzed at four levels, i.e., the discrete wavelet decomposition was performed at four levels as presented in Figure 5. At the first level of decomposition, the original frequency band of the EEG signals (0–60 Hz) was divided into its higher (30–60 Hz) and lower part (0–30 Hz), i.e., the details and the approximation of the signals at the first decomposition level, respectively. Then at the second decomposition level, the frequency band of the approximation from the first level was additionally divided into its higher (15–30 Hz) and lower (0–15 Hz) part, i.e., the details and the approximation of the signals at the second decomposition level, respectively. After all four decomposition levels, the original band was divided into its five sub-bands, i.e., four sub-bands with the details and one sub-band with the approximation. All these five sub-bands approximately correspond to the earlier defined clinical sub-bands. Power distribution of the EEG signals in the time-frequency domain is quite well described by the DWT coefficients. However, in order to reduce the dimension of the problem and make easier further classification we calculated certain statistics of these coefficients in each sub-band such as the standard deviation and the average relative power, i.e., the square of the absolute values of the DWT coefficients.

Given that the EEG signal also roughly represents a dynamics of a very complex non-linear system such as the brain, the non-linear analysis based on the chaos theory was used in order to extract the information that could not been extracted by any of previously described linear techniques. It is interesting to see that unlike the other feature extraction techniques in the field, a complete agreement about if at all and how to perform a non-linear analysis of the EEG signals has not been achieved yet. Thus, quite often it is possible to find contradictory results of such experiments in the literature. For example, the correlation dimension and the largest Lyapunov exponent have completely different values in Hively et al. (), Adeli and Ghosh-Dastidar () and Iasemidis and Sackellares (). The feature extraction techniques and non-linear analysis implemented and used in this research are exclusively based on the chaos theory described in the methods part. In addition, there are no any further subjective adjustments applied to the EEG signals, which provides a high level of reproductivity of the obtained results at any time.

At first, the optimal lag and the embedding dimension were determined in order to reconstruct a segment of the EEG signals in its own lagged phase space. The optimal lag mo was obtained as the first local minimum of the function of the mutual information coefficients. The value of the optimal lag of the most of analyzed segments varied between 5 and 7. The minimum embedding dimension dmin was determined using Cao's technique, i.e., based on the saturation of the embedding function ed, for example as presented in Figure 6 in the case of one segment. In other words when a further increase in the embedding dimension does not result in more than 5% of increase in the embedding function. The value of the embedding function of all 300 segments processed approached 1. In fact, this confirms that there is a certain level of chaos present in the segments of the EEG signals. That chaos is not random but deterministic given that the value of the redefined embedding function e*d is not constant for all values of the embedding dimension as it can be seen in Figure 6. The value of the minimum embedding dimension varied between 4 and 10.

After reconstruction of the EEG signals in the lagged phase space, the correlation dimension of attractor was estimated using the Taken's estimator. After a few tests the value of radius ε in the phase space was set to 5% of the total size of the attractor since the higher values resulted into to many points, and the smaller ones into insufficient number of points for a good estimation of the correlation dimension. From Table 1, it can be concluded that the correlation dimension as a non-linear feature has a potential for detection of epileptiform activity in EEG signals. It is also obvious that the attractor complexity, i.e., the chaotic behavior of the EEG signals, is lower in presence of epileptiform activity. The values of the correlation dimension in all cases were higher than the embedding dimension of the lagged phase space, which is in accordance with the chaos theory.

The largest Lyapunov exponent as a measure of signal predictability was estimated using Sato's technique. At first, the prediction error as a function of number of samples k was determined as shown in Figure 7 in the case of one segment. Then, the largest Lyapunov exponent was estimated based on the function's slope in its medium part. As it can be seen in Table 1, the largest Lyapunov exponent has smaller discrimination ability compared with the correlation dimension. Additionally, it can be also noticed that the presence of epileptiform activity reduces the predictability of the EEG signals since the largest Lyapunov exponent is slightly higher in that case.

Dimension reduction in feature space

After the feature extraction from all the segments of the EEG signals, obviously none of the individually extracted features is sufficiently reliable for detection of epileptiform activity in EEG signals. This fact represents the main reason to perform the feature extraction in a few different domains of interest, i.e., time, frequency, time-frequency domain and non-linear analysis. The assumption is that the each of them contains some new information about the EEG signal, i.e., the information which is not present in any other domain and thus later contributes to more accurate classification and detection. Therefore, a better separability between the classes of epileptic and non-epileptic segments is expected after an optimal combination of the features from different domains than in the case of using only features from one domain as it is the case with almost all the literature in the field.

Both the KL expansion technique and the dimension reduction technique based on the scatter matrices were tested on the features from all the domains. The obtained results, i.e., adequate separability indexes before and after the dimension reduction in the feature space are presented in Table 2. The reduction technique based on the scatter matrices gives better results in all the domains of interest and also results into the separability index that is, as expected, greater than any individual separability index given in Table 1.

Features analyzed	Dimension	Separability index
Before	After	KL expansion	By the scatter matrices
Time domain (_x_1−5)	5	2	1.93	2.13
Frequency domain (_x_6−10)	5	2	1.25	2.16
Time-frequency domain (_x_11−15)	10	2	1.40	4.78
Non-linear analysis (_x_16−20)	10	2	1.07	1.15

Separability indexes after application of two different techniques for dimension reduction in feature space.

In Table 2, one can see that out of all the analyzed features, the highest separability index and the best discrimination characteristics between epileptic and non-epileptic segments have the features obtained in time-frequency domain after the DWT. However, the other features despite their lower separability indexes are also useful for later classification that is concluded based on an additional analysis whose results are presented in Table 3. It can be noticed that starting from the features in time domain the separability index increases by a gradual inclusion of the features from other domains.

Features analyzed	Dimension	Separability index
Before	After
Time domain (_x_1−5)	5	2	2.13
Including frequency domain (_x_1-10)	10	2	3.52
Including time-frequency domain (_x_1−20)	20	2	6.74
Including non-linear analysis (_x_1−30)	30	2	8.78

Separability indexes after the reduction based on the scatter matrices and gradual involvement of features from different domains.

Unlike the previous figures, Figure 8 shows 50 original nineteen-dimensional feature vectors X, which correspond to 50 segments from each of the three classes of the EEG signals, mapped into their new reduced two-dimensional feature space. All these 150 two-dimensional vectors Y will be later used in the next section for the design of appropriate classifiers while the rest of 150 segments and their corresponding feature vectors will be used to test the performance of the designed classifiers as well as the total accuracy of the new technique for detection of epileptiform activity in EEG signals.

Classification

After the reduction of the feature space dimension to two, the next step is the design of appropriate classifiers that can separate epileptic from non-epileptic segments of the EEG signals in the reduced feature space shown in Figure 8. This represents the last step in design of the new technique for detection of epileptiform activity in EEG signals. Having in mind the nature of the EEG signals and possible changes in their statistical properties it is very desired to use robust classifiers. Based on Figure 8 it can be concluded that quadratic classifiers represent quite logical choice for classification even though these three classes of the EEG signals are also piecewise linearly separable but with a much higher classification error. In total two quadratic classifiers were designed following the procedure described in Section Design of Quadratic Classifiers.

As it can be seen in Figure 9, the first classifier separates the non-epileptic segments of the EEG signals of healthy brain tissue (in green) from the non-epileptic segments of unhealthy tissue (in blue) as well as from the epileptic segments (in red). This classifier is defined using the following equation:

where the unknown parameters are _q_11 = −4870.8, _q_12 = _q_21 = −239.9, _q_22 = −174.9, ν1 = −29.2, ν2 = −174.9 and ν0 = −2.3. After that, the second classifier which separates the remaining two unseparated classes of the EEG signals segments, i.e., the epileptic and the non-epileptic segments of unhealthy brain tissue, was designed. The parameters of the Equation (44) for this classifier are _q_11 = −436.7, _q_12 = _q_21 = −128.2, _q_22 = 444.6, ν1 = −237.9, ν2 = −57.2 and ν0 = 0.5 while the classifier itself is shown in Figure 10.

The performance of the designed classifiers and thus the new technique for detection of epileptiform activity in EEG signals was tested by classification of the remaining 150 segments which were not previously used during the design procedure. The obtained results are presented in Figure 11, where the piecewise quadratic classifier is just a combination of two quadratic classifiers.

The classification results can also be represented by a confusion matrix that is given in Table 4, where its each cell contains number of classified features for each combination of three classes of the EEG signals segments. Based on the confusion matrix and Figure 11, it can be concluded that all the non-epileptic segments of healthy tissue were correctly classified. However, the remaining two classes contained one segment each which was incorrectly classified, i.e., classified as it belongs to the other class. The statistical performances such as sensitivity, specificity and accuracy, of the designed piecewise quadratic classifiers are presented in Table 5. As it can be seen, the total accuracy of the new technique for detection of epileptiform activity in EEG signals is 98.7%. Typically, quadratic classifiers are robust and do not exhibit overtraining when the number of parameters to be estimated is much less than the number of samples as in this case. Anyway, it is a good practice to cross validate this piecewise classifier in order to ensure its stability. A five-fold cross validation was performed and it resulted in the cross-validation loss, i.e., the error of the out-of-fold samples, of 1.7%. Even though it is slightly higher than the classification error of 1.3% it gives a confidence that the classifier is reasonably accurate.

EEG signals (input/output)	Non-epileptic	Epileptic
Healthy	Unhealthy
Non-epileptic of healthy brain tissue	50	0	0
Non-epileptic of unhealthy brain tissue	0	49	1
Epileptic	0	1	49

Confusion matrix.

EEG signals	Statistical performances [%]
Sensitivity	Specificity	Accuracy
Non-epileptic of healthy brain tissue	100	100	98.7
Non-epileptic of unhealthy brain tissue	98	99
Epileptic	98	99

Statistical performances.