G. Einicke - Independent Researcher
Papers by G. Einicke
IET Signal Processing, 2019
A linear state-space model is described whose second-order moments match that of a hidden Markov chain. This model enables a modified transition probability matrix to be employed within minimum-variance filters and smoothers. However, the ensuing filter/smoother designs can exhibit suboptimal performance because a previously-reported transition-probability-matrix modification is conservative, and identified models can lack observability and reachability. This paper describes a less-conservative transition-probability-matrix modification and a model-order-reduction procedure to enforce observability and reachability. An optimal minimum-variance predictor, filter and smoother are derived to recover the Markov chain states from noisy measurements. The predictor is asymptotically stable provided that the problem assumptions are correct. It is shown that collapsing the model improves state prediction performance. The filter and smoother recover the Markov states exactly when the measurement noise is negligible. A mining vehicle position tracking application is discussed in which performance benefits are demonstrated.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The previously-discussed optimal Kalman filter [1] – [3] is routinely used for tracking observed and unobserved states whose second-order statistics change over time. It is often assumed within Kalman filtering applications that one or more random variable sequences are generated by a random walk or an autoregressive process. That is, common Kalman filter parameterisations do not readily exploit knowledge about the random variables’ probability distributions. More precisely, the filter is optimal only for Gaussian variables whose first and second order moments completely specify all relevant probability distributions. For non-Gaussian data, the filter is only optimal over all linear filters [1].
Rather than assuming that random variable sequences are generated by autoregressive processes they may alternatively be modelled as Markov chains. The phrase ‘Markov chain’ was first coined in 1926 by a Russian mathematician S. N. Bernstein to acknowledge previous discoveries made by Andrei Andreevich Markov [4]. Markov was a professor at St Petersburg University and a member of the St Petersburg Academy of Sciences, which was a hub for scientific advances in many fields including probability theory. Indeed, Markov, along with fellow academy members D. Bernoulli, V. Y. Bunyakovsky and P. L. Chebyshev, all wrote textbooks on probability theory. Markov extended the weak law of large numbers and the central limit theorem to certain sequences of dependent random variables forming special classes of what are now known as Markov chains [4].
The basic theory of hidden Markov models (HMMs) was first published by Baum et al. in the 1960s [5]. HMMs were introduced to the speech recognition field in the 1970s by J. Baker at CMU [6], and F. Jelinek and his colleagues at IBM [7]. One of the most influential papers on HMM filtering and smoothing was the tutorial exposition by L. Rabiner [8], which has been accorded a large number of citations. Rabiner explained how to implement the forward-backward algorithm for estimating Markov state probabilities, together with the Baum-Welch algorithm (also known as the Expectation Maximisation algorithm). HMM filters and smoothers can be advantageous in applications where sequences of alphabets occur [8] - [10]. For example, in automatic speech recognition, sentence and language models can be constructed by concatenating phoneme and word-level HMMs. Similarly, stroke, character, word and context HMMs can be used in handwriting recognition. HMMs have also been useful in modelling biological sequences such as proteins and DNA.
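For concreteness, a minimal sketch of the forward recursion for filtered Markov state probabilities is given below; the two-state transition and emission matrices are illustrative assumptions rather than values taken from any of the cited works.

```python
import numpy as np

def hmm_forward(A, B, pi, observations):
    """Return the filtered state probabilities P(x_k | z_1..z_k) at each step."""
    alpha = pi * B[:, observations[0]]
    alpha /= alpha.sum()                      # normalise to avoid underflow
    filtered = [alpha]
    for z in observations[1:]:
        alpha = (A.T @ alpha) * B[:, z]       # predict with the transition matrix, then correct
        alpha /= alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)

# Illustrative two-state chain observed through a noisy binary sensor.
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])                  # transition probability matrix
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])                    # emission probabilities
print(hmm_forward(A, B, np.array([0.5, 0.5]), [0, 0, 1, 1, 1]))
```

Smoothed probabilities follow from an analogous backward pass over the same quantities.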
The Doob–Meyer decomposition theorem [11] states that a stochastic process may be decomposed into the sum of two parts, namely, a prediction and an input process. The standard Kalman filter [1] makes use of both the prediction and the input-process assumptions and attains minimum-variance optimality. In contrast, the standard hidden Markov model filter/smoother relies exclusively on (Markov model) prediction and is optimum in a Bayesian sense [8] - [10]. It is shown below that minimum-variance and HMM techniques can be combined for improved state recovery.
The minimum-variance, HMM and combined-minimum-variance-HMM predictions are only calculated from states at the previous time step. Improved predictions can be calculated from states at multiple previous time steps. The desired interdependencies between multiple previous states are conveniently captured by constructing high-order-Kronecker-product state vectors. The theory and implementation of such high-order-minimum-variance-HMM filters are also described below.
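A toy sketch of the Kronecker-product construction is shown below; it simply forms a second-order vector from the probability vectors at two consecutive steps and is an illustration only, not the book's construction.

```python
import numpy as np

p_km1 = np.array([0.7, 0.3])      # P(x_{k-1}) over a two-letter alphabet
p_k   = np.array([0.6, 0.4])      # P(x_k)
x_aug = np.kron(p_km1, p_k)       # 4-element second-order state vector
# Under independence the entries are joint probabilities, ordered (1,1), (1,2), (2,1), (2,2).
print(x_aug, x_aug.sum())         # entries sum to 1
```

The augmented vector grows geometrically with the order, which is the source of the calculation overheads noted below.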
The aforementioned developments are driven by our rapacious appetites for improved estimator performance. In principle, each additional embellishment, spanning HMM filters, minimum-variance-HMM filters to high-order-minimum-variance-HMM filters, has the potential to provide further performance gains, subject to the usual proviso that the underlying modelling assumptions are correct. Needless to say, significantly higher calculation overheads must be reconciled against any performance benefits.
Some prerequisites, namely, some results from probability theory including Markov processes, are introduced in Section 11.2. Bayes’ theorem is judiciously applied in Section 11.3 to derive the HMM filters and smoothers for time-homogeneous processes. A state-space model having an output covariance equivalent to an HMM is derived in Section 11.4. This enables transition probability matrices to be employed in optimal filter and smoother constructions that minimise the error variance. Section 11.5 describes high-order-minimum-variance-HMM filters, which employ Kronecker product states.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The Kalman filter is widely used for linear estimation problems where its behaviour is well-understood. Under prescribed conditions, the estimated states are unbiased and stability is guaranteed. Many real-world problems are nonlinear, which requires amendments to the linear solutions. If the nonlinear models can be expressed in a state-space setting then the Kalman filter may find utility by applying linearisations at each time step. In the two-dimensional case, linearising means finding tangents to the curves of interest about the current estimates, so that the standard filter recursions can be employed in tandem to produce predictions for the next step. This approach is known as extended Kalman filtering – see [1] – [5].
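As an illustration of the linearisation step, a minimal first-order EKF recursion is sketched below; f and h denote user-supplied nonlinear state and measurement functions, and F and H their Jacobians (the "tangents") evaluated at the current estimate. The names and structure are illustrative assumptions rather than the chapter's exact notation.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One predict/correct cycle of a first-order extended Kalman filter."""
    # Predict by propagating the estimate through f and linearising about it
    x_pred = f(x)
    F_k = F(x)
    P_pred = F_k @ P @ F_k.T + Q
    # Correct using the measurement model linearised about the prediction
    H_k = H(x_pred)
    S = H_k @ P_pred @ H_k.T + R
    K = P_pred @ H_k.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
    return x_new, P_new
```

When f and h are linear, F and H are constant and the recursion reduces to the standard Kalman filter.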
Extended Kalman filters (EKFs) revert to optimal Kalman filters when the problems become linear. Thus, EKFs can yield approximate minimum-variance estimates. However, there are no accompanying performance guarantees and they fall into the try-at-your-own-risk category. Indeed, Anderson and Moore [3] caution that the EKF “can be satisfactory on occasions”. A number of compounding factors can cause performance degradation. The approximate linearisations may be crude and are carried out about estimated states (as opposed to true states). Observability problems occur when the variables do not map onto each other, giving rise to discontinuities within estimated state trajectories. Singularities within functions can result in non-positive solutions to the design Riccati equations and lead to instabilities.
The discussion includes suggestions for performance improvement and is organised as follows. The next section begins with Taylor series expansions, which are prerequisites for linearisation. First, second and third-order EKFs are then derived. EKFs tend to be prone to instability, and a way of enforcing stability is to replace the design Riccati equation with a faux version. This faux algebraic Riccati equation technique [6] – [10] is presented in Section 10.3. In Section 10.4, the higher order terms discarded by an EKF are treated as uncertainties. It is shown that a robust EKF arises by solving a scaled H∞ problem in lieu of one possessing uncertainties. Nonlinear smoother procedures can be designed similarly. The use of fixed-lag and Rauch-Tung-Striebel smoothers may be preferable from a complexity perspective. However, the approximate minimum-variance and robust smoothers, which are presented in Section 10.5, revert to optimal solutions when the nonlinearities and uncertainties diminish. Another way of guaranteeing stability is by imposing constraints, and one such approach is discussed in Section 10.6.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The previously-discussed optimum predictor, filter and smoother solutions assume that the model parameters are correct, the noise processes are white and their associated covariances are known precisely. These solutions are optimal in a mean-square-error sense, that is, they provide the best average performance. If the above assumptions are correct, then the filter’s mean-square-error equals the trace of the design error covariance. The underlying modelling and noise assumptions are often a convenient fiction. They do, however, serve to allow estimated performance to be weighed against implementation complexity.
In general, robustness means “the persistence of a system’s characteristic behaviour under perturbations or conditions of uncertainty” [1]. In an estimation context, robust solutions refer to those that accommodate uncertainties in problem specifications. They are also known as worst-case or peak error designs. The standard predictor, filter and smoother structures are retained but a larger design error covariance is used to account for the presence of modelling error.
Designs that cater for worst cases are likely to exhibit poor average performance. Suppose that a bridge designed for average loading conditions returns an acceptable cost benefit. Then a robust design that is focussed on accommodating infrequent peak loads is likely to provide worse cost performance. Similarly, a worst-case shoe design that accommodates rarely occurring large feet would provide poor fitting performance on average. That is, robust designs tend to be conservative. In practice, a trade-off may be desired between optimum and robust designs.
The material canvassed herein is based on the H∞ filtering results from robust control. The robust control literature is vast, see [2] – [33] and the references therein. As suggested above, the H∞ solutions of interest here involve observers having gains that are obtained by solving Riccati equations. This Riccati equation solution approach relies on the Bounded Real Lemma – see the pioneering work by Vaidyanathan [2] and Petersen [3]. The Bounded Real Lemma is implicit within game theory [9] – [19]. Indeed, the continuous-time solutions presented in this section originate from the game theoretic approach of Doyle, Glover, Khargonekar, Francis, Limebeer, Anderson, Green, Theodore and Shaked, see [4], [13], [15], [21]. The discussed discrete-time versions stem from the results of Limebeer, Green, Walker, Yaesh, Shaked, Xie, de Souza and Wang, see [5], [11], [18], [19], [21]. In the parlance of game theory: “a statistician is trying to best estimate a linear combination of the states of a system that is driven by nature; nature is trying to cause the statistician’s estimate to be as erroneous as possible, while trying to minimise the energy it invests in driving the system” [19].
Pertinent state-space H∞ predictors, filters and smoothers are described in [4] – [19]. Some prediction, filtering and smoothing results are summarised in [13] and methods for accommodating model uncertainty are described in [14], [18], [19]. The aforementioned methods for handling model uncertainty can result in conservative designs (that depart far from optimality). This has prompted the use of linear matrix inequality solvers in [20], [23] to search for optimal solutions to model uncertainty problems.
It is explained in [15], [19], [21] that a saddle-point strategy for the games leads to robust estimators, and the resulting robust smoothing, filtering and prediction solutions are summarised below. While the solution structures remain unchanged, designers need to tweak the scalar within the underlying Riccati equations.
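To illustrate where that scalar enters, the sketch below shows one common textbook form of a discrete-time H∞ covariance recursion; it is an assumed simplification for the state-estimation case and is not claimed to be this chapter's exact equations.

```python
import numpy as np

def hinf_covariance_step(P, A, C, L, Q, R, gamma):
    """One step of a simplified discrete-time H-infinity Riccati recursion.

    The only structural change from the Kalman recursion is the extra
    -gamma^{-2} L'L term; letting gamma grow large recovers the Kalman filter.
    """
    M = np.linalg.inv(P) + C.T @ np.linalg.inv(R) @ C - (gamma ** -2) * (L.T @ L)
    if np.any(np.linalg.eigvalsh(M) <= 0):
        raise ValueError("gamma too small: the recursion is infeasible")
    P_corr = np.linalg.inv(M)           # corrected covariance
    return A @ P_corr @ A.T + Q         # predicted covariance for the next step
```

In this sense the robust design tweaks the scalar γ rather than the estimator structure: the feasibility check fails when γ is chosen too small, and the Kalman recursion is recovered as γ → ∞.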
This chapter has two main parts. Section 9.2 describes robust continuous-time solutions and the discrete-time counterparts are presented in Section 9.3. The previously discussed techniques each rely on a trick. The optimum filters and smoothers arise by completing the square. In maximum-likelihood estimation, a function is differentiated with respect to an unknown parameter and then set to zero. The trick behind the described robust estimation techniques is the Bounded Real Lemma, which opens the discussions.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Predictors, filters and smoothers have previously been described for state recovery under the assumption that the parameters of the generating models are correct. More often than not, the problem parameters are unknown and need to be identified. This section describes some standard statistical techniques for parameter estimation. Paradoxically, the discussed parameter estimation methods rely on having complete state information available. Although this is akin to a chicken-and-egg argument (state availability obviates the need for filters along with their attendant requirements for identified models), the task is not insurmountable.
The role of solution designers is to provide a cost benefit. That is, their objectives are to deliver improved performance at an acceptable cost. Inevitably, this requires simplifications so that the problems become sufficiently tractable and amenable to feasible solution. For example, suppose that speech emanating from a radio is too noisy and barely intelligible. In principle, high-order models could be proposed to equalise the communication channel, demodulate the baseband signal and recover the phonemes. Typically, low-order solutions tend to offer better performance because of the difficulty in identifying large numbers of parameters under low-SNR conditions. Consider also the problem of monitoring the output of a gas sensor and triggering alarms when environmental conditions become hazardous. Complex models could be constructed to take into account diurnal pressure variations, local weather influences and transients due to passing vehicles. It often turns out that low-order solutions exhibit lower false alarm rates because there are fewer assumptions susceptible to error. Thus, the absence of complete information need not inhibit solution development. Simple schemes may suffice, such as conducting trials with candidate parameter values and assessing the consequent error performance.
In maximum-likelihood estimation [1] – [5], unknown parameters θ1, θ2, …, θM, are identified given states, xk, by maximising a log-likelihood function, log f(θ1, θ2, …, θM | xk). For example, the subject of noise variance estimation was studied by Mehra in [6], where maximum-likelihood estimates (MLEs) were updated using the Newton-Raphson method. Rife and Boorstyn obtained Cramér-Rao bounds for some MLEs, which “indicate the best estimation that can be made with the available data” [7]. Nayak et al. used the pseudo-inverse to estimate unknown parameters in [8]. Bélanger subsequently employed a least-squares approach to estimate the process noise and measurement noise variances [9]. A recursive technique for least-squares parameter estimation was developed by Strejc [10]. Dempster, Laird and Rubin [11] proved the convergence of a general purpose technique for solving joint state and parameter estimation problems, which they called the expectation-maximisation (EM) algorithm. They addressed problems where complete (state) information is not available to calculate the log-likelihood and instead maximised the expectation of log f(θ1, θ2, …, θM | xk), given incomplete measurements, zk. That is, by virtue of Jensen’s inequality the unknowns are found by using an objective function (also called an approximate log-likelihood function), Q(θ1, θ2, …, θM) = E{log f(θ1, θ2, …, θM | xk) | zk}, as a surrogate for log f(θ1, θ2, …, θM | xk).
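As a concrete instance of parameter estimation from complete state information, the sketch below computes the standard least-squares (and, under Gaussian assumptions, maximum-likelihood) estimates of a state matrix and process noise covariance from a recorded state sequence; the array shapes and names are illustrative assumptions.

```python
import numpy as np

def identify_A_Q(X):
    """Estimate A and Q in x_{k+1} = A x_k + w_k from states X of shape (N+1, n)."""
    X0, X1 = X[:-1], X[1:]
    # A_hat = (sum x_{k+1} x_k')(sum x_k x_k')^{-1}, via the normal equations
    A_hat = np.linalg.solve(X0.T @ X0, X0.T @ X1).T
    W = X1 - X0 @ A_hat.T                 # one-step residuals
    Q_hat = W.T @ W / len(W)              # sample covariance of the residuals
    return A_hat, Q_hat
```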
The system identification literature is vast and some mature techniques have evolved. Subspace identification methods have been developed for general problems where a system’s stochastic inputs, deterministic inputs and outputs are available. The subspace algorithms [12] – [14] consist of two steps. First, the order of the system is identified from stacked vectors of the inputs and outputs. Then the unknown parameters are determined from an extended observability matrix.
Continuous-time maximum-likelihood estimation has been mentioned previously. Here, the attention is focussed on the specific problem of joint state and parameter estimation exclusively from discrete measurements of a system’s outputs. The developments proceed as follows. Section 8.2 reviews the maximum-likelihood estimation method for obtaining unknown parameters. The same estimates can be found using the method of least squares, which was pioneered by Gauss for fitting astronomical observations. Well known (filtering) EM algorithms for variance and state matrix estimation are described in Section 8.3. Improved parameter estimation accuracy can be obtained via smoothing EM algorithms, which are introduced in Section 8.4.
Although EM algorithms can yield improved state matrix and input process covariance estimates, it has been found that they are only accurate when the measurement noise is negligible. Similarly in subspace identification, least-squares estimation of unknown state-space matrices can lead to biased results when the states are corrupted by noise. This arises because the standard least-squares and maximum-likelihood estimates of a state matrix and an input process covariance themselves are biased in the presence of measurement noise. Therefore, a correction term is introduced in Section 8.5 to eliminate the bias error. This yields unbiased, consistent, closed-form estimates of a state matrix, an input process covariance and a measurement noise covariance. Under simplifying conditions they are equal to MLEs and attain the corresponding Cramér-Rao lower bounds (CRLBs).
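The flavour of such a correction can be sketched for the simple case yk = xk + vk with known white measurement noise covariance R; this is a generic errors-in-variables adjustment given purely for illustration and is not claimed to be Section 8.5's exact formula.

```python
import numpy as np

def identify_A_bias_corrected(Y, R):
    """Estimate A from noisy states Y (shape (N+1, n)) with measurement noise covariance R."""
    Y0, Y1 = Y[:-1], Y[1:]
    N = len(Y0)
    S_cross = Y1.T @ Y0                   # sum y_{k+1} y_k' (noise is white, so unbiased)
    S_auto = Y0.T @ Y0 - N * R            # sum y_k y_k' over-counts by N*R
    return S_cross @ np.linalg.inv(S_auto)
```

Without the N*R subtraction the denominator is inflated by the noise, which biases the naive estimate of A towards zero.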
The use of the MLEs and the filtering and smoothing EM algorithms discussed herein requires caution. When perfect information and sufficiently large sample sizes are available, the corresponding likelihood functions are exact. However, the use of imperfect information leads to approximate likelihood functions and biased MLEs, which can degrade parameter estimation accuracy and follow-on filter/smoother performance.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition Kindle, 2019
Observations are invariably accompanied by measurement noise and optimal filters are the usual solution of choice. Filter performances that fall short of user expectations motivate the pursuit of smoother solutions. Smoothers promise useful mean-square-error improvement at mid-range signal-to-noise ratios, provided that the assumed model parameters and noise statistics are correct.
In general, discrete-time filters and smoothers are more practical than the continuous-time counterparts. Often a designer may be able to value-add by assuming low-order discrete-time models which bear little or no resemblance to the underlying processes. Continuous-time approaches may be warranted only when application-specific performance considerations outweigh the higher overheads.
This chapter canvasses the main discrete-time fixed-point, fixed-lag and fixed interval smoothing results [1] – [9]. Fixed-point smoothers [1] calculate an improved estimate at a prescribed past instant in time. Fixed-lag smoothers [2] – [3] find application where small end-to-end delays are tolerable, for example, in press-to-talk communications or receiving public broadcasts. Fixed-interval smoothers [4] – [9] dispense with the need to fine tune the time of interest or the smoothing lags. They are suited to applications where processes are staggered such as delayed control or off-line data analysis. For example, in underground coal mining, smoothed position estimates and control signals can be calculated while a longwall shearer is momentarily stationary at each end of the face [9]. Similarly, in exploration drilling, analyses are typically carried out post-data acquisition.
The smoother descriptions are organised as follows. Section 7.2 sets out two prerequisites: time-varying adjoint systems and Riccati difference equation comparison theorems. Fixed-point, fixed-lag and fixed-interval smoothers are discussed in Sections 7.3, 7.4 and 7.5, respectively. It turns out that the structures of the discrete-time smoothers are essentially the same as those of the previously-described continuous-time versions. Differences arise in the calculation of Riccati equation solutions and the gain matrices. Consequently, the treatment is somewhat condensed. It is reaffirmed that the above-mentioned smoothers outperform the Kalman filter and the minimum-variance smoother provides the best performance.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The previously-described minimum-mean-square-error and minimum-variance filtering solutions operate on measurements up to the current time. If some processing delay can be tolerated then improved estimation performance can be realised through the use of smoothers. There are three state-space smoothing technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing. Fixed-point smoothing refers to estimating some linear combination of states at a previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is assumed between the measurement and on-line estimation processes. Fixed-interval smoothing is for retrospective data analysis, where measurements recorded over an interval are used to obtain the improved estimates. Compared to filtering, smoothing has a higher implementation cost, as it has increased memory and calculation requirements.
A large number of smoothing solutions have been reported since Wiener’s and Kalman’s development of the optimal filtering results – see the early surveys [1] – [2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well known. Two fixed-interval smoother solutions, namely the maximum-likelihood smoother developed by Rauch, Tung and Striebel [3], and the two-filter Fraser-Potter formula [4], have been in widespread use since the 1960s. However, the minimum-variance fixed-interval smoother is not well known. This smoother is simply a time-varying state-space generalisation of the optimal Wiener solution. It differs from the Rauch-Tung-Striebel and Fraser-Potter solutions, which may not sit well with more orthodox practitioners.
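For reference, a discrete-time sketch of the Rauch-Tung-Striebel backward pass is given below (the continuous-time form discussed in this chapter has an analogous structure); it assumes a forward Kalman filter has already stored the filtered and predicted moments, and the variable names are illustrative.

```python
import numpy as np

def rts_backward(A, xf, Pf, xp, Pp):
    """Backward pass over forward-filter outputs.

    xf, Pf: filtered means/covariances, shapes (N+1, n) and (N+1, n, n).
    xp, Pp: one-step predictions aligned so xp[k+1] predicts x_{k+1} from time k.
    """
    N = len(xf) - 1
    xs, Ps = xf.copy(), Pf.copy()
    for k in range(N - 1, -1, -1):
        G = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])        # smoother gain
        xs[k] = xf[k] + G @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + G @ (Ps[k + 1] - Pp[k + 1]) @ G.T
    return xs, Ps
```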
The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval smoothing are canvassed here. It is assumed throughout that the underlying noise processes are zero mean and uncorrelated. Nonzero means and correlated processes can be handled using the approaches of Chapters 3 and 4. It is also assumed here that the noise statistics and state-space model parameters are known precisely. Note that techniques for estimating parameters and accommodating uncertainty are addressed subsequently.
Some prerequisite concepts, namely time-varying adjoint systems, backwards differential equations, Riccati equation comparison and the continuous-time maximum-likelihood method, are covered in Section 6.2. Section 6.3 outlines a derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother reported by Sage et al. [6] and Moore [7] is the subject of Section 6.4. Section 6.5 deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance fixed-interval smoother solutions [8] - [10]. As before, the approach here is to accompany the developments, where appropriate, with proofs about performance being attained. Smoothing is not a panacea for all ills. If the measurement noise is negligible then smoothing (and filtering) may be superfluous. Conversely, if measurement noise obliterates the signals then data recovery may not be possible. Therefore, estimator performance is often discussed in terms of the prevailing signal-to-noise ratio.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
This chapter presents the minimum-variance filtering results simplified for the case when the model parameters are time-invariant and the noise processes are stationary. The filtering objective remains the same, namely, the task is to estimate a signal in such a way as to minimise the filter error covariance. A somewhat naïve approach is to apply the standard filter recursions using the time-invariant problem parameters. Although this approach is valid, it involves recalculating the Riccati difference equation solution and filter gain at each time-step, which is computationally expensive. A lower implementation cost can be realised by recognising that the Riccati difference equation solution asymptotically approaches the solution of an algebraic Riccati equation. In this case, the algebraic Riccati equation solution and hence the filter gain can be calculated before running the filter.
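A minimal sketch of this offline calculation, using SciPy's algebraic Riccati solver on illustrative model parameters:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)          # process noise covariance (illustrative values)
R = np.array([[0.1]])         # measurement noise covariance

P = solve_discrete_are(A.T, C.T, Q, R)                 # steady-state predicted error covariance
K = A @ P @ C.T @ np.linalg.inv(C @ P @ C.T + R)       # fixed predictor gain
print(K)
```

The gain K is then held fixed while the filter runs, avoiding a Riccati update at every time-step.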
The steady-state discrete-time Kalman filtering literature is vast and some of the more accessible accounts [1] – [14] are canvassed here. The filtering problem and the application of the standard time-varying filter recursions are described in Section 5.2. An important criterion for checking whether the states can be uniquely reconstructed from the measurements is observability. For example, sometimes states may be internal or sensor measurements might not be available, which can result in the system having hidden modes. Section 5.3 describes two common tests for observability, namely, checking that an observability matrix or an observability gramian is of full rank. The subject of Riccati equation monotonicity and convergence has been studied extensively by Chan [4], De Souza [5], [6], Bitmead [7], [8], Wimmer [9] and Wonham [10], and is discussed in Section 5.4. Chan et al. [4] also showed that if the underlying system is stable and observable then the minimum-variance filter is stable. Section 6 describes a discrete-time version of the Kalman-Yakubovich-Popov Lemma, which states for time-invariant systems that solving a Riccati equation is equivalent to spectral factorisation. In this case, the Wiener and Kalman filters are the same.
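The first of those tests can be sketched as follows; the model pair is an illustrative example in which one output matrix leaves a hidden mode.

```python
import numpy as np

def is_observable(A, C):
    """Stack C, CA, ..., CA^{n-1} and check for full column rank n."""
    n = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])
    return np.linalg.matrix_rank(O) == n

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
print(is_observable(A, np.array([[1.0, 0.0]])))   # True: both modes are visible
print(is_observable(A, np.array([[0.0, 1.0]])))   # False: the first state is a hidden mode
```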
Since the optimal filter is model-based, any unknown model parameters need to be estimated (as explained in Chapter 7) prior to implementation. The estimated parameters can be inexact which leads to degraded filter performance. An iterative frequency weighting procedure is described in Section 5.5 for mitigating the performance degradation.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Kalman filters are employed wherever it is desired to recover data from the noise in an optimal way, such as in satellite orbit estimation, aircraft guidance, radar, communication systems, navigation, medical diagnosis and finance. Continuous-time problems that possess differential equations may be easier to describe in a state-space framework; however, the filters have higher implementation costs because an additional integration step and higher sampling rates are required. Conversely, although discrete-time state-space models may be less intuitive, the ensuing filter difference equations can be realised immediately.
The discrete-time Kalman filter calculates predicted states via a linear recursion in which the predictor gain is a function of the noise statistics and the model parameters. This solution was reported by Rudolf E. Kalman in the 1960s [1], [2]. He has since received many awards and prizes, including the National Medal of Science, which was presented to him by President Barack Obama in 2009.
The Kalman filter calculations are simple and well-established. A possibly troublesome obstacle is expressing problems at hand within a state-space framework. This chapter derives the main discrete-time results to provide familiarity with state-space techniques and filter application. The continuous-time and discrete-time minimum-mean-square-error Wiener filters were derived using a completing-the-square approach in Chapters 1 and 2, respectively. Similarly for time-varying continuous-time signal models, the derivation of the minimum-variance Kalman filter, presented in Chapter 3, relied on a least-mean-square (or conditional-mean) formula. This formula is used again in the solution of the discrete-time prediction and filtering problems. Predictions can be used when the measurements are irregularly spaced or missing at the cost of increased mean-square-error.
This chapter develops the prediction and filtering results for the case where the problem is nonstationary or time-varying. It is routinely assumed that the process and measurement noises are zero mean and uncorrelated. Nonzero mean cases can be accommodated by including deterministic inputs within the state prediction and filter output updates. Correlated noises can be handled by adding a term within the predictor gain and the underlying Riccati equation. The same approach is employed when the signal model possesses a direct-feedthrough term. A simplification of the generalised regulator problem from control theory is presented, from which the solutions of output estimation, input estimation (or equalisation), state estimation and mixed filtering problems follow immediately.
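For reference, the resulting predictor/corrector recursion for the basic case (zero-mean, uncorrelated noises and no direct-feedthrough term) can be sketched as follows; the function signature is an illustrative assumption.

```python
import numpy as np

def kalman_step(x, P, z, A, C, Q, R):
    """One measurement correction at time k followed by the prediction for k+1."""
    # Correct the predicted estimate with the new measurement
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    x_filt = x + K @ (z - C @ x)
    P_filt = (np.eye(len(x)) - K @ C) @ P
    # Predict one step ahead
    x_pred = A @ x_filt
    P_pred = A @ P_filt @ A.T + Q
    return x_filt, P_filt, x_pred, P_pred
```

Deterministic inputs, correlated noises and feedthrough terms enter as additional terms in the prediction, the gain and the Riccati update, as described above.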
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Rudolf E. Kalman studied discrete-time linear dynamic systems for his master’s thesis at MIT in 1954. He commenced work at the Research Institute for Advanced Studies (RIAS) in Baltimore during 1957 and nominated Richard S. Bucy to join him in 1958 [1]. Bucy recognised that the nonlinear ordinary differential equation studied by an Italian mathematician, Count Jacopo F. Riccati, in around 1720, now called the Riccati equation, is equivalent to the Wiener-Hopf equation for the case of finite dimensional systems [1], [2]. In November 1958, Kalman recast the frequency domain methods developed by Norbert Wiener and Andrei N. Kolmogorov in the 1940s into state-space form [2]. Kalman noted in his 1960 paper [3] that generalising the Wiener solution to nonstationary problems was difficult, which motivated his development of the optimal discrete-time filter in a state-space framework. He described the continuous-time version with Bucy in 1961 [4] and published a generalisation in 1963 [5]. Bucy later investigated the monotonicity and stability of the underlying Riccati equation [6]. The continuous-time minimum-variance filter is now commonly attributed to both Kalman and Bucy.
Compared to the Wiener Filter, Kalman’s state-space approach has the following advantages.
• It is applicable to time-varying problems.
• As noted in [7], [8], the state-space parameters can be linearisations of nonlinear models.
• The burdens of spectral factorisation and pole-zero cancelation are replaced by the easier task of solving a Riccati equation.
• It is a more intuitive model-based approach in which the estimated states correspond to those within the signal generation process.
Kalman’s research at the RIAS was concerned with estimation and control for aerospace systems which was funded by the Air Force Office of Scientific Research. His explanation of why the dynamics-based Kalman filter is more important than the purely stochastic Wiener filter is that “Newton is more important than Gauss” [1]. The continuous-time Kalman filter produces state estimates from the solution of a simple differential equation in which it is tacitly assumed that the model is correct, the noises are zero-mean, white and uncorrelated. It is straightforward to include nonzero means, coloured and correlated noises. In practice, the true model can be elusive but a simple (low-order) solution may return a cost benefit.
The Kalman filter can be derived in many different ways. In an early account [3], a quadratic cost function was minimised using orthogonal projections. Other derivation methods include deriving a maximum a posteriori estimate, using Itô’s calculus, calculus-of-variations, dynamic programming, invariant imbedding and from the Wiener-Hopf equation [6] - [17]. This chapter provides a brief derivation of the optimal filter using a conditional mean (or equivalently, a least mean square error) approach.
The developments begin by introducing a time-varying state-space model. Next, the state transition matrix is defined, which is used to derive a Lyapunov differential equation. The Kalman filter follows immediately from a conditional mean formula. Its filter gain is obtained by solving a Riccati differential equation corresponding to the estimation error system. Generalisations for problems possessing deterministic inputs, correlated process and measurement noises, and direct feedthrough terms are described subsequently. Finally, it is shown that the Kalman filter reverts to the Wiener filter when the problems are time-invariant.
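A small sketch of the Riccati differential equation referred to above, advanced with a plain Euler step purely for illustration (a proper ODE integrator would normally be used); the form shown is the standard filter Riccati equation and the names are assumptions.

```python
import numpy as np

def riccati_ode_step(P, A, C, Q, R, dt):
    """Euler step of dP/dt = AP + PA' + Q - P C' R^{-1} C P."""
    dP = A @ P + P @ A.T + Q - P @ C.T @ np.linalg.inv(R) @ C @ P
    return P + dt * dP

# The Kalman gain at time t then follows as K = P C' R^{-1}.
```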
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
This chapter reviews the solutions for the discrete-time, linear stationary filtering problems that are attributed to Wiener [1] and Kolmogorov [2]. As in the continuous-time case, a model-based approach is employed. Here, a linear model is specified by the coefficients of the input and output difference equations. It is shown that the same coefficients appear in the system’s (frequency domain) transfer function. In other words, frequency domain model representations can be written down without background knowledge of z-transforms.
In the 1960s and 1970s, continuous-time filters were implemented on analogue computers. This technology has been largely discontinued for two main reasons. First, analogue multipliers and op amp circuits exhibit poor performance whenever (temperature-sensitive) calibrations become out of date. Second, updated software releases are faster to turn around than hardware design iterations. Continuous-time filters are now routinely implemented using digital computers, provided that the signal sampling rates and data processing rates are sufficiently high. Alternatively, continuous-time model parameters may be converted into discrete-time and differential equations can be transformed into difference equations. The ensuing discrete-time filter solutions are then amenable to more economical implementation, namely, employing relatively lower processing rates.
The discrete-time Wiener filtering problem is solved in the frequency domain. Once again, it is shown that the optimum minimum-mean-square-error solution is found by completing the square. The optimum solution is noncausal, which can only be implemented by forward and backward processes. This solution is actually a smoother and the optimum filter is found by taking the causal part.
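A short numerical sketch of the noncausal output-estimation solution for a signal with spectrum S_s observed in uncorrelated noise with spectrum S_v; the AR(1) signal model and noise level below are illustrative assumptions rather than the chapter's example.

```python
import numpy as np

w = np.linspace(0, np.pi, 512)                   # frequencies on the unit circle
a = 0.9                                          # illustrative AR(1) signal model pole
S_s = 1.0 / np.abs(1 - a * np.exp(-1j * w))**2   # signal spectrum |1/(1 - a z^{-1})|^2
S_v = 0.5 * np.ones_like(w)                      # white measurement noise spectrum
H_smoother = S_s / (S_s + S_v)                   # noncausal Wiener (smoother) response
```

The realisable filter follows by spectral factorisation and taking the causal part, as described above.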
The developments rely on solving a spectral factorisation problem, which requires pole-zero cancellations. Therefore, some pertinent discrete-time concepts are introduced in Section 2.2 prior to deriving the filtering results. The discussion of the prerequisite concepts is comparatively brief since it mirrors the continuous-time material introduced previously. In Section 2.3 it is shown that the structure of the filter solutions is unchanged – only the spectral factors are calculated differently.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Optimal filtering is concerned with designing the best linear system for recovering data from noisy measurements. It is a model-based approach requiring knowledge of the signal generating system. The signal models, together with the noise statistics, are factored into the design in such a way as to satisfy an optimality criterion, namely, minimising the square of the error.
A prerequisite technique, the method of least-squares, has its origin in curve fitting. Amid some controversy, Kepler claimed in 1609 that the planets move around the Sun in elliptical orbits [1]. Carl Friedrich Gauss arrived at a better performing method for fitting curves to astronomical observations and predicting planetary trajectories in 1799 [1]. He formally published a least-squares approximation method in 1809 [2], which was developed independently by Adrien-Marie Legendre in 1806 [1]. This technique was famously used to recover and track the asteroid Ceres, discovered by Giuseppe Piazzi, because a least-squares analysis was easier than solving Kepler’s complicated nonlinear equations of planetary motion [1]. Andrey N. Kolmogorov refined Gauss’s theory of least-squares and applied it for the prediction of discrete-time stationary stochastic processes in 1939 [3]. Norbert Wiener, a faculty member at MIT, independently solved analogous continuous-time estimation problems. He worked on defence applications during the Second World War and produced a report entitled Extrapolation, Interpolation and Smoothing of Stationary Time Series in 1943. The report was later published as a book in 1949 [4].
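In the same spirit, a tiny least-squares curve-fitting example (with made-up data) is:

```python
import numpy as np

# Fit a quadratic to noisy observations by solving the least-squares problem.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * t - 1.5 * t**2 + 0.05 * np.random.randn(t.size)
X = np.column_stack([np.ones_like(t), t, t**2])     # design matrix
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)                                       # approximately [2.0, 3.0, -1.5]
```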
Wiener derived two important results, namely, the optimum (non-causal) minimum-mean-square-error solution and the optimum causal minimum-mean-square-error solution [4] – [6]. The optimum causal solution has since become known as the Wiener filter and in the time-invariant case is equivalent to the Kalman filter that was developed subsequently. Wiener pursued practical outcomes and attributed the term “unrealisable filter” to the optimal non-causal solution because “it is not in fact realisable with a finite network of resistances, capacities, and inductances” [4]. Wiener’s unrealisable filter is actually the optimum linear smoother.
The optimal Wiener filter is calculated in the frequency domain. Consequently, Section 1.2 touches on some frequency-domain concepts. In particular, the notions of spaces, state-space systems, transfer functions, canonical realisations, stability, causal systems, power spectral density and spectral factorisation are introduced. The Wiener filter is then derived by minimising the square of the error. Three cases are discussed in Section 1.3. First, the solution to the general estimation problem is stated. Second, the general estimation results are specialised to output estimation. The optimal input estimation or equalisation solution is then described. An example, demonstrating the recovery of a desired signal from noisy measurements, completes the chapter.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2013
This paper considers estimation problems where inequality constraints are imposed on the outputs of linear systems and can be modeled by nonlinear functions. In this case, censoring functions can be designed to constrain measurements for use by filters and smoothers. It is established that the filter and smoother output estimates are unbiased, provided that the underlying probability density functions are even and the censoring functions are odd. The Bounded Real Lemma is employed to ensure that the output estimates satisfy a performance criterion. A global positioning system (GPS) and inertial navigation system (INS) integration application is discussed in which a developed solution exhibits improved performance during GPS outages when a priori information is used to constrain the altitude and velocity measurements.
IEEE Trans. Signal Processing, 2006
The paper describes an optimal minimum-variance noncausal filter or fixed-interval smoother. The optimal solution involves a cascade of a Kalman predictor and an adjoint Kalman predictor. A robust smoother involving H∞ predictors is also described. Filter asymptotes are developed for output estimation and input estimation problems which yield bounds on the spectrum of the estimation error. These bounds lead to a priori estimates for the scalar in the H∞ filter and smoother design. The results of simulation studies are presented, which demonstrate that optimal, robust, and extended Kalman smoothers can provide performance benefits.
IEEE Signal Processing Letters, 2014
Pioneering research on the perception of sounds at different frequencies was conducted by Fletcher and Munson in the 1930s. Their work led to a standard way of weighting measured sound levels within investigations of industrial noise and hearing loss. Frequency weightings have since been used within filter and controller designs to manage performance within bands of interest. This paper introduces iterative frequency weighted filtering and smoothing procedures. It is assumed that filter and smoother estimation errors are generated by a first-order autoregressive (AR(1)) system. This AR(1) system is identified and used to apply a frequency weighting function within the design of minimum-variance filters and smoothers. It is shown under prescribed conditions that the described solutions result in nonincreasing error variances. An example is described which demonstrates improved mean-square-error performance.
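The central idea can be sketched as follows; this is an illustrative outline only, not the paper's exact procedure.

```python
import numpy as np

def fit_ar1(e):
    """Fit e_k = a e_{k-1} + w_k to a sequence of filter errors."""
    e = e - e.mean()
    a = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])   # lag-1 least-squares estimate
    sigma2 = np.mean((e[1:] - a * e[:-1])**2)            # innovation variance
    return a, sigma2

# The induced frequency weighting has magnitude-squared response
# sigma2 / |1 - a exp(-jw)|^2, which emphasises the band where the errors concentrate.
```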
IEEE Control Systems Magazine, 2008
Longwall mining is a method for extracting coal from underground mines. The mining technology involves a longwall shearer, which is a 15-m long, 100-ton machine that has picks attached to two drums, which rotate at 30 to 40 revolutions per minute. A longwall face is the mined area from which material is extracted. The shearer removes coal by traversing a face at approximately 25-minute intervals. Traditionally, longwall mining equipment is controlled manually, where the face is aligned using a string line. Under manual control, the face meanders and wanders out of the coal seam, which causes rock to contaminate the coal and limits the rate of production. Coal production is maximized by maintaining a straight shearer trajectory and keeping the face within the seam. Therefore, precise estimates of the face locations are required so that the longwall equipment can be repositioned after each shear. We are automating longwall equipment to improve production. A Northrop Grumman LN270 inertial navigation unit and an IEEE 802.11b wireless local area network client device are installed within a flame-proof enclosure, which, together with an odometer, are mounted on a shearer. The inertial navigation unit and odometer measure the shearer's orientation and distance travelled across the face, respectively. The inertial navigation and odometer measurements are stored locally and subsequently forwarded when the shearer is near an access point of the wireless local area network. Upon completion of each shear, inertial navigation and odometer data are used to estimate the position of the face and control the roof support equipment for the next shear. In particular, minimum-variance fixed-interval smoothing is applied to the inertial and odometer measurements to calculate face positions in three-dimensional space. Filtering refers to the process of estimating the current value of a signal from noisy measurements up to the current time. In fixed-interval smoothing, measurements recorded over an interval are used to estimate past values of a signal. Compared to filtering, smoothing can provide improved estimation accuracy at the cost of twice the computational complexity. Smoothing is applicable wherever measurements are organized in blocks and retrospective data analysis is feasible. In the case of longwall mining, the position estimates and controls are calculated while the shearer is momentarily stationary at the ends of the face.
SMOOTHING, FILTERING AND PREDICTION: ESTIMATING THE PAST, PRESENT AND FUTURE, 2012
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2018
This paper investigates deteriorations in knee and ankle dynamics during running. Changes in lower limb accelerations are analyzed by a wearable musculoskeletal monitoring system. The system employs a machine-learning technique to classify joint stiffness. A maximum-entropy-rate method is developed to select the most relevant features. Experimental results demonstrate that distance travelled and energy expended can be estimated from observed changes in knee and ankle motions during 5-km runs.
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2017
A high order signal model is proposed in which the states are Kronecker tensor products of probability distributions. This model enables an optimal linear filter to be specified. A minimum residual error variance criterion may be used to select the number of discretizations and Kronecker products. The filtering of LIDAR data from a coal shiploader environment is investigated. It is demonstrated that the proposed method can outperform conventional Kalman and hidden Markov model filters.
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2015
The minimum-variance filter and smoother are generalized to include Poisson-distributed measurement noise components. It is shown that the resulting filtered and smoothed estimates are unbiased. The use of the filter and smoother within Expectation-Maximization algorithms is described for joint estimation of the signal and Poisson noise intensity. Conditions for the monotonicity and asymptotic convergence of the Poisson intensity iterates are also established. An image restoration example is presented which demonstrates improved estimation performance at low signal-to-noise ratios.
IET Signal Processing, 2019
A linear state-space model is described whose second-order moments match that of a hidden Markov ... more A linear state-space model is described whose second-order moments match that of a hidden Markov chain. This model enables a modified transition probability matrix to be employed within minimum-variance filters and smoothers. However, the ensuing filter/smoother designs can exhibit suboptimal performance because a previously-reported transition-probability-matrix modification is conservative, and identified models can lack observability and reachability. This paper describes a less-conservative transition-probability-matrix modification and a model-order-reduction procedure to enforce observability and reachability. An optimal minimum-variance predictor, filter and smoother are derived to recover the Markov chain states from noisy measurements. The predictor is asymptotically stable provided that the problem assumptions are correct. It is shown that collapsing the model improves state prediction performance. The filter and smoother recover the Markov states exactly when the measurement noise is negligible. A mining vehicle position tracking application is discussed in which performance benefits are demonstrated.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The previously-discussed optimal Kalman filter [1] – [3] is routinely used for tracking observed ... more The previously-discussed optimal Kalman filter [1] – [3] is routinely used for tracking observed and unobserved states whose second-order statistics change over time. It is often assumed within Kalman filtering applications that one or more random variable sequences are generated by a random walk or an autoregressive process. That is, common Kalman filter parameterisations do not readily exploit knowledge about the random variables’ probability distributions. More precisely, the filter is optimal only for Gaussian variables whose first and second order moments completely specify all relevant probability distributions. For non-Gaussian data, the filter is only optimal over all linear filters [1].
Rather than assuming that random variable sequences are generated by autoregressive processes they may alternatively be modelled as Markov chains. The phrase ‘Markov chain’ was first coined in 1926 by a Russian mathematician S. N. Bernstein to acknowledge previous discoveries made by Andrei Andreevich Markov [4]. Markov was a professor at St Petersburg University and a member of the St Petersburg Academy of Sciences, which was a hub for scientific advances in many fields including probability theory. Indeed, Markov, along with fellow academy members D. Bernoulli, V. Y. Bunyakovsky and P. L. Chebyshev, all wrote textbooks on probability theory. Markov extended the weak law of large numbers and the central limit theorem to certain sequences of dependent random variables forming special classes of what are now known as Markov chains [4].
The basic theory of Hidden Markov models (HMMs) was first published by Baum et al in the 1960s [5]. HMMs were introduced to the speech recognition field in the 1970s by J. Baker at CMU [6], and F. Jelinek and his colleagues at IBM [7]. One of the most influential papers on HMM filtering and smoothing was the tutorial exposition by L. Rabiner [8], which has been accorded a large number of citations. Rabiner explained how to implement the forward-backward algorithm for estimating Markov state probabilities, together with the Baum-Welch algorithm (also known as the Expectation Maximisation algorithm). HMM filters and smoothers can be advantageous in applications where sequences of alphabets occur [8] - [10]. For example, in automatic speech recognition, sentence and language models can be constructed by concatenating phoneme and word-level HMMs. Similarly, stroke, character, word and context HMMs can be used in handwriting recognition. HMMs have been useful in modelling in biological sequences such as proteins and DNA sequences.
The Doob–Meyer decomposition theorem [11] states that a stochastic process may be decomposed into the sum of two parts, namely, a prediction and an input process. The standard Kalman filter [1] makes use of both prediction plus input process assumptions and attains minimum-variance optimality. In contrast, the standard hidden Markov model filter/smoother rely exclusively on (Markov model) prediction and is optimum in a Bayesian sense [8] - [10]. It is shown below that minimum-variance and HMM techniques can be combined for improved state recovery.
The minimum-variance, HMM and combined-minimum-variance-HMM predictions are only calculated from states at the previous time step. Improved predictions can be calculated from states at multiple previous time steps. The desired interdependencies between multiple previous states are conveniently captured by constructing high-order-Kronecker-product state vectors. The theory and implementation of such high-order-minimum-variance-HMM filters is also described below.
The afore-mentioned developments are driven by our rapacious appetites for improved estimator performance. In principle, each additional embellishment, spanning HMM filters, minimum-variance-HMM filters to high-order-minimum-variance-HMM filters, has potential to provide further performance gains, subject to the usual proviso that the underlying modelling assumptions are correct. Needless to say, significantly higher calculation overheads must be reconciled against any performance benefits.
Some prerequisites, namely, some results from probability theory including Markov processes, are introduced in Section 11.2. Bayes’ theorem is judiciously applied in Section 11.3 to derive the HMM filters and smoothers for time-homogenous processes. A state-space model having an output covariance equivalent to an HMM is derived in Section 11.4. This enables transition probability matrices to be employed in optimal filter and smoother constructions that minimise the error variance. Section 11.5 describes high-order-minimum-variance-HMM filters, which employ Kronecker product states.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The Kalman filter is widely used for linear estimation problems where its behaviour is well-under... more The Kalman filter is widely used for linear estimation problems where its behaviour is well-understood. Under prescribed conditions, the estimated states are unbiased and stability is guaranteed. Many real-world problems are nonlinear which requires amendments to linear solutions. If the nonlinear models can be expressed in a state-space setting then the Kalman filter may find utility by applying linearisations at each time step. In the two-dimensional case, linearising means finding tangents to the curves of interest about the current estimates, so that the standard filter recursions can be employed in tandem to produce predictions for the next step. This approach is known as extended Kalman filtering – see [1] – [5].
Extended Kalman filters (EKFs) revert to optimal Kalman filters when the problems become linear. Thus, EKFs can yield approximate minimum-variance estimates. However, there are no accompanying performance guarantees and they fall into the try-at-your-own-risk category. Indeed, Anderson and Moore [3] caution that the EKF “can be satisfactory on occasions”. A number of compounding factors can cause performance degradation. The approximate linearisations may be crude and are carried out about estimated states (as opposed to true states). Observability problems occur when the variables do not map onto each other, giving rise to discontinuities within estimated state trajectories. Singularities within functions can result in non-positive solutions to the design Riccati equations and lead to instabilities.
The discussion includes suggestions for performance improvement and is organised as follows. The next section begins with Taylor series expansions, which are prerequisites for linearisation. First, second and third-order EKFs are then derived. EKFs tend be prone to instability and a way of enforcing stability is to masquerade the design Riccati equation by a faux version. This faux algebraic Riccati equation technique [6] – [10] is presented in Section 10.3. In Section 10.4, the higher order terms discarded by an EKF are treated as uncertainties. It is shown that a robust EKF arises by solving a scaled H∞ problem in lieu of one possessing uncertainties. Nonlinear smoother procedures can be designed similarly. The use of fixed-lag and Rauch-Tung-Striebel smoothers may be preferable from a complexity perspective. However, the approximate minimum-variance and robust smoothers, which are presented in Section 10.5, revert to optimal solutions when the nonlinearities and uncertainties diminish. Another way of guaranteeing stability is to by imposing constraints and one such approach is discussed in Section 10.6.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The previously-discussed optimum predictor, filter and smoother solutions assume that the model parameters are correct, the noise processes are white and their associated covariances are known precisely. These solutions are optimal in a mean-square-error sense, that is, they provide the best average performance. If the above assumptions are correct, then the filter’s mean-square-error equals the trace of the design error covariance. The underlying modelling and noise assumptions are often a convenient fiction. They do, however, serve to allow estimated performance to be weighed against implementation complexity.
In general, robustness means “the persistence of a system’s characteristic behaviour under perturbations or conditions of uncertainty” [1]. In an estimation context, robust solutions refer to those that accommodate uncertainties in problem specifications. They are also known as worst-case or peak error designs. The standard predictor, filter and smoother structures are retained but a larger design error covariance is used to account for the presence of modelling error.
Designs that cater for worst cases are likely to exhibit poor average performance. Suppose that a bridge designed for average loading conditions returns an acceptable cost benefit. Then a robust design that is focussed on accommodating infrequent peak loads is likely to provide worse cost performance. Similarly, a worst-case shoe design that accommodates rarely occurring large feet would provide poor fitting performance on average. That is, robust designs tend to be conservative. In practice, a trade-off may be desired between optimum and robust designs.
The material canvassed herein is based on the H∞ filtering results from robust control. The robust control literature is vast, see [2] – [33] and the references therein. As suggested above, the H∞ solutions of interest here involve observers having gains that are obtained by solving Riccati equations. This Riccati equation solution approach relies on the Bounded Real Lemma – see the pioneering work by Vaidyanathan [2] and Petersen [3]. The Bounded Real Lemma is implicit in game theory [9] – [19]. Indeed, the continuous-time solutions presented in this section originate from the game-theoretic approach of Doyle, Glover, Khargonekar, Francis, Limebeer, Anderson, Green, Theodore and Shaked, see [4], [13], [15], [21]. The discussed discrete-time versions stem from the results of Limebeer, Green, Walker, Yaesh, Shaked, Xie, de Souza and Wang, see [5], [11], [18], [19], [21]. In the parlance of game theory: “a statistician is trying to best estimate a linear combination of the states of a system that is driven by nature; nature is trying to cause the statistician’s estimate to be as erroneous as possible, while trying to minimise the energy it invests in driving the system” [19].
Pertinent state-space H∞ predictors, filters and smoothers are described in [4] – [19]. Some prediction, filtering and smoothing results are summarised in [13] and methods for accommodating model uncertainty are described in [14], [18], [19]. The aforementioned methods for handling model uncertainty can result in conservative designs (that depart far from optimality). This has prompted the use of linear matrix inequality solvers in [20], [23] to search for optimal solutions to model uncertainty problems.
It is explained in [15], [19], [21] that a saddle-point strategy for the games leads to robust estimators, and the resulting robust smoothing, filtering and prediction solutions are summarised below. While the solution structures remain unchanged, designers need to tweak the scalar within the underlying Riccati equations.
This chapter has two main parts. Section 9.2 describes robust continuous-time solutions and the discrete-time counterparts are presented in Section 9.3. The previously discussed techniques each rely on a trick. The optimum filters and smoothers arise by completing the square. In maximum-likelihood estimation, a function is differentiated with respect to an unknown parameter and then set to zero. The trick behind the described robust estimation techniques is the Bounded Real Lemma, which opens the discussions.
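For orientation, one common textbook form of the γ-dependent modification, here for a discrete-time a priori H∞ filter that estimates s_k = L x_k, is shown below. The parameterisation is illustrative and not necessarily the one used in the chapter, and the recursion is subject to existence conditions on R_{e,k}; letting γ tend to infinity recovers the Kalman filter's Riccati difference equation.

```latex
% Model: x_{k+1} = A x_k + B w_k,  z_k = C x_k + v_k,  s_k = L x_k.
% One a priori H-infinity filter form replaces the Kalman Riccati recursion with
\[
P_{k+1} = A P_k A^{T} + B Q B^{T}
          - A P_k \begin{bmatrix} C^{T} & L^{T} \end{bmatrix}
            R_{e,k}^{-1}
            \begin{bmatrix} C \\ L \end{bmatrix} P_k A^{T},
\qquad
R_{e,k} = \begin{bmatrix} R & 0 \\ 0 & -\gamma^{2} I \end{bmatrix}
          + \begin{bmatrix} C \\ L \end{bmatrix} P_k
            \begin{bmatrix} C^{T} & L^{T} \end{bmatrix},
\]
% so the scalar gamma trades average performance against worst-case performance.
```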
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Predictors, filters and smoothers have previously been described for state recovery under the assumption that the parameters of the generating models are correct. More often than not, the problem parameters are unknown and need to be identified. This section describes some standard statistical techniques for parameter estimation. Paradoxically, the discussed parameter estimation methods rely on having complete state information available. Although this is akin to a chicken-and-egg argument (state availability obviates the need for filters along with their attendant requirements for identified models), the task is not insurmountable.
The role of solution designers is to provide a cost benefit. That is, their objectives are to deliver improved performance at an acceptable cost. Inevitably, this requires simplifications so that the problems become sufficiently tractable and amenable to feasible solution. For example, suppose that speech emanating from a radio is too noisy and barely intelligible. In principle, high-order models could be proposed to equalise the communication channel, demodulate the baseband signal and recover the phonemes. Typically, low-order solutions tend to offer better performance because of the difficulty in identifying large numbers of parameters under low-SNR conditions. Consider also the problem of monitoring the output of a gas sensor and triggering alarms when environmental conditions become hazardous. Complex models could be constructed to take into account diurnal pressure variations, local weather influences and transients due to passing vehicles. It often turns out that low-order solutions exhibit lower false alarm rates because there are fewer assumptions susceptible to error. Thus, the absence of complete information need not inhibit solution development. Simple schemes may suffice, such as conducting trials with candidate parameter values and assessing the consequent error performance.
In maximum-likelihood estimation [1] – [5], unknown parameters θ1, θ2, …, θM, are identified given states, xk, by maximising a log-likelihood function, log f(θ1, θ2, …, θM). For example, the subject of noise variance estimation was studied by Mehra in [6], where maximum-likelihood estimates (MLEs) were updated using the Newton-Raphson method. Rife and Boorstyn obtained Cramér-Rao bounds for some MLEs, which “indicate the best estimation that can be made with the available data” [7]. Nayak et al used the pseudo-inverse to estimate unknown parameters in [8]. Bélanger subsequently employed a least-squares approach to estimate the process noise and measurement noise variances [9]. A recursive technique for least-squares parameter estimation was developed by Strejc [10]. Dempster, Laird and Rubin [11] proved the convergence of a general purpose technique for solving joint state and parameter estimation problems, which they called the expectation-maximisation (EM) algorithm. They addressed problems where complete (state) information is not available to calculate the log-likelihood and instead maximised the expectation of log f(θ1, θ2, …, θM), given incomplete measurements, zk. That is, by virtue of Jensen’s inequality the unknowns are found by using an objective function (which is also called an approximate log-likelihood function), namely the expectation of log f(θ1, θ2, …, θM) given zk, as a surrogate for log f(θ1, θ2, …, θM).
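For reference, the EM iteration implied by this discussion can be written compactly as follows, with illustrative notation.

```latex
% EM iteration for unknown parameters theta = (theta_1, ..., theta_M), given
% incomplete measurements z_1, ..., z_N (notation illustrative):
\[
\text{E-step:}\quad
\mathcal{Q}(\theta, \hat{\theta}^{(i)})
  = \mathrm{E}\!\left[\, \log f(x_{1:N}, z_{1:N} \mid \theta)
      \;\middle|\; z_{1:N}, \hat{\theta}^{(i)} \right],
\qquad
\text{M-step:}\quad
\hat{\theta}^{(i+1)} = \arg\max_{\theta}\, \mathcal{Q}(\theta, \hat{\theta}^{(i)}),
\]
% where Jensen's inequality guarantees that the incomplete-data likelihood is
% non-decreasing over the iterations.
```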
The system identification literature is vast and some mature techniques have evolved. Subspace identification methods have been developed for general problems where a system’s stochastic inputs, deterministic inputs and outputs are available. The subspace algorithms [12] – [14] consist of two steps. First, the order of the system is identified from stacked vectors of the inputs and outputs. Then the unknown parameters are determined from an extended observability matrix.
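A highly simplified, output-only illustration of the two steps is sketched below: the model order is read off the singular values of a Hankel matrix of estimated output covariances, and A and C are then recovered from the resulting extended observability matrix via its shift-invariance property. This toy covariance-based realisation is offered only in the spirit of the subspace idea; the cited algorithms [12] – [14] are considerably more elaborate.

```python
import numpy as np

def simple_stochastic_realisation(y, i=10, tol=1e-2):
    """Toy covariance-based realisation of A and C from a scalar output record y.

    Step 1: estimate output covariances, stack them in a Hankel matrix and
            read the model order off its singular values.
    Step 2: form the extended observability matrix and solve for C and A
            (A via the shift-invariance property).
    Assumes len(y) well exceeds 2*i.
    """
    y = np.asarray(y) - np.mean(y)
    N = len(y)
    lam = [np.dot(y[j:], y[:N - j]) / (N - j) for j in range(2 * i)]
    H = np.array([[lam[p + q + 1] for q in range(i)] for p in range(i)])

    U, s, Vt = np.linalg.svd(H)
    n = int(np.sum(s > tol * s[0]))                      # order from singular values
    Gamma = U[:, :n] * np.sqrt(s[:n])                    # extended observability matrix

    C = Gamma[:1, :]                                     # first block row
    A = np.linalg.pinv(Gamma[:-1, :]) @ Gamma[1:, :]     # shift invariance
    return A, C
```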
Continuous-time maximum-likelihood estimation has been mentioned previously. Here, the attention is focussed on the specific problem of joint state and parameter estimation exclusively from discrete measurements of a system’s outputs. The developments proceed as follows. Section 8.2 reviews the maximum-likelihood estimation method for obtaining unknown parameters. The same estimates can be found using the method of least squares, which was pioneered by Gauss for fitting astronomical observations. Well known (filtering) EM algorithms for variance and state matrix estimation are described in Section 8.3. Improved parameter estimation accuracy can be obtained via smoothing EM algorithms, which are introduced in Section 8.4.
Although EM algorithms can yield improved state matrix and input process covariance estimates, it has been found that they are only accurate when the measurement noise is negligible. Similarly in subspace identification, least-squares estimation of unknown state-space matrices can lead to biased results when the states are corrupted by noise. This arises because the standard least-squares and maximum-likelihood estimates of a state matrix and an input process covariance themselves are biased in the presence of measurement noise. Therefore, a correction term is introduced in Section 8.5 to eliminate the bias error. This yields unbiased, consistent, closed-form estimates of a state matrix, an input process covariance and a measurement noise covariance. Under simplifying conditions they are equal to MLEs and attain the corresponding Cramer-Rao Lower Bounds (CRLBs).
The use of the MLEs and the filtering and smoothing EM algorithms discussed herein requires caution. When perfect information and sufficiently large sample sizes are available, the corresponding likelihood functions are exact. However, the use of imperfect information leads to approximate likelihood functions and biased MLEs, which can degrade parameter estimation accuracy and follow-on filter/smoother performance.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition Kindle, 2019
Observations are invariably accompanied by measurement noise and optimal filters are the usual solution of choice. Filter performances that fall short of user expectations motivate the pursuit of smoother solutions. Smoothers promise useful mean-square-error improvement at mid-range signal-to-noise ratios, provided that the assumed model parameters and noise statistics are correct.
In general, discrete-time filters and smoothers are more practical than the continuous-time counterparts. Often a designer may be able to value-add by assuming low-order discrete-time models which bear little or no resemblance to the underlying processes. Continuous-time approaches may be warranted only when application-specific performance considerations outweigh the higher overheads.
This chapter canvasses the main discrete-time fixed-point, fixed-lag and fixed interval smoothing results [1] – [9]. Fixed-point smoothers [1] calculate an improved estimate at a prescribed past instant in time. Fixed-lag smoothers [2] – [3] find application where small end-to-end delays are tolerable, for example, in press-to-talk communications or receiving public broadcasts. Fixed-interval smoothers [4] – [9] dispense with the need to fine tune the time of interest or the smoothing lags. They are suited to applications where processes are staggered such as delayed control or off-line data analysis. For example, in underground coal mining, smoothed position estimates and control signals can be calculated while a longwall shearer is momentarily stationary at each end of the face [9]. Similarly, in exploration drilling, analyses are typically carried out post-data acquisition.
The smoother descriptions are organised as follows. Section 7.2 sets out two prerequisites: time-varying adjoint systems and Riccati difference equation comparison theorems. Fixed-point, fixed-lag and fixed-interval smoothers are discussed in Sections 7.3, 7.4 and 7.5, respectively. It turns out that the structures of the discrete-time smoothers are essentially the same as those of the previously-described continuous-time versions. Differences arise in the calculation of Riccati equation solutions and the gain matrices. Consequently, the treatment is somewhat condensed. It is reaffirmed that the above-mentioned smoothers outperform the Kalman filter and the minimum-variance smoother provides the best performance.
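For concreteness, a minimal sketch of one classical discrete-time fixed-interval smoother, the Rauch-Tung-Striebel backward pass, is given below. It assumes that the filtered and predicted quantities from a forward Kalman filter have been stored, and it is offered only as an illustration of the fixed-interval idea; it is not the minimum-variance smoother that the chapter recommends.

```python
import numpy as np

def rts_backward_pass(xf, Pf, xp, Pp, A):
    """Rauch-Tung-Striebel fixed-interval smoother (backward recursion).

    xf, Pf : filtered estimates x_{k|k} and covariances P_{k|k}, k = 0..N-1
    xp, Pp : predicted estimates x_{k+1|k} and covariances P_{k+1|k}, k = 0..N-1
    A      : state transition matrix (time-invariant for simplicity)
    Returns smoothed estimates x_{k|N} and covariances P_{k|N}.
    """
    N = len(xf)
    xs, Ps = [None] * N, [None] * N
    xs[-1], Ps[-1] = xf[-1], Pf[-1]
    for k in range(N - 2, -1, -1):
        G = Pf[k] @ A.T @ np.linalg.inv(Pp[k])          # smoother gain
        xs[k] = xf[k] + G @ (xs[k + 1] - xp[k])
        Ps[k] = Pf[k] + G @ (Ps[k + 1] - Pp[k]) @ G.T
    return xs, Ps
```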
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
The previously-described minimum-mean-square-error and minimum-variance filtering solutions operate on measurements up to the current time. If some processing delay can be tolerated then improved estimation performance can be realised through the use of smoothers. There are three state-space smoothing technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing. Fixed-point smoothing refers to estimating some linear combination of states at a previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is assumed between the measurement and on-line estimation processes. Fixed-interval smoothing is for retrospective data analysis, where measurements recorded over an interval are used to obtain the improved estimates. Compared to filtering, smoothing has a higher implementation cost, as it has increased memory and calculation requirements.
A large number of smoothing solutions have been reported since Wiener’s and Kalman’s development of the optimal filtering results – see the early surveys [1] – [2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well known. Two fixed-interval smoother solutions, namely the maximum-likelihood smoother developed by Rauch, Tung and Striebel [3], and the two-filter Fraser-Potter formula [4], have been in widespread use since the 1960s. However, the minimum-variance fixed-interval smoother is not well known. This smoother is simply a time-varying state-space generalisation of the optimal Wiener solution. It differs from the Rauch-Tung-Striebel and Fraser-Potter solutions, which may not sit well with more orthodox practitioners.
The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval smoothing are canvassed here. It is assumed throughout that the underlying noise processes are zero mean and uncorrelated. Nonzero means and correlated processes can be handled using the approaches of Chapters 3 and 4. It is also assumed here that the noise statistics and state-space model parameters are known precisely. Note that techniques for estimating parameters and accommodating uncertainty are addressed subsequently.
Some prerequisite concepts, namely time-varying adjoint systems, backwards differential equations, Riccati equation comparison and the continuous-time maximum-likelihood method, are covered in Section 6.2. Section 6.3 outlines a derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother reported by Sage et al [6] and Moore [7] is the subject of Section 6.4. Section 6.5 deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance fixed-interval smoother solutions [8] - [10]. As before, the approach here is to accompany the developments, where appropriate, with proofs about performance being attained. Smoothing is not a panacea for all ills. If the measurement noise is negligible then smoothing (and filtering) may be superfluous. Conversely, if measurement noise obliterates the signals then data recovery may not be possible. Therefore, estimator performance is often discussed in terms of the prevailing signal-to-noise ratio.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
This chapter presents the minimum-variance filtering results simplified for the case when the model parameters are time-invariant and the noise processes are stationary. The filtering objective remains the same, namely, the task is to estimate a signal in such a way as to minimise the filter error covariance. A somewhat naïve approach is to apply the standard filter recursions using the time-invariant problem parameters. Although this approach is valid, it involves recalculating the Riccati difference equation solution and filter gain at each time-step, which is computationally expensive. A lower implementation cost can be realised by recognising that the Riccati difference equation solution asymptotically approaches the solution of an algebraic Riccati equation. In this case, the algebraic Riccati equation solution and hence the filter gain can be calculated before running the filter.
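As a sketch of this pre-computation idea, the following routine iterates the Riccati difference equation until it settles at the algebraic Riccati equation solution and then forms the steady-state predictor gain. The matrices in the example are hypothetical, and convergence presumes the usual stabilisability and detectability conditions.

```python
import numpy as np

def steady_state_predictor_gain(A, C, Q, R, tol=1e-10, max_iter=10000):
    """Iterate the Riccati difference equation until it converges to the
    algebraic Riccati equation solution, then form the predictor gain."""
    P = Q.copy()
    for _ in range(max_iter):
        S = C @ P @ C.T + R
        K = A @ P @ C.T @ np.linalg.inv(S)
        P_next = A @ P @ A.T + Q - K @ S @ K.T
        if np.max(np.abs(P_next - P)) < tol:
            return K, P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

# Hypothetical second-order example
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
K, P = steady_state_predictor_gain(A, C, Q=0.1 * np.eye(2), R=np.array([[1.0]]))
```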
The steady-state discrete-time Kalman filtering literature is vast and some of the more accessible accounts [1] – [14] are canvassed here. The filtering problem and the application of the standard time-varying filter recursions are described in Section 5.2. An important criterion for checking whether the states can be uniquely reconstructed from the measurements is observability. For example, sometimes states may be internal or sensor measurements might not be available, which can result in the system having hidden modes. Section 5.3 describes two common tests for observability, namely, checking that an observability matrix or an observability gramian is of full rank. The subject of Riccati equation monotonicity and convergence, which is discussed in Section 5.4, has been studied extensively by Chan [4], De Souza [5], [6], Bitmead [7], [8], Wimmer [9] and Wonham [10]. Chan et al [4] also showed that if the underlying system is stable and observable then the minimum-variance filter is stable. Section 6 describes a discrete-time version of the Kalman-Yakubovich-Popov Lemma, which states for time-invariant systems that solving a Riccati equation is equivalent to spectral factorisation. In this case, the Wiener and Kalman filters are the same.
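A minimal version of the first test, stacking C, CA, ..., CA^(n-1) and checking its rank, is sketched below with a hypothetical diagonal example in which one mode is hidden from the output.

```python
import numpy as np

def is_observable(A, C):
    """Full-rank test of the observability matrix [C; CA; ...; CA^(n-1)]."""
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, k) for k in range(n)]
    O = np.vstack(blocks)
    return np.linalg.matrix_rank(O) == n

# A hidden (unobserved) second mode makes the pair unobservable
A = np.diag([0.9, 0.5])
print(is_observable(A, np.array([[1.0, 0.0]])))   # False
print(is_observable(A, np.array([[1.0, 1.0]])))   # True
```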
Since the optimal filter is model-based, any unknown model parameters need to be estimated (as explained in Chapter 7) prior to implementation. The estimated parameters can be inexact which leads to degraded filter performance. An iterative frequency weighting procedure is described in Section 5.5 for mitigating the performance degradation.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Kalman filters are employed wherever it is desired to recover data from the noise in an optimal way, such as in satellite orbit estimation, aircraft guidance, radar, communication systems, navigation, medical diagnosis and finance. Continuous-time problems that possess differential equations may be easier to describe in a state-space framework; however, the filters have higher implementation costs because an additional integration step and higher sampling rates are required. Conversely, although discrete-time state-space models may be less intuitive, the ensuing filter difference equations can be realised immediately.
The discrete-time Kalman filter calculates predicted states via a linear recursion in which the predictor gain is a function of the noise statistics and the model parameters. This solution was reported by Rudolf E. Kalman in the 1960s [1], [2]. He has since received many awards and prizes, including the National Medal of Science, which was presented to him by President Barack Obama in 2009.
The Kalman filter calculations are simple and well-established. A possibly troublesome obstacle is expressing the problems at hand within a state-space framework. This chapter derives the main discrete-time results to provide familiarity with state-space techniques and filter application. The continuous-time and discrete-time minimum-mean-square-error Wiener filters were derived using a completing-the-square approach in Chapters 1 and 2, respectively. Similarly for time-varying continuous-time signal models, the derivation of the minimum-variance Kalman filter, presented in Chapter 3, relied on a least-mean-square (or conditional-mean) formula. This formula is used again in the solution of the discrete-time prediction and filtering problems. Predictions can be used when the measurements are irregularly spaced or missing, at the cost of increased mean-square-error.
This chapter develops the prediction and filtering results for the case where the problem is nonstationary or time-varying. It is routinely assumed that the process and measurement noises are zero mean and uncorrelated. Nonzero mean cases can be accommodated by including deterministic inputs within the state prediction and filter output updates. Correlated noises can be handled by adding a term within the predictor gain and the underlying Riccati equation. The same approach is employed when the signal model possesses a direct-feedthrough term. A simplification of the generalised regulator problem from control theory is presented, from which the solutions of output estimation, input estimation (or equalisation), state estimation and mixed filtering problems follow immediately.
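The recursion described above can be sketched as follows, assuming zero-mean uncorrelated noises and no direct-feedthrough term; a missing measurement is handled by carrying out the prediction step only. The matrix names are generic rather than the chapter's notation.

```python
import numpy as np

def kalman_filter(zs, A, C, Q, R, x0, P0):
    """Discrete-time Kalman filter returning the corrected estimates x_{k|k}.

    zs : list of measurement vectors; use None for a missing measurement,
         in which case only the prediction step is carried out.
    """
    x, P = x0, P0                                  # prior x_{0|-1}, P_{0|-1}
    estimates = []
    for z in zs:
        if z is not None:
            S = C @ P @ C.T + R                    # innovation covariance
            K = P @ C.T @ np.linalg.inv(S)         # filter gain
            x = x + K @ (z - C @ x)                # corrected state estimate
            P = P - K @ C @ P                      # corrected error covariance
        estimates.append(x)
        x = A @ x                                  # predicted state
        P = A @ P @ A.T + Q                        # predicted error covariance
    return estimates
```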
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Rudolf E. Kalman studied discrete-time linear dynamic systems for his master’s thesis at MIT in 1954. He commenced work at the Research Institute for Advanced Studies (RIAS) in Baltimore during 1957 and nominated Richard S. Bucy to join him in 1958 [1]. Bucy recognised that the nonlinear ordinary differential equation studied by an Italian mathematician, Count Jacopo F. Riccati, in around 1720, now called the Riccati equation, is equivalent to the Wiener-Hopf equation for the case of finite dimensional systems [1], [2]. In November 1958, Kalman recast the frequency domain methods developed by Norbert Wiener and Andrei N. Kolmogorov in the 1940s into state-space form [2]. Kalman noted in his 1960 paper [3] that generalising the Wiener solution to nonstationary problems was difficult, which motivated his development of the optimal discrete-time filter in a state-space framework. He described the continuous-time version with Bucy in 1961 [4] and published a generalisation in 1963 [5]. Bucy later investigated the monotonicity and stability of the underlying Riccati equation [6]. The continuous-time minimum-variance filter is now commonly attributed to both Kalman and Bucy.
Compared to the Wiener filter, Kalman’s state-space approach has the following advantages.
• It is applicable to time-varying problems.
• As noted in [7], [8], the state-space parameters can be linearisations of nonlinear models.
• The burdens of spectral factorisation and pole-zero cancellation are replaced by the easier task of solving a Riccati equation.
• It is a more intuitive model-based approach in which the estimated states correspond to those within the signal generation process.
Kalman’s research at the RIAS, which was funded by the Air Force Office of Scientific Research, was concerned with estimation and control for aerospace systems. His explanation of why the dynamics-based Kalman filter is more important than the purely stochastic Wiener filter is that “Newton is more important than Gauss” [1]. The continuous-time Kalman filter produces state estimates from the solution of a simple differential equation in which it is tacitly assumed that the model is correct, the noises are zero-mean, white and uncorrelated. It is straightforward to include nonzero means, coloured and correlated noises. In practice, the true model can be elusive but a simple (low-order) solution may return a cost benefit.
The Kalman filter can be derived in many different ways. In an early account [3], a quadratic cost function was minimised using orthogonal projections. Other derivation methods include maximum a posteriori estimation, Itô’s calculus, the calculus of variations, dynamic programming, invariant imbedding and the Wiener-Hopf equation [6] - [17]. This chapter provides a brief derivation of the optimal filter using a conditional mean (or equivalently, a least mean square error) approach.
The developments begin by introducing a time-varying state-space model. Next, the state transition matrix is defined, which is used to derive a Lyapunov differential equation. The Kalman filter follows immediately from a conditional mean formula. Its filter gain is obtained by solving a Riccati differential equation corresponding to the estimation error system. Generalisations for problems possessing deterministic inputs, correlated process and measurement noises, and direct feedthrough terms are described subsequently. Finally, it is shown that the Kalman filter reverts to the Wiener filter when the problems are time-invariant.
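For orientation, one common statement of the continuous-time filter and its Riccati differential equation is reproduced below (zero-mean, uncorrelated white noises; the notation is illustrative and may differ from the chapter's).

```latex
% Model: dx/dt = A(t) x + B(t) w,  z = C(t) x + v,  with noise intensities Q and R.
\[
\dot{\hat{x}}(t) = A(t)\hat{x}(t) + K(t)\bigl(z(t) - C(t)\hat{x}(t)\bigr),
\qquad
K(t) = P(t) C^{T}(t) R^{-1}(t),
\]
\[
\dot{P}(t) = A(t)P(t) + P(t)A^{T}(t) + B(t)Q(t)B^{T}(t)
             - P(t)C^{T}(t)R^{-1}(t)C(t)P(t),
\]
% and, in the absence of measurements (K = 0), the covariance obeys the
% Lyapunov differential equation referred to above.
```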
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
This chapter reviews the solutions for the discrete-time, linear stationary filtering problems that are attributed to Wiener [1] and Kolmogorov [2]. As in the continuous-time case, a model-based approach is employed. Here, a linear model is specified by the coefficients of the input and output difference equations. It is shown that the same coefficients appear in the system’s (frequency domain) transfer function. In other words, frequency domain model representations can be written down without background knowledge of z-transforms.
In the 1960s and 1970s, continuous-time filters were implemented on analogue computers. This technology has been largely discontinued for two main reasons. First, analogue multipliers and op amp circuits exhibit poor performance whenever (temperature-sensitive) calibrations become out of date. Second, updated software releases are faster to turn around than hardware design iterations. Continuous-time filters are now routinely implemented using digital computers, provided that the signal sampling rates and data processing rates are sufficiently high. Alternatively, continuous-time model parameters may be converted into discrete-time and differential equations can be transformed into difference equations. The ensuing discrete-time filter solutions are then amenable to more economical implementation, namely, employing relatively lower processing rates.
The discrete-time Wiener filtering problem is solved in the frequency domain. Once again, it is shown that the optimum minimum-mean-square-error solution is found by completing the square. The optimum solution is noncausal, which can only be implemented by forward and backward processes. This solution is actually a smoother and the optimum filter is found by taking the causal part.
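In one common notation, with measurement z = s + v, signal spectrum Φss, noise spectrum Φvv and spectral factor Δ, the two solutions referred to above can be written as follows; this is a standard textbook form and the chapter's own notation may differ.

```latex
% Output estimation from z = s + v with uncorrelated signal and noise:
\[
H_{\text{noncausal}}(z) = \Phi_{ss}(z)\,\bigl[\Phi_{ss}(z) + \Phi_{vv}(z)\bigr]^{-1},
\qquad
H_{\text{causal}}(z) = \bigl\{ \Phi_{ss}(z)\,\Delta^{-H}(z) \bigr\}_{+}\,\Delta^{-1}(z),
\]
% where Delta Delta^H = Phi_ss + Phi_vv is the spectral factorisation and
% { . }_+ denotes taking the causal part. The noncausal solution is the optimum
% smoother; taking the causal part yields the Wiener filter.
```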
The developments rely on solving a spectral factorisation problem, which requires pole-zero cancellations. Therefore, some pertinent discrete-time concepts are introduced in Section 2.2 prior to deriving the filtering results. The discussion of the prerequisite concepts is comparatively brief since it mirrors the continuous-time material introduced previously. In Section 2.3 it is shown that the structure of the filter solutions is unchanged – only the spectral factors are calculated differently.
Smoothing, Filtering and Prediction: Estimating the Past, Present and Future Second Edition, 2019
Optimal filtering is concerned with designing the best linear system for recovering data from noisy measurements. It is a model-based approach requiring knowledge of the signal generating system. The signal models, together with the noise statistics, are factored into the design in such a way as to satisfy an optimality criterion, namely, minimising the square of the error.
A prerequisite technique, the method of least-squares, has its origin in curve fitting. Amid some controversy, Kepler claimed in 1609 that the planets move around the Sun in elliptical orbits [1]. Carl Friedrich Gauss arrived at a better performing method for fitting curves to astronomical observations and predicting planetary trajectories in 1799 [1]. He formally published a least-squares approximation method in 1809 [2], which was developed independently by Adrien-Marie Legendre in 1806 [1]. This technique was famously used by Gauss to track the asteroid Ceres, discovered by Giuseppe Piazzi, since a least-squares analysis was easier than solving Kepler’s complicated nonlinear equations of planetary motion [1]. Andrey N. Kolmogorov refined Gauss’s theory of least-squares and applied it for the prediction of discrete-time stationary stochastic processes in 1939 [3]. Norbert Wiener, a faculty member at MIT, independently solved analogous continuous-time estimation problems. He worked on defence applications during the Second World War and produced a report entitled Extrapolation, Interpolation and Smoothing of Stationary Time Series in 1943. The report was later published as a book in 1949 [4].
Wiener derived two important results, namely, the optimum (non-causal) minimum-mean-square-error solution and the optimum causal minimum-mean-square-error solution [4] – [6]. The optimum causal solution has since become known as the Wiener filter and in the time-invariant case is equivalent to the Kalman filter that was developed subsequently. Wiener pursued practical outcomes and attributed the term “unrealisable filter” to the optimal non-causal solution because “it is not in fact realisable with a finite network of resistances, capacities, and inductances” [4]. Wiener’s unrealisable filter is actually the optimum linear smoother.
The optimal Wiener filter is calculated in the frequency domain. Consequently, Section 1.2 touches on some frequency-domain concepts. In particular, the notions of spaces, state-space systems, transfer functions, canonical realisations, stability, causal systems, power spectral density and spectral factorisation are introduced. The Wiener filter is then derived by minimising the square of the error. Three cases are discussed in Section 1.3. First, the solution to the general estimation problem is stated. Second, the general estimation results are specialised to output estimation. The optimal input estimation or equalisation solution is then described. An example, demonstrating the recovery of a desired signal from noisy measurements, completes the chapter.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2013
This paper considers estimation problems where inequality constraints are imposed on the outputs of linear systems and can be modeled by nonlinear functions. In this case, censoring functions can be designed to constrain measurements for use by filters and smoothers. It is established that the filter and smoother output estimates are unbiased, provided that the underlying probability density functions are even and the censoring functions are odd. The Bounded Real Lemma is employed to ensure that the output estimates satisfy a performance criterion. A global positioning system (GPS) and inertial navigation system (INS) integration application is discussed in which a developed solution exhibits improved performance during GPS outages when a priori information is used to constrain the altitude and velocity measurements.
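As a toy illustration of an odd censoring function (not the paper's actual constraint functions), a symmetric clipping operation is shown below.

```python
import numpy as np

def saturate(z, a):
    """An odd censoring function: clips a scalar measurement to [-a, a].
    Being odd (saturate(-z, a) == -saturate(z, a)), it preserves unbiasedness
    when the measurement noise density is even, as discussed above."""
    return np.clip(z, -a, a)
```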
IEEE Trans. Signal Processing, 2006
The paper describes an optimal minimum-variance noncausal filter or fixed-interval smoother. The optimal solution involves a cascade of a Kalman predictor and an adjoint Kalman predictor. A robust smoother involving H∞ predictors is also described. Filter asymptotes are developed for output estimation and input estimation problems which yield bounds on the spectrum of the estimation error. These bounds lead to a priori estimates for the scalar in the H∞ filter and smoother design. The results of simulation studies are presented, which demonstrate that optimal, robust, and extended Kalman smoothers can provide performance benefits.
IEEE Signal Processing Letters, 2014
Pioneering research on the perception of sounds at different frequencies was conducted by Fletcher and Munson in the 1930s. Their work led to a standard way of weighting measured sound levels within investigations of industrial noise and hearing loss. Frequency weightings have since been used within filter and controller designs to manage performance within bands of interest. This paper introduces iterative frequency-weighted filtering and smoothing procedures. It is assumed that filter and smoother estimation errors are generated by a first-order autoregressive (AR1) system. This AR1 system is identified and used to apply a frequency weighting function within the design of minimum-variance filters and smoothers. It is shown under prescribed conditions that the described solutions result in nonincreasing error variances. An example is described which demonstrates improved mean-square-error performance.
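A minimal sketch of identifying an AR1 model for an error sequence from its sample statistics is shown below; the iterative frequency-weighting procedure built around such a model is specific to the paper and is not reproduced here.

```python
import numpy as np

def fit_ar1(e):
    """Least-squares fit of e_{k+1} = a * e_k + w_k to an error sequence e."""
    e = np.asarray(e) - np.mean(e)
    a = np.dot(e[1:], e[:-1]) / np.dot(e[:-1], e[:-1])   # AR1 coefficient
    sigma_w2 = np.var(e[1:] - a * e[:-1])                # driving noise variance
    return a, sigma_w2
```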
IEEE Control Systems Magazine, 2008
Longwall mining is a method for extracting coal from underground mines. The mining technology involves a longwall shearer, which is a 15-m long, 100-ton machine that has picks attached to two drums, which rotate at 30 to 40 revolutions per minute. A longwall face is the mined area from which material is extracted. The shearer removes coal by traversing a face at approximately 25-minute intervals. Traditionally, longwall mining equipment is controlled manually, where the face is aligned using a string line. Under manual control, the face meanders and wanders out of the coal seam, which causes rock to contaminate the coal and limits the rate of production. Coal production is maximized by maintaining a straight shearer trajectory and keeping the face within the seam. Therefore, precise estimates of the face locations are required so that the longwall equipment can be repositioned after each shear. We are automating longwall equipment to improve production. A Northrop Grumman LN270 inertial navigation unit and an IEEE 802.11b wireless local area network client device are installed within a flame-proof enclosure, which, together with an odometer, are mounted on a shearer. The inertial navigation unit and odometer measure the shearer's orientation and distance travelled across the face, respectively. The inertial navigation and odometer measurements are stored locally and subsequently forwarded when the shearer is near an access point of the wireless local area network. Upon completion of each shear, inertial navigation and odometer data are used to estimate the position of the face and control the roof support equipment for the next shear. In particular, minimum-variance fixed-interval smoothing is applied to the inertial and odometer measurements to calculate face positions in three-dimensional space. Filtering refers to the process of estimating the current value of a signal from noisy measurements up to the current time. In fixed-interval smoothing, measurements recorded over an interval are used to estimate past values of a signal. Compared to filtering, smoothing can provide improved estimation accuracy at the cost of twice the computational complexity. Smoothing is applicable wherever measurements are organized in blocks and retrospective data analysis is feasible. In the case of longwall mining, the position estimates and controls are calculated while the shearer is momentarily stationary at the ends of the face.
SMOOTHING, FILTERING AND PREDICTION: ESTIMATING THE PAST, PRESENT AND FUTURE, 2012
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2018
This paper investigates deteriorations in knee and ankle dynamics during running. Changes in lower limb accelerations are analyzed by a wearable musculoskeletal monitoring system. The system employs a machine-learning technique to classify joint stiffness. A maximum-entropy-rate method is developed to select the most relevant features. Experimental results demonstrate that distance travelled and energy expended can be estimated from observed changes in knee and ankle motions during 5-km runs.
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2017
A high order signal model is proposed in which the states are Kronecker tensor products of probability distributions. This model enables an optimal linear filter to be specified. A minimum residual error variance criterion may be used to select the number of discretizations and Kronecker products. The filtering of LIDAR data from a coal shiploader environment is investigated. It is demonstrated that the proposed method can outperform conventional Kalman and hidden Markov model filters.
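As a small illustration of the modelling idea (assuming independent components, which the paper does not necessarily require), the Kronecker product of two probability vectors is itself a probability vector over the joint discretised state.

```python
import numpy as np

p_position = np.array([0.2, 0.5, 0.3])      # distribution over 3 position bins
p_velocity = np.array([0.6, 0.4])           # distribution over 2 velocity bins
p_joint = np.kron(p_position, p_velocity)   # 6-element joint distribution
print(p_joint.sum())                        # 1.0
```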
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2015
The minimum-variance filter and smoother are generalized to include Poisson-distributed measurement noise components. It is shown that the resulting filtered and smoothed estimates are unbiased. The use of the filter and smoother within Expectation-Maximization algorithms is described for joint estimation of the signal and Poisson noise intensity. Conditions for the monotonicity and asymptotic convergence of the Poisson intensity iterates are also established. An image restoration example is presented which demonstrates improved estimation performance at low signal-to-noise ratios.
Prime Publishing – Amazon.com, 2019
Scientists, engineers and the like are a strange lot. Unperturbed by societal norms, they direct their energies to finding better alternatives to existing theories and concocting solutions to unsolved problems. Driven by an insatiable curiosity, they record their observations and crunch the numbers. This tome is about the science of crunching. It’s about digging out something of value from the detritus that others tend to leave behind. The described approaches involve constructing models to process the available data. Smoothing entails revisiting historical records in an endeavour to understand something of the past. Filtering refers to estimating what is happening currently, whereas prediction is concerned with hazarding a guess about what might happen next.
This book describes the classical smoothing, filtering and prediction techniques together with some more recently developed embellishments for improving performance within applications. It aims to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field. The material is organised as an eleven-lecture course. The foundations are laid in Chapters 1 and 2, which explain minimum-mean-square-error solution construction and asymptotic behaviour. Chapters 3 and 4 introduce continuous-time and discrete-time minimum-variance filtering. Generalisations for missing data, deterministic inputs, correlated noises, direct feedthrough terms, output estimation and equalisation are described. Chapter 5 simplifies the minimum-variance filtering results for steady-state problems. Observability, Riccati equation solution convergence, asymptotic stability and Wiener filter equivalence are discussed. Chapters 6 and 7 cover the subject of continuous-time and discrete-time smoothing. The main fixed-lag, fixed-point and fixed-interval smoother results are derived. It is shown that the minimum-variance fixed-interval smoother attains the best performance. Chapter 8 attends to parameter estimation. As the above-mentioned approaches all rely on knowledge of the underlying model parameters, maximum-likelihood techniques within expectation-maximisation algorithms for joint state and parameter estimation are described. Chapter 9 is concerned with robust techniques that accommodate uncertainties within problem specifications. An extra term within Riccati equations enables designers to trade-off average error and peak error performance. Chapter 10 applies the afore-mentioned linear techniques to nonlinear estimation problems. It is demonstrated that step-wise linearisations can be used within predictors, filters and smoothers, albeit by forsaking optimal performance guarantees. Chapter 11 rounds off the course by exploiting knowledge about transition probabilities. HMM and minimum-variance-HMM filters and smoothers are derived. The improved performance offered by these techniques needs to be reconciled against the significantly higher calculation overheads.