Daniel McMichael - Academia.edu

Papers by Daniel McMichael

Research paper thumbnail of Objective functions for maximum likelihood classifier design

1999 Information, Decision and Control. Data and Information Fusion Symposium, Signal Processing and Communications Symposium and Decision and Control Symposium. Proceedings (Cat. No.99EX251)

This paper reports research into maximum likelihood parameter estimation for classification of data modelled as mixtures of multivariate Gaussian distributions. Two likelihood metrics are compared: the log conditional probability of the feature data (the non-discriminative log likelihood, L_n) and the log conditional probability of the class labels (the discriminative log likelihood, L_d). Results on some simple data sets indicate that L_d yields poorer classification accuracy, as measured by the average log probability l̃_c of obtaining the correct classification of a set of labelled test data. Analysis of the score equations and the information matrices derived from L_d and L_n reveals that L_d produces estimates of class means with larger bias and variance, and hence larger mean-square error (Ẽ²), than those from L_n. Some experimental results on simple data sets are given as illustration.
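
As an illustrative sketch of the two objectives (my reading of the abstract, reduced to univariate Gaussians; the function names and the two-class restriction are assumptions, not from the paper), L_n sums the class-conditional log densities log p(x_i | y_i), while L_d sums the label posteriors log p(y_i | x_i) obtained via Bayes' rule:

```python
import numpy as np

def log_gauss(x, mu, var):
    # log density of a univariate Gaussian
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def objectives(x, y, mus, var, priors):
    """Return (L_n, L_d) for a two-class model with Gaussian class conditionals.

    L_n sums log p(x_i | y_i); L_d sums log p(y_i | x_i) via Bayes' rule.
    """
    L_n, L_d = 0.0, 0.0
    for xi, c in zip(x, y):
        cond = log_gauss(xi, mus[c], var)            # log p(x_i | class c)
        L_n += cond
        # log p(x_i) = log sum_k prior_k * p(x_i | k), computed stably
        log_px = np.logaddexp(np.log(priors[0]) + log_gauss(xi, mus[0], var),
                              np.log(priors[1]) + log_gauss(xi, mus[1], var))
        L_d += np.log(priors[c]) + cond - log_px     # log p(c | x_i)
    return L_n, L_d
```

Maximising L_n fits each class density to its own data; maximising L_d rewards only correct relative ordering of the classes, which is the distinction the paper's bias/variance analysis turns on.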

Research paper thumbnail of Statistical Models for Situation Awareness

1999 Information, Decision and Control. Data and Information Fusion Symposium, Signal Processing and Communications Symposium and Decision and Control Symposium. Proceedings (Cat. No.99EX251)

Situation awareness can be defined as the state of knowledge of an information user relevant to achieving its objective. Achieving a satisfactory state of awareness of technical, military and business situations can be assisted by the provision of information generated by statistical computations. Automated systems seeking to provide information to users need to assess its relevance to the user's objectives, decision-making and planning. Such systems therefore need to second-guess the user. In fielded systems, these decisions are made on the basis of prior or learned information [1, 2, 3]. However, more advanced systems evaluate the appropriate decision sequences or the game. This talk categorises the statistical problems and surveys a selection of the relevant models, including partially observed Markov decision processes, multi-armed bandit games and decision networks, and concludes with a summary of new work on factorial hidden Markov models.

Research paper thumbnail of Functional Combinatory Categorial Grammar

Functional Combinatory Categorial Grammar (FCCG) advances the field of combinatory categorial grammars by enabling semantic dependencies to be determined directly from the syntactic derivation under the action of a small set of extraction rules. Predicates are extracted composably and can be used to apply semantic constraints during parsing. The approach is an alternative to that of classical CCG, which requires (i) mapping from categories to lambda expressions, (ii) a set of semantic transformation rules for unary combination, and (iii) an explicit β-reduction stage. GFCCG, a generalised form of the grammar, has previously been applied to situation assessment (McMichael, Jarrad, & Williams, 2006). In FCCG, combinators are largely distinguished by their semantic purpose. Unary combination is only used for preterminal-terminal transitions. Replacing unary type-raising and type-changing by their binary counterparts R and P tends to reduce parse ambiguity. Four other binary combi...
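
For readers unfamiliar with CCG combinators, here is a deliberately simplified sketch of ordinary forward application, the most basic binary combination (this is generic CCG background, not FCCG's R and P combinators; real CCG categories nest, which flat strings cannot express):

```python
def forward_apply(left, right):
    """CCG forward application (>): a functor X/Y combines with an adjacent
    argument Y to yield X.  Categories are plain strings with a single slash
    (a simplification; real CCG categories nest recursively)."""
    if '/' in left:
        result, argument = left.split('/', 1)
        if argument == right:
            return result
    return None  # the pair does not combine under forward application
```

For example, a verb phrase of category S/NP combines with an NP to its right to give a sentence S.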

Research paper thumbnail of A Meta-grammar for CCG

Applying CCG to domains outside of linguistics could require different sets of combinators to be developed for each domain. The meta-grammar described in this paper aims to assist such development by enabling simple, succinct expression of both existing and new combinator definitions. It favours the development of an easily configurable, one-time-coded module that can perform CCG combinations for any combinator set of the researcher's choosing. A preliminary implementation shows both the feasibility and potential of the meta-grammar.

Research paper thumbnail of Cool Fusion Statistical Modelling for Data Fusion

The consequences of the use of probability in modelling under uncertainty are explored, and it is shown how data and parameter independence and likelihood modularity emerge as desirable properties for statistical models. They lead directly to multi-object models, which themselves create the association problem. The expectation-maximisation (EM) technique is introduced and then applied to discriminative training of a Gaussian mixture model classifier. Generalised training is introduced, which interpolates between the extremes of maximising discriminative and non-discriminative likelihoods. A generalisation of EM, the conditional expectation-maximisation process, is presented and applied to designing an algorithm for estimating the location and rotation parameters for transforming a 3D reference model to generate an observed x-ray image. The EM E-step is derived for the case in which the nuisance variables ("missing data") are divided into sets and integrated out separately. This co...
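
The EM technique the abstract introduces can be sketched for the simplest case, a two-component univariate Gaussian mixture trained non-discriminatively (a minimal illustration under my own assumptions; the paper's discriminative and generalised variants modify the objective, not this basic loop):

```python
import numpy as np

def em_step(x, pi, mu, var):
    """One EM iteration for a two-component 1-D Gaussian mixture."""
    # E-step: responsibilities r[i, k] = p(component k | x_i)
    dens = np.stack([pi[k] * np.exp(-0.5 * (x - mu[k]) ** 2 / var[k])
                     / np.sqrt(2.0 * np.pi * var[k]) for k in range(2)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances from responsibilities
    nk = r.sum(axis=0)
    pi_new = nk / len(x)
    mu_new = (r * x[:, None]).sum(axis=0) / nk
    var_new = (r * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk
    return pi_new, mu_new, var_new
```

Iterating this step monotonically increases the non-discriminative likelihood; generalised training, as described, would interpolate this objective with the discriminative one.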

Research paper thumbnail of INFORMATION FUSION, CAUSAL PROBABILISTIC NETWORK AND PROBANET II: Inference Algorithms and Probanet System

As an extension of an overview paper [Pan and McMichael, 1997] on information fusion and Causal Probabilistic Networks (CPN), this paper formalizes kernel algorithms for probabilistic inference upon CPNs. Information fusion is realized through updating joint probabilities of the variables upon the arrival of new evidence or new hypotheses. Kernel algorithms for some dominant methods of inference are formalized from a discontiguous, mathematics-oriented literature, with gaps filled in with regard to computability and completeness. In particular, possible optimizations of the causal tree algorithm, graph triangulation and the junction tree algorithm are discussed. Probanet has been designed and developed as a generic shell, or 'mother' system, for CPN construction and application. The design aspects and current status of Probanet are described. A few directions for research and system development are pointed out, including hierarchical structuring of networks, structure decomposition and ad...
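
The baseline that junction-tree and causal-tree algorithms accelerate is inference by enumerating the joint distribution. A two-node sketch (the network A → B and its probabilities are invented here purely for illustration):

```python
# Tiny causal network A -> B; posterior by enumerating the joint.
p_a = {True: 0.3, False: 0.7}          # P(A)           (made-up numbers)
p_b_given_a = {True: 0.9, False: 0.2}  # P(B=True | A)  (made-up numbers)

def posterior_a_given_b(b_obs):
    """P(A | B = b_obs): form the joint over all assignments, then normalise."""
    joint = {}
    for a in (True, False):
        pb = p_b_given_a[a] if b_obs else 1.0 - p_b_given_a[a]
        joint[a] = p_a[a] * pb
    z = sum(joint.values())            # P(B = b_obs)
    return {a: joint[a] / z for a in joint}
```

Enumeration is exponential in the number of variables; the junction-tree algorithms discussed in the paper exploit network structure to avoid building the full joint.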

Research paper thumbnail of 2007 IDC Information, Decision and Control

Research paper thumbnail of Robust recursive Lp estimation

IEE Proceedings D Control Theory and Applications

Research paper thumbnail of Track association with kinematic and non-kinematic state histories

IFAC Proceedings Volumes

In distributed surveillance systems, a key problem is to associate track estimates that are generated by local processors to obtain a unified set of tracks. Existing track association techniques do not use track histories, and they do not take into account non-kinematic information, such as target class, that may be available. This paper presents a new technique that addresses these issues. We give three track association tests using histories of generalised tracks with states that can have kinematic and non-kinematic components. The track association tests are derived by minimising Bayesian risk functions, where explicit recursive analytical expressions are obtained for kinematic and non-kinematic likelihood ratios. When the risk function is chosen to be the average probability of error, the test yields a maximum a posteriori (MAP) decision rule. The tests compare likelihoods of fused generalised track history states to non-fused ones; hence they are computationally more expensive than traditional track association tests. However, they are optimal in the Bayesian sense, and they do not require significant amounts of additional memory even though they use generalised track histories.
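
The traditional single-scan tests the paper improves on can be sketched as a Mahalanobis gate between two Gaussian track estimates (an illustrative stand-in of my own; the paper's tests use full state histories, recursive likelihood ratios and Bayesian risk, none of which appear here):

```python
import numpy as np

# chi-square 95% threshold for 2 degrees of freedom, a standard gating value
CHI2_95_2DOF = 5.991

def gate_test(x1, P1, x2, P2, thresh=CHI2_95_2DOF):
    """Single-scan association gate: declare 'same target' when the squared
    Mahalanobis distance between two independent track estimates is small."""
    d = np.asarray(x1, float) - np.asarray(x2, float)
    S = np.asarray(P1, float) + np.asarray(P2, float)  # combined covariance
    m2 = float(d @ np.linalg.solve(S, d))              # Mahalanobis distance^2
    return m2, m2 < thresh
```

Because this uses only the current estimates, it discards exactly the history and non-kinematic information the paper's generalised tests exploit.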

Research paper thumbnail of Fusing multiple images and extracting features for visual inspection

The aim of image analysis is not to compress and summarise data from images; it is concerned with using image data to abstract information about scenes and objects. The research described in this paper seeks to use insights taken from biological models of the primate visual system to design reliable visual inspection procedures. It forms part of a programme designed to create a coherent image analysis toolbox of biologically inspired algorithms. The paper both presents practical examples of balanced gradient kernels and provides the general theory that constrains the point spread functions employed. Two fast methods for edge segment extraction that make use of both gradient and orientation information are able to parameterise edge segments, even down to one pixel in length. In addition to parametric information, confidence factors can also be calculated and incorporated into inspection decision-making. These methods, although biologically inspired, are an attempt to code some aspects of neural function in a form efficient for serial computation.
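
The gradient-plus-orientation information the edge extraction methods rely on can be sketched with plain central differences (a simple stand-in; the paper's balanced gradient kernels are constrained point spread functions, not shown here):

```python
import numpy as np

def gradient_features(img):
    """Per-pixel gradient magnitude and orientation from central differences."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)         # radians; 0 = gradient points right
    return magnitude, orientation
```

Edge segment extraction then groups pixels whose orientation is consistent along the direction perpendicular to the gradient.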

Research paper thumbnail of A Statistical Approach to Situation Assessment

Situation assessment is the discrimination of concise summary descriptions of the state of uncertain parametric models. This paper considers the problem of providing summary descriptions of dynamic state histories of multitarget models. It shows that under the product-of-marginals approximation of the posterior distribution of these models it is possible to evaluate the likelihood of asynchronous multistage situations involving multiple interacting elements. Such situations include individual and multiple target manoeuvres. A dynamic programming algorithm is described that solves the problem of associating the stages of multiple situation elements with discrete time instants. The search space is much reduced by reparameterising the problem as an optimisation of the interaction times. The problem of associating tracks with situation elements is solved by selective enumeration. Methods are provided for eliminating a priori the vast preponderance of uninteresting possibilities, so that they never need to be calculated.

Research paper thumbnail of BARTIN: a neural structure that learns to take Bayesian minimum risk decisions

BARTIN (Bayesian real time networks) is a general structure for learning Bayesian minimum risk (maximum expected utility) decision schemes. It can be realized in a great variety of forms. The features that distinguish it from a standard Bayesian minimum risk classifier are: (i) it implements a general method for incorporating a prior distribution, and (ii) it can learn a risk-minimising decision scheme from training data. Included in the enumerative realization described later, and applicable to many other variants, is a method for proportionately biasing specific decisions. BARTIN provides a bridge between neural networks and classical taught decision classification methods that are less versatile but whose internal workings are often much clearer. It provides both the flexibility of a neural network and the structure and clarity of these more formal schemes.
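
The decision rule BARTIN learns, once posteriors are available, is the plain Bayes minimum-risk rule (a textbook sketch; BARTIN's contribution is learning such a scheme from data rather than computing it from known models):

```python
import numpy as np

def minimum_risk_decision(posteriors, cost):
    """Pick the action minimising expected cost, where cost[a][c] is the cost
    of taking action a when the true class is c."""
    risks = np.asarray(cost, float) @ np.asarray(posteriors, float)
    return int(np.argmin(risks)), risks
```

Note that with asymmetric costs the minimum-risk action can differ from the most probable class, which is exactly why risk minimisation matters in inspection settings.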

Research paper thumbnail of Bayesian growing and pruning strategies for MAP-optimal estimation of Gaussian mixture models

Research paper thumbnail of Robot control using the feedback-error-learning rule with variable feedback gain

A problem plaguing the application of neural networks in all fields is the difficulty of incorporating prior information to constrain the possible functions that the network can represent. This problem is addressed and solved in a robot control application. The authors present the numerical results of extensive simulations where the main interest is to investigate the generalization ability of the networks when applied to the dynamical robot control problem. They restrict the network weights to a plausible region, obtained from ...

Research paper thumbnail of Data fusion for vehicle-borne mine detection

Research paper thumbnail of BARTIN applied to visual inspection of axisymmetric engineering parts

A visual inspection scheme for detecting flaws in axisymmetric engineering parts is described and shows very good performance. The scheme can be used both for recognition of parts and for classifying flaws. The polygon transform, a compact representation of the edge information in images of axisymmetric objects, gives the inspection system invariance with respect to location, scale, orientation, and illumination intensity. Polygon transform representations of the object provide the inputs to a BARTIN (Bayesian Real Time Network) that generates the inspection decisions. This implementation employs a distance measure related to Kullback-Leibler divergence to quantify the difference between sample polygon transforms, and demonstrates the parsimony and reliability of the BARTIN architecture. The latter enables inclusion of prior information in the form of probabilities, decision utilities, and engineering drawings. The results obtained from an application provided by British Aerospace gave two wrong (but safe) decisions in 625 test examples.
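
The distance measure referred to can be illustrated with the standard discrete Kullback-Leibler divergence (the paper says only that its measure is *related* to KL divergence, so this is an assumed variant, with a small epsilon added for numerical safety):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete histograms,
    e.g. two sample polygon-transform representations."""
    p = np.asarray(p, float); p = p / p.sum()
    q = np.asarray(q, float); q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

D(p || q) is zero when the histograms coincide and grows as they diverge, which makes it a natural score for comparing a sample part against a reference.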

Research paper thumbnail of Structural generalization in neural classification: incorporation of prior probabilities

Supervised learning of classifications under l_2 costing of error trains learning classifiers to recall Bayesian a posteriori probabilities of the possible classes, given observed measurements. This result leads to a number of insights concerning the validation of training, access to the likelihood function, creating networks of networks, incorporation of prior probabilities (which may vary in real time), and how to choose the training set. The author focuses on the latter two points. Contextual information in the form of priors is used to generalise training data to economise on both training and computation. Structural generalization is the process whereby data is generalised architecturally rather than parametrically. A training procedure and postprocessing technique are given which enable learning under one set of prior classification probabilities to be generalized to give (asymptotically) Bayes optimal classifications under all others.
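
The postprocessing step described, re-validating learned posteriors under different priors, follows directly from Bayes' rule: divide out the training priors and multiply in the deployment priors, then renormalise (a minimal sketch of that standard identity; the paper's full procedure is not reproduced here):

```python
import numpy as np

def reweight_posteriors(post, trained_prior, new_prior):
    """Adjust posteriors learned under one set of class priors so they are
    (asymptotically) valid under another: p'(c|x) proportional to
    p(c|x) * new_prior[c] / trained_prior[c]."""
    w = (np.asarray(post, float) * np.asarray(new_prior, float)
         / np.asarray(trained_prior, float))
    return w / w.sum()
```

This is what lets a classifier trained once be redeployed when class frequencies change in real time, without retraining.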

Research paper thumbnail of Automatic Complexity Determination of Gaussian Mixture Models with the EMS Algorithm

Estimating the complexity and regularisation parameters of semiparametric models like neural networks by repeated trials is slow, and makes them less attractive in real-time estimation problems. Simultaneous estimation of both model parameters and complexity can be achieved using the EMS algorithm, which augments expectation-maximisation (EM) to include a pruning and growing step that relies on approximating the posterior odds of model structures with different complexities. EMS is applied to Gaussian mixtures. Regularising priors are introduced, including the truncated inverse exponential (TIE) distribution for the component covariance matrices. A fast method for estimating the hyperparameters tunes the smoothing action of the priors to the data. This approach is applied to density estimation of speech sound data and gives a significant performance advantage in comparison to current methods in speech recognition.
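
The pruning half of the pruning-and-growing step can be caricatured by dropping mixture components whose weight has collapsed (a crude stand-in under my own threshold heuristic; EMS instead compares approximate posterior odds of competing model structures):

```python
def prune_components(weights, means, variances, min_weight=0.02):
    """Drop mixture components with negligible weight and renormalise the rest."""
    keep = [k for k, w in enumerate(weights) if w >= min_weight]
    kept = [weights[k] for k in keep]
    total = sum(kept)
    return ([w / total for w in kept],
            [means[k] for k in keep],
            [variances[k] for k in keep])
```

Interleaving such structural moves with EM iterations is what lets complexity be estimated in a single training run rather than by repeated trials.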

Research paper thumbnail of A computable theory for learning Bayesian networks based on MAP-MDL principles

Research paper thumbnail of Objective functions for maximum likelihood classifier design

1999 Information, Decision and Control. Data and Information Fusion Symposium, Signal Processing and Communications Symposium and Decision and Control Symposium. Proceedings (Cat. No.99EX251)

This paper reports research into maximum likelihood parameter estimation for classification of da... more This paper reports research into maximum likelihood parameter estimation for classification of data modelled as mixtures of multivariate Gaussian distributions. Two likelihood metrics are compared: the log conditional probability of the feature data (the non-discriminative log likelihood, L/sub n/), and the log conditional probability of the class labels (the discriminative log likelihood, L/sub d/). Results on some simple data sets indicate that L/sub d/ yields poorer classification accuracy, as measured by the average log probability l~/sub c/ of obtaining the correct classification of a set of labelled test data. Analysis of the score equations and the information matrices derived from L/sub d/ and L/sub n/ reveals that L/sub d/ produces estimates of class means with larger bias and variance, and hence larger mean-square error (E~/sup 2/), than those from L/sub n/. Some experimental results on simple data sets are given as illustration.

Research paper thumbnail of Statistical Models for Situation Awareness

1999 Information, Decision and Control. Data and Information Fusion Symposium, Signal Processing and Communications Symposium and Decision and Control Symposium. Proceedings (Cat. No.99EX251)

Situation awareness can be defined as the state of knowledge of an information user relevant to a... more Situation awareness can be defined as the state of knowledge of an information user relevant to achieving its objective. Achieving a satisfactory state of awareness of technical, military and business situations can be assisted by the provision of information generated by statistical computations. Automated systems seeking to provide information to users need to assess its relevance to the user’s objectives, decision-making and planning. Such systems therefore need to second guess the user. In fielded systems, these decisions are made on the basis of prior or learned information [ 1,2,3]. However, more advanced systems evaluate the appropriate decision sequences or the game. This talk categorises the statistical problems, and surveys a selection of the relevant models, including partially observed Markov decision processes, multi-arm bandit games, decision networks and concludes with a summary of new work on factorial hidden Markov models.

Research paper thumbnail of Functional Combinatory Categorial Grammar

Functional Combinatory Categorial Grammar (FCCG) advances the field of combi- natory categorial g... more Functional Combinatory Categorial Grammar (FCCG) advances the field of combi- natory categorial grammars by enabling semantic dependencies to be determined directly from the syntactic derivation under the action of a small set of extraction rules. Pred- icates are extracted composably and can be used to apply semantic constraints during parsing. The approach is an alternative to that of classical CCG which requires (i) map- ping from categories to lambda expressions, (ii) a set of semantic transformation rules for unary combination, and (iii) an explicit �-reduction stage. GFCCG, a generalised form of the grammar, has previously been applied to situation assessment (McMichael, Jarrad, & Williams, 2006). In FCCG, combinators are largely distinguished by their semantic purpose. Unary combination is only used for preterminal-terminal transitions. Replacing unary type- raising and type-changing by their binary counterparts R and P tends to reduce parse ambiguity. Four other binary combi...

Research paper thumbnail of A Meta-grammar for CCG

Applying CCG to domains outside of linguistics could require different sets of combinators to be ... more Applying CCG to domains outside of linguistics could require different sets of combinators to be developed for each domain. The meta-grammar described in this paper aims to assist such development by enabling simple, succinct expression of both existing and new combinator definitions. It favours the development of an easily-configurable, onetime-coded module that can perform CCG combinations for any combinator set of the researcher’s choosing. A preliminary implementation shows both the feasibility and potential of the meta-grammar.

Research paper thumbnail of Cool Fusion Statistical Modelling for Data Fusion

The consequences of the use of probability in modelling under uncertainty are explored, and it is... more The consequences of the use of probability in modelling under uncertainty are explored, and it is shown how data and parameter independence and likelihood modularity emerge as desirable properties for statistical models. They lead directly to multi-object models which themselves create the association problem. The expectation-maximisation (EM) technique is introduced and then applied to discriminative training of a Gaussian mixture model classi er. Generalised training is introduced which interpolates between the extremes of maximising discriminative and non-discriminative likelihoods. A generalisation of EM, the conditional expectation-maximisation process is presented, and applied to designing an algorithm for estimating the location and rotation parameters for transforming a 3D reference model to generate an observed x-ray image. The EM E-step is derived for the case in which the nuisance variables (\missing data") are divided into sets and integrated out separately. This co...

Research paper thumbnail of INFORMATION FUSION, CAUSAL PROBABILISTIC NETWORK AND PROBANET II: Inference Algorithms and Probanet System

As an extension of an overview paper [Pan and McMichael, 1997] on information fusion and Causal P... more As an extension of an overview paper [Pan and McMichael, 1997] on information fusion and Causal Probabilistic Networks (CPN), this paper formalizes kernel algorithms for probabilistic inferences upon CPNs. Information fusion is realized through updating joint probabilities of the variables upon the arrival of new evidences or new hypotheses. Kernel algorithms for some dominant methods of inferences are formalized from discontiguous, mathematics-oriented literatures, with gaps lled in with regards to computability and completeness. In particular, possible optimizations on causal tree algorithm, graph triangulation and junction tree algorithm are discussed. Probanet has been designed and developed as a generic shell, or say, mother system for CPN construction and application. The design aspects and current status of Probanet are described. A few directions for research and system development are pointed out, including hierarchical structuring of network, structure decomposition and ad...

Research paper thumbnail of 2007 I D C Information, Decision and Control

Research paper thumbnail of Robust recursive Lp estimation

IEE Proceedings D Control Theory and Applications

Research paper thumbnail of Track association with kinematic and non-kinematic state histories

IFAC Proceedings Volumes

Abstract In distributed surveillance systems, a key problem is to associate track estimates that ... more Abstract In distributed surveillance systems, a key problem is to associate track estimates that are generated by local processors to obtain a unified set of tracks. Existing track association techniques do not use track histories, and they do not take into account non-kinematic information such as target class that may be available. This paper presents a new technique that addresses these issues. We give three track association tests using histories of generalised tracks with states that can have kinematic and non-kinematic components. The track association tests are derived by minimising Bayesian risk functions where explicit recursive analytical expressions are obtained for kinematic and non-kinematic likelihood ratios. When the risk function is chosen to be the average probability of error, the test yields a maximum a posteriori (MAP) decision rule. The tests compare likelihoods of fused generalised track histories states to non-fused ones, hence they are computationally more expensive than the traditional track association tests. However, they are optimal in Bayesian sense and they do not require significant amounts of additional memory even though they use generalised track histories.

Research paper thumbnail of Fusing multiple images and extracting features for visual inspection

The aim of image analysis is not to compress and summarise data from images. It is concerned to u... more The aim of image analysis is not to compress and summarise data from images. It is concerned to use image data to abstract information about scenes and objects. The research described in this paper seeks to use insights taken from biological models of the primate visual system to design reliable visual inspection procedures. It forms part of a programme designed to create a coherent image analysis toolbox of biologically inspired algorithms. The paper has both presented practical examples of balanced gradient kernels, and provided the general theory that constrains the point spread functions employed. Two fast methods for edge segment extraction that make use of both gradient and orientation information are able to parameterise edge segments, even down to one pixel in length. In addition to parametric information, confidence factors can also be calculated and incorporated into inspection decision making. These methods, although biologically inspired, are an attempt to code some aspects of neural function in a form efficient for serial computation.

Research paper thumbnail of A Statistical Approach to Situation Assessment

Situation assessment is the discrimination of concise summary descriptions of the state of uncert... more Situation assessment is the discrimination of concise summary descriptions of the state of uncertain parametric models. This paper considers the problem of providing summary descriptions of dynamic state histories of multitarget models. It shows that under the product-ofmarginals approximation of the posterior distribution of these models it is possible to evaluate the likelihood of asynchronous multistage situations involving multiple interacting elements. Such situations include individual and multiple target manoeuvres. A dynamic programming algorithm is described that solves the problem of associating the stages of multiple situation elements with discrete time instants. The search space is much reduced by reparameterising the problem as an optimisation of the interaction times. The problem of associating tracks with situation elements is solved by selective enumeration. Methods are provided for eliminating a priori the vast preponderance of uninteresting possibilities, so that they never need to be calculated.

Research paper thumbnail of A Statistical Approach to Situation Assessment

Situation assessment is the discrimination of concise summary descriptions of the state of uncert... more Situation assessment is the discrimination of concise summary descriptions of the state of uncertain parametric models. This paper considers the problem of providing summary descriptions of dynamic state histories of multitarget models. It shows that under the product-ofmarginals approximation of the posterior distribution of these models it is possible to evaluate the likelihood of asynchronous multistage situations involving multiple interacting elements. Such situations include individual and multiple target manoeuvres. A dynamic programming algorithm is described that solves the problem of associating the stages of multiple situation elements with discrete time instants. The search space is much reduced by reparameterising the problem as an optimisation of the interaction times. The problem of associating tracks with situation elements is solved by selective enumeration. Methods are provided for eliminating a priori the vast preponderance of uninteresting possibilities, so that they never need to be calculated.

Research paper thumbnail of BARTIN: a neural structure that learns to take Bayesian minimum risk decisions

BARTIN (Bayesian real time networks) is a general structure for learning Bayesian minimum risk (m... more BARTIN (Bayesian real time networks) is a general structure for learning Bayesian minimum risk (maximum expected utility) decision schemes. It can be realized in a great variety of forms. The features that distinguish it from a standard Bayesian minimum risk classifier are, (i) it implements a general method for incorporating a prior distribution, and (ii) its ability to learn a risk minimising decision scheme from training data. Included in the enumerative realization described later, and applicable to many other variants, is a method for proportionately biassing specific decisions. BARTIN provides a bridge between neural networks and classical taught decision classification methods that are less versatile but whose internal workings are often much clearer. It provides both the flexibility of a neural network and the structure and clarity of these more formal schemes. >

Research paper thumbnail of Bayesian growing and pruning strategies for MAP-optimal estimation of Gaussian mixture models

Research paper thumbnail of Robot control using the feedback-error-learning rule with variable feedback gain

Abstract A problem plaguing the application of neural networks in all fields is the difficulty of... more Abstract A problem plaguing the application of neural networks in all fields is the difficulty of incorporating prior information to constrain the possible functions that the network can represent. This problem is addressed and solved in a robot control application. The authors present the numerical results of extensive simulations where the main interest is to investigate the generalization ability of the networks when applied to the dynamical robot control problem. They restrict the network weights to a plausible region, obtained from ...

Research paper thumbnail of Data fusion for vehicle-borne mine detection

Research paper thumbnail of BARTIN applied to visual inspection of axisymmetric engineering parts

A visual inspection scheme for detecting flaws in axisymmetric engineering parts is described and shows very good performance. The scheme can be used both for recognition of parts and for classifying flaws. The polygon transform, a compact representation of the edge information in images of axisymmetric objects, gives the inspection system invariance with respect to location, scale, orientation, and illumination intensity. Polygon transform representations of the object provide the inputs to a BARTIN (Bayesian Real Time Network) that generates the inspection decisions. This implementation employs a distance measure related to Kullback-Leibler divergence to quantify the difference between sample polygon transforms, and demonstrates the parsimony and reliability of the BARTIN architecture. The latter enables inclusion of prior information in the form of probabilities, decision utilities, and engineering drawings. The results obtained from an application provided by British Aerospace gave two wrong (but safe) decisions in 625 test examples.
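A distance "related to Kullback-Leibler divergence" between two normalised histograms can be sketched as the symmetrised KL divergence; the exact form used for polygon transforms in the paper is not specified here, so treat this as an assumed variant.

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """Symmetrised Kullback-Leibler divergence between two histograms.
    A small epsilon avoids log(0) for empty bins; the precise measure
    used in the paper is an assumption."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

print(kl_distance([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # → 0.0 for identical histograms
```

Symmetrising makes the measure order-independent, which is convenient when comparing a sample transform against stored templates in either direction.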

Research paper thumbnail of Structural generalization in neural classification: incorporation of prior probabilities

Supervised learning of classifications under l₂ costing of error trains learning classifiers to recall the Bayesian a posteriori probabilities of the possible classes, given observed measurements. This result leads to a number of insights concerning the validation of training, access to the likelihood function, creating networks of networks, incorporation of prior probabilities (which may vary in real time), and how to choose the training set. The author focuses on the latter two points. Contextual information in the form of priors is used to generalise training data, economising on both training and computation. Structural generalization is the process whereby data is generalised architecturally rather than parametrically. A training procedure and postprocessing technique are given which enable learning under one set of prior classification probabilities to be generalized to give (asymptotically) Bayes-optimal classifications under all others.
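The postprocessing step that carries a classifier trained under one set of priors over to another follows from Bayes' rule: since p(c|x) ∝ p(x|c)π(c), dividing out the training priors and multiplying in the deployment priors (then renormalising) recovers the posteriors that would hold under the new priors. A minimal sketch, assuming the network's outputs are already calibrated posteriors:

```python
import numpy as np

def reweight_posteriors(p_trained, prior_train, prior_deploy):
    """Convert posteriors learned under one set of class priors to the
    posteriors that would hold under another set.

    p(c|x) ∝ p(x|c) π(c), so scaling by π'(c)/π(c) and renormalising
    swaps the training priors for the deployment priors.
    """
    w = p_trained * (prior_deploy / prior_train)
    return w / w.sum(axis=-1, keepdims=True)

p = np.array([0.5, 0.5])  # network output under equal training priors
print(reweight_posteriors(p, np.array([0.5, 0.5]), np.array([0.9, 0.1])))
# → [0.9 0.1]
```

Because the correction is a per-class scale factor, the deployment priors can vary in real time without retraining.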

Research paper thumbnail of Automatic Complexity Determination of Gaussian Mixture Models with the EMS Algorithm

Estimating the complexity and regularisation parameters of semiparametric models like neural networks by repeated trials is slow, and makes them less attractive in real-time estimation problems. Simultaneous estimation of both model parameters and complexity can be achieved using the EMS algorithm, which augments expectation-maximisation (EM) with a pruning and growing step that relies on approximating the posterior odds of model structures with different complexities. EMS is applied to Gaussian mixtures. Regularising priors are introduced, including the truncated inverse exponential (TIE) distribution for the component covariance matrices. A fast method for estimating the hyperparameters tunes the smoothing action of the priors to the data. This approach is applied to density estimation of speech sound data and gives a significant performance advantage over current methods in speech recognition.
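The flavour of EM-with-pruning can be shown on a one-dimensional Gaussian mixture. The real EMS step compares posterior odds of model structures; as an illustrative stand-in, the sketch below simply removes components whose mixing weight falls below a threshold between EM sweeps — the threshold rule is an assumption, not the paper's criterion.

```python
import numpy as np

def em_gmm_prune(x, k=5, iters=50, min_weight=0.05, seed=0):
    """EM for a 1-D Gaussian mixture with a crude pruning step
    (a stand-in for the EMS posterior-odds test)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, j] ∝ w_j N(x_n; mu_j, var_j)
        d = x[:, None] - mu[None, :]
        r = w * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reestimate weights, means, variances
        nj = r.sum(axis=0)
        w = nj / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nj
        d = x[:, None] - mu[None, :]
        var = (r * d**2).sum(axis=0) / nj + 1e-6
        # Pruning step: drop negligible components and renormalise
        keep = w > min_weight
        if not keep.all():
            w, mu, var = w[keep] / w[keep].sum(), mu[keep], var[keep]
    return w, mu, var

x = np.concatenate([np.random.default_rng(1).normal(-3, 1, 500),
                    np.random.default_rng(2).normal(3, 1, 500)])
w, mu, var = em_gmm_prune(x, k=6)
print(len(mu), w.sum())  # pruned component count; weights still sum to 1
```

The growing step of EMS (splitting an overloaded component) and the TIE prior on variances are omitted here for brevity.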

Research paper thumbnail of A computable theory for learning Bayesian networks based on MAP-MDL principles