Michael Thomason - Profile on Academia.edu

Papers by Michael Thomason

As peer-to-peer and widely distributed storage systems proliferate, the need to perform efficient erasure coding, instead of replication, is crucial to performance and efficiency. Low-Density Parity-Check (LDPC) codes have arisen as alternatives to standard erasure codes, such as Reed-Solomon codes, trading off vastly improved decoding performance for inefficiencies in the amount of data that must be acquired to perform decoding. The scores of papers written on LDPC codes typically analyze their collective and asymptotic behavior. Unfortunately, their practical application requires the generation and analysis of individual codes for finite systems. This paper attempts to illuminate the practical considerations of LDPC codes for peer-to-peer and distributed storage systems. The three main types of LDPC codes are detailed, and a huge variety of codes are generated, then analyzed using simulation. This analysis focuses on the performance of individual codes for finite systems, and addresses several important heretofore unanswered questions about employing LDPC codes in real-world systems.

Erasure codes have profound uses in wide- and medium-area storage applications. While infinite-size codes have been developed with optimal properties, there remains a need to develop small codes with optimal properties. In this paper, we provide a framework for exploring very small codes, and we use this framework to derive optimal and near-optimal ones for discrete numbers of data bits and coding bits. These codes have heretofore been unknown and unpublished, and should be useful in practice. We also use our exploration to make observations about upper bounds for these codes, in order to gain a better understanding of them and to spur future derivations of larger, optimal and near-optimal codes.

Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. While the literature abounds with performance models of checkpointing systems, none address the issue of selecting runtime parameters other than the optimal checkpointing interval. In particular, the issue of processor allocation is typically ignored. In this paper, we briefly present a performance model for long-running parallel computations that execute with checkpointing enabled. We then discuss how it is relevant to today's parallel computing environments and software, and present case studies of using the model to select runtime parameters.

Pattern Recognition, 1986

Experiments with a method for inference of Markov networks are described. Dynamic programming is used to search for string alignments, which cause high-probability landmark substrings to emerge by reinforcement as the training samples are processed. Network entropy and divergence values are interpreted with respect to the results obtained when inferred networks are used in classification experiments. The data used here are representations of isolated, spoken words mapped into finite strings of symbols.

Automatically inferred markov network models for classification of chromosomal band pattern structures

Cytometry, 1990

A structural pattern recognition approach to the analysis and classification of metaphase chromosome band patterns is presented. An operational method of representing band pattern profiles as sharp-edged idealized profiles is outlined. These profiles are nonlinearly scaled to a small but fixed number of “density” levels. Previous experience has shown that profiles of six levels are appropriate and that the differences between successive bands in these profiles are suitable for classification. String representations, which focus on the sequences of transitions between local band pattern levels, are derived from such “difference profiles.” A method of syntactic analysis of the band transition sequences by dynamic programming for optimal (maximal probability) string‐to‐network alignments is described. It develops automatic data‐driven inference of band pattern models (Markov networks) per class, and uses these models for classification. The method does not use centromere information, but assumes the p‐q‐orientation of the band pattern profiles to be known a priori. It is experimentally established that the method can build Markov network models, which, when used for classification, show a recognition rate of about 92% on test data. The experiments used 200 samples (chromosome profiles) for each of the 22 autosome chromosome types and are designed to also investigate various classifier design problems. It is found that the use of a priori knowledge of Denver Group assignment only improved classification by 1 or 2%. A scheme for typewise normalization of the class relationship measures proves useful, partly through improvements on average results and partly through a more evenly distributed error pattern. The choice of reference of the p‐q‐orientation of the band patterns is found to be unimportant, and timing of the analysis shows that recent, efficient implementations can process one cell in less than 1 min on current standard hardware. A measure of divergence between data sets and Markov network models is shown to provide usable estimates of experimental classification performance.

Dynamic programming channel constraints for Markov networks in classification of strings

Dynamic programming alignment of sequences representing cyclic patterns

IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993

String alignment by dynamic programming is generalized to include cyclic shift and corresponding optimal alignment cost for strings representing cyclic patterns. A guided search algorithm uses bounds on alignment costs to find all optimal cyclic shifts. The bounds are derived from submatrices of an initial dynamic programming matrix. Algorithmic complexity is analyzed for major stages in the search. The applicability of the method is illustrated with satellite DNA sequences and circularly permuted protein sequences.
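
For orientation, the problem addressed in the abstract above can be illustrated with a naive baseline that runs a standard dynamic programming alignment once per cyclic shift and keeps every shift with minimal cost. This is only a sketch of the problem statement, not the paper's guided search with bounds: unit edit costs are assumed, and the function names `edit_distance` and `best_cyclic_shifts` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance by dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def best_cyclic_shifts(a: str, b: str):
    """Try every cyclic shift of `a` against `b` (brute force).

    Returns (minimum cost, list of all shifts that achieve it).
    """
    if not a:
        return edit_distance(a, b), [0]
    costs = [edit_distance(a[s:] + a[:s], b) for s in range(len(a))]
    best = min(costs)
    return best, [s for s, c in enumerate(costs) if c == best]
```

For example, `best_cyclic_shifts("cdab", "abcd")` finds that rotating the first string by two positions aligns it exactly with the second. The brute-force loop costs one full alignment per shift, which is the overhead a bounded search aims to avoid.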

41st ACM Southeast …, 2003

We describe methods for inferring and using probabilistic models that capture characteristic pattern structures that may exist in symbolic data sequences. Our emphasis is on modeling the sequence of system calls made during the execution of a software application. To obtain learning ...

On the Use of Automatically Inferred Markov Networks for Chromosome Analysis

Springer eBooks, 1989

Simulation of emission tomography using grid middleware for distributed computing

Computer Methods and Programs in Biomedicine, 2004

Proceedings of the ACM SIGART international symposium on Methodologies for intelligent systems, 1986

International Journal of Computer & Information Sciences, 1976

For an electrical, mechanical, or hybrid system described diagrammatically as a network of interconnected components, fault tree modeling of system reliability as a function of individual component failure probabilities gives rise to logic expressions obtained from the network connections. Application of the method of Boolean differences in the analysis of such Boolean expressions is discussed, and it is shown that the influence of the status of specific components on the reliability of the total system may be investigated by straightforward algebraic operations on the network failure function.
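
As a small illustration of the Boolean difference mentioned in the abstract above: the difference of a failure function F with respect to a component x is F evaluated with x = 0 XOR F evaluated with x = 1, and it is true exactly for those states of the remaining components in which the status of x decides the system outcome. The sketch below enumerates these states for a hypothetical three-component network, which is an assumption made for illustration and is not taken from the paper.

```python
from itertools import product


def system_fails(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical network: component a in series with the parallel pair (b, c).
    # True means "failed", so the system fails if a fails or both b and c fail.
    return a or (b and c)


def boolean_difference(f, index: int, n_vars: int):
    """States of the other variables for which flipping variable `index` changes f.

    Each returned tuple lists the remaining variables in their original order.
    """
    sensitive = []
    for others in product([False, True], repeat=n_vars - 1):
        low = others[:index] + (False,) + others[index:]
        high = others[:index] + (True,) + others[index:]
        if f(*low) != f(*high):  # XOR of the two cofactors
            sensitive.append(others)
    return sensitive
```

In this example network, component `a` is critical in three of the four states of `(b, c)`, while `b` is critical only when `a` is working and `c` has failed.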

Proceedings of the ... IEEE/RSJ International Conference on Intelligent Robots and Systems

Our research focuses on anomaly detection problems in unknown environments using Wireless Sensor Networks (WSN). We are interested in detecting two types of abnormal events: sensory level anomalies (e.g., noise in an office without lights on) and time-related anomalies (e.g., freezing temperature in a mid-summer day). We present a novel, distributed, machine learning based anomaly detector that is able to detect time-related changes. It consists of three components. First, a Fuzzy Adaptive Resonance Theory (ART) neural network classifier is used to label multi-dimensional sensor data into discrete classes and detect sensory level anomalies. Over time, the labeled classes form a sequence of classes. Next, a symbol compressor is used to extract the semantic meaning of the temporal sequence. Finally, a Variable Memory Markov (VMM) model in the form of a Probabilistic Suffix Tree (PST) is used to model and detect time-related anomalies in the environment. To our knowledge, this is the fi...
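
The last stage described in the abstract above, a variable-memory Markov model over a sequence of class labels, can be sketched in a much simplified form. The class below is a hypothetical stand-in, not the authors' PST: it stores next-symbol counts for every context up to a fixed depth, predicts from the longest context it has seen, and omits the pruning and smoothing a real Probabilistic Suffix Tree would apply. The Fuzzy ART classifier and the symbol compressor are not modeled, and the `floor` probability is an assumed placeholder for unseen symbols.

```python
import math
from collections import defaultdict


class SuffixContextModel:
    """Simplified variable-order Markov model (illustrative, not a full PST)."""

    def __init__(self, max_depth: int = 2):
        self.max_depth = max_depth
        # counts[context][symbol] = times `symbol` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        for i, symbol in enumerate(sequence):
            for depth in range(self.max_depth + 1):
                if i - depth < 0:
                    break
                context = tuple(sequence[i - depth:i])
                self.counts[context][symbol] += 1

    def probability(self, history, symbol, floor: float = 1e-3) -> float:
        """Estimate P(symbol | history) from the longest context seen in training."""
        for depth in range(min(self.max_depth, len(history)), -1, -1):
            context = tuple(history[len(history) - depth:])
            if context in self.counts:
                seen = self.counts[context]
                return max(seen.get(symbol, 0) / sum(seen.values()), floor)
        return floor  # model has not been trained

    def surprise(self, history, symbol) -> float:
        """Negative log-likelihood; large values suggest a time-related anomaly."""
        return -math.log(self.probability(history, symbol))
```

Trained on an alternating label sequence such as `A, B, A, B, ...`, the model assigns a low surprise to `B` following `A` and a high surprise to `A` following `A`. A detector could threshold this score, though the choice of threshold is outside this sketch.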

2012 IEEE 9th International Conference on Mobile Ad-Hoc and Sensor Systems (MASS 2012), 2012

WSN applications are prone to bugs and failures due to their typical characteristics, such as being extensively distributed, heavily concurrent, and resource restricted. In this paper, we propose and develop a flexible and iterative WSN debugging system based on sequence mining techniques. First, we develop a data structure called the vectorized Probabilistic Suffix Tree (vPST), an elastic model to extract and store sequential information from program runtime traces in compact suffix tree based vectors. Then, we build a novel WSN debugging system by integrating vPST with Support Vector Machines (SVM), a robust and generic classifier for both linear and nonlinear data classification tasks. Finally, we demonstrate that the vPST-SVM debugging system is efficient, flexible, and generic through three different test cases, two on the LiteOS operating system and one on the TinyOS operating system.

Human Behavior Understanding in Networked Sensing, 2014

Anomaly detection is an important problem for environment, fault diagnosis and intruder detection in Wireless Sensor Networks (WSNs). A key challenge is to minimize the communication overhead and energy consumption in the network when identifying these abnormal events. We present a machine learning (ML) framework that is suitable for WSNs to sequentially detect sensory level anomalies and time-related anomalies in an unknown environment. Our system consists of a set of modular, unsupervised, machine learning algorithms that are adaptive. The modularity of the ML algorithms maximizes the use of resource-constrained sensor nodes in different environmental monitoring tasks without reprogramming. The developed ML framework consists of the following modular components. First, an unsupervised neural network is used to map multi-dimensional sensor data into discrete environmental states/classes and detect sensor level anomalies. Over time, the labeled classes form a sequence of environmental states. Next, we use a variable length Markov model in the form of a Probabilistic Suffix Tree (PST) to model the relationship between temporal events. Depending on the type of application, high-order Markov models can be expensive. We use a symbol compression technique to bring down the cost of PST models by extracting the semantic meaning out of temporal sequences. Lastly, we use a likelihood-ratio test to verify whether there are anomalous events. We demonstrate the efficiency of our approach by applying it in two real-world applications: volcano monitoring and traffic monitoring. Our experimental results show that the developed approach yields high perfor...

Efficient dynamic programming alignment of cyclic strings by shift elimination

Pattern Recognition, 1996

Optimal alignment of two strings of length m and n is computed in time O(mn) by dynamic programming. When the strings represent cyclic patterns, the alignment computation must consider all possible shifts and the computational complexity increases accordingly. We present an algorithm for efficient dynamic programming alignment of cyclic strings which uses a previously established channeling technique to reduce the ...

Journal of Parallel and Distributed Computing, 1999

The parallelization of iterative algorithms is an important issue for efficient solution of large numerical problems. Several theoretical results concerning sufficient conditions for, and speed of, convergence of parallel iterative algorithms are available. However, those results usually do not take into account the processor workloads and network communications at the application level. The approach in this paper develops a Markov chain based on random variables which describe aspects of the multiuser, distributed-memory environment and the phases of the algorithm. The performance characterization addresses stochastic characteristics of the algorithmic execution time such as mean values and standard deviations. We present simulation results as well as experimental results over different time periods. The results provide information about the impact of distributed environment and implementation style on long-run, expected execution time characteristics.

Information and Software Technology, 2000

In previous work we developed a method to model software testing data, including both failure events and correct behavior, as a finite-state, discrete-parameter, recurrent Markov chain. We then showed how direct computation on the Markov chain could yield various reliability-related test measures. Use of the Markov chain allows us to avoid common assumptions about failure rate distributions and allows both the operational profile and test coverage of behavior to be explicitly and automatically incorporated into reliability computation. Current practice in Markov chain based testing and reliability analysis uses only the testing (and failure) activity on the most recent software build to estimate reliability. In this paper we extend the model to allow use of testing data on prior builds to cover the real-world scenario in which the release build is constructed only after a succession of repairs to buggy pre-release builds. Our goal is to enable reliability prediction for future builds using any or all testing data for prior builds. The technique we present uses multiple linear regression and exponential smoothing to merge multi-build test data (modeled as separate Markov chains) into a single Markov chain which acts as a predictor of the next build of testing activity. At the end of the testing cycle, the predicted Markov chain represents field use. It is from this chain that reliability predictions are made.
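
To make the idea of merging per-build chains concrete, the sketch below applies plain exponential smoothing to a list of transition matrices ordered from oldest to newest build. It is an assumption-laden simplification and not the technique of the paper, which also uses multiple linear regression: the function name `smooth_chains`, the smoothing weight `alpha`, and the requirement that all builds share the same state set are hypothetical choices made for illustration.

```python
def smooth_chains(chains, alpha: float = 0.5):
    """Merge row-stochastic transition matrices, oldest build first.

    Each step computes alpha * newer_build + (1 - alpha) * running_estimate.
    A convex combination of row-stochastic matrices is itself row-stochastic,
    so no renormalization is needed.
    """
    if not chains:
        raise ValueError("at least one build is required")
    merged = [row[:] for row in chains[0]]
    for chain in chains[1:]:
        merged = [
            [alpha * new + (1 - alpha) * old for new, old in zip(new_row, old_row)]
            for new_row, old_row in zip(chain, merged)
        ]
    return merged
```

A larger `alpha` weights recent builds more heavily, which matches the intuition that later builds sit closer to the release build. How the weight should actually be chosen is not specified here.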

IEEE Transactions on Software Engineering, 1994

Statistical testing of software establishes a basis for statistical inference about a software system's expected field quality. This paper describes a method for statistical testing based on a Markov chain model of software usage. The significance of the Markov chain is twofold. First, it allows test input sequences to be generated from multiple probability distributions, making it more general than many existing techniques. Analytical results associated with Markov chains facilitate informative analysis of the sequences before they are generated, indicating how the test is likely to unfold. Second, the test input sequences generated from the chain and applied to the software are themselves a stochastic model and are used to create a second Markov chain to encapsulate the history of the test, including any observed failure information. The influence of the failures is assessed through analytical computations on this chain. We also derive a stopping criterion for the testing process based on a comparison of the sequence generating properties of the two chains.
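
The first role of the usage chain described in the abstract above, generating test input sequences, amounts to a random walk from an initial state to a terminal state. The sketch below shows that walk on a hypothetical usage model: the state names and transition probabilities are invented for illustration and do not come from the paper.

```python
import random

# Hypothetical usage model: each state maps to (next_state, probability) pairs.
USAGE_MODEL = {
    "Invoke": [("Menu", 1.0)],
    "Menu": [("Edit", 0.6), ("Save", 0.3), ("Terminate", 0.1)],
    "Edit": [("Menu", 0.7), ("Save", 0.3)],
    "Save": [("Menu", 0.5), ("Terminate", 0.5)],
}


def generate_test_sequence(model, rng, start="Invoke", end="Terminate"):
    """Random walk over the usage chain from `start` until `end` is reached."""
    state, sequence = start, [start]
    while state != end:
        next_states, weights = zip(*model[state])
        state = rng.choices(next_states, weights=weights, k=1)[0]
        sequence.append(state)
    return sequence
```

Passing a seeded generator, for example `generate_test_sequence(USAGE_MODEL, random.Random(0))`, makes a generated test case reproducible. Recording which transitions each sequence exercised, together with any failures, is what a second, test-history chain would be built from.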

Constrained Markov networks for automated analysis of G-banded chromosomes

Computers in Biology and Medicine, 1993

Automated analysis of chromosome band patterns using probabilistic Markov networks has been reported in previous work. Band patterns are represented as strings of symbols. Inferred from a set of learning strings, a Markov network is a model of intraband and interband relations in these strings. The inference is entirely data-driven and is accomplished using dynamic programming. This paper presents a new model of chromosome band patterns, the constrained Markov network, which is a special case of its predecessor. Substantial experimental evidence of the superiority of the new model over the old is given in terms of equal results in centromere finding and improved results in classification for the 22 autosomes. Furthermore, a method for simplification of constrained Markov networks is shown to be of considerable importance with respect to computational complexity.