Statistical Computing Research Papers - Academia.edu

The problem of "variable selection" is a fundamental one across the sciences. In its broadest terms, it is at least part of the general issue of theory selection and comparison. There is, however, a more circumscribed problem that concerns primarily the choice of variables for the best-fitting model, given some set of data, usually observational in nature, and specific statistical techniques, typically multiple regression. There is a deep strand in econometrics and other applied social, behavioral, and biomedical statistics that seeks formal decision rules or algorithms to pick out variables. The paper examines seven such formal procedures using a simulated data set with known causal relations. The conclusion is that these seven often-used procedures make systematic causal errors. The paper closes with some suggestions about better alternatives.
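
As an illustration of the kind of experiment the paper describes, here is a minimal sketch, with an invented causal graph rather than the paper's actual simulation, in which forward selection predictably picks a non-causal proxy of a confounder:

```python
# Minimal sketch (not the paper's setup): simulate a known causal graph
# and see which variables forward selection picks for predicting y.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)            # common cause (confounder)
x = z + rng.normal(size=n)        # caused by z, no effect on y
w = rng.normal(size=n)            # true direct cause of y
y = 2.0 * z + 1.0 * w + rng.normal(size=n)   # y depends on z and w only

# Candidate predictors: x (non-cause, correlated with y via z), w (true
# cause), and three pure-noise columns.
X = np.column_stack([x, w, rng.normal(size=(n, 3))])

sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="forward")
sfs.fit(X, y)
print("selected columns:", np.flatnonzero(sfs.get_support()))
# Column 0 (x) is typically selected because it proxies the confounder z,
# illustrating how purely predictive selection can make causal errors.
```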

This master's thesis deals with the generalized exponential distribution as an alternative to the Weibull and log-normal distributions. The basic characteristics of this distribution and methods for estimating its parameters are described. A separate chapter is devoted to goodness-of-fit tests. The second part of the thesis deals with censored samples, and worked examples are given for the exponential distribution. The case of type-I left censoring, which has not previously been published, is then studied; for this special case, simulations are carried out with a detailed description of its properties and behavior. An EM algorithm is also derived for this distribution, and its efficiency is compared with the method of maximum likelihood. The developed theory is applied to the analysis of environmental data.
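
As a sketch of the estimation problem the thesis treats, the following fits the generalized exponential distribution, F(x) = (1 - e^(-λx))^α, by maximum likelihood on a complete (uncensored) simulated sample; the parameterization is the standard Gupta-Kundu form, and the censored and EM variants studied in the thesis are not shown:

```python
# Minimal sketch: maximum-likelihood fit of the generalized exponential
# distribution, F(x) = (1 - exp(-lam*x))**alpha, on a complete sample.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x):
    alpha, lam = params
    if alpha <= 0 or lam <= 0:
        return np.inf
    # log f(x) = log(alpha) + log(lam) - lam*x + (alpha-1)*log(1 - exp(-lam*x))
    return -np.sum(np.log(alpha) + np.log(lam) - lam * x
                   + (alpha - 1) * np.log1p(-np.exp(-lam * x)))

rng = np.random.default_rng(1)
alpha_true, lam_true = 2.0, 0.5
# Simulate via the inverse CDF: x = -log(1 - u**(1/alpha)) / lam
u = rng.uniform(size=500)
x = -np.log1p(-u ** (1 / alpha_true)) / lam_true

res = minimize(neg_log_lik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print("alpha_hat, lam_hat:", res.x)
```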

A quality-control inspection system for industrial devices, based on a general-purpose microprocessor, is proposed. The hardware portion employs a microprocessor CPU, a timer, ROM, and peripheral interfaces. The software includes an algorithm incorporating data acquisition, statistical computation, and command-interpreting functions, plus arithmetic and conversion subroutines. Two applications have been developed and are briefly described.
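
The abstract does not give the statistical routines themselves; as a hedged illustration, the following sketch (in Python rather than the original firmware) shows the kind of streaming mean/variance computation, Welford's algorithm, that such a system's statistical subroutine might implement:

```python
# Hedged sketch of a streaming statistics routine of the kind a QC
# inspection system's "statistical computation" function might use
# (Welford's online algorithm; one pass, constant memory).
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for sample in [10.1, 9.9, 10.0, 10.2, 9.8]:   # e.g. digitized sensor readings
    stats.update(sample)
print(stats.mean, stats.variance())
```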

Access various climate data sets, manipulate different file formats, and downscale GCM/RCM models using a stochastic approach, all with the R programming language. The purpose of this document is to consolidate and improve the various R scripts used to perform the cited analyses.
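
The document itself works in R; as an illustration in Python of one common ingredient of stochastic downscaling, here is a minimal sketch of empirical quantile mapping, assuming synthetic gamma-distributed rainfall in place of real station and model data:

```python
# Minimal sketch of empirical quantile mapping, a common bias-correction
# step when downscaling GCM/RCM output; all data here are synthetic.
import numpy as np

def quantile_map(model_hist, obs, model_future):
    """Map model values onto the observed distribution via CDF matching."""
    # Empirical CDF position of each future value within the historical run
    probs = np.searchsorted(np.sort(model_hist), model_future) / len(model_hist)
    probs = np.clip(probs, 0.0, 1.0)
    # Corresponding quantiles of the observed record
    return np.quantile(obs, probs)

rng = np.random.default_rng(2)
obs = rng.gamma(shape=2.0, scale=3.0, size=1000)         # e.g. station rainfall
model_hist = rng.gamma(shape=2.0, scale=4.0, size=1000)  # biased model run
model_future = rng.gamma(shape=2.0, scale=4.5, size=10)
print(quantile_map(model_hist, obs, model_future))
```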

The motion of small spherical solid particles is simulated numerically in a decaying homogeneous isotropic turbulent gas flow field generated by large eddy simulation. Comparison with previous experimental and theoretical studies shows the present method to be a successful tool for reproducing the second-order statistics of particle motion, such as the mean-square displacement, the dispersion coefficient, and the root-mean-square velocity fluctuation. The present results complement the experimental data and include a detailed study of the effects of the flow turbulence, the particle's inertia, and the particle's free-fall velocity in a still fluid on particle dispersion and turbulence intensity. By performing particle simulations in flow fields generated with different values of the coefficient in the subgrid model and with different sizes of the calculation domain, it is found that the particle motion is indeed controlled mainly...
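
As an illustration of the second-order statistics mentioned, the sketch below computes the mean-square displacement and a long-time dispersion estimate from an ensemble of trajectories; the random-walk trajectories are placeholders for the LES-tracked particles of the paper:

```python
# Minimal sketch: mean-square displacement and a dispersion-coefficient
# estimate from an ensemble of trajectories (particles x timesteps x 3).
import numpy as np

rng = np.random.default_rng(3)
n_particles, n_steps, dt = 500, 200, 0.01
# Placeholder trajectories (random walk); in the paper these come from
# tracking particles through an LES-generated turbulent field.
steps = rng.normal(scale=0.1, size=(n_particles, n_steps, 3))
traj = np.cumsum(steps, axis=1)

disp = traj - traj[:, :1, :]                    # displacement from t = 0
msd = np.mean(np.sum(disp**2, axis=2), axis=0)  # average over particles

# A dispersion coefficient can be estimated from the long-time behavior:
# D ~ MSD / (6 t) in three dimensions.
t = np.arange(1, n_steps + 1) * dt
print("late-time dispersion estimate:", msd[-1] / (6 * t[-1]))
```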

We propose cKAM, cyclical Kernel Adaptive Metropolis, which incorporates a cyclical step-size scheme to allow control over exploration and sampling. We show that on a crafted bimodal distribution, existing Adaptive Metropolis-type algorithms fail to converge to the true posterior distribution. This is because adaptive samplers estimate the local/global covariance structure from the past history of the chain, which leaves adaptive algorithms trapped in a local mode. We demonstrate that cKAM encourages exploration of the posterior distribution and allows the sampler to escape from a local mode, while maintaining the high performance of adaptive methods.
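
The authors' cKAM implementation is not reproduced here; the toy sketch below only illustrates the idea of a cyclical step-size schedule grafted onto an adaptive random-walk Metropolis sampler, on an invented 1-D bimodal target:

```python
# Toy sketch (not the authors' cKAM code): adaptive random-walk Metropolis
# with a cyclical step-size multiplier, on a 1-D bimodal target.
import numpy as np

def log_target(x):
    # Mixture of two well-separated unit-variance Gaussians
    return np.logaddexp(-0.5 * (x + 4)**2, -0.5 * (x - 4)**2)

rng = np.random.default_rng(4)
n_iter, cycle_len = 20000, 2000
x, chain = 0.0, np.empty(n_iter)
mean, var = 0.0, 1.0   # running moments for the adaptive proposal scale

for i in range(n_iter):
    # Cyclical multiplier: large early in each cycle (exploration),
    # small late in the cycle (local sampling).
    phase = (i % cycle_len) / cycle_len
    scale = 0.5 * (1 + np.cos(np.pi * phase)) * 10 + 0.1

    prop = x + rng.normal(scale=scale * np.sqrt(var) * 2.38)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    chain[i] = x

    # Adapt the proposal scale from the chain history (diminishing updates)
    w = 1.0 / (i + 2)
    mean += w * (x - mean)
    var += w * ((x - mean)**2 - var)

print("fraction of samples near each mode:",
      np.mean(chain < 0), np.mean(chain > 0))
```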

Duan (2015) proposes a tempering or annealing approach to Bayesian inference for time series state space models. In such models the likelihood is often analytically and computationally intractable. The approach generalizes the annealed importance sampling (AIS) approach of Neal (2001) and Del Moral (2006) for the case where the likelihood can be computed analytically. Annealing is a sequential Monte Carlo approach that moves a collection of parameters and latent state variables through a number of levels, each with its own target density, in such a way that it is easy to generate both the parameters and latent state variables at the initial level, while the target density at the final level is the posterior density of interest. A critical component of the annealing or density-tempering method is the Markov move component that is implemented at every stage of the annealing process. The Markov move component effectively runs a small number of Markov chain Monte Carlo iterations for each...
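
A minimal sketch of plain annealed importance sampling (Neal, 2001) on an invented one-dimensional example is given below; it shows the tempered levels and the per-level Markov move, but none of the state-space machinery of Duan (2015):

```python
# Toy sketch of annealed importance sampling: move particles from an easy
# initial density to a target through tempered levels, with a random-walk
# Metropolis (Markov) move at every level.
import numpy as np

rng = np.random.default_rng(5)

def log_p0(x):           # initial density: standard normal
    return -0.5 * x**2

def log_p1(x):           # unnormalized target: narrow normal at 3
    return -0.5 * ((x - 3.0) / 0.5)**2

n_particles, betas = 1000, np.linspace(0, 1, 50)
x = rng.normal(size=n_particles)
log_w = np.zeros(n_particles)

for b_prev, b in zip(betas[:-1], betas[1:]):
    # Importance-weight increment for raising the temperature
    log_w += (b - b_prev) * (log_p1(x) - log_p0(x))

    # Markov move targeting the current tempered density
    def log_pi(z):
        return (1 - b) * log_p0(z) + b * log_p1(z)
    prop = x + rng.normal(scale=0.5, size=n_particles)
    accept = np.log(rng.uniform(size=n_particles)) < log_pi(prop) - log_pi(x)
    x = np.where(accept, prop, x)

# Log of the normalizing-constant ratio Z1/Z0 (log-sum-exp for stability)
print(np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max())
```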

Structural genomics aims to solve a large number of protein structures that represent the protein space. Currently, an exhaustive solution for all structures seems prohibitively expensive, so the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein a probability of having an unsolved fold. The method makes extensive use of protomap, a sequence-based classification, and scop, a structure-based classification. According to protomap, the protein space encodes the relationships among proteins as a graph whose vertices correspond to 13,354 clusters of proteins. A representative fold for a cluster with at least one solved protein is determined after superposition of all scop (release 1.37) folds onto protomap clusters. Distances within the protomap graph are computed from each representative fold to the neighboring folds. The distribution of these distances is used to create a statistical...
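
The distance computation can be illustrated with a multi-source breadth-first search from the solved clusters; the toy graph below is invented (the real protomap graph has 13,354 vertices):

```python
# Hedged sketch of the distance computation: shortest graph distances
# from clusters that contain at least one solved structure, on a toy
# cluster graph standing in for the protomap graph.
from collections import deque

graph = {                       # adjacency list: cluster -> neighbors
    "c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2", "c4"],
    "c4": ["c3"], "c5": [],     # c5 is an isolated cluster
}
solved = {"c1"}                 # clusters with a solved representative fold

dist = {c: 0 for c in solved}
queue = deque(solved)
while queue:                    # multi-source breadth-first search
    u = queue.popleft()
    for v in graph[u]:
        if v not in dist:
            dist[v] = dist[u] + 1
            queue.append(v)

# Clusters far from (or disconnected from) any solved fold are the
# natural candidates for having a new, unsolved fold.
print({c: dist.get(c, float("inf")) for c in graph})
```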

Uncertainty management is a key aspect of any information fusion (IF) system. Evaluation of how uncertainty is dealt with within a given IF system is distinct from, although closely related to, evaluation of the overall performance of the system. This paper presents the Uncertainty Representation and Reasoning Evaluation Framework (URREF), developed by the ISIF Evaluation of Techniques for Uncertainty Representation Working Group (ETURWG) for evaluating the uncertainty management aspects of IF systems. The paper describes the scope of the framework, its core element (the URREF ontology), the elementary fusion process it considers, and how these relate to the subjects being evaluated using the framework. Although material about the URREF has been previously published elsewhere, this work is the first to provide a comprehensive overview of the framework, establishing its scope, core elements, elementary fusion process considered, and relationship betw...

The core of the insurance business is the underwriting function. As a business process, underwriting has remained essentially unchanged since the early 1600s in London, England. Ship owners, seeking to protect themselves from financial ruin in the event their ships were lost at sea, would seek out men of wealth to share in their financial risk. Wealthy men, upon accepting the risk, would write their names under (at the bottom of) the ship's manifest, hence the name "underwriters." The underwriters would then share in the profits of the voyage, or reimburse the ship's captain for his losses if the ship were lost at sea. This practice led to the founding of Lloyd's of London, the most recognized name in the insurance business today (Gibb, 1972; Golding & King-Page, 1952).

Missing data is a common issue encountered in research across various fields, including the social sciences. It occurs when no data value is stored for a variable in an observation. It can happen for a wide range of reasons, from participants not answering certain questions in a survey to errors in data collection or transfer processes. Missing data can significantly affect an analysis, potentially leading to biased estimates, reduced statistical power, and invalid conclusions. It is crucial for researchers and analysts to recognize the types of missing data, understand the mechanisms behind them, and apply appropriate methods for handling them.
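
As a brief illustration with invented data, the sketch below inspects missingness in pandas and contrasts two simple handling strategies, listwise deletion and mean imputation:

```python
# Minimal sketch: inspecting missingness and comparing two simple
# handling strategies (listwise deletion vs. mean imputation) in pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [23, 31, np.nan, 45, 29, np.nan],
    "income": [42_000, np.nan, 51_000, 60_000, np.nan, 38_000],
})

print(df.isna().mean())          # fraction missing per variable

dropped = df.dropna()            # listwise deletion: fewer rows,
                                 # unbiased only under MCAR
imputed = df.fillna(df.mean())   # mean imputation: keeps all rows
                                 # but shrinks the variance
print(dropped.mean(), imputed.mean())
# In practice, model-based approaches such as multiple imputation are
# usually preferred over either of these simple fixes.
```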

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In this paper we use the raw output of a modulation spectrum analyser in combination with sparse coding as a means of obtaining state posterior probabilities. The modulation spectrum analyser uses 15 gammatone filters. The Hilbert envelope of the output of these filters is then processed by nine modulation frequency filters, with bandwidths up to 16 Hz. Experiments using the AURORA-2 task show that the novel approach is promising. We found that the representation of medium-term dynamics in the modulation spectrum analyser must be improved. We also found that we should move towards sparse classi...
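
A rough sketch of the front-end stages described above is given below, with a Butterworth band-pass standing in for a single gammatone channel and one low-pass filter standing in for the nine-filter modulation bank:

```python
# Sketch of the analyser's front-end stages on a synthetic signal:
# band-pass channel -> Hilbert envelope -> modulation-frequency filter.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
# Toy signal: a 1 kHz carrier with 4 Hz amplitude modulation
sig = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

# Stage 1: one auditory-band channel (the paper uses 15 gammatone filters)
b, a = butter(4, [800, 1200], btype="bandpass", fs=fs)
channel = filtfilt(b, a, sig)

# Stage 2: Hilbert envelope of the channel output
envelope = np.abs(hilbert(channel))

# Stage 3: one modulation-frequency filter (the paper uses nine, up to 16 Hz)
b_m, a_m = butter(2, 16, btype="lowpass", fs=fs)
modulation = filtfilt(b_m, a_m, envelope)
print(modulation.shape)
```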