Statistical Computing Research Papers - Academia.edu

The paper concentrates on two essential problems, neural network topology optimization and weight parameter computation, which are often solved separately. It describes a new approach that solves both problems together. According to the proposed methodology, a special kind of multilayer ontogenic neural network, the Self-Optimizing Neural Network (SONN), can simultaneously develop its structure for given training data and compute ...

This article describes a recently proposed standard, ISEA discrete global grids, for gridding information on the surface of the earth. The acronym ISEA stands for icosahedral Snyder equal area. The grid cells not only have equal areas, they are hexagons when projected ...

Online PCA for multivariate and functional data using perturbation, incremental, and stochastic gradient methods.
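
As a rough illustration of the stochastic gradient family named in this entry, the sketch below tracks only the leading principal direction of a data stream with an Oja-style update. It is not taken from the paper: the function name, the learning rate, and the synthetic test stream are all assumptions, and the data are assumed to be already centered.

```python
import math
import random


def online_pca_first_component(stream, dim, lr=0.01):
    """Track the leading principal direction of a stream of centered vectors.

    Each step is a stochastic gradient update (Oja-style) followed by
    renormalization, so only the current estimate w is kept in memory.
    """
    w = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit-norm starting direction
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))        # projection of x onto w
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]  # step that increases projected variance
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]                     # keep w on the unit sphere
    return w


# Hypothetical test stream: 2-D Gaussian data with standard deviation 3 along
# the first axis and 1 along the second, so the true leading direction is (1, 0).
rng = random.Random(0)
stream = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
w_hat = online_pca_first_component(stream, dim=2)
print(w_hat)
```

The estimate is defined only up to sign, so either (1, 0) or (-1, 0) counts as a correct direction here.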

Even though the automotive industry was among the key players of the industrial revolution in the last century, striking transformations experienced in other sectors did not have significant repercussions on this industry until a few years ago. However, general advancements in technology and Industry 4.0 have presented new opportunities for the reconfiguration of the business environment. Developments in cryptocurrencies such as bitcoin, in particular, have drawn attention to what is known as blockchain technology. Several successful examples of blockchain applications in different industries have tempted the automotive industry to become rapidly involved in efforts in this direction. As a consequence, the application of blockchain technology to highly diverse areas in the automotive industry was set in motion. The purpose of this chapter is to explore the application of blockchain technology in the automotive industry, to analyse its advantages and disadvantages, and to demonstrate its success in general.

You yourself, or what is the same, your experience, is a "coin" that, while you aren't questioned, rotates all the time in "free flight". Only when you answer the question does the "coin" fall on one of its sides, "Yes" or "No", with the believability that your experience tells you.

A great many empirical researchers in the social sciences take computational factors for granted: For the social scientist, software is a tool, not an end in itself. Although there is an extensive literature on statistical computing in statistics, applied mathematics, and embedded within various natural science fields, there is currently no such guide tailored directly to the needs of the

MXM is an R package that offers variable selection for high-dimensional data in regression and classification settings. Many regression models are offered. In addition, some functions for Bayesian networks and graphical models are offered.

Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.

The paper reports on both methodological and substantive findings. It presents a method for generating simplified representations of regional urban populations, their geographical sub-populations, and communities. The method generates greatly simplified high-resolution socio-economic profiles of populated geographical areas from large, complex census data sets.

Parametric survival models are being increasingly used as an alternative to the Cox model in biomedical research. Through direct modelling of the baseline hazard function, we can gain greater understanding of the risk profile of patients over time, obtaining absolute measures of risk. Commonly used parametric survival models, such as the Weibull, make restrictive assumptions of the baseline hazard function, such as monotonicity, which is often violated in clinical datasets. In this article, we extend the general framework of parametric survival models proposed by Crowther and Lambert (Journal of Statistical Software 53:12, 2013), to incorporate relative survival, and robust and cluster robust standard errors. We describe the general framework through three applications to clinical datasets, in particular, illustrating the use of restricted cubic splines, modelled on the log hazard scale, to provide a highly flexible survival modelling framework. Through the use of restricted cubic splines, we can derive the cumulative hazard function analytically beyond the boundary knots, resulting in a combined analytic/numerical approach, which substantially improves the estimation process compared with only using numerical integration. User-friendly Stata software is provided, which significantly extends parametric survival models available in standard software. Copyright © 2014 John Wiley & Sons, Ltd.

In this thesis, we consider the two-parameter extreme value distribution because it appears in many fields of statistical application. Mathematical and statistical properties of the distribution, such as moments and higher moments, are collected and unified, and the properties of its reliability and hazard functions are illustrated.
The chi-square goodness-of-fit test is used to check whether the samples generated from the standardized extreme value distribution by Monte Carlo simulation are acceptable for use.
These samples are used to estimate the distribution parameters by four methods of estimation, namely the method of moments, the maximum likelihood method, the order statistic method, and the least squares method.
These methods are discussed theoretically and assessed practically in estimating the reliability and hazard functions. The properties of the estimators of the parameters and of the reliability and hazard functions, such as bias, variance, skewness, kurtosis, and mean square error, are tabulated.
The computer programs are listed in three appendices, and the runs were made using "MathCAD 14".
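
The sample-checking step described in this entry can be sketched as follows. This is not the thesis's MathCAD code: it assumes that "standardized extreme value distribution" means the Gumbel (type I) distribution with location 0 and scale 1, and the function names, sample size, and number of cells are illustrative choices. Samples are drawn by inverse transform and compared with the theoretical CDF over ten equiprobable cells; a normal sample is tested against the same CDF as a contrast.

```python
import math
import random


def gumbel_cdf(x):
    """CDF of the standardized Gumbel distribution (location 0, scale 1)."""
    return math.exp(-math.exp(-x))


def sample_gumbel(rng):
    """Inverse-transform sample: x = -log(-log(u)) with u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def chi_square_statistic(samples, cdf, n_bins=10):
    """Pearson chi-square statistic over n_bins equiprobable cells under cdf."""
    counts = [0] * n_bins
    for x in samples:
        cell = min(int(cdf(x) * n_bins), n_bins - 1)
        counts[cell] += 1
    expected = len(samples) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(1)
gumbel_samples = [sample_gumbel(rng) for _ in range(5000)]
normal_samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Upper 5% point of the chi-square distribution with 10 - 1 = 9 degrees of freedom.
CRITICAL_VALUE = 16.919

stat_gumbel = chi_square_statistic(gumbel_samples, gumbel_cdf)
stat_normal = chi_square_statistic(normal_samples, gumbel_cdf)
print(stat_gumbel, stat_gumbel < CRITICAL_VALUE)
print(stat_normal, stat_normal < CRITICAL_VALUE)
```

A statistic below the critical value means the generated sample is not rejected at the 5% level. The degrees of freedom are 9 here because no parameters are estimated from the data.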

We herein introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure, the first stage of which entails a series of recursive binary splits to reduce the heterogeneity of the data within the new subsamples. During the second stage (pruning), consideration is given to whether adjacent nodes can be aggregated. Finally, during the third stage (joining), similar clusters are joined together, even if they do not share the same parent originally. Consistency results are obtained, and the procedure is used on simulated and real data sets.

This paper kicks off a project to write a comprehensive book of best practices for documenting SAS® projects. The presenter’s existing documentation styles are explained. The presenter wants to discuss and gather current best practices used by the SAS user community. The presenter shows documentation styles at three different levels of scope. The first is a style used for project documentation, the second a style for program documentation, and the third a style for variable documentation. This third style enables researchers to repeat the modeling in SAS research, in an alternative language, or conceptually.

Short-term wind power forecasts are fundamental information for the safe and economic integration of wind farms into an electric power system. In this work we present a Generalized Additive Model to predict the wind power quantiles (Quantile Regression) from which we obtain a prediction of the wind power production probability density function in a wind farm. The methodology was implemented in the VENTOS Program. In order to illustrate the application of the methodology as well as the VENTOS Program this work presents the results achieved by a computational experiment based on real data from a wind farm located in Galicia, Spain.
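
Quantile regression, mentioned in this entry, fits a quantile by minimizing the pinball (check) loss. The sketch below shows only that loss on a toy sample; it does not reproduce the Generalized Additive Model or the VENTOS Program, and the function names and the data are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting q for the tau-quantile when y is observed."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def empirical_quantile_by_loss(values, tau):
    """Return the observed value that minimizes the total pinball loss."""
    return min(values, key=lambda q: sum(pinball_loss(y, q, tau) for y in values))


# Hypothetical wind power observations (arbitrary units), not data from the paper.
power = [float(v) for v in range(1, 101)]
q90 = empirical_quantile_by_loss(power, tau=0.9)
print(q90)
```

Minimizing the same loss at several values of tau gives a set of quantiles from which a predictive density can be approximated.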

The early detection of epileptic seizures requires computing relevant statistics from multivariate data and defining a robust decision strategy as a function of these statistics that accurately detects the transition from the normal to the peri-ictal (problematic) state. We model the afflicted brain as a hidden Markov model (HMM) with two hidden clinical states (normal and peri-ictal). The output of the HMM is a statistic computed from multivariate neural measurements. A Bayesian framework is developed to analyze the a posteriori conditional probability of being in the peri-ictal state given current and past output measurements. We apply this method to multichannel intracortical EEGs (iEEGs) from the thalamo-cortical ictal pathway in an epilepsy rat model. We first define the output statistic as the maximum singular value of a connectivity matrix computed on the EEG channels with spectral techniques. Then, we estimate the HMM transition probabilities from this statistic and track the a poste...
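
The posterior probability of the peri-ictal state given current and past outputs can be computed with a standard forward (filtering) recursion for a two-state HMM. The sketch below is a minimal version under stated assumptions: the Gaussian emission model, the transition matrix, the prior, and the output values are all illustrative and are not taken from the paper.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a normal distribution, used here as an assumed emission model."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_peri_ictal(observations, transition, emission_params, prior):
    """Return P(state = peri-ictal | outputs up to t) for each time step t.

    States are indexed 0 (normal) and 1 (peri-ictal). The prior is the belief
    before the first transition.
    """
    belief = list(prior)
    posteriors = []
    for y in observations:
        # Predict: propagate the previous belief through the transition matrix.
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(2)) for j in range(2)
        ]
        # Update: weight by the likelihood of the new output and renormalize.
        unnorm = [
            predicted[j] * gaussian_pdf(y, *emission_params[j]) for j in range(2)
        ]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        posteriors.append(belief[1])
    return posteriors


# All numbers below are illustrative assumptions, not estimates from the paper.
transition = [[0.95, 0.05], [0.05, 0.95]]
emission_params = [(0.0, 1.0), (3.0, 1.0)]  # (mean, std) of the output statistic per state
prior = [0.9, 0.1]
outputs = [0.1, -0.2, 0.3, 2.9, 3.1, 3.2]
posteriors = filter_peri_ictal(outputs, transition, emission_params, prior)
print([round(p, 3) for p in posteriors])
```

A detection rule could then compare each posterior with a threshold; the choice of threshold is a separate design decision not covered by this sketch.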

Interest is in evaluating, by Markov chain Monte Carlo (MCMC) simulation, the expected value of a function with respect to a possibly unnormalized probability distribution. A general-purpose variance reduction technique for the MCMC estimator, based on the zero-variance principle introduced in the physics literature, is proposed. Conditions for asymptotic unbiasedness of the zero-variance estimator are derived. A central limit theorem is also proved under regularity conditions. The potential of the idea is illustrated with real applications to probit, logit, and GARCH Bayesian models. For all these models, a central limit theorem and unbiasedness for the zero-variance estimator are proved (see the supplementary material available online).

In Bayesian statistics, many problems can be expressed as the evaluation of the expectation of a quantity of interest with respect to the posterior distribution. The standard Monte Carlo method is often not applicable because the posterior distributions encountered cannot be sampled directly. In this case, the most popular strategies are importance sampling, Markov chain Monte Carlo, and annealing.
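
As a small illustration of the first strategy, the sketch below uses self-normalized importance sampling to estimate an expectation under a density known only up to a constant. The target (proportional to a N(1, 1) density), the N(0, 2^2) proposal, and all names are assumptions chosen so that the true answer, a mean of 1, is known.

```python
import math
import random


def unnormalized_target(x):
    """Unnormalized density proportional to a N(1, 1) distribution (assumed posterior)."""
    return math.exp(-0.5 * (x - 1.0) ** 2)


def proposal_pdf(x, std=2.0):
    """Density of the N(0, std^2) proposal that we can sample from directly."""
    return math.exp(-0.5 * (x / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def self_normalized_is(f, n, rng, std=2.0):
    """Estimate E[f(X)] under the target from n proposal draws."""
    num = 0.0
    den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, std)
        w = unnormalized_target(x) / proposal_pdf(x, std)  # importance weight
        num += w * f(x)
        den += w
    return num / den  # the unknown normalizing constant cancels in the ratio


rng = random.Random(2)
posterior_mean = self_normalized_is(lambda x: x, n=20000, rng=rng)
print(posterior_mean)
```

The proposal is deliberately wider than the target so that the weights stay bounded; a proposal with lighter tails than the target can make the estimator unstable.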