Non Parametric Research Papers - Academia.edu
Methods for on-line monitoring of business cycles are compared with respect to their ability to predict the next turn early, by an alarm for a turn in a leading index. Three likelihood-based methods for turning point detection are compared in detail using the theory of statistical surveillance and simulations. One of the methods is based on a Hidden Markov Model. Another includes a non-parametric estimation procedure. Several features are evaluated, such as knowledge of the shape and parameters of the curve, types and probabilities of transitions, and smoothing. Results on the expected delay time to a correct alarm and the predictive value of an alarm are discussed. The three methods are also used to analyze an actual data set on a period of Swedish industrial production. The relative merits of evaluating methods by one real data set versus by simulations are discussed.
This paper considers bidding behavior in a repeated procurement auction setting. We study highway procurement data for the state of California between December 1994 and October 1998. We consider a dynamic bidding model that takes into account the presence of intertemporal constraints such as capacity constraints. We estimate the model non-parametrically and assess the presence of dynamic constraints in bidding.
Kendall's tau (τ) has been widely used as a distribution-free measure of cross-correlation between two variables. It has been previously shown that persistence in the two involved variables results in the inflation of the variance of τ. In this paper, the full null distribution of Kendall's τ for persistent data with multivariate Gaussian dependence is derived, and an approximation to the full distribution is proposed. The effect of the deviation from the multivariate Gaussian dependence model on the distribution of τ is also investigated. As a demonstration, the temporal consistency and field significance of the cross-correlation between the Northern Hemisphere (NH) temperature time series in the period 1850-1995 and a set of 784 NH tree-ring width (TRW) proxies in addition to 105 NH tree-ring maximum latewood density (MXD) proxies are studied. When persistence is ignored, the original Mann-Kendall test gives temporally inconsistent results between the early half (1850-1922) and the late half (1923-1995) of the record. These temporal inconsistencies are largely eliminated when persistence is accounted for, indicating the spuriousness of a large portion of the identified cross-correlations. Furthermore, the use of the modified test in combination with a field significance test that is robust to spatial correlation indicates the absence of field significant cross-correlation in both halves of the record. These results have serious implications for the use of tree-ring data as temperature proxies, and emphasize the importance of utilizing the correct distribution of Kendall's τ in order to avoid the overestimation of the significance of cross-correlation between data that exhibit significant persistence.
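The variance inflation described above is easy to reproduce by simulation. The sketch below is an illustration under assumed AR(1) persistence, not the paper's derived null distribution: it compares the empirical variance of Kendall's τ between independent persistent series with the classical i.i.d. value 2(2n+5)/(9n(n-1)).

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

def ar1(n, phi, rng):
    """AR(1) series x_t = phi * x_{t-1} + e_t with standard normal noise."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def tau_variance(n, phi, reps, rng):
    """Empirical variance of tau between two independent AR(1) series."""
    taus = [kendalltau(ar1(n, phi, rng), ar1(n, phi, rng))[0]
            for _ in range(reps)]
    return float(np.var(taus))

n = 100
classical = 2 * (2 * n + 5) / (9 * n * (n - 1))  # i.i.d. null variance of tau
v_iid = tau_variance(n, 0.0, 500, rng)           # close to the classical value
v_pers = tau_variance(n, 0.8, 500, rng)          # inflated by persistence
print(classical, v_iid, v_pers)
```

With strong persistence (phi = 0.8 in both series) the variance of τ is several times the classical value, so ignoring persistence overstates significance.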
Educational and social service researchers and evaluators continue to develop advanced statistical methods. To ensure that our students have the essential skills as they enter direct service, the focus must be on ensuring that they learn readily understandable methods that are appropriate for small samples and repeated measures.
The purpose of this study was to examine different non-parametric imputation methods to reduce regional biases in growth estimates. Growth estimates were obtained using non-parametric k-nearest neighbour imputation (k-NN) to predict future 5-year diameter increment over bark at breast height for Scots pine (Pinus sylvestris L.) and Norway spruce (Picea abies). The Mahalanobis distance function was chosen as the most suitable measure of similarity and was then modified using weights provided by linear regression analysis. The use of weights from linear regression facilitated the examination of the correlation structure of the variables and allowed for transformations of the independent variables. Localization of the non-parametric estimates was obtained through a variety of methods: by using spatial coordinates as independent variables, by restricting the selection of neighbours to a circular area around the target tree, and by restricting the selection of neighbours to a local database. The localized estimates using spatial measures were then compared with non-spatial imputation and also with estimates from a parametric growth model. Results were compared by vegetation zones in Finland. The differences between the non-spatial k-NN estimates and the localized spatial estimates were negligible when summarized to the stand level, and localization did not reduce the regional biases relative to the non-spatial k-NN estimates. However, regional biases in northern and south-western Finland were reduced substantially when the non-parametric estimates were used rather than the parametric growth models; the mean biases of the non-parametric estimates were quite similar across regions, while those of the parametric model varied notably between regions.
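As a minimal sketch of the core estimator, the hypothetical example below predicts a response as the mean over the k nearest neighbours under Mahalanobis distance. The variable names and toy data are invented; the regression weighting, localization schemes, and forestry variables of the study are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_impute(X_ref, y_ref, X_target, k=5):
    """Predict y for each target row as the mean over its k nearest
    reference rows, with nearness measured by Mahalanobis distance."""
    VI = np.linalg.inv(np.cov(X_ref, rowvar=False))   # inverse covariance
    preds = []
    for row in X_target:
        d = X_ref - row
        dist = np.einsum('ij,jk,ik->i', d, VI, d)     # squared Mahalanobis
        preds.append(y_ref[np.argsort(dist)[:k]].mean())
    return np.array(preds)

# toy data: the response depends linearly on two predictors plus noise
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(200)
y_hat = knn_impute(X[:150], y[:150], X[150:], k=5)
rmse = float(np.sqrt(np.mean((y_hat - y[150:]) ** 2)))
print(rmse)
```

Because the prediction is a mean over observed reference values, k-NN imputation cannot extrapolate beyond the range of the reference data, which is one reason localization of the reference set matters.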
This paper addresses the inference of spatial dependence in the context of a recently proposed framework. More specifically, the paper focuses on the estimation of model parameters for a class of generalized Gibbs random fields, i.e., Spartan Spatial Random Fields (SSRFs). The problem of parameter inference is based on the minimization of a distance metric. The latter involves a specifically designed distance between sample constraints (variance, generalized "gradient" and "curvature") and their ensemble counterparts. The general principles used in the construction of the metric are discussed and intuitively motivated. In order to enable calculation of the metric from sample data, estimators for the generalized "gradient" and "curvature" constraints are constructed. These estimators, which are not restricted to SSRFs, are formulated using compactly supported kernel functions. An intuitive method for kernel bandwidth selection is proposed. It is pr...
Most dominant point detection methods require heuristically chosen control parameters. One of the most commonly used control parameters is the maximum deviation. This paper uses a theoretical bound on the maximum deviation of pixels obtained by digitization of a line segment to construct a general framework that makes most dominant point detection methods non-parametric. The derived analytical bound on the maximum deviation can be used as a natural benchmark for line fitting algorithms, and thus dominant point detection methods can be made parameter-independent and non-heuristic. Most methods can easily incorporate the bound. This is demonstrated using three categorically different dominant point detection methods. Such a non-parametric approach retains the characteristics of the digital curve while providing good fitting performance and compression ratio for all three methods on a variety of digital, non-digital, and noisy curves.
An attempt is made in this paper to examine whether stock returns in India's two premier stock exchanges, namely the Bombay Stock Exchange (BSE) and the National Stock Exchange (NSE), follow a random walk. Towards this end, data on major indices during the period 1997 to 2009 are analyzed using the non-parametric Runs and BDS tests. The findings of the study reveal that stock returns do not follow a random walk during the sample period.
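A runs test of the kind used here can be sketched as follows. This is a generic Wald-Wolfowitz runs test on the signs of simulated returns, not the authors' code or data: too few runs signal positive dependence, too many signal oscillation.

```python
import math
import numpy as np

def runs_test(returns):
    """Wald-Wolfowitz runs test on the signs of a return series.
    Returns the z statistic and a two-sided normal-approximation p-value."""
    signs = np.sign(returns)
    signs = signs[signs != 0]                        # drop exact zeros
    n1 = int(np.sum(signs > 0))
    n2 = int(np.sum(signs < 0))
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))  # count sign changes
    mu = 2 * n1 * n2 / (n1 + n2) + 1                 # expected runs under H0
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / \
          ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - mu) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

rng = np.random.default_rng(2)
z_iid, p_iid = runs_test(rng.standard_normal(1000))  # i.i.d. returns
trended = np.repeat(rng.standard_normal(100), 10)    # strongly dependent signs
z_dep, p_dep = runs_test(trended)
print(p_iid, p_dep)
```

For the dependent series the signs come in long blocks, so the run count collapses and the test rejects decisively, which is the pattern a random-walk violation produces.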
Escaping the limits of unidimensional analysis and the irrelevance of linear regression, the duration model incorporates the impact of covariates on the duration variable and permits testing the dependence of daily travel times on elapsed time. In the perspective of a discussion of Zahavi's hypothesis, the duration model approach is applied to the daily travel times of Lyon (France). The relationships between daily travel
We used simulation to explore the impact of common data imperfections (i.e., missing parents, genotyping error, map error, and missing genotypes) upon the performance of multipoint and single point linkage analysis in the analyses of linkage data from pairs of siblings affected with an idealized complex trait. The performance of single point and multipoint linkage was similar under an unrealistic best case scenario; however, when four data imperfections were combined, the performance of single point linkage analysis appeared to be superior to multipoint. The absence of parental genotypes in the presence of 1% genotype error led to marked degradation of linkage signal, particularly for multipoint analyses.
In this paper we focus primarily on the dynamic evolution of the world distribution of growth rates in per capita GDP. We propose new concepts and measures of "convergence" or "divergence" that are based on entropy distances and dominance relations between groups of countries over time. We update the sample period to include the most recent decade of data available, and we offer traditional parametric and new nonparametric estimates of the most widely used growth regressions for two important subgroups of countries, OECD and non-OECD. Traditional parametric models are rejected by the data; using robust nonparametric methods, however, we find strong evidence in favor of "polarization" and "within group" mobility.
Little attention has been given to the correlation coefficient when data come from discrete or continuous non-normal populations. In this article, we consider the efficiency of two correlation coefficients which are from the same family, Pearson's and Spearman's estimators. Two discrete bivariate distributions were examined: the Poisson and the Negative Binomial. The comparison between these two estimators took place using classical and bootstrap techniques for the construction of confidence intervals. Thus, these techniques are also subject to comparison. Simulation studies were also used for the relative efficiency and bias of the two estimators. Pearson's estimator performed slightly better than Spearman's.
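The comparison of the two estimators with bootstrap confidence intervals can be sketched as below. The common-shock Poisson construction and all settings are illustrative assumptions, not the study's design; the shared component induces a known Pearson correlation of 3/(3+2) = 0.6.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)

# correlated Poisson pair via a shared component (common-shock construction)
n = 300
shared = rng.poisson(3.0, n)
x = shared + rng.poisson(2.0, n)
y = shared + rng.poisson(2.0, n)   # true Pearson correlation is 0.6

def bootstrap_ci(stat, x, y, reps=1000, alpha=0.05):
    """Percentile bootstrap confidence interval for a bivariate statistic."""
    vals = []
    for _ in range(reps):
        idx = rng.integers(0, len(x), len(x))   # resample pairs with replacement
        vals.append(stat(x[idx], y[idx])[0])
    return np.quantile(vals, [alpha / 2, 1 - alpha / 2])

r_p, r_s = pearsonr(x, y)[0], spearmanr(x, y)[0]
ci_p, ci_s = bootstrap_ci(pearsonr, x, y), bootstrap_ci(spearmanr, x, y)
print(r_p, ci_p, r_s, ci_s)
```

Resampling (x, y) pairs jointly, rather than each margin separately, is what preserves the dependence structure the interval is supposed to describe.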
We describe a novel approach to extract the neural tracts of interest from a diffusion tensor image (DTI). Compared to standard streamline tractography, existing probabilistic methods are able to capture fiber paths that deviate from the main tensor diffusion directions. At the same time, tensor clustering methods are able to more precisely delimit the border of the bundle. To the best of our knowledge, we propose the first algorithm which combines the advantages supplied by probabilistic and tensor clustering approaches. The algorithm includes a post-processing step to limit partial-volume related segmentation errors. We extensively test the accuracy of our algorithm on different configurations of a DTI software phantom for which we systematically vary the image noise, the number of gradients, the geometry of the fiber paths and the angle between adjacent and crossing fiber bundles. The reproducibility of the algorithm is supported by the segmentation of the corticospinal tract of nine patients. Additional segmentations of the corticospinal tract, the arcuate fasciculus, and the optic radiations are in accordance with anatomical knowledge. The required user interaction is comparable to that of streamline tractography, which allows for an uncomplicated integration of the algorithm into the clinical routine.
Observations made on many environmental variables often do not follow a normal distribution. Even the widely used logarithmic transformation does not guarantee normality of the transformed data. Because of this, when comparing groups of observations, resort is often made to non-parametric test procedures. Although the null hypothesis of interest in such analyses is often that the means of two groups are the same, this is not the null hypothesis tested by these procedures. This implies that use of these tests as procedures for comparing means may be invalid: even when the group means are equal, the test does not have the chosen Type I error. Further problems arise with non-independent data. We report the results of a Monte Carlo study where: (a) the means of the two groups are the same, but other characteristics differ; (b) the differences of pairs in a paired-comparison model are dependent; and (c) the marginal distributions of pairs are dependent, but not identical. We find that frequently used non-parametric procedures are not valid when these assumptions are violated. The results demonstrate the importance of understanding the assumptions required for the validity of non-parametric test procedures.
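The flavor of scenario (a) can be sketched as follows; the distributions chosen (equal means, very different shapes) are illustrative assumptions, using the Mann-Whitney test as the non-parametric procedure.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)

def rejection_rate(sample_a, sample_b, n=30, reps=2000, alpha=0.05):
    """Fraction of Monte Carlo replicates in which the Mann-Whitney
    test rejects at level alpha."""
    rej = 0
    for _ in range(reps):
        p = mannwhitneyu(sample_a(n), sample_b(n),
                         alternative='two-sided').pvalue
        rej += p < alpha
    return rej / reps

# identical distributions: the nominal level is respected
nominal = rejection_rate(lambda n: rng.standard_normal(n),
                         lambda n: rng.standard_normal(n))
# equal means (both 1) but very different shapes: rejections far exceed
# alpha, so the test cannot be read as a test of equality of means
shape_diff = rejection_rate(lambda n: rng.exponential(1.0, n),
                            lambda n: rng.normal(1.0, 0.3, n))
print(nominal, shape_diff)
```

The second rate exceeds the nominal level by a wide margin even though the two population means coincide, which is exactly the invalidity the abstract describes.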
Using longitudinal data for Norwegian children born in 1950, 1955, 1960 and 1965, we find a relatively high degree of earnings mobility. There is no tendency toward decreasing mobility over the cohorts. Conditioning on the position in the earnings distribution, the analysis indicates quite high mobility in the middle of the distribution and somewhat more persistence at the top and bottom. This approach also reveals increased mobility over time for sons, but a less clear picture for daughters.
Infrastructure vs. Mobile Sensing - The Evolving Landscape: There has been a surge of enthusiasm around the paradigm of participatory urban sensing, which views a citizen-centric, distributed and mobile sensing substrate as one of the most promising ways to gather various types of urban information, such as environmental parameters and pollution levels, traffic congestion, popularity of events at various public spaces, etc. In parallel, there have also been significant recent deployments, perhaps less heralded, of infrastructure or ambient sensors, especially in public spaces such as malls, airports and stadiums. The panel seeks to understand the limits of each of these individual sensing paradigms observed in practice, due to challenges such as resource limitations, privacy, user interfaces and data quality. It will also discuss how the combined capabilities of participatory and infrastructure sensing can be effectively harnessed for novel urban computing applications.
A new method of kernel density estimation with a varying adaptive window size is proposed. It is based on the so-called intersection of confidence intervals (ICI) rule. Several examples of the proposed method are given for different types of densities, and the quality of the adaptive density estimate is assessed by means of numerical simulations.
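A minimal sketch of the ICI rule for pointwise bandwidth selection is given below. The Gaussian kernel, the variance approximation s_h ≈ sqrt(f_h R(K)/(n h)), and the threshold gamma are standard but assumed choices, not necessarily those of the paper: the bandwidth grows until the running intersection of confidence intervals becomes empty.

```python
import numpy as np

rng = np.random.default_rng(5)

def kde(x, data, h):
    """Gaussian kernel density estimate at a single point x."""
    u = (x - data) / h
    return float(np.mean(np.exp(-0.5 * u * u))) / (h * np.sqrt(2 * np.pi))

def ici_kde(x, data, hs, gamma=2.0):
    """ICI rule: keep enlarging the bandwidth while the confidence
    intervals [f_h - gamma*s_h, f_h + gamma*s_h] still intersect."""
    n = len(data)
    rk = 1.0 / (2 * np.sqrt(np.pi))      # roughness of the Gaussian kernel
    lo, hi = -np.inf, np.inf
    best = kde(x, data, hs[0])
    for h in hs:                         # hs sorted in ascending order
        f = kde(x, data, h)
        s = np.sqrt(max(f, 1e-12) * rk / (n * h))
        lo, hi = max(lo, f - gamma * s), min(hi, f + gamma * s)
        if lo > hi:                      # intersection empty: stop here
            break
        best = f
    return best

data = rng.standard_normal(4000)
est = ici_kde(0.0, data, np.geomspace(0.05, 1.0, 10))
true = 1.0 / np.sqrt(2 * np.pi)          # N(0,1) density at 0
print(est, true)
```

Small bandwidths give noisy but nearly unbiased intervals; as h grows, bias eventually pushes the new interval out of the running intersection, and the last admissible bandwidth balances the two.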
Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA; Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel; Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, Bethesda, MD, USA; Department of Epidemiology, University of California Irvine, Irvine, CA, USA.
This paper is a review of recent developments in parametric and non-parametric approaches to decomposing inequality by subgroups, income sources, causal factors and other unit characteristics. Different methods of decomposing changes in poverty into growth, redistribution, poverty standard and residual components are described. For parametric approaches, models of income dynamics accounting for transitory and permanent changes in individual and household earnings conditional on various covariates are also reviewed. Statistical inference for inequality measurement, including the delta method, bootstrapping and other methods for estimating the sampling distribution, is presented. These issues are important in the design of policy measures and in forming expectations about their impacts on earnings inequality and poverty reduction.
This document provides recent evidence on the persistence of wage gaps between formal and informal workers in Colombia, using a non-parametric method proposed by Ñopo (2008a). Using a rich household-level dataset covering 2008-2012, it is found that formal workers earn, on average, 30 to 60 percent more than informal workers. Regardless of the formality definition adopted (structuralist or institutionalist), it is clear that formal workers have more economic advantages than informal ones; even after controlling for demographic and labor variables, an important fraction of the gap remains unexplained.
- by Luis Gamboa
Recent tests of stochastic dominance of several orders, proposed by Linton, Maasoumi and Whang [Linton, O., Maasoumi, E., & Whang, Y. (2005). Consistent testing for stochastic dominance under general sampling schemes. Review of Economic Studies, 72(3), 735-765], are applied to reexamine the equity-premium puzzle. An advantage of this non-parametric approach is that it provides a framework to assess whether the existence of a premium is due to particular cardinal choices of either the utility function or the underlying returns distribution, or both. The approach is applied to the original Mehra-Prescott data and more recent data that include daily yields on Treasury bonds and daily returns on the S&P500 and the NASDAQ indexes. The empirical results show little evidence of stochastic dominance among the assets investigated. This suggests that the observed equity premium represents compensation for bearing higher risk, taking into account higher-order moments such as skewness and kurtosis. There is some evidence of a reverse puzzle, whereby Treasury bonds stochastically dominate equities at the third order, a result which potentially reflects insufficient compensation to investors for bearing the negative skewness associated with the S&P500 index.
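A first-order stochastic dominance check on empirical CDFs, the building block behind such tests, can be sketched as follows. The grid restricted to inner pooled quantiles and the simulated "returns" are illustrative assumptions, and this omits the subsampling inference of Linton, Maasoumi and Whang.

```python
import numpy as np

rng = np.random.default_rng(8)

def dominates_fosd(a, b, q=np.linspace(0.02, 0.98, 97)):
    """First-order stochastic dominance of sample a over sample b,
    checked on a grid of pooled inner quantiles: a dominates b when
    F_a(x) <= F_b(x) everywhere, i.e. a puts less mass on low values."""
    grid = np.quantile(np.concatenate([a, b]), q)
    Fa = np.searchsorted(np.sort(a), grid, side='right') / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side='right') / len(b)
    return bool(np.all(Fa <= Fb))

high = rng.normal(1.0, 1.0, 5000)   # 'returns' shifted up by a full sd
low = rng.normal(0.0, 1.0, 5000)
print(dominates_fosd(high, low), dominates_fosd(low, high))
```

Dominance is one-directional by construction: the up-shifted sample dominates, the reverse check fails, and with no dominance in either direction the assets are ranked only by risk preferences, which is the situation the abstract reports for equities versus bonds at first order.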
Mendelian risk prediction models calculate the probability of a proband being a mutation carrier based on family history and known mutation prevalence and penetrance. Family history in this setting is self-reported and often contains error. Various studies in the literature have evaluated misreporting of family history. Using a validation data set that includes both error-prone self-reported family history and error-free validated family history, we propose a method to adjust for misreporting of family history. We estimate the measurement error process in a validation data set (from the University of California at Irvine (UCI)) using nonparametric smoothed Kaplan-Meier estimators, and use Monte Carlo integration to implement the adjustment. In this paper, we extend BRCAPRO, a Mendelian risk prediction model for breast and ovarian cancers, to adjust for misreporting in family history. We apply the extended model to data from the Cancer Genetics Network (CGN).
This manuscript discusses the application of chemometrics to the handling of HPLC response data, using a model mixture containing ascorbic acid, paracetamol and guaiphenesin. Derivative treatment of chromatographic response data followed by convolution of the resulting derivative curves using 8-point sin x_i polynomials (discrete Fourier functions) was found beneficial in eliminating different types of interferences. This was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely very low analyte concentrations, overlapping chromatographic peaks and baseline drift. For example, in the case of baseline drift, the correlation coefficient for guaiphenesin improved from 0.9978 with the conventional peak-area method to 0.9998 with the first-derivative-under-Fourier-functions method. The manuscript also compares the application of Theil's method, a non-parametric regression method, in handling the response data with the least squares parametric regression method, which is considered the de facto standard for regression. Theil's method was found to be superior to the method of least squares, as it assumes that errors could occur in both the x- and y-directions and that they might not be normally distributed. In addition, it could effectively circumvent outlying data points.
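Theil's method is available in SciPy as theilslopes (the median of all pairwise slopes). The hypothetical calibration example below, with invented data, contrasts its robustness to a single outlier with ordinary least squares.

```python
import numpy as np
from scipy.stats import linregress, theilslopes

rng = np.random.default_rng(6)

# calibration-style data: response linear in concentration, one gross outlier
x = np.linspace(1, 20, 20)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(20)
y[-1] += 30.0                        # a single outlying response

theil_slope = theilslopes(y, x)[0]   # median of all pairwise slopes
ols = linregress(x, y)
print(theil_slope, ols.slope)        # Theil stays near 2; OLS is pulled away
```

Because the outlier corrupts only 19 of the 190 pairwise slopes, the median is essentially untouched, while the least-squares slope shifts by roughly 0.4.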
We adopt the Total Time on Test procedure to investigate monotone time trends in the intensity in a repeated event setting. The correct model is assumed to be a proportional hazards model, with a random effect to account for dependence within subjects. The method offers a simple routine for testing relevant hypotheses for recurrent event processes, without making distributional assumptions about the frailty. Such assumptions may severely affect conclusions concerning regression coefficients and cause bias in the estimated heterogeneity. The method is illustrated by re-analyzing Danish registry data and a long-term Swiss clinical study on recurrence in affective disorder.
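A simple trend test in the same spirit, the Laplace test for a monotone trend in a single counting process, can be sketched as follows. This is a standard textbook illustration, not the frailty-adjusted TTT procedure of the paper.

```python
import math
import numpy as np

rng = np.random.default_rng(7)

def laplace_trend_test(times, T):
    """Laplace test for monotone trend in a counting process on [0, T]:
    under a constant intensity, sum(t_i)/T is approximately N(n/2, n/12)."""
    n = len(times)
    z = (np.sum(times) / T - n / 2) / math.sqrt(n / 12)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

T = 100.0
const = np.sort(rng.uniform(0, T, 200))              # constant-rate events
incr = np.sort(T * np.sqrt(rng.uniform(0, 1, 200)))  # intensity growing in t
z_c, p_c = laplace_trend_test(const, T)
z_i, p_i = laplace_trend_test(incr, T)
print(p_c, p_i)  # p_i is essentially zero: strong evidence of increasing trend
```

A positive z indicates events piling up late in the observation window, i.e. an increasing intensity; the frailty model in the paper additionally handles dependence within subjects, which this sketch ignores.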
Here F is a semi-metric space. We study a kernel estimator of the conditional mode of the univariate response variable Y_i given the functional variable X_i. The main aim of this paper is to prove the almost complete convergence (with rate) of this estimator.
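For a scalar rather than functional covariate, the conditional mode estimator can be sketched as maximizing a kernel estimate of the conditional density over a grid of y values; the Gaussian kernels, bandwidths, and toy data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

def conditional_mode(x0, X, Y, hx=0.2, hy=0.2, gridsize=200):
    """Kernel estimate of the mode of Y given X = x0: maximize a kernel
    estimate of the conditional density over a grid of y values."""
    grid = np.linspace(Y.min(), Y.max(), gridsize)
    wx = np.exp(-0.5 * ((X - x0) / hx) ** 2)       # kernel weights in x
    dens = [float((wx * np.exp(-0.5 * ((Y - g) / hy) ** 2)).sum())
            for g in grid]
    return float(grid[int(np.argmax(dens))])

X = rng.uniform(-1, 1, 2000)
Y = np.sin(np.pi * X) + 0.2 * rng.standard_normal(2000)
m = conditional_mode(0.5, X, Y)
print(m)  # near sin(pi/2) = 1, the conditional mode at x0 = 0.5
```

In the functional-data setting of the paper, the Euclidean distance |X - x0| inside the x-kernel is replaced by the semi-metric on F; the rest of the construction is unchanged.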