Veronica Gonzalez-Lopez - Academia.edu
Papers by Veronica Gonzalez-Lopez
ABSTRACT This paper investigates whether the values of acoustic correlates of pretonic syllables adjacent to the one(s) perceived as bearing secondary stress can predict that perception in Brazilian Portuguese (BP) data. To this end, pretonic syllables perceived as bearing secondary stress are compared with those perceived as not bearing it. The statistical analyses indicate that variation in intensity and in F0 in syllables perceived as bearing secondary stress, as well as in adjacent syllables, can be taken as a robust correlate of the perception of secondary stress placement in BP. Variation in intensity and in F0 in the syllables perceived as bearing secondary stress and in the other adjacent pretonic syllables seems to provide complementary information for the perception of secondary stress by BP speakers. The results raise relevant questions for further work on the rhythmic and intonational organization of Brazilian Portuguese.
AIP Conference Proceedings
ABSTRACT In this paper a new Bayesian approach is applied to model the dependence between two variables of interest in public policy: "Gonorrhea Rates per 100,000 Population" and "400% Federal Poverty Level and over", with a small number of paired observations (one pair for each U.S. state). We use a mixture of Gumbel-Barnett copulas, which is suitable for representing weak and negative dependence, the case treated here. The methodology also allows predicting the dependence between the variables from one year to the next, showing whether the dependence has changed.
In this work we introduce a new and richer class of finite order Markov chain models and address the following model selection problem: find the Markov model with the minimal set of parameters (minimal Markov model) needed to represent a source as a Markov chain of finite order. Let M be the order of the chain and A the finite alphabet. To determine the minimal Markov model, we define an equivalence relation on the state space A^M such that all sequences of size M with the same transition probabilities are put in the same category. In this way we have one set of (|A| − 1) transition probabilities for each category, obtaining a model with a minimal number of parameters. We show that the model can be selected consistently using the Bayesian information criterion. * This work is partially supported by PRONEX/FAPESP Project Stochastic behavior, critical phenomena and rhythmic pattern identification in natural languages (grant number 03/09930-9) and by CNPq Edital Universal (2007), project: "Padrões rítmicos, domínios prosódicos e modelagem probabilística em corpora do português".
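The equivalence relation described in the abstract above can be illustrated with a minimal sketch. It assumes the transition probabilities are already known (the values below are hypothetical), and it only groups states and counts parameters; the data-driven selection by the Bayesian information criterion that the paper studies is not implemented here. The function names `minimal_partition` and `parameter_count` are illustrative, not taken from the paper.

```python
from itertools import product


def minimal_partition(transitions, ndigits=10):
    """Group states whose transition probability vectors coincide.

    transitions maps each state (a string of length M) to a tuple of
    probabilities, one per symbol of the alphabet. Probabilities are
    rounded to ndigits before comparison to absorb floating-point noise.
    """
    classes = {}
    for state, probs in transitions.items():
        key = tuple(round(p, ndigits) for p in probs)
        classes.setdefault(key, []).append(state)
    return list(classes.values())


def parameter_count(partition, alphabet_size):
    """One set of (|A| - 1) free probabilities per equivalence class."""
    return len(partition) * (alphabet_size - 1)


alphabet = "ab"
order = 2
# Hypothetical transition probabilities P(next symbol | state), ordered (a, b).
transitions = {
    "aa": (0.9, 0.1),
    "ab": (0.3, 0.7),
    "ba": (0.9, 0.1),
    "bb": (0.3, 0.7),
}
# Sanity check: every state of A^M is present.
assert set(transitions) == {"".join(s) for s in product(alphabet, repeat=order)}

partition = minimal_partition(transitions)
full_model = len(transitions) * (len(alphabet) - 1)
print(partition, parameter_count(partition, len(alphabet)), full_model)
```

In this toy example the four states of A^M collapse into two categories, so the grouped model needs two free parameters where the full order-2 chain needs four.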
ABSTRACT The Asymmetric Cubic Sections copula family is used to describe the dependence between the language and mathematics scores of an admission test. The parameters of the copula are estimated annually by means of a non-informative Bayesian method. The application focuses on measuring the impact of conditioning the language result on a minimum mathematics grade and vice versa; this paper shows some initial results along that line. In addition, the analytical expression for each conditional expectation is given, together with a description of its monotonic behavior.
Mathematical Methods in the Applied Sciences, 2014
ABSTRACT A family of conjugated distributions for a given type of copulas is defined in this paper. Those copulas can be written as a mixture of d-dimensional parameter exponential functions. The generalized Farlie–Gumbel–Morgenstern copula is an example of this representation. This family is used to illustrate the estimation technique with real data. Also, the applicability of the Bayesian predictive approach is shown in an education policy issue by defining goals for the number of students per class that lead to improved performance at school. Copyright © 2014 John Wiley & Sons, Ltd.
Anais do CELSUL 2008, GT - Abordagens acústicas em estudos segmentais e supra-segmentais
... sought, in acoustic correlates such as duration, fundamental frequency, intensity and formant configuration, evidence for the existence of secondary stress in Brazilian Portuguese (henceforth, BP). These studies, which focus specifically on the phonetic implementation of the syllable perceived as bearing secondary stress and rely on experimental data consisting of isolated sentences, claim that there is no statistically robust correlate for the existence of this stress as perceived by BP speakers. Given that secondary stress is a suprasegmental phenomenon, our hypothesis is that the acoustic correlates associated with it manifest predominantly in other syllable(s) adjacent to the syllable bearing secondary stress and preceding the syllable bearing primary stress (the tonic syllable) within the prosodic word. The preliminary results of this work, from the analysis of data produced by BP speakers reading the same text, provide evidence that the variation in intensity in the syllable perceived as bearing secondary stress, as well as in the other syllables adjacent to it, is a robust correlate of the presence of secondary stress as auditorily perceived by speakers of this variety of Portuguese. There is a correlation between the intensity values of the syllable perceived as bearing secondary stress and those of the other syllables around it that precede the tonic syllable, which allows us to postulate that the mean intensity of those syllable(s) is a natural predictor of the intensity of the syllable perceived as bearing secondary stress.
The methodology of this work consists of: (i) analysis of the acoustic signal, in terms of the intensity of the syllable nuclei preceding the tonic syllable of prosodic words in a BP corpus in which occurrences of secondary stress were perceptually identified by native speakers of this variety; and (ii) application of statistical analyses to these data, fitting a fixed effects regression model whose aims are: (a) to remove the influence of the speaker; and (b) to estimate the mean intensity associated with both processes: the intensity observed in the syllable perceived as bearing secondary stress and the mean intensity observed in the other pretonic syllable(s) around it. The acoustic signal analysis and the statistical analyses were carried out, respectively, with the speech analysis program Praat (http://www.fon.hum.uva.nl/praat/) and the statistical program R-project, available at: http://www.rproject.org/.
Journal of Multivariate Analysis, 2014
ABSTRACT We propose a new class of nonparametric tests for the supposition of independence between two continuous random variables X and Y. Given a sample of size n, let π be the permutation which maps the ranks of the X observations to the ranks of the Y observations. We identify the independence assumption of the null hypothesis with the uniform distribution on the permutation space. A test based on the size of the longest increasing subsequence of π (L_n) is defined. The exact distribution of L_n is computed from Schensted’s theorem [C. Schensted. Longest increasing and decreasing subsequences. Canad. J. Math. 13 (1961) 179–191]. The asymptotic distribution of L_n was obtained by Baik et al. [J. Baik, P. Deift, K. Johansson. On the Distribution of the Length of the Longest Increasing Subsequence of Random Permutations. J. Amer. Math. Soc. 12 (1999) 1119–1178]. As the statistic L_n is discrete, there is a small set of possible significance levels. To solve this problem we define the JL_n statistic, which is a jackknife version of L_n, as well as the corresponding hypothesis test. A third test is defined based on the JLM_n statistic, which is a jackknife version of the longest monotonic subsequence of π. In a simulation study we apply our tests to diverse dependence situations with null or very small correlations, where the independence hypothesis is difficult to reject. We show that the L_n, JL_n and JLM_n tests perform very well in such situations. We illustrate the use of these tests on two real data examples with small sample sizes.
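As a rough illustration of the longest-increasing-subsequence statistic described in the abstract above, the sketch below builds the rank permutation from a paired sample and computes the length of its longest increasing subsequence with the standard patience-sorting technique. The function names and the sample values are hypothetical, ties are assumed absent (as for continuous variables), and neither the null distribution nor the jackknife variants from the paper are reproduced.

```python
from bisect import bisect_left


def rank_permutation(x, y):
    """Permutation mapping the ranks of x to the ranks of y (0-based).

    Entry k is the rank of the y value paired with the k-th smallest x.
    Assumes no ties in x or in y.
    """
    order_x = sorted(range(len(x)), key=lambda i: x[i])
    order_y = sorted(range(len(y)), key=lambda i: y[i])
    rank_y = {index: rank for rank, index in enumerate(order_y)}
    return [rank_y[i] for i in order_x]


def lis_length(perm):
    """Length of the longest strictly increasing subsequence.

    Patience sorting: tails[k] holds the smallest possible last value of
    an increasing subsequence of length k + 1 seen so far.
    """
    tails = []
    for value in perm:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


# Hypothetical paired sample of size n = 5.
x = [0.1, 0.5, 0.3, 0.9, 0.7]
y = [2.0, 1.0, 3.0, 5.0, 4.0]
perm = rank_permutation(x, y)
print(perm, lis_length(perm))
```

The patience-sorting loop runs in O(n log n) time, which keeps the statistic cheap to recompute when it is evaluated repeatedly, for instance on leave-one-out subsamples.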
Journal of Historical Linguistics, 2012
The prosodic change that has been reported to have occurred from Classical to Modern Portuguese is investigated by means of a new approach to the study of rhythm in language change. Assuming that rhythm is a by-product of the presence/absence of a set of properties in a given linguistic system, we computed frequency information on rhythm-related properties from written texts of the 16th to the 19th centuries, by means of the electronic tool FreP. Results show a change in the distributions of properties related to word stress and prosodic word shape after the 16th century, indicating that the prosodic change occurred between the 16th and 17th centuries. A predictive analysis based on Bayesian statistics provided strong support for the timing of the change, and successfully modelled our data showing a time line consistent with the direction of the prosodic shift towards the integration of stress-timing properties into Romance syllable-timed rhythm.
Communications in Statistics - Theory and Methods, 2013
ABSTRACT In this paper a generalized Frank copula is selected to model the dependence between the energy in two frequency bands of the speech signal, for eight languages. An algorithm is developed that uses maximum likelihood to choose the best-fitting copula parameters. Through the bootstrap, it estimates the variability of the parameters for each language and computes confidence regions by means of Voronoi tessellations. A linguistic conjecture, which claims that the languages are organized in three rhythmic classes, is confirmed by the Voronoi regions. Modeling with a mono-parametric Frank copula, the different degrees of dependence between the energies are quantified.
Communications in Statistics - Theory and Methods, 2012
ABSTRACT The family of asymmetric logistic copulas appears naturally in modeling tail dependence. Within this family, some well-known models, such as independence and logistic dependence, define precise hypotheses, which have zero posterior probability under an absolutely continuous posterior distribution. We show that the e-value associated with the Full Bayesian Significance Test performs well in non-standard dependence problems, and we obtain posterior estimates and predictive distributions. The proposed analysis is illustrated with two examples: (1) monthly sea level maxima at Newlyn and Sheerness, England (1990–2005) and (2) AIDS rates related to an educational indicator in U.S. Census Bureau (2007). We validate the inferences obtained through simulated data.
ABSTRACT To decide if it is worth taking multiple measurements to reduce the length of confidence intervals for the mean, we must have information on whether the intraclass correlation coefficient is smaller than a certain constant. Under a Normal model, exact tests are available for the corresponding hypothesis, but in many practical settings they must rely on small pilot studies and may therefore have little power. We propose a Bayesian test that allows the incorporation of previous information on the variance components used in the definition of the intraclass correlation coefficient and may be generalized to situations where normality does not hold. We develop computational algorithms to implement the proposed method and present an example based on data from the food industry.
A common practice in scientific experimentation in areas such as Medicine, Pharmacy, Nutrition, among others, is to measure each sample unit three times (in triplicate) or, more generally, m times (in m-plicate) and take the average of such measurements as the response variable. This is generally done to improve the precision of model parameter estimates. When the objective is to estimate the population mean, we use a random effects model to show that the efficiency of working with m-plicates is related to the magnitude of the intraclass correlation coefficient, which essentially measures the contribution of the variance between sample units to the total variance. We show that above certain values of this parameter, the use of m-plicates does not bring significant improvement (say, of 10% or more) to the precision of the estimates. Additionally, taking the costs of sampling units and making measurements into account, we compare sampling schemes with and without m-plicates designed to obtain fixed width confidence intervals for the mean. We illustrate the results through a practical example.
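The link between m-plicates and the intraclass correlation coefficient in the abstract above can be sketched numerically. The code assumes a balanced one-way random effects model y_ij = mu + a_i + e_ij with a fixed number n of sample units, for which the variance of the overall mean is (sigma_a^2 + sigma_e^2 / m) / n. Measuring precision by this variance is an assumption made here; the abstract does not state the paper's exact efficiency measure, and the function names are illustrative.

```python
def variance_ratio(icc, m):
    """Variance of the mean with m replicates per unit, relative to m = 1.

    With icc = sigma_a^2 / (sigma_a^2 + sigma_e^2), the ratio
    (sigma_a^2 + sigma_e^2 / m) / (sigma_a^2 + sigma_e^2)
    simplifies to icc + (1 - icc) / m; the number of units n cancels.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    return icc + (1.0 - icc) / m


def variance_reduction(icc, m):
    """Fraction of the variance removed by replicating each unit m times."""
    return 1.0 - variance_ratio(icc, m)


# Hypothetical intraclass correlations, each evaluated for triplicates.
for icc in (0.2, 0.5, 0.9):
    print(icc, round(variance_reduction(icc, 3), 3))
```

Under these assumptions the reduction equals (1 - icc)(1 - 1/m), so it can never exceed 1 - icc: when the between-unit variance dominates, extra measurements on the same unit add little information about the population mean.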
We propose a new nonparametric test for the supposition of independence between two continuous random variables X and Y. Given a sample of (X, Y), the test is based on the size of the longest increasing subsequence of the permutation which maps the ranks of the X observations to the ranks of the Y observations. We identify the independence assumption between the two continuous variables with the space of permutations equipped with the uniform distribution and we show the exact distribution of the statistic. We calculate the distribution for several sample sizes. Through a simulation study we estimate the power of our test for diverse alternative hypotheses against the null hypothesis of independence. * This work is partially supported by PRONEX/FAPESP Project Stochastic behavior, critical phenomena and rhythmic pattern identification in natural languages (grant number 03/09930-9) and by CNPq Edital Universal (2007), project: "Padrões rítmicos, domínios prosódicos e modelagem probabilística em corpora do português".
ABSTRACT We address the problem of robust model selection for finite memory stochastic processes. Consider m independent samples, with most of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. We define the asymptotic breakdown point γ for a model selection procedure and we also devise a model selection procedure. We show that γ equals 0.5 when all the processes are Markovian. This result is valid for any family of finite order Markov models, but for simplicity we focus on the family of variable length Markov chains.
ABSTRACT In this paper we address the problem of the statistical classification of languages according to their rhythmic features, using speech samples. This is an important open problem in phonology. A persistent difficulty is that the speech samples correspond to several sentences produced by diverse speakers, that is, to a mixture of distributions. The usual procedure to deal with this problem has been to choose a subset of the complete sample which seems to best represent each language. The selection is made by listening to the samples. In contrast, our approach uses the full dataset without any prior selection of the samples. In this paper, the classification is obtained through a robust model selection methodology. We estimate a model that represents the main law for each language; the laws are then compared using the relative entropy, and clusters of languages are obtained. Our findings agree with the linguistic conjecture related to the rhythm of the languages analyzed in the dataset. The robust model selection methodology considers m independent samples, with more than half of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. Under those conditions, and for a large enough sample size, our procedure selects the process with law Q. Our model selection strategy is based on estimating relative entropies to select a subset of samples that are realizations of the same law. Although the procedure is valid for any family of finite order Markov models, we focus on the family of variable length Markov chain models, which includes the fixed order Markov chain model family.
ABSTRACT We introduce a new index to detect dependence in trivariate distributions. The index is based on the maximization of the coefficients of directional dependence over the set of directions. We show how to calculate the index using the three pairwise Spearman’s rho coefficients and the three common 3-dimensional versions of Spearman’s rho. We obtain the asymptotic distributions of the empirical processes related to the estimators of the coefficients of directional dependence and we also derive the asymptotic distribution of our index. We display examples where the index identifies dependence undetected by the aforementioned 3-dimensional versions of Spearman’s rho. The value of the new index and the direction in which the maximal dependence occurs are easily computed, as we illustrate with a simulation study and a real data set.
Theory and …, 2000
It is always possible to construct a real function φ, given random quantities X and Y with continuous distribution functions F and G, respectively, in such a way that φ(X) and φ(Y), also random quantities, both have the same distribution function, say H. This result of De Finetti introduces an alternative way to somehow describe the 'opinion' of a group of experts about a continuous random quantity by the construction of Fields of coincidence of opinions (FCO). A Field of coincidence of opinions is a finite union of intervals where the opinions of the experts coincide with respect to that quantity of interest. We speculate on (dis)advantages of Fields of Opinion compared to usual 'probability' measures of a group and on their relation with a continuous version of the well-known Allais' paradox.
ABSTRACT This paper investigates whether values of acoustical correlates of pretonic syllables ad... more ABSTRACT This paper investigates whether values of acoustical correlates of pretonic syllables adjacent to the one(s) perceived as bearing secondary stress could predict such perception in Brazilian Portuguese (BP) data. In order to pursue this goal, a comparison is made between pretonic syllables perceived as bearing secondary stress and those perceived as not bearing it. According to the results, obtained by application of statistical analyses, it is possible to claim that variation in intensity and in F0 in syllables perceived as bearing secondary stress, as well as in adjacent syllables, can be taken as a robust correlate for data perception regarding secondary stress placement in BP. Variation in intensity and in F0 in syllables perceived as bearing secondary stress and variation in intensity and in F0 in the other adjacent pretonic syllables seem to be complementary information for the perception of secondary stresses by BP speakers. The results point to relevant questions for further work concerning the rhythmic and intonational organization of Brazilian Portuguese.
AIP Conference Proceedings
ABSTRACT In this paper it was applied a new Bayesian approach to model the dependence between two... more ABSTRACT In this paper it was applied a new Bayesian approach to model the dependence between two variables of interest in public policy: "Gonorrhea Rates per 100,000 Population" and "400% Federal Poverty Level and over" with a small number of paired observations (one pair for each U.S. state). We use a mixture of Gumbel-Barnett copulas suitable to represent situations with weak and negative dependence, which is the case treated here. The methodology allows even making a prediction of the dependence between the variables from one year to another, showing whether there was any alteration in the dependence.
In this work we introduce a new and richer class of finite order Markov chain models and address ... more In this work we introduce a new and richer class of finite order Markov chain models and address the following model selection problem: find the Markov model with the minimal set of parameters (minimal Markov model) which is necessary to represent a source as a Markov chain of finite order. Let us call M the order of the chain and A the finite alphabet, to determine the minimal Markov model, we define an equivalence relation on the state space A M , such that all the sequences of size M with the same transition probabilities are put in the same category. In this way we have one set of (|A| − 1) transition probabilities for each category, obtaining a model with a minimal number of parameters. We show that the model can be selected consistently using the Bayesian information criterion. * This work is partially supported by PRONEX/FAPESP Project Stochastic behavior, critical phenomena and rhythmic pattern identification in natural languages (grant number 03/09930-9) and by CNPq Edital Universal (2007), project: "Padrões rítmicos, domínios prosódicos e modelagem probabilística em corpora do português".
ABSTRACT The Asymmetric Cubic Sections copula family is used to describe the dependence between l... more ABSTRACT The Asymmetric Cubic Sections copula family is used to describe the dependence between language and mathematics scores of admission test. The parameters of the copula are annually estimated by means of a non-informative Bayesian method. The application focus on measuring the impact of conditioning language result to a mathematics minimum grade and vice-versa, this paper shows some initial results in that line. In addition, the analytical expression for each conditional expectation is shown together with a description of its monotonic behavior.
Mathematical Methods in the Applied Sciences, 2014
ABSTRACT A family of conjugated distributions for a given type of copulas is defined in this pape... more ABSTRACT A family of conjugated distributions for a given type of copulas is defined in this paper. Those copulas can be written as a mixture of d-dimensional parameter exponential functions. The generalized Farlie–Gumbel–Morgenstern copula is an example of this representation. This family is used to illustrate the estimation technique with real data. Also, the applicability of Bayesian predictive approach is shown in an education policy issue by defining goals for the number of students per class that leads to improve their performance at school. Copyright © 2014 John Wiley & Sons, Ltd.
buscado, em correlatos acústicos como duração, freqüência fundamental, intensidade e configuração... more buscado, em correlatos acústicos como duração, freqüência fundamental, intensidade e configuração formântica, evidências para a existência do acento secundário em português brasileiro (doravante, PB). Tais trabalhos, com foco na implementação fonética da sílaba percebida como portadora de acento secundário, especificamente, e com base em dados experimentais constituídos por frases isoladas, afirmam não haver correlato estatisticamente robusto para a existência deste acento percebido pelos falantes de PB. Dado o fato de o acento secundário se constituir em um fenômeno de natureza suprassegmental, nossa hipótese é a de que os correlatos acústicos a ele associados se manifestam predominantemente em outra(s) sílaba(s) adjacentes à sílaba portadora de acento secundário e precedentes à sílaba portadora de acento primário (sílaba tônica) no âmbito da palavra prosódica. Os resultados preliminares deste trabalho, provenientes da análise de dados produzidos a partir de leitura de um mesmo texto por falantes de PB, trazem evidências de que a variação de intensidade manifestada na sílaba percebida como portadora de acento secundário, bem como nas outras sílabas a ela adjacentes, configura-se como um correlato robusto para a presença do acento secundário percebido auditivamente pelos falantes desta variedade de português. Há uma correlação entre os valores de intensidade da sílaba percebida como portadora de acento secundário e as outras no seu entorno e precedentes à tônica, o que permite postular que o valor médio da intensidade daquelas(s) sílaba(s) resulta em um preditor natural da intensidade da sílaba percebida como portadora de acento secundário. 
A metodologia deste trabalho consiste: (i) na análise do sinal acústico, em termos de intensidade dos núcleos silábicos precedentes à sílaba tônica de palavras prosódicas constantes de um corpus de PB nas quais foram identificadas perceptualmente ocorrências de acentos secundários por falantes nativos desta variedade; e (ii) na aplicação de análises estatísticas a estes dados, ajustando-se um modelo de regressão de efeitos fixos, cujos objetivos são: (i) extrair a influência do locutor; e (ii) estimar a intensidade média associada a ambos os processos: Anais do CELSUL 2008 GT -Abordagens acústicas em estudos segmentais e supra-segmentais 2 intensidade observada na sílaba percebida como portadora de acento secundário e intensidade média observada na(s) outra(s) sílaba(s) pretônica(s) no seu entorno. A análise do sinal acústico e as análises estatísticas dos dados foram realizadas, respectivamente, por meio do programa de análise de fala Praat (http://www.fon.hum.uva.nl/praat/) e do programa estatístico R-project, disponível na página: http://www.rproject.org/.
Journal of Multivariate Analysis, 2014
ABSTRACT We propose a new class of nonparametric tests for the supposition of independence betwee... more ABSTRACT We propose a new class of nonparametric tests for the supposition of independence between two continuous random variables XX and YY. Given a size nn sample, let ππ be the permutation which maps the ranks of the XX observations on the rank of the YY observations. We identify the independence assumption of the null hypothesis with the uniform distribution on the permutation space. A test based on the size of the longest increasing subsequence of ππ (LnLn) is defined. The exact distribution of LnLn is computed from Schensted’s theorem [C. Schensted. Longest increasing and decreasing sub-sequeces. Canad. J. Math. 13 (1961) 179–191]. The asymptotic distribution of LnLn was obtained by Baik et al. [J. Baik, P. Deift, K.Johansson. On The Distribution of the Length of the Longest Increasing Subsequence of Random Permutations. J. Amer. Math. Soc. 12 (1999) 1119–1178]. As the statistic LnLn is discrete, there is a small set of possible significance levels. To solve this problem we define the JLnJLn statistic which is a jackknife version of LnLn, as well as the corresponding hypothesis test. A third test is defined based on the JLMnJLMn statistic which is a jackknife version of the longest monotonic subsequence of ππ. On a simulation study we apply our tests to diverse dependence situations with null or very small correlations where the independence hypothesis is difficult to reject. We show that LnLn, JLnJLn and JLMnJLMn tests have very good performance on that kind of situations. We illustrate the use of those tests on two real data examples with small sample size.
Journal of Historical Linguistics, 2012
The prosodic change that has been reported to have occurred from Classical to Modern Portuguese i... more The prosodic change that has been reported to have occurred from Classical to Modern Portuguese is investigated by means of a new approach to the study of rhythm in language change. Assuming that rhythm is a by-product of the presence/absence of a set of properties in a given linguistic system, we computed frequency information on rhythm-related properties from written texts of the 16 th to the 19 th centuries, by means of the electronic tool FreP. Results show a change in the distributions of properties related to word stress and prosodic word shape after the 16 th century, indicating that the prosodic change occurred between the 16 th and 17 th centuries. A predictive analysis based on Bayesian statistics provided strong support for the timing of the change, and successfully modelled our data showing a time line consistent with the direction of the prosodic shift towards the integration of stress-timing properties into Romance syllable-timed rhythm. Rhythm from Classical to Modern Portuguese 3
Communications in Statistics - Theory and Methods, 2013
ABSTRACT In this paper was selected a generalized Frank copula to model the dependence between th... more ABSTRACT In this paper was selected a generalized Frank copula to model the dependence between the energy on two frequency bands of the speech signal, coming from eight languages. Was developed an algorithm that uses maximum likelihood to choose the best fitting copula's parameters. Through Bootstrap, estimates the variability of the parameters for each language and computes confidence regions by means of Voronoi tesselations. A linguistic conjecture which claims that the languages are organized in three rhythmic classes, was confirmed by the Voronoi regions. Modeling with a mono-parametric Frank copula, the different degrees of dependence between the energies was quantified.
Communications in Statistics - Theory and Methods, 2012
ABSTRACT The family of the asymmetric logistic copulas appears naturally in modeling tail depende... more ABSTRACT The family of the asymmetric logistic copulas appears naturally in modeling tail dependence. Within this family, some well-known models, as independence and logistic dependence, define precise hypotheses, having zero posterior probability for an absolute continuous posterior distribution. We show that the e-value associated to the Full Bayesian Significance Test has a good performance in non standard dependence problems, obtaining posterior estimates and predictive distributions. The analysis proposed is illustrated with two examples: (1) monthly sea level maxima at Newlyn and Sheerness, England (1990–2005) and (2) AIDS rates related to an educational indicator in U.S. Census Bureau (2007). We validate the inferences obtained through simulated data.
ABSTRACT To decide if it is worth taking multiple measurements to reduce the length of confidence... more ABSTRACT To decide if it is worth taking multiple measurements to reduce the length of confidence intervals for the mean, we must have information on whether the intraclass correlation coefficient is smaller than a certain constant. Under Normal model exact tests are available for the corresponding hypothesis, but in many practical settings they must rely on small pilot studies and therefore may have little power. We propose a Bayesian test that allows the incorporation of previous information on the variance components used in the definition of the intraclass correlation coefficients and may be generalized to situations where normality does not hold. We develop computational algorithms to implement the proposed method and present an example based on data from the food industry.
A common practice in scientific experimentation in areas such as Medicine, Pharmacy, and Nutrition, among others, is to measure each sample unit three times (in triplicate) or, more generally, m times (in m-plicate) and take the average of such measurements as the response variable. This is generally done to improve the precision of model parameter estimates. When the objective is to estimate the population mean, we use a random effects model to show that the efficiency of working with m-plicates is related to the magnitude of the intraclass correlation coefficient, which essentially measures the contribution of the variance between sample units to the total variance. We show that above certain values of this parameter, the use of m-plicates does not bring significant improvement (say, of 10% or more) to the precision of the estimates. Additionally, taking the costs of sampling units and making measurements into account, we compare sampling schemes with and without m-plicates designed to obtain fixed width confidence intervals for the mean. We illustrate the results through a practical example.
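As a rough illustration of why the intraclass correlation coefficient governs the gain from m-plicates, consider a standard one-way random effects model with independent unit effects and measurement errors. This model is an assumption made here, since the abstract does not spell out its exact form, and the symbols n, μ, a_i, e_ij, σ_a², σ_e², and ρ are notation introduced only for this sketch.

```latex
y_{ij} = \mu + a_i + e_{ij}, \quad i = 1,\dots,n,\; j = 1,\dots,m, \qquad
\operatorname{Var}(a_i) = \sigma_a^2,\quad \operatorname{Var}(e_{ij}) = \sigma_e^2,
\qquad \rho = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}.

\operatorname{Var}(\bar{y}_{\cdot\cdot}) = \frac{1}{n}\left(\sigma_a^2 + \frac{\sigma_e^2}{m}\right)
\quad\Longrightarrow\quad
\frac{\operatorname{Var}(\bar{y}_{\cdot\cdot} \mid m \text{ measurements})}
     {\operatorname{Var}(\bar{y}_{\cdot\cdot} \mid 1 \text{ measurement})}
= \rho + \frac{1-\rho}{m}.
```

Under these assumptions the variance ratio approaches ρ as m grows, so when ρ is close to 1 additional measurements per unit change little. For example, with ρ = 0.9 and m = 3 the ratio is 0.9 + 0.1/3 ≈ 0.933, a variance reduction of under 7%. The precision measure used in the paper may differ from this variance ratio.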
We propose a new nonparametric test for the hypothesis of independence between two continuous random variables X and Y. Given a sample of (X, Y), the test is based on the size of the longest increasing subsequence of the permutation which maps the ranks of the X observations to the ranks of the Y observations. We identify the independence assumption between the two continuous variables with the space of permutations equipped with the uniform distribution and we show the exact distribution of the statistic. We calculate the distribution for several sample sizes. Through a simulation study we estimate the power of our test for diverse alternative hypotheses against the null hypothesis of independence. * This work is partially supported by PRONEX/FAPESP Project Stochastic behavior, critical phenomena and rhythmic pattern identification in natural languages (grant number 03/09930-9) and by CNPq Edital Universal (2007), project: "Padrões rítmicos, domínios prosódicos e modelagem probabilística em corpora do português".
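The statistic described above can be sketched in a few lines of Python. This is only an illustration of the longest-increasing-subsequence computation, assuming no ties (as holds almost surely for continuous variables). The function names are hypothetical, and the sketch does not reproduce the paper's exact distribution or rejection rule.

```python
from bisect import bisect_left


def longest_increasing_subsequence_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible tail of an increasing subsequence of length k + 1
    for value in seq:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence found so far
        else:
            tails[pos] = value    # value gives a smaller tail for length pos + 1
    return len(tails)


def lis_statistic(xs, ys):
    """LIS length of the permutation mapping ranks of xs to ranks of ys.

    Ordering the pairs by x and reading off the y values gives a sequence whose
    relative order matches that permutation, so ranks need not be computed explicitly.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_in_x_order = [ys[i] for i in order]
    return longest_increasing_subsequence_length(y_in_x_order)
```

For perfectly increasing dependence the statistic equals the sample size, and for perfectly decreasing dependence it equals 1. Under independence the permutation is uniformly distributed, which is the setting in which the abstract derives the exact distribution.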
ABSTRACT We address the problem of robust model selection for finite memory stochastic processes. Consider m independent samples, with most of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. We define the asymptotic breakdown point γ for a model selection procedure and also devise a model selection procedure. We compute the value of γ, which is 0.5 when all the processes are Markovian. This result is valid for any family of finite order Markov models, but for simplicity we focus on the family of variable length Markov chains.
ABSTRACT In this paper we address the problem of the statistical classification of languages according to their rhythmic features, using speech samples. This is an important open problem in phonology. A persistent difficulty is that the speech samples consist of several sentences produced by diverse speakers, corresponding to a mixture of distributions. The usual procedure to deal with this problem has been to choose a subset of the complete sample which seems to best represent each language. The selection is made by listening to the samples. In contrast, our approach uses the full dataset without any prior selection of the samples. In this paper, the classification is obtained through a robust model selection methodology. We estimate a model that represents the main law for each language; the laws are then compared using the relative entropy, and clusters of languages are obtained. Our findings agree with the linguistic conjecture related to the rhythm of the languages analyzed in the dataset. The robust model selection methodology considers m independent samples, with more than half of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. Under those conditions, and for a large enough sample size, our procedure selects the process with law Q. Our model selection strategy is based on estimating relative entropies to select a subset of samples that are realizations of the same law. Although the procedure is valid for any family of finite order Markov models, we focus on the family of variable length Markov chain models, which includes the fixed order Markov chain model family.
ABSTRACT We introduce a new index to detect dependence in trivariate distributions. The index is based on the maximization of the coefficients of directional dependence over the set of directions. We show how to calculate the index using the three pairwise Spearman’s rho coefficients and the three common 3-dimensional versions of Spearman’s rho. We obtain the asymptotic distributions of the empirical processes related to the estimators of the coefficients of directional dependence, and we also derive the asymptotic distribution of our index. We display examples where the index identifies dependence undetected by the aforementioned 3-dimensional versions of Spearman’s rho. The value of the new index and the direction in which the maximal dependence occurs are easily computed, and we illustrate them with a simulation study and a real data set.
Theory and …, 2000
It is always possible to construct a real function φ, given random quantities X and Y with continuous distribution functions F and G, respectively, in such a way that φ(X) and φ(Y), also random quantities, both have the same distribution function, say H. This result of De Finetti introduces an alternative way to somehow describe the 'opinion' of a group of experts about a continuous random quantity by the construction of Fields of coincidence of opinions (FCO). A Field of coincidence of opinions is a finite union of intervals where the opinions of the experts coincide with respect to that quantity of interest. We speculate on (dis)advantages of Fields of Opinion compared to usual 'probability' measures of a group and on their relation with a continuous version of the well-known Allais' paradox.