Dimitris Karlis | Athens University of Economics and Business
Papers by Dimitris Karlis
Handbook of Mixture Analysis, 2019
Nowadays, statistical methods are applied across a wide range of disciplines by many people who often have rather little knowledge of statistics. This can lead to misuse of statistical methods, mainly because such users are not able to evaluate the appropriateness and applicability of the selected methodology, even when the methods themselves are rather simple. In addition, these users can be intimidated by statistical output with which they are not familiar. The statistical community has recognized this, and statistical packages now try to offer some guidance to their users. This paper presents the design of a statistical package oriented towards users with medium or little knowledge of statistics. The package embeds a kind of ‘expertise’ in order to protect its users from statistical maltreatment. The results of a survey about the user requirements of such software are presented. The general architecture and the functionality of such a s...
Accident Analysis & Prevention, 2018
Studies analyzing the temporary repercussions of motor vehicle accidents are scarcer than those analyzing permanent injuries or mortality. A regression model to evaluate the risk factors affecting the duration of temporary disability after injury in such an accident is constructed using a motor insurance dataset. The length of non-hospitalization medical leave, measured in days, following a motor accident is used here as a measure of the severity of temporary disability. The probability function of the number of days of sick leave presents spikes at multiples of five (working week), seven (calendar week), thirty (month), etc. To account for this, a regression model based on finite mixtures of multiple discrete distributions is proposed to fit the data properly. The model provides a very good fit when the multiples for the working week, calendar week, fortnight and month are taken into account. Victim characteristics of gender and age, and accident characteristics of road user type, vehicle class and the severity of permanent injuries, were found to be significant in accounting for the duration of temporary disability.
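As a rough sketch of a probability function with spikes at multiples of 5, 7 and 30 days, the snippet below mixes a baseline negative binomial with components supported only on those multiples; the component forms, parameterisation and numbers are hypothetical illustrations, not the paper's fitted model.

```python
import numpy as np
from scipy import stats

def sick_leave_pmf(y, weights, nb_params, period_params):
    """Illustrative pmf for leave durations with spikes at multiples.

    weights       : (w0, w5, w7, w30), mixing proportions summing to 1
    nb_params     : (n, p) of a baseline negative binomial over all days
    period_params : dict mapping period m -> success prob of a geometric
                    distribution on the multiplier y / m (hypothetical
                    parameterisation, not the authors' exact one)
    """
    w0, *wk = weights
    pmf = w0 * stats.nbinom.pmf(y, *nb_params)
    for w, (m, q) in zip(wk, period_params.items()):
        on_grid = (y > 0) & (y % m == 0)
        # geometric on the number of whole periods, starting at 1
        pmf = pmf + w * on_grid * stats.geom.pmf(np.where(on_grid, y // m, 1), q)
    return pmf

days = np.arange(1, 61)
p = sick_leave_pmf(days, (0.55, 0.2, 0.15, 0.1), (2, 0.08), {5: 0.35, 7: 0.4, 30: 0.6})
print(p[[4, 6, 29]])   # probabilities at 5, 7 and 30 days show the spikes
```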
ESMO open, 2017
The European Society for Medical Oncology (ESMO) has developed the ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS), a tool to assess the magnitude of clinical benefit from new cancer therapies. Grading is guided by a dual rule comparing the relative benefit (RB) and the absolute benefit (AB) achieved by the therapy to prespecified threshold values. The ESMO-MCBS v1.0 dual rule evaluates the RB of an experimental treatment based on the lower limit of the 95% CI (LL95%CI) for the hazard ratio (HR) along with an AB threshold. The dual rule addresses two goals: inclusiveness, i.e., not unfairly penalising experimental treatments from trials designed with adequate power targeting a clinically meaningful relative benefit; and discernment, i.e., penalising trials designed to detect a small, inconsequential benefit. Based on 50 000 simulations of plausible trial scenarios, the sensitivity and specificity of the LL95%CI rule and the ESMO-MCBS dual rule, the robustness of their characteristics for reas...
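To make the form of such a dual rule concrete, here is a toy check written under assumed placeholder thresholds (an HR lower-limit cut-off and a survival-gain cut-off); the threshold values, parameter names and the use of a median-survival gain as the absolute benefit are illustrative assumptions, not the ESMO-MCBS specification.

```python
import math

def dual_rule(hr_hat, se_log_hr, ab_gain_months,
              hr_threshold=0.65, ab_threshold_months=3.0):
    """Toy dual rule: relative benefit judged from the lower limit of the
    95% CI for the hazard ratio, absolute benefit from the gain in median
    survival.  Thresholds are placeholders, not the ESMO-MCBS values."""
    ll95 = math.exp(math.log(hr_hat) - 1.96 * se_log_hr)
    relative_ok = ll95 <= hr_threshold          # CI consistent with a large enough RB
    absolute_ok = ab_gain_months >= ab_threshold_months
    return relative_ok and absolute_ok

print(dual_rule(hr_hat=0.70, se_log_hr=0.10, ab_gain_months=3.5))
```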
Cancer Research, 2012
BACKGROUND: Disease-free survival (DFS) is often a primary endpoint of randomized trials of adjuvant therapies for breast cancer, but long-term follow-up of DFS and especially of overall survival (OS) remains important. When the primary DFS results favor the experimental arm, patients (pts) assigned to the control group may select the option to cross over to receive the experimental treatment via protocol amendment. Such “selective crossover” disturbs the integrity of the randomized comparison for any efficacy endpoints that rely on further follow-up. Selective crossover, which is motivated by positive results having been observed in the current trial, is distinct from so-called “unplanned crossover,” which refers to non-adherence to protocol. In this abstract, we discuss the consequences of selective crossover for trials evaluating adjuvant trastuzumab, using the HERA (HERceptin Adjuvant) trial as an example, and present a variety of alternative analysis approaches. METHODS: HERA enroll...
Statistical Modelling, 2014
While models for integer-valued time series are now abundant, there is a shortage of similar models when the time series refer to data defined on Z, i.e., taking both positive and negative integer values. Such data occur in certain disciplines, and the need for such models also appears when taking differences of a positive integer count time series. In addition, one would often like to include covariates to explain variations in the variable of interest. In this article we construct a model that accomplishes all of this, assuming a specific innovation distribution, and provide fully parametric inference, including prediction. Real data applications on accidents and financial returns are given. Finally, we also discuss alternative models and extensions.
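One plausible way to build such a Z-valued autoregression, sketched below, is to thin the magnitude of the previous value while keeping its sign and to add an innovation on Z (here a Skellam variate, i.e., a difference of two Poissons); this construction and its parameters are illustrative assumptions rather than the exact model of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def skellam(lam1, lam2, size, rng):
    # a difference of two independent Poissons takes values in Z
    return rng.poisson(lam1, size) - rng.poisson(lam2, size)

def simulate_z_ar1(n, alpha, lam1, lam2, rng):
    """Sketch of an AR(1)-type recursion on Z: the carried-over part thins
    the magnitude of the previous value and keeps its sign, and a Skellam
    innovation is added.  Illustrative construction only."""
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        carried = int(np.sign(x[t - 1])) * rng.binomial(abs(x[t - 1]), alpha)
        x[t] = carried + skellam(lam1, lam2, 1, rng)[0]
    return x

series = simulate_z_ar1(200, alpha=0.4, lam1=1.2, lam2=1.0, rng=rng)
print(series[:20])
```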
Statistical Methodology, 2009
Goodness-of-fit tests for the family of symmetric normal inverse Gaussian distributions are constructed. The tests are based on a weighted integral incorporating the empirical characteristic function of suitably standardized data. An EM-type algorithm is employed for the estimation of the parameters involved in the test statistic. Monte Carlo results show that the new procedure is competitive with classical goodness-of-fit methods.
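The general shape of such an empirical-characteristic-function statistic can be sketched as follows; for brevity the target characteristic function below is standard normal, and the Gaussian weight and integration grid are arbitrary choices, whereas the paper uses the fitted symmetric NIG characteristic function.

```python
import numpy as np

def ecf_statistic(x, target_cf, a=1.0, grid=np.linspace(-10, 10, 2001)):
    """Weighted-integral statistic of the empirical-characteristic-function
    type: n * integral of |phi_n(t) - phi_0(t)|^2 exp(-a t^2) dt, evaluated
    numerically on a grid.  phi_0 is the (fitted) target characteristic
    function; weight and grid here are illustrative choices."""
    n = len(x)
    phi_n = np.exp(1j * np.outer(grid, x)).mean(axis=1)   # empirical CF
    diff2 = np.abs(phi_n - target_cf(grid)) ** 2
    weight = np.exp(-a * grid ** 2)
    return n * np.trapz(diff2 * weight, grid)

# example: standardized data compared with a standard normal CF
rng = np.random.default_rng(1)
z = rng.standard_normal(300)
z = (z - z.mean()) / z.std()
print(ecf_statistic(z, lambda t: np.exp(-0.5 * t ** 2)))
```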
Pure and Applied Geophysics, 2010
Poisson Hidden Markov models (PHMMs) are introduced to model temporal seismicity changes. In a PHMM the unobserved sequence of states is a finite-state Markov chain and the distribution of the observation at any time is Poisson with rate depending only on the current state of the chain. Thus, PHMMs allow a region to have a varying seismicity rate. We applied the PHMM to model earthquake frequencies in the seismogenic area of Killini, Ionian Sea, Greece, in the period 1990-2006. Simulations of data from the assumed model showed that it describes the observed data quite well. The earthquake catalogue is dominated by mainshocks occurring in 1993, 1997 and 2002. The time plot of PHMM seismicity states not only reproduces the three seismicity clusters but also quantifies the seismicity level and underlines the degree of strength of the serial dependence of the events at any point of time. Foreshock activity becomes quite evident before the three sequences, with a gradual transition to states of cascade seismicity. Traditional analysis, based on the determination of highly significant changes of seismicity rates, failed to recognize foreshocks before the 1997 mainshock due to the low number of events preceding that mainshock. The PHMM performs better than traditional analysis, since the transition from one state to another does not depend only on the total number of events involved but also on the current state of the system. Therefore, the PHMM recognizes significant changes of seismicity soon after they start, which is of particular importance for real-time recognition of foreshock activities and other seismicity changes.
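A minimal sketch of the Poisson HMM likelihood, computed with the scaled forward recursion, is given below; the two-state rates, transition matrix and counts are made-up illustrations, not the fitted Killini parameters.

```python
import numpy as np
from scipy import stats

def poisson_hmm_loglik(counts, delta, gamma, lam):
    """Log-likelihood of a Poisson hidden Markov model via the scaled
    forward recursion.  delta: initial state probabilities, gamma:
    transition matrix, lam: Poisson rate per state."""
    p = stats.poisson.pmf(np.asarray(counts)[:, None], lam)   # T x m emission probs
    alpha = delta * p[0]
    ll = 0.0
    for t in range(1, len(counts)):
        c = alpha.sum()
        ll += np.log(c)
        alpha = (alpha / c) @ gamma * p[t]
    return ll + np.log(alpha.sum())

counts = [0, 1, 0, 2, 5, 9, 4, 1, 0, 0, 3, 12, 7, 2, 1]   # illustrative counts
delta = np.array([0.8, 0.2])
gamma = np.array([[0.9, 0.1], [0.3, 0.7]])
lam = np.array([1.0, 7.0])
print(poisson_hmm_loglik(counts, delta, gamma, lam))
```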
Journal of Time Series Analysis, 2012
In several circumstances the collected data are counts observed at different time points, while the counts at each time point are correlated. Current models are able to account for serial correlation but usually fail to account for cross-correlation. Motivated by the lack of appropriate tools for handling this type of data, we define a multivariate integer-valued autoregressive process of order 1 (MINAR(1)) and examine its basic statistical properties. Apart from the general specification of the MINAR(1) process, we also study two specific parametric cases that arise under the assumptions of a multivariate Poisson and a multivariate negative binomial distribution for the innovations of the process. To overcome the computational difficulties of the maximum likelihood approach we suggest the method of composite likelihood. The performance of the two methods of estimation, that is, maximum likelihood and composite likelihood, is compared through a small simulation experiment. Extensions of the time-invariant model to a regression model are also discussed. The proposed model is applied to a trivariate data series related to daily traffic accidents in three areas in the Netherlands.
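A small simulation sketch of a bivariate INAR(1) with binomial thinning and common-shock bivariate Poisson innovations illustrates the kind of process involved; diagonal thinning and the chosen parameter values are simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def bivariate_poisson(lam1, lam2, lam0, rng):
    # common-shock construction: (Y1 + Z, Y2 + Z) with independent Poissons
    z = rng.poisson(lam0)
    return rng.poisson(lam1) + z, rng.poisson(lam2) + z

def simulate_binar1(n, a1, a2, lam1, lam2, lam0, rng):
    """Sketch of a bivariate INAR(1): each series carries over a binomially
    thinned copy of its own past and receives a jointly (bivariate Poisson)
    distributed innovation.  Diagonal thinning only; values illustrative."""
    x = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        eps1, eps2 = bivariate_poisson(lam1, lam2, lam0, rng)
        x[t, 0] = rng.binomial(x[t - 1, 0], a1) + eps1
        x[t, 1] = rng.binomial(x[t - 1, 1], a2) + eps2
    return x

x = simulate_binar1(500, a1=0.3, a2=0.5, lam1=1.0, lam2=0.8, lam0=0.5, rng=rng)
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])   # positive cross-correlation
```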
Journal of Statistical Computation and Simulation, 2013
Goodness-of-fit tests for the family of the four-parameter normal–variance gamma distribution are constructed. The tests are based on a weighted integral incorporating the empirical characteristic function of suitably standardized data. Non-standard algorithms are employed for the computation of the maximum-likelihood estimators of the parameters involved in the test statistic, while Monte Carlo results are used in order to compare the
Journal of Biopharmaceutical Statistics, 2014
The main goal of a Phase II clinical trial is to decide whether a particular therapeutic regimen is effective enough to warrant further study. The hypothesis tested by Fleming's Phase II design (Fleming, 1982) is H0: p ≤ p0 versus H1: p ≥ p1, with level α and with power 1 − β at p = p1, where p0 is chosen to represent the response probability achievable with standard treatment and p1 is chosen such that the difference p1 − p0 represents a targeted improvement with the new treatment. This hypothesis creates a misinterpretation, mainly among clinicians, that rejection of the null hypothesis is tantamount to accepting the alternative, and vice versa. As mentioned by Storer (1992), this introduces ambiguity in the evaluation of type I and II errors and in the choice of the appropriate decision at the end of the study. Instead of testing this hypothesis, an alternative class of designs is proposed in which two hypotheses are tested sequentially. The hypothesis H0: p ≤ p0 versus H1: p > p0 is tested first. If this null hypothesis is rejected, the hypothesis H0: p ≤ p1 versus H1: p > p1 is tested next, in order to examine whether the therapy is effective enough to consider further testing in a Phase III study. For the derivation of the proposed design the exact binomial distribution is used to calculate the decision cut-points. The optimal design parameters are chosen so as to minimize the average sample number (ASN) under specific upper bounds for the error levels. The optimal values for the design were found using a simulated annealing method.
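For intuition about exact-binomial cut-points, the sketch below finds the rejection threshold and power of a single-stage rule; it is only a simplified illustration, not the paper's sequential two-hypothesis design or its simulated-annealing optimisation of the average sample number.

```python
from scipy import stats

def single_stage_design(n, p0, p1, alpha=0.05):
    """Exact-binomial sketch of a single-stage decision rule: reject
    H0: p <= p0 when the number of responses X is at least r, with r the
    smallest cut-point keeping the type I error below alpha; the power is
    then evaluated at p1.  Values below are illustrative."""
    for r in range(n + 1):
        type1 = stats.binom.sf(r - 1, n, p0)   # P(X >= r | p0)
        if type1 <= alpha:
            power = stats.binom.sf(r - 1, n, p1)
            return r, type1, power
    return None

print(single_stage_design(n=40, p0=0.20, p1=0.40))
```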
Computational Statistics & Data Analysis, 2013
INteger-valued AutoRegressive (INAR) processes are common choices for modeling non-negative discrete-valued time series. In this framework, and motivated by the frequent occurrence of multivariate count time series data in several different disciplines, a generalized specification of the bivariate INAR(1) (BINAR(1)) model is considered. In this new, full BINAR(1) process, dependence between the two series stems from two sources simultaneously. The main focus is on the specific parametric case that arises under the assumption of a bivariate Poisson distribution for the innovations of the process. As shown, such an assumption gives rise to a Hermite BINAR(1) process. The method of conditional maximum likelihood is suggested for the estimation of its unknown parameters. A short application on financial count data illustrates the model.
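The conditional likelihood idea can be illustrated in one dimension: for a univariate INAR(1) with Poisson innovations the transition probability is a binomial–Poisson convolution, as sketched below; the bivariate Hermite case of the paper uses the analogous bivariate convolution, which is omitted here for brevity.

```python
import numpy as np
from scipy import stats

def inar1_transition_pmf(k, j, alpha, lam):
    """P(X_t = k | X_{t-1} = j) for a univariate INAR(1) with Poisson(lam)
    innovations: the binomially thinned survivors of j convolved with the
    innovation count."""
    s = np.arange(0, min(j, k) + 1)
    return np.sum(stats.binom.pmf(s, j, alpha) * stats.poisson.pmf(k - s, lam))

def inar1_conditional_loglik(x, alpha, lam):
    return sum(np.log(inar1_transition_pmf(x[t], x[t - 1], alpha, lam))
               for t in range(1, len(x)))

x = [2, 3, 1, 0, 2, 4, 3, 2, 1, 1, 0, 2]   # illustrative counts
print(inar1_conditional_loglik(x, alpha=0.4, lam=1.2))
```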
Computational Statistics, 2014
Goodness-of-fit tests are proposed for the innovation distribution in INAR models. The test statistics incorporate the joint probability generating function of the observations. Special emphasis is given to the INAR(1) model and to particular instances of the procedures which involve innovations from the general family of Poisson stopped-sum distributions. A Monte Carlo power study of a bootstrap version of the test statistic is included, as well as a real data example. Generalizations of the proposed methods are also discussed.
Communications in Statistics - Theory and Methods, 2013
Multivariate count time series data occur in many different disciplines. The class of INteger-valued AutoRegressive (INAR) processes has the great advantage of considering explicitly both the discreteness and the autocorrelation characterizing this type of data. Moreover, extensions of the simple INAR(1) model to the multi-dimensional space make it possible to model more than one series simultaneously. However, existing models do not offer great flexibility for dependence modelling, allowing only for positive correlation. In this work, we consider a bivariate INAR(1) (BINAR(1)) process where cross-correlation is introduced through the use of copulas for the specification of the joint distribution of the innovations. We mainly emphasize the parametric case that arises under the assumption of Poisson marginals. Other marginal distributions are also considered. A short application on a bivariate financial count series illustrates the model.
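The copula idea for the innovations can be illustrated by drawing Poisson pairs whose dependence comes from a Gaussian copula, which allows negative cross-correlation; the choice of a Gaussian copula and the parameter values below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def poisson_pair_gaussian_copula(n, lam1, lam2, rho, rng):
    """Draw pairs of Poisson innovations whose dependence is induced by a
    Gaussian copula, so negative correlation is possible.  A sketch of the
    idea of coupling innovations through a copula; values illustrative."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = stats.norm.cdf(z)                          # uniform marginals
    e1 = stats.poisson.ppf(u[:, 0], lam1).astype(int)
    e2 = stats.poisson.ppf(u[:, 1], lam2).astype(int)
    return e1, e2

rng = np.random.default_rng(3)
e1, e2 = poisson_pair_gaussian_copula(2000, lam1=2.0, lam2=3.0, rho=-0.6, rng=rng)
print(np.corrcoef(e1, e2)[0, 1])   # clearly negative
```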
Communications in Statistics - Simulation and Computation, 2014
The use of mixture models for clustering purposes has increased considerably in recent years, primarily due to the existence of efficient computational methods that facilitate estimation. Nowadays, there are several clustering procedures based on mixtures for certain types of data. On the other hand, copulas are becoming very popular models for dependence, as one of their appealing properties is the separation of the marginal properties of the data from the dependence properties. The purpose of this article is to put together the two distinct ideas, namely mixtures and copulas, and to use mixtures of copulas for clustering with respect to the dependence properties of the data. This is accomplished by considering finite mixtures of different copulas to represent different dependence structures. We provide properties of the derived models along with the description of an estimation method using an EM algorithm based on the standard approach for mixture models. Using daily returns from major stock markets, we illustrate the potential of our method.
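As a sketch of the mixture-of-copulas idea, the snippet below writes down the density of a bivariate Gaussian copula and the E-step responsibilities of a two-component mixture; the Gaussian copula family, the two components and all numbers are illustrative assumptions, and the M-step is omitted.

```python
import numpy as np
from scipy import stats

def gaussian_copula_density(u, v, rho):
    """Density of a bivariate Gaussian copula evaluated at (u, v)."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    num = np.exp(-(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y)
                 / (2 * (1 - rho ** 2)))
    return num / np.sqrt(1 - rho ** 2)

def e_step(u, v, weights, rhos):
    """E-step for a two-component mixture of Gaussian copulas: posterior
    probability of each observation belonging to each dependence regime.
    Marginals are assumed already transformed to uniforms."""
    dens = np.column_stack([w * gaussian_copula_density(u, v, r)
                            for w, r in zip(weights, rhos)])
    return dens / dens.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
u, v = rng.uniform(size=500), rng.uniform(size=500)
resp = e_step(u, v, weights=[0.5, 0.5], rhos=[0.8, -0.3])
print(resp[:3])
```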
Communications in Statistics - Simulation and Computation, 2013
In a recent article, Pedeli and Karlis (2010) examined the extension of the classical Integer-valued Autoregressive (INAR) model to the bivariate case. In the present article, we examine estimation methods for the case of bivariate Poisson innovations. This is a simple extension of the classical INAR model allowing two discrete-valued time series to be correlated. Properties of different estimators are given. We also compare their properties via a small simulation experiment. Extensions to incorporate covariate information are discussed. A real data application is also provided.
The problem of building bootstrap confidence intervals for probabilities P(X ∈ J), when X is a discrete random variable taking values in {0, 1, 2, ...}, is considered. The set J is finite while the support of X is supposed infinite. The true probability distribution generating the independent observations is assumed to be an unknown infinite mixture of a given family of power series distributions. The mixing distribution is estimated by nonparametric maximum likelihood and the corresponding mixture is used for resampling. We build percentile-t and Efron percentile bootstrap confidence intervals for which we prove consistency. As a by-product of our methodology, we obtain bootstrap intervals for the hazard rate P(X = k)/P(X ≥ k). The theoretical results are supported by
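A small sketch of the resampling step: draws are generated from a fitted mixture of Poissons (a discrete mixing distribution standing in for the nonparametric maximum likelihood estimate), and an Efron percentile interval for P(X ∈ J) is formed from the bootstrap replicates; the mixture used and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def bootstrap_interval(weights, lams, n, J, B, rng):
    """Percentile bootstrap for P(X in J) when resampling is done from a
    fitted Poisson mixture with the given weights over component rates.
    Illustrative only; the paper resamples from the NPML-estimated mixture
    and also constructs percentile-t intervals."""
    phat = np.empty(B)
    for b in range(B):
        comp = rng.choice(len(weights), size=n, p=weights)
        x = rng.poisson(np.asarray(lams)[comp])
        phat[b] = np.isin(x, J).mean()
    return np.percentile(phat, [2.5, 97.5])

print(bootstrap_interval(weights=[0.6, 0.4], lams=[1.0, 5.0],
                         n=200, J=[0, 1, 2], B=2000, rng=rng))
```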