Yves Tille - Academia.edu
Papers by Yves Tille
Biometrika, 2004
The cube method allows the selection of balanced samples on several auxiliary variables with equal or unequal inclusion probabilities. Practical implementation of the cube method has raised questions concerning the selection of a multi-phase balanced sampling design, the rebalancing of an unbalanced sampling design by completing it with another sample, the selection of a balanced sample from an unbalanced sample and the coordination of balanced samples. This paper provides a complete solution of all these problems.
Australian & New Zealand Journal of Statistics, 2003
The Montanari (1987) regression estimator is optimal when the population regression coefficients are known. When the coefficients are estimated, the Montanari estimator is not optimal and can be extremely volatile. Using design-based arguments, this paper proposes a simpler and better alternative to the Montanari estimator that is also optimal when the population regression coefficients are known. Moreover, it can be easily implemented as it involves standard weighted least squares. The estimator is applicable under single stage stratified sampling with unequal probabilities within each stratum.
In sample surveys of households and persons, questions about income are often sensitive and thus subject to a higher non-response rate. Nevertheless, household and personal incomes are among the important variables in surveys of this type. The distribution of the collected incomes is neither normal nor log-normal. The hypotheses of classical regression models used to explain income (or its logarithm) are not fulfilled. Imputations using such models modify the original and true distribution of the data. This is not suitable and may lead the user to wrong interpretations of results computed from data imputed in this way. The generalized beta distribution of the second kind (GB2) is a four-parameter distribution. Empirical studies have shown that it fits income data very well. The advantage of a parametric income distribution is that there exist explicit formulae for inequality measures such as the Laeken indicators as functions of the parameters. We present a parametric method o...
Springer Series in Statistics, 2006
... Härdle: Smoothing Techniques: With Implementation in S. Harrell: Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and ... I am also grateful to Cedric Beguin, Ken Brewer, Lionel Qualite, and Paul-Andre Salamin for their constructive ...
Environmental and Ecological Statistics, 2013
Computational Statistics, 2006
Summary: The cube method (Deville & Tillé 2004) is a large family of algorithms that allows the selection of balanced samples with equal or unequal inclusion probabilities. In this paper, we propose a very fast implementation of the cube method. The execution time no longer depends on the square of the population size, but only on the population size. Balanced samples can
Biometrika, 2004
A balanced sampling design is defined by the property that the Horvitz-Thompson estimators of the population totals of a set of auxiliary variables equal the known totals of these variables. Therefore the variances of estimators of totals of all the variables of interest are reduced, depending on the correlations of these variables with the controlled variables. In this paper, we develop a general method, called the cube method, for selecting approximately balanced samples with equal or unequal inclusion probabilities and any number of auxiliary variables.
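The balancing property stated in this abstract can be written as a single set of equations. The notation below is an assumption, since the abstract itself fixes no symbols: U is the population, S the random sample, π_k the inclusion probability of unit k, and x_k its vector of auxiliary variables.

```latex
% Balancing equations: the Horvitz-Thompson estimator of the auxiliary
% totals reproduces the known population totals.
\[
  \widehat{\mathbf{X}}_{HT}
  \;=\; \sum_{k \in S} \frac{\mathbf{x}_k}{\pi_k}
  \;=\; \sum_{k \in U} \mathbf{x}_k
  \;=\; \mathbf{X}.
\]
```

For instance, taking x_k = π_k gives a sum over S of 1 on the left and the sum of the π_k over U on the right, so the design has a fixed sample size.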
The American Statistician, 2007
This introductory text is very accessible; it does not assume any previous knowledge of statistics or design. The presentation style is clear and easy to follow, and each chapter concludes with a concise summary. Presented ...
This paper presents a review and assessment of the use of balanced sampling by means of the cube method. After defining the notion of balanced sample and balanced sampling, a short history of the concept of balancing is presented. The theory of the cube method is briefly presented. Emphasis is placed on the practical problems posed by balanced sampling: the interest of the method with respect to other sampling methods and calibration, the field of application, the accuracy of balancing, the choice of auxiliary variables and ways to implement the method.
Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis, 2020
Distributed systems increasingly require the processing of large amounts of data, for metrology, safety or security purposes. The online processing of these large data streams requires the development of algorithms that calculate parameters efficiently. Although elegant solutions have been proposed recently, their approximation is commonly calculated from the inception of the data stream. In a distributed execution context, it would be preferable to collect information only on the recent past (to save resources or because the most recent information is the most relevant). We therefore consider here the sliding window model. In this article, we propose a family of new sampling techniques that take into account both the sliding window model and the presence of a malicious adversary. Wayne Fuller proposed in 1970 a very ingenious method of sampling with unequal inclusion probabilities. After doing justice to this precursor paper and proposing a fast and simple implementation of it, we completely generalize Fuller's method in order to enable the use of a tuning parameter of spreading. The analytical results of these techniques show the excellent performance of the generalized pivotal approach. This generalization makes the sampling method less predictable and seems suitable for protecting against malicious attacks when sampling from a stream.
Statistics, 2016
In order to overcome the problem of item nonresponse, random imputations are often used because they tend to preserve the distribution of the imputed variable. Among the methods of random imputation, the random hot-deck has the interesting property that the imputed values are observed values. We present a new random method of hot-deck imputation which enables us to select the imputed values such that some balancing equations are satisfied and such that the donors are selected in neighborhoods of the recipients.
Annales d'Économie et de Statistique, 1996
A set of demands is presented which should be satisfied by a good unequal-probability sampling method without replacement. First it is shown that some of these demands are contradictory. In particular, it is shown that a sequential algorithm that ensures strictly positive joint inclusion probabilities does not exist; nor does there exist a sequential procedure that yields a result which is not dependent on the order of units in the data file. Next, a way is discussed to build approximations of the joint inclusion probabilities for sampling designs implemented by means of a sequential algorithm and preceded by a random sort of the data file. An original approximation is proposed which resorts to a method of adjustment to marginal totals. Finally, several approximations are compared to systematic sampling, and to Sunter's method.
... end of the nineteenth century. Yet this concept, which is applied as much to quota designs as to probability designs, is widely misused. After recalling some elements of the history of survey sampling theory, we review some basic techniques of random and purposive sampling designs. We then show that the concept of a balanced design makes it possible to resolve the fundamental ambiguities of the notion of representativeness.
In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration.
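The entropy comparison in this note can be checked numerically. The sketch below is not the note's own example: it assumes the standard Shannon entropy of a sampling design, a Bernoulli design with common inclusion probability n/N, and hypothetical function names chosen for illustration.

```python
import math


def srswor_entropy(N: int, n: int) -> float:
    """Entropy of simple random sampling without replacement.

    All C(N, n) samples are equally likely, so the entropy is log C(N, n),
    computed here through log-gamma to stay stable for large N.
    """
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)


def bernoulli_entropy(N: int, pi: float) -> float:
    """Entropy of Bernoulli sampling with common inclusion probability pi.

    The N inclusion indicators are independent, so their entropies add up.
    Requires 0 < pi < 1.
    """
    return -N * (pi * math.log(pi) + (1.0 - pi) * math.log(1.0 - pi))


def entropy_ratio(N: int, f: float) -> float:
    """Ratio of the two entropies for sampling fraction f (assumed setup)."""
    n = round(N * f)
    return srswor_entropy(N, n) / bernoulli_entropy(N, n / N)


if __name__ == "__main__":
    for N in (100, 10_000, 1_000_000):
        print(N, entropy_ratio(N, 0.1))
```

Under these assumptions the ratio stays below 1 and moves toward 1 as N grows, which is consistent with the statement that the two designs have approximately the same entropy for large populations.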