Yves Tille - Academia.edu

Papers by Yves Tille

Coordination, combination and extension of balanced samples

Biometrika, 2004

The cube method allows the selection of balanced samples on several auxiliary variables with equal or unequal inclusion probabilities. Practical implementation of the cube method has raised questions concerning the selection of a multi-phase balanced sampling design, the rebalancing of an unbalanced sampling design by completing it with another sample, the selection of a balanced sample from an unbalanced sample and the coordination of balanced samples. This paper provides a complete solution to all these problems.

Towards optimal regression estimation in sample surveys

Australian & New Zealand Journal of Statistics, 2003

The Montanari (1987) regression estimator is optimal when the population regression coefficients are known. When the coefficients are estimated, the Montanari estimator is not optimal and can be extremely volatile. Using design-based arguments, this paper proposes a simpler and better alternative to the Montanari estimator that is also optimal when the population regression coefficients are known. Moreover, it can be easily implemented as it involves standard weighted least squares. The estimator is applicable under single stage stratified sampling with unequal probabilities within each stratum.

Exercises and Solutions

Imputation of income data with generalized calibration procedure and GB2 law: illustration with SILC data

In sample surveys of households and persons, questions about income are often sensitive and thus subject to a higher non-response rate. Nevertheless, household or personal incomes are among the important variables in surveys of this type. The distribution of the collected incomes is neither normal nor log-normal. The hypotheses of classical regression models for explaining income (or its logarithm) are not fulfilled. Imputations using such models modify the original and true distribution of the data. This is undesirable and may lead the user to wrong interpretations of results computed from data imputed in this way. The generalized beta distribution of the second kind (GB2) is a four-parameter distribution. Empirical studies have shown that it fits income data very well. The advantage of a parametric income distribution is that there exist explicit formulae for inequality measures such as the Laeken indicators as functions of the parameters. We present a parametric method o...

Sampling Algorithms

Springer Series in Statistics, 2006

... Härdle: Smoothing Techniques: With Implementation in S. Harrell: Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and ... I am also grateful to Cedric Beguin, Ken Brewer, Lionel Qualite, and Paul-Andre Salamin for their constructive ...

Calcul De La Précision Des Estimations Longitudinales Dans L'Enquête Suisse Sur La Valeur Ajoutée

Balanced sampling by means of the cube method

Use of auxiliary information in survey sampling

Estimation sans biais par calage sur la répartition dans les plans simples sans remise

Complex national sampling design for long-term monitoring of protected dry grasslands in Switzerland

Environmental and Ecological Statistics, 2013

A fast algorithm for balanced sampling

Computational Statistics, 2006

The cube method (Deville & Tillé 2004) is a large family of algorithms that allows selecting balanced samples with equal or unequal inclusion probabilities. In this paper, we propose a very fast implementation of the cube method. The execution time no longer depends on the square of the population size, but only on the population size. Balanced samples can ...

Efficient balanced sampling: The cube method

Biometrika, 2004

A balanced sampling design is defined by the property that the Horvitz-Thompson estimators of the population totals of a set of auxiliary variables equal the known totals of these variables. Therefore the variances of estimators of totals of all the variables of interest are reduced, depending on the correlations of these variables with the controlled variables. In this paper, we develop a general method, called the cube method, for selecting approximately balanced samples with equal or unequal inclusion probabilities and any number of auxiliary variables.
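
The balancing property described in this abstract can be illustrated with a minimal sketch. It only checks whether a given sample satisfies the balancing equations; it does not implement the cube method itself. The function names, the toy population and the tolerance are assumptions made for illustration and do not come from the paper.

```python
def horvitz_thompson_total(values, inclusion_probs, sample):
    """Horvitz-Thompson estimate of a total: sum of x_k / pi_k over sampled units."""
    return sum(values[k] / inclusion_probs[k] for k in sample)


def is_balanced(aux_variables, inclusion_probs, sample, tol=1e-9):
    """Return True when the HT estimate of every auxiliary total matches the known total."""
    for column in aux_variables:
        known_total = sum(column)
        estimate = horvitz_thompson_total(column, inclusion_probs, sample)
        if abs(estimate - known_total) > tol:
            return False
    return True


# Hypothetical toy population of 4 units with equal inclusion probabilities.
pi = [0.5, 0.5, 0.5, 0.5]
x = [1.0, 2.0, 3.0, 4.0]
# Using pi itself as an auxiliary variable forces a fixed sample size.
aux = [pi, x]

balanced_sample = [0, 3]    # units with x = 1 and x = 4
unbalanced_sample = [0, 1]  # units with x = 1 and x = 2
```

With these toy values, `is_balanced(aux, pi, balanced_sample)` holds because the weighted sum 1/0.5 + 4/0.5 equals the population total 10, while `unbalanced_sample` estimates the total as 6 and fails the check.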

Sampling Methods: Exercises and Solutions

The American Statistician, 2007

This introductory text is very accessible: it does not assume any previous knowledge of statistics or design. The presentation style is clear and easy to follow, and each chapter concludes with a concise summary. Presented ...

Ten years of balanced sampling with the cube method: An appraisal

This paper presents a review and assessment of the use of balanced sampling by means of the cube method. After defining the notion of balanced sample and balanced sampling, a short history of the concept of balancing is presented. The theory of the cube method is briefly presented. Emphasis is placed on the practical problems posed by balanced sampling: the interest of the method with respect to other sampling methods and calibration, the field of application, the accuracy of balancing, the choice of auxiliary variables and ways to implement the method.

Attack-tolerant Unequal Probability Sampling Methods over Sliding Window for Distributed Streams

Proceedings of the 2020 4th International Conference on Compute and Data Analysis, 2020

Distributed systems increasingly require the processing of large amounts of data, for metrology, safety or security purposes. The online processing of these large data streams requires the development of algorithms to efficiently calculate parameters. Although elegant solutions have been proposed recently, their approximation is commonly calculated from the inception of the data stream. In a distributed execution context, it would be preferable to collect information only on the recent past (for resource saving or relevancy of the most recent information). We therefore consider here the sliding window model. In this article, we propose a family of new sampling techniques that take into account both the sliding window model and the presence of a malicious adversary. Wayne Fuller proposed in 1970 a very ingenious method of sampling with unequal inclusion probabilities. After doing justice to this precursor paper and proposing a fast and simple implementation of it, we completely generalize Fuller's method in order to enable the use of a tuning parameter of spreading. The analytical results of these techniques show the excellent performance of the generalized pivotal approach. This generalization makes the sampling method less predictable and seems well suited to protecting against malicious attacks when sampling from a stream.

Balanced k-nearest neighbour imputation

Statistics, 2016

In order to overcome the problem of item nonresponse, random imputations are often used because they tend to preserve the distribution of the imputed variable. Among the methods of random imputation, the random hot-deck has the interesting property that the imputed values are observed values. We present a new random method of hot-deck imputation which enables us to select the imputed values such that some balancing equations are satisfied and such that the donors are selected in neighborhoods of the recipients.

Some Remarks on Unequal Probability Sampling Designs without Replacement

Annales d'Économie et de Statistique, 1996

A set of demands is presented which should be satisfied by a good unequal-probability sampling method without replacement. First it is shown that some of these demands are contradictory. In particular, it is shown that a sequential algorithm that ensures strictly positive joint inclusion probabilities does not exist; nor does there exist a sequential procedure that yields a result which is not dependent on the order of units in the data file. Next, a way is discussed to build approximations of the joint inclusion probabilities for a sampling design implemented by means of a sequential algorithm and preceded by a random sort of the data file. An original approximation is proposed which resorts to a method of adjustment to marginal totals. Finally, several approximations are compared to systematic sampling, and to Sunter's method.

Some recent sampling without replacement algorithms with unequal probabilities. (Quelques algorithmes de sondage récents sans remise à probabilités inégales.)

Utilisation d'informations auxiliaires dans les enquêtes par sondage

... end of the nineteenth century. Yet this concept, which is applied to quota designs as much as to probability designs, is widely misused. After recalling some elements of the history of survey sampling theory, we review some basic techniques of random and purposive sampling designs. We then show that the concept of a balanced design resolves the fundamental ambiguities of the notion of representativeness.

An Interesting Property of the Entropy of Some Sampling Designs

In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration.
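
The comparison described in this note can be explored numerically with a small sketch. It uses the standard entropy expressions for the two designs: the logarithm of the number of equally likely samples for simple random sampling without replacement, and the sum of independent binary entropies for Bernoulli sampling. Comparing the two through their ratio, the function names and the sampling fraction are assumptions made for illustration; the abstract does not say how the note measures closeness.

```python
import math


def entropy_srswor(population_size, sample_size):
    """Entropy of simple random sampling without replacement.

    All C(N, n) samples are equally likely, so the entropy is log C(N, n).
    """
    return math.log(math.comb(population_size, sample_size))


def entropy_bernoulli(population_size, prob):
    """Entropy of Bernoulli sampling: N independent inclusions with probability prob."""
    per_unit = -(prob * math.log(prob) + (1 - prob) * math.log(1 - prob))
    return population_size * per_unit


def entropy_ratio(population_size, sampling_fraction):
    """Ratio of the two entropies when both designs share the same inclusion probability."""
    sample_size = round(population_size * sampling_fraction)
    prob = sample_size / population_size
    return entropy_srswor(population_size, sample_size) / entropy_bernoulli(population_size, prob)


# Hypothetical sampling fraction of 30 percent at increasing population sizes.
ratios = [entropy_ratio(N, 0.3) for N in (100, 1000, 10000)]
```

Under these assumptions the ratios stay below 1 and should move toward 1 as the population size grows, which is one way to read the statement that the two entropies are approximately equal for large populations.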
