Sylvie Huet - Academia.edu

Papers by Sylvie Huet

Accelerating metabolic models evaluation with statistical metamodels: application to Salmonella infection models

ESAIM: Proceedings and Surveys

Mathematical and numerical models are increasingly used in microbial ecology to model the fate of microbial communities in their ecosystem. These models make it possible to connect, in a mechanistic framework, species-level information, such as microbial genomes, with macro-scale features, such as species spatial distributions or metabolite gradients. Numerous models are built upon species-level metabolic models that predict the metabolic behaviour of a microbe by solving an optimization problem given its genome and its nutritional environment. However, screening the community dynamics with these metabolic models requires solving such an optimization problem for each species at each time step, leading to a significant computational load that increases by several orders of magnitude when spatial dimensions are added. In this paper, we propose a statistical framework based on Reproducing Kernel Hilbert Space (RKHS) metamodels that are used to provide fast approximations of the original metabol...

Risk upper bounds for RKHS ridge group sparse estimator in the regression model with non-Gaussian and non-bounded error

arXiv (Cornell University), Sep 22, 2020

We consider the problem of estimating a meta-model of an unknown regression model with non-Gaussian and non-bounded error. The meta-model belongs to a reproducing kernel Hilbert space constructed as a direct sum of Hilbert spaces leading to an additive decomposition including the variables and interactions between them. The estimator of this meta-model is calculated by minimizing an empirical least-squares criterion penalized by the sum of the Hilbert norm and the empirical L2-norm. In this context, the upper bounds of the empirical L2 risk and the L2 risk of the estimator are established.
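
As a reading aid, the penalized criterion described in this abstract can be sketched as follows. This is a sketch under assumptions, not the paper's exact statement: the observations (X_i, Y_i), the index set P of variable groups, the component spaces H_v, and the tuning parameters γ and μ are notation introduced here for illustration, and the paper may weight the two penalty terms differently.

```latex
\hat f \in \operatorname*{arg\,min}_{\substack{f = f_0 + \sum_{v \in \mathcal{P}} f_v \\ f_v \in \mathcal{H}_v}}
\left\{ \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - f(X_i) \bigr)^2
+ \sum_{v \in \mathcal{P}} \Bigl( \gamma \, \| f_v \|_{\mathcal{H}_v} + \mu \, \| f_v \|_{n} \Bigr) \right\},
\qquad
\| f_v \|_n^2 = \frac{1}{n} \sum_{i=1}^{n} f_v(X_i)^2 .
```

In this form the Hilbert-norm term acts as a ridge-type regularization of each component, while the empirical-norm term acts as a group-sparse penalty that can set whole components f_v to zero.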

RKHSMetaMod: An R Package to Estimate the Hoeffding Decomposition of a Complex Model by Solving RKHS Ridge Group Sparse Optimization Problem

The R Journal

In this paper, we propose an R package, called RKHSMetaMod, that implements a procedure for estimating a meta-model of a complex model. The meta-model approximates the Hoeffding decomposition of the complex model and allows us to perform sensitivity analysis on it. It belongs to a reproducing kernel Hilbert space that is constructed as a direct sum of Hilbert spaces. The estimator of the meta-model is the solution of a penalized empirical least-squares minimization with the sum of the Hilbert norm and the empirical L2-norm. This procedure, called RKHS ridge group sparse, makes it possible both to select and to estimate the terms in the Hoeffding decomposition, and therefore to select and estimate the Sobol indices that are non-zero. The RKHSMetaMod package provides an interface from the R statistical computing environment to the C++ libraries Eigen and GSL. To speed up execution and reduce memory use, all of the functions of this package except one, which is written in R, are written using the efficient C++ libraries through the RcppEigen and RcppGSL packages. These functions are then interfaced in the R environment in order to provide a user-friendly package.

Model selection for estimating the non zero components of a Gaussian vector (DOI: 10.1051/ps:2006004)

We propose a method based on a penalised likelihood criterion for estimating the number of non-zero components of the mean of a Gaussian vector. Following the work of Birgé and Massart in Gaussian model selection, we choose the penalty function such that the resulting estimator minimises the Kullback risk.

Adaptive tests of qualitative hypotheses (DOI: 10.1051/ps:2003006)

We propose a test of a qualitative hypothesis on the mean of an n-dimensional Gaussian vector. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. The properties of the test are non-asymptotic. For testing positivity or monotonicity, we establish separation rates with respect to the Euclidean distance, over subsets of R^n which are related to Hölderian balls in functional spaces. We provide a simulation study in order to evaluate the procedure when the purpose is to test monotonicity in a functional regression model and to check the robustness of the procedure to non-Gaussian errors.

Semiparametric additive indices for binary response and generalized additive models

Models are studied where the response Y and covariates X, T are assumed to fulfill E(Y | X, T) = G{X^T β + α + m_1(T_1) + … + m_d(T_d)}. Here G is a known (link) function, β is an unknown parameter, and m_1, …, m_d are unknown functions. In particular, we consider additive binary response models where the response Y is binary. In these models, given X and T, the response Y has a Bernoulli distribution with parameter G{X^T β + α + m_1(T_1) + … + m_d(T_d)}. The paper discusses estimation of β and m_1, …, m_d. Procedures are proposed for testing linearity of the additive components m_1, …, m_d. Furthermore, bootstrap uniform confidence intervals for the additive components are introduced. The practical performance of the proposed methods is discussed in simulations and in two economic applications.

Metamodel construction for sensitivity analysis

ESAIM: Proceedings and Surveys, 2017

We propose to estimate a metamodel and the sensitivity indices of a complex model m in the Gaussian regression framework. Our approach combines methods for sensitivity analysis of complex models and statistical tools for sparse non-parametric estimation in the multivariate Gaussian regression model. It rests on the construction of a metamodel for approximating the Hoeffding-Sobol decomposition of m. This metamodel belongs to a reproducing kernel Hilbert space constructed as a direct sum of Hilbert spaces leading to a functional ANOVA decomposition. The estimation of the metamodel is carried out via a penalized least-squares minimization, which makes it possible to select the subsets of variables that contribute to predicting the output. It also yields estimates of the sensitivity indices of m. We establish an oracle-type inequality for the risk of the estimator, describe the procedure for estimating the metamodel and the sensitivity indices, and assess the performance of the procedure via a simulation study.

Résumé. We consider the estimation of a metamodel of a complex model m from the observations of an n-sample in a Gaussian regression model. We deduce from it an estimate of the sensitivity indices of m. Our approach combines methods for the sensitivity analysis of complex models with statistical tools for non-parametric estimation in multivariate regression. It rests on the construction of a metamodel that approximates the Hoeffding-Sobol decomposition of m. This metamodel belongs to a reproducing kernel Hilbert space that is itself a direct sum of Hilbert spaces, which yields an ANOVA-type decomposition. Estimators of the sensitivity indices of m follow. We establish an oracle-type inequality for the risk of the estimator, describe the procedure for estimating the metamodel and the sensitivity indices, and assess the performance of our method through a simulation study.

Testing k-monotonicity of a discrete distribution. Application to the estimation of the number of classes in a population

Computational Statistics & Data Analysis, 2018

We develop here several goodness-of-fit tests for testing the k-monotonicity of a discrete density, based on the empirical distribution of the observations. Our tests are non-parametric, easy to implement and are proved to be asymptotically of the desired level and consistent. We propose an estimator of the degree of k-monotonicity of the distribution based on the non-parametric goodness-of-fit tests. We apply our work to the estimation of the total number of classes in a population. A large simulation study assesses the performance of our procedures.
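
To make the notion of k-monotonicity concrete, here is a minimal sketch that checks the property on a known, finitely supported probability vector. It assumes one common definition (p is k-monotone when (-1)^k times its k-th forward difference is non-negative at every index) and is not the statistical test developed in the paper; the function names `kth_difference` and `is_k_monotone` are hypothetical.

```python
from math import comb


def kth_difference(p, k, i):
    """Return (-1)^k times the k-th forward difference of p at index i.

    p is treated as a sequence on {0, 1, 2, ...} that equals zero
    beyond its last listed entry (an assumption of this sketch).
    """
    def value(j):
        return p[j] if j < len(p) else 0.0

    # (-1)^k * Delta^k p(i) = sum_{h=0}^{k} (-1)^h * C(k, h) * p(i + h)
    return sum((-1) ** h * comb(k, h) * value(i + h) for h in range(k + 1))


def is_k_monotone(p, k, tol=1e-12):
    """Check (-1)^k * Delta^k p(i) >= 0 for every index in the support.

    Indices past the support give a difference of exactly zero, so they
    need no check. tol absorbs floating-point rounding.
    """
    return all(kth_difference(p, k, i) >= -tol for i in range(len(p)))


triangular = [3 / 6, 2 / 6, 1 / 6]
print(is_k_monotone(triangular, 1))  # True: the sequence is non-increasing
print(is_k_monotone(triangular, 2))  # True: the sequence is convex
print(is_k_monotone(triangular, 3))  # False: the third-order condition fails
```

With k = 1 the condition reduces to a non-increasing sequence and with k = 2 to a convex one, which is why the triangular example passes the first two checks but not the third.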

Hybridization within Saccharomyces Genus Results in Homoeostasis and Phenotypic Novelty in Winemaking Conditions

PloS one, 2015

Despite its biotechnological interest, hybridization, which can result in hybrid vigor, has not commonly been studied or exploited in the yeast genus. From a diallel design including 55 intra- and interspecific hybrids between Saccharomyces cerevisiae and S. uvarum grown at two temperatures in enological conditions, we analyzed as many as 35 fermentation traits with original statistical and modeling tools. We first showed that, depending on the type of trait (kinetics parameters, life-history traits, enological parameters and aromas), the sources of variation (strain, temperature and strain × temperature effects) differed to a large extent. Then we compared globally three groups of hybrids and their parents at two growth temperatures: intraspecific hybrids S. cerevisiae × S. cerevisiae, intraspecific hybrids S. uvarum × S. uvarum and interspecific hybrids S. cerevisiae × S. uvarum. We found that hybridization could generate multi-trait phenotypes with improved oenological perform...

The spatial distribution of Mustelidae in France

PloS one, 2015

We estimated the spatial distribution of 6 Mustelidae species in France using the data collected by the French national hunting and wildlife agency under the "small carnivorous species logbooks" program. The 1500 national wildlife protection officers working for this agency spend 80% of their working time traveling in the area in which they have authority. During their travels, they occasionally detect dead or living small and medium-sized carnivorous animals. Between 2002 and 2005, each car operated by this agency was equipped with a logbook in which officers recorded information about the detected animals (species, location, dead or alive, date). Thus, more than 30,000 dead or living animals were detected during the study period. Because a large number of detected animals in a region could have been the result of a high sampling pressure there, we modeled the number of detected animals as a function of the sampling effort to allow for unbiased estimation of the spe...

Mélange de modèles mixtes: application à l'analyse des appariements de chromosomes chez des haploïdes de colza

Journal de la société française de statistique, vol. 143, no. 1-2 (2002), pp. 147-153. http://www.numdam.org/item?id=JSFS_2002__143_1-2_147_0

Dosages radioimmunologiques : quelle analyse statistique ?

Reproduction Nutrition Développement, 1984

Estimator selection in the Gaussian setting

Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 2014

We consider the problem of estimating the mean f of a Gaussian vector Y with independent components of common unknown variance σ². Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection F of estimators of f based on Y and, with the same data Y, aim at selecting an estimator among F with the smallest Euclidean risk. No assumptions on the estimators are made and their dependencies with respect to Y may be unknown. We establish a non-asymptotic risk bound for the selected estimator and derive oracle-type inequalities when F consists of linear estimators. As particular cases, our approach makes it possible to handle the problems of aggregation and model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. In all these cases but aggregation, the method can be easily implemented. For illustration, we carry out two simulation studies. One aims at comparing our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.

Résumé. We present a new estimator selection procedure for estimating the expectation f of a vector Y of n independent Gaussian variables with unknown variance. We propose to choose an estimator of f, with the aim of minimizing the ℓ2 risk, from an arbitrary and possibly infinite collection F of estimators. Both the selection procedure and the collection F depend only on the observations Y. We derive a non-asymptotic risk bound that requires no assumption on the estimators in F, nor knowledge of how they depend on Y. We derive oracle-type inequalities when F is a collection of linear estimators. We consider several special cases: estimation by aggregation, estimation by model selection, choice of a window and of the smoothing parameter in functional regression, and choice of the regularization parameter in a penalized criterion. In all of these cases except aggregation, the method is very easy to program. By way of illustration, we present simulation results with two aims: comparing our method with cross-validation, and showing how to implement it for variable selection.

Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS Examples

Journal of the American Statistical Association, 1997

Production managed by Natalie Johnson; manufacturing supervised by Jeffrey Taub. Camera-ready copy prepared using the authors' LaTeX files.

Model selection for estimating the non zero components of a Gaussian vector

ESAIM: Probability and Statistics, 2006

We propose a method based on a penalised likelihood criterion for estimating the number of non-zero components of the mean of a Gaussian vector. Following the work of Birgé and Massart in Gaussian model selection, we choose the penalty function such that the resulting estimator minimises the Kullback risk.

Bootstrap Inference in Semiparametric Generalized Additive Models

Econometric Theory, 2004

Here, G is a known link, α, β are unknown parameters, and m_1, …, m_d are unknown (smooth) functions of possibly higher dimensional covariates T_1, …, T_d. Estimates of m_1, …, m_d, α and β are presented and asymptotic distribution theory for both the nonparametric and the parametric part is given. The main focus is the application of bootstrap methods. It is shown that bootstrap can be used for bias correction, hypothesis testing (e.g. component-wise analysis) and the construction of uniform confidence bands. Various bootstrap tests for model specification and parametrization are given, in particular for testing additivity and link function specification. The practical performance of our methods is illustrated in simulations and in an application to East-West German migration.

Gaussian model selection with an unknown variance

The Annals of Statistics, 2009

Let Y be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean μ of Y by model selection. More precisely, we start with a collection S = {S_m, m ∈ M} of linear subspaces of R^n and associate with each of these the least-squares estimator of μ on S_m. Then, we use a data-driven penalized criterion in order to select one estimator among these. Our first objective is to analyze the performance of estimators associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection S and the sample size. Then we apply those to solve various statistical problems such as variable selection, change point detection and signal estimation, among others. Our results are based on a nonasymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.

Adaptive tests of linear hypotheses by model selection

The Annals of Statistics, 2003

We propose a new test, based on model selection methods, for testing that the expectation of a Gaussian vector with n independent components belongs to a linear subspace of R^n against a nonparametric alternative. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. The properties of the test are nonasymptotic and we prove that the test is rate optimal [up to a possible log(n) factor] over various classes of alternatives simultaneously. We also provide a simulation study in order to evaluate the procedure when the purpose is to test goodness-of-fit in a regression model.

Methodology for choosing a model for wheat kernel growth

Agronomie, 1999

Estimation of a convex discrete distribution

arXiv: Methodology, 2012

Non-parametric estimation of a convex discrete distribution may be of interest in several applications, such as the estimation of species abundance distribution in ecology. In this paper we study the least squares estimator of a discrete distribution under the constraint of convexity. We show that this estimator exists and is unique, and that it always outperforms the classical empirical estimator in terms of the ℓ2-distance. We provide an algorithm for its computation, based on the support reduction algorithm. We compare its performance with that of the empirical estimator, on the basis of a simulation study.
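
One standard way to see why a constrained least squares estimator of this kind cannot do worse than the empirical estimator in ℓ2-distance is the non-expansiveness of projections onto closed convex sets. The sketch below is an illustration under assumptions and is not taken from the paper: K denotes the set of convex sequences, p̃ the empirical estimator, p the true distribution (assumed to lie in K), and Π_K the ℓ2 projection onto K.

```latex
\hat p = \Pi_{\mathcal{K}}(\tilde p)
       = \operatorname*{arg\,min}_{q \in \mathcal{K}} \sum_{j \ge 0} \bigl( q_j - \tilde p_j \bigr)^2,
\qquad
\| \hat p - p \|_2
  = \| \Pi_{\mathcal{K}}(\tilde p) - \Pi_{\mathcal{K}}(p) \|_2
  \le \| \tilde p - p \|_2
\quad \text{whenever } p \in \mathcal{K}.
```

The middle equality uses Π_K(p) = p for p in K, and the inequality is the general fact that projection onto a closed convex subset of a Hilbert space is 1-Lipschitz.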

Research paper thumbnail of Accelerating metabolic models evaluation with statistical metamodels: application to Salmonella infection models

ESAIM: Proceedings and Surveys

Mathematical and numerical models are increasingly used in microbial ecology to model the fate of... more Mathematical and numerical models are increasingly used in microbial ecology to model the fate of microbial communities in their ecosystem. These models allow to connect in a mechanistic framework species-level informations, such as the microbial genomes, with macro-scale features, such as species spatial distributions or metabolite gradients. Numerous models are built upon species-level metabolic models that predict the metabolic behaviour of a microbe by solving an optimization problem knowing its genome and its nutritional environment. However, screening the community dynamics with these metabolic models implies to solve such an optimization problem by species at each time step, leading to a significant computational load further increased by several orders of magnitude when spatial dimensions are added. In this paper, we propose a statistical framework based on Reproducing Kernel Hilbert Space (RKHS) metamodels that are used to provide fast approximations of the original metabol...

Research paper thumbnail of Risk upper bounds for RKHS ridge group sparse estimator in the regression model with non-Gaussian and non-bounded error

arXiv (Cornell University), Sep 22, 2020

We consider the problem of estimating a meta-model of an unknown regression model with non-Gaussi... more We consider the problem of estimating a meta-model of an unknown regression model with non-Gaussian and non-bounded error. The meta-model belongs to a reproducing kernel Hilbert space constructed as a direct sum of Hilbert spaces leading to an additive decomposition including the variables and interactions between them. The estimator of this meta-model is calculated by minimizing an empirical least-squares criterion penalized by the sum of the Hilbert norm and the empirical L 2-norm. In this context, the upper bounds of the empirical L 2 risk and the L 2 risk of the estimator are established.

Research paper thumbnail of RKHSMetaMod: An R Package to Estimate the Hoeffding Decomposition of a Complex Model by Solving RKHS Ridge Group Sparse Optimization Problem

The R Journal

In this paper, we propose an R package, called RKHSMetaMod, that implements a procedure for estim... more In this paper, we propose an R package, called RKHSMetaMod, that implements a procedure for estimating a meta-model of a complex model. The meta-model approximates the Hoeffding decomposition of the complex model and allows us to perform sensitivity analysis on it. It belongs to a reproducing kernel Hilbert space that is constructed as a direct sum of Hilbert spaces. The estimator of the meta-model is the solution of a penalized empirical least-squares minimization with the sum of the Hilbert norm and the empirical L 2-norm. This procedure, called RKHS ridge group sparse, allows both to select and estimate the terms in the Hoeffding decomposition, and therefore, to select and estimate the Sobol indices that are non-zero. The RKHSMetaMod package provides an interface from R statistical computing environment to the C++ libraries Eigen and GSL. In order to speed up the execution time and optimize the storage memory, except for a function that is written in R, all of the functions of this package are written using the efficient C++ libraries through RcppEigen and RcppGSL packages. These functions are then interfaced in the R environment in order to propose a user-friendly package.

Research paper thumbnail of DOI: 10.1051/ps:2006004 MODEL SELECTION FOR ESTIMATING THE NON ZERO COMPONENTS OF A GAUSSIAN VECTOR

Abstract. We propose a method based on a penalised likelihood criterion, for estimating the numbe... more Abstract. We propose a method based on a penalised likelihood criterion, for estimating the number on non-zero components of the mean of a Gaussian vector. Following the work of Birge ́ and Massart in Gaussian model selection, we choose the penalty function such that the resulting estimator minimises the Kullback risk.

Research paper thumbnail of DOI: 10.1051/ps:2003006 ADAPTIVE TESTS OF QUALITATIVE HYPOTHESES

Abstract. We propose a test of a qualitative hypothesis on the mean of a n-Gaussian vector. The t... more Abstract. We propose a test of a qualitative hypothesis on the mean of a n-Gaussian vector. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. The properties of the test are non-asymptotic. For testing positivity or monotonicity, we establish separation rates with respect to the Euclidean distance, over subsets of Rn which are related to Hölderian balls in functional spaces. We provide a simulation study in order to evaluate the procedure when the purpose is to test monotonicity in a functional regression model and to check the robustness of the procedure to non-Gaussian errors.

Research paper thumbnail of Semiparametric additive indices for binary response and generalized additive models

Models are studied where the response Y and covariates X, T are assumed to fulfill E(Y|X; T) = G{... more Models are studied where the response Y and covariates X, T are assumed to fulfill E(Y|X; T) = G{XT O + » + m1(T1) + … + md(Td)}. Here G is a known (link) function, O is an unknown parameter, and m1, …, md are unknown functions. In particular, we consider additive binary response models where the response Y is binary. In these models, given X and T, the response Y has a Bernoulli distribution with parameter G{XT O + » + m1(T1) + … + md(Td)}. The paper discusses estimation of O and m1, …, md. Procedures are proposed for testing linearity of the additive components m1, …, md. Furthermore, bootstrap uniform confidence intervals for the additive components are introduced. The practical performance of the proposed methods is discussed in simulations and in two economic applications.

Research paper thumbnail of Metamodel construction for sensitivity analysis

ESAIM: Proceedings and Surveys, 2017

We propose to estimate a metamodel and the sensitivity indices of a complex model m in the Gaussi... more We propose to estimate a metamodel and the sensitivity indices of a complex model m in the Gaussian regression framework. Our approach combines methods for sensitivity analysis of complex models and statistical tools for sparse non-parametric estimation in multivariate Gaussian regression model. It rests on the construction of a metamodel for aproximating the Hoeffding-Sobol decomposition of m. This metamodel belongs to a reproducing kernel Hilbert space constructed as a direct sum of Hilbert spaces leading to a functional ANOVA decomposition. The estimation of the metamodel is carried out via a penalized least-squares minimization allowing to select the subsets of variables that contribute to predict the output. It allows to estimate the sensitivity indices of m. We establish an oracle-type inequality for the risk of the estimator, describe the procedure for estimating the metamodel and the sensitivity indices, and assess the performances of the procedure via a simulation study. Résumé. Nous considérons l'estimation d'un méta-modèle d'un modèle complexe m à partir des observations d'un n-échantillon dans un modèle de régression gaussien. Nous en déduisons une estimation des indices de sensibilité de m. Notre approche combine les méthodes d'analyse de sensibilité de modèles complexes et les outils statistiques de l'estimation non-paramétrique en régression multivariée. Elle repose sur la construction d'un méta-modèle qui approche la décomposition de Hoeffding-Sobol de m. Ce méta-modèle appartient à un espace de Hilbert à noyau reproduisant qui est lui-même la somme directe d'espaces de Hilbert, permettant ainsi une décomposition de type ANOVA. On en déduit des estimateurs des indices de sensibilité de m. 
Nous établissons une inégalité de type oracle pour le risque de l'estimateur, nous décrivons la procédure pour estimer le méta-modèle et les indices de sensibilité, et évaluons les performances de notre méthode à l'aide d'une étude de simulations.

Research paper thumbnail of Testing k-monotonicity of a discrete distribution. Application to the estimation of the number of classes in a population

Computational Statistics & Data Analysis, 2018

We develop here several goodness-of-fit tests for testing the k-monotonicity of a discrete densit... more We develop here several goodness-of-fit tests for testing the k-monotonicity of a discrete density, based on the empirical distribution of the observations. Our tests are non-parametric, easy to implement and are proved to be asymptotically of the desired level and consistent. We propose an estimator of the degree of k-monotonicity of the distribution based on the non-parametric goodness-of-fit tests. We apply our work to the estimation of the total number of classes in a population. A large simulation study allows to assess the performances of our procedures.

Research paper thumbnail of Hybridization within Saccharomyces Genus Results in Homoeostasis and Phenotypic Novelty in Winemaking Conditions

PloS one, 2015

Despite its biotechnological interest, hybridization, which can result in hybrid vigor, has not c... more Despite its biotechnological interest, hybridization, which can result in hybrid vigor, has not commonly been studied or exploited in the yeast genus. From a diallel design including 55 intra- and interspecific hybrids between Saccharomyces cerevisiae and S. uvarum grown at two temperatures in enological conditions, we analyzed as many as 35 fermentation traits with original statistical and modeling tools. We first showed that, depending on the types of trait - kinetics parameters, life-history traits, enological parameters and aromas -, the sources of variation (strain, temperature and strain * temperature effects) differed in a large extent. Then we compared globally three groups of hybrids and their parents at two growth temperatures: intraspecific hybrids S. cerevisiae * S. cerevisiae, intraspecific hybrids S. uvarum * S. uvarum and interspecific hybrids S. cerevisiae * S. uvarum. We found that hybridization could generate multi-trait phenotypes with improved oenological perform...

Research paper thumbnail of The spatial distribution of mustelidae in france

PLOS ONE, 2015

We estimated the spatial distribution of 6 Mustelidae species in France using the data collected by the French national hunting and wildlife agency under the "small carnivorous species logbooks" program. The 1500 national wildlife protection officers working for this agency spend 80% of their working time traveling in the area over which they have authority. During their travels, they occasionally detect dead or living small and medium-sized carnivorous animals. Between 2002 and 2005, each car operated by this agency was equipped with a logbook in which officers recorded information about the detected animals (species, location, dead or alive, date). In total, more than 30000 dead or living animals were detected during the study period. Because a large number of detected animals in a region could have been the result of a high sampling pressure there, we modeled the number of detected animals as a function of the sampling effort to allow for unbiased estimation of the spe...

Mélange de modèles mixtes: application à l'analyse des appariements de chromosomes chez des haploïdes de colza [Mixture of mixed models: application to the analysis of chromosome pairing in oilseed rape haploids]

Journal de la société française de statistique, vol. 143, no. 1-2 (2002), pp. 147-153. <http://www.numdam.org/item?id=JSFS_2002__143_1-2_147_0> © Société française de statistique, 2002, all rights reserved. Article digitized as part of the program Numérisation de documents anciens mathématiques, http://www.numdam.org/

Dosages radioimmunologiques : quelle analyse statistique ? [Radioimmunoassays: which statistical analysis?]

Reproduction Nutrition Développement, 1984

Estimator selection in the Gaussian setting

Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 2014

We consider the problem of estimating the mean f of a Gaussian vector Y with independent components of common unknown variance σ². Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection F of estimators of f based on Y and, with the same data Y, aim at selecting an estimator among F with the smallest Euclidean risk. No assumptions on the estimators are made and their dependencies with respect to Y may be unknown. We establish a non-asymptotic risk bound for the selected estimator and derive oracle-type inequalities when F consists of linear estimators. As particular cases, our approach handles the problems of aggregation and model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. In all these cases but aggregation, the method can be easily implemented. For illustration, we carry out two simulation studies. One aims at comparing our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.
Résumé (translated from the French). We present a new estimator selection procedure for estimating the expectation f of a vector Y of n independent Gaussian variables with unknown variance. We propose to choose an estimator of f, with the aim of minimizing the ℓ2 risk, from an arbitrary and possibly infinite collection F of estimators. Both the selection procedure and the collection F depend only on the observations Y. We derive a non-asymptotic risk bound that requires no assumption on the estimators in F and no knowledge of how they depend on Y. We derive oracle-type inequalities when F is a collection of linear estimators. We consider several particular cases: estimation by aggregation, estimation by model selection, choice of a window and of the smoothing parameter when estimating a regression function, and choice of the regularization parameter in a penalized criterion. In all these particular cases, except for aggregation methods, the method is very easy to program. By way of illustration, we show simulation results with two objectives: to compare our method with cross-validation, and to show how to implement it for variable selection.

Statistical Tools for Nonlinear Regression: A Practical Guide with S-PLUS Examples

Journal of the American Statistical Association, 1997

Production managed by Natalie Johnson; manufacturing supervised by Jeffrey Taub. Camera-ready copy prepared using the authors' LaTeX files.

Model selection for estimating the non zero components of a Gaussian vector

ESAIM: Probability and Statistics, 2006

We propose a method, based on a penalised likelihood criterion, for estimating the number of non-zero components of the mean of a Gaussian vector. Following the work of Birgé and Massart in Gaussian model selection, we choose the penalty function such that the resulting estimator minimises the Kullback risk.

Bootstrap Inference in Semiparametric Generalized Additive Models

Econometric Theory, 2004

Here, G is a known link, α and β are unknown parameters, and m_1, ..., m_d are unknown (smooth) functions of possibly higher-dimensional covariates T_1, ..., T_d. Estimates of m_1, ..., m_d, α and β are presented, and asymptotic distribution theory for both the nonparametric and the parametric part is given. The main focus is the application of bootstrap methods. It is shown that the bootstrap can be used for bias correction, hypothesis testing (e.g. component-wise analysis) and the construction of uniform confidence bands. Various bootstrap tests for model specification and parametrization are given, in particular for testing additivity and link function specification. The practical performance of our methods is illustrated in simulations and in an application to East-West German migration.

Gaussian model selection with an unknown variance

The Annals of Statistics, 2009

Let Y be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean μ of Y by model selection. More precisely, we start with a collection S = {S_m, m ∈ M} of linear subspaces of R^n and associate with each of these the least-squares estimator of μ on S_m. Then, we use a data-driven penalized criterion in order to select one estimator among these. Our first objective is to analyze the performance of estimators associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection S and the sample size. Then we apply those to solve various statistical problems such as variable selection, change-point detection and signal estimation, among others. Our results are based on a nonasymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.
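
As a rough illustration of selecting one least-squares estimator with a penalized criterion when the variance is unknown, the sketch below uses a deliberately simple setting that is an assumption of this example, not the paper's: nested models spanned by the first D coordinates, so the residual sum of squares is a tail sum of squares, and the textbook log-likelihood forms of AIC and BIC, n log(RSS_D / n) + pen(D). The function names, the simulated mean vector and the cap on the dimension are all hypothetical, and the penalties proposed in the paper are not reproduced here.

```python
import math
import random


def select_dimension(y, penalty):
    """Return the dimension D minimising n * log(RSS_D / n) + penalty(D, n).

    Model D is spanned by the first D coordinates (an assumption of this
    sketch), so RSS_D is the sum of y[i] ** 2 over i >= D.
    """
    n = len(y)
    # rss[D] holds the tail sum of squares from index D onwards.
    rss = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        rss[i] = rss[i + 1] + y[i] ** 2
    best_dim, best_crit = None, math.inf
    # Cap the dimension at n // 2 so that RSS_D stays well away from zero.
    for dim in range(n // 2 + 1):
        crit = n * math.log(rss[dim] / n) + penalty(dim, n)
        if crit < best_crit:
            best_dim, best_crit = dim, crit
    return best_dim


def aic(dim, n):
    """Textbook AIC penalty: twice the number of parameters."""
    return 2.0 * dim


def bic(dim, n):
    """Textbook BIC penalty: number of parameters times log(n)."""
    return dim * math.log(n)


# Simulated data: three large non-zero mean components, unit-variance noise.
rng = random.Random(0)
n = 60
mu = [10.0, -8.0, 6.0] + [0.0] * (n - 3)
y = [m + rng.gauss(0.0, 1.0) for m in mu]

dim_aic = select_dimension(y, aic)
dim_bic = select_dimension(y, bic)
print("AIC selects dimension", dim_aic, "and BIC selects dimension", dim_bic)
```

Because the BIC penalty grows faster in D than the AIC penalty once log(n) exceeds 2, BIC never selects a larger nested model than AIC in this setting.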

Adaptive tests of linear hypotheses by model selection

The Annals of Statistics, 2003

We propose a new test, based on model selection methods, for testing that the expectation of a Gaussian vector with n independent components belongs to a linear subspace of R^n against a nonparametric alternative. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. The properties of the test are nonasymptotic and we prove that the test is rate optimal [up to a possible log(n) factor] over various classes of alternatives simultaneously. We also provide a simulation study in order to evaluate the procedure when the purpose is to test goodness-of-fit in a regression model.

Methodology for choosing a model for wheat kernel growth

Agronomie, 1999

Estimation of a convex discrete distribution

arXiv: Methodology, 2012

Non-parametric estimation of a convex discrete distribution may be of interest in several applications, such as the estimation of species abundance distribution in ecology. In this paper we study the least squares estimator of a discrete distribution under the constraint of convexity. We show that this estimator exists and is unique, and that it always outperforms the classical empirical estimator in terms of the ℓ2-distance. We provide an algorithm for its computation, based on the support reduction algorithm. We compare its performance to that of the empirical estimator, on the basis of a simulation study.
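
To make the two ingredients of this abstract concrete, the sketch below computes the classical empirical estimator, checks the convexity constraint, and evaluates the ℓ2-distance used for comparison. It assumes a distribution supported on the non-negative integers and the usual second-difference definition of convexity, p(j-1) - 2 p(j) + p(j+1) >= 0 for j >= 1. It does not implement the constrained least squares estimator or the support reduction algorithm, and all function names are hypothetical.

```python
import math
from collections import Counter


def empirical_pmf(sample):
    """Empirical frequencies on {0, ..., max(sample)}."""
    counts = Counter(sample)
    n = len(sample)
    return [counts.get(j, 0) / n for j in range(max(sample) + 1)]


def is_convex(p, tol=1e-12):
    """Check p(j-1) - 2 p(j) + p(j+1) >= 0 for every j >= 1.

    The vector is padded with zeros so that the constraint is also checked
    at the right end of the support.
    """
    q = list(p) + [0.0, 0.0]
    return all(
        q[j - 1] - 2.0 * q[j] + q[j + 1] >= -tol
        for j in range(1, len(q) - 1)
    )


def l2_distance(p, q):
    """Euclidean distance between two pmfs, padding the shorter with zeros."""
    m = max(len(p), len(q))
    p = list(p) + [0.0] * (m - len(p))
    q = list(q) + [0.0] * (m - len(q))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A triangular pmf is convex; an empirical pmf from a small sample need not be.
triangular = [0.4, 0.3, 0.2, 0.1]
p_hat = empirical_pmf([0, 0, 1, 2, 2, 3])
print(is_convex(triangular), is_convex(p_hat), l2_distance(triangular, p_hat))
```

The empirical frequencies of a small sample can violate the constraint even when the underlying distribution is convex, which is the situation a constrained estimator is designed to address.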