GJ Babu - Academia.edu

Papers by GJ Babu

C. R. Rao (1920–) Celebrates His 101st Birthday

Notices of the American Mathematical Society

was a big event in the department (Stanford Statistics), not just for me." And Donald Rubin, Harvard University, writes in [1] that "Despite his dominant reputation, Rao always seemed to be extremely modest and, moreover, helpful to younger colleagues." Efron further writes in [1] that "the 25-year-old Rao in 1945 [2] introduced differential geometry into statistical inference, opening up the burgeoning field now called information geometry." Rao distances combined with conformal mappings are seen in modern applications such as virtual tourism [4,5]. Shun-ichi Amari, Tokyo University, writes in [1] that "[C. R.] Rao's initiation of information geometry is one of the many achievements for which he was awarded the US National Medal of Science. Information geometry has grown to become an important tool not only in statistics but also in artificial intelligence, data science, signal processing, physics, and many other fields since it elucidates the fundamental structure of the manifold of probabilities." This tribute to Professor C. R. Rao in the Notices is to celebrate his 101st birthday, and it can be treated as a sequel to the two tributes to him by two groups of renowned statisticians and mathematicians published during the centenary. See B. Efron et al. [1] and Prakasa Rao et al. [3].

A statistical model for the relation between exoplanets and their host stars

arXiv: Data Analysis, Statistics and Probability, 2009

A general model is proposed to explain, through statistical techniques, the relation between the extrasolar planets (or exoplanets) detected until June 2008 and the main characteristics of their host stars. The main goal is to establish a mathematical relation among the set of variables that best describe the physical characteristics of the host star and the planet itself. The host star is characterized by its distance, age, effective temperature, mass, metallicity, radius and magnitude. The exoplanet is described through its physical parameters (radius and mass) and its orbital parameters (distance, period, eccentricity, inclination and semi-major axis). As a first approach we consider that only the mass of the exoplanet is determined by the physical properties of its host star. The proposed model is then validated through statistical analysis. Finally we discuss the categorical behavior of the dependent variable through binary models.

Statistical Methodology for Large Astronomical Surveys

Symposium - International Astronomical Union, 1998

Multiwavelength surveys present a variety of challenging statistical problems: raw data processing, source identification, source characterization and classification, and interrelations between multiwavelength properties. For these last two issues, we discuss the applicability of standard and new multivariate statistical techniques. Traditional methods such as ANOVA, principal components analysis, cluster analysis, and tests for multivariate linear hypotheses are underutilized in astronomy and can be very helpful. Newer statistical methods such as projection pursuit, multivariate splines, and visualization tools such as XGobi are briefly introduced. However, multivariate databases from astronomical surveys present significant challenges to the statistical community. These include treatments of heteroscedastic measurement errors, censoring and truncation due to flux limits, and parameter estimation for nonlinear astrophysical models.

20 Data-based sampling and model-based estimation for environmental resources

Handbook of Statistics, 1988

Edgeworth Expansions: A Brief Review of Zhidong Bai's Contributions

Advances in Statistics - Proceedings of the Conference in Honor of Professor Zhidong Bai on His 65th Birthday, 2008

Professor Bai's contributions to Edgeworth expansions are reviewed. The author's collaborations with Professor Bai on the topic are also discussed.

The Spectrum of LSST Data Analysis Challenges: Kiloscale to Petascale

The unprecedented science opportunities enabled by LSST's wide-fast-deep mode of operation are accompanied by equally unprecedented data analysis challenges, due to the huge size and synoptic scope of LSST data products. The most obvious challenges are those associated with processing the petabyte-scale fundamental LSST image data. But the challenges will not end with the production of official LSST catalogs and databases. Science with LSST data will present new data analysis challenges spanning a broad range of sizes, types, and complexity, requiring innovative methodological research across this full range. We present representative examples of LSST data analysis problems of various scales, displaying some of the diversity of astroinformatics/astrostatistics research that astronomers using LSST data must undertake.

Linear Regression: Which Method should be used?

Statistical Challenges in Modern Astronomy

Technometrics, 1994

Despite centuries of close association, statistics and astronomy are surprisingly distant today. Most observational astronomical research relies on an inadequate toolbox of methodological tools. Yet the needs are substantial: astronomy encounters sophisticated problems involving sampling theory, survival analysis, multivariate classification and analysis, time series analysis, wavelet analysis, spatial point processes, nonlinear regression, bootstrap resampling and model selection. We review the recent resurgence of astrostatistical research, and outline new challenges raised by the emerging Virtual Observatory. Our essay ends with a list of research challenges and infrastructure for astrostatistics in the coming decade.

Multivariate Permutation Tests

Bootstrap confidence intervals

Statistics & Probability Letters, 1988

Nonparametric confidence bounds are obtained for a wide class of statistics using bootstrap. These results improve the errors in the probability estimates of the confidence intervals over the ones obtained by the normal approximation theory unconditionally.

Accuracy of the bootstrap approximation

Probability Theory and Related Fields, 1991

The sampling distributions of several commonly occurring statistics are known to be closer to the corresponding bootstrap distribution than to the normal distribution, under some conditions on the moments and the smoothness of the population distribution. These conditional approximations are suggestive of the unconditional ones considered in this paper, though one cannot be derived from the other by elementary methods.

Astrostatistics

Bayesian model selection and extrasolar planet detection

The discovery of nearly 200 extrasolar planets during the last decade has revitalized scientific interest in the physics of planet formation and ushered in a new era for astronomy. Astronomers searching for the small signals induced by planets inevitably face significant statistical challenges. For example, radial velocity (RV) planet searches (that have discovered most of the known planets) are increasingly finding planets with small velocity amplitudes, with long orbital periods, or in multiple planet systems. Bayesian inference has the potential to improve the interpretation of existing observations, the planning of future observations and ultimately inferences concerning the overall population of planets. The main obstacle to applying Bayesian inference to extrasolar planet searches is the need to develop computationally efficient algorithms for calculating integrals over high-dimensional parameter spaces. In recent years, the refinement of Markov chain Monte Carlo (MCMC) algorithms has made it practical to accurately characterize orbital parameters and their uncertainties from RV observations of single-planet and weakly interacting multiple-planet systems. Unfortunately, MCMC is not sufficient for Bayesian model selection, i.e., comparing the marginal posterior probability of models with different parameters, as is necessary to determine how strongly the observational data favor a model with n + 1 planets over a model with just n planets. Many of the obvious estimators for the marginal posterior probability suffer from poor convergence properties. We compare several estimators of the marginal likelihood and feature those that display desirable convergence properties based on the analysis of a sample data set for HD 88133b (Fischer et al. 2005). We find that methods based on importance sampling are most efficient, provided that a good analytic approximation of the posterior probability distribution is available. We present a simple algorithm for using a sample from the posterior to construct a mixture distribution that approximates the posterior and can be used for importance sampling and Bayesian model selection. We conclude with some suggestions for the development and refinement of computationally efficient and robust estimators of marginal posterior probabilities.

Implementing Astrostatistics in the Virtual Observatory

Data analysis in a Virtual Observatory will require a broad range of applied statistical tools. Towards that end we have been exploring a few software models which will allow astronomers to transparently apply different statistical techniques to large datasets. This includes a prototype web-based service, which will be described in this poster. The backbone of the service will be a combination of existing and well-tested computer languages/packages like C, IDL, PERL and R; but it will be flexible enough to allow users to use their own favourite packages as well. We will demonstrate our pipeline and discuss how it fits in well with the evolving VO architecture. This work is produced by an interdisciplinary (computational astrostatistics) collaborative effort supported by the NSF FRG grant DMS-0101360.

Doing Science with VOStat

Statistical Methodology for the National Virtual Observatory
