Jay Breidt - Profile on Academia.edu

Papers by Jay Breidt

Improved variance estimation for balanced samples drawn via the cube method

Journal of Statistical Planning and Inference, 2011

The cube method enables the selection of balanced samples: that is, samples such that the Horvitz-Thompson estimators of auxiliary variables match the known totals of those variables. As an exact balanced sampling design often does not exist, the cube method generally proceeds in two steps: a "flight phase" in which exact balance is maintained, and a "landing phase" in which the final sample is selected while respecting the balance conditions as closely as possible. Deville and Tillé (2005) derive a variance approximation for balanced sampling that accounts for the flight phase only, whereas the landing phase can add non-negligible variance. This paper uses a martingale difference representation of the cube method to construct an efficient simulation-based method for calculating approximate second-order inclusion probabilities. The approximation enables nearly unbiased variance estimation, where the bias is primarily due to the limited number of simulations. In a Monte Carlo study, the proposed method has significantly less bias than the standard variance estimator, leading to improved confidence interval coverage.
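
As a rough sketch of the simulation idea (not the paper's cube-method implementation), the following estimates first- and second-order inclusion probabilities by repeated draws from a stand-in fixed-size design and plugs them into the Sen-Yates-Grundy variance estimator for the Horvitz-Thompson total; the design, sizes, and data are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, R = 8, 4, 20000          # population size, sample size, number of simulated draws

# Estimate first- and second-order inclusion probabilities by simulation.
pi1 = np.zeros(N)
pi2 = np.zeros((N, N))
for _ in range(R):
    s = rng.choice(N, size=n, replace=False)   # stand-in design; the paper draws via the cube method
    ind = np.zeros(N)
    ind[s] = 1.0
    pi1 += ind
    pi2 += np.outer(ind, ind)
pi1 /= R
pi2 /= R

# Sen-Yates-Grundy variance estimator for the Horvitz-Thompson total,
# computed over one realized sample using the simulated inclusion probabilities.
y = rng.normal(10.0, 2.0, size=N)
s = rng.choice(N, size=n, replace=False)
var = 0.0
for k, i in enumerate(s):
    for j in s[k + 1:]:
        delta = pi1[i] * pi1[j] - pi2[i, j]
        var += delta / pi2[i, j] * (y[i] / pi1[i] - y[j] / pi1[j]) ** 2
```

For the fixed-size stand-in design every unit has inclusion probability n/N, so the simulated `pi1` can be checked against 0.5 directly.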

The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference

Bayesian Analysis, 2017

The general projected normal distribution is a simple and intuitive model for directional data in any dimension: a multivariate normal random vector divided by its length is the projection of that vector onto the surface of the unit hypersphere. Observed data consist of the projections, but not the lengths. Inference for this model has been restricted to the two-dimensional (circular) case, using Bayesian methods with data augmentation to generate the latent lengths and a Metropolis-within-Gibbs algorithm to sample from the posterior. We describe a new parameterization of the general projected normal distribution that makes inference in any dimension tractable, including the important three-dimensional (spherical) case, which has not previously been considered. Under this new parameterization, the full conditionals of the unknown parameters have closed forms, and we propose a new slice sampler to draw the latent lengths without the need for rejection. Gibbs sampling with this new scheme is fast and easy, leading to improved Bayesian inference; for example, it is now feasible to conduct model selection among complex mixture and regression models for large data sets. Our parameterization also allows straightforward incorporation of covariates into the covariance matrix of the multivariate normal, increasing the ability of the model to explain directional data as a function of independent regressors. Circular and spherical cases are considered in detail and illustrated with scientific applications. For the circular case, seasonal variation in time-of-day departures of anglers from recreational fishing sites is modeled using covariates in both the mean vector and covariance matrix. For the spherical case, we consider paired angles that describe the relative positions of carbon atoms along the backbone chain of a protein. We fit mixtures of general projected normals to these data, with the best-fitting mixture accurately describing biologically meaningful structures including helices, β-sheets, and coils and turns. Finally, we show via simulation that our methodology has satisfactory performance in some 10-dimensional and 50-dimensional problems.
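
The defining construction is easy to sketch: draw a multivariate normal vector and normalize it to unit length. A minimal sampler, with illustrative mean vectors and identity covariances:

```python
import numpy as np

rng = np.random.default_rng(1)

def rprojnormal(n, mu, Sigma, rng=rng):
    """Draw n directions from the general projected normal PN(mu, Sigma):
    sample X ~ N(mu, Sigma) and project onto the unit hypersphere."""
    X = rng.multivariate_normal(mu, Sigma, size=n)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Circular (d = 2) and spherical (d = 3) examples with illustrative parameters
u2 = rprojnormal(1000, np.array([2.0, 1.0]), np.eye(2))
u3 = rprojnormal(1000, np.array([1.0, 0.0, 0.5]), np.eye(3))
```

Every row of `u2` and `u3` lies on the unit circle or sphere, which is the "projections but not lengths" observation model the abstract describes.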

Spline Estimators of the Density Function of a Variable Measured with Error

Communications in Statistics - Simulation and Computation, Jan 4, 2003

The estimation of the distribution function of a random variable X measured with error is studied. It is assumed that the measurement error has a normal distribution with known parameters. Let the i-th observation on X be denoted by Yi = Xi + εi, where εi is the ...

Modeling nitrous oxide mitigation potential of enhanced efficiency nitrogen fertilizers from agricultural systems

Science of The Total Environment, Dec 1, 2021

Agricultural soils are responsible for a large proportion of global nitrous oxide (N2O) emissions; N2O is a potent greenhouse gas and ozone-depleting substance. Enhanced-efficiency nitrogen (N) fertilizers (EENFs) can reduce N2O emissions from N-fertilized soils, but their effect varies considerably due to a combination of factors, including climatic conditions, edaphic characteristics, and management practices. In this study, we further developed the DayCent ecosystem model to simulate two EENFs, controlled-release N fertilizers (CRNFs) and nitrification inhibitors (NIs), and evaluated their N2O mitigation potentials. We implemented a Bayesian calibration method using the sampling importance resampling (SIR) algorithm to derive a joint posterior distribution of model parameters, informed by N2O flux measurements from corn production systems at a network of experimental sites within the GRACEnet program. The joint posterior distribution can be applied to predict N2O reduction factors when EENFs are adopted in place of conventional urea-based N fertilizer. The resulting median reduction factors were -11.9% for CRNFs (ranging from -51.7% to 0.58%) and -26.7% for NIs (ranging from -61.8% to 3.1%), comparable to the measured reduction factors in the dataset. By incorporating EENFs, the DayCent ecosystem model can simulate a broader suite of options to identify best management practices for reducing N2O emissions.
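
The reweight-then-resample scheme named in the abstract can be sketched on a toy problem: draws from a diffuse prior are weighted by the likelihood of observed data, then resampled to approximate the posterior. The Gaussian model and all numbers below are illustrative, not the DayCent calibration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sampling importance resampling (SIR) on a toy one-parameter model.
obs = rng.normal(1.5, 0.3, size=25)                 # synthetic "flux measurements"

theta = rng.normal(0.0, 2.0, size=50_000)           # draws from a diffuse prior
loglik = -0.5 * (((obs[None, :] - theta[:, None]) / 0.3) ** 2).sum(axis=1)
w = np.exp(loglik - loglik.max())                   # stabilized importance weights
w /= w.sum()

# Resample with probability proportional to the weights: approximate posterior.
posterior = rng.choice(theta, size=5_000, replace=True, p=w)
```

With a nearly flat prior and 25 observations, the resampled draws concentrate around the sample mean of the data, as the posterior should.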

Pile-up probabilities for the Laplace likelihood estimator of a non-invertible first order moving average

Institute of Mathematical Statistics eBooks, 2006

The first-order moving average model, or MA(1), is given by Xt = Zt − θ0 Zt−1, with independent and identically distributed {Zt}. This is arguably the simplest time series model that one can write down. The MA(1) with unit root (θ0 = 1) arises naturally in a variety of time series applications. For example, if an underlying time series consists of a linear trend plus white noise errors, then the differenced series is an MA(1) with unit root. In such cases, testing for a unit root of the differenced series is equivalent to testing the adequacy of the trend plus noise model. The unit root problem also arises naturally in a signal plus noise model in which the signal is modeled as a random walk. The differenced series follows an MA(1) model and has a unit root if and only if the random walk signal is in fact a constant. The asymptotic theory of various estimators based on Gaussian likelihood has been developed for the unit root case and the nearly unit root case (θ = 1 + β/n, β ≤ 0). Unlike standard 1/√n-asymptotics, these estimation procedures have 1/n-asymptotics and a so-called pile-up effect, in which P(θ̂ = 1) converges to a positive value. One explanation for this pile-up phenomenon is the lack of identifiability of θ in the Gaussian case: the Gaussian likelihood has the same value for the two sets of parameter values (θ, σ²) and (1/θ, θ²σ²). It follows that θ = 1 is always a critical point of the likelihood function. In contrast, for non-Gaussian noise, θ is identifiable for all real values. Hence it is no longer clear whether the same pile-up phenomenon will persist in the non-Gaussian case. In this paper, we focus on limiting pile-up probabilities for estimates of θ0 based on a Laplace likelihood. In some cases, these estimates can be viewed as least absolute deviation (LAD) estimates. Simulation results illustrate the limit theory.
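
A unit-root MA(1) with Laplace innovations and a conditional LAD criterion can be sketched as follows. The recursion conditions on z0 = 0, which is a simplification of the paper's treatment, and the grid and sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate an MA(1) with a unit root, X_t = Z_t - Z_{t-1}, Laplace innovations.
n = 200
z = rng.laplace(0.0, 1.0, size=n + 1)
x = z[1:] - z[:-1]

def lad_criterion(theta, x):
    """Sum of absolute residuals from the recursion z_t = x_t + theta * z_{t-1},
    conditioning on z_0 = 0 (an illustrative simplification)."""
    zt, total = 0.0, 0.0
    for xt in x:
        zt = xt + theta * zt
        total += abs(zt)
    return total

# Grid-minimize the LAD criterion over [0.5, 1]; pile-up corresponds to the
# minimizer landing exactly on the boundary theta = 1 with positive probability.
grid = np.linspace(0.5, 1.0, 51)
theta_hat = grid[np.argmin([lad_criterion(t, x) for t in grid])]
```

Repeating this over many simulated series and recording how often `theta_hat == 1.0` gives a Monte Carlo estimate of the pile-up probability the paper studies.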

Exact Maximum Likelihood Estimation for Non-Gaussian Moving Averages

A procedure for computing exact maximum likelihood estimates (MLEs) is proposed for non-Gaussian moving average (MA) processes. By augmenting the data with appropriate latent variables, a joint likelihood can be explicitly expressed based on the observed data and the latent variables. The exact MLE can then be obtained numerically by the EM algorithm. Two alternative likelihood-based methods are also proposed using different treatments of the latent variables. These approximate MLEs are shown to be asymptotically equivalent to the exact MLE. In simulations, the exact MLE obtained by EM performs better than other likelihood-based estimators, including another approximate MLE due to Lii and Rosenblatt (1992). The exact MLE has a smaller root mean square error in small samples for various non-Gaussian MA processes, particularly in the non-invertible cases.

Developing a Long-Term Monitoring Network for Monitoring Soil Carbon Stocks in the U.S

AGU Fall Meeting Abstracts, Dec 1, 2006

Monitoring carbon pools is important for evaluating modeled results, particularly for the slower dynamics of soil C pools. Thus, a national, long-term soil monitoring network is being established on cropland and grazing lands in the U.S. Data from this network will serve as an independent evaluation for model-based estimates of soil carbon stocks. The soil monitoring network will consist of permanently marked sample sites associated with the USDA-NRCS National Resources Inventory (NRI), with an anticipated resampling frequency of 5-10 years. Sample stratification has been designed to minimize variance associated with soil carbon stock changes for similar land use and climate-soil combinations, with strata largely based on the USDA's Major Land Resource Areas (MLRAs). The national sample has been distributed to minimize the standard error of prediction based on a Neyman allocation, with the goal of detecting soil carbon stock changes at regional and national scales over 5-10 year time periods. A pilot phase is currently underway to establish the network. Site-scale sampling uses a triangular design to minimize within-site variance. During the first sample, soils are collected at the corners of the triangle, and repeated sampling will then occur along the sides until reaching the next corner, over about a 100-year time horizon. Three triangles are being sampled at each NRI point. A formal uncertainty analysis of modeled results will be conducted using initial carbon data measured at ~35 sites in the Mid-Continent Region of the U.S., comparing modeled data to measurements. In addition, we are incorporating remotely sensed image data into the analysis to further constrain the uncertainty in soil carbon estimates provided by ecosystem models. The soil carbon monitoring network will provide much-needed model evaluation data at a broad scale, which will improve the quantification of carbon sources and sinks.
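
The Neyman allocation mentioned above assigns stratum sample sizes proportional to N_h S_h, the stratum size times the stratum standard deviation. A minimal sketch with illustrative strata:

```python
import numpy as np

# Neyman allocation: distribute a total sample of size n across strata in
# proportion to N_h * S_h (stratum size times stratum standard deviation).
N_h = np.array([500, 300, 200])      # stratum sizes (illustrative)
S_h = np.array([4.0, 2.0, 1.0])      # stratum standard deviations (illustrative)
n = 100

weights = N_h * S_h
n_h = n * weights / weights.sum()    # allocated (non-integer) sample sizes
```

Larger, more variable strata receive more of the sample, which is what minimizes the variance of the stratified estimator for a fixed total sample size.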

Estimation of fish consumption rates based on a creel angler survey of an urban river in New Jersey, USA

Human and Ecological Risk Assessment, Aug 26, 2019

A one-year angler intercept survey was conducted on the lower 17 miles of the Passaic River, an urban, industrialized river that flows through Newark, New Jersey. The purpose of the survey was to collect data about anglers' behaviors and fish consumption habits in order to calculate exposure factors for a human health risk assessment of the Study Area. This paper focuses on estimating site-specific fish consumption rates for LPRSA anglers who consume their catch. The study design included on-site interviews and counts (angler enumeration). Forty survey locations were included in the stratified random sampling plan; interviews were conducted on 136 days and counts on 164 days. After matching intercepts with the same angler, a total of 294 anglers were interviewed, of whom 25 reported consuming their catch. LPRSA fishing trips ranged from 2 to nearly 50 annual trips for anglers who reported consuming their catch. Species caught and reported to be consumed included carp, catfish, white perch, smallmouth bass, and eel. The estimated mean and 90th percentile consumption rates for the population of consuming anglers are 5.0 and 8.8 g/day, respectively. Based on sensitivity analyses, the 90th percentile fish consumption rates range from approximately 4 to 18 g/day.

Using Machine Learning Methods to Improve the Representation of Management Activity Data in the US Soil Greenhouse Gas Emissions Inventory

Estimación del Total Poblacional Usando el Estimador de Diferencia (Estimation of the Population Total Using the Difference Estimator)

Revista Colombiana de Estadistica, 2009

This paper presents a new regression estimator for the total of a population, created by minimizing a measure of dispersion based on Wilcoxon scores. A particular nonparametric model is considered in order to obtain a model-assisted estimator by means of the generalized difference estimator. First, an estimator of the vector of regression coefficients for the finite population is presented; then, using the generalized difference principle, an estimator for the total of a population is proposed. The accuracy and efficiency of the estimators, in terms of design bias and mean square error, are studied through simulation experiments.
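
The generalized difference estimator referred to above sums model predictions over the whole population and corrects with design-weighted sample residuals. A sketch using an ordinary least-squares working model and simple random sampling without replacement, both illustrative stand-ins for the paper's Wilcoxon-score fit:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic finite population with a linear relationship (illustrative).
N, n = 1000, 100
x = rng.uniform(0, 10, size=N)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=N)

# SRSWOR sample; every unit's inclusion probability is n/N.
s = rng.choice(N, size=n, replace=False)
pi = n / N

# Fit a working model on the sample, predict for the whole population.
b1, b0 = np.polyfit(x[s], y[s], 1)
yhat = b0 + b1 * x

# Generalized difference estimator: population-sum of predictions plus
# the Horvitz-Thompson correction of the sample residuals.
t_diff = yhat.sum() + np.sum((y[s] - yhat[s]) / pi)
```

Because the correction term makes the estimator design-unbiased regardless of the working model, `t_diff` tracks the true total closely when the model fits well.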

Model-Assisted Survey Regression Estimation with the Lasso

Journal of Survey Statistics and Methodology, 2017

Asymptotics for the maximum sample likelihood estimator under informative selection from a finite population

Hierarchical Bayesian small area estimation for circular data

Canadian Journal of Statistics, 2016

We consider small area estimation for the departure times of recreational anglers along the Atlantic and Gulf coasts of the United States. A Bayesian area-level Fay-Herriot model is considered to obtain estimates of the departure time distribution functions. The departure distribution functions are modelled as circular distributions plus area-specific errors. The circular distributions are modelled as projected normal, and a regression model is specified to borrow information across domains. Estimation is conducted through the use of a Hamiltonian Monte Carlo sampler and a projective approach onto the probability simplex.

Best mean square prediction for moving averages

Statistica Sinica

Best mean square prediction for moving average time series models is generally non-linear prediction, even in the invertible case. Gaussian processes are an exception, since best linear prediction is always best mean square prediction. Stable numerical recursions are proposed for computation of residuals and evaluation of unnormalized conditional distributions in invertible or non-invertible moving average models, including those with distinct unit roots. The conditional distributions allow evaluation of the best mean square predictor via computation of a low-dimensional integral. For finite, discrete innovations, the method yields best mean square predictors exactly. For continuous innovations, an importance sampling scheme is proposed for numerical approximation of the best mean square predictor and its prediction mean square error. In numerical experiments, the method accurately computes best mean square predictors for cases with known solutions. The approximate best mean square...

Autocovariance structures for radial averages in small-angle X-ray scattering experiments

Journal of Time Series Analysis, 2012

Small-angle X-ray scattering (SAXS) is a technique for obtaining low-resolution structural information about biological macromolecules, by exposing a dilute solution to a high-intensity X-ray beam and capturing the resulting scattering pattern on a two-dimensional detector. The two-dimensional pattern is reduced to a one-dimensional curve through radial averaging, that is, by averaging across annuli on the detector plane. Subsequent analysis of structure relies on these one-dimensional data. This article reviews the technique of SAXS and investigates autocorrelation structure in the detector plane and in the radial averages. Across a range of experimental conditions and molecular types, spatial autocorrelation in the detector plane is present and is well-described by a stationary kernel convolution model. The corresponding autocorrelation structure for the radial averages is non-stationary. Implications of the autocorrelation structure for inference about macromolecular structure ar...
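
Radial averaging itself is straightforward to sketch: bin detector pixels by integer distance from the beam center and average within each annulus. The image, beam center, and count level below are illustrative:

```python
import numpy as np

# Radial averaging: reduce a 2-D detector image to a 1-D intensity curve by
# averaging pixel values over annuli centered on the beam position.
img = np.random.default_rng(5).poisson(50.0, size=(128, 128)).astype(float)
cy, cx = 63.5, 63.5                      # beam center (illustrative)

yy, xx = np.indices(img.shape)
r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2).astype(int)   # annulus index per pixel

sums = np.bincount(r.ravel(), weights=img.ravel())
counts = np.bincount(r.ravel())
radial_avg = sums / counts               # mean intensity in each annulus
```

The inner annuli contain only a handful of pixels while the outer ones contain hundreds, which is one source of the non-stationary variance structure the paper analyzes in the radial averages.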

The detection and estimation of long memory in stochastic volatility

Journal of Econometrics, 1998

We propose a new time series representation of persistence in conditional variance called a long memory stochastic volatility (LMSV) model. The LMSV model is constructed by incorporating an ARFIMA process in a standard stochastic volatility scheme. Strongly consistent estimators of the parameters of the model are obtained by maximizing the spectral approximation to the Gaussian likelihood. The finite sample properties of the spectral likelihood estimator are analyzed by means of a Monte Carlo study. An empirical example with a long time series of stock prices demonstrates the superiority of the LMSV model over existing (short-memory) volatility models.

Simulation Estimation of Quantiles From a Distribution With Known Mean

Journal of Computational and Graphical Statistics, 2004

It is common in practice to estimate the quantiles of a complicated distribution by using the order statistics of a simulated sample. If the distribution of interest has a known population mean, then it is often possible to substantially improve the mean square error of the standard quantile estimator through the simple device of mean-correction: subtract off the sample mean and add on the known population mean. Asymptotic results for the mean-corrected quantile estimator are derived and compared to the standard sample quantile. Simulation results for a variety of distributions and processes illustrate the asymptotic theory. Applications to Markov chain Monte Carlo and to simulation-based uncertainty analysis are described.
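
The mean-correction device can be sketched in a few lines; the exponential example, whose population mean is known to be 1, is illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def mean_corrected_quantile(sample, q, known_mean):
    """Quantile estimate with mean-correction: subtract the sample mean,
    add the known population mean, then take the sample quantile."""
    return np.quantile(sample - sample.mean() + known_mean, q)

# Exp(1) sample: population mean is 1, true 90th percentile is ln(10).
x = rng.exponential(1.0, size=500)
q90_std = np.quantile(x, 0.9)                            # standard sample quantile
q90_mc = mean_corrected_quantile(x, 0.9, known_mean=1.0)  # mean-corrected version
```

Shifting by the known mean removes the sampling variability of the sample mean from the estimate, which is where the mean-square-error gain comes from.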

Sampling Schemes for Policy Analyses Using Computer Simulation Experiments

Environmental Management, 1998

A constrained least-squares approach to combine bottom-up and top-down CO2 flux estimates

Environmental and Ecological Statistics, 2012

Terrestrial CO2 flux estimates are obtained from two fundamentally different methods, generally termed bottom-up and top-down approaches. Inventory methods are one type of bottom-up approach, using various sources of information such as crop production surveys and forest monitoring data to estimate the annual CO2 flux at locations covering a study region. Top-down approaches are various types of atmospheric inversion methods, which use CO2 concentration measurements from monitoring towers and atmospheric transport models to estimate CO2 flux over a study region. Both methods can also quantify the uncertainty associated with their estimates. Historically, the two approaches have produced estimates that differ considerably. The goal of this work is to construct a statistical model that sensibly combines estimates from the two approaches to produce a new estimate of CO2 flux for our study region. The two approaches have complementary strengths and weaknesses, and our results show that certain aspects of the uncertainty associated with each approach are greatly reduced by combining the methods. Our model is purposefully simple and designed to take the two approaches' estimates and measures of uncertainty at 'face value'. Specifically, we use a constrained least-squares approach to weight the estimates by the inverse of their variance, and the constraint
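
The inverse-variance weighting at the heart of the combination can be sketched in the unconstrained two-estimate case; the flux values and variances below are illustrative:

```python
import numpy as np

# Combine two flux estimates by weighting each with the inverse of its variance;
# the combined variance is smaller than either input variance.
est = np.array([-120.0, -80.0])          # bottom-up and top-down estimates (illustrative)
var = np.array([400.0, 900.0])           # their variances (illustrative)

w = (1.0 / var) / np.sum(1.0 / var)      # inverse-variance weights, summing to 1
combined = np.sum(w * est)
combined_var = 1.0 / np.sum(1.0 / var)
```

The more precise estimate gets the larger weight, and the combined variance is strictly below the smaller of the two input variances, which is the uncertainty reduction the abstract describes.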

Research paper thumbnail of Improved variance estimation for balanced samples drawn via the cube method

Journal of Statistical Planning and Inference, 2011

The Cube method proposed by enables the selection of balanced samples : that is, samples such tha... more The Cube method proposed by enables the selection of balanced samples : that is, samples such that the Horvitz-Thompson estimators of auxiliary variables match the known totals of those variables. As an exact balanced sampling design often does not exist, the Cube method generally proceeds in two steps : a "flight phase" in which exact balance is maintained, and a "landing phase" in which the final sample is selected while respecting the balance conditions as closely as possible. Deville and Tillé (2005) derive a variance approximation for balanced sampling that takes account of the flight phase only, whereas the landing phase can prove to add non-negligible variance. This paper uses a martingale difference representation of the cube method to construct an efficient simulationbased method for calculating approximate second-order inclusion probabilities. The approximation enables nearly unbiased variance estimation, where the bias is primarily due to the limited number of simulations. In a Monte Carlo study, the proposed method has significantly less bias than the standard variance estimator, leading to improved confidence interval coverage.

Research paper thumbnail of The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference

Bayesian Analysis, 2017

The general projected normal distribution is a simple and intuitive model for directional data in... more The general projected normal distribution is a simple and intuitive model for directional data in any dimension: a multivariate normal random vector divided by its length is the projection of that vector onto the surface of the unit hypersphere. Observed data consist of the projections, but not the lengths. Inference for this model has been restricted to the two-dimensional (circular) case, using Bayesian methods with data augmentation to generate the latent lengths and a Metropolis-within-Gibbs algorithm to sample from the posterior. We describe a new parameterization of the general projected normal distribution that makes inference in any dimension tractable, including the important three-dimensional (spherical) case, which has not previously been considered. Under this new parameterization, the full conditionals of the unknown parameters have closed forms, and we propose a new slice sampler to draw the latent lengths without the need for rejection. Gibbs sampling with this new scheme is fast and easy, leading to improved Bayesian inference; for example, it is now feasible to conduct model selection among complex mixture and regression models for large data sets. Our parameterization also allows straightforward incorporation of covariates into the covariance matrix of the multivariate normal, increasing the ability of the model to explain directional data as a function of independent regressors. Circular and spherical cases are considered in detail and illustrated with scientific applications. For the circular case, seasonal variation in time-of-day departures of anglers from recreational fishing sites is modeled using covariates in both the mean vector and covariance matrix. For the spherical case, we consider paired angles that describe the relative positions of carbon atoms along the backbone chain of a protein. 
We fit mixtures of general projected normals to these data, with the best-fitting mixture accurately describing biologically meaningful structures including helices, β-sheets, and coils and turns. Finally, we show via simulation that our methodology has satisfactory performance in some 10-dimensional and 50-dimensional problems.

Research paper thumbnail of Spline Estimators of the Density Function of a Variable Measured with Error

Spline Estimators of the Density Function of a Variable Measured with Error

Communications in Statistics - Simulation and Computation, Jan 4, 2003

Abstract The estimation of the distribution function of a random variable X measured with error i... more Abstract The estimation of the distribution function of a random variable X measured with error is studied. It is assumed that the measurement error has a normal distribution with known parameters. Let the i-th observation on X be denoted by Yi= Xi+ εi, where εi is the ...

Research paper thumbnail of Modeling nitrous oxide mitigation potential of enhanced efficiency nitrogen fertilizers from agricultural systems

Modeling nitrous oxide mitigation potential of enhanced efficiency nitrogen fertilizers from agricultural systems

Science of The Total Environment, Dec 1, 2021

Agriculture soils are responsible for a large proportion of global nitrous oxide (N2O) emissions-... more Agriculture soils are responsible for a large proportion of global nitrous oxide (N2O) emissions-a potent greenhouse gas and ozone depleting substance. Enhanced-efficiency nitrogen (N) fertilizers (EENFs) can reduce N2O emission from N-fertilized soils, but their effect varies considerably due to a combination of factors, including climatic conditions, edaphic characteristics and management practices. In this study, we further developed the DayCent ecosystem model to simulate two EENFs: controlled-release N fertilizers (CRNFs) and nitrification inhibitors (NIs) and evaluated their N2O mitigation potentials. We implemented a Bayesian calibration method using the sampling importance resampling (SIR) algorithm to derive a joint posterior distribution of model parameters that was informed by N2O flux measurements from corn production systems a network of experimental sites within the GRACEnet program. The joint posterior distribution can be applied to estimate predictions of N2O reduction factors when EENFs are adopted in place of conventional urea-based N fertilizer. The resulting median reduction factors were - 11.9% for CRNFs (ranging from -51.7% and 0.58%) and - 26.7% for NIs (ranging from -61.8% to 3.1%), which is comparable to the measured reduction factors in the dataset. By incorporating EENFs, the DayCent ecosystem model is able to simulate a broader suite of options to identify best management practices for reducing N2O emissions.

Research paper thumbnail of Pile-up probabilities for the Laplace likelihood estimator of a non-invertible first order moving average

Institute of Mathematical Statistics eBooks, 2006

The first-order moving average model or MA(1) is given by Xt = Zt − θ 0 Z t−1 , with independent ... more The first-order moving average model or MA(1) is given by Xt = Zt − θ 0 Z t−1 , with independent and identically distributed {Zt}. This is arguably the simplest time series model that one can write down. The MA(1) with unit root (θ 0 = 1) arises naturally in a variety of time series applications. For example, if an underlying time series consists of a linear trend plus white noise errors, then the differenced series is an MA(1) with unit root. In such cases, testing for a unit root of the differenced series is equivalent to testing the adequacy of the trend plus noise model. The unit root problem also arises naturally in a signal plus noise model in which the signal is modeled as a random walk. The differenced series follows a MA(1) model and has a unit root if and only if the random walk signal is in fact a constant. The asymptotic theory of various estimators based on Gaussian likelihood has been developed for the unit root case and nearly unit root case (θ = 1+β/n, β ≤ 0). Unlike standard 1/ √ n-asymptotics, these estimation procedures have 1/n-asymptotics and a so-called pileup effect, in which P(θ = 1) converges to a positive value. One explanation for this pileup phenomenon is the lack of identifiability of θ in the Gaussian case. That is, the Gaussian likelihood has the same value for the two sets of parameter values (θ, σ 2) and (1/θ, θ 2 σ 2). It follows that θ = 1 is always a critical point of the likelihood function. In contrast, for non-Gaussian noise, θ is identifiable for all real values. Hence it is no longer clear whether or not the same pileup phenomenon will persist in the non-Gaussian case. In this paper, we focus on limiting pileup probabilities for estimates of θ 0 based on a Laplace likelihood. In some cases, these estimates can be viewed as Least Absolute Deviation (LAD) estimates. Simulation results illustrate the limit theory.

Research paper thumbnail of Exact Maximum Likelihood Estimation for Non-Gaussian Moving Averages

A procedure for computing exact maximum likelihood estimates (MLEs) is proposed for non-Gaussian moving average (MA) processes. By augmenting the data with appropriate latent variables, a joint likelihood can be expressed explicitly in terms of the observed data and the latent variables. The exact MLE can then be obtained numerically by the EM algorithm. Two alternative likelihood-based methods are also proposed using different treatments of the latent variables. These approximate MLEs are shown to be asymptotically equivalent to the exact MLE. In simulations, the exact MLE obtained by EM performs better than other likelihood-based estimators, including another approximate MLE due to Lii and Rosenblatt (1992). The exact MLE has a smaller root mean square error in small samples for various non-Gaussian MA processes, particularly in the non-invertible cases.

Research paper thumbnail of Developing a Long-Term Monitoring Network for Monitoring Soil Carbon Stocks in the U.S

Developing a Long-Term Monitoring Network for Monitoring Soil Carbon Stocks in the U.S

AGU Fall Meeting Abstracts, Dec 1, 2006

Monitoring carbon pools is important for evaluating modeled results, particularly for the slower dynamics of soil C pools. Thus, a national, long-term soil monitoring network is being established on cropland and grazing lands in the U.S. Data from this network will serve as an independent evaluation for model-based estimates of soil carbon stocks. The soil monitoring network will consist of permanently marked sample sites associated with the USDA-NRCS Natural Resource Inventory (NRI), with an anticipated resampling frequency of 5-10 years. Sample stratification has been designed to minimize variance associated with soil carbon stock changes for similar land use and climate-soil combinations, with strata largely based on the USDA's Major Land Resource Areas (MLRA). The national sample has been distributed to minimize the standard error of prediction based on a Neyman allocation, with the goal of detecting soil carbon stock changes at regional and national scales over 5-10 year time periods. A pilot phase is currently underway to establish the network. Site-scale sampling uses a triangular design to minimize within-site variance. During the first sample, soils are collected at the corners of the triangle, and then repeated sampling will occur along the sides until reaching the next corner over about a 100-year time horizon. Three triangles are being sampled at each NRI point. A formal uncertainty analysis of modeled results will be conducted using initial carbon data measured at ~35 sites from the Mid-Continent Region of the U.S., comparing modeled data to measurements. In addition, we are incorporating remotely sensed image data into the analysis to further constrain the uncertainty in soil carbon estimates provided by ecosystem models. The soil carbon monitoring network will provide much-needed model evaluation data at a broad scale, which will improve the quantification of carbon sources and sinks.
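As a concrete illustration of the allocation step described above, a Neyman allocation distributes a fixed number of sample sites across strata in proportion to N_h × S_h (stratum size times stratum standard deviation). The sketch below is hypothetical: the function name and the MLRA-like stratum sizes and standard deviations are made up for illustration, not taken from the monitoring network design.

```python
import numpy as np

def neyman_allocation(n_total, strata_sizes, strata_sds):
    """Allocate n_total sample sites across strata in proportion to N_h * S_h."""
    weights = np.asarray(strata_sizes, dtype=float) * np.asarray(strata_sds, dtype=float)
    raw = n_total * weights / weights.sum()
    alloc = np.floor(raw).astype(int)
    # hand out the remaining sites to strata with the largest fractional parts
    remainder = n_total - alloc.sum()
    order = np.argsort(raw - alloc)[::-1]
    alloc[order[:remainder]] += 1
    return alloc

# hypothetical strata: plot counts and SDs of soil carbon stock change
sizes = [1200, 800, 500, 300]
sds = [4.0, 6.5, 3.0, 8.0]
print(neyman_allocation(100, sizes, sds))
```

Strata that are both large and variable receive proportionally more sites, which is what minimizes the variance of the estimated national total for a fixed overall sample size.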

Research paper thumbnail of Estimation of fish consumption rates based on a creel angler survey of an urban river in New Jersey, USA

Human and Ecological Risk Assessment, Aug 26, 2019

A one-year angler intercept survey was conducted on the lower 17 miles of the Passaic River, an urban industrialized river that flows through Newark, New Jersey. The purpose of the survey was to collect data about anglers' behaviors and fish consumption habits in order to calculate exposure factors for a human health risk assessment of the Study Area. This paper focuses on estimating site-specific fish consumption rates for Lower Passaic River Study Area (LPRSA) anglers who consume their catch. The study design included on-site interviews and counts (angler enumeration). Forty survey locations were included in the stratified random sampling plan; interviews were conducted on 136 days and counts on 164 days. After matching intercepts with the same angler, a total of 294 anglers were interviewed, of whom 25 reported consuming their catch. LPRSA fishing trips ranged from 2 to nearly 50 annual trips for anglers who reported consuming their catch. Species caught and reported to be consumed included carp, catfish, white perch, smallmouth bass, and eel. The estimated mean and 90th percentile consumption rates for the population of consuming anglers are 5.0 and 8.8 g/day, respectively. Based on sensitivity analyses, the 90th percentile fish consumption rates range from approximately 4 to 18 g/day.

Research paper thumbnail of Using Machine Learning Methods to Improve the Representation of Management Activity Data in the US Soil Greenhouse Gas Emissions Inventory

Using Machine Learning Methods to Improve the Representation of Management Activity Data in the US Soil Greenhouse Gas Emissions Inventory

Research paper thumbnail of Estimación del total poblacional usando el estimador de diferencia (Estimation of the Population Total Using the Difference Estimator)

Revista Colombiana de Estadistica, 2009

This paper presents a new regression estimator for the total of a population, created by minimizing a measure of dispersion based on Wilcoxon scores. A particular nonparametric model is used to obtain a model-assisted estimator by means of the generalized difference estimator. First, an estimator of the vector of regression coefficients for the finite population is presented; then, using generalized difference principles, an estimator for the total of the population is proposed. Accuracy and efficiency measures, such as the design bias and mean square error of the estimators, are studied through simulation experiments.
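The generalized difference estimator referenced above has a simple generic form: sum the model predictions over the whole population, then add a Horvitz-Thompson correction computed from the sample residuals. The sketch below uses an ordinary working model with known coefficients rather than the paper's Wilcoxon-score fit, and all names and data are illustrative:

```python
import numpy as np

def difference_estimator(y_sample, yhat_sample, pi_sample, yhat_pop_total):
    """Generalized difference estimator of a population total: predicted total
    plus a Horvitz-Thompson estimate of the population residual total."""
    return yhat_pop_total + np.sum((y_sample - yhat_sample) / pi_sample)

rng = np.random.default_rng(1)
N, n = 1000, 100
x = rng.uniform(1, 10, size=N)            # auxiliary variable, known for all units
y = 2 * x + rng.normal(size=N)            # study variable, observed only on the sample
pi = np.full(N, n / N)                    # inclusion probabilities (simple random sampling)
s = rng.choice(N, size=n, replace=False)  # sampled unit indices
yhat = 2 * x                              # working-model predictions (illustrative)
total_hat = difference_estimator(y[s], yhat[s], pi[s], yhat.sum())
```

When the working model fits well, the residual correction is small and the estimator is much more precise than the plain Horvitz-Thompson estimator; it remains design-unbiased regardless of model quality, because the correction term is an unbiased estimator of the population residual total.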

Research paper thumbnail of Model-Assisted Survey Regression Estimation with the Lasso

Journal of Survey Statistics and Methodology, 2017

Research paper thumbnail of Asymptotics for the maximum sample likelihood estimator under informative selection from a finite population

Research paper thumbnail of Hierarchical Bayesian small area estimation for circular data

Canadian Journal of Statistics, 2016

We consider small area estimation for the departure times of recreational anglers along the Atlantic and Gulf coasts of the United States. A Bayesian area-level Fay-Herriot model is used to obtain estimates of the departure time distribution functions. The departure distribution functions are modelled as circular distributions plus area-specific errors. The circular distributions are modelled as projected normal, and a regression model is specified to borrow information across domains. Estimation is conducted using a Hamiltonian Monte Carlo sampler and a projective approach onto the probability simplex.
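The projected normal construction used above is straightforward to simulate: draw from a multivariate normal and normalize to unit length; the angle of the resulting unit vector is the circular observation. A minimal sketch (illustrative mean and covariance, not the fitted model from the paper):

```python
import numpy as np

def sample_projected_normal(mu, cov, size, rng):
    """Directions from a projected normal: normalize multivariate normal draws."""
    v = rng.multivariate_normal(mu, cov, size=size)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(2)
u = sample_projected_normal([2.0, 1.0], np.eye(2), 5000, rng)
theta = np.arctan2(u[:, 1], u[:, 0])  # circular observations on (-pi, pi]
circ_mean = np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
```

The concentration of the angles around the direction of the mean vector grows with its length, which is how covariates entering the mean can shift and sharpen a departure-time distribution.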


Research paper thumbnail of Best mean square prediction for moving averages

Best mean square prediction for moving averages

Statistica Sinica

Best mean square prediction for moving average time series models is generally non-linear prediction, even in the invertible case. Gaussian processes are an exception, since best linear prediction is always best mean square prediction. Stable numerical recursions are proposed for computation of residuals and evaluation of unnormalized conditional distributions in invertible or non-invertible moving average models, including those with distinct unit roots. The conditional distributions allow evaluation of the best mean square predictor via computation of a low-dimensional integral. For finite, discrete innovations, the method yields best mean square predictors exactly. For continuous innovations, an importance sampling scheme is proposed for numerical approximation of the best mean square predictor and its prediction mean square error. In numerical experiments, the method accurately computes best mean square predictors for cases with known solutions.

Research paper thumbnail of Autocovariance structures for radial averages in small‐angle X‐ray scattering experiments

Journal of Time Series Analysis, 2012

Small‐angle X‐ray scattering (SAXS) is a technique for obtaining low‐resolution structural information about biological macromolecules by exposing a dilute solution to a high‐intensity X‐ray beam and capturing the resulting scattering pattern on a two‐dimensional detector. The two‐dimensional pattern is reduced to a one‐dimensional curve through radial averaging, that is, by averaging across annuli on the detector plane. Subsequent analysis of structure relies on these one‐dimensional data. This article reviews the technique of SAXS and investigates autocorrelation structure in the detector plane and in the radial averages. Across a range of experimental conditions and molecular types, spatial autocorrelation in the detector plane is present and is well‐described by a stationary kernel convolution model. The corresponding autocorrelation structure for the radial averages is non‐stationary. Implications of the autocorrelation structure for inference about macromolecular structure are discussed.
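Radial averaging as described above reduces the two-dimensional detector image to a one-dimensional curve by averaging intensities over concentric annuli. A minimal sketch (hypothetical function and synthetic image; real SAXS reductions also handle beam-stop masks and detector geometry):

```python
import numpy as np

def radial_average(image, center, n_bins):
    """Average a 2-D detector image over concentric annuli around `center`."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(yy - center[0], xx - center[1])
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    which = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(which, weights=image.ravel(), minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    return sums / np.maximum(counts, 1)

# synthetic, radially symmetric "scattering" image with exponential decay
yy, xx = np.indices((129, 129))
img = np.exp(-np.hypot(yy - 64, xx - 64) / 20.0)
profile = radial_average(img, (64, 64), 30)
```

Because each point on the 1-D curve averages many correlated detector pixels, the error structure of the radial averages is not independent across bins, which is the non-stationarity issue the paper analyzes.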

Research paper thumbnail of The detection and estimation of long memory in stochastic volatility

Journal of Econometrics, 1998

We propose a new time series representation of persistence in conditional variance called the long memory stochastic volatility (LMSV) model. The LMSV model is constructed by incorporating an ARFIMA process in a standard stochastic volatility scheme. Strongly consistent estimators of the parameters of the model are obtained by maximizing the spectral approximation to the Gaussian likelihood. The finite sample properties of the spectral likelihood estimator are analyzed by means of a Monte Carlo study. An empirical example with a long time series of stock prices demonstrates the superiority of the LMSV model over existing (short-memory) volatility models.
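An LMSV series can be sketched by generating long-memory log-volatility from a truncated MA(∞) representation of an ARFIMA(0, d, 0) process and exponentiating it inside a standard stochastic volatility recursion. The sketch below is illustrative (the truncation length, d = 0.4, and the volatility scale are arbitrary choices, not the paper's specification):

```python
import numpy as np

def arfima0d0(n, d, rng, trunc=500):
    """Simulate ARFIMA(0, d, 0) noise via a truncated MA(infinity) expansion:
    psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    j = np.arange(1, trunc + 1)
    psi = np.concatenate(([1.0], np.cumprod((j - 1 + d) / j)))
    z = rng.normal(size=n + trunc)
    # keep only outputs where the full truncated filter has been applied
    return np.convolve(z, psi, mode="full")[trunc:trunc + n]

rng = np.random.default_rng(3)
h = 0.5 * arfima0d0(2000, 0.4, rng)        # long-memory log-volatility
y = np.exp(h / 2) * rng.normal(size=2000)  # LMSV returns
```

The slowly decaying psi_j weights are what produce the long-memory autocorrelations in transformations such as log y_t², which the spectral likelihood exploits.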

Research paper thumbnail of Simulation Estimation of Quantiles From a Distribution With Known Mean

Journal of Computational and Graphical Statistics, 2004

It is common in practice to estimate the quantiles of a complicated distribution by using the order statistics of a simulated sample. If the distribution of interest has a known population mean, then it is often possible to improve the mean square error of the standard quantile estimator substantially through the simple device of mean-correction: subtract off the sample mean and add on the known population mean. Asymptotic results for the mean-corrected quantile estimator are derived and compared to the standard sample quantile. Simulation results for a variety of distributions and processes illustrate the asymptotic theory. Application to Markov chain Monte Carlo and to simulation-based uncertainty analysis is described.
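The mean-correction device described above is a one-liner, and its benefit can be checked by simulation. A minimal sketch (the exponential example and sample size are illustrative choices, not from the paper):

```python
import numpy as np

def mean_corrected_quantile(sample, pop_mean, q):
    """Recenter the sample at the known population mean, then take the quantile."""
    corrected = sample - sample.mean() + pop_mean
    return np.quantile(corrected, q)

# Exponential(scale=2) has known mean 2.0 and median 2*ln(2)
rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=500)
est_plain = np.quantile(x, 0.5)
est_corr = mean_corrected_quantile(x, 2.0, 0.5)
```

Over repeated samples, the corrected estimator typically has noticeably smaller mean square error whenever the sample quantile and sample mean are strongly positively correlated, as in this exponential example.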

Research paper thumbnail of Sampling Schemes for Policy Analyses Using Computer Simulation Experiments

Sampling Schemes for Policy Analyses Using Computer Simulation Experiments

Environmental Management, 1998

Research paper thumbnail of A constrained least-squares approach to combine bottom-up and top-down CO2 flux estimates

Environmental and Ecological Statistics, 2012

Terrestrial CO2 flux estimates are obtained from two fundamentally different methods, generally termed bottom-up and top-down approaches. Inventory methods are one type of bottom-up approach, using sources of information such as crop production surveys and forest monitoring data to estimate the annual CO2 flux at locations covering a study region. Top-down approaches are various types of atmospheric inversion methods, which use CO2 concentration measurements from monitoring towers and atmospheric transport models to estimate CO2 flux over a study region. Both methods can also quantify the uncertainty associated with their estimates. Historically, these two approaches have produced estimates that differ considerably. The goal of this work is to construct a statistical model that sensibly combines estimates from the two approaches to produce a new estimate of CO2 flux for our study region. The two approaches have complementary strengths and weaknesses, and our results show that certain aspects of the uncertainty associated with each approach are greatly reduced by combining the methods. Our model is purposefully simple and designed to take the two approaches' estimates and measures of uncertainty at face value. Specifically, we use a constrained least-squares approach to weight the estimates by the inverse of their variances.
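The inverse-variance weighting at the heart of the combination can be sketched in a few lines for the scalar case (the constrained least-squares model in the paper generalizes this to vectors of regional fluxes with constraints; the function name and numbers below are made up for illustration):

```python
import numpy as np

def inverse_variance_combine(estimates, variances):
    """Weight each estimate by 1/variance; return combined estimate and variance."""
    v = np.asarray(variances, dtype=float)
    w = (1.0 / v) / np.sum(1.0 / v)
    return float(np.sum(w * np.asarray(estimates))), float(1.0 / np.sum(1.0 / v))

# e.g. a bottom-up (inventory) and a top-down (inversion) flux estimate
combined, combined_var = inverse_variance_combine([12.0, 20.0], [4.0, 16.0])
# combined is 13.6 and combined_var is 3.2 (up to floating point)
```

The combined variance 1/(1/v1 + 1/v2) is always smaller than either input variance, which is the sense in which the bottom-up and top-down uncertainties are mutually reduced.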