Shesh Rai - Academia.edu

Papers by Shesh Rai

Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges

Entropy

With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. Through scRNA-seq, a huge amount of expression data for several thousand(s) of genes over million(s) of cells are generated in a single experiment. Differential expression analysis is the primary downstream analysis of such data to identify gene markers for cell type detection and also provide inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Therefore, we critically discuss the underlying statistical principles of the approaches and distinctly divide them into six major classes, i.e., generalized linear, generalized additive, Hurdle, mixture models, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations that are specific to each class of approaches, and how they are addressed by other subsequent classes of approa...

Decreased Tumoral Expression of Colon-Specific Water Channel Aquaporin 8 Is Associated With Reduced Overall Survival in Colon Adenocarcinoma

Diseases of the Colon & Rectum, 2021

Supplemental Digital Content is available in the text. BACKGROUND: Colon cancer survival is dependent on metastatic potential and treatment. Large RNA-sequencing data sets may assist in identifying colon cancer-specific biomarkers to improve patient outcomes. OBJECTIVE: This study aimed to identify a highly specific biomarker for overall survival in colon adenocarcinoma by using an RNA-sequencing data set. DESIGN: Raw RNA-sequencing and clinical data for patients with colon adenocarcinoma (n = 271) were downloaded from The Cancer Genome Atlas. A binomial regression model was used to calculate differential RNA expression between paired colon cancer and normal epithelium samples (n = 40). Highly differentially expressed RNAs were examined. SETTINGS: This study was conducted at the University of Louisville using data acquired by The Cancer Genome Atlas. PATIENTS: Patients from US accredited cancer centers between 1998 and 2013 were analyzed. MAIN OUTCOME MEASURES: The primary outcome measures were recurrence-free and overall survival. RESULTS: The median age was 66 years (147/271 men, 180/271 White patients). Thirty RNAs were differentially expressed in colon adenocarcinoma compared with paired normal epithelium, using a log-fold change cutoff of ±6. Using median expression as a cutoff, 4 RNAs were associated with worse overall survival: decreased ZG16 (log-rank = 0.023), aquaporin 8 (log-rank = 0.023), and SLC26A3 (log-rank = 0.098), and increased COL1A1 (log-rank = 0.105). On multivariable analysis, low aquaporin 8 expression (HR, 1.748; 95% CI, 1.016–3.008; p = 0.044) was a risk factor for worse overall survival. Our final aquaporin 8 model had an area under the curve of 0.85 for overall survival. On subgroup analysis, low aquaporin 8 was associated with worse overall survival in patients with high microsatellite instability and in patients with stage II disease. Low aquaporin 8 expression was associated with KRAS and BRAF mutations. Aquaporin 8 immunohistochemistry was optimized for clinical application. LIMITATIONS: This was a retrospective study. CONCLUSION: Aquaporin 8 is a water channel selectively expressed in normal colon tissue. Low aquaporin 8 expression is a risk factor for worse overall survival in patients who have colon cancer. Aquaporin 8 measurement may have a role as a colon-specific prognostic biomarker and help in patient risk stratification for increased surveillance. See Video Abstract at http://links.lww.com/DCR/B603.
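
As a rough consistency check on the multivariable result in the abstract above, a Wald-type p-value can be recovered from the reported hazard ratio and its 95% CI. This is only a sketch: it assumes the interval is symmetric on the log scale, which the abstract does not state, and the function name `wald_p_from_hr` is hypothetical.

```python
import math

def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided p from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE), i.e. it is
    symmetric on the log scale (an assumption, not stated in the abstract).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values as reported in the abstract: HR 1.748, 95% CI 1.016-3.008
se, z, p = wald_p_from_hr(1.748, 1.016, 3.008)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under that assumption the recovered p-value lands close to the reported p = 0.044, so the three reported numbers are mutually consistent.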

Crohn’s disease–related single nucleotide polymorphisms are associated with ileal pouch afferent limb stenosis

Journal of Gastrointestinal Surgery, 2021

Ileal pouch-anal anastomosis (IPAA) is a common surgical treatment for ulcerative colitis. Afferent limb stenosis is an infrequent complication following IPAA, suggesting underlying Crohn’s disease (CD). We hypothesized that CD-related single nucleotide polymorphisms (SNPs) are associated with afferent limb stenosis. Afferent limb stenosis and CD control group patients were recruited from a prospective institutional inflammatory bowel disease database and associated biobank. Patient demographics, Montreal classification, and medication use were recorded. Ten SNPs associated with stricturing Crohn’s disease were examined in genomic DNA and compared among afferent limb stenosis, stricturing CD, and non-stricturing CD controls. Twenty-seven afferent limb stenosis and 162 CD control group patients (108 stricturing, 54 non-stricturing) were identified. Patients were gender and race matched. Afferent limb stenosis and stricturing CD controls were younger at diagnosis (Montreal A1/A2 vs. A3) compared to non-stricturing CD controls (both p < 0.05). The majority of afferent limb stenosis patients were non-smokers compared to CD controls (74% vs. 36%, p < 0.01) and did not use biologic therapies (4% vs. 37%, p < 0.001). The FUT2 G allele was more frequent in afferent limb stenosis and stricturing CD controls compared to non-stricturing CD controls (both p < 0.05). The NOD2 T allele was more frequent in stricturing CD controls compared to afferent limb stenosis and non-stricturing CD controls (both p < 0.05). Afferent limb stenosis patients are phenotypically similar to stricturing CD controls, but differ with lower smoking rates and lower NOD2 allele frequency. Such differences could contribute to the presentation delay with a stricturing phenotype. Selective SNP assessment may help categorize patients likely to develop afferent limb stenosis.

Role of neutrophil to lymphocyte ratio in addition to revised international prognostic index (R-IPI) in patients with extranodal diffuse large B-cell lymphoma of head and neck

Journal of Clinical Oncology, 2017

e19032 Background: Diffuse large B-cell lymphoma (DLBCL) is the most prevalent subtype of non-Hodgkin lymphoma (NHL). One third of DLBCL cases have a primary extranodal origin, and the head and neck is the second most common localization after the gastrointestinal tract. The Revised International Prognostic Index (r-IPI) is commonly used as a prognostic tool, but there is growing evidence that the neutrophil to lymphocyte ratio (NLR) also has prognostic significance in DLBCL. Methods: We retrospectively reviewed all cases of extranodal DLBCL diagnosed between 2006 and 2016 at a single academic institution. Collected data included race, gender, primary site, baseline laboratory data, IPI score, pathology, treatment and survival. Results: A total of 33 patients were included, with 18 (54.5%) being female. Median age at diagnosis was 68 (range 28-92). 15% of patients had an r-IPI of 0, 30% an r-IPI of 1-2, 12% an r-IPI of 3-5 and 36% a not evaluable (NE) r-IPI. Twelve (36%) patients had germin...

Comparison of coliform contamination in non-municipal waters consumed by the Mennonite versus the non-Mennonite rural populations

Environmental Health and Preventive Medicine, 2015

Objectives: Mennonites reside in clusters, do not use modern sewage systems and consume water from non-municipal sources. The purpose of this study is to assess the risk of Escherichia coli exposure via consumption of non-municipal waters in Mennonite versus non-Mennonite rural households. Methods: Results were reviewed for non-municipal water samples collected by the local health department from Mennonite and non-Mennonite lifestyle households from 1998 through 2012. Water contamination was examined using two study variables: water quality (potable, polluted) and gastrointestinal (GI) health risk (none, low, high). These variables were analyzed for association with lifestyle (Mennonite, non-Mennonite) and season (fall, winter, spring, summer) of sample collection. Data were split into two periods to adjust for the ceiling effect of the laboratory instrument. Results: Across the entire cohort, 82% of samples were polluted and 46% of samples contained E. coli, which is consistent with high GI health risk. In recent years (2009 through 2012), the presence of total coliforms was higher in non-Mennonites (39%, P = 0.018) and the presence of E. coli was higher in Mennonites (P = 0.012). Most polluted samples were collected during summer (45%, P = 0.019) and had high GI health risk (51%, P = 0.008) as compared to other seasons. Conclusions: The majority of non-municipal waters in this region are polluted, consuming them poses a high GI health risk, and contamination is prevalent in all households consuming these waters. An association of E. coli exposure with the Mennonite lifestyle was limited to recent years. Seasons with a high heat index and increased surface runoff were the riskiest for consuming non-municipal waters.

Implementing an intervention to improve bone mineral density in survivors of childhood acute lymphoblastic leukemia: BONEII, a prospective placebo-controlled double-blind randomized interventional longitudinal study design

Contemporary Clinical Trials, 2008

The BONEII study is a large two-phase study. The baseline study (Study 1) aims to estimate the prevalence of diminished bone mineral density (BMD) in patients treated for childhood acute lymphoblastic leukemia (ALL) and identify risk factors for BMD deficits. The interventional phase (Study 2) of BONEII has a placebo-controlled double-blind randomized longitudinal design to evaluate the effects of nutritional counseling and calcium and vitamin D supplementation on changes in BMD and serum and urine markers of bone metabolism. The extensive information being collected through this large study will serve as a repository of relational data about BMD and bone turnover and will support further investigations to assess the association of calcium metabolism, bone turnover, nutritional intake, lifestyle factors (such as exercise and the use of alcohol and tobacco), and the specific agents used in ALL therapy in this rapidly increasing population of childhood cancer survivors.

Robust Estimation and Inference on Current Status Data with Applications to Phase IV Cancer Trial

Journal of Modern Applied Statistical Methods, 2018

Additional file 5 of Micro-RNA-186-5p inhibition attenuates proliferation, anchorage independent growth and invasion in metastatic prostate cancer cells

Figure S2. Up-regulation of AKAP12 in PC-3 cells and pAKT in HEK 293 T cells. A) Total RNA was collected from PC-3 cells transfected with miR-186-5p inhibitor and scramble control 72 h post-transfection. AKAP12 transcript expression was increased by 1.7-fold in transient miR-186-5p inhibited PC-3 cells relative to negative controls (p = 0.0597). B) Protein lysate (35 μg) was collected from HEK 293 T cells transfected with miR-186-5p mimic and scramble control 72 h post-transfection. pAKT expression increased 1.67-fold with miR-186-5p overexpression in HEK 293 T cells (p = 0.196). Data were quantitated from at least 2–3 independent experiments and are represented as mean ± S.D. (TIFF 457 kb)

Interactive Web Tool for Standardizing Proteomics Workflow for Liquid Chromatography-Mass Spectrometry Data

Journal of Proteomics & Bioinformatics, May 23, 2019

Proteomics experiments involve several steps, and there are many choices available for each step in the workflow. Therefore, standardization of the proteomics workflow is an essential task in the design of proteomics experiments. However, there are challenges associated with quantitative measurements based on liquid chromatography-mass spectrometry, such as heterogeneity due to technical variability and missing values. We introduce a web application, Proteomics Workflow Standardization Tool (PWST), to standardize the proteomics workflow. The tool will be helpful in deciding the most suitable choice for each step of the experimentation. This is based on identifying the steps/choices with the least variability, for example by comparing the Coefficient of Variation (CV). We demonstrate the tool on data with categorical and continuous variables. We have used the special cases of general linear model, analysis of covariance and analysis of variance with fixed effects to study the effects due to various source...
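
The CV comparison mentioned in the abstract above can be illustrated with a minimal sketch. This is not the PWST implementation: the choice names and intensity values are made up for illustration, and the helper `coefficient_of_variation` is a hypothetical name.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical replicate intensities for two alternative choices at one
# workflow step (illustrative numbers only, not from the paper).
choices = {
    "choice_A": [100.0, 102.0, 98.0, 101.0],
    "choice_B": [100.0, 120.0, 80.0, 110.0],
}

# Compute the CV per choice and pick the one with the least variability.
cvs = {name: coefficient_of_variation(vals) for name, vals in choices.items()}
best = min(cvs, key=cvs.get)

for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.3f}")
print("least variable choice:", best)
```

Because the CV is scaled by the mean, it lets choices with different overall intensity levels be compared on the same footing.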

Lead time distribution for individuals with a screening history

Statistics and Its Interface, 2021

Statistical Approach of Gene Set Analysis with Quantitative Trait Loci for Crop Gene Expression Studies

Entropy, 2021

Genome-wide expression study is a powerful genomic technology to quantify expression dynamics of genes in a genome. In gene expression study, gene set analysis has become the first choice to gain insights into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results from the primary downstream differential expression analysis. The gene set analysis approaches are well developed in microarrays and RNA-seq gene expression data analysis. These approaches mainly focus on analyzing the gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between the genotypes and the phenotypes, as most of the traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches only focus on the over-representation analysis of the selected genes ...

Liquid Liver Biopsy in Residential Cohort Exposed to Polychlorinated Biphenyls Is Consistent with Steatohepatitis

Coronavirus (COVID-19): A Systematic Review and Meta-analysis to Evaluate the Significance of Demographics and Comorbidities

Background: The unprecedented outbreak of a contagious respiratory disease caused by a novel coronavirus has led to a pandemic since December 2019, claiming millions of lives. The study systematically reviews and summarizes COVID-19’s impact based on symptoms, demographics, and comorbidities, and demonstrates the association of demographics with cases and mortality in the United States. Methods: PubMed and Google Scholar were searched from December 2019 to August 2020, and articles restricted to the English language were collected following PRISMA guidelines. US CDC data were used for establishing the statistical significance of age, sex, and race. Results: Among 3745 patients in China, mean age was 50.63 (95% CI: 36.84, 64.42) years, and 55.7% (95% CI: 52.2, 59.2) were males. Symptoms included fever 86.5% (82.7, 90.0), fatigue 41.9% (32.7, 51.4), dyspnea 29.0% (21.2, 37.5), cough 66.0% (61.3, 70.6), mucus 66% (61.3, 70.6), lymphopenia 18.9% (5.2, 38.0). Prevalent comorbidities were hypertension 16....

Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges

Entropy, 2020

Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, w...

Improved Confidence Intervals for Fixed Term Survival Probabilities in a Small Two-Arm Trial

Background: The confidence interval for survival probability at a fixed time point provides valuable information on how the subject performs in terms of survival rate. However, in a two-arm trial when the sample size in each group is small or when the distribution of events that occurred within the group is skewed, the confidence interval might become very unstable, and thus may not provide accurate information for estimating survival rate. In addition, when there are other covariates available in the dataset, it is important to select those significant variables and include them in the model. On the other hand, researchers such as physicians who pay more attention to the final result often analyze the treatment group and control group separately, which may lead to inaccurate prediction. Methods: In this study, two treatment groups are combined, and the group indicator variable is considered as a covariate and is included in the model for computation. Yuan and Rai’s adjusted effective s...

Clinical Design for Phase II/III Clinical Trials for Testing Therapeutic Interventions in COVID-19 Patients

Background: Researchers around the world are urgently conducting clinical trials to develop new treatments for reducing mortality and morbidity related to COVID-19. However, due to unknown features of the disease and complexity of the patient population, traditional trial designs may not be optimal in such patients. We propose two independent clinical trial designs based on careful grouping of the expected characteristics of the patient population. This could serve as a useful guide for researchers designing COVID-19 related Phase II/III trials. Methods: Using the commonly utilized World Health Organization ordinal scale on patient status, we classify patients into three risk groups. In this approach, patients in Stages 3, 4 and 5 are categorized as the intermediate-risk group while patients in Stages 6 and 7 are categorized as the high-risk group. To ensure that an intervention, if deemed efficacious, is promptly made available to vulnerable patients, we propose a group sequential desig...

NHERF1 Loss Upregulates Enzymes of the Pentose Phosphate Pathway in Kidney Cortex

Antioxidants, 2020

(1) Background: We previously showed Na/H exchange regulatory factor 1 (NHERF1) loss resulted in increased susceptibility to cisplatin nephrotoxicity. NHERF1-deficient cultured proximal tubule cells and proximal tubules from NHERF1 knockout (KO) mice exhibit altered mitochondrial protein expression and poor survival. We hypothesized that NHERF1 loss results in changes in metabolic pathways and/or mitochondrial dysfunction, leading to increased sensitivity to cisplatin nephrotoxicity. (2) Methods: Two- to four-month-old male wildtype (WT) and KO mice were treated with vehicle or cisplatin (20 mg/kg dose IP). After 72 h, kidney cortex homogenates were used to measure metabolic enzyme activities. Non-treated kidneys were used to isolate mitochondria for mitochondrial respiration via the Seahorse XF24 analyzer. Non-treated kidneys were also used for LC-MS analysis to evaluate kidney ATP abundance, and electron microscopy (EM) was utilized to evaluate mitochondrial morphology and number. (3) Results: KO mouse kidneys exhibit significant increases in malic enzyme and glucose-6 phosphate dehydrogenase activity under baseline conditions but in no other gluconeogenic or glycolytic enzymes. NHERF1 loss does not decrease kidney ATP content. Mitochondrial morphology, number, and area appeared normal. Isolated mitochondria function was similar between WT and KO. (4) Conclusions: KO kidneys experience a shift in metabolism to the pentose phosphate pathway, which may sensitize them to the oxidative stress imposed by cisplatin.

Standardizing Proteomics Workflow for Liquid Chromatography-Mass Spectrometry: Technical and Statistical Considerations

Journal of Proteomics & Bioinformatics, 2019

Multi-group diagnostic classification of high-dimensional data using differential scanning calorimetry plasma thermograms

PLOS ONE, 2019

The thermoanalytical technique differential scanning calorimetry (DSC) has been applied to characterize protein denaturation patterns (thermograms) in blood plasma samples and relate these to a subject's health status. The analysis and classification of thermograms is challenging because of the high dimensionality of the dataset. There are various methods for group classification using high-dimensional data sets; however, the impact of using high-dimensional data sets for cancer classification has been poorly understood. In the present article, we proposed a statistical approach for data reduction and a parametric method (PM) for modeling of high-dimensional data sets for two- and three-group classification using DSC and demographic data. We compared the PM to the non-parametric classification method K-nearest neighbors (KNN) and the semi-parametric classification method KNN with dynamic time warping (DTW). We evaluated the performance of these methods for multiple two-group classifications: (i) normal versus cervical cancer, (ii) normal versus lung cancer, (iii) normal versus cancer (cervical + lung), (iv) lung cancer versus cervical cancer, as well as for three-group classification: normal versus cervical cancer versus lung cancer. In general, performance for two-group classification was high whereas three-group classification was more challenging, with all three methods predicting normal samples more accurately than cancer samples. Moreover, the specificity of the PM was mostly higher than or the same as that of KNN and DTW-KNN, with lower sensitivity. The performance of KNN and DTW-KNN decreased with the inclusion of demographic data, whereas similar performance was observed for the PM, which could be explained by the fact that the PM uses fewer parameters than the KNN and DTW-KNN methods and is thus less susceptible to the risk of overfitting. More importantly the accuracy of the PM can be increased by using a greater

Research paper thumbnail of Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges

Entropy

With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expressi... more With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. Through scRNA-seq, a huge amount of expression data for several thousand(s) of genes over million(s) of cells are generated in a single experiment. Differential expression analysis is the primary downstream analysis of such data to identify gene markers for cell type detection and also provide inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Therefore, we critically discuss the underlying statistical principles of the approaches and distinctly divide them into six major classes, i.e., generalized linear, generalized additive, Hurdle, mixture models, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations that are specific to each class of approaches, and how they are addressed by other subsequent classes of approa...

Research paper thumbnail of Decreased Tumoral Expression of Colon-Specific Water Channel Aquaporin 8 Is Associated With Reduced Overall Survival in Colon Adenocarcinoma

Diseases of the Colon & Rectum, 2021

Supplemental Digital Content is available in the text. BACKGROUND: Colon cancer survival is depen... more Supplemental Digital Content is available in the text. BACKGROUND: Colon cancer survival is dependent on metastatic potential and treatment. Large RNA-sequencing data sets may assist in identifying colon cancer-specific biomarkers to improve patient outcomes. OBJECTIVE: This study aimed to identify a highly specific biomarker for overall survival in colon adenocarcinoma by using an RNA-sequencing data set. DESIGN: Raw RNA-sequencing and clinical data for patients with colon adenocarcinoma (n = 271) were downloaded from The Cancer Genome Atlas. A binomial regression model was used to calculate differential RNA expression between paired colon cancer and normal epithelium samples (n = 40). Highly differentially expressed RNAs were examined. SETTINGS: This study was conducted at the University of Louisville using data acquired by The Cancer Genome Atlas. PATIENTS: Patients from US accredited cancer centers between 1998 and 2013 were analyzed. MAIN OUTCOME MEASURES: The primary outcome measures were recurrence-free and overall survival. RESULTS: The median age was 66 years (147/271 men, 180/271 White patients). Thirty RNAs were differentially expressed in colon adenocarcinoma compared with paired normal epithelium, using a log-fold change cutoff of ±6. Using median expression as a cutoff, 4 RNAs were associated with worse overall survival: decreased ZG16 (log-rank = 0.023), aquaporin 8 (log-rank = 0.023), and SLC26A3 (log-rank = 0.098), and increased COL1A1 (log-rank = 0.105). On multivariable analysis, low aquaporin 8 expression (HR, 1.748; 95% CI, 1.016–3.008; p = 0.044) was a risk factor for worse overall survival. Our final aquaporin 8 model had an area under the curve of 0.85 for overall survival. 
On subgroup analysis, low aquaporin 8 was associated with worse overall survival in patients with high microsatellite instability and in patients with stage II disease. Low aquaporin 8 expression was associated with KRAS and BRAF mutations. Aquaporin 8 immunohistochemistry was optimized for clinical application. LIMITATIONS: This was a retrospective study. CONCLUSION: Aquaporin 8 is a water channel selectively expressed in normal colon tissue. Low aquaporin 8 expression is a risk factor for worse overall survival in patients who have colon cancer. Aquaporin 8 measurement may have a role as a colon-specific prognostic biomarker and help in patient risk stratification for increased surveillance. See Video Abstract at http://links.lww.com/DCR/B603.

Crohn’s disease–related single nucleotide polymorphisms are associated with ileal pouch afferent limb stenosis

Journal of Gastrointestinal Surgery, 2021

Ileal pouch-anal anastomosis (IPAA) is a common surgical treatment for ulcerative colitis. Afferent limb stenosis is an infrequent complication following IPAA, suggesting underlying Crohn’s disease (CD). We hypothesized that CD-related single nucleotide polymorphisms (SNPs) are associated with afferent limb stenosis. Afferent limb stenosis and CD control group patients were recruited from a prospective institutional inflammatory bowel disease database and associated biobank. Patient demographics, Montreal classification, and medication use were recorded. Ten SNPs associated with stricturing Crohn’s disease were examined in genomic DNA and compared among afferent limb stenosis, stricturing CD, and non-stricturing CD controls. Twenty-seven afferent limb stenosis and 162 CD control group patients (108 stricturing, 54 non-stricturing) were identified. Patients were gender and race matched. Afferent limb stenosis and stricturing CD controls were younger at diagnosis (Montreal A1/A2 vs. A3) compared to non-stricturing CD controls (both p < 0.05). The majority of afferent limb stenosis patients were non-smokers compared to CD controls (74% vs. 36%, p < 0.01) and did not use biologic therapies (4% vs. 37%, p < 0.001). The FUT2 G allele was more frequent in afferent limb stenosis and stricturing CD controls compared to non-stricturing CD controls (both p < 0.05). The NOD2 T allele was more frequent in stricturing CD controls compared to afferent limb stenosis and non-stricturing CD controls (both p < 0.05). Afferent limb stenosis patients are phenotypically similar to stricturing CD controls, but differ with lower smoking rates and lower NOD2 allele frequency. Such differences could contribute to the presentation delay with a stricturing phenotype. Selective SNP assessment may help categorize patients likely to develop afferent limb stenosis.

Role of neutrophil to lymphocyte ratio in addition to revised international prognostic index (R-IPI) in patients with extranodal diffuse large B-cell lymphoma of head and neck

Journal of Clinical Oncology, 2017

e19032 Background: Diffuse large B-cell lymphoma (DLBCL) is the most prevalent subtype of non-Hodgkin lymphoma (NHL). One third of DLBCL cases have a primary extranodal origin, and the head and neck is the second most common site after the gastrointestinal tract. The Revised International Prognostic Index (r-IPI) is commonly used as a prognostic tool, but there is growing evidence that the neutrophil to lymphocyte ratio (NLR) also has prognostic significance in DLBCL. Methods: We retrospectively reviewed all cases of extranodal DLBCL diagnosed between 2006 and 2016 at a single academic institution. Collected data included race, gender, primary site, baseline laboratory data, IPI score, pathology, treatment and survival. Results: A total of 33 patients were included, with 18 (54.5%) being female. Median age at diagnosis was 68 (range 28-92). 15% of patients had an r-IPI of 0, 30% an r-IPI of 1-2, 12% an r-IPI of 3-5 and 36% a not evaluable (NE) r-IPI. Twelve (36%) patients had germin...

Comparison of coliform contamination in non-municipal waters consumed by the Mennonite versus the non-Mennonite rural populations

Environmental Health and Preventive Medicine, 2015

Objectives: Mennonites reside in clusters, do not use modern sewage systems and consume water from non-municipal sources. The purpose of this study is to assess the risk of Escherichia coli exposure via consumption of non-municipal waters in Mennonite versus non-Mennonite rural households. Methods: Results were reviewed for non-municipal water samples collected by the local health department from Mennonite and non-Mennonite lifestyle households from 1998 through 2012. Water contamination was examined using two study variables: water quality (potable, polluted) and gastrointestinal (GI) health risk (none, low, high). These variables were analyzed for association with lifestyle (Mennonite, non-Mennonite) and season (fall, winter, spring, summer) of sample collection. Data were split into two periods to adjust for the ceiling effect of the laboratory instrument. Results: From the entire cohort, 82% of samples were polluted and 46% of samples contained E. coli, which is consistent with high GI health risk. In recent years (2009 through 2012), the presence of total coliforms was higher in non-Mennonites (39%, P = 0.018) and the presence of E. coli was higher in Mennonites (P = 0.012). Most polluted samples were collected during summer (45%, P = 0.019) and had high GI health risk (51%, P = 0.008) as compared to other seasons. Conclusions: The majority of non-municipal waters in this region are polluted, consuming them poses a high GI health risk, and contamination is prevalent in all households consuming these waters. An association of E. coli exposure with the Mennonite lifestyle was limited to recent years. Seasons with a high heat index and increased surface runoff were the riskiest for consuming non-municipal waters.

Implementing an intervention to improve bone mineral density in survivors of childhood acute lymphoblastic leukemia: BONEII, a prospective placebo-controlled double-blind randomized interventional longitudinal study design

Contemporary Clinical Trials, 2008

The BONEII study is a large two-phase study. The baseline study (Study 1) aims to estimate the prevalence of diminished bone mineral density (BMD) in patients treated for childhood acute lymphoblastic leukemia (ALL) and identify risk factors for BMD deficits. The interventional phase (Study 2) of BONEII has a placebo-controlled double-blind randomized longitudinal design to evaluate the effects of nutritional counseling and calcium and vitamin D supplementation on changes in BMD and serum and urine markers of bone metabolism. The extensive information being collected through this large study will serve as a repository of relational data about BMD and bone turnover and will support further investigations to assess the association of calcium metabolism, bone turnover, nutritional intake, lifestyle factors (such as exercise and the use of alcohol and tobacco), and the specific agents used in ALL therapy in this rapidly increasing population of childhood cancer survivors.

Robust Estimation and Inference on Current Status Data with Applications to Phase IV Cancer Trial

Journal of Modern Applied Statistical Methods, 2018

Additional file 5 of Micro-RNA-186-5p inhibition attenuates proliferation, anchorage independent growth and invasion in metastatic prostate cancer cells

Figure S2. Up-regulation of AKAP12 in PC-3 cells and pAKT in HEK 293 T cells. A) Total RNA was collected from PC-3 cells transfected with miR-186-5p inhibitor and scramble control 72 h post-transfection. AKAP12 transcript expression was increased 1.7-fold in transiently miR-186-5p-inhibited PC-3 cells relative to negative controls (p = 0.0597). B) Protein lysate (35 μg) was collected from HEK 293 T cells transfected with miR-186-5p mimic and scramble control 72 h post-transfection. pAKT expression was increased 1.67-fold by miR-186-5p overexpression in HEK 293 T cells (p = 0.196). Data were quantitated from at least 2–3 independent experiments and are presented as mean ± S.D. (TIFF 457 kb)

Interactive Web Tool for Standardizing Proteomics Workflow for Liquid Chromatography-Mass Spectrometry Data

Journal of Proteomics & Bioinformatics, May 23, 2019

Proteomics experiments involve several steps, and there are many choices available for each step in the workflow. Therefore, standardization of the proteomics workflow is an essential task in the design of proteomics experiments. However, there are challenges associated with quantitative measurements based on liquid chromatography-mass spectrometry, such as heterogeneity due to technical variability and missing values. We introduce a web application, Proteomics Workflow Standardization Tool (PWST), to standardize the proteomics workflow. The tool will be helpful in deciding the most suitable choice for each step of the experimentation. This is based on identifying the steps/choices with the least variability, for example by comparing the Coefficient of Variation (CV). We demonstrate the tool on data with categorical and continuous variables. We have used the special cases of general linear model, analysis of covariance and analysis of variance with fixed effects to study the effects due to various source...
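
The CV-based comparison mentioned in this abstract can be illustrated with a minimal sketch. This is not the PWST implementation: the function names, the workflow labels and the replicate intensities below are all hypothetical, and the sketch only assumes that the choice with the smallest sample CV (standard deviation divided by mean) is preferred.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample CV (standard deviation / mean) as a fraction."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


def least_variable_choice(measurements_by_choice):
    """Return the choice with the smallest CV, plus the CV of every choice."""
    cvs = {
        choice: coefficient_of_variation(values)
        for choice, values in measurements_by_choice.items()
    }
    best = min(cvs, key=cvs.get)
    return best, cvs


# Hypothetical replicate intensities for two options at one workflow step.
replicates = {
    "option_A": [100.0, 102.0, 98.0],
    "option_B": [100.0, 120.0, 80.0],
}
best, cvs = least_variable_choice(replicates)
```

Here `best` names the option whose replicates vary least relative to their mean. A real comparison would also need to handle missing values, which the abstract lists as a separate challenge.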

Lead time distribution for individuals with a screening history

Statistics and Its Interface, 2021

Statistical Approach of Gene Set Analysis with Quantitative Trait Loci for Crop Gene Expression Studies

Entropy, 2021

The genome-wide expression study is a powerful genomic technology for quantifying the expression dynamics of genes in a genome. In gene expression studies, gene set analysis has become the first choice to gain insights into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results from the primary downstream differential expression analysis. The gene set analysis approaches are well developed in microarrays and RNA-seq gene expression data analysis. These approaches mainly focus on analyzing the gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between the genotypes and the phenotypes, as most of the traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches only focus on the over-representation analysis of the selected genes ...

Liquid Liver Biopsy in Residential Cohort Exposed to Polychlorinated Biphenyls Is Consistent with Steatohepatitis

Coronavirus (COVID-19): A Systematic Review and Meta-analysis to Evaluate the Significance of Demographics and Comorbidities

Background: The unprecedented outbreak of a contagious respiratory disease caused by a novel coronavirus has led to a pandemic since December 2019, claiming millions of lives. The study systematically reviews and summarizes COVID-19’s impact based on symptoms, demographics, and comorbidities, and demonstrates the association of demographics with cases and mortality in the United States. Methods: PubMed and Google Scholar were searched from December 2019 to August 2020, and articles restricted to the English language were collected following PRISMA guidelines. US CDC data were used for establishing the statistical significance of age, sex, and race. Results: Among 3745 patients in China, mean age was 50.63 (95% CI: 36.84, 64.42) years, and 55.7% (95% CI: 52.2, 59.2) were males. Symptoms included fever 86.5% (82.7, 90.0), fatigue 41.9% (32.7, 51.4), dyspnea 29.0% (21.2, 37.5), cough 66.0% (61.3, 70.6), mucus 66% (61.3, 70.6), lymphopenia 18.9% (5.2, 38.0). Prevalent comorbidities were hypertension 16....

Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges

Entropy, 2020

Over the last decade, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome-wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview of the statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome-wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, w...

Improved Confidence Intervals for Fixed Term Survival Probabilities in a Small Two-Arm Trial

Background: The confidence interval for survival probability at a fixed time point provides valuable information on how the subject performs in terms of survival rate. However, in a two-arm trial when the sample size in each group is small or when the distribution of events that occurred within the group is skewed, the confidence interval might become very unstable, and thus may not provide accurate information for estimating survival rate. In addition, when there are other covariates available in the dataset, it is important to select those significant variables and include them in the model. On the other hand, researchers such as physicians who pay more attention to the final result often analyze the treatment group and control group separately, which may lead to inaccurate prediction. Methods: In this study, two treatment groups are combined, and the group indicator variable is considered as a covariate and is included in the model for computation. Yuan and Rai’s adjusted effective s...

Clinical Design for Phase II/III Clinical Trials for Testing Therapeutic Interventions in COVID-19 Patients

Background: Researchers around the world are urgently conducting clinical trials to develop new treatments for reducing mortality and morbidity related to COVID-19. However, due to unknown features of the disease and the complexity of the patient population, traditional trial designs may not be optimal in such patients. We propose two independent clinical trial designs based on careful grouping of the expected characteristics of the patient population. This could serve as a useful guide for researchers designing COVID-19 related Phase II/III trials. Methods: Using the commonly utilized World Health Organization ordinal scale on patient status, we classify patients into three risk groups. In this approach, patients in Stages 3, 4 and 5 are categorized as the intermediate-risk group while patients in Stages 6 and 7 are categorized as the high-risk group. To ensure that an intervention, if deemed efficacious, is promptly made available to vulnerable patients, we propose a group sequential desig...

NHERF1 Loss Upregulates Enzymes of the Pentose Phosphate Pathway in Kidney Cortex

Antioxidants, 2020

(1) Background: We previously showed that Na/H exchange regulatory factor 1 (NHERF1) loss resulted in increased susceptibility to cisplatin nephrotoxicity. NHERF1-deficient cultured proximal tubule cells and proximal tubules from NHERF1 knockout (KO) mice exhibit altered mitochondrial protein expression and poor survival. We hypothesized that NHERF1 loss results in changes in metabolic pathways and/or mitochondrial dysfunction, leading to increased sensitivity to cisplatin nephrotoxicity. (2) Methods: Two- to four-month-old male wildtype (WT) and KO mice were treated with vehicle or cisplatin (20 mg/kg dose IP). After 72 h, kidney cortex homogenates were utilized for metabolic enzyme activities. Non-treated kidneys were used to isolate mitochondria for mitochondrial respiration via the Seahorse XF24 analyzer. Non-treated kidneys were also used for LC-MS analysis to evaluate kidney ATP abundance, and electron microscopy (EM) was utilized to evaluate mitochondrial morphology and number. (3) Results: KO mouse kidneys exhibit significant increases in malic enzyme and glucose-6 phosphate dehydrogenase activity under baseline conditions but in no other gluconeogenic or glycolytic enzymes. NHERF1 loss does not decrease kidney ATP content. Mitochondrial morphology, number, and area appeared normal. Isolated mitochondrial function was similar between WT and KO. (4) Conclusions: KO kidneys experience a shift in metabolism to the pentose phosphate pathway, which may sensitize them to the oxidative stress imposed by cisplatin.

Standardizing Proteomics Workflow for Liquid Chromatography-Mass Spectrometry: Technical and Statistical Considerations

Journal of Proteomics & Bioinformatics, 2019

Multi-group diagnostic classification of high-dimensional data using differential scanning calorimetry plasma thermograms

PLOS ONE, 2019

The thermoanalytical technique differential scanning calorimetry (DSC) has been applied to characterize protein denaturation patterns (thermograms) in blood plasma samples and relate these to a subject's health status. The analysis and classification of thermograms is challenging because of the high-dimensionality of the dataset. There are various methods for group classification using high-dimensional data sets; however, the impact of using high-dimensional data sets for cancer classification has been poorly understood. In the present article, we proposed a statistical approach for data reduction and a parametric method (PM) for modeling of high-dimensional data sets for two- and three-group classification using DSC and demographic data. We compared the PM to the non-parametric classification method K-nearest neighbors (KNN) and the semi-parametric classification method KNN with dynamic time warping (DTW). We evaluated the performance of these methods for multiple two-group classifications: (i) normal versus cervical cancer, (ii) normal versus lung cancer, (iii) normal versus cancer (cervical + lung), (iv) lung cancer versus cervical cancer as well as for three-group classification: normal versus cervical cancer versus lung cancer. In general, performance for two-group classification was high whereas three-group classification was more challenging, with all three methods predicting normal samples more accurately than cancer samples. Moreover, specificity of the PM method was mostly higher or the same as KNN and DTW-KNN with lower sensitivity. The performance of KNN and DTW-KNN decreased with the inclusion of demographic data, whereas similar performance was observed for the PM which could be explained by the fact that the PM uses fewer parameters as compared to KNN and DTW-KNN methods and is thus less susceptible to the risk of overfitting.
More importantly the accuracy of the PM can be increased by using a greater