Gang Fang | Mount Sinai School of Medicine

Papers by Gang Fang

Mapping and characterizing N6-methyladenine in eukaryotic genomes using single molecule real-time sequencing

Genome Research, Jul 15, 2018

N6-methyladenine (m6dA) has been discovered as a novel form of DNA methylation prevalent in eukaryotes; however, methods for high-resolution mapping of m6dA events are still lacking. Single-molecule real-time (SMRT) sequencing has enabled the detection of m6dA events at single-nucleotide resolution in prokaryotic genomes, but its application to detecting m6dA in eukaryotic genomes has not been rigorously examined. Herein, we identified unique characteristics of eukaryotic m6dA methylomes that fundamentally differ from those of prokaryotes. Based on these differences, we describe the first approach for mapping m6dA events using SMRT sequencing specifically designed for the study of eukaryotic genomes, and provide appropriate strategies for designing experiments and carrying out sequencing in future studies. We apply the novel approach to study two eukaryotic genomes. For green algae, we construct the first complete genome-wide map of m6dA at single nucleotide and single molecule reso...

Matrix Epistasis: ultrafast, exhaustive epistasis scan for quantitative traits with covariate adjustment

Bioinformatics (Oxford, England), Jan 2, 2018

For many traits, causal loci uncovered by genetic-mapping studies explain only a minority of the heritable contribution to trait variation. Multiple explanations for this 'missing heritability' have been proposed. SNP-SNP interaction (epistasis), as one of the compelling models, has been widely studied. However, the genome-wide scan of epistasis, especially for quantitative traits, poses huge computational challenges. Moreover, covariate adjustment is largely ignored in epistasis analysis due to the massive extra computational undertaking. In the current study, we found striking differences among epistasis models using both simulation data and real biological data, suggesting that covariate adjustment can not only remove confounding bias, but also improve the power. Furthermore, we derived mathematical formulas that enable the exhaustive epistasis scan together with full covariate adjustment to be expressed in terms of large matrix operations, therefore substantially imp...

Discovering combinatorial disease biomarkers

Characterizing Discriminative Patterns

CoRR, Feb 20, 2011

Approximate subspace pattern mining for mapping copy-number variations

Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, Oct 7, 2012

Use of ridge points in partial fingerprint matching

Comparison of ROC-based and likelihood methods for fingerprint verification

The fingerprint verification task answers the question of whether or not two fingerprints belong to the same finger. The paper focuses on the classification aspect of fingerprint verification. Classification is the third and final step after the two earlier steps of feature extraction, where a known set of features (minutiae points) have been extracted from each fingerprint, and scoring,

Construction and Functional Analysis of Human Genetic Interaction Networks with Genome-wide Association Data

Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia

Antimicrobial Agents and Chemotherapy, Jan 31, 2015

Whole genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy.

Predictors of Home Care Oral Medication Management

Aims: The aims of this study were to identify risk factors and clinician interventions from electronic health record data that predict improvement in oral medication management for home health care patients. Methods: This study is a retrospective cohort design analyzing OASIS assessment data, Omaha System interventions, and medications from electronic health records in 15 home health care agencies. Models were created to discover predictors for improvement in oral medication management using data mining techniques of discriminative pattern analysis and classification rules. Results: The 1,688 cases represented predominately older Caucasian adults with two-thirds females who frequently were admitted from the hospital. Oral medication management improved in 268 (16.1%) cases by discharge. Discriminative pattern analysis resulted in two rules involving four variables that accounted for 90% of all cases for improvement or no improvement. Classification rules correctly classified patient...

Approximate subspace pattern mining for mapping copy-number variations

Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB '12, 2012

Generative Models for Fingerprint Individuality using Ridge Types

Third International Symposium on Information Assurance and Security, 2007

Using Constraints to Generate and Explore Higher Order Discriminative Patterns

Lecture Notes in Computer Science, 2011

Identification of Single Nucleotide Polymorphism Interactions Associated with Survival and Risk In Multiple Myeloma Using Novel Data Mining Methods

Quantitative evaluation of approximate frequent pattern mining algorithms

Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD 08, 2008

Complex biomarker discovery in neuroimaging data: Finding a needle in a haystack

NeuroImage: Clinical, 2013

Detecting epigenetic motifs in low coverage and metagenomics settings

BMC Bioinformatics, 2014

Subspace differential coexpression analysis: problem definition and a general approach

Pacific Symposium on Biocomputing, 2010

In this paper, we study methods to identify differential coexpression patterns in case-control gene expression data. A differential coexpression pattern consists of a set of genes that have substantially different levels of coherence of their expression profiles across the two sample-classes, i.e., highly coherent in one class, but not in the other. Biologically, a differential coexpression pattern may indicate the disruption of a regulatory mechanism possibly caused by dysregulation of pathways or mutations of transcription factors. A common feature of all the existing approaches for differential coexpression analysis is that the coexpression of a set of genes is measured on all the samples in each of the two classes, i.e., over the full-space of samples. Hence, these approaches may miss patterns that only cover a subset of samples in each class, i.e., subspace patterns, due to the heterogeneity of the subject population and disease causes. In this paper, we extend differential co...

SubPatCNV: approximate subspace pattern mining for mapping copy-number variations

BMC Bioinformatics, Jan 16, 2015

Background: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often only common in a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that only exhibit in subsets of samples with advanced data mining techniques to reliably answer questions such as "Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?". Results: We introduce a tool for mining CNV subspace patterns, namely SubPatCNV, which is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV is the implementation of a variation of approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In a benchmark test, SubPatCNV was applied to identify population specific germline C...

Altered WNT signaling in hiPSC NPCs derived from four schizophrenia patients

Biological Psychiatry, 2015
