Yan-Shi Hu | Zhejiang University

Papers by Yan-Shi Hu

HALD, a human aging and longevity knowledge graph for precision gerontology and geroscience analyses

Figshare, 2023

... about biomedical entities including gene, disease, chemical, mutation, species, and cell line from all published biomedical literature. Integrated datasets with comprehensive knowledge are crucial for researchers to leverage existing resources. Currently, several manually curated databases related to human aging and longevity are publicly available online, such as Aging genes/interventions database (AGEID) [8], Human Ageing Genomic Resources (HAGR) [9], JenAge Ageing Factor Database (AgeFactDB) [10], Aging Atlas [11], and AgingBank [12] (Table 1). AGEID is a database of experimental results that provides formatted gene/intervention reports related to aging [8]. HAGR includes the GenAge, AnAge, GenDR, LongevityMap, DrugAge and CellAge databases, which are manually curated by experts and regularly updated [9]. AgeFactDB aims to collect and integrate aging-related data including genes, chemical compounds, and other environmental cues [10]. Aging Atlas is a manually curated biomedical database comprising a range of aging-related multi-omics datasets and bioinformatics tools [11]. AgingBank documents high-quality aging-related associations in more than 50 species, based on manual review of more than 20,000 published papers [12]. However, to the best of our knowledge, these databases are all manually curated, which makes it difficult for them to incorporate comprehensive knowledge of human aging and longevity. It is also difficult to obtain the latest biomedical knowledge from manually curated databases because their services are no longer maintained or are not updated in time. In addition, although information on human nucleic acids is generally included in these studies, knowledge of other important organic compounds such as carbohydrates, lipids, and proteins is not yet fully integrated.
Relation extraction between these entities is also indispensable for integrative and comprehensive analysis. Associations between molecular markers and diseases must also be clarified to illuminate the mechanisms and effects of anti-aging therapies on aging-related diseases [13]. A knowledge graph (KG) is widely used for knowledge domain visualization or knowledge domain mapping in the library and information industry [14]. In the life sciences, a biomedical KG can not only link biomedical entities through certain relations, but also predict potential relationships between existing entities and discover new relational facts [15]. Such characteristics facilitate the understanding of relations between biomedical entities, which is crucial for researchers refining their research scope. In this paper, we presented HALD, a human aging and longevity dataset of the biomedical KG built from human aging and longevity-related literature in PubMed. Figure 1 illustrates the workflow of biomedical literature mining using multiple NLP techniques. First, we used the Bio.Entrez [16] Python package to conduct literature retrieval. Then, we applied web-based (PubTator [6]), dictionary-based (Python re module), rule-based (Stanford CoreNLP [17]), and DL-based (ScispaCy [18] and BERN [19]) methods to conduct named entity recognition (NER) for better accuracy. Next, we combined NetworkX, OpenIE, and AllenNLP tools to conduct relation extraction (RE) for wider coverage. Finally, the entities were further identified as human aging and longevity biomarkers according to their relationships with aging-related diseases. Up to September 2023, we had annotated 339,918 abstracts from PubMed and curated 12,227 entities in 10 types (gene, RNA, carbohydrate, peptide, lipid, protein, ...

Table 1 (fragment)
Databases | Aging/Longevity | Data | Last Update*
AGEID (2002) [8] | Aging and longevity | Genes and interventions | Not available
AnAge (2013) [9] | Aging and longevity | Aging and life history | Build 15
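
The dictionary-based NER step named in the abstract above (the one attributed to the Python re module) could look roughly like the following. This is only a minimal sketch: the term dictionary, the function names, and the matching policy (case-insensitive, longest term first) are assumptions made for illustration and are not taken from the HALD pipeline.

```python
import re

# Hypothetical term dictionary: surface form -> entity type (illustrative only).
TERM_DICT = {
    "IL-6": "gene",
    "insulin": "peptide",
    "glucose": "carbohydrate",
    "cholesterol": "lipid",
}


def build_pattern(terms):
    """Compile one alternation, longest terms first so longer names win."""
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds instead of \b so terms with hyphens or digits match cleanly
    # and are not matched inside longer alphanumeric tokens.
    return re.compile(
        r"(?<![A-Za-z0-9])(?:" + alternation + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def tag_entities(text, term_dict=TERM_DICT):
    """Return (start, end, matched_text, entity_type) for each dictionary hit."""
    pattern = build_pattern(term_dict)
    lookup = {term.lower(): label for term, label in term_dict.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag_entities("Elevated IL-6 and glucose levels rise with age."):
        print(hit)
```

A dictionary matcher of this kind is precise but only finds terms it already knows, which is presumably why the abstract pairs it with web-based, rule-based, and DL-based recognizers.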

ncPlantDB: a plant ncRNA database with potential ncPEP information and cell type-specific interaction

Nucleic Acids Research, 2024

The field of plant non-coding RNAs (ncRNAs) has seen significant advancements in recent years, with many ncRNAs recognized as important regulators of gene expression during plant development and stress responses. Moreover, the coding potential of these ncRNAs, giving rise to ncRNA-encoded peptides (ncPEPs), has emerged as an essential area of study. However, existing plant ncRNA databases lack comprehensive information on ncRNA-encoded peptides (ncPEPs) and cell type-specific interactions. To address this gap, we present ncPlantDB (https://bis.zju.edu.cn/ncPlantDB), a comprehensive database integrating ncRNA and ncPEP data across 43 plant species. ncPlantDB encompasses 353 140 ncRNAs, 3799 ncPEPs and 4 647 071 interactions, sourced from established databases and literature mining. The database offers unique features including translational potential data, cell-specific interaction networks derived from single-cell RNA sequencing and Ribo-seq analyses, and interactive visualization tools. ncPlantDB provides a user-friendly interface for exploring ncRNA expression patterns at the single-cell level, facilitating the discovery of tissue-specific ncRNAs and potential ncPEPs. By integrating diverse data types and offering advanced analytical tools, ncPlantDB serves as a valuable resource for researchers investigating plant ncRNA functions, interactions, and their potential coding capacity. This database significantly enhances our understanding of plant ncRNA biology and opens new avenues for exploring the complex regulatory networks in plant genomics.

Benchmarking alternative polyadenylation detection in single-cell and spatial transcriptomes

bioRxiv, 2024

Background: 3′-tag-based sequencing methods have become the predominant approach for single-cell and spatial transcriptomics, with some protocols proven effective in detecting alternative polyadenylation (APA). While numerous computational tools have been developed for APA detection from these sequencing data, the absence of comprehensive benchmarks and the diversity of sequencing protocols and tools make it challenging to select appropriate methods for APA analysis in these contexts.

HALD, a human aging and longevity knowledge graph for precision gerontology and geroscience analyses

Scientific Data, Nov 30, 2023

Systematic single-cell analysis reveals dynamic control of transposable element activity orchestrating the endothelial-to-hematopoietic transition

BMC Biology, 2024

The endothelial-to-hematopoietic transition (EHT) process during definitive hematopoiesis is highly conserved in vertebrates. Stage-specific expression of transposable elements (TEs) has been detected during zebrafish EHT and may promote hematopoietic stem cell (HSC) formation by activating inflammatory signaling. However, little is known about how TEs contribute to the EHT process in human and mouse. We reconstructed the single-cell EHT trajectories of human and mouse and resolved the dynamic expression patterns of TEs during EHT. Most TEs presented a transient co-upregulation pattern along the conserved EHT trajectories, coinciding with the temporal relaxation of epigenetic silencing systems. TE products can be sensed by multiple pattern recognition receptors, triggering inflammatory signaling to facilitate HSC emergence. Interestingly, we observed that hypoxia-related signals were enriched in cells with higher TE expression. Furthermore, we constructed the hematopoietic cis-regulatory network of accessible TEs and identified potential TE-derived enhancers that may boost the expression of specific EHT marker genes. Our study provides a systematic vision of how TEs are dynamically controlled to promote the hematopoietic fate decisions through transcriptional and cis-regulatory networks, and pre-train the immunity of nascent HSCs.

Systematic single-cell analysis reveals dynamic control of transposable element activity orchestrating the endothelial-to-hematopoietic transition

Background: The endothelial-to-hematopoietic transition (EHT) process during definitive hematopoiesis in vertebrates is highly conserved. Stage-specific expression of transposable elements (TEs) has been detected during zebrafish EHT and may promote hematopoietic stem cell formation by activating inflammatory signaling. However, little is known about how TEs contribute to the EHT process in human and mouse. Results: We reconstructed the single-cell EHT trajectories of human and mouse, and resolved the dynamic expression patterns of TEs during EHT. Most TEs presented a transient co-upregulation pattern along the conserved EHT trajectories. Enhanced TE activation was tightly associated with the temporal relaxation of epigenetic silencing systems. TE products can be sensed by multiple pattern recognition receptors, triggering inflammatory signaling to facilitate the emergence of hematopoietic stem cells. Furthermore, we observed that hypoxia-related signals were enriched in cells with higher...

CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure

Bioinformatics, 2022

Motivation: Quantifying the similarity of human diseases provides guiding insights to the discovery of micro-scope mechanisms from a macro scale. Previous work demonstrated that better performance can be gained by integrating multi-view data sources or applying machine learning techniques. However, designing an efficient framework to extract and incorporate information from different biological data using deep learning models remains unexplored. Results: We present CoGO, a Contrastive learning framework to predict disease similarity based on Gene network and Ontology structure, which incorporates the gene interaction network and gene ontology (GO) domain knowledge using graph deep learning models. First, graph deep learning models are applied to encode the features of genes and GO terms from separate graph structure data. Next, gene and GO features are projected to a common embedding space via a non-linear projection. Then cross-view contrastive loss is applied to maximize the agreeme...
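
To make the idea of a cross-view contrastive loss concrete, here is a small pure-Python sketch. The abstract is truncated before it states the exact form of the loss, so the InfoNCE-style formulation, the cosine similarity, the temperature value, and all names below are assumptions rather than a description of CoGO itself. Row i of each view is treated as the same gene seen from the gene-network view and the GO view.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_view_info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE-style loss; view_a[i] and view_b[i] are the positive pair.

    Every other row of view_b serves as a negative for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    gene_view = [[1.0, 0.0], [0.0, 1.0]]
    go_view_aligned = [[1.0, 0.0], [0.0, 1.0]]
    go_view_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(cross_view_info_nce(gene_view, go_view_aligned))
    print(cross_view_info_nce(gene_view, go_view_swapped))
```

The loss is lower when matching rows of the two views point in the same direction, which is the sense in which minimizing it pushes the two views toward agreement.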

CoVM2: Molecular Biological Data Integration of SARS-CoV-2 Proteins in a Macro-to-Micro Method

Biomolecules, 2022

The COVID-19 pandemic has been a major public health event since 2020. Multiple variant strains of SARS-CoV-2, the causative agent of COVID-19, were detected based on the mutation sites in their sequences. These sequence mutations may lead to changes in the protein structures and affect the binding states of SARS-CoV-2 and human proteins. Experimental research on SARS-CoV-2 has accumulated a large amount of structural data and protein–protein interactions (PPIs), but studies of the SARS-CoV-2–human PPI networks lack integration of physical associations with possible protein docking information. In addition, the docking structures of variant viral proteins with human receptor proteins are still insufficient. This study constructed a SARS-CoV-2–human protein–protein interaction network with data integration methods. Crystal structures were collected to map the interaction pairs. The pairs of direct interactions and physical associations were selected and analyzed for variant docking calculations. The study examined the structures of the spike (S) glycoprotein of variants Delta B.1.617.2, Omicron BA.1, and Omicron BA.2. The calculated docking structures of S proteins and potential human receptors were obtained. The study integrated binary protein interactions with 3D docking structures to provide an extended view of SARS-CoV-2 proteins from a macro- to micro-scale.

LBD: a manually curated database of experimentally validated lymphoma biomarkers

Database, 2022

Lymphoma is a heterogeneous disease caused by malignant proliferation of lymphocytes, resulting in significant mortality worldwide. While more and more lymphoma biomarkers have been identified with the advent and development of precision medicine, there are currently no databases dedicated to systematically gathering these scattered treasures. Therefore, we developed a lymphoma biomarker database (LBD) to curate experimentally validated lymphoma biomarkers in this study. LBD consists of 793 biomarkers extracted from 978 articles covering diverse subtypes of lymphomas, including 715 single and 78 combined biomarkers. These biomarkers can be categorized into molecular, cellular, image, histopathological, physiological and other biomarkers with various functions such as prognosis, diagnosis and treatment. As a manually curated database that provides comprehensive information about lymphoma biomarkers, LBD is helpful for personalized diagnosis and treatment of lymphoma.
Database URL: http://bis.zju.edu.cn/LBD

Analyzing the genes related to Alzheimer's disease via a network and pathway-based approach

Alzheimer's Research & Therapy, Jan 27, 2017

Our understanding of the molecular mechanisms underlying Alzheimer's disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the ...
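
Pathway enrichment analysis of a gene list, as mentioned in the abstract above, is commonly scored with a one-sided hypergeometric test. The abstract does not say which statistic the authors used, so the sketch below is an assumption about the general technique, and the function name and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, pathway_size, gene_set_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background_size: number of genes in the background (for example, all
        annotated genes)
    pathway_size: number of background genes that belong to the pathway
    gene_set_size: number of genes in the list being tested
    overlap: number of tested genes that fall in the pathway
    """
    upper = min(pathway_size, gene_set_size)
    total = comb(background_size, gene_set_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, gene_set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 3 in the pathway, 2 tested, both hit.
    print(enrichment_p_value(10, 3, 2, 2))
```

In practice the p-values from many pathways would also be corrected for multiple testing, which this sketch leaves out.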

Detecting pathway relationship in the context of human protein-protein interaction network and its application to Parkinson's disease

Methods (San Diego, Calif.), Dec 5, 2017

In human physiological conditions like complex diseases, a large number of genes/proteins, as well as their interactions, are involved. Thus, detecting the biochemical pathways enriched in these genes/proteins and identifying the pathway relationships is critical to understand the molecular mechanisms underlying a disease and can also be valuable in selecting the potential molecular targets for further exploration. In this study, we proposed a method to measure the relationship between pathways based on their distribution in the human PPI network. By representing each pathway as a gene module in the PPI network, a distance was calculated to measure the closeness of two pathways. For the pathways in the KEGG database, a total of 2143 pathway pairs with close connections were identified. Additional evaluations indicated the pathway relationship built via such approach was consistent with available evidence. Further, based on the genes and pathways potentially associated with the patho...
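
One way to picture a distance between two pathways represented as gene modules in a PPI network is the average shortest-path length between their member genes. The abstract does not define the distance that was actually used, so the measure below is an assumed stand-in, and the toy network, gene names, and function names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def module_distance(graph, module_a, module_b):
    """Average shortest-path length over reachable cross-module gene pairs.

    Returns infinity when no gene in module_a can reach any gene in module_b.
    """
    lengths = []
    for gene in module_a:
        dist = shortest_path_lengths(graph, gene)
        lengths.extend(dist[other] for other in module_b if other in dist)
    return sum(lengths) / len(lengths) if lengths else float("inf")


# Toy undirected PPI network as an adjacency dict (hypothetical gene names).
PPI = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

if __name__ == "__main__":
    print(module_distance(PPI, {"A", "B"}, {"B", "C"}))  # overlapping modules
    print(module_distance(PPI, {"A"}, {"E"}))            # modules far apart
```

Pathways that share genes or sit next to each other in the network get a small value, so a threshold on this kind of score could be used to call pathway pairs closely connected.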

Network and Pathway-Based Analyses of Genes Associated with Parkinson's Disease

Molecular Neurobiology, Jan 27, 2017

Parkinson's disease (PD) is a major neurodegenerative disease influenced by both genetic and environmental factors. Although previous studies have provided insights into the significant impacts of genetic factors on PD, the molecular mechanism underlying PD remains largely unclear. In this situation, a comprehensive analysis focusing on the biological function and interactions of PD-related genes will provide valuable information for understanding the pathogenesis of PD. In the current study, by reviewing the literature deposited in PubMed, we identified 242 genes genetically associated with PD, referred to as the PD-related genes gene set (PDgset). Functional analysis revealed that biological processes and biochemical pathways related to neurodevelopment, metabolism, and the immune system were enriched in PDgset. Then, pathway crosstalk analysis indicated that the enriched pathways could be grouped into two modules, with one module consisting of pathways mainly involved in neuronal signa...

The inflammatory cytokine IL-6 induces FRA1 deacetylation promoting colorectal cancer stem-like properties

Oncogene, 2019

Colorectal cancer (CRC) has long been known for its tight association with chronic inflammation, thought to play a key role in tumor onset and malignant progression through the modulation of cancer stemness. However, the underlying molecular and cellular mechanisms are still largely elusive. Here we show that the IL-6/STAT3 inflammatory signaling axis induces the deacetylation of FRA1 at the Lys-116 residue located within its DNA-binding domain. The HDAC6 deacetylase underlies this key modification leading to the increase of FRA1 transcriptional activity, the subsequent transactivation of NANOG expression, and the acquisition of stem-like cellular features. As validated in a large (n = 123) CRC cohort, IL-6 secretion was invariably accompanied by increased FRA1 deacetylation at K116 and an overall increase in its protein levels, coincident with malignant progression and poor prognosis. Of note, combined treatment with the conventional cytotoxic drug 5-FU together with Tubastatin A, an HDAC6-specific inhibitor, resulted in a significant in vivo synergistic inhibitory effect on tumor growth through suppression of CRC stemness. Our results reveal a novel transcriptional and posttranslational regulatory cross-talk between inflammation and stemness signaling pathways that underlie self-renewal and maintenance of CRC stem cells and promote their malignant behavior. Combinatorial treatment aimed at the core regulatory mechanisms downstream of IL-6 may offer a novel promising approach for CRC treatment.

PncStress: a manually curated database of experimentally validated stress-responsive non-coding RNAs in plants

Database, 2020

Non-coding RNAs (ncRNAs) are recognized as key regulatory molecules in many biological processes. Accumulating evidence indicates that ncRNA-related mechanisms play important roles in plant stress responses. Although abundant plant stress-responsive ncRNAs have been identified, these experimentally validated results have not been gathered into a single public domain archive. Therefore, we established PncStress by curating experimentally validated stress-responsive ncRNAs in plants, including microRNAs, long non-coding RNAs and circular RNAs. The current version of PncStress contains 4227 entries from 114 plants covering 48 biotic and 91 abiotic stresses. For each entry, PncStress has biological information and network visualization. Serving as a manually curated database, PncStress will become a valuable resource in support of plant stress response research.

Common characteristics of Alzheimer's disease and Parkinson's disease based on AlzGene and PDGene databases

The Sixth National Conference on Bioinformatics & Systems Biology of China and International Workshop on Advanced Bioinformatics, 2014

GRONS: a comprehensive genetic resource of nicotine and smoking

Nicotine, the primary psychoactive component in tobacco, can exert a broad impact on both the central and peripheral nervous systems. In past years, a tremendous amount of effort has been put into exploring the molecular mechanisms underlying tobacco smoking-related behaviors and diseases, and many susceptibility genes have been identified via various genomic approaches. For many human complex diseases, there is a trend towards collecting and integrating the data from genetic studies and the biological information related to them into a comprehensive resource for further investigation, but we have not yet found such an effort for nicotine addiction or smoking-related phenotypes. To collect, curate, and integrate cross-platform genetic data so as to make them interpretable and easily accessible, we developed Genetic Resources Of Nicotine and Smoking (GRONS), a comprehensive database for genes related to the biological response to nicotine exposure and to tobacco smoking-related behaviors or diseases. GRONS deposits genes from nicotine addiction studies in the following four categories: association studies, genome-wide linkage scans, expression analyses of genes/proteins via high-throughput technologies, and single gene/protein-based experimental studies identified via literature search. Moreover, GRONS not only provides tools for data browsing, search and graphical presentation of gene prioritization, but also presents the results of comprehensive bioinformatics analyses for the prioritized genes associated with nicotine addiction. With more and more genetic data and analysis tools integrated, GRONS will become a useful resource for studies focusing on nicotine addiction or tobacco smoking.
Database URL: http://bioinfo.tmu.edu.cn/GRONS/

Research paper thumbnail of HALD, a human aging and longevity knowledge graph for precision gerontology and geroscience analyses

Figshare, 2023

about biomedical entities including gene, disease, chemical, mutation, species, and cell line fro... more about biomedical entities including gene, disease, chemical, mutation, species, and cell line from all published biomedical literature. Integrated datasets with comprehensive knowledge are crucial for researchers to leverage existing resources. Currently, there are some publicly online manually curated databases related to human aging and longevity, such as Aging genes/interventions database (AGEID) 8 , Human Ageing Genomic Resources (HAGR) 9 , JenAge Ageing Factor Database (AgeFactDB) 10 , Aging Atlas 11 , and AgingBank 12 (Table 1). AGEID is a database of experimental results that provides formatted gene/intervention reports related to aging 8. HAGR includes the GenAge, AnAge, GenDR, LongevityMap, DrugAge and CellAge databases that are manually curated by experts and regularly updated 9. AgeFactDB is aimed at the collection and integration of aging-related data including genes, chemical compounds, and other environmental cues 10. Aging Atlas is a manually curated biomedical database comprising a range of aging-related multi-omics datasets and bioinformatics tools 11. AgingBank documents high-quality aging-related associations in more than 50 species by manually reviewing more than 20,000 publicly published papers 12. However, to the best of our knowledge, these databases are all manually curated, making it difficult to incorporate comprehensive knowledge of human aging and longevity. It is also difficult to obtain the latest biomedical knowledge from manually curated databases as their services are out of maintenance or not updated in time. In addition, although human nucleic acids information is generally involved in these studies, knowledge of other important organic compounds like carbohydrate, lipid, and protein is not yet fully integrated. 
Relation extraction between these entities is also indispensable for researchers to facilitate integrative and comprehensive analysis. Associations between molecular markers and diseases also must be clarified to illuminate the mechanisms and effects of anti-aging therapies on aging-related diseases 13. A knowledge graph (KG) is widely used for knowledge domain visualization or knowledge domain mapping graphs in the library and information industry 14. In the field of life sciences, a biomedical KG can not only link biomedical entities through certain relations, but also predict the potential relationships between existing entities and discover new relational facts 15. Such characteristics can facilitate the understanding of relations between biomedical entities, which is crucial for researchers to refine their research scope. In this paper, we presented HALD, a human aging and longevity dataset of the biomedical KG from human aging and longevity-related literature in PubMed. Figure 1 illustrates the workflow of biomedical literature mining using multiple NLP techniques. First, we used the Bio.Entrez 16 python package to conduct literature retrieval. Then, we took web-based (PubTator 6), dictionary-based (Python re module), rule-based (Stanford CoreNLP 17), and DL-based (ScispaCy 18 and BERN 19) methods to conduct named entity recognition (NER) for better accuracy. Next, we combined NetworkX, OpenIE, and AllenNLP tools to conduct relation extraction (RE) for wider coverage. Finally, the entities were further identified as human aging and longevity biomarkers according to their relationships with aging-related diseases. Up to September 2023, we had annotated 339,918 abstracts from PubMed and curated 12,227 entities in 10 types (gene, RNA, carbohydrate, peptide, lipid, protein, Databases Aging/Longevity Data Last Update* AGEID (2002) 8 Aging and longevity Genes and interventions Not available AnAge (2013) 9 Aging and longevity Aging and life history Build 15

Research paper thumbnail of ncPlantDB: a plant ncRNA database with potential ncPEP information and cell type-specific interaction

Nucleic Acids Research, 2024

The field of plant non-coding RNAs (ncRNAs) has seen significant advancements in recent years, wi... more The field of plant non-coding RNAs (ncRNAs) has seen significant advancements in recent years, with many ncRNAs recognized as important regulators of gene expression during plant development and stress responses. Moreover, the coding potential of these ncRNAs, giving rise to ncRNA-encoded peptides (ncPEPs), has emerged as an essential area of study. However, existing plant ncRNA databases lack comprehensive information on ncRNA-encoded peptides (ncPEPs) and cell type-specific interactions. To address this gap, we present ncPlantDB (https://bis.zju.edu.cn/ncPlantDB), a comprehensive database integrating ncRNA and ncPEP data across 43 plant species. ncPlantDB encompasses 353 140 ncRNAs, 3799 ncPEPs and 4 647 071 interactions, sourced from established databases and literature mining. The database offers unique features including translational potential data, cell-specific interaction networks derived from single-cell RNA sequencing and Ribo-seq analyses, and interactive visualization tools. ncPlantDB provides a user-friendly interface for exploring ncRNA expression patterns at the single-cell level, facilitating the discovery of tissue-specific ncRNAs and potential ncPEPs. By integrating diverse data types and offering advanced analytical tools, ncPlantDB serves as a valuable resource for researchers investigating plant ncRNA functions, interactions, and their potential coding capacity. This database significantly enhances our understanding of plant ncRNA biology and opens new avenues for exploring the complex regulatory networks in plant genomics.

Research paper thumbnail of Benchmarking alternative polyadenylation detection in single-cell and spatial transcriptomes

bioRxiv, 2024

Background: 3′-tag-based sequencing methods have become the predominant approach for single-cell ... more Background: 3′-tag-based sequencing methods have become the predominant approach for single-cell and spatial transcriptomics, with some protocols proven effective in detecting alternative polyadenylation (APA). While numerous computational tools have been developed for APA detection from these sequencing data, the absence of comprehensive benchmarks and the diversity of sequencing protocols and tools make it challenging to select appropriate methods for APA analysis in these contexts.

HALD, a human aging and longevity knowledge graph for precision gerontology and geroscience analyses

Scientific Data, Nov 30, 2023

Systematic single-cell analysis reveals dynamic control of transposable element activity orchestrating the endothelial-to-hematopoietic transition

BMC Biology, 2024

The endothelial-to-hematopoietic transition (EHT) process during definitive hematopoiesis is highly conserved in vertebrates. Stage-specific expression of transposable elements (TEs) has been detected during zebrafish EHT and may promote hematopoietic stem cell (HSC) formation by activating inflammatory signaling. However, little is known about how TEs contribute to the EHT process in human and mouse. We reconstructed the single-cell EHT trajectories of human and mouse and resolved the dynamic expression patterns of TEs during EHT. Most TEs presented a transient co-upregulation pattern along the conserved EHT trajectories, coinciding with the temporal relaxation of epigenetic silencing systems. TE products can be sensed by multiple pattern recognition receptors, triggering inflammatory signaling to facilitate HSC emergence. Interestingly, we observed that hypoxia-related signals were enriched in cells with higher TE expression. Furthermore, we constructed the hematopoietic cis-regulatory network of accessible TEs and identified potential TE-derived enhancers that may boost the expression of specific EHT marker genes. Our study provides a systematic vision of how TEs are dynamically controlled to promote the hematopoietic fate decisions through transcriptional and cis-regulatory networks, and pre-train the immunity of nascent HSCs.

Systematic single-cell analysis reveals dynamic control of transposable element activity orchestrating the endothelial-to-hematopoietic transition

Background: The endothelial-to-hematopoietic transition (EHT) process during definitive hematopoiesis in vertebrates is highly conserved. Stage-specific expression of transposable elements (TEs) has been detected during zebrafish EHT and may promote hematopoietic stem cell formation by activating inflammatory signaling. However, little is known about how TEs contribute to the EHT process in human and mouse. Results: We reconstructed the single-cell EHT trajectories of human and mouse, and resolved the dynamic expression patterns of TEs during EHT. Most TEs presented a transient co-upregulation pattern along the conserved EHT trajectories. Enhanced TE activation was tightly associated with the temporal relaxation of epigenetic silencing systems. TE products can be sensed by multiple pattern recognition receptors, triggering inflammatory signaling to facilitate the emergence of hematopoietic stem cells. Furthermore, we observed that hypoxia-related signals were enriched in cells with higher...

CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology structure

Bioinformatics, 2022

Motivation: Quantifying the similarity of human diseases provides guiding insights to the discovery of micro-scope mechanisms from a macro scale. Previous work demonstrated that better performance can be gained by integrating multi-view data sources or applying machine learning techniques. However, designing an efficient framework to extract and incorporate information from different biological data using deep learning models remains unexplored. Results: We present CoGO, a Contrastive learning framework to predict disease similarity based on Gene network and Ontology structure, which incorporates the gene interaction network and gene ontology (GO) domain knowledge using graph deep learning models. First, graph deep learning models are applied to encode the features of genes and GO terms from separate graph structure data. Next, gene and GO features are projected to a common embedding space via a non-linear projection. Then cross-view contrastive loss is applied to maximize the agreeme...

CoVM2: Molecular Biological Data Integration of SARS-CoV-2 Proteins in a Macro-to-Micro Method

Biomolecules, 2022

The COVID-19 pandemic has been a major public health event since 2020. Multiple variant strains of SARS-CoV-2, the causative agent of COVID-19, were detected based on the mutation sites in their sequences. These sequence mutations may lead to changes in the protein structures and affect the binding states of SARS-CoV-2 and human proteins. Experimental research on SARS-CoV-2 has accumulated a large amount of structural data and protein-protein interactions (PPIs), but the studies on the SARS-CoV-2–human PPI networks lack integration of physical associations with possible protein docking information. In addition, the docking structures of variant viral proteins with human receptor proteins are still insufficient. This study constructed SARS-CoV-2–human protein–protein interaction network with data integration methods. Crystal structures were collected to map the interaction pairs. The pairs of direct interactions and physical associations were selected and analyzed for variant docking calculations. The study examined the structures of spike (S) glycoprotein of variants Delta B.1.617.2, Omicron BA.1, and Omicron BA.2. The calculated docking structures of S proteins and potential human receptors were obtained. The study integrated binary protein interactions with 3D docking structures to fulfill an extended view of SARS-CoV-2 proteins from a macro- to micro-scale.

LBD: a manually curated database of experimentally validated lymphoma biomarkers

Database, 2022

Lymphoma is a heterogeneous disease caused by malignant proliferation of lymphocytes, resulting in significant mortality worldwide. While more and more lymphoma biomarkers have been identified with the advent and development of precision medicine, there are currently no databases dedicated to systematically gathering these scattered treasures. Therefore, we developed a lymphoma biomarker database (LBD) to curate experimentally validated lymphoma biomarkers in this study. LBD consists of 793 biomarkers extracted from 978 articles covering diverse subtypes of lymphomas, including 715 single and 78 combined biomarkers. These biomarkers can be categorized into molecular, cellular, image, histopathological, physiological and other biomarkers with various functions such as prognosis, diagnosis and treatment. As a manually curated database that provides comprehensive information about lymphoma biomarkers, LBD is helpful for personalized diagnosis and treatment of lymphoma.

Database URL: http://bis.zju.edu.cn/LBD

Analyzing the genes related to Alzheimer's disease via a network and pathway-based approach

Alzheimer's Research & Therapy, Jan 27, 2017

Our understanding of the molecular mechanisms underlying Alzheimer's disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the ...

Detecting pathway relationship in the context of human protein-protein interaction network and its application to Parkinson's disease

Methods (San Diego, Calif.), Dec 5, 2017

In human physiological conditions like complex diseases, a large number of genes/proteins, as well as their interactions, are involved. Thus, detecting the biochemical pathways enriched in these genes/proteins and identifying the pathway relationships is critical to understand the molecular mechanisms underlying a disease and can also be valuable in selecting the potential molecular targets for further exploration. In this study, we proposed a method to measure the relationship between pathways based on their distribution in the human PPI network. By representing each pathway as a gene module in the PPI network, a distance was calculated to measure the closeness of two pathways. For the pathways in the KEGG database, a total of 2143 pathway pairs with close connections were identified. Additional evaluations indicated the pathway relationship built via such approach was consistent with available evidence. Further, based on the genes and pathways potentially associated with the patho...

Network and Pathway-Based Analyses of Genes Associated with Parkinson's Disease

Molecular Neurobiology, Jan 27, 2017

Parkinson's disease (PD) is a major neurodegenerative disease influenced by both genetic and environmental factors. Although previous studies have provided insights into the significant impacts of genetic factors on PD, the molecular mechanism underlying PD remains largely unclear. Under such situation, a comprehensive analysis focusing on biological function and interactions of PD-related genes will provide us valuable information to understand the pathogenesis of PD. In the current study, by reviewing the literatures deposited in PUBMED, we identified 242 genes genetically associated with PD, referred to as PD-related genes gene set (PDgset). Functional analysis revealed that biological processes and biochemical pathways related to neurodevelopment, metabolism, and immune system were enriched in PDgset. Then, pathway crosstalk analysis indicated that the enriched pathways could be grouped into two modules, with one module consisted of pathways mainly involved in neuronal signa...

The inflammatory cytokine IL-6 induces FRA1 deacetylation promoting colorectal cancer stem-like properties

Oncogene, 2019

Colorectal cancer (CRC) has long been known for its tight association with chronic inflammation, thought to play a key role in tumor onset and malignant progression through the modulation of cancer stemness. However, the underlying molecular and cellular mechanisms are still largely elusive. Here we show that the IL-6/STAT3 inflammatory signaling axis induces the deacetylation of FRA1 at the Lys-116 residue located within its DNA-binding domain. The HDAC6 deacetylase underlies this key modification leading to the increase of FRA1 transcriptional activity, the subsequent transactivation of NANOG expression, and the acquisition of stem-like cellular features. As validated in a large (n = 123) CRC cohort, IL-6 secretion was invariably accompanied by increased FRA1 deacetylation at K116 and an overall increase in its protein levels, coincident with malignant progression and poor prognosis. Of note, combined treatment with the conventional cytotoxic drug 5-FU together with Tubastatin A, a HDAC6-specific inhibitor, resulted in a significant in vivo synergistic inhibitory effect on tumor growth through suppression of CRC stemness. Our results reveal a novel transcriptional and posttranslational regulatory cross-talk between inflammation and stemness signaling pathways that underlie self-renewal and maintenance of CRC stem cells and promote their malignant behavior. Combinatorial treatment aimed at the core regulatory mechanisms downstream of IL-6 may offer a novel promising approach for CRC treatment.

PncStress: a manually curated database of experimentally validated stress-responsive non-coding RNAs in plants

Database, 2020

Non-coding RNAs (ncRNAs) are recognized as key regulatory molecules in many biological processes. Accumulating evidence indicates that ncRNA-related mechanisms play important roles in plant stress responses. Although abundant plant stress-responsive ncRNAs have been identified, these experimentally validated results have not been gathered into a single public domain archive. Therefore, we established PncStress by curating experimentally validated stress-responsive ncRNAs in plants, including microRNAs, long non-coding RNAs and circular RNAs. The current version of PncStress contains 4227 entries from 114 plants covering 48 biotic and 91 abiotic stresses. For each entry, PncStress has biological information and network visualization. Serving as a manually curated database, PncStress will become a valuable resource in support of plant stress response research.

Common characteristics of Alzheimer's disease and Parkinson's disease based on AlzGene and PDGene databases

The Sixth National Conference on Bioinformatics & Systems Biology of China and International Workshop on Advanced Bioinformatics, 2014

GRONS: a comprehensive genetic resource of nicotine and smoking

Nicotine, the primary psychoactive component in tobacco, can exert a broad impact on both the central and peripheral nervous systems. During the past years, a tremendous amount of efforts has been put to exploring the molecular mechanisms underlying tobacco smoking related behaviors and diseases, and many susceptibility genes have been identified via various genomic approaches. For many human complex diseases, there is a trend towards collecting and integrating the data from genetic studies and the biological information related to them into a comprehensive resource for further investigation, but we have not found such an effort for nicotine addiction or smoking-related phenotypes yet. To collect, curate, and integrate cross-platform genetic data so as to make them interpretable and easily accessible, we developed Genetic Resources Of Nicotine and Smoking (GRONS), a comprehensive database for genes related to biological response to nicotine exposure, tobacco smoking related behaviors or diseases. GRONS deposits genes from nicotine addiction studies in the following four categories, i.e. association study, genome-wide linkage scan, expression analysis on genes/proteins via high-throughput technologies, as well as single gene/protein-based experimental studies via literature search. Moreover, GRONS not only provides tools for data browse, search and graphical presentation of gene prioritization, but also presents the results from comprehensive bioinformatics analyses for the prioritized genes associated with nicotine addiction. With more and more genetic data and analysis tools integrated, GRONS will become a useful resource for studies focusing on nicotine addiction or tobacco smoking.
Database URL: http://bioinfo.tmu.edu.cn/GRONS/