Miguel Andrade - Academia.edu
Papers by Miguel Andrade
Journal of Biomedical Discovery and Collaboration, 2010
The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas and in the use of scientific language may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time-consuming. The authors’ MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and of historical variations in term usage intensity. Terms considered are individual words or quoted phrases, which may be combined using Boolean operators. For multiple terms simultaneously, MLTrends can plot the number of MEDLINE records per year whose titles or abstracts match each queried term. The MEDLINE database is stored and indexed on the MLTrends server, allowing queries to be completed and graphs generated in less than one second. Queries may be performed on ...
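The abstract does not describe how MLTrends is implemented. As a rough illustration only, the following Python sketch counts, per publication year, the records whose title or abstract matches a Boolean combination of words or phrases. It assumes a small in-memory list of records scanned linearly, unlike the indexed server described above; `Record`, `term`, `AND`, `OR`, `NOT` and `counts_per_year` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Record(NamedTuple):
    """One bibliographic record (hypothetical, simplified MEDLINE entry)."""
    year: int
    title: str
    abstract: str


Query = Callable[[Record], bool]


def term(phrase: str) -> Query:
    """Match a single word or a quoted phrase as whole words, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return lambda r: bool(pattern.search(r.title) or pattern.search(r.abstract))


def AND(*queries: Query) -> Query:
    return lambda r: all(q(r) for q in queries)


def OR(*queries: Query) -> Query:
    return lambda r: any(q(r) for q in queries)


def NOT(query: Query) -> Query:
    return lambda r: not query(r)


def counts_per_year(records: Iterable[Record], query: Query) -> dict[int, int]:
    """Records per year whose title or abstract matches (linear scan, no index)."""
    counts = Counter(r.year for r in records if query(r))
    return dict(sorted(counts.items()))
```

A query such as `counts_per_year(records, AND(term("microarray"), NOT(term("yeast"))))` returns a year-to-count mapping that could then be plotted as a trend line; a production system would replace the linear scan with an inverted index.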
Journal of Cell Science, Jan 23, 2015
The preprophase band of microtubules performs the critical function of marking the plane of cell division. Although the preprophase band depolymerises at the onset of mitosis, the division plane is 'memorized' by a cortical division zone to which the phragmoplast is attracted during cytokinesis. Proteins that form part of this molecular memory have been discovered, but little is known about how they contribute to phragmoplast guidance. Previously, we found that the microtubule-associated protein AIR9 localizes to the cortical division zone at preprophase, is absent from the cortex during the intervening mitosis and returns during cell plate insertion. To identify new components of the preprophase memory, we searched for protein interactors of AIR9. We detected the kinesin-like calmodulin-binding protein (KCBP), which can be visualised at the predicted cortical site throughout division. A truncation study of KCBP indicates that its MyTH4-FERM domain is required for linking the m...
The simultaneous genotyping of thousands of single nucleotide polymorphisms (SNPs) in a genome using SNP arrays is a very important tool that is revolutionizing genetics and molecular biology. We expanded the utility of this technique by applying it after chromatin immunoprecipitation (ChIP) to assess the multiple genomic locations protected by a protein complex recognized by an antibody. The power of this technique is illustrated through an analysis of the changes in histone H4 acetylation, a marker of open chromatin and transcriptionally active genomic regions, that occur during the differentiation of human myoblasts into myotubes. The findings were validated by a significant correlation between the detected histone modifications and the expression of nearby genes, as measured by DNA expression microarrays. This chapter focuses on the computational analysis of the data.
Methods in Molecular Biology, 2007
StemBase is a database of gene expression data obtained from stem cells and their derivatives, mainly from mouse and human, using DNA microarrays and Serial Analysis of Gene Expression. Here, we describe this database and indicate ways to use it to study the expression of particular genes in stem cells or to search for genes with particular expression profiles in stem cells, which could be associated with stem cell function or used as stem cell markers.
BMC Cancer, Jan 29, 2004
Recently, several members of a vertebrate protein family containing a six-transmembrane (6TM) domain and involved in apoptosis and cancer (e.g. STEAP, STAMP1, TSAP6) have been identified in Golgi and cytoplasmic membranes. The exact function of these proteins remains unknown. We related this 6TM domain to distant protein families using intermediate sequences and methods of iterative profile sequence similarity search. Here we show for the first time that this 6TM domain is homologous to the 6TM heme-binding domain of both the NADPH oxidase (Nox) family and the YedZ family of bacterial oxidoreductases. This finding provides novel insight into the existence of a previously undetected electron transfer system involved in apoptosis and cancer, and suggests further steps in the experimental characterization of these evolutionarily related families.
BMC Genetics, Jan 22, 2005
Human inherited diseases can be associated by genetic linkage with one or more genomic regions. The availability of the complete sequence of the human genome allows these regions to be examined for an associated gene. We previously developed an algorithm to prioritize genes in a chromosomal region according to their possible relation to an inherited disease, using a combination of data mining on biomedical databases and gene sequence analysis. We have implemented this method as a web application on our site G2D (Genes to Diseases). It allows users to inspect any region of the human genome to find candidate genes related to a genetic disease of interest. In addition, the G2D server includes pre-computed analyses of candidate genes for 552 linked monogenic diseases without an associated gene, and the analysis of 18 asthma loci. G2D can be publicly accessed at http://www.ogic.ca/projects/g2d_2/.
Discovering Biomolecular Mechanisms with Computational Biology
The first step in understanding the molecular biology of an inherited disease is to identify which gene or genes carry variants. This process starts with locating the mutations in a chromosomal band, as narrow as possible, and continues with the manual analysis of all ...
Pharmacological Reviews, 2014
The Mas-related G protein-coupled receptors (Mrgprs or Mas-related genes) comprise a subfamily of receptors named after the first discovered member, Mas. For most Mrgprs, pruriception seems to be the major function, based on the following observations: 1) they are relatively promiscuous in their ligand specificity, with the best affinities for itch-inducing substances; 2) they are expressed in sensory neurons and mast cells in the skin, the main cellular components of pruriception; and 3) they first appear in evolution in tetrapods, which have the arms and legs necessary for scratching to remove parasites or other noxious substances from the skin before they cause harm. Because parasites coevolved with hosts, each species faced different parasitic challenges, which may explain another striking observation: the multiple independent duplication and expansion events of Mrgpr genes in different species as a consequence of parallel adaptive evolution. Their predominant expression in dorsal root g...
Genome Biology, 2002
BACKGROUND: Iron uptake from the host is essential for bacteria that infect animals. To find potential targets for drugs active against pathogenic bacteria, we searched all completely sequenced genomes of pathogenic bacteria for genes relevant to iron transport. RESULTS: We identified a protein domain that appears in variable copy number in bacterial genes that are usually in the vicinity ...
Plant Signaling & Behavior, 2007
Abbreviations: AIR9, Auxin-Induced in Root Cultures 9; IgG, γ-immunoglobulin domain; A9 domain, IgG domain found in AIR9-like proteins; GFP, green fluorescent protein; LRR, leucine-rich repeat; MAP, microtubule-associated protein; PPB, preprophase band. Acknowledgements: This work was supported by a BBSRC Grant to Clive W. Lloyd. Miguel A. Andrade-Navarro is a Canada Research Chair in Bioinformatics.
Nucleic Acids Research, 2003
As scientific literature databases like MEDLINE grow, so does the time required to search them. Scientists must frequently inspect long lists of references manually, often reading just the titles. XplorMed is a web tool that aids MEDLINE searching by summarizing the subjects contained in the results, thus allowing users to focus on subjects of interest. Here we describe new features added to XplorMed during the last 2 years (http://www.bork.embl-heidelberg.de/xplormed/).
Nucleic Acids Research, 2012
Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases, such as Huntington's disease and several ataxias. Protein aggregation, a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed the information available in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This most likely happens through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function by polyQ expansion results in the gain of abnormal interactions, leading to pathological effects such as protein aggregation. Our analyses suggest that research on polyQ proteins should shift its focus from expanded polyQ proteins to the characterization of the influence of wild-type polyQ on protein interactions.
Nucleic Acids Research, 2014
Some groups of genes require coordinated repression in multiple contexts, for example when they code for proteins that work together in a pathway or in a protein complex. Redundancy of biological regulatory networks implies that such coordinated repression might occur at both the pre- and post-transcriptional levels, though not necessarily simultaneously or under the same conditions. Here, we propose that such redundancy in the global regulatory network can be detected through the overlap between the putative targets of a transcriptional repressor, as identified by a ChIP-seq experiment, and the predicted targets of a microRNA (miRNA). To test this hypothesis, we used publicly available ChIP-seq data for the neural transcriptional repressor RE1 silencing transcription factor (REST) from 15 different cell samples. We found 20 miRNAs, each of which shares a significant number of predicted targets with REST. The set of predicted associations between these 20 miRNAs and the overlapping REST targets is enriched in known miRNA targets. Many of the detected miRNAs have functions related to neural identity and glioblastoma, as could be expected from their overlap in targets with REST. We propose that integrating experimentally determined transcription factor binding sites with miRNA-target predictions provides functional information on miRNAs.
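The abstract reports that 20 miRNAs share a significant number of predicted targets with REST but does not state which test was used. One common way to score such an overlap, assumed here purely for illustration, is a one-sided hypergeometric test over a shared gene universe. The Python sketch below uses only the standard library; `overlap_pvalue` and `target_overlap` are hypothetical helpers, and the input gene sets would come from ChIP-seq peaks and miRNA-target predictions.

```python
from __future__ import annotations

from math import comb


def overlap_pvalue(universe: int, n_a: int, n_b: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the overlap between a random set of size n_b and a fixed set of
    size n_a, both drawn without replacement from `universe` genes.
    """
    upper = min(n_a, n_b)
    total = comb(universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)  # comb() is 0 when k > n
        for k in range(overlap, upper + 1)
    )
    return tail / total


def target_overlap(universe: set[str], tf_targets: set[str],
                   mirna_targets: set[str]) -> tuple[int, float]:
    """Shared targets of a repressor and a miRNA, with the overlap p-value."""
    a = tf_targets & universe  # restrict both sets to the same gene universe
    b = mirna_targets & universe
    shared = len(a & b)
    return shared, overlap_pvalue(len(universe), len(a), len(b), shared)
```

Because many miRNAs would each be tested against the same REST target set, the resulting p-values would in practice need a multiple-testing correction before calling any overlap significant.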
Journal of Structural Biology, 2001
Internal repetition within proteins has been a successful stratagem on multiple separate occasions throughout evolution. Such protein repeats possess regular secondary structures and form multirepeat assemblies in three dimensions of diverse sizes and functions. In general, however, internal repetition affords a protein enhanced evolutionary prospects due to an enlargement of its available binding surface area. Constraints on sequence conservation appear to be relatively lax, owing to binding functions ensuing from multiple, rather than single, repeats. Considerable sequence divergence, as well as the short lengths of sequence repeats, means that repeat detection can be a particularly arduous task. We also consider the conundrum of how multiple repeats, which show strong structural and functional interdependencies, ever evolved from a single-repeat ancestor. In this review, we illustrate each of these points by referring to six prolific repeat types (repeats in β-propellers and β-trefoils, and tetratricopeptide, ankyrin, armadillo/HEAT and leucine-rich repeats) and to other less prolific but nonetheless interesting repeats.
Briefings in Bioinformatics, 2006
The cross-disciplinary nature of bioinformatics entails co-evolution with other biomedical disciplines, whereby some bioinformatics applications become popular in certain disciplines and, in turn, these disciplines influence the focus of future bioinformatics development efforts. We observe here that the growth of computational approaches within various biomedical disciplines is not merely a reflection of a general increase in the use of computers and the Internet, but is due to the production of bioinformatics databases and methods useful to the rest of the biomedical scientific community. We used the abstracts stored both in the MEDLINE database of biomedical literature and in NIH-funded project grants to quantify two effects. First, we examine the biomedical literature as a whole and find that the use of computational methods has become increasingly prevalent across biomedical disciplines over the past three decades, while the use of databases and the Internet has been increasing rapidly over the past decade. Second, we study the recent trends in the use of bioinformatics topics. We observe that molecular sequence databases are a widely adopted contribution from bioinformatics to biomedicine, and that microarray analysis is one of the major new topics taken up by the bioinformatics community. Through this analysis, we were able to identify areas of rapid growth in the use of informatics, to aid in curriculum planning, development of computational infrastructure and strategies for workforce education and funding.
BMC Research Notes, 2009
Background: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. Findings: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once or slices of data based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. Conclusion: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.
BMC Evolutionary Biology, 2010
Background: Naturally occurring antisense transcripts (NATs) are non-coding RNAs that may regulate the activity of the sense transcripts to which they bind because of complementarity. NATs that are not located in the gene they regulate (trans-NATs) have a better chance of evolving than cis-NATs, which is evident when the sense strand of the cis-NAT is part of a protein-coding gene. However, the generation of a trans-NAT requires the formation of a relatively large region of complementarity to the gene it regulates. Results: Pseudogene formation may be one evolutionary mechanism that generates trans-NATs to the parental gene. For example, this could occur if the parental gene is regulated by a cis-NAT that is copied as a trans-NAT in the pseudogene. To support this, we identified, by analysis of the database of expressed sequence tags (ESTs), human pseudogenes carrying in their antisense strand a trans-NAT to the parental gene. We found that the mutations that appeared in these trans-NATs after pseudogene formation do not show the flat distribution that would be expected in a non-functional transcript. Instead, we found higher similarity to the parental gene in a region near the 3' end of the trans-NATs. Conclusions: Our results do not establish a functional relation between trans-NATs arising from pseudogenes and their respective parental genes, but they add evidence for it and stress the importance of duplication mechanisms of genetic material in the generation of non-coding RNAs. We also provide a plausible explanation for the large transcripts that can be found in the antisense strand of some pseudogenes.
BMC Bioinformatics, 2013
Background: A frequent query from scientists reading a biomedical abstract is a search for topic-related documents in bibliographic databases. Such a query is challenging because a single abstract carries little information, whereas classification-based retrieval algorithms are optimally trained with large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references. Results: Data on cited references and text sections in 249,108 full-text biomedical articles were extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of articles (98.4%) and their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstra...
Bioinformatics, 1998
Motivation: The explosive growth of biological sequence databases stimulated by genome projects has modified the framework of several applications in biological sequence analysis. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for clustering them. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed. Results: In this work, we present a new strategy to reduce the computational cost associated with clustering large sets of sequences that are expected to contain several families. The strategy is based on grouping the sequences into families using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a reduction in computational space of about an order of magnitude over more traditional all-versus-all comparison methods. This approach produces family groupings that closely reproduce already accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors with a dynamic scheduling strategy for performance optimization. Availability: By anonymous ftp at ftp.ac.uma.es
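The abstract does not detail the dynamic-threshold criterion, so the following Python sketch shows only the general idea of grouping sequences into families by a threshold on pairwise similarity, using a fixed threshold and single linkage. As one simple way of avoiding some comparisons, it skips pairs already linked into the same family; this is not the published strategy or its parallel implementation. The k-mer Jaccard score is an assumed stand-in for a real sequence similarity measure, and all names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def cluster(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage families: link any pair with similarity >= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in one family: skip the costly comparison
        if similarity(seqs[a], seqs[b]) >= threshold:
            parent[ra] = rb

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())
```

Skipping already-linked pairs leaves the single-linkage result unchanged, since joining two members of the same connected component has no effect; the savings grow as families become large.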
Journal of Biomedical Discovery and Collaboration, 2010
The MEDLINE database of medical literature is routinely used by researchers and doctors to find a... more The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas and use of scientific language may be gained by chronological analysis of the 18 million records currently in the database, however such analysis is generally complex and time consuming. The authors’ MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. Terms considered are individual words or quoted phrases which may be combined using Boolean operators. MLTrends can plot the number of records in MEDLINE per year whose titles or abstracts match each queried term for multiple terms simultaneously. The MEDLINE database is stored and indexed on the MLTrends server allowing queries to be completed and graphs generated in less than one second. Queries may be performed on ...
Journal of cell science, Jan 23, 2015
The preprophase band of microtubules performs the critical function of marking the plane of cell ... more The preprophase band of microtubules performs the critical function of marking the plane of cell division. Although the preprophase band depolymerises at the onset of mitosis, the division plane is 'memorized' by a cortical division zone to which the phragmoplast is attracted during cytokinesis. Proteins have been discovered that are part of the molecular memory but little is known about how they contribute to phragmoplast guidance. Previously, we found that the microtubule-associated protein AIR9 is found in the cortical division zone at preprophase and returns during cell plate insertion but is absent from the cortex during the intervening mitosis. To identify new components of the preprophase memory we searched for protein interactors of AIR9. We detected the kinesin-like calmodulin binding protein, KCBP, which can be visualised at the predicted cortical site throughout division. A truncation study of KCBP indicates that its MyTH4-FERM domain is required for linking the m...
The simultaneous genotyping of thousands of single nucleotide polymorphisms (SNPs) in a genome us... more The simultaneous genotyping of thousands of single nucleotide polymorphisms (SNPs) in a genome using SNP-Arrays is a very important tool that is revolutionizing genetics and molecular biology. We expanded the utility of this technique by using it following chromatin immunoprecipitation (ChIP) to assess the multiple genomic locations protected by a protein complex recognized by an antibody. The power of this technique is illustrated through an analysis of the changes in histone H4 acetylation, a marker of open chromatin and transcriptionally active genomic regions, which occur during differentiation of human myoblasts into myotubes. The findings have been validated by the observation of a significant correlation between the detected histone modifications and the expression of the nearby genes, as measured by DNA expression microarrays. This chapter focuses on the computational analysis of the data.
Methods in Molecular Biology, 2007
StemBase is a database of gene expression data obtained from stem cells and derivatives mainly fr... more StemBase is a database of gene expression data obtained from stem cells and derivatives mainly from mouse and human using DNA microarrays and Serial Analysis of Gene Expression. Here, we describe this database and indicate ways to use it for the study the expression of particular genes in stem cells or to search for genes with particular expression profiles in stem cells, which could be associated to stem cell function or used as stem cell markers.
BMC cancer, Jan 29, 2004
Recently, several members of a vertebrate protein family containing a six trans-membrane (6TM) do... more Recently, several members of a vertebrate protein family containing a six trans-membrane (6TM) domain and involved in apoptosis and cancer (e.g. STEAP, STAMP1, TSAP6), have been identified in Golgi and cytoplasmic membranes. The exact function of these proteins remains unknown. We related this 6TM domain to distant protein families using intermediate sequences and methods of iterative profile sequence similarity search. Here we show for the first time that this 6TM domain is homolog to the 6TM heme binding domain of both the NADPH oxidase (Nox) family and the YedZ family of bacterial oxidoreductases. This finding gives novel insights about the existence of a previously undetected electron transfer system involved in apoptosis and cancer, and suggests further steps in the experimental characterization of these evolutionarily related families.
BMC genetics, Jan 22, 2005
Human inherited diseases can be associated by genetic linkage with one or more genomic regions. T... more Human inherited diseases can be associated by genetic linkage with one or more genomic regions. The availability of the complete sequence of the human genome allows examining those locations for an associated gene. We previously developed an algorithm to prioritize genes on a chromosomal region according to their possible relation to an inherited disease using a combination of data mining on biomedical databases and gene sequence analysis. We have implemented this method as a web application in our site G2D (Genes to Diseases). It allows users to inspect any region of the human genome to find candidate genes related to a genetic disease of their interest. In addition, the G2D server includes pre-computed analyses of candidate genes for 552 linked monogenic diseases without an associated gene, and the analysis of 18 asthma loci. G2D can be publicly accessed at http://www.ogic.ca/projects/g2d_2/.
Discovering Biomolecular Mechanisms with Computational Biology
The first step in understanding the molecular biology of an inherited disease is to identify whic... more The first step in understanding the molecular biology of an inherited disease is to identify which gene or genes are carrying variants. This process starts with locating the mutations in a chromosomal band, as narrow as possible, and follows with the manual analysis of all ...
Pharmacological reviews, 2014
The Mas-related G protein-coupled receptors (Mrgprs or Mas-related genes) comprise a subfamily of... more The Mas-related G protein-coupled receptors (Mrgprs or Mas-related genes) comprise a subfamily of receptors named after the first discovered member, Mas. For most Mrgprs, pruriception seems to be the major function based on the following observations: 1) they are relatively promiscuous in their ligand specificity with best affinities for itch-inducing substances; 2) they are expressed in sensory neurons and mast cells in the skin, the main cellular components of pruriception; and 3) they appear in evolution first in tetrapods, which have arms and legs necessary for scratching to remove parasites or other noxious substances from the skin before they create harm. Because parasites coevolved with hosts, each species faced different parasitic challenges, which may explain another striking observation, the multiple independent duplication and expansion events of Mrgpr genes in different species as a consequence of parallel adaptive evolution. Their predominant expression in dorsal root g...
Genome Biology, 2002
BACKGROUND: Iron uptake from the host is essential for bacteria that infect animals. To find pote... more BACKGROUND: Iron uptake from the host is essential for bacteria that infect animals. To find potential targets for drugs active against pathogenic bacteria, we have searched all completely sequenced genomes of pathogenic bacteria for genes relevant for iron transport. RESULTS: We identified a protein domain that appears in variable copy number in bacterial genes that are usually in the vicinity
Plant Signaling & Behavior, 2007
AIR9 Auxin-Induced in Root Cultures 9 IgG g-immunoglobulin domain A9 domain IgG domain found in A... more AIR9 Auxin-Induced in Root Cultures 9 IgG g-immunoglobulin domain A9 domain IgG domain found in AIR9-like proteins GFP green fluorescent protein LRR leucine-rich repeat MAP microtubule associated protein PPB preprophase band ACKNoWLedgeMeNtS This work was supported by a BBSRC Grant to Clive W. Lloyd. Miguel A. Andrade-Navarro is a Canada Research Chair in Bioinformatics.
Nucleic Acids Research, 2003
As scientific literature databases like MEDLINE increase in size, so does the time required to se... more As scientific literature databases like MEDLINE increase in size, so does the time required to search them. Scientists must frequently inspect long lists of references manually, often just reading the titles. XplorMed is a web tool that aids MEDLINE searching by summarizing the subjects contained in the results, thus allowing users to focus on subjects of interest. Here we describe new features added to XplorMed during the last 2 years (http://www.bork. embl-heidelberg.de/xplormed/).
Nucleic Acids Research, 2012
Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches a... more Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's and several Ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This happens most likely through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in gain of abnormal interactions, leading to pathological effects like protein aggregation. Our analyses suggest that research on polyQ proteins should shift focus from expanded polyQ proteins into the characterization of the influence of the wild-type polyQ on protein interactions.
Nucleic Acids Research, 2014
There are groups of genes that need coordinated repression in multiple contexts, for example if t... more There are groups of genes that need coordinated repression in multiple contexts, for example if they code for proteins that work together in a pathway or in a protein complex. Redundancy of biological regulatory networks implies that such coordinated repression might occur at both the pre-and posttranscriptional level, though not necessarily simultaneously or under the same conditions. Here, we propose that such redundancy in the global regulatory network can be detected by the overlap between the putative targets of a transcriptional repressor, as identified by a ChIP-seq experiment, and predicted targets of a microRNA (miRNA). To test this hypothesis, we used publicly available ChIP-seq data of the neural transcriptional repressor RE1 silencing transcription factor (REST) from 15 different cell samples. We found 20 miRNAs, each of which shares a significant amount of predicted targets with REST. The set of predicted associations between these 20 miRNAs and the overlapping REST targets is enriched in known miRNA targets. Many of the detected miRNAs have functions related to neural identity and glioblastoma, which could be expected from their overlap in targets with REST. We propose that the integration of experimentally determined transcription factor binding sites with miRNA-target predictions provides functional information on miRNAs.
Journal of Structural Biology, 2001
Internal repetition within proteins has been a successful stratagem on multiple separate occasions throughout evolution. Such protein repeats possess regular secondary structures and form multirepeat assemblies in three dimensions of diverse sizes and functions. In general, however, internal repetition affords a protein enhanced evolutionary prospects due to an enlargement of its available binding surface area. Constraints on sequence conservation appear to be relatively lax, because binding functions arise from multiple, rather than single, repeats. Considerable sequence divergence, as well as the short lengths of sequence repeats, means that repeat detection can be a particularly arduous task. We also consider the conundrum of how multiple repeats, which show strong structural and functional interdependencies, ever evolved from a single repeat ancestor. In this review, we illustrate each of these points by referring to six prolific repeat types (repeats in β-propellers and β-trefoils, and tetratricopeptide, ankyrin, armadillo/HEAT, and leucine-rich repeats) and to other less prolific but nonetheless interesting repeats.
Briefings in Bioinformatics, 2006
The cross-disciplinary nature of bioinformatics entails co-evolution with other biomedical disciplines, whereby some bioinformatics applications become popular in certain disciplines and, in turn, these disciplines influence the focus of future bioinformatics development efforts. We observe here that the growth of computational approaches within various biomedical disciplines is not merely a reflection of a general increase in the use of computers and the Internet, but is due to the production of useful bioinformatics databases and methods for the rest of the biomedical scientific community. We have used the abstracts stored both in the MEDLINE database of biomedical literature and in NIH-funded project grants to quantify two effects. First, we examine the biomedical literature as a whole and find that the use of computational methods has become increasingly prevalent across biomedical disciplines over the past three decades, while the use of databases and the Internet has been rapidly increasing over the past decade. Second, we study the recent trends in the use of bioinformatics topics. We observe that molecular sequence databases are a widely adopted contribution in biomedicine from the field of bioinformatics, and that microarray analysis is one of the major new topics taken up by the bioinformatics community. Via this analysis, we were able to identify areas of rapid growth in the use of informatics to aid in curriculum planning, development of computational infrastructure and strategies for workforce education and funding.
BMC Research Notes, 2009
Background: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. Findings: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. Conclusion: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.
BMC Evolutionary Biology, 2010
Background: Naturally occurring antisense transcripts (NATs) are non-coding RNAs that may regulate the activity of the sense transcripts to which they bind through complementarity. NATs that are not located in the gene they regulate (trans-NATs) have better chances to evolve than cis-NATs, which is evident when the sense strand of the cis-NAT is part of a protein coding gene. However, the generation of a trans-NAT requires the formation of a relatively large region of complementarity to the gene it regulates. Results: Pseudogene formation may be one evolutionary mechanism that generates trans-NATs to the parental gene. For example, this could occur if the parental gene is regulated by a cis-NAT that is copied as a trans-NAT in the pseudogene. To support this, we identified human pseudogenes with a trans-NAT to the parental gene in their antisense strand by analysis of the database of expressed sequence tags (ESTs). We found that the mutations that appeared in these trans-NATs after pseudogene formation do not show the flat distribution that would be expected in a non-functional transcript. Instead, we found higher similarity to the parental gene in a region near the 3' end of the trans-NATs. Conclusions: Our results do not prove a functional relation of the trans-NATs arising from pseudogenes over their respective parental genes, but they add evidence for it and stress the importance of duplication mechanisms of genetic material in the generation of non-coding RNAs. We also provide a plausible explanation for the large transcripts that can be found in the antisense strand of some pseudogenes.
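The key observation above is that mutations are not spread evenly along the trans-NAT: similarity to the parental gene is higher near its 3' end. A sliding-window identity profile is one simple way to visualize that kind of non-flat distribution. The sketch below is a hypothetical illustration, assuming the pseudogene region and the matching parental gene region are already aligned on the same strand, without gaps and at equal length; the window size of 50 and the function name are assumptions.

```python
def identity_profile(
    parental: str, pseudogene: str, window: int = 50, step: int = 1
) -> list[float]:
    """Fraction of identical positions in each sliding window.

    Assumes gapless, equal-length, same-strand aligned sequences
    (a simplifying assumption for illustration).
    """
    if len(parental) != len(pseudogene):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    matches = [a == b for a, b in zip(parental.upper(), pseudogene.upper())]
    return [
        sum(matches[start:start + window]) / window
        for start in range(0, len(matches) - window + 1, step)
    ]
```

Under a neutral-decay expectation the profile would fluctuate around a constant level; a sustained rise toward one end of the transcript is the kind of pattern the abstract describes. Real alignments contain indels, so a practical version would work from an alignment tool's output rather than raw strings.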
BMC Bioinformatics, 2013
Background: A popular query from scientists reading a biomedical abstract is to search for topic-related documents in bibliographic databases. Such a query is challenging because the amount of information attached to a single abstract is small, whereas classification-based retrieval algorithms are optimally trained with large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references. Results: Data on cited references and text sections in 249,108 full-text biomedical articles were extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of articles (98.4%) and their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstra...
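The abstract breaks off before describing how the expanded query is scored (it mentions the MedlineRanker tool, whose internals are not reproduced here). As a deliberately simple stand-in, the sketch below pools the words of an abstract with the words of its cited references' abstracts into one term-frequency profile and ranks candidate documents by cosine similarity. The tokenizer, the scoring and all function names are assumptions for illustration, not the method evaluated in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expanded_profile(abstract: str, cited_abstracts: list[str]) -> Counter:
    """Term counts of the abstract plus the abstracts of its cited references."""
    profile = Counter(tokenize(abstract))
    for cited in cited_abstracts:
        profile.update(tokenize(cited))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(
    abstract: str, cited_abstracts: list[str], candidates: dict[str, str]
) -> list[str]:
    """Return candidate ids ordered from most to least similar to the expanded query."""
    query = expanded_profile(abstract, cited_abstracts)
    scores = {
        doc_id: cosine(query, Counter(tokenize(text)))
        for doc_id, text in candidates.items()
    }
    # sorted() is stable even with reverse=True, so ties keep input order.
    return sorted(scores, key=scores.get, reverse=True)
```

The point the sketch makes is the one in the abstract: a short abstract may share no vocabulary with a relevant document, while the text of its cited references can supply the missing terms. A realistic system would add stop-word removal and term weighting such as TF-IDF.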
Bioinformatics, 1998
Motivation: The explosive growth of biological sequence databases stimulated by genome projects has modified the framework of several applications in the biological sequence analysis area. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for their clustering. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed. Results: In this work, we present a new strategy to reduce the computational cost associated with the clustering of large sets of sequences which are expected to contain several families. The strategy is based on the grouping of the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a computational space reduction of about an order of magnitude over more traditional all-versus-all comparison approaches. This approach produces family groupings that closely reproduce accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors with a dynamic scheduling strategy for performance optimization. Availability: By anonymous ftp at ftp.ac.uma.es
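The abstract describes grouping sequences into families with a dynamic threshold on pairwise similarity but gives no details of the threshold rule. The sketch below is a simplified, hypothetical illustration: it uses a fixed threshold, a k-mer Jaccard score as a stand-in for a real alignment-based similarity, and a union-find structure to merge families. It saves work in one simple way, by skipping any pair whose members already belong to the same family, since comparing them cannot change the grouping; this is not claimed to be the authors' strategy.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_families(
    seqs: dict[str, str], threshold: float = 0.5, k: int = 3
) -> tuple[list[set[str]], int]:
    """Single-linkage families over a similarity graph.

    Returns (families, number of pairwise comparisons actually made).
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(seq, k) for name, seq in seqs.items()}
    comparisons = 0
    for a, b in combinations(seqs, 2):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already in the same family: the comparison cannot change anything
        comparisons += 1
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[root_a] = root_b
    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values()), comparisons
```

In the worst case (no two sequences similar) every pair is still compared, so this shortcut alone does not account for the order-of-magnitude reduction reported in the abstract; it only shows why transitivity within a family lets some comparisons be skipped.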