Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts

B J Stapley et al. Pac Symp Biocomput. 2000.

Abstract

Successful information retrieval from biomedical literature databases is becoming increasingly difficult. We have developed a prototype system for retrieving and visualizing information from literature and genomic databases using gene names. The premise of our work is that, if two genes have a related biological function, the co-occurrence of two gene names (or aliases of those genes) within the biomedical literature is more likely. From a collection of Medline documents, we have extracted the number of co-occurrences of every pair of Saccharomyces cerevisiae genes. The query is automatically conflated to include gene aliases as well. In addition, the retrieved document set can be filtered by the user with a MeSH term. From this co-occurrence data we construct a matrix that contains dissimilarity measurements of every pair of genes, based on their joint and individual occurrence statistics. A graph is generated from this matrix, with node and edge inclusion being determined by a user-defined threshold. Nodes of the graph represent genes, while edge lengths are a function of the occurrence of the two genes within the literature. Nodes can be hypertext-linked to sequence databases, while edges are linked to those Medline documents that generated them. The system is a tool for efficiently exploring the biomedical information landscape and may act as an inference network.
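The pipeline described in the abstract (alias conflation, co-occurrence counting, a dissimilarity matrix, and a thresholded graph) can be illustrated with a minimal sketch. The abstract does not give the dissimilarity formula, so the Jaccard-style measure below is an assumption chosen only because it uses joint and individual occurrence counts. The gene names, the alias table, the document IDs, and all function names are hypothetical, and each document is assumed to be pre-tokenized.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_occurrences(docs, aliases):
    """Count per-gene and per-pair document occurrences.

    docs:    dict of document id -> list of tokens (toy stand-in for abstracts)
    aliases: dict of name or alias -> canonical gene name (hypothetical table)
    """
    single = Counter()
    pair_docs = defaultdict(list)
    for doc_id, tokens in docs.items():
        # Conflate aliases to canonical names; count each gene once per document.
        genes = sorted({aliases[t] for t in tokens if t in aliases})
        single.update(genes)
        for pair in combinations(genes, 2):
            # Remember which documents produced each pair, so that an edge
            # can later be linked back to its supporting documents.
            pair_docs[pair].append(doc_id)
    return single, pair_docs


def dissimilarity(single, pair_docs, a, b):
    """Assumed measure: 1 - n_ab / (n_a + n_b - n_ab). Not the paper's formula."""
    key = tuple(sorted((a, b)))
    joint = len(pair_docs.get(key, []))
    union = single[a] + single[b] - joint
    return 1.0 if union == 0 else 1.0 - joint / union


def build_graph(single, pair_docs, threshold):
    """Keep only edges whose dissimilarity is at or below the user threshold."""
    graph = {}
    for (a, b), doc_ids in pair_docs.items():
        d = dissimilarity(single, pair_docs, a, b)
        if d <= threshold:
            graph[(a, b)] = (d, list(doc_ids))
    return graph


# Toy data: XAA1 is a made-up alias of the made-up gene AAA1.
aliases = {"AAA1": "AAA1", "XAA1": "AAA1", "BBB2": "BBB2", "CCC3": "CCC3"}
docs = {
    "doc1": ["AAA1", "interacts", "with", "BBB2"],
    "doc2": ["XAA1", "and", "BBB2"],
    "doc3": ["AAA1", "and", "CCC3"],
    "doc4": ["CCC3", "alone"],
}

single, pair_docs = count_occurrences(docs, aliases)
graph = build_graph(single, pair_docs, threshold=0.5)
for (a, b), (d, doc_ids) in graph.items():
    print(a, b, round(d, 3), doc_ids)
```

In this toy run only the AAA1/BBB2 edge survives the 0.5 threshold, and it carries the IDs of the two documents that generated it, mirroring the way the abstract links edges to their source Medline documents. A MeSH filter could be applied by restricting `docs` before counting.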
