The Mouse Genome Database (MGD): mouse biology and model systems
Carol J. Bult*, Janan T. Eppig, James A. Kadin, Joel E. Richardson, Judith A. Blake and the Mouse Genome Database Group†
The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
*To whom correspondence should be addressed. Tel: +1 207 288 6248; Fax: +1 207 288 6132; Email: carol.bult@jax.org
†The Mouse Genome Database Group: M.T. Airey, A. Anagnostopoulos, R. Babiuk, R.M. Baldarelli, M. Baya, J.S. Beal, S.M. Bello, D.W. Bradt, D.L. Burkart, N.E. Butler, J. Campbell, L.E. Corbani, S.L. Cousins, D.J. Dahmen, H. Dene, M.E. Dolan, K.L. Forthofer, K.S. Frazer, P. Frost, M. Hall, M. Knowlton, J.R. Lewis, I. Lu, L.J. Maltais, M. McAndrews-Hill, S. McClatchy, M.J. McCrossin, J. Mason, D.B. Miers, L.A. Miller, L. Ni, H. Onda, J.E. Ormsby, T.B.K. Reddy, D.J. Reed, B. Richards-Smith, D.R. Shaw, R. Sinclair, D. Sitnikov, C.L. Smith, P. Szauter, M. Tomczuk, M.A. Updegraff, L.L. Washburn, I.T. Witham and Y. Zhu.
Received: 19 September 2007; Revision received: 14 October 2007; Accepted: 15 October 2007; Published: 01 January 2008
Carol J. Bult, Janan T. Eppig, James A. Kadin, Joel E. Richardson, Judith A. Blake, the Mouse Genome Database Group, The Mouse Genome Database (MGD): mouse biology and model systems, Nucleic Acids Research, Volume 36, Issue suppl_1, 1 January 2008, Pages D724–D728, https://doi.org/10.1093/nar/gkm961
Abstract
The Mouse Genome Database (MGD, http://www.informatics.jax.org/) integrates genetic, genomic and phenotypic information about the laboratory mouse, a primary animal model for studying human biology and disease. MGD data content includes comprehensive characterization of genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, and normalized representation of genome and genome variant information, including comparative data on mammalian genes. Data within MGD are obtained from diverse sources, including manual curation of the biomedical literature, direct contributions from individual investigators' laboratories and major informatics resource centers such as Ensembl, UniProt and NCBI. MGD collaborates with the bioinformatics community on the development of data and semantic standards such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. MGD provides a data-mining platform that enables the development of translational research hypotheses based on comparative genotype, phenotype and functional analyses. Both web-based querying and computational access to data are provided. Recent improvements in MGD described here include the association of gene trap data with mouse genes and a new batch query capability for customized data access and retrieval.
INTRODUCTION
The Mouse Genome Database (MGD) is an integrated database of genetic, genomic and phenotypic data for the laboratory mouse (1–3). MGD is a core database component of the Mouse Genome Informatics (MGI) database resource (http://www.informatics.jax.org), the community model organism database for the laboratory mouse. Other resources that are integrated with MGD as part of the MGI resource include the Gene Expression Database (GXD) (4), the Mouse Tumor Biology Database (MTB) (5) and the Gene Ontology project (6).
MGD facilitates translational biomedical research by integrating data that enhance the use of the laboratory mouse as a model animal system for studying human biology and disease processes. MGD supports genome-scale electronic data mining through its integration of diverse data and use of semantic standards. Primary data types maintained in MGD include sequences, genetic and physical maps, genes and their functions, gene families, strains, mutant phenotypes, SNPs and other polymorphisms, animal models of human disease and mammalian homology (Table 1). The diverse data in MGD are integrated through a combination of expert human curation and automated processes that evaluate when different data refer to the same gene. MGD employs controlled and structured vocabularies (i.e. ontologies) to facilitate knowledge representation and data retrieval. Examples of vocabularies and ontologies that are used for annotation include the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology (7). Mouse genes and gene products in MGD are also associated with Online Mendelian Inheritance in Man (OMIM) human phenotype terms, InterPro protein domain descriptions and PIR protein superfamily classifications. The staff of MGD collaborates with members of other large genome informatics resources to maintain a comprehensive catalog of mouse genes and genome features, and to resolve inconsistencies in the representation of mouse genome features as needed. MGD is the authoritative source for mouse gene, allele and strain nomenclature and for GO annotations of mouse gene function. MGD is also the most comprehensive source of mouse phenotype information and of associations between human diseases and mouse models.
Table 1. Summary of MGD data content (5 September 2007)
MGD data statistics | 5 September 2007 |
---|---|
Number of genes with sequence data | 29 031 |
Number of genes (including uncloned mutations) | 36 473 |
Number of markers (including genes) | 72 020 |
Markers mapped | 67 536 |
Genes with protein sequence information | 24 961 |
Genes with GO annotations | 18 049 |
Mouse/human orthologs | 16 927 |
Mouse/rat orthologs | 15 801 |
Genes with one or more phenotypic alleles | 7205 |
Genes with targeted alleles | 4751 |
Phenotypic alleles | 18 491 |
Phenotype alleles that are targeted mutations | 10 536 |
Human diseases with one or more mouse models | 790 |
QTLs | 3601 |
Number of references | 119 121 |
Mouse RefSNPs | 6 348 628 |
Mouse nucleotide sequences integrated into the MGI system (includes ESTs) | >8 400 000 |
Researchers can query MGD using simple keywords, vocabulary browsers and web-based query forms. Keywords can be any free text, such as gene symbols, anatomical terms, strain names, phenotypes and disease terms. MGD also provides several vocabulary browsers to support browsing of the database content using controlled vocabulary terms. For example, MGD's Human Disease Vocabulary Browser supports access to mouse genotype information that has been cross-referenced to human disease terms in OMIM. Finally, the MGD web-based query forms allow users to formulate queries of differing degrees of specificity. For example, using the Genes and Markers Query form in MGD, one can query for a list of all genes on mouse Chromosome 1. The Genes and Markers Query form can also support more complex, biologically relevant queries that leverage the data integration in MGD. The query above, for example, could be refined to return only those genes on mouse Chromosome 1 that are associated with eye dysmorphology and have been annotated as transcription factors.
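The refinement described above amounts to combining independent criteria (location, phenotype, function) so that each added criterion narrows the result. The following is a minimal Python sketch of that idea, assuming a hypothetical in-memory list of gene records. The class `GeneRecord`, its field names and the sample records are illustrative assumptions only; they do not reflect the MGD schema or the behavior of the Genes and Markers Query form.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical, simplified gene record (not the MGD schema)."""
    symbol: str
    chromosome: str
    phenotypes: set = field(default_factory=set)  # phenotype terms
    go_terms: set = field(default_factory=set)    # functional annotation terms


def query_genes(records, chromosome=None, phenotype=None, go_term=None):
    """Return symbols of records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if chromosome is not None and rec.chromosome != chromosome:
            continue
        if phenotype is not None and phenotype not in rec.phenotypes:
            continue
        if go_term is not None and go_term not in rec.go_terms:
            continue
        hits.append(rec.symbol)
    return hits


# Made-up records for illustration only.
RECORDS = [
    GeneRecord("GeneA", "1", {"eye dysmorphology"}, {"transcription factor activity"}),
    GeneRecord("GeneB", "1", {"eye dysmorphology"}, set()),
    GeneRecord("GeneC", "2", {"eye dysmorphology"}, {"transcription factor activity"}),
    GeneRecord("GeneD", "1", set(), {"transcription factor activity"}),
]

# Broad query: everything on Chromosome 1.
broad = query_genes(RECORDS, chromosome="1")

# Refined query: Chromosome 1, eye dysmorphology, transcription factor.
refined = query_genes(
    RECORDS,
    chromosome="1",
    phenotype="eye dysmorphology",
    go_term="transcription factor activity",
)

print(broad)    # ['GeneA', 'GeneB', 'GeneD']
print(refined)  # ['GeneA']
```

Treating an omitted criterion as "no constraint" mirrors a query form in which unfilled fields are ignored.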
Data in MGD are updated daily. Data can be accessed through dynamically generated web pages, through text files available via FTP (updated nightly) and through direct SQL (Structured Query Language) queries, which require a user account. In general, there are 4–6 major software releases per year to support access and display of new data types. A recent summary of MGD content is shown in Table 1.
IMPROVEMENTS DURING 2007
Inclusion of gene trap data
MGD now includes associations of gene trap mutant cell IDs with mouse genes. Gene trap data are obtained on a regular basis from the dbGSS division of GenBank. The data in dbGSS include sequences associated with both exon and gene traps. The mouse data in dbGSS are expected to grow markedly as a consequence of several initiatives that have begun to generate a knockout allele for every mouse gene (8). Gene trap associations for a given gene or marker are included in the ‘Other Database Links’ section of the gene detail report. Currently, 10 708 mouse genes in MGD are associated with at least one gene trap. Over 124 000 genomic survey sequences from NCBI's dbGSS database cannot be unambiguously associated with mouse genes.
Specific gene traps can be found by querying MGD with their dbGSS sequence accession identifiers. In addition, tab-delimited reports of gene traps in MGD can be viewed or downloaded from the MGI FTP site. The FTP reports include ‘gene traps associated with markers’ and ‘gene traps not associated with markers’. The report of gene trap sequences that cannot be associated with a mouse gene includes a brief notation describing why each sequence cannot be associated with a gene (e.g. no match to the reference genome sequence or multiple good matches to the reference genome sequence).
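Because the reports are tab-delimited, they can be summarized with a few lines of code. The sketch below tallies the reasons given for unassociated gene trap sequences. The two-column layout (sequence identifier, then reason), the identifiers and the sample rows are assumptions made for illustration; the actual column layout of the MGI FTP reports should be checked against the downloaded file before reusing this.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for a 'gene traps not associated with markers' report.
# The two-column layout (sequence ID, reason) is an assumption.
SAMPLE_REPORT = (
    "SEQ0001\tno match to the reference genome sequence\n"
    "SEQ0002\tmultiple good matches to the reference genome sequence\n"
    "\n"
    "SEQ0003\tno match to the reference genome sequence\n"
)


def tally_reasons(handle):
    """Count how often each reason occurs in a tab-delimited report."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        counts[row[1].strip()] += 1
    return counts


reasons = tally_reasons(io.StringIO(SAMPLE_REPORT))
for reason, n in reasons.most_common():
    print(n, reason)
```

To run this against a downloaded report, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="")`.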
Batch query tool
The new MGI Batch Query Tool (http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=batchQF) provides the ability to access information about nomenclature, genome location, function or phenotype associations for many genes/markers in a single query (Figure 1). The allowable input into the Batch Query Tool includes current gene symbols, Ensembl gene ids, EntrezGene ids, VEGA gene ids, MGI ids, RefSeq ids and GenBank sequence accession ids. These data can be uploaded as a file or pasted into a text box on the query form. Users can specify the desired output and output format (web or tab-delimited text). The Batch Query Tool is particularly useful for researchers who use non-MGI gene accession identifiers in their analyses but who want to connect those identifiers to the rich functional and phenotypic annotations for mouse genes contained in MGD.
Figure 1.
Screenshot showing the new MGI Batch Query Tool. Inputs into the query form (A) can be lists of sequence or gene identifiers. The output of the query (B) can be gene identifiers from other resources, genome location, gene nomenclature, functional annotations and phenotype annotations. Output from the batch query form can be displayed as a web form or as a tab-delimited file (C).
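Before submitting a mixed list to a batch tool, it can help to check which identifier types the list contains. The sketch below sorts pasted tokens into the input types named above using regular expressions. The patterns are simplified assumptions based on common identifier conventions (for example, an `MGI:` prefix or an `ENSMUSG` prefix followed by 11 digits); they are not taken from the Batch Query Tool, and the sample identifiers are made-up strings chosen only to exercise each pattern.

```python
import re

# Simplified, assumed patterns; order matters because the first match wins.
ID_PATTERNS = [
    ("MGI id", re.compile(r"MGI:\d+")),
    ("Ensembl gene id", re.compile(r"ENSMUSG\d{11}")),
    ("VEGA gene id", re.compile(r"OTTMUSG\d{11}")),
    ("RefSeq id", re.compile(r"[NX][MRP]_\d+(\.\d+)?")),
    ("EntrezGene id", re.compile(r"\d+")),
    ("GenBank accession id", re.compile(r"[A-Z]{1,2}\d{5,6}(\.\d+)?")),
]


def classify_identifier(token):
    """Return the first identifier type whose pattern matches the whole token."""
    token = token.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(token):
            return label
    # Anything unrecognized is treated as a gene symbol in this sketch.
    return "gene symbol (assumed)"


def group_batch_input(text):
    """Group whitespace-separated tokens by their inferred identifier type."""
    groups = {}
    for token in text.split():
        groups.setdefault(classify_identifier(token), []).append(token)
    return groups


# Made-up identifiers, as they might be pasted into a text box.
pasted = """
MGI:12345
ENSMUSG00000000001
OTTMUSG00000000001
NM_000001
12345
AB123456
Pax6
"""

groups = group_batch_input(pasted)
for label, tokens in groups.items():
    print(label, tokens)
```

Pattern-based classification is only a pre-check: a token that looks like a valid identifier may still be absent from the database.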
Links to TreeFam and other comparative resources
The comparative data resources in MGD now include links to the TreeFam (9) resource and graphical displays of mammalian genes organized by the OrthoDisease (10) sets (Figure 2). TreeFam provides curated information about ortholog and paralog assignments and the evolutionary history of various gene families. Hypertext links to TreeFam are available from the Genes and Markers or the Mammalian Orthology detail pages in MGD.
Figure 2.
Screenshot showing the Gene Ontology Molecular Function annotation graph for a gene associated with OMIM disorder ‘Aniridia, type II’ (OMIM id 106210). The graph displays experimental GO annotations for the human gene (PAX6) associated with this disorder as well as annotations for orthologous genes in other organisms (mouse, rat, nematode, chicken and yeast) based on the OrthoDisease set. The graph nodes are color-coded to indicate the organism that is the source of the annotation. The full graph and table of annotations can be viewed at: http://proto.informatics.jax.org/prototypes/GOgraphEX/OrthoDisease_Graphs/OMIM_DisorderGraphs/106210.html
The OrthoDisease orthology resource (http://orthodisease.cgb.ki.se/) provides eukaryotic orthology sets based on InParanoid analysis for genes in 26 organisms (11). These orthology sets are organized in relationship to 3409 diseases as represented in OMIM (12).
OTHER INFORMATION
Mouse gene, allele and strain nomenclature
MGD is responsible for assigning unique symbols and names to mouse genes, alleles and strains following the guidelines set by the ‘International Committee on Standardized Genetic Nomenclature for Mice’ (http://www.informatics.jax.org/nomen). This official nomenclature is widely disseminated through regular data exchange and curation of shared links between MGI and other bioinformatics resources. MGD staff works with editors of journal publications to promote adherence to mouse nomenclature standards in publications.
The MGD nomenclature group works closely with nomenclature specialists for human (http://www.genenames.org/) and rat (http://rgd.mcw.edu) to provide consistent nomenclature for mammalian species. The mouse and human nomenclature committees collaborate with scientific experts in specific domain areas to represent the latest knowledge about gene families such as the alpha-tubulin family (13) or the RCAN gene family (14). The MGD nomenclature coordinator can be contacted by email (nomen@informatics.jax.org).
Electronic data submission
MGD accepts contributed data sets for any type of data maintained by the database. The most frequent types of contributed data are mutant and phenotypic allele information originating with the large mouse mutagenesis centers and repositories that contribute to the International Mouse Strain Resource (IMSR, http://www.imsr.org) (9). Each electronic submission receives a permanent database accession ID. All data sets are associated with their source, either a publication or an electronic submission reference. Details about data submission procedures can be found at: http://www.informatics.jax.org/mgihome/submissions/submissions_menu.shtml.
Community outreach and user support
MGD User Support is available through online documentation and by email or phone. User Support staff are also available for on-site training on the use of MGD and other MGI data resources. The traveling tutorial program includes lectures, demos and hands-on tutorials that can be customized according to the research interests of the audience.
- World wide web: http://www.informatics.jax.org/mgihome/support/support.shtml
- Email access: mgi-help@informatics.jax.org
- Telephone access: 1 207 288 6445
- FAX access: 1 207 288 6132
Other Outreach
MGI-LIST (http://www.informatics.jax.org/mgihome/lists/lists.shtml) is a moderated and active email bulletin board supported by the MGD User Support group.
HIGH-LEVEL OVERVIEW OF THE MAIN COMPONENTS AND IMPLEMENTATION
MGD is implemented in the Sybase relational database management system, with ∼180 tables in which the biological information is stored. BLAST-able databases and genome assembly files for sequence data are stored outside the relational database. An editing interface and automated load programs are used to input data into the MGD system. The editing interface (EI) is an interactive, graphical application used by curators. Automated load programs integrate larger data sets from many sources into the database; they include quality control (QC) checks and processing algorithms that integrate the bulk of the data automatically and identify issues to be resolved by curators or the data provider. Thus, through the EI and automated loads, we acquire and integrate large amounts of data into a high-quality knowledgebase.
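The division of labor described above, in which most records are integrated automatically and problem records are set aside for a person to resolve, can be sketched as follows. This is an illustrative Python example only: the function `run_load`, the record fields and the three QC rules are hypothetical and are not the checks performed by the MGD load programs.

```python
def run_load(records, known_symbols):
    """Split incoming records into loadable rows and rows flagged for review.

    Returns (loaded, flagged); each flagged entry is a (record, reason) pair.
    """
    loaded, flagged = [], []
    seen = set()
    for rec in records:
        symbol = (rec.get("symbol") or "").strip()
        key = (symbol, rec.get("accession"))
        if not symbol:
            flagged.append((rec, "missing gene symbol"))
        elif symbol not in known_symbols:
            flagged.append((rec, "symbol not found in database"))
        elif key in seen:
            flagged.append((rec, "duplicate association"))
        else:
            seen.add(key)
            loaded.append(rec)
    return loaded, flagged


# Made-up symbols and incoming records for illustration only.
known = {"GeneA", "GeneB"}
incoming = [
    {"symbol": "GeneA", "accession": "SEQ0001"},
    {"symbol": "GeneX", "accession": "SEQ0002"},
    {"symbol": "", "accession": "SEQ0003"},
    {"symbol": "GeneA", "accession": "SEQ0001"},
    {"symbol": "GeneB", "accession": "SEQ0004"},
]

loaded, flagged = run_load(incoming, known)
print(len(loaded), "records loaded automatically")
for rec, reason in flagged:
    print("needs review:", rec["accession"], "-", reason)
```

Keeping the reason alongside each flagged record gives the curator or data provider enough context to resolve the issue without rerunning the load.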
Public data access is provided through the web interface (WI) where users can interactively query and download our data through a web browser. MouseBLAST allows users to do sequence similarity searches against a variety of rodent sequence databases that are updated weekly from selected sequence databases from NCBI, UniProt and other providers. Mouse GBrowse allows users to visualize mouse data sets against the genome as a series of linear tracks. FTP reports are a major source for other data providers who link to or use MGD data in their products, and for computational biologists who use MGD data in their analyses. Programmatic access to MGD via web services is under development. All MGD files and programs are openly and freely available.
CITING MGD
For a general citation of the Mouse Genome Informatics (MGI) resource, please cite this article. In addition, the following citation format is suggested when referring to data sets specific to the MGD component of MGI: Mouse Genome Database (MGD), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org). [Type in date (month, year) when you retrieved the data cited].
ACKNOWLEDGEMENTS
The Mouse Genome Database is supported by NIH/NHGRI grant HG000330. Funding to pay the Open Access publication charges for this article was provided by HG000330.
Conflict of interest statement. None declared.
REFERENCES
1. the Mouse Genome Informatics group. The Mouse Genome Database (MGD): new features facilitating a model system, Nucleic Acids Res., 2007, vol. 35, pg. 637
2. Mouse Genome Database Group. The Mouse Genome Database (MGD): updates and enhancements, Nucleic Acids Res., 2006, vol. 34 (pg. D562–D567)
3. The Mouse Genome Database Group. The Mouse Genome Database (MGD): from genes to mice – a community resource for mouse biology, Nucleic Acids Res., 2005, vol. 33 (pg. D471–D475)
4. et al. The Mouse Gene Expression Database (GXD): updates and enhancements, Nucleic Acids Res., 2004, vol. 32 (pg. D568–D571)
5. The Mouse Tumor Biology Database: integrated access to mouse cancer biology data, Exp. Lung Res., 2005, vol. 31 (pg. 259–270)
6. The Gene Ontology Consortium. The Gene Ontology (GO) project in 2008, Nucleic Acids Res., 2008, vol. 36
7. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information, Genome Biol., 2005, vol. 6, pg. R7
8. The International Mouse Knockout Consortium. A mouse for all reasons, Cell, 2007, vol. 128 (pg. 9–13)
9. et al. TreeFam: a curated database of phylogenetic trees of animal gene families, Nucleic Acids Res., 2006, vol. 34 (pg. D572–D580)
10. Using ontology visualization to coordinate cross-species functional annotation for human disease genes. Computer-based medical systems, Proceedings of the Nineteenth IEEE Symposium on Computer-Based Medical Systems (CBMS'06): Ontologies for Biomedical Systems, 2006, Salt Lake City, Utah (pg. 583–587)
11. OrthoDisease: a database of human disease orthologs, Hum. Mutat., 2004, vol. 24 (pg. 112–119)
12. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders, Nucleic Acids Res., 2005, vol. 33, Database issue (pg. D514–D517)
13. et al. A revised nomenclature for the human and rodent alpha-tubulin gene family, Genomics, 2007, vol. 90 (pg. 285–289)
14. et al. Renaming the DSCR1/Adapt78 gene family as RCAN: regulators of calcineurin, FASEB J., 2007, vol. 21, doi:10.1096/fj.06-7246com
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.