UniProt: the universal protein knowledgebase in 2021

UniProt Consortium. Nucleic Acids Res. 2021.

Abstract

The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.

© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1. Growth in the number of entries in the UniProt databases over the last decade.

Figure 2. Bacillus subtilis proteomes viewed on the Proteomes webpage with BUSCO and CPD scores. The left-hand panel offers further options by which the user can filter the data, for example by selecting only reference proteomes.

Figure 3. Catalytic activity comment and Rhea reaction visualized by the reaction graphic for the histone-lysine _N_-methyltransferase EHMT2 (UniProtKB:Q96KQ7).

Figure 4. Information extracted from an entry describing a Hepatitis C viral protein (UniProtKB:P27958), highlighting annotation added at the processed mature chain level that describes the p21 core protein (PRO_0000037566).

Figure 5. Community contribution. (i) Use the ‘Add a publication’ functionality (red box) in the UniProtKB entry. (ii) Partial snapshot of the submission form; a sample is available at https://community.uniprot.org/bbsub/sampleform.html. (iii) After submission and review, the publication and information are displayed in the relevant UniProtKB entry, with attribution to the submitter (red box), in a future public release.

Figure 6. (A) The UniProtKB interaction viewer as seen in entry UniProtKB:Q9NSA3, the beta-catenin-interacting protein 1. (B) The interaction viewer reusable web component in the Nightingale library.
