NRG-CING: integrated validation reports of remediated experimental biomolecular NMR data and coordinates in wwPDB

BioMagResBank database with sets of experimental NMR constraints corresponding to the structures of over 1400 biomolecules deposited in the Protein Data Bank

Journal of Biomolecular NMR, 2003

Experimental constraints associated with NMR structures are available from the Protein Data Bank (PDB) in the form of "Magnetic Resonance" (MR) files. These files contain multiple types of data concatenated without boundary markers and are difficult to use for further research. Reported here are the results of a project initiated to annotate, archive, and disseminate these data to the research community from a searchable resource in a uniform format. The MR files from a set of 1410 NMR structures were analyzed, and their original constituent data blocks were annotated by data type using a semi-automated protocol. A new software program called Wattos was then used to parse and archive the data in a relational database. Of the MR file blocks annotated as constraints, 84% (3337/3975) could be parsed. The constraint lists that were parsed correspond to three data types (2511 distance, 788 dihedral angle, and 38 residual dipolar coupling lists) from ...
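As a quick consistency check on the counts in this abstract, the parse rate and the per-type breakdown agree with each other: the three list types add up to exactly the number of parsed constraint blocks.

```latex
\frac{3337}{3975} \approx 0.8395 \approx 84\%, \qquad 2511 + 788 + 38 = 3337
```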

BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): new policies affecting biomolecular NMR depositions

Journal of Biomolecular NMR, 2008

We describe the role of the BioMagResBank (BMRB) within the Worldwide Protein Data Bank (wwPDB) and recent policies affecting the deposition of biomolecular NMR data. All PDB depositions of structures based on NMR data must now be accompanied by experimental restraints. A scheme has been devised that allows depositors to specify a representative structure and to define residues within that structure found experimentally to be largely unstructured. The BMRB now accepts coordinate sets representing three-dimensional structural models based on experimental NMR data of molecules of biological interest that fall outside the guidelines of the Protein Data Bank (i.e., the molecule is a peptide with 23 or fewer residues, a polynucleotide with 3 or fewer residues, a polysaccharide with 3 or fewer sugar residues, or a natural product), provided that the coordinates are accompanied by a representation of the covalent structure of the molecule (atom connectivity), assigned NMR chemical shifts, and the structural restraints used in generating the model. The BMRB now contains an archive of NMR data for metabolites and other small molecules found in biological systems.
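The acceptance rule described above can be read as a size test plus three required accompanying data items. The sketch below encodes that reading; the enum, the threshold table, and both function names are hypothetical illustrations, not part of any BMRB or wwPDB software.

```python
from enum import Enum


class MoleculeKind(Enum):
    PEPTIDE = "peptide"
    POLYNUCLEOTIDE = "polynucleotide"
    POLYSACCHARIDE = "polysaccharide"
    NATURAL_PRODUCT = "natural product"
    OTHER = "other"


# Hypothetical table: largest residue count at which a molecule of each kind
# still falls outside the PDB guidelines, as summarized in the abstract.
MAX_RESIDUES_OUTSIDE_PDB = {
    MoleculeKind.PEPTIDE: 23,
    MoleculeKind.POLYNUCLEOTIDE: 3,
    MoleculeKind.POLYSACCHARIDE: 3,  # counted in sugar residues
}


def outside_pdb_guidelines(kind: MoleculeKind, n_residues: int) -> bool:
    """True if the molecule is too small (or of a kind) not covered by the PDB."""
    if kind is MoleculeKind.NATURAL_PRODUCT:
        return True
    limit = MAX_RESIDUES_OUTSIDE_PDB.get(kind)
    return limit is not None and n_residues <= limit


def bmrb_accepts_coordinates(kind: MoleculeKind, n_residues: int,
                             has_connectivity: bool, has_shifts: bool,
                             has_restraints: bool) -> bool:
    """Hypothetical check: small molecule AND all three required data items."""
    return (outside_pdb_guidelines(kind, n_residues)
            and has_connectivity and has_shifts and has_restraints)
```

For example, a 23-residue peptide with connectivity, shifts, and restraints passes, while a 24-residue peptide does not, because it falls within PDB guidelines.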

The Accuracy of NMR Protein Structures in the Protein Data Bank

SSRN Electronic Journal, 2021

We recently described a method, ANSURR, for measuring the accuracy of NMR protein structures. It is based on comparing residue-specific measures of rigidity derived from backbone chemical shifts (via the random coil index) with those derived from structures. Here, we report the use of ANSURR to analyse NMR ensembles within the Protein Data Bank (PDB). NMR structures cover a wide range of accuracy, which improved over time until about 2005 and has not improved since. Most structures have accurate secondary structure but are too floppy, particularly in loops. There is a need for more experimental restraints in loops. The best current accuracy measures are the Ramachandran distribution and the number of NOE restraints per residue. The precision of structure ensembles correlates with accuracy, as does the number of hydrogen bond restraints per residue. If a structure contains additional components (such as additional polypeptide chains or ligands), then their inclusion improves accuracy. Analysis of over 7000 PDB NMR ensembles is available via our website ansurr.com.

Concepts and tools for NMR restraint analysis and validation

Concepts in Magnetic Resonance, 2004

The quality of NMR-derived biomolecular structure models can be assessed both at the level of structural characteristics and against the NMR data used to derive the structure models. Here, an overview is given of the common methods to validate experimental NMR data. These methods provide measures of quality and goodness of fit of the structure to the data. A detailed discussion is given of newly developed methods to assess the information contained in experimental NMR restraints, which provide powerful tools for validation and error analysis in NMR structure determination.
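One of the most common goodness-of-fit measures is the distance restraint violation: how far each model's interatomic distance lies outside the restraint's bounds. The sketch below computes per-restraint violations over an ensemble and summarizes them as maximum, RMS, and a count above a cutoff. The data layout (atom name to coordinate dictionaries), the 0.5 Å cutoff, and all names are hypothetical simplifications; real restraint files also involve ambiguous and pseudo-atom restraints, which are ignored here.

```python
import math
from dataclasses import dataclass


@dataclass
class DistanceRestraint:
    atom_i: str
    atom_j: str
    upper: float        # upper distance bound (angstrom)
    lower: float = 0.0  # lower distance bound (angstrom)


def violation(model, r):
    """Amount (angstrom) by which one model violates restraint r; 0 if satisfied."""
    d = math.dist(model[r.atom_i], model[r.atom_j])
    if d > r.upper:
        return d - r.upper
    if d < r.lower:
        return r.lower - d
    return 0.0


def violation_summary(models, restraints, cutoff=0.5):
    """Summarize violations over an ensemble.

    models: list of dicts mapping atom name -> (x, y, z).
    cutoff: hypothetical reporting threshold in angstrom.
    """
    values = [[violation(m, r) for m in models] for r in restraints]
    flat = [v for row in values for v in row]
    return {
        "max": max(flat),
        "rms": math.sqrt(sum(v * v for v in flat) / len(flat)),
        # A restraint counts once if any model violates it beyond the cutoff.
        "n_over_cutoff": sum(1 for row in values if max(row) > cutoff),
    }
```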

Straightforward and complete deposition of NMR data to the PDBe

Journal of Biomolecular NMR, 2010

We present a suite of software for the complete and easy deposition of NMR data to the PDB and BMRB. This suite uses the CCPN framework and introduces a freely downloadable, graphical desktop application called CcpNmr Entry Completion Interface (ECI) for the secure editing of experimental information and associated datasets through the lifetime of an NMR project. CCPN projects can be created within the CcpNmr Analysis software or by importing existing NMR data files using the CcpNmr FormatConverter. After further data entry and checking with the ECI, the project can then be rapidly deposited to the PDBe using AutoDep, or exported as a complete deposition NMR-STAR file. In full CCPN projects created with ECI, it is straightforward to select chemical shift lists, restraint data sets, structural ensembles and all relevant associated experimental collection details, all of which are or will become mandatory when depositing to the PDB.

Protein Structure Characterization From NMR Chemical Shifts

2016

In order to understand the complex biological functions of proteins, highly detailed, atomic-resolution protein structures are needed. Experimental methods such as X-ray crystallography and NMR spectroscopy provide standard platforms for determining the atomic-resolution structures of proteins. However, a continuing bottleneck in conventional NOE-based NMR structure determination lies in the difficulty of measuring NOEs for medium-to-large proteins, with the resulting time costs and reduced structure accuracy and precision. This has led to an increased interest in using other easily identifiable NMR parameters, such as chemical shifts, to facilitate protein structure determination by NMR. Chemical shifts, often considered the mileposts of NMR, have long been used to decipher the structures of small molecules. However, chemical shifts are much less frequently used for structural interpretation of larger macromolecules such as peptides and proteins. Most existing macromolecular methods use chemical shifts and various heuristic, rule-based algorithms to identify and determine a small number of structural parameters (such as secondary structure). Other methods, such as CS-Rosetta and CS23D, which attempt to determine 3D structures from chemical shifts alone, are only modestly successful (~50% success). So while good progress has been made, I believe that there is still substantial room for improvement and that the "Shift-to-Structure" problem has not yet been fully solved. My PhD project involves investigating innovative computational and machine-learning approaches to develop chemical-shift based prediction models to determine protein structures with high efficiency and high accuracy (>90%).
More specifically, my thesis consists of three major components: a) shift-based local protein structure prediction; b) prediction of protein local/non-local interactions from sequence and chemical shifts; and c) tertiary fold recognition from chemical shifts. Towards that goal, I have developed several chemical-shift based prediction models that exploit advanced computational and machine-learning algorithms. In particular, I developed a) CSI 2.0, a multi-class prediction method for protein local structure prediction from chemical shift data; b) CSI 3.0, a computational model that identifies detailed local structure and structural motifs in proteins using chemical shift data; c) ShiftASA, a boosted tree regression model for predicting accessible surface area from chemical shifts; and d) E-Thrifty, a protein fold recognition method that performs chemical shift threading to identify and generate the most probable fold or 3D structure that a query protein may have. Validation of these proposed methods was performed using several independent test sets, and the results indicate substantial improvements over other state-of-the-art methods. Given their superior performance, I believe that these methods will be useful contributions to the field of NMR-based protein structure determination and will be fundamental to the development of 3D structure determination protocols that use only chemical shift data. The introductory discussion in Chapter 1 as well as the concluding analysis in Chapter 6 are my original work. Chapter 5, which is also my original work, is being prepared as a paper for submission to the Journal of Biomolecular NMR. Chapter 2 of this thesis has been published as: Hafsa, N. E., & Wishart, D. S. (2015). "CSI 2.0: a significantly improved version of the Chemical Shift Index". Journal of Biomolecular NMR, 60(2-3), 131-146.
I was responsible for the algorithm creation, program development, experimental assessment and resulting analysis as well as the manuscript composition. Dr. Wishart was the supervisory author and was involved with the concept formation, testing and manuscript composition/editing. Chapter 3 of this thesis has been published as: Hafsa, N. E., Arndt, D., & Wishart, D. S. (2015). "CSI 3.0: a web server for identifying secondary and super-secondary structure in proteins using NMR chemical shifts". Nucleic Acids Research, 43, W370-377. I was responsible for the algorithm creation, program development, experimental assessment and resulting analysis as well as the manuscript composition. David Arndt assisted with the web server and program development. Dr. Wishart was the supervisory author and assisted with the concept formation, algorithm creation, testing and manuscript composition/editing.
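The "heuristic, rule-based algorithms" for secondary structure mentioned above can be illustrated with a simplified rule in the spirit of the original Chemical Shift Index, using Hα secondary shifts (observed minus random-coil value). This is not CSI 2.0 or CSI 3.0, which use more elaborate models. The sketch assumes the secondary shifts are already computed, uses a ±0.1 ppm threshold, and requires uninterrupted runs (four or more -1 values for helix, three or more +1 values for strand); the original CSI tolerates sparse interruptions, so treat these rules and all function names as hypothetical simplifications.

```python
def ha_index(secondary_shifts, threshold=0.1):
    """Map H-alpha secondary shifts (ppm) to +1 (downfield), -1 (upfield) or 0."""
    return [1 if d > threshold else -1 if d < -threshold else 0
            for d in secondary_shifts]


def assign_secondary_structure(index, min_helix=4, min_strand=3):
    """Label runs of -1 as helix (H), runs of +1 as strand (E), else coil (C)."""
    labels = ["C"] * len(index)
    i = 0
    while i < len(index):
        # Find the end of the run of identical index values starting at i.
        j = i
        while j < len(index) and index[j] == index[i]:
            j += 1
        run = j - i
        if index[i] == -1 and run >= min_helix:
            labels[i:j] = ["H"] * run
        elif index[i] == 1 and run >= min_strand:
            labels[i:j] = ["E"] * run
        i = j
    return "".join(labels)
```

For example, secondary shifts of `[-0.3, -0.2, -0.25, -0.4, 0.0, 0.2, 0.3, 0.15, 0.05]` give the index `[-1, -1, -1, -1, 0, 1, 1, 1, 0]` and the assignment `"HHHHCEEEC"`.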

Recommendations of the wwPDB NMR Validation Task Force

Structure, 2013

As methods for analysis of biomolecular structure and dynamics using nuclear magnetic resonance spectroscopy (NMR) continue to advance, the resulting 3D structures, chemical shifts, and other NMR data are broadly impacting biology, chemistry, and medicine. Structure model assessment is a critical area of NMR methods development, and is an essential component of the process of making these structures accessible and useful to the wider scientific community. For these reasons, the Worldwide Protein Data Bank (wwPDB) has convened an NMR Validation Task Force (NMR-VTF) to work with the wwPDB partners in developing metrics and policies for biomolecular NMR data harvesting, structure representation, and structure quality assessment. This paper summarizes the recommendations of the NMR-VTF, and lays the groundwork for future work in developing standards and metrics for biomolecular NMR structure quality assessment.

Biomolecular NMR: Past and future

Archives of Biochemistry and Biophysics, 2017

The editors of this special volume suggested this topic, presumably because of the perspective lent by our combined >90-year association with biomolecular NMR. What follows is our personal experience with the evolution of the field, which we hope will illustrate the trajectory of change over the years. As for the future, one can confidently predict that it will involve unexpected advances. Our narrative is colored by our experience in using the NMR Facility for Biomedical Studies at Carnegie-Mellon University (Pittsburgh) and in developing similar facilities at Purdue (1977-1984) and the University of Wisconsin-Madison (1984-). We have enjoyed developing NMR technology and making it available to collaborators and users of these facilities. Our group's association with the Biological Magnetic Resonance data Bank (BMRB) and with the Worldwide Protein Data Bank (wwPDB) has also been rewarding. Of course, many groups contributed to the early growth and development of biomolecular...