APART: Automated Preprocessing for NMR Assignments with Reduced Tedium

Towards Fully Automated Structure-Based NMR Resonance Assignment of 15N-Labeled Proteins From Automatically Picked Peaks

Journal of Computational Biology, 2011

In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both 15N-labeled and 13C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only 15N-labeled NMR data and avoid the added expense of using 13C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using 15N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra without any manual intervention via automatically picked peaks, which are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semiautomated/manually processed peak data, we compare our system with the contact replacement (CR) method of Xiong, Pandurangan, and Bailey-Kellogg, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method fivefold on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant.
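As a rough illustration of the assignment step described above, the toy sketch below treats backbone assignment as a one-to-one matching between spin systems and residues that maximizes a total compatibility score. It is not the paper's integer programming model: the score matrix is invented, the function name `best_assignment` is hypothetical, and exhaustive enumeration stands in for an ILP solver, which is only practical for a handful of residues.

```python
from itertools import permutations

def best_assignment(score):
    """Return the one-to-one mapping of spin systems to residues with the highest total score.

    score[i][j] is a hypothetical compatibility of spin system i with residue j
    (higher is better). An ILP would encode the same problem with binary
    variables x[i][j] and constraints that each row and column sums to at most 1.
    """
    n_spin, n_res = len(score), len(score[0])
    best, best_total = None, float("-inf")
    # Enumerate every way to place n_spin spin systems on distinct residues.
    for perm in permutations(range(n_res), n_spin):
        total = sum(score[i][perm[i]] for i in range(n_spin))
        if total > best_total:
            best, best_total = perm, total
    return best, best_total

# Invented scores for three spin systems and three residues.
score = [
    [0.9, 0.1, 0.2],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.3],
]
assignment, total = best_assignment(score)
print(assignment, round(total, 2))
```

Here `assignment[i]` is the residue index chosen for spin system `i`. In a realistic setting the scores would come from typing probabilities and NOESY-derived contacts, and a solver would replace the enumeration.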

Smartnotebook: A semi-automated approach to protein sequential NMR resonance assignments

2003

Complete and accurate NMR spectral assignment is a prerequisite for high-throughput automated structure determination of biological macromolecules. However, completely automated assignment procedures generally encounter difficulties for all but the most ideal data sets. Sources of these problems include difficulty in resolving correlations in crowded spectral regions, as well as complications arising from dynamics, such as weak or missing peaks, or atoms exhibiting more than one peak due to exchange phenomena. Smartnotebook is a semi-automated assignment software package designed to combine the best features of the automated and manual approaches. The software finds and displays potential connections between residues, while the spectroscopist makes decisions on which connection is correct, allowing rapid and robust assignment. In addition, smartnotebook helps the user fit chains of connected residues to the primary sequence of the protein by comparing the experimentally determined chemical shifts with expected shifts derived from a chemical shift database, while providing bookkeeping throughout the assignment procedure.
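The sequence-fitting idea in this abstract can be sketched as sliding a chain of connected residues along the primary sequence and scoring each position by how far the observed shifts deviate from expected ones. The code below is only a minimal sketch under stated assumptions: the table `EXPECTED_CA` holds rough placeholder Cα values rather than entries from a real chemical shift database, the squared-deviation cost is an assumed scoring rule, and `fit_fragment` is a hypothetical name, not a Smartnotebook function.

```python
# Placeholder expected C-alpha shifts in ppm (illustrative only, not database values).
EXPECTED_CA = {"G": 45.0, "A": 53.0, "S": 58.5, "L": 55.5, "V": 62.5}

def fit_fragment(sequence, fragment_ca):
    """Find the sequence offset where the observed fragment shifts fit best.

    The cost of a placement is the sum of squared differences between each
    observed shift and the expected shift of the residue type at that position.
    """
    best_start, best_cost = None, float("inf")
    for start in range(len(sequence) - len(fragment_ca) + 1):
        cost = sum(
            (obs - EXPECTED_CA[sequence[start + k]]) ** 2
            for k, obs in enumerate(fragment_ca)
        )
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost

sequence = "LAGSVAL"            # toy primary sequence
fragment = [45.3, 58.1, 62.9]   # observed shifts of a three-residue chain
start, cost = fit_fragment(sequence, fragment)
print(start, round(cost, 2))
```

In a semi-automated workflow the ranked placements would be shown to the spectroscopist, who makes the final decision.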

ERROR TOLERANT NMR BACKBONE RESONANCE ASSIGNMENT AND AUTOMATED STRUCTURE GENERATION

Journal of Bioinformatics and Computational Biology, 2011

Error tolerant backbone resonance assignment is the cornerstone of the NMR structure determination process. Although a variety of assignment approaches have been developed, none works sufficiently well on noisy fully automatically picked peaks to enable the subsequent automatic structure determination steps. We have designed an integer linear programming (ILP) based assignment system (IPASS) that has enabled fully automatic protein structure determination for four test proteins. IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the (automatically picked) 15N-edited NOESY peaks which are then used to fix reliable fragments. When applied to automatically picked peaks for real proteins, IPASS achieves an average precision and recall of 82% and 63%, respectively. In contrast, the next best method, MARS, achieves an average precision and recall of 77% and 36%, respectively. The assignments generated by IPASS are then fed into our protein structure calculation system, FALCON-NMR, to determine the 3D structures without human intervention. The final models have backbone RMSDs of 1.25Å, 0.88Å, 1.49Å, and 0.67Å to the reference native structures for proteins TM1112, CASKIN, VRAR, and HACS1, respectively. The web server is publicly available at http://monod.uwaterloo.ca/nmr/ipass.
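The precision and recall figures quoted above can be made concrete with a small sketch. The abstract does not define the two measures, so the definitions here are assumptions: precision is the fraction of assigned spin systems that are correct, and recall is the fraction of reference assignments that were recovered. The data and the function name `precision_recall` are made up for illustration.

```python
def precision_recall(predicted, reference):
    """Compare predicted and reference assignments.

    Both arguments map a spin-system id to a residue number; a spin system
    missing from `predicted` is treated as left unassigned.
    """
    correct = sum(1 for k, v in predicted.items() if reference.get(k) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall

# Toy data: five assignable spin systems, three assigned, two of them correctly.
reference = {"s1": 1, "s2": 2, "s3": 3, "s4": 4, "s5": 5}
predicted = {"s1": 1, "s2": 2, "s3": 4}
p, r = precision_recall(predicted, reference)
print(round(p, 3), round(r, 3))
```

Under these definitions a cautious method that assigns little can score high precision with low recall, which is why the abstract reports both numbers.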

Robust structure-based resonance assignment for functional protein studies by NMR

Journal of Biomolecular NMR, 2010

High-throughput functional protein NMR studies, like protein interactions or dynamics, require an automated approach for the assignment of the protein backbone. With the availability of a growing number of protein 3D structures, a new class of automated approaches, called structure-based assignment, has been developed quite recently. Structure-based approaches use primarily NMR input data that are not based on J-coupling and for which connections between residues are not limited by through-bond magnetization transfer efficiency. We present here a robust structure-based assignment approach using mainly HN-HN NOE networks, as well as 1H-15N residual dipolar couplings and chemical shifts. The NOEnet complete search algorithm is robust against assignment errors, even for sparse input data. Instead of a unique and partly erroneous assignment solution, an optimal assignment ensemble with an accuracy equal or close to 100% is given by NOEnet. We show that even low precision assignment ensembles give enough information for functional studies, like modeling of protein complexes. Finally, the combination of NOEnet with a low number of ambiguous J-coupling sequential connectivities yields a high precision assignment ensemble. NOEnet will be available under: http://www.icsn.cnrs-gif.fr/download/nmr.

Automated NMR resonance assignments and structure determination using a minimal set of 4D spectra

Nature Communications, 2018

Automated methods for NMR structure determination of proteins are continuously becoming more robust. However, current methods addressing larger, more complex targets rely on analyzing 6-10 complementary spectra, suggesting the need for alternative approaches. Here, we describe 4D-CHAINS/autoNOE-Rosetta, a complete pipeline for NOE-driven structure determination of medium- to larger-sized proteins. The 4D-CHAINS algorithm analyzes two 4D spectra recorded using a single, fully protonated protein sample in an iterative ansatz where common NOEs between different spin systems supplement conventional through-bond connectivities to establish assignments of sidechain and backbone resonances at high levels of completeness and with a minimum error rate. The 4D-CHAINS assignments are then used to guide automated assignment of long-range NOEs and structure refinement in autoNOE-Rosetta. Our results on four targets ranging in size from 15.5 to 27.3 kDa illustrate that the structures of proteins ...

An Integrated Platform for Automated Analysis of Protein NMR Structures

Methods in Enzymology, 2005

Recent developments provide automated analysis of NMR assignments and 3D structures of proteins. These approaches are generally applicable to proteins ranging from about 50 to 150 amino acids. In this chapter, we summarize progress by the Northeast Structural Genomics Consortium in standardizing the NMR data collection process for protein structure determination, and in building an integrated platform for automated protein NMR structure analysis. Our integrated platform includes the following principal steps: (i) standardized NMR data collection, (ii) standardized data processing (including spectral referencing and Fourier transformation), (iii) automated peak picking and peak list editing, (iv) automated analysis of resonance assignments, (v) automated analysis of NOESY data together with 3D structure determination, and (vi) methods for protein structure validation. In particular, the software AutoStructure for automated NOESY data analysis is described in this chapter, together with a discussion of practical considerations for its use in a high-throughput structure production effort. The critical area of data quality assessment has evolved significantly over the last few years, and involves evaluation of both intermediate and final peak lists, resonance assignments, and structural information derived from the NMR data. Methods for quality control of each of the major automated analysis steps in our platform are also discussed. Despite significant remaining challenges, when good quality data are available, automated analysis of protein NMR assignment and structures with this platform is both fast and reliable.

A tracked approach for automated NMR assignments in proteins (TATAPRO)

Journal of Biomolecular NMR, 2000

A novel automated approach for the sequence specific NMR assignments of 1HN, 13Cα, 13Cβ, 13C'/1Hα and 15N spins in proteins, using triple resonance experimental data, is presented. The algorithm, TATAPRO (Tracked AuTomated Assignments in Proteins) utilizes the protein primary sequence and peak lists from a set of triple resonance spectra which correlate 1HN and 15N chemical shifts with those of

CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis

Journal of Biomolecular NMR, 2016

NMR spectroscopy is an indispensably powerful technique for the analysis of biomolecules under ambient conditions, both for structural and functional studies. However, in practice the complexity of the technique has often frustrated its application by non-specialists. In this paper, we present CcpNmr version-3, the latest software release from the Collaborative Computational Project for NMR, for all aspects of NMR data analysis, including liquid- and solid-state NMR data. This software has been designed to be simple, functional and flexible, and aims to ensure that routine tasks can be performed in a straightforward manner. We have designed the software according to modern software engineering principles and leveraged the capabilities of modern graphics libraries to simplify a variety of data analysis tasks. We describe the process of backbone assignment as an example of the flexibility and simplicity of implementing workflows, as well as the toolkit used to create the necessary gr...

SAGA: rapid automatic mainchain NMR assignment for large proteins

Journal of Biomolecular NMR, 2010

Here we describe a new algorithm for automatically determining the mainchain sequential assignment of NMR spectra for proteins. Using only the customary triple resonance experiments, assignments can be quickly found for not only small proteins having rather complete data, but also for large proteins, even when only half the residues can be assigned. The result of the calculation is not the single best assignment according to some criterion, but rather a large number of satisfactory assignments that are summarized in such a way as to help the user identify portions of the sequence that are assigned with confidence, vs. other portions where the assignment has some correlated alternatives. Thus very imperfect initial data can be used to suggest future experiments.
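One way to picture the summary of many satisfactory assignments described above is a per-residue agreement score across the alternative solutions. The sketch below is an assumption about how such a summary could be computed, not SAGA's actual procedure; the function name `consensus` and the toy solutions are hypothetical.

```python
from collections import Counter

def consensus(solutions):
    """Summarize alternative assignments residue by residue.

    `solutions` is a list of dicts mapping residue number to spin-system id.
    For each residue, return the most frequent spin system and the fraction
    of solutions that agree with it.
    """
    residues = sorted({res for sol in solutions for res in sol})
    summary = {}
    for res in residues:
        counts = Counter(sol.get(res) for sol in solutions)
        top, n = counts.most_common(1)[0]
        summary[res] = (top, n / len(solutions))
    return summary

# Three toy solutions: residue 1 is stable, residues 2 and 3 swap in one solution.
solutions = [
    {1: "a", 2: "b", 3: "c"},
    {1: "a", 2: "c", 3: "b"},
    {1: "a", 2: "b", 3: "c"},
]
summary = consensus(solutions)
for res, (spin, frac) in summary.items():
    print(res, spin, round(frac, 2))
```

Residues with full agreement would count as confidently assigned, while residues whose alternatives swap together (2 and 3 here) correspond to the correlated alternatives the abstract mentions.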

EZ-ASSIGN, a program for exhaustive NMR chemical shift assignments of large proteins from complete or incomplete triple-resonance data

Journal of Biomolecular NMR, 2013

For several of the proteins in the BioMagResBank larger than 200 residues, 60% or fewer of the backbone resonances were assigned. But how reliable are those assignments? In contrast to complete assignments, where it is possible to check whether every triple-resonance Generalized Spin System (GSS) is assigned once and only once, with incomplete data one should compare all possible assignments and pick the best one. But that is not feasible: for example, for 200 residues and an incomplete set of 100 GSS, there are 1.6 × 10^260 possible assignments. In "EZ-ASSIGN", the protein sequence is divided into smaller unique fragments. Combined with intelligent search approaches, an exhaustive comparison of all possible assignments is now feasible using a laptop computer. The program was tested with experimental data of a 388-residue domain of the Hsp70 chaperone protein DnaK and for a 351-residue domain of a type III secretion ATPase. EZ-ASSIGN reproduced the hand assignments. It did slightly better than the computer program PINE (Bahrami et al.,