Scipion3: A workflow engine for cryo-electron microscopy image processing and structural biology

Scipion web tools: Easy to use cryo-EM image processing over the web

Protein Science

Macromolecular structure determination by electron microscopy under cryogenic conditions is revolutionizing the field of structural biology and attracting a large community of potential users. Still, the path from raw images to density maps is complex, and sophisticated image processing suites are required in this process, often demanding the installation and understanding of different software packages. Here, we present Scipion Web Tools, a web-based set of tools/workflows derived from the Scipion image processing framework, specially tailored to nonexpert users in need of very precise answers at several key stages of the structural elucidation process.

Automated cryo-electron microscopy

Proceedings IEEE International Symposium on Biomedical Imaging

Cryo-electron microscopy is widely viewed as a uniquely powerful method for the study of membrane proteins and large macromolecular complexes, subjects that are viewed as extremely challenging or impossible to study using x-ray or NMR methods. Although the methodology of molecular microscopy has enormous potential, it is time consuming and labor intensive. Our group has done extensive work to automate image acquisition and processing for cryo-EM. In this paper we will provide an overview of our automated system, called Leginon, and present results where we used tobacco mosaic virus (TMV) as a proof of concept.

Computer controlled cryo-electron microscopy – TOM2 a software package for high-throughput applications

Journal of Structural Biology, 2011

Automated data acquisition expedites structural studies by electron microscopy and allows the collection of data sets of unprecedented size and consistent quality. In electron tomography it greatly facilitates the systematic exploration of large cellular landscapes, and in single particle analysis it allows the generation of data sets for an exhaustive classification of coexisting molecular states. Here we describe a novel software philosophy and architecture that can be used for a great variety of automated data acquisition scenarios. Based on our original software package TOM, the new TOM2 package has been designed in an object-oriented way. The whole program can be seen as a collection of self-sufficient modules with defined relationships acting in a concerted manner. It subdivides data acquisition into a set of hierarchical tasks, binding the data structure and the operations to be performed tightly together. To demonstrate its capacity for high-throughput data acquisition, it has been used in conjunction with instrumentation combining the latest technological achievements in electron optics, cryogenics and robotics. Its performance is demonstrated with a single particle analysis case study and with a batch tomography application.

CryoDiscovery™: A Machine Learning Platform for Automated Cryo-electron Microscopy Particle Classification

Microscopy and Microanalysis

Cryogenic electron microscopy (cryo-EM) produces high-resolution 3D images at angstrom levels used by researchers across a broad range of fields including structural biology, life science, materials science, nanotechnology, semiconductors, energy, environmental science, and food science. Advancements in microscopy hardware enable production of 2D and 3D micrographs with near sub-angstrom resolution, but require exponentially increasing data processing and storage capability. Images generated by cryo-EM are visually noisy, and each project can produce more than 100,000 images and take weeks to arrive at one viewable 3D structure. Many steps in the cryo-EM workflow require manual intervention and analysis that can take several weeks and result in errors due to user bias, waiting time and user fatigue. Current image processing and data analysis solutions are not well integrated, requiring extensive manual user involvement and long wait times before image quality can be assessed. Here we describe our development of machine learning models that automate single particle classification during cryo-EM image processing with repeatable accuracy levels, integrated into the cryo-EM workflow for easy deployment through a new machine learning platform called CryoDiscovery.

SPARX, a new environment for Cryo-EM image processing

Journal of structural …, 2007

SPARX (single particle analysis for resolution extension) is a new image processing environment with a particular emphasis on transmission electron microscopy (TEM) structure determination. It includes a graphical user interface that provides a complete graphical programming environment with a novel data/process-flow infrastructure, an extensive library of Python scripts that perform specific TEM-related computational tasks, and a core library of fundamental

Particle-Picking in Cryo-Electron Microscopy Images with Online Resources

Journal of advances in microbiology, 2023

Aims: Use deep learning online resources to identify and pick single particle views in micrographs to enable high-quality three-dimensional reconstruction for macromolecular structure determination. Study Design: Using the keyhole limpet hemocyanin dataset, a public cryo-electron microscopy (cryo-EM) dataset containing two-dimensional projections of the particles in two views (top and side) in several orientations, and a recent deep learning algorithm made available in a GitHub repository, we design a procedure to pick both views with a high degree of confidence, using only online resources and running on a standard laptop. Methodology: The defocus images are subjected to a pre-processing stage to increase their contrast and improve their radiometric range. This is followed by a training stage that needs a few images annotated with examples of both views of interest (top and side view), identified using user-friendly tools available online. The annotated subset of images is divided for training and validation purposes, and the algorithm runs to produce a set of weights that can be used for inference on any other similar image, locating all the instances of the particles in both views in seconds.
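
The pre-processing and train/validation split described in this abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not say which contrast method or split ratio the authors used, so the percentile-based stretch, the 20% validation fraction, and the function names `stretch_contrast` and `split_train_val` are all assumptions made for the example.

```python
import numpy as np


def stretch_contrast(image, low_pct=1.0, high_pct=99.0):
    """Rescale intensities to [0, 1] using percentile clipping.

    Assumption: a simple percentile stretch stands in for the
    unspecified contrast enhancement mentioned in the abstract.
    """
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch, return zeros of the same shape.
        return np.zeros_like(image, dtype=np.float64)
    out = (image.astype(np.float64) - lo) / (hi - lo)
    return np.clip(out, 0.0, 1.0)


def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle annotated items and split them into train and validation lists.

    Assumption: the 20% default validation fraction is hypothetical.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_val = max(1, int(round(len(items) * val_fraction)))
    val = [items[i] for i in order[:n_val]]
    train = [items[i] for i in order[n_val:]]
    return train, val


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic low-contrast "micrograph" used only for demonstration.
    micrograph = rng.normal(loc=100.0, scale=5.0, size=(64, 64))
    stretched = stretch_contrast(micrograph)
    names = [f"img_{i:02d}" for i in range(10)]  # hypothetical file names
    train, val = split_train_val(names)
    print(stretched.min(), stretched.max(), len(train), len(val))
```

Clipping at percentiles rather than at the absolute minimum and maximum keeps a few outlier pixels from compressing the useful intensity range, which matters for noisy cryo-EM micrographs.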

A large expert-curated cryo-EM image dataset for machine learning protein particle picking

Scientific Data, 2023

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking.

CryoPPP: A Large Expert-Labelled Cryo-EM Image Dataset for Machine Learning Protein Particle Picking

bioRxiv, 2023

Cryo-electron microscopy (cryo-EM) is currently the most powerful technique for determining the structures of large protein complexes and assemblies. Picking single-protein particles from cryo-EM micrographs (images) is a key step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though the emerging machine learning-based particle picking can potentially automate the process, its development is severely hindered by lack of large, high-quality, manually labelled training data. Here, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for single protein particle picking and analysis to address this bottleneck. It consists of manually labelled cryo-EM micrographs of 32 non-redundant, representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). It includes 9,089 diverse, high-resolution micrographs (~300 cryo-EM images per EMPIAR dataset) in which the coordinates of protein particles were labelled by human experts. The protein particle labelling process was rigorously validated by both 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of machine learning and artificial intelligence methods for automated cryo-EM protein particle picking. The dataset and data processing scripts are available at

Building Proteins in a Day: Efficient 3D Molecular Structure Estimation with Electron Cryomicroscopy

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

Discovering the 3D atomic structure of molecules such as proteins and viruses is a fundamental research problem in biology and medicine. Electron Cryomicroscopy (Cryo-EM) is a promising vision-based technique for structure estimation which attempts to reconstruct 3D structures from 2D images. This paper addresses the challenging problem of 3D reconstruction from 2D Cryo-EM images. A new framework for estimation is introduced which relies on modern stochastic optimization techniques to scale to large datasets. We also introduce a novel technique which reduces the cost of evaluating the objective function during optimization by over five orders of magnitude. The net result is an approach capable of estimating 3D molecular structure from large-scale datasets in about a day on a single workstation.

2dx_automator: Implementation of a semiautomatic high-throughput high-resolution cryo-electron crystallography pipeline

Journal of Structural Biology, 2014

The introduction of direct electron detectors (DED) to cryo-electron microscopy has tremendously increased the signal-to-noise ratio (SNR) and quality of the recorded images. We discuss the optimal use of DEDs for cryo-electron crystallography, introduce a new automatic image processing pipeline, and demonstrate the vast improvement in the resolution achieved by the use of both together, especially for highly tilted samples. The new processing pipeline (now included in the software package 2dx) exploits the high SNR and frame readout frequency of DEDs to automatically correct for beam-induced sample movement, and reliably processes individual crystal images without human interaction as data are being acquired. A new graphical user interface (GUI) condenses all information required for quality assessment in one window, allowing the imaging conditions to be verified and adjusted during the data collection session. With this new pipeline an automatically generated unit cell projection map of each recorded 2D crystal is available less than 5 min after the image was recorded. The entire processing procedure yielded a three-dimensional reconstruction of the 2D-crystallized ion-channel membrane protein MloK1 with a much-improved resolution of 5 Å in-plane and 7 Å in the z-direction, within 2 days of data acquisition and simultaneous processing. The results obtained are superior to those delivered by conventional photographic film-based methodology of the same sample, and demonstrate the importance of drift-correction.