On Open Workflows for Processing of Standardized Electroencephalography Data

Preparing Laboratory and Real-World EEG Data for Large-Scale Analysis: A Containerized Approach

Frontiers in Neuroinformatics, 2016

Large-scale analysis of EEG and other physiological measures promises new insights into brain processes and more accurate and robust brain-computer interface models. However, the absence of standardized vocabularies for annotating events in a machine-understandable manner, the welter of collection-specific data organizations, the difficulty in moving data across processing platforms, and the unavailability of agreed-upon standards for preprocessing have prevented large-scale analyses of EEG. Here we describe a "containerized" approach and freely available tools we have developed to facilitate the process of annotating, packaging, and preprocessing EEG data collections to enable data sharing, archiving, large-scale machine learning/data mining and (meta-)analysis. The EEG Study Schema (ESS) comprises three data "Levels," each with its own XML-document schema and file/folder convention, plus a standardized (PREP) pipeline to move raw (Data Level 1) data to a basic preprocessed state (Data Level 2) suitable for application of a large class of EEG analysis methods. Researchers can ship a study as a single unit and operate on its data using a standardized interface. ESS does not require a central database and provides all the metadata necessary to execute a wide variety of EEG processing pipelines. The primary focus of ESS is automated in-depth analysis and meta-analysis of EEG studies. However, ESS can also encapsulate meta-information for other modalities, such as eye tracking, that are increasingly used in both laboratory and real-world neuroimaging. The ESS schema and tools are freely available at www.eegstudy.org and a central catalog of over 850 GB of existing data in ESS format is available at studycatalog.org. These tools and resources are part of a larger effort to enable data sharing at sufficient scale for researchers to engage in truly large-scale EEG analysis and data mining (BigEEG.org).

Neurodata Without Borders: Creating a Common Data Format for Neurophysiology

Neuron, 2015

The Neurodata Without Borders (NWB) initiative promotes data standardization in neuroscience to increase research reproducibility and opportunities. In the first NWB pilot project, neurophysiologists and software developers produced a common data format for recordings and metadata of cellular electrophysiology and optical imaging experiments. The format specification, application programming interfaces, and sample datasets have been released.

End-to-end processing of M/EEG data with BIDS, HED, and EEGLAB

Reliable and reproducible machine-learning enabled neuroscience research requires large-scale data sharing and analysis. Essential to the analysis of shared datasets are standardized data organization and metadata formatting, a well-documented automated analysis pipeline, and a comprehensive software framework with a compute environment that can adequately support the research. In this chapter, we introduce the combined Brain Imaging Data Structure (BIDS) and Hierarchical Event Descriptors (HED) frameworks and illustrate their example use through the organization and time course annotation of a publicly shared EEG dataset. We show how the open-source software EEGLAB can operate on data formatted using these standards to perform EEG analysis using a variety of techniques including group-based statistical analysis. Finally, we present a way to exploit freely available high-performance computing resources that allows the application of computationally intensive learning methods to ever...

A streamable large-scale clinical EEG dataset for Deep Learning

2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Deep Learning has revolutionized various fields, including Computer Vision, Natural Language Processing, and Biomedical research. Within the field of neuroscience, specifically in electrophysiological neuroimaging, researchers are starting to explore leveraging deep learning to make predictions on their data without extensive feature engineering. The availability of large-scale datasets is crucial for enabling experimentation with Deep Learning models. We are publishing the first large-scale clinical EEG dataset that simplifies data access and management for Deep Learning. This dataset contains eyes-closed EEG data prepared from a collection of 1,574 juvenile participants from the Healthy Brain Network. We demonstrate a use case integrating this framework, and discuss why providing such neuroinformatics infrastructure to the community is critical for future scientific discoveries.

The PREP pipeline: standardized preprocessing for large-scale EEG analysis

Frontiers in Neuroinformatics, 2015

The technology to collect brain imaging and physiological measures has become portable and ubiquitous, opening the possibility of large-scale analysis of real-world human imaging. By its nature, such data is large and complex, making automated processing essential. This paper shows how lack of attention to the very early stages of an EEG preprocessing pipeline can reduce the signal-to-noise ratio and introduce unwanted artifacts into the data, particularly for computations done in single precision. We demonstrate that ordinary average referencing improves the signal-to-noise ratio, but that noisy channels can contaminate the results. We also show that identification of noisy channels depends on the reference and examine the complex interaction of filtering, noisy channel identification, and referencing. We introduce a multi-stage robust referencing scheme to deal with the noisy channel-reference interaction. We propose a standardized early-stage EEG processing pipeline (PREP) and discuss the application of the pipeline to more than 600 EEG datasets. The pipeline includes an automatically generated report for each dataset processed. Users can download the PREP pipeline as a freely available MATLAB library from http://eegstudy.org/prepcode.
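The interaction between noisy channels and the average reference described in this abstract can be illustrated with a short sketch. This is not the PREP implementation: the amplitude-ratio test used to flag noisy channels, the function names, and the parameters `ratio_thresh` and `max_iter` are all assumptions chosen for illustration, and a real pipeline would likely combine several detection criteria. The arrays are cast to double precision, in line with the abstract's caution about single-precision computation.

```python
import numpy as np


def average_reference(data):
    """Ordinary average reference for a (channels x samples) array."""
    data = np.asarray(data, dtype=np.float64)  # avoid single-precision arithmetic
    return data - data.mean(axis=0, keepdims=True)


def robust_reference(data, ratio_thresh=3.0, max_iter=4):
    """Average reference estimated only from channels not flagged as noisy.

    Hypothetical criterion: a channel is noisy if its referenced amplitude
    exceeds `ratio_thresh` times the median amplitude across channels.
    Detection and referencing are repeated because each depends on the other.
    """
    data = np.asarray(data, dtype=np.float64)
    noisy = np.zeros(data.shape[0], dtype=bool)
    for _ in range(max_iter):
        reference = data[~noisy].mean(axis=0, keepdims=True)
        amplitude = (data - reference).std(axis=1)
        flagged = noisy | (amplitude > ratio_thresh * np.median(amplitude))
        # Stop when nothing new is flagged or too few clean channels remain.
        if np.array_equal(flagged, noisy) or (~flagged).sum() < 2:
            break
        noisy = flagged
    reference = data[~noisy].mean(axis=0, keepdims=True)
    return data - reference, noisy


# Synthetic demonstration: 8 channels share one signal; channel 3 is very noisy.
rng = np.random.default_rng(0)
common = rng.standard_normal(1000)
data = common + 0.1 * rng.standard_normal((8, 1000))
data[3] += 50.0 * rng.standard_normal(1000)

cleaned, noisy = robust_reference(data)
contaminated = average_reference(data)
```

In this synthetic case the ordinary average reference spreads the noise of channel 3 into every other channel, whereas the iterative version excludes that channel from the reference estimate before subtracting it.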

NWB:N 2.0: An Accessible Data Standard for Neurophysiology

2019

Neurodata Without Borders: Neurophysiology (NWB:N) is a data standard for neurophysiology, providing neuroscientists with a common standard to share, archive, use, and build common analysis tools for neurophysiology data. With NWB:N version 2.0 (NWB:N 2.0) we made significant advances towards creating a usable standard, software ecosystem, and vibrant community for standardizing neurophysiology data. In this manuscript we focus in particular on the NWB:N data standard schema and present advances towards creating an accessible data standard for neurophysiology.

The NMT Scalp EEG Dataset: An Open-Source Annotated Dataset of Healthy and Pathological EEG Recordings for Predictive Modeling

Frontiers in Neuroscience, 2022

Electroencephalogram (EEG) is widely used for the diagnosis of neurological conditions like epilepsy, neurodegenerative illnesses and sleep-related disorders. Proper interpretation of EEG recordings requires the expertise of trained neurologists, a resource which is scarce in the developing world. Neurologists spend a significant portion of their time sifting through EEG recordings looking for abnormalities. Most recordings turn out to be completely normal, owing to the low yield of EEG tests. To minimize such wastage of time and effort, automatic algorithms could be used to provide pre-diagnostic screening to separate normal from abnormal EEG. Data-driven machine learning offers a way forward; however, the design and verification of modern machine learning algorithms require properly curated labeled datasets. To avoid bias, deep-learning-based methods must be trained on large datasets from diverse sources. This work presents a new open-source dataset, named the NMT Scalp EEG Dataset, co...

EEGsig: an open-source machine learning-based toolbox for EEG signal processing

2020

In order to develop a comprehensive EEG signal processing framework, in this paper, we demonstrate a toolbox and Graphical User Interface (GUI), EEGsig, for the full EEG signal processing procedure. Our goal is to provide a comprehensive, free, and open-source framework for EEG signal processing, so that users, especially physicians with little programming experience, can focus on their practical requirements, thereby accelerating medical projects. We have integrated all three EEG signal processing phases, including preprocessing, feature extraction, and classification, into EEGsig, which was created using MATLAB. In addition to a variety of useful features, we have implemented three popular classification algorithms (K-NN, SVM, and ANN) in EEGsig to evaluate the performance of the features. Our experimental results demonstrate that our novel framework for EEG signal processing delivers outstanding classification performance and feature extraction robust...

iEEG-BIDS, extending the Brain Imaging Data Structure specification to human intracranial electrophysiology

Scientific Data

The Brain Imaging Data Structure (BIDS) is a community-driven specification for organizing neuroscience data and metadata with the aim to make datasets more transparent, reusable, and reproducible. Intracranial electroencephalography (iEEG) data offer a unique combination of high spatial and temporal resolution measurements of the living human brain. To improve internal (re)use and external sharing of these unique data, we present a specification for storing and sharing iEEG data: iEEG-BIDS.
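As a rough illustration of what organizing data under BIDS looks like in practice, the sketch below assembles the relative path of a recording from its entities, following the `sub-<label>[/ses-<label>]/<modality>/` folder layout and the `sub`, `ses`, `task`, `run` entity order used by the EEG and iEEG parts of the specification. The helper name `bids_recording_path` and the example labels are hypothetical, and the sketch covers only a subset of the entities the specification allows.

```python
from pathlib import PurePosixPath


def bids_recording_path(subject, task, modality="eeg", session=None,
                        run=None, extension=".edf"):
    """Return the relative path of a recording in a BIDS-style dataset.

    Hypothetical helper: handles only the sub, ses, task, and run entities.
    """
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    # The modality appears twice: as the folder name and as the file suffix.
    filename = "_".join(entities) + f"_{modality}{extension}"
    return str(folder / modality / filename)


# Example labels are made up for illustration.
scalp_path = bids_recording_path("01", "rest")
intracranial_path = bids_recording_path("01", "rest", modality="ieeg",
                                        session="01", run=1)
```

Because the modality is a parameter, the same helper yields `eeg` paths for scalp recordings and `ieeg` paths for intracranial ones.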