Gerhard Mayer | University of Bochum

Papers by Gerhard Mayer

A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics

Metabolites

Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography−mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software...

The HUPO proteomics standards initiative-mass spectrometry controlled vocabulary

Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain, used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annotation, to have a unique (and preferably short) accession number and to give researchers and computer algorithms the possibility for more expressive semantic annotation of data. The Human Proteome Organization (HUPO)–Proteomics

The mzQuantML Data Standard for Mass Spectrometry-based Quantitative Studies in Proteomics

Molecular & Cellular Proteomics, 2013

The mzIdentML data standard for mass spectrometry-based proteomics results

Molecular & cellular proteomics : MCP, 2012

We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative (PSI). The format was developed by the PSI in collaboration with instrument and software vendors, and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start working with the standard for exchanging and publishing data sets in support of publications and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.

ProteomeXchange provides globally coordinated proteomics data submission and dissemination

Nature Biotechnology, 2014

Author contributions: JAV, HH, and EWD led the current implementation of the ProteomeXchange data workflow, guidelines and related software. RW developed the 'ProteomeXchange submission tool'. Further authors contributed to the development of the ProteomeXchange consortium in different ways, e.g. contributing to the initial ProteomeXchange prototypes in the past, developing software and data standards, or contributing in different aspects to the implementation of the guidelines and the data workflow. JAV, EWD and HH wrote the manuscript.

Ontological analysis of controlled vocabularies used in PSI/MSI supported XML standards

Besides a plethora of formal ontologies, the requirement for simple data annotation has led to an increased use of so-called controlled vocabularies (CV) in multiple omics communities. We analyze two of those CVs from an ontological viewpoint, highlight typical modelling errors and propose more adequate solutions. Discovered errors are discussed in the light of the OOPS ontology pitfall framework and the OBO Foundry naming conventions. As a result the CVs could be improved and the OOPS catalogue could be amended and expanded with new, previously missing error categories. In an outlook we discuss potential reasons for the error prevalence and analyse what criticism is justified for CV semantics and what 'errors' are more valid for formal ontologies rather than CVs. We conclude that although many design principles valid for description logics ontologies are not relevant for semantically flat CVs and in turn there is a need for CV-best-practices that are not appropriate for descr...

Proteomics Standards Initiative Extended FASTA Format (PEFF)

Mass spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs), in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI Extended FASTA Format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backwards compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without su...

Mass spectrometry for semi-experimental protein structure determination and modeling

arXiv: Other Quantitative Biology, 2020

The structure of a protein is essential for its function. Protein structures can be determined experimentally, predicted by computational methods, or obtained by a combination of both approaches. Here, we first give an overview of experimental structure determination methods with their pros and cons. Then we describe how mass spectrometry is useful for semi-experimental integrative protein structure determination. We review the methodology and describe software programs supporting such integrated protein structure prediction approaches, making use of distance constraints obtained from mass spectrometry cross-linking experiments.

The Proteomics Standards Initiative: Fifteen Years of Progress and Future Work

Journal of proteome research, Jan 29, 2017

The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community standards via special workshops and ongoing work. Among the existing, ratified standards, the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Minimum Information About a Proteomics Experiment) guidelines with the advance of new technologies and techniques. Further, new standards are currently either in the final stages of completion (proBed and proBAM for proteogenomics results, as well as PEFF) or in early stages of design (a spectral library standard format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PR...

BioInfra.Prot: A Comprehensive Proteomics Workflow Including Data Standardization, Protein Inference, Expression Analysis and Data Publication

Journal of biotechnology, Jan 9, 2017

The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot service...

The mzIdentML data standard version 1.2, supporting advances in proteome informatics

Molecular & cellular proteomics : MCP, Jul 17, 2017

The first stable version of the Proteomics Standards Initiative mzIdentML open data standard (version 1.1) was published in 2012, capturing the outputs of peptide and protein identification software. In the intervening years, the standard has become well supported in both commercial and open software, as well as a submission and download format for public repositories. Here we report a new release of mzIdentML (version 1.2) that is required to keep pace with emerging practice in proteome informatics. New features have been added to support: (i) scores associated with localization of modifications on peptides; (ii) statistics performed at the level of peptides; (iii) identification of cross-linked peptides; and (iv) support for proteogenomics approaches. In addition, there is now improved support for the encoding of de novo sequencing of peptides, spectral library searches and protein inference. As a key point, the underlying XML schema has only undergone very minor modifications to...

Boolean modeling techniques for protein co-expression networks in systems medicine

Expert review of proteomics, Jun 1, 2016

Application of systems biology/systems medicine approaches is promising for proteomics/biomedical research, but requires selection of an adequate modeling type. This article reviews the existing Boolean network modeling approaches, which, in comparison with alternative modeling techniques, provide several advantages for the processing of proteomics data. Application of methods for inference, reduction and validation of protein co-expression networks that are derived from quantitative high-throughput proteomics measurements is presented. It is also shown how Boolean models can be used to derive system-theoretic characteristics that describe both the dynamical behavior of such networks as a whole and the properties of different cell states (e.g. healthy or diseased cell states). Furthermore, application of methods derived from control theory is proposed in order to simulate the effects of therapeutic interventions on such networks, which is a promising approach for the computer-assis...

Data management in systems biology I - Overview and bibliography

Eprint Arxiv 0908 0411, Aug 4, 2009

Large systems biology projects can encompass several workgroups, often located in different countries. We give an overview of existing data standards in systems biology and of the management, storage, exchange and integration of the generated data in large distributed research projects. The pros and cons of the different approaches are illustrated from a practical point of view, and the existing software, open source as well as commercial, and the relevant literature are extensively reviewed, so that readers can decide which data management approach is best suited to their needs. Emphasis is placed on the use of workflow systems and of TAB-based formats. Data in this format can be viewed and edited easily using spreadsheet programs, which are familiar to working experimental biologists. The use of workflows for standardized access to data in one's own or publicly available databanks and for the standardization of operation procedures is presented. The use of ontologies and semantic web technologies for data management will be discussed in a further paper.

Data management in Systems biology II - Outlook towards the semantic web

Eprint Arxiv 0912 2822, Dec 15, 2009

The benefit of using ontologies, defined by the respective data standards, is shown. We present how ontologies can be used for the semantic enrichment of data and how this can help the vision of the semantic web come true. The problems that exist today on the way to a true semantic web are pinpointed, different semantic web standards, tools and development frameworks are surveyed, and an outlook towards artificial intelligence and agents for searching and mining the data in the semantic web is given, paving the way from data management to information and, in the end, true knowledge management systems.

2016 update of the PRIDE database and its related tools

Nucleic Acids Research, 2015

The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) is the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we will give a brief update on the resources under development 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.

Ontology usage in Omics Standards Initiatives: Pros and Cons of enriching XML data formats with controlled vocabulary terms

We here review a method of XML data enrichment with controlled vocabularies (CV) in light of end-user compliance. We outline the reasons that made major standard initiatives in proteomics and metabolomics use this data enrichment scheme on omics data in favor of more formal approaches, e.g. description logics (DL) knowledge bases. We show that in comparison to other knowledge representation formalisms, the list of prerequisite skills on the user side and the learning threshold are significantly lower, making the approach feasible for bioinformaticians with average skill levels, i.e. basic XML knowledge. Additionally, our approach allows the 'business logic' to be moved out of the terminology into external rules. This enables the successive and encapsulated addition of semantics in a flexible way.

ProCon – PROteomics CONversion tool

Journal of Proteomics, 2015

With the growing amount of experimental data produced in proteomics experiments and the requirements/recommendations of journals in the proteomics field to make the data described in papers publicly available, a need for long-term storage of proteomics data in public repositories arises. For such an upload one needs proteomics data in a standardized format. Therefore, it is desirable that proprietary vendor software will in the future integrate export functionality using the standard formats for proteomics results defined by the HUPO-PSI group. Currently not all search engines and analysis tools support these standard formats. In the meantime there is a need to provide user-friendly, free-to-use conversion tools that can convert the data into such standard formats in order to support wet-lab scientists in creating proteomics data files ready for upload into the public repositories. ProCon is such a conversion tool, written in Java, for conversion of proteomics identification data into the standard formats mzIdentML and PRIDE XML. It allows the conversion of Sequest™/Comet .out files, of search results from the popular and often used ProteomeDiscoverer® 1.x (x=versions 1.1 to 1.4) software and of search results stored in the LIMS systems ProteinScape® 1.3 and 2.1 into mzIdentML and PRIDE XML.

Development of data representation standards by the human proteome organization proteomics standards initiative

Journal of the American Medical Informatics Association : JAMIA, Jan 28, 2015

To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. ...

Guidelines for reporting quantitative mass spectrometry based experiments in proteomics

Journal of Proteomics, 2013

Mass spectrometry is already a well-established protein identification tool and recent methodological and technological developments have also made possible the extraction of quantitative data of protein abundance in large-scale studies. Several strategies for absolute and relative quantitative proteomics and the statistical assessment of quantifications are possible, each having specific measurements and therefore, different data analysis workflows.

Research paper thumbnail of A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics

Metabolites

Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipid... more Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography−mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software...

Research paper thumbnail of The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary

Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain... more Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain, used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annota-tion, to have a unique (and preferably short) accession number and to give researchers and computer algorithms the possibility for more expressive semantic annotation of data. The Human Proteome Organization (HUPO)–Proteomics

Research paper thumbnail of The mzQuantML Data Standard for Mass Spectrometry-based Quantitative Studies in Proteomics

Molecular & Cellular Proteomics, 2013

Research paper thumbnail of The mzIdentML data standard for mass spectrometry-based proteomics results

Molecular & cellular proteomics : MCP, 2012

We report the release of mzIdentML, an exchange standard for peptide and protein identification d... more We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative (PSI). The format was developed by the PSI in collaboration with instrument and software vendors, and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start working with the standard for exchanging and publishing data sets in support of publications and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.

Research paper thumbnail of ProteomeXchange provides globally coordinated proteomics data submission and dissemination

Nature Biotechnology, 2014

Author contributions JAV, HH, and EWD led the current implementation of the ProteomeXchange data ... more Author contributions JAV, HH, and EWD led the current implementation of the ProteomeXchange data workflow, guidelines and related software. RW developed the 'ProteomeXchange submission tool'. Further authors contributed to the development of the ProteomeXchange consortium in different ways, e.g. contributing to the initial ProteomeXchange prototypes in the past, developing software and data standards, or contributing in different aspects to the implementation of the guidelines and the data workflow. JAV, EWD and HH wrote the manuscript.

Research paper thumbnail of Ontological analysis of controlled vocabularies used in PSI/MSI supported XML standards

Besides a ple thora of formal ontologies, the requirement for simple data annotation has led to a... more Besides a ple thora of formal ontologies, the requirement for simple data annotation has led to an increased use o f so called controlled vocabularies (CV) in multiple omics communities . We analyze two of those CVs from an ontological viewpoint, highlight typical modelling errors and propose more adeq uat solutions. Discovered errors are discussed in the light of the OOPS ontology pitfa ll framework and the OBO Foundry naming conventions. As a result the CVs could be improved and the OOPS catalogue could be amended and ex pand d with new, previously missing error categories . In an outlook we discuss potential reasons for the error prevalence and analyse what criticism is justified for CV semantics d what ‘errors’ are more valid for formal ontologies rather than CVs. We conclude that although many design principles valid for description logics ontol gies are not relevant for semantically flat CVs and in tur n there is a need for CV-best-practice s that are not appropriate for descr...

Research paper thumbnail of Proteomics Standards Initiative Extended FASTA Format (PEFF)

Mass spectrometry-based proteomics enables the high-throughput identification and quantification ... more Mass spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs), in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI Extended FASTA Format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backwards compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without su...

Research paper thumbnail of Mass spectrometry for semi-experimental protein structure determination and modeling

arXiv: Other Quantitative Biology, 2020

The structure of proteins is essential for its function. The determination of protein structures ... more The structure of proteins is essential for its function. The determination of protein structures is possible by experimental or predicted by computational methods, but also a combination of both approaches is possible. Here, first an overview about experimental structure determination methods with their pros and cons is given. Then we describe how mass spectrometry is useful for semi-experimental integrative protein structure determination. We review the methodology and describe software programs supporting such integrated protein structure prediction approaches, making use of distance constraints got from mass spectrometry cross-linking experiments

Research paper thumbnail of The Proteomics Standards Initiative: Fifteen Years of Progress and Future Work

Journal of proteome research, Jan 29, 2017

The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been ... more The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community standards via special workshops and ongoing work. Among the existing, ratified standards, the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Minimum Information About a Proteomics Experiment) guidelines with the advance of new technologies and techniques. Further, new standards are currently either in the final stages of completion (proBed and proBAM for proteogenomics results, as well as PEFF) or in early stages of design (a spectral library standard format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PR...

Research paper thumbnail of BioInfra.Prot: A Comprehensive Proteomics Workflow Including Data Standardization, Protein Inference, Expression Analysis and Data Publication

Journal of biotechnology, Jan 9, 2017

The analysis of high-throughput mass spectrometry-based proteomics data must address the specific... more The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot service...

Research paper thumbnail of The mzIdentML data standard version 1.2, supporting advances in proteome informatics

Molecular & cellular proteomics : MCP, Jul 17, 2017

The first stable version of the Proteomics Standards Initiative mzIdentML open data standard (ver... more The first stable version of the Proteomics Standards Initiative mzIdentML open data standard (version 1.1) was published in 2012 - capturing the outputs of peptide and protein identification software. In the intervening years, the standard has become well supported in both commercial and open software, as well as a submission and download format for public repositories. Here we report a new release of mzIdentML (version 1.2) that is required to keep pace with emerging practice in proteome informatics. New features have been added to support: (i) scores associated with localization of modifications on peptides; (ii) statistics performed at the level of peptides; (iii) identification of cross-linked peptides; and (iv) support for proteogenomics approaches. In addition, there is now improved support for the encoding of de novo sequencing of peptides, spectral library searches and protein inference. As a key point, the underlying XML schema has only undergone very minor modifications to...

Boolean modeling techniques for protein co-expression networks in systems medicine

Expert review of proteomics, Jun 1, 2016

Application of systems biology/systems medicine approaches is promising for proteomics/biomedical research, but requires selection of an adequate modeling type. This article reviews the existing Boolean network modeling approaches, which offer several advantages over alternative modeling techniques for the processing of proteomics data. Application of methods for inference, reduction and validation of protein co-expression networks that are derived from quantitative high-throughput proteomics measurements is presented. It is also shown how Boolean models can be used to derive system-theoretic characteristics that describe both the dynamical behavior of such networks as a whole and the properties of different cell states (e.g. healthy or diseased cell states). Furthermore, application of methods derived from control theory is proposed in order to simulate the effects of therapeutic interventions on such networks, which is a promising approach for the computer-assis...

Data management in systems biology I - Overview and bibliography

arXiv:0908.0411, Aug 4, 2009

Large systems biology projects can encompass several workgroups, often located in different countries. An overview is given of existing data standards in systems biology and of the management, storage, exchange and integration of the generated data in large distributed research projects. The pros and cons of the different approaches are illustrated from a practical point of view, and the existing software (open source as well as commercial) and the relevant literature are extensively reviewed, so that readers can decide which data management approach is best suited to their specific needs. An emphasis is laid on the use of workflow systems and of TAB-based formats. Data in this format can be viewed and edited easily using spreadsheet programs, which are familiar to working experimental biologists. The use of workflows for standardized access to data in either in-house or publicly available databases and the standardization of operating procedures is presented. The use of ontologies and semantic web technologies for data management will be discussed in a further paper.

Data management in Systems biology II - Outlook towards the semantic web

arXiv:0912.2822, Dec 15, 2009

The benefit of using ontologies, defined by the respective data standards, is shown. It is shown how ontologies can be used for the semantic enrichment of data and how this can help realize the vision of the semantic web. The problems existing today on the way to a true semantic web are pinpointed, different semantic web standards, tools and development frameworks are surveyed, and an outlook towards artificial intelligence and agents for searching and mining the data in the semantic web is given, paving the way from data management to information management and, in the end, true knowledge management systems.

2016 update of the PRIDE database and its related tools

Nucleic Acids Research, 2015

The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) is the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we will give a brief update on the resources under development 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.

Ontology usage in Omics Standards Initiatives: Pros and Cons of enriching XML data formats with controlled vocabulary terms

Here we review a method of XML data enrichment with controlled vocabularies (CV) in light of end-user compliance. We outline the reasons that made major standard initiatives in proteomics and metabolomics use this data enrichment scheme on omics data in favor of more formal approaches, e.g. description logics (DL) knowledge bases. We show that, in comparison to other knowledge representation formalisms, the list of prerequisite skills on the user side is significantly shorter and the learning threshold significantly lower, making the approach feasible for bioinformaticians with average skill levels, i.e. basic XML knowledge. Additionally, our approach allows the 'business logic' to be moved out of the terminology into external rules. This enables the successive and encapsulated addition of semantics in a flexible way.

ProCon – PROteomics CONversion tool

Journal of Proteomics, 2015

With the growing amount of experimental data produced in proteomics experiments and the requirements/recommendations of journals in the proteomics field to make the data described in papers publicly available, a need arises for long-term storage of proteomics data in public repositories. Such an upload requires proteomics data in a standardized format. It is therefore desirable that proprietary vendor software will in the future integrate export functionality using the standard formats for proteomics results defined by the HUPO-PSI group. Currently, not all search engines and analysis tools support these standard formats. In the meantime, there is a need for user-friendly, free-to-use conversion tools that can convert the data into such standard formats, in order to support wet-lab scientists in creating proteomics data files ready for upload into the public repositories. ProCon is such a conversion tool, written in Java, for the conversion of proteomics identification data into the standard formats mzIdentML and PRIDE XML. It allows the conversion of Sequest™/Comet .out files, of search results from the widely used ProteomeDiscoverer® 1.x software (versions 1.1 to 1.4) and of search results stored in the LIMS systems ProteinScape® 1.3 and 2.1.

Development of data representation standards by the human proteome organization proteomics standards initiative

Journal of the American Medical Informatics Association : JAMIA, Jan 28, 2015

To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. ...

Guidelines for reporting quantitative mass spectrometry based experiments in proteomics

Journal of Proteomics, 2013

Mass spectrometry is already a well-established protein identification tool, and recent methodological and technological developments have also made possible the extraction of quantitative data on protein abundance in large-scale studies. Several strategies for absolute and relative quantitative proteomics and the statistical assessment of quantifications are possible, each having specific measurements and, therefore, different data analysis workflows.
