Frank Tristram - Academia.edu
Miscellaneous by Frank Tristram
Papers by Frank Tristram
A picture that arranges different kinds of disciplines within a triangle connecting three "pure main goals" of science at its corners. Combined with further information, it may help to understand the "data culture" in some disciplines.
One of the scientific communities that generate the largest amounts of data today is the climate sciences. New climate models enable model integrations at unprecedented resolution, simulating timescales from decades to centuries of climate change. Nowadays, limited storage space and ever-increasing model output are a major challenge. For this reason, we look at lossless compression using prediction-based data compression. We show that there is a significant dependence of the compression rate on the chosen traversal method and the underlying data model. We examine the influence of this structural dependency on prediction-based compression algorithms and explore possibilities to improve compression rates. We introduce the concept of Information Spaces (IS), which helps to improve the accuracy of predictions by nearly 10% and decrease the standard deviation of the compression results by 20% on average.
Journal of Computational Chemistry, Aug 10, 2012
Journal of Computational Chemistry, Jun 8, 2011
The computational effort of biomolecular simulations can be significantly reduced by means of implicit solvent models, in which the energy generally contains a correction depending on the surface area and/or the volume of the molecule. In this article, we present a simple derivation of exact, easy-to-use analytical formulas for these quantities and their derivatives with respect to atomic coordinates. In addition, we provide an efficient, linear-scaling algorithm for the construction of the power diagram required for the practical implementation of these formulas. Our approach is implemented in a C++ header-only template library.
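A quick way to sanity-check such analytical volume formulas in practice is a brute-force Monte Carlo estimate. The sketch below is not the paper's method; it is a minimal, self-contained C++ check with made-up atom coordinates and radii, estimating the volume of a union of atomic spheres by rejection sampling.

```cpp
// Hedged sketch, not the paper's analytical approach: Monte Carlo estimate
// of the volume of a union of overlapping atomic spheres, usable as an
// independent check of exact analytical formulas. All inputs are made up.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

struct Sphere { double x, y, z, r; };

int main() {
    // Hypothetical three-atom "molecule" with overlapping spheres (angstrom).
    std::vector<Sphere> atoms = {
        {0.0, 0.0, 0.0, 1.7}, {1.5, 0.0, 0.0, 1.5}, {0.8, 1.2, 0.0, 1.2}};

    // Axis-aligned bounding box enclosing every sphere.
    double lo[3] = {1e30, 1e30, 1e30}, hi[3] = {-1e30, -1e30, -1e30};
    for (const auto& a : atoms) {
        const double c[3] = {a.x, a.y, a.z};
        for (int d = 0; d < 3; ++d) {
            lo[d] = std::min(lo[d], c[d] - a.r);
            hi[d] = std::max(hi[d], c[d] + a.r);
        }
    }
    const double box = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2]);

    // Count random points that fall inside at least one sphere.
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> ux(lo[0], hi[0]), uy(lo[1], hi[1]),
                                           uz(lo[2], hi[2]);
    const long n = 2000000;
    long hits = 0;
    for (long i = 0; i < n; ++i) {
        const double x = ux(rng), y = uy(rng), z = uz(rng);
        for (const auto& a : atoms) {
            const double dx = x - a.x, dy = y - a.y, dz = z - a.z;
            if (dx * dx + dy * dy + dz * dz <= a.r * a.r) { ++hits; break; }
        }
    }
    std::printf("estimated union volume: %.3f A^3\n", box * hits / n);
    return 0;
}
```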
Journal of Computational Chemistry, Aug 9, 2012
The relevance of receptor conformational change during ligand binding is well documented for many pharmaceutically relevant receptors, but is still not fully accounted for in in silico docking methods. While there has been significant progress in the treatment of receptor side-chain flexibility, sampling of backbone flexibility remains challenging because the conformational space expands dramatically and the scoring function must balance protein-protein and protein-ligand contributions. Here, we investigate an efficient multistage backbone reconstruction algorithm for large loop regions in the receptor and demonstrate that treatment of backbone receptor flexibility significantly improves binding mode prediction starting from apo structures and in cross-docking simulations. For three different kinase receptors in which large flexible loops reconstruct upon ligand binding, we demonstrate that treatment of backbone flexibility results in accurate models of the complexes in simulations starting from the apo structure. Using the example of the DFG motif in the p38 kinase, we also show how loop reconstruction can be used to model allosteric binding. Our approach thus paves the way to treating the complex process of receptor reconstruction upon ligand binding in docking simulations and may help to design new ligands with high specificity by exploiting allosteric mechanisms.
Journal of Cheminformatics, May 1, 2010
Heart Rhythm, Oct 1, 2014
BACKGROUND Effective treatment of atrial fibrillation (AF) remains an unmet need. Human K2P3.1 (TASK-1) K+ channels display atrial-specific expression and may serve as novel antiarrhythmic targets. In rodents, inhibition of K2P3.1 causes prolongation of action potentials and QT intervals. We used a porcine model to further elucidate the significance of K2P3.1 in large mammals. OBJECTIVE The purpose of this study was to study porcine (p)K2P3.1 channel function and cardiac expression and to analyze pK2P3.1 remodeling in AF and heart failure (HF). METHODS The porcine K2P3.1 ortholog was amplified and characterized using voltage-clamp electrophysiology. K2P3.1 mRNA expression and remodeling were studied in domestic pigs during AF and HF induced by atrial burst pacing. RESULTS Porcine K2P3.1 cDNA encodes a channel protein with 97% identity to human K2P3.1. K+ currents recorded from Xenopus oocytes expressing pK2P3.1 were functionally and pharmacologically similar to their human counterparts. In the pig, K2P3.1 mRNA was predominantly expressed in atrial tissue. AF and HF were associated with reduction of K2P3.1 mRNA levels by 85.1% (right atrium) and 77.0% (left atrium) at 21-day follow-up. In contrast, ventricular K2P3.1 expression was low and not significantly affected by AF/HF. CONCLUSION Porcine K2P3.1 channels exhibit atrial expression and functional properties similar to their human orthologs, supporting a general role as antiarrhythmic drug targets. K2P3.1 down-regulation in AF with HF may indicate functional relevance of the channel that remains to be validated in prospective interventional studies. KEYWORDS Atrial fibrillation; Background potassium current; Cardiac action potential; Electrical remodeling; K2P channel ABBREVIATIONS AF = atrial fibrillation; AERP = atrial effective refractory period; HF = heart failure; K2P = 2-pore-domain K+ channel; LVEF = left ventricular ejection fraction; RA = right atrium; RT-qPCR = quantitative real-time polymerase chain reaction; SR = sinus rhythm; TASK = TWIK-related acid-sensitive K+ channel 1; TWIK = tandem of P domains in a weak inward-rectifying K+ channel
Research data are valuable goods that are often only reproducible with significant effort or, in the case of unique observations, not at all. Scientists focus on data analysis and its results. By now, data exploration is accepted as a fourth scientific pillar (next to experiments, theory, and simulation). A main prerequisite for easy data exploration is successful data management. A holistic approach includes all phases of the data lifecycle: data generation, data analysis, data ingest, data preservation, data access, reuse, and long-term preservation. Tackling the challenge of increasing complexity in managing research data, the objective of bwFDM-Communities is to expose the problems of research communities. To achieve this goal, the project's key account managers enter into a dialogue with all relevant research groups at each university in Baden-Württemberg. Next to the identification of best practices, possible developments will be determined together with the scientists.
In 2554 user stories, the statements of 779 researchers about their needs in handling research data were recorded. The accompanying final project report condenses these individual statements into topics that can be quantified from these data.
Naunyn-schmiedebergs Archives of Pharmacology, Dec 1, 2010
Cardiac side effects of antidepressant drugs are well recognized. Adverse effects precipitated by the tricyclic drug desipramine include prolonged QT intervals, torsade de pointes tachycardia, heart failure, and sudden cardiac death. QT prolongation has been primarily attributed to acute blockade of hERG/IKr currents. This study was designed to provide a more complete picture of the cellular effects associated with desipramine. hERG channels were expressed in Xenopus laevis oocytes and human embryonic kidney (HEK 293) cells, and potassium currents were recorded using patch clamp and two-electrode voltage clamp electrophysiology. Ventricular action potentials were recorded from guinea pig cardiomyocytes. Protein trafficking and cell viability were evaluated in HEK 293 cells and in HL-1 mouse cardiomyocytes by immunocytochemistry, Western blot analysis, or colorimetric MTT assay, respectively. We found that desipramine reduced hERG currents by binding to a receptor site inside the channel pore. hERG protein surface expression was reduced after short-term treatment, revealing a previously unrecognized mechanism. When long-term effects were studied, forward trafficking was impaired and hERG currents were decreased. Action potential duration was prolonged upon acute and chronic desipramine exposure. Finally, desipramine triggered apoptosis in cells expressing hERG channels. Desipramine exerts at least four different cellular effects: (1) direct hERG channel block, (2) acute reduction of hERG surface expression, (3) chronic disruption of hERG trafficking, and (4) induction of apoptosis. These data highlight the complexity of hERG-associated drug effects. Keywords: Action potential; hERG; Ion channels; K+ channel; Long QT syndrome; Torsade de pointes
The federal state of Baden-Württemberg wants to offer scientists the best conditions for research. Against the backdrop of the ever-increasing importance of data and information, the bwFDM-Communities project is tasked with developing recommendations that shall enable scientists in our federal state to process and use data without barriers. In order to achieve this objective, we engage in an active dialogue with all university research groups in Baden-Württemberg (~3000). Next to identifying and advertising best-practice solutions, this project is supposed to gather information on how federal IT support needs to be expanded in order to meet the increasing demands of future research. As this is an ongoing project there may be further results in time, but some early conclusions can be drawn: Scientists want clear-cut requirements and responsibilities for data management and are willing to share their data if there is a proper appreciation model for data publication. Additionally, many scientists complain about overly strict copyright regulations and need better information about available RDM support, partners, and opportunities. Final conclusions and recommendations can only be given in the further course of the project, but we are confident that our final recommendations will help the scientists in Baden-Württemberg.
Advanced Functional Materials
The basic modules for materials research are systems for the design, synthesis, preparation, analysis, and application of materials and materials systems. To be efficient and produce findable, accessible, interoperable, and reusable (FAIR) data, state-of-the-art materials research needs to consider the integration of research data management (RDM) workflows and, in the end, the implementation of process automation concepts for all parts of the main modules. Here, the state-of-the-art methods of RDM in academia are described and a perspective on the future of digitalized molecular material systems workflows is given. The different elements of an integrated research data management strategy are described, and examples of automated processes are depicted. As such, the use of electronic lab notebooks for comprehensive documentation, the use of data-integration and data-conversion strategies, and the establishment of two platforms that enable the automated synthesis of chemical component...
2019 15th International Conference on eScience (eScience)
The increase in compute power and the development of sophisticated simulation models with higher-resolution output trigger a need for compression algorithms for scientific data. Several compression algorithms are currently under development. Most of them use prediction-based compression, where each value is predicted and the residual between the prediction and the true value is saved on disk. Currently there are two established forms of residual calculation: exclusive-or and numerical difference. In this paper we summarize both techniques and show their strengths and weaknesses. We show that shifting the prediction and true value to a binary number with certain properties results in a better compression factor with minimal additional computational cost. This gain in compression factor allows the use of less sophisticated prediction algorithms to achieve a higher throughput during compression and decompression. In addition, we introduce a new encoding scheme that achieves a 9% increase in compression factor on average compared to the current state of the art.
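To make the two residual forms concrete, here is a minimal C++20 sketch, assuming the common approach of operating on the raw IEEE-754 bit patterns; the sample value, the prediction, and the zig-zag folding of the signed difference are illustrative choices, not necessarily the paper's exact scheme.

```cpp
// Hedged sketch of the two residual schemes the abstract contrasts,
// applied to doubles reinterpreted as 64-bit integers.
#include <bit>      // std::bit_cast, std::countl_zero (C++20)
#include <cstdint>
#include <cstdio>

// XOR residual: identical leading bits of prediction and truth cancel,
// so a good prediction yields a long run of leading zeros.
uint64_t xor_residual(double truth, double pred) {
    return std::bit_cast<uint64_t>(truth) ^ std::bit_cast<uint64_t>(pred);
}

// Numerical-difference residual on the integer representations; small
// differences map to small magnitudes (sign handled by zig-zag folding).
uint64_t diff_residual(double truth, double pred) {
    const int64_t d = static_cast<int64_t>(std::bit_cast<uint64_t>(truth))
                    - static_cast<int64_t>(std::bit_cast<uint64_t>(pred));
    return (static_cast<uint64_t>(d) << 1) ^ static_cast<uint64_t>(d >> 63);
}

int main() {
    // Made-up data point and prediction; more leading zeros in the
    // residual mean fewer significant bits to encode.
    const double truth = 283.671, pred = 283.669;
    std::printf("xor  residual leading zeros: %d\n",
                std::countl_zero(xor_residual(truth, pred)));
    std::printf("diff residual leading zeros: %d\n",
                std::countl_zero(diff_residual(truth, pred)));
    return 0;
}
```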
2018 IEEE International Conference on Big Data (Big Data), 2018
One of the scientific communities that generate the largest amounts of data today is the climate sciences. New climate models enable model integration at unprecedented resolution, simulating decades and centuries of climate change, including many complex interactions in the Earth system, under different scenarios. Previously, the CPU-intensive numerical integrations used to be the bottleneck. Nowadays, limited storage space and ever-increasing model output are the bigger challenge. The number of variables stored for post-processing analysis has to be limited to keep the data volumes small. For this reason, we look at lossless compression of climate data to make better use of the available storage space. More specifically, we investigate prediction-based data compression. In prediction-based compression, data is processed in a predefined sequence. A prediction is provided for each data point based on prior data in the sequence. We show that there is a significant dependence of the compression ratio on the chosen traversal method and the underlying spatiotemporal data model. We examine the influence of this structural dependency on compression algorithms and explore possibilities to retrieve this information to improve compression ratios. To do this, we introduce the concept of Information Spaces (IS), which helps improve the predictions made by individual predictors by nearly 10% on average. More importantly, the standard deviation of the compression results is decreased by over 20% on average. The use of IS provides better predictions and more consistent compression ratios. Furthermore, it allows options for consolidation and fine-granular tuning of predictions, which are not possible with many common approaches used today.
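The traversal dependence is easy to demonstrate. The following C++20 sketch runs a simple last-value predictor over a synthetic 2D field in row-major and column-major order and compares the mean number of leading zero bits in the XOR residuals, a rough proxy for compressibility. The field, the predictor, and the metric are illustrative assumptions; the Information Spaces concept itself is not implemented here.

```cpp
// Hedged sketch: traversal order changes how well a last-value predictor
// works on a smooth 2D field, and hence how compressible the residuals are.
#include <bit>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Mean leading zeros of XOR residuals under last-value prediction.
static double mean_leading_zeros(const std::vector<double>& seq) {
    double sum = 0.0;
    double prev = 0.0;  // trivial initial prediction
    for (double v : seq) {
        const uint64_t residual =
            std::bit_cast<uint64_t>(v) ^ std::bit_cast<uint64_t>(prev);
        sum += std::countl_zero(residual);
        prev = v;  // last-value prediction for the next element
    }
    return sum / seq.size();
}

int main() {
    const int nx = 64, ny = 64;
    // Synthetic field varying slowly along x and quickly along y.
    auto field = [](int i, int j) {
        return std::sin(0.01 * i) + std::sin(0.8 * j);
    };

    std::vector<double> row_major, col_major;
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i) row_major.push_back(field(i, j));
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j) col_major.push_back(field(i, j));

    // Row-major traversal follows the smooth direction, so its residuals
    // should show more leading zeros (i.e., be easier to compress).
    std::printf("row-major    mean leading zeros: %.2f\n",
                mean_leading_zeros(row_major));
    std::printf("column-major mean leading zeros: %.2f\n",
                mean_leading_zeros(col_major));
    return 0;
}
```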
With advancing digitalization, the handling of research data is gaining public attention. While there is broad consensus on the requirements for good research data management set out in the FAIR principles, their implementation is a longer-term task. The NFDI4Phys consortium, founded within the framework of the German National Research Data Infrastructure (NFDI), conducted a survey in spring 2020 to determine the status quo and the most pressing research data management needs in physics. In this document we present the methodology and a detailed evaluation of the survey; for the context and our conclusions we refer to the publication in the PhysikJournal.
The Research Data Management Organiser (RDMO) supports research projects in planning, implementing, and administering all research data management tasks across the entire data lifecycle. In addition, it enables the textual output of a data management plan (DMP) according to the requirements of different funders. Writing a DMP requires thinking early about one's own data and about how it should be handled after the end of the project. Questions arise about usage rights, publications, and archiving options. RDMO can be used immediately for smaller and larger projects. In the second phase of the RDMO project, running since November 2017, the project partners (the Leibniz Institute for Astrophysics Potsdam, the Fachhochschule Potsdam, and the library of the Karlsruhe Institute of Technology) have been extending the already released version of RDMO and working closely with the user community. Workshops are being held to discuss...
The information portal on research data management: forschungsdaten.info