Elena Czeizler | Aalto University

Papers by Elena Czeizler

One Dimensional DNA Tiles Self Assembly Model Simulation

Int. J. Unconv. Comput., 2018

Computational modeling of the kinetic Tile Assembly Model using a rule-based approach

The (abstract) Tile Assembly Model (aTAM) is a mathematical paradigm for the study and algorithmic design of DNA self-assembly systems. It employs so-called DNA tiles, which are abstractions of experimentally achievable DNA nanostructure complexes with similar inter-matching behaviours. To this day, there are about half a dozen different experimental implementations of DNA tiles and their subsequent algorithmic assembly into larger complexes, see e.g. Reif et al. 2012. In order to provide further insight into the assembly process, the aTAM model has been extended to a kinetic counterpart (kTAM). Although there is a wide abundance of different variants of the abstract model, e.g., stage, step, hierarchical, temperature-k, signal-passing, etc. (see e.g. Patitz 2012), numerical simulations of the kinetic counterpart have been performed only for a few types of these systems. This might be due to the fact that the numerical models and simulations of kTAM were almost exclusivel...

Computational modelling of the kinetic Tile Assembly Model using a rule-based approach

Theoretical Computer Science, 2017

The (abstract) Tile Assembly Model (aTAM) is a mathematical paradigm for the study and algorithmic design of DNA self-assembly systems. It employs so-called DNA tiles, which are abstractions of experimentally achievable DNA nanostructure complexes with similar inter-matching behaviours. To this day, there are about half a dozen different experimental implementations of DNA tiles and their subsequent algorithmic assembly into larger complexes, see e.g. Reif et al. 2012. In order to provide further insight into the assembly process, the aTAM model has been extended to a kinetic counterpart (kTAM). Although there is a wide abundance of different variants of the abstract model, e.g., stage, step, hierarchical, temperature-k, signal-passing, etc. (see e.g. Patitz 2012), numerical simulations of the kinetic counterpart have been performed only for a few types of these systems. This might be due to the fact that the numerical models and simulations of kTAM were almost exclusively implemented using classical stochastic simulation algorithm frameworks, which are not designed for capturing models with a theoretically unbounded number of species. In this paper we introduce an agent- and rule-based modelling approach for kTAM, and its implementation on NFsim, one of the available platforms for this type of modelling. We show not only how the modelling of kTAM can be implemented, but we also explore the advantages of this modelling framework for kinetic simulations of kTAM and the ease with which such models can be updated and modified. We present numerical comparisons both with classical numerical simulations of kTAM and between four different kinetic variants of the TAM model, all implemented in NFsim as stand-alone rule-based models.

1. Introduction. Recent advances in DNA-based nano-technology have opened the way towards the systematic engineering of inexpensive, nucleic-acid based nano-scale

Using federated data sources and Varian Learning Portal framework to train a neural network model for automatic organ segmentation

Physica Medica, 2020

In this study we trained a deep neural network model for female pelvis organ segmentation using data from several sites without any personal data sharing. The goal was to assess its prediction power compared with the model trained in a centralized manner. Methods: Varian Learning Portal (VLP) is a distributed machine learning (ML) infrastructure enabling privacy-preserving research across hospitals from different regions or countries, within the framework of a trusted consortium. Such a framework is relevant when there is a high level of trust among the participating sites, but legal restrictions do not allow actual data sharing between them. We trained an organ segmentation model for the female pelvic region using the synchronous data distributed framework provided by the VLP. Results: The prediction performance of the model trained using the federated framework offered by VLP was on the same level as the performance of the model trained in a centralized manner where all training data was pulled together in one centre. Conclusions: The VLP infrastructure can be used for GPU-based training of a deep neural network for organ segmentation for the female pelvic region. This organ segmentation instance is particularly difficult due to the high variation in the organs' shape and size. Being able to train the model using data from several clinics can help, for instance, by exposing the model to a larger range of data variations. The VLP framework enables such a distributed training approach without sharing protected health information.

The non-parametrizability of the word equation

Methods for Biochemical Decomposition and Quantitative Submodel Comparison

An Extension of Lyndon and Schutzenberger's Result to DNA-Like Strings

On different constrains on three and four words

Control Strategies for the Regulation of the Eukaryotic Heat Shock Response

Lecture Notes in Computer Science, 2009

Elevated temperatures cause proteins in living cells to misfold. They start forming larger and larger aggregates that can eventually lead to the cell’s death. The heat shock response is an evolutionarily well-conserved cellular response to massive protein misfolding, and it is driven by the need to keep the level of misfolded proteins under control. We consider in this paper

A graph-theoretical approach for motif discovery in protein sequences

A network analysis of large-scale biomedical data for identifying cancer subtypes-Extended Abstract

Quantitative Refinement of Reaction Models

One approach to modelling complex biological systems is to start from an abstract representation of the biological process and then to incorporate more details regarding its reactions or reactants through an iterative refinement process. The refinement should be done so as to ensure the preservation of the numerical properties of the model, such as its numerical fit and validation. Such approaches are well established in software engineering: starting from a formal specification of the system, one refines it step-by-step towards an implementation that is guaranteed to satisfy a number of logical properties. We introduce here the concepts of (quantitative) data refinement and process refinement of a biomolecular, reaction-based model. We choose as a case study a recently proposed model for the heat shock response and refine it to include some details of its acetylation-induced control. Although the refinement process produces a substantial increase in the number of kinetic parameters and variables, the methodology we propose preserves all the numerical properties of the model with a minimal computational effort.

On a special class of primitive words

Theoretical Computer Science, 2010

When representing DNA molecules as words, it is necessary to take into account the fact that a word u encodes basically the same information as its Watson-Crick complement θ(u), where θ denotes the Watson-Crick complementarity function. Thus, an expression which involves only a word u and its complement can still be considered as a repeating sequence. In this context, we define and investigate the properties of a special class of primitive words, called pseudo-primitive words relative to θ or simply θ-primitive words, which cannot be expressed as such repeating sequences. For instance, we prove the existence of a unique θ-primitive root of a given word, and we give some constraints forcing two distinct words to share their θ-primitive root. Also, we present an extension of the well-known Fine and Wilf theorem, for which we give an optimal bound.
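The notion of a θ-primitive word can be illustrated with a small brute-force check. This is only a sketch under stated assumptions, not the method of the paper: θ is modelled as the reverse complement on the alphabet {A, C, G, T}, and a word is treated as θ-primitive when it cannot be split into two or more equal-length blocks that are each some word t or θ(t). The names `theta`, `theta_primitive_root` and `is_theta_primitive` are hypothetical helpers introduced for illustration.

```python
# Assumption: theta is the reverse complement over the DNA alphabet.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """Reverse the word and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def theta_primitive_root(word: str) -> str:
    """Return the shortest prefix t such that `word` is a concatenation
    of blocks, each equal to t or theta(t) (brute force over divisors)."""
    n = len(word)
    for d in range(1, n + 1):
        if n % d:
            continue  # block length must divide the word length
        t = word[:d]
        allowed = {t, theta(t)}
        if all(word[i:i + d] in allowed for i in range(0, n, d)):
            return t
    return word  # only reached for the empty word


def is_theta_primitive(word: str) -> bool:
    """A non-empty word is theta-primitive when it is its own root."""
    return len(word) > 0 and theta_primitive_root(word) == word
```

For instance, under these assumptions `ACGCGT` is primitive in the classical sense but not θ-primitive, because it is `ACG` followed by `theta("ACG")`.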

Computational Methods for Quantitative Submodel Comparison

From Logic Systems to Smart Sensors and Actuators, 2012

Biclustering Methods: Biological Relevance and Application in Gene Expression Analysis

PLoS ONE, 2014

DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data matrices. Thus, analyzing this information and extracting biologically relevant knowledge becomes a considerable challenge. A classical approach for tackling this challenge is to use clustering (also known as one-way clustering) methods where genes (or respectively samples) are grouped together based on the similarity of their expression profiles across the set of all samples (or respectively genes). An alternative approach is to develop biclustering methods to identify local patterns in the data. These methods extract subgroups of genes that are co-expressed across only a subset of samples and may feature important biological or medical implications. In this study we evaluate 13 biclustering and 2 clustering (k-means and hierarchical) methods. We use several approaches to compare their performance on two real gene expression data sets. For this purpose we apply four evaluation measures in our analysis: (1) we examine how well the considered (bi)clustering methods differentiate various sample types; (2) we evaluate how well the groups of genes discovered by the (bi)clustering methods are annotated with similar Gene Ontology categories; (3) we evaluate the capability of the methods to differentiate genes that are known to be specific to the particular sample types we study; and (4) we compare the running time of the algorithms. In the end, we conclude that as long as the samples are well defined and annotated, the contamination of the samples is limited, and the samples are well replicated, biclustering methods such as Plaid and SAMBA are useful for discovering relevant subsets of genes and samples.

On the power of parallel communicating Watson–Crick automata systems

Theoretical Computer Science, 2006

Parallel communicating Watson-Crick automata systems were introduced in [E. Czeizler, E. Czeizler, Parallel communicating Watson-Crick automata systems, in: Z. Ésik, Z. Fülöp (Eds.), Proc. Automata and Formal Languages, Dobogókő, Hungary, 2005, pp. 83-96] as possible models of DNA computations. This combination of Watson-Crick automata and parallel communicating systems comes as a natural extension due to the new developments in DNA manipulation techniques. It is already known, see [D. Kuske, P. Weigel, The ...], that for Watson-Crick finite automata, the complementarity relation plays no active role. However, this is not the case when considering parallel communicating Watson-Crick automata systems. In this paper we prove that non-injective complementarity relations increase the accepting power of these systems. We also prove that although Watson-Crick automata are equivalent to two-head finite automata, this equivalence is not preserved when comparing parallel communicating Watson-Crick automata systems and multi-head finite automata.

The non-parametrizability of the word equation: A short proof

Theoretical Computer Science, 2005

Although Makanin proved the problem of satisfiability of word equations to be decidable, the general structure of solutions is difficult to describe. In particular, Hmelevskii proved that the set of solutions of xyz = zvx cannot be described using only finitely many parameters, contrary to the case of equations in three unknowns. In this paper we give a short, elementary proof of Hmelevskii's result.
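To make the equation xyz = zvx concrete, the following sketch enumerates its solutions among short words by brute force. It is an illustration only and says nothing about parametrizability; the alphabet `"ab"`, the length bound, and the helper names `words` and `solutions` are assumptions chosen for the example, and empty words are allowed as values of the unknowns.

```python
from itertools import product


def words(alphabet: str, max_len: int):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def solutions(alphabet: str = "ab", max_len: int = 2):
    """Return all tuples (x, y, z, v) of words of length <= max_len
    that satisfy the word equation x y z = z v x."""
    pool = list(words(alphabet, max_len))
    return [
        (x, y, z, v)
        for x, y, z, v in product(pool, repeat=4)
        if x + y + z == z + v + x
    ]
```

Calling `solutions()` with the defaults checks 7^4 candidate tuples; one of the solutions it finds is `("a", "b", "a", "b")`, since both sides then spell `aba`.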

Multiple constraints on three and four words

Theoretical Computer Science, 2008

In this paper we investigate the maximal size of chains of equations on three or four words such that every time we add a new equation the set of solutions strictly decreases. We also investigate how large systems of pairwise independent or pairwise non-equivalent equations exist accepting purely non-periodic solutions.

On systems of word equations over three unknowns with at most six occurrences of one of the unknowns

Theoretical Computer Science, 2009

In this paper, we investigate the open question, formulated in 1983 by Culik II and Karhumäki, asking whether there exist independent systems of three word equations over three unknowns admitting non-periodic solutions. In particular, we answer negatively the above mentioned question for systems in which one of the unknowns occurs at most six times. That is, we show that such systems admit only periodic solutions or they are not independent.

On the descriptional complexity of Watson–Crick automata

Theoretical Computer Science, 2009

Watson-Crick automata are finite state automata working on double-stranded tapes, introduced to investigate the potential of DNA molecules for computing. In this paper, we continue the investigation of descriptional complexity of Watson-Crick automata initiated by Păun et al. [A. Păun, M. Păun, State and transition complexity of Watson-Crick finite automata, in: G. Ciobanu, G. Paun (Eds.), Fundamentals of Computation Theory, FCT'99, in: LNCS, vol. 1684, ...]. In particular, we show that any finite language, as well as any unary regular language, can be recognized by a Watson-Crick automaton with only two, and respectively three, states. Also, we formally define the notion of determinism for these systems. Contrary to the case of non-deterministic Watson-Crick automata, we show that, for deterministic ones, the complementarity relation plays a major role in the acceptance power of these systems.
