Konstantin Kozlov | Saint Petersburg State Polytechnical University (SPBSPU)
Papers by Konstantin Kozlov
Frontiers in Genetics, Nov 20, 2018
International Journal of Molecular Sciences, Mar 2, 2023
International Journal of Molecular Sciences
Human pluripotent stem cells are promising for a wide range of research and therapeutic purposes. Their maintenance in culture requires the deep control of their pluripotent and clonal status. A non-invasive method for such control involves day-to-day observation of the morphological changes, along with imaging colonies, with the subsequent automatic assessment of colony phenotype using image analysis by machine learning methods. We developed a classifier using a convolutional neural network and applied it to discriminate between images of human embryonic stem cell (hESC) colonies with “good” and “bad” morphological phenotypes associated with a high and low potential for pluripotency and clonality maintenance, respectively. The training dataset included the phase-contrast images of hESC line H9, in which the morphological phenotype of each colony was assessed through visual analysis. The classifier showed a high level of accuracy (89%) in phenotype prediction. By training the classi...
bioRxiv (Cold Spring Harbor Laboratory), Aug 13, 2017
Robustness in development allows for the accumulation of neutral genetically based variation in expression, which here will be termed 'genetic stochasticity'. This largely neutral variation is potentially important for both evolution and complex disease phenotypes. However, it has generally only been investigated as variation exhibited in the response to large genetic perturbations. In addition, work on variation in gene expression has generally been limited to being either spatial or quantitative but, because of technical restrictions, not both. Here we bridge these gaps by investigating replicated quantitative spatial gene expression using rigorous statistical models, in different genotypes, sexes, and species (Drosophila melanogaster and D. simulans). Using this type of quantitative approach with developmental data allows for effective comparison among conditions, including health versus disease. We apply this approach to the morphogenetic furrow, a wave of differentiation that sweeps across the developing eye disc. Within the morphogenetic furrow, we focus on four conserved morphogens: hairy, atonal, hedgehog, and Delta. Hybridization chain reaction quantitatively measures spatial gene expression, co-staining for all four genes simultaneously and with minimal effort. We find considerable variation in the spatial expression pattern of these genes in the eye between species, genotypes, and sexes. We also find that there has been evolution of the regulatory relationship between these genes. Lastly, we show that the spatial interrelationships of these genes evolved between species in the morphogenetic furrow. This is essentially the first 'population genetics of development', as we are able to evaluate wild-type differences in spatial and quantitative gene expression at the level of genotype, species, and sex.
Frontiers in Genetics, 2018
Plants
Flowering time is an important target for breeders in developing new varieties adapted to changing conditions. In this work, a new approach is proposed in which the SNP markers influencing time to flowering in mung bean are selected as important features in a random forest model. The genotypic and weather data are encoded in artificial image objects, and a model for flowering time prediction is constructed as a convolutional neural network. The model uses weather data for only a limited time period of 5 days before and 20 days after planting and is capable of predicting the time to flowering with high accuracy. The most important factors for model solution were identified using saliency maps and a Score-CAM method. Our approach can help breeding programs harness genotypic and phenotypic diversity to more effectively produce varieties with a desired flowering time.
Background: Accurate prediction of crop flowering time is required for reaching maximal farm efficiency. Several models developed to accomplish this goal are based on deep knowledge of plant phenology, requiring large investment for every individual crop or new variety. Mathematical modeling can be used to make better use of shallower data and to extract information from it with higher efficiency. Cultivars of chickpea, Cicer arietinum, are currently being improved by introgressing wild C. reticulatum biodiversity with very different flowering time requirements. More understanding is required of how flowering time will depend on environmental conditions in these cultivars developed by introgression of wild alleles. Results: We built a novel model for flowering time of wild chickpeas collected at 21 different sites in Turkey and grown in 4 distinct environmental conditions over several different years and seasons. We propose a general approach, in which the analytic forms of depend...
Additional file 1 contains information on SNP-based groups, climatic data for these groups, and details on the grammatical evolution method. (PDF 634 kb)
The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix can incorporate an individual's knowledge of ancestors (e.g. having some ancestors from Turkey or a Scottish grandmother). reAdmix is an online tool available at
Motivation: Modern molecular biology already has massive amounts of quantitative data at its disposal. A crucial problem in gaining closer insight into the mechanisms of development is reducing the complexity of finding the parameters of mathematical models by fitting to experimental data. Results: The new Combined Optimization Technique (COT) showed high accuracy in reconstructing the phenomenological parameters of equations and saved about 30% of the most time-consuming operations in computation, which makes COT an attractive instrument for processing large amounts of experimental data of various kinds. Availability: Available on request from the authors.
Agronomy
Accurate prediction of flowering time helps breeders to develop new varieties that can achieve maximal efficiency in a changing climate. A methodology was developed for the construction of a simulation model for flowering time in which a function for daily progression of the plant from one phenological phase to the next is obtained in analytic form by stochastic minimization. The resulting model demonstrated high accuracy on the recently assembled data set of wild chickpeas. The inclusion of genotype-by-climatic-factor interactions accounted for 77% of accuracy in terms of root mean square error. It was found that the impact of minimal temperature is positively correlated with the longitude at primary collection sites, while the impact of day length is negatively correlated. This was interpreted as adaptation of accessions from highlands to lower temperatures and those from lower-elevation river valleys to shorter days. We used bootstrap resampling to construct an ensemble of models, ...
Biophysics
Precise prediction of the timing of floral initiation helps breeders create new varieties that can achieve maximum efficiency under the influence of a changing climate. A previously constructed model was used to compare the impact of daily weather parameters on the flowering time of wild varieties of chickpeas that were collected in different geographic locations in Turkey. We found that plants from high-altitude areas, unlike plant samples from lower altitudes, can adapt to lower temperatures and longer days. Forecasts of changes in time to flowering in the studied wild chickpea varieties were made with the model and climate change predictions, using MarkSim software to generate daily weather data for Ankara. The mean thresholds for the sowing-to-flowering period for the 2020–2039, 2040–2059, and 2060–2080 time periods shifted for 21 combinations of the scenarios of plant growth and development and plant collecting sites, accounting for approximately half of the 40 cases, thereby suggesting a moderate effect of climate change on flowering time in the studied varieties.
Molecular Biology of the Cell
This work investigates the role of DNA-binding by Runt in regulating the sloppy-paired-1 (slp1) gene, and in particular two distinct cis-regulatory elements that mediate regulation by Runt and other pair-rule transcription factors during Drosophila segmentation. We find that a DNA-binding defective form of Runt is ineffective at repressing both the distal (DESE) and proximal (PESE) early stripe elements of slp1 and is also compromised for DESE-dependent activation. The function of Runt-binding sites in DESE is further investigated using site-specific transgenesis and quantitative imaging techniques. When DESE is tested as an autonomous enhancer, mutagenesis of the Runt sites results in a clear loss of Runt-dependent repression but has little to no effect on Runt-dependent activation. Notably, mutagenesis of these same sites in the context of a reporter gene construct that also contains the PESE enhancer results in a significant reduction of DESE-dependent activation as well as the ...
Biophysics
The differential evolution entirely parallel method has been developed to enable the identification of unknown parameters of mathematical models by minimization of the deviation of the solution from experimental data. The method is implemented in free open-source software that is downloadable from the Internet. Results on test functions showed that the accuracy of the method is comparable to that of the three best algorithms from CEC-2014. The method has been successfully used in a number of real biological problems.
Advanced Techniques in Biology & Medicine, 2015
Agronomy
Flowering time is an important target for breeders in developing new varieties adapted to changing conditions. A new approach is proposed that uses Approximate Bayesian Computation with Differential Evolution to construct a pool of models for flowering time. The functions for daily progression of the plant from planting to flowering are obtained in analytic form and depend on daily values of climatic factors and genetic information. The resulting pool of models demonstrated high accuracy on the dataset. Day length, solar radiation and temperature had a large impact on the model accuracy, while the impact of precipitation was comparatively small and the impact of maximal temperature had the largest variation. The model pool was used to investigate the behavior of accessions from the dataset in the case of a temperature increase of 0.05–6.00°. The time to flowering changed differently for different accessions. The Pearson correlation coefficient between the SNP value and the change in time ...
BMC Plant Biology
Background: Phenology data collected recently for about 300 accessions of Vigna radiata (mungbean) is an invaluable resource for investigating the impacts of climatic factors on plant development. Results: We developed a new mathematical model that describes the dynamic control of time to flowering by daily values of maximal and minimal temperature, precipitation, day length and solar radiation. We obtained model parameters by adaptation to the available experimental data. The models were validated by cross-validation and used to demonstrate that the phenology of adaptive traits, like flowering time, is strongly predicted not only by local environmental factors but also by plant geographic origin and genotype. Conclusions: Of the local environmental factors, maximal temperature appeared to be the most critical factor determining how faithfully the model describes the data. The models were applied to forecast time to flowering of accessions grown in Taiwan in the future years 2020–2030.