Nikolai Zolotykh - Academia.edu

Papers by Nikolai Zolotykh

BioMed Research International, 2015

We demonstrate the potential of differentiating embryonic and induced pluripotent stem cells with regularized linear and decision tree machine learning classification algorithms, based on a number of intragene methylation measures. The resulting average classification accuracy has been shown to be above 95%, which exceeds earlier results. We propose a constructive and transparent method of feature selection based on classifier accuracy. Enrichment analysis reveals a statistically meaningful presence of stemness group and cancer discriminating genes among the selected best classifying features. These findings stimulate further research on the functional consequences of these differences in methylation patterns. The presented approach can be broadly used to discriminate cells of different phenotypes or in different states by their methylation profiles, identify groups of genes constituting multifeature classifiers, and assess enrichment of these groups by the sets...

Procedia Computer Science

Studies in Computational Intelligence, Oct 19, 2022

Discrete Applied Mathematics

It is known that a positive Boolean function f depending on n variables has at least n + 1 extremal points, i.e. minimal ones and maximal zeros. We show that f has exactly n + 1 extremal points if and only if it is linear read-once. The class of linear read-once functions is known to be the intersection of the classes of read-once and threshold functions. Generalizing this result, we show that the class of linear read-once functions is the intersection of read-once and Chow functions. We also find the set of minimal read-once functions which are not linear read-once and the set of minimal threshold functions which are not linear read-once. In other words, we characterize the class of linear read-once functions by means of minimal forbidden subfunctions within the universe of read-once functions and the universe of threshold functions. Within the universe of threshold functions, the importance of linear read-once functions is due to the fact that they attain the minimum value of the specification number, which is n + 1 for functions depending on n variables. In 1995 Anthony et al. conjectured that for all other threshold functions the specification number is strictly greater than n + 1. We disprove this conjecture by exhibiting a threshold non-linear read-once function depending on n variables whose specification number is n + 1.
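The n + 1 count can be checked by brute force on small examples. The Python sketch below is an illustration only and is not taken from the paper: the function names and the three example functions are hypothetical choices. It enumerates the minimal ones and maximal zeros of a positive Boolean function given as a Python callable.

```python
from itertools import product


def leq(a, b):
    """Componentwise order on 0/1 tuples: a <= b."""
    return all(x <= y for x, y in zip(a, b))


def extremal_points(f, n):
    """Return (minimal ones, maximal zeros) of a positive Boolean function f on n variables."""
    points = list(product((0, 1), repeat=n))
    ones = [p for p in points if f(p)]
    zeros = [p for p in points if not f(p)]
    # A one is minimal if no other one lies below it.
    min_ones = [p for p in ones if not any(q != p and leq(q, p) for q in ones)]
    # A zero is maximal if no other zero lies above it.
    max_zeros = [p for p in zeros if not any(q != p and leq(p, q) for q in zeros)]
    return min_ones, max_zeros


def count_extremal(f, n):
    """Total number of extremal points of f."""
    min_ones, max_zeros = extremal_points(f, n)
    return len(min_ones) + len(max_zeros)


# Hypothetical examples (not from the paper).
def nested(x):
    # x1 AND (x2 OR x3): read-once and threshold
    return x[0] and (x[1] or x[2])


def two_pairs(x):
    # (x1 AND x2) OR (x3 AND x4): read-once, not threshold
    return (x[0] and x[1]) or (x[2] and x[3])


def majority3(x):
    # majority of three variables: threshold, not read-once
    return x[0] + x[1] + x[2] >= 2
```

Under these assumptions, count_extremal(nested, 3) gives 4 = n + 1, while two_pairs (n = 4) and majority3 (n = 3) each give 6, which exceeds n + 1 in both cases.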

Studies in Computational Intelligence, 2019

We propose an algorithm for electrocardiogram (ECG) segmentation using a UNet-like fully convolutional neural network. The algorithm receives an ECG signal of arbitrary sampling rate as input and outputs a list of onsets and offsets of P and T waves and QRS complexes. Our segmentation method differs from others in its speed, small number of parameters, and good generalization: it adapts to different sampling rates and generalizes to various types of ECG monitors. The proposed approach is superior to other state-of-the-art segmentation methods in terms of quality. In particular, the F1-measures for detection of onsets and offsets of P and T waves and for QRS complexes are at least 97.8%, 99.5%, and 99.9%, respectively.
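The abstract does not say how the network's per-sample output is turned into a list of onsets and offsets. One common way, assumed here purely for illustration, is to scan a per-sample label sequence for runs of equal labels. The label coding (0 background, 1 P wave, 2 QRS complex, 3 T wave) and the function name are hypothetical.

```python
def mask_to_intervals(labels, background=0):
    """Convert per-sample class labels into (label, onset, offset) runs.

    onset and offset are sample indices, both inclusive.
    """
    intervals = []
    start = None
    for i, lab in enumerate(labels):
        # Close the current run when the label changes.
        if start is not None and lab != labels[start]:
            intervals.append((labels[start], start, i - 1))
            start = None
        # Open a new run on a non-background sample.
        if start is None and lab != background:
            start = i
    # Close a run that extends to the end of the signal.
    if start is not None:
        intervals.append((labels[start], start, len(labels) - 1))
    return intervals
```

Dividing the sample indices by the sampling rate would convert the onsets and offsets to seconds, so the same post-processing applies at any sampling rate.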

Lobachevsky University Electrocardiography Database (LUDB) is an ECG signal database with marked boundaries and peaks of P and T waves and QRS complexes. The database consists of 200 10-second 12-lead ECG signal records representing different morphologies of the ECG signal. The ECGs were collected from healthy volunteers and patients of the Nizhny Novgorod City Hospital No 5 in 2017-2018. The patients had various cardiovascular diseases, and some of them had pacemakers. The boundaries of P and T waves and QRS complexes were manually annotated by cardiologists for all 200 records. Each record is also annotated with the corresponding diagnosis. The database can be used for educational purposes as well as for training and testing algorithms for ECG delineation, i.e. for automatic detection of boundaries and peaks of P and T waves and QRS complexes.

Communications in Computer and Information Science, 2020

Most decision tree induction algorithms are based on a greedy top-down recursive partitioning strategy for tree growth. In this paper, we propose several methods for induction of decision trees and their ensembles based on evolutionary algorithms. The main difference of our approach is the use of a real-valued vector representation of a decision tree, which allows a large number of different optimization algorithms to be applied and the whole tree or ensemble to be optimized at once in order to avoid local optima. Differential evolution and evolution strategies were chosen as optimization algorithms, as they show good results in reinforcement learning problems. We test the predictive performance of these methods on several public UCI data sets, and the proposed methods show better quality than classical methods.
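The abstract does not specify how a tree is encoded as a real-valued vector. As one possible encoding, assumed here only for illustration, a complete binary tree of fixed depth can be stored as a flat list: a (feature selector, threshold) pair per internal node, followed by one value per leaf. All names and the layout below are hypothetical.

```python
def predict(vector, x, depth):
    """Evaluate a complete binary tree of the given depth encoded in a flat real vector.

    Layout (an assumption of this sketch): for each of the 2**depth - 1 internal
    nodes, two numbers (feature selector in [0, 1), threshold), followed by
    2**depth leaf values.
    """
    n_internal = 2 ** depth - 1
    n_features = len(x)
    node = 0  # heap-style indexing: children of node i are 2i + 1 and 2i + 2
    while node < n_internal:
        selector, threshold = vector[2 * node], vector[2 * node + 1]
        # Map the real-valued selector to a feature index, clamped to a valid range.
        feature = min(max(int(selector * n_features), 0), n_features - 1)
        node = 2 * node + 1 if x[feature] <= threshold else 2 * node + 2
    leaf = node - n_internal
    return vector[2 * n_internal + leaf]


def error_rate(vector, X, y, depth):
    """Fraction of misclassified samples; a fitness an optimizer could minimize."""
    wrong = sum(round(predict(vector, xi, depth)) != yi for xi, yi in zip(X, y))
    return wrong / len(X)
```

Because the fitness depends only on the flat vector, a real-valued optimizer such as differential evolution could in principle be applied to it directly, adjusting all splits and leaves at once instead of growing the tree greedily.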

Pavel Nikolaevich Druzhkov, Department of Mathematical Logic and Higher Algebra, N.I. Lobachevsky State University of Nizhni Novgorod (Russia, Nizhni Novgorod), druzhkov_paul@mail.ru. Nikolai Yur'evich Zolotykh, Candidate of Physico-mathematical Sciences, Associate Professor, Department of Mathematical Logic and Higher Algebra, N.I. Lobachevsky State University of Nizhni Novgorod (Russia, Nizhni Novgorod), nikolai.zolotykh@gmail.com. Alexey Nikolaevich Polovinkin, Department of Computer Software, N.I. Lobachevsky State University of Nizhni Novgorod (Russia, ...

The paper deals with Google’s universal parser SyntaxNet. The system was used to analyze the Universal Dependencies linguistic corpora. We conducted an error analysis of the output of the parser to reveal to what extent the error types are connected with or preconditioned by the language types. In particular, we carried out several experiments, clustering the languages based on the frequency of different errors made by SyntaxNet, and studied the similarity of the resulting clustering with the traditional typology of languages. Three types of errors were separately considered: part-of-speech tagging, dependency labeling, and attachment errors. We show that there is indeed a correlation between error frequencies and language types, which might indicate that to further improve the performance of a universal parser, one needs to take into account language-specific morphological and syntactic structures.

The paper considers a variant of the classification problem with class noise, in which each object in the training set is associated with a probability distribution over the set of class labels instead of a particular class label. This type of task is illustrated on a complex natural language processing problem: automatic Arabic dialect classification. In this task we have a set of objects that were labeled by a heuristic rule, which could cause errors during the automatic annotation process. The suggested approach takes the probabilities of these errors into account. The described experiments show that even a relatively simple accounting of these probabilities significantly improves the quality of the resulting classifier.
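The abstract does not state which loss function is used. One standard way to train on probabilistic labels, assumed here for illustration, is the cross-entropy between the label distribution p attached to an object and the classifier's predicted distribution q, that is, the negated sum over classes k of p_k log q_k. The function name and the eps guard are hypothetical.

```python
import math


def soft_cross_entropy(p, q, eps=1e-12):
    """Cross-entropy -sum_k p[k] * log(q[k]) between a target label
    distribution p and a predicted distribution q over the same classes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # eps guards against log(0) when a predicted probability is exactly zero;
    # classes with zero target probability contribute nothing to the sum.
    return -sum(pk * math.log(max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)
```

With a one-hot p this reduces to the usual log loss on a single label, so an object whose heuristic label is considered certain is treated exactly as in ordinary classification.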

We consider three problems about cities from Alcuin's Propositiones ad acuendos juvenes. These problems can be considered the earliest packing problems, and they remain difficult even for modern state-of-the-art packing algorithms. We discuss Alcuin's solutions and give the best solutions to these problems known to the author.

Lecture Notes in Computer Science, 2013

The problem of optimal parameter selection for regression construction using a Support Vector Machine is stated. The cross-validation error function is taken as the criterion. The resulting bound-constrained nonlinear optimization problem is solved using the parallel global search algorithm of R. Strongin with a number of modifications. The efficiency of the proposed approach is demonstrated on model problems. The possibility of using the algorithm on large-scale cluster systems is evaluated. Linear speed-up of the combined parallel global search algorithm is demonstrated.

Pattern Recognition and Image Analysis, 2011

zonnon.ethz.ch

We present our experience with the design and implementation of a mathematical extension for an object-oriented programming language. Primarily this includes support for multidimensional matrices, indexing by ranges and vectors, and sparse data structures. As first-class citizens in the language, they lead to more natural code in scientific programs and enable powerful compiler optimizations. We discuss our concept of smoothly integrating mathematical constructs into the language and compiler on top of .NET. Finally, we show that this approach results in reasonably good performance, and thereby demonstrate the suitability of the (fully managed) .NET platform for high performance computing applications.

SIAM Journal on Discrete Mathematics, 2015

It is known that a minimal teaching set of any threshold function on the two-dimensional rectangular grid consists of 3 or 4 points. We derive exact formulae for the numbers of functions corresponding to these values and further refine them in the case of a minimal teaching set of size 3. We also prove that the average cardinality of the minimal teaching sets of threshold functions is asymptotically 7/2. We further present corollaries of these results concerning some special arrangements of lines in the plane.

Entropy

Rosenblatt’s first theorem about the omnipotence of shallow networks states that elementary perceptrons can solve any classification problem if there are no discrepancies in the training set. Minsky and Papert considered elementary perceptrons with restrictions on the neural inputs: a bounded number of connections or a relatively small diameter of the receptive field for each neuron in the hidden layer. They proved that under these constraints, an elementary perceptron cannot solve some problems, such as the connectivity of input images or the parity of pixels in them. In this note, we demonstrate Rosenblatt’s first theorem at work, show how an elementary perceptron can solve a version of the travel maze problem, and analyse the complexity of that solution. We also construct a deep network algorithm for the same problem, which is much more efficient. The shallow network uses an exponentially large number of neurons on the hidden layer (Rosenblatt’s A-elements), whereas for th...

2020 International Joint Conference on Neural Networks (IJCNN)

Stochastic separation theorems play an important role in high-dimensional data analysis and machine learning. It turns out that in high dimension any point of a random set of points can be separated from the other points by a hyperplane with high probability, even if the number of points is exponential in the dimension. This and similar facts can be used for constructing correctors for artificial intelligence systems, for determining the intrinsic dimension of data, and for explaining various natural intelligence phenomena. In this paper, we refine the bounds for the number of points and for the probability in stochastic separation theorems, thereby strengthening some results obtained in [5], [9], [10]. We give and discuss the bounds for linear and Fisher separability when the points are drawn randomly, independently and uniformly from a d-dimensional spherical layer. These results allow us to better outline the applicability limits of the stochastic separation theorems in the mentioned applications.
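The abstract does not restate the definition of Fisher separability. The sketch below assumes the form commonly used in the literature on these theorems: a point x is Fisher-separable from a set Y with threshold alpha if (x, y) <= alpha (x, x) for every y in Y. The paper itself may use a strict inequality or a different normalization, and the function names are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def is_fisher_separable(x, others, alpha):
    """True if (x, y) <= alpha * (x, x) for every y in others (assumed definition)."""
    xx = dot(x, x)
    return all(dot(x, y) <= alpha * xx for y in others)


def all_fisher_separable(points, alpha):
    """True if every point is Fisher-separable from all the other points."""
    return all(
        is_fisher_separable(p, points[:i] + points[i + 1:], alpha)
        for i, p in enumerate(points)
    )
```

The check needs only inner products, with no training or matrix inversion, which is what makes this kind of separability attractive for the one-shot correctors mentioned in the abstract.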

Computers & Operations Research

We further present corollaries of these results concerning some special arrangements of lines in the plane.

Entropy

Stochastic separation theorems play important roles in high-dimensional data analysis and machine learning. It turns out that in high-dimensional space, any point of a random set of points can be separated from the other points by a hyperplane with high probability, even if the number of points is exponential in the dimension. This and similar facts can be used for constructing correctors for artificial intelligence systems, for determining the intrinsic dimensionality of data, and for explaining various natural intelligence phenomena. In this paper, we refine the estimates for the number of points and for the probability in stochastic separation theorems, thereby strengthening some results obtained earlier. We give bounds for linear and Fisher separability when the points are drawn randomly, independently and uniformly from a d-dimensional spherical layer and from the cube. These results allow us to better outline the applicability limits of the stochastic separation ...
