Henry Soldano | Université Sorbonne Paris Nord / Sorbonne Paris Nord University

Papers by Henry Soldano

Finding certain regularities in a text is an important problem in many areas, for instance in the analysis of biological molecules such as nucleic acids or proteins. In the latter case, the text may be a sequence of amino acids or a linear coding of 3D structures, and the regularities then correspond to lexical or structural motifs common to two or more proteins. We first recall an earlier algorithm for finding these regularities in a flexible way. We then introduce a generalized version of this algorithm designed for the particular case of protein 3D structures, since these structures present a few peculiarities that make them computationally harder to process. Finally, we apply the new algorithm to concrete examples.

We present in this paper a peptide matching approach to the multiple comparison of a set of protein sequences. This approach consists of looking for all the words that are common to q of these sequences, where q is a parameter. Words are compared using as reference an object called a model. In the case of proteins, a model is a product of subsets of the alphabet Σ of the amino acids. These subsets belong to a cover of Σ, that is, their union is all of Σ. A word is said to be an instance of a model if it belongs to the model. Further flexibility is introduced by allowing up to e errors in the comparison between a word and a model; these errors may be gaps or substitutions not allowed by the cover. A word is then said to be an occurrence of a model if the Levenshtein distance between it and some instance of the model is less than or equal to e. This corresponds to what we call a Set-Levenshtein distance between the occurrences an...
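The notions of model, instance and occurrence can be made concrete with a small Python sketch. It assumes a model is given as a list of non-empty sets (one per position) and computes the smallest Levenshtein distance between a word and any instance of the model with the usual dynamic program, where a substitution is free whenever the word's symbol belongs to the model's set at that position. This is one natural reading of the Set-Levenshtein distance, not necessarily the authors' formulation; the function names and the toy cover classes `ILV`, `K` and `DE` are hypothetical.

```python
from typing import FrozenSet, Sequence

# A model: one subset of the alphabet per position (assumed non-empty).
Model = Sequence[FrozenSet[str]]


def is_instance(word: str, model: Model) -> bool:
    """True if the word belongs to the product of sets defining the model."""
    return len(word) == len(model) and all(c in s for c, s in zip(word, model))


def set_levenshtein(word: str, model: Model) -> int:
    """Smallest Levenshtein distance between `word` and any instance of `model`.

    Aligning word symbol c with model position j costs 0 if c is in model[j],
    otherwise 1; insertions and deletions cost 1.
    """
    prev = list(range(len(model) + 1))  # row for the empty prefix of `word`
    for i, c in enumerate(word, start=1):
        cur = [i] + [0] * len(model)
        for j, allowed in enumerate(model, start=1):
            sub = 0 if c in allowed else 1
            cur[j] = min(prev[j] + 1,        # delete c from the word
                         cur[j - 1] + 1,     # insert a symbol of `allowed`
                         prev[j - 1] + sub)  # match or substitute
        prev = cur
    return prev[-1]


def is_occurrence(word: str, model: Model, e: int) -> bool:
    """True if the word is within e edits of some instance of the model."""
    return set_levenshtein(word, model) <= e


# Hypothetical model built from toy cover classes.
model = [frozenset("ILV"), frozenset("K"), frozenset("DE")]
print(is_instance("IKD", model))       # True
print(set_levenshtein("LKKE", model))  # 1 (one extra K)
```

Because each model position is matched independently against its own set, the table has the same size as for an ordinary Levenshtein distance.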

We present in this paper an algorithm that locates similar words common to a set of strings defined over an alphabet Σ, where similarity is measured by a Levenshtein edit distance. The words in the strings are compared using a reference object called a model, which is a word over Σ. This allows us to perform a multiple comparison of the strings, as opposed to pairwise comparisons, and the algorithm is particularly appropriate for the analysis of DNA/RNA sequences.
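As a point of reference, the condition that a model is common to several strings can be checked by brute force for short models over a small alphabet such as DNA. The sketch below is not the paper's algorithm, which avoids enumerating every candidate model; it tests each length-k word over the alphabet with Sellers' approximate-matching dynamic program and keeps those occurring, with at most e edits, in at least q strings. The function names and the quorum parameter q are illustrative assumptions.

```python
from itertools import product
from typing import List, Sequence


def occurs_approx(model: str, text: str, e: int) -> bool:
    """True if some substring of `text` is within Levenshtein distance e of `model`.

    Sellers' algorithm: one column over model prefixes, updated per text
    symbol; the empty model prefix costs 0 everywhere, so a match may start
    at any position of the text.
    """
    col = list(range(len(model) + 1))  # distances before reading any text
    if col[-1] <= e:
        return True
    for ch in text:
        diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0
        for i in range(1, len(model) + 1):
            old = col[i]  # D[i][j-1], reused as the diagonal for i + 1
            cost = 0 if model[i - 1] == ch else 1
            col[i] = min(old + 1,         # text symbol left unmatched
                         col[i - 1] + 1,  # model symbol left unmatched
                         diag + cost)     # match or substitution
            diag = old
        if col[-1] <= e:
            return True
    return False


def quorum_models(strings: Sequence[str], alphabet: str, k: int, e: int, q: int) -> List[str]:
    """All length-k words over `alphabet` occurring (<= e edits) in >= q strings."""
    found = []
    for letters in product(alphabet, repeat=k):
        model = "".join(letters)
        if sum(occurs_approx(model, s, e) for s in strings) >= q:
            found.append(model)
    return found


strings = ["ACGTTA", "TTACGA", "GGGGGG"]
print(quorum_models(strings, "ACGT", 3, 0, 2))  # ['ACG', 'TTA']
```

The enumeration visits |Σ|^k candidate models, which is what a model-based algorithm aims to avoid; the sketch is only meant for checking results on toy inputs.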

Revue d'intelligence artificielle, 2015

The Kluwer International Series in Engineering and Computer Science, 1986

We are interested in the task of knowledge acquisition when addressing real problems such as the interpretation of complex objects (speech recognition, scene analysis, etc.).

We present in this paper an algorithm that locates similar words common to a set of strings defined over an alphabet Σ, where the similarity is stated in terms of a Levenshtein edit distance. The comparison of the words in the strings is realized by using a reference object called a model, which is a word over Σ. This allows us to perform ...

Graph-Based Representation and Reasoning

Applied Network Science

Applying closed pattern mining to attributed two-mode networks requires two conditions. First, since a two-mode network has two kinds of vertices, each described by its own attribute set, we have to consider patterns made of two components, which we call bi-patterns. The occurrences of a bi-pattern form an extension made of a pair of vertex subsets. Second, Formal Concept Analysis and Closed Pattern Mining were recently applied to networks by reducing the extensions of patterns to their cores, according to some core definition. We need to consider appropriate core definitions for two-mode networks and define closed bi-patterns accordingly. In this article we describe a general framework for defining closed bi-pattern mining. We also show that this methodology applies as well to cores of directed and undirected networks in which each vertex subset is associated with a specific role. We illustrate the methodology first on a two-mode network of epistemological data, then on a directed advice network of lawyers, and finally on an undirected bibliographical network.
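A minimal sketch of the closure operator this framework suggests, under assumptions the abstract does not state: a bi-pattern is a pair of attribute sets (one per vertex kind), its extension is the pair of vertex subsets satisfying each component, the core is a bipartite (a, b)-core (every top vertex keeps at least a neighbours, every bottom vertex at least b), and the closure is the pair of attribute sets shared by the core's vertices. The core definition, all names, and the toy network are hypothetical; the article considers several core definitions.

```python
from typing import Dict, FrozenSet, Set, Tuple

Attrs = Dict[str, FrozenSet[str]]  # vertex -> its attribute set
Edges = Set[Tuple[str, str]]       # (top vertex, bottom vertex) pairs


def extent(pattern: FrozenSet[str], attrs: Attrs) -> Set[str]:
    """Vertices whose attributes include every attribute of the pattern."""
    return {v for v, a in attrs.items() if pattern <= a}


def ab_core(top: Set[str], bottom: Set[str], edges: Edges, a: int, b: int) -> Tuple[Set[str], Set[str]]:
    """Largest sub-pair where each top vertex has >= a neighbours in the bottom
    part and each bottom vertex has >= b neighbours in the top part."""
    top, bottom = set(top), set(bottom)
    changed = True
    while changed:
        changed = False
        for u in list(top):
            if sum((u, v) in edges for v in bottom) < a:
                top.discard(u)
                changed = True
        for v in list(bottom):
            if sum((u, v) in edges for u in top) < b:
                bottom.discard(v)
                changed = True
    return top, bottom


def intent(vertices: Set[str], attrs: Attrs) -> FrozenSet[str]:
    """Attributes shared by all given vertices (all attributes if none are given)."""
    shared = frozenset().union(*attrs.values())
    for v in vertices:
        shared &= attrs[v]
    return shared


def closure(p, q, top_attrs: Attrs, bottom_attrs: Attrs, edges: Edges, a: int, b: int):
    """Close the bi-pattern (p, q): extension -> (a, b)-core -> shared attributes."""
    core_top, core_bottom = ab_core(extent(p, top_attrs), extent(q, bottom_attrs), edges, a, b)
    return intent(core_top, top_attrs), intent(core_bottom, bottom_attrs)


top_attrs = {"t1": frozenset({"x", "y"}), "t2": frozenset({"x"}), "t3": frozenset({"x", "z"})}
bottom_attrs = {"b1": frozenset({"p"}), "b2": frozenset({"p", "r"}), "b3": frozenset({"r"})}
edges = {("t1", "b1"), ("t1", "b2"), ("t2", "b1"), ("t2", "b2"), ("t3", "b3")}
print(closure(frozenset({"x"}), frozenset(), top_attrs, bottom_attrs, edges, 2, 2))
# (frozenset({'x'}), frozenset({'p'}))
```

With a = b = 2, vertex t3 and its only neighbour b3 are pruned from the extension, so the bi-pattern ({x}, ∅) closes to ({x}, {p}); with a = b = 1 nothing is pruned and the bottom component stays empty.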

We are interested here in a class of learning problems with the following particularity: we look for a characteristic common to a set of objects associated with a target concept; however, the description of an object is ambiguous in the sense that it ...

Revue d'Intelligence Artificielle, 2005

Our basic representation of a dataset is a Galois lattice, i.e. a lattice in which the terms of a representation language are partitioned into equivalence classes with respect to their extent (the extent of a term is the part of the instance set that satisfies the term). To reduce the size of the lattice, we propose to simplify our view of the data while preserving the formal structure of a Galois lattice. For that purpose we use a preliminary partition of the instance set, representing the association of a type with each instance. By redefining the extent of a term so as to take this partition into account to a certain degree (denoted α), we obtain a particular family of Galois lattices called Alpha Galois lattices. We study this new notion of extent, the direct or incremental construction and the ordering of these lattices, and the associated implication rules.
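The α-extent can be sketched as follows, assuming a definition the abstract does not spell out: an instance of ext(t) is kept only if at least a fraction α of its class also satisfies t. Terms are sets of attributes, and closed terms are computed by brute force as int(ext_α(t)); all names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Set

Data = Dict[str, FrozenSet[str]]  # instance -> attributes it satisfies
Classes = Dict[str, str]          # instance -> its type (block of the partition)


def extent(term: FrozenSet[str], data: Data) -> Set[str]:
    """Instances satisfying every attribute of the term."""
    return {x for x, attrs in data.items() if term <= attrs}


def alpha_extent(term: FrozenSet[str], data: Data, cls: Classes, alpha: float) -> Set[str]:
    """Assumed definition: keep x in ext(term) if >= alpha of x's class is in ext(term)."""
    ext = extent(term, data)
    size = Counter(cls.values())          # instances per class
    hits = Counter(cls[x] for x in ext)   # instances per class satisfying the term
    return {x for x in ext if hits[cls[x]] >= alpha * size[cls[x]]}


def intent(instances: Set[str], data: Data) -> FrozenSet[str]:
    """Attributes common to all instances (every attribute if the set is empty)."""
    common = frozenset().union(*data.values())
    for x in instances:
        common &= data[x]
    return common


def alpha_closed_terms(data: Data, cls: Classes, alpha: float) -> Set[FrozenSet[str]]:
    """Brute force: close every term t as int(ext_alpha(t)) and collect the results."""
    attributes = sorted(frozenset().union(*data.values()))
    closed = set()
    for r in range(len(attributes) + 1):
        for combo in combinations(attributes, r):
            closed.add(intent(alpha_extent(frozenset(combo), data, cls, alpha), data))
    return closed


data = {"i1": frozenset("ab"), "i2": frozenset("ac"), "i3": frozenset("abc")}
cls = {"i1": "C1", "i2": "C1", "i3": "C2"}
print(len(alpha_closed_terms(data, cls, 0.0)), len(alpha_closed_terms(data, cls, 1.0)))  # 4 2
```

On this toy data, α = 0 gives the ordinary Galois lattice with four closed terms, while α = 1 leaves only two, because b and c are each satisfied by only half of class C1. Under the assumed definition, ext_α is still antitone in t, so int(ext_α(t)) remains a closure operator and the lattice structure is preserved.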

... Relations. Nadia Pisanti∗, Henry Soldano∗, Mathilde Carpentier, Joel Pothier. Dipartimento di Informatica, Università di Pisa, Italy. ...

Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB), Feb 1, 1995

We present in this paper an algorithm for the multiple comparison of a set of protein sequences. Our approach is that of peptide matching and consists of looking for all the words that occur approximately in at least q of the sequences in the set, where q is a parameter. Words are compared using a reference object called a model, which is itself a word over the alphabet of the amino acids, and the comparison between a model and a word is based on w-length words instead of single symbols. This idea is similar to the one used in the BLAST program for pairwise comparisons. Two w-length words are considered related if their gapless alignment, scored with a similarity matrix, has a score greater than a threshold value t. In our case, a k-length word u is an occurrence of a model m of the same length if every w-length subword of u is related, in the above sense, to the corresponding subword of m. If a model m has occurrences in at least q of the sequences of the set, m is said to occur in the set. In percentage terms, q may correspond to as little as 5% of the sequences (search for recurrent words in a set of non-homologous proteins) or as much as 70-100% (establishment of a list of all similar words as a first step in a multiple alignment program). The algorithm presented here is an efficient and exact way of finding all the models, of a fixed length k or of the greatest possible length k_max, that occur in a set of sequences. It can work with any kind of scoring matrix, and an extension of the algorithm allows gaps between a model and its occurrences.
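The occurrence test can be sketched directly from this definition. The scoring matrix below is a hypothetical toy (+2 for identical residues, -1 otherwise) standing in for a real substitution matrix, and the functions only check one given model; they are not the exact and efficient model-enumeration algorithm described above. All function names are illustrative.

```python
from typing import Dict, Sequence, Tuple

Matrix = Dict[Tuple[str, str], int]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical toy similarity matrix: +2 for identity, -1 for any mismatch.
TOY_MATRIX: Matrix = {(x, y): 2 if x == y else -1 for x in AMINO_ACIDS for y in AMINO_ACIDS}


def related(a: str, b: str, t: int, matrix: Matrix) -> bool:
    """Two w-length words are related if their gapless alignment scores more than t."""
    return sum(matrix[(x, y)] for x, y in zip(a, b)) > t


def is_occurrence(u: str, m: str, w: int, t: int, matrix: Matrix) -> bool:
    """u is an occurrence of m if every w-length subword of u is related to m's."""
    if len(u) != len(m) or len(m) < w:
        return False
    return all(related(u[i:i + w], m[i:i + w], t, matrix) for i in range(len(m) - w + 1))


def occurs_in_set(m: str, sequences: Sequence[str], w: int, t: int, q: int, matrix: Matrix) -> bool:
    """m occurs in the set if at least q sequences contain an occurrence of it."""
    k = len(m)
    support = sum(
        any(is_occurrence(s[j:j + k], m, w, t, matrix) for j in range(len(s) - k + 1))
        for s in sequences
    )
    return support >= q


print(is_occurrence("KIVE", "KLVE", w=2, t=0, matrix=TOY_MATRIX))  # True
```

With t = 1 instead of 0, the window KI/KL scores exactly 1, which is not greater than t, so KIVE is no longer an occurrence of KLVE.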