Giuseppe Narzisi | New York Genome Center
Papers by Giuseppe Narzisi
Abstract We have developed a novel algorithmic framework for assembling haplotypic genome sequences, and thus address a key open problem in the study of populations and polymorphisms, which cannot be solved with the currently available genotypic sequences.
The whole-genome sequence assembly (WGSA) problem is one of the most studied problems in computational biology. Despite the availability of a plethora of tools (i.e., assemblers), all claiming to have solved the WGSA problem, little has been done to systematically compare their accuracy and power.
Abstract In just the last decade, a multitude of bio-technologies and software pipelines have emerged to revolutionize genomics. To further their central goal, they aim to accelerate and improve the quality of de novo whole-genome assembly starting from short DNA sequences/reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information.
Abstract Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler.
Author contributions: BM designed research; GN, VM and BM performed research; GN, VM and BM contributed new analytical tools; LN, DR and MT contributed to the clinical aspects of the study design and development; GN, VM and BM wrote the paper; and GN, VM, LN, DR, MT, LH, and IP reviewed the paper.
ABSTRACT In this paper, we describe the agent-based modeling (ABM), simulation and analysis of a potential Sarin gas attack in the Port Authority Bus Terminal on the island of Manhattan in New York City, USA. The streets and subways of Manhattan have been modeled as a non-planar graph. The people at the terminal are modeled as agents initially moving randomly, but with a resultant drift velocity towards their destinations, e.g., workplaces.
Abstract Motivation. With the recent advent of a multitude of next-generation sequencing (NGS) technologies (characterized by high throughput but relatively shorter read length), de novo DNA sequence assembly has again become one of the most prominent problems in Genomics and Computational Biology. Although algorithmic improvements play an important role in sequence assembly, the complexity of the problem is strongly reduced if higher quality (low error rate) sequences can be generated.
FRCbam computes features using a sliding window of size W. By default W is set to 1 Kbp, and in each step it slides by 200 bp. Let A denote a genome to be assembled (i.e., the desired output). Let $R = \{r^1_1, r^2_1, \ldots, r^1_n, r^2_n\}$ denote the set of sequenced paired reads from A. Pairs are at a known estimated distance, d (with standard deviation, v) and with known orientations. FRCbam input is:
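The windowing scheme described in this abstract can be illustrated with a minimal sketch. This is not FRCbam's code: the function name `sliding_windows`, the half-open coordinate convention, and the handling of contigs shorter than W are all assumptions made for illustration. Only the default window size (1 Kbp) and step (200 bp) come from the text.

```python
from typing import Iterator, Tuple


def sliding_windows(
    contig_length: int, window: int = 1000, step: int = 200
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) intervals covering a contig.

    Hypothetical helper: defaults follow the abstract (W = 1 Kbp,
    step = 200 bp); everything else is an illustrative assumption.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if contig_length <= 0:
        return
    # Assumption: a contig shorter than W is treated as a single window.
    if contig_length <= window:
        yield (0, contig_length)
        return
    start = 0
    # Only full-size windows are emitted; a trailing partial window is skipped.
    while start + window <= contig_length:
        yield (start, start + window)
        start += step


if __name__ == "__main__":
    for start, end in sliding_windows(1600):
        print(start, end)
```

With the defaults, consecutive windows overlap by 800 bp, so each position in the interior of a contig is covered by up to five windows.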
Abstract. Support Vector Machines (SVMs) are a powerful method for both regression and classification. However, any SVM formulation requires the user to set two or more parameters which govern the training process, and such parameters can have a strong effect on the resulting performance of the engine. Moreover, the design of learning systems is inherently a multi-objective optimization problem. It requires finding a suitable trade-off between at least two conflicting objectives: model complexity and accuracy.
ABSTRACT In this paper, we describe the agent-based modeling (ABM), simulation and analysis of a potential Sarin gas attack at the Port Authority Bus Terminal on the island of Manhattan in New York City, USA. The streets and subways of Manhattan have been modeled as a non-planar graph. The people at the terminal are modeled as agents initially moving randomly, but with a resultant drift velocity towards their destinations, e.g., workplaces.
Complex Systems are often characterized by agents capable of interacting with each other dynamically, often in non-linear and non-intuitive ways. Trying to characterize their dynamics often results in partial differential equations that are difficult, if not impossible, to solve. A large city or a city-state is an example of such an evolving and self-organizing complex environment that efficiently adapts to different and numerous incremental changes to its social, cultural and technological infrastructure [2].
Abstract. Numerical optimization of given objective functions is a crucial task in many real-life problems. The present article introduces an immunological algorithm for continuous global optimization problems, called opt-IA. Several biologically inspired algorithms have been designed during the last few years and have been shown to perform very well on standard test beds for numerical optimization.
Abstract Natural proteins quickly fold into a complicated three-dimensional structure. Evolutionary algorithms have been used to predict the native structure with the lowest energy conformation of the primary sequence of a given protein. Successful structure prediction requires a free energy function sufficiently close to the true potential for the native state, as well as a method for exploring the conformational space.
Abstract The protein structure prediction (PSP) problem is concerned with the prediction of the folded, native, tertiary structure of a protein given its sequence of amino acids. It is a challenging and computationally open problem, as proven by the numerous methodological attempts and the research effort applied to it in the last few years.
Finding the native structure of a protein starting from its amino acid sequence remains one of the most challenging open problems in bioinformatics and molecular biology. The Protein Structure Prediction (PSP) problem has been tackled from many different directions. The common approach is to cast it in the form of a global single-objective optimization problem using energy functions to evaluate the physical state of the conformations.
One of the main issues in ABM is to build models at the appropriate level of description, using the requisite level of detail in order to produce a system that serves its analytical purpose. The details of our model have been summarized below from [2, 3, 5, 6, 4]. Table 1 shows the main parameters that the user can modify.
Yahoo!Clusty1 is a Clustering Meta-search Engine (MSE) that allows users to send queries to Yahoo!. The returned snippets are grouped into homogeneous groups by topic. The objective of this project has been to create a flexible MSE for the Yahoo! web search engine. The purpose is to present the results returned to a query in a more structured format which will allow the user to easily explore them by category.
Scoring-and-Unfolding Trimmed Tree Assembler: Algorithms for Assembling Genome Sequences Accurately and Efficiently, by Giuseppe Narzisi. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, May 2011. Advisor: Bud Mishra. © Giuseppe Narzisi, All Rights Reserved, 2011. To Valentina & My family.
Artificial Immune Systems, Jan 1, 2005
This paper presents a comparative study of two important Clonal Selection Algorithms (CSAs): CLONALG and opt-IA. To deeply understand the performance of both algorithms, we deal with four different classes of problems: toy problems (one-counting and trap functions), pattern recognition, numerical optimization problems and an NP-complete problem (the 2D HP model for the protein structure prediction problem). Two possible versions of CLONALG have been implemented and tested. The experimental results show an overall better performance of opt-IA with respect to CLONALG. Considering the results obtained, we can claim that CSAs represent a new class of Evolutionary Algorithms for effectively performing searching, learning and optimization tasks.
Applications on Evolutionary Computing, Jan 1, 2005
In this work we investigate the applicability of a multiobjective formulation of the Ab-Initio Protein Structure Prediction (PSP) to medium-size protein sequences (46-70 residues). In particular, we introduce a modified version of Pareto Archived Evolution Strategy (PAES) which makes use of immune-inspired computing principles and which we will denote by "I-PAES". Experimental results on the test bed of five proteins from PDB show that PAES, (1+1)-PAES and its modified version I-PAES are optimal multiobjective optimization algorithms and that the introduced mutation operators, mut1 and mut2, are effective for the PSP problem. The proposed I-PAES is comparable with other evolutionary algorithms proposed in the literature, both in terms of best solution found and computational cost.