D. Loewenstern - Academia.edu
Papers by D. Loewenstern
Conference on Network and Service Management, Oct 24, 2011
IT Service Management (ITSM) encompasses the practices for managing information technology systems. ITSM processes can be laden with segments where a human becomes a bottleneck and slows down the entire process. These inefficiencies are usually caused by inadequate design of the process itself or by defects in the tools being used. Our work provides a systematic framework for analyzing inefficiencies through a combined model to guide the use and estimate the value of improving orchestration of the process using mashup design patterns.
2012 IEEE Network Operations and Management Symposium, 2012
Multi-domain IT services are delivered by technicians with a variety of expert knowledge in different areas. Their skills and availability are an important property of the service. However, most organizations do not have a consistent view of this information because creating and maintaining a skill model is a difficult task, especially in light of privacy regulations, changing service catalogs
Catalog-based service request management
To manage the delivery of services competitively on a large, global scale, an IT (information technology) service provider must efficiently use service delivery resources, in particular skilled service delivery teams. Service requests form a large and important component of the management of a client's IT infrastructure. Currently, the fulfillment of IT service requests is often managed on a per-account basis. Service delivery teams fulfill service requests according to account-specific processes by using an account-specific service-request-management environment, making it difficult to leverage the skills of the various delivery teams for multiple accounts. The service delivery management platform (SDMP) uses reusable service components that can be performed by multiple delivery teams and can be assembled into service compositions to which multiple clients can subscribe. The SDMP catalog is the information repository that manages service comp...
If DNA were a random string over its alphabet {A, C, G, T}, an optimal code would assign 2 bits to each nucleotide. We imagine DNA to be a highly ordered, purposeful molecule, and might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly, this has not been the case for many natural DNA sequences, including portions of the human genome. We introduce a new statistical model (compression algorithm), the strongest reported to date, for naturally occurring DNA sequences. Conventional techniques code a nucleotide using only slightly fewer bits (1.90) than one obtains by relying only on the frequency statistics of individual nucleotides (1.95). Our method in some cases increases this gap by more than five-fold (1.66) and may lead to better performance in microbiological pattern recognition applications. One of our main contributions, and the principal source of these improvements, is the formal inclusion of inexact match information in the model. The existence of matches at various distances forms a panel of experts, which are then combined into a single prediction. The structure of this combination is novel, and its parameters are learned using Expectation Maximization (EM). Experiments are reported using a wide variety of DNA sequences and compared whenever possible with earlier work. Four reasonable notions for the string distance function used to identify near matches are implemented and experimentally compared. We also report lower entropy estimates for coding regions extracted from a large collection of non-redundant human genes. The conventional estimate is 1.92 bits.
Our model produces only slightly better results (1.91 bits) when considering nucleotides, but achieves 1.84-1.87 bits when the prediction problem is divided into two stages: i) predict the next amino acid based on inexact polypeptide matches, and ii) predict the particular codon. Our results suggest that matches at the amino acid level play some role, but a small one, in determining the statistical structure of non-redundant coding sequences.
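The single-nucleotide baseline mentioned in the abstract above (bits per nucleotide from frequency statistics alone) can be illustrated with a small sketch. This is not the authors' model: it only computes the empirical order-0 entropy of a string, and the function name `order0_entropy` and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter


def order0_entropy(seq: str) -> float:
    """Empirical entropy in bits per symbol, using only single-symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    # Shannon entropy: -sum(p * log2(p)) over the observed symbols.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical toy inputs (not real genomic data).
uniform = "ACGT" * 25                                  # all four symbols equally frequent
skewed = "A" * 70 + "C" * 10 + "G" * 10 + "T" * 10     # strongly biased composition

print(order0_entropy(uniform))  # equally frequent symbols give the 2-bit upper bound
print(order0_entropy(skewed))   # biased composition gives fewer than 2 bits
```

A frequency-only estimate like this ignores all context, which is why models that exploit repeats and near matches can push the estimate lower.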
Automated real-time problem diagnosis is a key feature of a self-healing system. However, the rapidly growing size and complexity of modern distributed systems create a challenge for traditional centralized diagnostic approaches and call for parallel and distributed architectures. Dividing the system into subsystems controlled by separate diagnostic engines is an obvious choice; however, on top of that, a communication architecture must be provided that allows diagnostic engines to exchange information about common components in order to obtain better diagnosis. In this paper, we discuss a distributed belief propagation approach to diagnosis and provide a scalable parallel and distributed communication architecture that supports efficient message exchange among diagnostic engines.
Inductive learning methods, such as neural networks and decision trees, have become a popular approach to developing DNA sequence identification tools. Such methods attempt to form models of a collection of training data that can be used to predict future data accurately. The common approach to using such methods on DNA sequence identification problems forms models that depend on the absolute locations of nucleotides and assume independence of consecutive nucleotide locations. This paper describes a new class of learning methods, called compression-based induction (CBI), that is geared towards sequence learning problems such as those that arise when learning DNA sequences. The central idea is to use text compression techniques on DNA sequences as the means for generalizing from sample sequences. The resulting methods form models that are based on the more important relative locations of nucleotides and on the dependence of consecutive locations. They also provide a suitable framework in which biological domain knowledge can be injected into the learning process. We present initial explorations of a range of CBI methods that demonstrate the potential of our methods for DNA sequence identification tasks.
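The general idea of using a text compressor to generalize from sample sequences can be sketched as follows. This is only an illustration under stated assumptions, not the CBI methods of the paper: it swaps in the general-purpose `zlib` compressor, and the names `compressed_size` and `classify`, the class labels, and the synthetic random sequences are all hypothetical. A sample is assigned to the class whose training corpus lets the compressor encode the sample most cheaply.

```python
import random
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed ASCII text (maximum compression level)."""
    return len(zlib.compress(text.encode("ascii"), 9))


def classify(sample: str, corpora: Mapping[str, str]) -> str:
    """Return the label whose corpus yields the smallest extra cost for the sample."""
    def extra_bytes(label: str) -> int:
        corpus = corpora[label]
        # Extra bytes needed to encode the sample once the corpus has been seen.
        return compressed_size(corpus + sample) - compressed_size(corpus)

    return min(corpora, key=extra_bytes)


# Synthetic stand-ins for training data (assumption: real use would load labeled sequences).
rng = random.Random(0)
corpora = {
    "class_a": "".join(rng.choice("ACGT") for _ in range(2000)),
    "class_b": "".join(rng.choice("ACGT") for _ in range(2000)),
}

# A sample that repeats a stretch of class_a compresses far better against class_a.
sample = corpora["class_a"][500:700]
print(classify(sample, corpora))
```

Because the decision depends on repeated substrings rather than on fixed positions, a scheme like this is sensitive to the relative, not absolute, locations of nucleotides.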
Maximizing Management Performance and Quality with Service Analytics, 2015
Genome informatics. Workshop on Genome Informatics, 2000
Today, more and more DNA sequences are becoming available. Information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will keep growing, so this information must be stored or communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. Standard compression algorithms such as gzip or compress cannot compress DNA sequences, but only expand them in size. On the other hand, CTW (the Context Tree Weighting method) can compress DNA sequences to less than two bits per symbol. These algorithms do not use the special structures of biological sequences. Two characteristic structures of DNA sequences are known: one is palindromes, or reverse complements, and the other is approximate repeats. Several algorithms specific to DNA sequences that use these structures can compress them to less than two bits per symbol. In this paper, we impr...
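The reverse-complement structure named in the abstract above can be made concrete with a short sketch. The helper names `reverse_complement` and `is_reverse_palindrome` are assumptions for illustration and are not taken from the paper.

```python
# Translation table mapping each nucleotide to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every nucleotide, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


print(reverse_complement("AACG"))       # CGTT
print(is_reverse_palindrome("GAATTC"))  # True
print(is_reverse_palindrome("AACG"))    # False
```

A compressor that is aware of this structure can encode a region as a pointer to an earlier stretch whose reverse complement it matches, instead of spelling the region out symbol by symbol.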
Operations Research/Computer Science Interfaces Series, 2002
... of standard test problems indicates that GRASP is a competitive algorithm for finding approximate solutions of ... 1, 3], local search [26], simulated annealing [27], tabu search [25, 21], and genetic algorithms [9]. A ... the JSP is to find an orientation of E such that the longest path in G ...