Fabio Stella | Università degli Studi di Milano-Bicocca

Papers by Fabio Stella

Research paper thumbnail of Application of influence diagrams for well contamination risk management: a case study in the Po plain, northern Italy

Hydrogeology Journal, 2018

The aquifer of the Oltrepò Pavese plain (northern Italy) is affected by paleo-saltwater intrusions that pose a contamination risk to water wells. The report first briefly describes how the presence of saline water can be predicted using geophysical investigations (electrical resistivity tomography or electromagnetic surveys) and a machine-learning tool specifically developed for the investigated area. Then, a probabilistic graphical model for addressing the risk of well contamination is presented. The model, a so-called 'influence diagram', allows researchers to compute the conditional probability that groundwater is unsuitable for use, taking into account the results of the geophysical surveys, the predictions of the machine-learning software, the related uncertainties and the prior probability of contamination in different sectors of the plain. In addition, the model allows for calculation and comparison of the expected utility of alternative decisions (drilling or not drilling the well, or using another water source). The model is designed for use in ordinary decision situations and, although conceived for a specific area, provides an example that may be adapted to other cases. Some adaptations and generalizations of the model are also discussed.
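
As a rough illustration of the decision logic such an influence diagram encodes, the following sketch updates the probability of contamination with a survey result and compares the expected utility of drilling versus not drilling. All probabilities and utility values are hypothetical placeholders, not figures from the paper.

```python
# Minimal influence-diagram sketch for the well-contamination decision.
# All numbers below are hypothetical placeholders, not values from the paper.

def posterior_contamination(prior, p_pos_given_cont, p_pos_given_clean, survey_positive):
    """Update P(contaminated) with the outcome of a geophysical survey (Bayes' rule)."""
    if survey_positive:
        likelihood_cont, likelihood_clean = p_pos_given_cont, p_pos_given_clean
    else:
        likelihood_cont, likelihood_clean = 1 - p_pos_given_cont, 1 - p_pos_given_clean
    evidence = likelihood_cont * prior + likelihood_clean * (1 - prior)
    return likelihood_cont * prior / evidence

# Hypothetical sector prior and survey error rates.
prior = 0.30          # prior probability of saline contamination in this sector
sensitivity = 0.90    # P(survey positive | contaminated)
false_alarm = 0.15    # P(survey positive | clean)

p_cont = posterior_contamination(prior, sensitivity, false_alarm, survey_positive=True)

# Hypothetical utilities of each (action, state) pair, in arbitrary monetary units.
utility = {
    ("drill", "clean"): 100.0,     # usable well
    ("drill", "cont"): -80.0,      # wasted well plus remediation cost
    ("no_drill", "clean"): -20.0,  # cost of an alternative water source
    ("no_drill", "cont"): -20.0,
}

for action in ("drill", "no_drill"):
    eu = (1 - p_cont) * utility[(action, "clean")] + p_cont * utility[(action, "cont")]
    print(f"P(contaminated)={p_cont:.2f}  EU({action}) = {eu:.1f}")
```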

Research paper thumbnail of A class of stochastic optimization algorithms applied to some problems in Bayesian statistics

We consider the interplay between statistical procedures for recovering the distribution of random parameters from observations and stochastic programming techniques, in particular stochastic gradient (quasigradient) methods. The proposed problem formulation is based on a class of statistical models known as Bayesian networks. The reason for this choice is that Bayesian networks are powerful and general statistical models which emerged recently within the broader framework of Bayesian statistics, and which are specifically designed for cases where the vector of random parameters has considerable dimension and/or it is difficult to come up with traditional parametric models of its joint distribution. We define optimization problems on Bayesian networks. To solve these problems we develop algorithms for sensitivity analysis of such networks and present combined optimization and sampling techniques.
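
The abstract does not state the exact update rule, but a generic stochastic quasigradient iteration for minimizing a function $F$ of the network parameters has the form

$$\theta_{k+1} = \Pi_{\Theta}\left(\theta_k - \rho_k\, \xi_k\right), \qquad \mathbb{E}\left[\xi_k \mid \theta_k\right] = \nabla F(\theta_k),$$

where $\Pi_{\Theta}$ projects onto the feasible parameter set, $\xi_k$ is an unbiased gradient estimate obtained by sampling from the Bayesian network, and the step sizes satisfy $\sum_k \rho_k = \infty$ and $\sum_k \rho_k^2 < \infty$.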

Research paper thumbnail of A GAP Formulation for Solving Production Planning Problems in Telecom Industry

Operations Research Proceedings, 1995

Research paper thumbnail of Bayesian Belief Networks for Data Cleaning

Research paper thumbnail of Constant rebalanced portfolios and side-information

Quantitative Finance, 2007

In recent years there has been much work on designing and analyzing online investment strategies based on constant rebalanced portfolios. A constant rebalanced portfolio is a sequential investment strategy which keeps the wealth distribution among a set of assets fixed through time, trading period by trading period. In this framework Cover proposed the universal portfolio, which is competitive with the best constant rebalanced portfolio determined in hindsight, i.e. the constant rebalanced portfolio obtained by assuming perfect knowledge of future stock prices. However, the constant rebalanced portfolio is designed for the portfolio selection problem in the case when no additional information about the stock market is available. To overcome this limitation, Cover and Ordentlich proposed the state constant rebalanced portfolio, which can appropriately exploit the available side-information about the stock market. In this paper we study the topic introduced by Cover and Ordentlich and focus on the interplay between constant rebalanced portfolios and side-information. We introduce a mathematical framework for constant rebalanced portfolios in the case when side-information about the stock market is available. The framework defines and analyzes the mixture best constant rebalanced portfolio, which is proposed as the investment benchmark for the case when side-information is available. The mixture best constant rebalanced portfolio outperforms the best constant rebalanced portfolio by an exponential factor in terms of the achieved wealth and therefore offers an interesting opportunity for online investment algorithms specialized to side-information. We describe a new online investment algorithm which exploits the definition of the mixture best constant rebalanced portfolio and the available side-information. The performance of the proposed online investment algorithm is investigated through a set of numerical experiments on four major stock market datasets, namely DJIA, S&P500, TSE and NYSE. The results emphasize the relevance of the proposed online investment strategy and underline the central role of side-information quality in outperforming the best constant rebalanced portfolio.
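
In the standard Cover–Ordentlich notation, with $\mathbf{x}_t$ the vector of price relatives at trading period $t$ and $\Delta_m$ the portfolio simplex, the wealth achieved by a constant rebalanced portfolio and by its side-information ('state') variant reads

$$S_n(\mathbf{b}) = \prod_{t=1}^{n} \mathbf{b}^{\top}\mathbf{x}_t, \qquad S_n(\mathbf{b}_1,\ldots,\mathbf{b}_k) = \prod_{t=1}^{n} \mathbf{b}_{y_t}^{\top}\mathbf{x}_t,$$

where $y_t \in \{1,\ldots,k\}$ is the side-information state observed at period $t$, and the best constant rebalanced portfolio in hindsight is $\mathbf{b}^{*} = \arg\max_{\mathbf{b} \in \Delta_m} S_n(\mathbf{b})$.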

Research paper thumbnail of PROFILING NEURAL NETWORKS FOR OPTION PRICING

International Journal of Theoretical and Applied Finance, 2000

In recent years the problem of option pricing has received increasing interest from both financial institutions and academics. It is well known that conventional modeling techniques for option pricing have inherent, persistent and systematic biases, mainly due to the assumption of constant volatility for prices associated with the underlying financial instrument. Nowadays there is strong and increasing evidence that financial markets are far from stationary, and thus, when dealing with option pricing, we have to take into account market heteroscedasticity. A possible approach for dealing with non-constant volatility relies on modeling the basic characteristic named implied volatility. Unfortunately this task is extremely complex and parametric models are not available. In this paper the authors discuss how models from the class of Feedforward Neural Networks can be exploited for the task of implied volatility modeling. In particular, the paper shows how the main techniques from the nonlinear regression framework can be exploited when models from the class of Feedforward Neural Networks are used. Indeed, in such a case the paucity of data available for network training and the particular structure of Feedforward Neural Networks make the modeling task numerically complex. The authors discuss how the nonlinear regression technique named profile can be exploited for selecting the optimal network structure and evaluating its numerical properties. To this end, a numerical procedure for empirical model building, in the case of Feedforward Neural Networks, has been developed. Results are evaluated through an ad-hoc procedure which utilizes the estimated implied volatility surface for pricing general contingent claims. Numerical experiments on USD/DEM options are presented and discussed.
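
The paper's profile-based structure selection is not reproduced here, but the underlying regression setting can be sketched as a small feedforward network fitted to (moneyness, maturity) to implied-volatility points. The synthetic smile and all parameters below are assumptions for illustration.

```python
# Sketch: fit a small feedforward network to synthetic implied-volatility points.
# This illustrates the regression setting only; the paper's profile-based
# structure-selection procedure is not reproduced here.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
moneyness = rng.uniform(0.8, 1.2, 300)
maturity = rng.uniform(0.1, 1.0, 300)
# Hypothetical smile: higher vol away from the money, mild term-structure effect.
iv = 0.15 + 0.3 * (moneyness - 1.0) ** 2 + 0.02 * np.sqrt(maturity)
iv += rng.normal(0.0, 0.005, size=iv.shape)  # observation noise

X = np.column_stack([moneyness, maturity])
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, iv)
print("in-sample R^2:", net.score(X, iv))
```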

Research paper thumbnail of Nonlinear Regression and Neural Networks

International Journal of Mathematical Algorithms, 2000

In this paper the authors are concerned with the problem of empirical model building when models from the class of Feedforward Neural Networks are considered. In this case, empirical model building consists of the following four tasks: network's structure selection, ...

Research paper thumbnail of A Bayesian approach for constructing implied volatility surfaces through neural networks

The Journal of Computational Finance, 2000

In this paper the authors present a new option pricing scheme which deals with non-constant volatility for the price of the underlying asset. The main feature of the proposed pricing scheme consists of exploiting recent developments in Bayesian learning within the ...

Research paper thumbnail of A continuous time Bayesian network model for cardiogenic heart failure

Flexible Services and Manufacturing Journal, 2012

Continuous time Bayesian networks are used to diagnose cardiogenic heart failure and to anticipate its likely evolution. The proposed model overcomes the strong modeling and computational limitations of dynamic Bayesian networks. It consists of both unobservable physiological variables, and clinically and instrumentally observable events which might support diagnosis, such as myocardial infarction and the future occurrence of shock. Three case studies related to cardiogenic heart failure are presented. The model predicts the occurrence of complicating diseases and the persistence of heart failure according to variations of the evidence gathered from the patient. Predictions are shown to be consistent with current pathophysiological medical understanding of clinical pictures.
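
At the core of a continuous time Bayesian network, each variable evolves according to a conditional intensity matrix (conditioned, in the full model, on its parents), and the matrix exponential gives state-transition probabilities over a time interval. A minimal sketch with a hypothetical three-state severity variable follows; the states and rates are not the paper's.

```python
# Sketch of the core CTBN computation: a conditional intensity matrix (CIM)
# governs a variable's continuous-time dynamics, and expm(Q * t) gives the
# state-transition probabilities over an interval of length t.
# The 3-state variable and rates below are hypothetical, not the paper's model.
import numpy as np
from scipy.linalg import expm

# States of a hypothetical "heart failure severity" node: none, compensated, shock.
Q = np.array([
    [-0.20, 0.18, 0.02],   # rates of leaving "none" (rows sum to zero)
    [0.05, -0.30, 0.25],   # rates of leaving "compensated"
    [0.00, 0.10, -0.10],   # rates of leaving "shock"
])

t = 12.0         # hours
P = expm(Q * t)  # P[i, j] = P(state j at time t | state i at time 0)
print(np.round(P, 3))
```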

Research paper thumbnail of Classification of dendritic cell phenotypes from gene expression data

BMC Immunology, 2011

Background: The selection of relevant genes for sample classification is a common task in many gene expression studies. Although a number of tools have been developed to identify optimal gene expression signatures, they often generate gene lists that are too long to be exploited clinically. Consequently, researchers in the field try to identify the smallest set of genes that provide good sample classification. We investigated the genome-wide expression of the inflammatory phenotype in dendritic cells. Dendritic cells are a complex group of cells that play a critical role in vertebrate immunity. Therefore, the prediction of the inflammatory phenotype in these cells may help with the selection of immune-modulating compounds. Results: A data mining protocol was applied to microarray data for murine cell lines treated with various inflammatory stimuli. The learning and validation data sets consisted of 155 and 49 samples, respectively. The data mining protocol reduced the number of probe sets from 5,802 to 10, then from 10 to 6 and finally from 6 to 3. The performances of a set of supervised classification models were compared. The best accuracy, when using the six following genes (Il12b, Cd40, Socs3, Irgm1, Plin2 and Lgals3bp), was obtained by Tree Augmented Naïve Bayes and Nearest Neighbour (91.8%). Using the smallest set of three genes (Il12b, Cd40 and Socs3), the performance remained satisfactory and the best accuracy was with Support Vector Machine (95.9%). These data mining models, using data for the genes Il12b, Cd40 and Socs3, were validated with a human data set consisting of 27 samples. Support Vector Machines (71.4%) and Nearest Neighbour (92.6%) gave the worst performances, but the remaining models correctly classified all 27 samples.
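
The pipeline shape described above, aggressive feature selection followed by a comparison of supervised classifiers, can be sketched with scikit-learn; the ANOVA filter, the synthetic data and the value k=3 are illustrative stand-ins for the paper's protocol.

```python
# Sketch of the abstract's pipeline shape: shrink a large probe-set matrix to a
# handful of features, then compare supervised classifiers. Feature counts and
# the ANOVA filter are illustrative stand-ins for the paper's protocol.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the 155-sample learning set with many probe sets.
X, y = make_classification(n_samples=155, n_features=5802, n_informative=10,
                           random_state=0)

for name, clf in [("SVM", SVC()), ("kNN", KNeighborsClassifier()),
                  ("NaiveBayes", GaussianNB())]:
    model = make_pipeline(SelectKBest(f_classif, k=3), clf)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Wrapping the selection step and the classifier in one pipeline inside cross-validation keeps the gene selection from leaking information about the held-out fold.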

Research paper thumbnail of Determining factors in ICT adoption by MSME's in agriculture clusters: An exploratory case study

IEEE 7th International Conference on Research Challenges in Information Science (RCIS), 2013

In this paper we consider the case of ICT adoption and use in an agriculture cluster in Lombardy, a northern region of Italy. To date, the relationships among key factors of ICT adoption and use in agriculture have received little attention in the academic literature. Thus, in this paper we aim to identify a research model in order to address four research questions concerning the determining factors for ICT adoption. The proposed case study reports and discusses the results obtained by analysing data from a survey of about 600 agricultural farms. Finally, Bayesian Belief Networks (BBNs) are used to analyse the complex influence relationships detected between the research variables.

Research paper thumbnail of Continuous Time Bayesian Networks for Gene Network Reconstruction: A Comparative Study on Time Course Data

Lecture Notes in Computer Science, 2014

Dynamic aspects of regulatory networks are typically investigated by measuring relevant variables at multiple points in time. Current state-of-the-art approaches for gene network reconstruction directly build on such data, making the strong assumption that the system evolves in a synchronous fashion and in discrete time. However, omics data generated with increasing time-course granularity make it possible to model gene networks as systems whose state evolves in continuous time, thus improving the model's expressiveness. In this work continuous time Bayesian networks are proposed as a new approach for regulatory network reconstruction from time-course expression data. Their performance is compared to that of two state-of-the-art methods: dynamic Bayesian networks and Granger causality. The comparison is accomplished using both simulated and experimental data. Continuous time Bayesian networks achieve the highest F-measure on both datasets. Furthermore, precision, recall and F-measure degrade in a smoother way than those of dynamic Bayesian networks and Granger causality when the complexity of the gene regulatory network increases.
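
The comparison metrics above are computed by scoring a learnt edge set against the gold-standard network; a minimal sketch with hypothetical edge lists:

```python
# Sketch of the evaluation used to compare reconstruction methods: score a
# predicted gene-network edge set against a gold standard with precision,
# recall and F-measure. The edge lists are hypothetical.
def edge_scores(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall > 0 else 0.0)
    return precision, recall, f

gold = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
predicted = [("geneA", "geneB"), ("geneB", "geneC"), ("geneA", "geneD")]
print("precision=%.2f recall=%.2f F=%.2f" % edge_scores(predicted, gold))
```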

Research paper thumbnail of Gene network inference using continuous time Bayesian networks: a comparative study and application to Th17 cell differentiation

BMC Bioinformatics, 2014

Background: Dynamic aspects of gene regulatory networks are typically investigated by measuring system variables at multiple time points. Current state-of-the-art computational approaches for reconstructing gene networks directly build on such data, making the strong assumption that the system evolves in a synchronous fashion at fixed points in time. However, nowadays omics data are being generated with increasing time course granularity. Thus, modellers now have the possibility to represent the system as evolving in continuous time and to improve the models' expressiveness. Results: Continuous time Bayesian networks are proposed as a new approach for gene network reconstruction from time course expression data. Their performance was compared to two state-of-the-art methods: dynamic Bayesian networks and Granger causality analysis. On simulated data, the methods comparison was carried out for networks of increasing size, for measurements taken at different time granularity densities and for measurements unevenly spaced over time. Continuous time Bayesian networks outperformed the other methods in terms of the accuracy of regulatory interactions learnt from data for all network sizes. Furthermore, their performance degraded smoothly as the size of the network increased. Continuous time Bayesian networks were significantly better than dynamic Bayesian networks for all time granularities tested and better than Granger causality for dense time series. Both continuous time Bayesian networks and Granger causality performed robustly for unevenly spaced time series, with no significant loss of performance compared to the evenly spaced case, while the same did not hold true for dynamic Bayesian networks. The comparison included the IRMA experimental datasets, which confirmed the effectiveness of the proposed method. Continuous time Bayesian networks were then applied to elucidate the regulatory mechanisms controlling murine T helper 17 (Th17) cell differentiation and were found to be effective in discovering well-known regulatory mechanisms, as well as new plausible biological insights. Conclusions: Continuous time Bayesian networks were effective on networks of both small and large size and were particularly feasible when the measurements were not evenly distributed over time. Reconstruction of the murine Th17 cell differentiation network using continuous time Bayesian networks revealed several autocrine loops, suggesting that Th17 cells may be auto-regulating their own differentiation process.

Research paper thumbnail of Analyzing user reviews in tourism with topic models

Information Technology & Tourism, 2015

Research paper thumbnail of An Integrated Forecasting and Regularization Framework for Light Rail Transit Systems

J Intell Transport Syst, 2006

In recent years, with half the world's population living in towns and cities and most of them relying heavily on public transport to meet their mobility needs, efficient and effective public transport operations have become critical to sustainable economic and social development. Nowadays, Light Rail Transit Systems are considered the most promising technological approach to satisfying these needs, i.e. ensuring efficient and reliable urban mobility. However, Light Rail Transit Systems are subject to frequent minor service disruptions, often caused by stochastic variations of passenger demand at stations and of traffic conditions on the service routes, which increase passenger waiting times and discourage passengers from using the transit system. Although these minor disruptions usually last no longer than a few minutes, they can significantly degrade the level of service on a short-headway service. In this paper the authors propose a real-time disruption control model for Light Rail Transit Systems based on an integrated quantitative forecasting and regularization approach. The forecasting component relies on Artificial Neural Networks, a non-parametric computational model that has proved particularly efficient for forecasting tasks in several application domains. The regularization engine involves the formulation of a constrained mathematical programming problem which can be solved quickly and is therefore well suited for real-time disruption control. The conceptual model is applied to a case study concerning transit line number 7 operating in the urban area of Milan. To validate the proposed forecasting and regularization framework, an experimental plan has been designed and performed under different traffic and passenger demand fluctuation conditions. The results of the simulation study demonstrate the efficacy of the overall approach in forecasting and regularizing the considered Light Rail Transit System.
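
The regularization step can be imagined as a small constrained program: choose bounded holding times so that the forecast headways become as even as possible. The headway model and all numbers below are assumptions for illustration, not the paper's formulation.

```python
# Sketch of the regularization step as a small constrained program: choose
# holding times for the next vehicles so the headways they produce are as
# even as possible. The headway model and numbers are illustrative only.
import numpy as np
from scipy.optimize import minimize

target = 5.0                                # scheduled headway (minutes)
headways = np.array([3.2, 7.1, 4.5, 6.0])   # forecast headways without control

def cost(hold):
    # Holding vehicle i by hold[i] lengthens its own headway and, since the
    # preceding vehicle may also be held, headway i changes by hold[i] - hold[i-1].
    adjusted = headways + hold - np.concatenate(([0.0], hold[:-1]))
    return np.sum((adjusted - target) ** 2)

res = minimize(cost, x0=np.zeros(4), bounds=[(0.0, 2.0)] * 4)
print("holding times (min):", np.round(res.x, 2))
```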

Research paper thumbnail of Conditional Log-Likelihood for Continuous Time Bayesian Network Classifiers

Lecture Notes in Artificial Intelligence, Sep 23, 2013

Continuous time Bayesian network classifiers are designed for analyzing multivariate streaming data when the time duration of events matters. New continuous time Bayesian network classifiers are introduced, together with their conditional log-likelihood scoring function. A learning algorithm, combining conditional log-likelihood with Bayesian parameter estimation, is developed. Classification accuracy values achieved on synthetic and real data by continuous time and dynamic Bayesian network classifiers are compared. Numerical experiments show that the proposed approach outperforms dynamic Bayesian network classifiers and continuous time Bayesian network classifiers learned with log-likelihood.
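
For reference, the conditional log-likelihood score has its usual discriminative form

$$\mathrm{CLL}(\mathcal{M} \mid \mathcal{D}) = \sum_{i=1}^{N} \log P_{\mathcal{M}}(c_i \mid \mathbf{x}_i) = \sum_{i=1}^{N} \left[ \log P_{\mathcal{M}}(c_i, \mathbf{x}_i) - \log \sum_{c'} P_{\mathcal{M}}(c', \mathbf{x}_i) \right],$$

where, for continuous time Bayesian network classifiers, $\mathbf{x}_i$ is the $i$-th observed trajectory and $c_i$ its class label.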

Research paper thumbnail of Nonstationary Optimization Approach for Finding Universal Portfolios

This paper extends to continuous time the concept of universal portfolio introduced by Cover (1991). Being a performance-weighted average of constant rebalanced portfolios, the universal portfolio outperforms constant rebalanced and buy-and-hold portfolios exponentially over the long run. An asymptotic formula summarizing its long-term performance is reported that supplements the one given by Cover. A criterion in terms of long-term averages of instantaneous stock drifts and covariances is found which determines the particular form of the asymptotic growth. A formula for the expected universal wealth is given.
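
Cover's universal portfolio, referred to above, is the performance-weighted average of constant rebalanced portfolios over the simplex $\Delta_m$:

$$\hat{\mathbf{b}}_{t+1} = \frac{\int_{\Delta_m} \mathbf{b}\, S_t(\mathbf{b})\, d\mathbf{b}}{\int_{\Delta_m} S_t(\mathbf{b})\, d\mathbf{b}}, \qquad S_t(\mathbf{b}) = \prod_{s=1}^{t} \mathbf{b}^{\top}\mathbf{x}_s,$$

with $\mathbf{x}_s$ the vector of price relatives at period $s$.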

Research paper thumbnail of Numerical techniques for solving estimation problems on robust Bayesian networks

Institute of Mathematical Statistics Lecture Notes - Monograph Series, 1996

AMS 1991 Subject Classifications. Primary: 30E05, secondary: 65K10.

Research paper thumbnail of A Software System for Topic Extraction and Document Classification

2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, 2009

A software system for topic extraction and automatic document classification is presented. Given a set of documents, the system automatically extracts the mentioned topics and assists the user in selecting their optimal number. The user-validated topics are exploited to build a model for multi-label document classification. While topic extraction is performed by using an optimized implementation of the Latent Dirichlet Allocation model, multi-label document classification is performed by using a specialized version of the Multi-Net Naive Bayes model. The performance of the system is investigated by using 10,056 documents retrieved from the web through a set of queries formed by exploiting the Italian Google Directory. This dataset is used for topic extraction, while an independent dataset, consisting of 1,012 elements labeled by humans, is used to evaluate the performance of the Multi-Net Naive Bayes model. The results are satisfactory, with precision being consistently better than recall for the labels associated with the four most frequent topics.
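
The two-stage shape described above (unsupervised topic extraction, then classification on top of the topics) can be sketched with scikit-learn's LDA as a stand-in for the paper's optimized implementation; the tiny corpus is obviously hypothetical.

```python
# Sketch of the two-stage pipeline: extract topics with LDA, then use the
# per-document topic mixtures as features for a downstream classifier.
# sklearn's LDA is a stand-in for the paper's optimized implementation.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the striker scored a goal in the final match",
    "the midfielder passed and the team won the league",
    "the bank raised interest rates on loans",
    "markets fell as the central bank tightened policy",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Per-document topic mixtures: the features a multi-label classifier such as
# the paper's Multi-Net Naive Bayes would consume.
print(lda.transform(counts).round(2))
```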

Research paper thumbnail of Probabilistic Topic Discovery and Automatic Document Tagging

Perspectives and Applications, 2012
