Andrea Romei - Academia.edu

Papers by Andrea Romei

Discrimination Data Analysis: A Multi-disciplinary Bibliography

Studies in Applied Philosophy, Epistemology and Rational Ethics, 2013

Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multidisciplinary annotated bibliography of the literature on discrimination data analysis, with the aim of providing a common basis for researchers from a multidisciplinary perspective. We cover legal, sociological, economic and computer science references.

A Case Study in Sequential Pattern Mining for IT-Operational Risk

Springer eBooks, Aug 13, 2008

IT-operational risk management consists of identifying, assessing, monitoring and mitigating the adverse risks of loss resulting from hardware and software system failures. We present a case study in IT-operational risk measurement in the context of a network of Private Branch eXchanges (PBXs). The approach relies on preprocessing and data mining tasks for the extraction of sequential patterns and their exploitation in the definition of a measure called expected risk.

SURVEY PAPER Inductive database languages: requirements

Inductive databases (IDBs) represent a database perspective on knowledge discovery in databases (KDD). In an IDB, the KDD application can express both queries capable of accessing and manipulating data, and queries capable of generating, manipulating, and applying patterns, which makes it possible to formalize the notion of a mining process. The feature that makes them different from other data mining applications is exactly the idea of looking at the support for knowledge discovery as an extension of the query process. This paper draws up a list of desirable properties to be taken into account in the definition of an IDB framework. They involve several dimensions, such as the expressiveness of the language in representing data and models, the closure principle, and the capability to support efficient algorithm programming. These requirements are a basis for a comparative study that highlights strengths and weaknesses of existing IDB approaches. The paper focuses on the SQL-based ATLaS lang...

The layered structure of company share networks

2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015

We present a framework for the analysis of corporate governance problems using network science and graph algorithms on ownership networks. In such networks, nodes model companies/shareholders and edges model shares owned. Inspired by the widespread pyramidal organization of corporate groups of companies, we model ownership networks as layered graphs, and exploit the layered structure to design feasible and efficient solutions to three key problems of corporate governance. The first one is the long-standing problem of computing direct and indirect ownership (integrated ownership problem). The other two problems are introduced here: computing direct and indirect dividends (dividend problem), and computing the group of companies controlled by a parent shareholder (corporate group problem). We conduct an extensive empirical analysis of the Italian ownership network, which, with its 3.9M nodes, is 30× the largest network studied so far.

Language Support to XML Data Mining: A Case Study

Communications in Computer and Information Science, 2011

There are several reasons that justify the study of a powerful, expressive and efficient XML-based framework for intelligent data analysis. First of all, the proliferation of XML sources offers good opportunities to mine new data. Second, native XML databases appear to be a natural alternative to relational databases when the purpose is querying both data and the extracted models in

XML data mining

Software: Practice and Experience, 2009

With the spreading of XML sources, mining XML data can be an important objective in the near future. This paper presents a project focussed on designing a general-purpose query language in support of mining XML data. In our framework, raw data, mining models and domain knowledge are represented by way of XML documents and stored inside native XML databases. Data mining (DM) tasks are expressed in an extension of XQuery. Special attention is given to the frequent pattern discovery problem, and a way of exploiting domain-dependent optimizations and efficient data structures as deep as possible in the extraction process is presented. We report the results of a first set of experiments, showing that a good trade-off between expressiveness and efficiency in XML DM is not a chimera.

A multidisciplinary survey on discrimination analysis

The Knowledge Engineering Review, 2013

The collection and analysis of observational and experimental data represent the main tools for assessing the presence, the extent, the nature, and the trend of discrimination phenomena. Data analysis techniques have been proposed over the last 50 years in the economic, legal, and statistical literature and, recently, in the data mining literature. This is not surprising, since discrimination analysis is a multidisciplinary problem, involving sociological causes, legal argumentations, economic models, statistical techniques, and computational issues. The objective of this survey is to provide guidance and a glue for researchers and anti-discrimination data analysts on concepts, problems, application areas, datasets, methods, and approaches from a multidisciplinary perspective. We organize the approaches according to their method of data collection as observational, quasi-experimental, and experimental studies. A fourth line of recently blooming research on knowledge discovery based methods is als...

Inductive database languages: requirements and examples

Knowledge and Information Systems, 2010

Inductive databases (IDBs) represent a database perspective on knowledge discovery in databases (KDD). In an IDB, the KDD application can express both queries capable of accessing and manipulating data, and queries capable of generating, manipulating, and applying patterns, which makes it possible to formalize the notion of a mining process. The feature that makes them different from other data mining applications is exactly the

Discrimination discovery in scientific project evaluation: A case study

Expert Systems with Applications, 2013

Discovering contexts of unfair decisions in a dataset of historical decision records is a non-trivial problem. It requires the design of ad-hoc methods and techniques of analysis, which have to comply with existing laws and with legal argumentations. While some data mining techniques have been adapted to the purpose, the state of the art of research still needs methodological refinements, the consolidation of a Knowledge Discovery in Databases (KDD) process, and, most of all, experimentation with real data. This paper contributes by presenting a case study on gender discrimination in a dataset of scientific research proposals, and by distilling from the case study a general discrimination discovery process. Gender bias in scientific research is a challenging problem that has been tackled in the social sciences literature by means of statistical regression. However, this approach is limited to testing a hypothesis of discrimination over the whole dataset under analysis. Our methodology couples data mining, for unveiling previously unknown contexts of possible discrimination, with statistical regression, for testing the significance of such contexts, thus obtaining the best of both worlds.

KDDML: A middleware language and system for knowledge discovery in databases

Data & Knowledge Engineering, 2006

KDDML (KDD Markup Language) is a middleware language and system designed to support the development of final applications or higher level systems which deploy a mixture of data access, data preprocessing, extraction and deployment of data mining models. A KDDML language query is an XML document in which XML tags correspond to operations on data/models, XML attributes correspond to parameters of those operations and XML sub-elements define arguments passed to the operators. The core of the KDDML system is a KDDML language interpreter with modularity and extensibility requirements as the main goals.

KDDML‐G: a grid‐enabled knowledge discovery system

Concurrency and Computation: Practice and Experience, 2007

KDDML‐G is a middleware language and system for knowledge discovery on the grid. The challenge that motivated the development of a grid‐enabled version of the ‘standalone’ KDDML (Knowledge Discovery in Databases Markup Language) environment was, on the one hand, to exploit the parallelism offered by the grid environment and, on the other, to overcome the problem of data immovability, a quite frequent restriction on real‐world data collections that has principally a privacy‐preserving purpose. The latter problem is addressed by moving the code and ‘mining’ the data ‘in place’, that is, by adapting the computation to the availability and localization of the data. Copyright © 2007 John Wiley & Sons, Ltd.

Pushing Constraints in Association Rule Mining: An Ontology-Based Approach

This paper proposes an integrated framework for the extraction of constraint-based multi-level association rules with the aid of an ontology. The latter, which represents an enriched taxonomy, is used to describe the application domain by means of data properties. Defining or updating these properties is a simple task and does not imply changing the item hierarchy, or the implementation level

Distributed knowledge discovery with the parallel KDDML system

A Case Study in Sequential Pattern Mining for IT-Operational Risk

Lecture Notes in Computer Science

IT-operational risk management consists of identifying, assessing, monitoring and mitigating the adverse risks of loss resulting from hardware and software system failures. We present a case study in IT-operational risk measurement in the context of a network of Private Branch eXchanges (PBXs). The approach relies on preprocessing and data mining tasks for the extraction of sequential patterns and their exploitation in the definition of a measure called expected risk.

Discovering Gender Discrimination in Project Funding

2012 IEEE 12th International Conference on Data Mining Workshops, 2012

The selection of projects for funding can hide discriminatory decisions. We present a case study investigating gender discrimination in a dataset of scientific research proposals submitted to an Italian national call. The method for the analysis relies on a data mining classification strategy that is inspired by a legal methodology for proving evidence of social discrimination against protected-by-law groups.

Survey on using constraints in data mining

Data Mining and Knowledge Discovery, 2016

This paper provides an overview of the current state-of-the-art on using constraints in knowledge discovery and data mining. The use of constraints in a data mining task requires specific definition and satisfaction tools during knowledge extraction. This survey proposes three groups of studies based on classification, clustering and pattern mining, whether the constraints are on the data, the models or the measures, respectively. We consider the distinctions between hard and soft constraint satisfaction, and between the knowledge extraction phases where constraints are considered. In addition to discussing how constraints can be used in data mining, we show how constraint-based languages can be used throughout the data mining process.

Preprocessing and Mining Web Log Data for Web Personalization

AI*IA 2003: Advances in Artificial Intelligence, 2003

We describe the web usage mining activities of an ongoing project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is deployed for the purpose of offering a personalized and proactive view of the web services to users. We first describe the preprocessing steps on access logs necessary to clean, select and prepare data for knowledge extraction. Then we show two sets of experiments: the first one tries to predict the sex of a user based on the visited web pages, and the second one tries to predict whether a user might be interested in visiting a section of the site.

XQuake as a Constraint-Based Mining Language

XQuake is a language for data mining inspired by inductive database theory. This work extends XQuake with the definition of domain-specific constraints. An ontology is used to describe the domain knowledge. We give the main idea of this work in progress, discussing its possibilities and advantages.

Extending KDDML with a Visual Metaphor for the KDD Process

Visual Information Systems. Web-Based Visual Information Search and Management

The spreading application of data mining techniques is clearly represented by the large number of suites supporting the knowledge discovery process. The latter can be viewed as real visual programming environments. Based on this assumption, we define some requirements which a typical data mining high-level graphical user interface should satisfy, in order to guarantee a good level of interactivity and expressiveness. The aim of this study is to use these requirements during the engineering and development of visual knowledge flow abstraction for the existing KDDML (Knowledge Discovery in Databases Markup Language) system. We introduce some features not only directly related to the visual metaphor, but also to the whole system, here intended as a real visual programming environment for the knowledge discovery process.
