A Software Tool to Transform Relational Databases in order to Mine Functional Dependencies in it Using Formal Concept Analysis
Related papers
A software tool for data analysis based on formal concept analysis
Formal Concept Analysis is a useful tool to represent logical implications in datasets and to analyze the underlying knowledge that lies behind large amounts of data. A database relation can be seen as a many-valued context [3]. J. Hereth in [4] introduces the formal context of functional dependencies. In this context, implications hold for functional dependencies. We develop a software application that analyzes an existing relational data table and detects functional dependencies in it. The user can choose to analyze a table from a MS SQL Server, Oracle or MySQL database and the software will build the formal context of functional dependencies. We use Conexp [6] to build the concept lattice and implications in this context. These implications will be the functional dependencies for the analyzed table. Having the functional dependencies, we can detect candidate keys and we can decide if the table is in 2NF or 3NF or BCNF. To our knowledge, this method has not been implemented yet.
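As a rough illustration of the idea in this abstract, the sketch below assumes the usual construction of the formal context of functional dependencies: one object per pair of rows, one attribute per column, and a pair "has" a column when both rows agree on it. The function names `fd_context` and `implication_holds` and the sample table are hypothetical and are not taken from the tool described above.

```python
from itertools import combinations

def fd_context(rows, columns):
    """Build a context with one object per unordered pair of rows.

    A pair (i, j) has column c as an attribute when rows i and j
    carry the same value in c.
    """
    context = {}
    for (i, r), (j, s) in combinations(enumerate(rows), 2):
        context[(i, j)] = frozenset(c for c in columns if r[c] == s[c])
    return context

def implication_holds(context, premise, conclusion):
    """Check premise -> conclusion in the context.

    Every pair of rows that agrees on all premise columns must also
    agree on all conclusion columns.
    """
    premise, conclusion = frozenset(premise), frozenset(conclusion)
    return all(conclusion <= attrs
               for attrs in context.values()
               if premise <= attrs)

# Hypothetical sample table, used only for illustration.
rows = [
    {"id": 1, "city": "Cluj", "zip": "400"},
    {"id": 2, "city": "Cluj", "zip": "400"},
    {"id": 3, "city": "Arad", "zip": "310"},
]
columns = ["id", "city", "zip"]
ctx = fd_context(rows, columns)

city_determines_zip = implication_holds(ctx, {"city"}, {"zip"})
zip_determines_id = implication_holds(ctx, {"zip"}, {"id"})
```

In this sample, `city -> zip` holds while `zip -> id` does not, because rows 1 and 2 share a zip code but have different ids. Note that the context has one object per pair of rows, so its size grows quadratically with the number of rows.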
The theory and practice of coupling formal concept analysis to relational databases
In Formal Concept Analysis, a many-valued context is a collection of objects described by attributes that take on more than binary values, such as age (as integers or ranges of integer values) or color (a list or even a hierarchy of color combinations). Conceptual scaling is the process by which such a many-valued context is transformed into a formal context, by associating a concept lattice with the many-valued context. A many-valued context can be compared to a single table in a relational database populated with multiple rows and non-binary values. A generalization of conceptual scaling to a relational database as a whole should take into account the relations between objects, as expressed by means of foreign keys. Previous approaches to scaling a relational database (e.g. relational scaling) take such relations into account, but either do not maintain a separation between objects and values, which is characteristic for the unary case, or result in unary contexts only. In the approach presented in this paper, the use of n-ary scales is suggested, whereby a relational database is transformed into a family of n-ary contexts (a so-called power context family). This paper describes the fundamentals of a Web application that allows connection to a relational database, its scaling interactively into a power context family, and navigation within that context family.
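The simplest form of conceptual scaling can be sketched as follows. This assumes plain nominal scaling, where every (attribute, value) pair becomes one binary attribute; it does not cover the n-ary scales or power context families proposed in the paper. The function name `nominal_scale` and the sample data are hypothetical.

```python
def nominal_scale(many_valued):
    """Turn {object: {attribute: value}} into {object: set of binary attributes}.

    Each binary attribute is the string "attribute=value".
    """
    return {
        obj: {f"{attr}={value}" for attr, value in description.items()}
        for obj, description in many_valued.items()
    }

# Hypothetical many-valued context, comparable to a two-row table.
table = {
    "apple": {"color": "red", "size": "small"},
    "melon": {"color": "green", "size": "large"},
}
binary_context = nominal_scale(table)
```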
A method for mining functional dependencies in relational database design using FCA
Formal Concept Analysis (FCA) is a useful tool to explore the conceptual knowledge contained in a database by analyzing the formal conceptual structure of the data. In this paper, we present a new method to optimize and extend previous research on FCA and databases, by analyzing the functional dependencies in order to correctly build database schemata. Our method intends to mine functional dependencies in a relational database table. The novelty of our method is that it builds inverted index files in order to optimize the construction of the formal context of functional dependencies.
Characterizing Functional Dependencies in Formal Concept Analysis with Pattern Structures
Annals of Mathematics and Artificial Intelligence, 2014
Computing functional dependencies from a relation is an important database topic, with many applications in database management, reverse engineering and query optimization. Whereas it has been deeply investigated in those fields, strong links exist with the mathematical framework of Formal Concept Analysis. Considering the discovery of functional dependencies, it is indeed known that a relation can be expressed as the binary relation of a formal context, whose implications are equivalent to those dependencies. However, this leads to a new data representation that is quadratic in the number of objects w.r.t. the original data. Here, we present an alternative avoiding such a data representation and show how to characterize functional dependencies using the formalism of pattern structures, an extension of classical FCA to handle complex data. We also show how another class of dependencies can be characterized with that framework, namely, degenerated multivalued dependencies. Finally, we discuss and compare the performances of our new approach in a series of experiments on classical benchmark datasets.
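One way to check a functional dependency without materializing all pairs of rows is to compare partitions of the row set. The sketch below is a partition-based check in the spirit of this abstract, not the paper's algorithm: it assumes that X -> Y holds exactly when grouping rows by X gives the same partition as grouping them by X together with Y. The names `partition` and `fd_holds` and the sample rows are hypothetical.

```python
def partition(rows, attrs):
    """Group row indices by their values on attrs; return the set of groups."""
    groups = {}
    for index, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, set()).add(index)
    return {frozenset(group) for group in groups.values()}

def fd_holds(rows, lhs, rhs):
    """Return True when lhs -> rhs holds in rows.

    Grouping by lhs + rhs always refines grouping by lhs, so the two
    partitions are equal exactly when lhs determines rhs.
    """
    lhs = list(lhs)
    both = lhs + [a for a in rhs if a not in lhs]
    return partition(rows, lhs) == partition(rows, both)

# Hypothetical sample rows, used only for illustration.
rows = [
    {"id": 1, "city": "Cluj", "zip": "400"},
    {"id": 2, "city": "Cluj", "zip": "400"},
    {"id": 3, "city": "Arad", "zip": "310"},
]
city_to_zip = fd_holds(rows, ["city"], ["zip"])
zip_to_id = fd_holds(rows, ["zip"], ["id"])
```

Each partition is built in a single pass over the rows, so the check avoids the quadratic pair-of-rows representation.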
A Proposal for Combining Formal Concept Analysis and Description Logics for Mining Relational Data
Lecture Notes in Computer Science, 2007
Recent advances in data and knowledge engineering have emphasized the need for formal concept analysis (FCA) tools taking into account structured data. There are a few adaptations of the classical FCA methodology for handling contexts holding on complex data formats, e.g. graph-based or relational data. In this paper, relational concept analysis (RCA) is proposed, as an adaptation of FCA for analyzing objects described both by binary and relational attributes. The RCA process takes as input a collection of contexts and of inter-context relations, and yields a set of lattices, one per context, whose concepts are linked by relations. Moreover, a way of representing the concepts and relations extracted with RCA is proposed in the framework of a description logic. The RCA process has been implemented within the Galicia platform, offering new and efficient tools for knowledge and software engineering.
Lecture Notes in Computer Science, 2005
The fixpoints of Galois Connections form patterns in binary relational data, such as object-attribute relations, that are important in a number of data analysis fields, including Formal Concept Analysis (FCA), Boolean factor analysis and frequent itemset mining. However, the large number of such fixpoints present in a typical dataset requires efficient computation to make analysis tractable, particularly since any particular fixpoint may be computed many times. Because they can be computed in a canonical order, testing the canonicity of fixpoints to avoid duplicates has proven to be a key factor in the design of efficient algorithms. The most efficient of these algorithms have been variants of the Close-By-One (CbO) algorithm. In this article, the algorithms CbO, FCbO, In-Close, In-Close2 and a new variant, In-Close3, are presented together for the first time, with In-Close2 and In-Close3 being the results of breeding In-Close with FCbO. To allow them to be easily compared, the algorithms are presented in the same style and notation. The important advances in CbO are described and compared graphically using a simple example. For the first time, the algorithms are implemented using the same structures and techniques to provide a level playing field for evaluation. Their performance is tested and compared using a range of data sets and the most important features identified for a CbO 'Best-of-Breed'. This article also presents, for the first time, the 'partial-closure' canonicity test.
Association Mining and Formal Concept Analysis
2000
In this paper, we develop a connection between association queries and formal concept analysis. An association query discovers dependencies among values of an attribute grouped by other, non-primary attributes in a given relation. Formal concept analysis deals with formal mathematical tools and techniques to develop and analyze relationships between concepts and to develop concept structures. We show that dependencies found by an association query can be derived from a concept structure.
Formal Concept Analysis – Overview and Applications
Procedia Engineering, 2014
In this article we give a brief overview of the theory behind the formal concept analysis, a novel method for data representation and analysis. From given tabular input data this method finds all formal concepts and computes a concept lattice, a directed, acyclic graph, in which all formal concepts are hierarchically ordered. We describe the link between this method and formal logic, as well as graph theory. Finally we present one example of an application of this method in the field of computer aided learning.
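To make "finds all formal concepts" concrete, here is a deliberately naive sketch that closes every subset of attributes and collects the resulting (extent, intent) pairs. It is exponential in the number of attributes and is meant only for tiny examples; it is not the method used in the article. The function name `formal_concepts` and the sample context are hypothetical.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts of {object: set of attributes}.

    Naive approach: compute the extent of every attribute subset, then
    the intent of that extent. Each distinct pair is a formal concept.
    """
    objects = set(context)
    attributes = set().union(*context.values())

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(g for g in objects if attrs <= context[g])

    def intent(objs):
        # Attributes shared by every object in objs.
        return frozenset(m for m in attributes
                         if all(m in context[g] for g in objs))

    found = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            objs = extent(frozenset(subset))
            found.add((objs, intent(objs)))
    return found

# Hypothetical tabular input, used only for illustration.
animals = {
    "frog": {"water", "land"},
    "fish": {"water"},
    "dog": {"land"},
}
concepts = formal_concepts(animals)
```

For this sample the result contains four concepts: all animals with no shared attribute, the water animals, the land animals, and the frog with both attributes. Ordering these by inclusion of extents gives the concept lattice.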
Analysis of Large Data Sets using Formal Concept Lattices
Formal Concept Analysis (FCA) is an emerging data technology that has applications in the visual analysis of large-scale data. However, data sets are often too large (or contain too many Formal Concepts) for the resulting Concept Lattice to be readable. This paper complements existing work in this area by describing two methods by which useful and manageable Lattices can be derived from large data sets. This is achieved through the use of a set of freely available FCA tools: the Context creator FcaBedrock and the Concept miner In-Close, that were developed by the authors, and the Lattice builder ConExp. In the first method, a sub-Context is produced from a data set, giving rise to a readable Lattice that focuses on attributes of interest. In the second method, a Context is mined for 'large' Concepts which are then used to re-write the original Context, thus reducing 'noise' in the Context and giving rise to a readable Lattice that lucidly portrays a conceptual overview of the large set of data it is derived from. A three-year European Framework 7 project called CUBIST will develop this work to provide FCA-based visual analytics for data warehouses.
Extracting formal concepts out of relational data
2003
Relational datasets, i.e., datasets in which individuals are described both by their own features and by their relations to other individuals, arise from various sources such as databases, both relational and object-oriented, or software models, e.g., UML class diagrams. When processing such complex datasets, it is of prime importance for an analysis tool to hold as much as possible to the initial format so that the semantics is preserved and the interpretation of the final results eased. There have been several attempts to introduce relations into the Galois lattice and formal concept analysis fields. We propose a novel approach to this problem which relies on an enhanced version of the classical binary data descriptions based on the distinction of several mutually related formal contexts.