Mining precise cause and effect rules in large time series data of socio-economic indicators
Related papers
Survey and Evaluation of Causal Discovery Methods for Time Series
Journal of Artificial Intelligence Research, 2022
We introduce in this survey the major concepts, models, and algorithms proposed so far to infer causal relations from observational time series, a task usually referred to as causal discovery in time series. To do so, after a description of the underlying concepts and modelling assumptions, we present different methods according to the family of approaches they belong to: Granger causality, constraint-based approaches, noise-based approaches, score-based approaches, logic-based approaches, topology-based approaches, and difference-based approaches. We then evaluate several representative methods to illustrate the behaviour of different families of approaches. This illustration is conducted on both artificial and real datasets, with different characteristics. The main conclusion one can draw from this survey is that causal discovery in time series is an active research field in which new methods (in every family of approaches) are regularly proposed, and that no family or method st...
Mining temporal lag from fluctuating events for correlation and root cause analysis
10th International Conference on Network and Service Management (CNSM) and Workshop, 2014
The importance of mining time lags of hidden temporal dependencies from sequential data is highlighted in many domains including system management, stock market analysis, climate monitoring, and more. Mining time lags of temporal dependencies provides useful insights into understanding the sequential data and predicting its evolving trend. Traditional methods mainly use a predefined time window to analyze the sequential items or employ statistical techniques to identify the temporal dependencies in the sequential data. However, it is challenging for existing methods to find the time lags of temporal dependencies in the real world, where time lags are fluctuating, noisy, and tend to be interleaved with each other. This paper introduces a parametric model to describe noisy time lags. Then an efficient expectation maximization approach is proposed to find the time lag with maximum likelihood. This paper also contributes an approximation method for learning the time lag that improves scalability without incurring a significant loss of accuracy. Extensive experiments on both synthetic and real data sets are conducted to demonstrate the effectiveness and efficiency of the proposed methods.
Mining causal relationships in multidimensional time series
2010
Time series are ubiquitous in all domains of human endeavor. They are generated, stored, and manipulated during any kind of activity. The goal of this chapter is to introduce a novel approach to mine multidimensional time-series data for causal relationships. The main feature of the proposed system is supporting discovery of causal relations based on automatically discovered recurring patterns in the input time series. This is achieved by integrating a variety of data mining techniques.
Enterprise Information Systems VI
Temporal mining is a natural extension of data mining with added capabilities of discovering interesting patterns, inferring relationships of contextual and temporal proximity, and possibly uncovering cause-effect associations. Temporal mining covers a wide range of paradigms for knowledge modeling and discovery. A common practice is to discover frequent sequences and patterns of a single variable. In this paper we present a new algorithm that combines several existing ideas: the reference event proposed in (Bettini, Wang et al. 1998), the event detection technique proposed in (Guralnik and Srivastava 1999), the large fraction proposed in (Mannila, Toivonen et al. 1997), and the causal inference proposed in (Blum 1982). We use all of these ideas to build our new algorithm for the discovery of multivariable sequences in the form of the predisposing factors and co-incident factors of the reference event of interest. We define an event as a positive or negative change in the data that exceeds a threshold value. From these patterns we infer predisposing and co-incident factors with respect to a reference variable. For this purpose we study the Open Source Software data collected from the SourceForge website. Out of 240+ attributes we only consider thirteen time dependent attributes such as Page-views,
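The event definition in this abstract (a rise or fall in the data that exceeds a threshold) can be illustrated with a short Python sketch. It is not the authors' algorithm: the function name `detect_events`, the choice of relative change between consecutive samples, and the sample values in the usage note are all assumptions made for illustration.

```python
from typing import List


def detect_events(series: List[float], threshold: float) -> List[int]:
    """Label each step of a series as +1 (rise), -1 (fall) or 0 (no event).

    Assumption of this sketch: "change" means the relative difference
    between consecutive values, compared against `threshold`.
    """
    events = []
    for prev, curr in zip(series, series[1:]):
        if prev == 0:
            # Relative change is undefined here; treat the step as no event.
            events.append(0)
            continue
        change = (curr - prev) / abs(prev)
        if change > threshold:
            events.append(1)
        elif change < -threshold:
            events.append(-1)
        else:
            events.append(0)
    return events
```

For example, `detect_events([100, 120, 118, 90], 0.1)` labels the first step as a rise, the second as no event, and the third as a fall.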
Discovering Temporal/Causal Rules: A Comparison of Methods
2003
We describe TimeSleuth, a hybrid tool based on the C4.5 classification software, which is intended for the discovery of temporal/causal rules. Temporally ordered data are gathered from observable attributes of a system, and used to discover relations among the attributes. In general, such rules could be atemporal or temporal. We evaluate TimeSleuth using synthetic data sets with well-known causal relations as well as real weather data. We show that by performing appropriate preprocessing and postprocessing operations, TimeSleuth extends C4.5’s domain of applicability to the unsupervised discovery of temporal relations among ordered data. We compare the results obtained from TimeSleuth to those of TETRAD and CaMML, and show that TimeSleuth performs better than the other systems.
Path Signature Area-Based Causal Discovery in Coupled Time Series
2021
Coupled dynamical systems are frequently observed in nature, but often not well understood in terms of their causal structure without additional domain knowledge about the system. Especially when analyzing observational time series data of dynamical systems where it is not possible to conduct controlled experiments, for example time series of climate variables, it can be challenging to determine how features causally influence each other. There are many techniques available to recover causal relationships from data, such as Granger causality, convergent cross mapping, and causal graph structure learning approaches such as PCMCI. Path signatures and their associated signed areas provide a new way to approach the analysis of causally linked dynamical systems, particularly in informing a model-free, data-driven approach to algorithmic causal discovery. With this paper, we explore the use of path signatures in causal discovery and propose the application of confidence sequences to analy...
Discovery of Causality and Acausality from Temporal Sequential Data
2005
In this thesis, we present a solution to the problem of discovering rules from sequential data. As part of the solution, the Temporal Investigation Method for Enregistered Record Sequences (TIMERS) and its implementation, the TimeSleuth software, are introduced. TIMERS uses the passage of time between attribute observations as justification for judging the causality of a rule set. Given a sorted sequence of input data records, and assuming that the effects take time to manifest themselves, we merge the input records to bring potential causes and effects together in the same record. Three tests are performed using three different assumptions on the nature of the relationship: instantaneous, causal,
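The record-merging step described in this abstract, which brings potential causes and effects into the same record, might look roughly like the Python sketch below. It is an illustration under stated assumptions rather than the TIMERS implementation: the function name `merge_records`, the `name@t-k` naming scheme for lagged attributes, and the fixed window size are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, float]


def merge_records(records: List[Record], window: int) -> List[Record]:
    """Merge every run of `window` consecutive records into one flat record.

    An attribute `a` taken from the record k steps before the newest
    record in the window is stored under the key "a@t-k" (a hypothetical
    naming scheme chosen for this sketch).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    merged = []
    for end in range(window - 1, len(records)):
        flat: Record = {}
        for lag in range(window):
            for name, value in records[end - lag].items():
                flat[f"{name}@t-{lag}"] = value
        merged.append(flat)
    return merged
```

With `window=2`, each merged record holds the current values (`@t-0`) next to the previous ones (`@t-1`), so a rule learner can relate an attribute at time t to attributes observed one step earlier.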
Detecting causal associations in large nonlinear time series datasets
2018
Identifying causal relationships from observational time series data is a key problem in disciplines such as climate science or neuroscience, where experiments are often not possible. Data-driven causal inference is challenging since datasets are often high-dimensional and nonlinear with limited sample sizes. Here we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm that allows causal networks to be reconstructed from large-scale time series datasets. We validate the method on a well-established climatic teleconnection connecting the tropical Pacific with extra-tropical temperatures and using large-scale synthetic datasets mimicking the typical properties of real data. The experiments demonstrate that our method outperforms alternative techniques in detection power from small to large-scale datasets and opens up entirely new possibilities to discover causal networks from time series across a range of resea...
Causal Graph Discovery For Hydrological Time Series Knowledge Discovery
2014
Causal relationships deliver important information in hydrological studies for exploring the causes of abnormal hydrological phenomena such as droughts and floods, which helps improve our ability to predict and respond to natural disasters. In this paper, we propose a new approach, mutual information causal (MI-Causal), for causal relationship discovery in time series data, which embodies the advantages of existing approaches and overcomes their limitations to satisfy the needs of the hydrological domain. Every time series contains information from its causes, and this information can be transferred to its effects. Building on this idea, we can create a causal graph in the same way as condition-based approaches, but without requiring a large number of independence tests and causal relation calculations. Furthermore, the lead time is reported in the discovery of causal relationships, which is missing from current causality research. The experimental results from both synthetic and real time hydrological data show that our proposed method outperforms regression approaches and Bayesian based approaches.
CAUSAL DISCOVERY ALGORITHM
Definition of causality
Causal inference, or causal relationship discovery, is an important task in hydrological studies for exploring the causes of abnormal hydrological phenomena such as droughts and floods, which helps improve our ability to predict and respond to natural disasters. Unlike generic causality studies, where discovering the causal relation is sufficient, predicting and modeling extreme hydrological situations requires not only constructing a causal graph to reveal the contributing factors, but also providing the lead time from each cause to its effect. Lead time is the time difference between the occurrence of a cause and that of its effect. There are two widely used definitions of causality: one from Granger [1] and the other from Pearl [2]. Granger's causality has been widely used in hydrology, economics and finance.
Granger utilizes a linear autoregressive model to identify causal relationships between time series. The major disadvantage of Granger's causality is its limitation to linear models. Research has been carried out using either a Structural Equation Modeling (SEM) approach, such as Shimizu et al. [3], Zhang et al. [4], Lacerda et al. [5], and Mooij et al. [6], or a regression approach, such as Haufe et al. [7], Hoyer et al. [8], Liu et al. [9], and Lozano et al. [10].
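The linear autoregressive idea behind Granger's causality can be sketched in a few lines of Python. This is a generic textbook-style F-test and not the MI-Causal method of the paper: the function name `granger_f_stat`, the use of NumPy least squares, and the synthetic data in the usage note are assumptions of this sketch. It compares a restricted model (y regressed on its own lags) with a full model that also includes lags of x.

```python
import numpy as np


def granger_f_stat(x, y, lags=1):
    """F statistic for the hypothesis "x Granger-causes y" using `lags` lags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[lags:]                      # y_t for t = lags .. n-1
    ones = np.ones(n - lags)               # intercept column
    # Column k holds the series shifted back by k steps (y_{t-k}, x_{t-k}).
    y_lags = [y[lags - k:n - k] for k in range(1, lags + 1)]
    x_lags = [x[lags - k:n - k] for k in range(1, lags + 1)]
    restricted = np.column_stack([ones] + y_lags)
    full = np.column_stack([ones] + y_lags + x_lags)

    def rss(design):
        # Residual sum of squares of an ordinary least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = len(target) - full.shape[1]      # residual degrees of freedom
    return ((rss_r - rss_f) / lags) / (rss_f / dof)
```

On synthetic data where `y[t]` is built from `x[t-1]` plus noise, the statistic for "x causes y" should be far larger than the one for "y causes x". Because both models are linear, a purely nonlinear dependence can go undetected, which is the limitation noted in the text.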
TimeSleuth: A Tool for Discovering Causal and Temporal Rules
IEEE Transactions on Applications and Industry, 2002
Discovering causal and temporal relations in a system is essential to understanding how it works, and to learning to control the behaviour of the system. TimeSleuth is a causality miner that uses association relations as the basis for the discovery of causal and temporal relations. It does so by introducing time into the observed data. TimeSleuth uses C4.5 as its association discoverer, together with a series of preprocessing and post-processing techniques that enable the user to try different scenarios for mining causality. The data to be mined should originate sequentially from a single system. TimeSleuth's use of a standard decision tree builder such as C4.5 puts it outside the current mainstream method of discovering causality, which is based on conditional independencies and causal Bayesian Networks. This paper introduces TimeSleuth as a tool, and describes its functionality. TimeSleuth expands the abilities of C4.5 in some important ways. It is an unsupervised tool that can handle and interpret temporal data. It also helps the user analyze the relationships among the attributes by enabling him/her to see the rules, and statistics about them, in tabular form. There is also a mechanism to distinguish between causal and acausal relations. The user is thus encouraged to perform experiments and discover the nature of relationships among the data.