John Roddick | Flinders University of South Australia
Papers by John Roddick
ACM Computing Surveys
Sequences of events, items, or tokens occurring in an ordered metric space appear often in data and the requirement to detect and analyze frequent subsequences is a common problem. Sequential Pattern Mining arose as a subfield of data mining to focus on this field. This article surveys the approaches and algorithms proposed to date.
ACM Computing Surveys, 2013
Sequences of events, items or tokens occurring in an ordered metric space appear often in data and the requirement to detect and analyse frequent subsequences is a common problem. Sequential Pattern Mining arose as a sub-field of data mining to focus on this field. This paper surveys the approaches and algorithms proposed to date.
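As a rough illustration of the problem the survey addresses, the sketch below counts how many sequences in a small database contain each ordered subsequence of a fixed length and keeps those that meet a minimum support. It is a brute-force enumeration written for clarity, not any of the algorithms the survey covers; the function names and the `min_support` parameter are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def frequent_subsequences(database, length, min_support):
    """Brute-force support counting (hypothetical helper, not a surveyed algorithm).

    Support of a pattern = number of sequences that contain it as a subsequence.
    """
    counts = Counter()
    for sequence in database:
        # combinations() keeps positional order, so each tuple is a subsequence;
        # the set ensures a sequence contributes at most once per pattern.
        counts.update(set(combinations(sequence, length)))
    return {p: c for p, c in counts.items() if c >= min_support}
```

Enumerating every subsequence grows combinatorially with sequence length, which is one reason more selective candidate-generation and pattern-growth strategies are of interest.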
Proceedings of the 2004 SIAM International Conference on Data Mining, 2004
The detection of recurrent episodes in long strings of tokens has attracted some interest and a variety of useful methods have been developed. The temporal relationship between discovered episodes may also provide useful knowledge of the phenomenon but as yet has received little investigation. This paper discusses an approach for finding such relationships through the proposal of a robust and efficient search strategy and effective user
Lecture Notes in Computer Science, 2006
Traditionally text mining has had a strong link with information retrieval and classification and has largely aimed to classify documents according to embedded knowledge. Association rule mining and sequence mining, on the other hand, have had a different goal; one of eliciting relationships within or about the data being mined. Recently there has been research conducted using sequence mining techniques on digital document collections by treating the text as sequential data. In this paper we propose a multi-level ...
… Workshop (ADM'03), 2003
Flinders University, ...
IEEE Transactions on Knowledge and Data Engineering, 2000
The temporal interval relationships formalized by Allen, and later extended to accommodate semi-intervals by Freksa, have been widely utilized in both data modeling and artificial intelligence research to facilitate reasoning between the relative temporal ordering of events. In practice, however, some modifications to the relationships are necessary when linear temporal sequences are provided, when event times are aggregated, or when data is supplied to a granularity which is larger than required. This paper discusses these modifications and outlines a solution to this problem which accommodates any available knowledge of interval midpoints.
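For readers unfamiliar with the baseline, the sketch below classifies two intervals into one of Allen's thirteen relations. It covers only the standard relations on exact endpoints and does not attempt the midpoint-based modifications the paper proposes; the function name and the `(start, end)` tuple representation are assumptions for illustration.

```python
def allen_relation(a, b):
    """Name the Allen relation that holds from interval a to interval b.

    Assumes proper intervals given as (start, end) with start < end.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains once the cases above are excluded.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

When timestamps are recorded at a coarse granularity, two distinct endpoints can appear equal, so a classifier like this may report "meets" or "starts" where the true relation differs; that is the kind of situation the abstract refers to.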
... Klein (Vrije Universiteit Amsterdam, The Netherlands) Richard McClatchey (University of the West of England, UK) Federica Mandreoli (University ... 257 Luis Jesus Arevalo Rosado, Antonio Polo Marquez, Juan Marıa Fernandez Gonzalez OIS 2006-1st International Workshop on ...
THE TSQL2 TEMPORAL QUERY LANGUAGE edited by Richard Thomas Snodgrass The TSQL2 Language Design Committee Kluwer Academic Publishers ... THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ... THE TSQL2 TEMPORAL ...
Information Organization and Databases, 2000
... Mukesh Mohania (1) and John F. Roddick (2). (1) Department of Computer Science, Western Michigan University, USA; (2) School of Informatics & Engineering, Flinders University, Australia. {mohania@cs.wmich.edu, roddick@cs.flinders.edu.au} ...
We present a query-processing model for mobile computing using summary databases (databases stored in some predefined condensed form). We use concept hierarchies to generate summary databases from the main database in various ways. Traditional database management systems are correct in that they are able to provide answers to queries that are both sound and complete with respect to the source data. In a mobile environment, it may be advantageous to relax one or other of these criteria to enhance availability through the use of summary databases. This would provide a more optimal use of data during periods of disconnection and enable efficient utilization of low bandwidth and restricted memory size. The proposed model for query processing uses concept hierarchies and summary databases at run time to return approximate answers to queries when access to the main database is either undesirable or unavailable. We present a model that is able to provide varying levels of approximate answer to queries that occur at a mobile host using the summary database stored either locally at the mobile host (MH) or remotely at mobile service stations (MSS). The paper also discusses some cost-benefit analyses involving storage, transmission and query processing costs.
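The idea of answering from a condensed form can be sketched as follows. This is a toy illustration under assumed data, not the paper's model: the hierarchy contents, the function names, and the choice of a count query are all hypothetical. Detailed values are generalised one level up a concept hierarchy, and a query about a detailed value is then answered with an upper bound taken from the summary.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy: detailed value -> general concept.
CONCEPT_HIERARCHY = {
    "Adelaide": "South Australia",
    "Whyalla": "South Australia",
    "Perth": "Western Australia",
}


def summarise(cities, hierarchy):
    """Build a summary 'database': record counts per general concept."""
    return Counter(hierarchy[city] for city in cities)


def approximate_count(summary, hierarchy, city):
    """Answer 'how many records for city?' from the summary alone.

    The result is an upper bound: it counts every record that generalises
    to the same concept, so it may overstate the exact answer.
    """
    return summary[hierarchy[city]]
```

The summary is smaller than the source data and can be held on a disconnected device, at the cost of answers that are no longer exact.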
Proceedings of the 26th …, Jan 1, 2003
The relative difference between two data values is of interest in a number of application domains including temporal and spatial applications, schema versioning, data warehousing (particularly data preparation), internet searching, validation and error correction, and data mining. Moreover, consistency across systems in determining such distances and the robustness of such calculations is essential in some domains and useful in many. Despite this, there is no generally adopted approach to determining such distances and no accommodation of distance within SQL or any commercially available DBMS.
Conceptual Modeling for Advanced Application …, Jan 1, 2004
Database evolution can be considered a combination of schema evolution, in which the structure evolves with the addition and deletion of attributes and relations, together with domain evolution in which an attribute's specification, semantics and/or range of allowable values changes. We present a model in which mesodata (an additional domain definition layer containing domain structure and intelligence) is used to alleviate and in some cases obviate the need for data conversion or coercion. We present the nature and use of mesodata as it affects domain evolution, such as when a domain changes, when the semantics of a domain alter and when the attribute's specification is modified.
In this paper, an algorithm for cluster generation using a tabu search approach with simulated annealing is proposed. The main idea of this algorithm is to use the tabu search approach to generate non-local moves for the clusters and apply the simulated annealing technique to select a suitable current best solution so as to speed up cluster generation. Experimental results demonstrate that the proposed tabu search approach with simulated annealing for cluster generation is superior to the tabu search approach with the Generalised Lloyd algorithm. 1 Clustering Clustering is the process of grouping patterns into a number of clusters, each of which contains the patterns that are similar to each other in some way. The existing clustering algorithms can be simply classified into the following two categories: hierarchical clustering and partitional clustering [1]. Hierarchical clustering operates by partitioning the patterns into successively fewer structures. This method gives rise to a dendrogram in which the patterns form a nested sequence of partitions. Hierarchical procedures can be either agglomerative or divisive. An agglomerative clustering approach is a process in which each pattern is placed in its own cluster and these atomic clusters are gradually merged into larger and larger clusters until the desired objective is attained. A divisive clustering approach reverses the process of ... (Data Mining II, C.A. Brebbia & N.F.F. Ebecken, Editors)
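The agglomerative procedure described in the excerpt can be sketched for one-dimensional data as follows. This illustrates only the background hierarchical method, not the paper's tabu search with simulated annealing; the single-linkage merge rule, the function name, and the stopping criterion (a target number of clusters) are assumptions chosen to keep the example short.

```python
def agglomerate(points, target_clusters):
    """Merge 1-D points bottom-up until target_clusters clusters remain.

    Uses single linkage: the distance between two clusters is the smallest
    gap between their members. For sorted 1-D data the closest pair of
    clusters is always adjacent, so only neighbouring gaps are compared.
    """
    if target_clusters < 1:
        raise ValueError("target_clusters must be at least 1")
    # Start with every pattern in its own (atomic) cluster.
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > target_clusters:
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        # Merge the two closest neighbouring clusters into one.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters
```

Recording the order of merges, rather than stopping at a fixed count, would give the nested sequence of partitions that a dendrogram depicts.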
Proceedings of the …, 2009
Itemsets, which are treated as intermediate results in association mining, have attracted significant research due to the inherent complexity of their generation. However, there is currently little literature focusing upon the interactions between itemsets, the nature of which may potentially contain valuable information. This paper presents a novel tree-based approach to discovering item-set interactions, a task which cannot be undertaken by current association mining techniques.
Proceedings of the 3rd Asia-Pacific …, 2006
The integration of data from different sources often leads to the adoption of schemata that entail a loss of information in respect of one or more of the data sets being combined. The coercion of data to conform to the type of the unified attribute is one of the major reasons for this information loss. We argue that for maximal information retention it would be useful to be able to define attributes over domains capable of accommodating multiple types, that is, domains that potentially allow an attribute to take its values from ...
Active conceptual modeling of learning, 2007
There are four classes of information system that are not well served by current modelling techniques. First, there are systems for which the number of instances for each entity is relatively low resulting in data definition taking a disproportionate amount of effort. Second, there are systems where the storage of data and the retrieval of information must take priority over the full definition of a schema describing that data. Third, there are those that undergo regular structural change and are thus subject to information loss as a result of changes to ...
Tutorials, posters, panels and …, Jan 1, 2007
There are a number of issues for information systems which are required to collect data urgently that are not well accommodated by current conceptual modelling methodologies and as a result the modelling step (and the use of databases) is often omitted. Such issues include the fact that • the number of instances for each entity is relatively low resulting in data definition taking a disproportionate amount of effort,