John F Roddick | Flinders University of South Australia
Papers by John F Roddick
Data clustering has become an important task for discovering significant patterns and characteristics in large spatial databases. In our previous work, the Multi-Centroid, Multi-Run Sampling Scheme (MCMRS) was shown to be effective in improving k-medoids-based clustering algorithms. In this paper, a more advanced sampling scheme, termed the Incremental Multi-Centroid, Multi-Run Sampling Scheme (IMCMRS), is proposed for k-medoids-based clustering algorithms. Compared with CLARA and CLARANS, experimental results show that the proposed scheme not only reduces computation time by more than 80% but also reduces the average distance per object. IMCMRS is also superior to MCMRS.
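The abstract does not describe how IMCMRS chooses or grows its samples. The sketch below therefore shows only the baseline it builds on: CLARA-style multi-run sampling. Each run does the following:

- draws a random sample;
- runs a simple PAM-style swap search on that sample;
- scores the resulting medoids against the full data set.

The best-scoring medoids are kept. All function names and parameters (`multi_run_sampling`, `runs`, `sample_size`) are hypothetical, chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def cost(points: Sequence[Point], medoids: Sequence[Point]) -> float:
    """Total distance from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam_swap(sample: List[Point], k: int, rng: random.Random) -> List[Point]:
    """PAM-style local search: keep accepting medoid swaps that lower the sample cost."""
    medoids = rng.sample(sample, k)
    best = cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                c = cost(sample, trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return medoids

def multi_run_sampling(points: List[Point], k: int, runs: int = 5,
                       sample_size: int = 40, seed: int = 0):
    """Hypothetical CLARA-style driver: cluster several random samples, keep the best."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(runs):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam_swap(sample, k, rng)
        c = cost(points, medoids)  # judge each run on the FULL data set
        if c < best_cost:
            best_medoids, best_cost = medoids, c
    return best_medoids, best_cost
```

Only the sample is searched exhaustively. That is where sampling schemes save time, at the price of possibly missing better medoids outside the sample.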
ACM SIGMOD Record, 1994
This document contains definitions of a wide range of concepts specific to, and widely used within, temporal databases. In addition to providing definitions, it includes separate explanations of many of the defined concepts. Two sets of criteria are used:

- all included concepts were required to satisfy four relevance criteria;
- the naming of the concepts was resolved using a set of evaluation criteria.

The concepts are grouped into three categories: concepts of general database interest, concepts of temporal database interest, and concepts of specialized interest. This document is a digest of a full version of the glossary. Beyond the material included here, the full version includes substantial discussions of the naming of the concepts. The consensus effort that led to this glossary was initiated in early 1992. Earlier status documents appeared in December 1992 and March 1993. They included terms proposed after an initial glossary appeared in SIGMOD Record...
WIT Transactions on Information and Communication Technologies, 2004
In this paper, a new inequality is derived which can be used for the problem of nearest neighbor searching. We also present a search technique, referred to as a previous medoid index, that reduces computation time, particularly for k-medoids-based algorithms. A novel method that reduces computational complexity through the utilization of memory is also proposed. Building on these, four new search strategies for k-medoids-based algorithms are proposed, combining the following:

- the new inequality;
- the previous medoid index;
- the utilization of memory;
- triangular inequality criteria;
- partial distance search.

When the proposed algorithm is applied to CLARANS, experimental results show that it may reduce computation time by 88.8% to 95.3% while keeping the same average distance per object as CLARANS. The derived inequality and the proposed search strategies can also be applied to nearest neighbor searching and to other clustering algorithms.
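The paper's new inequality is not given in the abstract. The sketch below illustrates only the standard triangular inequality elimination (TIE) criterion it is combined with. For a fixed reference point r, the following lower bound holds for every candidate c:

|d(q, r) - d(c, r)| ≤ d(q, c)

If that bound already reaches the best distance found so far, c can be skipped without computing d(q, c). The choice of a single reference point and the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]

def euclid(a: Vec, b: Vec) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_with_tie(query: Vec, candidates: List[Vec], ref: Vec,
                ref_dist: List[float]) -> Tuple[int, int]:
    """Nearest neighbour with triangular inequality elimination.

    ref_dist[i] must hold euclid(candidates[i], ref), precomputed once offline.
    Returns (index of nearest candidate, number of full distances computed).
    """
    dq = euclid(query, ref)
    best_i, best_d, computed = -1, math.inf, 0
    for i, c in enumerate(candidates):
        # Lower bound on d(query, c); if it cannot beat best_d, skip c entirely.
        if abs(dq - ref_dist[i]) >= best_d:
            continue
        computed += 1
        d = euclid(query, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, computed
```

For example, with 100 candidates spaced along a line and a query at 3.2 on that line, only 4 full distances are computed before every remaining candidate is eliminated by the bound.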
"Ethics must be a condition of the world, like logic." (Ludwig Wittgenstein, 1889-1951)

The development of data mining presents significant ethical and social issues that must be addressed if the new technology is to be widely accepted. This paper explores a range of these issues, identifying in particular privacy, data accuracy, database security, stereotyping, legal liability and broader research dilemmas. Each issue is discussed and its implications for policy development are explored. The paper also considers possible solutions and suggests avenues for further investigation.
Active conceptual modeling of learning, 2007
There are four classes of information system that are not well served by current modelling techniques. First, there are systems for which the number of instances for each entity is relatively low, resulting in data definition taking a disproportionate amount of effort. Second, there are systems where the storage of data and the retrieval of information must take priority over the full definition of a schema describing that data. Third, there are those that undergo regular structural change and are thus subject to information loss as a result of changes to ...
Encyclopaedia of Data Warehousing and Mining, 2nd edition, IGI Publishing, 2008
To paraphrase Winograd (1992), we bring to our communities a tacit comprehension of right and wrong that makes social responsibility an intrinsic part of our culture. Our ethics are the moral principles we use to assert social responsibility and to perpetuate safe and just societies. Moreover, the introduction of new technologies can have a profound effect on our ethical principles. The emergence of very large databases, and the associated automated data analysis tools, presents yet another set of ethical challenges to ...
Information and Software Technology
Schema versioning is one of a number of related areas dealing with the same general problem: using multiple heterogeneous schemata for various database-related tasks. In particular, schema versioning, and its weaker companion, schema evolution, deal with ...
Handbook of Swarm Intelligence, 2010
This chapter proposes a novel parallel approach to implementing the particle swarm optimization (PSO) algorithm on the graphics processing unit (GPU) of a personal computer. The approach uses the GPU's general-purpose computing ability under NVIDIA's Compute Unified Device Architecture (CUDA) software platform, so that the PSO algorithm can be executed in parallel on the GPU. Fitness evaluation, as well as the updating of the velocity and position of every particle in the swarm, is parallelized ...
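To make clear what is being parallelized, here is a minimal sequential sketch of the standard global-best PSO loop. It is written in plain Python and is not the chapter's CUDA implementation. The per-particle velocity/position update and fitness evaluation inside the loop are mutually independent, so a GPU version can assign roughly one thread per particle. The global-best update then becomes a min-reduction. The inertia and acceleration constants (`w`, `c1`, `c2`) and the search bound are typical textbook values, assumed here for illustration.

```python
import random
from typing import Callable, List

def pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 30,
        iters: int = 200, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        bound: float = 5.0, seed: int = 1) -> List[float]:
    """Global-best PSO minimising f over [-bound, bound]^dim (initial range only)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        # Each particle's update below is independent of the others:
        # this is the loop a GPU implementation would spread across threads.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            fx = f(xs[i])  # fitness evaluation, also per particle
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
        # Global-best selection: a parallel min-reduction on a GPU.
        g = min(range(n_particles), key=lambda i: pbest_f[i])
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g][:], pbest_f[g]
    return gbest
```

A usage example is minimising the sphere function, `pso(lambda x: sum(v * v for v in x), dim=3)`. Updating `gbest` once per iteration, rather than after every particle, mirrors the synchronous style that maps naturally onto a GPU kernel launch.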
In this paper, an algorithm for cluster generation using a tabu search approach with simulated annealing is proposed. The main idea is to use tabu search to generate non-local moves for the clusters and to apply simulated annealing to select a suitable current best solution, so as to speed up cluster generation. Experimental results show that the proposed tabu search approach with simulated annealing is superior to the tabu search approach with the Generalised Lloyd algorithm.
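The sketch below shows one plausible way tabu search and simulated annealing can interact in clustering; the paper's move generator and parameters are not given in the abstract. Each iteration works in three steps:

- A small tabu list forbids recently moved points.
- The best of several random non-tabu reassignments is proposed.
- The Metropolis criterion decides whether that proposal replaces the current solution.

The moves here are simple single-point reassignments on 1-D data, a deliberate simplification of the paper's "non-local moves". All names and parameter values are hypothetical.

```python
import math
import random
from collections import deque
from typing import List, Sequence, Tuple

def sse(data: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Within-cluster sum of squared errors for 1-D data (empty clusters add 0)."""
    total = 0.0
    for c in range(k):
        members = [x for x, l in zip(data, labels) if l == c]
        if members:
            mu = sum(members) / len(members)
            total += sum((x - mu) ** 2 for x in members)
    return total

def metropolis_accept(delta: float, temperature: float, u: float) -> bool:
    """Always accept improvements; accept a worse move if u < exp(-delta / T)."""
    return delta <= 0 or u < math.exp(-delta / temperature)

def tabu_sa_cluster(data: Sequence[float], k: int, iters: int = 2000, t0: float = 1.0,
                    alpha: float = 0.995, tabu_len: int = 5, n_moves: int = 8,
                    seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    cur = sse(data, labels, k)
    best_labels, best = labels[:], cur
    tabu = deque(maxlen=tabu_len)  # indices of recently moved points
    t = t0
    for _ in range(iters):
        # Tabu step: sample candidate moves, skipping tabu points and no-op moves.
        moves = []
        for _ in range(n_moves):
            i = rng.randrange(len(data))
            new = rng.randrange(k)
            if i in tabu or new == labels[i]:
                continue
            trial = labels[:]
            trial[i] = new
            moves.append((sse(data, trial, k), i, trial))
        if moves:
            cand_cost, i, trial = min(moves, key=lambda m: m[0])
            # SA step: Metropolis acceptance of the best candidate move.
            if metropolis_accept(cand_cost - cur, t, rng.random()):
                labels, cur = trial, cand_cost
                tabu.append(i)
                if cur < best:
                    best_labels, best = labels[:], cur
        t *= alpha  # geometric cooling schedule
    return best_labels, best
```

Because worse moves are sometimes accepted early on, the function tracks and returns the best solution seen, not the final current one.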
Encyclopedia of Database Systems
Encyclopedia of Database Systems
J. Netw. Intell., 2018
Content-based image retrieval (CBIR) is the most widely accepted approach in image retrieval systems because it can manage image databases efficiently and effectively. CBIR methods usually retrieve images by their features. In this paper, we use a region called the affine invariant region (AIR) as an image feature. This helps retrieve images effectively even when they have been attacked or processed. We also use vector quantization (VQ) to reduce the number of image-feature comparisons, which improves retrieval efficiency. The experimental results show that the method achieves a higher recall rate, lower retrieval time and promising accuracy.
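One common way VQ cuts down feature comparisons is to index each stored image under the codeword nearest to its feature vector. A query is then compared only against the images in its own codeword's bucket. The sketch below assumes that scheme, a pre-trained codebook, and plain numeric vectors standing in for AIR descriptors. The abstract does not specify the paper's exact indexing, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

def nearest_codeword(v: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codeword closest (Euclidean) to feature vector v."""
    return min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))

def build_index(features: Dict[str, Sequence[float]],
                codebook: List[Sequence[float]]) -> Dict[int, List[str]]:
    """Group stored images by the codeword their feature vector quantizes to."""
    index: Dict[int, List[str]] = defaultdict(list)
    for name, v in features.items():
        index[nearest_codeword(v, codebook)].append(name)
    return index

def query(v: Sequence[float], features: Dict[str, Sequence[float]],
          codebook: List[Sequence[float]], index: Dict[int, List[str]]) -> List[str]:
    """Rank only the images sharing the query's codeword, closest first."""
    bucket = index.get(nearest_codeword(v, codebook), [])
    return sorted(bucket, key=lambda n: math.dist(v, features[n]))
```

The trade-off is that a near match that happens to quantize to a neighbouring codeword is missed. Practical systems often probe a few nearby buckets for this reason.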
J. Inf. Hiding Multim. Signal Process., 2017
This paper presents an improvement of the flower pollination algorithm (FPA) for optimizing localization in wireless sensor networks (WSNs). A novel probabilistic model is used to generate new competing candidates for the optimization operations. Rather than maintaining an actual population of tentative solutions, a single probabilistic representation of that population is accumulated over generations. To evaluate the method, we first tested it on six selected benchmark functions. We then applied it to the localization optimization problem in WSNs to further confirm its performance. Compared with the original FPA, the test results show that the proposed method considerably reduces variable storage memory and running time. Compared with other approaches in the literature, the localization accuracy and convergence rate obtained by the proposed method indicate that the proposed method pr...
J. Netw. Intell., 2019
There are natural connections and structural similarities between the basic characteristics of chaotic systems and cryptology, and applying chaotic systems to data encryption has become a trend. In this paper, a pipelined architecture is introduced into the design of a logistic chaotic system, which greatly improves the system's operating frequency. On this basis, a high-speed chaotic pseudo-random sequence generator is realized on a Xilinx Artix-7 series FPGA chip. The operating frequency reaches 296 MHz, giving a throughput of 296 Mbps.
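The generator rests on the logistic map x_{n+1} = r·x_n·(1 - x_n), which is chaotic for r close to 4. A floating-point sketch of the idea follows. It iterates the map, discards a burn-in period, and emits one bit per iteration by thresholding at 0.5. The control parameter `r = 3.99`, the burn-in length and the thresholding rule are assumptions for illustration. The paper's FPGA design works in fixed-point hardware with a pipelined datapath, which this sketch does not model.

```python
from typing import List

def logistic_bits(x0: float, n: int, r: float = 3.99, burn_in: int = 100) -> List[int]:
    """Pseudo-random bits from the logistic map x -> r * x * (1 - x).

    x0 is the seed (key) and must lie strictly inside (0, 1).
    One bit per iteration: 1 if the state is >= 0.5, else 0.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    for _ in range(burn_in):  # discard the transient so the seed's influence spreads
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits
```

Sensitivity to the seed is what makes the map attractive for encryption. Two seeds differing by 1e-10 give unrelated bit streams once the burn-in is over.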
To date, most association rule mining algorithms have assumed that the domains of items are either discrete or, in a limited number of cases, hierarchical, categorical or linear. This constrains the search for interesting rules to those that satisfy the specified quality metrics as independent values or as higher-level concepts of those values. However, in many cases the determination of a single hierarchy is not practicable. For many datasets, an item's value may be taken from a domain that is more conveniently structured as a graph, with weights indicating semantic (or conceptual) distance. Research into algorithms that generate disjunctive association rules has allowed the production of rules such as Radios ∨ TVs → Cables. In many cases there is little semantic relationship between the disjunctive terms, and arguably less readable rules such as Radios ∨ Tuesday → Cables can result. This paper describes two association rule mining algorithms, SemGrAMG and Se...
Data hiding is a widely used technique for embedding secrets in multimedia, with the goals of lower distortion and higher embedding capacity. Many methods focusing on these two problems have been proposed. Previous methods use variously shaped shells to carry secret data in an image. Because of their simple geometric layouts, these schemes cause high image distortion and offer low embedding capacity. We observe that adjacent pixels have similar values, which means this large area can be exploited. We therefore perform data embedding and extraction on a difference-coordinate plane instead of the traditional pixel-coordinate plane. In this paper we propose a method aimed at solving the two problems mentioned above. The scheme embeds secrets three pixels at a time, guided by a flower-shaped reference matrix under a different coordinate system. The flower-shaped reference matrix combines three parts, including the petal matrix, calyx m...
2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering (TENCON '02), Proceedings, 2002
In this paper, the concept of a previous medoid index is introduced. The utilization of memory for efficient medoid search is also presented. We propose a hybrid search approach for the problem of nearest neighbor search. It combines four elements:

- the previous medoid index;
- the utilization of memory;
- the criterion of triangular inequality elimination;
- partial distance search.

The proposed hybrid search approach is applied to k-medoids-based algorithms. Experiments used a Gauss-Markov source, a curve data set and elliptic clusters. When the proposed algorithm is applied to CLARANS, the results show that it may reduce the number of distance calculations by 88.4% to 95.2%, with the same average distance per object as CLARANS. The proposed hybrid search approach can also be applied to nearest neighbor searching and to other clustering algorithms.
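Of the four elements combined above, partial distance search (PDS) is the simplest to show in isolation. While the squared distance to a candidate is being accumulated coordinate by coordinate, the candidate is abandoned as soon as the partial sum reaches the best full distance found so far. The sketch below shows PDS alone, not the paper's full hybrid approach. The function name and the term counter (kept so the saving is visible) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

def pds_nearest(query: Sequence[float],
                codebook: List[Sequence[float]]) -> Tuple[int, int]:
    """Nearest neighbour by squared Euclidean distance with partial distance search.

    Returns (index of nearest vector, number of coordinate terms accumulated).
    """
    best_i, best_d2, terms = -1, math.inf, 0
    for i, c in enumerate(codebook):
        d2 = 0.0
        for q, x in zip(query, c):
            d2 += (q - x) ** 2
            terms += 1
            if d2 >= best_d2:  # partial sum already too large: reject early
                break
        else:
            # Loop ran to completion without rejection, so d2 < best_d2.
            best_i, best_d2 = i, d2
    return best_i, terms
```

With the codebook `[[0,0,0,0], [5,5,5,5], [1,1,1,1]]` and the query `[1,1,1,0.9]`, the second vector is rejected after a single coordinate. Only 9 of the 12 terms a full search needs are computed.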
Proceedings of the 2004 SIAM International Conference on Data Mining, 2004
The detection of recurrent episodes in long strings of tokens has attracted some interest, and a variety of useful methods have been developed. The temporal relationships between discovered episodes may also provide useful knowledge of the phenomenon, but they have as yet received little investigation. This paper discusses an approach for finding such relationships. It proposes a robust and efficient search strategy and an effective user interface, both of which are validated through experiment.
ACM Computing Surveys, 2006
The task of finding correlations between items in a dataset, association mining, has received considerable attention over the last decade. This article presents a survey of association mining fundamentals, detailing the evolution of association mining algorithms from the seminal to the state-of-the-art. This survey focuses on the fundamental principles of association mining, that is, itemset identification, rule generation, and their generic optimizations.
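Itemset identification and rule generation both rest on two textbook quantities. The support of an itemset is the fraction of transactions containing it. The confidence of a rule A → B is support(A ∪ B) / support(A). The sketch below computes these directly, plus a brute-force enumeration of frequent item pairs. It illustrates the standard definitions and is not code from the survey. Real miners such as Apriori prune the candidate space instead of enumerating it, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)

def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """Confidence of antecedent -> consequent (antecedent must have nonzero support)."""
    a = set(antecedent)
    return support(a | set(consequent), transactions) / support(a, transactions)

def frequent_pairs(transactions: List[Set[str]], min_support: float) -> List[FrozenSet[str]]:
    """Brute-force enumeration of all 2-itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    return [frozenset(p) for p in combinations(items, 2)
            if support(p, transactions) >= min_support]
```

For the transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, the rule bread → milk has support 0.5 and confidence 0.5 / 0.75 ≈ 0.67.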