Andrew Edmonds - Academia.edu

Papers by Andrew Edmonds

Hierarchical Learning Applied to Word Sense Disambiguation

SSRN Electronic Journal, 2016

This paper introduces a form of Hierarchical Learning that permits highly relevant association rules to be extracted from data items ambiguously related to a hierarchy. Addressing the problem of word sense disambiguation in natural language processing, this paper shows how references between words and hypernymy hierarchies may be used to generate highly relevant general rules representing valid associations that can thereafter be used to disambiguate unseen text.

Using concept structures for efficient document comparison and location

2007 IEEE Symposium on Computational Intelligence and Data Mining, 2007

A method is discussed for comparing and locating similar documents in a computationally efficient manner by making use of inferred concept statistics, rather than word frequencies. This novel technique uses ...

Multilingual extraction and editing of concept strings for the legal domain

Advances in Computer Science : an International Journal, 2016

Identifying semantic expressions (so-called concept strings (CSs)) in multilingual corpora is an important NLP task, as it allows web search engines to define and perform semantic queries over large collections of documents. Existing web search engines in the legal domain are mainly limited to keyword search, in which the query word is matched against the textual content of the documents. This paper presents a novel framework, named the Concept Strings Framework, that makes use of CSs for representing the content of the documents and for allowing semantic search over them. These CSs can consist of individual knowledge base (KB) concepts (e.g. WordNet concepts) or combinations of them. In addition, this paper presents an interactive web-based toolkit, called the Template Editor, that enables the creation, editing and evaluation of CSs. Experiments on two publicly available legislation websites show satisfactory results.

A discussion of methodology for an enquiry into the effectiveness of marketing tools used by universities in the UK

This work-in-progress paper describes the initial planning and data collection stages of work seeking to analyse the effectiveness of various strategies for marketing universities. This will update existing work from 2001 collected from university marketers collating the perceived effectiveness of a range of promotion techniques. Since universities are not homogeneous, methods for clustering them for fair comparison based on published data are discussed, as are a range of measures of results also derived from the same data.

On data mining Tree structured data represented in XML

The ubiquitous nature of XML as a data interchange format and the arrival of XML databases offer new opportunities and challenges to data mining. The XML format permits the representation of data structures that cannot be easily mapped into a relational framework. New opportunities exist to mine relationships expressed by tree position and the presence or absence of subtrees, as well as the conventional categorical and numeric data values contained at leaf nodes. This paper will consider why most conventional data mining algorithms are ill suited to mining tree data, describe a novel XML-based knowledge representation methodology called Metarule, and describe algorithms that do perform well. Two examples drawn from business will be used to demonstrate the application of these novel techniques.
Keywords: Data-mining, XML, Unstructured data

Using concept structures for efficient document comparison and location

A method is discussed for comparing and locating similar documents in a computationally efficient manner by making use of inferred concept statistics, rather than word frequencies. This novel technique uses natural language structures to create a short ‘concept signature’ vector, which locates a document in ‘concept space’. Similar documents can be located in large corpora in O(log(n)) time by making use of this space for indexing. Results from trials with reference and real world data sets are presented, along with a comparison of the method’s document similarity characteristics and the cosine metric.
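The abstract above compares the method against the cosine metric. As a point of reference only, the following is a minimal sketch of that baseline measure over word-frequency vectors. It is not the paper's concept-signature method, and the vectors `doc_a` and `doc_b` are invented purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero vector is
        # treated as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word-count vectors for two documents over a shared vocabulary.
doc_a = [2, 0, 1, 3]
doc_b = [1, 1, 0, 2]
similarity = cosine_similarity(doc_a, doc_b)
```

Comparing one query document against every document in a corpus this way takes time linear in the corpus size, which is the cost that an index over a low-dimensional space is meant to avoid.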

Quantitative analysis of the pyrolysis-mass spectra of complex mixtures using artificial neural networks: Application to amino acids in glycogen

Journal of Analytical and Applied …, Jan 1, 1993

Pyrolysis-mass spectrometry and artificial neural networks (ANNs) were used in combination to provide quantitative analyses of mixtures of casamino acids in glycogen, as representatives of complex proteins and carbohydrates. We studied fully interconnected feedforward networks, whose weights were modified using various types of back-propagation algorithms, and which exploited a sigmoidal activation function. The ability of the ANNs to generalise was evaluated by varying the number of data points in the training set. It was found that for the algorithms and architecture employed, a set of ten samples equally spaced over the desired concentration range should be used to provide good interpolation. ANNs were poor at extrapolating beyond the range over which they had been trained.

Oscillatory, stochastic and chaotic growth rate fluctuations in permittistatically controlled yeast cultures

Biosystems, Jan 1, 1996

We describe a continuous culture system related to the turbidostat, but using a feedback system based on biomass estimation from the dielectric permittivity of the cell suspension rather than its optical density. It is shown that this system provides an excellent method of maintaining a constant biomass level within a fermentor. The computer-controlled system was able to effect the essentially continuous registration of growth rate by monitoring the rate of medium addition via the time-dependent activity of the pump. At some biomass setpoints for aerobically grown cultures of baker's yeast substantial time-dependent fluctuations in the growth rate of the culture were thereby observed. At some biomass setpoints, however, or under anaerobic conditions, or when using a non-Crabtree yeast, the growth rate was constant, indicating that the fluctuations were inherent to the biological system and not simply a property of the fermentor and control system. A variety of time series analyses (Fourier transformations, Hurst and Lyapunov exponents, the determination of embedding dimension, and nonlinear time series predictions based on the methodology of Sugihara and May) were used to demonstrate, for the first time, that as well as stochastic and periodic components these fluctuations exhibited deterministic chaos. "Trivial predictors" were unable to give accurate predictions of the growth rate in these cultures. The growth rate fluctuations were studied further by means of offline measurements of changes in percentage viability, bud count, and in the external ethanol and glucose concentrations; these data and other evidence suggested that the growth rate fluctuations were closely linked to the primary respiro-fermentative metabolism of this organism. The identification of chaotic growth rates in cell cultures suggests that there may be novel methods for controlling the growth of such cultures.

Time series prediction using supervised learning and tools from chaos theory

In this work, methods for performing time series prediction on complex real world time series are examined. In particular, series exhibiting non-linear or chaotic behaviour are selected for analysis. A range of methodologies based on Takens’ embedding theorem are considered and compared with more conventional methods. A novel combination of methods for determining the optimal embedding parameters is employed and tried out with multivariate financial time series data and with a complex series derived from an experiment in biotechnology. The results show that this combination of techniques provides accurate results while dramatically improving the time required to produce predictions and analyses, and eliminating a range of parameters that had hitherto been fixed empirically. The architecture and methodology of the prediction software developed are described along with design decisions and their justification. Sensitivity analyses are employed to justify the use of this combination of methods, and comparisons are made with more conventional predictive techniques and trivial predictors, showing the superiority of the results generated by the work detailed in this thesis.
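For readers unfamiliar with the embedding step that methods based on Takens' theorem rely on, the sketch below shows a generic time-delay embedding of a scalar series. It is only an illustration of the general idea, not the thesis software. The function name `delay_embed`, the sample series, and the parameter values are assumptions made for this example; the thesis's methods for choosing the embedding parameters are not reproduced here.

```python
def delay_embed(series, dimension, delay):
    """Build delay vectors [x[t], x[t+delay], ..., x[t+(dimension-1)*delay]].

    `dimension` and `delay` are the two embedding parameters; here they
    are simply supplied by the caller rather than estimated from the data.
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay  # distance from first to last coordinate
    return [
        [series[t + k * delay] for k in range(dimension)]
        for t in range(len(series) - span)
    ]


# Hypothetical scalar time series and arbitrarily chosen parameters.
series = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2, 0.8]
vectors = delay_embed(series, dimension=3, delay=2)
# vectors == [[0.1, 0.9, 0.7], [0.4, 0.3, 0.2], [0.9, 0.7, 0.8]]
```

Each delay vector can then serve as the input row for a supervised learner, with a later value of the series as its target.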

Genetic programming of Fuzzy logic production rules with application to financial trading

IEEE World Conference on Computational …, Jan 1, 1994

John Koza [7] has demonstrated that a form of machine learning can be constructed by using the techniques of Genetic Programming using LISP statements. We describe here an extension to this principle using Fuzzy Logic sets and operations instead of LISP expressions. We show that Genetic programming can be used to generate trees of fuzzy logic statements, the evaluation of which optimises some external process, in our example financial trading. We also show that these trees can be simply converted to natural language rules, and that these rules are easily comprehended by a lay audience. This clarity of internal function can be compared to “Black Box” non-parametric modelling techniques such as Neural Networks. We then show that even with minimal data preparation the technique produces rules with good out-of-sample performance on a range of different financial instruments.
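To make the idea of a tree of fuzzy logic statements concrete, here is a small sketch that evaluates such a tree and renders it as a readable rule. Everything in it is an assumption made for illustration: the tuple-based tree encoding, the variables `momentum` and `volatility`, the ramp membership functions, and the use of the standard min/max/complement operators. The paper's actual sets, operators, and genetic search are not shown.

```python
def ramp_up(lo, hi):
    """Membership function rising linearly from 0 at `lo` to 1 at `hi`."""
    def mu(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


# Hypothetical fuzzy sets over made-up input variables.
FUZZY_SETS = {
    ("momentum", "high"): ramp_up(0.0, 1.0),
    ("volatility", "high"): ramp_up(0.0, 2.0),
}


def evaluate(node, inputs):
    """Return the degree of truth (0..1) of a fuzzy expression tree."""
    op = node[0]
    if op == "IS":
        _, variable, set_name = node
        return FUZZY_SETS[(variable, set_name)](inputs[variable])
    if op == "NOT":
        return 1.0 - evaluate(node[1], inputs)
    if op == "AND":
        return min(evaluate(node[1], inputs), evaluate(node[2], inputs))
    if op == "OR":
        return max(evaluate(node[1], inputs), evaluate(node[2], inputs))
    raise ValueError(f"unknown operator: {op}")


def to_text(node):
    """Render the same tree as a natural-language-style rule."""
    op = node[0]
    if op == "IS":
        return f"{node[1]} is {node[2]}"
    if op == "NOT":
        return f"not ({to_text(node[1])})"
    if op in ("AND", "OR"):
        return f"({to_text(node[1])} {op.lower()} {to_text(node[2])})"
    raise ValueError(f"unknown operator: {op}")


rule = ("AND", ("IS", "momentum", "high"),
        ("NOT", ("IS", "volatility", "high")))
strength = evaluate(rule, {"momentum": 0.75, "volatility": 1.0})
description = to_text(rule)
```

In a genetic programming setting, trees like `rule` would be the individuals being recombined and mutated, with fitness measured on the external process being optimised.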

Simultaneous prediction of multiple financial time series using supervised learning and chaos theory

Neural Networks, 1994. …, Jan 1, 2002

"Embedded" time series are often used with Neural networks or other Supervised Learning Algorithms to generate predictions. Recent work in chaos theory has pointed to methods of determining the optimal embedding parameters for individual time series. The hypothesis is explored that these methods also hold when multiple time series are used together to generate a prediction, and that the optima for the individual series combined are the optimum for the group. A novel prediction explanation mechanism is described. Examples will be taken from foreign exchange time series, and the analyses will be performed using The Prophet, a time series prediction program. [Edmonds1]

Drafts by Andrew Edmonds

Hierarchical Learning Applied to Word Sense Disambiguation

This paper introduces a form of Hierarchical Learning that permits highly relevant association rules to be extracted from data items ambiguously related to a hierarchy. Addressing the problem of word sense disambiguation in natural language processing, this paper shows how references between words and hypernymy hierarchies may be used to generate highly relevant general rules representing valid associations that can thereafter be used to disambiguate unseen text.
