Chris Watkins | Royal Holloway, University of London
Papers by Chris Watkins
2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 2017
Public transport provides an ideal means for the transmission of contagious diseases. This paper introduces a novel idea to detect co-location of people in such environments using just the ubiquitous geomagnetic field sensor on a smartphone. Essentially, given that all passengers must share the same journey between at least two consecutive stations, we have a long window over which to match the user trajectory. Our idea was assessed through a painstaking survey covering over 150 kilometres of travelling distance across different parts of London, using overground trains, underground tubes, and buses.
Journal of Location Based Services, 2020
Contact tracing is widely considered an effective procedure in the fight against epidemic diseases. However, one of the challenges for technology-based contact tracing is the high number of false positives, which calls its trustworthiness and efficiency into question among the wider population and hinders mass adoption. To this end, this paper proposes a novel yet practical smartphone-based contact tracing approach, employing WiFi and acoustic sound for relative distance estimation, in addition to air pressure and the magnetic field for ambient environment matching. We present a model combining six smartphone sensors, prioritising some of them when certain conditions are met. We empirically verified our approach in various realistic environments, demonstrating up to 95% fewer false positives and 62% better accuracy than a Bluetooth-only system. To the best of our knowledge, this paper was one of the first works to propose a combination of smartphone sensors for contact tracing.
Journal of Medical Internet Research, 2016
Background: Concerns over online health information-seeking behavior point to the potential harm incorrect, incomplete, or biased information may cause. However, systematic reviews of health information have found few examples of documented harm that can be directly attributed to poor-quality information found online. Objective: The aim of this study was to improve our understanding of the quality and quality characteristics of information found on online discussion forum websites, so that their likely value as a peer-to-peer health information-sharing platform could be assessed. Methods: A total of 25 health discussion threads were selected across 3 websites (Reddit, Mumsnet, and Patient) covering 3 health conditions (human immunodeficiency virus [HIV], diabetes, and chickenpox). Assessors were asked to rate information found in the discussion threads according to 5 criteria: accuracy, completeness, how sensible the replies were, how they thought the questioner would act, and how useful they thought the questioner would find the replies. Results: In all, 78 fully completed assessments were returned by 17 individuals (8 were qualified medical doctors, 9 were not). When the ratings awarded in the assessments were analyzed, 25 of the assessments placed the discussion threads in the highest possible score band, rating them between 5 and 10 overall; 38 rated them between 11 and 15; 12 rated them between 16 and 20; and 3 placed the discussion thread they assessed in the lowest rating band (21-25). This suggests that health threads on Internet discussion forum websites are more likely than not (by a factor of 4:1) to contain information of high or reasonably high quality. Extremely poor information is rare; the lowest available assessment rating was awarded only 11 times out of a possible 353, whereas the highest was awarded 54 times.
Only 3 of 78 fully completed assessments rated a discussion thread in the lowest possible overall band of 21 to 25, whereas 25 of 78 rated one in the highest band of 5 to 10. Quality assessments differed depending on the health condition (chickenpox appeared 17 times in the 20 lowest-rated threads, HIV twice, and diabetes once). Although assessors tended to agree on which discussion threads contained good-quality information, what constituted poor-quality information appeared to be more subjective. Conclusions: Most of the information assessed in this study was considered by qualified medical doctors and nonmedically qualified respondents to be of reasonably good quality. Although a small amount of information was assessed as poor, not all respondents agreed that the original questioner would have been led to act inappropriately based on the information presented. This suggests that discussion forum websites may be a useful platform through which people can ask health-related questions and receive answers of acceptable quality.
Proceedings of the eleventh annual conference on Computational learning theory, 1998
A typical problem in portfolio selection in stock markets is that it is not clear which of the many available strategies should be used. We apply a general algorithm of prediction with expert advice (the Aggregating Algorithm) to two different idealizations of the stock market. One is the well-known game introduced by Cover in connection with his "universal portfolio" algorithm; the other is a more realistic modification of Cover's game introduced in this paper, where market participants are allowed to take "short positions", so that the algorithm may be applied to currency and futures markets. Besides applying the Aggregating Algorithm to a countable (or finite) family of arbitrary investment strategies, we also apply it, in the case of Cover's game, to the uncountable family of "constant rebalanced portfolios" considered by Cover. We generalize Cover's worst-case bounds for his "universal portfolio" algorithm (which can be regarded as a special case of the Aggregating Algorithm corresponding to learning rate 1) to the case of learning rates not exceeding 1. Finally, we discuss a general approach to designing investment strategies in which, instead of making statistical or other assumptions about the market, natural assumptions of computability are made about possible investment strategies; this approach leads to natural extensions of the notion of Kolmogorov complexity.
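The expert-aggregation idea above can be sketched with a minimal exponential-weights mixture over two constant-rebalanced portfolios. This is an illustrative toy, not the paper's algorithm: the price relatives and expert allocations are made up, and the mixing rule simply weights each expert by its exponentiated cumulative log-wealth with learning rate eta.

```python
import math

# Hypothetical price relatives for two assets over 5 periods
# (value at t+1 divided by value at t); illustrative numbers only.
price_relatives = [
    (1.05, 0.97), (0.98, 1.04), (1.02, 1.01), (0.95, 1.06), (1.03, 0.99),
]

# Two "expert" constant-rebalanced portfolios: fraction held in asset 0.
experts = [0.2, 0.8]
eta = 1.0                  # learning rate; eta = 1 corresponds to Cover's case
log_wealth = [0.0, 0.0]    # cumulative log-return of each expert
combined_wealth = 1.0

for x0, x1 in price_relatives:
    # Weights proportional to exp(eta * cumulative log-wealth).
    w = [math.exp(eta * lw) for lw in log_wealth]
    s = sum(w)
    w = [wi / s for wi in w]
    # Aggregated portfolio: weight-average of the experts' allocations.
    b = sum(wi * ei for wi, ei in zip(w, experts))
    combined_wealth *= b * x0 + (1 - b) * x1
    # Update each expert's own wealth.
    for i, e in enumerate(experts):
        log_wealth[i] += math.log(e * x0 + (1 - e) * x1)

print(round(combined_wealth, 4))
```

The combined wealth tracks the better expert up to a regret term that shrinks as the horizon grows.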
Lecture Notes in Computer Science, 2006
Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages can then be defined as the preimage of hyperplanes in the feature space, rather than using grammars or automata. These are the planar languages. In this paper we show that, using techniques from kernel-based learning, we can represent and efficiently learn, from positive data alone, various linguistically interesting context-sensitive languages. In particular, we show that the cross-serial dependencies in Swiss German, which established the non-context-freeness of natural language, are learnable using a standard kernel. We demonstrate polynomial-time identifiability in the limit for these classes, discuss some of their language-theoretic properties, and consider their relationship to the choice of kernel/feature map.
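The planar-language idea can be illustrated with the Parikh map over a two-letter alphabet: a string maps to its vector of symbol counts, and a toy language is the preimage of a hyperplane in that feature space. This sketch is illustrative only; the paper's languages and kernels are richer.

```python
from collections import Counter

def parikh(s, alphabet=("a", "b")):
    """Parikh map: string -> vector of symbol counts."""
    c = Counter(s)
    return tuple(c[x] for x in alphabet)

# A toy "planar language": strings whose Parikh image lies on the
# hyperplane x_a - x_b = 0, i.e. strings with equal numbers of a's and b's.
def in_language(s):
    xa, xb = parikh(s)
    return xa - xb == 0

print(in_language("aabb"), in_language("aab"))  # True False
```

Learning such a language from positive data amounts to finding the hyperplane containing the images of the examples, which is what the kernel-based methods in the paper do in higher-dimensional feature spaces.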
Proceedings of the 9th annual conference on Genetic and evolutionary computation, 2007
Proceedings of the National Academy of Sciences, 2009
When a disease breaks out in a human population, changes in behavior in response to the outbreak can alter the progression of the infectious agent. In particular, people aware of a disease in their proximity can take measures to reduce their susceptibility. Even if no centralized information is provided about the presence of a disease, such awareness can arise through first-hand observation and word of mouth. To understand the effects this can have on the spread of a disease, we formulate and analyze a mathematical model for the spread of awareness in a host population, and then link this to an epidemiological model by having more informed hosts reduce their susceptibility. We find that, in a well-mixed population, this can result in a smaller outbreak, but does not affect the epidemic threshold. If, however, the behavioral response is treated as a local effect arising in the proximity of an outbreak, it can completely stop a disease from spreading, although only if the in...
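The well-mixed case can be sketched with a toy SIR-style model in which unaware susceptibles become aware at a rate proportional to prevalence, and aware hosts have their susceptibility scaled by a factor sigma. All parameter values and the functional form are illustrative assumptions, not the paper's model; the sketch only demonstrates the qualitative finding that awareness reduces the final outbreak size.

```python
def final_size(sigma, beta=0.5, gamma=0.25, alpha=1.0, dt=0.1, steps=2000):
    """Euler-integrate the toy model; return the final epidemic size R.

    sigma: relative susceptibility of aware hosts (sigma = 1 recovers plain SIR).
    alpha: rate at which unaware susceptibles become aware, per unit prevalence.
    """
    S_u, S_a, I, R = 0.99, 0.0, 0.01, 0.0
    for _ in range(steps):
        dS_u = -beta * I * S_u - alpha * I * S_u
        dS_a = -sigma * beta * I * S_a + alpha * I * S_u
        dI = beta * I * (S_u + sigma * S_a) - gamma * I
        dR = gamma * I
        S_u += dt * dS_u
        S_a += dt * dS_a
        I += dt * dI
        R += dt * dR
    return R

# Awareness (sigma < 1) yields a smaller outbreak than no response (sigma = 1).
print(final_size(0.2) < final_size(1.0))
```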
Nucleic Acids Research, 2011
The complexity of gene expression data generated from microarrays and high-throughput sequencing makes their analysis challenging. One goal of these analyses is to define sets of co-regulated genes and identify patterns of gene expression. To date, however, there is a lack of easily implemented methods that allow an investigator to visualize and interact with the data in an intuitive and flexible manner. Here, we show that combining a nonlinear dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), with a novel visualization technique provides a graphical mapping that allows the intuitive investigation of transcriptome data. This approach performs better than commonly used methods, offering insight into underlying patterns of gene expression at both global and local scales and identifying clusters of similarly expressed genes. A freely available MATLAB-implemented graphical user interface to perform t-SNE and nearest-neighbour plots on genomic data sets is available at www.nimr.mrc.ac.uk/research/james-briscoe/visgenex.
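The nearest-neighbour component of such plots reduces to finding, for each gene, the other gene with the most similar expression profile. A minimal sketch with made-up expression vectors (the gene names and values are hypothetical, and real pipelines would use t-SNE coordinates or many more conditions):

```python
import math

# Toy expression profiles (gene -> values across three conditions);
# the names and numbers are illustrative only.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],   # similar profile to geneA
    "geneC": [5.0, 0.5, 0.2],
}

def dist(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbour(name):
    """The other gene with the most similar expression profile."""
    return min((g for g in profiles if g != name),
               key=lambda g: dist(profiles[name], profiles[g]))

print(nearest_neighbour("geneA"))  # geneB
```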
In this report we describe how the Support Vector (SV) technique of solving linear operator equations can be applied to the problem of density estimation [4]. We present a new optimization procedure and set of kernels closely related to current SV techniques that guarantee the monotonicity of the approximation. This technique estimates densities with a mixture of bumps (Gaussian-like shapes), with the usual SV property that only some coefficients are non-zero. Both the width and the height of each bump are chosen adaptively, by ...
2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2008
Selective breeding is considered, in a novel way, as a communication channel. The Shannon informational capacity of this channel is an upper limit on the amount of information that can be put into the genome by selection: this is a meaningful upper limit to the adaptive complexity of evolved organisms. We calculate the maximum adaptive complexity achievable for a given mutation rate for simple models of sexual and asexual reproduction. A new and surprising result is that, with sexual reproduction, the greatest adaptive complexity can be achieved with very long genomes, so long that genetic drift ensures that individual genetic elements are only weakly determined. Put another way, with sexual reproduction, the greatest adaptive complexity can in principle be obtained with genetic architectures that are, in a sense, error-correcting codes. For asexual reproduction, for a given mutation rate, the achievable adaptive complexity is much less than for sexual reproduction, and depends only weakly on genome length. A possible implication of this result for genetic algorithms is that the greatest adaptive complexity is in principle achievable when genomes are so long that mutation prevents the population from coming close to convergence.
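The channel-capacity limit can be illustrated with a deliberately simplified stand-in: if mutation at a binary locus is treated as a binary symmetric channel with flip probability mu, then at most 1 - H(mu) bits of selected information can survive per locus per generation. This is a textbook capacity formula used here for illustration, not the paper's full calculation.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def capacity_per_locus(mu):
    """Capacity of a binary symmetric channel with flip probability mu."""
    return 1.0 - h2(mu)

# Higher mutation rates leave less room for selected information.
print(round(capacity_per_locus(0.01), 3), round(capacity_per_locus(0.1), 3))
```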
Proceedings of the 14th International Conference on Availability, Reliability and Security
We demonstrate a breach in smartphone location privacy through the footprints of the accelerometer and magnetometer. The merits or otherwise of explicitly permissioned location sensors are not the point of this paper. Instead, our proposition is that other non-location-sensitive sensors can track users accurately when the users are in motion, as when travelling on public transport such as trains, buses, and taxis. Through field trials, we provide evidence that high-accuracy location tracking can be achieved even via non-location-sensitive sensors for which no access authorisation is required from users on a smartphone.
Machine Learning, 2010
Using string kernels, languages can be represented as hyperplanes in a high-dimensional feature space. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context-sensitive languages can be represented in this way and that it is possible to learn these efficiently using kernel PCA. We present experiments demonstrating the effectiveness of this approach on some standard examples of context-sensitive languages, using small synthetic data sets.
Fundamenta Informaticae, 2008
We describe methods of representing strings as real valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in the study of combinatorics over words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high dimensional feature spaces based on the occurrences of subwords or factors. In this ...
Advances in kernel methods—Support vector learning, Feb 8, 1999
Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing structure on multi-dimensional kernels which are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VC-dimension). SVAD uses ideas from ANOVA decomposition methods and extends them to generate kernels which directly implement these ideas. SVAD is used with spline kernels, and results show that SVAD performs better ...
We show that evolutionary computation can be implemented as standard Markov-chain Monte-Carlo (MCMC) sampling. With some care, 'genetic algorithms' can be constructed that are reversible Markov chains satisfying detailed balance; it follows that the stationary distribution of populations is a Gibbs distribution in a simple factorised form. For some standard and popular nonparametric probability models, we exhibit Gibbs-sampling procedures that are plausible genetic algorithms. At mutation-selection equilibrium, a population of genomes is analogous to a sample from a Bayesian posterior, and the genomes are analogous to latent variables. We suggest this is a general, tractable, and insightful formulation of evolutionary computation in terms of standard machine learning concepts and techniques. In addition, we show that evolutionary processes in which selection acts by differences in fecundity are not reversible, and that it is not possible to construct reversible evolutionary models in which each child is produced by only two parents.
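The evolution-as-MCMC view can be sketched in its simplest form: a single-genome Metropolis sampler whose proposal is a point mutation and whose stationary distribution is the Gibbs distribution pi(x) proportional to exp(beta * fitness(x)). This is a minimal illustration under a toy OneMax fitness, not the paper's population-level construction.

```python
import math
import random

random.seed(0)

def fitness(genome):
    """Toy fitness: number of ones (OneMax)."""
    return sum(genome)

def step(genome, beta=2.0):
    """One Metropolis step: propose a point mutation, accept with the
    standard ratio, so detailed balance holds w.r.t. exp(beta * fitness)."""
    i = random.randrange(len(genome))
    proposal = list(genome)
    proposal[i] ^= 1  # flip one locus
    accept_prob = min(1.0, math.exp(beta * (fitness(proposal) - fitness(genome))))
    return proposal if random.random() < accept_prob else genome

genome = [0] * 20
for _ in range(5000):
    genome = step(genome)
print(fitness(genome))
```

At equilibrium each locus is independently 1 with probability e^beta / (e^beta + 1), so high-fitness genomes dominate, mirroring mutation-selection balance.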
Machine Learning, 1992
Q-learning is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
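The incremental update at the heart of Q-learning can be sketched on a tiny deterministic chain MDP. The environment, reward, and all parameter values are illustrative assumptions; the update rule Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) is the standard one.

```python
import random

random.seed(1)

# Tiny deterministic chain MDP: states 0..3, actions 0 (left) / 1 (right);
# reaching state 3 gives reward 1 and ends the episode.
N_STATES, GOAL = 4, 3
alpha_lr, gamma_disc, eps = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection keeps all actions repeatedly sampled.
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = env_step(s, a)
        target = r if done else r + gamma_disc * max(Q[s2])
        Q[s][a] += alpha_lr * (target - Q[s][a])  # incremental Q update
        s = s2

print([round(max(q), 2) for q in Q[:3]])  # action-values rise toward the goal
```

With enough episodes the greedy action-values approach the optimal ones for this chain (1.0, then discounted by 0.9 per step away from the goal).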
2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 2017
The public transports provide an ideal means to enable contagious diseases transmission. This pap... more The public transports provide an ideal means to enable contagious diseases transmission. This paper introduces a novel idea to detect co-location of people in such environment using just the ubiquitous geomagnetic field sensor on the smart phone. Essentially, given that all passengers must share the same journey between at least two consecutive stations, we have a long window to match the user trajectory. Our idea was assessed over a painstakingly survey of over 150 kilometres of travelling distance, covering different parts of London, using the overground trains, the underground tubes and the buses.
Journal of Location Based Services, 2020
Contact tracing is widely considered as an effective procedure in the fight against epidemic dise... more Contact tracing is widely considered as an effective procedure in the fight against epidemic diseases. However, one of the challenges for technology based contact tracing is the high number of false positives, questioning its trustworthiness and efficiency amongst the wider population for mass adoption. To this end, this paper proposes a novel, yet practical smartphone-based contact tracing approach, employing WiFi and acoustic sound for relative distance estimate, in addition to the air pressure and the magnetic field for ambient environment matching. We present a model combining six smartphone sensors, prioritising some of them when certain conditions are met. We empirically verified our approach in various realistic environments to demonstrate an achievement of up to 95% fewer false positives, and 62% more accurate than Bluetooth-only system. To the best of our knowledge, this paper was one of the first work to propose a combination of smartphone sensors for contact tracing.
Journal of Medical Internet Research, 2016
Background: Concerns over online health information-seeking behavior point to the potential harm ... more Background: Concerns over online health information-seeking behavior point to the potential harm incorrect, incomplete, or biased information may cause. However, systematic reviews of health information have found few examples of documented harm that can be directly attributed to poor quality information found online. Objective: The aim of this study was to improve our understanding of the quality and quality characteristics of information found in online discussion forum websites so that their likely value as a peer-to-peer health information-sharing platform could be assessed. Methods: A total of 25 health discussion threads were selected across 3 websites (Reddit, Mumsnet, and Patient) covering 3 health conditions (human immunodeficiency virus [HIV], diabetes, and chickenpox). Assessors were asked to rate information found in the discussion threads according to 5 criteria: accuracy, completeness, how sensible the replies were, how they thought the questioner would act, and how useful they thought the questioner would find the replies. Results: In all, 78 fully completed assessments were returned by 17 individuals (8 were qualified medical doctors, 9 were not). When the ratings awarded in the assessments were analyzed, 25 of the assessments placed the discussion threads in the highest possible score band rating them between 5 and 10 overall, 38 rated them between 11 and 15, 12 rated them between 16 and 20, and 3 placed the discussion thread they assessed in the lowest rating band (21-25). This suggests that health threads on Internet discussion forum websites are more likely than not (by a factor of 4:1) to contain information of high or reasonably high quality. Extremely poor information is rare; the lowest available assessment rating was awarded only 11 times out of a possible 353, whereas the highest was awarded 54 times. 
Only 3 of 78 fully completed assessments rated a discussion thread in the lowest possible overall band of 21 to 25, whereas 25 of 78 rated it in the highest of 5 to 10. Quality assessments differed depending on the health condition (chickenpox appeared 17 times in the 20 lowest-rated threads, HIV twice, and diabetes once). Although assessors tended to agree on which discussion threads contained good quality information, what constituted poor quality information appeared to be more subjective. Conclusions: Most of the information assessed in this study was considered by qualified medical doctors and nonmedically qualified respondents to be of reasonably good quality. Although a small amount of information was assessed as poor, not all respondents agreed that the original questioner would have been led to act inappropriately based on the information presented. This suggests that discussion forum websites may be a useful platform through which people can ask health-related questions and receive answers of acceptable quality.
Proceedings of the eleventh annual conference on Computational learning theory, 1998
A typical problem in portfolio selection in stock markets is that it is not clear which of the ma... more A typical problem in portfolio selection in stock markets is that it is not clear which of the many available strategies should be used. We apply a general algorithm of prediction with expert advice (the Aggregating Algorithm) to two different idealizations of the stock market. One is the well-known game introduced by Cover in connection with his "universal portfolio" algorithm; the other is a more realistic modification of Cover's game introduced in this paper, where market's participants are allowed to take "short positions", so that the algorithm may be applied to currency and futures markets. Besides applying the Aggregating Algorithm to a countable (or finite) family of arbitrary investment strategies, we also apply it, in the case of Cover's game, to the uncountable family of "constant rebalanced portfolios" considered by Cover. We generalize Cover's worst-case bounds for his "universal portfolio" algorithm (which can be regarded as a special case of the Aggregating Algorithm corresponding to learning rate 1) to the case of learning rates not exceeding 1. Finally, we discuss a general approach to designing investment strategies in which, instead of making statistical or other assumptions about the market, natural assumptions of computability are made about possible investment strategies; this approach leads to natural extensions of the notion of Kolmogorov complexity.
Lecture Notes in Computer Science, 2006
Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages ca... more Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages can then be defined as the preimage of hyperplanes in the feature space, rather than using grammars or automata. These are the planar languages. In this paper we show that using techniques from kernel-based learning, we can represent and efficiently learn, from positive data alone, various linguistically interesting context-sensitive languages. In particular we show that the cross-serial dependencies in Swiss German, that established the non-context-freeness of natural language, are learnable using a standard kernel. We demonstrate the polynomial-time identifiability in the limit of these classes, and discuss some language theoretic properties of these classes, and their relationship to the choice of kernel/feature map.
Proceedings of the 9th annual conference on Genetic and evolutionary computation, 2007
Proceedings of the National Academy of Sciences, 2009
When a disease breaks out in a human population, changes in behavior in response to the outbreak ... more When a disease breaks out in a human population, changes in behavior in response to the outbreak can alter the progression of the infectious agent. In particular, people aware of a disease in their proximity can take measures to reduce their susceptibility. Even if no centralized information is provided about the presence of a disease, such awareness can arise through first-hand observation and word of mouth. To understand the effects this can have on the spread of a disease, we formulate and analyze a mathematical model for the spread of awareness in a host population, and then link this to an epidemiological model by having more informed hosts reduce their susceptibility. We find that, in a well-mixed population, this can result in a lower size of the outbreak, but does not affect the epidemic threshold. If, however, the behavioral response is treated as a local effect arising in the proximity of an outbreak, it can completely stop a disease from spreading, although only if the in...
Nucleic Acids Research, 2011
The complexity of gene expression data generated from microarrays and high-throughput sequencing ... more The complexity of gene expression data generated from microarrays and high-throughput sequencing make their analysis challenging. One goal of these analyses is to define sets of co-regulated genes and identify patterns of gene expression. To date, however, there is a lack of easily implemented methods that allow an investigator to visualize and interact with the data in an intuitive and flexible manner. Here, we show that combining a nonlinear dimensionality reduction method, t-statistic Stochastic Neighbor Embedding (t-SNE), with a novel visualization technique provides a graphical mapping that allows the intuitive investigation of transcriptome data. This approach performs better than commonly used methods, offering insight into underlying patterns of gene expression at both global and local scales and identifying clusters of similarly expressed genes. A freely available MATLABimplemented graphical user interface to perform t-SNE and nearest neighbour plots on genomic data sets is available at www.nimr.mrc.ac.uk/ research/james-briscoe/visgenex.
In this report we describe how the Support Vector (SV) technique of solving linear operator equat... more In this report we describe how the Support Vector (SV) technique of solving linear operator equations can be applied to the problem of density estimation 4]. We present a new optimization procedure and set of kernels closely related to current SV techniques that guarantee the monotonicity of the approximation. This technique estimates densities with a mixture of bumps (Gaussian-like shapes), with the usual SV property that only some coe cients are non-zero. Both the width and the height of each bump is chosen adaptively, by ...
2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2008
Selective breeding is considered as a communication channel, in a novel way. The Shannon informat... more Selective breeding is considered as a communication channel, in a novel way. The Shannon informational capacity of this channel is an upper limit on the amount of information that can be put into the genome by selection: this is a meaningful upper limit to the adaptive complexity of evolved organisms. We calculate the maximum adaptive complexity achievable for a given mutation rate for simple models of sexual and asexual reproduction. A new and surprising result is that, with sexual reproduction, the greatest adaptive complexity can be achieved with very long genomes, so long that genetic drift ensures that individual genetic elements are only weakly determined. Put another way, with sexual reproduction, the greatest adaptive complexity can in principle be obtained with genetic architectures that are, in a sense, error correcting codes. For asexual reproduction, for a given mutation rate, the achievable adaptive complexity is much less than for sexual reproduction, and depends only weakly on genome length. A possible implication of this result for genetic algorithms is that the greatest adaptive complexity is in principle achievable when genomes are so long that mutation prevents the population coming close to convergence.
Proceedings of the 14th International Conference on Availability, Reliability and Security
We demonstrate a breach in smartphone location privacy through the accelerometer and magnetometer... more We demonstrate a breach in smartphone location privacy through the accelerometer and magnetometer's footprints. The merits or otherwise of explicitly permissioned location sensors are not the point of this paper. Instead, our proposition is that other non-locationsensitive sensors can track users accurately when the users are in motion, as in travelling on public transport, such as trains, buses, and taxis. Through field trials, we provide evidence that high accuracy location tracking can be achieved even via non-locationsensitive sensors for which no access authorisation is required from users on a smartphone.
Machine Learning, 2010
Using string kernels, languages can be represented as hyperplanes in a high-dimensional feature space. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context-sensitive languages can be represented in this way and that they can be learned efficiently using kernel PCA. We present experiments demonstrating the effectiveness of this approach on standard examples of context-sensitive languages, using small synthetic data sets.
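To make the feature-space idea concrete, here is one standard string kernel, the p-spectrum kernel, which maps a string to counts of its length-p contiguous substrings; the kernel value is the inner product of the two feature vectors. This is an illustrative choice, not necessarily the kernel family used in the paper.

```python
from collections import Counter

def spectrum_features(s: str, p: int) -> Counter:
    """Map a string to counts of its contiguous substrings of length p."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s: str, t: str, p: int) -> int:
    """Inner product of the two feature vectors: the p-spectrum kernel."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    return sum(fs[w] * ft[w] for w in fs if w in ft)

print(spectrum_kernel("abab", "ab", 2))  # "ab" occurs twice in "abab", once in "ab" -> 2
```

Stacking such kernel values into a Gram matrix is all that kernel PCA needs: the hyperplane representing a language lives in the induced feature space without ever being constructed explicitly.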
Fundamenta Informaticae, 2008
We describe methods of representing strings as real-valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in the study of combinatorics on words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high-dimensional feature spaces based on the occurrences of subwords or factors. In this ...
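A sketch of the Parikh matrix itself, assuming the standard definition (Mateescu et al.): over an ordered alphabet a₁ < … < a_k, the matrix is unit upper-triangular and entry (i, j+1) counts occurrences of the scattered subword a_i … a_j. It can be built incrementally, one letter at a time.

```python
def parikh_matrix(word: str, alphabet: str) -> list[list[int]]:
    """Parikh matrix of `word` over an ordered alphabet: unit
    upper-triangular; entry (i, j+1) counts occurrences of the scattered
    subword alphabet[i..j] in `word`."""
    n = len(alphabet) + 1
    # Start from the identity matrix.
    m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    index = {c: q for q, c in enumerate(alphabet)}
    for c in word:
        q = index[c]
        # Right-multiply by the elementary matrix I + E_{q,q+1}:
        # column q+1 gains the value of column q in every row.
        for i in range(n):
            m[i][q + 1] += m[i][q]
    return m

m = parikh_matrix("abab", "ab")
# m[0][1] = count of 'a' = 2, m[1][2] = count of 'b' = 2,
# m[0][2] = count of the scattered subword "ab" = 3
```

Reading off the superdiagonal recovers the ordinary Parikh vector of letter counts; the higher diagonals carry the subword counts that make the matrix a finer invariant of the word.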
Advances in kernel methods—Support vector learning, Feb 8, 1999
Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing structure on multi-dimensional kernels that are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VC-dimension). SVAD uses ideas from ANOVA decomposition methods and extends them to generate kernels which directly implement these ideas. SVAD is used with spline kernels, and results show that SVAD performs better ...
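A minimal sketch of an order-d ANOVA kernel, assuming the usual definition (a sum over all size-d subsets of features of products of one-dimensional base kernels); this illustrates the construction rather than the paper's exact spline-kernel setup. The naive sum has C(n, d) terms, but a dynamic program computes it in O(n·d).

```python
def anova_kernel(x, y, d, base=lambda a, b: a * b):
    """Order-d ANOVA kernel: sum over all size-d subsets of features of the
    product of one-dimensional base kernels, via dynamic programming."""
    n = len(x)
    # K[j] holds the order-j ANOVA kernel over the features seen so far.
    K = [1.0] + [0.0] * d
    for i in range(n):
        k1 = base(x[i], y[i])
        # Update from high order down so each feature is used at most once.
        for j in range(min(i + 1, d), 0, -1):
            K[j] += k1 * K[j - 1]
    return K[d]

# With the linear base kernel, order 1 reduces to the ordinary dot product:
print(anova_kernel([1.0, 2.0], [3.0, 4.0], d=1))  # 1*3 + 2*4 = 11.0
```

Choosing d caps the interaction order of the features the machine can combine, which is exactly the capacity-control knob the abstract refers to; swapping `base` for a one-dimensional spline kernel recovers the SVAD setting.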
We show that evolutionary computation can be implemented as standard Markov-chain Monte-Carlo (MCMC) sampling. With some care, `genetic algorithms' can be constructed that are reversible Markov chains satisfying detailed balance; it follows that the stationary distribution of populations is a Gibbs distribution in a simple factorised form. For some standard and popular nonparametric probability models, we exhibit Gibbs-sampling procedures that are plausible genetic algorithms. At mutation-selection equilibrium, a population of genomes is analogous to a sample from a Bayesian posterior, and the genomes are analogous to latent variables. We suggest this is a general, tractable, and insightful formulation of evolutionary computation in terms of standard machine learning concepts and techniques. In addition, we show that evolutionary processes in which selection acts by differences in fecundity are not reversible, and also that it is not possible to construct reversible evolutionary models in which each child is produced by only two parents.
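A toy single-genome illustration of the reversibility idea, not the paper's population-level construction: a single-bit mutation is a symmetric proposal, so a Metropolis acceptance rule gives a reversible chain whose stationary distribution is the Gibbs distribution p(x) ∝ exp(β·f(x)). All names and parameters here are illustrative assumptions.

```python
import math
import random

def mcmc_evolution(fitness, length, beta, steps, seed=0):
    """Metropolis sampler over binary genomes: propose flipping one locus
    (a symmetric proposal) and accept with probability
    min(1, exp(beta * (f(new) - f(old)))).  Detailed balance holds, so the
    stationary distribution is the Gibbs distribution p(x) ~ exp(beta*f(x))."""
    rng = random.Random(seed)
    genome = [rng.randint(0, 1) for _ in range(length)]
    f = fitness(genome)
    for _ in range(steps):
        i = rng.randrange(length)
        genome[i] ^= 1                      # propose mutating one locus
        f_new = fitness(genome)
        if math.log(rng.random() + 1e-300) < beta * (f_new - f):
            f = f_new                       # accept the mutation
        else:
            genome[i] ^= 1                  # reject: undo the flip
    return genome

# Strong selection (large beta) concentrates mass on high-fitness genomes.
best = mcmc_evolution(fitness=sum, length=20, beta=5.0, steps=2000)
print(sum(best))
```

The inverse temperature β plays the role of selection strength: at β = 0 the chain is pure drift over genomes, while large β concentrates the stationary distribution on fitness peaks.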
Machine Learning, 1992
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
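The incremental update the abstract describes can be sketched in a few lines of tabular code; the toy environment and parameter values below are illustrative assumptions, not taken from the paper.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning.  `step(s, a)` returns (next_state, reward, done).
    Repeatedly applies the update
        Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    which converges to the optimal action-values provided every state-action
    pair keeps being sampled (here ensured by epsilon-greedy exploration)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:          # explore
                a = rng.randrange(n_actions)
            else:                               # exploit current estimates
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy chain: states 0..3; action 1 moves right (reward 1 on reaching 3),
# action 0 stays put with no reward.
def step(s, a):
    if a == 1:
        return (s + 1, 1.0 if s + 1 == 3 else 0.0, s + 1 == 3)
    return (s, 0.0, False)

Q = q_learning(n_states=4, n_actions=2, step=step)
```

After training, moving right is preferred in every non-terminal state, and the learned values decay geometrically with distance from the reward, as the discount factor gamma dictates.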