Luke Dickens | Imperial College London

Papers by Luke Dickens

On Efficient Meta-Data Collection for Crowdsensing

Crowdsensing Workshop at PerCom 2014, Mar 24, 2014

Participatory sensing applications have an on-going requirement to turn raw data into useful knowledge, and to achieve this, many rely on prompt human-generated meta-data to support and/or validate the primary data payload. These human contributions are inherently error-prone and subject to bias and inaccuracies, so multiple overlapping labels are needed to cross-validate one another. While probabilistic inference can be used to reduce the required label overlap, there is still a need to minimise the overhead and improve the accuracy of timely label collection. We present three general algorithms for efficient human meta-data collection, which support different constraints on how the central authority collects contributions, and three methods to intelligently pair annotators with tasks based on formal information-theoretic principles. We test our methods' performance on challenging synthetic data-sets, based on real data, and show that our algorithms can significantly lower the cost and improve the accuracy of human meta-data labelling, with little or no impact on time.
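
The idea of combining overlapping noisy labels by probabilistic inference, and then asking for the next label where it is most informative, can be illustrated with a minimal Python sketch. This is not one of the paper's algorithms: the binary-label setting, the assumption that each annotator's accuracy is known and lies strictly between 0 and 1, and the names `posterior_positive`, `entropy` and `most_uncertain_task` are all hypothetical choices made for illustration.

```python
import math

def posterior_positive(labels, accuracies, prior=0.5):
    """Posterior probability that a binary task is truly positive.

    labels: 0/1 labels from different annotators (assumed independent).
    accuracies: assumed probability that each annotator reports the truth,
                strictly between 0 and 1.
    """
    like_pos = prior
    like_neg = 1.0 - prior
    for label, acc in zip(labels, accuracies):
        like_pos *= acc if label == 1 else 1.0 - acc
        like_neg *= acc if label == 0 else 1.0 - acc
    return like_pos / (like_pos + like_neg)

def entropy(p):
    """Binary entropy in bits; defined as 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def most_uncertain_task(task_labels, task_accuracies):
    """Index of the task whose current posterior has the highest entropy."""
    scores = [
        entropy(posterior_positive(labels, accs))
        for labels, accs in zip(task_labels, task_accuracies)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions, a task with two agreeing labels has a sharper posterior than a task with two conflicting labels, so the conflicting task would be selected for the next annotation request.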

GRIDCC: real-time workflow system

The Grid is a concept which allows the sharing of resources between distributed communities, allowing each to progress towards potentially different goals. As adoption of the Grid increases, so do the activities that people wish to conduct through it. The GRIDCC project is a European Union-funded project addressing the issues of integrating instruments into the Grid. This increases the need for workflows, and for Quality of Service (QoS) guarantees on these workflows, as many of these instruments have real-time requirements. In this paper we present the workflow management service within the GRIDCC project, which is tasked with optimising the workflows and ensuring that they meet the pre-defined QoS requirements specified for them.

GRIDCC: A Real-time Grid workflow system with QoS

Scientific Programming, 2007

A. Stephen McGough (a), Asif Akram (a), Li Guo (a), Marko Krznaric (a), Luke Dickens (a), David Colling (b), Janusz Martyniak (b), Roger Powell (c), Paul Kyberd (c), Chenxi Huang (c), Constantinos Kotsokalis (d) and Panayiotis Tsanakas (d). (a) London e-Science Centre, Imperial College London, ...

Modelling MAS with Finite Analytic Stochastic Processes

The Multi-Agent paradigm is becoming increasingly popular as a way of capturing complex control processes with stochastic properties. Many existing modelling tools are not flexible enough for these purposes, possibly because many of the modelling frameworks available inherit their structure from single-agent frameworks. This paper proposes a new family of modelling frameworks called FASP, which is based on state encapsulation and is powerful enough to capture multi-agent domains. It identifies how the FASP is more flexible, and describes systems more naturally, than other approaches, demonstrating this with a number of robot football (soccer) formulations. This is important because more natural descriptions give more control when designing the tasks against which a group of agents' collective behaviour is evaluated and regulated.

Transparent Modelling of Finite Stochastic Processes for Multiple Agents

Stochastic Processes are ubiquitous, from automated engineering, through financial markets, to space exploration. These systems are typically highly dynamic, unpredictable and resistant to analytic methods, coupled with a need to orchestrate long control sequences which are both highly complex and uncertain. This report examines some existing single- and multi-agent modelling frameworks, details their strengths and weaknesses, and uses the experience to identify some fundamental tenets of good practice in modelling stochastic processes. It goes on to develop a new family of frameworks based on these tenets, which can model single- and multi-agent domains with equal clarity and flexibility, while remaining close enough to the existing frameworks that existing analytic and learning tools can be applied with little or no adaptation. Some simple and larger examples illustrate the similarities and differences of this approach, and a discussion of the challenges inherent in developing more flexible tools to exploit these new frameworks concludes matters. $\Pr(m_1(x) = y,\ m_2(y) = z \mid x) = m_1(y \mid x)\, m_2(z \mid y)$

Risk-based Security Decisions under Uncertainty

This paper addresses the making of security decisions, such as access-control decisions or spam filtering decisions, under uncertainty, when the benefit of doing so outweighs the need to absolutely guarantee these decisions are correct, for instance when there are limited, costly, or failed communication channels to a policy-decision-point (PDP). Previously, local caching of decisions has been proposed, but when a correct decision is not available, either a policy-decision-point must be contacted, or a default decision used. We improve upon this model by using learned classifiers of access control decisions. These classifiers, trained on known decisions, infer decisions when an exact match has not been cached, and use intuitive notions of utility, damage and uncertainty to determine when an inferred decision is preferred over contacting a remote PDP. Clearly there is uncertainty in the predicted decisions, introducing a degree of risk. Our solution proposes a mechanism to quantify the uncertainty of these decisions and allows administrators to bound the overall risk posture of the system. The learning component continuously refines its models based on inputs from a central policy server in cases where the risk is too high or there is too much uncertainty. We have validated our models by building a prototype system and evaluating it with requests from real access control policies. Our experiments show that over a range of system parameters, it is feasible to use machine learning methods to infer access control policy decisions. Thus our system yields several benefits, including reduced calls to the PDP, reducing latency and communication costs; increased net utility; and increased system survivability.
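
The trade-off between acting on an inferred decision and contacting the remote PDP can be sketched as an expected-utility comparison. The abstract does not give the paper's actual formula, so everything below is an assumption for illustration: the function names `expected_local_utility` and `decide`, the single `p_correct` confidence value, and the simplification that a PDP answer is always correct but costs `contact_cost`.

```python
def expected_local_utility(p_correct, utility, damage):
    """Expected payoff of acting on the classifier's inferred decision.

    p_correct: assumed probability that the inferred decision is right.
    utility:   assumed gain when the decision is right.
    damage:    assumed loss when the decision is wrong.
    """
    return p_correct * utility - (1.0 - p_correct) * damage

def decide(p_correct, utility, damage, contact_cost):
    """Return "infer" or "contact_pdp", whichever has higher expected payoff.

    Assumes (hypothetically) that the PDP is always correct, so contacting
    it yields the full utility minus the cost of the call.
    """
    local = expected_local_utility(p_correct, utility, damage)
    remote = utility - contact_cost
    return "infer" if local >= remote else "contact_pdp"
```

With these illustrative numbers, `decide(0.99, 10, 50, 2)` favours the inferred decision, while lowering the confidence to 0.9 makes the call to the PDP worthwhile because the expected damage dominates.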

Mining roles with noisy data

There has been increasing interest in automatic techniques for generating roles for role based access control, a process known as role mining. Most role mining approaches assume the input data is clean, and attempt to optimize the RBAC state. We examine role ...

The Dynamics of Multi-Agent Reinforcement Learning

Infinite-horizon multi-agent control processes with nondeterminism and partial state knowledge have particularly interesting properties with respect to adaptive control, such as the non-existence of Nash Equilibria (NE) or non-strict NE which are nonetheless points of convergence. The identification of reinforcement learning (RL) algorithms that are robust, accurate and efficient when applied to these general multi-agent domains is an open, challenging problem. This paper uses learning pressure fields as a means for evaluating RL algorithms in the context of multi-agent processes. Specifically, we show how to model partially observable infinite-horizon stochastic processes (single-agent) and games (multi-agent) within the Finite Analytic Stochastic Process framework. Taking long-term average expected returns as utility measures, we show the existence of learning pressure fields: vector fields, similar to the dynamics of evolutionary game theory, which indicate medium- and long-term learning behaviours of agents independently seeking to maximise this utility. We show empirically that these learning pressure fields are followed closely by policy-gradient RL algorithms.

Learning Stochastic Models of Information Flow

Proceedings of IEEE 28th International Conference on Data Engineering, Apr 2012

An understanding of information flow has many applications, including maximizing marketing impact on social media, limiting malware propagation, and managing undesired disclosure of sensitive information. This paper presents scalable methods both for learning models of information flow in networks from data, based on the Independent Cascade Model, and for predicting probabilities of unseen flow from these models. Our approach is based on a principled probabilistic construction, and results compare favourably with existing methods in terms of accuracy of prediction and scalable evaluation, with the addition that we are able to evaluate a broader range of queries than previously shown, including probability of joint and/or conditional flow, as well as reflecting model uncertainty. Exact evaluation of flow probabilities is exponential in the number of edges and naive sampling can also be expensive, so we propose sampling in an efficient Markov-Chain Monte-Carlo fashion using the Metropolis-Hastings algorithm (details are described in the paper). We identify two types of data: those where the paths of past flows are known (attributed data), and those where only the endpoints are known (unattributed data). Both data types are addressed in this paper, including training methods, example real-world data sets, and experimental evaluation. In particular, we investigate flow data from the Twitter micro-blogging service, exploring the flow of messages through retweets (tweet forwards) for the attributed case, and the propagation of hashtags (metadata tags) and URLs for the unattributed case.
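
To make the flow-probability query concrete, the following Python sketch estimates the probability that information reaches a target node under the Independent Cascade Model by naive Monte Carlo sampling. This is the simple baseline the abstract describes as potentially expensive, not the Metropolis-Hastings scheme the paper proposes. The graph representation and the names `sample_flow` and `estimate_flow_probability` are assumptions made for illustration.

```python
import random

def sample_flow(edges, source, target, rng):
    """Draw one cascade and report whether it reaches target.

    edges: dict mapping (u, v) to the probability that u passes
           information to v. Each edge is tried at most once, when its
           tail node u first becomes active.
    """
    active = {source}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for (u, v), prob in edges.items():
            if u == node and v not in active and rng.random() < prob:
                active.add(v)
                frontier.append(v)
    return target in active

def estimate_flow_probability(edges, source, target, n_samples=10000, seed=0):
    """Fraction of sampled cascades from source that reach target."""
    rng = random.Random(seed)
    hits = sum(sample_flow(edges, source, target, rng) for _ in range(n_samples))
    return hits / n_samples
```

For a three-node graph with edges a→b, b→c and a→c, each with probability 0.5, the exact probability of flow from a to c is 1 − (1 − 0.5)(1 − 0.25) = 0.625, and the estimate should approach that value as the number of samples grows.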
