Fabio Cuzzolin | Oxford Brookes University

Videos by Fabio Cuzzolin

Invited talk @ DeepView: "Global Multi-Target Visual Surveillance Based on Real-Time Large-Scale Analysis", AVSS 2021, Nov 16 2021. https://sites.google.com/view/deepview2021/

Autonomous vehicles (AVs) employ a variety of sensors to identify roadside infrastructure and other road users, with much of the existing work focusing on scene understanding and robust object detection. Human drivers, however, approach the driving task in a more holistic fashion which entails, in particular, recognising and understanding the evolution of road events. Testing an AV’s capability to recognise the actions undertaken by other road agents is thus crucial to improve their situational awareness and facilitate decision making.
In this talk we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. ROAD is explicitly designed to test the ability of an autonomous vehicle to detect road events.

Invited seminar, Department of Statistics, Harvard University, 2016

The theory of belief functions, sometimes referred to as evidence theory or Dempster-Shafer theory, was first introduced by Arthur P. Dempster in the context of statistical inference, to be later developed by Glenn Shafer as a general framework for modelling epistemic uncertainty. The methodology is now well established as a general framework for reasoning with uncertainty, with well-understood connections to related frameworks such as probability, possibility, random set and imprecise probability theories.

This talk aims at bridging the gap between researchers in the field and the wider AI and Uncertainty Theory community, with the longer term goal of a more fruitful collaboration and dissemination of ideas.

Papers by Fabio Cuzzolin

Neuroscience for AI: The importance of Theory of Mind

Developments in Neuroethics and Bioethics, 2024

Understanding Theory of Mind is challenging as it can be viewed as a complex holistic process that can be decomposed into a number of hot and cold cognitive processes. Cold cognitive processes are non-emotional, whereas hot cognition is both social and emotional. Cold cognition includes working memory, cognitive flexibility and 'if-then' inferential logic and planning, processes which are used in non-social contexts but which are often components of Theory of Mind tests. In social situations, we use our social cognition to process, remember and use information to explain and predict other people's behaviour, as well as our own. Therefore, strategic behaviour for goal achievement involving other people often relies on an interaction between hot and cold cognition. Similarly, for goal achievement in artificial intelligence (AI), for example robust performance in autonomous cars or therapeutic interactions with humans, it is important to have not only the cold cognitive processes, which are well established in AI, but also the hot cognitive processes that require further development. This chapter addresses hot cognitive processes, their underlying neural networks and how this information might be integrated into AI models to more successfully mimic the human brain and enhance AI-human interactions. Finally, the importance of an integrated and interdisciplinary approach to AI models, and the ethical issues increasingly arising in AI, are discussed.

Feature boosting with efficient attention for scene parsing

Neurocomputing, 2024

The complexity of scene parsing grows with the number of object and scene classes, which is higher in unrestricted open scenes. The biggest challenge is to model the spatial relation between scene elements while succeeding in identifying objects at smaller scales. This paper presents a novel feature-boosting network that gathers spatial context from multiple levels of feature extraction and computes the attention weights for each level of representation to generate the final class labels. A novel 'channel attention module' is designed to compute the attention weights, ensuring that features from the relevant extraction stages are boosted while the others are attenuated. The model also learns spatial context information at low resolution to preserve the abstract spatial relationships among scene elements and reduce computation cost. Spatial attention is subsequently concatenated into a final feature set before applying feature boosting. Low-resolution spatial attention features are trained using an auxiliary task that helps the network learn a coarse global scene structure. The proposed model outperforms all state-of-the-art models on both the ADE20K and Cityscapes datasets.
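
As a rough illustration of the boosting mechanism, here is a minimal sketch of a squeeze-style channel attention block; the layer sizes, reduction ratio and fusion step are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of a channel attention block (assumed structure; the
# published module may differ in its details).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze spatial dimensions
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # boost relevant channels, attenuate others

# Feature boosting across extraction stages: weight each level's features
# before fusing them into the final representation.
features = [torch.randn(2, 64, 32, 32) for _ in range(3)]
attn = ChannelAttention(64)
boosted = torch.cat([attn(f) for f in features], dim=1)
```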

Uncertainty measures: A critical survey

Information Fusion, 2024

Classical probability is not the only mathematical theory of uncertainty, nor the most general. Many authors have argued that probability theory is ill-equipped to model the 'epistemic', reducible uncertainty about the process generating the data. To address this, many alternative theories of uncertainty have been formulated. In this paper, we highlight how uncertainty theories can be seen as forming clusters characterised by a shared rationale, are connected to each other in an intricate but interesting way, and can be ranked according to their degree of generality. Our objective is to propose a structured, critical summary of the research landscape in uncertainty theory, and to discuss its potential for wider adoption in artificial intelligence.
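
As a concrete anchor for the "degree of generality" ordering, recall how a credal set induces lower and upper probabilities (standard definitions from imprecise probability, not specific to this survey):

```latex
% A credal set K is a closed convex set of probability measures on a frame
% Theta. It induces lower and upper probabilities for every event A:
\underline{P}(A) = \inf_{P \in \mathcal{K}} P(A), \qquad
\overline{P}(A)  = \sup_{P \in \mathcal{K}} P(A).
% Classical probability is recovered when K = \{P\} is a singleton, which is
% one sense in which these theories strictly generalise probability theory.
```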

Credal Learning Theory

Conference on Neural Information Processing Systems (NeurIPS 2024), 2024

Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learned from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a 'credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypothesis spaces (both with and without the realizability assumption), as well as for infinite model spaces, directly generalizing classical results.
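
For orientation, the classical finite-hypothesis, realizable bound that such results generalize is shown below; the displayed "credal" form is only an illustrative reading of what a bound uniform over a credal set must control, not the paper's actual statement.

```latex
% Classical PAC bound (finite hypothesis class H, realizable case): with
% probability at least 1 - \delta over an i.i.d. sample of size n, every
% consistent hypothesis \hat{h} satisfies
R(\hat{h}) \;\le\; \frac{\ln|\mathcal{H}| + \ln(1/\delta)}{n}.
% A credal analogue must instead control the worst-case risk over the
% credal set K inferred from a finite sample of training sets:
\sup_{P \in \mathcal{K}} R_P(\hat{h}) \;\le\; \epsilon(n, \delta, \mathcal{K}),
% for some bound \epsilon depending on n, \delta and the credal set K.
```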

Credal Deep Ensembles for Uncertainty Quantification

Conference on Neural Information Processing Systems (NeurIPS 2024), 2024

This paper introduces an innovative approach to classification called Credal Deep Ensembles (CreDEs), namely, ensembles of novel Credal-Set Neural Networks (CreNets). CreNets are trained to predict a lower and an upper probability bound for each class, which, in turn, determine a convex set of probabilities (credal set) on the class set. The training employs a loss inspired by distributionally robust optimization which simulates the potential divergence of the test distribution from the training distribution, in such a way that the width of the predicted probability interval reflects the 'epistemic' uncertainty about the future data distribution. Ensembles can be constructed by training multiple CreNets, each associated with a different random seed, and averaging the predicted intervals. Extensive experiments are conducted on various out-of-distribution (OOD) detection benchmarks (CIFAR10/100 vs SVHN/Tiny-ImageNet, CIFAR10 vs CIFAR10-C, ImageNet vs ImageNet-O) using different network architectures (ResNet50, VGG16 and ViT Base). Compared to Deep Ensemble baselines, CreDEs demonstrate higher test accuracy, lower expected calibration error and significantly improved epistemic uncertainty estimation.
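
A minimal sketch of the interval-prediction-and-averaging idea follows; the two-head design, sigmoid outputs and interval-width uncertainty proxy are assumptions for illustration, and the published CreNet architecture and DRO loss differ in their details.

```python
# Sketch of CreDE-style interval prediction and ensemble averaging
# (illustrative assumptions, not the published architecture).
import torch
import torch.nn as nn

class CreNetHead(nn.Module):
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.lower = nn.Linear(in_dim, num_classes)   # lower-bound logits
        self.upper = nn.Linear(in_dim, num_classes)   # upper-bound logits

    def forward(self, z):
        lo, hi = torch.sigmoid(self.lower(z)), torch.sigmoid(self.upper(z))
        return torch.minimum(lo, hi), torch.maximum(lo, hi)  # keep lo <= hi

def ensemble_intervals(heads, z):
    """Average the per-class probability intervals over ensemble members."""
    los, his = zip(*(h(z) for h in heads))
    return torch.stack(los).mean(0), torch.stack(his).mean(0)

heads = [CreNetHead(128, 10) for _ in range(5)]   # different random seeds in practice
lo, hi = ensemble_intervals(heads, torch.randn(4, 128))
epistemic = (hi - lo).sum(-1)                     # interval width as an uncertainty proxy
```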

Random-Set Convolutional Neural Network (RS-CNN) for Epistemic Deep Learning

arXiv:2307.05772, 2023

Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a model does not know'. In this paper, we propose a novel Random-Set Convolutional Neural Network (RS-CNN) for classification which predicts belief functions rather than probability vectors over the set of classes, using the mathematics of random sets, i.e., distributions over the power set of the sample space. Based on the epistemic deep learning approach, random-set models are capable of representing the 'epistemic' uncertainty induced in machine learning by limited training sets. We estimate epistemic uncertainty by approximating the size of credal sets associated with the predicted belief functions, and experimentally demonstrate how our approach outperforms competing uncertainty-aware approaches in a classical evaluation setting. The performance of RS-CNN is best demonstrated on OOD samples where it manages to capture the true prediction while standard CNNs fail.
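
To make "predicting belief functions" concrete, here is a sketch of the standard random-set computations going from a mass assignment over subsets of classes to belief and plausibility values; which focal subsets an RS-CNN actually predicts masses for is a design choice of the paper, and the toy focal sets below are assumptions.

```python
# From a mass assignment over subsets of classes to belief/plausibility
# (standard Dempster-Shafer computations; the focal sets are illustrative).
classes = ("cat", "dog", "bird")

# Masses over a budget of focal subsets, summing to 1.
mass = {
    frozenset({"cat"}): 0.5,
    frozenset({"dog"}): 0.2,
    frozenset({"cat", "dog"}): 0.2,   # mass on a set = unresolved evidence
    frozenset(classes): 0.1,          # mass on the whole frame = ignorance
}

def belief(A: frozenset) -> float:
    """Bel(A): total mass committed to subsets of A."""
    return sum(m for B, m in mass.items() if B <= A)

def plausibility(A: frozenset) -> float:
    """Pl(A): total mass not contradicting A."""
    return sum(m for B, m in mass.items() if B & A)

A = frozenset({"cat"})
print(belief(A), plausibility(A))     # 0.5 0.8: [Bel, Pl] brackets a credal set,
                                      # whose size proxies epistemic uncertainty
```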

CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification Tasks

arXiv:2401.05043, 2024

Uncertainty estimation is increasingly attractive for improving the reliability of neural networks. In this work, we present novel credal-set interval neural networks (CreINNs) designed for classification tasks. CreINNs preserve the traditional interval neural network structure, capturing weight uncertainty through deterministic intervals, while forecasting credal sets using the mathematical framework of probability intervals. Experimental validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN) show that CreINNs outperform variational Bayesian neural networks (BNNs) and deep ensembles (DEs) at epistemic uncertainty estimation. Furthermore, CreINNs exhibit a notable reduction in computational complexity compared to variational BNNs, and demonstrate smaller model sizes than DEs.
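
The interval arithmetic at the core of any interval neural network layer can be sketched as below; this is the standard midpoint/radius propagation of interval weights through an affine map for a point input, and the CreINN-specific layers and activations differ.

```python
# Standard interval arithmetic for an affine layer with interval weights
# (a generic interval-NN building block, not the exact CreINN layer).
import numpy as np

def interval_affine(x, W_lo, W_hi, b_lo, b_hi):
    """Exact bounds on y = W x + b when W in [W_lo, W_hi], b in [b_lo, b_hi]."""
    W_mid, W_rad = (W_lo + W_hi) / 2, (W_hi - W_lo) / 2
    y_mid = W_mid @ x
    y_rad = W_rad @ np.abs(x)     # worst case aligns the W deviation's sign with x
    return y_mid - y_rad + b_lo, y_mid + y_rad + b_hi

x = np.array([1.0, -2.0])
W_lo, W_hi = np.full((3, 2), -0.1), np.full((3, 2), 0.1)
b_lo = b_hi = np.zeros(3)
lo, hi = interval_affine(x, W_lo, W_hi, b_lo, b_hi)  # per-unit pre-activation intervals
```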

Reasoning with random sets: An agenda for the future

arXiv:2401.09435, 2024

In this paper, we discuss a potential agenda for future work in the theory of random sets and belief functions, touching upon a number of focal issues: the development of a fully-fledged theory of statistical reasoning with random sets, including the generalisation of logistic regression and of the classical laws of probability; the further development of the geometric approach to uncertainty, to include general random sets, a wider range of uncertainty measures and alternative geometric representations; the application of this new theory to high-impact areas such as climate change, machine learning and statistical learning theory.

Credal Learning Theory

arXiv:2402.00957, 2024

Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learnt from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a 'credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypothesis spaces (both with and without the realizability assumption), as well as for infinite model spaces, directly generalizing classical results.

Semantics-Driven Generative Replay for Few-Shot Class Incremental Learning

MM '22: Proceedings of the 30th ACM International Conference on Multimedia, 2022

We deal with the problem of few-shot class incremental learning (FSCIL), which requires a model to continuously recognize new categories for which limited training data are available. Existing FSCIL methods depend on prior knowledge to regularize the model parameters for combating catastrophic forgetting. Devising an effective prior in a low-data regime, however, is not trivial. The memory-replay based approaches from the fully-supervised class incremental learning (CIL) literature cannot be used directly for FSCIL, as the generative memory-replay modules of CIL are hard to train from few training samples. However, generative replay can tackle both the stability and the plasticity of the models simultaneously by generating a large number of class-conditional samples. Convinced by this fact, we propose a generative modeling-based FSCIL framework using the paradigm of memory replay, in which a novel conditional few-shot generative adversarial network (GAN) is incrementally trained to produce visual features while ensuring the stability-plasticity trade-off through novel loss functions and effectively combating the mode-collapse problem. Furthermore, the class-specific synthesized visual features from the few-shot GAN are constrained to match the respective latent semantic prototypes obtained from a well-defined semantic space. The advantages of this semantic restriction are two-fold: it helps in dealing with forgetting while making the features class-discernible. The model requires a single per-class prototype vector to be maintained in a dynamic memory buffer. Experimental results on the benchmark, large-scale CIFAR-100, CUB-200 and Mini-ImageNet datasets confirm the superiority of our model over the current FSCIL state of the art.
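
The semantic constraint can be sketched as a simple prototype-matching loss; the projection layer, dimensions and MSE choice below are illustrative assumptions, not the paper's exact loss.

```python
# Sketch: pull generated visual features for class c toward that class's
# stored semantic prototype (one vector per class, kept in a memory buffer).
import torch
import torch.nn.functional as F

prototypes = torch.randn(100, 300)                 # one semantic prototype per class

def semantic_matching_loss(gen_feats, class_ids, proj):
    """gen_feats: (B, D) GAN outputs; proj maps them into the semantic space."""
    target = prototypes[class_ids]                 # (B, 300) latent semantic prototypes
    return F.mse_loss(proj(gen_feats), target)     # constrain replay features per class

proj = torch.nn.Linear(512, 300)
loss = semantic_matching_loss(torch.randn(8, 512), torch.randint(0, 100, (8,)), proj)
```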

Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction

2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023

The emerging field of action prediction, the task of forecasting actions in a video sequence, plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to the high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches require large amounts of labelled data, which is expensive and time-consuming to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels). The approach, named Temporal-DINO, employs two models: a 'student' processing past frames and a 'teacher' processing both past and future frames, enabling a broader temporal context. During training, the teacher guides the student to learn future context from past frames alone. The strategy is evaluated on the ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer and LSTM architectures. The experimental results showcase significant improvements in prediction performance across these architectures, with our method achieving an average enhancement of 9.9% Precision Points (PP), which highlights its effectiveness in enhancing the backbones' capabilities of capturing long-term dependencies. Furthermore, our approach demonstrates efficiency in terms of the pretraining dataset size and the number of epochs required. The method overcomes limitations present in other approaches, including the consideration of various backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and streamlining the pretraining process into a single stage. These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning and scene understanding. Code can be found at https://github.com/IzzeddinTeeti/ssl_pred.
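
A minimal sketch of the student/teacher objective follows; the toy encoders, clip shapes and temperature are assumptions for illustration (the paper's backbones are 3D-ResNet, Transformer or LSTM), and the loss is the generic DINO-style distillation adapted to time.

```python
# Sketch of Temporal-DINO-style distillation: the student sees only past
# frames, the teacher sees past + future (toy encoders, illustrative shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Flatten(), nn.Linear(8 * 16, 32))    # past only
teacher = nn.Sequential(nn.Flatten(), nn.Linear(12 * 16, 32))   # past + future

def temporal_dino_loss(past, future, temp=0.1):
    s = student(past)
    with torch.no_grad():                       # gradients flow to the student only
        t = teacher(torch.cat([past, future], dim=1))   # frames stacked along time
    # Distillation: match the student's distribution to the teacher's, so the
    # student learns to encode future context it cannot observe.
    return -(F.softmax(t / temp, -1) * F.log_softmax(s / temp, -1)).sum(-1).mean()

past, future = torch.randn(4, 8, 16), torch.randn(4, 4, 16)     # (B, T, feat)
loss = temporal_dino_loss(past, future)
loss.backward()     # updates the student; in DINO the teacher tracks it via an EMA
```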

A Hybrid Graph Network for Complex Activity Detection in Video

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

Interpretation and understanding of video present a challenging computer vision task in numerous fields, e.g. autonomous driving and sports analytics. Existing approaches to interpreting the actions taking place within a video clip are based upon Temporal Action Localisation (TAL), which typically identifies short-term actions. The emerging field of Complex Activity Detection (CompAD) extends this analysis to long-term activities, with a deeper understanding obtained by modelling the internal structure of a complex activity taking place within the video. We address the CompAD problem using a hybrid graph neural network which combines attention applied to a graph encoding the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Our approach is as follows: i) first, we propose a novel feature extraction technique which, for each video snippet, generates spatiotemporal 'tubes' for the active elements ('agents') in the (local) scene by detecting individual objects, tracking them and then extracting 3D features from all the agent tubes as well as the overall scene; ii) next, we construct a local scene graph in which each node (representing either an agent tube or the scene) is connected to all other nodes, and apply attention to this graph to obtain an overall representation of the local dynamic scene; iii) finally, all local scene graph representations are interconnected via a temporal graph, to estimate the complex activity class together with its start and end time. The proposed framework outperforms all previous state-of-the-art methods on all three benchmark datasets: ActivityNet-1.3, Thumos-14 and ROAD.
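
Steps ii) and iii) can be sketched compactly as below; the attention layer, mean pooling, dimensions and (especially) the GRU standing in for the temporal graph are illustrative assumptions, not the paper's modules.

```python
# Compact sketch of steps (ii)-(iii): attention over a fully connected local
# scene graph, then temporal linking of snippet summaries (a GRU is used
# here as a simple stand-in for the paper's temporal graph).
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
gru = nn.GRU(input_size=256, hidden_size=256, batch_first=True)
classifier = nn.Linear(256, 4)                         # activity class scores

def local_scene_repr(node_feats: torch.Tensor) -> torch.Tensor:
    """node_feats: (1, N, 256) = scene node + agent-tube nodes, fully connected."""
    out, _ = attn(node_feats, node_feats, node_feats)  # every node attends to all others
    return out.mean(dim=1)                             # (1, 256) snippet summary

snippets = [torch.randn(1, 5, 256) for _ in range(8)]  # 8 snippets, 5 nodes each
seq = torch.stack([local_scene_repr(s) for s in snippets], dim=1)  # (1, 8, 256)
temporal_out, _ = gru(seq)                             # link summaries across time
logits = classifier(temporal_out)                      # per-snippet class scores,
                                                       # from which start/end are read off
```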

DeepSmoke: Deep learning model for smoke detection and segmentation in outdoor environments

Expert Systems with Applications, 2021

Fire disasters throughout the globe cause social, environmental and economic damage, making early detection and instant reporting essential for saving human lives and property. Smoke detection plays a key role in early fire detection, but the majority of existing methods are limited to either indoor or outdoor surveillance environments, with poor performance in hazy scenarios. In this paper, we present a Convolutional Neural Network (CNN)-based smoke detection and segmentation framework for both clear and hazy environments. Unlike existing methods, we employ an efficient CNN architecture, termed EfficientNet, for smoke detection with better accuracy. We also segment the smoke regions using DeepLabv3+, which is supported by effective encoders and decoders along with a pixel-wise classifier for optimum localization. Our smoke detection results evince a noticeable gain of up to 3% in accuracy and a decrease of 0.46% in False Alarm Rate (FAR), while segmentation reports significant increases of 2% and 1% in global accuracy and mean Intersection over Union (IoU) scores, respectively. This makes our method well suited for smoke detection and segmentation in real-world surveillance settings.
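
The two-stage pipeline can be sketched with off-the-shelf torchvision models; note that torchvision ships DeepLabv3 (not the paper's DeepLabv3+), used below as a close stand-in, and the two-class heads and thresholding are assumptions.

```python
# Sketch of the detect-then-segment pipeline (stand-in models, untrained).
import torch
from torchvision import models

detector = models.efficientnet_b0(weights=None, num_classes=2)        # smoke / no-smoke
segmenter = models.segmentation.deeplabv3_resnet50(weights=None, num_classes=2)

frame = torch.randn(1, 3, 224, 224)                                   # one video frame
detector.eval(); segmenter.eval()
with torch.no_grad():
    if detector(frame).argmax(1).item() == 1:                         # smoke detected
        mask = segmenter(frame)["out"].argmax(1)                      # per-pixel smoke mask
```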

Theory of Mind and Preference Learning at the Interface of Cognitive Science, Neuroscience, and AI: A Review

Frontiers in Artificial Intelligence, 2022

Theory of Mind (ToM), the ability of the human mind to attribute mental states to others, is a key component of human cognition. This form of social cognition is essential for understanding other people's mental states or viewpoints and for having successful interactions with others within social and occupational environments. The same capability of inferring human mental states is a prerequisite for artificial intelligence (AI) to be integrated into society, for example in healthcare and the motoring industry. Autonomous cars will need to be able to infer the mental states of human drivers and pedestrians to predict their behavior. In the literature, there has been an increasing understanding of ToM, driven in particular by cognitive science studies in children and in individuals with Autism Spectrum Disorder. Similarly, neuroimaging studies have led to a better understanding of the neural mechanisms that underlie ToM. In addition, new AI algorithms for inferring human mental states have been proposed, with more complex applications and better generalisability. In this review, we synthesize the existing understanding of ToM in cognitive and neurosciences and the AI computational models that have been proposed. We focus on preference learning as an area of particular interest, and on the most recent neurocognitive and computational ToM models. We also discuss the limitations of existing models and hint at potential approaches to allow ToM models to fully express the complexity of the human mind in all its aspects, including values and preferences.

Editorial: Theory of Mind in Humans and in Machines

Frontiers in Artificial Intelligence, 2022

Predictions about the future are inherently uncertain, so it is hard to make very confident statements about what AI systems will ultimately be capable of, or what kind of AI approaches might help bridge the gap to human capabilities. Despite this, we think it is likely that a better integration of theory of mind findings from cognitive neuroscience and AI will be useful, even if it were only to help improve human-AI interaction. If we want AI systems to be capable of inferring, and motivated to respect, human preferences, how humans do theory of mind seems an obvious place to look for ideas. We think the stakes are likely to only get higher.

An intelligent system for complex violence pattern analysis and detection

International Journal of Intelligent Systems, 2021

Video surveillance has shown encouraging outcomes in monitoring human activities and preventing crimes in real time. To this extent, violence detection (VD) has received substantial attention from the research community due to its vast applications, such as ensuring security over public areas and industrial settings through smart machine intelligence. However, because of changing illumination, complex backgrounds and low resolution, the analysis of violence patterns remains challenging in the industrial video surveillance domain. In this paper, we propose a computationally intelligent VD approach to precisely detect violent scenes through deep analysis of the sequential patterns in surveillance video. First, the video stream acquired through the vision sensor is processed by a lightweight convolutional neural network (CNN) for the segmentation of important shots. Next, temporal optical flow features are extracted from the informative shots via a residual optical flow CNN. These are concatenated with appearance-invariant features extracted from a Darknet CNN model. Finally, a multilayer long short-term memory network is plugged in to generate the final feature map for learning the violence patterns in a sequence of frames. In addition, we contribute to the existing surveillance VD data set by considering its indoor and outdoor scenarios separately for the proposed method's evaluation, achieving a 2% increase in accuracy over the surveillance fight data set. Experiments also show encouraging results over the state of the art on other challenging benchmark data sets.
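
The fusion-plus-LSTM stage can be sketched as follows; the feature dimensions, layer count and last-step classification are assumptions for illustration, with the upstream optical-flow and Darknet feature extractors taken as given.

```python
# Sketch of the final stage: concatenated motion + appearance features per
# frame, classified over the sequence by a multilayer LSTM (assumed sizes).
import torch
import torch.nn as nn

class ViolenceLSTM(nn.Module):
    def __init__(self, flow_dim=512, app_dim=512, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(flow_dim + app_dim, hidden,
                            num_layers=layers, batch_first=True)   # multilayer LSTM
        self.head = nn.Linear(hidden, 2)                           # violent / non-violent

    def forward(self, flow_feats, app_feats):
        x = torch.cat([flow_feats, app_feats], dim=-1)  # fuse motion + appearance
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                    # classify from the last time step

model = ViolenceLSTM()
logits = model(torch.randn(1, 16, 512), torch.randn(1, 16, 512))   # one 16-frame shot
```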

ROAD-R: the autonomous driving dataset with logical requirements

Machine Learning, 2023

Neural networks have proven to be very powerful at computer vision tasks. However, they often exhibit unexpected behaviors, acting against background knowledge about the problem at hand. This calls for models (i) able to learn from requirements expressing such background knowledge, and (ii) guaranteed to be compliant with the requirements themselves. Unfortunately, the development of such models is hampered by the lack of real-world datasets equipped with formally specified requirements. In this paper, we introduce the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first publicly available dataset for autonomous driving with requirements expressed as logical constraints. Given ROAD-R, we show that current state-of-the-art models often violate its logical constraints, and that it is possible to exploit them to create models that (i) have a better performance, and (ii) are guaranteed to be compliant with the requirements themselves.
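
To illustrate what a logical requirement over multi-label predictions looks like, here is a toy sketch; the rule below is an invented example in the spirit of ROAD-R, not one of its actual constraints, and the "repair" is deliberately naive.

```python
# Toy sketch of checking a propositional requirement on multi-label output.
def violates(labels: set) -> bool:
    """Invented example rule: an agent cannot be both 'moving' and 'stopped'."""
    return {"moving", "stopped"} <= labels

preds = {"pedestrian", "moving", "stopped"}    # raw multi-label network output
if violates(preds):
    preds.discard("stopped")                   # naive post-hoc repair; the paper instead
                                               # builds compliance guarantees into the model
```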

Hard law and soft law regulations of artificial intelligence in investment management

Cambridge Yearbook of European Legal Studies, 2022

Artificial Intelligence ('AI') technologies present great opportunities for the investment management industry (as well as broader financial services). However, there are presently no regulations specifically aimed at AI in investment management. Does this mean that AI is currently unregulated? If not, which hard and soft law rules apply?

Investment management is a heavily regulated industry (MiFID II, UCITS IV and V, SM&CR, GDPR, etc.). Most regulations are intentionally technology-neutral, and they are legally binding (hard law). Recent years have seen the emergence of regulatory and industry publications (soft law) focusing specifically on AI. In this Article we analyse both hard law and soft law instruments.

The contributions of this work are: first, a review of key regulations applicable to AI in investment management (and oftentimes by extension to banking as well) from multiple jurisdictions; second, a framework and an analysis of key regulatory themes for AI.

Epistemic Deep Learning

arXiv, Jun 15, 2022

IJCAI-16 Tutorial - Belief functions (Random sets) for the working scientist

Belief functions: A gentle introduction

The theory of belief functions, sometimes referred to as evidence theory or Dempster-Shafer theory, was first introduced by Arthur P. Dempster in the context of statistical inference, to be later developed by Glenn Shafer as a general framework for modelling epistemic uncertainty. Belief theory and the closely related random set theory form natural frameworks for modelling situations in which data are missing or scarce: think of extremely rare events such as volcanic eruptions or power plant meltdowns, problems subject to huge uncertainties due to the number and complexity of the factors involved (e.g. climate change), but also the all-important issue of generalisation from small training sets in machine learning. This tutorial is designed to introduce the principles and rationale of random sets and belief function theory to mainstream statisticians, mathematicians and working scientists, to survey the key elements of the methodology and the most recent developments, and to make practitioners aware of the set of tools that have been developed for reasoning in the belief function framework on real-world problems. Attendees will acquire first-hand knowledge of how to apply these tools to significant problems in major application fields such as computer vision, climate change, and others. A research programme for the future of random set theory and high-impact applications is eventually outlined.
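
For readers new to the formalism, the two basic quantities of the theory are standard and worth stating:

```latex
% A basic probability assignment (mass function) m on a frame Theta assigns
% mass to subsets (focal elements), with \sum_{A \subseteq \Theta} m(A) = 1.
% Belief and plausibility of an event A are then:
Bel(A) = \sum_{B \subseteq A} m(B), \qquad
Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - Bel(\bar{A}).
% Since Bel(A) <= Pl(A), the interval [Bel(A), Pl(A)] separates evidence
% committed to A from evidence merely compatible with it.
```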

Research paper thumbnail of Towards machines that can read your mind

Artificial intelligence is becoming part of our lives. Smart cars will engage our roads in less than ten years’ time; shops with no checkout, which automatically recognise customers and what they purchase, are already open for business. But to enable machines to deal with uncertainty, we must fundamentally change the way machines learn from the data they observe so that they will be able to cope with situations they have never encountered in the safest possible way. Interacting naturally with human beings and their complex environments will only be possible if machines are able to put themselves in people’s shoes: to guess their goals, beliefs and intentions – in other words, to read our minds.

Fabio will explain just how machines can be provided with this mind-reading ability.

Research paper thumbnail of AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture

ICCV 2017

Poster presented at ICCV 2017

Research paper thumbnail of The statistics of belief functions - Invited talk at the 4th BFAS Summer School on Belief Functions and their Applications

Although born within the remit of mathematical statistics, the theory of belief functions has since evolved towards subjective interpretations which have distanced it from its mother field and drawn it nearer to artificial intelligence.
The purpose of this talk, in its first part, is to understand belief theory in the context of mathematical probability and its main interpretations, Bayesian and frequentist statistics, contrasting these three methodologies according to their treatment of uncertain data.
In the second part we recall the existing statistical views of belief function theory, due to the work by Dempster, Almond, Hummel and Landy, Zhang and Liu, Walley and Fine, among others.
Finally, we outline a research programme for the development of a fully-fledged theory of statistical inference with random sets. In particular, we discuss the notion of generalised lower and upper likelihoods, the formulation of a framework for logistic regression with belief functions, the generalisation of the classical total probability theorem to belief functions, the formulation of parametric models based on random sets, and the development of a theory of random variables and processes in which the underlying probability space is replaced by a random set space.

Research paper thumbnail of ONLINE HUMAN ACTION LOCALISATION BASED ON APPEARANCE AND MOTION CUES

We investigate the problem of online action localisation in videos. Our model uses appearance and motion cues to generate region proposals from streaming video frames. Recently, deep feature representations have been shown to outperform handcrafted features in object classification. Driven by this progress, we model our system using deep CNN features. We propose an online incremental learning framework which initially learns from a burst of streaming video frames and iteratively updates the learner by solving a set of linear SVMs (1-vs-rest) using a batch stochastic gradient descent (SGD) algorithm with hard example mining.
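As a concrete illustration, the following Python sketch shows one way such an incremental 1-vs-rest learner could look. It is an assumption-laden toy, not the paper's implementation: hinge-loss SGD stands in for the linear SVMs, random arrays stand in for CNN features, and the hard-example pool cap is arbitrary.

import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.array([0, 1, 2])                   # illustrative action classes
clf = SGDClassifier(loss="hinge", alpha=1e-4)   # hinge loss ~ linear SVM, trained by SGD

hard_X, hard_y = [], []                         # pool of mined hard examples

def update(X_batch, y_batch, pool_cap=10):
    """One incremental update: fit on the batch plus previously mined hard examples."""
    if hard_X:
        X_batch = np.vstack([X_batch] + hard_X)
        y_batch = np.concatenate([y_batch] + hard_y)
    clf.partial_fit(X_batch, y_batch, classes=classes)
    # Hard example mining: keep the samples the updated model still misclassifies.
    wrong = clf.predict(X_batch) != y_batch
    if wrong.any() and len(hard_X) < pool_cap:
        hard_X.append(X_batch[wrong])
        hard_y.append(y_batch[wrong])

rng = np.random.default_rng(0)
for _ in range(10):                             # simulated stream of feature batches
    update(rng.normal(size=(32, 64)), rng.integers(0, 3, size=32))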

Research paper thumbnail of DEEP LEARNING FOR DETECTING MULTIPLE SPACE-TIME ACTION TUBES IN VIDEOS

In this work we propose a new approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, a cascade of deep region proposal and detection networks is employed to classify regions of each video frame potentially containing an action of interest. In stage 2, appearance and motion cues are combined by merging the detection boxes and softmax classification scores generated by the two cascades. In stage 3, sequences of detection boxes most likely to be associated with a single action instance, called action tubes, are constructed by solving two optimisation problems via dynamic programming.
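The tube-building idea in stage 3 can be pictured as a Viterbi-style pass over per-frame boxes. The Python fragment below is a minimal illustration under our own simplifying assumptions (a single action class, a hand-picked overlap weight lam), not the paper's actual optimisation.

import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_tube(boxes, scores, lam=1.0):
    """boxes[t] = list of boxes in frame t; scores[t] = their class scores.
    Returns the index of the chosen box in each frame (one tube)."""
    T = len(boxes)
    dp = [np.asarray(scores[0], dtype=float)]
    back = []
    for t in range(1, T):
        # trans[j, i] = score of reaching box j at frame t from box i at t-1
        trans = np.array([[dp[-1][i] + lam * iou(boxes[t - 1][i], b)
                           for i in range(len(boxes[t - 1]))] for b in boxes[t]])
        back.append(trans.argmax(axis=1))
        dp.append(scores[t] + trans.max(axis=1))
    path = [int(dp[-1].argmax())]
    for t in range(T - 2, -1, -1):            # backtrack through the frames
        path.append(int(back[t][path[-1]]))
    return path[::-1]

boxes = [[(0, 0, 10, 10), (50, 50, 60, 60)],
         [(1, 1, 11, 11), (49, 49, 59, 59)]]
scores = [[0.9, 0.2], [0.8, 0.3]]
print(link_tube(boxes, scores))   # -> [0, 0]: the high-scoring, overlapping boxes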

Research paper thumbnail of Belief functions (Random sets) for the working scientist - A IJCAI 2016 Tutorial

This half-day tutorial on Belief functions (random sets) for the working scientist was presented on July 9th 2016 at the International Joint Conference on Artificial Intelligence (IJCAI-16).

The tutorial is very comprehensive (468 slides), covering:

(i) a review of mathematical probability and its interpretations (Bayesian and frequentist);

(ii) the rationale for going beyond standard probability: it's all about the data!

(iii) the basic notions of the theory of belief functions;

(iv) reasoning with belief functions: inference, combination/conditioning, graphical models, decision making;

(v) using belief functions for classification, regression, estimation, etc;

(vi) dealing with computational issues and extending belief measures to real numbers;

(vii) the main frameworks derived from belief theory, and its relationship with other theories of uncertainty;

(viii) a number of example applications;

(ix) new horizons, ranging from the formulation of limit theorems for random sets, the generalisation of the notions of likelihood and logistic regression for rare-event estimation, and climate change modelling, to new foundations for machine learning based on random set theory and a geometry of uncertainty.

Tutorial slides are downloadable at http://cms.brookes.ac.uk/staff/FabioCuzzolin/files/IJCAI2016.pdf

Research paper thumbnail of THE GEOMETRY OF UNCERTAINTY

WHY A MATHEMATICS OF UNCERTAINTY?

- probabilities do not represent ignorance and lack of data well;
- evidence is normally limited, rather than infinite as assumed by (frequentist) probability;
- expert knowledge often needs to be combined with hard evidence;
- in extreme cases (rare events or far-future predictions) there is very little data;
- bottom line: not enough evidence to determine the actual probability describing the problem.

Research paper thumbnail of UAI 2015 Tutorial - Belief Functions for the Working Scientist

The theory of belief functions, sometimes referred to as evidence theory or Dempster-Shafer theory, was first introduced by Arthur P. Dempster in the context of statistical inference, to be later developed by Glenn Shafer as a general framework for modelling epistemic uncertainty. The methodology is now well established as a general framework for reasoning with uncertainty, with well-understood connections to related frameworks such as probability, possibility, random set and imprecise probability theories. Importantly, in recent years the number of papers published on the theory and application of belief functions has been booming (reaching over 800 in 2014 alone), displaying strong growth in particular in the East Asian community and among practitioners working on multi-criteria decision making, earth sciences, and sensor fusion. Belief functions are a natural tool to cope with heavy uncertainty, lack of evidence and missing data, and extremely rare events.

An early debate on the rationale of belief functions gave a strong contribution to the growth and success of the UAI community and its series of conferences in the Eighties and Nineties, thanks to the contribution of scientists of the caliber of Glenn Shafer, Judea Pearl, Philippe Smets and Prakash Shenoy, among others. Ever since, the UAI and BELIEF communities have somewhat diverged, and the proposers’ effort has recently been directed towards restoring a closer relationship and exchange of ideas between the two communities. This was one of the aims of the recent BELIEF 2014 International Conference, of which the proposers were General Chair and member of the Steering Committee, respectively. A number of books are being published on the subject as we speak, and the impact of the belief function approach to uncertainty is growing.

The tutorial aims at bridging the gap between researchers in the field and the wider AI and Uncertainty Theory community, with the longer term goal of a more fruitful collaboration and dissemination of ideas.

Research paper thumbnail of Continual Semi-Supervised Learning - First International Workshop, CSSL 2021, Virtual Event, August 19–20, 2021, Revised Selected Papers

Whereas continual learning has recently attracted much attention in the machine learning community, the focus has been mainly on preventing the model updated in the light of new data from ‘catastrophically forgetting’ its initial knowledge and abilities. This, however, is in stark contrast with common real-world situations in which an initial model is trained using limited data, only to be later deployed without any additional supervision. In these scenarios the goal is for the model to be incrementally updated using the new (unlabelled) data, in order to adapt to a target domain continually shifting over time. These situations can be modelled by an original continual semi-supervised learning (CSSL) paradigm. There, an initial training batch of data points annotated with ground truth (class labels for classification problems, or vectors of target values for regression ones) is available and can be used to train an initial model. Then, however, the model is incrementally updated by exploiting the information provided by a stream of unlabelled data points, each of which is generated by a data-generating process (modelled, as typically assumed, by a probability distribution) which varies with time. No artificial subdivision into ‘tasks’ is assumed, as the data-generating distribution may arbitrarily vary over time.

The aim of the First International Workshop on Continual Semi-Supervised Learning (CSSL @ IJCAI 2021) was to formalise this new learning paradigm and to introduce it to the wider machine learning community, in order to mobilise effort in this direction. As part of the workshop we also presented the first two benchmark datasets for this problem, derived from important computer vision scenarios, and proposed the first Continual Semi-Supervised Learning Challenges to the research community.

The workshop encouraged the submission of papers on continual learning in its broader sense, covering topics such as: the suitability of existing datasets for continual learning; new benchmark datasets explicitly designed for continual learning; protocols for training and testing in different continual learning settings; metrics for assessing continual learning methods; traditional task-based continual learning; the relation between continual learning and model adaptation; the distinction between the learning of new classes and the learning from new instances; real-world applications of continual learning; catastrophic forgetting and possible mitigation strategies; applications of transfer learning, multi-task and meta-learning to continual learning; continual supervised, semi-supervised and unsupervised learning; lifelong learning; few-shot learning; and continual reinforcement and inverse reinforcement learning. The aim was to foster the debate around all aspects of continual learning, especially those which are the subject of ongoing frontier research. As part of the event, we invited both paper track contributions on the above-mentioned topics as well as submissions of entries to two challenges specifically designed to test CSSL approaches. To this purpose, two new benchmarks, a Continual Activity Recognition (CAR) dataset and a Continual Crowd Counting (CCC) dataset, were specifically designed to assess continual semi-supervised learning on two important computer vision tasks: activity recognition and crowd counting.

Papers submitted to the workshop were asked to follow the standard IJCAI 2021 template (6 pages plus 1 for the references). Paper submission took place through EasyChair. Authors were allowed to submit a supplementary material document with details on their implementation; however, reviewers were not required to consult this additional material when assessing the submission. A double-blind review process was followed: authors were asked not to include any identifying information (names, affiliations, etc.) or links and self-references that could reveal their identities. Each submission received three reviews from members of the Program Committee, which assessed it based on relevance, novelty and potential for impact. No rebuttal stage was introduced. The authors of the accepted papers were asked to guarantee their presence at the workshop, with at least one author for each accepted paper registering for the conference. The workshop allowed for the presentation of results published elsewhere, but these papers were not considered for or included in these published proceedings. The paper submission deadline was initially set to June 15, 2021, but was later extended to July 2, 2021. Authors were notified of the result on July 19, 2021, and asked to submit a camera-ready version of their paper by July 31. A total of 14 papers were submitted, of which one was withdrawn and one rejected: 86% of the submissions were thus presented at the workshop, while 9 papers (69%) were accepted for these published proceedings. The 20 members of the Program Committee were assigned on average two papers to review each. The workshop issued a Best Paper Award to the author(s) of the best accepted paper, as judged by the Organising Committee based on the reviews assigned by PC members, as well as a Best Student Paper Award, selected in the same way, and a Prize awarded to the winners of each of the Challenges. The Best Paper Award was assigned to “SPeCiaL: Self-Supervised Pretraining for Continual Learning”, by Lucas Caccia and Joelle Pineau. The Best Student Paper Award was secured by “Hypernetworks for Continual Semi-Supervised Learning”, by Dhanajit Brahma, Vinay Kumar Verma and Piyush Rai.

Research paper thumbnail of Belief Functions: Theory and Applications - 5th International Conference, BELIEF 2018, Proceedings

Lecture Notes in Computer Science, 2018

This book constitutes the refereed proceedings of the 5th International Conference on Belief Functions, BELIEF 2018, held in Compiègne, France, in September 2018. The 33 revised regular papers presented in this book were carefully selected and reviewed from 73 submissions. The papers were solicited on theoretical aspects (including for example statistical inference, mathematical foundations, continuous belief functions) as well as on applications in various areas including classification, statistics, data fusion, network analysis and intelligent vehicles.

Research paper thumbnail of Belief Functions: Theory and Applications

This book constitutes the thoroughly refereed proceedings of the Third International Conference on Belief Functions, BELIEF 2014, held in Oxford, UK, in September 2014. The 47 revised full papers presented in this book were carefully selected and reviewed from 56 submissions. The papers are organized in topical sections on belief combination; machine learning; applications; theory; networks; information fusion; data association; and geometry.

Research paper thumbnail of The Geometry of Uncertainty - The Geometry of Imprecise Probabilities

Artificial Intelligence: Foundations, Theory, and Algorithms, 2020

The principal aim of this book is to introduce to the widest possible audience an original view of belief calculus and uncertainty theory. In this geometric approach to uncertainty, uncertainty measures can be seen as points of a suitably complex geometric space, and manipulated in that space, for example, combined or conditioned.

In the chapters in Part I, Theories of Uncertainty, the author offers an extensive recapitulation of the state of the art in the mathematics of uncertainty. This part of the book contains the most comprehensive summary to date of the whole of belief theory, with Chap. 4 outlining for the first time, and in a logical order, all the steps of the reasoning chain associated with modelling uncertainty using belief functions, in an attempt to provide a self-contained manual for the working scientist. In addition, the book proposes in Chap. 5 what is possibly the most detailed compendium available of all theories of uncertainty. Part II, The Geometry of Uncertainty, is the core of this book, as it introduces the author’s own geometric approach to uncertainty theory, starting with the geometry of belief functions: Chap. 7 studies the geometry of the space of belief functions, or belief space, both in terms of a simplex and in terms of its recursive bundle structure; Chap. 8 extends the analysis to Dempster’s rule of combination, introducing the notion of a conditional subspace and outlining a simple geometric construction for Dempster’s sum; Chap. 9 delves into the combinatorial properties of plausibility and commonality functions, as equivalent representations of the evidence carried by a belief function; then Chap. 10 starts extending the applicability of the geometric approach to other uncertainty measures, focusing in particular on possibility measures (consonant belief functions) and the related notion of a consistent belief function. The chapters in Part III, Geometric Interplays, are concerned with the interplay of uncertainty measures of different kinds, and the geometry of their relationship, with a particular focus on the approximation problem. Part IV, Geometric Reasoning, examines the application of the geometric approach to the various elements of the reasoning chain illustrated in Chap. 4, in particular conditioning and decision making. Part V concludes the book by outlining a future, complete statistical theory of random sets, future extensions of the geometric approach, and identifying high-impact applications to climate change, machine learning and artificial intelligence.
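To give a flavour of this geometry, consider its simplest instance (a sketch we add for illustration, consistent with the description of the belief space as a simplex in Chap. 7): on a binary frame \(\Theta = \{x, y\}\), a belief function is fully determined by its masses on the two singletons, so the belief space is a triangle,

\[
\mathcal{B}_2 = \bigl\{ (m(\{x\}),\, m(\{y\})) : m(\{x\}) \ge 0,\; m(\{y\}) \ge 0,\; m(\{x\}) + m(\{y\}) \le 1 \bigr\},
\qquad
m(\Theta) = 1 - m(\{x\}) - m(\{y\}),
\]

whose vertices are the two categorical belief functions (all mass on \(\{x\}\) or on \(\{y\}\)) and the vacuous belief function (all mass on \(\Theta\)).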

The book is suitable for researchers in artificial intelligence, statistics, and applied science engaged with theories of uncertainty. The book is supported with the most comprehensive bibliography on belief and uncertainty theory.

Research paper thumbnail of Visions of a Generalised Probability Theory

Computer vision is an ever-growing discipline whose ambitious goal is to enable machines with the intelligent visual skills humans and animals are provided by Nature, allowing them to interact effortlessly with complex, dynamic environments. Designing automated visual recognition and sensing systems typically involves tackling a number of challenging tasks, and requires an impressive variety of sophisticated mathematical tools. In most cases, the knowledge a machine has of its surroundings is at best incomplete – missing data is a common problem, and visual cues are affected by imprecision. The need for a coherent mathematical ‘language’ for the description of uncertain models and measurements then naturally arises from the solution of computer vision problems.

The theory of evidence (sometimes referred to as ‘evidential reasoning’, ‘belief theory’ or ‘Dempster-Shafer theory’) is perhaps one of the most successful approaches to uncertainty modelling, being arguably the most straightforward and intuitive approach to a generalised probability theory. Emerging in the late Sixties from a profound criticism of the more classical Bayesian theory of inference and modelling of uncertainty, it stimulated in the last decades an extensive discussion of the epistemic nature of both subjective ‘degrees of belief’ and frequentist ‘chances’ or relative frequencies. More recently, a renewed interest in belief functions, the mathematical generalisation of probabilities which are the object of study of the theory of evidence, has seen a blossoming of applications to a variety of fields of applied science.

In this Book we are going to show how, indeed, the fruitful interaction of computer vision and evidential reasoning is able to stimulate a number of advances in both fields. From a methodological point of view, novel theoretical advances concerning the geometric and algebraic properties of belief functions as mathematical objects will be illustrated in some detail in Part II, with a focus on a prospective ‘geometric approach’ to uncertainty and an algebraic solution of the issue of conflicting evidence. In Part III we will illustrate how these new perspectives on the theory of belief functions arise from important computer vision problems, such as articulated object tracking, data association and object pose estimation, to which in turn the evidential formalism can give interesting new solutions. Finally, some initial steps towards a generalisation of the notion of total probability to belief functions will be taken, in the perspective of endowing the theory of evidence with a complete battery of estimation and inference tools to the benefit of scientists and practitioners.

Research paper thumbnail of Machine Learning in Surgery

This invited talk at COSUR 2018 describes a number of aspects of the application of machine learning to surgical robotics, ranging from perception to cognition (the recognition of surgeon actions, anomalous events, and the prediction of future developments).

Research paper thumbnail of Random sets at the interface of statistics and AI Fifth Bayesian, Fiducial, and Frequentist (BFF5) Conference

Random set theory, originally born within the remit of mathematical statistics, lies nowadays at the interface of statistics and AI. Arguably more mathematically complex than standard probability, the field now faces open issues such as the formulation of generalised laws of probability, the generalisation of the notion of random variable to random set spaces, the extension of the notion of random process, and so on. Frequentist inference with random sets can be envisaged to better describe common situations such as lack of data and set-valued observations. To this aim, parameterised families of random sets (and Gaussian random sets in particular) are a crucial area of investigation. In particular, we will present some recent work on the generalisation of the notion of likelihood, as the basis for a generalised logistic regression framework capable of better estimating rare events; a random-set version of maximum-entropy classifiers; and a recent generalisation of the law of total probability to belief functions. In a longer-term perspective, random set theory can be instrumental to new, robust foundations for statistical machine learning, allowing the formulation of models and algorithms able to deal with mission-critical applications ‘in the wild’, in a mutually beneficial exchange between statistics and artificial intelligence.

Research paper thumbnail of Disruptive Visual AI - A BMW Knowledge Day presentation

Research paper thumbnail of New trends in AI for visual recognition and prediction

Research paper thumbnail of A theory of mind for visual AIs

Artificial intelligence is becoming part of our lives. Smart cars will engage our roads in less than ten years’ time; shops with no checkout, which automatically recognise customers and what they purchase, are already open for business. But to enable machines to deal with uncertainty, we must fundamentally change the way machines learn from the data they observe so that they will be able to cope with situations they have never encountered in the safest possible way. Interacting naturally with human beings and their complex environments will only be possible if machines are able to put themselves in people’s shoes: to guess their goals, beliefs and intentions – in other words, to read our minds.

Research paper thumbnail of Belief functions: past, present and future

The theory of belief functions, sometimes referred to as evidence theory or Dempster-Shafer theory, was first introduced by Arthur P. Dempster in the context of statistical inference, to be later developed by Glenn Shafer as a general framework for modelling epistemic uncertainty. Belief theory and the closely related random set theory form a natural framework for modelling situations in which data are missing or scarce: think of extremely rare events such as volcanic eruptions or power plant meltdowns, problems subject to huge uncertainties due to the number and complexity of the factors involved (e.g. climate change), but also the all-important issue of generalisation from small training sets in machine learning.

This short talk, abstracted from an upcoming half-day tutorial at IJCAI 2016, is designed to introduce to non-experts the principles and rationale of random sets and belief function theory, review its rationale in the context of the frequentist and Bayesian interpretations of probability as well as in relation to the other main approaches to non-additive probability, survey the key elements of the methodology and the most recent developments, and discuss current trends in both its theory and applications. Finally, a research programme for the future is outlined, which includes a robustification of Vapnik's statistical learning theory for an Artificial Intelligence ‘in the wild’.

Research paper thumbnail of International Workshop on Continual Semi-Supervised Learning: Introduction, Benchmarks and Baselines

The First International Workshop on Continual Semi-Supervised Learning (CSSL @ IJCAI 2021), 2021

The aim of this paper is to formalise a new continual semi-supervised learning (CSSL) paradigm, proposed to the attention of the machine learning community via the IJCAI 2021 International Workshop on Continual Semi-Supervised Learning (CSSL@IJCAI), in order to raise the field’s awareness about this problem and mobilise its effort in this direction. After a formal definition of continual semi-supervised learning and the appropriate training and testing protocols, the paper introduces two new benchmarks specifically designed to assess CSSL on two important computer vision tasks: activity recognition and crowd counting. We describe the Continual Activity Recognition (CAR) and Continual Crowd Counting (CCC) challenges built upon those benchmarks and the baseline models proposed for the challenges, and present a simple CSSL baseline which consists in applying batch self-training in temporal sessions, for a limited number of rounds. The results show that learning from unlabelled data streams is extremely challenging, and stimulate the search for methods that can encode the dynamics of the data stream.
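The session-based self-training baseline can be pictured with a short Python sketch (a toy under our own assumptions – logistic regression as the model, a fixed confidence threshold – rather than the exact challenge baseline):

import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, sessions, conf=0.9, rounds=3):
    """Fit on the labelled batch, then absorb confident pseudo-labels
    from each temporally ordered unlabelled session, a few rounds each."""
    X, y = X_lab, y_lab
    model = LogisticRegression(max_iter=1000).fit(X, y)
    for X_sess in sessions:
        for _ in range(rounds):                   # limited number of self-training rounds
            proba = model.predict_proba(X_sess)
            keep = proba.max(axis=1) >= conf      # keep only confident pseudo-labels
            if not keep.any():
                break
            X = np.vstack([X, X_sess[keep]])
            y = np.concatenate([y, proba[keep].argmax(axis=1)])
            model = LogisticRegression(max_iter=1000).fit(X, y)
    return model

rng = np.random.default_rng(0)
X0, y0 = rng.normal(size=(40, 8)), rng.integers(0, 2, size=40)
stream = [rng.normal(size=(100, 8)) for _ in range(3)]   # three unlabelled sessions
model = self_train(X0, y0, stream)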

Research paper thumbnail of YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles

ICCV 2021 Workshop: The ROAD challenge: Event Detection for Situation Awareness in Autonomous Driving

As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away, image resolution and computational resource limitations make detecting smaller objects (that is, objects that occupy a small pixel area in the input image) a truly challenging task for machines and a wide open research field.

This study explores ways in which the popular YOLOv5 object detector can be modified to improve its performance in detecting smaller objects, with a particular focus on its application to autonomous racing. To achieve this, we investigate how replacing certain structural elements of the model (as well as their connections and other parameters) can affect performance and inference time. In doing so, we propose a series of models at different scales, which we name ‘YOLO-Z’, and which display an improvement of up to 6.9% in mAP when detecting smaller objects at 50% IOU, at a cost of just a 3ms increase in inference time compared to the original YOLOv5.

Our objective is not only to inform future research on the potential of adjusting a popular detector such as YOLOv5 to address specific tasks, but also to provide insights on how specific changes can impact small object detection. Such findings, applied to the wider context of autonomous vehicles, could increase the amount of contextual information available to such systems.
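For intuition only, the PyTorch fragment below illustrates the general family of changes involved – exposing a higher-resolution feature map to the detection head so that small objects span more grid cells. It is our own simplified sketch, not the actual YOLO-Z modification; all channel counts and strides are assumptions.

# Illustrative only (not the actual YOLO-Z changes): fuse a deep,
# low-resolution feature map with a shallow, high-resolution one so the
# small-object head sees more cells per object. Shapes are assumptions.
import torch
import torch.nn as nn

class TinyNeck(nn.Module):
    def __init__(self, c_deep=256, c_shallow=128, c_out=128):
        super().__init__()
        self.reduce = nn.Conv2d(c_deep, c_out, 1)     # 1x1 conv to match channels
        self.lateral = nn.Conv2d(c_shallow, c_out, 1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, deep, shallow):
        return self.lateral(shallow) + self.up(self.reduce(deep))

neck = TinyNeck()
deep = torch.randn(1, 256, 20, 20)      # stride-32 map
shallow = torch.randn(1, 128, 40, 40)   # stride-16 map: 4x more cells
fused = neck(deep, shallow)             # high-res map fed to the small-object head
print(fused.shape)                      # torch.Size([1, 128, 40, 40])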

Research paper thumbnail of Spatio-temporal Human Action Localisation and Instance Segmentation in Temporally Untrimmed Videos

Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. In this work we address the problem of action localisation and instance segmentation in which multiple concurrent actions of the same class may be segmented out of an image sequence. We cast the action tube extraction as an energy maximisation problem in which configurations of region proposals in each frame are assigned a cost and the best action tubes are selected via two passes of dynamic programming. One pass associates region proposals in space and time for each action category, and another pass is used to solve for the tube's temporal extent and to enforce a smooth label sequence through the video. In addition, by taking advantage of recent work on action foreground-background segmentation, we are able to associate each tube with class-specific segmentations. We demonstrate the performance of our algorithm on the challenging LIRIS-HARL dataset and achieve a new state-of-the-art result which is 14.3 times better than previous methods.
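Schematically (in our own notation, which the abstract does not fix), the first dynamic-programming pass can be read as maximising an energy that trades per-frame proposal scores against the spatial overlap of consecutive choices,

\[
E(b_1, \dots, b_T) = \sum_{t=1}^{T} s_t(b_t) + \lambda \sum_{t=2}^{T} \mathrm{IoU}(b_{t-1}, b_t),
\]

where \(b_t\) is the proposal selected in frame \(t\), \(s_t\) its class score, and \(\lambda\) a weighting constant; the second pass then solves for the temporal extent by smoothing the label sequence over time.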

Research paper thumbnail of TraMNet - Transition Matrix Network for Efficient Action Tube Proposals

Current state-of-the-art methods solve spatio-temporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate sets of temporally connected bounding boxes called action micro-tubes. However, they fail to consider that the underlying anchor proposal hypotheses should also move (transition) from frame to frame, as the actor or the camera do. Assuming we evaluate n 2D anchors in each frame, the number of possible transitions from each 2D anchor to the next, for a sequence of f consecutive frames, is in the order of O(n^f), expensive even for small values of f. To avoid this problem we introduce a Transition-Matrix-based Network (TraMNet) which relies on computing transition probabilities between anchor proposals while maximising their overlap with ground truth bounding boxes across frames, and enforcing sparsity via a transition threshold. As the resulting transition matrix is sparse and stochastic, this reduces the proposal hypothesis search space from O(n^f) to the cardinality of the thresholded matrix. At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network. At test time, a denser transition matrix can be obtained either by decreasing the threshold or by adding to it all the relative transitions originating from any cell location, allowing the network to handle transitions in the test data that might not have been present in the training data, and making detection translation-invariant. Finally, we show that our network is able to handle sparse annotations such as those available in the DALY dataset, while allowing for both dense (accurate) or sparse (efficient) evaluation within a single model. We report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24 datasets to support our claims.
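The core bookkeeping can be illustrated in a few lines of Python (a toy of our own devising: counts come from hypothetical best-matching anchor pairs gathered over training, and the threshold value is an assumption):

import numpy as np

def transition_matrix(best_anchor_pairs, n_anchors, threshold=0.05):
    """best_anchor_pairs: list of (anchor at frame t, anchor at frame t+1)
    observed as best-matching the ground truth in consecutive frames."""
    counts = np.zeros((n_anchors, n_anchors))
    for a, b in best_anchor_pairs:
        counts[a, b] += 1
    probs = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
    probs[probs < threshold] = 0.0      # enforce sparsity: drop unlikely transitions
    return probs

pairs = [(0, 0), (0, 1), (1, 1), (1, 1), (2, 1)]   # toy training statistics
T = transition_matrix(pairs, n_anchors=3)
print(T)   # rows are stochastic before thresholding; only likely transitions survive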

Research paper thumbnail of Belief likelihood function for generalised logistic regression

The notion of belief likelihood function of repeated trials is introduced, whenever the uncertainty for individual trials is encoded by a belief measure (a finite random set). This generalises the traditional likelihood function, and provides a natural setting for belief inference from statistical data. Factorisation results are proven for the case in which conjunctive or disjunctive combination is employed, leading to analytical expressions for the lower and upper likelihoods of ‘sharp’ samples in the case of Bernoulli trials, and to the formulation of a generalised logistic regression framework.
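To fix ideas, in the conjunctive case the factorisation takes (in notation we assume here) the form of products of belief and plausibility values of the individual outcomes,

\[
\underline{L}(x) = \prod_{i=1}^{n} \mathit{Bel}(\{x_i\}),
\qquad
\overline{L}(x) = \prod_{i=1}^{n} \mathit{Pl}(\{x_i\}),
\]

so that, for Bernoulli trials with masses \(m(\{T\}) = p\) and \(m(\{F\}) = q\), a sharp sample with \(k\) successes out of \(n\) would have lower likelihood \(p^{k} q^{n-k}\) and upper likelihood \((1-q)^{k} (1-p)^{n-k}\).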

Research paper thumbnail of Action Detection from a Robot-Car Perspective

We present the new Road Event and Activity Detection (READ) dataset, designed and created from an autonomous vehicle perspective to take action detection challenges to autonomous driving. READ will give scholars in computer vision, smart cars and machine learning at large the opportunity to conduct research into exciting new problems such as understanding complex (road) activities, discerning the behaviour of sentient agents, and predicting both the label and the location of future actions and events, with the final goal of supporting autonomous decision making.

Research paper thumbnail of Incremental Tube Construction for Human Action Detection

Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short, either because they only detect one action per video, or because they assume that the entire video is available ahead of time. In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently. In contrast to previous methods, we solve the detection-window association and action labelling problems jointly in a single pass. We demonstrate superior on-line association accuracy and speed (2.2ms per frame) as compared to the current state-of-the-art offline systems. We further demonstrate that the entire action detection pipeline can easily be made to work effectively in real-time using our action tube construction algorithm.
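As a rough illustration of single-pass, incremental association (a greedy stand-in we provide for intuition, not the paper's joint labelling-and-association algorithm):

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, as in the earlier sketches."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tubes, detections, min_iou=0.3):
    """Extend live tubes with the new frame's detections, greedily by IoU."""
    unmatched = list(range(len(detections)))
    for tube in tubes:
        best, best_iou = None, min_iou
        for j in unmatched:
            ov = iou(tube[-1], detections[j])
            if ov > best_iou:
                best, best_iou = j, ov
        if best is not None:
            tube.append(detections[best])
            unmatched.remove(best)
    tubes.extend([[detections[j]] for j in unmatched])   # unmatched boxes start new tubes
    return tubes

tubes = [[(0, 0, 10, 10)]]
tubes = associate(tubes, [(1, 1, 11, 11), (50, 50, 60, 60)])
print(len(tubes), len(tubes[0]))   # 2 tubes; the first has grown to length 2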

Research paper thumbnail of Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge

Current state-of-the-art human activity recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. We propose a simple, yet effective, method for the temporal detection of activities in temporally untrimmed videos with the help of untrimmed classification. Firstly, our model predicts the top k labels for each untrimmed video by analysing global video-level features. Secondly, frame-level binary classification is combined with dynamic programming to generate the temporally trimmed activity proposals. Finally, each proposal is assigned a label based on the global label, and scored with the score of the temporal activity proposal and the global score. Ultimately, we show that untrimmed video classification models can be used as a stepping stone for temporal detection. Our method won the runner-up prize in the ActivityNet Detection challenge 2016.
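The dynamic-programming step in the second stage can be sketched as a two-state Viterbi pass over the frame-level scores (our own minimal rendition, with an assumed switching penalty, rather than the method's exact formulation):

import numpy as np

def trim(frame_scores, switch_cost=2.0):
    """frame_scores: per-frame P(action). Returns a 0/1 label for each frame."""
    p = np.asarray(frame_scores, dtype=float)
    emit = np.stack([np.log(1 - p + 1e-9), np.log(p + 1e-9)], axis=1)  # (T, 2) log-evidence
    T = len(p)
    dp = emit[0].copy()
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        new = np.empty(2)
        for s in (0, 1):
            stay, switch = dp[s], dp[1 - s] - switch_cost   # penalise label changes
            back[t, s] = s if stay >= switch else 1 - s
            new[s] = max(stay, switch) + emit[t, s]
        dp = new
    labels = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):                           # backtrack
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]

print(trim([0.1, 0.2, 0.9, 0.8, 0.95, 0.3, 0.1]))  # -> [0, 0, 1, 1, 1, 0, 0]: contiguous proposals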