Mark Cummins | University of Oxford
I completed my DPhil thesis in the Mobile Robotics Group at Oxford, working on place recognition and appearance based navigation. My advisor was Paul Newman, and the algorithm we developed is called FAB-MAP.
Since finishing, I co-founded a start-up company called Plink together with James Philbin, developing visual search engine technology. Our PlinkArt app allowed users to identify a painting simply by taking a picture of it with their mobile phone. Plink was acquired by Google in April 2010. I now work on the Google Goggles team, developing Google's computer vision systems.
Supervisors: Paul Newman
Papers by Mark Cummins
2013 IEEE International Conference on Computer Vision, 2013
IEEE Transactions on Robotics, 2010
The International Journal of Robotics Research, 2010
We describe a new formulation of appearance-only SLAM suitable for very large scale place recognition. The system navigates in the space of appearance, assigning each new observation to either a new or a previously visited location, without reference to metric position. The system is demonstrated performing reliable online appearance mapping and loop-closure detection over a 1000 km trajectory, with mean filter update times of 14 ms. The scalability of the system is achieved by defining a sparse approximation to the FAB-MAP model suitable for implementation using an inverted index. Our formulation of the problem is fully probabilistic and naturally incorporates robustness against perceptual aliasing. We also demonstrate that the approach substantially outperforms the standard term-frequency inverse-document-frequency (tf-idf) ranking measure. The 1000 km data set comprising almost a terabyte of omni-directional and stereo imagery is available for use, and we hope that it will serve ...
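For readers unfamiliar with the tf-idf baseline that the abstract compares against, the following is a minimal sketch of tf-idf ranking over an inverted index of visual words. It illustrates the baseline only, not the FAB-MAP model itself; all names and data here are hypothetical.

```python
import math
from collections import defaultdict

def build_inverted_index(places):
    """Map each visual-word ID to a postings list of (place_id, term_count)."""
    index = defaultdict(list)
    for pid, words in enumerate(places):
        counts = defaultdict(int)
        for w in words:
            counts[w] += 1
        for w, c in counts.items():
            index[w].append((pid, c))
    return index

def tfidf_rank(query_words, index, n_places):
    """Score previously visited places against a query observation.

    Only postings lists for words present in the query are touched,
    which is what makes the inverted index scale to many places.
    """
    scores = defaultdict(float)
    for w in set(query_words):
        postings = index.get(w, [])
        if not postings:
            continue
        idf = math.log(n_places / len(postings))
        for pid, tf in postings:
            scores[pid] += tf * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Three toy "places", each a bag of visual-word IDs.
places = [[1, 2, 3], [2, 3, 4], [5, 6]]
index = build_inverted_index(places)
ranked = tfidf_rank([2, 3], index, len(places))
```

Here the query matches places 0 and 1 (both contain words 2 and 3) and never scores place 2, since its postings lists are never visited.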
Robotics Research: The 13th International Symposium ISRR, Nov 8, 2010
We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.
Abstract: Real-time SLAM in large-scale environments is a computationally demanding task. This paper discusses an approach to dealing with storage and computation constraints based on selective evidence gathering at significant areas in the environment. Local measures of novelty, based on the input from multiple sensors, are used to detect salient locations. This can then be used to trigger the collection of reliable descriptors of the location.
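The selective evidence-gathering idea above can be sketched as follows: score each sensor's current reading for novelty against its recent history, and trigger expensive descriptor collection only when combined novelty crosses a threshold. The specific novelty measure, the summation rule, and the threshold here are illustrative assumptions, not the paper's formulation.

```python
def novelty(reading, history):
    """Absolute deviation of a scalar sensor reading from its running mean.

    A hypothetical stand-in for a per-sensor local novelty measure.
    """
    if not history:
        return 0.0
    mean = sum(history) / len(history)
    return abs(reading - mean)

def should_collect_descriptor(readings, histories, threshold=2.0):
    """Trigger descriptor collection when total novelty across sensors
    exceeds the threshold, so storage is spent only on salient locations."""
    total = sum(novelty(r, h) for r, h in zip(readings, histories))
    return total > threshold
```

A reading far from its sensor's running mean (e.g. 5.0 against a history of 1.0s) trips the trigger, while an unremarkable reading does not.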
Robotics Research, Jan 1, 2011
International journal of …, Jan 1, 2011