Learning Patterns in Images

Computer Vision through Learning

1997

This report concerns problems of learning patterns in images and image sequences, and using the obtained patterns to interpret new images. The study describes our approach to these problems, the developed methodology, called MIST, and our results in the following problem areas: (i) semantic interpretation of color images of outdoor scenes, (ii) detection of blasting caps in x-ray images of airport luggage, (iii) recognizing actions in video image sequences, (iv) recognizing targets in SAR images, (v) "robotic estimation", (vi) visual memories, and (vii) designing a "Bisight control library". We discuss the image formation processes in these problem areas, and the choices of representation spaces used in our approaches to solving these problems. The results presented indicate the advantages and significant potential of applying machine learning to computer vision problems.

Contents:
1 Introduction
2 Previous Work on Machine Learning in Computer Vision
3 Semantic Interpretation of Color Images of Outdoor Scenes
3.1 The MIST Methodology
3.2 Implementation and Experimental Results
4 Detection of Blasting Caps in X-ray Images of Luggage
4.1 Preliminaries
4.2 Problem Statement
4.3 The Method and Experimental Results
5 Recognizing Actions in Video Image Sequences

Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video

IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2000

Understanding video events, the translation of low-level content in video sequences into high-level semantic concepts, is a research topic that has received much interest in recent years. Important applications of this work include smart surveillance systems, semantic video database indexing, and interactive systems. This technology can be applied to several video domains including: airport terminals, parking lots, traffic, subway stations, aerial surveillance, and sign language data. In this work we survey the two main components of the event understanding process: Abstraction and Event modeling. Abstraction is the process of molding the data into informative units to be used as input to the event model. Event modeling is devoted to describing events of interest formally and enabling recognition of these events as they occur in the video sequence. Event modeling can be further decomposed into the categories of Pattern Recognition Methods, State Event Models, and Semantic Event Models. In this survey we discuss this proposed taxonomy of the literature, offer a unifying terminology, and discuss popular abstraction schemes (e.g. Motion History Images) and event modeling formalisms (e.g. Hidden Markov Models) and their use in video event understanding, using extensive examples from the literature. Finally, we consider the application domains of video event understanding in light of the proposed taxonomy, and propose future directions for research in this field.
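The Motion History Image named above as an example abstraction scheme admits a very compact sketch: pixels where motion is currently detected are stamped with a maximal timestamp value tau, and all other pixels decay toward zero, so recent motion appears brighter than older motion. The toy 1x4 "image" and the value of tau below are invented for illustration, not taken from the survey.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30):
    """One update step of a Motion History Image (MHI).

    Pixels flagged by motion_mask are set to tau; all other
    pixels decay by 1 toward 0, so intensity encodes recency.
    """
    decayed = np.maximum(mhi - 1, 0)            # fade old motion
    return np.where(motion_mask, tau, decayed)  # stamp new motion

# toy example: motion sweeps left to right across a 1x4 "image"
mhi = np.zeros(4, dtype=np.int32)
for t in range(4):
    mask = np.arange(4) == t        # the moving pixel is at column t
    mhi = update_mhi(mhi, mask, tau=3)
print(mhi)  # → [0 1 2 3]: intensity rises toward the most recent motion
```

A single MHI frame thereby summarizes where and how recently motion occurred, which is what makes it a useful input unit for the event models the survey discusses.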

What is going on? A high level interpretation of sequences of images

Automated scene recognition in dynamic environments involves not only object classification or recognition, but also a further step consisting in finding what is going on in this environment, that is to say what the objects are doing and what their purposes are. In order to perform that high level function, a Symbolic Level is designed which is based on a conceptual knowledge describing the Objects and their possible behaviours in time, in terms of Activity and Plan Prototypes, and on a consistency-based reasoning matching the data delivered by the Numerical Level to the Prototypes. An example on real-world data is given.

Machine Learning for Object Recognition and Scene Analysis

International Journal of Pattern Recognition and Artificial Intelligence, 1994

Learning is a critical research field for autonomous computer vision systems. It can bring solutions to the knowledge acquisition bottleneck of image understanding systems. Recent developments of machine learning for computer vision are reported in this paper. We describe several different approaches for learning at different levels of the image understanding process, including learning 2-D shape models, learning strategic knowledge for optimizing model matching, learning for adaptive target recognition systems, knowledge acquisition of constraint rules for labelling and automatic parameter optimization for vision systems. Each approach will be commented on and its strong and weak points will be underlined. In conclusion we will suggest what could be the “ideal” learning system for vision.

Representation models and machine learning techniques for scene classification

Pattern Recognition and Machine Vision …, 2010

Scene classification is a fundamental process of human vision that allows us to efficiently and rapidly analyze our surroundings. Humans are able to recognize complex visual scenes at a single glance, despite the number of objects with different poses, colors, shadows and textures that may be contained in the scenes. Understanding the robustness and rapidity of this human ability has been a focus of investigation in the cognitive sciences over many years. These studies have stimulated research in computer vision on building artificial scene recognition systems. Motivation beyond pure scientific curiosity is provided by several important computer vision applications in which scene classification can be exploited (e.g., robot navigation systems). Different methods have been proposed to model and to describe the content of a scene, and different machine learning procedures have been employed to automatically learn commonalities and differences between classes. In this chapter we survey some of the state-of-the-art approaches to scene classification. For each approach we provide a description and a discussion of its most relevant characteristics.
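One simple family of representation models the survey covers is the holistic descriptor: the whole image is summarized by a global statistic and scenes are compared in that descriptor space. The sketch below, with invented images, class names, and a plain global intensity histogram plus nearest-exemplar matching, is only meant to show the representation-plus-learning pattern, not any specific method from the chapter.

```python
import numpy as np

def global_histogram(image, bins=8):
    """Holistic scene descriptor: normalized intensity histogram."""
    h, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

rng = np.random.default_rng(0)
exemplars = {
    "beach":  rng.uniform(0.6, 1.0, (16, 16)),   # bright toy scene
    "forest": rng.uniform(0.0, 0.4, (16, 16)),   # dark toy scene
}
descriptors = {lbl: global_histogram(img) for lbl, img in exemplars.items()}

def classify(image):
    """Label a scene by its nearest exemplar in descriptor space (L1)."""
    d = global_histogram(image)
    return min(descriptors, key=lambda lbl: np.abs(descriptors[lbl] - d).sum())

label = classify(rng.uniform(0.7, 0.9, (16, 16)))  # a bright query scene
print(label)  # → beach
```

Richer representations (e.g., bag-of-features or gist-style descriptors) and stronger classifiers slot into the same two-stage structure: describe the scene globally, then learn class boundaries in descriptor space.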

A learning approach to semantic image analysis

Proceedings of the 2nd …, 2006

In this paper, a learning approach coupling Support Vector Machines (SVMs) and a Genetic Algorithm (GA) is presented for knowledge-assisted semantic image analysis in specific domains. Explicitly defined domain knowledge under the proposed approach includes objects of the domain of interest and their spatial relations. SVMs are employed using low-level features to extract implicit information for each object of interest via training in order to provide an initial annotation of the image regions based solely on visual features. To account for the inherent visual information ambiguity, fuzzy spatial relations along with the previously computed initial annotations are supplied to a genetic algorithm, which decides on the globally most plausible annotation. Experiments with images of the beach vacation domain demonstrate the performance of the proposed approach.
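The first stage described above, SVMs trained on low-level region features to produce an initial annotation, can be sketched as follows. The features (mean hue, texture energy), the concept labels, and the training values are all invented for this illustration; the paper's actual features, kernel, and domain knowledge are not reproduced here.

```python
# Hypothetical sketch of the initial, purely visual annotation step:
# an SVM maps each region's low-level features to a candidate concept.
from sklearn.svm import SVC

# toy low-level features per region: (mean hue, mean texture energy)
train_X = [[0.60, 0.20], [0.55, 0.30], [0.10, 0.80], [0.15, 0.70]]
train_y = ["sea", "sea", "sand", "sand"]

clf = SVC(kernel="rbf").fit(train_X, train_y)

# initial annotation of two unseen regions, based solely on visual features
regions = [[0.58, 0.25], [0.12, 0.75]]
initial = list(clf.predict(regions))
print(initial)  # → ['sea', 'sand']
```

In the full approach these per-region hypotheses are deliberately treated as ambiguous: the genetic algorithm then searches for the assignment that best satisfies the fuzzy spatial relations of the domain (e.g., sea above sand), which the purely visual stage cannot check.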

Automatic Learning of Conceptual Knowledge in Image Sequences for Human Behavior Interpretation

2007

This work describes an approach for the interpretation and explanation of human behavior in image sequences, within the context of a Cognitive Vision System. The information source is the geometrical data obtained by applying tracking algorithms to an image sequence, which is used to generate conceptual data. The spatial characteristics of the scene are automatically extracted from the resulting tracking trajectories obtained during a training period. Interpretation is achieved by means of a rule-based inference engine called Fuzzy Metric Temporal Horn Logic and a behavior modeling tool called Situation Graph Tree. These tools are used to generate conceptual descriptions which semantically describe observed behaviors.