Object Recognition Research Papers

We present our work on the implementation and calibration of a multi sensor measuring system. The work is part of a large scale research project on optical measurement using sensor actuator coupling and active exploration. This project is a collaboration of researchers from seven institutes of the University of Stuttgart including photogrammetry, mechanical engineering and computer science. The system consists of optical sensors which can be manipulated in position and orientation by robot actuators, and light sources which control illumination. The system performs different tasks including object recognition, localization and gauging. Flexibility is achieved by replacing the common serial measurement chain by nested control loops involving autonomous agents which perform basic tasks in a modular fashion. The system is able to inspect and gauge several parts from a set of parts stored in a 3-D model database. The paper gives an overview of the entire system and details some of the photogrammetry-related aspects such as the calibration of the different sensors (cameras, stereo-head, stripe projector), the calibration of the measurement robot using photogrammetric measurements, as well as data processing steps like segmentation, object pose determination, and gauging.

This paper describes an integrated MMW radar and vision sensor system for autonomous on-road navigation. The radar sensor has a range of approximately 200 metres and uses a linear array of receivers and wavefront reconstruction techniques to compute the range and bearing of objects within the field of view. It is integrated with a vision-based lane-keeping system to accurately detect and classify obstacles with respect to the danger they pose to the vehicle and to execute the required avoidance manoeuvres.

This paper describes an application that enables quick reconstruction of interconnected events, sparsely captured by one or more surveillance cameras. Unlike related efforts, our approach does not require indexing, advance knowledge of potential search criteria, nor a solution to the generalized object-recognition problem. Instead, we strategically pair the intelligence and skill of a human investigator with the speed and flexibility of a parallel image search engine that exploits local storage and processing capabilities distributed across large collections of video recording devices. The result is a system for fast, interactive, brute-force video searching which is both effective and highly scalable.

This paper presents a haptic virtual reality tool developed to enhance accessibility for the visually impaired. The proposed approach focuses on the development of a highly interactive haptic virtual reality system that allows visually impaired users to study and interact with various virtual objects in specially designed virtual environments. The system is based on the use of the CyberGrasp™ and the PHANToM™ haptic devices. A number of custom applications have been developed based on object recognition and manipulation, utilizing the advantages of both haptic devices. The system has been tested and evaluated in three custom training applications for the visually impaired.

This work describes a new approach for the computation of 3D Fourier descriptors, which are used for characterization, classification, and recognition of 3D objects. The method starts with a polygonized surface which is mapped onto a unit sphere using an inflation algorithm, after which the polyhedron is expanded in spherical harmonic functions. Homogeneous distribution of the vertices is achieved by applying an iterative watershed algorithm to the surface graph.

Accurate image classification is crucial in many robotics and surveillance applications -for example, a vision system on a robot needs to accurately recognize the objects seen by its camera. Object recognition systems typically need a large amount of training data for satisfactory performance. The problem is particularly acute when many object categories are present. In this paper we present a batch-mode active learning framework for multi-class image classification systems. In active learning, images are to be chosen for interactive labeling, instead of passively accepting training data. Our framework addresses two important issues: i) it handles redundancy between images which is crucial when batch-mode selection is performed; and ii) we pose batch selection as a submodular function optimization problem that makes an inherently intractable problem efficient to solve, while having approximation guarantees. We show results on image classification data in which our approach substantially reduces the amount of training required over the baseline.
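
A minimal sketch of the batch-selection idea described above: greedy maximization of a monotone submodular coverage objective, which handles redundancy between images while keeping the (1 - 1/e) approximation guarantee of greedy maximization. The facility-location objective, the RBF similarity, and the function names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def select_batch(features, batch_size, gamma=1.0):
    """Greedy batch selection for active learning.

    Maximizes a submodular facility-location objective: every unlabeled
    image should be well "covered" by some selected image, which
    discourages picking redundant (near-duplicate) images in one batch.
    """
    # RBF similarity between all pairs of images (illustrative choice).
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-gamma * sq_dists)

    selected = []
    coverage = np.zeros(len(features))          # best similarity to the batch so far
    for _ in range(batch_size):
        # Marginal gain of adding each candidate image to the batch.
        gains = np.maximum(sim, coverage[None, :]).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf               # never pick the same image twice
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[best])
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 32))          # stand-in image features
    print(select_batch(feats, batch_size=5))
```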

This paper considers the problem of automatically learning an activity-based semantic scene model from a stream of video data. A scene model is proposed that labels regions according to the identifiable activity in each region, such as entry/exit zones, junctions, paths and stop zones. We present several unsupervised methods that learn these scene elements and report results that show the efficiency of our approach. Finally, we describe how the models can be used to support the interpretation of moving objects in a visual surveillance environment.

In this paper, a new algorithm for Automatic License Plate Localisation and Recognition (ALPR) is proposed on the basis of isotropic dilation, which can be achieved using the binary image Euclidean distance transform. In a blob analysis problem, any two Regions of Interest (RoIs) that are discontinuous are typically treated as separate blobs. However, the proposed algorithm, combined with Connected Component Analysis (CCA), treats RoIs that lie within a certain distance of other RoIs as non-unique. This paper investigates the design and implementation of several pre-processing techniques and an isotropic dilation algorithm to classify moving vehicles with different backgrounds and varying angles. A multi-layer feed-forward back-propagation neural network is used to train on the segmented and refined characters. The results obtained can be used for implementation in a vehicle parking management system.
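
The isotropic-dilation step can be approximated with the Euclidean distance transform, as sketched below: background pixels within half the allowed gap of a foreground pixel are switched on, so nearby RoIs merge into one connected component before CCA labeling. The gap parameter and helper names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import ndimage

def merge_nearby_blobs(binary_img, max_gap=5):
    """Group RoIs that lie within `max_gap` pixels of each other.

    Isotropic dilation is realized with the Euclidean distance transform:
    every background pixel closer than max_gap/2 to a foreground pixel is
    switched on, so blobs separated by less than max_gap merge into one
    connected component, while truly isolated blobs stay separate.
    """
    # Distance of every background pixel to the nearest foreground pixel.
    dist_to_fg = ndimage.distance_transform_edt(~binary_img.astype(bool))
    grown = dist_to_fg <= max_gap / 2.0

    # Connected component analysis on the grown mask.
    labels, n = ndimage.label(grown)
    # Restrict the labels back to the original foreground pixels.
    return np.where(binary_img.astype(bool), labels, 0), n

if __name__ == "__main__":
    img = np.zeros((40, 40), dtype=np.uint8)
    img[10:15, 5:10] = 1     # blob A
    img[10:15, 12:17] = 1    # blob B, 2 px from A -> merged with A
    img[30:35, 30:35] = 1    # blob C, far away -> separate label
    labels, n = merge_nearby_blobs(img, max_gap=5)
    print("components:", n)
```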

This paper presents the effectiveness of perceptual features and an iterative classification approach for offline Arabic word image classification. Optimum word image feature extraction is the process of obtaining the minimum set of features that completely represents the target for matching or classification. In this paper we develop Arabic word image classification by extracting features in three main steps: first, the Scale Invariant Feature Transform (SIFT) is applied after preprocessing; second, important points are selected from the descriptors by applying a local maxima operation to the SIFT feature matrix; finally, a linear classifier is used for recognition. Our proposed approach not only performs well and effectively but is also faster when applied to large image databases.
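
A hedged sketch of the three-step pipeline on synthetic data: SIFT extraction, selection of the strongest keypoints (a stand-in for the paper's local-maxima operation on the SIFT feature matrix), mean-pooling to a fixed-length vector, and a linear classifier. The pooling choice and the synthetic word images are assumptions for illustration only.

```python
import cv2
import numpy as np
from sklearn.linear_model import LogisticRegression

def word_image_feature(gray, keep=32):
    """Fixed-length descriptor for a word image.

    SIFT keypoints are detected, the `keep` strongest responses are kept,
    and their 128-D descriptors are mean-pooled so that a plain linear
    classifier can be applied afterwards.
    """
    sift = cv2.SIFT_create()
    kps, desc = sift.detectAndCompute(gray, None)
    if desc is None or len(kps) == 0:
        return np.zeros(128, dtype=np.float32)
    order = np.argsort([-kp.response for kp in kps])[:keep]
    return desc[order].mean(axis=0)

if __name__ == "__main__":
    # Synthetic stand-ins for two classes of word images.
    rng = np.random.default_rng(0)
    X, y = [], []
    for label in (0, 1):
        for _ in range(20):
            img = rng.integers(0, 256, size=(64, 128)).astype(np.uint8)
            if label == 1:
                cv2.rectangle(img, (20, 20), (100, 44), 255, -1)
            X.append(word_image_feature(img))
            y.append(label)
    clf = LogisticRegression(max_iter=1000).fit(np.array(X), y)
    print("train accuracy:", clf.score(np.array(X), y))
```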

The problem of screening images of the skies to determine whether or not aircraft are present is of both theoretical and practical interest. After the most prominent signal in an infrared image of the sky is extracted, the question is whether the signal corresponds to an aircraft. Common approaches calculate the degree of similarity between the shape of the extracted signal and a model aircraft using a similarity measure such as Euclidean distance, and make a decision based on whether the degree of similarity exceeds a (prespecified) threshold. We present a new approach that avoids metric similarity measures and the use of thresholds, and instead attempts to learn similarity measures like those used by humans. In the absence of sufficient real data, the approach allows us to generate an arbitrarily large number of training exemplars projecting near the classification boundary. Once trained on such a training set, the performance of our neural network-based system was comparable to that of a human expert and far better than a network trained only on the available real data. Furthermore, the results were considerably better than those obtained using a Euclidean discriminator.

We describe a method of generating and utilizing visual landmarks that is well suited for SLAM applications. The landmarks created are highly distinctive and reliably detected, virtually eliminating the data association problem present in other landmark schemes. Upon subsequent detections of a landmark, a 3-D pose can be estimated. The scheme requires a single camera.

In this paper we present a non-intrusive model-based gaze tracking system. The system estimates the 3-D pose of a user's head by tracking as few as six facial feature points. The system locates a human face using a statistical color model without any marks on the face and then finds and tracks facial features such as the eyes, nostrils and lip corners. A full perspective model is employed to map these feature points onto the 3-D pose. Several techniques have been developed to track the feature points and recover from failure. We currently achieve a frame rate of 15+ frames per second using an HP 9000 workstation with a framegrabber and a Canon VC-C1 camera. The application of the system has been demonstrated by a gaze-driven panorama image viewer. The potential applications of the system include multimodal interfaces, virtual reality and video-teleconferencing.
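
The mapping from tracked feature points to 3-D head pose under a full perspective model can be sketched with a perspective-n-point solver; the six nominal 3-D facial model coordinates below are illustrative values, not the calibrated model used by the authors.

```python
import cv2
import numpy as np

# Nominal 3-D positions (mm) of six facial features in a head-centered frame.
# These are illustrative values, not the model from the paper.
MODEL_POINTS = np.array([
    [-30.0,  35.0, -30.0],   # left eye
    [ 30.0,  35.0, -30.0],   # right eye
    [-12.0,   0.0, -25.0],   # left nostril
    [ 12.0,   0.0, -25.0],   # right nostril
    [-25.0, -30.0, -30.0],   # left lip corner
    [ 25.0, -30.0, -30.0],   # right lip corner
], dtype=np.float64)

def head_pose(image_points, focal, center):
    """Recover 3-D head pose from six tracked feature points.

    A full perspective camera model is assumed; cv2.solvePnP returns the
    rotation (as a Rodrigues vector) and translation of the head model
    that reproject the model points onto the tracked image points.
    """
    camera_matrix = np.array([[focal, 0, center[0]],
                              [0, focal, center[1]],
                              [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                  camera_matrix, distCoeffs=None)
    rotation, _ = cv2.Rodrigues(rvec)
    return ok, rotation, tvec

if __name__ == "__main__":
    # Synthetic check: project the model with a known pose and recover it.
    true_rvec = np.array([0.1, 0.2, 0.0])
    true_tvec = np.array([0.0, 0.0, 600.0])
    K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
    pts, _ = cv2.projectPoints(MODEL_POINTS, true_rvec, true_tvec, K, None)
    ok, R, t = head_pose(pts.reshape(-1, 2), focal=800.0, center=(320, 240))
    print("recovered translation:", t.ravel())
```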

Any object recognition system must address cases where parts of the object are not visible due to occlusion, shadows, etc. In this paper we introduce a simple matching method that is based on matching boundary signatures. Boundary signatures are surface feature vectors that reflect the probability of occurrence of a feature of a surface (or object) boundary. Boundary signatures are an extension of our surface signature formulation, which we presented with good success in our earlier work. We introduce four types of surface boundary signatures: the Curvature Boundary Signature, the Direction Boundary Signature, the Distance Boundary Signature and the Parameter Boundary Signature. These four signatures are constructed based on local and global geometric shape attributes of the boundary. Tests conducted on objects of different shapes have produced excellent results in the absence of occlusion and good results when objects retain at least 70% of their original shapes.
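
A small 2-D illustration of the signature idea: a normalized histogram of a local boundary attribute (here, discrete curvature), which remains comparable when parts of the boundary are missing. The surface-boundary signatures in the paper follow the same construction in 3-D; the bin count and toy shapes below are assumptions.

```python
import numpy as np

def curvature_boundary_signature(boundary, bins=32):
    """Curvature Boundary Signature of a closed 2-D boundary.

    The signature is a normalized histogram: the probability that a
    boundary point exhibits a given (discretized) curvature value.  Being
    a distribution over a local attribute, it degrades gracefully when
    parts of the boundary are missing due to occlusion.
    """
    pts = np.asarray(boundary, dtype=float)
    prev_pts = np.roll(pts, 1, axis=0)
    next_pts = np.roll(pts, -1, axis=0)
    # Turning angle at each point as a discrete curvature estimate.
    v1 = pts - prev_pts
    v2 = next_pts - pts
    ang1 = np.arctan2(v1[:, 1], v1[:, 0])
    ang2 = np.arctan2(v2[:, 1], v2[:, 0])
    turning = np.angle(np.exp(1j * (ang2 - ang1)))   # wrapped to (-pi, pi]
    hist, _ = np.histogram(turning, bins=bins, range=(-np.pi, np.pi))
    return hist / hist.sum()

if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
    circle = np.c_[np.cos(t), np.sin(t)]
    square = np.r_[[(x, 0) for x in np.linspace(0, 1, 25)],
                   [(1, y) for y in np.linspace(0, 1, 25)],
                   [(x, 1) for x in np.linspace(1, 0, 25)],
                   [(0, y) for y in np.linspace(1, 0, 25)]]
    sig_c = curvature_boundary_signature(circle)
    sig_s = curvature_boundary_signature(square)
    print("L1 distance circle vs square:", np.abs(sig_c - sig_s).sum())
```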

We present an efficient Hough transform for automatic detection of cylinders in point clouds. As cylinders are one of the most frequently used primitives for industrial design, automatic and robust methods for their detection and fitting are essential for reverse engineering from point clouds. The current methods employ automatic segmentation followed by geometric fitting, which requires a lot of manual interaction during modelling. Although the Hough transform can be used for automatic detection of cylinders, the required 5D Hough space has a prohibitively high time and space complexity for most practical applications. We address this problem in this paper and present a sequential Hough transform for automatic detection of cylinders in point clouds. Our algorithm consists of two sequential steps of low dimensional Hough transforms. The first step, called Orientation Estimation, uses the Gaussian sphere of the input data and performs a 2D Hough Transform for finding strong hypotheses ...
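
The Orientation Estimation step can be sketched as a 2-D accumulator over candidate axis directions on the Gaussian sphere: for a cylinder, surface normals are perpendicular to the axis, so the true axis direction collects the most votes. The bin layout, angular tolerance, and helper names below are illustrative simplifications of the paper's accumulator.

```python
import numpy as np

def estimate_cylinder_axes(normals, bins=64, top_k=3):
    """Orientation Estimation step of a sequential Hough transform.

    Accumulates votes in a 2-D (azimuth, elevation) Hough space over
    candidate axis directions: a normal votes for an axis when the two
    are nearly perpendicular, so the cylinder axis emerges as a peak
    without touching the full 5-D cylinder parameter space.
    """
    az = np.linspace(0, np.pi, bins, endpoint=False)
    el = np.linspace(-np.pi / 2, np.pi / 2, bins)
    A, E = np.meshgrid(az, el, indexing="ij")
    # Candidate axis directions on a hemisphere.
    axes = np.stack([np.cos(E) * np.cos(A),
                     np.cos(E) * np.sin(A),
                     np.sin(E)], axis=-1).reshape(-1, 3)

    # |dot| close to zero means normal and axis are nearly perpendicular.
    dots = np.abs(normals @ axes.T)                  # (n_normals, n_axes)
    votes = (dots < np.cos(np.deg2rad(87))).sum(axis=0)
    best = np.argsort(-votes)[:top_k]
    return axes[best], votes[best]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = rng.uniform(0, 2 * np.pi, 500)
    normals = np.c_[np.cos(theta), np.sin(theta), np.zeros_like(theta)]  # axis = z
    normals += rng.normal(scale=0.02, size=normals.shape)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    axes, votes = estimate_cylinder_axes(normals)
    print("top axis hypothesis:", np.round(axes[0], 2))
```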

Visual object analysis researchers are increasingly experimenting with video, because it is expected that motion cues should help with detection, recognition, and other analysis tasks. This paper presents the Cambridge-driving Labeled Video Database (CamVid) as the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes. Over 10 min of high quality 30 Hz footage is being provided, with corresponding semantically labeled images at 1 Hz and in part, 15 Hz. The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality and large resolution color video images in the database represent valuable extended duration digitized footage to those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we present custom-made labeling software for assisting users who wish to paint precise class-labels for other images and videos. We evaluate the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.

In this paper, we consider the problem of learning to predict the correct pose of a 3D object, assuming orthographic projection and 3D linear transformations. A neural network is trained to learn the desired mapping. First, we consider the problem of predicting all possible views that an object can produce. This is performed by representing the object with a small number of reference views and using algebraic functions of views to construct the space of all possible views that the object can produce. Fundamental to this procedure is a methodology based on Singular Value Decomposition and Interval Arithmetic for estimating the ranges of values that the parameters of the algebraic functions can assume. Then, a neural network is trained using a number of views (training views) which are generated by sampling the space of views of the object. During learning, a training view is presented to the inputs of the network, which is required to respond at its outputs with the parameters of the algebraic functions used to generate the view from the reference views. Compared to similar approaches in the literature, the proposed approach has the advantage that it does not require the 3D models of the objects or a large number of views, it is extendible to other types of projections, and it is more practical for object recognition.

The ability to filter improper content from multimedia sources based on visual content has important applications, since text-based filters are clearly insufficient against erroneous and/or malicious associations between text and actual content. In this paper, we investigate a method for detection of nudity in videos based on a bag-of-visual-features representation for frames and an associated voting scheme. Bag-of-Visual-Features (BoVF) approaches have been successfully applied to object recognition and scene classification, showing robustness to occlusion and also to the several kinds of variations that normally curse object detection methods. To the best of our knowledge, only two proposals in the literature use BoVF for nude detection in still images, and no other attempt has been made at applying BoVF for videos. Nevertheless, the results of our experiments show that this approach is indeed able to provide good recognition rates for nudity even at the frame level and with a relatively low sampling ratio. Also, the proposed voting scheme significantly enhances the recognition rates for video segments, achieving, in the best case, a value of 93.2% of correct classification, using a sampling ratio of 1/15 frames. Finally, a visual analysis of some particular cases indicates possible sources of misclassifications.
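
The frame-sampling and voting scheme can be summarized as below: classify a subsample of frames with the BoVF model and flag the segment when enough sampled frames are positive. The 0.5 voting threshold and function names are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def classify_segment(frame_scores, sampling_ratio=15, threshold=0.5):
    """Voting scheme over per-frame BoVF classifications.

    Frames are sampled at 1 / sampling_ratio, each sampled frame gets a
    binary decision from the frame-level classifier, and the segment is
    flagged when the fraction of positive frames exceeds the threshold.
    """
    sampled = frame_scores[::sampling_ratio]
    votes = (sampled >= 0.5).astype(int)       # frame-level decisions
    return votes.mean() >= threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for classifier scores of a 900-frame segment.
    scores = np.clip(rng.normal(0.7, 0.2, size=900), 0, 1)
    print("segment flagged:", classify_segment(scores))
```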

MORSE is an object recognition system, based on geometric invariants of 3D structures taken from a single 2D intensity view. The system exploits the geometric constraints inherent in object classes such as polyhedra, rotational symmetry, bilateral symmetry and extruded surfaces. Invariants have been used in the past to index many of these classes, but MORSE is designed to treat multi-class recognition in a uniform system architecture. The class constraints are also used to drive image feature extraction and grouping. 1 Invariant Representation: The computer recognition of objects has attracted considerable research effort over the last 25 years. It is now widely accepted that object recognition, in the setting of real world scenes and based on a single perspective view, is a difficult problem and cannot be achieved without the use of object models to guide the processing of image data and to confirm object hypotheses. It is also accepted that the most reliable information which is av...

This chapter discusses the volitional basis of Personality Systems Interaction Theory (PSI), and applies it to the improvement of conditions for learning and psychological treatment. The theory explains motivational and volitional phenomena, including concentration, coping with failure, identification and intrinsic commitment to personal goals, persistence, and implementation of intentions. Patterns of interactions among four cognitive systems, viz., thinking and intention memory, feeling and extension memory, discrepancy-sensitive object recognition, and intuitive behavior control, are shown to be modulated by affective change. Self-regulatory abilities support affective change in learning and in therapy, while developmental and educational risk factors compromise it. Educational and clinical treatment applications are discussed in terms of the functional mechanisms involved. Kurt Lewin (1951) said, "There is nothing so practical as a good theory" (p. 169). One way a theory becomes practical is through opportunities for intervention. In this article, I describe the volitional core of the broader personality systems interactions (PSI) theory, and discuss its implications for interventions in situations involving learning and motivation, including training and clinical therapy. This task requires movement between complex concepts, linkages between concepts, and potential examples. Consider some problems of the sort that PSI theory addresses: There is 10-year-old Samuel, a student who has difficulty getting down to work on projects soon after they are assigned, and tends instead to leave most of his work for the last night. Or consider Candace, a 13-year-old student who often helps

We explore a location-based approach for behavior modeling and abnormality detection. In contrast to the conventional object-based approach where an object may first be tagged, identified, classified, and tracked, we proceed directly with event characterization and behavior modeling at the pixel(s) level based on motion labels obtained from background subtraction. Since events are temporally and spatially dependent, this calls for techniques that account for statistics of spatio-temporal events. Based on motion labels, we learn co-occurrence statistics for normal events across space-time. For one (or many) key pixel(s), we estimate a co-occurrence matrix that accounts for any two active labels which co-occur simultaneously within the same spatio-temporal volume. This co-occurrence matrix is then used as a potential function in a Markov Random Field (MRF) model to describe the probability of observations within the same spatio-temporal volume. The MRF distribution implicitly accounts for speed, direction, as well as the average size of the objects passing in front of each key pixel. Furthermore, when the spatio-temporal volume is large enough, the co-occurrence distribution contains the average normal path followed by moving objects. The learned normal co-occurrence distribution can be used for abnormal detection. Our method has been tested on various outdoor videos representing various challenges.
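
A compact sketch of the co-occurrence learning step: for a chosen key pixel, count which neighbouring pixels inside the spatio-temporal window are active whenever the key pixel is active, and normalize the counts. The window size and normalization below are illustrative assumptions; the resulting matrix is what would feed the MRF potential.

```python
import numpy as np

def cooccurrence_matrix(motion_labels, key, window=(5, 15, 15)):
    """Co-occurrence statistics of motion labels around a key pixel.

    `motion_labels` is a (T, H, W) binary volume from background
    subtraction.  For every frame in which the key pixel is active, we
    count which pixels inside the spatio-temporal window are active at
    the same time; the normalized counts later serve as the potential
    function of the MRF model.
    """
    T, H, W = motion_labels.shape
    dt, dy, dx = window
    ky, kx = key
    counts = np.zeros((2 * dy + 1, 2 * dx + 1))
    n_events = 0
    for t in range(T):
        if not motion_labels[t, ky, kx]:
            continue
        n_events += 1
        t0, t1 = max(0, t - dt), min(T, t + dt + 1)
        patch = motion_labels[t0:t1,
                              max(0, ky - dy):ky + dy + 1,
                              max(0, kx - dx):kx + dx + 1].any(axis=0)
        counts[:patch.shape[0], :patch.shape[1]] += patch
    return counts / max(n_events, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.random((100, 60, 80)) < 0.02      # sparse motion labels
    cooc = cooccurrence_matrix(labels, key=(30, 40))
    print("co-occurrence window shape:", cooc.shape)
```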

This paper introduces an effective approach for detecting abandoned luggage in surveillance videos. We combine short- and long-term background models to extract foreground objects, where each pixel in an input image is labeled with a 2-bit code. We then introduce a framework to identify static foreground regions based on the temporal transition of code patterns, and to determine whether the candidate regions contain abandoned objects by analyzing the back-traced trajectories of luggage owners. The experimental results obtained on video images from the 2006 Performance Evaluation of Tracking and Surveillance (PETS) and 2007 Advanced Video and Signal-based Surveillance (AVSS) databases demonstrate that the proposed approach is effective for detecting abandoned luggage, and that it outperforms previous methods.

Real world objects have persistent structure. However, as we move about in the world the spatio-temporal patterns coming from our sensory organs vary continuously. How the brain creates invariant representations from the always-changing input patterns is a major unanswered question. We propose that the neocortex solves the invariance problem by using a hierarchical structure. Each region in the hierarchy learns and recalls sequences of inputs. Temporal sequences at each level of the hierarchy become the spatial inputs to the next higher regions. Thus the entire memory system stores sequences in sequences. The hierarchical model is highly efficient in that object representations at any level in the hierarchy can be shared among multiple higher order objects; therefore, transformations learned for one set of objects will automatically apply to others. Assuming a hierarchy of sequences, and assuming that each region in the hierarchy behaves equivalently, we derive the optimal Bayes inference rules for any level in the cortical hierarchy and we show how feedforward and feedback can be understood within this probabilistic framework. We discuss how the hierarchical nested structure of sequences can be learned. We show that static group formation and probability density formation are special cases of remembering sequences. Thus, although normal vision is a temporal process, we are able to recognize flashed static images as well. We use the most basic form of one of these special cases to train an object recognition system that exhibits robust invariant recognition.

Functional cerebral asymmetries, once thought to be exclusively human, are now accepted to be a widespread principle of brain organization in vertebrates [1]. The prevalence of lateralization makes it likely that it has some major advantage. Until now, however, conclusive evidence has been lacking. To analyze the relation between the extent of cerebral asymmetry and the degree of performance in visual foraging, we studied grain-grit discrimination success in pigeons, a species with a left hemisphere dominance for visual object processing [2,3]. The birds performed the task under left-eye, right-eye or binocular seeing conditions. In most animals, right-eye seeing was superior to left-eye seeing performance, and binocular performance was higher than each monocular level. The absolute difference between left- and right-eye levels was defined as a measure for the degree of visual asymmetry. Animals with higher asymmetries were more successful in discriminating grain from grit under binocular conditions. This shows that an increase in visual asymmetry enhances success in visually guided foraging. Possibly, asymmetries of the pigeon's visual system increase the computational speed of object recognition processes by concentrating them into one hemisphere while preventing the other side of the brain from initiating conflicting search sequences of its own.

Cognitive deficits are a core feature of patients with methamphetamine (METH) abuse. It has been reported that repeated METH treatment impairs long-term recognition memory in the novel object recognition test (NORT) in mice. Recent studies indicate that silibinin, a flavonoid derived from the herb milk thistle, has potent neuroprotective effects in cell cultures and several animal models of neurological diseases. However, its effect on the cognitive deficit induced by METH remains unclear. In the present study, we attempt to clarify the effect of silibinin on impairments of recognition memory caused by METH in mice. Mice were co-administered silibinin with METH for 7 days and then cognitive function was assessed by NORT after 7-day withdrawal. Tissue levels of dopamine and serotonin as well as their metabolites in the prefrontal cortex and hippocampus were measured 1 day after NORT. Silibinin dose-dependently ameliorated the impairment of recognition memory caused by METH treatment in mice. Silibinin significantly attenuated the decreases in the dopamine content of the prefrontal cortex and serotonin content of the hippocampus caused by METH treatment. We also found a correlation between the recognition values and dopamine and serotonin contents of the prefrontal cortex and hippocampus. The effect of silibinin on cognitive impairment may be associated with an amelioration of decreases in dopamine and serotonin levels in the prefrontal cortex and hippocampus, respectively. These results suggest that silibinin may be useful as a pharmacological tool to investigate the mechanisms of METH-induced cognitive impairments.

The extent to which the brain regions associated with face processing are selective for that specific function remains controversial. In addition, little is known regarding the extent to which face-responsive brain regions are selective for human faces. To study regional selectivity of face processing, we used functional magnetic resonance imaging to examine whole brain activation in response to human faces, dog faces, and houses. Fourteen healthy right-handed volunteers participated in a passive viewing, blocked experiment. Results indicate that the lateral fusiform gyrus (Brodmann's area 37) responds maximally to both dog and human faces when compared with other sites, followed by the middle/inferior occipital gyrus (BA 18/19). Sites that were activated by houses versus dog and human faces included the medial fusiform gyrus (BA 19/37), the posterior cingulate (BA 30), and the superior occipital gyrus (BA 19). The only site that displayed significant differences in activation between dog and human faces was the lingual/medial fusiform gyrus. In this site, houses elicited the strongest activation, followed by dog faces, while the response to human faces was negligible and did not differ from fixation. The parahippocampal gyrus/amygdala was the sole site that displayed significant activation to human faces, but not to dog faces or houses.

Color-based object recognition is typically concerned with building statistical descriptions from pixels that correspond to an object class and then using these models to detect pixels that belong to previously seen objects. Specific instances of color-based classification occur in a number of computer vision problems including background modeling, image-based retrieval, and multi-view object recognition and tracking. Color-based models are dependent on the intrinsic parameters of the camera(s) used to acquire them. Rather than view this as a problem, we propose to utilize this relationship to control (to a degree) how color models are acquired by modifying camera intrinsics. In particular, we introduce an algorithm that searches for the best set of camera settings that will facilitate class separability for a given set of colored objects. The method searches the space of color settings including white balance, hue and saturation in order to maximize classification accuracy of examp...
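
The settings-search idea can be sketched as a grid search that scores each candidate white balance / hue / saturation configuration by how separable the resulting per-object pixel distributions are. The Fisher-style separability score, the simulated capture function, and the parameter grid below are assumptions standing in for a real camera interface.

```python
import numpy as np
from itertools import product

def fisher_separability(pixels_by_class):
    """Ratio of between-class to within-class scatter of color pixels."""
    means = np.array([c.mean(axis=0) for c in pixels_by_class])
    grand = means.mean(axis=0)
    between = sum(len(c) * np.sum((m - grand) ** 2)
                  for c, m in zip(pixels_by_class, means))
    within = sum(np.sum((c - m) ** 2) for c, m in zip(pixels_by_class, means))
    return between / (within + 1e-9)

def best_camera_settings(capture, settings_grid):
    """Search camera settings that maximize color-class separability.

    `capture(settings)` is assumed to return one pixel array per colored
    object, acquired with the given settings; the grid search keeps the
    settings whose pixels score highest under the separability criterion.
    """
    scored = [(fisher_separability(capture(s)), s)
              for s in product(*settings_grid.values())]
    return max(scored)[1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def simulated_capture(settings):
        gain, sat = settings
        # Two objects whose colors separate better at higher saturation.
        a = rng.normal([120, 60, 60], 10 / sat, size=(200, 3)) * gain
        b = rng.normal([60, 120, 60], 10 / sat, size=(200, 3)) * gain
        return [a, b]

    grid = {"gain": [0.8, 1.0, 1.2], "saturation": [0.5, 1.0, 1.5]}
    print("best settings:", best_camera_settings(simulated_capture, grid))
```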

A machine vision algorithm to find the longest common subcurve of two 3-D curves is presented. The curves are represented by splines fitted through sequences of sample points extracted from dense range data. The approximated 3-D curves are transformed into 1-D numerical strings of rotation and translation invariant shape signatures, based on a multi-resolution representation of the curvature and torsion values of the space curves. The shape signature strings are matched using an efficient hashing technique that finds longest matching substrings. The results of the string matching stage are later verified by a robust, least-squares, 3-D curve matching technique, which also recovers the Euclidean transformation between the curves being matched. This algorithm is of average complexity O(n), where n is the number of sample points on the two curves. The algorithm has applications in assembly and object recognition tasks. Results of assembly experiments are included.

This paper presents a color object recognition scheme which proceeds in three sequential steps: segmentation, feature extraction and classification. We mainly focus on the first and the third steps here. A color watershed using global and local criteria is first described. A color contrast value is defined to select the best color space for segmenting color objects. Then, an architecture of binary neural networks is described. Its properties rely on the simplification of the recognition problem, leading to a noticeable increase in the classification rate. We conclude with the abilities of such a recognition scheme and present an automated cell sorting system.

We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
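
A toy sketch of the alternating scheme described above: an S layer correlates the image with templates, a C layer max-pools the responses over position, and the two stages are stacked. The oriented-edge templates, random S2 prototypes, and pooling sizes are illustrative and far smaller than the model described.

```python
import numpy as np
from scipy.signal import convolve2d

def s_layer(image, templates):
    """Template matching (S) layer: one response map per template."""
    return [convolve2d(image, t, mode="valid") for t in templates]

def c_layer(maps, pool=4):
    """Max-pooling (C) layer: local max over position builds invariance."""
    pooled = []
    for m in maps:
        h, w = (m.shape[0] // pool) * pool, (m.shape[1] // pool) * pool
        blocks = m[:h, :w].reshape(h // pool, pool, w // pool, pool)
        pooled.append(blocks.max(axis=(1, 3)))
    return pooled

if __name__ == "__main__":
    # Two-stage S/C hierarchy on a toy image: a minimal sketch of the
    # alternating template-matching / max-pooling scheme, not the full model.
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    edges = [np.array([[1, -1]]), np.array([[1], [-1]])]      # S1 templates
    c1 = c_layer(s_layer(image, edges))
    prototypes = [rng.random((3, 3)) for _ in range(4)]        # learned S2 patches
    c2 = [c_layer(s_layer(m, prototypes)) for m in c1]
    feature = np.array([p.max() for maps in c2 for p in maps]) # global C2 features
    print("C2 feature vector:", np.round(feature, 3))
```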

To study the selectivity of visual perceptual impairment in children with early brain injury, eight visual perceptual tasks (L94) were administered to congenitally disabled children both with and without risk for cerebral visual impairment (CVI). The battery comprised six object-recognition and two visuoconstructive tasks. Seven tasks were newly designed; for these, normative data are presented (age 2.75–6.50 years). Because the recognition tasks required object naming, each item included a canonical control drawing, and visual perceptual ability was evaluated relative to the non-verbal intelligence level instead of chronological age. In 22 multiply disabled children with no indications of CVI, the frequency of impairment did not exceed that in the reference sample for any L94 task. In contrast, in 57 5-year-old children who were at risk for CVI due to prematurity or birth asphyxia, a significant increase in the frequency of impairment was seen on six L94 tasks (range 12–38%). However, only five children had more than two impairments, indicating that the deficits were selective, not pervasive. We conclude that early brain lesions interfere with the functioning of particular visual subsystems, yet leave other subsystems intact and functioning within the normal range.

Geometric hashing is a model-based recognition technique based on matching of transformation-invariant object representations stored in a hash table. In the last decade, a number of enhancements have been suggested to the basic method improving its performance and reliability. One of the important enhancements is rehashing, improving the computational performance by dealing with the problem of non-uniform occupancy of hash bins. However, the proposed rehashing schemes aim to redistribute the hash entries uniformly, which is not appropriate for Bayesian approach, another enhancement optimizing the recognition rate in presence of noise. In this paper, we derive the rehashing for Bayesian voting scheme, thus improving the computational performance by minimizing the hash table size and the number of bins accessed, while maintaining optimal recognition rate.

Object recognition and image understanding have increasingly become major subjects of interest for research activity in digital photogrammetry. This paper provides an overview of object recognition in photogrammetry, beginning with a problem statement and brief paradigm description. In order to exemplify the concept, automatic interior orientation is presented as an object recognition problem. Subsequent sections discuss the current status of object recognition by identifying relevant criteria, such as modelling, system strategies and inference components. Such criteria are useful for comparing object recognition systems or proposed approaches. Strengths and weaknesses of current systems are summarized, followed by a more detailed analysis of the modelling problem. Finally, two new approaches (scale-space and fusion of multisensor/multispectral data) are mentioned. These approaches serve as examples of promising new trends which have the potential of advancing object recognition to a new level.

Hand rehabilitation, following stroke or hand surgery, is repetitive and of long duration, and can be facilitated with the assistance of complex, heavy and cumbersome sensor-based haptic gloves. The present paper describes a virtual glove, software based, which tracks hand movements by using images collected from webcams and numerical analysis. Finger forces are calculated from the deformations impressed on some objects (of known elastic coefficient) grasped by the patient's hand. The presented system is notable for simplicity, generality and low cost. Implementation and results of the proposed virtual glove will be the subject of a future paper.

The main goal of this survey is to present a complete analysis of object recognition methods based on local invariant features from a robotics perspective; a summary which can be used by developers of robot vision applications in the selection and development of object recognition systems. The survey includes a brief description of the main approaches reported in the literature, with more specific analyses of local interest point computation methods, local descriptor computation and matching methods, and geometric verification methods. Different methods are analyzed by considering the main requirements of robotics applications, such as real-time operation with limited on-board computational resources, and constrained observational conditions derived from the robot geometry (e.g. limited camera resolution). In addition, various object recognition systems are evaluated in a service-robot domestic environment, where the final task to be performed by a service robot is the manipulation of objects. It can be concluded from the results reported that (i) the most suitable keypoint detectors are ORB, BRISK, Fast Hessian, and DoG, (ii) the most suitable descriptors are ORB, BRISK, SIFT, and SURF, (iii) the final performance of object recognition systems using local invariant features under real-world conditions depends strongly on the geometric verification methods being used, and (iv) the best performing object recognition systems are built using ORB-ORB and DoG-SIFT keypoint-descriptor combinations. ORB-ORB based systems are faster, while DoG-SIFT are more robust to real-world conditions.
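
A hedged sketch of the kind of pipeline the survey evaluates, using one of the recommended keypoint/descriptor pairs (ORB-ORB) followed by geometric verification: descriptor matching with a ratio test and a RANSAC-estimated homography. The thresholds are common defaults, not values prescribed by the survey.

```python
import cv2
import numpy as np

def recognize_object(model_img, scene_img, min_inliers=15):
    """ORB keypoints, brute-force matching, and RANSAC verification."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(model_img, None)
    kp2, des2 = orb.detectAndCompute(scene_img, None)
    if des1 is None or des2 is None:
        return False, None

    # Descriptor matching with Lowe-style ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < min_inliers:
        return False, None

    # Geometric verification: a homography must explain enough matches.
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return (H is not None and int(mask.sum()) >= min_inliers), H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = rng.integers(0, 256, (200, 200)).astype(np.uint8)
    scene = cv2.copyMakeBorder(model, 40, 40, 40, 40, cv2.BORDER_CONSTANT, value=0)
    found, H = recognize_object(model, scene)
    print("object found:", found)
```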

The outreach of computer vision to non-traditional areas has enormous potential to enable new ways of solving real world problems. One such problem is how to incorporate technology in the effort to protect endangered and threatened species in the wild. This paper presents a snapshot of our interdisciplinary team's ongoing work in the Mojave Desert to build vision tools for field biologists to study the currently threatened Desert Tortoise and Mohave Ground Squirrel. Animal population studies in natural habitats present new recognition challenges for computer vision, where open set testing and access to just limited computing resources lead us to algorithms that diverge from common practices. We introduce a novel algorithm for animal classification that addresses the open set nature of this problem and is suitable for implementation on a smartphone. Further, we look at a simple model for object recognition applied to the problem of individual species identification. A thorough experimental analysis is provided for real field data collected in the Mojave desert.

In this paper we present the main features of software modules dedicated to the aid of visually impaired or blind users. The main aim of developing this software is to reduce or eliminate the need for separate dedicated devices for object recognition and motion detection. The software modules are designed for the Android operating system, used in the majority of smartphones today. There are two main trainable (ANN-based) modules, namely, the object recognition module and the motion detection module. The image processing algorithms used to identify objects and detect motion are described. Notification to the users is given by means of verbal messages in this system.

In this paper a non-linear extension to the synthetic discriminant function (SDF) is proposed. The SDF is a well known 2-D correlation filter for object recognition. The proposed nonlinear version of the SDF is derived from kernel-based learning. The kernel SDF is implemented in a nonlinear high dimensional space by using the kernel trick, and it can improve the performance of the linear SDF by incorporating the image class's higher order moments. We show that this kernelized composite correlation filter has an intrinsic connection with the recently proposed correntropy function. We apply this kernel SDF to face recognition, and simulations show that the kernel SDF significantly outperforms the traditional SDF and is robust in noisy data environments.
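
A compact numerical sketch of the kernel SDF idea: the linear SDF solves for weights from inner products of training images and prescribed correlation-peak values, and the kernel trick replaces every inner product with a kernel evaluation. The Gaussian kernel, ridge term, and synthetic data below are illustrative choices, not the paper's exact filter design.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma ** 2))

class KernelSDF:
    """Kernelized synthetic discriminant function (sketch).

    The linear SDF computes weights from inner products of the training
    images X and prescribed peak values u; replacing every inner product
    with a kernel evaluation gives the nonlinear version: alpha = K^-1 u
    with K_ij = k(x_i, x_j), and the test response is k(z, X) alpha.
    """

    def __init__(self, sigma=5.0, ridge=1e-6):
        self.sigma, self.ridge = sigma, ridge

    def fit(self, X, u):
        self.X = X
        K = gaussian_kernel(X, X, self.sigma) + self.ridge * np.eye(len(X))
        self.alpha = np.linalg.solve(K, u)
        return self

    def response(self, Z):
        return gaussian_kernel(Z, self.X, self.sigma) @ self.alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_class = rng.normal(size=(10, 64)) + 2.0   # vectorized training images
    impostors = rng.normal(size=(10, 64))
    X = np.vstack([true_class, impostors])
    u = np.r_[np.ones(10), np.zeros(10)]           # desired correlation peaks
    sdf = KernelSDF().fit(X, u)
    print("true-class response :", sdf.response(rng.normal(size=(1, 64)) + 2.0))
    print("impostor response    :", sdf.response(rng.normal(size=(1, 64))))
```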

Vehicle detection and classification are daily challenges for computer vision algorithms. The wide range of applications, together with the large amount of data available, raises interest towards these topics to the point at which new techniques with excellent performance are developed constantly. Still, when trying to generalize the results to different targets, issues arise due to the large number of variables that affect the scores. In this work, we describe how to approach the delicate choice of the best vision-based application for vehicle detection and classification on a real-life dataset, performing parameter evaluation and scoring for the GMG+SVM, MoG2+SVM and Faster R-CNN techniques. We also highlight how the best network choice is affected by the specific usage requirements.

This paper addresses the problem of parametric representation and estimation of complex planar curves in 2-D, surfaces in 3-D and nonplanar space curves in 3-D. Curves and surfaces can be defined either parametrically or implicitly, and we use the latter representation. A planar curve is the set of zeros of a smooth function of two variables x and y, a surface is the set of zeros of a smooth function of three variables x, y and z, and a space curve is the intersection of two surfaces, which are the sets of zeros of two linearly independent smooth functions of three variables x, y and z. For example, the surface of a complex object in 3-D can be represented as a subset of a single implicit surface, with similar results for planar and space curves. We show how this unified representation can be used for object recognition, object position estimation, and segmentation of objects into meaningful subobjects, that is, the detection of "interest regions" that are more complex than high curvature regions and, hence, more useful as features for object recognition. Fitting implicit curves and surfaces to data would ideally be based on minimizing the mean square distance from the data points to the curve or surface. Since the distance from a point to a curve or surface cannot be computed exactly by direct methods, the approximate distance, which is a first-order approximation of the real distance, is introduced, generalizing and unifying previous results. We fit implicit curves and surfaces to data by minimizing the approximate mean square distance, which is a nonlinear least squares problem. We show that in certain cases this problem reduces to the generalized eigenvector fit, which is the minimization of the sum of squares of the values of the functions that define the curves or surfaces under a quadratic constraint function of the data. This fit is computationally reasonable to compute, is readily parallelizable, and, hence, is easily computed in real time. In general, the generalized eigenvector fit provides a very good initial estimate for the iterative minimization of the approximate mean square distance. Although we are primarily interested in the 2-D and 3-D cases, the methods developed herein are dimension independent. We show that in the case of algebraic curves and surfaces, i.e., those defined by sets of zeros of polynomials, the minimizers of the approximate mean square distance and the generalized eigenvector fit are invariant with respect to similarity transformations. Thus, the generalized eigenvector fit is independent of the choice of coordinate system, which is a very desirable property for object recognition, position estimation, and the stereo matching problem. Finally, as applications of the previous techniques, we illustrate the concept of "interest regions"
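
The generalized eigenvector fit mentioned above can be illustrated for the simplest algebraic curve, a conic: minimizing the sum of squared polynomial values under a quadratic, data-dependent constraint reduces to a generalized eigenvalue problem. The Taubin-style constraint used below is one concrete instance of such a fit, not necessarily the exact normalization of the paper.

```python
import numpy as np
from scipy.linalg import eig

def fit_conic_taubin(points):
    """Generalized eigenvector fit of an implicit conic to 2-D points.

    Minimizes the sum of squared polynomial values (a first-order proxy
    for point-to-curve distance) subject to a quadratic, data-dependent
    normalization, which turns the fit into a generalized eigenvalue
    problem; the same construction extends to higher-degree curves and
    surfaces.
    """
    x, y = points[:, 0], points[:, 1]
    # Monomial design matrix for f(x, y) = c . [x^2, xy, y^2, x, y, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    Dx = np.column_stack([2 * x, y, np.zeros_like(x), np.ones_like(x),
                          np.zeros_like(x), np.zeros_like(x)])
    Dy = np.column_stack([np.zeros_like(x), x, 2 * y, np.zeros_like(x),
                          np.ones_like(x), np.zeros_like(x)])
    M = D.T @ D                      # sum of squared polynomial values
    N = Dx.T @ Dx + Dy.T @ Dy        # quadratic constraint from the data
    vals, vecs = eig(M, N + 1e-12 * np.eye(6))
    vals = np.real(vals)
    finite = np.isfinite(vals) & (vals > 0)
    c = np.real(vecs[:, np.argmin(np.where(finite, vals, np.inf))])
    return c / np.linalg.norm(c)

if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 100)
    ellipse = np.c_[3 * np.cos(t) + 1, 2 * np.sin(t) - 0.5]
    ellipse += np.random.default_rng(0).normal(scale=0.02, size=ellipse.shape)
    print("conic coefficients:", np.round(fit_conic_taubin(ellipse), 3))
```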

Shape representation and recognition is an important topic in many applications of computer vision and artificial intelligence, including character recognition, pattern recognition, machine monitoring, robot manipulation and production part recognition. In this paper, a structural model based on boundary information is proposed to describe the silhouette of planar objects (especially machined parts). The structural model describes objects by a set of primitives, each of which is represented by three geometric features: its length, curvature, and relative orientation. This representation scheme not only compresses the data, but also provides a compact and meaningful form to facilitate further recognition operations. Based on this model, object recognition is accomplished by using a multilayered feedforward neural network. The proposed model is transformation invariant, which offers the necessary flexibility for real-time implementation in automated manufacturing systems. In addition, the numerical results for a set of ten reference shapes indicate that the matching engine can achieve very high success rates using short recognition times.

While the study of geometry has mainly concentrated on single viewpoint (SVP) cameras, there is growing attention to more general non-SVP systems. Here, we study an important class of systems that inherently have a non-SVP: a perspective camera imaging through an interface into a medium. Such systems are ubiquitous: They are common when looking into water-based environments. The paper analyzes the common flat-interface class of systems. It characterizes the locus of the viewpoints (caustic) of this class and proves that the SVP model is invalid in it. This may explain geometrical errors encountered in prior studies. Our physics-based model is parameterized by the distance of the lens from the medium interface, besides the focal length. The physical parameters are calibrated by a simple approach that can be based on a single frame. This directly determines the system geometry. The calibration is then used to compensate for modeled system distortion. Based on this model, geometrical measurements of objects are significantly more accurate than if based on an SVP model. This is demonstrated in real-world experiments. In addition, we examine by simulation the errors expected by using the SVP model. We show that when working at a constant range, the SVP model can be a good approximation.

Research in learning algorithms and sensor hardware has led to rapid advances in artificial systems over the past decade. However, their performance continues to fall short of the efficiency and versatility of human behavior. In many ways, a deeper understanding of how human perceptual systems process and act upon physical sensory information can contribute to the development of better artificial systems. In the presented research, we highlight how the latest tools in computer vision, computer graphics, and virtual reality technology can be used to systematically understand the factors that determine how humans perform in realistic scenarios of complex task-solving.

Sleep has been shown to play a facilitating role in memory consolidation, whereas sleep deprivation leads to performance impairment both in humans and rodents. The effects of 4-h sleep deprivation on recognition memory were investigated in the Djungarian hamster (Phodopus sungorus). Because sleep during the first hours after daily torpor has many similarities to recovery from sleep deprivation, the effects of spontaneous torpor on object recognition were also assessed.

In this paper, we have presented a system utilizing Radio Frequency Identification (RFID) for the assistance of blind people. The proposed system incorporates a mobile RFID reader module with an integrated ZigBee transceiver [1] for transmitting the tag's information. Utensils and other objects in the house or building are embedded with passive RFID tags (transponders), and an audio file, recorded for and unique to each object, resides on the server. The system further incorporates a wayfinding technique by employing an RFID tag grid [4] with an ample separation area. The reader reads the tags and transmits the data wirelessly to the server PC, which in turn scans for the particular ID in the database and plays the corresponding audio file. A self-designed coordinate system with a server-side routing application is used for routing the person to a particular room requested, based on his current tag coordinates. The audio playback is relayed wirelessly using an FM transmitter to either a headset with an FM receiver or a smartphone's FM radio. The feasibility and reliability of the developed system were tested by deploying the proposed system at the Government Institute for Blind, Peshawar, Pakistan [6].

The detection, recognition and classification of features in a digital image is an important component of quality control systems in production and process engineering and of industrial systems monitoring in general. In this paper, a new pattern recognition system is presented that has been designed for the specific task of monitoring the quality of sheet-steel production in a rolling mill. The system is based on using both the Euclidean and fractal geometric properties of an imaged object to develop training data that is used in conjunction with a supervised learning procedure based on the application of a fuzzy inference engine. Thus, the classification method includes the application of a set of features which include fractal parameters such as the Lacunarity and Fractal Dimension, and thereby incorporates the characterisation of an object in terms of texture that, in this application, has metallurgical significance. The principal issues associated with object recognition are presented, including a new segmentation algorithm. The self-learning procedure for designing a decision making engine using fuzzy logic and membership function theory is also presented, and a new technique for the creation and extraction of information from a membership function is considered. The methods discussed, and the system developed, have a range of applications in 'machine vision' and automatic inspection. However, in this publication, we focus on the development and implementation of a surface inspection system designed specifically for monitoring surface quality in the manufacture of sheet-steel. For this publication, we include a demonstration version of the system which can be downloaded, installed and utilised by interested readers as discussed in Section VI.
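
The fractal part of the feature set can be sketched with standard estimators: a box-counting fractal dimension and a gliding-box lacunarity computed on a thresholded defect mask. The box sizes and the synthetic mask below are assumptions; the paper's exact estimators may differ.

```python
import numpy as np

def fractal_dimension(mask):
    """Box-counting estimate of the fractal dimension of a binary mask."""
    sizes, counts = [], []
    n = 2 ** int(np.floor(np.log2(min(mask.shape))))
    mask = mask[:n, :n]
    size = n // 2
    while size >= 1:
        blocks = mask.reshape(n // size, size, n // size, size)
        counts.append(blocks.any(axis=(1, 3)).sum())  # occupied boxes
        sizes.append(size)
        size //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

def lacunarity(mask, box=8):
    """Gliding-box lacunarity: variance-to-mean structure of box masses."""
    h, w = mask.shape
    masses = np.array([mask[i:i + box, j:j + box].sum()
                       for i in range(h - box + 1)
                       for j in range(w - box + 1)], dtype=float)
    return masses.var() / (masses.mean() ** 2 + 1e-9) + 1.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a thresholded image of a surface defect.
    defect = (rng.random((128, 128)) < 0.15).astype(np.uint8)
    print("fractal dimension:", round(fractal_dimension(defect), 3))
    print("lacunarity (8x8 boxes):", round(lacunarity(defect), 3))
```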

In modern times the number of on-road vehicles is expanding very quickly. Most of the time, it is important to verify the identity of these vehicles for enforcing transit regulations and overseeing parking garages, and it is hard to check this colossal number of moving vehicles manually. Consequently, building a precise automatic license plate recognition (ALPR) model including character recognition is important to ease the issues mentioned above. We have developed a model based on multiple types of license plates from different countries. The dataset of images was trained using YOLOv4, which uses CNN architectures. Character recognition was done using the Tesseract OCR after multiple image pre-processing techniques and morphological transformations. The proposed program has obtained an accuracy of 92% in license plate detection and 81% in character recognition.
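
A hedged sketch of the character-recognition stage only, assuming a plate crop already produced by the YOLOv4 detector: grayscale conversion, upscaling, Otsu binarization, morphological opening, and a Tesseract call restricted to plate characters. The exact pre-processing chain and Tesseract configuration in the paper may differ, and the file name in the usage stub is hypothetical.

```python
import cv2
import pytesseract

def read_plate(plate_bgr):
    """Character recognition for a detected license-plate crop."""
    gray = cv2.cvtColor(plate_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # Treat the crop as a single line of text restricted to plate characters.
    config = "--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    return pytesseract.image_to_string(cleaned, config=config).strip()

if __name__ == "__main__":
    crop = cv2.imread("plate_crop.jpg")   # hypothetical output of the plate detector
    if crop is not None:
        print("plate text:", read_plate(crop))
```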

In this study we tested the hypothesis that St John's wort (Hypericum perforatum) may counteract stress-induced memory impairment. The object recognition test and the Morris water maze were used to determine whether administration of H. perforatum (350 mg kg−1 for 21 days), standardized to 0.3% hypericin content, protects against non-spatial and/or spatial memory impairments due to chronic restraint stress (2 h daily for 21 days). A group of rats administered exogenous corticosterone at a dose of 5 mg kg−1 daily for 21 days, yielding plasma levels similar to those observed under stress, was run in parallel. In the first experiment all rats were tested for recognition memory in the object recognition test. On the following day, the animals were tested in the open field and elevated plus maze to control for the contribution of, respectively, motor and emotional effects of our treatments to the memory tests. In the second experiment, a new group of stressed animals was tested for spatial memory in the water maze. We observed that H. perforatum prevented the deleterious effects of both chronic restraint stress and long-term corticosterone on learning and memory as measured in both the object recognition and the water maze tests. The herb not only prevented stress- and corticosterone-induced memory impairments, but it significantly improved recognition memory (p < 0.01) in comparison to control. These results suggest that H. perforatum has the potential to prevent stress-related memory disorders.