Computer Vision Research Papers - Academia.edu
Since the beginning of the movement for quality management programs based on ISO 9000 and/or ISO/IEC 17025, which have been adopted by many types of industry, calibration laboratories have been under pressure to increase productivity. This work describes a computational system to automate the entire instrument calibration process. The proposed computer vision system reads the display of both analogue and digital instruments that lack computer communication interfaces such as GPIB or RS-232. To determine the instrument indication, the system applies an optical character recognition technique to digital displays, and Canny edge detection followed by the Hough transform for line localization to analogue displays, accelerating data acquisition and making it less error-prone. Consequently, it helps improve calibration and reduce costs, increasing the number of instruments with quality-assured measurements.
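For the analogue case, a minimal OpenCV sketch of the edge-detection-plus-line-localization step is shown below; the function name `read_analogue_needle` and the Canny/Hough thresholds are illustrative placeholders, not values from the paper.

```python
import cv2
import numpy as np

def read_analogue_needle(gray):
    """Hypothetical sketch: locate the pointer of an analogue dial.

    Edges are found with Canny's detector and the needle is taken as
    the strongest line reported by the Hough transform; thresholds
    are illustrative, not the paper's values.
    """
    edges = cv2.Canny(gray, 50, 150)                    # binary edge map
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)  # (rho, theta) pairs
    if lines is None:
        return None
    rho, theta = lines[0][0]                            # strongest line
    return np.degrees(theta)                            # needle angle proxy

# angle = read_analogue_needle(cv2.imread("dial.png", cv2.IMREAD_GRAYSCALE))
```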
A new method based on a computer vision and statistical learning system is proposed to estimate the wear level of cutting inserts and to identify the time for their replacement. AISI SAE 1045 and 4140 steel bars, 250 mm in length and 90 mm in diameter, were machined using a CNC parallel lathe. The image acquisition system comprised a Pulnix PE2015 B/W camera; a 70XL industrial zoom with a 1X extension tube; several lenses; a DCR®III regulated light source and a diffuse lighting system. The images were captured by a Matrox Meteor II card and pre-processed and segmented with Matlab. For each wear region, a set of 9 geometrical descriptors was obtained. Cluster analysis revealed three distinct categories corresponding to low, medium and high wear levels. The effectiveness of the classification was verified by means of an LDA class reconstruction, which reported a Fowlkes-Mallows index of 0.8571. The LDA likelihood estimates of the wear region provide a useful criterion for insert replacement.
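A minimal scikit-learn sketch of the cluster-then-reconstruct validation described above; the synthetic descriptor matrix `X` stands in for the paper's 9 geometrical descriptors per wear region, and the cluster count of 3 matches the low/medium/high categories.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import fowlkes_mallows_score

# X: one row of 9 geometrical descriptors per wear region (placeholder data)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 9))

# Step 1: unsupervised grouping into low / medium / high wear
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: LDA class reconstruction -- refit a linear classifier on the
# cluster labels and check how faithfully it reproduces them
lda = LinearDiscriminantAnalysis().fit(X, clusters)
reconstructed = lda.predict(X)

print("Fowlkes-Mallows index:", fowlkes_mallows_score(clusters, reconstructed))
```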
Computer vision algorithms are natural candidates for high performance computing systems. Algorithms in computer vision are characterized by complex and repetitive operations on large amounts of data involving a variety of data interactions (e.g., point operations, neighborhood operations, global operations). In this paper, we describe the use of the custom computing approach to meet the computation and communication needs of computer vision algorithms. By customizing the hardware architecture at the instruction level for every application, the optimal grain size for the problem at hand and the instruction granularity can be matched. A custom computing approach can also reuse the same hardware by reconfiguring at the software level for different levels of the computer vision application. We demonstrate the advantages of our approach using Splash 2, a Xilinx 4010-based custom computer.
We present solutions to two problems arising in the context of automatically focusing a general-purpose servo-controlled video camera on manually selected targets: (i) how best to determine the focus motor position providing the sharpest focus on an object point at an unknown distance; and (ii) how to compute the distance to a sharply focused object point. We decompose the first problem into two parts: how to measure the sharpness of focus with a criterion function, and how to optimally locate the mode of the criterion function. After analyzing defocus as an attenuation of high spatial frequencies and reviewing and experimentally comparing a number of possible criterion functions, we find that a method based on maximizing the magnitude of the intensity gradient proves superior to the others in being unimodal, monotonic about the mode, and robust in the presence of noise. We employ the Fibonacci search technique to optimally locate the mode of the criterion function. We solve the second problem by application of the thick-lens law. We can compute the distance to objects lying between 1 and 3 m with a precision of 2.5 percent, commensurate with the depth of field of the lens. The precision decreases quadratically with increasing object distance, but this effect is insignificant at the (small) object distances investigated. The solutions are computed in the time required to digitize and filter 11 images, a total of approximately 15 seconds per point for this implementation.
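A sketch of the two ingredients named above, a gradient-magnitude sharpness criterion and a Fibonacci search over discrete motor positions; this is a re-creation under stated assumptions, not the paper's code, and `grab_frame_at` is a hypothetical camera/motor interface.

```python
import numpy as np

def sharpness(img):
    """Focus criterion: total magnitude of the intensity gradient."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy).sum()

def fibonacci_search(f, a, b):
    """Locate the mode of a unimodal f over integer positions [a, b].

    Each probe position is evaluated at most once; values are cached.
    """
    fib = [1, 1]
    while fib[-1] < b - a:
        fib.append(fib[-1] + fib[-2])
    k = len(fib) - 1
    cache = {}
    def g(x):
        x = min(max(x, a), b)          # clamp padded probes into range
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    lo = a
    while k > 2:
        x1 = lo + fib[k - 2]
        x2 = lo + fib[k - 1]
        if g(x1) < g(x2):              # mode lies to the right of x1
            lo = x1
        k -= 1                         # shrink the bracketing interval
    g(lo)                              # ensure at least one evaluation
    return max(cache, key=cache.get)   # best motor position seen

# best = fibonacci_search(lambda p: sharpness(grab_frame_at(p)), 0, 1023)
```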
One aspect of a growing VR industry that developers have to face is the ethics behind the technology. This includes making sure that technology is readily available and accessible to as large a population as possible. Current consumer virtual reality (VR) headsets typically use two controllers to navigate a virtual environment, creating accessibility issues for potential users who cannot effectively operate a controller. We propose NaVRgate, a proof-of-concept system that removes the need for controllers by letting users navigate a virtual environment with facial expressions. The system uses the computer's webcam and computer-vision-based face and eye position tracking, with certain positional thresholds representing different facial expressions. To test this system, we designed a game environment in which a user navigates with either a controller or the face position tracker, collecting a set of orbs scattered around the map as quickly as possible, allowing us to compare the efficiency of the novel computer vision method against the traditional controller. Users were also surveyed on the difficulty of use and their experience with each control input method. This paper details the development process, from early drafts to the statistical experiment constructed to determine the efficiency of head gestures.
Plants are fundamentally important to life. Key research areas in plant science include plant species identification, weed classification using hyperspectral images, monitoring plant health and tracing leaf growth, and the semantic interpretation of leaf information. Botanists identify plant species by discriminating between the shapes of the leaf, tip, base, leaf margin and leaf veins, as well as the texture of the leaf and the arrangement of leaflets in compound leaves. Because of the increasing demand for experts and growing calls to document biodiversity, there is a need for intelligent systems that recognize and characterize leaves so as to scrutinize a particular species, the diseases that affect it, the pattern of leaf growth, and so on. We review several image processing methods for the feature extraction of leaves, given that feature extraction is a crucial technique in computer vision. Because computers cannot comprehend images directly, images must be converted into features by individually analyzing shapes, colors, textures and moments. Images that look alike may still differ in geometric and photometric terms. In our study, we also discuss certain machine learning classifiers for an analysis of different species of leaves.
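As a toy illustration of shape-and-moment feature extraction from a segmented leaf; the specific descriptors used here (Hu moments, circularity, aspect ratio) are common choices assumed by this sketch, not necessarily the ones reviewed.

```python
import cv2
import numpy as np

def leaf_features(mask):
    """Descriptor vector for a binary leaf silhouette (uint8 mask).

    Combines the 7 rotation/scale/translation-invariant Hu moments
    with two simple shape ratios often used for leaves.
    """
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
    c = max(cnts, key=cv2.contourArea)              # largest blob = leaf
    hu = cv2.HuMoments(cv2.moments(c)).ravel()      # 7 moment invariants
    area = cv2.contourArea(c)
    perim = cv2.arcLength(c, True)
    circularity = 4 * np.pi * area / perim ** 2     # 1.0 for a circle
    x, y, w, h = cv2.boundingRect(c)
    aspect = w / float(h)                           # elongation cue
    return np.concatenate([hu, [circularity, aspect]])
```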
The 7th International Conference on Artificial Intelligence and Applications (AI 2021) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Artificial Intelligence. The Conference seeks significant contributions to all major fields of Artificial Intelligence and Soft Computing, in both theoretical and practical aspects. The aim of the Conference is to provide a platform for researchers and practitioners from both academia and industry to meet and share cutting-edge developments in the field. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences describing significant advances in the following areas, among others.
This paper concerns the problem of actively searching for and localizing ground features by a coordinated team of air and ground robotic sensor platforms. The approach builds on the well-known Decentralized Data Fusion (DDF) methodology. In particular, it brings together established representations developed for identification and linearized estimation problems to jointly address feature detection and localization. This provides transparent and scalable integration of sensor information from air and ground platforms. As in previous studies, an information-theoretic utility measure and local control strategy drive the robots to uncertainty-reducing team configurations. Complementary characteristics in terms of coverage and accuracy are revealed through analysis of the observation uncertainty for air and ground on-board cameras. Implementation results for a detection and localization example indicate the ability of this approach to realize such collaborative potential scalably and efficiently.
Abstract—CAMTUAL is an interactive mobile app that explores fundamental aspects of deep learning to offer smartphone users an automatic 2D-to-3D converter using the phone's built-in camera. The idea was inspired by the MagicToon app [1]. As 3D video viewing gains importance and the virtual reality market takes off, demand for 3D devices and content is growing rapidly. Producing 3D video, however, remains a major challenge. In this paper we present a mobile app that uses a deep neural network to automatically convert 2D video and images to a stereoscopic 3D format [2]. Unlike mobile apps that do not use automatic 2D-to-3D conversion algorithms, our app relies on a method trained end-to-end on stereo pairs extracted from existing 3D videos. This approach outperforms baselines in both human evaluations and quantitative metrics.
Kinect for Xbox 360 is a low-cost controller-free device originally designed by Microsoft Corporation for gaming and entertainment. It is equipped with one IR camera, one color camera and one IR projector to produce images with voxels (depth pixels). This additional image dimension makes it an attractive device for robotics applications. This work presents a solution using the Kinect sensor to address one important aspect of autonomous mobile robotics: obstacle avoidance.
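A minimal reactive sketch of depth-based obstacle avoidance on a Kinect-style depth image; the band split, distances and the `steer_from_depth` policy are illustrative assumptions, not the paper's method.

```python
import numpy as np

def steer_from_depth(depth_m, stop_dist=0.8):
    """Reactive steering from an HxW depth image in metres (0 = no return).

    Splits the lower half of the frame into left/centre/right bands and
    turns away from the nearest obstruction. Thresholds are illustrative.
    """
    h, w = depth_m.shape
    roi = depth_m[h // 2:, :]                    # ground-level band
    valid = np.where(roi > 0, roi, np.inf)       # ignore missing returns
    thirds = np.array_split(valid, 3, axis=1)
    nearest = [band.min() for band in thirds]    # left, centre, right
    if nearest[1] > stop_dist:
        return "forward"
    return "turn_right" if nearest[0] < nearest[2] else "turn_left"
```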
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bimonthly open access peer-reviewed journal that publishes articles contributing new results in all areas of Artificial Intelligence and its applications. It is an international journal intended for professionals and researchers in all fields of AI, including programmers and software and hardware manufacturers. The journal also aims to publish new work in the form of special issues on emerging areas in Artificial Intelligence and applications.
Purpose - The purpose of this paper is to find real-time parking locations for four-wheelers. Design/methodology/approach - Real-time parking availability using dedicated infrastructure entails high installation and maintenance costs, which not all urban cities can afford. The authors present a statistical block matching algorithm (SBMA) for real-time parking management in small cities such as Bhavnagar, using an existing surveillance CCTV system that was not installed for parking applications. In particular, data from a camera situated in a mall was used to detect the parking status of specific parking places using a region of interest (ROI). The proposed method computes the mean value of the pixels inside the ROI using blocks of different sizes (8 × 10 and 20 × 35), and the values are compared across frames. When the difference between frames exceeds a threshold, the method reports "no parking space for that place"; otherwise, it reports "parking place available". This information is then used to draw a bounding box on the parking places, colored green/red to show availability. Findings - The real-time feedback loop (car parking positions) helps the presented model dynamically refine the parking strategy and the parking positions offered to users. A whole-day experiment/validation is reported, with the method evaluated using pattern recognition metrics for classification: precision, recall and F1 score. Originality/value - The authors found real-time parking availability for Himalaya Mall in Bhavnagar, Gujarat, on video from 18th June 2018 using the SBMA method, with acceptable computational time for finding parking slots. The limitations of the presented method and future work are discussed at the end of the paper.
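A compact re-creation of the block-mean comparison described above; the block size and threshold are placeholder values, and drawing the green/red box is left to the caller.

```python
import numpy as np

def parking_free(frame, ref, roi, block=(8, 10), thresh=12.0):
    """SBMA-style occupancy check for one parking place (illustrative).

    frame, ref: current and reference grayscale images; roi: (x, y, w, h).
    The ROI is tiled into blocks, per-block mean intensities are compared
    with the reference, and a large average difference is read as
    'occupied'. Block size and threshold are placeholder values.
    """
    x, y, w, h = roi
    cur = frame[y:y + h, x:x + w].astype(float)
    old = ref[y:y + h, x:x + w].astype(float)
    bh, bw = block
    diffs = []
    for r in range(0, h - bh + 1, bh):
        for c in range(0, w - bw + 1, bw):
            diffs.append(abs(cur[r:r + bh, c:c + bw].mean()
                             - old[r:r + bh, c:c + bw].mean()))
    return np.mean(diffs) < thresh   # True -> draw green box, else red
```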
A method is presented for the recovery of optical flow. The key idea is that the local spatial structure of optical flow, with the exception of surface boundaries, is usually rather coherent and can thus be appropriately approximated by a linear vector field. According to the proposed method, the optical flow components and their first-order spatial derivatives are computed.
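In our own notation (a sketch, not the paper's estimator), a linear vector field means v(x, y) = A [x, y]^T + b, and a least-squares fit over a local window recovers both the flow components and their first-order spatial derivatives:

```python
import numpy as np

def fit_linear_flow(xs, ys, us, vs):
    """Fit v(x, y) = A [x, y]^T + b to sampled flow vectors (least squares).

    xs, ys: sample coordinates; us, vs: flow components at those samples.
    Returns the 2x2 matrix A (the first-order spatial derivatives of the
    flow) and the translation b.
    """
    ones = np.ones_like(xs, dtype=float)
    G = np.column_stack([xs, ys, ones])          # design matrix
    pu, *_ = np.linalg.lstsq(G, us, rcond=None)  # u = a11 x + a12 y + b1
    pv, *_ = np.linalg.lstsq(G, vs, rcond=None)  # v = a21 x + a22 y + b2
    A = np.array([[pu[0], pu[1]], [pv[0], pv[1]]])
    b = np.array([pu[2], pv[2]])
    return A, b
```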
This work proposes pose-based visual servoing control, using a planar homography to estimate the position and orientation of a miniature helicopter relative to a known pattern. With the current flight information available, the nonlinear underactuated controller presented in one of our previous works, which covers all flight phases, is used to guide the rotorcraft during a 3D trajectory tracking task. The simulation framework and the results obtained with it are then presented and discussed, validating the proposed controller when a visual system is used to determine the helicopter's pose.
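A minimal OpenCV sketch of recovering pose candidates from a planar homography; the intrinsics `K`, the function name `pose_from_pattern` and the point-matching step are placeholder assumptions, and the paper's own estimation details may differ.

```python
import cv2
import numpy as np

# K: camera intrinsics; pts_pattern / pts_image: matched 2D points between
# the known planar pattern and the current onboard view (placeholder names).
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])

def pose_from_pattern(pts_pattern, pts_image):
    """Estimate relative pose candidates from a planar homography."""
    H, _ = cv2.findHomography(pts_pattern, pts_image, cv2.RANSAC, 3.0)
    n, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    # Up to four (R, t) candidates are returned; disambiguation (e.g. by
    # requiring the pattern to lie in front of the camera) is omitted here.
    return Rs, ts, normals
```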
For mobile robot manipulation, autonomous object detection and localization is still an open issue. This paper presents a method for detecting and localizing simple colored geometric objects, such as cubes, prisms and cylinders, lying on a table. The proposed method uses a passive stereovision system and consists of two steps. The first is colored object detection, which combines color segmentation with edge detection to delimit colored regions. The second step is pose recovery, where the colored-object detection mask is combined with the disparity map from the stereo camera. This step is important for rejecting the noise inherent in the stereo correlation process. The filtered 3D data are then used to determine the main plane on which the objects rest (the table); each object's footprint is used to localize it in the stereo camera reference frame and, from there, in the world reference frame.
This manuscript presents autonomous navigation of a mobile robot using SLAM, relying on active stereo vision. We show a low-level software framework that is necessary when vision is used for multiple purposes such as obstacle discovery. The system incorporates a number of SLAM-based routines while relying on the stereo vision mechanism. It was implemented and tested on a mobile robot platform in an autonomous navigation experiment in an indoor environment.
In this paper an unsupervised colour image segmentation algorithm is presented. The method combines the advantages of split-and-merge and region-growing approaches with the use of the RGB and HSV colour representation models. The effectiveness of the proposed method has been verified by applying the algorithm to three test images with homogeneous, spatially compact and continuous regions. The proposed algorithm outperforms the other analysed techniques while requiring shorter processing time.
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, "natural" images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled "natural" images in guiding that progress. In particular, we show that a simple V1-like model (a neuroscientist's "null" model, which should perform poorly at real-world visual object recognition tasks) outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a "simpler" recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition: real-world image variation.
In this paper we propose to develop a device that can be used by the visually challenged to read normal English books. We focus on letter-by-letter segmentation, recognition and transliteration to the Braille format. The device would use on-board software to perform the recognition and conversion. The recognized characters are transmitted to an interface that converts them to the Braille format, which can be read by touch. The device would be cheaper than its counterparts.
This paper introduces a new rigorous theoretical framework to address discrete MRF-based optimization in computer vision. The framework exploits the powerful technique of Dual Decomposition. It is based on a projected subgradient scheme that attempts to solve an MRF optimization problem by first decomposing it into a set of appropriately chosen subproblems, and then combining their solutions in a principled way. In order to determine the limits of this method, we analyze the conditions that these subproblems have to satisfy and demonstrate the extreme generality and flexibility of such an approach. We thus show that by appropriately choosing what subproblems to use, one can design novel and very powerful MRF optimization algorithms. For instance, in this manner we are able to derive algorithms that: 1) generalize and extend state-of-the-art message-passing methods, 2) optimize very tight LP-relaxations to MRF optimization, and 3) take full advantage of the special structure that may exist in particular MRFs, allowing the use of efficient inference techniques such as, e.g., graph-cut-based methods. Theoretical analysis of the bounds associated with the different algorithms derived from our framework, together with experimental results/comparisons using synthetic and real data for a variety of tasks in computer vision, demonstrates the extreme potential of our approach.
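As a rough sketch in our own notation (the exact subproblem choice and variable encoding vary across the algorithms derived in the paper): the energy is split into slave problems coupled only through a consistency constraint, whose dualization yields a concave dual maximized by projected subgradient steps.

```latex
\max_{\lambda}\; g(\lambda)
  \;=\; \sum_{i} \min_{x^{i}} \Big( E_{i}(x^{i}) + \lambda^{i}\!\cdot x^{i} \Big),
\qquad \sum_{i} \lambda^{i} = 0,
\qquad
\lambda^{i} \;\leftarrow\; \lambda^{i}
  + \alpha_{t} \Big( \hat{x}^{i} - \tfrac{1}{n}\sum_{j} \hat{x}^{j} \Big),
```

where \(\hat{x}^{i}\) is a minimizer of the \(i\)-th slave problem and \(\alpha_{t}\) is the step size; the update keeps the multipliers on the constraint set \(\sum_i \lambda^i = 0\).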
Finite element models of existing structures often behave differently from the structure itself. Model updating techniques are used to enhance the numerical model so that it behaves like the real structure. Experimental data are used in model updating techniques to identify the parameters of the numerical model. In civil infrastructure these techniques have used either static or dynamic measurements, separately. This paper studies how a Bayesian updating framework behaves when both static and dynamic data are used to update the model. Displacements at specific structure locations are obtained in static tests using a computer vision method. High-density mode shapes and natural frequencies are obtained using a moving accelerometer setup. The static data and the modal characteristics are combined in a Bayesian model updating technique that accounts for the incompleteness and uncertainty of the data as well as the possible non-uniqueness of the solution. Results show how the posterior probability density function changes when different types of information are included for updating.
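In our own notation (a sketch of the combination, not the paper's exact formulation), with \(\theta\) the model parameters, \(D_{s}\) the vision-measured static displacements and \(D_{d}\) the modal data, and assuming independent measurement errors across the two test types, the joint Bayesian update reads:

```latex
p(\theta \mid D_{s}, D_{d}) \;\propto\;
  p(D_{s} \mid \theta)\; p(D_{d} \mid \theta)\; p(\theta),
```

so adding either data type simply multiplies in another likelihood factor, which is why the posterior density reshapes as each type of information is included.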
Conoscopic holography is an interferometric technique that permits the recording of three-dimensional objects. A two-step scheme is presented to recover an opaque object's shape from its conoscopic hologram, consisting of a reconstruction algorithm to give a first estimate of the shape and an iterative restoration procedure that uses the object's support information to make the reconstruction more robust. The existence, uniqueness, and stability of the solution, as well as the convergence of the restoration algorithm, are studied. A preliminary experimental result is presented.
In this paper, we present a new, computationally efficient simulator for time-varying multi-path (fast fading) vector channels that can be used to evaluate the performance of antenna-array wireless receivers at the base station. The development of the simulator is based on emulating the spatio-temporal correlation properties of the vector channel. The channel is modeled as a single-input multi-output finite impulse response (FIR) system with time-varying coefficients, obtained by applying a space-time correlation shaping transformation to independent random sequences. The various parts of the new simulator are detailed, and channel simulation realizations are presented and discussed.
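A minimal numpy sketch of the correlation-shaping idea: independent complex Gaussian sequences are colored by a Cholesky factor so that the generated coefficient vectors carry a prescribed correlation. The exponential correlation model and all parameter values are assumptions of this sketch.

```python
import numpy as np

def correlated_taps(R, n_draws, rng=None):
    """Generate channel coefficient vectors with correlation matrix R.

    R: desired space-time correlation matrix (Hermitian PSD). Independent
    complex Gaussian sequences are shaped by the Cholesky factor of R so
    that E[h h^H] = R, mirroring the correlation-shaping idea in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(R)
    n = R.shape[0]
    w = (rng.standard_normal((n, n_draws))
         + 1j * rng.standard_normal((n, n_draws))) / np.sqrt(2)
    return L @ w                     # each column is one coefficient vector

# Example: 4-element array with exponential spatial correlation (assumed model)
rho = 0.7
R = np.array([[rho ** abs(i - j) for j in range(4)] for i in range(4)],
             dtype=complex)
H = correlated_taps(R, 1000)
print(np.round(H @ H.conj().T / 1000, 2))   # should approximate R
```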
This paper proposes a system that relates objects in an image using occlusion cues and arranges them according to depth. The system does not rely on a priori knowledge of the scene structure; it focuses on detecting special points, such as T-junctions and highly convex contours, to infer the depth relationships between objects in the scene. The system makes extensive use of the binary partition tree as a hierarchical region-based image representation, jointly with a new approach for candidate T-junction estimation. Since some regions may not involve T-junctions, occlusion is also detected by examining convex shapes on region boundaries. Combining T-junctions and convexity leads to a system that relies only on low-level depth cues rather than semantic information, yet performs comparably to or better than the state of the art without assuming any particular scene type.
Food recognition is difficult because food items are deformable objects that exhibit significant variations in appearance. We believe the key to recognizing food is to exploit the spatial relationships between different ingredients (such as meat and bread in a sandwich). We propose a new representation for food items that calculates pairwise statistics between local features computed over a soft pixel-level segmentation of the image into eight ingredient types. We accumulate these statistics in a multi-dimensional histogram, which is then used as a feature vector for a discriminative classifier. Our experiments show that the proposed representation is significantly more accurate at identifying food than existing methods.
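A toy version of accumulating pairwise ingredient statistics into a histogram feature; hard labels and two pixel offsets stand in here for the paper's soft segmentation and richer pairwise statistics.

```python
import numpy as np

def pairwise_histogram(labels, offsets=((0, 1), (1, 0))):
    """Co-occurrence histogram over a pixel-level ingredient label map.

    labels: HxW integer map with values in [0, 8), i.e. eight ingredient
    types as in the text. Counts ordered label pairs at the given pixel
    offsets and returns a flattened, normalized vector usable as a
    classifier feature. Offsets are illustrative.
    """
    K = 8
    hist = np.zeros((K, K), dtype=float)
    h, w = labels.shape
    for dy, dx in offsets:
        a = labels[max(0, -dy):h - max(0, dy),
                   max(0, -dx):w - max(0, dx)]
        b = labels[max(0, dy):, max(0, dx):][:a.shape[0], :a.shape[1]]
        np.add.at(hist, (a.ravel(), b.ravel()), 1)  # count ordered pairs
    return (hist / hist.sum()).ravel()
```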
Miki Yamada, Osamu Yamaguchi, Akiko Nakashima and Takeshi Mita ({miki.yamada, osamu1.yamaguchi, akiko.nakashima, takeshi.mita}@toshiba.co.jp; Toshiba, Kawasaki 212-8582, Japan), and Kazuhiro Fukui (Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Ten-noudai, ...
This paper describes a 2D motion detection method developed for a road traffic monitoring system. Such systems collect data for traffic management, increasing road security and traffic capacity. The objective was to allow the construction of a road traffic monitoring system working in real time on low-cost hardware, namely a video camera, an image acquisition board and a Pentium 133 MHz personal computer. For the system to work in real time, the 2D motion detection algorithm had to be simple while still providing the desired high vehicle detection rate.
An off-line Nepali handwritten character recognition system, based on neural networks, is described in this paper. A good set of spatial features is extracted from character images. The accuracy and efficiency of Multilayer Perceptron (MLP) and Radial Basis Function (RBF) classifiers are analyzed. The recognition systems are tested with three datasets of Nepali handwritten numerals, vowels and consonants. The strength of this research lies in the efficient feature extraction and comprehensive recognition techniques, which yield recognition accuracies of 94.44% on the numeral dataset, 86.04% on the vowel dataset and 80.25% on the consonant dataset. In all cases, the RBF-based system outperforms the MLP-based system, although it takes slightly longer to train.
This paper describes an underwater walking robotic system being developed under the name AQUA, the goals of the AQUA project, the overall hardware and software design, the basic hardware and sensor packages that have been developed, and some initial experiments. The robot is based on the RHex hexapod robot and uses a suite of sensing technologies, primarily based on computer vision and INS, to allow it to navigate and map clear shallow-water environments. The sensor-based navigation and mapping algorithms are based on the use of both artificial floating visual and acoustic landmarks as well as on naturally occurring underwater landmarks and trinocular stereo.
Recent perception techniques have improved enormously at autonomously and accurately predicting the states of delivery robots, but the precision and accuracy achieved in recent research come at the cost of high computation for autonomous locomotion, expensive sensors and server dependency. Low-computation algorithms are more viable for delivery robots than the pipelines used in autonomous vehicles or prevailing delivery robots. A blend of different autonomy approaches, including semantic segmentation, obstacle detection, obstacle tracking, and high fidelity maps, is presented in our work. Moreover, LCPP comprises low-computation algorithms feasible on embedded devices, running more efficiently and accurately. We also analyze state-of-the-art algorithms via practical applications. Low-computation algorithms trade some accuracy, which does not scale in proportion to computation. Finally, the study argues that this approach is more realizable for delivery robots than Level 5 autonomy.
Tracking objects or image regions undergoing non-rigid transformation is a challenging and central problem in computer vision. It is further complicated when the object or region to be tracked is similar to nearby objects/regions or to the background. This paper presents a region of interest (ROI) tracking approach based on an image registration algorithm inspired by Maxwell's demons. The approach does not require features of an object/region to be extracted; it works only on pixel intensities. This makes it suitable for tracking objects/regions undergoing non-rigid transformations and having little contrast with the background. The approach extends naturally to more complex problems such as multiple-ROI tracking and handling almost arbitrary changes in the ROI. We demonstrate the proposed non-rigid ROI tracking algorithm on endoscopy video data, one of the potential applications of the proposed algorithm.
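A sketch of intensity-only ROI propagation using the demons registration filter from SimpleITK, a stock implementation of the same family of algorithms; this is not the authors' code, and `track_roi` plus all parameter settings are assumptions of the sketch.

```python
import SimpleITK as sitk

def track_roi(prev_frame, curr_frame, roi_mask):
    """Propagate an ROI between frames with demons registration.

    prev_frame, curr_frame: 2-D grayscale numpy arrays; roi_mask: binary
    numpy array marking the ROI in prev_frame. The demons filter operates
    directly on pixel intensities -- no feature extraction, matching the
    approach described above.
    """
    fixed = sitk.GetImageFromArray(curr_frame.astype("float32"))
    moving = sitk.GetImageFromArray(prev_frame.astype("float32"))
    demons = sitk.DemonsRegistrationFilter()
    demons.SetNumberOfIterations(50)
    demons.SetStandardDeviations(1.5)        # smoothing of the field
    field = sitk.Cast(demons.Execute(fixed, moving), sitk.sitkVectorFloat64)
    tx = sitk.DisplacementFieldTransform(field)
    mask = sitk.GetImageFromArray(roi_mask.astype("float32"))
    warped = sitk.Resample(mask, fixed, tx, sitk.sitkNearestNeighbor, 0.0)
    return sitk.GetArrayFromImage(warped) > 0.5
```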
In this paper, we propose a neural network model for human emotion and gesture classification. We demonstrate that the proposed architecture is an effective tool for real-time processing of customers' behavior in distributed on-land systems, such as information kiosks, automated cashiers and ATMs. The proposed approach combines recent biometric techniques with a neural network approach for real-time emotion and behavioral analysis. In a series of experiments, the emotions of human subjects were recorded, recognized, and analyzed to give statistical feedback on the overall emotions of a number of targets within a certain time frame. The results of the study allow automatic tracking of users' behavior based on a limited set of observations.
Figure 1. Creepy Tracker is an open-source toolkit that provides spatial information about people and interactive surfaces. To do this, it uses multiple depth-sensing cameras (A, B). It supports the design of systems that handle, for instance, (C) interactive tabletops, (D) vertical surfaces, (E) floor projections, and even capture avatars for (F) telepresence or (G) virtual reality.
It is common practice to utilize evidence from biological and psychological vision experiments to develop computational models for low-level feature extraction. The receptive profiles of simple cells in mammalian visual systems have been found to closely resemble Gabor filters. ...
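A minimal OpenCV sketch of the Gabor-filter model of simple-cell receptive profiles mentioned above; the bank's parameter values are illustrative, not drawn from the text.

```python
import cv2
import numpy as np

# A Gabor filter bank as a stand-in for V1 simple-cell receptive profiles.
kernels = [
    cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=t,
                       lambd=10.0, gamma=0.5, psi=0)
    for t in np.arange(0, np.pi, np.pi / 8)      # 8 orientations
]

def gabor_responses(gray):
    """Stack of rectified filter responses, one map per orientation."""
    return np.stack([np.abs(cv2.filter2D(gray.astype(float), -1, k))
                     for k in kernels])
```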
Extraction of text from images and video is an important step in building efficient indexing and retrieval systems for multimedia databases. We adopt a hybrid approach to such text extraction by exploiting a number of characteristics of text blocks in color images and video frames. Our system detects both caption text and scene text of different fonts, sizes, colors and intensities. We have developed an application for on-line extraction and recognition of text from videos; the extracted text is used to retrieve video clips based on any given keyword. The application is available on the web for readers to repeat our experiments and to try text extraction and retrieval on their own videos.
This paper presents a novel approach to recognizing driver activities using a multi-perspective (i.e., four camera views) multi-modal (i.e., thermal infrared and color) video-based system for robust and real-time tracking of important body parts. The multi-perspective characteristics of the system provide redundant trajectories of the body parts, while the multi-modal characteristics provide robustness and reliability of feature detection and tracking. The combination of a deterministic activity grammar (called 'operation-triplet') and a Hidden Markov model-based classifier provides semantic-level analysis of human activity. The application context for this research is intelligent vehicles and driver assistance systems. Experimental results in real-world street driving demonstrate the effectiveness of the proposed system.
We address the problem of stitching together the three videos acquired by a special rig consisting of three high resolution cameras. The three cameras are placed in the horizontal plane on the top of the service vehicle in a way that the fields of view of the lateral cameras overlap with the field of view of the middle camera. In the presented approach, the transformations between the common parts of the corresponding video frames are approximated by planar projective mappings. The required mappings are estimated by aligning the common parts of the three views in corresponding video frames. The experiments have been performed on production EuroRAP videos provided by our industrial partner. The obtained results confirm that the presented approach would simplify the existing road inspection procedures relying on the recorded multi-view video.
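A sketch of estimating the planar projective mapping between a lateral view and the middle view from their overlap; the ORB features, match filtering and RANSAC threshold are choices of this sketch, not necessarily the paper's.

```python
import cv2
import numpy as np

def side_to_middle_mapping(img_side, img_mid):
    """Estimate the homography aligning a lateral view with the middle view."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img_side, None)
    k2, d2 = orb.detectAndCompute(img_mid, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:300]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 4.0)
    return H   # warp with cv2.warpPerspective(img_side, H, canvas_size)
```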
Recognizing human action from image sequences is an active area of research in computer vision. In this paper, we present a novel method for human action recognition from image sequences at different viewing angles that uses the Cartesian components of optical flow velocity and human body shape feature vector information. We use principal component analysis to reduce the high-dimensional shape feature space to a low-dimensional one. We represent each action using a set of multidimensional discrete hidden Markov models, modeling each action for any viewing direction. We evaluated the proposed method on the KU gesture database. Experimental results on this database of different actions show that our method is robust.
Real-time eye and iris tracking is important for hands-off gaze-based password entry, instrument control by paraplegic patients, Internet user studies, as well as homeland security applications. In this project, a smart camera, LabVIEW and vision software tools are utilized to generate eye detection and tracking algorithms. The algorithms are uploaded to the smart camera for on-board image processing. Eye detection refers to finding eye features in a single frame. Eye tracking is achieved by detecting the same eye features across multiple image frames and correlating them to a particular eye. The algorithms are tested for eye detection and tracking under different conditions, including different angles of the face, head motion speed, and eye occlusions, to determine their usability for the proposed applications. This paper presents the implemented algorithms and their performance results on the smart camera.
Image blur and noise are difficult to avoid in many situations and can often ruin a photograph. We present a novel image deconvolution algorithm that deblurs and denoises an image given a known shift-invariant blur kernel. Our algorithm uses local color statistics derived from the image as a constraint in a unified framework that can be used for deblurring, denoising, and upsampling. A pixel's color is required to be a linear combination of the two most prevalent colors within a neighborhood of the pixel. This two-color prior has two major benefits: it is tuned to the content of the particular image, and it serves to decouple edge sharpness from edge strength. Our unified algorithm for deblurring and denoising outperforms previous methods that are specialized for these individual applications. We demonstrate this with both qualitative results and extensive quantitative comparisons showing that we can outperform previous methods by approximately 1 to 3 dB.
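In our notation (a restatement of the prior described above, not the paper's exact formulation), the two-color prior constrains each pixel to the line segment between the two locally prevalent colors:

```latex
I(p) \;=\; \alpha(p)\, C_{1}(p) + \bigl(1 - \alpha(p)\bigr)\, C_{2}(p),
\qquad \alpha(p) \in [0, 1],
```

where \(C_{1}(p)\) and \(C_{2}(p)\) are the two most prevalent colors in a neighborhood of \(p\). Edge sharpness is then carried by how quickly \(\alpha(p)\) changes across the edge, while edge strength is the distance between \(C_{1}\) and \(C_{2}\), which is what decouples the two.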
We present novel approaches for fully automated extraction of tree-like tubular structures from 3-D image stacks. A 4-D Open-Curve Active Contour (Snake) model is proposed for simultaneous 3-D centerline tracing and local radius estimation. An image energy term, stretching term, and a novel region-based radial energy term constitute the energy to be minimized. This combination of energy terms allows the 4-D open-curve snake model, starting from an automatically detected seed point, to stretch along and fit the tubular structures like neurites and blood vessels. A graph-based curve completion approach is proposed to merge possible fragments caused by discontinuities in the tree structures. After tree structure extraction, the centerlines serve as the starting points for a Fast Marching segmentation for which the stopping time is automatically chosen. We illustrate the performance of our method with various datasets.
Computer vision applications for mobile phones are gaining increasing attention due to several practical needs resulting from the popularity of digital cameras in today's mobile phones. In this work, we consider the task of face detection and authentication in mobile phones and experimentally analyze a face authentication scheme using Haar-like features with AdaBoost for face and eye detection, and the Local Binary Pattern (LBP) approach for face authentication. For comparison, another approach to face detection using skin color for fast processing is also considered and implemented. Despite the limited CPU and memory capabilities of today's mobile phones, our experimental results show good face detection performance and average authentication rates of 82% for small faces (40×40 pixels) and 96% for faces of 80×80 pixels. The system runs at 2 frames per second on images of 320×240 pixels. The obtained results are very promising and assess the feasibility of face authentication on mobile phones. Directions for further enhancing the performance of the system are also discussed.
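A desktop sketch of the same two building blocks using stock components: OpenCV's pretrained Haar cascade for detection and a uniform-LBP histogram (via scikit-image) as the authentication descriptor. A full LBP pipeline would histogram per spatial block and compare against enrolled templates; that part is omitted here.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

# Pretrained Haar cascade shipped with the opencv-python package.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def lbp_histogram(gray_face, P=8, R=1):
    """Uniform-LBP histogram of a face crop, L1-normalized.

    Uniform LBP with P neighbors yields P + 2 distinct codes.
    """
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2))
    return hist / max(hist.sum(), 1)

def detect_and_describe(gray):
    """Detect faces and return (box, LBP descriptor) pairs."""
    faces = face_cascade.detectMultiScale(gray, 1.2, 5)
    return [(tuple(f),
             lbp_histogram(gray[f[1]:f[1] + f[3], f[0]:f[0] + f[2]]))
            for f in faces]
```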
Visual impairment and blindness caused by infectious diseases have been greatly reduced, but increasing numbers of people are at risk of age-related visual impairment. Visual information is the basis for most navigational tasks, so visually impaired individuals are at a disadvantage because appropriate information about the surrounding environment is not available. With recent advances in inclusive technology it is possible to extend the support given to people with visual impairment during their mobility. In this context we propose a system, named SmartVision, whose global objective is to give blind users the ability to move around in unfamiliar environments, whether indoor or outdoor, through a user-friendly interface. This paper focuses mainly on the development of the computer vision module of the SmartVision system.
One of the major problems in computer vision is to build systems with the ability to detect shapes in arbitrary "real world" images. The target application of our work is the correct identification of road traffic signs in images taken by a car-mounted camera. The basic technique used in this kind of situation is to compare each portion of an image with a set of known models. The approach taken in our work is to implement this comparison with Cellular Neural Networks, making it possible to efficiently use a massively parallel architecture. In order to reduce the response time of the system, our approach also includes data reduction techniques. The results of several tests, in different conditions, are reported in the paper. The system correctly detects a test shape in almost all the experiments performed. The paper also contains a detailed description of the system architecture and of the processing steps.