Computer Vision and Pattern Recognition Research Papers
In this paper, we propose an approach to the problem of simultaneous shape and refractive index recovery from multispectral polarisation imagery captured from a single viewpoint. The focus of this paper is on dielectric surfaces which diffusely polarise light transmitted from the dielectric body into the air. The diffuse polarisation of the reflection process is modelled using a Transmitted Radiance Sinusoid curve and the Fresnel transmission theory. We provide a method of estimating the azimuth angle of surface ...
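The azimuth estimate rests on the fact that the intensity behind a rotating polariser traces a sinusoid of period π whose phase encodes the surface azimuth. A minimal sketch of that fit on synthetic measurements (the polariser angles and intensities below are made up; this is the sinusoid fit only, not the paper's full shape-and-index pipeline):

```python
import numpy as np

def fit_trs(angles, intensities):
    """Least-squares fit of I(a) = c0 + c1*cos(2a) + c2*sin(2a).

    Returns (phase, Imin, Imax): the sinusoid phase (surface azimuth,
    up to the usual 180-degree ambiguity) and the min/max radiance.
    """
    A = np.column_stack([np.ones_like(angles),
                         np.cos(2 * angles), np.sin(2 * angles)])
    c0, c1, c2 = np.linalg.lstsq(A, intensities, rcond=None)[0]
    amp = np.hypot(c1, c2)
    phase = 0.5 * np.arctan2(c2, c1) % np.pi  # azimuth modulo pi
    return phase, c0 - amp, c0 + amp

# Synthetic check: polariser swept over 0..180 degrees in 10-degree steps.
true_phase = np.deg2rad(40.0)
a = np.deg2rad(np.arange(0, 180, 10, dtype=float))
I = 0.6 + 0.2 * np.cos(2 * (a - true_phase))
phase, Imin, Imax = fit_trs(a, I)
print(round(np.rad2deg(phase), 3), round(Imin, 3), round(Imax, 3))
```

With three or more distinct polariser angles the linear system is overdetermined and the least-squares solution tolerates noise; the π-periodicity leaves the 180° azimuth ambiguity that the paper must still disambiguate.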
US Patent 6,246,790 B1, issued Jun. 12, 2001 (4 sheets of drawings, FIGS. 1–3; FIG. 3 shows image/colour-channel combinations: Image 1, color 1; Image 1, color 2; Image 2, color 1).
- by Yael Pritch and 2 others
- Computer Science, Optical Imaging, Virtual Reality, Motion Pictures
Conducting an independent life is probably the most important issue for visually impaired people. In this paper, we contribute to the solution of this problem using wearable computer technology. We present a visual support system that provides acoustic information about objects in the surrounding environment, obtained by remotely reading barcode tags affixed to significant objects and surrounding elements such as doors, windows, and so on. The user, walking in an indoor environment, is informed in real time about the location (direction, distance, and pose) of the available objects. Barcode tags deployed in the environment can act as reliable stimuli that trigger local navigation behaviours to achieve global navigation objectives. The proposed system is expected to be useful for real-time interaction with dynamic environments. To illustrate our work, we introduce a proof-of-concept multimodal, sensor-based application and discuss its implementation and the experimental results obtained.
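Reporting direction and distance from a detected tag can be done with a plain pinhole-camera model once the physical tag size is known. A hypothetical sketch (the focal length, tag width, and bounding box below are invented values, and the paper's actual localisation method may differ):

```python
import math

def tag_direction_distance(f_px, tag_width_m, bbox, image_width):
    """Estimate bearing and range to a barcode tag from its bounding box.

    f_px: focal length in pixels; tag_width_m: known physical tag width;
    bbox: (x, y, w, h) of the detected tag in pixels. All inputs here are
    hypothetical stand-ins, not values from the paper.
    """
    x, y, w, h = bbox
    distance = f_px * tag_width_m / w                    # similar triangles
    cx = x + w / 2.0                                     # tag centre column
    bearing = math.atan2(cx - image_width / 2.0, f_px)   # radians, + = right
    return bearing, distance

# A 10 cm tag imaged 40 px wide, right of centre, with an 800 px focal length.
bearing, distance = tag_direction_distance(800.0, 0.10, (420, 300, 40, 40), 640)
print(round(math.degrees(bearing), 2), round(distance, 2))
```

Both quantities degrade gracefully with detection noise, which is why tag width (rather than height) is the usual range cue for wall-mounted tags.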
Image segmentation has traditionally been thought of as a low/mid-level vision process incorporating no high-level constraints. However, in complex and uncontrolled environments, such bottom-up strategies have drawbacks that lead to large misclassification rates. Remedies to this situation include taking into account (1) contextual and application constraints, and (2) user input and feedback, to incrementally improve the performance of the system. We attempt to incorporate these in the context of pipeline segmentation in industrial images. This problem is of practical importance for the 3D reconstruction of factory environments. However, it poses several fundamental challenges, mainly due to shading, highlights, textural variations, etc. Our system performs pipe segmentation by fusing methods from physics-based vision, edge and texture analysis, probabilistic learning, and the graph-cut formalism.
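The graph-cut step can be illustrated in miniature: binary pipe/background labelling with unary data costs and a Potts smoothness term reduces to an s-t min-cut. The sketch below runs Edmonds-Karp max-flow on a 1-D pixel strip (toy costs, not the paper's physics-based energy terms):

```python
from collections import defaultdict, deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: returns the max-flow value and the residual graph."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] += c
    flow = 0.0
    while True:
        parent = {s: None}                      # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, residual
        bottleneck, v = float("inf"), t         # path capacity
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t                                   # push flow along the path
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def segment_strip(unaries, smoothness):
    """Binary labelling (0 = background, 1 = pipe) of a 1-D pixel strip
    via s-t min-cut: unary data costs plus a Potts smoothness term."""
    n = len(unaries)
    cap = defaultdict(dict)
    for i, (cost_bg, cost_pipe) in enumerate(unaries):
        cap["s"][i] = cost_bg      # cut when pixel i is labelled background
        cap[i]["t"] = cost_pipe    # cut when pixel i is labelled pipe
    for i in range(n - 1):         # neighbours pay `smoothness` if they differ
        cap[i][i + 1] = smoothness
        cap[i + 1][i] = smoothness
    _, residual = max_flow(cap, "s", "t")
    reachable, queue = {"s"}, deque(["s"])      # source side of the min cut
    while queue:
        u = queue.popleft()
        for v, c in residual[u].items():
            if c > 1e-12 and v not in reachable:
                reachable.add(v)
                queue.append(v)
    return [1 if i in reachable else 0 for i in range(n)]

# (cost_background, cost_pipe) per pixel: the middle three look like pipe.
unaries = [(1, 9), (1, 9), (9, 1), (8, 2), (9, 1), (1, 9)]
labels = segment_strip(unaries, smoothness=2.0)
print(labels)
```

Pixels on the source side of the cut take the "pipe" label; the smoothness weight trades boundary length against data fidelity, which is exactly the knob a context-aware system can tune.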
It is well known that soiling can reduce the generation efficiency of PV systems. In some cases, according to the literature, the loss of energy production in photovoltaic systems can reach up to 50%. Industry offers various types of cleaning robots that can substitute for human action, reduce cleaning costs, operate in places where access is difficult, and significantly increase the yield of these systems. In this paper we present an application of computer vision methods for soiling recognition in photovoltaic modules for autonomous cleaning robots. Our method extends classic CV algorithms such as Region Growing and the Hough transform. Additionally, we adopt a pre-processing technique based on Top-Hat and edge-detection filters. We have performed a set of experiments to test and validate this method. The article concludes that the developed method can bring more intelligence to photovoltaic cleaning robots.
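Region Growing, one of the classic algorithms the method extends, floods outward from a seed pixel while neighbouring intensities stay within a tolerance of the seed. A toy sketch on a hypothetical module image where a dark soiling patch sits on a bright cell (values invented):

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from `seed` over 4-connected pixels whose intensity
    differs from the seed intensity by at most `tol`."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    ref = image[sy][sx]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(image[ny][nx] - ref) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# Toy "module" image: a dark soiling patch (low values) on a bright cell.
img = [
    [200, 200, 200, 200],
    [200,  60,  70, 200],
    [200,  65, 200, 200],
    [200, 200, 200, 200],
]
patch = region_grow(img, seed=(1, 1), tol=20)
print(sorted(patch))
```

In practice the seed would come from the Top-Hat/edge pre-processing stage, and the grown region's area would decide whether the robot needs to clean.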
Two of the most critical requirements in support of producing reliable face-recognition systems are a large database of facial images and a testing procedure to evaluate systems. The Face Recognition Technology (FERET) program has addressed both issues through the FERET database of facial images and the establishment of the FERET tests. To date, 14,126 images from 1199 individuals are included in the FERET database, which is divided into development and sequestered portions. In September 1996, the FERET program administered the third in a series of FERET face-recognition tests. The primary objectives of the third test were to (1) assess the state of the art, (2) identify future areas of research, and (3) measure algorithm performance on large databases.
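Performance on identification tests of this kind is commonly summarised by a cumulative match characteristic (CMC): the fraction of probes whose true identity appears within the top k ranked gallery entries. A minimal sketch with toy features and Euclidean distance (not the FERET scoring protocol itself):

```python
def cmc(gallery, probes):
    """Cumulative match characteristic from a ranked gallery.

    gallery: {identity: feature vector}; probes: list of (true_id, feature).
    Euclidean distance is a simplification: the evaluation is
    algorithm-agnostic and only needs each probe's gallery ranking.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranks = []
    for true_id, feat in probes:
        order = sorted(gallery, key=lambda g: dist(gallery[g], feat))
        ranks.append(order.index(true_id) + 1)   # rank of the true identity
    n = len(probes)
    return [sum(r <= k for r in ranks) / n for k in range(1, len(gallery) + 1)]

gallery = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0)}
probes = [("a", (0.1, 0.0)), ("b", (0.4, 0.0)), ("c", (0.1, 0.8))]
curve = cmc(gallery, probes)
print(curve)
```

Here the second probe is nearer to gallery "a" than to its true identity "b", so the rank-1 rate is 2/3 while the rank-2 rate reaches 1.0.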
Deep learning's breakthrough in the field of artificial intelligence has resulted in the creation of a slew of deep learning models. One of these is the Generative Adversarial Network (GAN), which has only recently emerged. The goal of a GAN is to learn the distribution of the data through unsupervised learning and to generate more realistic results. GANs allow the learning of deep representations in the absence of substantial labelled training data. Computer vision, language and video processing, and image synthesis are just a few of the applications that might benefit from these representations. The purpose of this research is to make the reader conversant with the GAN framework and to provide background on Generative Adversarial Networks, including the structure of both the generator and the discriminator, as well as the various GAN variants along with their respective architectures. Applications of GANs are also discussed with examples.
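The adversarial process can be summarised by the standard two-player minimax objective, in which the discriminator D maximises and the generator G minimises:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is pushed toward assigning high probability to real samples x and low probability to generated samples G(z), while G is pushed toward fooling D; at the equilibrium of this game the generator's distribution matches the data distribution.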
In the present study, we propose to implement a new framework for estimating generative models via an adversarial process, extending an existing GAN framework to develop white-box, controllable image cartoonization that can generate high-quality cartoonized images and videos from real-world photos and videos. Our system learns from three distinct representations: surface representation, structure representation, and texture representation. The surface representation captures the smooth surface of the images. The structure representation relates to the sparse colour blocks and compresses generic content. The texture representation retains the textures, curves, and features of cartoon images. A Generative Adversarial Network (GAN) framework decomposes the images into these representations and learns from them to generate cartoon images. This decomposition makes the framework more controllable and flexible, allowing users to make changes based on the required output. The approach surpasses previous systems in maintaining the clarity, colours, textures, and shapes of images while still exhibiting the characteristics of cartoon images.
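A toy version of the three-way decomposition can be sketched with simple stand-in operators (these are hypothetical substitutes, not the system's actual modules: a box blur stands in for the surface representation, uniform colour quantisation for the structure representation, and a luminance channel for the texture representation):

```python
import numpy as np

def decompose(img, levels=4):
    """Toy stand-ins for the three learning representations.

    surface   -- low-frequency content via a 3x3 box blur
    structure -- sparse colour blocks via uniform quantisation
    texture   -- single-channel luminance retaining edges and curves
    """
    h, w = img.shape[0], img.shape[1]
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    surface = sum(pad[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    structure = np.round(img * (levels - 1)) / (levels - 1)
    texture = img.mean(axis=2)
    return surface, structure, texture

img = np.random.default_rng(0).random((8, 8, 3))
surface, structure, texture = decompose(img)
print(surface.shape, structure.shape, texture.shape)
```

Training separate losses against each component is what makes the framework "white-box": a user can reweight, say, the structure loss to get flatter colour blocks without retraining from scratch.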
The thesis concentrates on computational methods pertaining to ancient ostraca - ink-on-clay inscriptions written in Hebrew. These texts originate from the biblical kingdoms of Israel and Judah and are dated to the late First Temple period (8th – early 6th centuries BCE). The ostraca are almost the sole remaining epigraphic evidence from the First Temple period and are therefore important for archaeological, historical, linguistic, and religious studies of this era. This “noisy” material offers fertile ground for the development of various “robust” image analysis, image processing, computer vision, and machine learning methods addressing the challenging domain of ancient document analysis. The common procedures of modern epigraphers involve manual and labor-intensive steps, with the risk of unintentionally mixing documentation with interpretation. Therefore, the main goal of this study is to establish a computerized paleographic framework for handling First Temple period epigraphic material. The major research questions addressed in this thesis are: quality evaluation of manual facsimiles; quality evaluation of ostraca images; automatic binarization of the documents and its subsequent refinement; quality evaluation of binarizations at global and local levels; identification of different writers between inscriptions (two distinct methods are proposed); image segmentation (with improvements over the classical Chan-Vese algorithm); and letter shape prior estimation. The developed methods were tested on real-world archaeological and modern data, and their results are found to be favorable.
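A common starting point for automatic binarization of such material is a global Otsu threshold, which refinement steps like those in the thesis would then improve on. A sketch on a synthetic dark-ink / bright-clay image (all values invented):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold: maximise between-class variance over all
    candidate thresholds of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))    # class-0 mean mass up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0  # empty classes carry no variance
    return int(np.argmax(sigma_b))

# Synthetic "ostracon": a dark ink stroke (~30) on bright clay (~200).
rng = np.random.default_rng(1)
img = np.full((64, 64), 200, dtype=np.uint8)
img[20:44, 10:16] = 30                    # a 24x6 vertical stroke
img = np.clip(img.astype(int) + rng.integers(-10, 10, img.shape),
              0, 255).astype(np.uint8)
t = otsu_threshold(img)
ink = img <= t                            # class 0 = pixels at or below t
print(t, int(ink.sum()))
```

Real ostraca violate the bimodal assumption behind Otsu (stains, cracks, uneven illumination), which is precisely why the thesis pairs binarization with subsequent refinement and quality evaluation.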
The unsupervised selection and posterior recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, in [6], we proposed a system that aims to achieve this capability by combining a bottom-up data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) an estimation of the robot position that reduces the search scope for potential matches with previously selected landmarks; ii) a set of weights that, according to the results of previous recognitions, controls the influence of different segmentation algorithms on the recognition of each landmark. In this paper we explore the benefits of extending our previous work by including a visual tracking step for each of the selected landmarks. Our intuition is that the inclusion o...
Information fusion consists of organizing a set of data for correlation in time, association over collections, and estimation in space. There exist many methods for object tracking and classification; however, video analytics systems lack robust methods that perform well in all operating conditions (i.e., scale changes, occlusions, low signal-to-noise ratios, etc.). Challenging scenarios where context can play a role include: object labeling, track correlation/stitching through dropouts, and activity recognition. In this chapter we propose a novel framework that fuses video data with text data for enhanced simultaneous tracking and identification. The need for such a methodology resides in answering user queries, linking information over different collections, and providing meaningful product reports. For example, text data can establish that a pedestrian is crossing the road in a low-resolution video and/or that the activity type is a turning object. Together, physics-derived and human-derived fusion (PHF) enhances situation awareness, provides situation understanding, and affords situation assessment. PHF is an example of hard (e.g., video) and soft (e.g., text) data fusion that links Level 5 user refinement to Level 1 object tracking and characterization. A demonstrated example of multimodal text and video sensing is shown, where context provides the means for associating the multimodal data aligned in space and time.
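The space-time association of soft (text) reports with hard (video) tracks can be sketched as a nearest-neighbour match under time and distance gates. A hypothetical example (all track and report values below are invented, and the chapter's actual framework is considerably richer):

```python
def associate(tracks, reports, max_dt=2.0, max_dist=20.0):
    """Greedy space-time association of soft (text) reports to hard
    (video) tracks: a report links to the nearest track observation
    within `max_dt` seconds and `max_dist` pixels."""
    links = {}
    for rid, (rt, rx, ry, label) in reports.items():
        best, best_cost = None, float("inf")
        for tid, obs in tracks.items():
            for t, x, y in obs:
                dt = abs(t - rt)
                dist = ((x - rx) ** 2 + (y - ry) ** 2) ** 0.5
                if dt <= max_dt and dist <= max_dist and dt + dist < best_cost:
                    best, best_cost = tid, dt + dist
        if best is not None:
            links[rid] = (best, label)   # track inherits the text label
    return links

# Two video tracks as (time, x, y) observations, two timestamped text reports.
tracks = {
    "trk1": [(0.0, 10, 10), (1.0, 14, 10), (2.0, 18, 10)],
    "trk2": [(0.0, 90, 50), (1.0, 90, 52)],
}
reports = {
    "r1": (1.2, 15, 11, "pedestrian crossing road"),
    "r2": (0.5, 91, 51, "vehicle turning"),
}
links = associate(tracks, reports)
print(links)
```

The gates implement the "aligned in space and time" requirement; once a report is linked, the text label enriches the Level 1 track for Level 5 user queries.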