Object Detection Research Papers - Academia.edu
Radiographic inspection is a reliable non-destructive test, widely used to evaluate the integrity of structures and equipment. Modern digital radiographic systems now produce high-quality, high-resolution images. However, image analysis for internal defect detection and geometric characterization is still not fully automated. The main reason is that image analysis is a complex task that involves experience-based heuristic decisions, such as object detection and recognition. For that reason, a new automatic radiographic image analysis system was developed to identify important components or component parts that must be inspected separately, such as weld joints, pipe walls, pipe wall thicknesses, valves and mechanical parts. The methodology uses a genetic algorithm search to find desired patterns in the image, and image indexing procedures are used for a final verification step. As a result, the system provides quick and correct answers and is flexible enough to be adapted to other applications.
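The abstract does not specify how the genetic algorithm encodes candidates or scores them. The following is a minimal sketch of one common formulation. It assumes that each individual is a candidate top-left `(row, col)` position of a reference pattern and that fitness is normalized cross-correlation (NCC) with that pattern. The function names and GA parameters are hypothetical.

```python
import numpy as np


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized arrays (1.0 = perfect match)."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def ga_template_search(image, template, pop_size=40, generations=60,
                       mutation_scale=3.0, seed=0):
    """Genetic search for the top-left (row, col) where `template` best matches `image`."""
    rng = np.random.default_rng(seed)
    th, tw = template.shape
    max_r, max_c = image.shape[0] - th, image.shape[1] - tw

    def fitness(ind):
        r, c = int(ind[0]), int(ind[1])
        return ncc(image[r:r + th, c:c + tw], template)

    # Each individual is a candidate position; start from random positions.
    pop = np.column_stack([rng.integers(0, max_r + 1, pop_size),
                           rng.integers(0, max_c + 1, pop_size)])
    n_elite = max(2, pop_size // 4)
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argsort(scores)[::-1][:n_elite]]  # selection: keep the best quarter
        a = elite[rng.integers(0, n_elite, pop_size)]     # random elite parents
        b = elite[rng.integers(0, n_elite, pop_size)]
        children = np.where(rng.random((pop_size, 2)) < 0.5, a, b)  # uniform crossover
        # Gaussian mutation, rounded to whole pixels.
        children = children + np.rint(rng.normal(0.0, mutation_scale, children.shape)).astype(int)
        children[:, 0] = np.clip(children[:, 0], 0, max_r)  # keep positions inside the image
        children[:, 1] = np.clip(children[:, 1], 0, max_c)
        children[:n_elite] = elite                           # elitism: best survive unchanged
        pop = children

    best = max(pop, key=fitness)
    return (int(best[0]), int(best[1])), fitness(best)
```

Elitism guarantees that the best position found so far is never lost. Mutation and crossover then explore the neighbourhood of the current elites, which is what lets the search avoid scoring every pixel offset.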
For many audiovisual applications, the integration and synchronization of audio and video signals is essential. The objective of this paper is to develop a system that displays the active objects in the captured video signal, integrated with their respective audio signals in the form of text. The video and audio signals are captured and processed separately. The signals are buffered, then integrated and synchronized using a time-stamping technique. Time-stamps provide timing information for the audio and video processes (speech recognition and object detection, respectively). This information is needed to correlate the audio packets with the video frames. Hence, integration is achieved without using video information such as lip movements. The results obtained are based on a specific implementation of the speech recognition module, which was found to be the bottleneck process in the proposed system.
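The abstract says that time-stamps are used to correlate audio packets with video frames, but it gives no data format. The sketch below makes three assumptions:

- Both streams are stamped in integer milliseconds on one shared capture clock.
- The speech recognizer returns `(text, start, end)` segments.
- A frame receives a segment's text when the frame's time-stamp falls inside that segment.

`SpeechSegment` and `align_text_to_frames` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    text: str      # output of the speech recognizer
    start_ms: int  # time-stamp when the utterance started (shared capture clock)
    end_ms: int    # time-stamp when the utterance ended


def align_text_to_frames(frame_times_ms, segments):
    """Map each video frame index to the recognized text whose time span covers it.

    frame_times_ms must be sorted ascending.
    """
    captions = {}
    for seg in segments:
        first = bisect_left(frame_times_ms, seg.start_ms)  # first frame at or after start
        last = bisect_right(frame_times_ms, seg.end_ms)    # one past the last frame at or before end
        for i in range(first, last):
            captions[i] = seg.text
    return captions


frames = [0, 40, 80, 120, 160, 200]  # 25 fps
segs = [SpeechSegment("hello", 30, 100), SpeechSegment("world", 150, 210)]
print(align_text_to_frames(frames, segs))
# {1: 'hello', 2: 'hello', 4: 'world', 5: 'world'}
```

Binary search keeps the lookup logarithmic in the number of buffered frames, so a slow recognizer (the bottleneck identified above) only delays the captions and does not break their alignment.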
Extraction of anatomical structures (landmarks), such as the optic disk (OD), fovea and blood vessels, from fundus images is useful in automatic diagnosis. Current approaches largely rely on the spatial relationships among landmark positions for detection. In this paper, we present an appearance-based method for detecting the fovea and OD in colour images. The detection strategy improves local contrast by combining information from two spectral channels of the given image. The proposed method has been successfully tested on different datasets, and the results show 96% detection for the fovea and 91% detection for the OD (on 502 and 531 images for fovea and OD, respectively).
The ability to filter improper content from multimedia sources based on visual content has important applications, since text-based filters are clearly insufficient against erroneous or malicious associations between text and actual content. In this paper, we investigate a method for detecting nudity in videos based on a bag-of-visual-features (BoVF) representation of frames and an associated voting scheme. BoVF approaches have been successfully applied to object recognition and scene classification, showing robustness to occlusion and to the many kinds of variation that typically hamper object detection methods. To the best of our knowledge, only two proposals in the literature use BoVF for nudity detection in still images, and no attempt has yet been made to apply BoVF to videos. The results of our experiments show that this approach provides good recognition rates for nudity even at the frame level and with a relatively low sampling ratio. Moreover, the proposed voting scheme significantly improves recognition rates for video segments. In the best case it achieves 93.2% correct classification with a sampling ratio of 1/15 (one frame in every 15). Finally, a visual analysis of some particular cases indicates possible sources of misclassification.
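The abstract does not spell out the voting rule. As an illustration, the sketch below samples one frame in every `step` frames and labels the segment positive when more than `threshold` of the sampled frames are classified as nude. `is_nude_frame` stands in for the per-frame BoVF classifier. The function name, the simple-majority rule and the 0.5 threshold are all assumptions.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def classify_segment(frames: Sequence[Frame],
                     is_nude_frame: Callable[[Frame], bool],
                     step: int = 15,
                     threshold: float = 0.5) -> bool:
    """Vote over sampled frames of one video segment.

    step=15 samples one frame in every 15 (a 1/15 sampling ratio).
    is_nude_frame is a placeholder for a per-frame classifier (hypothetical).
    """
    sampled = frames[::step]
    if not sampled:
        raise ValueError("segment has no frames")
    votes = sum(1 for f in sampled if is_nude_frame(f))
    return votes / len(sampled) > threshold
```

Voting lets a few misclassified frames be outvoted by the rest of the segment, which is one plausible reason segment-level accuracy can exceed frame-level accuracy.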
Each mobile phone with a built-in CMOS sensor can be seen as a sophisticated optical sensor able to analyze its environment in terms of visual events and its own mobility. Thanks to mass production, their price decreases steadily while their processing capacity increases. Mobile phones are usually carried by people and therefore move with them. We call activities arising from this mobility internal activities, in contrast to external activities, which are caused by visual events. Both kinds of activity can be recognized by measuring the sensor's optical flow. We present a method to identify internal activities based on optical flow measurements and probabilistic reasoning. We implement a lifelogging application, running on a Linux-based mobile phone, that detects internal activities such as moving left, moving right or walking with a recognition rate of 80%. While the user is standing still, external activities are recognized using object detection.
Video-frame-rate millimetre-wave imaging has recently been demonstrated with a quality similar to that of a low-quality uncooled thermal imager. In this paper we discuss initial investigations into the transfer of image processing algorithms from more mature imaging modalities to ...
A series of recent findings has strengthened the case for wearing masks as a preventive measure. Face masks are an important tool for reducing the spread of COVID-19 because they block droplets. The prevailing scientific view is that masks are extremely helpful, and even fairly basic homemade masks can offer a high level of protection against the novel coronavirus. In offices, schools, colleges and industrial sites, it is quite difficult to monitor whether people are wearing masks. In such settings, deep learning could play a vital role in determining whether people are wearing masks and in identifying them. The main objective of this study is to explore automated detection of objects such as masks and spectacles. A further objective is to implement a convolutional neural network, trained with supervised learning, that reliably detects the presence of a mask or spectacles in input images.
Shadow detection is an important aspect of foreground/background classification. Many techniques exist, most of which assume that only the intensity changes under shadow. In this paper we show that in most practical indoor and outdoor situations there is also a color shift. We propose an algorithm that estimates this color shift from the images and uses it to remove shadow pixels. The proposed algorithm is compared experimentally with an existing algorithm on real image sequences, and the results show a significant improvement in performance.
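The abstract does not describe how the color shift is estimated. The sketch below illustrates the idea under two stated assumptions:

- Shadow acts as a per-channel multiplicative change of the background color.
- A set of pixels known to be in shadow is available for estimation, for example from manual labeling.

Under an intensity-only model, the three channel ratios would be equal; a color shift makes them differ. Function names and the tolerance `tol` are hypothetical.

```python
import numpy as np


def estimate_color_shift(frame, background, shadow_mask, eps=1e-6):
    """Per-channel ratio frame/background, taken as the median over known shadow pixels.

    frame, background: float arrays of shape (H, W, 3); shadow_mask: bool array (H, W).
    """
    ratios = frame[shadow_mask] / np.maximum(background[shadow_mask], eps)  # shape (N, 3)
    return np.median(ratios, axis=0)                                        # shape (3,)


def shadow_pixels(frame, background, foreground_mask, shift, tol=0.1, eps=1e-6):
    """Foreground pixels whose per-channel ratio to the background matches `shift`."""
    ratios = frame / np.maximum(background, eps)
    matches = np.all(np.abs(ratios - shift) <= tol, axis=-1)
    return foreground_mask & matches
```

Pixels flagged by `shadow_pixels` can then be removed from the foreground mask, leaving only genuine objects.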
There are numerous applications of unmanned aerial vehicles (UAVs) in the management of civil infrastructure assets, such as routine bridge inspections, disaster management, power line surveillance and traffic surveying. As UAV applications become widespread, higher levels of autonomy and independent decision-making are needed to improve the safety, efficiency, and accuracy of these devices. The accuracy and reliability of convolutional neural networks (CNNs) depend on how the network is trained and on the choice of operational parameters. This paper therefore details the training procedure and parameter selection used to train CNNs on a set of aerial images for efficient, automated object recognition. Potential application areas in the transportation field are also highlighted. The object recognition results show that, with a properly selected set of parameters, a CNN can detect and classify objects with a high level of accuracy (97.5%) and computational efficiency. Furthermore, using a convolutional neural network implemented in the "YOLO" ("You Only Look Once") platform, objects in video feeds supplied by UAVs can be tracked, detected ("seen"), and classified ("comprehended") in real time.
Blindness is one of the world's biggest issues. According to recent data from the World Health Organization (WHO), 39 million people worldwide are blind. The WHO has developed many policies, techniques and strategies to prevent blindness. Another organization, World Access for the Blind, was founded under the leadership of Daniel Kish, who is himself blind and uses an echolocation technique called flash sonar. Our idea is based on sensory substitution, in which the loss of one sense is compensated by another. We propose an innovative rehabilitation prototype device that uses a sonar sensor and a bone conduction speaker to support and enhance human echolocation. The rehabilitation period can be long and varies from person to person. We have also integrated a smart system, written in Python, that performs object detection and image-to-speech synthesis using a smartphone camera during live video streaming. Hence, this p...
Communication is the process of sharing or exchanging information, ideas or feelings. For two people to communicate, both need knowledge and understanding of a common language. Deaf and mute people, however, communicate differently from hearing people: deaf people cannot hear, and mute people cannot speak. They communicate using sign language among themselves and with others, but many hearing people do not take the importance of sign language seriously. Not everyone knows and understands sign language, which makes communication between a hearing person and a deaf or mute person difficult. To overcome this barrier, a machine learning model can be trained to recognize different sign language gestures and translate them into English. This would help many people communicate with deaf and mute people with ease. In this paper, a real-time ML-based system for sign language detection is built using TensorFlow object detection. The main purpose of this project is to build a system that lets people with hearing or speech disabilities communicate with others easily and efficiently.
It is estimated that about 285 million people in the world are visually impaired, roughly 20% of the population of India. Many of them depend on others even to meet their basic daily needs. In our project, we used TensorFlow, an open-source machine learning library from Google, to model our neural networks. The TensorFlow Object Detection API is used to detect many kinds of objects. We use the Single Shot MultiBox Detector (SSD) algorithm. During training, SSD matches each ground-truth object in an image to the appropriate anchor box. The anchor box with the highest degree of overlap with an object is responsible for predicting that object's class and location. The device uses a microcontroller with a built-in Wi-Fi module. The guide is convenient: through an easy-to-use interface, it gives the user information for moving around unfamiliar environments, whether indoors or outdoors. In addition, to reduce the navigation difficulties faced by visually impaired people, an ultrasonic obstacle detection system is added to the device. The proposed system identifies the nearest obstacle using ultrasonic sensors and alerts the user to its location.
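The anchor-matching step can be sketched as follows. This is a minimal illustration that assumes corner-format boxes `(x1, y1, x2, y2)`. It follows the two-stage rule described in the original SSD paper: each ground-truth box first claims its best-overlapping anchor, and any remaining anchor is then matched to a box with an IoU of at least 0.5. The function names are hypothetical, and this is not the TensorFlow Object Detection API's own implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, threshold=0.5):
    """For each anchor, return the index of its matched ground-truth box, or -1 for background."""
    matches = [-1] * len(anchors)
    # Stage 1: every ground-truth box claims the anchor it overlaps most.
    for g, gt in enumerate(gt_boxes):
        best = max(range(len(anchors)), key=lambda i: iou(anchors[i], gt))
        matches[best] = g
    # Stage 2: unclaimed anchors match any ground-truth box above the IoU threshold.
    for i, anchor in enumerate(anchors):
        if matches[i] != -1:
            continue
        scores = [iou(anchor, gt) for gt in gt_boxes]
        if scores and max(scores) >= threshold:
            matches[i] = scores.index(max(scores))
    return matches
```

Anchors left at -1 are trained as background. Matched anchors learn the class of their ground-truth box and the offset to it.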
Uninterrupted power supply to electric power consumers has increasingly become a global necessity, and monitoring the health of the distribution network is crucial to providing quality service. Traditional monitoring methods, which rely on on-site patrols to detect faults, have become labor-intensive and time-consuming, raising demand for new and more efficient techniques. To address this issue, we propose using Faster R-CNN, implemented in MXNet, for both detection and classification. We use a convolutional neural network (CNN) to detect and classify both insulator components and faulty insulator discs in images of overhead electric power transmission systems. The dataset consists of images captured by a UAV (unmanned aerial vehicle). Detection and classification are handled by dividing the picture content of the training set into three classes: background, insulator, and the defective part of an insulator. We achieve insulator recognition and localization with high precision compared to traditional techniques. Our work could support practical, integrated solutions for the automated inspection of overhead transmission line insulators.
An abandoned object detection system is presented and evaluated on benchmark datasets. Detection is based on a simple mathematical model and works efficiently at QVGA resolution, at which most CCTV cameras operate. Pre-processing uses a dual-time background subtraction algorithm that dynamically updates two background models, one after a very short interval (less than half a second) and the other after a relatively longer duration. The framework of the proposed algorithm is based on the Approximate Median model. An algorithm for tracking abandoned objects, even under occlusion, is also proposed. Results show that the system is robust to variations in lighting conditions and in the number of people in the scene. In addition, the system is simple and computationally inexpensive, since it avoids expensive filters while achieving better detection results.
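The dual-time idea can be sketched with the approximate-median update. In the classic version of that update, each background pixel moves one grey level toward the current frame. The sketch makes these assumptions:

- The short-term model is updated every frame with a large step, so a newly static object is absorbed within a few frames.
- The long-term model is updated only every `long_every` frames with step 1.
- A pixel that differs from the long-term model but already matches the short-term one is a candidate abandoned-object pixel.

All intervals, step sizes and the threshold are hypothetical.

```python
import numpy as np


def approx_median_step(background, frame, step):
    """Approximate-median update: move each pixel toward the frame by at most `step`.

    For integer images, step=1 is the classic +1/-1 approximate median filter.
    """
    return background + np.clip(frame - background, -step, step)


class DualBackground:
    """Short-term and long-term background models for static-object detection."""

    def __init__(self, first_frame, long_every=100, short_step=50, long_step=1, thresh=25):
        self.short = first_frame.astype(np.int32)
        self.long = first_frame.astype(np.int32)
        self.long_every = long_every
        self.short_step = short_step
        self.long_step = long_step
        self.thresh = thresh
        self.t = 0

    def update(self, frame):
        """Update both models and return a bool mask of candidate abandoned-object pixels."""
        frame = frame.astype(np.int32)
        self.t += 1
        self.short = approx_median_step(self.short, frame, self.short_step)  # every frame
        if self.t % self.long_every == 0:
            self.long = approx_median_step(self.long, frame, self.long_step)
        differs_long = np.abs(frame - self.long) > self.thresh
        differs_short = np.abs(frame - self.short) > self.thresh
        return differs_long & ~differs_short
```

Moving people differ from both models and are therefore excluded. An object that stays put is soon absorbed by the short-term model while still differing from the long-term one, which is the signature of an abandoned object.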
In this paper, a vision-based system for underwater object detection is presented. The system automatically detects a pipeline lying on the sea bottom, as well as nearby objects such as trestles and anodes. A color compensation procedure is introduced to reduce problems caused by light attenuation in water.
The objective of object recognition algorithms in computer vision is to quantify the presence or absence of a certain class of objects (e.g., bicycles, cars, people), which is highly useful in traffic estimation applications. Sparse signal models and dictionary learning techniques can be used to classify images as belonging to one class or another. With the help of augmented dictionaries, they can also detect when two or more of these classes co-occur. We present results comparing the classification accuracy when different image classes occur together. Practical scenarios for such an approach include forms of intrusion detection, i.e., cases where an object of class B should not co-occur with objects of class A. Examples are bicyclists riding on prohibited sidewalks or a person trespassing in a hazardous area. Mixed-class detection in terms of semantic content can be performed globally on downscaled images or thumbnails. However, to accurately classify an image as belonging to one class or the other, we resort to higher-resolution images and localized content examination. With the help of blob tracking, we can use this classification method to count objects in traffic videos. The feature extraction method illustrated in this paper is well suited to images obtained in practice. Such images are usually of poor quality and lack enough texture for the popular gradient-based methods to produce adequate feature points. We demonstrate that by appropriately training different types of dictionaries, we can perform the various tasks required for traffic monitoring.
Colorectal polyps are abnormal growths of tissue on the colon surface that can evolve into colorectal cancer. High survival rates in colon cancer can be achieved when it is detected early, that is, when polyps are identified and removed before they develop into cancer. The primary method for colon cancer screening is endoscopic examination of the colon, known as colonoscopy. However, some polyps may be missed during colonoscopy due to human error. This work aims to detect these polyps using state-of-the-art object detection networks such as SSD.
The key obstacle to communicating images over wireless sensor networks has been the lack of suitable processing architectures and communication strategies to handle the large volume of data. High packet error rates and the need for retransmission make image communication inefficient in terms of energy and bandwidth. This paper presents a novel architecture and protocol for energy-efficient image processing and communication over wireless sensor networks. Practical results show that these approaches make image communication over wireless sensor networks feasible, reliable and efficient.
Object detection is one of the most important preprocessing steps in object recognition and identification systems. It involves searching and indexing still images or video that contain objects of various sizes and positions against various backgrounds. This paper presents a system for detecting beverage cans on a moving conveyor belt using a combination of sharpening and edge detection. Experimental results show that the can detection system achieves good accuracy, which depends on the quality and quantity of the data in the database. They also show that the sharpening and edge detection algorithm significantly improves detection speed.
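The abstract names sharpening and edge detection but not the specific filters. As an illustration, the sketch below assumes a standard 3x3 Laplacian-style sharpening kernel followed by the Sobel gradient magnitude, with a hypothetical threshold. It shows the kind of pipeline described, not the paper's exact method.

```python
import numpy as np

SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """3x3 cross-correlation with edge-replicated borders, same output size as input."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def edge_map(img, thresh=100.0):
    """Sharpen, then threshold the Sobel gradient magnitude to get a boolean edge map."""
    sharp = filter3x3(img, SHARPEN)
    gx = filter3x3(sharp, SOBEL_X)
    gy = filter3x3(sharp, SOBEL_Y)
    return np.hypot(gx, gy) > thresh
```

Sharpening boosts the contrast at object boundaries before differentiation, so a single fixed threshold separates can outlines from the belt more easily. The exact gain depends on the images and the threshold.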
One problem that often arises when medical research is applied in real life is a high number of false positive cases. These cause unnecessary alarms for experts and increase their workload. This study proposes a new data-centric approach to reduce bias-based false positive predictions in brain MRI medical object detection applications. The proposed method was tested on two datasets, Gazi Brains 2020 and BraTS 2020, with three deep learning-based object detection models: Mask R-CNN, YOLOv5, and EfficientDet. According to the results, the proposed pipeline outperformed the classical pipeline in mean specificity by up to 18% on the Gazi Brains 2020 dataset and up to 24% on the BraTS 2020 dataset, without much change in sensitivity. This indicates that the proposed pipeline reduces bias-related false positives in real-life applications and can help reduce the workload of experts.
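Because the result is reported as specificity with sensitivity held roughly constant, it may help to recall the two definitions:

- Sensitivity = TP / (TP + FN).
- Specificity = TN / (TN + FP).
- The false positive rate equals 1 - specificity.

The sketch below assumes binary per-case labels. The abstract does not state whether specificity is computed per image, per slice or per patient.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (0/1 or bool).

    A metric whose denominator is zero is returned as NaN.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


sens, spec = sensitivity_specificity([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
print(sens, spec, 1 - spec)  # 0.5 0.75 0.25
```

Fewer false positives raise specificity directly, whereas sensitivity depends only on the positive cases. That is why the two can move independently.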
This paper focuses on feature selection for problems involving high-dimensional data. We discuss the benefits of a regularized approach with L1 or L1–L2 penalties in two different applications: microarray data analysis in computational biology and object detection in computer vision. We describe general algorithmic aspects as well as architectural issues specific to each domain. The very promising results show how the proposed approach can be useful in quite different fields of application.
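The abstract does not name a solver. One standard way to fit a least-squares model with an L1 or L1–L2 penalty is proximal gradient descent (ISTA). Its soft-thresholding step sets small coefficients exactly to zero, and that is what performs the feature selection. The sketch below uses this swapped-in solver; the function names and default penalties are assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink toward zero, setting small entries exactly to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1l2_regression(X, y, lam1=0.1, lam2=0.0, n_iter=2000):
    """Minimize (1/2n)||y - Xw||^2 + lam1*||w||_1 + (lam2/2)*||w||^2 by ISTA.

    lam2 = 0 gives the pure L1 (lasso) penalty; lam2 > 0 gives the L1-L2 (elastic-net) penalty.
    """
    n, d = X.shape
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam2  # Lipschitz constant of the smooth part's gradient
    step = 1.0 / lipschitz
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ w) / n + lam2 * w
        w = soft_threshold(w - step * grad, step * lam1)
    return w
```

The selected features are the indices of the nonzero entries of `w`. Adding the L2 term (`lam2 > 0`) tends to keep groups of correlated features together rather than picking one arbitrarily, which matters for correlated inputs such as gene-expression data.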
According to statistics released by the WHO, 16.6% of the world population suffers from vision impairment. In recent years, many efforts have been made to develop devices that support visually impaired people and improve their quality of life by making them more independent. This paper discusses various applications used in assistive technologies for the visually impaired. These applications use different deep learning algorithms, various sensors and text-to-speech output, and are available for platforms such as Android and iOS. The aim of this paper is to concisely review recent progress in the technologies used in this area. It also aims to give a clear overview to other researchers who wish to develop further applications in this field.
OIS is a new Optical Information System for road traffic observation and management. The complete system architecture, from the sensor for automatic traffic detection up to wide-area traffic light management, is designed to meet the requirements of an intelligent transportation system. Particular features of this system are vision sensors with integrated computational and real-time capabilities, real-time image processing algorithms, and a new approach to dynamic traffic light management for a single intersection as well as for a wide area. The real-time image processing algorithms extract traffic data even at night and in bad weather. This approach makes it possible to identify and specify each traffic object, its location, its speed and other important object information. Furthermore, the algorithms can identify accidents and non-motorized traffic such as pedestrians and bicyclists. Combining all this individual information, the syst...
Road traffic accidents are recognised as a major problem in developing countries, and a mechanism to prevent them is urgently needed. We hope that the proposed mechanism can prevent accidents caused by overtaking. To accomplish this, a camera fixed on the vehicle's rear-view mirror records footage of the opposite lane up to 100 m ahead. This footage is analysed with real-time computer vision to detect any obstacle present. If an obstacle is detected, the driver is alerted. This ensures that overtaking takes place only under safe conditions.
We propose a novel template matching approach for discriminating between handwritten and machine-printed text. We first pre-process the scanned document images by denoising, excluding circles and lines, and segmenting at the word-block level. We then align and match characters from a flexibly sized gallery against the segmented regions using parallelised normalised cross-correlation. Experimental results on the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show that the algorithm is highly robust. It correctly classifies cluttered, occluded and noisy samples, as well as samples with a significant amount of missing data. The algorithm achieves an 84.0% classification rate with a false positive rate of 0.16 on the dataset and does not require training samples. It also produces compelling results compared with training-based approaches evaluated on the same benchmark.
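The matching step can be sketched as an exhaustive normalised cross-correlation (NCC) search. Every candidate position is scored independently, which is what makes the search easy to parallelise. The decision rule `looks_machine_printed` is a hypothetical illustration and not the paper's classifier. It assumes that machine-printed glyphs match some gallery character closely while handwriting does not, with an assumed threshold of 0.8.

```python
import numpy as np


def match_template_ncc(image, template):
    """Exhaustive NCC search; returns (best_score, (row, col)) of the best top-left corner."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -1.0, (0, 0)
    for r in range(image.shape[0] - th + 1):       # positions are independent,
        for c in range(image.shape[1] - tw + 1):   # so this loop parallelises trivially
            p = image[r:r + th, c:c + tw]
            p = p - p.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            score = float((p * t).sum() / denom) if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


def looks_machine_printed(region, gallery, threshold=0.8):
    """Hypothetical rule: some gallery character matches the region with NCC >= threshold."""
    return any(match_template_ncc(region, g)[0] >= threshold
               for g in gallery
               if g.shape[0] <= region.shape[0] and g.shape[1] <= region.shape[1])
```

NCC subtracts the mean and divides by the norm, so it is insensitive to uniform brightness and contrast changes between the gallery and the scanned page. This fits the absence of a training step.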
Convolutional Neural Network (CNN)-based object detection models have achieved unprecedented accuracy in challenging detection tasks. However, existing detection models (detection heads) trained on 8-bits/pixel/channel low dynamic range (LDR) images are unable to detect relevant objects under lighting conditions where part of the image is either underexposed or overexposed. This issue can be addressed by introducing High Dynamic Range (HDR) content and training existing detection heads on it. However, this approach faces several major challenges, such as the lack of real-life annotated HDR datasets and the extensive computational resources required for training and hyper-parameter search. In this paper, we introduce an alternative, backwards-compatible methodology for detecting objects in challenging lighting conditions using existing CNN-based detection heads. This approach enables the use of HDR imaging without the immediate need to create annotated HDR datasets or perform the associated expensive retraining. The proposed approach uses HDR imaging to capture relevant details in high-contrast scenarios. The scene's dynamic range and wider colour gamut are then compressed using HDR-to-LDR mapping techniques such that salient highlight, shadow, and chroma details are preserved. The mapped LDR image can then be used by existing pre-trained models to extract the features required to detect objects in both the underexposed and overexposed regions of a scene. We also evaluate the feasibility of using existing HDR-to-LDR mapping techniques with existing detection heads trained on standard detection datasets such as PASCAL VOC and MS COCO. Results show that images obtained from the mapping techniques are suitable for object detection, and some of them can significantly outperform traditional LDR images.
Index Terms: High dynamic range (HDR), low dynamic range (LDR), object detection, Faster R-CNN, SSD, R-FCN.
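The abstract does not say which HDR-to-LDR mapping techniques were evaluated. The global operator of Reinhard et al. (2002) is one widely used example of the kind of mapping described, so the sketch below uses it as a stand-in. It assumes linear-RGB float input, a key value of 0.18, display gamma 2.2 and 8-bit output. The function name and defaults are assumptions, and this is not necessarily one of the paper's operators.

```python
import numpy as np


def reinhard_tonemap(hdr, key=0.18, eps=1e-6, gamma=2.2):
    """Global Reinhard operator: linear HDR RGB (float, H x W x 3) -> 8-bit LDR."""
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(eps + lum)))  # log-average ("key") of the scene
    scaled = key / log_avg * lum
    display = scaled / (1.0 + scaled)             # compresses luminance into [0, 1)
    ratio = display / np.maximum(lum, eps)        # rescale RGB, keeping chromaticity
    ldr = np.clip(hdr * ratio[..., None], 0.0, 1.0) ** (1.0 / gamma)
    return np.round(ldr * 255).astype(np.uint8)
```

The `L / (1 + L)` curve compresses highlights strongly while leaving dark values nearly linear. As a result, detail from both very dark and very bright regions survives in the 8-bit image that a pre-trained detector expects.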
We present an approach to color image segmentation and apply it to the recognition and vectorization of geo-images (satellite and cartographic). It is a simultaneous segmentation-recognition system in which the segmented geographical objects of interest (alphanumeric, point, linear, and area objects) are labeled with gray-level values. These values are the same within each object type but differ between types. We replace the source image with a number of simplified images, called composites, each associated with a particular image feature. The composite images that contain the objects of interest are then used for object detection-recognition. In this step, the segmented objects are associated with the corresponding "names" from a user-defined subject domain. The specification of features and object names associated with promising composite representations is treated as a type of knowledge domain, which allows the system to learn automatically or interactively. Results of gray-level and color image segmentation-recognition and vectorization are shown.
This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image", which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers. The third contribution is a method for combining classifiers in a "cascade", which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems. Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
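A minimal sketch of the integral image described above: each entry holds the sum of all pixels above and to the left, so any rectangle sum needs only four lookups regardless of its size. Padding the table with an extra zero row and column is an implementation choice made here to avoid boundary checks; the two-rectangle feature shows the kind of Haar-like feature the detector evaluates. AdaBoost feature selection and the cascade are not shown, and all function names are illustrative.

```python
def integral_image(img):
    """Padded integral image: ii[y][x] = sum of img[r][c] for r < y, c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                    # running sum along the row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return box_sum(ii, x, y, half, h) - box_sum(ii, x + half, y, half, h)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Because the table is built in one pass, every feature at every scale afterwards costs a constant number of operations, which is what makes exhaustive scanning at video rate feasible.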
Forward Collision Avoidance (FCA) systems in automobiles are an essential part of Advanced Driver Assistance Systems (ADAS) and autonomous vehicles. These systems currently use radar as the main sensor. The increasing resolution of camera sensors, the processing capability of hardware chipsets, and advances in image processing algorithms have recently been driving camera-based features. Monocular cameras face the challenge of accurate scale estimation, which limits their use as a stand-alone sensor for this application. This paper proposes an efficient system that performs multi-scale object detection, for which a patent is being granted, and efficient 3D reconstruction using a structure from motion (SFM) framework. The algorithms need to be accurate, but they also need to operate in real time on low-cost embedded hardware. The paper focuses on how the proposed algorithms are designed so that they provide real-time performance on low-cost embedded processors that use only digital signal processors (DSPs) and vector processing cores.
Underwater acoustic methods have been extensively used to locate and identify marine objects. Applications include locating underwater vehicles, finding shipwrecks, imaging sediments, and imaging bubble fields. The ocean is fairly transparent to sound but opaque to other forms of radiation. Acoustic technology is the most effective tool for monitoring this environment because sound can propagate over long distances in water. We used a single-beam echo sounder to discriminate underwater objects. We developed an algorithm and applied it to detect and quantify underwater objects such as fish, seagrass, and the seabed. We found that the detected targets have different backscatter values.
Dimensional analysis of an object from an image removes much of the burden that traditional methods, such as a measuring tape, place on the user. Knowing the dimensions makes it easier to reconstruct a 3D model of the real-world object. However, this method is not used in the current implementation. Dimensional analysis can also be helpful in online shopping, where the user cannot be present for a fitting; a 3D model replaces the fitting stage. Once the dimensions of an object's surfaces are found, surface areas are easy to calculate, and from the surface areas the volume can be calculated. However, calculating volume requires more than one dimension of the object. In this paper, we use an approach based on a reference object whose real-world dimensions are already known. The whole process is divided into three tasks: object detection using the SURF algorithm, dimensional analysis of the 2D object using a pixels-per-metric ratio (given a reference object on the same plane), and 3D reconstruction using a Structure from Motion algorithm.
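The pixels-per-metric step reduces to one division to calibrate and one division per measurement. The sketch below assumes the object's pixel width and height have already been obtained (for example, from a bounding box) and that, as the abstract requires, the reference object lies on the same plane; the function names and the numbers in the example are hypothetical.

```python
def pixels_per_metric(ref_width_px, ref_width_real):
    """Calibration: pixels per real-world unit, from a reference of known width."""
    return ref_width_px / ref_width_real


def measure(width_px, height_px, ppm):
    """Convert a pixel width and height to real-world units."""
    return width_px / ppm, height_px / ppm


# Hypothetical example: a 2.5 cm reference object spans 100 px, so 40 px/cm.
ppm = pixels_per_metric(100, 2.5)
w_cm, h_cm = measure(300, 200, ppm)   # (7.5, 5.0)
area_cm2 = w_cm * h_cm                # 37.5, for a rectangular face
```

The ratio only holds for points on the reference object's plane and for a roughly fronto-parallel camera; perspective or lens distortion breaks the single-scale assumption, which is one reason the paper adds a separate Structure from Motion stage for 3D.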
The complicated nature of interior construction work makes detailed progress monitoring challenging. Current interior construction progress monitoring methods involve the submission of periodic reports and are constrained by their reliance on manually intensive processes and their limited support for recording visual information. Recent advances in image-based visualization techniques enable construction progress to be reported using interactive and visual approaches. However, analyzing large amounts of as-built construction photographs requires sophisticated techniques. To overcome the limitations of existing approaches, this research focuses on visualization and computer vision techniques to monitor detailed interior construction progress using an object-based approach. As-planned 3D models from Building Information Modeling (BIM) and as-built photographs are visualized and compared in a walk-through model. Within this environment, the as-built interior construction objects are decomposed to automatically generate the status of construction progress. This object-based approach introduces an advanced model that gives the user a realistic understanding of interior construction progress.
A robust face detection technique with mouth localization, processing every frame in real time (video rate), is presented. It is also exploited for on-site motion analysis to verify "liveness" and to achieve lip reading of digits. A methodological novelty is the proposed quantized angle features ("quangles"), designed for illumination invariance without the need for preprocessing (e.g., histogram equalization). This is achieved by using both the gradient direction and the double-angle direction (the structure tensor angle), and by ignoring the magnitude of the gradient. Boosting techniques are applied in a quantized feature space. A major benefit is reduced processing time: training effective cascaded classifiers is feasible in a very short time (less than 1 h for data sets on the order of 10^4). Scale invariance is implemented through an image scale pyramid. We propose "liveness" verification barriers as applications in which a significant amount of computation is avoided when estimating motion. Novel strategies to avert advanced spoofing attempts (e.g., replayed videos that include the person's utterances) are demonstrated. We present favorable results on face detection for the YALE face test set, and competitive results for the CMU-MIT frontal face test set and for "liveness" verification barriers.
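The quangle idea can be illustrated per pixel: compute the gradient direction, double it to get the structure-tensor (double-angle) direction, quantize both into equal sectors, and discard the magnitude. The sketch assumes gradients (gx, gy) are already available (e.g., from a derivative filter) and uses 8 bins as a hypothetical choice; it shows only the feature encoding, not the paper's boosting or cascade.

```python
import math


def quantize_angle(angle, n_bins):
    """Index of the equal-width sector of [0, 2*pi) that contains angle."""
    a = angle % (2 * math.pi)
    return int(a / (2 * math.pi) * n_bins) % n_bins


def quangles(gx, gy, n_bins=8):
    """Quantized gradient direction and double-angle direction.

    Only the direction of (gx, gy) is used; the gradient magnitude is
    ignored, so a global contrast change leaves the features unchanged.
    """
    theta = math.atan2(gy, gx)
    return quantize_angle(theta, n_bins), quantize_angle(2 * theta, n_bins)


bright = quangles(1.0, 2.0)      # weak edge
contrast = quangles(10.0, 20.0)  # same edge under 10x stronger lighting
flipped = quangles(-1.0, -2.0)   # same edge with reversed polarity
```

Scaling the gradient leaves both indices unchanged, and reversing edge polarity (a dark-to-light versus light-to-dark transition) changes the direction bin but not the double-angle bin, which is why the pair carries complementary information.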
Waste management is one of the major problems of modern cities. Increasing the effectiveness of recycling processes helps to reduce the amount of waste that must be stored, to improve the quality of life in the city, and to preserve economic value. Modern neural network-based object detection methods are used to automate the segregation of recyclable waste. In this paper, we propose a novel waste localization and classification method that works on video streams of unconstrained environments. The method uses a variant of the Inception architecture and the Faster R-CNN algorithm to detect recyclable items. Spatio-temporal information, obtained by adopting an object tracking algorithm, is used to improve classification accuracy under varying illumination and backgrounds. The experiments show that the proposed method achieves promising results on difficult scenes and on different kinds of recyclable objects.
Nowadays, state-of-the-art deep neural networks make it possible to build systems that recognize and index visual content with very high accuracy. The performance of these systems relies on the availability of high-quality training sets containing a large number of examples (e.g., millions), in addition to the machine learning tools themselves. For several applications, very good training sets can be obtained, for example, by crawling (noisily) annotated images from the internet or by analyzing user interactions (e.g., on social networks). However, for several other applications, high-quality training sets are not easy to obtain or create. Consider, as an example, a security scenario where one wants to automatically detect rarely occurring threatening events. Recently, researchers have investigated the possibility of using a visual virtual environment, capable of artificially generating controllable and photo-realistic content, to create training sets for applications with few available training images. We explored this idea to generate synthetic photo-realistic training sets for training classifiers to recognize the proper use of personal safety equipment (e.g., worker protection helmets, high-visibility vests, ear protection devices) during risky human activities. We then performed domain adaptation to real images using a very small data set of real-world photographs. We show that training with the generated synthetic training set, followed by the domain adaptation step, is an effective solution for applications for which no training sets exist.
- by Fabrizio Falchi and +1
- Machine Learning, Safety, Deep Learning, Virtual Worlds
Moving object detection in video applications is usually performed with techniques such as background subtraction, optical flow, and temporal differencing. The most popular approach in the literature for detecting moving objects in video sequences is background subtraction. This approach builds a mathematical model of the static background and compares it with every new frame of the video sequence. In this paper, background subtraction using the Mixture of Gaussians (MoG) method is applied to detect moving objects in an outdoor environment. The study focuses on five MoG parameters that significantly affect the background subtraction process: the background component weight threshold (T_S), the standard deviation scaling factor (D), the user-defined learning rate (α), the total number of Gaussian components (K), and the maximum number of components in the background model (M). Experimental results show that varying each parameter can produce acceptable results, which allows us to propose a suitable range for each parameter for detecting moving objects in an outdoor environment.
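To show where the parameters above act, here is a simplified per-pixel Mixture of Gaussians in the style of Stauffer and Grimson, for a single grayscale pixel. It maps T_S to `T`, D to `D`, α to `alpha`, and K to `K`; M is not modeled separately (K simply caps the number of components). The initial variance, initial weight, variance floor, and the use of α instead of a per-component learning rate are simplifying assumptions of this sketch, not details from the paper. A real implementation (e.g., OpenCV's background subtractors) runs this over every pixel in vectorized form.

```python
import math


class PixelMoG:
    """Simplified Stauffer-Grimson mixture of Gaussians for ONE grayscale pixel."""

    def __init__(self, K=3, alpha=0.01, D=2.5, T=0.7,
                 init_var=225.0, init_weight=0.05, min_var=4.0):
        self.K = K            # maximum number of Gaussian components
        self.alpha = alpha    # learning rate
        self.D = D            # a value matches if |x - mu| <= D * sigma
        self.T = T            # cumulative weight that defines the background
        self.init_var, self.init_weight, self.min_var = init_var, init_weight, min_var
        self.w, self.mu, self.var = [], [], []

    def _rank(self, k):
        # Heavy, narrow components are the most likely to be background.
        return self.w[k] / math.sqrt(self.var[k])

    def _background_set(self):
        bg, total = set(), 0.0
        for k in sorted(range(len(self.w)), key=self._rank, reverse=True):
            bg.add(k)
            total += self.w[k]
            if total > self.T:
                break
        return bg

    def update(self, x):
        """Classify x against the current model, then adapt. True = foreground."""
        match = next((k for k in range(len(self.w))
                      if abs(x - self.mu[k]) <= self.D * math.sqrt(self.var[k])), None)
        is_foreground = match is None or match not in self._background_set()

        # All weights decay; the matched component is reinforced.
        self.w = [(1 - self.alpha) * wk + (self.alpha if k == match else 0.0)
                  for k, wk in enumerate(self.w)]
        if match is not None:
            self.mu[match] += self.alpha * (x - self.mu[match])
            d2 = (x - self.mu[match]) ** 2
            self.var[match] = max(self.min_var,
                                  (1 - self.alpha) * self.var[match] + self.alpha * d2)
        elif len(self.w) < self.K:
            self.w.append(self.init_weight)
            self.mu.append(float(x))
            self.var.append(self.init_var)
        else:
            # Replace the least probable component with a new one centred on x.
            k = min(range(self.K), key=self._rank)
            self.w[k], self.mu[k], self.var[k] = self.init_weight, float(x), self.init_var

        total = sum(self.w)
        self.w = [wk / total for wk in self.w]
        return is_foreground
```

Raising `alpha` absorbs stopped objects into the background faster, a larger `D` tolerates more noise but misses low-contrast objects, and a higher `T` lets multimodal backgrounds (e.g., swaying leaves) be treated as background. These are the kinds of trade-offs the parameter study explores.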
In today's world, the need for autonomous robots is increasing rapidly, and the implementation of Simultaneous Localisation And Mapping (SLAM) is gaining more and more attention. One of the major components of SLAM is 3D mapping of the environment, which enables autonomous robots to perceive their surroundings as a human does; depth (RGB-D) cameras prove useful for this. This paper proposes a continuous real-time 3D mapping system that tackles the long-standing problem of point cloud distortion caused by dynamic objects in the frame. Our method uses the Microsoft Kinect V1 as the RGB-D camera and packages from the Robot Operating System (ROS), such as Real-Time Appearance-Based Mapping (RTAB-Map), for 3D reconstruction. A ROS-based method is used to implement dynamic object elimination in real time. To detect dynamic objects in the frame, two algorithms are used: the deep-learning-based Tiny YOLOv3 and a machine-learning-based Haar cascade classifier. The two are compared in terms of accuracy, execution time, and mean Average Precision (mAP). Although the Haar cascade model is comparatively less accurate at detecting objects, it is two times faster than YOLO, which makes the system more real-time. Real-time performance was given priority when selecting the model.
Wrong-way driving is one of the main causes of road accidents and traffic jams all over the world. By detecting wrong-way vehicles, the number of accidents can be minimized and traffic jams can be reduced. With the increasing popularity of real-time traffic management systems and the availability of cheaper cameras, surveillance video has become a large source of data. In this paper, we propose an automatic system for detecting wrong-way vehicles in on-road surveillance camera footage. The system works in three stages: detecting vehicles in each video frame with the You Only Look Once (YOLO) algorithm, tracking each vehicle within a specified region of interest using a centroid tracking algorithm, and detecting vehicles driving the wrong way. YOLO is highly accurate at object detection, and the centroid tracking algorithm can track moving objects efficiently. Experiments on several traffic videos show that the proposed system can detect and identify wrong-way vehicles under different light and weather conditions. The system is simple and easy to implement.
- by Zillur Rahman and +2
- Computer Vision, Machine Learning, Deep Learning, Object Detection
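The tracking and decision stages of the wrong-way system above can be sketched as follows, assuming YOLO has already produced per-frame bounding boxes as (x1, y1, x2, y2). The tracker greedily links each new centroid to the nearest existing track within a gating distance; a track is flagged as wrong-way when its net displacement, projected onto the lane's allowed direction, is sufficiently negative. The gating distance, `min_travel`, the allowed-direction vector, and the omission of track deletion and region-of-interest filtering are all simplifying assumptions, not the authors' implementation.

```python
import math


class CentroidTracker:
    """Greedy nearest-centroid tracker (simplified: tracks are never deleted)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance   # hypothetical gating distance in pixels
        self.next_id = 0
        self.tracks = {}                   # track id -> list of (x, y) centroids

    @staticmethod
    def centroid(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    def update(self, boxes):
        """Assign each detected box to the nearest unmatched track, or start one."""
        unmatched = set(self.tracks)
        for c in map(self.centroid, boxes):
            best, best_d = None, self.max_distance
            for tid in unmatched:
                d = math.dist(c, self.tracks[tid][-1])
                if d < best_d:
                    best, best_d = tid, d
            if best is None:               # nothing close enough: new vehicle
                self.tracks[self.next_id] = [c]
                self.next_id += 1
            else:
                self.tracks[best].append(c)
                unmatched.discard(best)
        return self.tracks


def is_wrong_way(track, allowed_direction, min_travel=20.0):
    """True if net motion along allowed_direction is below -min_travel pixels."""
    if len(track) < 2:
        return False
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    ax, ay = allowed_direction
    along = (dx * ax + dy * ay) / math.hypot(ax, ay)   # signed projection
    return along < -min_travel
```

For a camera where legal traffic moves down the image, `allowed_direction=(0, 1)`; using net displacement rather than frame-to-frame motion makes the decision robust to jitter in individual detections.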
Collective behaviours are varied yet ubiquitous within groups of both biological and robotic agents, and many of these behaviours require that agents can recognise their own kind. This dissertation presents two vision algorithms implemented on a biomimetic robot named MiRo, so that MiRo can recognise another MiRo using its two front-facing cameras. The first vision algorithm uses histograms of a greyscale image space as input to a perceptron neural network that is pre-trained off-board. The second vision algorithm combines the SURF process with Bayesian inference to estimate the probability that an image space contains a MiRo. Vision algorithm 1 achieves a satisfactory classification rate of 90.1%, whilst algorithm 2 computes significant probabilities correctly 87.8% of the time. The most striking difference between the two algorithms is their relative running time: algorithm 1 runs 313 times faster than algorithm 2, predominantly because of algorithm 2's long Bayesian inference computations.
A MiRo-MiRo following strategy was then developed, with the vision algorithms enabling MiRos to follow each other. This success shows that collective behaviours requiring MiRo recognition are now feasible.
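Vision algorithm 1 can be sketched as a greyscale histogram feature fed into a single-layer perceptron. The dissertation does not state the number of bins, the learning rule, or the training data, so 16 bins, the classic perceptron update, and the toy "dark versus bright" examples below are assumptions chosen only to make the pipeline concrete; label 1 stands for a hypothetical "MiRo present" class.

```python
def grey_histogram(pixels, n_bins=16):
    """Normalised histogram of 8-bit grey levels (0..255)."""
    counts = [0] * n_bins
    for p in pixels:
        counts[p * n_bins // 256] += 1
    return [c / len(pixels) for c in counts]


class Perceptron:
    """Single-layer perceptron with a step activation and the classic update rule."""

    def __init__(self, n_inputs, lr=0.1):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s > 0 else 0

    def train(self, samples, labels, epochs=20):
        # Off-board training: only misclassified samples move the weights.
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                err = y - self.predict(x)
                if err:
                    self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * err


# Toy data: "dark" patches labelled 1, "bright" patches labelled 0.
p = Perceptron(16)
p.train([grey_histogram(list(range(0, 64))),
          grey_histogram(list(range(192, 256)))], [1, 0])
```

Once trained, prediction is a 16-term dot product, so the on-board cost is tiny; this is consistent with the large speed advantage the dissertation reports for algorithm 1 over the SURF-plus-Bayesian approach.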
Efficient and accurate object detection has been an important topic in the advancement of computer vision systems. With the advent of machine learning and deep learning techniques, the accuracy of object detection has increased drastically. The project aims to incorporate state-of-the-art object detection techniques with the goal of achieving high accuracy with real-time performance. In this project, we use an approach based entirely on machine learning (with OpenCV) and deep learning to solve the problem of object detection in an end-to-end fashion. The network is trained on the most challenging publicly available dataset, on which an object detection challenge is conducted annually. The resulting system is fast and accurate, which benefits applications that require object detection.
This paper describes different object detection (OD) techniques. Because of its close relationship with video analysis and image understanding, object detection has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. This survey paper presents one such method, the Optical Flow Method (OFM), which is found to be more robust and effective for moving object detection, as demonstrated by an experiment in this review paper. Applying optical flow to an image gives flow vectors for the points corresponding to moving objects. Marking the required moving object of interest is then left to post-processing, which is the main contribution of this review paper for moving object detection problems. The performance of traditional methods easily stagnates when building complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which can learn semantic, high-level, deeper features, have been introduced to address the problems of traditional architectures. These models differ in network architecture, training strategy, optimization function, and so on. In this review paper, we survey deep learning-based object detection frameworks. Our review begins with a brief introduction to the history of deep learning and its representative tools, namely the Convolutional Neural Network (CNN) and region-based convolutional neural networks (R-CNN).
The goal of this article is to review state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns ...
This is a report of observations on previous sign language detection systems.
- by Abhijeet Anil and +1
- Machine Learning, Python Programming, Object Detection
The detection of moving objects in a video sequence is an essential step in almost all computer vision systems. However, because natural scenes change dynamically, detecting movement becomes a more difficult task. In this work, we propose a new method for detecting moving objects that is robust to shadows, noise, and illumination changes. The detection phase of the proposed method is an adaptation of the MOG approach in which the foreground is extracted in the HSV color space. To prevent shadows from being detected as foreground, we developed a new shadow removal technique based on dynamic thresholding of the detected foreground pixels. The threshold is computed with two statistical analysis tools that account for the degree of shadow in the scene and for robustness to noise. Experiments on a set of video sequences show that the proposed method provides better results than existing methods that are limited to static thresholds.
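The abstract does not give the statistical threshold model, so it is not reproduced here. For context, the sketch below shows the widely used static-threshold HSV shadow test (in the style of Cucchiara et al.) that dynamic-threshold methods like this one aim to improve on: a foreground pixel is relabelled as shadow if it is a darker version of the background (value ratio within [alpha, beta]) with similar saturation and hue. All threshold values are hypothetical, and pixels are RGB triples in [0, 1] so that Python's standard `colorsys` module can be used.

```python
import colorsys


def is_shadow(frame_rgb, background_rgb, alpha=0.4, beta=0.9, tau_s=0.1, tau_h=0.1):
    """Static-threshold HSV shadow test for one foreground pixel (RGB in [0, 1])."""
    h_i, s_i, v_i = colorsys.rgb_to_hsv(*frame_rgb)
    h_b, s_b, v_b = colorsys.rgb_to_hsv(*background_rgb)
    if v_b == 0:
        return False
    ratio = v_i / v_b                 # a shadow darkens, but does not black out
    dh = abs(h_i - h_b)
    dh = min(dh, 1.0 - dh)            # hue is circular in colorsys ([0, 1))
    return alpha <= ratio <= beta and abs(s_i - s_b) <= tau_s and dh <= tau_h


background = (0.6, 0.4, 0.2)
print(is_shadow((0.36, 0.24, 0.12), background))  # same colour, 60% brightness
```

A dynamic variant would replace the fixed `alpha`, `beta`, `tau_s`, and `tau_h` with values estimated per scene, which is the direction the proposed method takes.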
The Batak Toba script is part of the diverse arts and culture of Indonesia. A previous study [1] addressed the recognition of Batak Toba script in handwriting and in fonts originating from Uli Kozok. That study used the k-Nearest Neighbor algorithm. It successfully read text set in the Uli Kozok font in Microsoft Word, but its accuracy on handwriting was still low. Therefore, this study aims to build a model that detects handwritten Batak Toba script by applying the Faster R-CNN and YOLOv3 algorithms. The research steps are preprocessing of the image dataset (rotation, labeling, and data splitting), followed by training, testing, and evaluation. In terms of accuracy, Faster R-CNN achieves 97.3% and YOLOv3 98.9%. When tested on new data containing single Batak Toba characters, Faster R-CNN achieves 76.3% and YOLOv3 93.1%. When tested on images containing multiple Batak Toba characters, Faster R-CNN achieves 73.6% and YOLOv3 53%. Finally, when tested on Batak Toba script mixed with characters or symbols from outside the script, Faster R-CNN achieves 47.3% and YOLOv3 35.5%.
There have been many developments in Human-Computer Interaction (HCI). Many modules have been developed to help the physical world interact with the digital world. This paper proposes a new approach to controlling mouse movement using colored-object and marker motion tracking. The project mainly targets cursor movement and click events based on object detection and marker identification. The software is developed in Python, using OpenCV and, for mouse functions, PyAutoGUI. Colored objects are used to perform actions such as moving the mouse and clicking. The method mainly relies on a webcam to build a virtual human-computer interaction device in a cost-effective manner.
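The core of colored-object cursor control can be sketched in two steps: find the centroid of pixels whose HSV hue falls in a target range, then map that camera coordinate to screen coordinates, mirroring x so the cursor follows the hand naturally. This pure-Python version operates on a frame given as rows of (r, g, b) tuples with 0..255 channels; the hue range, saturation and value floors, and function names are assumptions for illustration. Note that OpenCV frames are in BGR order and would need their channels swapped first, and that red hues also wrap around near 1.0.

```python
import colorsys


def color_mask_centroid(frame, hue_range, min_sat=0.4, min_val=0.3):
    """Centroid (x, y) of pixels whose hue lies in hue_range; None if none match."""
    lo, hi = hue_range
    sx = sy = n = 0
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # The saturation and value floors reject grey, black, and dim pixels.
            if lo <= h <= hi and s >= min_sat and v >= min_val:
                sx += x
                sy += y
                n += 1
    if n == 0:
        return None
    return sx / n, sy / n


def to_screen(cx, cy, frame_w, frame_h, screen_w, screen_h, mirror=True):
    """Scale a camera-space point to screen space, mirroring x like a mirror image."""
    if mirror:
        cx = (frame_w - 1) - cx
    return int(cx * screen_w / frame_w), int(cy * screen_h / frame_h)


# A 4x4 black frame with one red marker pixel at (x=1, y=2).
frame = [[(0, 0, 0)] * 4 for _ in range(4)]
frame[2][1] = (255, 0, 0)
target = to_screen(*color_mask_centroid(frame, (0.0, 0.05)), 4, 4, 1920, 1080)
```

In a live loop, the resulting coordinates would be passed to PyAutoGUI's `pyautogui.moveTo(x, y)`, and a second marker color could trigger `pyautogui.click()`; smoothing the centroid over a few frames reduces cursor jitter.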
In computer vision, real-time object detection and recognition is considered a challenging task in uncontrolled environments. In this research work, an improved real-time technique for detecting and recognizing objects in web camera video is introduced. Objects such as people, vehicles, and animals are detected and recognized by this technique. The Single Shot Detector (SSD) and You Only Look Once (YOLO) models used in this paper show promising results in object detection and recognition. The system can detect objects even in adverse and uncontrolled conditions, such as excess or insufficient light, rotation, mirroring, and varied backgrounds. A convolutional neural network (CNN) is used to classify the objects. The technique achieves real-time performance with satisfactory detection and classification results and also provides better accuracy. The detection and classification accuracy of the model is about 63% to 90%.