Convolutional Neural Network Research Papers

With recent advances in technology, computer programs have approached the decision-making capabilities of the human brain in almost all healthcare systems. Integrating a convolutional neural network (CNN) with extreme gradient boosting (XGBoost) is expected to improve the accuracy of breast cancer detection. The aims of this research were to (i) determine the stages of CNN-XGBoost integration in the diagnosis of breast cancer and (ii) calculate the accuracy of the CNN-XGBoost integration in breast cancer detection. A CNN combined with transfer learning and data augmentation was used, with XGBoost as the classifier. After obtaining accuracy results through transfer learning, this research connects the final layer to the XGBoost classifier. Furthermore, the interface for the evaluation process was built using the Python programming language and the Django framework. The results were as follows: (i) the stages of CNN-XGBoost integration on histopathology images for breast cancer detection were identified, and (ii) a higher level of accuracy was achieved as a result of the CNN-XGBoost integration. In conclusion, breast cancer can be detected from histopathological images through the integration of CNN and XGBoost, and the combination can enhance the accuracy of breast cancer detection.

- Breast Cancer, Transfer Learning, Convolutional Neural Network, Data augmentation

In Low and Middle-Income Countries (LMICs), efforts to eliminate the Tuberculosis (TB) epidemic are challenged by persistent social inequalities in health, the limited number of local healthcare professionals, and the weak healthcare infrastructure found in resource-poor settings. Modern computing techniques have accelerated the TB diagnosis process. In this paper, we propose a novel method using a Convolutional Neural Network (CNN) to deal with unbalanced X-ray images that span few categories. Our method improves the accuracy of classifying multiple TB manifestations by a large margin. We explore the effectiveness and efficiency of shuffle sampling with cross-validation in training the network and find that it works remarkably well for medical image classification. We achieve an 85.68% classification accuracy on a large TB image dataset, surpassing any state-of-the-art classification accuracy in this area. Our methods and results show a promising path toward more accurate and faster TB diagnosis in LMIC healthcare facilities.

Blind image steganalysis (BIS) is the process of detecting whether an input image contains hidden data, without any prior information (i.e., blind) about the steganography technique applied. Recent BIS approaches typically suffer from limited detection accuracy and high computational cost due to, e.g., pre-processing. In this paper, the proposed BIS approach discards the pre-processing step, so that the computational cost is reduced. In addition, significant modifications to a recent convolutional neural network (CNN) model are considered in order to enhance the detection accuracy. First, an efficient parameter initialization is considered. Second, a cyclic learning rate and the LReLU activation function are used during the learning phase for faster convergence with noticeably higher detection accuracy. Finally, a hybrid of model and data parallelism techniques is applied in the convolution and fully connected layers, respectively, thus significantly reducing the c...

- by Eslam Magdy
- Computer Science, Artificial Intelligence, Steganalysis, Convolutional Neural Network
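The steganalysis abstract above mentions a cyclic learning rate and the LReLU activation without giving details. The following is a minimal sketch of both ideas, assuming a triangular schedule and a negative slope of 0.01; the function names, the schedule shape, and all numeric values are illustrative assumptions, not the authors' settings.

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """LReLU: pass positive inputs through, scale negative inputs by a small slope."""
    return x if x >= 0 else negative_slope * x


def triangular_clr(step: int, base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Triangular cyclic learning rate (assumed schedule shape).

    The rate rises linearly from base_lr to max_lr over half_cycle steps,
    then falls back to base_lr over the next half_cycle steps, and repeats.
    """
    cycle = step // (2 * half_cycle)               # index of the current full cycle
    x = abs(step / half_cycle - 2 * cycle - 1)     # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the shape of the schedule.
    for s in (0, 50, 100, 150, 200):
        print(s, round(triangular_clr(s, base_lr=0.001, max_lr=0.01, half_cycle=100), 5))
```

Periodically raising the rate in this way is one common motivation for cyclic schedules; whether it matches the schedule used in the paper cannot be determined from the abstract.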

This work builds a human behaviour recognition system based on a Convolutional Neural Network (CNN) for identifying specific human behaviours in public places. First, a video dataset containing human behaviours is split into frames. We then process all the frames using background subtraction, a technique that detects changes across a sequence of images and is used in many applications. Next, the training set is used to train a CNN model, with stochastic gradient descent used to update the parameters of the model. Finally, samples of the various behaviours are classified and recognized with the trained model, which allows a comparison with current methods. The results show that the Convolutional Neural Network can analyse the human behaviour model automatically and determine the beha...
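The abstract above relies on background subtraction to find changes between frames. A minimal sketch of the simplest variant follows, assuming a static reference background and a fixed intensity threshold; the function name, the threshold of 25, and the tiny example frames are hypothetical and not taken from the paper.

```python
def subtract_background(frame, background, threshold=25):
    """Return a binary foreground mask for one grayscale frame.

    frame, background: equally sized 2D lists of pixel intensities (0-255).
    A pixel is marked as foreground (1) when it differs from the reference
    background by more than `threshold`; otherwise it is background (0).
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


if __name__ == "__main__":
    background = [[10, 10], [10, 10]]   # hypothetical empty scene
    frame = [[10, 200], [12, 10]]       # one bright pixel has appeared
    print(subtract_background(frame, background))
```

Practical systems usually replace the static reference with an adaptive background model, but the thresholded difference shown here is the core operation.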

The identification of nanomaterials with the properties required for energy-efficient electronic systems is usually a tedious human task. A workflow to rapidly localize and characterize nanomaterials at the various stages of their integration into large-scale fabrication processes is essential for quality control and, ultimately, their industrial adoption. In this work, we develop a high-throughput approach to rapidly identify suspended carbon nanotubes (CNTs) by using high-speed Raman imaging and deep learning analysis. Even for Raman spectra with extremely low signal-to-noise ratios (SNRs) of 0.9, we achieve a classification accuracy that exceeds 90%, while it reaches 98% for an SNR of 2.2. By applying a threshold on the output of the softmax layer of an optimized convolutional neural network (CNN), we further increase the accuracy of the classification. Moreover, we propose an optimized Raman scanning strategy to minimize the acquisition time while simultaneously identifying the ...

- by Miroslav Haluska
- Materials Science, Raman Spectroscopy, Nanomaterials, Carbon Nanotube
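The carbon nanotube abstract above reports that thresholding the softmax output raises classification accuracy. One way to read this is as a reject option: predictions whose top probability falls below a threshold are discarded instead of counted. The sketch below assumes that reading; the threshold of 0.9 and the function names are illustrative assumptions, not values from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, threshold=0.9):
    """Return the predicted class index, or None if the model is not confident.

    The prediction is accepted only when the largest softmax probability
    reaches `threshold` (an assumed value, to be tuned on validation data).
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    print(classify_with_threshold([4.0, 0.0]))   # confident prediction is kept
    print(classify_with_threshold([0.5, 0.0]))   # ambiguous prediction is rejected
```

The trade-off is coverage: a higher threshold keeps fewer spectra but those that remain are classified more reliably.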

Automatically describing the information of an image using properly constructed sentences is a tricky task in any language. However, it has the potential to have a significant effect by enabling visually challenged individuals to better understand their surroundings. This paper proposes an image captioning system that generates detailed captions and extracts text from an image, if any, and uses it as a part of the caption to provide a more precise description of the image. To extract the image features, the proposed model uses Convolutional Neural Networks (CNNs) followed by Long Short-Term Memory (LSTM) that generates corresponding sentences based on the learned image features. Further, using the text extraction module, the extracted text (if any) is included in the image description and the captions are presented in audio form. Publicly available benchmark datasets for image captioning like MS COCO, Flickr-8k, Flickr-30k have a variety of images, but they hardly have images that c...

After the rise of the coronavirus, every country made wearing face masks in public places compulsory. Even though a few countries have reached a good number of vaccinations to date, the human body is still not immune to new variants of the virus, and it will take a few more years until everyone can go mask-free. Authorities therefore need a system to check whether everyone in a public place is wearing a mask. This paper proposes a model with 99.5% accuracy that can be deployed to monitor public places, helping the authorities to control the spread of the virus. The proposed model is fast, taking only 38.03 seconds on average per epoch during training. The CNFMD model is trained and tested using a Kaggle dataset that is pre-split into three parts for training, validation, and testing.

- by Saksham Checker
- Computer Science, Artificial Intelligence, Machine Learning, Deep Learning

This paper presents machine-printed character recognition on license plates using a convolutional neural network (CNN). A CNN is a special type of feed-forward multilayer perceptron trained in supervised mode using a gradient descent backpropagation learning algorithm that enables automated feature extraction. Common methods usually apply a combination of a handcrafted feature extractor and a trainable classifier, which may lead to sub-optimal results and low accuracy. CNNs have achieved state-of-the-art results in tasks such as optical character recognition, generic object recognition, real-time face detection and pose estimation, speech recognition, and license plate recognition. A CNN combines three architectural concepts, namely local receptive fields, shared weights, and subsampling. The combination of these concepts with the optimization method resulted in an accuracy of around 98%. In this paper, the method implemented to increase the performance of character recognition using a CNN is proposed and discussed.

- by syafeeza radzi
- Back Propagation, Convolutional Neural Network

We present a hybrid neural-network for human face recognition which compares favourably with other methods. The system combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the SOM, and a multilayer perceptron (MLP) in place of the convolutional network for comparison. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze the computational complexity and discuss how new classes could be added to the trained recognizer

- by Weng Ah
- Computational Complexity, Pattern Recognition, Face Recognition, Access Control

Facial Expression Recognition (FER) is one of the most important information channels by which Human-Computer Interaction (HCI) systems can recognize human emotions. The importance of FER is not limited to the direct interaction between the machine and humans but can be extended to security, virtual reality, education, and entertainment. In this paper, we propose two Convolutional Neural Network (CNN) models for FER. One of these models achieved 100% accuracy for the JAFFE and CK+ benchmark datasets with lower computational complexity. We applied image augmentation techniques and image enhancement techniques with the first model. The other CNN model is an extended version of the first model that has been validated for the more challenging FER2013 dataset, and we obtained 69.32% for this dataset. By comparing to the recent state-of-the-art approaches to FER, we demonstrate the superior accuracy and efficiency of the proposed approaches.

Plant disease diagnosis is a key concern in agriculture, and accurate identification of crop diseases is in high demand. This project therefore aims to create an automated system that makes plant disease identification as simple as possible. We propose a convolutional neural network architecture for corn leaf disease identification. The analysis is carried out using corn leaf images from the PlantVillage dataset. The proposed CNNs are trained to identify three classes: two diseases and one healthy class. The trained model achieves an accuracy of 96.04%. The trained model is embedded in a Raspberry Pi, which is mounted on a drone. Unmanned Aerial Vehicles (UAVs) have recently attracted considerable attention, and there is growing interest in using UAVs for agricultural applications such as crop monitoring and management. We propose an automated framework that is capable of detecting corn leaf diseases with high accuracy. The framework uses computer vision and deep learning to process the images captured by UAVs at low altitudes and to identify infected corn leaves.

Predictive maintenance (PdM) is a successful strategy used to reduce cost by minimizing breakdown stoppages and production loss. The massive amount of data that results from the integration between the physical and digital systems of the production process makes it possible for deep learning (DL) algorithms to be applied to fault prediction and diagnosis. This paper presents a hybrid convolutional neural network and long short-term memory network (CNN-LSTM) approach to a predictive maintenance problem. The proposed CNN-LSTM approach enhances the predictive accuracy and also reduces the complexity of the model. To evaluate the proposed model, two comparisons with regular LSTM and gradient boosting decision tree (GBDT) methods have been made using a freely available dataset. The PdM model based on the CNN-LSTM method demonstrates better prediction accuracy than the regular LSTM, where the average F-Score increases from 93.34% in the case of regular LSTM to 9...

Bangladesh is the fourth largest rice-producing country in the world. Agriculture plays a vital role in the country's economy. One of the major obstacles in rice production is rice paddy diseases. In this paper, we develop a deep learning-based system to detect rice paddy diseases. In the first step, a rice paddy image dataset is analyzed and preprocessed for classification. To build the classifier, we use the EfficientNet-B3 Convolutional Neural Network (CNN) model. Next, we train a new model on segmented disease-affected areas of rice paddy to detect affected regions using a Mask Region-based Convolutional Neural Network (Mask R-CNN). For classification, we obtain an accuracy of nearly 99%. For segmentation, the class, bounding box, and mask loss values are 0.09, 0.29, and 0.30, respectively. The mean Average Precision (mAP) of the segmentation is around 89%.

- by Fahim Mashroor and +1
- Image Processing, Recurrent Neural Network, LAB color space, Convolutional Neural Network

As convolution contributes most of the operations in a convolutional neural network (CNN), the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations with four levels of loops, which results in a large design space. Prior works either employ limited loop optimization techniques, e.g. loop unrolling, tiling and interchange, or only tune some of the design variables after the accelerator architecture and dataflow are already fixed. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit the data reuse and manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. Then, we propose a specific dataflow of hardware CNN acceleration to minimize the data communication while maximizing the resource utilization to achieve high performance. The proposed CNN acceleration scheme and architecture are demonstrated by implementing end-to-end CNNs including NiN, VGG-16 and ResNet-50/ResNet-152 for inference. For VGG-16 CNN, the overall throughputs achieve 348 GOPS and 715 GOPS on Intel Stratix V and Arria 10 FPGAs, respectively.
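The "four levels of loops" mentioned above can be made concrete with a plain loop nest. The sketch below is a software reference only, assuming the four levels are (1) the kernel window, (2) the input feature maps, (3) the spatial scan over one feature map, and (4) the output feature maps; that grouping, the function name, and the data layout are assumptions for illustration and do not reproduce the paper's hardware dataflow.

```python
def conv2d_mac(inputs, weights, stride=1):
    """Direct convolution as nested multiply-accumulate (MAC) loops.

    inputs:  [C_in][H][W] nested lists
    weights: [C_out][C_in][K][K] nested lists (square kernels, no padding)
    Returns (output, mac_count) where output is [C_out][H_out][W_out].
    """
    c_in, h, w = len(inputs), len(inputs[0]), len(inputs[0][0])
    c_out, k = len(weights), len(weights[0][0])
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
    macs = 0
    for co in range(c_out):                  # level 4: across output feature maps
        for y in range(h_out):               # level 3: scan along one feature map
            for x in range(w_out):
                acc = 0.0
                for ci in range(c_in):       # level 2: across input feature maps
                    for ky in range(k):      # level 1: within the kernel window
                        for kx in range(k):
                            pixel = inputs[ci][y * stride + ky][x * stride + kx]
                            acc += pixel * weights[co][ci][ky][kx]
                            macs += 1
                out[co][y][x] = acc
    return out, macs


if __name__ == "__main__":
    ones_in = [[[1.0] * 3 for _ in range(3)]]                      # 1 x 3 x 3 input
    ones_w = [[[[1.0] * 2 for _ in range(2)]] for _ in range(2)]   # 2 x 1 x 2 x 2 kernels
    result, mac_count = conv2d_mac(ones_in, ones_w)
    print(result, mac_count)
```

Loop unrolling, tiling, and interchange, as named in the abstract, are all transformations of this nest: they change which iterations run in parallel, how much data is buffered on chip, and the order in which the loops execute.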

A helmet is important equipment for protecting the head while riding; however, helmets are now often worn when entering places where they should not be worn. Examples of such places include ATM rooms, minimarkets, and others. This study adopts the Convolutional Neural Network (CNN) algorithm to identify people who are wearing a helmet and people who are not, based on digital images. Several CNN models, namely MobileNet-V2, ResNet-50, and VGG-16, are compared on their performance in identifying helmet use. Experiments were conducted using a dataset of 3,207 images divided into 2 classes: 1,603 images of people wearing a helmet and 1,604 images of people not wearing a helmet. The results show that the most accurate architecture for identifying helmet use is ResNet-50, with an accuracy of 97.81% in model testing. The MobileNet-V2 architecture achieved an accuracy of 96.36%, and VGG-16 achieved an accuracy of 52.25%.

The number of new vehicles on the road is increasing rapidly, which in turn causes highly congested roads and gives drivers a reason to violate traffic rules. This leads to a high number of road accidents. Traffic violation detection systems using computer vision are a very efficient tool for reducing traffic violations through tracking and penalizing. The proposed system was implemented using YOLOv3 object detection to detect traffic violations such as signal jumping and speeding, and to count vehicles. Further, the system is optimized in terms of accuracy. Signal jumping is determined using the region of interest and the location of the vehicle over a sequence of frames. This implementation obtained an accuracy of 97.67% for vehicle count detection and an accuracy of 89.24% for speed violation detection.

- by IAEME Publication
- Convolutional Neural Network, Traffic violation, Iaeme Ijaret, YOLOV3

The demand for automatic detection of the novel coronavirus, or COVID-19, is increasing across the globe. The exponential rise in cases burdens healthcare facilities, and a vast amount of multimedia healthcare data is being explored to find a solution. This study presents a practical solution for detecting COVID-19 from chest X-rays, distinguishing COVID-19 cases from normal cases and cases of viral pneumonia, using deep Convolutional Neural Networks (CNNs). In this study, three pre-trained CNN models (EfficientNetB0, VGG16, and InceptionV3) are evaluated through transfer learning. The rationale for selecting these specific models is their balance of accuracy and efficiency, with fewer parameters, which makes them suitable for mobile applications. The dataset used for the study is publicly available and compiled from different sources. This study uses deep learning techniques and performance metrics (accuracy, recall, specificity, precision, and F1 score). The results show that the proposed approach produced a high-quality model, with an overall accuracy of 92.93% and a COVID-19 sensitivity of 94.79%. The work indicates a definite possibility of implementing computer vision designs to enable effective detection and screening measures.
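The metrics listed in the abstract above (accuracy, recall, specificity, precision, and F1 score) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions for one class treated as "positive" against the rest; the function name and the counts in the example are made up for illustration and are not results from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. Recall is also called sensitivity. A ratio with a zero
    denominator is reported as 0.0.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```

For a three-class problem such as COVID-19 versus normal versus viral pneumonia, the same function can be applied once per class in a one-vs-rest fashion.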

Automatic colorization hallucinates what an input grayscale image would look like when colorized, and a colorized image generally looks and feels better than its grayscale version. Deep learning is one of the most important technologies used in machine learning. It trains the computer with algorithms that imitate the working of the human brain. Some of the areas in which it is used are medicine, industrial automation, and electronics. The main objective of this project is coloring grayscale images. We have combined the concepts of convolutional neural networks with the OpenCV library in Python to construct our desired model. A user interface has also been built using PIL to accept personalized inputs. Previously, the user had to give details about boundaries, which colors to use, and so on, so colorization required considerable user intervention and remained a tedious, time-consuming, and expensive task. So, in this paper we try to build a model to colorize...

- by Sohal Rehman
- Computer Science, Machine Learning, Deep Learning, Computer Science Engineering

In precision farming, identifying weeds is an essential first step in planning an integrated pest management program in cereals. By knowing the species present, we can learn which types of herbicides to use to control them, especially in non-weeding crops where mechanical methods (tillage, hand weeding, hoeing, and mowing) are not effective. Deep learning based on convolutional neural networks (CNNs) can therefore help to identify weeds automatically, after which an intelligent system applies localized spraying of herbicides, avoiding their large-scale use and preserving the environment. In this article, we propose a smart system based on object detection models, implemented on a Raspberry Pi, that seeks to identify the presence of relevant objects (weeds) in an area (a wheat crop) in real time and to classify those objects for decision support, including spot spraying with a herbicide chosen according to the weed detected.

Mangifera indica, traditionally known as mango, is a drupe found around the world in over 500 species. India produced 19.5 million metric tons of mango in 2017. In Bangladesh, the mango is referred to as the national tree, and the government has registered endemic species of mango as a geographical indication (GI) of Bangladesh. Recognizing specific breeds has become a significant computer vision task. In this paper, we propose a convolutional neural network (CNN) based approach for detecting five mango species, namely Chosha, Fazli, Harivanga, Lengra, and Rupali, from 15,000 different images. For better experimentation, we applied three different CNN models and analyzed the recognition rates with various criteria. For performance evaluation, we used classic metrics such as precision, recall, F1-score, ROC, and accuracy. Among the three models tested, the third model performed best, with an accuracy of 92.80%.

Bioinformatics, an interdisciplinary area of biology and computer science, handles large and complex data sets with linear and non-linear relationships between attributes. Deep learning has gained importance in recent years as a way to handle such relationships. This paper analyses different deep learning architectures and their applications in Bioinformatics. The paper also addresses the limitations and challenges of deep learning.

- by IJFRCSCE Journal
- Bioinformatics, Genetics, Machine Learning, Molecular Biology

Deep learning and image processing are two areas of great interest to academics and industry professionals alike. The areas of application of these two disciplines range widely, encompassing fields such as medicine, robotics, and security and surveillance. The aim of this book, 'Deep Learning for Image Processing Applications', is to offer concepts from these two areas on the same platform, and the book brings together the shared ideas of professionals from academia and research about problems and solutions relating to the multifaceted aspects of the two disciplines. The first chapter provides an introduction to deep learning, and serves as the basis for much of what follows in the subsequent chapters, which cover subjects including: the application of deep neural networks for image classification; hand gesture recognition in robotics; deep learning techniques for image retrieval; disease detection using deep learning techniques; and the comparative analysis of deep data and big data. The book will be of interest to all those whose work involves the use of deep learning and image processing techniques.

Deep learning, in layman's terms, is the stacking together of multiple layers of neurons. These neurons lie on connected layers, where the output of the previous layer is passed to every node in the next layer. In general, the performance of neural networks is optimized by stacking several layers together. In recent automatic speech recognition studies, deep learning architectures applied to acoustic feature extraction have eclipsed conventional audio features such as Mel-frequency cepstral coefficients. However, in visual speech recognition (VSR) studies, handcrafted visual feature extraction methods are still widely used. This paper aims to apply a convolutional neural network (CNN) as a visual feature extraction mechanism for VSR. Three different CNN variants are developed to process images of a speaker's mouth region in combination with phoneme labels. The developed CNN variants are then used to extract the visual features essential for recognizing phonemes and are evaluated on the most widely used benchmark dataset, MIRACL-VC1.

- by IAEME Publication
- Image Classification, Deep Learning, Lip Reading, Convolutional Neural Network

Deep learning is a subset of machine learning, designed to continually analyze data with logic similar to that of a human. It uses a layered algorithmic structure called an Artificial Neural Network (ANN). ANNs are widely used in medical diagnosis for critical decisions in areas such as disease prediction, robotic surgery, and radiation treatment. Disease prediction includes identifying and classifying Alzheimer's disease, the most common cause of dementia, which affects around 46 million people in the world. The disease has several stages and is classified as Mild or Severe. The symptoms include a reduced ability to remember information and impaired speaking and writing. Many machine learning techniques, such as the Decision Tree classifier, Independent Component Analysis, and Linear Discriminant Analysis (LDA), have been used to predict the disease based on its stages, but their precision in identifying the stages from the signals is not very good. In this work, a deep learning based technique is proposed which improves the accuracy of classification by using a Convolutional Neural Network (CNN). This work analyzes the Electroencephalogram (EEG) signal,

- by IAEME Publication
- Alzheimer's Disease, Brain, Electroencephalogram, Convolutional Neural Network

The main concept of our project is to experiment with using deep learning neural networks to detect and quickly respond to crimes in progress, using an effective Criminal Recognition and Person Tracking system to reduce the crime rate. Surveillance can take different forms, such as malicious activity detection, identification of a particular entity (a particular individual in a CCTV video), or, more generally, keeping track of the movements of human beings. In our project, the focus is on finding the trajectory or path of a person through a grid of CCTV cameras, also known as tracking. Tracking manually can be very difficult, so it is done with the help of face recognition and video processing. Current systems in this field aim to search for an entity in video by extracting its face and matching it against a database of human faces of interest. None of these systems can solve the task without a predefined database to match against. Our Smart AI addresses this by first generating datasets from human faces taken from CCTV video and then using them in our Face Recognition model. Deep learning libraries such as OpenFace are used along with image processing tools such as OpenCV and a cloud-based solution to achieve this task.

Accurate real-time object detection needs fast computation to identify an object at the moment it appears. The accuracy of object detection has increased drastically with the advancement of deep learning techniques. We incorporate a state-of-the-art method for object detection to achieve high accuracy with real-time performance. The state-of-the-art methods are subdivided into two types. The first is one-stage methods that prioritize inference speed; example models include YOLO, SSD, and RetinaNet. The second is two-stage methods that prioritize detection accuracy; example models include Faster R-CNN, Mask R-CNN, and Cascade R-CNN. Among all these, Faster R-CNN and SSD have better accuracy, while YOLO performs better when speed is given preference over accuracy. A major challenge in many object detection systems is that they depend on other computer vision techniques to assist the deep learning-based approach, which results in slow and non-optimal performance. In this paper, we use a deep learning-based approach to solve the problem of object detection in an end-to-end fashion. The approach combines SSD and MobileNets to implement detection and tracking efficiently. SSD eliminates the feature resampling stage and combines all calculated results into a single component. MobileNet is a lightweight network model that uses depth-wise separable convolution for settings that lack computational power, such as mobile devices (e.g., laptops and mobile phones). This algorithm performs efficient object detection without compromising on performance. The main purpose of our research is to examine the accuracy of the SSD object detection method and the importance of the pre-trained deep learning model MobileNet. The resultant system is fast and accurate, thus aiding applications that require object detection.

- by Tejal Palwankar
- Computer Science, Artificial Intelligence, CNN, Deep Learning
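The abstract above credits MobileNet's light weight to depth-wise separable convolution. The saving can be checked with simple arithmetic: a standard K x K convolution needs K*K*C_in*C_out weights, while the separable version needs K*K*C_in weights for the depth-wise step plus C_in*C_out weights for the 1x1 point-wise step. The sketch below computes both counts (biases ignored); the function names and the layer sizes in the example are illustrative assumptions.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (bias terms ignored).

    Depth-wise step: one k x k filter per input channel.
    Point-wise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
    std = standard_conv_params(3, 32, 64)
    sep = separable_conv_params(3, 32, 64)
    print(std, sep, round(sep / std, 4))
```

The ratio of the two counts works out to 1/C_out + 1/K^2, so for 3x3 kernels the separable layer uses roughly eight to nine times fewer weights when C_out is large.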

Skin detection from images, typically used as a preprocessing step, has a wide range of applications such as dermatology diagnostics and human-computer interaction design. It is a challenging problem due to many factors, such as variation in melanin pigment, uneven illumination, and differences in ethnicity and geography. In addition, age and gender introduce further difficulties into the detection process. It is hard to determine whether a single pixel is skin or non-skin without considering the context. An efficient traditional hand-engineered skin color detection algorithm requires extensive work by domain experts. Recently, deep learning algorithms, especially convolutional neural networks (CNNs), have achieved great success in pixel-wise labelling tasks. However, CNN-based architectures are not sufficient for modeling the relationship between pixels and their neighbors. In this letter, we integrate recurrent neural network (RNN) layers into fully convolutional neural networks (FCNs) and develop an end-to-end network for human skin detection. In particular, FCN layers capture generic local features while RNN layers model the semantic contextual dependencies in images. Experimental results on the COMPAQ and ECU skin datasets validate the effectiveness of the proposed approach, where RNN layers enhance the discriminative power of skin detection in complex background situations.

Sentiment analysis has become a very popular research topic due to its various applications in business, politics, and entertainment. However, analyzing people's opinions from short texts such as Twitter messages and single sentences is quite a challenging task due to their informality, misspellings, and semantic errors. In this study, we propose character-level multi-scale sentiment analysis for Afaan Oromoo using a combined Convolutional Neural Network and Bidirectional Long Short-Term Memory (CNN-Bi-LSTM) approach. Since no standardized and sufficient corpus has so far been prepared for Afaan Oromoo Natural Language Processing (NLP) tasks, including sentiment analysis, we collected data from two domains, Facebook and Twitter, for the experiment. After collecting the data, we removed user names, links, non-Afaan Oromoo texts, and any unnecessary characters. The cleaned data were annotated manually by 4 different annotators into five classes, namely 2, 1, -2, -1, and 0, which represent very positive, positive, very negative, negative, and neutral respectively. This multi-scale sentiment analysis provides a more refined analysis, which is vital for prioritizing and comparing different opinions. Afterward, we performed experiments on the prepared Facebook and Twitter corpus by applying a Convolutional Neural Network, a Bidirectional Long Short-Term Memory network, and the combined Convolutional Neural Network-Bidirectional Long Short-Term Memory model with character-level embedding. The experimental results validate that the proposed model outperforms both CNN and Bi-LSTM on both the Facebook and Twitter datasets. On the Facebook dataset we achieved a promising accuracy of 93.3%, 91.4%, and 94.1% for CNN, Bi-LSTM, and CNN-Bi-LSTM respectively. On the Twitter dataset we achieved 92.6%, 90.3%, and 93.8% for CNN, Bi-LSTM, and CNN-Bi-LSTM respectively. The results suggest the feasibility of multi-scale sentiment analysis, as well as of CNN-Bi-LSTM, for Afaan Oromoo. We also suggest that the accuracy can be improved by building a standardized and sufficiently large data set, which was one of the most difficult and demanding tasks of our work.
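
As a rough illustration of the character-level input described above, the sketch below maps each character to an integer index and pads every text to a fixed length, which is the usual form of input for an embedding layer. The helper names, the padding length, and the placeholder strings are assumptions made for this example; only the five score labels come from the abstract.

```python
# Score labels taken from the abstract; everything else is illustrative.
SCORE_LABELS = {
    2: "very positive",
    1: "positive",
    0: "neutral",
    -1: "negative",
    -2: "very negative",
}

def build_vocab(texts):
    """Assign each distinct character an index starting at 1 (0 is padding)."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(text, vocab, max_len):
    """Turn a text into a fixed-length list of character indices.

    Characters missing from the vocabulary are mapped to 0, the same value
    used for padding (a simplification for this sketch).
    """
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Placeholder strings, not real corpus data.
corpus = ["abba", "cab"]
vocab = build_vocab(corpus)
encoded = [encode(text, vocab, max_len=6) for text in corpus]
```

Each row of `encoded` has the same length, so the rows can be stacked into a batch before being passed to the convolutional and recurrent layers.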

Traffic signs displayed on the roads play an important role while driving. They supply critical information to road users, who in turn must regulate their driving behaviour and strictly follow the road regulations currently enforced without causing trouble to other drivers and pedestrians. Traffic sign classification is employed to detect and classify traffic signs so as to inform and warn a driver beforehand and avoid rule violations. Existing classification systems have certain disadvantages, such as incorrect predictions and hardware cost and maintenance, which the proposed system resolves to a great extent. The proposed approach implements a traffic sign classification algorithm employing a convolutional neural network. It also includes webcam detection of traffic signs. This helps the driver to observe the sign close to his/her eyes on the display screen and thus saves the time spent manually checking each traffic sign.

Although traditional template matching can serve as a tracker and achieves some results, it fails in the presence of constraints, and its accuracy and robustness leave room to improve the overall tracking capability. This paper focuses on visual object tracking based on the Adaptive Correlation Filter method introduced by Bolme et al. [1]. They derived a fast correlation filter, the MOSSE tracker, which utilizes classical signal processing analysis and contains a minimum output sum of squared errors filter. This application implements that approach and can track multiple objects simultaneously. It is implemented in Python 3.7 using the OpenCV library. The results of the current implementation based on [1] will be compared with the cross-correlation-based technique. These comparisons will be plotted in MATLAB.

- by Azadeh Hadadi
- •
- Computer Vision, Image Processing, Machine Learning, Deep Learning

Currently, the available information regarding Dutch hallmarks on silver is limited to printed books in which a short description or drawing of the mark has been produced. Aside from these minimalistic reproductions, archival information on guilds and workshops has been lost. As a result, there is still much uncertainty concerning hallmarks, and false attributions are often made. For this research, a database will be built containing all known and confirmed marks from surviving insculpation plates and from objects with an unchallenged provenance. The first phase of this project, reported here, created a sample database and trained a deep ResNet-50 convolutional neural network (CNN) to perform classification of images from the database to report the known identities of those hallmarks. The network can be used to perform inference, during which hallmark images from new objects can be input to the network, which outputs an estimate of the possible identities of the hallmark along with estimates of the certainty of those predictions. The proof-of-concept network was trained on a database of 483 silver mark photographs of 82 distinct types, resulting in an overall top-1 accuracy exceeding 99%. This tool will aid in linking marks found on objects to known marks, including their metadata, and could therefore help with attribution and dating of silver works of art.
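
The inference step described above, in which the network returns candidate identities together with certainty estimates, is commonly realised by applying a softmax to the final-layer scores and ranking the classes. The sketch below shows that idea in plain Python; the class names and scores are invented, and whether the authors derive their certainty estimates this way is an assumption.

```python
import math

def softmax(logits):
    """Convert raw network scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_identities(logits, labels, k=3):
    """Return the k most probable labels paired with their probabilities."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for four made-up hallmark classes.
labels = ["mark_A", "mark_B", "mark_C", "mark_D"]
logits = [2.0, 0.5, 4.0, -1.0]
candidates = top_k_identities(logits, labels, k=2)
```

Here `candidates` lists the two most likely marks in descending order of probability, which a user could compare against the database entries for those marks.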

- by Ellen van Bork
- •
- Machine Learning, Silver, Augmentation, Database

- by IJCSMC Journal
- •
- Computer Science, Machine Learning, Deep Learning, COMPUTER SCIENCE & ENGINEERING

Automatic Facial Expression Recognition (AFER) has been an active research area for the past three decades. Research and development in this area remain active due to its wide range of potential applications in many fields. Recent research in the field presents impressive results when using Convolutional Neural Networks (CNNs, ConvNets). In general, ConvNets have proved to be a very common and promising choice for many computer vision tasks, including AFER. Motivated by this fact, we combine modified versions of three ConvNets in parallel to build an Automated Facial Expression Recognition system. This research aims to present a robust architecture and a better learning process for a deep ConvNet. Adding four additional layers to the combination of the basic models assembles the net into one large ConvNet and enables sophisticated boosting of the basic models. The main contribution of this work comes from this special architecture and the use of a two-phase training process that enables better learning. The new system we present is trained to detect the universal facial expressions of seven or eight basic emotions when targeting the FER2013 and FER2013+ benchmarks, respectively. The presented approach improves the results of the architectures used by 4% on the FER2013 and 2% on the FER2013+ data sets. The second round of training the presented system increases the accuracy of some of the basic models by close to 3% while improving the accuracy of the whole net.

This paper focuses on the enhancement of the generalization ability and training stability of deep neural networks (DNNs). New activation functions that we call bounded rectified linear unit (ReLU), bounded leaky ReLU, and bounded bi-firing are proposed. These activation functions are defined based on the desired properties of the universal approximation theorem (UAT). Additional work on providing a new set of coefficient values for the scaled hyperbolic tangent function is also presented. These works result in improved classification performance and training stability in DNNs. Experiments using the multilayer perceptron (MLP) and convolutional neural network (CNN) models have shown that the proposed activation functions outperform their respective original forms with regard to classification accuracy and numerical stability. Tests on the MNIST, mnist-rot-bg-img handwritten digit, and AR Purdue face databases show that significant improvements of 17.31%, 9.19%, and 74.99% can be achieved in terms of the testing misclassification error rates (MCRs), applying both mean squared error (MSE) and cross-entropy (CE) loss functions. This is done without sacrificing computational efficiency. With the MNIST dataset, bounding the output of an activation function results in a 78.58% reduction in numerical instability, and with the mnist-rot-bg-img and AR Purdue databases the problem is completely eliminated. Thus, this work has demonstrated the significance of bounding an activation function in helping to alleviate the training instability problem when training a DNN model (particularly a CNN).
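
To make the idea of bounding an activation concrete, the sketch below clips the ordinary ReLU at an upper limit. This is the simplest form of a bounded ReLU and is given only as an illustration: the bound value `upper` is an assumed hyperparameter, and the exact definitions of the bounded leaky ReLU and bounded bi-firing functions are not reproduced here.

```python
def relu(x):
    """Standard rectified linear unit: unbounded for positive inputs."""
    return max(0.0, x)

def bounded_relu(x, upper=1.0):
    """ReLU whose output is clipped to the range [0, upper].

    `upper` is an illustrative hyperparameter, not a value from the paper.
    """
    return min(max(0.0, x), upper)

inputs = [-2.0, 0.5, 3.0, 1e6]
plain = [relu(v) for v in inputs]
clipped = [bounded_relu(v, upper=1.0) for v in inputs]
```

The very large input passes through `relu` unchanged but is limited to `upper` by `bounded_relu`, which is how a bound keeps activations from growing without limit during training.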

- by Shan Sung Liew and +1
- •
- Numerical Stability, Convolutional Neural Network, Deep Neural Network, Activation Function

Extracting roads from high-resolution satellite images can help in road navigation, in updating geographic information systems, and in emergency rescue systems that require instantaneous maps. The difficulty lies in identifying and differentiating a road from its background. To overcome this difficulty, more and more research is being conducted on devising efficient methods that can extract a road from a satellite image. The large presence of buildings and manmade structures along the roads, the presence of trees, or the acquisition properties of the sensor can cause difficulties in identifying a road from a satellite image. The conventional way of identifying a road used semi-automated approaches, which are not feasible and consume large amounts of time. A high-accuracy technique for road extraction is needed. Also, compared with extraction from aerial images, extraction from satellite images is a challenging domain, because satellite images have a lower resolution and a high level of noise. This work deals with extracting a road network from high-resolution satellite images by estimating whether a pixel in a satellite image is part of a road using a Convolutional Neural Network. The advent of the TensorFlow library has made this work feasible. The work proposes a new approach for making data sets for this complex problem and concludes with a feasible solution for the problem.

The beauty of Indonesian women is distinguished by skin color, facial structure, hair color and body posture. For many women today, trying to look beautiful is a must, and make-up is one way to achieve it. Using make-up is not that easy, however, because the type of make-up depends on the basic skin color. The undertone is the basic color of the skin; there are three types of undertone, namely warm, cool and neutral. Knowing the undertone makes it easier to choose the appropriate make-up shade. For this reason, an undertone image classification model was built using the Convolutional Neural Network algorithm, which is claimed to be the best algorithm for solving object recognition and detection problems. The model requires a dataset of wrist vein color images. The dataset contains 30 images per class; preprocessing resizes every image to 64x64 pixels, and each image is then augmented by rotating and zooming. At this stage the dataset comprises 3000 images, which are divided into 80% training data and 20% testing data. The images then pass through convolution and pooling in the feature learning stage, followed by the fully connected layer and the classification stage, where the feature learning results are used to classify images into subclasses. Training accuracy reaches 98% with a loss value of 0.0214, validation accuracy reaches 99% with a loss value of 0.0239, and model testing yields 99.5%.
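
The convolution and pooling steps of the feature learning stage can be sketched in plain Python as follows. The 3x3 kernel, the 2x2 pooling window, and the synthetic image are assumptions chosen for illustration; the abstract does not state the layer configuration that was used.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in most CNN libraries, the kernel is not flipped, so this is
    technically cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Synthetic 64x64 single-channel image standing in for a resized photo.
image = [[(r + c) % 7 for c in range(64)] for r in range(64)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]  # illustrative edge filter
features = conv2d_valid(image, kernel)
pooled = max_pool2d(features)
```

With these settings a 64x64 input yields a 62x62 feature map after convolution and a 31x31 map after pooling, which shows how each stage shrinks the spatial size before the fully connected layer.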

It is known that convolutional neural networks (CNNs) are efficient for optical character recognition (OCR) and many other visual classification tasks. This paper applies error-correcting output coding (ECOC) to the CNN for segmentation-free OCR such that: 1) the CNN target outputs are designed according to code words of length N; 2) the minimum Hamming distance of the code words is designed to be as large as possible given N. ECOC provides the CNN with the ability to reject or correct output errors to reduce character insertions and substitutions in the recognized text. Also, using code words instead of letter images as the CNN target outputs makes it possible to construct an OCR for a new language without designing the letter images as the target outputs. Experiments on the recognition of English letters, 10 digits, and some special characters show the effectiveness of ECOC in reducing insertions and substitutions.
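
A minimal sketch of ECOC decoding is shown below: the network outputs are thresholded to bits, the nearest code word by Hamming distance is selected, and the result is rejected when even the nearest code word is too far away. The toy codebook (length N = 6, minimum Hamming distance 3), the 0.5 threshold, and the rejection rule are assumptions made for this example rather than the design used in the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(outputs, codebook, max_distance=1):
    """Map real-valued outputs to the nearest code word, or reject.

    Returns the label of the closest code word, or None when the closest
    code word is more than `max_distance` bits away.
    """
    bits = tuple(1 if value >= 0.5 else 0 for value in outputs)
    best_label, best_dist = None, None
    for label, word in codebook.items():
        dist = hamming(bits, word)
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None

# Toy codebook: every pair of code words differs in at least 3 bits,
# so any single-bit error can be corrected.
codebook = {
    "A": (0, 0, 0, 0, 0, 0),
    "B": (1, 1, 1, 0, 0, 0),
    "C": (1, 0, 0, 1, 1, 0),
    "D": (0, 1, 0, 1, 0, 1),
}

corrected = ecoc_decode([0.9, 0.8, 0.2, 0.1, 0.1, 0.1], codebook)
rejected = ecoc_decode([0.9, 0.8, 0.2, 0.7, 0.1, 0.1], codebook)
```

In the first call one output bit is wrong and the decoder still recovers "B"; in the second call the pattern is two bits away from every code word, so it is rejected instead of producing a spurious character.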

- by george stathopoulos
- •
- Pattern Recognition, Neural Networks, Neural Network, Image Classification

Convolutional Neural Networks (CNNs) have shown success in achieving translation invariance for many image processing tasks. The success is largely attributed to the use of local filtering and max-pooling in the CNN architecture. In this paper, we propose to apply CNNs to speech recognition within the framework of the hybrid NN-HMM model. We propose to use local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance. In our method, a pair of local filtering and max-pooling layers is added at the lowest end of the neural network (NN) to normalize spectral variations of speech signals. In our experiments, the proposed CNN architecture is evaluated in a speaker-independent speech recognition task using the standard TIMIT data sets. Experimental results show that the proposed CNN method can achieve over 10% relative error reduction on the core TIMIT test sets when compared with a regular NN using the same number of hidden layers and weights. Our results also show that the best result of the proposed CNN model is better than previously published results on the same TIMIT test sets that use a pre-trained deep NN model.
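
The effect of max-pooling along the frequency axis can be seen in a small sketch: when the same spectral peak lands in a neighbouring frequency band for a different speaker, the pooled output stays the same. The band values, the pool size, and the use of non-overlapping windows are assumptions for illustration only.

```python
def max_pool_frequency(band_activations, pool_size=3):
    """Take the maximum over non-overlapping groups of adjacent bands."""
    return [
        max(band_activations[i:i + pool_size])
        for i in range(0, len(band_activations), pool_size)
    ]

# Made-up filter responses over six frequency bands for two speakers.
speaker_a = [0.1, 0.9, 0.2, 0.0, 0.3, 0.1]
speaker_b = [0.1, 0.2, 0.9, 0.0, 0.1, 0.3]  # same peaks, shifted by one band

pooled_a = max_pool_frequency(speaker_a)
pooled_b = max_pool_frequency(speaker_b)
```

Both speakers produce the same pooled vector even though their peaks fall in different bands, which is the kind of small spectral shift the pooling layer is meant to absorb.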

- by abdelrahman mohamed
- •
- Acoustics, Neural Networks, Speech Recognition, Speech

- by ahmad radzi
- •
- Back Propagation, Neural Networks, Artificial Neural Networks, Deep Learning

- by IAEME Publication
- •
- Convolutional Neural Network, Traffic violation, Iaeme Ijaret, YOLOV3

Morphological image processing (or morphology) [1] describes a range of image processing techniques that deal with the shape (or morphology) of features in an image. Morphological operations are typically applied to remove imperfections introduced during segmentation, and so typically operate on bi-level images. This report presents this image processing step on both binary and grayscale images [2] and discusses in detail its operations, including Erosion, Dilation, Opening, Closing, Black hat, White hat and Morphological gradient. The implementation of each operation is then described, and the operations are unified in a GUI tool for better usability. The tool is a user interface application developed in Python 3.7 with the OpenCV library. The objective of the tool is to show the result of applying these operations.
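
To show what erosion, dilation, and opening do to a binary image, the sketch below implements them in plain Python with a 3x3 square structuring element. It is not the OpenCV-based tool described in the report; treating pixels outside the image as 0 is a simplifying assumption, and OpenCV's default border handling differs.

```python
def _window(image, r, c):
    """Values in the 3x3 window centred at (r, c); outside pixels count as 0."""
    rows, cols = len(image), len(image[0])
    return [
        image[r + dr][c + dc]
        if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
    ]

def erode(image):
    """A pixel stays 1 only if its whole 3x3 neighbourhood is 1."""
    return [[min(_window(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]

def dilate(image):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[max(_window(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]

def opening(image):
    """Erosion followed by dilation: removes small isolated specks."""
    return dilate(erode(image))

def closing(image):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(image))

# A 3x3 object plus one isolated noise pixel at row 3, column 5.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(image)
```

After opening, the 3x3 object is preserved while the isolated pixel is gone, which matches the stated purpose of removing imperfections left by segmentation.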

- by Azadeh Hadadi
- •
- Computer Vision, Image Processing, Machine Learning, Deep Learning

Fire incidents are among the most common undesirable events that cause damage to houses and harm to people. A fire can start anytime and anywhere without any prior warning. Given the damage it brings, a device capable of detecting fire and providing notification is necessary. This study presents a method of early-stage fire-flame detection utilizing image processing integrated with an Android application for emergency notification. The proposed system applies image processing with a Convolutional Neural Network (CNN) integrated with a mobile app that gives early notification to the end user. After 3999 training iterations, with 210 images used for testing, the model reached 100% training accuracy and 96% validation accuracy; training accuracy measures how accurately the training data are classified, and cross entropy measures how far the predictions are from the actual results. The test evaluation accuracy was 99% on a cached file of 210 images, which suggests that the model can be expected to reach roughly 99% accuracy on new data. In the tests performed, the proposed system was able to capture fire-flame images and send the necessary notification in real time. The test results show the actual captured images and the information sent to the end user. The summary test results verified that all gathered data scored higher than 90%, which allowed the notification to be sent. The Android app is used to show the pre-investigation information useful for both the house owner and BFP personnel.
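
The cross-entropy measure mentioned above can be written as the negative logarithm of the probability that the model assigns to the correct class. The sketch below uses made-up probabilities for a two-class fire / no-fire prediction; the class order and the small `eps` guard against log(0) are assumptions for this example.

```python
import math

def cross_entropy(predicted_probs, true_index, eps=1e-12):
    """Negative log of the probability given to the correct class."""
    return -math.log(max(predicted_probs[true_index], eps))

# Index 0 = no fire, index 1 = fire (illustrative ordering).
confident_right = cross_entropy([0.05, 0.95], true_index=1)
confident_wrong = cross_entropy([0.95, 0.05], true_index=1)
```

A confident correct prediction gives a loss close to zero, while a confident wrong prediction gives a much larger loss, which is why the value falls as training improves the predictions.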

- by Joshua Gulmatico
- •
- Computer Science, Artificial Intelligence, Image Processing, Fire detection

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% ach...
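
The top-1 and top-5 error rates quoted above count an example as correct when the true class is among the one or five highest-scoring predictions. The sketch below computes that metric for a tiny invented score table, using k = 1 and k = 2 so that the four-class toy data remain meaningful.

```python
def top_k_error(score_rows, true_labels, k):
    """Fraction of examples whose true label is not among the k best scores."""
    misses = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)

# Invented class scores for four examples over four classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: best guess is correct
    [0.5, 0.1, 0.3, 0.1],  # true class 2: only the second guess is correct
    [0.4, 0.3, 0.2, 0.1],  # true class 3: missed by both top-1 and top-2
    [0.2, 0.2, 0.5, 0.1],  # true class 2: best guess is correct
]
truth = [1, 2, 3, 2]

top1 = top_k_error(scores, truth, k=1)
top2 = top_k_error(scores, truth, k=2)
```

Because a larger k gives the model more chances to include the true class, the top-k error can only stay the same or fall as k grows, which is why a top-5 error is always at most the top-1 error.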

- by brahim jabir
- •
- Information Technology, Technology, Image Processing, Decision Making

The accurate detection of cracks in paintings, which generally portray rich and varying content, is a challenging task. Traditional crack detection methods often fall short on recent acquisitions of paintings because they are poorly adapted to high resolutions and do not make use of the other imaging modalities often at hand. Furthermore, many paintings portray a complex or cluttered composition, significantly complicating the precise detection of cracks when using only photographic material. In this paper, we propose a fast crack detection algorithm based on deep convolutional neural networks (CNN) that is capable of combining several imaging modalities, such as regular photographs, infrared photography and X-ray images. Moreover, we propose an efficient solution to improve the CNN-based localization of the actual crack boundaries and extend the CNN architecture such that areas where it makes little sense to run expensive learning models are ignored. This allows us to process large-resolution scans of paintings more efficiently. The proposed on-line method is capable of continuously learning from newly acquired visual data, thus further improving classification results as more data becomes available. A case study on multimodal acquisitions of the Ghent Altarpiece, taken during the currently ongoing conservation-restoration treatment, shows improvements over the state of the art in crack detection methods and demonstrates the potential of our proposed method in assisting art conservators. INDEX TERMS Digital painting analysis, crack detection, virtual restoration, machine learning, morphological filtering, convolutional neural networks, transfer learning, multimodal data, Ghent Altarpiece.

Information regarding the conditions of roads is a safety concern when driving. In Bangkok, public weather sensors such as weather stations and rain sensors are insufficiently available to provide such information. On the other hand, a number of existing CCTV cameras have been deployed recently in various places for surveillance and traffic monitoring. Instead of deploying new sensors designed specifically for monitoring road conditions, images and location information from existing cameras can be used to obtain precise environmental information. Therefore, we propose a road environment extraction framework that covers different situations, such as raining and non-raining scenes, daylight and night-time scenes, crowded and non-crowded traffic, and wet and dry roads. The framework is based on CCTV images from a Bangkok metropolitan dataset, provided by the Bangkok Metropolitan Administration. To obtain information from CCTV image sequences, multi-label classification was considered b...

Screening methods for colorectal malignancy have proven successful in decreasing incidence and death rates; this cancer grows relatively slowly and characteristically develops from premalignant lesions. Currently, many patients do not undergo colorectal cancer screening, and those who do can be diagnosed with the existing tests and screening methods. The main motivation for this research is to evaluate the data obtained from the currently available colorectal cancer screening methods. The data provided to laboratory technologists is important in formulating appropriate recommendations that will reduce colorectal cancer. When colorectal cancer is recognized promptly by standard colon cancer tests, its treatment is more effective. Intelligent computer-assisted diagnosis (CAD) is among the most powerful recent techniques for recognizing colorectal cancer. Reducing the level of interference has contributed considerably to the advancement of the quality of cancer treatment. To enhance diagnostic accuracy, research on intelligent CAD remains active and ongoing, using deep learning and machine learning approaches together with convolutional neural network (CNN) schemes.