Deep Neural Networks Research Papers

We examine Dropout through the perspective of interactions: learned effects that combine multiple input variables. Given N variables, there are O(N^2) possible pairwise interactions, O(N^3) possible 3-way interactions, etc. We show that Dropout implicitly sets a learning rate for interaction effects that decays exponentially with the size of the interaction, corresponding to a regularizer that balances against the hypothesis space which grows exponentially with number of variables in the interaction. This understanding of Dropout has implications for the optimal Dropout rate: higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions. This perspective also issues caution against using Dropout to measure term saliency because Dropout regularizes against terms for high-order interactions. Finally, this view of Dropout as a regularizer of interaction effects provides insight into the varying effectiveness of Dropout for diffe...

- Computer Science, Artificial Intelligence, Deep Learning, Deep Neural Networks
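The counting argument in the Dropout abstract above can be made concrete with a small sketch. It assumes inputs are dropped independently at rate p, so that all k inputs of a k-way interaction are present together with probability (1 - p)^k, while the number of candidate k-way interactions among N inputs is the binomial coefficient C(N, k). The function names and the values N = 20, p = 0.5 are illustrative assumptions, not taken from the paper.

```python
from math import comb


def num_interactions(n_vars: int, order: int) -> int:
    """Number of distinct interactions of the given order among n_vars inputs."""
    return comb(n_vars, order)


def keep_probability(drop_rate: float, order: int) -> float:
    """Probability that all `order` inputs survive independent dropout together."""
    return (1.0 - drop_rate) ** order


if __name__ == "__main__":
    n_vars, drop_rate = 20, 0.5  # illustrative values, not from the paper
    for order in (1, 2, 3, 4):
        # The candidate count grows with the order while the joint keep
        # probability shrinks geometrically.
        print(order, num_interactions(n_vars, order), keep_probability(drop_rate, order))
```

Under these assumptions, raising the drop rate shrinks the joint keep probability fastest for the highest-order interactions, which is the intuition the abstract describes.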

This article describes the methodology used to train and test a Deep Neural Network (DNN) with Photoplethysmography (PPG) data performing a regression task to estimate the Respiratory Rate (RR). The DNN architecture is based on a model used to infer the heart rate (HR) from noisy PPG signals, which is optimized to the RR problem using genetic optimization. Two open-access datasets were used in the tests, the BIDMC and the CapnoBase. With the CapnoBase dataset, the DNN achieved a median error of 1.16 breaths/min, which is comparable with analytical methods in the literature, in which the best error found is 1.1 breaths/min (excluding the 8 % noisiest data). The BIDMC dataset seems to be more challenging, as the minimum median error of the literature’s methods is 2.3 breaths/min (excluding 6 % of the noisiest data), and the DNN based approach achieved a median error of 1.52 breaths/min with the whole dataset.

- by Ingenius: Revista de Ciencia y Tecnología and +1
- Deep Neural Networks, Photoplethysmography, Respiratory Rate

In this paper, we propose a neural network model for human emotion and gesture classification. We demonstrate that the proposed architecture represents an effective tool for real-time processing of customer's behavior for distributed on-land systems, such as information kiosks, automated cashiers and ATMs. The proposed approach combines most recent biometric techniques with the neural network approach for real-time emotion and behavioral analysis. In the series of experiments, emotions of human subjects were recorded, recognized, and analyzed to give statistical feedback of the overall emotions of a number of targets within a certain time frame. The result of the study allows automatic tracking of user’s behavior based on a limited set of observations.

One of the most famous cultural heritages in Indonesia is batik. Batik is a specially made cloth, drawn by applying malam (wax) to the cloth and then processing it in a certain way. The diversity of motifs both in Indonesia and the allied countries raises new research topics in the field of information technology, for conservation, storage, publication and the creation of new batik motifs. In computer science, researchers have studied batik patterns, and some algorithms have been successfully applied to batik pattern recognition. This study focuses on batik motif recognition using a fusion of texture features (Gabor, Log-Gabor, and GLCM) and PCA feature reduction to improve classification accuracy and reduce computational time. To improve the accuracy further, we propose a Deep Neural Network model to recognise batik patterns and use batch normalisation (BN) as a regulariser to generalise the model and reduce time complexity. In the experiments, feature extraction, selection, and reduction gave better accuracy than the raw dataset. Feature selection and reduction also reduced time complexity. The DNN+BN model significantly improved classification accuracy, from 65.36% to 83.15%. BN as a regulariser made the model more general, hence improving its accuracy. Parameter tuning further improved accuracy from 83.15% to 85.57%.

This paper proposes a new hybrid deep learning model for heart disease prediction using a recurrent neural network (RNN) combined with multiple gated recurrent units (GRU), long short-term memory (LSTM) and the Adam optimizer. The proposed model achieved an accuracy of 98.6876%, which is the highest among existing RNN models. The model was developed in Python 3.7 by integrating an RNN with multiple GRUs, running in Keras with TensorFlow as the backend for the deep learning process, supported by various Python libraries. Recent existing models using RNN have reached an accuracy of 98.23%, and a deep neural network (DNN) has reached 98.5%. The common drawbacks of the existing models are low accuracy due to the complex build-up of the neural network, a high number of neurons with redundancy in the neural network model, and the imbalanced Cleveland datasets. Experiments were conducted with various customized models, and the results showed that the proposed model using RNN and multiple GRUs with the synthetic minority oversampling technique (SMOTE) reached the best performance level. This is the highest accuracy result for RNN on the Cleveland datasets and is promising for early heart disease prediction for patients.

In today's world, women's safety is one of the most important issues to be addressed in our country. When a woman needs urgent help at the time of harassment or molestation, there is no proper way for her to reach help. Apart from raising awareness of the significance of women's safety, it is essential that women are provided with protection during those crucial times. Existing systems are helpful in detecting the woman's location after the crime has been committed. In this project we use the woman's handbag, in which we fix camera lenses and which she carries wherever she goes. Whenever she comes in contact with any person outside, an image of that person is taken and the activities of the person can be monitored continuously. If the person behaves normally, the image is of no use and can be deleted. But if the person's activities change and result in any harmful action, our system will detect it, process the captured image, and send it to the police and family members with a GPS location tracked from the IP address. Thus our project helps in saving the life of a woman and safeguarding her in the present situation.

- by IJCSMC Journal
- Mathematics, Information Technology, Technology, Neural Networks

Due to its efficiency in storage and search speed, binary hashing has become an attractive approach for large audio database search. However, most existing hashing-based methods focus on data-independent schemes, where random linear projections or some arithmetic expressions are used to construct hash functions. Hence, the binary codes do not preserve similarity and may degrade search performance. In this paper, an unsupervised similarity-preserving hashing method for content-based audio retrieval is proposed. Unlike data-independent hashing methods, we develop a deep network to learn compact binary codes from multiple hierarchical layers of nonlinear and linear transformations such that the similarity between samples is preserved. The independence and balance properties are included and optimized in the objective function to improve the codes. Experimental results on the Extended Ballroom dataset with 8 genres of 3,000 musical excerpts show that our proposed method significantly outperforms state-of-the-art data-independent methods in both effectiveness and efficiency. Keywords: content-based audio retrieval, deep learning, deep neural networks, similarity-preserving hash, unsupervised learning. This is an open access article under the CC BY-SA license.
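For context, the data-independent baseline that the abstract contrasts itself with can be sketched roughly as follows: each bit of the code is the sign of one random linear projection, and retrieval compares codes by Hamming distance. This is a minimal illustration of that generic scheme, not the learned hashing network the paper proposes; the function names, the seed, and the Gaussian choice of projections are assumptions.

```python
import random


def make_projections(dim: int, n_bits: int, seed: int = 0) -> list:
    """Draw n_bits random Gaussian projection vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(x, projections) -> int:
    """Pack the signs of the projections of x into an integer binary code."""
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    projections = make_projections(dim=4, n_bits=16)
    query = [0.1, -0.2, 0.3, 0.4]          # toy feature vectors
    neighbour = [0.11, -0.19, 0.31, 0.39]
    print(hamming(hash_vector(query, projections), hash_vector(neighbour, projections)))
```

Because the projections are drawn without looking at the data, nothing forces similar audio excerpts to share codes beyond what random geometry provides, which is the limitation the abstract points to.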

In this paper, we present an application that recognizes spoken Tamil utterances and speaks out the recognized text in Tamil through our Tamil text-to-speech (TTS) system. Further, we translate the recognized Tamil text to English using Google Translate and play it through our English TTS. Our Tamil speech recognition system, which can recognize about 75,000 words, has been trained on a 150-hour transcribed speech corpus. We have trained a deep neural network for the acoustic model and employed tri-gram language models to build our recognition system. Our Thirukkural TTS system performs unit-selection based, concatenative speech synthesis, using 2.5 hours of Tamil spoken utterances transcribed at the phone level. Our English TTS uses 2.7 hours of phone-transcribed utterances. This is a technology demonstration of a complete web application, which, when perfected, could be used to assist Tamil users in learning English, by speaking in Tamil into the system. The playback of the recognized text from Tamil TTS serves to demonstrate the effectiveness of the Tamil ASR to the majority of the conference registrants (who cannot read the recognized Tamil text).

- by Ramakrishnan Angarai Ganesan and +1
- Acoustic Modelling, Web Applications, Speech Recognition, Tamil

To create a system for speech recognition customized for services in a particular domain, it is very important to add more and more languages to the ‘supported languages’ database of the system. In this study, we have collected speech data from a sample of the population we were targeting the system for i.e. tasks for agricultural commodities. We performed the acoustic modelling of this data using a combination of Deep Neural Network (DNN) and Hidden Markov model (HMM) in which the HMM state likelihoods are taken from the outputs of the DNN. We have performed a three stage training: RBM pre-training, frame cross-entropy training, and sequence-training optimizing MMI/sMBR. After extensive experimentation, the accuracy of our system comes to about 82%. This study motivates further research for fine-tuning of such systems.

- by Gaurav Ojha
- Speech Recognition, Artificial Neural Networks, Deep Neural Networks
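In hybrid DNN-HMM acoustic models of the kind described in the abstract above, a common way to obtain HMM state likelihoods from DNN outputs is to divide the state posteriors by the state priors, which follows from Bayes' rule up to a per-frame constant. The sketch below illustrates that conversion in the log domain; whether the study used exactly this scaling is not stated in the abstract, and the function name, the flooring constant, and the toy numbers are assumptions.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert DNN state posteriors p(s|x) into scaled log-likelihoods.

    By Bayes' rule, p(x|s) = p(s|x) * p(x) / p(s). The term p(x) is the same
    for every state in a frame, so it is dropped and the result is only
    proportional to the true likelihood.
    """
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]


if __name__ == "__main__":
    frame_posteriors = [0.5, 0.5]    # toy DNN softmax output for one frame
    state_priors = [0.25, 0.75]      # toy state frequencies from alignments
    print(scaled_log_likelihoods(frame_posteriors, state_priors))
```

The priors are typically estimated from state frequencies in the training alignments, so a rare state with the same posterior as a frequent one receives the higher scaled likelihood.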

In recent years, computerized adaptive testing (CAT) has gained popularity as an important means to evaluate students’ ability. Assigning tags to test questions is crucial in CAT. Manual tagging is widely used for constructing question banks; however, this approach is time-consuming and might lead to consistency issues. Automatic question tagging, an alternative, has not been studied extensively. In this paper, we propose a position-based attention model and keywords-based model to automatically tag questions with knowledge units. With regard to multiple-choice questions, the proposed models employ mechanisms to capture useful information from keywords to enhance tagging performance. Unlike traditional machine learning-based tagging methods, our models utilize deep neural networks to represent questions using contextual information. The experimental results show that our proposed models outperform some traditional classification and topic methods by a large margin on an English question bank dataset.

- by IJIRMPS International Journal and +1
- Artificial Intelligence, Artificial Neural Networks, Computerized Adaptive Testing, Deep Neural Networks

Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation on the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.

In almost every type of business, the retention stage is very important in the customer life cycle because, according to market theory, it is more expensive to attract new customers than to retain existing ones. Thus, a churn prediction system that can accurately predict ahead of time whether a customer will churn in the foreseeable future, and that can also point enterprises to the possible reasons for the churn, is an extremely powerful tool for any marketing team. In this paper, we propose an approach to predict customer churn for non-subscription based business settings. We suggest a set of generic features that can be extracted from the sales and payment data of almost all non-subscription based businesses and can be used in predicting customer churn. We have used a neural network-based multilayer perceptron for prediction purposes. The proposed method achieves an F1-score of 80% and a recall of 85%, comparable to the accuracy of churn prediction for subscription-based business settings. We also propose a system for causality analysis of churn, which predicts a set of causes that may have led to the customer churn and helps to derive customer retention strategies.
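The two figures reported in the churn abstract above are linked by the definition F1 = 2PR / (P + R), so the precision they imply can be recovered by rearranging to P = F1 * R / (2R - F1). The sketch below does that arithmetic; it assumes the reported F1-score and recall refer to the same class and the same evaluation split, which the abstract does not state explicitly, and the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision exists for this F1/recall pair")
    return f1 * recall / denom


if __name__ == "__main__":
    precision = implied_precision(f1=0.80, recall=0.85)
    print(round(precision, 3))                     # about 0.756
    print(round(f1_score(precision, 0.85), 3))     # recovers 0.8
```

Under that assumption, the reported numbers correspond to a precision of roughly 76%, meaning about one in four customers flagged as likely churners would not in fact churn.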

Recent developments in Natural Language Processing have led to the introduction of state-of-the-art Neural Language Models, enabled with unsupervised transferable learning, using different pretraining objectives. While these models achieve excellent results on the downstream NLP tasks, various domain adaptation techniques can improve their performance on domain-specific tasks. We compare and analyze the pretrained Neural Language Models, XLNet (autoregressive), and BERT (autoencoder) on the Legal Tasks. Results show that XLNet Model performs better on our Sequence Classification task of Legal Opinions Classification, whereas BERT produces better results on the NER task. We use domain-specific pretraining and additional legal vocabulary to adapt BERT Model further to the Legal Domain. We prepared multiple variants of the BERT Model, using both methods and their combination. Comparing our variants of the BERT Model, specializing in the Legal Domain, we conclude that both additional pretraining and vocabulary techniques enhance the BERT model's performance on the Legal Opinions Classification task. Additional legal vocabulary improves BERT's performance on the NER task. Combining the pretraining and vocabulary techniques further improves the final results. Our Legal-Vocab-BERT Model gives the best results on the Legal Opinions Task, outperforming the larger pretrained general Language Models, i.e., BERT-Base and XLNet-Base.

As the capabilities of mobile phones have increased, the potential for their negative use has also increased tremendously. For example, use of mobile phones while driving or in high-security zones can lead to accidents, information leaks and security breaches. In this paper, we use deep-learning algorithms, viz. the single shot MultiBox detector (SSD) and the faster region-based convolutional neural network (Faster-RCNN), to detect mobile phone usage. We highlight the importance of mobile phone usage detection and the challenges involved in it. We have used a subset of the State Farm Distracted Driver Detection dataset from Kaggle, which we term the KaggleDriver dataset. In addition, we have created a dataset on mobile phone usage, which we term the IITH-dataset on mobile phone usage (IITH-DMU). Although small, IITH-DMU is more generic than the KaggleDriver dataset, since it has images with a higher amount of variation in foreground and background objects. Ours is possibly the first work to perform mobile-phone detection for a wide range of scenarios. On the KaggleDriver dataset, the AP at 0.5 IoU is 98.97% with SSD and 98.84% with Faster-RCNN. On the IITH-DMU dataset, these numbers are 92.6% for SSD and 95.92% for Faster-RCNN. These pretrained models and the datasets are available at sites.google.com/view/mobile-phone-usage-detection.

- by Sparsh Mittal and +1
- Artificial Intelligence, Image Processing, Machine Learning, Mobile phone
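The "AP at 0.5 IoU" figures in the abstract above depend on the intersection-over-union (IoU) between a predicted box and a ground-truth box: a detection counts as correct only if that overlap reaches the threshold. A minimal sketch of the overlap computation follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the box format, function name, and example coordinates are assumptions, not details from the paper.

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes produce no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0, 0, 2, 2)
    prediction = (1, 0, 3, 2)
    overlap = iou(ground_truth, prediction)
    print(overlap, overlap >= 0.5)  # counts as a match only if IoU >= 0.5
```

In this toy case the boxes share half of each box's area, yet the IoU is only one third, so the prediction would not count as a correct detection at the 0.5 threshold.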

Audio mastering procedures include various processes like frequency equalisation and dynamic range compression. These processes rely solely on musical and perceptually pleasing facets of the acoustic characteristics, derived from subjective listening criteria according to the genre of the audio material or content. These facets play a significant role in audio production and mastering, so modelling such behaviour becomes vital in automated applications. In this work we present a system for automated dynamic range compression in the frequency domain. The system predicts coefficients, derived by deep neural networks, based on observations of magnitude information retrieved from a critical band filter bank, similar to the human peripheral auditory system, and applies them to the original, unmastered signal.

Classification studies, which form the infrastructure of clinical decision support systems, are the foundation of work on detecting diabetes. The main aim of classification studies is to increase classification performance and raise the diagnosis rate. To this end, many different classification methods and optimization algorithms are used. In this context, this study performs classification with Autoencoder Neural Networks (Oto Kodlayıcı Sinir Ağları, OKSA) for the diagnosis of diabetes. The Pima Indian diabetes dataset from the UCI machine learning repository, which is widely used in classification studies, was used. The results of the study were compared with the results of previous studies that focus on the diagnosis of diabetes and use the same UCI machine learning dataset. The classification accuracy obtained is 97.3%, which is higher than that of the previously mentioned classification methods. The evaluations obtained show that the proposed method is very efficient and improves classification success.

- by Omer Deperlioglu and +1
- Artificial Intelligence, Biomedical Engineering, Machine Learning, Diabetes

The complexity of the term Style consists in the unusual weight and flexibility of the concept itself. In essence the concept defines the main basic rulesets of artistic achievement and excellence. The term Style itself is a latecomer to the considerations of the examination of artistic endeavour and is being discussed in a fierce fashion to this very day.
The term Style in this frame of conversation can be divided into two specific cases. Case one is the idea of Style in Architecture. Case two is the use of the term in Computer Science. In combination, these two instances form the frame of this essay on the emergence of novel considerations of style in the architecture discipline through the application of Neural Networks. More specifically: through the adoption of Style transfer as a technique.

- by Matias del Campo and +1
- Robotics, Coding Theory, Artificial Intelligence, Machine Learning

Research on deep neural networks and big data is now converging in several respects to enhance the capabilities of intrusion detection systems (IDS). Many IDS models have been introduced to provide security over big data. This study focuses on intrusion detection in computer networks using big datasets. The advent of big data has greatly assisted cyber security by providing a range of powerful algorithms to classify and analyse patterns and to make better predictions more efficiently. In this study, a detection model applying deep neural networks is proposed to detect intrusions. We applied the proposed model to the latest dataset available online, which contains packet-based data, flow-based data and some additional metadata. The dataset is labeled and imbalanced, with 79 attributes and some classes having far fewer training samples than other classes. The proposed model is built using Keras and the Google TensorFlow deep learning environment. Experimental results show that intrusions are detected with an accuracy of over 99% for both binary and multiclass classification with the selected best features. The average score of the receiver operating characteristic (ROC) and precision-recall curves is also 1. The outcome implies that Deep Neural Networks offer a novel research model with great accuracy for intrusion detection, better than some models presented in the literature.

Behavioural Science is the study of human behaviour in different contexts, situations and times. Investigating past human behaviour can help us predict human behaviour in the future. In this paper we analyse the public opinion of a product, as available on social media sites, specifically Twitter. Our end goal is to visually represent the vital business insights that cannot be gathered from a plain dataset and that can assist in developing further intelligent solutions. Sentence sentiment classification is a predictive modelling task achieved through supervised learning. Here, the extracted sentence is assigned to one of two target classes, i.e. positive and negative stances, using Natural Language Processing (NLP) and neural networks. Categorization of the public sentiment will help in market testing, public anticipation of the product and public sentiment analysis. Market testing involves assessing the risks involved, gathering the biases people hold and determining our prospective customers. Once the product is launched it is essential to understand how people judge the product. This will provide a platform for developers to improve in the future. As text is a sequence of information and not simply a discrete representation, we need an iterative process to train the model, hence the application of an RNN. For training the model, we mined our own dataset from the Twitter API so that the model became accustomed to the natural style of writing found in tweets. Using specific keywords, we gathered all posts and tweets related to our product. After data collection and cleaning, the polarity of sentences is found. A special kind of neural network called the convolution LSTM-RNN is used to train the machine. Live social media data is given as input to the model. The data is processed and assigned to its respective class. The result is stored in an MSSQL server to which the Tableau dashboard is connected.
Using Tableau, we perform visual analytics on the collected data, where we can classify tweets geographically to understand location-wise reactions. Having gathered this data, a company might reach out to dissatisfied customers with a solution to their predicament, thereby improving customer relations. Not only can we gather the notions people hold about a product but also about competing products.

Sign language recognition is used to help hearing people and hearing-impaired people communicate effectively. According to the literature review, there are very few Turkish sign language recognition studies. For this reason, this study was performed on Turkish sign language recognition. Depth cameras, such as the Leap Motion controller, allow researchers to exploit depth knowledge to better understand hand movements. In this study, data for 10 letters in Turkish sign language were taken from the Leap Motion. Five of these are letters (I, C, L, V, O) that can be expressed with one hand, while the other five are letters (B, D, M, N, K) that can be expressed with two hands. The dataset was collected from two different people. Each person made five trials for each letter. Ten samples were taken at each trial. In this study, Artificial Neural Network, Deep Learning and Decision Tree based models were designed, and the effectiveness of these models in recognizing Turkish sign language was evaluated. Regression (R), Mean Square Error (MSE) and Estimation Accuracy performance metrics are used to evaluate the models' performance. The data set was randomly divided into 30% for training and 70% for testing. According to the experimental results, the most successful models for the data set with 120 features are the decision tree and DNN models. For the data set with 390 features, DNN is the most successful model.

Deep learning and image processing are two areas of great interest to academics and industry professionals alike. The areas of application of these two disciplines range widely, encompassing fields such as medicine, robotics, and security and surveillance. The aim of this book, 'Deep Learning for Image Processing Applications', is to offer concepts from these two areas in the same platform, and the book brings together the shared ideas of professionals from academia and research about problems and solutions relating to the multifaceted aspects of the two disciplines. The first chapter provides an introduction to deep learning, and serves as the basis for much of what follows in the subsequent chapters, which cover subjects including: the application of deep neural networks for image classification; hand gesture recognition in robotics; deep learning techniques for image retrieval; disease detection using deep learning techniques; and the comparative analysis of deep data and big data. The book will be of interest to all those whose work involves the use of deep learning and image processing techniques

In recent years, Internet technologies have grown pervasively, not only in information-based web pages but also in online social networking and online banking, which has made people's lives easier. As a result of this growth, computer networks encounter many different security threats from all over the world. One of these serious threats is "phishing", which aims to deceive victims into giving up private information such as usernames, passwords, social security numbers, financial information, and credit card numbers by using fake e-mails, web pages or both. Detection of phishing attacks is a challenging problem, because phishing is considered a semantics-based attack, which focuses on users' vulnerabilities, not networks' vulnerabilities. Most anti-phishing tools mainly use blacklist/whitelist methods; however, they fail to catch new phishing attacks and result in a high false-positive rate. To overcome this deficiency, we aimed to use machine learning based algorithms, Artificial Neural Networks (ANNs) and Deep Neural Networks (DNNs), to train the system and catch abnormal requests by analysing the URLs of web pages. We used a dataset which contains 37,175 phishing and 36,400 legitimate web pages to train the system. According to the experimental results, the proposed approaches detect phishing websites with an accuracy of 92 % and 96 % using the ANN and DNN approaches, respectively.

Credit card fraud is increasing at an ever-growing rate and has become a major problem in the financial sector. Because of this fraud, card users are hesitant to make purchases, and both merchants and financial institutions bear heavy losses. Some major challenges in credit card fraud detection involve the availability of public data, high class imbalance in data, the changing nature of fraud and the high number of false alarms. Machine learning techniques have been used to detect credit card fraud, but no fraud detection system has been able to offer great efficiency to date. Recent developments in deep learning have been applied to solve complex problems in various areas. This paper presents a thorough study of deep learning methods for the credit card fraud detection problem and compares their performance with various machine learning algorithms on three different financial datasets. Experimental results show great performance of the proposed deep learning methods against traditional machine learning models and imply that the proposed approaches can be implemented effectively for real-world credit card fraud detection systems.

Intrusion detection has attracted a considerable interest from researchers and industries. The community, after many years of research, still faces the problem of building reliable and efficient IDS that are capable of handling large quantities of data, with changing patterns in real time situations. The work presented in this manuscript classifies intrusion detection systems (IDS). Moreover, a taxonomy and survey of shallow and deep networks intrusion detection systems is presented based on previous and current works. This taxonomy and survey reviews machine learning techniques and their performance in detecting anomalies. Feature selection which influences the effectiveness of machine learning (ML) IDS is discussed to explain the role of feature selection in the classification and training phase of ML IDS. Finally, a discussion of the false and true positive alarm rates is presented to help researchers model reliable and efficient machine learning based intrusion detection systems.

- by Xavier Bellekens
- Mathematics, Algorithms, Information Security, Machine Learning

The inspiration for this research paper was the natural bias in university paper checking. A paper is checked either by a professor who teaches the subject or by someone who has no knowledge of the subject. When checked by the latter, the answers cannot be appropriately marked unless key points are obviously highlighted. This paper aims to check long answers without human intervention using artificial intelligence and regular expressions. It checks a student's or examinee's written answer in digital form by comparing it to an answer key provided by the exam host. The proposed methodology combines two techniques to obtain a faster and more accurate system for checking long answers. The long answers are evaluated by breaking them into their simplest sentences and then encoding them into high-density vectors using a Deep Averaging Network (DAN) to analyse the semantic similarity of the examinee's answer to the provided answer key. This system does not look only for keywords in the content of the answer but looks at the sentence as a whole and at whether it evaluates similarly to the content in the answer key. This research relies on the availability of an answer key to check answers and does not check the relevance of content written by the examinee, meaning that as long as the examinee writes the points mentioned in the answer key, he/she will be marked correct. This system of evaluation does not deduct marks for wrong points (meaning no negative marking).

Neuralink technology takes a step towards ultra-high bandwidth technology using BMIs. Brain-Machine Interfaces (BMIs) hold promise for the restoration of sensory and motor function and the treatment of neurological disorders, but clinical BMIs have not yet been widely adopted, in part because modest channel counts have limited their potential [1]. All our feelings, emotions and senses are perceived by neurons in our brains, which form a network that communicates with different parts of our body through synapses. According to a 2013 survey, an average person uses only 10 percent of his brain. What if this percentage was increased to 60? This concept is realised by Neuralink. The concept uses threads consisting of electrodes, which are inserted into the brain, digitize its activity and allow it to be controlled from external devices. Neuralink already has a neurosurgical robot which is capable of inserting six threads (192 electrodes) per minute into the brain. The ultimate goal of Neuralink is to merge Man with Machine, fusing human intelligence with artificial intelligence to bring humanity up to a higher level of cognitive reasoning.

- by Shruti Mishra and +1
- •
- Robotics, Neuroscience, Artificial Intelligence, Machine Learning

Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence.
Among the other achievements, building computers that understand speech represents a crucial leap towards intelligent machines.
Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. The latter disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key for counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called “network of deep neural networks”.
The analysis of the original concepts was based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks.

- by Mirco Ravanelli
- •
- Speech Recognition, Recurrent Neural Network, Deep Learning, Deep Neural Networks

The continually increasing number of complex datasets each year necessitates ever-improving machine learning methods for robust and accurate categorization of these data. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble deep learning approach for classification. Deep learning models have achieved state-of-the-art results across many domains. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. RMDL can accept a variety of data as input, including text, video, images, and symbolic data. This paper describes RMDL and shows test results for image and text data including MNIST, CIFAR-10, WOS, Reuters, IMDB, and 20newsgroup. These test results show that RMDL produces consistently better performance than standard methods over a broad range of data types and classification problems.
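
As a rough illustration of how an ensemble of classifiers can be combined, the sketch below applies plurality (majority) voting to per-model predictions. The abstract does not say how the member models' outputs are merged, so majority voting is an assumption here, and the function name and toy labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by plurality.

    predictions_per_model[m][i] is the label model m assigns to sample i.
    Ties go to the label seen first among the tied ones.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical models labelling three samples.
preds = [
    ["cat", "dog", "dog"],
    ["cat", "cat", "dog"],
    ["dog", "cat", "dog"],
]
ensemble = majority_vote(preds)
```

Each model is wrong on at most one sample here, yet the vote can still recover a consistent label per sample, which is the usual motivation for ensembling heterogeneous models.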

- by Kamran Kowsari
- •
- Machine Learning, Text Mining, Image Classification, CNN

Breast cancer is a common fatal disease for women. Early diagnosis and detection are necessary in order to improve the prognosis of people affected by breast cancer. For predicting breast cancer, several automated systems have already been developed using different medical imaging modalities. This paper provides a systematic review of the literature on artificial neural network (ANN) based models for the diagnosis of breast cancer via mammography. The advantages and limitations of different ANN models including spiking neural network (SNN), deep belief network (DBN), convolutional neural network (CNN), multilayer neural network (MLNN), stacked autoencoders (SAE), and stacked de-noising autoencoders (SDAE) are described in this review. The review also shows that the studies related to breast cancer detection applied different deep learning models to a number of publicly available datasets. For comparing the performance of the models, different metrics such as accuracy, precision, and recall were used in the existing studies. The best performance was found to be achieved by the residual neural network models ResNet-50 and ResNet-101, both CNN architectures.

- by Subrato Bharati
- •
- Artificial Intelligence, Breast Cancer, Transfer Learning, Deep Learning

This thesis considers applications of machine learning techniques in hospital emergency readmission and comorbidity risk problems, using healthcare administrative data. The aim is to introduce generic and robust solution approaches that can be applied to different healthcare settings. Existing solution methods and techniques of predictive risk modelling of hospital emergency readmission and comorbidity risk modelling are reviewed. Several modelling approaches, including Logistic Regression, Bayes Point Machine, Random Forest and Deep Neural Network, are considered. Firstly, a framework is proposed for pre-processing hospital administrative data, including data preparation, feature generation and feature selection. Then, the Ensemble Risk Modelling of Hospital Readmission (ERMER) is presented, which is a generative ensemble risk model of hospital readmission. After that, the Temporal-Comorbidity Adjusted Risk of Emergency Readmission (T-CARER) is presented for identifying very sick comorbid patients. A Random Forest and a Deep Neural Network are used to model risks of temporal comorbidity, operations and complications of patients using the T-CARER. The computational results and benchmarking are presented using real data from Hospital Episode Statistics (HES) with several samples across a ten-year period. The models select features from a large pool of generated features, add temporal dimensions into the models and provide highly accurate and precise models of problems with complex structures. The performance of all the models has been evaluated across different timeframes, sub-populations and samples, and compared with previous models.

Scope of the book:
This book focusses on the technical concepts of deep learning and its associated branch Neural Networks for the various dimensions of image processing applications. The proposed volume intends to bring together researchers to report the latest results or progress in the development of the above-mentioned areas. Since there is a deficit of books on this specific subject matter, the editors aim to provide a common platform for researchers working in this area to exhibit their novel findings.
Topics of Interest:
This book solicits contributions, which include the fundamentals in the field of Deep Artificial Neural Networks and Image Processing supported by case studies and practical examples. Each chapter is expected to be self-contained and to cover an in-depth analysis of real life applications of neural networks to image analysis.

- by Vania V Estrela
- •
- Applied Mathematics, Computer Graphics, Artificial Intelligence, Computer Vision

In non-contractual freemium and sharing economy settings, a small share of users often drives the largest part of revenue for firms and co-finances the free provision of the product or service to a large number of users. Successfully retaining and upselling such high-value users can be crucial to firms' survival. Predictions of customers' Lifetime Value (LTV) are a much used tool to identify high-value users and inform marketing initiatives. This paper frames the related prediction problem and applies a number of common machine learning methods for the prediction of individual-level LTV. As only a small subset of users ever makes a purchase, data are highly imbalanced. The study therefore combines said methods with synthetic minority oversampling (SMOTE) in an attempt to achieve better prediction performance. Results indicate that data augmentation with SMOTE improves prediction performance for premium and high-value users, especially when used in combination with deep neural networks.
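
The oversampling step can be sketched as below. This is a simplified, standard-library version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the study's pipeline; the function name, the choice of k, and the toy points are hypothetical.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Four hypothetical purchasers in a 2-feature space.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_points = smote(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class.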

- by Julian Runge and +1
- •
- Customer lifetime value, Freemium Business Models, Deep Neural Networks

The fabulous results of Deep Convolutional Neural Networks in computer vision and image analysis have recently attracted considerable attention from researchers of other application domains as well. In this paper we present NgramCNN, a neural network architecture we designed for sentiment analysis of long text documents. It uses pretrained word embeddings for dense feature representation and a very simple single-layer classifier. The complexity is encapsulated in the feature extraction and selection parts, which benefit from the effectiveness of convolution and pooling layers. For evaluation we used different kinds of emotional text datasets and achieved an accuracy of 91.2 % on the popular IMDB movie reviews. NgramCNN is more accurate than the similar shallow convolutional networks and deeper recurrent networks that were used as baselines. In the future, we intend to generalize the architecture for state-of-the-art results in sentiment analysis of variable-length texts.

Diabetic retinopathy is a serious eye disease caused by diabetes and is the most common cause of blindness in developed countries. This study describes the use of image processing and deep learning to diagnose diabetic retinopathy from retinal fundus images. A practical method comprising HSV, the V transform algorithm, and histogram equalization techniques was used to enhance the retinal fundus images. Finally, a Gaussian low-pass filter was applied to the retinal fundus image. After image processing, classification was performed using a Convolutional Neural Network. The performance of the proposed method was evaluated using 400 retinal fundus images from the Kaggle Diabetic Retinopathy Detection database. A classification study was carried out for each stage of image processing. Twenty trials were run for each stage and the average values were taken. Classification was then carried out after image processing. In this study, accuracy was found to be 96.67%, sensitivity 93.33%, specificity 97.78%, precision and recall 93.33%, and F-score 93.33%. The results show that the proposed method is very effective and successful for diagnosing diabetic retinopathy from retinal fundus images.
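
For readers unfamiliar with the reported metrics, the sketch below shows how they follow from a binary confusion matrix. The counts are hypothetical and are not taken from the study, which does not report its confusion matrix; they were picked only so that the formulas yield percentages of the same size as those quoted.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }

# Hypothetical counts (assumption for illustration only).
m = binary_metrics(tp=28, fp=2, tn=88, fn=2)
percentages = {name: round(100 * value, 2) for name, value in m.items()}
```

Note that sensitivity and recall are the same quantity, which is why they always coincide, and that precision equals recall whenever the numbers of false positives and false negatives are equal.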

- by Omer Deperlioglu and +1
- •
- Artificial Intelligence, Biomedical Engineering, Image Processing, Machine Learning

We propose data and knowledge-driven approaches for multilingual training of an automatic speech recognition (ASR) system for a target language by pooling speech data from multiple source languages. Exploiting the acoustic similarities between Indian languages, we implement two approaches. In phone/senone mapping, a deep neural network (DNN) learns non-linear functions to map senones or phones from one language to the others, and the transcriptions of the source languages are modified such that they can be used along with the target language data to train and fine-tune the target language ASR system. In the other approach, we model the acoustic information for all the languages simultaneously by training a multitask DNN (MTDNN) to predict the senones of each language in different output layers. The cross-entropy loss function and the weight update procedure are modified such that, when a feature vector belongs to a particular language, only the shared layers and the output layer responsible for predicting the senone classes of that language are updated during training. In the low-resourced setting (LRS), 40 hours of transcribed speech data each for Tamil, Telugu and Gujarati are used for training. The DNN-based senone mapping technique gives relative improvements in word error rate (WER) of 9.66%, 7.2% and 15.21% over the baseline system for Tamil, Gujarati and Telugu, respectively. In the medium-resourced setting (MRS), 160, 275 and 135 hours of data for Tamil, Kannada and Hindi are used, where the same technique gives better relative improvements of 13.94%, 10.28% and 27.24% for Tamil, Kannada and Hindi, respectively. The MTDNN with senone mapping based training in LRS gives higher relative WER improvements of 15.0%, 17.54% and 16.06% for Tamil, Gujarati and Telugu, respectively, whereas in MRS we see improvements of 21.24%, 21.05% and 30.17% for Tamil, Kannada and Hindi, respectively.
CCS Concepts: • Computing methodologies → Speech recognition.
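
The language-dependent loss described for the multitask DNN can be sketched as follows. This is only an illustration of the masking idea under simplifying assumptions: the shared layers are omitted, each language head is represented directly by its output logits, and the function names and toy values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multitask_loss(head_logits, language, target):
    """Cross-entropy computed from the output head of the sample's language.

    Heads belonging to other languages do not enter the loss, so in a real
    network they would receive no gradient for this sample.
    """
    probs = softmax(head_logits[language])
    return -math.log(probs[target])

# Hypothetical senone logits from two language-specific output layers.
head_logits = {
    "tamil": [2.0, 0.5, 0.1],
    "telugu": [0.3, 0.3, 0.3],
}
loss = multitask_loss(head_logits, "tamil", target=0)
```

Changing the Telugu logits leaves the loss for a Tamil frame unchanged, which is the property that restricts weight updates to the shared layers and the matching output layer.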

- by Ramakrishnan Angarai Ganesan
- •
- Automatic Speech Recognition, Speech Recognition, Telugu, Kannada

This paper presents a study of the importance of machine learning and its applications in the field of computer vision. Recent advancements in artificial intelligence, deep learning and computing resources, together with the availability of large training datasets, have made tasks such as computer vision and natural language processing extremely fast and accurate. Artificial intelligence is thus a trending topic in the field of computing. Deep learning is a subcategory of machine learning within artificial intelligence. Image processing tasks can be performed efficiently using machine learning methods, which provide a better understanding of complex images. Object detection, recognition and tracking are fields related to computer vision, in which convolutional neural network-based algorithms such as YOLO and R-CNN have made a big leap. Algorithms based on machine learning models are excellent at recognizing patterns but typically require enormous amounts of data and a great deal of computational power. Generally, neural networks require a graphics processing unit for faster execution of machine learning models. This review paper gives a brief overview of real-time object detection and machine learning algorithms implemented by various researchers around the world. It also surveys the methodologies used to detect and recognize a particular object in an image. Real-time object detection algorithms are going to play a vital role in the field of computer vision.

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have been recently obtained with Convolutional Neural Networks (CNNs) when fed by raw speech samples directly. Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants. Proper design of the neural network is crucial to achieve this goal. This paper proposes a novel CNN architecture, called SincNet, that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters. In contrast to standard CNNs, which learn all elements of each filter, only low and high cutoff frequencies are directly learned from data with the proposed method. This offers a very compact and efficient way to derive a customized filter bank specifically tuned for the desired application. Our experiments, conducted on both speaker identification and speaker verification tasks, show that the proposed architecture converges faster and performs better than a standard CNN on raw waveforms.
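
To make the idea of a sinc-parametrized band-pass filter concrete, the sketch below builds one filter from just two numbers, a low and a high cutoff, as the difference of two ideal low-pass impulse responses. It is an illustration rather than the paper's code: the Hamming window, the filter length of 101 taps, the fixed cutoffs (which would be learned in the actual method), and the function names are assumptions.

```python
import math

def sinc_bandpass(f_low, f_high, length=101):
    """Hamming-windowed band-pass FIR filter from two sinc low-pass filters.

    f_low and f_high are cutoffs in cycles per sample, 0 < f_low < f_high < 0.5.
    length should be odd so the filter is symmetric about its centre tap.
    """
    def lowpass(fc, n):
        # Ideal low-pass response 2*fc*sinc(2*fc*n), sinc(x) = sin(pi x)/(pi x).
        if n == 0:
            return 2.0 * fc
        return math.sin(2.0 * math.pi * fc * n) / (math.pi * n)

    half = (length - 1) // 2
    taps = []
    for i in range(length):
        n = i - half
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        taps.append((lowpass(f_high, n) - lowpass(f_low, n)) * window)
    return taps

def gain(taps, freq):
    """Magnitude of the filter's frequency response at freq (cycles/sample)."""
    re = sum(t * math.cos(2.0 * math.pi * freq * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(2.0 * math.pi * freq * k) for k, t in enumerate(taps))
    return math.hypot(re, im)

taps = sinc_bandpass(f_low=0.1, f_high=0.2)
```

Evaluating `gain(taps, 0.15)` inside the band and `gain(taps, 0.3)` outside it shows the band-pass behaviour: every one of the 101 taps is determined by the two cutoff values alone, which is what makes the parametrization compact.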

The diagnosis of blood-related diseases involves the identification and characterization of a patient's blood sample. As such, automated methods for detecting and classifying the types of blood cells have important medical applications in this field. Deep convolutional neural networks (CNNs) and traditional machine learning methods have both shown good results in the classification of blood cell images. Blood cells mainly include red blood cells, white blood cells and platelets. In blood, leukocytes play an important role in human immune function, so they are also called immune cells. Usually, hematologists use granule information and shape information in leukocytes to divide white blood cells into granular cells (neutrophils, eosinophils and basophils) and non-granular cells (monocytes and lymphocytes). We apply transfer learning to transfer the weight parameters that were pre-trained on the ImageNet dataset to the CNN section, and adopt a custom loss function to allow our network to train and converge faster and with more accurate weight parameters. This process is then applied to detect cell abnormalities by means of segmentation and classification. The classification results show an accuracy of 88.04% for KNN and 54% for ANN. The proposed system can be further enhanced by using other classifiers.

In this project, we have used three different deep neural networks: (1) Densely Connected Neural Network, (2) Convolutional Neural Network (CNN), and (3) Long Short-Term Memory (LSTM) Network, for carrying out sentiment analysis on a large database of textual movie reviews. In our work, we have used the “embedding layer” in Keras and “GloVe word embeddings” to convert the text of the reviews into corresponding numeric values. We have tested all the models and compared their performance.

- by Jaydip Sen
- •
- Natural Language Processing, Sentiment Analysis, Text Mining, Deep Neural Networks

Laboratory demonstrations of brain-computer interface (BCI) systems show promise for reducing disability associated with paralysis by directly linking neural activity to the control of assistive devices. Surveys of potential users have revealed several key BCI performance criteria for clinical translation of such a system. Of these criteria, high accuracy, short response latencies, and multi-functionality are three key characteristics directly impacted by the neural decoding component of the BCI system, the algorithm that translates neural activity into control signals. Building a decoder that simultaneously addresses these three criteria is complicated because optimizing for one criterion may lead to undesirable changes in the other criteria. Unfortunately, there has been little work to date to quantify how decoder design simultaneously affects these performance characteristics. Here, we systematically explore the trade-off between accuracy, response latency, and multi-functionality for discrete movement classification using two different decoding strategies: a support vector machine (SVM) classifier, which represents the current state of the art for discrete movement classification in laboratory demonstrations, and a proposed deep neural network (DNN) framework. We utilized historical intracortical recordings from a human tetraplegic study participant, who imagined performing several different hand and finger movements. For both decoders, we found that response time increases (i.e., slower reaction) and accuracy decreases as the number of functions increases. However, we also found that both the increase in response time and the decline in accuracy with additional functions are smaller for the DNN than for the SVM. We also show that data preprocessing steps can affect the performance characteristics of the two decoders in drastically different ways.
Finally, we evaluated the performance of our tetraplegic participant using the DNN decoder in real-time to control functional electrical stimulation (FES) of his paralyzed forearm. We compared his