Siamak Mehrkanoon - Profile on Academia.edu

Papers by Siamak Mehrkanoon

2021 IEEE Symposium Series on Computational Intelligence (SSCI), Dec 5, 2021

Fixed-Size kernel models in System Identification under a reduced complexity perspective

The European Symposium on Artificial Neural Networks, 2020

Providing sufficient labeled training data in many application domains is a laborious and costly task. Designing models that can learn from partially labeled data, or that leverage labeled data in one domain and unlabeled data in a different but related domain, is of great interest in many applications. In this context one can refer to semi-supervised modelling, transfer learning, domain adaptation and multi-view learning, among others. There are several possibilities for designing such models, ranging from shallow to deep models. These types of models have received increasing interest due to their successful applications in real-life problems. This paper provides a brief overview of recent techniques in learning from partially labeled data.

Scalable Hybrid Deep Neural Kernel Networks

The European Symposium on Artificial Neural Networks, 2017

This paper introduces a novel hybrid deep neural kernel framework. The proposed deep learning model combines a neural-network-based architecture with a kernel-based model. In particular, an explicit feature map based on random Fourier features is used to make the transition between the two architectures more straightforward and to make the model scalable to large datasets by solving the optimization problem in the primal. The introduced framework can be considered the first building block for the development of even deeper models and more advanced architectures. Experimental results show a significant improvement over shallow models on several medium- to large-scale real-life datasets.

1 Deep Learning Models

Conventional machine learning techniques were limited in processing natural data in their raw form, and considerable domain expertise was required to transform raw data into meaningful features or representations. Deep learning is a class of machine learning techniques that belongs to the family of representation learning models [1]. It has attracted many researchers due to its success in revolutionizing many application domains, ranging from auditory to vision sensory signal processing. Deep learning based models deal with complex tasks by learning from subtasks. In particular, several nonlinear modules are stacked in hierarchical architectures to learn multiple levels of representation (hierarchical features) from the raw input data. Each module transforms the representation at one level into a slightly more abstract representation at a higher level, i.e. the higher-level features are defined in terms of lower-level ones.

Deep learning architectures have grown significantly, resulting in different models such as stacked denoising autoencoders [2], Restricted Boltzmann Machines [3] and Convolutional Neural Networks [4, 5], among others. Recent works in machine learning have highlighted the superiority of deep architectures over shallow architectures in terms of accuracy in several application domains [1, 6]. However, there is no free lunch: training deep neural networks involves difficult nonlinear optimizations and demands huge amounts of training data. Most of the developed deep learning models are based on the artificial neural network (ANN) architecture, whereas deep kernel based models have not yet been explored in great detail. The authors in [7] introduced a convex deep learning model ...

ESANN 2017 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 26-28 April 2017, i6doc.com publ., ISBN 978-287587039-1. Available from http://www.i6doc.com/en/.
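The explicit feature map based on random Fourier features mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_fourier_features`, the choice of an RBF kernel exp(-gamma * ||x - y||^2), and all parameter values are assumptions made for illustration. The sketch only shows that inner products of the explicit features approximate the kernel, which is what allows a kernel model to be trained in the primal.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Explicit feature map whose inner products approximate the RBF kernel
    exp(-gamma * ||x - y||^2). Names and kernel choice are illustrative assumptions."""
    d = X.shape[1]
    # Frequencies sampled from the spectral density of the RBF kernel: N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phase shifts, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # toy data: 5 samples, 3 inputs
gamma = 0.5

Z = random_fourier_features(X, n_features=20000, gamma=gamma, rng=rng)
approx_kernel = Z @ Z.T              # inner products in the explicit feature space

# Exact RBF kernel matrix for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
exact_kernel = np.exp(-gamma * sq_dists)

max_err = np.abs(approx_kernel - exact_kernel).max()
print("largest absolute approximation error:", max_err)
```

Because `Z` is an ordinary finite-dimensional matrix, it can be fed to a following neural network layer or to a linear model, which is presumably why such a map makes bridging the two architectures simpler.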

The European Symposium on Artificial Neural Networks, 2020

This paper designs a deep model to detect PCB defects from an input pair consisting of a defect-free template and a defective tested image. A novel group pyramid pooling module is proposed to efficiently extract features at various resolutions in order to predict defects at different scales. To train the deep model, a dataset covering 6 common types of PCB defects, named DeepPCB, is established; it contains 1,500 image pairs with annotations. In addition, a semi-supervised learning approach is examined to make effective use of unlabelled images for training the PCB defect detector. Experimental results validate the effectiveness and efficiency of the proposed model, which achieves 98.6% mAP at 62 FPS on the DeepPCB dataset. DeepPCB is now available at: .

Pattern Recognition Letters, Jul 1, 2019

A novel cross-domain neural-kernel network architecture for the semi-supervised domain adaptation problem is introduced. The proposed model consists of two-stream neural-kernel networks, corresponding to the source and target domains, which are enriched with a coupling term. Each stream combines a neural network layer with an explicit feature map constructed by means of random Fourier features. The introduced coupling term aims at enforcing correlations among the outputs of the intermediate layers of the two stream networks, as well as encouraging the two networks to learn a shared representation of the data from both source and target domains. Experimental results are given to illustrate the effectiveness of the proposed approaches on real-life datasets.

Neural Networks, Dec 1, 2021

Weather nowcasting consists of predicting meteorological components in the short term at high spatial resolutions. Due to its influence on many human activities, accurate nowcasting has recently gained plenty of attention. In this paper, we treat the nowcasting problem as an image-to-image translation problem using satellite imagery. We introduce Broad-UNet, a novel architecture based on the core UNet model, to efficiently address this problem. In particular, the proposed Broad-UNet is equipped with asymmetric parallel convolutions as well as an Atrous Spatial Pyramid Pooling (ASPP) module. In this way, the Broad-UNet model learns more complex patterns by combining multi-scale features while using fewer parameters than the core UNet model. The proposed model is applied to two different nowcasting tasks, i.e. precipitation maps and cloud cover nowcasting. The obtained numerical results show that the introduced Broad-UNet model produces more accurate predictions than the other examined architectures.

arXiv (Cornell University), Jan 25, 2021

Wind speed prediction and forecasting is important for various business and management sectors. In this paper, we introduce new models for wind speed prediction based on graph convolutional networks (GCNs). Given hourly data of several weather variables acquired from multiple weather stations, wind speed values are predicted for multiple time steps ahead. In particular, the weather stations are treated as nodes of a graph whose associated adjacency matrix is learnable. In this way, the network learns the spatial structure of the graph and determines the strength of relations between the weather stations based on the historical weather data. We add a self-loop connection to the learnt adjacency matrix and normalize the adjacency matrix. We examine two scenarios for the self-loop connection (two separate models). In the first scenario, the self-loop connection is added as a constant term. In the second scenario, a learnable parameter is included to enable the network to decide on the strength of the self-loop connection. Furthermore, we incorporate data from multiple time steps with temporal convolution, which together with spatial graph convolution constitutes spatio-temporal graph convolution. We perform experiments on real datasets collected from weather stations located in cities in Denmark and the Netherlands. The numerical experiments show that our proposed models outperform previously developed baseline models on the referenced datasets. We provide additional insights by visualizing the learnt adjacency matrices from each layer of our models.
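The self-loop and normalization step described in the abstract above can be sketched as follows. The abstract does not state which normalization is used, so the symmetric form D^(-1/2) (A + c I) D^(-1/2) below is an assumption, as are the function names `normalize_with_self_loop` and `graph_conv` and the toy three-station graph. The scalar `self_loop_weight` stands in for either the constant self-loop (first scenario) or the value of a learnable parameter (second scenario).

```python
import numpy as np

def normalize_with_self_loop(A, self_loop_weight=1.0):
    """Add a weighted self-loop to a non-negative adjacency matrix and apply
    symmetric normalization (an assumed choice; the abstract does not specify one)."""
    A_hat = A + self_loop_weight * np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)                       # node degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(X, A_norm, W):
    """One spatial graph convolution: mix node features over the graph, then project."""
    return A_norm @ X @ W

# Toy graph of 3 weather stations connected in a chain (hypothetical values).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_norm = normalize_with_self_loop(A, self_loop_weight=1.0)

X = np.ones((3, 4))    # 3 stations, 4 weather variables each
W = np.ones((4, 2))    # projection to 2 output features
H = graph_conv(X, A_norm, W)
print(A_norm)
print(H.shape)
```

In a trainable model both `A` and `self_loop_weight` would be parameters updated by gradient descent; here they are fixed arrays so that the arithmetic of the normalization is easy to follow.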

Accurate wind speed forecasting is of great importance for many economic, business and management sectors. This paper introduces a new model based on convolutional neural networks (CNNs) for wind speed prediction tasks. In particular, we show that, compared to classical CNN-based models, the proposed model is able to better characterise the spatio-temporal evolution of the wind data by learning the underlying complex input-output relationships from multiple dimensions (views) of the input data. The proposed model exploits the spatio-temporal multivariate multidimensional historical weather data for learning new representations used for wind forecasting. We conduct experiments on two real-life weather datasets. The datasets are measurements from cities in Denmark and in the Netherlands. The proposed model is compared with traditional 2- and 3-dimensional CNN models, a 2D-CNN model with an attention layer and a 2D-CNN model equipped with upscaling and depthwise separable convolutions.

arXiv (Cornell University), Jul 13, 2020

In this chapter we review the main literature related to recent advances in deep neural-kernel architectures, an approach that seeks the synergy between two powerful classes of models, i.e. kernel-based models and artificial neural networks. The introduced deep neural-kernel framework is a hybridization of the neural network architecture and a kernel machine. More precisely, for the kernel counterpart the model is based on Least Squares Support Vector Machines with explicit feature mapping. Here we discuss the use of one form of explicit feature map, obtained by random Fourier features. Thanks to this explicit feature map, on the one hand bridging the two architectures becomes more straightforward, and on the other hand one can find the solution of the associated optimization problem in the primal, therefore making the model scalable to large-scale datasets. We begin by introducing a neural-kernel architecture that serves as the core module for deeper models equipped with different pooling layers. In particular, we review three neural-kernel machines with average, maxout and convolutional pooling layers. In the average pooling layer, the outputs of the previous representation layers are averaged. The maxout layer triggers competition among different input representations and allows the formation of multiple sub-networks within the same model. The convolutional pooling layer reduces the dimensionality of the multi-scale output representations. Comparisons with the neural-kernel model, kernel-based models and the classical neural network architecture have been made, and the numerical experiments illustrate the effectiveness of the introduced models on several benchmark datasets.
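The average and maxout pooling layers described in the abstract above can be illustrated with a small sketch. It assumes that both layers act element-wise on a list of equally shaped representation matrices; the function names `average_pooling` and `maxout_pooling` and the toy inputs are hypothetical and are not taken from the chapter.

```python
import numpy as np

def average_pooling(representations):
    """Average several equally shaped representations element-wise."""
    return np.mean(np.stack(representations, axis=0), axis=0)

def maxout_pooling(representations):
    """Keep, for every entry, the largest value among the representations,
    so that the representations compete for each output unit."""
    return np.max(np.stack(representations, axis=0), axis=0)

# Two toy representations of the same 2 samples with 2 features each (hypothetical values).
H1 = np.array([[1.0, -2.0],
               [0.0,  3.0]])
H2 = np.array([[ 3.0, 0.0],
               [-1.0, 1.0]])

avg = average_pooling([H1, H2])   # [[2.0, -1.0], [-0.5, 2.0]]
mx = maxout_pooling([H1, H2])     # [[3.0,  0.0], [ 0.0, 3.0]]
print(avg)
print(mx)
```

Note that in the maxout output the winning entries come from different inputs for different positions, which is one way to read the statement that maxout allows multiple sub-networks to form within the same model.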

arXiv (Cornell University), Aug 16, 2021

Reliable and accurate wind speed prediction has a significant impact in many sectors, such as economics, business and management, among others. This paper presents a new model for wind speed prediction based on Graph Attention Networks (GAT). In particular, the proposed model extends the GAT architecture by equipping it with a learnable adjacency matrix as well as incorporating a new attention mechanism with the aim of obtaining attention scores per weather variable. The output of the GAT-based model is combined with an LSTM layer in order to exploit both the spatial and temporal characteristics of the multivariate multidimensional historical weather data. Real weather data collected from several cities in Denmark and the Netherlands are used to conduct the experiments and evaluate the performance of the proposed model. We show that, in comparison to previous architectures used for wind speed prediction, the proposed model is able to better learn the complex input-output relationships of the weather data. Furthermore, thanks to the learned attention weights, the model provides additional insights into the most important weather variables and cities for the studied prediction task.

arXiv (Cornell University), Jul 2, 2020

Neuroimaging techniques have been shown to be useful when studying the brain's activity. This paper uses Magnetoencephalography (MEG) data, provided by the Human Connectome Project (HCP), in combination with various deep artificial neural network models to perform brain decoding. More specifically, we investigate to what extent we can infer the task performed by a subject based on their MEG data. Three models are proposed and examined: one based on compact convolution, one combining convolutional and long short-term architecture, and one based on multi-view learning that aims at fusing the outputs of the two stream networks. These models exploit the spatio-temporal MEG data for learning new representations that are used to decode the relevant tasks across subjects. In order to identify the most relevant features of the input signals, two attention mechanisms, i.e. self and global attention, are incorporated in all the models. The experimental results of cross-subject multi-class classification on the studied MEG dataset show that the inclusion of attention improves the generalization of the models across subjects.

arXiv (Cornell University), Jun 28, 2021

Reliable weather forecasting is of great importance in science, business, and society. The best performing data-driven models for weather prediction tasks rely on recurrent or convolutional neural networks, some of which incorporate attention mechanisms. In this work, we introduce a novel model based on the Transformer architecture for weather forecasting. The proposed Tensorial Encoder Transformer (TENT) model is equipped with tensorial attention and thus exploits the spatio-temporal structure of weather data by processing it in multidimensional tensorial format. We show that, compared to the classical encoder transformer, 3D convolutional neural networks, LSTM, and Convolutional LSTM, the proposed TENT model can better learn the underlying complex pattern of the weather data for the studied temperature prediction task. Experiments on two real-life weather datasets are performed. The datasets consist of historical measurements from weather stations in the USA, Canada and Europe. The first dataset contains hourly measurements of weather attributes for 30 cities in the USA and Canada from October 2012 to November 2017. The second dataset contains daily measurements of weather attributes for 18 cities across Europe from May 2005 to April 2020. Two attention scores are introduced based on the obtained tensorial attention and are visualized in order to shed light on the decision-making process of our model and provide insight into the most important cities for the target cities.

Knowledge-Based Systems

The supply and demand of energy is influenced by meteorological conditions. The relevance of accurate weather forecasts increases as the demand for renewable energy sources increases. Energy providers and policy makers require weather information to make informed choices and establish optimal plans according to their operational objectives. Due to the recent development of deep learning techniques applied to satellite imagery, weather forecasting that uses remote sensing data has also been the subject of major progress. The present paper investigates multiple-steps-ahead frame prediction for coastal sea elements in the Netherlands using U-Net based architectures. Hourly data from the Copernicus observation programme spanning a period of 2 years has been used to train the models and make the forecasts, including seasonal predictions. We propose a variation of the U-Net architecture and further extend this novel model using residual connections, parallel convolutions and asymmetric convolutions in order to introduce three additional architectures. In particular, we show that the architecture equipped with parallel and asymmetric convolutions as well as skip connections outperforms the other three discussed models.

This paper introduces a general framework of non-parallel support vector machines, which involves a regularization term, a scatter loss and a misclassification loss. When dealing with binary problems, the framework with proper losses covers some existing non-parallel classifiers, such as the multisurface proximal support vector machine via generalized eigenvalues, twin support vector machines, and their least squares version. The possibility of incorporating different existing scatter and misclassification loss functions into the general framework is discussed. Moreover, in contrast with the mentioned methods, which apply kernel-generated surfaces, we directly apply the kernel trick in the dual and then obtain nonparametric models. Therefore, one does not need to formulate two different primal problems for the linear and nonlinear kernels respectively. In addition, experimental results are given to illustrate the performance of different loss functions.

2021 IEEE Symposium Series on Computational Intelligence (SSCI)

Polymer International, 2022

Polymeric dispersing agents were prepared from aliphatic polyesters consisting of δ‐undecalactone (UDL) and β,δ‐trimethyl‐ε‐caprolactones (TMCL) as biobased monomers, which were polymerized in bulk via organocatalysts. Graft copolymers were obtained by coupling of the polyesters to poly(ethylene imine) (PEI) in the bulk without using solvents. Various parameters that influence the performance of the dispersing agents in pigment‐based UV‐curable matrices were investigated: chemistry of the polyester (UDL or TMCL), polyester/PEI weight ratio, molecular weight of the polyesters and of PEI. The performance of the dispersing agents was modelled using machine learning in order to increase the efficiency of the dispersant design. The resulting models were presented as analytical models for the individual polyesters and the synthesis conditions for optimally performing dispersing agents were indicated as a preference for high‐molecular‐weight polyesters and a polyester‐dependent maximum pol...

2021 IEEE Symposium Series on Computational Intelligence (SSCI), 2021

Symbolic regression refers to an ensemble of techniques that uncover an analytical equation from data. Through a closed-form formula, these techniques provide great advantages, such as the potential scientific discovery of new laws, explainability, feature engineering and fast inference. Similarly, deep learning based techniques have shown an extraordinary ability to model complex patterns. The present paper aims at applying a recent end-to-end symbolic regression technique, i.e. the equation learner (EQL), to obtain an analytical equation for wind speed forecasting. We show that it is possible to derive an analytical equation that achieves reasonable accuracy for short-term horizon predictions using only a small number of features.

2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020
