Self Organized Map Research Papers

2025, Molecular Phylogenetics and Evolution

The evolutionary history of Neotropical crocodiles has remained elusive. They inhabit a broad geographic range, with populations spanning coastal, inland, and insular locations. Using a selection of natural insular, coastal, and one inland population of C. acutus, coastal C. moreletii, and the single surviving population of C. rhombifer, we discovered a remarkable genetic diversity for the group. Moreover, geometric morphometric results of skull shapes show that these Crocodylus species span a morphological cline. We recovered a high genetic differentiation between C. moreletii, C. rhombifer, and five clusters of C. acutus. The genetic and geographic differences among the C. acutus clusters suggest that these may be a species complex. Several ecological, morphological, and genetic traits are identified in the well-studied populations from Banco Chinchorro and Cozumel islands off the Mexican Yucatan Peninsula to support discrete species designations for these populations. This work suggests the presence of rapid, recent evolution of several cryptic Crocodylus species throughout the Neotropics.

2025

The pan-tilt-verge (PTV) vision system is one of the most widely used in active vision. The main advantage of such a system is its 4 DOF, which allow efficient tracking of moving objects. Besides the physical design of the head, the overall tracking performance of the system depends on its controller. This paper presents the development of a PTV head controller that achieves one human-like eye movement behavior, the saccade. The PTV head is driven directly by a controller using visual feedback. The dynamic Jacobian is estimated using a self-organizing map network with an unsupervised learning scheme. The estimated Jacobian is used in the PTV head controller, and the results are satisfactory in both performance and learning speed. Moreover, the system can eventually perform tracking without a priori knowledge of the head structure, e.g. a mathematical model of the head or hardware calibration. Thus the system can be implemented conveniently.

2025

2025, DAAAM International Scientific Book

Neuro-fuzzy logic can be used to select the most appropriate tools and to estimate the cost of CNC manufacturing of prototypes, which usually involves very small production series, often as small as a single unit. In such cases the cutting tools and manufacturing cost must be estimated in a reasonable amount of time and with a certain degree of accuracy, so that a shop specialized in this market can be efficient and economically competitive. An Artificial Intelligence approach can therefore be helpful. Our solution is implemented within an object-oriented programming paradigm for part description and data processing. A set of fuzzy logic rules is applied in a decision-making process to obtain the best tool selection. We use neural networks to generate self-organizing maps (SOM) that adjust the shape and/or boundaries of the membership functions. The optimal cutting conditions are selected by a heuristic approach. Thus the time and cost of a competitive manufacturing job can be adequately predicted.

2025, Bioinformatics of Genome Regulation and Structure II

Motivation: One of the major challenges in the post-genomic era is speeding up the identification of molecular targets related to a specific pathology. Even though experimental procedures have greatly enhanced analytical capability, textual data analysis still plays a central role in planning experiments and in database construction. Extracting relevant information from published papers requires a lot of time; tools that automatically cluster the retrieved documents into topic categories labelled by specific relevant keywords can greatly support this activity. Results: In this paper we present an application of a document clustering system based on Self-Organizing Maps to cluster PUBMED abstracts and to extract class-specific terms that allow the selection of items related to specific topics. The system allows the discrimination of different groups of items and gives an index of relevance for the terms. We have tested the system on a small test sample of PUBMED abstracts related to the CDK5 proteins.

2025, International Journal of Computer Science and Engineering

Intrusion detection systems monitor computer system events to discover malicious activities in the network. There are two types of intrusion detection systems: signature-based and anomaly-based. Anomaly detection can be either flow-based or packet-based. In the flow-based approach, the system looks at aggregated information about related packets in the form of a flow. A packet-based detection system inspects the complete packet, which consists of a header as well as payload data. In this paper, an improved packet-based anomaly detection technique is proposed. In the training module, normal profiles of the network traffic are generated by modeling the payload with an n-gram approach and clustering packets lengthwise, according to payload length. Lengthwise clustering reduces the number of models for normal profiles. The mean and standard deviation are then calculated for use in the detection module. In the detection module, the distance between normal profiles and newly arriving data in the network is computed using cosine similarity. The standard DARPA'99 dataset and data collected at Panjab University are used to test the proposed technique. Anomaly detection is performed on ports 21, 23 and 80, and the results are compared with various n-gram techniques and other techniques used in the literature for payload anomaly detection. It is concluded that this improved technique can reduce space and provides better results on ports 21 and 23 than on port 80.
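
The profile-and-compare idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the single-profile setup (no lengthwise clustering, no mean and standard deviation model) and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_profile(payload: bytes, n: int = 1) -> Counter:
    """Count the n-grams (byte sequences of length n) found in a payload."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty payload shares nothing with any profile
    return dot / (norm_a * norm_b)


def is_anomalous(payload: bytes, normal_profile: Counter,
                 threshold: float = 0.5, n: int = 1) -> bool:
    """Flag a payload whose similarity to the normal profile is too low.

    `threshold` is a hypothetical value chosen only for this example.
    """
    return cosine_similarity(ngram_profile(payload, n), normal_profile) < threshold
```

In a fuller system one such profile would presumably be kept per port and per payload-length cluster, as the abstract describes.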

2025, IEEE Transactions on Neural Networks

2025, IEEE Transactions on Power Systems

Different methodologies are available for clustering purposes. The objective of this paper is to review the capacity of some of them and specifically to test the ability of self-organizing maps (SOMs) to filter, classify, and extract patterns from distributor, commercializer, or customer electrical demand databases. These market participants can benefit from knowledge of these patterns, for example, to evaluate the potential for distributed generation, energy efficiency, and demand-side response policies (market analysis). For simplicity, customer classification techniques usually use the historic load curves of each user. The first step in the methodology presented in this paper is anomalous data filtering: holidays, maintenance, and wrong measurements must be removed from the database. Subsequently, two different treatments (frequency and time domain) of demand data were tested to feed the SOMs and evaluate the advantages of each approach. Finally, the ability of SOMs to classify new customers in different clusters is also examined. Both steps have been performed through a well-known technique: SOMs. The results clearly show the suitability of this approach to improve data management and to easily find coherent clusters between electrical users, accounting for relevant information about weekend demand patterns.

2025, 2008 Eighth IEEE International Conference on Data Mining

We introduce a Self-Organizing Map (SOM) based visualization method that compares cluster structures in temporal datasets using Relative Density SOM (ReDSOM) visualization. Our method, combined with a distance matrix-based visualization, is capable of visually identifying emerging clusters, disappearing clusters, enlarging clusters, contracting clusters, the shifting of cluster centroids, and changes in cluster density. For example, when a region in a SOM becomes significantly more dense compared to an earlier SOM, and well separated from other regions, then the new region can be said to represent a new cluster. The capabilities of ReDSOM are demonstrated using synthetic datasets, as well as real-life datasets from the World Bank and the Australian Taxation Office. The results on the real-life datasets demonstrate that changes identified interactively can be related to actual changes. The identification of such cluster changes is important in many contexts, including the exploration of changes in population behavior in the context of compliance and fraud in taxation.

2025, Workshop on Self-Organizing Maps

In some application contexts, data are better described by a matrix of pairwise dissimilarities than by a vector representation. Clustering and topographic mapping algorithms have been adapted to this type of data, either via the generalized median principle or, more recently, with the so-called relational approach, in which prototypes are represented by virtual linear combinations of the original observations. One drawback of those methods is their complexity, which scales as the square of the number of observations, mainly because they use dense prototype representations: each prototype is obtained as a virtual combination of (at least) all the elements of its cluster. In this paper we propose using a sparse representation of the prototypes to obtain relational algorithms with sub-quadratic complexity.
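
To make the "virtual linear combination" concrete, here is a small sketch of the standard relational distance computation. It assumes that `D` holds squared dissimilarities and that the coefficients of a prototype sum to one; under those assumptions the squared distance between observation j and the prototype is (Dα)_j − ½ αᵀDα. Storing the coefficients in a dictionary, so that only non-zero entries are visited, is an illustrative choice and not necessarily the sparse scheme proposed in the paper.

```python
def relational_distance(D, alpha, j):
    """Squared distance between observation j and a relational prototype.

    D     -- square matrix (list of lists) of squared pairwise dissimilarities
    alpha -- dict mapping observation index -> coefficient, coefficients sum to 1
    """
    # (D alpha)_j, visiting only the non-zero coefficients.
    linear = sum(a * D[j][i] for i, a in alpha.items())
    # alpha^T D alpha, again restricted to the non-zero coefficients.
    quadratic = sum(a * b * D[i][k]
                    for i, a in alpha.items()
                    for k, b in alpha.items())
    return linear - 0.5 * quadratic
```

With a dense prototype every evaluation touches a full row of `D`; with s non-zero coefficients the cost per evaluation drops to roughly s + s² operations, which is the intuition behind using sparse prototypes.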

2025, Neurocomputing

The self-organizing map (SOM) and neural gas (NG), and generalizations thereof such as the generative topographic map, constitute popular algorithms to represent data by means of prototypes arranged on a (hopefully) topology-representing map. However, most standard methods rely on the Euclidean metric, hence the resulting clusters are isotropic and cannot account for local distortions or correlations of data. In this contribution, we extend prototype-based clustering algorithms such as NG and SOM towards a more general metric which is given by a full adaptive matrix such that ellipsoidal clusters are accounted for. The approach relies on a natural extension of the standard cost functions of NG and SOM (in the form of Heskes) and is conceptually intuitive. We derive batch optimization learning rules for prototype and matrix adaptation based on these generalized cost functions and we show convergence of the algorithm. It can be seen that matrix learning implicitly performs local principal component analysis (PCA) and that the local eigenvectors correspond to the main axes of the ellipsoidal clusters. Thus, the proposal also provides a cost function associated with alternative proposals in the literature which combine SOM or NG with local PCA models. We demonstrate the behavior of the proposed model in several benchmark examples and in an application to image compression.
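
A short sketch of what a "full adaptive matrix" metric means in practice: the squared Euclidean distance is replaced by the quadratic form (x − w)ᵀ Λ (x − w). The function name is hypothetical, and how Λ is adapted during learning is defined in the paper and not shown here.

```python
def matrix_distance(x, w, lam):
    """Quadratic-form distance (x - w)^T lam (x - w).

    lam is assumed to be a symmetric positive semi-definite matrix given as a
    list of lists; with the identity matrix this is the squared Euclidean distance.
    """
    diff = [xi - wi for xi, wi in zip(x, w)]
    n = len(diff)
    return sum(diff[i] * lam[i][j] * diff[j] for i in range(n) for j in range(n))
```

A matrix such as `[[4.0, 0.0], [0.0, 0.25]]` stretches the first axis and compresses the second, so points at equal distance from the prototype form an ellipse rather than a circle.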

2025, Springer eBooks

Clustering constitutes a ubiquitous problem when dealing with huge data sets for data compression, visualization, or preprocessing. Prototype-based neural methods such as neural gas or the self-organizing map offer an intuitive and fast variant which represents data by means of typical representatives, thereby running in linear time. Recently, an extension of these methods towards relational clustering has been proposed which can handle general non-vectorial data characterized by dissimilarities only, such as alignment or general kernels. This extension, relational neural gas, is directly applicable in important domains such as bioinformatics or text clustering. However, it is quadratic in m both in memory and in time (m being the number of data points). Hence, it is infeasible for huge data sets. In this contribution we introduce an approximate patch version of relational neural gas which relies on the same cost function but dramatically reduces time and memory requirements. It offers a single-pass clustering algorithm for huge data sets, running in constant space and linear time only.

2025, Lecture Notes in Computer Science

Clustering and visualization constitute key issues in computer-supported data inspection, and a variety of promising tools exist for such tasks, such as the self-organizing map (SOM) and variations thereof. Real-life data, however, pose severe problems to standard data inspection: on the one hand, data are often represented by complex non-vectorial objects, and standard methods for finite-dimensional vectors in Euclidean space cannot be applied. On the other hand, very large data sets have to be dealt with, such that the data neither fit into main memory nor can more than one pass over the data be afforded, i.e. standard methods simply cannot be applied due to the sheer amount of data. We present two recent extensions of topographic mappings: relational clustering, which can deal with general proximity data given by pairwise distances, and patch processing, which can process streaming data of arbitrary size in patches. Together, they yield an efficient linear-time data inspection method for general dissimilarity data structures. We present the theoretical background as well as applications to the areas of text and multimedia processing based on the generalized compression distance.

2025, Lecture Notes in Computer Science

4.3 Results of matrix NG, SOM, and k-means after 100 epochs for the spirals data set. Obviously, matrix k-means suffers from local optima. 4.4 Results of matrix NG, SOM, and k-means (top) and standard NG, SOM, and k-means (bottom) after 100 epochs for the spirals data set. Obviously, k-means suffers from local optima. Further, the shape is better represented by matrix clustering as can be seen by the classification accuracy of the maps.

2025, Neural Networks


2025

A great challenge today, arising in many fields of science, is the proper mapping of datasets to explore their structure and gain information that would otherwise remain concealed due to their high dimensionality. This task is impossible without appropriate tools that help experts understand the data. A promising way to support experts in their work is the topographic mapping of the datasets to a low-dimensional space where the structure of the data can be visualized and understood. This thesis focuses on Neural Gas and Self-Organizing Maps as particularly successful methods for prototype-based topographic maps. The aim of the thesis is to extend these methods such that they can deal with real-life datasets which are possibly very large and complex, and thus probably neither treatable in main memory nor embeddable in Euclidean space. As a foundation, we propose and investigate a fast batch scheme for topographic mapping which features quadratic convergence. This formulation allows to...

2025, Lecture Notes in Computer Science

2025, Artificial Neural Networks in Pattern Recognition

2025, Lecture Notes in Computer Science

Median clustering extends popular neural data analysis methods such as the self-organizing map or neural gas to general data structures given by a dissimilarity matrix only. This offers flexible and robust global data inspection methods which are particularly suited for the variety of data that occur in biomedical domains. In this chapter, we give an overview of median clustering and its properties and extensions, with a particular focus on efficient implementations adapted to large-scale data analysis.

2025, Lecture Notes in Computer Science

2025, Neural Networks

Neural Gas (NG) constitutes a very robust clustering algorithm for Euclidean data which suffers neither from the problem of local minima, like simple vector quantization, nor from topological restrictions, like the self-organizing map. Based on the cost function of NG, we introduce a batch variant of NG which shows much faster convergence and which can be interpreted as an optimization of the cost function by the Newton method. This formulation has the additional benefit that, based on the notion of the generalized median in analogy to Median SOM, a variant for non-vectorial proximity data can be introduced. We prove convergence of batch and median versions of NG, SOM, and k-means in a unified formulation, and we investigate the behavior of the algorithms in several experiments.
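
The batch update mentioned in this abstract can be sketched as follows. This is a minimal illustration under common conventions, not the authors' code: each prototype is set to a weighted mean of all data points, where the weight of a point depends on the rank of that prototype among all prototypes sorted by distance to the point, using exp(−rank / λ). The function names, the exponential annealing of λ and the default values are assumptions.

```python
import math


def batch_ng_step(data, prototypes, lam):
    """One batch neural gas update with neighbourhood range lam."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in prototypes]
    totals = [0.0] * len(prototypes)
    for x in data:
        dists = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in prototypes]
        # Rank 0 is the closest prototype, rank 1 the second closest, and so on.
        order = sorted(range(len(prototypes)), key=dists.__getitem__)
        for rank, i in enumerate(order):
            h = math.exp(-rank / lam)
            totals[i] += h
            for d in range(dim):
                sums[i][d] += h * x[d]
    # Keep the old prototype if all of its weights underflowed to zero.
    return [
        [s / total for s in row] if total > 0.0 else list(old)
        for row, total, old in zip(sums, totals, prototypes)
    ]


def batch_ng(data, prototypes, epochs=20, lam_start=None, lam_end=0.01):
    """Run batch NG while shrinking lam exponentially from lam_start to lam_end."""
    lam_start = lam_start or len(prototypes) / 2.0
    for t in range(epochs):
        frac = t / max(epochs - 1, 1)
        lam = lam_start * (lam_end / lam_start) ** frac
        prototypes = batch_ng_step(data, prototypes, lam)
    return prototypes
```

As λ approaches zero only the closest prototype receives a noticeable weight, so the final steps behave like batch k-means.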

2025

Denial of Service attacks constitute one of the greatest problems in network security. Monitoring traffic is one of the main techniques used to detect possible outliers in the traffic patterns. In this paper, we propose an approach that detects Denial of Service attacks using Emergent Self-Organizing Maps. The approach is based on classifying "normal" traffic against "abnormal" traffic in the sense of Denial of Service attacks. The approach permits the automatic classification of events that are contained in logs and visualization of network traffic. Extensive simulations show the effectiveness of this approach compared to previously proposed approaches regarding false alarms and detection probabilities.

2025, The Journal of the Acoustical Society of America

A multiple regression model was developed to account for the variance in phonatory jitter among normal speakers and for predicting an expected jitter value and confidence interval for individual subjects. Jitter was measured with a system resolution capable of measuring a minimum perturbation of 2.15 μs. Measures were made from maximum phonation lengths of 95 adults without laryngeal pathology. Seven factors were examined for contributions to the prediction of jitter: sex, age, smoking history, drinking habits, F0, vocal intensity, and length of phonation. A multiple correlation coefficient of determination of 97.6% was obtained for the normal subject pool with a three-factor model including: vocal intensity, F0 and phonation length. For 20 patients with laryngeal pathology, individual predicted jitter values and 90% confidence intervals were computed using the normal regression model, for determining when patients' actual jitter values were outside of their expected confidence ...

2025, Studies in Computational Intelligence

In this chapter, we discuss the use of Self Organizing Maps (SOM) to deal with various tasks in Document Image Analysis. The SOM is a particular type of artificial neural network that computes, during the learning, an unsupervised clustering of the input data arranging the cluster centers in a lattice. After an overview of the previous applications of unsupervised learning in document image analysis, we present our recent work in the field. We describe the use of the SOM at three processing levels: the character clustering, the word clustering, and the layout clustering, with applications to word retrieval, document retrieval and page classification. In order to improve the clustering effectiveness, when dealing with small training sets, we propose an extension of the SOM training algorithm that considers the tangent distance so as to increase the SOM robustness with respect to small transformations of the patterns. Experiments on the use of this extended training algorithm are reported for both character and page layout clustering.
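
As a concrete reference for the description above (unsupervised clustering with cluster centers arranged in a lattice), here is a minimal online SOM sketch. It uses plain Euclidean distance, not the tangent-distance extension proposed in the chapter, and the function names, linear decay schedules and default values are assumptions.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Return the lattice position whose weight vector is closest to x."""
    return min(weights, key=lambda node: squared_distance(weights[node], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns a dict {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One cluster center per lattice node, initialised from random samples.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = t / steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.01)   # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Nodes close to the winner on the lattice move most toward x.
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return weights
```

After training, `best_matching_unit(weights, x)` assigns a new pattern to a lattice cell, which is the basic operation behind the clustering and retrieval applications listed in the abstract.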

2025, Lecture Notes in Computer Science

We propose an approach for efficient word retrieval from printed documents belonging to Digital Libraries. The approach combines word image clustering (based on Self Organizing Maps, SOM) with Principal Component Analysis. The combination of these methods allows us to efficiently retrieve the matching words from large document collections without the need for a direct comparison of the query word with each indexed word.

2025, 2008 The Eighth IAPR International Workshop on Document Analysis Systems

In this paper we explore the effectiveness of three clustering methods used to perform word image indexing. The three methods are the Self-Organizing Map (SOM), the Growing Hierarchical Self-Organizing Map (GHSOM), and Spectral Clustering. We test these methods on a real data set composed of word images extracted from pages that are part of an encyclopedia of the 19th century. In essence, the word images are stored in the clusters defined by the clustering methods and subsequently retrieved by identifying the closest cluster to a query word. The accuracy of the methods is compared using the word retrieval algorithm developed in our previous work. From the experimental results we may conclude that methods designed to automatically determine the number and the structure of clusters, such as GHSOM, are particularly suitable in the context represented by our data set.

2025, Ecological Chemistry and …

Abstract: The present study deals with the application of self-organizing maps (SOM) of Kohonen for the classification of aerosol monitoring data sets from two sampling points (Arnoldstein and Unterloibach) located close to the border...

2025, International Journal of Neural Systems

We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of Kohonen's self-organizing maps, a class of artificial neural networks. Using several examples, we show the advantages of non-linear objects for data approximation in comparison to linear ones. We propose four numerical criteria for comparing linear and non-linear mappings of datasets into spaces of lower dimension. The examples are taken from comparative political science, from the analysis of high-throughput data in molecular biology, and from the analysis of dynamical systems.

2025, Keçeci Layout

Keçeci Layout is a deterministic node layout algorithm designed for graph visualization in Python. Its primary purpose is to position the nodes of a graph in a predefined, sequential, and repeatable manner. The algorithm processes nodes sequentially, placing them along a user-defined primary axis (e.g., top-down or left-to-right) while applying an offset on the secondary axis in a zigzag pattern. This zigzag pattern helps prevent node overlaps while maintaining an orderly structure. The function is designed to be compatible with popular Python graph libraries, including NetworkX, Rustworkx, igraph, Networkit, and Graphillion (via GraphSet objects). It takes a graph object from one of these libraries as input, processes the nodes (usually by sorting their IDs), and returns a Python dictionary mapping each node's identifier (in the library-specific format) to its calculated (x, y) coordinates. Users can customize the node spacing along the primary and secondary axes (`primary_spacing`, `secondary_spacing`), the main direction of the layout (`primary_direction`), and the starting side of the zigzag pattern (`secondary_start`) through parameters. This regular and predictable structure is useful, particularly when the order of nodes is significant or when a simple, aesthetically pleasing, and easily traceable graph visualization is desired.
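
The placement rule described above can be sketched in a few lines. This is an illustration only and not the package's actual API: the function name `zigzag_layout` is hypothetical, the parameter names follow the description, but the accepted values (`"top-down"`, `"left-to-right"`, a sign of `1` or `-1` for `secondary_start`) and the fixed-size alternating offset are assumptions.

```python
def zigzag_layout(nodes, primary_spacing=1.0, secondary_spacing=1.0,
                  primary_direction="top-down", secondary_start=1):
    """Map each node to (x, y), stepping along one axis and zigzagging on the other.

    nodes           -- iterable of sortable node identifiers
    secondary_start -- 1 or -1: side of the primary axis used for the first node
                       (assumed encoding, chosen for this sketch)
    """
    if primary_direction not in ("top-down", "left-to-right"):
        raise ValueError("unsupported primary_direction: %r" % (primary_direction,))
    if secondary_start not in (1, -1):
        raise ValueError("secondary_start must be 1 or -1")
    positions = {}
    for i, node in enumerate(sorted(nodes)):  # sorting makes the layout repeatable
        primary = i * primary_spacing
        # Alternate sides of the primary axis for consecutive nodes.
        side = secondary_start if i % 2 == 0 else -secondary_start
        secondary = side * secondary_spacing / 2.0
        if primary_direction == "top-down":
            positions[node] = (secondary, -primary)   # y decreases going down
        else:
            positions[node] = (primary, secondary)
    return positions
```

The returned dictionary has the same node-to-coordinates shape that NetworkX drawing functions accept through their `pos` argument, so a call such as `nx.draw(G, pos=zigzag_layout(G.nodes()))` should work for graphs whose node identifiers are sortable.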

2025, Biosensors and Bioelectronics

An overview of the present industrial scenario with regard to the application of neural network approaches is reported. A brief summary of practical applications is presented. Neural network architectures capable of resolving various types of industrial problems are also illustrated. The first application presented deals with the problem of digit recognition. The proposed architecture is composed of different multilayer networks trained by back-propagation and organized in a hierarchical structure. Such a structure, as opposed to a single network structure, avoids the typical errors that occur when comparing certain pairs of digits. The second application deals with texture-based segmentation of an image. To solve this problem, a structure composed of two different networks is proposed. The first network, a simplified BCS model, is a "feature extractor"; the second network, a Learning Vector Quantization model, is able to classify different textures on the basis of the parameters extracted by the BCS. The third application deals with a fault-diagnosis system for the process state monitoring of complex industrial plants. An innovative approach is presented. The proposed model analyzes process parameters and predicts possible malfunctions on the basis of self-organizing networks.

2025, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453)

The main purpose of this paper is to detect and follow a pipeline in sonar images. This work is performed in two steps. The first is to split a transformed line image of the pipeline signal into regions of uniform texture using the Gray Level Co-occurrence Matrix (GLCM) method, which is widely used in texture segmentation applications. The second applies an unsupervised learning method based on artificial neural networks (the Self-Organizing Map, or SOM) to determine the comparative model of the pipeline from the image. To increase the performance of the SOM, we propose a penalty function based on data histogram visualization for detecting the position of the pipeline. After a brief review of both techniques (GLCM and SOM), we present our method and some results from several experiments on a real-world data set.
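
For readers unfamiliar with the GLCM, the following sketch shows how a co-occurrence matrix and one common texture feature (contrast) can be computed. It is a generic illustration, not the paper's pipeline: the function names are hypothetical and the image is assumed to be a list of rows holding integer gray levels in the range 0 to `levels - 1`.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=False):
    """Count how often gray level i occurs at offset (dx, dy) from gray level j."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                matrix[i][j] += 1
                if symmetric:
                    matrix[j][i] += 1  # also count the pair in reverse order
    return matrix


def contrast(matrix):
    """Contrast feature: average squared gray-level difference of the counted pairs."""
    total = sum(map(sum, matrix))
    if total == 0:
        return 0.0
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total
```

Features of this kind, computed over sliding windows, could then serve as the input vectors of a SOM, which is the general combination the abstract describes.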

2025

The success of Semantic Web research depends on the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an extended model of hierarchical self-organizing maps. Being founded on an unsupervised neural network architecture, the framework can be applied to different languages and domains. Terms extracted by mining a text corpus encode contextual content information in a distributional vector space. The enrichment behaves like a classification of the extracted terms into the existing taxonomy by attaching them as hyponyms of the nodes of the taxonomy. The experiments reported are in the "Lonely Planet" tourism domain. The taxonomy and the corpus are the ones proposed in the PASCAL ontology learning and population challenge. The experimental results show that the quality of the enrichment is considerably improved by using semantics-based vector representations for the classified (newly added) terms, such as the document category histograms (DCH) and the document frequency times inverse term frequency (DF-ITF) weighting scheme.

2025

Visualizations of clusters of statistical individuals often take the form of a series of scatter plots to be analyzed dimension by dimension. However, comparing them becomes increasingly time-consuming as the number of dimensions grows, until it is beyond human reach. To alleviate this problem, we propose a complete visualization method ranging from the statistical processing of the data to their graphical display. The statistical processing relies on dimension reduction and data clustering methods. The data visualization itself is a single two-dimensional graphical representation. It is built around the groups rather than the individuals, unlike a classical scatter plot. We thus obtain a list of objects representing the groups, arranged in a two-dimensional space and connected by similarity and dissimilarity links. This visualization method has been ...

2025

Abstract. – Prediction of financial time series using artificial neural networks has been the subject of many publications, even if the predictability of financial series remains a subject of scientific debate in the financial literature. Facing this difficulty, analysts often consider a large number of exogenous indicators, which makes the fitting of neural networks extremely difficult. In this paper, we analyze how to aggregate a large number of indicators into a smaller number using (possibly nonlinear) projection methods. Nonlinear projection methods are shown to be equivalent to the linear Principal Component Analysis when the prediction tool used on the new variables is linear. Furthermore, the computation of the nonlinear projection gives an objective way to evaluate the number of resulting indicators needed for the prediction. Finally, the advantages of nonlinear projection could be further exploited by using a subsequent nonlinear prediction model. The methodology developed i...

2025, Advances in Self-Organising Maps

Three extensions of the traditional learning rule for Self-Organizing Maps are presented. They are based on geometrical considerations and explore various possibilities regarding the norm and the direction of the adaptation vectors. The performance and convergence of each rule is evaluated by two criteria: topology preservation and quantization error.

2025, Neurocomputing

A generally useful parameter in time series forecasting is the regressor size, corresponding to the minimum number of variables necessary to forecast the future values of the time series. If the models are nonlinear, the choice of this regressor becomes very difficult. We present a quasi-automatic method using a nonlinear projection named curvilinear component analysis to build this regressor. The size of this regressor is determined by estimating the intrinsic dimension of an over-sized regressor. The method is applied to the electric consumption of Poland using systematic cross-validation. The nonlinear model used for the prediction is a Kohonen map (self-organizing map).
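
To clarify what a regressor of a given size is, the sketch below builds (regressor, target) pairs from a series by sliding a window of past values. The function name is hypothetical, and the sketch does not cover the paper's actual contribution, which is choosing the size through intrinsic dimension estimation and curvilinear component analysis.

```python
def build_regressors(series, size):
    """Return (regressor, target) pairs.

    Each regressor holds `size` consecutive past values and the target is the
    value that immediately follows them.
    """
    if size < 1 or size >= len(series):
        raise ValueError("size must be between 1 and len(series) - 1")
    return [(list(series[t - size:t]), series[t])
            for t in range(size, len(series))]
```

For a series of length N this yields N − size training pairs, so an over-sized regressor trades away training examples as well as adding input dimensions.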

2025

Clustering methods are commonly used on time series, either as a preprocessing step for other methods or in their own right. This paper illustrates the problem of clustering applied to regressor vectors obtained from raw time series. It thus shows why time series clustering may sometimes seem meaningless. A preprocessing step is proposed to unfold time series and allow a meaningful clustering of regressors. Graphical and experimental results show the usefulness of the unfolding preprocessing.

2025

Prediction of financial time series using artificial neural networks has been the subject of many publications, even if the predictability of financial series remains a subject of scientific debate in the financial literature. Facing this difficulty, analysts often consider a large number of exogenous indicators, which makes the fitting of neural networks extremely difficult. In this paper, we analyze how to aggregate a large number of indicators into a smaller number using (possibly nonlinear) projection methods. Nonlinear projection methods are shown to be equivalent to the linear Principal Component Analysis when the prediction tool used on the new variables is linear. The methodology developed in the paper is validated on data from the BEL20 market index.

2025

Dimension reduction techniques are widely used for the analysis and visualization of complex sets of data. This paper compares two nonlinear projection methods: Isomap and Curvilinear Distance Analysis. In contrast to the traditional linear PCA, these methods work like multidimensional scaling, by reproducing in the projection space the pairwise distances measured in the data space. They differ from the classical linear MDS by the metrics they use and by the way they build the mapping (algebraic or neural). While Isomap relies directly on the traditional MDS, CDA is based on a nonlinear variant of MDS, called CCA (Curvilinear Component Analysis). Although Isomap and CDA share the same metrics, the comparison highlights their respective strengths and weaknesses. * This work was realized with the support of the 'Ministère de la Région wallonne', under the 'Programme de Formation et d'Impulsion à la Recherche Scientifique et Technologique'. † M.V. works as a senior research associate of the Belgian FNRS.

2025

With the recent launch of MERIS, a wide range of new possibilities for periodic land cover characterization at regional scale has become available. This sensor offers a combination of innovative features, such as high spectral and temporal resolutions, wide geographical coverage and improved atmospheric correction. We believe that the exploitation of data obtained by this new sensor fills previous technological gaps, improving the automatic discrimination of land cover classes. At the same time, the extra spectral information provided by MERIS can introduce some difficulties for land cover characterization with long-established classification techniques, e.g. k-Nearest Neighbour. In this paper we report the performance of artificial neural networks (ANNs) in the context of high spectral dimensional satellite image classification. The main goal of this research is to assess the potential of the Self-Organizing Maps (SOM) neural network to extract complex land cover type information from medium resolution satellite imagery. The study was carried out with MERIS Full Resolution data from 2004 for the continental Portuguese territory.

2025, The Microbe (Elsevier)

Untreatable listeriosis and wastage could be traced to contaminated fruits. This study assessed Listeria spp., antimicrobial resistance and virulence genes in ready-for-sale fruits. A total of 270 fruits, garden egg (90), tomato (90) and watermelon (90), were purchased from thirty markets in Southwest Nigeria and examined for Listeria spp. Listeria spp. were evaluated, identified and sequenced. An antimicrobial sensitivity assay (15 antibiotics) was performed, and eighteen antimicrobial and nine virulence genes were screened for. Listeria spp. 28 (100.00 %) at 66.25 MPN/g, comprising pathogenic (19) (L. monocytogenes 6 (21.43 %), L. ivanovii 5 (17.86 %), L. seeligeri 8 (28.57 %)) and nonpathogenic (9) (L. welshimeri 5 (17.86 %), L. grayi 2 (7.14 %) and L. innocua 1 (3.57 %)) strains, were distributed in garden egg 8 (28.57 % at 56.63 MPN/g), tomato 14 (50.00 % at 54.29 MPN/g) and watermelon 6 (21.43 % at 57.00 MPN/g). Carbapenem, chloramphenicol, macrolides, tetracycline and folate resistant Listeria strains with the highest prevalence were in fruit from Balogun 9 (60.00 %), Agege 8 (28.57 %) and Lekki 10 (45.45 %) in Lagos State. Virulent Listeria strains had five L. monocytogenes, two L. ivanovii, eight L. seeligeri, five L. welshimeri, two L. innocua and eight L. grayi in fruit from Lagos, Osun and Ondo States respectively. Listeria monocytogenes, L. ivanovii and L. seeligeri in fruit from Lagos State had prfA, plcA and plcB genes. Fruits could thereby be a versatile route for diverse virulent, resistant Listeria strains that could cause constant listeriosis and spoilage. There is therefore a need for better-enforced, hygienic handling of fruits on farms and in markets.

2025

Computational analysis of natural language is often focused on the syntactic structure of language, often with no regard to the overall context and area of expertise of the analyzed text. In this paper, we present a means of analyzing text documents from various areas of expertise to discover groups of thematically similar texts with no prior information about the topics. The presented results show how a relatively simple keyword analysis combined with a SOM projection can be very descriptive in terms of analyzing the contextual relationships between documents and their authors.

2025

Data obtained from molecular dynamics simulation provide important insight into the dynamical interactions of biological molecules. A simulation yields a sequence of time-dependent atomic configurations, and the properties derived from a molecule's trajectory are determined by this sequence. Knowing how to efficiently extract representative structures from simulation data is therefore important, because we often want to identify changes in the conformation of a protein structure during a simulation. We use unsupervised machine learning techniques to cluster such data and investigate a few protein structural properties. The algorithms implemented in this paper produce clusters that tend to group frames from adjacent blocks of time together, even when sampling at 10 ps intervals. We found that a shorter simulation run may not visit all structures that belong to a specific cluster, whereas in a sufficiently long simulation the system revisits previous clusters repeatedly. Cluster populations change rapidly at the initial stage of the simulations but become steady as they approach their terminal values, indicating that equilibrium has been attained. Investigation of protein structure properties also attests to the correspondence between the clusters of protein structures obtained from the clustering algorithms.

2025, Zenodo (CERN European Organization for Nuclear Research)

Electronic noses, instruments for automatic recognition of odours, are typically composed of an array of partially selective sensors, a sampling system, a data acquisition device and a data processing system. For the purpose of evaluating the quality of olive oil, an electronic nose based on an array of conducting polymer sensors capable of discriminating olive oil aromas was developed. The selection of suitable pattern recognition techniques for a particular application can enhance the performance of electronic noses. Therefore, an advanced neural recognition algorithm for improving the measurement capability of the device was designed and implemented. This method combines multivariate statistical analysis and a hierarchical neural-network architecture based on self-organizing maps and error back-propagation. The complete system was tested using samples composed of characteristic olive oil aromatic components in refined olive oil. The results obtained have shown that this approach is effective in grouping aromas into different categories representative of their chemical structure.

2025

Learning to program requires complex cognitive skills that computing students find arduous to master. PP (pair programming) is an intensive style of programming cooperation in which two people work together to resolve programming scenarios. It has begun to draw the interest of educators as a teaching approach to facilitate learning and improve programming performance. The approach of PP, its model, benefits and limitations, as well as the LS (learning style) preference, are presented in the first part of this paper. The research findings and a discussion of the application of PP involving 96 first-year computing students are presented in the second part. The participants in two intact classes were randomly assigned either to the experimental group that received PP or to the control group that received the DI (direct instruction) method only. In the PP group, students worked in pairs based on the visual-verbal LS dimension, while those in the DI group worked individually. During a seven-week treatment, both groups applied program flowcharts and pseudocode in solving programming tasks. This study used two assessment methods, formative and summative, to examine the students' programming achievements. Two programming assignments were used as a formative assessment tool, and the CPPT (computer programming performance test), which comprises a pre-test, an immediate post-test and a delayed post-test, was administered as the second tool to assess the students' programming recall and retention. The findings indicated that students in the PP group significantly outperformed those in the DI group in both the formative and summative assessments. However, only the visual and verbal students performed significantly better in recall than in retention. The analysis of the interaction effects revealed that learning is internal to the learner with regard to the instructional strategies applied and the LS preference in the classroom environment. In this case, the effectiveness of the instructional strategies adopted to foster learning depends to some extent on the type of learner. Therefore, educators should reflect on individual learning abilities while applying PP to stimulate students' engagement and critical thinking skills, which will subsequently have a positive influence on academic performance.

2025, Pattern Analysis and Applications

In this study, the impact of planar and toroidal self-organizing map (SOM) configurations on SOM trajectories is investigated. Such trajectories are an encoding of processes within an n-dimensional input data set and offer an important means of visualizing and analyzing process complexity in large n-dimensional problem domains. However, the discontinuity associated with boundaries in the standard, planar SOM results in error that limits their analytical use. Previous studies have recommended the use of a toroidal SOM to reduce these errors, but fall short of a fully quantified analysis of the benefits that result. In this study, the comparative analysis of fifteen pairs of identically initialized and trained SOMs, of planar and toroidal configuration, allows the error in trajectory magnitude to be quantified and visualized, both within the SOM and in data space. This offers an important insight into the impact of planar SOM boundaries that goes beyond the general, statistical measures of clustering efficacy associated with previous work. The adoption of a toroidal SOM can be seen to improve the distribution of error in the trajectory sets, with the specific spatial configuration of SOM neurons associated with the largest errors changing from those at the corners of the planar SOM to a more complex and less predictable pattern in the toroidal SOM. However, this improvement is limited to the smallest 60% of errors, with toroidal and planar SOMs performing similarly for the largest 40%.
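
The boundary effect discussed here comes down to how distance between map units is measured. The sketch below contrasts the two grid metrics and sums step lengths along a sequence of best-matching units. It assumes a rectangular grid with (row, column) coordinates, and the "trajectory length" is a simplified stand-in for the trajectory magnitude analysed in the paper, whose exact definition is not given in the abstract.

```python
import math

def planar_distance(a, b):
    """Euclidean distance between two units on a bounded rectangular grid."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def toroidal_distance(a, b, rows, cols):
    """Same distance when opposite edges of the grid are joined (torus)."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On a torus the shorter way round may cross an edge.
    dr = min(dr, rows - dr)
    dc = min(dc, cols - dc)
    return math.hypot(dr, dc)

def trajectory_length(bmus, rows, cols, toroidal):
    """Sum of grid distances between successive best-matching units."""
    total = 0.0
    for a, b in zip(bmus, bmus[1:]):
        if toroidal:
            total += toroidal_distance(a, b, rows, cols)
        else:
            total += planar_distance(a, b)
    return total
```

On a 10 x 10 map, the units (0, 0) and (0, 9) are 9 steps apart in the planar metric but adjacent on the torus, which is why the same process can trace very different trajectories under the two configurations.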

2025, Neural Computing and Applications

Applications in the water treatment domain generally rely on complex sensors located at remote sites. The processing of the corresponding measurements for generating higher-level information, such as optimization of coagulation dosing, must therefore account for possible sensor failures and imperfect input data. In this paper, self-organizing map (SOM)-based methods are applied to multiparameter data validation and missing data reconstruction in drinking water treatment. The SOM is a special kind of artificial neural network that can be used for analysis and visualization of large high-dimensional data sets. It performs a nonlinear mapping from a high-dimensional data space to a low-dimensional space, aiming to preserve the most important topological and metric relationships of the original data elements, and thus inherently clusters the data. Combining the SOM results with those obtained by a fuzzy technique that uses the marginal adequacy concept to identify the functional states (normal or abnormal), the performance of the SOM-based validation and reconstruction process is tested successfully on experimental data stemming from a coagulation process involved in drinking water treatment.
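
One common way to use a trained SOM for missing data reconstruction and validation is sketched below, under stated assumptions: the best-matching prototype is found using only the available components, missing components are filled in from that prototype, and a sample is flagged when it lies far from every prototype. This is not necessarily the procedure used in the paper, and the fuzzy technique it mentions is not shown. Function names and the `threshold` parameter are hypothetical, and missing values are marked with `None`.

```python
import math

def masked_distance(w, x):
    """Euclidean distance over the components of x that are present (not None)."""
    return math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x) if xi is not None))

def masked_bmu(prototypes, x):
    """Index of the prototype closest to x, ignoring missing components."""
    return min(range(len(prototypes)),
               key=lambda i: masked_distance(prototypes[i], x))

def reconstruct(prototypes, x):
    """Fill missing components of x with those of its best-matching prototype."""
    w = prototypes[masked_bmu(prototypes, x)]
    return [wi if xi is None else xi for wi, xi in zip(w, x)]

def is_plausible(prototypes, x, threshold):
    """Validation check: x is accepted if it lies within `threshold` of its
    best-matching prototype (assumes variables are on comparable scales)."""
    return masked_distance(prototypes[masked_bmu(prototypes, x)], x) <= threshold

# Usage with two illustrative prototypes of a three-parameter measurement.
prototypes = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]]
filled = reconstruct(prototypes, [1.9, None, 210.0])
# filled == [1.9, 20.0, 210.0]
```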