Parallel Algorithms Research Papers - Academia.edu

2025, Lecture Notes in Computer Science

Geographical Information Systems (GIS) manipulate spatial data. Such data can be available in a variety of formats, one of the most important of which is the vector-topological format. This format retains the topological relationships between geographical features and is commonly used in a range of geographical data analyses. This paper describes the implementation and performance of a parallel data partitioning algorithm for the input of vector-topological data to parallel processes.

2025, 2009 First Asian Himalayas International Conference on Internet

This paper presents the role of high-performance computing (HPC) in computational nanotechnology and the problems encountered when applying an HPC setup to nano-enabled calculations. A proposal to optimize computations in an HPC setup is formulated to make nanotechnology computations more effective and realistic on a CUDA-based framework. Results and findings for the expected setup, and the computational complexity its implementation will require, are given together with an algorithm that takes advantage of the powerful built-in parallelization capabilities of the GPU, making large-scale simulation possible. A CUDA implementation of certain complex techniques in nanotechnology is presented, with a significant improvement in performance compared to the previous work, which was implemented using the distributed computing toolbox in MATLAB. We discuss the problems that exist, how computations can be optimized in an HPC setup, and how the computational power of the GPU can make nanotechnology computations more effective and realistic. The progress in this area of research is described, and future work and a probable extension are proposed.

2025, ACM Transactions on Algorithms

This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $M$ and cache-line length $B$ where $M = \Omega(B^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/B)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/B)(1 + \log_M n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/B + mnp/(B\sqrt{M}))$ cache faults. We introduce an “ideal-cache” model to analyze our algorithms. We prove that a...
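The cache-oblivious idea for matrix transpose can be illustrated with a divide-and-conquer recursion that always splits the longer dimension. The following is a minimal sketch, not the article's own code: the function names and the `base` cutoff are assumptions made for illustration. The cutoff only limits recursion overhead and is not tuned to any cache size or line length. Python lists do not expose real cache behavior, so the sketch shows the recursion structure only.

```python
def transpose(a, b, r0, r1, c0, c1, base=4):
    """Write the transpose of the block a[r0:r1][c0:c1] into b."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy directly. `base` is a hypothetical cutoff that
        # only bounds recursion depth; it is not a hardware parameter.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer side (rows) in half and recurse on both halves.
        mid = r0 + rows // 2
        transpose(a, b, r0, mid, c0, c1, base)
        transpose(a, b, mid, r1, c0, c1, base)
    else:
        # Split the longer side (columns) in half and recurse on both halves.
        mid = c0 + cols // 2
        transpose(a, b, r0, r1, c0, mid, base)
        transpose(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the n x m transpose of an m x n matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    transpose(a, b, 0, m, 0, n)
    return b
```

Because the recursion keeps halving the larger dimension, the sub-blocks eventually fit in whatever cache level is present, which is the intuition behind not needing $M$ or $B$ as tuning parameters.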

2025, The Journal of Supercomputing

Over the past years, researchers have proposed a number of optoelectronic architectures, including optical transpose interconnection system (OTIS) networks. There have, however, been only limited attempts to design parallel algorithms for applications that could be mapped onto such architectures. Exploiting the attractive features of OTIS networks and investigating their performance in solving combinatorial optimization problems is therefore necessary. In this paper, a parallel repetitive nearest neighbor algorithm for solving the symmetric traveling salesman problem on OTIS-Hypercube and OTIS-Mesh optoelectronic architectures is presented. The algorithm has been evaluated analytically and by simulation on both architectures in terms of number of communication steps, parallel run time, speedup, efficiency, cost, and communication cost. The simulation results show near-linear speedup and high efficiency on both architectures, with OTIS-Hypercube achieving better results than OTIS-Mesh.
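As a point of reference, the sequential repetitive nearest neighbor heuristic can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the mapping onto OTIS-Hypercube or OTIS-Mesh described in the paper is not shown. Each starting city produces an independent greedy tour, which is the natural unit of work to distribute across processors.

```python
def nearest_neighbor_tour(dist, start):
    """Greedy tour from `start`: always move to the closest unvisited city."""
    n = len(dist)
    tour = [start]
    visited = {start}
    cost = 0
    current = start
    while len(tour) < n:
        nxt = min(
            (c for c in range(n) if c not in visited),
            key=lambda c: dist[current][c],
        )
        cost += dist[current][nxt]
        visited.add(nxt)
        tour.append(nxt)
        current = nxt
    cost += dist[current][start]  # close the cycle back to the start city
    return cost, tour


def repetitive_nearest_neighbor(dist):
    """Run the greedy heuristic from every start city and keep the cheapest tour."""
    return min(nearest_neighbor_tour(dist, s) for s in range(len(dist)))
```

A small usage example, with a symmetric distance matrix chosen purely for illustration:

```python
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
cost, tour = repetitive_nearest_neighbor(dist)
```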

2025, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

The appeal of MapReduce has spawned a family of systems that implement or extend it. In order to enable parallel collection processing with User-Defined Functions (UDFs), these systems expose extensions of the MapReduce programming model as library-based dataflow APIs that are tightly coupled to their underlying runtime engine. Expressing data analysis algorithms with complex data and control flow structure using such APIs reveals a number of limitations that impede programmer productivity. In this paper we show that designing data analysis languages and APIs from a runtime engine point of view bloats the APIs with low-level primitives and affects programmer productivity. Instead, we argue that an approach based on deeply embedding the APIs in a host language can address the shortcomings of current data analysis languages. To demonstrate this, we propose a language for complex data analysis embedded in Scala, which (i) allows for declarative specification of dataflows and (ii) hides the notion of data parallelism and distributed runtime behind a suitable intermediate representation. We describe a compiler pipeline that facilitates efficient data-parallel processing without imposing runtime engine-bound syntactic or semantic restrictions on the structure of the input programs. We present a series of experiments with two state-of-the-art systems that demonstrate the optimization potential of our approach.

2025, J. Inf. Sci. Eng.

This paper presents a method for fault-tolerant broadcasting in faulty hypercubes using a new metric called local safety. A new concept of the broadcast subcube is introduced, based on which various techniques are proposed to improve the performance of a broadcast algorithm. An unsafe hypercube can be split into a set of maximal safe sub-cubes. We show that if these maximal safe subcubes meet certain requirements given in the paper, broadcasting can still be carried out successfully, and in some cases optimal broadcast is still possible. The sufficient condition for optimal broadcasting is also presented. Limited backtracks are utilized in the process of broadcasting by setting up a partial broadcast tree. Extensive simulation results are presented.

2025, Computers & Mathematics with Applications

Qualitative properties of matrix splitting methods for linear systems with tridiagonal and block tridiagonal Stieltjes-Toeplitz matrices are studied. Two particular splittings, the so-called symmetric tridiagonal splittings and the bidiagonal splittings, are considered, and conditions for qualitative properties like nonnegativity and shape preservation are shown for them. Special attention is paid to their close relation to the well-known splitting techniques like regular and weak regular splitting methods. Extensions to block tridiagonal matrices are given, and their relation to algebraic representations of domain decomposition methods is discussed. The paper is concluded with illustrative numerical experiments.

2025, Journal of Signal Processing Systems

Low-Density Parity-Check (LDPC) codes are very powerful channel coding schemes with a broad range of applications. The existence of low-complexity (i.e., linear-time) iterative message passing decoders with close to optimum error correction performance is one of the main strengths of LDPC codes. It has been shown that the performance of these decoders can be further enhanced if the LDPC codes are extended to higher-order Galois fields, yielding so-called non-binary LDPC codes. However, this performance gain comes at the cost of rapidly increasing decoding complexity. To deal with this increased complexity, we present an efficient implementation of a signed-log domain FFT decoder for non-binary irregular LDPC codes which exploits the inherent massive parallelization capabilities of message passing decoders. We employ Nvidia's Compute Unified Device Architecture (CUDA) to incorporate the available processing power of... Parts of this paper have been presented at the 2013 IEEE Workshop on Signal Processing System under the title "High Speed Decoding of Non-Binary Irregular LDPC Codes Using GPUs" [3]. (M. Beermann, P. Vary)

2025, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '93

We present a solution to the reaching definitions problem for programs with explicit lexically specified parallel constructs, such as cobegin/coend or parallel sections, both with and without explicit synchronization operations, such as Post, Wait or Advance. The reaching definitions information for sequential programs is used to solve many standard optimization problems. In parallel programs, this information can also be used to explicitly direct communication and data ownership. Although work has been done on analyzing parallel programs to detect data races, little work has been done on optimizing such programs. We show how the memory consistency model specified by an explicitly parallel programming language can influence the complexity of the reaching definitions problem. By selecting the "weakest" memory consistency semantics, we can efficiently solve the reaching definitions problem for correct programs,

2025, Parallel Computing

In this paper, we survey loop parallelization algorithms, analyzing the dependence representations they use, the loop transformations they generate, the code generation schemes they require, and their ability to incorporate various optimizing criteria such as maximal parallelism detection, permutable loops detection, minimization of synchronizations, ease of code generation, etc. We complete the discussion by presenting new results related to code generation and loop fusion for a particular class of multi-dimensional schedules, called shifted linear schedules. We demonstrate that algorithms based on such schedules, while generally considered too complex, can indeed lead to simple codes.

2025, 2009 16th International Conference on Systems, Signals and Image Processing

In this work we propose a real-time, parallel, feature-based 2D image morphing and warping algorithm fully implemented on the GPU (Graphics Processing Unit). We applied the proposed algorithm to animate the appearance of a 3D character's face by morphing its texture map. We compared the performance of the proposed algorithm with a sequential version implemented on the CPU, and the results show that the method is promising and scales with the number of features.

2025, 2012 IEEE International Symposium on Information Theory Proceedings

A class of two-bit bit flipping algorithms for decoding low-density parity-check codes over the binary symmetric channel was proposed previously. Initial results showed that decoders which employ a group of these algorithms operating in parallel can offer low error floor decoding for high-speed applications. As the number of two-bit bit flipping algorithms is large, designing such a decoder is not a trivial task. In this paper, we describe a procedure to select collections of algorithms that work well together. This procedure relies on a recursive process which enumerates error configurations that are uncorrectable by a given algorithm. The error configurations uncorrectable by a given algorithm form its trapping set profile. Based on their trapping set profiles, algorithms are selected so that in parallel, they can correct a fixed number of errors with high probability.

2025, IEEE Transactions on Parallel and Distributed Systems

A new approach to broadcast in wormhole-routed two- and three-dimensional torus networks is proposed. The approach extends the concept of dominating sets from graph theory by accounting for the relative distance-insensitivity of the wormhole routing switching strategy and by taking advantage of an all-port communication architecture. The resulting broadcast operation is based on a tree structure that uses multiple levels of extended dominating nodes (EDN). Performance results are presented that confirm the advantage of this method over recursive doubling.

2025, Parallel Computing

This paper presents a global reduction algorithm for wormhole-routed 2D meshes. Well-known reduction algorithms that are optimized for short vectors have complexity O(M log N), where N = n × n is the number of nodes, and M the vector length. Algorithms suitable for long vectors have complexity O(√N + M). Previously known asymptotically optimal algorithms with complexity O(log N + M) incur inherent network contention among constituent messages. The proposed algorithm adapts to the given vector length, resulting in complexities O(M log N) for short vectors, O(log N + M) for medium-sized vectors, and O(√N + M) for sufficiently long vectors. The O(√N + M) version is preferred to the O(log N + M) version for long vectors, due to its small coefficient associated with M, the dominating factor for such vectors. The algorithm is contention-free in a synchronous environment. Under asynchronous execution models, depth contention (contention among message-passing steps) may occur. However, simulation studies show that the effect of depth contention on the actual performance is negligible. © 1997 Elsevier Science B.V.

2025, IEEE Transactions on Pattern Analysis and Machine Intelligence

Algorithmic enhancements are described that enable large computational reduction in mean square-error data clustering. These improvements are incorporated into a parallel data-clustering tool, P-CLUSTER, designed to execute on a network of workstations. Experiments involving the unsupervised segmentation of standard texture images were performed. For some data sets, a 96 percent reduction in computation was achieved.

2025, IEEE Transactions on Parallel and Distributed Systems

2025, 2006 International Conference on Advanced Computing and Communications

This paper presents an object-oriented simulation environment to evaluate and restructure parallel programs for software distributed shared memory (DSM) systems. The simulator provides a toolbox for various network topologies and communication parameters, and models a software DSM system that can support shared memory as well as message passing. Predicting the performance of parallel programs helps compilers to analyze, transform, and generate efficient and highly parallel code. Performance characteristics such as speedup and message passing delays also help in the design of a parallel machine under development, by predicting its performance using benchmark programs. The simulator is designed to study the performance characteristics of both shared memory and message passing parallel programs. Four popular parallel algorithms (Reduction, Radix sort (SPLASH-2), Block matrix multiplication, and Hyper quicksort) have been studied using this simulator, and the results are presented.

2025, Parallel Computing

We propose a parallel and asynchronous approach to give near-optimal solutions to the non-fixed point-to-point connection problem. This problem is NP-hard and has practical applications in multicast routing. The technique adopted to solve the problem is an organization of heuristics that communicate with each other by means of a virtually shared memory. This technique is called A-Teams (for Asynchronous Teams). The virtual shared memory is implemented in a physically distributed memory system. Computational results comparing our approach with a branch-and-cut algorithm are presented.

2025, Parallel Computing

2025, 19th IEEE International Parallel and Distributed Processing Symposium

We live in an era of data explosion that necessitates the discovery of novel out-of-core techniques. The I/O bottleneck has to be dealt with in developing out-of-core methods. The Parallel Disk Model (PDM) has been proposed to alleviate the I/O bottleneck. Sorting is an important problem that has ubiquitous applications. Several asymptotically optimal PDM sorting algorithms are known and now the focus has shifted to developing algorithms for problem sizes of practical interest. In this paper we present several novel algorithms for sorting on the PDM that take only a small number of passes through the data. We also present a generalization of the zero-one principle for sorting. A shuffling lemma is presented as well. These lemmas should be of independent interest for average case analysis of sorting algorithms as well as for the analysis of randomized sorting algorithms.
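For context, the classical zero-one principle (not the generalization presented in the paper) states that a comparator network sorts every input if and only if it sorts every input made of zeros and ones. The sketch below checks that property by brute force; the function names and the example networks are assumptions chosen for illustration.

```python
from itertools import product


def apply_network(network, values):
    """Run a comparator network: each (i, j) pair orders positions i < j."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def sorts_all_zero_one_inputs(network, n):
    """Check the network on all 2**n binary inputs of length n."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


# A standard 5-comparator sorting network on 4 inputs, and the same
# network with its final comparator removed (which no longer sorts).
GOOD_NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
BROKEN_NETWORK = GOOD_NETWORK[:-1]
```

Checking 2^n binary inputs instead of n! permutations is what makes the principle useful when reasoning about oblivious sorting algorithms.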

2025, SIAM Journal on Computing

2025, International Journal of Advance Research and Innovative Ideas in Education

Keyword search is generally used to search large amounts of data. Some queries are difficult to answer because they are ambiguous, and short, uncertain keywords make diversifying the search a further problem. We propose a system to address these problems. The system automatically expands a keyword search based on the different contexts of the XML data. It first selects a feature selection model for designing an effective XML keyword search over a large database, and then automatically diversifies the keyword search. A short and vague keyword query is used to search the XML data, and the feature selection model derives the search candidates for the query. We demonstrate the effectiveness of the system by evaluating it on real as well as synthetic datasets. Greater efficiency is achieved through a pruning algorithm and an implementation on the Hadoop platform.

2025, Theoretical and Computational Fluid Dynamics

Many fluid flows of engineering interest, though very complex in appearance, can be approximated by low-order models governed by a few modes, able to capture the dominant behavior (dynamics) of the system. This feature has fueled the development of various methodologies aimed at extracting dominant coherent structures from the flow. Some of the more general techniques are based on data-driven decompositions, most of which rely on performing a singular value decomposition (SVD) on a formulated snapshot (data) matrix. The amount of experimentally or numerically generated data expands as more detailed experimental measurements and increased computational resources become readily available. Consequently, the data matrix to be processed will consist of far more rows than columns, resulting in a so-called tall-and-skinny (TS) matrix. Ultimately, the SVD of such a TS data matrix can no longer be performed on a single processor, and parallel algorithms are necessary. The present study employs the parallel TSQR algorithm, which is further used as the basis of the underlying parallel SVD. This algorithm is shown to scale well on machines with a large number of processors and, therefore, allows the decomposition of very large data sets. In addition, the simplicity of its implementation and the minimal communication it requires make it suitable for integration in existing numerical solvers and data-decomposition techniques. Examples that demonstrate the capabilities of highly parallel data-decomposition algorithms include transitional processes in compressible boundary layers without and with induced flow separation.

2025

Abstract: This line of research focuses on distributed metaheuristics, especially distributed genetic algorithms, since they make it possible to significantly reduce the time complexity of solving a problem and to improve the quality of the solutions obtained, two extremely important characteristics when solving NP-hard and NP-complete problems. On the one hand, this line of research addresses the configuration and performance evaluation of distributed genetic algorithms run on heterogeneous platforms. In this context a new methodology, called HAPA, has emerged, whose purpose is to provide an efficient and effective implementation of this type of algorithm. On the other hand, the research focuses on a new migration policy for distributed genetic algorithms, with the aim of improving their performance. To this end, a strategy is proposed that centers on the mandatory participation in crossover of the individuals received through...

2025

At present, evolutionary algorithms (EAs) are used to search for solutions to complex problems for which other techniques can be very time-consuming and generally provide only a single optimal solution. A current trend is to use as many computational resources as possible to reach results faster through cooperative work. The inclusion of parallelism, that is, the distribution of tasks over several processors, in the design of evolutionary algorithms has been very important, giving rise to improved search and optimization mechanisms: parallel evolutionary algorithms. This work presents a brief review of parallel evolutionary algorithms. It also carries out a comparative analysis of the behavior of these algorithms against their sequential version, in order to identify their strengths and weaknesses. The software package used follows a unified model developed at the Universidad de Málaga. The algorithms are evaluated by analyzing the results obtained for two well-known optimization problems: OneMax and the binary knapsack problem (Mochila Binaria).

2025

A network of identical processors that work synchronously at discrete steps is given. At each step every processor sends messages only to a given subset of its neighboring processors and receives only from the remaining neighbors. The computation starts with one distinguished processor in a particular starting state and all other processors in a quiescent state. The problem is the following: to set all the processors in a given state for the first time and at the very same instant. This problem is known as the firing squad synchronization problem and was introduced in [9]. In this paper solutions are presented that synchronize processors communicating on one-way links arranged in a ring or in a square with rows and columns that are rings. In particular, we provide optimal algorithms to synchronize both of the networks. In addition, compositions of solutions are shown, and solutions which synchronize at a time $f(n)$ are given for $f(n)$ equal to $n^2$, $n \log n$, and $2^n$.

2025, III Congreso Argentino de Ciencias de la Computación

2025

An “any time” algorithm has to deliver its answer at whatever moment it is needed; it has to provide an answer or a solution within a certain (often small) amount of time, which is not known in advance. Thus, the algorithm must always have “an (approximate) answer at hand”, which it may keep refining as time passes, until the demand for it arrives. Some ways to create any time (ant) algorithms from non-ant ones are presented. A collection of ant algorithms is given. Several ways to form new ant algorithms out of old ones are described. A creative way is also shown to use the idle time (the time during which the ant algorithm is doing nothing, waiting to be called) to improve the values that will be required in a hurry.
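The idea of always having an approximate answer at hand maps naturally onto a generator that yields a refined estimate at every step. The sketch below is an assumption-laden illustration rather than one of the algorithms from the paper: it uses bisection for a square root, and the hypothetical helper `answer_after` stands in for a demand that arrives at an unknown time.

```python
def anytime_sqrt(x):
    """Yield successively tighter estimates of sqrt(x) for x >= 0 by bisection."""
    lo, hi = 0.0, max(1.0, x)
    while True:
        mid = (lo + hi) / 2
        yield mid  # the current answer at hand
        if mid * mid > x:
            hi = mid
        else:
            lo = mid


def answer_after(algorithm, steps):
    """Simulate a demand arriving after `steps` refinements (hypothetical helper)."""
    best = None
    for _, best in zip(range(steps), algorithm):
        pass
    return best
```

Interrupting earlier simply returns a coarser estimate; the caller never has to wait for the computation to finish.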

2025, AMW

Hypertree decompositions, generalized hypertree decompositions, and fractional hypertree decompositions are hypergraph decomposition methods successfully used for answering conjunctive queries and for the solution of constraint satisfaction problems. In this work, we present new intractability and tractability results for the problem of recognizing if a given hypergraph has a generalized or fractional hypertree decomposition of low width.

2025, arXiv (Cornell University)

We introduce a variation of the scheduling with precedence constraints problem that has applications to molecular folding and production management. We are given a bipartite graph H = (B, S). Vertices in B are thought of as goods or services that must be bought to produce items in S that are to be sold. An edge from j ∈ S to i ∈ B indicates that the production of j requires the purchase of i. Each vertex in B has a cost, and each vertex in S results in some gain. The goal is to obtain an ordering of B ∪ S that respects the precedence constraints and maximizes the minimal net profit encountered as the vertices are processed. We call this optimal value the budget or capital investment required for the bipartite graph, and refer to our problem as the bipartite graph ordering problem. The problem is equivalent to a version of an NP-complete molecular folding problem that has been studied recently. Work on the molecular folding problem has focused on heuristic algorithms and exponential-time exact algorithms for the unweighted problem where costs are ±1 and when restricted to graphs arising from RNA folding. The present work seeks exact algorithms for solving the bipartite ordering problem. We demonstrate an algorithm that computes the optimal ordering in time $O^*(2^n)$, where n is the number of vertices in the input bipartite graph. We give non-trivial polynomial-time algorithms for finding the optimal solutions for bipartite permutation graphs, trivially perfect bipartite graphs, and co-bipartite graphs. We introduce a general strategy that can be used to find an optimal ordering in polynomial time for bipartite graphs that satisfy certain properties. One of our ultimate goals is to completely characterize the classes of graphs for which the problem can be solved exactly in polynomial time. Job Scheduling with Precedence Constraints: The setting of job scheduling with precedence constraints is a natural one that has been much studied. A number of variations
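To make the objective concrete, the sketch below evaluates every ordering of a tiny instance by brute force. It is not the algorithm from the paper: it runs in factorial time, and the function name, the dictionary-based input format, and the choice to count the empty prefix (net profit 0) in the minimum are all assumptions made for illustration. Vertex names in B and S are assumed to be distinct.

```python
from itertools import permutations


def best_ordering(costs, gains, requires):
    """Brute-force search for an ordering maximizing the minimal net profit.

    costs:    dict mapping each vertex of B to its purchase cost
    gains:    dict mapping each vertex of S to its gain
    requires: dict mapping each vertex of S to the set of B vertices it needs
    """
    vertices = list(costs) + list(gains)
    best_value, best_order = None, None
    for order in permutations(vertices):
        position = {v: k for k, v in enumerate(order)}
        # Skip orderings that sell an item before all its inputs are bought.
        if any(position[i] > position[j] for j in gains for i in requires.get(j, ())):
            continue
        running, lowest = 0, 0
        for v in order:
            running += gains[v] if v in gains else -costs[v]
            lowest = min(lowest, running)
        if best_value is None or lowest > best_value:
            best_value, best_order = lowest, order
    return best_value, best_order
```

For example, with `costs = {"b1": 3, "b2": 2}`, `gains = {"s1": 4, "s2": 1}`, and `requires = {"s1": {"b1"}, "s2": {"b1", "b2"}}`, buying `b1` and selling `s1` before buying `b2` keeps the running net profit from dropping below -3, so the capital required is 3.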

2025, ArXiv

2025

In this paper we describe the parallelization of a data structure used to perform multimedia web searches. Multimedia web search engines have not been deeply studied and remain a challenging issue. The data structure selected to index the queries is the Spatial Approximation Tree, where the complexity measure is the number of distances computed to retrieve the objects close enough to the query. We present a parallel method for load balancing the work performed by the processors. The method adapts itself to changes in the workload produced by user queries. Empirical results with different kinds of databases show efficient performance on a real cluster of PCs. The algorithm is designed with the bulk-synchronous model of parallel computing.

2025, Journal of emerging technologies and innovative research

Parallel computing is a form of computation in which many instructions of a program run concurrently. To accomplish this, a program has to be split into independent parts so that each processor can execute its part simultaneously with the other processors. Algorithmic Species is an algorithm classification that captures low-level algorithmic details and represents them using five easy-to-understand array access patterns. To address the challenge of parallel programming, we propose Bones, a source-to-source compiler based on algorithmic skeletons and this new algorithm classification. To evaluate the applicability of algorithmic species and to validate A-Darwin and Bones, we test against the HPCC benchmark. In this work, the existing species of the algorithm classification are analyzed, and some of the computationally intensive programs from the HPCC benchmark have been tested using existing tools such as Bones compi...

2025

Affiliation: Schab, Esteban Alejandro. Universidad Tecnologica Nacional. Facultad Regional Concepcion del Uruguay; Argentina.

2025, Graphical Models /graphical Models and Image Processing /computer Vision, Graphics, and Image Processing

We present an efficient method for visual simulations of shock phenomena in compressible, inviscid fluids. Our algorithm is derived from one class of the finite volume method especially designed for capturing shock propagation, but offers improved efficiency through physically-based simplification and adaptation for graphical rendering. Our technique is well suited for parallel implementation on multicore architectures and is also capable of handling complex, bidirectional object-shock interactions stably and robustly. We describe its applications to various visual effects, including explosion, sonic booms and turbulent flows. Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism G.1.8 [Mathematics of Computing]: Partial Differential Equations Physically correct methods for shockwave modeling focus less on conventional metrics of accuracy (such as order of convergence) and emphasize the ability to propagate discontinuities stably and with minimal diffusion. Specifically, techniques based on the finite volume method (FVM) have been developed that handle discontinuities well and al-

2025

The central theme of the research line presented is parallel and distributed processing for HPC (fundamentals and applications). Topics of interest are the construction, evaluation, and optimization of solutions with concurrent, parallel, and distributed algorithms on different software platforms and multiprocessor architectures (multicore, multicore clusters, cloud, and accelerators such as GPU, FPGA, and Xeon Phi); parallel programming languages and paradigms (pure and hybrid); models for representing parallel applications; mapping and scheduling algorithms; load balancing; metrics for evaluating complexity and performance (speedup, efficiency, scalability, energy consumption); and the construction of environments for teaching concurrent and parallel programming. The concepts are to be applied to compute-intensive numerical and non-numerical problems and/or problems over large data volumes (search, simulation, n-body, images, big data,...

2025

The central theme of the research line presented is parallel and distributed processing for HPC (fundamentals and applications). Topics of interest are the construction, evaluation, and optimization of solutions with concurrent, parallel, and distributed algorithms on different software platforms and multiprocessor architectures (multicore, multicore clusters, cloud, and accelerators such as GPU, FPGA, and Xeon Phi); parallel programming languages and paradigms (pure and hybrid); models for representing parallel applications; mapping and scheduling algorithms; load balancing; metrics for evaluating complexity and performance (speedup, efficiency, scalability, energy consumption); and the construction of environments for teaching concurrent and parallel programming.

2025

The focus of this R&D line is the study of current trends in the areas of parallel architectures and algorithms. Its central topics include: many-core architectures (GPU, MIC processors), hybrid architectures (different combinations of multicores and GPUs), and heterogeneous architectures; HPC in cloud computing, especially for Big Data applications; languages and data structures for new parallel computing architectures; the development of parallel algorithms on new architectures and the evaluation of their energy and computational performance; and the use of hardware counters, in particular for run-time decision making.

2025

CONTEXT: This line of research is part of Projects 11/F010 “Arquitecturas multiprocesador distribuidas. Modelos, Software de Base y Aplicaciones” (distributed multiprocessor architectures: models, system software, and applications) and 11/F011 “Procesamiento paralelo y distribuido. Fundamentos y aplicaciones en Sistemas Inteligentes y Tratamiento de imágenes y video” (parallel and distributed processing: fundamentals and applications in intelligent systems and image and video processing) of III-LIDI, accredited by the Ministry of Education, as well as of the projects “Eficiencia energética en Sistemas Paralelos” (energy efficiency in parallel systems) and “Algoritmos Paralelos utilizando GPGPUs. Análisis de rendimiento” (parallel algorithms using GPGPUs: performance analysis), funded by the Facultad de Informática of UNLP. There is cooperation on this topic with several universities in Argentina, and work is under way with universities in Latin America and Europe on projects funded by CyTED, AECID, and the OEI (Organización de Estados Iberoamericanos). The group takes part in initiatives such as the IberoTIC program for the exchange of professors and doctoral students in the area of computer science. In addition, there is funding from Telefónica de Argentina for scholarships (“Becas de gr...

2025

The focus of this R&D line is the study of current trends in the areas of parallel architectures and algorithms. Its central topics include: many-core architectures (GPU, MIC processors), FPGAs, hybrid architectures (different combinations of multicores and accelerators), and asymmetric architectures; cloud computing for HPC (especially for Big Data applications) and distributed real-time systems (cloud robotics); and the development of parallel algorithms on new architectures and the evaluation of their energy and computational performance.

2025

The central theme of this research line is parallel and distributed processing for HPC (fundamentals and applications). Topics of interest are the construction, evaluation, and optimization of solutions on different software platforms and multiprocessor architectures (multicore, clusters, cloud, accelerators, and low-cost boards); parallel programming languages and paradigms (pure and hybrid); models for representing parallel applications; mapping and scheduling algorithms; load balancing; metrics for evaluating complexity and computational and energy performance; and the construction of environments for teaching concurrent and parallel programming. The concepts are to be applied to compute-intensive numerical and non-numerical problems and/or problems over large data volumes (search, simulation, n-body, big data, pattern recognition, bioinformatics, etc.), in order to obtain high-performance solutions. En la dirección d...

2025

The central theme of this R&D line is the study of parallel and distributed processing for high-performance computing, with regard to both fundamentals and applications. It includes software problems...

2025, Proceedings Sixth International Parallel Processing Symposium

We develop reconfigurable mesh (RMESH) algorithms for window broadcasting, data shifts, and consecutive sum. These are then used to develop efficient algorithms to compute the histogram of an image and to perform histogram modification. The histogram of an $N \times N$ image is computed by an $N \times N$ RMESH in $O(\sqrt{B} \log_{\sqrt{B}}(N/\sqrt{B}))$ time for $B < N$, $O(\sqrt{N})$ for $B = N$, and $O(\sqrt{B})$ for $B > N$, where $B$ is the number of gray-scale values. Histogram modification is done in $O(\sqrt{N})$ time by an $N \times N$ RMESH.