High Performance Computing Research Papers

With the increasing functionality and complexity of distributed systems, resource failures are inevitable. While numerous models and algorithms for dealing with failures exist, the lack of public trace data sets and tools has prevented meaningful comparisons. To facilitate the design, validation, and comparison of fault-tolerant models and algorithms, we have created the Failure Trace Archive (FTA) as an online public repository of availability traces taken from diverse parallel and distributed systems. Our main contributions in this study are the following. First, we describe the design of the archive, in particular the rationale of the standard FTA format, and the design of a toolbox that facilitates automated analysis of trace data sets. Second, applying the toolbox, we present a uniform comparative analysis with statistics and models of failures in nine distributed systems. Third, we show how different interpretations of these data sets can result in different conclusions. This emphasizes the critical need for the public availability of trace data and methods for their analysis.
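
As a rough illustration of the kind of per-node statistics such a toolbox can derive, the sketch below computes MTTR, MTBF, and availability from a list of unavailability intervals. The simple (start, end) trace format and the function name are assumptions made for illustration, not the actual FTA format or toolbox API.

```python
# Minimal sketch of per-node availability statistics of the kind an
# availability-trace toolbox might report. The trace format here
# (a list of (failure_start, failure_end) timestamps per node) is a
# simplifying assumption, not the actual FTA format.

def availability_stats(failure_intervals, trace_start, trace_end):
    """Return MTTR, MTBF and availability for one node's trace."""
    total_downtime = sum(end - start for start, end in failure_intervals)
    total_time = trace_end - trace_start
    n_failures = len(failure_intervals)

    mttr = total_downtime / n_failures if n_failures else 0.0
    mtbf = (total_time - total_downtime) / n_failures if n_failures else float("inf")
    availability = 1.0 - total_downtime / total_time
    return mttr, mtbf, availability

# Example: two outages in a 1000-hour observation window.
print(availability_stats([(100, 110), (500, 520)], 0, 1000))
# -> (15.0, 485.0, 0.97)
```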

As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction and data cache performance for virtually indexed caches by mapping code and data with temporal locality to different cache blocks. In this

Porting well-known computer vision algorithms to low-power, high-performance computing devices such as SIMD linear processor arrays can be a challenging task. One especially useful such algorithm is the color-based particle filter, which has been applied successfully by many research groups to the problem of tracking non-rigid objects. In this paper, we propose an implementation of the color-based particle filter suitable for SIMD processors. The main focus of our work is on the parallel computation of the particle weights. This step is the major bottleneck of standard implementations of the color-based particle filter, since it requires knowledge of the histograms of the regions surrounding each hypothesized target position. We expect this approach to run faster on a SIMD processor than an implementation on a standard desktop computer, even at much lower clock speeds.
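
As a rough sketch of the weighting step described above, the code below builds a colour histogram around each hypothesized position and scores it against a reference histogram with a Bhattacharyya-based likelihood. This is a generic NumPy formulation with assumed window size, bin count, and parameter names, not the paper's SIMD implementation.

```python
import numpy as np

# Sketch of the particle-weighting step of a colour-based particle filter:
# each particle hypothesises a target position, a colour histogram is built
# from the surrounding region, and the weight is derived from its
# Bhattacharyya similarity to a reference histogram.

def region_histogram(image, cx, cy, half_w, half_h, bins=8):
    """Normalised colour histogram of the region centred at (cx, cy)."""
    h, w, _ = image.shape
    x0, x1 = max(cx - half_w, 0), min(cx + half_w, w)
    y0, y1 = max(cy - half_h, 0), min(cy + half_h, h)
    patch = image[y0:y1, x0:x1].reshape(-1, 3)
    hist, _ = np.histogramdd(patch, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / max(hist.sum(), 1)

def particle_weights(image, particles, reference_hist, sigma=0.1):
    """Weight each particle (x, y) by colour similarity to the reference."""
    weights = np.empty(len(particles))
    for i, (cx, cy) in enumerate(particles):
        hist = region_histogram(image, cx, cy, half_w=16, half_h=16)
        bhattacharyya = np.sum(np.sqrt(hist * reference_hist))
        distance = np.sqrt(max(1.0 - bhattacharyya, 0.0))
        weights[i] = np.exp(-distance**2 / (2 * sigma**2))
    return weights / weights.sum()

# Usage (hypothetical names): weights = particle_weights(frame, particles, ref_hist)
# where `frame` is an HxWx3 uint8 image and `particles` a list of (x, y) hypotheses.
```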

In object-oriented (OO) languages, the ability to encapsulate software concerns of the dominant decomposition in objects is the key to achieving high modularity and low complexity in large-scale designs. However, distributed-memory parallelism tends to break modularity and encapsulation of concerns in OO languages, since a parallel computation cannot be encapsulated in an individual object. To reconcile object orientation and distributed-memory parallelism, we propose PObC++, an OO language that introduces innovative ideas for object-oriented parallel programming (OOPP).

The study of material culture generated by military engagements has created an emergent sub-discipline of archaeological studies centred on battlefields. This approach has developed a particular and sophisticated methodology that is able to deal with the fact that archaeologists will often not find either structures or a useful stratigraphical record on the site, as the material remains of the battle

We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication, optimized for implementation on high-end FPGAs. Matrix multiplication forms the kernel in many important tile-based BLAS algorithms, making it an excellent candidate for acceleration. The designs, both based on the rank-1 update scheme, can handle arbitrary matrix sizes, and are able to sustain their peak performance
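
To illustrate the rank-1 update scheme in software terms, the sketch below accumulates C = A × B as a sum of outer products of A's columns with B's rows. It only shows the dataflow the FPGA designs exploit, not the designs themselves.

```python
import numpy as np

# Software sketch of the rank-1 update scheme: C = A @ B is accumulated as a
# sum of outer products of A's k-th column with B's k-th row. On an FPGA this
# maps naturally to streaming one column and one row per step into a grid of
# multiply-accumulate units; here it just illustrates the dataflow.

def matmul_rank1(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for t in range(k):
        C += np.outer(A[:, t], B[t, :])   # one rank-1 update per step
    return C

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
assert np.allclose(matmul_rank1(A, B), A @ B)
```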

Data Stream Processing (DaSP) is a recent and highly active research field, applied in various real-world scenarios. Unlike traditional applications, input data is seen as transient continuous streams that must be processed “on the fly”, with critical requirements on throughput, latency and memory occupancy. A parallel solution is often advocated, but the problem of designing and implementing high-throughput, low-latency DaSP applications is complex in itself, and is further complicated by the presence of multiple streams characterized by high volume, high velocity and high variability. Moreover, parallel DaSP applications must be able to adapt themselves to data dynamics in order to satisfy desired QoS levels. The aim of our work is to study these problems in an integrated way, providing programmers with a methodological framework for the parallelization of DaSP applications.
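
As a minimal illustration of such on-the-fly processing, the sketch below runs a farm of worker processes that pull tuples from an input queue and emit results downstream. The queue-based structure, worker count, and function names are illustrative assumptions, not the methodological framework proposed here.

```python
import multiprocessing as mp

# Minimal sketch of an "on the fly" parallel stream stage: a farm of workers
# pulls tuples from an input queue, applies a per-tuple function and emits
# results downstream. More workers raise throughput while the queues add
# latency; this only illustrates that basic trade-off.

def square(x):
    return x * x

def worker(in_q, out_q, fn):
    while True:
        item = in_q.get()
        if item is None:            # poison pill ends the worker
            break
        out_q.put(fn(item))

def run_farm(items, fn, n_workers=4):
    in_q, out_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(in_q, out_q, fn)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for item in items:
        in_q.put(item)
    for _ in procs:
        in_q.put(None)
    results = [out_q.get() for _ in items]   # results may arrive out of order
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_farm(range(8), square))
```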

Digital image processing techniques tend to be extremely computationally expensive, which leads researchers in the field to look for solutions with optimized performance. Accordingly, in this work an optimized hardware implementation (as opposed to the more common software-based solutions) was developed for the Sharpen and Smooth image pre-processing filters, whose purpose is to improve image quality before the image is passed to the Prewitt edge-detection filter.
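
A software reference for the filters involved might look like the sketch below, assuming the common 3×3 kernels for Smooth (box blur), Sharpen, and Prewitt; the actual coefficients of the hardware implementation are not given here, so these are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

# Software reference for the filters discussed above, using common 3x3
# kernels. Smooth = box blur, Sharpen = centre-boosted Laplacian,
# Prewitt = horizontal/vertical gradient magnitude for edge detection.

SMOOTH  = np.full((3, 3), 1 / 9)
SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)
PREWITT_X = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T

def preprocess_and_detect_edges(gray):
    """Smooth, sharpen, then Prewitt edge magnitude on a grayscale image."""
    img = convolve(gray.astype(float), SMOOTH)
    img = convolve(img, SHARPEN)
    gx = convolve(img, PREWITT_X)
    gy = convolve(img, PREWITT_Y)
    return np.hypot(gx, gy)

edges = preprocess_and_detect_edges(np.random.randint(0, 256, (64, 64)))
```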

This contribution provides a definition of High Performance Computing, and then considers the importance of the computing paradigm for the humanities. It identifies potential applications of HPC for the humanities, and factors in Canada that currently constrain effective humanist exploitation of HPC.

With increasing interest among mainstream users to run HPC applications, Infrastructure-as-a-Service (IaaS) cloud computing platforms represent a viable alternative to the acquisition and maintenance of expensive hardware, often out of the financial reach of such users. One of the critical needs of HPC applications is efficient, scalable and persistent storage. Unfortunately, the storage options proposed by cloud providers are not standardized and typically use a different access model. In this context, the local disks on the compute nodes can be used to save large data sets, such as the data generated by Checkpoint-Restart (CR). This local storage offers high throughput and scalability, but it needs to be combined with persistency techniques such as block replication or erasure codes. One of the main challenges such techniques face is to minimize the performance overhead and I/O resource utilization (i.e., storage space and bandwidth), while at the same time guar...
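
As a minimal illustration of block replication as a persistency technique, the sketch below copies a checkpoint file to the local storage of a few peer nodes. The mount-point paths, function name, and replication factor are hypothetical placeholders.

```python
import os
import shutil

# Minimal sketch of block replication as a persistency technique for
# checkpoint data stored on local disks: each checkpoint file is copied to
# the (locally mounted) storage of a few peer nodes. Erasure coding would
# trade the extra storage overhead shown here for additional encoding work.

def replicate_checkpoint(checkpoint_path, peer_dirs, replication_factor=2):
    """Copy a checkpoint file to `replication_factor` peer directories."""
    replicas = []
    for peer in peer_dirs[:replication_factor]:
        os.makedirs(peer, exist_ok=True)
        dest = os.path.join(peer, os.path.basename(checkpoint_path))
        shutil.copyfile(checkpoint_path, dest)
        replicas.append(dest)
    return replicas

# Usage (hypothetical local mount points of two peer nodes):
# replicate_checkpoint("ckpt_000042.dat", ["/mnt/peer1/ckpt", "/mnt/peer2/ckpt"])
```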

The purpose of this paper is to investigate the e-commerce credibility factors affecting the perception of users in Saudi Arabia and, moreover, to investigate whether the variation of credibility factors in Saudi Arabian e-commerce websites influences users' performance. Website credibility, which refers to the believability of a website and its content, plays an important role in consumers' successful online shopping experience and satisfaction. This investigation is conducted by employing two credibility evaluation methods: heuristic evaluation and performance measurement. This study adopts Fogg's 10 Stanford credibility guidelines as a starting point for the heuristic evaluation. In the performance measurement method, two measurements are used: the amount of time needed to finish the task and the total number of clicks taken to finish the task. A frequency analysis of the comments and a one-way ANOVA test are used to establish the results. Three e-commerce websites in Saudi Arabia are selected. The findings show that Fogg's 10 Stanford credibility guidelines can be implemented in the Saudi Arabian e-commerce context with minor modifications and expansions by adding reputation, endorsement, security, and service diversity guidelines. Another important finding is that professional website design plays a vital role in users' first impression of websites, while usability is the most important of the investigated credibility factors used to evaluate the credibility of an e-commerce website. Lastly, the results of this study indicate a relationship between the e-commerce credibility level and users' performance. This paper contributes to the literature by providing a set of credibility guidelines associated with specific criteria, which can be assessed to improve the future of e-commerce in Saudi Arabia.
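
As a sketch of the one-way ANOVA step of the performance-measurement method, the snippet below compares task-completion times across three sites using SciPy. The sample values are invented for illustration and are not the study's data.

```python
from scipy import stats

# Sketch of the one-way ANOVA step of the performance-measurement method:
# task-completion times (seconds) collected on three e-commerce sites are
# compared to test whether mean performance differs. The numbers below are
# made up for illustration only.

site_a = [95, 110, 102, 120, 98]
site_b = [130, 142, 128, 151, 139]
site_c = [118, 105, 125, 116, 122]

f_stat, p_value = stats.f_oneway(site_a, site_b, site_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) indicates the mean completion times differ
# across the sites; the same test can be repeated for number of clicks.
```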

Finding a longest common subsequence (LCS) of two sequences is an important problem in several fields, such as biology, medicine and linguistics. The LCS algorithm is usually implemented with dynamic programming, where a score matrix (C) is filled to determine the length of the LCS. Parallelization of this task usually follows the wavefront pattern. When using popular APIs such as OpenMP, the wavefront is implemented by processing the elements of each diagonal of C in parallel. This approach restricts parallelism and forces threads to wait on a barrier at the end of the computation of each diagonal. In this paper we propose a dataflow parallel version of LCS. Multiple tasks are created, each one responsible for processing a block of C, and task execution is fired when all data dependencies are satisfied. Comparison with an OpenMP implementation showed performance gains of up to 23%, which suggests that dataflow execution can be an interesting alternative for wavefront applications.
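
For reference, the sketch below shows the score-matrix recurrence and the block-level dependencies a dataflow runtime can exploit; it is sequential Python for clarity and is not the paper's implementation.

```python
# Sketch of the LCS score-matrix recurrence and of the block-level data
# dependencies a dataflow runtime can exploit. In the dataflow version each
# block becomes a task that fires as soon as its left and top neighbour
# blocks are finished, instead of all threads synchronising on every
# anti-diagonal.

def lcs_length(x, y):
    n, m = len(x), len(y)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])
    return C[n][m]

def block_dependencies(n_blocks_i, n_blocks_j):
    """Each block (bi, bj) depends on its top and left neighbour blocks."""
    deps = {}
    for bi in range(n_blocks_i):
        for bj in range(n_blocks_j):
            deps[(bi, bj)] = [b for b in ((bi - 1, bj), (bi, bj - 1))
                              if b[0] >= 0 and b[1] >= 0]
    return deps

assert lcs_length("dataflow", "wavefront") == 3
```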

The leading edge of the Internet of Things (IoT) is gradually making everyday items available on the Internet, but data processing is not scaling well enough to meet the requirements of a centralized cloud environment. One of the main reasons is that deadline-oriented applications, such as health monitoring, flight control, and command-and-control systems, require minimal latency and response time, yet large amounts of data (Big Data) must be transmitted to a centralized database and then from the database to an IoT application or end device, which leads to performance degradation. Fog computing, in which the cloud is extended to the edge of the network, is an innovative solution for reducing delay (latency), resource contention and network congestion. In this paper, we propose a fog-assisted information model that delivers healthcare as a cloud service using IoT devices. The proposed model efficiently manages heart-patient data arriving through user requests. The iFogSim toolkit is used to analyse the performance of the proposed model in a fog-enabled cloud environment.

This paper covers the main subjects discussed during the very interesting Change Management lectures by Dr. Foster. The course consisted of several lectures in which we discussed the theory in the first part, while in the second part we benefited from a guest lecture by a practitioner from the field. The variation of the lectures, in combination with the presence of professionals from the field, gave us the perfect example of implementing the theory in practice. We were familiarized with different theories and their possible outcomes. Each guest lecturer gave us their perception of change and change management and how they implemented change in their organizations. This made it possible for us to spot the similarities, differences and best practices across these organizations.

As process-scaling trends make the memory system an even more critical bottleneck, the importance of latency-hiding techniques such as prefetching grows further. However, naive use of prefetching can harm performance and energy efficiency, and hence several factors and parameters need to be taken into account to fully realize its potential. In this paper, we survey several recent techniques that aim to improve the implementation and effectiveness of prefetching. We characterize the techniques along several parameters to highlight their similarities and differences. The aim of this survey is to give researchers insight into the working of prefetching techniques and to spark future work on improving the performance advantages of prefetching even further.

In cellular networks, the number of users increases exponentially. As a result of this exponential growth, the system becomes overloaded. Besides this, unacceptable delays and high computational cost create a strong need for efficient location management techniques. Motivated by this, a novel user-profile-based (UPB) scheme is proposed in this paper to track the location of mobile users.

With the ubiquity and pervasiveness of mobile computing, together with the increasing number of social networks, end-users have learned to live with and share all kinds of information about themselves. As an example, Facebook reports that it currently has 500 million active users, 200 million of whom access its services on mobile systems; moreover, users that access Facebook through mobile applications are twice as active as non-mobile users, and it is used by 200 mobile operators in 60 countries [1]. More specific mobile platforms, such as Foursquare, which unlike Facebook collects only location information, report 6.5 million users worldwide and also have a mobile presence (both a web application and iPhone/Android applications) [2]. Context-aware architectures intend to exploit this increasing number of context information sources and provide richer, targeted services to end-users, while also taking into account arising privacy issues. While multiple context management platform...

Real-time systems are systems bound by time constraints which, if not met, lead to catastrophic results. In the present technological era, more and more real-time systems are making their way even into the commercial sphere, in the form of satellites, autopilot systems, e-commerce applications and many more. This increasing utility of real-time systems requires them to be more and more fault-tolerant. This paper gives a brief account of ‘Information Redundancy’ as a measure of fault tolerance in real-time systems and concludes with future perspectives in the field.
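
As a minimal example of information redundancy, the sketch below encodes four data bits with a Hamming(7,4) code so that a single flipped bit can be located and corrected; it is a generic textbook construction, not a scheme taken from the paper.

```python
# Minimal sketch of information redundancy: a Hamming(7,4) code adds three
# parity bits to four data bits so that any single bit flip can be located
# and corrected -- the kind of redundant information a fault-tolerant
# real-time system can use to mask transient faults in stored or
# transmitted data.

def hamming74_encode(d):             # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    # codeword layout: positions 1..7 = p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(c):
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # gives the 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1         # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[5] ^= 1                         # inject a single-bit fault
assert hamming74_decode(code) == data
```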

A grid computing environment provides a type of distributed computation that is unique because it is not centrally managed and it has the capability to connect heterogeneous resources. A grid system provides location-independent access to the resources and services of geographically distributed machines. An essential ingredient for supporting location-independent computations is the ability to discover resources that have been requested by the users. Because the number of grid users can increase and the grid environment is continuously changing, a scheduler that can discover decentralized resources is needed. Grid resource scheduling is considered to be a complicated, NP-hard problem because of the distribution of resources, the changing conditions of resources, and the unreliability of infrastructure communication. Various artificial intelligence algorithms have been proposed for scheduling tasks in a computational grid. This paper uses the imperialist competition algorithm (ICA) to address the problem of independent task scheduling in a grid environment, with the aim of reducing the makespan. Experimental results compare ICA with other algorithms and illustrate that ICA finds a shorter makespan relative to the others. Moreover, it converges quickly, finding its optimum solution in less time than the other algorithms.
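
Whatever metaheuristic is used, each candidate task-to-resource assignment is scored by its makespan. The sketch below shows such a fitness function under an assumed expected-time-to-compute cost model (task length divided by resource speed), together with a simple greedy baseline; it is illustrative only and is not the paper's ICA implementation.

```python
# Fitness function for independent-task scheduling: the makespan of a
# candidate assignment under a simple expected-time-to-compute model.
# The task lengths, speeds and the greedy baseline are illustrative.

def makespan(assignment, task_lengths, resource_speeds):
    """assignment[i] = index of the resource that runs task i."""
    finish = [0.0] * len(resource_speeds)
    for task, res in enumerate(assignment):
        finish[res] += task_lengths[task] / resource_speeds[res]
    return max(finish)

def greedy_min_completion(task_lengths, resource_speeds):
    """Simple baseline: put each task on the resource that finishes it first."""
    finish = [0.0] * len(resource_speeds)
    assignment = []
    for length in task_lengths:
        res = min(range(len(resource_speeds)),
                  key=lambda r: finish[r] + length / resource_speeds[r])
        finish[res] += length / resource_speeds[res]
        assignment.append(res)
    return assignment

tasks = [40, 25, 60, 10, 35]
speeds = [1.0, 2.0]
plan = greedy_min_completion(tasks, speeds)
print(plan, makespan(plan, tasks, speeds))   # -> [1, 0, 1, 0, 1] 67.5
```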

Parallel computing has become one of the most important issues today, but because of the high cost of supercomputers it is not accessible to everyone. Clustering is the only technique that provides parallel computing, scalability and high availability at low cost. A collection of personal computers (PCs) forms a cluster that provides parallel execution. High Performance Computing (HPC) is the field of computer science that focuses on building cluster computers, supercomputers and parallel algorithms. At present, cluster techniques are applied in numerous areas, for instance scientific calculations, weather forecasting, bioinformatics, signal processing, petroleum exploration and so on. This paper compares two different clusters to evaluate overall performance in terms of execution time. One cluster is made up of Dell Core 2 Duo systems and the second of HP Core 2 Duo systems, each with two nodes having almost the same configuration. To analyze the performance of the two clusters, we executed two parallel programs, pi calculation and quicksort, with different problem sizes. We observed that for small problem sizes the Dell cluster performed better than the HP cluster, while for large problem sizes the HP cluster came out ahead.
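
As an indication of the kind of pi benchmark typically used, the sketch below splits the midpoint-rule integration of 4/(1+x²) across worker processes. The actual programs run on the Dell and HP clusters are not detailed in the abstract, so this is only the standard formulation.

```python
import multiprocessing as mp

# Minimal sketch of a parallel pi benchmark: the integral of 4/(1+x^2) over
# [0,1] is split into chunks that worker processes evaluate independently.

def partial_pi(args):
    start, end, step = args
    return sum(4.0 / (1.0 + ((i + 0.5) * step) ** 2) for i in range(start, end)) * step

def parallel_pi(n=1_000_000, workers=4):
    step = 1.0 / n
    chunk = n // workers
    ranges = [(w * chunk, n if w == workers - 1 else (w + 1) * chunk, step)
              for w in range(workers)]
    with mp.Pool(workers) as pool:
        return sum(pool.map(partial_pi, ranges))

if __name__ == "__main__":
    print(parallel_pi())   # ~3.141592653589...
```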