Konstantinos Masselos | University of the Peloponnese
Papers by Konstantinos Masselos
Data corresponds to quantitative (raw) effort assessments/predictions during the maintenance process of a sample of 1000 possible instances of the general selection problem between Visitor- and Inheritance-Based Implementations over the Composite design pattern (CIBI vs CVP). Related values correspond to deterministic effort predictions returned by the equations of the Formal Model (FM) and stochastic effort predictions returned by the Simulation Model (SM), for different 'lambda' values (number of scenario applications) and for all simulation states of the source study entitled "Modeling Software Evolution to Support Early Decision-making among Design Alternatives towards Maintainability"
ACM Transactions on Embedded Computing Systems, 2020
This article presents a MATLAB-to-C compiler that exploits custom instructions present in state-of-the-art processor architectures and supports semi-automatic vectorization. A parameterized processor model is used to describe the target instruction set architecture to achieve user-friendly retargetability. Custom instructions are represented via specialized intrinsic functions in the generated code, which can then be used as input to any C/C++ compiler supporting the target processor. In addition, the compiler supports the generation of data parallel/vectorized code through the introduction of data packing/unpacking statements. The compiler has been used for code generation targeting ARM and x86 architectures for several benchmarks. The vectorized code generated by the compiler achieves an average speedup of 4.1× and 2.7× for packed fixed and floating point data, respectively, compared to scalarized code for ARM architecture and an average speedup of 3.1× and 1.5× for packed fixed a...
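To make the packing/unpacking and intrinsic-based representation concrete, the sketch below shows the style of C code such a compiler might emit for an element-wise addition: a scalar version and a vectorized version that packs operands, calls an intrinsic, and unpacks the result. The packed type and the intrinsic names (vec4f, target_pack_f32, target_vadd_f32, target_unpack_f32) are hypothetical placeholders with portable fallback definitions, not the compiler's actual output.

```cpp
/* Illustrative sketch only: hypothetical packed type and intrinsic names with
 * portable fallbacks; a real backend compiler would map such calls to SIMD
 * instructions of the target processor. */
#include <stddef.h>
#include <stdio.h>

typedef struct { float lane[4]; } vec4f;           /* stand-in for a 4-wide register */

static vec4f target_pack_f32(const float *p) {     /* load 4 contiguous floats  */
    vec4f v; for (int k = 0; k < 4; ++k) v.lane[k] = p[k]; return v;
}
static void target_unpack_f32(float *p, vec4f v) { /* store 4 contiguous floats */
    for (int k = 0; k < 4; ++k) p[k] = v.lane[k];
}
static vec4f target_vadd_f32(vec4f a, vec4f b) {   /* element-wise add          */
    vec4f r; for (int k = 0; k < 4; ++k) r.lane[k] = a.lane[k] + b.lane[k]; return r;
}

/* Scalar reference, as a MATLAB element-wise expression c = a + b would imply. */
static void add_scalar(float *c, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

/* Vectorized form with explicit packing/unpacking around the intrinsic call. */
static void add_packed(float *c, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vec4f va = target_pack_f32(a + i);
        vec4f vb = target_pack_f32(b + i);
        target_unpack_f32(c + i, target_vadd_f32(va, vb));
    }
    for (; i < n; ++i) c[i] = a[i] + b[i];          /* scalar tail */
}

int main(void) {
    float a[6] = {1, 2, 3, 4, 5, 6}, b[6] = {6, 5, 4, 3, 2, 1}, c[6], d[6];
    add_scalar(c, a, b, 6);
    add_packed(d, a, b, 6);
    for (int i = 0; i < 6; ++i) printf("%g %g\n", c[i], d[i]);  /* both print 7 */
    return 0;
}
```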
VLSI Design, 2002
A systematic methodology for energy dissipation reduction of multimedia applications realized on architectures based on embedded cores and application-specific data memory organization is proposed. Performance and area are explicitly taken into account. The proposed methodology includes two major steps: a high-level code transformation step that reorganizes the original description of the target application, and a second step that determines the processor, memory, and bus organization of the system and is only briefly described. Experimental results from several real-life demonstrators prove the impact of the high-level step of the proposed methodology.
ACM Transactions on Software Engineering and Methodology
Critical decisions among design alternatives with regard to maintainability arise early in the software design cycle. Existing comparison models relying on the structural evolution of the design patterns used are suitable to support such decisions. However, their effectiveness in predicting maintenance effort is usually verified on a limited number of case studies under heterogeneous metrics. In this paper, a multi-variable simulation model is introduced for validating the decision-making reliability of the formal comparison models, proposed in our prior work, for the significant design problem of recursive hierarchies of part-whole aggregations. In the absence of strict validation, the simulation model has been thoroughly calibrated with respect to its decision-making precision based on empirical distributions from time series analysis, approximating the highly uncertain nature of the actual maintenance process. The decision reliability of the formal models has been statisti...
Abstract: Open Source Web Applications are applications that are accessible through the World Wide Web [1] and whose source code is open to anyone who wants to participate in their modification. The benefits of the world community from these applications are enormous. The use of open source applications begins already at school. At university, it does not merely create users but develops students' programming skills. At the same time, it promotes and contributes to the research conducted by the academic community. Later, in the workplace, these applications remain useful and can help in all areas of daily work as well as in the development of a company. It is very important that open source applications promote cooperation and the exchange of knowledge between people through discussion groups (forums), since they allow simultaneous work. Furthermore, the benefits are also economic, since a great amount of financial resources can be saved, that ...
Unified low-power design flow for data-dominated multi-media and telecom applications, 2000
Sum-of-products computations are widely used in multimedia and communications systems. Techniques for the power-efficient data path synthesis of sum-of-products computations are presented in this chapter. Simple and efficient heuristics for the instruction-level scheduling and assignment steps are described. These steps are crucial sub-steps of the custom processor synthesis stage of the system-level design meta-flow proposed in chapter 2. The proposed heuristics exploit the inherent independence of the sum-of-products computations to formulate the synthesis tasks using the concept of the Traveling Salesman Problem. In this way the synthesis tasks can be solved very efficiently. Different partly static cost functions are proposed to drive the synthesis tasks. The proposed cost functions target the power consumption either in the buses connecting the functional units with the storage elements or in the functional units. The partly static nature of the proposed cost functions reduces the time of the synthesis procedure. Experimental results from different relevant digital signal processing algorithmic kernels prove that the proposed synthesis techniques lead to significant power savings.
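As a rough illustration of the TSP view of instruction-level scheduling, the sketch below orders independent product terms with a greedy nearest-neighbour tour so that consecutive coefficients differ in few bits, using Hamming distance as a simple proxy for bus switching activity. This is only a minimal sketch of the idea; the chapter's cost functions and heuristics differ in detail.

```cpp
// Minimal sketch: greedy nearest-neighbour ordering of independent product
// terms (a TSP-style tour) so that consecutive coefficient words differ in few
// bits. Hamming distance is a simple proxy for bus switching activity here,
// not the chapter's actual cost function.
#include <climits>
#include <cstdio>
#include <vector>

static int hamming(unsigned a, unsigned b) {
    unsigned x = a ^ b; int n = 0;
    while (x) { n += (int)(x & 1u); x >>= 1; }
    return n;
}

static std::vector<size_t> order_terms(const std::vector<unsigned>& coeff) {
    std::vector<size_t> tour;
    std::vector<bool> used(coeff.size(), false);
    size_t cur = 0;                        // start arbitrarily from the first term
    used[cur] = true; tour.push_back(cur);
    for (size_t step = 1; step < coeff.size(); ++step) {
        size_t best = 0; int bestCost = INT_MAX;
        for (size_t j = 0; j < coeff.size(); ++j) {
            int c = used[j] ? INT_MAX : hamming(coeff[cur], coeff[j]);
            if (c < bestCost) { bestCost = c; best = j; }
        }
        used[best] = true; tour.push_back(best); cur = best;
    }
    return tour;                           // issue products in this order
}

int main() {
    std::vector<unsigned> coeff = {0x00FF, 0x0F0F, 0x00F0, 0x0FFF};
    for (size_t idx : order_terms(coeff))
        printf("coefficient 0x%04X\n", coeff[idx]);
    return 0;
}
```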
ACM Transactions on Software Engineering and Methodology, 2017
Selecting between different design options is a crucial decision for object-oriented software developers that affects code quality characteristics. Conventionally, developers use their experience to make such decisions, which leads to suboptimal results regarding code quality. In this article, a formal model for providing early estimates of quality metrics of object-oriented software implementation alternatives is proposed. The model supports software developers in making fast decisions in a systematic way early during the design phase to achieve improved code characteristics. The approach employs a comparison model related to the application of the Visitor design pattern and inheritance-based implementation on structures following the Composite design pattern. The model captures maintainability as a metric of software quality and provides precise assessments of the quality of each implementation alternative. Furthermore, the model introduces the structural maintenance cost metric ba...
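The trade-off behind that comparison is the classic one: with a Visitor, adding a new operation leaves existing node classes untouched, but adding a new node class forces changes to every visitor, while an inheritance-based implementation behaves the other way around. The toy sketch below counts edits to existing classes for both evolution scenarios; it is only an illustration of the trade-off, not the article's structural maintenance cost metric, which is defined and calibrated differently.

```cpp
// Toy illustration of the Visitor vs. inheritance trade-off on a Composite
// hierarchy: count how many *existing* classes must be edited per change.
// This is not the article's structural maintenance cost metric.
#include <cstdio>

// Visitor: a new operation is a brand-new visitor class (no existing class is
// touched); a new node class forces a visit method in every existing visitor.
static int visitor_edits(int nodeClasses, int operations, int newOps, int newNodes) {
    (void)nodeClasses;
    return newOps * 0 + newNodes * operations;
}

// Inheritance-based: a new operation must be added to every existing node
// class; a new node class only implements its own operations.
static int inheritance_edits(int nodeClasses, int operations, int newOps, int newNodes) {
    (void)operations;
    return newOps * nodeClasses + newNodes * 0;
}

int main() {
    const int nodeClasses = 6, operations = 4;
    printf("add 3 operations  : visitor=%d inheritance=%d\n",
           visitor_edits(nodeClasses, operations, 3, 0),
           inheritance_edits(nodeClasses, operations, 3, 0));   // 0 vs 18
    printf("add 3 node classes: visitor=%d inheritance=%d\n",
           visitor_edits(nodeClasses, operations, 0, 3),
           inheritance_edits(nodeClasses, operations, 0, 3));   // 12 vs 0
    return 0;
}
```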
2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
This paper presents and evaluates a cache hierarchy-aware code parallelization/mapping and scheduling strategy for multicore architectures. Our proposed parallelization/mapping strategy determines a loop iteration-to-core mapping by taking into account the data access pattern of an application and the on-chip cache hierarchy of a target architecture. The goal of this step is to maximize data locality at each level of caches while minimizing the data dependences across the cores. Our scheduling strategy, on the other hand, determines a schedule for the iterations assigned to each core in the target architecture, with the goal of satisfying all the data dependences in the code (both intra-core and inter-core) and reducing data reuse distances across the cores that share data. We formulate both the parallelization/mapping problem and the scheduling problem in a linear algebraic framework and solve them using Farkas' Lemma and integer Fourier-Motzkin elimination. To measure the effectiveness of our schemes, we implemented them in a compiler and tested them using eight multithreaded application programs on a multicore machine. Our results show that the proposed mapping scheme reduces cache miss rates at all levels of the cache hierarchy and improves execution time of applications significantly compared to alternative approaches, and when supported by scheduling, the improvements in cache miss rates and execution time become much larger.
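Only as a shape-of-the-idea sketch (the paper derives its mapping with an exact linear-algebraic formulation based on Farkas' Lemma and integer Fourier-Motzkin elimination), the snippet below shows the simplest locality-aware iteration-to-core mapping: contiguous blocks of iterations, which tend to touch contiguous data, stay on one core, and neighbouring blocks land on cores assumed to share a cache. The core/cache grouping is hypothetical.

```cpp
// Simplified illustration only: contiguous blocks of loop iterations go to the
// same core, and neighbouring blocks go to cores assumed to share an L2, so
// that reused data stays in nearby caches. The paper's exact formulation is
// not reproduced here.
#include <cstdio>

static int core_for_iteration(long i, long n, int cores) {
    long block = (n + cores - 1) / cores;   // contiguous block of iterations per core
    return (int)(i / block);
}

int main() {
    const long n = 16; const int cores = 4;  // assumption: cores 0-1 and 2-3 each share an L2
    for (long i = 0; i < n; ++i) {
        int c = core_for_iteration(i, n, cores);
        printf("iteration %2ld -> core %d (L2 group %d)\n", i, c, c / 2);
    }
    return 0;
}
```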
Source code analysis and manipulation tools have become an essential part of software development processes. Automating the development of such tools can heavily reduce development time, effort and cost. This paper proposes a framework for the efficient development of code analysis software. A tool for automatically generating the front end of analysis tools for a given language grammar is proposed. The proposed approach can be applied to any language that can be described using the BNF notation. The proposed framework also provides a domain specific language to concisely express queries on the internal representation generated by the front end. This language tackles the problem of writing complex code in a general purpose programming language in order to retrieve information from the internal representation. The approach has been evaluated through two different realistic usage scenarios applied to a number of different benchmark applications. The front end generator has also been t...
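To illustrate the kind of boilerplate such a query DSL is meant to eliminate, the sketch below hand-writes a query over a generic tree representation: counting all nodes of a given kind. The node structure, the kind names, and the one-line DSL query mentioned in the comment are hypothetical, not the framework's actual internal representation or syntax.

```cpp
// Hand-written query over a generic AST, of the kind a query DSL can replace
// with a one-liner. Node layout and kind names are hypothetical, not the
// generated front end's actual internal representation.
#include <cstdio>
#include <string>
#include <vector>

struct Node {
    std::string kind;                 // e.g. "FunctionDef", "Assign", "Call"
    std::vector<Node> children;
};

// Count all nodes of a given kind anywhere in the tree.
static int count_kind(const Node& n, const std::string& kind) {
    int total = (n.kind == kind) ? 1 : 0;
    for (const Node& c : n.children) total += count_kind(c, kind);
    return total;
}

int main() {
    Node tree{"TranslationUnit",
              {{"FunctionDef", {{"Assign", {}}, {"Call", {}}}},
               {"FunctionDef", {{"Assign", {}}}}}};
    // A query DSL might express this as something like `count Assign`
    // (syntax hypothetical) instead of the recursive function above.
    printf("Assign nodes: %d\n", count_kind(tree, "Assign"));   // -> 2
    return 0;
}
```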
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016
This paper discusses a MATLAB to C compiler exploiting custom instructions, such as instructions for SIMD processing and instructions for complex arithmetic, present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. In this way the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the specialized instruction set of the target processor to be described in a parameterized way, allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speedup between 2x and 30x on the targeted ASIP for six DSP benchmarks compared to the code generated by the MathWorks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency.
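As a minimal sketch of how a complex-arithmetic instruction might surface in the generated C, the snippet below wraps a Q15 complex multiplication in an intrinsic-style function. The name asip_cmul_q15 and its fallback body are hypothetical; an actual ASIP toolchain would map such a call to a single custom instruction.

```cpp
// Illustrative only: how generated C might expose an ASIP's complex-multiply
// instruction through an intrinsic. asip_cmul_q15 is a hypothetical name with
// a portable fallback definition.
#include <cstdint>
#include <cstdio>

struct cq15 { int16_t re, im; };             // Q15 fixed-point complex value

static cq15 asip_cmul_q15(cq15 a, cq15 b) {  // fallback with plain C semantics
    cq15 r;
    r.re = (int16_t)(((int32_t)a.re * b.re - (int32_t)a.im * b.im) >> 15);
    r.im = (int16_t)(((int32_t)a.re * b.im + (int32_t)a.im * b.re) >> 15);
    return r;
}

int main() {
    cq15 x{ 16384, 0 };                      // 0.5 + 0.0i in Q15
    cq15 y{ 0, 16384 };                      // 0.0 + 0.5i in Q15
    cq15 z = asip_cmul_q15(x, y);            // expect roughly 0.0 + 0.25i
    printf("re=%d im=%d\n", z.re, z.im);     // -> re=0 im=8192
    return 0;
}
```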
In this demonstration we present the usage of XMSIM, a tool for memory hierarchy evaluation of multimedia applications. The input is a high-level C code application description and a memory hierarchy specification, and the output is statistics characterizing the memory operation.
Realizing image and signal processing algorithms in embedded systems is a three-step process including algorithmic design, implementation, and mapping to a target architecture and memory hierarchy. This paper presents MemAddIn, a dynamic analysis tool for C applications that exposes the application's critical loops which deserve the designer's attention for memory hierarchy optimization. MemAddIn is based on an extension of the MEMSCOPT compiler and integrates into the Visual Studio IDE, offering a unified environment for the application's implementation and optimization. To assess the criticality of the application's loops, the tool utilizes two metrics related to the cost and performance of the underlying memory architecture.
International Journal of Innovation and Regional Development, 2015
HercuLeS is an extensible high-level synthesis environment for automatically mapping algorithms to hardware. It overcomes limitations of known work: insufficient representations, maintenance difficulties, necessity of code templates, lack of usage paradigms and vendor-dependence. Aspects that are highlighted include automatic IP integration and especially source- and intermediate-level optimising transformations. In this context, we present transformational patterns for loop and if-conversion optimisations. Further, we focus on constant multiplication and division by proposing a suitable scheme for their straightforward and decoupled utilisation in user applications. It is shown that loop optimisations provide benefits of up to 32% in cycle performance, while if-conversion delivers an average improvement of 6.5%. By applying arithmetic optimisations, a 3.3-5.9× speedup over sequential implementations is achieved. It is also shown that HercuLeS is highly competitive to state-of-the-art commercial tools.
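For context on what arithmetic optimisations of this kind typically look like, the sketch below shows the textbook shift-and-add decomposition of multiplication by a constant and the shift used for unsigned division by a power of two; HercuLeS's actual scheme for constant multiplication and division is not described here and may differ.

```cpp
// Textbook decompositions that hardware compilers commonly use for constant
// arithmetic; shown only as background, not as HercuLeS's actual scheme.
#include <cstdint>
#include <cstdio>

// x * 10 = x*8 + x*2 = (x << 3) + (x << 1): no multiplier needed.
static uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }

// Unsigned division by 16 is a single right shift.
static uint32_t div16(uint32_t x) { return x >> 4; }

int main() {
    const uint32_t xs[] = {1u, 7u, 123u};
    for (uint32_t x : xs) {
        printf("%u*10=%u (ref %u)   %u/16=%u (ref %u)\n",
               x, mul10(x), x * 10u, x, div16(x), x / 16u);
    }
    return 0;
}
```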
Advances in Design and Specification Languages for SoCs
SystemC and OCAPI-XL Based System-Level Design for Reconfigurable Systems-on-Chip, by Kari Tiensyrjä, Miroslav Cupak, Kostas Masselos, Marko Pettissalo, Konstantinos Potamianos, Yang Qu, Luc Rynders, Geert Vanmeerbeeck, ...
Proceedings of the 17th Panhellenic Conference on Informatics - PCI '13, 2013
HercuLeS is an extensible high-level synthesis (HLS) environment. It removes significant human effort by automatically mapping algorithms to hardware, providing a valuable design assist to software-oriented developers. To enable accessibility and ease of hardware design space exploration (DSE), HercuLeS overcomes limitations of known work: non-standard source languages, insufficient representations, maintenance difficulties, necessity of code templates, lack of usage paradigms and vendor-dependence. Specific aspects that are highlighted in this manuscript are: a) the inner workings of the HercuLeS hardware compilation engine, b) manipulation of SSA (Static Single Assignment) form, c) automatic third-party IP integration, d) backend C code generation for compiled simulation, and e) an exemplary case of DSE. HercuLeS enables efficient hardware generation that can closely match the quality of results of a manually developed implementation with much reduced human effort and time requirements.
2013 23rd International Conference on Field programmable Logic and Applications, 2013
HercuLeS by Ajax Compilers is an extensible HLS environment that allows pluggable analyses and optimizations. It can be used for pushbutton synthesis from ANSI C and other source languages to custom hardware.
Lecture Notes in Electrical Engineering, 2011
This paper presents a memory hierarchy evaluation framework for multimedia applications. It takes as input a high-level C code application description and a memory hierarchy specification and provides statistics characterizing the memory operation. Essentially the tool is a specialized C++ data type library which is used to replace the application's data types with others that monitor memory access activity. XMSIM's operation is event-driven, which means that every access to a specific data structure is converted to a message towards the memory model, which subsequently emulates memory hierarchy operation. The memory model is highly parametric, allowing a large number of alternatives to be modeled. XMSIM's main advantage is its modularity, allowing the designer to alter specific aspects of the memory operation beyond the predefined ones. The main features are the capability to: 1) simulate any subset of the application's data types, 2) apply user-defined mapping of data to memories, 3) simultaneously simulate multiple memory hierarchy scenarios, 4) get immediate feedback on the effect of code transformations on memory hierarchy behavior, and 5) use verification utilities for the validation of code transformations.
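To make the data-type-replacement idea concrete, the sketch below shows a minimal monitoring array type whose element reads and writes are reported to a memory-model object. The class names and the reporting interface are hypothetical stand-ins, not XMSIM's actual API.

```cpp
// Minimal sketch of a monitoring data type: an array wrapper whose element
// accesses are reported to a memory model. Names and the reporting interface
// are hypothetical, not XMSIM's API.
#include <cstddef>
#include <cstdio>
#include <vector>

struct MemoryModel {                         // stand-in for the hierarchy model
    size_t reads = 0, writes = 0;
    void access(const char* name, size_t index, bool write) {
        (write ? writes : reads)++;
        printf("%s[%zu] %s\n", name, index, write ? "write" : "read");
    }
};

template <typename T>
class MonitoredArray {                       // drop-in replacement for a plain array
public:
    MonitoredArray(const char* name, size_t n, MemoryModel& m)
        : name_(name), data_(n), model_(m) {}

    // Proxy so reads and writes can be distinguished at the point of use.
    struct Ref {
        MonitoredArray& a; size_t i;
        operator T() const { a.model_.access(a.name_, i, false); return a.data_[i]; }
        Ref& operator=(T v) { a.model_.access(a.name_, i, true); a.data_[i] = v; return *this; }
    };
    Ref operator[](size_t i) { return Ref{*this, i}; }

private:
    const char* name_;
    std::vector<T> data_;
    MemoryModel& model_;
};

int main() {
    MemoryModel mem;
    MonitoredArray<int> a("a", 4, mem);
    a[0] = 3;                                // reported as a write
    int x = a[0];                            // reported as a read
    printf("reads=%zu writes=%zu x=%d\n", mem.reads, mem.writes, x);
    return 0;
}
```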
Microprocessors and Microsystems, 2014
This paper presents XMSIM, an early memory hierarchy evaluation simulator for multimedia applications. The input is source code in C and a memory hierarchy description, and the output is profiling information about memory operations during the execution of the source code. A memory hierarchy can be made of arbitrary levels of cache and main memory, while multiple hierarchies can be modeled in parallel. Any subset of the source code's variables can be mapped to the simulated memory units at any location of choice, and the contents of any memory level are available at any execution step. Specialized routines can be added to verify the correctness of optimizations made in the source code with respect to memory usage. The simulator is extensible in that additional memory characteristics can be modeled and more information on the effect of software-hardware interaction can be extracted. A demonstration is presented of how the tool can be used to optimize a multimedia application.
Microprocessors and Microsystems, 2013
The mapping process of high performance embedded applications to today's multiprocessor system-on-chip devices suffers from a complex toolchain and programming process. The problem is the expression of parallelism with a pure imperative programming language, which is commonly C. This traditional approach limits the mapping, partitioning and the generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains. The Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) European project aims to overcome these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from a high level of abstraction. The holistic solution of the ALMA toolchain allows the complexity of both the application and the architecture to be hidden, which leads to better acceptance, reduced development cost, and shorter time-to-market. Driven by the technology restrictions in chip design, the end of exponential growth of clock speeds and an unavoidably increasing demand for computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies.