Paul Stravers - Academia.edu

Papers by Paul Stravers

Challenges in physical chip design

Resource Reservations in Shared-Memory Multiprocessor SoCs

Consumer electronics vendors increasingly deploy shared-memory multiprocessor SoCs, such as Philips Nexperia, to balance flexibility (late changes, software download, reuse) and cost (silicon area, power consumption) requirements. With the convergence of storage, digital television, and connectivity, these media-processing systems must support numerous operational modes. Within a mode, the system concurrently processes many streams, each imposing a potentially dynamic workload on the scarce system resources. The dynamic sharing of scarce resources is known to jeopardize robustness and predictability. Resource reservation is an accepted approach to tackle this problem. This chapter applies the resource reservation paradigm to interrelated SoC resources: processor cycles, cache space, and memory access cycles. The presented virtual platform approach aims to integrate the reservation mechanisms of each shared SoC resource as the first step towards robust, yet flexible and cost-effective consumer products.
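As an illustration of the reservation paradigm described above, here is a minimal Python sketch of budget-based admission control and enforcement over the three resources the abstract names. Everything concrete in it (the class and method names, a fixed replenishment period, treating cache space as a countable unit such as cache ways) is a hypothetical simplification, not the virtual platform presented in the chapter.

```python
from dataclasses import dataclass, field

# Hypothetical resource names mirroring the three resources in the abstract.
RESOURCES = ("cpu_cycles", "cache_space", "mem_cycles")


@dataclass
class Reservation:
    stream: str
    budget: dict  # resource name -> units guaranteed per replenishment period


@dataclass
class VirtualPlatform:
    capacity: dict  # resource name -> units available per period
    granted: dict = field(default_factory=dict)  # stream -> Reservation
    used: dict = field(default_factory=dict)     # (stream, resource) -> units used this period

    def reserved(self, resource):
        """Units of `resource` already promised to admitted streams."""
        return sum(r.budget.get(resource, 0) for r in self.granted.values())

    def admit(self, req):
        """All-or-nothing admission: every resource must still fit the new budget."""
        if any(self.reserved(res) + req.budget.get(res, 0) > self.capacity[res]
               for res in RESOURCES):
            return False
        self.granted[req.stream] = req
        return True

    def charge(self, stream, resource, units):
        """Enforcement: refuse usage beyond the stream's budget in this period."""
        key = (stream, resource)
        spent = self.used.get(key, 0)
        if spent + units > self.granted[stream].budget.get(resource, 0):
            return False  # budget exhausted; the request waits for replenishment
        self.used[key] = spent + units
        return True

    def new_period(self):
        """Replenish all budgets at the start of a new period."""
        self.used.clear()
```

Admitting a stream only when all three budgets fit reflects the point that these resources are interrelated: a stream granted enough processor cycles but too little cache space or memory bandwidth could still miss its deadlines.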

Transputer network with flexible topology

Microprocessing and Microprogramming, 1988

Cache-Coherent Heterogeneous Multiprocessing as Basis for Streaming Applications

Systems-on-Chip (SoCs) of the new generation will be extremely complex devices, composed of complex subsystems and relying on abstraction from implementation details. These chips will support the execution of a mix of concurrent applications that are not known in detail at chip design time. Such SoCs require a significant degree of programmability to configure both the set of functions that must execute and the structure of the dataflow between these functions. To ease the programming effort, multiprocessor computers have employed cache-coherent shared memory for decades; memory coherency has proven to be an indispensable abstraction that shields the application programmer from system complexity such as multiple processors and memory hierarchies. This chapter describes a next-generation SoC for the consumer electronics domain (e.g. audio/video, vision, robotics). It features heterogeneous multiprocessor subsystems with a snooping cache coherence protocol, combined in a system with distributed memory employing a directory coherence protocol. It explains why and how the coherent memory model is indispensable for implementing both data transport and synchronization for multi-tasking streaming applications in distributed memory systems.
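To make the claim about data transport and synchronization more concrete, here is a minimal, single-threaded Python sketch of one common way streaming tasks communicate through coherent shared memory: a ring buffer whose read and write counters double as the synchronization mechanism. The class name and layout are assumptions for illustration, not the chapter's implementation, and Python models neither caches nor memory ordering.

```python
class StreamFifo:
    """Single-producer/single-consumer ring buffer (illustrative sketch).

    `buf`, `wr` and `rd` stand in for locations in coherent shared memory.
    Only the producer writes `wr` and only the consumer writes `rd`, so the
    two counters provide both flow control and synchronization.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.wr = 0  # tokens written so far (owned by the producer)
        self.rd = 0  # tokens read so far (owned by the consumer)

    def try_put(self, token):
        if self.wr - self.rd == self.capacity:
            return False  # full: wait until the consumer frees a slot
        self.buf[self.wr % self.capacity] = token  # data transport
        # On real hardware this counter update must become visible only after
        # the data store (e.g. via a memory barrier); Python does not model that.
        self.wr += 1
        return True

    def try_get(self):
        if self.rd == self.wr:
            return False, None  # empty: wait until the producer publishes
        token = self.buf[self.rd % self.capacity]
        self.rd += 1  # hand the slot back to the producer
        return True, token
```

Because each counter has a single writer, no locks are needed in this scheme; on a coherent memory system it is the coherence protocol that lets the consumer observe the producer's updated counter and data without explicit cache flushes.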

Research paper thumbnail of Homogeneous Multiprocessing for Consumer Electronics - Invited Talk

Exploring design space of parallel realizations: MPEG2 decoder case study

Many applications lend themselves to parallelism at different levels of granularity. We first identify the key issues involved in creating a parallel model of an application. This is done with a view to estimating performance and exploring the "parallel" design space to select a suitable design point. The framework presented provides an opportunity to perform this exploration in both a target-architecture-independent and a target-architecture-dependent manner. We present an MPEG-2 decoder model in YAPI that exposes more parallelism and achieves improved performance. This model has further been mapped onto the SpaceCAKE architecture to study its architectural parameters. Detailed results obtained with YAPI simulation (target-architecture independent) and TSS simulation (after process-processor binding) on the MPEG-1 decoder application establish the effectiveness of our approach.
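YAPI describes an application as a network of processes that communicate through FIFO channels with blocking reads, in the style of Kahn process networks. The following Python sketch, using threads and `queue.Queue`, illustrates that model in miniature. The helper names and the two stage functions are arbitrary placeholders standing in for decoder stages; nothing here reproduces the YAPI API or the paper's MPEG-2 model.

```python
import queue
import threading


def process(fn, inp, out):
    """Kahn-style process: blocking read from `inp`, write fn(token) to `out`."""
    while True:
        token = inp.get()  # blocking read, as in a process network
        if token is None:  # end-of-stream marker (an assumption of this sketch)
            out.put(None)
            return
        out.put(fn(token))


def run_pipeline(tokens, stages):
    """Connect `stages` (plain functions) with FIFOs and run each as a thread."""
    # Unbounded FIFOs, as in the Kahn model, so the feeding loop cannot deadlock.
    fifos = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=process, args=(fn, fifos[i], fifos[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for tok in tokens:
        fifos[0].put(tok)
    fifos[0].put(None)
    results = []
    while (tok := fifos[-1].get()) is not None:
        results.append(tok)
    for t in threads:
        t.join()
    return results
```

Because every process only performs blocking reads on its input FIFO, the output does not depend on how the threads are scheduled. That determinism is what allows such a model to be explored first independently of the target architecture and later bound to specific processors.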

Homogeneous multiprocessing for the masses

Summary form only given. Processor architectures have reached a point where it is increasingly hard to improve their performance without resorting to complex and exotic measures. Pollack observed in 2000 that Intel processors had been "on the wrong side of a square law" for almost a decade. Embedded processors for consumer and telecommunication chips now face the same law of diminishing returns. To further improve their performance, processors are becoming disproportionately bigger and consume much more energy per operation than previous generations. Traditionally, embedded systems-on-chip (SoCs) have been designed as heterogeneous multiprocessors, where most processors are not programmable and a single control processor synchronizes all communication. Obvious advantages of such systems include low cost and low power consumption. In high-volume products these outweigh disadvantages such as a low degree of design reuse, little software reuse, and long product lead times. Despite all the hard work and good intentions, it has proved difficult to establish a platform around heterogeneous SoC architectures. With rising non-recurring engineering costs and an increasingly global and competitive semiconductor market, the industry feels the need for a successful SoC platform more strongly than ever. Besides cost, the availability of qualified engineers is often an even bigger problem. Given that it is not unusual to spend several hundred man-years on software development for a single product, even a multinational company can have only a very limited number of products in development at any point in time. The solution we propose is to move away from heterogeneous SoCs and instead embrace homogeneous embedded multiprocessors.
In this talk we discuss embedded multiprocessor architectures and how they relate to programming models. We contrast heterogeneous with homogeneous architectures, and we show how the traditional efficiency gap between the two is narrowing. We also discuss issues related to hardware and software reuse, and the quest for composable systems to speed up the often lengthy process of embedded system integration.
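The "square law" mentioned above is commonly known as Pollack's rule: single-core performance grows roughly with the square root of core area (complexity). The worked comparison below is a back-of-the-envelope sketch, not taken from the talk; it assumes power grows in proportion to area and that the workload is perfectly parallel, both simplifications.

```latex
% Pollack's rule (empirical): performance P of one core vs. its area A.
\begin{align*}
P &\propto \sqrt{A}
  &&\Rightarrow\quad \frac{P(kA)}{P(A)} = \sqrt{k} \\
E_{\text{op}} &\propto \frac{A}{P} \propto \sqrt{A}
  &&\text{(assuming power} \propto A\text{)} \\
k = 4:\quad P_{\text{one core of area }4A} &= 2\,P(A),\; E_{\text{op}} \text{ doubles}
  && P_{\text{four cores of area }A} \le 4\,P(A),\; E_{\text{op}} \text{ unchanged}
\end{align*}
```

Under these assumptions, spending the same silicon on several small cores rather than one large core buys up to twice the throughput at half the energy per operation, which is the trade-off behind the argument for homogeneous multiprocessors.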

Homogeneous multiprocessing and the future of silicon design paradigms

This paper addresses two challenges of the consumer semiconductor industry: (1) economic and social forces are increasingly shortening product life cycles, and (2) the continuing exponential growth of the on-chip transistor count is driving up design complexity. Together these two trends represent a formidable challenge for semiconductor companies that aim to benefit from future technological developments in highly competitive markets. The paper derives a relation between on-chip memory real estate and compute logic, suggesting that homogeneous multiprocessors are an unavoidable consequence of the technology curve. A particular approach to homogeneous multiprocessing is then presented that combines scalability with high computational performance and high power efficiency. We also present the implementation of a programming paradigm for homogeneous multiprocessors that focuses on reuse of tested and approved functions at the software level. This enables a shift from today's not-so-successful practice of hardware core reuse to the reuse of functions that have very well defined and uniform interfaces. The time frame for large-scale commercial application of this type of homogeneous multiprocessor architecture is expected to coincide with the arrival of 0.07 micron technology for consumer products, i.e. 2006 and beyond. The paper concludes with a case study of an MPEG2 decoder, showing how a few simple guidelines can significantly increase the exposed concurrency of the application.