Chris Rowen | Cadence - Academia.edu
Papers by Chris Rowen
Next-generation embedded systems in application domains such as multimedia, wired and wireless communications, and multipurpose portable devices, are increasingly turning to multiprocessor platforms as a vehicle for their realization. But entirely fixed platforms composed of entirely fixed components lack the flexibility and ability to be optimized to the application to offer the best solution in any of these areas. Configurability at multiple levels offers a much better chance to optimize the resulting multiprocessor platform. Existing and emerging technologies for configurable and extensible processors and the creation of configurable multiprocessor subsystem platforms offer significant capability to design teams to both differentiate and optimize their products.
Proceedings. 42nd Design Automation Conference, 2005., 2005
Configurable processors enable dramatic gains in energy efficiency, relative to traditional fixed instruction-set processors. This energy advantage comes from three improvements. First, configuration of the instruction set permits a much closer fit of the processor to the target applications, reducing the number of execution cycles required. Second, configuring the processor removes unneeded features, reducing power and area overhead. Third, automatic processor generation tools enable logic optimization, signal switching reductions, and seamless mapping into low-voltage circuits and processes, for very low-power operation. The first improvement has been well-studied. Analysis of the second and third improvements requires detailed circuit and layout experiments, which is the primary focus of this paper. Starting from a range of existing available power saving options, this work explores the tradeoff and analyzes the results: the design priority tradeoff, the process technology impact, and implementing low-power configurable processor using commercial scaled-VDD cell libraries compatible with mainstream SOC practices. These real processor designs can achieve power dissipation approaching 20µW/MHz at 0.8V and close to 10µW/MHz at 0.6V, using production 0.13µm libraries. Finally, this work quantifies the dramatic process, voltage and temperature dependence in post-layout leakage power for small processor designs.
Proceedings of the 36th ACM/IEEE conference on Design automation conference - DAC '99, 1999
... Chair: Jonah McLeod - Silicon Strategies, Mountain View, CA Organizers: Ghulam Nurie - Synopsys, Inc., Mountain View, CA Panelists: ... Chris Rowen, Tensilica In the regime of system-on-a-chip, verification becomes both harder and more critical. ...
Proceedings of the 37th conference on Design automation - DAC '00, 2000
Advances in device technology have led to an era where entire systems can be implemented on a single component, commonly referred to as system-on-chip. With shrinking product life cycles placing severe time to market demands on manufacturers, coupled with their need to quickly change a product's feature set to address evolving customer requirements, programmability will emerge as a corner-stone for
Proceedings of the 5th ACM international conference on Embedded software - EMSOFT '05, 2005
Among the many directions of IT, the most pervasive is the fusion of information processing with physical processes, called embedded computing. It is the basic engine of innovation and source of competitiveness for a broad range of industrial sectors from automotive to telecommunications and from aerospace to manufacturing. Embedded computing transforms products, creates new markets and disrupts the status quo. Embedded computing is rapidly taking over the role of being the universal system integrator for physical systems. Prominent leaders of industrial and academic R&D organizations will discuss the consistency between present and future application challenges as seen by industry and dominating research challenges as conceived by academia.
ArXiv, 2018
This invention addresses fixed-point representations of convolutional neural networks (CNN) in integrated circuits. When quantizing a CNN for a practical implementation there is a trade-off between the precision used for operations between coefficients and data and the accuracy of the system. A homogenous representation may not be sufficient to achieve the best level of performance at a reasonable cost in implementation complexity or power consumption. Parsimonious ways of representing data and coefficients are needed to improve power efficiency and throughput while maintaining accuracy of a CNN.
Proceedings of the 27th European Solid-State Circuits Conference, 2001
New application-focused system-on-chip platforms motivate new application-specific processors. Configurable and extensible processor architectures offer the efficiency of tuned logic solutions with the flexibility of standard high-level programming methodology. Automated extension of processor function units and the associated software environment - compilers, debuggers, simulators and real-time operating systems - satisfies these needs. At the same time, designing at the level of software and instruction set architecture significantly shortens the design cycle and reduces verification effort and risk. This paper describes the key dimensions of extensibility within the processor architecture, the instruction set extension description language and the means of automatically extending the software environment from that description. It also describes two groups of benchmarks, EEMBC's Consumer and Telecommunications suites, that show 20 to 40 times acceleration of a broad set of alg...
With the majority of chip real estate being filled with re-used IP blocks, the process of block assembly has significantly grown in importance. Marketing literature seems to suggest that assembling a chip from IP is as easy as browsing a library of blocks, assembling them in a block diagram and then pushing a button. Are we there yet? Is it like the assembly of LEGO blocks? In parallel, the evolution of IP-reuse has gone through various stages from smaller blocks like multipliers and adders to more complex blocks like USB, SATA, PCI and others. What is the next step up from here? Given the growing importance of software, a new form of re-use has started to emerge, that is re-use at the sub-system level, often including hardware and software. This panel will explore the trends in IP re-use and discuss where IP blocks end and systems start. Furthermore, it will also explore the trends in IP-reuse and assembly, assess the state of the art of IP assembly and co-simulation standards like...
In this work, we examine the computational efficiency of scientific applications on three high-performance computing systems based on processors of varying degrees of specialization: an x86 server processor, the AMD Opteron; a more specialized System-on-Chip solution, the BlueGene/L and BlueGene/P; and a configurable embedded core, the Tensilica Xtensa. We use the atmospheric component of the global Community Atmospheric Model to motivate our study by defining a problem that requires exascale-class computing performance currently beyond the capabilities of existing systems. Significant advances in power-efficiency are necessary to make such a system practical to field.
IEEE Transactions on Pattern Analysis and Machine Intelligence
The 55th Design Automation Conference (DAC) held its first System Design Contest (SDC) in 2018. SDC'18 features a lower power object detection challenge (LPODC) on designing and implementing novel algorithms based object detection in images taken from unmanned aerial vehicles (UAV). The dataset includes 95 categories and 150k images, and the hardware platforms include Nvidia's TX2 and Xilinx's PYNQ Z1. DAC-SDC'18 attracted more than 110 entries from 12 countries. This paper presents in detail the dataset and evaluation procedure. It further discusses the methods developed by some of the entries as well as representative results. The paper concludes with directions for future improvements.
SOCs solve complex, data-intensive application problems by delivering high performance with good power-efficiency. Configurable, application-specific processor cores used as task blocks in an SOC deliver hardware-like performance and programmability, which reduces design effort and risk when compared to manual RTL block-design techniques.
IEEE Design and Test of Computers, 1999
ACM Transactions on …, 1988
MIPS is a 32-bit processor architecture that has been implemented as an nMOS VLSI chip. The instruction set architecture is RISC-based. Close coupling with compilers and efficient use of the instruction set by compiled programs were goals of the architecture. The MIPS architecture requires that the software implement some constraints in the design that are normally considered part of the hardware implementation. This paper presents experimental results on the effectiveness of this processor as a program host. Using sets of large and small benchmarks, the instruction and operand usage patterns are examined both for optimized and unoptimized code. Several of the architectural and organizational innovations in MIPS, including software pipeline scheduling, multiple-operation instructions, and word-based addressing, are examined in light of this data.