Chris Nicol - Academia.edu
Papers by Chris Nicol
Proceedings of the IEEE 2002 Custom Integrated Circuits Conference (Cat. No.02CH37285)
This paper describes the architecture of integrated circuits for base-band processing in 3rd Generation (3G) mobile wireless systems. Wideband CDMA receiver functions including the RAKE, rate de/matching, channel deinterleaving, channel decoding and multi-user ...
2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014
Increasing data rates require the use of a number of technologies to achieve high data throughput while minimizing power consumption. The papers in this session showcase industry-led innovations that underpin future high-speed data networks. The first paper maximizes inter-chip data rates while lowering power consumption for memory architectures in high-speed switches. The second paper utilizes advanced DSP techniques to enable high-capacity optical networking systems. The third paper demonstrates the integration of high-performance data converters with 28nm FPGAs. The first and third papers also demonstrate the benefits of 3D integration.
2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015
While the Internet of Things (IoT) is a recent phenomenon in the consumer electronics market, such exciting developments are frequently the result of decades of microelectronics research presented at ISSCC. This session covers the security and efficiency issues that must be addressed before IoT devices can be deployed ubiquitously. IoT devices require advanced security mechanisms that protect the private data contained within them from cryptanalytic and physical attacks. This session includes four papers that present demonstrations of security, power and context-aware voice-activated circuitry, an emerging computing paradigm inspired by magnetic spin interactions, and a low-power, low-cost backplane interconnect.
2004 IEEE International Solid-State Circuits Conference (IEEE Cat. No.04CH37519)
ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514)
Proceedings of the 2001 international symposium on Low power electronics and design - ISLPED '01, 2001
The requirement of turbo decoding in 3G wireless standards has forced handset designers to consider power consumption issues in their implementations. The phenomenal performance of turbo codes comes at the expense of computation. This paper primarily examines methods of substantially reducing the power consumption of the decoding operation, making it feasible to integrate turbo decoders into a low-power handset. The techniques presented include early termination of the turbo process, encoding of extrinsic information to reduce the memory size, and disabling portions of the MAP algorithm when the results will not affect the decoded output. The net result of these techniques is almost a 70% reduction in power over a fixed 6-iteration, 8-state baseline turbo decoder at 2 dB of signal-to-noise ratio (SNR).
2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC.
A 24Mb/s 3GPP-HSDPA radix-4 logMAP turbo decoder is designed for 3G data terminals. It features an approximate radix-4 logsum circuit to achieve 145MHz operation. Power is reduced using 1/2-iteration early termination and extrinsics are interleaved in companded format. The decoder core is 14.5 mm² in 0.18μm CMOS.
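For readers unfamiliar with the log-sum operation at the heart of a logMAP decoder, the following is a minimal Python sketch of the generic operation. It is not the circuit from the paper: the abstract does not say which approximation is used, so the max-log variant below is only a common stand-in, and all function names are hypothetical.

```python
import math


def log_sum(a: float, b: float) -> float:
    """Exact log(exp(a) + exp(b)), written as max plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def log_sum4_exact(metrics):
    """Radix-4 style combine: exact log-sum over four candidate metrics."""
    m0, m1, m2, m3 = metrics
    return log_sum(log_sum(m0, m1), log_sum(m2, m3))


def log_sum4_maxlog(metrics):
    """Max-log stand-in (an assumption, not the paper's circuit):
    drop the correction terms and keep only the largest metric."""
    return max(metrics)


if __name__ == "__main__":
    candidates = [1.2, -0.4, 0.9, 2.0]
    print(log_sum4_exact(candidates), log_sum4_maxlog(candidates))
```

The max-log value never exceeds the exact one, and for four inputs the gap is at most log 4, which is why hardware designs often replace the correction term with a cheap approximation.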
IEEE Custom Integrated Circuits Conference 2006, 2006
I. INTRODUCTION UMTS mobile wireless networks conforming to 3GPP Release 99 (R99) standards are now widely deployed with more than 90 operational networks in over 35 countries [1]. They are, however, yet to realize the vision of ubiquitous mobile broadband data due to the ...
Proceedings of the 1998 international symposium on Low power electronics and design - ISLPED '98, 1998
Proceedings of the 2000 international symposium on Low power electronics and design - ISLPED '00, 2000
Wireless communications, and more specifically the fast-growing penetration of cellular phones and cellular infrastructure, are the major drivers for the development of new programmable Digital Signal Processors (DSPs). This tutorial gives an overview of recent developments in DSP processor architectures that make them well suited to executing the computationally intensive algorithms typically found in communications systems. DSP processors have adapted instruction sets, memory architectures and data paths to execute compute-intensive communications algorithms efficiently and at low power. Basic building blocks include convolutional decoders (mainly the Viterbi algorithm), turbo coding algorithms, FIR filters, speech coders, etc. This is illustrated with examples of different commercial and research processors. Please note that the authors do not endorse the processors used in this tutorial. These processors are used to illustrate how different solutions are proposed for the same problem.
Proceedings of the 2002 international symposium on Low power electronics and design - ISLPED '02, 2002
This paper presents a decision feedback equalizer (DFE) for a high-speed packet modem utilizing the normalized least mean squared (NLMS) tap update algorithm. The equalizer supports up to 43.2 Mbps uncoded data over a wireless channel with a 10% training preamble (48 Mbps with no training). In this work the rapid convergence of the NLMS algorithm is combined with a technique for early termination of the tap training process to yield a low-power DFE implementation. The low-power techniques result in a 43% power reduction over a baseline design. Furthermore, low-power synthesis techniques result in an additional 30% power savings on top of the algorithmic power savings.
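As a rough illustration of the NLMS tap update with early termination of training, here is a minimal real-valued Python sketch. It adapts a plain FIR tap vector rather than a full DFE, and the stopping rule (error magnitude below a threshold for several consecutive symbols) is an assumption made for illustration; the abstract does not state the paper's criterion. All names and parameter values are hypothetical.

```python
import random


def nlms_update(taps, x, desired, mu=0.5, eps=1e-8):
    """One NLMS step: w <- w + mu * e * x / (eps + ||x||^2)."""
    y = sum(w * xi for w, xi in zip(taps, x))
    err = desired - y
    norm = eps + sum(xi * xi for xi in x)
    return [w + mu * err * xi / norm for w, xi in zip(taps, x)], err


def train(symbols, desired, n_taps, err_threshold=1e-3, patience=8):
    """Adapt taps over a training sequence, stopping early once the error
    stays below err_threshold for `patience` consecutive symbols
    (an assumed criterion, not taken from the paper)."""
    taps = [0.0] * n_taps
    quiet = 0
    updates = 0
    for n in range(n_taps - 1, len(symbols)):
        x = symbols[n - n_taps + 1 : n + 1][::-1]  # newest sample first
        taps, err = nlms_update(taps, x, desired[n])
        updates += 1
        quiet = quiet + 1 if abs(err) < err_threshold else 0
        if quiet >= patience:
            break  # remaining tap updates are skipped, saving work
    return taps, updates


def make_training_data(true_taps, length, seed=0):
    """Noise-free toy data: +/-1 symbols filtered by a known tap vector."""
    rng = random.Random(seed)
    symbols = [1.0 if rng.random() < 0.5 else -1.0 for _ in range(length)]
    k = len(true_taps)
    desired = [
        sum(true_taps[j] * symbols[n - j] for j in range(k)) if n >= k - 1 else 0.0
        for n in range(length)
    ]
    return symbols, desired


if __name__ == "__main__":
    true_taps = [0.8, -0.3, 0.1]
    symbols, desired = make_training_data(true_taps, 400)
    taps, updates = train(symbols, desired, len(true_taps))
    print(taps, updates)
```

Every update skipped after the stopping point is arithmetic the hardware does not have to perform, which is the intuition behind combining fast convergence with early termination.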
2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014
In a “call for leadership” panel at ISSCC, we will be seeking leaders' perspectives on the future of discontinuous innovation in the semiconductor industry. An ensemble of visionaries, experts and CEOs will discuss the opportunities and challenges for innovation in our industry. Of primary concern is the reduction in funding available for new semiconductor ventures. Are the escalating NRE costs of ASICs providing a barrier to new entrants? What advice would our distinguished panellists give to entrepreneurs thinking of starting a new semiconductor company?
IEEE Solid-State Circuits Magazine, 2014
By taking advantage of the redundancy in a 4-2 compressor, we reduce the number of transitions in carry-save adder trees that are common in large multipliers. Three new 4-2 compressors are proposed. These are used in different configurations to reduce the probability of a transition in the global carry wires by up to 40% over current techniques. Power reductions are demonstrated with the use of a 4-tap FIR filter module and a 54×54-bit multiplier. Transistor-level circuit simulations indicate 5-6% power reduction with no increase in delay.
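The redundancy mentioned above can be seen in a small Python model of the conventional 4-2 compressor built from two full adders. This is a sketch of the textbook structure only, not of the three new compressors the paper proposes, and the function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Conventional 4-2 compressor: two cascaded full adders.
    cout depends only on x1..x3, so it does not ripple from cin."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_identity():
    """Verify x1+x2+x3+x4+cin == s + 2*(carry + cout) for all 32 inputs."""
    return all(
        sum(bits) == s + 2 * (carry + cout)
        for bits in product((0, 1), repeat=5)
        for s, carry, cout in [compressor_4_2(*bits)]
    )


def count_redundant_cases():
    """Count inputs where exactly one of carry/cout is 1. Arithmetically,
    either wire could carry that 1, which is the redundancy in question."""
    return sum(
        1
        for bits in product((0, 1), repeat=5)
        if sum(compressor_4_2(*bits)[1:]) == 1
    )


if __name__ == "__main__":
    print(check_identity(), count_redundant_cases())
```

Because carry and cout have the same weight, the output encoding is not unique whenever exactly one of them is set; a design is free to choose the assignment that toggles long carry wires less often.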
IEEE Journal of Solid-State Circuits, 2000
An MIMD multiprocessor digital signal-processing (DSP) chip containing four 64-b processing elements (PEs) interconnected by a 128-b pipelined split transaction bus (STBus) is presented. Each PE contains a 32-b RISC core with DSP enhancements and a 64-b single-instruction, multiple-data vector coprocessor with four 16-b MACs and a vector reduction unit. PEs are connected to the STBus through reconfigurable dual-ported snooping L1 cache memories that support shared memory multiprocessing using a modified-MESI data coherency protocol. High-bandwidth data transfers between system memory and on-chip caches are managed in a pipelined memory controller that supports multiple outstanding transactions. An embedded RTOS dynamically schedules multiple tasks onto the PEs. Process synchronization is achieved using cached semaphores. The 200-mm², 0.25-µm CMOS chip operates at 100 MHz and dissipates 4 W from a 3.3-V supply.
IEEE Journal of Solid-State Circuits, 2002
A channel decoder chip compliant with the 3GPP mobile wireless standard is described. It supports both data and voice calls simultaneously in a unified turbo/Viterbi decoder architecture. For voice services, the decoder can process over 128 voice channels encoded with rate 1/2 or 1/3, constraint length 9 convolutional codes. For data services, the turbo decoder is capable of processing any mix of rate 1/3, constraint length 4 turbo encoded data streams with an aggregate data rate of up to 2.5 Mb/s with 10 iterations per ...
IEEE Communications Magazine, 2003
ee.ucla.edu
[Slide excerpt: chart of Global Wireline vs. Global Wireless, 1995 to 2010, with axis ticks up to 1,600,000] ... Ingrid Verbauwhede, Chris Nicol ... Example 1: TMS320C10 (1982); Data RAM; Program ROM; 1.5K x 16; 144 x 16 ...
Proceedings of the 13th ACM Great Lakes symposium on VLSI, 2003
This paper describes a power, speed and area efficient VLSI implementation of a noise whitening algorithm for a 4x4 MIMO channel. The architecture combines innovative use of Hermitian matrices to streamline the iterative calculations, with a 4x1 matrix row-column multiplier as the core component. The optimisations in the datapath reduce the power and latency needed to implement the algorithm. The Booth recoded complex multipliers use logic sharing to reduce power and complexity, and incorporate low-power sleep logic that does not increase the critical path. The design has been successfully synthesised in a 0.18µm, 1.8V CMOS technology, and has the potential to be adapted to other applications requiring matrix multiplication.