Karthikeyan Sankaralingam - Academia.edu

Papers by Karthikeyan Sankaralingam

Efficient execution of memory access phases using dataflow specialization

Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15, 2015

Karthikeyan Sankaralingam, Ramadass Nagarajan, Stephen W. Keckler, and Doug Burger. SimpleScalar Simulation

Comprehensive Circuit Failure Prediction for Logic and SRAM Using Virtual Aging

A wire-delay scalable microprocessor architecture for high performance systems

2003 IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers, 2003

... IBM, and Intel. References [1] MS Hrishikesh, NP Jouppi, KI Farkas, D. Burger, SW Keckler, and P. Shivakumar, “The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays,” ISCA-29, pp. 14-24, May 2002. [2] R ...

Toward a multicore architecture for real-time ray-tracing

2008 41st IEEE/ACM International Symposium on Microarchitecture, 2008

Significant improvement to visual quality for real-time 3D graphics requires modeling of complex illumination effects like soft shadows, reflections, and diffuse lighting interactions. The conventional Z-buffer-algorithm-driven GPU model does not provide sufficient support for this ...

A Technology-Scalable Architecture for Fast Clocks and High ILP

CMOS technology scaling poses challenges in designing dynamically scheduled cores that can sustain both high instruction-level parallelism and aggressive clock frequencies. In this paper, we present a new architecture that maps compiler-scheduled blocks onto a two-dimensional grid of ALUs. For the mapped window of execution, instructions execute in a dataflow-like manner, with each ALU forwarding its result along short wires to the consumers of the result. We describe our studies of program behavior and a preliminary evaluation that show that this architecture has the potential for both high clock speeds and high ILP, and may offer the best of both the VLIW and dynamic superscalar architectures.

SimpleScalar Simulation of the PowerPC Instruction Set Architecture

Appears in the Proceedings of the 34th Annual International Symposium on Microarchitecture

Appears in the Proceedings of the Annual International Symposium on Computer Architecture

Appears in the 36

Appears in the 5th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-5)

Appears in the

Appears in the Proceedings of the 30

Exploring the Synergy of Emerging Workloads and Silicon Reliability Trends

Technology constraints and application characteristics are radically changing as we scale to the end of silicon technology. Devices are becoming increasingly brittle, highly varying in their properties, and error-prone, leading to a fundamentally unpredictable hardware substrate. Applications are also changing, and emerging new classes of applications are increasingly relying on probabilistic methods. They have an inherent tolerance for uncertainty and can tolerate hardware errors.

Design and analysis of routed Inter-ALU Networks for ILP scalability and performance

Abstract: Modern processors rely heavily on broadcast networks to bypass instruction results to dependent instructions in the pipeline. However, as architectures get wider and pipelines get deeper, broadcasting becomes more complex, slower, and more difficult to implement.

Signature Matching in Network Processing using SIMD/GPU Architectures

January 2008

Deep packet inspection is becoming prevalent for modern network processing systems. They inspect packet payloads for a variety of reasons, including intrusion detection, traffic policing, and load balancing. The focus of this paper is deep packet inspection in intrusion detection/prevention systems (IPSes). The performance-critical operation in these systems is signature matching: matching payloads against signatures of vulnerabilities. The increasing speeds of today's networks and the transition from simple string-based signatures to complex regular expressions have rapidly increased the performance requirement of signature matching. To meet these requirements, solutions range from hardware-centric ASIC/FPGA implementations to software implementations using high-performance microprocessors. In this paper, we propose a programmable SIMD architecture design for IPSes and develop a prototype implementation on an Nvidia G80 GPU. We first present a detailed archi...

Exploring the potential of heterogeneous von Neumann/dataflow execution models

Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15, 2015

Optimization and Mathematical Modeling in Computer Architecture

Synthesis Lectures on Computer Architecture, 2013

Architectural Simulators Considered Harmful

Enabling GPGPU Low-Level Hardware Explorations with MIAOW

ACM Transactions on Architecture and Code Optimization, 2015