Mike Houston - Academia.edu

Papers by Mike Houston

N-Body Simulations on GPUs

arXiv, 2007

Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this paper we show how graphics processors can be used for N-body simulations to obtain improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In some of the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is comparable to specialized processors such as GRAPE-6A and MDGRAPE-3, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.

ClawHMMER: A Streaming HMMer-Search Implementation

ACM/IEEE SC 2005 Conference (SC'05), 2005

Stream computing

ACM SIGGRAPH 2008 Classes (SIGGRAPH '08), 2008

Understanding GPUs through benchmarking

ACM SIGGRAPH 2007 Courses (SIGGRAPH '07), 2007

Graphics Hardware (2004)

Partitioning fragment shaders into multiple rendering passes is an effective technique for virtualizing shading resource limits in graphics hardware. The Recursive Dominator Split (RDS) algorithm is a polynomial-time algorithm for partitioning fragment shaders for real-time rendering that has been shown to generate efficient partitions.

GPUs

Queue, 2008

A gamer wanders through a virtual world rendered in near-cinematic detail. Seconds later, the screen fills with a 3D explosion, the result of unseen enemies hiding in physically accurate shadows. Disappointed, the user exits the game and returns to a computer desktop that exhibits the stylish 3D look-and-feel of a modern window manager. Both of these visual experiences require hundreds of gigaflops of computing performance, a demand met by the GPU (graphics processing unit) present in every consumer PC.

Interactive k-d tree GPU raytracing

Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (I3D '07), 2007

High level languages for GPUs overview

ACM SIGGRAPH 2007 Courses (SIGGRAPH '07), 2007

Fast volume segmentation with simultaneous visualization using programmable graphics hardware

IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 2003

Efficient partitioning of fragment shaders for multiple-output hardware

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware (HWWS '04), 2004

Poster reception---N-Body simulation on GPUs

Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06), 2006

Beyond programmable shading (parts I and II)

ACM SIGGRAPH 2009 Courses (SIGGRAPH '09), 2009

A portable runtime interface for multi-level memory hierarchies

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '08), 2008

Compilation for explicitly managed memory hierarchies

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '07), 2007

A tuning framework for software-managed memory hierarchies

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08), 2008

Sequoia: Programming the Memory Hierarchy

ACM/IEEE SC 2006 Conference (SC'06), 2006

Beyond programmable shading

ACM SIGGRAPH 2008 Classes (SIGGRAPH '08), 2008

This first course in a series gives an introduction to parallel programming architectures and environments for interactive graphics. There are strong indications that the future of interactive graphics involves a programming model more flexible than today's OpenGL/Direct3D pipelines. As such, graphics developers need to have a basic understanding of how to combine emerging parallel programming techniques with the traditional interactive ...

S07---GPGPU

Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06), 2006

ATI Stream Profiler

ACM SIGGRAPH 2010 Posters (SIGGRAPH '10), 2010

Modern GPUs have been shown to be highly efficient machines for data-parallel applications such as graphics, image, video processing, or physical simulation applications. For example, a single ATI Radeon™ HD 5870 GPU has a theoretical peak of 2.72 teraflops (10^12 floating-point operations per second) with a video memory bandwidth of 153.6 GB/s. While it is not difficult to port CPU ...
