Mike Houston - Academia.edu
Papers by Mike Houston
ArXiv, 2007
Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this paper we show how graphics processors can be used for N-body simulations to obtain improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In some of the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is comparable to specialized processors such as GRAPE-6A and MDGRAPE-3, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.
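For readers unfamiliar with the O(N^2) force calculation the abstract refers to, the following is a minimal CPU-side sketch of direct-summation gravitational accelerations. It is not the paper's GPU algorithm; the function name `pairwise_accelerations`, the gravitational constant `G`, and the `softening` parameter are assumptions introduced here for illustration only.

```python
import math

def pairwise_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-summation gravitational accelerations.

    Every body interacts with every other body, so the cost grows as
    O(N^2) in the number of bodies. `softening` (an illustrative
    assumption) avoids division by zero when two bodies nearly coincide.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            # a_i += G * m_j * (r_j - r_i) / |r_j - r_i|^3
            scale = G * masses[j] / (r2 * math.sqrt(r2))
            acc[i][0] += scale * dx
            acc[i][1] += scale * dy
            acc[i][2] += scale * dz
    return acc
```

The inner loop is the part that maps naturally onto data-parallel hardware, since each body's acceleration can be accumulated independently of the others.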
ACM/IEEE SC 2005 Conference (SC'05), 2005
ACM SIGGRAPH 2008 classes on - SIGGRAPH '08, 2008
ACM SIGGRAPH 2007 courses on - SIGGRAPH '07, 2007
Partitioning fragment shaders into multiple rendering passes is an effective technique for virtualizing shading resource limits in graphics hardware. The Recursive Dominator Split (RDS) algorithm is a polynomial-time algorithm for partitioning fragment shaders for real-time rendering that has been shown to generate efficient partitions.
Queue, 2008
A gamer wanders through a virtual world rendered in near-cinematic detail. Seconds later, the screen fills with a 3D explosion, the result of unseen enemies hiding in physically accurate shadows. Disappointed, the user exits the game and returns to a computer desktop that exhibits the stylish 3D look-and-feel of a modern window manager. Both of these visual experiences require hundreds of gigaflops of computing performance, a demand met by the GPU (graphics processing unit) present in every consumer PC.
Proceedings of the 2007 symposium on Interactive 3D graphics and games - I3D '07, 2007
ACM SIGGRAPH 2007 courses on - SIGGRAPH '07, 2007
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 2003
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware - HWWS '04, 2004
Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06, 2006
ACM SIGGRAPH 2009 Courses on - SIGGRAPH '09, 2009
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming - PPoPP '08, 2008
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07, 2007
Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
ACM/IEEE SC 2006 Conference (SC'06), 2006
ACM SIGGRAPH 2008 classes on - SIGGRAPH '08, 2008
This first course in a series gives an introduction to parallel programming architectures and environments for interactive graphics. There are strong indications that the future of interactive graphics involves a programming model more flexible than today's OpenGL/Direct3D pipelines. As such, graphics developers need to have a basic understanding of how to combine emerging parallel programming techniques with the traditional interactive
Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06, 2006
ACM SIGGRAPH 2010 Posters on - SIGGRAPH '10, 2010
Modern GPUs have been shown to be highly efficient machines for data-parallel applications such as graphics, image, video processing, or physical simulation applications. For example, a single ATI Radeon™ HD 5870 GPU has a theoretical peak of 2.72 teraflops (10^12 floating-point operations per second) with a video memory bandwidth of 153.6 GB/s. While it is not difficult to port CPU