Brandon Lloyd - Academia.edu
Papers by Brandon Lloyd
Proceedings of the …, 2008
We present novel algorithms for computing discrete Fourier transforms with high performance on GPUs. We present hierarchical, mixed-radix FFT algorithms for both power-of-two and non-power-of-two sizes. Our hierarchical FFT algorithms efficiently exploit shared memory ...
ACM SIGGRAPH 2006 Sketches on - SIGGRAPH '06, 2006
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming - PPoPP '11, 2011
We present an auto-tuning framework for FFTs on graphics processors (GPUs). Due to the complex design of the memory and compute subsystems on GPUs, the performance of FFT kernels can vary widely over the range of possible input parameters. We generate several variants for ...
ACM SIGGRAPH 2005 Courses on - SIGGRAPH '05, 2005
We present new algorithms on commodity graphics processors for performing fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, the single instruction, multiple data (SIMD) capabilities, and the vector processing functionality of GPUs to evaluate boolean predicate combinations and semi-linear queries on attributes and to execute database operations efficiently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. Our algorithms have been implemented on a programmable GPU (e.g. NVIDIA's GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPU-based algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an effective co-processor for performing database operations. While CPUs are used for general-purpose computation, GPUs have been primarily designed for transforming, rendering, and texturing geometric primitives, such as triangles. The driving application of GPUs has been fast rendering for visual simulation, virtual reality, and computer gaming. GPUs are increasingly being used as co-processors to CPUs. GPUs are extremely fast and are capable of processing tens of millions of geometric primitives per second. The peak performance of GPUs has been increasing at a rate of 2.5 to 3.0 times a year, much faster than Moore's law for CPUs.
At this rate, the GPU's peak performance may move into the teraflop range by 2005 [19]. Most of this performance arises from multiple processing units and stream processing. The GPU treats the vertices and pixels constituting graphics primitives as streams. Multiple vertex and pixel processing engines on a GPU are connected via data flows. These processing engines perform simple operations in parallel.
These images demonstrate the benefits of CC shadow volumes on a scene with 96K polygons. Standard shadow volumes are shown in the left image and CC shadow volumes in the middle. Shadow volumes are shown in transparent yellow. The right image shows the shadows generated by CC shadow volumes at interactive rates. CC shadow volumes generate up to 7 times less fill than standard shadow volumes in this scene.
ACM Transactions on Graphics, 2008
Night-time scene of robots in a hangar with a point light. We compare our algorithm (LogPSM) to Kozlov's improved perspective shadow map (PSM) algorithm. Both algorithms use a cube map with a total resolution of 1024 × 1024. The images have a resolution of 512 × 512. (Left) Compared to a standard cube map, the PSM cube map greatly reduces aliasing artifacts near the viewer, but some aliasing is still visible. The shadows are severely stretched on the back wall. LogPSMs provide higher quality both near the viewer and in the distance. The shadow map grid has been superimposed to aid visualization (grid lines every 20 texels). (Right) An error visualization for both algorithms.
ACM SIGPLAN Notices, 2011
We present an auto-tuning framework for FFTs on graphics processors (GPUs). Due to the complex design of the memory and compute subsystems on GPUs, the performance of FFT kernels can vary widely over the range of possible input parameters. We generate several variants for ...
ACM Transactions on Graphics, 2003
The left image shows a snapshot generated from the application of our hybrid shadow generation algorithm to the powerplant model (12.7M triangles). The middle image shows a different viewpoint generated using perspective shadow maps. Notice the aliasing artifacts. The right image highlights the shadows generated by our interactive algorithm from the same viewpoint with sharper boundaries.