Graphics and Imaging Architectures (CMU 15-869, Fall 2011)
CMU 15-869
Graphics and Imaging Architectures
Instructor:
Time: Tues/Thurs 3:00 - 4:20pm (Fall 2011)
Location: GHC 4101
Description
Visual computing tasks such as 3D graphics and image processing are increasingly important to the capabilities and overall user experience delivered by computer systems ranging from high-end workstations to sensor-rich smart phones. The aim of this reading and project-based course is to examine key ideas, trends, and challenges associated with the design of architectures and systems responsible for efficiently executing these workloads.
This course begins with an in-depth study of the real-time graphics pipeline architecture and the efficiency of its modern GPU implementations. Key topics include the design of GPU processing and communication resources, graphics pipeline components and their scheduling on heterogeneous, parallel hardware, and how current abstractions balance conflicting needs for both efficiency and programmability. The second part of the course will address system design challenges in a broader array of emerging visual computing topics including: image processing architectures for mobile computing, programmable camera platforms, alternative graphics pipelines, and GPU-accelerated interfaces for application domains beyond graphics.
This course is intended for systems/graphics students interested in architecting future graphics or image processing platforms and for students interested in gaining experience designing applications or programming frameworks for emerging heterogeneous, parallel systems.
Schedule
Sept 13: | Course Introduction Readings: (due for class Sept 15) D. Blythe, Rise of the Graphics Processor. Proceedings of the IEEE, 2008 T. Akenine-Möller and E. Haines, Real-Time Rendering, Chapter 2 (handed out in class) |
---|---|
Sept 15: | The Real-Time Graphics Pipeline (system inputs, outputs, entities, and operations... a.k.a., real-time graphics from a systems perspective) Readings: (due for class Sept 20) M. Segal and K. Akeley, The Design of the OpenGL Graphics Interface. [unpublished 1994] D. Blythe, The Direct3D 10 System. SIGGRAPH 2006 Supplemental: M. Kilgard, Realizing OpenGL: Two Implementations of One Architecture. Eurographics 1997 (sections 1 and 2) |
Sept 20: | Graphics Workload Characterization + Parallelizing the Graphics Pipeline (characteristics of the graphics pipeline workload, Molnar's "sorting" taxonomy for mapping the pipeline to parallel execution resources: trade-offs between parallelism, communication, and locality) Readings: (due for class Sept 22) S. Molnar et al., A Sorting Classification of Parallel Rendering. IEEE Computer Graphics and Applications, 1994 M. Eldridge et al., Pomegranate: A Fully Scalable Graphics Architecture. SIGGRAPH 2000 |
Sept 22: | Geometry Processing (clipping, tessellation, workload scheduling challenges introduced by tessellation) Readings: (due for class Sept 27) Tessellation Overview. Direct3D 11 Programming Guide. Microsoft Dev Center Documentation, 2011 For the curious: (suggested additional readings on parallel tessellation) H. Moreton, Watertight Tessellation With Forward Differencing. Graphics Hardware 2001 C. Loop and S. Schaefer, Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches. Transactions on Graphics, 2008 M. Fisher, DiagSplit: Parallel, Crack-Free, Adaptive Tessellation for Micropolygon Rendering. SIGGRAPH Asia 2009 |
Sept 27: | Rasterization and Occlusion (fixed-function implementation, precision issues, work-efficient algorithms vs. parallel algorithms, hierarchical and early Z, Z-buffer compression) Readings: (due for class Sept 29) A. R. Smith, A Pixel is Not a Little Square. Microsoft Technical Memo, 1995 M. Abrash, Rasterization on Larrabee. Dr. Dobbs Portal, May 1, 2009 (original article is available online here) S. Morein, ATI Radeon HyperZ Technology. Graphics Hardware Hot3D Presentation, 2000 For the curious: Other rasterizers: M. Olano and T. Greer, Triangle Scan Conversion Using 2D Homogeneous Coordinates. Graphics Hardware 1997 Take a look at NVIDIA's CUDA software rasterizer (with source code from S. Laine et al., High-Performance Software Rasterization on GPUs. High Performance Graphics 2011) More on the Z-buffer: N. Greene, Hierarchical Z-Buffer Visibility. SIGGRAPH 93 J. Hasselgren and T. Akenine-Möller, Efficient Depth Buffer Compression. Graphics Hardware 2006 K. Akeley and J. Su, Minimum Triangle Separation for Correct Z-Buffer Occlusion. Eurographics 2006 Recent rasterization topics: G. Johnson et al., The Irregular Z-Buffer: Hardware Acceleration for Irregular Data Structures, Transactions on Graphics (4), 2005 K. Fatahalian et al., Data-Parallel Rasterization of Micropolygons with Defocus and Motion Blur, High Performance Graphics 2009 S. Laine et al., Clipless Dual-Space Bounds for Faster Stochastic Rasterization, SIGGRAPH 2011 |
Sept 29: | Texturing (anti-aliasing using the mip-map, fixed-function filtering, texture prefetching and caching policies) Readings: (due for class Oct 4) Z. Hakura and A. Gupta, The Design and Analysis of a Cache Architecture for Texture Mapping. ISCA 1997 H. Igehy et al., Prefetching in a Texture Cache Architecture. Graphics Hardware 1998 Supplemental material: L. Williams, Pyramidal Parametrics. Computer Graphics 1983 (this is the original mip-mapping paper) P. Heckbert and H. Moreton, Interpolation for Polygon Texture Mapping and Shading. State of the Art in Computer Graphics: Visualization and Modeling. 1991 D. Peachey, Texture on Demand. Pixar Animation Studios - Technical Memo #217, 1990 |
At this point in the course we have covered many of the basic algorithms implemented by graphics pipelines before the introduction of application-programmable shading stages. We have also discussed key factors in the graphics pipeline's implementation. The following papers describe graphics systems of historical significance that may be of interest to the class. Given course lectures so far, you should be able to digest most of the material in these papers. For the curious: H. Fuchs et al., Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories. Computer Graphics 89 S. Molnar et al., PixelFlow: High-Speed Rendering Using Image Composition. Computer Graphics 92 K. Akeley, RealityEngine Graphics. SIGGRAPH 93 J. Montrym et al., InfiniteReality: A Real-Time Graphics System. SIGGRAPH 97 | |
Oct 4: | The GPU Programmable Processing Core (implicit vs. explicit SIMD execution, large-scale multi-threading, thread scheduling, modern GPU case studies: NVIDIA Fermi and AMD Cypress) Review reading: (combines topics up to, but not including, today's GPU core lecture; due for class Oct 6) K. Akeley, RealityEngine Graphics. SIGGRAPH 93 Supplemental material on GPU processing cores: K. Fatahalian and M. Houston, A Closer Look at GPUs. Communications of the ACM, October 2008 E. Lindholm et al., A User-Programmable Vertex Engine. SIGGRAPH 2001 J. Montrym et al., The GeForce 6800. IEEE Micro, March-April, 2005 E. Lindholm et al., NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, March 2008 C. R. Wittenbrink et al., Fermi GF100 GPU Architecture. IEEE Micro, March 2011 L. Seiler et al., Larrabee: A Many-Core x86 Architecture for Visual Computing. SIGGRAPH 2008 A. Levinthal and T. Porter, Chap - A SIMD Graphics Processor. Computer Graphics 1984 M. Gebhart et al., Energy-efficient Mechanisms for Managing Thread Context in Throughput Processors. ISCA 2011 |
Oct 6: | GPU Memory Hierarchy (accessing data, the role of caches, bandwidth-limited applications) Readings: (due for class Oct 11) E. Lindholm et al., NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro, March 2008 Supplemental material: K. Fatahalian et al., Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication, Graphics Hardware 2004 (no longer true of GPUs, but an interesting workload analysis example) |
Oct 11: | Guest Speaker: The Life of an Architect at AMD During Graphics Core Next Architecture Development Mike Mantor, Senior Fellow, AMD |
Oct 13: | Guest Speaker: Implementation of Tessellation and the Geometry Shader in NVIDIA Fermi Henry Moreton, NVIDIA |
Oct 18: | Review of SIMT Execution + Introduction to Shading (Part 1: an extended discussion of how to implement SIMT execution. Part 2: the rendering equation and why expressing shading/lighting functions requires programmability) Homework Assignment 1 [Solution] (due for class Oct 20) Supplemental material on SIMT/SIMD (also see material from Oct 4th lecture): M. Abrash, A First Look at the Larrabee New Instructions, Dr. Dobbs, April 2009. (original on-line posting here) A. Levinthal and T. Porter, Chap - A SIMD Graphics Processor. Computer Graphics 1984 W. Fung et al., Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. International Symposium on Microarchitecture, 2007 V. Narasiman et al., Improving GPU Performance via Large Warps and Two-Level Warp Scheduling. University of Texas Technical Report, TR-HPS-2010-006 |
Oct 20: | Shading Languages (key primitives, level of abstraction decisions [mimic hardware or align with domain?], Renderman Shading Language, modern GPU shading languages) Readings: W. R. Mark et al., Cg: A system for programming graphics hardware in a C-like language. SIGGRAPH 2003 P. Hanrahan and J. Lawson, A Language for Shading and Lighting Calculations. SIGGRAPH 90 T. Foley et al., Spark: Modular, Composable Shaders for Graphics Hardware. SIGGRAPH 2011 Supplemental material on shading languages: R. Cook, Shade Trees. SIGGRAPH 84 K. Perlin, An Image Synthesizer. SIGGRAPH 85 K. Proudfoot et al., A Real-Time Procedural Shading System for Programmable Graphics Hardware, SIGGRAPH 2001 M. McCool et al., Shader Metaprogramming, Graphics Hardware 2002 A. Apodaca and L. Gritz, Advanced Renderman: Creating CGI for Motion Pictures. 1999 (Chapter 7 and Chapter 9) |
Oct 25: | GPU Computing and the CUDA/OpenCL Programming Model (history of GPGPU leading to CUDA (and then OpenCL), its good and not-so-good ideas, looking forward to CPU/GPU integration, GRAMPS) Supplemental material: NVIDIA, NVIDIA CUDA C Programming Guide. 2011 (skim through chapters 1-2 of the CUDA programming guide, or take a look at one of the OpenCL tutorials on the web) S. Larsen and D. McAllister, Fast Matrix Multiplies using Graphics Hardware. Supercomputing 2001 I. Buck et al., Brook for GPUs: Stream Computing on Graphics Hardware. SIGGRAPH 2004 J. Sugerman et al., GRAMPS: A Programming Model for Graphics Pipelines. Transactions on Graphics 2009 |
Oct 27: | Guest Speaker: Heterogeneous Client Programming (Graphics and OpenCL) David Blythe, Chief Graphics Software Architect, Intel |
Nov 1: | Deferred Shading (deferred shading costs/benefits in terms of coherence, operation count, and bandwidth; anti-aliasing in deferred rendering systems, tile-based deferred shading) Supplemental material: A. Lauritzen, Deferred Rendering for Current and Future Rendering Pipelines. Beyond Programmable Shading, SIGGRAPH 2010 Courses (corresponding source code available here) J. Andersson, Parallel Graphics in Frostbite - Current & Future. Beyond Programmable Shading I, SIGGRAPH 2009 Courses (see deferred rendering notes in second half of talk) A. Reshetov, Morphological Anti-Aliasing. High Performance Graphics 2009 S. Molnar et al., PixelFlow: High-Speed Rendering Using Image Composition. Computer Graphics 92 |
Nov 3: | Reyes Architecture and Implementation (design constraints, differences between the Reyes pipeline and the real-time OpenGL/Direct3D graphics pipeline) Motion blur demos: White ball on black background experiment from class (images rendered with PRman, box pixel filter used) Animation: fast camera pan [ 32 samples per pixel ] Animation: rolling ball [ 16 samples per pixel ] (note error due to linear approximation of rotation) Required reading: (due for class Nov 3) R. Cook et al., The Reyes Image Rendering Architecture. SIGGRAPH 87 Supplemental material: A. Apodaca and L. Gritz, Advanced Renderman: Creating CGI for Motion Pictures. 1999 (Chapter 6) L. Carpenter, The A-Buffer: an Anti-Aliased Hidden Surface Method. SIGGRAPH 84 J. Lane et al., Scan Line Methods for Displaying Parametrically Defined Surfaces. Communications of the ACM 1980 R. Cook, Stochastic Sampling in Computer Graphics. Transactions on Graphics 1986 |
Nov 8: | Real-Time Ray Tracing (modern packet-based approaches, ray reordering, ray-tracing vs. rasterization) Required reading: (due for class Nov 10) T. Aila and S. Laine, Understanding the Efficiency of Ray Traversal on GPUs. High Performance Graphics 2009 Supplemental material on modern ray packet tracing: I. Wald et al., Ray Tracing Deformable Scenes Using Dynamic Bounding Volume Hierarchies. Transactions on Graphics 2007 (great reference for a modern packet tracer) I. Wald et al., SIMD Ray Traversal With Generalized Ray Packets and On-the-Fly Re-Ordering. Univ. of Utah Tech Report UUSCI-2007-012 Boulos et al., Adaptive Ray Packet Reordering. Symposium on Interactive Ray Tracing 2008 I. Wald et al., Getting Rid of Packets: Efficient SIMD Single-Ray Traversal using Multi-Branching BVHs. Symposium on Interactive Ray Tracing 2008 Supplemental material on global ray reordering for increased locality: M. Pharr et al., Rendering Complex Scenes With Memory-Coherent Ray Tracing. SIGGRAPH 97 T. Aila and S. Laine, Architecture Considerations for Tracing Incoherent Rays. High Performance Graphics 2010 More good ray tracing reads: S. Parker et al., OptiX: A General Purpose Ray Tracing Engine. SIGGRAPH 2010 P. Christensen et al., Ray Tracing for the Movie Cars. Symposium on Interactive Ray Tracing 2007 W. Hunt and W. R. Mark, Ray-Specialized Acceleration Structures for Ray Tracing. Symposium on Interactive Ray Tracing 2008 High-quality, open-source ray tracing implementations: PBRT: Physically Based Rendering. (textbook and full source implementation. Not fastest ray tracer, but fully-featured) Manta Interactive Ray Tracer (excellent CPU-based packet tracing system from the University of Utah) Intel Embree (highly optimized CPU-based single-ray tracing system from Intel, SSE/AVX implementations) |
Nov 10: | The Light Field and Image-Based Rendering (light-field theory, how image-based rendering has found its legs in rendering massive data: e.g., Google Street View, Microsoft Photosynth) Suggested readings: M. Levoy and P. Hanrahan, Light Field Rendering. SIGGRAPH 1996 S. J. Gortler et al., The Lumigraph. SIGGRAPH 1996 W. R. Mark and G. Bishop, Efficient Reconstruction Techniques for Post-Rendering Image Warping. UNC Technical Report TR98-011 N. Snavely et al., Photo Tourism: Exploring Photo Collections in 3D. SIGGRAPH 2006 |
Nov 15: | Digital Camera Image Processing Pipeline, Part I (sensor basics, noise sources, the early stages of the image processing pipeline) Suggested readings: The Stanford CS178 course notes are an incredible source of detail about digital cameras. The demos shown in class (and more) are available on the Stanford CS178 pages. The brave may want to check out the description of the image processing pipeline in Sections 12.4.6 through 12.4.8 of the TI OMAP35x Application Processor Technical Reference Manual (now getting dated). The OMAP4 version is in Section 8.3 here. Homework Assignment 2: Related paper links are below (due for class Nov 22) D. E. Shaw et al., Anton: a Special-Purpose Machine for Molecular Dynamics Simulation. ISCA 2007 D. E. Shaw, A Fast, Scalable Method for the Parallel Evaluation of Distance-Limited Pairwise Particle Interactions. Journal of Computational Chemistry. 2005 Z. DeVito et al., Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers. Supercomputing 2011 |
Nov 17: | Digital Camera Image Processing Pipeline, Part II (later stages of the image processing pipeline up through JPG compression, preshot basics: auto-focus, auto-exposure, hardware implementations: TI OMAP, NVIDIA Tegra) Suggested readings: (Please see suggested readings from the previous lecture.) |
Nov 22: | Beyond RGB Pixels I: The Light-Field Camera (plenoptic cameras) (computational challenges of acquiring light fields; applications of light-field photography: synthetic aperture, post-shot refocusing) Suggested readings: E. H. Adelson and J. Wang, Single Lens Stereo with a Plenoptic Camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992 B. Wilburn et al., High Performance Imaging Using Large Camera Arrays. SIGGRAPH 2005 R. Ng et al., Light Field Photography with a Hand-Held Plenoptic Camera. Stanford University Tech Report, 2005 R. Ng, Digital Light Field Photography. Stanford Ph.D. Dissertation, 2006 (Chapters 1-4 provide great descriptions of the topics we covered in class, with great figures!) |
Nov 29: | Beyond RGB Pixels II: Depth Cameras (case study of how the Microsoft Kinect system works: the fixed-function processing and new algorithms that made it practical on Xbox 360) Suggested reading: J. Shotton et al., Real-Time Human Pose Recognition in Parts from Single Depth Images. CVPR 2011 Z. Zalevsky et al., Method and System for Object Reconstruction. International Patent WO 2007/043036 A1 (This is one of the original patents by the PrimeSense folks.) ROS.org, Technical Description of Kinect Calibration (Speculation about what the Kinect actually computes, by those who have to calibrate it for robots.) |
Dec 1: | Smarter Cameras: Emerging Efforts to Make Cameras Programmable (trends in camera phones, Stanford's Frankencamera project, challenges of 'always-on' cameras) Required reading: (due for class Dec 1) A. Adams et al., The Frankencamera: An Experimental Platform for Computational Photography. SIGGRAPH 2010 Suggested readings: Willow Garage is now supporting the FCam project. See the project page. |
Dec 6: | Course Review + Tips for Giving a Good Presentation Suggested readings: I. Sutherland, Technology and Courage. SunLabs Perspectives Series, 1996 (This essay is based on a famous talk Sutherland gave at CMU in 1982.) |
Dec 8: | No class: project work day |
Assignments / Grading
Students will be expected to complete and analyze required course readings and to carry out a semester-long research project of their choosing (in groups of 1 or 2). One or two short assignments will be given during the semester (involving literature presentation and architecture design/thought experiments).
See the project details page for expectations and deadlines for projects.
Reading summaries should be submitted on the reviews web site. (In the collections list, look for the collection called CMU 15-869: Graphics and Imaging Architectures -- Fall 2011.)
Prerequisites
Knowledge of real-time 3D rendering as presented in an introductory graphics class (e.g., 15-462) is strongly recommended. Background in computer architecture (at the level of 15-213) or parallel computing is also recommended. If you are unsure about your preparedness for the material in this course, just ask!
Other Courses With Similar Topics
- Real-Time Graphics Architectures (Akeley and Hanrahan, Stanford University, CS448, Spring 2007)
- Beyond Programmable Shading (Houston and Lefohn) There is a Stanford version (CS448s, Spring 2011) and a University of Washington version (CSE 558, Winter 2011) of this course.
- SIGGRAPH 2011 Course: Beyond Programmable Shading (also offered, with varying content, in 2008, 2009, and 2010)
- Stanford CS178 - Digital Photography (Levoy)
- MIT 6.815/6.865 - Digital and Computational Photography (Durand)