Visual Computing Systems: Stanford Winter 2018
Stanford CS348V, Winter 2018
VISUAL COMPUTING SYSTEMS
Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems, ranging from sensor-rich smartphones to autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. The course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for those platforms), and for graphics, vision, and machine learning students who wish to understand throughput computing principles so they can design new algorithms that map efficiently to these machines.
Basic Info
Tues/Thurs 1:30-2:50pm
Mitchell Earth Sciences B67
See the course info page for more info on course policies, logistics, and how to prepare for the course.
Winter 2018 Schedule (subject to change)
Jan 9 | Course Introduction + Review of Parallel Hardware Architecture
Multi-core, SIMD, and hardware multi-threading in the context of modern multi-core CPUs, GPUs, FPGAs, and ASICs; understanding latency and bandwidth constraints
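To make the SIMD execution model concrete, here is a toy pure-Python sketch (all names are illustrative, not from any real ISA) that mimics a 4-wide vector unit executing a data-dependent branch with per-lane masks:

```python
VECTOR_WIDTH = 4

def simd_conditional_double(values):
    """Process `values` in 4-wide chunks, doubling only positive lanes.

    Both sides of the branch are evaluated for every lane; a mask
    selects which result each lane keeps -- the cost model behind
    SIMD divergence on real hardware.
    """
    out = []
    for base in range(0, len(values), VECTOR_WIDTH):
        chunk = values[base:base + VECTOR_WIDTH]
        mask = [v > 0 for v in chunk]           # per-lane predicate
        doubled = [v * 2 for v in chunk]        # "then" side, computed for all lanes
        out.extend(d if m else v for m, d, v in zip(mask, doubled, chunk))
    return out

print(simd_conditional_double([1, -2, 3, -4, 5]))  # [2, -2, 6, -4, 10]
```

Note that the doubled values are computed for every lane, even masked-off ones; on real SIMD hardware this wasted work is exactly the cost of divergent control flow.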
Part 1: High Efficiency Image and Video Processing
Jan 11 | Overview of a Modern Digital Camera Processing Pipeline
Algorithms taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting aberrations, autofocus/autoexposure, high-dynamic-range processing via multi-shot techniques
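As a taste of the demosaicing step, the sketch below does simple bilinear interpolation of the green channel on an RGGB Bayer mosaic (illustrative pure Python; production ISPs use edge-aware variants in fixed-function hardware):

```python
def bayer_color(y, x):
    """Color sampled at (y, x) in an RGGB Bayer mosaic."""
    if y % 2 == 0:
        return 'R' if x % 2 == 0 else 'G'
    return 'G' if x % 2 == 0 else 'B'

def interpolate_green(mosaic):
    """Recover a full green channel by averaging each missing pixel's
    green neighbors (bilinear demosaicing, the simplest possible scheme)."""
    h, w = len(mosaic), len(mosaic[0])
    green = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if bayer_color(y, x) == 'G':
                green[y][x] = float(mosaic[y][x])   # measured directly
            else:
                neighbors = [mosaic[ny][nx]
                             for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]
                green[y][x] = sum(neighbors) / len(neighbors)
    return green
```

In an RGGB layout every red and blue site has green pixels directly above, below, left, and right, which is why the simple neighbor average works.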
Jan 16 | Camera Pipeline Part II + Image Processing Algorithms You Should Know
Pyramidal/multi-resolution techniques, local Laplacian filters, bilateral filters (via the bilateral grid), optical flow
Jan 18 | Efficiently Scheduling Image Processing Algorithms on Parallel Hardware
Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
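The locality-vs.-work trade-off that Halide's schedules expose can be illustrated in plain Python (a sketch of the idea, not Halide code): two schedules of the same two-stage blur pipeline that produce identical output but differ in intermediate storage and redundant work.

```python
def blur_breadth_first(img):
    """Schedule 1: materialize the full blur-in-x intermediate
    (no redundant work, maximum memory footprint and traffic)."""
    h, w = len(img), len(img[0])
    bx = [[(img[y][x - 1] + img[y][x] + img[y][x + 1]) / 3 for x in range(1, w - 1)]
          for y in range(h)]
    return [[(bx[y - 1][x] + bx[y][x] + bx[y + 1][x]) / 3 for x in range(w - 2)]
            for y in range(1, h - 1)]

def blur_fused(img):
    """Schedule 2: recompute blur-in-x on demand inside the consumer
    (no intermediate storage; redundant work, better locality)."""
    def bx(y, x):
        return (img[y][x - 1] + img[y][x] + img[y][x + 1]) / 3
    h, w = len(img), len(img[0])
    return [[(bx(y - 1, x) + bx(y, x) + bx(y + 1, x)) / 3 for x in range(1, w - 1)]
            for y in range(1, h - 1)]
```

Halide's contribution is separating this choice (the schedule) from the algorithm itself, so the two versions above would be one algorithm with two one-line schedule annotations.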
Jan 23 | Specialized Hardware for Image Processing
Contrasting the efficiency of GPUs, DSPs, image signal processors, and FPGAs for image processing; domain-specific languages for hardware synthesis such as Darkroom/Rigel; compiling Halide to hardware
Jan 25 | Lossy Image (JPEG) and Video (H.264) Compression
Basics of JPEG and H.264 encoding, motivations for ASIC acceleration, future opportunities for compression when machines, not humans, will observe most images
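At the core of JPEG is an 8x8 DCT followed by quantization. A direct (deliberately slow) sketch, assuming a single uniform quantizer rather than JPEG's per-frequency tables:

```python
import math

N = 8  # JPEG block size

def dct_2d(block):
    """Direct O(N^4) 2D DCT-II of an 8x8 block -- the transform at the
    heart of JPEG (real encoders use fast factored implementations)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * x + 1) * v * math.pi / (2 * N))
                    for y in range(N) for x in range(N))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, q=16):
    """Uniform quantization of DCT coefficients -- the lossy step.
    (JPEG uses a per-frequency quantization table, not a single q.)"""
    return [[round(c / q) for c in row] for row in coeffs]
```

The quantization step is where information is discarded: small high-frequency coefficients round to zero, and long runs of zeros are what the subsequent entropy coder compresses so well.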
Jan 30 | The Light Field, Computational Cameras, and Display/Capture for VR
Light field representation, light field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
Part 2: Efficient Training and Evaluation of DNNs for Visual Understanding
Feb 1 | Workload Characteristics of DNN Inference for Image Analysis
DNN topologies, reduction to dense linear algebra, challenges of direct implementation, where the compute lies in the network, motivations for three modern "trunks": Inception/ResNet/MobileNet, what it means to be fully convolutional
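The "reduction to dense linear algebra" above is commonly done via im2col: each receptive field is unrolled into a row, so convolution becomes a single matrix multiply. A minimal single-channel, no-padding sketch (function names are illustrative):

```python
def im2col(img, k):
    """Each output row is one flattened k x k patch of `img`,
    in raster order over valid output positions."""
    h, w = len(img), len(img[0])
    return [[img[y + dy][x + dx] for dy in range(k) for dx in range(k)]
            for y in range(h - k + 1) for x in range(w - k + 1)]

def conv_as_matmul(img, kernel):
    """2D convolution (cross-correlation) expressed as
    patch-matrix times flattened-kernel -- the form a BLAS GEMM accelerates."""
    k = len(kernel)
    flat = [kernel[dy][dx] for dy in range(k) for dx in range(k)]
    return [sum(p * f for p, f in zip(patch, flat)) for patch in im2col(img, k)]
```

The cost of this transformation is that overlapping patches duplicate input pixels (roughly k^2-fold), which is why direct and Winograd convolutions remain competitive on bandwidth-limited hardware.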
Feb 6 | Scheduling and Algorithms for Parallel DNN Training at Scale
Footprint challenges of training, model vs. data parallelism, the asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
Feb 8 | A Case Study of Algorithmic Optimizations for Object Detection
Motivating R-CNN/Fast R-CNN/Faster R-CNN, the alternative design of SSD/YOLO, the philosophy of end-to-end training
Feb 13 | Leveraging Task-Specific DNN Structure for Improving Performance and Accuracy
Neural module networks (and their surprising effectiveness for VQA), learning to compress images/video, discussion of the value of modularity vs. end-to-end learning
Feb 15 | Hardware Accelerators for DNN Inference
GPUs, the Google TPU, special instructions for DNN evaluation, choice of precision, recent ISCA/MICRO papers on DNN acceleration
Feb 20 | Optimizing Inference on Video Streams
Specialization to the scene, exploiting temporal coherence in video, sharing across applications
Feb 22 | Video Processing at Datacenter Scale
Facebook SVE, ExCamera, Scanner
Part 3: GPU Implementation of the Real-Time 3D Graphics Pipeline
Feb 27 | Real-Time 3D Graphics Pipeline Architecture
The 3D graphics pipeline as a machine architecture (abstraction), basic pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Mar 1 | Hardware Acceleration of Texture Mapping and Depth-Buffering
Texture sampling and prefiltering basics, texture compression, depth- and color-buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
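The depth test at the heart of depth-buffering can be sketched in a few lines (a 1D toy; real GPUs add the buffer compression and hierarchical culling discussed in this lecture):

```python
def composite(fragments, width):
    """Depth-buffered fragment merge: keep the nearest fragment per pixel.

    `fragments` is a list of (x, depth, color) tuples in arbitrary
    submission order; smaller depth means closer to the camera.
    """
    depth = [float('inf')] * width   # z-buffer, cleared to "far"
    color = [None] * width           # color buffer
    for x, z, c in fragments:
        if z < depth[x]:             # the depth test
            depth[x] = z
            color[x] = c
    return color
```

Because the test is per-fragment and order-independent for opaque geometry, it is what lets the pipeline accept triangles in any order yet still resolve visibility correctly.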
Mar 6 | Scheduling the Graphics Pipeline onto a GPU
The Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth efficiency, deferred shading as a scheduling decision
Mar 8 | Large Scale Distributed Image Processing at Facebook (guest lecture)
Facebook Lumos
Mar 13 | Domain-Specific Languages for Shading
RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages
Mar 15 | Topic TBD
Assignments and Projects
All students will be expected to complete academic paper readings approximately every other class, three simple programming exercises (to reinforce concepts), and a self-selected final project (projects may be done in teams of up to two).