Stanford CS348K, Spring 2025

VISUAL COMPUTING SYSTEMS

Visual computing tasks such as computational imaging, 3D graphics, image/video understanding, generative content creation (images, videos, 3D models, interactive worlds), and AI-driven problem solving (agents) are key responsibilities of modern computer systems ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and AI students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info

Tues/Thurs 10:30-11:50pm

Location: Lathrop 018

Welcome to CS348K Spring 2025. Please see the course info page for more info on policies and logistics, as well as answers to common questions like "Am I prepared to take this class?" This is a paper-reading and in-class discussion-based course, so live attendance is expected of all participants.

Spring 2025 Schedule

| Date | Topic | Details |
| ------ | ----- | ------- |
| Apr 01 | Course Introduction + Importance of Explicit Goals and Constraints | Discussion of modern visual computing applications, a design exercise |
| Apr 03 | Digital Camera Processing Pipeline (Part I) - Algorithms | Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, multi-scale processing with Gaussian and Laplacian pyramids, HDR (local tone mapping) |
| Apr 08 | Digital Camera Processing Pipeline (Part II) + Camera Programming Abstractions | The Frankencamera, modern camera APIs, advanced image understanding responsibilities of digital cameras (portrait mode, autofocus, etc.) |
| Apr 10 | Scheduling Image Processing Algorithms in Halide | Key optimization ideas, a detailed look at Halide's scheduling algebra |
| Apr 15 | Hardware Acceleration: What's in a Modern GPU + AI Accelerators | GPUs, AI accelerators, special instructions for DNN evaluation (and their efficiency vs custom ASIC), choice of precision in arithmetic, flexibility vs efficiency trade-offs |
| Apr 17 | Adding Controls to Generative AI Systems (for images, video, 3D, animation) | The importance of predictable control in content creation. Techniques for inserting new forms of control into generative image synthesis, role of human-interpretable abstractions. |
| Apr 22 | Controlling Generative AI (Part II) + Neurosymbolic Representations | Aligning controls with user expectations, forms of "loose" control, how modern systems increasingly combine traditional symbolic structures with learned structures. The role of code as a representation for content. What should be learned and what should be kept interpretable? |
| Apr 24 | Data Curation and Cleaning: the Unsexy Secret of Generative AI | A look at the data curation and data cleaning pipelines that select the data that goes into modern generative AI models. |
| Apr 29 | The Role of Virtual World Simulation in Training Agents | Potential impact on creating functional content, role of agents as verifiers, implications for real-world robotics, etc. |
| May 01 | High-Throughput World Simulation for Agent Training | Motivation for batch simulators (training RL agents), the design of the Madrona simulation engine. |
| May 06 | High-Throughput World Simulation for Agent Training (Part II) | More detail on fast simulation platforms for training agents in virtual worlds. |
| May 08 | Generative Interactive World Models | Generative AI models that yield interactive worlds, their relationship to traditional simulation engines. When are models desirable? What is a future simulation engine? |
| May 13 | LLM-Based Problem Solving Agents | LLM-based techniques for creating agents that can learn skills in virtual worlds. |
| May 15 | Problem Solving Agents (Part 2) | Techniques for using AI agents to simulate humans and human populations. |
| May 20 | Video Compression + Video Conferencing Systems | H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, ML-based compression methods, emerging opportunities for compression when machines, not humans, will observe most images |
| May 22 | Systems for Developing AI Agents | The DSPy system for efficiently developing and executing agentic designs. |
| May 27 | Differentiable Rendering to Reconstruct Visual Data | The role of differentiable rendering (and differentiable programming in general) in recovering 3D shapes/scenes, textures, materials, etc. from images. |
| May 29 | The Design of the Differentiable Slang Programming Language | How the Slang language provides flexible support for auto-differentiation. Understanding the difference between mechanism and policy in system design. |
| Jun 03 | Project Presentations | Students present projects. |

Assignments