William J. Dally's Home Page has moved
Last updated Oct. 3, 2001
Bill Dally is a Professor of Electrical Engineering and Computer Science at Stanford University. He is a member of the Computer Systems Laboratory, leads the Concurrent VLSI Architecture Group, and teaches courses on Computer Architecture, Computer Design, and VLSI Design.
Before coming to Stanford, Bill was a Professor in the department of Electrical Engineering and Computer Science at MIT.
Current Projects
We are developing a streaming supercomputer (SS), scalable from a single chip to thousands of chips, that we estimate will improve performance per unit cost by a factor of 100 over conventional cluster-based supercomputers on a wide range of demanding numerical computations. To achieve this goal, the SS combines stream processing with a high-performance network that provides access to a globally shared memory.
Imagine: A High-Performance Image and Signal Processor
Imagine is a programmable signal and image processor that provides the performance and performance density of a special-purpose processor. Imagine achieves a peak performance of 20 GFLOPS (single-precision floating point) and 40 GOPS (16-bit fixed point) and sustains over 12 GFLOPS and 20 GOPS on key signal processing benchmarks. On these same benchmarks, Imagine sustains a power efficiency of 3.7 GFLOPS/W, 20 times better than the most efficient conventional signal processors.
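The figures above imply a few derived ratios: sustained throughput of roughly 60% of peak for floating point and 50% for fixed point, about 3.2 W of power at 12 GFLOPS, and roughly 0.19 GFLOPS/W for the conventional processors being compared. The sketch below just does that arithmetic. It assumes (this is not stated above) that the 12 GFLOPS and 3.7 GFLOPS/W figures describe the same benchmark runs, so the implied wattage is a rough estimate only.

```python
# Figures quoted for Imagine. The sustained values are quoted as "over",
# so ratios computed from them are lower bounds.
PEAK_GFLOPS = 20.0        # single-precision floating-point peak
PEAK_GOPS = 40.0          # 16-bit fixed-point peak
SUSTAINED_GFLOPS = 12.0   # sustained on key signal processing benchmarks
SUSTAINED_GOPS = 20.0     # sustained on key signal processing benchmarks
GFLOPS_PER_WATT = 3.7     # sustained power efficiency on the same benchmarks
ADVANTAGE = 20.0          # factor better than conventional signal processors


def derived_figures():
    """Ratios implied by the quoted numbers (simple arithmetic only)."""
    return {
        "flops_fraction_of_peak": SUSTAINED_GFLOPS / PEAK_GFLOPS,
        "ops_fraction_of_peak": SUSTAINED_GOPS / PEAK_GOPS,
        # Assumption: 12 GFLOPS and 3.7 GFLOPS/W refer to the same runs.
        "implied_watts": SUSTAINED_GFLOPS / GFLOPS_PER_WATT,
        # Efficiency of the conventional processors implied by the 20x claim.
        "implied_dsp_gflops_per_watt": GFLOPS_PER_WATT / ADVANTAGE,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.3g}")
```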
Scalable Network Fabrics
We are developing architectures and technologies that enable large, scalable, high-performance interconnection networks for parallel computers, network switches and routers, and high-performance I/O systems. Recent results include a hierarchical network topology that makes efficient use of a combination of electrical and optical links, a locality-preserving randomized oblivious routing algorithm, a method for scheduling constrained crossbar switches, new speculative and reservation-based flow control methods, and a method for computing the worst-case traffic pattern for any oblivious routing function.
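The method for computing worst-case traffic is not described on this page. As a hedged illustration of one standard way to frame the problem: an oblivious routing function fixes, for every source-destination pair (s, d), the fraction of that traffic that crosses each channel. Channel load is therefore linear in the traffic matrix, and because admissible traffic (each node sends and receives at most one unit) is bounded by convex combinations of permutations (Birkhoff-von Neumann), the worst case for a channel is reached at a permutation, which amounts to a maximum-weight bipartite matching. The sketch assumes a hypothetical n-node bidirectional ring with minimal routing (ties split evenly). It brute-forces permutations, so it is only practical for tiny n. It is not the group's algorithm or code, and all names are illustrative.

```python
from itertools import permutations

# Hypothetical example network: an n-node bidirectional ring.
# A channel is (node, direction): direction +1 is node -> node+1 (mod n),
# direction -1 is node -> node-1 (mod n).


def ring_channel_fraction(n, s, d, chan):
    """Fraction of s->d traffic crossing `chan` under minimal routing,
    with equal-length (tie) routes split 50/50 between directions."""
    if s == d:
        return 0.0
    node, direction = chan
    cw = (d - s) % n   # hops in the +1 direction
    ccw = n - cw       # hops in the -1 direction
    if cw < ccw:
        share = {+1: 1.0, -1: 0.0}
    elif ccw < cw:
        share = {+1: 0.0, -1: 1.0}
    else:
        share = {+1: 0.5, -1: 0.5}
    hops = cw if direction == +1 else ccw
    # Channels used in this direction start at s, s+dir, ..., s+(hops-1)*dir.
    on_path = any((s + direction * k) % n == node for k in range(hops))
    return share[direction] if on_path else 0.0


def uniform_load(n, chan, fraction):
    """Load on `chan` when every node sends 1/n of a unit to every node."""
    return sum(fraction(n, s, d, chan) for s in range(n) for d in range(n)) / n


def worst_case_load(n, chan, fraction):
    """Maximum load on `chan` over all permutation traffic patterns.

    Load is linear in the traffic matrix, so over admissible traffic the
    maximum is reached at a permutation. Brute force is exponential and only
    suitable for tiny n; a max-weight bipartite matching scales better."""
    best = 0.0
    for perm in permutations(range(n)):
        load = sum(fraction(n, s, perm[s], chan) for s in range(n))
        best = max(best, load)
    return best


if __name__ == "__main__":
    c = (0, +1)  # the channel from node 0 to node 1
    print("uniform:", uniform_load(4, c, ring_channel_fraction))
    print("worst case:", worst_case_load(4, c, ring_channel_fraction))
```

On the 4-node ring, channel 0->1 carries 0.5 units under uniform traffic and 1.0 under the worst-case permutation. In this toy example, the worst case is therefore twice the uniform load.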
We are investigating combined processor/memory architectures that best exploit the semiconductor technologies expected in 2009. We envision these architectures comprising tens to hundreds of processors and memory banks on a single semiconductor chip. Our research addresses the design of the processors and memories, the architecture of the interconnection network that ties them together, and mechanisms that simplify programming such machines.
We are developing methods and circuits that stretch the performance bounds of electrical signaling between chips, boards, and cabinets in a digital system. We have developed a prototype 0.25 µm, 4 Gb/s CMOS transceiver that dissipates only 130 mW and is amenable to large-scale integration. Future chips include a 20 Gb/s, 0.13 µm CMOS transceiver.
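A common figure of merit for signaling circuits is energy per bit, which is power divided by bit rate. For the prototype above, 130 mW at 4 Gb/s works out to 32.5 pJ/bit. The sketch below shows the unit conversion. The function name is hypothetical, and it assumes that the 130 mW figure covers the whole transceiver at its full 4 Gb/s rate, which the text implies but does not break down.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per transmitted bit, in picojoules.

    mW / (Gb/s) = (1e-3 J/s) / (1e9 bit/s) = 1e-12 J/bit = 1 pJ/bit,
    so dividing milliwatts by gigabits per second gives pJ/bit directly.
    """
    return power_mw / rate_gbps


if __name__ == "__main__":
    # Prototype 0.25 um transceiver: 130 mW at 4 Gb/s.
    print(energy_per_bit_pj(130.0, 4.0), "pJ/bit")
```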
Recent Projects
The M-Machine is an experimental parallel computer that demonstrated highly efficient mechanisms for parallelism, including two-level multithreading, efficient network interfaces, fast communication and synchronization, and support for efficient shared-memory protocols.
Another project is a high-performance multicomputer router that demonstrates new technologies ranging from architecture to circuit design. At the architecture level, the router uses a novel adaptive routing algorithm, a link-level retry protocol, and a unique token protocol. Together, the two protocols greatly reduce the cost of providing reliable, exactly-once end-to-end communication. At the circuit level, the router demonstrates the latest version of our simultaneous bidirectional pads and a new method for plesiochronous synchronization.
The J-Machine is an experimental parallel computer, in operation since July 1991, that demonstrates mechanisms that greatly reduce the overhead of interprocessor interaction.
Selected Publications
- William Dally, Pat Hanrahan, and Ron Fedkiw, "A Streaming Supercomputer," white paper, September 18, 2001.
- Brucek Khailany, William J. Dally, Scott Rixner, Ujval J. Kapasi, Peter Mattson, Jin Namkoong, John D. Owens, Brian Towles, and Andrew Chang, "Imagine: Media Processing with Streams," IEEE Micro, March/April 2001.
- William J. Dally and John W. Poulton, Digital Systems Engineering, Cambridge University Press, 1998.
- Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, and Whay S. Lee, "The M-Machine Multicomputer," International Journal of Parallel Programming, Special Issue on Instruction-Level Parallel Processing, Part II, Vol. 25, No. 3, 1997, pp. 183-212.
- William J. Dally, Andrew Chang, Andrew Chien, Stuart Fiske, Waldemar Horwat, John Keen, Richard Lethin, Michael Noakes, Peter Nuth, Ellen Spertus, Deborah Wallach, and D. Scott Wills, "The J-Machine," retrospective in 25 Years of the International Symposia on Computer Architecture: Selected Papers, pp. 54-58.
- William J. Dally, "Virtual Channel Flow Control," IEEE Transactions on Parallel and Distributed Systems, March 1992, pp. 194-205.
A more complete list of publications can be found at the CVA group publications page.
Recent Talks, Etc...
- Tomorrow's Computing Engines, presented at HPCA on February 3, 1998
- Retrospective Paper on the J-Machine
- Interconnect-Oriented Architecture and Circuits
- VLSI Architecture: Past, Present, and Future
Courses
- CS 99S The Coming Revolution in Computer Architecture (freshman seminar)
- EE271 Introduction to VLSI Systems
- EE273 Digital Systems Engineering
- EE282 Computer Architecture and Organization
- EE482a Advanced Computer Organization: Processor Microarchitecture
- EE482b Advanced Computer Organization: Interconnection Networks
- 6.823 Computer System Architecture
- 6.845 Concurrent VLSI Architecture
- 6.915 Digital Systems Engineering
William J. Dally
Stanford University
Computer Systems Laboratory
Gates Room 314
Stanford, CA 94305
(650) 725-8945
FAX: (650) 725-6949