AI Canon

Research in artificial intelligence is increasing at an exponential rate. It’s difficult for AI experts to keep up with everything new being published, and even harder for beginners to know where to start.

So, in this post, we’re sharing a curated list of resources we’ve relied on to get smarter about modern AI. We call it the “AI Canon” because these papers, blog posts, courses, and guides have had an outsized impact on the field over the past several years.

We start with a gentle introduction to transformer and latent diffusion models, which are fueling the current AI wave. Next, we go deep on technical learning resources; practical guides to building with large language models (LLMs); and analysis of the AI market. Finally, we include a reference list of landmark research results, starting with “Attention is All You Need”—the 2017 paper by Google that introduced the world to transformer models and ushered in the age of generative AI.
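The core idea behind that 2017 paper is scaled dot-product attention: each token builds its output as a softmax-weighted mix of value vectors, with weights derived from query-key similarity. As a rough illustration only (a minimal NumPy sketch, not the paper's full multi-head implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, per the 2017 paper's core equation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                            # weighted mix of value vectors

# Toy example: 3 tokens attending over 3 key/value pairs, d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)  # shape (3, 4)
```

The full transformer stacks many such attention layers (with multiple heads, feed-forward blocks, and positional information), but this weighted-mixing step is the mechanism the paper's title refers to.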

A gentle introduction…

These articles require no specialized background and can help you get up to speed quickly on the most important parts of the modern AI wave.


Foundational learning: neural networks, backpropagation, and embeddings

These resources provide a base understanding of fundamental ideas in machine learning and AI, from the basics of deep learning to university-level courses from AI experts.

Explainers

Courses


Tech deep dive: understanding transformers and large models

There are countless resources—some better than others—attempting to explain how LLMs work. Here are some of our favorites, targeting a wide range of readers/viewers.

Explainers

Courses

Reference and commentary


Practical guides to building with LLMs

A new application stack is emerging with LLMs at the core. While there isn’t a lot of formal education available on this topic yet, we pulled out some of the most useful resources we’ve found.

Reference

Courses

LLM benchmarks


Market analysis

We’ve all marveled at what generative AI can produce, but there are still a lot of questions about what it all means. Which products and companies will survive and thrive? What happens to artists? How should companies use it? How will it affect jobs and society at large? Here are some attempts at answering these questions.

a16z thinking

Other perspectives


Landmark research results

Most of the amazing AI products we see today are the result of no-less-amazing research, carried out by experts inside large companies and leading universities. Lately, we’ve also seen impressive work from individuals and the open source community taking popular projects in new directions, for example by creating automated agents or porting models onto smaller hardware footprints.

Here’s a collection of many of these papers and projects, for folks who really want to dive deep into generative AI. (For research papers and projects, we’ve also included links to the accompanying blog posts or websites, where available, which tend to explain things at a higher level. And we’ve included original publication years so you can track foundational research over time.)

Large language models

New models

Model improvements (e.g. fine-tuning, retrieval, attention)

Image generation models

Agents

Other data modalities

Code generation

Video generation

Human biology and medical data

Audio generation

Multi-dimensional image generation

Special thanks to Jack Soslow, Jay Rughani, Marco Mascorro, Martin Casado, Rajko Radovanovic, and Vijay Pande for their contributions to this piece, and to the entire a16z team for an always informative discussion about the latest in AI. And thanks to Sonal Chokshi and the crypto team for building a long series of canons at the firm.