What is a generative model?

A generative model uses artificial intelligence (AI) and statistical and probabilistic methods to create representations or abstractions of observed phenomena or target variables. These representations can then be used to generate new data similar to the observed data.

Generative modeling is used in unsupervised machine learning (ML) to describe the patterns in data, giving computers a statistical picture of the real world. This learned picture can then be used to estimate the probability of outcomes related to the modeled subject.

Generative models are a class of statistical models that generate new data instances.

How generative models work

Modern generative models are generally built on neural networks. To create a generative model, a large data set is typically required. The model is trained by feeding it examples from the data set and adjusting its parameters to better match the distribution of the data.

Once the model is trained, it can be used to generate new data by sampling from the learned distribution. The generated data can be similar to the original data set but with some variations or noise. For example, a data set containing images of horses could be used to build a model that can generate a new image of a horse that has never existed but still looks almost realistic. This is possible because the model has learned the general rules that govern the appearance of a horse.
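The train-then-sample loop described above can be illustrated with a deliberately tiny sketch. It assumes the "model" is a single Gaussian rather than a neural network, and the horse-height data set is hypothetical and generated on the spot; the function names `fit_gaussian` and `sample` are illustrative, not part of any library.

```python
import random
import statistics

def fit_gaussian(data):
    """'Training': estimate the mean and standard deviation that match the data."""
    return statistics.fmean(data), statistics.pstdev(data)

def sample(mean, std, n, rng):
    """'Generation': draw n new values from the learned distribution."""
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)
# Hypothetical training set: 1,000 horse heights in cm (synthetic, for illustration).
training_data = [rng.gauss(160.0, 8.0) for _ in range(1000)]

mean, std = fit_gaussian(training_data)
new_samples = sample(mean, std, 5, rng)

print(f"learned mean={mean:.1f}, std={std:.1f}")
print("generated:", [round(x, 1) for x in new_samples])
```

The generated values never appeared in the training set, yet they follow the same distribution. Real generative models replace the two Gaussian parameters with millions of neural network weights, but the fit-then-sample structure is the same.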

Generative models can also be used in unsupervised learning to discover underlying patterns and structure in unlabeled data as well as many other applications, such as image generation, speech generation and data augmentation.

Types of generative models

The following are prominent types of generative models:

Generative adversarial network (GAN)

This model is based on ML and deep neural networks. In it, two neural networks -- a generator and a discriminator -- compete against each other, pushing the generator to produce more realistic data and the discriminator to make more accurate predictions about what is real.

Figure: The GAN training process.

A GAN is an unsupervised learning technique that makes it possible to automatically find and learn different patterns in input data. One of its main uses is image-to-image translation, which can change daylight photos into nighttime photos. GANs are also used to create incredibly lifelike renderings of a variety of objects, people and scenes that are challenging for even a human brain to identify as fake.

Variational autoencoders (VAEs)

Similar to GANs, VAEs are generative models based on neural network autoencoders, which are composed of two separate neural networks -- encoders and decoders. They're among the more efficient and practical methods for developing generative models.

A VAE is a probabilistic graphical model grounded in Bayesian inference. It seeks to learn the underlying probability distribution of the training data so that it can quickly sample new data from that distribution. In a VAE, the encoder compresses input data into a compact representation, whereas the decoder reconstructs data from that representation. Popular applications of VAEs include anomaly detection for predictive maintenance, signal processing and security analytics applications.

Autoregressive models

Autoregressive models predict future values based on historical values and can easily handle a variety of time-series patterns. These models predict the future values of a sequence based on a linear combination of the sequence's past values.
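The linear-combination idea can be sketched with the simplest case, a first-order model in which each value depends only on the one before it: x[t] = c + phi * x[t-1]. The sketch below is a minimal illustration under that assumption; the function names `fit_ar1` and `forecast` are hypothetical, and the series is synthetic with known parameters so the fit can be checked by eye.

```python
import random

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi

def forecast(series, c, phi, steps):
    """Roll the fitted model forward, feeding each prediction back in."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

rng = random.Random(1)
# Synthetic series generated with c=2.0 and phi=0.7 (illustrative values only).
series = [10.0]
for _ in range(2000):
    series.append(2.0 + 0.7 * series[-1] + rng.gauss(0.0, 1.0))

c, phi = fit_ar1(series)
print(f"c={c:.2f}, phi={phi:.2f}")
print("next 3 values:", [round(v, 2) for v in forecast(series, c, phi, 3)])
```

Higher-order models use more past values in the same way, and neural autoregressive models replace the linear formula with a learned nonlinear function.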

Autoregressive models are widely used in forecasting and time series analysis, such as stock prices and index values. Other use cases include modeling and forecasting weather patterns, forecasting demand for products using past sales data and studying health outcomes and crime rates.

Bayesian networks

Bayesian networks are graphical models that depict probabilistic relationships between variables. They excel in situations where understanding cause and effect is vital. For instance, in medical diagnostics, a Bayesian network can effectively assess the probability of a disease based on observed symptoms.
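The medical example can be sketched as a three-node network in which one disease node influences two symptom nodes. All probabilities below are hypothetical values chosen only for illustration, and the function name `posterior_disease` is likewise an assumption of this sketch.

```python
# Hypothetical probabilities, chosen only for illustration.
P_DISEASE = 0.01
P_SYMPTOM_GIVEN = {
    # symptom: (P(symptom | disease), P(symptom | no disease))
    "fever": (0.90, 0.10),
    "cough": (0.80, 0.20),
}

def posterior_disease(observed):
    """Return P(disease | observed symptoms).

    Assumes each symptom depends only on the disease node.
    `observed` maps a symptom name to True (present) or False (absent).
    """
    joint_yes = P_DISEASE          # P(disease, observations)
    joint_no = 1.0 - P_DISEASE     # P(no disease, observations)
    for symptom, present in observed.items():
        p_yes, p_no = P_SYMPTOM_GIVEN[symptom]
        joint_yes *= p_yes if present else 1.0 - p_yes
        joint_no *= p_no if present else 1.0 - p_no
    return joint_yes / (joint_yes + joint_no)

print(round(posterior_disease({"fever": True}), 3))
print(round(posterior_disease({"fever": True, "cough": True}), 3))
```

Each additional observed symptom updates the probability of the disease, which is the kind of reasoning the network's graph structure makes explicit.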

Diffusion models

Diffusion models are trained by progressively adding noise to training data and learning to reverse that process. Once trained, they generate new data by starting from random noise and removing it step by step.

They're widely used for image generation. They share a name with, but are distinct from, the diffusion models used to analyze how phenomena spread, such as rumors in social networks or infectious diseases within a population.
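The noising half of a generative diffusion model can be sketched in a few lines. The sketch below shows only the forward process; the reverse process requires a trained denoising network and is not shown. The update rule follows a commonly used formulation, and the noise schedule, step count and the name `forward_diffuse` are assumptions made for illustration.

```python
import math
import random

def forward_diffuse(x0, betas, rng):
    """Noise a value step by step and return every intermediate value."""
    x = x0
    trajectory = [x]
    for beta in betas:
        noise = rng.gauss(0.0, 1.0)
        # Shrink the signal slightly and mix in a little Gaussian noise.
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * noise
        trajectory.append(x)
    return trajectory

# Hypothetical linear noise schedule: 300 steps rising from 0.001 to 0.05.
STEPS = 300
betas = [0.001 + (0.05 - 0.001) * t / (STEPS - 1) for t in range(STEPS)]

rng = random.Random(0)
# Every sample starts at the same "data" value, 2.0.
finals = [forward_diffuse(2.0, betas, rng)[-1] for _ in range(500)]
mean = sum(finals) / len(finals)
var = sum((v - mean) ** 2 for v in finals) / len(finals)
print(f"after {STEPS} steps: mean={mean:.2f}, variance={var:.2f}")
```

After enough steps the original value is effectively erased and the samples look like standard Gaussian noise. A trained model learns to undo these steps one at a time, which is what lets it turn fresh noise into new data.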

Restricted Boltzmann machines

RBMs are two-layered neural networks capable of learning the probability distribution of input data. They're used in recommendation systems, such as suggesting movies on streaming services based on user preferences.

Pixel recurrent neural networks

PixelRNNs are a type of generative model designed for image generation tasks. They're based on the concept of recurrent neural networks and are specifically trained to model images pixel by pixel, to generate new images that resemble the ones in the training data.

Markov chains

Markov chains are generative models that forecast future states based solely on the current state while ignoring any prior states. They're commonly used in text generation, where the next word in a sentence is predicted based only on the word currently in use.
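The text generation use case can be sketched as follows. This is a minimal word-level example on a made-up corpus; the function names `build_chain` and `generate` are illustrative only.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng):
    """Walk the chain: each next word depends only on the current word."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Made-up corpus, for illustration only.
corpus = "the horse runs fast and the horse eats hay and the dog runs home"
chain = build_chain(corpus)
print(generate(chain, "the", 8, random.Random(0)))
```

Because a word that follows another more often appears more often in its list, `rng.choice` picks successors in proportion to how frequently they were observed.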

Normalizing flows

These generative models transform a simple, easily sampled probability distribution, such as a Gaussian distribution, into a more complex distribution capable of modeling real-world data.

The primary purpose of normalizing flows is to apply a series of invertible transformations to a simple distribution so that after these transformations, the resulting distribution closely matches the target data distribution.
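A minimal sketch of this idea chains two invertible transformations, a scale-and-shift followed by an exponential, on top of a standard Gaussian. The class and function names are hypothetical, and the parameters are fixed by hand here; in practice they would be learned by maximizing the log-density of the training data.

```python
import math
import random

class Affine:
    """x = scale * z + shift (invertible when scale != 0)."""
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift
    def forward(self, z):
        return self.scale * z + self.shift
    def inverse(self, x):
        return (x - self.shift) / self.scale
    def log_abs_det(self, z):
        return math.log(abs(self.scale))

class Exp:
    """x = exp(z), mapping the real line onto positive numbers."""
    def forward(self, z):
        return math.exp(z)
    def inverse(self, x):
        return math.log(x)
    def log_abs_det(self, z):
        return z  # derivative of exp(z) is exp(z), so its log is z

def sample(layers, rng):
    """Draw from the simple base distribution, then push through every layer."""
    x = rng.gauss(0.0, 1.0)
    for layer in layers:
        x = layer.forward(x)
    return x

def log_density(layers, x):
    """Change of variables: log p(x) = log N(z; 0, 1) - sum of log|derivative|."""
    log_det = 0.0
    for layer in reversed(layers):
        x = layer.inverse(x)             # x is now this layer's input
        log_det += layer.log_abs_det(x)
    base = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    return base - log_det

flow = [Affine(0.5, 1.0), Exp()]
rng = random.Random(0)
samples = [sample(flow, rng) for _ in range(5)]
print([round(s, 2) for s in samples])
print(round(log_density(flow, 2.0), 4))
```

Because every layer is invertible, the same chain supports both sampling (forward) and exact density evaluation (inverse), which is the property that distinguishes normalizing flows from GANs and VAEs.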

Generative models use cases

Generative models have a wide array of applications across various fields. Some notable use cases of generative models include the following:

Generative modeling vs. discriminative modeling

Machine learning models are typically classified into discriminative and generative models. Both serve different purposes in ML, each with a unique approach to understanding data.

Generative modeling contrasts with discriminative modeling, which learns to tell existing data points apart and can be used to classify data. A generative model captures how the data itself is distributed and can therefore produce new examples, whereas a discriminative model captures the conditional probability of a label given an input, which lets it recognize tags and sort data. A generative model can be enhanced by a discriminative model and vice versa. In a GAN, for example, this is done by having the generative model try to fool the discriminative model into believing the generated images are real. Through successive rounds of training, both become more sophisticated at their tasks.
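The distinction can be made concrete with a small sketch. The class below is a hypothetical generative classifier that models each class with one Gaussian. Because it models how the data is distributed, it can both classify through Bayes' rule and generate new examples; a purely discriminative model would offer only the first capability. The data and all names are assumptions made for illustration.

```python
import math
import random
import statistics

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

class GenerativeClassifier:
    """Models P(x, y) = P(y) * P(x | y) with one Gaussian per class."""

    def fit(self, xs, ys):
        self.params = {}
        for label in set(ys):
            values = [x for x, y in zip(xs, ys) if y == label]
            self.params[label] = (
                len(values) / len(xs),       # prior P(y)
                statistics.fmean(values),    # mean of P(x | y)
                statistics.pstdev(values),   # std of P(x | y)
            )
        return self

    def predict_proba(self, x):
        """The discriminative-style answer P(y | x), obtained via Bayes' rule."""
        joint = {y: p * gaussian_pdf(x, m, s) for y, (p, m, s) in self.params.items()}
        total = sum(joint.values())
        return {y: v / total for y, v in joint.items()}

    def sample(self, label, rng):
        """Generate a new x for a class, which requires a model of the data."""
        _, mean, std = self.params[label]
        return rng.gauss(mean, std)

rng = random.Random(0)
# Hypothetical labeled heights in cm (synthetic, for illustration).
xs = [rng.gauss(130.0, 6.0) for _ in range(300)] + [rng.gauss(160.0, 8.0) for _ in range(300)]
ys = ["pony"] * 300 + ["horse"] * 300

model = GenerativeClassifier().fit(xs, ys)
print(model.predict_proba(150.0))
print(round(model.sample("horse", rng), 1))
```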

The following is a brief rundown of major differences between the two models:

Benefits of generative models

Generative models offer the following advantages, which make them valuable in various applications:

Challenges of generative models

Generative models provide several advantages, but they also have the following drawbacks:

Deep generative modeling

A subset of generative modeling, deep generative modeling uses deep neural networks to learn the underlying distribution of data. These models can produce novel samples that resemble the input data without duplicating it. Deep generative models come in many forms, including VAEs, GANs and autoregressive models. These models have proven promising in a wide range of applications, including text-to-image synthesis, music generation and drug discovery.

However, deep generative modeling remains an active area of research with many challenges. These include difficulties evaluating the quality of generated samples and preventing mode collapse, which occurs when the generator starts producing similar or identical samples and so covers only a few of the modes of the data distribution.

Large-scale deep generative models are increasingly popular. For example, BigGAN and VQ-VAE are used to generate images and can have hundreds of millions of parameters. Jukebox is another large generative model for musical audio that has billions of parameters. OpenAI's third-generation Generative Pre-trained Transformer (GPT-3), an autoregressive neural language model like its predecessors, contains 175 billion parameters. GPT-4o improves on the previous versions of GPT in terms of dependability, originality and the capacity to comprehend complex instructions. It can process up to 128,000 tokens, enabling it to handle more complex prompts.

GPT-5 is expected to be released in 2025. According to unconfirmed estimates, its training data will be both extensive and diverse, combining around 70 trillion tokens across 281 terabytes of data.

Generative modeling history and timeline

Generative models have been a mainstay of AI since the 1950s. Early models, including hidden Markov models and Gaussian mixture models, could generate only simple data. However, the field has experienced a significant rise in popularity in recent years, thanks to the development of powerful generative models such as GANs and VAEs.

Figure: The evolution and timeline of generative AI.

Ian Goodfellow and his colleagues first proposed GANs in 2014, introducing the two-part generator and discriminator architecture. The generator creates new data, while the discriminator tries to distinguish between the generated data and real data. The generator learns to improve its output by attempting to fool the discriminator.

In 2017, the transformer -- a deep learning architecture that underpins large language models including GPT-3, Google LaMDA and DeepMind Gopher -- was introduced. Transformer-based models can generate text, computer code and even protein structures.

In 2021, OpenAI introduced a technique called Contrastive Language-Image Pre-training (CLIP) that's used heavily by text-to-image generators. Using image-caption pairs gathered from the internet, CLIP is particularly successful at discovering shared embeddings between images and text.

Since CLIP's release, multiple vision-language algorithms have emerged, including Meta AI's MetaCLIP, PubMedCLIP for medical visual question answering, and BioCLIP for classifying organisms by their biological taxonomy.

Recent generative AI services, such as OpenAI's DALL-E and ChatGPT, have fueled generative AI's rapid rise to prominence.

The release of GPT-4 and GPT-4 Vision in 2023 ignited the multimodal revolution, demonstrating remarkable capabilities in processing text and visual data and elevating multimodal AI to new heights by enabling even more sophisticated and realistic interactions. This rapid progress, along with the 2024 release of the latest version, GPT-4o, has solidified multimodal AI and large multimodal models as some of the most prominent trends in generative AI for 2024.

These models have been applied in various fields, such as computer vision, natural language processing and music generation. Generative modeling has also seen advancements in quantum machine learning and reinforcement learning. In general, the rise of generative modeling has opened up many new possibilities for AI and has the potential to transform a wide range of industries, from entertainment to healthcare.


This was last updated in January 2025
