📓 Text Generation docs rework

What is this?

This is an issue to discuss and track the rework of the docs for text generation. Comments and feedback are appreciated, as always 🤗

Current issues

1. Our main reference for text generation is not in the docs and is quite outdated
2. The docs regarding text generation are scattered, and it is not simple to navigate between them -- the reader has to know where to look for them
3. We lack examples beyond the simplest forms of text generation
4. We have undocumented advanced use cases, such as setting custom stopping criteria
5. We are not clear about what the user can't do

Proposed plan

EDIT:

incorporated feedback up to this comment (including)
Also includes this comment

I'd like to split the plan into three parts:

Designing a simpler entry point to text generation, from which all related documentation is discoverable
Upgrading the developer guides to cover the full potential of text generation
Making our code more self-documenting, along with other code changes

1. Designing a simpler entry point for text generation docs

Tackles issues 1 and 2.

This part is further divided into two actions:

The [blog post](https://huggingface.co/blog/how-to-generate) is still a solid reference for the background in text generation, but it holds old examples (TensorFlow!) and focuses a bit too much on top_p/top_k. Let's retouch it.
Create a short tutorial to serve as an entry point to the multiple forms of text generation. Like the other tutorials, it contains references to related docs throughout the text (let's see if it is enough to handle discoverability -- we can create a stand-alone related docs section in the future if needed). It would also cover a few basics like "use left-padding when doing batched generation with decoder-only models" and "double-check your generate kwargs".
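
To illustrate the left-padding tip above, here is a minimal sketch that works on plain integer lists rather than real tokenizer output. The helper `left_pad` and the pad id `0` are hypothetical and not part of transformers. The idea: a decoder-only model continues from the last position of each row, so padding has to go on the left to keep the final real token at the end.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id, chosen for illustration only


def left_pad(batch: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the left to the length of the longest one.

    Returns (input_ids, attention_mask); the mask is 0 on padding, 1 elsewhere.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


prompts = [[11, 12, 13], [21]]  # two already-tokenized prompts of different length
ids, mask = left_pad(prompts)
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

With real tokenizers, the usual way to get this behavior is to set `padding_side="left"` on the tokenizer and pass the resulting attention mask to `generate`.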

Related docs:

Tasks
Related developer guides
API reference
Outside transformers (e.g. optimum, text-generation-inference, LLM leaderboard, non-HF libs like autogptq?)

2. Upgrading the developer guides

Tackles issues 3 and 4.

We currently have one developer guide, which covers the API and a few basic ways to manipulate text generation. I propose we improve the existing one and add 3 new guides, preferably with examples that cover more modalities and use cases:

1. Improve the existing guide -- Add a section about the impact of logits processors, and another on how stopping conditions operate.
2. "Prompting" -- Some basic "dos and don'ts" regarding prompting, how different types of models respond differently to it (encoder-decoder vs decoder-only, instruction-tuned vs base), and the importance of prompting in chat applications
3. Using LLMs, with a focus on the first L (large) -- write about data types, quantization, device mapping, advanced architectures (ALiBi, RoPE, MQA/GQA), FlashAttention
4. Advanced examples (name?) -- Concrete use cases that make use of many features at once, to serve as inspiration: how to control between extractive and abstractive summarization, retrieval-augmented generation, and other modality-specific examples
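
For item 1, a toy decoding loop may help show where logits processors and stopping conditions plug in. Everything below is a hypothetical sketch, not the transformers implementation: `toy_model`, `no_repeat_processor`, `stop_on_token` and `greedy_generate` are made-up names. The callables only loosely mirror the `(input_ids, scores)` call shape used by the library's `LogitsProcessor` and `StoppingCriteria` classes, with a dict standing in for a tensor of logits.

```python
from typing import Callable, Dict, List

Scores = Dict[int, float]  # token id -> logit (toy stand-in for a tensor row)


def toy_model(input_ids: List[int]) -> Scores:
    """Hypothetical 'model': prefers repeating the last token, then last + 1."""
    last = input_ids[-1]
    return {last: 2.0, last + 1: 1.0, 99: 0.5}


def no_repeat_processor(input_ids: List[int], scores: Scores) -> Scores:
    """Logits processor: forbid emitting the previous token again."""
    banned = input_ids[-1]
    return {tok: (float("-inf") if tok == banned else s) for tok, s in scores.items()}


def stop_on_token(stop_id: int) -> Callable[[List[int], Scores], bool]:
    """Stopping criterion factory: stop once `stop_id` has been generated."""
    return lambda input_ids, scores: input_ids[-1] == stop_id


def greedy_generate(prompt, processors, stopping_criteria, max_new_tokens=10):
    ids = list(prompt)
    for _ in range(max_new_tokens):
        scores = toy_model(ids)
        for process in processors:       # processors rewrite the scores...
            scores = process(ids, scores)
        ids.append(max(scores, key=scores.get))  # ...before the token is picked
        if any(stop(ids, scores) for stop in stopping_criteria):
            break                        # criteria are checked after each new token
    return ids


print(greedy_generate([1], [], [], max_new_tokens=3))                    # [1, 1, 1, 1]
print(greedy_generate([1], [no_repeat_processor], [stop_on_token(4)]))   # [1, 2, 3, 4]
```

Without the processor, the toy model loops on the same token. With it, the output changes at every step, and the stopping criterion ends generation before `max_new_tokens` is reached.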

3. Self-documenting code and other code changes

Tackles issues 3 and 5.

Let's be honest -- the best user experience is when no docs are needed at all. We can improve our game here by validating generation parameters. Currently, our validation step is very superficial, and users are allowed to do things like passing temperature with do_sample=False, which ultimately results in GitHub issues. I'd suggest performing a hard validation and throwing informative exceptions that point to the redesigned docs 🤗
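
As a rough sketch of what such a hard validation could look like: the function `validate_generation_kwargs`, the `DOCS_HINT` string and the exact rules are assumptions made for illustration, not existing transformers code.

```python
from typing import Optional

# Placeholder pointer; the real message would link to the redesigned docs.
DOCS_HINT = "See the text generation docs for valid flag combinations."


def validate_generation_kwargs(
    do_sample: bool = False,
    temperature: Optional[float] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> None:
    """Raise ValueError on flag combinations that are invalid or would have no effect."""
    sampling_only = {"temperature": temperature, "top_k": top_k, "top_p": top_p}
    set_flags = [name for name, value in sampling_only.items() if value is not None]
    if not do_sample and set_flags:
        raise ValueError(
            f"{', '.join(set_flags)} only take effect when do_sample=True, "
            f"but do_sample=False was passed. {DOCS_HINT}"
        )
    if temperature is not None and temperature <= 0:
        raise ValueError(f"temperature must be > 0, got {temperature}. {DOCS_HINT}")
    if top_p is not None and not 0 < top_p <= 1:
        raise ValueError(f"top_p must be in (0, 1], got {top_p}. {DOCS_HINT}")


validate_generation_kwargs(do_sample=True, temperature=0.7)  # passes silently

try:
    validate_generation_kwargs(do_sample=False, temperature=0.7)
except ValueError as err:
    print(err)  # names the offending flag and points to the docs
```

Failing early with the name of the offending flag keeps the user from silently getting greedy output when they expected sampling.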
In parallel, our logits processors and stopping condition classes are missing docstring examples on how to use them. Adding these examples should make our API reference much more robust.