New (2h13m ๐Ÿ˜…) lecture: "Let's build the GPT Tokenizer"

Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and https://t.co/iSRD2la1Gv (original) (raw)

New (2h13m ๐Ÿ˜…) lecture: "Let's build the GPT Tokenizer" Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and ns to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI.

5:40 PM ยท Feb 20, 20241.7MViews