Paper page - Textbooks Are All You Need
Published on Jun 20, 2023 · Submitted by AK on Jun 21, 2023
Abstract
A new compact Transformer-based large language model for code, phi-1, achieves high accuracy on coding benchmarks despite having fewer parameters than competing models.
We introduce phi-1, a new large language model for code that is significantly smaller than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and textbooks and exercises synthetically generated with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and to phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
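For reference, pass@k is the standard code-generation metric introduced with the HumanEval benchmark (Chen et al., 2021): a problem counts as solved if any of k sampled completions passes its unit tests, estimated without bias from n >= k samples of which c pass. A minimal sketch of that estimator in Python; the sample counts in the usage line are illustrative, not phi-1's actual evaluation settings:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator from Chen et al. (2021): one minus the chance
    # that all k samples drawn (without replacement) from n generations
    # come from the n - c failing ones.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 this reduces to c / n, the fraction of single samples that pass.
print(pass_at_k(n=1000, c=506, k=1))  # 0.506, the scale of phi-1's HumanEval score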
Get this paper in your agent:
hf papers read 2306.11644
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 9
microsoft/phi-1 Text Generation • 1B params • Updated Nov 24, 2025 • 8.97k downloads • 220 likes
professorf/phi-1-gguf Text Generation • 1B params • Updated Aug 27, 2024 • 16 downloads • 1 like
michaelfeil/ct2fast-phi-1 Text Generation • Updated Nov 30, 2023 • 11 downloads
Browse 9 models citing this paper
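The microsoft/phi-1 checkpoint above loads through the standard transformers API. Below is a minimal sketch, assuming a recent transformers release with native phi support (older versions required trust_remote_code=True) and the accelerate package for device_map; the prompt is an arbitrary example, since phi-1 is a base model finetuned on coding exercises and a bare function signature is a natural input:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1",
    torch_dtype=torch.float16,  # 1.3B params fit comfortably on one GPU in fp16
    device_map="auto",          # requires the accelerate package
)

prompt = 'def is_prime(n: int) -> bool:\n    """Return True if n is prime."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))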
Datasets citing this paper 17
HuggingFaceTB/cosmopedia Viewer • Updated Aug 12, 2024 • 31.1M rows • 20.4k downloads • 690 likes
nampdn-ai/tiny-codes Viewer • Updated Sep 30, 2023 • 1.63M rows • 1.9k downloads • 288 likes
maywell/korean_textbooks Viewer • Updated Jan 10, 2024 • 4.42M rows • 1.34k downloads • 124 likes
goendalf666/sales-conversations Viewer • Updated Oct 4, 2023 • 3.41k rows • 294 downloads • 43 likes
Browse 17 datasets citing this paper
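A minimal sketch of sampling one of the datasets above with the Hugging Face datasets library; the "stories" config and the "text" column are taken from cosmopedia's published subsets, but treat both names as assumptions to verify against the dataset card:

from datasets import load_dataset

# Stream to avoid downloading the full ~31M-row dataset up front.
ds = load_dataset("HuggingFaceTB/cosmopedia", "stories",
                  split="train", streaming=True)

for i, row in enumerate(ds):
    print(row["text"][:200])  # a synthetic textbook-style passage
    if i == 2:
        break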