Notice
Open Assistant has now concluded. Please see this video for more information. Thank you to all those who made this project possible.
Introduction
The FAQ page is available here.
Open Assistant (abbreviated as OA) is a chat-based and open-source assistant. The vision of the project is to make a large language model that can run on a single high-end consumer GPU. With some modifications, Open Assistant should also be able to interface with other third-party applications easily as well as retrieve information from databases and the Internet.
You can play with our current best model here!
You should join the Open Assistant Discord server and/or comment on GitHub issues before making any major changes. Most dev communications take place on the Discord server. There are four main areas that you can work on:
- Ranking, labelling and making responses in open-assistant.io. You can take a look at the tasks docs section for more information.
- Curating datasets and performing data augmentation. This includes scraping, gathering other public datasets, etc. Most of these efforts are concentrated in /data/datasets and are documented here.
- Creating and fine-tuning Open Assistant itself. For that, you should pay special attention to /model.
- open-assistant.io dev. Take a close look at /website as well as /backend.
GitHub folders explanation
Do read the developer guide for further information.
Here's a list of first-level folders on Open Assistant's GitHub page.
- /ansible - for managing the full stack using Ansible
- /assets - contains logos
- /backend - backend for open-assistant.io and Discord bots, may be helpful for testing API calls locally
- /copilot - read more at AWS's Copilot. And no, this is not a folder that contains something similar to OpenAI's Codex.
- /data - contains /data/datasets, which holds data scraping code and links to datasets on Hugging Face
- /deploy
- /discord-bot - Discord bots that serve as a frontend for volunteer data collection
- /docker
- /docs - this website!
- /inference - inference pipeline for Open Assistant model
- /model - currently contains scripts and tools for training/fine-tuning Open Assistant and other neural networks
- /notebooks - DEPRECATED in favor of /data/datasets. Contains Jupyter notebooks for data scraping and augmentation
- /oasst-shared - shared Python code for Open Assistant
- /scripts - contains miscellaneous scripts
- /text-frontend
- /website - everything in open-assistant.io, including gamification
Principles
- We put the human in the center
- We need to get the MVP out fast, while we still have momentum
- We pull in one direction
- We are pragmatic
- We aim for models that can (or could, with some effort) be run on consumer hardware
- We rapidly validate our ML experiments on a small scale, before going to a supercluster