Hugging Face
What is Hugging Face?
Hugging Face is a machine learning (ML) and data science platform and community that helps users build, train and deploy machine learning models.
It provides the infrastructure to demo, run and deploy artificial intelligence (AI) in live applications. Users can also browse through models and data sets that other people have uploaded. Hugging Face is often called the GitHub of machine learning because it lets developers share and test their work openly.
Hugging Face is known for its Transformers Python library, which simplifies the process of downloading and training ML models. The library gives developers an efficient way to include one of the ML models hosted on Hugging Face in their workflow and create ML pipelines.
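As a minimal sketch of that workflow, the snippet below wraps the library's `pipeline` helper around a model hosted on the Hub. The model ID and the helper names `build_classifier` and `top_label` are illustrative assumptions, not taken from this article. Building the classifier needs the `transformers` package, a backend such as PyTorch and network access, because the model is downloaded on first use.

```python
from typing import List

# Assumption: a small public sentiment model, chosen only for illustration.
DEFAULT_MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def build_classifier(model_id: str = DEFAULT_MODEL_ID):
    """Return a text-classification pipeline backed by a Hub-hosted model.

    The import is deferred so the rest of this module still loads
    when transformers is not installed.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def top_label(predictions: List[dict]) -> str:
    """Pick the highest-scoring label from pipeline-style output.

    Pipelines return a list of dicts shaped like
    {"label": "POSITIVE", "score": 0.99}.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"]


# Usage (downloads the model on first run):
#   classifier = build_classifier()
#   print(top_label(classifier("Sharing models openly saves everyone time.")))
```

Swapping in a different hosted model should only require changing the model ID, provided the model supports the same task.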
The platform is important because of its open source nature and deployment tools. It lets users share resources, models and research, which reduces model training time, resource consumption and the environmental impact of AI development.
Hugging Face Inc. is the American company that created the Hugging Face platform. It was founded in New York City in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond and Thomas Wolf. The company originally developed a chatbot app of the same name for teenagers, then switched its focus to being a machine learning platform after open sourcing the model behind the chatbot.
In 2023, the company announced a partnership with Amazon Web Services to make Hugging Face products available to AWS customers for building custom applications. Google, Amazon and Nvidia are just a few of the companies that have invested in the startup as of this writing.
How is Hugging Face used?
Hugging Face is an AI platform and supporting community. The community uses Hugging Face to do the following:
- Implement machine learning models. Users can upload machine learning models to the platform. There are models for a variety of functions, including natural language processing (NLP), computer vision, image generation and audio.
- Share and discover machine learning models. Researchers and developers can share models with the community through the Hugging Face Hub. Other users can download these models, for example with the Transformers library, and use them in their own applications.
- Share and discover data sets. Researchers and developers can share data sets for training machine learning models or discover data sets to train their models through the Datasets library.
- Fine-tune models. Users can fine-tune and train deep learning models using Hugging Face's application programming interface (API) tools.
- Host demos. Hugging Face lets users create interactive, in-browser demos of machine learning models. This lets users showcase and test models more easily.
- Research. Hugging Face has been involved in collaborative research projects, such as the BigScience research workshop, aiming to advance the field of NLP. The site also hosts a page with a curated list of research papers.
- Develop business applications. Hugging Face's Enterprise Hub lets business users work with transformers, data sets and open source libraries in a privately hosted environment.
- Evaluate ML models. Hugging Face provides access to a code library for evaluating machine learning models and data sets.
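The evaluation item in the list above can be illustrated with a short sketch. It assumes the code library in question is the `evaluate` package and uses its `accuracy` metric; the article names neither, so treat both as assumptions. `local_accuracy` is a hypothetical plain-Python stand-in that shows what the metric computes without needing any download.

```python
from typing import Sequence


def local_accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if len(references) == 0:
        raise ValueError("at least one example is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def hub_accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Same metric computed with the evaluate library (assumed installed).

    evaluate.load fetches the metric on first use, so this needs
    network access.
    """
    import evaluate

    metric = evaluate.load("accuracy")
    result = metric.compute(
        predictions=list(predictions), references=list(references)
    )
    return result["accuracy"]


# Three of four predictions match the references.
print(local_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

The two functions are meant to agree on the same inputs; the library version becomes more useful once several metrics need to be reported in a consistent format.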
Hugging Face features
The Hugging Face Hub is where to find some of the main features of Hugging Face, including the following:
- Models. Hugging Face hosts a large library of models that users can filter by type. As of this writing, there are more than 300,000 models on Hugging Face. Hugging Face also hosts some of the top open source ML models on the platform. Some of the models on the leaderboard at the time of this writing include the following:
- Data sets. Data sets help train models to understand patterns and relationships between data -- and creating a good data set can be difficult. Hugging Face provides access to data sets uploaded by the community that users can access. Some example data sets in the Hugging Face library include the following:
  - the_pile_books3, which contains all data from Bibliotik in plain text. Bibliotik is a repository of 197,000 books.
  - wikipedia, which contains data from Wikipedia.
  - Anthropic/hh-rlhf, which contains human preference data about the helpfulness and harmlessness of AI outputs.
  - imdb, which contains a large collection of movie reviews.
- Spaces. Machine learning models on their own typically require technical knowledge to implement and use. Spaces packages models in a user-friendly experience that lets users showcase their work. Hugging Face provides the computing resources necessary to host demos. Spaces doesn't require any technical knowledge to use. Some examples of Hugging Face Spaces include the following:
  - LoRA the Explorer image generator. Users can generate images in a variety of different styles based on a prompt.
  - MusicGen music generator. MusicGen lets users generate music based on a description of the desired output or sample audio.
  - Image to Story. Users can upload an image, and a large language model uses text generation to write a story based on it.
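To make the data sets feature more concrete, here is a minimal sketch that loads a small slice of the imdb data set mentioned above with the Datasets library. The function names and the slice size are hypothetical, and the field names `text` and `label` are assumptions about how that data set is laid out. Loading requires the `datasets` package and network access.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def load_reviews(limit: int = 100):
    """Download the first `limit` training reviews from the imdb data set.

    The import is deferred so the helper below works without the
    datasets package installed.
    """
    from datasets import load_dataset

    return load_dataset("imdb", split=f"train[:{limit}]")


def label_counts(rows: Iterable[Mapping]) -> Dict[int, int]:
    """Count how many rows carry each label value."""
    return dict(Counter(row["label"] for row in rows))


# Usage (downloads data on first run):
#   reviews = load_reviews(200)
#   print(label_counts(reviews))
```

Checking the label balance of a slice like this is a quick sanity check before using it to train or evaluate a model.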
How to sign up for Hugging Face
Signing up for Hugging Face as a community contributor is free. Users get a Git-based repository where they can store Models, Datasets and Spaces. After creating an account, users can do the following:
- Check the activity feed.
- Access the Hugging Face Hub.
- Create organizations or private repositories.
- Explore their profile and adjust settings.
- Initiate a new Model, Dataset or Space.
- Discover the latest trends within the Hugging Face community.
- Review the organizations the user is a part of and access their specific sections.
- Access useful ML resources and documentation.
Hugging Face also offers a paid pro account that gives users access to more features, and an enterprise account at a slightly higher rate. The enterprise account adds enterprise-grade security and access control features, as well as dedicated customer support.
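The Git-based repositories described in this section can also be created from code. The sketch below assumes the `huggingface_hub` client package and an already authenticated account; the repository ID `your-username/demo-model` and both function names are hypothetical. `repo_url` only builds the web address for a repository and needs no network access.

```python
HUB_BASE_URL = "https://huggingface.co"

# URL path prefix per repository type; model repositories sit at the root.
_PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def repo_url(repo_id: str, repo_type: str = "model") -> str:
    """Build the web (and Git clone) address of a Hub repository."""
    if repo_type not in _PREFIXES:
        raise ValueError(f"unknown repo_type: {repo_type!r}")
    return f"{HUB_BASE_URL}/{_PREFIXES[repo_type]}{repo_id}"


def create_private_repo(repo_id: str, repo_type: str = "model") -> str:
    """Create a private repository on the Hub and return its address.

    Assumes huggingface_hub is installed and a valid access token is
    already configured for the account.
    """
    from huggingface_hub import create_repo

    return str(create_repo(repo_id, repo_type=repo_type, private=True, exist_ok=True))


# Hypothetical repository ID, used only for illustration.
print(repo_url("your-username/demo-model"))
print(repo_url("your-username/demo-data", repo_type="dataset"))
```

Because each repository is a Git repository, the address returned by `repo_url` can also be passed to `git clone`.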
Benefits of using Hugging Face
The open source, communal nature of Hugging Face provides several benefits:
- Accessibility. Hugging Face helps users bypass the restrictive compute and skill requirements typical of AI development. Its pre-trained models, fine-tuning scripts and deployment APIs make the process of creating LLMs easier.
- Integration. Hugging Face helps users integrate multiple ML frameworks. For example, the Transformers library integrates with other ML frameworks such as PyTorch and TensorFlow.
- Prototyping. Hugging Face enables rapid prototyping and deployment of NLP and ML applications.
- Community. Hugging Face provides access to a vast community, continuously updated models, and documentation and tutorials.
- Cost-effectiveness. Hugging Face provides cost-effective and scalable solutions for businesses. Building large ML models from scratch can be expensive, and using Hugging Face's hosted models saves money.
Challenges and considerations
There are also some considerations and risks to weigh against the benefits of Hugging Face, including the following:
- Bias. As with any pre-trained machine learning model, the models available on Hugging Face are susceptible to bias, which might cause the model to generate sexist, racist or homophobic content.
- Computational requirements. Some larger models on Hugging Face need more compute than the default amount the platform provides, so users would need to purchase additional resources. For example, BLOOM is a large multilingual language model that could potentially be costly to run.
- Support. The free and pro versions of the platform do not have dedicated customer support.
- Model search. It can sometimes be difficult to find appropriate models or libraries among the many hosted on the platform.
- Security. Enterprises using Hugging Face should make sure that the platform offers security measures that align with the data security needs of the business.
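A rough back-of-the-envelope calculation shows why the computational requirements above matter. The sketch below estimates the memory needed just to hold a model's weights. The 176 billion parameter count for BLOOM comes from the model's public description rather than from this article, and the function name is hypothetical. The estimate ignores activations, caches and any framework overhead, so real requirements are higher.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Assumption: BLOOM's publicly stated size of 176 billion parameters.
BLOOM_PARAMS = 176e9


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Estimate the memory for model weights alone, in gigabytes (1e9 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gb(BLOOM_PARAMS, dtype):.0f} GB")
```

Under these assumptions, the weights alone need roughly 352 GB at 16-bit precision, which is more than a single typical accelerator can hold.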
Hugging Face and the broader AI ecosystem
Hugging Face reinforces a more collaborative approach to AI development in comparison with other contemporary AI startups, which develop an AI service and charge people to use it while keeping the inner workings of the technology a trade secret.
As more companies seek to develop their own AI models, Hugging Face will provide developers with the tools to do so. As the saying goes: In a gold rush, sell shovels. Many large companies already collaborate with Hugging Face to take advantage of its development platform.
Hugging Face aims to distribute AI access to many instead of restricting it to a few key players. Some employees at generative AI companies hold the opinion that open source AI will out-compete closed source AI providers such as OpenAI and Google. A leaked communication from a Google researcher in early 2023 expressed the researcher's opinion that Google has "no moat" in the industry: "While we've been squabbling, a third faction has been quietly eating our lunch."
This was last updated in September 2023