Serverless LLM Hosting - Featherless.ai (original) (raw)
Largest AI inference access to 4300 open source models
Instantly deploy at scale for fine-tuning, testing, and production with unlimited tokens.
Trusted by AI teams worldwide
Explore and access models instantly
We provide inference via API to a continually expanding library of open-weight models, including the most popular models for coding assistance, deep research, creative writing, and more.
How people use Featherless
OpenHands
OpenHands is an open source AI software development platform to streamline sofware development by automating coding tasks using intelligent agents. Developers can now focus on more complex challenges teaming up with AI supported by Featherless. See how to get started in this guide.
Novelcrafter
NovelCrafter is an AI-powered writing platform designed to assist authors throughout the entire novel-writing process, from initial brainstorming to final edits. You can level up your creative writing with any model from Featherless extensive catalog, from ones that are known for poetic prose to specialized ones in dialogue or vast world knowledge.
WyvernChat
WyvernChat is a user-first AI chat app with sleek UX and consistent content policy. Finding the right model isn't simply a technical choice; it's giving life to your character within unique identity and personality. Featherless has built-in support into WyvernChat so you can make use of our growing catalog of open source models for your favorite characters and creative writing.
LangChain
LangChain is one of the most widely adopted libraries that offer developers powerful tools to manage complex prompts and conversational state. With our OpenAI SDK compatibility you can power your applications with Featherless and our catalog of open models. See the docs for LangChain and LiteLLM.
Featherless is a serverless inference provider offering advanced model loading and GPU orchestration capabilities. Access our extensive catalog of thousands of models without the burden of server management or operational overhead. Our transparent billing structure is predictable, ensuring no unexpected costs.
Provider | Cost | Speed | Choice |
---|---|---|---|
RunPod | (thousands) | ||
HuggingFace | (thousands) | ||
Anthropic | (<10 models) | ||
OpenRouter | (~200 models) | ||
Featherless | ![]() |
Flat pricing with unlimited tokens
Feather Basic
$10.00/month
Access to models up to 15B
Up to 2 concurrent connections
Up to 16K context
Regular speed
Feather Premium
$25.00/month
Access any model - no limit on size!
Up to 4 concurrent connections
Up to 16K context
Regular speed
Feather Scale
$75.00/month
Business plan that can scale to arbitrarily many concurrent connections
Each scale unit allows for:
8 concurrent requests to models less than or equal to 15B, or
4 concurrent requests to models less than or equal to 34B, or
2 concurrent requests to models less than or equal to 72B, or
a linear combination of the above
Private, secure, and anonymous usage - no logs
Concurrency is scaled based on quantity of the selected plan. DeepSeek R1 and V3 currently excluded
How many concurrencies do you need?
For enterprise, you can run your own catalog on us from your cloud with reduced GPU.
Frequently Asked Questions
What is Featherless?
Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.
Do you log my chat history?
No. We do not log any of the prompts or completions sent to our API.
Which model architectures are supported?
Our goal is to provide serverless inference for all models on Hugging Face. We currently support a wide range of llama models including Llama 2 and 3, Mistral, Qwen and Deep Seek. For more details see https://featherless.ai/docs/model-compatibility.
How do I get models added?
Business customers can deploy models through their dashboard. Users on individual plans can request either on discord or by emailing [email protected].