Serverless LLM Hosting - Featherless.ai

The largest AI inference platform, with access to 4,300 open source models

Instantly deploy at scale for fine-tuning, testing, and production with unlimited tokens.

Get Started Read Docs

Trusted by AI teams worldwide


Explore and access models instantly

We provide inference via API to a continually expanding library of open-weight models, including the most popular models for coding assistance, deep research, creative writing, and more.
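Since the catalog is exposed through an API, a typical first step is listing the available models and filtering them for your use case. The sketch below assumes an OpenAI-style `GET /v1/models` endpoint and response shape; the endpoint path and the sample entries are assumptions for illustration, not real catalog data.

```python
# Sketch: filter a model listing for a use case. The endpoint path and the
# OpenAI-style response shape ({"data": [{"id": ...}, ...]}) are assumptions;
# the sample entries below are hypothetical, not real catalog data.
MODELS_URL = "https://api.featherless.ai/v1/models"  # assumed OpenAI-compatible path

def filter_models(listing: dict, keyword: str) -> list[str]:
    """Return model IDs whose name contains `keyword` (case-insensitive)."""
    return [m["id"] for m in listing["data"] if keyword.lower() in m["id"].lower()]

# Hypothetical response, standing in for an actual GET request to MODELS_URL:
sample_listing = {
    "data": [
        {"id": "example-org/coder-7b"},
        {"id": "example-org/writer-13b"},
        {"id": "example-org/coder-34b"},
    ]
}

print(filter_models(sample_listing, "coder"))
# → ['example-org/coder-7b', 'example-org/coder-34b']
```

In practice you would fetch the listing from the API with any HTTP client and apply the same filtering client-side.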

Leaderboard

See All Models

How people use Featherless

OpenHands

OpenHands is an open source AI software development platform that streamlines development by automating coding tasks with intelligent agents. Developers can focus on more complex challenges by teaming up with AI supported by Featherless. See how to get started in this guide.

Novelcrafter

NovelCrafter is an AI-powered writing platform designed to assist authors throughout the entire novel-writing process, from initial brainstorming to final edits. You can level up your creative writing with any model from Featherless' extensive catalog, from models known for poetic prose to those specialized in dialogue or vast world knowledge.

WyvernChat

WyvernChat is a user-first AI chat app with a sleek UX and a consistent content policy. Finding the right model isn't simply a technical choice; it gives life to a character's unique identity and personality. Featherless support is built into WyvernChat, so you can draw on our growing catalog of open source models for your favorite characters and creative writing.

LangChain

LangChain is one of the most widely adopted libraries for building LLM applications, offering developers powerful tools to manage complex prompts and conversational state. With our OpenAI SDK compatibility, you can power your applications with Featherless and our catalog of open models. See the docs for LangChain and LiteLLM.
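As a minimal sketch of what OpenAI SDK compatibility means in practice: a client points at a Featherless base URL and sends a standard chat-completions payload. The base URL and model ID below are assumptions (check the docs for exact values), and to keep the example self-contained and offline it only builds the request with the standard library rather than sending it.

```python
import json

API_BASE = "https://api.featherless.ai/v1"  # assumed OpenAI-compatible base URL
API_KEY = "YOUR_API_KEY"                    # placeholder credential

def chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical model ID; substitute one from the Featherless catalog.
payload = chat_payload("example-org/example-model", "Summarize LangChain in one line.")
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# POST json.dumps(payload) with `headers` to f"{API_BASE}/chat/completions"
# via the OpenAI SDK, LangChain's ChatOpenAI(base_url=...), or any HTTP client.
print(json.dumps(payload, indent=2))
```

Because the request shape is the standard OpenAI one, any tooling built on the OpenAI SDK should work by swapping in the base URL and key.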

Featherless is a serverless inference provider offering advanced model loading and GPU orchestration capabilities. Access our extensive catalog of thousands of models without the burden of server management or operational overhead. Our transparent billing structure is predictable, ensuring no unexpected costs.

Provider comparison (cost, speed, and model choice):

RunPod: thousands of models
HuggingFace: thousands of models
Anthropic: <10 models
OpenRouter: ~200 models
Featherless: thousands of models


Flat pricing with unlimited tokens

Feather Basic

$10.00/month

Access to models up to 15B

Up to 2 concurrent connections

Up to 16K context

Regular speed

Feather Premium

$25.00/month

Access any model - no limit on size!

Up to 4 concurrent connections

Up to 16K context

Regular speed

Feather Scale

$75.00/month

Business plan that can scale to arbitrarily many concurrent connections

Each scale unit allows for:

8 concurrent requests to models less than or equal to 15B, or

4 concurrent requests to models less than or equal to 34B, or

2 concurrent requests to models less than or equal to 72B, or

a linear combination of the above

Private, secure, and anonymous usage - no logs

Concurrency scales with the number of scale units purchased. DeepSeek R1 and V3 are currently excluded.

How many concurrencies do you need?
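The "linear combination" rule above can be made concrete with a small calculator. This is only a sketch of the arithmetic implied by the plan description, not an official tool: one scale unit covers 8 concurrent requests to models up to 15B, 4 up to 34B, or 2 up to 72B, so each request consumes the corresponding fraction of a unit.

```python
import math

# Per-request share of one scale unit, derived from the plan description:
# one unit = 8 requests (<=15B), 4 requests (<=34B), or 2 requests (<=72B).
UNIT_SHARE = {"15B": 1 / 8, "34B": 1 / 4, "72B": 1 / 2}

def scale_units_needed(n_15b: int = 0, n_34b: int = 0, n_72b: int = 0) -> int:
    """Whole scale units needed for a mixed concurrent workload."""
    load = (
        n_15b * UNIT_SHARE["15B"]
        + n_34b * UNIT_SHARE["34B"]
        + n_72b * UNIT_SHARE["72B"]
    )
    return math.ceil(load)

# 4 concurrent 15B requests + 2 concurrent 34B requests = 0.5 + 0.5 = 1 unit
print(scale_units_needed(n_15b=4, n_34b=2))  # → 1
# 8 x 15B + 2 x 72B = 1.0 + 1.0 = 2 units
print(scale_units_needed(n_15b=8, n_72b=2))  # → 2
```

At $75.00 per scale unit, multiplying the result by the unit price gives the monthly cost for that workload.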

For enterprise customers, you can run your own model catalog on our platform from your own cloud, with reduced GPU costs.

See Details.

Frequently Asked Questions

What is Featherless?

Featherless is an LLM hosting provider that offers our subscribers access to a continually expanding library of HuggingFace models.
Featherless: Less hassle, less effort. Start now.

Do you log my chat history?

No. We do not log any of the prompts or completions sent to our API.

Which model architectures are supported?

Our goal is to provide serverless inference for all models on Hugging Face. We currently support a wide range of architectures, including Llama 2 and 3, Mistral, Qwen, and DeepSeek. For more details, see https://featherless.ai/docs/model-compatibility.

How do I get models added?

Business customers can deploy models through their dashboard. Users on individual plans can make requests either on Discord or by emailing [email protected].