Efficient, Customizable, and Reliable GenAI Inference | OctoAI

OctoAI is now NVIDIA

Innovators Choose OctoAI

“Working with the OctoAI team, we were able to quickly evaluate the new model, validate its performance through our proof of concept phase, and move the model to production. Mixtral on OctoAI serves a majority of the inferences and end player experiences on AI Dungeon today.”

GenAI production stack: SaaS or in your environment

The foundation of OctoAI is the systems and compilation technologies we've pioneered, such as XGBoost, Apache TVM, and MLC, giving you an enterprise-grade system that runs in our SaaS or in your private environment.

Diagram of OctoAI GenAI systems stack showing OctoAI's solutions, models, and AI serving stack powered by broad hardware

Enterprise-grade inference


Predictable reliability

99.999% uptime with consistent latency SLAs.


Optimize Performance & Cost

Run GenAI inference at the lowest price and latency on our optimized serving layer.


Future-Proof Applications

Rapidly iterate with new models and infrastructure without rearchitecting anything.


Customize Freely

Mix and match models, fine-tunes, and LoRAs at the model serving layer.
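As a rough illustration of request-time customization, the sketch below assumes an image-generation endpoint that accepts a LoRA name and weight alongside the prompt; the URL, authentication header, and `loras` schema are assumptions for illustration, not a documented API.

```python
import requests

# A minimal sketch: apply a LoRA to a base model at serving time.
# The endpoint URL, model route, and "loras" field are assumed for
# illustration; consult the actual serving docs for the real schema.
resp = requests.post(
    "https://image.octoai.run/generate/sdxl",  # assumed endpoint
    headers={"Authorization": "Bearer YOUR_OCTOAI_TOKEN"},
    json={
        "prompt": "a watercolor lighthouse at dawn",
        "loras": {"my-watercolor-style": 0.8},  # asset name -> weight (assumed)
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```

Because the mix happens at the serving layer, swapping the base model or the LoRA is a change to the request payload rather than to your deployment.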

SOC 2 Type II & HIPAA certified

Your data security and privacy are a top priority for OctoAI. We continually invest in security capabilities and practices across our platform and processes.

Learn more

SOC 2 and HIPAA certified badges

Powerful capabilities for your GenAI apps

Build state-of-the-art solutions for your products with multiple models, thousands of LoRAs, your datasets, and orchestration logic; a minimal API sketch follows below.

Learn more

Open source LLMs going into the OctoAI platform and being used for your use cases: classification, chatbots, coding, summarization, and more
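As one concrete example, a summarization call against an OpenAI-compatible chat endpoint might look like the sketch below; the base URL and model identifier are assumptions for illustration.

```python
from openai import OpenAI

# A minimal sketch, assuming an OpenAI-compatible text endpoint.
# The base_url and model identifier below are illustrative assumptions.
client = OpenAI(
    base_url="https://text.octoai.run/v1",  # assumed hosted endpoint
    api_key="YOUR_OCTOAI_TOKEN",
)

completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # assumed model name
    messages=[
        {"role": "system", "content": "Summarize the user's text in two sentences."},
        {"role": "user", "content": "Paste the document to summarize here."},
    ],
)
print(completion.choices[0].message.content)
```

The same call pattern covers the other use cases shown above (classification, chat, coding) by changing the system prompt and model name.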

OctoStack from OctoAI: GenAI in your environment

OctoStack is a turnkey GenAI serving stack for running your optimized models in your own environment on your own GPUs. Lower your total cost of ownership and deploy models with greater agility while ensuring data privacy; a minimal client sketch follows the diagram below.

Learn more

Overview diagram of how OctoStack by OctoAI would work in your infrastructure environment
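Assuming OctoStack exposes the same OpenAI-compatible surface inside your environment, moving from SaaS to self-hosted can be as small as a base-URL change; the internal hostname and token below are placeholders.

```python
from openai import OpenAI

# A minimal sketch: the same client code, pointed at an in-environment
# OctoStack deployment. The hostname and token are placeholders.
client = OpenAI(
    base_url="https://octostack.internal.example.com/v1",  # your VPC
    api_key="YOUR_INTERNAL_TOKEN",
)

resp = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # assumed model name
    messages=[{"role": "user", "content": "Health check: reply with OK."}],
)
print(resp.choices[0].message.content)
```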

What’s New at OctoAI


Latest Models

- Phi 3.5-Vision. The newest model in the Phi-3 family is a lightweight, state-of-the-art multimodal model. It comes with a 128k context length and was built with a focus on high-quality reasoning over both text and image inputs, and it is available for commercial use. (Vision-Language, Chat, Experimental)
- Mistral NeMo. Built in collaboration with NVIDIA, this state-of-the-art model has a 128k context window and an Apache 2.0 license. It excels at reasoning, coding accuracy, and world knowledge, and it is multilingual. (Coding, Chat, Content Moderation, Experimental)
- FLUX.1 [Schnell]. A 12-billion-parameter model for creating high-quality images from text prompts. FLUX models excel at rendering text within images, accurate human features, and multi-element scenes or landscapes. With fast generation speeds and a commercial license, it can power all your GenAI image products.
- Llama 3.1 Instruct. The Meta Llama 3.1 models are instruction-tuned and optimized for multilingual dialogue. They currently outperform many open-source and closed chat models on several industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

See all models

Demos & Webinars

- Optimizing LLMs for cost and quality. This technical webinar reviews fine-tuning models for performance, model quality optimization, DevOps for LLM apps, and a full demo showing how to fine-tune OSS models to higher quality than closed models. (Fine-tuning, Model Selection, Text Generation)
- Harnessing Agentic AI: Function Calling Foundations. Watch our on-demand webinar on creating AI agents using function calling in your AI apps. This technical deep dive includes a presentation, a demo, and example code to follow. (On-demand, Webinar, Text Generation, Question Answering, Summarization)
- All about fine-tuning LLMs. Listen on-demand to a panel of experts discussing the fine-tunes available today, how to create your own fine-tune, alternatives to custom fine-tunes, and more. (Webinar, On-demand, Text Generation, Fine-tuning)
- Selecting the right GenAI model for production. Watch our on-demand webinar as our engineers walk through each step of model evaluation and testing, when to use checkpoints vs. LoRAs, and how to get the best results. (Webinar, On-demand, Model Selection, Text Generation, Fine-tuning)

View all demos & webinars

Your choice of models and fine-tunes

Start building in minutes. Gain the freedom to run any model or checkpoint on our efficient API endpoints.