Hugging Face Inference API

Last Updated : 11 May, 2026

The Hugging Face Inference API is a cloud service that lets developers use pre-trained models from the Hugging Face Hub without managing infrastructure. It provides a simple interface via InferenceClient for quick integration.

Setting Up the Inference Client

1. Install Required Library

pip install huggingface_hub

2. Generating Hugging Face API Key

Before accessing the Inference API, you need an API key (access token) from your Hugging Face account.

**Refer:** How to Access HuggingFace API key

3. Authenticating Using InferenceClient

You can initialize the InferenceClient in Python by passing your API token:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="YOUR_API_KEY", model="gpt2")
```
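Hard-coding a token in source files makes it easy to leak. The following is a minimal sketch of one alternative, assuming the token is stored in an environment variable named `HF_TOKEN`; the helper names `load_token` and `make_client` are hypothetical and not part of the library.

```python
import os


def load_token(var_name: str = "HF_TOKEN") -> str:
    """Return the API token stored in the given environment variable."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


def make_client(model: str = "gpt2"):
    """Build an InferenceClient without embedding the token in the source."""
    # Imported lazily so load_token() works even without the library installed.
    from huggingface_hub import InferenceClient

    return InferenceClient(token=load_token(), model=model)
```

Calling `make_client()` then yields a client equivalent to the one created above, with the token supplied by the environment rather than the code.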

Practical Considerations

When using the Inference API in real-world applications, it is important to account for operational factors, such as rate limits and network issues, that can impact performance and cost.

Inference with Inference Client

After authentication, the InferenceClient enables you to run models via API calls, where input is sent to Hugging Face servers and predictions are returned without local model execution.

1. Text Classification

Text classification predicts the sentiment or category of a given input using a pre-trained model hosted on the Hugging Face Hub.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    token="YOUR_API_KEY",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

# Send the text to the hosted model; the result holds one score per label
result = client.text_classification(text="I love using Hugging Face models!")

print(result)
```

**Output:**

[TextClassificationOutputElement(label='POSITIVE', score=0.9992625117301941), TextClassificationOutputElement(label='NEGATIVE', score=0.0007375259883701801)]
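The result is a list with one element per label, each exposing `label` and `score`. A small sketch of picking the most likely label follows; `Prediction` is a hypothetical stand-in that carries only those two fields so the example runs without a network call, and `top_prediction` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Stand-in for the library's output element (label and score only)."""
    label: str
    score: float


def top_prediction(predictions):
    """Return the element with the highest score."""
    return max(predictions, key=lambda p: p.score)


# Values copied from the output shown above
sample = [
    Prediction(label="POSITIVE", score=0.9992625117301941),
    Prediction(label="NEGATIVE", score=0.0007375259883701801),
]
best = top_prediction(sample)
print(f"{best.label} ({best.score:.2%})")
```

Because the helper only reads `.score`, it should work the same way on the list returned by `client.text_classification(...)`.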

2. Text Generation

Text generation produces natural language output based on a given prompt using pre-trained generative models hosted on Hugging Face servers.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    token="YOUR_API_KEY",
    model="NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
)

messages = [{"role": "user", "content": "What is the capital of France?"}]

response = client.chat_completion(messages=messages, stream=False)
print(response.choices[0].message.content)
```

**Output:**

The capital of France is Paris.
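The example above passes `stream=False`. With `stream=True`, the call is expected to yield chunks whose text arrives in `choices[0].delta.content` rather than returning one complete message. The sketch below shows how such chunks could be joined; `fake_chunk` is a hypothetical stand-in that only mimics that shape, so the code runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text fragments carried by streamed chat chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip chunks that carry no text
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in mimicking the shape of a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("The capital "), fake_chunk(None), fake_chunk("of France is Paris.")]
print(collect_stream(demo))
```

With a real client, the equivalent call would be `collect_stream(client.chat_completion(messages=messages, stream=True))`.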

3. Named Entity Recognition

Named Entity Recognition (NER) extracts structured information from text by identifying entities such as names, locations and organizations using pre-trained models.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="YOUR_API_KEY")

# The model is chosen per call here instead of when the client is created
result = client.token_classification(
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    text="Hugging Face is based in New York.",
)

print(result)
```

**Output:**

[TokenClassificationOutputElement(end=12, score=0.88766795, start=0, word='Hugging Face', entity=None, entity_group='ORG'), TokenClassificationOutputElement(end=33, score=0.9985268, start=25, word='New York', entity=None, entity_group='LOC')]
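Each returned element carries `word`, `entity_group` and `score`, so the list can be reshaped into a lookup by entity type. The following sketch assumes only those three attributes; `group_entities` and its `min_score` parameter are illustrative, and the sample objects are stand-ins built from the output shown above.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(entities, min_score: float = 0.0):
    """Map each entity group (e.g. 'ORG') to the words tagged with it."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent.score >= min_score:  # drop low-confidence detections
            grouped[ent.entity_group].append(ent.word)
    return dict(grouped)


sample = [
    SimpleNamespace(word="Hugging Face", entity_group="ORG", score=0.88766795),
    SimpleNamespace(word="New York", entity_group="LOC", score=0.9985268),
]
print(group_entities(sample))
print(group_entities(sample, min_score=0.9))
```

Raising `min_score` trades recall for precision; with a threshold of 0.9, only the location survives in this sample.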

Error Handling and Status Codes

Errors during inference can occur due to invalid tokens, incorrect model names, rate limits, or network issues. Handling these cases ensures reliable and stable application behavior.

```python
from huggingface_hub import InferenceClient
import requests

client = InferenceClient(
    provider="hf-inference",
    token="YOUR_API_KEY",
)

try:
    result = client.text_classification(
        "I love using Hugging Face models!",
        model="finiteautomata/bertweet-base-sentiment-analysis",
    )
    print(result)
except requests.exceptions.RequestException:
    # Request-level failures surfaced as requests exceptions
    print("Request Error, try later")
except Exception as e:
    # Any other failure; print the message for debugging
    print(f"Error: {e}")
```

**Output:**

[TextClassificationOutputElement(label='POS', score=0.9913303852081299), TextClassificationOutputElement(label='NEU', score=0.007244149222970009), TextClassificationOutputElement(label='NEG', score=0.0014254497364163399)]
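Rate limits and network issues are often temporary, so retrying after a short wait can be more useful than failing on the first error. The sketch below shows one common approach, exponential backoff; `call_with_retries` and all of its parameters are hypothetical, and `flaky` is a stand-in for an API call so the example runs offline.

```python
import time


def call_with_retries(fn, attempts: int = 3, base_delay: float = 1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait and retry with doubling delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a hypothetical flaky function that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky, base_delay=0.5, retry_on=(ConnectionError,),
                           sleep=delays.append)
print(result, delays)
```

In practice, `fn` could be a lambda wrapping the `client.text_classification(...)` call, with `retry_on` set to the exception types treated as transient. Errors such as an invalid token or a wrong model name will not resolve on retry and are better left out of `retry_on`.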

Advantages

Limitations