Text Generation

Generate text based on a prompt.

If you are interested in the Chat Completion task, which generates a response based on a list of messages, check out the chat-completion task.

For more details about the text-generation task, check out its dedicated page! You will find examples and related materials.

Explore all available models and find the one that suits you best here.

Using the API

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",
    messages=[
        {
            "role": "user",
            "content": "Can you please let us know more details about your ",
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
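
The call above uses the OpenAI-compatible chat interface. The raw text-generation task described in the specification below is also exposed directly through InferenceClient.text_generation. A minimal sketch, assuming the same provider and token (the model id is reused from the example above and may or may not serve raw completions):

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# Raw prompt completion; maps onto the `inputs` and `parameters`
# fields of the API specification below.
output = client.text_generation(
    "Can you please let us know more details about your ",
    model="Qwen/Qwen3-235B-A22B",
    max_new_tokens=512,
)

print(output)  # the generated continuation as a string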

API specification

Request

Headers
authorization string Authentication header in the form 'Bearer: hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.
Payload
inputs* string
parameters object
    adapter_id string LoRA adapter id.
    best_of integer Generate best_of sequences and return the one with the highest token logprobs.
    decoder_input_details boolean Whether to return decoder input token logprobs and ids.
    details boolean Whether to return generation details.
    do_sample boolean Activate logits sampling.
    frequency_penalty number The parameter for frequency penalty. 1.0 means no penalty. Penalizes new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
    grammar unknown One of the following:
        (#1) object
            type* enum Possible values: json.
            value* unknown A string that represents a JSON Schema. JSON Schema is a declarative language that allows you to annotate JSON documents with types and descriptions.
        (#2) object
            type* enum Possible values: regex.
            value* string
    max_new_tokens integer Maximum number of tokens to generate.
    repetition_penalty number The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
    return_full_text boolean Whether to prepend the prompt to the generated text.
    seed integer Random sampling seed.
    stop string[] Stop generating tokens if a member of stop is generated.
    temperature number The value used to modulate the logits distribution.
    top_k integer The number of highest-probability vocabulary tokens to keep for top-k filtering.
    top_n_tokens integer The number of highest-probability vocabulary tokens to keep for top-n filtering.
    top_p number Top-p value for nucleus sampling.
    truncate integer Truncate input tokens to the given size.
    typical_p number Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
    watermark boolean Watermarking with A Watermark for Large Language Models.
stream boolean Whether to stream generated tokens back as Server-Sent Events.
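
Put together, a non-streaming request body pairs inputs with an optional parameters object and the stream flag. A hedged sketch with Python's requests, assuming the classic serverless endpoint URL (provider routing may differ):

import requests

# Endpoint URL and model id are assumptions for illustration.
API_URL = "https://api-inference.huggingface.co/models/Qwen/Qwen3-235B-A22B"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxx"}

payload = {
    "inputs": "Can you please let us know more details about your ",
    "parameters": {
        "max_new_tokens": 128,
        "do_sample": True,       # activate logits sampling
        "temperature": 0.7,      # modulate the logits distribution
        "top_p": 0.95,           # nucleus sampling
        "details": True,         # return generation details
        # "grammar": {"type": "regex", "value": "[A-Za-z ,.]+"},  # constrained decoding
    },
    "stream": False,
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())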

Response

Output type depends on the stream input parameter. If stream is false (default), the response will be a JSON object with the following fields:

Body
details object
    best_of_sequences object[]
        finish_reason enum Possible values: length, eos_token, stop_sequence.
        generated_text string
        generated_tokens integer
        prefill object[]
            id integer
            logprob number
            text string
        seed integer
        tokens object[]
            id integer
            logprob number
            special boolean
            text string
        top_tokens array[]
            id integer
            logprob number
            special boolean
            text string
    finish_reason enum Possible values: length, eos_token, stop_sequence.
    generated_tokens integer
    prefill object[]
        id integer
        logprob number
        text string
    seed integer
    tokens object[]
        id integer
        logprob number
        special boolean
        text string
    top_tokens array[]
        id integer
        logprob number
        special boolean
        text string
generated_text string
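
With details set to true, those fields can be read straight off the parsed JSON. A sketch, assuming the response object from the requests example above (the classic API may wrap the body in a single-element list):

data = response.json()
result = data[0] if isinstance(data, list) else data

print(result["generated_text"])

details = result.get("details", {})
print(details.get("finish_reason"))      # length, eos_token, or stop_sequence
print(details.get("generated_tokens"))   # count of generated tokens
for tok in details.get("tokens", []):
    # one entry per generated token: id, logprob, special flag, text
    print(tok["id"], tok["logprob"], tok["special"], repr(tok["text"]))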

If stream is true, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out this guide.

Body
details object
    finish_reason enum Possible values: length, eos_token, stop_sequence.
    generated_tokens integer
    input_length integer
    seed integer
generated_text string
index integer
token object
    id integer
    logprob number
    special boolean
    text string
top_tokens object[]
    id integer
    logprob number
    special boolean
    text string
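
With huggingface_hub, the stream is consumed as an iterator; each event carries the fields above, and only the final event includes the details object. A sketch, assuming the client from the first example:

# Stream tokens as they are generated.
for event in client.text_generation(
    "Can you please let us know more details about your ",
    model="Qwen/Qwen3-235B-A22B",
    max_new_tokens=128,
    stream=True,
    details=True,
):
    print(event.token.text, end="", flush=True)
    if event.details is not None:  # populated only on the final event
        print("\nfinish_reason:", event.details.finish_reason)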
