Model Configurations and Parameters in Hugging Face

Last Updated : 8 Apr, 2026

A pretrained model in Hugging Face consists of two core components: the model weights and the configuration file. The weights store the parameters learned during training, while the configuration file defines the model’s architecture and structure.

[Figure: Model configuration and parameters]

The configuration determines how the model is constructed, whereas the weights represent the knowledge acquired through training. Both components work together to properly initialize and execute a transformer model.

Model Configuration in Hugging Face

The configuration serves as the structural blueprint of the model: it defines the structural setup before the weights are loaded. It specifies key architectural settings such as the number of layers, the hidden size, the number of attention heads, the dropout rates and the vocabulary size.

For example, the model bert-base-uncased has 12 hidden layers, a hidden size of 768, 12 attention heads and a vocabulary size of 30,522.

These values define how the model is built internally and are stored in a configuration class accessed using AutoConfig. The configuration does not contain learned parameters; it only defines the architecture before loading pretrained weights.

Working with Model Configuration in Hugging Face

Model configuration in Hugging Face can be loaded, inspected and even customized before initializing a model. Below is a structured implementation showing how configuration works in real scenarios.

1. Install Required Libraries

Run the following command in your command prompt.

pip install transformers

2. Loading a Pretrained Configuration

The AutoConfig class allows you to load the configuration of any pretrained model.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")  # fetches the configuration only, not the weights
print(config)

Output:

[Figure: Pretrained Config]
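Because the configuration holds no learned weights, it can be saved and reloaded as a small JSON file. The following is a minimal sketch of that round trip, assuming `transformers` is installed. It uses a locally built `BertConfig()` (whose default sizes match BERT-base) instead of downloading anything, and the temporary directory is only for illustration.

```python
import os
import tempfile

from transformers import AutoConfig, BertConfig

# Build a configuration locally; BertConfig() defaults to BERT-base-sized settings.
config = BertConfig()

with tempfile.TemporaryDirectory() as tmp_dir:
    # save_pretrained on a config writes config.json; no weight file is created.
    config.save_pretrained(tmp_dir)
    saved_files = sorted(os.listdir(tmp_dir))

    # AutoConfig reads config.json from the local directory and picks the
    # matching configuration class from the stored model type.
    reloaded = AutoConfig.from_pretrained(tmp_dir)

print("Files written:", saved_files)
print("Reloaded class:", type(reloaded).__name__)
print("Hidden size:", reloaded.hidden_size)
```

The same pattern applies to a configuration loaded from the Hub: the object returned by `AutoConfig.from_pretrained("bert-base-uncased")` can be saved locally in the same way.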

3. Accessing Specific Configuration Parameters

We can directly inspect important attributes from the configuration object. These parameters define the size and capacity of the model:

print("Hidden Size:", config.hidden_size)
print("Number of Layers:", config.num_hidden_layers)
print("Attention Heads:", config.num_attention_heads)
print("Vocabulary Size:", config.vocab_size)

Output:

[Figure: Parameters]

4. Loading a Model with Its Configuration

This code loads a pretrained BERT model using AutoModel. When from_pretrained() is called, Hugging Face automatically reads the model’s configuration, builds the architecture and loads the pretrained weights into it.

from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

Output:

[Figure: Loading model and its configuration]
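To separate the two steps that from_pretrained() performs, the sketch below builds a model from a configuration alone using `AutoModel.from_config`, which creates the architecture with randomly initialized weights and loads nothing pretrained. The tiny sizes are hypothetical values chosen only so the example runs quickly; it assumes `torch` and `transformers` are installed.

```python
import torch
from transformers import AutoModel, BertConfig

# Deliberately tiny, hypothetical sizes so the model builds quickly.
tiny_config = BertConfig(
    vocab_size=1000,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

# from_config builds the architecture only; weights are randomly initialized.
torch.manual_seed(0)
model_a = AutoModel.from_config(tiny_config)
torch.manual_seed(1)
model_b = AutoModel.from_config(tiny_config)

# Same configuration, so parameter names and shapes are identical.
shapes_a = {name: tuple(p.shape) for name, p in model_a.named_parameters()}
shapes_b = {name: tuple(p.shape) for name, p in model_b.named_parameters()}
same_architecture = shapes_a == shapes_b

# Different random seeds, so the weight values differ.
same_weights = torch.equal(
    model_a.embeddings.word_embeddings.weight,
    model_b.embeddings.word_embeddings.weight,
)

print("Same architecture:", same_architecture)
print("Same weights:", same_weights)
```

A model built this way has the right structure but no learned knowledge, which is why from_pretrained() is the usual choice when pretrained weights are available.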

5. Creating a Custom Configuration

You can also define a custom configuration before building a model.

from transformers import BertConfig, BertModel

custom_config = BertConfig(
    hidden_size=512,
    num_hidden_layers=6,
    num_attention_heads=8,
    intermediate_size=2048,
)

model = BertModel(custom_config)  # randomly initialized, no pretrained weights

print(model.config)

Output:

[Figure: Custom Configuration]
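When choosing custom values, note that BERT-style self-attention splits the hidden vector evenly across the attention heads, so hidden_size should be a multiple of num_attention_heads. The helper below is a hypothetical illustration of that check in plain Python; `attention_head_size` is not a function from the transformers library.

```python
def attention_head_size(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head dimension, or raise if the split is uneven."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) is not a multiple of "
            f"num_attention_heads ({num_attention_heads})"
        )
    return hidden_size // num_attention_heads


# BERT-base sizes and the custom sizes used in this section.
print(attention_head_size(768, 12))  # 64
print(attention_head_size(512, 8))   # 64
```

Both configurations give 64 dimensions per head. A combination such as 512 and 7 would be rejected by this check.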

Working with Model Parameters

Model parameters are the learnable weights inside a transformer. They determine how the model makes predictions and are updated during training. Below are practical and relevant operations you can perform with parameters.

1. Install Required Libraries

Run the following command in your command prompt.

pip install transformers torch

2. Load a Pretrained Model

from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("bert-base-uncased")

Output:

[Figure: Loading Pretrained model]

3. View Model Parameters

for name, param in model.named_parameters():
    print(name, param.shape)
    break  # show only the first parameter tensor

Output:

embeddings.word_embeddings.weight torch.Size([30522, 768])

4. Count Total Parameters

total_params = sum(p.numel() for p in model.parameters())
print("Total Parameters:", total_params)

Output:

Total Parameters: 109482240
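The total above can be cross-checked from the configuration values alone. The sketch below is plain Python arithmetic, assuming the standard BERT layout: word, position and token-type embeddings with a LayerNorm, four attention projections and a two-layer feed-forward block per encoder layer (each sub-block followed by a LayerNorm), and a final pooler. The function name `bert_parameter_count` is hypothetical.

```python
def linear(in_features: int, out_features: int) -> int:
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


def bert_parameter_count(
    vocab_size: int = 30522,
    hidden_size: int = 768,
    num_hidden_layers: int = 12,
    intermediate_size: int = 3072,
    max_position_embeddings: int = 512,
    type_vocab_size: int = 2,
) -> int:
    h = hidden_size
    layer_norm = 2 * h  # scale and shift vectors

    # Word, position and token-type embedding tables, then one LayerNorm.
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm

    # Query, key, value and output projections, then one LayerNorm.
    attention = 4 * linear(h, h) + layer_norm

    # Expand to intermediate_size, project back, then one LayerNorm.
    feed_forward = linear(h, intermediate_size) + linear(intermediate_size, h) + layer_norm

    encoder = num_hidden_layers * (attention + feed_forward)
    pooler = linear(h, h)
    return embeddings + encoder + pooler


print("BERT-base:", bert_parameter_count())
print(
    "Custom config:",
    bert_parameter_count(hidden_size=512, num_hidden_layers=6, intermediate_size=2048),
)
```

With the default arguments the function returns 109482240, matching the count printed for bert-base-uncased. Note that num_attention_heads does not appear: it changes how the hidden vector is split, not how many weights there are.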

5. Freeze Parameters

Setting requires_grad = False stops the model’s weights from updating during training. It is commonly used in transfer learning, where the pretrained model is kept fixed and only new layers are trained.

for param in model.parameters():
    param.requires_grad = False  # exclude this tensor from gradient updates
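The transfer-learning pattern described above can be sketched as follows. This is an illustration under stated assumptions: a tiny, randomly initialized `BertModel` stands in for a pretrained encoder so that nothing is downloaded, and the two-class linear head, the `count_trainable` helper and the learning rate are all hypothetical choices.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel

# Tiny stand-in for a pretrained encoder (hypothetical sizes, random weights).
encoder = BertModel(
    BertConfig(
        vocab_size=1000,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
    )
)

# Freeze every encoder weight.
for param in encoder.parameters():
    param.requires_grad = False

# New task-specific layer; its weights remain trainable by default.
classifier = nn.Linear(encoder.config.hidden_size, 2)


def count_trainable(module: nn.Module) -> int:
    """Number of parameter values that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Hand only the trainable tensors to the optimizer.
trainable = [
    p
    for p in list(encoder.parameters()) + list(classifier.parameters())
    if p.requires_grad
]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

print("Trainable encoder parameters:", count_trainable(encoder))
print("Trainable head parameters:", count_trainable(classifier))
```

Only the head's weights (32 × 2 plus 2 biases) remain trainable, so each training step updates 66 values while the encoder stays fixed.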


Model Parameters vs Configuration

| Aspect | Configuration | Parameters |
|---|---|---|
| Definition | Defines the architecture and structural settings of the model | Learnable weights adjusted during training |
| Purpose | Determines how the model is built | Stores learned knowledge from data |
| Contains | Layers, hidden size, attention heads, dropout, vocabulary size | Numerical weight values (millions or billions) |
| Role in Model | Acts as the blueprint | Acts as the learned intelligence |
| Changes During Training | No | Yes |
| Required For | Building the model structure | Making predictions and performing inference |

Advantages

Limitations of Modifying