mcore_gpt_minitron — Model Optimizer 0.27.1

Module implementing the top-level mcore_gpt_minitron pruning handler for NVIDIA Megatron-Core / NeMo models.

The Minitron pruning algorithm uses activation magnitudes to estimate the importance of neurons and attention heads in the model. More details on the Minitron pruning algorithm can be found here: https://arxiv.org/pdf/2407.14679

The actual implementation is in modelopt.torch.nas.plugins.megatron.
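To illustrate the core idea, here is a minimal, self-contained sketch of activation-magnitude importance scoring: per-neuron importance is estimated as the mean absolute activation over a calibration batch, and the top-ranked neurons are kept. This is a simplified stand-in for the real implementation (function names and the exact aggregation are assumptions for illustration, not the library's API):

```python
def neuron_importance(activations):
    """Estimate per-neuron importance as the mean absolute activation
    over a calibration batch (rows = samples, columns = neurons)."""
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_neurons)
    ]


def top_k_neurons(importance, k):
    """Indices of the k most important neurons, highest score first.
    The pruned model would keep only these neurons."""
    return sorted(range(len(importance)), key=lambda j: -importance[j])[:k]
```

The same ranking idea extends to attention heads by aggregating importance over each head's output channels.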

Classes

MCoreGPTMinitronSearcher Searcher for Minitron pruning algorithm.

Functions

get_supported_model_config_map Get supported models (inside function to avoid circular imports).

class MCoreGPTMinitronSearcher

Bases: BaseSearcher

Searcher for Minitron pruning algorithm.

before_search()

Optional pre-processing steps before the search.

Return type:

None

property default_search_config: dict[str, Any]

Get the default config for the searcher.

property default_state_dict: dict[str, Any]

Return default state dict.

run_search()

Run the actual search.

Return type:

None

sanitize_search_config(config)

Sanitize the search config dict.

Parameters:

config (dict[str, Any] | None) –

Return type:

dict[str, Any]

get_supported_model_config_map()

Get supported models (inside function to avoid circular imports).

Return type:

dict[type, str]
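The "inside function to avoid circular imports" note refers to a common Python pattern: performing imports at call time rather than module load time so two modules that reference each other can still be imported. A sketch of the pattern, with placeholder classes standing in for the real Megatron/NeMo model types (the class and config names here are invented for illustration):

```python
def get_supported_model_config_map() -> dict[type, str]:
    """Map supported model classes to their config names.

    The imports (here, stand-in class definitions) live inside the
    function body, so evaluating this module never triggers the heavy
    plugin imports that would otherwise form a circular dependency.
    """
    # Stand-ins for classes imported from the plugin module at call time.
    class GPTModel: ...    # hypothetical, not the real Megatron class
    class MambaModel: ...  # hypothetical

    return {GPTModel: "gpt_config", MambaModel: "mamba_config"}
```

Callers look up a loaded model's type in this map to select the matching pruning configuration.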