Use Models — detectron2 0.6 documentation


Build Models from Yacs Config

From a yacs config object, models (and their sub-models) can be built by functions such as build_model, build_backbone, build_roi_heads:

    from detectron2.modeling import build_model

    model = build_model(cfg)  # returns a torch.nn.Module

build_model only builds the model structure and fills it with random parameters. See below for how to load an existing checkpoint to the model and how to use the model object.
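For example, a model can be built from one of the builtin configs; the particular config file below is only an example:

    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.modeling import build_model

    cfg = get_cfg()
    # any builtin config works here; this file name is just an example
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    model = build_model(cfg)  # randomly initialized until a checkpoint is loaded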

Load/Save a Checkpoint

    from detectron2.checkpoint import DetectionCheckpointer

    DetectionCheckpointer(model).load(file_path_or_url)  # load a file, usually from cfg.MODEL.WEIGHTS

    checkpointer = DetectionCheckpointer(model, save_dir="output")
    checkpointer.save("model_999")  # save to output/model_999.pth

Detectron2’s checkpointer recognizes models in pytorch’s .pth format, as well as the .pkl files in our model zoo. See the API doc for more details about its usage.

The model files can be arbitrarily manipulated using torch.{load,save} for .pth files or pickle.{dump,load} for .pkl files.
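Both formats can be inspected directly; a small sketch (the file names below are placeholders):

    import pickle
    import torch

    # a .pth file saved by the checkpointer is a regular torch checkpoint,
    # typically a dict containing a "model" state dict
    ckpt = torch.load("output/model_999.pth", map_location="cpu")

    # model zoo .pkl files are plain pickle files
    with open("model_final.pkl", "rb") as f:
        zoo_ckpt = pickle.load(f, encoding="latin1")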

Use a Model

A model can be called by outputs = model(inputs), where inputs is a list[dict]. Each dict corresponds to one image, and the required keys depend on the type of model and on whether the model is in training or evaluation mode. For example, in order to do inference, all existing models expect the “image” key, and optionally “height” and “width”. The detailed format of inputs and outputs of existing models is explained below.

Training: When in training mode, all models are required to be used under an EventStorage. The training statistics will be put into the storage:

    from detectron2.utils.events import EventStorage

    with EventStorage() as storage:
        losses = model(inputs)
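A minimal training step then sums the returned losses and backpropagates. This is a sketch: the optimizer and learning rate are illustrative choices, and inputs must be in the training format described under “Model Input Format”:

    import torch
    from detectron2.utils.events import EventStorage

    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # hypothetical optimizer/lr

    with EventStorage() as storage:
        loss_dict = model(inputs)           # dict[str -> ScalarTensor] of losses
        losses = sum(loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()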

Inference: If you only want to do simple inference using an existing model, DefaultPredictor is a wrapper around the model that provides such basic functionality. It includes default behavior such as model loading and preprocessing, and it operates on a single image rather than on batches. See its documentation for usage.
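A typical usage sketch, assuming cfg.MODEL.WEIGHTS points at a trained checkpoint and “input.jpg” is a placeholder path:

    import cv2
    from detectron2.engine import DefaultPredictor

    predictor = DefaultPredictor(cfg)   # builds the model and loads cfg.MODEL.WEIGHTS
    image = cv2.imread("input.jpg")     # BGR numpy array of shape (H, W, C)
    outputs = predictor(image)          # a dict in the output format described below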

You can also run inference directly like this:

    model.eval()
    with torch.no_grad():
        outputs = model(inputs)

Model Input Format

Users can implement custom models that support any arbitrary input format. Here we describe the standard input format that all builtin models support in detectron2. They all take a list[dict] as the inputs. Each dict corresponds to information about one image.

The dict may contain keys such as “image”, “height”, “width”, and ground-truth annotations (e.g. “instances”) used during training. For inference of builtin models, only the “image” key is required, and “width”/“height” are optional.
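As a rough sketch of assembling such an input for inference (the image path is a placeholder; the channel order should match cfg.INPUT.FORMAT, which is BGR for most builtin configs):

    import cv2
    import torch

    image = cv2.imread("input.jpg")   # HWC, uint8, BGR
    height, width = image.shape[:2]
    inputs = [{
        "image": torch.as_tensor(image.transpose(2, 0, 1).copy()),  # CHW tensor, values in [0, 255]
        "height": height,  # optional: desired output resolution
        "width": width,
    }]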

We currently don’t define a standard input format for panoptic segmentation training, because models now use custom formats produced by custom data loaders.

How it connects to the data loader:

The output of the default DatasetMapper is a dict that follows the above format. After the data loader performs batching, it becomes list[dict] which the builtin models support.
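For example, a test loader built for a registered dataset (the dataset name below is just an example) already yields batches in this format:

    from detectron2.data import build_detection_test_loader

    data_loader = build_detection_test_loader(cfg, "coco_2017_val")  # dataset name is an example
    model.eval()
    first_batch = next(iter(data_loader))  # a list[dict] in the format above
    outputs = model(first_batch)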

Model Output Format

When in training mode, the builtin models output a dict[str->ScalarTensor] with all the losses.

When in inference mode, the builtin models output a list[dict], one dict for each image. Depending on the tasks the model performs, each dict may contain fields such as “instances”, “sem_seg”, “proposals”, and “panoptic_seg”.
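For a detection model, for instance, the “instances” field can typically be read like this (a sketch; the exact fields depend on the task):

    # outputs = model(inputs), with the model in inference mode
    instances = outputs[0]["instances"].to("cpu")  # predictions for the first image
    boxes = instances.pred_boxes        # Boxes object, one box per detection
    scores = instances.scores           # confidence scores
    classes = instances.pred_classes    # predicted class indices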

Partially execute a model:

Sometimes you may want to obtain an intermediate tensor inside a model, such as the input of a certain layer or the output before post-processing. Since there are typically hundreds of intermediate tensors, there isn’t an API that provides the intermediate result you need. You have the following options:

  1. Write a (sub)model. Following the tutorial, you can rewrite a model component (e.g. a head of a model), such that it does the same thing as the existing component, but returns the output you need.
  2. Partially execute a model. You can create the model as usual, but use custom code to execute it instead of its forward(). For example, the following code obtains mask features before the mask head.
    from detectron2.modeling import build_model
    from detectron2.structures import ImageList

    images = ImageList.from_tensors(...) # preprocessed input tensor
    model = build_model(cfg)
    model.eval()
    features = model.backbone(images.tensor)  # dict[str -> feature map]
    proposals, _ = model.proposal_generator(images, features)
    instances, _ = model.roi_heads(images, features, proposals)
    mask_features = [features[f] for f in model.roi_heads.in_features]
    mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
  3. Use forward hooks. Forward hooks can help you obtain inputs or outputs of a certain module. If they are not exactly what you want, they can at least be used together with partial execution to obtain other tensors; see the sketch below.

All options require you to read the documentation, and sometimes the code, of the existing models to understand their internal logic, in order to write code that obtains the internal tensors.
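As an illustration of option 3, here is a minimal forward-hook sketch; hooking model.backbone is just an example choice of module:

    import torch

    captured = {}

    def save_output(module, inputs, output):
        # stores the module's output for later inspection
        captured["backbone_out"] = output

    handle = model.backbone.register_forward_hook(save_output)
    model.eval()
    with torch.no_grad():
        model(inputs)
    handle.remove()
    # captured["backbone_out"] now holds the backbone's output feature maps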