ZoeDepth

PyTorch

ZoeDepth is a depth estimation model that combines the generalization performance of relative depth estimation (how far objects are from each other) and metric depth estimation (precise depth measurement on a metric scale) from a single image. It is pre-trained on 12 datasets using relative depth and fine-tuned on 2 datasets (NYU Depth v2 and KITTI) for metric accuracy. A lightweight head with a metric bin module for each domain is used, and during inference, the model automatically selects the appropriate head for each input image with a latent classifier.


You can find all the original ZoeDepth checkpoints under the Intel organization.
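If you want to enumerate them programmatically, a minimal sketch using the huggingface_hub client is shown below; the author and search filters are illustrative assumptions rather than part of the ZoeDepth API.

```python
from huggingface_hub import list_models

# List checkpoints published under the Intel organization whose name mentions ZoeDepth
for checkpoint in list_models(author="Intel", search="zoedepth"):
    print(checkpoint.id)
```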

The example below demonstrates how to estimate depth with Pipeline or the AutoModel class.

```python
import requests
import torch
from transformers import pipeline
from PIL import Image

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

pipeline = pipeline(
    task="depth-estimation",
    model="Intel/zoedepth-nyu-kitti",
    torch_dtype=torch.float16,
    device=0
)
results = pipeline(image)
results["depth"]
```

Notes

Resources

ZoeDepthConfig

class transformers.ZoeDepthConfig


( backbone_config = None backbone = None use_pretrained_backbone = False backbone_kwargs = None hidden_act = 'gelu' initializer_range = 0.02 batch_norm_eps = 1e-05 readout_type = 'project' reassemble_factors = [4, 2, 1, 0.5] neck_hidden_sizes = [96, 192, 384, 768] fusion_hidden_size = 256 head_in_index = -1 use_batch_norm_in_fusion_residual = False use_bias_in_fusion_residual = None num_relative_features = 32 add_projection = False bottleneck_features = 256 num_attractors = [16, 8, 4, 1] bin_embedding_dim = 128 attractor_alpha = 1000 attractor_gamma = 2 attractor_kind = 'mean' min_temp = 0.0212 max_temp = 50.0 bin_centers_type = 'softplus' bin_configurations = [{'n_bins': 64, 'min_depth': 0.001, 'max_depth': 10.0}] num_patch_transformer_layers = None patch_transformer_hidden_size = None patch_transformer_intermediate_size = None patch_transformer_num_attention_heads = None **kwargs )

Parameters

This is the configuration class to store the configuration of a ZoeDepthForDepthEstimation. It is used to instantiate a ZoeDepth model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the ZoeDepth Intel/zoedepth-nyu architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

```python
from transformers import ZoeDepthConfig, ZoeDepthForDepthEstimation

# Initializing a ZoeDepth configuration
configuration = ZoeDepthConfig()

# Initializing a model from the configuration
model = ZoeDepthForDepthEstimation(configuration)

# Accessing the model configuration
configuration = model.config
```
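To see how these options are set for the released metric-depth checkpoint, which carries one bin configuration per domain, you can also load its configuration from the Hub. A small sketch, assuming network access and that the checkpoint exposes the bin_configurations attribute shown in the signature above:

```python
from transformers import ZoeDepthConfig

config = ZoeDepthConfig.from_pretrained("Intel/zoedepth-nyu-kitti")
# One entry per domain; each defines n_bins, min_depth and max_depth for a metric bin module
print(config.bin_configurations)
```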

ZoeDepthImageProcessor

class transformers.ZoeDepthImageProcessor


( do_pad: bool = True do_rescale: bool = True rescale_factor: typing.Union[int, float] = 0.00392156862745098 do_normalize: bool = True image_mean: typing.Union[float, typing.List[float], NoneType] = None image_std: typing.Union[float, typing.List[float], NoneType] = None do_resize: bool = True size: typing.Optional[typing.Dict[str, int]] = None resample: Resampling = <Resampling.BILINEAR: 2> keep_aspect_ratio: bool = True ensure_multiple_of: int = 32 **kwargs )

Parameters

Constructs a ZoeDepth image processor.

preprocess


( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] do_pad: typing.Optional[bool] = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_normalize: typing.Optional[bool] = None image_mean: typing.Union[float, typing.List[float], NoneType] = None image_std: typing.Union[float, typing.List[float], NoneType] = None do_resize: typing.Optional[bool] = None size: typing.Optional[int] = None keep_aspect_ratio: typing.Optional[bool] = None ensure_multiple_of: typing.Optional[int] = None resample: Resampling = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None )

Parameters

Preprocess an image or batch of images.
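As a quick sketch of how these options come together, assuming the Intel/zoedepth-nyu-kitti preprocessing configuration: the processor pads, rescales, normalizes, and resizes the image so that its height and width are multiples of ensure_multiple_of.

```python
import requests
from PIL import Image
from transformers import ZoeDepthImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = ZoeDepthImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
inputs = image_processor(images=image, return_tensors="pt")
# Height and width are multiples of `ensure_multiple_of` (32 by default)
print(inputs["pixel_values"].shape)
```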

ZoeDepthImageProcessorFast

class transformers.ZoeDepthImageProcessorFast


( **kwargs: typing_extensions.Unpack[transformers.models.zoedepth.image_processing_zoedepth_fast.ZoeDepthFastImageProcessorKwargs] )

Parameters

Constructs a fast ZoeDepth image processor.

preprocess


( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] **kwargs: typing_extensions.Unpack[transformers.models.zoedepth.image_processing_zoedepth_fast.ZoeDepthFastImageProcessorKwargs] ) → <class 'transformers.image_processing_base.BatchFeature'>

Parameters

Returns

<class 'transformers.image_processing_base.BatchFeature'>
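Usage mirrors the slow processor and returns a BatchFeature. A minimal sketch, assuming the fast processor can be selected through AutoImageProcessor with use_fast=True and that torchvision is available:

```python
import requests
from PIL import Image
from transformers import AutoImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# use_fast=True requests the torch/torchvision-backed image processor
image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti", use_fast=True)
batch_feature = image_processor(images=image, return_tensors="pt")
print(type(batch_feature), batch_feature["pixel_values"].shape)
```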

ZoeDepthForDepthEstimation

class transformers.ZoeDepthForDepthEstimation


( config )

Parameters

ZoeDepth model with one or multiple metric depth estimation head(s) on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
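Because it inherits from PreTrainedModel, the usual saving and loading helpers apply. A short sketch; the local directory name is arbitrary:

```python
from transformers import ZoeDepthForDepthEstimation

model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")
model.save_pretrained("./zoedepth-nyu-kitti-local")  # arbitrary local path
reloaded = ZoeDepthForDepthEstimation.from_pretrained("./zoedepth-nyu-kitti-local")
```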

forward


( pixel_values: FloatTensor labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.DepthEstimatorOutput or tuple(torch.FloatTensor)

Parameters

Returns

A transformers.modeling_outputs.DepthEstimatorOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (ZoeDepthConfig) and inputs.

The ZoeDepthForDepthEstimation forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

```python
from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# interpolate the prediction back to the original image size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    source_sizes=[(image.height, image.width)],
)

# visualize the prediction as an 8-bit grayscale image
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = predicted_depth * 255 / predicted_depth.max()
depth = depth.detach().cpu().numpy()
depth = Image.fromarray(depth.astype("uint8"))
```
