What is LAVIS?

LAVIS is a Python deep learning library for LAnguage-and-VISion research and applications. It provides a unified interface to state-of-the-art foundation language-vision models (ALBEF, BLIP, ALPRO, CLIP), common tasks (retrieval, captioning, visual question answering, multimodal classification, etc.) and datasets (COCO, Flickr, NoCaps, Conceptual Captions, SBU, etc.).

This library aims to provide engineers and researchers with a one-stop solution to rapidly develop models for their specific multimodal scenarios, and benchmark them across standard and customized datasets.

Key features of LAVIS include:

Other features include:

Supported Tasks, Models and Datasets

The following table lists the language-vision tasks supported by LAVIS, together with the models and datasets available for each. Adapting existing models to more tasks is possible and planned for future releases.

| Tasks | Supported Models | Supported Datasets | Modalities |
|---|---|---|---|
| Image-text Pre-training | ALBEF, BLIP | COCO, VisualGenome, SBU, ConceptualCaptions | image, text |
| Image-text Retrieval | ALBEF, BLIP, CLIP | COCO, Flickr30k | image, text |
| Text-image Retrieval | ALBEF, BLIP, CLIP | COCO, Flickr30k | image, text |
| Visual Question Answering | ALBEF, BLIP | VQAv2, OKVQA, A-OKVQA | image, text |
| Image Captioning | BLIP | COCO, NoCaps | image, text |
| Image Classification | CLIP | ImageNet | image |
| Natural Language Visual Reasoning (NLVR) | ALBEF, BLIP | NLVR2 | image, text |
| Visual Entailment (VE) | ALBEF | SNLI-VE | image, text |
| Visual Dialogue | BLIP | VisDial | image, text |
| Video-text Retrieval | BLIP, ALPRO | MSRVTT, DiDeMo | video, text |
| Text-video Retrieval | BLIP, ALPRO | MSRVTT, DiDeMo | video, text |
| Video Question Answering (VideoQA) | BLIP, ALPRO | MSRVTT, MSVD | video, text |
| Video Dialogue | VGD-GPT | AVSD | video, text |
| Multimodal Feature Extraction | ALBEF, CLIP, BLIP, ALPRO | customized | image, text |

Library Design

[Figure: LAVIS library architecture (_images/architecture.png)]

LAVIS has six key modules.

Installation

  1. (Optional) Creating conda environment

conda create -n lavis python=3.8
conda activate lavis

  2. Cloning and building from source

git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install .
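
After the install step, a quick way to confirm that the package is visible to the active environment is to ask Python's import machinery for it. The sketch below uses only the standard library and assumes the package is importable under the name `lavis`; `is_installed` is a hypothetical helper written for this check, not something LAVIS provides.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if a top-level `package` can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Report the interpreter in use, then whether `lavis` can be located.
    print("Python", sys.version.split()[0])
    print("lavis importable:", is_installed("lavis"))
```

If the second line reports `False`, the most likely cause is that the conda environment created in step 1 is not the one currently activated.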

If you would like to develop on LAVIS, you may find it easier to build with editable mode: