chore: Update lmms-eval to support video evaluations for LLaVA models · dadwadw233/lmms-eval@ccf4fbf
# Announcement

- [2024-06] The `lmms-eval/v0.2` has been upgraded to support video evaluations for video models like LLaVA-NeXT Video and Gemini 1.5 Pro across tasks such as EgoSchema, PerceptionTest, VideoMME, and more. Please refer to the blog for more details.
- [2024-03] We have released the first version of `lmms-eval`; please refer to the blog for more details.
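As a rough illustration of the new video support, a video task can be launched with the same CLI used in the image examples below. The model name `llava_vid`, the checkpoint `lmms-lab/LLaVA-NeXT-Video-7B`, and the task name `videomme` are assumptions here, not taken from this diff.

```bash
# Hypothetical video-evaluation launch; model, checkpoint, and task names are assumed.
# Only the CLI flags mirror the documented image examples below.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava_vid \
    --model_args pretrained=lmms-lab/LLaVA-NeXT-Video-7B \
    --tasks videomme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_vid_videomme \
    --output_path ./logs/
```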
```
conda install openjdk=8
```

You can then check your java version by `java -version`.
<details>
<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
<br>

In the extensive table below, we aim to provide detailed information to help readers understand the datasets included in lmms-eval and some specifics about these datasets (we remain grateful for any corrections readers may have during our evaluation process).

We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It's a live sheet, and we are updating it with new results.

<p align="center" width="100%">
<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
</p>

We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).

</details>
<br>

Our development will continue on the main branch, and we encourage you to give us feedback on desired features and further improvements, or to ask questions, in issues or PRs on GitHub.
# Multiple Usages

**Evaluation of LLaVA on MME**

```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    # ... (model and task arguments elided in this diff hunk) ...
    --log_samples \
    --log_samples_suffix llava_v1.5_mme \
    --output_path ./logs/
```
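The diff hunk hides the middle of this command. A complete invocation might look like the sketch below; the `liuhaotian/llava-v1.5-7b` checkpoint and the `mme` task are assumptions consistent with the `llava_v1.5_mme` suffix, not lines from this commit.

```bash
# Sketch of the full command; --model/--model_args/--tasks/--batch_size fill the
# lines hidden by the diff hunk and are assumed values.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme \
    --output_path ./logs/
```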
**Evaluation of LLaVA on multiple datasets**

```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    # ... (model and task arguments elided in this diff hunk) ...
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs/
```
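The model and task arguments are again hidden by the hunk; the point of this example is that `--tasks` accepts a comma-separated list. A sketch with assumed task names:

```bash
# Multiple datasets in one run via a comma-separated --tasks list (assumed values).
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs/
```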
**For other LLaVA variants, note that `conv_template` is an arg of the init function of llava in `lmms_eval/models/llava.py`**

```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    # ... (model and task arguments elided in this diff hunk) ...
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs/
```
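Because `conv_template` is an init argument of the llava model class, it is passed inside `--model_args`. A sketch for a hypothetical variant; the checkpoint and template values are illustrative placeholders, not from this diff:

```bash
# conv_template is forwarded to llava's __init__ via --model_args;
# checkpoint and template values below are illustrative placeholders.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.6-mistral-7b",conv_template=mistral_instruct \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.6_mme \
    --output_path ./logs/
```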
**Evaluation of larger lmms (llava-v1.6-34b)**

```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    # ... (model and task arguments elided in this diff hunk) ...
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs/
```
**Evaluation with a set of configurations, supporting evaluation of multiple models and datasets**

```bash
python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
```
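The schema of `./miscs/example_eval.yaml` is not shown in this diff. As an assumption, each entry mirrors the CLI flags above, so a two-run config might look roughly like:

```bash
# Hypothetical config contents (schema assumed to mirror the CLI flags);
# each list entry describes one model/task evaluation run.
cat > ./miscs/example_eval.yaml <<'EOF'
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_mme
  output_path: "./logs/"
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,mmbench_en
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_13b
  output_path: "./logs/"
EOF
python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
```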
**Evaluation with naive model sharding for bigger model (llava-next-72b)**

```bash
python3 -m lmms_eval \
    --model=llava \
    --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
    # ... (task and batch-size arguments elided in this diff hunk) ...
    --log_samples_suffix=llava_qwen \
    --output_path="./logs/" \
    --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
```
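Here `device_map=auto` shards the 72B checkpoint naively across the visible GPUs. The task and batch-size flags are hidden by the hunk; a complete run might look like the sketch below, where `--tasks` and `--batch_size` values are assumptions and the remaining flags come from the visible lines:

```bash
# Naive sharding via device_map=auto; --tasks/--batch_size are assumed values,
# the other flags appear in the diff above.
python3 -m lmms_eval \
    --model=llava \
    --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
    --tasks=mme \
    --batch_size=1 \
    --log_samples \
    --log_samples_suffix=llava_qwen \
    --output_path="./logs/" \
    --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
```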
**Evaluation with SGLang for bigger model (llava-next-72b)**

```bash
python3 -m lmms_eval \
    --model=llava_sglang \
    --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \
    # ... (task and logging arguments elided in this diff hunk) ...
    --verbosity=INFO
```
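For the SGLang path, `tp_size=8` sets SGLang's tensor parallelism and `parallel` presumably controls request concurrency. The elided task and logging flags in the sketch below are assumptions; the `--model_args` line is taken from the visible diff lines:

```bash
# SGLang-served llava-next-72b; --tasks/--batch_size/logging flags are assumed,
# the model_args line is copied from the diff above.
python3 -m lmms_eval \
    --model=llava_sglang \
    --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \
    --tasks=mme \
    --batch_size=1 \
    --log_samples \
    --log_samples_suffix=llava_sglang \
    --output_path="./logs/" \
    --verbosity=INFO
```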
## Supported models

Please check supported models for more details.