chore: Update lmms-eval to support video evaluations for LLaVA models · EvolvingLMMs-Lab/lmms-eval@cbeee20
@@ -8,6 +8,7 @@

 🏠 LMMs-Lab Homepage | 🎉 Blog | 📚 Documentation | 🤗 Huggingface Datasets | discord/lmms-eval

+

 # Announcement

@@ -206,14 +207,41 @@ Please refer to our documentation.

 lmms_eval is a fork of lm-eval-harness. We recommend reading through the lm-eval-harness docs for relevant information.

+
+
 Below are the changes we made to the original API:
 - Build context now only passes in idx; the image and doc are processed during the model responding phase. The datasets now contain many images, and storing them in the doc as the original lm-eval-harness does would exhaust CPU memory.
 - Instance.args (lmms_eval/api/instance.py) now contains a list of images to be passed to lmms.
 - lm-eval-harness supports all HF language models through a single model class. This is currently not possible for lmms because the input/output formats of lmms in HF are not yet unified. Therefore, we have to create a new class for each lmms model. This is not ideal and we will try to unify them in the future.

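The first change listed above (passing only an index when building context, then loading images during the responding phase) can be illustrated with a minimal Python sketch. It does not use the actual lmms_eval classes: `LightRequest`, `build_requests`, `respond`, `doc_to_text`, `doc_to_visual`, and the toy `DATASET` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical toy dataset; real benchmark rows would hold large decoded images.
DATASET: List[Dict[str, Any]] = [
    {"question": "What is shown?", "images": ["img_0a"]},
    {"question": "How many cats?", "images": ["img_1a", "img_1b"]},
]


def doc_to_text(doc: Dict[str, Any]) -> str:
    """Build the text prompt for one document (assumed helper)."""
    return doc["question"]


def doc_to_visual(doc: Dict[str, Any]) -> List[Any]:
    """Return the list of images for one document (assumed helper)."""
    return list(doc["images"])


@dataclass
class LightRequest:
    """Holds the prompt, an index, and a loader, but never the images."""
    context: str
    doc_id: int
    doc_to_visual: Callable[[Dict[str, Any]], List[Any]]


def build_requests(dataset: List[Dict[str, Any]]) -> List[LightRequest]:
    # Only the index is stored, so the request list stays small in memory.
    return [
        LightRequest(doc_to_text(doc), idx, doc_to_visual)
        for idx, doc in enumerate(dataset)
    ]


def respond(requests: List[LightRequest], dataset: List[Dict[str, Any]]) -> List[str]:
    outputs = []
    for req in requests:
        # Images are looked up by index only now, at responding time.
        visuals = req.doc_to_visual(dataset[req.doc_id])
        outputs.append(f"{req.context} [{len(visuals)} image(s)]")
    return outputs
```

In this sketch the request objects carry only a prompt string, an integer, and a function reference, so memory use during request construction does not grow with image size.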
-We also thank:
+
+
+During the initial stage of our project, we thank:
 - Xiang Yue, Jingkang Yang, Dong Guo and Sheng Shen for early discussion and testing.

+
+
+From `v0.1` to `v0.2`, we thank the community for its support through pull requests (PRs):
+
+Datasets:
+
+- VCR: Vision_Caption_Restoration (officially from the authors, MILA)
+- ConBench (officially from the authors, PKU/Bytedance)
+- MathVerse (officially from the authors, CUHK)
+- MM-UPD (officially from the authors, University of Tokyo)
+- Multi-lingual MMMU (officially from the authors, CUHK)
+- WebSRC (from Hunter Heiden)
+- ScreenSpot (from Hunter Heiden)
+- RealworldQA (from Fanyi Pu, NTU)
+- Multi-lingual LLaVA-W (from Gagan Bhatia, UBC)
+
+Models:
+
+- LLaVA-HF (officially from Huggingface)
+- Idefics-2 (from the lmms-lab team)
+- microsoft/Phi-3-Vision (officially from the authors, Microsoft)
+- LLaVA-SGlang (from the lmms-lab team)
+
 ## Citations

```` ```shell
````