chore: Update lmms-eval to support video evaluations for LLaVA models · EvolvingLMMs-Lab/lmms-eval@cbeee20

`@@ -8,6 +8,7 @@

`🏠 LMMs-Lab Homepage | 🎉 Blog | 📚 Documentation | 🤗 Huggingface Datasets | discord/lmms-eval

# Announcement

`@@ -206,14 +207,41 @@ Please refer to our documentation.

lmms_eval is a fork of lm-eval-harness. We recommend reading through the lm-eval-harness docs for relevant information.

`Below are the changes we made to the original API:

- Build context now passes in only `idx`; the image and doc are processed during the model responding phase. The dataset now contains many images, and we can't store them in the doc as the original lm-eval-harness does; otherwise, CPU memory would explode.
- Instance.args (lmms_eval/api/instance.py) now contains a list of images to be passed to lmms.
- lm-eval-harness supports all HF language models as a single model class. Currently this is not possible for lmms because the input/output formats of lmms in HF are not yet unified. Therefore, we have to create a new class for each lmms model. This is not ideal, and we will try to unify them in the future.
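
The first change above (passing only an index and deferring image loading to the responding phase) can be illustrated with a minimal sketch. This is not the actual lmms_eval code: `LazyRequest`, `build_requests`, `respond`, `doc_to_visual`, and the toy `DATASET` are hypothetical names introduced only for illustration, and the "model" simply counts the images it receives.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# Toy dataset (hypothetical): each doc carries a potentially large image payload.
DATASET: List[Doc] = [
    {"question": "What is shown?", "image": b"fake-image-bytes-0"},
    {"question": "How many cats?", "image": b"fake-image-bytes-1"},
]


def doc_to_visual(doc: Doc) -> List[bytes]:
    """Return the list of images for one doc (hypothetical helper)."""
    return [doc["image"]]


@dataclass
class LazyRequest:
    """Holds only lightweight fields: text, an index, and a loader callback."""
    context: str
    doc_id: int
    load_visuals: Callable[[Doc], List[bytes]]


def build_requests(dataset: List[Doc]) -> List[LazyRequest]:
    # Only the index is stored, so no image bytes are copied into the request.
    return [
        LazyRequest(context=doc["question"], doc_id=idx, load_visuals=doc_to_visual)
        for idx, doc in enumerate(dataset)
    ]


def respond(request: LazyRequest, dataset: List[Doc]) -> str:
    # Images are looked up by index only when the model is about to answer.
    visuals = request.load_visuals(dataset[request.doc_id])
    return f"{request.context} [{len(visuals)} image(s)]"


requests = build_requests(DATASET)
answers = [respond(r, DATASET) for r in requests]
```

Because each request keeps only an integer index and a callback, the memory held by the request list stays small regardless of image size; the images remain in the dataset until a request is actually served.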

We also thank:

During the initial stage of our project, we thank:

`- Xiang Yue, Jingkang Yang, Dong Guo and Sheng Shen for early discussion and testing.

From v0.1 to v0.2, we thank the community for its support through pull requests (PRs):

Datasets:

- VCR: Vision_Caption_Restoration (officially from the authors, MILA)
- ConBench (officially from the authors, PKU/Bytedance)
- MathVerse (officially from the authors, CUHK)
- MM-UPD (officially from the authors, University of Tokyo)
- Multi-lingual MMMU (officially from the authors, CUHK)
- WebSRC (from Hunter Heiden)
- ScreenSpot (from Hunter Heiden)
- RealworldQA (from Fanyi Pu, NTU)
- Multi-lingual LLaVA-W (from Gagan Bhatia, UBC)

Models:

- LLaVA-HF (officially from Huggingface)
- Idefics-2 (from the lmms-lab team)
- microsoft/Phi-3-Vision (officially from the authors, Microsoft)
- LLaVA-SGlang (from the lmms-lab team)

`## Citations

```` ```shell

````