chore: Update lmms-eval to support video evaluations for LLaVA models · dadwadw233/lmms-eval@ccf4fbf


@@ -11,7 +11,7 @@
 
 # Annoucement
 
-
+
 
 - [2024-03] We have released the first version of lmms-eval, please refer to the blog for more details
 

@@ -67,9 +67,32 @@ conda install openjdk=8
 ```
 you can then check your java version by `java -version`
 
+
+<details>
+<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
+<br>
+
+As demonstrated by the extensive table below, we aim to provide detailed information for readers to understand the datasets included in lmms-eval and some specific details about these datasets (we remain grateful for any corrections readers may have during our evaluation process).
+
+We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It's a live sheet, and we are updating it with new results.
+
+<p align="center" width="100%">
+<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
+</p>
+
+We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).
+
+</details>
+<br>
+
+
+Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
+
 # Multiple Usages
+
+**Evaluation of LLaVA on MME**
+
 ```bash
-Evaluation of LLaVA on MME
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \

@@ -80,8 +103,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme \
     --output_path ./logs/
+```
+
+Evaluation of LLaVA on multiple datasets
 
-Evaluation of LLaVA on multiple datasets
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \

@@ -92,8 +118,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-For other variants llava. Note that conv_template is an arg of the init function of llava in lmms_eval/models/llava.py
+For other variants llava. Note that conv_template is an arg of the init function of llava in lmms_eval/models/llava.py
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
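The note in this hunk points out that `conv_template` is an init argument of the llava model class in `lmms_eval/models/llava.py`, and the command it introduces continues beyond the lines shown here. As a hedged illustration only (the checkpoint, template name, and task below are assumptions, not lines from this commit), passing a template through `--model_args` for another LLaVA variant might look like:

```bash
# Hypothetical sketch (not part of the diff): supplying conv_template via
# --model_args for a different LLaVA variant. The checkpoint, template name,
# and task below are illustrative assumptions.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.6_mme \
    --output_path ./logs/
```

Whatever value is chosen, it has to match the chat format the checkpoint was trained with, since it is forwarded directly to the model constructor.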

@@ -104,8 +133,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-Evaluation of larger lmms (llava-v1.6-34b)
+Evaluation of larger lmms (llava-v1.6-34b)
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \

@@ -116,11 +148,17 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
+
+Evaluation with a set of configurations, supporting evaluation of multiple models and datasets
 
-Evaluation with a set of configurations, supporting evaluation of multiple models and datasets
+```bash
 python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
+```
 
-Evaluation with naive model sharding for bigger model (llava-next-72b)
+Evaluation with naive model sharding for bigger model (llava-next-72b)
+
+```bash
 python3 -m lmms_eval \
     --model=llava \
     --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
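The `--config ./miscs/example_eval.yaml` invocation in this hunk drives several model/dataset combinations from a single file. That file is not part of this diff, so the sketch below is only an assumption of what such a config might contain (every key, checkpoint, and task name is illustrative); it is written as a bash heredoc so the whole example stays runnable:

```bash
# Rough sketch only: the contents of ./miscs/example_eval.yaml are not shown
# in this diff, so every key and value below is an assumption for illustration.
cat > my_eval_config.yaml <<'EOF'
# One entry per evaluation run; each entry mirrors the single-run CLI flags.
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_mme
  output_path: "./logs/"
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,mmbench_en
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_13b
  output_path: "./logs/"
EOF

# Launch all entries in one go, mirroring the command shown in the hunk above.
python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config my_eval_config.yaml
```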

@@ -130,8 +168,11 @@ python3 -m lmms_eval \
     --log_samples_suffix=llava_qwen \
     --output_path="./logs/" \
     --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
+```
+
+Evaluation with SGLang for bigger model (llava-next-72b)
 
-Evaluation with SGLang for bigger model (llava-next-72b)
+```bash
 python3 -m lmms_eval \
     --model=llava_sglang \
     --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \

@@ -143,26 +184,6 @@ python3 -m lmms_eval \
     --verbosity=INFO
 ```
 
-
-Comprehensive Evaluation Results of LLaVA Family Models
-
-
-As demonstrated by the extensive table below, we aim to provide detailed information for readers to understand the datasets included in lmms-eval and some specific details about these datasets (we remain grateful for any corrections readers may have during our evaluation process).
-
-We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet here. It's a live sheet, and we are updating it with new results.
-
-
-
-
-We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data here.
-
-
-
-
-Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
-
 ## Supported models
 
 Please check supported models for more details.