chore: Update lmms-eval to support video evaluations for LLaVA models · dadwadw233/lmms-eval@ccf4fbf


@@ -11,7 +11,7 @@
 
 # Annoucement
 
-
+
 
 - [2024-03] We have released the first version of lmms-eval, please refer to the blog for more details
 

@@ -67,9 +67,32 @@ conda install openjdk=8
 ```
 you can then check your java version by `java -version`
 
+
+<details>
+<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
+<br>
+
+As demonstrated by the extensive table below, we aim to provide detailed information for readers to understand the datasets included in lmms-eval and some specific details about these datasets (we remain grateful for any corrections readers may have during our evaluation process).
+
+We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It's a live sheet, and we are updating it with new results.
+
+<p align="center" width="100%">
+<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
+</p>
+
+We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).
+
+</details>
+<br>
+
+
+Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
+
 # Multiple Usages
+
+**Evaluation of LLaVA on MME**
+
 ```bash
-Evaluation of LLaVA on MME
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \

@@ -80,8 +103,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme \
     --output_path ./logs/
+```
+
+Evaluation of LLaVA on multiple datasets
 
-Evaluation of LLaVA on multiple datasets
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \

@@ -92,8 +118,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-For other variants llava. Note that conv_template is an arg of the init function of llava in lmms_eval/models/llava.py
+For other variants llava. Note that conv_template is an arg of the init function of llava in lmms_eval/models/llava.py
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
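The note in this hunk points out that `conv_template` is an init argument of the llava model class in `lmms_eval/models/llava.py`, and the command it introduces continues beyond the lines shown here. As a hedged illustration only (the checkpoint, template name, and task below are assumptions, not lines from this commit), passing a template through `--model_args` for another LLaVA variant might look like:

```bash
# Hypothetical sketch (not part of the diff): supplying conv_template via
# --model_args for a different LLaVA variant. The checkpoint, template name,
# and task below are illustrative assumptions.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.6_mme \
    --output_path ./logs/
```

Whatever value is chosen, it has to match the chat format the checkpoint was trained with, since it is forwarded directly to the model constructor.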

@@ -104,8 +133,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-Evaluation of larger lmms (llava-v1.6-34b)
+Evaluation of larger lmms (llava-v1.6-34b)
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \

@@ -116,11 +148,17 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
+
+Evaluation with a set of configurations, supporting evaluation of multiple models and datasets
 
-Evaluation with a set of configurations, supporting evaluation of multiple models and datasets
+```bash
 python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
+```
 
-Evaluation with naive model sharding for bigger model (llava-next-72b)
+Evaluation with naive model sharding for bigger model (llava-next-72b)
+
+```bash
 python3 -m lmms_eval \
     --model=llava \
     --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
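The `--config ./miscs/example_eval.yaml` invocation in this hunk drives several model/dataset combinations from a single file. That file is not part of this diff, so the sketch below is only an assumption of what such a config might contain (every key, checkpoint, and task name is illustrative); it is written as a bash heredoc so the whole example stays runnable:

```bash
# Rough sketch only: the contents of ./miscs/example_eval.yaml are not shown
# in this diff, so every key and value below is an assumption for illustration.
cat > my_eval_config.yaml <<'EOF'
# One entry per evaluation run; each entry mirrors the single-run CLI flags.
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_mme
  output_path: "./logs/"
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,mmbench_en
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_13b
  output_path: "./logs/"
EOF

# Launch all entries in one go, mirroring the command shown in the hunk above.
python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config my_eval_config.yaml
```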

@@ -130,8 +168,11 @@ python3 -m lmms_eval \
     --log_samples_suffix=llava_qwen \
     --output_path="./logs/" \
     --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
+```
+
+Evaluation with SGLang for bigger model (llava-next-72b)
 
-Evaluation with SGLang for bigger model (llava-next-72b)
+```bash
 python3 -m lmms_eval \
     --model=llava_sglang \
     --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \

@@ -143,26 +184,6 @@ python3 -m lmms_eval \
     --verbosity=INFO
 ```
 
-
-Comprehensive Evaluation Results of LLaVA Family Models
-
-
-As demonstrated by the extensive table below, we aim to provide detailed information for readers to understand the datasets included in lmms-eval and some specific details about these datasets (we remain grateful for any corrections readers may have during our evaluation process).
-
-We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet here. It's a live sheet, and we are updating it with new results.
-
-
-
-
-We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data here.
-
-
-
-
-Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
-
 ## Supported models
 
 Please check supported models for more details.