Add MMStar by skyil7 · Pull Request #158 · EvolvingLMMs-Lab/lmms-eval
Luodian added a commit that referenced this pull request
…roup tasks (#158)
feat: Add language tasks and rebase to latest features of lm-evaluation-harness
chore: Update lmms_eval task registration logic for group tasks
refactor: Update lmms_eval task registration logic for group tasks
Update pyproject.toml
MichalCiesiolka pushed a commit to MichalCiesiolka/lmms-eval-llmzszl that referenced this pull request
MichalCiesiolka referenced this pull request in MichalCiesiolka/lmms-eval-llmzszl
…s. (EvolvingLMMs-Lab#218)
Load tasks only one time (#139)
chore: Initialize tasks only once to avoid re-initialization
chore: Initialize tasks only once to avoid re-initialization
chore: Refactor task initialization to avoid re-initialization
chore: Update task initialization to fix include_path issue
chore: Update task initialization to fix include_path issue
Upload live_bench results (#140)
upload results
add a readme
chore: Update upload_results.py script to use shell syntax
Update upload_results.py
Update upload_results.py
Add Muirbench (#143)
handle gen kwargs in internvl2
Add muirbench
if no response directly return 0 (#142)
merge ov evals (#144)
chore: Update gpt_eval_model_name to "gpt-3.5-turbo" in mathvista.yaml
Squashed commit of the following:
commit 994c9f97a2f8db3e9b7d7933d1e1680acde5b70b Author: Yan Shu 570533048@qq.com Date: Mon Jul 8 17:21:23 2024 +0800
Add files via upload
- Squashed commit of the following:
commit e31cd7883d4555c7530795c7f102b8d78cbd372f Author: Bo Li drluodian@gmail.com Date: Wed Jul 10 12:08:08 2024 +1000
chore: Update lmms_eval/models/vila.py and lmms_eval/tasks/__init__.py
commit 1d8c980d1089f9d7702c3b92d5c85039f2809c6d Author: kcz358 kaichenzhang358@outlook.com Date: Tue Jul 9 02:08:52 2024 +0000
Rename xcomposer 4KHD
commit 6da76f36ecf5f9aa73057e767a4fcb60c99ff896 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:55:56 2024 +1000
Upgrade lmms-eval to version 0.2.1
commit cd1858523fcd8630082cbefba8710e0de3ee8805 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:52:23 2024 +1000
Upgrade lmms-eval to support more models and evaluation tasks
commit 672d7e5bb49dcb34e1b2fdeb09f3f4588dc583a6 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:43:41 2024 +1000
feat: Add tie_weights parameter to Llava model initialization
commit 2037a86261b55fa42b8ba3a04eab192b3e69d6ea Merge: e6844db1 a5c18692 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:37:12 2024 +1000
Fix gen kwargs image aspect ratio in internvl2
commit a5c186925de989b616f58a35ece36065a32b4594 Merge: 2ebec77f 557083a1 Author: Li Bo drluodian@gmail.com Date: Tue Jul 9 09:15:56 2024 +0800
Merge pull request #137 from shuyansy/main
add MLVU task
commit 557083a156c3dd67ac79e22b4202e9b69b6b00f4 Author: Yan Shu 570533048@qq.com Date: Mon Jul 8 16:56:50 2024 +0800
Add files via upload
commit 2ebec77f5606d79e9a7b995970e32792050606a1 Merge: 211bfede b23d349e Author: Li Bo drluodian@gmail.com Date: Mon Jul 8 11:53:06 2024 +0800
Merge pull request #136 from Dousia/main
Add detailcaps
commit b23d349e46d60dc149ffaa54d6e019f4996ed92d Author: ByteDance bytedance@MacBook-Pro.local Date: Sun Jul 7 23:24:19 2024 +0800
Add install capture_metric in env
commit c6e211d5f9dbb7572d3a141b6504cb1ca2007c33 Author: ByteDance bytedance@MacBook-Pro.local Date: Sun Jul 7 23:04:13 2024 +0800
Add detailcaps
commit 211bfedebad243ef82a8b0be36c3b5a9b9cb2f72 Merge: 7c208b76 79514eee Author: Li Bo drluodian@gmail.com Date: Tue Jul 2 23:05:12 2024 +0800
Merge pull request #133 from EvolvingLMMs-Lab/dev/wild_vision
Add wild vision bench
commit 79514eeebcfd6f655be2a10c776037d12a7b7214 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 15:10:02 2024 +0000
Fixing handling None filtered score
commit 725fac2781446958b905e1e6c6eb3c0a8e582e49 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 08:25:42 2024 +0000
Fixing dataset name
commit 8d963e132ac03fc0d835d480cfcfcabe72af143c Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 08:24:51 2024 +0000
Fixing scoring logic
commit e2990d0a69e876721256fdf946c68ba7ae0cbdc1 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 06:06:57 2024 +0000
Hardcode to keep image for wild vision
commit ed381736730d8fb785b4ee919fdb751734ecef25 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 06:06:38 2024 +0000
Add wild vision 0617
commit 7c208b76640c986cfe94233dce735c3ca4ad4319 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:53:31 2024 +0800
Update README.md
commit 39d40dea47bc59ff04e8b0cbc445345098debc9a Merge: e19b43a3 ba7081c0 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:47:09 2024 +0800
Merge pull request #129 from Dannoopsy/mmbench_ru
add task MMBench-ru
commit e19b43a3a1e7212e623061b164b0419cc0dda689 Merge: 11fd7e3f a0de8970 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:46:58 2024 +0800
Merge pull request #128 from Dannoopsy/gqa-ru
add task gqa-ru
commit 11fd7e3fc05908aeb01e4a6161a7b55cd38b3122 Merge: 383e7fea a7522592 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:46:16 2024 +0800
Merge pull request #130 from lscpku/vitatecs
Add task VITATECS
commit a75225926e5954f85466d257f99acf0163fde596 Author: lscpku lisc99@pku.edu.cn Date: Fri Jun 28 20:37:06 2024 +0800
create new task vitatecs
commit ba7081c0abac840002d320e30733e891298dfa11 Author: Dannoopsy 63581325+Dannoopsy@users.noreply.github.com Date: Fri Jun 28 12:21:05 2024 +0300
change prompt to ru
commit 27ea9c0055a8abf3a8198829b8617018479918e2 Author: Dannoopsy belopolskikh.dd@phystech.edu Date: Thu Jun 27 17:17:29 2024 +0000
add mmbench_ru_dev
commit 383e7fead3138aedf62e9c0ec48303835ef26e2a Merge: 06fa000f ed2e7f79 Author: Li Bo drluodian@gmail.com Date: Fri Jun 28 00:14:10 2024 +0800
Merge pull request #126 from lorenzomammana/feature/external-package-integration
External package integration using plugins
commit ed2e7f792151d21bce8f1c498270b9391e1d5c85 Merge: 03947e14 06fa000f Author: Lorenzo Mammana mammanalorenzo@outlook.it Date: Thu Jun 27 15:38:10 2024 +0000
Merge branch 'main' into feature/external-package-integration
commit a0de89708d5e6f259bb17f0eaace3c5b901b275c Author: Dannoopsy belopolskikh.dd@phystech.edu Date: Tue Jun 25 11:11:37 2024 +0000
new task gqa-ru
commit 06fa000f60d3e4d160fac8ceb9959ae92a98f752 Author: kcz358 kaichenzhang358@outlook.com Date: Tue Jun 25 06:41:13 2024 +0000
Fix vid mme post prompt issue
commit b388d79e0df6f60068196cb7047453ebd22d6ef1 Author: Li Bo drluodian@gmail.com Date: Sun Jun 23 22:31:16 2024 +0800
Update activitynetqa_generation.yaml
commit 8f9d620fcd9d0a0742ee6bcf51ea63bd6b088a36 Author: Li Bo drluodian@gmail.com Date: Sun Jun 23 14:02:25 2024 +0800
Update pyproject.toml
commit 6341b7c15ce9fb28eb06b067ddb299d6cf2e16c3 Merge: fce85f1b 903b042b Author: Li Bo drluodian@gmail.com Date: Sun Jun 23 14:02:02 2024 +0800
Merge pull request #125 from EvolvingLMMs-Lab/dev/interleave
[Model] aligned llava-interleave model results on video tasks
commit 903b042be016016d4ebeecb07701f3076a2d323c Author: kcz358 kaichenzhang358@outlook.com Date: Sat Jun 22 12:07:13 2024 +0000
Remove unnecessary lines for video llava
commit d78ec86407b729a964906a8c2e50704b4bc74d06 Merge: ebe7217a fce85f1b Author: Li Bo drluodian@gmail.com Date: Sat Jun 22 13:57:31 2024 +0800
Merge branch 'main' into dev/interleave
commit ebe7217a486c1e754e42c2cbdb834e09fbbcc9b0 Author: kcz358 kaichenzhang358@outlook.com Date: Sat Jun 22 02:57:08 2024 +0000
Delete unnecessary lines
commit 120c474b056f9177c74e1fd9691d59e2f234b785 Author: kcz358 kaichenzhang358@outlook.com Date: Fri Jun 21 08:38:41 2024 +0000
Revise model registry for llava_hf and longva
commit 7d6201f921088afd3f52a35076e3c6fcc9aa518c Author: kcz358 kaichenzhang358@outlook.com Date: Fri Jun 21 08:38:24 2024 +0000
Add longva
commit 12f480699c71a12a24d4349d9b0681933201a3a6 Author: kcz358 kaichenzhang358@outlook.com Date: Fri Jun 21 08:35:39 2024 +0000
Remove unnecessary lines since use batched visuals now in llava
commit 12cea76f1f0f14b1fd1007c9d39a9b0557368637 Author: Bo Li drluodian@gmail.com Date: Thu Jun 20 18:15:32 2024 +0000
chore: Add loguru for logging in lmms_eval package
commit 03947e14a46fd25b412931f7c9c25f4a2971d0b4 Author: Lorenzo Mammana mammanalorenzo@outlook.it Date: Wed Jun 5 13:40:41 2024 +0000
feat: Allow including external tasks from plugins
commit b80a91f73e15ddd0b0ce1322d7d121fa14030eed Author: Lorenzo Mammana mammanalorenzo@outlook.it Date: Wed Jun 5 13:04:55 2024 +0000
feat: Allow loading model configurations from other packages
commit 8ef24740dd48a11c97eb627f2fff4aca107fef0d Author: Bo Li drluodian@gmail.com Date: Thu Jun 20 12:11:03 2024 +0000
chore: Remove unused models from lmms_eval package
commit af38885fc2e066f5ea44388f33e07176f836fe28 Author: Bo Li drluodian@gmail.com Date: Thu Jun 20 12:07:09 2024 +0000
chore: Handle ImportError when importing models
Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
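The try-except pattern described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the registry `AVAILABLE_MODELS` and the helper `import_models` are hypothetical names, and the standard-library module `json` stands in for a model module that imports successfully.

```python
import importlib

# Hypothetical registry mapping a model name to a module path.
# Illustrative only; this is not the lmms_eval model registry.
AVAILABLE_MODELS = {
    "json_model": "json",                   # importable, stands in for a working model
    "missing_model": "no_such_module_xyz",  # deliberately absent, stands in for a broken import
}


def import_models(registry):
    """Import every module in the registry, reporting failures instead of aborting."""
    loaded, failed = {}, []
    for name, module_path in registry.items():
        try:
            loaded[name] = importlib.import_module(module_path)
        except ImportError as exc:
            # Print which import failed so the problem is easy to trace.
            print(f"Failed to import {name} from {module_path}: {exc}")
            failed.append(name)
    return loaded, failed


loaded, failed = import_models(AVAILABLE_MODELS)
```

With this shape, one model with a missing dependency does not stop the remaining models from being registered, and the printed message names the import that failed.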
commit fce85f1b03ff7043b29dee787c5d17a08dd2687a Merge: dbe63293 d94f83cb Author: Li Bo drluodian@gmail.com Date: Thu Jun 20 20:02:12 2024 +0800
Merge pull request #120 from EvolvingLMMs-Lab/pufanyi/hf_dataset_docs
Add docs for datasets upload to HF
commit dbe63293245a5141fdfd80bda7657c304f6bd32f Author: choiszt ls2001927@sohu.com Date: Thu Jun 20 15:14:21 2024 +0800
update ablation for videomme datasets
commit d94f83cb3f08b61a2c75cc4326e58792100605b3 Author: Li Bo drluodian@gmail.com Date: Thu Jun 20 13:30:59 2024 +0800
Update README.md
commit cab8159ff35db330536c0b6dfb4b0a3b24142209 Author: Li Bo drluodian@gmail.com Date: Thu Jun 20 13:30:29 2024 +0800
Update README.md
commit 45876652a877a8006b828f32f5cc4660629f9190 Author: kcz358 kaichenzhang358@outlook.com Date: Thu Jun 20 03:55:30 2024 +0000
Add llava_hf back to registry
commit 3463651b8c54d36cd94169e3d376f5ed225a195a Author: kcz358 kaichenzhang358@outlook.com Date: Thu Jun 20 03:54:33 2024 +0000
Remove handling non-visual loop in llava
commit cb0d3f49b72790b081f981e0e6147131542f7f68 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Thu Jun 20 02:11:18 2024 +0800
update readme
commit 813877bfe5ac590cdbe92dd74d18f83a2091f748 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:37:52 2024 +0800
to sh script
commit a14684b8557d5894976448a5c559ed7a66a6cf16 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:37:04 2024 +0800
lint
commit d0f8851d42ba31f5da2a7a65e91499db45174dbc Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:36:48 2024 +0800
small fix
commit 63748e9718f287ad433afc90e340b5e17a89c1ed Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:36:43 2024 +0800
small fix
commit 7f1159a1fe04cfb783dc31d4fbdef3bda0ce19e4 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:35:05 2024 +0800
update preparation
commit 19f9bd621c76a483ff98f8c7eb78f64753da683a Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:23:24 2024 +0800
docs
commit ce6f889ba02d819979c7922f6336cf4f1f718f65 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:04:16 2024 +0800
tutorial
commit f513c520c2a3dad26d2b2ca5c4ed4db05a493c73 Author: Bo Li drluodian@gmail.com Date: Wed Jun 19 06:51:19 2024 +0000
chore: Update dependencies to fix potential risks and improve compatibility
commit efb529552c5e4ba039a4cba8e9aa5cb7ba65bf90 Author: kcz358 kaichenzhang358@outlook.com Date: Wed Jun 19 10:25:58 2024 +0800
Release llava-wilder
commit 742651fc9daf97e2f57831ed6e6e7ee7ead7d555 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 07:44:26 2024 +0800
feat: Add support for auto downloading tar format videos
commit 511b6259828212fcba954cdeb8cf90d6e5daabf8 Merge: 22a4958e 050b2c37 Author: Bo Li drluodian@gmail.com Date: Tue Jun 18 17:01:03 2024 +0000
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit 050b2c370017e9b97475dd6cf01fd051b5ca5c86 Merge: 74facb41 ef306512 Author: Li Bo drluodian@gmail.com Date: Tue Jun 18 13:13:38 2024 +0800
Merge pull request #114 from zjysteven/add-tinyllava
add tinyllava
commit ef306512e5135f76dffa383f600b8733015836e8 Author: Jingyang Zhang jingyang.zhang@duke.edu Date: Mon Jun 17 17:57:02 2024 -0400
fix typo
commit 9bab67732a4238097725deddf867fb1946ffee40 Merge: dbfb2387 74facb41 Author: Jingyang Zhang jingyang.zhang@duke.edu Date: Sun Jun 16 10:56:05 2024 -0400
Merge branch 'EvolvingLMMs-Lab:main' into add-tinyllava
commit 74facb41a826691dfce4458cf1d8659b34fc5bf5 Merge: 8ba192f9 d5df72de Author: Li Bo drluodian@gmail.com Date: Sun Jun 16 17:59:19 2024 +0800
Merge pull request #118 from teowu/main
Fix the potential risk by PR #117
commit d5df72de2d03108d6b365818ecc3551ac9aa6302 Merge: 5bf59ed2 8ba192f9 Author: Teo (Timothy) Wu Haoning 38696372+teowu@users.noreply.github.com Date: Sun Jun 16 15:32:13 2024 +0800
Merge branch 'EvolvingLMMs-Lab:main' into main
commit 5bf59ed250da98a408a94e214a73caa400cba842 Author: teowu realtimothyhwu@gmail.com Date: Sun Jun 16 07:27:28 2024 +0000
fix #117, allow auto download with tar format videos
commit 98b3955cb808e36303c030aea78eb037d1ec59ce Merge: a056f118 be9dada8 Author: teowu realtimothyhwu@gmail.com Date: Sun Jun 16 07:25:07 2024 +0000
Merge branch 'main' of https://github.com/teowu/lmms-eval into main
commit a056f118704eccec86ce32ab86981ce4bc1e1deb Author: teowu realtimothyhwu@gmail.com Date: Sun Jun 16 07:23:54 2024 +0000
fix #117, allow auto download with tar format videos
commit 8ba192f94edf5d99598983445d5faa4f8807c49f Merge: 7cc28907 be9dada8 Author: Li Bo drluodian@gmail.com Date: Sat Jun 15 17:30:59 2024 +0800
Merge pull request #117 from teowu/main
LongVideoBench for LMMs-Eval
commit be9dada8b4189c53c08e1674ab273242cf2f80a0 Merge: 62ea8ceb 7cc28907 Author: Teo (Timothy) Wu Haoning 38696372+teowu@users.noreply.github.com Date: Sat Jun 15 16:39:20 2024 +0800
Merge pull request #1 from EvolvingLMMs-Lab/main
Merge pull request #113 from teowu/main
commit 62ea8ceb223ef2b51ebab2bcd50d5cf339c35cfe Author: teowu realtimothyhwu@gmail.com Date: Sat Jun 15 08:30:11 2024 +0000
LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)
commit 7cc28907edbb4eb58ee1398772a48110ea35dd96 Merge: 4bc7224d ea14cd4b Author: Li Bo drluodian@gmail.com Date: Sat Jun 15 14:10:22 2024 +0800
Merge pull request #113 from teowu/main
Q-Bench, Q-Bench2, A-Bench
commit dbfb23873979f789477f4797ee2d6071e0fd921e Author: Jingyang jingyang.zhang@duke.edu Date: Fri Jun 14 16:20:42 2024 -0400
add tinyllava
commit ea14cd4b361f4c95b3665cbdb95bc51754090eb5 Author: teowu realtimothyhwu@gmail.com Date: Fri Jun 14 15:01:52 2024 +0000
Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image
commit 4bc7224dcd27fe8b288bfc3fed4d7a9da9635658 Merge: 2797987f bf14cb85 Author: Li Bo drluodian@gmail.com Date: Fri Jun 14 02:14:43 2024 +0800
Merge pull request #111 from XinrunDu/main
add II-Bench
commit bf14cb8527b2b7ac438a36567a875168bc02d294 Author: XinrunDu duxinrun2000@gmail.com Date: Thu Jun 13 09:37:02 2024 +0000
fix dataset_path
commit 6248113f4e11a0ac396d31fa1b032a142fea8cb4 Author: XinrunDu duxinrun2000@gmail.com Date: Thu Jun 13 09:32:06 2024 +0000
add II-Bench
commit 2797987f5b88b87bd172714b678a75a1d8051826 Merge: 63d82f1f 66d4bb2d Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 11:14:47 2024 +0800
Merge pull request #109 from EvolvingLMMs-Lab/pufanyi/update_version
[Small Update] Update the version of LMMs-Eval
commit 66d4bb2d9c9afbbdea40196d4ad80e214d0b14b6 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Thu Jun 13 11:13:00 2024 +0800
update version
commit 63d82f1ff11eb430d91a15d6788a1f0b4d596850 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 11:04:32 2024 +0800
Update README.md
commit 44a33799671cb668f55366d5e5a4ddb051a3a1b4 Merge: 5ed00356 0ce46d08 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 04:00:12 2024 +0800
Merge pull request #105 from tianyu-z/main
Include VCR
commit 0ce46d088e473d12d63de44f17c67dceab25658c Author: Suyuchen suyuchen.wang@umontreal.ca Date: Wed Jun 12 15:56:34 2024 -0400
update README.md
commit 46a88d8b0199ed44d2ff459fb372f2e006960cea Merge: 47b13b9b 5ed00356 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Wed Jun 12 15:50:26 2024 -0400
merged readme.md
commit 47b13b9b320d36ac53b3622557e31239f7c22621 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Wed Jun 12 15:30:52 2024 -0400
update aggregation function for vcr_wiki
commit 5ed00356676cf5d0ff056cf27d1b519b8e303ff7 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 03:21:42 2024 +0800
Update README.md
commit ed8806839db5988ced672bd162b7b046edb4863a Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 03:13:59 2024 +0800
Update README.md
commit fea3806026932a6e2bd6e538bcc413e33abdf245 Merge: d99a24ab 05dc8e85 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 03:11:49 2024 +0800
Merge pull request #108 from EvolvingLMMs-Lab/internal_main_dev
[Upgrade to v0.2] Embracing Video Evaluations with LMMs-Eval
commit 05dc8e853eab7c6bc782a1e2662d2efe7422f767 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:56:04 2024 +0000
chore: Update lmms-eval to support video evaluations for LLaVA models
commit cbeee20bc4ffb510a2b23d96cdaf4077be7c2a9e Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:50:30 2024 +0000
chore: Update lmms-eval to support video evaluations for LLaVA models
commit f00d5498b69dd4f7e54c907ac906abc7c128f000 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:46:33 2024 +0000
Update image alignment in README.md
commit 34156335db74cef9e3f0915d7172fd6b22456c15 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:43:16 2024 +0000
Update llava conv_template in lmms_eval/models/llava.py
commit 50575a950736bc8fc1e191310314cbb5fdff5720 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:39:03 2024 +0000
chore: Update lmms-eval to support video evaluations for LLaVA models
commit c9b2252fb8a15dd04252af5e6b4613855afd6ada Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:33:48 2024 +0000
Bump version to 0.2.0.dev0
commit 465bd4205e8097e9c037b24a3ed08dd6a7694efa Merge: e43bd840 d99a24ab Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:04:25 2024 +0000
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval into internal_main_dev
commit e43bd840b63eb499856e36d9d2ba45c924abcead Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 14:54:06 2024 +0000
chore: Remove unnecessary files and code related to live_bench and sft_eval tasks
commit d99a24abd06df10d07e5a4d0ad5030613f92f2e7 Merge: 374590be a66003be Author: Li Bo drluodian@gmail.com Date: Wed Jun 12 19:45:57 2024 +0800
Merge pull request #107 from AtsuMiyai/new_task/upd_update
update gpt-3.5-turbo version
commit a66003befe4175824a1be6ed59f5f5b88c15f792 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed Jun 12 17:05:17 2024 +0900
update gpt-3.5-turbo version
commit ee91f272985f32eeb9cd6faa41afdd8eb49cac30 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed Jun 12 16:50:53 2024 +0900
update gpt-3.5-turbo version
commit 326b9694fc77398592b8caf3ba0bc2e2bb903813 Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 20:07:40 2024 -0400
include std and confidence interval
commit cd050d4a721d01a2ace0cd030cf7f8dc67eb8c4d Author: Suyuchen suyuchen.wang@umontreal.ca Date: Mon Jun 10 18:49:47 2024 -0400
update vcr_wiki tasks in README.md
commit 205721e0aad76dde30255e56149bbed121883356 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Mon Jun 10 18:43:15 2024 -0400
update vcr_wiki tasks
commit db8e718b502469e8536ee359c5559de87635ffc7 Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 16:13:58 2024 -0400
include the try-except logic for spacy
commit 427dabb790118f538b64e4e5bf6a7aab9689b3d9 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Mon Jun 10 15:51:05 2024 -0400
add crossed_text to vcr_wiki output
commit 043b483eb55f7be4fea75c9bc0b9b03d251b109b Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 15:47:00 2024 -0400
switch logic
commit e1f04db8f58dd10591fde335ea13f74cda7c79bd Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 02:38:21 2024 -0400
modify the form of VCR
commit 96e8d9867c9549ab7490f4b12cfeb6a06238e0aa Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 00:10:30 2024 -0400
init include vcr
commit 374590be62f988a76cf6704cfe394cd8ae7d4cb6 Merge: 504685e2 cb3b9ce7 Author: Kaichen Zhang - NTU kaichenzhang358@outlook.com Date: Fri Jun 7 20:25:48 2024 +0800
Merge pull request #101 from Gumpest/main
Update conbench in README
commit 504685e20b17659b913cf46f3012c16bf429e09d Author: Li Bo drluodian@gmail.com Date: Thu Jun 6 15:42:15 2024 +0800
Update README.md
commit cb3b9ce71411da862ff01342a9122a3c656ffbd1 Merge: c9793b38 67b64ea4 Author: Yuan Zhang 56063339+Gumpest@users.noreply.github.com Date: Thu Jun 6 11:22:24 2024 +0800
Merge branch 'EvolvingLMMs-Lab:main' into main
commit c9793b3883714f254a700230b7bee781d6110e73 Author: Yuan Zhang gump_well_done@163.com Date: Thu Jun 6 11:21:05 2024 +0800
update README
commit 67b64ea44a5a39d96c7a196a8a8345a7486bd912 Merge: 8ee7848a 5fd68451 Author: Li Bo drluodian@gmail.com Date: Wed Jun 5 23:12:58 2024 +0800
Merge pull request #100 from Gumpest/main
add Conbench
commit 5fd684515c55ef643726c1b6c720c7cbd2183ba1 Author: Yuan Zhang gump_well_done@163.com Date: Wed Jun 5 21:52:31 2024 +0800
add conbench
commit 8ee7848aaa6383aa1f919c3f21199c81db3fff89 Merge: 747e1978 6fefaf7c Author: Li Bo drluodian@gmail.com Date: Tue Jun 4 17:09:33 2024 +0800
Merge pull request #95 from AtsuMiyai/new_task/upd
add MM-UPD
commit 747e19782996065cdce7157ee8c5e15beb5b6c59 Merge: 4854a34d 05843072 Author: Li Bo drluodian@gmail.com Date: Tue Jun 4 17:09:04 2024 +0800
Merge pull request #97 from CaraJ7/update
Add MathVerse in README.md
commit 6fefaf7cea504e35583ee7217449da290295a7a4 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Tue Jun 4 17:36:39 2024 +0900
update utils.py for leaderboard submission
commit 5f4fe360def1c48ea0cb1da6409d192784882308 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Sun Jun 2 23:28:27 2024 +0900
slightly change query_prompt for the reproduction
commit 05843072d608b970bcada1cd0db65a3c80864060 Author: CaraJ7 1350074492@qq.com Date: Sun Jun 2 17:05:28 2024 +0800
Add MathVerse in README.md
commit 0581ab3cfb362e2024988b46fbbb00324f1233c9 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Fri May 31 16:09:45 2024 +0900
merge model_specific_prompt_kwargs and dataset_name into each task yaml
commit 4854a34d4d37efb5e201f2691ecdb054590cf20b Author: Pu Fanyi FPU001@e.ntu.edu.sg Date: Sat May 4 19:23:39 2024 +0800
Group MMMU images into one image (#83)
* update
* update font
* Add matplotlib.font_manager import in utils.py
* Refactor font handling in add_order_label function in utils.py
* group mmmu
---------
Co-authored-by: Li Bo <drluodian@gmail.com>
commit d224794c49520f4d28a31862cf977198cd6cbc5e Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 15:15:59 2024 +0900
add upd
commit 453e7936424220f02b99517059ca71babfbe5f5a Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 15:03:30 2024 +0900
add upd
commit 909edd6769ddcf8a546be4fdd129416687516878 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:52:21 2024 +0900
add upd
commit 7c1ac9706cafc4801fa4da181d2f610b7838c7b8 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:50:32 2024 +0900
add upd
commit 811301c5280ddd74986645086f026ab730c8848c Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:46:58 2024 +0900
add upd
commit 71401bafd1d515f704f86ab4817a758542bc4672 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:41:21 2024 +0900
add upd
commit 24dc435908d921e9f1a5706e3141b12e5d838d18 Author: Bo Li drluodian@gmail.com Date: Mon May 27 10:17:32 2024 +0000
fix compatibility issue of older version llava
commit 616edf43731415b35f0f5e97748ed2e017a2891d Author: Bo Li drluodian@gmail.com Date: Mon May 27 09:32:26 2024 +0000
[Fix] import issues of multilingual llava and olympiadbench
commit 4c5a99e21a63fb0ee1c7d15546d18066e1d9894b Merge: 45c05b2b b05c3e22 Author: Li Bo drluodian@gmail.com Date: Mon May 27 14:19:53 2024 +0800
Merge pull request #87 from vfragoso/vifragos/phi3v
Adding microsoft/Phi-3-vision-128k-instruct model.
commit b05c3e222fabd308dd7af4e04c1c6a0812962fe6 Author: Victor Fragoso victor.fragoso@microsoft.com Date: Fri May 24 16:36:37 2024 +0000
Adding documentation of Phi3v class.
commit c2008971308ce8168d57c24d00b725832f099244 Author: Victor Fragoso victor.fragoso@microsoft.com Date: Fri May 24 16:25:02 2024 +0000
Adding prompt arguments for Phi3v on MathVista-TestMini
commit 7f9fb6bcc6cd24a7b8011b8753d0ea98cc2451fd Author: Victor Fragoso victor.fragoso@microsoft.com Date: Fri May 24 13:24:16 2024 +0000
Adding Phi3v model.
commit 45c05b2b2bece76e06849a52a0d034f9c0ac2367 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 23 03:47:36 2024 +0000
Set printing info for llava_hf to debug level
commit 53f013ed8278776551ca992562253387cc9968d2 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 23 03:41:39 2024 +0000
Fix pope random name in pope full
commit 22520a95f13334b75eee0cf0387151067a6bf516 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 23 03:41:14 2024 +0000
Add separated pope tasks by category
commit d1eefb1565014b47287ffa6b350229062f8f602f Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 9 08:36:02 2024 +0000
Update gitignore
commit b2b4dbd2dc13432c79208db35abf7f55c97f1790 Author: kcz358 kaichenzhang358@outlook.com Date: Mon May 20 07:45:11 2024 +0000
Comment out Spice in caption task so that don't need to download stanford nlp model
commit 662f05ce4c62a46a83f819d3a5925a9bd20059b5 Author: kcz358 kaichenzhang358@outlook.com Date: Mon May 20 03:13:13 2024 +0000
Comment out parse result in xcomposer
commit 09329322916bfbb604d72ddaf50441a0947f8805 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 16 03:55:39 2024 +0000
Fix instructblip qformer size mismatch and multi-images problem
commit 557a6a3b15e07e506bc05e2cc76ff6a2f8c93964 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 16 03:11:41 2024 +0000
Remove redundant code in fuyu
commit 6aeb5504e74ed1980b53700d8e4d4dcf7d1b38fc Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 16 01:45:24 2024 +0000
Fix idefics2 llava in the wild bugs
commit aea80e6a71f716951353e1e5d68380243396b4d6 Author: kcz358 kaichenzhang358@outlook.com Date: Wed May 15 11:07:35 2024 +0000
Better task list_with_num
commit 3c12a080d66b9c38f615b961befca7c30f82fa39 Author: Li Bo drluodian@gmail.com Date: Sat May 18 02:35:52 2024 +0800
Update LICENSE
commit 82317a635a4978b32e095a06cc295d0ae23661c2 Author: Li Bo drluodian@gmail.com Date: Sat May 18 02:29:09 2024 +0800
Update LICENSE
commit a8bba1cdb51061a0d27bf9a98cca1505b5c58ea5 Author: Li Bo drluodian@gmail.com Date: Sat May 18 02:28:03 2024 +0800
Create LICENSE
commit caa5893b5fd2c1d32c72b97f371ccd9a8d9ec3a0 Merge: c0944486 423b0060 Author: Li Bo drluodian@gmail.com Date: Mon May 13 11:45:26 2024 +0800
Merge pull request #73 from EvolvingLMMs-Lab/kc/qwen_vl_api
[Feat] Add qwen vl api
commit c09444860362a136f17641f8b2a1f91c2bbc3715 Author: kcz358 kaichenzhang358@outlook.com Date: Sat May 11 06:11:19 2024 +0000
Fix llava_hf image tokens number issue
commit 64f07e497f53e5bcbe9e8fb5830cc7a1daaf7ff1 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 9 02:04:10 2024 +0000
Fix endless warning for llava_hf generation
commit 8aaa828108da8514dd9cd23a9d6d83a8b67f2d65 Author: Bo Li drluodian@gmail.com Date: Thu May 2 06:13:56 2024 +0000
Add model_name parameter to Llava constructor
commit 7847dc4d8efe60605102414bb071b1da9851228e Author: kcz358 kaichenzhang358@outlook.com Date: Tue May 7 03:15:59 2024 +0000
Parse result for llava_hf 1.6
commit 3e56b4f92db39a2ce92903b0c43a34f1d14d59ec Author: kcz358 kaichenzhang358@outlook.com Date: Tue May 7 03:09:56 2024 +0000
Fix llava_hf generation for 1.6
commit fa3ff92b07ea5aaa633a2039818c310744f84d07 Author: kcz358 kaichenzhang358@outlook.com Date: Mon May 6 08:32:57 2024 +0000
Fix llava conv template for llama3
commit 423b00606aa77fd6b324c19e3d480b73ab852db6 Author: kcz358 kaichenzhang358@outlook.com Date: Sun May 5 07:54:52 2024 +0000
Add qwen vl api
commit b7fd7a9f7aa3c0e1e50374047dfffc46a7462b90 Merge: 986139a9 c5a130b6 Author: Li Bo drluodian@gmail.com Date: Sun May 5 13:19:48 2024 +0800
Merge pull request #59 from EvolvingLMMs-Lab/add_idefics2
add idefics2
commit 986139a9a31154679bdea029b09639f84712db27 Merge: b46239ca 8d3526c0 Author: Li Bo drluodian@gmail.com Date: Fri May 3 01:18:18 2024 +0800
Merge pull request #36 from cocoshe/main
[Fix] repr llava doc
commit b46239cabab7b545ec99d9eae6c851e531b18374 Merge: bc69a744 373265f2 Author: Li Bo drluodian@gmail.com Date: Fri May 3 01:17:34 2024 +0800
Merge pull request #56 from gagan3012/main
Multilingual LLava bench
commit bc69a744d2cffeb06eba62e843bcc7869e27613a Merge: eef3aeb6 626e8a91 Author: Li Bo drluodian@gmail.com Date: Fri May 3 01:12:14 2024 +0800
Merge pull request #70 from hunterheiden/hsh/new_task/WebSRC
Bugfix: WebSRC should be token-level F1 NOT character-level
commit 626e8a91a4af2dd5dd774fc130cc2f4d74b2bc37 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Thu May 2 09:31:03 2024 -0400
Bugfix: WebSRC should be token-level F1 NOT character-level
commit eef3aeb6ab589bb1d5045af5b5c1984a69402d19 Merge: c4e9dd9f 9bca4413 Author: Li Bo drluodian@gmail.com Date: Thu May 2 14:38:17 2024 +0800
Merge pull request #69 from hunterheiden/hsh/new_task/WebSRC
[New Task] WebSRC (multimodal Q&A on web screenshots)
commit 9bca441376325173128e5c50087f068e519c48da Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 11:07:29 2024 -0400
Add code to enable compilation of submission for WebSRC test split
commit 7687495b1ed552eeba088cb9ad5aaf1170e7fff9 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 10:47:32 2024 -0400
Draft and validate websrc eval on dev split
commit 4eebd3e5d7ab3b8c3116eea57318db72d2ce32bb Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 10:46:54 2024 -0400
Update main README with new task names
commit 35fe80b67656114a8824eb59574089663bdc4c9a Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 10:46:20 2024 -0400
Draft README for WebSRC
commit 955bd0635cc6c14a96ad869f1002e6dbefdc5071 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Tue Apr 30 10:16:21 2024 -0400
Init webSRC
commit c4e9dd9f6e40e8586587c4a75987aa109a37f14b Merge: d8a3a99f 319afccb Author: Li Bo drluodian@gmail.com Date: Fri Apr 26 14:37:22 2024 +0800
Merge pull request #63 from hunterheiden/hsh/new_task/screenspot
New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens
commit 319afccbe713ddf40a8a6fa28501e64c0ad34725 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Thu Apr 25 11:44:34 2024 -0400
slight update
commit 2f3811ca1bbad6a441016b05fde09a571900fca8 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Thu Apr 25 11:41:04 2024 -0400
Add README file specific to ScreenSpot
commit 28962cbe83631ec5d6481aaea4907a7c96fec848 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed Apr 24 11:52:33 2024 -0400
Update README to reflect new tasks
commit e457cfb4f2d6869e8367d6d5b03ad25ee4acc363 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Tue Apr 23 18:33:16 2024 -0400
Create ScreenSpot on clean branch
commit d8a3a99ff6142fe101fa3c188cc7f29593c44345 Merge: 3dcd0158 ed171293 Author: Li Bo drluodian@gmail.com Date: Tue Apr 23 10:34:03 2024 +0800
Merge pull request #61 from tupini07/patch-1
Fix typo in Qwen-VL that was causing "reference before assignment"
commit ed171293d1e82075c5c6a847fc91ecbfd45cf89f Author: Andrea Tupini tupini07@gmail.com Date: Mon Apr 22 14:56:41 2024 -0600
refactor query construction for clarity
commit cd874201c46f32a2903ddffae85f9db73e14adfd Author: Andrea Tupini tupini07@gmail.com Date: Mon Apr 22 14:54:29 2024 -0600
convert contexts to list if necessary and remove unnecessary construction of `questions`
commit 85573674e90c8d505312ba18c5102e0051255078 Author: Andrea Tupini tupini07@gmail.com Date: Mon Apr 22 14:47:33 2024 -0600
Fix typo in qwen_vl that was causing "reference before assignment"
commit 3dcd01582b719555bcf8eb25d91cc5e42abd2c5f Merge: 95df9fee 743673a1 Author: Li Bo drluodian@gmail.com Date: Sat Apr 20 22:03:16 2024 +0800
Merge pull request #60 from CaraJ7/main
Add MathVerse
commit 743673a1419b6e729e18c96f148745cc739d4c71 Merge: c1a54721 95df9fee Author: CaraJ7 1350074492@qq.com Date: Sat Apr 20 21:49:02 2024 +0800
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit c1a5472135c3b84061b64d997ab50dda0412ba4f Author: CaraJ7 1350074492@qq.com Date: Sat Apr 20 21:45:34 2024 +0800
Add MathVerse
commit 373265f24e7a89cbd49ab724a2e388cc0930be78 Author: Gagan Bhatia 49101362+gagan3012@users.noreply.github.com Date: Fri Apr 12 17:21:39 2024 -0700
Add files via upload
commit d8530514a5ef9378d2adeaceb228b60ec25a6718 Author: Gagan Bhatia 49101362+gagan3012@users.noreply.github.com Date: Fri Apr 12 17:19:49 2024 -0700
Create README.md
commit 22a4958e993463edff352ac033014f9a485706cc Author: Bo Li bo.li01@bytedance.com Date: Thu Apr 4 17:12:43 2024 +0000
[WIP] adding mmbench dev evaluation (#75)
* WIP
* Update GPT evaluation model name and sys prompt
* 🛠️ Scale accuracy to percentage
The accuracy value is now multiplied by 100 in the aggregation function to represent it as a percentage. Regarding the evaluation process, `math` module importation and refactoring reduce progress log verbosity by logging every 100 evaluations instead of 10. It prevents potential logging overflow. Handling of NaN values is added to ensure 'default_value' is set in case of missing data, avoiding errors in split, category, and l2-category assignments. Finally, reporting of categorical and l2-categorical accuracies is streamlined through a new `calculate_hit_rates` function, improving code readability and maintenance.
Issue refs: #1427, #1533
* Update GPT evaluation model name and API configuration
* Refactor MMBench_Evaluator class to handle missing columns
* Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations
* Refactor MMBench-CN and MMBench-EN evaluation functions
* 🔄 Refactor result processing and logging logic
- Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document.
- Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options.
- In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output.
- Commented-out code and verbose logging during evaluation, which may have interfered with performance, has been removed for a more efficient and less intrusive logging experience.
This cleanup reduces redundancy in the codebase and improves evaluation performance.
Refs #2045
---------
Co-authored-by: Bo Li <bo.li01@bytedance.com>
(cherry picked from commit a19278c2ea6ddcbca64d3cc7f4efec7fe5775121)
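The two behaviors described in the commit message above (options "A" to "E" defaulting to "nan", and accuracy reported as a percentage) can be illustrated with a minimal sketch. The helper names `fill_options` and `accuracy_percent` and the document layout are hypothetical and are not taken from the lmms-eval sources.

```python
import math

# Option letters mentioned in the commit message.
OPTION_KEYS = ["A", "B", "C", "D", "E"]


def fill_options(doc, result):
    """Copy options A-E from doc into a copy of result, using "nan" for missing ones.

    Hypothetical helper for illustration only.
    """
    filled = dict(result)
    for key in OPTION_KEYS:
        value = doc.get(key)
        # Treat absent keys, None, and float NaN as missing.
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        filled[key] = "nan" if missing else value
    return filled


def accuracy_percent(hits):
    """Return the mean of 0/1 hits scaled to a percentage (0.0 for an empty list)."""
    if not hits:
        return 0.0
    return 100.0 * sum(hits) / len(hits)
```

Under these assumptions, a document that only provides options "A" and "B" still yields a result row with all five option keys, so downstream code does not need to check which columns exist.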
commit 8d3526c0869f0ad7747ff6bb02441140792b461c Author: cocoshe 1228759711@qq.com Date: Thu Mar 28 13:38:36 2024 +0800
fix doc
- feat: Add LlavaOneVision model to available models
chore: Update sqlitedict dependency to version 2.1.0
- Revert "Squashed commit of the following:"
This reverts commit 11b00999df3c43cb225482e030b791b2d454124c.
- Refactor available models in lmms_eval
Remove duplicate entries for "llava_hf", "llava_onevision", and "longva" in the AVAILABLE_MODELS dictionary in lmms_eval/models/__init__.py.
- fix: Handle import errors in lmms_eval models/__init__.py
The code changes in this commit fix the handling of import errors in the lmms_eval/models/__init__.py file. Previously, when an import error occurred, the code simply ignored it. This commit updates the code to log an error message using the logger module when an import error occurs.
This commit also removes duplicate entries for "llava_hf", "llava_onevision", and "longva" in the AVAILABLE_MODELS dictionary.
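The import-error handling described above might look roughly like the following sketch. It is an assumption-laden illustration, not the project's code: the registry entries are stand-ins, the function name `load_available_models` is hypothetical, and the standard-library `logging` module is used here for self-containment.

```python
import importlib
import logging

logger = logging.getLogger("lmms_eval_sketch")

# Hypothetical registry: module name -> class name (entries are illustrative only).
AVAILABLE_MODELS = {
    "json": "JSONDecoder",               # importable stand-in for a model module
    "not_a_real_module_xyz": "Missing",  # triggers the ImportError path
}


def load_available_models(registry):
    """Import each registered class, logging failures instead of ignoring them."""
    loaded = {}
    for module_name, class_name in registry.items():
        try:
            module = importlib.import_module(module_name)
            loaded[module_name] = getattr(module, class_name)
        except ImportError as exc:
            # Report the failed import so missing dependencies are visible.
            logger.error("Failed to import %s from %s: %s", class_name, module_name, exc)
    return loaded
```

Because the registry is a dictionary keyed by name, each model can appear only once, which is consistent with the duplicate-entry cleanup mentioned in the same commit.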
Recent user commits:
- Refactor available models in lmms_eval
- Revert "Squashed commit of the following:"
- feat: Add LlavaOneVision model to available models
- chore: Update sqlitedict dependency to version 2.1.0
fix: Handle import errors in lmms_eval models/__init__.py
chore: Remove unused imports in lmms_eval/models/__init__.py and lmms_eval/tasks/vcr_wiki/utils.py
Remove unused imports in lmms_eval/tasks/vcr_wiki/utils.py
chore: Update lmms_eval/tasks/vcr_wiki/utils.py
This commit updates the `lmms_eval/tasks/vcr_wiki/utils.py` file. It removes unused imports and fixes the condition for loading Spacy models based on the `load_package` value in the config file. Additionally, it adds a debug log message when the Spacy models are not loaded due to `load_package` being set to False.
Remove unused imports in lmms_eval/tasks/vcr_wiki/utils.py
- feat: Add new subtasks to overall score calculation
The code changes in this commit add new subtasks to the overall score calculation in the `overall_score` function. The subtasks "ScanQA", "BLINK", "MathVerse", "SciVerse", and "Mantis" are included in the `categories` dictionary. This ensures that the scores for these subtasks are calculated and included in the evaluation results.
Remove unused imports and update subtask categories in utils.py
feat: Add new subtasks to overall score calculation
chore: Update lmms_eval/tasks/llava_interleave_bench/_default_template_interleave_yaml
Update the image aspect ratio in the default template for the llava_interleave_bench task. Change the value of "image_aspect_ratio" from "original" to "pad". This ensures that the generated images have a padded aspect ratio.
if no response directly return 0
Squashed commit of the following:
commit b2a009b6bbf8353172f5a1dd9c29ea1f67610c02 Author: Pu Fanyi FPU001@e.ntu.edu.sg Date: Mon Jul 15 19:12:25 2024 -0700
if no response directly return 0 (#142)
commit 5fc5f2f5acf454fc99448b0d62eb52b4bffba0d5 Author: Kaichen Zhang - NTU kaichenzhang358@outlook.com Date: Tue Jul 16 10:12:11 2024 +0800
Add Muirbench (#143)
* handle gen kwargs in internvl2
* Add muirbench
- Add files via upload
(cherry picked from commit 557083a156c3dd67ac79e22b4202e9b69b6b00f4)
- update
Co-authored-by: Fanyi Pu FPU001@e.ntu.edu.sg Co-authored-by: Yan Shu 570533048@qq.com
Fix llava onevision loglikelihood video bug
LiveBench July (#146)
claude auto detect json mode
extract information
use claude to generate
fix bugs
fix
generate data
chore: Update dataset name and version for live_bench task
gpt-4-turbo => gpt-4o
chore: Update dataset capture settings in create_dataset.py
everything use gpt-4o
websites
livebench_july
Refactor code to simplify data assignment in example.ipynb
chore: Update dataset name for live_bench task
Add xcomposer2d5 from fanyi, revise something for better usage (#145)
internvl2
fix some bugs
fix
lint
feat: Add XComposer2D5 model to AVAILABLE_MODELS
xcomposer
Fix llava vid error when using public
Fix xcomposer2d5
Add generation tokens
Co-authored-by: Fanyi Pu FPU001@e.ntu.edu.sg
Dev/ov evals (#147)
fix doc
[WIP] adding mmbench dev evaluation (#75)
WIP
Update GPT evaluation model name and sys prompt
🛠️ Scale accuracy to percentage
The accuracy value is now multiplied by 100 in the aggregation function to represent it as a percentage. Regarding the evaluation process, `math` module importation and refactoring reduce progress log verbosity by logging every 100 evaluations instead of 10. It prevents potential logging overflow. Handling of NaN values is added to ensure 'default_value' is set in case of missing data, avoiding errors in split, category, and l2-category assignments. Finally, reporting of categorical and l2-categorical accuracies is streamlined through a new `calculate_hit_rates` function, improving code readability and maintenance.
Issue refs: #1427, #1533
Update GPT evaluation model name and API configuration
Refactor MMBench_Evaluator class to handle missing columns
Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations
Refactor MMBench-CN and MMBench-EN evaluation functions
🔄 Refactor result processing and logging logic
- Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document.
- Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options.
- In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output.
- Commented-out code and verbose logging during evaluation, which may have interfered with performance, has been removed for a more efficient and less intrusive logging experience.
This cleanup reduces redundancy in the codebase and improves evaluation performance.
Refs #2045
Co-authored-by: Bo Li bo.li01@bytedance.com (cherry picked from commit a19278c2ea6ddcbca64d3cc7f4efec7fe5775121)
Create README.md
Add files via upload
Add MathVerse
Fix typo in qwen_vl that was causing "reference before assignment"
convert contexts to list if necessary and remove unnecessary construction of `questions`
refactor query construction for clarity
Create ScreenSpot on clean branch
Update README to reflect new tasks
Add README file specific to ScreenSpot
slight update
Init webSRC
Draft README for WebSRC
Update main README with new task names
Draft and validate websrc eval on dev split
Add code to enable compilation of submission for WebSRC test split
Bugfix: WebSRC should be token-level F1 NOT character-level
Add qwen vl api
Fix llava conv template for llama3
Fix llava_hf generation for 1.6
Parse result for llava_hf 1.6
Add model_name parameter to Llava constructor
Fix endless warning for llava_hf generation
Fix llava_hf image tokens number issue
Create LICENSE
Update LICENSE
Update LICENSE
Better task list_with_num
Fix idefics2 llava in the wild bugs
Remove redundant code in fuyu
Fix instructblip qformer size mismatch and multi-images problem
Comment out parse result in xcomposer
Comment out Spice in caption task so that don't need to download stanford nlp model
Update gitignore
Add separated pope tasks by category
Fix pope random name in pope full
Set printing info for llava_hf to debug level
Adding Phi3v model.
Adding prompt arguments for Phi3v on MathVista-TestMini
Adding documentation of Phi3v class.
[Fix] import issues of multilingual llava and olympiadbench
fix compatibility issue of older version llava
add upd
add upd
add upd
add upd
add upd
add upd
Group MMMU images into one image (#83)
update
update font
Add matplotlib.font_manager import in utils.py
Refactor font handling in add_order_label function in utils.py
group mmmu
Co-authored-by: Li Bo drluodian@gmail.com
merge model_specific_prompt_kwargs and dataset_name into each task yaml
Add MathVerse in README.md
slightly change query_prompt for the reproduction
update utils.py for leaderboard submission
add conbench
update README
Update README.md
init include vcr
modify the form of VCR
switch logic
add crossed_text to vcr_wiki output
include the try-except logic for spacy
update vcr_wiki tasks
update vcr_wiki tasks in README.md
include std and confidence interval
update gpt-3.5-turbo version
update gpt-3.5-turbo version
chore: Remove unnecessary files and code related to live_bench and sft_eval tasks
Bump version to 0.2.0.dev0
chore: Update lmms-eval to support video evaluations for LLaVA models
Update llava conv_template in lmms_eval/models/llava.py
Update image alignment in README.md
chore: Update lmms-eval to support video evaluations for LLaVA models
chore: Update lmms-eval to support video evaluations for LLaVA models
Update README.md
Update README.md
update aggregation function for vcr_wiki
update README.md
Update README.md
update version
add II-Bench
fix dataset_path
Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image
add tinyllava
LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)
fix #117, allow auto download with tar format videos
fix #117, allow auto download with tar format videos
fix typo
feat: Add support for auto downloading tar format videos
Release llava-wilder
chore: Update dependencies to fix potential risks and improve compatibility
tutorial
docs
update preparation
small fix
small fix
lint
to sh script
update readme
Remove handling non-visual loop in llava
Add llava_hf back to registry
Update README.md
Update README.md
update ablation for videomme datasets
chore: Handle ImportError when importing models
Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
chore: Remove unused models from lmms_eval package
feat: Allow loading model configurations from other packages
feat: Allow including external tasks from plugins
chore: Add loguru for logging in lmms_eval package
Remove unnecessary lines since use batched visuals now in llava
Add longva
Revise model registry for llava_hf and longva
Delete unnecessary lines
Remove unnecessary lines for video llava
Update pyproject.toml
Update activitynetqa_generation.yaml
Fix vid mme post prompt issue
new task gqa-ru
add mmbench_ru_dev
change prompt to ru
create new task vitatecs
Update README.md
Add wild vision 0617
Hardcode to keep image for wild vision
Fixing scoring logic
Fixing dataset name
Fixing handling None filtered score
Add detailcaps
Add install capture_metric in env
Add files via upload
feat: Add tie_weights parameter to Llava model initialization
Upgrade lmms-eval to support more models and evaluation tasks
Upgrade lmms-eval to version 0.2.1
Rename xcomposer 4KHD
chore: Update lmms_eval/models/vila.py and lmms_eval/tasks/__init__.py
Update utils.py
Update _default_template_vcr_yaml
add process sync via temp file in lmms_eval/evaluator.py
Update utils.py
Update _default_template_vcr_yaml
Add muirbench
Squashed commit of the following:
commit dfdba507b5fbe985b0030ffec575f9f2638bc1ed Author: Li Bo drluodian@gmail.com Date: Tue Jul 16 11:13:52 2024 +0800
merge ov evals (#144)
* chore: Update gpt_eval_model_name to "gpt-3.5-turbo" in mathvista.yaml
* Squashed commit of the following:
commit 994c9f97a2f8db3e9b7d7933d1e1680acde5b70b
Author: Yan Shu <570533048@qq.com>
Date: Mon Jul 8 17:21:23 2024 +0800
Add files via upload
* Squashed commit of the following:
commit e31cd7883d4555c7530795c7f102b8d78cbd372f
Author: Bo Li <drluodian@gmail.com>
Date: Wed Jul 10 12:08:08 2024 +1000
chore: Update lmms_eval/models/vila.py and lmms_eval/tasks/__init__.py
commit 1d8c980d1089f9d7702c3b92d5c85039f2809c6d
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Tue Jul 9 02:08:52 2024 +0000
Rename xcomposer 4KHD
commit 6da76f36ecf5f9aa73057e767a4fcb60c99ff896
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:55:56 2024 +1000
Upgrade lmms-eval to version 0.2.1
commit cd1858523fcd8630082cbefba8710e0de3ee8805
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:52:23 2024 +1000
Upgrade lmms-eval to support more models and evaluation tasks
commit 672d7e5bb49dcb34e1b2fdeb09f3f4588dc583a6
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:43:41 2024 +1000
feat: Add tie_weights parameter to Llava model initialization
commit 2037a86261b55fa42b8ba3a04eab192b3e69d6ea
Merge: e6844db1 a5c18692
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:37:12 2024 +1000
Fix gen kwargs image aspect ratio in internvl2
commit a5c186925de989b616f58a35ece36065a32b4594
Merge: 2ebec77f 557083a1
Author: Li Bo <drluodian@gmail.com>
Date: Tue Jul 9 09:15:56 2024 +0800
Merge pull request #137 from shuyansy/main
add MLVU task
commit 557083a156c3dd67ac79e22b4202e9b69b6b00f4
Author: Yan Shu <570533048@qq.com>
Date: Mon Jul 8 16:56:50 2024 +0800
Add files via upload
commit 2ebec77f5606d79e9a7b995970e32792050606a1
Merge: 211bfede b23d349e
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 8 11:53:06 2024 +0800
Merge pull request #136 from Dousia/main
Add detailcaps
commit b23d349e46d60dc149ffaa54d6e019f4996ed92d
Author: ByteDance <bytedance@MacBook-Pro.local>
Date: Sun Jul 7 23:24:19 2024 +0800
Add install capture_metric in env
commit c6e211d5f9dbb7572d3a141b6504cb1ca2007c33
Author: ByteDance <bytedance@MacBook-Pro.local>
Date: Sun Jul 7 23:04:13 2024 +0800
Add detailcaps
commit 211bfedebad243ef82a8b0be36c3b5a9b9cb2f72
Merge: 7c208b76 79514eee
Author: Li Bo <drluodian@gmail.com>
Date: Tue Jul 2 23:05:12 2024 +0800
Merge pull request #133 from EvolvingLMMs-Lab/dev/wild_vision
Add wild vision bench
commit 79514eeebcfd6f655be2a10c776037d12a7b7214
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 15:10:02 2024 +0000
Fixing handling None filtered score
commit 725fac2781446958b905e1e6c6eb3c0a8e582e49
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 08:25:42 2024 +0000
Fixing dataset name
commit 8d963e132ac03fc0d835d480cfcfcabe72af143c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 08:24:51 2024 +0000
Fixing scoring logic
commit e2990d0a69e876721256fdf946c68ba7ae0cbdc1
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 06:06:57 2024 +0000
Hardcode to keep image for wild vision
commit ed381736730d8fb785b4ee919fdb751734ecef25
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 06:06:38 2024 +0000
Add wild vision 0617
commit 7c208b76640c986cfe94233dce735c3ca4ad4319
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:53:31 2024 +0800
Update README.md
commit 39d40dea47bc59ff04e8b0cbc445345098debc9a
Merge: e19b43a3 ba7081c0
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:47:09 2024 +0800
Merge pull request #129 from Dannoopsy/mmbench_ru
add task MMBench-ru
commit e19b43a3a1e7212e623061b164b0419cc0dda689
Merge: 11fd7e3f a0de8970
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:46:58 2024 +0800
Merge pull request #128 from Dannoopsy/gqa-ru
add task gqa-ru
commit 11fd7e3fc05908aeb01e4a6161a7b55cd38b3122
Merge: 383e7fea a7522592
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:46:16 2024 +0800
Merge pull request #130 from lscpku/vitatecs
Add task VITATECS
commit a75225926e5954f85466d257f99acf0163fde596
Author: lscpku <lisc99@pku.edu.cn>
Date: Fri Jun 28 20:37:06 2024 +0800
create new task vitatecs
commit ba7081c0abac840002d320e30733e891298dfa11
Author: Dannoopsy <63581325+Dannoopsy@users.noreply.github.com>
Date: Fri Jun 28 12:21:05 2024 +0300
change prompt to ru
commit 27ea9c0055a8abf3a8198829b8617018479918e2
Author: Dannoopsy <belopolskikh.dd@phystech.edu>
Date: Thu Jun 27 17:17:29 2024 +0000
add mmbench_ru_dev
commit 383e7fead3138aedf62e9c0ec48303835ef26e2a
Merge: 06fa000f ed2e7f79
Author: Li Bo <drluodian@gmail.com>
Date: Fri Jun 28 00:14:10 2024 +0800
Merge pull request #126 from lorenzomammana/feature/external-package-integration
External package integration using plugins
commit ed2e7f792151d21bce8f1c498270b9391e1d5c85
Merge: 03947e14 06fa000f
Author: Lorenzo Mammana <mammanalorenzo@outlook.it>
Date: Thu Jun 27 15:38:10 2024 +0000
Merge branch 'main' into feature/external-package-integration
commit a0de89708d5e6f259bb17f0eaace3c5b901b275c
Author: Dannoopsy <belopolskikh.dd@phystech.edu>
Date: Tue Jun 25 11:11:37 2024 +0000
new task gqa-ru
commit 06fa000f60d3e4d160fac8ceb9959ae92a98f752
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Tue Jun 25 06:41:13 2024 +0000
Fix vid mme post prompt issue
commit b388d79e0df6f60068196cb7047453ebd22d6ef1
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 23 22:31:16 2024 +0800
Update activitynetqa_generation.yaml
commit 8f9d620fcd9d0a0742ee6bcf51ea63bd6b088a36
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 23 14:02:25 2024 +0800
Update pyproject.toml
commit 6341b7c15ce9fb28eb06b067ddb299d6cf2e16c3
Merge: fce85f1b 903b042b
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 23 14:02:02 2024 +0800
Merge pull request #125 from EvolvingLMMs-Lab/dev/interleave
[Model] aligned llava-interleave model results on video tasks
commit 903b042be016016d4ebeecb07701f3076a2d323c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Jun 22 12:07:13 2024 +0000
Remove unnecessary lines for video llava
commit d78ec86407b729a964906a8c2e50704b4bc74d06
Merge: ebe7217a fce85f1b
Author: Li Bo <drluodian@gmail.com>
Date: Sat Jun 22 13:57:31 2024 +0800
Merge branch 'main' into dev/interleave
commit ebe7217a486c1e754e42c2cbdb834e09fbbcc9b0
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Jun 22 02:57:08 2024 +0000
Delete unnecessary lines
commit 120c474b056f9177c74e1fd9691d59e2f234b785
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Jun 21 08:38:41 2024 +0000
Revise model registry for llava_hf and longva
commit 7d6201f921088afd3f52a35076e3c6fcc9aa518c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Jun 21 08:38:24 2024 +0000
Add longva
commit 12f480699c71a12a24d4349d9b0681933201a3a6
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Jun 21 08:35:39 2024 +0000
Remove unnecessary lines since use batched visuals now in llava
commit 12cea76f1f0f14b1fd1007c9d39a9b0557368637
Author: Bo Li <drluodian@gmail.com>
Date: Thu Jun 20 18:15:32 2024 +0000
chore: Add loguru for logging in lmms_eval package
commit 03947e14a46fd25b412931f7c9c25f4a2971d0b4
Author: Lorenzo Mammana <mammanalorenzo@outlook.it>
Date: Wed Jun 5 13:40:41 2024 +0000
feat: Allow including external tasks from plugins
commit b80a91f73e15ddd0b0ce1322d7d121fa14030eed
Author: Lorenzo Mammana <mammanalorenzo@outlook.it>
Date: Wed Jun 5 13:04:55 2024 +0000
feat: Allow loading model configurations from other packages
commit 8ef24740dd48a11c97eb627f2fff4aca107fef0d
Author: Bo Li <drluodian@gmail.com>
Date: Thu Jun 20 12:11:03 2024 +0000
chore: Remove unused models from lmms_eval package
commit af38885fc2e066f5ea44388f33e07176f836fe28
Author: Bo Li <drluodian@gmail.com>
Date: Thu Jun 20 12:07:09 2024 +0000
chore: Handle ImportError when importing models
Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
commit fce85f1b03ff7043b29dee787c5d17a08dd2687a
Merge: dbe63293 d94f83cb
Author: Li Bo <drluodian@gmail.com>
Date: Thu Jun 20 20:02:12 2024 +0800
Merge pull request #120 from EvolvingLMMs-Lab/pufanyi/hf_dataset_docs
Add docs for datasets upload to HF
commit dbe63293245a5141fdfd80bda7657c304f6bd32f
Author: choiszt <ls2001927@sohu.com>
Date: Thu Jun 20 15:14:21 2024 +0800
update ablation for videomme datasets
commit d94f83cb3f08b61a2c75cc4326e58792100605b3
Author: Li Bo <drluodian@gmail.com>
Date: Thu Jun 20 13:30:59 2024 +0800
Update README.md
commit cab8159ff35db330536c0b6dfb4b0a3b24142209
Author: Li Bo <drluodian@gmail.com>
Date: Thu Jun 20 13:30:29 2024 +0800
Update README.md
commit 45876652a877a8006b828f32f5cc4660629f9190
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Thu Jun 20 03:55:30 2024 +0000
Add llava_hf back to registry
commit 3463651b8c54d36cd94169e3d376f5ed225a195a
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Thu Jun 20 03:54:33 2024 +0000
Remove handling non-visual loop in llava
commit cb0d3f49b72790b081f981e0e6147131542f7f68
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Thu Jun 20 02:11:18 2024 +0800
update readme
commit 813877bfe5ac590cdbe92dd74d18f83a2091f748
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:37:52 2024 +0800
to sh script
commit a14684b8557d5894976448a5c559ed7a66a6cf16
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:37:04 2024 +0800
lint
commit d0f8851d42ba31f5da2a7a65e91499db45174dbc
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:36:48 2024 +0800
small fix
commit 63748e9718f287ad433afc90e340b5e17a89c1ed
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:36:43 2024 +0800
small fix
commit 7f1159a1fe04cfb783dc31d4fbdef3bda0ce19e4
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:35:05 2024 +0800
update preparation
commit 19f9bd621c76a483ff98f8c7eb78f64753da683a
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:23:24 2024 +0800
docs
commit ce6f889ba02d819979c7922f6336cf4f1f718f65
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:04:16 2024 +0800
tutorial
commit f513c520c2a3dad26d2b2ca5c4ed4db05a493c73
Author: Bo Li <drluodian@gmail.com>
Date: Wed Jun 19 06:51:19 2024 +0000
chore: Update dependencies to fix potential risks and improve compatibility
commit efb529552c5e4ba039a4cba8e9aa5cb7ba65bf90
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Wed Jun 19 10:25:58 2024 +0800
Release llava-wilder
commit 742651fc9daf97e2f57831ed6e6e7ee7ead7d555
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 07:44:26 2024 +0800
feat: Add support for auto downloading tar format videos
commit 511b6259828212fcba954cdeb8cf90d6e5daabf8
Merge: 22a4958e 050b2c37
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jun 18 17:01:03 2024 +0000
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit 050b2c370017e9b97475dd6cf01fd051b5ca5c86
Merge: 74facb41 ef306512
Author: Li Bo <drluodian@gmail.com>
Date: Tue Jun 18 13:13:38 2024 +0800
Merge pull request #114 from zjysteven/add-tinyllava
add tinyllava
commit ef306512e5135f76dffa383f600b8733015836e8
Author: Jingyang Zhang <jingyang.zhang@duke.edu>
Date: Mon Jun 17 17:57:02 2024 -0400
fix typo
commit 9bab67732a4238097725deddf867fb1946ffee40
Merge: dbfb2387 74facb41
Author: Jingyang Zhang <jingyang.zhang@duke.edu>
Date: Sun Jun 16 10:56:05 2024 -0400
Merge branch 'EvolvingLMMs-Lab:main' into add-tinyllava
commit 74facb41a826691dfce4458cf1d8659b34fc5bf5
Merge: 8ba192f9 d5df72de
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 16 17:59:19 2024 +0800
Merge pull request #118 from teowu/main
Fix the potential risk by PR #117
commit d5df72de2d03108d6b365818ecc3551ac9aa6302
Merge: 5bf59ed2 8ba192f9
Author: Teo (Timothy) Wu Haoning <38696372+teowu@users.noreply.github.com>
Date: Sun Jun 16 15:32:13 2024 +0800
Merge branch 'EvolvingLMMs-Lab:main' into main
commit 5bf59ed250da98a408a94e214a73caa400cba842
Author: teowu <realtimothyhwu@gmail.com>
Date: Sun Jun 16 07:27:28 2024 +0000
fix #117, allow auto download with tar format videos
commit 98b3955cb808e…
MichalCiesiolka referenced this pull request in MichalCiesiolka/lmms-eval-llmzszl
…s. (EvolvingLMMs-Lab#218)
Load tasks only one time (#139)
chore: Initialize tasks only once to avoid re-initialization
chore: Initialize tasks only once to avoid re-initialization
chore: Refactor task initialization to avoid re-initialization
chore: Update task initialization to fix include_path issue
chore: Update task initialization to fix include_path issue
Upload live_bench results (#140)
upload results
add a readme
chore: Update upload_results.py script to use shell syntax
Update upload_results.py
Update upload_results.py
Add Muirbench (#143)
handle gen kwargs in internvl2
Add muirbench
if no response directly return 0 (#142)
merge ov evals (#144)
chore: Update gpt_eval_model_name to "gpt-3.5-turbo" in mathvista.yaml
Squashed commit of the following:
commit 994c9f97a2f8db3e9b7d7933d1e1680acde5b70b Author: Yan Shu 570533048@qq.com Date: Mon Jul 8 17:21:23 2024 +0800
Add files via upload
- Squashed commit of the following:
commit e31cd7883d4555c7530795c7f102b8d78cbd372f Author: Bo Li drluodian@gmail.com Date: Wed Jul 10 12:08:08 2024 +1000
chore: Update lmms_eval/models/vila.py and lmms_eval/tasks/__init__.py
commit 1d8c980d1089f9d7702c3b92d5c85039f2809c6d Author: kcz358 kaichenzhang358@outlook.com Date: Tue Jul 9 02:08:52 2024 +0000
Rename xcomposer 4KHD
commit 6da76f36ecf5f9aa73057e767a4fcb60c99ff896 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:55:56 2024 +1000
Upgrade lmms-eval to version 0.2.1
commit cd1858523fcd8630082cbefba8710e0de3ee8805 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:52:23 2024 +1000
Upgrade lmms-eval to support more models and evaluation tasks
commit 672d7e5bb49dcb34e1b2fdeb09f3f4588dc583a6 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:43:41 2024 +1000
feat: Add tie_weights parameter to Llava model initialization
commit 2037a86261b55fa42b8ba3a04eab192b3e69d6ea Merge: e6844db1 a5c18692 Author: Bo Li drluodian@gmail.com Date: Tue Jul 9 11:37:12 2024 +1000
Fix gen kwargs image aspect ratio in internvl2
commit a5c186925de989b616f58a35ece36065a32b4594 Merge: 2ebec77f 557083a1 Author: Li Bo drluodian@gmail.com Date: Tue Jul 9 09:15:56 2024 +0800
Merge pull request #137 from shuyansy/main
add MLVU task
commit 557083a156c3dd67ac79e22b4202e9b69b6b00f4 Author: Yan Shu 570533048@qq.com Date: Mon Jul 8 16:56:50 2024 +0800
Add files via upload
commit 2ebec77f5606d79e9a7b995970e32792050606a1 Merge: 211bfede b23d349e Author: Li Bo drluodian@gmail.com Date: Mon Jul 8 11:53:06 2024 +0800
Merge pull request #136 from Dousia/main
Add detailcaps
commit b23d349e46d60dc149ffaa54d6e019f4996ed92d Author: ByteDance bytedance@MacBook-Pro.local Date: Sun Jul 7 23:24:19 2024 +0800
Add install capture_metric in env
commit c6e211d5f9dbb7572d3a141b6504cb1ca2007c33 Author: ByteDance bytedance@MacBook-Pro.local Date: Sun Jul 7 23:04:13 2024 +0800
Add detailcaps
commit 211bfedebad243ef82a8b0be36c3b5a9b9cb2f72 Merge: 7c208b76 79514eee Author: Li Bo drluodian@gmail.com Date: Tue Jul 2 23:05:12 2024 +0800
Merge pull request #133 from EvolvingLMMs-Lab/dev/wild_vision
Add wild vision bench
commit 79514eeebcfd6f655be2a10c776037d12a7b7214 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 15:10:02 2024 +0000
Fixing handling None filtered score
commit 725fac2781446958b905e1e6c6eb3c0a8e582e49 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 08:25:42 2024 +0000
Fixing dataset name
commit 8d963e132ac03fc0d835d480cfcfcabe72af143c Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 08:24:51 2024 +0000
Fixing scoring logic
commit e2990d0a69e876721256fdf946c68ba7ae0cbdc1 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 06:06:57 2024 +0000
Hardcode to keep image for wild vision
commit ed381736730d8fb785b4ee919fdb751734ecef25 Author: kcz358 kaichenzhang358@outlook.com Date: Mon Jul 1 06:06:38 2024 +0000
Add wild vision 0617
commit 7c208b76640c986cfe94233dce735c3ca4ad4319 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:53:31 2024 +0800
Update README.md
commit 39d40dea47bc59ff04e8b0cbc445345098debc9a Merge: e19b43a3 ba7081c0 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:47:09 2024 +0800
Merge pull request #129 from Dannoopsy/mmbench_ru
add task MMBench-ru
commit e19b43a3a1e7212e623061b164b0419cc0dda689 Merge: 11fd7e3f a0de8970 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:46:58 2024 +0800
Merge pull request #128 from Dannoopsy/gqa-ru
add task gqa-ru
commit 11fd7e3fc05908aeb01e4a6161a7b55cd38b3122 Merge: 383e7fea a7522592 Author: Li Bo drluodian@gmail.com Date: Mon Jul 1 11:46:16 2024 +0800
Merge pull request #130 from lscpku/vitatecs
Add task VITATECS
commit a75225926e5954f85466d257f99acf0163fde596 Author: lscpku lisc99@pku.edu.cn Date: Fri Jun 28 20:37:06 2024 +0800
create new task vitatecs
commit ba7081c0abac840002d320e30733e891298dfa11 Author: Dannoopsy 63581325+Dannoopsy@users.noreply.github.com Date: Fri Jun 28 12:21:05 2024 +0300
change prompt to ru
commit 27ea9c0055a8abf3a8198829b8617018479918e2 Author: Dannoopsy belopolskikh.dd@phystech.edu Date: Thu Jun 27 17:17:29 2024 +0000
add mmbench_ru_dev
commit 383e7fead3138aedf62e9c0ec48303835ef26e2a Merge: 06fa000f ed2e7f79 Author: Li Bo drluodian@gmail.com Date: Fri Jun 28 00:14:10 2024 +0800
Merge pull request #126 from lorenzomammana/feature/external-package-integration
External package integration using plugins
commit ed2e7f792151d21bce8f1c498270b9391e1d5c85 Merge: 03947e14 06fa000f Author: Lorenzo Mammana mammanalorenzo@outlook.it Date: Thu Jun 27 15:38:10 2024 +0000
Merge branch 'main' into feature/external-package-integration
commit a0de89708d5e6f259bb17f0eaace3c5b901b275c Author: Dannoopsy belopolskikh.dd@phystech.edu Date: Tue Jun 25 11:11:37 2024 +0000
new task gqa-ru
commit 06fa000f60d3e4d160fac8ceb9959ae92a98f752 Author: kcz358 kaichenzhang358@outlook.com Date: Tue Jun 25 06:41:13 2024 +0000
Fix vid mme post prompt issue
commit b388d79e0df6f60068196cb7047453ebd22d6ef1 Author: Li Bo drluodian@gmail.com Date: Sun Jun 23 22:31:16 2024 +0800
Update activitynetqa_generation.yaml
commit 8f9d620fcd9d0a0742ee6bcf51ea63bd6b088a36 Author: Li Bo drluodian@gmail.com Date: Sun Jun 23 14:02:25 2024 +0800
Update pyproject.toml
commit 6341b7c15ce9fb28eb06b067ddb299d6cf2e16c3 Merge: fce85f1b 903b042b Author: Li Bo drluodian@gmail.com Date: Sun Jun 23 14:02:02 2024 +0800
Merge pull request #125 from EvolvingLMMs-Lab/dev/interleave
[Model] aligned llava-interleave model results on video tasks
commit 903b042be016016d4ebeecb07701f3076a2d323c Author: kcz358 kaichenzhang358@outlook.com Date: Sat Jun 22 12:07:13 2024 +0000
Remove unnecessary lines for video llava
commit d78ec86407b729a964906a8c2e50704b4bc74d06 Merge: ebe7217a fce85f1b Author: Li Bo drluodian@gmail.com Date: Sat Jun 22 13:57:31 2024 +0800
Merge branch 'main' into dev/interleave
commit ebe7217a486c1e754e42c2cbdb834e09fbbcc9b0 Author: kcz358 kaichenzhang358@outlook.com Date: Sat Jun 22 02:57:08 2024 +0000
Delete unnecessary lines
commit 120c474b056f9177c74e1fd9691d59e2f234b785 Author: kcz358 kaichenzhang358@outlook.com Date: Fri Jun 21 08:38:41 2024 +0000
Revise model registry for llava_hf and longva
commit 7d6201f921088afd3f52a35076e3c6fcc9aa518c Author: kcz358 kaichenzhang358@outlook.com Date: Fri Jun 21 08:38:24 2024 +0000
Add longva
commit 12f480699c71a12a24d4349d9b0681933201a3a6 Author: kcz358 kaichenzhang358@outlook.com Date: Fri Jun 21 08:35:39 2024 +0000
Remove unnecessary lines since use batched visuals now in llava
commit 12cea76f1f0f14b1fd1007c9d39a9b0557368637 Author: Bo Li drluodian@gmail.com Date: Thu Jun 20 18:15:32 2024 +0000
chore: Add loguru for logging in lmms_eval package
commit 03947e14a46fd25b412931f7c9c25f4a2971d0b4 Author: Lorenzo Mammana mammanalorenzo@outlook.it Date: Wed Jun 5 13:40:41 2024 +0000
feat: Allow including external tasks from plugins
commit b80a91f73e15ddd0b0ce1322d7d121fa14030eed Author: Lorenzo Mammana mammanalorenzo@outlook.it Date: Wed Jun 5 13:04:55 2024 +0000
feat: Allow loading model configurations from other packages
commit 8ef24740dd48a11c97eb627f2fff4aca107fef0d Author: Bo Li drluodian@gmail.com Date: Thu Jun 20 12:11:03 2024 +0000
chore: Remove unused models from lmms_eval package
commit af38885fc2e066f5ea44388f33e07176f836fe28 Author: Bo Li drluodian@gmail.com Date: Thu Jun 20 12:07:09 2024 +0000
chore: Handle ImportError when importing models
Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
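The try-except pattern this commit message describes can be sketched roughly as follows. This is only an illustration, not the repository's code: the `MODEL_MODULES` mapping, the `load_models` helper, and the entries in it are hypothetical, and standard-library modules stand in for real model modules so the sketch runs anywhere.

```python
import importlib

# Hypothetical registry: model name -> (module path, attribute name).
# Standard-library modules stand in for real model modules here.
MODEL_MODULES = {
    "json_model": ("json", "JSONDecoder"),
    "missing_model": ("module_that_does_not_exist", "Missing"),
}


def load_models(registry):
    """Import each registered model, skipping those whose import fails."""
    loaded, failed = {}, []
    for name, (module_path, attr) in registry.items():
        try:
            module = importlib.import_module(module_path)
            loaded[name] = getattr(module, attr)
        except ImportError as exc:
            # Report the failure instead of aborting the whole package import.
            print(f"Failed to import {name} from {module_path}: {exc}")
            failed.append(name)
    return loaded, failed


loaded, failed = load_models(MODEL_MODULES)
```

With this shape, one model with a missing optional dependency does not stop the other models from being registered, and the printed message names the import that failed.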
commit fce85f1b03ff7043b29dee787c5d17a08dd2687a Merge: dbe63293 d94f83cb Author: Li Bo drluodian@gmail.com Date: Thu Jun 20 20:02:12 2024 +0800
Merge pull request #120 from EvolvingLMMs-Lab/pufanyi/hf_dataset_docs
Add docs for datasets upload to HF
commit dbe63293245a5141fdfd80bda7657c304f6bd32f Author: choiszt ls2001927@sohu.com Date: Thu Jun 20 15:14:21 2024 +0800
update ablation for videomme datasets
commit d94f83cb3f08b61a2c75cc4326e58792100605b3 Author: Li Bo drluodian@gmail.com Date: Thu Jun 20 13:30:59 2024 +0800
Update README.md
commit cab8159ff35db330536c0b6dfb4b0a3b24142209 Author: Li Bo drluodian@gmail.com Date: Thu Jun 20 13:30:29 2024 +0800
Update README.md
commit 45876652a877a8006b828f32f5cc4660629f9190 Author: kcz358 kaichenzhang358@outlook.com Date: Thu Jun 20 03:55:30 2024 +0000
Add llava_hf back to registry
commit 3463651b8c54d36cd94169e3d376f5ed225a195a Author: kcz358 kaichenzhang358@outlook.com Date: Thu Jun 20 03:54:33 2024 +0000
Remove handling non-visual loop in llava
commit cb0d3f49b72790b081f981e0e6147131542f7f68 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Thu Jun 20 02:11:18 2024 +0800
update readme
commit 813877bfe5ac590cdbe92dd74d18f83a2091f748 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:37:52 2024 +0800
to sh script
commit a14684b8557d5894976448a5c559ed7a66a6cf16 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:37:04 2024 +0800
lint
commit d0f8851d42ba31f5da2a7a65e91499db45174dbc Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:36:48 2024 +0800
small fix
commit 63748e9718f287ad433afc90e340b5e17a89c1ed Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:36:43 2024 +0800
small fix
commit 7f1159a1fe04cfb783dc31d4fbdef3bda0ce19e4 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:35:05 2024 +0800
update preparation
commit 19f9bd621c76a483ff98f8c7eb78f64753da683a Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:23:24 2024 +0800
docs
commit ce6f889ba02d819979c7922f6336cf4f1f718f65 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 15:04:16 2024 +0800
tutorial
commit f513c520c2a3dad26d2b2ca5c4ed4db05a493c73 Author: Bo Li drluodian@gmail.com Date: Wed Jun 19 06:51:19 2024 +0000
chore: Update dependencies to fix potential risks and improve compatibility
commit efb529552c5e4ba039a4cba8e9aa5cb7ba65bf90 Author: kcz358 kaichenzhang358@outlook.com Date: Wed Jun 19 10:25:58 2024 +0800
Release llava-wilder
commit 742651fc9daf97e2f57831ed6e6e7ee7ead7d555 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Wed Jun 19 07:44:26 2024 +0800
feat: Add support for auto downloading tar format videos
commit 511b6259828212fcba954cdeb8cf90d6e5daabf8 Merge: 22a4958e 050b2c37 Author: Bo Li drluodian@gmail.com Date: Tue Jun 18 17:01:03 2024 +0000
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit 050b2c370017e9b97475dd6cf01fd051b5ca5c86 Merge: 74facb41 ef306512 Author: Li Bo drluodian@gmail.com Date: Tue Jun 18 13:13:38 2024 +0800
Merge pull request #114 from zjysteven/add-tinyllava
add tinyllava
commit ef306512e5135f76dffa383f600b8733015836e8 Author: Jingyang Zhang jingyang.zhang@duke.edu Date: Mon Jun 17 17:57:02 2024 -0400
fix typo
commit 9bab67732a4238097725deddf867fb1946ffee40 Merge: dbfb2387 74facb41 Author: Jingyang Zhang jingyang.zhang@duke.edu Date: Sun Jun 16 10:56:05 2024 -0400
Merge branch 'EvolvingLMMs-Lab:main' into add-tinyllava
commit 74facb41a826691dfce4458cf1d8659b34fc5bf5 Merge: 8ba192f9 d5df72de Author: Li Bo drluodian@gmail.com Date: Sun Jun 16 17:59:19 2024 +0800
Merge pull request #118 from teowu/main
Fix the potential risk by PR #117
commit d5df72de2d03108d6b365818ecc3551ac9aa6302 Merge: 5bf59ed2 8ba192f9 Author: Teo (Timothy) Wu Haoning 38696372+teowu@users.noreply.github.com Date: Sun Jun 16 15:32:13 2024 +0800
Merge branch 'EvolvingLMMs-Lab:main' into main
commit 5bf59ed250da98a408a94e214a73caa400cba842 Author: teowu realtimothyhwu@gmail.com Date: Sun Jun 16 07:27:28 2024 +0000
fix #117, allow auto download with tar format videos
commit 98b3955cb808e36303c030aea78eb037d1ec59ce Merge: a056f118 be9dada8 Author: teowu realtimothyhwu@gmail.com Date: Sun Jun 16 07:25:07 2024 +0000
Merge branch 'main' of https://github.com/teowu/lmms-eval into main
commit a056f118704eccec86ce32ab86981ce4bc1e1deb Author: teowu realtimothyhwu@gmail.com Date: Sun Jun 16 07:23:54 2024 +0000
fix #117, allow auto download with tar format videos
commit 8ba192f94edf5d99598983445d5faa4f8807c49f Merge: 7cc28907 be9dada8 Author: Li Bo drluodian@gmail.com Date: Sat Jun 15 17:30:59 2024 +0800
Merge pull request #117 from teowu/main
LongVideoBench for LMMs-Eval
commit be9dada8b4189c53c08e1674ab273242cf2f80a0 Merge: 62ea8ceb 7cc28907 Author: Teo (Timothy) Wu Haoning 38696372+teowu@users.noreply.github.com Date: Sat Jun 15 16:39:20 2024 +0800
Merge pull request #1 from EvolvingLMMs-Lab/main
Merge pull request #113 from teowu/main
commit 62ea8ceb223ef2b51ebab2bcd50d5cf339c35cfe Author: teowu realtimothyhwu@gmail.com Date: Sat Jun 15 08:30:11 2024 +0000
LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)
commit 7cc28907edbb4eb58ee1398772a48110ea35dd96 Merge: 4bc7224d ea14cd4b Author: Li Bo drluodian@gmail.com Date: Sat Jun 15 14:10:22 2024 +0800
Merge pull request #113 from teowu/main
Q-Bench, Q-Bench2, A-Bench
commit dbfb23873979f789477f4797ee2d6071e0fd921e Author: Jingyang jingyang.zhang@duke.edu Date: Fri Jun 14 16:20:42 2024 -0400
add tinyllava
commit ea14cd4b361f4c95b3665cbdb95bc51754090eb5 Author: teowu realtimothyhwu@gmail.com Date: Fri Jun 14 15:01:52 2024 +0000
Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image
commit 4bc7224dcd27fe8b288bfc3fed4d7a9da9635658 Merge: 2797987f bf14cb85 Author: Li Bo drluodian@gmail.com Date: Fri Jun 14 02:14:43 2024 +0800
Merge pull request #111 from XinrunDu/main
add II-Bench
commit bf14cb8527b2b7ac438a36567a875168bc02d294 Author: XinrunDu duxinrun2000@gmail.com Date: Thu Jun 13 09:37:02 2024 +0000
fix dataset_path
commit 6248113f4e11a0ac396d31fa1b032a142fea8cb4 Author: XinrunDu duxinrun2000@gmail.com Date: Thu Jun 13 09:32:06 2024 +0000
add II-Bench
commit 2797987f5b88b87bd172714b678a75a1d8051826 Merge: 63d82f1f 66d4bb2d Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 11:14:47 2024 +0800
Merge pull request #109 from EvolvingLMMs-Lab/pufanyi/update_version
[Small Update] Update the version of LMMs-Eval
commit 66d4bb2d9c9afbbdea40196d4ad80e214d0b14b6 Author: Fanyi Pu FPU001@e.ntu.edu.sg Date: Thu Jun 13 11:13:00 2024 +0800
update version
commit 63d82f1ff11eb430d91a15d6788a1f0b4d596850 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 11:04:32 2024 +0800
Update README.md
commit 44a33799671cb668f55366d5e5a4ddb051a3a1b4 Merge: 5ed00356 0ce46d08 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 04:00:12 2024 +0800
Merge pull request #105 from tianyu-z/main
Include VCR
commit 0ce46d088e473d12d63de44f17c67dceab25658c Author: Suyuchen suyuchen.wang@umontreal.ca Date: Wed Jun 12 15:56:34 2024 -0400
update README.md
commit 46a88d8b0199ed44d2ff459fb372f2e006960cea Merge: 47b13b9b 5ed00356 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Wed Jun 12 15:50:26 2024 -0400
merged readme.md
commit 47b13b9b320d36ac53b3622557e31239f7c22621 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Wed Jun 12 15:30:52 2024 -0400
update aggregation function for vcr_wiki
commit 5ed00356676cf5d0ff056cf27d1b519b8e303ff7 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 03:21:42 2024 +0800
Update README.md
commit ed8806839db5988ced672bd162b7b046edb4863a Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 03:13:59 2024 +0800
Update README.md
commit fea3806026932a6e2bd6e538bcc413e33abdf245 Merge: d99a24ab 05dc8e85 Author: Li Bo drluodian@gmail.com Date: Thu Jun 13 03:11:49 2024 +0800
Merge pull request #108 from EvolvingLMMs-Lab/internal_main_dev
[Upgrade to v0.2] Embracing Video Evaluations with LMMs-Eval
commit 05dc8e853eab7c6bc782a1e2662d2efe7422f767 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:56:04 2024 +0000
chore: Update lmms-eval to support video evaluations for LLaVA models
commit cbeee20bc4ffb510a2b23d96cdaf4077be7c2a9e Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:50:30 2024 +0000
chore: Update lmms-eval to support video evaluations for LLaVA models
commit f00d5498b69dd4f7e54c907ac906abc7c128f000 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:46:33 2024 +0000
Update image alignment in README.md
commit 34156335db74cef9e3f0915d7172fd6b22456c15 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:43:16 2024 +0000
Update llava conv_template in lmms_eval/models/llava.py
commit 50575a950736bc8fc1e191310314cbb5fdff5720 Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:39:03 2024 +0000
chore: Update lmms-eval to support video evaluations for LLaVA models
commit c9b2252fb8a15dd04252af5e6b4613855afd6ada Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:33:48 2024 +0000
Bump version to 0.2.0.dev0
commit 465bd4205e8097e9c037b24a3ed08dd6a7694efa Merge: e43bd840 d99a24ab Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 15:04:25 2024 +0000
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval into internal_main_dev
commit e43bd840b63eb499856e36d9d2ba45c924abcead Author: Bo Li drluodian@gmail.com Date: Wed Jun 12 14:54:06 2024 +0000
chore: Remove unnecessary files and code related to live_bench and sft_eval tasks
commit d99a24abd06df10d07e5a4d0ad5030613f92f2e7 Merge: 374590be a66003be Author: Li Bo drluodian@gmail.com Date: Wed Jun 12 19:45:57 2024 +0800
Merge pull request #107 from AtsuMiyai/new_task/upd_update
update gpt-3.5-turbo version
commit a66003befe4175824a1be6ed59f5f5b88c15f792 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed Jun 12 17:05:17 2024 +0900
update gpt-3.5-turbo version
commit ee91f272985f32eeb9cd6faa41afdd8eb49cac30 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed Jun 12 16:50:53 2024 +0900
update gpt-3.5-turbo version
commit 326b9694fc77398592b8caf3ba0bc2e2bb903813 Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 20:07:40 2024 -0400
include std and confidence interval
commit cd050d4a721d01a2ace0cd030cf7f8dc67eb8c4d Author: Suyuchen suyuchen.wang@umontreal.ca Date: Mon Jun 10 18:49:47 2024 -0400
update vcr_wiki tasks in README.md
commit 205721e0aad76dde30255e56149bbed121883356 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Mon Jun 10 18:43:15 2024 -0400
update vcr_wiki tasks
commit db8e718b502469e8536ee359c5559de87635ffc7 Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 16:13:58 2024 -0400
include the try-except logic for spacy
commit 427dabb790118f538b64e4e5bf6a7aab9689b3d9 Author: Suyuchen suyuchen.wang@umontreal.ca Date: Mon Jun 10 15:51:05 2024 -0400
add crossed_text to vcr_wiki output
commit 043b483eb55f7be4fea75c9bc0b9b03d251b109b Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 15:47:00 2024 -0400
switch logic
commit e1f04db8f58dd10591fde335ea13f74cda7c79bd Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 02:38:21 2024 -0400
modify the form of VCR
commit 96e8d9867c9549ab7490f4b12cfeb6a06238e0aa Author: tianyu-z zhangtianyupro@gmail.com Date: Mon Jun 10 00:10:30 2024 -0400
init include vcr
commit 374590be62f988a76cf6704cfe394cd8ae7d4cb6 Merge: 504685e2 cb3b9ce7 Author: Kaichen Zhang - NTU kaichenzhang358@outlook.com Date: Fri Jun 7 20:25:48 2024 +0800
Merge pull request #101 from Gumpest/main
Update conbench in README
commit 504685e20b17659b913cf46f3012c16bf429e09d Author: Li Bo drluodian@gmail.com Date: Thu Jun 6 15:42:15 2024 +0800
Update README.md
commit cb3b9ce71411da862ff01342a9122a3c656ffbd1 Merge: c9793b38 67b64ea4 Author: Yuan Zhang 56063339+Gumpest@users.noreply.github.com Date: Thu Jun 6 11:22:24 2024 +0800
Merge branch 'EvolvingLMMs-Lab:main' into main
commit c9793b3883714f254a700230b7bee781d6110e73 Author: Yuan Zhang gump_well_done@163.com Date: Thu Jun 6 11:21:05 2024 +0800
update README
commit 67b64ea44a5a39d96c7a196a8a8345a7486bd912 Merge: 8ee7848a 5fd68451 Author: Li Bo drluodian@gmail.com Date: Wed Jun 5 23:12:58 2024 +0800
Merge pull request #100 from Gumpest/main
add Conbench
commit 5fd684515c55ef643726c1b6c720c7cbd2183ba1 Author: Yuan Zhang gump_well_done@163.com Date: Wed Jun 5 21:52:31 2024 +0800
add conbench
commit 8ee7848aaa6383aa1f919c3f21199c81db3fff89 Merge: 747e1978 6fefaf7c Author: Li Bo drluodian@gmail.com Date: Tue Jun 4 17:09:33 2024 +0800
Merge pull request #95 from AtsuMiyai/new_task/upd
add MM-UPD
commit 747e19782996065cdce7157ee8c5e15beb5b6c59 Merge: 4854a34d 05843072 Author: Li Bo drluodian@gmail.com Date: Tue Jun 4 17:09:04 2024 +0800
Merge pull request #97 from CaraJ7/update
Add MathVerse in README.md
commit 6fefaf7cea504e35583ee7217449da290295a7a4 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Tue Jun 4 17:36:39 2024 +0900
update utils.py for leaderboard submission
commit 5f4fe360def1c48ea0cb1da6409d192784882308 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Sun Jun 2 23:28:27 2024 +0900
slightly change query_prompt for the reproduction
commit 05843072d608b970bcada1cd0db65a3c80864060 Author: CaraJ7 1350074492@qq.com Date: Sun Jun 2 17:05:28 2024 +0800
Add MathVerse in README.md
commit 0581ab3cfb362e2024988b46fbbb00324f1233c9 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Fri May 31 16:09:45 2024 +0900
merge model_specific_prompt_kwargs and dataset_name into each task yaml
commit 4854a34d4d37efb5e201f2691ecdb054590cf20b Author: Pu Fanyi FPU001@e.ntu.edu.sg Date: Sat May 4 19:23:39 2024 +0800
Group MMMU images into one image (#83)
* update
* update font
* Add matplotlib.font_manager import in utils.py
* Refactor font handling in add_order_label function in utils.py
* group mmmu
---------
Co-authored-by: Li Bo <drluodian@gmail.com>
commit d224794c49520f4d28a31862cf977198cd6cbc5e Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 15:15:59 2024 +0900
add upd
commit 453e7936424220f02b99517059ca71babfbe5f5a Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 15:03:30 2024 +0900
add upd
commit 909edd6769ddcf8a546be4fdd129416687516878 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:52:21 2024 +0900
add upd
commit 7c1ac9706cafc4801fa4da181d2f610b7838c7b8 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:50:32 2024 +0900
add upd
commit 811301c5280ddd74986645086f026ab730c8848c Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:46:58 2024 +0900
add upd
commit 71401bafd1d515f704f86ab4817a758542bc4672 Author: AtsuMiyai miyai.atsuyuki.practice@gmail.com Date: Wed May 29 12:41:21 2024 +0900
add upd
commit 24dc435908d921e9f1a5706e3141b12e5d838d18 Author: Bo Li drluodian@gmail.com Date: Mon May 27 10:17:32 2024 +0000
fix compatibility issue of older version llava
commit 616edf43731415b35f0f5e97748ed2e017a2891d Author: Bo Li drluodian@gmail.com Date: Mon May 27 09:32:26 2024 +0000
[Fix] import issues of multilingual llava and olympiadbench
commit 4c5a99e21a63fb0ee1c7d15546d18066e1d9894b Merge: 45c05b2b b05c3e22 Author: Li Bo drluodian@gmail.com Date: Mon May 27 14:19:53 2024 +0800
Merge pull request #87 from vfragoso/vifragos/phi3v
Adding microsoft/Phi-3-vision-128k-instruct model.
commit b05c3e222fabd308dd7af4e04c1c6a0812962fe6 Author: Victor Fragoso victor.fragoso@microsoft.com Date: Fri May 24 16:36:37 2024 +0000
Adding documentation of Phi3v class.
commit c2008971308ce8168d57c24d00b725832f099244 Author: Victor Fragoso victor.fragoso@microsoft.com Date: Fri May 24 16:25:02 2024 +0000
Adding prompt arguments for Phi3v on MathVista-TestMini
commit 7f9fb6bcc6cd24a7b8011b8753d0ea98cc2451fd Author: Victor Fragoso victor.fragoso@microsoft.com Date: Fri May 24 13:24:16 2024 +0000
Adding Phi3v model.
commit 45c05b2b2bece76e06849a52a0d034f9c0ac2367 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 23 03:47:36 2024 +0000
Set printing info for llava_hf to debug level
commit 53f013ed8278776551ca992562253387cc9968d2 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 23 03:41:39 2024 +0000
Fix pope random name in pope full
commit 22520a95f13334b75eee0cf0387151067a6bf516 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 23 03:41:14 2024 +0000
Add separated pope tasks by category
commit d1eefb1565014b47287ffa6b350229062f8f602f Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 9 08:36:02 2024 +0000
Update gitignore
commit b2b4dbd2dc13432c79208db35abf7f55c97f1790 Author: kcz358 kaichenzhang358@outlook.com Date: Mon May 20 07:45:11 2024 +0000
Comment out SPICE in caption task so that the Stanford NLP model does not need to be downloaded
commit 662f05ce4c62a46a83f819d3a5925a9bd20059b5 Author: kcz358 kaichenzhang358@outlook.com Date: Mon May 20 03:13:13 2024 +0000
Comment out parse result in xcomposer
commit 09329322916bfbb604d72ddaf50441a0947f8805 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 16 03:55:39 2024 +0000
Fix instructblip qformer size mismatch and multi-images problem
commit 557a6a3b15e07e506bc05e2cc76ff6a2f8c93964 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 16 03:11:41 2024 +0000
Remove redundant code in fuyu
commit 6aeb5504e74ed1980b53700d8e4d4dcf7d1b38fc Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 16 01:45:24 2024 +0000
Fix idefics2 llava in the wild bugs
commit aea80e6a71f716951353e1e5d68380243396b4d6 Author: kcz358 kaichenzhang358@outlook.com Date: Wed May 15 11:07:35 2024 +0000
Better task list_with_num
commit 3c12a080d66b9c38f615b961befca7c30f82fa39 Author: Li Bo drluodian@gmail.com Date: Sat May 18 02:35:52 2024 +0800
Update LICENSE
commit 82317a635a4978b32e095a06cc295d0ae23661c2 Author: Li Bo drluodian@gmail.com Date: Sat May 18 02:29:09 2024 +0800
Update LICENSE
commit a8bba1cdb51061a0d27bf9a98cca1505b5c58ea5 Author: Li Bo drluodian@gmail.com Date: Sat May 18 02:28:03 2024 +0800
Create LICENSE
commit caa5893b5fd2c1d32c72b97f371ccd9a8d9ec3a0 Merge: c0944486 423b0060 Author: Li Bo drluodian@gmail.com Date: Mon May 13 11:45:26 2024 +0800
Merge pull request #73 from EvolvingLMMs-Lab/kc/qwen_vl_api
[Feat] Add qwen vl api
commit c09444860362a136f17641f8b2a1f91c2bbc3715 Author: kcz358 kaichenzhang358@outlook.com Date: Sat May 11 06:11:19 2024 +0000
Fix llava_hf image tokens number issue
commit 64f07e497f53e5bcbe9e8fb5830cc7a1daaf7ff1 Author: kcz358 kaichenzhang358@outlook.com Date: Thu May 9 02:04:10 2024 +0000
Fix endless warning for llava_hf generation
commit 8aaa828108da8514dd9cd23a9d6d83a8b67f2d65 Author: Bo Li drluodian@gmail.com Date: Thu May 2 06:13:56 2024 +0000
Add model_name parameter to Llava constructor
commit 7847dc4d8efe60605102414bb071b1da9851228e Author: kcz358 kaichenzhang358@outlook.com Date: Tue May 7 03:15:59 2024 +0000
Parse result for llava_hf 1.6
commit 3e56b4f92db39a2ce92903b0c43a34f1d14d59ec Author: kcz358 kaichenzhang358@outlook.com Date: Tue May 7 03:09:56 2024 +0000
Fix llava_hf generation for 1.6
commit fa3ff92b07ea5aaa633a2039818c310744f84d07 Author: kcz358 kaichenzhang358@outlook.com Date: Mon May 6 08:32:57 2024 +0000
Fix llava conv template for llama3
commit 423b00606aa77fd6b324c19e3d480b73ab852db6 Author: kcz358 kaichenzhang358@outlook.com Date: Sun May 5 07:54:52 2024 +0000
Add qwen vl api
commit b7fd7a9f7aa3c0e1e50374047dfffc46a7462b90 Merge: 986139a9 c5a130b6 Author: Li Bo drluodian@gmail.com Date: Sun May 5 13:19:48 2024 +0800
Merge pull request #59 from EvolvingLMMs-Lab/add_idefics2
add idefics2
commit 986139a9a31154679bdea029b09639f84712db27 Merge: b46239ca 8d3526c0 Author: Li Bo drluodian@gmail.com Date: Fri May 3 01:18:18 2024 +0800
Merge pull request #36 from cocoshe/main
[Fix] repr llava doc
commit b46239cabab7b545ec99d9eae6c851e531b18374 Merge: bc69a744 373265f2 Author: Li Bo drluodian@gmail.com Date: Fri May 3 01:17:34 2024 +0800
Merge pull request #56 from gagan3012/main
Multilingual LLava bench
commit bc69a744d2cffeb06eba62e843bcc7869e27613a Merge: eef3aeb6 626e8a91 Author: Li Bo drluodian@gmail.com Date: Fri May 3 01:12:14 2024 +0800
Merge pull request #70 from hunterheiden/hsh/new_task/WebSRC
Bugfix: WebSRC should be token-level F1 NOT character-level
commit 626e8a91a4af2dd5dd774fc130cc2f4d74b2bc37 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Thu May 2 09:31:03 2024 -0400
Bugfix: WebSRC should be token-level F1 NOT character-level
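The difference between token-level and character-level F1 can be shown with a small sketch. It assumes SQuAD-style overlap F1 with lowercasing and whitespace tokenization; the helper names `token_f1` and `char_f1` are hypothetical, and the task's own normalization may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """F1 over whitespace-separated tokens (SQuAD-style overlap)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def char_f1(prediction: str, reference: str) -> float:
    """Same computation over characters, shown only for contrast."""
    return token_f1(" ".join(prediction), " ".join(reference))


# "sing" and "sign" share every character but are different tokens.
token_score = token_f1("sing", "sign")
char_score = char_f1("sing", "sign")
```

In this sketch the character-level variant gives full credit to an answer that merely contains the same letters, while the token-level variant gives it none, which is why the level of comparison matters for the reported score.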
commit eef3aeb6ab589bb1d5045af5b5c1984a69402d19 Merge: c4e9dd9f 9bca4413 Author: Li Bo drluodian@gmail.com Date: Thu May 2 14:38:17 2024 +0800
Merge pull request #69 from hunterheiden/hsh/new_task/WebSRC
[New Task] WebSRC (multimodal Q&A on web screenshots)
commit 9bca441376325173128e5c50087f068e519c48da Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 11:07:29 2024 -0400
Add code to enable compilation of submission for WebSRC test split
commit 7687495b1ed552eeba088cb9ad5aaf1170e7fff9 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 10:47:32 2024 -0400
Draft and validate websrc eval on dev split
commit 4eebd3e5d7ab3b8c3116eea57318db72d2ce32bb Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 10:46:54 2024 -0400
Update main README with new task names
commit 35fe80b67656114a8824eb59574089663bdc4c9a Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed May 1 10:46:20 2024 -0400
Draft README for WebSRC
commit 955bd0635cc6c14a96ad869f1002e6dbefdc5071 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Tue Apr 30 10:16:21 2024 -0400
Init webSRC
commit c4e9dd9f6e40e8586587c4a75987aa109a37f14b Merge: d8a3a99f 319afccb Author: Li Bo drluodian@gmail.com Date: Fri Apr 26 14:37:22 2024 +0800
Merge pull request #63 from hunterheiden/hsh/new_task/screenspot
New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens
commit 319afccbe713ddf40a8a6fa28501e64c0ad34725 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Thu Apr 25 11:44:34 2024 -0400
slight update
commit 2f3811ca1bbad6a441016b05fde09a571900fca8 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Thu Apr 25 11:41:04 2024 -0400
Add README file specific to ScreenSpot
commit 28962cbe83631ec5d6481aaea4907a7c96fec848 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Wed Apr 24 11:52:33 2024 -0400
Update README to reflect new tasks
commit e457cfb4f2d6869e8367d6d5b03ad25ee4acc363 Author: Hunter Heidenreich hunter.heidenreich@rootsautomation.com Date: Tue Apr 23 18:33:16 2024 -0400
Create ScreenSpot on clean branch
commit d8a3a99ff6142fe101fa3c188cc7f29593c44345 Merge: 3dcd0158 ed171293 Author: Li Bo drluodian@gmail.com Date: Tue Apr 23 10:34:03 2024 +0800
Merge pull request #61 from tupini07/patch-1
Fix typo in Qwen-VL that was causing "reference before assignment"
commit ed171293d1e82075c5c6a847fc91ecbfd45cf89f Author: Andrea Tupini tupini07@gmail.com Date: Mon Apr 22 14:56:41 2024 -0600
refactor query construction for clarity
commit cd874201c46f32a2903ddffae85f9db73e14adfd Author: Andrea Tupini tupini07@gmail.com Date: Mon Apr 22 14:54:29 2024 -0600
convert contexts to list if necessary and remove unnecessary construction of `questions`
commit 85573674e90c8d505312ba18c5102e0051255078 Author: Andrea Tupini tupini07@gmail.com Date: Mon Apr 22 14:47:33 2024 -0600
Fix typo in qwen_vl that was causing "reference before assignment"
commit 3dcd01582b719555bcf8eb25d91cc5e42abd2c5f Merge: 95df9fee 743673a1 Author: Li Bo drluodian@gmail.com Date: Sat Apr 20 22:03:16 2024 +0800
Merge pull request #60 from CaraJ7/main
Add MathVerse
commit 743673a1419b6e729e18c96f148745cc739d4c71 Merge: c1a54721 95df9fee Author: CaraJ7 1350074492@qq.com Date: Sat Apr 20 21:49:02 2024 +0800
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit c1a5472135c3b84061b64d997ab50dda0412ba4f Author: CaraJ7 1350074492@qq.com Date: Sat Apr 20 21:45:34 2024 +0800
Add MathVerse
commit 373265f24e7a89cbd49ab724a2e388cc0930be78 Author: Gagan Bhatia 49101362+gagan3012@users.noreply.github.com Date: Fri Apr 12 17:21:39 2024 -0700
Add files via upload
commit d8530514a5ef9378d2adeaceb228b60ec25a6718 Author: Gagan Bhatia 49101362+gagan3012@users.noreply.github.com Date: Fri Apr 12 17:19:49 2024 -0700
Create README.md
commit 22a4958e993463edff352ac033014f9a485706cc Author: Bo Li bo.li01@bytedance.com Date: Thu Apr 4 17:12:43 2024 +0000
[WIP] adding mmbench dev evaluation (#75)
* WIP
* Update GPT evaluation model name and sys prompt
* 🛠️ Scale accuracy to percentage
The accuracy value is now multiplied by 100 in the aggregation function so that it is reported as a percentage. In the evaluation process, the `math` module is now imported, and progress is logged every 100 evaluations instead of every 10, which reduces log verbosity and prevents potential logging overflow. NaN values are now handled so that 'default_value' is set when data is missing, avoiding errors in split, category, and l2-category assignments. Finally, reporting of categorical and l2-categorical accuracies is streamlined through a new `calculate_hit_rates` function, improving code readability and maintenance.
Issue refs: #1427, #1533
* Update GPT evaluation model name and API configuration
* Refactor MMBench_Evaluator class to handle missing columns
* Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations
* Refactor MMBench-CN and MMBench-EN evaluation functions
* 🔄 Refactor result processing and logging logic
- Simplified the result processing functions across different utility modules (`cc_utils.py`, `cn_utils.py`, `en_utils.py`) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document.
- Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options.
- In `mmbench_evals.py`, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output.
- Commented-out code and verbose logging during evaluation, which may have interfered with performance, have been removed for a more efficient and less intrusive logging experience.
This cleanup reduces redundancy in the codebase and improves evaluation performance.
Refs #2045
---------
Co-authored-by: Bo Li <bo.li01@bytedance.com>
(cherry picked from commit a19278c2ea6ddcbca64d3cc7f4efec7fe5775121)
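Two ideas from the MMBench commit message above, defaulting missing options "A" to "E" to "nan" and reporting per-category accuracy as a percentage, can be sketched as below. This is a sketch under stated assumptions rather than the repository's implementation: `fill_options`, `hit_rates`, the `hit` field, and the record layout are all hypothetical, and `hit_rates` only loosely mirrors what the message attributes to `calculate_hit_rates`.

```python
import math
from collections import defaultdict

OPTION_KEYS = ["A", "B", "C", "D", "E"]


def fill_options(doc: dict) -> dict:
    """Return every option key, using "nan" where the document has none."""
    result = {}
    for key in OPTION_KEYS:
        value = doc.get(key)
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        result[key] = "nan" if missing else value
    return result


def hit_rates(records: list, field: str) -> dict:
    """Accuracy per value of `field`, scaled to a percentage."""
    hits, totals = defaultdict(int), defaultdict(int)
    for record in records:
        # Fall back to a placeholder group when the field is absent.
        group = record.get(field, "default_value")
        totals[group] += 1
        hits[group] += int(record["hit"])
    return {group: 100 * hits[group] / totals[group] for group in totals}


options = fill_options({"A": "cat", "B": float("nan")})
rates = hit_rates(
    [{"category": "x", "hit": 1}, {"category": "x", "hit": 0}, {"hit": 1}],
    "category",
)
```

Filling every option key up front keeps downstream code from branching on which options a given question happens to provide.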
commit 8d3526c0869f0ad7747ff6bb02441140792b461c Author: cocoshe 1228759711@qq.com Date: Thu Mar 28 13:38:36 2024 +0800
fix doc
- feat: Add LlavaOneVision model to available models
chore: Update sqlitedict dependency to version 2.1.0
- Revert "Squashed commit of the following:"
This reverts commit 11b00999df3c43cb225482e030b791b2d454124c.
- Refactor available models in lmms_eval
Remove duplicate entries for "llava_hf", "llava_onevision", and "longva" in the AVAILABLE_MODELS dictionary in lmms_eval/models/__init__.py.
- fix: Handle import errors in lmms_eval models/__init__.py
The code changes in this commit fix the handling of import errors in the lmms_eval/models/__init__.py file. Previously, when an import error occurred, the code simply ignored it. This commit updates the code to log an error message using the logger module when an import error occurs.
This commit also removes duplicate entries for "llava_hf", "llava_onevision", and "longva" in the AVAILABLE_MODELS dictionary.
Recent user commits:
- Refactor available models in lmms_eval
- Revert "Squashed commit of the following:"
- feat: Add LlavaOneVision model to available models
- chore: Update sqlitedict dependency to version 2.1.0
fix: Handle import errors in lmms_eval models/__init__.py
chore: Remove unused imports in lmms_eval/models/__init__.py and lmms_eval/tasks/vcr_wiki/utils.py
Remove unused imports in lmms_eval/tasks/vcr_wiki/utils.py
chore: Update lmms_eval/tasks/vcr_wiki/utils.py
This commit updates the `lmms_eval/tasks/vcr_wiki/utils.py` file. It removes unused imports and fixes the condition for loading Spacy models based on the `load_package` value in the config file. Additionally, it adds a debug log message when the Spacy models are not loaded due to `load_package` being set to False.
Remove unused imports in lmms_eval/tasks/vcr_wiki/utils.py
- feat: Add new subtasks to overall score calculation
The code changes in this commit add new subtasks to the overall score calculation in the overall_score function. The subtasks "ScanQA", "BLINK", "MathVerse", "SciVerse", and "Mantis" are included in the categories dictionary. This ensures that the scores for these subtasks are calculated and included in the evaluation results.
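To illustrate why a subtask must appear in the categories dictionary to count toward the overall score, here is a small sketch. Only the five subtask names come from the commit message; the category key "newly_added", the record fields "sub_task" and "score", and the signature of overall_score are assumptions, and the real function may aggregate differently.

```python
from statistics import mean

# Hypothetical grouping; the real dictionary has its own category names.
categories = {
    "newly_added": ["ScanQA", "BLINK", "MathVerse", "SciVerse", "Mantis"],
}

def overall_score(results, categories):
    """Average scores per category and over all categorized samples."""
    per_category = {}
    all_scores = []
    for name, subtasks in categories.items():
        scores = [r["score"] for r in results if r["sub_task"] in subtasks]
        if scores:
            per_category[name] = mean(scores)
            all_scores.extend(scores)
    overall = mean(all_scores) if all_scores else 0.0
    return overall, per_category

results = [
    {"sub_task": "BLINK", "score": 1.0},
    {"sub_task": "Mantis", "score": 0.0},
    {"sub_task": "Unlisted", "score": 1.0},  # not in any category, so ignored
]
overall, per_category = overall_score(results, categories)
```

In this sketch a subtask that is absent from every category is silently excluded, which is the kind of gap the commit closes by listing the new subtasks.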
Remove unused imports and update subtask categories in utils.py
feat: Add new subtasks to overall score calculation
chore: Update lmms_eval/tasks/llava_interleave_bench/_default_template_interleave_yaml
Update the image aspect ratio in the default template for the llava_interleave_bench task. Change the value of "image_aspect_ratio" from "original" to "pad". This ensures that the generated images have a padded aspect ratio.
if no response directly return 0
Squashed commit of the following:
commit b2a009b6bbf8353172f5a1dd9c29ea1f67610c02 Author: Pu Fanyi FPU001@e.ntu.edu.sg Date: Mon Jul 15 19:12:25 2024 -0700
if no response directly return 0 (#142)
commit 5fc5f2f5acf454fc99448b0d62eb52b4bffba0d5 Author: Kaichen Zhang - NTU kaichenzhang358@outlook.com Date: Tue Jul 16 10:12:11 2024 +0800
Add Muirbench (#143)
* handle gen kwargs in internvl2
* Add muirbench
- Add files via upload
(cherry picked from commit 557083a156c3dd67ac79e22b4202e9b69b6b00f4)
- update
Co-authored-by: Fanyi Pu FPU001@e.ntu.edu.sg Co-authored-by: Yan Shu 570533048@qq.com
Fix llava onevision loglikelihood video bug
LiveBench July (#146)
claude auto detect json mode
extract information
use claude to generate
fix bugs
fix
generate data
chore: Update dataset name and version for live_bench task
gpt-4-turbo => gpt-4o
chore: Update dataset capture settings in create_dataset.py
everything use gpt-4o
websites
livebench_july
Refactor code to simplify data assignment in example.ipynb
chore: Update dataset name for live_bench task
Add xcomposer2d5 from fanyi, revise something for better usage (#145)
internvl2
fix some bugs
fix
lint
feat: Add XComposer2D5 model to AVAILABLE_MODELS
xcomposer
Fix llava vid error when using public
Fix xcomposer2d5
Add generation tokens
Co-authored-by: Fanyi Pu FPU001@e.ntu.edu.sg
Dev/ov evals (#147)
fix doc
[WIP] adding mmbench dev evaluation (#75)
WIP
Update GPT evaluation model name and sys prompt
🛠️ Scale accuracy to percentage
The accuracy value is now multiplied by 100 in the aggregation function to represent it as a percentage. In the evaluation process, the math module is now imported, and progress is logged every 100 evaluations instead of every 10, which reduces log verbosity and prevents potential logging overflow. NaN values are now handled so that 'default_value' is set when data is missing, avoiding errors in split, category, and l2-category assignments. Finally, reporting of categorical and l2-categorical accuracies is streamlined through a new calculate_hit_rates function, improving code readability and maintenance.
Issue refs: #1427, #1533
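A rough sketch of the NaN handling and percentage scaling described in this commit message is shown below. The name calculate_hit_rates appears in the message, but its signature here, the helper fill_default, the "unknown" default, and the record layout are assumptions made for illustration only.

```python
import math

def fill_default(value, default_value="unknown"):
    """Replace None or NaN with a default so later grouping does not fail."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return default_value
    return value

def calculate_hit_rates(records, key):
    """Return accuracy per group, scaled to a percentage."""
    totals, hits = {}, {}
    for rec in records:
        group = fill_default(rec.get(key))
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(rec["hit"])
    # Multiply by 100 so the result reads as a percentage, not a fraction.
    return {g: 100.0 * hits[g] / totals[g] for g in totals}

records = [
    {"category": "ocr", "hit": 1},
    {"category": "ocr", "hit": 0},
    {"category": float("nan"), "hit": 1},  # missing category falls back to the default
]
rates = calculate_hit_rates(records, "category")
```

The same helper could be called once with a category key and once with an l2-category key, which is one way a single function can replace two separate reporting loops.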
Update GPT evaluation model name and API configuration
Refactor MMBench_Evaluator class to handle missing columns
Add print statements for detailed results in MMBench-CN(CC), MMBench-CN(Dev), and MMBench-EN(Dev) evaluations
Refactor MMBench-CN and MMBench-EN evaluation functions
🔄 Refactor result processing and logging logic
- Simplified the result processing functions across different utility modules (cc_utils.py, cn_utils.py, en_utils.py) to unify the handling of multiple-choice options. Now, all options ("A" to "E") are dynamically added to the result data, and default to "nan" if not provided in the document.
- Removed redundant keys directly from the process results dict creation to avoid clutter and align with the new dynamic addition of options.
- In mmbench_evals.py, removed the unnecessary check for all splits being 'dev' and streamlined the evaluation loop by eliminating the progress bar (tqdm) for a cleaner log output.
- Commented-out code and verbose logging during evaluation, which may have interfered with performance, have been removed for a more efficient and less intrusive logging experience.
This cleanup reduces redundancy in the codebase and improves evaluation performance.
Refs #2045
Co-authored-by: Bo Li bo.li01@bytedance.com (cherry picked from commit a19278c2ea6ddcbca64d3cc7f4efec7fe5775121)
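The dynamic handling of options "A" to "E" mentioned in this commit message might look something like the following sketch. The function name build_result and the document fields other than the option letters are hypothetical; only the option range and the "nan" default come from the message.

```python
OPTIONS = ["A", "B", "C", "D", "E"]

def build_result(doc, prediction):
    """Build a result row, filling absent options with the string "nan"."""
    result = {
        "index": doc["index"],
        "question": doc["question"],
        "prediction": prediction,
    }
    # Add every option in one loop instead of listing each key by hand.
    for option in OPTIONS:
        result[option] = doc.get(option, "nan")
    return result

doc = {"index": 7, "question": "Which animal is shown?", "A": "cat", "B": "dog"}
row = build_result(doc, "A")
```

Because every row carries the same five option keys, questions with fewer choices can share one evaluation path with five-choice questions.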
Create README.md
Add files via upload
Add MathVerse
Fix typo in qwen_vl that was causing "reference before assignment"
convert contexts to list if necessary and remove unnecessary construction of questions
refactor query construction for clarity
Create ScreenSpot on clean branch
Update README to reflect new tasks
Add README file specific to ScreenSpot
slight update
Init webSRC
Draft README for WebSRC
Update main README with new task names
Draft and validate websrc eval on dev split
Add code to enable compilation of submission for WebSRC test split
Bugfix: WebSRC should be token-level F1 NOT character-level
Add qwen vl api
Fix llava conv template for llama3
Fix llava_hf generation for 1.6
Parse result for llava_hf 1.6
Add model_name parameter to Llava constructor
Fix endless warning for llava_hf generation
Fix llava_hf image tokens number issue
Create LICENSE
Update LICENSE
Update LICENSE
Better task list_with_num
Fix idefics2 llava in the wild bugs
Remove redundant code in fuyu
Fix instructblip qformer size mismatch and multi-images problem
Comment out parse result in xcomposer
Comment out Spice in caption task so that don't need to download stanford nlp model
Update gitignore
Add separated pope tasks by category
Fix pope random name in pope full
Set printing info for llava_hf to debug level
Adding Phi3v model.
Adding prompt arguments for Phi3v on MathVista-TestMini
Adding documentation of Phi3v class.
[Fix] import issues of multilingual llava and olympiadbench
fix compatibility issue of older version llava
add upd
add upd
add upd
add upd
add upd
add upd
Group MMMU images into one image (#83)
update
update font
Add matplotlib.font_manager import in utils.py
Refactor font handling in add_order_label function in utils.py
group mmmu
Co-authored-by: Li Bo drluodian@gmail.com
merge model_specific_prompt_kwargs and dataset_name into each task yaml
Add MathVerse in README.md
slightly change query_prompt for the reproduction
update utils.py for leaderboard submission
add conbench
update README
Update README.md
init include vcr
modify the form of VCR
switch logic
add crossed_text to vcr_wiki output
include the try-except logic for spacy
update vcr_wiki tasks
update vcr_wiki tasks in README.md
include std and confidence interval
update gpt-3.5-turbo version
update gpt-3.5-turbo version
chore: Remove unnecessary files and code related to live_bench and sft_eval tasks
Bump version to 0.2.0.dev0
chore: Update lmms-eval to support video evaluations for LLaVA models
Update llava conv_template in lmms_eval/models/llava.py
Update image alignment in README.md
chore: Update lmms-eval to support video evaluations for LLaVA models
chore: Update lmms-eval to support video evaluations for LLaVA models
Update README.md
Update README.md
update aggregation function for vcr_wiki
update README.md
Update README.md
update version
add II-Bench
fix dataset_path
Add qbench, qbench2, abench; fix phi3v as its current implementation does not support multi-image
add tinyllava
LongVideoBench support: image LMMs (idefics2, phi3) and video LMMs (LLaVA-Next-Video-34B)
fix #117, allow auto download with tar format videos
fix #117, allow auto download with tar format videos
fix typo
feat: Add support for auto downloading tar format videos
Release llava-wilder
chore: Update dependencies to fix potential risks and improve compatibility
tutorial
docs
update preparation
small fix
small fix
lint
to sh script
update readme
Remove handling non-visual loop in llava
Add llava_hf back to registry
Update README.md
Update README.md
update ablation for videomme datasets
chore: Handle ImportError when importing models
Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
chore: Remove unused models from lmms_eval package
feat: Allow loading model configurations from other packages
feat: Allow including external tasks from plugins
chore: Add loguru for logging in lmms_eval package
Remove unnecessary lines since use batched visuals now in llava
Add longva
Revise model registry for llava_hf and longva
Delete unnecessary lines
Remove unnecessary lines for video llava
Update pyproject.toml
Update activitynetqa_generation.yaml
Fix vid mme post prompt issue
new task gqa-ru
add mmbench_ru_dev
change prompt to ru
create new task vitatecs
Update README.md
Add wild vision 0617
Hardcode to keep image for wild vision
Fixing scoring logic
Fixing dataset name
Fixing handling None filtered score
Add detailcaps
Add install capture_metric in env
Add files via upload
feat: Add tie_weights parameter to Llava model initialization
Upgrade lmms-eval to support more models and evaluation tasks
Upgrade lmms-eval to version 0.2.1
Rename xcomposer 4KHD
chore: Update lmms_eval/models/vila.py and lmms_eval/tasks/__init__.py
Update utils.py
Update _default_template_vcr_yaml
add process sync via temp file in lmms_eval/evaluator.py
Update utils.py
Update _default_template_vcr_yaml
Add muirbench
Squashed commit of the following:
commit dfdba507b5fbe985b0030ffec575f9f2638bc1ed Author: Li Bo drluodian@gmail.com Date: Tue Jul 16 11:13:52 2024 +0800
merge ov evals (#144)
* chore: Update gpt_eval_model_name to "gpt-3.5-turbo" in mathvista.yaml
* Squashed commit of the following:
commit 994c9f97a2f8db3e9b7d7933d1e1680acde5b70b
Author: Yan Shu <570533048@qq.com>
Date: Mon Jul 8 17:21:23 2024 +0800
Add files via upload
* Squashed commit of the following:
commit e31cd7883d4555c7530795c7f102b8d78cbd372f
Author: Bo Li <drluodian@gmail.com>
Date: Wed Jul 10 12:08:08 2024 +1000
chore: Update lmms_eval/models/vila.py and lmms_eval/tasks/__init__.py
commit 1d8c980d1089f9d7702c3b92d5c85039f2809c6d
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Tue Jul 9 02:08:52 2024 +0000
Rename xcomposer 4KHD
commit 6da76f36ecf5f9aa73057e767a4fcb60c99ff896
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:55:56 2024 +1000
Upgrade lmms-eval to version 0.2.1
commit cd1858523fcd8630082cbefba8710e0de3ee8805
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:52:23 2024 +1000
Upgrade lmms-eval to support more models and evaluation tasks
commit 672d7e5bb49dcb34e1b2fdeb09f3f4588dc583a6
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:43:41 2024 +1000
feat: Add tie_weights parameter to Llava model initialization
commit 2037a86261b55fa42b8ba3a04eab192b3e69d6ea
Merge: e6844db1 a5c18692
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jul 9 11:37:12 2024 +1000
Fix gen kwargs image aspect ratio in internvl2
commit a5c186925de989b616f58a35ece36065a32b4594
Merge: 2ebec77f 557083a1
Author: Li Bo <drluodian@gmail.com>
Date: Tue Jul 9 09:15:56 2024 +0800
Merge pull request #137 from shuyansy/main
add MLVU task
commit 557083a156c3dd67ac79e22b4202e9b69b6b00f4
Author: Yan Shu <570533048@qq.com>
Date: Mon Jul 8 16:56:50 2024 +0800
Add files via upload
commit 2ebec77f5606d79e9a7b995970e32792050606a1
Merge: 211bfede b23d349e
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 8 11:53:06 2024 +0800
Merge pull request #136 from Dousia/main
Add detailcaps
commit b23d349e46d60dc149ffaa54d6e019f4996ed92d
Author: ByteDance <bytedance@MacBook-Pro.local>
Date: Sun Jul 7 23:24:19 2024 +0800
Add install capture_metric in env
commit c6e211d5f9dbb7572d3a141b6504cb1ca2007c33
Author: ByteDance <bytedance@MacBook-Pro.local>
Date: Sun Jul 7 23:04:13 2024 +0800
Add detailcaps
commit 211bfedebad243ef82a8b0be36c3b5a9b9cb2f72
Merge: 7c208b76 79514eee
Author: Li Bo <drluodian@gmail.com>
Date: Tue Jul 2 23:05:12 2024 +0800
Merge pull request #133 from EvolvingLMMs-Lab/dev/wild_vision
Add wild vision bench
commit 79514eeebcfd6f655be2a10c776037d12a7b7214
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 15:10:02 2024 +0000
Fixing handling None filtered score
commit 725fac2781446958b905e1e6c6eb3c0a8e582e49
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 08:25:42 2024 +0000
Fixing dataset name
commit 8d963e132ac03fc0d835d480cfcfcabe72af143c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 08:24:51 2024 +0000
Fixing scoring logic
commit e2990d0a69e876721256fdf946c68ba7ae0cbdc1
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 06:06:57 2024 +0000
Hardcode to keep image for wild vision
commit ed381736730d8fb785b4ee919fdb751734ecef25
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Mon Jul 1 06:06:38 2024 +0000
Add wild vision 0617
commit 7c208b76640c986cfe94233dce735c3ca4ad4319
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:53:31 2024 +0800
Update README.md
commit 39d40dea47bc59ff04e8b0cbc445345098debc9a
Merge: e19b43a3 ba7081c0
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:47:09 2024 +0800
Merge pull request #129 from Dannoopsy/mmbench_ru
add task MMBench-ru
commit e19b43a3a1e7212e623061b164b0419cc0dda689
Merge: 11fd7e3f a0de8970
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:46:58 2024 +0800
Merge pull request #128 from Dannoopsy/gqa-ru
add task gqa-ru
commit 11fd7e3fc05908aeb01e4a6161a7b55cd38b3122
Merge: 383e7fea a7522592
Author: Li Bo <drluodian@gmail.com>
Date: Mon Jul 1 11:46:16 2024 +0800
Merge pull request #130 from lscpku/vitatecs
Add task VITATECS
commit a75225926e5954f85466d257f99acf0163fde596
Author: lscpku <lisc99@pku.edu.cn>
Date: Fri Jun 28 20:37:06 2024 +0800
create new task vitatecs
commit ba7081c0abac840002d320e30733e891298dfa11
Author: Dannoopsy <63581325+Dannoopsy@users.noreply.github.com>
Date: Fri Jun 28 12:21:05 2024 +0300
change prompt to ru
commit 27ea9c0055a8abf3a8198829b8617018479918e2
Author: Dannoopsy <belopolskikh.dd@phystech.edu>
Date: Thu Jun 27 17:17:29 2024 +0000
add mmbench_ru_dev
commit 383e7fead3138aedf62e9c0ec48303835ef26e2a
Merge: 06fa000f ed2e7f79
Author: Li Bo <drluodian@gmail.com>
Date: Fri Jun 28 00:14:10 2024 +0800
Merge pull request #126 from lorenzomammana/feature/external-package-integration
External package integration using plugins
commit ed2e7f792151d21bce8f1c498270b9391e1d5c85
Merge: 03947e14 06fa000f
Author: Lorenzo Mammana <mammanalorenzo@outlook.it>
Date: Thu Jun 27 15:38:10 2024 +0000
Merge branch 'main' into feature/external-package-integration
commit a0de89708d5e6f259bb17f0eaace3c5b901b275c
Author: Dannoopsy <belopolskikh.dd@phystech.edu>
Date: Tue Jun 25 11:11:37 2024 +0000
new task gqa-ru
commit 06fa000f60d3e4d160fac8ceb9959ae92a98f752
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Tue Jun 25 06:41:13 2024 +0000
Fix vid mme post prompt issue
commit b388d79e0df6f60068196cb7047453ebd22d6ef1
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 23 22:31:16 2024 +0800
Update activitynetqa_generation.yaml
commit 8f9d620fcd9d0a0742ee6bcf51ea63bd6b088a36
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 23 14:02:25 2024 +0800
Update pyproject.toml
commit 6341b7c15ce9fb28eb06b067ddb299d6cf2e16c3
Merge: fce85f1b 903b042b
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 23 14:02:02 2024 +0800
Merge pull request #125 from EvolvingLMMs-Lab/dev/interleave
[Model] aligned llava-interleave model results on video tasks
commit 903b042be016016d4ebeecb07701f3076a2d323c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Jun 22 12:07:13 2024 +0000
Remove unnecessary lines for video llava
commit d78ec86407b729a964906a8c2e50704b4bc74d06
Merge: ebe7217a fce85f1b
Author: Li Bo <drluodian@gmail.com>
Date: Sat Jun 22 13:57:31 2024 +0800
Merge branch 'main' into dev/interleave
commit ebe7217a486c1e754e42c2cbdb834e09fbbcc9b0
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Jun 22 02:57:08 2024 +0000
Delete unnecessary lines
commit 120c474b056f9177c74e1fd9691d59e2f234b785
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Jun 21 08:38:41 2024 +0000
Revise model registry for llava_hf and longva
commit 7d6201f921088afd3f52a35076e3c6fcc9aa518c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Jun 21 08:38:24 2024 +0000
Add longva
commit 12f480699c71a12a24d4349d9b0681933201a3a6
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Jun 21 08:35:39 2024 +0000
Remove unnecessary lines since use batched visuals now in llava
commit 12cea76f1f0f14b1fd1007c9d39a9b0557368637
Author: Bo Li <drluodian@gmail.com>
Date: Thu Jun 20 18:15:32 2024 +0000
chore: Add loguru for logging in lmms_eval package
commit 03947e14a46fd25b412931f7c9c25f4a2971d0b4
Author: Lorenzo Mammana <mammanalorenzo@outlook.it>
Date: Wed Jun 5 13:40:41 2024 +0000
feat: Allow including external tasks from plugins
commit b80a91f73e15ddd0b0ce1322d7d121fa14030eed
Author: Lorenzo Mammana <mammanalorenzo@outlook.it>
Date: Wed Jun 5 13:04:55 2024 +0000
feat: Allow loading model configurations from other packages
commit 8ef24740dd48a11c97eb627f2fff4aca107fef0d
Author: Bo Li <drluodian@gmail.com>
Date: Thu Jun 20 12:11:03 2024 +0000
chore: Remove unused models from lmms_eval package
commit af38885fc2e066f5ea44388f33e07176f836fe28
Author: Bo Li <drluodian@gmail.com>
Date: Thu Jun 20 12:07:09 2024 +0000
chore: Handle ImportError when importing models
Handle the ImportError exception when importing models in the lmms_eval package. This change adds a try-except block to catch the ImportError and print an error message indicating the failed import. This will help with troubleshooting and identifying any issues with the model imports.
commit fce85f1b03ff7043b29dee787c5d17a08dd2687a
Merge: dbe63293 d94f83cb
Author: Li Bo <drluodian@gmail.com>
Date: Thu Jun 20 20:02:12 2024 +0800
Merge pull request #120 from EvolvingLMMs-Lab/pufanyi/hf_dataset_docs
Add docs for datasets upload to HF
commit dbe63293245a5141fdfd80bda7657c304f6bd32f
Author: choiszt <ls2001927@sohu.com>
Date: Thu Jun 20 15:14:21 2024 +0800
update ablation for videomme datasets
commit d94f83cb3f08b61a2c75cc4326e58792100605b3
Author: Li Bo <drluodian@gmail.com>
Date: Thu Jun 20 13:30:59 2024 +0800
Update README.md
commit cab8159ff35db330536c0b6dfb4b0a3b24142209
Author: Li Bo <drluodian@gmail.com>
Date: Thu Jun 20 13:30:29 2024 +0800
Update README.md
commit 45876652a877a8006b828f32f5cc4660629f9190
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Thu Jun 20 03:55:30 2024 +0000
Add llava_hf back to registry
commit 3463651b8c54d36cd94169e3d376f5ed225a195a
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Thu Jun 20 03:54:33 2024 +0000
Remove handling non-visual loop in llava
commit cb0d3f49b72790b081f981e0e6147131542f7f68
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Thu Jun 20 02:11:18 2024 +0800
update readme
commit 813877bfe5ac590cdbe92dd74d18f83a2091f748
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:37:52 2024 +0800
to sh script
commit a14684b8557d5894976448a5c559ed7a66a6cf16
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:37:04 2024 +0800
lint
commit d0f8851d42ba31f5da2a7a65e91499db45174dbc
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:36:48 2024 +0800
small fix
commit 63748e9718f287ad433afc90e340b5e17a89c1ed
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:36:43 2024 +0800
small fix
commit 7f1159a1fe04cfb783dc31d4fbdef3bda0ce19e4
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:35:05 2024 +0800
update preparation
commit 19f9bd621c76a483ff98f8c7eb78f64753da683a
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:23:24 2024 +0800
docs
commit ce6f889ba02d819979c7922f6336cf4f1f718f65
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 15:04:16 2024 +0800
tutorial
commit f513c520c2a3dad26d2b2ca5c4ed4db05a493c73
Author: Bo Li <drluodian@gmail.com>
Date: Wed Jun 19 06:51:19 2024 +0000
chore: Update dependencies to fix potential risks and improve compatibility
commit efb529552c5e4ba039a4cba8e9aa5cb7ba65bf90
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Wed Jun 19 10:25:58 2024 +0800
Release llava-wilder
commit 742651fc9daf97e2f57831ed6e6e7ee7ead7d555
Author: Fanyi Pu <FPU001@e.ntu.edu.sg>
Date: Wed Jun 19 07:44:26 2024 +0800
feat: Add support for auto downloading tar format videos
commit 511b6259828212fcba954cdeb8cf90d6e5daabf8
Merge: 22a4958e 050b2c37
Author: Bo Li <drluodian@gmail.com>
Date: Tue Jun 18 17:01:03 2024 +0000
Merge branch 'main' of https://github.com/EvolvingLMMs-Lab/lmms-eval
commit 050b2c370017e9b97475dd6cf01fd051b5ca5c86
Merge: 74facb41 ef306512
Author: Li Bo <drluodian@gmail.com>
Date: Tue Jun 18 13:13:38 2024 +0800
Merge pull request #114 from zjysteven/add-tinyllava
add tinyllava
commit ef306512e5135f76dffa383f600b8733015836e8
Author: Jingyang Zhang <jingyang.zhang@duke.edu>
Date: Mon Jun 17 17:57:02 2024 -0400
fix typo
commit 9bab67732a4238097725deddf867fb1946ffee40
Merge: dbfb2387 74facb41
Author: Jingyang Zhang <jingyang.zhang@duke.edu>
Date: Sun Jun 16 10:56:05 2024 -0400
Merge branch 'EvolvingLMMs-Lab:main' into add-tinyllava
commit 74facb41a826691dfce4458cf1d8659b34fc5bf5
Merge: 8ba192f9 d5df72de
Author: Li Bo <drluodian@gmail.com>
Date: Sun Jun 16 17:59:19 2024 +0800
Merge pull request #118 from teowu/main
Fix the potential risk by PR #117
commit d5df72de2d03108d6b365818ecc3551ac9aa6302
Merge: 5bf59ed2 8ba192f9
Author: Teo (Timothy) Wu Haoning <38696372+teowu@users.noreply.github.com>
Date: Sun Jun 16 15:32:13 2024 +0800
Merge branch 'EvolvingLMMs-Lab:main' into main
commit 5bf59ed250da98a408a94e214a73caa400cba842
Author: teowu <realtimothyhwu@gmail.com>
Date: Sun Jun 16 07:27:28 2024 +0000
fix #117, allow auto download with tar format videos
commit 98b3955cb808e…
dadwadw233 pushed a commit to dadwadw233/lmms-eval that referenced this pull request