slight update · MichalCiesiolka/lmms-eval-llmzszl@319afcc

@@ -4,7 +4,12 @@

 ScreenSpot is an evaluation benchmark for GUI grounding, comprising over 1200 instructions from iOS, Android, macOS, Windows and Web environments, along with annotated element types (Text or Icon/Widget).

-This evaluation allows for both:
+
+## Groups
+
+- `screenspot`: This group bundles both the original grounding task and the new instruction generation task.
+
+## Tasks
 - `screenspot_rec_test`: the original evaluation of `{img} {instruction} --> {bounding box}` called grounding or Referring Expression Completion (REC);
 - `screenspot_reg_test`: the new evaluation of `{img} {bounding box} --> {instruction}` called instruction generation or Referring Expression Generation (REG).

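The two tasks use the same annotated sample but swap what is given and what is predicted: REC maps image plus instruction to a bounding box, and REG maps image plus bounding box to an instruction. The Python sketch below only illustrates that swap. The `Sample` record, its field names, the `(x1, y1, x2, y2)` box format, and the `rec_pair`/`reg_pair` helpers are hypothetical and are not taken from lmms-eval or the ScreenSpot dataset. The group and task names come from the README.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical box format (x1, y1, x2, y2); the real dataset may use another layout.
BBox = Tuple[float, float, float, float]

# Group-to-task mapping as described above: `screenspot` bundles both tasks.
GROUPS: Dict[str, List[str]] = {
    "screenspot": ["screenspot_rec_test", "screenspot_reg_test"],
}


@dataclass
class Sample:
    """Hypothetical ScreenSpot-style record: one screenshot, one instruction, one target box."""
    image_path: str
    instruction: str
    bbox: BBox


def rec_pair(s: Sample) -> Tuple[Dict[str, Any], BBox]:
    """Grounding (REC): {img} {instruction} --> {bounding box}."""
    return {"image": s.image_path, "text": s.instruction}, s.bbox


def reg_pair(s: Sample) -> Tuple[Dict[str, Any], str]:
    """Instruction generation (REG): {img} {bounding box} --> {instruction}."""
    return {"image": s.image_path, "bbox": s.bbox}, s.instruction
```

For a sample such as `Sample("screen.png", "open settings", (0.1, 0.2, 0.3, 0.25))`, `rec_pair` returns the box as the target and `reg_pair` returns the instruction text as the target.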