A Dataset For Web-Based Structural Reading Comprehension
What is WebSRC?
WebSRC is a novel Web-based Structural Reading Comprehension dataset. It consists of 0.44M question-answer pairs, which are collected from 6.5K web pages with corresponding HTML source code, screenshots and metadata. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no.
News
Apr 2, 2025: The train, dev, and test sets of our dataset can now be accessed through Huggingface.
Aug 27, 2022: To access the test set, see here.
Sept 1, 2021: The full dataset and baseline are available in: Extraction code: 2mys. Download from Amazon.com.
Aug 25, 2021: Our paper has been accepted by EMNLP 2021; the updated version of the dataset will be available soon. The baseline is available on WebSRC-Baseline.
Contact Us
If you have any questions about this dataset, please contact chenlusz@sjtu.edu.cn or galaxychen@sjtu.edu.cn
Leaderboard
Three metrics, Exact Match (EM), F1 score, and Path Overlap Score (POS), are used to evaluate models on the test set of WebSRC. Please refer to the paper for more details about the evaluation metrics.
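As a rough illustration of what the three metrics measure, here is a minimal Python sketch. It is not the official evaluation script. It assumes SQuAD-style answer normalization for EM and F1, and it assumes POS is the overlap (intersection over union) between the sets of DOM elements on the paths from the page root to the predicted and ground-truth answer elements. The function names and the element identifiers (such as `div#1`) are hypothetical; the paper and the baseline code remain the authoritative definitions.

```python
import re
import string
from collections import Counter
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the official script may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def path_overlap_score(pred_path: Iterable[str], ref_path: Iterable[str]) -> float:
    """Intersection over union of two sets of element ids on root-to-answer paths.

    Assumption: each path lists unique element ids from the root element
    down to the deepest element containing the answer.
    """
    pred, ref = set(pred_path), set(ref_path)
    if not pred and not ref:
        return 1.0
    return len(pred & ref) / len(pred | ref)


if __name__ == "__main__":
    print(exact_match("The Apple", "apple"))
    print(f1_score("red apple", "apple"))
    print(path_overlap_score(
        ["html", "body", "div#1", "span#2"],
        ["html", "body", "div#1", "p#3"],
    ))
```

Under these assumptions, each function returns a per-question score in the range 0 to 1. Leaderboard-style numbers would then be obtained by averaging over all test questions and multiplying by 100.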
Rank | Date | Model | Affiliation (reference) | EM | F1 | POS |
---|---|---|---|---|---|---|
1 | Sep 4, 2023 | SageGPT-small-v0.2 | 4paradigm.Inc | 89.11 | 92.15 | N/A |
2 | Jan 12, 2024 | ScreenAI 5B | Google Research (Baechler et al.) | 84.02 | 87.24 | N/A |
3 | Oct 12, 2022 | DocPrompt (ErnieLayout-Large) | BAIDU-Document Intelligence (Wu et al.) code demo | 77.35 | 85.04 | N/A |
4 | Mar 01, 2022 | TIE (MarkupLM-Large) | Shanghai Jiao Tong University (Zhao et al., NAACL'22) code | 76.25 | 80.51 | 89.50 |
5 | Mar 01, 2022 | TIE (MarkupLM-Large)-3_seeds_ave | Shanghai Jiao Tong University (Zhao et al., NAACL'22) code | 75.87 | 80.19 | 89.73 |
6 | Nov 30, 2021 | MarkupLM-Large | MSRA+SJTU (Li et al., ACL'22) code | 69.87 | 77.94 | 88.09 |
7 | Nov 30, 2021 | MarkupLM-Large (3_seeds_ave) | MSRA+SJTU (Li et al., ACL'22) code | 69.09 | 76.45 | 87.24 |
8 | Sept 01, 2021 | V-PLM (ELECTRA) | Shanghai Jiao Tong University (Chen et al., EMNLP'21) code | 68.07 | 75.25 | 84.96 |
9 | Sept 01, 2021 | V-PLM (BERT) | Shanghai Jiao Tong University (Chen et al., EMNLP'21) code | 54.84 | 62.80 | 76.39 |