update preparation · dadwadw233/lmms-eval@1751954

Original file line number Diff line number Diff line change
@@ -6,13 +6,55 @@
6 6 "source": [
7 7 "# Make Image dataset on Hugging Face Datasets\n",
8 8 "\n",
9 -"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/EvolvingLMMs-Lab/lmms-eval/blob/main/tools/make_image_hf_dataset.ipynb)\n",
9 +"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/EvolvingLMMs-Lab/lmms-eval/blob/pufanyi/hf_dataset_docs/tools/make_image_hf_dataset.ipynb)\n",
10 10 "\n",
11 11 "This notebook guides you through creating a correctly formatted Hugging Face dataset, stored in proper Parquet format and viewable on the Hugging Face dataset hub.\n",
12 12 "\n",
13 13 "We will use the dataset [`pufanyi/VQAv2_Example`](https://huggingface.co/datasets/lmms-lab/VQAv2) as an example and convert it to the proper format."
14 14 ]
15 15 },
16 + {
17 +"cell_type": "markdown",
18 +"metadata": {},
19 +"source": [
20 +"## Preparation\n",
21 +"\n",
22 +"We need to install the `datasets` library to create the dataset and `Pillow` to handle images."
23 + ]
24 + },
25 + {
26 +"cell_type": "code",
27 +"execution_count": null,
28 +"metadata": {
29 +"vscode": {
30 +"languageId": "bat"
31 + }
32 + },
33 +"outputs": [],
34 +"source": [
35 +"!pip install datasets Pillow"
36 + ]
37 + },
38 + {
39 +"cell_type": "markdown",
40 +"metadata": {},
41 +"source": [
42 +"We also need to log in to Hugging Face to upload the dataset. Go to the [Hugging Face website](https://huggingface.co/settings/tokens) to get your API token."
43 + ]
44 + },
45 + {
46 +"cell_type": "code",
47 +"execution_count": null,
48 +"metadata": {
49 +"vscode": {
50 +"languageId": "bat"
51 + }
52 + },
53 +"outputs": [],
54 +"source": [
55 +"!huggingface-cli login --token hf_YOUR_HF_TOKEN # replace hf_YOUR_HF_TOKEN with your own Hugging Face token"
56 + ]
57 + },
16 58 {
17 59 "cell_type": "markdown",
18 60 "metadata": {