@@ -6,13 +6,55 @@
    "source": [
     "# Make Image dataset on Hugging Face Datasets\n",
     "\n",
-    "[](https://colab.research.google.com/github/EvolvingLMMs-Lab/lmms-eval/blob/main/tools/make_image_hf_dataset.ipynb)\n",
+    "[](https://colab.research.google.com/github/EvolvingLMMs-Lab/lmms-eval/blob/pufanyi/hf_dataset_docs/tools/make_image_hf_dataset.ipynb)\n",
     "\n",
     "This notebook walks you through building a Hugging Face dataset in the correct format: stored as Parquet and viewable in the dataset viewer on the Hugging Face Hub.\n",
     "\n",
     "We will take the dataset [`pufanyi/VQAv2_Example`](https://huggingface.co/datasets/lmms-lab/VQAv2) as an example and convert it to the proper format."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Preparation\n",
+    "\n",
+    "We need to install the `datasets` library to create the dataset and `Pillow` to handle images."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "bat"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "!pip install datasets Pillow"
+   ]
+  },
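As a quick sanity check after the install cell, the sketch below reports which versions of the two packages are visible to the running kernel. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of the notebook.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages the notebook installs with pip.
for name in ("datasets", "Pillow"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If either package is reported as not installed, the kernel is probably running in a different environment from the one `pip` installed into.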
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We also need to log in to Hugging Face to upload the dataset. Go to the [Hugging Face website](https://huggingface.co/settings/tokens) to get your API token."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "vscode": {
+     "languageId": "bat"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "!huggingface-cli login --token hf_YOUR_HF_TOKEN # replace hf_YOUR_HF_TOKEN with your own Hugging Face token."
+   ]
+  },
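Pasting the token into the command leaves it in the saved notebook. One possible alternative, sketched below, reads the token from an environment variable and passes it to `huggingface_hub.login`. This assumes the token has been exported as `HF_TOKEN` before the kernel starts; `read_hf_token` and `login_from_env` are hypothetical helper names, not part of the notebook.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored under HF_TOKEN, or raise if it is missing.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export your Hugging Face token first.")
    return token


def login_from_env() -> None:
    """Log in to the Hugging Face Hub using the token from the environment."""
    # Imported lazily so the helpers above work even before the install cell has run.
    # huggingface_hub is installed as a dependency of `datasets`.
    from huggingface_hub import login

    login(token=read_hf_token())
```

Calling `login_from_env()` in a code cell would then take the place of the CLI command, with the token never appearing in the notebook itself.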
   {
    "cell_type": "markdown",
    "metadata": {