Add Wikipedia Persian Dataset by pourmand1376 · Pull Request #3629 · LAION-AI/Open-Assistant

Currently, the Open-Assistant model doesn't support Farsi (Persian). This PR adds a text-only Persian Wikipedia dataset that the model can learn Farsi from.

One of my friends fine-tuned LLaMA on this dataset, and it could understand Farsi grammar and word usage very well. If the Open-Assistant team wants to add support for Farsi, this should be the first step.

I have converted the dataset to the standard format described here and uploaded it to my Hugging Face account.
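
For readers curious what such a conversion might look like, here is a minimal sketch. The column names (`TEXT`, `SOURCE`, `METADATA`), the helpers `to_record` and `convert`, the `wikipedia-fa` source label, and the sample articles are all assumptions made for illustration; they are not taken from this PR, and the linked standard remains the authority on the actual schema.

```python
import json


def to_record(title, body, source="wikipedia-fa"):
    """Build one text-only record. The schema here is hypothetical."""
    return {
        "TEXT": body.strip(),
        "SOURCE": source,
        # Metadata is stored as a JSON string; ensure_ascii=False keeps
        # Persian characters readable instead of escaping them.
        "METADATA": json.dumps(
            {"title": title, "language": "fa"}, ensure_ascii=False
        ),
    }


def convert(articles):
    """Convert (title, body) pairs, skipping articles with empty bodies."""
    return [to_record(t, b) for t, b in articles if b and b.strip()]


# Illustrative input only: one real-looking article and one empty one.
sample = [
    ("تهران", "  تهران پایتخت ایران است.  "),
    ("خالی", "   "),
]
records = convert(sample)
```

The resulting list of dictionaries could then be written to a file format such as Parquet and uploaded with whatever tooling you prefer.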