Wikidata:Tools/OpenRefine - Wikidata

WikidataCon Award 2019

Coolest Tool Award 2022 logo

Get started with a video tutorial

OpenRefine Beginners Tutorial by Emma Carroll

OpenRefine is a free data wrangling tool that can be used to clean tabular data and connect it with knowledge bases, including Wikidata. It was previously developed by Google (under the name Google Refine) and has now transitioned to a community-supported project.

This page gathers OpenRefine recipes that can be useful for importing datasets into Wikidata or for augmenting datasets with additional data extracted from Wikidata. Feel free to use the talk page to ask for help with the software. If you enjoy using this tool, you can spread the word with the {{User loves OpenRefine}} userbox.

OpenRefine currently only supports reconciling items. Lexemes are not supported as of September 2022.

OpenRefine can be downloaded as an application. It works on desktop and laptop computers running Windows, Mac or Linux. It runs a small server on your computer, which you then interact with through a web browser. It works best with browsers based on WebKit, such as Google Chrome, Chromium, Opera and Microsoft Edge, and is also supported on Firefox.

OpenRefine has a graphical user interface which is available in more than 15 languages.

Install OpenRefine on your own desktop or laptop computer

You can find and download the latest stable release of OpenRefine here.

Run OpenRefine on PAWS

Since May 2021, everyone with a registered Wikimedia account can run OpenRefine in PAWS on Wikimedia's Cloud Services. Please note that this is an experimental feature which is not supported by the OpenRefine team itself, and which may break or malfunction. It is however an interesting option for people who can't install software on their local computer.

PAWS is a Wikimedia Cloud tool that provides hosted access to Jupyter notebooks and other tools without needing any local installation.

You can access your own installation of OpenRefine with this link: https://hub-paws.wmcloud.org/hub/user-redirect/openrefine. You'll have to log in with your wiki credentials, but don't tick the "Remember me" box: all files written on PAWS are publicly available, so you don't want to leave your credentials accessible. You may also get an error message; if so, refresh the page and it should work.

Please contact YuviPanda with questions about OpenRefine via PAWS.

Wikidata reconciliation

In OpenRefine terminology, reconciliation is the process of linking free-text tabular cells to identifiers in knowledge bases. OpenRefine's built-in reconciliation capabilities make it a versatile tool to reconcile tabular data to a wide range of databases, including Wikidata.

Semi-automatic reconciliation of universities in OpenRefine

OpenRefine's wiki contains a detailed guide to the reconciliation process. Here are the main features:

If you want to use the reconciliation features, consider engaging with the following instruction materials:

The reconciliation API endpoint is language-specific. For instance, to search French labels in Wikidata, use https://wikidata.reconci.link/fr/api, where fr is the language code.
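A reconciliation endpoint like the one above can also be queried outside OpenRefine. The following is a minimal Python sketch, assuming the service accepts the standard reconciliation protocol's `queries` form parameter (a JSON object of queries keyed `q0`, `q1`, and so on). The helper names `build_queries` and `reconcile` are hypothetical and not part of OpenRefine, and the service may require extra headers (such as a User-Agent) that this sketch does not set.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the text above; "fr" selects French labels.
ENDPOINT = "https://wikidata.reconci.link/fr/api"


def build_queries(names, type_qid=None, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_qid is not None:
            # Restrict candidates to items of this class.
            query["type"] = type_qid
        queries[f"q{i}"] = query
    return queries


def reconcile(names, type_qid=None, endpoint=ENDPOINT):
    """POST the batch to the service and return the decoded JSON response.

    Assumption: the endpoint accepts a form-encoded "queries" parameter.
    """
    body = urlencode({"queries": json.dumps(build_queries(names, type_qid))})
    request = Request(endpoint, data=body.encode("utf-8"))
    with urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Q3918 is the Wikidata item for "university".
    # Only the payload is printed here; no network request is made.
    print(json.dumps(build_queries(["Université de Nantes"], "Q3918"), indent=2))
```

Calling `reconcile([...])` would send the batch over the network; in the response, each query key is expected to map to a list of candidate items, which is what OpenRefine shows as match suggestions.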

This screencast demonstrates how to add new columns based on a reconciled column in OpenRefine 2.8.

This feature is available from OpenRefine 2.8 onwards.

Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. If there are multiple claims for a given property, the values will be grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode might therefore be more suitable for the later transformations you want to carry out on your table. Access to item labels, item descriptions and item sitelinks is provided by properties Lxx, Dxx and Syyyy, where xx is a language code (en, fr, yue, etc.) and yyyy is a site ID (enwiki, ptwikisource, etc.).
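The two conventions described above can be sketched in Python. This is an illustration only, not OpenRefine's own code: the function names are hypothetical, and the sample rows use placeholder values rather than real Wikidata data. The first helpers build the Lxx, Dxx and Syyyy identifiers; the last one groups rows into records by treating a blank reconciled cell as a continuation of the previous record.

```python
def label_property(lang):
    """Identifier for an item's label in the given language, e.g. 'Len'."""
    return f"L{lang}"


def description_property(lang):
    """Identifier for an item's description in the given language."""
    return f"D{lang}"


def sitelink_property(site_id):
    """Identifier for an item's sitelink on the given site, e.g. 'Senwiki'."""
    return f"S{site_id}"


def group_into_records(rows, key_column):
    """Group rows into records.

    A row whose key cell is blank continues the record started by the
    closest preceding row with a non-blank key cell.
    """
    records = []
    for row in rows:
        if row.get(key_column) or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records


if __name__ == "__main__":
    # Placeholder data: "item A" has two values for the fetched property.
    rows = [
        {"item": "item A", "fetched": "value 1"},
        {"item": "", "fetched": "value 2"},
        {"item": "item B", "fetched": "value 3"},
    ]
    for record in group_into_records(rows, "item"):
        print(record[0]["item"], [r["fetched"] for r in record])
```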

You can use this function recursively on the newly created columns if they correspond to Wikidata items. This lets you explore the Wikidata graph along selected properties. You can also configure how properties are retrieved (for instance, filtering by rank or references).

This feature is available from OpenRefine 3.0 onwards.

OpenRefine can help you transform tabular data into Wikidata statements. This works by creating a schema - a template of Wikidata edits that is applied to each row of your table. Once you have created a schema, you can:

See the editing subpage for more details. Many tutorials are available to get you started.
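The idea of a schema as a per-row template can be illustrated with a small Python sketch. The schema layout, the `Statement` type and `apply_schema` are hypothetical and do not reflect OpenRefine's internal schema format; the column names and row values are made up, and the subject is the Wikidata Sandbox item.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str  # item the statement is about
    prop: str     # Wikidata property ID
    value: str    # raw cell value used as the statement value


# Hypothetical schema: one column holds the subject item, and each
# statement template pairs a property with the column that feeds it.
SCHEMA = {
    "subject_column": "item",
    "statements": [
        {"property": "P571", "value_column": "inception"},  # inception
        {"property": "P856", "value_column": "website"},    # official website
    ],
}


def apply_schema(schema, rows):
    """Apply every statement template to every row, skipping blank cells."""
    result = []
    for row in rows:
        subject = row.get(schema["subject_column"])
        if not subject:
            continue  # no reconciled item in this row
        for template in schema["statements"]:
            value = row.get(template["value_column"])
            if value:
                result.append(Statement(subject, template["property"], value))
    return result


if __name__ == "__main__":
    rows = [
        # Q4115189 is the Wikidata Sandbox item; the values are placeholders.
        {"item": "Q4115189", "inception": "2001", "website": "https://example.org"},
        {"item": "Q4115189", "inception": "", "website": "https://example.com"},
    ]
    for statement in apply_schema(SCHEMA, rows):
        print(statement)
```

Because the template is separate from the data, the same schema can be applied unchanged to any table that has the same columns.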

OpenRefine workflows can be shared by copying the JSON representation of the edit history. This represents the operations you have made in OpenRefine, and can be reused by others on similar datasets. This section lists some recipes that can be useful when working with Wikidata. See also OpenRefine Recipes.
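Before reusing someone else's workflow, it can help to inspect what the copied JSON contains. The sketch below assumes the history is a JSON array of operation objects, each with an `op` identifier and a human-readable `description`; the sample is an abridged, illustrative history, and `summarize_history` and `count_operations` are hypothetical helpers rather than OpenRefine functions.

```python
import json
from collections import Counter

# Abridged, illustrative operation history (real exports carry more fields).
HISTORY_JSON = """
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column name using expression value.trim()",
    "columnName": "name",
    "expression": "value.trim()"
  },
  {
    "op": "core/column-rename",
    "description": "Rename column name to label",
    "oldColumnName": "name",
    "newColumnName": "label"
  }
]
"""


def summarize_history(history_json):
    """Return (operation id, description) pairs in the order they were applied."""
    operations = json.loads(history_json)
    return [(op.get("op", "unknown"), op.get("description", "")) for op in operations]


def count_operations(history_json):
    """Count how many times each kind of operation appears."""
    return Counter(op_id for op_id, _ in summarize_history(history_json))


if __name__ == "__main__":
    for op_id, description in summarize_history(HISTORY_JSON):
        print(f"{op_id}: {description}")
```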

OpenRefine needs your help! There are many things you can do:

We have a Phabricator project to track activity around OpenRefine within Wikimedia; feel free to tag any related task with it.

Over 2021-22, OpenRefine is being extended with Structured Data on Wikimedia Commons (SDC) support. This project is funded by a Wikimedia Foundation Project Grant.