What are the sources for Our World in Data's population estimates?
Our team builds and maintains a long-run dataset on population by country, region, and for the world, based on three key sources.
This article was first published in 2022, and last revised in August 2024.
Population size is our most commonly used metric throughout Our World in Data. It is used directly to understand population growth over time or indirectly to calculate per-capita adjustments of the many other metrics we care about: from extreme poverty to electricity access, CO₂ emissions, and vaccination rates.
Many population datasets cover a specific period. For example, the UN publishes data from 1950 onwards. However, few maintain very long-term datasets that are continually updated to the present day.
Our team, therefore, builds and maintains a long-run dataset on population by country, region, and for the world. It is based on three key sources, complemented by a fourth for former countries:
- 10,000 BCE to 1799: HYDE version 3.3
- 1800 to 1949: Gapminder’s Population version 7
- 1950 onwards: UN World Population Prospects (2024)
- For former countries: Gapminder’s Systema Globalis
The scripts that produce this long-run dataset can be accessed in our GitHub repository.
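As a rough illustration of how the period boundaries above translate into a lookup, here is a minimal Python sketch. It is not the code from our repository: the function name `source_for_year` and the convention of writing BCE years as negative integers are assumptions made for this example only.

```python
def source_for_year(year: int) -> str:
    """Return the default source for a year (negative values mean BCE).

    Illustrative only: the boundaries follow the list above, but the
    function name and the BCE-as-negative convention are assumptions.
    """
    if year < -10000:
        raise ValueError("the long-run dataset starts in 10,000 BCE")
    if year <= 1799:
        return "HYDE"
    if year <= 1949:
        return "Gapminder Population"
    return "UN World Population Prospects"


# Check the years on either side of each boundary.
for y in (-5000, 1799, 1800, 1949, 1950):
    print(y, source_for_year(y))
```

The sketch covers only the default, period-based rule. It does not cover former countries or any country-specific exceptions.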
In all sources we rely on, historical population estimates are based on today’s geographical borders.
We provide a full citation for each source below. If you cite population data for a specific period, please cite the source. For example, for the period 1950 onwards, please cite the UN World Population Prospects. You can add “via Our World in Data” if you downloaded the data from us.
You can find the complete list of the sources used for each country and year here.
HYDE version 3.3
The HYDE database (History Database of the Global Environment) is maintained by researchers at the Netherlands Environmental Assessment Agency.
HYDE is an internally consistent combination of updated historical population (gridded) estimates and land use for the past 12,000 years. Categories include cropland, with a new distinction into irrigated and rain fed crops (other than rice) and irrigated and rain fed rice. Also grazing lands are provided, divided into more intensively used pasture, converted rangeland and non-converted natural (less intensively used) rangeland. Population is represented by maps of total, urban, rural population and population density as well as built-up area.
Full citation: Klein Goldewijk, K., A. Beusen, J. Doelman and E. Stehfest (2017), Anthropogenic land use estimates for the Holocene; HYDE 3.2, Earth System Science Data, 9, 927-953.
The HYDE estimates go up to 2023, but they are only available once per decade for the period 1800–1950. Therefore, from 1800 onwards, when data is available from both HYDE and Gapminder, we favor the Gapminder dataset, as it provides annual estimates.
Gapminder
Gapminder version 7
Gapminder maintains a population dataset based on Angus Maddison and Clio Infra data. Their documentation provides the following details on their sources:
We use Maddison population data improved by CLIO INFRA in April 2015 and Gapminder v3 documented in greater detail by Mattias Lindgren. The main source of v3 was Angus Maddison’s data which is maintained and improved by CLIO Infra Project. The updated Maddison data by CLIO INFRA were based on the following improvements:
- Whenever estimates by Maddison were available, his figures are being followed in favor of estimates by Gapminder;
- For Africa, estimates by Frankema and Jerven (2014) for the period 1850-1960 have been added to the existing database;
- For Latin America, estimates by Abad & Van Zanden (2014) for the period 1500-1940 have been added.
Full citation: Gapminder doesn’t provide a preferred citation. We cite their work as: Gapminder population dataset version 7, based on data by Angus Maddison improved by Clio Infra. https://www.gapminder.org/data/documentation/gd003/
We use this dataset as our source from 1800 to 1949. In addition, we use their population estimates for the Vatican until 2100 since these are missing in the UN’s dataset.
Systema Globalis
Systema Globalis is Gapminder’s primary dataset, used in tools on their official website.
Full citation: Gapminder doesn’t provide a preferred citation. We cite their work as: Gapminder Systema Globalis. https://github.com/open-numbers/ddf--gapminder--systema_globalis
Data from this source covers the period 1555-2008. We use it to complement our population dataset with data from former countries (e.g., the Soviet Union, Yugoslavia, etc.) and other data not present in other sources.
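The two exceptions described in the Gapminder sections (the Vatican and former countries) can be thought of as overrides applied on top of the default, period-based source. The Python sketch below is a hedged illustration, not our actual pipeline. The function `apply_exceptions`, the entity names, and the set `FORMER_COUNTRIES` are hypothetical, and the 1800 start year for the Vatican rule is an assumption based on the period for which we use Gapminder's population dataset.

```python
# Hypothetical entity names, used only for this illustration.
FORMER_COUNTRIES = {"USSR", "Yugoslavia"}


def apply_exceptions(entity: str, year: int, default_source: str) -> str:
    """Return the source after applying the exceptions described above.

    `default_source` is whatever the period-based rule would pick.
    """
    # Former countries: Systema Globalis covers 1555-2008.
    if entity in FORMER_COUNTRIES and 1555 <= year <= 2008:
        return "Gapminder Systema Globalis"
    # Vatican: missing from the UN data, so Gapminder is used until 2100.
    # The 1800 lower bound is an assumption for this sketch.
    if entity == "Vatican" and 1800 <= year <= 2100:
        return "Gapminder Population"
    return default_source


print(apply_exceptions("Vatican", 2050, "UN World Population Prospects"))
print(apply_exceptions("USSR", 1970, "UN World Population Prospects"))
print(apply_exceptions("France", 1970, "UN World Population Prospects"))
```

Passing the default source in as an argument keeps the exceptions separate from the period rule, so each can be checked on its own.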
UN World Population Prospects
We rely on the latest United Nations World Population Prospects (UNWPP) revision as our primary source for recent historical data and future projections. We use this data for its reliability, its consistent methods, and because it includes population estimates for almost all territories in the world. The UN updates its dataset every two years with the following:
- Annual historical estimates running from 1950 to the year before the most recent dataset publication;
- Annual projections running from the year of the most recent dataset publication to 2100. The UN publishes multiple projections based on different scenarios of global fertility rates: a low, medium, and high scenario. In our dataset, we use the medium-variant scenario.
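To make the split between estimates and projections concrete, the following Python sketch classifies a year for a given revision. It is an illustration under stated assumptions: the function name `classify_un_year` is hypothetical, and the default `revision_year=2024` simply reflects the revision cited in this article.

```python
def classify_un_year(year: int, revision_year: int = 2024) -> str:
    """Classify a year as a historical estimate or a projection.

    Estimates run from 1950 to the year before publication; projections
    run from the publication year to 2100 (medium variant in our dataset).
    """
    if year < 1950 or year > 2100:
        raise ValueError("UN WPP data covers 1950 to 2100")
    if year < revision_year:
        return "estimate"
    return "projection (medium variant)"


print(classify_un_year(2023))  # last year of historical estimates
print(classify_un_year(2024))  # first projected year
```

With the 2024 revision, 2023 is therefore the last year of historical estimates, and every value from 2024 to 2100 is a projection.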
The United Nations estimates may not always reflect the latest censuses or national figures. However, there are several reasons why we use this data over country-by-country national population estimates:
- The UNWPP dataset is the standard in research. The main reason is that it uses a reliable and standardized methodology for all countries. For example, data from individual countries may differ on how they count overseas workers, expatriates, undocumented immigrants, etc. The UNWPP dataset tries to maintain a consistent methodology across all countries.
- Using data from the UN allows us to get accurate population estimates for almost all territories worldwide. Finding and maintaining estimates based on national censuses would be time-consuming and more prone to errors.
- Other reasons include the availability of yearly data (national censuses are only conducted every few years) and avoiding double-counting in cases of border disputes.
Full citation: United Nations, Department of Economic and Social Affairs, Population Division (2024). World Population Prospects 2024, Online Edition.
Cite this work
Our articles and data visualizations rely on work from many different people and organizations. When citing this article, please also cite the underlying data sources. This article can be cited as:
Edouard Mathieu and Lucas Rodés-Guirao (2022) - “What are the sources for Our World in Data's population estimates?” Published online at OurWorldinData.org. Retrieved from: 'https://ourworldindata.org/population-sources' [Online Resource]
BibTeX citation
@article{owid-population-sources,
    author = {Edouard Mathieu and Lucas Rodés-Guirao},
    title = {What are the sources for Our World in Data's population estimates?},
    journal = {Our World in Data},
    year = {2022},
    note = {https://ourworldindata.org/population-sources}
}
Reuse this work freely
All visualizations, data, and code produced by Our World in Data are completely open access under the Creative Commons BY license. You have the permission to use, distribute, and reproduce these in any medium, provided the source and authors are credited.
The data produced by third parties and made available by Our World in Data is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our documentation, so you should always check the license of any such third-party data before use and redistribution.
All of our charts can be embedded in any site.