Estimating and interpreting FST: The impact of rare variants

  1. Nick Patterson2,6,7,
  2. Sriram Sankararaman2,3 and
  3. Alkes L. Price2,4,5,7

  1. Harvard–Massachusetts Institute of Technology (MIT), Division of Health, Science, and Technology, Cambridge, Massachusetts 02139, USA;
  2. Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA;
  3. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
  4. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA;
  5. Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, USA
  6. These authors contributed equally to this work.

Abstract

In a pair of seminal papers, Sewall Wright and Gustave Malécot introduced _F_ST as a measure of structure in natural populations. In the decades that followed, a number of papers provided differing definitions, estimation methods, and interpretations beyond Wright's. While this diversity in methods has enabled many studies in genetics, it has also introduced confusion regarding how to estimate _F_ST from available data. Considering this confusion, wide variation in published estimates of _F_ST for pairs of HapMap populations is a cause for concern. These estimates changed—in some cases more than twofold—when comparing estimates from genotyping arrays to those from sequence data. Indeed, changes in _F_ST from sequencing data might be expected due to population genetic factors affecting rare variants. While rare variants do influence the result, we show that this is largely through differences in estimation methods. Correcting for this yields estimates of _F_ST that are much more concordant between sequence and genotype data. These differences relate to three specific issues: (1) estimating _F_ST for a single SNP, (2) combining estimates of _F_ST across multiple SNPs, and (3) selecting the set of SNPs used in the computation. Changes in each of these aspects of estimation may result in _F_ST estimates that are highly divergent from one another. Here, we clarify these issues and propose solutions.
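
To make issues (1) and (2) concrete, the sketch below compares two ways of combining per-SNP estimates. The abstract does not say which estimator or combination rule is used, so this is an illustration under stated assumptions: it uses a Hudson-style per-SNP estimator with a sample-size correction, and the allele frequencies, sample sizes, and function names are hypothetical values chosen for the example, not data from the study.

```python
from typing import List, Tuple

# One SNP: (p1, n1, p2, n2) = sample allele frequency and number of sampled
# chromosomes in each of two populations. All values here are hypothetical.
Snp = Tuple[float, int, float, int]


def hudson_terms(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Numerator and denominator of a Hudson-style per-SNP FST estimate.

    Assumes n1 > 1 and n2 > 1 so the sample-size correction is defined.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def ratio_of_averages(snps: List[Snp]) -> float:
    """Sum numerators and denominators across SNPs, then divide once."""
    terms = [hudson_terms(*s) for s in snps]
    return sum(n for n, _ in terms) / sum(d for _, d in terms)


def average_of_ratios(snps: List[Snp]) -> float:
    """Divide per SNP, then average; SNPs with a zero denominator are skipped."""
    ratios = [n / d for n, d in (hudson_terms(*s) for s in snps) if d > 0]
    return sum(ratios) / len(ratios)


# One common SNP and one rare SNP (made-up frequencies).
snps = [(0.50, 100, 0.10, 100), (0.01, 100, 0.00, 100)]
roa = ratio_of_averages(snps)
aor = average_of_ratios(snps)
print(f"ratio of averages: {roa:.3f}")
print(f"average of ratios: {aor:.3f}")
```

With these made-up inputs, the rare SNP contributes a per-SNP estimate near zero. It barely moves the ratio of averages, because its denominator is small, but it carries equal weight in the average of ratios and pulls that value down to roughly half. This is one way that including rare variants, as issue (3) describes, can interact with the choice of combination rule.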

Footnotes

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.