An E-M algorithm and testing strategy for multiple-locus haplotypes
1995 Mar;56(3):799–810.
Abstract
This paper gives an expectation-maximization (EM) algorithm to obtain allele frequencies, haplotype frequencies, and gametic disequilibrium coefficients for multiple-locus systems. It permits high polymorphism and null alleles at all loci. This approach effectively deals with the two primary estimation problems associated with such systems: there is not a one-to-one correspondence between phenotypic and genotypic categories, and sample sizes tend to be much smaller than the number of phenotypic categories. The EM method provides maximum-likelihood estimates and therefore allows hypothesis tests using likelihood-ratio statistics, which have χ² distributions when sample sizes are large. We also suggest a data-resampling approach to estimate the sampling distributions of the test statistics. The resampling approach is more computer intensive, but it is applicable to all sample sizes. A strategy to test hypotheses about aggregate groups of gametic disequilibrium coefficients is recommended. This strategy minimizes the number of necessary hypothesis tests while at the same time describing the structure of disequilibrium. These methods are applied to three unlinked dinucleotide repeat loci in Navajo Indians and to three linked HLA loci in Gila River (Pima) Indians. The likelihood functions of both data sets are shown to be maximized by the EM estimates, and the testing strategy provides a useful description of the structure of gametic disequilibrium. Following these applications, a number of simulation experiments are performed to test how well the likelihood-ratio statistic distributions are approximated by χ² distributions. In most circumstances the χ² approximation grossly underestimated the probability of type I errors. However, at times it also overestimated the type I error probability. Accordingly, we recommend hypothesis tests that use the resampling method.
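The gene-counting idea behind the EM estimates can be illustrated with a minimal sketch for the simplest possible case. This is not the paper's algorithm: it assumes only two loci, two codominant alleles per locus (labelled A/a and B/b here), no null alleles, and random mating, whereas the method described above handles many alleles and null alleles at several loci. In this reduced setting the only ambiguous phenotype is the double heterozygote, which may be AB/ab or Ab/aB. The function names `em_two_locus` and `disequilibrium` and the 3×3 count layout are illustrative assumptions.

```python
def em_two_locus(counts, tol=1e-10, max_iter=1000):
    """Estimate haplotype frequencies (AB, Ab, aB, ab) by EM gene counting.

    counts[i][j] is the number of individuals carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (i, j in 0..2).
    Assumes two biallelic codominant loci and random mating (illustrative only).
    """
    n_ind = sum(sum(row) for row in counts)
    if n_ind == 0:
        raise ValueError("counts must contain at least one individual")

    # Haplotypes that can be counted directly: every genotype except the
    # double heterozygote has at least one homozygous locus, so its two
    # haplotypes are known. Index = 2 * (locus-1 allele) + (locus-2 allele),
    # with 0 = A or B and 1 = a or b, giving the order AB, Ab, aB, ab.
    fixed = [0.0] * 4
    for i in range(3):
        for j in range(3):
            if (i, j) == (1, 1):
                continue
            locus1 = [0] * i + [1] * (2 - i)
            locus2 = [0] * j + [1] * (2 - j)
            for a1, a2 in zip(locus1, locus2):
                fixed[2 * a1 + a2] += counts[i][j]

    n_dh = counts[1][1]   # double heterozygotes (phase unknown)
    freqs = [0.25] * 4    # starting values: all haplotypes equally frequent

    for _ in range(max_iter):
        # E step: expected fraction of double heterozygotes that are AB/ab.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M step: re-estimate frequencies from expected haplotype counts.
        expected = [
            fixed[0] + n_dh * w,
            fixed[1] + n_dh * (1 - w),
            fixed[2] + n_dh * (1 - w),
            fixed[3] + n_dh * w,
        ]
        new_freqs = [c / (2 * n_ind) for c in expected]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


def disequilibrium(freqs):
    """Two-locus gametic disequilibrium coefficient D = p_AB - p_A * p_B."""
    p_a = freqs[0] + freqs[1]
    p_b = freqs[0] + freqs[2]
    return freqs[0] - p_a * p_b


if __name__ == "__main__":
    # Hypothetical sample: 5 ab/ab, 10 double heterozygotes, 5 AB/AB.
    sample = [[5, 0, 0], [0, 10, 0], [0, 0, 5]]
    estimates = em_two_locus(sample)
    print(estimates, disequilibrium(estimates))
```

Like any EM procedure, this sketch can stop at a stationary point that is not the global maximum, depending on the starting values, so in practice it is worth restarting from several initial frequency vectors and comparing the resulting likelihoods.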