Hans D Daetwyler - Academia.edu
Papers by Hans D Daetwyler
BMC genetics, Jan 24, 2014
Background: Lentil is a self-pollinated annual diploid (2n = 2x = 14) crop with a restricted history of genetic improvement through breeding, particularly when compared to cereal crops. This limited breeding has probably contributed to the narrow genetic base of local cultivars, and a corresponding potential to continue yield increases and stability. Therefore, knowledge of genetic variation and relationships between populations is important for understanding of available genetic variability and its potential for use in breeding programs. Single nucleotide polymorphism (SNP) markers provide a method for rapid automated genotyping and subsequent data analysis over large numbers of samples, allowing assessment of genetic relationships between genotypes. Results: In order to investigate levels of genetic diversity within lentil germplasm, 505 cultivars and landraces were genotyped with 384 genome-wide distributed SNP markers, of which 266 (69.2%) obtained successful amplification and detect...
G3 (Bethesda, Md.), Jan 23, 2015
The non-additive genetic effects may have an important contribution to total genetic variation of phenotypes, so estimates of both the additive and non-additive effects are desirable for breeding and selection purposes. Our main objectives were to: estimate additive, dominance and epistatic variances of apple (Malus ×…
Genetics, selection, evolution : GSE, Jan 22, 2015
The objectives of this study were to investigate the accuracy of genotype imputation from low (12k) to medium (50k Illumina-Ovine) SNP (single nucleotide polymorphism) densities in purebred and crossbred Merino sheep based on a random or selected reference set and to evaluate the impact of using imputed genotypes on accuracy of genomic prediction. Imputation validation sets were composed of random purebred or crossbred Merinos, while imputation reference sets were of variable sizes and included random purebred or crossbred Merinos or a group of animals that were selected based on high genetic relatedness to animals in the validation set. The Beagle software program was used for imputation and accuracy of imputation was assessed based on the Pearson correlation coefficient between observed and imputed genotypes. Genomic evaluation was performed based on genomic best linear unbiased prediction and its accuracy was evaluated as the Pearson correlation coefficient between genomic estima...
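The imputation accuracy measure described in this abstract, the Pearson correlation between observed and imputed genotypes, can be sketched as follows. This is a minimal illustration only, assuming genotypes are coded as allele counts (0, 1, 2); the function name and the toy data are hypothetical and are not taken from the study.

```python
import math


def pearson_correlation(observed, imputed):
    """Pearson correlation between two equal-length genotype vectors."""
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two vectors of equal length (at least 2)")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_o * var_i)


# Hypothetical toy data: allele counts (0/1/2) at six SNPs for one animal.
observed = [0, 1, 2, 1, 0, 2]
imputed = [0, 1, 2, 2, 0, 2]  # one genotype imputed incorrectly
accuracy = pearson_correlation(observed, imputed)
```

A correlation of 1 would mean the imputed genotypes reproduce the observed ones exactly; a single wrong call in the toy data pulls the value below 1.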
PLOS ONE, 2015
The proportion of genetic variation in complex traits explained by rare variants is a key question for genomic prediction, and for identifying the basis of "missing heritability", the proportion of additive genetic variation not captured by common variants on SNP arrays. Sequence variants in transcript and regulatory regions from 429 sequenced animals were used to impute high density SNP genotypes of 3311 Holstein sires to sequence. There were 675,062 common variants (MAF > 0.05), 102,549 uncommon variants (0.01 < MAF < 0.05), and 83,856 rare variants (MAF < 0.01). We describe a novel method for estimating the proportion of the rare variants that are sequencing errors using parent-progeny duos. We then used mixed model methodology to estimate the proportion of variance captured by these different classes of variants for fat, milk and protein yields, as well as for fertility. Common sequence variants captured 83%, 77%, 76% and 84% of the total genetic variance for fat, milk, and protein yields and fertility, respectively. This was between 2 and 5% more variance than that captured from 600k SNPs on a high density chip, although the difference was not significant. Rare variants captured 3%, 0%, 1% and 14% of the genetic variance for fat, milk and protein yields, and fertility respectively, whereas pedigree explained the remaining amount of genetic variance (none for fertility).
The proportion of variation explained by rare variants is likely to be under-estimated due to reduced accuracies of imputation for this class of variants. Using common sequence variants slightly improved accuracy of genomic predictions for fat and milk yield, compared to high density SNP array genotypes. However, including rare variants from transcript regions did not increase the accuracy of genomic predictions. These results suggest that rare variants recover a small percentage of the missing heritability for complex traits; however, very large reference sets will be required to exploit this to improve the accuracy of genomic predictions. Our results do suggest the contribution of rare variants to genetic variation may be greater for fitness traits.
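The three variant classes used in this abstract (common, uncommon, rare) are defined by minor allele frequency (MAF) thresholds of 0.05 and 0.01. A minimal sketch of that binning follows, assuming biallelic variants in diploid animals; the function names are hypothetical, and because the abstract uses strict inequalities, assigning values of exactly 0.01 or 0.05 to the "uncommon" class is an assumption made here.

```python
def minor_allele_frequency(alt_allele_count, n_individuals):
    """MAF of a biallelic variant in diploids, from the alternate allele count."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    freq = alt_allele_count / (2 * n_individuals)
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele count is inconsistent with the sample size")
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def classify_variant(maf):
    """Bin a variant by MAF using the thresholds quoted in the abstract."""
    if maf > 0.05:
        return "common"
    if maf >= 0.01:  # assumption: boundary values are treated as "uncommon"
        return "uncommon"
    return "rare"


# Hypothetical example: 12 copies of the alternate allele among 429 animals.
example_class = classify_variant(minor_allele_frequency(12, 429))
```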
Genome wide association studies (GWAS) have identified numerous quantitative trait loci (QTL) across the bovine genome that are associated with milk production traits in dairy cattle breeds. However, many of the causal mutations underlying these QTL have not been identified. The emergence of Next Generation Sequencing (NGS) technology should aid the identification of causal mutations and candidate genes underlying quantitative traits as whole genome re-sequencing can be utilised for variant discovery. Re-sequencing should allow identification of all genetic variants, including the causal variant(s), within the relevant QTL regions and study population(s). This study aimed to identify causal mutations underlying milk production QTL across Bos taurus autosome 29 (BTA29) in Australian Holstein and Jersey cattle. Genetic variants on BTA29 were identified using NGS data from 429 animals, representing 14 breeds of cattle, that were re-sequenced for the 1000 Bull Genomes Project. After fil...
We inferred a step-wise pattern of changing ancestral demography using whole-genome sequence data from four domestic sheep (Ovis aries) and four wild sheep (O. canadensis and O. dalli). The inferred demography indicates clear differences between the wild sheep and domestic sheep. Furthermore we identified marked changes in effective population size which correspond to known historical events, including glaciation events and sheep domestication. Keywords: demography, effective population size, runs of homozygosity, sequence error correction
In sheep and other livestock species, the calpain/calpastatin system is one of the principal genetic influences on variation in meat tenderisation. Genome wide association studies have shown there is a strong relationship between SNPs located in the fatty acid desaturases locus (i.e. FADS 1/2/3), ELOVL2 and SLC26A10 genes and omega-3 polyunsaturated fatty acid (PUFA) levels in human plasma. This study determined the genomic association of 182 SNPs specifically selected in genes affecting meat tenderness and the omega-3 PUFAs in Australian lamb. An additional 10 SNPs from the OvineSNP50 array were selected for inclusion onto a 192 Meat Quality Research (MQR) SNP panel based on their significant genomic association with omega-3 PUFA levels. One thousand and fifty-eight animals genotyped for this panel had OvineSNP50 array genotypes. These were used to impute the 192 MQR SNP into 2833 animals with only OvineSNP50 genotypes and hot carcass weight (HCWT) phenotype data using Beagle. This...
Genetics, 2015
Doubled haploids are routinely created and phenotypically selected in plant breeding programs to accelerate the breeding cycle. Genomic selection, which makes use of both phenotypes and genotypes, has been shown to further improve genetic gain through prediction of performance before or without phenotypic characterization of novel germplasm. Additional opportunities exist to combine genomic prediction methods with the creation of doubled haploids. Here we propose an extension to genomic selection, optimal haploid value selection (OHV), which predicts the best doubled haploid that can be produced from a segregating plant. This method focuses selection on the haplotype and optimises the breeding program towards its end goal of generating an elite fixed line. We rigorously tested OHV selection breeding programs using computer simulation and show that it results in up to 0.6 standard deviations more genetic gain than genomic selection. At the same time, OHV selection preserved a substantially greater amount of genetic diversity in the population than genomic selection, which is important to achieve long-term genetic gain in breeding populations.
Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls, for imputing sequence variant genotypes into reference sets for genomic prediction. Run 3.0 included 429 sequences, with 31.8 million variants detected. BayesRC, a new method for genomic prediction, addresses some challenges associated with using the sequence data, and takes advantage of biological information. In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes and improved biological information. Keywords: Genomic prediction...
Crop and Pasture Science, 2014
Journal of animal breeding and genetics = Zeitschrift für Tierzüchtung und Züchtungsbiologie, 2015
The mutations that cause genetic variation in quantitative traits could be old and segregate across many breeds or they could be young and segregate only within one breed. This has implications for our understanding of the evolution of quantitative traits and for genomic prediction to improve livestock. We investigated the age of quantitative trait loci (QTL) for milk production traits identified as segregating in Holstein dairy cattle. We used a multitrait method and found that six of 11 QTL also segregate in Jerseys. Variants identified as Holstein-only QTL were fixed or rare [minor allele frequency (MAF) < 0.05] in Jersey. The age of the QTL mutations appears to vary from perhaps 2000 to 50 000 generations old. The older QTL tend to have high derived allele frequencies and often segregate across both breeds. Holstein-only QTL were often embedded within longer haplotypes, supporting the conclusion that they are typically younger mutations that have occurred more recently than QT...
Background: The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy. Methodology/Principal Findings: We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of number of phenotypic records per number of risk loci and the observed heritability. Conclusions/Significance: This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic risk.
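The abstract summarises accuracy in terms of the product of the ratio of phenotypic records to risk loci and the heritability, but does not reproduce the formulae. The sketch below assumes the form commonly cited for a continuous phenotype, r = sqrt(λh² / (λh² + 1)) with λ = N / n (N records, n loci); this expression, the function names, and the example numbers are assumptions supplied here for illustration, and the case-control versions for a dichotomous phenotype are not shown.

```python
import math


def expected_accuracy(n_records, n_loci, h2):
    """Assumed continuous-trait form: r = sqrt(lam * h2 / (lam * h2 + 1))."""
    if n_records <= 0 or n_loci <= 0 or not 0.0 < h2 <= 1.0:
        raise ValueError("need positive counts and 0 < h2 <= 1")
    lam = n_records / n_loci  # phenotypic records per locus
    x = lam * h2              # the product highlighted in the abstract
    return math.sqrt(x / (x + 1.0))


def records_needed(target_accuracy, n_loci, h2):
    """Invert the assumed formula to get the sample size for a target accuracy."""
    if not 0.0 < target_accuracy < 1.0:
        raise ValueError("target accuracy must be strictly between 0 and 1")
    r2 = target_accuracy ** 2
    x = r2 / (1.0 - r2)  # required value of lam * h2
    return x * n_loci / h2


# Hypothetical planning example: 1000 loci and a heritability of 0.3.
n_for_target = records_needed(0.7, n_loci=1000, h2=0.3)
```

Under this assumed form, accuracy depends on the records and the heritability only through their product with 1/n, which is why doubling the number of records has the same effect as doubling the heritability.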