Ioannis Politopoulos | Harvard School of Public Health
Papers by Ioannis Politopoulos
European Journal of …, Jan 1, 2010
We describe composite likelihood-based analysis of a genome-wide breast cancer case-control sample from the Cancer Genetic Markers of Susceptibility project. We determine 14,380 genome regions of fixed size on a linkage disequilibrium map which delimit comparable levels of linkage disequilibrium. Although the numbers of SNPs are highly variable, each region contains an average of ~35 SNPs and an average of ~69 after imputation of missing genotypes. Composite likelihood association mapping yields a single P-value for each region, established by a permutation test, along with a maximum likelihood disease location, standard error and information weight. For single-SNP analysis the nominal P-value for the most significant SNP (msSNP) requires substantial correction given the number of SNPs in the region. Therefore imputing genotypes may not always be advantageous for the msSNP test, in contrast to composite likelihood. For the region containing FGFR2 (a known breast cancer gene) the largest chi-square is obtained under composite likelihood with imputed genotypes ($\chi^2_2$ increases from 20.6 to 22.7), and compares to a single-SNP based $\chi^2_2$ of 19.9 after correction. Imputation of additional genotypes in this region reduces the size of the 95% confidence interval for location of the disease gene by ~40%. Amongst the highest ranked regions, SNPs in the NTSR1 gene would be worthy of examination in additional samples. Meta-analysis, which combines weighted evidence from composite likelihood in different samples, and refines putative disease locations, is facilitated through defining fixed regions on an underlying linkage disequilibrium map.
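The point about the msSNP test can be illustrated with a small sketch. It uses the Šidák correction, which is an assumption made here for illustration and not necessarily the correction used in the paper; it also treats SNPs as independent tests, which SNPs in linkage disequilibrium are not. The function name `sidak_corrected_p` and the nominal P-value are hypothetical. The sketch only shows why the same nominal P-value becomes less significant when imputation roughly doubles the number of SNPs tested in a region.

```python
def sidak_corrected_p(nominal_p: float, n_tests: int) -> float:
    """Sidak-corrected P-value for the smallest of n_tests nominal P-values.

    Assumes independent tests (an illustrative simplification; SNPs in
    linkage disequilibrium are correlated).
    """
    if not 0.0 <= nominal_p <= 1.0:
        raise ValueError("nominal_p must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - nominal_p) ** n_tests


if __name__ == "__main__":
    nominal_p = 1e-5  # hypothetical nominal P-value for the msSNP
    # ~35 SNPs per region before imputation, ~69 after (figures from the abstract)
    for n_snps in (35, 69):
        print(n_snps, sidak_corrected_p(nominal_p, n_snps))
```

Under these assumptions the corrected P-value grows with the number of SNPs in the region, whereas a test that yields one P-value per region carries no such within-region penalty.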
The Application of Clinical Genetics, Jan 1, 2011
The genetic factors known to be involved in breast cancer risk comprise about 30 genes. These include the high-penetrance early-onset breast cancer genes, BRCA1 and BRCA2, a number of rare cancer syndrome genes, and rare genes with more moderate penetrance. A larger group of common variants has more recently been identified through genome-wide association studies. Quite a number of these common variants are mapped to genomic regions without being firmly associated with specific genes. It is thought that most of these variants have gene regulatory functions, but their precise roles in disease susceptibility are not well understood. Common variants account for only a small percentage of the risk of disease because they have low penetrance. Collectively, the breast cancer genes identified to date contribute only ∼30% of the familial risk. Therefore, there is much interest in accounting for the missing heritability, and possible sources include loss of information through ignoring phenotype heterogeneity (disease subtypes have genetic differences), gene-gene and gene-environment interaction, and rarer forms of variation. Identification of these rarer variations in coding regions is now feasible and cost-effective through exome sequencing, which has already identified high-penetrance variants for some rare diseases. Targeting more 'extreme' breast cancer phenotypes, particularly cases with early-onset disease, a strong family history (not accounted for by BRCA mutations), and with specific tumor subtypes, provides a route to progress using next-generation sequencing methods.
Journal of human …, Jan 1, 2011
For detecting low-risk disease variants in genome-wide association panels, meta-analysis is an effective strategy for increasing power. We apply a composite likelihood-based method, which models association with disease in regions defined on a linkage disequilibrium map and combines the evidence across multiple genome-wide samples. This fixed-region approach has the advantage that, as only one statistical test is made per region, there is no increased multiple testing penalty in panels with higher marker density. Imputation of missing genotypes is also advantageous to increase coverage. Meta-analysis of three breast cancer data sets combines evidence from samples that show heterogeneity in phenotype and, particularly, in marker coverage. The FGFR2 gene has the highest rank, consistent with previous analysis of one of these samples and supported by the small number of early-onset breast cancer cases included. The 8q24 breast cancer region also ranks highly and is supported by evidence from both early-onset and post-menopausal breast cancer samples. The PIK3AP1 gene region is highlighted in this analysis as a strong candidate for further study.
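As a rough illustration of combining per-sample evidence for one fixed region, the sketch below pools location estimates by inverse-variance weighting. This is a generic meta-analysis technique assumed here for illustration; the abstract does not specify how its information weights are defined or combined, so this should not be read as the authors' method. The function name `pool_location_estimates` and the input values are hypothetical.

```python
import math
from typing import Iterable, Tuple


def pool_location_estimates(
    estimates: Iterable[Tuple[float, float]],
) -> Tuple[float, float]:
    """Pool (location, standard_error) pairs by inverse-variance weighting.

    Returns the pooled location and its standard error. Assumes the
    per-sample estimates are independent (an illustrative simplification).
    """
    pairs = list(estimates)
    if not pairs:
        raise ValueError("at least one estimate is required")
    if any(se <= 0 for _, se in pairs):
        raise ValueError("standard errors must be positive")

    # Each sample's weight is the reciprocal of its variance.
    weights = [1.0 / (se * se) for _, se in pairs]
    total_weight = sum(weights)
    pooled_location = (
        sum(w * loc for w, (loc, _) in zip(weights, pairs)) / total_weight
    )
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_location, pooled_se


if __name__ == "__main__":
    # Hypothetical (location, standard error) estimates from three samples
    # for the same region, in arbitrary map units.
    samples = [(123.30, 0.08), (123.25, 0.05), (123.41, 0.12)]
    location, se = pool_location_estimates(samples)
    print(f"pooled location = {location:.3f}, standard error = {se:.3f}")
```

Because every sample is analysed on the same fixed regions, estimates for a given region line up directly across samples, which is what makes a per-region combination of this kind straightforward.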