Similar articles (20 found)
1.
Zhang J 《PloS one》2010,5(11):e13734
Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestral informative markers usually require prior knowledge of individual ancestry and have difficulty with admixed populations. Recently Principal Components Analysis (PCA) has been employed with success to select SNPs which are highly correlated with top significant principal components (PCs) without use of individual ancestral information. The approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Through simulation of a ring population and the real global population sample HGDP of 650K SNPs genotyped in 940 unrelated individuals, we validate the proposed algorithm for selecting the most informative markers, a small fraction of which can efficiently recover a similar underlying population structure. Employing a standard Support Vector Machine (SVM) to predict individuals' continental memberships on the HGDP dataset of seven continents, we demonstrate that the SNPs selected by our method are more informative but less redundant than those selected by PCA. Our algorithm is a promising tool in genome-wide association studies and population genetics, facilitating the selection of structure informative markers, efficient detection of population substructure and ancestral inference.

2.
Admixture is a well-known confounder in genetic association studies. If genome-wide data are not available, as would be the case for candidate gene studies, ancestry informative markers (AIMs) are required in order to adjust for admixture. The predominant population group in the Western Cape, South Africa, is the admixed group known as the South African Coloured (SAC). A small set of AIMs that is optimized to distinguish between the five source populations of this population (African San, African non-San, European, South Asian, and East Asian) will enable researchers to cost-effectively reduce false-positive findings resulting from ignoring admixture in genetic association studies of the population. Using genome-wide data to find SNPs with large allele frequency differences between the source populations of the SAC, as quantified by Rosenberg et al.'s I(n) statistic, we developed a panel of AIMs by experimenting with various selection strategies. Subsets of different sizes were evaluated by measuring the correlation between ancestry proportions estimated by each AIM subset with ancestry proportions estimated using genome-wide data. We show that a panel of 96 AIMs can be used to assess ancestry proportions and to adjust for the confounding effect of the complex five-way admixture that occurred in the South African Coloured population.

3.
Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R² > 0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.

4.
Screening and application of 30 ancestry informative markers   Total citations: 3, self-citations: 0, citations by others: 3
李彩霞  贾竟  魏以梁  万立华  胡兰  叶健 《遗传》2014,36(8):779-785
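Abstract: Objective: To screen a set of ancestry informative SNP markers (AIMs) and build a multiplex assay for describing the genetic composition of East Asian, European, and African populations and for inferring the ancestral origin of individuals. Methods: Starting from genotype data for 658 samples from 9 populations in the HapMap database, 30 AIMs were selected from a total of 282 SNPs in 30 phenotype-related genes. A multiplex assay was built on minisequencing combined with universal-array technology, and a population allele frequency database was established. The 658 HapMap samples were first analyzed with this marker set as a preliminary check of its discriminating power; the assay was then used to test collected DNA samples from 194 unrelated individuals from 5 populations. Finally, the Structure software was used to obtain the composition of each population and the genetic components of each individual, and to infer the ancestral origin of individual samples. Results: The 30 selected AIMs conformed to Hardy-Weinberg equilibrium (p>0.01) and showed no linkage between markers (r2<0.1). The ancestry component results for the 658 HapMap samples and the 194 experimental samples were fully consistent with the known results. Conclusion: The 30-AIM multiplex assay selected and established in this study can effectively analyze population composition and individual genetic components for East Asian, European, African, and admixed populations, can effectively control the error that population stratification introduces into genetic linkage analysis, and can also be used to infer individual ancestral origin in forensic DNA testing.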
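The Hardy-Weinberg check mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a single biallelic marker, a one-degree-of-freedom chi-square goodness-of-fit test, and a hypothetical function name `hwe_chi_square`.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic marker.

    Takes the observed genotype counts (AA, AB, BB) and returns
    (chi2, p_value) for a test with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts in exact Hardy-Weinberg proportions give chi2 = 0.
print(hwe_chi_square(25, 50, 25))
# A complete lack of heterozygotes is a strong departure.
print(hwe_chi_square(50, 0, 50))
```

Under this sketch, a marker would pass a filter like the abstract's `p>0.01` criterion when the returned `p_value` exceeds 0.01; for small samples an exact test is usually preferred over the chi-square approximation.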

5.
Markers with large differences in allele frequencies between ethnicities provide ancestry information that can be applied to genetic studies. We identified over 100 biallelic ancestry informative markers (AIMs) with large allele frequency differences between European Americans (EA) and Pima Amerindians from laboratory and database screens. For 35 of these markers, Mayan, Yavapai and Quechuan Amerindians were genotyped and compared with EA and Pima allele frequencies. Markers with large allele frequency differences between EA and one Amerindian tribe showed only small differences between the Amerindian tribes. Examination of structure in individuals demonstrated a clear separation of subjects of European from those of Amerindian ancestry, and similarity between individuals from disparate Amerindian populations. The AIMs demonstrated the variation in ancestral composition of individual Mexican Americans, providing evidence of applicability in admixture mapping and in controlling for structure in association tests. In addition, a high percentage of single-nucleotide polymorphisms (SNPs) selected on the basis of large frequency differences between EA and Asian populations had large allele frequency differences between EA and Amerindians, suggesting an efficient method for greatly expanding AIMs for use in admixture mapping/structure analysis in Mexican Americans. Together, these data provide additional support for the practical application of admixture mapping in the Mexican American population. Electronic Supplementary Material: Supplementary material is available in the online version of this article.
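The idea of screening for markers with large allele frequency differences between two populations can be sketched as follows. The function names, the marker names in the usage example, and the 0.5 cutoff are illustrative assumptions, not values taken from the abstract.

```python
def allele_frequency_delta(freq_pop1, freq_pop2):
    """Absolute difference in reference-allele frequency between two populations."""
    return abs(freq_pop1 - freq_pop2)


def select_aims(freqs, threshold=0.5):
    """Return marker names whose frequency difference is at least `threshold`.

    `freqs` maps a marker name to a pair (frequency in population 1,
    frequency in population 2). The result is sorted from the most to
    the least differentiated marker. The default threshold is an
    arbitrary illustrative choice.
    """
    ranked = sorted(
        ((allele_frequency_delta(a, b), name) for name, (a, b) in freqs.items()),
        reverse=True,
    )
    return [name for delta, name in ranked if delta >= threshold]


# Hypothetical frequencies for three biallelic markers.
example = {"m1": (0.9, 0.1), "m2": (0.5, 0.45), "m3": (0.2, 0.95)}
print(select_aims(example))
```

A simple difference treats every marker independently; panels are usually also filtered for linkage between markers so that the selected AIMs carry non-redundant information.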

6.
Admixture occurs when individuals from parental populations that have been isolated for hundreds of generations form a new hybrid population. Currently, interest in measuring biogeographic ancestry has spread from anthropology to forensic sciences, direct-to-consumer personal genomics, and civil rights issues of minorities, and it is critical for genetic epidemiology studies of admixed populations. Markers with highly differentiated frequencies among human populations are informative of ancestry and are called ancestry informative markers (AIMs). For tri-hybrid Latin American populations, ancestry information is required for Africans, Europeans and Native Americans. We developed two multiplex panels of AIMs (for 14 SNPs) to be genotyped by two mini-sequencing reactions, suitable for investigators in small and medium-sized laboratories to estimate admixture of Latin American populations. We tested the performance of these AIMs by comparing results obtained with our 14 AIMs with those obtained using 108 AIMs genotyped in the same individuals, for which DNA samples are available to other investigators. We emphasize that this type of comparison should be made when new admixture/population structure panels are developed. At the population level, our 14 AIMs were useful to estimate European admixture, though they overestimated African admixture and underestimated Native American admixture. Combined with more AIMs, our panel could be used to infer individual admixture. We used our panel to infer the pattern of admixture in two urban populations (Montes Claros and Manhuaçu) of the State of Minas Gerais (southeastern Brazil), obtaining a snapshot of their genetic structure in the context of their demographic history.

7.
As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportion and identify ancestry informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
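A regression-based admixture estimate from allele frequencies can be sketched for the simplest case of two reference populations. The abstract does not specify the regression used, so the linear mixing model, the no-intercept least-squares fit, and the function name below are all assumptions made for illustration.

```python
def estimate_admixture_two_sources(freq_admixed, freq_ref1, freq_ref2):
    """Least-squares estimate of the contribution of reference population 1.

    Assumes the linear mixing model
        freq_admixed[i] = m * freq_ref1[i] + (1 - m) * freq_ref2[i]
    at every marker i, and solves for m by regressing
    (freq_admixed - freq_ref2) on (freq_ref1 - freq_ref2) without intercept.
    The estimate is clipped to the interval [0, 1].
    """
    if not (len(freq_admixed) == len(freq_ref1) == len(freq_ref2)):
        raise ValueError("all frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for p_h, p_1, p_2 in zip(freq_admixed, freq_ref1, freq_ref2):
        diff = p_1 - p_2
        numerator += (p_h - p_2) * diff
        denominator += diff * diff
    if denominator == 0:
        raise ValueError("reference populations have identical frequencies")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical frequencies: the admixed pool is 30% reference 1, 70% reference 2.
ref1 = [0.9, 0.2, 0.5]
ref2 = [0.1, 0.8, 0.4]
pool = [0.3 * a + 0.7 * b for a, b in zip(ref1, ref2)]
print(estimate_admixture_two_sources(pool, ref1, ref2))
```

Markers with large differences between the reference populations dominate the denominator, which is one way to see why AIMs are more efficient than weakly differentiated markers for this kind of estimate.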

8.
In the United States, asthma prevalence and mortality are the highest among Puerto Ricans and the lowest among Mexicans. Case-control association studies are a powerful strategy for identifying genes of modest effect in complex diseases. However, studies of complex disorders in admixed populations such as Latinos may be confounded by population stratification. We used ancestry informative markers (AIMs) to identify and correct for population stratification among Mexican and Puerto Rican subjects participating in case-control studies of asthma. Three hundred and sixty-two subjects with asthma (Mexican: 181, Puerto Rican: 181) and 359 ethnically matched controls (Mexican: 181, Puerto Rican: 178) were genotyped for 44 AIMs. We observed a greater than expected degree of association between pairs of AIMs on different chromosomes in Mexicans (P < 0.00001) and Puerto Ricans (P < 0.00002) providing evidence for population substructure and/or recent admixture. To assess the effect of population stratification on association studies of asthma, we measured differences in genetic background of cases and controls by comparing allele frequencies of the 44 AIMs. Among Puerto Ricans but not in Mexicans, we observed a significant overall difference in allele frequencies between cases and controls (P = 0.0002); of 44 AIMs tested, 8 (18%) were significantly associated with asthma. However, after adjustment for individual ancestry, only two of these markers remained significantly associated with the disease. Our findings suggest that empirical assessment of the effects of stratification is critical to appropriately interpret the results of case-control studies in admixed populations.

9.
Skin pigmentation, biogeographical ancestry and admixture mapping   Total citations: 23, self-citations: 0, citations by others: 23
Ancestry informative markers (AIMs) are genetic loci showing alleles with large frequency differences between populations. AIMs can be used to estimate biogeographical ancestry at the level of the population, subgroup (e.g. cases and controls) and individual. Ancestry estimates at both the subgroup and individual level can be directly instructive regarding the genetics of the phenotypes that differ qualitatively or in frequency between populations. These estimates can provide a compelling foundation for the use of admixture mapping (AM) methods to identify the genes underlying these traits. We present details of a panel of 34 AIMs and demonstrate how such studies can proceed, by using skin pigmentation as a model phenotype. We have genotyped these markers in two population samples with primarily African ancestry, viz. African Americans from Washington D.C. and an African Caribbean sample from Britain, and in a sample of European Americans from Pennsylvania. In the two African population samples, we observed significant correlations between estimates of individual ancestry and skin pigmentation as measured by reflectometry (R(2)=0.21, P<0.0001 for the African-American sample and R(2)=0.16, P<0.0001 for the British African-Caribbean sample). These correlations confirm the validity of the ancestry estimates and also indicate the high level of population structure related to admixture, a level that characterizes these populations and that is detectable by using other tests to identify genetic structure. We have also applied two methods of admixture mapping to test for the effects of three candidate genes (TYR, OCA2, MC1R) on pigmentation. We show that TYR and OCA2 have measurable effects on skin pigmentation differences between the west African and west European parental populations. This work indicates that it is possible to estimate the individual ancestry of a person based on DNA analysis with a reasonable number of well-defined genetic markers. 
The implications and applications of ancestry estimates in biomedical research are discussed.

10.
Inference of individual ancestry is useful in various applications, such as admixture mapping and structured-association mapping. Using information-theoretic principles, we introduce a general measure, the informativeness for assignment (I(n)), applicable to any number of potential source populations, for determining the amount of information that multiallelic markers provide about individual ancestry. In a worldwide human microsatellite data set, we identify markers of highest informativeness for inference of regional ancestry and for inference of population ancestry within regions; these markers, which are listed in online-only tables in our article, can be useful both in testing for and in controlling the influence of ancestry on case-control genetic association studies. Markers that are informative in one collection of source populations are generally informative in others. Informativeness of random dinucleotides, the most informative class of microsatellites, is five to eight times that of random single-nucleotide polymorphisms (SNPs), but 2%-12% of SNPs have higher informativeness than the median for dinucleotides. Our results can aid in decisions about the type, quantity, and specific choice of markers for use in studies of ancestry.
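The informativeness for assignment described above can be computed directly from allele frequencies. The sketch below assumes the standard published form of the statistic for K equally weighted source populations, I(n) = Σ_j [ −p̄_j ln p̄_j + (1/K) Σ_i p_ij ln p_ij ], where p_ij is the frequency of allele j in population i and p̄_j is its mean across populations; the function name and input layout are illustrative choices.

```python
import math


def informativeness_for_assignment(freqs_by_population):
    """Informativeness for assignment of one marker.

    `freqs_by_population` is a list with one entry per source population;
    each entry is a list of allele frequencies for the same alleles in the
    same order (each list summing to 1). Natural logarithms are used, so
    the value lies between 0 and ln(K) for K populations.
    """
    k = len(freqs_by_population)
    n_alleles = len(freqs_by_population[0])
    total = 0.0
    for j in range(n_alleles):
        # Mean frequency of allele j across the K populations.
        p_bar = sum(pop[j] for pop in freqs_by_population) / k
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)
        for pop in freqs_by_population:
            if pop[j] > 0:  # treat 0 * log(0) as 0
                total += pop[j] * math.log(pop[j]) / k
    return total


# A fixed difference between two populations is maximally informative.
print(informativeness_for_assignment([[1.0, 0.0], [0.0, 1.0]]))
# Identical frequencies carry no information about ancestry.
print(informativeness_for_assignment([[0.3, 0.7], [0.3, 0.7]]))
```

Because the measure is defined for any number of alleles and populations, the same function applies to microsatellites and to SNPs, which is what makes comparisons between marker classes such as the one in the abstract possible.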

11.
Population stratification may confound the results of genetic association studies among unrelated individuals from admixed populations. Several methods have been proposed to estimate the ancestral information in admixed populations and used to adjust for population stratification in genetic association tests. We evaluate the performance of three different methods (maximum likelihood estimation, ADMIXMAP and Structure) on various simulated data sets and on real data from Latino subjects participating in a genetic study of asthma. All three methods provide similar information on the accuracy of ancestral estimates and control type I error rate at an approximately similar rate. The most important factor in determining accuracy of the ancestry estimate and in minimizing type I error rate is the number of markers used to estimate ancestry. We demonstrate that approximately 100 ancestry informative markers (AIMs) are required to obtain ancestry estimates whose correlation coefficients with the true individual ancestral proportions exceed 0.9. In addition, after accounting for the ancestry information in association tests, the excess of type I error rate is controlled at the 5% level when 100 markers are used to estimate ancestry. However, since the effect of admixture on the type I error rate worsens with sample size, the accuracy of ancestry estimates also needs to increase to make the appropriate correction. Using data from the Latino subjects, we also apply these methods to an association study between body mass index and 44 AIMs. These simulations are meant to provide some practical guidelines for investigators conducting association studies in admixed populations.

12.

Background

There is a growing interest among geneticists in developing panels of Ancestry Informative Markers (AIMs) aimed at measuring the biogeographical ancestry of individual genomes. The efficiency of these panels is commonly tested empirically by contrasting self-reported ancestry with the ancestry estimated from these panels.

Results

Using SNP data from HapMap we carried out a simulation-based study aimed at measuring the effect of SNP coverage on the estimation of genome ancestry. For three of the main continental groups (Africans, East Asians, Europeans) ancestry was first estimated using the whole HapMap SNP database as a proxy for global genome ancestry; these estimates were subsequently compared to those obtained from pre-designed AIM panels. Panels that consider >400 AIMs capture genome ancestry reasonably well, while those containing a few dozen AIMs show a large variability in ancestry estimates. Curiously, 500-1,000 SNPs selected at random from the genome provide an unbiased estimate of genome ancestry and perform as well as any AIM panel of similar size. In simulated scenarios of population admixture, panels containing few AIMs also show important deficiencies in measuring genome ancestry.

Conclusions

The results indicate that the ability to estimate genome ancestry is strongly dependent on the number of AIMs used, and not primarily on their individual informativeness. Caution should be taken when making individual (medical, forensic, or anthropological) inferences based on AIMs.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-543) contains supplementary material, which is available to authorized users.

13.
Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control association studies, but also in population and forensic genetics studies. This work presents a set of 46 ancestry-informative insertion deletion polymorphisms selected to efficiently measure population admixture proportions of four different origins (African, European, East Asian and Native American). All markers are analyzed in short fragments (under 230 basepairs) through a single PCR followed by capillary electrophoresis (CE) allowing a very simple one tube PCR-to-CE approach. HGDP-CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of the assay in clustering populations from different continental origins and to establish reference databases. In addition, other populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results revealed that the AIM-INDEL set developed is highly efficient at inferring the ancestry of individuals and provides good estimates of ancestry proportions at the population level. In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available AIM-SNP typing methods dependent on complex, multi-step protocols or implementation of large-scale genotyping technologies.

14.
Admixture mapping is a promising new tool for discovering genes that contribute to complex traits. This mapping approach uses samples from recently admixed populations to detect susceptibility loci at which the risk alleles have different frequencies in the original contributing populations. Although the idea for admixture mapping has been around for more than a decade, the genomic tools are only now becoming available to make this a feasible and attractive option for complex-trait mapping. In this article, we describe new statistical methods for analyzing multipoint data from admixture-mapping studies to detect "ancestry association." The new test statistics do not assume a particular disease model; instead, they are based simply on the extent to which the sample's ancestry proportions at a locus deviate from the genome average. Our power calculations show that, for loci at which the underlying risk-allele frequencies are substantially different in the ancestral populations, the power of admixture mapping can be comparable to that of association mapping but with a far smaller number of markers. We also show that, although "ancestry informative markers" (AIMs) are superior to random single-nucleotide polymorphisms (SNPs), random SNPs can perform quite well when AIMs are not available. Hence, researchers who study admixed populations in which AIMs are not available can perform admixture mapping with the use of modestly higher densities of random markers. Software to perform the gene-mapping calculations, "MALDsoft," is freely available on the Pritchard Lab Web site.

15.
Asthma is a complex respiratory disease characterized by chronic inflammation of airways and frequently associated with atopic symptoms. The population from the Canary Islands, which has resulted from a recent admixture of North African and Iberian populations, shows the highest prevalence of asthma and atopic symptoms among the Spanish populations. Although environmental particularities would account for the majority of such disparity, genetic ancestry might play a role in increasing the susceptibility to asthma or atopy, as has been demonstrated in other recently African-admixed populations. Here, we aimed to explore whether genetic ancestry was associated with asthma or related traits in the Canary Islanders. For that, a total of 734 DNA samples from unrelated individuals of the GOA study, self-reporting at least two generations of ancestors from the Canary Islands (391 asthmatics and 343 controls), were successfully genotyped for 83 ancestry informative markers (AIMs), which allowed North African and Iberian ancestries to be distinguished precisely. No association was found between genetic ancestry and asthma or related traits after adjusting for demographic variables differing among compared groups. Similarly, none of the individual AIMs was associated with asthma when results were considered in the context of the multiple comparisons performed (0.005 ≤ p value ≤ 0.042; 0.221 ≤ q value ≤ 0.443). Our results suggest that if genetic ancestry were involved in the susceptibility to asthma or related traits among Canary Islanders, its effects would be modest. Larger studies, examining more genetic variants, would be needed to explore such possibility.

16.
Admixture mapping is a recently developed method for identifying genetic risk factors involved in complex traits or diseases showing prevalence differences between major continental groups. Type 2 diabetes (T2D) is at least twice as prevalent in Native American populations as in populations of European ancestry, so admixture mapping is well suited to study the genetic basis of this complex disease. We have characterized the admixture proportions in a sample of 286 unrelated T2D patients and 275 controls from Mexico City and we discuss the implications of the results for admixture mapping studies. Admixture proportions were estimated using 69 autosomal ancestry-informative markers (AIMs). Maternal and paternal contributions were estimated from geographically informative mtDNA and Y-specific polymorphisms. The average proportions of Native American, European, and West African admixture were estimated as 65, 30, and 5%, respectively. The contributions of Native American ancestors to maternal and paternal lineages were estimated as 90 and 40%, respectively. In a logistic model with higher educational status as the dependent variable, the odds ratio for higher educational status associated with an increase from 0 to 1 in European admixture proportions was 9.4 (95% credible interval 3.8–22.6). This association of socioeconomic status with individual admixture proportion shows that genetic stratification in this population is paralleled, and possibly maintained, by socioeconomic stratification. The effective number of generations back to unadmixed ancestors was 6.7 (95% CI 5.7–8.0), from which we can estimate that genome-wide admixture mapping will require typing about 1,400 evenly distributed AIMs to localize genes underlying disease risk between populations of European and Native American ancestry. Sample sizes of about 2,000 cases will be required to detect any locus that contributes an ancestry risk ratio of at least 1.5.

17.

Introduction

Non-Hispanic (nH) Black and Hispanic women are disproportionately affected by early-onset disease, later stage at diagnosis, and more aggressive, higher-grade, ER/PR-negative breast cancers. The purpose of this analysis was to examine whether genetic ancestry could account for this variation in breast cancer characteristics, once data were stratified by self-reported race/ethnicity and adjusted for potential confounding by social and behavioral factors.

Methods

We used a panel of 100 ancestry informative markers (AIMs) to estimate individual genetic ancestry in 656 women from the “Breast Cancer Care in Chicago” study, a multi-ethnic cohort of breast cancer patients, and examined the association between individual genetic ancestry and breast cancer characteristics. In addition, we examined the association of individual AIMs and breast cancer to identify genes/regions that may potentially play a role in breast cancer disease disparities.

Results

As expected, nH Black and Hispanic patients were more likely than nH White patients to be diagnosed at later stages, with higher grade, and with ER/PR negative tumors. Higher European genetic ancestry was protective against later stage at diagnosis (OR 0.7 95%CI: 0.54–0.92) among Hispanic patients, and higher grade (OR 0.73, 95%CI: 0.56–0.95) among nH Black patients. After adjustment for multiple social and behavioral risk factors, the association with later stage remained, while the association with grade was not significant. We also found that the AIM SNP rs10954631 on chromosome 7 was associated with later stage (p = 0.02) and higher grade (p = 0.012) in nH Whites and later stage (p = 0.03) in nH Blacks.

Conclusion

Non-European genetic ancestry was associated with later stage at diagnosis in ethnic minorities. The relation between genetic ancestry and stage at diagnosis may be due to genetic factors and/or unmeasured environmental factors that are overrepresented within certain racial/ethnic groups.

18.
Self-reported race/ethnicity is frequently used in epidemiological studies to assess an individual’s background origin. However, in admixed populations such as Hispanics, self-reported race/ethnicity may not accurately represent them genetically because they are admixed with European, African and Native American ancestry. We estimated the proportions of genetic admixture in an ethnically diverse population of 396 mothers and 188 of their children with 35 ancestry informative markers (AIMs) using the STRUCTURE version 2.2 program. The majority of the markers showed significant deviation from Hardy-Weinberg equilibrium in our study population. In mothers self-identified as Black and White, the imputed ancestry proportions were 77.6% African and 75.1% European respectively, while the racial composition among self-identified Hispanics was 29.2% European, 26.0% African, and 44.8% Native American. We also investigated the utility of AIMs by showing the improved fitness of models in paraoxonase-1 genotype-phenotype associations after incorporating AIMs; however, the improvement was moderate at best. In summary, a minimal set of 35 AIMs is sufficient to detect population stratification and estimate the proportion of individual genetic admixture; however, the utility of these markers remains questionable.

19.
Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was 78%, the Amerindian contribution was 19.4%, and the African contribution was 2.5%. Similar results were found using the weighted least mean square method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies, the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance, and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case-control genetic analyses are carried out in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies.

20.
Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied on genomewide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using less than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the measure of informativeness, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset. 
The algorithm that we introduce runs in seconds and can be easily applied on large genome-wide datasets, facilitating the identification of population substructure, stratification assessment in multi-stage whole-genome association studies, and the study of demographic history in human populations.
