Similar articles
20 similar articles found
1.
Gene flow and recombination in admixed populations produce genomes that are mosaic combinations of chromosome segments inherited from different source populations, that is, chromosome segments with different genetic ancestries. The statistical problem of estimating genetic ancestry from DNA sequence data has been widely studied, and analyses of genetic ancestry have facilitated research in molecular ecology and ecological genetics. In this review, we describe and compare different model‐based statistical methods used to infer genetic ancestry. We describe the conceptual and mathematical structure of these models and highlight some of their key differences and shared features. We then discuss recent empirical studies that use estimates of genetic ancestry to analyse population histories, the nature and genetic basis of species boundaries, and the genetic architecture of traits. These diverse studies demonstrate the breadth of applications that rely on genetic ancestry estimates and typify the genomics‐enabled research that is becoming increasingly common in molecular ecology. We conclude by identifying key research areas where future studies might further advance this field.

2.
Effect of gene conversion on variances of digenic identity measures (total citations: 1; self-citations: 0; citations by others: 1)
The variances and covariances of digenic descent measures are studied for a two-locus model incorporating mutation, gene conversion, recombination, drift, and finite sampling. Gene conversion can occur between allelic pairs of genes or between non-allelic pairs on the same or different gametes within individuals. Most interest therefore centers on pairs of genes, and five digenic identity measures are required. The behavior over time of these measures is studied, with an emphasis on the effects of gene conversion. Because of the stochastic nature of the forces of drift, recombination, mutation, and conversion, the actual identity status of gene pairs can vary from expectation among replicate populations. To study this variation we compute the expected variances and covariances of the measures, and show that this requires the introduction of trigenic and quadrigenic measures. Allowing for conversion between genes on different gametes requires a large number of these higher-order measures.

3.
In the current study, we used bootstrap analyses and the common principal component (CPC) method of Flury (1988) to estimate and compare the G-matrix of Scabiosa columbaria and S. canescens populations. We found three major patterns in the G-matrices: (i) the magnitude of the (co)variances was more variable among characters than among populations, (ii) different populations showed high (co)variance for different characters, and (iii) there was a tendency for S. canescens to have higher genetic (co)variances than S. columbaria. The hypothesis of equal G-matrices was rejected in all comparisons and there was no evidence that the matrices differed by a proportional constant in any of the analyses. The two ‘species matrices’ were found to be unrelated, both for raw data and data standardized over populations, and there was significant between-population variation in the G-matrix in both species. Populations of S. canescens showed conservation of structure (principal components) in their G-matrices, contrasting with the lack of common structure among the S. columbaria matrices. Given these observations and the results from previous studies, we propose that selection may be responsible for some of the variation between the G-matrices, at least in S. columbaria and at the between-species level.

4.
The leucine-rich repeat kinase 2 (LRRK2) G2019S mutation is the most common genetic determinant of Parkinson disease (PD) identified to date. It accounts for 1%-7% of PD in patients of European origin and 20%-40% in Ashkenazi Jews and North African Arabs with PD. Previous studies concluded that patients from these populations all shared a common Middle Eastern founder who lived in the 13th century. We tested this hypothesis by genotyping 25 microsatellite and single-nucleotide-polymorphism markers in 22 families with G2019S and observed two distinct haplotypes. Haplotype 1 was present in 19 families of Ashkenazi Jewish and European ancestry, whereas haplotype 2 occurred in three European American families. Using a maximum-likelihood method, we estimated that the families with haplotype 1 shared a common ancestor 2,250 (95% confidence interval 1,650-3,120) years ago, whereas those with haplotype 2 appeared to share a more recent founder. Our data suggest two separate founding events for G2019S in these populations, beginning at a time that coincides with the Jewish Diasporas.
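The founder-age estimate above comes from a multimarker maximum-likelihood analysis. The intuition behind haplotype-decay dating can be illustrated with a much simpler single-marker approximation, which is not the authors' method: if each generation preserves the founder's allele at a marker with probability (1 - theta), where theta is the recombination fraction to the mutation, then the excess association delta decays roughly as (1 - theta)^g, so g ≈ log(delta) / log(1 - theta). The function name, the background-frequency correction, and the example inputs are illustrative assumptions; the sketch ignores population growth, marker mutation, and uncertainty in theta.

```python
import math


def generations_since_founder(p_disease, theta, p_background=0.0):
    """Rough number of generations since a founder mutation arose (single marker).

    p_disease: fraction of mutation-carrying chromosomes with the founder marker allele.
    theta: recombination fraction between the marker and the mutation, in (0, 1).
    p_background: frequency of that marker allele on non-carrier chromosomes,
        used to discount alleles shared by chance (hypothetical correction term).
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie in (0, 1)")
    if not 0 <= p_background < 1:
        raise ValueError("p_background must lie in [0, 1)")
    # Excess association above background, rescaled to (0, 1].
    delta = (p_disease - p_background) / (1 - p_background)
    if not 0 < delta <= 1:
        raise ValueError("no excess association with the founder allele")
    # delta ~ (1 - theta) ** g  =>  g = log(delta) / log(1 - theta)
    return math.log(delta) / math.log(1 - theta)
```

Converting the result to years requires multiplying by an assumed generation time, which is a further assumption.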

5.
Sequence variation and haplotype structure at the human HFE locus (total citations: 4; self-citations: 0; citations by others: 4)
Toomajian C, Kreitman M. Genetics, 2002, 161(4): 1609-1623
The HFE locus encodes an HLA class-I-type protein important in iron regulation and segregates replacement mutations that give rise to the most common form of genetic hemochromatosis. The high frequency of one disease-associated mutation, C282Y, and the nature of this disease have led some to suggest a selective advantage for this mutation. To investigate the context in which this mutation arose and gain a better understanding of HFE genetic variation, we surveyed nucleotide variability in 11.2 kb encompassing the HFE locus and experimentally determined haplotypes. We fully resequenced 60 chromosomes of African, Asian, or European ancestry as well as one chimpanzee, revealing 41 variable sites and a nucleotide diversity of 0.08%. This indicates that linkage to the HLA region has not substantially increased the level of HFE variation. Although several haplotypes are shared between populations, one haplotype predominates in Asia but is nearly absent elsewhere, causing higher than average genetic differentiation among the three major populations. Our samples show evidence of intragenic recombination, so the scarcity of recombination events within the C282Y allele class is consistent with selection increasing the frequency of a young allele. Otherwise, the pattern of variability in this region does not clearly indicate the action of positive selection at this or linked loci.

6.
Hybridization is increasingly recognized as an important evolutionary force. Novel genetic methods now enable us to address how the genomes of parental species are combined in hybrid lineages. However, we still do not know the relative importance of admixed proportions, genome architecture and local selection in shaping hybrid genomes. Here, we take advantage of the genetically divergent island populations of Italian sparrow on Crete, Corsica and Sicily to investigate the predictors of genomic variation within a hybrid taxon. We test if differentiation is affected by recombination rate, selection, or variation in ancestry proportions. We find that the relationship between recombination rate and differentiation is less pronounced within hybrid lineages than between the parent species, as expected if purging of minor parent ancestry in low recombination regions reduces the variation available for differentiation. In addition, we find that differentiation between islands is correlated with differences in signatures of selection in two out of three comparisons. Signatures of selection within islands are correlated across all islands, suggesting that shared selection may mould genomic differentiation. The best predictor of strong differentiation within islands is the degree of differentiation from house sparrow, and hence loci with Spanish sparrow ancestry may vary more freely. Jointly, this suggests that constraints and selection interact in shaping the genomic landscape of differentiation in this hybrid species.

7.
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
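LASER 2.0 uses a projection Procrustes analysis to place samples in a low-dimensional reference map. The underlying idea of Procrustes alignment can be illustrated in its simplest, ordinary form: fit the translation, rotation, and uniform scaling that best map one set of 2D coordinates onto another set for the same points, then apply the fitted transform to a new point. This sketch treats 2D points as complex numbers, so the least-squares rotation-plus-scale is a single complex coefficient. It excludes reflections and does not implement the projection variant (mapping a higher-dimensional space onto the reference space) used by LASER 2.0; the function name and coordinates are hypothetical.

```python
def fit_similarity_2d(source, target):
    """Least-squares 2D similarity transform (no reflection) mapping source onto target.

    source, target: equal-length lists of (x, y) coordinates for the same points.
    Returns (c, transform), where c = scale * exp(i * angle) and transform maps (x, y).
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need at least two paired points")
    zs = [complex(x, y) for x, y in source]
    ws = [complex(x, y) for x, y in target]
    # The optimal translation aligns the two centroids.
    z_mean = sum(zs) / len(zs)
    w_mean = sum(ws) / len(ws)
    zc = [z - z_mean for z in zs]
    wc = [w - w_mean for w in ws]
    denom = sum(abs(z) ** 2 for z in zc)
    if denom == 0:
        raise ValueError("source points must not all coincide")
    # Complex least squares: c minimizes sum |c * z - w|^2 over centered points.
    c = sum(z.conjugate() * w for z, w in zip(zc, wc)) / denom

    def transform(point):
        w = c * (complex(*point) - z_mean) + w_mean
        return (w.real, w.imag)

    return c, transform


# Target = source rotated by 90 degrees, scaled by 2, shifted by (3, 4).
c, to_target = fit_similarity_2d([(0, 0), (1, 0), (0, 1)], [(3, 4), (3, 6), (1, 4)])
# abs(c) is the fitted scale and the phase of c the fitted rotation angle.
```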

8.
The single most difficult problem in phylogenetic analysis is deciding whether a shared taxonomic character is due to common ancestry or whether it appeared independently through convergence, parallelism, or reversion to an ancestral state. Mammalian L1 retrotransposons undergo periodic amplifications in which multiple copies of the elements are interspersed in the genome. Because these elements apparently are transmitted only by inheritance and are retained in the genome, a shared L1 amplification event can only be an inherited ancestral character. We propose that L1 amplification events can be an excellent tool for analyzing mammalian evolution and demonstrate here how we addressed several refractory problems in rodent systematics using L1 DNA as a taxonomic character.

9.
Gravel S. Genetics, 2012, 191(2): 607-619
Migrations have played an important role in shaping the genetic diversity of human populations. Understanding genomic data thus requires careful modeling of historical gene flow. Here we consider the effect of relatively recent population structure and gene flow and interpret genomes of individuals that have ancestry from multiple source populations as mosaics of segments originating from each population. This article describes general and tractable models for local ancestry patterns with a focus on the length distribution of continuous ancestry tracts and the variance in total ancestry proportions among individuals. The models offer improved agreement with Wright-Fisher simulation data when compared to the state of the art and can be used to infer time-dependent migration rates from multiple populations. Considering HapMap African-American (ASW) data, we find that a model with two distinct phases of "European" gene flow significantly improves the modeling of both tract lengths and ancestry variances.

10.

Background

Accurate, high-throughput genotyping allows the fine characterization of genetic ancestry. Here we applied recently developed statistical and computational techniques to the question of African ancestry in African Americans by using data on more than 450,000 single-nucleotide polymorphisms (SNPs) genotyped in 94 Africans of diverse geographic origins included in the HGDP, as well as 136 African Americans and 38 European Americans participating in the Atherosclerotic Disease Vascular Function and Genetic Epidemiology (ADVANCE) study. To focus on African ancestry, we reduced the data to include only those genotypes in each African American determined statistically to be African in origin.

Results

From cluster analysis, we found that all the African Americans are admixed in their African components of ancestry, with the majority contributions being from West and West-Central Africa, and only modest variation in these African-ancestry proportions among individuals. Furthermore, by principal components analysis, we found little evidence of genetic structure within the African component of ancestry in African Americans.

Conclusions

These results are consistent with historic mating patterns among African Americans that are largely uncorrelated to African ancestral origins, and they cast doubt on the general utility of mtDNA or Y-chromosome markers alone to delineate the full African ancestry of African Americans. Our results also indicate that the genetic architecture of African Americans is distinct from that of Africans, and that the greatest source of potential genetic stratification bias in case-control studies of African Americans derives from the proportion of European ancestry.

11.
Innan H. Genetics, 2002, 161(2): 865-872
A simple two-locus gene conversion model is considered to investigate the amounts of DNA variation and linkage disequilibrium in small multigene families. The exact solutions for the expectations and variances of the amounts of variation within and between two loci are obtained. It is shown that gene conversion increases the amount of variation within each locus and decreases the amount of variation between two loci. The expectation and variance of the amount of linkage disequilibrium are also obtained. Gene conversion generates positive linkage disequilibrium and the degree of linkage disequilibrium decreases as the recombination rate is increased. Using the theoretical results, a method for estimating the mutation, gene conversion, and recombination parameters is developed and applied to the data of the Amy multigene family in Drosophila melanogaster. The gene conversion rate is estimated to be approximately 60-165 times higher than the mutation rate for synonymous sites.
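Linkage disequilibrium, which this abstract reports gene conversion to generate, is usually summarized from sampled haplotypes with the textbook two-site statistics D = p_AB - p_A p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). A minimal sketch, assuming biallelic sites coded 0/1; it does not reproduce Innan's expectation and variance derivations, the function name is hypothetical, and the sign of D depends on which allele is coded 1.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic sites.

    haplotypes: sequence of (allele_at_site1, allele_at_site2) pairs coded 0/1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom
```

With haplotypes [(1, 1), (1, 1), (0, 0), (0, 0)] the two sites are perfectly associated, so D is at its maximum of 0.25 for these allele frequencies and r^2 is 1.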

12.
The number of distinct functional classes of single-stranded RNAs (ssRNAs) and the number of sequences representing them are substantial and continue to increase. Organizing this data in an evolutionary context is essential, yet traditional comparative sequence analyses require that homologous sites can be identified. This prevents comparative analysis between sequences of different functional classes that share no site-to-site sequence similarity. Analysis within a single evolutionary lineage also limits evolutionary inference because shared ancestry confounds properties of molecular structure and function that are historically contingent with those that are imposed for biophysical reasons. Here, we apply a method of comparative analysis to ssRNAs that is not restricted to homologous sequences, and therefore enables comparison between distantly related or unrelated sequences, minimizing the effects of shared ancestry. This method is based on statistical similarities in nucleotide base composition among different functional classes of ssRNAs. In order to denote base composition unambiguously, we have calculated the fraction G+A and G+U content, in addition to the more commonly used fraction G+C content. These three parameters define RNA composition space, which we have visualized using interactive graphics software. We have examined the distribution of nucleotide composition from 15 distinct functional classes of ssRNAs from organisms spanning the universal phylogenetic tree and artificial ribozymes evolved in vitro. Surprisingly, these distributions are biased consistently in G+A and G+U content, both within and between functional classes, regardless of the more variable G+C content. Additionally, an analysis of the base composition of secondary structural elements indicates that paired and unpaired nucleotides, known to have different evolutionary rates, also have significantly different compositional biases. 
These universal compositional biases observed among ssRNAs sharing little or no sequence similarity suggest, contrary to current understanding, that base composition biases constitute a convergent adaptation among a wide variety of molecular functions.
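The three coordinates of the RNA composition space described in this abstract can be computed directly from a sequence. Because the four base fractions sum to 1, the G+C, G+A, and G+U fractions add up to 2G + 1, so the three coordinates determine all four base fractions, which is why the three parameters together denote composition unambiguously. A minimal sketch; the function names are hypothetical, characters other than A, C, G, U are simply ignored, and this is not the authors' software.

```python
def rna_composition(seq):
    """Return the G+C, G+A and G+U fractions of an RNA sequence (non-ACGU ignored)."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGU"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    g, a, c, u = counts["G"], counts["A"], counts["C"], counts["U"]
    return {"GC": (g + c) / total, "GA": (g + a) / total, "GU": (g + u) / total}


def base_fractions(gc, ga, gu):
    """Recover single-base fractions from the three composition coordinates.

    Since A + C + G + U = 1, GC + GA + GU = 2G + 1, so G = (GC + GA + GU - 1) / 2.
    """
    g = (gc + ga + gu - 1) / 2
    return {"G": g, "C": gc - g, "A": ga - g, "U": gu - g}


comp = rna_composition("GGAU")  # G = 2, A = 1, U = 1, C = 0 out of 4 bases
```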

13.
Human immunodeficiency virus type 1 (HIV-1) is classified into nine subtypes (A to D, F, G, H, J, and K), a number of subsubtypes, and several circulating recombinant forms (CRFs). Due to the high level of genetic diversity within HIV-1 and to its worldwide distribution, this classification system is widely used in fields as diverse as vaccine development, evolution, epidemiology, viral fitness, and drug resistance. Here, we demonstrate how the high recombination rates of HIV-1 may confound the study of its evolutionary history and classification. Our data show that subtype G, currently classified as a pure subtype, has in fact a recombinant history, having evolved following recombination between subtypes A and J and a putative subtype G parent. In addition, we find no evidence for recombination within one of the lineages currently classified as a CRF, CRF02_AG. Our analysis indicates that CRF02_AG was the parent of the recombinant subtype G, rather than the two having the opposite evolutionary relationship, as is currently proposed. Our results imply that the current classification of HIV-1 subtypes and CRFs is an artifact of sampling history, rather than reflecting the evolutionary history of the virus. We suggest a reanalysis of all pure subtypes and CRFs in order to better understand how high rates of recombination have influenced HIV-1 evolutionary history.

14.
Estimating the mutation rate, or equivalently effective population size, is a common task in population genetics. If recombination is low or high, optimal linear estimation methods are known and well understood. For intermediate recombination rates, the calculation of optimal estimators is more challenging. As an alternative to model-based estimation, neural networks and other machine learning tools could help to develop good estimators in these more complex scenarios. However, if no benchmark is available it is difficult to assess how well suited these tools are for different applications in population genetics. Here we investigate feedforward neural networks for the estimation of the mutation rate based on the site frequency spectrum and compare their performance with model-based estimators. For this we use the model-based estimators introduced by Fu, Futschik et al., and Watterson that minimize the variance or mean squared error for no and free recombination. We find that neural networks reproduce these estimators if provided with the appropriate features and training sets. Remarkably, using the model-based estimators to adjust the weights of the training data, only one hidden layer is necessary to obtain a single estimator that performs almost as well as model-based estimators for low and high recombination rates, and at the same time provides a superior estimation method for intermediate recombination rates. We apply the method to simulated data based on the human chromosome 2 recombination map, highlighting its robustness in a realistic setting where local recombination rates vary and/or are unknown.
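The linear estimators this abstract refers to are weighted sums of the site frequency spectrum (SFS): xi_i is the number of segregating sites at which the derived allele appears in i of n sampled sequences. Watterson's estimator (S / a_n, with a_n = sum of 1/i for i = 1..n-1) and the average pairwise difference (weights i(n - i) / C(n, 2)) both have this form. A minimal sketch, assuming an unfolded SFS with a known ancestral state; the values are per-region, not per-site, the weights of Fu's and Futschik et al.'s optimal estimators are not reproduced, and the function names and counts are hypothetical.

```python
def linear_theta(sfs, weights):
    """Generic linear SFS estimator: sum over i of w_i * xi_i."""
    if len(sfs) != len(weights):
        raise ValueError("sfs and weights must have the same length")
    return sum(w * xi for w, xi in zip(weights, sfs))


def watterson_weights(n):
    """Constant weights 1 / a_n, so the estimate equals S / a_n."""
    a_n = sum(1.0 / i for i in range(1, n))
    return [1.0 / a_n] * (n - 1)


def pairwise_weights(n):
    """Weights i(n - i) / C(n, 2): the average number of pairwise differences."""
    pairs = n * (n - 1) / 2
    return [i * (n - i) / pairs for i in range(1, n)]


# Hypothetical unfolded SFS for n = 4 sequences: xi_1, xi_2, xi_3.
sfs = [4, 2, 1]
n = len(sfs) + 1
theta_w = linear_theta(sfs, watterson_weights(n))  # 7 / (1 + 1/2 + 1/3) = 42/11
theta_pi = linear_theta(sfs, pairwise_weights(n))  # (12 + 8 + 3) / 6 = 23/6
```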

15.
Andolfatto P, Przeworski M. Genetics, 2000, 156(1): 257-268
We analyze nucleotide polymorphism data for a large number of loci in areas of normal to high recombination in Drosophila melanogaster and D. simulans (24 and 16 loci, respectively). We find a genome-wide, systematic departure from the neutral expectation for a panmictic population at equilibrium in natural populations of both species. The distribution of sequence-based estimates of 2Nc across loci is inconsistent with the assumptions of the standard neutral theory, given the observed levels of nucleotide diversity and accepted values for recombination and mutation rates. Under these assumptions, most estimates of 2Nc are severalfold too low; in other words, both species exhibit greater intralocus linkage disequilibrium than expected. Variation in recombination or mutation rates is not sufficient to account for the excess of linkage disequilibrium. While an equilibrium island model does not seem to account for the data, more complicated forms of population structure may. A proper test of alternative demographic models will require loci to be sampled in a more consistent fashion.

16.
Throughout the living world, genetic recombination and nucleotide substitution are the primary processes that create the genetic variation upon which natural selection acts. Just as analyses of substitution patterns can reveal a great deal about evolution, so too can analyses of recombination. Evidence of genetic recombination within the genomes of apparently asexual species can equate with evidence of cryptic sexuality. In sexually reproducing species, nonrandom patterns of sequence exchange can provide direct evidence of population subdivisions that prevent certain individuals from mating. Although an interesting topic in its own right, an important reason for analysing recombination is to account for its potentially disruptive influences on various phylogenetic-based molecular evolution analyses. Specifically, the evolutionary histories of recombinant sequences cannot be accurately described by standard bifurcating phylogenetic trees. Taking recombination into account can therefore be pivotal to the success of selection, molecular clock and various other analyses that require adequate modelling of shared ancestry and draw increased power from accurately inferred phylogenetic trees. Here, we review various computational approaches to studying recombination and provide guidelines both on how to gain insights into this important evolutionary process and on how it can be properly accounted for during molecular evolution studies.

17.
The nucleotide composition of the genome is a balance between the origin and fixation rates of different mutations. For example, it is well-known that transitions occur more frequently than transversions, particularly at CpG sites. Differences in fixation rates of mutation types are less explored. Specifically, recombination-associated GC-biased gene conversion (gBGC) may differentially impact GC-changing mutations, due to differences in their genomic distributions and efficiency of mismatch repair mechanisms. Given that recombination evolves rapidly across species, we explore gBGC of different mutation types across human populations and great ape species. We report a stronger correlation between segregating GC frequency and recombination for transitions than for transversions. Notably, CpG transitions are most strongly affected by gBGC in humans and chimpanzees. We show that the overall strength of gBGC is generally correlated with effective population sizes in humans, with some notable exceptions, such as a stronger effect of gBGC on non-CpG transitions in populations of European descent. Furthermore, species of the Gorilla and Pongo genera have a greatly reduced gBGC effect on CpG sites. We also study the dependence of gBGC dynamics on flanking nucleotides and show that some mutation types evolve in opposition to the gBGC expectation, likely due to the hypermutability of specific nucleotide contexts. Our results highlight the importance of different gBGC dynamics experienced by GC-changing mutations and their impact on nucleotide composition evolution.
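The mutation categories compared in this abstract can be made concrete with a small classifier for a single-nucleotide change: transition (purine to purine or pyrimidine to pyrimidine) versus transversion; the GC-changing direction, where weak (A/T) to strong (G/C) changes are the ones gBGC is expected to favor; and whether the site is a CpG (a C followed by G, or a G preceded by C). A minimal sketch; the function name and output layout are hypothetical, it looks only at the reference strand and the immediate flanking bases, and it is not the authors' pipeline.

```python
BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
WEAK = {"A", "T"}    # pair with two hydrogen bonds
STRONG = {"C", "G"}  # pair with three hydrogen bonds


def classify_snv(ref, alt, left="", right=""):
    """Classify a single-nucleotide change ref -> alt on the reference strand.

    left/right: optional flanking reference bases, used only for CpG status.
    """
    ref, alt, left, right = ref.upper(), alt.upper(), left.upper(), right.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("ref and alt must be two different bases from ACGT")
    # Transition: both purines or both pyrimidines.
    transition = (ref in PURINES) == (alt in PURINES)
    if ref in WEAK and alt in STRONG:
        gc_change = "W->S"  # increases GC; the direction favored by gBGC
    elif ref in STRONG and alt in WEAK:
        gc_change = "S->W"  # decreases GC
    else:
        gc_change = "none"  # GC-conservative (W->W or S->S)
    # CpG site: a C followed by G, or a G preceded by C.
    cpg = (ref == "C" and right == "G") or (ref == "G" and left == "C")
    return {"transition": transition, "gc_change": gc_change, "cpg": cpg}
```

For example, classify_snv("C", "T", left="A", right="G") describes a CpG transition that lowers GC content.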

18.
Genealogical inference from genetic data is essential for a variety of applications in human genetics. In genome-wide and sequencing association studies, for example, accurate inference on both recent genetic relatedness, such as family structure, and more distant genetic relatedness, such as population structure, is necessary for protection against spurious associations. Distinguishing familial relatedness from population structure with genotype data, however, is difficult because both manifest as genetic similarity through the sharing of alleles. Existing approaches for inference on recent genetic relatedness have limitations in the presence of population structure, where they either (1) make strong and simplifying assumptions about population structure, which are often untenable, or (2) require correct specification of and appropriate reference population panels for the ancestries in the sample, which might be unknown or not well defined. Here, we propose PC-Relate, a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and IBD sharing probabilities, in the presence of unspecified structure. PC-Relate uses principal components calculated from genome-screen data to partition genetic correlations among sampled individuals due to the sharing of recent ancestors and more distant common ancestry into two separate components, without requiring specification of the ancestral populations or reference population panels. In simulation studies with population structure, including admixture, we demonstrate that PC-Relate provides accurate estimates of genetic relatedness and improved relationship classification over widely used approaches. We further demonstrate the utility of PC-Relate in applications to three ancestrally diverse samples that vary in both size and genealogical complexity.

19.
J M Smith. Genetics, 1999, 153(2): 1021-1027
There are two types of recombination that we may wish to detect: rare recombinants between members of different populations or species and repeated recombination within a population. Methods appropriate in the former context are inappropriate in the latter because they depend on recognizing the existence of runs of nucleotides with similar ancestry. If recombination is sufficiently frequent, no such runs will be present. Several methods, including the homoplasy test and the incompatibility test, are described that are appropriate for detecting repeated recombination and for measuring its importance, relative to mutation, in causing genetic change. The sensitivity of these tests is investigated by simulating populations with varying frequencies of mutation and recombination and calculating the various statistics on samples.
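Compatibility-based approaches such as the incompatibility test rest on a pairwise check: under an infinite-sites model without recombination, two biallelic sites can show at most three of the four possible two-site combinations, so observing all four implies recombination or homoplasy. A minimal sketch of that four-gamete check, assuming aligned sequences of equal length; how Smith's statistics summarize these pairs and assess significance by simulation is not reproduced, and the function name is hypothetical.

```python
from itertools import combinations


def incompatible_pairs(sequences):
    """Return index pairs (j, k) of biallelic columns showing all four two-site combinations."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns with exactly two distinct states.
    biallelic = [j for j in range(length) if len({s[j] for s in sequences}) == 2]
    pairs = []
    for j, k in combinations(biallelic, 2):
        gametes = {(s[j], s[k]) for s in sequences}
        if len(gametes) == 4:  # all four combinations present: incompatible
            pairs.append((j, k))
    return pairs
```

The number of incompatible pairs, relative to the number of informative pairs, gives a simple descriptive summary of how much recombination or homoplasy a sample contains.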

20.
Familial amyloidosis, Finnish type (FAP IV) was identified clinically in an American kindred with Scandinavian ancestry. A polymerase chain reaction (PCR)-based DNA diagnostic assay was used to identify a G-to-A mutation at position 654 of the gelsolin cDNA (G654A) in this family. Molecular diagnostic testing demonstrated the mutation in individuals in three generations: the clinically affected proband, her deceased clinically affected father, and her presumably affected presymptomatic child. This report represents a rare example of FAP IV and the G654A mutation identified in a family outside Finland. The disease-associated haplotype was similar to that observed in Finnish FAP IV families, suggesting common distant ancestry.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417