Similar literature
Found 20 similar articles (search time: 984 ms)
1.
Sequencing reduced‐representation libraries of restriction site‐associated DNA (RADseq) to identify single nucleotide polymorphisms (SNPs) is quickly becoming a standard methodology for molecular ecologists. Because of the scale of RADseq data sets, putative loci cannot be assessed individually, making the process of filtering noise and correctly identifying biologically meaningful signal more difficult. Artefacts introduced during library preparation and/or bioinformatic processing of SNP data can create patterns that are incorrectly interpreted as indicative of population structure or natural selection. Therefore, it is crucial to carefully consider types of errors that may be introduced during laboratory work and data processing, and how to minimize, detect and remove these errors. Here, we discuss issues inherent to RADseq methodologies that can result in artefacts during library preparation and locus reconstruction resulting in erroneous SNP calls and, ultimately, genotyping error. Further, we describe steps that can be implemented to create a rigorously filtered data set consisting of markers accurately representing independent loci and compare the effect of different combinations of filters on four RAD data sets. Finally, we stress the importance of publishing raw sequence data along with final filtered data sets in addition to detailed documentation of filtering steps and quality control measures.

2.
In recent years multilocus data sets have been used to study the demographic history of human populations. In this paper (1) analyses previously done on 60 short tandem repeat (STR) loci are repeated on 30 restriction site polymorphism (RSP) markers; (2) relative population weights are estimated from the RSP data set and compared to previously published estimates from STR and craniometric data sets; and (3) computer simulations are performed to show the effects of ascertainment bias on relative population weight estimates. Not surprisingly, given that the RSP markers were originally identified in a small panel of Caucasians, estimates of relative population weights are biased and the European population weight is artificially inflated. However, the effects of ascertainment bias are not apparent in a principal components plot or estimates of FST. Ascertainment bias can have a large effect in other genetic systems with inherently low heterozygosity such as Alus or single nucleotide polymorphisms (SNPs), and care must be taken to have prior knowledge of how polymorphic markers in a given data set were originally identified. Otherwise, results can be skewed and interpretations faulty.

3.
Restriction‐enzyme‐based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction‐enzyme‐based methods remain largely unknown. Here, we estimated genotyping error rates in SNPs genotyped with double digest RAD sequencing from Mendelian incompatibilities in known mother–offspring dyads of Hoffman's two‐toed sloth (Choloepus hoffmanni) across a range of coverage and sequence quality criteria, for both reference‐aligned and de novo‐assembled data sets. Genotyping error rates were more sensitive to coverage than sequence quality and low coverage yielded high error rates, particularly in de novo‐assembled data sets. For example, coverage ≥5 yielded median genotyping error rates of ≥0.03 and ≥0.11 in reference‐aligned and de novo‐assembled data sets, respectively. Genotyping error rates declined to ≤0.01 in reference‐aligned data sets with a coverage ≥30, but remained ≥0.04 in the de novo‐assembled data sets. We observed approximately 10‐ and 13‐fold declines in the number of loci sampled in the reference‐aligned and de novo‐assembled data sets when coverage was increased from ≥5 to ≥30 at quality score ≥30, respectively. Finally, we assessed the effects of genotyping coverage on a common population genetic application, parentage assignments, and showed that the proportion of incorrectly assigned maternities was relatively high at low coverage. Overall, our results suggest that the trade‐off between sample size and genotyping error rates be considered prior to building sequencing libraries, that reporting genotyping error rates become standard practice, and that effects of genotyping errors on inference be evaluated in restriction‐enzyme‐based SNP studies.
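
The idea of detecting genotyping errors from Mendelian incompatibilities in mother–offspring dyads can be illustrated with a minimal sketch. It assumes biallelic SNPs coded as 0/1/2 copies of the alternate allele, with `None` for missing calls; the function name and the example genotypes are hypothetical, and the abstract does not say how the authors turned incompatibilities into error rates, so this only counts opposite-homozygote mismatches.

```python
from typing import Optional, Sequence


def mendelian_incompatibility_rate(
    mother: Sequence[Optional[int]],
    offspring: Sequence[Optional[int]],
) -> float:
    """Fraction of jointly genotyped biallelic SNPs at which a mother and her
    offspring are opposite homozygotes (0 vs 2), which Mendelian inheritance
    cannot produce without a genotyping error or mutation."""
    if len(mother) != len(offspring):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    incompatible = 0
    for m, o in zip(mother, offspring):
        if m is None or o is None:  # skip loci missing in either individual
            continue
        compared += 1
        if abs(m - o) == 2:  # 0 vs 2: the pair shares no allele
            incompatible += 1
    if compared == 0:
        raise ValueError("no loci genotyped in both individuals")
    return incompatible / compared


# Hypothetical dyad: 6 loci, one of them missing in the mother.
mother = [0, 1, 2, 2, None, 0]
offspring = [0, 2, 0, 1, 1, 2]
rate = mendelian_incompatibility_rate(mother, offspring)
print(f"incompatible fraction: {rate:.2f}")
```

Because only opposite homozygotes are detectable in a dyad, a count like this is a lower bound on the true per-genotype error rate.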

4.
Biallelic markers such as single nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms have become increasingly popular markers for various population genetics applications. However, the effort required to develop biallelic markers in nonmodel organisms is still substantial. In this study, we compared the estimation of various population genetic parameters (genetic divergence and structuring, isolation-by-distance, genetic diversity) using a limited number of biallelic markers (in total 7 loci) to those estimated with 14 microsatellite loci in 21 Atlantic salmon (Salmo salar) populations from northern Europe. Pairwise FST values were significantly correlated between biallelic loci and microsatellite datasets, as was overall heterozygosity when both anadromous and nonanadromous populations were analyzed together. However, when the anadromous and nonanadromous samples were analyzed separately, only genetic divergence correlations remained significant. Biallelic markers alone were not sufficient for reliable neighbor-joining clustering of populations but gave highly similar isolation-by-distance signals when compared with microsatellites. Finally, although several population prioritization measures for conservation exhibited significant correlation between different marker types, the specific populations highlighted as being most valuable for conservation purposes varied depending on the marker type and conservation criteria applied. This study demonstrates that a relatively small set of biallelic markers can be sufficient for obtaining concordant results in most of the analyses compared with microsatellites, although estimates of genetic distance are generally more concordant than estimates of genetic diversity. This suggests that a relatively small number of biallelic markers can provide useful information for various population genetic applications. 
However, we emphasize that the use of a much higher number of loci is preferable, especially when the genetic differences between populations are subtle or individual multilocus genotype-based analyses are to be performed.

5.
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the ‘Golden Delicious’ genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.

6.
We propose a simple model of evolution at a pair of SNP loci under mutation, genetic drift and recombination. The model allows the evolution of SNPs to be considered under different demographic scenarios. We applied it to SNP data containing polymorphisms spanning 19 gene regions. We initially matched the linkage disequilibrium (LD) data only, and then reconciled both LD and heterozygosity data. The imbalance between LD and heterozygosity data, observed for some of the analyzed genomic regions, may be a signature of selection acting in these regions. However, assuming neutrality, we obtain estimates of the age of population expansion of modern humans that are consistent with the consensus estimates. In addition, we are able to estimate the ages of the polymorphisms observed in different genomic regions, and we find that they vary widely in age. Polymorphisms at loci implicated in human disease seem to be younger than average. Our results supplement the conclusions originally obtained by Reich and co-workers for the same set of data.

7.
Gene flow between diverging populations experiencing dissimilar ecological conditions can theoretically constrain adaptive evolution. To minimize the effect of gene flow, alleles underlying traits essential for local adaptation are predicted to be located in linked genome regions with reduced recombination. Local reduction in gene flow caused by selection is expected to produce elevated divergence in these regions. The highly divergent crab‐adapted and wave‐adapted ecotypes of the marine snail Littorina saxatilis present a model system to test these predictions. We used genome‐wide association (GWA) analysis of geometric morphometric shell traits associated with microgeographic divergence between the two L. saxatilis ecotypes within three separate sampling sites. A total of 477 snails that had individual geometric morphometric data and individual genotypes at 4,066 single nucleotide polymorphisms (SNPs) were analyzed using GWA methods that corrected for population structure among the three sites. This approach allowed dissection of the genomic architecture of shell shape divergence between ecotypes across a wide geographic range, spanning two glacial lineages. GWA revealed 216 quantitative trait loci (QTL) with shell size or shape differences between ecotypes, with most loci explaining a small proportion of phenotypic variation. We found that QTL were evenly distributed across 17 linkage groups, and exhibited elevated interchromosomal linkage, suggesting a genome‐wide response to divergent selection on shell shape between the two ecotypes. Shell shape trait‐associated loci showed partial overlap with previously identified outlier loci under divergent selection between the two ecotypes, supporting the hypothesis of diversifying selection on these genomic regions. 
These results suggest that divergence in shell shape between the crab‐adapted and wave‐adapted ecotypes is produced predominantly by a polygenic genomic architecture with positive linkage disequilibrium among loci of small effect.

8.
Likelihood analysis of ongoing gene flow and historical association (total citations: 3; self-citations: 0; citations by others: 3)
We develop a Monte Carlo-based likelihood method for estimating migration rates and population divergence times from data at unlinked loci at which mutation rates are sufficiently low that, in the recent past, the effects of mutation can be ignored. The method is applicable to restriction fragment length polymorphisms (RFLPs) and single nucleotide polymorphisms (SNPs) sampled from a subdivided population. The method produces joint maximum-likelihood estimates of the migration rate and the time of population divergence, both scaled by population size, and provides a framework in which to test either for no ongoing gene flow or for population divergence in the distant past. We show the method performs well and provides reasonably accurate estimates of parameters even when the assumptions under which those estimates are obtained are not completely satisfied. Furthermore, we show that, provided that the number of polymorphic loci is sufficiently large, there is some power to distinguish between ongoing gene flow and historical association as causes of genetic similarity between pairs of populations.

9.
Restriction‐site‐associated DNA sequencing (RAD‐seq) and related methods are revolutionizing the field of population genomics in nonmodel organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD‐seq data analyses rely on assumptions about the nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under‐ or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as a case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD‐seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD‐seq data analysis strategies on population structure inferences that are directly applicable to other species.

10.
Quantitative traits important to organismal function and fitness, such as brain size, are presumably controlled by many small‐effect loci. Deciphering the genetic architecture of such traits with traditional quantitative trait locus (QTL) mapping methods is challenging. Here, we investigated the genetic architecture of brain size (and the size of five different brain parts) in nine‐spined sticklebacks (Pungitius pungitius) with the aid of novel multilocus QTL‐mapping approaches based on a de‐biased LASSO method. Apart from having more statistical power to detect QTL and a reduced rate of false positives compared with conventional QTL‐mapping approaches, the developed methods can handle large marker panels and provide estimates of genomic heritability. Single‐locus analyses of an F2 interpopulation cross with 239 individuals and 15,198 fully informative single nucleotide polymorphisms (SNPs) uncovered 79 QTL associated with variation in stickleback brain size traits. Many of these loci were in strong linkage disequilibrium (LD) with each other, and consequently, a multilocus mapping of individual SNPs, accounting for LD structure in the data, recovered only four significant QTL. However, a multilocus mapping of SNPs grouped by linkage group (LG) identified 14 LGs (1–6 depending on the trait) that influence variation in brain traits. For instance, 17.6% of the variation in relative brain size was explainable by cumulative effects of SNPs distributed over six LGs, whereas 42% of the variation was accounted for by all 21 LGs. Hence, the results suggest that variation in stickleback brain traits is influenced by many small‐effect loci. Apart from suggesting moderately heritable (h2 ≈ 0.15–0.42) multifactorial genetic architecture of brain traits, the results highlight the challenges in identifying the loci contributing to variation in quantitative traits.
Nevertheless, the results demonstrate that the novel QTL‐mapping approach developed here has distinctive advantages over the traditional QTL‐mapping methods in analyses of dense marker panels.

11.
Many studies in human genetics compare informativeness of single‐nucleotide polymorphisms (SNPs) and microsatellites (simple sequence repeats; SSR) in genome scans, but it is difficult to transfer the results directly to livestock because of different population structures. The aim of this study was to determine the number of SNPs needed to obtain the same differentiation power as with a given standard set of microsatellites. Eight chicken breeds were genotyped for 29 SSRs and 9216 SNPs. After filtering, only 2931 SNPs remained. The differentiation power was evaluated using two methods: partitioning of the Euclidean distance matrix based on a principal component analysis (PCA) and a Bayesian model‐based clustering approach. Generally, with PCA‐based partitioning, 70 SNPs provide a comparable resolution to 29 SSRs. In model‐based clustering, the similarity coefficient showed significantly higher values between repeated runs for SNPs compared to SSRs. For the membership coefficients, reflecting the fraction of the genome that belongs to the ith cluster, the highest values were obtained for 29 SSRs and 100 SNPs, respectively. With a low number of loci (29 SSRs or ≤100 SNPs), neither marker type could detect the admixture in the Gödöllö Nhx population. Using more than 250 SNPs allowed a more detailed insight into the genetic architecture, so that the admixed population could be detected. It is concluded that breed differentiation studies will substantially gain power even with moderate numbers of SNPs.

12.
Family-based candidate gene and genome-wide association studies are a logical progression from linkage studies for the identification of genes and polymorphisms underlying complex traits. An efficient way to analyse phenotypic and genotypic data is to model linkage and association simultaneously. An important result from such an analysis is whether any evidence for linkage remains after fitting polymorphisms at candidate genes (residual linkage), because this may indicate locus and allelic heterogeneity in the population and will influence subsequent molecular strategies. Here we report that substantial residual linkage is to be expected, even under genetic homogeneity and when the underlying causal polymorphisms are genotyped and fitted in the model. We simulated a powerful design to detect linkage to quantitative trait loci, with 5, 10 or 20 causal SNPs spread throughout the genome. These SNPs were responsible for all genetic variation, and hence for both linkage and association. Residual linkage at the largest linkage peak from a genome-wide scan was substantial, with mean LOD scores of 0.4, 0.7, and 1.4 for the case of 5, 10 and 20 underlying causal SNPs, respectively. For less powerful designs, the proportion of the original LOD scores that remains after association will be even larger. All cases of ‘significant’ residual linkage are false positives. The reason for the apparent paradox of detecting residual linkage after fitting causal polymorphisms is that the linkage signals at the largest peaks in a genome-scan are severely inflated, even if all peaks correspond to true linkage. Our findings are general and apply to linkage mapping of any phenotype and to any pedigree structure.

13.
Laura E. Timm. Molecular Ecology, 2020, 29(12): 2133-2136
From its inception, population genetics has been nearly as concerned with the genetic data type, to which analyses are brought to bear, as it is with the analysis methods themselves. The field has traversed allozymes, microsatellites, segregating sites in multilocus alignments and, currently, single nucleotide polymorphisms (SNPs) generated by high‐throughput genomic sequencing methods, primarily whole genome sequencing and reduced representation library (RRL) sequencing. As each emerging data type has gained traction, it has been compared to existing methods, based on its relative ability to discern population structural complexity at increasing levels of resolution. However, this is usually done by comparing the gold standard in one data type to the gold standard in the new data type. These gold standards frequently differ in power and in sampling density, both across a genome and throughout a spatial range. In this issue of Molecular Ecology, D’Aloia et al. apply the high‐throughput approach as fully as possible to microsatellites, nuclear loci and SNPs genotyped through an RRL method; this is coupled with a spatially dense sampling scheme. Completing a battery of population genetics analyses across data types (including a series of down‐sampled data sets), the authors find that SNP data are slightly more sensitive to fine‐scale genetic structure, and the results are more resilient to down‐sampling than microsatellites and nonrepetitive nuclear loci. However, their results are far from an unqualified victory for RRL SNP data over all previous data types: the authors note that modest additions to the microsatellites and nuclear loci data sets may provide the necessary analytical power to delineate the fine‐scale genetic structuring identified by SNPs. As always, as the field begins to fully embrace the newest thing, good science reminds us that traditional data types are far from useless, especially when combined with a well‐designed sampling scheme.

14.
Advances in genomic techniques are greatly facilitating the study of molecular signatures of selection in diverging natural populations. Connecting these signatures to phenotypes under selection remains challenging, but benefits from dissections of the genetic architecture of adaptive divergence. We here perform quantitative trait locus (QTL) mapping using 488 F2 individuals and 2011 single nucleotide polymorphisms (SNPs) to explore the genetic architecture of skeletal divergence in a lake‐stream stickleback system from Central Europe. We find QTLs for gill raker, snout, and head length, vertebral number, and the extent of lateral plating (plate number and height). Although two large‐effect loci emerge, QTL effect sizes are generally small. Examining the neighborhood of the QTL‐linked SNPs identifies several genes involved in bone formation, which emerge as strong candidate genes for skeletal evolution. Finally, we use SNP data from the natural source populations to demonstrate that some SNPs linked to QTLs in our cross also exhibit striking allele frequency differences in the wild, suggesting a causal role of these QTLs in adaptive population divergence. Our study paves the way for comparative analyses across other (lake‐stream) stickleback populations, and for functional investigations of the candidate genes.

15.
Assessments of population genetic structure and demographic history have traditionally been based on neutral markers while explicitly excluding adaptive markers. In this study, we compared the utility of putatively adaptive and neutral single‐nucleotide polymorphisms (SNPs) for inferring mountain pine beetle population structure across its geographic range. Both adaptive and neutral SNPs, and their combination, allowed range‐wide structure to be distinguished and delimited a population that has recently undergone range expansion across northern British Columbia and Alberta. Using an equal number of both adaptive and neutral SNPs revealed that adaptive SNPs resulted in a stronger correlation between sampled populations and inferred clustering. Our results suggest that adaptive SNPs should not be excluded prior to analysis, as combining them with neutral SNPs resulted in better resolution of genetic differentiation between populations than either marker set alone. These results demonstrate the utility of adaptive loci for resolving population genetic structure in a nonmodel organism.

16.
Effectiveness of computational methods in haplotype prediction (total citations: 11; self-citations: 0; citations by others: 11)
Haplotype analysis has been used for narrowing down the location of disease-susceptibility genes and for investigating many population processes. Computational algorithms have been developed to estimate haplotype frequencies and to predict haplotype phases from genotype data for unrelated individuals. However, the accuracy of such computational methods needs to be evaluated before their applications can be advocated. We have experimentally determined the haplotypes at two loci, the N-acetyltransferase 2 gene (NAT2, 850 bp, n = 81) and a 140-kb region on chromosome X (n = 77), each consisting of five single nucleotide polymorphisms (SNPs). We empirically evaluated and compared the accuracy of the subtraction method, the expectation-maximization (EM) method, and the PHASE method in haplotype frequency estimation and in haplotype phase prediction. Where there was near complete linkage disequilibrium (LD) between SNPs (the NAT2 gene), all three methods provided effective and accurate estimates for haplotype frequencies and individual haplotype phases. For a genomic region in which marked LD was not maintained (the chromosome X locus), the computational methods were adequate in estimating overall haplotype frequencies. However, none of the methods was accurate in predicting individual haplotype phases. The EM and the PHASE methods provided better estimates for overall haplotype frequencies than the subtraction method for both genomic regions.
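
To make the expectation-maximization (EM) approach concrete, here is a minimal sketch for the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. It is a textbook two-locus EM under random mating, not the software evaluated in the study (which handled five SNPs); the function name, the 0/1/2 genotype coding and the example data are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Haplotype = Tuple[int, int]  # allele (0 or 1) at SNP 1 and at SNP 2


def em_haplotype_frequencies(
    genotypes: List[Tuple[int, int]], n_iter: int = 100, tol: float = 1e-10
) -> Dict[Haplotype, float]:
    """Estimate two-SNP haplotype frequencies from unphased genotypes
    (each coded 0/1/2 alternate-allele copies), assuming random mating."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:  # phase is ambiguous: 00/11 or 01/10
            n_double_het += 1
            continue
        # At most one locus is heterozygous, so both haplotypes are known.
        a1 = (0, 1) if g1 == 1 else (g1 // 2, g1 // 2)
        a2 = (0, 1) if g2 == 1 else (g2 // 2, g2 // 2)
        base[(a1[0], a2[0])] += 1
        base[(a1[1], a2[1])] += 1
    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in base}  # uninformative starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 ("cis").
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add the expected haplotype counts and renormalise.
        counts = dict(base)
        counts[(0, 0)] += n_double_het * p_cis
        counts[(1, 1)] += n_double_het * p_cis
        counts[(0, 1)] += n_double_het * (1 - p_cis)
        counts[(1, 0)] += n_double_het * (1 - p_cis)
        new = {h: c / total for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if delta < tol:
            break
    return freqs


# Hypothetical sample in strong LD: two homozygotes and one double heterozygote.
freqs = em_haplotype_frequencies([(0, 0), (2, 2), (1, 1)])
print(freqs)
```

The sketch estimates population-level frequencies only; assigning a phase to each individual is a separate step, and the abstract reports that this step is the unreliable one when LD is weak.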

17.
The conservation and management of endangered species requires information on their genetic diversity, relatedness and population structure. The main genetic markers applied for these questions are microsatellites and single nucleotide polymorphisms (SNPs), the latter of which remain the more resource demanding approach in most cases. Here, we compare the performance of two approaches, SNPs obtained by restriction‐site‐associated DNA sequencing (RADseq) and 16 DNA microsatellite loci, for estimating genetic diversity, relatedness and genetic differentiation of three small, geographically close wild brown trout (Salmo trutta) populations and a regionally used hatchery strain. The genetic differentiation, quantified as FST, was similar when measured using 16 microsatellites and 4,876 SNPs. Based on both marker types, each brown trout population represented a distinct gene pool with a low level of interbreeding. Analysis of SNPs identified half‐ and full‐siblings with a higher probability than the analysis based on microsatellites, and SNPs outperformed microsatellites in estimating individual‐level multilocus heterozygosity. Overall, the results indicated that moderately polymorphic microsatellites and SNPs from RADseq agreed on estimates of population genetic structure in moderately diverged, small populations, but RADseq outperformed microsatellites for applications that required individual‐level genotype information, such as quantifying relatedness and individual‐level heterozygosity. The results can be applied to other small populations with low or moderate levels of genetic diversity.
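
As a reminder of what FST measures, the sketch below computes Wright's FST for a single biallelic locus as (H_T − H_S) / H_T from subpopulation allele frequencies, weighting populations equally. The abstract does not state which FST estimator the authors used, so this is an illustrative definition only; the function name and example frequencies are hypothetical, and no sample-size correction is applied.

```python
from typing import Sequence


def wright_fst(allele_freqs: Sequence[float]) -> float:
    """Wright's FST at one biallelic locus, given the frequency of one allele
    in each subpopulation (equal weights, no sample-size correction)."""
    if len(allele_freqs) < 2:
        raise ValueError("need at least two subpopulations")
    k = len(allele_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / k
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("locus is monomorphic across all subpopulations")
    return (h_t - h_s) / h_t


# Hypothetical locus with strongly diverged allele frequencies.
fst = wright_fst([0.2, 0.8])
print(f"FST = {fst:.2f}")
```

Multilocus estimates are usually obtained by combining numerators and denominators across loci rather than averaging per-locus ratios.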

18.
Inference of intraspecific population divergence patterns typically requires genetic data for molecular markers with relatively high mutation rates. Microsatellites, or short tandem repeat (STR) polymorphisms, have proven informative in many such investigations. These markers are characterized, however, by high levels of homoplasy and varying mutational properties, often leading to inaccurate inference of population divergence. A SNPSTR is a genetic system that consists of an STR polymorphism closely linked (typically < 500 bp) to one or more single-nucleotide polymorphisms (SNPs). SNPSTR systems are characterized by lower levels of homoplasy than are STR loci. Divergence time estimates based on STR variation (on the derived SNP allele background) should, therefore, be more accurate and precise. We use coalescent-based simulations in the context of several models of demographic history to compare divergence time estimates based on SNPSTR haplotype frequencies and STR allele frequencies. We demonstrate that estimates of divergence time based on STR variation on the background of a derived SNP allele are more accurate (3% to 7% bias for SNPSTR versus 11% to 20% bias for STR) and more precise than STR-based estimates, conditional on a recent SNP mutation. These results hold even for models involving complex demographic scenarios with gene flow, population expansion, and population bottlenecks. Varying the timing of the mutation event generating the SNP revealed that estimates of divergence time are sensitive to SNP age, with more recent SNPs giving more accurate and precise estimates of divergence time. However, varying both mutational properties of STR loci and SNP age demonstrated that multiple independent SNPSTR systems provide less biased estimates of divergence time. Furthermore, the combination of estimates based separately on STR and SNPSTR variation provides insight into the age of the derived SNP alleles. 
In light of our simulations, we interpret estimates from data for human populations.

19.
Single nucleotide polymorphisms (SNPs) are rapidly becoming the marker of choice in population genetics due to a variety of advantages relative to other markers, including higher genomic density, data quality, reproducibility and genotyping efficiency, as well as ease of portability between laboratories. Advances in sequencing technology and methodologies to reduce genomic representation have made the isolation of SNPs feasible for nonmodel organisms. RNA‐seq is one such technique for the discovery of SNPs and development of markers for large‐scale genotyping. Here, we report the development of 192 validated SNP markers for parentage analysis in Tripterygion delaisi (the black‐faced blenny), a small rocky‐shore fish from the Mediterranean Sea. RNA‐seq data for 15 individual samples were used for SNP discovery by applying a series of selection criteria. Genotypes were then collected from 1599 individuals from the same population with the resulting loci. Differences in heterozygosity and allele frequencies were found between the two data sets. Heterozygosity was lower, on average, in the population sample, and the mean difference between the frequencies of particular alleles in the two data sets was 0.135 ± 0.100. We used bootstrap resampling of the sequence data to predict appropriate sample sizes for SNP discovery. As cDNA library production is time‐consuming and expensive, we suggest that using seven individuals for RNA sequencing reduces the probability of discarding highly informative SNP loci, due to lack of observed polymorphism, whereas use of more than 12 samples does not considerably improve prediction of true allele frequencies.
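
The point about discovery-panel size can be illustrated analytically. The authors used bootstrap resampling of their sequence data; the sketch below swaps in a simpler binomial argument instead, assuming the 2n allele copies in n diploid individuals are drawn independently. Under that assumption, a SNP with minor allele frequency q looks monomorphic with probability q^(2n) + (1 − q)^(2n). The function name and the example frequency are hypothetical.

```python
def prob_missed_polymorphism(maf: float, n_individuals: int) -> float:
    """Probability that a truly polymorphic biallelic SNP shows only one allele
    in a sample of n diploid individuals (2n independently drawn copies)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    copies = 2 * n_individuals
    # Either every copy is the minor allele or every copy is the major allele.
    return maf ** copies + (1.0 - maf) ** copies


# How quickly does the risk of discarding a SNP with MAF = 0.1 fall with panel size?
for n in (3, 7, 12, 15):
    print(n, round(prob_missed_polymorphism(0.1, n), 4))
```

The curve flattens as n grows, which is at least consistent with the diminishing returns the abstract reports, although the authors' bootstrap addresses allele-frequency prediction and not just detection.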

20.
Molecular methods as applied to the biogeography of single species (phylogeography) or multiple codistributed species (comparative phylogeography) have been productively and extensively used to elucidate common historical features in the diversification of the Earth's biota. However, only recently have methods for estimating population divergence times or their confidence limits while taking into account the critical effects of genetic polymorphism in ancestral species become available, and earlier methods for doing so are underutilized. We review models that address the crucial distinction between the gene divergence, the parameter that is typically recovered in molecular phylogeographic studies, and the population divergence, which is in most cases the parameter of interest and will almost always postdate the gene divergence. Assuming that population sizes of ancestral species are distributed similarly to those of extant species, we show that phylogeographic studies in vertebrates suggest that divergence of alleles in ancestral species can comprise from less than 10% to over 50% of the total divergence between sister species, suggesting that the problem of ancestral polymorphism in dating population divergence can be substantial. The variance in the number of substitutions (among loci for a given species or among species for a given gene) resulting from the stochastic nature of DNA change is generally smaller than the variance due to substitutions along allelic lines whose coalescence times vary due to genetic drift in the ancestral population. Whereas the former variance can be reduced by further DNA sequencing at a single locus, the latter cannot. 
Contrary to phylogeographic intuition, dating population divergence times when allelic lines have achieved reciprocal monophyly is in some ways more challenging than when allelic lines have not achieved monophyly, because in the former case critical data on ancestral population size provided by residual ancestral polymorphism is lost. In the former case differences in coalescence time between species pairs can in principle be explained entirely by differences in ancestral population size without resorting to explanations involving differences in divergence time. Furthermore, the confidence limits on population divergence times are severely underestimated when those for number of substitutions per site in the DNA sequences examined are used as a proxy. This uncertainty highlights the importance of multilocus data in estimating population divergence times; multilocus data can in principle distinguish differences in coalescence time (T) resulting from differences in population divergence time and differences in T due to differences in ancestral population sizes and will reduce the confidence limits on the estimates. We analyze the contribution of ancestral population size (theta) to T and the effect of uncertainty in theta on estimates of population divergence (tau) for single loci under reciprocal monophyly using a simple Bayesian extension of Takahata and Satta's and Yang's recent coalescent methods. The confidence limits on tau decrease when the range over which ancestral population size theta is assumed to be distributed decreases and when tau increases; they generally exclude zero when tau/(4Ne) > 1. We also apply a maximum-likelihood method to several single and multilocus data sets. With multilocus data, the criterion for excluding tau = 0 is roughly that l tau/(4Ne) > 1, where l is the number of loci. 
Our analyses corroborate recent suggestions that increasing the number of loci is critical to decreasing the uncertainty in estimates of population divergence time.
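
The distinction between gene divergence and population divergence, and the rule of thumb l·tau/(4Ne) > 1, can be sketched numerically. The sketch assumes tau is measured in generations and Ne is a diploid ancestral effective size. It also uses a standard coalescent result that the abstract does not state explicitly: two lineages in the ancestral population take 2Ne generations on average to coalesce, so the expected gene divergence time is tau + 2Ne. Function names and example values are hypothetical.

```python
def ancestral_fraction(tau: float, ne: float) -> float:
    """Expected share of the gene divergence time that predates the population
    split, assuming a mean ancestral coalescence time of 2*Ne generations."""
    if tau < 0 or ne <= 0:
        raise ValueError("tau must be >= 0 and ne must be > 0")
    return 2 * ne / (tau + 2 * ne)


def split_excludes_zero(tau: float, ne: float, n_loci: int = 1) -> bool:
    """Rule of thumb from the abstract: confidence limits on tau generally
    exclude zero when n_loci * tau / (4 * Ne) > 1."""
    if tau < 0 or ne <= 0 or n_loci < 1:
        raise ValueError("tau >= 0, ne > 0 and n_loci >= 1 are required")
    return n_loci * tau / (4 * ne) > 1


# Hypothetical recent split: tau = 200,000 generations, ancestral Ne = 100,000.
print(ancestral_fraction(200_000, 100_000))             # half of the divergence is ancestral
print(split_excludes_zero(200_000, 100_000, n_loci=1))  # a single locus is not enough
print(split_excludes_zero(200_000, 100_000, n_loci=4))  # four loci satisfy the criterion
```

With these values a single locus cannot distinguish the split from tau = 0, whereas four loci can, which is the abstract's argument for multilocus data.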
