Similar literature
 20 similar documents found (search time: 31 ms)
1.
Understanding how and why populations evolve is of fundamental importance to molecular ecology. Restriction site‐associated DNA sequencing (RADseq), a popular reduced representation method, has ushered in a new era of genome‐scale research for assessing population structure, hybridization, demographic history, phylogeography and migration. RADseq has also been widely used to conduct genome scans to detect loci involved in adaptive divergence among natural populations. Here, we examine the capacity of those RADseq‐based genome scan studies to detect loci involved in local adaptation. To understand what proportion of the genome is missed by RADseq studies, we developed a simple model using different numbers of RAD‐tags, genome sizes and extents of linkage disequilibrium (length of haplotype blocks). Under the best‐case modelling scenario, we found that RADseq using six‐ or eight‐base pair cutting restriction enzymes would fail to sample many regions of the genome, especially for species with short linkage disequilibrium. We then surveyed recent studies that have used RADseq for genome scans and found that the median density of markers across these studies was 4.08 RAD‐tag markers per megabase (one marker per 245 kb). The length of linkage disequilibrium for many species is one to three orders of magnitude shorter than the marker spacing of the typical recent RADseq study. Thus, we conclude that genome scans based on RADseq data alone, while useful for studies of neutral genetic variation and genetic population structure, will likely miss many loci under selection in studies of local adaptation.
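The arithmetic behind the reported marker spacing, and the general shape of a coverage model of this kind, can be sketched as follows. This is a minimal illustration and not the authors' model: it assumes RAD-tags fall uniformly at random along the genome and that each tag captures a window one haplotype block long centred on it. The function names and the example genome size and block length are hypothetical.

```python
import math


def marker_spacing_kb(markers_per_mb: float) -> float:
    """Mean distance between markers, in kilobases."""
    return 1000.0 / markers_per_mb


def fraction_tagged(n_tags: int, genome_bp: float, ld_block_bp: float) -> float:
    """Expected fraction of the genome lying within one LD block of a RAD-tag.

    Assumption: tags are placed uniformly at random (Poisson), and each tag
    covers a window of ld_block_bp centred on it. Edge effects are ignored.
    """
    return 1.0 - math.exp(-n_tags * ld_block_bp / genome_bp)


# 4.08 markers per Mb corresponds to roughly one marker every 245 kb.
print(round(marker_spacing_kb(4.08)))  # 245

# Hypothetical case: 1 Gb genome at 4.08 tags/Mb with 10 kb haplotype blocks.
print(fraction_tagged(4080, 1e9, 10_000))
```

Under these assumptions, only about 4% of the hypothetical genome lies in a haplotype block that contains a marker, which is the kind of gap the abstract describes.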

2.
Approximate Bayesian computation (ABC) is a powerful tool for model‐based inference of demographic histories from large genetic data sets. For most organisms, its implementation has been hampered by the lack of sufficient genetic data. Genotyping‐by‐sequencing (GBS) provides cheap genome‐scale data to fill this gap, but its potential has not been fully exploited. Here, we explored power, precision and biases of a coalescent‐based ABC approach where GBS data were modelled with either a population mutation parameter (θ) or a fixed site (FS) approach, allowing single or several segregating sites per locus. With simulated data ranging from 500 to 50 000 loci, a variety of demographic models could be reliably inferred across a range of timescales and migration scenarios. Posterior estimates were informative with 1000 loci for migration and split time in simple population divergence models. In more complex models, posterior distributions were wide and almost reverted to the uninformative prior even with 50 000 loci. ABC parameter estimates, however, were generally more accurate than an alternative composite‐likelihood method. Bottleneck scenarios proved particularly difficult, and only recent bottlenecks without recovery could be reliably detected and dated. Notably, minor‐allele‐frequency filters – usual practice for GBS data – negatively affected nearly all estimates. With this in mind, we used a combination of FS and θ approaches on empirical GBS data generated from the Atlantic walrus (Odobenus rosmarus rosmarus), collectively providing support for a population split before the last glacial maximum followed by asymmetrical migration and a high Arctic bottleneck. Overall, this study evaluates the potential and limitations of GBS data in an ABC‐coalescence framework and proposes a best‐practice approach.
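For readers unfamiliar with ABC, the simplest variant (rejection sampling) can be sketched as below. This is a toy illustration, not the coalescent-based pipeline used in the study: the simulator is a stand-in that counts successes in 100 trials, and all function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def abc_rejection(observed, draw_prior, simulate, distance, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = draw_prior(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted


def simulate_successes(p, rng, trials=100):
    """Toy stand-in for a demographic simulator: number of successes in `trials`."""
    return sum(rng.random() < p for _ in range(trials))


rng = random.Random(1)
posterior = abc_rejection(
    observed=70,                            # observed summary statistic
    draw_prior=lambda r: r.random(),        # uniform prior on [0, 1)
    simulate=simulate_successes,
    distance=lambda a, b: abs(a - b),
    n_draws=5000,
    tolerance=5,
    rng=rng,
)
print(len(posterior), round(mean(posterior), 2))
```

The accepted draws approximate the posterior. In a real analysis the simulator would be a coalescent model and the summary a vector of statistics computed from the GBS loci.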

3.
Restriction‐site associated DNA sequencing (RADSeq) facilitates rapid generation of thousands of genetic markers at relatively low cost; however, several sources of error specific to RADSeq methods often lead to biased estimates of allele frequencies and thereby to erroneous population genetic inference. Estimating the distribution of sample allele frequencies without calling genotypes was shown to improve population inference from whole genome sequencing data, but the ability of this approach to account for RADSeq‐specific biases remains unexplored. Here we assess to what extent genotype‐free methods of allele frequency estimation affect demographic inference from empirical RADSeq data. Using the well‐studied pied flycatcher (Ficedula hypoleuca) as a study system, we compare allele frequency estimation and demographic inference from whole genome sequencing data with that from RADSeq data matched for samples using both genotype‐based and genotype‐free methods. The demographic history of pied flycatchers as inferred from RADSeq data was highly congruent with that inferred from whole genome resequencing (WGS) data when allele frequencies were estimated directly from the read data. In contrast, when allele frequencies were derived from called genotypes, RADSeq‐based estimates of most model parameters fell outside the 95% confidence interval of estimates derived from WGS data. Notably, more stringent filtering of the genotype calls tended to increase the discrepancy between parameter estimates from WGS and RADSeq data. The results from this study demonstrate the ability of genotype‐free methods to improve allele frequency spectrum‐ (AFS‐) based demographic inference from empirical RADSeq data and highlight the need to account for uncertainty in NGS data regardless of sequencing method.

4.
5.
Full genome sequencing of organisms with large and complex genomes is intractable and cost ineffective under most research budgets. Cycads (Cycadales) represent one of the oldest lineages of the extant seed plants and, partly due to their age, have incredibly large genomes up to ~60 Gbp. Restriction site‐associated DNA sequencing (RADseq) offers an approach to find genome‐wide informative markers and has proven to be effective with both model and nonmodel organisms. We tested the application of RADseq using ezRAD across all 10 genera of the Cycadales including an example data set of Cycas calcicola representing 72 samples from natural populations. Using previously available plastid and mitochondrial genomes as references, reads were mapped recovering plastid and mitochondrial genome regions and nuclear markers for all of the genera. De novo assembly generated up to 138,407 high‐depth clusters and up to 1,705 phylogenetically informative loci for the genera, and 4,421 loci for the example assembly of C. calcicola. The number of loci recovered by de novo assembly was lower than in previous RADseq studies, yet still sufficient for downstream analysis. However, the number of markers could be increased by relaxing our assembly parameters, especially for the C. calcicola data set. Our results demonstrate the successful application of RADseq across the Cycadales to generate a large number of markers for all genomic compartments, despite the large number of plastids present in a typical plant cell. Our modified protocol was adapted to be applied to cycads and other organisms with large genomes to yield many informative genome‐wide markers.

6.
The ability to sequence entire genomes of virtually any organism provides unprecedented insights into the evolutionary history of populations and species. Nevertheless, many population genomic inferences – including the quantification and dating of admixture, introgression and demographic events, and inference of selective sweeps – are still limited by the lack of high‐quality haplotype information. The newest generation of sequencing technology now promises significant progress. To establish the feasibility of haplotype‐resolved genome resequencing at population scale, we investigated properties of linked‐read sequencing data of songbirds of the genus Oenanthe across a range of sequencing depths. Our results based on the comparison of downsampled (25×, 20×, 15×, 10×, 7×, and 5×) with high‐coverage data (46–68×) of seven bird genomes mapped to a reference suggest that phasing contiguities and accuracies adequate for most population genomic analyses can already be reached with moderate sequencing effort. At 15× coverage, phased haplotypes span about 90% of the genome assembly, with 50% and 90% of phased sequences located in phase blocks longer than 1.25–4.6 Mb (N50) and 0.27–0.72 Mb (N90). Phasing accuracy reaches beyond 99% starting from 15× coverage. Higher coverages yielded higher contiguities (up to about 7 Mb/1 Mb [N50/N90] at 25× coverage), but only marginally improved phasing accuracy. Phase block contiguity improved with input DNA molecule length; thus, higher‐quality DNA may help keep sequencing costs at bay. In conclusion, even for organisms with gigabase‐sized genomes like birds, linked‐read sequencing at moderate depth opens an affordable avenue towards haplotype‐resolved genome resequencing at population scale.
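The N50 and N90 contiguity statistics quoted above follow a standard definition, which can be sketched as below. The function name and the example block lengths are hypothetical, and the code is an illustration rather than the software used in the study.

```python
def nx(block_lengths, x):
    """Length L such that blocks of length >= L hold at least x% of the total length."""
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    raise ValueError("block_lengths must be non-empty and x must be in (0, 100]")


# Hypothetical phase-block lengths in Mb (total 15 Mb).
blocks_mb = [5, 4, 3, 2, 1]
print(nx(blocks_mb, 50))  # 4
print(nx(blocks_mb, 90))  # 2
```

Because N90 requires 90% of the phased sequence to sit in blocks at least that long, it is never larger than N50 for the same data.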

7.
We propose a novel approximate-likelihood method to fit demographic models to human genomewide single-nucleotide polymorphism (SNP) data. We divide the genome into windows of constant genetic map width and then tabulate the number of distinct haplotypes and the frequency of the most common haplotype for each window. We summarize the data by the genomewide joint distribution of these two statistics, termed the HCN statistic. Coalescent simulations are used to generate the expected HCN statistic for different demographic parameters. The HCN statistic provides additional information for disentangling complex demography beyond statistics based on single-SNP frequencies. Application of our method to simulated data shows it can reliably infer parameters from growth and bottleneck models, even in the presence of recombination hotspots when properly modeled. We also examined how practical problems with genomewide data sets, such as errors in the genetic map, haplotype phase uncertainty, and SNP ascertainment bias, affect our method. Several modifications of our method served to make it robust to these problems. We have applied our method to data collected by Perlegen Sciences and find evidence for a severe population size reduction in northwestern Europe starting 32,500–47,500 years ago.
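The tabulation step described above can be sketched as follows. This is a minimal illustration that assumes the haplotypes have already been cut into windows and are represented as strings; the function names and example data are hypothetical, and the most common haplotype is recorded as a count rather than a proportion.

```python
from collections import Counter


def window_stats(haplotypes):
    """Return (number of distinct haplotypes, count of the most common one) for one window."""
    counts = Counter(haplotypes)
    return len(counts), counts.most_common(1)[0][1]


def hcn_table(windows):
    """Joint distribution of the two per-window statistics across all windows."""
    return Counter(window_stats(w) for w in windows)


# Three hypothetical windows, each holding four sampled haplotypes.
windows = [
    ["0101", "0101", "0110", "1111"],
    ["00", "00", "00", "01"],
    ["10", "10", "01", "11"],
]
print(hcn_table(windows))
```

In the method described, a table of this kind computed from the data would be compared with tables generated by coalescent simulation under candidate demographic parameters.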

8.
Recent studies suggest that haplotypes are arranged into discrete blocklike structures throughout the human genome. Here, we present an alternative haplotype block definition that assumes no recombination within each block but allows for recombination between blocks, and we use it to study the combined effects of demographic history and various population genetic parameters on haplotype block characteristics. Through extensive coalescent simulations and analysis of published haplotype data on chromosome 21, we find that (1) the combined effects of population demographic history, recombination, and mutation dictate haplotype block characteristics and (2) haplotype blocks can arise in the absence of recombination hot spots. Finally, we provide practical guidelines for designing and interpreting studies investigating haplotype block structure.
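One common way to operationalise "no recombination within a block" is the four-gamete test: two biallelic sites show evidence of recombination if all four allele combinations occur. The sketch below uses that test with a greedy left-to-right partition. This is an assumption chosen for illustration and may differ from the authors' exact definition; the function names and example haplotypes are hypothetical.

```python
def compatible(haps, i, j):
    """Four-gamete test: True if sites i and j show fewer than four allele combinations."""
    return len({(h[i], h[j]) for h in haps}) < 4


def greedy_blocks(haps):
    """Split sites into consecutive blocks in which every pair of sites is compatible."""
    n_sites = len(haps[0])
    blocks, start = [], 0
    for site in range(1, n_sites):
        # Close the current block as soon as the new site conflicts with any site in it.
        if any(not compatible(haps, k, site) for k in range(start, site)):
            blocks.append((start, site - 1))
            start = site
    blocks.append((start, n_sites - 1))
    return blocks


# Four hypothetical haplotypes over four biallelic sites.
haps = ["0000", "0011", "1100", "1111"]
print(greedy_blocks(haps))  # [(0, 1), (2, 3)]
```

Here sites 0 and 1 are perfectly associated, as are sites 2 and 3, but all four combinations occur between the two pairs, so a block boundary is placed between them.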

9.
The conservation of threatened species must be underpinned by phylogeographic knowledge. This need is epitomized by the freshwater fish Carassius carassius, which is in decline across much of its European range. Restriction site‐associated DNA sequencing (RADseq) is increasingly used for such applications; however, RADseq is expensive, and limitations on sample number must be weighed against the benefit of large numbers of markers. This trade‐off has previously been examined using simulation studies; however, empirical comparisons between these markers, especially in a phylogeographic context, are lacking. Here, we compare the results from microsatellites and RADseq for the phylogeography of C. carassius to test whether it is more advantageous to genotype fewer markers (microsatellites) in many samples, or many markers (SNPs) in fewer samples. These data sets, along with data from the mitochondrial cytochrome b gene, agree on broad phylogeographic patterns, showing the existence of two previously unidentified C. carassius lineages in Europe: one found throughout northern and central‐eastern European drainages and a second almost exclusively confined to the Danubian catchment. These lineages have been isolated for approximately 2.15 million years and should be considered separate conservation units. RADseq recovered finer population structure and stronger patterns of isolation by distance (IBD) than microsatellites, despite including only 17.6% of samples (38% of populations and 52% of samples per population). RADseq was also used along with approximate Bayesian computation to show that the postglacial colonization routes of C. carassius differ from the general patterns of freshwater fish in Europe, likely as a result of their distinctive ecology.

10.
11.
Whole‐genome or whole‐exome sequencing (WGS/WES) of the affected proband together with normal parents (trio) is commonly adopted to identify de novo germline mutations (DNMs) underlying sporadic cases of various genetic disorders. However, our current knowledge of the occurrence and functional effects of DNMs remains limited, and accurately identifying the disease‐causing DNM from a group of irrelevant DNMs is complicated. Herein, we provide a general‐purpose discussion of important issues related to pathogenic gene identification based on trio‐based WGS/WES data. Specifically, the relevance of DNMs to human sporadic diseases, current knowledge of DNM biogenesis mechanisms, and common strategies or software tools used for DNM detection are reviewed, followed by a discussion of pathogenic gene prioritization. In addition, several key factors that may affect DNM identification accuracy and causal gene prioritization are reviewed. Based on recent major advances, this review sheds light on how trio‐based WGS/WES technologies can play a significant role in the identification of DNMs and causal genes for sporadic diseases, and discusses existing challenges.

12.
Transposable elements (TEs) – selfish DNA sequences that can move within the genome – comprise a large proportion of the genomes of many organisms. Although low‐coverage whole‐genome sequencing can be used to survey TE composition, it is uneconomical for species with large quantities of DNA. Here, we utilize restriction‐site associated DNA sequencing (RADSeq) as an alternative method to survey TE composition. First, we demonstrate in silico that double digest restriction‐site associated DNA sequencing (ddRADseq) markers contain the same TE compositions as whole genome assemblies across arthropods. Next, we show empirically using eight Synalpheus snapping shrimp species with large genomes that TE compositions from ddRADseq and low‐coverage whole‐genome sequencing are comparable within and across species. Finally, we develop a new bioinformatic pipeline, TERAD, to extract TE compositions from RADseq data. Our study expands the utility of RADseq to study the repeatome, making comparative studies of genome structure for species with large genomes more tractable and affordable.

13.
Restriction site‐associated DNA sequencing (RADseq) has emerged as a useful tool in systematics and population genomics. A common feature of RADseq data sets is that they contain missing data that arise from multiple sources including genealogical sampling bias, assembly methodology and sequencing error. Many RADseq studies have demonstrated that allowing sites (single nucleotide polymorphisms, SNPs) with missing data can increase support for phylogenetic hypotheses. Two non‐mutually exclusive explanations for this observation are that (a) larger data sets contain more phylogenetic information; and (b) excluding missing data disproportionately removes sites with the highest mutation rates, causing the exclusion of characters that are likely variable and informative. Using a RADseq data set derived from the East African banana frog, Afrixalus fornasini (up to 1.1 million SNPs), we found that missing data thresholds were positively correlated with the proportion of parsimony‐informative sites and mean branch support. Using three proxies for estimating site‐specific rate, we found that the most conservative missing data strategies excluded rapidly evolving sites, with four‐state sites present only when allowing ≥60% missing data per SNP. Topological similarity among estimated phylogenies was highest for the data sets with ≥60% missing data per SNP. Our results suggest that several desirable phylogenetic qualities were observed when allowing ≥60% missing data per SNP. However, at the highest missing data thresholds (80% and 90% missing data per SNP), we observed differences in performance between high‐ and mixed‐weight DNA extraction samples, which may indicate there are trade‐offs to consider when using degraded genomic template with RADseq protocols.
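A missing-data threshold of the kind discussed above amounts to a per-SNP filter. The sketch below is a minimal illustration that assumes an alignment stored as one string per sample with `N` marking a missing call; the function name, encoding and example data are hypothetical.

```python
def filter_sites(genotypes, max_missing):
    """Return indices of SNP columns whose fraction of missing calls is <= max_missing.

    genotypes: list of equal-length strings, one per sample; 'N' marks a missing call.
    """
    n_samples = len(genotypes)
    n_sites = len(genotypes[0])
    return [
        site
        for site in range(n_sites)
        if sum(row[site] == "N" for row in genotypes) / n_samples <= max_missing
    ]


# Five hypothetical samples, four SNPs; site 2 is missing in four of five samples.
genotypes = ["ACNT", "ANNT", "ACNG", "NCNT", "ACGT"]
print(filter_sites(genotypes, 0.0))  # [3]
print(filter_sites(genotypes, 0.2))  # [0, 1, 3]
print(filter_sites(genotypes, 0.8))  # [0, 1, 2, 3]
```

Raising the threshold retains more sites, which is the trade-off the abstract examines: the strictest filter keeps only the fully sampled site.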

14.
15.
Recombination and selection drive the extent of linkage disequilibrium (LD) among loci and therefore affect the reshuffling of adaptive genetic variation. However, it is poorly understood to what extent the enrichment of transposable elements (TEs) in recombinationally‐inert regions reflects their inefficient removal by purifying selection and whether the presence of polymorphic TEs can modify the local recombination rate. In this study, we investigate how TEs and recombination interact at fine scale along chromosomes and possibly support linked selection in natural populations. Whole‐genome sequencing data of 304 individuals from nearby alpine populations of Arabis alpina were used to show that the density of polymorphic TEs is specifically correlated with local LD along chromosomes. Consistent with TEs modifying recombination, the characterization of 28 such LD blocks of up to 5.5 Mb in length revealed strong evidence of selective sweeps at a few loci through either site frequency spectrum or haplotype structure. A majority of these blocks were enriched in genes related to ecologically relevant functions such as responses to cold, salt stress or photoperiodism. In particular, the S‐locus (i.e., supergene responsible for strict outcrossing) was identified in an LD block with high levels of polymorphic TEs and evidence of selection. Another such LD block was enriched in cold‐responding genes and presented evidence of adaptive loci related to photoperiodism and flowering being increasingly linked by polymorphic TEs. These results are consistent with the hypothesis that TEs modify recombination landscapes and thus interact with selection in driving blocks of linked adaptive loci in natural populations.

16.
Restriction site‐associated DNA sequencing (RADseq) is a powerful tool for genotyping of individuals, but the identification of loci and assignment of sequence reads is a crucial and often challenging step. The optimal parameter settings for a given de novo RADseq assembly vary between data sets and can be difficult and computationally expensive to determine. Here, we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence–comparison results in a graph eliminates unnecessary and redundant sequence similarity calculations. De novo locus formation for a given parameter set can be performed on the precomputed graph, making parameter sweeps far more efficient. RADProc also uses a clustering approach for faster nucleotide‐distance calculation. The performance of RADProc compares favourably with that of the widely used Stacks software. The run‐time comparisons between RADProc and Stacks for 32 different parameter settings using 20 green‐crab (Carcinus maenas) samples showed that RADProc took as little as 2 hr 40 min compared to 78 hr by Stacks, while 16 brown trout (Salmo trutta L.) samples were processed by RADProc and Stacks in 23 and 263 hr, respectively. Comparisons of the de novo loci formed and the catalogs built using both methods demonstrate that the improvement in processing speed achieved by RADProc has little effect on the loci formed or on the results of downstream analyses based on those loci.
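The idea of computing read similarities once and reusing them across a parameter sweep can be sketched as below. This is a simplified illustration and not RADProc's actual algorithm or API: it uses Hamming distance between equal-length reads and treats a locus as a connected component of the distance graph. All names and example reads are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(reads, max_dist):
    """Compute each pairwise distance once; keep edges with distance <= max_dist."""
    edges = []
    for i, j in combinations(range(len(reads)), 2):
        d = hamming(reads[i], reads[j])
        if d <= max_dist:
            edges.append((i, j, d))
    return edges


def loci(n_reads, edges, threshold):
    """Group reads into connected components using edges with distance <= threshold."""
    parent = list(range(n_reads))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, d in edges:
        if d <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for r in range(n_reads):
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())


reads = ["AAAA", "AAAT", "AGTT", "GGGG"]
edges = build_graph(reads, max_dist=2)   # the expensive step, done once
for threshold in (1, 2):                 # the cheap sweep over parameter values
    print(threshold, loci(len(reads), edges, threshold))
```

With these reads, a threshold of 1 groups only the first two reads, while a threshold of 2 also pulls in the third; neither run recomputes any distance.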

17.
We present the development of a genomic library using a RADseq (restriction site associated DNA sequencing) protocol for marker discovery that can be applied to evolutionary studies of the sugarcane borer Diatraea saccharalis, an important South American insect pest. A RADtag protocol combined with Illumina paired‐end sequencing allowed de novo discovery of 12 811 SNPs and a high‐quality assembly of 122.8M paired‐end reads from six individuals, representing 40 Gb of sequencing data. Approximately 1.7 Mb of the sugarcane borer genome distributed over 5289 minicontigs were obtained upon assembly of second reads from first‐read RADtag loci where at least one SNP was discovered and genotyped. Minicontig lengths ranged from 200 to 611 bp and were used for functional annotation and microsatellite discovery. These markers will be used in future studies to understand gene flow and adaptation to host plants and control tactics.

18.
Genetic variation is of key importance for a species’ evolutionary potential, and its estimation is a major component of conservation studies. New DNA sequencing technologies have enabled the analysis of large portions of the genome in nonmodel species, promising highly accurate estimates of such population genetic parameters. Restriction site‐associated DNA sequencing (RADseq) is used to analyse thousands of variants in the bumble bee species Bombus impatiens, which is common, and Bombus pensylvanicus, which is in decline. Previous microsatellite‐based analyses have shown that gene diversity is lower in the declining B. pensylvanicus than in B. impatiens. RADseq nucleotide diversities appear much more similar in the two species. Both species exhibit allele frequencies consistent with historical population expansions. Differences in diversity observed at microsatellites thus do not appear to have arisen from long‐term differences in population size and are either recent in origin or may result from mutational processes. Additional research is needed to explain these discrepancies and to investigate the best ways to integrate next‐generation sequencing data and more traditional molecular markers in studies of genetic diversity.

19.

Background

An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution.

Results

We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure.

Conclusions

WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1854-0) contains supplementary material, which is available to authorized users.

20.
Biological situations involving conflict can create arms race situations with repeated fixations of different functional variants, producing selective sweeps and lowering neutral diversity in genome regions linked to the functional locus. However, they can sometimes lead to balancing selection, potentially creating long coalescent times for sites with functionally different variants, and, if recombination occurs rarely, for extended haplotypes carrying such variants. We tested between these possibilities in a gynodioecious plant, Plantago lanceolata, in which cytoplasmic male‐sterility factors conflict with nuclear restorers of male fertility. We find low mitochondrial diversity, which does not support very long‐term coexistence of highly diverged mitochondrial haplotypes. Interestingly, however, we found a derived haplotype that is associated with male fertility in a restricted geographic region, and that has fixed differences from the ancestral sequence in several genes, suggesting that it did not arise very recently. Taken together, the results suggest arms race events that involved “soft” selective sweeps involving a moderately old‐established haplotype, consistent with the frequency fluctuations predicted by theoretical models of gynodioecy.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号