Similar articles
 20 similar articles found (search time: 250 ms)
1.
2.
High‐throughput sequencing has revolutionized population and conservation genetics. RAD sequencing methods, such as 2b‐RAD, can be used on species lacking a reference genome. However, transferring protocols across taxa can potentially lead to poor results. We tested two different IIB enzymes (AlfI and CspCI) on two species with different genome sizes (the loggerhead turtle Caretta caretta and the sharpsnout seabream Diplodus puntazzo) to build a set of guidelines to improve 2b‐RAD protocols on non‐model organisms while optimising costs. Good results were obtained even with degraded samples, showing the value of 2b‐RAD in studies with poor DNA quality. However, library quality was found to be a critical determinant of the number of reads and loci obtained for genotyping. Resampling analyses with different numbers of reads per individual showed a trade‐off between number of loci and number of reads per sample. The resulting accumulation curves can be used as a tool to calculate the number of sequences per individual needed to reach a mean depth ≥20 reads to acquire good genotyping results. Finally, we demonstrated that selective‐base ligation does not affect genomic differentiation between individuals, indicating that this technique can be used in species with large genome sizes to adjust the number of loci to the study scope, to reduce sequencing costs and to maintain suitable sequencing depth for reliable genotyping without compromising the results. Here, we provide a set of guidelines to improve 2b‐RAD protocols on non‐model organisms with different genome sizes, helping decision‐making for reliable and cost‐effective genotyping.

3.
Whole genome resequencing of 51 Populus nigra (L.) individuals from across Western Europe was performed using Illumina platforms. A total of 1 878 727 SNPs distributed along the P. nigra reference sequence were identified. The SNP calling accuracy was validated with Sanger sequencing. SNPs were selected within 14 previously identified QTL regions, 2916 expressional candidate genes related to rust resistance, wood properties, water‐use efficiency and bud phenology and 1732 genes randomly spread across the genome. Over 10 000 SNPs were selected for the construction of a 12k Infinium Bead‐Chip array dedicated to association mapping. The SNP genotyping assay was performed with 888 P. nigra individuals. The genotyping success rate was 91%. Our high success rate was due to the discovery panel design and the stringent parameters applied for SNP calling and selection. In the same set of P. nigra genotypes, linkage disequilibrium throughout the genome decayed on average within 5–7 kb to half of its maximum value. As an application test, ADMIXTURE analysis was performed with a selection of 600 SNPs spread throughout the genome and 706 individuals collected along 12 river basins. The admixture pattern was consistent with genetic diversity revealed by neutral markers and the geographical distribution of the populations. These newly developed SNP resources and genotyping array provide a valuable tool for population genetic studies and identification of QTLs through natural‐population based genetic association studies in P. nigra.

4.
Properly estimating genetic diversity in populations of nonmodel species requires a basic understanding of how diversity is distributed across the genome and among individuals. To this end, we analysed whole‐genome resequencing data from 20 collared flycatchers (genome size ≈1.1 Gb; 10.13 million single nucleotide polymorphisms detected). Genomewide nucleotide diversity was almost identical among individuals (mean = 0.00394, range = 0.00384–0.00401), but diversity levels varied extensively across the genome (95% confidence interval for 200‐kb windows = 0.0013–0.0053). Diversity was related to selective constraint such that in comparison with intergenic DNA, diversity at fourfold degenerate sites was reduced to 85%, 3′ UTRs to 82%, 5′ UTRs to 70% and nondegenerate sites to 12%. There was a strong positive correlation between diversity and chromosome size, probably driven by a higher density of targets for selection on smaller chromosomes increasing the diversity‐reducing effect of linked selection. Simulations exploring the ability of sequence data from a small number of genetic markers to capture the observed diversity clearly demonstrated that diversity estimation from finite sampling of such data is bound to be associated with large confidence intervals. Nevertheless, we show that precision in diversity estimation in large outbred populations benefits from increasing the number of loci rather than the number of individuals. Simulations mimicking RAD sequencing showed that this approach gives accurate estimates of genomewide diversity. Based on the patterns of observed diversity and the performed simulations, we provide broad recommendations for how genetic diversity should be estimated in natural populations.

5.
Restriction site‐associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single‐nucleotide polymorphisms. As an empirical example, we use a double‐digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high‐altitude mountains in Mexico.
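
The replicate-based idea in this abstract can be illustrated with a minimal Python sketch. It assumes diploid genotypes stored as allele pairs keyed by locus, with `None` for a missing call; the function name `allele_error_rate`, the data layout, and the exact definition of a mismatch are illustrative assumptions, not the definitions used by the authors or by Stacks.

```python
from collections import Counter


def allele_error_rate(rep_a, rep_b):
    """Fraction of mismatched alleles between two replicates of one sample.

    rep_a, rep_b: dicts mapping locus id -> (allele, allele) or None.
    Only loci genotyped in BOTH replicates are compared (an assumption;
    loci missing in one replicate would need a separate locus-level metric).
    """
    compared = 0
    mismatched = 0
    for locus, geno_a in rep_a.items():
        geno_b = rep_b.get(locus)
        if geno_a is None or geno_b is None:
            continue  # skip loci missing in either replicate
        # Multiset intersection counts alleles shared regardless of order.
        shared = sum((Counter(geno_a) & Counter(geno_b)).values())
        compared += 2
        mismatched += 2 - shared
    if compared == 0:
        raise ValueError("no loci genotyped in both replicates")
    return mismatched / compared


# Hypothetical replicate pair for illustration only.
replicate_1 = {"loc1": ("A", "A"), "loc2": ("A", "G"), "loc3": ("C", "T")}
replicate_2 = {"loc1": ("A", "A"), "loc2": ("G", "G"), "loc3": None}
rate = allele_error_rate(replicate_1, replicate_2)
```

Under identical true genotypes, any nonzero rate reflects genotyping error, so the same function could be evaluated across candidate assembly parameter sets to compare them.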

6.
The number of polymorphisms identified with next‐generation sequencing approaches depends directly on the sequencing depth and therefore on the experimental cost. Although higher levels of depth ensure more sensitive and more specific SNP calls, economic constraints limit the increase of depth for whole‐genome resequencing (WGS). For this reason, capture resequencing is used for studies focusing on only some specific regions of the genome. However, several biases in capture resequencing are known to have a negative impact on the sensitivity of SNP detection. Within this framework, the aim of this study was to compare the accuracy of WGS and capture resequencing on SNP detection and genotype calling, which differ in terms of both sequencing depth and biases. We evaluated the SNP calling and genotyping accuracy in a WGS dataset (13X) and in a capture resequencing dataset (87X) performed on 11 individuals. The percentage of SNPs not identified due to a sevenfold sequencing depth decrease was estimated at 7.8% using a down‐sampling procedure on the capture sequencing dataset. A comparison of the 87X capture sequencing dataset with the WGS dataset revealed that capture‐related biases led to the loss of 5.2% of SNPs detected with WGS. Nevertheless, when considering the SNPs detected by both approaches, capture sequencing appears to achieve far better SNP genotyping, with about 4.4% of the WGS genotypes that can be considered erroneous, rising to 10% for heterozygous genotypes. In conclusion, WGS and capture deep sequencing can be considered equivalent strategies for SNP detection, as the rate of SNPs not identified because of a low sequencing depth in the former is quite similar to the rate of SNPs missed because of method biases in the latter. On the other hand, capture deep sequencing clearly appears better suited to studies requiring great accuracy in genotyping.

7.
Advanced resources for genome‐assisted research in barley (Hordeum vulgare) including a whole‐genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole‐genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA‐coding exome reduces barley genomic complexity more than 50‐fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in‐solution hybridization‐based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full‐length cDNAs and de novo assembled RNA‐Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA‐coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping‐by‐sequencing and genetic diversity analyses.

8.
In the last decade, the revolution in sequencing technologies has deeply impacted crop genotyping practice. New methods allowing rapid, high‐throughput genotyping of entire crop populations have proliferated and opened the door to wider use of molecular tools in plant breeding. These new genotyping‐by‐sequencing (GBS) methods include over a dozen reduced‐representation sequencing (RRS) approaches and at least four whole‐genome resequencing (WGR) approaches. The diversity of methods available, each often producing different types of data at different cost, can make selection of the best‐suited method seem a daunting task. We review the most common genotyping methods used today and compare their suitability for linkage mapping, genomewide association studies (GWAS), marker‐assisted and genomic selection and genome assembly and improvement in crops with various genome sizes and complexity. Furthermore, we give an outline of bioinformatics tools for analysis of genotyping data. WGR is well suited to genotyping biparental cross populations with complex, small‐ to moderate‐sized genomes and provides the lowest cost per marker data point. RRS approaches differ in their suitability for various tasks, but demonstrate similar costs per marker data point. These approaches are generally better suited for de novo applications and more cost‐effective when genotyping populations with large genomes or high heterozygosity. We expect that although RRS approaches will remain the most cost‐effective for some time, WGR will become more widespread for crop genotyping as sequencing costs continue to decrease.

9.
We report on a whole‐genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe. Through whole‐genome shotgun sequencing of the 7.9‐Gbp genome of the winter rye inbred line Lo7 we obtained a de novo assembly represented by 1.29 million scaffolds covering a total length of 2.8 Gbp. Our reference sequence represents nearly the entire low‐copy portion of the rye genome. This genome assembly was used to predict 27 784 rye gene models based on homology to sequenced grass genomes. Through resequencing of 10 rye inbred lines and one accession of the wild relative S. vavilovii, we discovered more than 90 million single nucleotide variants and short insertions/deletions in the rye genome. From these variants, we developed the high‐density Rye600k genotyping array with 600 843 markers, which enabled anchoring the sequence contigs along a high‐density genetic map and establishing a synteny‐based virtual gene order. Genotyping data were used to characterize the diversity of rye breeding pools and genetic resources, and to obtain a genome‐wide map of selection signals differentiating the divergent gene pools. This rye whole‐genome sequence closes a gap in Triticeae genome research, and will be highly valuable for comparative genomics, functional studies and genome‐based breeding in rye.

10.
Genotyping‐by‐sequencing (GBS) and related methods are increasingly used for studies of non‐model organisms from population genetic to phylogenetic scales. We present GIbPSs, a new genotyping toolkit for the analysis of data from various protocols such as RAD, double‐digest RAD, GBS, and two‐enzyme GBS without a reference genome. GIbPSs can handle paired‐end GBS data and is able to assign reads from both strands of a restriction fragment to the same locus. GIbPSs is most suitable for population genetic and phylogeographic analyses. It avoids genotyping errors due to indel variation by identifying and discarding affected loci. GIbPSs creates a genotype database that offers rich functionality for data filtering and export in numerous formats. We performed comparative analyses of simulated and real GBS data with GIbPSs and another program, pyRAD. This program accounts for indel variation by aligning homologous sequences. GIbPSs performed better than pyRAD in several aspects. It required much less computation time and displayed higher genotyping accuracy. GIbPSs retained smaller numbers of loci overall in analyses of real GBS data. It nevertheless delivered more complete genotype matrices with greater locus overlap between individuals and greater numbers of loci sampled in all individuals.

11.
The identification of thousands of variants across the genomes and their accurate genotyping are crucial for estimating the genetic parameters needed to address a host of molecular ecological and evolutionary questions. With rapid advances of massively parallel high‐throughput sequencing technologies, several methods have recently been developed to access genomewide data on population variation. One of the most successful and widely used techniques relies on the combination of restriction enzymes and sequencing‐by‐synthesis: restriction‐site‐associated DNA sequencing (RADSeq). We developed a new, more time‐ and cost‐efficient double‐digest RAD paired‐end protocol (quaddRAD) that simplifies and speeds up the identification of PCR duplicates and permits large‐scale multiplexing. After assessing its performance on a technical data set, we applied the quaddRAD method on population samples of a Neotropical cichlid fish lineage (Archocentrus centrarchus) to assess its genetic structure and demographic history. While we identified allopatric interlake genetic divergence, most likely driven by drift, no signature of sympatric divergence was detected. This differs from what has been observed in the clade of Midas cichlids (Amphilophus citrinellus spp.), another cichlid lineage that inhabits the same lakes and shares a similar demographic history, but has evolved into small‐scale adaptive radiations via sympatric speciation. We demonstrate that quaddRAD is a robust and efficient method for genotyping a massive number and widely overlapping set of loci with high accuracy. Furthermore, the results on A. centrarchus open new research avenues providing an ideal system to investigate genome‐level mechanisms that could alter the speciation potential of different but closely related cichlid lineages.

12.
The trade‐offs of using single‐digest vs. double‐digest restriction site‐associated DNA sequencing (RAD‐seq) protocols have been widely discussed. However, no direct empirical comparisons of the two methods have been conducted. Here, we sampled a single population of Gulf pipefish (Syngnathus scovelli) and genotyped 444 individuals using RAD‐seq. Sixty individuals were subjected to single‐digest RAD‐seq (sdRAD‐seq), and the remaining 384 individuals were genotyped using a double‐digest RAD‐seq (ddRAD‐seq) protocol. We analysed the resulting Illumina sequencing data and compared the two genotyping methods when reads were analysed either together or separately. Coverage statistics, observed heterozygosity, and allele frequencies differed significantly between the two protocols, as did the results of selection components analysis. We also performed an in silico digestion of the Gulf pipefish genome and modelled five major sources of bias: PCR duplicates, polymorphic restriction sites, shearing bias, asymmetric sampling (i.e., genotyping fewer individuals with sdRAD‐seq than with ddRAD‐seq) and higher major allele frequencies. This combination of approaches allowed us to determine that polymorphic restriction sites, an asymmetric sampling scheme, mean allele frequencies and to some extent PCR duplicates all contribute to different estimates of allele frequencies between samples genotyped using sdRAD‐seq versus ddRAD‐seq. Our finding that sdRAD‐seq and ddRAD‐seq can result in different allele frequencies has implications for comparisons across studies and techniques that endeavour to identify genomewide signatures of evolutionary processes in natural populations.

13.
The advent of next‐generation sequencing (NGS) has dramatically changed bacterial typing technologies, increasing our ability to differentiate bacterial isolates. Although it is now possible to sequence a bacterial genome in a few days and at reasonable cost, most genetic analyses do not require whole‐genome sequencing, which also remains impractical for large population samples due to the cost of individual library preparation and bioinformatics. More traditional sequencing approaches, however, such as MultiLocus Sequence Typing (MLST), are quite laborious and time‐consuming, especially for large‐scale analyses. In this study, a genotyping approach based on restriction site‐associated DNA (RAD) tag sequencing, 2b‐RAD, was applied to characterize Listeria monocytogenes strains. To verify the feasibility of the method, an in silico analysis was performed on 30 available complete genomes. For the same set of strains, in silico MLST analysis was conducted as well. Subsequently, 2b‐RAD and MLST analyses were experimentally carried out on 58 isolates collected from food samples or food‐processing sites. The obtained results demonstrate that 2b‐RAD predicts MLST types and often provides more detailed information on population structure than MLST. Moreover, the majority of variants differentiating identical sequence type isolates mapped against accessory fragments, thus providing additional information to characterize strains. Although MLST still represents a reliable typing method, large‐scale studies on molecular epidemiology and public health, as well as bacterial phylogenetics, population genetics and biosafety, could benefit from a low‐cost, fast‐turnaround approach such as the 2b‐RAD analysis proposed here.

14.
Species’ responses at the genetic level are key to understanding the long‐term consequences of anthropogenic global change. Herbaria document such responses, and, with contemporary sampling, provide high‐resolution time‐series of plant evolutionary change. Characterizing genetic diversity is straightforward for model species with small genomes and a reference sequence. For nonmodel species, with small or large genomes, diversity is traditionally assessed using restriction‐enzyme‐based sequencing. However, age‐related DNA damage and fragmentation preclude the use of this approach for ancient herbarium DNA. Here, we combine reduced‐representation sequencing and hybridization‐capture to overcome this challenge and efficiently compare contemporary and historical specimens. Specifically, we describe how homemade DNA baits can be produced from reduced‐representation libraries of fresh samples, and used to efficiently enrich historical libraries for the same fraction of the genome to produce compatible sets of sequence data from both types of material. Applying this approach to both Arabidopsis thaliana and the nonmodel plant Cardamine bulbifera, we discovered polymorphisms de novo in an unbiased, reference‐free manner. We show that the recovered genetic variation recapitulates known genetic diversity in A. thaliana, and recovers geographical origin in both species and over time, independent of bait diversity. Hence, our method enables fast, cost‐efficient, large‐scale integration of contemporary and historical specimens for assessment of genome‐wide genetic trends over time, independent of genome size and presence of a reference genome.

15.
DNA quantity can be a hindrance in ecological and evolutionary research programmes due to a range of factors including endangered status of target organisms, available tissue type, and the impact of field conditions on preservation methods. A potential solution to low‐quantity DNA lies in whole genome amplification (WGA) techniques that can substantially increase DNA yield. To date, few studies have rigorously examined sequence bias that might result from WGA and next‐generation sequencing of nonmodel taxa. To address this knowledge deficit, we use multiple displacement amplification (MDA) and double‐digest RAD sequencing on the grey mouse lemur (Microcebus murinus) to quantify bias in genome coverage and SNP calls when compared to raw genomic DNA (gDNA). We focus our efforts on providing baseline estimates of potential bias by following the manufacturer's recommendations for starting DNA quantities (>100 ng). Our results strongly suggest that MDA enrichment does not introduce systematic bias to genome characterization. SNP calls between samples, whether genotyped de novo or with a reference genome, are highly congruent (>98%) when specifying a minimum threshold of 20X stack depth to call genotypes. Relative genome coverage is also similar between MDA and gDNA, and allelic dropout is not observed. SNP concordance varies based on coverage threshold, with 95% concordance reached at ~12X coverage when genotyping de novo and ~7X coverage when genotyping with the reference genome. These results suggest that MDA may be a suitable solution for next‐generation molecular ecological studies when DNA quantity would otherwise be a limiting factor.

16.
Understanding the genetics of biological diversification across micro‐ and macro‐evolutionary time scales is a vibrant field of research for molecular ecologists as rapid advances in sequencing technologies promise to overcome former limitations. In palms, an emblematic, economically and ecologically important plant family with high diversity in the tropics, studies of diversification at the population and species levels are still hampered by a lack of genomic markers suitable for the genotyping of large numbers of recently diverged taxa. To fill this gap, we used a whole genome sequencing approach to develop target sequencing for molecular markers in 4,184 genome regions, including 4,051 genes and 133 non‐genic putatively neutral regions. These markers were chosen to cover a wide range of evolutionary rates allowing future studies at the family, genus, species and population levels. Special emphasis was given to the avoidance of copy number variation during marker selection. In addition, a set of 149 well‐known sequence regions previously used as phylogenetic markers by the palm biological research community were included in the target regions, making it possible to combine and jointly analyse already available data sets with genomic data to be produced with this new toolkit. The bait set was effective for species belonging to all three palm sub‐families tested (Arecoideae, Ceroxyloideae and Coryphoideae), with high mapping rates, specificity and efficiency. The number of high‐quality single nucleotide polymorphisms (SNPs) detected at both the sub‐family and population levels facilitates efficient analyses of genomic diversity across micro‐ and macro‐evolutionary time scales.

17.
18.
Reduced representation genome sequencing such as restriction‐site‐associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single‐nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach; the markers were simultaneously identified and scored in a genome‐wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome‐wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long‐term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
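
The link between nucleotide diversity and long‐term effective population size in this abstract can be sketched with the standard neutral expectation for a diploid population, π = 4·Ne·μ. The abstract does not state which mutation rates were used; the per‐site rates below (1e-8 and 1e-9 per generation) are assumptions chosen for illustration, and they happen to give values close to the reported range.

```python
def effective_population_size(pi, mu):
    """Long-term diploid Ne from nucleotide diversity, assuming pi = 4 * Ne * mu.

    pi: average nucleotide diversity per site.
    mu: mutation rate per site per generation (an assumed value here,
        not one taken from the abstract).
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu)


PI_EEL = 0.00529  # average nucleotide diversity reported in the abstract

# Assumed bracketing mutation rates (hypothetical, for illustration).
ne_low = effective_population_size(PI_EEL, 1e-8)
ne_high = effective_population_size(PI_EEL, 1e-9)
print(f"Ne between about {ne_low:,.0f} and {ne_high:,.0f}")
```

A tenfold uncertainty in μ translates directly into a tenfold range for Ne, which is why estimates of this kind are usually reported as an interval.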

19.
The identification of genes influencing fitness is central to our understanding of the genetic basis of adaptation and how it shapes phenotypic variation in wild populations. Here, we used whole‐genome resequencing of wild Rocky Mountain bighorn sheep (Ovis canadensis) to >50‐fold coverage to identify 2.8 million single nucleotide polymorphisms (SNPs) and genomic regions bearing signatures of directional selection (i.e. selective sweeps). A comparison of SNP diversity between the X chromosome and the autosomes indicated that bighorn males had a dramatically reduced long‐term effective population size compared to females. This probably reflects a long history of intense sexual selection mediated by male–male competition for mates. Selective sweep scans based on heterozygosity and nucleotide diversity revealed evidence for a selective sweep shared across multiple populations at RXFP2, a gene that strongly affects horn size in domestic ungulates. The massive horns carried by bighorn rams appear to have evolved in part via strong positive selection at RXFP2. We identified evidence for selection within individual populations at genes affecting early body growth and cellular response to hypoxia; however, these must be interpreted more cautiously as genetic drift is strong within local populations and may have caused false positives. These results represent a rare example of strong genomic signatures of selection identified at genes with known function in wild populations of a nonmodel species. Our results also showcase the value of reference genome assemblies from agricultural or model species for studies of the genomic basis of adaptation in closely related wild taxa.

20.
Restriction site‐associated DNA sequencing (RAD‐Seq), a next‐generation sequencing‐based genome ‘complexity reduction’ protocol, has been useful in population genomics in species with a reference genome. However, the application of this protocol to natural populations of genomically underinvestigated species, particularly under low‐to‐medium sequencing depth, has not been well justified. In this study, a Bayesian method was developed for calling genotypes from an F2 population of bottle gourd [Lagenaria siceraria (Mol.) Standl.] to construct a high‐density genetic map. Low‐depth genome shotgun sequencing allowed the assembly of scaffolds/contigs comprising approximately 50% of the estimated genome, of which 922 were anchored for identifying syntenic regions between species. RAD‐Seq genotyping of a natural population comprising 80 accessions identified 3226 single nucleotide polymorphisms (SNPs), based on which two sub‐gene pools were suggested for association with fruit shape. The two sub‐gene pools were moderately differentiated, as reflected by the Hudson's FST value of 0.14, and they represent regions on LG7 with strikingly elevated FST values. A sevenfold reduction in heterozygosity and a twofold increase in LD (r²) were observed in the same region for the round‐fruited sub‐gene pool. An outlier test suggested the locus LX3405 on LG7 to be a candidate site under selection. Comparative genomic analysis revealed that the cucumber genome region syntenic to the high FST island on LG7 harbors an ortholog of the tomato fruit shape gene OVATE. Our results point to a bright future for applying RAD‐Seq to population genomic studies of non‐model species even under low‐to‐medium sequencing effort. The genomic resources provide valuable information for cucurbit genome research.
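
For readers unfamiliar with Hudson's FST, the following Python sketch shows one common form of the estimator, combined across SNPs as a ratio of averages. The abstract does not say how the statistic was computed, so the function name `hudson_fst`, the input layout (per‐SNP allele frequencies plus the number of sampled allele copies in each pool), and the ratio‐of‐averages choice are assumptions for illustration.

```python
def hudson_fst(freqs_1, freqs_2, n1, n2):
    """Hudson's FST between two populations, as a ratio of averages over SNPs.

    freqs_1, freqs_2: per-SNP frequencies of the same allele in each population.
    n1, n2: number of sampled allele copies (chromosomes) per population,
            assumed constant across SNPs for simplicity.
    """
    if len(freqs_1) != len(freqs_2) or not freqs_1:
        raise ValueError("need two equal-length, non-empty frequency lists")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no between-population diversity at these SNPs")
    return numerator / denominator


# Hypothetical example: one SNP fixed for different alleles in the two pools.
fst_fixed = hudson_fst([1.0], [0.0], n1=40, n2=40)
```

Applying the same function in sliding windows along a linkage group is one way regions of elevated differentiation, such as the one described on LG7, could be located.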
