Similar literature
20 similar articles found (search time: 812 ms)
1.
Full genome sequencing of organisms with large and complex genomes is intractable and cost ineffective under most research budgets. Cycads (Cycadales) represent one of the oldest lineages of the extant seed plants and, partly due to their age, have incredibly large genomes up to ~60 Gbp. Restriction site‐associated DNA sequencing (RADseq) offers an approach to find genome‐wide informative markers and has proven to be effective with both model and nonmodel organisms. We tested the application of RADseq using ezRAD across all 10 genera of the Cycadales including an example data set of Cycas calcicola representing 72 samples from natural populations. Using previously available plastid and mitochondrial genomes as references, reads were mapped recovering plastid and mitochondrial genome regions and nuclear markers for all of the genera. De novo assembly generated up to 138,407 high‐depth clusters and up to 1,705 phylogenetically informative loci for the genera, and 4,421 loci for the example assembly of C. calcicola. The number of loci recovered by de novo assembly was lower than previous RADseq studies, yet still sufficient for downstream analysis. However, the number of markers could be increased by relaxing our assembly parameters, especially for the C. calcicola data set. Our results demonstrate the successful application of RADseq across the Cycadales to generate a large number of markers for all genomic compartments, despite the large number of plastids present in a typical plant cell. Our modified protocol was adapted to be applied to cycads and other organisms with large genomes to yield many informative genome‐wide markers.

2.
Restriction site‐associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single‐nucleotide polymorphisms. As an empirical example, we use a double‐digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high‐altitude mountains in Mexico.
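The replicate-based idea in the abstract above can be illustrated with a minimal sketch. The function name, the input layout (one dict per replicate, mapping locus ID to a tuple of alleles) and the exact error definitions are assumptions made for illustration, not the authors' implementation.

```python
def replicate_error_rates(rep_a, rep_b):
    """Compare two replicate genotype calls of the same individual.

    rep_a, rep_b: dict mapping locus ID -> genotype as a tuple of alleles,
    e.g. {"loc1": ("A", "G")}. A locus absent from a dict was not recovered.
    The two rates below are simplified, assumed definitions.
    """
    loci_a, loci_b = set(rep_a), set(rep_b)
    all_loci = loci_a | loci_b
    shared = loci_a & loci_b
    # Locus error: fraction of loci recovered in only one replicate.
    locus_error = len(all_loci - shared) / len(all_loci) if all_loci else 0.0
    # Allele error: fraction of shared loci whose unordered genotypes differ.
    mismatches = sum(
        1 for locus in shared if sorted(rep_a[locus]) != sorted(rep_b[locus])
    )
    allele_error = mismatches / len(shared) if shared else 0.0
    return locus_error, allele_error
```

Because both replicates come from one individual, any disagreement is attributed to the laboratory or bioinformatic pipeline, which is what makes these rates usable as an objective when tuning assembly parameters.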

3.
Restriction‐site‐associated DNA tag (RAD‐tag) sequencing has become a popular approach to generate thousands of SNPs used to address diverse questions in population genomics. Comparatively, the suitability of RAD‐tag genotyping to address evolutionary questions across divergent species has been the subject of only a few recent studies. Here, we evaluate the applicability of this approach to conduct genome‐wide scans for polymorphisms across two cetacean species belonging to distinct families: the short‐beaked common dolphin (Delphinus delphis; n = 5 individuals) and the harbour porpoise (Phocoena phocoena; n = 1 individual). Additionally, we explore the effects of varying two parameters in the Stacks analysis pipeline on the number of loci and level of divergence obtained. We observed a 34% drop in the total number of loci that were present in all individuals when analysing individuals from the distinct families compared with analyses restricted to intraspecific comparisons (i.e. within D. delphis). Despite relatively stringent quality filters, 3595 polymorphic loci were retrieved from our interfamilial comparison. Cetaceans have undergone rapid diversification, and the estimated divergence time between the two families is relatively recent (14–19 Ma). Thus, our results showed that, for this level of divergence, a large number of orthologous loci can still be genotyped using this approach, which is on par with two recent in silico studies. Our findings constitute one of the first empirical investigations using RAD‐tag sequencing at this level of divergence and highlight the great potential of this approach in comparative studies and to address evolutionary questions.

4.
By combining next‐generation sequencing technology (454) and reduced representation library (RRL) construction, the rapid and economical isolation of over 25 000 potential single‐nucleotide polymorphisms (SNP) and >6000 putative microsatellite loci from c. 2% of the genome of the non‐model teleost, Atlantic cod Gadus morhua from the Celtic Sea, south of Ireland, was demonstrated. A small‐scale validation of markers indicated that 80% (11 of 14) of SNP loci and 40% (6 of 15) of the microsatellite loci could be amplified and showed variability. The results clearly show that small‐scale next‐generation sequencing of RRL genomes is an economical and rapid approach for simultaneous SNP and microsatellite discovery that is applicable to any species. The low cost and relatively small investment in time allows for positive exploitation of ascertainment bias to design markers applicable to specific populations and study questions.

5.
Laura E. Timm. Molecular Ecology, 2020, 29(12): 2133-2136
From its inception, population genetics has been nearly as concerned with the genetic data type—to which analyses are brought to bear—as it is with the analysis methods themselves. The field has traversed allozymes, microsatellites, segregating sites in multilocus alignments and, currently, single nucleotide polymorphisms (SNPs) generated by high‐throughput genomic sequencing methods, primarily whole genome sequencing and reduced representation library (RRL) sequencing. As each emerging data type has gained traction, it has been compared to existing methods, based on its relative ability to discern population structural complexity at increasing levels of resolution. However, this is usually done by comparing the gold standard in one data type to the gold standard in the new data type. These gold standards frequently differ in power and in sampling density, both across a genome and throughout a spatial range. In this issue of Molecular Ecology, D’Aloia et al. apply the high‐throughput approach as fully as possible to microsatellites, nuclear loci and SNPs genotyped through an RRL method; this is coupled with a spatially dense sampling scheme. Completing a battery of population genetics analyses across data types (including a series of down‐sampled data sets), the authors find that SNP data are slightly more sensitive to fine‐scale genetic structure, and the results are more resilient to down‐sampling than microsatellites and nonrepetitive nuclear loci. However, their results are far from an unqualified victory for RRL SNP data over all previous data types: the authors note that modest additions to the microsatellites and nuclear loci data sets may provide the necessary analytical power to delineate the fine‐scale genetic structuring identified by SNPs. As always, as the field begins to fully embrace the newest thing, good science reminds us that traditional data types are far from useless, especially when combined with a well‐designed sampling scheme.

6.
A growing variety of “genotype-by-sequencing” (GBS) methods use restriction enzymes and high throughput DNA sequencing to generate data for a subset of genomic loci, allowing the simultaneous discovery and genotyping of thousands of polymorphisms in a set of multiplexed samples. We evaluated a “double-digest” restriction-site associated DNA sequencing (ddRAD-seq) protocol by 1) comparing results for a zebra finch (Taeniopygia guttata) sample with in silico predictions from the zebra finch reference genome; 2) assessing data quality for a population sample of indigobirds (Vidua spp.); and 3) testing for consistent recovery of loci across multiple samples and sequencing runs. Comparison with in silico predictions revealed that 1) over 90% of predicted, single-copy loci in our targeted size range (178–328 bp) were recovered; 2) short restriction fragments (38–178 bp) were carried through the size selection step and sequenced at appreciable depth, generating unexpected but nonetheless useful data; 3) amplification bias favored shorter, GC-rich fragments, contributing to among locus variation in sequencing depth that was strongly correlated across samples; 4) our use of restriction enzymes with a GC-rich recognition sequence resulted in an up to four-fold overrepresentation of GC-rich portions of the genome; and 5) star activity (i.e., non-specific cutting) resulted in thousands of “extra” loci sequenced at low depth. Results for three species of indigobirds show that a common set of thousands of loci can be consistently recovered across both individual samples and sequencing runs. In a run with 46 samples, we genotyped 5,996 loci in all individuals and 9,833 loci in 42 or more individuals, resulting in <1% missing data for the larger data set. We compare our approach to similar methods and discuss the range of factors (fragment library preparation, natural genetic variation, bioinformatics) influencing the recovery of a consistent set of loci among samples.
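The in silico prediction step mentioned in the abstract above can be sketched roughly as follows. This is an illustrative simplification, not the authors' pipeline: the function name and arguments are hypothetical, and each enzyme is assumed to cut at the first base of its recognition site rather than at its true offset.

```python
import re

def double_digest(seq, site_a, site_b, min_len, max_len):
    """Return (start, end) fragments flanked by one site of each enzyme.

    Simplification (assumed): each enzyme cuts at the first base of its
    recognition site; real enzymes cut at a defined offset within the site.
    """
    # Lookahead patterns also find overlapping occurrences of a site.
    cuts = [(m.start(), "A") for m in re.finditer(f"(?={re.escape(site_a)})", seq)]
    cuts += [(m.start(), "B") for m in re.finditer(f"(?={re.escape(site_b)})", seq)]
    cuts.sort()
    fragments = []
    # Only adjacent cut sites delimit a fragment after complete digestion.
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        length = pos2 - pos1
        # Keep fragments with two different ends that pass size selection.
        if enz1 != enz2 and min_len <= length <= max_len:
            fragments.append((pos1, pos2))
    return fragments
```

Calling this on a reference sequence with `min_len=178` and `max_len=328` would mirror the targeted size range quoted in the abstract. The recovered loci can then be compared against the predicted fragment list.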

7.
Reduced representation genome sequencing such as restriction‐site‐associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single‐nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach; the markers were simultaneously identified and scored in a genome‐wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome‐wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long‐term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
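As a worked illustration of how an effective population size can follow from nucleotide diversity, the sketch below uses the standard neutral expectation for diploids, π = 4·Ne·μ. The two mutation rates are assumptions chosen for illustration; the abstract does not state which rates the authors used.

```python
def effective_population_size(pi, mu):
    """Long-term Ne for a diploid under neutrality, from pi = 4 * Ne * mu."""
    return pi / (4.0 * mu)

pi = 0.00529  # average nucleotide diversity reported in the abstract above
# Assumed per-site, per-generation mutation rates (not given in the abstract).
for mu in (1e-8, 1e-9):
    ne = effective_population_size(pi, mu)
    print(f"mu = {mu:g}: Ne ~ {ne:,.0f}")
```

Under these assumed rates the estimate spans one order of magnitude, roughly 1.3 × 10^5 to 1.3 × 10^6, which is consistent with the range quoted in the abstract.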

8.
Population genetic studies of nonmodel organisms frequently employ reduced representation library (RRL) methodologies, many of which rely on protocols in which genomic DNA is digested by one or more restriction enzymes. However, because high molecular weight DNA is recommended for these protocols, samples with degraded DNA are generally unsuitable for RRL methods. Given that ancient and historic specimens can provide key temporal perspectives to evolutionary questions, we explored how custom‐designed RNA probes could enrich for RRL loci (Restriction Enzyme‐Associated Loci baits, or REALbaits). Starting with genotyping‐by‐sequencing (GBS) data generated on modern common ragweed (Ambrosia artemisiifolia L.) specimens, we designed 20 000 RNA probes to target well‐characterized genomic loci in herbarium voucher specimens dating from 1835 to 1913. Compared to shotgun sequencing, we observed enrichment of the targeted loci at 19‐ to 151‐fold. Using our GBS capture pipeline on a data set of 38 herbarium samples, we discovered 22 813 SNPs, providing sufficient genomic resolution to distinguish geographic populations. For these samples, we found that dilution of REALbaits to 10% of their original concentration still yielded sufficient data for downstream analyses and that a sequencing depth of ~7 million reads was sufficient to characterize most loci without wasting sequencing capacity. In addition, we observed that targeted loci had highly variable rates of success, which we primarily attribute to similarity between loci, a trait that ultimately interferes with unambiguous read mapping. Our findings can help researchers design capture experiments for RRL loci, thereby providing an efficient means to integrate samples with degraded DNA into existing RRL data sets.

9.
During speciation‐with‐gene‐flow, effective migration varies across the genome as a function of several factors, including proximity of selected loci, recombination rate, strength of selection, and number of selected loci. Genome scans may provide better empirical understanding of the genome‐wide patterns of genetic differentiation, especially if the variance due to the previously mentioned factors is partitioned. In North American lake whitefish (Coregonus clupeaformis), glacial lineages that diverged in allopatry about 60,000 years ago and came into contact 12,000 years ago have independently evolved in several lakes into two sympatric species pairs (a normal benthic and a dwarf limnetic). Variable degrees of reproductive isolation between species pairs across lakes offer a continuum of genetic and phenotypic divergence associated with adaptation to distinct ecological niches. To disentangle the complex array of genetically based barriers that locally reduce the effective migration rate between whitefish species pairs, we compared genome‐wide patterns of divergence across five lakes distributed along this divergence continuum. Using restriction site associated DNA (RAD) sequencing, we combined genetic mapping and population genetics approaches to identify genomic regions resistant to introgression and derive empirical measures of the barrier strength as a function of recombination distance. We found that the size of the genomic islands of differentiation was influenced by the joint effects of linkage disequilibrium maintained by selection on many loci, the strength of ecological niche divergence, as well as demographic characteristics unique to each lake. Partial parallelism in divergent genomic regions likely reflected the combined effects of polygenic adaptation from standing variation and independent changes in the genetic architecture of postzygotic isolation. This study illustrates how integrating genetic mapping and population genomics of multiple sympatric species pairs provides a window on the speciation‐with‐gene‐flow mechanism.

10.
High‐throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double‐digest restriction‐associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra‐ and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, a sample size greater than eight individuals has little impact on estimates of genetic diversity within A. longifolia populations, when 1000 or more SNPs are used. Our results also show that even at a very small sample size (i.e. two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high‐throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
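The resampling design described in the abstract above can be sketched as below. This is a hypothetical illustration rather than the authors' code: the function names, the 0/1/2 genotype coding and the use of uncorrected expected heterozygosity as the diversity statistic are all assumptions.

```python
import random

def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity (2pq) over biallelic SNPs.

    genotypes: list of per-individual lists; each entry is the count
    (0, 1 or 2) of the alternate allele at one SNP. No small-sample
    correction is applied (an assumed simplification).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    total = 0.0
    for j in range(n_snp):
        p = sum(ind[j] for ind in genotypes) / (2 * n_ind)
        total += 2 * p * (1 - p)
    return total / n_snp

def resample_he(genotypes, n_ind, n_snp, n_reps=100, seed=0):
    """He estimates from random subsets of individuals and SNPs."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(n_reps):
        inds = rng.sample(range(len(genotypes)), n_ind)
        snps = rng.sample(range(len(genotypes[0])), n_snp)
        subset = [[genotypes[i][j] for j in snps] for i in inds]
        estimates.append(expected_heterozygosity(subset))
    return estimates
```

Running `resample_he` over a grid of `n_ind` and `n_snp` values and comparing the spread of the estimates with the full-data value is one way to see where extra individuals or markers stop changing the result.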

11.
  1. Knowledge of relationships in wild populations is critical for better understanding mating systems and inbreeding scenarios to inform conservation strategies for endangered species. To delineate pedigrees in wild populations, study genetic connectivity, study genotype‐phenotype associations, trace individuals, or track wildlife trade, many identified individuals need to be genotyped at thousands of loci, mostly from noninvasive samples. This requires us to (a) identify the most common noninvasive sample available from identified individuals, (b) assess the ability to acquire genome‐wide data from such samples, and (c) evaluate the quality of such genome‐wide data, and its ability to reconstruct relationships between animals within a population.
  2. We followed identified individuals from a wild endangered tiger population and found that shed hair samples were the most common compared to scat samples, opportunistically found carcasses, and opportunistic invasive samples. We extracted DNA from these samples, prepared whole genome sequencing libraries, and sequenced genomes from these.
  3. Whole genome sequencing methods recovered between 25% and 98% of the genome for five such samples. Exploratory population genetic analyses revealed that these data were free of holistic biases and could recover expected population structure and relatedness. Mitochondrial genomes recovered matrilineages in accordance with long‐term monitoring data. Even with just five samples, we were able to uncover the matrilineage for three individuals with unknown ancestry.
  4. In summary, we demonstrated that noninvasive shed hair samples yield adequate quality and quantity of DNA in conjunction with sensitive library preparation methods, and provide reliable data from hundreds of thousands of SNPs across the genome. This makes shed hair an ideal noninvasive resource for studying individual‐based genetics of elusive endangered species in the wild.

12.
Population genetic studies in nonmodel organisms are often hampered by a lack of reference genomes that are essential for whole‐genome resequencing. In the light of this, genotyping methods have been developed to effectively eliminate the need for a reference genome, such as genotyping by sequencing or restriction site‐associated DNA sequencing (RAD‐seq). However, what remains relatively poorly studied is how accurately these methods capture both the average and the variation in genetic diversity across an organism's genome. In this issue of Molecular Ecology Resources, Dutoit et al. (2016) use whole‐genome resequencing data from the collared flycatcher to assess what factors drive heterogeneity in nucleotide diversity across the genome. Using these data, they then simulate how well different sequencing designs, including RAD sequencing, could capture most of the variation in genetic diversity. They conclude that for evolutionary and conservation‐related studies focused on estimating genomic diversity, researchers should emphasize the number of loci analysed over the number of individuals sequenced.

13.
Laboratory techniques for high‐throughput sequencing have enhanced our ability to generate DNA sequence data from millions of natural history specimens collected prior to the molecular era, but remain poorly tested at shallower evolutionary time scales. Hybridization capture using restriction site‐associated DNA probes (hyRAD) is a recently developed method for population genomics with museum specimens. The hyRAD method employs fragments produced in a restriction site‐associated double digestion as the basis for probes that capture orthologous loci in samples of interest. While promising in that it does not require a reference genome, hyRAD has yet to be applied across study systems in independent laboratories. Here, we provide an independent assessment of the effectiveness of hyRAD on both fresh avian tissue and dried tissue from museum specimens up to 140 years old and investigate how variable quantities of input DNA affect sequencing, assembly, and population genetic inference. We present a modified bench protocol and bioinformatics pipeline, including three steps for detection and removal of microbial and mitochondrial DNA contaminants. We confirm that hyRAD is an effective tool for sampling thousands of orthologous SNPs from historic museum specimens to describe phylogeographic patterns. We find that modern DNA performs significantly better than historical DNA during sequencing but that assembly performance is largely equivalent. We also find that the quantity of input DNA predicts %GC content of assembled contiguous sequences, suggesting PCR bias. We caution against sampling schemes that include taxonomic or geographic autocorrelation across modern and historic samples.

14.
Decreasing sequencing costs have driven a rapid expansion of novel genotyping methods. One of these methods is the exploitation of restriction enzyme cut sites to generate genome‐wide but reduced representation sequencing libraries (RRLs), alternatively termed genotyping by sequencing or restriction‐site associated DNA sequencing. Without a reference genome, the resulting short sequence reads must be assembled de novo. There are many possible assembly programs, most not explicitly developed for RRL data, and we know little of their effectiveness. In this issue of Molecular Ecology Resources, LaCava et al. (2020) systematically evaluate six commonly used programs and two commonly varied parameters for complete and accurate assembly of RRLs, using simulated double digests of Homo sapiens and Arabidopsis thaliana genomes with varied mutation rates and types. The authors find substantial variation in performance across assembly programs. The most consistently high‐performing assembler is infrequently used in their literature survey (CD‐HIT; Li and Godzik, 2006), while several others fail to produce complete, accurate assemblies under many conditions. LaCava et al. additionally recommend best practices in parameter choice and evaluation of future assembly programs—advice that molecular ecologists working to assemble sequences of all kinds should take to heart.

15.
Restriction site‐associated DNA sequencing (RAD‐Seq), a next‐generation sequencing‐based genome ‘complexity reduction’ protocol, has been useful in population genomics in species with a reference genome. However, the application of this protocol to natural populations of genomically underinvestigated species, particularly under low‐to‐medium sequencing depth, has not been well justified. In this study, a Bayesian method was developed for calling genotypes from an F2 population of bottle gourd [Lagenaria siceraria (Mol.) Standl.] to construct a high‐density genetic map. Low‐depth genome shotgun sequencing allowed the assembly of scaffolds/contigs comprising approximately 50% of the estimated genome, of which 922 were anchored for identifying syntenic regions between species. RAD‐Seq genotyping of a natural population comprising 80 accessions identified 3226 single nucleotide polymorphisms (SNPs), based on which two sub‐gene pools were suggested for association with fruit shape. The two sub‐gene pools were moderately differentiated, as reflected by a Hudson's FST value of 0.14, and they represent regions on LG7 with strikingly elevated FST values. A seven‐fold reduction in heterozygosity and a two‐fold increase in LD (r²) were observed in the same region for the round‐fruited sub‐gene pool. An outlier test suggested the locus LX3405 on LG7 to be a candidate site under selection. Comparative genomic analysis revealed that the cucumber genome region syntenic to the high FST island on LG7 harbors an ortholog of the tomato fruit shape gene OVATE. Our results point to a bright future of applying RAD‐Seq to population genomic studies for non‐model species even under low‐to‐medium sequencing efforts. The genomic resources provide valuable information for cucurbit genome research.
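For readers unfamiliar with the statistic quoted in the abstract above, the sketch below shows one commonly used form of Hudson's FST estimator, combined across SNPs as a ratio of averages. The function name and input layout are assumptions, and nothing here is taken from the authors' analysis.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST as a ratio of averages over biallelic SNPs.

    p1, p2: per-SNP allele frequencies in the two populations.
    n1, n2: number of sampled allele copies (chromosomes) per population,
            assumed constant across SNPs for simplicity.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling variance.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator
```

Summing the numerator and denominator before dividing, rather than averaging per-SNP ratios, keeps rare variants from dominating the genome-wide value. Applying the same function in sliding windows is one way to look for locally elevated regions such as the one reported on LG7.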

16.
High‐throughput sequencing has revolutionized population and conservation genetics. RAD sequencing methods, such as 2b‐RAD, can be used on species lacking a reference genome. However, transferring protocols across taxa can potentially lead to poor results. We tested two different IIB enzymes (AlfI and CspCI) on two species with different genome sizes (the loggerhead turtle Caretta caretta and the sharpsnout seabream Diplodus puntazzo) to build a set of guidelines to improve 2b‐RAD protocols on non‐model organisms while optimising costs. Good results were obtained even with degraded samples, showing the value of 2b‐RAD in studies with poor DNA quality. However, library quality was found to be a critical parameter on the number of reads and loci obtained for genotyping. Resampling analyses with different number of reads per individual showed a trade‐off between number of loci and number of reads per sample. The resulting accumulation curves can be used as a tool to calculate the number of sequences per individual needed to reach a mean depth ≥20 reads to acquire good genotyping results. Finally, we demonstrated that selective‐base ligation does not affect genomic differentiation between individuals, indicating that this technique can be used in species with large genome sizes to adjust the number of loci to the study scope, to reduce sequencing costs and to maintain suitable sequencing depth for a reliable genotyping without compromising the results. Here, we provide a set of guidelines to improve 2b‐RAD protocols on non‐model organisms with different genome sizes, helping decision‐making for a reliable and cost‐effective genotyping.

17.
Modern analytical methods for population genetics and phylogenetics are expected to provide more accurate results when data from multiple genome‐wide loci are analysed. We present the results of an initial application of parallel tagged sequencing (PTS) on a next‐generation platform to sequence thousands of barcoded PCR amplicons generated from 95 nuclear loci and 93 individuals sampled across the range of the tiger salamander (Ambystoma tigrinum) species complex. To manage the bioinformatic processing of this large data set (344 330 reads), we developed a pipeline that sorts PTS data by barcode and locus, identifies high‐quality variable nucleotides and yields phased haplotype sequences for each individual at each locus. Our sequencing and bioinformatic strategy resulted in a genome‐wide data set with relatively low levels of missing data and a wide range of nucleotide variation. STRUCTURE analyses of these data in a genotypic format resulted in strongly supported assignments for the majority of individuals into nine geographically defined genetic clusters. Species tree analyses of the most variable loci using a multi‐species coalescent model resulted in strong support for most branches in the species tree; however, analyses including more than 50 loci produced parameter sampling trends that indicated a lack of convergence on the posterior distribution. Overall, these results demonstrate the potential for amplicon‐based PTS to rapidly generate large‐scale data for population genetic and phylogenetic‐based research.

18.
Restriction‐site associated DNA sequencing (RAD‐seq) can identify and score thousands of genetic markers from a group of samples for population‐genetics studies. One challenge of de novo RAD‐seq analysis is to distinguish paralogous sequence variants (PSVs) from true single‐nucleotide polymorphisms (SNPs) associated with orthologous loci. In the absence of a reference genome, it is difficult to differentiate true SNPs from PSVs, and their impact on downstream analysis remains unclear. Here, we introduce a network‐based approach, PMERGE, which connects fragments based on their DNA sequence similarity to identify probable PSVs. Applying our method to de novo RAD‐seq data from 150 Atlantic salmon (Salmo salar) samples collected from 15 locations across the Southern Newfoundland coast allowed the identification of 87% of total PSVs identified through alignment to the Atlantic salmon genome. Removal of these paralogs altered the inferred population structure, highlighting the potential impact of filtering in RAD‐seq analysis. PMERGE was also applied to a green crab (Carcinus maenas) data set consisting of 242 samples from 11 different locations and successfully identified and removed the majority of paralogous loci (62%). The PMERGE software can be run as part of the widely used Stacks analysis package.
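The general idea of linking loci by sequence similarity, as described in the abstract above, can be illustrated with a small sketch. This is not PMERGE's algorithm: the Hamming-distance threshold, the equal-length requirement and all names are assumptions chosen to keep the example short.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def similarity_clusters(loci, max_mismatch):
    """Group loci into connected components of a sequence-similarity graph.

    loci: dict mapping locus ID -> consensus sequence (all the same length).
    An edge joins two loci whose sequences differ at <= max_mismatch sites.
    """
    parent = {name: name for name in loci}

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(loci, 2):
        if hamming(loci[a], loci[b]) <= max_mismatch:
            parent[find(a)] = find(b)

    clusters = {}
    for name in loci:
        clusters.setdefault(find(name), set()).add(name)
    # Components with more than one locus are candidate paralog groups.
    return [group for group in clusters.values() if len(group) > 1]
```

Using connected components means two loci can end up in the same group through a chain of intermediate sequences even when they differ by more than the threshold themselves. The all-pairs comparison is quadratic in the number of loci, so a real data set would need an indexed or heuristic search.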

19.
High‐throughput sequencing methods for genotyping genome‐wide markers are being rapidly adopted for phylogenetics of nonmodel organisms in conservation and biodiversity studies. However, the reproducibility of SNP genotyping and degree of marker overlap or compatibility between datasets from different methodologies have not been tested in nonmodel systems. Using double‐digest restriction site‐associated DNA sequencing, we sequenced a common set of 22 specimens from the butterfly genus Speyeria on two different Illumina platforms, using two variations of library preparation. We then used a de novo approach to bioinformatic locus assembly and SNP discovery for subsequent phylogenetic analyses. We found a high rate of locus recovery despite differences in library preparation and sequencing platforms, as well as overall high levels of data compatibility after data processing and filtering. These results provide the first application of NGS methods for phylogenetic reconstruction in Speyeria and support the use and long‐term viability of SNP genotyping applications in nonmodel systems.

20.
We present the development of a genomic library using a RADseq (restriction site associated DNA sequencing) protocol for marker discovery that can be applied to evolutionary studies of the sugarcane borer Diatraea saccharalis, an important South American insect pest. A RADtag protocol combined with Illumina paired‐end sequencing allowed de novo discovery of 12 811 SNPs and a high‐quality assembly of 122.8M paired‐end reads from six individuals, representing 40 Gb of sequencing data. Approximately 1.7 Mb of the sugarcane borer genome distributed over 5289 minicontigs were obtained upon assembly of second reads from first reads RADtag loci where at least one SNP was discovered and genotyped. Minicontig lengths ranged from 200 to 611 bp and were used for functional annotation and microsatellite discovery. These markers will be used in future studies to understand gene flow and adaptation to host plants and control tactics.
