Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Single nucleotide polymorphisms (SNPs) are rapidly replacing anonymous markers in population genomic studies, but their use in nonmodel organisms is hampered by the scarcity of cost‐effective approaches to uncover genome‐wide variation in a comprehensive subset of individuals. The screening of one or only a few individuals induces ascertainment bias. To discover SNPs for a population genomic study of the Pyrenean rocket (Sisymbrium austriacum subsp. chrysanthum), we undertook a pooled RAD‐PE (Restriction site Associated DNA Paired‐End sequencing) approach. RAD tags were generated from the PstI‐digested pooled genomic DNA of 12 individuals sampled across the species distribution range and paired‐end sequenced using Illumina technology to produce ~24.5 Mb of sequences, covering ~7% of the species' genome. Sequences were assembled into ~76 000 contigs with a mean length of 323 bp (N50 = 357 bp, sequencing depth = 24x). In all, >15 000 SNPs were called, of which 47% were annotated in putative genic regions based on homology with the Arabidopsis thaliana genome. Gene ontology (GO) slim categorization demonstrated that the identified SNPs covered extant genic variation well. The validation of 300 SNPs on a larger set of individuals using a KASPar assay underpinned the utility of pooled RAD‐PE as an inexpensive genome‐wide SNP discovery technique (success rate: 87%). In addition to SNPs, we discovered >600 putative SSR markers.
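The N50 statistic quoted for the assembly above (N50 = 357 bp) can be illustrated with a minimal sketch. It uses the common definition of N50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions, not part of the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (common definition)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths in bp (hypothetical values).
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 weights long contigs heavily, it can sit above the mean contig length, as it does in the abstract (357 bp against a mean of 323 bp).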

2.
Transposable elements (TEs) – selfish DNA sequences that can move within the genome – comprise a large proportion of the genomes of many organisms. Although low‐coverage whole‐genome sequencing can be used to survey TE composition, it is noneconomical for species with large quantities of DNA. Here, we utilize restriction‐site associated DNA sequencing (RADSeq) as an alternative method to survey TE composition. First, we demonstrate in silico that double digest restriction‐site associated DNA sequencing (ddRADseq) markers contain the same TE compositions as whole genome assemblies across arthropods. Next, we show empirically using eight Synalpheus snapping shrimp species with large genomes that TE compositions from ddRADseq and low‐coverage whole‐genome sequencing are comparable within and across species. Finally, we develop a new bioinformatic pipeline, TERAD, to extract TE compositions from RADseq data. Our study expands the utility of RADseq to study the repeatome, making comparative studies of genome structure for species with large genomes more tractable and affordable.

3.
Restriction site‐associated DNA sequencing (RADseq) has emerged as a useful tool in systematics and population genomics. A common feature of RADseq data sets is that they contain missing data that arise from multiple sources including genealogical sampling bias, assembly methodology and sequencing error. Many RADseq studies have demonstrated that allowing sites (single nucleotide polymorphisms, SNPs) with missing data can increase support for phylogenetic hypotheses. Two non‐mutually exclusive explanations for this observation are that (a) larger data sets contain more phylogenetic information; and (b) excluding missing data disproportionally removes sites with the highest mutation rates, causing the exclusion of characters that are likely variable and informative. Using a RADseq data set derived from the East African banana frog, Afrixalus fornasini (up to 1.1 million SNPs), we found that missing data thresholds were positively correlated with the proportion of parsimony‐informative sites and mean branch support. Using three proxies for estimating site‐specific rate, we found that the most conservative missing data strategies excluded rapidly evolving sites, with four‐state sites present only when allowing ≥60% missing data per SNP. Topological similarity among estimated phylogenies was highest for the data sets with ≥60% missing data per SNP. Our results suggest that several desirable phylogenetic qualities were observed when allowing ≥60% missing data per SNP. However, at the highest missing data thresholds (80% and 90% missing data per SNP), we observed differences in performance between high‐ and mixed‐weight DNA extraction samples, which may indicate there are trade‐offs to consider when using degraded genomic template with RADseq protocols.

4.
Effective population size (Ne) is a key parameter of population genetics. However, Ne remains challenging to estimate for natural populations as several factors are likely to bias estimates. These factors include sampling design, sequencing method, and data filtering. Two issues inherent to the restriction site‐associated DNA sequencing (RADseq) protocol are missing data and SNP selection criteria (e.g., minimum minor allele frequency, number of SNPs). To evaluate the potential impact of SNP selection criteria on Ne estimates (Linkage Disequilibrium method) we used RADseq data for a nonmodel species, the thornback ray. In this data set, the inbreeding coefficient FIS was positively correlated with the amount of missing data, implying data were missing nonrandomly. The precision of Ne estimates decreased with the number of SNPs. Mean Ne estimates (averaged across 50 random data sets with 2000 SNPs) ranged between 237 and 1784. Increasing the percentage of missing data from 25% to 50% increased Ne estimates between 82% and 120%, while increasing the minor allele frequency (MAF) threshold from 0.01 to 0.1 decreased estimates between 71% and 75%. Considering these effects is important when interpreting RADseq data‐derived estimates of effective population size in empirical studies.
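The two SNP selection criteria varied in the abstract above, the maximum proportion of missing data and the minimum minor allele frequency (MAF), can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as alternate-allele counts (0, 1, 2) for a diploid species, with `None` for a missing call, and the function names `snp_stats` and `filter_snps` are hypothetical.

```python
def snp_stats(calls):
    """Return (missing proportion, minor allele frequency) for one SNP.

    calls: per-individual alternate-allele counts (0, 1 or 2),
    or None where the genotype is missing (assumed coding).
    """
    observed = [g for g in calls if g is not None]
    missing = 1 - len(observed) / len(calls)
    if not observed:
        return missing, 0.0
    alt_freq = sum(observed) / (2 * len(observed))  # diploid: two alleles each
    return missing, min(alt_freq, 1 - alt_freq)


def filter_snps(matrix, max_missing, min_maf):
    """Return indices of SNPs (rows) that pass both thresholds."""
    kept = []
    for idx, calls in enumerate(matrix):
        missing, maf = snp_stats(calls)
        if missing <= max_missing and maf >= min_maf:
            kept.append(idx)
    return kept


if __name__ == "__main__":
    # Toy matrix: four SNPs scored in four individuals (hypothetical data).
    toy = [
        [0, 1, 2, None],
        [0, 0, 0, 0],
        [None, None, None, 1],
        [0, 0, 1, 0],
    ]
    print(filter_snps(toy, max_missing=0.25, min_maf=0.1))
    print(filter_snps(toy, max_missing=0.75, min_maf=0.01))
```

Loosening `max_missing` or tightening `min_maf` changes which loci enter the downstream Ne estimate, which is the sensitivity the study quantifies.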

5.
Here, we present an adaptation of restriction‐site‐associated DNA sequencing (RAD‐seq) to the Illumina HiSeq2000 technology that we used to produce SNP markers in very large quantities at low cost per unit in the Réunion grey white‐eye (Zosterops borbonicus), a nonmodel passerine bird species with no reference genome. We sequenced a set of six pools of 18–25 individuals using a single sequencing lane. This allowed us to build around 600 000 contigs, among which at least 386 000 could be mapped to the zebra finch (Taeniopygia guttata) genome. This yielded more than 80 000 SNPs that could be mapped unambiguously and are evenly distributed across the genome. Thus, our approach provides a good illustration of the high potential of paired‐end RAD sequencing of pooled DNA samples combined with comparative assembly to the zebra finch genome to build large contigs and characterize vast numbers of informative SNPs in nonmodel passerine bird species in a very efficient and cost‐effective way.

6.
Single‐nucleotide polymorphisms (SNPs) are rapidly becoming the standard markers in population genomics studies; however, their use in nonmodel organisms is limited due to the lack of cost‐effective approaches to uncover genome‐wide variation, and the large number of individuals needed in the screening process to reduce ascertainment bias. To discover SNPs for population genomics studies in the fungal symbionts of the mountain pine beetle (MPB), we developed a roadmap to discover SNPs and to produce a genotyping platform. We undertook a whole‐genome sequencing approach of Leptographium longiclavatum in combination with available genomics resources of another MPB symbiont, Grosmannia clavigera. We sequenced 71 individuals pooled into four groups using the Illumina sequencing technology. We generated between 27 and 30 million reads of 75 bp that resulted in a total of 1,181 contigs longer than 2 kb and an assembled genome size of 28.9 Mb (N50 = 48 kb, average depth = 125x). A total of 9052 proteins were annotated, and between 9531 and 17 266 SNPs were identified in the four pools. A subset of 206 genes (containing 574 SNPs, 11% false positives) was used to develop a genotyping platform for this species. Using this roadmap, we developed a genotyping assay with a total of 147 SNPs located in 121 genes using the Illumina® Sequenom iPLEX Gold. Our preliminary genotyping (success rate = 85%) of 304 individuals from 36 populations supports the utility of this approach for population genomics studies in other MPB fungal symbionts and other fungal nonmodel species.

7.
Minimally invasive sampling (MIS) is widespread in wildlife studies; however, its utility for massively parallel DNA sequencing (MPS) is limited. Poor sample quality and contamination by exogenous DNA can make MIS challenging to use with modern genotyping‐by‐sequencing approaches, which have been traditionally developed for high‐quality DNA sources. Given that MIS is often more appropriate in many contexts, there is a need to make such samples practical for harnessing MPS. Here, we test the ability of Genotyping‐in‐Thousands by sequencing (GT‐seq), a multiplex amplicon sequencing approach, to effectively genotype minimally invasive cloacal DNA samples collected from the Western Rattlesnake (Crotalus oreganus), a threatened species in British Columbia, Canada. As there was no previous genetic information for this species, an optimized panel of 362 SNPs was selected for use with GT‐seq from a de novo restriction site‐associated DNA sequencing (RADseq) assembly. Comparisons of genotypes generated within and among RADseq and GT‐seq for the same individuals found low rates of genotyping error (GT‐seq: 0.50%; RADseq: 0.80%) and discordance (2.57%), the latter likely due to the different genotype calling models employed. GT‐seq mean genotype discordance between blood and cloacal swab samples collected from the same individuals was also minimal (1.37%). Estimates of population diversity parameters were similar across GT‐seq and RADseq data sets, as were inferred patterns of population structure. Overall, GT‐seq can be effectively applied to low‐quality DNA samples, minimizing the inefficiencies presented by exogenous DNA typically found in minimally invasive samples and continuing the expansion of molecular ecology and conservation genetics in the genomics era.

8.
The conservation of threatened species must be underpinned by phylogeographic knowledge. This need is epitomized by the freshwater fish Carassius carassius, which is in decline across much of its European range. Restriction site‐associated DNA sequencing (RADseq) is increasingly used for such applications; however, RADseq is expensive, and limitations on sample number must be weighed against the benefit of large numbers of markers. This trade‐off has previously been examined using simulation studies; however, empirical comparisons between these markers, especially in a phylogeographic context, are lacking. Here, we compare the results from microsatellites and RADseq for the phylogeography of C. carassius to test whether it is more advantageous to genotype fewer markers (microsatellites) in many samples, or many markers (SNPs) in fewer samples. These data sets, along with data from the mitochondrial cytochrome b gene, agree on broad phylogeographic patterns, showing the existence of two previously unidentified C. carassius lineages in Europe: one found throughout northern and central‐eastern European drainages and a second almost exclusively confined to the Danubian catchment. These lineages have been isolated for approximately 2.15 million years and should be considered separate conservation units. RADseq recovered finer population structure and stronger patterns of isolation by distance (IBD) than microsatellites, despite including only 17.6% of samples (38% of populations and 52% of samples per population). RADseq was also used along with approximate Bayesian computation to show that the postglacial colonization routes of C. carassius differ from the general patterns of freshwater fish in Europe, likely as a result of their distinctive ecology.

9.
We present the development of a genomic library using a RADseq (restriction site associated DNA sequencing) protocol for marker discovery that can be applied to evolutionary studies of the sugarcane borer Diatraea saccharalis, an important South American insect pest. A RADtag protocol combined with Illumina paired‐end sequencing allowed de novo discovery of 12 811 SNPs and a high‐quality assembly of 122.8M paired‐end reads from six individuals, representing 40 Gb of sequencing data. Approximately 1.7 Mb of the sugarcane borer genome distributed over 5289 minicontigs were obtained upon assembly of second reads from first reads RADtag loci where at least one SNP was discovered and genotyped. Minicontig lengths ranged from 200 to 611 bp and were used for functional annotation and microsatellite discovery. These markers will be used in future studies to understand gene flow and adaptation to host plants and control tactics.

10.
The conservation and management of endangered species requires information on their genetic diversity, relatedness and population structure. The main genetic markers applied for these questions are microsatellites and single nucleotide polymorphisms (SNPs), the latter of which remain the more resource‐demanding approach in most cases. Here, we compare the performance of two approaches, SNPs obtained by restriction‐site‐associated DNA sequencing (RADseq) and 16 DNA microsatellite loci, for estimating genetic diversity, relatedness and genetic differentiation of three small, geographically close wild brown trout (Salmo trutta) populations and a regionally used hatchery strain. The genetic differentiation, quantified as FST, was similar when measured using 16 microsatellites and 4,876 SNPs. Based on both marker types, each brown trout population represented a distinct gene pool with a low level of interbreeding. Analysis of SNPs identified half‐ and full‐siblings with a higher probability than the analysis based on microsatellites, and SNPs outperformed microsatellites in estimating individual‐level multilocus heterozygosity. Overall, the results indicated that moderately polymorphic microsatellites and SNPs from RADseq agreed on estimates of population genetic structure in moderately diverged, small populations, but RADseq outperformed microsatellites for applications that required individual‐level genotype information, such as quantifying relatedness and individual‐level heterozygosity. The results can be applied to other small populations with low or moderate levels of genetic diversity.

11.
Natural history museums harbour a plethora of biological specimens which are of potential use in population and conservation genetic studies. Although technical advancements in museum genomics have enabled genome‐wide markers to be generated from aged museum specimens, the suitability of these data for robust biological inference is not well characterized. The aim of this study was to test the utility of museum specimens in population and conservation genomics by assessing the biological and technical validity of single nucleotide polymorphism (SNP) data derived from such samples. To achieve this, we generated thousands of SNPs from 47 red‐tailed black cockatoo (Calyptorhynchus banksii) traditional museum samples (i.e. samples that were not collected with the primary intent of DNA analysis) and 113 fresh tissue samples (cryopreserved liver/muscle) using a restriction site‐associated DNA marker approach (DArTseq™). Thousands of SNPs were successfully generated from most of the traditional museum samples (with a mean age of 44 years, ranging from 5 to 123 years), although 38% did not provide useful data. These SNPs exhibited higher error rates and contained significantly more missing data compared with SNPs from fresh tissue samples, likely due to considerable DNA fragmentation. However, based on simulation results, the level of genotyping error had a negligible effect on inference of population structure in this species. We did identify a bias towards low diversity SNPs in older samples that appears to compromise temporal inferences of genetic diversity. This study demonstrates the utility of a RADseq‐based method to produce reliable genome‐wide SNP data from traditional museum specimens.

12.
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms apply a strategy for identifying putative SNPs based on filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep‐sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact of not accounting for systematic sequencing errors when filtering identified polymorphisms during SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize ‘bycatch’, that is, polymorphisms caused by sequencing or assembly error. An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand‐bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single‐copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.

13.
Restriction site‐associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and the estimation and reporting of genotyping error rates have received only limited attention. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single‐nucleotide polymorphisms. As an empirical example, we use a double‐digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high‐altitude mountains in Mexico.
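The replicate-based idea in the abstract above (two libraries from the same individual should yield identical genotypes) can be sketched as a simple mismatch rate. This is an illustrative simplification, not the study's exact locus, allele and SNP error definitions: it assumes each replicate is a dictionary mapping a locus name to a pair of alleles, with `None` for a missing call, and the function name `replicate_error_rate` is hypothetical.

```python
def replicate_error_rate(rep_a, rep_b):
    """Proportion of mismatching genotypes among loci called in both replicates.

    rep_a, rep_b: dicts mapping locus name -> (allele1, allele2),
    or None where no genotype was called (assumed data layout).
    """
    shared = [
        locus
        for locus in rep_a
        if locus in rep_b and rep_a[locus] is not None and rep_b[locus] is not None
    ]
    if not shared:
        raise ValueError("replicates share no called loci")
    # Sort alleles so that ("A", "G") and ("G", "A") count as the same genotype.
    mismatches = sum(
        1
        for locus in shared
        if tuple(sorted(rep_a[locus])) != tuple(sorted(rep_b[locus]))
    )
    return mismatches / len(shared)


if __name__ == "__main__":
    # Toy replicate pair (hypothetical genotypes).
    rep1 = {"L1": ("A", "A"), "L2": ("A", "G"), "L3": ("C", "T"), "L4": None}
    rep2 = {"L1": ("A", "A"), "L2": ("G", "A"), "L3": ("C", "C"), "L4": ("T", "T")}
    print(replicate_error_rate(rep1, rep2))
```

Loci missing in either replicate are excluded from the denominator here; a fuller treatment would report that missingness separately, since a locus recovered in only one replicate is itself a form of error.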

14.
15.
16.
17.
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.

18.
Reduced representation genome sequencing such as restriction‐site‐associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single‐nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach; the markers were simultaneously identified and scored in a genome‐wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome‐wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long‐term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
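The step from average nucleotide diversity to long‐term effective population size in the abstract above can be reconstructed as a short worked equation. This is a sketch under an assumption: the abstract does not state the mutation rates used, so the per‐site, per‐generation rates of 10^-8 and 10^-9 below are assumed values, chosen because they reproduce the reported range under the standard neutral expectation for a diploid population.

```latex
% Neutral expectation relating nucleotide diversity (pi) to effective
% population size (N_e) and mutation rate (mu); the mu values are assumed.
\[
\pi = 4 N_e \mu
\quad\Longrightarrow\quad
N_e = \frac{\pi}{4\mu}
\]
\[
N_e = \frac{0.00529}{4 \times 10^{-8}} \approx 132\,000,
\qquad
N_e = \frac{0.00529}{4 \times 10^{-9}} \approx 1\,320\,000
\]
```

The tenfold width of the reported range thus follows from a tenfold uncertainty in the assumed mutation rate, with the diversity estimate itself held fixed.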

19.
We identified ~13 000 putative single nucleotide polymorphisms (SNPs) by comparison of repeat‐masked BAC‐end sequences from the cattle RPCI‐42 BAC library with whole‐genome shotgun contigs of cattle genome assembly Btau 1.0. Genotyping of a subset of these SNPs was performed on a panel containing 186 DNA samples from 18 cattle breeds including 43 trios. Of 1039 SNPs confirmed as polymorphic in the panel, 998 had minor allele frequency ≥0.25 among unrelated individuals of at least one breed. When Btau 4.0 became available, 974 of these validated SNPs were assigned in silico to known cattle chromosomes, while 41 SNPs were mapped to unassigned sequence scaffolds, yielding one SNP every ~3 Mbp on average. Twenty‐four SNPs identified in Btau 1.0 were not mapped to Btau 4.0. Of the 1015 SNPs mapped to Btau 4.0, 959 SNPs had nucleotide bases identical in Btau 4.0 and Btau 1.0 contigs, whereas 56 bases were changed, resulting in the loss of the in silico SNP in Btau 4.0. Because these 1039 SNPs were all directly confirmed by genotyping on the multi‐breed panel, it is likely that the original polymorphisms were correctly identified. The 1039 validated SNPs identified in this study represent a new and useful resource for genome‐wide association studies and applications in animal breeding.

20.
Single nucleotide polymorphism (SNP) markers were identified and validated for two stingray species, Potamotrygon motoro and Potamotrygon falkneri, using double digest restriction‐site associated DNA (ddRAD) reads generated with 454‐Roche technology. A total of 226 774 reads (65.5 Mb) were obtained (mean read length 289 ± 183 bp), yielding a total of 5399 contigs (mean contig length: 396 ± 91 bp). Mining this data set, a panel of 143 in silico SNPs was selected. Eighty‐two of these SNPs were successfully validated and 61 were polymorphic: 14 in P. falkneri, 21 in P. motoro, 3 in both species and 26 fixed for alternative variants in both species, thus being useful for population analyses and hybrid detection.
