Similar Literature
20 similar articles found.
1.
Minimally invasive sampling (MIS) is widespread in wildlife studies; however, its utility for massively parallel DNA sequencing (MPS) is limited. Poor sample quality and contamination by exogenous DNA can make MIS challenging to use with modern genotyping‐by‐sequencing approaches, which were traditionally developed for high‐quality DNA sources. Given that MIS is often the more appropriate option, there is a need to make such samples practical for harnessing MPS. Here, we test the ability of Genotyping‐in‐Thousands by sequencing (GT‐seq), a multiplex amplicon sequencing approach, to effectively genotype minimally invasive cloacal DNA samples collected from the Western Rattlesnake (Crotalus oreganus), a threatened species in British Columbia, Canada. As there was no previous genetic information for this species, an optimized panel of 362 SNPs was selected for use with GT‐seq from a de novo restriction site‐associated DNA sequencing (RADseq) assembly. Comparisons of genotypes generated within and among RADseq and GT‐seq for the same individuals found low rates of genotyping error (GT‐seq: 0.50%; RADseq: 0.80%) and discordance (2.57%), the latter likely due to the different genotype calling models employed. GT‐seq mean genotype discordance between blood and cloacal swab samples collected from the same individuals was also minimal (1.37%). Estimates of population diversity parameters were similar across GT‐seq and RADseq data sets, as were inferred patterns of population structure. Overall, GT‐seq can be effectively applied to low‐quality DNA samples, minimizing the inefficiencies presented by exogenous DNA typically found in minimally invasive samples and continuing the expansion of molecular ecology and conservation genetics in the genomics era.
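To make the blood-versus-swab discordance figure concrete, here is a minimal sketch, not the authors' pipeline, that compares genotype calls from two samples of one individual and reports the share of loci, among those called in both samples, whose calls differ. The 0/1/2 alternate-allele encoding, the `None` missing-call convention, the function name and the toy calls are assumptions for illustration.

```python
from typing import Optional, Sequence

# Genotype encoded as the number of alternate alleles (0, 1 or 2); None = no call.
Genotype = Optional[int]

def discordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of loci called in both samples at which the two calls differ."""
    if len(a) != len(b):
        raise ValueError("both samples must be typed at the same loci")
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no locus was called in both samples")
    return sum(x != y for x, y in pairs) / len(pairs)

# Toy example (hypothetical calls for one snake): blood vs. cloacal swab.
blood = [0, 1, 2, 1, None, 0, 2, 1]
swab = [0, 1, 2, 0, 1, 0, 2, None]
print(f"discordance = {discordance(blood, swab):.3f}")  # 1 mismatch in 6 shared calls
```

Loci missing in either sample are excluded from the denominator, so the rate reflects disagreement between calls rather than differences in call rate.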

2.
Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co‐amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra‐deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used these data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500–20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is, swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within‐method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co‐amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co‐amplifying genes can be reliably genotyped using high‐throughput sequencing (HTS), provided that there is sufficient per‐amplicon coverage.
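The abstract does not spell out its repeatability measure. One common choice for multilocus MHC genotypes, assumed here purely for illustration, is the share of allele calls confirmed by the other replicate, 2 × (alleles in both) / (alleles in replicate 1 + alleles in replicate 2). The function name and allele labels below are hypothetical.

```python
from typing import Set

def allele_repeatability(rep1: Set[str], rep2: Set[str]) -> float:
    """Share of allele calls confirmed by the other replicate:
    2 * |alleles in both| / (|alleles in rep1| + |alleles in rep2|)."""
    total = len(rep1) + len(rep2)
    if total == 0:
        raise ValueError("no alleles were called in either replicate")
    return 2 * len(rep1 & rep2) / total

# Hypothetical allele names for one bird genotyped twice.
rep1 = {"MHC-I*01", "MHC-I*02", "MHC-I*03", "MHC-I*04"}
rep2 = {"MHC-I*01", "MHC-I*02", "MHC-I*03", "MHC-I*05"}
print(allele_repeatability(rep1, rep2))  # 0.75
```

Averaging this value over individuals at each coverage level would give a repeatability-versus-coverage curve of the kind the abstract describes.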

3.
The trade‐offs of using single‐digest vs. double‐digest restriction site‐associated DNA sequencing (RAD‐seq) protocols have been widely discussed. However, no direct empirical comparisons of the two methods have been conducted. Here, we sampled a single population of Gulf pipefish (Syngnathus scovelli) and genotyped 444 individuals using RAD‐seq. Sixty individuals were subjected to single‐digest RAD‐seq (sdRAD‐seq), and the remaining 384 individuals were genotyped using a double‐digest RAD‐seq (ddRAD‐seq) protocol. We analysed the resulting Illumina sequencing data and compared the two genotyping methods when reads were analysed either together or separately. Coverage statistics, observed heterozygosity, and allele frequencies differed significantly between the two protocols, as did the results of selection components analysis. We also performed an in silico digestion of the Gulf pipefish genome and modelled five major sources of bias: PCR duplicates, polymorphic restriction sites, shearing bias, asymmetric sampling (i.e., genotyping fewer individuals with sdRAD‐seq than with ddRAD‐seq) and higher major allele frequencies. This combination of approaches allowed us to determine that polymorphic restriction sites, an asymmetric sampling scheme, mean allele frequencies and to some extent PCR duplicates all contribute to different estimates of allele frequencies between samples genotyped using sdRAD‐seq versus ddRAD‐seq. Our finding that sdRAD‐seq and ddRAD‐seq can result in different allele frequencies has implications for comparisons across studies and techniques that endeavour to identify genomewide signatures of evolutionary processes in natural populations.
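The in silico digestion step can be illustrated by scanning a sequence for recognition sites. The sketch below, which is not the study's code, counts single-digest RAD tags (two per cut site, one read outward on each side) and lists double-digest fragments flanked by two different enzymes that fall inside a size-selection window. The PstI (CTGCAG) / MspI (CCGG) pairing, the window and the toy sequence are assumptions; the abstract does not name the enzymes, and fragment length is approximated here by the distance between site starts.

```python
import re
from typing import List, Tuple

def find_sites(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def sdrad_tags(seq: str, site: str) -> int:
    """Single digest: each cut site yields two RAD tags, one read outward on each side."""
    return 2 * len(find_sites(seq, site))

def ddrad_fragments(seq: str, site_a: str, site_b: str,
                    min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Double digest: keep fragments between neighbouring sites of *different*
    enzymes whose length (site start to site start) lies in the size window."""
    cuts = sorted([(p, "a") for p in find_sites(seq, site_a)] +
                  [(p, "b") for p in find_sites(seq, site_b)])
    return [(p1, p2) for (p1, e1), (p2, e2) in zip(cuts, cuts[1:])
            if e1 != e2 and min_len <= p2 - p1 <= max_len]

PSTI, MSPI = "CTGCAG", "CCGG"  # recognition sites of an assumed enzyme pair
toy = "AAAA" + PSTI + "T" * 20 + MSPI + "T" * 5 + MSPI + "A" * 10 + PSTI + "AAA"
print(find_sites(toy, PSTI), find_sites(toy, MSPI))  # [4, 53] [30, 39]
print(sdrad_tags(toy, PSTI))                         # 4
print(ddrad_fragments(toy, PSTI, MSPI, 10, 30))      # [(4, 30), (39, 53)]
```

The MspI–MspI interval (30 to 39) is dropped because fragments cut by the same enzyme at both ends are not expected to carry the two different adapters, which is one reason ddRAD samples a different subset of the genome than sdRAD.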

4.
Triploidy can occur naturally or be induced in fish and shellfish during artificial propagation in order to produce sterile individuals. Fisheries managers often stock these sterile triploids as a means of improving angling opportunities without risking unwanted reproduction of the stocked fish. Additionally, the rearing of all‐triploid individuals has been suggested as a means to reduce the possibility of escaped aquaculture fish interbreeding with wild populations. Efficient means of determining if an individual is triploid or diploid are therefore needed both to monitor the efficacy of triploidy‐inducing treatments and, when sampling fish from a body of water that has a mixture of diploids and triploids, to determine the ploidy of a fish prior to further analyses. Currently, ploidy is regularly measured through flow cytometry, but this technique typically utilizes a fresh blood sample. This study presents an alternative, cost‐effective method of determining ploidy by analysing amplicon‐sequencing data for biallelic single‐nucleotide polymorphisms (SNPs). For each sample, heterozygous genotypes are identified and the likelihoods of diploidy and triploidy are calculated based on the read counts for each allele. The accuracy of this method is demonstrated using triploid and diploid brook trout (Salvelinus fontinalis) genotyped with a panel of 234 SNPs and Chinook salmon (Oncorhynchus tshawytscha) genotyped with a panel of 298 SNPs following the GT‐seq methodology of amplicon sequencing.
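The read-count likelihood comparison can be sketched with binomial models. At a diploid heterozygote each read carries a given allele with probability 1/2; at a triploid heterozygote the expected share is 1/3 or 2/3. The sketch sums, over heterozygous SNPs, the log-likelihood ratio of triploidy to diploidy. Weighting the two triploid dosages equally and omitting a sequencing-error term are simplifying assumptions; the published model may differ in these details, and the read counts are invented.

```python
import math
from typing import Iterable, Tuple

def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(K = k) for K ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_mix(x: float, y: float) -> float:
    """log(0.5 * e^x + 0.5 * e^y), computed without underflow."""
    m = max(x, y)
    return m + math.log(0.5 * math.exp(x - m) + 0.5 * math.exp(y - m))

def ploidy_llr(het_counts: Iterable[Tuple[int, int]]) -> float:
    """Sum over heterozygous SNPs of log L(triploid) - log L(diploid).

    het_counts holds (reads of allele 1, reads of allele 2) per heterozygous SNP.
    Diploid heterozygote: allele 1 expected in 1/2 of reads.
    Triploid heterozygote: 1/3 or 2/3 of reads, each dosage taken as equally likely.
    Positive totals favour triploidy; negative totals favour diploidy."""
    llr = 0.0
    for a, b in het_counts:
        n = a + b
        ll_dip = binom_logpmf(a, n, 0.5)
        ll_tri = log_mix(binom_logpmf(a, n, 1 / 3), binom_logpmf(a, n, 2 / 3))
        llr += ll_tri - ll_dip
    return llr

diploid_like = [(50, 48), (30, 33), (45, 52)]
triploid_like = [(66, 34), (20, 41), (70, 35)]
print(ploidy_llr(diploid_like) < 0, ploidy_llr(triploid_like) > 0)  # True True
```

Working in log space keeps the sum stable when amplicon depths run into the thousands, where raw binomial probabilities would underflow.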

5.
Single nucleotide polymorphisms (SNPs) are essential to the understanding of population genetic variation and diversity. Here, we performed restriction‐site‐associated DNA sequencing (RAD‐seq) on 72 individuals from 13 Chinese indigenous and three introduced chicken breeds. A total of 620 million reads were obtained using an Illumina HiSeq 2000 sequencer. An average of 75 587 SNPs were identified from each individual. Stringent filtering retained 28 895 validated SNP candidates across all populations. When compared with the NCBI dbSNP (chicken_9031), 15 404 SNPs were new discoveries. In this study, RAD‐seq was performed on chickens for the first time, demonstrating its remarkable effectiveness and potential applications in genetic analysis and whole‐genome selection breeding in chickens and other agricultural animals.

6.
Here, we present an adaptation of restriction‐site‐associated DNA sequencing (RAD‐seq) to the Illumina HiSeq2000 technology that we used to produce SNP markers in very large quantities at low cost per unit in the Réunion grey white‐eye (Zosterops borbonicus), a nonmodel passerine bird species with no reference genome. We sequenced a set of six pools of 18–25 individuals using a single sequencing lane. This allowed us to build around 600 000 contigs, among which at least 386 000 could be mapped to the zebra finch (Taeniopygia guttata) genome. This yielded more than 80 000 SNPs that could be mapped unambiguously and are evenly distributed across the genome. Thus, our approach provides a good illustration of the high potential of paired‐end RAD sequencing of pooled DNA samples combined with comparative assembly to the zebra finch genome to build large contigs and characterize vast numbers of informative SNPs in nonmodel passerine bird species in a very efficient and cost‐effective way.

7.
While various technologies for high‐throughput genotyping have been developed for ecological studies, simple methods tolerant of low‐quality DNA samples are still limited. In this study, we tested the applicability of a random PCR‐based genotyping‐by‐sequencing technology, genotyping by random amplicon sequencing, direct (GRAS‐Di). We focused on population genetic analysis of estuarine mangrove fishes, including two resident species, the Amboina cardinalfish (Fibramia amboinensis, Bleeker, 1853) and Duncker's river garfish (Zenarchopterus dunckeri, Mohr, 1926), and a marine migrant, the blacktail snapper (Lutjanus fulvus, Forster, 1801). Collections were from the Ryukyu Islands, southern Japan. PCR amplicons derived from ~130 individuals were pooled and sequenced in a single lane on a HiSeq2500 platform, and an average of three million reads was obtained per individual. Consensus contigs were assembled for each species and used for genotyping of single nucleotide polymorphisms by mapping trimmed reads onto the contigs. After quality filtering steps, 4,000–9,000 putative single nucleotide polymorphisms were detected for each species. Although DNA fragmentation can reduce genotyping performance with next‐generation sequencing, its effect here was small. Genetic differentiation and a clear pattern of isolation‐by‐distance were observed in F. amboinensis and Z. dunckeri by means of principal component analysis, FST and admixture analysis. By contrast, L. fulvus comprised a genetically homogeneous population with directional recent gene flow. These genetic differentiation patterns reflect patterns of estuary use through life history. These results demonstrate the power of GRAS‐Di for fine‐grained genetic analysis using field samples, including mangrove fishes.
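The abstract infers isolation-by-distance from PCA, FST and admixture analyses. A standard formal test, shown here as a generic sketch rather than the authors' analysis, is a Mantel test: correlate pairwise genetic and geographic distances, then compare the observed correlation with correlations obtained after jointly permuting the rows and columns of one matrix. The site layout, distances and function names are hypothetical.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def upper(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries of m after relabelling rows/columns by order."""
    return [m[order[i]][order[j]]
            for i, j in itertools.combinations(range(len(order)), 2)]

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(gen: Matrix, geo: Matrix, permutations: int = 999,
           seed: int = 1) -> Tuple[float, float]:
    """One-sided Mantel test: (r, p) for a positive genetic-geographic correlation."""
    ident = list(range(len(gen)))
    geo_vec = upper(geo, ident)
    r_obs = pearson(upper(gen, ident), geo_vec)
    rng = random.Random(seed)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        order = ident[:]
        rng.shuffle(order)
        if pearson(upper(gen, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

# Hypothetical: six sites 1 km apart, genetic distance rising with geographic distance.
pos = [0, 1, 2, 3, 4, 5]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[0.02 * d for d in row] for row in geo]
r, p = mantel(gen, geo)
print(f"r = {r:.2f}, p = {p:.3f}")
```

Permuting whole sites rather than individual matrix cells preserves the dependence among pairwise distances that share a site, which is why an ordinary correlation test would be inappropriate here.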

8.
Delineation of units below the species level is critical for prioritizing conservation actions for species at risk. Genetic studies play an important role in characterizing patterns of population connectivity and diversity to inform the designation of conservation units, especially for populations that are geographically isolated. The northernmost range margin of Western Rattlesnakes (Crotalus oreganus) occurs in British Columbia, Canada, where it is federally classified as threatened and restricted to five geographic regions. In these areas, Western Rattlesnakes hibernate (den) communally, raising questions about connectivity within and between den complexes. At present, Western Rattlesnake conservation efforts are hindered by a complete lack of information on genetic structure and degree of isolation at multiple scales, from the den to the regional level. To fill this knowledge gap, we used Genotyping‐in‐Thousands by sequencing (GT‐seq) to genotype an optimized panel of 362 single nucleotide polymorphisms (SNPs) from individual samples (n = 461) collected across the snake's distribution in western Canada and neighboring Washington (USA). Hierarchical STRUCTURE analyses found evidence for population structure within and among the five geographic regions in BC, as well as in Washington. Within these regions, 11 genetically distinct complexes of dens were identified, with some regions having multiple complexes. No significant pattern of isolation‐by‐distance and generally low levels of migration were detected among den complexes across regions. Additionally, snakes within dens generally were more related than those among den complexes within a region, indicating limited movement. Overall, our results suggest that the single, recognized designatable unit for Western Rattlesnakes in Canada should be re‐assessed to proactively focus conservation efforts on preserving total genetic variation detected range‐wide.
More broadly, our study demonstrates a novel application of GT‐seq for investigating patterns of diversity in wild populations at multiple scales to better inform conservation management.

9.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

10.
Next‐generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus‐specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post‐processing of NGS data. Amplicon Sequence Assignment (AmpliSAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AmpliSAS is designed as a three‐step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in Excel spreadsheet format, making them easy to interpret. AmpliSAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies.
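The three AmpliSAS steps can be illustrated with a toy pipeline. This is not AmpliSAS's code, nor its exact clustering and filtering rules: here reads are demultiplexed by an exact barcode prefix, each rarer unique sequence is folded into a more abundant one a single substitution away (treated as a probable sequencing error), and variants below an assumed 5% per-sample frequency are discarded. Barcodes, sequences, thresholds and function names are hypothetical. Genuine MHC alleles can also differ by a single substitution, so a real tool needs more careful criteria than this one rule.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """(i) Assign reads to samples by an exact-match barcode prefix
    (all barcodes assumed to share one length); unmatched reads are dropped."""
    size = len(next(iter(barcodes)))
    samples: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:size])
        if sample is not None:
            samples[sample].append(read[size:])
    return dict(samples)

def one_substitution(a: str, b: str) -> bool:
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def cluster(seqs: List[str]) -> Counter:
    """(ii) Count unique sequences, most abundant first, and fold each one into
    an already-kept, more abundant sequence one substitution away, if any."""
    counts = Counter(seqs)
    kept: Counter = Counter()
    for seq, n in counts.most_common():
        parent = next((p for p in kept if one_substitution(p, seq)), seq)
        kept[parent] += n
    return kept

def filter_variants(clusters: Counter, min_freq: float = 0.05) -> Dict[str, float]:
    """(iii) Keep variants making up at least min_freq of the sample's reads."""
    total = sum(clusters.values())
    return {s: n / total for s, n in clusters.items() if n / total >= min_freq}

# Hypothetical 4-bp barcodes and 8-bp amplicons.
barcodes = {"ACGT": "ind1", "TGCA": "ind2"}
reads = (["ACGT" + "AAAAAAAA"] * 60 + ["ACGT" + "AAAAAAAT"] * 3 +
         ["ACGT" + "CCCCCCCC"] * 35 + ["ACGT" + "GGGGGGGG"] * 2 +
         ["TGCA" + "AAAAAAAA"] * 10 + ["NNNN" + "AAAAAAAA"])
samples = demultiplex(reads, barcodes)
print(filter_variants(cluster(samples["ind1"])))
# {'AAAAAAAA': 0.63, 'CCCCCCCC': 0.35}
```

In the toy run, the three `AAAAAAAT` reads are absorbed into the dominant `AAAAAAAA` variant, and the 2% `GGGGGGGG` variant falls below the frequency threshold.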

11.
12.
Mondal K, Shetty AC, Patel V, Cutler DJ, Zwick ME. Genomics, 2011, 98(4): 260–265.
We used a RainDance Technologies (RDT) expanded content library to enrich the human X chromosome exome (2.5 Mb) from 26 male samples followed by Illumina sequencing. Our multiplex primer library covered 98.05% of the human X chromosome exome in a single tube with 11,845 different PCR amplicons. Illumina sequencing of 24 male samples showed coverage for 97% of the targeted sequences. Sequence from 2 HapMap samples confirmed missing data rates of 2–3% at sites successfully typed by the HapMap project, with an accuracy of at least ~99.5% as compared to reported HapMap genotypes. Our demonstration that a RDT expanded content library can efficiently enrich and enable the routine sequencing of the human X chromosome exome suggests a wide variety of potential research and clinical applications for this platform.

13.
We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high‐throughput sequencing workflows. We fit observed ultra‐deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 bp of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur when using the low‐depth allele validation thresholds of previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1–6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations, was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, these observations provide evidence of MHC CNV and its complex basis in natural populations.

14.
15.
Next‐generation sequencing (NGS) is emerging as an efficient and cost‐effective tool in population genomic analyses of nonmodel organisms, allowing simultaneous resequencing of many regions of multi‐genomic DNA from multiplexed samples. Here, we detail our synthesis of protocols for targeted resequencing of mitochondrial and nuclear loci by generating indexed genomic libraries for multiplexing up to 100 individuals in a single sequencing pool, and then enriching the pooled library using custom DNA capture arrays. Our use of DNA sequence from one species to capture and enrich the sequencing libraries of another species (i.e. cross‐species DNA capture) indicates that efficient enrichment occurs when sequences are up to about 12% divergent, allowing us to take advantage of genomic information in one species to sequence orthologous regions in related species. In addition to a complete mitochondrial genome on each array, we have included between 43 and 118 nuclear loci for low‐coverage sequencing of between 18 kb and 87 kb of DNA sequence per individual for single nucleotide polymorphisms discovery from 50 to 100 individuals in a single sequencing lane. Using this method, we have generated a total of over 500 whole mitochondrial genomes from seven cetacean species and green sea turtles. The greater variation detected in mitogenomes relative to short mtDNA sequences is helping to resolve genetic structure ranging from geographic to species‐level differences. These NGS and analysis techniques have allowed for simultaneous population genomic studies of mtDNA and nDNA with greater genomic coverage and phylogeographic resolution than has previously been possible in marine mammals and turtles.

16.
DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large‐scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next‐generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for relatively high target amplicon yield, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next‐generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10‐mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full‐length DNA barcodes of 190 specimens using 12.5% of the capacity of a 454 sequencing run (i.e. two lanes of a 16‐lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full‐length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next‐generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content.
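Recovering specimen-level sequences depends on matching each read's leading 10-mer tag to a specimen. The sketch below is a hypothetical illustration rather than the study's procedure: it assigns a read to the closest tag when that tag has at most one mismatch and rejects ties. Tolerating a mismatch is only safe when tags are designed to differ from one another at several positions. Tag sequences, specimen names and the read insert are invented.

```python
from typing import Dict, Optional

TAG_LEN = 10  # 10-mer specimen tags, as described in the abstract

def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def assign_specimen(read: str, tags: Dict[str, str], max_mm: int = 1) -> Optional[str]:
    """Specimen whose tag best matches the first TAG_LEN bases of the read,
    allowing up to max_mm mismatches; None if nothing matches or two tags tie."""
    if len(read) < TAG_LEN:
        return None
    prefix = read[:TAG_LEN]
    scored = sorted((mismatches(prefix, tag), specimen) for tag, specimen in tags.items())
    best_mm, best = scored[0]
    if best_mm > max_mm:
        return None
    if len(scored) > 1 and scored[1][0] == best_mm:
        return None  # ambiguous: equally close to two tags
    return best

tags = {"ACGTACGTAC": "spec01", "TTGGCCAATT": "spec02"}  # hypothetical tags
insert = "GATTACA" * 3                                   # stand-in for the amplicon
print(assign_specimen("ACGTACGTAC" + insert, tags))  # spec01
print(assign_specimen("ACGTACGTAA" + insert, tags))  # spec01 (1 mismatch)
print(assign_specimen("ACGTACGGAA" + insert, tags))  # None (2 mismatches)
```

Reads that cannot be assigned unambiguously are discarded rather than guessed, which trades a few lost reads for protection against cross-specimen contamination of the barcode consensus.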

17.
Research in evolutionary biology involving nonmodel organisms is rapidly shifting from using traditional molecular markers such as mtDNA and microsatellites to higher‐throughput SNP genotyping methodologies to address questions in population genetics, phylogenetics and genetic mapping. Restriction site‐associated DNA sequencing (RAD sequencing or RADseq) has become an established method for SNP genotyping on Illumina sequencing platforms. Here, we developed a protocol and adapters for double‐digest RAD sequencing for Ion Torrent (Life Technologies; Ion Proton, Ion PGM) semiconductor sequencing. We sequenced thirteen genomic libraries of three different nonmodel vertebrate species on Ion Proton with PI chips: Arctic charr Salvelinus alpinus, European whitefish Coregonus lavaretus and common lizard Zootoca vivipara. This resulted in ~962 million single‐end reads overall and a mean of ~74 million reads per library. We filtered the genomic data using Stacks, a bioinformatic tool to process RAD sequencing data. On average, we obtained ~11 000 polymorphic loci per library of 6–30 individuals. We validate our new method by technical and biological replication, by reconstructing phylogenetic relationships, and by using a hybrid genetic cross to track genomic variants. Finally, we discuss the differences between sequencing platforms in the context of RAD sequencing, assessing possible advantages and disadvantages. We show that our protocol can be used for Ion semiconductor sequencing platforms for the rapid and cost‐effective generation of variable and reproducible genetic markers.

18.
Flexibility and low cost make genotyping‐by‐sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI‐MseI digestions with different multiplexing levels and examined the effect of restriction enzymes on library complexity and the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared UNEAK, Stacks and GATK pipelines for the GBS data, and then developed a reference‐free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7000–11 000 and 14 751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking.
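The abstract does not describe its reference-free strategy. One rationale that haploid samples allow, sketched here under assumed thresholds, is that a haploid genome should never appear heterozygous, so loci at which many samples show two well-supported alleles probably merge paralogous copies and can be dropped. The depth and allele-share cut-offs, function names and read counts are illustrative only.

```python
from typing import List, Optional, Tuple

def haploid_call(ref: int, alt: int, min_depth: int = 4,
                 max_minor: float = 0.1) -> Optional[str]:
    """Call one haploid sample from read counts: 'ref', 'alt', 'het' (both
    alleles well supported, which a haploid should never show) or None."""
    depth = ref + alt
    if depth < min_depth:
        return None
    if min(ref, alt) / depth > max_minor:
        return "het"
    return "ref" if ref > alt else "alt"

def keep_locus(counts: List[Tuple[int, int]], max_het_share: float = 0.05) -> bool:
    """Keep a locus only if it is polymorphic and almost never 'heterozygous',
    since apparent heterozygosity in haploid samples suggests merged paralogs."""
    calls = [c for c in (haploid_call(r, a) for r, a in counts) if c is not None]
    if not calls:
        return False
    het_share = calls.count("het") / len(calls)
    return het_share <= max_het_share and {"ref", "alt"} <= set(calls)

print(keep_locus([(10, 0), (0, 12), (19, 1), (0, 8)]))  # True: clean haploid SNP
print(keep_locus([(6, 5), (10, 0), (0, 9), (7, 7)]))    # False: half the calls look 'het'
print(keep_locus([(10, 0), (12, 0), (2, 0)]))           # False: monomorphic
```

This filter exploits a property diploid data lack: in a diploid, a true heterozygote and a collapsed paralog look alike, whereas in haploid samples any well-supported second allele is a warning sign.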

19.
Restriction‐enzyme‐based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction‐enzyme‐based methods remain largely unknown. Here, we estimated genotyping error rates in SNPs genotyped with double digest RAD sequencing from Mendelian incompatibilities in known mother–offspring dyads of Hoffman's two‐toed sloth (Choloepus hoffmanni) across a range of coverage and sequence quality criteria, for both reference‐aligned and de novo‐assembled data sets. Genotyping error rates were more sensitive to coverage than to sequence quality, and low coverage yielded high error rates, particularly in de novo‐assembled data sets. For example, coverage ≥5 yielded median genotyping error rates of ≥0.03 and ≥0.11 in reference‐aligned and de novo‐assembled data sets, respectively. Genotyping error rates declined to ≤0.01 in reference‐aligned data sets with a coverage ≥30, but remained ≥0.04 in the de novo‐assembled data sets. We observed approximately 10‐ and 13‐fold declines in the number of loci sampled in the reference‐aligned and de novo‐assembled data sets, respectively, when coverage was increased from ≥5 to ≥30 at quality score ≥30. Finally, we assessed the effects of genotyping coverage on a common population genetic application, parentage assignments, and showed that the proportion of incorrectly assigned maternities was relatively high at low coverage. Overall, our results suggest that the trade‐off between sample size and genotyping error rates should be considered prior to building sequencing libraries, that reporting genotyping error rates should become standard practice, and that effects of genotyping errors on inference should be evaluated in restriction‐enzyme‐based SNP studies.
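Mother–offspring dyads allow a direct check of biallelic SNP calls: a mother and her offspring must share at least one allele, so opposite homozygotes (0 vs 2 alternate alleles) are incompatible. The sketch below counts such incompatibilities for one dyad after a minimum-depth filter. It is an illustration, not the study's estimator: many genotyping errors (for example, a heterozygote called homozygous) remain Mendelian-compatible, so the raw incompatibility rate understates the per-genotype error rate. The data layout, function name, genotypes and depths are hypothetical.

```python
from typing import List, Optional, Tuple

Call = Tuple[Optional[int], int]  # (alternate-allele count 0/1/2 or None, read depth)

def incompatibility_rate(mother: List[Call], offspring: List[Call],
                         min_depth: int = 5) -> float:
    """Share of loci passing the depth filter in both individuals at which
    mother and offspring are opposite homozygotes (0 vs 2 alternate alleles)."""
    checked = incompatible = 0
    for (gm, dm), (go, do) in zip(mother, offspring):
        if gm is None or go is None or dm < min_depth or do < min_depth:
            continue
        checked += 1
        if abs(gm - go) == 2:
            incompatible += 1
    if checked == 0:
        raise ValueError("no locus passes the depth filter")
    return incompatible / checked

mother = [(0, 20), (2, 35), (1, 8), (0, 3), (2, 12), (1, 40)]
offspring = [(2, 18), (2, 30), (0, 9), (2, 4), (0, 15), (None, 0)]
print(incompatibility_rate(mother, offspring))                          # 0.5
print(round(incompatibility_rate(mother, offspring, min_depth=10), 3))  # 0.667
```

Re-running the function over a grid of depth thresholds reproduces the kind of trade-off the abstract reports: stricter filters change the error estimate while shrinking the number of loci that remain.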

20.
Use of SNPs has been favoured due to their abundance in plant and animal genomes, accompanied by the falling cost and rising throughput capacity for detection and genotyping. Here, we present in vitro (obtained from targeted sequencing) and in silico discovery of SNPs, and the design of medium‐throughput genotyping arrays for two oyster species, the Pacific oyster, Crassostrea gigas, and the European flat oyster, Ostrea edulis. Two sets of 384 SNP markers were designed for two Illumina GoldenGate arrays and genotyped on more than 1000 samples for each species. In each case, oyster samples were obtained from wild and selected populations and from three‐generation families segregating for traits of interest in aquaculture. The rate of successfully genotyped polymorphic SNPs was about 60% for each species. Effects of SNP origin and quality on genotyping success (Illumina functionality score) were analysed and compared with other model and nonmodel species. Furthermore, a simulation was performed based on a subset of the C. gigas SNP array with a minor allele frequency of 0.3 and typical crosses used in shellfish hatcheries. This simulation indicated that at least 150 markers were needed to perform an accurate parental assignment. Such panels might provide valuable tools to improve our understanding of the connectivity between wild (and selected) populations and could contribute to future selective breeding programmes.
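As a back-of-the-envelope complement to the simulation (not a reproduction of it), consider exclusion power. At a biallelic SNP in Hardy–Weinberg equilibrium with allele frequencies p and q, an unrelated candidate parent is excluded by an offspring only when the two are opposite homozygotes, which has probability 2p²q². With n independent loci of equal frequency, the probability that at least one locus excludes the candidate is 1 − (1 − 2p²q²)ⁿ. Related candidates, as in hatchery families, share more alleles with the offspring and are excluded less often, so these figures are optimistic. Function names are hypothetical.

```python
def single_locus_exclusion(maf: float) -> float:
    """P(an unrelated candidate parent and the offspring are opposite homozygotes)
    at one biallelic SNP in Hardy-Weinberg equilibrium: 2 p^2 q^2."""
    p, q = maf, 1.0 - maf
    return 2 * p**2 * q**2

def combined_exclusion(maf: float, n_loci: int) -> float:
    """P(at least one of n independent, equally informative loci excludes the candidate)."""
    return 1.0 - (1.0 - single_locus_exclusion(maf)) ** n_loci

def loci_needed(maf: float, target: float) -> int:
    """Smallest number of such loci whose combined exclusion reaches target (< 1)."""
    if not 0.0 < maf < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("maf and target must lie strictly between 0 and 1")
    n = 0
    while combined_exclusion(maf, n) < target:
        n += 1
    return n

print(round(single_locus_exclusion(0.3), 4))  # 0.0882
print(loci_needed(0.3, 0.9999))               # 100
```

This counts loci needed to exclude a single unrelated candidate; with many candidates, relatives among them and genotyping error, the requirement rises, which is why simulations such as the one described above are used.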


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417