Similar Articles
20 similar articles found (search time: 203 ms)
1.
ABSTRACT: BACKGROUND: Next-generation sequencing has revolutionized our approach to ancient DNA (aDNA) research by providing complete genomic sequences of ancient individuals and extinct species. However, the recovery of genetic material from long-dead organisms is still complicated by a number of issues, including post-mortem DNA damage and high levels of environmental contamination. Together with error profiles specific to the sequencing platforms used, these characteristics can limit our ability to map sequencing reads against modern reference genomes, and therefore to identify endogenous ancient reads, reducing the efficiency of shotgun sequencing of aDNA. RESULTS: In this study, we compare different computational methods for improving the accuracy and sensitivity of aDNA sequence identification, based on shotgun sequencing reads recovered from Pleistocene horse extracts using the Illumina GAIIx and Helicos HeliScope platforms. We show that the Burrows-Wheeler Aligner (BWA), which was developed for mapping undamaged sequencing reads from platforms with low rates of indel-type sequencing errors, can be run at acceptable run-times by modifying default parameters in a platform-specific manner. We also examine whether trimming likely damaged positions at read ends can increase the recovery of genuine aDNA fragments, and whether accurate identification of human contamination can be achieved using a previously suggested strategy based on best-hit filtering. We show that combining our different mapping and filtering approaches can increase the number of high-quality endogenous hits recovered by up to 33%. CONCLUSIONS: We have shown that Illumina and Helicos sequences recovered from aDNA extracts cannot be aligned to modern reference genomes with the same efficiency unless mapping parameters are optimized for the specific types of errors generated by these platforms and by post-mortem DNA damage.
Our findings have important implications for future aDNA research, as we define mapping guidelines that improve our ability to identify genuine aDNA sequences, which in turn could improve the genotyping accuracy of ancient specimens. Our framework provides a significant improvement over the standard procedures used for characterizing ancient genomes, a process challenged by contamination and often low amounts of DNA material.
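As a concrete illustration of the kind of platform-specific parameter tuning described above, the sketch below assembles a `bwa aln` command with seeding disabled and relaxed edit-distance and gap-open limits. The helper name and default values are illustrative assumptions, not the exact parameters optimized in the study:

```python
def bwa_aln_command(ref, fastq, out_sai,
                    disable_seed=True, max_edit_frac=0.03, max_gap_opens=2):
    """Assemble a `bwa aln` command tuned for ancient DNA reads.

    Setting the seed length above any read length (-l 1024) disables
    seeding, so damage-derived mismatches near the 5' end no longer
    prevent alignment, at the cost of longer run-times; -n and -o relax
    the edit-distance fraction and gap-open limits.
    """
    cmd = ["bwa", "aln", "-n", str(max_edit_frac), "-o", str(max_gap_opens)]
    if disable_seed:
        cmd += ["-l", "1024"]  # seed longer than any read: seeding off
    cmd += ["-f", out_sai, ref, fastq]
    return cmd
```

The `-n`, `-o`, `-l` and `-f` flags are standard `bwa aln` options; only the chosen values are assumptions here.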

2.
Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples.
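The enrichment figures reported above follow from a simple ratio of on-target read fractions before and after capture; a minimal sketch (the function name is ours):

```python
def enrichment_fold(pre_capture_frac, post_capture_frac):
    """Fold enrichment of endogenous reads achieved by capture:
    the ratio of on-target read fractions after vs. before enrichment."""
    if pre_capture_frac <= 0:
        raise ValueError("pre-capture fraction must be positive")
    return post_capture_frac / pre_capture_frac
```

With the abstract's numbers, a library going from 1.2% to 59% endogenous corresponds to roughly 49-fold enrichment, within the reported 6- to 159-fold range.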

3.
4.
Ancient DNA research has developed rapidly over the past few decades due to improvements in PCR and next-generation sequencing (NGS) technologies, but challenges still exist. One major challenge in ancient DNA research is to recover genuine endogenous ancient DNA sequences from raw sequencing data. This is often difficult due to degradation of ancient DNA and high levels of contamination, especially homologous contamination, which has an extremely similar genetic background to the genuine ancient DNA. In this study, we collected whole-genome sequencing (WGS) data from 6 ancient samples to compare different mapping algorithms. To further explore more effective methods for separating endogenous DNA from homologous contamination, we attempted to recover reads based on aDNA-specific characteristics of deamination, depurination, and DNA fragmentation, using different parameters. We propose a quick and improved pipeline for separating endogenous ancient DNA while simultaneously decreasing homologous contamination to very low proportions. Our goal in this research was to develop useful recommendations for ancient DNA mapping and for separation of endogenous DNA to facilitate future studies of ancient DNA.
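One of the aDNA-specific characteristics mentioned above, terminal C→T deamination, can be tested for read by read. This is a simplified sketch assuming a gap-free alignment between read and reference, not the pipeline's actual implementation:

```python
def has_terminal_ct(read_seq, ref_seq, window=3):
    """Check for the C->T deamination signature within `window` bases of
    the read's 5' end, given the aligned reference sequence (gap-free
    alignment assumed). Reads carrying this signature are very likely
    ancient, which is the basis for separating endogenous molecules
    from modern homologous contamination."""
    for i in range(min(window, len(read_seq), len(ref_seq))):
        if ref_seq[i] == "C" and read_seq[i] == "T":
            return True
    return False
```

A symmetric check for G→A at the 3' end (the complementary-strand signature) would follow the same pattern.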

5.
The performance of hybridization capture combined with next-generation sequencing (NGS) has until now seen limited investigation with samples from hot and arid regions. We applied hybridization capture and shotgun sequencing to recover DNA sequences from bone specimens of the ancient domestic dromedary (Camelus dromedarius) and its extinct wild ancestor, from Jordan, Syria, Turkey and the Arabian Peninsula, respectively. Our results show that hybridization capture increased the percentage of mitochondrial DNA (mtDNA) recovery by an average of 187-fold and in some cases yielded virtually complete mitochondrial (mt) genomes at multifold coverage in a single capture experiment. Furthermore, we tested the effect of hybridization temperature and time by using a touchdown approach on a limited number of samples. We observed no significant difference in the number of unique dromedary mtDNA reads retrieved with the standard capture compared to the touchdown method. In total, we obtained 14 partial mitochondrial genomes from ancient domestic dromedaries with 17–95% length coverage and 1.27–47.1-fold read depths for the covered regions. Using whole-genome shotgun sequencing, we successfully recovered endogenous dromedary nuclear DNA (nuDNA) from domestic and wild dromedary specimens with 1–1.06-fold read depths for covered regions. Our results highlight that despite recent methodological advances, obtaining ancient DNA (aDNA) from specimens recovered from hot, arid environments is still problematic. Hybridization protocols require specific optimization, and samples at the limit of DNA preservation need multiple replicates of DNA extraction and hybridization capture, as has been shown previously for Middle Pleistocene specimens.

6.
The advent of next-generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single-molecule real-time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS-enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small-scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost-efficient solutions for multispecies microsatellite projects.
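Microsatellite discovery from quality-controlled reads typically amounts to scanning for perfect tandem repeats; a minimal regex-based sketch, with thresholds that are illustrative assumptions rather than the workflow's actual settings:

```python
import re

def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, n_repeats) for perfect tandem repeats of
    motifs between min_motif and max_motif bases, repeated at least
    min_repeats times. Matches are non-overlapping, left to right."""
    pat = re.compile(r"([ACGT]{%d,%d}?)\1{%d,}"
                     % (min_motif, max_motif, min_repeats - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pat.finditer(seq)]
```

Real pipelines additionally handle compound and imperfect repeats and check flanking-sequence suitability for primer design, which this sketch omits.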

7.
Recent advancements of sequencing technology have opened up unprecedented opportunities in many application areas. Virus samples can now be sequenced efficiently with very deep coverage to infer the genetic diversity of the underlying virus populations. Several sequencing platforms with different underlying technologies and performance characteristics are available for viral diversity studies. Here, we investigate how the differences between two common platforms provided by 454/Roche and Illumina affect viral diversity estimation and the reconstruction of viral haplotypes. Using a mixture of ten HIV clones sequenced with both platforms and additional simulation experiments, we assessed the trade-off between sequencing coverage, read length, and error rate. For fixed costs, short Illumina reads can be generated at higher coverage and allow for detecting variants at lower frequencies. They can also be sufficient to assess the diversity of the sample if sequences are dissimilar enough, but, in general, assembly of full-length haplotypes is feasible only with the longer 454/Roche reads. The quantitative comparison highlights the advantages and disadvantages of both platforms and provides guidance for the design of viral diversity studies.
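The coverage/error-rate trade-off for detecting low-frequency variants can be roughed out with a normal approximation; this is a deliberate simplification for illustration, not the inference model used in the study:

```python
import math

def min_detectable_frequency(coverage, error_rate, z=3.0):
    """Rough lower bound on the variant frequency distinguishable from
    sequencing error at a single site: the expected variant read count
    must exceed the mean error count by z standard deviations (normal
    approximation to the binomial)."""
    mu = coverage * error_rate           # expected erroneous reads
    return (mu + z * math.sqrt(mu)) / coverage
```

At 10,000x coverage and a 0.1% per-base error rate this gives a detection limit near 0.2%, versus roughly 0.4% at 1,000x, illustrating why higher-coverage short reads allow detection of rarer variants at fixed cost.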

8.
High-throughput sequencing platforms continue to increase their read lengths, allowing for a deeper and more accurate depiction of environmental microbial diversity. With the recent Reagent Kit v3, Illumina MiSeq can now sequence the eukaryotic hyper-variable V4 region of the SSU-rDNA locus with paired-end reads. Using DNA collected from soils and analyses of strictly and nearly identical amplicons, here we ask how the new Illumina MiSeq data compare with what we can obtain with the Roche/454 GS FLX platform with regard to quantity and quality, presence and absence, and abundance perspectives. We show that there is an easy qualitative transition from the Roche/454 to the Illumina MiSeq platform. The transition is more nuanced quantitatively for low-abundance amplicons, although estimates of abundance are known to vary within platforms as well.

9.
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5–20 kb read with 4%–15% error per base), phasing performance was substantially improved.
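The evaluation metric reported above, mean distance between consecutive switch errors, can be computed directly once switch sites are identified; a minimal sketch with hypothetical input conventions:

```python
def mean_switch_distance(positions, is_switch):
    """Mean distance (bp) between consecutive switch errors along a
    phased chromosome. `positions`: sorted heterozygous-site coordinates;
    `is_switch`: booleans marking sites where a phase switch error occurs
    relative to the truth haplotypes."""
    switch_pos = [p for p, s in zip(positions, is_switch) if s]
    if len(switch_pos) < 2:
        return float("inf")  # fewer than two errors: no finite distance
    gaps = [b - a for a, b in zip(switch_pos, switch_pos[1:])]
    return sum(gaps) / len(gaps)
```

Larger values indicate better phasing, which is why the 274.4 kb to 328.6 kb change reported above is an improvement.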

10.
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole-sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or quality scores (e.g., Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and use it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
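The core DRISEE idea, estimating error from disagreement within a set of artifactual duplicate reads, can be sketched in a toy form (the published tool also handles ADR identification, binning and weighting, all omitted here):

```python
from collections import Counter

def duplicate_error_profile(reads):
    """Positional error estimate from a set of artifactual duplicate reads
    presumed to derive from the same template. At each position the
    consensus base is taken as truth, and the error rate is the fraction
    of reads disagreeing with it."""
    length = min(len(r) for r in reads)
    profile = []
    for i in range(length):
        column = [r[i] for r in reads]
        consensus, count = Counter(column).most_common(1)[0]
        profile.append(1 - count / len(column))
    return profile
```

Averaging the positional profile over all duplicate sets yields a whole-sample error estimate of the kind the abstract describes.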

11.
The invention and development of next or second generation sequencing methods has resulted in a dramatic transformation of ancient DNA research and allowed shotgun sequencing of entire genomes from fossil specimens. However, although there are exceptions, most fossil specimens contain only low (~ 1% or less) percentages of endogenous DNA. The only skeletal element for which a systematically higher endogenous DNA content compared to other skeletal elements has been shown is the petrous part of the temporal bone. In this study we investigate whether (a) different parts of the petrous bone of archaeological human specimens give different percentages of endogenous DNA yields, (b) there are significant differences in average DNA read lengths, damage patterns and total DNA concentration, and (c) it is possible to obtain endogenous ancient DNA from petrous bones from hot environments. We carried out intra-petrous comparisons for ten petrous bones from specimens from Holocene archaeological contexts across Eurasia dated between 10,000-1,800 calibrated years before present (cal. BP). We obtained shotgun DNA sequences from three distinct areas within the petrous: a spongy part of trabecular bone (part A), the dense part of cortical bone encircling the osseous inner ear, or otic capsule (part B), and the dense part within the otic capsule (part C). Our results confirm that dense bone parts of the petrous bone can provide high endogenous aDNA yields and indicate that endogenous DNA fractions for part C can exceed those obtained for part B by up to 65-fold and those from part A by up to 177-fold, while total endogenous DNA concentrations are up to 126-fold and 109-fold higher for these comparisons. 
Our results also show that while endogenous yields from part C were lower than 1% for samples from hot (both arid and humid) regions, the DNA damage patterns indicate that at least some of the reads originate from ancient DNA molecules, potentially enabling ancient DNA analyses of samples from hot regions that are otherwise not amenable to such analyses.

12.
The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms.

13.
DNA extracted from ancient plant remains almost always contains a mixture of endogenous (that is, derived from the plant) and exogenous (derived from other sources) DNA. The exogenous ‘contaminant’ DNA, chiefly derived from microorganisms, presents significant problems for shotgun sequencing. In some samples, more than 90% of the recovered sequences are exogenous, providing limited data relevant to the sample. However, other samples have far less contamination and subsequently yield much more useful data via shotgun sequencing. Given the investment required for high-throughput sequencing, whenever multiple samples are available, it is most economical to sequence the least contaminated sample. We present an assay based on quantitative real-time PCR which estimates the relative amounts of fungal and bacterial DNA in a sample in comparison to the endogenous plant DNA. Given a collection of contextually-similar ancient plant samples, this low cost assay aids in selecting the best sample for shotgun sequencing.
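The relative quantification underlying such a qPCR assay reduces to comparing threshold cycles (Ct); a simplified sketch assuming equal amplification efficiencies across assays (the Ct values in the example below are made up):

```python
def exogenous_ratio(ct_exogenous, ct_endogenous, efficiency=2.0):
    """Relative amount of exogenous (fungal/bacterial) vs. endogenous
    (plant) DNA from qPCR threshold cycles. Each cycle multiplies the
    template by `efficiency` (2.0 = perfect doubling), so a lower Ct
    means more starting template."""
    return efficiency ** (ct_endogenous - ct_exogenous)
```

For example, if a fungal marker crosses threshold at Ct 20 and the plant marker at Ct 26, the sample holds about 2^6 = 64 times more fungal than plant template; among contextually similar samples, one would sequence the sample with the lowest such ratio.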

14.
Target-capture approaches have improved over the past years, proving to be very efficient tools for selectively sequencing genetic regions of interest. These methods have also allowed noninvasive samples such as faeces (characterized by low quantity and quality of endogenous DNA) to be used in conservation genomics, evolution and population genetic studies. Here we aim to test different protocols and strategies for exome capture using the Roche SeqCap EZ Developer kit (57.5 Mb). First, we captured a complex pool of DNA libraries. Second, we assessed the influence of using more than one faecal sample, extract and/or library from the same individual, to evaluate its effect on the molecular complexity of the experiment. We validated our experiments with 18 chimpanzee faecal samples collected from two field sites as part of the Pan African Programme: The Cultured Chimpanzee. The two field sites are in Kibale National Park, Uganda (N = 9) and Loango National Park, Gabon (N = 9). We demonstrate that at least 16 libraries can be pooled, target-enriched through hybridization, and sequenced, allowing for the genotyping of 951,949 exome markers for population genetic analyses. Further, we observe that molecule richness, and thus data acquisition, increases when using multiple libraries from the same extract or multiple extracts from the same sample. Finally, repeated capture significantly decreases the proportion of off-target reads from 34.15% after one capture round to 7.83% after two capture rounds, supporting our conclusion that two rounds of target enrichment are advisable when using complex faecal samples.

15.
RAD-tag is a powerful tool for high-throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four-base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double-digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample-to-sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low-input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low-input material.
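Duplicate discrimination with the adaptor tag amounts to collapsing reads on a (position, tag) key rather than on position alone; a minimal sketch with an assumed input layout of (alignment position, four-base tag, sequence) tuples:

```python
def dedup_by_tag(reads):
    """Collapse PCR duplicates in RAD-like data using (alignment position,
    adaptor tag) pairs, keeping the first read seen per pair. Without the
    tag, every read at a restriction site starts at the same position and
    would wrongly collapse to a single read."""
    seen = set()
    kept = []
    for pos, tag, seq in reads:
        key = (pos, tag)
        if key not in seen:
            seen.add(key)
            kept.append((pos, tag, seq))
    return kept
```

The ratio of kept to input reads per site also gives a direct read-out of library complexity, which is the quantity the abstract uses to diagnose the low-input museum libraries.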

16.
Millions to billions of DNA sequences can now be generated from ancient skeletal remains thanks to the massive throughput of next-generation sequencing platforms. Except in cases of exceptional endogenous DNA preservation, most of the sequences isolated from fossil material do not originate from the specimen of interest, but instead reflect environmental organisms that colonized the specimen after death. Here, we characterize the microbial diversity recovered from seven c. 200- to 13 000-year-old horse bones collected from northern Siberia. We use a robust, taxonomy-based assignment approach to identify the microorganisms present in ancient DNA extracts and quantify their relative abundance. Our results suggest that molecular preservation niches exist within ancient samples that can potentially be used to characterize the environments from which the remains are recovered. In addition, microbial community profiling of the seven specimens revealed site-specific environmental signatures. These microbial communities appear to comprise mainly organisms that colonized the fossils recently. Our approach significantly extends the amount of useful data that can be recovered from ancient specimens using a shotgun sequencing approach. In future, it may be possible to correlate, for example, the accumulation of postmortem DNA damage with the presence and/or abundance of particular microbes.

17.
DNA sequencing of ancient permafrost samples can be used to reconstruct past plant, animal and bacterial communities. In this study, we assess the small-scale reproducibility of taxonomic composition obtained from sequencing four molecular markers (mitochondrial 12S ribosomal DNA (rDNA), prokaryote 16S rDNA, mitochondrial cox1 and chloroplast trnL intron) from two soil cores sampled 10 cm apart. In addition, sequenced control reactions were used to produce a contaminant library that was used to filter similar sequences from sample libraries. Contaminant filtering resulted in the removal of 1% of reads or 0.3% of operational taxonomic units. We found similar richness, overlap, abundance and taxonomic diversity from the 12S, 16S and trnL markers from each soil core. Jaccard dissimilarity across the two soil cores was highest for metazoan taxa detected by the 12S and cox1 markers. Taxonomic community distances were similar for each marker across the two soil cores when the chi-squared metric was used; however, the 12S and cox1 markers did not cluster well when the Goodall similarity metric was used. A comparison of plant macrofossil vs. read abundance corroborates previous work that suggests eastern Beringia was dominated by grasses and forbs during cold stages of the Pleistocene, a habitat that is restricted to isolated sites in the present-day Yukon.
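The Jaccard dissimilarity used above operates on presence/absence taxon sets and is simple to compute; the taxon names in the test are illustrative only:

```python
def jaccard_dissimilarity(taxa_a, taxa_b):
    """1 - |A ∩ B| / |A ∪ B| over two presence/absence taxon sets:
    0.0 for identical communities, 1.0 for fully disjoint ones."""
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

Computing this per marker across the two cores reproduces the kind of comparison the abstract reports, where metazoan markers (12S, cox1) showed the highest dissimilarity.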

18.
For many researchers, next-generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short-read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short-read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set.
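Assemblies produced from different read sets are commonly compared with the N50 statistic, a standard summary measure rather than anything specific to this study; a minimal sketch:

```python
def n50(contig_lengths):
    """N50 of an assembly: the length L such that contigs of length >= L
    together contain at least half of the assembled bases. Higher N50
    generally indicates a more contiguous assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

N50 alone can reward over-aggressive merging, so it is usually read alongside total assembly length and error-based measures.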

19.

Background

Massively parallel sequencing platforms, featuring high throughput and relatively short read lengths, are well suited to ancient DNA (aDNA) studies. Variant identification from short-read alignment can be hindered, however, by the low DNA concentrations common to historic samples, which constrain sequencing depths, and by post-mortem DNA damage patterns.

Results

We simulated pairs of sequences to act as reference and sample genomes at varied GC contents and divergence levels. Short-read sequence pools were generated from the sample sequences and subjected to varying levels of "post-mortem" damage by adjusting levels of fragmentation and fragmentation biases, transition rates at sequence ends, and sequencing depths. Mapping of sample read pools to reference sequences revealed several trends, including decreased alignment success with increased read length and decreased variant recovery with increased divergence. Variants were generally called with high accuracy; however, identification of SNPs (single-nucleotide polymorphisms) was less accurate for high-damage/low-divergence samples. Modest increases in sequencing depth resulted in rapid gains in total variant recovery but limited improvements to recovery of heterozygous variants.

Conclusions

This in silico study suggests aDNA-associated damage patterns minimally impact variant call accuracy and recovery from short-read alignment, while modest increases in sequencing depth can greatly improve variant recovery.
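The simulated "post-mortem" damage described above, elevated C→T transition rates decaying from fragment ends, can be sketched as follows; the parameter values and decay model are illustrative assumptions, not those used in the study:

```python
import random

def add_damage(read, p_start=0.3, decay=0.5, rng=None):
    """Simulate post-mortem deamination on one read: C->T substitutions
    whose probability decays geometrically from the 5' end (p_start at
    the first base, multiplied by `decay` per position)."""
    rng = rng or random.Random(0)
    out = []
    for i, base in enumerate(read):
        p = p_start * decay ** i
        out.append("T" if base == "C" and rng.random() < p else base)
    return "".join(out)
```

Applying the same transform to reverse-complemented fragments produces the complementary G→A pattern at 3' ends seen in real aDNA data.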

20.
Population-scale molecular studies of endangered and cryptic species are often limited by access to high-quality samples. The use of noninvasively collected samples or museum-preserved specimens reduces the pressure on modern populations by removing the need to capture and handle live animals. However, endogenous DNA content in such samples is low, making shotgun sequencing a financially prohibitive approach. Here, we apply a target enrichment method to retrieve mitochondrial genomes from 65 museum specimens and 56 noninvasively collected faecal samples of two endangered great ape species, Grauer's gorilla and the eastern chimpanzee. We show that the applied method is suitable for a wide range of sample types that differ in endogenous DNA content, increasing the proportion of target reads to over 300-fold. By systematically evaluating biases introduced during target enrichment of pooled museum samples, we show that capture is less efficient for fragments shorter or longer than the baits, that the proportion of human contaminating reads increases postcapture although capture efficiency is lower for human compared to gorilla fragments with a gorilla-generated bait, and that the rate of jumping PCR is considerable, but can be controlled for with a double-barcoding approach. We succeed in capturing complete mitochondrial genomes from faecal samples, but observe reduced capture efficiency as sequence divergence increases between the bait and target species. As previously shown for museum specimens, we demonstrate here that mitochondrial genome capture from field-collected faecal samples is a robust and reliable approach for population-wide studies of nonmodel organisms.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) · ICP licence 京ICP备09084417号