首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Ancient DNA research has developed rapidly over the past few decades due to improvements in PCR and next‐generation sequencing (NGS) technologies, but challenges still exist. One major challenge in relation to ancient DNA research is to recover genuine endogenous ancient DNA sequences from raw sequencing data. This is often difficult due to degradation of ancient DNA and high levels of contamination, especially homologous contamination that has extremely similar genetic background with that of the real ancient DNA. In this study, we collected whole‐genome sequencing (WGS) data from 6 ancient samples to compare different mapping algorithms. To further explore more effective methods to separate endogenous DNA from homologous contaminations, we attempted to recover reads based on ancient DNA specific characteristics of deamination, depurination, and DNA fragmentation with different parameters. We propose a quick and improved pipeline for separating endogenous ancient DNA while simultaneously decreasing homologous contaminations to very low proportions. Our goal in this research was to develop useful recommendations for ancient DNA mapping and for separation of endogenous DNA to facilitate future studies of ancient DNA.  相似文献   

2.
ABSTRACT: BACKGROUND: Next-Generation Sequencing has revolutionized our approach to ancient DNA (aDNA) research, by providing complete genomic sequences of ancient individuals and extinct species. However, the recovery of genetic material from long-dead organisms is still complicated by a number of issues, including post-mortem DNA damage and high levels of environmental contamination. Together with error profiles specific to the type of sequencing platforms used, these specificities could limit our ability to map sequencing reads against modern reference genomes and therefore limit our ability to identify endogenous ancient reads, reducing the efficiency of shotgun sequencing aDNA. RESULTS: In this study, we compare different computational methods for improving the accuracy and sensitivity of aDNA sequence identification, based on shotgun sequencing reads recovered from Pleistocene horse extracts using Illumina GAIIx and Helicos Heliscope platforms. We show that the performance of the Burrows Wheeler Aligner (BWA), that has been developed for mapping of undamaged sequencing reads using platforms with low rates of indel-types of sequencing errors, can be employed at acceptable run-times by modifying default parameters in a platform-specific manner. We also examine if trimming likely damaged positions at read ends can increase the recovery of genuine aDNA fragments and if accurate identification of human contamination can be achieved using a strategy previously suggested based on best hit filtering. We show that combining our different mapping and filtering approaches can increase the number of high-quality endogenous hits recovered by up to 33%. CONCLUSIONS: We have shown that Illumina and Helicos sequences recovered from aDNA extracts could not be aligned to modern reference genomes with the same efficiency unless mapping parameters are optimized for the specific types of errors generated by these platforms and by post-mortem DNA damage. Our findings have important implications for future aDNA research, as we define mapping guidelines that improve our ability to identify genuine aDNA sequences, which in turn could improve the genotyping accuracy of ancient specimens. Our framework provides a significant improvement to the standard procedures used for characterizing ancient genomes, which is challenged by contamination and often low amounts of DNA material.  相似文献   

3.
High‐throughput sequencing has dramatically fostered ancient DNA research in recent years. Shotgun sequencing, however, does not necessarily appear as the best‐suited approach due to the extensive contamination of samples with exogenous environmental microbial DNA. DNA capture‐enrichment methods represent cost‐effective alternatives that increase the sequencing focus on the endogenous fraction, whether it is from mitochondrial or nuclear genomes, or parts thereof. Here, we explored experimental parameters that could impact the efficacy of MYbaits in‐solution capture assays of ~5000 nuclear loci or the whole genome. We found that varying quantities of the starting probes had only moderate effect on capture outcomes. Starting DNA, probe tiling, the hybridization temperature and the proportion of endogenous DNA all affected the assay, however. Additionally, probe features such as their GC content, number of CpG dinucleotides, sequence complexity and entropy and self‐annealing properties need to be carefully addressed during the design stage of the capture assay. The experimental conditions and probe molecular features identified in this study will improve the recovery of genetic information extracted from degraded and ancient remains.  相似文献   

4.
Tracking down human contamination in ancient human teeth   总被引:1,自引:0,他引:1  
DNA contamination arising from the manipulation of ancient calcified tissue samples is a poorly understood, yet fundamental, problem that affects the reliability of ancient DNA (aDNA) studies. We have typed the mitochondrial DNA hypervariable region I of the only 6 people involved in the excavation, washing, and subsequent anthropological and genetic study of 23 Neolithic remains excavated from Granollers (Barcelona, Spain) and searched for their presence among the 572 clones generated during the aDNA analyses of teeth from these samples. Of the cloned sequences, 17.13% could be unambiguously identified as contaminants, with those derived from the people involved in the retrieval and washing of the remains present in higher frequencies than those of the anthropologist and genetic researchers. This finding confirms, for the first time, previous hypotheses that teeth samples are most susceptible to contamination at their initial excavation. More worrying, the cloned contaminant sequences exhibit substitutions that can be attributed to DNA damage after the contamination event, and we demonstrate that the level of such damage increases with time: contaminants that are >10 years old have approximately 5 times more damage than those that are recent. Furthermore, we demonstrate that in this data set, the damage rate of the old contaminant sequences is indistinguishable from that of the endogenous DNA sequences. As such, the commonly used argument that miscoding lesions observed among cloned aDNA sequences can be used to support data authenticity is misleading in scenarios where the presence of old contaminant sequences is possible. We argue therefore that the typing of those involved in the manipulation of the ancient human specimens is critical in order to ensure that generated results are accurate.  相似文献   

5.
Recent advances in high‐thoughput DNA sequencing have made genome‐scale analyses of genomes of extinct organisms possible. With these new opportunities come new difficulties in assessing the authenticity of the DNA sequences retrieved. We discuss how these difficulties can be addressed, particularly with regard to analyses of the Neandertal genome. We argue that only direct assays of DNA sequence positions in which Neandertals differ from all contemporary humans can serve as a reliable means to estimate human contamination. Indirect measures, such as the extent of DNA fragmentation, nucleotide misincorporations, or comparison of derived allele frequencies in different fragment size classes, are unreliable. Fortunately, interim approaches based on mtDNA differences between Neandertals and current humans, detection of male contamination through Y chromosomal sequences, and repeated sequencing from the same fossil to detect autosomal contamination allow initial large‐scale sequencing of Neandertal genomes. This will result in the discovery of fixed differences in the nuclear genome between Neandertals and current humans that can serve as future direct assays for contamination. For analyses of other fossil hominins, which may become possible in the future, we suggest a similar ‘boot‐strap’ approach in which interim approaches are applied until sufficient data for more definitive direct assays are acquired.  相似文献   

6.

Background

Determination of genetic relatedness among microorganisms provides information necessary for making inferences regarding phylogeny. However, there is little information available on how well the genetic relationships inferred from different genotyping methods agree with true genetic relationships. In this report, two genotyping methods – restriction fragment analysis (RFA) and partial genome DNA sequencing – were each compared to complete DNA sequencing as the definitive standard for classification.

Results

Using the Genbank database, 16 different types or subtypes of papillomavirus were selected as study samples, because numerous complete genome sequences were available. RFA was achieved by computer-simulated digestion. The genetic similarity of samples, based on RFA, was determined from the proportion of fragments that matched in size. DNA sequences of four specific genes (E1, E6, E7, and L1), representing partial genome sequencing, were also selected for comparison to complete genome sequencing. Laboratory error was not taken into account. Evaluation of the correlation between genetic similarity matrices (Mantel's r) and comparisons of the structure of the derived dendrograms (partition metric) indicated that partial genome sequencing (for single genes) had higher agreement with complete genome sequencing, achieving a maximum Mantel's r = 0.97 and a minimum partition metric = 10. RFA had lower agreement, with a maximum Mantel's r = 0.60 and a minimum partition metric = 18.

Conclusions

This simulation indicated that for smaller genomes, such as papillomavirus, partial genome sequencing is superior to restriction fragment analysis in representing genetic relatedness among isolates. The generalizability of these results to larger genomes, as well as the impact of laboratory error, remains to be demonstrated.  相似文献   

7.
The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high‐throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost‐effective solution for downstream applications, including DNA sequencing on HTS platforms.  相似文献   

8.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with a similar accuracy as a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.  相似文献   

9.
We extracted DNA from the human remains excavated from the Yixi site ( approximately 2,000 years before the present) in the Shandong peninsula of China and, through PCR amplification, determined nucleotide sequences of their mitochondrial D-loop regions. Nucleotide diversity of the ancient Yixi people was similar to those of modern populations. Modern humans in Asia and the circum-Pacific region are divided into six radiation groups, on the basis of the phylogenetic network constructed by means of 414 mtDNA types from 1, 298 individuals. We compared the ancient Yixi people with the modern Asian and the circum-Pacific populations, using two indices: frequency distribution of the radiation groups and genetic distances among populations. Both revealed that the closest genetic relatedness is between the ancient Yixi people and the modern Taiwan Han Chinese. The Yixi people show closer genetic affinity with Mongolians, mainland Japanese, and Koreans than with Ainu and Ryukyu Japanese and less genetic resemblance with Jomon people and Yayoi people, their predecessors and contemporaries, respectively, in ancient Japan.  相似文献   

10.
The challenge of sequencing ancient DNA has led to the development of specialized laboratory protocols that have focused on reducing contamination and maximizing the number of molecules that are extracted from ancient remains. Despite the fact that success in ancient DNA studies is typically obtained by screening many samples to identify a promising subset, ancient DNA protocols have not, in general, focused on reducing the time required to screen samples. We present an adaptation of a popular ancient library preparation method that makes screening more efficient. First, the DNA extract is treated using a protocol that causes characteristic ancient DNA damage to be restricted to the terminal nucleotides, while nearly eliminating it in the interior of the DNA molecules, allowing a single library to be used both to test for ancient DNA authenticity and to carry out population genetic analysis. Second, the DNA molecules are ligated to a unique pair of barcodes, which eliminates undetected cross-contamination from this step onwards. Third, the barcoded library molecules include incomplete adapters of short length that can increase the specificity of hybridization-based genomic target enrichment. The adapters are completed just before sequencing, so the same DNA library can be used in multiple experiments, and the sequences distinguished. We demonstrate this protocol on 60 ancient human samples.  相似文献   

11.
Millions to billions of DNA sequences can now be generated from ancient skeletal remains thanks to the massive throughput of next‐generation sequencing platforms. Except in cases of exceptional endogenous DNA preservation, most of the sequences isolated from fossil material do not originate from the specimen of interest, but instead reflect environmental organisms that colonized the specimen after death. Here, we characterize the microbial diversity recovered from seven c. 200‐ to 13 000‐year‐old horse bones collected from northern Siberia. We use a robust, taxonomy‐based assignment approach to identify the microorganisms present in ancient DNA extracts and quantify their relative abundance. Our results suggest that molecular preservation niches exist within ancient samples that can potentially be used to characterize the environments from which the remains are recovered. In addition, microbial community profiling of the seven specimens revealed site‐specific environmental signatures. These microbial communities appear to comprise mainly organisms that colonized the fossils recently. Our approach significantly extends the amount of useful data that can be recovered from ancient specimens using a shotgun sequencing approach. In future, it may be possible to correlate, for example, the accumulation of postmortem DNA damage with the presence and/or abundance of particular microbes.  相似文献   

12.
We present a cost‐effective metabarcoding approach, aMPlex Torrent, which relies on an improved multiplex PCR adapted to highly degraded DNA, combining barcoding and next‐generation sequencing to simultaneously analyse many heterogeneous samples. We demonstrate the strength of these improvements by generating a phylochronology through the genotyping of ancient rodent remains from a Moroccan cave whose stratigraphy covers the last 120 000 years. Rodents are important for epidemiology, agronomy and ecological investigations and can act as bioindicators for human‐ and/or climate‐induced environmental changes. Efficient and reliable genotyping of ancient rodent remains has the potential to deliver valuable phylogenetic and paleoecological information. The analysis of multiple ancient skeletal remains of very small size with poor DNA preservation, however, requires a sensitive high‐throughput method to generate sufficient data. We show this approach to be particularly adapted at accessing this otherwise difficult taxonomic and genetic resource. As a highly scalable, lower cost and less labour‐intensive alternative to targeted sequence capture approaches, we propose the aMPlex Torrent strategy to be a useful tool for the genetic analysis of multiple degraded samples in studies involving ecology, archaeology, conservation and evolutionary biology.  相似文献   

13.
Kinship plays a fundamental role in the evolution of social systems and is considered a key driver of group living. To understand the role of kinship in the formation and maintenance of social bonds, accurate measures of genetic relatedness are critical. Genotype‐by‐sequencing technologies are rapidly advancing the accuracy and precision of genetic relatedness estimates for wild populations. The ability to assign kinship from genetic data varies depending on a species’ or population's mating system and pattern of dispersal, and empirical data from longitudinal studies are crucial to validate these methods. We use data from a long‐term behavioural study of a polygynandrous, bisexually philopatric marine mammal to measure accuracy and precision of parentage and genetic relatedness estimation against a known partial pedigree. We show that with moderate but obtainable sample sizes of approximately 4,235 SNPs and 272 individuals, highly accurate parentage assignments and genetic relatedness coefficients can be obtained. Additionally, we subsample our data to quantify how data availability affects relatedness estimation and kinship assignment. Lastly, we conduct a social network analysis to investigate the extent to which accuracy and precision of relatedness estimation improve statistical power to detect an effect of relatedness on social structure. Our results provide practical guidance for minimum sample sizes and sequencing depth for future studies, as well as thresholds for post hoc interpretation of previous analyses.  相似文献   

14.
Chloroplast DNA sequence data are a versatile tool for plant identification or barcoding and establishing genetic relationships among plant species. Different chloroplast loci have been utilized for use at close and distant evolutionary distances in plants, and no single locus has been identified that can distinguish between all plant species. Advances in DNA sequencing technology are providing new cost‐effective options for genome comparisons on a much larger scale. Universal PCR amplification of chloroplast sequences or isolation of pure chloroplast fractions, however, are non‐trivial. We now propose the analysis of chloroplast genome sequences from massively parallel sequencing (MPS) of total DNA as a simple and cost‐effective option for plant barcoding, and analysis of plant relationships to guide gene discovery for biotechnology. We present chloroplast genome sequences of five grass species derived from MPS of total DNA. These data accurately established the phylogenetic relationships between the species, correcting an apparent error in the published rice sequence. The chloroplast genome may be the elusive single‐locus DNA barcode for plants.  相似文献   

15.
The performance of hybridization capture combined with next‐generation sequencing (NGS) has seen limited investigation with samples from hot and arid regions until now. We applied hybridization capture and shotgun sequencing to recover DNA sequences from bone specimens of ancient‐domestic dromedary (Camelus dromedarius) and its extinct ancestor, the wild dromedary from Jordan, Syria, Turkey and the Arabian Peninsula, respectively. Our results show that hybridization capture increased the percentage of mitochondrial DNA (mtDNA) recovery by an average 187‐fold and in some cases yielded virtually complete mitochondrial (mt) genomes at multifold coverage in a single capture experiment. Furthermore, we tested the effect of hybridization temperature and time by using a touchdown approach on a limited number of samples. We observed no significant difference in the number of unique dromedary mtDNA reads retrieved with the standard capture compared to the touchdown method. In total, we obtained 14 partial mitochondrial genomes from ancient‐domestic dromedaries with 17–95% length coverage and 1.27–47.1‐fold read depths for the covered regions. Using whole‐genome shotgun sequencing, we successfully recovered endogenous dromedary nuclear DNA (nuDNA) from domestic and wild dromedary specimens with 1–1.06‐fold read depths for covered regions. Our results highlight that despite recent methodological advances, obtaining ancient DNA (aDNA) from specimens recovered from hot, arid environments is still problematic. Hybridization protocols require specific optimization, and samples at the limit of DNA preservation need multiple replications of DNA extraction and hybridization capture as has been shown previously for Middle Pleistocene specimens.  相似文献   

16.
Molecular markers produced by next‐generation sequencing (NGS) technologies are revolutionizing genetic research. However, the costs of analysing large numbers of individual genomes remain prohibitive for most population genetics studies. Here, we present results based on mathematical derivations showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows to estimate the allele frequencies at single nucleotide polymorphisms (SNPs) with at least the same accuracy as individual‐based analyses, for considerably lower library construction and sequencing efforts. These findings remain true when taking into account the possibility of substantially unequal contributions of each individual to the final pool of sequence reads. We propose the intuitive notion of effective pool size to account for unequal pooling and derive a Bayesian hierarchical model to estimate this parameter directly from the data. We provide a user‐friendly application assessing the accuracy of allele frequency estimation from both pool‐ and individual‐based NGS population data under various sampling, sequencing depth and experimental error designs. We illustrate our findings with theoretical examples and real data sets corresponding to SNP loci obtained using restriction site–associated DNA (RAD) sequencing in pool‐ and individual‐based experiments carried out on the same population of the pine processionary moth (Thaumetopoea pityocampa). NGS of DNA pools might not be optimal for all types of studies but provides a cost‐effective approach for estimating allele frequencies for very large numbers of SNPs. It thus allows comparison of genome‐wide patterns of genetic variation for large numbers of individuals in multiple populations.  相似文献   

17.
Information on genetic relationships among individuals is essential to many studies of the behaviour and ecology of wild organisms. Parentage and relatedness assays based on large numbers of single nucleotide polymorphism (SNP) loci hold substantial advantages over the microsatellite markers traditionally used for these purposes. We present a double‐digest restriction site‐associated DNA sequencing (ddRAD‐seq) analysis pipeline that, as such, simultaneously achieves the SNP discovery and genotyping steps and which is optimized to return a statistically powerful set of SNP markers (typically 150–600 after stringent filtering) from large numbers of individuals (up to 240 per run). We explore the trade‐offs inherent in this approach through a set of experiments in a species with a complex social system, the variegated fairy‐wren (Malurus lamberti) and further validate it in a phylogenetically broad set of other bird species. Through direct comparisons with a parallel data set from a robust panel of highly variable microsatellite markers, we show that this ddRAD‐seq approach results in substantially improved power to discriminate among potential relatives and considerably more precise estimates of relatedness coefficients. The pipeline is designed to be universally applicable to all bird species (and with minor modifications to many other taxa), to be cost‐ and time‐efficient, and to be replicable across independent runs such that genotype data from different study periods can be combined and analysed as field samples are accumulated.  相似文献   

18.
We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high‐throughput sequencing workflows. We fit observed ultra‐deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 b of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur when using low‐depth allele validation thresholds used by previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1–6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, observations provide evidence of MHC CNV and its complex basis in natural populations.  相似文献   

19.
When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters—including drift times and admixture rates—for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called ‘Demographic Inference with Contamination and Error’ (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%.  相似文献   

20.
The genetic relatedness of the Bacillus anthracis typing phages Gamma and Cherry was determined by nucleotide sequencing and comparative analysis. The genomes of these two phages were identical except at three variable loci, which showed heterogeneity within individual lysates and among Cherry, Wbeta, Fah, and four Gamma bacteriophage sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号