Similar Literature
20 similar records retrieved (search time: 15 ms)
1.
Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate the data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and the lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and the most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM generated data fastest (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).
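The throughput and error-rate figures above reduce to simple arithmetic. The sketch below (Python) shows how such per-hour and per-100-base numbers are derived; the MiSeq yield and the 1.5-per-100-bases rate come from the abstract, while the alignment counts are hypothetical.

```python
# Illustrative arithmetic only; aligned-base and indel counts are hypothetical.

def throughput_mb_per_hour(run_yield_mb: float, run_hours: float) -> float:
    """Average sequencing throughput over a run."""
    return run_yield_mb / run_hours

def indel_errors_per_100_bases(indel_count: int, aligned_bases: int) -> float:
    """Homopolymer-associated indel error rate, expressed per 100 aligned bases."""
    return 100.0 * indel_count / aligned_bases

# MiSeq figures from the abstract: 1.6 Gb/run at 60 Mb/h implies ~26.7 h per run.
run_hours = 1600 / 60
print(f"MiSeq run length: {run_hours:.1f} h")

# Hypothetical alignment: 15,000 indels in 1,000,000 aligned bases gives the
# 1.5-errors-per-100-bases rate quoted for the Ion Torrent PGM.
print(indel_errors_per_100_bases(15_000, 1_000_000))  # 1.5
```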

2.
High-throughput sequencing platforms continue to increase their read lengths, allowing a deeper and more accurate depiction of environmental microbial diversity. With the Reagent Kit v3, the Illumina MiSeq can now sequence the eukaryotic hyper-variable V4 region of the SSU-rDNA locus with paired-end reads. Using DNA collected from soils and analyses of strictly and nearly identical amplicons, we ask here how data from the new Illumina MiSeq compare with those obtained on the Roche/454 GS FLX in terms of quantity and quality, presence and absence, and abundance. We show that the qualitative transition from the Roche/454 to the Illumina MiSeq platform is straightforward. Quantitatively, the transition is more nuanced for low-abundance amplicons, although estimates of abundance are known to vary within platforms as well.

3.
Iris validation is a Python package created to represent comprehensive per-residue validation metrics for entire protein chains in a compact, readable and interactive view. These metrics can either be calculated by Iris, or by a third-party program such as MolProbity. We show that those parts of a protein model requiring attention may generate ripples across the metrics on the diagram, immediately catching the modeler's attention. Iris can run as a standalone tool, or be plugged into existing structural biology software to display per-chain model quality at a glance, with a particular emphasis on evaluating incremental changes resulting from the iterative nature of model building and refinement. Finally, the integration of Iris into the CCP4i2 graphical user interface is provided as a showcase of its pluggable design.

4.
Despite recent advances in high-throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association of microsatellites with cryptic repetitive elements in many species. We therefore developed a novel approach for isolating polymorphic microsatellites from the club-legged grasshopper (Gomphocerus sibiricus), an emerging quantitative genetic and behavioral model system. Whole-genome shotgun Illumina MiSeq sequencing was used to generate over three million 300 bp paired-end reads, of which 67.75% were grouped into 40,548 clusters within RepeatExplorer. Annotations of the top 468 clusters, which represent 60.5% of the reads, revealed homology to satellite DNA and a variety of transposable elements. Evaluating 96 primer pairs in eight wild-caught individuals, we found that primers mined from singleton reads were six times more likely to amplify a single polymorphic microsatellite locus than primers mined from clusters. Our study provides experimental evidence in support of the notion that microsatellites associated with repetitive elements are less likely to amplify successfully. It also shows how advances in high-throughput sequencing and graph-based repetitive DNA analysis can be leveraged to isolate polymorphic microsatellites from complex genomes.

5.
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single-nucleotide level. At this resolution, small sequencing error rates become significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture the uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. The model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality-assessment tools while significantly improving base-calling performance.
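The abstract does not reproduce the model's equations. As background, the standard Phred-scale relation between a per-base miscall probability and a reported quality score is sketched below; this is the generic convention for summarizing base-calling uncertainty, not the authors' model.

```python
import math

def phred_quality(p_error: float) -> float:
    """Phred score Q = -10 * log10(p) for a per-base miscall probability p."""
    return -10.0 * math.log10(p_error)

def error_prob(q: float) -> float:
    """Inverse transform: miscall probability implied by a Phred score."""
    return 10.0 ** (-q / 10.0)

def expected_errors(qualities: list[float]) -> float:
    """Expected number of miscalled bases in a read, summing per-base probabilities."""
    return sum(error_prob(q) for q in qualities)

print(phred_quality(0.001))               # 30.0, i.e. 'Q30'
print(expected_errors([30, 30, 20, 10]))  # ~0.112 expected errors in a 4-base toy read
```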

6.
7.
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavourable for DNA preservation, success in sequence recovery has been uncertain. This study addresses this challenge by employing next-generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century-old type specimens of Lepidoptera by attempting to recover 164-bp and 94-bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories: high (164-bp sequence), medium (94-bp sequence) or low (no sequence). Ten specimens from each category were subsequently analysed via a PCR-based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens, with average read lengths ranging from 458 bp to 610 bp across the three DNA categories. Because ten specimens were sequenced in each NGS run, costs were similar to those of Sanger analysis. Future increases in the number of specimens processed per run promise substantial reductions in cost, making it possible to anticipate a future in which barcode sequences are available for most type specimens.

8.
Multi-factorial experimentation is essential in understanding the link between mammalian cell culture conditions and the glycoprotein product of any biomanufacturing process. This understanding is increasingly demanded as bioprocess development is influenced by the Quality by Design paradigm. We have developed a system that allows hundreds of micro-bioreactors to be run in parallel under controlled conditions, enabling factorial experiments of much larger scope than is possible with traditional systems. A high-throughput analytics workflow was also developed using commercially available instruments to obtain product quality information for each cell culture condition. The micro-bioreactor system was tested by executing a factorial experiment varying four process parameters: pH, dissolved oxygen, feed supplement rate, and reduced glutathione level. A total of 180 micro-bioreactors were run for 2 weeks during this DOE experiment to assess the scaled-down micro-bioreactor system as a high-throughput tool for process development. Online measurements of pH, dissolved oxygen, and optical density were complemented by offline measurements of glucose, viability, titer, and product quality. Model accuracy was assessed by regressing the micro-bioreactor results against those obtained in conventional 3 L bioreactors. Excellent agreement was observed between the micro-bioreactor and the bench-top bioreactor. The micro-bioreactor results were further analyzed to link parameter manipulations to process outcomes via leverage plots, and to examine the interactions between process parameters. The results show that feed supplement rate has a significant effect (P < 0.05) on all performance metrics, with higher feed rates resulting in greater cell mass and product titer. Culture pH impacted terminal integrated viable cell concentration, titer and intact immunoglobulin G titer, with better results obtained at the lower pH set point. The results demonstrate that a micro-scale system can be an excellent model of larger scale systems, while providing data sets broader and deeper than are available by traditional methods. Biotechnol. Bioeng. 2009; 104: 1107–1120. © 2009 Wiley Periodicals, Inc.
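As a rough illustration of how such a factorial condition grid is enumerated, here is a minimal Python sketch. The four factor names come from the abstract; the level values and the two-level design are hypothetical.

```python
import itertools

# Factor names from the abstract; the two levels per factor are hypothetical.
levels = {
    "pH": (6.9, 7.2),
    "dissolved_oxygen_pct": (20, 40),
    "feed_rate": ("low", "high"),
    "glutathione_mM": (0.0, 2.0),
}

# Cartesian product of all levels -> 2**4 = 16 distinct culture conditions.
conditions = [dict(zip(levels, combo)) for combo in itertools.product(*levels.values())]
print(len(conditions))   # 16
print(conditions[0])     # {'pH': 6.9, 'dissolved_oxygen_pct': 20, ...}

# Replicating each condition across vessels approaches the scale reported:
# 180 micro-bioreactors would allow ~11 replicates of a 16-condition grid.
```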

9.
Quantifying landscape characteristics and linking them to ecological processes is one of the central goals of landscape ecology. Landscape metrics are a widely used tool for the analysis of patch-based, discrete land-cover classes. Existing software to calculate landscape metrics has several constraints, such as being limited to a single platform, not being open-source or involving a complicated integration into large workflows. We present landscapemetrics, an open-source R package that overcomes many constraints of existing landscape metric software. The package includes an extensive collection of commonly used landscape metrics in a tidy workflow. To facilitate the integration into large workflows, landscapemetrics is based on a well-established spatial framework in R. This allows pre-processing of land-cover maps or further statistical analysis without importing and exporting the data from and to different software environments. Additionally, the package provides many utility functions to visualize, extract, and sample landscape metrics. Lastly, we provide building blocks to motivate the development and integration of new metrics in the future. We demonstrate the usage and advantages of landscapemetrics by analysing the influence of different sampling schemes on the estimation of landscape metrics, highlighting in particular the package's easy integration into large workflows. These new developments should help with the integration of landscape analysis in ecological research, given that ecologists are increasingly using R for the statistical analysis, modelling and visualization of spatial data.
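landscapemetrics itself is an R package; purely as a language-neutral illustration of the kind of patch-based metric it computes, here is a hypothetical Python sketch of the number-of-patches metric using 8-neighbour connected-component labelling.

```python
import numpy as np
from scipy import ndimage

# Toy land-cover raster with two classes (0 = matrix, 1 = forest).
landscape = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
])

# 8-neighbour rule (queen's case) for defining patches.
eight = np.ones((3, 3), dtype=int)
labeled, n_patches = ndimage.label(landscape == 1, structure=eight)
print(n_patches)                         # number-of-patches metric for class 1 -> 2
print(np.bincount(labeled.ravel())[1:])  # patch areas in cells -> [3 5]
```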

10.
Minimally invasive sampling (MIS) is widespread in wildlife studies; however, its utility for massively parallel DNA sequencing (MPS) is limited. Poor sample quality and contamination by exogenous DNA can make MIS challenging to use with modern genotyping-by-sequencing approaches, which have traditionally been developed for high-quality DNA sources. Given that MIS is often more appropriate in many contexts, there is a need to make such samples practical for harnessing MPS. Here, we test the ability of Genotyping-in-Thousands by sequencing (GT-seq), a multiplex amplicon sequencing approach, to effectively genotype minimally invasive cloacal DNA samples collected from the Western Rattlesnake (Crotalus oreganus), a threatened species in British Columbia, Canada. As there was no previous genetic information for this species, an optimized panel of 362 SNPs was selected for use with GT-seq from a de novo restriction site-associated DNA sequencing (RADseq) assembly. Comparisons of genotypes generated within and between RADseq and GT-seq for the same individuals found low rates of genotyping error (GT-seq: 0.50%; RADseq: 0.80%) and discordance (2.57%), the latter likely due to the different genotype-calling models employed. GT-seq mean genotype discordance between blood and cloacal swab samples collected from the same individuals was also minimal (1.37%). Estimates of population diversity parameters were similar across the GT-seq and RADseq data sets, as were inferred patterns of population structure. Overall, GT-seq can be effectively applied to low-quality DNA samples, minimizing the inefficiencies presented by the exogenous DNA typically found in minimally invasive samples and continuing the expansion of molecular ecology and conservation genetics in the genomics era.
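A minimal sketch of the genotype-discordance calculation reported above: compare calls for the same individuals from two platforms while skipping missing data. The 0/1/2 encoding and the toy matrices are assumptions for illustration.

```python
import numpy as np

# Genotypes coded 0/1/2 (alt-allele counts); -1 marks a missing call. Toy data:
# rows = individuals, columns = SNPs.
gt_seq = np.array([[0, 1, 2, -1], [1, 1, 0, 2]])
radseq = np.array([[0, 2, 2, 1], [1, -1, 0, 2]])

called = (gt_seq >= 0) & (radseq >= 0)   # sites called on both platforms
discordant = (gt_seq != radseq) & called
rate = discordant.sum() / called.sum()
print(f"discordance: {rate:.2%}")        # 1 of 6 comparable calls here -> 16.67%
```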

11.
12.
The main objective of this work was to develop and validate a robust and reliable "from-benchtop-to-desktop" metabarcoding workflow to investigate the diet of invertebrate-eaters. We applied our workflow to faecal DNA samples of an invertebrate-eating fish species. A fragment of the cytochrome c oxidase I (COI) gene was amplified by combining two minibarcoding primer sets to maximize the taxonomic coverage. Amplicons were sequenced on an Illumina MiSeq platform. We developed a filtering approach based on a series of nonarbitrary thresholds established from control samples and from molecular replicates to eliminate cross-contamination, PCR/sequencing errors and mistagging artefacts. This resulted in a conservative and informative metabarcoding data set. We developed a taxonomic assignment procedure that combines different approaches and that allowed the identification of ~75% of invertebrate COI variants to the species level. Moreover, based on the diversity of the variants, we introduced a semiquantitative statistic into our diet study, the minimum number of individuals, defined as the number of distinct variants in each sample. The metabarcoding approach described in this article may guide future diet studies that aim to produce robust data sets associated with fine, accurate identification of prey items.
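The "minimum number of individuals" statistic as defined above reduces to counting distinct variants per sample. A minimal sketch with hypothetical sample and variant names:

```python
from collections import defaultdict

# (sample, variant) observations surviving the filtering steps; toy data.
observations = [
    ("fish_01", "COI_var_A"), ("fish_01", "COI_var_A"), ("fish_01", "COI_var_B"),
    ("fish_02", "COI_var_A"), ("fish_02", "COI_var_C"), ("fish_02", "COI_var_D"),
]

variants_per_sample = defaultdict(set)
for sample, variant in observations:
    variants_per_sample[sample].add(variant)

# Minimum number of individuals = number of distinct variants in each sample.
mni = {sample: len(variants) for sample, variants in variants_per_sample.items()}
print(mni)  # {'fish_01': 2, 'fish_02': 3}
```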

13.
Estimating the evolutionary potential of quantitative traits and reliably predicting responses to selection in wild populations are important challenges in evolutionary biology. The genomic revolution has opened up opportunities for measuring relatedness among individuals with precision, enabling pedigree-free estimation of trait heritabilities in wild populations. However, until now, most quantitative genetic studies based on a genomic relatedness matrix (GRM) have focused on long-term monitored populations for which traditional pedigrees were also available, and have often had access to knowledge of genome sequence and variability. Here, we investigated the potential of RAD-sequencing for estimating heritability in a free-ranging roe deer (Capreolus capreolus) population for which no prior genomic resources were available. We propose a step-by-step analytical framework to optimize the quality and quantity of the genomic data and explore the impact of the single nucleotide polymorphism (SNP) calling and filtering processes on the GRM structure and GRM-based heritability estimates. As expected, our results show that sequence coverage strongly affects the number of recovered loci, the genotyping error rate and the amount of missing data. Ultimately, this had little effect on heritability estimates and their standard errors, provided that the GRM was built from a minimum number of loci (above 7,000). GRM-based heritability estimates thus appear robust to a moderate level of genotyping error in the SNP data set. We also showed that quality filters, such as the removal of low-frequency variants, affect the relatedness structure of the GRM, generating lower h² estimates. Our work illustrates the huge potential of RAD-sequencing for estimating GRM-based heritability in virtually any natural population.
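The abstract does not state which GRM estimator was used. As one common possibility, here is a minimal sketch of the widely used VanRaden method-1 estimator applied to simulated 0/1/2 genotypes; the simulation parameters are toy choices.

```python
import numpy as np

def vanraden_grm(geno: np.ndarray) -> np.ndarray:
    """VanRaden (2008) method-1 GRM from an individuals x SNPs matrix coded 0/1/2."""
    p = geno.mean(axis=0) / 2.0          # per-locus allele frequencies
    z = geno - 2.0 * p                   # centre by the expected allele count
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

rng = np.random.default_rng(42)
freqs = rng.uniform(0.05, 0.5, size=8_000)           # >7,000 loci, as advised above
geno = rng.binomial(2, freqs, size=(50, freqs.size))  # 50 unrelated individuals
grm = vanraden_grm(geno)
print(grm.shape, grm.diagonal().mean())  # (50, 50); diagonal ~1 for unrelateds
```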

14.
DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach for generating large-scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next-generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for relatively high target-amplicon yields, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity in the generated DNA barcodes. Here, we demonstrate the potential application of next-generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag each specimen during PCR amplification using unique 10-mer oligonucleotides attached to the DNA barcoding PCR primers. We employed 454 pyrosequencing to recover full-length DNA barcodes of 190 specimens using 12.5% of the capacity of a 454 sequencing run (i.e. two lanes of a 16-lane run). We obtained an average of 143 sequence reads per specimen. The sequences produced are full-length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next-generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content.
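A minimal sketch of the tag-based demultiplexing step described above, binning reads by their leading 10-mer before per-specimen processing; the tag sequences and specimen identifiers are hypothetical.

```python
# Hypothetical specimen tags; a real run would use the full tag-to-specimen table.
TAG_LEN = 10
tags = {
    "ACGTACGTAC": "specimen_001",
    "TGCATGCATG": "specimen_002",
}

def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Bin reads by their leading 10-mer tag; reads with unknown tags are discarded."""
    bins: dict[str, list[str]] = {name: [] for name in tags.values()}
    for read in reads:
        specimen = tags.get(read[:TAG_LEN])
        if specimen is not None:
            bins[specimen].append(read[TAG_LEN:])  # strip the tag before consensus building
    return bins

reads = ["ACGTACGTACAAGGCTTAG", "TGCATGCATGTTACGGATC", "NNNNNNNNNNGGTACCATG"]
print({k: len(v) for k, v in demultiplex(reads).items()})
# {'specimen_001': 1, 'specimen_002': 1}; the unrecognized read is dropped
```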

15.
The feasibility of sequencing entire genomes of virtually any organism provides unprecedented insights into the evolutionary history of populations and species. Nevertheless, many population genomic inferences, including the quantification and dating of admixture, introgression and demographic events, and the inference of selective sweeps, are still limited by the lack of high-quality haplotype information. The newest generation of sequencing technology now promises significant progress. To establish the feasibility of haplotype-resolved genome resequencing at population scale, we investigated the properties of linked-read sequencing data from songbirds of the genus Oenanthe across a range of sequencing depths. Our results, based on the comparison of downsampled data (25×, 20×, 15×, 10×, 7×, and 5×) with high-coverage data (46–68×) for seven bird genomes mapped to a reference, suggest that phasing contiguities and accuracies adequate for most population genomic analyses can be reached with moderate sequencing effort. At 15× coverage, phased haplotypes span about 90% of the genome assembly, with 50% and 90% of phased sequences located in phase blocks longer than 1.25–4.6 Mb (N50) and 0.27–0.72 Mb (N90), respectively. Phasing accuracy exceeds 99% from 15× coverage onwards. Higher coverages yielded higher contiguities (up to about 7 Mb/1 Mb [N50/N90] at 25× coverage) but only marginally improved phasing accuracy. Phase block contiguity improved with input DNA molecule length; thus, higher-quality DNA may help keep sequencing costs at bay. In conclusion, even for organisms with gigabase-sized genomes like birds, linked-read sequencing at moderate depth opens an affordable avenue towards haplotype-resolved genome resequencing at population scale.
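The N50 and N90 statistics quoted above have a simple definition: the block length L such that blocks of length at least L contain 50% (or 90%) of the phased sequence. A minimal sketch with toy block lengths:

```python
def nx(lengths: list[int], fraction: float) -> int:
    """Smallest block length L such that blocks >= L cover `fraction` of the total."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= fraction * total:
            return length
    return 0

# Toy phase-block lengths in bp (hypothetical, loosely echoing the ranges above).
phase_blocks = [4_600_000, 1_250_000, 900_000, 720_000, 500_000, 270_000]
print(nx(phase_blocks, 0.50))  # N50 -> 4600000
print(nx(phase_blocks, 0.90))  # N90 -> 720000
```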

16.
The ability to generate genomic data from wild animal populations has the potential to give unprecedented insight into the population history and dynamics of species in their natural habitats. However, for many species, it is impossible legally, ethically or logistically to obtain tissue samples of quality sufficient for genomic analyses. In this study we evaluate the success of multiple sources of genetic material (faeces, urine, dentin and dental calculus) and several sequencing approaches (shotgun sequencing, whole-genome capture and exome capture) in generating genome-scale data from wild eastern chimpanzees (Pan troglodytes schweinfurthii) from Gombe National Park, Tanzania. We found that urine harbours significantly more host DNA than the other sources, leading to broader and deeper coverage across the genome. Urine also exhibited a lower rate of allelic dropout. We found exome sequencing to be far more successful than both shotgun sequencing and whole-genome capture at generating usable data from low-quality samples such as faeces and dental calculus. These results highlight urine as a promising and untapped source of DNA that can be noninvasively collected from wild populations of many species.

17.
As a preliminary investigation for the development of microbial-enhanced oil recovery strategies for high-temperature oil reservoirs (~70 to 90°C), we investigated the indigenous microbial community compositions of produced waters from five different high-temperature oil reservoirs near Segno, Texas, U.S. (~80 to 85°C) and Crossfield, Alberta, Canada (~75°C). The DNA extracted from these low-biomass produced-water samples was analysed with MiSeq amplicon sequencing of partial 16S rRNA genes. These sequences were analysed along with additional sequence data sets available from existing databases. Despite the geographical distance and differences in physicochemical properties, the microbial compositions of the Segno and Crossfield produced waters exhibited unexpectedly high similarity, as indicated by the results of beta diversity analyses. The major operational taxonomic units included acetoclastic and hydrogenotrophic methanogens (Methanosaetaceae, Methanobacterium and Methanoculleus), as well as bacteria belonging to the families Clostridiaceae and Thermotogaceae, which have been recognized to include thermophilic, thermotolerant, and/or spore-forming subtaxa. The sequence data retrieved from the databases exhibited different clustering patterns: the communities from close geographical locations invariably had low beta diversity, and the physicochemical properties and conditions of the reservoirs apparently did not have a substantial role in shaping the microbial communities.
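The abstract does not name the beta-diversity index used. As one common choice, here is a minimal sketch of the Bray-Curtis dissimilarity on hypothetical OTU count vectors; small values indicate similar communities, consistent with the pattern reported above.

```python
import numpy as np

def bray_curtis(u: np.ndarray, v: np.ndarray) -> float:
    """Bray-Curtis dissimilarity between two OTU abundance vectors (0 = identical)."""
    return float(np.abs(u - v).sum() / (u + v).sum())

# Hypothetical OTU count vectors for two produced-water samples.
segno      = np.array([120, 40, 5, 0, 30])
crossfield = np.array([100, 55, 0, 2, 25])
print(f"{bray_curtis(segno, crossfield):.3f}")  # ~0.125 -> similar communities
```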

18.
With their direct link to individual fitness, genes of the major histocompatibility complex (MHC) are a popular system for studying the evolution of adaptive genetic diversity. However, owing to the highly dynamic evolution of the MHC region, the isolation, characterization and genotyping of MHC genes remain a major challenge. While high-throughput sequencing technologies now provide unprecedented resolution of the high allelic diversity observed at the MHC, in many species it remains unclear (i) how alleles are distributed among MHC loci, (ii) whether MHC loci are linked or segregate independently and (iii) how much copy number variation (CNV) can be observed for MHC genes in natural populations. Here, we show that the study of allele segregation patterns within families can provide significant insights in this context. We sequenced two MHC class I (MHC-I) loci in 1267 European barn owls (Tyto alba), including 590 offspring from 130 families, using Illumina MiSeq technology. Coupled with high per-individual sequencing coverage (~3000×), the study of allele segregation patterns within families provided information on three aspects of the architecture of MHC-I variation in barn owls: (i) extensive sharing of alleles among loci, (ii) strong linkage of MHC-I loci indicating a tandem architecture and (iii) the presence of CNV in the barn owl MHC-I. We conclude that the additional information gained from high-coverage amplicon sequencing by investigating allele segregation patterns in families not only helps improve the accuracy of MHC genotyping, but also enables enhanced analyses in the context of MHC evolutionary ecology.

19.
With novel developments in sequencing technologies, time-sampled data are becoming more available and accessible. Naturally, there have been parallel efforts to infer population genetic parameters from these data sets. Here, we compare and analyse four recent approaches based on the Wright–Fisher model for inferring selection coefficients (s) given the effective population size (Ne), using simulated temporal data sets. Furthermore, we demonstrate the advantage of a recently proposed approximate Bayesian computation (ABC)-based method that is able to correctly infer the genomewide average Ne from time-serial data, which is then set as a prior for inferring per-site selection coefficients accurately and precisely. We implement this ABC method in new software and apply it to a classical time-serial data set of the medionigra genotype in the moth Panaxia dominula. We show that a recessive lethal model is the best explanation for the observed variation in allele frequency, by implementing an estimator of the dominance ratio (h).
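As a generic illustration of the shared machinery behind such methods (not the authors' algorithm), the sketch below simulates Wright–Fisher allele-frequency trajectories under selection and runs a crude rejection-ABC loop for s with Ne treated as known; all parameter values are toy choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def wright_fisher(p0: float, s: float, ne: int, generations: int) -> np.ndarray:
    """Allele-frequency trajectory under genic selection and binomial drift."""
    p, traj = p0, [p0]
    for _ in range(generations):
        w = p * (1 + s) / (p * (1 + s) + (1 - p))  # deterministic selection step
        p = rng.binomial(2 * ne, w) / (2 * ne)     # drift among 2*Ne gene copies
        traj.append(p)
    return np.array(traj)

# Rejection ABC: draw s from a prior and keep draws whose simulated trajectory
# stays close to the observed one (Ne treated as known, as in the text above).
observed = wright_fisher(0.2, 0.1, ne=500, generations=40)
accepted = [
    s for s in rng.uniform(-0.2, 0.3, size=20_000)
    if np.abs(wright_fisher(0.2, s, ne=500, generations=40) - observed).mean() < 0.05
]
print(len(accepted), round(float(np.mean(accepted)), 3) if accepted else "none accepted")
```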

20.
Flexibility and low cost make genotyping-by-sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI-MseI digestions with different multiplexing levels, and examined the effect of restriction enzymes on library complexity as well as the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared the UNEAK, Stacks and GATK pipelines on the GBS data, and then developed a reference-free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7,000–11,000 and 14,751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms lacking genomic information.
