期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

BIPES,a cost-effective high-throughput method for assessing microbial diversity

Hong-Wei Zhou Dong-Fang Li Nora Fung-Yee Tam Xiao-Tao Jiang Hai Zhang Hua-Fang Sheng Jin Qin Xiao Liu Fei Zou 《The ISME journal》2011,5(4):741-749

Pyrosequencing of 16S rRNA (16S) variable tags has become the most popular method for assessing microbial diversity, but the method remains costly for the evaluation of large numbers of environmental samples with high sequencing depths. We developed a barcoded Illumina paired-end (PE) sequencing (BIPES) method that sequences each 16S V6 tag from both ends on the Illumina HiSeq 2000, and the PE reads are then overlapped to obtain the V6 tag. The average accuracy of Illumina single-end (SE) reads was only 97.9%, which decreased from ∼99.9% at the start of the read to less than 85% at the end of the read; nevertheless, overlapping of the PE reads significantly increased the sequencing accuracy to 99.65% by verifying the 3′ end of each SE in which the sequencing quality was degraded. After the removal of tags with two or more mismatches within the medial 40–70 bases of the reads and of tags with any primer errors, the overall base sequencing accuracy of the BIPES reads was further increased to 99.93%. The BIPES reads reflected the amounts of the various tags in the initial template, but long tags and high GC tags were underestimated. The BIPES method yields 20–50 times more 16S V6 tags than does pyrosequencing in a single-flow cell run, and each of the BIPES reads costs less than 1/40 of a pyrosequencing read. As a laborsaving and cost-effective method, BIPES can be routinely used to analyze the microbial ecology of both environmental and human microbiomes. 相似文献

2.

Haplotype Estimation Using Sequencing Reads

Olivier Delaneau Bryan Howie Anthony?J. Cox Jean-Fran?ois Zagury Jonathan Marchini 《American journal of human genetics》2013,93(4):687-696

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5–20 kb read with 4%–15% error per base), phasing performance was substantially improved. 相似文献

3.

Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

Shunichi Kosugi Satoshi Natsume Kentaro Yoshida Daniel MacLean Liliana Cano Sophien Kamoun Ryohei Terauchi 《PloS one》2013,8(10)

Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. 相似文献

4.

Correction of sequence-dependent ambiguous bases (Ns) from the 454 pyrosequencing system

Sunguk Shin Joonhong Park 《Nucleic acids research》2014,42(7):e51

Pyrosequencing of the 16S ribosomal RNA gene (16S) has become one of the most popular methods to assess microbial diversity. Pyrosequencing reads containing ambiguous bases (Ns) are generally discarded based on the assumptions of their non-sequence-dependent formation and high error rates. However, taxonomic composition differed by removal of reads with Ns. We determined whether Ns from pyrosequencing occur in a sequence-dependent manner. Our reads and the corresponding flow value data revealed occurrence of sequence-specific N errors with a common sequential pattern (a homopolymer + a few nucleotides with bases other than the homopolymer + N) and revealed that the nucleotide base of the homopolymer is the true base for the following N. Using an algorithm reflecting this sequence-dependent pattern, we corrected the Ns in the 16S (86.54%), bphD (81.37%) and nifH (81.55%) amplicon reads from a mock community with high precisions of 95.4, 96.9 and 100%, respectively. The new N correction method was applicable for determining most of Ns in amplicon reads from a soil sample, resulting in reducing taxonomic biases associated with N errors and in shotgun sequencing reads from public metagenome data. The method improves the accuracy and precision of microbial community analysis and genome sequencing using 454 pyrosequencing. 相似文献

5.

Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach

Xuan Zhang Pengyao Ping Gyorgy Hutvagner Michael Blumenstein Jinyan Li 《Nucleic acids research》2021,49(18):e106

Raw sequencing reads of miRNAs contain machine-made substitution errors, or even insertions and deletions (indels). Although the error rate can be low at 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution such as novel isomiR discovery, editing events understanding, differential expression analysis, or tissue-specific isoform identification is very sensitive to base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data attributed to miRNAs’ length and per-read-coverage properties distinct from DNA or mRNA sequencing reads. We present a novel lattice structure combining kmers, (k – 1)mers and (k + 1)mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets having known ground truth of errors demonstrate that the method is able to remove almost all of the errors, without introducing any new error, to improve the data quality from every-50-reads containing one error to every-1300-reads containing one error. Studies on experimental miRNA sequencing datasets show that the errors are often rectified at the 5′ ends and the seed regions of the reads, and that there are remarkable changes after the correction in miRNA isoform abundance, volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities. 相似文献

6.

Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries

Meredith?L. Carpenter Jason?D. Buenrostro Cristina Valdiosera Hannes Schroeder Morten?E. Allentoft Martin Sikora Morten Rasmussen Simon Gravel Sonia Guillén Georgi Nekhrizov Krasimir Leshtakov Diana Dimitrova Nikola Theodossiev Davide Pettener Donata Luiselli Karla Sandoval Andrés Moreno-Estrada Yingrui Li Jun Wang M.?Thomas?P. Gilbert Eske Willerslev William?J. Greenleaf Carlos?D. Bustamante 《American journal of human genetics》2013,93(5):852-864

Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples. 相似文献

7.

QuorUM: An Error Corrector for Illumina Reads

Guillaume Mar?ais James A. Yorke Aleksey Zimin 《PloS one》2015,10(6)

Motivation

Illumina Sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with low (advertised 1%) error rate, 100 × coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed an error correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads and preserving the most true k-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.

Results

We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is efficiently implemented making use of current multi-core computing architectures and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error corrected reads result in a factor of 1.1 to 4 improvement in N50 contig size compared to using the original reads with SOAPdenovo for the data sets investigated.

Availability

QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu.

Contact

ude.dmu@siacramg. 相似文献

8.

BALSA: Bayesian algorithm for local sequence alignment 总被引：3，自引：1，他引：2

下载免费PDF全文

Webb BJ Liu JS Lawrence CE 《Nucleic acids research》2002,30(5):1268-1277

The Smith–Waterman algorithm yields a single alignment, which, albeit optimal, can be strongly affected by the choice of the scoring matrix and the gap penalties. Additionally, the scores obtained are dependent upon the lengths of the aligned sequences, requiring a post-analysis conversion. To overcome some of these shortcomings, we developed a Bayesian algorithm for local sequence alignment (BALSA), that takes into account the uncertainty associated with all unknown variables by incorporating in its forward sums a series of scoring matrices, gap parameters and all possible alignments. The algorithm can return both the joint and the marginal optimal alignments, samples of alignments drawn from the posterior distribution and the posterior probabilities of gap penalties and scoring matrices. Furthermore, it automatically adjusts for variations in sequence lengths. BALSA was compared with SSEARCH, to date the best performing dynamic programming algorithm in the detection of structural neighbors. Using the SCOP databases PDB40D-B and PDB90D-B, BALSA detected 19.8 and 41.3% of remote homologs whereas SSEARCH detected 18.4 and 38% at an error rate of 1% errors per query over the databases, respectively. 相似文献

9.

Multi-Sample Pooling and Illumina Genome Analyzer Sequencing Methods to Determine Gene Sequence Variation for Database Development

Rebecca L. Margraf Jacob D. Durtschi Shale Dames David C. Pattison Jack E. Stephens Rong Mao Karl V. Voelkerding 《Journal of biomolecular techniques》2010,21(3):126-140

Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina genome analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13–16. Ten samples with 13 known unique variants (“singleton variants” within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36 base reads. For comparison, a single “control” sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3` read end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10 pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17%±0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30, 40, and 50 sample pool data, expected singleton allele frequencies within pools, and variant detection methods; ≥30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing. 相似文献

10.

Mapping Accuracy of Short Reads from Massively Parallel Sequencing and the Implications for Quantitative Expression Profiling

Nicola Palmieri Christian Schl?tterer 《PloS one》2009,4(7)

Background

Massively parallel sequencing offers an enormous potential for expression profiling, in particular for interspecific comparisons. Currently, different platforms for massively parallel sequencing are available, which differ in read length and sequencing costs. The 454-technology offers the highest read length. The other sequencing technologies are more cost effective, on the expense of shorter reads. Reliable expression profiling by massively parallel sequencing depends crucially on the accuracy to which the reads could be mapped to the corresponding genes.

Methodology/Principal Findings

We performed an in silico analysis to evaluate whether incorrect mapping of the sequence reads results in a biased expression pattern. A comparison of six available mapping software tools indicated a considerable heterogeneity in mapping speed and accuracy. Independently of the software used to map the reads, we found that for compact genomes both short (35 bp, 50 bp) and long sequence reads (100 bp) result in an almost unbiased expression pattern. In contrast, for species with a larger genome containing more gene families and repetitive DNA, shorter reads (35–50 bp) produced a considerable bias in gene expression. In humans, about 10% of the genes had fewer than 50% of the sequence reads correctly mapped. Sequence polymorphism up to 9% had almost no effect on the mapping accuracy of 100 bp reads. For 35 bp reads up to 3% sequence divergence did not affect the mapping accuracy strongly. The effect of indels on the mapping efficiency strongly depends on the mapping software.

Conclusions/Significance

In complex genomes, expression profiling by massively parallel sequencing could introduce a considerable bias due to incorrectly mapped sequence reads if the read length is short. Nevertheless, this bias could be accounted for if the genomic sequence is known. Furthermore, sequence polymorphisms and indels also affect the mapping accuracy and may cause a biased gene expression measurement. The choice of the mapping software is highly critical and the reliability depends on the presence/absence of indels and the divergence between reads and the reference genome. Overall, we found SSAHA2 and CLC to produce the most reliable mapping results. 相似文献

11.

Fast Mapping of Short Sequences with Mismatches,Insertions and Deletions Using Index Structures

Steve Hoffmann Christian Otto Stefan Kurtz Cynthia M. Sharma Philipp Khaitovich J?rg Vogel Peter F. Stadler J?rg Hackermüller 《PLoS computational biology》2009,5(9)

相似文献

12.

A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE

Keegan KP Trimble WL Wilkening J Wilke A Harrison T D'Souza M Meyer F 《PLoS computational biology》2012,8(6):e1002541

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms. 相似文献

13.

Sequence deeper without sequencing more: Bayesian resolution of ambiguously mapped reads

Rohan N. Shah Alexander J. Ruthenburg 《PLoS computational biology》2021,17(4)

Next-generation sequencing (NGS) has transformed molecular biology and contributed to many seminal insights into genomic regulation and function. Apart from whole-genome sequencing, an NGS workflow involves alignment of the sequencing reads to the genome of study, after which the resulting alignments can be used for downstream analyses. However, alignment is complicated by the repetitive sequences; many reads align to more than one genomic locus, with 15–30% of the genome not being uniquely mappable by short-read NGS. This problem is typically addressed by discarding reads that do not uniquely map to the genome, but this practice can lead to systematic distortion of the data. Previous studies that developed methods for handling ambiguously mapped reads were often of limited applicability or were computationally intensive, hindering their broader usage. In this work, we present SmartMap: an algorithm that augments industry-standard aligners to enable usage of ambiguously mapped reads by assigning weights to each alignment with Bayesian analysis of the read distribution and alignment quality. SmartMap is computationally efficient, utilizing far fewer weighting iterations than previously thought necessary to process alignments and, as such, analyzing more than a billion alignments of NGS reads in approximately one hour on a desktop PC. By applying SmartMap to peak-type NGS data, including MNase-seq, ChIP-seq, and ATAC-seq in three organisms, we can increase read depth by up to 53% and increase the mapped proportion of the genome by up to 18% compared to analyses utilizing only uniquely mapped reads. We further show that SmartMap enables the analysis of more than 140,000 repetitive elements that could not be analyzed by traditional ChIP-seq workflows, and we utilize this method to gain insight into the epigenetic regulation of different classes of repetitive elements. These data emphasize both the dangers of discarding ambiguously mapped reads and their power for driving biological discovery. 相似文献

14.

A fault-tolerant method for HLA typing with PacBio data

Chia-Jung Chang Pei-Lung Chen Wei-Shiung Yang Kun-Mao Chao 《BMC bioinformatics》2014,15(1)

Background

Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. The next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue.

Results

We proposed a new method BayesTyping1 to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads.

Conclusions

The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes, to some extent.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-296) contains supplementary material, which is available to authorized users. 相似文献

15.

Pollux: platform independent error correction of single and mixed genomes

Eric Marinier Daniel G Brown Brendan J McConkey 《BMC bioinformatics》2014,16(1)

Background

Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and projects which benefit from error-free reads.

Results

We have developed a general-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and correct insertion, deletion, and homopolymer errors while remaining sensitive to low coverage areas of sequencing projects. Using published data sets, we correct 94% of Illumina MiSeq errors, 88% of Ion Torrent PGM errors, 85% of Roche 454 GS Junior errors. Introduced errors are 20 to 70 times more rare than successfully corrected errors. Furthermore, we show that the quality of assemblies improves when reads are corrected by our software.

Conclusions

Pollux is highly effective at correcting errors across platforms, and is consistently able to perform as well or better than currently available error correction software. Pollux provides general-purpose error correction and may be used in applications with or without assembly. 相似文献

16.

Archaea in Yellowstone Lake

Jinjun Kan Scott Clingenpeel Richard E Macur William P Inskeep Dave Lovalvo John Varley Yuri Gorby Timothy R McDermott Kenneth Nealson 《The ISME journal》2011,5(11):1784-1795

The Yellowstone geothermal complex has yielded foundational discoveries that have significantly enhanced our understanding of the Archaea. This study continues on this theme, examining Yellowstone Lake and its lake floor hydrothermal vents. Significant Archaea novelty and diversity were found associated with two near-surface photic zone environments and two vents that varied in their depth, temperature and geochemical profile. Phylogenetic diversity was assessed using 454-FLX sequencing (∼51 000 pyrosequencing reads; V1 and V2 regions) and Sanger sequencing of 200 near-full-length polymerase chain reaction (PCR) clones. Automated classifiers (Ribosomal Database Project (RDP) and Greengenes) were problematic for the 454-FLX reads (wrong domain or phylum), although BLAST analysis of the 454-FLX reads against the phylogenetically placed full-length Sanger sequenced PCR clones proved reliable. Most of the archaeal diversity was associated with vents, and as expected there were differences between the vents and the near-surface photic zone samples. Thaumarchaeota dominated all samples: vent-associated organisms corresponded to the largely uncharacterized Marine Group I, and in surface waters, ∼69–84% of the 454-FLX reads matched archaeal clones representing organisms that are Nitrosopumilus maritimus-like (96–97% identity). Importance of the lake nitrogen cycling was also suggested by >5% of the alkaline vent phylotypes being closely related to the nitrifier Candidatus Nitrosocaldus yellowstonii. The Euryarchaeota were primarily related to the uncharacterized environmental clones that make up the Deep Sea Euryarchaeal Group or Deep Sea Hydrothermal Vent Group-6. The phylogenetic parallels of Yellowstone Lake archaea to marine microorganisms provide opportunities to examine interesting evolutionary tracks between freshwater and marine lineages. 相似文献

17.

A metagenome of a full-scale microbial community carrying out enhanced biological phosphorus removal

Mads Albertsen Lea Benedicte Skov Hansen Aaron Marc Saunders Per Halkj?r Nielsen K?re Lehmann Nielsen 《The ISME journal》2012,6(6):1094-1106

Enhanced biological phosphorus removal (EBPR) is widely used for removal of phosphorus from wastewater. In this study, a metagenome (18.2 Gb) was generated using Illumina sequencing from a full-scale EBPR plant to study the community structure and genetic potential. Quantitative fluorescence in situ hybridization (qFISH) was applied as an independent method to evaluate the community structure. The results were in qualitative agreement, but a DNA extraction bias against gram positive bacteria using standard extraction protocols was identified, which would not have been identified without the use of qFISH. The genetic potential for community function showed enrichment of genes involved in phosphate metabolism and biofilm formation, reflecting the selective pressure of the EBPR process. Most contigs in the assembled metagenome had low similarity to genes from currently sequenced genomes, underlining the need for more reference genomes of key EBPR species. Only the genome of ‘Candidatus Accumulibacter'', a genus of phosphorus-removing organisms, was closely enough related to the species present in the metagenome to allow for detailed investigations. Accumulibacter accounted for only 4.8% of all bacteria by qFISH, but the depth of sequencing enabled detailed insight into their microdiversity in the full-scale plant. Only 15% of the reads matching Accumulibacter had a high similarity (>95%) to the sequenced Accumulibacter clade IIA strain UW-1 genome, indicating the presence of some microdiversity. The differences in gene complement between the Accumulibacter clades were limited to genes for extracellular polymeric substances and phage-related genes, suggesting a selective pressure from phages on the Accumulibacter diversity. 相似文献

18.

Substantial biases in ultra-short read data sets from high-throughput DNA sequencing 总被引：7，自引：1，他引：6

Dohm JC Lottaz C Borodina T Himmelbauer H 《Nucleic acids research》2008,36(16):e105

相似文献

19.

Detection of Genomic Variation by Selection of a 9 Mb DNA Region and High Throughput Sequencing

Sergey I. Nikolaev Christian Iseli Andrew J. Sharp Daniel Robyr Jacques Rougemont Corinne Gehrig Laurent Farinelli Stylianos E. Antonarakis 《PloS one》2009,4(8)

Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage≥4-fold, and 97.9% concordant in regions with coverage≥15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants. 相似文献

20.

Ethnographic study of incidence and severity of intravenous drug errors

Katja Taxis Nick Barber 《BMJ (Clinical research ed.)》2003,326(7391):684

ObjectivesTo determine the incidence and clinical importance of errors in the preparation and administration of intravenous drugs and the stages of the process in which errors occur.DesignProspective ethnographic study using disguised observation.ParticipantsNurses who prepared and administered intravenous drugs.Setting10 wards in a teaching and non-teaching hospital in the United Kingdom.Results249 errors were identified. At least one error occurred in 212 out of 430 intravenous drug doses (49%, 95% confidence interval 45% to 54%). Three doses (1%) had potentially severe errors, 126 (29%) potentially moderate errors, and 83 (19%) potentially minor errors. Most errors occurred when giving bolus doses or making up drugs that required multiple step preparation.ConclusionsThe rate of intravenous drug errors was high. Although most errors would cause only short term adverse effects, a few could have been serious. A combination of reducing the amount of preparation on the ward, training, and technology to administer slow bolus doses would probably have the greatest effect on error rates.

What is already known on this topic

Errors in preparing and administering intravenous drugs can cause considerable harm to patientsReduction of drug errors is a government health target in the United Kingdom and the United States

What this study adds

Errors occurred in about half of the intravenous drug doses observedErrors were potentially harmful in about a third of casesThe most common errors were giving bolus doses too quickly and mistakes in preparing drugs that required multiple steps 相似文献