Similar Documents
20 similar documents found (search time: 0 ms)
1.
The recent development of third-generation sequencing (TGS) generates much longer reads than second-generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback of most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) using SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs on the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show that LSC can correct PacBio long reads, reducing the error rate by more than threefold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC achieves over double the sensitivity with similar specificity.
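The abstract does not detail the HC transformation, but homopolymer compression itself is a standard idea: collapse each run of identical bases to a single base, remembering the run lengths so the read can be restored after alignment. A minimal sketch (function names are illustrative, not LSC's actual interface):

```python
def hc_compress(seq):
    """Collapse each homopolymer run to one base, recording run lengths.

    Returns the compressed sequence and per-position run lengths, which
    allow the original read to be reconstructed after alignment.
    """
    if not seq:
        return "", []
    bases, lengths = [seq[0]], [1]
    for b in seq[1:]:
        if b == bases[-1]:
            lengths[-1] += 1
        else:
            bases.append(b)
            lengths.append(1)
    return "".join(bases), lengths

def hc_decompress(compressed, lengths):
    """Invert hc_compress by repeating each base by its run length."""
    return "".join(b * n for b, n in zip(compressed, lengths))

read = "AAAGGTTTTC"
hc, runs = hc_compress(read)   # hc == "AGTC", runs == [3, 2, 4, 1]
assert hc_decompress(hc, runs) == read
```

Because homopolymer-length errors collapse to the same compressed string, aligning in HC space makes an SR and an LR that differ only by run lengths look identical, which is what raises SR-LR alignment sensitivity.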

2.
A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short-read data in combination. The short-read data were converted to a standard format of the pipeline and supplied to pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input at each stage separately. The pipeline was evaluated on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. Using both the MiSeq and SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8 compared with assemblies using either data type alone. The number of reproduced gene cluster regions encoding secondary metabolite biosynthesis (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data, with their long read length, are essential for constructing accurate nucleotide sequences, while the SOLiD mate-paired reads, with their long insert length, enhance long-range arrangement of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have high GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assembly by themselves, the alignment length to the reference was improved by a factor of 2 compared with the assembly using only the MiSeq data.

3.

Background

Different high-throughput nucleic acid sequencing platforms are currently available, but a trade-off exists between the cost and number of reads that can be generated and the read length that can be achieved.

Methodology/Principal Findings

We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching those of traditional Sanger sequencing. The method combines an automatable, gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
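SHERA's actual joiner computes a composite quality score from the paired base calls; as a rough, simplified illustration of the overlap-joining step alone (thresholds here are arbitrary placeholder values, not SHERA's):

```python
def revcomp(seq):
    """Reverse-complement a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))

def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Join a read pair whose insert is short enough for the ends to overlap.

    r2 (sequenced from the opposite strand) is reverse-complemented, then the
    suffix of r1 is scanned against the prefix of r2 for the longest overlap
    with an acceptable mismatch rate. Returns the composite read, or None if
    no overlap passes the thresholds.
    """
    r2 = revcomp(r2)
    for olen in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        a, b = r1[-olen:], r2[:olen]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olen:
            return r1 + r2[olen:]
    return None
```

For a true 18 bp insert sequenced with 14 bp reads, the two reads overlap by 10 bp and merge into the full insert; pairs with no adequate overlap are left unmerged.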

Conclusions/Significance

This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.

4.
Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: "tip" (unique to a single genome) and "group" (shared by a subset of genomes). Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single-copy and duplicated sequence was basal among green plants, independent of photosynthesis and the mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly, and in different ways, than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including in several species without the IR regions, indicating a crucial functional role for this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
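As a toy illustration of the tip/group distinction (a sketch of the concept only, not the authors' pipeline): record which genomes contain each k-mer, then classify k-mers by how many genomes share them.

```python
from collections import defaultdict

def kmer_classes(genomes, k=8):
    """Classify k-mers by taxonomic distribution.

    'tip' k-mers occur in exactly one genome; 'group' k-mers occur in a
    proper subset of genomes (more than one, fewer than all).
    """
    occurs = defaultdict(set)          # k-mer -> set of genomes containing it
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            occurs[seq[i:i + k]].add(name)
    tip = {km for km, gs in occurs.items() if len(gs) == 1}
    group = {km for km, gs in occurs.items() if 1 < len(gs) < len(genomes)}
    return tip, group
```

In the paper's setting, reads carrying tip or group k-mers would then seed localized de novo assembly around the informative variation.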

5.

Background

Next-generation sequencing platforms have greatly reduced sequencing costs, leading to the production of unprecedented amounts of sequence data. BWA is one of the most popular alignment tools due to its relatively high accuracy. However, mapping reads using BWA is still the most time-consuming step in sequence analysis. Increasing mapping efficiency would allow the community to better cope with ever-expanding volumes of sequence data.

Results

We designed a new program, CGAP-align, that achieves a performance improvement over BWA without sacrificing recall or precision. This is accomplished through the use of Suffix Tarray, a novel data structure combining elements of Suffix Array and Suffix Tree. We also utilize a tighter lower bound estimation for the number of mismatches in a read, allowing for more effective pruning during inexact mapping. Evaluation of both simulated and real data suggests that CGAP-align consistently outperforms the current version of BWA and can achieve over twice its speed under certain conditions, all while obtaining nearly identical results.
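The Suffix Tarray structure and CGAP-align's exact bound are not described in this abstract, but the classic lower-bound idea it tightens (BWA's D array) can be sketched with plain substring search standing in for an FM-index: each time the current read segment stops occurring exactly in the reference, at least one more mismatch is unavoidable, so branches of the inexact search that cannot stay under the mismatch budget can be pruned.

```python
def mismatch_lower_bound(read, reference):
    """Lower bound on the mismatches needed to align `read` to `reference`.

    Scan the read left to right; whenever the growing segment no longer
    occurs exactly in the reference, count one required edit and restart
    the segment. This is a toy version of the D-array pruning bound.
    """
    bound, start = 0, 0
    for end in range(1, len(read) + 1):
        if read[start:end] not in reference:
            bound += 1
            start = end
    return bound
```

A read whose bound already exceeds the allowed mismatch count can be rejected before any backtracking search is attempted, which is where the speedup comes from.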

Conclusion

CGAP-align is a new time efficient read alignment tool that extends and improves BWA. The increase in alignment speed will be of critical assistance to all sequence-based research and medicine. CGAP-align is freely available to the academic community at http://sourceforge.net/p/cgap-align under the GNU General Public License (GPL).

6.
Next-generation sequencing (NGS) technologies permit the rapid production of vast amounts of data at low cost. Economical data storage and transmission hence become an increasingly important challenge for NGS experiments. In this paper, we introduce a new non-reference-based read sequence compression tool called SRComp. It works by first employing a fast string-sorting algorithm called burstsort to sort read sequences in lexicographical order, and then applying Elias omega-based integer coding to encode the sorted read sequences. SRComp has been benchmarked on four large NGS datasets, where experimental results show that it can run 5–35 times faster than current state-of-the-art read sequence compression tools such as BEETL and SCALCE, while retaining comparable compression efficiency for large collections of short read sequences. SRComp is thus particularly valuable in applications where compression time is of major concern.
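A minimal sketch of the two ingredients, with Python's built-in sort standing in for burstsort (this illustrates the encoding only; how SRComp actually exploits sortedness for compression is not reproduced here):

```python
def elias_omega(n):
    """Elias omega code of a positive integer, as a bit string."""
    if n < 1:
        raise ValueError("Elias omega is defined for n >= 1")
    code = "0"
    while n > 1:
        b = bin(n)[2:]       # binary representation of n, no '0b' prefix
        code = b + code      # prepend it to the code built so far
        n = len(b) - 1
    return code

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_reads(reads):
    """Sort reads lexicographically, then omega-encode each read's
    2-bit-per-base integer value (+1, since omega needs n >= 1)."""
    bits = []
    for r in sorted(reads):
        value = 0
        for b in r:
            value = value * 4 + BASE[b]
        bits.append(elias_omega(value + 1))
    return "".join(bits)
```

For example, elias_omega(1) is "0" and elias_omega(16) is "10100100000": small integers get short codes, with no upper bound on the representable value.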

7.
8.
9.
With the rapid and steady increase of next-generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck, we present a new tool, DistMap: a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in SAM/BAM format. DistMap supports both paired-end and single-end reads, thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/

10.
Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of the auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as bispectrum. The phase coupling of neurogram bispectrum provides a unique insight for the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants.

11.
12.
Defining the architecture of a specific cancer genome, including its structural variants, is essential for understanding tumor biology, mechanisms of oncogenesis, and for designing effective personalized therapies. Short read paired-end sequencing is currently the most sensitive method for detecting somatic mutations that arise during tumor development. However, mapping structural variants using this method leads to a large number of false positive calls, mostly due to the repetitive nature of the genome and the difficulty of assigning correct mapping positions to short reads. This study describes a method to efficiently identify large tumor-specific deletions, inversions, duplications and translocations from low coverage data using SVDetect or BreakDancer software and a set of novel filtering procedures designed to reduce false positive calls. Applying our method to a spontaneous T cell lymphoma arising in a core RAG2/p53-deficient mouse, we identified 40 validated tumor-specific structural rearrangements supported by as few as 2 independent read pairs.

13.
Next Generation Sequencing is having an extremely strong impact in biological and medical research and diagnostics, with applications ranging from gene expression quantification to genotyping and genome reconstruction. Sequencing data are often provided as raw reads, which are processed prior to analysis. One of the most used preprocessing procedures is read trimming, which aims at removing low quality portions while preserving the longest high quality part of an NGS read. In the current work, we evaluate nine different trimming algorithms on four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly). Trimming is shown to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.
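Trimming tools differ in strategy; one common scheme, shown here purely as an illustration (not one of the nine evaluated algorithms), cuts the 3' end of a read at the first sliding window whose mean Phred quality falls below a threshold:

```python
def trim_read(seq, quals, threshold=20, window=4):
    """Trim a read at the first window of mean quality below `threshold`.

    `quals` are per-base Phred scores. Returns the trimmed sequence and its
    quality scores; reads with no low-quality window are returned intact.
    """
    for i in range(len(seq) - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            return seq[:i], quals[:i]
    return seq, quals
```

A read whose tail degrades from Q30 to Q10 is clipped just before the degraded stretch, keeping the longest high-quality prefix, which is exactly the trade-off the evaluated tools tune.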

14.
15.
16.
17.
Jenny Read     

18.
19.
Three-dimensional (3D) reconstruction of an organ or tissue from a stack of histologic serial sections provides valuable morphological information. The procedure includes section preparation of the organ or tissue, micrograph acquisition, image registration, 3D reconstruction, and visualization. However, the brightness and contrast through the image stack may not be consistent due to imperfections in the staining procedure, which may cause difficulties in micro-structure identification using virtual sections, region segmentation, automatic target tracing, etc. In the present study, a reference-free method, the Sequential Histogram Fitting Algorithm (SHFA), is therefore developed for adjusting severe and irregular variance of brightness and contrast within the image stack. To apply the SHFA, the gray-value histograms of individual images are first calculated over the entire image stack and a set of landmark gray values is chosen. The histograms are then transformed so that there are no abrupt changes in progressing through the stack. Finally, the pixel gray values of the original images are transformed into the desired ones based on the relationship between the original and the transformed histograms. The SHFA was tested on an image stack from mouse kidney sections stained with toluidine blue and captured by a slide scanner. As a result, the images through the entire stack reveal homogeneous brightness and consistent contrast. In addition, subtle color differences in the tissue are well preserved so that morphological details can be recognized, even in virtual sections. In conclusion, compared with existing histogram-based methods, the present study provides a practical method suitable for compensating brightness and improving contrast in images derived from a large number of serial sections of a biological organ.
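SHFA's sequential landmark fitting is specific to the paper; the underlying histogram-transformation step resembles classic CDF-based histogram matching, sketched below as a stand-in (the number of levels, names, and matching rule are illustrative):

```python
def cdf(hist):
    """Cumulative distribution of a gray-value histogram."""
    total = sum(hist)
    acc, out = 0, []
    for h in hist:
        acc += h
        out.append(acc / total)
    return out

def match_histogram(image, ref_hist, levels=256):
    """Remap gray values so the image histogram approximates `ref_hist`.

    `image` is a flat list of gray values in [0, levels). For each source
    level, pick the reference level with the closest (not smaller) CDF.
    """
    hist = [0] * levels
    for v in image:
        hist[v] += 1
    src_cdf, ref_cdf = cdf(hist), cdf(ref_hist)
    lut, j = [], 0
    for s in src_cdf:                      # src_cdf is nondecreasing,
        while j < levels - 1 and ref_cdf[j] < s:
            j += 1                         # so j only moves forward
        lut.append(j)
    return [lut[v] for v in image]
```

Applying such a remapping slice by slice toward smoothly varying target histograms is, in spirit, how abrupt brightness and contrast jumps through a stack are removed.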

20.
Different regions of the bacterial 16S rRNA gene evolve at different evolutionary rates. The scientific outcome of short read sequencing studies therefore alters with the gene region sequenced. We wanted to gain insight into the impact of primer choice on the outcome of short read sequencing efforts. All the unknowns associated with sequencing data, i.e. primer coverage rate, phylogeny, OTU richness and taxonomic assignment, were therefore implemented in one study for ten well-established universal primers (338f/r, 518f/r, 799f/r, 926f/r and 1062f/r) targeting dispersed regions of the bacterial 16S rRNA gene. All analyses were performed on nearly full-length and in silico generated short read sequence libraries containing 1175 sequences that were carefully chosen to present a representative substitute of the SILVA SSU database. The 518f and 799r primers, targeting the V4 region of the 16S rRNA gene, were found to be particularly suited for short read sequencing studies, while the primer 1062r, targeting V6, seemed to be least reliable. Our results will assist scientists in considering whether the best option for their study is to select the most informative primer, or the primer that excludes interference by host-organelle DNA. The methodology followed can be extrapolated to other primers, allowing their evaluation prior to the experiment.
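Primer coverage of the kind measured here can be estimated by exact matching with IUPAC degenerate bases; a small sketch (not the authors' pipeline, which additionally handles primer orientation and mismatch tolerance):

```python
# IUPAC ambiguity codes: which concrete bases each primer symbol accepts
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def primer_matches(primer, site):
    """True if a (possibly degenerate) primer matches a template site."""
    return len(primer) == len(site) and all(
        b in IUPAC[p] for p, b in zip(primer, site))

def coverage(primer, sequences):
    """Fraction of sequences containing at least one perfect primer site."""
    k = len(primer)
    hits = sum(
        any(primer_matches(primer, seq[i:i + k])
            for i in range(len(seq) - k + 1))
        for seq in sequences)
    return hits / len(sequences)
```

Running such a check for each candidate primer against a reference library (here, the 1175-sequence SILVA subset) yields the coverage rates that feed the primer comparison.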


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号