期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Network-Based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis

Wei Zhang Jae-Woong Chang Lilong Lin Kay Minn Baolin Wu Jeremy Chien Jeongsik Yong Hui Zheng Rui Kuang 《PLoS computational biology》2015,11(12)

相似文献

2.

SNPAAMapper: An efficient genome-wide SNP variant analysis pipeline for next-generation sequencing data

Yongsheng Bai James Cavalcoli 《Bioinformation》2013,9(17):870-872

相似文献

3.

Probabilistic error correction for RNA sequencing

Hai-Son Le Marcel H. Schulz Brenna M. McCauley Veronica F. Hinman Ziv Bar-Joseph 《Nucleic acids research》2013,41(10):e109

相似文献

4.

TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data

Asmann YW Middha S Hossain A Baheti S Li Y Chai HS Sun Z Duffy PH Hadad AA Nair A Liu X Zhang Y Klee EW Kalari KR Kocher JP 《Bioinformatics (Oxford, England)》2012,28(2):277-278

TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways. AVAILABILITY AND IMPLEMENTATION: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm. 相似文献

5.

From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data

Jilong Li Jie Hou Lin Sun Jordan Maximillian Wilkins Yuan Lu Chad E. Niederhuth Benjamin Ryan Merideth Thomas P. Mawhinney Valeri V. Mossine C. Michael Greenlief John C. Walker William R. Folk Mark Hannink Dennis B. Lubahn James A. Birchler Jianlin Cheng 《PloS one》2015,10(4)

RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes of gene expression but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner—a multi-level bioinformatics protocol and pipeline—has been developed for such datasets. It includes five steps: Mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from Human, Mouse, Arabidopsis thaliana, and Drosophila melanogaster cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html. 相似文献

6.

Identification of indels in next-generation sequencing data

Aakrosh Ratan Thomas L Olson Thomas P Loughran Jr Webb Miller 《BMC bioinformatics》2015,16(1)

Background

The discovery and mapping of genomic variants is an essential step in most analysis done using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools that are used to identify the other types of variants. Indels represent the second most frequent class of variants in the human genome, after single nucleotide polymorphisms. The reliable detection of indels is still a challenging problem, especially for variants that are longer than a few bases.

Results

We have developed a set of algorithms and heuristics collectively called indelMINER to identify indels from whole genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints for indels of size less than a user specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed with the split-read approach. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools.

Conclusions

indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in the VCF format along with additional information about the variant, including information about its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0483-6) contains supplementary material, which is available to authorized users. 相似文献

7.

GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data

Touati Benoukraf Sarawut Wongphayak Luqman Hakim Abdul Hadi Mengchu Wu Richie Soong 《Nucleic acids research》2013,41(4):e55

High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis in the non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA—http://ctrad-csi.nus.edu.sg/gbsa), a free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms. 相似文献

8.

SHEAR: sample heterogeneity estimation and assembly by reference

Sean R Landman Tae Hyun Hwang Kevin AT Silverstein Yingming Li Scott M Dehm Michael Steinbach Vipin Kumar 《BMC genomics》2014,15(1)

Background

Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often have a heterogeneous mix of various subpopulations with different variants, making assembly extremely difficult using existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results

By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the lead competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases, and demonstrates an improved efficiency and better ability to handle tandem duplications.

Conclusion

SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-84) contains supplementary material, which is available to authorized users. 相似文献

9.

SNPAAMapperT2K: A genome-wide SNP downstream analysis and annotation pipeline for species annotated with NCBI.tbl data files

Yongsheng Bai 《Bioinformation》2014,10(11):711-715

相似文献

10.

Vindel: a simple pipeline for checking indel redundancy

Zhiyi Li Xiaowei Wu Bin He Liqing Zhang 《BMC bioinformatics》2014,15(1)

Background

With the advance of next generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, it has been found that a non-negligible proportion of the identified indel variants might be false positives due to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.

Results

In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first applies indel position information to form candidate redundant groups, then performs indel mutations to the reference genome to generate corresponding indel variant substrings. Finally the indel variant substrings in the same candidate redundant groups are compared in a pairwise fashion to identify redundant indels. We applied our pipeline to check for redundancy in the human indels in dbSNP. Our pipeline identified approximately 8% redundancy in insertion type indels, 12% in deletion type indels, and overall 10% for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated indel size distribution and adjacent indel distance distribution for a better understanding of the mechanisms generating indel variants.

Conclusions

Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels are redundant with respect to those already in the database of interest such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in the current indel annotation. Statistics results prove the consistency of the pipeline on indel redundancy detection for all 22 chromosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0359-1) contains supplementary material, which is available to authorized users. 相似文献

11.

ViVar: A Comprehensive Platform for the Analysis and Visualization of Structural Genomic Variation

Tom Sante Sarah Vergult Pieter-Jan Volders Wigard P. Kloosterman Geert Trooskens Katleen De Preter Annelies Dheedene Frank Speleman Tim De Meyer Bj?rn Menten 《PloS one》2014,9(12)

Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end/single-read sequencing has become an important technique for the detection and exploration of structural variation. Several analysis tools exist to handle different parts and aspects of such sequencing based structural variation analyses pipelines. A comprehensive analysis platform to handle all steps, from processing the sequencing data, to the discovery and visualization of structural variants, is missing. The ViVar platform is built to handle the discovery of structural variants, from Depth Of Coverage analysis, aberrant read pair clustering to split read analysis. ViVar provides you with powerful visualization options, enables easy reporting of results and better usability and data management. The platform facilitates the processing, analysis and visualization, of structural variation based on massive parallel sequencing data, enabling the rapid identification of disease loci or genes. ViVar allows you to scale your analysis with your work load over multiple (cloud) servers, has user access control to keep your data safe and is easy expandable as analysis techniques advance. URL: https://www.cmgg.be/vivar/ 相似文献

12.

Gene Expression,Single Nucleotide Variant and Fusion Transcript Discovery in Archival Material from Breast Tumors

Nadine Norton Zhifu Sun Yan W. Asmann Daniel J. Serie Brian M. Necela Aditya Bhagwate Jin Jen Bruce W. Eckloff Krishna R. Kalari Kevin J. Thompson Jennifer M. Carr Jennifer M. Kachergus Xochiquetzal J. Geiger Edith A. Perez E. Aubrey Thompson 《PloS one》2013,8(11)

相似文献

13.

MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification,Validation and Annotation in Human Molecular Genetics

Ram Vinay Pandey Stephan Pabinger Albert Kriegner Andreas Weinh?usel 《PloS one》2016,11(2)

Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid. 相似文献

14.

FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies

Ren-Hua Chung Wei-Yun Tsai Chen-Yu Kang Po-Ju Yao Hui-Ju Tsai Chia-Hsiang Chen 《PLoS computational biology》2016,12(6)

In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.

This is a PLOS Computational Biology Software article.

相似文献

15.

Integrating Multiple Genomic Data to Predict Disease-Causing Nonsynonymous Single Nucleotide Variants in Exome Sequencing Studies

Jiaxin Wu Yanda Li Rui Jiang 《PLoS genetics》2014,10(3)

Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, due to such facts as the large number of sequenced variants, the presence of non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing have been appealing for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring. 相似文献

16.

One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies

Shuai Yuan H. Richard Johnston Guosheng Zhang Yun Li Yi-Juan Hu Zhaohui S. Qin 《PLoS computational biology》2015,11(8)

With rapid decline of the sequencing cost, researchers today rush to embrace whole genome sequencing (WGS), or whole exome sequencing (WES) approach as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them uses the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice, which can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.

This is a PLOS Computational Biology Software Article.

相似文献

17.

Genome Modeling System: A Knowledge Management Platform for Genomics

《PLoS computational biology》2015,11(7)

相似文献

18.

GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

Kevin C. Dorff Nyasha Chambwe Zachary Zeno Manuele Simi Rita Shaknovich Fabien Campagne 《PloS one》2013,8(7)

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. 相似文献

19.

DsTRD: Danshen Transcriptional Resource Database

Yuxuan Shao Jiabo Wei Fangli Wu Haihua Zhang Dongfeng Yang Zongsuo Liang Weibo Jin 《PloS one》2016,11(2)

相似文献

20.

miRspring: a compact standalone research tool for analyzing miRNA-seq data

David T. Humphreys Catherine M. Suter 《Nucleic acids research》2013,41(15):e147

High-throughput sequencing for microRNA (miRNA) profiling has revealed a vast complexity of miRNA processing variants, but these are difficult to discern for those without bioinformatics expertise and large computing capability. In this article, we present miRNA Sequence Profiling (miRspring) (http://mirspring.victorchang.edu.au), a software solution that creates a small portable research document that visualizes, calculates and reports on the complexities of miRNA processing. We designed an index-compression algorithm that allows the miRspring document to reproduce a complete miRNA sequence data set while retaining a small file size (typically <3 MB). Through analysis of 73 public data sets, we demonstrate miRspring’s features in assessing quality parameters, miRNA cluster expression levels and miRNA processing. Additionally, we report on a new class of miRNA variants, which we term seed-isomiRs, identified through the novel visualization tools of the miRspring document. Further investigation identified that ∼30% of human miRBase entries are likely to have a seed-isomiR. We believe that miRspring will be a highly useful research tool that will enhance the analysis of miRNA data sets and thus increase our understanding of miRNA biology. 相似文献