20 similar articles found (search time: 31 ms)
7.
Li Tai Fang, Pegah Tootoonchi Afshar, Aparna Chhibber, Marghoob Mohiyuddin, Yu Fan, John C. Mu, Greg Gibeling, Sharon Barr, Narges Bani Asadi, Mark B. Gerstein, Daniel C. Koboldt, Wenyi Wang, Wing H. Wong, Hugo YK Lam. Genome Biology, 2015, 16(1)
SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.
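The adaptive boosting step can be illustrated with a generic AdaBoost classifier built from one-feature decision stumps. This is a minimal pure-Python sketch, not SomaticSeq's own implementation: the two features (tumour variant allele fraction and the number of callers reporting a site) and the toy labels are hypothetical stand-ins for the 70+ features and the training set described above.

```python
import math

# Labels: +1 = true somatic mutation, -1 = artifact (toy convention for this sketch)

def stump_predict(row, feature, threshold, polarity):
    """A decision stump: predict `polarity` if the feature clears the threshold."""
    return polarity if row[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds=10):
    """Train an ensemble of weighted stumps with the classic AdaBoost update."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        if err >= 0.5:
            break  # the weak learner is no better than chance
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Up-weight misclassified sites, down-weight correct ones, then renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy training set: [tumour VAF, number of callers reporting the site]
X_train = [[0.40, 5], [0.35, 4], [0.30, 4], [0.05, 1], [0.08, 2], [0.02, 1]]
y_train = [1, 1, 1, -1, -1, -1]
model = adaboost(X_train, y_train, rounds=5)
```

The exhaustive threshold search costs O(features × samples²) per round, which is fine for illustration but not for genome-scale candidate sets; a production classifier would use an optimised library learner.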
The online version of this article (doi:10.1186/s13059-015-0758-2) contains supplementary material, which is available to authorized users.
9.
Benjamin Lehne, Alexander W Drong, Marie Loh, Weihua Zhang, William R Scott, Sian-Tsung Tan, Uzma Afzal, James Scott, Marjo-Riitta Jarvelin, Paul Elliott, Mark I McCarthy, Jaspal S Kooner, John C Chambers. Genome Biology, 2015, 16(1)
DNA methylation plays a fundamental role in the regulation of the genome, but the optimal strategy for analysis of genome-wide DNA methylation data remains to be determined. We developed a comprehensive analysis pipeline for epigenome-wide association studies (EWAS) using the Illumina Infinium HumanMethylation450 BeadChip, based on 2,687 individuals, with 36 samples measured in duplicate. We propose new approaches to quality control, data normalisation and batch correction through control-probe adjustment and establish a null hypothesis for EWAS using permutation testing. Our analysis pipeline outperforms existing approaches, enabling accurate identification of methylation quantitative trait loci for hypothesis driven follow-up experiments.
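One common way to derive an epigenome-wide null by permutation is the max-statistic test sketched below; the abstract does not give the exact scheme, so this is an assumed, generic version. It uses Pearson correlation between each CpG site's beta values and the phenotype as a stand-in association statistic; the function names and parameters are illustrative, and a real 450K analysis would use vectorised code and covariate-adjusted models.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation between one CpG site's beta values and the phenotype."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_threshold(sites, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Epigenome-wide threshold on |r| from the permutation distribution of the maximum.

    Shuffling the phenotype breaks any true association while keeping the
    correlation structure between CpG sites, so the (1 - alpha) quantile of the
    per-permutation maximum |r| controls the family-wise error rate at alpha.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(abs(pearson_r(site, shuffled)) for site in sites))
    maxima.sort()
    index = min(math.ceil((1 - alpha) * n_perm) - 1, n_perm - 1)
    return maxima[index]
```

A site is then declared epigenome-wide significant when its observed |r| exceeds the returned threshold.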
The online version of this article (doi:10.1186/s13059-015-0600-x) contains supplementary material, which is available to authorized users.
13.
Allyson L Byrd, Joseph F Perez-Rogers, Solaiappan Manimaran, Eduardo Castro-Nallar, Ian Toma, Tim McCaffrey, Marc Siegel, Gary Benson, Keith A Crandall, William Evan Johnson. BMC Bioinformatics, 2014, 15(1)
Background
The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.
Results
Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology combined with read trimming and ambiguous read reassignment. Second, using real clinical sequencing data, we have demonstrated that our approach can identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.
Conclusions
Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can complete the analysis of a clinical sample more rapidly than current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.
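The two core ideas, computational subtraction of host reads and reassignment of ambiguously mapped reads, can be sketched as follows. This is a simplified expectation-maximisation (EM) mixture model under stated assumptions, not Clinical PathoScope's actual code or scoring: each read is assumed to carry a precomputed alignment likelihood per reference genome, any read that hits a host genome is discarded, and all names and toy data are hypothetical.

```python
def subtract_host(alignments, host_genomes):
    """Computational subtraction: drop every read that aligns to any host genome."""
    return {read: hits for read, hits in alignments.items()
            if not any(genome in host_genomes for genome in hits)}

def reassign_reads(alignments, n_iter=50):
    """EM estimate of genome proportions, then assign each read to its best genome.

    alignments: read id -> {genome: alignment likelihood > 0}
    """
    genomes = sorted({g for hits in alignments.values() for g in hits})
    theta = {g: 1.0 / len(genomes) for g in genomes}  # start from equal proportions
    n_reads = len(alignments)
    for _ in range(n_iter):
        expected = {g: 0.0 for g in genomes}
        for hits in alignments.values():
            # E-step: posterior probability that the read came from each genome it hits
            weights = {g: theta[g] * q for g, q in hits.items()}
            total = sum(weights.values())
            for g, wgt in weights.items():
                expected[g] += wgt / total
        # M-step: a genome's proportion is its expected share of the reads
        theta = {g: expected[g] / n_reads for g in genomes}
    best = {read: max(hits, key=lambda g: theta[g] * hits[g])
            for read, hits in alignments.items()}
    return theta, best

# Hypothetical toy sample: read r4 maps equally well to strains A and B
sample = {
    "r1": {"A": 1.0}, "r2": {"A": 1.0}, "r3": {"A": 1.0},
    "r4": {"A": 1.0, "B": 1.0}, "r5": {"B": 1.0},
    "h1": {"human": 1.0, "A": 1.0},
}
microbial = subtract_host(sample, {"human"})
proportions, assignment = reassign_reads(microbial)
```

In this toy sample the host read h1 is removed, the ambiguous read r4 is shared in proportion to the evidence from uniquely mapped reads, the estimate for A converges to 0.75, and r4 is assigned to A.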
The online version of this article (doi:10.1186/1471-2105-15-262) contains supplementary material, which is available to authorized users.
15.
Background
In somatic cancer genomes, delineating genuine driver mutations against a background of multiple passenger events is a challenging task. The difficulty of determining function from sequence data and the low frequency of mutations are increasingly hindering the search for novel, less common cancer drivers. The accumulation of extensive amounts of data on somatic point and copy number alterations necessitates the development of systematic methods for driver mutation analysis.
Results
We introduce a framework for detecting driver mutations via functional network analysis, which is applied to individual genomes and does not require pooling multiple samples. It probabilistically evaluates 1) functional network links between different mutations in the same genome and 2) links between individual mutations and known cancer pathways. In addition, it can employ correlations of mutation patterns in pairs of genes. The method was used to analyze genomic alterations in two TCGA datasets, one for glioblastoma multiforme and another for ovarian carcinoma, which were generated using different approaches to mutation profiling. The proportions of drivers among the reported de novo point mutations in these cancers were estimated to be 57.8% and 16.8%, respectively. Both sets also included extended chromosomal regions with synchronous duplications or losses of multiple genes. We identified putative copy number driver events within many such segments. Finally, we summarized seemingly disparate mutations and discovered a functional network of collagen modifications in the glioblastoma. To select the most efficient network for use with this method, we used a novel ROC curve-based procedure for benchmarking different network versions by their ability to recover pathway membership.
Conclusions
The results of our network-based procedure were in good agreement with published gold standard sets of cancer genes and were shown to complement and expand frequency-based driver analyses. In contrast, three sequence-based methods applied to the same data yielded poor agreement with each other and with our results. We review the difference in driver proportions discovered by different sequencing approaches and discuss the functional roles of novel driver mutations. The software used in this work and the global network of functional couplings are publicly available at http://research.scilifelab.se/andrej_alexeyenko/downloads.html.
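The ROC-based network benchmarking can be sketched under assumptions that go beyond the abstract: gene pairs that share at least one pathway are treated as positives, each network version scores a pair by its link confidence (0 when there is no link), and networks are ranked by ROC AUC, computed here as the Mann-Whitney probability that a positive pair outscores a negative one. The data structures and function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def benchmark_network(network, pathways, gene_pairs):
    """Score how well a network's links recover shared pathway membership.

    network:    {frozenset({gene_a, gene_b}): link confidence}
    pathways:   {pathway name: set of member genes}
    gene_pairs: iterable of (gene_a, gene_b) to evaluate
    """
    scores, labels = [], []
    for a, b in gene_pairs:
        scores.append(network.get(frozenset((a, b)), 0.0))
        labels.append(any(a in genes and b in genes for genes in pathways.values()))
    return roc_auc(scores, labels)
```

An uninformative network scores 0.5 and one whose links exactly match pathway co-membership scores 1.0. The pairwise loop is quadratic in the number of pairs; a rank-based AUC formula would be preferable for genome-scale pair sets.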
The online version of this article (doi:10.1186/1471-2105-15-308) contains supplementary material, which is available to authorized users.
16.
Background
With the advance of next generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, a non-negligible proportion of the identified indel variants has been found to be possibly false positives due to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.
Results
In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first uses indel position information to form candidate redundant groups, then applies each indel to the reference genome to generate the corresponding indel variant substring. Finally, the indel variant substrings within each candidate redundant group are compared pairwise to identify redundant indels. We applied our pipeline to check for redundancy in the human indels in dbSNP. Our pipeline identified approximately 8% redundancy in insertion-type indels, 12% in deletion-type indels, and 10% overall for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated the indel size distribution and the distance distribution between adjacent indels to better understand the mechanisms generating indel variants.
Conclusions
Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels is redundant with respect to those already in a database of interest such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in current indel annotation. Statistical results demonstrate that the pipeline detects indel redundancy consistently across all 22 chromosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php.
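The three Vindel steps described in the Results can be sketched on a single reference sequence. This is an illustrative reimplementation under assumptions, not the published pipeline: indels are given as hypothetical (id, 0-based position, deleted bases, inserted bases) tuples without VCF anchor bases, candidate groups are formed with a hypothetical `window` on the gap between consecutive positions, and two indels count as redundant when applying each to the reference yields the same substring.

```python
from itertools import combinations

# Indel tuple (hypothetical format): (indel_id, pos, deleted_bases, inserted_bases),
# with pos a 0-based reference coordinate and no VCF-style anchor base.

def apply_indel(ref, start, end, indel):
    """Return ref[start:end] with the indel applied (the 'indel variant substring')."""
    _, pos, deleted, inserted = indel
    if ref[pos:pos + len(deleted)] != deleted:
        raise ValueError(f"{indel[0]}: deleted bases do not match the reference")
    return ref[start:pos] + inserted + ref[pos + len(deleted):end]

def candidate_groups(indels, window):
    """Step 1: group indels whose consecutive positions are at most `window` apart."""
    groups, current = [], []
    for indel in sorted(indels, key=lambda i: i[1]):
        if current and indel[1] - current[-1][1] > window:
            groups.append(current)
            current = []
        current.append(indel)
    if current:
        groups.append(current)
    return groups

def find_redundant(ref, indels, window=10, flank=5):
    """Steps 2-3: build variant substrings per group and compare them pairwise."""
    redundant = []
    for group in candidate_groups(indels, window):
        if len(group) < 2:
            continue
        # One shared region covering every indel in the group plus some flank;
        # outside it all variants equal the reference, so equal substrings
        # imply equal full haplotypes.
        start = max(0, min(i[1] for i in group) - flank)
        end = min(len(ref), max(i[1] + len(i[2]) for i in group) + flank)
        variants = {i[0]: apply_indel(ref, start, end, i) for i in group}
        for a, b in combinations(group, 2):
            if variants[a[0]] == variants[b[0]]:
                redundant.append((a[0], b[0]))
    return redundant
```

For example, on the reference `ACGTTTTACG`, deleting one T at position 3 or at position 5 gives the same sequence, as does inserting a C at position 8 or 9, so `find_redundant("ACGTTTTACG", [("rs1", 3, "T", ""), ("rs2", 5, "T", ""), ("rs3", 8, "", "C"), ("rs4", 9, "", "C")])` reports the pairs `("rs1", "rs2")` and `("rs3", "rs4")`. Redundant representations spread across a long repeat could fall into different groups if `window` is too small.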
The online version of this article (doi:10.1186/s12859-014-0359-1) contains supplementary material, which is available to authorized users.
17.
Filipe C Martins, Ines de Santiago, Anne Trinh, Jian Xian, Anne Guo, Karen Sayal, Mercedes Jimenez-Linan, Suha Deen, Kristy Driver, Marie Mack, Jennifer Aslop, Paul D Pharoah, Florian Markowetz, James D Brenton. Genome Biology, 2014, 15(12)
Background
TP53 and BRCA1/2 mutations are the main drivers in high-grade serous ovarian carcinoma (HGSOC). We hypothesise that combining tissue phenotypes from image analysis of tumour sections with genomic profiles could reveal other significant driver events.
Results
Automatic estimates of stromal content combined with genomic analysis of TCGA HGSOC tumours show that stroma strongly biases estimates of PTEN expression. Tumour-specific PTEN expression was tested in two independent cohorts using tissue microarrays containing 521 cases of HGSOC. PTEN loss or downregulation occurred in 77% of the first cohort by immunofluorescence and in 52% of the validation group by immunohistochemistry, and was associated with worse survival in a multivariate Cox-regression model adjusted for study site, age, stage and grade. Reanalysis of TCGA data shows that hemizygous loss of PTEN is common (36%) and that expression of PTEN and expression of androgen receptor are positively associated. Low androgen receptor expression was associated with reduced survival in data from TCGA and in immunohistochemical analysis of the first cohort.
Conclusion
PTEN loss is a common event in HGSOC and defines a subgroup with significantly worse prognosis, suggesting the rational use of drugs to target PI3K and androgen receptor pathways for HGSOC. This work shows that integrative approaches combining tissue phenotypes from images with genomic analysis can resolve confounding effects of tissue heterogeneity and should be used to identify new drivers in other cancers.
The online version of this article (doi:10.1186/s13059-014-0526-8) contains supplementary material, which is available to authorized users.
18.
Background
Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task.
Results
We present PathwayBooster, an open-source software tool to support the manual comparison and curation of metabolic models. It combines gene annotations from GenBank files and other sources with information retrieved from the metabolic databases BRENDA and KEGG to produce a set of pathway diagrams and reports summarising the evidence for the presence of a reaction in a given organism’s metabolic network. By comparing multiple sources of evidence within a common framework, PathwayBooster assists the curator in the identification of likely false positive (misannotated enzyme) and false negative (pathway hole) reactions. Reaction evidence may be taken from alternative annotations of the same genome and/or a set of closely related organisms.
Conclusions
By integrating and visualising evidence from multiple sources, PathwayBooster reduces the manual effort required in the curation of a metabolic model. The software is available online at http://www.theosysbio.bio.ic.ac.uk/resources/pathwaybooster/.
The online version of this article (doi:10.1186/s12859-014-0447-2) contains supplementary material, which is available to authorized users.