Similar literature
20 similar articles found.
1.

Background

Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often contain a heterogeneous mix of subpopulations with different variants, making assembly extremely difficult with existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts structural variants (SVs), accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results

By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare SHEAR against the leading competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR successfully estimates heterogeneity percentages in both cases, and demonstrates improved efficiency and a better ability to handle tandem duplications.

Conclusion

SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.
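
As a rough illustration of what estimating a subpopulation percentage for a variant can mean, the sketch below computes the fraction of reads spanning a site that support the variant. This is a simplification written for this listing, not SHEAR's algorithm; the function name and the read counts are hypothetical.

```python
def variant_fraction(variant_reads: int, reference_reads: int) -> float:
    """Fraction of reads spanning a site that support the variant allele."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads span this site")
    return variant_reads / total


# Hypothetical counts: 30 reads support a deletion breakpoint,
# 70 reads support the reference sequence at the same site.
fraction = variant_fraction(30, 70)
print(f"Estimated variant percentage: {fraction:.0%}")
```

A real estimate would also have to account for mapping bias and for reads that only partially span a breakpoint, which this sketch ignores.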

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-84) contains supplementary material, which is available to authorized users.

2.
3.

Background

Epigenome-wide association scans (EWAS) are an increasingly powerful and widely-used approach to assess the role of epigenetic variation in human complex traits. However, this rapidly emerging field lacks dedicated visualisation tools that can display features specific to epigenetic datasets.

Result

We developed coMET, an R package and online tool for visualisation of EWAS results in a genomic region of interest. coMET generates a regional plot of epigenetic-phenotype association results and the estimated DNA methylation correlation between CpG sites (co-methylation), with further options to visualise genomic annotations based on ENCODE data, gene tracks, reference CpG-sites, and user-defined features. The tool can be used to display phenotype association signals and correlation patterns of microarray or sequencing-based DNA methylation data, such as Illumina Infinium 450k, WGBS, or MeDIP-seq, as well as other types of genomic data, such as gene expression profiles. The software is available as a user-friendly online tool from http://epigen.kcl.ac.uk/comet and as an R Bioconductor package. Source code, examples, and full documentation are also available from GitHub.
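
The co-methylation estimate described above can be pictured as a correlation matrix over CpG sites. The sketch below is a minimal illustration in Python rather than coMET's R code; it assumes Pearson correlation, and the site names and methylation values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def comethylation_matrix(sites):
    """Correlate every pair of CpG sites.

    sites maps a site name to its methylation levels, one value per sample.
    Returns a dict keyed by (site_a, site_b).
    """
    names = list(sites)
    return {(a, b): pearson(sites[a], sites[b]) for a in names for b in names}


# Hypothetical methylation levels (0 to 1) for four samples at three sites.
sites = {
    "site_1": [0.10, 0.20, 0.30, 0.40],
    "site_2": [0.20, 0.30, 0.40, 0.50],
    "site_3": [0.90, 0.80, 0.70, 0.60],
}
matrix = comethylation_matrix(sites)
for pair, value in matrix.items():
    print(pair, round(value, 3))
```

In this toy data, site_1 and site_2 rise together across samples while site_3 falls, so the first pair correlates positively and pairs involving site_3 correlate negatively.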

Conclusion

Our new software allows visualisation of EWAS results with functional genomic annotations and with estimation of co-methylation patterns. coMET is available to a wide audience as an online tool and R package, and can be a valuable resource to interpret results in the fast growing field of epigenetics. The software is designed for epigenetic data, but can also be applied to genomic and functional genomic datasets in any species.

4.
5.

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked is the internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions have potentially significant biological meaning, and they are often too short to be considered copy number variation. Therefore, there is a need for efficient detection of such deletions using exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.
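
To make the idea of an internal exon deletion concrete, the sketch below scans per-exon read depths for runs of consecutive uncovered exons that are flanked by covered exons on both sides. It is only an illustration under simple assumptions, not ExonDel's method; the function name, the depth threshold, and the depth values are hypothetical.

```python
def internal_exon_deletions(exon_depths, min_depth=1.0):
    """Find runs of consecutive low-coverage exons inside a gene.

    exon_depths lists the mean read depth of each exon in transcript order.
    Returns (first_index, last_index) pairs for runs whose depth is below
    min_depth and that touch neither the first nor the last exon.
    """
    low = [depth < min_depth for depth in exon_depths]
    runs = []
    i = 0
    n = len(low)
    while i < n:
        if low[i]:
            j = i
            # Extend the run while the next exon is also low-coverage.
            while j + 1 < n and low[j + 1]:
                j += 1
            # Keep only runs flanked by covered exons on both sides.
            if i > 0 and j < n - 1:
                runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs


# Hypothetical depths for a six-exon gene: exons 2 and 3 are uncovered and
# internal; exon 5 is uncovered but terminal, so it is not reported.
depths = [42.0, 38.5, 0.0, 0.0, 40.1, 0.0]
print(internal_exon_deletions(depths))
```

A fixed depth threshold is the crudest possible criterion; a practical tool would need to normalise for capture efficiency and sequencing depth across samples.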

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.

6.
7.

Background

Dynamic visual exploration of detailed pathway information can help researchers digest and interpret complex mechanisms and genomic datasets.

Results

ChiBE is a free, open-source software tool for visualizing, querying, and analyzing human biological pathways in BioPAX format. The recently released version 2 can search for neighborhoods, paths between molecules, and common regulators/targets of molecules, on large integrated cellular networks in the Pathway Commons database as well as in local BioPAX models. Resulting networks can be automatically laid out for visualization using a graphically rich, process-centric notation. Profiling data from the cBioPortal for Cancer Genomics and expression data from the Gene Expression Omnibus can be overlaid on these networks.
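
A query for paths between molecules is, at its core, a graph search. The sketch below shows a breadth-first search for a shortest path in a small directed network. It does not use ChiBE, BioPAX, or Pathway Commons; the network, node names, and function name are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search for a shortest directed path.

    graph maps each node to the nodes it points to. Returns the path as a
    list of nodes, or None when the target cannot be reached.
    """
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                if neighbor == target:
                    # Walk back through the predecessors to rebuild the path.
                    path = [target]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None


# Hypothetical interaction network; letters stand in for molecules.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(network, "A", "E"))
```

Neighborhood queries follow the same pattern: stop the search after a fixed number of steps and return every node visited.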

Conclusions

ChiBE’s new capabilities are organized around a genomics-oriented workflow and offer a unique comprehensive pathway analysis solution for genomics researchers. The software is freely available at http://code.google.com/p/chibe.

8.

Background

Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.

Results

To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.
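
One way to picture a pairwise distance computed directly from unassembled reads is to compare the sets of k-mers that two samples contain. The sketch below is an illustration only: the distance formula (the log of total over shared k-mers, divided by k) is an assumption chosen for this example and should not be read as AAF's exact definition, and it ignores the sequencing-error and coverage issues the abstract mentions. The reads and function names are hypothetical.

```python
from math import log


def kmers(sequence, k):
    """All distinct substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_distance(reads_a, reads_b, k):
    """Distance between two read sets based on shared k-mers.

    Assumed formula for illustration: log(total / shared) / k, where total
    is the smaller of the two k-mer set sizes.
    """
    set_a = set().union(*(kmers(read, k) for read in reads_a))
    set_b = set().union(*(kmers(read, k) for read in reads_b))
    shared = len(set_a & set_b)
    total = min(len(set_a), len(set_b))
    if shared == 0:
        raise ValueError("the two samples share no k-mers")
    return log(total / shared) / k


# Hypothetical reads from two samples that differ at one position.
sample_a = ["ACGTACGT"]
sample_b = ["ACGTTCGT"]
print(round(kmer_distance(sample_a, sample_b, 3), 3))
```

A matrix of such pairwise distances can then be passed to a standard distance-based tree-building method.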

Conclusion

Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.

9.

Background

In recent years, increasing amounts of genomic and clinical cancer data have become publicly available through large-scale collaborative projects such as The Cancer Genome Atlas (TCGA). However, as long as these datasets are difficult to access and interpret, they are essentially useless for a major part of the research community and their scientific potential will not be fully realized. To address these issues we developed MEXPRESS, a straightforward and easy-to-use web tool for the integration and visualization of the expression, DNA methylation and clinical TCGA data on a single-gene level (http://mexpress.be).

Results

In comparison to existing tools, MEXPRESS allows researchers to quickly visualize and interpret the different TCGA datasets and their relationships for a single gene, as demonstrated for GSTP1 in prostate adenocarcinoma. We also used MEXPRESS to reveal the differences in the DNA methylation status of the PAM50 marker gene MLPH between the breast cancer subtypes and how these differences were linked to the expression of MLPH.

Conclusions

We have created a user-friendly tool for the visualization and interpretation of TCGA data, offering clinical researchers a simple way to evaluate the TCGA data for their genes or candidate biomarkers of interest.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1847-z) contains supplementary material, which is available to authorized users.

10.
11.
12.

Background

Cancer immunotherapy has recently entered a remarkable renaissance phase with the approval of several agents for treatment. Cancer treatment platforms have demonstrated profound tumor regressions including complete cure in patients with metastatic cancer. Moreover, technological advances in next-generation sequencing (NGS) as well as the development of devices for scanning whole-slide bioimages from tissue sections and image analysis software for quantitation of tumor-infiltrating lymphocytes (TILs) allow, for the first time, the development of personalized cancer immunotherapies that target patient specific mutations. However, there is currently no bioinformatics solution that supports the integration of these heterogeneous datasets.

Results

We have developed a bioinformatics platform – Personalized Oncology Suite (POS) – that integrates clinical data, NGS data and whole-slide bioimages from tissue sections. POS is a web-based platform that is scalable, flexible and expandable. The underlying database is based on a data warehouse schema, which is used to integrate information from different sources. POS stores clinical data, genomic data (SNPs and INDELs identified from NGS analysis), and scanned whole-slide images. It features a genome browser as well as access to several instances of the bioimage management application Bisque. POS provides different visualization techniques and offers sophisticated upload and download possibilities. The modular architecture of POS allows the community to easily modify and extend the application.

Conclusions

The web-based integration of clinical, NGS, and imaging data represents a valuable resource for clinical researchers and future application in medical oncology. POS can be used not only in the context of cancer immunology but also in other studies in which NGS data and images of tissue sections are generated. The application is open-source and can be downloaded at http://www.icbi.at/POS.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-306) contains supplementary material, which is available to authorized users.

13.

Background

Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges as the alignments of next generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform independent suite that provides efficient analysis of genetic crosses based on Next-generation sequencing data.

Results

We demonstrated the utility of REDHORSE using simulated data and real next-generation sequencing data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess the performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the NGS read alignments to the genome into multiple sequence alignments. The recombination detection algorithm in REDHORSE was able to detect conventional crossovers and double crossovers typically associated with gene conversions whilst filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers.
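
Detecting a crossover from markers ordered by genomic position amounts to finding where the parental origin of consecutive markers switches. The sketch below shows that basic idea only; it is not REDHORSE's algorithm and does no filtering of sequencing or alignment artifacts. The marker positions, parent labels, and function name are hypothetical.

```python
def find_crossovers(markers):
    """Report intervals where parental origin switches.

    markers is a list of (position, parent) pairs sorted by position.
    Returns (left_position, right_position) pairs; each crossover lies
    somewhere between the two flanking markers.
    """
    intervals = []
    for (pos_a, parent_a), (pos_b, parent_b) in zip(markers, markers[1:]):
        if parent_a != parent_b:
            intervals.append((pos_a, pos_b))
    return intervals


# Hypothetical markers along one chromosome of a progeny strain.
markers = [
    (100, "P1"),
    (250, "P1"),
    (400, "P2"),
    (900, "P2"),
    (1200, "P1"),
]
print(find_crossovers(markers))
```

Two switches that lie close together, with a short stretch of the other parent between them, would correspond to the double crossovers mentioned above; telling those apart from artifacts requires more evidence than this sketch uses.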

Conclusion

REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for next-generation sequencing data. REDHORSE is a portable, platform-independent, Java-based utility that provides efficient analysis of genetic crosses based on next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.

14.

Background

Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis.

Results

To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to the continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, MendeLIMS, is easily implemented with open source tools and is highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies.

Conclusions

We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-290) contains supplementary material, which is available to authorized users.

15.

Background

Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality.

Results

In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS.

Conclusions

By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users.

16.

Background

Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.

Methodology

We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.
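
The selection stage can be pictured as an evolutionary search over feature subsets. The sketch below is a deliberately small mutate-and-keep-if-not-worse loop over a bit mask, far simpler than the evolutionary algorithms EFFECT uses and not taken from its code. The fitness function, the set of informative features, and all names are hypothetical stand-ins for a real classifier score.

```python
import random


def evolve_feature_subset(n_features, fitness, generations=200, seed=0):
    """Search for a high-scoring feature subset by random mutation.

    A subset is a list of booleans, one per candidate feature. Each
    generation flips every bit with probability 1 / n_features and keeps
    the mutated subset when it scores at least as well as the current one.
    """
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_score = fitness(best)
    for _ in range(generations):
        child = [bit != (rng.random() < 1.0 / n_features) for bit in best]
        score = fitness(child)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score


# Hypothetical fitness: features 0 and 3 are informative; every selected
# feature costs a small penalty, so compact subsets are preferred.
INFORMATIVE = {0, 3}


def toy_fitness(mask):
    reward = sum(1.0 for i in INFORMATIVE if mask[i])
    return reward - 0.1 * sum(mask)


subset, score = evolve_feature_subset(6, toy_fitness)
print([i for i, selected in enumerate(subset) if selected], round(score, 2))
```

In practice the fitness would be a cross-validated classification score on labelled sequences, which is where most of the computational cost lies.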

Results

To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retention or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.

17.
18.

Background

Transposable elements are mobile DNA repeat sequences, known to have high impact on genes, genome structure and evolution. This has stimulated broad interest in the detailed biological studies of transposable elements. Hence, we have developed an easy-to-use tool for the comparative analysis of the structural organization and functional relationships of transposable elements, to help understand their functional role in genomes.

Results

We named our new software VisualTE and describe it here. VisualTE is a JAVA stand-alone graphical interface that allows users to visualize and analyze all occurrences of transposable element families in annotated genomes. VisualTE reads and extracts transposable elements and genomic information from annotation and repeat data. Result analyses are displayed in several graphical panels that include location and distribution on the chromosome, the occurrence of transposable elements in the genome, their size distribution, and neighboring genes’ features and ontologies. With these hallmarks, VisualTE provides a convenient tool for studying transposable element copies and their functional relationships with genes, at the whole-genome scale, and in diverse organisms.

Conclusions

The VisualTE graphical interface makes possible comparative analyses of transposable elements in any annotated sequence, as well as of the structural organization and functional relationships between transposable elements and other genetic objects. This tool is freely available at: http://lcb.cnrs-mrs.fr/spip.php?article867.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1351-5) contains supplementary material, which is available to authorized users.

19.
20.