Similar articles
1.
2.

Background

Mismatch repair deficient colorectal adenomas are composed of transformed cells that descend from a common founder and progressively accumulate genomic alterations. The proliferation history of these tumors is still largely unknown. Here we present a novel approach to rebuild the proliferation trees that recapitulate the history of individual colorectal adenomas by mapping the progressive acquisition of somatic point mutations during tumor growth.

Results

Using our approach, we called high and low frequency mutations acquired in the X chromosome of four mismatch repair deficient colorectal adenomas deriving from male individuals. We clustered these mutations according to their frequencies and rebuilt the proliferation trees directly from the mutation clusters using a recursive algorithm. The trees of all four lesions were formed of a dominant subclone that co-existed with other genetically heterogeneous subpopulations of cells. However, despite this similar hierarchical organization, the growth dynamics varied among and within tumors, likely depending on a combination of tumor-specific genetic and environmental factors.

Conclusions

Our study provides insights into the biological properties of individual mismatch repair deficient colorectal adenomas that may influence their growth and also the response to therapy. Extended to other solid tumors, our novel approach could inform on the mechanisms of cancer progression and on the best treatment choice.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0437-8) contains supplementary material, which is available to authorized users.

3.
The Chiari network is a net-like mobile structure, occasionally encountered near the entrance of the inferior vena cava in the right atrium. Due to its fenestration, the Chiari network does not cause flow obstruction of the blood. The Chiari network is usually an incidental finding with no further clinical consequences. We report an unusual presentation of a Chiari network, mimicking a right atrial oscillating cystic mass attached to the interatrial septum by a thin stalk.

Electronic supplementary material

The online version of this article (doi:10.1007/s12471-014-0621-1) contains supplementary material, which is available to authorized users.

4.

Background

In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set. Usually, a fixed height on the HC tree is used, and each contiguous branch of samples below that height is considered a separate cluster. Due to the fixed-height cutting, those clusters may not unravel significant functional coherence hidden deeper in the tree. Besides that, most existing approaches do not make use of available clinical information to guide cluster extraction from the HC. Thus, the identified subgroups may be difficult to interpret in relation to that information.
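
The fixed-height cut described above can be sketched in a few lines. This is a minimal illustration, assuming single-linkage clustering on one-dimensional toy values; the function names `single_linkage` and `cut_at_height` are hypothetical and are not taken from HCsnip or any other package.

```python
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering on 1-D values.

    Returns the merge history as (height, cluster_a, cluster_b) tuples,
    in the order the merges happen.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance is the closest pair of members.
        height, a, b = min(
            ((min(abs(points[i] - points[j]) for i in a for j in b), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((height, a, b))
    return merges


def cut_at_height(n_samples, merges, height):
    """Apply only the merges at or below `height` to get flat clusters."""
    clusters = {frozenset([i]) for i in range(n_samples)}
    for h, a, b in merges:
        if h <= height:
            clusters -= {a, b}
            clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Toy data (hypothetical): three tight groups at different scales.
points = [0, 1, 2, 50, 51, 90]
merges = single_linkage(points)
low_cut = cut_at_height(len(points), merges, 5)
high_cut = cut_at_height(len(points), merges, 40)
```

With these toy values, a cut at height 5 keeps three groups and a cut at height 40 merges the last two, which shows how a single global height fixes the granularity for every branch at once.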

Results

We develop a novel framework for decomposing the HC tree into clusters by semi-supervised piecewise snipping. The framework, called guided piecewise snipping, utilizes both molecular data and clinical information to decompose the HC tree into clusters. It cuts the given HC tree at variable heights to find a partition (a set of non-overlapping clusters) which not only represents a structure deemed to underlie the data from which the HC tree is derived, but is also maximally consistent with the supplied clinical data. Moreover, the approach does not require the user to specify the number of clusters prior to the analysis. Extensive results on simulated and multiple medical data sets show that our approach consistently produces more meaningful clusters than the standard fixed-height cut and/or non-guided approaches.

Conclusions

The guided piecewise snipping approach features several novelties and advantages over existing approaches. The proposed algorithm is generic, and can be combined with other algorithms that operate on detected clusters. This approach represents an advancement in several regards: (1) a piecewise tree snipping framework that efficiently extracts clusters by snipping the HC tree possibly at variable heights while preserving the HC tree structure; (2) a flexible implementation allowing a variety of data types for both building and snipping the HC tree, including patient follow-up data like survival as auxiliary information. The data sets and R code are provided as supplementary files. The proposed method is available from Bioconductor as the R-package HCsnip.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0448-1) contains supplementary material, which is available to authorized users.

5.

Background

Metabolomics is one of the most recent omics technologies. It has been applied in fields such as food science, nutrition, drug discovery and systems biology. For this, gas chromatography-mass spectrometry (GC-MS) has been widely used and many computational tools have been developed to support the analysis of metabolomics data. Among them, AMDIS is perhaps the most used tool for identifying and quantifying metabolites. However, AMDIS generates a high number of false positives and does not have an interface amenable to high-throughput data analysis. Although additional computational tools have been developed for processing AMDIS results and for performing normalisations and statistical analysis of metabolomics data, there is not yet a single free software tool or package able to reliably identify and quantify metabolites analysed by GC-MS.

Results

Here we introduce a new algorithm, PScore, able to score peaks according to their likelihood of representing metabolites defined in a mass spectral library. We implemented PScore in an R package called MetaBox and evaluated the applicability and potential of MetaBox by comparing its performance against AMDIS results when analysing volatile organic compounds (VOC) from standard mixtures of metabolites and from female and male mice faecal samples. MetaBox reported lower percentages of false positives and false negatives, and was able to report a higher number of potential biomarkers associated with the metabolism of female and male mice.

Conclusions

Identification and quantification of metabolites is among the most critical and time-consuming steps in GC-MS metabolome analysis. Here we present an algorithm implemented in an R package, which allows users to construct flexible pipelines and analyse metabolomics data in a high-throughput manner.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0374-2) contains supplementary material, which is available to authorized users.

6.

Background

One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

Results

We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.

Conclusions

Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1000) contains supplementary material, which is available to authorized users.

7.

Background

Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature’s relevance to a classification task.

Results

We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance.

Conclusions

A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-274) contains supplementary material, which is available to authorized users.

8.

Introduction

This study was performed to develop ultrasound composite scores for the assessment of inflammatory and structural lesions in Psoriatic Arthritis (PsA).

Methods

We performed a prospective study on 83 PsA patients undergoing two study visits scheduled 6 months apart. B-mode and Power Doppler (PD) findings were semi-quantitatively scored at 68 joints (evaluating synovia, perisynovial tissue, tendons and bone) and 14 entheses. We constructed bilateral and unilateral (focusing on the dominant site) ultrasound composite scores, selecting relevant sites by a hierarchical approach. We tested convergent construct validity, reliability and feasibility of inflammatory and structural elements of the scores as well as sensitivity to change for inflammatory items.

Results

The bilateral score (termed PsASon22) included 22 joints (6 metacarpophalangeal joints (MCPs), 4 proximal interphalangeal joints (PIPs) of hands (H-PIPs), 2 metatarsophalangeal joints (MTPs), 4 distal interphalangeal joints (DIPs) of hands (H-DIPs), 2 DIPs of feet (F-DIPs), 4 large joints) and 4 entheses (bilateral assessment of lateral epicondyle and distal patellar tendon). The unilateral score (PsASon13) comprised 13 joints (2 MCPs, 3 H-PIPs, 1 PIP of feet (F-PIP), 2 MTPs, 1 H-DIP and 2 F-DIPs and 2 large joints) and 2 entheses (unilateral lateral epicondyle and distal patellar tendon). Both composite scores revealed a moderate to high sensitivity (bilateral composite score 43% to 100%, unilateral 36% to 100%) to detect inflammatory and structural lesions compared to the 68-joint/14-entheses score. The inflammatory and structural components of the composite scores correlated weakly with clinical markers of disease activity (corrcoeffs 0 to 0.40) and the health assessment questionnaire (HAQ, corrcoeffs 0 to 0.39), respectively. Patients with active disease achieving remission at follow-up yielded greater reductions of ultrasound inflammatory scores than those with stable clinical activity (Cohen’s d effect size ranging from 0 to 0.79). Inter-rater reliability of bi- and unilateral composite scores was moderate to good with ICCs ranging from 0.42 to 0.96 and from 0.36 to 0.71, respectively for inflammatory and structural sub-scores. The PsASon22 and PsASon13 required 16 to 26 and 9 to 13 minutes, respectively, to be completed.

Conclusion

Both new PsA ultrasound composite scores (PsASon22 and PsASon13) revealed sufficient convergent construct validity, sensitivity to change, reliability and feasibility.

Electronic supplementary material

The online version of this article (doi:10.1186/s13075-014-0476-2) contains supplementary material, which is available to authorized users.

9.

Background

We consider the problem of reconstructing a gene regulatory network structure from limited time series gene expression data, without any a priori knowledge of connectivity. We assume that the network is sparse, meaning the connectivity among genes is much less than full connectivity. We develop a method for network reconstruction based on compressive sensing, which takes advantage of the network’s sparseness.

Results

For the case in which all genes are accessible for measurement, and there is no measurement noise, we show that our method can be used to exactly reconstruct the network. For the more general problem, in which hidden genes exist and all measurements are contaminated by noise, we show that our method leads to reliable reconstruction. In both cases, coherence of the model is used to assess the ability to reconstruct the network and to design new experiments. We demonstrate that it is possible to use the coherence distribution to guide biological experiment design effectively. By collecting a more informative dataset, the proposed method helps reduce the cost of experiments. For each problem, a set of numerical examples is presented.
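
The abstract does not define how coherence is computed. As a rough sketch, assuming the standard compressive-sensing notion of mutual coherence (the largest absolute normalized inner product between two distinct columns of the sensing matrix), it could be evaluated as follows. The function name `mutual_coherence` and the toy matrices are hypothetical.

```python
import math


def mutual_coherence(matrix):
    """Largest absolute cosine similarity between two distinct columns.

    `matrix` is a list of rows; every column must be non-zero.
    Values near 0 mean nearly orthogonal columns; values near 1 mean
    two columns are almost indistinguishable.
    """
    columns = list(zip(*matrix))
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            dot = sum(x * y for x, y in zip(columns[i], columns[j]))
            best = max(best, abs(dot) / (norms[i] * norms[j]))
    return best


# Orthogonal columns give coherence 0; correlated columns give more.
orthogonal = mutual_coherence([[1, 0], [0, 1]])
correlated = mutual_coherence([[1, 1], [0, 1]])
```

Under this assumed definition, lower coherence generally makes sparse recovery easier, which is one way a coherence measure could be used to compare candidate experiment designs.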

Conclusions

The method provides a guarantee on how well the inferred graph structure represents the underlying system, reveals deficiencies in the data and model, and suggests experimental directions to remedy the deficiencies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0400-4) contains supplementary material, which is available to authorized users.

10.

Background

A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied.

Methodology/Principal Findings

We show that hierarchical clustering algorithms that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available.

Conclusions

Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually created mappings of protein families. As demonstrated, it can also provide insights into erroneous and missed annotations.

11.

Background

With the advent of low-cost, fast sequencing technologies, metagenomic analyses have become possible. The large data volumes gathered by these techniques and the unpredictable diversity captured in them are still, however, a challenge for computational biology.

Results

In this paper we address the problem of rapid taxonomic assignment with small and adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical. We report reasonably good classification accuracy for ranks down to order, observed in a study on real-world data (Acid Mine Drainage, Cow Rumen).

Conclusion

We show that the execution time of this approach is orders of magnitude shorter than that of competing approaches and that accuracy is comparable. The tool is presented to the public as a web application (url: https://ani.cebitec.uni-bielefeld.de/ake/, username: bmc, password: bmcbioinfo).

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0384-0) contains supplementary material, which is available to authorized users.

12.

Background

The use of a severity score to help orientation decisions could improve the efficiency of care for acute exacerbations of COPD (AECOPD). We previously developed a score (‘2008 score’, based on age, dyspnea grade at steady state and number of clinical signs of severity) predicting in-hospital mortality in patients with AECOPD visiting emergency departments (EDs). External validity of this score remained to be assessed.

Objectives

To test the predictive properties of the ‘2008 score’ in a population of patients hospitalized in medical respiratory wards for AECOPD, and determine whether a new score specifically derived from this population would differ from the previous score in terms of components or predictive performance.

Methods

Data from a cohort study in 1824 patients hospitalized in a medical ward for an AECOPD were analyzed. Patients were categorized using the 2008 score and its predictive characteristics for in-hospital mortality rates were assessed. A new score was developed using multivariate logistic regression modeling in a randomly selected derivation population sample followed by testing in the remaining population (validation sample). Robustness of results was assessed by case-by-case validation.

Results

The 2008 score was characterized by a c-statistic of 0.77, a sensitivity of 69% and a specificity of 76% for prediction of in-hospital mortality. The new score comprised the same variables plus major cardiac comorbidities and was characterized by a c-statistic of 0.78, a sensitivity of 77% and a specificity of 66%.
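
For readers less familiar with these metrics, the short sketch below shows how sensitivity and specificity follow from a 2x2 confusion table. The counts are hypothetical, chosen only so that the output matches the percentages reported for the 2008 score; they are not the study's data, and the function name is illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts.

    sensitivity = share of actual deaths that the score flagged
    specificity = share of survivors that the score did not flag
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: 100 deaths (69 flagged), 100 survivors (76 not flagged).
sens, spec = sensitivity_specificity(tp=69, fn=31, tn=76, fp=24)
```

The trade-off reported for the new score (higher sensitivity, lower specificity) corresponds to flagging more patients overall: more deaths are caught, at the cost of more false alarms among survivors.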

Conclusions

A score using simple clinical variables has robust properties for predicting the risk of in-hospital death in patients hospitalized for AECOPD. Adding cardiac comorbidities to the original score increased its sensitivity while decreasing its specificity.

Electronic supplementary material

The online version of this article (doi:10.1186/s12931-014-0099-9) contains supplementary material, which is available to authorized users.

13.
We present a novel method for the identification of sets of mutually exclusive gene alterations in a given set of genomic profiles. We scan the groups of genes with a common downstream effect on the signaling network, using a mutual exclusivity criterion that ensures that each gene in the group significantly contributes to the mutual exclusivity pattern. We test the method on all available TCGA cancer genomics datasets, and detect multiple previously unreported alterations that show significant mutual exclusivity and are likely to be driver events.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0612-6) contains supplementary material, which is available to authorized users.

14.
15.

Background

Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods.

Results

On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network within 3 months. With a recall of 23% at a precision of 82.1%, we predicted 172,132 putative PPIs. We demonstrate the usefulness of these predictions through a range of experiments.
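
As a back-of-the-envelope reading of these figures, the sketch below combines the reported precision and recall into an F1 score and estimates how many of the predicted PPIs would be correct. It assumes, as a simplification not stated in the abstract, that the benchmark precision carries over unchanged to the full set of predictions.

```python
# Figures quoted in the abstract.
precision = 0.821
recall = 0.23
predicted_ppis = 172_132

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Expected number of correct predictions, ASSUMING the benchmark
# precision holds for the whole predicted network.
expected_true = precision * predicted_ppis

print(f"F1 = {f1:.3f}, expected true PPIs = {expected_true:.0f}")
```

The low F1 relative to the precision reflects the design choice implied by the numbers: the predictor is tuned to be conservative, reporting fewer interactions but with high confidence.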

Conclusions

The speed and accuracy associated with MP-PIPE can make this a potential tool to study individual human PPI networks (from genomic sequences alone) for personalized medicine.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0383-1) contains supplementary material, which is available to authorized users.

16.

Background

Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of ‘omics’ data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis.

Results

We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions.

Conclusions

Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0386-y) contains supplementary material, which is available to authorized users.

17.

Background

Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over- or under-representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small-molecule enrichment analysis.
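
A common way to test for over-representation is the hypergeometric upper tail. The sketch below illustrates that general technique only; the abstract does not say which statistical test BiNChE applies, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, hits):
    """P(X >= hits) for draws without replacement.

    population: all entities considered (e.g. all annotated molecules)
    annotated:  entities in the population that belong to the class
    sample:     size of the submitted set
    hits:       entities in the submitted set that belong to the class
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


# Toy case: 10 molecules, 5 in the class, 2 submitted, both in the class.
p_value = hypergeom_upper_tail(population=10, annotated=5, sample=2, hits=2)
```

A small p-value suggests the class appears in the submitted set more often than chance would predict; in practice such p-values are usually corrected for the many classes tested.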

Results

We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology.

Conclusions

BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0486-3) contains supplementary material, which is available to authorized users.

18.

Background

A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change.

Results

Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools’ performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients.

Conclusions

This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-619) contains supplementary material, which is available to authorized users.

19.

Background

The invasion of red blood cells (RBCs) by malarial parasites is an essential step in the life cycle of Plasmodium falciparum. Human-parasite surface protein interactions play a critical role in this process. Although several interactions between human and parasite proteins have been discovered, the mechanism related to invasion remains poorly understood because numerous human-parasite protein interactions have not yet been identified. High-throughput screening experiments are not feasible for malarial parasites due to difficulty in expressing the parasite proteins. Here, we performed computational prediction of the PPIs involved in malaria parasite invasion to elucidate the mechanism by which invasion occurs.

Results

In this study, an expectation maximization algorithm was used to estimate the probabilities of domain-domain interactions (DDIs). Estimates of DDI probabilities were then used to infer PPI probabilities. We found that our prediction performance was better than that based on the information of D. melanogaster alone when information related to the six species was used. Prediction performance was assessed using protein interaction data from S. cerevisiae, indicating that the predicted results were reliable. We then used the estimates of DDI probabilities to infer interactions between 490 parasite and 3,787 human membrane proteins. A small-scale dataset was used to illustrate the usability of our method in predicting interactions between human and parasite proteins. The positive predictive value (PPV) was lower than that observed in S. cerevisiae. We integrated gene expression data to improve prediction accuracy and to reduce false positives. We identified 80 membrane proteins highly expressed in the schizont stage using the fast Fourier transform method. Approximately 221 erythrocyte membrane proteins were identified using published mass spectral datasets. A network consisting of 205 interactions was predicted. Results of network analysis suggest that SNARE proteins of parasites and APP of humans may function in the invasion of RBCs by parasites.
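
The step from DDI probabilities to a PPI probability can be sketched as follows. This assumes the widely used independence model in which two proteins interact if at least one of their domain pairs interacts; the abstract does not state the exact formula, and the domain names and probabilities below are hypothetical.

```python
def ppi_probability(domains_a, domains_b, ddi_prob):
    """Probability that two proteins interact, given DDI probabilities.

    ASSUMPTION: domain pairs act independently, so
    P(interaction) = 1 - product over domain pairs of (1 - P(DDI)).
    Domain pairs missing from `ddi_prob` count as probability 0.
    """
    no_interaction = 1.0
    for da in domains_a:
        for db in domains_b:
            p = ddi_prob.get(frozenset((da, db)), 0.0)
            no_interaction *= 1.0 - p
    return 1.0 - no_interaction


# Hypothetical DDI estimates, e.g. as produced by an EM fit.
ddi_prob = {
    frozenset(("D1", "D2")): 0.5,
    frozenset(("D1", "D3")): 0.2,
}
prob = ppi_probability(["D1"], ["D2", "D3"], ddi_prob)
```

Here the two domain pairs give 1 - (0.5 x 0.8) = 0.6. Keys are unordered sets so that a domain pair is looked up the same way regardless of which protein contributes which domain.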

Conclusions

We predicted a small-scale PPI network that may be involved in parasite invasion of RBCs by integrating DDI information and expression profiles. Experimental studies should be conducted to validate the predicted interactions. The predicted PPIs help elucidate the mechanism of parasite invasion and provide directions for future experimental investigations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0393-z) contains supplementary material, which is available to authorized users.

20.

Background  

Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets but until now no level selection algorithm has been published for this method.
