Found 20 similar articles (search time: 31 ms)
1.
Background
In microarray data analysis, hierarchical clustering (HC) is often used to group samples or genes according to their gene expression profiles to study their associations. In a typical HC, nested clustering structures can be quickly identified in a tree. The relationship between objects is lost, however, because clusters rather than individual objects are compared. This results in a tree that is hard to interpret.
Methodology/Principal Findings
This study proposes an ordering method, HC-SYM, which minimizes the bilateral symmetric distance between two adjacent clusters in a tree so that similar objects in the clusters are located at the cluster boundaries. The performance of HC-SYM was evaluated by both supervised and unsupervised approaches and compared favourably with other ordering methods.
Conclusions/Significance
The intuitive relationship between objects and the flexibility of the HC-SYM method can be very helpful in the exploratory analysis of not only microarray data but also similar high-dimensional data.
2.
Patrick Kück, Sandra A Meid, Christian Groß, Johann W Wägele, Bernhard Misof 《BMC bioinformatics》2014,15(1)
Background
Masking of multiple sequence alignment blocks has become a powerful method to enhance the tree-likeness of the underlying data. However, existing masking approaches are insensitive to heterogeneous sequence divergence, which can mislead tree reconstructions. We present AliGROOVE, a new method based on a sliding window and a Monte Carlo resampling approach, that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa within a multiple sequence alignment and tags suspicious branches on a given tree.
Results
We used simulated multiple sequence alignments to show that the extent of alignment ambiguity in pairwise sequence comparison is correlated with the frequency of misplaced taxa in tree reconstructions. The approach implemented in AliGROOVE makes it possible to detect nodes within a tree that are supported despite the absence of phylogenetic signal in the underlying multiple sequence alignment. We show that AliGROOVE equally well detects heterogeneous sequence divergence in a case study based on an empirical data set of mitochondrial DNA sequences of chelicerates.
Conclusions
The AliGROOVE approach has the potential to identify single taxa or subsets of taxa which show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment. It also makes it possible to evaluate the reliability of node support in a novel way.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-294) contains supplementary material, which is available to authorized users.
3.
Background
One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.
Results
We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.
Conclusions
Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1000) contains supplementary material, which is available to authorized users.
4.
Background
Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analysing the genomic organization of resistance genes in this crop.
Results
With searches for Pfam domains and manual curation of the cassava gene annotations, we identified 228 NBS-LRR type genes and 99 partial NBS genes. These represent almost 1% of the total predicted genes and show high sequence similarity to proteins from other plant species. Furthermore, 34 contained an N-terminal toll/interleukin (TIR)-like domain, and 128 contained an N-terminal coiled-coil (CC) domain. 63% of the 327 R genes occurred in 39 clusters on the chromosomes. These clusters are mostly homogeneous, containing NBS-LRRs derived from a recent common ancestor.
Conclusions
This study provides insight into the evolution of NBS-LRR genes in the cassava genome; the phylogenetic and mapping information may aid efforts to further characterize the function of these predicted R genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1554-9) contains supplementary material, which is available to authorized users.
5.
Caroline E. Broos, Menno van Nimwegen, Alex Kleinjan, Bregje ten Berge, Femke Muskens, Johannes C.C.M. in ’t Veen, Jouke T. Annema, Bart N. Lambrecht, Henk C. Hoogsteden, Rudi W. Hendriks, Mirjam Kool, Bernt van den Blink 《Respiratory research》2015,16(1)
Background
Impaired regulatory T cell (Treg) function is thought to contribute to ongoing inflammatory responses in sarcoidosis, but underlying mechanisms remain unclear. Moreover, it is not known if increased apoptotic susceptibility of Tregs may contribute to an impaired immunosuppressive function in sarcoidosis. Therefore, the aim of this study is to analyze proportions, phenotype, survival, and apoptotic susceptibility of Tregs in sarcoidosis.
Methods
Patients with pulmonary sarcoidosis (n = 58) were included at time of diagnosis. Tregs were analyzed in broncho-alveolar lavage fluid and peripheral blood of patients and healthy controls (HC).
Results
In sarcoidosis patients, no evidence was found for a relative deficit of Tregs, either locally or systemically. Rather, increased proportions of circulating Tregs were observed, most prominently in patients developing chronic disease. Sarcoidosis circulating Tregs displayed adequate expression of FoxP3, CD25 and CTLA4. Remarkably, in sarcoidosis enhanced CD95 expression on circulating activated CD45RO+ Tregs was observed compared with HC, and proportions of these cells were significantly increased. Specifically, sarcoidosis Tregs, but not Th cells, showed impaired survival compared with HC. Finally, CD95L-mediated apoptosis was enhanced in sarcoidosis Tregs.
Conclusion
In untreated patients with active pulmonary sarcoidosis, Tregs show impaired survival and enhanced apoptotic susceptibility towards CD95L. Increased apoptosis likely contributes to the insufficient immunosuppressive function of sarcoidosis Tregs. Further research into this field will help determine whether improvement of Treg survival holds a promising new therapeutic approach for chronic sarcoidosis patients.
Electronic supplementary material
The online version of this article (doi:10.1186/s12931-015-0265-8) contains supplementary material, which is available to authorized users.
6.
Background
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a diverse group of biologically active bacterial molecules. Due to the conserved genomic arrangement of many of the genes involved in their synthesis, these secondary metabolite biosynthetic pathways can be predicted from genome sequence data. To date, however, despite the myriad of sequenced genomes covering many branches of the bacterial phylogenetic tree, such an analysis for a broader group of bacteria like anaerobes has not been attempted.
Results
We investigated a collection of 211 complete and published genomes, focusing on anaerobic bacteria, whose potential to encode RiPPs is relatively unknown. We showed that the presence of RiPP genes is widespread among anaerobic representatives of the phyla Actinobacteria, Proteobacteria and Firmicutes and that, collectively, anaerobes possess the ability to synthesize a broad variety of different RiPP classes. More than 25% of anaerobes are capable of producing RiPPs either alone or in conjunction with other secondary metabolites, such as polyketides or non-ribosomal peptides.
Conclusion
Amongst the analyzed genomes, several gene clusters encode uncharacterized RiPPs, whilst others show similarity with known RiPPs. These include a number of potential class II lanthipeptides, head-to-tail cyclized peptides and lactococcin 972-like RiPPs. This study presents further evidence in support of anaerobic bacteria as an untapped natural products reservoir.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-983) contains supplementary material, which is available to authorized users.
7.
Background
Mate preference behavior is an essential first step in sexual selection and is a critical determinant in evolutionary biology. Previously an environmental compound (the fungicide vinclozolin) was found to promote the epigenetic transgenerational inheritance of an altered sperm epigenome and modified mate preference characteristics for three generations after exposure of a gestating female.
Results
The current study investigated gene networks involved in various regions of the brain that correlated with the altered mate preference behavior in the male and female. Statistically significant correlations of gene clusters and modules were identified to associate with specific mate preference behaviors. This novel systems biology approach identified gene networks (bionetworks) involved in sex-specific mate preference behavior. Observations demonstrate the ability of environmental factors to promote the epigenetic transgenerational inheritance of this altered evolutionary biology determinant.
Conclusions
Combined observations elucidate the potential molecular control of mate preference behavior and suggest that environmental epigenetics can have a role in evolutionary biology.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-377) contains supplementary material, which is available to authorized users.
8.
Norbert Meyer, Jan W Dallinga, Sarah Janine Nuss, Edwin JC Moonen, Joep JBN van Berkel, Cezmi Akdis, Frederik Jan van Schooten, Günter Menz 《Respiratory research》2014,15(1)
Background
Several classifications of adult asthma patients using cluster analyses based on clinical and demographic information have resulted in clinical phenotypic clusters that do not address molecular mechanisms. Volatile organic compounds (VOC) in exhaled air are released during inflammation in response to oxidative stress as a result of activated leukocytes. VOC profiles in exhaled air could distinguish between asthma patients and healthy subjects. In this study, we aimed to classify new asthma endotypes by combining inflammatory mechanisms investigated by VOC profiles in exhaled air and clinical information of asthma patients.
Methods
Breath samples were analyzed for VOC profiles by gas chromatography–mass spectrometry from asthma patients (n = 195) and healthy controls (n = 40). A total of 945 determined compounds were subjected to discriminant analysis to find those that could discriminate healthy from asthmatic subjects. A 2-step cluster analysis based on clinical information and VOCs in exhaled air was used to form asthma endotypes.
Results
We identified 16 VOCs, which could distinguish between healthy and asthma subjects with a sensitivity of 100% and a specificity of 91.1%. Cluster analysis based on VOCs in exhaled air and the clinical parameters FEV1, FEV1 change after 3 weeks of hospitalization, allergic sensitization, Junipers symptoms score and asthma medications resulted in the formation of 7 different asthma endotype clusters. We identified asthma clusters with different VOC profiles but similar clinical characteristics, and endotypes with similar VOC profiles but distinct clinical characteristics.
Conclusion
This study demonstrates that both the clinical presentation of asthma and the inflammatory mechanisms in the airways should be considered for classification of asthma subtypes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12931-014-0136-8) contains supplementary material, which is available to authorized users.
9.
Background
Metagenomics has great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree.
Results
We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease.
Conclusions
By taking account of the similarity in the genomic sequence, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.
10.
Distance-based assessment of the localization of functional annotations in 3D genome reconstructions
Background
Recent studies used the contact data or three-dimensional (3D) genome reconstructions from Hi-C (chromosome conformation capture with next-generation sequencing) to assess the co-localization of functional genomic annotations in the nucleus. These analyses dichotomized data point pairs belonging to a functional annotation as “close” or “far” based on some threshold and then tested for enrichment of “close” pairs. We propose an alternative approach that avoids dichotomization of the data and instead directly estimates the significance of distances within the 3D reconstruction.
Results
We applied this approach to 3D genome reconstructions for Plasmodium falciparum, the causative agent of malaria, and Saccharomyces cerevisiae and compared the results to previous approaches. We found significant 3D co-localization of centromeres, telomeres, virulence genes, and several sets of genes with developmentally regulated expression in P. falciparum; and significant 3D co-localization of centromeres and long terminal repeats in S. cerevisiae. Additionally, we tested the experimental observation that telomeres form three to seven clusters in P. falciparum and S. cerevisiae. Applying affinity propagation clustering to telomere coordinates in the 3D reconstructions yielded six telomere clusters for both organisms.
Conclusions
Distance-based assessment replicated key findings, while avoiding dichotomization of the data (which previously yielded threshold-sensitive results).
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-992) contains supplementary material, which is available to authorized users.
11.
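As a rough illustration of the distance-based idea in entry 10 above, the following Python sketch compares the mean pairwise 3D distance among annotated loci with that of random locus sets of the same size, without thresholding pairs into "close" and "far". The resampling scheme, the function names and the toy coordinates are assumptions made for illustration; the abstract does not state which test statistic or null model the authors used.

```python
import itertools
import math
import random


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of 3D points."""
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def colocalization_p_value(all_points, annotated_idx, n_resamples=2000, seed=0):
    """Resampling p-value for 'annotated loci are closer together than chance'.

    Hypothetical null model: random locus sets of the same size drawn
    from all loci in the reconstruction.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([all_points[i] for i in annotated_idx])
    k = len(annotated_idx)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(range(len(all_points)), k)
        if mean_pairwise_distance([all_points[i] for i in sample]) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Toy data: four tightly packed "annotated" loci plus thirty spread-out loci.
toy_points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0, 0, 0.1)]
toy_points += [(10.0 * i, 0, 0) for i in range(1, 31)]
p_value = colocalization_p_value(toy_points, [0, 1, 2, 3], n_resamples=500)
```

Because the statistic uses the distances themselves, no cutoff has to be chosen, which is the property the abstract highlights.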
12.
Background
Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase power to detect biological signals or patterns in datasets. However, when using publicly available databases for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Failing to remove duplicates could lead to false-positive findings, misleading clustering patterns or model over-fitting in the subsequent data analysis.
Results
We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real-data example is used to demonstrate the usage and output of the package.
Conclusions
Researchers may not pay enough attention to checking and removing duplicated samples, and then data contamination could make the results or conclusions from meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-323) contains supplementary material, which is available to authorized users.
13.
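The MD5 fingerprinting idea in entry 12 above can be sketched in a few lines. DupChecker itself is a Bioconductor (R) package; the Python below is not its API, only a minimal illustration of the general technique, and the function names and sample file names are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_of_bytes(data):
    """Hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(fingerprints):
    """Given {sample_name: digest}, return groups of names sharing a digest."""
    groups = defaultdict(list)
    for name, digest in fingerprints.items():
        groups[digest].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical raw files from two studies; the first and third are byte-identical.
fingerprints = {
    "study1/sample_a.CEL": md5_of_bytes(b"raw-intensities-1"),
    "study1/sample_b.CEL": md5_of_bytes(b"raw-intensities-2"),
    "study2/sample_x.CEL": md5_of_bytes(b"raw-intensities-1"),
}
duplicate_groups = group_duplicates(fingerprints)
```

Hashing the raw files rather than comparing sample names catches duplicates that were re-deposited under different identifiers, although it only detects byte-identical files.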
14.
Anna De Grassi, Fabio Iannelli, Matteo Cereda, Sara Volorio, Valentina Melocchi, Alessandra Viel, Gianluca Basso, Luigi Laghi, Michele Caselle, Francesca D Ciccarelli 《Genome biology》2014,15(8)
Background
Mismatch repair deficient colorectal adenomas are composed of transformed cells that descend from a common founder and progressively accumulate genomic alterations. The proliferation history of these tumors is still largely unknown. Here we present a novel approach to rebuild the proliferation trees that recapitulate the history of individual colorectal adenomas by mapping the progressive acquisition of somatic point mutations during tumor growth.
Results
Using our approach, we called high and low frequency mutations acquired in the X chromosome of four mismatch repair deficient colorectal adenomas deriving from male individuals. We clustered these mutations according to their frequencies and rebuilt the proliferation trees directly from the mutation clusters using a recursive algorithm. The trees of all four lesions were formed of a dominant subclone that co-existed with other genetically heterogeneous subpopulations of cells. However, despite this similar hierarchical organization, the growth dynamics varied among and within tumors, likely depending on a combination of tumor-specific genetic and environmental factors.
Conclusions
Our study provides insights into the biological properties of individual mismatch repair deficient colorectal adenomas that may influence their growth and also the response to therapy. Extended to other solid tumors, our novel approach could inform on the mechanisms of cancer progression and on the best treatment choice.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0437-8) contains supplementary material, which is available to authorized users.
15.
Rocío Rodríguez-López, Armando Reyes-Palomares, Francisca Sánchez-Jiménez, Miguel Ángel Medina 《BMC bioinformatics》2014,15(1)
Background
Several types of genetic interactions in humans can be directly or indirectly associated with the causal effects of mutations. These interactions are usually based on their co-associations to biological processes, coexistence in cellular locations, coexpression in cell lines, physical interactions and so on. In addition, pathological processes can present similar phenotypes that have mutations either in the same genomic location or in different genomic regions. Therefore, integrative resources for all of these complex interactions can help us prioritize the relationships between genes and diseases that are most deserving to be studied by researchers and physicians.
Results
PhenUMA is a web application that displays biological networks using information from biomedical and biomolecular data repositories. One of its most innovative features is to combine the benefits of semantic similarity methods with the information taken from databases of genetic diseases and biological interactions. More specifically, this tool is useful in studying novel pathological relationships between functionally related genes, merging diseases into clusters that share specific phenotypes or finding diseases related to reported phenotypes.
Conclusions
This framework builds, analyzes and visualizes networks based on both functional and phenotypic relationships. The integration of this information helps in the discovery of alternative pathological roles of genes, biological functions and diseases. PhenUMA represents an advancement toward the use of new technologies for genomics and personalized medicine.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0375-1) contains supplementary material, which is available to authorized users.
16.
17.
Background
The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees.
Results
The space generated from pairwise similarities is capable of classifying neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that significantly correspond with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex associate with pyramidal axons but apart from the (more topologically symmetric) axons of perisomatic-targeting interneurons.
Conclusions
Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporation of additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information, respectively revealing global and local features.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0605-1) contains supplementary material, which is available to authorized users.
18.
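To make the string-alignment idea in entry 17 above concrete, here is a minimal Python sketch of textbook global alignment (Needleman-Wunsch) scoring applied to two character strings. It is the standard algorithm, not the authors' modified version, and the three-letter branch alphabet and the scoring values are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for aligning strings a and b end to end.

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical encoding: one character per bifurcation, visited in a fixed
# traversal order (for example "A", "C" and "T" for three branch types).
neuron_1 = "AACTCT"
neuron_2 = "AACTT"
similarity = global_alignment_score(neuron_1, neuron_2)
```

Pairwise scores of this kind can be collected into a similarity matrix and passed to a clustering method, which is the role the abstract describes for them.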
19.