Similar articles
 20 similar articles found.
1.
Chae M, Chen JJ. PLoS ONE 2011, 6(8):e22546

Background

In microarray data analysis, hierarchical clustering (HC) is often used to group samples or genes according to their gene expression profiles to study their associations. In a typical HC, nested clustering structures can be quickly identified in a tree. The relationship between objects is lost, however, because clusters rather than individual objects are compared. This results in a tree that is hard to interpret.

Methodology/Principal Findings

This study proposes an ordering method, HC-SYM, which minimizes bilateral symmetric distance of two adjacent clusters in a tree so that similar objects in the clusters are located in the cluster boundaries. The performance of HC-SYM was evaluated by both supervised and unsupervised approaches and compared favourably with other ordering methods.

Conclusions/Significance

The intuitive relationship between objects and the flexibility of the HC-SYM method can be very helpful in the exploratory analysis of not only microarray data but also similar high-dimensional data.

2.

Background

Masking of multiple sequence alignment blocks has become a powerful method to enhance the tree-likeness of the underlying data. However, existing masking approaches are insensitive to heterogeneous sequence divergence which can mislead tree reconstructions. We present AliGROOVE, a new method based on a sliding window and a Monte Carlo resampling approach, that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa within a multiple sequence alignment and tags suspicious branches on a given tree.

Results

We used simulated multiple sequence alignments to show that the extent of alignment ambiguity in pairwise sequence comparison is correlated with the frequency of misplaced taxa in tree reconstructions. The approach implemented in AliGROOVE makes it possible to detect nodes within a tree that are supported despite the absence of phylogenetic signal in the underlying multiple sequence alignment. We show that AliGROOVE detects heterogeneous sequence divergence equally well in a case study based on an empirical data set of mitochondrial DNA sequences of chelicerates.

Conclusions

The AliGROOVE approach has the potential to identify single taxa or subsets of taxa that show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment. It also makes it possible to evaluate the reliability of node support in a novel way.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-294) contains supplementary material, which is available to authorized users.

3.

Background

One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

Results

We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.
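The abstract says only that tightness is a rational function of the heights of a branch and its parent; it does not give the formula. As a minimal sketch, assuming one plausible form (the relative gap between the two merge heights), the idea can be illustrated as follows. The function name `relative_tightness` and the formula itself are illustrative assumptions, not the TBEST definition or its R API.

```python
def relative_tightness(branch_height, parent_height):
    """Illustrative tightness: relative gap between a branch and its parent.

    ASSUMPTION: this particular ratio is a stand-in chosen for illustration.
    The source only states that tightness is a rational function of the
    heights of the branch and of its parent.
    """
    if parent_height <= 0:
        raise ValueError("parent_height must be positive")
    if not 0 <= branch_height <= parent_height:
        raise ValueError("branch_height must lie in [0, parent_height]")
    return (parent_height - branch_height) / parent_height


# Hypothetical dendrogram heights: a branch that merges far below its
# parent scores near 1, one that merges just below its parent scores near 0.
tight = relative_tightness(branch_height=0.2, parent_height=1.0)
loose = relative_tightness(branch_height=0.9, parent_height=1.0)
print(tight, loose)
```

Under this assumed form, a value close to 1 marks a branch whose members join each other long before they join the rest of the tree. Whether that value is significant is a separate question that the statistical procedure described in the abstract addresses.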

Conclusions

Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1000) contains supplementary material, which is available to authorized users.

4.

Background

Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analysing the genomic organization of resistance genes in this crop.

Results

With searches for Pfam domains and manual curation of the cassava gene annotations, we identified 228 NBS-LRR type genes and 99 partial NBS genes. These represent almost 1% of the total predicted genes and show high sequence similarity to proteins from other plant species. Furthermore, 34 contained an N-terminal toll/interleukin (TIR)-like domain, and 128 contained an N-terminal coiled-coil (CC) domain. 63% of the 327 R genes occurred in 39 clusters on the chromosomes. These clusters are mostly homogeneous, containing NBS-LRRs derived from a recent common ancestor.

Conclusions

This study provides insight into the evolution of NBS-LRR genes in the cassava genome; the phylogenetic and mapping information may aid efforts to further characterize the function of these predicted R genes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1554-9) contains supplementary material, which is available to authorized users.

5.

Background

Impaired regulatory T cell (Treg) function is thought to contribute to ongoing inflammatory responses in sarcoidosis, but underlying mechanisms remain unclear. Moreover, it is not known if increased apoptotic susceptibility of Tregs may contribute to an impaired immunosuppressive function in sarcoidosis. Therefore, the aim of this study is to analyze proportions, phenotype, survival, and apoptotic susceptibility of Tregs in sarcoidosis.

Methods

Patients with pulmonary sarcoidosis (n = 58) were included at time of diagnosis. Tregs were analyzed in broncho-alveolar lavage fluid and peripheral blood of patients and healthy controls (HC).

Results

In sarcoidosis patients no evidence was found for a relative deficit of Tregs, neither locally nor systemically. Rather, increased proportions of circulating Tregs were observed, most prominently in patients developing chronic disease. Sarcoidosis circulating Tregs displayed adequate expression of FoxP3, CD25 and CTLA4. Remarkably, in sarcoidosis enhanced CD95 expression on circulating activated CD45RO+ Tregs was observed compared with HC, and proportions of these cells were significantly increased. Specifically sarcoidosis Tregs - but not Th cells - showed impaired survival compared with HC. Finally, CD95L-mediated apoptosis was enhanced in sarcoidosis Tregs.

Conclusion

In untreated patients with active pulmonary sarcoidosis, Tregs show impaired survival and enhanced apoptotic susceptibility towards CD95L. Increased apoptosis likely contributes to the insufficient immunosuppressive function of sarcoidosis Tregs. Further research into this field will help determine whether improvement of Treg survival holds a promising new therapeutic approach for chronic sarcoidosis patients.

Electronic supplementary material

The online version of this article (doi:10.1186/s12931-015-0265-8) contains supplementary material, which is available to authorized users.

6.

Background

Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a diverse group of biologically active bacterial molecules. Due to the conserved genomic arrangement of many of the genes involved in their synthesis, these secondary metabolite biosynthetic pathways can be predicted from genome sequence data. To date, however, despite the myriad of sequenced genomes covering many branches of the bacterial phylogenetic tree, such an analysis for a broader group of bacteria like anaerobes has not been attempted.

Results

We investigated a collection of 211 complete and published genomes, focusing on anaerobic bacteria, whose potential to encode RiPPs is relatively unknown. We showed that the presence of RiPP-genes is widespread among anaerobic representatives of the phyla Actinobacteria, Proteobacteria and Firmicutes and that, collectively, anaerobes possess the ability to synthesize a broad variety of different RiPP classes. More than 25% of anaerobes are capable of producing RiPPs either alone or in conjunction with other secondary metabolites, such as polyketides or non-ribosomal peptides.

Conclusion

Amongst the analyzed genomes, several gene clusters encode uncharacterized RiPPs, whilst others show similarity with known RiPPs. These include a number of potential class II lanthipeptides, head-to-tail cyclized peptides and lactococcin 972-like RiPPs. This study presents further evidence in support of anaerobic bacteria as an untapped natural products reservoir.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-983) contains supplementary material, which is available to authorized users.

7.

Background

Mate preference behavior is an essential first step in sexual selection and is a critical determinant in evolutionary biology. Previously an environmental compound (the fungicide vinclozolin) was found to promote the epigenetic transgenerational inheritance of an altered sperm epigenome and modified mate preference characteristics for three generations after exposure of a gestating female.

Results

The current study investigated gene networks involved in various regions of the brain that correlated with the altered mate preference behavior in the male and female. Statistically significant correlations of gene clusters and modules were identified to associate with specific mate preference behaviors. This novel systems biology approach identified gene networks (bionetworks) involved in sex-specific mate preference behavior. Observations demonstrate the ability of environmental factors to promote the epigenetic transgenerational inheritance of this altered evolutionary biology determinant.

Conclusions

Combined observations elucidate the potential molecular control of mate preference behavior and suggest that environmental epigenetics can have a role in evolutionary biology.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-377) contains supplementary material, which is available to authorized users.

8.

Background

Several classifications of adult asthma patients using cluster analyses based on clinical and demographic information have resulted in clinical phenotypic clusters that do not address molecular mechanisms. Volatile organic compounds (VOC) in exhaled air are released during inflammation in response to oxidative stress as a result of activated leukocytes. VOC profiles in exhaled air could distinguish between asthma patients and healthy subjects. In this study, we aimed to classify new asthma endotypes by combining inflammatory mechanisms investigated by VOC profiles in exhaled air and clinical information of asthma patients.

Methods

Breath samples were analyzed for VOC profiles by gas chromatography–mass spectrometry from asthma patients (n = 195) and healthy controls (n = 40). A total of 945 determined compounds were subjected to discriminant analysis to find those that could discriminate healthy from asthmatic subjects. A two-step cluster analysis based on clinical information and VOCs in exhaled air was used to form asthma endotypes.

Results

We identified 16 VOCs, which could distinguish between healthy and asthma subjects with a sensitivity of 100% and a specificity of 91.1%. Cluster analysis based on VOCs in exhaled air and the clinical parameters FEV1, FEV1 change after 3 weeks of hospitalization, allergic sensitization, Junipers symptoms score and asthma medications resulted in the formation of 7 different asthma endotype clusters. We identified asthma clusters with different VOC profiles but similar clinical characteristics and endotypes with similar VOC profiles, but distinct clinical characteristics.
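For readers less familiar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected subjects correctly identified: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of healthy subjects correctly identified: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not the study's data).
sens = sensitivity(true_pos=50, false_neg=0)
spec = specificity(true_neg=18, false_pos=2)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A sensitivity of 100% means that no asthma patient was classified as healthy; a specificity below 100% means that some healthy subjects were classified as asthmatic.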

Conclusion

This study demonstrates that both the clinical presentation of asthma and the inflammatory mechanisms in the airways should be considered when classifying asthma subtypes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12931-014-0136-8) contains supplementary material, which is available to authorized users.

9.

Background

Metagenomics has great potential to reveal previously unattainable information about microbial communities. An important prerequisite for such discoveries is accurate estimation of the composition of microbial communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, which limits their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method was comprehensively tested on various simulated benchmark datasets covering microbial structures of diverse complexity. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantifying the genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimates of taxonomic composition at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease.

Conclusions

By taking the similarity in the genomic sequence into account, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.

10.

Background

Recent studies used the contact data or three-dimensional (3D) genome reconstructions from Hi-C (chromosome conformation capture with next-generation sequencing) to assess the co-localization of functional genomic annotations in the nucleus. These analyses dichotomized data point pairs belonging to a functional annotation as “close” or “far” based on some threshold and then tested for enrichment of “close” pairs. We propose an alternative approach that avoids dichotomization of the data and instead directly estimates the significance of distances within the 3D reconstruction.
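The abstract does not spell out how significance is estimated from the distances. As a minimal sketch, assuming a simple resampling test on the mean pairwise Euclidean distance, the threshold-free idea might look like the following. The function names, the test statistic and the toy coordinates are all illustrative assumptions rather than the authors' procedure.

```python
import itertools
import math
import random


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of 3D points."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def colocalization_p_value(all_points, annotated_idx, n_resamples=999, seed=0):
    """Empirical p-value that the annotated loci lie closer than random sets.

    ASSUMPTION: random sets of equal size drawn from all loci form the null.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([all_points[i] for i in annotated_idx])
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(range(len(all_points)), len(annotated_idx))
        if mean_pairwise_distance([all_points[i] for i in sample]) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical coordinates: five tightly packed loci plus 45 scattered loci.
cluster = [(0.1 * i, 0.0, 0.0) for i in range(5)]
background = [(10.0 + 3.0 * i, 5.0 * (i % 7), 2.0 * (i % 5)) for i in range(45)]
points = cluster + background
p_value = colocalization_p_value(points, annotated_idx=list(range(5)))
print(p_value)
```

No pair is ever labelled "close" or "far" here: the distances enter the statistic directly, so the result does not depend on a cut-off.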

Results

We applied this approach to 3D genome reconstructions for Plasmodium falciparum, the causative agent of malaria, and Saccharomyces cerevisiae and compared the results to previous approaches. We found significant 3D co-localization of centromeres, telomeres, virulence genes, and several sets of genes with developmentally regulated expression in P. falciparum; and significant 3D co-localization of centromeres and long terminal repeats in S. cerevisiae. Additionally, we tested the experimental observation that telomeres form three to seven clusters in P. falciparum and S. cerevisiae. Applying affinity propagation clustering to telomere coordinates in the 3D reconstructions yielded six telomere clusters for both organisms.

Conclusions

Distance-based assessment replicated key findings, while avoiding dichotomization of the data (which previously yielded threshold-sensitive results).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-992) contains supplementary material, which is available to authorized users.

11.
12.

Background

Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can lead to false-positive findings, misleading clustering patterns or model over-fitting in the subsequent data analysis.

Results

We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real-data example demonstrates the usage and output of the package.
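The fingerprinting idea can be illustrated independently of the package. The following is a minimal sketch in Python using the standard `hashlib` module; it is not DupChecker's R interface, and the sample names and byte contents are hypothetical.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(data):
    """Return the hexadecimal MD5 digest of a raw data blob (bytes)."""
    return hashlib.md5(data).hexdigest()


def find_duplicates(samples):
    """Group sample names whose raw contents share an MD5 fingerprint."""
    groups = defaultdict(list)
    for name, data in samples.items():
        groups[md5_fingerprint(data)].append(name)
    # Only fingerprints seen more than once indicate duplicated samples.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical in-memory "raw files"; real use would read file bytes from disk.
samples = {
    "study1_sampleA.CEL": b"probe intensities 1 2 3",
    "study2_sampleX.CEL": b"probe intensities 1 2 3",
    "study2_sampleY.CEL": b"probe intensities 4 5 6",
}
duplicates = find_duplicates(samples)
print(duplicates)
```

Because the digest is computed from file contents rather than file names, the same raw file deposited under two different study identifiers is still detected. Files that differ by even one byte receive different fingerprints, so this approach only finds exact copies.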

Conclusions

Researchers may not pay enough attention to checking for and removing duplicated samples, and the resulting data contamination could make the results or conclusions of a meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-323) contains supplementary material, which is available to authorized users.

13.
14.

Background

Mismatch repair deficient colorectal adenomas are composed of transformed cells that descend from a common founder and progressively accumulate genomic alterations. The proliferation history of these tumors is still largely unknown. Here we present a novel approach to rebuild the proliferation trees that recapitulate the history of individual colorectal adenomas by mapping the progressive acquisition of somatic point mutations during tumor growth.

Results

Using our approach, we called high and low frequency mutations acquired in the X chromosome of four mismatch repair deficient colorectal adenomas deriving from male individuals. We clustered these mutations according to their frequencies and rebuilt the proliferation trees directly from the mutation clusters using a recursive algorithm. The trees of all four lesions were formed of a dominant subclone that co-existed with other genetically heterogeneous subpopulations of cells. However, despite this similar hierarchical organization, the growth dynamics varied among and within tumors, likely depending on a combination of tumor-specific genetic and environmental factors.

Conclusions

Our study provides insights into the biological properties of individual mismatch repair deficient colorectal adenomas that may influence their growth and also the response to therapy. Extended to other solid tumors, our novel approach could inform on the mechanisms of cancer progression and on the best treatment choice.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0437-8) contains supplementary material, which is available to authorized users.

15.

Background

Several types of genetic interactions in humans can be directly or indirectly associated with the causal effects of mutations. These interactions are usually based on their co-associations to biological processes, coexistence in cellular locations, coexpression in cell lines, physical interactions and so on. In addition, pathological processes can present similar phenotypes that have mutations either in the same genomic location or in different genomic regions. Therefore, integrative resources for all of these complex interactions can help us prioritize the relationships between genes and diseases that are most deserving to be studied by researchers and physicians.

Results

PhenUMA is a web application that displays biological networks using information from biomedical and biomolecular data repositories. One of its most innovative features is to combine the benefits of semantic similarity methods with the information taken from databases of genetic diseases and biological interactions. More specifically, this tool is useful in studying novel pathological relationships between functionally related genes, merging diseases into clusters that share specific phenotypes or finding diseases related to reported phenotypes.

Conclusions

This framework builds, analyzes and visualizes networks based on both functional and phenotypic relationships. The integration of this information helps in the discovery of alternative pathological roles of genes, biological functions and diseases. PhenUMA represents an advancement toward the use of new technologies for genomics and personalized medicine.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0375-1) contains supplementary material, which is available to authorized users.

16.
17.

Background

The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees.
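To make the string-alignment idea concrete, the following minimal sketch scores two character strings with the standard Needleman-Wunsch global alignment. It is the textbook algorithm, not the authors' modified version, and the alphabet used to encode bifurcations (`A`, `C`, `T`) and the scoring values are hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


# Hypothetical encodings of two small arbors, one character per bifurcation.
identical = global_alignment_score("ACT", "ACT")
one_missing = global_alignment_score("ACT", "AT")
print(identical, one_missing)
```

A higher score indicates that more bifurcations correspond between the two trees. Pairwise scores of this kind could then fill the similarity matrix on which the cluster analysis operates.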

Results

The space generated from pairwise similarities is capable of classifying neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that significantly correspond with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex associate with pyramidal axons but stand apart from the (more topologically symmetric) axons of perisomatic-targeting interneurons.

Conclusions

Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporation of additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information respectively revealing global and local features.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0605-1) contains supplementary material, which is available to authorized users.

18.
19.

Background

High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.

Results

We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

Conclusions

The open source C/C++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. It supports multiple input formats, including BAM and imputed beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analyses that are not implemented elsewhere.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0356-4) contains supplementary material, which is available to authorized users.

20.