Similar Literature
20 similar records found.
1.
Classification of high-throughput genomic data is a powerful method to assign samples to subgroups with specific molecular profiles. Consensus partitioning is the most widely applied approach to reveal subgroups: it summarizes a consensus classification from a list of individual classifications generated by repeatedly clustering random subsets of the data, which also makes it possible to evaluate the stability of the classification. We implemented a new R/Bioconductor package, cola, that provides a general framework for consensus partitioning. With cola, various parameters and methods can be user-defined and easily integrated into different steps of an analysis, e.g., feature selection, sample classification or defining signatures. cola provides a new method named ATC (ability to correlate to other rows) to extract features and recommends spherical k-means clustering (skmeans) for subgroup classification. A comprehensive benchmark on public datasets shows that ATC and skmeans perform better than other commonly used methods. We also benchmark key parameters of the consensus partitioning procedure, which helps users select optimal parameter values. Moreover, cola provides rich functionality for applying multiple partitioning methods in parallel and directly comparing their results, as well as rich visualizations. cola can automate the complete analysis and generate a comprehensive HTML report.
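The consensus step described above can be illustrated with a small Python sketch. It is not cola's API (cola is an R package); it assumes 1-D data, a hand-written 2-means with min/max initialization as the base clustering, and 80% subsampling per run. The consensus value for a pair of samples is the fraction of runs, among those that drew both samples, in which they landed in the same cluster; values close to 0 or 1 indicate a stable classification.

```python
import random

def two_means_1d(values, n_iter=20):
    """Minimal 1-D 2-means (Lloyd's algorithm), initialized at min and max for determinism."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # assign each value to the nearest center
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        # move each center to the mean of its members
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

def consensus_matrix(values, n_runs=50, frac=0.8, seed=0):
    """Fraction of runs (in which both i and j were drawn) that put i and j in the same cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        idx = rng.sample(range(n), int(frac * n))   # random subset of samples
        labels = two_means_1d([values[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

For two well-separated groups such as `[0.1, 0.2, 0.15, 5.0, 5.1, 4.9]`, within-group entries come out as 1.0 and between-group entries as 0.0, i.e. a perfectly stable two-group partition.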

2.
3.
SUMMARY: Hierarchical clustering is a widely used method for detecting clusters in genomic data. Clusters are defined by cutting branches off the dendrogram. A common but inflexible method uses a constant height cutoff value; this method exhibits suboptimal performance on complicated dendrograms. We present the Dynamic Tree Cut R package that implements novel dynamic branch cutting methods for detecting clusters in a dendrogram depending on their shape. Compared to the constant height cutoff method, our techniques offer the following advantages: (1) they are capable of identifying nested clusters; (2) they are flexible: cluster shape parameters can be tuned to suit the application at hand; (3) they are suitable for automation; and (4) they can optionally combine the advantages of hierarchical clustering and partitioning around medoids, giving better detection of outliers. We illustrate the use of these methods by applying them to protein-protein interaction network data and to a simulated gene expression data set. AVAILABILITY: The Dynamic Tree Cut method is implemented in an R package available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting.
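For comparison, the constant-height baseline that the abstract argues against can be sketched in a few lines of Python. This is not the Dynamic Tree Cut algorithm; the sketch assumes single linkage, for which cutting the dendrogram at height h yields exactly the connected components of the graph that joins every pair of objects at distance h or less.

```python
def constant_height_cut(dist, h):
    """Cluster labels from cutting a single-linkage dendrogram at height h.

    Uses union-find: any two objects at distance <= h end up in the same cluster.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= h:
                parent[find(i)] = find(j)

    # relabel roots as 0, 1, 2, ... in order of first appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

With the points `[0, 1, 2, 10, 11, 30]`, h = 1.5 gives the labels `[0, 0, 0, 1, 1, 2]`, while h = 9 merges the two tight groups into `[0, 0, 0, 0, 0, 1]`. The result hinges on one global h, which is the inflexibility that shape-dependent branch cutting is meant to address.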

4.
Zhou Chunling, Chen Fang, Han Deduo, Xu Meng. Acta Botanica Boreali-Occidentalia Sinica, 2007, 27(12): 2559-2563
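Genomic DNA was extracted from 19 flowering cherry cultivars using a modified CTAB method, and RAPD markers were used to assess the genetic relationships among the cultivars and to classify them. Of 60 random 10-bp primers, 32 that gave good amplification were selected; they amplified 533 bands in total, 406 of which were polymorphic (a polymorphism rate of 76.17%). UPGMA cluster analysis of the amplification data divided the 19 cultivars into two major groups: the first consists mainly of the hybrid cherry series, and the second is the Japanese late-flowering cherry series, which can be further divided into subgroups, classes and subclasses according to flower doubling (double vs. single flowers), branch habit and flower color. The molecular classification of flowering cherries obtained with RAPD was largely consistent with traditional taxonomy, further confirming that provenance, flower doubling, branch habit and flower color can all serve as important criteria for classifying flowering cherry cultivars.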
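The band counts and the UPGMA step can be sketched as follows. This is a minimal Python illustration, not the authors' software: it assumes bands are scored as 0/1 presence/absence per cultivar and uses 1 minus the Dice coefficient as the distance, a common choice for RAPD data that the abstract does not specify. The merge list records each pair of clusters joined and the distance at which they were joined.

```python
def polymorphism_rate(polymorphic_bands, total_bands):
    """Percentage of amplified bands that are polymorphic."""
    return 100.0 * polymorphic_bands / total_bands

def dice_similarity(a, b):
    """Dice coefficient for two 0/1 band profiles: 2 * shared bands / (bands in a + bands in b)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2.0 * shared / total if total else 1.0

def upgma(names, dist):
    """UPGMA: repeatedly merge the closest pair of clusters; distances to the new
    cluster are size-weighted averages of the distances to its two parts."""
    size = {i: 1 for i in range(len(names))}
    label = {i: names[i] for i in range(len(names))}
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    key = lambda a, b: (a, b) if a < b else (b, a)
    merges, new = [], len(names)
    while len(size) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        merges.append((label[i], label[j], dij))
        for k in size:
            if k not in (i, j):
                d[key(new, k)] = (size[i] * d[key(i, k)] + size[j] * d[key(j, k)]) / (size[i] + size[j])
        # drop every distance involving the two merged clusters
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        size[new] = size.pop(i) + size.pop(j)
        label[new] = "(" + label.pop(i) + "," + label.pop(j) + ")"
        new += 1
    return merges
```

`polymorphism_rate(406, 533)` reproduces the 76.17% reported above. With three made-up profiles A, B and C, where A and B share most bands, UPGMA first joins A and B and then attaches C to the (A,B) cluster.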

5.
Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. LC first derives a transform matrix whose elements are the similarities between neighboring links, computed with the Jaccard index; it then applies hierarchical clustering to this matrix and uses a partition density measure on the resulting dendrogram to determine the cut level that gives the best community detection. However, the original method ignores the similarity of non-neighboring links, and partition density tends to split the network into many small communities. In this paper, an Extended Link Clustering method (ELC) for overlapping community detection is proposed. ELC uses a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended modularity quality measure) to choose the optimal cut of the dendrogram, giving a better partitioning of the original network. Because ELS uses more link information, the resulting transform matrix provides a better basis for clustering and analysis. Furthermore, using EQ to select the level at which to cut the dendrogram yields communities that are more sensible than those obtained with partition density. Experiments on five real-world networks and on artificially generated networks show that ELC achieves higher EQ and In-group Proportion (IGP) values. In addition, its communities are more realistic than those produced by either the original LC method or the classical CPM method.
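The neighbor-link similarity of the original LC method can be sketched as follows, assuming the definition commonly used for LC: for two links e_ik and e_jk that share node k, the similarity is the Jaccard index of the inclusive neighborhoods of i and j (each node's neighbors plus the node itself). The extended similarity (ELS) proposed in the paper is not specified in the abstract, so it is not shown; the helper names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def link_similarity(adj, e1, e2):
    """Jaccard similarity of two links that share exactly one node.

    Returns None for links that are not neighbors (the original LC method
    assigns them no similarity).
    """
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return None
    k = shared.pop()
    i = (set(e1) - {k}).pop()
    j = (set(e2) - {k}).pop()
    ni, nj = adj[i] | {i}, adj[j] | {j}   # inclusive neighborhoods
    return len(ni & nj) / len(ni | nj)
```

In the graph with edges (1,2), (1,3), (2,3), (3,4), the links (1,3) and (2,3) sit inside a triangle and get similarity 1.0, the links (1,3) and (3,4) get 0.25, and the non-adjacent links (1,2) and (3,4) get no value.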

6.
Edwin T.H.M. Peeters, Ronald Gylstra, Jose H. Vos. Hydrobiologia, 2004, 519(1-3): 103-115
The relative contributions of sediment food (e.g. organic matter, carbohydrates, proteins, C, N, polyunsaturated fatty acids) and environmental variables (e.g. oxygen, pH, depth, sediment grain size, conductivity) to the observed variation in benthic macroinvertebrates are investigated. Soft-bottom sediments, water and benthic macroinvertebrates were sampled in several water systems across The Netherlands. The variance partitioning method is used to quantify the relative contributions of food and environmental variables in structuring the benthic macroinvertebrate community. It is assumed that detritivores show a significant relationship with sediment food variables whereas carnivores and herbivores do not. The results of the variance partitioning method with data sets containing only detritivores, herbivores or carnivores confirm this assumption, indicating that variance partitioning is a useful tool for analyzing the impact of different groups of variables in complex situations. Approximately 45% of the total variation in macroinvertebrate community structure could be explained by the variables included in the analyses. Variance partitioning shows that sediment food variables contributed significantly to the total variation in the macroinvertebrate dataset. The relative importance of food depends on the intensity of other environmental factors and is lower at broad spatial scales than at smaller ones. The results of the partitioning depend on which variables are included in the analyses. The method becomes problematic when variables from different groups (e.g. one food variable and one environmental variable) have high variance inflation factors and are thus collinear: the choice of which variable to leave out affects the variance allocated to the different groups of variables. The variance partitioning method was able to detect the scale-dependent contribution of food variables to structuring macroinvertebrate communities. This scale dependency can, however, also be caused by the size, composition and heterogeneity of the dataset. Performing additional analyses in which specific samples are removed from the original dataset can give insight into under- or overestimation of the impact of certain factors and makes it possible to test the robustness of the results.
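The core arithmetic of variance partitioning can be shown with one variable per group. This Python sketch is an illustration under simplifying assumptions: it uses a single food variable and a single environmental variable, ordinary (unadjusted) R² from linear regression, and the two-predictor R² formula based on pairwise correlations. Real analyses, including the one above, use several variables per group and often work with adjusted R² in a constrained ordination framework.

```python
def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partition_two_groups(y, food, env):
    """Split the variation in y into pure food, pure environment, shared and unexplained parts."""
    r1, r2, r12 = pearson(y, food), pearson(y, env), pearson(food, env)
    r2_food, r2_env = r1 ** 2, r2 ** 2
    # R^2 of the regression on both predictors; unstable as |r12| approaches 1 (collinearity)
    r2_both = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return {
        "food_only": r2_both - r2_env,
        "env_only": r2_both - r2_food,
        "shared": r2_food + r2_env - r2_both,
        "unexplained": 1 - r2_both,
    }
```

With uncorrelated predictors and a response equal to their sum, the shared fraction is 0 and each group explains half the variation. As the food and environmental variables become collinear, the denominator `1 - r12**2` shrinks and the shared fraction grows (it can even turn negative), which mirrors the problem with collinear variables noted in the abstract.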

7.
The isolation principle rests on defining internal and external differentiation for each subset of at least two objects. Subsets with larger external than internal differentiation form isolated groups in the sense that they are internally cohesive and externally isolated. Objects that do not belong to any isolated group are termed solitary. The collection of all isolated groups and solitary objects forms a hierarchical (encaptic) structure. This ubiquitous characteristic of biological organization provides the motivation to identify universally applicable practical methods for the detection of such structure, to distinguish primary types of structure, to quantify their distinctiveness, and to simplify interpretation of structural aspects. A method implementing the isolation principle (by generating all isolated groups and solitary objects) is proven to be specified by single-linkage clustering. Basically, the absence of structure can be stated if no isolated groups exist, the condition for which is provided. Structures that allow for classifications in the sense of complete partitioning into disjoint isolated groups are characterized, and measures of distinctiveness of classification are developed. Among other primary types of structure, chaining (complete nesting) and ties (isolated groups without internal structure) are considered in more detail. Some biological examples for the interpretation of structure resulting from application of the isolation principle are outlined.
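The isolation test can be sketched in Python under an assumption the abstract does not state: internal differentiation is taken as the largest edge of a minimum spanning tree of the subset (the single-linkage level at which the subset becomes connected), and external differentiation as the smallest distance from a member to any non-member. These definitions fit the stated link to single-linkage clustering but may differ from the paper's exact formulation.

```python
def internal_differentiation(dist, group):
    """Largest edge of a minimum spanning tree over `group` (Prim's algorithm)."""
    members = list(group)
    in_tree = {members[0]}
    largest = 0.0
    while len(in_tree) < len(members):
        # cheapest edge from the tree to a member not yet in it
        d, v = min((dist[u][w], w) for u in in_tree for w in members if w not in in_tree)
        largest = max(largest, d)
        in_tree.add(v)
    return largest

def external_differentiation(dist, group):
    """Smallest distance between a member of `group` and any object outside it."""
    outside = [w for w in range(len(dist)) if w not in group]
    if not outside:
        return float("inf")
    return min(dist[u][w] for u in group for w in outside)

def is_isolated(dist, group):
    """A subset of at least two objects is isolated if external > internal differentiation."""
    return len(group) >= 2 and external_differentiation(dist, group) > internal_differentiation(dist, group)
```

For the 1-D points `[0, 1, 2, 10, 11]`, the subset of the first three objects is isolated (internal 1, external 8), whereas the first two alone are not, because their nearest outsider is no farther away than their own internal link.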

8.
Regulatory motif finding by logic regression   (total citations: 1; self-citations: 0; citations by others: 1)

9.
In recent research, many univariate and multivariate approaches have been proposed to improve automatic classification of various dementia syndromes using imaging data. Some of these methods do not provide the possibility to integrate possible confounding variables like age into the statistical evaluation. A similar problem sometimes exists in clinical studies, as it is not always possible to match different clinical groups to each other in all confounding variables, like for example, early-onset (age<65 years) and late-onset (age≥65) patients with Alzheimer's disease (AD). Here, we propose a simple method to control for possible effects of confounding variables such as age prior to statistical evaluation of magnetic resonance imaging (MRI) data using support vector machine classification (SVM) or voxel-based morphometry (VBM). We compare SVM results for the classification of 80 AD patients and 79 healthy control subjects based on MRI data with and without prior age correction. Additionally, we compare VBM results for the comparison of three different groups of AD patients differing in age with the same group of control subjects obtained without including age as covariate, with age as covariate or with prior age correction using the proposed method. SVM classification using the proposed method resulted in higher between-group classification accuracy compared to uncorrected data. Further, applying the proposed age correction substantially improved univariate detection of disease-related grey matter atrophy using VBM in AD patients differing in age from control subjects. The results suggest that the approach proposed in this work is generally suited to control for confounding variables such as age in SVM or VBM analyses. Accordingly, the approach might improve and extend the application of these methods in clinical neurosciences.
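The abstract does not spell out the correction procedure. One common approach, assumed here for illustration, estimates the age effect for each feature (e.g. each voxel's grey-matter value) by linear regression in the healthy controls only, then removes that age trend from every subject by shifting all values to a common reference age. The function names and the `ref_age` parameter are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def age_correct(features, ages, is_control, ref_age):
    """Remove a control-derived linear age trend from every subject's features.

    features: one list of feature values per subject (e.g. voxel values).
    The slope for each feature is estimated in controls only, so disease
    effects in patients do not leak into the age model.
    """
    ctrl = [i for i, c in enumerate(is_control) if c]
    corrected = [list(row) for row in features]
    for f in range(len(features[0])):
        slope, _ = fit_line([ages[i] for i in ctrl], [features[i][f] for i in ctrl])
        for s in range(len(features)):
            corrected[s][f] = features[s][f] - slope * (ages[s] - ref_age)
    return corrected
```

If control values decline by 0.1 per year of age, all controls collapse onto the same corrected value, while an older patient who lies below the age-expected trend keeps that deficit after correction. The corrected features can then be passed to an SVM or a VBM-style group comparison.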

10.
There are many published methods for predicting resting energy expenditure (REE) from measured body composition. Although such reports extend back almost a century, new related studies appear regularly. It remains unclear how these methods resemble or differ from one another and what advantages, if any, the newly introduced REE prediction models offer. These issues led us to develop an organizational system for REE prediction methods, with the goal of clarifying prevailing ambiguities in the field. Our classification scheme is founded on body composition level (whole-body, tissue-organ, cellular, and molecular) and the related components used as REE predictor variables. Each existing body-composition-based REE prediction method belongs to exactly one body composition level. The proposed classification system, founded on a conceptual basis, highlights similarities and differences among the diverse REE-body composition prediction methods, provides a framework for teaching REE-body composition relationships, and identifies important future research opportunities.

11.
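Evaluation of the genetic diversity of Auricularia strains using two PCR methods   (total citations: 6; self-citations: 0; citations by others: 6)
Two PCR methods, ERIC and RAPD, were used to genetically discriminate 29 strains belonging to three species of Auricularia; this is the first application of ERIC to edible fungi research. At a similarity coefficient of 75%, ERIC and RAPD divided the strains into 9 and 6 groups, respectively. The ERIC dendrogram separated A. auricula from A. polytricha, whereas RAPD could not fully distinguish the two species; both methods, however, gave a similar result in showing that A. fuscosuccinea is very closely related to A. auricula. Southern hybridization further confirmed the homology relationships among the 29 strains revealed by ERIC. The analysis shows that RAPD discriminates mainly at the species level, whereas ERIC can discriminate at the strain level, and its results agree better with the strains' cultivation traits. These results indicate that ERIC-PCR is a faster and more reliable molecular marker method than RAPD and can replace RAPD in studies of the genetic diversity and genetic classification of Auricularia.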

12.
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to P300-based BCIs and their classification performance was assessed. However, they were usually trained on a large amount of training data (e.g., 15,300 samples). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training samples. In addition, we compared their classification performance with that of an ensemble classifier with naive partitioning and a single LDA classifier. One of three dimension-reduction conditions was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning performed better than both the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This implies that overlapped partitioning improves the performance of SWLDA and that an ensemble classifier with overlapped partitioning requires less training data than one with naive partitioning. This study contributes to reducing the amount of training data required while achieving better classification performance.
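The abstract does not describe how the overlapping subsets are built. The Python sketch below assumes a simple hypothetical scheme: naive partitioning splits the training set into k disjoint blocks, while overlapped partitioning uses k equal-size windows that are evenly spaced and therefore share samples, so each ensemble member sees more of the training data. The ensemble output is taken as the mean of the members' decision scores, one common combination rule; training the LDA members themselves is omitted.

```python
def naive_partitions(n, k):
    """k disjoint blocks of n // k consecutive sample indices (any remainder is dropped)."""
    size = n // k
    return [list(range(i * size, (i + 1) * size)) for i in range(k)]

def overlapped_partitions(n, k, subset_size):
    """Hypothetical overlapped scheme: k windows of subset_size samples,
    evenly spaced from the start to the end of the training set."""
    step = (n - subset_size) // (k - 1) if k > 1 else 0
    return [list(range(i * step, i * step + subset_size)) for i in range(k)]

def ensemble_score(scorers, x):
    """Average the decision scores of the member classifiers for input x."""
    return sum(score(x) for score in scorers) / len(scorers)
```

With 900 training samples and three members, naive partitioning gives each classifier 300 samples, whereas windows of 600 samples starting at 0, 150 and 300 give each member twice as much data at the cost of correlated training sets.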

13.
1. Early versions of the river invertebrate prediction and classification system (RIVPACS) used TWINSPAN to classify reference sites based on the macro-invertebrate fauna, followed by multiple discriminant analysis (MDA) for prediction of the fauna to be expected at new sites from environmental variables. This paper examines some alternative methods for the initial site classification and a different technique for prediction. 2. A data set of 410 sites from RIVPACS II was used for initial screening of seventeen alternative methods of site classification. Multiple discriminant analysis was used to predict the classification group from environmental variables. 3. Five of the classification–prediction systems which showed promise were developed further to facilitate prediction of taxa at species and at Biological Monitoring Working Party (BMWP) family level. 4. The predictive capability of these new systems, plus RIVPACS II, was tested on an independent data set of 101 sites from locations throughout Great Britain. 5. Differences between the methods were often marginal, but two gave the most consistently reliable outputs: the original TWINSPAN method, and the ordination method semi-strong hybrid multidimensional scaling (SSH) followed by K-means clustering. 6. Logistic regression, an alternative approach to prediction which does not require the prior development of a classification system, was also examined. Although its performance fell within the range offered by the other five systems tested, it conveyed no advantages over them. 7. This study demonstrated that several different multivariate methods were suitable for developing a reliable system for predicting the expected probability of occurrence of taxa. This is because the prediction system involves a weighted average smoothing across site groupings. 8. Hence, the two most promising procedures for site classification, coupled to MDA, were both used in the exploratory analyses for RIVPACS III development, which utilized over 600 reference sites.
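Point 7 refers to the way predictions are formed: the expected probability that a taxon occurs at a new site is a weighted average of its occurrence frequency in each reference-site group, weighted by the probability that the site belongs to that group (estimated from environmental variables, e.g. by MDA). A minimal Python sketch with hypothetical function names; summing these probabilities over taxa gives the expected number of taxa at the site.

```python
def expected_probability(group_probs, taxon_freq_by_group):
    """Expected probability of occurrence of one taxon at a new site.

    group_probs: probability that the site belongs to each reference-site group
                 (from the prediction step, e.g. MDA); should sum to 1.
    taxon_freq_by_group: fraction of reference sites in each group where the taxon occurs.
    """
    return sum(p * f for p, f in zip(group_probs, taxon_freq_by_group))

def expected_taxa_count(group_probs, freq_table):
    """Expected number of taxa: sum of expected probabilities over all taxa.

    freq_table: taxon -> list of per-group occurrence frequencies.
    """
    return sum(expected_probability(group_probs, freqs) for freqs in freq_table.values())
```

Because a taxon's probability blends several groups, a site that falls between two groups still receives a sensible prediction. This smoothing is why different classification methods could lead to similarly reliable predictions.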

14.
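This paper introduces a non-hierarchical classification method, ordered quadrat clustering, and applies it to delimiting the altitudinal vegetation zones of Mianshan Mountain in Shanxi Province. The resulting zones are: (1) a deciduous broad-leaved forest zone, comprising three subzones (I. deciduous broad-leaved shrub subzone; II. pine-oak forest subzone; III. deciduous broad-leaved forest subzone); (2) a cold-temperate coniferous forest zone; and (3) a subalpine meadow zone. Ordered quadrat clustering based on the golden-section method (or Fisher's method) classifies quadrats according to the order in which they occur in space (or time), using the criterion of maximum within-group similarity and minimum between-group similarity; the resulting classification is therefore optimal with respect to this criterion. Where the vegetation of Mianshan has been severely disturbed, ordered quadrat clustering performs better than TWINSPAN.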
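Fisher's method for ordered samples can be written as a dynamic program that splits the sequence into k contiguous groups with the minimum total within-group sum of squares. The Python sketch below assumes a single numeric variable per quadrat for simplicity (real quadrat data are multivariate) and does not cover the golden-section variant.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Within-group sum of squares of values[i:j], computed from prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / n

def fisher_optimal_partition(values, k):
    """Split an ordered sequence into k contiguous groups minimizing within-group SS.

    Returns a list of (start, end) index pairs, end exclusive.
    """
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    INF = float("inf")
    # best[g][j]: minimal cost of splitting values[:j] into g groups; cut[g][j]: start of last group
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < best[g][j]:
                    best[g][j], cut[g][j] = c, i
    # backtrack the group boundaries from the end of the sequence
    bounds, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]
```

Because groups must be contiguous, the method respects the spatial order of quadrats along an elevation gradient. For the sequence `[1, 1, 1, 5, 5, 5, 9, 9]` and k = 3 it returns the boundaries `[(0, 3), (3, 6), (6, 8)]`.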

15.
MOTIVATION: Most supervised classification methods are limited by the requirement for more cases than variables. In microarray data the number of variables (genes) far exceeds the number of cases (arrays), and thus filtering and pre-selection of genes is required. We describe the application of Between Group Analysis (BGA) to the analysis of microarray data. A feature of BGA is that it can be used when the number of variables (genes) exceeds the number of cases (arrays). BGA is based on carrying out an ordination of groups of samples, using a standard method such as Correspondence Analysis (COA), rather than an ordination of the individual microarray samples. As such, it can be viewed as a method of carrying out COA with grouped data. RESULTS: We illustrate the power of the method using two cancer data sets. In both cases, we can quickly and accurately classify test samples from any number of specified a priori groups and identify the genes which characterize these groups. We obtained very high rates of correct classification, as determined by jack-knife or validation experiments with training and test sets. The results are comparable to those from other methods in terms of accuracy but the power and flexibility of BGA make it an especially attractive method for the analysis of microarray cancer data.

16.
Pyrolysis mass spectrometry (PyMS) is a rapid, simple, high-resolution analytical method based on the thermal degradation of complex material in a vacuum, and it has been widely applied to discriminate closely related microbial strains. Minimally prepared samples of embryogenic and non-embryogenic calluses derived from various higher plants (sweet potato, morning glory, Korean ginseng, Siberian ginseng, and balloon flower) were subjected to PyMS for spectral fingerprinting. A dendrogram built from the pyrolysis mass spectra with the unweighted pair group method with arithmetic mean (UPGMA) separated Siberian ginseng embryogenic callus from the other calluses, which were in turn divided into embryogenic and non-embryogenic groups regardless of the plant species from which they were derived. Within the non-embryogenic callus group, the dendrogram agreed with the known taxonomy of the plants. These results indicate that PyMS could be used to discriminate plant calluses by embryogenic capacity and taxonomic classification.

17.
AIMS: To determine the 23S and 5S rRNA gene fingerprints in order to reveal phylogenetic relationships among Bacillus thuringiensis strains. METHODS AND RESULTS: Eighty-six B. thuringiensis strains, comprising 80 serovar type strains, five intraserovar strains and a non-serotypeable strain, wuhanensis, were tested. Total DNA was digested with EcoRI and HindIII. The 23S and 5S rRNA gene restriction fragment length polymorphisms showed 82 distinctive ribopatterns. The dendrogram generated by numerical analysis showed 10 phylogenetic groups and six ungrouped serovars at the 95.5% DNA relatedness rate. A second dendrogram was constructed using a combination of the data from this study and from a previous study on 16S rRNA gene fingerprinting. It revealed eight distinct phylogenetic groups and three ungrouped serovars at the 94% DNA relatedness rate. CONCLUSION: This method permitted the classification and positioning of a wide variety of B. thuringiensis strains on a phylogenetic tree. Bacillus thuringiensis strains appear to be relatively homogeneous and to share a high degree of DNA relatedness. SIGNIFICANCE AND IMPACT OF THE STUDY: This study is a further step toward defining valid taxonomic sublevels within the B. thuringiensis species.

18.
Yi Wang, Hong Yan. Bioinformation, 2008, 3(3): 124-129
DNA microarrays allow the expression levels of tens of thousands of genes to be measured simultaneously and have many applications in biology and medicine. Microarray data are very noisy, which makes data analysis and classification difficult. Sub-dimension-based methods can overcome the noise problem by partitioning the conditions into subgroups, performing classification within each group and integrating the results. However, there can be many sub-dimensional groups, which leads to high computational complexity. In this paper, we propose an entropy-based method to evaluate and select important sub-dimensions and eliminate unimportant ones, which considerably improves computational efficiency. We tested our method on four microarray datasets and two other real-world datasets, and the experimental results demonstrate its effectiveness.

19.
Although food resource partitioning among sympatric species has often been explored in riverine systems, the potential influence of prey diversity on resource partitioning is little known. Using empirical data, we modeled food resource partitioning (assessed as dietary overlap) of coexisting juvenile Atlantic salmon (Salmo salar) and alpine bullhead (Cottus poecilopus). Explanatory variables incorporated into the model were fish abundance, benthic prey diversity and abundance, and several dietary metrics to give a total of seventeen potential explanatory variables. First, a forward stepwise procedure based on the Akaike information criterion was used to select explanatory variables with significant effects on food resource partitioning. Then, linear mixed‐effect models were constructed using the selected explanatory variables and with sampling site as a random factor. Food resource partitioning between salmon and bullhead increased significantly with increasing prey diversity, and the variation in food resource partitioning was best described by the model that included prey diversity as the only explanatory variable. This study provides empirical support for the notion that prey diversity is a key driver of resource partitioning among competing species.
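The abstract does not name the overlap index that was used. As an assumption, the sketch below uses Schoener's index, a widely used measure of dietary overlap: 1 minus half the sum of absolute differences between the two species' diet proportions, ranging from 0 (no shared prey) to 1 (identical diets), so lower values indicate stronger partitioning. The prey taxa and proportions in the usage example are made up.

```python
def schoener_overlap(diet_a, diet_b):
    """Schoener's dietary overlap index between two diets.

    diet_a, diet_b: dict mapping prey category -> proportion of the diet
    (each dict should sum to 1). Prey missing from one diet counts as 0.
    """
    prey = set(diet_a) | set(diet_b)
    return 1 - 0.5 * sum(abs(diet_a.get(p, 0.0) - diet_b.get(p, 0.0)) for p in prey)

# hypothetical diets for illustration only
salmon = {"Baetis": 0.6, "Chironomidae": 0.4}
bullhead = {"Chironomidae": 0.7, "Plecoptera": 0.3}
overlap = schoener_overlap(salmon, bullhead)  # about 0.4: only part of the diet is shared
```

Computing such an index per sampling site would give the response variable that could then be related to prey diversity in the mixed-effect models described above.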

20.
Variation partitioning and hierarchical partitioning are novel statistical approaches that provide deeper understanding of the importance of different explanatory variables for biodiversity patterns than traditional regression methods. Using these methods, the variation in occupancy and abundance of the clouded apollo butterfly (Parnassius mnemosyne L.) was decomposed into independent and joint effects of larval and adult food resources, microclimate and habitat quantity. The independent effect of habitat quantity variables (habitat area and connectivity) captured the largest fraction of the variation in the clouded apollo patterns, but habitat connectivity had a major contribution only for occupancy data. The independent effects of resources and microclimate were higher on butterfly abundance than on occupancy. However, a considerable amount of variation in the butterfly patterns was accounted for by the joint effects of predictors and may thus be causally related to two or all three groups of variables. Abundance of the butterfly in the surroundings of the focal grid cell had a significant effect in all analyses, independently of the effects of other predictors. Our results encourage wider applications of partitioning methods in biodiversity studies.
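The split into independent and joint effects can be sketched in Python. This is an illustration, not the implementation used in the study: it assumes the R² of every predictor subset is already known and computes each predictor's independent effect as its average R² gain over all orders in which the predictors can enter the model; the joint effect is its single-predictor R² minus that independent effect. The predictor names and R² values in the usage example are hypothetical.

```python
from itertools import permutations

def hierarchical_partitioning(predictors, r2):
    """Independent and joint effects of each predictor.

    r2: dict mapping frozenset of predictor names -> R^2 of the model with
        exactly those predictors (the empty frozenset maps to 0).
    Independent effect = mean R^2 gain when the predictor enters, over all orders.
    Joint effect = R^2 of the predictor alone minus its independent effect.
    """
    indep = {p: 0.0 for p in predictors}
    orders = list(permutations(predictors))
    for order in orders:
        current = frozenset()
        for p in order:
            nxt = current | {p}
            indep[p] += r2[nxt] - r2[current]
            current = nxt
    indep = {p: v / len(orders) for p, v in indep.items()}
    joint = {p: r2[frozenset([p])] - indep[p] for p in predictors}
    return indep, joint

# hypothetical R^2 values for two predictor groups
r2 = {
    frozenset(): 0.0,
    frozenset({"habitat"}): 0.4,
    frozenset({"resources"}): 0.3,
    frozenset({"habitat", "resources"}): 0.5,
}
indep, joint = hierarchical_partitioning(["habitat", "resources"], r2)
```

Here the independent effects come out as 0.3 and 0.2 and sum to the full-model R² of 0.5, while each predictor also carries a joint effect of 0.1 that cannot be attributed to it alone, the kind of shared variation the abstract reports for the butterfly data.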


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417