Similar articles
Found 20 similar articles (search time: 93 ms)
1.
A class of new consensus methods for n-trees (hierarchical clusterings) is proposed. These methods apply systematically to an arbitrary collection of given classifications of a fixed set of taxa, and produce a single consensus classification. They are motivated by the desire that the consensus classification retain as much information as possible from the given classifications, even in the case of only approximate agreement among them. A focus of the paper is the concept of faithfulness of consensus methods; this concept explicates the informal notion of adequate retention of information referred to above, and is proposed as a desirable requirement for consensus methods in general. The new methods are all faithful; they have the additional property that they take hierarchical level into account. Other general properties of consensus methods are investigated, especially with reference to their relation with faithfulness. The most important of these properties is neutrality; loosely speaking a consensus method is neutral if all nontrivial clusters are treated equally in the conditions on the given classifications required to guarantee the appearance of a cluster in the consensus. A central result of the paper is an analogue of the classical impossibility theorem of K. Arrow: with trivial exceptions it is impossible to have a consensus method that is simultaneously faithful and neutral. Thus two intuitively very appealing general properties of consensus methods are seen to be incompatible.

2.
A consensus index method comprises a consensus method and a consensus index that are defined on a common set of objects (e.g. classifications). For each profile of objects, the consensus method returns a consensus object representing information or structure shared among profile objects, while the consensus index returns a quantitative measure of agreement among profile objects. Since the relationship between consensus method and consensus index is poorly understood, we propose simple axioms prescribing it in the most general terms. Many taxonomic consensus index methods violate these axioms because their consensus indices measure consensus object invariants rather than profile agreement. We propose paradigms to obtain consensus index methods that measure agreement and satisfy the axioms. These paradigms salvage concepts underlying consensus index methods violating the axioms. This work was supported in part by the Faculty of Science at Memorial University of Newfoundland, and by the Natural Sciences and Engineering Research Council of Canada under Grant A-4142.

3.
By embedding the ultrametric distances between objects in a classification structure into a Euclidean space, two linear-algebraic procedures can be used to obtain a consensus among any combination of dendrograms, partitions, coverings, or more general arrangements of overlapping groups; the consensus has the property that the sum of the squared distances from the objects in each of the separate representations to their positions in the consensus is minimized in the Euclidean space. A consensus classification structure (usually a dendrogram) is obtained by reversing the embedding procedure. Admissibility criteria for a consensus are briefly considered.

4.
The widely used FD index of functional diversity is based on the construction of a dendrogram. This index has been the subject of a strong debate concerning the choice of the distance and the clustering method to be used, since the method chosen may greatly affect the FD values obtained. Much of this debate has been centred around which method of dendrogram construction gives a faithful representation of species distribution in multidimensional functional trait space. From artificially generated datasets varying in species richness and correlations between traits, we test whether any single combination of clustering method(s) and distance consistently produces a dendrogram that most closely corresponds to the matrix of functional distances between pairs of species studied. We also test the ability of consensus trees, which incorporate features common to a range of different dendrograms, to summarize distance matrices. Our results show that no combination of clustering method(s) and distance consistently outperforms the others, owing to the complexity of interactions between correlations of traits, species richness, distance measures and clustering methods. Furthermore, the construction of a consensus tree from a range of dendrograms is often the best solution. Consequently, we recommend testing all combinations of distances and clustering methods (including consensus trees), then selecting the most reliable tree (with the lowest dissimilarity) to estimate the FD value. Furthermore, we suggest that any index that requires the construction of functional dendrograms potentially benefits from this new approach.

5.
Classification of high-throughput genomic data is a powerful method to assign samples to subgroups with specific molecular profiles. Consensus partitioning is the most widely applied approach to reveal subgroups by summarizing a consensus classification from a list of individual classifications generated by repeatedly executing clustering on random subsets of the data. It is able to evaluate the stability of the classification. We implemented a new R/Bioconductor package, cola, that provides a general framework for consensus partitioning. With cola, various parameters and methods can be user-defined and easily integrated into different steps of an analysis, e.g., feature selection, sample classification or defining signatures. cola provides a new method named ATC (ability to correlate to other rows) to extract features and recommends spherical k-means clustering (skmeans) for subgroup classification. We show that ATC and skmeans have better performance than other commonly used methods by a comprehensive benchmark on public datasets. We also benchmark key parameters in the consensus partitioning procedure, which helps users to select optimal parameter values. Moreover, cola provides rich functionalities to apply multiple partitioning methods in parallel and directly compare their results, as well as rich visualizations. cola can automate the complete analysis and generates a comprehensive HTML report.
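The core idea of consensus partitioning described in this abstract (cluster random subsets repeatedly, then summarize how often samples end up together) can be sketched as follows. This is only an illustrative sketch in Python, not the cola package's API: the function name `consensus_matrix`, the sample names and the hard-coded cluster labels are all hypothetical, and the clustering step itself is assumed to have been run elsewhere.

```python
from itertools import combinations

def consensus_matrix(samples, runs):
    """Fraction of runs in which each pair of samples was co-clustered.

    `runs` is a list of dicts, one per resampling run, mapping the samples
    drawn in that run to a cluster label. Only runs that contain both
    samples of a pair contribute to that pair's value.
    """
    pairs = list(combinations(samples, 2))
    together = {pair: 0 for pair in pairs}
    sampled = {pair: 0 for pair in pairs}
    for labels in runs:
        for i, j in pairs:
            if i in labels and j in labels:
                sampled[(i, j)] += 1
                if labels[i] == labels[j]:
                    together[(i, j)] += 1
    # Pairs never drawn together are left out rather than given a value.
    return {p: together[p] / sampled[p] for p in pairs if sampled[p] > 0}

# Hypothetical labels from four runs on random subsets of four samples.
samples = ["s1", "s2", "s3", "s4"]
runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s2": 0, "s4": 1},
    {"s2": 0, "s3": 1, "s4": 1},
    {"s1": 0, "s3": 0, "s4": 1},
]
matrix = consensus_matrix(samples, runs)
```

Values near 0 or 1 indicate pairs that are assigned consistently across runs, while intermediate values point to unstable assignments. Cluster labels need not be aligned between runs, because only co-membership within a run is counted.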

6.
Many methods have been used for analysing information about organisms in order to understand evolutionary relationships and/or to determine classifications. The relationship between some of these methods is illustrated for the character state matrix, incompatibility and similarity matrices, minimal unrooted and rooted trees, and evolutionary classifications. Existing methods of determining the shortest possible tree are described. In addition, a new method of building a minimal tree is introduced which starts with the largest possible subset (clique) of characters that is compatible for all pairs of characters. The remaining characters are ranked in order of their increasing number of incompatibilities. These characters are added singly, a tree constructed and then tested for minimality by previously described methods for partitioning characters into subsets. The procedure is repeated at least until the tree can no longer be proved minimal. The relationship between trees and evolutionary and phylogenetic classifications has been neglected, but three methods are mentioned and a new criterion suggested. It is suggested that graph theory, rather than statistics, is better suited for the primary analysis of comparative data.

7.
8.
RAPD markers were generated from 13 different Rubus species in order to assess the degree of similarity between species from the important subgenera. All ten primers revealed scorable polymorphisms within both the closely related and the genetically diverse individuals. Three hundred and seventy-two markers were generated and scored from the material analysed. Estimates of similarity, dendrograms and principal co-ordinate analysis were calculated, with the results generally being in agreement with previous classifications of the species studied, confirming the validity and usefulness of the RAPD method. However, amongst the species studied, R. macraei of the Idaeobats proved more diverse and grouped in with both the Idaeobats and Eubats at only 26% similarity.

9.
In order to test whether tRNA populations are correlated with (determined by or adapted to) the major proteins synthesized by tumor cells, RPC-5 chromatographic profiles of aminoacyl-tRNAs from 11 mouse plasmacytomas and from normal adult mouse liver and brain were analyzed by the use of “dissimilarity indices” drawn from all possible pairs of tissues. Cluster analysis was then performed and dendrograms constructed. Although myeloma protein synthesis is only one of many proteins being synthesized by these malignant cells, a novel nonparametric statistical analysis of these dendrograms indicates that independently arising tumors have more similar profiles if their immunoglobulin light and heavy chains are very similar than if these chains are dissimilar (P < 0·015). Even more strikingly significant was the finding that drastic changes in myeloma protein synthesis such as loss of both heavy and light chain synthesis do not result in increased dissimilarity of aminoacyl-tRNA profiles (P < 0·00001). Unlike other eukaryotic systems such as sheep reticulocytes and silk worm silk gland which have been shown to adapt their tRNA populations to changes in protein synthesis, these plasmacytomas do not appear to do so. The novel use of statistical methods, esp. cluster analysis, to examine graphic displays of data may have useful applications in comparing other chromatographic profiles, densitometric scans, etc.

10.
Investigation of patterns in beta diversity has received increased attention in recent years, particularly in light of new ecological theories such as the metapopulation paradigm and metacommunity theory. Traditionally, beta diversity patterns can be described by cluster analysis (i.e. dendrograms) that enables the classification of samples. Clustering algorithms define the structure of dendrograms; consequently, assessing their performance is crucial. A common, although not always appropriate, approach for assessing algorithm suitability is the cophenetic correlation coefficient c. Alternatively, the 2-norm has been recently proposed as an increasingly informative method for evaluating the distortion engendered by clustering algorithms. In the present work, the 2-norm is applied for the first time on field data and is compared with the cophenetic correlation coefficient using a set of 105 pairwise combinations of 7 clustering methods (e.g. UPGMA) and 15 (dis)similarity/distance indices (e.g. Jaccard index). In contrast to the 2-norm, the cophenetic correlation coefficient does not provide a clear indication of the efficiency of the clustering algorithms for all combinations. The two approaches were not always in agreement in the choice of the most faithful algorithm. Additionally, the 2-norm revealed that UPGMA is the most efficient clustering algorithm and Ward's the least. The present results suggest that goodness-of-fit measures such as the 2-norm should be applied prior to clustering analyses for reliable beta diversity measures.
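The cophenetic correlation coefficient mentioned in this abstract is the Pearson correlation between the original pairwise dissimilarities and the cophenetic distances read off the dendrogram (the height at which two objects are first joined). A minimal sketch in plain Python follows; the three-object data set is invented for illustration, and its cophenetic distances were worked out by hand for average linkage (UPGMA) rather than computed by a clustering library.

```python
from math import sqrt

def cophenetic_correlation(original, cophenetic):
    """Pearson correlation between two equally ordered lists of pairwise distances."""
    n = len(original)
    mean_x = sum(original) / n
    mean_y = sum(cophenetic) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(original, cophenetic))
    var_x = sum((x - mean_x) ** 2 for x in original)
    var_y = sum((y - mean_y) ** 2 for y in cophenetic)
    return cov / sqrt(var_x * var_y)

# Hypothetical dissimilarities for the pairs (a,b), (a,c), (b,c).
original = [1.0, 4.0, 5.0]
# UPGMA joins a and b at height 1, then joins c at (4 + 5) / 2 = 4.5,
# so both pairs involving c get the cophenetic distance 4.5.
cophenetic = [1.0, 4.5, 4.5]
c = cophenetic_correlation(original, cophenetic)
```

A value close to 1 means the dendrogram distorts the original dissimilarities only slightly. The 2-norm measure discussed in the abstract is not sketched here, because the abstract does not give its exact definition.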

11.
Some unavoidable methodical and methodological problems arising at each stage of the classification of objects by hierarchical clustering methods are illustrated with concrete (biogeographical) and abstract numerical examples. The reasons for these problems and possible ways of minimizing cluster analysis artifacts are discussed. Unavoidable constraints on the interpretation of dendrograms obtained by means of agglomerative algorithms are indicated.

12.
Consensus n-trees
It is not unusual for several classifications to be given for the same collection of objects. We present a method, called majority rule, which can be used to define a consensus of these classifications. We also discuss some mathematical properties of this consensus tree.
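As usually defined, a majority-rule consensus keeps exactly those clusters that occur in more than half of the input classifications. The sketch below assumes that definition (the abstract itself does not spell it out) and represents each tree simply as a collection of clusters, that is, sets of objects; the function name and the example trees are hypothetical.

```python
from collections import Counter

def majority_rule_clusters(trees):
    """Return the clusters that occur in more than half of the input trees.

    Each tree is given as a collection of clusters (sets of objects).
    """
    counts = Counter()
    for tree in trees:
        # Count each distinct cluster at most once per tree.
        for cluster in {frozenset(c) for c in tree}:
            counts[cluster] += 1
    threshold = len(trees) / 2
    return {cluster for cluster, n in counts.items() if n > threshold}

# Three hypothetical classifications of the objects a, b, c, d.
trees = [
    [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"}],
    [{"a", "b"}, {"c", "d"}, {"a", "b", "c", "d"}],
    [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c", "d"}],
]
consensus = majority_rule_clusters(trees)
```

Here the cluster {c, d} appears in only one of the three trees and is dropped, while {a, b, c} appears in two and is kept. Two clusters that each occur in more than half of the trees must share at least one tree, so they are nested or disjoint, which is why the retained clusters again form a tree.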

13.
In the field of numerical taxonomy it is often desirable to determine a tree which is the consensus (common part) of several n-trees, each of which represents a classification of the same set of objects. In this note, an s-consensus tree and corresponding consensus index for a collection of n-trees are defined. The choice of a value for the parameter s will determine the number of nodes in the s-consensus tree and its tendency to resemble the strict or Adams-consensus tree.

14.
It is asserted that the postmodern concept of science, unlike the classical ideal, presumes the necessary existence of various classification approaches (schools) in taxonomy, each corresponding to a particular aspect of consideration of the "taxic reality". These schools are defined by the diversity of initial epistemological and ontological backgrounds, which fix in a certain way (a) the fragments of that reality admissible for investigation, and (b) the admissible methods for exploring those fragments. This makes it possible to define a taxonomic school as the unity of these backgrounds together with the aspect of consideration they delimit. Two extreme positions among these backgrounds can be recognized in recent taxonomic thought. One follows the scholastic tradition of elaborating a formal and, hence, universal classificatory method ("new typology", numerical phenetics, pattern cladistics). The other asserts that the classificatory approach depends on a judgment about the nature of taxic reality (natural philosophy, evolutionary schools of taxonomy). Some arguments are put forward in favor of a significant impact of evolutionary thinking on the theory of modern taxonomy. This impact is manifested in the correspondence principle, which makes classificatory algorithms (and hence the resulting classifications) dependent on initial assumptions about the causes of taxic diversity. It is asserted that criteria of "quality" for both classifications proper and classificatory methods can be correctly formulated only within the framework of a particular aspect of consideration. For any group of organisms, several particular classifications have a right to exist, each corresponding to a particular aspect of consideration. These classifications cannot be arranged along a "better-worse" scale, as they reflect different fragments of the taxic reality. Their mutual interpretation depends on the degree of compatibility of the background assumptions and of the tasks being resolved.
Extensionally, classifications are compatible to the extent that they coincide in the context and hierarchical structure of the included taxa. Intensionally, typological classifications are compatible if the included taxa are comparable by their diagnoses, while phylogenetic classifications are compatible if the included taxa are ascribed monophyletic status. Brief consideration is given to the "new phylogenetics" (= "genophyletics") as a classificatory approach aimed at elaborating parsimonious phylogenetic hypotheses based on molecular biology data and employing numerical methods of cladistic analysis. This approach is shown to borrow some phenetic ideas and to revive the scholastic principle of a unified classificatory basis. It is supposed that, in time, biological classification will escape from the plethora of positivistic ideas (including those being developed by present-day cladistics) and will more actively assimilate (revive) a holistic worldview.

15.
We propose a method for a posteriori evaluation of classification stability which compares the classification of sites in the original data set (a matrix of species by sites) with classifications of subsets of its sites created by without‐replacement bootstrap resampling. Site assignments to clusters of the original classification and to clusters of the classification of each subset are compared using Goodman‐Kruskal's lambda index. Many resampled subsets are classified and the mean of lambda values calculated for the classifications of these subsets is used as an estimation of classification stability. Furthermore, the mean of the lambda values based on different resampled subsets, calculated for each site of the data set separately, can be used as a measure of the influence of particular sites on classification stability. This method was tested on several artificial data sets classified by commonly used clustering methods and on a real data set of forest vegetation plots. Its strength lies in the ability to distinguish classifications which reflect robust patterns of community differentiation from unstable classifications of more continuous patterns. In addition, it can identify sites within each cluster which have a transitional species composition with respect to other clusters.
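Goodman-Kruskal's lambda, used in this abstract to compare two cluster assignments of the same sites, measures the proportional reduction in error when one assignment is predicted from the other. The sketch below implements the standard asymmetric form; whether the authors use this form or the symmetric variant is not stated in the abstract, so that choice is an assumption, as are the function name and the example labels.

```python
from collections import Counter

def goodman_kruskal_lambda(predictor, target):
    """Asymmetric lambda: proportional reduction in error when predicting
    the `target` labels from the `predictor` labels."""
    n = len(target)
    max_marginal = max(Counter(target).values())
    if max_marginal == n:
        raise ValueError("lambda is undefined when the target has a single class")
    # For each predictor class, keep the size of its largest target class.
    best_per_group = Counter()
    for (p, _t), count in Counter(zip(predictor, target)).items():
        best_per_group[p] = max(best_per_group[p], count)
    return (sum(best_per_group.values()) - max_marginal) / (n - max_marginal)

# Hypothetical cluster labels for six sites present in both classifications.
original = [1, 1, 1, 2, 2, 2]
resampled = [1, 1, 2, 2, 2, 2]
lam = goodman_kruskal_lambda(original, resampled)
```

Lambda ranges from 0 (knowing one assignment does not help predict the other) to 1 (perfect prediction). Averaging such values over many resampled subsets gives a stability estimate of the kind the abstract describes.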

16.
A synopsis of the biology of the Ascomycotina. A wide variety of classifications of the Ascomycotina has been proposed but a consensus is being reached on the main orders that it is appropriate to recognize. Most of these orders are well characterized with respect to their ecology and nutritional requirements although defined primarily on morphology. The 43 orders are displayed diagrammatically to illustrate their host and substratum requirements. This display is intended to stimulate argument and research by broadening the consideration of evolutionary pathways to include ecological and nutritional factors. It will also be of value as a teaching aid; overlays can be constructed to show additional features not treated here.

17.
Questions: How similar are solutions of eight commonly used vegetation classification methods? Which classification methods are most effective according to classification validity evaluators? How do evaluators with different optimality criteria differ in their assessments of classification efficacy? In particular, do evaluators which use geometric criteria (e.g. cluster compactness) and non‐geometric evaluators (which rely on diagnostic species) offer similar classification evaluations? Methods: We analysed classifications of two vegetation data‐sets produced by eight classification methods. Classification solutions were assessed with five geometric and four non‐geometric internal evaluators. We formally introduce three new evaluators: PARTANA, an intuitive variation on evaluators which use the ratio of within/between cluster dissimilarity as the optimality criterion, an adaptation of Morisita's index of niche overlap, and ISAMIC, an algorithm which measures the degree to which species are either always present or always absent within clusters. Results and Conclusions: 1. With the exception of single linkage hierarchical clustering, classifications resulting from the eight methods were often similar. 2. Although evaluators varied in their assessment of best overall classification method, they generally favored three hierarchical agglomerative clustering strategies: flexible beta (β = −0.25), average linkage, and Ward's linkage. 3. Among introduced evaluators PARTANA appears to be an effective geometric strategy which provides assessments similar to C‐index and Gamma evaluators. Non‐geometric evaluators ISAMIC and Morisita's index demonstrate a strong bias for single linkage solutions. 4. Because non‐geometric criteria are of interest to phytosociologists there is a strong need for their continued development for use with vegetation classifications.

18.
Coalescent-based inference of phylogenetic relationships among species takes into account gene tree incongruence due to incomplete lineage sorting, but for such methods to make sense species have to be correctly delimited. Because alternative assignments of individuals to species result in different parametric models, model selection methods can be applied to optimise the model of species classification. In a Bayesian framework, Bayes factors (BF), based on marginal likelihood estimates, can be used to test a range of possible classifications for the group under study. Here, we explore BF and the Akaike Information Criterion (AIC) to discriminate between different species classifications in the flowering plant lineage Silene sect. Cryptoneurae (Caryophyllaceae). We estimated marginal likelihoods for different species classification models via the Path Sampling (PS), Stepping Stone sampling (SS), and Harmonic Mean Estimator (HME) methods implemented in BEAST. To select among alternative species classification models, a posterior simulation-based analog of the AIC through Markov chain Monte Carlo analysis (AICM) was also performed. The results are compared to outcomes from the software BP&P. Our results agree with another recent study that marginal likelihood estimates from PS and SS methods are useful for comparing different species classifications, and strongly support the recognition of the newly described species S. ertekinii.
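For orientation, the two model-comparison quantities named in this abstract have the following standard textbook forms. The symbols are introduced here for illustration and do not come from the abstract: $M_1$ and $M_2$ are two competing species classification models, $D$ is the data, $k$ is the number of free parameters and $\hat{L}$ is the maximized likelihood.

```latex
% Log Bayes factor: difference of log marginal likelihoods of two models
\ln \mathrm{BF}_{12} = \ln p(D \mid M_1) - \ln p(D \mid M_2)

% Akaike Information Criterion: lower values indicate the preferred model
\mathrm{AIC} = 2k - 2 \ln \hat{L}
```

A positive $\ln \mathrm{BF}_{12}$ favours $M_1$ over $M_2$. The marginal likelihoods $p(D \mid M_i)$ are the quantities that estimators such as path sampling and stepping-stone sampling approximate.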

19.
20.
MOTIVATION: Metabolic networks are organized in a modular, hierarchical manner. Methods for a rational decomposition of the metabolic network into relatively independent functional subsets are essential to better understand the modularity and organization principle of a large-scale, genome-wide network. Network decomposition is also necessary for functional analysis of metabolism by pathway analysis methods that are often hampered by the problem of combinatorial explosion due to the complexity of the metabolic network. Decomposition methods proposed in the literature are mainly based on the connection degree of metabolites. To obtain a more reasonable decomposition, the global connectivity structure of metabolic networks should be taken into account. RESULTS: In this work, we use a reaction graph representation of a metabolic network for the identification of its global connectivity structure and for decomposition. A bow-tie connectivity structure similar to that previously discovered for the metabolite graph is found also to exist in the reaction graph. Based on this bow-tie structure, a new decomposition method is proposed, which uses a distance definition derived from the path length between two reactions. A hierarchical classification tree is first constructed from the distance matrix among the reactions in the giant strong component of the bow-tie structure. These reactions are then grouped into different subsets based on the hierarchical tree. Reactions in the IN and OUT subsets of the bow-tie structure are subsequently placed in the corresponding subsets according to a 'majority rule'. Compared with the decomposition methods proposed in the literature, ours is based on combined properties of the global network structure and local reaction connectivity rather than, primarily, on the connection degree of metabolites. The method is applied to decompose the metabolic network of Escherichia coli. Eleven subsets are obtained.
More detailed investigations of the subsets show that reactions in the same subset are indeed functionally related. The rational decomposition of metabolic networks, and subsequent studies of the subsets, make it easier to understand the inherent organization and functionality of metabolic networks at the modular level. SUPPLEMENTARY INFORMATION: http://genome.gbf.de/bioinformatics/
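The abstract derives its distance from the path length between two reactions but does not give the exact formula. As a rough illustration of the ingredient it builds on, the sketch below computes shortest path lengths in a small directed reaction graph by breadth-first search; the graph, the reaction names and the function name are hypothetical, and turning these path lengths into the authors' distance matrix would need the definition from the paper.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: number of edges on the shortest directed path
    from `source` to every reaction reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Hypothetical reaction graph: an edge r -> s means that a product of
# reaction r is a substrate of reaction s.
reaction_graph = {
    "r1": ["r2"],
    "r2": ["r3"],
    "r3": ["r1", "r4"],
    "r4": [],
}
lengths = shortest_path_lengths(reaction_graph, "r1")
```

In this toy graph r1, r2 and r3 reach one another and so form a strongly connected component, while r4 is only reachable from it, loosely mirroring the giant strong component and the OUT subset of the bow-tie structure.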

