Similar literature
20 similar documents found; search time 359 ms
1.
Most of the methods used in the multivariate analysis of data on vegetation and environment, or transformations implied in such methods, put disproportionate emphasis on species with a relatively wide ecological amplitude occurring with relatively high cover-abundance values, and/or rare species. This problem can be overcome to some extent by reducing the cover-abundance values to presence-absence data, but this means a severe loss of information. A standardization of values by species maxima, as is done automatically in some programs, may lead to an undesirable emphasis on species represented with low values only. In this paper a method is presented by which relatively low cover-abundance values of species are upweighted to an arbitrarily chosen higher value, if these low values are considered to indicate an optimum response of that particular species. The method has been tested on a selection of 40 phytosociological relevés from dune slacks in the Voorne dunes, as well as on the Dune Meadow data set used in the textbook of Jongman et al. (1987). The cluster structure obtained with the optimum-transformation appears to be clearer and the contribution of typical dune slack species to the cluster structure increased significantly. Canonical correspondence analysis of the transformed data gave slightly more important main axes.
Abbreviations: CA = Correspondence Analysis; CCA = Canonical Correspondence Analysis; DOL = Detection of Optimality Level; SWOM = Standardized Weighted Optimality Measure; WPGMA = Weighted Pair Group Method Average linking clustering

2.
Summary: To facilitate the interpretation of data from a genotype by environment (GE) experiment when the GE interaction is large, a cluster method is proposed to group genotypes according to their response to the environments. The dissimilarity index between a pair of genotypes is defined in terms of distance adjusted for the average effects of genotypes, and Sokal and Michener's (1958) unweighted pair-group method is used in the clustering algorithm. The new index, constructed in each cluster cycle for any group, is shown to be equivalent to the within-group GE interaction mean square under 2-way ANOVA. Thus, if the F-value is used as an empirical stopping criterion for clustering, there will be no significant GE interaction within groups and the genotypes within the groups can be compared by their average effects. The method of analysis is illustrated by an example. Contribution no. I-348 from the Engineering and Statistical Research Institute.
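
The idea of a distance "adjusted for the average effects of genotypes" can be illustrated with a small sketch. It is only one plausible reading of the abstract: each genotype's own mean across environments is subtracted before squared differences are taken, and the division by the number of environments is an assumption, not a detail confirmed by the abstract. The function name `adjusted_distance` and the `yields` data are hypothetical.

```python
def adjusted_distance(yi, yj):
    """Mean squared difference between two genotypes' responses across
    environments, after removing each genotype's own mean (assumed form)."""
    n = len(yi)
    mi, mj = sum(yi) / n, sum(yj) / n
    return sum(((a - mi) - (b - mj)) ** 2 for a, b in zip(yi, yj)) / n


# Hypothetical yields of three genotypes in three environments.
yields = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [11.0, 12.0, 13.0],  # same response pattern as G1, higher mean
    "G3": [3.0, 2.0, 1.0],     # opposite response pattern
}

d12 = adjusted_distance(yields["G1"], yields["G2"])
d13 = adjusted_distance(yields["G1"], yields["G3"])
```

Under this reading, G1 and G2 differ only in their average effect, so their adjusted distance is zero and they would be grouped first; G1 and G3 respond in opposite directions and stay apart.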

3.
How the complexity of food webs relates to stability has been a subject of many studies. Often, unweighted connectance is used to express complexity. Unweighted connectance is measured as the proportion of realized links in the network. Weighted connectance, on the other hand, takes link weights (fluxes or feeding rates) into account and captures the shape of the flux distribution. Here, we used weighted connectance to revisit the relation between complexity and stability. We used 15 real soil food webs and determined the feeding rates and the interaction strength matrices. We calculated both versions of connectance, and related these structural properties to food web stability. We also determined the skewness of both flux and interaction strength distributions with the Gini coefficient. We found no relation between unweighted connectance and food web stability, but weighted connectance was positively correlated with stability. This finding challenges the notion that complexity may constrain stability, and supports the ‘complexity begets stability’ notion. The positive correlation between weighted connectance and stability implies that the more evenly flux rates were distributed over links, the more stable the webs were. This was confirmed by the Gini coefficients of both fluxes and interaction strengths. However, the most even distributions of this dataset still were strongly skewed towards small fluxes or weak interaction strengths. Thus, incorporating these distributions with many weak links via weighted instead of unweighted food web measures can shed new light on classical theories.
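
Two of the quantities named in this abstract are simple enough to sketch. The block below computes unweighted connectance as the proportion of realized links, assuming a directed web with S² possible links (the abstract does not state the denominator), and the Gini coefficient of the link fluxes using the standard rank-based formula. The weighted connectance used in the study is not implemented here, and the `flux` matrix is made-up illustration data.

```python
def unweighted_connectance(flux):
    """Proportion of realized links, L / S^2, for an S x S flux matrix
    (directed connectance; the S^2 denominator is an assumption)."""
    s = len(flux)
    links = sum(1 for row in flux for f in row if f > 0)
    return links / (s * s)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula on values sorted in ascending order.
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked / (n * total) - (n + 1) / n


# Hypothetical 3-species web: flux[i][j] is the flux from species i to j.
flux = [
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
link_fluxes = [f for row in flux for f in row if f > 0]
connectance = unweighted_connectance(flux)
evenness_gap = gini(link_fluxes)
```

A Gini coefficient near 0 means the fluxes are spread evenly over links, which is the condition the abstract associates with more stable webs.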

4.
Sixteen clustering methods are compatible with the general recurrence equation of combinatorial SAHN (sequential, agglomerative, hierarchical and nonoverlapping) classificatory strategies. These are subdivided into two classes: the d-SAHN methods seek minimal between-cluster distances, whereas the h-SAHN strategies seek maximal within-cluster homogeneity. The parameters and some basic features of all combinatorial methods are listed to allow comparisons between these two families of clustering procedures. Interest is centred on the h-SAHN techniques; the derivation of updating parameters is presented and the monotonicity properties are examined. Three new strategies are described: a weighted and an unweighted variant of the minimization of the increase of average distance within clusters, and a homogeneity-optimizing flexible method. The performance of d- and h-SAHN techniques is compared using field data from the rock grassland communities of the Sashegy Nature Reserve, Budapest, Hungary.
Abbreviations: CP = Closest pair; RNN = Reciprocal nearest neighbor; SAHN = Sequential, agglomerative, hierarchical and nonoverlapping

5.
A new method of species (inverse) classification of vegetation data, i.e. classification of species into groups with similar ecological tolerances, is presented which overcomes the problems of species abundance distorting the results. The algorithm TWO-STEP is based on the use of an asymmetric measure of dissimilarity [equation not preserved in the source], where i, j are species, h is the stand, n is the total number of stands, and xih is the amount of species i in stand h. The algorithm uses the rows of the asymmetric dissimilarity matrix generated as above to form a second symmetric dissimilarity matrix using the measure [equation not preserved in the source], where m is the number of species and k the species. Flexible sorting is applied to generate a species classification. Comparison of results after applying the TWO-STEP algorithm and a standard alternative to an artificial data set demonstrates its efficacy. TWO-STEP also shows considerable advantages over previous analyses for a Queensland rainforest data set (quantitative) and an English heath (qualitative) data set. Normalization of species data appears advantageous for quantitative data only.

6.
Summary: CLUSLA, a computer program for the clustering of very large phytosociological data sets, is described. It is an elaboration of Janssen's (1975) simple procedure. The essence of the program is the creation of clusters, each starting with one relevé, as the relevés are entered in the program. Each new relevé that is sufficiently distinct from already existing clusters is considered a new cluster. The fusion criterion is the attainment of a certain level of (dis-)similarity between relevé and cluster. Bray and Curtis' dissimilarity measure with presence-absence data was used. The program, written in FORTRAN for an IBM 370–158 system, can deal with practically unlimited numbers of relevés, provided the product of the number of primary clusters and the number of species does not exceed 140,000. We adopted maxima of 100 and 1400 respectively. After the primary clustering round a reallocation is performed. Then a simple table is printed with information on the significance of occurrence of species in clusters according to a chi-square approach. The primary clusters can be treated again with a higher fusion threshold, or approached with more elaborate methods, in our case particularly the TABORD program. The program is demonstrated with a collection of 6072 relevés with 889 species of salt marsh vegetation from the Working-Group for Data-Processing. Contribution from the Working Group for Data-Processing in Phytosociology, International Society for Vegetation Science. Nomenclature follows the Trieste system, which will be published later. The authors are very grateful to Drs. Jan Janssen, Mike Dale, László Orlóci and Mike Austin for their comments on drafts of the program, and to Wil Kortekaas for her help in the interpretation of the tables.
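
The single-pass procedure described here can be sketched roughly as follows. This is not the CLUSLA code: it assumes each cluster is represented by the species set of its founding relevé (the abstract does not say how a cluster is summarized), it omits the reallocation round and the chi-square table, and the names `bray_curtis_pa`, `sequential_clusters` and the example `releves` are hypothetical. For presence-absence data the Bray-Curtis dissimilarity reduces to 1 − 2|A ∩ B| / (|A| + |B|).

```python
def bray_curtis_pa(a, b):
    """Bray-Curtis dissimilarity for presence-absence data (species sets)."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def sequential_clusters(releves, threshold):
    """Assign each relevé, in input order, to the nearest existing cluster
    if it is close enough; otherwise let it found a new cluster."""
    seeds = []    # species set of the relevé that founded each cluster (assumed)
    members = []  # indices of the relevés assigned to each cluster
    for idx, species in enumerate(releves):
        best, best_d = None, None
        for c, seed in enumerate(seeds):
            d = bray_curtis_pa(species, seed)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            members[best].append(idx)
        else:
            seeds.append(species)
            members.append([idx])
    return members


# Hypothetical relevés given as sets of species codes.
releves = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"}]
clusters = sequential_clusters(releves, threshold=0.5)
```

Because every relevé is compared only with the current cluster representatives, memory grows with the number of clusters rather than with the number of relevés, which is consistent with the size limit the abstract states in terms of clusters times species.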

7.
Questions: How similar are solutions of eight commonly used vegetation classification methods? Which classification methods are most effective according to classification validity evaluators? How do evaluators with different optimality criteria differ in their assessments of classification efficacy? In particular, do evaluators which use geometric criteria (e.g. cluster compactness) and non‐geometric evaluators (which rely on diagnostic species) offer similar classification evaluations?
Methods: We analysed classifications of two vegetation data‐sets produced by eight classification methods. Classification solutions were assessed with five geometric and four non‐geometric internal evaluators. We formally introduce three new evaluators: PARTANA, an intuitive variation on evaluators which use the ratio of within/between cluster dissimilarity as the optimality criterion, an adaptation of Morisita's index of niche overlap, and ISAMIC, an algorithm which measures the degree to which species are either always present or always absent within clusters.
Results and Conclusions: 1. With the exception of single linkage hierarchical clustering, classifications resulting from the eight methods were often similar. 2. Although evaluators varied in their assessment of best overall classification method, they generally favored three hierarchical agglomerative clustering strategies: flexible beta (β = −0.25), average linkage, and Ward's linkage. 3. Among introduced evaluators PARTANA appears to be an effective geometric strategy which provides assessments similar to C‐index and Gamma evaluators. Non‐geometric evaluators ISAMIC and Morisita's index demonstrate a strong bias for single linkage solutions. 4. Because non‐geometric criteria are of interest to phytosociologists there is a strong need for their continued development for use with vegetation classifications.

8.
9.
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis and various techniques have been developed. Such analyses heavily depend on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology, which is applied to gene expression data of microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. This new methodology, when applied to real data, provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Even though the method is presented under the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering method based on correlation or Euclidean distance.

10.
A method is introduced to compare results of a clustering technique at different levels of abstraction, or of different clustering techniques. The method emphasizes within-cluster homogeneity as well as discontinuities between clusters. It has been derived from Hogeweg's method with some important changes. First, each cluster is handled separately to determine the ratio between homogeneity and similarity to the nearest neighbour cluster. For a given clustering a weighted average value is computed over all clusters. This average value is standardized using an expected average value for a cluster configuration with the same number of clusters having the same sizes. A low level of the ratio between expected and observed values is supposed to indicate an optimal clustering. A derivation of the criterion is given and results from three sets of data with different properties are evaluated.

11.
The Limnanthaceae is a small family of North American herbs with uncertain internal relationships. Taxa of the family were compared on the basis of 46 flavonol glycosides occurring in all plant tissues or in petals only. The flavonoid data were analyzed by 3 numerical taxonomic techniques: 1) clustering by the Weighted Pair Group method considering positive and negative matches (using the Simple Matching coefficient); 2) clustering by the Weighted Pair Group method considering only positive matches (using the Jaccard coefficient); 3) Varimax Factor Analysis with rotation. No significantly different results were produced by the two clustering methods. Factor analysis provided a clearer indication of relationships among taxa of the Limnanthaceae than did conventional clustering analysis. Apparently, ordination of taxa in three dimensions is necessary to accurately express relationships between divergent evolutionary lines of the family since much of the expressed variation in flavonoids cannot be accounted for by a one-dimensional clustering technique.

12.
Determining true genetic dissimilarity between individuals is an important and decisive point for clustering and analysing diversity within and among populations, because different dissimilarity indices may yield conflicting outcomes. We show that there are no acceptable universal approaches to assessing the dissimilarity between individuals with molecular markers. Different measures are relevant to dominant and codominant DNA markers depending on the ploidy of organisms. The Dice coefficient is the suitable measure for haploids with codominant markers and it can be applied directly to (0,1)-vectors representing banding profiles of individuals. None of the common measures, Dice, Jaccard, simple mismatch coefficient (or the squared Euclidean distance), is appropriate for diploids with codominant markers. By transforming multiallelic banding patterns at each locus into the corresponding homozygous or heterozygous states, a new measure of dissimilarity within locus was developed and expanded to assess dissimilarity between multilocus states of two individuals by averaging across all codominant loci tested. There is no rigorous well-founded solution in the case of dominant markers. The simple mismatch coefficient is the most suitable measure of dissimilarity between banding patterns of closely related haploid forms. For distantly related haploid individuals, the Jaccard dissimilarity is recommended. In general, no suitable method for measuring genetic dissimilarity between diploids with dominant markers can be proposed. Banding patterns of diploids with dominant markers and polyploids with codominant markers represent individuals' phenotypes rather than genotypes. All dissimilarity measures proposed and developed herein are metrics.
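
The three common measures named in this abstract can be compared on a pair of (0,1)-vectors. The sketch below uses the usual textbook forms (1 minus the corresponding similarity coefficient); the paper's exact definitions, and its new within-locus measure for diploids, may differ and are not reproduced here. The banding profiles `x` and `y` are invented for illustration.

```python
def match_counts(x, y):
    """Return (a, b, c, d): counts of 1/1, 1/0, 0/1 and 0/0 positions."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def dice_dissimilarity(x, y):
    """1 - 2a / (2a + b + c): shared bands count double, shared absences are ignored."""
    a, b, c, _ = match_counts(x, y)
    denom = 2 * a + b + c
    return (b + c) / denom if denom else 0.0


def jaccard_dissimilarity(x, y):
    """1 - a / (a + b + c): shared absences are ignored."""
    a, b, c, _ = match_counts(x, y)
    denom = a + b + c
    return (b + c) / denom if denom else 0.0


def simple_mismatch(x, y):
    """(b + c) / n: shared absences count as agreement."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return (b + c) / n if n else 0.0


# Hypothetical banding profiles of two individuals (1 = band present).
x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
```

For these profiles the three measures give 1/3, 1/2 and 2/5 respectively, which shows how the same pair of individuals can look closer or farther apart depending on whether shared absences and shared presences are weighted.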

13.
14.
We have examined the molecular-phylogenetic relationships between nonmulberry and mulberry silkworm species that belong to the families Saturniidae, Bombycidae and Lasiocampidae using 16S ribosomal RNA (16S rRNA) and cytochrome oxidase subunit I (coxI) gene sequences. Aligned nucleotide sequences of 16S rRNA and coxI from 14 silk-producing species were used for construction of phylogenetic trees by maximum likelihood and maximum parsimony methods. The tree topology on the basis of 16S rRNA supports monophyly for members of Saturniidae and Bombycidae. Weighted parsimony analysis weighted towards transversions relative to transitions (ts, tv4) for coxI resulted in more robust bootstrap support over unweighted parsimony and favours the 16S rRNA tree topology. Combined analysis reflected a clear biogeographic pattern, and agrees with morphological and cytological data.

15.
The topological importance of species within networks is an important way of bringing a species-level consideration to the study of whole ecological networks. There are many different indices of topological importance, including centrality indices, but it is likely that a small number are sufficient to explain variation in topological importance. We used 14 indices to describe topological importance of plants and pollinators in 12 quantitative mutualistic (plant–pollinator) networks. The 14 indices varied in their consideration of interaction strength (weighted versus unweighted indices) and indirect interactions (from the local measure of degree to meso-scale indices). We use principal components approximation to assess how well every combination of 1–14 indices approximated to the results of principal components analysis (PCA). We found that one or two indices were sufficient to explain up to 90% of the variation in topological importance in both plants and pollinators. The choice of index was crucial because there was considerable variation between the best and the worst approximating subsets of indices. The best single indices were unweighted degree and unweighted topological importance (Jordán's TI index) with two steps (a measurement of apparent competition). The best pairs of indices consisted of a measure of a TI index and one of closeness centrality (weighted or unweighted) or d′ (a standardised species-level measure of partner diversity). Although we have found indices that efficiently explain variation in topological importance, we recommend further research to discover the real-world relevance of different aspects of topological importance to species in ecological networks.

16.
MOTIVATION: The success of each method of cluster analysis depends on how well its underlying model describes the patterns of expression. Outlier-resistant and distribution-insensitive clustering of genes are robust against violations of model assumptions. RESULTS: A measure of dissimilarity that combines advantages of the Euclidean distance and the correlation coefficient is introduced. The measure can be made robust using a rank order correlation coefficient. A robust graphical method of summarizing the results of cluster analysis and a biological method of determining the number of clusters are also presented. These methods are applied to a public data set, showing that rank-based methods perform better than log-based methods. AVAILABILITY: Software is available from http://www.davidbickel.com.

17.
Traditional measures of structural stiffness in the primate skeleton do not consider the heterogeneous material stiffness distribution of bone. This assumption of homogeneity introduces an unknown degree of error in estimating stiffness in skeletal elements. Measures of weighted stiffness can be developed by including heterogeneous grayscale variations evident in computed tomographic (CT) images. Since grayscale correlates with material stiffness, the distribution of bone quality and quantity can be simultaneously considered. We developed weighted measures of bending resistance and applied these to CT images at three locations along the mandibular corpus in the hominoids Gorilla, Pongo, and Pan. We calculated the traditional (unweighted) moment of inertia for comparison to our weighted measure, which weights each pixel by its grayscale value. This weighting results in assignment of reduced moment of inertia values to sections of reduced density. Our weighted and unweighted moments differ by up to 22%. These differences are not consistent among sections, however, such that they cannot be calculated by simple correction of unweighted moments. The effect of this result is that the rank ordering of individual sections within species changes if weighted moments are considered. These results suggest that the use of weighted moments may spur different interpretations of comparative data sets that rely on stiffness measures as estimates of biomechanical competence.
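
The contrast between unweighted and grayscale-weighted moments can be made concrete with a small sketch. Several details are assumptions rather than the authors' procedure: pixels above a threshold count as bone, weights are gray values divided by the image maximum, the centroid is also weighted, and each pixel's own small contribution (side⁴/12) is ignored. The function `moment_of_inertia` and the two toy sections are hypothetical.

```python
def moment_of_inertia(image, pixel_size=1.0, weighted=False, threshold=0):
    """Second moment of area about the horizontal centroidal axis of a
    section given as rows of gray values (assumed weighting scheme)."""
    peak = max(max(row) for row in image)
    if peak <= 0:
        return 0.0
    area = pixel_size ** 2
    pixels = []  # (vertical position, weight) for every bone pixel
    for r, row in enumerate(image):
        for gray in row:
            if gray > threshold:
                w = gray / peak if weighted else 1.0
                pixels.append((r * pixel_size, w))
    total = sum(w for _, w in pixels)
    if total == 0:
        return 0.0
    centroid = sum(y * w for y, w in pixels) / total
    return sum(w * (y - centroid) ** 2 * area for y, w in pixels)


# Two hypothetical sections with identical geometry; the second has a
# less dense (darker) lower cortex.
dense = [[100, 100], [0, 0], [100, 100]]
mixed = [[100, 100], [0, 0], [50, 50]]

unweighted_mixed = moment_of_inertia(mixed)
weighted_mixed = moment_of_inertia(mixed, weighted=True)
```

Both sections have the same unweighted moment because their outlines match, while the weighted moment of the second section is lower, in line with the abstract's statement that sections of reduced density receive reduced values.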

18.

Background

Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis.

Findings

We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering.

Conclusion

Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/

19.
Summary: In this paper, a dissimilarity measure for artificial organisms is proposed. The organisms are simulated in the Framsticks system [10]. Properties of agents are enumerated formally, and the heuristic algorithm for estimating overall phenetic dissimilarity of two agents is described. An example of performance is shown on two selected organisms. Two clustering experiments with interesting results are presented using the UPGMA method. The properties of the measure are then discussed. Computer simulations of complex systems and their characteristics are compared to biological systems, which may bring up ideas for further experiments related to biology.

20.

Background  

A cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While past successes of such analyses have often been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus the Pearson's correlation coefficient as a measure of dissimilarity), oftentimes such groupings could be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species.
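
The standard procedure mentioned in this abstract, UPGMA with one minus Pearson's correlation as the dissimilarity, can be sketched in a few lines. This is a deliberately naive pure-Python version for illustration, not an efficient or library implementation; it assumes no profile is constant (the correlation would be undefined), and the function names and the four toy `profiles` are hypothetical.

```python
from math import sqrt


def pearson_dissimilarity(x, y):
    """One minus Pearson's correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def upgma(profiles, n_clusters):
    """Merge the two clusters with the smallest average pairwise
    dissimilarity until n_clusters remain."""
    n = len(profiles)
    d = {(i, j): pearson_dissimilarity(profiles[i], profiles[j])
         for i in range(n) for j in range(i + 1, n)}

    def average(c1, c2):
        # Unweighted average over all between-cluster pairs of genes.
        pairs = [d[min(i, j), max(i, j)] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        _, a, b = min((average(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical profiles: genes 0 and 1 rise together, genes 2 and 3 fall.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 1.0],
]
groups = upgma(profiles, n_clusters=2)
```

The resulting groups depend only on the shape of each profile, not its magnitude, which is why co-clustered genes still need the kind of functional evaluation the abstract argues for.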
