We propose a method for a posteriori evaluation of classification stability which compares the classification of sites in the original data set (a matrix of species by sites) with classifications of subsets of its sites created by without‐replacement bootstrap resampling. Site assignments to clusters of the original classification and to clusters of the classification of each subset are compared using Goodman‐Kruskal's lambda index. Many resampled subsets are classified and the mean of lambda values calculated for the classifications of these subsets is used as an estimation of classification stability. Furthermore, the mean of the lambda values based on different resampled subsets, calculated for each site of the data set separately, can be used as a measure of the influence of particular sites on classification stability. This method was tested on several artificial data sets classified by commonly used clustering methods and on a real data set of forest vegetation plots. Its strength lies in the ability to distinguish classifications which reflect robust patterns of community differentiation from unstable classifications of more continuous patterns. In addition, it can identify sites within each cluster which have a transitional species composition with respect to other clusters.  相似文献   
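The resampling procedure in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the 80% subset fraction and the use of the symmetric form of Goodman-Kruskal's lambda are assumptions, and `classify` stands for whatever clustering routine returns one cluster label per site.

```python
import random
from collections import Counter


def goodman_kruskal_lambda(a, b):
    """Symmetric Goodman-Kruskal lambda between two label sequences (assumed variant)."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_max, col_max = Counter(), Counter()
    for (i, j), count in joint.items():
        row_max[i] = max(row_max[i], count)  # largest cell in each row
        col_max[j] = max(col_max[j], count)  # largest cell in each column
    max_row_total = max(Counter(a).values())
    max_col_total = max(Counter(b).values())
    denom = 2 * n - max_row_total - max_col_total
    if denom == 0:
        return 1.0  # both partitions consist of a single cluster
    numer = (sum(row_max.values()) + sum(col_max.values())
             - max_row_total - max_col_total)
    return numer / denom


def classification_stability(sites, classify, n_subsets=100, fraction=0.8, seed=0):
    """Mean lambda between the full classification and classifications of subsets."""
    rng = random.Random(seed)
    full_labels = classify(sites)
    k = max(2, int(fraction * len(sites)))
    values = []
    for _ in range(n_subsets):
        idx = rng.sample(range(len(sites)), k)  # sampling without replacement
        sub_labels = classify([sites[i] for i in idx])
        values.append(goodman_kruskal_lambda([full_labels[i] for i in idx], sub_labels))
    return sum(values) / len(values)
```

Because lambda depends only on how sites are grouped, the arbitrary cluster numbering of each subset classification does not need to be matched to the original one.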

Consider the scenario of common gene clusters of closely related species where the cluster sizes could be as large as 400 from an alphabet of 25,000 genes. This paper addresses the problem of computing the statistical significance of such large clusters, whose individual elements occur with very low frequency (of the order of the number of species in this case) and the alphabet set of the elements is relatively large. We present a model where we study the structure of the clusters in terms of smaller nested (or otherwise) sub-clusters contained within the cluster. We give a probability estimation based on the expected cluster structure for such clusters (rather than some form of the product of individual probabilities of the elements). We also give an exact probability computation based on a dynamic programming algorithm, which runs in polynomial time.  相似文献   

The updating and rethinking of vegetation classifications is important for ecosystem monitoring in a rapidly changing world, where the distribution of vegetation is changing. The general assumption that discrete and persistent plant communities exist and can be monitored efficiently is rarely tested before undertaking a classification. Marion Island (MI) is comprised of species-poor vegetation undergoing rapid environmental change. It presents a unique opportunity to test the ability to discretely classify species-poor vegetation with recently developed objective classification techniques and relate it to previous classifications. We classified vascular species data of 476 plots sampled across MI, using Ward hierarchical clustering, divisive analysis clustering, non-hierarchical k-means and partitioning around medoids. Internal cluster validation was performed using silhouette widths, Dunn index, connectivity of clusters and gap statistic. Indicator species analyses were also conducted on the best performing clustering methods. We evaluated the outputs against previously classified units. Ward clustering performed the best, with the highest average silhouette width and Dunn index, as well as the lowest connectivity. The number of clusters differed amongst the clustering methods, but most validation measures, including for Ward clustering, indicated that two and three clusters are the best fit for the data. However, all classification methods produced weakly separated, highly connected clusters with low compactness and low fidelity and specificity to clusters. There was no particularly robust and effective classification outcome that could group plots into previously suggested vegetation units based on species composition alone. The relatively recent age (c. 450,000 years B.P.), glaciation history (last glacial maximum 34,500 years B.P.) and isolation of the sub-Antarctic islands may have hindered the development of strong vascular plant species assemblages with discrete boundaries. Discrete classification at the community-level using species composition may not be suitable in such species-poor environments. Species-level, rather than community-level, monitoring may thus be more appropriate in species-poor environments, aligning with continuum theory rather than community theory.

Question: Community ecologists are often confronted with multiple possible partitions of a single set of records of species composition and/or abundances from several sites. Different methods of numerical classification produce different results, and the question is which of them, and how many clusters, should be selected for interpretation. We demonstrate a new method for identifying the optimal partition from a series of partitions of the same set of sites, based on number of species with high fidelity to clusters in a partition (faithful species). Methods: The new method, OptimClass, has two variants. OptimClass 1 searches the partition with the maximum number of faithful species across all clusters, while OptimClass 2 searches the partition with the maximum number of clusters that contain at least a preselected minimum number of faithful species. Faithful species are determined based on the P value of the Fisher's exact test, as a measure of fidelity. OptimClass was tested on three vegetation datasets that varied in species richness and internal heterogeneity, using several classification algorithms, resemblance measures and cover transformations. Results: Results from both variants of OptimClass depended on the preselected threshold P value for faithful species: higher P gave higher probability that a partition with more clusters was selected as optimal. Good partitions, in terms of OptimClass criteria, involved flexible beta clustering, and also ordinal clustering. Good partitions were also obtained with TWINSPAN when the required number of clusters was small, or UPGMA when the required number of clusters was large. Poor partitions usually resulted from classifications that used resemblance measures and cover transformations emphasizing differences in species cover; this is not unexpected because OptimClass uses a presence/absence‐based fidelity measure. 
Conclusions: If the aim of a classification is to obtain clusters rich in faithful species, which can be subsequently used as diagnostic species for identification of community types, OptimClass is a suitable method for simultaneous choice of the optimal classification algorithm and optimal number of clusters. It can be computed in the JUICE program.  相似文献   
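The OptimClass 1 criterion described above can be illustrated as follows. This is a hedged sketch rather than the JUICE implementation: the function names, the data layout (a dict mapping each species to the set of sites where it occurs) and the one-sided form of Fisher's exact test are assumptions made for the example.

```python
from math import comb


def fisher_right_tail(n_total, n_cluster, n_species, n_joint):
    """One-sided Fisher exact P: chance of at least n_joint occurrences in the cluster."""
    upper = min(n_cluster, n_species)
    tail = sum(comb(n_species, k) * comb(n_total - n_species, n_cluster - k)
               for k in range(n_joint, upper + 1))
    return tail / comb(n_total, n_cluster)


def count_faithful(presence, labels, p_threshold):
    """Number of faithful species per cluster for one partition (labels per site)."""
    clusters = {}
    for site, label in enumerate(labels):
        clusters.setdefault(label, set()).add(site)
    faithful = {label: 0 for label in clusters}
    for sites in presence.values():
        for label, members in clusters.items():
            p = fisher_right_tail(len(labels), len(members),
                                  len(sites), len(sites & members))
            if p <= p_threshold:
                faithful[label] += 1
    return faithful


def optimclass1(presence, partitions, p_threshold):
    """Name of the partition with the most faithful species summed over clusters."""
    return max(partitions,
               key=lambda name: sum(count_faithful(presence, partitions[name],
                                                   p_threshold).values()))
```

Under the same assumptions, OptimClass 2 would instead count the clusters whose entry in the `count_faithful` result reaches a preselected minimum.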

Plants colonize post-mining areas such as spoil heaps even without human interference, by spontaneous succession. Knowledge of this process, its direction and role in shaping the vegetation is desirable not only from an ecological point of view but also in the application of remediation techniques of such environmental burdens. The aim of our study was to analyse plant species composition and to describe a succession pattern at two mineralogically different spoil heaps near the city of Banská Bystrica (central Slovakia). Hierarchical cluster analysis was used in PC-ORD software to classify physiognomic vegetation types (PVT) at heaps and synoptic tables were generated in Juice software according to species percentage of frequency and their median cover. At the Hg-heap Veľká studňa we recorded a total of 335 taxa of vascular plants and 83 taxa of bryophytes within 21 identified PVT, which were grouped into 4 clusters (forest, wetland, grassland and ruderal cluster). A total of 111 taxa of vascular plants and 76 taxa of bryophytes were recorded at the Cu-heap Podlipa and here 9 PVT formed 3 clusters (initial, non-forest and forest cluster). At both heaps PVT, represented as individual successional stages, formed a distinct continuum along the moisture gradient (in dry and wet areas). We can consider both heaps as species-rich with different diagnostic and constant species and a development of various microhabitats. In addition, bryophytes greatly contribute to species composition and richness, and their assemblages in the initial stages of succession can be important for the later stages.

Changes in composition and structure of alpine and subalpine plant communities in relation to ecological factors were analysed in the Nízke Tatry Mts, Slovakia. Species cover values of vascular and non-vascular plants in each vegetation plot were recorded on the nine-degree scale. A data set of 156 relevés of alpine and subalpine vegetation was sampled recently during one year in the eastern part of the Nízke Tatry National Park. The data set was analysed by cluster analysis and Detrended Correspondence Analysis. Analyses were carried out on the entire data set, including the subset of short grassland and dwarf-shrub vegetation. Major gradients and clusters were ecologically interpreted using Ellenberg indicator values. In the entire data set, the major gradient in species composition was associated with nutrient availability and the second most important gradient with light. In the case of short grassland and dwarf-shrub vegetation, the gradients were different. The first one was associated with soil reaction and the second gradient was associated with moisture. Clusters proposed by numerical classification reproduced many traditional phytosociological associations, namely Seslerietum distichae, Sphagno capillifolii-Empetretum nigri, Junco trifidi-Callunetum vulgaris, Juncetum trifidi, Dryopterido dilatatae-Pinetum mugo, Luzuletum obscurae, Agrostio pyrenaiceae-Nardetum strictae, while some other associations were less clearly differentiated (communities of the alliances Calamagrostion villosae, Adenostylion alliariae, Trisetion fusci, Cratoneuro filicini-Calthion laetae or Salicion herbaceae). The next clusters included Vaccinium and Festuca supina dominated communities and artificial roadside grasslands sown 50 years ago. Bryophytes and lichens were highly represented among diagnostic species of particular associations. The distribution pattern of particular plant communities was strongly influenced by site position on either the northern or the southern slope of the mountains.

Abstract. The first objective of this paper is to define a new measure of fidelity of a species to a vegetation unit, called u. The value of u is derived from the approximation of the binomial or the hypergeometric distribution by the normal distribution. It is shown that the properties of u meet the requirements for a fidelity measure in vegetation science, i.e. (1) to reflect differences of a species' relative frequency inside a certain vegetation unit and its relative frequency in the remainder of the data set; (2) to increase with increasing size of the data set. Additionally (3), u has the property to be dependent on the proportion of the vegetation unit's size to the size of the whole data set. The second objective is to present a method of how to use the value of u for finding species groups in large databases and for defining vegetation units. A species group is defined by possession of species that show the highest value of u among all species in the data set with regard to the vegetation unit defined by this species group. The vegetation unit is defined as comprising all relevés that include a minimum number of the species in the species group. This minimum number is derived statistically in such a way that fewer relevés always belong to a species group than would be expected if the differential species were distributed randomly among the relevés. An iterative algorithm is described for detecting species groups in databases. Starting with an initial species group, species composition of this group and the vegetation unit defined by this group are mutually optimized. With this algorithm species groups are formed in a data set independently of each other. Subsequently, these species groups can be combined in such a way that they are suited to define commonly known syntaxa a posteriori.

Associations among species of vascular plants in twelve predefined vegetation zones on Mt Wellington, Tasmania, are described. The vegetation zones were based mainly on the dominant species of eucalypt contained therein, following an approach used previously by Martin (1940). The twelve zones were subsequently reduced to a total of six cluster groups using hierarchical classification procedures. Character species, in the sense of the Braun-Blanquet (1928) system of phytosociology, are identified and tabulated for each group. It is suggested that the division of a survey area into zones based upon dominant eucalypts may be applicable to the study of vegetation communities in other parts of Tasmania.

Numerical classification of 2653 geographically stratified relevés of weed vegetation from the Czech and Slovak Republics was performed with cluster analysis. Diagnostic species were determined for each of the seven main clusters using statistical measures of fidelity. The classification reflected clear distinctions between lowland (mostly calcicole) and highland (mostly calcifuge) sites, spring and summer phenological stages, and cereals and root crops. The results of the cluster analysis were compared with traditional phytosociological units. Two clusters corresponded to calcifuge weed vegetation of the Scleranthion annui alliance; one cluster represented the vegetation of root crops on moist soils of the Oxalidion europaeae alliance; one cluster contained thermophilous weed vegetation of the Caucalidion lappulae alliance; two clusters included weed vegetation of root crops and of stubble fields, which can be assigned to the Caucalidion, Panico-Setarion, Veronico-Euphorbion and Eragrostion alliances; one cluster included vernal weed vegetation in little disturbed habitats of the Caucalidion lappulae and Scleranthion annui alliances. Our analysis did not support the concept of the Sherardion and Veronico-Taraxacion alliances, which were included in earlier overviews of the vegetation units of the Czech Republic and Slovakia.

This study developed a methodology to temporally classify large scale, upper level atmospheric conditions over North America, utilizing a newly-developed upper level synoptic classification (ULSC). Four meteorological variables: geopotential height, specific humidity, and u- and v-wind components, at the 500 hPa level over North America were obtained from the NCEP/NCAR Reanalysis Project dataset for the period 1965-1974. These data were subjected to principal components analysis to standardize and reduce the dataset, and then an average linkage clustering algorithm identified groups of observations with similar flow patterns. The procedure yielded 16 clusters. These flow patterns identified by the ULSC typify all patterns expected to be observed over the study area. Additionally, the resulting cluster calendar for the period 1965-1974 showed that the clusters are generally temporally continuous. Subsequent classification of additional observations through a z-score method produced acceptable results, indicating that additional observations may easily be incorporated into the ULSC calendar. The ULSC calendar of synoptic conditions can be used to identify situations that lead to periods of extreme weather, i.e., heat waves, flooding and droughts, and to explore long-distance dispersal of airborne particles and biota across North America.  相似文献   

In this paper we consider one method of mapping larger units identified from the spatial pattern of sequences of vegetation types. The basic data were presence/absence data for 6450 stands arranged in 90 transects. A second set of data was derived by averaging the species occurrences in non-overlapping groups of 5 stands. A divisive numerical classification was used to determine the primary vegetation units. In all, 5 different sets of primary types were derived, using different species suites, different sample sizes and different numerical methods. We briefly discuss the types identified and their spatial patterns in the area. Each of these types was then used to define a string of type-codes for every transect so that each transect represents a sample from the landscape containing information on the frequency and spatial distribution of the primary vegetation types. The transects may be classified using a Levenshtein dissimilarity measure and agglomerative hierarchical classification, giving 5 analyses of transects, one for each of the primary types discussed above. We then examine these transect classifications to investigate the stability of the vegetation landscape patterns under changes in species used for the primary classification, in size of sample unit and in method of primary classifications. There is a considerable degree of stability in the results. However, it seems that with this vegetation the tree species and non-tree species have considerable independence. We also indicate some problems with this approach and some possible extensions.
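As an illustration of the transect comparison described in this abstract, the sketch below computes the Levenshtein (edit) distance between strings of vegetation type-codes. The function names and the single-character type-codes are assumptions for the example; the abstract does not say whether the authors normalised the distance.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dissimilarity_matrix(transects):
    """Pairwise Levenshtein distances between transect type-code strings."""
    return [[levenshtein(a, b) for b in transects] for a in transects]
```

The resulting matrix could then be passed to any agglomerative hierarchical clustering routine that accepts precomputed dissimilarities.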

Finding subtypes of heterogeneous diseases is the biggest challenge in the area of biology. Often, clustering is used to provide a hypothesis for the subtypes of a heterogeneous disease. However, there are usually discrepancies between the clusterings produced by different algorithms. This work introduces a simple method which provides the most consistent clusters across three different clustering algorithms for a melanoma and a breast cancer data set. The method is validated by showing that the Silhouette, Dunn's and Davies-Bouldin's cluster validation indices are better for the proposed algorithm than those obtained by k-means and another consensus clustering algorithm. The hypotheses of the consensus clusters on both the data sets are corroborated by clear genetic markers and 100 percent classification accuracy. In Bittner et al.'s melanoma data set, a previously hypothesized primary cluster is recognized as the largest consensus cluster and a new partition of this cluster into two subclusters is proposed. In van't Veer et al.'s breast cancer data set, previously proposed "basal" and "luminal A" subtypes are clearly recognized as the two predominant clusters. Furthermore, a new hypothesis is provided about the existence of two subgroups within the "basal" subtype in this data set. The clusters of van't Veer's data set are also validated by high classification accuracy obtained in the data set of van de Vijver et al.

In connection with a phytosociological survey of running water macrophytes in Lower Saxony, ecological investigations were carried out in selected river systems. Within these systems, 43 sampling sites were studied. The vegetation of the sampling sites was classified by means of cluster analysis into 7 groups, 3 of which occurred on the diluvial plains and 2 in the coastal marsh area only. Forty-one parameters were measured 3–7 times covering 2 vegetation periods. In the first instance, the structure of the data was carefully studied by bivariate correlation analysis and factor analysis. A high number of significant correlations was detected, which indicates difficulties in ecological interpretation. Temporal variation of the parameters measured was also studied, and they were classified into 3 groups according to stability. For a study of the relationships between the vegetation and the ecological parameters, the data set was split into 5 subsets (physical data, water chemical data, interstitial water chemical data, sediment characteristics, and a mixed set of simple field data). The relationship of each subset to the vegetation was studied separately using cluster analysis. The mixed data set FIELD showed the highest degree of similarity to the vegetation clustering. Analysis of variance was carried out in order to find out which variables differ most among the vegetation types. The best differentiation qualities were shown by some physical and water chemical parameters (oxygen content, turbidity, current velocity, acidity, calcium). This result can only be interpreted ecologically in connection with the intercorrelations observed. The ecological behaviour of some species of medium frequency was also studied in detail by means of analysis of variance. The means of all parameters for occurrence and non-occurrence were compared.
In the case of Ranunculus peltatus Schrank, Myriophyllum alterniflorum DC and Elodea canadensis Michx., several differentiation variables could be detected. Finally, the zonation of two rivers was studied in detail by comparing the vegetation sequence with important physical and chemical parameters. The interaction between these parameter groups is clearly shown. Physical parameters like current velocity are responsible for the basic zonation, whilst chemical parameters can modify the zones to a large extent. The necessity for a comprehensive approach to such types of data sets, including profound structural data analysis, is stressed in the discussion. The special problem of relating phytosociological and ecological data is discussed. The methods used are explained and possible objections are noted. The difficulties of using the habitat ecological results for bioindication purposes are outlined. Spatial autocorrelation, vegetation dynamics, interactive processes between the system parameters and the genetic variability of species have to be considered as the main problems in this special application. Nevertheless, the study produced some results which indicated the significance of physical, chemical and sediment parameters for macrophyte growth in the type of waters under investigation, and suggested subject areas for future research.  相似文献   

The specific species-rich high-altitude vegetation of the class Carici rupestris-Kobresietea bellardii Ohba 1974 (CK), with the occurrence of many arctic-alpine and endemic species, was chosen for a case study. The analyses were based on a dataset of 37,204 phytosociological relevés from the Slovak Vegetation Database. The traditional classification of the class CK, based on cluster analyses, was reproduced satisfactorily by means of formalised classification, based on the formal definitions created by the Cocktail method together with the frequency-positive fidelity index affiliation. Unequivocal assignment criteria for all eight associations of both alliances [Oxytropido-Elynion Br.-Bl. (1948) 1949 and Festucion versicoloris Krajina 1933] of the class CK were formulated. The formal delimitations followed the traditional ones very well. It was demonstrated that the results of applying the formal definitions created on the basis of a large, geographically stratified dataset capturing the occurrence of all vegetation types in Slovakia were highly similar in comparison with the traditional classification based on the results of cluster analysis. The reliability and the pros and cons of the expert system are also discussed.  相似文献   

Classification is a data mining task the goal of which is to learn a model, from a training dataset, that can predict the class of a new data instance, while clustering aims to discover natural instance-groupings within a given dataset. Learning cluster-based classification systems involves partitioning a training set into data subsets (clusters) and building a local classification model for each data cluster. The class of a new instance is predicted by first assigning the instance to its nearest cluster and then using that cluster’s local classification model to predict the instance’s class. In this paper, we present an ant colony optimization (ACO) approach to building cluster-based classification systems. Our ACO approach optimizes the number of clusters, the positioning of the clusters, and the choice of classification algorithm to use as the local classifier for each cluster. We also present an ensemble approach that allows the system to decide on the class of a given instance by considering the predictions of all local classifiers, employing a weighted voting mechanism based on the fuzzy degree of membership in each cluster. Our experimental evaluation employs five widely used classification algorithms: naïve Bayes, nearest neighbour, Ripper, C4.5, and support vector machines, and results are reported on a suite of 54 popular UCI benchmark datasets.  相似文献   

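Bao Yajing, Li Zhenghai. Acta Ecologica Sinica (《生态学报》), 2008, 28(9): 4540-4546
Plant functional groups (PFGs) are sets of plants that share defined functional traits; the concept was introduced by ecologists to study how vegetation responds to climate change and disturbance. The central problem in functional-group research is still the choice of the plant traits on which the grouping is based. Taking the steppe plant communities of the Xilin River basin, Inner Mongolia, as an example, we selected three steppe types (Leymus chinensis steppe, Stipa grandis steppe and Leymus chinensis meadow steppe) and their degradation gradient series (undegraded, lightly degraded, moderately degraded and heavily degraded). On the basis of measured plant caloric values, and according to the energy attribute of plants, that is, the heat released by the complete combustion of a unit weight of dry matter, steppe plants were divided into energy functional groups (high-, medium- and low-energy-value plant functional groups) using manually defined intervals. The objectivity and feasibility of this energy-based grouping method for studies of steppe vegetation dynamics are also discussed.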

Summary. CLUSLA, a computer program for the clustering of very large phytosociological data sets is described. It is an elaboration of Janssen's (1975) simple procedure. The essence of the program is the creation of clusters, each starting with one relevé, as the relevés are entered in the program. Each new relevé that is sufficiently distinct from already existing clusters is considered a new cluster. The fusion criterion is the attainment of a certain level of (dis-) similarity between relevé and cluster. Bray and Curtis' dissimilarity measure with presence-absence data was used. The program, written in FORTRAN for an IBM 370–158 system, can deal with practically unlimited numbers of relevés, provided the product of the number of primary clusters and the number of species does not exceed 140,000. We adopted maxima of 100 and 1400 respectively. After the primary clustering round a reallocation is performed. Then a simple table is printed with information on the significance of occurrence of species in clusters according to a chi-square approach. The primary clusters can be treated again with a higher fusion threshold; or approached with more elaborate methods, in our case particularly the TABORD program. The program is demonstrated with a collection of 6072 relevés with 889 species of salt marsh vegetation from the Working-Group for Data-Processing. Contribution from the Working Group for Data-Processing in Phytosociology, International Society for Vegetation Science. Nomenclature follows the Trieste system, which will be published later. The authors are very grateful to Drs. Jan Janssen, Mike Dale, László Orlóci and Mike Austin for their comments on drafts of the program, and to Wil Kortekaas for her help in the interpretation of the tables.
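The single-pass clustering idea summarised above can be sketched as follows. This is not the CLUSLA code (which was written in FORTRAN): the function names are hypothetical, each cluster is represented only by its founding relevé, which is an assumption the abstract does not confirm, and the reallocation round is omitted.

```python
def bray_curtis_pa(a, b):
    """Bray-Curtis dissimilarity for presence/absence data given as species sets."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def single_pass_clustering(releves, threshold):
    """Assign each relevé to the nearest existing cluster or start a new one."""
    founders = []  # one founding relevé per cluster (assumed cluster representation)
    labels = []
    for releve in releves:
        best, best_d = None, None
        for idx, founder in enumerate(founders):
            d = bray_curtis_pa(releve, founder)
            if best_d is None or d < best_d:
                best, best_d = idx, d
        if best is None or best_d > threshold:
            founders.append(set(releve))  # sufficiently distinct: new cluster
            labels.append(len(founders) - 1)
        else:
            labels.append(best)
    return labels
```

Each relevé is compared only with the existing clusters, so memory use grows with the number of clusters rather than with the number of relevés, which fits the abstract's limit on clusters times species.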

A new vegetation-ecological approach is proposed for classification and evaluation of vegetation zones by means of phytosociological landscape analysis, based on the potential natural vegetation. The study area is the “Fagetea crenatae region” of the cool-temperate zone of Tohoku (northern Honshu) and the northern parts of Kanto. The area was divided into 953 geographic quadrats on a base map at a scale of 1 ∶ 500000. Based on climax complexes of vegetation in each quadrat, 55 community sub-groups were distinguished as basic units of community complex and vegetation landscapes. The community sub-groups were then grouped into 17 larger community groups by the phytosociological table method. As a result, three phytogeographic vegetation zones (Japan Sea side, inland areas and Pacific side) were classified. For each of these community sub-groups, five geographical and climatic variables (average altitude, mean annual temperature, Kira's warmth index, annual precipitation and mean annual maximum snow depth) were averaged, and the community sub-groups in the same community group, which resembled each other ecologically, were assembled into 28 clusters. The clusters were combined into 11 ecological groups by means of Pearson's similarity ratio of geographical and climatic characteristics. By comparing these ecological groups as a vegetation complex, four phytogeographic vegetation zones (Japan Sea side, inland areas, Pacific side and northern Honshu) corresponding to each potential natural vegetation region with distinct environmental characteristics, were newly classified.  相似文献   

Abstract. Statistical measures of fidelity, i.e. the concentration of species occurrences in vegetation units, are reviewed and compared. The focus is on measures suitable for categorical data which are based on observed species frequencies within a vegetation unit compared with the frequencies expected under random distribution. Particular attention is paid to Bruelheide's u value. It is shown that its original form, based on binomial distribution, is an asymmetric measure of fidelity of a species to a vegetation unit which tends to assign comparatively high fidelity values to rare species. Here, a hypergeometric form of u is introduced which is a symmetric measure of the joint fidelity of species to a vegetation unit and vice versa. It is also shown that another form of the binomial u value may be defined which measures the asymmetric fidelity of a vegetation unit to a species. These u values are compared with phi coefficient, chi‐square, G statistic and Fisher's exact test. Contrary to the other measures, phi coefficient is independent of the number of relevés in the data set, and like the hypergeometric form of u and the chi‐square it is little affected by the relative size of the vegetation unit. It is therefore particularly useful when comparing species fidelity values among differently sized data sets and vegetation units. However, unlike the other measures it does not measure any statistical significance and may produce unreliable results for small vegetation units and small data sets. The above measures, all based on the comparison of observed/expected frequencies, are compared with the categorical form of the Dufrêne‐Legendre Indicator Value Index, an index strongly underweighting the fidelity of rare species. These fidelity measures are applied to a data set of 15 989 relevés of Czech herbaceous vegetation. 
In a small subset of this data set which simulates a phytosociological table, we demonstrate that traditional table analysis fails to determine diagnostic species of general validity in different habitats and large areas. On the other hand, we show that fidelity calculations used in conjunction with large data sets can replace expert knowledge in the determination of generally valid diagnostic species. Averaging positive fidelity values for all species within a vegetation unit is a useful approach to measure quality of delimitation of the vegetation unit. We propose a new way of ordering species in synoptic species‐by‐relevé tables, using fidelity calculations.
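Two of the fidelity measures compared in this abstract can be written out from a 2 x 2 table of species occurrence by vegetation unit. The sketch below is illustrative only: the argument names are assumptions, the phi coefficient is the standard one for a 2 x 2 table, and the hypergeometric u is taken here as the observed count standardised by the mean and variance of the hypergeometric distribution.

```python
from math import sqrt


def phi_coefficient(n_total, unit_size, n_species, n_joint):
    """Phi coefficient between species occurrence and membership in the unit.

    n_total: relevés in the data set; unit_size: relevés in the vegetation unit;
    n_species: occurrences of the species; n_joint: occurrences inside the unit.
    """
    numer = n_total * n_joint - n_species * unit_size
    denom = sqrt(n_species * unit_size
                 * (n_total - n_species) * (n_total - unit_size))
    return numer / denom


def u_hypergeometric(n_total, unit_size, n_species, n_joint):
    """Observed joint count standardised under the hypergeometric distribution."""
    expected = n_species * unit_size / n_total
    variance = (n_species * unit_size
                * (n_total - n_species) * (n_total - unit_size)
                / (n_total ** 2 * (n_total - 1)))
    return (n_joint - expected) / sqrt(variance)
```

With these definitions u equals phi multiplied by the square root of (n_total - 1), which is consistent with the abstract's remark that phi, unlike the other measures, does not depend on the number of relevés.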
