Similar literature
 20 similar documents found (search time: 0 ms)
1.
Conservation planning requires knowledge of the distribution of all species in the area of interest. Surrogates for biodiversity are considered a possible solution. The two major types are biological and environmental surrogates. Here, we evaluate four different methods of hierarchical clustering, as well as one non-hierarchical method, in the context of producing surrogates for biodiversity. Each clustering method was used to produce maps of both surrogate types. We evaluated the representativeness of each clustering method by finding the average number of species represented in a set of sites, one site from each domain, using a Monte Carlo permutation procedure. We propose an additional measure of surrogate performance: the degree of evenness of the different domains, calculated, for example, with Simpson's diversity index. Surrogates with low evenness leave little flexibility in site selection, since some of the domains may often be represented by a single site or very few sites; surrogate maps with a high Simpson's index value may therefore be more relevant for actual decision making. We found that there is a trade-off between species representativeness and evenness. Centroid clustering represented the most species but had very low evenness values. Ward's method of minimum variance represented more species than a random choice and had high evenness values. Using the typical evaluation measures, the Centroid clustering method was most efficient for surrogate production. However, when Simpson's index is also considered, Ward's method of minimum variance is more appropriate for managers.
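
As a rough illustration of the evenness measure mentioned in this abstract, the sketch below computes Simpson's diversity index over the number of sites assigned to each domain. It assumes the Gini-Simpson form, 1 - sum(p_i^2), where p_i is the share of sites in domain i; the abstract does not say which form the authors used, and the function name and toy site lists are hypothetical.

```python
from collections import Counter


def simpson_index(domain_labels):
    """Gini-Simpson index, 1 - sum(p_i^2), over the share of sites per domain.

    Assumption: this particular form of Simpson's index; the abstract does
    not specify which variant was used.
    """
    counts = Counter(domain_labels)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical surrogate maps: each entry is the domain label of one site.
even_map = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5
skewed_map = ["A"] * 17 + ["B", "C", "D"]

print(simpson_index(even_map))
print(simpson_index(skewed_map))
```

Under this assumed form, the even map scores 0.75 and the skewed map about 0.27. In the skewed map three of the four domains contain a single site, which is the low-flexibility situation the abstract describes.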

2.
3.
4.
MOTIVATION: A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation. RESULTS: We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation.
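
The idea of running one bottom-up clustering process under alternative merging rules can be sketched as follows. This is only an illustration under stated assumptions: the three rules (single, complete and average linkage) are generic stand-ins rather than the rules used by the authors, the 4x4 dissimilarity matrix is a toy in place of all-against-all BLAST scores, and all names are hypothetical.

```python
from itertools import combinations


def agglomerate(dist, rule):
    """Bottom-up clustering on a symmetric dissimilarity matrix.

    `rule` maps the list of pairwise dissimilarities between two clusters
    to a single merge score. Returns the merges as (cluster_a, cluster_b, score).
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        def score(pair):
            a, b = pair
            return rule([dist[i][j] for i in a for j in b])

        # Merge the pair of clusters with the lowest score under this rule.
        a, b = min(combinations(clusters, 2), key=score)
        merges.append((set(a), set(b), score((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Assumed stand-in merging rules (not taken from the abstract).
RULES = {
    "single": min,
    "complete": max,
    "average": lambda d: sum(d) / len(d),
}

# Toy dissimilarities among four sequences (hypothetical values).
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]

for name, rule in RULES.items():
    print(name, agglomerate(dist, rule))
```

On this toy matrix all three rules agree on the first two merges and differ only in the score of the final merge, which loosely mirrors the abstract's notion of clusters that are consistent across several merging rules.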

5.
A hybrid genetic algorithm (GA)-based clustering schema (HGACLUS), combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized clustering success by achieving internal cluster cohesion and external cluster isolation. The performance

6.
7.
SUMMARY: ReMark is a fully automatic tool for clustering orthologs that combines a recursive clustering algorithm with the Markov clustering (MCL) algorithm. In the first step, ReMark detects and recursively clusters ortholog pairs through reciprocal BLAST best hits between multiple genomes (software program RecursiveClustering.java). It then applies the MCL algorithm to the clusters (the score matrices generated in the previous step) and refines them by adjusting an inflation factor (software program MarkovClustering.java). The method has two key features. First, to obtain more reliable results, it uses the diagonal scores in the matrix of the initial ortholog clusters. Second, it clusters orthologs flexibly, controlled by MCL with a selected inflation factor. Users can therefore select the fitting state of the orthologous protein clusters by adjusting the inflation factor according to their research interests. AVAILABILITY AND IMPLEMENTATION: Source code for the orthologous protein clustering software is freely available for non-commercial use at http://dasan.sejong.ac.kr/~wikim/notice.html, implemented in Java 1.6 and supported on Windows and Linux.
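
To make the role of the inflation factor concrete, here is a minimal sketch of the generic MCL iteration (expansion followed by inflation) in plain Python. It is not ReMark's code: the self-loop weight of 1.0, the fixed iteration count, the toy score matrix and all function names are assumptions made for illustration, and scores are assumed to be non-negative.

```python
def normalize_columns(m):
    """Rescale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: multiply the matrix by itself."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(scores, inflation=2.0, iterations=20):
    """Run a fixed number of MCL iterations on a non-negative score matrix."""
    n = len(scores)
    # Assumption: add a self-loop of weight 1.0 to every node before normalizing.
    m = [[scores[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    return m


# Hypothetical scores: two tightly linked triples joined by one weak link.
scores = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
result = mcl(scores, inflation=2.0)
```

Inflation sharpens each column toward its largest entries, so a larger inflation factor generally produces finer clusters. In the usual reading of the converged matrix, clusters are taken from the rows that keep non-zero entries.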

8.
BACKGROUND: Artificial neural networks (ANNs) have been shown to be valuable in the analysis of analytical flow cytometric (AFC) data in aquatic ecology. Automated extraction of clusters is an important first stage in deriving ANN training data from field samples, but AFC data pose a number of challenges for many types of clustering algorithm. The fuzzy k-means algorithm has recently been extended to address nonspherical clusters with the use of scatter matrices. Four variants were proposed, each optimizing a different measure of clustering "goodness." METHODS: With AFC data obtained from marine phytoplankton species in culture, the four fuzzy k-means algorithm variants were compared with each other and with another multivariate clustering algorithm, based on critical distances, that is currently used in flow cytometry. RESULTS: One of the algorithm variants (adaptive distances, also known as the Gustafson-Kessel algorithm) was found to be robust and reliable, whereas the others showed various problems. CONCLUSIONS: The adaptive distances algorithm was superior in use to the clustering algorithms against which it was tested, but the problem of automatic determination of the number of clusters remains to be addressed.
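
For readers unfamiliar with fuzzy k-means, the sketch below shows the standard membership update for a single data point, using plain Euclidean distance. It is a simplified baseline only: the scatter-matrix variants compared in the abstract (including the adaptive-distances, Gustafson-Kessel variant) replace this distance with a cluster-specific norm, which is not implemented here. The function name, the fuzzifier value m = 2 and the toy coordinates are assumptions.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy k-means membership of one point in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with Euclidean distances d.
    Assumption: Euclidean distance, not the scatter-matrix norms of the
    variants discussed in the abstract.
    """
    d = [math.dist(point, c) for c in centers]
    zeros = [x == 0.0 for x in d]
    if any(zeros):
        # The point coincides with one or more centers: share membership among them.
        return [z / sum(zeros) for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]


# Hypothetical two-parameter measurement and two cluster centers.
u = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(u)
```

Here the point lies at distance 1 from the first center and 3 from the second, so with m = 2 its memberships are 0.9 and 0.1. Memberships always sum to 1 across clusters.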

9.
Investigation of patterns in beta diversity has received increased attention in recent years, particularly in light of new ecological theories such as the metapopulation paradigm and metacommunity theory. Traditionally, beta diversity patterns can be described by cluster analysis (i.e. dendrograms), which enables the classification of samples. Clustering algorithms define the structure of dendrograms; consequently, assessing their performance is crucial. A common, although not always appropriate, approach for assessing algorithm suitability is the cophenetic correlation coefficient c. Alternatively, the 2-norm has recently been proposed as an increasingly informative method for evaluating the distortion engendered by clustering algorithms. In the present work, the 2-norm is applied for the first time to field data and is compared with the cophenetic correlation coefficient using a set of 105 pairwise combinations of 7 clustering methods (e.g. UPGMA) and 15 (dis)similarity/distance indices (e.g. Jaccard index). In contrast to the 2-norm, the cophenetic correlation coefficient does not provide a clear indication of the efficiency of the clustering algorithms for all combinations. The two approaches were not always in agreement in the choice of the most faithful algorithm. Additionally, the 2-norm revealed that UPGMA is the most efficient clustering algorithm and Ward's the least. The present results suggest that goodness-of-fit measures such as the 2-norm should be applied prior to clustering analyses for reliable beta diversity measures.

10.
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application, the computation of the average charge of ionizable amino acids in proteins, is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
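
The basic clustering approximation described in this abstract can be sketched on a toy system of two-state sites. The code enumerates microstates exactly and compares the full sum with the sum restricted to one cluster, in which interactions with sites outside the cluster are ignored. The energy model (a per-site field plus pairwise couplings between occupied sites), the parameter values and all names are assumptions for illustration, not the authors' model.

```python
import math
from itertools import product


def average_occupancy(site, sites, field, coupling, beta=1.0):
    """Boltzmann average of the 0/1 state of `site`.

    Enumerates every microstate of `sites` only; interactions with sites
    outside this list are ignored, as in the basic clustering approximation.
    """
    z = 0.0
    acc = 0.0
    for state in product((0, 1), repeat=len(sites)):
        s = dict(zip(sites, state))
        energy = sum(field[i] * s[i] for i in sites)
        energy += sum(coupling.get((i, j), 0.0) * s[i] * s[j]
                      for i in sites for j in sites if i < j)
        weight = math.exp(-beta * energy)
        z += weight
        acc += weight * s[site]
    return acc / z


# Hypothetical system: four sites, strong couplings inside {0, 1} and {2, 3},
# and one weak coupling between the two clusters.
field = {0: -1.0, 1: 0.5, 2: 0.2, 3: -0.3}
coupling = {(0, 1): 1.0, (2, 3): -0.5, (1, 2): 0.1}

exact = average_occupancy(0, [0, 1, 2, 3], field, coupling)   # 2**4 microstates
approx = average_occupancy(0, [0, 1], field, coupling)        # 2**2 microstates
print(exact, approx)
```

The exact sum grows as 2^N in the number of sites, whereas the clustered sum grows only with the size of the largest cluster. When the coupling between clusters is exactly zero, the two results coincide, so the error comes entirely from the ignored inter-cluster interactions.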

11.
A large share of construction material stock (MS) accumulates in urban built environments. To attain a more sustainable use of resources, knowledge about the spatial distribution of urban MS is needed. In this article, an innovative spatial analysis approach to urban MS is proposed. Within this scope, MS indicators are defined at neighborhood level and clustered with k‐means algorithms. The MS is estimated bottom‐up with (a) material‐intensity coefficients and (b) spatial data for three built environment components: buildings, road transportation, and pipes, using seven material categories. The city of Gothenburg, Sweden is used as a case study. Moreover, being the first case study in Northern Europe, the results are explored through various aspects (material composition, age distribution, material density), and, finally, contrasted on a per capita basis with other studies worldwide. The stock is estimated at circa 84 million metric tons. Buildings account for 73% of the stock, road transport 26%, and pipes 1%. Mineral‐binding materials take the largest share of the stock, followed by aggregates, brick, asphalt, steel, and wood. Per capita, the MS is estimated at 153 metric tons; 62 metric tons are residential, which, in an international context, is a medium estimate. Denser neighborhoods with a mix of nonresidential and residential buildings have a lower proportion of MS in roads and pipes than low‐density single‐family residential neighborhoods. Furthermore, single‐family residential neighborhoods cluster in mixed‐age classes and show the largest content of wood. Multifamily buildings cluster in three distinct age classes, and each represents a specific material composition of brick, mineral binding, and steel. Future work should focus on megacities and contrasting multiple urban areas and, methodologically, should concentrate on algorithms, MS indicators, and spatial divisions of urban stock.

12.
The inference of population genetic structures is essential in many research areas in population genetics, conservation biology and evolutionary biology. Recently, unsupervised Bayesian clustering algorithms have been developed to detect a hidden population structure from genotypic data, assuming, among other things, that individuals taken from the population are unrelated. Under this assumption, markers in a sample taken from a subpopulation can be considered to be in Hardy-Weinberg and linkage equilibrium. However, close relatives might be sampled from the same subpopulation and, consequently, might cause Hardy-Weinberg and linkage disequilibrium and thus bias a population genetic structure analysis. In this study, we used simulated and real data to investigate the impact of close relatives in a sample on Bayesian population structure analysis. We also showed that, when close relatives were identified by a pedigree reconstruction approach and removed, the accuracy of a population genetic structure analysis can be greatly improved. The results indicate that unsupervised Bayesian clustering algorithms cannot be used blindly to detect genetic structure in a sample with closely related individuals. Rather, when closely related individuals are suspected to be frequent in a sample, these individuals should first be identified and removed before conducting a population structure analysis.
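
The Hardy-Weinberg assumption mentioned in this abstract can be checked for a single biallelic marker with a few lines of code. The sketch below compares observed genotype counts with the proportions p², 2pq and q² expected under equilibrium and returns a chi-square statistic. It is a textbook check, not the Bayesian clustering or pedigree reconstruction used in the study; the function name and the genotype counts are hypothetical, and both alleles are assumed to be present in the sample.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Takes observed counts of the three genotypes at one biallelic marker.
    Assumes both alleles occur in the sample (otherwise an expected count is 0).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical samples with the same allele frequencies (p = 0.6).
in_equilibrium = hwe_chi_square(36, 48, 16)
heterozygote_deficit = hwe_chi_square(45, 30, 25)
print(in_equilibrium, heterozygote_deficit)
```

The first sample matches the expected counts (36, 48, 16), so its statistic is essentially zero. The second has the same allele frequencies but fewer heterozygotes than expected and gives a statistic of about 14.06, a clear departure from equilibrium for one degree of freedom.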

13.
14.
15.

Background  

A cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While successes of such analyses have been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus Pearson's correlation coefficient as a measure of dissimilarity), such groupings can often be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species.
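
The dissimilarity measure named in this abstract, one minus Pearson's correlation coefficient, is simple to compute for a pair of expression profiles. The sketch below is a plain-Python version; the function name and the three toy profiles are hypothetical, and profiles are assumed to be of equal length and not constant.

```python
import math


def pearson_dissimilarity(x, y):
    """Return 1 - r, where r is Pearson's correlation between two profiles.

    Assumes equal-length, non-constant profiles (a constant profile has
    zero variance and would cause a division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


# Hypothetical expression profiles across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape as gene_a, twice the magnitude
gene_c = [8.0, 6.0, 4.0, 2.0]   # opposite trend

print(pearson_dissimilarity(gene_a, gene_b))
print(pearson_dissimilarity(gene_a, gene_c))
```

The measure ranges from 0 for perfectly correlated profiles to 2 for perfectly anti-correlated ones. It ignores differences in magnitude, which is why gene_a and gene_b come out as essentially identical.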

16.
On the basis of simulated data, this study compares the relative performances of the Bayesian clustering computer programs structure, geneland, geneclust and a new program named tess. While all four programs can detect population genetic structure from multilocus genotypes, only the last three include simultaneous analysis of geographical data. The programs are compared with respect to their abilities to infer the number of populations, to estimate membership probabilities, and to detect genetic discontinuities and clinal variation. The results suggest that combining analyses using tess and structure offers a convenient way to address inference of spatial population structure.

17.
Comprehensive software and hardware have been developed for the processing of biosignals. Such automatic signal processing, however, has drawbacks as well as advantages. The question of the reliability of the evaluation algorithm arises when the signal is modified, in the presence of interindividual differences, and in particular when noise is superimposed. This is of great interest for long-term recording, when the original signal can no longer be inspected visually. The aim of our work was to display the signals on the screen of a monitor simultaneously with lines marking the points (start, end, extreme value, etc.) processed by the specific signal processing algorithm. The program package permits the on-line recording and monitoring of signals, the parallel processing and marking of detected events on the monitor, as well as storage of the parameters extracted. It is a very effective tool for developing, improving and monitoring algorithms and their efficiency for signal processing.

18.
19.
20.
