Similar literature (20 results)
2.
Questions: Does fuzzy clustering provide an appropriate numerical framework to manage vegetation classifications? What is the best fuzzy clustering method to achieve this? Material: We used 531 relevés from Catalonia (Spain), belonging to two syntaxonomic alliances of mesophytic and xerophytic montane pastures, and originally classified by experts into nine and 13 associations, respectively. Methods: We compared the performance of fuzzy C‐means (FCM), noise clustering (NC) and possibilistic C‐means (PCM) on four different management tasks: (1) assigning new relevé data to existing types; (2) updating types to incorporate new data; (3) defining new types with unclassified relevés; and (4) reviewing traditional vegetation classifications. Results: As fuzzy classifiers, FCM fails to indicate when a given relevé does not belong to any of the existing types; NC might leave too many relevés unclassified; and PCM membership values cannot be compared. As unsupervised clustering methods, FCM is more sensitive than NC to transitional relevés and therefore produces fuzzier classifications. PCM looks for dense regions in the space of species composition, but these are scarce when vegetation data contain many transitional relevés. Conclusions: All three models have advantages and disadvantages, although the NC model may be a good compromise between the restricted FCM model and the robust but impractical PCM model. In our opinion, fuzzy clustering might provide a suitable framework to manage vegetation classifications using a consistent operational definition of vegetation type. Regardless of the framework chosen, national/regional vegetation classification panels should promote methodological standards for classification practices with numerical tools.
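The difference between FCM and NC as classifiers comes down to their membership rules. The sketch below is a minimal illustration. It assumes Euclidean distances from one relevé to each type centroid, fuzziness m = 2 and an arbitrary noise distance delta. The function names and numbers are illustrative and not taken from the study, and PCM is omitted. FCM memberships must sum to 1 across the existing types, whereas NC can shift most of the membership into a noise class.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy C-means membership of one relevé in each existing type.

    distances: positive distances from the relevé to each type centroid.
    The memberships always sum to 1, even for a relevé far from every type.
    """
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exp for dj in distances) for di in distances]


def nc_memberships(distances, delta, m=2.0):
    """Noise clustering: the same rule plus a noise class at fixed distance delta.

    Returns (memberships in the real types, membership in the noise class).
    """
    exp = 2.0 / (m - 1.0)
    all_d = list(distances) + [delta]
    u = [1.0 / sum((di / dj) ** exp for dj in all_d) for di in all_d]
    return u[:-1], u[-1]


# A relevé far from both hypothetical types
print(fcm_memberships([8.0, 9.0]))               # about [0.56, 0.44]
print(nc_memberships([8.0, 9.0], delta=2.0)[1])  # about 0.90 (noise class)
```

For a relevé that sits far from both types, FCM still splits its membership between them, while NC assigns most of it to the noise class. This illustrates why FCM cannot flag relevés that belong to no existing type.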

3.
Atrial fibrillation (AF) and atrial flutter (AFL) are two common atrial arrhythmias encountered in clinical practice. The electrocardiogram (ECG) is widely used to diagnose these abnormalities. Conventional linear time- and frequency-domain methods cannot decipher the hidden complexity present in these signals. The ECG is inherently a non-linear, non-stationary and non-Gaussian signal. Non-linear models can provide improved results and capture minute variations present in the time series. Higher-order spectra (HOS) is a non-linear dynamical method that is highly robust to noise. In the present study, the performances of two methods are compared: (i) 3rd-order HOS cumulants and (ii) the HOS bispectrum. The 3rd-order cumulant and bispectrum coefficients are subjected to dimensionality reduction using independent component analysis (ICA) and classified using classification and regression tree (CART), random forest (RF), artificial neural network (ANN) and k-nearest neighbor (KNN) classifiers to select the best classifier. Using the KNN classifier, the ICA components of the cumulant coefficients yielded an average accuracy, sensitivity, specificity and positive predictive value (PPV) of 99.50%, 100%, 99.22% and 99.72%, respectively. Similarly, the ICA components of the HOS bispectrum coefficients yielded an average accuracy, sensitivity, specificity and PPV of 97.65%, 98.16%, 98.75% and 99.53%, respectively, using KNN. Thus, ICA applied to the 3rd-order HOS cumulants coupled with the KNN classifier performed better than the HOS bispectrum method. The proposed methodology is robust and can be used in mass screening of cardiac patients.
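The four figures reported above follow the usual confusion-matrix definitions. The sketch below shows those definitions together with a plain Euclidean KNN majority vote. It assumes the features have already been reduced (for example, to ICA components of the cumulant coefficients). The function names and toy data are illustrative and are not the authors' code.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(zip((math.dist(a, x) for a in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and PPV from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
    }
```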

4.
Wang D, Lv Y, Guo Z, Li X, Li Y, Zhu J, Yang D, Xu J, Wang C, Rao S, Yang B. Bioinformatics (Oxford, England), 2006, 22(23): 2883-2889
MOTIVATION: Microarray datasets frequently contain a large number of missing values (MVs), which need to be estimated and replaced before subsequent data mining. The focus of the paper is to study the effects of different MV treatments for cDNA microarray data on disease classification analysis. RESULTS: By analyzing five datasets, we demonstrate that among the three kinds of classifiers evaluated in this study, support vector machine (SVM) classifiers are robust to varied MV imputation methods [e.g. replacing MVs by zero, the K nearest-neighbor (KNN) imputation algorithm, local least squares imputation and Bayesian principal component analysis], while classification and regression tree classifiers are sensitive in terms of classification accuracy. KNN classifiers built on differentially expressed genes (DEGs) are robust to the varied MV treatments, but the performance of KNN classifiers based on all measured genes can deteriorate significantly when MVs are imputed for genes with a larger missing rate (MR) (e.g. MR > 5%). Generally, while replacing MVs by zero performs relatively poorly, the other imputation algorithms differ little in their effect on the classification performance of the SVM or KNN classifiers. We further demonstrate the power and feasibility of our recently proposed functional expression profile (FEP) approach as a means to handle microarray data with MVs. The FEPs, which are derived from functional modules that are enriched with sets of DEGs and thus can be consistently identified under varied MV treatments, achieve precise disease classification with better biological interpretation. We conclude that the choice of MV treatment should be determined in the context of the approaches later used for disease classification. The suggested exclusion criterion of ignoring genes with a larger MR (e.g. >5%), while justifiable for some classifiers such as KNN classifiers, should not be considered a general rule for all classifiers.
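KNN imputation, one of the MV treatments compared above, fills a missing value in a gene from the genes with the most similar expression profiles. The sketch below is a minimal version. It assumes the data are rows of genes with None marking a missing value and that similarity is the RMS difference over commonly observed arrays. It falls back to zero when no donor exists, which is equivalent to the zero-replacement treatment. The function name is hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries in each gene row with the mean of its k nearest genes.

    Distance is the RMS difference over columns observed in both rows.
    The input is not modified; a filled copy is returned.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        candidates = []
        for g, other in enumerate(rows):
            if g == i:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if shared:
                d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
                candidates.append((d, g))
        candidates.sort()
        for j in missing:
            # nearest genes that actually have a value in column j
            donors = [rows[g][j] for _, g in candidates if rows[g][j] is not None][:k]
            filled[i][j] = sum(donors) / len(donors) if donors else 0.0
    return filled
```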

5.
The updating and rethinking of vegetation classifications is important for ecosystem monitoring in a rapidly changing world, where the distribution of vegetation is shifting. The general assumption that discrete and persistent plant communities exist and can be monitored efficiently is rarely tested before a classification is undertaken. Marion Island (MI) has species-poor vegetation that is undergoing rapid environmental change. It therefore presents a unique opportunity to test whether species-poor vegetation can be classified discretely with recently developed objective classification techniques, and to relate the results to previous classifications. We classified vascular species data from 476 plots sampled across MI using Ward hierarchical clustering, divisive analysis clustering, non-hierarchical k-means and partitioning around medoids. Internal cluster validation was performed using silhouette widths, the Dunn index, connectivity of clusters and the gap statistic. Indicator species analyses were also conducted on the best-performing clustering methods. We evaluated the outputs against previously classified units. Ward clustering performed best, with the highest average silhouette width and Dunn index, as well as the lowest connectivity. The number of clusters differed among the clustering methods, but most validation measures, including those for Ward clustering, indicated that two or three clusters best fit the data. However, all classification methods produced weakly separated, highly connected clusters with low compactness and low fidelity and specificity to clusters. No classification outcome was particularly robust and effective at grouping plots into previously suggested vegetation units based on species composition alone. The relatively recent age (c. 450,000 years B.P.), glaciation history (last glacial maximum 34,500 years B.P.) and isolation of the sub-Antarctic islands may have hindered the development of strong vascular plant species assemblages with discrete boundaries. Discrete classification at the community level using species composition may not be suitable in such species-poor environments. Species-level, rather than community-level, monitoring may thus be more appropriate in species-poor environments, aligning with continuum theory rather than community theory.
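Two of the internal validation measures named above can be computed directly from plot coordinates and cluster labels. The sketch below assumes plots are numeric vectors compared by Euclidean distance (vegetation studies often use other dissimilarities) and that there are at least two clusters. The silhouette width of a plot is (b − a) / max(a, b). Here a is the mean distance to its own cluster, and b is the mean distance to the nearest other cluster. The Dunn index is the smallest between-cluster distance divided by the largest cluster diameter. Function names are hypothetical.

```python
import math


def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every point."""
    widths = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: width defined as 0
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    clusters = list(groups.values())
    min_sep = min(math.dist(p, q)
                  for a in range(len(clusters)) for b in range(a + 1, len(clusters))
                  for p in clusters[a] for q in clusters[b])
    # assumes at least one cluster has two or more distinct points
    max_diam = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam
```

Weakly separated, highly connected clusters of the kind reported for Marion Island show up as average silhouette widths near zero and small Dunn indices.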

6.
In this paper, three different clustering algorithms were applied to assemble infrared (IR) spectral maps from IR microspectra of tissues. Using spectra from a colorectal adenocarcinoma section, we show how IR images can be assembled by agglomerative hierarchical (AH) clustering (Ward's technique), fuzzy C-means (FCM) clustering, and k-means (KM) clustering. We discuss practical problems of IR imaging on tissues such as the influence of spectral quality and data pretreatment on image quality. Furthermore, the applicability of cluster algorithms to the spatially resolved microspectroscopic data and the degree of correlation between distinct cluster images and histopathology are compared. The use of any of the clustering algorithms dramatically increased the information content of the IR images, as compared to univariate methods of IR imaging (functional group mapping). Among the cluster imaging methods, AH clustering (Ward's algorithm) proved to be the best method in terms of tissue structure differentiation.
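At each step, Ward's criterion merges the pair of clusters whose fusion least increases the within-cluster sum of squares. For clusters A and B with centroids c_A and c_B, that increase is n_A·n_B / (n_A + n_B) · ||c_A − c_B||². The sketch below is a naive version, suitable only for small examples. It treats each spectrum as a plain numeric vector with no pretreatment, and the function names are hypothetical.

```python
def centroid(pts):
    """Mean vector of a list of equal-length vectors."""
    n = len(pts)
    return [sum(col) / n for col in zip(*pts)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_clusters(points, k):
    """Agglomerate single points with Ward's criterion until k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # clusters hold point indices
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = ward_cost([points[i] for i in clusters[a]],
                                 [points[i] for i in clusters[b]])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels
```

In an IR image, each pixel spectrum would then be colored by its cluster label.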

7.
Missing values are common in three-way three-mode multi-environment trial (MET) data from plant breeding programs. We proposed modifications of models for estimating missing observations in these data arrays and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was applied in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be set as missing, and the MI methods were compared based on the efficiency and accuracy with which they estimated those values. The results indicated that the models using Bayesian analysis estimated values slightly more accurately than those using non-Bayesian analysis but were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.
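Predictive mean matching, one of the four MI variants listed above, imputes an observed value borrowed from a donor rather than a model prediction. The sketch below is a deliberately simplified, single, deterministic version. It uses one predictor, an ordinary least-squares fit and the single closest donor. Proper multiple imputation would draw the regression parameters and pick randomly among several close donors, repeating the process several times. The function name is hypothetical.

```python
def predictive_mean_match(x, y):
    """Fill None values in y by predictive mean matching on predictor x.

    Fits y = b0 + b1 * x by least squares on complete pairs, then gives each
    missing y the observed y of the donor whose predicted value is closest.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mx = sum(xi for xi, _ in obs) / n
    my = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in obs)
    b1 = sxy / sxx
    b0 = my - b1 * mx

    def predict(v):
        return b0 + b1 * v

    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
            continue
        donor = min(obs, key=lambda o: abs(predict(o[0]) - predict(xi)))
        out.append(donor[1])  # a real observed value, not the regression estimate
    return out
```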

8.
Flow cytometric assessment of viability of lactic acid bacteria
The viability of lactic acid bacteria is crucial for their applications as dairy starters and as probiotics. We investigated the usefulness of flow cytometry (FCM) for viability assessment of lactic acid bacteria. The esterase substrate carboxyfluorescein diacetate (cFDA) and the dye exclusion DNA binding probes propidium iodide (PI) and TOTO-1 were tested for live/dead discrimination using a Lactococcus, a Streptococcus, three Lactobacillus, two Leuconostoc, an Enterococcus, and a Pediococcus species. Plate count experiments were performed to validate the results of the FCM assays. The results showed that cFDA was an accurate stain for live cells; in exponential-phase cultures almost all cells were labeled, while 70 degrees C heat-killed cultures were left unstained. PI did not give clear live/dead discrimination for some of the species. TOTO-1, on the other hand, gave clear discrimination between live and dead cells. The combination of cFDA and TOTO-1 gave the best results. Well-separated subpopulations of live and dead cells could be detected with FCM. Cell sorting of the subpopulations and subsequent plating on agar medium provided direct evidence that cFDA labels the culturable subpopulation and that TOTO-1 labels the nonculturable subpopulation. Applied to cultures exposed to deconjugated bile salts or to acid, cFDA and TOTO-1 proved to be accurate indicators of culturability. Our experiments with lactic acid bacteria demonstrated that the combination of cFDA and TOTO-1 makes an excellent live/dead assay with versatile applications.

9.
The wealth of interaction information provided in biomedical articles has motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing for biomedical relation extraction. The pattern clustering algorithm is based on a polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Building on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein–protein interaction extraction and (2) gene–suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods, which were rule-based, SVM-based and kernel-based, respectively. The proposed semi-supervised approach is superior to existing semi-supervised methods. The evaluation of gene–suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence-based method.
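The abstract does not give the kernel's parameters or how patterns are encoded. The sketch below therefore assumes patterns are hypothetical numeric feature vectors (for example, word counts) and shows only the standard polynomial kernel, K(x, y) = (x·y + c)^d. It also shows the squared feature-space distance that kernel-based clustering can use in place of Euclidean distance. Degree 2 and c = 1 are illustrative defaults, not the paper's settings.

```python
def polynomial_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel K(x, y) = (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def kernel_sq_distance(x, y, degree=2, c=1.0):
    """Squared distance between x and y in the kernel's implicit feature space."""
    def k(a, b):
        return polynomial_kernel(a, b, degree, c)
    return k(x, x) - 2 * k(x, y) + k(y, y)


# Two hypothetical pattern vectors (counts of three context words)
print(kernel_sq_distance([1, 0, 2], [1, 1, 2]))
```

A clustering algorithm can group patterns by this distance without ever building the expanded feature vectors explicitly.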

10.
AIMS: To use BioBall cultures as a precise reference standard to evaluate methods for the enumeration of Escherichia coli and other coliform bacteria in water samples. METHODS AND RESULTS: Eight methods were evaluated, including membrane filtration, standard plate count (pour and spread plate methods), defined substrate technology methods (Colilert and Colisure), the most probable number method and the Petrifilm disposable plate method. Escherichia coli and Enterobacter aerogenes BioBall cultures containing 30 organisms each were used. All tests were performed using 10 replicates. The mean recovery of both bacteria varied among the methods employed. CONCLUSIONS: The best and most consistent results were obtained with Petrifilm and the pour plate method. Other methods either yielded a low recovery or showed significantly greater variability between replicates. SIGNIFICANCE AND IMPACT OF THE STUDY: The BioBall is a very suitable quality control tool for evaluating the efficiency of methods for bacterial enumeration in water samples.
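Because each BioBall carries a known number of organisms (30 here), the study's two criteria can be summarized as mean recovery relative to the inoculum and the coefficient of variation across replicates. The sketch below makes that arithmetic explicit. The counts in the usage line are hypothetical, not data from the study.

```python
import statistics


def recovery_stats(counts, inoculum=30):
    """Mean percent recovery and coefficient of variation (%) across replicates."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)  # sample standard deviation; needs >= 2 replicates
    return {"mean_recovery_pct": 100 * mean / inoculum, "cv_pct": 100 * sd / mean}


# Ten hypothetical replicate counts from one method
print(recovery_stats([28, 31, 29, 30, 27, 32, 30, 29, 31, 28]))
```

A method with a mean recovery near 100% and a low CV would match what the abstract calls the "best and most consistent" results.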

11.
MOTIVATION: We recently introduced a multivariate approach that selects a subset of predictive genes jointly for sample classification based on expression data. We tested the algorithm on colon and leukemia data sets. As an extension to our earlier work, we systematically examine the sensitivity, reproducibility and stability of gene selection/sample classification to the choice of parameters of the algorithm. METHODS: Our approach combines a Genetic Algorithm (GA) and the k-Nearest Neighbor (KNN) method to identify genes that can jointly discriminate between different classes of samples (e.g. normal versus tumor). The GA/KNN method is a stochastic supervised pattern recognition method. The genes identified are subsequently used to classify independent test set samples. RESULTS: The GA/KNN method is capable of selecting a subset of predictive genes from a large noisy data set for sample classification. It is a multivariate approach that can capture the correlated structure in the data. We find that for a given data set gene selection is highly repeatable in independent runs using the GA/KNN method. In general, however, gene selection may be less robust than classification. AVAILABILITY: The method is available at http://dir.niehs.nih.gov/microarray/datamining CONTACT: LI3@niehs.nih.gov
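The GA/KNN idea can be illustrated with a much-simplified sketch. Candidate gene subsets ("chromosomes") are scored by leave-one-out KNN accuracy, and a small genetic algorithm keeps the better half and breeds the rest by crossover and mutation. The population size, mutation rate, selection scheme and function names are all assumptions for illustration. This is not the published GA/KNN implementation.

```python
import math
import random
from collections import Counter


def loo_knn_accuracy(X, y, genes, k=3):
    """Leave-one-out KNN accuracy using only the selected gene columns."""
    correct = 0
    for i in range(len(X)):
        d = sorted(
            (math.dist([X[i][g] for g in genes], [X[j][g] for g in genes]), y[j])
            for j in range(len(X)) if j != i
        )
        vote = Counter(lab for _, lab in d[:k]).most_common(1)[0][0]
        correct += vote == y[i]
    return correct / len(X)


def ga_knn(X, y, subset_size=2, pop_size=20, generations=30, seed=0):
    """Search gene subsets with a toy GA whose fitness is LOO KNN accuracy."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda s: loo_knn_accuracy(X, y, s), reverse=True)
        parents = ranked[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # one-point crossover, dropping duplicate genes
            child = list(dict.fromkeys(a[: subset_size // 2] + b))[:subset_size]
            if rng.random() < 0.3:  # mutation: swap in a gene not already present
                new = rng.randrange(n_genes)
                if new not in child:
                    child[rng.randrange(subset_size)] = new
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda s: loo_knn_accuracy(X, y, s))
```

Because the search is stochastic, rerunning with different seeds and tallying how often each gene is selected is one way to probe the repeatability the abstract discusses.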

12.
MOTIVATION: Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods. RESULTS: The new CMVE algorithm has been compared with existing estimation techniques, including Bayesian principal component analysis imputation (BPCA), least squares impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested by estimating missing values in three separate non-time-series datasets (ovarian cancer based) and one time-series dataset (yeast sporulation). Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which contained 1.7% actual missing values, to test the hypothesis that CMVE performs better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both time-series and non-time-series data, at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. AVAILABILITY: The CMVE software is available upon request from the authors.
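The evaluation protocol described above has three steps: hide entries at random with probability p, impute them, and score only the hidden entries with a normalized RMS error. The sketch below assumes one common NRMS definition, the RMS of the estimation error divided by the RMS of the true values. The paper's exact normalization may differ, and the function names are hypothetical.

```python
import math
import random


def mask_at_random(matrix, p, seed=0):
    """Hide each entry independently with probability p (None marks a hidden value)."""
    rng = random.Random(seed)
    return [[None if rng.random() < p else v for v in row] for row in matrix]


def nrms_error(true_vals, est_vals):
    """RMS of the estimation error divided by the RMS of the true values."""
    n = len(true_vals)
    err = math.sqrt(sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / n)
    ref = math.sqrt(sum(t ** 2 for t in true_vals) / n)
    return err / ref


def score_imputation(full, masked, imputed):
    """NRMS computed only over the entries hidden by mask_at_random."""
    t, e = [], []
    for fr, mr, ir in zip(full, masked, imputed):
        for fv, mv, iv in zip(fr, mr, ir):
            if mv is None:
                t.append(fv)
                e.append(iv)
    return nrms_error(t, e)
```

Repeating this for p from 0.01 to 0.2 with any imputation function reproduces the shape of the comparison, though not the paper's numbers.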

13.
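DNA methylation is an important epigenetic modification, and methylation levels have been found to be closely associated with the onset and progression of disease. Cluster analysis of methylation data therefore holds promise for discovering new disease subtypes and for establishing effective methods of disease prediction and prognosis. Fuzzy C-means (FCM), one of the traditional clustering methods, is suited to feature spaces with spherical or ellipsoidal distributions and therefore lacks generality. The Illumina Golden Gate platform describes the methylation level of a gene by the percentage of methylation at each of its methylation sites. These values lie in (0, 1) and follow a mixture of beta distributions, so FCM cannot be applied to them directly for cluster analysis. This paper therefore proposes KL-FCM, a clustering algorithm based on a Kullback-Leibler (K-L) feature measure that uses the K-L distance between samples as the criterion for partitioning them. Finally, KL-FCM is used to cluster the IRIS test dataset and gene DNA methylation level data. The experimental results show that the method achieves better classification than k-means and conventional FCM at a lower computational cost.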
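The abstract does not say exactly how the K-L distance between samples is computed. The sketch below is one plausible reading, offered as an assumption. It treats each site's methylation fraction as a Bernoulli probability (a simplification of the beta-mixture model), sums a symmetric KL divergence over sites, and uses that divergence in place of squared Euclidean distance in the FCM membership step. The center-update step is omitted, and all names are hypothetical.

```python
import math


def bernoulli_kl(p, q, eps=1e-6):
    """KL divergence between two Bernoulli distributions, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def profile_kl(x, y):
    """Symmetric KL summed over sites; x and y hold methylation fractions in (0, 1)."""
    return sum(bernoulli_kl(a, b) + bernoulli_kl(b, a) for a, b in zip(x, y))


def kl_fcm_memberships(sample, centers, m=2.0):
    """FCM-style memberships with symmetric KL replacing squared Euclidean distance."""
    d = [profile_kl(sample, c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # sample coincides with one or more centers: split membership among them
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    exp = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exp for dj in d) for di in d]
```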

14.
Questions: How similar are solutions of eight commonly used vegetation classification methods? Which classification methods are most effective according to classification validity evaluators? How do evaluators with different optimality criteria differ in their assessments of classification efficacy? In particular, do evaluators which use geometric criteria (e.g. cluster compactness) and non‐geometric evaluators (which rely on diagnostic species) offer similar classification evaluations? Methods: We analysed classifications of two vegetation data‐sets produced by eight classification methods. Classification solutions were assessed with five geometric and four non‐geometric internal evaluators. We formally introduce three new evaluators: PARTANA, an intuitive variation on evaluators which use the ratio of within/between cluster dissimilarity as the optimality criterion; an adaptation of Morisita's index of niche overlap; and ISAMIC, an algorithm which measures the degree to which species are either always present or always absent within clusters. Results and Conclusions: 1. With the exception of single linkage hierarchical clustering, classifications resulting from the eight methods were often similar. 2. Although evaluators varied in their assessment of the best overall classification method, they generally favored three hierarchical agglomerative clustering strategies: flexible beta (β = −0.25), average linkage, and Ward's linkage. 3. Among the introduced evaluators, PARTANA appears to be an effective geometric strategy which provides assessments similar to the C‐index and Gamma evaluators. The non‐geometric evaluators ISAMIC and Morisita's index demonstrate a strong bias for single linkage solutions. 4. Because non‐geometric criteria are of interest to phytosociologists, there is a strong need for their continued development for use with vegetation classifications.
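A geometric evaluator of the PARTANA type can be sketched as the ratio of mean within-cluster dissimilarity to mean between-cluster dissimilarity. Under this sketch, lower values indicate tighter, better-separated clusters. It follows the within/between dissimilarity ratio as described above, and the published PARTANA definition may differ in detail. The function name and the toy matrix are hypothetical.

```python
def within_between_ratio(dist, labels):
    """Mean within-cluster / mean between-cluster dissimilarity.

    dist: symmetric dissimilarity matrix (list of lists); labels: cluster per plot.
    """
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else between).append(dist[i][j])
    return (sum(within) / len(within)) / (sum(between) / len(between))
```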

16.
AIMS: The beer-spoilage ability of lactic acid bacteria such as Lactobacillus brevis is a strain-dependent phenomenon whose mechanism has not yet been completely clarified. To systematically identify genes that contribute to beer spoilage, a large-scale random amplified polymorphic DNA (RAPD)-based cloning method was applied. METHODS AND RESULTS: A systematic RAPD polymerase chain reaction (PCR) analysis using 600 primers was performed on beer-spoilage and nonspoilage strains of L. brevis. Of the 600 primers, three were found to amplify a single locus highly specific to beer-spoilage strains. DNA sequencing of this locus revealed a three-part operon encoding a putative glycosyl transferase, a membrane protein and a teichoic acid glycosylation protein. PCR analysis of typical beer-spoilage lactic acid bacteria suggested that this locus is highly specific to beer-spoilage strains. CONCLUSION: The cloned markers are highly specific for identifying beer-spoilage strains not only of L. brevis but also of Pediococcus damnosus, Lactobacillus collinoides and Lactobacillus coryniformis. SIGNIFICANCE AND IMPACT OF THE STUDY: This paper demonstrates that RAPD-PCR is an efficient method for cloning strain-specific genes from bacteria. The markers described here are among the most useful tools for identifying beer-spoilage strains of lactic acid bacteria.

19.
Many proteins bear multi-locational characteristics, and this phenomenon is closely related to biological function. However, most existing methods can only deal with single-location proteins. Therefore, an automatic and reliable ensemble classifier for protein subcellular multi-localization is needed. We propose a new ensemble classifier combining the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic, Gram-negative bacterial and viral proteins based on the general form of Chou's pseudo amino acid composition, i.e., GO (gene ontology) annotations, dipeptide composition and AmPseAAC (amphiphilic pseudo amino acid composition). This ensemble classifier was developed by fusing many basic individual classifiers through a voting system. The overall prediction accuracies obtained by the KNN-SVM ensemble classifier are 95.22%, 93.47% and 80.72% for the eukaryotic, Gram-negative bacterial and viral proteins, respectively. Our prediction accuracies are significantly higher than those of previous methods and indicate that our strategy better predicts the subcellular locations of multi-location proteins.
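The abstract does not detail the voting rule. The sketch below assumes each base classifier outputs a set of predicted locations. It keeps every location chosen by more than a threshold fraction of the classifiers, so a protein can be assigned several locations. The threshold, the fallback to the single most-voted location and the location names are illustrative.

```python
from collections import Counter


def vote_locations(predicted_sets, threshold=0.5):
    """Fuse multi-label predictions from several base classifiers by voting.

    A location is kept if more than `threshold` of the classifiers predict it;
    if none passes, the single most-voted location is returned.
    """
    counts = Counter(loc for s in predicted_sets for loc in set(s))
    n = len(predicted_sets)
    chosen = {loc for loc, c in counts.items() if c / n > threshold}
    if not chosen:
        chosen = {counts.most_common(1)[0][0]}
    return chosen
```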

20.
Validation methods for chemometric models are presented; validation is necessary for evaluating model performance and prediction ability. Reference methods with known performance can be employed for comparison studies. Other validation methods include test set validation and cross validation, in which some samples are set aside for testing. The choice of testing method depends mainly on the size of the original dataset: test set validation is suitable for large datasets (>50 samples), whereas cross validation is the best method for medium to small datasets (<50 samples). In this study, the K-nearest neighbour algorithm (KNN) was used as a reference method for classifying contaminated and blank corn samples. A partial least squares (PLS) regression model was evaluated using full cross validation. Mid-infrared spectra were collected using the attenuated total reflection (ATR) technique, and the fingerprint range (800–1800 cm−1) of 21 maize samples contaminated with 300–2600 μg/kg deoxynivalenol (DON) was investigated. Separation efficiency after principal component analysis/cluster analysis (PCA/CA) classification was 100%. Cross validation of the PLS model revealed a correlation coefficient of r = 0.9926 with a root mean square error of calibration (RMSEC) of 95.01. Validation results gave r = 0.8111, and a root mean square error of cross validation (RMSECV) of 494.5 was calculated. No outliers were reported. Presented at the 25th Mykotoxin Workshop in Giessen, Germany, May 19–21, 2003
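The gap between RMSEC (95.01) and RMSECV (494.5) reflects the difference between scoring a model on the samples it was fitted to and scoring it on held-out samples. The sketch below uses a simple least-squares line as a stand-in for the PLS model, which it does not reproduce. It compares the calibration error with full (leave-one-out) cross-validation error. Both errors divide by n for simplicity, whereas some chemometrics software applies a degrees-of-freedom correction to RMSEC. Function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def rmsec(x, y):
    """Root mean square error of calibration: fit and score on the same samples."""
    b0, b1 = fit_line(x, y)
    return math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))


def rmsecv(x, y):
    """Full (leave-one-out) cross validation: each sample is predicted by a model
    fitted without it."""
    errs = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errs.append(y[i] - (b0 + b1 * x[i]))
    return math.sqrt(sum(e * e for e in errs) / len(errs))
```

For least-squares fits, each leave-one-out residual is at least as large as the corresponding calibration residual. As a result, RMSECV exceeds RMSEC whenever the fit is imperfect.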


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP Registration No. 09084417)