Similar literature
Found 20 similar articles (search time: 46 ms)
1.
Fung ES, Ng MK. Bioinformation 2007, 2(5): 230-234
One application of discriminant analysis to microarray data is classifying patient and normal samples based on gene expression values. The analysis is especially important in medical trials and in the diagnosis of cancer subtypes. The main contribution of this paper is a simple Fisher-type discriminant method for gene selection in microarray data. In the new algorithm, we calculate a weight for each gene and use the weight values as an indicator to identify the subsets of relevant genes that categorize patient and normal samples. An l2-l1 norm minimization method is incorporated into the discriminant process to automatically compute the weights of all genes in the samples. Experiments on two microarray data sets show that the new algorithm generates classification results as good as those of other classification methods and effectively determines the relevant genes for classification. In this study, we demonstrate the gene-selection ability and the computational effectiveness of the proposed algorithm. Experimental results are given to illustrate the usefulness of the proposed model.
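As a rough illustration of weighting genes with a Fisher-type criterion, the sketch below scores each gene by the squared difference of the class means divided by the sum of the class variances. This is the classical per-gene Fisher score, not the l2-l1 norm minimization described in the abstract; the function names and the toy data are assumptions made for the example.

```python
from statistics import mean, pvariance


def fisher_scores(patient, normal):
    """Return one classical Fisher score per gene.

    patient, normal: lists of samples; each sample is a list of
    expression values with one entry per gene (hypothetical layout).
    """
    n_genes = len(patient[0])
    scores = []
    for g in range(n_genes):
        a = [sample[g] for sample in patient]
        b = [sample[g] for sample in normal]
        between = (mean(a) - mean(b)) ** 2      # separation of class means
        within = pvariance(a) + pvariance(b)    # spread inside each class
        # A gene that is constant in both classes gets a score of 0 here.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_genes(scores, k):
    """Indices of the k genes with the largest scores, best first."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


# Toy data, invented for illustration: rows are samples, columns are genes.
patient = [[5.0, 1.0, 2.0], [6.0, 1.2, 2.1], [5.5, 0.9, 1.9]]
normal = [[1.0, 1.1, 2.0], [1.5, 1.0, 2.2], [0.5, 0.8, 1.8]]
scores = fisher_scores(patient, normal)
print(top_genes(scores, 2))
```

In the toy data only the first gene differs clearly between the two groups, so it receives by far the largest score.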

2.
We propose a new method for tumor classification from gene expression data that consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeled data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.

3.
Analysis of microcalorimetric curves for bacterial identification   (cited by: 2; self-citations: 0; citations by others: 2)
A numerical method is suggested for the treatment of microcalorimetric curves of bacterial growth to provide a new tool for their automatic identification. In this method the microcalorimetric curves are searched against certain reference profiles (stored in a library) by means of a cross-correlation analysis and a parametric comparison. The match between the new curve and each reference profile is evaluated by means of a specific identification coefficient, which provides an objective criterion for the identification of each species. The reliability of the method is discussed.
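The library search by cross-correlation can be sketched as follows. The code covers only the cross-correlation part, not the parametric comparison, and it uses the maximum normalized correlation over a few integer lags as a stand-in score; the abstract does not define its identification coefficient, so this score, the function names, and the toy curves are all assumptions.

```python
from math import sqrt


def normalized_xcorr(a, b):
    """Zero-lag normalized (Pearson-style) correlation of two equal-length curves."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_shifted_score(curve, ref, max_lag):
    """Largest correlation over integer lags in [-max_lag, max_lag]."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = curve[lag:], ref[:len(ref) - lag]
        else:
            a, b = curve[:len(curve) + lag], ref[-lag:]
        m = min(len(a), len(b))
        if m >= 3:  # skip overlaps too short to be meaningful
            best = max(best, normalized_xcorr(a[:m], b[:m]))
    return best


def identify(curve, library, max_lag=2):
    """Return the best-matching reference name and all scores."""
    scores = {name: best_shifted_score(curve, ref, max_lag)
              for name, ref in library.items()}
    return max(scores, key=scores.get), scores


# Toy reference profiles and a new curve, invented for illustration.
library = {
    "species_A": [0, 1, 4, 9, 4, 1, 0, 0],
    "species_B": [0, 0, 1, 2, 3, 4, 5, 6],
}
new_curve = [0, 0, 1, 4, 9, 4, 1, 0]  # species_A delayed by one time step
best_name, all_scores = identify(new_curve, library)
print(best_name)
```

Searching over lags lets a curve that starts growing slightly later than its reference still match it closely.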

4.
The use of order statistics to discriminate and classify DNA ploidy patterns is proposed, especially for the classification of additional observations: whether a given sample is more likely to have come from a normal or an abnormal tissue, and with what probability, based on its ploidy pattern. The method involves the order of observations within each of several samples (e.g., euploid and aneuploid DNA patterns) and the use of subsets of the obtained order statistics as independent variables in a linear discriminant analysis. It thus replaces univariate observations by (some of) their order statistics, which are then used as the variables in the discriminant analysis. The procedure does not require normality of distributions or the transformation of nonnormal distributions, as do many discriminant functions; order statistics are usually distribution-free and thus are particularly useful for nonparametric inference. Preliminary simulation studies verified the potential usefulness of the order statistics discriminant function method as applied to DNA ploidy analysis. Its advantages compared to the usual methods for hypothesis testing, e.g., the use of the chi-square or Kolmogorov-Smirnov tests to ascertain "goodness-of-fit," are discussed. The proposed method is easy to implement and easy to interpret; it is also applicable to the study of distributions of other types of measurements.

5.
Recent technical advances in mass spectrometry (MS) have propelled this technology to the forefront of methods employed in metabolome analysis. Here, we compare two distinct analytical approaches based on MS for their potential in revealing specific metabolic footprints of yeast single-deletion mutants. Filtered fermentation broth samples were analyzed by GC-MS and direct infusion ESI-MS. The potential of both methods in producing specific and, therefore, discriminant metabolite profiles was evaluated using samples from several yeast deletion mutants grown in batch-culture conditions with glucose as the carbon source. The mutants evaluated were cat8, gln3, ino2, opi1, and nil1, all with deletion of genes involved in nutrient sensing and regulation. From the analysis, we found that both methods can be used to classify mutants, but the classification depends on which metabolites are measured. Thus, the GC-MS method is good for classification of mutants with altered nitrogen regulation as it primarily measures amino acids, whereas this method cannot classify mutants involved in regulation of phospholipid metabolism as well as the direct infusion MS (DI-MS) method. From the analysis, we find that it is possible to discriminate the mutants in both the exponential and stationary growth phases, but the data from the exponential growth phase provide more physiologically relevant information. Based on the data, we identified metabolites that are primarily involved in discrimination of the different mutants, thereby providing a link between high-throughput metabolome analysis, strain classification, and physiology.

6.
MOTIVATION: The problem of class prediction has received a tremendous amount of attention in the literature recently. In the context of DNA microarrays, where the task is to classify and predict the diagnostic category of a sample on the basis of its gene expression profile, a problem of particular importance is the diagnosis of cancer type based on microarray data. One method of classification which has been very successful in cancer diagnosis is the support vector machine (SVM). The latter has been shown (through simulations) to be superior in comparison with other methods, such as classical discriminant analysis; however, the SVM suffers from the drawback that its solution is implicit and therefore difficult to interpret. In order to remedy this difficulty, an analysis of variance decomposition using structured kernels is proposed and is referred to as the structured polychotomous machine. This technique utilizes Newton-Raphson to find estimates of coefficients followed by the Rao and Wald tests, respectively, for addition and deletion of import vectors. RESULTS: The proposed method is applied to microarray data and simulation data. The major breakthrough of our method is efficiency in that only a minimal number of genes that accurately predict the classes are selected. It has been verified that the selected genes serve as legitimate markers for cancer classification from a biological point of view. AVAILABILITY: All source codes used are available on request from the authors.

7.
Analysis of microfossil silica phytoliths is becoming an increasingly important research tool for taxonomists, archaeobotanists, and paleoecologists. Expanded use of phytolith analysis by researchers is dependent upon development of phytolith systematics. In this study, phytoliths produced by the inflorescence bracts of four species of wheat (Triticum monococcum, T. dicoccon, T. dicoccoides, and T. aestivum) and two species of barley (Hordeum vulgare and H. spontaneum) were analyzed using computer-assisted image and statistical analysis with the intent to develop taxonomic tools to distinguish among the taxa. A classification key based on significant differences among the mean morphometries of the inflorescence phytoliths produced by each species was created and tested. Discriminant analysis of the morphometries of several morphotypes of phytoliths was also conducted to determine whether this computer-assisted statistical procedure could be used as another method to classify the taxa and to determine which morphotypes have measurements that can best be used in discriminant functions. Test results indicated that, at the genus level, both the classification key and discriminant analysis of certain morphotypes of phytoliths were relatively reliable tools for distinguishing among phytoliths produced in the inflorescence bracts of the taxa considered. For distinguishing among the taxa at the species level, the classification key was most reliable. Of the discriminant analyses tested, that based on all the phytolith morphotypes combined was more reliable than those based on only one morphotype.

8.
New methodologies for surveillance and identification of Mycobacterium tuberculosis are required to stem the spread of disease worldwide. In addition, the ability to discriminate mycobacteria at the strain level may be important to contact or source case investigations. To this end, we are developing MALDI-TOF MS methods for the identification of M. tuberculosis in culture. In this report, we describe the application of MALDI-TOF MS, as well as statistical analysis including linear discriminant and random forest analysis, to 16 medically relevant strains from four species of mycobacteria: M. tuberculosis, M. avium, M. intracellulare, and M. kansasii. Although species discrimination can be accomplished on the basis of unique m/z values observed in the MS fingerprint spectrum, discrimination at the strain level is predicated on the relative abundance of shared m/z values among strains within a species. For the 16 mycobacterial strains investigated in the present study, it is possible to unambiguously identify strains within a species on the basis of MALDI-TOF MS data. The error rate for classification of individual strains using linear discriminant analysis was 0.053 using 37 m/z variables, whereas the error rate for classification of individual strains using random forest analysis was 0.023 using only 18 m/z variables. In addition, using random forest analysis of MALDI-TOF MS data, it was possible to correctly classify bacterial strains as either M. tuberculosis or non-tuberculous with 100% accuracy.

9.
OBJECTIVE: To investigate the potential value of morphometry and discriminant analysis for the classification of benign and malignant gastric cells and lesions. STUDY DESIGN: The data set consisted of 13,300 cells from 120 cases composed of 30 cases of cancer, 26 cases of gastritis and 64 cases of ulcer according to the final histologic diagnosis. The cytologic diagnosis was divided into 5 categories (gastritis, ulcer, inflammatory dysplasia, cancer and true dysplasia). Classification was attempted at 2 levels: the cell level to classify individual cells and the case level to classify individual cases. For the cellular classification, the measured cells from 50% of available cases were selected as a training set to construct a model. The cells from the remaining cases were used as a test set to validate the model. Similarly, for case classification, the same 50% of cases that were used for cell classification were used as a training set and the remaining cases as a test set. Images of routinely processed gastric smears stained by the Papanicolaou technique were analyzed by a customized image analysis system. RESULTS: Application of discriminant analysis on the test set gave correct classification of 98.4% of benign cells and 67.1% of malignant cells. On case classification, 100% accuracy was achieved for benign and malignant cases, both for the training and test sets. CONCLUSION: The application of discriminant analysis described in this paper could produce significant classification results at the cellular and individual case level.

10.
Acoustic methods may improve the ability to identify cetacean species during shipboard surveys. Whistles were recorded from nine odontocete species in the eastern tropical Pacific to determine how reliably these vocalizations can be classified to species based on simple spectrographic measurements. Twelve variables were measured from each whistle (n = 908). Parametric multivariate discriminant function analysis (DFA) correctly classified 41.1% of whistles to species. Non-parametric classification and regression tree (CART) analysis resulted in 51.4% correct classification. Striped dolphin whistles were most difficult to classify. Whistles of bottlenose dolphins, false killer whales, and pilot whales were most distinctive. Correct classification scores may be improved by adding prior probabilities that reflect species distribution to classification models, by measuring alternative whistle variables, by using alternative classification techniques, and by localizing vocalizing dolphins when collecting data for classification models.
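The suggestion of adding prior probabilities that reflect species distribution can be illustrated with Bayes' rule: the posterior for each species is proportional to its prior times the likelihood the classifier assigns to the whistle. The sketch below is generic and is not the DFA or CART model of the study; the likelihoods and priors are invented numbers, and the function name is an assumption.

```python
def posterior(likelihoods, priors):
    """Combine per-species likelihoods with prior probabilities (Bayes' rule)."""
    unnormalized = {s: likelihoods[s] * priors[s] for s in likelihoods}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


# Invented likelihoods of one whistle under each species model.
likelihoods = {"striped dolphin": 0.30, "bottlenose dolphin": 0.45, "pilot whale": 0.25}

# Equal priors: the decision follows the likelihoods alone.
uniform = {species: 1 / 3 for species in likelihoods}
# Invented priors standing in for how often each species occurs in the survey area.
informed = {"striped dolphin": 0.7, "bottlenose dolphin": 0.2, "pilot whale": 0.1}

flat_post = posterior(likelihoods, uniform)
informed_post = posterior(likelihoods, informed)
print(max(flat_post, key=flat_post.get))
print(max(informed_post, key=informed_post.get))
```

With equal priors the whistle is assigned to the species with the highest likelihood; with the informed priors the more common species wins, which is how a prior can change a borderline classification.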

11.
Do KA, Kirk K. Biometrics 1999, 55(1): 174-181
Principal component analysis enhanced by the use of smoothing is used in conjunction with discriminant analysis techniques to devise a statistical classification method for the analysis of event-related potential (ERP) data. A training set of premedication potentials collected from adolescents with attention-deficit hyperactivity disorder (ADHD) is used to predict whether adolescents from an independent subject group will respond to long-term medication. Comparison of outcome prediction rates demonstrates that this method, which uses information from the whole ERP curve, is superior to the classification technique currently used by clinicians, which is based on a single ERP curve feature. The need to administer an initial dose of medication to classify patients is also eliminated.

12.
Hidden Markov models (HMMs) are introduced for the offline classification of single-trial EEG data in a brain-computer interface (BCI). The HMMs are used to classify Hjorth parameters calculated from bipolar EEG data recorded during the imagination of a left or right hand movement. The effects of different types of HMMs on the recognition rate are discussed. Furthermore, a comparison of the results achieved with the linear discriminant (LD) and the HMM is presented.

13.
Metabolic fingerprinting of biofluids like urine is a useful technique for detecting differences between individuals. With this approach, it might be possible to classify samples according to their biological relevance. In Part 1 of this work a method for the comprehensive screening of metabolites was described, using two different liquid chromatography (LC) column set-ups and detection by electrospray ionization mass spectrometry (ESI-MS). Pretreatment of the resulting data is needed to reduce the complexity of the data and to obtain useful metabolic fingerprints. Three different approaches, i.e., reduced dimensionality (RD), MarkerLynx, and MS Resolver, were compared for the extraction of information. The pretreated data were then subjected to multivariate data analysis by partial least squares discriminant analysis (PLS-DA) for classification. Combining the two chromatographic procedures with data analysis enhanced both the detection of metabolites and the discovery of the metabolic fingerprints that govern classification. Additional potential biomarkers or xenobiotic metabolites were detected in the fraction containing highly polar compounds that are normally discarded when using reversed-phase liquid chromatography.

14.
1. Early versions of the river invertebrate prediction and classification system (RIVPACS) used TWINSPAN to classify reference sites based on the macro-invertebrate fauna, followed by multiple discriminant analysis (MDA) for prediction of the fauna to be expected at new sites from environmental variables. This paper examines some alternative methods for the initial site classification and a different technique for prediction. 2. A data set of 410 sites from RIVPACS II was used for initial screening of seventeen alternative methods of site classification. Multiple discriminant analysis was used to predict classification group from environmental variables. 3. Five of the classification–prediction systems which showed promise were developed further to facilitate prediction of taxa at species and at Biological Monitoring Working Party (BMWP) family level. 4. The predictive capability of these new systems, plus RIVPACS II, was tested on an independent data set of 101 sites from locations throughout Great Britain. 5. Differences between the methods were often marginal but two gave the most consistently reliable outputs: the original TWINSPAN method, and the ordination method semi-strong hybrid multidimensional scaling (SSH) followed by K-means clustering. 6. Logistic regression, an alternative approach to prediction which does not require the prior development of a classification system, was also examined. Although its performance fell within the range offered by the other five systems tested, it conveyed no advantages over them. 7. This study demonstrated that several different multivariate methods were suitable for developing a reliable system for predicting expected probability of occurrence of taxa. This is because the prediction system involves a weighted average smoothing across site groupings. 8. Hence, the two most promising procedures for site classification, coupled to MDA, were both used in the exploratory analyses for RIVPACS III development, which utilized over 600 reference sites.

15.
MOTIVATION: Several pattern discovery methods have been proposed to detect over-represented motifs in upstream sequences of co-regulated genes, and are used, for example, to predict cis-acting elements from clusters of co-expressed genes. The clusters to be analyzed are often noisy, containing a mixture of co-regulated and non-co-regulated genes. We propose a method to discriminate co-regulated from non-co-regulated genes on the basis of counts of pattern occurrences in their non-coding sequences. METHODS: String-based pattern discovery is combined with discriminant analysis to classify genes on the basis of putative regulatory motifs. RESULTS: The approach is evaluated by comparing the significance of patterns detected in annotated regulons (positive control), random gene selections (negative control) and high-throughput regulons (noisy data) from the yeast Saccharomyces cerevisiae. The classification is evaluated on the annotated regulons, and the robustness and rejection power are assessed with mixtures of co-regulated and random genes.

16.

Purpose

Classifying different gait patterns is a frequent task in gait assessment. The usual approach of finding base vectors by principal component analysis (PCA) is replaced by an iterative application of the support vector machine (SVM). The aim was to use classifiability instead of variability to build a subspace (SVM space) that contains the information about classifiable aspects of a movement. The first discriminant of the SVM space will be compared to a discriminant found by an independent component analysis (ICA) in the SVM space.

Methods

Eleven runners ran using shoes with different midsoles. Kinematic data, representing the movements during stance phase when wearing the two shoes, was used as input to a PCA and SVM. The data space was decomposed by an iterative application of the SVM into orthogonal discriminants that were able to classify the two movements. The orthogonal discriminants spanned a subspace, the SVM space. It represents the part of the movement that allowed classifying the two conditions. The data in the SVM space was reconstructed for a visual assessment of the movement difference. An ICA was applied to the data in the SVM space to obtain a single discriminant. Cohen's d effect size was used to rank the PCA vectors that could be used to classify the data, the first SVM discriminant or the ICA discriminant.

Results

The SVM base contains all the information that discriminates the movement of the two shod conditions. It was shown that the SVM base contains some redundancy and a single ICA discriminant was found by applying an ICA in the SVM space.

Conclusions

A combination of PCA, SVM and ICA is best suited to extract all parts of the gait pattern that discriminate between the two movements and to find a discriminant for the classification of dichotomous kinematic data.

17.
Within the framework of Fisher's discriminant analysis, we propose a multiclass classification method which embeds variable screening for ultrahigh-dimensional predictors. Leveraging interfeature correlations, we show that the proposed linear classifier recovers informative features with probability tending to one and can asymptotically achieve a zero misclassification rate. We evaluate the finite sample performance of the method via extensive simulations and use this method to classify posttransplantation rejection types based on patients' gene expressions.

18.
A method is proposed for estimating the quality of animal habitats from field counts with positioning routes and tracks by means of GPS, multichannel Landsat data, digital elevation model, and discriminant analysis. The distribution of American and European minks is analyzed to demonstrate the principle of choosing an optimal method for analyzing the environmental characteristics that determine the distribution of species and for mapping and estimating the quality of habitats and the probability of track detection. The outlook and some problems of implementation of the proposed approach are discussed.

19.
Robust PCA and classification in biosciences   (cited by: 7; self-citations: 0; citations by others: 7)
MOTIVATION: Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good results in the presence of outlying measurements. RESULTS: First, we propose a robust PCA (ROBPCA) method for high-dimensional data. It combines projection-pursuit ideas with robust estimation of low-dimensional data. We also propose a diagnostic plot to display and classify the outliers. This ROBPCA method is applied to several bio-chemical datasets. In one example, we also apply a robust discriminant method on the scores obtained with ROBPCA. We show that this combination of robust methods leads to better classifications than classical PCA and quadratic discriminant analysis. AVAILABILITY: All the programs are part of the Matlab Toolbox for Robust Calibration, available at http://www.wis.kuleuven.ac.be/stat/robust.html.

20.
The classification of microorganisms by high-dimensional phenotyping methods such as FTIR spectroscopy is often a complicated process due to the complexity of microbial phylogenetic taxonomy. A hierarchical structure developed for such data can often facilitate the classification analysis. The hierarchical tree structure can either be imposed on a given set of phenotypic data by integrating the phylogenetic taxonomic structure or set up by revealing the inherent clusters in the phenotypic data. In this study, we compare different approaches to hierarchical classification of microorganisms based on high-dimensional phenotypic data. A set of 19 different species of molds (filamentous fungi) obtained from the mycological strain collection of the Norwegian Veterinary Institute (Oslo, Norway) is used for the study. Hierarchical cluster analysis is performed for setting up the classification trees. Classification algorithms such as artificial neural networks (ANN), partial least-squares discriminant analysis and random forest (RF) are used and compared. The two methods, ANN and RF, outperformed all the other approaches even though they did not use a predefined hierarchical structure. To our knowledge, the RF approach is used here for the first time to classify microorganisms by FTIR spectroscopy.
