Similar articles
 20 similar articles found; search time 15 ms
1.
Identification of protein-coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is well suited to such problems. Outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods (including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)) do not work well when the data set contains outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characters as discriminant variables, and robust discriminant analysis gives a satisfactory result.

2.
Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In these instances, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantage that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success is not dependent on data meeting normality conditions or covariance homogeneity, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with LDA, using typical morphometric data. With data from modern hominoids, the results show that both techniques perform nearly equally. With complete data sets, LDA may be a better choice, as is shown in this example, but with missing observations, classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and generally are not designed to address the problem of missing data. Testing of data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever data do not meet relevant assumptions. They are highly recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables.

3.
Application and comparison of sex discriminant functions in different populations led to the conclusion that a certain combination and weighting of a few sex dimorphism variables (in this study we only used craniometric variables) can give a good discrimination between male and female individuals, independent of the racial group to which this function is applied. In our study, the sex-discriminatory power of five discriminant functions which were based on different ordination and selection procedures (e.g. professional knowledge, stepwise discriminant analysis, literature) of the cranial variables is compared. These discriminant functions were applied to three different data sets, the first being skull measurements from an Amsterdam series (Europids), the second skull measurements of a Zulu series (Negrids) and the third skull measurements of a Japan series (Mongolids). Our decision as to whether a function is a good or less good sex-discriminating function is determined by the Dt values (these values give an idea about the discriminatory value of the discriminant function when applied to a new test sample), the number of variables necessary to obtain this Dt and the location of the sectioning point (i.e. comparison between the estimation of the sectioning point and the “real” sectioning point). These discriminant functions were compared with Giles and Elliot's (1962, 1963) “race-independent” sex function.

4.
The Fourier transform (FT) method was applied to specify the distribution of 14 predefined groups of amino acids (64 residues) at both termini of annotated type III and type I secreted proteins from proteobacteria. Type I proteins displayed a higher occurrence of significant periodicities at both C- and N-termini, indicating potent features to discriminate between secretion types, particularly by the use of variables selected from the full periodicity profiles at 19 orders of FT. Fisher's linear discriminant analysis, together with the stepwise selection of variables throughout equal pairs of combinations for all predefined groups of residues, revealed the C-terminal harmonics of aromatic (HFWY) and aliphatic (VLIA) residues as a set of strong predictor variables to classify both types of secreted proteins with an accuracy of 100% for original grouped cases and 96.4% for cross-validated grouped cases. The prediction accuracy of the proposed discriminant function was estimated by repeated k-fold cross-validation procedures where the original data set was randomly divided into k subsets, with one of the k subsets serving as the test set and the remaining data forming the training set. The average error rate computed across all k trials and repeats did not exceed that of the leave-one-out procedure. The proposed set of predictor variables could be used to assess the compatibility between secretion pathways and secretion substrates of proteobacteria by means of discriminant analysis.
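The repeated k-fold procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the seed handling and the toy one-dimensional classifier are hypothetical stand-ins, not the authors' discriminant function or software.

```python
import random


def repeated_k_fold(n_samples, k, repeats, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        # Fold f takes every k-th shuffled index, so fold sizes differ by at most 1.
        folds = [indices[f::k] for f in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


def cross_validated_error(X, y, k, repeats, train_and_predict, seed=0):
    """Mean of the per-fold error rates over all folds and repeats."""
    fold_errors = []
    for _, _, train_idx, test_idx in repeated_k_fold(len(X), k, repeats, seed):
        predicted = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        wrong = sum(1 for p, i in zip(predicted, test_idx) if p != y[i])
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / len(fold_errors)


def nearest_mean_classifier(train_X, train_y, test_X):
    """Toy 1-D classifier (assumption: a placeholder for a real discriminant function)."""
    means = {}
    for label in set(train_y):
        values = [x for x, lab in zip(train_X, train_y) if lab == label]
        means[label] = sum(values) / len(values)
    return [min(means, key=lambda lab: abs(x - means[lab])) for x in test_X]
```

Setting k equal to the number of samples and repeats to 1 reduces this scheme to leave-one-out, which is the baseline the abstract compares against.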

5.
OBJECTIVE: To investigate the potential value of morphometry and discriminant analysis for the classification of benign and malignant gastric cells and lesions. STUDY DESIGN: The data set consisted of 13,300 cells from 120 cases composed of 30 cases of cancer, 26 cases of gastritis and 64 cases of ulcer according to the final histologic diagnosis. The cytologic diagnosis was divided into 5 categories (gastritis, ulcer, inflammatory dysplasia, cancer and true dysplasia). Classification was attempted at 2 levels: the cell level to classify individual cells and the case level to classify individual cases. For the cellular classification the measured cells from 50% of available cases were selected as a training set to construct a model. The cells from the remaining cases were used as a test set to validate the model. Similarly for case classification, the same 50% of cases that were used for cell classification were used as a training set and the remaining cases as a test set. Images of routinely processed gastric smears stained by the Papanicolaou technique were analyzed by a customized image analysis system. RESULTS: Application of discriminant analysis on the test set gave correct classification of 98.4% of benign cells and 67.1% of malignant cells. On case classification, 100% accuracy was achieved for benign and malignant cases, both for the training and test sets. CONCLUSION: The application of discriminant analysis described in this paper could produce significant classification results at the cellular and individual case level.

6.
Discrete characters of the occlusal surface (additional cusps) have been studied to elaborate a new approach to the identification of the Ground Squirrel species Spermophilus odessanus, S. suslicus, S. pygmaeus, S. citellus, and S. xanthoprymnus. Data on the presence/absence of the additional cusps have been represented as star plots and, in addition, have been studied using discriminant function analysis. The species‐specific sets of the characters (patterns of bunodonty) have been revealed and are of high diagnostic value. The Citellus‐set is defined by the presence of mesostyles and the rareness of the metastylids, paraconules and metaconules, hypostyles and protostyles. The Pygmaeus‐set is characterized by the presence of additional cusps in the lower cheek teeth. The Odessanus‐oriented set is found in Spermophilus pygmaeus, S. odessanus, and S. suslicus. The relatively high frequency of additional cusps of the metaloph and the paraloph is characteristic for this set. The Plesiomorphic‐set (characters shared by all the studied species and for this reason regarded herein as ancestral) is found in S. xanthoprymnus. The patterns of bunodonty serve as diagnostic criteria only as a whole: the shape of a star plot (relations among the character frequencies), rather than certain character values, is indicative. An optimal level of identification of species is possible based on the combination of the discrete characters mentioned and on the size parameters of the third upper molar. The occlusal sets are intended to remain stable during the time of species existence and seem to correspond to trends in specialization. The functional meaning of the sets can be explained by the dependence between the presence/absence of the discrete characters and the shape of the crown and its main lophs. Each pattern is likely to correspond to a trophic niche, and this niche corresponds to the species. J. Morphol. 277:814–825, 2016. © 2016 Wiley Periodicals, Inc.

7.
Conditional multivariate normal density functions are used to construct conditional quadratic discriminant functions that adjust for covariate differences between disease groups. An expected actual error rate for the conditional discriminant function is defined. The purpose of this paper is to use the conditional quadratic discriminant function and its misclassification error rate to help determine whether a set of discriminators is a good biological marker for disease screening. The conditional quadratic discriminant analysis is illustrated using data from two alcoholism classification problems. It is shown how the discriminant functions can identify a set of variables that can be used as biological markers.

8.
We present a method of data reduction using a wavelet transform in discriminant analysis when the number of variables is much greater than the number of observations. The method is illustrated with a prostate cancer study, where the sample size is 248, and the number of variables is 48,538 (generated using the ProteinChip technology). Using a discrete wavelet transform, the 48,538 data points are represented by 1271 wavelet coefficients. Information criteria identified 11 of the 1271 wavelet coefficients with the highest discriminatory power. The linear classifier with the 11 wavelet coefficients detected prostate cancer in a separate test set with a sensitivity of 97% and specificity of 100%.

9.
Fermentation database mining by pattern recognition   Cited by: 1 (self-citations: 0; cited by others: 1)
A large volume of data is routinely collected during the course of typical fermentation and other processes. Such data provide the required basis for process documentation and occasionally are also used for process analysis and improvement. The information density of these data is often low, and automatic condensing, analysis, and interpretation ("database mining") are highly desirable. In this article we present a methodology whereby process variables are processed to create a database of derivative process quantities representative of the global patterns, intermediate trends, and local characteristics of the process. A powerful search algorithm subsequently attempts to extract the specific process variables and their particular attributes that uniquely characterize a class of process outcomes such as high- or low-yield fermentations. The basic components of our pattern recognition methodology are described along with applications to the analysis of two sets of data from industrial fermentations. Results indicate that truly discriminating variables do exist in typical fermentation data and they can be useful in identifying the causes or symptoms of different process outcomes. The methodology has been implemented in user-friendly software named db-miner, which facilitates the application of the methodology for efficient and speedy analysis of fermentation process data. (c) 1997 John Wiley & Sons, Inc. Biotechnol Bioeng 53: 443-452, 1997.

10.
This study explores various options available for choosing the number of principal coordinates m in the canonical analysis of principal coordinates ‘CAP’, a useful procedure that has wide‐ranging application wherever multivariate data sets are collected or generated. Choosing too few coordinates (small m) in this constrained (i.e. hypothesis‐based) ordination procedure may lead to inadequate separation of the groups (when used as a canonical discriminant analysis) or to inadequate correlation between explanatory and response variables (when used as a canonical correlations analysis), whereas choosing too many (large m) may lead to overparameterization, resulting in overfitting of the data and spurious relationships. It is shown here that the optimum number of principal coordinates is simply the one that results in the smallest P value in the canonical analysis carried out using permutations. For data in which more than one m value results in the same minimum P value, m should be chosen from that set to be the number of principal coordinates that minimizes the leave‐one‐out residual sum of squares. This choice of m provides suitable solutions for each of the 17 case studies investigated here (which yielded 17 canonical discriminant analyses and 7 canonical correlation analyses).
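The selection rule stated in this abstract (smallest permutation P value, ties broken by the smallest leave-one-out residual sum of squares) might be written as the short sketch below. It assumes the P values and residual sums of squares have already been computed elsewhere for each candidate m; the function name and the tuple layout are hypothetical.

```python
def choose_m(candidates):
    """Pick the number of principal coordinates m.

    candidates: list of (m, p_value, loo_rss) tuples, one per candidate m
    (assumed layout; the values come from a separate CAP analysis).
    """
    min_p = min(p for _, p, _ in candidates)
    # Permutation P values are discrete, so exact ties at the minimum are common.
    tied = [c for c in candidates if c[1] == min_p]
    # Among the tied candidates, prefer the smallest leave-one-out residual SS.
    best = min(tied, key=lambda c: c[2])
    return best[0]
```

For example, if m = 3 and m = 4 share the minimum P value, the one with the lower leave-one-out residual sum of squares is returned.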

11.
  1. When we collect the growth curves of many individuals, orderly variation in the curves is often observed rather than a completely random mixture of various curves. Small individuals may exhibit similar growth curves, but the curves differ from those of large individuals, whereby the curves gradually vary from small to large individuals. It has been recognized that after standardization with the asymptotes, if all the growth curves are the same (anamorphic growth curve set), the growth curve sets can be estimated using nonchronological data; otherwise, that is, if the growth curves are not identical after standardization with the asymptotes (polymorphic growth curve set), this estimation is not feasible. However, because a given set of growth curves determines the variation in the observed data, it may be possible to estimate polymorphic growth curve sets using nonchronological data.
  2. In this study, we developed an estimation method by deriving the likelihood function for polymorphic growth curve sets. The method involves simple maximum likelihood estimation. The weighted nonlinear regression and least‐squares method after the log‐transform of the anamorphic growth curve sets were included as special cases.
  3. The growth curve sets of the height of cypress (Chamaecyparis obtusa) and larch (Larix kaempferi) trees were estimated. With the model selection process using the AIC and likelihood ratio test, the growth curve set for cypress was found to be polymorphic, whereas that for larch was found to be anamorphic. Improved fitting using the polymorphic model for cypress is due to resolving underdispersion (less dispersion in real data than model prediction).
  4. The likelihood function for model estimation depends not only on the distribution type of asymptotes, but the definition of the growth curve set as well. Consideration of these factors may be necessary, even if environmental explanatory variables and random effects are introduced.

12.
The purpose of this study was to analyze exercise-induced leg fatigue during a dynamic fatiguing task by examining the shapes of power vs. time curves through the combined use of several statistical methods: B-spline smoothing, functional principal components and (supervised and unsupervised) classification. In addition, granulometric size distributions were also computed to allow for comparison of curves coming from different subjects. Twelve physically active men participated in one acute heavy-resistance exercise protocol which consisted of five sets of 10 repetition maximum leg press with 120 s of rest between sets. To obtain a smooth and accurate representation of the data, a basis of 180 B-splines was used. Functional principal component (FPC) analysis was used to find the dominant modes of variation in the curves. A multivariate cluster analysis of the FPC scores and a k-nearest neighbor classification led to three interpretable groups corresponding to different levels of fatigue. Fatigue-induced changes in the shapes of the power curves were evident, in which curves progressively flatten and develop a second power peak. In a practical setting FPC analysis greatly reduces dimensionality and the use of granulometries allows for comparison of the curve shapes without distorting the time scale. In contrast to the present methodology, which considers each curve as a datum, classical statistical approaches using summary parameters of time series may lead to limited information about the impact of dynamic fatiguing protocols on kinematic and kinetic time-course changes in curve shapes.

13.
The modern wine industry needs tools for process control and quality assessment in order to better manage fermentation or bottling processes. During wine fermentation it is important to measure both substrate and product concentrations (e.g. sugars, phenolic compounds); however, the analysis of these compounds by traditional means requires sample preparation and in some cases several steps of purification are needed. The combination of visible/near-infrared (Vis/NIR) spectroscopy and chemometrics potentially provides an ideal solution to accurately and rapidly monitor physical or chemical changes in wine during processing without the need for chemical analysis. The aim of this study was to assess the possibility of combining spectral and multivariate techniques, such as principal component analysis (PCA), discriminant partial least squares (DPLS), or linear discriminant analysis (LDA), to monitor time-related changes that occur during red wine fermentation. Samples (n = 652) were collected at various times from several pilot scale fermentations with grapes from either Cabernet Sauvignon or Shiraz varieties, over three vintages (2001-2003) and scanned using a monochromator instrument (Foss-NIRSystems 6500, Silver Spring, MD) in transmission mode (400-2,500 nm). PCA was used to demonstrate consistent progressive spectral changes that occur through the time course of the fermentation. LDA using PCA scores showed that regardless of variety or vintage, samples belonging to a particular time point in fermentation could be correctly classified. This study demonstrates the potential of Vis/NIR spectroscopy combined with chemometrics as a tool for the rapid monitoring of red wine fermentation.

14.
Discriminant analysis of microcalorimetric data of bacterial growth   Cited by: 2 (self-citations: 0; cited by others: 2)
In this work a bacterial classification method based on the discriminant analysis of the microcalorimetric data provided by the growth power-time (p-t) curves is developed. This method is applied to classify several species of Enterobacteria of different origins, and the results are compared with those obtained by conventional techniques. The proposed analysis allows us to classify bacteria into species and discriminate among strains of the same species. The classification is carried out using one run of each isolate after standardization of inocula and growth conditions. The discrimination power of available microcalorimetric data is also discussed, and the most discriminant set of data is proposed as the input variables of the analysis. Finally, the advantages of microcalorimetry as a taxonomical technique are discussed.

15.
16.
Objective: The discrimination of hyperchromatic crowded cell groups (HCCGs) in cervical cytology is a difficult and error‐prone interpretive task. While the classic features of dyskaryosis are of undoubted value, the contribution of size, shape and colour intensity of HCCGs is less certain. This study employed morphometric analysis to determine whether HCCG area, shape and colour intensity are useful in categorising them. Methods: Seventy‐five digital images from each of six categories of HCCG were subjected to image analysis. Ten variables relating to HCCG size, shape and colour intensity were assessed by discriminant function analysis. A further 28 cases were employed as a test set to determine the classification accuracy of the discriminant model. All samples were SurePath liquid‐based cytology preparations. Results: Nine of the 10 variables contributed significantly to the model (P < 0.001) but no single variable had sufficient discriminative ability. Classification accuracy was highest for abnormal endocervical HCCGs and lowest for squamous metaplastic cells (64.0 vs. 17.3% correct classification rate). The accuracy of the model for distinguishing normal and abnormal HCCGs was 70.0%, which was significantly higher than chance (P < 0.0001), but this reduced to 64.3% for the test cases, which was no better than chance (P > 0.05). Conclusions: The area, shape and colour intensity of HCCGs, either alone or in combination, have little discriminative value. Practitioners and trainers should focus on the well‐established features of dyskaryosis, such as chromatin pattern, nuclear membrane irregularities and group architecture. In terms of morphometric analysis, DNA ploidy and chromatin texture analysis may be more fruitful avenues of investigation.

17.
Although dose-response curves are commonly used to describe in vivo cutaneous α-adrenergic responses, modeling parameters and analysis methods are not consistent across studies. The goal of the present investigation was to compare three analysis methods for in vivo cutaneous vasoconstriction studies using one reference data set. Eight women (22 ± 1 yr, 24 ± 1 kg/m²) were instrumented with three cutaneous microdialysis probes for progressive norepinephrine (NE) infusions (1 × 10⁻⁸, 1 × 10⁻⁶, 1 × 10⁻⁵, 1 × 10⁻⁴, and 1 × 10⁻³ logM). NE was infused alone or co-infused with NG-monomethyl-L-arginine (L-NMMA, 10 mM) or Ketorolac tromethamine (KETO, 10 mM). For each probe, dose-response curves were generated using three commonly reported analysis methods: 1) nonlinear modeling without data manipulation, 2) nonlinear modeling with data normalization and constraints, and 3) percent change from baseline without modeling. Not all data conformed to sigmoidal dose-response curves using analysis 1, whereas all subjects' curves were modeled using analysis 2. When analyzing only curves that fit the sigmoidal model, NE + KETO induced a leftward shift in ED50 compared with NE alone with analyses 1 and 2 (F test, P < 0.05) but only tended to shift the response leftward with analysis 3 (repeated-measures ANOVA, P = 0.08). Neither maximal vasoconstrictor capacity (Emax) in analysis 1 nor percent change in cutaneous vascular conductance (CVC) from baseline in analysis 3 was altered by blocking agents. In conclusion, although the overall detection of curve shifts and interpretation was similar between the two modeling methods of curve fitting, analysis 2 produced more sigmoidal curves.
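To make the difference between the modeling approaches and the percent-change approach more concrete, here is a minimal sketch. The abstract does not give the equation the authors fitted, so the four-parameter logistic form below is an assumption (a commonly used sigmoidal dose-response model), and all function and parameter names are hypothetical.

```python
def sigmoid_response(log_dose, bottom, top, log_ed50, hill_slope=1.0):
    """Four-parameter logistic curve (assumed model, not taken from the abstract).

    The response moves from `bottom` to `top` as the log dose increases and
    sits halfway between them when log_dose == log_ed50.
    """
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ed50 - log_dose) * hill_slope))


def percent_change_from_baseline(values, baseline):
    """Analysis 3 style summary: no curve model, only percent change per dose."""
    return [100.0 * (v - baseline) / baseline for v in values]
```

Under this assumed model, a leftward shift corresponds to a smaller log_ed50, and fixing bottom and top at 0 and 100 after normalization is one way to read the "normalization and constraints" of analysis 2.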

18.
Discriminant analyses of 23 digital and 15 palmar quantitative dermatoglyphic variables of 1364 Sardinians, 689 males and 675 females, were performed to identify biological relationships among five Sardinian linguistic groups of both sexes. By various subsets of dermatoglyphic variables (23 and 20 digital, 15 and 14 palmar, 4 summary traits) MANOVA revealed high intergroup heterogeneity among the groups of both sexes and within each sex. In the latter case the males are an exception when 15 and 14 (MLI removed) palmar variables are used. Standard discriminant analysis of the 23 digital variables, i.e. the radial and ulnar ridge counts on each of the 10 fingers plus total finger ridge count (TFRC), absolute finger ridge count (AFRC) and pattern intensity (PI), resulted in imperfect separation of males and females and an unclear picture of the biological relationships among the groups. In contrast, standard discriminant analysis of 20 digital variables (TFRC, AFRC and PI were removed from the analysis) resulted in separation of the sexes and a pattern of relationships among the populations consistent with their ethno-historical backgrounds. Standard discriminant analysis of 15 palmar dermatoglyphic variables failed to provide separation of the sexes and produced a pattern of relationships in disagreement with both the linguistic and ethno-historical backgrounds, even after removing MLI (Main Line Index). Standard discriminant analysis of 4 summary dermatoglyphic variables (TFRC, AFRC, PI and MLI) yielded imperfect separation of males and females and an unclear pattern of relationships. By stepwise discriminant analysis with p ≤ 0.01 as F-to-enter and p ≤ 0.05 as F-to-remove, only 4 of the 38 digital and palmar variables were in the model (URC R5, RRC L5, RRC R5, URC R4). The pattern of inter-population biological relationships was conceptually similar to the one produced by the 20 digital variables. It showed a clear separation of the Gallurian group (both males and females), which speaks an Italian dialect. The properly Sardinian linguistic groups (Campidanian and Logudorian), the Sassarian group (which speaks an Italian dialect) and the Alghero group (which speaks Catalan) were close to one another. This picture agrees with the ethno-historical background rather than with the linguistic one.

19.
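Diet-related adaptive morphological patterns in the Bovidae (Mammalia: Artiodactyla)   Cited by: 1 (self-citations: 0; cited by others: 1)
Stepwise discriminant analysis (SCDA) was used to examine craniodental structures of bovids in the broad sense; these structural features can serve as indicators of ecological adaptation in feeding behaviour. In this study, 28 craniodental structures were measured in 72 species of bovids in the broad sense. Stepwise discriminant analysis yielded six types of feeding adaptation: general grazers, fresh-grass grazers, mixed feeders of open habitats, browsers, mixed feeders of closed habitats, and frugivores. The predictive power of the discriminant indices was tested with 103 specimens, all of them incomplete and most lacking one or more structures. The discriminant functions obtained from these specimens had a mean predictive power of 94%, somewhat lower than that of the discriminant functions built from the 72 bovid species (98%). Discriminant functions built from a small sample of craniodental structures can be used to study incomplete specimens from archaeological excavations. Combined with indices of locomotor ability and habitat selection built from postcranial measurements, these indices make it possible to infer the autecology of fossil bovids and to reconstruct palaeoenvironments.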

20.
Question: Can discriminant analysis be used to quantify ecological change? Can fossil pollen data be used as a proxy to quantify moisture availability change through discriminant analysis? Location: Lake Sauce, Amazonian piedmont of Peru. Methods: A linear discriminant function was used to classify taxa found through pollen analysis into wet and dry indicators. The data set was filtered to exclude rare taxa from the analysis. Given that after application of the filter there were more variables (samples) than observations (taxa), the model was “de‐saturated” through simulation of samples based on the existing data set. Results: The inclusion of taxa that have a relative abundance of 1% or more in at least 5% of the samples reduces noise in the data set. Application of discriminant analysis to pollen data gave an error of 18% when classifying taxa by affinity with dry or wet conditions. The inferred moisture availability curve shows consistency with independent proxies from the same core and with identified local and sub‐continental moisture patterns. Conclusions: The method provides a reliable means to reduce a complex paleoecological data set to proportional change in a single pre‐defined variable. The output is a relative scale of change of a defined environmental gradient through time, without reliance on an extensive array of modern analogues. The results appear to provide a comparable quality of information to that of isotopic analysis derived from speleothem or sedimentary records.
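The rare-taxon filter mentioned in the Results (keep taxa with a relative abundance of 1% or more in at least 5% of the samples) could look roughly like the sketch below. It assumes that relative abundance is computed against each sample's total count over all taxa in the table, which the abstract does not state; the function name and the input layout are hypothetical.

```python
def filter_taxa(counts, min_abundance=0.01, min_sample_fraction=0.05):
    """Drop rare taxa from a pollen count table.

    counts: dict mapping taxon name -> list of counts, one per sample,
    with all lists in the same sample order (assumed layout).
    """
    n_samples = len(next(iter(counts.values())))
    # Total pollen sum per sample, used as the denominator (assumption).
    totals = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    kept = {}
    for taxon, c in counts.items():
        hits = sum(
            1
            for i in range(n_samples)
            if totals[i] > 0 and c[i] / totals[i] >= min_abundance
        )
        if hits / n_samples >= min_sample_fraction:
            kept[taxon] = c
    return kept
```

Both thresholds are exposed as parameters so that the effect of a stricter or looser filter on the remaining taxa can be checked directly.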
