Similar literature
Found 20 similar articles (search time: 15 ms)
1.
The use of discriminant analysis for normally distributed populations with common or differing variances, and for populations with distributions of unknown type, is discussed and illustrated by an example. Existing programs are mentioned. The results of the various methods of discriminant analysis are compared with each other.

2.
The use of order statistics to discriminate and classify DNA ploidy patterns is proposed, especially for the classification of additional observations: whether a given sample is more likely to have come from a normal or an abnormal tissue, and with what probability, based on its ploidy pattern. The method involves the order of observations within each of several samples (e.g., euploid and aneuploid DNA patterns) and the use of subsets of the obtained order statistics as independent variables in a linear discriminant analysis. It thus replaces univariate observations by (some of) their order statistics, which are then used as the variables in the discriminant analysis. The procedure does not require normality of distributions or the transformation of nonnormal distributions, as do many discriminant functions; order statistics are usually distribution-free and thus are particularly useful for nonparametric inference. Preliminary simulation studies verified the potential usefulness of the order statistics discriminant function method as applied to DNA ploidy analysis. Its advantages as compared to the usual methods for hypothesis testing, e.g., the use of the chi-square or Kolmogorov-Smirnov tests to ascertain "goodness-of-fit," are discussed. The proposed method is easy to implement and easy to interpret; it is also applicable to the study of distributions of other types of measurements.
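
The feature-construction step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `order_statistic_features` and the choice of three quantile positions are assumptions made here. Each sample is sorted, and a small subset of its order statistics becomes the feature vector that a linear discriminant analysis could then use.

```python
def order_statistic_features(sample, quantiles=(0.1, 0.5, 0.9)):
    """Return selected order statistics of one sample as a feature vector.

    `quantiles` is a hypothetical choice of positions; the abstract only
    says that subsets of the order statistics are used.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)  # the full set of order statistics
    n = len(ordered)
    features = []
    for q in quantiles:
        # Index of the order statistic nearest to the q-th quantile.
        k = min(n - 1, max(0, round(q * (n - 1))))
        features.append(ordered[k])
    return features


if __name__ == "__main__":
    # One measured sample (values are made up for illustration).
    print(order_statistic_features([5, 1, 4, 2, 3]))
```

One feature vector per sample, labelled with its known group, would then serve as the input to any standard linear discriminant routine.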

3.
OBJECTIVE: To describe the use of second order discriminant analysis as a classification methodology along with the underlying assumptions and sampling requirements, with special emphasis on the use of this analysis in chemopreventive efficacy studies. STUDY DESIGN: The discriminant function score distributions derived in an analysis of 2 diagnostic groups may show such overlap that a statistically significant difference in mean values cannot be shown and, more important, that a useful case-based classification cannot be attained. By using the discriminant function score distributions from each case, it is frequently possible to derive a second order discriminant function based on case-specific characteristics, rather than characteristics of nuclei, thereby attaining improved case classification. RESULTS: Second order discriminant analysis has proven very useful in the documentation of case-level efficacy in chemopreventive trials. In a study of orally administered vitamin A, a first order discriminant analysis did not achieve a statistically significant difference in the score distributions for nuclei, but a second order discriminant analysis allowed a correct recognition of intervention effects in 85% of submitted cases. In a chemopreventive study of triamcinolone, a similarly inadequate discrimination based on discriminant function scores for nuclei resulted. After a second order discriminant analysis, a reduction in solar-actinic damage could be shown in 14/15, or 93%, of treated cases. CONCLUSION: Second order discriminant analysis can be highly effective when the discriminating information offered at the nuclear level is inadequate due to high dispersion and small differences in mean values of discriminant function scores for the diagnostic groups. Second order analysis utilizes case-specific characteristics of the discriminant function score distributions to document diagnostic group separation and/or efficacy of chemopreventive intervention by a reduction in case discriminant function scores.

4.
Grassland vegetation on the Montlake fill was analyzed using TWINSPAN. Eight herb communities were recognized. Moisture, proximity to gas vents, and disturbance are the main factors that control species and community distributions. Binary discriminant analysis (BDA) and detrended correspondence analysis (DCA) were used to study species-environment relationships. BDA revealed complex species response patterns and the resultant indicator values were used to interpret the ordination axes. Species distributions are controlled primarily by moisture, but also influenced by soil pH. Multiple regressions revealed little about plant-environment relationships not discovered by BDA. Until robust nonlinear methods are available, BDA, metric ordination with data stratification and nonmetric ordination are methods that can yield satisfactory results in exploratory plant-environment studies. BDA alone is an efficient, useful first approach where response patterns of species are initially unknown. Abbreviations: BDA, binary discriminant analysis; DCA, detrended correspondence analysis.

5.
Summary: High‐dimensional data such as microarrays have brought us new statistical challenges. For example, using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Diagonal discriminant analysis, support vector machines, and k‐nearest neighbor have been suggested as among the best methods for small sample size situations, but none was found to be superior to others. In this article, we propose an improved diagonal discriminant approach through shrinkage and regularization of the variances. The performance of our new approach along with the existing methods is studied through simulations and applications to real data. These studies show that the proposed shrinkage‐based and regularization diagonal discriminant methods have lower misclassification rates than existing methods in many cases.
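
The general idea of a diagonal discriminant rule with shrunken variances can be sketched as below. This is a hedged illustration only: the linear shrinkage of each pooled variance toward the average variance, the `shrink` parameter, and the function names are assumptions made here, not the estimator proposed in the article.

```python
import math


def fit_diagonal_lda(X, y, shrink=0.5):
    """Fit a diagonal linear discriminant rule.

    Per-feature pooled variances are shrunk toward their average
    (an assumed, simple form of shrinkage). Requires more samples
    than classes and at least one non-constant feature.
    """
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        priors[c] = len(rows) / n
    # Pooled within-class variance of each feature.
    dof = n - len(classes)
    pooled = [
        sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y)) / dof
        for j in range(p)
    ]
    target = sum(pooled) / p  # shrinkage target: the average variance
    variances = [(1 - shrink) * v + shrink * target for v in pooled]
    return {"means": means, "variances": variances, "priors": priors}


def predict_diagonal_lda(model, x):
    """Assign x to the class with the highest diagonal discriminant score."""
    best_class, best_score = None, -math.inf
    for c, mu in model["means"].items():
        distance = sum(
            (xj - mj) ** 2 / v
            for xj, mj, v in zip(x, mu, model["variances"])
        )
        score = math.log(model["priors"][c]) - 0.5 * distance
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    # Toy data: two well-separated groups in two dimensions.
    X = [[0, 0], [1, 1], [0, 1], [5, 5], [6, 6], [5, 6]]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_diagonal_lda(X, y)
    print(predict_diagonal_lda(model, [0.5, 0.5]))
    print(predict_diagonal_lda(model, [5.5, 5.5]))
```

Ignoring the off-diagonal covariances is what keeps the rule usable when features far outnumber samples; shrinkage then stabilizes the individual variance estimates.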

6.
Prediction of protein structural class by discriminant analysis   Total citations: 7 (self-citations: 0, citations by others: 7)
Protein structural class--alpha, beta, mixed (alpha/beta or alpha + beta), irregular--can be predicted from the amino acid sequence by discriminant analysis. Discrimination is based on distributions, in the classes, of vectors of attributes characterizing the sequences. In this paper, two sets of attributes and two methods of estimating their distributions are compared using more than 100 proteins from the Protein Data Bank. The best results were obtained when canonical variates of the frequencies of occurrence of 20 amino acids and non-parametric estimates of their distributions were used. Three variates are sufficient to allocate proteins to one of four classes with 83% reliability (estimated by cross-validation) and four variates allowed allocation to one of five classes with 78% reliability.

7.
The application of discriminant analysis, like that of other multivariate procedures, is considerably complicated by incomplete data. Therefore several methods for handling missing observations occurring in initial samples were compared with each other. Recommendations are given for selecting a suitable method depending on underlying parameters.

8.
Man Jin, Yixin Fang. Biometrics, 2011, 67(1):124-132
Summary: In family studies, canonical discriminant analysis can be used to find linear combinations of phenotypes that exhibit high ratios of between‐family to within‐family variabilities. But with large numbers of phenotypes, canonical discriminant analysis may overfit. To estimate the predicted ratios associated with the coefficients obtained from canonical discriminant analysis, two methods are developed; one is based on bias correction and the other is based on cross‐validation. Because the cross‐validation is computationally intensive, an approximation to the cross‐validation is also developed. Furthermore, these methods can be applied to perform variable selection in canonical discriminant analysis. The proposed methods are illustrated with simulation studies and applications to two real examples.

9.
Summary: A generally applicable method for the automated classification of 2D NMR peaks has been developed, based on a Bayesian approach coupled to a multivariate linear discriminant analysis of the data. The method can separate true NMR signals from noise signals, solvent stripes and artefact signals. The analysis relies on the assumption that the different signal classes have different distributions of specific properties such as line shapes, line widths and intensities. As is to be expected, the correlation network of the distributions of the selected properties affects the choice of the discriminant function and the final selection of signal properties. The classification rule for the signal classes was deduced from Bayes's theorem. The method was successfully tested on a NOESY spectrum of HPr protein from Staphylococcus aureus. The calculated probabilities for the different signal class memberships are realistic and reliable, with a high efficiency of discrimination between peaks that are true NOE signals and those that are not.

10.
The paper deals with the optimal Bayes discriminant rule for qualitative variables. The performance of variable selection is investigated under strong assumptions like the restriction to dichotomous variables, which are assumed to be independent or dependent with fixed dependence structure, and all parameters known. Differences in comparison with normal variables in linear discriminant analysis can be shown. This is a further reason for applying special methods of discriminant analysis in the case of qualitative variables.

11.
Morphometric feces data are used to identify ungulates, but their effectiveness is questioned by numerous authors. Herein, we evaluated the efficiency of this tool in discriminating scat samples from Neotropical deer with sympatric distributions. We performed discriminant analysis of previously identified scat samples (n = 204). The accuracy of discriminant analysis (56–92%) was lower than the confidence limit established in this study in all sympatric combinations expected in these biomes. These results demonstrate serious limitations regarding the use of scat morphometry for species identification of Neotropical deer and reinforce the need to use non-invasive genetic techniques.

12.
Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is useful for solving such problems. It is well known that outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods (including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)) do not work well if the data set has outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characters as discriminant variables, and a satisfactory result is obtained by the method of robust discriminant analysis.

13.
Two linear functions for discriminating with qualitative variables (Fisher's linear discriminant function and the independence rule) are compared with the general multinomial procedure, a rule based on Lancaster's definition of higher order interactions and the quadratic discriminant function. The evaluation of these functions is carried out within Monte Carlo experiments. Various types of underlying distributions generated by a special algorithm are used.

14.
We compare the performances of established means of character selection for discriminant analysis in species distinction with a combination procedure for finding the optimal character combination (minimum classification error, minimum number of required characters), using morphometric data sets from the ant genera Cardiocondyla, Lasius and Tetramorium. The established methods are empirical character selection as well as forward selection, backward elimination and stepwise selection of discriminant analysis. The combination procedure is clearly superior to the established methods of character selection, and is widely applicable.

15.
Traditional methods of aging adult skeletons suffer from the problem of age mimicry of the reference collection, as described by Bocquet‐Appel and Masset (1982). Transition analysis (Boldsen et al., 2002) is a method of aging adult skeletons that addresses the problem of age mimicry of the reference collection by allowing users to select an appropriate prior probability. In order to evaluate whether transition analysis results in significantly different age estimates for adults, the method was applied to skeletal collections from Postclassic Cholula and Contact‐Period Xochimilco. The resulting age‐at‐death distributions were then compared with age‐at‐death distributions for the two populations constructed using traditional aging methods. Although the traditional aging methods result in age‐at‐death distributions with high young adult mortality and few individuals living past the age of 50, the age‐at‐death distributions constructed using transition analysis indicate that most individuals who lived into adulthood lived past the age of 50. Am J Phys Anthropol 152:67–78, 2013. © 2013 Wiley Periodicals, Inc.

16.
Classification of species into different functional groups based on biological criteria has been a difficult problem in ecology. The difficulty mainly arises because natural classification patterns are not necessarily mutually exclusive. The more group characteristics overlap, the more difficult it is to identify the membership of a species in the overlapping portions of any two groups. In this paper, we present an application of discriminant analysis by creating classification models from life history and morphological data for two specialist and two generalist life-style types of predaceous phytoseiid mites. Two stages can be distinguished in our method: life-style group membership assignment and trait variable evaluation. We use a Bayesian framework to create a classifier system to locate or assign species within a mixture of trait distributions. The method assumes that a mixture of trait distributions can represent the multiple dimensions of biological data. The mixture is most evident near the boundaries between groups. Because of the complexity of the analytical solution, an iterative method is used to estimate the unknown means, variances, and mixing proportions between groups. We also developed a criterion based on information theory to evaluate model performance with different combinations of input variables and different hypotheses. We present a working example of our proposed methods. We apply these methods to the problem of selecting key species for inoculative release and for classical introductions of biological pest control agents.

17.
Spatial organisation of proteins according to their function plays an important role in the specificity of their molecular interactions. Emerging proteomics methods seek to assign proteins to sub-cellular locations by partial separation of organelles and computational analysis of protein abundance distributions among partially separated fractions. Such methods permit simultaneous analysis of unpurified organelles and promise proteome-wide localisation in scenarios wherein perturbation may prompt dynamic re-distribution. Resolving organelles that display similar behavior during a protocol designed to provide partial enrichment represents a possible shortcoming. We employ the Localisation of Organelle Proteins by Isotope Tagging (LOPIT) organelle proteomics platform to demonstrate that combining information from distinct separations of the same material can improve organelle resolution and assignment of proteins to sub-cellular locations. Two previously published experiments, whose distinct gradients are alone unable to fully resolve six known protein-organelle groupings, are subjected to a rigorous analysis to assess protein-organelle association via a contemporary pattern recognition algorithm. Upon straightforward combination of single-gradient data, we observe significant improvement in protein-organelle association via both a non-linear support vector machine algorithm and partial least-squares discriminant analysis. The outcome yields suggestions for further improvements to present organelle proteomics platforms, and a robust analytical methodology via which to associate proteins with sub-cellular organelles.

18.
Minimum distance probability (MDP) is a robust discriminant algorithm based on a distance function. In this article, we generalize the use of MDP to the case of mixed (continuous and categorical) variables by means of the individual-score (IS) distance. This distance assumes an underlying parametric model and is based on the score transformation of the data. We have adapted it to the usual case of ignoring the distribution of the whole set of observed variables, but assuming that some knowledge about the marginal distributions is available. Finally, MDP with IS distance (IS-MDP) is compared with other discriminant methods (including those designed for mixed data) in several examples and simulations. IS-MDP is shown to be the most efficient method according to the leave-one-out criterion.

19.
Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites‐by‐species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation‐based performance evaluations of these methods are important as guides for practitioners but are still lacking. Here, we compare two model‐based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance‐based methods, distance‐based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false‐positive rate (0.008), while the latter had the lowest false‐negative rate (0.027). CQO and CCA had the highest false‐negative rate (0.291) and false‐positive rate (0.256), respectively, where these error rates were typically high for data sets with linear responses. Our study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.

20.
Summary: As the nonparametric generalization of the one‐way analysis of variance model, the Kruskal–Wallis test applies when the goal is to test the difference between multiple samples and the underlying population distributions are nonnormal or unknown. Although the Kruskal–Wallis test has been widely used for data analysis, power and sample size methods for this test have been investigated to a much lesser extent. This article proposes new power and sample size calculation methods for the Kruskal–Wallis test based on the pilot study in either a completely nonparametric model or a semiparametric location model. No assumption is made on the shape of the underlying population distributions. Simulation results show that, in terms of sample size calculation for the Kruskal–Wallis test, the proposed methods are more reliable and preferable to some more traditional methods. A mouse peritoneal cavity study is used to demonstrate the application of the methods.
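
For orientation, the test statistic to which these power calculations refer can be computed as in the sketch below. It implements the standard Kruskal–Wallis H statistic with midranks and the usual tie correction; the function name `kruskal_wallis_h` is chosen here for illustration, and the article's power and sample size methods themselves are not reproduced.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with midranks and tie correction.

    Assumes at least two non-empty groups whose pooled values are not
    all identical (otherwise the tie correction is zero).
    """
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # value -> midrank in the pooled sample
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        # Tied values occupy ranks i+1 .. j; assign their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        tie_term += t ** 3 - t
        i = j
    sum_term = 0.0
    for g in groups:
        rank_sum = sum(rank_of[v] for v in g)
        sum_term += rank_sum ** 2 / len(g)
    h = 12.0 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction


if __name__ == "__main__":
    # Three small groups (made-up values) with no ties.
    print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Under the null hypothesis, H is approximately chi-square distributed with (number of groups minus 1) degrees of freedom, which is the reference distribution a power calculation would work against.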
