Similar Articles
20 similar articles found
1.
The use of order statistics to discriminate and classify DNA ploidy patterns is proposed, especially for the classification of additional observations: whether a given sample is more likely to have come from normal or abnormal tissue, and with what probability, based on its ploidy pattern. The method involves ordering the observations within each of several samples (e.g., euploid and aneuploid DNA patterns) and using subsets of the resulting order statistics as independent variables in a linear discriminant analysis. It thus replaces univariate observations by (some of) their order statistics, which then serve as the variables in the discriminant analysis. Unlike many discriminant functions, the procedure requires neither normality of the distributions nor transformation of nonnormal distributions; order statistics are usually distribution-free and are therefore particularly useful for nonparametric inference. Preliminary simulation studies verified the potential usefulness of the order statistics discriminant function method as applied to DNA ploidy analysis. Its advantages over the usual hypothesis-testing methods, e.g., the chi-square or Kolmogorov-Smirnov tests used to ascertain "goodness-of-fit," are discussed. The proposed method is easy to implement and easy to interpret; it is also applicable to the study of distributions of other types of measurements.
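The order-statistics idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: all samples are assumed to have the same size, the ranks used as features (here the median and the maximum of five values) are a hypothetical choice, priors are taken as equal, and only two order statistics are used so that the pooled covariance matrix can be inverted in closed form. The helper names are hypothetical.

```python
from statistics import mean


def order_stat_features(sample, ranks):
    """Order statistics of one sample at two 0-based ranks (assumes equal sample sizes)."""
    s = sorted(sample)
    return (s[ranks[0]], s[ranks[1]])


def fit_fisher_lda_2d(normal, abnormal):
    """Two-feature Fisher LDA with pooled covariance and equal priors.

    Returns (w, c): a feature vector x is assigned to the abnormal group when w . x > c.
    """
    ma = [mean(p[i] for p in normal) for i in range(2)]
    mb = [mean(p[i] for p in abnormal) for i in range(2)]
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((normal, ma), (abnormal, mb)):
        for p in group:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(normal) + len(abnormal) - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]  # must be non-zero
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (mb[0] - ma[0], mb[1] - ma[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    # Threshold at the projection of the midpoint between the two group means.
    c = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    return w, c


def classify(sample, ranks, w, c):
    """Label a new sample from its order statistics."""
    x = order_stat_features(sample, ranks)
    return "abnormal" if w[0] * x[0] + w[1] * x[1] > c else "normal"
```

No normality assumption is placed on the raw measurements themselves; the linear discriminant step only sees the selected order statistics.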

2.
Combining diagnostic test results to increase accuracy (total citations: 4; self-citations: 0; citations by others: 4)
When multiple diagnostic tests are performed on an individual, or multiple disease markers are available, it may be possible to combine the information to diagnose disease. We consider how to choose linear combinations of markers in order to optimize diagnostic accuracy. The accuracy index to be maximized is the area or partial area under the receiver operating characteristic (ROC) curve. We propose a distribution-free rank-based approach for optimizing the area under the ROC curve and compare it with logistic regression and with classic linear discriminant analysis (LDA). It has been shown that the latter method optimizes the area under the ROC curve when test results have a multivariate normal distribution for diseased and non-diseased populations. Simulation studies suggest that the proposed non-parametric method is efficient when data are multivariate normal. The distribution-free method is generalized to a smooth distribution-free approach to: (i) accommodate some reasonable smoothness assumptions; (ii) incorporate covariate effects; and (iii) yield optimized partial areas under the ROC curve. This latter feature is particularly important since it allows one to focus on the region of the ROC curve that is most relevant to clinical practice. Neither logistic regression nor LDA necessarily maximizes partial areas. The approaches are illustrated on two cancer datasets, one involving serum antigen markers for pancreatic cancer and the other involving longitudinal prostate specific antigen data.
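The rank-based idea can be illustrated for two markers. The sketch below is a simplification under our own assumptions rather than the paper's estimator: it computes the empirical (Mann-Whitney) AUC of a linear score and maximizes it by a brute-force grid over unit directions (cos t, sin t), which suffices for two markers because the AUC of a linear combination is unchanged by positive rescaling. Partial areas, smoothing and covariates are not shown; function names and toy data are hypothetical.

```python
import math


def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(score_D > score_H), counting ties as 1/2."""
    total = 0.0
    for d in diseased:
        for h in healthy:
            total += 1.0 if d > h else 0.5 if d == h else 0.0
    return total / (len(diseased) * len(healthy))


def best_linear_combination(diseased, healthy, steps=360):
    """Grid search over unit directions (cos t, sin t) for two markers.

    Returns (auc, (a, b)) for the score a * marker1 + b * marker2 with the largest empirical AUC.
    """
    best_auc, best_dir = -1.0, (0.0, 0.0)
    for k in range(steps):
        t = 2 * math.pi * k / steps
        a, b = math.cos(t), math.sin(t)
        auc = empirical_auc([a * x + b * y for x, y in diseased],
                            [a * x + b * y for x, y in healthy])
        if auc > best_auc:
            best_auc, best_dir = auc, (a, b)
    return best_auc, best_dir
```

In the toy check, neither marker alone separates the groups, but a combination with roughly equal weights does.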

3.
Linear discriminant analysis (LDA) is a multivariate classification technique frequently applied to morphometric data in various biomedical disciplines. Canonical variate analysis (CVA), the generalization of LDA for multiple groups, is often used in the exploratory style of an ordination technique (a low-dimensional representation of the data). In the rare case when all groups have the same covariance matrix, maximum likelihood classification can be based on these linear functions. Both LDA and CVA require full-rank covariance matrices, which is usually not the case in modern morphometrics. When the number of variables is close to the number of individuals, groups appear separated in a CVA plot even if they are samples from the same population. Hence, reliable classification and assessment of group separation require many more organisms than variables. A simple alternative to CVA is the projection of the data onto the principal components of the group averages (between-group PCA). In contrast to CVA, these axes are orthogonal and can be computed even when the data are not of full rank, such as for Procrustes shape coordinates arising in samples of any size, and when covariance matrices are heterogeneous. In evolutionary quantitative genetics, the selection gradient is identical to the coefficient vector of a linear discriminant function between the populations before vs. after selection. When the measured variables are Procrustes shape coordinates, discriminant functions and selection gradients are vectors in shape space and can be visualized as shape deformations. Except for applications in quantitative genetics and in classification, however, discriminant functions typically offer no interpretation as biological factors.
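Between-group PCA, as described above, needs only the group means, so no within-group covariance matrix has to be inverted. A minimal sketch, assuming the first principal axis is found by power iteration on the scatter matrix of the centred group means; the starting vector, the iteration count and the function names are hypothetical choices.

```python
import math


def between_group_pc1(groups, iters=100):
    """First principal axis of the group means; groups is a list of lists of points."""
    d = len(groups[0][0])
    means = [[sum(p[j] for p in g) / len(g) for j in range(d)] for g in groups]
    grand = [sum(m[j] for m in means) / len(means) for j in range(d)]
    centred = [[m[j] - grand[j] for j in range(d)] for m in means]
    # Scatter matrix of the centred group means (d x d); it is never inverted.
    cov = [[sum(c[i] * c[j] for c in centred) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))  # zero if all group means coincide
        v = [x / norm for x in w]
    return grand, v


def project(point, grand, axis):
    """Score of one observation on the between-group axis."""
    return sum((p - g) * a for p, g, a in zip(point, grand, axis))
```

Because only the means enter the eigenproblem, the sketch runs unchanged when there are more variables than individuals, the situation in which CVA breaks down.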

4.
Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often lies in obtaining general predictive capacity or in drawing causal inferences from predictor variables. Lacking solid knowledge of the phenomenon studied, scientists explore predictor variables in order to find the most meaningful (i.e. discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables should be selected by comparing several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some of these methods. We expected PCA to perform best. The selected methods were evaluated through the fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset at P < 0.05, followed by a step-wise sub-selection, gave the best results. Contrary to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, whereas PCA and step-wise analysis also selected probable noise variables. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset: the former because it does not take the response variable into account, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology.
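Univariate chi-square screening of the kind that performed best in this study can be sketched for binary (presence/absence) predictors and a binary response. This is an illustration under assumptions: Pearson's chi-square without continuity correction on each 2x2 table, the P value taken from the chi-square distribution with 1 degree of freedom (computed as erfc(sqrt(chi2 / 2))), and hypothetical variable names. The subsequent step-wise sub-selection is not shown.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Raises ZeroDivisionError if any row or column margin is empty.
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def chi2_screen(predictors, response, alpha=0.05):
    """Keep binary predictors whose 2x2 table with the binary response gives P < alpha."""
    kept = []
    for name, x in predictors.items():
        pairs = list(zip(x, response))
        a = sum(1 for xi, yi in pairs if xi and yi)
        b = sum(1 for xi, yi in pairs if xi and not yi)
        c = sum(1 for xi, yi in pairs if not xi and yi)
        d = sum(1 for xi, yi in pairs if not xi and not yi)
        if p_value_df1(chi2_2x2(a, b, c, d)) < alpha:
            kept.append(name)
    return kept
```

Each predictor is judged against the response directly, which is exactly what PCA-based selection does not do.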

5.
A recursive method of obtaining the maximum likelihood estimates of the parameters of the quadratic logistic discriminant function is presented. This method is an extension of the Walker and Duncan procedure (1967) proposed for the linear logistic discriminant function in the dichotomous case. A generalization of the method to discrimination between several populations is also given; it works for both linear and quadratic logistic discriminant functions. Once the parameters of the logistic function have been estimated, classification can be performed. An application of the method to the automatic diagnosis of some respiratory diseases is presented. The method is compared with standard estimation procedures in a short simulation study.
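For a single predictor, the quadratic logistic discriminant function models logit P(y = 1 | x) = b0 + b1 x + b2 x^2. The sketch below fits it by maximum likelihood with ordinary Newton-Raphson iterations. This is a swapped-in batch technique, not the recursive Walker-Duncan-type procedure of the paper, and the helper names are hypothetical. It assumes the two classes overlap (no separation); otherwise the likelihood has no finite maximum.

```python
import math


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic_logistic(xs, ys, iters=25):
    """Newton-Raphson ML fit of logit P(y=1|x) = b0 + b1*x + b2*x**2."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]  # Fisher information (negative Hessian)
        for x, y in zip(xs, ys):
            f = [1.0, x, x * x]
            p = 1.0 / (1.0 + math.exp(-sum(b * fi for b, fi in zip(beta, f))))
            for i in range(3):
                score[i] += (y - p) * f[i]
                for j in range(3):
                    info[i][j] += p * (1.0 - p) * f[i] * f[j]
        step = solve(info, score)  # beta_new = beta + info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def predict_prob(beta, x):
    """Fitted P(y = 1 | x); classify as 1 when this exceeds 0.5 (equal costs assumed)."""
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x + beta[2] * x * x)))
```

The quadratic term lets one class dominate at both extremes of x, which a linear logistic function cannot represent.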

6.
The paper deals with the optimal Bayes discriminant rule for qualitative variables. The performance of variable selection is investigated under strong assumptions: the variables are restricted to be dichotomous, are assumed to be either independent or dependent with a fixed dependence structure, and all parameters are known. Differences from the case of normal variables in linear discriminant analysis can be shown. This is a further reason for applying special methods of discriminant analysis to qualitative variables.
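Under the independence assumption with all parameters known, the optimal Bayes rule assigns an observation to the class with the largest prior times product of Bernoulli likelihoods. A minimal sketch with hypothetical class labels and probabilities; it works on the log scale to avoid underflow and assumes every probability lies strictly between 0 and 1.

```python
import math


def bayes_classify(x, priors, probs):
    """Optimal Bayes rule for independent 0/1 variables with known parameters.

    priors[k] is P(class k); probs[k][j] is P(x_j = 1 | class k), strictly between 0 and 1.
    """
    best, best_score = None, -math.inf
    for k, prior in priors.items():
        score = math.log(prior)  # log posterior up to a constant shared by all classes
        for xj, pj in zip(x, probs[k]):
            score += math.log(pj if xj else 1.0 - pj)
        if score > best_score:
            best, best_score = k, score
    return best
```

Dropping a variable simply removes its factor from every class score, which is what makes variable selection for this rule easy to study analytically.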

7.
The Fourier transform (FT) method was applied to specify the distribution of 14 predefined groups of amino acids (64 residues) at both termini of annotated type III and type I secreted proteins from proteobacteria. Type I proteins displayed a higher occurrence of significant periodicities at both the C- and N-termini, indicating potent features for discriminating between secretion types, particularly when variables are selected from the full periodicity profiles at 19 orders of FT. Fisher's linear discriminant analysis, together with stepwise selection of variables over equal pairs of combinations for all predefined groups of residues, revealed the C-terminal harmonics of aromatic (HFWY) and aliphatic (VLIA) residues as a set of strong predictor variables that classify both types of secreted proteins with an accuracy of 100% for the original grouped cases and 96.4% for cross-validated grouped cases. The prediction accuracy of the proposed discriminant function was estimated by repeated k-fold cross-validation, in which the original data set was randomly divided into k subsets, with one of the k subsets serving as the test set and the remaining data forming the training set. The average error rate computed across all k trials and repeats did not exceed that of the leave-one-out procedure. The proposed set of predictor variables could be used to assess the compatibility between secretion pathways and secretion substrates of proteobacteria by means of discriminant analysis.
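Repeated k-fold cross-validation, as described above, can be sketched generically: the data are reshuffled on each repeat, split into k folds, and each fold serves once as the test set. The classifier passed in here (a one-dimensional nearest-centroid rule) and all names are hypothetical stand-ins for the Fisher discriminant function; the error is averaged over all folds and repeats, and at least k observations are assumed.

```python
import random


def repeated_kfold_error(xs, ys, fit, predict, k=5, repeats=10, seed=0):
    """Mean test-fold error rate over `repeats` random k-fold partitions (needs len(xs) >= k)."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    fold_errors = []
    for _ in range(repeats):
        rng.shuffle(idx)  # new random partition on every repeat
        for f in range(k):
            test = idx[f::k]
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            wrong = sum(predict(model, xs[i]) != ys[i] for i in test)
            fold_errors.append(wrong / len(test))
    return sum(fold_errors) / len(fold_errors)


def fit_centroids(xs, ys):
    """Hypothetical stand-in classifier: one mean per class (1-D feature)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}


def predict_centroid(model, x):
    """Assign x to the class with the nearest mean."""
    return min(model, key=lambda y: abs(x - model[y]))
```

Setting k equal to the number of observations with a single repeat reproduces the leave-one-out procedure mentioned above.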

8.
Two variable selection procedures are evaluated for classification problems: a forward stepwise discrimination procedure, and a stepwise procedure preceded by a preliminary screening of variables on the basis of individual t statistics. The expected probability of correct classification is used as the measure of performance. The procedures are compared using samples from multivariate normal populations and from several nonnormal populations. The study demonstrated some situations in which using all variables is preferable to a stepwise discriminant procedure that stops after a few steps, although usually the stepwise procedure performed better. However, where the stepwise procedure performed better than using all variables, the modified stepwise procedure performed better still. Modified stepwise procedures in which not all the covariances of the problem need be estimated seem promising.
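The preliminary screening step can be sketched as follows: compute a pooled two-sample t statistic for each variable separately and pass only the variables with large |t| on to the stepwise procedure. The cut-off of 2.0 and all names are hypothetical, and the forward stepwise step itself is not shown.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (each group needs at least 2 values)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def t_screen(group_a, group_b, threshold=2.0):
    """Return variables with |t| >= threshold, largest |t| first.

    group_a and group_b map each variable name to its values in that group.
    """
    scores = {v: abs(pooled_t(group_a[v], group_b[v])) for v in group_a}
    passed = [v for v, s in scores.items() if s >= threshold]
    return sorted(passed, key=lambda v: scores[v], reverse=True)
```

Only the variances of the screened variables, and later their covariances, need to be estimated, which is the saving the modified procedure relies on.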

9.
J. Q. Su, J. M. Lachin. Biometrics, 1992, 48(4): 1033-1042
Many studies involve the collection of multivariate observations, such as repeated measures, on two groups of subjects who are recruited over time, i.e., with staggered entry of subjects. Various marginal distribution-free multivariate methods have been proposed for the analyses of such multivariate observations where some measures may be missing at random. Using the multivariate U statistic of Wei and Johnson (1985, Biometrika 72, 359-364), we describe the group sequential analysis of such a study where the multivariate observations are observed sequentially--both within and among subjects. We describe a multivariate generalization of the Hodges and Lehmann (1963, Annals of Mathematical Statistics 34, 598-611) estimator of a location shift that can be obtained via the multivariate U statistic with the Mann-Whitney-Wilcoxon kernel. We then describe large-sample group sequential interval estimators and tests based on an aggregate estimate of the location shift combined over all of the repeated measures. We also describe how the same steps could be employed to perform a group sequential analysis based on any one of the variety of marginal multivariate methods that have been proposed. These methods are applied to a real-life example.
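In the univariate case, the Hodges-Lehmann estimator of a location shift is the median of all pairwise differences between the two samples, and it is tied to the Mann-Whitney-Wilcoxon kernel that scores each pair by whether y_j exceeds x_i. The sketch below shows only this univariate building block, not the multivariate U statistic of Wei and Johnson or the group sequential machinery; the function names are hypothetical.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences y_j - x_i: estimated shift of y relative to x."""
    return median([yj - xi for xi in x for yj in y])


def mann_whitney_count(x, y):
    """Mann-Whitney-Wilcoxon kernel summed over pairs: 1 if y_j > x_i, 1/2 if tied, else 0."""
    return sum(1.0 if yj > xi else 0.5 if yj == xi else 0.0 for xi in x for yj in y)
```

For x = [1, 2, 3] and y = [3, 4, 5], the nine pairwise differences have median 2, and 8.5 of the 9 pairs favour y.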

10.
Conditional multivariate normal density functions are used to construct conditional quadratic discriminant functions that adjust for covariate differences between disease groups. An expected actual error rate for the conditional discriminant function is defined. The purpose of this paper is to use the conditional quadratic discriminant function and its misclassification error rate to help determine whether a set of discriminators is a good biological marker for disease screening. The conditional quadratic discriminant analysis is illustrated using data from two alcoholism classification problems. It is shown how the discriminant functions can identify a set of variables that can be used as biological markers.

11.
Impairment in inhibitory control has been postulated as an underlying hallmark of attention deficit/hyperactivity disorder (ADHD), which can be utilized as a quantitative trait for genetic studies. Here, we evaluate whether inhibitory control, measured by simple automatized prepotent response (PR) inhibition variables, is a robust discriminant function for the diagnosis of ADHD in children and can be used as an endophenotype for future genetic studies. One hundred fifty-two school children (30.9% female, 67.8% with ADHD) were recruited. The ADHD checklist was used as the screening tool, whilst the DSM-IV Mini International Neuropsychiatric Interview, a neurologic interview and neurologic examination, and the WISC III FSIQ test were administered as the gold standard procedure to confirm the ADHD diagnosis. A Go/No-Go task using a naturalistic and automatized visual signal was administered. A linear multifactor model (MANOVA) was fitted to compare groups, with ADHD status, age, and gender as independent factors. Linear discriminant analysis and the receiver operating characteristic curve were used to assess the predictive performance of PR inhibition variables for ADHD diagnosis. We found that four variables of prepotent response reaction time and prepotent response inhibition differed significantly between children with and without ADHD. Furthermore, these variables generated a strong discriminant function with a total classification capability of 73%, 84% specificity, 68% sensitivity, and 90% positive predictive value for ADHD diagnosis. These results support reaction times as a candidate endophenotype that could potentially be used in future ADHD genetic research.
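The reported performance measures all come from the 2x2 table of predicted versus true diagnosis. A minimal sketch; the function name is hypothetical and the counts in the usage check are invented round numbers, not the study's data.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Standard accuracy measures from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased correctly flagged
        "specificity": tn / (tn + fp),  # non-diseased correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly diseased
        "npv": tn / (tn + fn),          # cleared cases that are truly non-diseased
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

Overall accuracy is the prevalence-weighted average of sensitivity and specificity. As a consistency check on the abstract's figures, 0.678 × 0.68 + 0.322 × 0.84 ≈ 0.73, matching the total classification capability of about 73%.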

12.
13.
In this paper, two nonparametric tests are given for testing the null hypothesis of parallelism of response curves in r samples. The first procedure is a permutation test, whose practical applicability is ensured by a FORTRAN subroutine available from the author. Because the computational work and time grow rapidly with sample size, the same procedure optionally provides a Monte Carlo solution. Alternatively, an asymptotically distribution-free test based on quadratic forms is constructed for large-sample situations.
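A Monte Carlo permutation test of parallelism can be sketched under simplifying assumptions that are ours, not the author's. Every subject's response curve is observed at the same time points, and parallelism is judged only through each subject's least-squares slope. The test statistic is the between-group sum of squares of those slopes, and group labels are permuted at random rather than enumerated exhaustively. All names are hypothetical.

```python
import random


def ls_slope(ts, ys):
    """Least-squares slope of one subject's response curve."""
    tbar, ybar = sum(ts) / len(ts), sum(ys) / len(ys)
    num = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    return num / sum((t - tbar) ** 2 for t in ts)


def between_group_ss(slopes, labels):
    """Between-group sum of squares of the subject slopes."""
    overall = sum(slopes) / len(slopes)
    ss = 0.0
    for g in set(labels):
        s = [v for v, lab in zip(slopes, labels) if lab == g]
        ss += len(s) * (sum(s) / len(s) - overall) ** 2
    return ss


def parallelism_perm_test(ts, curves, labels, n_perm=2000, seed=1):
    """Monte Carlo permutation P value for H0: equal mean slopes (parallel curves)."""
    slopes = [ls_slope(ts, c) for c in curves]
    observed = between_group_ss(slopes, labels)
    rng = random.Random(seed)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if between_group_ss(slopes, perm) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Raising `n_perm` trades computing time for a more precise P value, which mirrors the motivation for the Monte Carlo option described above.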

14.
A variable selection method for case‐control studies is proposed that uses an adaptive weighting scheme along with a permutation method to determine whether a variable is useful in differentiating the cases from the controls. This adaptive method is used to select exposure variables for the analysis of data from a bladder cancer case‐control study. An extensive simulation study shows that the adaptive method is nearly as effective as the stepwise discriminant analysis method at finding the variables related to case‐control status when the variables are normally distributed. The simulation also shows that the proposed variable selection procedure is much more effective than the stepwise discriminant analysis method when the variables are not normally distributed. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

15.
A retrospective study was carried out to assess the feasibility of computer-assisted prognostication by discriminant analysis and the Bayesian classification procedure, based on clinical information collected on patients with acute myocardial infarction. The overall accuracy was 94.2% in predicting hospital death, but the prediction of late death after discharge was less accurate. Not all of the 44 variables used for the analysis were necessary to reach the same level of predictive accuracy: 16 to 20 variables gave almost identical predictions. The Bayesian classification procedure was applied to estimate the probabilities of individual patients belonging to the different prognostic categories.

16.
The paper presents a cross-sectional study of a sample of 1800 out of 3000 schoolchildren from the Braunschweiger Längsschnitt. In this methodological approach, we first approximately eliminated the influence of age and stature on the raw data of all body measurements using regression equations. The transformed data were assigned to three "types" named 'below normal', 'normal', and 'above normal', where 'normal' comprises all cases within one standard deviation of the mean and the two other "types" correspond to the adjacent ranges of values. Subsequently, for each transformation to the mean age and stature, a discriminant analysis was performed, grouping the cases by the "types" of pelvic width and shoulder width, respectively. Stature was found to have a strong influence on the chosen width measures in the age class investigated; this influence could be clarified only by using allometric methods. Only before correction for body height is the given grouping supported by other variables, with different sets of variables dominating the discriminant functions for boys and girls. This yields new aspects and considerations for understanding physique and physique typologies, which in our opinion would be significant for the acceleration phenomenon as well as for comparative studies of populations.
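The two preprocessing steps described above can be sketched for a single measurement. A linear stature effect is removed by regression, adjusting every value to the mean stature, and each adjusted value is then assigned to one of the three "types" using mean ± one standard deviation. The paper adjusts for both age and stature; using stature alone, the ± 1 SD cut-offs, and the function names are simplifying assumptions.

```python
from statistics import mean, stdev


def adjust_to_mean(y, covariate):
    """Remove a linear covariate effect: y_adj = y - b * (c - mean(c)), b from least squares."""
    cbar, ybar = mean(covariate), mean(y)
    num = sum((c - cbar) * (v - ybar) for c, v in zip(covariate, y))
    b = num / sum((c - cbar) ** 2 for c in covariate)
    return [v - b * (c - cbar) for v, c in zip(y, covariate)]


def classify_types(values):
    """Assign 'below normal' / 'normal' / 'above normal' using mean +/- one sample SD."""
    m, s = mean(values), stdev(values)
    return ["below normal" if v < m - s else "above normal" if v > m + s else "normal"
            for v in values]
```

The resulting type labels would then serve as the grouping variable for the discriminant analysis.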

17.
S. M. Snapinn, J. D. Knoke. Biometrics, 1989, 45(1): 289-299
Accurate estimation of misclassification rates in discriminant analysis with selection of variables by, for example, a stepwise algorithm, is complicated by the large optimistic bias inherent in standard estimators such as those obtained by the resubstitution method. Application of a bootstrap adjustment can reduce the bias of the resubstitution method; however, the bootstrap technique requires the variable selection procedure to be repeated many times and is therefore difficult to compute. In this paper we propose a smoothed estimator that requires relatively little computation and which, on the basis of a Monte Carlo sampling study, is found to perform generally at least as well as the bootstrap method.

18.
A new method for choosing the variables with the greatest discriminatory power in the location model for mixed-variable discriminant analysis is presented. The procedure, based on a multivariate discriminatory measure, enables a simultaneous reduction of the number of discrete and continuous variables. The introduced criterion can be used for both optimal and stepwise selection of a variable subset. As an example, the results of stepwise variable selection for some medical data are presented.

19.
Optimal classification rules based on linear functions that maximize the Chernoff distance, the Morisita distance, or the Kullback-Leibler distance are studied. We obtain an expression for the optimal linear discriminant function and show that the resulting linear procedure belongs to the Anderson-Bahadur admissible class. To compare discriminant rules, we use an index that measures the accuracy of a given class of discriminant procedures. The asymptotic form of the discriminant function is also studied.
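To illustrate what maximizing a distance over linear functions means, the sketch below projects two bivariate normal populations onto a direction w. It evaluates the Kullback-Leibler divergence between the two projected univariate normals in closed form, KL = 0.5 [log(v2 / v1) + (v1 + (m1 - m2)^2) / v2 - 1], and searches a grid of directions. This numerical grid search is an illustration under our own assumptions, not the analytic optimal function derived in the paper; names and parameters are hypothetical.

```python
import math


def kl_normal(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate normals with variances v1, v2."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def project_normal(mean, cov, w):
    """Mean and variance of w . X when X ~ N(mean, cov) in two dimensions."""
    m = w[0] * mean[0] + w[1] * mean[1]
    v = w[0] ** 2 * cov[0][0] + 2 * w[0] * w[1] * cov[0][1] + w[1] ** 2 * cov[1][1]
    return m, v


def best_kl_direction(mu1, cov1, mu2, cov2, steps=180):
    """Grid search over directions t in [0, pi) for the largest projected KL divergence."""
    best = (-1.0, None)
    for k in range(steps):
        t = math.pi * k / steps
        w = (math.cos(t), math.sin(t))
        m1, v1 = project_normal(mu1, cov1, w)
        m2, v2 = project_normal(mu2, cov2, w)
        d = kl_normal(m1, v1, m2, v2)
        if d > best[0]:
            best = (d, w)
    return best
```

With equal identity covariances and means (0, 0) and (2, 0), the best direction is the x-axis, where the divergence equals 2.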

20.
The mechanisms involved in the control of growth in chickens are too complex to be explained by univariate analysis alone, because all related traits are biologically correlated. Therefore, we evaluated broiler chicken performance with a multivariate approach, using canonical discriminant analysis. A total of 1920 chicks from eight treatments, defined as the combinations of four broiler chicken strains (Arbor Acres, AgRoss 308, Cobb 500 and RX) and both sexes, were housed in 48 pens. Average feed intake, average live weight, feed conversion and carcass, breast and leg weights were obtained for days 1 to 42. Canonical discriminant analysis was implemented with the SAS® CANDISC procedure, and differences between treatments were obtained by the F-test (P < 0.05) on the squared Mahalanobis distances. The multivariate performance of all treatments could be easily visualised because a single graph was obtained from the first two canonical variables, which explained 96.49% of total variation, using a SAS® CONELIP macro. A clear distinction between sexes was found, with males performing better than females. Among strains, Arbor Acres, AgRoss 308 and Cobb 500 (commercial) were better than RX (experimental). Evaluation of broiler chicken performance was facilitated by the fact that the six original traits were reduced to only two canonical variables. Average live weight and carcass weight (first canonical variable) were the most important traits for discriminating treatments. The contrast between average feed intake and average live weight plus feed conversion (second canonical variable) was used to classify them. We suggest analysing performance data sets using canonical discriminant analysis.
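The squared Mahalanobis distance behind the pairwise F-tests measures how far apart two treatment mean vectors are relative to the pooled within-treatment covariance S: D^2 = (m1 - m2)^T S^{-1} (m1 - m2). A minimal two-trait sketch: the study used six traits, the pooled covariance is assumed to have been computed beforehand, and the function name is hypothetical.

```python
def mahalanobis_sq(m1, m2, s):
    """Squared Mahalanobis distance between two 2-D mean vectors given a 2x2 covariance s."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]  # s must be non-singular
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    d = (m1[0] - m2[0], m1[1] - m2[1])
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
```

With an identity covariance this reduces to the squared Euclidean distance; a trait with larger within-treatment variance contributes proportionally less.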
