Similar articles
20 similar articles found.
3.
To discriminate between breast cancer patients and controls, we used a three-step approach to obtain our decision rule. First, we ranked the mass/charge values using random forests, because this method generates importance indices that take possible interactions into account. We observed that the top-ranked variables consisted of highly correlated contiguous mass/charge values, which in the second step were grouped into new variables. Finally, these newly created variables were used as predictors to find a suitable discrimination rule. In this last step, we compared three methods: Classification and Regression Trees (CART), logistic regression and penalized logistic regression. Logistic regression and penalized logistic regression performed equally well, and both had higher classification accuracy than CART. The penalized logistic regression model was chosen because we hypothesized that it would achieve better classification accuracy on the validation set. The final rule performed well on the training set, with a classification accuracy of 86.3% and a sensitivity and specificity of 86.8% and 85.7%, respectively.
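The second step, merging top-ranked mass/charge variables that are contiguous and highly correlated, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: treating neighbours in sorted m/z order as "contiguous", the 0.9 correlation threshold, and averaging each group into one new predictor are all assumptions, the random-forest ranking from step one is taken as given input, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_contiguous(top_mz, columns, threshold=0.9):
    """Group top-ranked m/z values that are neighbours in m/z order and highly correlated.

    top_mz: m/z values kept after step one (e.g. by random-forest importance).
    columns: dict mapping each m/z value to its intensities across samples.
    threshold: hypothetical minimum correlation for merging two neighbours.
    """
    ordered = sorted(top_mz)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if pearson(columns[prev], columns[cur]) >= threshold:
            groups[-1].append(cur)  # extend the current run of correlated peaks
        else:
            groups.append([cur])  # start a new group
    return groups


def group_means(groups, columns):
    """Build one new predictor per group by averaging its members' intensities."""
    n_samples = len(next(iter(columns.values())))
    return [
        [sum(columns[mz][i] for mz in g) / len(g) for i in range(n_samples)]
        for g in groups
    ]
```

The grouped variables returned by `group_means` would then serve as the predictors for the third step (CART, logistic or penalized logistic regression).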

5.
This study investigated the feasibility of using near-infrared hyperspectral imaging (NIR-HSI) for the non-destructive identification of sesame oil. Hyperspectral images of four varieties of sesame oil were obtained in the spectral region of 874–1734 nm. Reflectance values were extracted from the region of interest (ROI) of each sample. Competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA) and x-loading weights (x-LW) were used to identify the most significant wavelengths. Based on the sixty-four, seven and five wavelengths suggested by CARS, SPA and x-LW, respectively, two classification models (least squares support vector machine, LS-SVM, and linear discriminant analysis, LDA) were established. Among the established models, the CARS-LS-SVM and CARS-LDA models performed best, with the highest classification rate (100%) in both the calibration and prediction sets. The SPA-LS-SVM and SPA-LDA models also obtained good results (95.59% and 98.53% classification rates in the prediction set) with only seven wavelengths (938, 1160, 1214, 1406, 1656, 1659 and 1663 nm). The x-LW-LS-SVM and x-LW-LDA models obtained satisfactory results (>80% classification rate in the prediction set) with only five wavelengths (921, 925, 995, 1453 and 1663 nm). The results showed that NIR-HSI could be used to identify sesame oil varieties rapidly and non-destructively, and that CARS, SPA and x-LW were effective wavelength selection methods.
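The successive projections algorithm (SPA) mentioned above picks wavelengths whose spectral columns are minimally collinear: at each step every remaining column is projected onto the orthogonal complement of the last selected one, and the column with the largest remaining norm is chosen. A minimal sketch of that projection loop follows, assuming the spectra are supplied as one vector per wavelength and that the starting wavelength and the number of wavelengths are fixed in advance (in practice both are usually chosen by validation error, which this sketch omits). The function name `spa` is hypothetical.

```python
def spa(columns, start, n_select):
    """Successive projections algorithm: pick n_select minimally collinear columns.

    columns: one vector per wavelength (reflectance of every sample at that band).
    start: index of the initial wavelength (assumed given).
    Returns the selected wavelength indices in selection order.
    """
    proj = [list(c) for c in columns]  # working copies, projected in place
    selected = [start]
    for _ in range(n_select - 1):
        xk = proj[selected[-1]]  # most recently selected (already projected) column
        xk_norm2 = sum(v * v for v in xk)  # assumed non-zero
        best, best_norm2 = None, -1.0
        for j in range(len(proj)):
            if j in selected:
                continue
            # Project column j onto the orthogonal complement of xk.
            coef = sum(a * b for a, b in zip(proj[j], xk)) / xk_norm2
            proj[j] = [a - coef * b for a, b in zip(proj[j], xk)]
            norm2 = sum(v * v for v in proj[j])
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected
```

A nearly collinear column (such as a band adjacent to one already chosen) is left with almost no norm after projection, which is why SPA tends to return spread-out, complementary wavelengths.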

7.
Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but researchers rarely consider the statistical limitations and assumptions that this technique requires. In such cases, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantages that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success does not depend on the data meeting normality or covariance-homogeneity conditions, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with that of LDA, using typical morphometric data. With data from modern hominoids, the results show that both techniques perform almost equally well. With complete data sets, LDA may be the better choice, as this example shows, but with missing observations classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and generally are not designed to handle missing data. Testing of data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever data do not meet the relevant assumptions. They are especially recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables.
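To make the tree-growing step concrete, the sketch below finds the single best binary split on one predictor by minimizing weighted Gini impurity, evaluating only the cases in which that predictor was measured. It is a simplified, assumed illustration of CART-style splitting, not the author's software: full tree implementations also use surrogate splits to send cases with missing values down the tree, which is not shown here, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'x <= threshold'.

    Cases whose value is None (missing measurement) are ignored when
    searching for the split.
    """
    pairs = sorted((x, y) for x, y in zip(values, labels) if x is not None)
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

Growing a tree repeats this search over every predictor at every node and recurses on the two resulting subsets, which is also why variable selection is built into the method.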

9.
In this study, we evaluated whether applying multivariate analysis to the data obtained from two-dimensional protein maps could improve the search for protein markers. First, we performed a classical proteomic study of the differential expression of serum N-glycoproteins in colorectal cancer patients. Then, using principal component analysis (PCA), we assessed the utility of the 2-D protein pattern and of certain subsets of spots as a tool to distinguish control and case samples, and tested the accuracy of the classification model by linear discriminant analysis (LDA). In addition, we looked for altered spots using univariate statistics and then analysed them together as a cluster by PCA and LDA. In combination, these proteins showed a theoretical sensitivity and specificity of 100%. Finally, the spots with known protein identity were analysed by multivariate methods, revealing a subgroup that emerged as the most obvious candidates for further validation trials.
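Sensitivity and specificity, as reported above, are computed from the confusion matrix of predicted versus true case/control labels. A minimal sketch, assuming labels are coded 1 for case (colorectal cancer) and 0 for control; the function names are hypothetical and any example labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 = case and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

A sensitivity and specificity of 100% simply means that no case was classified as a control and no control as a case.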

10.
  1. Insect populations are changing rapidly, and monitoring these changes is essential for understanding the causes and consequences of such shifts. However, large‐scale insect identification projects are time‐consuming and expensive when done solely by human identifiers. Machine learning offers a possible solution to help collect insect data quickly and efficiently.
  2. Here, we outline a methodology for training classification models to identify pitfall trap‐collected insects from image data and then apply the method to identify ground beetles (Carabidae). All beetles were collected by the National Ecological Observatory Network (NEON), a continental‐scale ecological monitoring project with sites across the United States. We describe the procedures for image collection, image data extraction, data preparation, and model training, and compare the performance of five machine learning algorithms and two classification methods (hierarchical vs. single‐level) in identifying ground beetles from the species to the subfamily level. All models were trained using pre‐extracted feature vectors, not raw image data. Our methodology allows data to be extracted from multiple individuals within the same image, thus enhancing time efficiency; it uses relatively simple models that allow direct assessment of model performance, and it can be applied to relatively small datasets.
  3. The best performing algorithm, linear discriminant analysis (LDA), reached an accuracy of 84.6% at the species level when naively identifying species, which increased to >95% when classifications were limited to known local species pools. Model performance was negatively correlated with taxonomic specificity, with the LDA model reaching an accuracy of ~99% at the subfamily level. When classifying carabid species not included in the training dataset at higher taxonomic levels, the models performed significantly better than random classification. We also observed better performance with the hierarchical classification method than with the single‐level classification method at higher taxonomic levels.
  4. The general methodology outlined here serves as a proof of concept for classifying pitfall trap‐collected organisms using machine learning algorithms, and the image data extraction methodology may also be used for purposes other than machine learning. We propose that integrating machine learning into large‐scale identification pipelines will increase efficiency and lead to a greater flow of insect macroecological data, with the potential to be expanded to other, non‐insect taxa.
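One simple way to limit classifications to a known local species pool, as described in point 3, is to keep the classifier's per-species posterior probabilities, drop species not recorded at the collection site, and take the most probable remaining species. This is an assumed sketch: the abstract does not state exactly how the restriction was implemented, the function name is hypothetical, and the species names used with it are placeholders.

```python
def classify_with_pool(posteriors, local_pool):
    """Return the most probable species among those known to occur locally.

    posteriors: dict mapping species name -> posterior probability from a
        classifier (e.g. LDA) trained on all species.
    local_pool: set of species recorded at the collection site.
    Returns (species, probability renormalised over the pool), or
    (None, 0.0) if no candidate species is in the pool.
    """
    candidates = {sp: p for sp, p in posteriors.items() if sp in local_pool}
    total = sum(candidates.values())
    if not candidates or total == 0:
        return None, 0.0
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / total
```

Because species absent from the site can no longer be predicted, errors between look-alike species that never co-occur disappear, which is one plausible reason such a restriction raises accuracy.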

11.
Various mechanistic and black-box models were applied to the on-line estimation of viable cell concentrations in fed-batch cultivation processes for CHO cells. Data from six fed-batch cultivation experiments were used to identify the underlying models, and a further six independent data sets were used to determine the performance of the estimators. Performance was quantified by the root mean square error (RMSE) between the estimates and the corresponding off-line measured validation data. It is shown that even simple techniques based on empirical and linear model approaches provide fairly good on-line estimation performance. The best results on the validation data sets were obtained with hybrid models, multivariate linear regression and support vector regression. Hybrid models additionally provide important information about the specific cellular growth rates during the cultivation.
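The performance measure used here, the root mean square error between on-line estimates and off-line measured viable cell concentrations, is straightforward to compute. A minimal sketch, assuming the two series have already been paired at the off-line sampling times (the function name is hypothetical):

```python
from math import sqrt


def rmse(estimates, measurements):
    """Root mean square error between paired estimates and reference measurements."""
    if len(estimates) != len(measurements) or not estimates:
        raise ValueError("need two non-empty series of equal length")
    squared = [(e - m) ** 2 for e, m in zip(estimates, measurements)]
    return sqrt(sum(squared) / len(squared))
```

RMSE is expressed in the same units as the cell concentration, so estimators can be compared directly on each validation run.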

12.
Estimates of annual survival rates of birds are valuable in a wide range of studies of population ecology and conservation. These include modelling studies that assess the impacts of climatic change or anthropogenic mortality for the many species for which no reliable direct estimates of survival are available. We evaluate the performance of regression models in predicting adult survival rates of birds from values of demographic and ecological covariates available from textbooks and databases. We estimated adult survival for 67 species using dead recoveries of birds ringed in southern Africa and fitted regression models using five covariates: mean clutch size, mean body mass, mean age at first breeding, diet and migratory tendency. Models including these explanatory variables performed well in predicting adult survival in this set of species, both when phylogenetic relatedness of the species was taken into account using phylogenetic generalized least squares (51% of variation in logit survival explained) and when it was not (48%). Two independent validation tests also showed good predictive power, with high correlations between observed and expected values both in a leave‐one‐out cross‐validation test using data from the 67 species (35% of variation in logit survival explained) and when annual survival rates from independent mark–recapture studies of 38 southern African species were predicted from the covariates using the regression fitted to dead recoveries (48%). Clutch size and body mass were the most influential covariates, both with and without the inclusion of phylogenetic effects, and a regression model including only these two variables performed well in both validation tests (39% and 48% of variation in logit survival explained).
Our regression models, including the version with only clutch size and body mass, are likely to perform well in predicting adult survival rates for southern African species for which direct survival estimates are not available.
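The leave‐one‐out cross‐validation described above can be sketched as follows: survival rates are logit‐transformed, an ordinary least squares regression on the covariates (here, as an assumption, an intercept plus clutch size and body mass) is refitted with each species held out, and the held‐out species is then predicted. This sketch ignores phylogenetic relatedness (no PGLS), the helper names are hypothetical, and any covariate values used with it are made up.

```python
from math import log


def logit(p):
    """Logit transform of a survival probability 0 < p < 1."""
    return log(p / (1 - p))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations; rows start with 1 (intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def loo_predictions(rows, y):
    """Predict each species' logit survival from a model fitted to all the others."""
    preds = []
    for i in range(len(y)):
        beta = ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(sum(b * x for b, x in zip(beta, rows[i])))
    return preds
```

The share of variation explained in the validation test can then be computed from the held‐out predictions and the observed logit survival values.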

14.
A rapid and reliable intraoperative diagnostic technique to support clinical decisions was developed using Fourier‐transform infrared (FTIR) spectroscopy. Twenty‐six fresh tissue samples were collected intraoperatively from patients undergoing gynecological surgery. Frozen section (FS) histopathology, aimed at discriminating between malignant and benign tumors, was performed, and attenuated total reflection (ATR) FTIR spectra were collected from these samples. Digital dehydration was applied, and principal component analysis followed by linear discriminant analysis (PCA‐LDA) models were developed to classify samples into malignant and benign groups. Two validation schemes were employed: k‐fold and “leave one out.” The FTIR absorption spectrum of a fresh tissue sample was obtained in less than 5 minutes. The fingerprint spectral region of malignant tumors was consistently different from that of benign tumors. The PCA‐LDA discrimination model correctly classified the samples into malignant and benign groups with accuracies of 96% and 93% for the k‐fold and “leave one out” validation schemes, respectively. We showed that simple tissue preparation followed by ATR‐FTIR spectroscopy provides an accurate means of very rapidly classifying gynecological tumors as malignant or benign. With further development, the proposed method has high potential to be used as an adjunct to intraoperative FS histopathology.
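The LDA stage of a PCA‐LDA model assigns each spectrum to the malignant or benign group with a linear rule. A minimal two‐class Fisher LDA sketch follows, assuming the inputs are already reduced feature vectors (for example, a few principal component scores per spectrum), equal class priors, a coding of 1 = malignant and 0 = benign, and an invertible pooled within‐class scatter matrix. The PCA step, digital dehydration and the validation loops are omitted, and the helper names are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_lda(class0, class1):
    """Fit a two-class Fisher discriminant; returns (weights, threshold)."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    d = len(m0)
    sw = [[0.0] * d for _ in range(d)]  # pooled within-class scatter matrix
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            diff = [a - b for a, b in zip(r, m)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [b - a for a, b in zip(m0, m1)])  # w = Sw^-1 (m1 - m0)
    # Equal priors assumed: cut at the projection of the midpoint between means.
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))
    return w, threshold


def predict(w, threshold, x):
    """Return 1 (malignant) if the projected score exceeds the threshold, else 0 (benign)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0
```

Under k‐fold or leave‐one‐out validation, `fisher_lda` would be refitted on each training split and `predict` applied to the held‐out spectra.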

17.
Multiple receptor conformation docking (MRCD) combined with clustering of docked poses allows the receptor-bound conformations of molecules to be incorporated seamlessly across a wide range of ligands with varied structural scaffolds. The accuracy of the approach was tested on a set of 120 cyclic urea molecules with HIV-1 protease inhibitory activity, using 12 high-resolution X-ray crystal structures and one NMR-resolved conformation of HIV-1 protease extracted from the Protein Data Bank. Cross-validation was performed on 25 non-cyclic urea HIV-1 protease inhibitors with varied structures. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models were generated from a training set of 60 molecules using leave-one-out cross-validation, giving r²_loo values of 0.598 and 0.674 and non-cross-validated r² values of 0.983 and 0.985 for CoMFA and CoMSIA, respectively. The predictive ability of these models was determined using a test set of 60 cyclic urea molecules, which gave predictive correlations (r²_pred) of 0.684 and 0.64 for CoMFA and CoMSIA, respectively, indicating good internal predictive ability. On this basis, the 25 non-cyclic urea molecules were used as a test set to check the external predictive ability of these models. This gave a remarkable outcome, with r²_pred values of 0.61 and 0.53 for CoMFA and CoMSIA, respectively. The results consistently show that this method is useful for performing 3D QSAR analysis on molecules with different structural motifs.
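The cross‐validated and predictive statistics quoted above are conventionally defined as 1 - PRESS/SS, where PRESS sums the squared prediction errors and SS sums the squared deviations of the observed activities from a reference mean; for r²_pred on an external test set, that reference is usually the training‐set mean. A minimal sketch of these conventional definitions follows; the abstract does not state which exact variant was used, so treat the formulas as an assumption and the function names as hypothetical.

```python
def press(observed, predicted):
    """Predictive residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def q2_loo(observed, loo_predicted):
    """Leave-one-out q^2 (r^2_loo) = 1 - PRESS / SS about the observed mean."""
    mean = sum(observed) / len(observed)
    ss = sum((o - mean) ** 2 for o in observed)
    return 1 - press(observed, loo_predicted) / ss


def r2_pred(test_observed, test_predicted, train_mean):
    """Predictive r^2 for a test set, measured against the training-set mean activity."""
    sd = sum((o - train_mean) ** 2 for o in test_observed)
    return 1 - press(test_observed, test_predicted) / sd
```

A value of 0 means the model predicts no better than simply using the training‐set mean activity, and 1 means perfect prediction.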

19.
A new additive method for calculating the partition coefficients of peptides is described in this study. LogP and logD values of peptides are calculated by summing the contributions of the component amino acids. The final models are derived from a multivariate linear regression analysis of 219 peptides with known experimental data. The standard errors in a leave-one-out cross-validation are 0.23 and 0.24 log units for the logP and logD values, respectively. The predictive ability of the model is tested on an additional set of ten peptides, and the self-consistency of the model is further demonstrated by a new validation procedure called the evolution test. The parameters obtained from the regression could be used as hydrophobicity scales for amino acids, and the application of such hydrophobicity scales is also discussed.
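The additive idea can be sketched as a linear regression of measured logP on amino acid counts: each fitted coefficient is the contribution of one residue type, and a peptide's logP is the sum of its residues' contributions. This is an assumed minimal sketch, not the authors' model: it uses a toy two‐letter alphabet, no intercept or correction terms, made‐up example values, and hypothetical function names, and it does not reproduce the published parameters.

```python
def composition(seq, alphabet):
    """Count how often each residue type in `alphabet` occurs in `seq`."""
    return [seq.count(aa) for aa in alphabet]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_contributions(seqs, logp, alphabet):
    """Least-squares residue contributions (no intercept) from peptides with known logP."""
    rows = [composition(s, alphabet) for s in seqs]
    k = len(alphabet)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, logp)) for i in range(k)]
    return dict(zip(alphabet, solve(xtx, xty)))


def predict_logp(seq, contributions):
    """Additive prediction: sum the contribution of every residue in the sequence."""
    return sum(contributions[aa] for aa in seq)
```

The fitted per‐residue coefficients are what could then be read as a hydrophobicity scale.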


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417