Similar documents
1.
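Objective: Male pattern baldness (MPB), also known as androgenetic alopecia (AGA), is a common type of hair loss in men. Approximately 80% of its phenotypic variance can be explained by genetic factors. Current genetic prediction studies of MPB are based mainly on European populations, and studies in East Asian populations are scarce. This study validated MPB‐associated loci identified in Europeans in a Chinese population and built a genetic prediction model. Methods: We examined the association of 486 single nucleotide polymorphisms (SNPs) previously associated with MPB in Europeans in 312 Han Chinese men. The associated loci were screened with stepwise regression and with Lasso regression. Prediction models were built with logistic regression and evaluated by ten‐fold cross‐validation. We then compared the MPB prediction accuracy of four commonly used classifiers: logistic regression, k‐nearest neighbors, random forest and support vector machine. Results: 174 SNPs were significantly associated with MPB in Han Chinese men (P<0.05). The two screening methods yielded sets of 22 and 25 SNPs, respectively, and 22‐SNP and 25‐SNP logistic regression prediction models were built from these sets. Measured by AUC (area under the ROC curve), the prediction accuracy of the two models was 0.85 and 0.84, respectively. After ten‐fold cross‐validation it fell to 0.81 and 0.77. When age was added as a predictor, the AUC of both models reached a maximum of 0.89. Logistic regression clearly outperformed the other classifiers tested in this study. Conclusion: Overall, the accuracy of the prediction models has not yet reached the level expected for clinical use. Even so, SNPs show considerable potential for the genetic prediction of MPB and can inform early diagnosis, clinical intervention and forensic applications.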
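The evaluation protocol above, a logistic regression model scored by AUC under ten‐fold cross‐validation, can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions, not the study's code. The gradient‐descent fitter, learning rate and epoch count are illustrative. So is pooling the held‐out scores before computing a single AUC, since some studies average per‐fold AUCs instead. Genotypes would be coded as feature vectors, for example 0/1/2 allele counts per SNP, plus age if it is used as a predictor.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    X: list of feature vectors (e.g. 0/1/2 allele counts per SNP); y: 0/1 labels.
    Returns (bias, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j, xj in enumerate(xi):
                gw[j] += err * xj
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return b, w


def roc_auc(scores, labels):
    """AUC = P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(X, y, k=10, seed=0):
    """Score every sample with a model that never saw it, then compute one pooled AUC."""
    scores = [0.0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        b, w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = b + sum(wj * xj for wj, xj in zip(w, X[i]))
    return roc_auc(scores, y)
```

The in‐sample AUC comes from fitting on all samples and calling `roc_auc` on the same data. The gap between that figure and `cross_validated_auc` measures the optimism that cross‐validation corrects for, which is consistent with the drop from 0.85/0.84 to 0.81/0.77 reported above.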

2.
Habitat modelling is increasingly relevant in biodiversity and conservation studies. A typical application is to predict potential zones of specific conservation interest. With many environmental covariates, a large number of candidate models can be investigated, but multi‐model inference may become impractical. Shrinkage regression overcomes this issue by combining the identification of relevant covariates with accurate estimation of their effect sizes for prediction. In a Bayesian framework, we investigated the use of a shrinkage prior, the Horseshoe, for variable selection in spatial generalized linear models (GLMs). As case studies, we considered 5 datasets on small pelagic fish abundance in the Gulf of Lion (Mediterranean Sea, France) and 9 environmental inputs. We compared the predictive performance of five models:
- a simple kriging model;
- a full spatial GLM with independent normal priors on the regression coefficients;
- a full spatial GLM with a Horseshoe prior on the regression coefficients;
- 2 zero‐inflated models (spatial and non‐spatial) with a Horseshoe prior.

Predictive performance was evaluated by cross‐validation on a hold‐out subset of the data. Models with a Horseshoe prior performed best, and the full model with independent normal priors performed worst. With an increasing number of inputs, extrapolation quickly became pervasive, as we tried to predict from novel combinations of covariate values. Because the Horseshoe prior shrinks the regression coefficients, only one model needed to be fitted to the data to obtain reasonable and accurate predictions, including extrapolations.
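The Horseshoe prior combines a global scale τ with a heavy‐tailed local scale λ_j for each coefficient: β_j | λ_j, τ ~ N(0, λ_j² τ²), with λ_j ~ half‐Cauchy(0, 1). Drawing from the prior, as in the minimal sketch below, shows why it suits variable selection: most draws are pulled close to zero while a few escape with large magnitude. The sketch only samples the prior and reports the normal‐means shrinkage factor. It does not fit the spatial GLM, which in practice would be done with MCMC. The function names and the default τ are illustrative assumptions.

```python
import math
import random


def sample_horseshoe(p, tau=1.0, seed=0):
    """Draw p coefficients from the Horseshoe prior.

    beta_j | lambda_j, tau ~ Normal(0, (lambda_j * tau)^2)
    lambda_j ~ half-Cauchy(0, 1), sampled by inverting its CDF F(x) = (2/pi) * atan(x).
    """
    rng = random.Random(seed)
    betas = []
    for _ in range(p):
        lam = math.tan(math.pi * rng.random() / 2.0)  # local (per-coefficient) scale
        betas.append(rng.gauss(0.0, lam * tau))
    return betas


def shrinkage_factor(lam, tau, sigma=1.0):
    """Normal-means shrinkage kappa = 1 / (1 + lambda^2 tau^2 / sigma^2).

    kappa near 1: the estimate is shrunk to zero; kappa near 0: it is left almost untouched.
    """
    return 1.0 / (1.0 + (lam * tau / sigma) ** 2)
```

A small global τ pushes most κ values toward 1 (strong shrinkage). The half‐Cauchy tails still let individual λ_j grow large enough for strong effects to survive.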

3.
Microcalcifications are an early mammographic sign of breast cancer and a target for stereotactic breast needle biopsy. Here, we develop and compare different approaches for building Raman classification algorithms to diagnose invasive and in situ breast cancer, fibrocystic change and fibroadenoma, all lesions that can be associated with microcalcifications. In this study, Raman spectra were acquired from tissue cores obtained from fresh breast biopsies and analyzed using a constituent‐based breast model. Diagnostic algorithms based on the breast model fit coefficients were devised using logistic regression, C4.5 decision tree classification, k‐nearest neighbor (k‐NN) and support vector machine (SVM) analysis, and were subjected to leave‐one‐out cross‐validation. The best‐performing algorithm was based on SVM analysis with a radial basis function kernel. It yielded a positive predictive value of 100% and a negative predictive value of 96% for cancer diagnosis. Importantly, these results demonstrate that Raman spectroscopy provides adequate diagnostic information for lesion discrimination even in the presence of microcalcifications, which to the best of our knowledge has not been previously reported. (© 2013 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
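The evaluation loop described above, leave‐one‐out cross‐validation followed by positive and negative predictive values, is sketched below with a k‐nearest‐neighbour classifier, one of the four methods compared. The best‐performing RBF‐kernel SVM is not reimplemented here. The feature vectors stand in for the breast‐model fit coefficients, and k = 3 and the Euclidean distance are our own assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loocv_predictions(X, y, k=3):
    """Predict each sample from a model built on all other samples."""
    return [
        knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        for i in range(len(X))
    ]


def ppv_npv(y_true, y_pred, positive=1):
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN). Returns NaN when a denominator is 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

Swapping `knn_predict` for another classifier leaves the rest of the loop unchanged. That is what makes leave‐one‐out comparisons across several algorithms straightforward.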

4.
As proteomic data sets increase in size and complexity, labs increasingly need database‐centric software systems that can organize, compare and visualize all of their proteomic experiments. We recently developed an integrated platform called the high‐throughput autonomous proteomic pipeline (HTAPP). It automates the acquisition and processing of quantitative proteomic data and integrates proteomic results with existing external protein information resources within a lab‐based relational database called PeptideDepot. Here, we introduce the peptide validation software component of this system. It combines relational database‐integrated electronic manual spectral annotation in Java with a new software tool in the R programming language. The tool generates logistic regression spectral models from user‐supplied validated data sets and applies these user‐generated models flexibly in automated proteomic workflows. This logistic regression spectral model uses both variables computed directly from SEQUEST output and deterministic variables based on expert manual validation criteria of spectral quality. For linear quadrupole ion trap (LTQ) or LTQ‐FTICR LC/MS data, our logistic spectral model outperformed both XCorr (242% more peptides identified on average) and the X!Tandem E‐value (87% more peptides identified on average). The comparison was made at a 1% false discovery rate estimated by the decoy database approach.
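The 1% false discovery rate above comes from the target‐decoy approach. Spectra are searched against real (target) and reversed or shuffled (decoy) sequences. At a given score threshold, the FDR is commonly estimated as the number of decoy matches divided by the number of target matches above it. A minimal sketch of counting accepted target peptide‐spectrum matches (PSMs) at a given FDR follows. The (score, is_decoy) input format is an assumption, scores are assumed to be "higher is better", and other FDR estimators and q‐value monotonization are not covered.

```python
def targets_at_fdr(psms, max_fdr=0.01):
    """Count target PSMs accepted at the most permissive score threshold with FDR <= max_fdr.

    psms: iterable of (score, is_decoy); higher scores are better.
    FDR at threshold t is estimated as (#decoys >= t) / (#targets >= t).
    Returns (n_targets_accepted, threshold); threshold is None if nothing qualifies.
    """
    ranked = sorted(psms, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = (0, None)
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a threshold after the last PSM sharing this score.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            best = (targets, score)
    return best
```

A comparison like the one reported, against XCorr and the X!Tandem E‐value, would call this once per scoring function on the same search results and compare the accepted counts.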

5.
In model building and model evaluation, cross‐validation is a frequently used resampling method. Unfortunately, this method can be quite time‐consuming. In this article, we discuss a much faster approximation method that can be used in generalized linear models and in Cox's proportional hazards model with a ridge penalty term. Our approximation is based on a Taylor expansion around the estimate of the full model. In this way, all cross‐validated estimates are approximated without refitting the model. The tuning parameter can then be chosen based on these approximations and optimized in less time. The method is most accurate when approximating leave‐one‐out cross‐validation results for large data sets, which would otherwise be the most computationally demanding situation. To demonstrate the method's performance, we apply it to several microarray data sets. An R package, penalized, which implements the method, is available on CRAN.
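The idea of getting cross‐validated fits without refitting is easiest to see in linear ridge regression, where it is exact rather than approximate. There, the leave‐one‐out residual equals the ordinary residual divided by 1 − h_ii, where h_ii is a diagonal element of the ridge hat matrix. The sketch below checks this for a one‐predictor ridge regression without intercept and uses the shortcut to pick the penalty from a grid. It illustrates the principle only. It is not the Taylor‐expansion approximation for GLMs and Cox models that the article and the `penalized` package implement, and the function names are our own.

```python
def ridge_slope(xs, ys, lam):
    """Ridge estimate for y = b * x (no intercept): minimises sum (y - b x)^2 + lam * b^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_residuals_shortcut(xs, ys, lam):
    """Leave-one-out residuals from a single fit: r_i / (1 - h_ii), h_ii = x_i^2 / (sum x^2 + lam)."""
    s = sum(x * x for x in xs) + lam
    b = ridge_slope(xs, ys, lam)
    return [(y - b * x) / (1.0 - x * x / s) for x, y in zip(xs, ys)]


def loo_residuals_refit(xs, ys, lam):
    """Brute-force leave-one-out: refit n times, each time predicting the held-out point."""
    out = []
    for i in range(len(xs)):
        b = ridge_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        out.append(ys[i] - b * xs[i])
    return out


def choose_lambda(xs, ys, grid):
    """Pick the penalty with the smallest leave-one-out sum of squared errors."""
    return min(grid, key=lambda lam: sum(r * r for r in loo_residuals_shortcut(xs, ys, lam)))
```

The shortcut needs one fit per penalty value instead of n fits, the same saving the abstract describes for large data sets.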

6.
Question: How useful are Ellenberg N‐values for predicting the herbage yield of Central European grasslands, compared with approaches based on ordination scores of plant species composition or on soil parameters? Location: Central Germany (11°00′–11°37′E, 50°21′–50°34′N, 500–840 m a.s.l.). Methods: Based on data from a field survey in 2001, the following models were constructed for predicting herbage yield in montane Central European grasslands: (1) linear regression of mean Ellenberg N‐, R‐ and F‐values; (2) linear regression of ordination scores derived from non‐metric multidimensional scaling (NMDS) of vegetation data; and (3) multiple linear regression (MLR) of soil variables. Models were evaluated by cross‐validation and by validation with additional data collected in 2002. Results: The best predictions were obtained with models based on species composition. Ellenberg N‐values and NMDS scores performed equally well, and both performed better than models based on Ellenberg R‐ or F‐values. Predictions based on soil variables were the least accurate. When tested with data from 2002, models based on Ellenberg N‐values or on NMDS scores accurately predicted the productivity rank order of sites, but not the actual herbage yield of particular sites. Conclusions: Mean Ellenberg N‐values, which are easy to calculate, are as accurate as ordination scores in predicting herbage yield from plant species composition. In contrast, models based on soil variables may be useful for generating hypotheses about the factors limiting herbage yield, but not for prediction. We support the view that Ellenberg N‐values should be called productivity values rather than nitrogen values.
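Approach (1) reduces each plot to a single number, the mean Ellenberg N‐value of the species present, which is then regressed against measured yield. A minimal sketch follows. The species names and values in the usage note are hypothetical. Species without an indicator value, such as those rated indifferent, are skipped. The optional cover weighting is our own addition rather than something the abstract specifies.

```python
def mean_indicator(cover, indicator, weighted=False):
    """Mean Ellenberg value for one plot.

    cover: dict species -> cover (%) for species present in the plot.
    indicator: dict species -> Ellenberg value; species missing here are skipped.
    weighted=False gives the plain (presence-based) mean; True weights by cover.
    """
    rated = [(sp, c) for sp, c in cover.items() if sp in indicator and c > 0]
    if not rated:
        raise ValueError("no rated species in plot")
    if weighted:
        total = sum(c for _, c in rated)
        return sum(c * indicator[sp] for sp, c in rated) / total
    return sum(indicator[sp] for sp, _ in rated) / len(rated)


def simple_linear_regression(xs, ys):
    """Ordinary least squares y = a + b x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b
```

Take a hypothetical plot `{"sp_A": 50, "sp_B": 25, "sp_C": 10}` with N‐values `{"sp_A": 8, "sp_B": 4}`. The plain mean is 6.0 and the cover‐weighted mean is about 6.67. One such value per plot becomes the x in `simple_linear_regression` against yield.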

7.
Accumulating experimental evidence has demonstrated that microRNAs (miRNAs) have a large impact on numerous critical biological processes and are associated with various complex human diseases. Nevertheless, predicting potential disease‐related miRNAs remains difficult. In this paper, we developed a Kernel Fusion‐based Regularized Least Squares for MiRNA‐Disease Association prediction model (KFRLSMDA). It applies a kernel fusion technique to fuse similarity matrices and then uses regularized least squares to predict potential miRNA‐disease associations. To demonstrate the effectiveness of KFRLSMDA, we adopted leave‐one‐out cross‐validation (LOOCV) and 5‐fold cross‐validation. We then compared KFRLSMDA with 10 previous computational models (MaxFlow, MiRAI, MIDP, RKNNMDA, MCMDA, HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA). KFRLSMDA outperformed the other models, achieving AUCs of 0.9246 in global LOOCV and 0.8243 in local LOOCV, and an average AUC of 0.9175 ± 0.0008 in 5‐fold cross‐validation. In addition, 96%, 100% and 90% of the top 50 potential miRNAs for breast neoplasms, colon neoplasms and oesophageal neoplasms, respectively, were confirmed by experimental discoveries. We also predicted potential miRNAs related to hepatocellular cancer after removing all of its known related miRNAs, and 98% of the top 50 potential miRNAs were verified. Furthermore, we predicted potential miRNAs related to lymphoma using the data set in the old version of the HMDD database, and 80% of the top 50 potential miRNAs were confirmed. Therefore, it can be concluded that KFRLSMDA has reliable prediction performance.
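The two ingredients named above, fusing several similarity (kernel) matrices and scoring associations with regularized least squares, can be sketched as follows. This is a simplified, one‐sided version under our own assumptions. Kernels are fused by a fixed weighted average. Scores are F = K(K + σI)⁻¹Y for an miRNA kernel K and a known‐association matrix Y, with a hypothetical default σ. The published model may weight kernels differently and may combine miRNA‐side and disease‐side predictions; those details are not reproduced here.

```python
def fuse_kernels(kernels, weights):
    """Weighted average of equally sized similarity matrices (lists of lists)."""
    n = len(kernels[0])
    total = sum(weights)
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) / total for j in range(n)]
        for i in range(n)
    ]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, a)) + list(map(float, b)) for a, b in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rls_scores(K, Y, sigma=1.0):
    """Regularized least squares scores F = K (K + sigma I)^-1 Y.

    K: n x n miRNA similarity kernel; Y: n x m known miRNA-disease associations (0/1).
    """
    n = len(K)
    A = [[K[i][j] + (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    return matmul(K, solve(A, Y))
```

Within a disease column, unknown pairs with the highest scores would be the candidate miRNAs. Leave‐one‐out validation hides one known association at a time and checks where it ranks.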

8.
To assess regression models for lipid and lean body mass in small birds, we studied 52 migratory passerine birds of 13 species (5–40 g). For each bird we recorded live body mass (±0.1 g), total body electrical conductivity (TOBEC; from the “third generation” TOBEC machine EM‐SCAN® SA‐3000) or E‐Value, visual fat score (VisFat), and seven body measurements. We determined the lipid and lean mass of each bird after petroleum‐ether extraction of lipids. We obtained a “net E‐Value” (NEV) for each scanned bird by subtracting the E‐Value of the empty bird‐restraining tube, because these values showed an inverse temperature dependence (P<0.005). Leave‐one‐out cross‐validation was used to assess model selection and to construct 95% confidence intervals. The precision of TOBEC increased with bird size (CV of NEV vs. live mass: r=−0.276, P=0.002). It also explained an increasing proportion of variation in lean mass moving from the small‐ to medium‐ to large‐bird classes of our data. Even so, it did no better than head length in single‐variable prediction of lean or lipid mass and was included in five of the 14 multivariate models we developed. The best multiple regression to predict lean mass included live weight, VisFat, bill length, tarsus and lnNEV (adjusted R² [aR²] = 99.0%); however, the same model lacking only lnNEV yielded aR² = 98.9%. The parallel pair of models predicting lipid mass yielded aR² = 90.3% and 90.0%, respectively. Subdividing the data by three size classes and three taxa (American redstart Setophaga ruticilla, ovenbird Seiurus aurocapilla, warblers), best‐subset multiple‐regression models predicted lean mass with aR² from 94.7 to 99.6% and lipid mass with aR² from 85.4 to 98.3%. The best models for the size‐ and species‐groups included VisFat and zero to five body measurements, and most included live weight. lnNEV was included only in the models for ovenbird (lipid), warblers (lipid), all birds (both), and large birds (both).
Actual lipid mass of all birds was more highly correlated with multiple‐regression‐predicted lipid mass (r=0.955) than with visual subcutaneous fat scoring (r=0.683). These multiple‐regression models use live‐bird measurements and visual fat score as independent variables to predict lipid content. They give more accurate and precise estimates of actual lipid content in small passerines than any previously published models. They are particularly accurate for placing birds into percentage body‐fat classes.
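The adjusted R² (aR²) values above penalise R² for the number of predictors. A predictor that adds little, such as lnNEV in the lean‐mass comparison, therefore moves the adjusted value only slightly. The standard formulas for n observations and p predictors plus an intercept are sketched below. The numbers in the test are illustrative and not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) (n - 1) / (n - p - 1) for n observations, p predictors + intercept."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```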

9.
Objective: To develop improved predictive regression equations for body fat content derived from common anthropometric measurements. Research Methods and Procedures: 117 healthy German subjects (46 men and 71 women, 26 to 67 years of age) from two different studies were assigned to a validation group and a cross‐validation group. Common anthropometric measurements were taken, and body composition was measured by DXA. Equations predicting body fat mass (BFM) from anthropometric measurements, with DXA as the reference method, were developed using regression models. Results: The final best predictive sex‐specific equations combined skinfold thicknesses (SF), circumferences and bone breadth measurements:
- men: BFMNew (kg) = −40.750 + (0.397 × waist circumference) + 6.568 × (log triceps SF + log subscapular SF + log abdominal SF);
- women: BFMNew (kg) = −75.231 + (0.512 × hip circumference) + 8.889 × (log chin SF + log triceps SF + log subscapular SF) + (1.905 × knee breadth).

The BFM estimates from both validation and cross‐validation correlated excellently with, and corresponded closely to, the DXA estimates. They also showed only a negligible tendency to underestimate percent body fat in subjects with higher BFM, compared with equations using a two‐compartment (Durnin and Womersley) or a four‐compartment (Peterson) model as the reference method. Discussion: Combining skinfold thicknesses with circumference and/or bone breadth measures provides a more precise prediction of percent body fat than established SF equations. Our equations are recommended for use in clinical or epidemiological settings in populations with a similar ethnic background.
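The two published equations can be written as functions for quick calculation. The abstract states neither the units nor the logarithm base. The sketch assumes base‐10 logarithms of skinfolds in millimetres, and circumferences and knee breadth in centimetres, which are common conventions for this kind of equation. Check these assumptions against the full paper before use.

```python
import math


def bfm_men(waist_cm, triceps_mm, subscapular_mm, abdominal_mm):
    """Body fat mass (kg), men: -40.750 + 0.397*waist + 6.568*(sum of log10 skinfolds)."""
    log_sum = sum(math.log10(sf) for sf in (triceps_mm, subscapular_mm, abdominal_mm))
    return -40.750 + 0.397 * waist_cm + 6.568 * log_sum


def bfm_women(hip_cm, chin_mm, triceps_mm, subscapular_mm, knee_breadth_cm):
    """Body fat mass (kg), women: -75.231 + 0.512*hip + 8.889*(sum of log10 skinfolds) + 1.905*knee breadth."""
    log_sum = sum(math.log10(sf) for sf in (chin_mm, triceps_mm, subscapular_mm))
    return -75.231 + 0.512 * hip_cm + 8.889 * log_sum + 1.905 * knee_breadth_cm
```

Under these assumptions, a man with a 90 cm waist and three 10 mm skinfolds gets −40.750 + 35.730 + 19.704 = 14.684 kg.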

10.
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. Many clinical researchers have expressed enthusiasm for regression trees, but this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement achieved by ensemble‐based methods, including bootstrap aggregation (bagging) of regression trees, random forests and boosted regression trees. We analyzed 30‐day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in‐sample and out‐of‐sample predictions of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared with conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines performed even better. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees. They may not, however, lead to clear advantages over conventional logistic regression models for predicting short‐term mortality in population‐based samples of subjects with cardiovascular disease.
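The best‐performing logistic models used restricted cubic splines for continuous covariates. A restricted cubic spline replaces a covariate x with a few basis columns, which are then entered into an ordinary logistic regression. The sketch below builds Harrell's unscaled basis for a given set of knots. Knot placement (often at fixed percentiles of x) and any scaling are left to the user and are assumptions, not details from the abstract.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's unscaled form) for one value x.

    Returns [x, S_1(x), ..., S_{k-2}(x)] for k >= 3 knots; the fitted curve is cubic
    between knots and linear below the first and above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_pos(u):
        # Truncated cubic (u)_+^3
        return max(u, 0.0) ** 3

    d = t[-1] - t[-2]
    row = [x]
    for j in range(k - 2):
        row.append(
            cube_pos(x - t[j])
            - cube_pos(x - t[-2]) * (t[-1] - t[j]) / d
            + cube_pos(x - t[-1]) * (t[-2] - t[j]) / d
        )
    return row
```

Each patient's row of basis values for a continuous covariate (age, say) would replace the raw value in the design matrix. The linear tails keep predictions stable at the extremes of the data.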

11.
Acute myeloid leukaemia (AML) is the most common type of adult acute leukaemia and has a poor prognosis. Optimal risk stratification is therefore very important for choosing treatment and evaluating prognosis. In our study, a total of 1707 samples of AML patients from three public databases were divided into meta‐training, meta‐testing and validation sets. The meta‐training set was used to build the risk prediction model, and the other four data sets were used for validation. Using the log‐rank test, univariate Cox regression analysis and LASSO‐Cox, we divided AML patients into high‐risk and low‐risk groups based on an AML risk score (AMLRS) built from 10 survival‐related genes. In the meta‐training, meta‐testing and validation sets, patients in the low‐risk group all had significantly longer OS (overall survival) than those in the high‐risk group (P < .001). The area under the ROC curve (AUC) by time‐dependent ROC was 0.5854–0.7905 for 1 year, 0.6652–0.8066 for 3 years and 0.6622–0.8034 for 5 years. Multivariate Cox regression analysis indicated that AMLRS was an independent prognostic factor in four data sets. A nomogram combining AMLRS and two clinical parameters performed well in predicting 1‐year, 3‐year and 5‐year OS. Finally, we created a web‐based prognostic model to predict the prognosis of AML patients (https://tcgi.shinyapps.io/amlrs_nomogram/).
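A LASSO‐Cox gene signature such as AMLRS is typically applied as a linear predictor. The risk score is the sum of each gene's expression multiplied by its Cox coefficient, and patients are split into high‐ and low‐risk groups at a cutoff. The abstract gives neither the coefficients nor the cutoff. The sketch therefore uses hypothetical gene names and coefficients, and assumes a median cutoff.

```python
from statistics import median


def risk_scores(samples, coefs):
    """Linear predictor per sample: sum over genes of coefficient * expression.

    samples: list of dicts gene -> expression; coefs: dict gene -> Cox coefficient.
    """
    return [sum(c * s[g] for g, c in coefs.items()) for s in samples]


def stratify(scores, cutoff=None):
    """Label each score 'high' or 'low' risk; default cutoff is the median of these scores."""
    cut = median(scores) if cutoff is None else cutoff
    return ["high" if s > cut else "low" for s in scores]
```

When validating, one common choice is to fix the cutoff learned on the training set and pass it explicitly (`stratify(test_scores, cutoff=train_cut)`) rather than recompute a median on each validation set.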

12.
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins. It integrates a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position‐specific scoring matrix features, and COMSAT_MILP, an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets, using a contact definition of 14 Å between Cα‐Cα atoms. First, rigorous leave‐one‐protein‐out cross‐validation on the training set of 90 TM proteins gave an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 for residue pairs that are at least six amino acids apart. Second, on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results compared with 12 other state‐of‐the‐art predictors. It is also more robust in terms of prediction accuracy as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/ . Proteins 2016; 84:332–348. © 2016 Wiley Periodicals, Inc.
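The four figures reported for contact prediction follow from a confusion matrix over residue pairs. In contact‐prediction work, "accuracy" usually means the fraction of predicted contacts that are real (precision), and "coverage" means the fraction of real contacts that are predicted (recall). We assume those definitions here. The sketch also builds the true contact set using the stated 14 Å Cα‐Cα cutoff and a minimum sequence separation of six. Whether the cutoff is inclusive is our assumption.

```python
import math
from itertools import combinations


def true_contacts(ca_coords, cutoff=14.0, min_sep=6):
    """Residue pairs (i, j) with j - i >= min_sep and Calpha-Calpha distance <= cutoff (angstroms)."""
    return {
        (i, j)
        for i, j in combinations(range(len(ca_coords)), 2)
        if j - i >= min_sep and math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    }


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy (precision), coverage (recall), specificity and Matthews' correlation coefficient."""
    accuracy = tp / (tp + fp) if tp + fp else 0.0
    coverage = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, coverage, specificity, mcc


def contact_metrics(predicted, actual, n_residues, min_sep=6):
    """Score a predicted contact set against the true set over all eligible residue pairs."""
    eligible = {(i, j) for i, j in combinations(range(n_residues), 2) if j - i >= min_sep}
    predicted, actual = set(predicted) & eligible, set(actual) & eligible
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(eligible) - tp - fp - fn
    return confusion_metrics(tp, fp, tn, fn)
```

Because true contacts are rare among all eligible pairs, specificity stays near 99% even for modest predictors. That is why MCC and coverage are the more informative figures in the results above.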

13.
Phylogenetic imputation has recently emerged as a potentially powerful tool for predicting missing data in functional trait datasets. Understanding the limitations of phylogenetic modelling in predicting trait values is therefore critical if the imputed values are to be used in subsequent analyses. Previous studies have focused on the relationship between phylogenetic signal and clade‐level prediction accuracy, yet variability in prediction accuracy among individual tips of phylogenies remains largely unexplored. Here, we used simulations of trait evolution along the branches of phylogenetic trees to show how the accuracy of phylogenetic imputations is influenced by the combined effects of 1) the amount of phylogenetic signal in the traits and 2) the branch length of the tips to be imputed. Specifically, we conducted cross‐validation trials to estimate the variability in prediction accuracy among individual tips on the phylogenies (hereafter ‘tip‐level accuracy’). Under a Brownian motion model of evolution (BM, Pagel's λ = 1), tip‐level accuracy decreased rapidly with increasing tip branch length. Only tips with branch lengths of approximately 10% or less of the total tree height showed consistently accurate predictions (i.e. cross‐validation R‐squared > 0.75). When phylogenetic signal was weak, the effect of tip branch length was reduced. It became negligible for traits simulated with λ < 0.7, where accuracy was low in any case. Our study shows that variability in prediction accuracy among individual tips of the phylogeny should be considered when evaluating the reliability of phylogenetically imputed trait values. To address this challenge, we describe a Monte Carlo‐based method that allows one to estimate the expected tip‐level accuracy of phylogenetic predictions for continuous traits.
Our approach identifies gaps in functional trait datasets for which phylogenetic imputation performs poorly. It will help ecologists design more efficient trait collection campaigns by focusing resources on lineages whose trait values are more uncertain.
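Under Brownian motion, the expected covariance of two tips' trait values is proportional to the branch length they share from the root. Pagel's λ multiplies the off‐diagonal (shared‐history) part of that matrix by λ and leaves each tip's total variance unchanged. So λ = 1 is pure Brownian motion and λ = 0 makes the tips independent. The sketch below builds this covariance matrix for a small tree and applies the λ transform. The edge‐list tree encoding is our own assumption. The tip‐length ratio helper corresponds to the "tip branch length relative to tree height" quantity discussed above, for an ultrametric tree.

```python
def bm_covariance(paths, edge_lengths):
    """Brownian-motion covariance (rate 1): C[i][j] = total length of edges shared by tips i and j.

    paths: list, one per tip, of edge ids from the root to that tip.
    edge_lengths: dict edge id -> branch length.
    """
    sets = [set(p) for p in paths]
    return [[sum(edge_lengths[e] for e in a & b) for b in sets] for a in sets]


def pagel_lambda(C, lam):
    """Multiply off-diagonal entries by lam; the diagonal (root-to-tip variance) is unchanged."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]


def tip_length_ratio(paths, edge_lengths):
    """Terminal branch length divided by root-to-tip distance, per tip."""
    return [edge_lengths[p[-1]] / sum(edge_lengths[e] for e in p) for p in paths]
```

Simulated trait values would then be drawn from a multivariate normal with the transformed covariance. In the study's terms, consistently accurate imputation was mostly limited to tips whose ratio is about 0.1 or less.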

14.
The suitability of trees as hosts for epiphytic lichens is studied in a 25 ha forest stand. Suitability is measured as occupation probabilities, which are modelled using a hierarchical Bayesian approach. These probabilities are useful for ecologists. They give a smoothed spatial distribution map of suitability for each species and can be used to detect high‐ and low‐probability areas. In addition, suitability is explained by tree‐level covariates. Spatial dependence, which is due to unobserved spatially structured covariates, is modelled through an unobserved Markov random field. Markov chain Monte Carlo methods have been applied for Bayesian computation. The extensive spatial data consist of the occurrences of eight lichen species and one bryophyte on all of the 1253 potential host trees. In addition, the coordinates of the trees and several tree characteristics have been recorded. The data have been analysed for the four most abundant species: Lobaria pulmonaria, Nephroma bellum, Nephroma parile and Peltigera praetextata. The tree‐level parameters to be estimated are the occurrence probabilities for each tree and each lichen species. Model validation is discussed in detail. In addition to Bayesian validation tools, the autologistic model and a case‐control design based on logistic regression are suggested for validating covariate effects. As a result, we present suitability maps for the four lichen species. Among the observed tree covariates, diameter at breast height (DBH) correlates with lichen occurrence. Our modelling approach has close connections to disease mapping in spatial epidemiology.
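The autologistic model, suggested above for validating covariate effects, adds a neighbourhood term to an ordinary logistic regression. The log‐odds that tree i hosts a species depend on tree‐level covariates such as DBH and on how many neighbouring trees are occupied. A minimal sketch of the conditional probability and of a distance‐based neighbourhood follows. The radius, the coefficient names and the 0/1 coding are assumptions. This is not the hierarchical Markov random field model that the study itself fits by MCMC.

```python
import math


def neighbours_within(coords, radius):
    """For each tree, indices of other trees within `radius` (same units as coords)."""
    return [
        [j for j in range(len(coords)) if j != i and math.dist(coords[i], coords[j]) <= radius]
        for i in range(len(coords))
    ]


def autologistic_prob(i, occupied, covariates, alpha, beta, gamma, neighbours):
    """P(tree i occupied | neighbours) with
    logit = alpha + beta . x_i + gamma * (number of occupied neighbours), occupied coded 0/1.
    """
    eta = (
        alpha
        + sum(b * x for b, x in zip(beta, covariates[i]))
        + gamma * sum(occupied[j] for j in neighbours[i])
    )
    return 1.0 / (1.0 + math.exp(-eta))
```

A positive γ means occupancy clusters in space. Comparing the covariate coefficients with and without the neighbour term shows whether an apparent DBH effect survives once spatial dependence is accounted for.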

15.
Previous publications demonstrated that solubility extrapolated by the polyethylene glycol (PEG) precipitation method (Middaugh et al., J Biol Chem 1979; 254:367–370; Juckes, Biochim Biophys Acta 1971; 229:535–546; Foster et al., Biochim Biophys Acta 1973; 317:505; Mahadevan and Hall, AIChE J 1990; 36:1517–1528; Stevenson and Hageman, Pharm Res 1995; 12:1671–1676) correlates strongly with the experimentally measured solubility of proteins. Here, we explored extrapolated solubility as a way to compare multiple protein drug candidates when the nonideality of a highly soluble protein prevents accurate quantitative solubility prediction. To achieve high efficiency and reduce the amount of protein required, the method is miniaturized to a microwell plate format for high‐throughput screening. In this simplified version, the comparative solubility of proteins can be obtained without measuring the supernatant concentration after the precipitation step, which the conventional method requires. The monoclonal antibodies with the lowest apparent solubilities determined by this method were the most difficult to concentrate, indicating good agreement between prediction and empirical observation. This study also shows that the opalescence predicted by the PEG precipitation method compares favorably with experimentally determined opalescence levels at high concentration. This approach may help detect proteins with potential solubility and opalescence problems before the time‐consuming and expensive development of a high‐concentration formulation.
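In the conventional PEG precipitation method, the logarithm of the protein concentration remaining in the supernatant is plotted against PEG concentration. The roughly linear trend is then extrapolated to 0% PEG, and the intercept is the "extrapolated solubility". The sketch below does that fit with ordinary least squares. Units are hypothetical (mg/mL protein, % w/v PEG) and base‐10 logarithms are assumed. The miniaturized plate format described above, which avoids measuring supernatant concentrations, is not modelled.

```python
import math


def extrapolated_solubility(peg_percent, supernatant_conc):
    """Fit log10(S) = log10(S0) - beta * [PEG] by least squares and return (S0, beta)."""
    xs = list(peg_percent)
    ys = [math.log10(c) for c in supernatant_conc]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return 10 ** intercept, -slope
```

Ranking candidate antibodies by S0 gives a comparative solubility ordering. Absolute S0 values for highly soluble proteins are less trustworthy because of the nonideality noted above.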

16.
17.
Aim: To introduce Gaussian mixture distributions and sequential maximum a posteriori image segmentation (GM‐SMAP) as a model that predicts species ranges from mapped climatic variables. We also compare its predictive capacity with two commonly used bioclimatic models, regression tree analysis (RTA) and smoothed response surfaces (SRS). Location: North‐west North America. Methods: We compared the models for their ability to predict the distributional range of western hemlock (Tsuga heterophylla). We calculated nine climatic and water‐balance variables and projected them to a 2‐km grid extending up to 140 km from the T. heterophylla range. Models were trained using the five variables selected by RTA, as well as subsets of three variables. Goodness of fit was assessed using models trained and tested on the entire study area. Predictive capacity was assessed using 100 cross‐validation tests, each trained on a randomly sampled 1% of the study area and tested on the complement of the study area. Results: Models using all five variables were significantly better than three‐variable models. Model fit was greatest for SRS. Compared with SRS, GM‐SMAP misclassified slightly more area, and RTA almost twice the area. However, cross‐validation showed that predictive capacity was clearly greatest for GM‐SMAP and lowest for SRS, indicating that GM‐SMAP makes more accurate predictions from sparse data. Main conclusions: GM distributions prevent overfitting using an information‐theoretic approach, and the SMAP algorithm minimizes the spatial extent of the largest misclassified area using a multiscale method. These properties, useful for image classification, also contribute to its strong predictive capacity as a bioclimatic model. SRS overfit the data, lowering its predictive capacity. RTA failed to capture details of interactions among variables, yielding a poor fit. These results demonstrate the strong potential of GM‐SMAP as a bioclimatic model.
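The predictive‐capacity test above trains each model on a random 1% of grid cells and evaluates it on the remaining 99%, repeated 100 times. A sketch of generating those splits follows, with the misclassified fraction as a simple score. The cell indexing and the scoring function are our assumptions, and the GM‐SMAP, RTA and SRS models themselves are not implemented.

```python
import random


def sparse_splits(n_cells, train_fraction=0.01, repeats=100, seed=0):
    """Yield (train, test) index lists: a random train_fraction of cells and its complement."""
    rng = random.Random(seed)
    k = max(1, round(train_fraction * n_cells))
    for _ in range(repeats):
        train = set(rng.sample(range(n_cells), k))
        yield sorted(train), [i for i in range(n_cells) if i not in train]


def misclassified_fraction(observed, predicted, cells):
    """Share of the given cells whose predicted presence/absence differs from the observed map."""
    return sum(observed[i] != predicted[i] for i in cells) / len(cells)
```

Training on such a small sample rewards models that do not overfit. That is the property the abstract credits for GM‐SMAP's advantage over SRS.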

18.
LncRNAs and miRNAs are key molecules in the competing endogenous RNA (ceRNA) mechanism, and their interactions have been found to play important roles in gene regulation. Interactions can be identified from CLIP‐seq experiments, and in silico prediction supplements this by selecting the most promising candidates for experimental validation. Computational tools for predicting lncRNA‐miRNA interactions are very important for deciphering the ceRNA mechanism, yet little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA‐miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network with similarities based on the expression profiles of lncRNAs and miRNAs. Based on this data integration, we introduced a linear neighbour representation method to construct a prediction model. To evaluate the prediction performance of the proposed model, we ran k‐fold cross‐validation. LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 in 2‐fold, 5‐fold and 10‐fold cross‐validation, respectively. A series of comparison experiments with other methods was also conducted. The results showed that our method is feasible and effective for predicting lncRNA‐miRNA interactions by combining different types of useful side information. We anticipate that LNRLMI could be a useful tool for predicting the non‐coding RNA regulatory networks in which lncRNAs and miRNAs are involved.

19.
Relationships between environmental variables and the diversity (Shannon‐Weaver index) of fish communities in the Tagus estuary and adjacent coastal areas were analyzed. The focus was on whether these abiotic/biotic relationships are linear or nonlinear, with the aim of obtaining accurate short‐ to medium‐term diversity predictions from habitat variables alone. Multiple linear regression (MLR) was used for the linear approach and artificial neural networks (ANNs) for the nonlinear approach. MLR results in the external validation phase indicated a lack of model accuracy (R² = 0.0710; %SEP = 47.5868; E = −0.0217; ARV = 1.0217; N = 43). The best artificial neural network in this study (12‐15‐15‐1 architecture) was more accurate than MLR in the external validation phase (R² = 0.9736; %SEP = 7.8499; E = 0.9722; ARV = 0.0278; N = 43). This indicates a clear nonlinear relationship between the variables. In the best ANN model, nitrate concentration, depth, dissolved oxygen and temperature were the most important predictors of fish diversity in the Tagus estuary. The sensitivity analysis indicated that the remaining variables (silicate, nitrite, transparency, salinity, slope, phosphate, water particulate organic matter and chlorophyll a) played lesser roles in the model.
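The validation statistics reported above can be computed from observed and predicted diversity values. The abstract does not define them, so the sketch uses common definitions that are consistent with the reported numbers. E is taken as the Nash–Sutcliffe coefficient of efficiency and ARV as the average relative variance, so E + ARV = 1, as holds for both reported pairs. %SEP is taken as the root mean square error expressed as a percentage of the observed mean, and R² as the squared Pearson correlation. Treat these definitions as assumptions to be checked against the full paper.

```python
import math


def validation_stats(observed, predicted):
    """Return dict with r2 (squared Pearson correlation), sep_percent, E and ARV."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mo) ** 2 for o in observed)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2 = cov ** 2 / (sst * var_p) if sst and var_p else float("nan")
    arv = sse / sst  # average relative variance: error relative to predicting the mean
    return {
        "r2": r2,
        "sep_percent": 100.0 * math.sqrt(sse / n) / mo,
        "E": 1.0 - arv,  # Nash-Sutcliffe efficiency
        "ARV": arv,
    }
```

An E near zero, as for the MLR model, means the model predicts no better than the observed mean. A negative E means it does worse.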

20.
Tea (Camellia sinensis) plants are quite likely to be attacked by more than one pest species at the same time, and determining the proportion of each pest is of great significance for managing tea plants. However, no previous studies have addressed this. In this work, two pest species (Ectropis obliqua and Ectropis grisescens) were applied to attack tea plants in six different ratios (10:0, 8:2, 6:4, 4:6, 2:8 and 0:10), and an electronic nose (E‐nose) was employed to detect them. The samples were labelled as groups 10:0, 8:2, 6:4, 4:6, 2:8 and 0:10, respectively. Two methods were applied to predict the ratio of E. obliqua to E. grisescens attacking the tea plants, and their performances were compared. The first method applied a regression algorithm directly to the whole E‐nose data set. The second method first classified tea plants into three main classes and then applied a regression algorithm to the second class. The first class contained group 10:0, the second contained groups 8:2, 6:4, 4:6 and 2:8, and the third contained group 0:10. The results showed that the second method performed better. Its correct classification rate was 100% for the training set and 93.75% for the testing set. Its root mean square error (RMSE) was 0.0005 for the calibration set and 0.0064 for the validation set. Its fitting correlation coefficient (R²) was 99.07% for the calibration set and 91.22% for the validation set. These results were acceptable for prediction analysis and showed that the E‐nose is a feasible technique for predicting pest ratios.
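The second, better‐performing method is a two‐stage pipeline. A sample is first classified as pure E. obliqua (10:0), pure E. grisescens (0:10) or mixed, and a regression model is then applied only to the mixed class. The sketch below shows that control flow with pluggable models. The abstract does not name the algorithms, so the classifier and regressor in the test are hypothetical stand‐ins. Representing the ratio as the E. obliqua fraction, and clipping mixed predictions to the 8:2 to 2:8 range, are our own choices.

```python
def two_stage_ratio(features, classify, regress):
    """Predict the E. obliqua fraction (0.0 to 1.0) of an infestation.

    classify(features) -> "pure_obliqua", "pure_grisescens" or "mixed".
    regress(features) -> fraction, used only for the mixed class.
    """
    label = classify(features)
    if label == "pure_obliqua":
        return 1.0
    if label == "pure_grisescens":
        return 0.0
    if label != "mixed":
        raise ValueError(f"unknown class {label!r}")
    # Mixed groups in the study span 8:2 to 2:8; clipping to that range is an optional choice made here.
    return min(0.8, max(0.2, regress(features)))


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5
```

Splitting off the pure classes means the regressor only has to model the mixed ratios. This is one plausible reason the two‐stage method did better than a single regression over all groups.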
