Similar literature
 Found 20 similar articles (search time: 250 ms)
1.
Two algorithms based on Bayesian Networks (BNs) for bacterial subcellular location prediction are explored in this paper: one predicts all locations for Gram+ bacteria and the other all locations for Gram- bacteria. Methods were evaluated using different numbers of residues (from the N-terminal 10 residues to the whole sequence) and residue representations (amino acid composition, percentage amino acid composition or normalised amino acid composition). The accuracy of the best resulting BN was compared to PSORTB. The accuracy of this multi-location BN was roughly comparable to that of PSORTB; the difference in predictions is low, often less than 2%. The BN method thus represents both an important new avenue of methodological development for subcellular location prediction and a potentially valuable new tool of practical utility for candidate subunit vaccine selection.

2.
The performance of the self-consistent mean field theory (SCMFT) method for side-chain modeling, employing rotamer energies calculated with the flexible rotamer model (FRM), is evaluated in the context of comparative modeling of protein structure. Predictions were carried out on a test set of 56 model backbones of varying accuracy, to allow side-chain prediction accuracy to be analyzed as a function of backbone accuracy. A progressive decrease in the accuracy of prediction was observed as backbone accuracy decreased. However, even for very low backbone accuracy, prediction was substantially higher than random, indicating that the FRM can, in part, compensate for the errors in the modeled tertiary environment. It was also investigated whether the introduction in the FRM-SCMFT method of knowledge-based biases, derived from a backbone-dependent rotamer library, could enhance its performance. A bias derived from the backbone-dependent rotamer conformations alone did not improve prediction accuracy. However, a bias derived from the backbone-dependent rotamer probabilities improved prediction accuracy considerably. This bias was incorporated through two different strategies. In one (the indirect strategy), rotamer probabilities were used to reject unlikely rotamers a priori, thus restricting prediction by FRM-SCMFT to a subset containing only the most probable rotamers in the library. In the other (the direct strategy), rotamer energies were transformed into pseudo-energies that were added to the average potential energies of the respective rotamers, thereby creating hybrid energy-based/knowledge-based average rotamer energies, which were used by the FRM-SCMFT method for prediction. For all degrees of backbone accuracy, an optimal strength of the knowledge-based bias existed for both strategies for which predictions were more accurate than pure energy-based predictions, and also than pure knowledge-based predictions. 
Hybrid knowledge-based/energy-based methods were obtained from both strategies and compared with the SCWRL method, a hybrid method based on the same backbone-dependent rotamer library. The accuracy of the indirect method was approximately the same as that of the SCWRL method, but that of the direct method was significantly higher.

3.
This study aims to enhance the discussion about the usefulness of Artificial Neural Networks and specific input relevance detection for water quality assessment. The focus is on the development of neural modelling techniques initiating further research on predictor selection for bioindication. We tested the predictability of abiotic variables and quality indices (BOD5, conductivity, NH3-N, NH4-N, NO2-N, NO3-N, Ntotal, oxygen, pH-value, Ptotal, water temperature, chemical and morphological water quality class and saprobic index) by means of benthic macro-invertebrates at 51 sampling sites on nine small streams in Central Germany. The results show that General Regression Neural Networks and modified Multi-Layer Perceptrons can successfully be applied for modelling and predicting ecological and environmental data because of their ability to solve non-linear and multidimensional problems. Nevertheless, Linear Neural Networks proved suitable in some cases. In particular, stepwise methods, genetic algorithms and sensitivity analysis can be used to reduce the complexity of data sets in a reasonable way by detecting important predictors. In many cases the prediction accuracy even increases. In addition, using only the presence of species instead of their abundance mostly provides better results, simpler models and easier data collection. Thus, complex systems can be represented in easily surveyed models with low measuring and computing effort. We claim that the identification of indicator species and the assessment of complex anthropogenic impacts can be improved substantially and managed more efficiently using the neural-based approach. It is particularly well suited to bioindication, especially with regard to aquatic ecosystems.

4.
Accurate protein structure prediction remains an active objective of research in bioinformatics. Membrane proteins comprise approximately 20% of most genomes. They are, however, poorly tractable targets of experimental structure determination. Their analysis using bioinformatics thus makes an important contribution to their on-going study. Using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we have addressed the alignment-free discrimination of membrane from non-membrane proteins. The method successfully identifies prokaryotic and eukaryotic alpha-helical membrane proteins at 94.4% accuracy, beta-barrel proteins at 72.4% accuracy, and distinguishes assorted non-membranous proteins with 85.9% accuracy. The method here is an important potential advance in the computational analysis of membrane protein structure. It represents a useful tool for the characterisation of membrane proteins with a wide variety of potential applications.

5.

Background

Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures for Mild Cognitive Impairment (MCI), but presently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non-parametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test.

Results

Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even lower than a median value of 0.5.

Conclusions

When sensitivity, specificity and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.
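
The three quantities compared in the abstract above (overall accuracy, sensitivity and specificity) can be sketched from confusion-matrix counts as follows. This is only an illustration: the function name and the counts are hypothetical and are not taken from the study. The counts were chosen to mimic a classifier with perfect specificity but low sensitivity, the pattern reported for Support Vector Machines.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: positive cases classified correctly / incorrectly
    tn/fp: negative cases classified correctly / incorrectly
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # share of all cases classified correctly
    sensitivity = tp / (tp + fn)      # share of true positives that are detected
    specificity = tn / (tn + fp)      # share of true negatives that are detected
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the study): 10 positive and 20 negative cases.
acc, sens, spec = classification_metrics(tp=3, fn=7, tn=20, fp=0)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

With imbalanced groups like these, overall accuracy can look reasonable even though most positive cases are missed, which is why the abstract weighs sensitivity and specificity alongside accuracy.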

6.
MOTIVATION: With complex traits and diseases having potential genetic contributions of thousands of genetic factors, and with current genotyping arrays consisting of millions of single nucleotide polymorphisms (SNPs), powerful high-dimensional statistical techniques are needed to comprehensively model the genetic variance. Machine learning techniques have many advantages including lack of parametric assumptions, and high power and flexibility. RESULTS: We have applied three machine learning approaches: Random Forest Regression (RFR), Boosted Regression Tree (BRT) and Support Vector Regression (SVR) to the prediction of warfarin maintenance dose in a cohort of African Americans. We have developed a multi-step approach that selects SNPs, builds prediction models with different subsets of selected SNPs along with known associated genetic and environmental variables and tests the discovered models in a cross-validation framework. Preliminary results indicate that our modeling approach gives much higher accuracy than previous models for warfarin dose prediction. A model size of 200 SNPs (in addition to the known genetic and environmental variables) gives the best accuracy. The R² between the predicted and actual square root of warfarin dose in this model was on average 66.4% for RFR, 57.8% for SVR and 56.9% for BRT. Thus RFR had the best accuracy, but all three techniques achieved better performance than the current published R² of 43% in a sample of mixed ethnicity, and 27% in an African American sample. In summary, machine learning approaches for high-dimensional pharmacogenetic prediction, and for prediction of clinical continuous traits of interest, hold great promise and warrant further research.

7.
The antigenic variability of influenza viruses has always made influenza vaccine development challenging. The punctuated nature of antigenic drift of influenza virus suggests that a relatively small number of genetic changes or combinations of genetic changes may drive changes in antigenic phenotype. The present study aimed to identify antigenicity-associated sites in the hemagglutinin protein of A/H1N1 seasonal influenza virus using computational approaches. Random Forest Regression (RFR) and Support Vector Regression based on Recursive Feature Elimination (SVR-RFE) were applied to H1N1 seasonal influenza viruses and used to analyze the associations between amino acid changes in the HA1 polypeptide and antigenic variation based on hemagglutination-inhibition (HI) assay data. Twenty-three and twenty antigenicity-associated sites were identified by RFR and SVR-RFE, respectively, by considering the joint effects of amino acid residues on antigenic drift. Our proposed approaches were further validated with the H3N2 dataset. The prediction models developed in this study can quantitatively predict antigenic differences with high prediction accuracy based only on HA1 sequences. Application of the study results can increase understanding of H1N1 seasonal influenza virus antigenic evolution and accelerate the selection of vaccine strains.

8.
In this study we present a new method for predicting the occurrences of species using data from deciduous forests in South Sweden. Complete species lists of vascular plants were compiled from 101 stands and from representative sample plots inside the stands. Soil samples from each stand were collected for determination of pH and nitrogen mineralization. Presence-absence data for species were fitted to the values of four environmental variables (soil moisture, soil reaction (pH), soil nitrogen and light) by means of Linear (Multiple) Logistic Regression (LLR) and Gaussian (Multiple) Logistic Regression (GLR). First, these values were estimated by calculating the weighted averages of Ellenberg indicator values. Second, the estimates for reaction and nitrogen were replaced by the actual measurements of pH and mineralized NH4+, keeping the Ellenberg estimates for light and moisture. The models were validated with an independent test data set. In general, the models had high predictive abilities. GLR fitted the species occurrences better to the environmental variables than LLR, but had a lower accuracy of prediction of species occurrence in the stands. The use of soil measurements instead of Ellenberg indicator values did not improve the predictive abilities of the models. The environmental conditions in the stand test set were successfully estimated by using species data from the plots. When using the species lists of the stands instead of plot data, a slightly better predictive ability was obtained. The collection of plot data, however, is easier and less time-consuming. The accuracy of prediction differed considerably between species.

9.
Multiple reaction monitoring (MRM) has recently become the method of choice for targeted quantitative measurement of proteins using mass spectrometry. The method, however, is limited in the number of peptides that can be measured in one run. This number can be markedly increased by scheduling the acquisition if the accurate retention time (RT) of each peptide is known. Here we present iRT, an empirically derived dimensionless peptide-specific value that allows for highly accurate RT prediction. The iRT of a peptide is a fixed number relative to a standard set of reference iRT-peptides that can be transferred across laboratories and chromatographic systems. We show that iRT facilitates the setup of multiplexed experiments with acquisition windows more than four times smaller compared to in silico RT predictions resulting in improved quantification accuracy. iRTs can be determined by any laboratory and shared transparently. The iRT concept has been implemented in Skyline, the most widely used software for MRM experiments.
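
One way to picture how a fixed, dimensionless iRT value could be turned into a retention time on a particular chromatographic system is a straight-line calibration against the reference peptides. The sketch below assumes such a linear relationship, which the abstract does not state explicitly; the function name and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical reference peptides: known iRT values and the retention
# times (minutes) observed for them on the local system.
reference_irt = [0.0, 25.0, 50.0, 100.0]
observed_rt_min = [5.0, 10.0, 15.0, 25.0]

slope, intercept = fit_line(reference_irt, observed_rt_min)

# Predict the local retention time of a target peptide from its iRT
# (the value 70.0 is an assumed example), e.g. to centre an acquisition window.
target_irt = 70.0
predicted_rt_min = slope * target_irt + intercept
print(f"predicted RT: {predicted_rt_min:.1f} min")
```

Under this assumption, only the reference peptides need to be measured on a new system; the iRT values of the target peptides stay unchanged.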

10.
In 3D single particle reconstruction, which involves the translational and rotational matching of a large number of electron microscopy (EM) images, the algorithmic performance is largely dependent on the efficiency and accuracy of the underlying 2D image alignment kernel. We present a novel fast rotational matching kernel for 2D images (FRM2D) that significantly reduces the cost of this alignment. The alignment problem is formulated using one translational and two rotational degrees of freedom. This allows us to take advantage of fast Fourier transforms (FFTs) in rotational space to accelerate the search of the two angular parameters, while the remaining translational parameter is explored, within a limited range, by exhaustive search. Since there are no boundary effects in FFTs of cyclic angular variables, we avoid the expensive zero padding associated with Fourier transforms in linear space. To verify the robustness of our method, efficiency and accuracy tests were carried out over a range of noise levels in realistic simulations of EM images. Performance tests against two standard alignment methods, resampling to polar coordinates and self-correlation, demonstrate that FRM2D compares very favorably to the traditional methods. FRM2D exhibits a comparable or higher robustness against noise and a significant gain in efficiency that depends on the fineness of the angular sampling and linear search range.

11.
Logistic Multiple Regression, Principal Component Regression and Classification and Regression Tree Analysis (CART), commonly used in ecological modelling using GIS, are compared with a relatively new statistical technique, Multivariate Adaptive Regression Splines (MARS), to test their accuracy, reliability, implementation within GIS and ease of use. All were applied to the same two data sets, covering a wide range of conditions common in predictive modelling, namely geographical range, scale, nature of the predictors and sampling method. We ran two series of analyses to verify whether model validation by an independent data set was required or whether cross-validation on a learning data set sufficed. Results show that validation by independent data sets is needed. Model accuracy was evaluated using the area under the Receiver Operating Characteristic curve (AUC). This measure was used because it summarizes performance across all possible thresholds and is independent of the balance between classes. MARS and Regression Tree Analysis achieved the best prediction success, although the CART model was difficult to use for cartographic purposes due to the high model complexity.
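
The property that AUC needs no threshold and does not depend on class balance follows from its rank interpretation: it equals the probability that a randomly chosen presence site receives a higher model score than a randomly chosen absence site. A minimal sketch of that pairwise calculation is given below; the function name and the scores are hypothetical, and this is not the procedure used in the study.

```python
def auc(scores_presence, scores_absence):
    """AUC as the fraction of (presence, absence) pairs ranked correctly.

    Ties between a presence score and an absence score count as half.
    """
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


# Hypothetical model outputs for 3 presence and 4 absence sites.
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]
print(f"AUC = {auc(presence, absence):.3f}")
```

Because the result is normalised by the number of pairs, adding more absence sites with the same score distribution leaves the expected value unchanged.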

12.
The accurate representation of species distribution derived from sampled data is essential for management purposes and to underpin population modelling. Additionally, predicting species distribution for an expanded area beyond the sampling area can reduce sampling costs. Here, several well-established and recently developed habitat modelling techniques are investigated in order to identify the most suitable approach to use with presence–absence acoustic data. The fitting efficiency of the modelling techniques is initially tested on the training dataset, while their predictive capacity is evaluated using a verification set. For the comparison among models, Receiver Operating Characteristics (ROC), Kappa statistics, correlation and confusion matrices are used. Boosted Regression Trees (BRT) and Associative Neural Networks (ASNN), which are both within the machine learning category, outperformed the other modelling approaches tested.

13.
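Spatial heterogeneity of forest soil organic carbon and nutrients in the low mountain area of the Changbai Mountains   Total citations: 2, self-citations: 2, citations by others: 0
Forest soils in the Jincang Forest Farm of the Wangqing Forestry Bureau (Yanbian, Jilin Province) were studied using multiple linear regression and the geostatistical regression kriging method to characterise the vertical distribution of soil organic carbon and nutrients, predict their spatial distribution, and interpolate the predictions. The results showed that soil organic carbon density at 0-60 cm depth was (16.14±4.58) kg·m⁻². With increasing soil depth, soil organic carbon content, organic carbon density, and the contents of total N, total P, total K, available P and readily available K all tended to decrease; soil organic carbon content and organic carbon density differed significantly among soil layers (P<0.01). In the fitted equations for soil organic carbon content and carbon density in the 0-60 cm layer, elevation and the cosine of aspect were the best topographic predictors, with coefficients of determination of 0.34 and 0.39, respectively (P<0.01). The semivariogram models for the 0-20 and 0-60 cm layers were the Gaussian and exponential models, respectively, and a spatial distribution map of soil organic carbon was obtained by regression kriging interpolation. Compared with ordinary kriging, regression kriging improved spatial prediction accuracy by 18% to 58%. Regression kriging interpolation was also used to predict the spatial distribution of soil total N.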

14.
Since LC-MS-based quantitative proteomics has become increasingly applied to a wide range of biological applications over the past decade, numerous studies have performed relative and/or absolute abundance determinations across large sets of proteins. In this study, we discovered prognostic biomarker candidates from limited breast cancer tissue samples using a discovery-through-verification strategy combining the iTRAQ method with selected reaction monitoring/multiple reaction monitoring (SRM/MRM) analysis. We identified and quantified 5122 proteins with high confidence in 18 patient tissue samples (pooled high-risk (n = 9) or low-risk (n = 9)). Of these, 2480 proteins (48.4%) were annotated as membrane proteins, 16.1% as plasma membrane proteins and 6.6% as extracellular space proteins by Gene Ontology analysis. Forty-nine proteins with >2-fold differences between the two groups were chosen for further analysis and verified in 16 individual tissue samples (high-risk (n = 9) or low-risk (n = 7)) using SRM/MRM. Twenty-three proteins were differentially expressed between the two groups, of which MFAP4 and GP2 were further confirmed by Western blotting in 17 tissue samples (high-risk (n = 9) or low-risk (n = 8)) and immunohistochemistry (IHC) in 24 tissue samples (high-risk (n = 12) or low-risk (n = 12)). These results indicate that the combination of iTRAQ and SRM/MRM proteomics will be a powerful tool for identification and verification of candidate protein biomarkers.

15.
Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorders using single trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample size, risk predictors are not useful in a clinical setting but already are a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk.

16.
Fluorescence reconstruction microscopy (FRM) describes a class of techniques where transmitted light images are passed into a convolutional neural network that then outputs predicted epifluorescence images. This approach enables many benefits including reduced phototoxicity, freeing up of fluorescence channels, simplified sample preparation, and the ability to re-process legacy data for new insights. However, FRM can be complex to implement, and current FRM benchmarks are abstractions that are difficult to relate to how valuable or trustworthy a reconstruction is. Here, we relate the conventional benchmarks and demonstrations to practical and familiar cell biology analyses to demonstrate that FRM should be judged in context. We further demonstrate that it performs remarkably well even with lower-magnification microscopy data, as are often collected in screening and high content imaging. Specifically, we present promising results for nuclei, cell-cell junctions, and fine feature reconstruction; provide data-driven experimental design guidelines; and provide researcher-friendly code, complete sample data, and a researcher manual to enable more widespread adoption of FRM.

17.
In maize breeding, genomic prediction may be an efficient tool for selecting single-crosses evaluated under abiotic stress conditions. In addition, a promising strategy is applying multiple-trait genomic prediction using selection indices (SIs), increasing genetic gains and reducing time per cycle. In this study, we aimed (i) to compare the accuracy of single- and multi-trait genomic prediction (STGP; MTGP) in two maize datasets, (ii) to evaluate the prediction of four selection indices that could contribute to the selection of tropical maize hybrids under contrasting nitrogen conditions, and (iii) to compare the use of linear (GBLUP) and nonlinear (RKHS/GK) kernels in STGP and MTGP analyses. For the single-trait GBLUP and RKHS analyses, the highest accuracies obtained were 0.40 and 0.41, respectively, using harmonic mean (HM). In the multi-trait GBLUP and GK analyses, using the combination of selection indices in MTGP seems to be suitable, increasing the accuracy. Adding grain yield and plant height in MTGP showed a slight improvement in accuracy compared to STGP. In general, there was a modest benefit of using single-trait RKHS and multi-trait GK rather than GBLUP.

18.
Genomic selection (GS) has been implemented in animal and plant species, and is regarded as a useful tool for accelerating genetic gains. Varying levels of genomic prediction accuracy have been obtained in plants, depending on the prediction problem assessed and on several other factors, such as trait heritability, the relationship between the individuals to be predicted and those used to train the models for prediction, number of markers, sample size and genotype × environment interaction (GE). The main objective of this article is to describe the results of genomic prediction in International Maize and Wheat Improvement Center's (CIMMYT's) maize and wheat breeding programs, from the initial assessment of the predictive ability of different models using pedigree and marker information to the present, when methods for implementing GS in practical global maize and wheat breeding programs are being studied and investigated. Results show that pedigree (population structure) accounts for a sizeable proportion of the prediction accuracy when a global population is the prediction problem to be assessed. However, when the prediction uses unrelated populations to train the prediction equations, prediction accuracy becomes negligible. When genomic prediction includes modeling GE, an increase in prediction accuracy can be achieved by borrowing information from correlated environments. Several questions on how to incorporate GS into CIMMYT's maize and wheat programs remain unanswered and subject to further investigation, for example, prediction within and between related bi-parental crosses. Further research on the quantification of breeding value components for GS in plant breeding populations is required.

19.
20.

Background

Genomic prediction is becoming a daily tool for plant breeders. It makes use of genotypic information to make predictions used for selection decisions. The accuracy of the predictions depends on the number of genotypes used in the calibration; hence, there is a need to combine data across years. A proper phenotypic analysis is a crucial prerequisite for accurate calibration of genomic prediction procedures. We compared stage-wise approaches to analyse a real dataset of a multi-environment trial (MET) in rye, which was connected between years only through one check, and used different spatial models to obtain better estimates, and thus, improved predictive abilities for genomic prediction. The aims of this study were to assess the advantage of using spatial models for the predictive abilities of genomic prediction, to identify suitable procedures to analyse a MET weakly connected across years using different stage-wise approaches, and to explore genomic prediction as a tool for selection of models for phenotypic data analysis.

Results

Using complex spatial models did not significantly improve the predictive ability of genomic prediction, but using row and column effects yielded the highest predictive abilities of all models. In the case of MET poorly connected between years, analysing each year separately and fitting year as a fixed effect in the genomic prediction stage yielded the most realistic predictive abilities. Predictive abilities can also be used to select models for phenotypic data analysis. The trend of the predictive abilities was not the same as the traditionally used Akaike information criterion, but favoured in the end the same models.

Conclusions

Making predictions using weakly linked datasets is of utmost interest for plant breeders. We provide an example with suggestions on how to handle such cases. Rather than relying on checks we show how to use year means across all entries for integrating data across years. It is further shown that fitting of row and column effects captures most of the heterogeneity in the field trials analysed.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-646) contains supplementary material, which is available to authorized users.
