Similar literature
 20 similar articles found (search time: 15 ms)
1.
Peptide length-based prediction of peptide-MHC class II binding   Total citations: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: Algorithms for predicting peptide-MHC class II binding are typically similar, if not identical, to methods for predicting peptide-MHC class I binding despite known differences between the two scenarios. We investigate whether representing one of these differences, the greater range of peptide lengths binding MHC class II, improves the performance of these algorithms. RESULTS: A non-linear relationship between peptide length and peptide-MHC class II binding affinity was identified in the data available for several MHC class II alleles. Peptide length was incorporated into existing prediction algorithms using one of several modifications: using regression to pre-process the data, using peptide length as an additional variable within the algorithm, or representing register shifting in longer peptides. For several datasets and at least two algorithms these modifications consistently improved prediction accuracy. AVAILABILITY: http://malthus.micro.med.umich.edu/Bioinformatics

2.
3.
Schimpl M  Lederer C  Daumer M 《PloS one》2011,6(8):e23080
Walking speed is a fundamental indicator for human well-being. In a clinical setting, walking speed is typically measured by means of walking tests using different protocols. However, walking speed obtained in this way is unlikely to be representative of the conditions in a free-living environment. Recently, mobile accelerometry has opened up the possibility to extract walking speed from long-time observations in free-living individuals, but the validity of these measurements needs to be determined. In this investigation, we have developed algorithms for walking speed prediction based on 3D accelerometry data (actibelt®) and created a framework using a standardized data set with gold standard annotations to facilitate the validation and comparison of these algorithms. For this purpose 17 healthy subjects operated a newly developed mobile gold standard while walking/running on an indoor track. Subsequently, the validity of 12 candidate algorithms for walking speed prediction ranging from well-known simple approaches like combining step length with frequency to more sophisticated algorithms such as linear and non-linear models was assessed using statistical measures. As a result, a novel algorithm employing support vector regression was found to perform best with a concordance correlation coefficient of 0.93 (95%CI 0.92-0.94) and a coverage probability CP1 of 0.46 (95%CI 0.12-0.70) for a deviation of 0.1 m/s (CP2 0.78, CP3 0.94) when compared to the mobile gold standard while walking indoors. A smaller outdoor experiment confirmed those results with even better coverage probability. We conclude that walking speed thus obtained has the potential to help establish walking speed in free-living environments as a patient-oriented outcome measure.
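
The agreement measures named in this abstract can be illustrated with a minimal Python sketch. It assumes the concordance correlation coefficient is Lin's standard definition, and it approximates the coverage probability by the simple empirical share of predictions within a given deviation of the reference; the abstract does not say how its CP values were estimated, so that part is an assumption. The function names and sample speeds are hypothetical.

```python
from statistics import fmean


def concordance_correlation(reference, predicted):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(reference)
    mean_r, mean_p = fmean(reference), fmean(predicted)
    # Population (1/n) moments, as in Lin's definition.
    var_r = sum((r - mean_r) ** 2 for r in reference) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted)) / n
    # The (mean_r - mean_p)**2 term penalises a systematic offset,
    # which plain Pearson correlation would ignore.
    return 2 * cov / (var_r + var_p + (mean_r - mean_p) ** 2)


def empirical_coverage(reference, predicted, max_deviation):
    """Share of predictions within max_deviation of the reference.

    Assumption: a plain empirical proportion, not necessarily the
    estimator used in the study.
    """
    hits = sum(abs(p - r) <= max_deviation for r, p in zip(reference, predicted))
    return hits / len(reference)


if __name__ == "__main__":
    gold = [1.0, 1.5, 2.0, 2.5]        # hypothetical reference speeds in m/s
    model = [1.05, 1.8, 2.0, 2.45]     # hypothetical predicted speeds in m/s
    print(concordance_correlation(gold, model))
    print(empirical_coverage(gold, model, 0.1))
```

Unlike Pearson correlation, the concordance coefficient drops when predictions are consistently shifted from the reference, which is why it suits validation against a gold standard.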

4.
Walking speed is a fundamental indicator for human well-being. In a clinical setting, walking speed is typically measured by means of walking tests using different protocols. However, walking speed obtained in this way is unlikely to be representative of the conditions in a free-living environment. Recently, mobile accelerometry has opened up the possibility to extract walking speed from long-time observations in free-living individuals, but the validity of these measurements needs to be determined. In this investigation, we have developed algorithms for walking speed prediction based on 3D accelerometry data (actibelt®) and created a framework using a standardized data set with gold standard annotations to facilitate the validation and comparison of these algorithms. For this purpose 17 healthy subjects operated a newly developed mobile gold standard while walking/running on an indoor track. Subsequently, the validity of 12 candidate algorithms for walking speed prediction ranging from well-known simple approaches like combining step length with frequency to more sophisticated algorithms such as linear and non-linear models was assessed using statistical measures. As a result, a novel algorithm employing support vector regression was found to perform best with a concordance correlation coefficient of 0.93 (95%CI 0.92–0.94) and a coverage probability CP1 of 0.46 (95%CI 0.12–0.70) for a deviation of 0.1 m/s (CP2 0.78, CP3 0.94) when compared to the mobile gold standard while walking indoors. A smaller outdoor experiment confirmed those results with even better coverage probability. We conclude that walking speed thus obtained has the potential to help establish walking speed in free-living environments as a patient-oriented outcome measure.

5.
Attractor reconstruction for non-linear systems: a methodological note   Total citations: 2 (self-citations: 0; citations by others: 2)
Attractor reconstruction is an important step in the process of making predictions for non-linear time-series and in the computation of certain invariant quantities used to characterize the dynamics of such series. The utility of computed predictions and invariant quantities is dependent on the accuracy of attractor reconstruction, which in turn is determined by the methods used in the reconstruction process. This paper suggests methods by which the delay and embedding dimension may be selected for a typical delay coordinate reconstruction. A comparison is drawn between the use of the autocorrelation function and mutual information in quantifying the delay. In addition, a false nearest neighbor (FNN) approach is used in minimizing the number of delay vectors needed. Results highlight the need for an accurate reconstruction in the computation of the Lyapunov spectrum and in prediction algorithms.
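
A delay coordinate reconstruction of the kind described here can be sketched in a few lines of Python. The sketch picks the delay as the first lag at which the autocorrelation falls below 1/e, a common heuristic that is assumed for illustration and is not confirmed as the paper's criterion. Mutual information and the false nearest neighbor test are not shown, and all function names are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Normalised autocorrelation of a series at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def first_delay_below(series, threshold=1 / math.e):
    """First lag whose autocorrelation drops below the threshold.

    Assumption: the 1/e cut-off is a conventional heuristic chosen here
    for illustration.
    """
    for lag in range(1, len(series) // 2):
        if autocorrelation(series, lag) < threshold:
            return lag
    raise ValueError("autocorrelation never fell below the threshold")


def delay_embed(series, dim, delay):
    """Delay vectors (x[i], x[i + delay], ..., x[i + (dim - 1) * delay])."""
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n_vectors)]


if __name__ == "__main__":
    # Hypothetical test signal: a sine wave with a period of 20 samples.
    signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
    tau = first_delay_below(signal)
    vectors = delay_embed(signal, dim=3, delay=tau)
    print(tau, len(vectors), vectors[0])
```

The embedding dimension is fixed by hand in this sketch; in the approach the abstract describes, it would be chosen with the false nearest neighbor test.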

6.
We combine the results of three prediction algorithms on a test set of 21 amyloidogenic proteins to predict amyloidogenic determinants. Two prediction algorithms are recently developed prediction algorithms of amyloidogenic stretches in protein sequences, whereas the third is a secondary structure prediction algorithm capable of identifying 'conformational switches' (regions that have both the propensity for alpha-helix and beta-sheet). Surprisingly, the results of prediction agree well and also agree with experimentally investigated amyloidogenic regions. Furthermore, they suggest several previously not identified amino acid stretches as potential amyloidogenic determinants. Most predicted (and experimentally observed) amyloidogenic determinants reside on the protein surface of relevant solved crystal structures. It appears that a consensus prediction algorithm is more objective than individual prediction methods alone.

7.
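Research on algorithms for predicting RNA secondary structure has developed over nearly 40 years, and pseudoknots have been studied for nearly 30 years. Over this period, RNA secondary structure prediction algorithms have advanced considerably, but the accuracy of pseudoknot prediction remains low. Heuristic algorithms handle complex pseudoknots comparatively well, which makes them the class of algorithm most likely to be the first to solve the pseudoknot prediction problem. To date, no report has systematically summarized the various heuristic algorithms for pseudoknot prediction together with their strengths and weaknesses. This paper describes in detail five algorithms that have been popular internationally in recent years (the greedy algorithm, the genetic algorithm, the ILM algorithm, the HotKnots algorithm, and the FlexStem algorithm) and analyzes the strengths and weaknesses of each. Finally, it proposes that, in the near future, efforts to improve pseudoknot prediction accuracy with heuristic algorithms should focus on building more complete pseudoknot models, incorporating more influencing factors, and drawing on the strengths of different algorithms. The review is intended as a reference for research on predicting RNA secondary structures that contain pseudoknots.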

8.
K Zhou  C Ai  P Dong  X Fan  L Yang 《Glycoconjugate journal》2012,29(7):551-564
In silico approaches have become an alternative method to study O-glycosylation. In this paper, we developed a linear interpretable model for O-glycosylation prediction based on an unbalanced dataset, analyzing the underlying biological knowledge of glycosylation. A training set of 4446 sites involving 468 positive sites and 3978 negative sites was developed during this research. The sites were encoded using the amino acid index (AAindex), and the forward stepwise procedure was utilized for feature selection. Linear discriminant analysis with equal a priori probabilities (PP-LDA) was employed to develop the interpretable model. Performance of the model was verified using both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, the supervised support vector machine and the unsupervised self-organizing competitive neural network, were used as comparisons. The PP-LDA model exhibited improved classification results with accuracy of 82.1% for cross-validation and 80.3% for external prediction. Further analysis of this linear model indicated that the properties at position R(1) and the properties related to hydrophobicity contributed more to the glycosylation prediction. However, the alpha and turn propensities at the C-terminal, together with physicochemical properties at the N-terminal, are also related to the glycosylation activity. This model is not only capable of predicting the possibility of glycosylation using an unbalanced dataset, but is also helpful for understanding the underlying biological mechanisms of glycosylation. To make our prediction model publicly accessible, a downloadable program is provided in our supplementary materials.

9.
Prediction of proteasome cleavage motifs by neural networks   Total citations: 20 (self-citations: 0; citations by others: 20)
We present a predictive method that can simulate an essential step in the antigen presentation in higher vertebrates, namely the step involving the proteasomal degradation of polypeptides into fragments which have the potential to bind to MHC Class I molecules. Proteasomal cleavage prediction algorithms published so far were trained on data from in vitro digestion experiments with constitutive proteasomes. As a result, they did not take into account the characteristics of the structurally modified proteasomes (often called immunoproteasomes) found in cells stimulated by gamma-interferon under physiological conditions. Our algorithm has been trained not only on in vitro data, but also on MHC Class I ligand data, which reflect a combination of immunoproteasome and constitutive proteasome specificity. This feature, together with the use of neural networks, a non-linear classification technique, makes the prediction of MHC Class I ligand boundaries more accurate: 65% of the cleavage sites and 85% of the non-cleavage sites are correctly determined. Moreover, we show that the neural networks trained on the constitutive proteasome data learn a specificity that differs from that of the networks trained on MHC Class I ligands, i.e. the specificity of the immunoproteasome is different from that of the constitutive proteasome. The tools developed in this study in combination with a predictor of MHC and TAP binding capacity should give a more complete prediction of the generation and presentation of peptides on MHC Class I molecules. Here we demonstrate that such an approach produces an accurate prediction of the CTL epitopes in HIV Nef. The method is available at www.cbs.dtu.dk/services/NetChop/.

10.

Background  

Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving the prediction accuracy by exploiting the synergetic prediction capability of multiple algorithms.

11.

Background  

The accuracy of protein secondary structure prediction has been improving steadily towards the 88% estimated theoretical limit. There are two types of prediction algorithms: single-sequence prediction algorithms assume that information about other (homologous) proteins is not available, while algorithms of the second type assume that information about homologous proteins is available and use it intensively. Single-sequence algorithms could make an important contribution to studies of proteins with no detected homologs; however, the accuracy of protein secondary structure prediction from a single sequence is not as high as when additional evolutionary information is present.

12.
AIMS: To demonstrate the effect that non-linear dose responses have on the appearance of synergy in mixtures of antimicrobials. METHODS AND RESULTS: A mathematical model, which allows the prediction of the efficacy of mixtures of antimicrobials with non-linear dose responses, was produced. The efficacy of antimicrobial mixtures that would be classified as synergistic by time-kill methodology was shown to be a natural consequence of combining antimicrobials with non-linear dose responses. CONCLUSIONS: The effectiveness of admixtures of biocides and other antimicrobials with non-linear dose responses can be predicted. If the dose response (or dilution coefficient) of any biocidal component in a mixture is other than one, then the time-kill methodology used to ascertain the existence of synergy in antimicrobial combinations is flawed. SIGNIFICANCE AND IMPACT OF THE STUDY: The kinetic model developed allows the prediction of the efficacy of antimicrobial combinations. Combinations of known antimicrobials, which reduce the time taken to achieve a specified level of microbial inactivation, can be easily assessed once the kinetic profile of each component has been obtained. Most patented cases of antimicrobial synergy have not taken into account the possible effect of non-linear dose responses of the component materials. That much of the earlier literature can now be predicted suggests that future cases will require more thorough proof of the alleged synergy.

13.

Background

Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods.

Methods

In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values.

Results

When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.

Conclusions

Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0075-3) contains supplementary material, which is available to authorized users.

14.
This study focuses on predicting breathing patterns, which is crucial for dealing with system latency in the treatment of moving lung tumors. Predicting respiratory motion in real time is challenging, due to the inherent chaotic nature of breathing patterns, i.e. sensitive dependence on initial conditions. In this work, non-linear prediction methods are used to predict the short-term evolution of the respiratory system for 62 patients, whose breathing time series were acquired using a respiratory position management (RPM) system. Single-step and N-point multi-step prediction are performed for sampling rates of 5 Hz and 10 Hz. We compare the employed non-linear prediction methods with Adaptive Infinite Impulse Response (IIR) prediction filters with respect to prediction accuracy. A Local Average Model (LAM) and local linear models (LLMs) combined with a set of linear regularization techniques to solve ill-posed regression problems are implemented. For all sampling frequencies, both single-step and N-point multi-step predictions obtained using LAM and LLM with regularization methods are more accurate than those of IIR prediction filters for the selected sample patients. Moreover, since the simple LAM model performs as well as the more complicated LLM models in our patient sample, its use for non-linear prediction is recommended.
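
The idea behind a Local Average Model can be sketched as follows: embed the recent signal as a delay vector, find the most similar delay vectors earlier in the series, and average the values that followed them. This is a minimal Python sketch under assumed settings; the embedding dimension, delay, neighbor count, and Euclidean distance are illustrative choices, not the study's configuration, and the function name is hypothetical.

```python
def lam_predict(series, dim=3, delay=1, k=3, horizon=1):
    """Local-average forecast of the value `horizon` steps after the series ends."""
    span = (dim - 1) * delay
    last_start = len(series) - 1 - span   # start index of the current delay vector
    if last_start < 0:
        raise ValueError("series too short for this dimension and delay")

    def vector(start):
        return [series[start + j * delay] for j in range(dim)]

    query = vector(last_start)
    candidates = []
    # Only past vectors whose successor is already observed can be used.
    for start in range(last_start - horizon + 1):
        dist = sum((a - b) ** 2 for a, b in zip(vector(start), query))
        successor = series[start + span + horizon]
        candidates.append((dist, successor))
    if len(candidates) < k:
        raise ValueError("not enough history for k neighbours")
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    # Unweighted mean of the successors of the k nearest neighbours.
    return sum(successor for _, successor in nearest) / k


if __name__ == "__main__":
    # Hypothetical, perfectly periodic "breathing" trace.
    trace = [0, 1, 2, 3] * 5
    print(lam_predict(trace, dim=2, delay=1, k=2))
```

Multi-step prediction can be approximated either by raising `horizon` or by feeding each forecast back into the series; the abstract does not say which variant the study used.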

15.
A content-balancing accuracy index, called Q(9), has been proposed to evaluate algorithms of protein secondary structure prediction. Here the content-balancing means that the evaluation is independent of the contents of helix, strand and coil in the protein being predicted. It is shown that Q(9) is much superior to the widely used index Q(3). Therefore, algorithms are more objectively evaluated by Q(9) than Q(3). Based on 396 non-homologous proteins, five algorithms of secondary structure prediction were evaluated and compared by the new index Q(9). Of the five algorithms, PHD turned out to be the unique algorithm with an average Q(9) better than 60%. Based on the new index, it is shown that the performance of the consensus method based on a jury-decision from several algorithms is even worse than that of the best individual method. Rather than Q(3), we believe that Q(9) should be used to evaluate algorithms of protein secondary structure prediction in future studies in order to improve prediction quality.

16.
Zhang CT  Zhang R 《Proteins》2001,43(4):520-522
Nowadays even a 1% increase of the accuracy for the secondary structure prediction is considered remarkable progress. In this case, we have to consider the reasonableness of the accuracy index Q3, which is used widely. A refined accuracy index, called Q8, is proposed to evaluate algorithms of secondary structure prediction. It is shown that Q8 is superior to the widely used index Q3 in that the former carries more information of the predictive accuracy matrix than does the latter. Therefore, algorithms are evaluated more objectively by Q8 than Q3. Based on 396 nonhomologous proteins, five currently available algorithms of secondary structure prediction were evaluated and compared using the new index Q8. Of the five algorithms, PHD turned out to be the unique algorithm, with Q8 accuracy better than 70%. It is suggested that Q3 should be replaced by Q8 in evaluating secondary structure prediction in future studies.
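
For context, the baseline index Q3 discussed in these two abstracts is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly, which equals the diagonal of the 3×3 accuracy matrix divided by the total residue count. The Python sketch below computes Q3 and that matrix; the formulas for Q8 and Q(9) are not given in the abstracts and are deliberately not reproduced. The one-letter state codes H, E, C and the function names are assumptions for illustration.

```python
STATES = "HEC"  # assumed one-letter codes: helix, strand, coil


def accuracy_matrix(predicted, observed):
    """Counts indexed as matrix[observed_state][predicted_state]."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    matrix = {obs: {pred: 0 for pred in STATES} for obs in STATES}
    for pred, obs in zip(predicted, observed):
        matrix[obs][pred] += 1
    return matrix


def q3(predicted, observed):
    """Percentage of residues whose state is predicted correctly."""
    matrix = accuracy_matrix(predicted, observed)
    correct = sum(matrix[state][state] for state in STATES)
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed = "HCCEC"    # hypothetical five-residue protein, 60% coil
    all_coil = "CCCCC"    # trivial predictor that always answers coil
    print(q3(all_coil, observed))
    print(accuracy_matrix(all_coil, observed))
```

The trivial all-coil predictor scores as many Q3 points as the coil content of the protein, which illustrates the content dependence that the refined indices are meant to remove.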

17.
CircRNAs are novel members of the non-coding RNA family. CircRNAs have been known to exist for several decades; however, only recently has their widespread abundance become appreciated. Annotation of circRNAs depends on sequencing reads that span the backsplice junction and therefore map as non-linear reads in the genome. Several pipelines have been developed to specifically identify these non-linear reads and consequently predict the landscape of circRNAs based on deep sequencing datasets. Here, we use common RNAseq datasets to scrutinize and compare the output from five different algorithms (circRNA_finder, find_circ, CIRCexplorer, CIRI, and MapSplice) and evaluate the levels of bona fide and false positive circRNAs based on RNase R resistance. By this approach, we observe surprisingly dramatic differences between the algorithms, specifically regarding the highly expressed circRNAs and the circRNAs derived from proximal splice sites. Collectively, this study emphasizes that circRNA annotation should be handled with care and that several algorithms should ideally be combined to achieve reliable predictions.

18.
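Multidimensional time series analysis based on SVR and CAR and its application in ecology   Total citations: 1 (self-citations: 0; citations by others: 1)
Based on support vector regression (SVR) combined with a controlled autoregressive (CAR) model, a non-linear multidimensional time series analysis and prediction method (SVR-CAR) was established that reflects both the dynamic characteristics of the sample set and the influence of environmental factors. One-step-ahead prediction on two ecological sample sets showed that SVR-CAR had the highest prediction accuracy among all reference models, with advantages including structural risk minimization, non-linearity, avoidance of overfitting, and strong generalization ability. SVR-CAR has broad application prospects in multidimensional time series prediction in fields such as ecology, agricultural science, and economics.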

19.

Background

Clinical data, such as patient history, laboratory analysis, and ultrasound parameters, which are the basis of day-to-day clinical decision support, are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. To improve clinical management, these data should be fully exploited. This requires efficient algorithms to integrate these data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions are successfully used in many bioinformatics applications for prediction tasks. Drawing on the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier, to integrate two data sources: microarray and clinical parameters.

Results

We compared and evaluated the proposed methods on five breast cancer case studies. Compared to LS-SVM classifier on individual data sets, generalized eigenvalue decomposition (GEVD) and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under ROC Curve (AUC), on all breast cancer case studies.

Conclusions

Thus a clinical classifier weighted with microarray data set results in significantly improved diagnosis, prognosis and prediction responses to therapy. The proposed model has been shown as a promising mathematical framework in both data fusion and non-linear classification problems.

20.
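Preliminary analysis of the secreted proteins encoded in the whole genome of the potato late blight pathogen   Total citations: 1 (self-citations: 0; citations by others: 1)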
Zhou XG  Hou SM  Chen DW  Tao N  Ding YM  Sun ML  Zhang SS 《遗传》2011,33(7):785-793
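Using the whole-genome sequencing results for the potato late blight pathogen, combined with computational and bioinformatic methods, the proteins of the pathogen were analyzed to lay a foundation for clarifying the molecular mechanisms of the interaction between the pathogen and its host. The signal peptide prediction programs SignalP v3.0 and PSORT, the transmembrane helix prediction programs TMHMM-2.0 and THUMBUP, the GPI-anchor site prediction program big-PI Predictor, and the subcellular protein localization prediction program TargetP v1.01 were applied to the 22 658 published protein amino acid sequences of the whole genome. The analysis found that 671 of the proteins encoded by the genome are potential secreted proteins, accounting for 3.0% of all encoded proteins. Of these, 45 secreted proteins have functional descriptions, with functions involving cell metabolism, signal transduction, and other processes; in addition, some secreted proteins resemble elicitors and may be related to the virulence of the pathogen.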
