Similar literature
1.
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select significant features related to protein folding rates and to build a global predictive model. Moreover, the local lazy regression (LLR) method was also used to predict protein folding rates. The results indicated that LLR performed much better than the global MLR model. The important properties of amino acid residues affecting protein folding rates were also analyzed. The results of this study will help in understanding the mechanism of protein folding. Our results also demonstrate that amino acid sequence autocorrelation features are effective in representing the relationship between protein sequence and folding rates, and that the local method is a powerful tool for predicting protein folding rates.
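As a rough illustration of what "sequence autocorrelation weighted by residue properties" can look like, the sketch below computes a Moreau-Broto-style autocorrelation for several lags. The abstract does not say which autocorrelation form or which residue properties were used, so the formula, the choice of the Kyte-Doolittle hydropathy scale, and the function name `autocorrelation_features` are all assumptions made for this example.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property.
# The abstract does not name the properties the authors actually used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation_features(sequence, scale, max_lag):
    """Return one feature per lag d = 1..max_lag.

    Feature for lag d is the mean of P(i) * P(i + d) over all residue
    pairs that are d positions apart (an assumed, Moreau-Broto-style form).
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# Hypothetical usage on a made-up peptide.
example = autocorrelation_features("ACDEFGHIKLMNPQ", KYTE_DOOLITTLE, max_lag=5)
```

Repeating this for several property scales would give the kind of large feature pool from which a selection method such as GA-MLR could then pick a subset.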

2.
Predicting protein domain regions benefits the study of protein structure and function. In this study, we propose a new method, composed of a fuzzy mean operator and region division, to predict the positions of domains in a target protein from its sequence. The whole sequence is aligned and scored using the fuzzy mean operator, and the final position of each domain region is determined by region division. A published benchmark is used for comparison with previous studies. In addition, we generated two extra datasets to examine the stability of the method. The prediction accuracy achieved by our method on the independent test dataset was up to 84.13%. We hope this method will be useful for related research. Proteins 2015; 83:1462–1469. © 2015 Wiley Periodicals, Inc.

3.
β-turns are the most common type of non-repetitive structure, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP for prediction of two-class β-turns and of the individual β-turn types, using evolutionary information and predicted protein sequence features. It has been evaluated against the commonly used dataset BT426 and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on two-class prediction of β-turn versus not-β-turn. Furthermore, NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprising all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods, which obtain Matthews correlation coefficients in the range 0.17-0.47. For the type-specific β-turn predictions, only types I and II can be predicted with reasonable Matthews correlation coefficients, with values of 0.36 and 0.31, respectively. CONCLUSION: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
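For readers unfamiliar with the measures quoted in the abstract above, the following sketch shows how MCC, Qtotal (overall accuracy), sensitivity and PPV are conventionally derived from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts in the usage line are illustrative assumptions; they are not taken from the NetTurnP evaluation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard two-class measures from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (e.g. per residue).
    """
    total = tp + fp + tn + fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Matthews correlation coefficient; defined as 0 when a margin is empty.
        "mcc": (tp * tn - fp * fn) / denominator if denominator else 0.0,
        # Overall fraction of correct predictions (Qtotal).
        "q_total": (tp + tn) / total if total else 0.0,
        # Fraction of true turn residues that were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of predicted turn residues that are correct.
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=50, fp=10, tn=30, fn=10)
```

MCC is often preferred to plain accuracy when the classes are unbalanced, as they are here, since turn residues make up only about a quarter of all residues.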

4.
5.
Angiosperm phylogeny based on matK sequence information   Total citations: 2 (self-citations: 0; cited by others: 2)
Plastid matK gene sequences for 374 genera representing all angiosperm orders and 12 genera of gymnosperms were analyzed using parsimony (MP) and Bayesian inference (BI) approaches. Traditionally, slowly evolving genomic regions have been preferred for deep-level phylogenetic inference in angiosperms. The matK gene evolves approximately three times faster than the widely used plastid genes rbcL and atpB. The MP and BI trees are highly congruent. The robustness of the strict consensus tree supersedes all individual gene analyses and is comparable only to multigene-based phylogenies. Of the 385 nodes resolved, 79% are supported by high jackknife values, averaging 88%. Amborella is sister to the remaining angiosperms, followed by a grade of Nymphaeaceae and Austrobaileyales. Bayesian inference resolves Amborella + Nymphaeaceae as sister to the rest, but with weak (0.42) posterior probability. The MP analysis shows a trichotomy sister to the Austrobaileyales representing eumagnoliids, monocots + Chloranthales, and Ceratophyllum + eudicots. The matK gene produces the highest internal support yet for basal eudicots and, within core eudicots, resolves a crown group comprising Berberidopsidaceae/Aextoxicaceae, Santalales, and Caryophyllales + asterids. Moreover, matK sequences provide good resolution within many angiosperm orders. Combined analyses of matK and other rapidly evolving DNA regions with available multigene data sets have strong potential to enhance resolution and internal support in deep-level angiosperm phylogenetics and provide additional insights into angiosperm evolution.

6.

Background  

Protein-protein interactions are crucially important for cellular processes. Knowledge of these interactions improves the understanding of the cell cycle, metabolism, signaling, transport, and secretion. Information about interactions can hint at molecular causes of diseases and can provide clues for new therapeutic approaches. Several (usually expensive and time-consuming) experimental methods can probe protein-protein interactions. Data sets derived from such experiments make the development of prediction methods feasible and make it possible to create tools that predict protein-protein interaction networks.

7.
8.
9.
The accuracy of current signal peptide predictors is outstanding. The most successful predictors are based on neural networks and hidden Markov models, reaching a sensitivity of 99% and an accuracy of 95%. Here, we demonstrate that the popular BLASTP alignment tool can be tuned for signal peptide prediction, reaching the same high level of prediction success. Alignment-based techniques provide additional benefits. In spite of high success rates, signal peptide predictors yield false predictions. Simple sequences like polyvaline, for example, are predicted as signal peptides. The general architecture of learning systems makes it difficult to trace the cause of such problems. Such false predictions can be recognized or avoided altogether by using sequence comparison techniques. Based on these results we have implemented a public web service called Signal-BLAST. Predictions returned by Signal-BLAST are transparent and easy to analyze. AVAILABILITY: Signal-BLAST is available online at http://sigpep.services.came.sbg.ac.at/signalblast.html.

10.
11.
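In the treatment of complex diseases, drug combinations act on multiple targets and at multiple stages in a closely linked manner, and can achieve therapeutic effects beyond what a single drug delivers. The individual drugs in a combination have different functions yet work better when used together, which suggests that the diseases they target may be related. Studying associations between diseases may reveal new targets for treating a given disease and thereby advance new drug development. Using the drug combinations in DCDB (the Drug Combination Database) as the data source, this paper constructs a drug combination network and applies a network clustering algorithm to obtain 33 independent, internally tightly connected drug modules. In 7 of these modules, the combination drugs are used to treat two or more diseases, indicating that these diseases are associated to some degree. These relationships were then examined, and the results show that the drug combination network is an effective means of discovering disease associations.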

12.
陈绍晴 (Chen Shaoqing), 房德琳 (Fang Delin), 陈彬 (Chen Bin). 《生态学报》 (Acta Ecologica Sinica), 2015, 35(7): 2227-2233
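Human development activities cause drastic changes in the natural conditions of ecosystems. Ecological risk assessment can simulate and quantify the potential impacts on ecosystems (including species and communities) under human disturbance. Based on the concept of information flow and on network control analysis, and taking into account both direct and indirect interactions among ecosystem components, we propose an ecological network model capable of simulating risk at the whole-system level: the information network model. On this basis, we establish an ecological risk assessment framework for the whole ecosystem that supports unified simulation of multiple stressors and risk tracking among multiple risk receptors. Taking the Manwan Reservoir on the Lancang River as a case study, we first estimated the initial environmental risks of the heavy metals Hg, Pb and Cd, then used the information network model to trace the risk transfer pathways among components with different ecological functions and to assess the degree of risk to each component and to the system as a whole. The results show that, under cumulative effects, the integrated network risk values differ markedly from the initial environmental risk values for the ecosystem and for some communities. Under environmental stress, although the biological groups at the bottom of the food web may be the first to be exposed, groups higher in the food web are also put at risk through control information, and the potential threat they ultimately face may even exceed that faced by the lower groups. The information network model can identify complex risk flow pathways and risk accumulation among communities, thereby providing a more systematic and integrated theoretical basis for ecosystem risk assessment and management.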

13.
14.
MOTIVATION: Obtaining soluble proteins in sufficient concentrations is a recurring limiting factor in various experimental studies. Solubility is an individual trait of proteins which, under a given set of experimental conditions, is determined by their amino acid sequence. Accurate theoretical prediction of solubility from sequence is instrumental for setting priorities on targets in large-scale proteomics projects. RESULTS: We present a machine-learning approach called PROSO to assess the chance of a protein to be soluble upon heterologous expression in Escherichia coli based on its amino acid composition. The classification algorithm is organized as a two-layered structure in which the output of primary support vector machine (SVM) classifiers serves as input for a secondary Naive Bayes classifier. Experimental progress information from the TargetDB database as well as previously published datasets were used as the source of training data. In comparison with previously published methods our classification algorithm possesses improved discriminatory capacity characterized by the Matthews Correlation Coefficient (MCC) of 0.434 between predicted and known solubility states and the overall prediction accuracy of 72% (75 and 68% for positive and negative class, respectively). We also provide experimental verification of our predictions using solubility measurements for 31 mutational variants of two different proteins.

15.
Genomics, 2020, 112(1): 809-819
Many experimental studies have confirmed that microRNAs (miRNAs) play a significant role in complex human diseases. Exploring miRNA-disease associations can help in understanding disease pathogenesis at the molecular level and in developing diagnostic biomarkers. However, because traditional experiments are costly and time-consuming, many computational models have been proposed to predict miRNA-disease associations. In this study, we present a novel Bayesian model (KBMFMDA) that combines kernel-based nonlinear dimensionality reduction, matrix factorization and binary classification. The main idea of KBMFMDA is to project miRNAs and diseases into a unified subspace and estimate the association network in that subspace. KBMFMDA obtained AUCs of 0.9132, 0.8708, and 0.9008±0.0044 in global leave-one-out, local leave-one-out, and five-fold cross validation, respectively. Moreover, KBMFMDA was applied to three important human cancers in three different kinds of case studies, and most of the top 50 potential disease-related miRNAs were confirmed by experimental reports.

16.

Background  

In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules that protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions.
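The transitive-closure step mentioned above can be sketched as follows: whenever a term is predicted for a protein, every ancestor of that term in the hierarchy is added as well, so that the final set of predictions respects the ontology's annotation rule. The term names, the `parents` mapping and the function names are hypothetical placeholders rather than real GO identifiers or any particular tool's API.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` by walking child -> parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def close_predictions(predicted_terms, parents):
    """Add all ancestors of each predicted term (transitive closure)."""
    closed = set(predicted_terms)
    for term in predicted_terms:
        closed |= ancestors(term, parents)
    return closed


# Toy hierarchy with made-up term names: child -> list of parents.
toy_parents = {
    "term_kinase": ["term_catalytic"],
    "term_catalytic": ["term_root"],
    "term_binding": ["term_root"],
}

# Term-by-term classifiers predicted only the most specific term.
consistent = close_predictions({"term_kinase"}, toy_parents)
```

This is post-processing applied after independent per-term classification; it repairs inconsistencies but does not let the hierarchy inform the classifiers themselves.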

17.
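Marine nitrogen cycle processes and prediction based on genome-scale metabolic network models   Total citations: 1 (self-citations: 0; cited by others: 1)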
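The marine nitrogen cycle plays an indispensable role in the Earth's elemental cycles. It is a biochemical process made up of a series of redox reactions. Nitrogen fixation and nitrogen assimilation supply ecosystems with bioavailable nitrogen (ammonium). Nitrification further oxidizes ammonium to nitrate, and nitrate can in turn be converted to nitrogen gas through denitrification. The cycle as a whole interconverts the different nitrogen-containing inorganic salts in the ocean. Microorganisms are important drivers of the marine nitrogen cycle, and studying the cycle can help explain how marine organisms and the Earth's environment interact and co-evolve, and thus help protect the Earth's ecological environment. With the publication of genome-scale metabolic network models for key nitrogen-cycle microorganisms, researchers can use these models to study the efficiency of different nitrogen cycle processes, the effects of environmental factors on them, and the underlying mechanisms of the nitrogen cycle and its biological networks, enabling deeper study of marine nitrogen transformation mechanisms. This paper reviews the main microorganisms involved in each transformation step of the marine nitrogen cycle and the application of genome-scale metabolic network models to nitrogen cycle analysis.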

18.
Background: Increasing evidence indicates that microRNAs (miRNAs) are functionally related to the development and progression of various human diseases. Inferring disease-related miRNAs can help promote disease biomarker detection for the treatment, diagnosis, and prevention of complex diseases. Methods: To improve the prediction accuracy of miRNA-disease associations and capture more potential disease-related miRNAs, we constructed a precise miRNA global similarity network (MSFSN) by calculating miRNA similarity based on secondary structures, families, and functions. Results: We tested the network with the classical algorithms WBSMDA and RWRMDA using leave-one-out cross-validation, obtaining AUCs of 0.8212 and 0.9657, respectively. The proposed MSFSN was also applied to three cancers: breast neoplasms, hepatocellular carcinoma, and prostate neoplasms. For these diseases, 82%, 76%, and 82% of the top 50 potential miRNAs, respectively, were validated by the miRNA-disease association databases miR2Disease and oncomiRDB. Conclusion: MSFSN provides a novel miRNA similarity network that combines a precise function network with a global structure network of miRNAs to predict associations between miRNAs and diseases in various models.

19.
The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and accurately from sequence information alone is both essential and critical from the viewpoint of gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of databases that contain boundary information available from the internet and a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.

20.
Plots of biomass digestibility are linear with the natural logarithm of enzyme loading; the slope and intercept characterize biomass reactivity. Feed-forward back-propagation neural networks were used to predict biomass digestibility by simulating the 1-, 6-, and 72-h slopes and intercepts of glucan, xylan, and total sugar hydrolyses of 147 poplar wood model samples with a variety of lignin contents, acetyl contents, and crystallinity indices. Regression analysis of the neural network models indicates that they performed satisfactorily. Increasing the dimensionality of the neural network input matrix allowed investigation of the influence that glucan and xylan enzymatic hydrolyses have on each other. Glucan hydrolysis affected the last stage of xylan digestion, whereas xylan hydrolysis had no influence on glucan digestibility. This study has demonstrated that neural networks have good potential for predicting biomass digestibility over a wide range of enzyme loadings, thus providing the potential to design cost-effective pretreatment and saccharification processes.
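The first sentence of the abstract above describes a relationship of the form digestibility = slope × ln(enzyme loading) + intercept. A minimal sketch of how such a slope and intercept could be estimated by ordinary least squares is given below; the function name `fit_log_linear` and the sample numbers are invented for illustration and do not come from the study.

```python
import math


def fit_log_linear(enzyme_loadings, digestibilities):
    """Least-squares fit of digestibility = slope * ln(loading) + intercept."""
    if len(enzyme_loadings) != len(digestibilities) or len(enzyme_loadings) < 2:
        raise ValueError("need at least two paired observations")
    x = [math.log(loading) for loading in enzyme_loadings]
    y = list(digestibilities)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("enzyme loadings must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: enzyme loading (arbitrary units) vs. digestibility (%).
slope, intercept = fit_log_linear([1.0, 5.0, 30.0, 60.0], [12.0, 30.0, 51.0, 58.0])
```

In the setting the abstract describes, such (slope, intercept) pairs at 1, 6 and 72 h would be the quantities a neural network is trained to predict from lignin content, acetyl content and crystallinity.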
