Similar articles (20 found)
1.
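This paper predicts protein structures on a 2D three-direction (triangular) lattice based on the van der Waals potential energy. First, the biological problem of protein structure prediction is recast as a mathematical problem, and a mathematical model based on the van der Waals potential energy function is built. Second, a genetic algorithm is used to solve the model; to make structure prediction more efficient, we introduce an adjustment operator on top of the standard genetic algorithm, giving an improved genetic algorithm. Finally, numerical simulation experiments are carried out. The results show that the van der Waals potential energy function model is feasible, that the improved genetic algorithm raises search efficiency considerably compared with the canonical genetic algorithm, and that genetic algorithms have great potential for protein structure prediction.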
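The abstract does not give the exact energy function or the adjustment operator, so the following is only a minimal sketch of how a fitness evaluation of this kind could look. It assumes a chain encoded as move indices on a triangular lattice and a standard 12-6 Lennard-Jones form for the van der Waals term; the names `MOVES`, `decode`, `lj_energy`, `epsilon` and `sigma` are illustrative and not taken from the paper.

```python
import math

# Six unit moves on a 2D triangular lattice in axial coordinates (q, r).
# The encoding is an assumption made for illustration.
MOVES = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def decode(moves):
    """Turn move indices (0-5) into lattice sites; return None if the chain overlaps itself."""
    pos = [(0, 0)]
    seen = {(0, 0)}
    for m in moves:
        dq, dr = MOVES[m]
        nxt = (pos[-1][0] + dq, pos[-1][1] + dr)
        if nxt in seen:
            return None  # not a self-avoiding walk
        seen.add(nxt)
        pos.append(nxt)
    return pos


def to_xy(site):
    """Convert axial lattice coordinates to Cartesian coordinates with unit bond length."""
    q, r = site
    return (q + 0.5 * r, (math.sqrt(3) / 2) * r)


def lj_energy(moves, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy summed over non-bonded residue pairs (assumed form)."""
    pos = decode(moves)
    if pos is None:
        return math.inf  # invalid conformations get the worst possible fitness
    xy = [to_xy(p) for p in pos]
    energy = 0.0
    for i in range(len(xy)):
        for j in range(i + 2, len(xy)):  # skip directly bonded neighbours
            d = math.dist(xy[i], xy[j])
            sr6 = (sigma / d) ** 6
            energy += 4 * epsilon * (sr6 * sr6 - sr6)
    return energy
```

A genetic algorithm would then minimise `lj_energy` over move sequences; the adjustment operator described in the abstract would be one more variation step alongside crossover and mutation.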

2.
The massively parallel genetic algorithm (GA) for RNA structure prediction uses the concepts of mutation, recombination, and survival of the fittest to evolve a population of thousands of possible RNA structures toward a solution structure. As described below, the properties of the algorithm are ideally suited to use in the prediction of possible folding pathways and functional intermediates of RNA molecules given their sequences. Utilizing Stem Trace, an interactive visualization tool for RNA structure comparison, analysis of not only the solution ensembles developed by the algorithm, but also the stages of development of each of these solutions, can give strong insight into these folding pathways. The GA allows the incorporation of information from biological experiments, making it possible to test the influence of particular interactions between structural elements on the dynamics of the folding pathway. These methods are used to reveal the folding pathways of the potato spindle tuber viroid (PSTVd) and the host killing mechanism of Escherichia coli plasmid R1, both of which are successfully explored through the combination of the GA and Stem Trace. We also present novel intermediate folds of each molecule, which appear to be phylogenetically supported, as determined by use of the methods described below.

3.
Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, which allows mixtures of suboptimal structures to be reconstructed from seemingly contradictory data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding such as chemical nucleotide modifications and protein binding.

4.
We combine a new, extremely fast technique to generate a library of low energy structures of an oligopeptide (by using mutually orthogonal Latin squares to sample its conformational space) with a genetic algorithm to predict protein structures. The protein sequence is divided into oligopeptides, and a structure library is generated for each. These libraries are used in a newly defined mutation operator that, together with variation, crossover, and diversity operators, is used in a modified genetic algorithm to make the prediction. Application to five small proteins has yielded near native structures.

5.
Most cardiovascular diseases are multifactorial by etiology. As an example, the development of myocardial infarction is promoted by numerous risk factors, ranging from rather modifiable lifestyle habits (e.g. smoking, physical activity) to genetic predisposition. With respect to the latter, 15 years of candidate gene analyses have failed to explain the molecular basis for the genetic predisposition to myocardial infarction. By contrast, recent genome-wide association studies have identified chromosomal loci that reproducibly displayed some association with myocardial infarction risk. When molecular genetic studies of coronary artery disease were first begun, it was assumed that genetic factors would soon be routinely incorporated into risk prediction scores. A number of biomarkers have been identified and tested in combination with the classical risk factors for refined risk prediction. However, the strategy for individualized risk prediction by incorporation of new biomarkers in established scores has so far proven to be more difficult than at first hoped.

6.

Background

Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available.

Results

Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10⁻⁴). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure.

Conclusion

We have developed a novel multi-template algorithm to improve protein comparative modeling.
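The selection rule described in the Results can be sketched roughly as follows. This is an illustration under stated assumptions, not the FOLDpro implementation: it assumes each alignment carries a score where higher means more significant, that coverage is tracked per target residue, and that `threshold` and `min_uncovered` are hypothetical tuning parameters.

```python
def select_templates(alignments, threshold, min_uncovered):
    """Pick whole alignments near the top score, plus fragments that cover new target regions.

    alignments: list of (template_id, score, start, end), covering target residues [start, end).
    Returns a list of (template_id, kind, residues) with kind "whole" or "fragment".
    """
    ranked = sorted(alignments, key=lambda a: a[1], reverse=True)
    top_score = ranked[0][1]
    covered = set()
    chosen = []
    for template_id, score, start, end in ranked:
        region = set(range(start, end))
        if top_score - score <= threshold:
            # Close to the best template: keep the entire alignment.
            chosen.append((template_id, "whole", sorted(region)))
            covered |= region
        else:
            # Less similar template: keep only residues not yet covered,
            # and only if that uncovered stretch is large enough to matter.
            new = region - covered
            if len(new) >= min_uncovered:
                chosen.append((template_id, "fragment", sorted(new)))
                covered |= new
    return chosen
```

For example, with a top score of 100 and `threshold=5`, a template scoring 98 is used whole, while one scoring 60 contributes only the residues the better templates leave uncovered.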

7.
We describe an improved algorithm for protein structure prediction, assuming that the location of secondary structural elements is known, with particular focus on prediction for proteins containing β-strands. Hydrogen bonding terms are incorporated into the potential function, supplementing our previously developed residue-residue potential which is based on a combination of database statistics and an excluded volume term. Two small mixed α/β proteins, 1-CTF and BPTI, are studied. In order to obtain native-like structures, it is necessary to allow the β-strands in BPTI to distort substantially from an ideal geometry, and an automated algorithm to carry this out efficiently is presented. Simulated annealing Monte Carlo methods, which contain a genetic algorithm component as well, are used to produce an ensemble of low-energy structures. For both proteins, a cluster of structures with low RMS deviation from the native structure is generated and the energetic ranking of this cluster is in the top 2 or 3 clusters obtained from simulations. These results are encouraging with regard to the possibility of constructing a robust procedure for tertiary folding which is applicable to β-strand containing proteins. Proteins 33:240–252, 1998. © 1998 Wiley-Liss, Inc.

8.
SUMMARY: We present an operon predictor for prokaryotic operons (PPO), which can predict operons in the entire prokaryotic genome. The prediction algorithm used in PPO allows the user to select binary particle swarm optimization (BPSO), a genetic algorithm (GA) or some other methods introduced in the literature to predict operons. The operon predictor on our web server and the provided database are easy to access and use. The main features offered are: (i) selection of the prediction algorithm; (ii) adjustable parameter settings of the prediction algorithm; (iii) graphic visualization of results; (iv) integrated database queries; (v) listing of experimentally verified operons; and (vi) related tools. Availability and implementation: PPO is freely available at http://bio.kuas.edu.tw/PPO/.

9.
Three different strategies to tackle mispredictions from incorrect secondary structure prediction are analysed using 21 small proteins (22-121 amino acids; 1-6 secondary structure elements) with known three-dimensional structures: (1) testing the accuracy of different secondary structure predictions and improving them by combinations, (2) correcting mispredictions by exploiting protein folding simulations with a genetic algorithm and (3) applying and combining experimental data to refine predictions both for secondary structure and tertiary fold. We demonstrate that predictions from secondary structure prediction programs can be efficiently combined to reduce prediction errors from missed secondary structure elements. Further, up to two secondary structure elements (helices, strands) missed by secondary structure prediction were corrected by the genetic algorithm simulation. Finally, we show how input from experimental data is exploited to refine the predictions obtained. Electronic Supplementary Material available.

10.
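Objective: To build an analysis pipeline for predicting distant kinship based on an identity-by-descent (IBD) segment algorithm and to evaluate its prediction accuracy. Methods: A high-density single nucleotide polymorphism (SNP) array was used to genotype 253 pedigree samples; the IBD-segment-based pipeline was used to predict kinship between every pair of individuals, and prediction accuracy was evaluated. SNP sites were randomly removed to assess the effect of the number of sites on prediction accuracy. Results: For 1st- to 7th-degree kinship, the IBD segment algorithm achieved a mean confidence-interval accuracy of 94.72% and a prediction reliability of 99.77%; false negatives appeared when predicting kinship of the 6th degree and beyond. Prediction accuracy declined to some extent as the number of SNPs decreased. Conclusion: The IBD segment algorithm can be used to predict kinship within the 7th degree, and it has important application value in population genetics, forensic genetics and related fields.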
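The abstract does not describe how shared IBD segments are converted into a kinship degree. One common textbook approximation, shown here purely as a sketch and not as the authors' pipeline, is that the expected coefficient of relatedness halves with each degree (about 2^-d for degree d). The function names, the segment representation and the `genome_length_cm` parameter are assumptions for illustration.

```python
import math


def relatedness(segments, genome_length_cm):
    """Estimate the coefficient of relatedness from shared IBD segments.

    segments: list of (length_cm, ibd_state), where ibd_state is 1 (one shared
    haplotype) or 2 (both haplotypes shared).
    """
    ibd1 = sum(length for length, state in segments if state == 1)
    ibd2 = sum(length for length, state in segments if state == 2)
    return (0.5 * ibd1 + ibd2) / genome_length_cm


def estimate_degree(r):
    """Map relatedness r to the nearest degree d, using the expectation r = 2**-d."""
    if r <= 0:
        raise ValueError("no detectable IBD sharing; degree cannot be estimated")
    return max(1, round(-math.log2(r)))
```

Because the expected sharing shrinks geometrically and its variance is large for distant relatives, a point estimate like this becomes unreliable at higher degrees, which is consistent with the false negatives the abstract reports from the 6th degree onward.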

11.
The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

12.
A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequence, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation are new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 residues and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with the PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.

13.
Two different artificial intelligence techniques, namely artificial neural network (ANN) and genetic algorithm (GA), were integrated for optimizing fermentation medium for the production of glucansucrase. The experimental data reported in a previous study were used to build the neural network. The ANN was trained using the back propagation algorithm. The ANN predicted values showed good agreement with the experimentally reported ones from a response surface based experiment. The concentrations of three medium components, viz. Tween 80, sucrose and K2HPO4, served as inputs to the neural network model and the enzyme activity as the output of the model. A model was generated with a coefficient of correlation (R²) of 1.0 for the training set and 0.90 for the test data. A genetic algorithm was used to optimize the input space of the neural network model to find the optimum settings for maximum enzyme activity. This artificial neural network supported genetic algorithm predicted a maximum glucansucrase activity of 6.92 U/ml at medium composition of 0.54% (v/v) Tween 80, 5.98% (w/v) sucrose and 1.01% (w/v) K2HPO4. The ANN-GA predicted model gave a 6.0% increase of enzyme activity over the regression based prediction for optimized enzyme activity. The maximum enzyme activity experimentally obtained using the ANN-GA designed medium was 6.75 ± 0.09 U/ml, which was in good agreement with the predicted value.

14.
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
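Since the models above are compared by AUC, a small sketch of how that metric can be computed may help in reading values such as 0.53 versus 0.64. The function below uses the pairwise (Mann-Whitney) definition: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The function name and the 0/1 label encoding are assumptions for illustration; the study's own cross-validation code is not shown in the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases (1) and controls (0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to random ranking, so the baseline of 0.53 is barely better than chance, and 0.64 means a case outranks a control in roughly 64% of case-control pairs.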

15.
A query learning algorithm based on hidden Markov models (HMMs) is developed to design experiments for string analysis and prediction of MHC class I binding peptides. Query learning is introduced with the aim of reducing the number of peptide binding data needed for training the HMMs. Multiple HMMs, which collectively serve as a committee, are trained with binding data and used for prediction in real-number values. The universe of peptides is randomly sampled and subjected to judgement by the HMMs. Peptides whose prediction is least consistent among committee HMMs are tested by experiment. By iterating the feedback cycle of computational analysis and experiment, the most wanted information is effectively extracted. After 7 rounds of active learning with 181 peptides in all, the predictive performance of the algorithm surpassed the so far best performing matrix based prediction. Moreover, by combining both methods, binder peptides (log Kd < -6) could be predicted with 84% accuracy. The parameter distribution of the HMMs, which can be inspected visually after training, further offers a glimpse of the dynamic specificity of the MHC molecules.
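The query-selection step of the committee approach can be sketched as below. This is an illustration only: the committee members are plain callables standing in for trained HMMs, disagreement is measured by the population variance of their real-valued predictions (the abstract does not state which consistency measure was used), and `select_queries` is a hypothetical name.

```python
from statistics import pvariance


def select_queries(candidates, committee, n_queries):
    """Return the candidates on which the committee's predictions disagree most.

    candidates: iterable of peptide strings sampled from the peptide universe.
    committee:  list of callables mapping a peptide to a real-valued prediction.
    """
    scored = []
    for peptide in candidates:
        predictions = [model(peptide) for model in committee]
        scored.append((pvariance(predictions), peptide))
    # Highest disagreement first; these peptides go to the binding experiment.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [peptide for _, peptide in scored[:n_queries]]
```

Each round, the selected peptides would be measured experimentally, added to the training data, and the committee retrained before the next selection.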

16.
The prediction of translation initiation sites (TISs) in eukaryotic mRNAs has been a challenging problem in computational molecular biology. In this paper, we present a new algorithm to recognize TISs with a very high accuracy. Our algorithm includes two novel ideas. First, we introduce a class of new sequence-similarity kernels based on string editing, called edit kernels, for use with support vector machines (SVMs) in a discriminative approach to predict TISs. The edit kernels are simple and have significant biological and probabilistic interpretations. Although the edit kernels are not positive definite, it is easy to make the kernel matrix positive definite by adjusting the parameters. Second, we convert the region of an input mRNA sequence downstream of a putative TIS into an amino acid sequence before applying SVMs to avoid the high redundancy in the genetic code. The algorithm has been implemented and tested on previously published data. Our experimental results on real mRNA data show that both ideas improve the prediction accuracy greatly and that our method performs significantly better than those based on neural networks and SVMs with polynomial kernels or Salzberg kernels.
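The abstract does not give the exact form of the edit kernels. A simple way to build a similarity kernel from string editing, shown here as an assumption rather than the paper's definition, is to exponentiate the negative Levenshtein distance, with a hypothetical parameter `gamma` controlling how fast similarity decays.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def edit_kernel(a, b, gamma=0.5):
    """Similarity in (0, 1]: equals 1 for identical strings and decays with edit distance."""
    return math.exp(-gamma * edit_distance(a, b))
```

A kernel of this shape is not guaranteed to be positive definite for every `gamma`, which fits the abstract's remark that the kernel matrix can be made positive definite by adjusting parameters.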

17.
In this paper, we propose a genetic algorithm based design procedure for a multilayer feedforward neural network. A hierarchical genetic algorithm is used to evolve both the neural network's topology and weighting parameters. Compared with traditional genetic algorithm based designs for neural networks, the hierarchical approach addresses several deficiencies, including a feasibility check highlighted in the literature. A multi-objective cost function is used herein to optimize the performance and topology of the evolved neural network simultaneously. In the prediction of the Mackey-Glass chaotic time series, the networks designed by the proposed approach prove to be competitive with, or even superior to, traditional learning algorithms for multilayer perceptron networks and radial basis function networks. Based upon the chosen cost function, a linear weight combination decision making approach has been applied to derive an approximated Pareto optimal solution set. Therefore, designing a set of neural networks can be considered as solving a two-objective optimization problem.
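The two-objective view in this abstract can be illustrated with a short sketch. It assumes each candidate network is summarised as a tuple of costs to minimise, for example (prediction error, number of hidden units); the function names and the example weights are hypothetical and do not reproduce the paper's hierarchical genetic algorithm.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimised)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] <= p[k] for k in range(len(p)))
            and any(q[k] < p[k] for k in range(len(p)))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


def best_by_weighted_sum(points, weights):
    """Linear weight combination: pick the point with the smallest weighted cost."""
    return min(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))
```

Sweeping the weights and collecting the winners gives an approximation of the Pareto set, although a weighted sum can only reach points on the convex part of the front.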

18.

Background

Dominance effects may play an important role in the genetic variation of complex traits. Full-featured and easy-to-use computing tools for genomic prediction and variance component estimation of additive and dominance effects using genome-wide single nucleotide polymorphism (SNP) markers are necessary to understand the dominance contribution to a complex trait and to utilize dominance for selecting individuals with favorable genetic potential.

Results

The GVCBLUP package is a shared-memory parallel computing tool for genomic prediction and variance component estimation of additive and dominance effects using genome-wide SNP markers. This package currently has three main programs (GREML_CE, GREML_QM, and GCORRMX) and a graphical user interface (GUI) that integrates the three main programs with an existing program for the graphical viewing of SNP additive and dominance effects (GVCeasy). The GREML_CE and GREML_QM programs offer complementary computing advantages with identical results for genomic prediction of breeding values, dominance deviations and genotypic values, and for genomic estimation of additive and dominance variances and heritabilities using a combination of the expectation-maximization (EM) algorithm and the average information restricted maximum likelihood (AI-REML) algorithm. GREML_CE is designed for large numbers of SNP markers and GREML_QM for large numbers of individuals. Test results showed that GREML_CE could analyze 50,000 individuals with 400K SNP markers and GREML_QM could analyze 100,000 individuals with 50K SNP markers. GCORRMX calculates genomic additive and dominance relationship matrices using SNP markers. GVCeasy is the GUI for GVCBLUP integrated with an existing software tool for the graphical viewing of SNP effects and a function for editing the parameter files for the three main programs.

Conclusion

The GVCBLUP package is a powerful and versatile computing tool for assessing the type and magnitude of genetic effects affecting a phenotype by estimating whole-genome additive and dominance heritabilities, for genomic prediction of breeding values, dominance deviations and genotypic values, for calculating genomic relationships, and for research and education in genomic prediction and estimation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-270) contains supplementary material, which is available to authorized users.

19.
顾倜  蔡磊鑫  王帅  吕强 《生物信息学》2017,15(3):142-148
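Pseudoknots are an important structure in RNA, and the difficulty of modeling them makes them harder to predict. The ProbKnot algorithm, which predicts pseudoknot-containing RNA secondary structures from base-pairing probabilities, is highly accurate, but because it uses pairing probability as its only basis for prediction, a large number of negative pairs appear, so the specificity component of its accuracy is low. This work combines the base-pairing probability model of ProbKnot with a multi-objective genetic algorithm to raise the specificity of pseudoknot-containing RNA secondary structure prediction and thereby improve overall accuracy. In the experiments, the probability of each base being single-stranded is first computed as an additional basis for prediction; the genetic algorithm then applies crossover, mutation and iteration to the RNA secondary structures; finally, the Pareto-optimal solutions are obtained, from which the highest maximum expected accuracy is derived. The results show that, on the RNA cases used, the method improves accuracy by about 4% on average over existing methods.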

20.
Huang HL  Chang FL 《Bio Systems》2007,90(2):516-528
An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer the mechanism of automatic internal relevant feature detection. The appropriate setting of their control parameters is often treated as another independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. By considering model uncertainty, a frequency-based technique by voting on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that ESVM can obtain a high average accuracy of 96.88% with a small number of selected genes (10.0 on average) using 10-fold cross-validation across the 11 datasets. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.
