Similar articles
1.
To establish a novel hybrid neural discriminant model, linear discriminant analysis (LDA) was used in the first stage to evaluate the contribution of sequence parameters in determining the protein structural class. An in-house program generated parameters, including single amino acid and all dipeptide composition frequencies, for the 498 proteins taken from Zhou [An intriguing controversy over protein structural class prediction, J. Protein Chem. 17(8) (1998) 729-738]. Then, 127 statistically effective parameters were selected by stepwise LDA and used as inputs to artificial neural networks (ANNs) to build a two-stage hybrid predictor. Self-consistency and jackknife tests were used to verify the performance of this hybrid model, and the results were compared with some prior work. The results showed that the two-stage hybrid neural discriminant approach is very promising and may play a complementary role to the existing powerful approaches.
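
The sequence parameters named in this abstract (20 single amino acid frequencies plus 400 dipeptide frequencies) can be sketched as follows. This is a minimal illustration, not the authors' in-house program; the function name `composition_features` and the handling of non-standard residues are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition_features(seq):
    """Return 20 single-residue + 400 dipeptide frequencies (420 values)."""
    # Assumption: non-standard symbols are simply dropped before counting.
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    single = {a: 0 for a in AMINO_ACIDS}
    pairs = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for a in residues:
        single[a] += 1
    for a, b in zip(residues, residues[1:]):  # adjacent residue pairs
        pairs[a + b] += 1
    n_pairs = max(n - 1, 0)
    aa_freq = [single[a] / n if n else 0.0 for a in AMINO_ACIDS]
    dp_freq = [pairs[k] / n_pairs if n_pairs else 0.0 for k in pairs]
    return aa_freq + dp_freq
```

A feature selector such as the stepwise LDA described above would then pick a subset of these 420 values (127 in the abstract) as classifier inputs.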

2.
A genetic algorithm (GA) for feature selection, in conjunction with a neural network, was applied to predict protein structural classes based on single amino acid and all dipeptide composition frequencies. These sequence parameters were encoded as input features for the GA in the feature selection procedure and classified with a three-layered neural network to predict protein structural classes. The system was established by optimizing the classification performance of the neural network, which was used as the evaluation function. Self-consistency and jackknife tests on a database containing 498 proteins were used to verify the performance of this hybrid method, and the results were compared with some prior work. The hybrid model, which combines genetic and neural technologies, proved to be a promising approach to protein structural class prediction.

3.
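Li Nan  Li Chun 《生物信息学 (Chinese Journal of Bioinformatics)》2012,10(4):238-240
Based on 16 classification models of the amino acids, derived sequences of a protein sequence are given; weighted quasi-entropy and LZ complexity are then combined to construct a 34-dimensional feature vector that represents the protein sequence. Using a Bayes classifier, protein structural class prediction on the 640 dataset, in which sequence homology does not exceed 25%, reaches an accuracy of 71.28%.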
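
As a rough illustration of two ingredients named in this abstract, the sketch below maps a protein sequence to a derived sequence over a reduced alphabet and computes its Lempel-Ziv (1976) complexity. The two-class grouping `HYDROPHOBIC` is a hypothetical stand-in, since the abstract does not list the 16 classification models, and the weighted quasi-entropy part is not shown.

```python
# Hypothetical two-class grouping (hydrophobic vs. other); the 16
# classification models used in the paper are not given in the abstract.
HYDROPHOBIC = set("AVLIMFWC")

def derived_sequence(seq):
    """Map each residue to 'H' (hydrophobic) or 'P' (other)."""
    return "".join("H" if a in HYDROPHOBIC else "P" for a in seq.upper())

def lz_complexity(s):
    """Number of components in the Lempel-Ziv (1976) exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from the
        # text that precedes its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count
```

For the textbook string `0001101001000101` the parsing is 0 · 001 · 10 · 100 · 1000 · 101, so `lz_complexity` returns 6.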

4.
Chen C  Zhou X  Tian Y  Zou X  Cai P 《Analytical biochemistry》2006,357(1):116-121
Because a priori knowledge of a protein structural class can provide useful information about its overall structure, the determination of protein structural class is a meaningful topic in protein science. However, with the rapid increase in newly found protein sequences entering databanks, it is both time-consuming and expensive to do so based solely on experimental techniques. Therefore, it is vitally important to develop a computational method for predicting the protein structural class quickly and accurately. To meet this challenge, this article presents a dual-layer support vector machine (SVM) fusion network that uses a different pseudo-amino acid composition (PseAA). The PseAA here contains much information related to the sequence order of a protein and the distribution of the hydrophobic amino acids along its chain. As a showcase, the rigorous jackknife cross-validation test was performed on the two benchmark data sets constructed by Zhou. A significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.
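
The jackknife test used here and in several other entries is leave-one-out cross-validation: each protein is predicted by a model trained on all the others. A minimal sketch follows, with a 1-nearest-neighbour rule standing in for the classifier; it is not the SVM fusion network of this paper, and both function names are illustrative.

```python
def jackknife_accuracy(samples, labels, fit_predict):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            hits += 1
    return hits / len(samples)

def nearest_neighbour(train_x, train_y, query):
    """Placeholder classifier (assumption): label of the closest training point."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    best = min(range(len(train_x)), key=lambda j: dist2(train_x[j], query))
    return train_y[best]
```

Any classifier can be plugged in through `fit_predict`, as long as it takes the training features, the training labels and one query vector.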

5.
Zhang S  Ding S  Wang T 《Biochimie》2011,93(4):710-714
Information on the structural classes of proteins has been proven to be important in many fields of bioinformatics. Prediction of protein structural class for low-similarity sequences is a challenging problem. In this study, 11 features (including 8 re-used features and 3 newly designed features) are rationally utilized to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 and 25PDB, with sequence similarity lower than 40% and 25%, respectively. Comparison of our results with other methods shows that our proposed method is very promising and may provide a cost-effective alternative for predicting protein structural class, in particular for low-similarity datasets.

6.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines a support vector machine with the PSI-BLAST profile, is proposed to predict protein structural class for low-similarity sequences. The evolutionary information encoded in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence similarity lower than 40% and 25%, respectively. The overall accuracies reach 70.7% and 72.9% for the 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising for predicting protein structural class, particularly for low-similarity datasets, and may at least play an important complementary role to existing methods.
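
One plausible way to turn a variable-length profile into a fixed-length vector is sketched below. The abstract only says that amino acid and dipeptide composition are extracted from the profiles; the column-mean and adjacent-row-product formulas used here are assumptions, as is the input format (an L x 20 list of already normalised scores).

```python
def pssm_features(pssm):
    """Fixed-length (20 + 400) vector from an L x 20 profile, L >= 2.

    Assumed definitions (not confirmed by the abstract):
      composition[j]    = mean over positions i of pssm[i][j]
      dipeptide[j][k]   = mean over i of pssm[i][j] * pssm[i + 1][k]
    """
    length = len(pssm)
    if length < 2:
        raise ValueError("profile must cover at least two residues")
    composition = [sum(row[j] for row in pssm) / length for j in range(20)]
    dipeptide = [
        sum(pssm[i][j] * pssm[i + 1][k] for i in range(length - 1)) / (length - 1)
        for j in range(20)
        for k in range(20)
    ]
    return composition + dipeptide
```

Whatever the exact definition, the point is that every protein yields the same 420 dimensions, which is what a support vector machine needs as input.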

7.
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods based directly on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve the prediction accuracy for such low-similarity proteins, different methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark data-sets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as they capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed prediction method has also been tested on partially disordered proteins, with reasonable prediction accuracy; this is a more difficult problem than structural class prediction for commonly used benchmark data-sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to achieving high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.

8.
Amino acid propensities for secondary structures have been used since the 1970s, when Chou and Fasman evaluated them within datasets of a few tens of proteins and developed a method to predict the secondary structure of proteins that is still in use, even though prediction methods have evolved towards very different approaches and higher reliability. Propensity for secondary structures represents an intrinsic property of amino acids, and it is used for generating new algorithms and prediction methods. Our work therefore aimed to investigate which protein dataset is best for evaluating the amino acid propensities: larger but not homogeneous sets, or smaller but homogeneous sets, i.e., all-alpha, all-beta, and alpha-beta proteins. As a first analysis, we evaluated amino acid propensities for helix, beta-strand, and coil in more than 2000 proteins from the PDBselect dataset. With these propensities, secondary structure predictions performed with a method very similar to that of Chou and Fasman gave better results than the original one, which was based on propensities derived from the few tens of X-ray protein structures available in the 1970s. In a refined analysis, we subdivided the PDBselect dataset into three secondary structural classes, i.e., all-alpha, all-beta, and alpha-beta proteins. For each class, the amino acid propensities for helix, beta-strand, and coil were calculated and used to predict secondary structure elements for proteins belonging to the same class, using resubstitution and jackknife tests. This second round of predictions further improved the results of the first round. Therefore, amino acid propensities for secondary structures become more reliable as the homogeneity of the protein dataset used to evaluate them increases. Our results also indicate that all algorithms using propensities for secondary structure can still be improved to obtain better predictive results.
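
A Chou-Fasman style propensity is usually defined as the frequency of a state among the occurrences of one amino acid divided by the overall frequency of that state, so values above 1 mark a preference. The sketch below computes such a table from aligned sequence and secondary-structure strings; the state letters `H`, `E`, `C` and the function name are assumptions, and it is not the authors' code.

```python
from collections import Counter

def propensities(sequences, structures, states="HEC"):
    """Propensity P(a, s) = f(s | a) / f(s) for residue a and state s.

    `sequences` and `structures` are parallel lists of equal-length strings;
    H = helix, E = beta-strand, C = coil (assumed encoding).
    """
    aa_total, state_total, joint = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        for a, s in zip(seq, ss):
            aa_total[a] += 1
            state_total[s] += 1
            joint[a, s] += 1
    n = sum(aa_total.values())
    table = {}
    for a in aa_total:
        for s in states:
            if state_total[s]:  # skip states that never occur in the data
                table[a, s] = (joint[a, s] / aa_total[a]) / (state_total[s] / n)
    return table
```

Computing one table per structural class, as the refined analysis above does, only requires calling the function on the subset of proteins in that class.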

9.
Structural class characterizes the overall folding type of a protein or its domain. This paper develops an accurate method for in silico prediction of structural classes from low-homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier and a custom-designed, feature-based sequence representation to provide predictions. The main advantages of LLSC-PRED are its comprehensive representation, which includes 58 features describing the composition and physicochemical properties of the sequences, and the transparency of the prediction model. The representation also includes predicted secondary structure content, thus for the first time exploring the synergy between these two related predictions. Based on tests performed with a large set of 1673 twilight zone domains, LLSC-PRED's prediction accuracy, which exceeds 62%, is shown to be better than the accuracy of over a dozen recently published competing in silico methods and similar to the accuracy of other, non-transparent classifiers that use the proposed representation.

10.
Summary: Chou-Fasman parameters, measuring preferences of each amino acid for different conformational regions in proteins, were used to obtain an amino acid difference index of conformational parameter distance (CPD) values. CPD values were found to be significantly lower for amino acid exchanges that represent, in the genetic code, transitions of purines (G↔A) than for exchanges representing either transitions of pyrimidines (C↔U) or transversions of purines and pyrimidines. Inasmuch as the distribution of CPD values in these non-G↔A exchanges resembles that obtained for amino acid pairs with double or triple base differences in their underlying codons, we conclude that the genetic code was not particularly designed to minimize effects of mutation on protein conformation. That natural selection minimizes these changes, however, was shown by tabulating results obtained by the maximum parsimony method for eight protein genealogies with a total occurrence of 4574 base substitutions. At the beginning position of the codons, G↔A transitions were in very great excess over other base substitutions and, conversely, C↔U transitions were deficient. At the middle position of the codons, only fast-evolving proteins showed an excess of G↔A transitions, as though selection mainly preserved conformation in these proteins while weeding out mutations affecting chemical properties of functional sites in slow-evolving proteins. In both fast- and slow-evolving proteins, the net direction of transitions and transversions was found to be from G-beginning codons to non-G-beginning codons, resulting in more commonly occurring amino acids, especially alanine with its generalized conformational properties, being replaced at suitable sites by amino acids with more specialized conformational and chemical properties. Historical circumstances pertaining to the origin of the genetic code and the nature of primordial proteins could account for such directional changes leading to increases in the functional density of proteins. In order to further explore the course of protein evolution, a modified parsimony algorithm was developed for constructing protein genealogies on the basis of minimum CPD length. The algorithm's ability to judge with finer discrimination that in protein evolution certain pathways of amino acid substitution should occur more readily than others was considered a potential advantage over strict maximum parsimony. In developing this CPD algorithm, the path of minimum CPD length through intermediate amino acids allowed by the genetic code for each pair of amino acids was determined. It was found that amino acid exchanges representing two base changes have a considerably lower average CPD value per base substitution than the amino acid exchanges representing single base changes. Amino acid exchanges representing three base changes have yet a further marked reduction in CPD per base change. This shows how the extreme constraining effects of stabilizing selection can be circumvented, for by way of intermediate amino acids almost any amino acid can ultimately be substituted for another without damage to an evolving protein's conformation during the process.

11.
To characterise the flow of a fluid through a uniform porous medium, the medium may be completely described by its permeability, a measure of flow resistance. Fluid flow in the intertrabecular spaces of cancellous bone has been recognised as an important factor in a number of physical phenomena. In order to investigate the interdependence of permeability, porosity and the structural parameters, we adapted a morphological model and systematically varied its structural parameters. By simulating a viscous Stokes flow regime, we were able to estimate the anisotropic permeability tensor, and we performed an extensive, stepwise multivariate regression analysis to establish empirical relationships between the morphological parameters and the permeability for each anatomical direction individually. The regression analysis indicated high coefficients of determination [0.88 < R² < 0.89 (transversal directions) and R² = 0.60 (longitudinal direction) for porosity-based prediction, and R² = 0.98 for all directions with all information presented to the regression model]. We conclude that a pooled set of structural parameters may explain up to 98% of the permeability variability, that the regression model predicts permeability values that match experimental data, and that good prediction performance can be achieved by incorporating only the porosity and either the degree of anisotropy (0.89 < R² < 0.91) or the trabecular spacing predictor (0.96 < R² < 0.97). These conclusions imply that trabecular thickness and shape parameters play only a minor role in determining vertebral trabecular bone permeability. However, a major limitation of the model is that it is an idealisation of the real, regionally varying structure of trabecular bone. Therefore, the goodness-of-fit estimates we present should be considered an upper bound on the prediction performance.

12.
Wolff K  Vendruscolo M  Porto M 《Gene》2008,422(1-2):47-51
We discuss a computational approach for reconstructing the native structures of proteins from the knowledge of a structural profile, namely the first eigenvector of the contact map of the native structure itself. The procedure consists in carrying out Monte Carlo simulations of a tube model of the protein structure with an energy bias towards the target structural profile. We present the reconstruction of two small proteins and address problems arising in the reconstruction of larger proteins. Our results indicate that an accurate physico-chemical energy function should be used in conjunction with the structural profile bias in order to achieve accurate reconstructions.
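
The structural profile mentioned here, the first (principal) eigenvector of the contact map, can be illustrated with a short pure-Python sketch. The 8 Å cutoff, the exclusion of self-contacts and the use of power iteration are assumptions made for the example and are not taken from the paper.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary contact matrix: 1.0 if two residues are closer than `cutoff`.

    Assumptions: `coords` holds one (x, y, z) point per residue, the cutoff is
    8 angstroms, and a residue is not counted as being in contact with itself.
    """
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) < cutoff else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Eigenvector of the largest eigenvalue of a non-negative symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I): the shift leaves the eigenvectors unchanged but
        # stops the iteration from oscillating when -lambda is also an eigenvalue.
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v
```

Residues with many contacts, typically those buried in the core, receive the largest components of this vector.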

13.
Summary: A method for detecting homology between two protein or nucleic acid sequences which require insertions or deletions for optimum alignment has been devised for use with a computer. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random), a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected since their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions. Application of this optimum match method of sequence comparison shows that the sensitivity of the maximum match method of Needleman and Wunsch (1970) decreases quite dramatically with sequence comparisons which require only a few gaps for a reasonable alignment, or when sequences differ greatly in length. The maximum match method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The constrained match method of Sankoff and Cedergren (1973) is seen to be misleading, since large increments in the alignment score from added gaps do not necessarily result in the high total alignment score required to demonstrate sequence homology.
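
The core idea of scoring an alignment against scrambled sequences of the same composition can be sketched as below. This is a deliberately simplified version: it uses a fixed linear gap penalty and one global score per pair, whereas the method summarised above optimises the number of gaps and compares score distributions at each gap count. The scoring values, trial count and function names are assumptions.

```python
import random

def global_score(a, b, match=1, mismatch=0, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def shuffle_p_value(a, b, trials=200, seed=0):
    """Fraction of scrambled pairs scoring at least as high as the real pair."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    real = global_score(a, b)
    exceed = 0
    for _ in range(trials):
        ra, rb = list(a), list(b)
        rng.shuffle(ra)  # scrambling preserves length and composition
        rng.shuffle(rb)
        if global_score("".join(ra), "".join(rb)) >= real:
            exceed += 1
    return exceed / trials
```

A small returned fraction suggests that the real alignment score is unlikely to arise by chance from composition alone.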

14.
The complete primary structure of protein L2, the largest protein component of the E. coli 50 S subunit, has been established. A combination of enzymatic and chemical cleavages was employed to isolate peptides, which were sequenced by the micro-DABITC/PITC double-coupling method [FEBS Lett. (1978) 93, 205–214]. The sequence shows that protein L2 consists of 272 amino acid residues with Mr = 29730. Secondary structure predictions were made based on the primary structure. Further, sequence regions homologous to other ribosomal proteins are presented. These results suggest that protein L2, which binds specifically to the 23 S RNA, shares homologous sequence stretches with the other RNA-binding proteins.

15.
A simple and sensitive method for determining hypusine in proteins was developed. Most of the amino acids in the acid hydrolysate of proteins were separated from hypusine by treatment with an ion-exchange resin. The sample containing partially purified hypusine was then analyzed by high-performance liquid chromatography using post-column derivatization with o-phthalaldehyde. The recovery of hypusine through the overall procedure was more than 95%. Using this method, the distribution and developmental changes of hypusine in proteins were determined. The amino acid was found in proteins of all examined organs of the rat. Its concentration was 5–40 nmol/g protein. The subcellular distribution in rat liver was also determined. About 60% of the total amount of hypusine was present in the proteins of the cytoplasmic and microsomal fractions, and its relative concentration was high in the proteins of microsomes and lysosomes and low in those of mitochondria. In the developing rat, the concentration of hypusine in the brain proteins was relatively high during the first 2 or 3 weeks of postnatal life and then decreased until adulthood. Its concentration in the liver proteins was highest at birth and then decreased continuously to the adult level.

16.
Summary: Partial depletion of the taurine content in the rat retina was accomplished for up to 22 weeks by introducing 1.5% guanidinoethanesulfonate (GES) into the drinking water. Taurine levels decreased by 50% after 1 week of GES treatment and by 80% at 16 weeks. Replacing GES with taurine in the drinking water of GES-treated rats from week 16 to 22 returned their taurine content to the control value. Whereas addition of taurine (1.5%) to the drinking water of control rats from week 16 to 22 elevated the retinal taurine content to 118% of the control value, the administration of untreated water to GES-treated animals for the 16 to 22 week period increased the retinal taurine content to only 76% of the control value. The amplitude of the electroretinogram (ERG) b-wave was decreased by 60% after GES treatment for 16 weeks and remained at this reduced level for up to 22 weeks. Administration of taurine in the drinking water from week 16 to 22 returned the b-wave amplitude to a range not statistically different from the control values, whereas the administration of untreated water produced less improvement. After 6 weeks of GES treatment, when the retinal taurine content was reduced by 70% and the amplitude of the b-wave was reduced by 50% (extrapolated from Figure 1), phosphorylation of a specific protein with an approximate molecular weight of 20K was increased by 94%. The increased phosphorylation of the ~20K protein observed after GES treatment was reversed when the animals were treated with taurine (1.5%) in the drinking water for an additional 6 weeks. There was no change in the phosphorylation of the ~20K protein when animals were treated with taurine for 6 weeks. The data obtained support the theory that taurine may have a regulatory effect on retinal protein phosphorylation.

17.
Protein degradation in isolated rat hepatocytes, as measured by the release of [14C]valine from pre-labelled protein, is partly inhibited by a physiologically balanced mixture of amino acids. The inhibition is largely due to the seven amino acids leucine, phenylalanine, tyrosine, tryptophan, histidine, asparagine and glutamine. When the amino acids are tested individually at different concentrations, asparagine and glutamine are the strongest inhibitors. However, when various combinations are tested, a mixture of the first five amino acids, as well as a combination of leucine and asparagine, inhibits protein degradation particularly strongly. The inhibition brought about by asparagine plus leucine is not additive to the inhibition by propylamine, a lysosomotropic inhibitor, indicating that the amino acids act exclusively upon the lysosomal pathway of protein degradation. Following a lag of about 15 min, the effect of asparagine plus leucine is maximal and equal to the effect of propylamine, suggesting that their inhibition of the lysosomal pathway is complete as well as specific. Degradation of endocytosed 125I-labelled asialofetuin is not affected by asparagine plus leucine, indicating that the amino acids do not affect lysosomes directly, but rather inhibit autophagy at a step prior to the fusion of autophagic vacuoles with lysosomes. The aminotransferase inhibitor aminooxyacetate does not prevent the inhibitory effect of any of the amino acids, i.e. amino acid metabolites are apparently not involved.

18.
One of the major bottlenecks in many ab initio protein structure prediction methods is currently the selection of a small number of candidate structures for high‐resolution refinement from large sets of low‐resolution decoys. This step often includes a scoring by low‐resolution energy functions and a clustering of conformations by their pairwise root mean square deviations (RMSDs). As an efficient selection is crucial to reduce the overall computational cost of the predictions, any improvement in this direction can increase the overall performance of the predictions and the range of protein structures that can be predicted. We show here that the use of structural profiles, which can be predicted with good accuracy from the amino acid sequences of proteins, provides an efficient means to identify good candidate structures. Proteins 2010. © 2009 Wiley‐Liss, Inc.

19.
Although the HIV-1 Env gp120 and gp41 ectodomain have been extensively characterized in terms of structure and function, similar characterizations of the C-terminal tail (CTT) of HIV gp41 remain relatively limited and contradictory. The current study was designed to examine in detail CTT sequence conservation relative to gp120 and the gp41 ectodomain and to examine the conservation of predicted physicochemical and structural properties across a number of divergent HIV clades and groups. Results demonstrate that CTT sequences display intermediate levels of sequence evolution and diversity in comparison to the more diverse gp120 and the more conserved gp41 ectodomain. Despite the relatively high level of CTT sequence variation, the physicochemical properties of the lentivirus lytic peptide domains (LLPs) within the CTT are evidently highly conserved across clades/groups. Additionally, predictions using PEP-FOLD indicate a high level of structural similarity in the LLP regions, which was confirmed by circular dichroism measurements of the secondary structure of LLP peptides from clades B, C, and group O. Results demonstrate that LLP peptides adopt helical structure in the presence of SDS or trifluoroethanol but are predominantly unstructured in aqueous buffer. Thus, these data for the first time demonstrate strong conservation of characteristic CTT physicochemical and structural properties despite substantial sequence diversity, apparently indicating a delicate balance between evolutionary pressures and the conservation of CTT structure and associated functional roles in virus replication.

20.
Han X  Chen Y  Gao W  Xue J  Han X  Fang Z  Yang C  Wu X 《Mathematical biosciences》2007,207(1):78-88
It is widely accepted that action potential duration (APD) restitution plays a key role in the initiation and maintenance of reentrant arrhythmias. Luo-Rudy II models paced with different protocols showed that the current APD has a complex relation with the previous APDs and diastolic intervals (DIs). This relation could not be accurately described by a single exponential function. We used an artificial neural network to formulate this relation. The results suggested that a back-propagation (BP) network could predict the current APD from the information of the three previous beats. This would help provide a target for potential anti-arrhythmic therapies.

