Similar Articles
20 similar articles found.
1.
Goodarzi H, Nejad HA, Torabi N. Bio Systems 2004, 77(1-3): 163-173
The existence of nonrandom patterns in codon assignments is supported by many statistical and biochemical studies. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations. For example, when an error converts one amino acid into another, the biochemical properties of the resulting amino acid are usually very similar to those of the original. Prior studies include many attempts to estimate quantitatively the fraction of randomly generated codes that, in terms of load minimization, score higher than the canonical genetic code. In this study, we took into account two previously ignored factors: the relative frequencies of amino acids and nonsense mistranslations. Incorporating these parameters yielded a fitness function (phi) under which the canonical genetic code is highly optimized with respect to load minimization. Taking termination codons into account, we also applied a biosynthetic version of the coevolution theory, although with low significance. We employed a revised cost for the precursor-product pairs of amino acids and showed that the significance of this approach depends on the cost measure matrix chosen by the researcher. We therefore compared two prominent matrices in our study: point accepted mutations 74-100 (PAM(74-100)) and the mutation matrix.
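The "fraction of random codes that score better" idea can be sketched as follows. This is a minimal sketch, not the authors' fitness function phi: it uses the classic unweighted cost (mean squared property change over all single-base substitutions between sense codons), skips nonsense errors and amino acid frequencies (the two factors the paper adds), and swaps in Kyte-Doolittle hydropathy as the property instead of PAM(74-100) or the mutation matrix. Random codes are generated by shuffling amino acids among the 20 synonymous codon blocks; the helper names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, AA_STRING))

# Kyte-Doolittle hydropathy, used here only as a stand-in cost property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def code_cost(code, prop):
    """Mean squared property change over all single-base substitutions
    between sense codons (unweighted; nonsense errors are skipped)."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[mutant]
                if aa2 == "*":
                    continue
                total += (prop[aa] - prop[aa2]) ** 2
                n += 1
    return total / n


def random_code(rng):
    """Shuffle amino acids among the 20 synonymous codon blocks,
    keeping stop codons and the block structure fixed."""
    aas = sorted(set(AA_STRING) - {"*"})
    shuffled = aas[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(aas, shuffled))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in STANDARD_CODE.items()}


def fraction_better(prop, n_random=1000, seed=0):
    """Return (standard-code cost, fraction of random codes with lower cost)."""
    rng = random.Random(seed)
    base = code_cost(STANDARD_CODE, prop)
    better = sum(code_cost(random_code(rng), prop) < base for _ in range(n_random))
    return base, better / n_random


cost, frac = fraction_better(KD, n_random=200)
print(f"standard code cost = {cost:.3f}; fraction of random codes better = {frac:.3f}")
```

Weighting each term by amino acid frequency, or adding a penalty for sense-to-stop errors, would move this baseline toward the kind of fitness function the abstract describes.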

2.
Nonrandom patterns in codon assignments have been supported by many statistical and biochemical studies over the last two decades. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations, an ability termed "load minimization". Prior studies have included many attempts to estimate quantitatively the fraction of randomly generated codes that score higher than the canonical genetic code in terms of load minimization. In this study, we devised a neural network that estimates a highly optimized genetic code in a relatively short time. Several fitness functions were used, together with two cost measure matrices: PAM74-100 and the mutation matrix.

3.
Evolution of the amino acid substitution in the mammalian myoglobin gene (cited 1 time: 0 self-citations, 1 by others)
Summary: Multivariate statistical analyses were applied to 16 physical and chemical properties of amino acids. Four of these properties (volume, polarity, isoelectric point (charge), and hydrophobicity) were found to adequately explain 96% of the total variance of amino acid attributes. Using these four quantitative measures of amino acid properties, a structural discriminate function in the form of a weighted difference sum of squares equation was developed. The discriminate function is weighted by the location of each particular residue within a given tertiary structure and yields a numerical discriminate, or difference, value for the replacement of that residue by a different amino acid. This discriminate value expresses the perturbation of the local positional environment of a protein when an amino acid substitution occurs. With this structural discriminate function, a residue-by-residue comparison of the known mammalian myoglobin sequences was carried out to locate possible deviations from the known tertiary structure of sperm whale myoglobin. Only 11 of the 153 residue positions in myoglobin showed possible structural deviations. From this analysis, indices of difference were calculated for all amino acid exchanges between the various myoglobins. All comparisons yielded indices of difference considerably lower than would be expected if mutations had been fixed at random, even when the organization of the genetic code is taken into account. On the basis of these results, we infer that some form of selection has acted during the evolution of mammalian myoglobins to favor amino acid substitutions that are compatible with retaining the original conformation of the protein.
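A weighted difference sum of squares of the kind described above can be sketched like this. It is a minimal sketch under stated assumptions: the four property values below are hypothetical round numbers for illustration only (not the paper's data), each property is z-scored so that units do not dominate, and the per-position weight is a single hypothetical multiplier standing in for the paper's location-dependent weighting.

```python
# Hypothetical property vectors (volume, polarity, charge, hydrophobicity);
# illustrative numbers only, NOT the values used in the paper.
TOY_PROPERTIES = {
    "A": (1.0, 0.0, 0.0, 1.0),
    "L": (2.0, 0.0, 0.0, 2.0),
    "D": (1.5, 2.0, -1.0, -1.0),
    "K": (2.0, 2.0, 1.0, -1.0),
}


def standardize(table):
    """Z-score each property column so that no single unit dominates."""
    keys = list(table)
    n_props = len(next(iter(table.values())))
    out = {k: [0.0] * n_props for k in keys}
    for j in range(n_props):
        col = [table[k][j] for k in keys]
        mean = sum(col) / len(col)
        sd = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5
        for k in keys:
            out[k][j] = (table[k][j] - mean) / sd if sd > 0 else 0.0
    return out


def discriminate_value(a, b, table, position_weight=1.0):
    """Weighted difference sum of squares for replacing residue a by b.
    position_weight is a hypothetical per-site weight (e.g. larger when buried)."""
    return position_weight * sum((x - y) ** 2 for x, y in zip(table[a], table[b]))


Z = standardize(TOY_PROPERTIES)
print(discriminate_value("A", "L", Z), discriminate_value("D", "K", Z, position_weight=2.0))
```

Summing such values over all substituted sites of two aligned sequences gives an index of difference that can be compared against the value expected under random fixation.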

4.
Statistical and biochemical studies have revealed nonrandom patterns in codon assignments. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations, since when an amino acid is converted into another by error, the biochemical properties of the resulting amino acid are usually very similar to those of the original one. In this study, using altered forms of the fitness functions from prior studies, we optimized the parameters involved in calculating the error-minimizing property of the genetic code so that the genetic code outscores random codes by as much as possible. This work also compares two prominent matrices, the Mutation Matrix and Point Accepted Mutations 74-100 (PAM(74-100)). The results show that the hypothetical properties of the coevolution theory of the genetic code are already built into PAM(74-100), which is further evidence that this matrix is biased towards the genetic code. Furthermore, our results indicate that PAM(74-100) is biased towards single-base mistranslations at the second codon position as well as towards amino acid frequencies. Thus, PAM(74-100) is not a suitable substitution matrix for studies of the evolution of the genetic code.

5.
6.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we model residue substitution rates with a continuous-time Markov process and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions, such as binding surfaces and the protein interior core, experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that substitution rates differ greatly between residues in the buried core and residues on solvent-exposed surfaces. In addition, residues on the binding surfaces have substitution rates very different from those of residues on the rest of the protein surface. Based on these findings, we further develop a method for protein function prediction by surface matching, using scoring matrices derived from the substitution rates estimated for binding-surface residues. We show with examples that our method is effective in identifying functionally related proteins that have low overall sequence identity, a task known to be very challenging.
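In a continuous-time Markov model, a rate matrix Q (rows summing to zero) turns into substitution probabilities P(t) = exp(Qt) after branch length t. The sketch below shows only that forward building block, which each step of an MCMC sampler over Q would evaluate; it is not the authors' sampler. The pure-Python matrix exponential (scaling and squaring with a truncated Taylor series) and the helper names are assumptions chosen to keep the example dependency-free.

```python
import math


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(M, terms=20):
    """Matrix exponential by scaling-and-squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[x / 2 ** s for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result


def transition_matrix(Q, t):
    """P(t) = exp(Q t): substitution probabilities after branch length t."""
    return expm([[q * t for q in row] for row in Q])


def pair_log_likelihood(Q, pi, seq_a, seq_b, t, alphabet):
    """log L of an ungapped aligned pair under the CTMC with stationary freqs pi."""
    P = transition_matrix(Q, t)
    idx = {c: i for i, c in enumerate(alphabet)}
    return sum(math.log(pi[idx[a]] * P[idx[a]][idx[b]]) for a, b in zip(seq_a, seq_b))


# Two-state toy model; for real proteins Q would be 20 x 20.
Q = [[-1.0, 1.0], [1.0, -1.0]]
print(transition_matrix(Q, 0.5))
```

For the symmetric two-state toy model the exact answer is P00(t) = 1/2 + e^(-2t)/2, which makes a convenient sanity check before moving to a 20 x 20 matrix.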

7.
We investigate the relationship between flexibility, expressed as the B‐factor, and relative solvent accessibility (RSA) in the context of the local sequence neighborhood, as well as related concepts such as residue depth. We observe that the flexibility of a given residue is strongly influenced by the solvent accessibility of its adjacent neighbors. The mean normalized B‐factor of exposed residues with two buried neighbors is smaller than that of buried residues with two exposed neighbors. Including the RSA of neighboring residues (local RSA) significantly increases the correlation with the B‐factor. The correlation between local RSA and B‐factor is shown to be stronger than correlations based on local distance‐ or volume‐based residue depth. We also found that the correlation coefficients between B‐factor and RSA for the 20 amino acids, called the flexibility‐exposure correlation index, are strongly correlated with a stability scale that characterizes the average contribution of each amino acid to folding stability. Our results reveal that predicted RSA could be used to distinguish between disordered and ordered residues, and that including local predicted RSA values helps provide a better contrast between these two types of residues. Prediction models based on local actual RSA and local predicted RSA give similar or better results for B‐factor and disorder prediction than several existing approaches. We validate our models with three case studies, which show that this work provides useful clues for deciphering the structure–flexibility–function relation. Proteins 2009. © 2009 Wiley‐Liss, Inc.
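The comparison between plain RSA and "local RSA" can be sketched as follows, assuming that local RSA means a simple average over residue i and its immediate sequence neighbors (half_window = 1); the paper may weight neighbors differently. B‐factors are z-scored within the chain before correlating. All function names are hypothetical.

```python
import math


def zscore(values):
    """Normalize B-factors within a chain (mean 0, SD 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def local_rsa(rsa, half_window=1):
    """Average RSA over residue i and its sequence neighbours i-w..i+w."""
    n = len(rsa)
    out = []
    for i in range(n):
        window = rsa[max(0, i - half_window): min(n, i + half_window + 1)]
        out.append(sum(window) / len(window))
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-residue RSA (fraction) and B-factors for one chain.
rsa = [0.05, 0.60, 0.10, 0.00, 0.45, 0.80, 0.30]
bfactor = [12.0, 30.0, 18.0, 10.0, 25.0, 40.0, 28.0]
b_norm = zscore(bfactor)
print(pearson(rsa, b_norm), pearson(local_rsa(rsa), b_norm))
```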

8.
Hierarchical classifications of the 20 amino acids according to residue relationships within scoring matrices have not previously been tested for reliability. Testing the residue groupings obtained in this way from 18 published matrices shows that their reliability varies considerably. This behaviour gives new insight into the matrices with respect to the relationships between the amino acid scores they contain. For example, apart from the trivial grouping of all 20 amino acids, no reliable residue grouping is present in all 18 hierarchical classifications. Hierarchical classification of the 18 scoring matrices themselves is investigated in terms of matrix representation and the choice of similarity and dissimilarity measures for matrix comparison. There is, of course, no absolute standard against which to compare a matrix clustering, but the usefulness of a measure for this purpose can be assessed by the reliability of the calculated tree. Matrix representation is shown to be important. Finally, a novel two-step approach for hierarchical classification of the 18 amino acid scoring matrices is described.

9.
MOTIVATION: Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold-discriminatory features. RESULTS: We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we developed a Support Vector Machine (SVM) based classifier that uses the secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined, secondary structural state frequencies of amino acids gave an overall fold discrimination accuracy of 65.2%, which is better than the accuracy of any method reported so far in the literature. Combining secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, approximately 8% higher than the best available method. In this study we also tested, for the first time, an all-together multi-class method, the Crammer and Singer method, for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one, and the Crammer and Singer method, yield similar predictions. AVAILABILITY: Dataset and stand-alone program are available upon request.
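The single-residue feature vector described above (secondary structural state frequencies of amino acids) can be built as below. This is a minimal sketch assuming a 3-state H/E/C secondary structure alphabet, giving 20 x 3 = 60 features per protein; the pair-frequency and solvent accessibility features, and the SVM training itself, are not shown and would use a separate SVM library.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"  # helix, strand, coil (assumed 3-state reduction)
FEATURE_INDEX = {pair: i for i, pair in enumerate(product(AMINO_ACIDS, SS_STATES))}


def ss_composition_features(sequence, ss):
    """60 features: frequency of each (amino acid, secondary-structure state) pair."""
    if len(sequence) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    counts = [0.0] * len(FEATURE_INDEX)
    for aa, state in zip(sequence, ss):
        i = FEATURE_INDEX.get((aa, state))
        if i is not None:  # skip non-standard residues or states
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


features = ss_composition_features("AAG", "HHE")
print(len(features), features[FEATURE_INDEX[("A", "H")]])
```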

10.
Using an information-theoretic formalism, we optimize classes of amino acid substitution to be maximally indicative of local protein structure. Our statistically derived classes are loosely identifiable with the heuristic constructions found in previously published work. However, while these other methods provide a more rigid idealization of physicochemically constrained residue substitution, our classes provide substantially more structural information with many fewer parameters. Moreover, these substitution classes are consistent with the paradigmatic view of the sequence-to-structure relationship in globular proteins, which holds that the three-dimensional architecture is predominantly determined by the arrangement of hydrophobic and polar side chains, with weak constraints on the actual amino acid identities. More specific constraints are imposed on the placement of prolines, glycines, and charged residues. These substitution classes have been used in highly accurate predictions of residue solvent accessibility. They could also be used to identify homologous proteins, to construct and refine multiple sequence alignments, and to condense and codify the information in multiple sequence alignments for secondary structure prediction and tertiary fold recognition. © 1996 Wiley-Liss, Inc.

11.
Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights for both protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed on 120 proteins containing 22006 residues. For each residue, the buried or exposed state was assigned using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57%, and 72.07%, respectively. Further, a comparison of RSARF with other methods on a benchmark dataset of 20 proteins shows that our approach is useful for predicting residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
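The threshold-based two-state labels and the per-threshold accuracy reported above can be computed as sketched here. Assumptions: RSA is the observed accessible surface area divided by a reference maximum for the residue type (supplied by the caller, no values are given here), and a residue counts as exposed when its RSA is strictly greater than the threshold, so that at 0% only fully inaccessible residues are buried. The predictions would come from the random forest, which is not shown.

```python
THRESHOLDS = (0.0, 5.0, 10.0, 25.0, 50.0)  # percent RSA, as in the abstract


def relative_accessibility(asa, max_asa):
    """RSA (%) = observed accessible surface area / reference maximum for the residue type."""
    return 100.0 * asa / max_asa


def two_state(rsa_percent, threshold):
    """Assumed convention: 'e' (exposed) if RSA > threshold, else 'b' (buried)."""
    return "e" if rsa_percent > threshold else "b"


def accuracy(observed_rsa, predicted_labels, threshold):
    """Fraction of residues whose predicted label matches the thresholded true label."""
    true = [two_state(r, threshold) for r in observed_rsa]
    return sum(t == p for t, p in zip(true, predicted_labels)) / len(true)


# Hypothetical observed RSA values and one hypothetical set of predictions.
observed = [0.0, 3.0, 12.0, 30.0, 70.0]
predicted = ["b", "b", "e", "e", "e"]
for th in THRESHOLDS:
    print(th, accuracy(observed, predicted, th))
```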

12.
Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: the genetic code, solvent exposure, secondary and tertiary structure, protein function, etc. These factors affect the substitution pattern and, in most cases, a single replacement matrix is not enough to represent the full complexity of the evolutionary process. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP and test their performance on independent alignments from TREEBASE. We compare unsupervised learning approaches, where the site categories are unknown, with supervised ones, where estimation uses the known category of each site based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained with mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, so the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site on the TREEBASE test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307-1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 alignments (out of 57), and significantly worse for only 1 alignment.
Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures affects not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
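Two quantities in this abstract are easy to make concrete: the per-site likelihood of a matrix mixture, L(site) = sum over c of w_c L_c(site), and an AIC gain per site. The sketch below assumes the gain is the AIC difference (single matrix minus mixture, with AIC = 2k - 2 ln L) divided by the number of alignment sites; the abstract does not spell out the normalization. The per-component site likelihoods L_c would come from Felsenstein pruning on a tree, which is not shown, and the function names are hypothetical.

```python
import math


def aic(n_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def aic_gain_per_site(aic_single, aic_mixture, n_sites):
    """Positive values favour the mixture model (assumed normalization)."""
    return (aic_single - aic_mixture) / n_sites


def mixture_site_log_likelihood(weights, component_log_likelihoods):
    """log sum_c w_c L_c(site), computed stably with log-sum-exp."""
    terms = [math.log(w) + ll for w, ll in zip(weights, component_log_likelihoods)]
    m = max(terms)
    return m + math.log(sum(math.exp(x - m) for x in terms))


# Hypothetical numbers: a 3-matrix mixture vs a single matrix on a 40-site alignment.
site_ll = mixture_site_log_likelihood([0.5, 0.3, 0.2], [-3.1, -2.7, -4.0])
print(site_ll, aic_gain_per_site(aic(10, -100.0), aic(12, -88.0), 40))
```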

13.
The genomic era has seen a remarkable increase in the number of genomes being sequenced and annotated. Nonetheless, annotation remains a serious challenge for compositionally biased genomes. For preliminary annotation, popular nucleotide and protein comparison methods such as BLAST are widely employed. These methods use matrices, such as amino acid substitution matrices, to score alignments. Since a nucleotide bias leads to an overall bias in the amino acid composition of proteins, a genome with nucleotide bias may have introduced atypical amino acid substitutions into its proteome. Consequently, standard matrices perform poorly in sequence analysis of these genomes. To address this issue, we examined amino acid substitutions in the AT-rich genome of Plasmodium falciparum, chosen as a reference, and reconstituted a substitution matrix in the context of this genome. Alignments of parasite proteins generated with this matrix were improved across functional regions. We attribute this to the consistency between target and background frequencies, both of which were calculated exclusively for this genome in our study. This study has important implications for the annotation of proteins that are of experimental interest but give poor sequence alignments with standard matrices.
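Reconstituting a matrix from a genome's own aligned pairs can be illustrated with the generic BLOSUM-style log-odds construction, in which the target frequency q_ij of an aligned pair is compared with the frequency e_ij expected from background residue frequencies p_i (p_i^2 for identical pairs, 2 p_i p_j otherwise). This is a sketch of that standard technique, not necessarily the exact procedure used for the P. falciparum matrix; scores are in half-bit units (scale = 2), and pairs never observed receive no score here.

```python
import math
from collections import defaultdict


def log_odds_matrix(pair_counts, scale=2.0):
    """BLOSUM-style scores s_ij = round(scale * log2(q_ij / e_ij)).

    pair_counts maps aligned residue pairs (a, b) to counts; (a, b) and (b, a)
    are pooled. Pairs never observed get no score here."""
    counts = defaultdict(float)
    for (a, b), c in pair_counts.items():
        counts[tuple(sorted((a, b)))] += c
    total = sum(counts.values())
    q = {pair: c / total for pair, c in counts.items()}  # target frequencies

    p = defaultdict(float)  # background residue frequencies
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        s = round(scale * math.log2(f / expected))
        scores[(a, b)] = scores[(b, a)] = s
    return scores


# Toy pair counts over a two-letter alphabet.
toy = {("A", "A"): 8, ("B", "B"): 8, ("A", "B"): 4}
print(log_odds_matrix(toy))
```

Because both q_ij and p_i are estimated from the same genome-specific alignments, the target and background frequencies stay mutually consistent, which is the property the abstract credits for the improved alignments.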

14.
The conformational preferences of azaphenylalanine (azaPhe)-containing peptides were investigated by ab initio methods at the HF/3-21G and HF/6-31G(*) levels, using the model compound Ac-azaPhe-NHMe. At the HF/6-31G(*) level, seven minimum-energy conformations with a trans-oriented acetyl group and four with a cis-oriented acetyl group were found (mirror images not counted). The average backbone dihedral angles of these 11 minimum-energy conformations are φ = ±91° ± 24° and ψ = ±18° ± 10° (or ±169° ± 8°), corresponding to the i+2 position of a β-turn (δR) or to a polyproline II (βP) structure, respectively. The χ1 angle of the aromatic side chain of the azaPhe residue preferentially adopts values between ±60° and ±130°, which reflects steric hindrance between the aromatic side chain and either the N-terminal carbonyl group or the C-terminal amide group, depending on the configuration of the acetyl group. These theoretically predicted conformational preferences of Ac-azaPhe-NHMe were compared with those of For-Phe-NHMe to characterize the structural role of the azaPhe residue. Four tripeptides containing an azaPhe residue, Boc-Xaa-azaPhe-Ala-OMe [Xaa = Gly (1), Ala (2), Phe (3), Asn (4)], were designed and synthesized to verify whether the backbone torsion angles of the azaPhe residue match the theoretical conformations and how the amino acid preceding azaPhe perturbs the β-turn skeleton in solution. The solution conformations of these tripeptides were determined in CDCl3 and DMSO using NMR and molecular modeling techniques. The characteristic NOE patterns, the temperature coefficients of the amide protons, and the small solvent accessibility indicate that azapeptides 1-4 adopt a β-turn structure.
Restrained molecular dynamics simulations of the azapeptides gave average dihedral angles [(φ1, ψ1), (φ2, ψ2)] = [(-68°, 135°), (116°, -1°)] for the Xaa-azaPhe fragment of Boc-Xaa-azaPhe-Ala-OMe. This implies that inserting an azaPhe residue into a tripeptide induces a βII-turn conformation, and that changing the volume of the amino acid preceding azaPhe does not seriously perturb the backbone dihedral angles of the β-turn. We believe this information could be critical for designing useful azaPhe-containing molecules for drug discovery and peptide engineering.

15.
Miyazawa S. PLoS ONE 2011, 6(3): e17244

Background

Empirical substitution matrices represent the average tendencies of substitutions over various protein families at the cost of gene-level resolution. We develop a codon-based model in which codon mutational tendencies, the genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices.

Results

Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than of species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated as a linear function of those estimated from the JTT/WAG/LG/KHG matrices, provides a good fit to other empirical substitution matrices, including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.

Conclusions/Significance

The present codon-based model, with ML estimates of selective constraints and adjustable nucleotide mutation rates, should be useful as a simple substitution model in ML and Bayesian inference of molecular phylogenetic trees, and makes it possible to obtain biologically meaningful information at both the nucleotide and amino acid levels from codon and protein sequences.

16.
Bioinformatic software has used various numerical encoding schemes to describe amino acid sequences. Orthogonal encoding, which uses 20 numbers to describe the amino acid type of one protein residue, is often used with artificial neural network (ANN) models. However, this can increase model complexity, leading to difficulty in implementation and poor performance. Here, we use ANNs to derive encoding schemes for the amino acid types from protein three-dimensional structure alignments. Each of the 20 amino acid types is characterized by a few real numbers. Our schemes are tested on the simulation of amino acid substitution matrices. These simplified schemes outperform orthogonal encoding on small data sets. Using one of these encoding schemes, we generate a colouring scheme in which similar amino acids receive similar colours. We expect it to be useful for visual inspection and manual editing of protein multiple sequence alignments.
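The contrast between orthogonal (one-hot) encoding and a compact learned encoding is mostly a matter of input dimensionality, which the sketch below makes explicit. The 3-number codes in TOY_TABLE are hypothetical placeholders, not the ANN-derived values from the paper, and the window length of 15 in the usage line is an arbitrary example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Orthogonal encoding: 20 numbers per residue, exactly one set to 1."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def compact(seq, table):
    """Low-dimensional encoding: a few real numbers per residue from `table`."""
    return [list(table[aa]) for aa in seq]


def input_size(window, dims_per_residue):
    """Number of network inputs for a sliding window of residues."""
    return window * dims_per_residue


# Hypothetical 3-number codes, for illustration only (not the paper's values).
TOY_TABLE = {"A": (0.1, 0.8, 0.0), "C": (0.3, 0.7, 0.1)}

print(input_size(15, 20), input_size(15, 3))  # one-hot vs 3-number code
print(compact("AC", TOY_TABLE))
```

Fewer inputs mean fewer first-layer weights to fit, which is consistent with the abstract's observation that the compact schemes help most on small data sets.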

17.
Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. The most widely used matrices, such as PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of PROSITE, do not integrate conformational information in their construction. The few structure-based matrices that exist are derived from limited structure-alignment data. Using the PDB_SELECT and DSSP databases, we create a database of sequence-conformation blocks that explicitly represent the sequence-structure relationship. Members of a block are identical in conformation and highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix, CBSM60. The matrix shows improved performance in conformational segment search and homolog detection.

18.
A simple method to predict the solvent accessibility state of a site in a multiple protein sequence alignment is described. The approach is based on amino acid exchange and compositional preference matrices for each of three accessibility states: buried, exposed, and intermediate. Calculations used a modified version of the 3D_ali databank, a collection of multiple sequence alignments anchored through protein tertiary structural superpositions. The technique achieves the same accuracy as much more complex methods and thus offers advantages such as computational affordability, easy updating, and easily understood residue substitution patterns that are useful to biochemists involved in protein engineering, design, and structure prediction. The program is available from the authors, and because of its simplicity the algorithm can readily be implemented on any system. For a given alignment site, a hand calculation can yield a comparative prediction. Proteins 32:190–199, 1998. © 1998 Wiley-Liss, Inc.
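The "hand calculation" for one alignment site can be sketched as summing, for each accessibility state, a compositional preference score for every residue in the column plus an optional exchange score for every residue pair, then picking the highest-scoring state. The scoring form and all numbers below are hypothetical assumptions for illustration; the paper's actual matrices are derived from the 3D_ali databank and are not reproduced here.

```python
STATES = ("buried", "intermediate", "exposed")


def predict_state(column, composition, exchange=None):
    """Score one alignment column for each accessibility state.

    composition[state][aa] and exchange[state][(a, b)] are hypothetical
    log-odds preferences; gaps ('-') are ignored."""
    residues = [aa for aa in column if aa != "-"]
    scores = {}
    for s in STATES:
        score = sum(composition.get(s, {}).get(aa, 0.0) for aa in residues)
        if exchange is not None:
            pair_table = exchange.get(s, {})
            for i in range(len(residues)):
                for j in range(i + 1, len(residues)):
                    a, b = sorted((residues[i], residues[j]))
                    score += pair_table.get((a, b), 0.0)
        scores[s] = score
    best = max(STATES, key=scores.get)
    return best, scores


# Hypothetical preferences: Leu favours burial, Lys favours exposure.
COMPOSITION = {
    "buried": {"L": 1.0, "K": -1.0},
    "intermediate": {},
    "exposed": {"L": -1.0, "K": 1.0},
}
print(predict_state("LLV-", COMPOSITION))
print(predict_state("KKE", COMPOSITION))
```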

19.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.

20.
Human genetic variation reflects a diverse evolutionary history that includes both selectively advantageous and selectively neutral change. In this study, we catalogue structural and functional features of proteins that restrain genetic variation leading to single amino acid substitutions. Our variation dataset is divided into three categories: i) Mendelian disease-related variants, ii) neutral polymorphisms and iii) cancer somatic mutations. We characterize the structural environments of the amino acid variants by the following properties: i) side-chain solvent accessibility, ii) main-chain secondary structure, and iii) hydrogen bonds from a side chain to the main chain or to other side chains. To address functional restraints, we examine whether amino acid substitutions are located at functionally important sites involved in protein-protein interactions, protein-ligand interactions or the catalytic activity of enzymes. We also measure the likelihood of amino acid substitutions and the degree of residue conservation at the sites where variants occur. We show that different types of variants are under different degrees of structural and functional restraint, which affects their occurrence in the human proteome.
