Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
Different nonsynonymous changes may be under different selective pressure during evolution. Of the 190 possible interchanges among the 20 amino acids, only 75 can be attained by a single-base substitution. An evolutionary index (EI) can be empirically computed for each of the 75 elementary changes as the likelihood of substitution relative to that of synonymous changes. We used 280, 1,306, 2,488, and 309 orthologous genes from primates (human versus Old World monkey), rodents (mouse versus rat), yeast (S. cerevisiae versus S. paradoxus), and Drosophila (D. melanogaster versus D. simulans), respectively, to estimate the EIs. In each data set, EI varies more than 10-fold, and the correlation coefficients of EIs from the pairwise comparisons are high (e.g., r = 0.91 between rodent and yeast). The high correlations suggest that amino acid properties are strong determinants of protein evolution, irrespective of the identities of the proteins or the taxa of interest. However, these properties are not well captured in conventional measures of amino acid exchangeability. We therefore propose a universal index of exchange (U): for any large data set, its EI can be expressed as U*R, where R is the average Ka/Ks for that data set. The codon-based, empirically determined EI (i.e., U*R) predicts protein evolution much better than previous methods do.
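
The count of 75 elementary changes quoted above can be checked by enumeration. The following is a minimal sketch, assuming the standard genetic code and ignoring changes to or from stop codons; the function names and the U and R values used in the usage note are illustrative and do not come from the study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_exchanges(table):
    """Return the unordered amino acid pairs linked by one nucleotide change."""
    pairs = set()
    for codon, aa in table.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = table[codon[:pos] + base + codon[pos + 1:]]
                # Skip synonymous changes and changes to or from a stop codon.
                if neighbour == aa or "*" in (aa, neighbour):
                    continue
                pairs.add(frozenset((aa, neighbour)))
    return pairs


def evolutionary_index(u, r):
    """EI = U * R, where R is the average Ka/Ks of the data set."""
    return u * r


elementary = single_base_exchanges(CODON_TABLE)
print(len(elementary), "of", 20 * 19 // 2, "pairs")
```

Under these assumptions the script reports 75 of 190 pairs. A call such as `evolutionary_index(2.0, 0.1)` uses made-up values for U and R purely to show the arithmetic.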

3.
Over the years, there have been claims that evolution proceeds according to systematically different processes over different timescales and that protein evolution behaves in a non-Markovian manner. On the other hand, Markov models are fundamental to many applications in evolutionary studies. Apparent non-Markovian or time-dependent behavior has been attributed to the influence of the genetic code at short timescales and the dominance of physicochemical properties of the amino acids at long timescales. However, any long time period is simply the accumulation of many short time periods, and it remains unclear why evolution should appear to act systematically differently across the range of timescales studied. We show that the observed time-dependent behavior can be explained qualitatively by modeling protein sequence evolution as an aggregated Markov process (AMP): a time-homogeneous Markovian substitution model observed only at the level of the amino acids encoded by the protein-coding DNA sequence. The study of AMPs sheds new light on the relationship between amino acid-level and codon-level models of sequence evolution, and our results suggest that protein evolution should be modeled at the codon level rather than using amino acid substitution models.
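
The idea of an aggregated Markov process can be illustrated with a toy example. The sketch below assumes a hypothetical three-state discrete-time chain in which states 0 and 1 are observed as the same class (loosely, two codons for one amino acid); the matrix `P`, its stationary distribution `PI`, and the grouping `BLOCKS` are invented for illustration and are not the model used in the study.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical hidden chain: states 0 and 1 are observed as class A, state 2 as class B.
P = [[0.9, 0.1, 0.0],
     [0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9]]
PI = [0.4, 0.2, 0.4]      # stationary distribution of P (PI * P = PI)
BLOCKS = [[0, 1], [2]]    # observed classes


def aggregate(p_n, pi, blocks):
    """Class-to-class transition probabilities when the chain starts at stationarity."""
    out = []
    for src in blocks:
        weight = sum(pi[i] for i in src)
        out.append([
            sum(pi[i] * p_n[i][j] for i in src for j in dst) / weight
            for dst in blocks
        ])
    return out


one_step = aggregate(P, PI, BLOCKS)
two_step_observed = aggregate(mat_mul(P, P), PI, BLOCKS)
two_step_if_markov = mat_mul(one_step, one_step)

print("observed A->B over two steps:", round(two_step_observed[0][1], 4))
print("Markov prediction from one step:", round(two_step_if_markov[0][1], 4))
```

The two numbers differ, so the observed process does not satisfy the Chapman-Kolmogorov relation even though the hidden chain is time-homogeneous and Markovian. This is the kind of apparent time dependence the abstract attributes to aggregation.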

4.
Model-based phylogenetic reconstruction methods traditionally assume homogeneity of nucleotide frequencies among sequence sites and lineages. Yet, heterogeneity in base composition is a characteristic shared by most biological sequences. Compositional variation in time, reflected in the compositional biases among contemporary sequences, has already been extensively studied, and its detrimental effects on phylogenetic estimates are known. However, fewer studies have focused on the effects of spatial compositional heterogeneity within genes. We show here that different sites in an alignment do not always share a unique compositional pattern, and we provide examples where nucleotide frequency trends are correlated with the site-specific rate of evolution in RNA genes. Spatial compositional heterogeneity is shown to affect the estimation of evolutionary parameters. With standard phylogenetic methods, estimates of equilibrium frequencies are found to be biased towards the composition observed at fast-evolving sites. Conversely, the ancestral composition estimates of some time-heterogeneous but spatially homogeneous methods are found to be biased towards frequencies observed at invariant and slow-evolving sites. The latter finding challenges the result of a previous study arguing against a hyperthermophilic last universal ancestor from the low apparent G + C content of its rRNA sequences. We propose a new model to account for compositional variation across sites. A Gaussian process prior is used to allow for a smooth change in composition with evolutionary rate. The model has been implemented in the phylogenetic inference software PHASE, and Bayesian methods can be used to obtain the model parameters. The results suggest that this model can accurately capture the observed trends in present-day RNA sequences.

5.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous-time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, residues on the binding surfaces have substitution rates very different from those of residues in the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.

6.
A statistical approach was applied to select those models that best fit each individual mitochondrial (mt) protein at different taxonomic levels of metazoans. The existing mitochondrial replacement matrices, MtREV and MtMam, were found to be the best-fit models for the mt-proteins of vertebrates, with the exception of Nd6, at different taxonomic levels. Remarkably, existing mitochondrial matrices generally failed to best-fit invertebrate mt-proteins. In an attempt to better model the evolution of invertebrate mt-proteins, a new replacement matrix, named MtArt, was constructed based on arthropod mt-proteomes. The new model was found to best fit almost all analyzed invertebrate mt-protein data sets. The observed pattern of model fit across the different data sets indicates that no single replacement matrix is able to describe the general evolutionary properties of mt-proteins but rather that taxonomical biases and/or the existence of different mt-genetic codes have great influence on which model is selected.

7.
The changes in hind leg tissue (muscle and skin) amino acid pool size and arteriovenous balance were measured in rats subjected to 0–90 min of cold exposure (4°C). Tissue free amino acid pools presented a different composition pattern from protein amino acids. Muscle rapidly reacted to cold exposure by releasing small amounts of some amino acids (alanine, aspartate), with only small changes in pool size during the first 30 min. Amino acid oxidation was very limited during the whole period of cold exposure, since at all times tested there was either no ammonia efflux or net absorption of ammonia and glutamine; i.e., the muscle was in positive nitrogen balance throughout the period studied. Thus most of the amino acid nitrogen taken up from the blood and not found in the free amino acid pools must have been incorporated into protein, since it was not oxidized, as shown by the glutamine and ammonia balance. The data on amino acid incorporation into proteins indicate that hind leg protein turnover is rapidly and widely modulated from a low initial setting upon cold exposure to a higher protein synthesis rate immediately afterwards, suggesting that protein turnover may have an important role in short-term events in cold-exposed muscle, in addition to its influence in long-term adaptation.

8.
We derive an expectation-maximization algorithm for maximum-likelihood training of substitution rate matrices from multiple sequence alignments. The algorithm can be used to train hidden substitution models, where the structural context of a residue is treated as a hidden variable that can evolve over time. We used the algorithm to train hidden substitution matrices on protein alignments in the Pfam database. When the accuracy of multiple alignment algorithms is measured against BAliBASE (a database of structural reference alignments), our substitution matrices consistently outperform the PAM series, with the improvement steadily increasing as up to four hidden site classes are added. We discuss several applications of this algorithm in bioinformatics.

9.
The presence in proteins of amino acid residues that change in concert during evolution is associated with keeping the protein spatial structure and functions constant. As is the case with morphological features, correlated substitutions may cause homoplasies, that is, the independent evolution of identical non-homologous adaptations. Our data obtained on model phylogenetic trees and corresponding sets of sequences have shown that the presence of correlated substitutions distorts the results of phylogenetic reconstructions. A method for accounting for co-evolving amino acid residues in phylogenetic analysis is proposed. According to this method, only a single site from each group of correlated amino acid positions should be retained, whereas the other positions should not be used in further phylogenetic analysis. Simulations have shown that replacing, on average, 8% of the variable positions in a pair of model sequences with coordinately evolving amino acid residues is enough to change the tree topology. The removal of such amino acid residues from sequences before phylogenetic analysis restores the correct topology.

10.
We develop an approximate maximum likelihood method to estimate flanking nucleotide context-dependent mutation rates and amino acid exchange-dependent selection in orthologous protein-coding sequences and use it to analyze genome-wide coding sequence alignments from mammals and yeast. Allowing context-dependent mutation provides a better fit to coding sequence data than simpler (context-independent or CpG "hotspot") models and significantly affects selection parameter estimates. Allowing asymmetric (nonreciprocal) selection on amino acid exchanges gives a better fit than simple dN/dS or symmetric selection models. Relative selection strength estimates from our models show good agreement with independent estimates derived from human disease-causing and engineered mutations. Selection strengths depend on local protein structure, showing expected biophysical trends in helical versus nonhelical regions and increased asymmetry on polar-hydrophobic exchanges with increased burial. The more stringent selection that has previously been observed for highly expressed proteins is primarily concentrated in buried regions, supporting the notion that such proteins are under stronger than average selection for stability. Our analyses indicate that a highly parameterized model of mutation and selection is computationally tractable and is a useful tool for exploring a variety of biological questions concerning protein and coding sequence evolution.

11.
12.
Many phylogenetic inference methods are based on Markov models of sequence evolution. These are usually expressed in terms of a matrix (Q) of instantaneous rates of change but some models of amino acid replacement, most notably the PAM model of Dayhoff and colleagues, were originally published only in terms of time-dependent probability matrices (P(t)). Previously published methods for deriving Q have used eigen-decomposition of an approximation to P(t). We show that the commonly used value of t is too large to ensure convergence of the estimates of elements of Q. We describe two simpler alternative methods for deriving Q from information such as that published by Dayhoff and colleagues. Neither of these methods requires approximation or eigen-decomposition. We identify the methods used to derive various different versions of the Dayhoff model in current software, perform a comparison of existing and new implementations, and, to facilitate agreement among scientists using supposedly identical models, recommend that one of the new methods be used as a standard.
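
The relationship between Q and P(t) that underlies this abstract is P(t) = exp(Qt). The sketch below is not either of the methods the authors describe; it only shows, for a hypothetical two-state rate matrix `Q`, how P(t) follows from Q by a truncated power series and how (P(t) - I)/t approaches Q as t becomes small.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=30):
    """Approximate P(t) = exp(Q t) with the first `terms` terms of the power series."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current series term, starts at I
    for k in range(1, terms):
        scaled_q = [[x * t / k for x in row] for row in q]
        term = mat_mul(term, scaled_q)         # now equals (Q t)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Hypothetical instantaneous rate matrix; each row sums to zero.
Q = [[-0.3, 0.3],
     [0.1, -0.1]]

t = 0.01
P_t = transition_matrix(Q, t)
# For small t, (P(t) - I) / t is close to Q.
Q_approx = [[(P_t[i][j] - float(i == j)) / t for j in range(2)] for i in range(2)]
print(Q_approx)
```

The approximation error of `Q_approx` grows with t, which gives some intuition for why the choice of t matters when Q is recovered from a published P(t).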

13.
The use of amino acid sequence analysis in assessing evolution   Cited by: 1 (self-citations: 0, citations by others: 1)
The thirteen-year history of assessing evolution by amino acid sequence analysis has made apparent the limitations imposed upon this system by the finite nature of the characters. This finiteness exists on several levels and ultimately expresses itself as parallelism, back mutation and the retention of primitive characters in the sequences of proteins from present-day species and the putative ancestral protein chains. Sequence analysis shares these problems with other molecular approaches, but because it is concerned both with the nucleotide substitutions in the genome and with the functional roles of proteins, it has unique advantages. For example, the large fluctuation in the rate of fixation of mutations in a protein's evolution can be detected and used to point out the unreliability of any molecular clock for estimating divergence dates. Moreover, when consideration is given to studies which assign functional significance to specific amino acid sites in a protein, changes in function during the descent of a protein can be appreciated and their significance correlated with organismal evolution.

14.
The main goal of the protein evolutionist is the reconstruction of past events leading to the structures of contemporary proteins. The common strategy is to align amino acid sequences and make inferences about matters of common ancestry. The rate of change of amino acid sequence varies greatly from protein to protein, and this naturally affects how far back a given protein's ancestry can be traced. Happily, the rate of change of many proteins is slow enough that very ancient events can be inferred. Many mainstream metabolic enzymes, for example, are 40-50% identical in prokaryotes and eukaryotes, groups that diverged from a common ancestor more than 1.5 billion years ago. Moreover, some eukaryotic proteins like actin and tubulin change so slowly that they are seldom less than 60% identical, no matter from what source they are drawn. As it happens, prokaryotic counterparts for many eukaryotic cytoskeletal proteins are unknown. A recent exception involves the finding that a heat shock protein cognate is a relative of actin. The gene duplication that gave rise to these two proteins must have been an ancient event. The more recent invention of other proteins whose distribution is restricted to one or the other of the major kingdoms may be easier to trace. Among the factors that can confound the reconstruction of events, however, are occasional horizontal gene transfers and exon shuffling. The latter has led to a number of mosaic proteins, many of which contain various combinations of a relatively small set of modules like the epidermal growth factor domain.

15.
A method is presented for the quantitative estimation of the individual amino acid radioactivity in biological samples. The material is deproteinized with cold acetone and, after acetone evaporation, is passed through a column containing 1 g of Amberlite XAD-2, then eluted with 10% ethanol. The samples are derivatized with Sanger's reagent (alkaline 1-fluoro-2,4-dinitrobenzene) and passed again through the Amberlite XAD-2 column; the 10% ethanol eluate is now discarded and the DNP-amino acids are eluted with acetone. Aliquots are used for TLC on silica gel plates; the spots are identified and cut away, and their radioactivity is estimated. The actual recovery of radioactivity in the spots is about 86-92% of the initial radioactivity. No contamination with radioactive glucose, lactate, pyruvate or glycerol has been observed.

16.
Using an information theoretic formalism, we optimize classes of amino acid substitution to be maximally indicative of local protein structure. Our statistically-derived classes are loosely identifiable with the heuristic constructions found in previously published work. However, while these other methods provide a more rigid idealization of physicochemically constrained residue substitution, our classes provide substantially more structural information with many fewer parameters. Moreover, these substitution classes are consistent with the paradigmatic view of the sequence-to-structure relationship in globular proteins which holds that the three-dimensional architecture is predominantly determined by the arrangement of hydrophobic and polar side chains with weak constraints on the actual amino acid identities. More specific constraints are imposed on the placement of prolines, glycines, and the charged residues. These substitution classes have been used in highly accurate predictions of residue solvent accessibility. They could also be used in the identification of homologous proteins, the construction and refinement of multiple sequence alignments, and as a means of condensing and codifying the information in multiple sequence alignments for secondary structure prediction and tertiary fold recognition. © 1996 Wiley-Liss, Inc.

17.
In this study, n-peptide compositions are utilized for protein vectorization over a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced for increasing values of n so that the method fits within the memory resources of conventional workstations. A hash structure is implemented for accelerated search of n-peptides. The method is tested for its ability to classify proteins into families on a subset of the SCOP family database and compared against many of the existing homology detection methods, including the most popular generative methods, SAM-98 and PSI-BLAST, and the recent SVM methods, SVM-Fisher, SVM-BLAST and SVM-Pairwise. The results demonstrate that the new method significantly outperforms SVM-Fisher, SVM-BLAST, SAM-98 and PSI-BLAST, while achieving accuracy comparable to that of SVM-Pairwise. In terms of efficiency, it performs much better than SVM-Pairwise. It is shown that the information of n-peptide compositions with reduced amino acid alphabets provides an accurate and efficient means of protein vectorization for SVM-based sequence classification.
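
The vectorization step can be sketched in a few lines. The abstract does not give the reduced alphabets the authors used, so the six-class grouping below is a hypothetical one chosen only for illustration, and Python's hash-based `Counter` stands in for the hash structure mentioned above.

```python
from collections import Counter

# Hypothetical reduced alphabet: 20 amino acids mapped onto 6 classes.
REDUCED_GROUPS = {
    "AVLIMC": "h",   # aliphatic / hydrophobic
    "FWY": "a",      # aromatic
    "STNQ": "p",     # polar
    "KRH": "+",      # basic
    "DE": "-",       # acidic
    "GP": "s",       # special conformations
}
AA_TO_CLASS = {aa: cls for group, cls in REDUCED_GROUPS.items() for aa in group}


def npeptide_composition(seq, n, mapping=None):
    """Return the relative frequency of every overlapping n-peptide in `seq`.

    If `mapping` is given, the sequence is first rewritten in the reduced alphabet.
    """
    if mapping is not None:
        seq = "".join(mapping[aa] for aa in seq)
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {peptide: count / total for peptide, count in counts.items()}


full = npeptide_composition("MKVLAAGIW", 2)
reduced = npeptide_composition("MKVLAAGIW", 2, AA_TO_CLASS)
print(len(full), "distinct dipeptides;", len(reduced), "after alphabet reduction")
```

With k classes there are at most k**n possible n-peptides, so shrinking the alphabet as n grows keeps the feature space manageable, which appears to be the motivation described in the abstract.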

18.
The proportion of amino acid substitutions driven by adaptive evolution can potentially be estimated from polymorphism and divergence data by an extension of the McDonald-Kreitman test. We have developed a maximum-likelihood method to do this and have applied our method to several data sets from three Drosophila species: D. melanogaster, D. simulans, and D. yakuba. The estimated number of adaptive substitutions per codon is not uniformly distributed among genes, but follows a leptokurtic distribution. However, the proportion of amino acid substitutions fixed by adaptive evolution seems to be remarkably constant across the genome (i.e., the proportion of amino acid substitutions that are adaptive appears to be the same in fast-evolving and slow-evolving genes; fast-evolving genes have higher numbers of both adaptive and neutral substitutions). Our estimates do not seem to be significantly biased by selection on synonymous codon use or by the assumption of independence among sites. Nevertheless, an accurate estimate is hampered by the existence of slightly deleterious mutations and variations in effective population size. The analysis of several Drosophila data sets suggests that approximately 25% ± 20% of amino acid substitutions were driven by positive selection in the divergence between D. simulans and D. yakuba.
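
For orientation, the simplest McDonald-Kreitman-style point estimate of the adaptive proportion is alpha = 1 - (Ds * Pn) / (Dn * Ps). The sketch below implements that textbook estimator, not the maximum-likelihood method developed in the study, and the counts in the example are hypothetical.

```python
def adaptive_proportion(dn, ds, pn, ps):
    """Point estimate of the proportion of adaptive amino acid substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for the estimate to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for a single gene or a pooled set of genes.
alpha = adaptive_proportion(dn=40, ds=60, pn=10, ps=30)
print(alpha)
```

With these made-up counts the estimate is 0.5. Slightly deleterious polymorphisms inflate `pn` and push the estimate downward, which is one of the biases the abstract mentions.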

19.
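Research progress on the amino acid transporter LAT1   Cited by: 2 (self-citations: 0, citations by others: 2)
Transmembrane transport of amino acids in mammals is mediated by a variety of amino acid transporter proteins. Among them, L-type amino acid transporter 1 (LAT1) belongs to system L and mainly transports large branched-chain and aromatic neutral amino acids. Studies have shown that LAT1 is widely present in mammalian liver, bone marrow, brain, placenta, heart, and testis, and that it is highly expressed in malignant tumors, where it plays an important role in their continued proliferation. Research on LAT1 is so far lacking in China. Given the significance of LAT1 research for medicine, nutrition, and other areas of the life sciences, this article reviews the expression and regulation of the amino acid transporter LAT1 and related research progress.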

20.
The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world can be simplified. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs (five residues long) called Protein Blocks (PBs). This alphabet makes it possible to translate the 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., cysteine, histidine, threonine and serine for the alpha-helix N-cap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residue Trp with Phe and Tyr. We highlight interesting alternative associations, which show the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to the environment) while largely preserving the 3D fold.
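
A reduced amino acid alphabet amounts to a many-to-one mapping over residues. The sketch below uses only the two associations named in the abstract (Ile with Val; Trp with Phe and Tyr) and leaves every other residue unchanged; the choice of representative letter and the example sequences are assumptions made for illustration.

```python
# Each group is collapsed onto its first letter; all other residues map to themselves.
GROUPS = ["IV", "FWY"]
TRANSLATION = str.maketrans(
    {aa: group[0] for group in GROUPS for aa in group}
)


def reduce_sequence(seq):
    """Rewrite a one-letter protein sequence in the reduced alphabet."""
    return seq.translate(TRANSLATION)


# Two different hypothetical sequences that become identical after reduction.
print(reduce_sequence("MVWIY"))
print(reduce_sequence("MIFVF"))
```

Both calls print `MIFIF`, showing how sequences that differ only within a group collapse to the same reduced string.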
