Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.
The aim of this work was to study the relationship between structure conservation and sequence divergence in protein evolution. To this end, we developed a model of structurally constrained protein evolution (SCPE) in which trial sequences, generated by random mutations at the gene level, are selected against departure from a reference three-dimensional structure. Since SCPE is completely unbiased at the mutational level, any emergent sequence pattern will be due exclusively to structural constraints. In this first report, it is shown that SCPE correctly predicts the characteristic hexapeptide motif of the left-handed parallel beta helix (LbetaH) domain of UDP-N-acetylglucosamine acyltransferases (LpxA).

3.
The structurally constrained protein evolution (SCPE) model simulates protein divergence considering protein structure explicitly. The model is based on the observation that protein structure is more conserved during evolution than the sequences encoding that structure. In previous work, the SCPE model considered only the tertiary structure. Here we show that the performance of the model is enhanced when the oligomeric structure is taken into account. Our results agree with recent evolutionary studies of oligomeric proteins, which show that conservation of the quaternary structure imposes additional constraints on sequence divergence. The incorporation of protein-protein interactions into protein evolution models may be important in the study of quaternary protein structures and complex protein assemblies.

4.
We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failure to model synonymous rate variation can lead to misidentification of sites as positively selected.

5.
Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model-fitting procedure, applied it to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error.
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.

6.
It is a central assumption of evolution that gene duplications provide the genetic raw material from which to create proteins with new functions. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico approaches to predict details of protein function. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogous proteins. It has been proposed that the positions that show switches in substitution rate over time (i.e., "heterotachous sites") are good indicators of functional divergence. Here, we analyzed the alpha and beta paralogous subunits of hemoglobin in search of such signatures. We found as many heterotachous sites in comparisons between groups of paralogous subunits (alpha/beta) as between orthologous ones (alpha/alpha, beta/beta). Thus, the importance of substitution rate shifts as predictors of specialization between protein subfamilies might be reconsidered. Instead, such shifts may reflect a more general process of protein evolution, consistent with the fact that they can be compatible with function conservation. As an alternative, we focused on those residues showing highly constrained states in two sequence groups, but different in each group, and we named them CBD (for "constant but different"). In contrast to heterotachous positions, CBD sites were markedly overrepresented in paralogous (alpha/beta) comparisons relative to orthologous ones (alpha/alpha, beta/beta), identifying them as likely signatures of functional specialization between the two subunits. When superimposed onto the three-dimensional structure of hemoglobin, CBD positions consistently appeared to cluster preferentially on inter-subunit surfaces, two contact areas crucial to function in vertebrate tetrameric hemoglobin.
The identification and analysis of CBD sites by complementing structural information with evolutionary data may represent a promising direction for future studies dealing with the functional characterization of a growing number of multigene families identified by complete genome analyses.

7.
The nonsynonymous (amino acid-altering) to synonymous (silent) substitution rate ratio (omega = d(N)/d(S)) provides a measure of natural selection at the protein level, with omega = 1, <1, and >1 indicating neutral evolution, purifying selection, and positive selection, respectively. Previous studies that used this measure to detect positive selection have often taken an approach of pairwise comparison, estimating substitution rates by averaging over all sites in the protein. As most amino acids in a functional protein are under structural and functional constraints and adaptive evolution probably affects only a few sites at a few time points, this approach of averaging rates over sites and over time has little power. Previously, we developed codon-based substitution models that allow the omega ratio to vary either among lineages or among sites. In this paper we extend previous models to allow the omega ratio to vary both among sites and among lineages and implement the new models in the likelihood framework. These models may be useful for identifying positive selection along prespecified lineages that affects only a few sites in the protein. We apply those branch-site models as well as previous branch- and site-specific models to three data sets: the lysozyme genes from primates, the tumor suppressor BRCA1 genes from primates, and the phytochrome (PHY) gene family in angiosperms. Positive selection is detected in the lysozyme and BRCA genes by both the new and the old models. However, only the new models detected positive selection acting on lineages after gene duplication in the PHY gene family. Additional tests on several data sets suggest that the new models may be useful in detecting positive selection after gene duplication in gene family evolution.
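As a reading aid for the omega ratio described in this abstract, the following is a minimal sketch of the standard interpretation (omega below 1 suggests purifying selection, omega near 1 neutrality, omega above 1 positive selection). The function name `classify_omega` and the tolerance parameter are illustrative assumptions and are not taken from the paper, which estimates omega by maximum likelihood under codon models rather than from a simple ratio.

```python
def classify_omega(d_n: float, d_s: float, tol: float = 1e-9) -> str:
    """Interpret omega = d_n / d_s under the standard convention.

    d_n and d_s are nonsynonymous and synonymous substitution rates.
    The name and the tolerance are hypothetical, for illustration only.
    """
    if d_s <= 0:
        raise ValueError("d_s must be positive to form the ratio")
    omega = d_n / d_s
    if abs(omega - 1.0) <= tol:
        return "neutral"
    # omega > 1: amino acid changes fix faster than silent ones.
    return "positive" if omega > 1.0 else "purifying"


if __name__ == "__main__":
    for d_n, d_s in [(0.2, 0.5), (0.4, 0.4), (0.9, 0.3)]:
        print(d_n, d_s, classify_omega(d_n, d_s))
```

A single gene-wide ratio like this is exactly the averaged measure the abstract describes as having little power; the branch-site models let omega differ across sites and lineages instead.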

8.
Insect defensins containing cysteine-stabilized alpha/beta motifs (Cs-alpha/beta defensin) are cationic, inducible antibacterial peptides involved in humoral defence against pathogens. To examine trends in molecular evolution of these antimicrobial peptides, sequences similar to the well-characterized Cs-alpha/beta defensin peptide of Anopheles gambiae, using six cysteine residues as landmarks, were retrieved from genomic and protein databases. These sequences were derived from different orders of insects. Genes of insect Cs-alpha/beta defensin appear to constitute a multigene family in which the copy number varies between insect species. Phylogenetic analysis of these sequences revealed two main lineages, one group comprising mainly lepidopteran insects and a second comprising Hemiptera, Coleoptera, Diptera and Hymenoptera insects. Moreover, the topology of the phylogram indicated that dipteran Cs-alpha/beta defensins are diverse, suggesting diversity in immune mechanisms in this order of insects. Overall evolutionary analysis indicated marked diversification and expansion of mature defensin isoforms within the species of mosquitoes relative to non-mosquito defensins, implying the presence of finely tuned immune responses to counter pathogens. The observed higher synonymous substitution rate relative to the nonsynonymous rate in almost all the regions of Cs-alpha/beta defensin of mosquitoes suggests that these peptides are predominantly under purifying selection. The maximum-likelihood models of codon substitution indicated that selective pressure differs among amino acid sites in mosquito mature Cs-alpha/beta defensins and that these peptides are undergoing adaptive evolution in comparison to non-mosquito Cs-alpha/beta defensins, for which such selection was inconspicuous; this suggests the acquisition of selective advantage of the Cs-alpha/beta defensins in the former group.
Finally, this study represents the most detailed report on the evolutionary strategies of Cs-alpha/beta defensins of mosquitoes in particular and insects in general, and indicates that insect Cs-alpha/beta defensins have evolved by duplication followed by divergence, to produce a diverse set of paralogues.

9.
In Darwinian evolution, mutations occur approximately at random in a gene, turned into amino acid mutations by the genetic code. Some mutations are fixed to become substitutions and some are eliminated from the population. Partitioning pairs of closely related species with complete genome sequences by average population size of each pair, we looked at the substitution matrices generated for these partitions and compared the substitution patterns between species. We estimated a population genetic model that relates the relative fixation probabilities of different types of mutations to the selective pressure and population size. Parameterizations of the average and distribution of selective pressures for different amino acid substitution types in different population size comparisons were generated with a Bayesian framework. We found that partitions in population size as well as in substitution type are required to explain the substitution data. Selection coefficients were found to decrease with increasingly radical amino acid substitution and with increasing effective population size. To further explore the role of underlying processes in amino acid substitution, we analyzed embryophyte (plant) gene families from TAED (The Adaptive Evolution Database), where solved structures for at least one member exist in the Protein Data Bank. Using PAML, we assigned branches to three categories: strong negative selection, moderate negative selection/neutrality, and positive diversifying selection. Focusing on the first and third categories, we identified sites changing along gene family lineages and observed the spatial patterns of substitution. Selective sweeps were expected to create primary sequence clustering under positive diversifying selection. Co-evolution through direct physical interaction was expected to cause tertiary structural clustering. Under both positive and negative selection, the substitution patterns were found to be nonrandom.
Under positive diversifying selection, significant independent signals were found for primary and tertiary sequence clustering, suggesting roles for both selective sweeps and direct physical interaction. Under strong negative selection, the signals were not found to be independent. All together, a complex interplay of population genetic and protein thermodynamics forces is suggested.
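The abstract above refers to a population genetic model linking fixation probability to selective pressure and population size but does not give its form. As a hedged illustration only, the sketch below uses Kimura's classical diffusion approximation for a new mutation in a diploid population, assuming the effective size equals the census size N. The function name and parameters are assumptions, and the paper's actual parameterization may differ.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a single new mutant allele.

    s: selection coefficient; n: diploid population size, with effective
    size assumed equal to n. Illustrative stand-in, not the paper's model.
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    if abs(s) < 1e-12:
        return 1.0 / (2 * n)  # neutral limit: initial frequency 1/(2N)
    exponent = -4.0 * n * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid overflow in expm1
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)


if __name__ == "__main__":
    n = 1000
    neutral = fixation_probability(0.0, n)
    for s in (-0.001, 0.0, 0.01):
        p = fixation_probability(s, n)
        print(f"s={s:+.3f}  P_fix={p:.3e}  relative to neutral={p / neutral:.3f}")
```

Dividing by the neutral value 1/(2N) gives a relative fixation probability, which is the kind of quantity the abstract says is related to selective pressure and population size.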

10.
The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to quickly explore a combinatorially large set of all possible time-reversible Markov models with a fixed number of substitution rates. When applied to stem RNA data subject to well-understood evolutionary forces, the models found by the GA 1) capture the expected overall rate patterns a priori; 2) fit the data better than the best available models based on a priori assumptions, suggesting subtle substitution patterns not previously recognized; 3) cannot be rejected in favor of the general reversible model, implying that the evolution of stem RNA sequences can be explained well with only a few substitution rate parameters; and 4) perform well on simulated data, both in terms of goodness of fit and the ability to estimate evolutionary rates. We also investigate the utility of several distance measures for comparing and contrasting inferred evolutionary models. Using widely available small computer clusters, our approach makes it possible, for the first time, to evaluate the performance of existing RNA evolutionary models by comparing them with a large pool of candidate models and to validate common modeling assumptions. In addition, the new method provides the foundation for rigorous selection and comparison of substitution models for other types of sequence data.

11.
Cell adhesion to extracellular matrices is mediated by a set of heterodimeric cell surface receptors called integrins that might be the subject of regulation by growth and differentiation factors. We have examined the effect of transforming growth factor-beta 1 (TGF-beta 1) on the expression of the very late antigens or alpha beta 1 group of integrins in human cell lines. The six known members of this family share a common beta 1 subunit but have distinct alpha subunits that confer selective affinity toward type I collagen, fibronectin, laminin, and other as yet unknown cell adhesion proteins. Using a panel of specific antibodies and cDNA probes, we show that in WI-38 lung fibroblasts TGF-beta 1 elevates concomitantly the expression of alpha 1, alpha 2, alpha 3, alpha 5, and beta 1 integrin subunits at the protein and/or mRNA level, their assembly into the corresponding alpha beta 1 complexes, and their exposure on the cell surface. The rate of synthesis of total alpha subunits relative to beta 1 subunit is higher in TGF-beta 1-treated cells than in control cells. The characteristically slow (t1/2 approximately 10 h) rate of beta 1 conversion from precursor form to mature glycoprotein in untreated cells increases markedly (to t1/2 approximately 3 h) in response to TGF-beta 1. The results suggest that in WI-38 fibroblasts the beta 1 subunit is synthesized in excess over alpha subunits, and assembly of beta 1 subunits with rate-limiting alpha subunits is required for transit through the Golgi and exposure of alpha beta 1 complex on the cell surface. TGF-beta 1 does not induce the synthesis of integrin subunits that are not expressed in unstimulated cells, such as alpha 4 and alpha 6 subunits in WI-38 fibroblasts. However, alpha 4 and alpha 6 subunits can be regulated by TGF-beta in those cells that express them. The results suggest that TGF-beta regulates the expression of individual integrin subunits by parallel but independent mechanisms. 
By modifying the balance of individual alpha beta 1 integrins, TGF-beta 1 might modulate those aspects of cell migration, positioning, and development that are guided by adhesion to extracellular matrices.

12.
Influenza A virus is one of the best-studied viruses and a model organism for the study of molecular evolution; in particular, much research has focused on detecting natural selection on influenza virus proteins. Here, we study the dynamics of the synonymous and nonsynonymous nucleotide composition of influenza A virus genes. In several genes, the nucleotide frequencies at synonymous positions drift away from the equilibria predicted from the synonymous substitution matrices. We investigate possible reasons for this unexpected behavior by fitting several regression models. Relaxation toward a mutation-selection equilibrium following a host jump fails to explain the dynamics of the synonymous nucleotide composition, even if we allow for slow temporal changes in the substitution matrix. Instead, we find that deep internal branches of the phylogeny show distinct patterns of nucleotide substitution and that these branches strongly influence the dynamics of nucleotide composition, suggesting that the observed trends are at least in part a result of natural selection acting on synonymous sites. Moreover, we find that the dynamics of the nucleotide composition at synonymous and nonsynonymous sites are highly correlated, providing evidence that even nonsynonymous sites can be influenced by selection pressure for nucleotide composition.

13.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi-nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.

14.
Functional shifts during protein evolution are expected to yield shifts in substitution rate, and statistical methods can test for this at both codon and amino acid levels. Although methods based on models of sequence evolution serve as powerful tools for studying evolutionary processes, violating underlying assumptions can lead to false biological conclusions. It is not unusual for functional shifts to be accompanied by changes in other aspects of the evolutionary process, such as codon or amino acid frequencies. However, models used to test for functional divergence assume these frequencies remain constant over time. We employed simulation to investigate the impact of non-stationary evolution on functional divergence inference. We investigated three likelihood ratio tests based on codon models and found varying degrees of sensitivity. Joint effects of shifts in frequencies and selection pressures can be large, leading to false signals for positive selection. Amino acid-based tests (FunDi and Bivar) were also compromised when several aspects of the substitution process were not adequately modeled. We applied the same tests to a core genome "scan" for functional divergence between light-adapted ecotypes of the cyanobacterium Prochlorococcus, and carried out gene-specific simulations for ten genes. Results of those simulations illustrated how the inference of functional divergence at the genomic level can be seriously impacted by model misspecification. Although computationally costly, simulations motivated by data in hand are warranted when several aspects of the substitution process are either misspecified or not included in the models upon which the statistical tests were built.

15.
Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
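The final step described in this abstract, forming an alignment-specific model as a weighted sum of learned basis matrices, can be sketched as follows. This is a toy illustration under stated assumptions: the basis matrices here are made-up 2x2 examples rather than 20x20 amino acid matrices, the function name `combine_basis` is hypothetical, and the learning of the basis by NNMF and of the weights from data is not shown.

```python
from typing import List

Matrix = List[List[float]]


def combine_basis(basis: List[Matrix], weights: List[float]) -> Matrix:
    """Return sum_k weights[k] * basis[k] for equally sized square matrices.

    Weights are required to be non-negative, mirroring the non-negativity
    constraint of NNMF. Hypothetical helper, for illustration only.
    """
    if len(basis) != len(weights) or not basis:
        raise ValueError("need one weight per basis matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    size = len(basis[0])
    combined = [[0.0] * size for _ in range(size)]
    for matrix, weight in zip(basis, weights):
        for i in range(size):
            for j in range(size):
                combined[i][j] += weight * matrix[i][j]
    return combined


if __name__ == "__main__":
    toy_basis = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    print(combine_basis(toy_basis, [0.5, 0.25]))
```

Because only the weights are fitted per alignment, the number of free parameters equals the number of basis matrices, which is the economy the abstract points to.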

16.
Baoqiang Cao, Ron Elber. Proteins, 2010, 78(4): 985-1003
We investigate small sequence adjustments (of one or a few amino acids) that induce large conformational transitions between distinct and stable folds of proteins. Such transitions are intriguing from evolutionary and protein-design perspectives. They make it possible to search for ancient protein structures or to design protein switches that flip between folds and functions. A network of sequence flow between protein folds is computed for representative structures of the Protein Data Bank. The computed network is dense: on average, each structure is connected to tens of other folds. Proteins that attract sequences from a higher than expected number of neighboring folds are more likely to be enzymes and to have an alpha/beta fold. The large number of connections between folds may reflect the need of enzymes to adjust their structures for alternative substrates. The network of the Cro family is discussed, and we speculate that capacity is an important factor (but not the only one) that determines protein evolution. The experimentally observed flip from all alpha to alpha + beta fold is examined by the network tools. A kinetic model for the transition of sequences between the folds (with only protein stability in mind) is proposed. Proteins 2010. © 2009 Wiley-Liss, Inc.

17.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, residues on the binding surfaces also have substitution rates very different from those of residues in the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.
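To make the continuous time Markov process in this abstract concrete: given a rate matrix Q, the substitution probabilities over time t are the entries of P(t) = exp(Qt). The sketch below computes this with a plain Taylor series plus scaling and squaring on a toy two-state rate matrix. The function names, the two-state example, and the numerical settings are illustrative assumptions. The paper works with 20 amino acid states and estimates Q by Bayesian MCMC, which is not shown here.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q: Matrix, t: float,
                      squarings: int = 10, terms: int = 12) -> Matrix:
    """Approximate P(t) = exp(Q t) by scaling and squaring.

    Q t is divided by 2**squarings, exponentiated with a truncated Taylor
    series, and the result is squared `squarings` times.
    """
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


if __name__ == "__main__":
    # Toy two-state rate matrix: rows sum to zero, off-diagonals are rates.
    q = [[-1.0, 1.0], [2.0, -2.0]]
    p = transition_matrix(q, 0.5)
    closed_form = 2.0 / 3.0 + (1.0 / 3.0) * math.exp(-1.5)
    print(p[0][0], closed_form)
```

For a two-state chain with rates a and b, the probability of staying in state 0 has the closed form b/(a+b) + a/(a+b) * exp(-(a+b)t), which the usage example compares against.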

18.
Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a single replacement matrix is not enough to represent all the complexity of the evolutionary processes. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test the performance using independent alignments from TREEBASE. We compare unsupervised learning approaches, where the site categories are unknown, to supervised ones, where in estimations we use the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site with TREEBASE test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307-1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 alignments (among 57), and significantly worse for 1 alignment only.
Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
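The "AIC gain per site" reported in this abstract can be illustrated with the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The sketch below assumes that the per-site gain is the AIC difference between a reference model and a candidate model divided by the number of alignment sites; that normalization, the function names, and the numbers in the usage example are assumptions for illustration, not values from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def aic_gain_per_site(ref_lnl: float, ref_params: int,
                      new_lnl: float, new_params: int,
                      n_sites: int) -> float:
    """Positive values mean the new model is preferred over the reference.

    Dividing by the number of sites is an assumed normalization.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return (aic(ref_lnl, ref_params) - aic(new_lnl, new_params)) / n_sites


if __name__ == "__main__":
    # Made-up numbers: a single-matrix reference versus a richer mixture.
    print(aic_gain_per_site(-1050.0, 10, -1000.0, 20, 200))
```

The penalty term 2k is what keeps a mixture with more parameters from winning on likelihood alone.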

19.
Currently there exist several computational methods for predicting the functional sites in a set of homologous proteins based on their sequences. Due to difficulties in defining the functional site in a protein, it is not trivial to compare the performance of these methods, evaluate their limitations and quantify improvements by new approaches. Here, we use extensive mutation data from two proteins, Lac repressor and subtilisin, to perform such an analysis. Along with the evaluation of existing approaches, we describe a site class model of evolution as a tool to predict functional sites in proteins. The results indicate that this model, which simulates the evolution process at the amino acid level using site-specific substitution matrices, provides the most accurate information on functional sites in a given protein family. We also present an application of this model to neurotransmitter transporters, a superfamily of proteins of which we have limited experimental knowledge. Based on this application we present testable hypotheses regarding the mechanism of action of these proteins.

20.
Vicatos S, Reddy BV, Kaznessis Y. Proteins, 2005, 58(4): 935-949
In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Calculation of correlation coefficients is based on physicochemical properties of residues (predictors) and not on substitution matrices. This results in reliable prediction of pairs of residues that are distant in protein sequence but proximal in its three-dimensional tertiary structure. Multiple sequence alignments (MSA) containing a sequence of known structure for 127 families from the PFAM database have been selected so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remain in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs, with an average improvement over random accuracy (IOR) of 6.41. Average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates with hydrophobicity, provides the most reliable results. The other two predictors give good predictions which can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for the alpha beta class), but the trade-off is a smaller number of predictions. The use of solvent accessible area estimations for filtering false positives out of the predictions is promising.
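The idea in this abstract of scoring correlated mutations from a physicochemical property, rather than from a substitution matrix, can be sketched as below. Several choices are assumptions made for illustration: the Kyte-Doolittle hydropathy scale stands in for the paper's first predictor (which is only said to correlate with hydrophobicity), a plain Pearson coefficient stands in for the paper's correlation measure, and the function names are hypothetical. Only the cutoff of 0.65 comes from the abstract.

```python
import math

# Kyte-Doolittle hydropathy values, used here as an assumed stand-in
# for the paper's hydrophobicity-related predictor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

CC_CUTOFF = 0.65  # optimum cutoff reported in the abstract


def column_correlation(col_a: str, col_b: str) -> float:
    """Pearson correlation of property values between two MSA columns.

    Each string holds one residue per aligned sequence (no gaps).
    Returns 0.0 when either column has no variation.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    xs = [KYTE_DOOLITTLE[r] for r in col_a]
    ys = [KYTE_DOOLITTLE[r] for r in col_b]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def is_predicted_pair(col_a: str, col_b: str) -> bool:
    """Flag a column pair as putatively proximal if it passes the cutoff."""
    return column_correlation(col_a, col_b) >= CC_CUTOFF


if __name__ == "__main__":
    print(column_correlation("AILV", "AILV"), is_predicted_pair("IR", "RI"))
```

Raising `CC_CUTOFF` trades the number of predicted pairs for accuracy, matching the trade-off the abstract reports.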
