Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Unbiased estimation of evolutionary distance between nucleotide sequences   Cited by: 7 (self-citations: 2, citations by others: 5)
A new algorithm for estimating the number of nucleotide substitutions per site (i.e., the evolutionary distance) between two nucleotide sequences is presented. This algorithm can be applied to many estimation methods, such as Jukes and Cantor's method, Kimura's transition/transversion method, and Tajima and Nei's method. Unlike ordinary methods, this algorithm is always applicable. Numerical computations and computer simulations indicate that this algorithm gives an almost unbiased estimate of the evolutionary distance, unless the evolutionary distance is very large. This algorithm should be especially useful for analyzing short nucleotide sequences. It can also be applied to amino acid sequences, for estimating the number of amino acid replacements.
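For context, the classical Jukes and Cantor estimator that the abstract contrasts with can be sketched as follows. This is the ordinary estimator, not the new algorithm described above; the function name is hypothetical, and returning `None` for the inapplicable case is an illustrative choice.

```python
import math
from typing import Optional


def jukes_cantor_distance(seq_a: str, seq_b: str) -> Optional[float]:
    """Classical Jukes-Cantor distance (substitutions per site).

    Returns None when the estimator is inapplicable, i.e. when the
    observed proportion of differing sites is 3/4 or more.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    # p is the observed proportion of sites at which the sequences differ
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = differences / len(seq_a)
    if p >= 0.75:
        # The argument of the logarithm would be zero or negative
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With short sequences, sampling error can push the observed proportion to 3/4 or above, which is the situation in which the ordinary estimator fails and an always-applicable method becomes attractive.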

2.
Miyazawa S. PloS one 2011, 6(12): e28892
BACKGROUND: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. RESULTS: The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. 
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.
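The Akaike and Bayesian information criteria mentioned in this abstract follow standard definitions, sketched below. The function names are hypothetical, and the log-likelihoods and parameter counts in the usage lines are made-up numbers for illustration, not values from the study.

```python
import math


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * num_params - 2.0 * log_likelihood


def bic(log_likelihood: float, num_params: int, num_sites: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_sites) - 2.0 * log_likelihood


# Hypothetical comparison of two models fitted to the same alignment.
simple_model = aic(log_likelihood=-1000.0, num_params=5)
richer_model = aic(log_likelihood=-980.0, num_params=12)
preferred = "richer" if richer_model < simple_model else "simple"
```

Both criteria penalize extra parameters, so a richer substitution model is preferred only when its gain in log-likelihood outweighs the penalty.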

3.
We outline a method for estimating quantitatively the influence of point mutations and selection on the frequencies of codons and amino acids. We show how the mutation rate, i.e., the rate of amino acid replacement due to point mutation, can be affected by the codon usage as well as by the rates of the involved base exchanges. A comparison of the mutation rates calculated from reliable values of codon usage and base exchange probabilities with those that would be expected on the basis of chance reveals a notable suppression of replacements leading to tryptophan, glutamate, lysine, and methionine, and particularly of those leading to the termination codons. If selection constraints are neglected and only mutations are taken into account, the best agreement between expected and observed frequencies of both codons and amino acids is obtained for alpha = 1.13-1.15, where (Formula: see text). The "selection values" of codons and amino acids derived by our method show a pattern that partially deviates from others in the literature. For example, the selection pressure on methionine and cysteine turns out to be much more pronounced than expected if only the discrepancies between their observed and expected occurrences in proteins are considered. To estimate to what extent randomly occurring amino acid replacements are accepted by selection, we constructed an "acceptability matrix" from the well-established matrix of accepted point mutations. On the basis of this matrix "acceptability values" of the amino acids can be defined that correlate with their selection values. We also examine the significance of mutations and selection of amino acids with respect to their physicochemical properties and functions in proteins. 
The conservatism of amino acid replacements with respect to certain properties such as polarity can be brought about by the mutational process alone, whereas the conservatism with respect to other relevant properties--among them all measures of bulkiness--obviously is the result of additional selectional constraints on the evolution of protein structures.

4.
When protein sequences divergently evolve under functional constraints, some individual amino acid replacements that reverse the charge (e.g. Lys to Asp) may be compensated by a replacement at a second position that reverses the charge in the opposite direction (e.g. Glu to Arg). When these side-chains are near in space (proximal), such double replacements might be driven by natural selection, if either is selectively disadvantageous, but both together restore fully the ability of the protein to contribute to fitness (are together "neutral"). Accordingly, many have sought to identify pairs of positions in a protein sequence that suffer compensatory replacements, often as a way to identify positions near in space in the folded structure. A "charge compensatory signal" might manifest itself in two ways. First, proximal charge compensatory replacements may occur more frequently than predicted from the product of the probabilities of individual positions suffering charge reversing replacements independently. Conversely, charge compensatory pairs of changes may be observed to occur more frequently in proximal pairs of sites than in the average pair. Normally, charge compensatory covariation is detected by comparing the sequences of extant proteins at the "leaves" of phylogenetic trees. We show here that the charge compensatory signal is more evident when it is sought by examining individual branches in the tree between reconstructed ancestral sequences at nodes in the tree. Here, we find that the signal is especially strong when the position pairs are in a single secondary structural unit (e.g. alpha helix or beta strand) that brings the side-chains suffering charge compensatory covariation near in space, and may be useful in secondary structure prediction. 
Also, "node-node" and "node-leaf" compensatory covariation may be useful to identify the better of two equally parsimonious trees, in a way that is independent of the mathematical formalism used to construct the tree itself. Further, compensatory covariation may provide a signal that indicates whether an episode of sequence evolution contains more or less divergence in functional behavior. Compensatory covariation analysis on reconstructed evolutionary trees may become a valuable tool to analyze genome sequences, and use these analyses to extract biomedically useful information from proteome databases.

5.
Yang J, Xie Z, Glover BJ. The New phytologist 2005, 165(2): 623-632
NF-Y is a ubiquitous CCAAT-binding factor composed of NF-YA, NF-YB and NF-YC. Multiple genes encoding NF-Y subunits have been identified in plant genomes. It remains unclear whether the duplicate genes underwent different evolutionary patterns. Likelihood-ratio tests were used to examine whether the amino acid substitution rates are the same between duplicate genes. The influences of selection on evolution were evaluated by comparing the conservative and radical amino acid substitution rates, as well as maximum-likelihood analysis. Some NF-YB and NF-YC duplicates showed significant evidence of asymmetric evolution but not the NF-YA duplicates. Most amino acid replacements in the NF-YB and NF-YC duplicates result in changes in hydropathy, polar requirement and polarity. The physicochemical changes in the sequences of NF-YB seem to be coupled to asymmetric divergence in gene function. Plant NF-Y genes have evolved in different patterns. Relaxed selective constraints following gene duplication are most likely responsible for the unequal evolutionary rates and distinct divergence patterns of duplicate NF-Y genes. Positive selection may have promoted amino acid hydropathy changes in the NF-YC duplicates.

6.
One of the principal goals of population genetics is to understand the processes by which genetic variation within species (polymorphism) becomes converted into genetic differences between species (divergence). In this transformation, selective neutrality, near neutrality, and positive selection may each play a role, differing from one gene to the next. Synonymous nucleotide sites are often used as a uniform standard of comparison across genes on the grounds that synonymous sites are subject to relatively weak selective constraints and so may, to a first approximation, be regarded as neutral. Synonymous sites are also interdigitated with nonsynonymous sites and so are affected equally by genomic context and demographic factors. Hence a comparison of levels of polymorphism and divergence between synonymous sites and amino acid replacement sites in a gene is potentially informative about the magnitude of selective forces associated with amino acid replacements. We have analyzed 56 genes in which polymorphism data from D. simulans are compared with divergence from a reference strain of D. melanogaster. The framework of the analysis is Bayesian and assumes that the distribution of selective effects (Malthusian fitnesses) is Gaussian with a mean that differs for each gene. In such a model, the average scaled selection intensity (gamma = N(e)s) of amino acid replacements eligible to become polymorphic or fixed is -7.31, and the standard deviation of selective effects within each locus is 6.79 (assuming homoscedasticity across loci). For newly arising mutations of this type that occur in autosomal or X-linked genes, the average proportion of beneficial mutations is 19.7%. Among the amino acid polymorphisms in the sample, the expected average proportion of beneficial mutations is 47.7%, and among amino acid replacements that become fixed the average proportion of beneficial mutations is 94.3%. The average scaled selection intensity of fixed mutations is +5.1. 
The presence of positive selection is pervasive with the single exception of kl-5, a Y-linked fertility gene. We find no evidence that a significant fraction of fixed amino acid replacements is neutral or nearly neutral or that positive selection drives amino acid replacements at only a subset of the loci. These results are model dependent and we discuss possible modifications of the model that might allow more neutral and nearly neutral amino acid replacements to be fixed.

7.
Miyazawa S. PloS one 2011, 6(3): e17244

Background

Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices.

Results

Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not so large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated not to depend very much on protein families but amino acid pairs, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.

Conclusions/Significance

The present codon-based model with the ML estimates of selective constraints and with adjustable mutation rates of nucleotide would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences.

8.
The evolution of the gene for a male ejaculatory protein, Acp26Aa, has been shown to be driven by positive selection when nonsibling species in the Drosophila melanogaster subgroup are compared. To know if selection has been operating in the recent past and to understand the details of its dynamics, we obtained DNA sequences of Acp26Aa and the nearby Acp26Ab gene from 39 D. melanogaster chromosomes. Together with the 10 published sequences, we analyzed 49 sequences from five populations in four continents. The southern African population is somewhat differentiated from all other populations, but its nucleotide diversity is lower at these two loci. We find the following results for Acp26Aa: (1) The R: S (replacement : silent changes) ratio is significantly higher in the between-species comparisons than in the within-species data by the McDonald and Kreitman test. Positive selection is probably responsible for the excess of amino acid replacements between species. (2) However, within-species nucleotide diversity is high. Neither the Tajima test nor the Fu and Li test indicates a reduction in nucleotide diversity due to positive selection in the recent past. (3) The newly derived nucleotides in D. melanogaster are at high frequency significantly more often than predicted by the neutral equilibrium. Since the nearby Acp26Ab gene does not show these patterns, these observations cannot be attributed to the characteristics of this chromosomal region. We suggest that positive selection is active, but may be weak, for each amino acid change in the Acp26Aa gene.
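The comparison in result (1) can be sketched as a 2x2 contingency test. The version below uses a two-sided Fisher exact test as one common way to evaluate a McDonald and Kreitman table; whether the study used this exact statistic is not stated in the abstract. The function names are hypothetical, and the counts are invented for illustration, not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1.0 + 1e-9))


def mk_test(fixed_repl: int, fixed_silent: int, poly_repl: int, poly_silent: int):
    """Return (between-species R:S, within-species R:S, p-value).

    Assumes both silent counts are non-zero.
    """
    between_ratio = fixed_repl / fixed_silent
    within_ratio = poly_repl / poly_silent
    p_value = fisher_exact_two_sided(fixed_repl, fixed_silent, poly_repl, poly_silent)
    return between_ratio, within_ratio, p_value


# Hypothetical counts for illustration only.
between, within, p = mk_test(fixed_repl=12, fixed_silent=8, poly_repl=5, poly_silent=20)
```

A between-species ratio that is significantly higher than the within-species ratio is the pattern the abstract interprets as an excess of amino acid replacements between species.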

9.
On the PAM matrix model of protein evolution   Cited by: 2 (self-citations: 0, citations by others: 2)
The internal consistency of the PAM matrix model of protein evolution is here investigated. The 1 PAM matrix has been constructed from amino acid replacements observed in closely related sequences. Such replacements are of two types, those that do not require an intermediate amino acid replacement and those that do. The second type of replacement must generally be produced by a repetition of the first. This allows data on the first type to be used in predicting data on the second type so that some elements of the 1 PAM matrix may be used to predict others. A discrepancy of more than two orders of magnitude is found between the predictions and the data when this is carried out. This is partly accounted for by an error in constructing the matrix. However, it also seems necessary that the basic model be modified. Several possibilities are considered. One of these is to incorporate a site-dependent spectrum of mutabilities associated with each amino acid.
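The idea that two-step replacements arise by repeating one-step replacements can be illustrated with a toy Markov chain. This is only a sketch of the reasoning, not the paper's actual calculation: the three-state matrix is invented, and the row-as-source, column-as-target convention is an assumption made here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy chain X <-> Y <-> Z: X cannot reach Z in a single step,
# so any X -> Z replacement must pass through the intermediate Y.
one_step = [
    [0.99, 0.01, 0.00],
    [0.01, 0.98, 0.01],
    [0.00, 0.01, 0.99],
]
two_step = mat_pow(one_step, 2)
predicted_x_to_z = two_step[0][2]  # product of the X -> Y and Y -> Z entries
```

Comparing a prediction of this kind with the tabulated entry for the indirect replacement is the sort of internal consistency check the abstract describes.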

10.
The sequences of four-alpha-helical bundle proteins are characterized by a pattern of hydrophilic and hydrophobic amino acids which is repeated every seven residues. At each position of the heptad repeat there are specific constraints on the amino acid properties which result from the topology of the tertiary motif. These constraints give rise to patterns of amino acid distribution which are distinct from those of other proteins. The distributions in each of the heptad positions have been determined by a statistical analysis of structural and sequence data derived from seven families of aligned protein sequences. The constitution of each position is dominated by a very small number of different amino acids, with the core positions consisting overwhelmingly of Leu and Ala. The positional preferences of the individual amino acids can be generally interpreted in terms of residue properties and topological constraints. The potential for four-alpha-helix bundle folding is reflected primarily in the pattern of residue occurrence in the heptad and not in the overall amino acid composition of the protein. Possible applications of this analysis in structure predictions, sequence alignments and in the rational design and engineering of four-alpha-helical bundle proteins are discussed.

11.
We analyse in this paper the evolutionary patterns of two types of Drosophila retrotransposons, gypsy (a virus-like element) and bilbo (a LINE-like element), in host species from the Drosophila and Scaptomyza genera. Phylogenetic analysis of the retrotransposon sequences amplified by PCR revealed concordance with the phylogeny of the Drosophila host species from the obscura group, which is consistent with vertical transmission during differentiation of the species. However, in the species outside of the obscura group, horizontal transmission can be considered. The amplified sequences that presented intact open reading frames were used in an analysis of the evolutionary constraints on the amino acid sequences. The analysed sequences seem to be functional, and the selective constraints are evidenced, especially when sequences from distant species are compared. Comparison of the evolutionary rates of both retrotransposons in the same species suggests that bilbo evolves more rapidly than gypsy.

12.
Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. 
For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic level.

13.
14.
The pattern of residue substitution in divergently evolving families of globular proteins is highly variable. At each position in a fold there are constraints on the identities of amino acids from both the three-dimensional structure and the function of the protein. To characterize and quantify the structural constraints, we have made a comparative analysis of families of homologous globular proteins. Residues are classified according to amino acid type, secondary structure, accessibility of the sidechain, and existence of hydrogen bonds from sidechain to other sidechains or peptide carbonyl or amide functions. There are distinct patterns of substitution especially where residues are both solvent inaccessible and hydrogen bonded through their sidechains. The patterns of residue substitution can be used to construct templates or to identify 'key' residues if one or more structures are known. Conversely, analysis of conservation and substitution across a large family of aligned sequences in terms of substitution profiles can allow prediction of tertiary environment or indicate a functional role. Similar analyses can be used to test the validity of putative structures if several homologous sequences are available.

15.
Summary: The proposed transfer of the gene for Cu/Zn superoxide dismutase from the ponyfish to its symbiotic bacterium Photobacterium leiognathi has been evaluated by an extensive analysis of all available Cu/Zn superoxide dismutase sequences. By the use of four different computer programs, phylogenetic trees were constructed from the sequences of the superoxide dismutases of human, ox, pig, horse, swordfish, fruit fly, yeast, and Neurospora crassa to find out whether superoxide dismutase sequences can reliably be used for the reconstruction of genealogical relationships. All programs arrived at the same most parsimonious tree (one requiring 232 amino acid replacements), the topology of which conformed to established opinions about the phylogenetic relations among these eukaryotes, except that it placed humans closer to the artiodactyls ox and pig than it placed horses. This could be corrected at the cost of two amino acid replacements. The sequence of P. leiognathi superoxide dismutase was then connected at all possible positions to the corrected eukaryotic tree. It was slightly more parsimonious to link the bacterial sequence to the root of the tree than to the fish branch: The former required 316 (or 317) amino acid replacements, versus 319 for the latter. This relative lack of discrimination between such distinct alternative topologies may be a general complication in the comparison of prokaryotic and eukaryotic proteins: Bacterial cytochrome c sequences also were found to be connected as parsimoniously to the root of the eukaryotic tree as to any terminal or ancestral branch. It was calculated that the rate of evolution of the bacterial superoxide dismutase gene, if transfer occurred 30 million years (Myr) ago, must have amounted to 487 amino acid replacements per 100 residues per 100 Myr. 
This is more than 5 times the highest rate observed in any protein (that found for fibrinopeptides), and even much higher than the maximum rate of protein evolution that can be deduced from the neutral mutation rate of unconstrained DNA. Also, no significant evidence that shared derived amino acid replacements are present in swordfish and P. leiognathi superoxide dismutase, as might be expected had gene transfer occurred, was found. On the basis of the available data it seems more reasonable to ascribe the isolated occurrence of Cu/Zn superoxide dismutase in P. leiognathi (as well as in Caulobacter crescentus) to irregular patterns of gene expression and inactivation in the course of divergent evolution than to undocumented processes of gene transfer from eukaryotes to prokaryotes.

16.
The amino acid sequence of rat liver glucokinase deduced from cloned cDNA   Cited by: 16 (self-citations: 0, citations by others: 16)
Rat liver glucokinase (ATP:D-hexose 6-phosphotransferase, EC 2.7.1.1) was purified to homogeneity, cleaved, and subjected to amino acid sequence analysis. Forty-five percent of the protein sequence was obtained, and this information was used to design oligonucleotide probes to screen a rat liver cDNA library. A 1601-base pair cDNA (GK1) contained an open reading frame that encoded the amino acid sequences found in the peptides used to generate the oligonucleotide probes. A second cDNA was subsequently identified (GK.Z2), which is 2346 base pairs long and corresponds to nearly the entire glucokinase mRNA. Blot transfer analysis of hepatic RNA showed that glucokinase mRNA exists as a single species of about 2400 nucleotides. Four hours of insulin treatment of diabetic rats resulted in a 30-fold induction of this mRNA. GK.Z2 has a long open reading frame which, with the known partial peptide sequence, allowed us to deduce the primary structure of glucokinase. The enzyme is composed of 465 amino acids and has a mass of 51,924 daltons. Glucokinase has 53 and 33% amino acid sequence identities with the carboxyl-terminal domains of rat brain hexokinase I and yeast hexokinase, respectively. If conservative amino acid replacements are also considered, glucokinase is similar to these two enzymes at 75 and 63% of positions, respectively. The putative glucose- and ATP-binding domains of glucokinase were identified, and these regions appear to be highly conserved in the hexokinase family of enzymes.

17.
Z. Yang, S. Kumar, M. Nei. Genetics 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
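The posterior comparison described above can be sketched in its simplest form: one ancestor with two descendant leaves, at a single nucleotide site. The Jukes-Cantor model, the uniform prior, the fixed branch lengths, and all names below are assumptions made for illustration; the study itself works on full trees with estimated parameters.

```python
import math

BASES = "ACGT"


def jc69_prob(from_base: str, to_base: str, branch_length: float) -> float:
    """Jukes-Cantor transition probability; branch length in substitutions per site."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    if from_base == to_base:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def ancestral_posteriors(leaf_a: str, leaf_b: str, t_a: float, t_b: float) -> dict:
    """Posterior probability of each ancestral base given the two leaf bases."""
    # Joint probability = prior (1/4) x probability of each leaf given the ancestor
    joint = {
        x: 0.25 * jc69_prob(x, leaf_a, t_a) * jc69_prob(x, leaf_b, t_b)
        for x in BASES
    }
    total = sum(joint.values())
    return {x: value / total for x, value in joint.items()}


posteriors = ancestral_posteriors("A", "A", t_a=0.1, t_b=0.1)
best_state = max(posteriors, key=posteriors.get)  # state with the highest posterior
```

The reconstruction at the site is the state with the highest posterior, and that posterior doubles as an accuracy estimate, which is the quantity the abstract reports.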

18.
Summary: The common but generally overlooked problem of how best to construct phylogenies from orthologous amino acid sequences, when their alignment requires the placement therein of gaps denoting insertions/deletions in the evolutionary history of their genes since their common ancestor, has been studied. Three diverse methods were examined: 1. each missing residue in a gap is weighted as equivalent to the average number of minimum nucleotide replacements in known conjugate amino acid pairs of those same two sequences, which weight necessarily differs for each pair of sequences; 2. each missing residue in a gap is weighted as equivalent to a fixed number of nucleotide replacements; and 3. each gap, regardless of length, is weighted as equivalent to a fixed number of nucleotide replacements. For the flavodoxins, each method yielded a different best tree, which suggests that the choice of method may be crucial. For the plant ferredoxins, all methods give results inconsistent with botanical classification, which suggests the sequences may not all be orthologous. For the bacterial ferredoxins, the method was less germane than the actual weight used, five different best trees being obtained depending upon the weight. The best tree for all ferredoxins (prokaryotic plus eukaryotic) combined proved to be greatly dependent upon the gap locations with several reasonable alignments yielding different best trees. They also suggest that functional equivalence may well prove to be a poor guide to which residues have a common ancestral codon. The rubredoxin sequences show that a partial internal gene duplication occurred in the Pseudomonas line, probably very soon after its divergence from the other genera. Together, the results clearly indicate that the phylogenetic answer one gets may greatly depend upon how one treats the gaps but they fail to indicate what treatment may be best. 
This results partly from the fact that the phylogenies of the taxa represented are not known with sufficient confidence to be sure when the procedures are performing best.
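The difference between weighting each missing residue and weighting each gap can be sketched for a single pair of aligned sequences. The function names, the `"-"` gap character, and the example alignment are all hypothetical. Method 1 corresponds to the per-residue scheme with a weight that the caller computes separately for each sequence pair.

```python
def gap_runs(aligned_a: str, aligned_b: str) -> list:
    """Lengths of maximal runs of columns where exactly one sequence has a gap."""
    runs = []
    current = 0
    current_side = None
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b != "-":
            side = "a"
        elif b == "-" and a != "-":
            side = "b"
        else:
            side = None
        if side is not None and side == current_side:
            current += 1  # the same gap continues
        else:
            if current:
                runs.append(current)  # close the previous gap
            current = 1 if side is not None else 0
            current_side = side
    if current:
        runs.append(current)
    return runs


def gap_penalty(runs: list, scheme: str, weight: float) -> float:
    """Total gap cost, in nucleotide replacements, under a weighting scheme."""
    if scheme == "per_residue":  # methods 1 and 2: every missing residue counts
        return weight * sum(runs)
    if scheme == "per_gap":  # method 3: each gap counts once, whatever its length
        return weight * len(runs)
    raise ValueError("unknown scheme: " + scheme)


runs = gap_runs("AC--GT-A", "ACTTGTCA")
per_residue_cost = gap_penalty(runs, "per_residue", weight=2.0)
per_gap_cost = gap_penalty(runs, "per_gap", weight=2.0)
```

Because long gaps cost much more under the per-residue scheme than under the per-gap scheme, pairwise distances, and therefore the best tree, can shift with the choice.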

19.
A comparison of seven human DR and DC class II histocompatibility antigen beta-chain amino acid sequences indicates that the allelic variation is of comparable magnitude within the DR and DC beta-chain genes. Silent and replacement nucleotide substitutions in six DR and DC beta-chain sequences, as well as in seven murine class II sequences (three I-A beta and four I-A alpha alleles) were analyzed. The results suggest that the mutation rates are of a comparable magnitude in the nucleotide sequences encoding the first and second external domains of the class II molecules. Nevertheless, the allelic amino acid replacements are predominantly located in the first domains. We conclude that a conservative selective pressure acts on the second domains, whereas in many positions in the first domains replacement substitutions are selectively neutral or maybe even favoured. Thus, the difference between the first and second domains as regards the number of amino acid replacements is mainly due to selection.

20.
The degree of similarity in the three-dimensional structures of two proteins can be examined by comparing the patterns of hydrophobicity found in their amino acid sequences. Each type of amino acid residue is assigned a numerical hydrophobicity, and the correlation coefficient rH is computed between all pairs of residues in the two sequences. In tests on sequences from two properly aligned proteins of similar three-dimensional structures, rH is found in the range 0.3 to 0.7. Improperly aligned sequences or unrelated sequences give rH near zero. By considering the observed frequency of amino acid replacements among related structures, a set of optimal matching hydrophobicities (OMHs) was derived. With this set of OMHs, significant correlation coefficients are calculated for similar three-dimensional structures, even though the two sequences contain few identical residues. An example is the two similar folding domains of rhodanese (rH = 0.5). Predictions are made of similar three-dimensional structures for the alpha and beta chains of the various phycobiliproteins, and for delta hemolysin and melittin.
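A hydrophobicity correlation of this kind can be sketched as a Pearson coefficient over aligned residue pairs. The OMH values are not given in the abstract, so the Kyte-Doolittle hydropathy scale is swapped in here as a stand-in. The function name and the example peptides are hypothetical, and gap-free, equal-length input is assumed.

```python
import math

# Kyte-Doolittle hydropathy values, used as a stand-in for the OMH scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_correlation(seq_a: str, seq_b: str, scale=KYTE_DOOLITTLE) -> float:
    """Pearson correlation of hydrophobicities over aligned residue pairs."""
    if len(seq_a) != len(seq_b) or len(seq_a) < 2:
        raise ValueError("need two aligned sequences of equal length >= 2")
    xs = [scale[residue] for residue in seq_a]
    ys = [scale[residue] for residue in seq_b]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant hydrophobicity")
    return covariance / math.sqrt(var_x * var_y)


# Two short peptides with no identical aligned residues but a similar pattern
r_h = hydrophobicity_correlation("ILVK", "LVIR")
```

The example shows how sequences with few or no identical residues can still correlate strongly when hydrophobic and hydrophilic positions line up.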
