Similar Articles
20 similar articles found (search time: 31 ms)
1.
The selective forces acting on a protein-coding gene are commonly inferred using evolutionary codon models by contrasting the rate of nonsynonymous substitutions with the rate of synonymous substitutions. These models usually assume that the synonymous substitution rate, Ks, is homogeneous across all sites, which is justified if synonymous sites are free from selection. However, a growing body of evidence indicates that the DNA and RNA levels of protein-coding genes are subject to varying degrees of selective constraint due to various biological functions encoded at these levels. In this paper, we develop evolutionary models that account for these layers of selection by allowing for both among-site variability of substitution rates at the DNA/RNA level (which leads to Ks variability among protein-coding sites) and among-site variability of substitution rates at the protein level (Ka variability). These models are constructed so that positive selection is either allowed or not. This enables statistical testing of positive selection when variability in the DNA/RNA substitution rate is accounted for. Using this methodology, we show that variability of the baseline DNA/RNA substitution rate is a widespread phenomenon in coding sequence data of mammalian genomes, most likely reflecting varying degrees of selection at the DNA and RNA levels. Additionally, we use simulations to examine the impact that accounting for the variability of the baseline DNA/RNA substitution rate has on the inference of positive selection. Our results show that ignoring this variability results in a high rate of erroneous positive-selection inference. Our newly developed model, which accounts for this variability, does not suffer from this problem and hence provides a likelihood framework for the inference of positive selection on a background of variability in the baseline DNA/RNA substitution rate.
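A toy calculation illustrates the false-positive mechanism described in this abstract. It assumes, purely for illustration, that a site's baseline DNA/RNA rate multiplies both its synonymous and its nonsynonymous rate, and that the protein-level ratio is 0.5 at every site. The rates and the function name `apparent_omegas` are hypothetical; this is arithmetic, not the authors' likelihood model.

```python
from statistics import fmean


def apparent_omegas(baseline_rates, true_omega, per_site_ks):
    """Ka/Ks at each codon site under a toy model of DNA/RNA-level rate variation.

    baseline_rates: hypothetical relative DNA/RNA substitution rate of each site;
        it scales both the synonymous and the nonsynonymous rate.
    true_omega: protein-level Ka/Ks ratio, assumed identical at every site.
    per_site_ks: if True, divide by each site's own Ks; if False, divide by a
        single homogeneous Ks equal to the mean baseline rate.
    """
    ka = [r * true_omega for r in baseline_rates]
    if per_site_ks:
        ks = list(baseline_rates)
    else:
        ks = [fmean(baseline_rates)] * len(baseline_rates)
    return [a / s for a, s in zip(ka, ks)]


rates = [0.2, 0.4, 0.4, 3.0]  # hypothetical baseline rates with mean 1.0
print(apparent_omegas(rates, 0.5, per_site_ks=False))  # homogeneous-Ks assumption
print(apparent_omegas(rates, 0.5, per_site_ks=True))   # site-specific Ks
```

With a single homogeneous Ks, the fast site (baseline rate 3.0) shows an apparent ratio of 1.5 and looks positively selected; dividing by a per-site Ks recovers 0.5 everywhere.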

2.
A model-based approach for detecting coevolving positions in a molecule (Total citations: 4; self-citations: 0; citations by others: 4)
We present a new method for detecting coevolving sites in molecules. The method relies on a set of aligned sequences (nucleic acid or protein) and uses Markov models of evolution to map the substitutions that occurred at each site onto the branches of the underlying phylogenetic tree. This mapping takes into account the uncertainty over ancestral states and among-site rate variation. We then build, for each site, a "substitution vector" containing the posterior estimates of the number of substitutions in each branch. The amount of coevolution for a pair of sites is then measured as the Pearson correlation coefficient between the two corresponding substitution vectors and compared to the expectation under the null hypothesis of independence. We applied the method to a 79-species bacterial ribosomal RNA data set, for which extensive structural characterization has been done over the last 30 years. More than 95% of the predicted intramolecular pairs of sites correspond to known interacting site pairs.
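The coevolution score itself is a plain Pearson correlation between two per-branch substitution vectors. A minimal sketch, assuming the vectors (posterior expected substitution counts per branch) have already been produced by the substitution-mapping step; the values below are invented, and the comparison with the null expectation under independence is not shown.

```python
from math import sqrt


def substitution_vector_correlation(x, y):
    """Pearson correlation between two per-branch substitution vectors.

    x[b] and y[b] are the expected numbers of substitutions on branch b at
    two alignment sites. Returns 0.0 if either site shows no variation
    across branches, since a constant vector has no defined correlation.
    """
    if len(x) != len(y):
        raise ValueError("vectors must cover the same branches")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Invented posterior expected counts on five branches of a tree.
site_a = [0.0, 1.2, 0.1, 0.9, 0.0]
site_b = [0.1, 1.0, 0.0, 1.1, 0.1]
print(substitution_vector_correlation(site_a, site_b))
```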

3.
MOTIVATION: Maximum likelihood-based methods to estimate site-by-site substitution rate variability in aligned homologous protein sequences rely on the formulation of a phylogenetic tree and generally assume that the patterns of relative variability follow a predetermined distribution. We present a phylogenetic tree-independent method to estimate the relative variability of individual sites within large datasets of homologous protein sequences. It is based upon two simple assumptions: first, that substitutions observed between two closely related sequences are likely, in general, to occur at the most variable sites; second, that non-conservative amino acid substitutions tend to occur at more variable sites. Our methodology makes no assumptions regarding the underlying pattern of relative variability between sites. RESULTS: Using data simulated under a non-gamma-distributed model, we have compared the performance of this approach to that of a maximum likelihood method that assumes gamma-distributed rates. At low mean rates of evolution, our method inferred site-by-site relative substitution rates more accurately than the maximum likelihood approach in the absence of prior assumptions about the relationships between sequences. Our method does not directly account for the effects of mutational saturation. However, we have incorporated an ad hoc modification that allows the accurate estimation of relative site variability in fast-evolving and saturated datasets.

4.
Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-Γ" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta-Γ model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model that fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.
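The discretization step can be sketched with the standard library alone. The sketch computes a gamma CDF by its series expansion, inverts it by bisection, and places one rate at each supplied cumulative-probability position of a mean-1 gamma (shape = rate = α), rescaling the rates to mean 1. With the evenly spaced positions (2i − 1)/(2K) this is the median-based discrete gamma commonly used in phylogenetics. In the model described in this abstract, the positions would instead come from a discretized beta distribution with two extra parameters; that step is left out. Function names are illustrative, not taken from any package.

```python
from math import exp, lgamma, log


def gamma_cdf(x, shape, rate):
    """Gamma CDF, i.e. the regularized lower incomplete gamma P(shape, rate * x).

    Uses the power series sum_n z^n / (shape (shape + 1) ... (shape + n)),
    which converges for all z >= 0 and is adequate for the moderate
    arguments that arise here.
    """
    z = rate * x
    if z <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * exp(shape * log(z) - z - lgamma(shape))


def gamma_quantile(p, shape, rate):
    """Invert gamma_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape, rate) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def category_rates(alpha, positions):
    """One rate per cumulative-probability position of a mean-1 gamma, rescaled to mean 1."""
    raw = [gamma_quantile(p, alpha, alpha) for p in positions]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


k = 4
median_positions = [(2 * i - 1) / (2 * k) for i in range(1, k + 1)]
print(category_rates(0.5, median_positions))
```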

5.
The relative substitution rate of each nucleotide site in bacterial small subunit rRNA, large subunit rRNA and 5S rRNA was calculated from sequence alignments for each molecule. Two-dimensional and three-dimensional variability maps of the rRNAs were obtained by plotting the substitution rates on secondary structure models and on the tertiary structure of the rRNAs available from X-ray diffraction results. This showed that the substitution rates are generally low near the centre of the ribosome, where the nucleotides essential for its function are situated, and that they increase towards the surface. An inventory was made of insertions characteristic of the Archaea, Bacteria and Eucarya domains, and for additional insertions present in specific eukaryotic taxa. All these insertions occur at the ribosome surface. The taxon-specific insertions seem to arise randomly in the eukaryotic evolutionary tree, without any phylogenetic relatedness between the taxa possessing them.

6.
A quantitative map of nucleotide substitution rates in bacterial rRNA (Total citations: 14; self-citations: 3; citations by others: 11)
A recently developed method for estimating the variability of nucleotide sites in a sequence alignment [Van de Peer, Y., Van der Auwera, G. and De Wachter, R. (1996) J. Mol. Evol. 42, 201-210] was applied to bacterial 16S, 5S and 23S rRNAs. In this method, the variability of each nucleotide site is defined as its evolutionary rate relative to the average evolutionary rate of all the nucleotide sites of the molecule. Spectra of evolutionary rates were calculated for each rRNA and show the fastest evolving sites substituting at rates more than 1000 times that of the slowest ones. Variability maps are presented for each rRNA, consisting of secondary structure models where the variability of each nucleotide site is indicated by means of a colored dot. The maps can be interpreted in terms of higher order structure, function and evolution of the molecules and facilitate the selection of areas suitable for the design of PCR primers and hybridization probes. Variability measurement is also important for the precise estimation of evolutionary distances and the inference of phylogenetic trees.
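The definition of variability used here, a site's rate relative to the average rate of all sites in the molecule, is short to state in code. The sketch assumes per-site rates have already been estimated (the estimation procedure itself is not reproduced) and, as its own assumption, excludes zero-rate sites when reporting the fastest-to-slowest ratio; the rates are invented.

```python
def relative_rates(site_rates):
    """Rate of each site divided by the average rate over all sites."""
    mean = sum(site_rates) / len(site_rates)
    return [r / mean for r in site_rates]


def rate_spread(site_rates):
    """Ratio of the fastest to the slowest site, ignoring sites with rate zero."""
    variable = [r for r in site_rates if r > 0]
    return max(variable) / min(variable)


# Hypothetical absolute rates (substitutions per site per unit time).
rates = [0.004, 0.5, 1.0, 2.496, 4.0]
print(relative_rates(rates))
print(rate_spread(rates))
```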

7.
S Meyer, G Weiss, A von Haeseler. Genetics, 1999, 152(3): 1103–1110
This study provides a comprehensive survey of the complex pattern of nucleotide substitution in the control region of human mtDNA, which is of central importance to studies of human evolution. A total of 1229 different hypervariable region I (HVRI) and 385 different hypervariable region II (HVRII) sequences were analyzed using a complex substitution model. Moreover, we suggest a new method to assign relative rates to each site in the sequence. Estimates are based on maximum-likelihood methods applied to randomly selected subsets of sequences. Our results indicate that the rate of substitution in HVRI is approximately twice as high as in HVRII and that this difference is mainly due to a higher frequency of pyrimidine transitions in HVRI. However, rate heterogeneity is more pronounced in HVRII.

8.
Mitochondrial DNA data have been used extensively to study evolution and early human origins. These applications require estimates of the rate at which nucleotide substitutions occur in the DNA sequence. We consider the problem of estimating substitution rates in the presence of site-to-site rate variation. A coalescent model is presented that allows for different substitution rates for purines and pyrimidines, as well as more detailed models that allow fast and slow rates within each of the purine and pyrimidine classes. A method for estimating such rates is presented. Even for these simple models of site heterogeneity, there are, typically, insufficient data to obtain reliable estimates of site-specific substitution rates. However, estimates of the average rate across all sites appear to be relatively stable even in the presence of site heterogeneity. Simulations of models with site-to-site variation in mutation rate show that hypervariable sites can produce peaks in the pairwise difference curves that have previously been attributed to population dynamics.
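The pairwise difference (mismatch) curves mentioned at the end of this abstract are histograms of the number of differing sites over all pairs of sequences. A minimal sketch, assuming equal-length aligned sequences without gaps; the sequences are invented.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Map each number of differing sites k to the number of sequence pairs differing at k sites."""
    counts = Counter()
    for a, b in combinations(seqs, 2):
        k = sum(x != y for x, y in zip(a, b))
        counts[k] += 1
    return dict(sorted(counts.items()))


# Invented aligned fragments; real control-region data would be much longer.
seqs = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTC", "TCGTACGTAC"]
print(mismatch_distribution(seqs))
```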

9.
We have examined the plasticity of the antigen-combining site of a high-affinity antibody. In phage-displayed Fab libraries, selected CDR positions and one FR position of the humanized anti-Her2 antibody hu4D5 were substituted with all 20 amino acids. Antigen-binding selections were used to enrich for high-affinity variants, and a large number of sequences were obtained prior to convergence of the selected pool to a small set of clones. As expected, sequence variability of the antigen-binding site is overall diminished compared to known IgG sequences; however, certain positions retain much higher variability than others. The sequence variability map of the hu4D5 binding site is compared with a map derived from previous alanine-scanning of the antibody. Affinities of soluble Fab fragments for antigen confirm that multiple variants were selected with high affinity for antigen, including one variant with a single point mutation that was about threefold improved in affinity compared to the parental hu4D5. Interestingly, this mutation is one of the most radical in terms of changing side-chain chemistry (Trp for Asp) and occurs at the most plastic site as calculated by the Wu-Kabat variability coefficient. Thus variability mapping yields information about the antibody-antigen interaction that is useful and complementary to that obtained by alanine scanning mutagenesis.
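The Wu-Kabat variability coefficient cited here is, in its usual form, the number of distinct amino acids observed at a position divided by the frequency of the most common one, so it ranges from 1 (fully conserved) to 400 (all 20 residues equally frequent). A minimal sketch, assuming one alignment column is given as a string of one-letter residue codes; the example columns are invented.

```python
from collections import Counter


def wu_kabat(column):
    """Wu-Kabat variability: distinct residues / frequency of the most common residue."""
    counts = Counter(column)
    most_common_freq = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_freq


print(wu_kabat("DDDDDDDD"))  # fully conserved position
print(wu_kabat("DDDDWWYF"))  # a more plastic position
```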

10.
We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.
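The core of a counting method can be sketched as a binomial test at a single codon site: under neutrality, each inferred substitution is nonsynonymous with probability equal to the site's expected proportion of nonsynonymous sites. This sketch assumes integer substitution counts from one ancestral reconstruction; the counting methods described in this abstract also weight or sample over reconstructions and allow non-integer counts, which it does not handle. The function names and example counts are hypothetical.

```python
from math import comb


def binomial_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def counting_test(n_nonsyn, n_syn, prop_nonsyn_sites):
    """One-sided p-values at one codon site for an excess (positive selection)
    and a deficit (negative selection) of nonsynonymous substitutions.

    n_nonsyn, n_syn: integer counts of inferred substitutions at the site.
    prop_nonsyn_sites: expected fraction of substitutions that would be
        nonsynonymous if dN = dS (from the site's nonsynonymous/synonymous
        site counts).
    """
    total = n_nonsyn + n_syn
    p_positive = binomial_tail_ge(n_nonsyn, total, prop_nonsyn_sites)
    p_negative = 1.0 - binomial_tail_ge(n_nonsyn + 1, total, prop_nonsyn_sites)
    return p_positive, p_negative


print(counting_test(12, 0, 0.75))  # twelve nonsynonymous, no synonymous changes
```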

11.
MOTIVATION: Multiple sequence alignments of homologous proteins are useful for inferring their phylogenetic history and for revealing functionally important regions in the proteins. Functional constraints may lead to co-variation of two or more amino acids in the sequence, such that a substitution at one site is accompanied by compensatory substitutions at another site. It is not sufficient to find the statistical correlations between sites in the alignment because these may be the result of several undetermined causes. In particular, phylogenetic clustering will lead to many strong correlations. RESULTS: A procedure is developed to detect statistical correlations stemming from functional interaction by removing the strong phylogenetic signal that leads to the correlations of each site with many others in the sequence. Our method relies upon the accuracy of the alignment but it does not require any assumptions about the phylogeny or the substitution process. The effectiveness of the method was verified using computer simulations and then applied to predict functional interactions between amino acids in the Pfam database of alignments.
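The abstract does not spell out its correction, so the sketch below shows a different, widely used technique with a similar aim: the average product correction (APC) of Dunn, Wahl and Gloor (2008), which subtracts from each pairwise covariation score the part explained by how strongly each of the two columns covaries with all others. It assumes a symmetric matrix of raw scores (for example, mutual information) has already been computed. It is not the method of this paper, and the example matrix is invented.

```python
def average_product_correction(scores):
    """Subtract the average-product term from a symmetric site-by-site score matrix.

    scores[i][j] is a raw covariation score between alignment columns i and j;
    the diagonal is ignored and set to 0.0 in the result.
    """
    n = len(scores)
    col_mean = [sum(scores[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    overall = sum(scores[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            background = col_mean[i] * col_mean[j] / overall if overall else 0.0
            corrected[i][j] = scores[i][j] - background
    return corrected


# Invented scores: columns 0 and 1 covary strongly against a uniform background.
raw = [[0.0 if i == j else 0.2 for j in range(4)] for i in range(4)]
raw[0][1] = raw[1][0] = 1.0
for row in average_product_correction(raw):
    print([round(v, 3) for v in row])
```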

12.
N Bierne, A Eyre-Walker. Genetics, 2003, 165(3): 1587–1597
Most methods for estimating the rate of synonymous and nonsynonymous substitution per site define a site as a mutational opportunity: the proportion of sites that are synonymous is equal to the proportion of mutations that would be synonymous under the model of evolution being considered. Here we demonstrate that this definition of a site can give misleading results and that a physical definition of site should be used in some circumstances. We illustrate our point by reexamining the relationship between codon usage bias and the synonymous substitution rate. It has recently been shown that the rate of synonymous substitution, calculated using the Goldman-Yang method, which encapsulates the mutational-opportunity definition of a site at a high level of sophistication, is either positively correlated or uncorrelated to synonymous codon bias in Drosophila. Using other methods, which account for synonymous codon bias but define a site physically, we show that there is a negative correlation between the synonymous substitution rate and codon bias and that the lack of a negative correlation using the Goldman-Yang method is due to the way in which the number of synonymous sites is counted. We also show that there is a positive correlation between the synonymous substitution rate and third position GC content in mammals, but that the relationship is considerably weaker than that obtained using the Goldman-Yang method. We argue that the Goldman-Yang method is misleading in this context and conclude that methods that rely on a mutational-opportunity definition of a site should be used with caution.
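The mutational-opportunity definition of a site can be made concrete for a single codon: each of its three positions contributes the fraction of possible point mutations there that are synonymous. The sketch below uses the standard genetic code with equal weights for all mutations and counts changes to stop codons as nonsynonymous (conventions differ between methods); the Goldman-Yang method additionally weights mutations by transition/transversion bias and codon frequencies, which is not done here. For CTG (Leu) this gives 4/3 synonymous sites, whereas counting only the third position as a physical synonymous site would give 1.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites in one sense codon under the mutational-opportunity definition."""
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no synonymous-site count here")
    sites = 0.0
    for pos in range(3):
        syn = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:  # mutations to a stop never match a sense amino acid
                syn += 1
        sites += syn / 3
    return sites


for codon in ("CTG", "ATG", "GCC"):
    print(codon, CODE[codon], synonymous_sites(codon))
```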

13.
Hantaviruses are rodent-borne Bunyaviruses that infect the Arvicolinae, Murinae, and Sigmodontinae subfamilies of Muridae. The rate of molecular evolution in the hantaviruses has been previously estimated at approximately 10⁻⁷ nucleotide substitutions per site, per year (substitutions/site/year), based on the assumption of codivergence and hence shared divergence times with their rodent hosts. If substantiated, this would make the hantaviruses among the slowest evolving of all RNA viruses. However, as hantaviruses replicate with an RNA-dependent RNA polymerase, with error rates in the region of one mutation per genome replication, this low rate of nucleotide substitution is anomalous. Here, we use a Bayesian coalescent approach to estimate the rate of nucleotide substitution from serially sampled gene sequence data for hantaviruses known to infect each of the three rodent subfamilies: Araraquara virus (Sigmodontinae), Dobrava virus (Murinae), Puumala virus (Arvicolinae), and Tula virus (Arvicolinae). Our results reveal that hantaviruses exhibit short-term substitution rates of 10⁻² to 10⁻⁴ substitutions/site/year and so are within the range exhibited by other RNA viruses. The disparity between this substitution rate and that estimated assuming rodent-hantavirus codivergence suggests that the codivergence hypothesis may need to be reevaluated.
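The Bayesian coalescent analysis used here is beyond a short example, but the idea that serially sampled sequences carry their own rate information can be shown with a simpler, cruder technique: root-to-tip regression. The slope of root-to-tip distance against sampling date estimates substitutions/site/year, and the x-intercept estimates the date of the root. The sketch assumes distances have already been read off a rooted tree; the data and function name are invented.

```python
def regression_rate(sampling_years, root_to_tip):
    """Least-squares slope (rate) and x-intercept (root date) of distance on sampling date."""
    n = len(sampling_years)
    mean_x = sum(sampling_years) / n
    mean_y = sum(root_to_tip) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sampling_years, root_to_tip))
    sxx = sum((x - mean_x) ** 2 for x in sampling_years)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope
    return slope, root_date


years = [1990, 1995, 2000, 2005]
distances = [0.040, 0.045, 0.050, 0.055]  # substitutions/site from a hypothetical tree
rate, root = regression_rate(years, distances)
print(f"rate = {rate:.2e} substitutions/site/year, root date = {root:.0f}")
```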

14.
Precise dating of viral subtype divergence enables researchers to correlate divergence with geographic and demographic occurrences. When historical data are absent (that is, the overwhelming majority), viral sequence sampling on a time scale commensurate with the rate of substitution permits the inference of the times of subtype divergence. Currently, researchers use two strategies to approach this task, both requiring strong conditions on the molecular clock assumption of substitution rate. As the underlying structure of the substitution rate process at the time of subtype divergence is not understood and likely highly variable, we present a simple method that estimates rates of substitution, and from there, times of divergence, without use of an assumed molecular clock. We accomplish this by blending estimates of the substitution rate for triplets of dated sequences where each sequence draws from a distinct viral subtype, providing a zeroth-order approximation for the rate between subtypes. As an example, we calculate the time of divergence for three genes among influenza subtypes A-H3N2 and B using subtype C as an outgroup. We show a time of divergence approximately 100 years ago, substantially more recent than previous estimates which range from 250 to 3800 years ago.
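Once a substitution rate is available, converting it into a divergence time is simple clock arithmetic: if two lineages split t years ago and each accumulates substitutions at rate r, their expected pairwise distance is d = 2rt, so t = d/(2r). The sketch assumes both sequences were sampled in the same year and that the distance is already corrected for multiple hits; it does not implement the triplet-blending estimator described in the abstract, and the numbers and function names are hypothetical.

```python
def divergence_time(distance, rate):
    """Years since two lineages split, from pairwise distance and per-lineage rate."""
    return distance / (2.0 * rate)


def divergence_year(distance, rate, sampling_year):
    """Calendar year of the split, assuming both sequences were sampled in sampling_year."""
    return sampling_year - divergence_time(distance, rate)


# Hypothetical: 0.2 substitutions/site apart, 1e-3 substitutions/site/year per lineage.
print(divergence_time(0.2, 1e-3))
print(divergence_year(0.2, 1e-3, 2000))
```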

15.
We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series, and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed and in their most parameter-rich versions, both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.
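One way to build a Fourier-type rate profile is to write the log relative rate at site i as a cosine series in the site number and rescale the result to mean 1. The log link, the basis indexing and the function names are assumptions of this sketch, not necessarily the parameterization used in the paper. The Akaike information criterion used for model comparison is also shown, AIC = 2k − 2 ln L, with lower values preferred.

```python
from math import cos, exp, pi


def cosine_rate_profile(coefficients, length):
    """Relative rate at each site from a cosine-series log-rate profile.

    log r(i) = sum_k c_k * cos(pi * k * (i + 0.5) / length), k = 1..K,
    then rescaled so the mean rate over the sequence is 1.
    An empty coefficient list gives equal rates at all sites.
    """
    raw = []
    for i in range(length):
        log_r = sum(c * cos(pi * k * (i + 0.5) / length)
                    for k, c in enumerate(coefficients, start=1))
        raw.append(exp(log_r))
    mean = sum(raw) / length
    return [r / mean for r in raw]


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


profile = cosine_rate_profile([1.0, -0.5], 12)  # hypothetical coefficients
print([round(r, 2) for r in profile])
```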

16.
Isolation of deletion and substitution mutants of adenovirus type 5 (Total citations: 57; self-citations: 0; citations by others: 57)
N Jones, T Shenk. Cell, 1978, 13(1): 181–188
The infectivity of adenovirus type 5 DNA can be increased to about 5 × 10³ plaque-forming units per μg DNA if the DNA is isolated as a DNA-protein complex. Utilizing this improved infectivity, a method was developed for the selection of mutants lacking restriction endonuclease cleavage sites. The procedure involves three steps. First, the DNA-protein complex is cleaved with a restriction endonuclease; the Eco RI restriction endonuclease was used here. It cleaves adenovirus type 5 DNA to produce three fragments: fragment A (0–76 map units), fragment C (76–83 map units) and fragment B (83–100 map units). Second, the mixture of fragments is rejoined by incubating with DNA ligase, and, third, the modified DNA is used to infect cells in a DNA plaque assay. Mutants were obtained which lacked the endonuclease cleavage site at 83 map units. Such mutant DNAs were selected by this procedure because they were cleaved by the Eco RI endonuclease to produce only two fragments: a normal A fragment and a fused B/C fragment. These two fragments could be rejoined to produce a viable DNA molecule by a bimolecular reaction with one ligation event; this exerted a strong selection for such molecules, since a trimolecular reaction (keeping the C fragment in its proper orientation) and two ligation events were required to regenerate a wild-type molecule. The alterations resulting in the loss of the Eco RI endonuclease cleavage site at 83 map units include both deletion and substitution mutations. The inserted sequences in the substitution mutations are cellular in origin.
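The fragment bookkeeping behind this selection can be checked in a few lines. The sketch treats the genome as 0 to 100 map units, takes Eco RI cut sites at 76 and 83 map units (the boundaries of fragment C in the abstract), and names fragments A, B, C by decreasing length, following the usual convention for restriction fragments. Removing the site at 83 leaves a normal A fragment and one fused B/C fragment. Function names are illustrative.

```python
def fragments(cut_sites, genome_end=100.0):
    """(start, end) fragments in map units produced by cutting at the given sites."""
    bounds = [0.0] + sorted(cut_sites) + [genome_end]
    return list(zip(bounds[:-1], bounds[1:]))


def label_by_size(frags):
    """Name fragments A, B, C, ... from longest to shortest."""
    ordered = sorted(frags, key=lambda f: f[1] - f[0], reverse=True)
    return {chr(ord("A") + i): f for i, f in enumerate(ordered)}


print(label_by_size(fragments([76, 83])))  # wild type: three fragments
print(label_by_size(fragments([76])))      # site at 83 lost: A plus fused B/C
```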

17.
DNA sequencing by partial ribosubstitution (Total citations: 9; self-citations: 0; citations by others: 9)
A new rapid method for DNA sequence analysis has been devised. In this method, base-specific cleavage is achieved at partially substituted ribonucleotides, which are introduced by DNA polymerase extension in the presence of Mn²⁺. Access to a target sequence and label incorporation are achieved by extending a restriction fragment primer with DNA polymerase I. After a short initial incorporation with [α-³²P]deoxynucleotide triphosphates to label the 5′ region of the target sequence, the triphosphates are removed and the reaction mixture is divided four ways for a second primed extension. The second extension is a cold chase in the presence of Mn²⁺, all four deoxynucleotides and one of the four ribonucleotides, under conditions that result in about 2% ribonucleotide substitution at each position. After cleavage at the restriction site and alkali cleavage at the positions of partial ribosubstitution, each reaction mixture is analysed by electrophoresis on a high-resolution denaturing acrylamide gel. As in the other rapid DNA sequencing methods, the extent of DNA sequence that can be determined from a single experiment is limited only by the resolution of the analysing gels. At present, some 100 nucleotides of sequence can be determined from a single priming reaction.
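Reading the gel amounts to merging the four base-specific lanes: a band of length n in one lane means that base occupies position n of the newly synthesized strand. The sketch assumes, for simplicity, that band lengths are already converted to positions counted from the labelled end and that each product ends at the substituted ribonucleotide; the band lengths and function name are hypothetical.

```python
def read_ladder(lanes):
    """Reconstruct a sequence from the band positions seen in each base-specific lane.

    lanes maps a base ("A", "C", "G", "T") to the positions of its bands.
    Raises ValueError if a position appears in two lanes or if a band is missing.
    """
    calls = {}
    for base, positions in lanes.items():
        for n in positions:
            if n in calls:
                raise ValueError(f"position {n} called in two lanes")
            calls[n] = base
    ordered = sorted(calls)
    if ordered != list(range(1, len(ordered) + 1)):
        raise ValueError("gap in the ladder: at least one band is missing")
    return "".join(calls[n] for n in ordered)


print(read_ladder({"A": [1, 4], "C": [2], "G": [5], "T": [3]}))
```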

18.
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.
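Under a Dirichlet process prior with concentration parameter α, the prior probability of one particular assignment of n sites to K rate classes of sizes n_1, ..., n_K follows the Chinese restaurant process: P = α^K ∏(n_k − 1)! / ∏_{i=0}^{n−1}(α + i). The sketch evaluates this on the log scale; the rate value attached to each class and the MCMC moves that update the partition are not shown, and the function name is illustrative.

```python
from collections import Counter
from math import exp, lgamma, log


def dp_partition_log_prior(assignment, alpha):
    """Log prior of a partition of sites under a Dirichlet process (CRP) prior.

    assignment[i] is the rate-class label of site i; label values are arbitrary,
    only which sites share a label matters.
    """
    sizes = list(Counter(assignment).values())
    n = len(assignment)
    log_p = len(sizes) * log(alpha) + sum(lgamma(s) for s in sizes)  # lgamma(s) = log((s-1)!)
    log_p -= lgamma(alpha + n) - lgamma(alpha)  # log of prod_{i<n} (alpha + i)
    return log_p


# Codon-position partition of six sites versus all sites in one class.
print(exp(dp_partition_log_prior([1, 2, 3, 1, 2, 3], alpha=1.0)))
print(exp(dp_partition_log_prior([0, 0, 0, 0, 0, 0], alpha=1.0)))
```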

19.
Over 400 supposedly biochemically and genetically distinct variants of glucose-6-phosphate dehydrogenase (G6PD) have been described in the past. In order to investigate these variants at the DNA sequence level, we have now determined the relevant sequences of introns of G6PD and describe a method that allows us to rapidly determine the sequence of the entire coding region of G6PD. This technique was applied to six variants that cause G6PD deficiency so functionally severe as to result in nonspherocytic hemolytic anemia. Although the patients were all unrelated, G6PD Marion, Gastonia, and Minnesota each had the same mutation, a G→T at nucleotide (nt) 637 in exon 6 leading to a Val→Leu substitution at amino acid 213. The mutations of Nashville and Anaheim were identical to each other, viz. G→A at nt 1178 in exon 10, producing an Arg→His substitution at amino acid 393. G6PD Loma Linda had a C→A substitution at nt 1089 in exon 10, producing an Asn→Lys change at amino acid 363. The results confirm our earlier results suggesting that the NADP-binding site is in a small region of exon 10 and suggest the possibility that this area is also concerned with the binding of glucose-6-P.
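The amino acid numbers reported here follow from the nucleotide numbers by simple arithmetic on the coding sequence. A minimal sketch, assuming nucleotides are numbered from the A of the ATG start codon (= 1) and the initiator methionine is residue 1; the function name is illustrative.

```python
def codon_of(nt):
    """Codon number and position within that codon (1-3) for a coding nucleotide."""
    return (nt - 1) // 3 + 1, (nt - 1) % 3 + 1


for variant, nt in (("Marion/Gastonia/Minnesota", 637),
                    ("Nashville/Anaheim", 1178),
                    ("Loma Linda", 1089)):
    codon, position = codon_of(nt)
    print(f"{variant}: nt {nt} -> codon {codon}, position {position}")
```

The three mutations fall at codon positions 1, 2 and 3 respectively, consistent with G→T turning a GTN (Val) codon into TTN (Leu when N is A or G), G→A turning CGN (Arg) into CAN (His when N is C or T), and C→A turning AAC (Asn) into AAA (Lys).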

20.
The assumption of a molecular clock for dating events from sequence information is often frustrated by the presence of heterogeneity among evolutionary rates due, among other factors, to positively selected sites. In this work, our goal is to explore methods to estimate infection dates from sequence analysis. One such method, based on site stripping for clock detection, was proposed to unravel the clocklike molecular evolution in sequences showing high variability of evolutionary rates and in the presence of positive selection. Other alternatives imply accommodating heterogeneity in evolutionary rates at various levels, without eliminating any information from the data. Here we present the analysis of a data set of hepatitis C virus (HCV) sequences from 24 patients infected by a single individual with known dates of infection. We first used a simple criterion of relative substitution rate for site removal prior to a regression analysis. Time was regressed on maximum likelihood pairwise evolutionary distances between the sequences sampled from the source individual and infected patients. We show that it is indeed the fastest evolving sites that disturb the molecular clock and that these sites correspond to positively selected codons. The high computational efficiency of the regression analysis allowed us to compare the site-stripping scheme with random removal of sites. We demonstrate that removing the fast-evolving sites significantly increases the accuracy of estimation of infection times based on a single substitution rate. However, the time-of-infection estimations improved substantially when a more sophisticated and computationally demanding Bayesian method was used. This method was used with the same data set but keeping all the sequence positions in the analysis. 
Consequently, despite the distortion introduced by positive selection on evolutionary rates, it is possible to obtain quite accurate estimates of infection dates, a result of particular relevance for molecular epidemiology studies.
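The site-stripping step can be sketched as removing the fastest-evolving fraction of sites, ranked by relative rate, before computing a pairwise distance. This sketch uses the uncorrected p-distance for brevity, whereas the study used maximum likelihood distances; the sequences, rates and function names are invented. The stripped distances would then enter the regression of infection time on distance.

```python
def strip_fastest(relative_rates, fraction):
    """Indices of the sites kept after removing the given fraction of fastest sites."""
    order = sorted(range(len(relative_rates)), key=lambda i: relative_rates[i])
    n_keep = len(order) - round(fraction * len(order))
    return sorted(order[:n_keep])


def p_distance(a, b, sites):
    """Proportion of the given alignment columns at which sequences a and b differ."""
    sites = list(sites)
    return sum(a[i] != b[i] for i in sites) / len(sites)


source = "ACGTACGTAC"
patient = "ACGTTCGTAA"
rates = [0.5, 0.5, 0.5, 0.5, 3.0, 0.5, 0.5, 0.5, 0.5, 2.0]  # sites 4 and 9 evolve fastest
kept = strip_fastest(rates, 0.2)
print(p_distance(source, patient, range(len(source))), p_distance(source, patient, kept))
```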


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417