首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
MOTIVATION: Neighbor-dependent substitution processes generated specific pattern of dinucleotide frequencies in the genomes of most organisms. The CpG-methylation-deamination process is, e.g. a prominent process in vertebrates (CpG effect). Such processes, often with unknown mechanistic origins, need to be incorporated into realistic models of nucleotide substitutions. RESULTS: Based on a general framework of nucleotide substitutions we developed a method that is able to identify the most relevant neighbor-dependent substitution processes, estimate their relative frequencies and judge their importance in order to be included into the modeling. Starting from a model for neighbor independent nucleotide substitution we successively added neighbor-dependent substitution processes in the order of their ability to increase the likelihood of the model describing given data. The analysis of neighbor-dependent nucleotide substitutions based on repetitive elements found in the genomes of human, zebrafish and fruit fly is presented. AVAILABILITY: A web server to perform the presented analysis is freely available at: http://evogen.molgen.mpg.de/server/substitution-analysis  相似文献   

2.
Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability but is in disagreement with observed data in many situations--one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95 + YpR substitution models, which allows neighbor-dependent effects--including CpG hypermutability--to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95 + YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of 10 DNA sequences from primate species. Model comparisons within the RN95 + YpR class show the importance of taking into account neighbor-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.  相似文献   

3.
I formulate a simple model of the ultimatum game, in which a proposer and a responder can receive a reward if they agree on how to divide this reward between them. The model is easy to analyse and shows that strong tendencies to fair division are expected when evolution of strategy frequencies follow the traditional gradient dynamics assumed in evolutionary models. The mean stable offer is typically around 20-40% although this depends on the maximum payoff and if rejection thresholds can evolve independently from proposals. The stable proportion offered at evolutionary equilibrium increases with the maximum payoff, if proposal and acceptance thresholds are dictated by the same strategy and cannot evolve independently. If proposal and acceptance evolve independently, the stable proportion instead decreases with the maximum payoff. The stable outcome may also show substantial variation.  相似文献   

4.
5.
Most current models of sequence evolution assume that all sites of a protein evolve under the same substitution process, characterized by a 20 x 20 substitution matrix. Here, we propose to relax this assumption by developing a Bayesian mixture model that allows the amino-acid replacement pattern at different sites of a protein alignment to be described by distinct substitution processes. Our model, named CAT, assumes the existence of distinct processes (or classes) differing by their equilibrium frequencies over the 20 residues. Through the use of a Dirichlet process prior, the total number of classes and their respective amino-acid profiles, as well as the affiliations of each site to a given class, are all free variables of the model. In this way, the CAT model is able to adapt to the complexity actually present in the data, and it yields an estimate of the substitutional heterogeneity through the posterior mean number of classes. We show that a significant level of heterogeneity is present in the substitution patterns of proteins, and that the standard one-matrix model fails to account for this heterogeneity. By evaluating the Bayes factor, we demonstrate that the standard model is outperformed by CAT on all of the data sets which we analyzed. Altogether, these results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.  相似文献   

6.
7.
The impact of synonymous nucleotide substitutions on fitness in mammals remains controversial. Despite some indications of selective constraint, synonymous sites are often assumed to be neutral, and the rate of their evolution is used as a proxy for mutation rate. We subdivide all sites into four classes in terms of the mutable CpG context, nonCpG, postC, preG, and postCpreG, and compare four-fold synonymous sites and intron sites residing outside transposable elements. The distribution of the rate of evolution across all synonymous sites is trimodal. Rate of evolution at nonCpG synonymous sites, not preceded by C and not followed by G, is approximately 10% below that at such intron sites. In contrast, rate of evolution at postCpreG synonymous sites is approximately 30% above that at such intron sites. Finally, synonymous and intron postC and preG sites evolve at similar rates. The relationship between the levels of polymorphism at the corresponding synonymous and intron sites is very similar to that between their rates of evolution. Within every class, synonymous sites are occupied by G or C much more often than intron sites, whose nucleotide composition is consistent with neutral mutation-drift equilibrium. These patterns suggest that synonymous sites are under weak selection in favor of G and C, with the average coefficient s approximately 0.25/Ne approximately 10(-5), where Ne is the effective population size. Such selection decelerates evolution and reduces variability at sites with symmetric mutation, but has the opposite effects at sites where the favored nucleotides are more mutable. The amino-acid composition of proteins dictates that many synonymous sites are CpGprone, which causes them, on average, to evolve faster and to be more polymorphic than intron sites. An average genotype carries approximately 10(7) suboptimal nucleotides at synonymous sites, implying synergistic epistasis in selection against them.  相似文献   

8.
Cooperative binding is one of the most interesting and not fully understood phenomena involved in control and regulation of biological processes. Here we analyze the simplest phenomenological model that can account for cooperativity (i.e. ligand binding to a macromolecule with two binding sites) by generating equilibrium binding isotherms from deterministically simulated binding time courses. We show that the Hill coefficients determined for cooperative binding, provide a good measure of the Gibbs free energy of interaction among binding sites, and that their values are independent of the free energy of association for empty sites. We also conclude that although negative cooperativity and different classes of binding sites cannot be distinguished at equilibrium, they can be kinetically differentiated. This feature highlights the usefulness of pre-equilibrium time-resolved strategies to explore binding models as a key complement of equilibrium experiments. Furthermore, our analysis shows that under conditions of strong negative cooperativity, the existence of some binding sites can be overlooked, and experiments at very high ligand concentrations can be a valuable tool to unmask such sites.  相似文献   

9.
Pairwise alignment incorporating dipeptide covariation   总被引:1,自引:0,他引:1  
MOTIVATION: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations and by assessing the ability of this algorithm to detect remote homologies. RESULTS: Our analysis indicates that local correlations between substitutions are not strong on the average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.  相似文献   

10.
Billiard S  Castric V  Vekemans X 《Genetics》2007,175(3):1351-1369
We developed a general model of sporophytic self-incompatibility under negative frequency-dependent selection allowing complex patterns of dominance among alleles. We used this model deterministically to investigate the effects on equilibrium allelic frequencies of the number of dominance classes, the number of alleles per dominance class, the asymmetry in dominance expression between pollen and pistil, and whether selection acts on male fitness only or both on male and on female fitnesses. We show that the so-called "recessive effect" occurs under a wide variety of situations. We found emerging properties of finite population models with several alleles per dominance class such as that higher numbers of alleles are maintained in more dominant classes and that the number of dominance classes can evolve. We also investigated the occurrence of homozygous genotypes and found that substantial proportions of those can occur for the most recessive alleles. We used the model for two species with complex dominance patterns to test whether allelic frequencies in natural populations are in agreement with the distribution predicted by our model. We suggest that the model can be used to test explicitly for additional, allele-specific, selective forces.  相似文献   

11.
The aim of this article is to study lattice models of neutral multi-alleles including Ohta-Kimura's step-wise mutation model. We shall show an outline of the construction of a unique strongly continuous non-negative semi-group associated with the infinite dimensional generator and show a general and straightforward method of obtaining the time dependent and equilibrium solutions of all polynomial moments of the gene frequencies. We shall discuss the spectrum of the diffusion processes and as an application we obtain all higher moments of the homozygosity.  相似文献   

12.
Models of codon substitution are developed that incorporate physicochemical properties of amino acids. When amino acid sites are inferred to be under positive selection, these models suggest the nature and extent of the physicochemical properties under selection. This is accomplished by first partitioning the codons on the basis of some property of the encoded amino acids. This partition is used to parametrize the rates of property-conserving and property-altering base substitutions at the codon level by means of finite mixtures of Markov models that also account for codon and transition:transversion biases. Here, we apply this method to two positively selected receptors involved in ligand-recognition: the class I alleles of the human major histocompatibility complex (MHC) of known structure and the S-locus receptor kinase (SRK) of the sporophytic self-incompatibility system (SSI) in cruciferous plants (Brassicaceae), whose structure is unknown. Through likelihood ratio tests we demonstrate that at some sites, the positively selected MHC and SRK proteins are under physicochemical selective pressures to alter polarity, volume, polarity and/or volume, and charge to various extents. An empirical Bayes approach is used to identify sites that may be important for ligand recognition in these proteins.Reviewing Editor : Dr. Willie Swanson  相似文献   

13.
Molecular evolution of the avian CHD1 genes on the Z and W sex chromosomes   总被引:5,自引:0,他引:5  
Fridolfsson AK  Ellegren H 《Genetics》2000,155(4):1903-1912
Genes shared between the nonrecombining parts of the two types of sex chromosomes offer a potential means to study the molecular evolution of the same gene exposed to different genomic environments. We have analyzed the molecular evolution of the coding sequence of the first pair of genes found to be shared by the avian Z (present in both sexes) and W (female-specific) sex chromosomes, CHD1Z and CHD1W. We show here that these two genes evolve independently but are highly conserved at nucleotide as well as amino acid levels, thus not indicating a female-specific role of the CHD1W gene. From comparisons of sequence data from three avian lineages, the frequency of nonsynonymous substitutions (K(a)) was found to be higher for CHD1W (1.55 per 100 sites) than for CHD1Z (0.81), while the opposite was found for synonymous substitutions (K(s), 13.5 vs. 22.7). We argue that the lower effective population size and the absence of recombination on the W chromosome will generally imply that nonsynonymous substitutions accumulate faster on this chromosome than on the Z chromosome. The same should be true for the Y chromosome relative to the X chromosome in XY systems. Our data are compatible with a male-biased mutation rate, manifested by the faster rate of neutral evolution (synonymous substitutions) on the Z chromosome than on the female-specific W chromosome.  相似文献   

14.
In free-living microorganisms, such as Escherichia coli and Saccharomyces cerevisiae, both synonymous and nonsynonymous substitution frequencies correlate with expression levels. Here, we have tested the hypothesis that the correlation between amino acid substitution rates and expression is a by-product of selection for codon bias and translational efficiency in highly expressed genes. To this end, we have examined the correlation between protein evolutionary rates and expression in the human gastric pathogen Helicobacter pylori, where the absence of selection on synonymous sites enables the two types of substitutions to be uncoupled. The results revealed a statistically significant negative correlation between expression levels and nonsynonymous substitutions in both H. pylori and E. coli. We also found that neighboring genes located on the same, but not on opposite strands, evolve at significantly more similar rates than random gene pairs, as expected by co-expression of genes located in the same operon. However, the two species differ in that synonymous substitutions show a strand-specific pattern in E. coli, whereas the weak similarity in synonymous substitutions for neighbors in H. pylori is independent of gene orientation. These results suggest a direct influence of expression levels on nonsynonymous substitution frequencies independent of codon bias and selective constraints on synonymous sites. Electronic Supplementary Material Electronic Supplementary material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. Nicolas Galtier]  相似文献   

15.
16.
The strength and direction of selection on the identity of an amino acid residue in a protein is typically measured by the ratio of the rate of non-synonymous substitutions to the rate of synonymous substitutions. In attempting to predict positively selected sites from amino acid alignments, we made the unexpected observation that the site likelihood of an alignment column for a given tree tends to be negatively correlated with the posterior probability that site is in the positive selection class under widely-used codon models. This is likely because positively selected sites tend to be more variable and display more “radical” amino acid changes; both of these features are expected to result in low site log-likelihoods. We explored the efficacy of using the site log-likelihood (SLL) score as a predictor for positive selection. Through simulation we show that a SLL-based test has a low false positive rate and comparable power as the codon models. In one case where the simulated data violated the assumption that synonymous substitution rates were constant across the sites, the codon models were not able to detect positive selection in the data while the SLL test did. We applied the new method to ten empirical datasets and found that it made similar predictions as the codon models in eight of them. For the tax gene dataset the SLL test seemed to produce more reasonable results. The SLL methods are a valuable complement to codon models, especially for some cases where the assumptions of codon models are likely violated.  相似文献   

17.
We consider models of nucleotidic substitution processes where the rate of substitution at a given site depends on the state of the neighbours of the site. We first estimate the time elapsed between an ancestral sequence at stationarity and a present sequence. Second, assuming that two sequences are issued from a common ancestral sequence at stationarity, we estimate the time since divergence. In the simplest non-trivial case of a Jukes-Cantor model with CpG influence, we provide and justify mathematically consistent estimators in these two settings. We also provide asymptotic confidence intervals, valid for nucleotidic sequences of finite length, and we compute explicit formulas for the estimators and for their confidence intervals. In the general case of an RN model with YpR influence, we extend these results under a proviso, namely that the equation defining the estimator has a unique solution.  相似文献   

18.
A model-based approach for detecting coevolving positions in a molecule   总被引:4,自引:0,他引:4  
We present a new method for detecting coevolving sites in molecules. The method relies on a set of aligned sequences (nucleic acid or protein) and uses Markov models of evolution to map the substitutions that occurred at each site onto the branches of the underlying phylogenetic tree. This mapping takes into account the uncertainty over ancestral states and among-site rate variation. We then build, for each site, a "substitution vector" containing the posterior estimates of the number of substitutions in each branch. The amount of coevolution for a pair of sites is then measured as the Pearson correlation coefficient between the two corresponding substitution vectors and compared to the expectation under the null hypothesis of independence. We applied the method to a 79-species bacterial ribosomal RNA data set, for which extensive structural characterization has been done over the last 30 years. More than 95% of the intramolecular predicted pairs of sites correspond to known interacting site pairs.  相似文献   

19.
Computational simulation models can provide a way of understanding and predicting insect population dynamics and evolution of resistance, but the usefulness of such models depends on generating or estimating the values of key parameters. In this paper, we describe four numerical algorithms generating or estimating key parameters for simulating four different processes within such models. First, we describe a novel method to generate an offspring genotype table for one- or two-locus genetic models for simulating evolution of resistance, and how this method can be extended to create offspring genotype tables for models with more than two loci. Second, we describe how we use a generalized inverse matrix to find a least-squares solution to an over-determined linear system for estimation of parameters in probit models of kill rates. This algorithm can also be used for the estimation of parameters of Freundlich adsorption isotherms. Third, we describe a simple algorithm to randomly select initial frequencies of genotypes either without any special constraints or with some pre-selected frequencies. Also we give a simple method to calculate the “stable” Hardy–Weinberg equilibrium proportions that would result from these initial frequencies. Fourth we describe how the problem of estimating the intrinsic rate of natural increase of a population can be converted to a root-finding problem and how the bisection algorithm can then be used to find the rate. We implemented all these algorithms using MATLAB and Python code; the key statements in both codes consist of only a few commands and are given in the appendices. The results of numerical experiments are also provided to demonstrate that our algorithms are valid and efficient.  相似文献   

20.
Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号