Similar literature
Found 20 similar articles.
1.
ProtTest 3: fast selection of best-fit models of protein evolution   Total citations: 2, self-citations: 0, citations by others: 2
We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. AVAILABILITY: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). CONTACT: dposada@uvigo.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.
H Akashi  N Osada  T Ohta 《Genetics》2012,192(1):15-31
The "nearly neutral" theory of molecular evolution proposes that many features of genomes arise from the interaction of three weak evolutionary forces: mutation, genetic drift, and natural selection acting at its limit of efficacy. Such forces generally have little impact on allele frequencies within populations from generation to generation but can have substantial effects on long-term evolution. The evolutionary dynamics of weakly selected mutations are highly sensitive to population size, and near neutrality was initially proposed as an adjustment to the neutral theory to account for general patterns in available protein and DNA variation data. Here, we review the motivation for the nearly neutral theory, discuss the structure of the model and its predictions, and evaluate current empirical support for interactions among weak evolutionary forces in protein evolution. Near neutrality may be a prevalent mode of evolution across a range of functional categories of mutations and taxa. However, multiple evolutionary mechanisms (including adaptive evolution, linked selection, changes in fitness-effect distributions, and weak selection) can often explain the same patterns of genome variation. Strong parameter sensitivity remains a limitation of the nearly neutral model, and we discuss concave fitness functions as a plausible underlying basis for weak selection.

3.
The analysis of extant sequences shows that molecular evolution has been heterogeneous through time and among lineages. However, for a given sequence alignment, it is often difficult to uncover what factors caused this heterogeneity. In fact, identifying and characterizing heterogeneous patterns of molecular evolution along a phylogenetic tree is very challenging, for lack of appropriate methods. Users either have to a priori define groups of branches along which they believe molecular evolution has been similar or have to allow each branch to have its own pattern of molecular evolution. The first approach assumes prior knowledge that is seldom available, and the second requires estimating an unreasonably large number of parameters. Here we propose a convenient and reliable approach where branches get clustered by their pattern of molecular evolution alone, with no need for prior knowledge about the data set under study. Model selection is achieved in a statistical framework and therefore avoids overparameterization. We rely on substitution mapping for efficiency and present two clustering approaches, depending on whether or not we expect neighbouring branches to share more similar patterns of sequence evolution than distant branches. We validate our method on simulations and test it on four previously published data sets. We find that our method correctly groups branches sharing similar equilibrium GC contents in a data set of ribosomal RNAs and recovers expected footprints of selection through dN/dS. Importantly, it also uncovers a new pattern of relaxed selection in a phylogeny of Mantellid frogs, which we are able to correlate to life-history traits. This shows that our programs should be very useful to study patterns of molecular evolution and reveal new correlations between sequence and species evolution. 
Our programs can run on DNA, RNA, codon, or amino acid sequences with a large set of possible models of substitutions and are available at http://biopp.univ-montp2.fr/forge/testnh.

4.
Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model fitting procedure to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV, and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error.
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.

5.
Callahan BJ 《Fly》2012,6(1):16-20
Central to the study of molecular evolution, and an area of long-standing debate, is the appropriate model for the fitness landscape of proteins. Much of this debate has focused on the strength and frequency of positive and purifying selection, but the form and frequency of selective correlations is also a vital element. The constituent amino acids within a protein generically interact and share selective pressures in predictable ways, which conflicts with the selective independence assumed by common caricatures of the fitness landscape. Here, I discuss a recent study by myself and coauthors that used whole-genome comparisons of orthologous molecular sequences from closely related Drosophilids to explore the form of the selective correlations and selective interactions (epistasis) between the amino acids within a protein. I outline our results and highlight our finding of a selective length scale of ten amino acids within which individual amino acids are substantially and generically more likely to share selective pressures and interact epistatically. I then focus on the evidence presented in our study supporting a substantial role for epistasis in the process of molecular evolution, and discuss further the implications of this widespread epistasis on the overdispersion of the molecular clock and the efficacy of common tests for positive selection.

6.
Summary: The adaptation to a variable environment has been studied within soft and hard selection frameworks. It is shown that an epistatically determined habitat preference, following a Markovian process, always leads to the maintenance of an adaptive polymorphism, in a soft selection context. Although local mating does not alter the conditions for polymorphism maintenance, it is shown that, in that case, habitat selection also leads to the evolution of isolated reproductive units within each available habitat. Habitat selection, however, cannot evolve in the total absence of adaptive polymorphism. This represents a theoretical problem for all models assuming habitat selection to be an initially fixed trait, and means that within a soft selection framework, all the available habitats will be exploited, even the less favourable ones. On the other hand, polymorphism cannot be maintained when selection is hard, even when all individuals select their habitat. Here, the evolution of habitat selection does not need any prerequisite polymorphism, and always leads to the exploitation of only one habitat by the most specialized genotype. It appears then that hard selection can account for the existence of empty habitat and for an easier evolution of habitat specialization.

7.
Computer simulation is an essential tool in the analysis of DNA sequence variation for mapping events of recent adaptive evolution in the genome. Various simulation methods are employed to predict the signature of selection in sequence variation. The most informative and efficient method currently in use is coalescent simulation. However, this method is limited to simple models of directional selection. Whole-population forward-in-time simulations are the alternative to coalescent simulations for more complex models. The notorious problem of excessive computational cost in forward-in-time simulations can be overcome by various simplifying amendments. Overall, the success of simulations depends on the creative application of some population genetic theory to the simulation algorithm.
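
To make the idea of a whole-population forward-in-time simulation concrete, here is a minimal sketch of a haploid Wright-Fisher model with directional selection. It is an illustration only and not the method of the abstract above: the function name `wright_fisher`, its parameters, and the haploid fitness scheme (relative fitness 1 + s for the selected allele) are assumptions chosen for brevity.

```python
import random


def wright_fisher(pop_size, selection, start_count, generations, seed=0):
    """Simulate the frequency of a selected allele forward in time.

    Hypothetical toy model: haploid population of constant size,
    selected allele has relative fitness 1 + selection.
    Returns the list of allele frequencies, one per generation.
    """
    rng = random.Random(seed)
    count = start_count
    trajectory = [count / pop_size]
    for _ in range(generations):
        p = count / pop_size
        # Deterministic step: expected frequency after selection.
        p_sel = p * (1.0 + selection) / (1.0 + selection * p)
        # Stochastic step: genetic drift as binomial sampling of offspring.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        trajectory.append(count / pop_size)
    return trajectory


if __name__ == "__main__":
    freqs = wright_fisher(pop_size=200, selection=0.05, start_count=10, generations=100)
    print(f"final frequency: {freqs[-1]:.3f}")
```

Every individual is sampled in every generation, so the cost grows with population size times the number of generations. This is the computational burden that the abstract says simplifying amendments aim to reduce.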

8.
Over the years, there have been claims that evolution proceeds according to systematically different processes over different timescales and that protein evolution behaves in a non-Markovian manner. On the other hand, Markov models are fundamental to many applications in evolutionary studies. Apparent non-Markovian or time-dependent behavior has been attributed to influence of the genetic code at short timescales and dominance of physicochemical properties of the amino acids at long timescales. However, any long time period is simply the accumulation of many short time periods, and it remains unclear why evolution should appear to act systematically differently across the range of timescales studied. We show that the observed time-dependent behavior can be explained qualitatively by modeling protein sequence evolution as an aggregated Markov process (AMP): a time-homogeneous Markovian substitution model observed only at the level of the amino acids encoded by the protein-coding DNA sequence. The study of AMPs sheds new light on the relationship between amino acid-level and codon-level models of sequence evolution, and our results suggest that protein evolution should be modeled at the codon level rather than using amino acid substitution models.
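
The following toy sketch illustrates why aggregating the states of a Markov chain can produce apparently non-Markovian behavior. It is a simplification under stated assumptions, not the model of the abstract above: it uses a discrete-time chain with three hidden states (standing in for codons) lumped into two observed classes (standing in for amino acids), and the transition probabilities in `P` are invented for illustration.

```python
# Hypothetical 3-state hidden chain; all numbers are made up for illustration.
P = [
    [0.80, 0.15, 0.05],
    [0.10, 0.50, 0.40],
    [0.30, 0.10, 0.60],
]
# Hidden states 0 and 1 are observed as the same class "A"; state 2 as "B".
CLASS_OF = ["A", "A", "B"]


def stationary(p, iterations=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def path_probability(observed, p, pi):
    """Probability of an observed class sequence, summed over hidden paths."""
    n = len(p)
    # forward[s]: probability of the observations so far, ending in hidden state s.
    forward = [pi[s] if CLASS_OF[s] == observed[0] else 0.0 for s in range(n)]
    for symbol in observed[1:]:
        forward = [
            sum(forward[i] * p[i][j] for i in range(n)) if CLASS_OF[j] == symbol else 0.0
            for j in range(n)
        ]
    return sum(forward)


PI = stationary(P)
# Probability of observing B next, given the two most recent observed classes.
p_b_after_aa = path_probability("AAB", P, PI) / path_probability("AA", P, PI)
p_b_after_ba = path_probability("BAB", P, PI) / path_probability("BA", P, PI)

if __name__ == "__main__":
    print(f"P(B | A, previous A) = {p_b_after_aa:.4f}")
    print(f"P(B | A, previous B) = {p_b_after_ba:.4f}")
```

If the observed process were Markovian, the two conditional probabilities would be equal. In this toy chain they differ, because the observation history carries information about which hidden state underlies the current class, even though the hidden chain itself is time-homogeneous and Markovian.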

9.
Understanding the evolution of biopolymers is important to rationalise the directed and undirected design of functional molecules. Large scale experiments or detailed computational studies are often impractical. Therefore, simple model systems, such as RNA secondary structure and lattice proteins, have been adapted to study general statistical and topological features of genotype (sequence) to phenotype (structure) maps. We review findings from such models that address aspects of thermodynamic and mutational robustness, neutral evolution and recombination of proteins. We compare various modelling approaches, and discuss their generality, parameter dependency and experimental verifications of their predictions. The most striking observation is the universal emergence of neutral nets: sets of phenotypically identical genotypes that are interconnected by series of point mutations. However, fast adaptation by point mutations appears to be problematic for proteins. This may explain why proteins appear to be more specific while RNA is rather versatile. This could even be the reason why RNA had to evolve before proteins. Similar principles of biological organisation are reflected in sequence and structure databases of real proteins. Insights gained from modelling are useful for designing more efficient database organisation and search strategies.

10.
11.
Over the past century and a half since the process of natural selection was first described, one enduring question has captivated many: "how predictable is evolution?" Because natural selection comprises deterministic components, the course of evolution may exhibit some level of predictability across organismal groups. Here, I provide an early appraisal of the utility of one particular approach to understanding the predictability of evolution: generalized models of divergent selection (GMDS). The GMDS approach is meant to provide a unifying framework for the science of evolutionary prediction, offering a means of better understanding the causes and consequences of phenotypic and genetic evolution. I describe and test a GMDS centered on the evolution of body shape, size of the gonopodium (sperm-transfer organ), steady-swimming abilities, fast-start swimming performance, and reproductive isolation between populations in Gambusia fishes (Family Poeciliidae). The GMDS produced some accurate evolutionary predictions in Gambusia, identifying variation in intensity of predation by piscivorous fish as a major factor driving repeatable and predictable phenotypic divergence, and apparently playing a key role in promoting ecological speciation. Moreover, the model's applicability seems quite general, as patterns of differentiation in body shape between predator regimes in many disparate fishes match the model's predictions. The fact that such a simple model could yield accurate evolutionary predictions in distantly related fishes inhabiting different geographic regions and types of habitat, and experiencing different predator species, suggests that the model pinpointed a causal factor underlying major, shared patterns of diversification. The GMDS approach appears to represent a promising method of addressing the predictability of evolution and identifying environmental factors responsible for driving major patterns of replicated evolution.

12.
13.
Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
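
The final step described above, building an alignment-specific model as a weighted sum of basis matrices, can be sketched as follows. This is a hedged toy illustration rather than the authors' implementation: it uses 3 states instead of 20 amino acids, the matrices in `BASIS` and the values in `WEIGHTS` are invented, and the learning of both by NNMF and likelihood fitting is not shown. The helper names `weighted_sum` and `to_rate_matrix` are hypothetical.

```python
def weighted_sum(basis, weights):
    """Combine basis matrices into one matrix: sum over k of weights[k] * basis[k]."""
    n = len(basis[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, b in zip(weights, basis):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * b[i][j]
    return combined


def to_rate_matrix(exchange):
    """Set each diagonal entry so that every row sums to zero."""
    n = len(exchange)
    q = [row[:] for row in exchange]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Hypothetical non-negative basis matrices (3 states instead of 20 amino acids).
BASIS = [
    [[0.0, 1.0, 0.2], [1.0, 0.0, 0.2], [0.2, 0.2, 0.0]],
    [[0.0, 0.1, 0.9], [0.1, 0.0, 0.9], [0.9, 0.9, 0.0]],
]
# Hypothetical alignment-specific weights, one per basis matrix.
WEIGHTS = [0.7, 0.3]

Q = to_rate_matrix(weighted_sum(BASIS, WEIGHTS))

if __name__ == "__main__":
    for row in Q:
        print(["%.2f" % value for value in row])
```

In this sketch the alignment-specific part of the model has only as many free parameters as there are basis matrices (two weights here), which mirrors the abstract's point that the model needs far fewer parameters than a full specialist model.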

14.
It has recently been discovered that many biological systems, when represented as graphs, exhibit a scale-free topology. One such system is the set of structural relationships among protein domains. The scale-free nature of this and other systems has previously been explained using network growth models that, although motivated by biological processes, do not explicitly consider the underlying physics or biology. In this work we explore a sequence-based model for the evolution of protein structures and demonstrate that this model is able to recapitulate the scale-free nature observed in graphs of real protein structures. We find that this model also reproduces other statistical features of the protein domain graph. This represents, to our knowledge, the first such microscopic, physics-based evolutionary model for a scale-free network of biological importance and as such has strong implications for our understanding of the evolution of protein structures and of other biological networks.

15.
A model of the P-M system of hybrid dysgenesis is presented which incorporates single-site transposition of P factors in M cytotype, determination of offspring cytotype by both maternal cytotype and maternal or offspring nuclear genotype, and strong fertility selection in dysgenic individuals. The conditions required for the initial invasion of P factors into a pure M population, information concerning stable polymorphisms, and results of numerical iterations depicting the dynamic, nonequilibrium behavior of the system are summarized. While conditions for initial increase are independent of the rate of cytotype switching, the rate of evolution is accelerated by increased production of dysgenic individuals. If the transposition rate is sufficiently high to overcome the fertility barrier opposing P factors introduced into M populations, then convergence to high frequencies of the P factor occurs very rapidly. Under intense fertility depression, the phase of rapid increase may be preceded by an extended period of gradual increase at low frequencies.

16.
There are two approaches to the discovery of enzyme mimics, that is, identifying molecules that are able to bind substrate(s) and then catalyze reactions. The first approach, often inspired by enzymes themselves, utilises chemical knowledge and experience to design the catalyst. The other approach is to create a library and select the best host of a transition state analogue of the required reaction.

17.
18.
There is increasing evidence that epigenetic modifications can be passed from one generation to the next. The population-level consequence of these discoveries, however, remains largely unexplored. In this paper, we introduce and analyze some simple models of constant viability selection acting on such heritable epigenetic variation. These “population-epigenetic” models are analogous to those of traditional population genetics, and are a preliminary step in quantifying the effect of non-genomic transgenerational inheritance, aiming to improve our understanding of how this sort of environmental response may affect evolution.

19.
The relationship between fertility selection as measured by the correlation in progeny number between parents and offspring, and selection at individual loci is investigated in humans. Estimates for the magnitude of fertility selection (0.1) and the rate of gene substitution (0.5 gene substitutions per generation per genome) are used in various mathematical models for selection. It is found that the observed magnitude of fertility selection cannot be explained by non‐epistatic directional selection at individual loci. A symmetric quantitative directional selection model is consistent with the observed data. But it is possible that fertility selection does not have a genetic basis.

20.
The concept of neurodarwinism is considered in the context of simulating the "natural" and "artificial" selection of neurons, synapses and neuronal groups. "Natural" selection of neurons is based on mobile devices built of neuron-like elements. These devices should be capable of adapting to real surroundings. "Artificial" selection of neurons is performed using a computerized "neurointelligence" model operating in a virtual environment. Comparison of the models suggests an advantage in integrating these approaches.
