Similar articles
20 similar articles found (search time: 15 ms)
1.
SUMMARY: JaDis is a Java application for computing G+C base frequencies and evolutionary distances between nucleic acid sequences. It allows specific comparison of coding sequences, of non-coding sequences, or of a non-coding sequence with coding sequences. AVAILABILITY: http://pbil.univ-lyon1.fr/software/jadis.html
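As a rough illustration of the two quantities named in this summary, the Python sketch below computes a G+C fraction and a pairwise distance for aligned nucleotide sequences. The summary does not say which distance models JaDis implements, so the Jukes-Cantor (JC69) correction here is an assumed stand-in, and the function names `gc_fraction`, `p_distance`, and `jc69_distance` are hypothetical, not part of JaDis.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among aligned, unambiguous positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Skip gaps and ambiguity codes in either sequence.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance (assumed model, not necessarily JaDis's)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The JC69 correction assumes equal base frequencies and equal substitution rates; a tool aimed at comparing coding with non-coding sequences would likely offer richer models than this.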

2.
3.
The evolutionary selection forces acting on a protein are commonly inferred using evolutionary codon models by contrasting the rate of synonymous to nonsynonymous substitutions. Most widely used models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. In this paper, we develop a general method that allows assimilation of empirical amino acid replacement probabilities into a codon-substitution matrix. In this way, the resulting codon model takes into account not only the transition-transversion bias and the nonsynonymous/synonymous ratio, but also the different amino acid replacement probabilities as specified in empirical amino acid matrices. Different empirical amino acid replacement matrices, such as secondary structure-specific matrices or organelle-specific matrices (e.g., mitochondria and chloroplasts), can be incorporated into the model, making it context dependent. Using a diverse set of coding DNA sequences, we show that the novel model better fits biological data as compared with either mechanistic or empirical codon models. Using the suggested model, we further analyze human immunodeficiency virus type 1 protease sequences obtained from drug-treated patients and reveal positive selection in sites that are known to confer drug resistance to the virus.
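To make the ingredients of such a codon model concrete, here is a minimal Python sketch of a relative rate between two codons that combines a transition/transversion factor (`kappa`), a nonsynonymous/synonymous factor (`omega`), and an optional amino acid exchange weight. It is not the paper's method: the abstract does not say how the empirical replacement probabilities enter the matrix, so the multiplicative `aa_exchange` lookup, the omission of codon frequencies, and all names below are assumptions made for illustration. Only the standard genetic code is taken as given.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def relative_rate(c1, c2, kappa=2.0, omega=0.5, aa_exchange=None):
    """Unnormalized substitution rate from codon c1 to codon c2.

    aa_exchange is a hypothetical dict mapping frozenset({aa1, aa2}) to a
    multiplicative weight; missing pairs default to 1.0.
    """
    if c1 == c2:
        raise ValueError("codons must differ")
    diffs = [(x, y) for x, y in zip(c1, c2) if x != y]
    aa1, aa2 = CODON_TABLE[c1], CODON_TABLE[c2]
    # Only single-nucleotide changes between sense codons get a nonzero rate.
    if len(diffs) != 1 or "*" in (aa1, aa2):
        return 0.0
    x, y = diffs[0]
    rate = 1.0
    if (x in PURINES) == (y in PURINES):  # purine<->purine or pyrimidine<->pyrimidine
        rate *= kappa
    if aa1 != aa2:  # nonsynonymous change
        rate *= omega
        if aa_exchange is not None:
            rate *= aa_exchange.get(frozenset((aa1, aa2)), 1.0)
    return rate
```

With the defaults, a synonymous transition such as TTT to TTC gets rate `kappa`, while a nonsynonymous transversion such as TTT to TTA gets rate `omega`, scaled further if an exchange weight for the F/L pair is supplied.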

4.
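Using the genomes of 7 archaea, 46 bacteria, and 10 eukaryotes as samples, and taking into account both short-range and long-range correlations between bases, we obtained the dinucleotide frequencies at different positions in the codon pairs of coding sequences and in the triplet pairs of intergenic sequences, and used them to construct phylogenetic relationships based on coding sequences and on intergenic sequences. Whether clustering was based on coding-sequence or intergenic-sequence pair information, the archaea and the eukaryotes were each grouped into a single branch, indicating that the choice of clustering parameters is appropriate. Pairwise comparison with the phylogeny constructed from amino acid sequences showed that, for most Firmicutes, evolution differs considerably between coding and intergenic sequences and between coding sequences and amino acid sequences. We conclude that a more natural phylogeny can be obtained only by jointly considering the evolutionary information from all three types of sequence.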

5.
The human genome contains an estimated 100,000 to 300,000 DNA variants that alter an amino acid in an encoded protein. However, our ability to predict which of these variants are functionally significant is limited. We used a bioinformatics approach to define the functional significance of genetic variation in the ABCA1 gene, a cholesterol transporter crucial for the metabolism of high density lipoprotein cholesterol. To predict the functional consequence of each coding single nucleotide polymorphism and mutation in this gene, we calculated a substitution position-specific evolutionary conservation score for each variant, which considers site-specific variation among evolutionarily related proteins. To test the bioinformatics predictions experimentally, we evaluated the biochemical consequence of these sequence variants by examining the ability of cell lines stably transfected with the ABCA1 alleles to elicit cholesterol efflux. Our bioinformatics approach correctly predicted the functional impact of greater than 94% of the naturally occurring variants we assessed. The bioinformatics predictions were significantly correlated with the degree of functional impairment of ABCA1 mutations (r² = 0.62, p = 0.0008). These results have allowed us to define the impact of genetic variation on ABCA1 function and to suggest that the in silico evolutionary approach we used may be a useful tool in general for predicting the effects of DNA variation on gene function. In addition, our data suggest that considering patterns of positive selection, along with patterns of negative selection such as evolutionary conservation, may improve our ability to predict the functional effects of amino acid variation.

6.
We propose a model that explains the hierarchical organization of proteins in fold families. The model, which is based on the evolutionary selection of proteins by their native state stability, reproduces patterns of amino acids conserved across protein families. Due to its dynamic nature, the model sheds light on the evolutionary time-scales. By studying the relaxation of the correlation function between consecutive mutations at a given position in proteins, we observe separation of the evolutionary time-scales: at short time intervals families of proteins with similar sequences and structures are formed, while at long time intervals the families of structurally similar proteins that have low sequence similarity are formed. We discuss the evolutionary implications of our model. We provide a "profile" solution to our model and find agreement between predicted patterns of conserved amino acids and those actually observed in nature.

7.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.

8.
The sequences of all presently known trypsin-related serine proteases and their zymogens of animal and bacterial origin were optimally aligned on the basis of three different scoring schemes for amino acid comparisons. Sequence homology was found to extend into the activation peptides. The gaps resulting from the alignment of the sequences of the active enzymes formed the basis for a new procedure based on position and number of gaps, which allowed the correct topology of the evolutionary relationship of thrombin and the pancreatic enzymes trypsin, chymotrypsin and elastase to be determined. The procedure was applied in an analogous manner to changes in disulfide bridges as well as to a selected set of amino acid positions. Evolutionary distances between proteins were estimated by minimum base differences as well as according to the stochastic model of evolution. These distances were used successfully to find the best topology of evolutionary relationships. The branch lengths in evolutionary trees were less affected by the number of sequences considered when evolutionary distances between contemporary sequences were measured in minimum base differences than when they were measured according to the stochastic model of evolution. This suggested that, in our specific case, minimum base differences yielded estimates of evolutionary distance closer to reality than the stochastic model of evolution. All these techniques combined yielded the following picture for the evolution of the four protease families. Prothrombin and the zymogens of the pancreatic serine proteases had a common ancestor with tryptic specificity. After the initial divergence, the gene for trypsinogen duplicated. Evidence was found that the duplicated gene underwent drastic changes for a short period of time to become eventually the common ancestor of chymotrypsin and elastase.
The phylogenetic tree elaborated for these enzyme families and the methods introduced to determine its topology should readily allow determination of the attachment site of branches leading to newly sequenced serine proteases, provided their amino acid sequences can be aligned fairly unambiguously. In addition, the consequences of the alignment of the different serine proteases for the relationship of zymogen to enzyme are discussed.

9.
10.
The ability of the principle of parsimony to accurately reconstruct molecular evolutionary pathways from an analysis of amino acid or nucleic acid sequences from extant organisms is tested by direct comparison with a known pathway. Topological errors occur under specified conditions. Importantly, given no errors in the topology, and error-free experimental sequences, the ancestral sequences inferred by the parsimony principle err significantly, the magnitude of the error increasing with the distance of the nodal sequence from the present. These errors are irreducible as an inherent consequence of any evolutionary process in which chance processes operate within the constraints imposed by Darwinian selection. Formulae are derived which predict the errors in the ancestral sequences from a knowledge of only the internodal distances. The parsimony solution is not a reliably good solution. It is necessary to develop a detailed understanding of the interaction between chance processes and natural selection to further advance our understanding of molecular change in proteins and nucleic acids.

11.
Ramsey DC, Scherrer MP, Zhou T, Wilke CO. Genetics 2011, 188(2): 479-488
Recent work with Saccharomyces cerevisiae shows a linear relationship between the evolutionary rate of sites and the relative solvent accessibility (RSA) of the corresponding residues in the folded protein. Here, we aim to develop a mathematical model that can reproduce this linear relationship. We first demonstrate that two models that both seem reasonable choices (a simple model in which selection strength correlates with RSA and a more complex model based on RSA-dependent amino acid distributions) fail to reproduce the observed relationship. We then develop a model on the basis of observed site-specific amino acid distributions and show that this model behaves appropriately. We conclude that evolutionary rates are directly linked to the distribution of amino acids at individual sites. Because of this link, any future insight into the biophysical mechanisms that determine amino acid distributions will improve our understanding of evolutionary rates.

12.
Pei J, Grishin NV. Proteins 2004, 56(4): 782-794
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.

13.
The Human Genome Project has provided abundant gene sequence information on human and important model organisms. The chicken is well positioned from an evolutionary standpoint to serve as a link between higher and lower organisms, particularly mammals, and amphibia and fish. In this study we used stringent criteria to select 565 triples of chicken, human, and mouse candidate orthologs. We analyzed the sequences with respect to nucleotide and amino acid similarities. This analysis also allows measurement of the evolutionary distances of different proteins. We found that chicken-human and chicken-mouse sequence identities are highly correlated, as are chicken-human and chicken-mouse evolutionary distances. With chicken as the out-group, we found that mouse has a higher substitution rate than human, supporting the generation-time effect hypothesis. We also described the transversion bias, which is the preference for some transversions over others in nucleotide substitutions. We demonstrated that there are statistically significant properties in the differences of orthologous sequences. The differential patterns, in combination with sequence similarity analysis, may lead to the identification of genes that are very divergent from the mammalian orthologs.

14.
The evolutionary significance of molecular variation is still contentious, with much current interest focusing on the relative contribution of structural changes in proteins versus regulatory variation in gene expression. We present a population genetic and biochemical study of molecular variation at the malic enzyme locus (Men) in Drosophila melanogaster. Two amino acid polymorphisms appear to affect substrate-binding kinetics, while only one appears to affect thermal stability. Interestingly, we find that enzyme activity differences previously assigned to one of the polymorphisms may, instead, be a function of linked regulatory differences. These results suggest that both regulatory and structural changes contribute to differences in protein function. Our examination of the Men coding sequences reveals no evidence for selection acting on the polymorphisms, but earlier work on this enzyme indicates that the biochemical variation observed has physiological repercussions and therefore could potentially be under natural selection.

15.
Prevailing evolutionary forces are typically deduced from the pattern of differences in synonymous and non-synonymous mutations, under the assumption of neutrality in the absence of amino acid change. We determined the complete sequence of ten vesicular stomatitis virus populations evolving under positive selection. A significant number of the mutations occurred independently in two or more strains, a process known as parallel evolution, and a substantial fraction of the parallel mutations were silent. Parallel evolution was also identified in non-coding regions. These results indicate that silent mutations can significantly contribute to adaptation in RNA viruses, and relative frequencies of synonymous and non-synonymous substitutions may not be useful to resolve their evolutionary history.

16.
Single-nucleotide polymorphisms (SNPs) can make an important contribution to our understanding of genetic backgrounds that may influence medical conditions and ethnic diversity. We undertook a systematic survey of genomic DNA for SNPs located not only in coding sequences but also in non-coding regions (e.g., introns and 5' flanking regions) of selected genes. Using DNA samples from 48 Japanese patients with rheumatoid arthritis (RA) as templates, we surveyed 41 genes that represent candidates for RA, screening a total of 104 kb of DNA (30 kb of coding sequences and 74 kb of non-coding DNA). Within this 104 kb of genomic sequences we identified 163 polymorphisms (1 per 638 bases on average), of which 142 were single-nucleotide substitutions and the remainder, insertions or deletions. Of the coding SNPs, 52% were non-synonymous substitutions, and non-conservative amino acid changes were observed in a quarter of those. Sixty-nine polymorphisms showed high frequencies for minor alleles (more than 15%) and 20 revealed low frequencies (<5%). Our results indicated a greater average distance between SNPs than others have reported, but this disparity may reflect the type of genes surveyed and/or the relative ethnic homogeneity of our test population.
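The density figure quoted in this abstract can be reproduced from the reported totals. The short Python check below assumes that 1 kb means 1,000 bases and that the published sizes are rounded; the variable names are illustrative only.

```python
# Totals as reported in the abstract (assuming 1 kb = 1,000 bases).
coding_bp = 30_000
noncoding_bp = 74_000
surveyed_bp = 104_000
polymorphisms = 163
substitutions = 142

# The coding and non-coding parts should add up to the surveyed total.
assert coding_bp + noncoding_bp == surveyed_bp

# Average spacing between polymorphisms, about 638 bases.
bases_per_polymorphism = surveyed_bp / polymorphisms

# "The remainder" of the 163 polymorphisms are insertions or deletions.
indels = polymorphisms - substitutions

print(f"{bases_per_polymorphism:.1f} bases per polymorphism, {indels} indels")
```

Rounded to the nearest base, the spacing matches the reported 1 per 638 bases, and the remainder left after the 142 substitutions is 21 insertions or deletions.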

17.
We develop an approximate maximum likelihood method to estimate flanking nucleotide context-dependent mutation rates and amino acid exchange-dependent selection in orthologous protein-coding sequences and use it to analyze genome-wide coding sequence alignments from mammals and yeast. Allowing context-dependent mutation provides a better fit to coding sequence data than simpler (context-independent or CpG "hotspot") models and significantly affects selection parameter estimates. Allowing asymmetric (nonreciprocal) selection on amino acid exchanges gives a better fit than simple dN/dS or symmetric selection models. Relative selection strength estimates from our models show good agreement with independent estimates derived from human disease-causing and engineered mutations. Selection strengths depend on local protein structure, showing expected biophysical trends in helical versus nonhelical regions and increased asymmetry on polar-hydrophobic exchanges with increased burial. The more stringent selection that has previously been observed for highly expressed proteins is primarily concentrated in buried regions, supporting the notion that such proteins are under stronger than average selection for stability. Our analyses indicate that a highly parameterized model of mutation and selection is computationally tractable and is a useful tool for exploring a variety of biological questions concerning protein and coding sequence evolution.

18.
Measuring evolutionary distances between DNA or protein sequences forms the basis of many applications in computational biology and evolutionary studies. Of particular interest are distances based on synonymous substitutions, since these substitutions are considered to be under very little selection pressure and therefore assumed to accumulate in an almost clock-like manner. SynPAM, the method presented here, allows the estimation of distances between coding DNA sequences based on synonymous codon substitutions. The problem of estimating an accurate distance from the observed substitution pattern is solved by maximum-likelihood with empirical codon substitution matrices employed for the underlying Markov model. Comparisons with established measures of synonymous distance indicate that SynPAM has less variance and yields useful results over a longer time range.

19.
Interspecific comparisons of protein sequences can reveal regions of evolutionary conservation that are under purifying selection because of functional constraints. Interpreting these constraints requires combining evolutionary information with structural, biochemical, and physiological data to understand the biological function of conserved regions. We take this integrative approach to investigate the evolution and function of the nuclear-encoded subunits of cytochrome c oxidase (COX). We find that the nuclear-encoded subunits evolved subsequent to the origin of mitochondria and the subunit composition of the holoenzyme varies across diverse taxa that include animals, yeasts, and plants. By mapping conserved amino acids onto the crystal structure of bovine COX, we show that conserved residues are structurally organized into functional domains. These domains correspond to some known functional sites as well as to other uncharacterized regions. We find that amino acids that are important for structural stability are conserved at frequencies higher than expected within each taxon, and groups of conserved residues cluster together at distances of less than 5 Å more frequently than do randomly selected residues. We, therefore, suggest that selection is acting to maintain the structural foundation of COX across taxa, whereas active sites vary or coevolve within lineages.

20.
Measuring evolutionary distances between DNA or protein sequences forms the basis of many applications in computational biology and evolutionary studies. Of particular interest are distances based on synonymous substitutions since these substitutions are considered to be under very little selection pressure and therefore assumed to accumulate in an almost clock-like manner. SynPAM, the method presented here, allows the estimation of distances between coding DNA sequences based on synonymous codon substitutions. The problem of estimating an accurate distance from the observed substitution pattern is solved by maximum likelihood with empirical codon substitution matrices employed for the underlying Markov model. Comparisons with established measures of synonymous distance indicate that SynPAM has less variance and yields useful results over a longer time range.
