Similar articles
 20 similar articles found.
1.
Model-based phylogenetic reconstruction methods traditionally assume homogeneity of nucleotide frequencies among sequence sites and lineages. Yet, heterogeneity in base composition is a characteristic shared by most biological sequences. Compositional variation in time, reflected in the compositional biases among contemporary sequences, has already been extensively studied, and its detrimental effects on phylogenetic estimates are known. However, fewer studies have focused on the effects of spatial compositional heterogeneity within genes. We show here that different sites in an alignment do not always share a unique compositional pattern, and we provide examples where nucleotide frequency trends are correlated with the site-specific rate of evolution in RNA genes. Spatial compositional heterogeneity is shown to affect the estimation of evolutionary parameters. With standard phylogenetic methods, estimates of equilibrium frequencies are found to be biased towards the composition observed at fast-evolving sites. Conversely, the ancestral composition estimates of some time-heterogeneous but spatially homogeneous methods are found to be biased towards frequencies observed at invariant and slow-evolving sites. The latter finding challenges the result of a previous study arguing against a hyperthermophilic last universal ancestor from the low apparent G + C content of its rRNA sequences. We propose a new model to account for compositional variation across sites. A Gaussian process prior is used to allow for a smooth change in composition with evolutionary rate. The model has been implemented in the phylogenetic inference software PHASE, and Bayesian methods can be used to obtain the model parameters. The results suggest that this model can accurately capture the observed trends in present-day RNA sequences.

2.
Many nucleic acid sequences contain local repeats. These are often considered as traces of evolutionary events such as gene duplications. However, every random sequence of four characters contains a rather large amount of chance repeats. To assess the significance of repeats found in a gene it is necessary to know how large a background of chance repeats has to be expected. Equations are derived that allow the computation of the number of repeats of different lengths and frequencies expected in any random sequence of known chain length and base composition. Tandem repeats are considered as well as repeats interspersed with other sequences. Sample calculations on viral, messenger, ribosomal, and transfer RNA sequences show that some contain no more than the expected background of random repeats, whereas others contain an excess. In the latter case, the distribution of distances between the repeats, as well as their number, can give clues as to the evolutionary events that may have produced them.
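
The idea of a chance-repeat background can be sketched with a simple expectation. This is not the set of equations derived in the paper, which is not reproduced in the abstract; it is a minimal sketch assuming independent, identically distributed bases. Under that assumption, two non-overlapping windows of length k are identical with probability (sum of squared base frequencies)^k, and the expected number of such window pairs follows by linearity of expectation. The function names are illustrative.

```python
from math import comb


def match_probability(base_freqs):
    """Probability that two independently drawn bases are identical."""
    return sum(p * p for p in base_freqs.values())


def expected_repeat_pairs(n, k, base_freqs):
    """Expected number of pairs of non-overlapping, identical k-mers
    in a random sequence of length n (i.i.d. bases assumed)."""
    starts = n - k + 1              # number of k-mer start positions
    if starts <= k:                 # no two windows can avoid overlapping
        return 0.0
    # Pairs of start positions (i, j) with j - i >= k.
    n_pairs = comb(starts - k + 1, 2)
    return n_pairs * match_probability(base_freqs) ** k


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(expected_repeat_pairs(1000, 10, uniform))
```

Note that a single long repeat contributes several overlapping k-mer pairs to this count, so the value is a background for k-mer matches rather than for maximal repeats.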

3.
Pei J, Grishin NV. Proteins, 2004, 56(4): 782-794
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.

4.
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
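
The contrast between optimization-based reconstructions and posterior averaging can be illustrated with a toy calculation. This is only a sketch, not the authors' procedure: the per-site posterior probabilities below are hypothetical inputs, and the function names are invented for illustration. Picking the single most probable base at every site pushes the reconstructed composition towards the majority base, whereas averaging the posterior probabilities does not.

```python
BASES = "ACGT"


def map_composition(posteriors):
    """Base composition when each site is assigned its most probable base."""
    counts = {b: 0 for b in BASES}
    for site in posteriors:
        counts[max(site, key=site.get)] += 1
    n = len(posteriors)
    return {b: c / n for b, c in counts.items()}


def expected_composition(posteriors):
    """Base composition averaged over the per-site posterior distributions."""
    n = len(posteriors)
    return {b: sum(site.get(b, 0.0) for site in posteriors) / n for b in BASES}


# Hypothetical posterior: every site is 60% A and 40% G.
sites = [{"A": 0.6, "G": 0.4}] * 10
print(map_composition(sites))       # all sites are called A
print(expected_composition(sites))  # A stays near 0.6, G near 0.4
```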

5.
Different functional constraints contribute to different evolutionary rates across genomes. To understand why some sequences evolve faster than others in a single cis-regulatory locus, we investigated function and evolutionary dynamics of the promoter of the Caenorhabditis elegans unc-47 gene. We found that this promoter consists of two distinct domains. The proximal promoter is conserved and is largely sufficient to direct appropriate spatial expression. The distal promoter displays little if any conservation between several closely related nematodes. Despite this divergence, sequences from all species confer robustness of expression, arguing that this function does not require substantial sequence conservation. We showed that even unrelated sequences have the ability to promote robust expression. A prominent feature shared by all of these robustness-promoting sequences is an AT-enriched nucleotide composition consistent with nucleosome depletion. Because general sequence composition can be maintained despite sequence turnover, our results explain how different functional constraints can lead to vastly disparate rates of sequence divergence within a promoter.

6.
Interspecific comparisons of protein sequences can reveal regions of evolutionary conservation that are under purifying selection because of functional constraints. Interpreting these constraints requires combining evolutionary information with structural, biochemical, and physiological data to understand the biological function of conserved regions. We take this integrative approach to investigate the evolution and function of the nuclear-encoded subunits of cytochrome c oxidase (COX). We find that the nuclear-encoded subunits evolved subsequent to the origin of mitochondria and the subunit composition of the holoenzyme varies across diverse taxa that include animals, yeasts, and plants. By mapping conserved amino acids onto the crystal structure of bovine COX, we show that conserved residues are structurally organized into functional domains. These domains correspond to some known functional sites as well as to other uncharacterized regions. We find that amino acids that are important for structural stability are conserved at frequencies higher than expected within each taxon, and groups of conserved residues cluster together at distances of less than 5 Å more frequently than do randomly selected residues. We, therefore, suggest that selection is acting to maintain the structural foundation of COX across taxa, whereas active sites vary or coevolve within lineages.

7.
The number of distinct functional classes of single-stranded RNAs (ssRNAs) and the number of sequences representing them are substantial and continue to increase. Organizing this data in an evolutionary context is essential, yet traditional comparative sequence analyses require that homologous sites can be identified. This prevents comparative analysis between sequences of different functional classes that share no site-to-site sequence similarity. Analysis within a single evolutionary lineage also limits evolutionary inference because shared ancestry confounds properties of molecular structure and function that are historically contingent with those that are imposed for biophysical reasons. Here, we apply a method of comparative analysis to ssRNAs that is not restricted to homologous sequences, and therefore enables comparison between distantly related or unrelated sequences, minimizing the effects of shared ancestry. This method is based on statistical similarities in nucleotide base composition among different functional classes of ssRNAs. In order to denote base composition unambiguously, we have calculated the fraction G+A and G+U content, in addition to the more commonly used fraction G+C content. These three parameters define RNA composition space, which we have visualized using interactive graphics software. We have examined the distribution of nucleotide composition from 15 distinct functional classes of ssRNAs from organisms spanning the universal phylogenetic tree and artificial ribozymes evolved in vitro. Surprisingly, these distributions are biased consistently in G+A and G+U content, both within and between functional classes, regardless of the more variable G+C content. Additionally, an analysis of the base composition of secondary structural elements indicates that paired and unpaired nucleotides, known to have different evolutionary rates, also have significantly different compositional biases. 
These universal compositional biases observed among ssRNAs sharing little or no sequence similarity suggest, contrary to current understanding, that base composition biases constitute a convergent adaptation among a wide variety of molecular functions.
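
The three composition parameters named above are easy to compute, and a short sketch shows why they denote base composition unambiguously. Because the four base fractions sum to 1, the three sums satisfy (G+C) + (G+A) + (G+U) = 2G + 1, so the G fraction, and from it the other three, can be recovered. The function name is illustrative and not taken from the paper.

```python
from collections import Counter


def composition_point(rna):
    """Return the (G+C, G+A, G+U) fractions of an RNA sequence.

    DNA input is accepted by treating T as U; other symbols are ignored.
    """
    counts = Counter(rna.upper().replace("T", "U"))
    total = sum(counts[b] for b in "ACGU")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    g, a, c, u = counts["G"], counts["A"], counts["C"], counts["U"]
    return ((g + c) / total, (g + a) / total, (g + u) / total)


gc, ga, gu = composition_point("GGAU")
g_fraction = (gc + ga + gu - 1) / 2   # recovers the fraction of G
print(gc, ga, gu, g_fraction)
```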

8.
The frequencies of oligonucleotides of length 3-6 were studied in 211 sequences of human DNA (659 kilobases), 22 sequences of DNA of human viruses (120 kbs), in 181 sequences of E. coli (442 kbs), and in 42 sequences of phages of E. coli (137 kbs). The sequences were obtained from GenBank(R) 48. The observed frequencies (O) were compared to the expected frequencies (E) obtained in two ways: 1) according to nucleotide composition for each series, and 2) according to first order Markov chains for triplets, second order for quadruplets, and third order for quintuplets and sextuplets. The ratio O/E was obtained for each oligonucleotide. Then, the correlation between the ratio O/E in a pair of series was calculated. Strong correlations were observed for sequences of man and human viruses, and for E. coli and its phages. Other correlations were small. For higher order Markov chains, there is indication of some correlation also between viruses and phages. It was concluded that through analysis of parallel oligonucleotide series it may be possible to infer some of the complex evolutionary relationships existing between cells and their infectors beyond the level of codon usage.
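
The two kinds of expectation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the composition-based expectation multiplies single-base frequencies, and the first-order Markov expectation for a triplet xyz uses the common estimator N(xy)·N(yz)/N(y). The function names are invented here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping words of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def oe_by_composition(seq, k):
    """O/E for each observed k-mer, with E from single-base frequencies."""
    n = len(seq)
    base = Counter(seq)
    ratios = {}
    for word, obs in kmer_counts(seq, k).items():
        expected = n - k + 1            # number of windows
        for b in word:
            expected *= base[b] / n     # times the product of base frequencies
        ratios[word] = obs / expected
    return ratios


def oe_triplets_markov1(seq):
    """O/E for each observed triplet under a first-order Markov expectation."""
    mono, di, tri = (kmer_counts(seq, k) for k in (1, 2, 3))
    # E(xyz) = N(xy) * N(yz) / N(y), so O/E = N(xyz) * N(y) / (N(xy) * N(yz)).
    return {w: obs * mono[w[1]] / (di[w[:2]] * di[w[1:]])
            for w, obs in tri.items()}


print(oe_by_composition("ACGACGACG", 3))
print(oe_triplets_markov1("ACGACGACG"))
```

In this toy sequence the triplets are strongly over-represented relative to base composition, yet the Markov O/E ratios equal 1 because the dinucleotide counts already account for them. Correcting for lower-order structure in this way is the point of the second expectation.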

9.
The analysis of evolutionary rates is a popular approach to characterizing the effect of natural selection at the molecular level. Sequences contributing to species adaptation are expected to evolve faster than nonfunctional sequences because favourable mutations have a higher fixation probability than neutral ones. Such an accelerated rate of evolution might be due to factors other than natural selection, in particular GC-biased gene conversion. This is true of neutral sequences, but also of constrained sequences, which can be illustrated using the mouse Fxy gene. Several criteria can discriminate between the natural selection and biased gene conversion models. These criteria suggest that the recently reported human accelerated regions are most likely the result of biased gene conversion. We argue that these regions, far from contributing to human adaptation, might represent the Achilles' heel of our genome.

10.
11.
One of the longest running debates in evolutionary biology concerns the kind of genetic variation that is primarily responsible for phenotypic variation in species. Here, we address this question for humans specifically from the perspective of population allele frequency of variants across the complete genome, including both coding and noncoding regions. We establish simple criteria to assess the likelihood that variants are functional based on their genomic locations and then use whole-genome sequence data from 29 subjects of European origin to assess the relationship between the functional properties of variants and their population allele frequencies. We find that for all criteria used to assess the likelihood that a variant is functional, the rarer variants are significantly more likely to be functional than the more common variants. Strikingly, these patterns disappear when we focus on only those variants in which the major alleles are derived. These analyses indicate that the majority of the genetic variation in terms of phenotypic consequence may result from a mutation-selection balance, as opposed to balancing selection, and have direct relevance to the study of human disease.

12.
Tests of applicability of several substitution models for DNA sequence data   (cited 8 times: 3 self-citations, 5 by others)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
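
To make "evolutionary distance" concrete, the sketch below gives the standard distance formulas for the two simplest models in the list, Jukes and Cantor (1969) and Kimura (1980). The formulas for Hasegawa et al.'s and the eight-parameter models are presented in the paper itself and are not reproduced here. Function and parameter names are illustrative: `p` is the proportion of differing sites, `P` the proportion of transitional differences and `Q` the proportion of transversional differences.

```python
from math import log, sqrt


def jc69_distance(p):
    """Jukes-Cantor distance (substitutions per site) from the proportion p
    of sites that differ between two sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


def k80_distance(P, Q):
    """Kimura two-parameter distance from the proportions of transitional (P)
    and transversional (Q) differences."""
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("differences too large for the K80 correction")
    return -0.5 * log(a * sqrt(b))


# When one third of the differences are transitions (P = p/3, Q = 2p/3),
# the two formulas give the same distance.
print(jc69_distance(0.3), k80_distance(0.1, 0.2))
```

The agreement in the last line reflects the statement above that the simpler models are special cases of the more general ones.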

13.
Evolution of human Y-chromosome DNA   (cited 6 times: 0 self-citations, 6 by others)
We have used human male-specific 3.4 kb Hae III restriction endonuclease fragments to explore the evolutionary history of man's Y-chromosome. We have identified four sets of reiterated sequences on the basis of their relative sequence homology with autosomal DNA. The sequences account for approximately 40% of the human Y-chromosome, are interspersed within the same 3.4 kb Hae III fragments, are heterogeneous and contain all reiterated DNA previously demonstrated to be specific for the Y-chromosome (it-Y DNA). Y-specific 3.4 kb Hae III sequences do not reassociate with either human female or ape DNA at standard reassociation criteria. However, approximately half of it-Y DNA (cross reacting it-Y) reassociates with both human female and ape DNA at reduced reassociation criteria. The remaining half (Y-specific it-Y) retains its specificity for the human Y-chromosome. These two sets of it-Y DNA have distinct reiteration frequencies and thermal stabilities with their Y-chromosome homologs. Non-Y-specific 3.4 kb Hae III sequences reassociate with both human female and ape DNA at standard reassociation criteria. The abundance of these non-Y-specific sequences decreases as a function of their evolutionary distance from man. One subset of non-Y-specific 3.4 kb Hae III sequences forms stable duplexes with human Y-chromosome DNA and with human and ape autosomal DNA. No detectable base-mismatch occurs among these homologs suggesting complete conservation of these sequences during primate evolution. The second subset of non-Y-specific Hae III sequences forms stable duplexes with human Y-chromosome DNA but highly mismatched duplexes with human and ape autosomal DNA. The finding that homologs of 3.4 kb Hae III sequences are not found within the Y-chromosome of apes but are only present in autosomes suggests that 3.4 kb Hae III sequences are largely autosomal in origin. 
Since autosomal homologs of most 3.4 kb Hae III-sequences exhibit a greater degree of divergence than those localized to the Y-chromosome, their evolutionary history seems to be chromosome-dependent. Our findings are not easily correlated with the comparative morphology of primate Y-chromosomes and suggest that sequence rearrangement has been a major event in the evolution of the human Y-chromosome. The significance of the specific interspersion of four sets of reiterated sequences, with distinct evolutionary histories, within a repeating unit specific to the human Y-chromosome is not clear. The apparent conservation of at least some of these reiterated sequences suggests they may be of functional importance.

14.
15.
16.
Recent progress in the development of phylogenetic methods and access to molecular phylogenies has made comparative biology more popular than ever before. However, determining cause and effect in phylogenetic comparative studies is inherently difficult without experimentation and evolutionary replication. Here, we provide a roadmap for linking comparative phylogenetic patterns with ecological experiments to test causal hypotheses across ecological and evolutionary scales. As examples, we consider five cornerstones of ecological and evolutionary research: tests of adaptation, tradeoffs and synergisms among traits, coevolution due to species interactions, trait influences on lineage diversification, and community assembly and composition. Although several scenarios can result in a lack of concordance between historical patterns and contemporary experiments, we argue that the coupling of phylogenetic and experimental methods is an increasingly revealing approach to hypothesis testing in evolutionary ecology.

17.
In molecular biology, the issue of quantifying the similarity between two biological sequences is very important. Past research has shown that word-based search tools are computationally efficient and can find some new functional similarities or dissimilarities invisible to other algorithms like FASTA. Recently, under the independent model of base composition, Wu, Burke, and Davison (1997, Biometrics 53, 1431-1439) characterized a family of word-based dissimilarity measures that defined distance between two sequences by simultaneously comparing the frequencies of all subsequences of n adjacent letters (i.e., n-words) in the two sequences. Specifically, they introduced the use of Mahalanobis distance and standardized Euclidean distance into the study of DNA sequence dissimilarity. They showed that both distances had better sensitivity and selectivity than the commonly used Euclidean distance. The purpose of this article is to extend Mahalanobis and standardized Euclidean distances to Markov chain models of base composition. In addition, a new dissimilarity measure based on Kullback-Leibler discrepancy between frequencies of all n-words in the two sequences is introduced. Applications to real data demonstrate that Kullback-Leibler discrepancy gives a better performance than Euclidean distance. Moreover, under a Markov chain model of order kQ for base composition, where kQ is the estimated order based on the query sequence, standardized Euclidean distance performs very well. Under such a model, it performs as well as Mahalanobis distance and better than Kullback-Leibler discrepancy and Euclidean distance. Since standardized Euclidean distance is drastically faster to compute than Mahalanobis distance, in a usual workstation/PC computing environment, the use of standardized Euclidean distance under the Markov chain model of order kQ of base composition is generally recommended. 
However, if the user is very concerned with computational efficiency, then the use of Kullback-Leibler discrepancy, which can be computed as fast as Euclidean distance, is recommended. This can significantly enhance the current technology in comparing large datasets of DNA sequences.
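
The two cheapest measures mentioned above can be sketched as follows. This is a minimal illustration, not the article's exact definitions: the Euclidean distance is taken between n-word frequency vectors, and the Kullback-Leibler discrepancy is computed as the plain (asymmetric) divergence between those vectors, with an optional pseudocount (an assumption made here) to avoid zero frequencies. The standardized Euclidean and Mahalanobis distances need word variances and covariances under a sequence model and are not shown.

```python
from collections import Counter
from itertools import product
from math import log, sqrt


def word_freqs(seq, n, alphabet="ACGT", pseudo=0.0):
    """Frequencies of all n-words over the alphabet, with optional pseudocount."""
    words = ["".join(w) for w in product(alphabet, repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts[w] for w in words) + pseudo * len(words)
    if total == 0:
        raise ValueError("sequence too short for this word length")
    return {w: (counts[w] + pseudo) / total for w in words}


def euclidean(f, g):
    """Euclidean distance between two word-frequency vectors."""
    return sqrt(sum((f[w] - g[w]) ** 2 for w in f))


def kl_discrepancy(f, g):
    """Kullback-Leibler divergence of f from g; g must be positive wherever f is."""
    return sum(f[w] * log(f[w] / g[w]) for w in f if f[w] > 0)


query = word_freqs("ACGTACGTAC", 2, pseudo=0.5)
target = word_freqs("AAAACCCCGG", 2, pseudo=0.5)
print(euclidean(query, target), kl_discrepancy(query, target))
```

Both measures need only one pass over the word counts, which is consistent with the remark that the Kullback-Leibler discrepancy can be computed as fast as the Euclidean distance.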

18.
Convergent evolution of domain architectures (is rare)   (cited 4 times: 0 self-citations, 4 by others)
MOTIVATION: In this paper, we shall examine the evolution of domain architectures across 62 genomes of known phylogeny including all kingdoms of life. We look in particular at the possibility of convergent evolution, with a view to determining the extent to which the architectures observed in the genomes are due to functional necessity or evolutionary descent. We used domains of known structure, because from this and other information we know their evolutionary relationships. We use a range of methods including phylogenetic grouping, sequence similarity/alignment, mutation rates and comparative genomics to approach this difficult problem from several angles. RESULTS: Although we do not claim an exhaustive analysis, we conclude that between 0.4 and 4% of sequences are involved in convergent evolution of domain architectures, and expect the actual number to be close to the lower bound. We also made two incidental observations, albeit on a small sample: the events leading to convergent evolution appear to be random with no functional or structural preferences, and changes in the number of tandem repeat domains occur more readily than changes which alter the domain composition. CONCLUSION: The principal conclusion is that the observed domain architectures of the sequences in the genomes are driven by evolutionary descent rather than functional necessity. CONTACT: gough@supfam.org.

19.
To an approximation Chargaff's rule (%A = %T; %G = %C) applies to single-stranded DNA. In long sequences, not only complementary bases but also complementary oligonucleotides are present in approximately equal frequencies. This applies to all species studied. However, species usually differ in base composition. With the goal of understanding the evolutionary forces involved, I have compared the frequencies of trinucleotides in long sequences and their shuffled counterparts. Among the 32 complementary trinucleotide pairs there is a hierarchy of frequencies which is influenced both by base composition (not affected by shuffling the order of the bases) and by base order (affected by shuffling). The influence of base order is greatest in DNA of 50% G + C and seems to reflect a more fundamental hierarchy of dinucleotide frequencies. Thus if TpA is at low frequency, all eight TpA-containing trinucleotides are at low frequency. Mammals and their viruses share similar hierarchies, with intra- and intergenomic differences being mainly associated with differences in base composition (percentage G + C). E. coli and, to a lesser extent, Drosophila melanogaster hierarchies differ from mammalian hierarchies; this is associated with differences both in base composition and in base order. It is proposed that Chargaff's rule applies to single-stranded DNA because there has been an evolutionary selection pressure favoring mutations that generate complementary oligonucleotides in close proximity, thus creating a potential to form stem-loops. These are dispersed throughout genomes and are rate-limiting in recombination. Differences in (G + C)% between species would impair interspecies recombination by interfering with stem-loop interactions.
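
The comparison described above can be sketched in a few lines. This is an illustration only, with invented function names: it groups the 64 trinucleotides into the 32 reverse-complement pairs (no trinucleotide is its own reverse complement, because the middle base would have to pair with itself), counts both members of each pair on a single strand, and provides a shuffle that keeps base composition while destroying base order.

```python
import random
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word):
    """Reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def trinucleotide_pairs(seq):
    """Map each of the 32 complementary pairs to (count of word, count of its
    reverse complement) on the given single strand."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    pairs = {}
    for t in product("ACGT", repeat=3):
        word = "".join(t)
        key = min(word, revcomp(word))   # one representative per pair
        pairs[key] = (counts[key], counts[revcomp(key)])
    return pairs


def shuffled(seq, seed=0):
    """Shuffle base order while preserving base composition."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


seq = "AACGTT"
print(trinucleotide_pairs(seq)["AAC"])      # counts of AAC and GTT
print(trinucleotide_pairs(shuffled(seq))["AAC"])
```

Running the same counts on a long genomic sequence and on its shuffled counterpart would separate the contribution of base composition from that of base order, which is the comparison the abstract describes.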

20.
12S ribosomal RNA (rRNA) gene sequences from a suite of mammalian taxa (13 placentals, 4 marsupials, 1 monotreme), for which phylogenetic relationships are well established based on independent criteria, were employed to study the evolution of this gene. Phylogenetic analysis of 12S sequences produces a phylogeny that agrees with expectations. Base composition provides evidence for directional symmetrical substitution pressure in loops; in stems, base composition is much more even. Rates of nucleotide substitution are lower in stems than loops. Patterns of nucleotide substitution show an overall preference for transitions over transversions, with this difference more profound in stems than loops. Among different transversion pathways, there is a wide range of transformation frequencies. An analysis of compensatory substitutions shows that there is strong evidence for their occurrence and that a weighting factor of 0.61 should be applied in phylogenetic analyses to account for the dependence of mutations at stem positions relative to positions where changes are independent. Among stem variables (i.e., stem length, interaction distance, substitution rates, G+C content, and the percentage of bases that are paired), several significant correlations were discovered, but stem length and interaction distance are uncorrelated with other variables.
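
The transition/transversion distinction used above can be made concrete with a small counting sketch. It is not the study's method, which is not described in the abstract; it simply classifies the differences between two aligned sequences, treating purine-purine and pyrimidine-pyrimidine changes as transitions and everything else as transversions. The function name is illustrative, and gaps or ambiguous symbols are skipped by assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a = seq_a.upper().replace("U", "T")   # accept RNA input
    b = seq_b.upper().replace("U", "T")
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y or x not in BASES or y not in BASES:
            continue                      # identical site, gap or ambiguity
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


print(ts_tv("AGCT", "GACA"))
```

Applying the count separately to stem and loop columns of an alignment would give the kind of stem-versus-loop comparison reported above.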
