Similar articles
 20 similar articles found
1.
In 2005, Wyckoff and coworkers described a surprisingly strong correlation between Ka/Ks and Ks in several data sets using the LPB93 algorithm. This finding indicated the possibility of a paradigm shift in the way selection strength can be measured using the Ka/Ks ratio. We carried out a calculation of Ka and Ks using six different algorithms on three cross-species orthologous data sets and found a highly variable correlation among the algorithms and lineages. Algorithms based on the GY-HKY substitution model exhibit a weaker positive correlation or a stronger negative correlation than those based on the K2P and JC69 substitution models. Even if one algorithm shows a positive correlation between Ka/Ks and Ks in a warm-blooded lineage, it may show no correlation in a cold-blooded lineage. This algorithm-related and evolutionary lineage-related correlation indicates the need for great caution in drawing conclusions when using only one Ka and Ks algorithm in a genomewide analysis of selection strength. Our results indicated that currently used algorithms for Ka and Ks calculations are flawed and need improvements.

2.
Hurst LD  Williams EJ 《Gene》2000,261(1):107-114
Many attempts to test selectionist and neutralist models employ estimates of synonymous (Ks) and non-synonymous (Ka) substitution rates of orthologous genes. For example, a stronger Ka-Ks correlation than expected under neutrality has been argued to indicate a role for selection, and the absence of a Ks-GC4 correlation has been argued to be inconsistent with neutral models for isochore evolution. However, both of these results, we have shown previously, are sensitive to the method by which Ka and Ks are estimated. Using a maximum likelihood (ML) estimator (GY94) we found a positive correlation between Ks and GC4 and only a weak correlation between Ka and Ks, lower than expected under neutral expectations. This ML method is computationally slow. Recently, a new ad hoc approximation of this ML method has been provided (YN00). This is effectively an extension of Li's protocol, but one that also allows for codon usage bias. This method is computationally near-instantaneous and therefore potentially of great utility for analysis of large datasets. Here we ask whether this method might have such applicability. To this end we ask whether it too recovers the two unusual results. We report that when the ML and earlier ad hoc methods disagree, YN00 recovers the results described by the ML methods, i.e. a positive correlation between GC4 and Ks and only a weak correlation between Ks and Ka. If the ML method can be trusted, then YN00 can also be considered an adequately reliable method for analysis of large datasets. Assuming this to be so, we also analyze the patterns further. We show, for example, that the positive correlation between GC4 and Ks is probably in part a mutational bias, there being more methyl-induced CpG→TpG mutations in GC-rich regions. As regards the evolution of isochores, it seems inappropriate to use the claimed lack of a correlation between GC and Ks as definitive evidence either against or for any model. 
If the positive correlation is real then, we argue, this is hard to reconcile with the biased gene conversion model for isochore formation as this predicts a negative correlation.

3.
Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to the comparison of methods for obtaining reliable estimates, since the amount of sequence variation within targeted datasets is always unpredictable. To our knowledge, there is little information available in the literature about evaluation of these different methods. In this study, we compared six widely used methods and provide evaluation results using simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be readily drawn from the results of one method alone.

4.
It is well established that different allozyme proteins vary in heterozygosity in averages made over large numbers of species. For example, the enzyme 6-phosphogluconate dehydrogenase has a much higher average heterozygosity than glutamate dehydrogenase. Allozyme data alone provide insufficient power to determine the evolutionary cause of such a difference. Many studies have now been carried out on the DNA sequences coding for allozymes. These have identified diverse selective and nonselective causes of polymorphisms at individual loci. However, the studies are mainly in a small number of model species; thus, it is difficult to identify from these DNA studies specific causes of global average heterozygosity differences among allozyme proteins. Here we demonstrate that estimates of average heterozygosity for 37 allozyme proteins in vertebrates correlate positively with Ka and Ka/Ks but not with Ks, measured in the human-mouse lineage. The values of Ka/Ks are less than 0.25, and Ka/Ks is negatively correlated with subunit number (quaternary structure), a measure of structural constraint. Proteins with lower levels of constraint have higher values of both Ka/Ks and heterozygosity. These results better support the hypothesis that differences in average allozyme diversity between proteins are more closely related to differences in the level of purifying selection than to differences in the underlying mutation rate or level of positive selection.

5.

Background  

Approximate methods for estimating nonsynonymous and synonymous substitution rates (Ka and Ks) among protein-coding sequences have adopted different mutation (substitution) models. In the past two decades, several methods have been proposed, but they have not considered unequal transitional substitutions (between the two purines, A and G, or the two pyrimidines, T and C) that become apparent when the sequence data to be compared are vast and significantly diverged.

6.
Accuracy of estimated phylogenetic trees from molecular data
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.
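
As a rough illustration of the simplest of the four tree-building methods named above, the following is a minimal sketch of textbook UPGMA clustering. It is not the simulation procedure of the paper: the function name `upgma`, the OTU labels, and the distance values are all hypothetical, and only the topology is returned (branch lengths are not computed).

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster OTUs by UPGMA and return the topology as nested tuples.

    names: list of OTU labels (strings).
    distances: dict mapping frozenset({a, b}) to the pairwise distance.
    """
    sizes = {name: 1 for name in names}   # cluster -> number of OTUs it holds
    d = dict(distances)                   # working copy, extended as clusters merge
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances between four OTUs (illustrative values only).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 6.0,
}
tree = upgma(["A", "B", "C", "D"], toy)
```

Because UPGMA averages distances by cluster size, it implicitly assumes a roughly constant rate across lineages, which is consistent with the abstract's observation that it performs best when branch-length variation is small.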

7.
The best-known endocannabinoid ligands, anandamide and 2-AG, signal at least seven receptors and involve ten metabolic enzymes. Genes for the receptors and enzymes were examined for heterogeneities in tempo (relative rate of evolution, RRE) and mode (selection pressure, Ka/Ks) in six organisms with sequenced genomes. BLAST identified orthologs as reciprocal best hits, and nucleotide alignments were performed with ClustalX and MacClade. Two bioinformatics platforms, LiKaKs (a distance-based LWL85 model) and SNAP (a parsimony-based NG86 model), made pairwise comparisons of orthologs in murids (rat and mouse) and primates (human and macaque). Mean RRE of the 18 endocannabinoid genes was significantly greater in murids than primates, whereas mean Ka/Ks did not differ significantly. Next we used FUGE (tree-based maximum-likelihood model) to compute human lineage-specific Ka/Ks values for 18 genes, which ranged from 1.11 to 0.00, in rank order from highest to lowest: PTPN22, NAAA, TRPV1, TRPA1, NAPE-PLD, MAGL, PPARγ, FAAH1, COX2, FAAH2, ABHD4, CB2, GPR55, DAGLβ, PPARα, TRPV4, CB1, DAGLα; differences were significant (p < 0.0001). Rat and mouse presented different rank orders (e.g., GPR55 generated the greatest Ka/Ks ratio). The 18 genes were then tested for recent positive selection (within 10,000 yr) using an extended haplotype homozygosity analysis of SNP data from the HapMap database. Significant evidence (p < 0.05) of a positive "selective sweep" was exhibited by PTPN22, TRPV1, NAPE-PLD, and DAGLα. In conclusion, the endocannabinoid system is collectively under strong purifying selection, although some genes show evidence of adaptive evolution. Reviewing Editor: Dr. Bryan Fry

8.
New methods for estimating the numbers of synonymous and nonsynonymous substitutions per site were developed. The methods are unweighted pathway methods based on Kimura's two-parameter model. Computer simulations were conducted to evaluate the accuracies of the new methods, Nei and Gojobori's (NG) method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and Pamilo, Bianchi, and Li's (PBL) method. The following results were obtained: (1) The NG, MY, and LWL methods give overestimates of the number of synonymous substitutions and underestimates of the number of nonsynonymous substitutions. The major cause for the biased estimation is that these three methods underestimate the number of synonymous sites and overestimate the number of nonsynonymous sites. (2) The PBL method gives better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. (3) The new methods also give better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. In addition, estimates of the numbers of synonymous and nonsynonymous sites obtained by the new methods are reasonably accurate. (4) In some cases, the new methods and the PBL method give biased estimates of substitution numbers. However, from the number of nucleotide substitutions at the third position of codons, we can examine whether estimates obtained by the new methods are good or not, whereas we cannot make an examination of estimates obtained by the PBL method. (5) When there are strong transition/transversion and nucleotide-frequency biases like mitochondrial genes, all of the above methods give biased estimates of substitution numbers. In such cases, Kondo et al.'s method is recommended to be used for estimating the number of synonymous substitutions, although their method cannot estimate the number of nonsynonymous substitutions and is time-consuming. 
These results, particularly result (1), call for reexaminations of some genes. This is because evolutionary pictures of genes have often been discussed on the basis of results obtained by the NG, MY, and LWL methods, which are favorable for the neutral theory of molecular evolution.
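
The new methods above are described as being based on Kimura's two-parameter model. As background, here is a minimal sketch of the standard Kimura two-parameter distance between two aligned nucleotide sequences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. It is not the unweighted pathway method of the paper, which additionally classifies sites as synonymous or nonsynonymous; the function name `k2p_distance` is an assumption made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing anything other than A, C, G, or T (gaps, ambiguity
    codes) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

The correction grows faster than the raw proportion of differences, and it becomes undefined once the logarithm arguments reach zero, which is one way saturation at highly diverged sites shows up in practice.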

9.
10.
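The SRY gene shows different evolutionary patterns in Hominoidea and Old World monkeys
Wang Xiaoxia  Lü Xuemei  Zhang Yaping 《遗传学报》 (Acta Genetica Sinica) 2000,27(10):847-852
Complete SRY gene sequences of the douc langur and the stump-tailed macaque were obtained by PCR amplification and sequencing. Analysis together with the available sequences of other primate species confirmed the conservation of the HMG box. A phylogenetic tree was constructed, and the Ka/Ks ratios of the sequences flanking the HMG box were compared within and between two groups, the Old World monkeys and the Hominoidea. Interestingly, comparisons between hominoid species showed relatively high Ka/Ks ratios, whereas the ratios within the Old World monkeys, and between the Old World monkeys and the marmoset, were significantly lower than those in the Hominoidea, showing a very different pattern. In addition, for the HMG box sequence, the Ka/Ks ratio in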

11.
Estimation of evolutionary distances between nucleotide sequences
A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. By using matrix algebra theory, the theoretical foundation of Barry and Hartigan's (Stat. Sci. 2:191–210, 1987) and Lanave et al.'s (J. Mol. Evol. 20:86–93, 1984) methods was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (Mol. Biol. Evol. 1:269–285, 1984) method is superior to others.

12.
Comparison of numbers of synonymous and nonsynonymous substitutions is useful for understanding mechanisms of molecular evolution. In this paper, I examine the statistical properties of six methods of estimating numbers of synonymous and nonsynonymous substitutions. The six methods are Miyata and Yasunaga’s (MY) method; Nei and Gojobori’s (NG) method; Li, Wu and Luo’s (LWL) method; Pamilo, Bianchi and Li’s (PBL) method; and Ina’s (Ina) two methods. When the transition/transversion bias at the mutation level is strong, the numbers of synonymous and nonsynonymous substitutions are estimated more accurately by the PBL and Ina methods than by the NG, MY and LWL methods. When the nucleotide-frequency bias is strong and distantly related sequences are compared, all six methods underestimate the number of synonymous substitutions. The concept of synonymous and nonsynonymous categories is also useful for analysis of DNA polymorphism data.

13.
While adaptive immunity genes evolve rapidly under the influence of positive selection, innate immune system genes are known to evolve slowly due to strong purifying selection. Among the sensors of the innate immune system, Toll-like receptors (TLRs) are particularly important due to their ability to recognize and respond to pathogen-associated molecular patterns (PAMP), such as lipopolysaccharides, peptidoglycans, and nucleic acids from bacteria or viruses. In the present study, we examine the evolutionary process that has operated on the TLR7 family genes TLR7, TLR8, and TLR9. The results demonstrate that the average Ka/Ks (the ratio between nonsynonymous and synonymous substitution rates) of each TLR family gene is far lower than one regardless of the estimation method, supporting previous observations of strong purifying selection in this gene family. Interestingly, however, analysis of Ka/Ks ratios along the coding regions of TLR7 family genes by sliding-window analysis reveals a few narrow high peaks (Ka/Ks > 1). The most prominent peak corresponds to a specific region in the ectodomain, which exists only in the TLR7 family, suggesting that this unique structure of the TLR7 family might have been a target of positive selection in a variety of lineages. Furthermore, maximum likelihood model tests suggest that positive selection is the best explanation for a certain fraction of the amino acid substitutions in TLR9.

14.
Substitution rates are expected to be rather constant when a gene is compared between species. To analyze this feature, Ka/Ks ratios have been studied for the Alcohol dehydrogenase (Adh) and Alcohol dehydrogenase duplication (Adh-dup) genes in Drosophila species. Adh Ka/Ks values are lower in intrasubgenus comparisons involving species of the Sophophora group than when these species are compared to D. immigrans and S. lebanonensis, and this difference does not occur in the Adh-dup comparisons.

15.
16.
KaKs_Calculator is a software package that calculates nonsynonymous (Ka) and synonymous (Ks) substitution rates through model selection and model averaging. Since existing methods for this estimation adopt their own specific mutation (substitution) models that consider different evolutionary features, leading to diverse estimates, KaKs_Calculator implements a set of candidate models in a maximum likelihood framework and adopts the Akaike information criterion to measure fitness between models and data, aiming to include as many features as needed for accurately capturing evolutionary information in protein-coding sequences. In addition, several existing methods for calculating Ka and Ks are also incorporated into this software. KaKs_Calculator, including source code, compiled executables, and documentation, is freely available for academic use at http://evolution.genomics.org.cn/software.htm.
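
The model selection and model averaging step described above can be sketched generically with the Akaike information criterion, AIC = 2k - 2 ln L, and Akaike weights. This is a minimal sketch of the general technique, not KaKs_Calculator's own code: the function names, the model labels, and all numbers in the usage example are hypothetical, and the package may use a corrected variant of the criterion.

```python
import math


def akaike_weights(log_likelihoods, n_params):
    """Return (aic, weights) for a set of candidate models.

    log_likelihoods: dict model name -> maximized log-likelihood.
    n_params: dict model name -> number of free parameters.
    """
    aic = {m: 2 * n_params[m] - 2 * log_likelihoods[m] for m in log_likelihoods}
    best = min(aic.values())
    # Relative likelihood of each model given its AIC difference from the best.
    rel = {m: math.exp(-0.5 * (value - best)) for m, value in aic.items()}
    total = sum(rel.values())
    weights = {m: r / total for m, r in rel.items()}
    return aic, weights


def model_average(estimates, weights):
    """Weighted average of per-model estimates (e.g. Ka or Ks)."""
    return sum(weights[m] * estimates[m] for m in weights)


# Hypothetical fits of two candidate models to one sequence pair.
lnl = {"model_simple": -100.0, "model_rich": -99.0}
k = {"model_simple": 1, "model_rich": 2}
aic, w = akaike_weights(lnl, k)
ks_avg = model_average({"model_simple": 0.2, "model_rich": 0.4}, w)
```

In this toy case the richer model gains one log-likelihood unit at the cost of one extra parameter, so both models have the same AIC and receive equal weight; the averaged estimate then falls midway between the two.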

17.
Spatial range expansion during population colonization is characterized by demographic events that may have significant effects on the efficiency of natural selection. Population genetics suggests that genetic drift brought by small effective population size (Ne) may undermine the efficiency of selection, leading to a faster accumulation of nonsynonymous mutations. However, it is still unknown whether this effect might be balanced or even reversed by strong selective constraints. Here, we used wild boars and local domestic pigs from tropical (Vietnam) and subarctic (Siberia) regions as an animal model to evaluate the effects of functional constraints and genetic drift on shaping molecular evolution. The likelihood-ratio test revealed that the Siberian clade evolved significantly differently from the Vietnamese clades. Different datasets consistently showed that Siberian wild boars had lower Ka/Ks ratios than Vietnamese samples. The potential role of positive selection for branches with higher Ka/Ks was evaluated using branch-site model comparison. No signal of positive selection was found for the higher Ka/Ks in Vietnamese clades, suggesting the interclade difference was mainly due to the reduction in Ka/Ks for Siberian samples. This conclusion was further confirmed by the result from a larger sample size, among which wild boars from northern Asia (subarctic and nearby region) had lower Ka/Ks than those from southern Asia (temperate and tropical region). The lower Ka/Ks might be due to either stronger functional constraints, which prevent nonsynonymous mutations from accumulating in subarctic wild boars, or larger Ne in Siberian wild boars, which can boost the efficacy of purifying selection to remove functional mutations. The latter possibility was further ruled out by the Bayesian skyline plot analysis, which revealed that historical Ne of Siberian wild boars was smaller than that of Vietnamese wild boars. 
Altogether, these results suggest stronger functional constraints acting on mitogenomes of subarctic wild boars, which may provide new insights into their local adaptation of cold resistance.

18.
A simple method for estimating the transition/transversion ratio was developed. This method can be applied not only to two sequences but also to more than two sequences. The statistical properties of the method and some other methods were examined by numerical computation and computer simulation. The results obtained showed that, in terms of bias and variance, the new method gives a better estimate of the transition/transversion ratio than do the other examined methods. The new method was applied to human and chimpanzee mitochondrial control region sequences. Received: 22 September 1997 / Accepted: 1 November 1997
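
For orientation, the quantity being estimated can be illustrated by simply counting observed transitions and transversions between two aligned sequences. This naive count is not the estimator developed in the paper (it makes no correction for multiple substitutions at a site and handles only two sequences); the function name `count_ts_tv` and the example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count observed transitions and transversions between aligned sequences.

    Returns (transitions, transversions); columns with characters other than
    A, C, G, or T are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions


# Hypothetical four-site alignment: two transitions, one transversion.
ts, tv = count_ts_tv("AGCT", "GACA")
observed_ratio = ts / tv if tv else float("inf")
```

The raw ratio tends to understate the true bias as sequences diverge, because repeated transitions at the same site go uncounted; that is the kind of bias a dedicated estimator is meant to reduce.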

19.
Whether the Amborella/Amborella-Nymphaeales or the grass lineage diverged first within the angiosperms has recently been debated. Central to this issue are the artifacts that might result from sampling only grasses within the monocots. We therefore sequenced the entire chloroplast genome (cpDNA) of Phalaenopsis aphrodite, the Taiwan moth orchid. The cpDNA is a circular molecule of 148,964 bp with a comparatively short single-copy region (11,543 bp) due to the unusual loss and truncation/scattered deletion of certain ndh subunits. An open reading frame, orf91, located in the complementary strand of rrn23 was reported for the first time. A comparison of nucleotide substitutions between P. aphrodite and the grasses indicates that only the plastid expression genes have a strong positive correlation between nonsynonymous (Ka) and synonymous (Ks) substitutions per site, providing evidence for a generation time effect, mainly across these genes. Among the intron-containing protein-coding genes of the sampled monocots, the Ks values of the genes are significantly correlated with transitional substitutions of their introns. We compiled a concatenated 61 protein-coding gene alignment for the available 20 cpDNAs of vascular plants and analyzed the data set using Bayesian inference, maximum parsimony, and neighbor-joining (NJ) methods. The analyses yielded robust support for the Amborella/Amborella-Nymphaeales-basal hypothesis and for the orchid and grasses together being a monophyletic group nested within the remaining angiosperms. However, the NJ analysis using Ka, the first two codon positions, or amino acid sequences, respectively, supports the monocots-basal hypothesis. We demonstrated that these conflicting angiosperm phylogenies are most probably linked to the transitional sites at all codon positions, especially at the third one where the strong base-composition bias and saturation effect take place.

20.
Finding genes that are under positive selection is a difficult task, especially in non-model organisms. Here, we have analyzed expressed sequence tag (EST) data from 4 species (Pinus pinaster, Pinus taeda, Picea glauca, and Pseudotsuga menziesii) to investigate selection patterns during their evolution and to identify genes likely to be under positive selection. To confirm selection, population samples of these genes have been sequenced in Pinus sylvestris, a species that was not included in the EST data set. The estimates of branch-specific Ka/Ks (nonsynonymous/synonymous substitution rates) across all genes in the EST data set were similar to or smaller than estimates from other higher plant species. There was no evidence for the traditional indication of positive selection, Ka/Ks above 1. However, several lines of evidence based on polymorphism patterns suggest that genes with high Ka/Ks (0.20-0.52) in the EST data set are in fact more affected by positive selection in P. sylvestris than genes with low Ka/Ks (0.01-0.04). The high Ka/Ks genes have a lower level of polymorphism and more negative Tajima's D than the low Ka/Ks genes. Further, in the high Ka/Ks group, the Hudson-Kreitman-Aguade test is significant. This suggests that the EST data set is a good starting point for finding genes under positive selection in conifers and that even moderate Ka/Ks values could be indicative of selection. A group of 5 genes with high Ka/Ks collectively show evidence for positive selection within P. sylvestris.
