Similar articles
 Found 19 similar articles (search time: 187 ms)
1.
Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to comparing methods for obtaining reliable estimates, since the amount of sequence variation within targeted datasets is always unpredictable. To our knowledge, little information is available in the literature on the evaluation of these different methods. In this study, we compared six widely used methods and provide evaluation results based on simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be drawn from the results of one method alone.

2.
How Do Variable Substitution Rates Influence Ka and Ks Calculations?   Cited by: 2 (self-citations: 1; citations by others: 1)
The ratio of nonsynonymous substitution rate (Ka) to synonymous substitution rate (Ks) is widely used as an indicator of selective pressure at the sequence level among different species, and diverse mutation models have been incorporated into several computing methods. We have previously developed a new γ-MYN method by capturing a key dynamic evolution trait of DNA nucleotide sequences, in consideration of varying mutation rates across sites. We now report a further improvement of NG, LWL, MLWL, LPB, MLPB, and ...

3.
Soil respiration (SR) is commonly modeled by a Q10 (an indicator of temperature sensitivity) function in ecosystem models. Q10 is usually treated as a constant of 2 in these models, although the Q10 value of SR often decreases with increasing temperature. It remains unclear whether a general temperature-dependent Q10 model of SR exists at the biome and global scales. In this paper, we have compiled the long-term Q10 data of 38 SR studies ranging from the Boreal and Temperate to the Tropical/Subtropical biomes on four continents. Our analysis indicated that general temperature-dependent biome Q10 models of SR exist, especially in the Boreal and Temperate biomes. A single-exponential model was better than a simple linear model in fitting the average Q10 values at the biome scale. Average soil temperature is a better predictor of Q10 value than average air temperature in these models, especially in the Boreal biome. Soil temperature alone could explain about 50% of the Q10 variation in both the Boreal and Temperate biome single-exponential Q10 models. The Q10 value of SR decreased with increasing soil temperature, but at quite different rates among the three biome Q10 models. The k values (Q10 decay rate constants) were 0.09, 0.07, and 0.02/℃ in the Boreal, Temperate, and Tropical/Subtropical biomes, respectively, suggesting that the Q10 value is most sensitive to soil temperature change in the Boreal biome, less so in the Temperate biome, and least sensitive in the Tropical/Subtropical biome. This also indirectly confirms that acclimation of SR probably occurs in many soil warming experiments. The k value in the "global" single-exponential Q10 model, which combined the Boreal and Temperate biome data sets, was 0.08/℃. However, the global general temperature-dependent Q10 model developed using the data sets of all three biomes is not adequate for predicting Q10 values of SR globally. The existence of general temperature-dependent Q10 models of SR in the Boreal and Temperate biomes has important implications for modeling SR, especially in the Boreal biome. More detailed model runs are needed to evaluate precisely the impact of using a fixed Q10 versus a temperature-dependent Q10 on SR estimates in ecosystem models (e.g., TEM, Biome-BGC, and PnET).
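
The decay constants quoted in this abstract can be compared with a short calculation. The sketch below assumes the single-exponential model has the form Q10(T) = a · exp(−k · T); the abstract gives only the k values, so the functional form, the intercept `a`, and all function names are illustrative assumptions rather than the authors' fitted model.

```python
import math

# Q10 decay rate constants k (per degree C) as quoted in the abstract.
K_BY_BIOME = {
    "Boreal": 0.09,
    "Temperate": 0.07,
    "Tropical/Subtropical": 0.02,
}


def q10(a, k, soil_temp):
    """Q10 at a given soil temperature under the assumed form a * exp(-k * T).

    `a` is a hypothetical intercept; the abstract does not report it.
    """
    return a * math.exp(-k * soil_temp)


def q10_ratio(k, delta_t):
    """Ratio Q10(T + delta_t) / Q10(T); independent of `a` and of T."""
    return math.exp(-k * delta_t)


if __name__ == "__main__":
    for biome, k in K_BY_BIOME.items():
        # Fraction of Q10 remaining after 10 degrees C of soil warming.
        print(f"{biome}: {q10_ratio(k, 10.0):.3f}")
```

Because the ratio does not depend on the intercept, the relative sensitivity of the three biomes can be compared from the published k values alone, which is why the sketch reports ratios rather than absolute Q10 values.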

4.
As one of the important vegetation parameters, vegetation fractional coverage (VFC) is among the more difficult plant community parameters to measure accurately. The temperate typical steppe in the north of China was chosen for investigation in the present study, and a digital camera was used to measure herb community coverage in the field, adopting methods of ocular estimation, gridding measurement, visual interpretation, supervised classification, and information extraction of color spatial transformation to calculate the VFC of images captured by the digital camera. In addition, the VFC values calculated by the various methods were analyzed and compared, enabling us to propose an effective method for measuring VFC using a digital camera. The results of the present study indicate that: (i) for visual estimation and supervised classification, two common and effective methods of measuring VFC with a digital camera, not only does the error of the estimated values vary considerably, but the degree of automatization is very low and depends, to a great extent, on the operator; (ii) although the method of visual interpretation may assure the precision of the calculated VFC and enable the precision of results obtained using other methods to be determined, as far as large quantities of data are concerned, this method has the disadvantage of wasting time and energy, and its applications are limited; (iii) the precision and stability of VFC calculated using the grid and node method are superior to those of visual estimation and supervised classification and inferior to those of visual interpretation, but, as with visual interpretation and supervised classification, gridding measurements are difficult to apply in practice because they are not time efficient; and (iv) in terms of the precision of calculation of the VFC, an information-extracting model based on an intensity, hue, saturation (IHS) color space multi-component series segmentation strategy is superior to methods of ocular estimation, gridding measurement, and supervised classification. In terms of practical efficiency, the information-extracting model is superior to visual interpretation, supervised classification, and gridding measurement. It has been proven that estimating the VFC of the north temperate typical steppe using this model is feasible. This is very fundamental research work in grassland ecology.

5.
Carter RL  Chan AW 《Journal of Genetics and Genomics》2012,39(6):253-259
Pluripotent cellular models have shown great promise in the study of a number of neurological disorders. Several advantages of using a stem cell model include the potential for cells to derive disease-relevant neuronal cell types, providing a system for researchers to monitor disease progression during neurogenesis, along with serving as a platform for drug discovery. A number of stem cell derived models have been employed to establish in vitro research models of Huntington’s disease that can be used to investigate cellular pathology and screen for drug and cell-based therapies. Although some progress has been made, there are a number of challenges and limitations that must be overcome before the true potential of this research strategy is achieved. In this article we review current stem cell models that have been reported, as well as discuss the issues that impair these studies. We also highlight the prospective application of Huntington’s disease stem cell models in the development of novel therapeutic strategies and advancement of personalized medicine.

6.
In the final essay of this series the gaps between biology and engineering are examined, and methods are suggested for crossing them. Creativity is seen as essential, and TRIZ (the Russian Theory of Inventive Problem Solving) is recommended as the best set of methods both for stimulating creativity and for solving technical problems. When the catalogue of Inventive Principles of TRIZ is used to bring biology and technology to the same level of detail, the comparison shows that the similarity is only about 12%. The differences largely reside in the reliance on energy as a controlling parameter in conventional technology and the replacement of energy by information in biological systems. Although we might be moving slowly in this direction, a numerically based comparison such as this should provide more impetus.

7.
We present an integrated stand-alone software package named KaKs_Calculator 2.0 as an updated version. It incorporates 17 methods for the calculation of nonsynonymous and synonymous substitution rates; among them, we added our modified versions of several widely used methods as the gamma series, including γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, γ-YN and γ-MYN, which have been demonstrated to perform better under certain conditions than their original forms and are not implemented in the previous version. The package is readily used for the identification of positively selected sites based on a sliding window across the sequences of interest in the 5' to 3' direction of protein-coding sequences, and has improved the overall performance of sequence analysis for evolution studies. A toolbox, including C++ and Java source code and executable files on both Windows and Linux platforms together with a user instruction, is downloadable from the website for academic purposes at https://sourceforge.net/projects/kakscalculator2/.

8.
Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thereby further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family, since fuzzy sets can handle uncertainties better than classical methods.

9.
10.
The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier to predict potential cancer genes that we have developed by integrating multiple lines of biological evidence, including protein-protein interaction network properties, and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet and decision tree, were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on the individual biological evidence, and that the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
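
The evaluation described in this abstract (area under the ROC curve, estimated with 5-fold cross-validation) can be illustrated without any particular machine learning library. The sketch below is a generic, dependency-free version: the AUC is computed as the fraction of positive/negative pairs ranked correctly, and the fold assignment is a simple interleaved split. Function names, the fold scheme, and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k interleaved test folds (hypothetical scheme)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


if __name__ == "__main__":
    # Toy data: 1 = cancer gene, 0 = non-cancer gene; scores are made up.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))
    print(k_fold_indices(10, k=5))
```

In a real evaluation each fold would serve once as the held-out test set while a model is trained on the remaining folds, and the AUC would be computed on the held-out predictions.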

11.
In 2005, Wyckoff and coworkers described a surprisingly strong correlation between Ka/Ks and Ks in several data sets using the LPB93 algorithm. This finding indicated the possibility of a paradigm shift in the way selection strength can be measured using the Ka/Ks ratio. We carried out a calculation of Ka and Ks using six different algorithms on three cross-species orthologous data sets and found a highly variable correlation among the algorithms and lineages. Algorithms based on the GY-HKY substitution model exhibit a weaker positive correlation or a stronger negative correlation than those based on the K2P and JC69 substitution models. Even if one algorithm shows a positive correlation between Ka/Ks and Ks in a warm-blooded lineage, it may show no correlation in a cold-blooded lineage. This algorithm-related and evolutionary lineage-related correlation indicates the need for great caution in drawing conclusions when using only one Ka and Ks algorithm in a genomewide analysis of selection strength. Our results indicated that currently used algorithms for Ka and Ks calculations are flawed and need improvement.
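
To make the role of the substitution model concrete, the sketch below applies the JC69 multiple-hit correction, d = −(3/4) · ln(1 − 4p/3), to the observed proportions of synonymous and nonsynonymous differences, in the style of simple counting estimators. It assumes the numbers of synonymous/nonsynonymous sites and differences have already been counted; that counting step, the function names, and the example numbers are assumptions for illustration and do not reproduce any of the six algorithms compared in the abstract.

```python
import math


def jc69_distance(p):
    """JC69-corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted sites and differences.

    The ratio is NaN when Ks is zero, since Ka/Ks is then undefined.
    """
    ks = jc69_distance(syn_diffs / syn_sites)
    ka = jc69_distance(nonsyn_diffs / nonsyn_sites)
    ratio = ka / ks if ks > 0 else float("nan")
    return ka, ks, ratio


if __name__ == "__main__":
    # Hypothetical counts for one gene pair.
    ka, ks, ratio = ka_ks(syn_diffs=10, syn_sites=100,
                          nonsyn_diffs=5, nonsyn_sites=200)
    print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.3f}")
```

Swapping `jc69_distance` for a correction based on a different model (for example one that separates transitions from transversions) changes Ka and Ks for the same counts, which is one way estimates can differ between algorithms.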

12.
13.
Hurst LD  Williams EJ 《Gene》2000,261(1):107-114
Many attempts to test selectionist and neutralist models employ estimates of synonymous (Ks) and non-synonymous (Ka) substitution rates of orthologous genes. For example, a stronger Ka-Ks correlation than expected under neutrality has been argued to indicate a role for selection, and the absence of a Ks-GC4 correlation has been argued to be inconsistent with neutral models for isochore evolution. However, both of these results, we have shown previously, are sensitive to the method by which Ka and Ks are estimated. Using a maximum likelihood (ML) estimator (GY94) we found a positive correlation between Ks and GC4 and only a weak correlation between Ka and Ks, lower than expected under neutrality. This ML method is computationally slow. Recently, a new ad hoc approximation of this ML method has been provided (YN00). This is effectively an extension of Li's protocol that also allows for codon usage bias. This method is computationally near-instantaneous and therefore potentially of great utility for analysis of large datasets. Here we ask whether this method might have such applicability. To this end we ask whether it too recovers the two unusual results. We report that when the ML and earlier ad hoc methods disagree, YN00 recovers the results described by the ML methods, i.e. a positive correlation between GC4 and Ks and only a weak correlation between Ks and Ka. If the ML method can be trusted, then YN00 can also be considered an adequately reliable method for analysis of large datasets. Assuming this to be so, we also analyze the patterns further. We show, for example, that the positive correlation between GC4 and Ks is probably in part a mutational bias, there being more methyl-induced CpG→TpG mutations in GC-rich regions. As regards the evolution of isochores, it seems inappropriate to use the claimed lack of a correlation between GC and Ks as definitive evidence either against or for any model. If the positive correlation is real then, we argue, this is hard to reconcile with the biased gene conversion model for isochore formation, as this predicts a negative correlation.

14.
The selective forces acting on a protein-coding gene are commonly inferred using evolutionary codon models by contrasting the rate of nonsynonymous substitutions to the rate of synonymous substitutions. These models usually assume that the synonymous substitution rate, Ks, is homogeneous across all sites, which is justified if synonymous sites are free from selection. However, a growing body of evidence indicates that the DNA and RNA levels of protein-coding genes are subject to varying degrees of selective constraints due to various biological functions encoded at these levels. In this paper, we develop evolutionary models that account for these layers of selection by allowing for both among-site variability of substitution rates at the DNA/RNA level (which leads to Ks variability among protein-coding sites) and among-site variability of substitution rates at the protein level (Ka variability). These models are constructed so that positive selection is either allowed or not. This enables statistical testing of positive selection when variability in the DNA/RNA substitution rate is accounted for. Using this methodology, we show that variability of the baseline DNA/RNA substitution rate is a widespread phenomenon in coding sequence data of mammalian genomes, most likely reflecting varying degrees of selection at the DNA and RNA levels. Additionally, we use simulations to examine the impact that accounting for the variability of the baseline DNA/RNA substitution rate has on the inference of positive selection. Our results show that ignoring this variability results in a high rate of erroneous positive-selection inference. Our newly developed model, which accounts for this variability, does not suffer from this problem and hence provides a likelihood framework for the inference of positive selection on a background of variability in the baseline DNA/RNA substitution rate.

15.

Background  

Over the past two decades, several approximate methods that adopt different mutation models have been used to estimate nonsynonymous and synonymous substitution rates (Ka and Ks) from protein-coding sequences across species or even different evolutionary lineages. Among them, the MYN method (a Modified version of the Yang-Nielsen method) considers three major dynamic features of evolving DNA sequences (bias in transition/transversion rate, nucleotide frequency, and unequal transitional substitution) but leaves out another important feature: unequal substitution rates among different sites or nucleotide positions.

16.
While adaptive immunity genes evolve rapidly under the influence of positive selection, innate immune system genes are known to evolve slowly due to strong purifying selection. Among the sensors of the innate immune system, Toll-like receptors (TLRs) are particularly important due to their ability to recognize and respond to pathogen-associated molecular patterns (PAMPs), such as lipopolysaccharides, peptidoglycans, and nucleic acids from bacteria or viruses. In the present study, we examine the evolutionary process that has operated on the TLR7 family genes TLR7, TLR8, and TLR9. The results demonstrate that the average Ka/Ks (the ratio between nonsynonymous and synonymous substitution rates) of each TLR family gene is far lower than one regardless of the estimation method, supporting previous observations of strong purifying selection in this gene family. Interestingly, however, analysis of Ka/Ks ratios along the coding regions of TLR7 family genes by sliding-window analysis reveals a few narrow high peaks (Ka/Ks > 1). The most prominent peak corresponds to a specific region in the ectodomain, which exists only in the TLR7 family, suggesting that this unique structure of the TLR7 family might have been a target of positive selection in a variety of lineages. Furthermore, maximum likelihood model tests suggest that positive selection is the best explanation for a certain fraction of the amino acid substitutions in TLR9.
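
The sliding-window analysis mentioned in this abstract can be outlined as two steps: cut the coding alignment into overlapping codon windows, then flag windows whose Ka/Ks exceeds 1. The sketch below covers only the windowing and the peak flagging; the per-window Ka/Ks values are supplied by hand, and the window size, step, and function names are hypothetical because the abstract does not state them.

```python
def codon_windows(n_codons, window, step):
    """Yield (start, end) codon index pairs, end exclusive, along a coding sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(n_codons - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, min(start + window, n_codons)


def peak_windows(ratios, threshold=1.0):
    """Return indices of windows whose Ka/Ks exceeds the threshold."""
    return [i for i, r in enumerate(ratios) if r > threshold]


if __name__ == "__main__":
    windows = list(codon_windows(n_codons=10, window=4, step=3))
    print(windows)
    # Made-up per-window Ka/Ks values, one per window above.
    ratios = [0.2, 1.3, 0.9]
    for i in peak_windows(ratios):
        print("candidate peak at codons", windows[i])
```

A gene-wide average well below one is compatible with a few windows above one, which is the pattern the abstract reports; windows above the threshold are candidates only and would still need a formal test.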

17.
Three frequently used methods for estimating the synonymous and nonsynonymous substitution rates (Ks and Ka) were evaluated and compared for their accuracies; these methods are denoted by LWL85, LPB93, and GY94, respectively. For this purpose, we used a codon-evolution model to obtain the expected Ka and Ks values for the above three methods and compared the values with those obtained by the three methods. We also proposed some modifications of LWL85 and LPB93 to increase their accuracies. Our computer simulations under the codon-evolution model showed that for sequences ≤ 300 codons, the performance of GY94 may not be reliable. For longer sequences, GY94 is more accurate for estimating the Ka/Ks ratio than the modified LPB93 and LWL85 in the majority of the cases studied. This is particularly so when k ≥ 3, where k is the transition/transversion (mutation) rate ratio. However, when k is approximately 2 and when the sequence divergence is relatively large, the modified LWL85 performed better than GY94 and the modified LPB93. The inferiority of LPB93 to LWL85 is surprising because LPB93 was intended to improve LWL85. Also, it has been thought that the codon-based method of GY94 is better than the heuristic method of LWL85, but our simulation results showed that in many cases, the opposite was true, even though our simulation was based on the codon-evolution model.

18.
Du J  Tian Z  Sui Y  Zhao M  Song Q  Cannon SB  Cregan P  Ma J 《The Plant cell》2012,24(1):21-32
The evolutionary forces that govern the divergence and retention of duplicated genes in polyploids are poorly understood. In this study, we first investigated the rates of nonsynonymous substitution (Ka) and the rates of synonymous substitution (Ks) for a nearly complete set of genes in the paleopolyploid soybean (Glycine max) by comparing the orthologs between soybean and its progenitor species Glycine soja and then compared the patterns of gene divergence and expression between pericentromeric regions and chromosomal arms in different gene categories. Our results reveal strong associations between duplication status and Ka and gene expression levels and overall low Ks and low levels of gene expression in pericentromeric regions. It is theorized that deleterious mutations can easily accumulate in recombination-suppressed regions, because of Hill-Robertson effects. Intriguingly, the genes in pericentromeric regions (the cold spots for meiotic recombination in soybean) showed significantly lower Ka and higher levels of expression than their homoeologs in chromosomal arms. This asymmetric evolution of two members of individual whole genome duplication (WGD)-derived gene pairs, echoing the biased accumulation of singletons in pericentromeric regions, suggests that distinct genomic features between the two distinct chromatin types are important determinants shaping the patterns of divergence and retention of WGD-derived genes.

19.

Background  

Approximate methods for estimating nonsynonymous and synonymous substitution rates (Ka and Ks) among protein-coding sequences have adopted different mutation (substitution) models. In the past two decades, several methods have been proposed, but they have not considered unequal transitional substitutions (between the two purines, A and G, or the two pyrimidines, T and C) that become apparent when the sequence data to be compared are vast and significantly diverged.
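
The distinction drawn here between the two kinds of transition can be shown by counting them separately in a pairwise alignment. The sketch below tallies purine transitions (A↔G), pyrimidine transitions (C↔T), and transversions between two aligned sequences of equal length, skipping gaps and ambiguous bases. It is a counting illustration only, with hypothetical function and key names, and is not the estimation method proposed in the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq1, seq2):
    """Count purine transitions, pyrimidine transitions, and transversions.

    Positions with a gap or a non-ACGT character in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"purine_ts": 0, "pyrimidine_ts": 0, "tv": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair == PURINES:
            counts["purine_ts"] += 1
        elif pair == PYRIMIDINES:
            counts["pyrimidine_ts"] += 1
        else:
            counts["tv"] += 1
    return counts


if __name__ == "__main__":
    print(count_substitutions("AAGCT", "GACTA"))
```

If the two transition counts differ markedly across a large alignment, a model with a single transition rate pools two processes that behave differently, which is the situation the abstract describes.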
