Similar articles
 Found 20 similar articles; search took 15 ms
1.
We have investigated the effects of different among-site rate variation models on the estimation of substitution model parameters, branch lengths, topology, and bootstrap proportions under minimum evolution (ME) and maximum likelihood (ML). Specifically, we examined equal rates, invariable sites, gamma-distributed rates, and site-specific rates (SSR) models, using mitochondrial DNA sequence data from three protein-coding genes and one tRNA gene from species of the New Zealand cicada genus Maoricicada. Estimates of topology were relatively insensitive to the substitution model used; however, estimates of bootstrap support, branch lengths, and R-matrices (underlying relative substitution rate matrix) were strongly influenced by the assumptions of the substitution model. We identified one situation where ME and ML tree building became inaccurate when implemented with an inappropriate among-site rate variation model. Although SSR models often fit the data better than invariable sites and gamma rates models do, SSR models have some serious weaknesses. First, SSR rate parameters are not comparable across data sets, unlike the proportion of invariable sites or the alpha shape parameter of the gamma distribution. Second, the extreme among-site rate variation within codon positions is problematic for SSR models, which explicitly assume rate homogeneity within each rate class. Third, the SSR models appear to give severe underestimates of R-matrices and branch lengths relative to invariable sites and gamma rates models in this example. We recommend performing phylogenetic analyses under a range of substitution models to test the effects of model assumptions not only on estimates of topology but also on estimates of branch length and nodal support.

2.
3.
Selecting the best-fit model of nucleotide substitution   Total citations: 2 (self-citations: 0, citations by others: 2)
Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.
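
The selection criteria compared in this abstract can be illustrated with a minimal sketch. It assumes that log-likelihoods have already been obtained for each candidate model on a fixed tree; the log-likelihood values, the alignment length `n_sites`, and the function names are hypothetical and are not taken from the study. Parameter counts cover only the substitution model (branch lengths are shared across models here and are left out).

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_lik


def lrt_statistic(log_lik_null, log_lik_alt):
    """Likelihood ratio statistic for nested models: 2 (ln L1 - ln L0)."""
    return 2 * (log_lik_alt - log_lik_null)


# Hypothetical (log-likelihood, free substitution parameters) per model.
models = {
    "JC69": (-2150.0, 0),
    "HKY85": (-2100.0, 4),
    "GTR": (-2098.5, 8),
}
n_sites = 1000  # assumed alignment length

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_sites))

# JC69 is nested in HKY85 (4 extra parameters); 9.488 is the 5% critical
# value of the chi-square distribution with 4 degrees of freedom.
delta = lrt_statistic(models["JC69"][0], models["HKY85"][0])
reject_jc69 = delta > 9.488

print(best_aic, best_bic, delta, reject_jc69)
```

With these illustrative numbers all three criteria favour HKY85 over JC69, and the extra parameters of GTR are not justified by its small gain in likelihood.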

4.
MOTIVATION: It is well known that neighbouring nucleotides in DNA sequences do not mutate independently of each other. In this paper, we introduce a context-dependent substitution model and derive an algorithm to calculate the likelihood of sequences evolving under this model. We use this algorithm to estimate neighbour-dependent substitution rates, as well as rates for dinucleotide substitutions, using a Bayesian sampling procedure. The model is irreversible, giving an arrow to time, and allowing the position of the root between a pair of sequences to be inferred without using out-groups. RESULTS: We applied the model to aligned human-mouse non-coding data. Clear neighbour dependencies were observed, including 17-18-fold increased CpG to TpG/CpA rates compared with other substitutions. Root inference positioned the root halfway between the mouse and human tips, suggesting an approximately clock-like behaviour of the irreversible part of the substitution process.

5.
C M Lebreton, P M Visscher. Genetics 1998, 148(1): 525-535
Several nonparametric bootstrap methods are tested to obtain better confidence intervals for the quantitative trait loci (QTL) positions, i.e., with minimal width and unbiased coverage probability. Two selective resampling schemes are proposed as a means of conditioning the bootstrap on the number of genetic factors in our model inferred from the original data. The selection is based on criteria related to the estimated number of genetic factors, and only the retained bootstrapped samples will contribute a value to the empirically estimated distribution of the QTL position estimate. These schemes are compared with a nonselective scheme across a range of simple configurations of one QTL on a one-chromosome genome. In particular, the effect of the chromosome length and the relative position of the QTL are examined for a given experimental power, which determines the confidence interval size. With the test protocol used, it appears that the selective resampling schemes are either unbiased or least biased when the QTL is situated near the middle of the chromosome. When the QTL is closer to one end, the likelihood curve of its position along the chromosome becomes truncated, and the nonselective scheme then performs better inasmuch as the percentage of estimated confidence intervals that actually contain the real QTL's position is closer to expectation. The nonselective method, however, produces larger confidence intervals. Hence, we advocate use of the selective methods, regardless of the QTL position along the chromosome (to reduce confidence interval sizes), but we leave the problem open as to how the method should be altered to take into account the bias of the original estimate of the QTL's position.
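
The difference between nonselective and selective resampling can be sketched generically. The example below is a percentile bootstrap for an arbitrary statistic, not a QTL mapping procedure: the data, the `keep` predicate, and the function name `bootstrap_ci` are all hypothetical stand-ins, and the selection rule shown is only a placeholder for the criteria on the number of genetic factors described in the abstract.

```python
import random


def bootstrap_ci(sample, statistic, keep=None, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic`.

    If `keep` is given, only resamples for which keep(resample) is true
    contribute to the empirical distribution (selective resampling).
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        if keep is None or keep(resample):
            estimates.append(statistic(resample))
    if not estimates:
        raise ValueError("no bootstrap sample passed the selection criterion")
    estimates.sort()
    m = len(estimates)
    lower = estimates[int((1 - level) / 2 * m)]
    upper = estimates[min(m - 1, int((1 + level) / 2 * m))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical measurements standing in for position estimates.
data = [41.2, 38.5, 44.1, 39.9, 42.7, 40.3, 43.8, 37.6, 41.9, 40.8]

# Nonselective scheme: every resample contributes.
low, high = bootstrap_ci(data, mean)

# Selective scheme with a placeholder criterion (assumption for illustration).
sel_low, sel_high = bootstrap_ci(data, mean, keep=lambda r: max(r) - min(r) > 3.0)

print(low, high, sel_low, sel_high)
```

Passing the selection rule as a predicate keeps the resampling loop identical for both schemes, so any difference in interval width comes only from which resamples are retained.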

6.
When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models. Correspondence to: M. Nei
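
For orientation, the kind of estimator discussed here can be sketched with the standard Jukes-Cantor distance and its gamma-rates counterpart. These are the conventional formulas, not the bias-corrected ones proposed in the paper, and the sequences, the shape value `alpha`, and the function names are assumptions made for illustration.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates among sites:
    d = (3/4) * alpha * ((1 - 4p/3)^(-1/alpha) - 1)."""
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


# Hypothetical aligned sequences differing at 2 of 20 sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACGTCCGT"

p = p_distance(seq_a, seq_b)
d_jc = jc69_distance(p)
d_gamma = jc69_gamma_distance(p, alpha=0.5)  # alpha = 0.5 is an assumed value

print(p, d_jc, d_gamma)
```

Both formulas take the logarithm or a power of a sample proportion, which is one way to see why short sequences can give systematically distorted distances, and why the distortion grows when rate variation among sites is modelled.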

7.
A codon-based model of nucleotide substitution for protein-coding DNA sequences   Total citations: 11 (self-citations: 23, citations by others: 11)
A codon-based model for the evolution of protein-coding DNA sequences is presented for use in phylogenetic estimation. A Markov process is used to describe substitutions between codons. Transition/transversion rate bias and codon usage bias are allowed in the model, and selective restraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons. Analyses of two data sets suggest that the new codon-based model can provide a better fit to data than can nucleotide-based models and can produce more reliable estimates of certain biologically important measures such as the transition/transversion rate ratio and the synonymous/nonsynonymous substitution rate ratio.

8.
The effect of nucleotide substitution on DNA denaturation profiles.   Total citations: 1 (self-citations: 1, citations by others: 0)
Melting profiles were obtained for DNA restriction fragments of approx. 1150 bp that differed from each other by deletions of one, five or six base pairs. In all cases the deletions caused a shift of one melting peak without affecting the positions of the other three peaks. The effect amounted to 0.28 ± 0.03 °C upon the deletion of one GC pair. The melting of DNA fragments was also studied by electrophoresis in denaturing gradient gels. The deletion of one GC pair was shown to cause an appreciable shift of the electrophoretic denaturation profile.

9.

Background  

Comparative genomics aims to detect signals of evolutionary conservation as an indicator of functional constraint. Surprisingly, results of the ENCODE project revealed that about half of the experimentally verified functional elements found in non-coding DNA were classified as unconstrained by computational predictions. Following this observation, it has been hypothesized that this may be partly explained by biased estimates of the neutral evolutionary rates used by existing sequence conservation metrics. All methods we are aware of rely on a comparison with the neutral rate, and conservation is estimated by measuring the deviation of a particular genomic region from this rate. Consequently, it is a reasonable assumption that inaccurate neutral rate estimates may lead to biased conservation and constraint estimates.

10.
Bradley Efron. Biometrika 1981, 68(3): 589-599

11.
Estimating the pattern of nucleotide substitution   Total citations: 43 (self-citations: 0, citations by others: 43)
Knowledge of the pattern of nucleotide substitution is important both to our understanding of molecular sequence evolution and to reliable estimation of phylogenetic relationships. The method of parsimony analysis, which has been used to estimate substitution patterns in real sequences, has serious drawbacks and leads to results difficult to interpret. In this paper a model-based maximum likelihood approach is proposed for estimating substitution patterns in real sequences. Nucleotide substitution is assumed to follow a homogeneous Markov process, and the general reversible process model (REV) and the unrestricted model without the reversibility assumption are used. These models are also applied to examine the adequacy of the model of Hasegawa et al. (J. Mol. Evol. 1985;22:160–174) (HKY85). Two data sets are analyzed. For the ψη-globin pseudogenes of six primate species, the REV model fits the data much better than HKY85, while, for a segment of mtDNA sequences from nine primates, REV cannot provide a significantly better fit than HKY85 when rate variation over sites is taken into account in the models. It is concluded that the use of the REV model in phylogenetic analysis can be recommended, especially for large data sets or for sequences with extreme substitution patterns, while HKY85 may be expected to provide a good approximation. The use of the unrestricted model does not appear to be worthwhile.
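
To make the models named in this abstract concrete, the following minimal sketch builds an HKY85 instantaneous rate matrix and checks the two properties the abstract relies on: a homogeneous Markov process (rows sum to zero) and time reversibility (detailed balance). The value of `kappa`, the base frequencies, and the function name are assumptions chosen for illustration.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(kappa, freqs):
    """Return the HKY85 rate matrix as a nested dict q[x][y].

    Off-diagonal rate x -> y is freqs[y], multiplied by kappa for
    transitions. The matrix is scaled so that the expected number of
    substitutions per unit time is 1.
    """
    q = {x: {} for x in BASES}
    for x in BASES:
        for y in BASES:
            if x != y:
                q[x][y] = freqs[y] * (kappa if (x, y) in TRANSITIONS else 1.0)
        # Diagonal entry makes each row sum to zero.
        q[x][x] = -sum(q[x].values())
    mean_rate = -sum(freqs[x] * q[x][x] for x in BASES)
    for x in BASES:
        for y in BASES:
            q[x][y] /= mean_rate
    return q


# Assumed base frequencies and transition/transversion rate ratio.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = hky85_rate_matrix(4.0, freqs)

for x in BASES:
    print(x, [round(q[x][y], 4) for y in BASES])
```

HKY85 uses a single parameter to distinguish transitions from transversions. REV replaces that one parameter with a separate exchangeability for each of the six base pairs while keeping detailed balance, and the unrestricted model drops the detailed-balance constraint altogether.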

12.
At present there is tremendous interest in characterizing the magnitude and distribution of linkage disequilibrium (LD) throughout the human genome, which will provide the necessary foundation for genome-wide LD analyses and facilitate detailed evolutionary studies. To this end, a human high-density single-nucleotide polymorphism (SNP) marker map has been constructed. Many of the SNPs on this map, however, were identified by sampling a small number of chromosomes from a single population, and inferences drawn from studies using such SNPs may be influenced by ascertainment bias (AB). Through extensive simulations, we have found that AB is a potentially significant problem in estimating and comparing LD within and between populations. Specifically, the magnitude of AB is a function of the SNP discovery strategy, number of chromosomes used for SNP discovery, population genetic characteristics of the particular genomic region considered, amount of gene flow between populations, and demographic history of the populations. We demonstrate that a balanced SNP discovery strategy (where equal numbers of chromosomes are sampled from multiple subpopulations) is the optimal study design for generating broadly applicable SNP resources. Finally, we validate our theoretical predictions by comparing our results to publicly available data from ten genes sequenced in 24 African American and 23 European American individuals.
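
The LD quantities that such studies estimate can be sketched from haplotype counts at two biallelic SNPs. The counts and the function name below are hypothetical, and the abstract does not say which LD measures the authors used; D, |D'| and r² are shown only because they are the standard summaries.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2

    # D: departure of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # r^2: squared correlation between alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else float("nan")

    # |D'|: D scaled by its maximum possible magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    return d, d_prime, r2


# Hypothetical sample of 100 chromosomes.
d, d_prime, r2 = linkage_disequilibrium(40, 10, 10, 40)
print(d, d_prime, r2)
```

Because all three measures depend on the allele frequencies at both loci, a SNP discovery panel that over-represents common alleles in one population changes the LD values that are later observed, which is the mechanism behind the ascertainment bias described above.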

13.

Background  

Human genetic variations primarily result from single nucleotide polymorphisms (SNPs) that occur approximately every 1000 bases in the overall human population. The non-synonymous SNPs (nsSNPs) that lead to amino acid changes in the protein product may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is to identify nsSNPs that underlie disease-related phenotypes in humans. As such, the development of computational tools that can identify such nsSNPs would enhance our understanding of genetic diseases and help predict the disease.

14.
The nucleoside analogue dP (6-(2-deoxy-beta-D-ribofuranosyl)-3,4-dihydro-6H,8H-pyrimido[4,5-c][1,2]oxazin-2-one) displays ambivalent hydrogen bonding characteristics whereby the imino tautomer of P can base-pair with adenine and its amino tautomer can base-pair with guanine. Fixed imino and amino tautomers of 6-methyl-3,4-dihydro-6H,8H-pyrimido[4,5-c][1,2]oxazin-2-one (N-methyl P) have been synthesised and their structures obtained by X-ray crystallography. The tautomeric constant of N-methyl P has been calculated from pK(a) values of the fixed tautomers and the kinetic parameters for the incorporation of its 5'-triphosphate (dPTP) by exonuclease-free Klenow fragment of DNA polymerase I have been determined. A strong correlation between the tautomeric constant and the incorporation specificity of dPTP is found. These results lend support to the proposal that the minor tautomeric forms of the natural bases may play an important role in substitution mutagenesis during DNA replication. Furthermore, they imply that DNA polymerases impose specific steric requirements on the base-pair during nucleotide incorporation.

15.
The increasing ability to extract and sequence DNA from noncontemporaneous tissue offers biologists the opportunity to analyse ancient DNA (aDNA) together with modern DNA (mDNA) to address the taxonomy of extinct species, evolutionary origins, historical phylogeography and biogeography. Perhaps more exciting are recent developments in coalescence-based Bayesian inference that offer the potential to use temporal information from aDNA and mDNA for the estimation of substitution rates and divergence dates as an alternative to fossil and geological calibration. This comes at a time of growing interest in the possibility of time dependency for molecular rate estimates. In this study, we provide a critical assessment of Bayesian Markov chain Monte Carlo (MCMC) analysis for the estimation of substitution rate using simulated samples of aDNA and mDNA. We conclude that the current models and priors employed in Bayesian MCMC analysis of heterochronous mtDNA are susceptible to an upward bias in the estimation of substitution rates because of model misspecification when the data come from populations with less than simple demographic histories, including sudden short-lived population bottlenecks or pronounced population structure. However, when model misspecification is only mild, then the 95% highest posterior density intervals provide adequate frequentist coverage of the true rates.

16.
Miyazawa S. PLoS ONE 2011, 6(3): e17244

Background

Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices.

Results

Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not so large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated not to depend very much on protein families but amino acid pairs, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.

Conclusions/Significance

The present codon-based model with the ML estimates of selective constraints and with adjustable mutation rates of nucleotide would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences.

17.
18.
MOTIVATION: Neighbor-dependent substitution processes generate specific patterns of dinucleotide frequencies in the genomes of most organisms. The CpG-methylation-deamination process is, for example, a prominent process in vertebrates (CpG effect). Such processes, often with unknown mechanistic origins, need to be incorporated into realistic models of nucleotide substitutions. RESULTS: Based on a general framework of nucleotide substitutions we developed a method that is able to identify the most relevant neighbor-dependent substitution processes, estimate their relative frequencies and judge whether they are important enough to be included in the model. Starting from a model for neighbor-independent nucleotide substitution we successively added neighbor-dependent substitution processes in the order of their ability to increase the likelihood of the model describing given data. The analysis of neighbor-dependent nucleotide substitutions based on repetitive elements found in the genomes of human, zebrafish and fruit fly is presented. AVAILABILITY: A web server to perform the presented analysis is freely available at: http://evogen.molgen.mpg.de/server/substitution-analysis

19.
The nucleotide sequences of a segment of mitochondrial DNA (mtDNA) have been determined for nine species or subspecies of the subgenus Drosophila of the genus Drosophila. This segment contains two complete protein-coding genes (i.e., NADH dehydrogenase subunit 1 and cytochrome b) and a transfer RNA gene (tRNA(ser)). The G+C content at third-codon positions for the two protein-coding genes was 1.5 times higher than that in the D. melanogaster species group, which belongs to the subgenus Sophophora. However, there was a substantial difference between the nucleotide frequencies of G and C. The number of nucleotide substitutions per silent site was more than three times higher than that for nuclear DNA, although it was only 60% of that for mammalian mtDNA. Both parametric and nonparametric analyses revealed a strong transition-transversion bias in nucleotide substitution, as was observed in mammalian mtDNA. Moreover, the rate of substitution of A and T for G and C is higher than that for the opposite direction. This bias seems to be responsible for the extremely A+T-rich base composition of Drosophila mtDNA. It is also noted that the rate of transitional change between A and G is higher than that between T and C.
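
Two of the summary statistics mentioned here, the transition/transversion counts and the G+C content at third codon positions, can be computed with a short sketch. The sequences are invented for illustration, the function names are hypothetical, and the simple pairwise count shown is a nonparametric tally that does not correct for multiple substitutions at the same site.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq_a, seq_b):
        # Skip identical sites, gaps and ambiguous bases.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


def gc3_content(coding_seq):
    """Fraction of G or C at third codon positions of an in-frame sequence."""
    third = coding_seq[2::3]
    return sum(base in "GC" for base in third) / len(third)


# Hypothetical in-frame coding sequences (four codons each).
seq_a = "ATGGCTAAATTC"
seq_b = "ATGGCCAAGTTA"

ts, tv = count_ts_tv(seq_a, seq_b)
print(ts, tv, gc3_content(seq_a), gc3_content(seq_b))
```

Splitting the transition count further into A<->G and C<->T changes would give the comparison made in the last sentence of the abstract.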

20.
Transect count data form the basis of many butterfly and other insect monitoring programs worldwide. A clear understanding of the limitations of such datasets, including the potential for biases in the statistical methods used to analyze them, is therefore crucial. The classical Zonneveld model (CZ) can extract estimates of a suite of demographic parameters from transect count datasets, and has also been used in theoretical analyses of protandry and reproductive asynchrony. The CZ relies on strong assumptions about the emergence and death processes underlying observed transect count datasets. Though reasonable as a starting place, a growing body of empirical evidence suggests these assumptions will, in many cases, not hold. Here, I explore how violations of these assumptions bias CZ-based estimates of two key population parameters: total population size and mean individual lifespan. To do this, I generalize the Zonneveld model by relaxing the symmetrical emergence distribution and constant death rate assumptions such that the generalized models contain the CZ as a special case. Using the generalized models as data generating processes, I then show that the CZ is able to closely mimic the shape of the abundance time course produced by either variant of the generalized model under a wide range of conditions, but produces highly biased estimates of population size and mean lifespan in doing so. My analysis therefore demonstrates both that the CZ is not robust to violations of its emergence and death assumptions, and that a good observed fit to transect count data does not mean these assumptions are satisfied.
