Similar articles
 20 similar articles found
1.
Estimation of evolutionary distance between nucleotide sequences   Total citations: 25; self-citations: 9; citations by others: 25
A mathematical formula for estimating the average number of nucleotide substitutions per site (delta) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as delta is equal to or smaller than 1. Furthermore, the frequency of cases to which the formula is inapplicable is much lower than that for other similar methods recently proposed. This point is illustrated using insulin genes. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. Application of this method to globin gene data indicates that the number of nucleotide changes per site increases with evolutionary time but the pattern of the increase is quite irregular.

2.
Estimation of evolutionary distances between nucleotide sequences   Total citations: 11; self-citations: 0; citations by others: 11
A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. By using matrix algebra theory, the theoretical foundation of the methods of Barry and Hartigan (Stat. Sci. 2:191–210, 1987) and Lanave et al. (J. Mol. Evol. 20:86–93, 1984) was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, the method of Tajima and Nei (Mol. Biol. Evol. 1:269–285, 1984) is superior to the others.

3.
In most studies of molecular evolution, the nucleotide base at a site is assumed to change with the apparent rate under functional constraint, and the comparison of base changes between homologous genes is thought to yield the evolutionary distance corresponding to the site-average change rate multiplied by the divergence time. However, this view is not sufficiently successful in estimating the divergence time of species, and mostly results in the construction of a tree topology without a time-scale. In the present paper, this problem is investigated theoretically by considering that observed base changes are the result of comparing the mutated bases that survive selection. In the case of weak selection, the time course of base changes due to mutation and selection can be obtained analytically, leading to a theoretical equation showing how selection influences the evolutionary distance estimated from the enumeration of base changes. This result provides a new method for estimating the divergence time more accurately from the observed base changes by evaluating both the strength of selection and the mutation rate. The validity of this method is verified by analysing the base changes observed at the third codon positions of amino acid residues with four-fold codon degeneracy in the protein genes of mammalian mitochondria; i.e. the ratios of estimated divergence times are fairly consistent with a series of fossil records of mammals. Throughout this analysis, it is also suggested that the mutation rates in mitochondrial genomes are almost the same in different lineages of mammals and that the lineage-specific base-change rates indicated previously are due to selection, probably arising from the preference of transfer RNAs for codons.

4.
Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) will only give accurate results when the initial estimates of the pairwise distances are accurate. For many different models of sequence evolution, analytical formulae are known that give estimates of the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates become larger as the time t since divergence of the two sequences increases. For long times, the log transform formulae can sometimes give divergent distance estimates when applied to finite sequences. We show that these errors become significant when $t \approx \tfrac{1}{2}|\lambda_{\max}|^{-1}\log N$, where $\lambda_{\max}$ is the eigenvalue of the substitution rate matrix with the largest absolute value and N is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If rate matrix parameters are known with reasonable accuracy, it is possible to use the maximum likelihood method to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way only become significant when $t \approx \tfrac{1}{2}|\lambda_{1}|^{-1}\log N$, where $\lambda_{1}$ is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. The accuracy of likelihood-based distance estimates is therefore much higher than those based on log transform formulae, particularly in cases where there is a large range of timescales involved in the rate matrix (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation and hence of increasing the accuracy of distance estimates.
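
A minimal sketch of the two ideas in this abstract, under stated assumptions. The abstract does not name a specific log transform formula, so the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) is used here purely as a familiar example of that form. The function names are hypothetical, and the logarithm in the error-onset estimate is assumed to be the natural logarithm.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor log transform of the observed proportion p of differing sites.

    Returns None when the argument of the logarithm is not positive,
    which is the "divergent estimate" case for finite sequences.
    """
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return None
    return -0.75 * math.log(arg)


def error_onset_time(abs_eigenvalue, n_sites):
    """Divergence time at which errors become significant: (1/2) * log(N) / |lambda|.

    abs_eigenvalue is |lambda_max| for a log transform formula, or |lambda_1|
    for a likelihood-based estimate with fixed rate parameters.
    Assumption: natural logarithm.
    """
    return 0.5 * math.log(n_sites) / abs_eigenvalue


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.74, 0.75):
        print(p, jc69_distance(p))
    # Longer sequences push the onset of significant error to larger t.
    print(error_onset_time(4.0 / 3.0, 1000))
```

Because the onset time scales with 1/|lambda|, a rate matrix whose smallest nonzero eigenvalue is much smaller in magnitude than its largest one gives the likelihood-based estimate a correspondingly longer usable range.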

5.
SUMMARY: QDist is a program for computing the quartet distance between two unrooted trees, i.e. the number of quartet topology differences between the trees, where a quartet topology is the topological subtree induced by four species. The program is based on an algorithm with running time O(n log² n), which makes it practical to compare large trees. Available under GNU license. AVAILABILITY: http://www.birc.dk/Software/QDist
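
To make the definition of quartet distance concrete, here is a brute-force sketch. It is not the algorithm QDist uses: it simply enumerates every four-leaf subset, so it is only practical for small trees. The nested-tuple tree encoding and all function names are assumptions made for this illustration.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set of tree, leaf sets of every internal node below it).

    A tree is a nested tuple of leaf labels, rooted at an arbitrary point.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def quartet_topology(quartet, clade_list):
    """Topology induced on four leaves: the pair of pairs it splits into.

    A clade holding exactly two of the four leaves separates them from the
    other two. Returns None if no such clade exists (unresolved quartet).
    """
    q = frozenset(quartet)
    for clade in clade_list:
        inside = q & clade
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def quartet_distance(tree1, tree2):
    """Count the four-leaf subsets whose induced topology differs."""
    leaves1, clades1 = clades(tree1)
    leaves2, clades2 = clades(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return sum(
        1
        for q in combinations(sorted(leaves1), 4)
        if quartet_topology(q, clades1) != quartet_topology(q, clades2)
    )


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), ("d", "e"))
    t2 = ((("a", "c"), "b"), ("d", "e"))
    print(quartet_distance(t1, t2))
```

The result does not depend on where the nested tuples are rooted, since only the 2-versus-2 separation of each quartet is compared.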

6.

Background  

Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy of adding a third sequence to a pairwise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance.

7.
Asynchronous distance between homologous DNA sequences   Total citations: 7; self-citations: 0; citations by others: 7
D Barry, J A Hartigan. Biometrics 1987, 43(2):261-276
The distance between homologous DNA sequences of two species is proposed to be -1/4 ln[det(P)], where P is the conditional probability matrix specifying the proportions of the various nucleotides in the second sequence, corresponding to each of the four nucleotides in the first sequence. A probability model is described which supports this choice of distance. Distance measures based on a constant evolutionary rate assumption are described and compared with the proposed measure. Sampling properties of both types of distance are examined and we conclude by applying the distance measures to mitochondrial DNA sequences of the hominoids.
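
The distance -1/4 ln[det(P)] can be sketched directly from the description above. This is an illustrative implementation under assumptions, not the authors' code: the two sequences are assumed to be already aligned and of equal length, sites with symbols other than A, C, G, T are skipped, and all function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def conditional_matrix(seq1, seq2):
    """Row i holds the proportions of each nucleotide in seq2 at the sites
    where seq1 carries nucleotide i (rows and columns ordered A, C, G, T)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq1, seq2):
        if a in index and b in index:  # assumption: skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a nucleotide is absent from the first sequence")
        matrix.append([c / total for c in row])
    return matrix


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def asynchronous_distance(seq1, seq2):
    """Distance -1/4 ln[det(P)]; undefined when det(P) is not positive."""
    det = determinant(conditional_matrix(seq1, seq2))
    if det <= 0.0:
        raise ValueError("det(P) is not positive; the distance is undefined")
    return -0.25 * math.log(det)


if __name__ == "__main__":
    print(asynchronous_distance("AAAACCCCGGGGTTTT", "AAACCCCGGGGTTTTA"))
```

Identical sequences give P equal to the identity matrix, so det(P) = 1 and the distance is zero; the distance grows as the rows of P move away from the identity.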

8.
Unbiased estimation of individual asymmetry   Total citations: 1; self-citations: 0; citations by others: 1
The importance of measurement error (ME) for the estimation of population level fluctuating asymmetry (FA) has long been recognized. At the individual level, however, this aspect has been studied in less detail. Recently, it has been shown that the random slopes of a mixed regression model can estimate individual asymmetry levels that are unbiased with respect to ME. Yet, recent studies have shown that such estimates may fail to reflect heterogeneity in these effects. In this note I show that this is not the case for the estimation of individual asymmetry. The random slopes adequately reflect between‐individual heterogeneity in the underlying developmental instability. Increased levels of ME resulted in, on average, lower estimates of individual asymmetry relative to the traditional unsigned asymmetry. This well‐known shrinkage effect in Bayesian analysis adequately corrected for ME and heterogeneity in ME resulting in unbiased estimates of individual asymmetry that were more closely correlated with the true underlying asymmetry.

9.

Background  

The estimation of the difference between two evolutionary distances within a triplet of homologs is a common operation that is used for example to determine which of two sequences is closer to a third one. The most accurate method is currently maximum likelihood over the entire triplet. However, this approach is relatively time consuming.

10.
Wang J. Genetics 2011, 187(3):887-901
Knowledge of the genetic relatedness between individuals is important in many research areas in quantitative genetics, conservation genetics, forensics, evolution, and ecology. In the absence of pedigree records, relatedness can be estimated from genetic marker data using a number of estimators. These estimators, however, make the critical assumption of a large random mating population without genetic structures. The assumption is frequently violated in the real world where geographic/social structures or nonrandom mating usually lead to genetic structures. In this study, I investigated two approaches to the estimation of relatedness between a pair of individuals from a subpopulation due to recent common ancestors (i.e., relatedness is defined and measured with the current focal subpopulation as reference). The indirect approach uses the allele frequencies of the entire population with and without accounting for the population structure, and the direct approach uses the allele frequencies of the current focal subpopulation. I found by simulations that currently widely applied relatedness estimators are upwardly biased under the indirect approach, but can be modified to become unbiased and more accurate by using Wright's F_ST to account for population structures. However, the modified unbiased estimators under the indirect approach are clearly inferior to the unmodified original estimators under the direct approach, even when small samples are used in estimating both allele frequencies and relatedness.

11.
When two strings of symbols are aligned it is important to know whether the observed number of matches is better than that expected between two independent sequences with the same frequency of symbols. When strings are of different lengths, nulls need to be inserted in order to align the sequences. One approach is to use simple approximations of sampling with replacement. We describe an algorithm for exactly determining the frequencies of given numbers of matches, sampling without replacement. This does not lead to a simple closed form expression. However, we show examples where sampling with, or without, replacement gives very similar results and the simple approach may be adequate for all but the smallest cases.

12.
M S Horwitz, D K Dube, L A Loeb. Génome 1989, 31(1):112-117
Recent advances in the selection of biologically active DNA sequences from random populations are reviewed. Within the framework of evolution, forces are considered that have precluded the testing of all possible DNA sequences, purely with regard to their functionality as genetic regulatory elements or protein coding sequences. Examples are drawn from cassette mutagenesis of enzyme active sites, protein domain replacement by fusion with random genomic digests, and the selection of bacterial promoters from random DNA. Efforts to derive new activities are examined, and the likelihood of future success is evaluated.

13.
14.
15.
A simple method to obtain the distribution of site differences between two randomly chosen cistrons in a finite population is shown. It is assumed that the number of sites and the number of alleles at each site are finite and that no recombination occurs between sites. The mean and variance of the percent difference and the number of site differences between the two sequences are shown to be simple functions of two variables. The differences between a finite-site model and an infinite-site model are discussed. The results are applied to the β-globin polymorphism in man.

16.
17.
18.
Phylogenetic estimation of evolutionary timescales has become routine in biology, forming the basis of a wide range of evolutionary and ecological studies. However, there are various sources of bias that can affect these estimates. We investigated whether tree imbalance, a property that is commonly observed in phylogenetic trees, can lead to reduced accuracy or precision of phylogenetic timescale estimates. We analysed simulated data sets with calibrations at internal nodes and at the tips, taking into consideration different calibration schemes and levels of tree imbalance. We also investigated the effect of tree imbalance on two empirical data sets: mitogenomes from primates and serial samples of the African swine fever virus. In analyses calibrated using dated, heterochronous tips, we found that tree imbalance had a detrimental impact on precision and produced a bias in which the overall timescale was underestimated. A pronounced effect was observed in analyses with shallow calibrations. The greatest decreases in accuracy usually occurred in the age estimates for medium and deep nodes of the tree. In contrast, analyses calibrated at internal nodes did not display a reduction in estimation accuracy or precision due to tree imbalance. Our results suggest that molecular‐clock analyses can be improved by increasing taxon sampling, with the specific aims of including deeper calibrations, breaking up long branches and reducing tree imbalance.

19.
Arndt PF. Gene 2007, 390(1-2):75-83
Maximum likelihood phylogeny reconstruction methods are widely used in uncovering and assessing the evolutionary history and relationships of natural systems. However, several simplifying assumptions commonly made in this analysis limit the explanatory power of the results obtained. We present an algorithm that performs the phylogenetic analysis without making the common assumptions for sequence data from at least three leaf nodes in a star phylogeny. In particular, the underlying nucleotide substitution model does not have to be reversible and may include neighbor-dependent processes like the CpG methylation deamination process (CpG-effect). The base composition of the sequences at the external nodes and that of the ancestral sequence may differ from each other, and they do not have to be stationary state distributions of the corresponding substitution model. The algorithm is able to reconstruct the ancestral base composition and accurately estimate substitution frequencies in the branches of the star phylogeny. Extensive tests on simulated data validate the very favorable performance of the algorithm. As an application we present the analysis of aligned genomic sequences from human, mouse, and dog. Different substitution patterns can be observed in the three lineages.

20.