Similar literature
Found 20 similar articles (search time: 703 ms)
1.
A method is proposed to calculate the maximum likelihood estimate of gene frequency and linkage disequilibrium from disease-codominant marker conditional data. The method is illustrated using data on sickle-cell anemia and Duchenne muscular dystrophy and linked polymorphic restriction endonuclease cleavage sites.

2.
A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process. Correspondence to: Z. Yang

3.
Computation of most probable numbers (cited 2 times: 0 self-citations, 2 by others)
A rapid computational method for maximum likelihood estimation of most-probable-number values, incorporating a modified Newton-Raphson method, is presented. The method offers a much greater reliability for the most-probable-number estimate of total viable bacteria, i.e., those capable of growth in laboratory media.

4.
Computation of Most Probable Numbers (cited 5 times: 4 self-citations, 1 by others)
A rapid computational method for maximum likelihood estimation of most-probable-number values, incorporating a modified Newton-Raphson method, is presented. The method offers a much greater reliability for the most-probable-number estimate of total viable bacteria, i.e., those capable of growth in laboratory media.
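
As a rough illustration of the kind of computation this abstract describes, the sketch below solves the standard most-probable-number likelihood equation with a plain (unmodified) Newton-Raphson iteration. It assumes the usual Poisson dilution model, in which a tube inoculated with volume v is positive with probability 1 - exp(-λv), so that the maximum likelihood estimate of the density λ satisfies Σ p_i v_i / (1 - exp(-λ v_i)) = Σ n_i v_i. The function name `mpn_mle`, its parameters, and the step-halving safeguard are illustrative assumptions, not the modified method of the paper.

```python
import math


def mpn_mle(volumes, tubes, positives, lam0=1.0, tol=1e-12, max_iter=200):
    """Maximum likelihood estimate of organism density (per unit volume).

    volumes[i]   : inoculum volume per tube at dilution level i
    tubes[i]     : number of tubes at level i
    positives[i] : number of positive tubes at level i
    """
    total_pos = sum(positives)
    if total_pos == 0 or total_pos == sum(tubes):
        # With no positive tubes, or with every tube positive, the
        # likelihood has no finite interior maximum.
        raise ValueError("need at least one positive and one negative tube")

    target = sum(n * v for n, v in zip(tubes, volumes))
    lam = lam0
    for _ in range(max_iter):
        score = -target   # derivative of the log-likelihood at lam
        slope = 0.0       # derivative of the score with respect to lam
        for v, p in zip(volumes, positives):
            e = math.exp(-lam * v)
            score += p * v / (1.0 - e)
            slope -= p * v * v * e / (1.0 - e) ** 2
        new_lam = lam - score / slope
        if new_lam <= 0.0:
            # Assumed safeguard: halve instead of stepping to a
            # non-positive density.
            new_lam = lam / 2.0
        if abs(new_lam - lam) <= tol * max(1.0, lam):
            return new_lam
        lam = new_lam
    raise RuntimeError("Newton-Raphson did not converge")
```

With a single dilution level the equation has a closed-form solution, which makes a convenient check: 5 positives out of 10 tubes at unit volume gives λ = ln 2.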

5.
We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a unique distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have a significantly better treelikeness compared to those obtained by standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate the part of the noise often associated with some codon positions, particularly the third position, which is known to induce a fast evolutionary rate. Simulation results show that fast distance-based tree reconstruction algorithms on distance matrices based on this codon position weighting can lead to phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.

6.
Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome-wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for the analysis of how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).

7.
The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model, which can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and there are violations of the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, we estimate the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set where problems due to lack of stationarity and homogeneity have been previously well noted and use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that estimates of invariant sites obtained using our method are fairly accurate when applied to data simulated under the general Markov model.

8.
We would like to use maximum likelihood to estimate parameters such as the effective population size N_e or, if we do not know mutation rates, the product 4N_eμ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N_e or 4N_eμ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.

9.
The sampling theory for the infinite site model taking into account the phylogenetic relationship between the alleles is developed for those cases in which two or three alleles are observed in the sample. From this theory a maximum likelihood estimate of θ = 4Nμ can be obtained. Unlike the maximum likelihood estimate of θ based on the infinite allele model or the number of segregating sites, this estimate of θ is a function of the frequencies of the alleles. This method is used to estimate θ for mitochondrial DNA in Drosophila melanogaster and D. virilis from data obtained by Shah and Langley (1979. Nature (London) 281, 696–699) using restriction endonucleases.

10.
An evolutionary model for maximum likelihood alignment of DNA sequences (cited 16 times: 0 self-citations, 16 by others)
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.

11.
It is known that under neutral mutation at a known mutation rate a sample of nucleotide sequences, within which there is assumed to be no recombination, allows estimation of the effective size of an isolated population. This paper investigates the case of very long sequences, where each pair of sequences allows a precise estimate of the divergence time of those two gene copies. The average divergence time of all pairs of copies estimates twice the effective population number and an estimate can also be derived from the number of segregating sites. One can alternatively estimate the genealogy of the copies. This paper shows how a maximum likelihood estimate of the effective population number can be derived from such a genealogical tree. The pairwise and the segregating sites estimates are shown to be much less efficient than this maximum likelihood estimate, and this is verified by computer simulation. The result implies that there is much to gain by explicitly taking the tree structure of these genealogies into account.
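
The two simpler estimators that this abstract contrasts with the genealogical maximum likelihood estimate can be sketched as follows. The code computes the mean number of pairwise differences and Watterson's segregating-sites estimator of θ for a set of aligned sequences. Under the infinite-sites neutral model for a diploid population, θ = 4Nμ, so dividing either value by 4μ (with μ an assumed-known mutation rate per sequence per generation) would give an estimate of N. The function name and the toy alignment are hypothetical, and the tree-based likelihood estimator itself is not implemented here.

```python
from itertools import combinations


def theta_estimates(seqs):
    """Return (pairwise, watterson) estimates of theta per sequence.

    seqs: list of equal-length aligned sequences (strings), at least two.
    """
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal length")

    # Mean number of differences over all n(n-1)/2 pairs of sequences.
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    theta_pairwise = total_diffs / len(pairs)

    # Number of segregating sites: alignment columns with more than one state.
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))

    # Watterson's estimator divides by the harmonic number a_n = sum 1/i.
    a_n = sum(1.0 / i for i in range(1, n))
    theta_watterson = seg_sites / a_n

    return theta_pairwise, theta_watterson


# Toy alignment (made up for illustration only).
pairwise, watterson = theta_estimates(["AAAA", "AAAT", "AATT"])
```

For the toy alignment the pairwise differences are 1, 2 and 1, and there are 2 segregating sites with a_3 = 1.5, so both estimators equal 4/3.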

12.
Estimation for an island model where mutation maintains a k-allele neutral polymorphism at a single locus on each island is considered. The likelihood of an observed sample type configuration is obtained by applying a computational algorithm analogous to Griffiths and Tavaré (Theor. Popul. Biol. 46 (1994), 131–159). This allows the computation of sampling distributions in an island model and investigation of their properties. Given a sample type configuration, the maximum likelihood estimate of the migration parameter is obtained by simulating independently the likelihood at a grid of points and, also, using a surface simulation method. The latter method generates the whole likelihood trajectory in a single application of the simulation program. An estimate of variance of the estimate of the migration parameter is obtained using the likelihood trajectory. A comparison of the maximum likelihood estimates of the gene flow between subpopulations is made with those obtained by using Wright's FST statistic.

13.
Summary: A new estimate of the sequence divergence of mitochondrial DNA in related species using restriction enzyme maps is constructed. The estimate is derived assuming a simple Poisson-like model for the evolutionary process and is chosen to maximize an expression which is a reasonable approximation to the true likelihood of the restriction map data. Using this estimate, four sets of mitochondrial DNA data are analyzed and discussed.

14.
A method is presented for estimating the transition/transversion ratio (TI/TV), based on phylogenetically independent comparisons. TI/TV is a parameter of some models used in phylogeny estimation intended to reflect the fact that nucleotide substitutions are not all equally likely. Previous attempts to estimate TI/TV have commonly faced three problems: (1) few taxa; (2) nonindependence among pairwise comparisons; and (3) multiple hits, which make the apparent TI/TV between two sequences decrease over time since their divergence, giving a misleading impression of relative substitution probabilities. We have made use of the time dependency, modeling how the observed TI/TV changes over time and extrapolating to estimate the "instantaneous" TI/TV, the relevant parameter for phylogenetic inference. To illustrate our method, TI/TV was estimated for two mammalian mitochondrial genes. For 26 pairs of cytochrome b sequences, the estimate of TI/TV was 5.5; 16 pairs of 12S rRNA yielded an estimate of 9.5. These estimates are higher than those given by the maximum likelihood method and than those obtained by averaging all possible pairwise comparisons (with or without a two-parameter correction for multiple substitutions). We discuss strengths, weaknesses, and further uses of our method. Received: 22 August 1995 / Accepted: 26 July 1996
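
The apparent TI/TV between two aligned sequences, the quantity whose decline with divergence time the authors model, can be obtained by direct counting. The sketch below performs only that counting step. The function name is hypothetical, positions containing gaps or ambiguity codes are skipped by assumption, and the extrapolation to the instantaneous ratio is not reproduced.

```python
PURINES = {"A", "G"}
NUCLEOTIDES = {"A", "C", "G", "T"}


def count_ti_tv(seq1, seq2):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip identical sites and anything that is not an unambiguous base.
        if a == b or a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        if (a in PURINES) == (b in PURINES):
            # Purine <-> purine or pyrimidine <-> pyrimidine.
            transitions += 1
        else:
            # Purine <-> pyrimidine.
            transversions += 1
    return transitions, transversions


# Made-up example: one transition (A/G) and three transversions (G/T, G/C, T/A).
ti, tv = count_ti_tv("ACGTACGT", "GCTTACCA")
```

The apparent ratio is then `ti / tv` whenever `tv` is nonzero; for closely related sequences with no observed transversions the ratio is undefined, which is one practical reason pairwise values can mislead.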

15.
MOTIVATION: Neighbor-dependent substitution processes have generated specific patterns of dinucleotide frequencies in the genomes of most organisms. The CpG-methylation-deamination process is, for example, a prominent process in vertebrates (CpG effect). Such processes, often with unknown mechanistic origins, need to be incorporated into realistic models of nucleotide substitutions. RESULTS: Based on a general framework of nucleotide substitutions we developed a method that is able to identify the most relevant neighbor-dependent substitution processes, estimate their relative frequencies and judge whether they are important enough to be included in the model. Starting from a model for neighbor-independent nucleotide substitution we successively added neighbor-dependent substitution processes in the order of their ability to increase the likelihood of the model describing given data. The analysis of neighbor-dependent nucleotide substitutions based on repetitive elements found in the genomes of human, zebrafish and fruit fly is presented. AVAILABILITY: A web server to perform the presented analysis is freely available at: http://evogen.molgen.mpg.de/server/substitution-analysis

16.
M. Hühn. Génome, 2000, 43(5): 853-856
Some relationships between the estimates of recombination fraction in two-point linkage analysis obtained by maximum likelihood, minimum chi-square, and general least squares are derived. These theoretical results are based on an approximation for the multinomial distribution. Applications (theoretical and experimental) with RFLP (restriction fragment length polymorphism) markers for a segregating F2 population are given. The minimum chi-square estimate is slightly larger than the maximum likelihood estimate. For applications, however, both estimates must be considered to be approximately equal. The least squares estimates differ slightly (being either larger or smaller) from these estimates.

17.
An estimate of the average number of evolutionarily acceptable substitutions per nucleotide since the most recent common ancestor of a pair of homologous sequences is found which uses nucleotide sequence data. The estimate is derived assuming a Poisson-like model for the evolutionary process. A method is also suggested for analyzing nucleotide sequence data in M homologous sequences (M ≥ 3). A simulation study is reported showing that the estimates are satisfactory provided there is sufficient homology between the sequences. To demonstrate the methods a numerical example using some β-globin data is presented.

18.
A maximum likelihood method for independently estimating the relative rate of substitution at different nucleotide sites is presented. With this method, the evolution of DNA sequences can be analyzed without assuming a specific distribution of rates among sites. To investigate the pattern of correlation of rates among sites, the method was applied to a data set consisting of the protein-coding regions of the mitochondrial genome from 10 vertebrate species. Rates appear to be strongly correlated at distances up to 40 codons apart. Furthermore, there appears to be some higher order correlation of sites approximately 75 codons apart. The method of site-by-site estimation of the rate of substitution may also be applied to examine other aspects of rate variation along a DNA sequence and to assess the difference in the support of a tree along the sequence.

19.
Several approaches have been suggested for estimating a respiratory response slope when both x and y variables are observed with error. Recently, a maximum likelihood estimate under the assumption of a bivariate normal distribution has been proposed. A method of moments solution yields a slope estimate of y/x as long as the underlying process mean is nonzero. This paper extends the maximum likelihood approach to the case where the process mean is zero. In this case, certain additional error assumptions must be made to yield a unique estimate. These concepts are applied to the problem of estimating an effective lung volume for steady-state breath-to-breath gas exchange data during exercise.

20.
Phylogenetic analysis using parsimony and likelihood methods (cited 1 time: 0 self-citations, 1 by others)
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.
