Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
 Multivariate analysis is a branch of statistics that successfully exploits the powerful tools of linear algebra to obtain a fairly comprehensive theory of estimation. The purpose of this paper is to explore to what extent a linear theory of estimation can be developed in the context of coalescent models used in the analysis of DNA polymorphism. We consider a large class of coalescent models, of which the neutral infinite sites model is one example. In the process, we discover several limitations of linear estimators that are quite distinct from those in the classical theory. In particular, we prove that there does not exist a uniformly BLUE (best linear unbiased estimator) for the scaled mutation parameter, under the assumptions of the neutral model of evolution. In fact, we show that no linear estimator performs uniformly better than the Watterson (1975) method based on the total number of segregating sites. For certain coalescent models, the segregating-sites estimator is actually optimal. The general conclusion is the following. If genealogical information is useful for estimating the rate of evolution, then there is no optimal linear method. If there is an optimal linear method, then no information other than the total number of segregating sites is needed. Received: 29 July 1998 / Revised version: 9 October 1998
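 As a rough illustration of the segregating-sites estimator named in this abstract, the sketch below assumes the standard form theta_W = S / a_n, where S is the number of segregating sites, n is the sample size, and a_n = 1 + 1/2 + ... + 1/(n-1). The function names and the toy alignment are illustrative assumptions and do not come from the paper.

```python
def count_segregating_sites(sequences):
    """Count alignment columns in which at least two sequences differ."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) walks the alignment column by column.
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(segregating_sites, sample_size):
    """Segregating-sites estimate of the scaled mutation parameter.

    Assumes the standard form theta_W = S / a_n with
    a_n = sum_{i=1}^{n-1} 1/i (an assumption of this sketch).
    """
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


def watterson_theta_from_alignment(sequences):
    """Convenience wrapper: estimate theta directly from aligned sequences."""
    return watterson_theta(count_segregating_sites(sequences), len(sequences))
```

 For a toy alignment such as ["AAGT", "AAGC", "ATGC"], two columns vary, so S = 2, a_3 = 1.5, and the estimate is 2 / 1.5. The estimator uses only S and n, which matches the abstract's point that no genealogical information enters.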

3.
4.
5.
6.
7.
Linear Bayes estimators of the potency curve in bioassay   Cited by: 1 (self-citations: 0, citations by others: 1)
KUO LYNN, Biometrika, 1988, 75(1): 91-96

8.
Empirical Bayes estimators in a multiple linear regression model   Cited by: 1 (self-citations: 0, citations by others: 1)

9.
Wheeler WC and Pickett KM (2008. Topology-Bayes versus clade-Bayes in phylogenetic analysis. Mol Biol Evol. 25:447–453.) discuss two ways of summarizing the posterior probability distribution of a Bayesian phylogenetic analysis, which they refer to as "topology-Bayes" and "clade-Bayes." They claim that the clade-Bayes approach leads to problems such as "exaggerated clade support, inconsistently biased priors, and the impossibility of topology hypothesis testing," which are not problems for the topology-Bayes approach. However, their argument for topology-Bayes over clade-Bayes is based on errors in the interpretation of summary statistics associated with Bayesian phylogenetic analysis. Although there is a well-documented difference between the maximum posterior probability topology and the majority-rule consensus topology (the established terms for topology-Bayes and clade-Bayes summaries, respectively), both have a place in phylogenetic analysis. Choice of summarization strategy should be driven by choice of parameters that need to be estimated versus those to be marginalized given the evolutionary questions being asked or hypotheses being tested.

10.
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004, Nature 431: 980-984) and Chang (1996, Math. Biosci. 134: 189-216), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible - despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) nonidentifiability - two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) linear tests - there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.

11.
MOTIVATION: Phylogenies, the evolutionary histories of groups of organisms, play a major role in representing relationships among biological entities. Although many biological processes can be effectively modeled as tree-like relationships, others, such as hybrid speciation and horizontal gene transfer (HGT), result in networks, rather than trees, of relationships. Hybrid speciation is a significant evolutionary mechanism in plants, fish and other groups of species. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Maximum parsimony is one of the most commonly used criteria for phylogenetic tree inference. Roughly speaking, inference based on this criterion seeks the tree that minimizes the amount of evolution. In 1990, Jotun Hein proposed using this criterion for inferring the evolution of sequences subject to recombination. Preliminary results on small synthetic datasets (Nakhleh et al., 2005) demonstrated the criterion's application to phylogenetic network reconstruction in general and HGT detection in particular. However, the naive algorithms used by the authors are inapplicable to large datasets due to their demanding computational requirements. Further, no rigorous theoretical analysis of computing the criterion was given, nor was it tested on biological data. RESULTS: In the present work we prove that the problem of scoring the parsimony of a phylogenetic network is NP-hard and provide an improved fixed parameter tractable algorithm for it. Further, we devise efficient heuristics for parsimony-based reconstruction of phylogenetic networks. We test our methods on both synthetic and biological data (rbcL gene in bacteria) and obtain very promising results.
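To make the parsimony criterion in this abstract concrete, the sketch below scores a fixed rooted binary tree with Fitch's small-parsimony algorithm, which counts the minimum number of state changes per site. It is only an illustration of the tree case: it is not the network-scoring algorithm of the paper, and the nested-tuple tree encoding and function names are assumptions made for this example.

```python
def fitch_site(tree, leaf_states):
    """Return (candidate_states, changes) for one site on a rooted binary tree.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, leaf_states)
    right_set, right_cost = fitch_site(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all sites of an aligned dataset."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        states = {name: seq[site] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total
```

With the toy alignment a = "AC", b = "AC", c = "GC", d = "GT", the tree (("a", "b"), ("c", "d")) needs 2 changes while (("a", "c"), ("b", "d")) needs 3, so the criterion prefers the first tree.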

12.
Flavonoids have been used successfully for interpreting evolutionary relationships in many groups of angiosperms. These interpretations often have been presented in narrative fashion without specific indications of the kinds of relationships expressed. In this paper a method of phylogeny reconstruction with flavonoid data showing cladistic, patristic, and phenetic relationships is presented. Such a phylogram contains maximal information about flavonoid evolution. As an example, relationships in the North American species of Coreopsis (Compositae), containing 46 species in 11 sections, are analyzed by this approach. A phylogeny of sections of the genus from previous morphological, chromosomal and hybridization data is compared with that from data on anthochlors (chalcones and aurones). Strong correspondence of these evolutionary interpretations gives support to the hypothesized evolutionary trends within the group.

13.
MOTIVATION: Previous studies have shown that accounting for site-specific amino acid replacement patterns using mixtures of stationary probability profiles offers a promising approach for improving the robustness of phylogenetic reconstructions in the presence of saturation. However, such profile mixture models were introduced only in a Bayesian context, and are not yet available in a maximum likelihood (ML) framework. In addition, these mixture models only perform well on large alignments, from which they can reliably learn the shapes of profiles, and their associated weights. RESULTS: In this work, we introduce an expectation-maximization algorithm for estimating amino acid profile mixtures from alignment databases. We apply it, learning on the HSSP database, and observe that a set of 20 profiles is enough to provide a better statistical fit than currently available empirical matrices (WAG, JTT), in particular on saturated data.

14.
Performance measures of phylogenetic estimation methods such as accuracy, consistency, and power are an attempt at summarizing an ensemble of a given estimator's behavior. These summaries characterize an ensemble behavior with a single number, leading to a variety of definitions. In particular, the relationships between different performance measures such as accuracy and consistency or accuracy and error depend on the exact definition of these measures. In addition, it is relatively common to use large-sample behavior to infer similar behavior for small samples. In fact, large-sample results such as the claimed asymptotic efficiency of the maximum-likelihood estimator are often uninformative for small samples. Conversely, small-sample behavior using simulations is sometimes used to imply large-sample behavior such as consistency. However, such extrapolation is often difficult. How the performance of a phylogenetic estimator scales with the addition of taxa must be qualified with respect to whether the whole tree is being estimated or a fixed subset of taxa is being estimated. It must also be qualified with respect to how tree models are sampled. Over the ensemble of all possible trees of a given size, the performance of the estimators for the whole tree estimate suffers when the tree size becomes larger. However, under certain models of cladogenesis, the estimate can improve with the addition of taxa. In fact, at all numbers of taxa there are subsets of tree models that are easier to estimate than others. This suggests that with judicious addition or subtraction of taxa we can move from tree models that are more difficult to estimate at one number of taxa to those that are easier to estimate at another number of taxa.

15.
Now that large-scale genome-sequencing projects are sampling many organismal lineages, it is becoming possible to compare large data sets of not only DNA and protein sequences, but also genome-level features, such as gene arrangements and the positions of mobile genetic elements. Although it is unlikely that comparisons of such features will address a large number of evolutionary branch points across the broad tree of life owing to the infeasibility of such sampling, they have great potential for resolving many crucial, contested relationships for which no other data seem promising. Here, I discuss the advancements, advantages, methods, and problems of the use of genome-level characters for reconstructing evolutionary relationships.

16.
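This paper introduces the steps, common methods, and related software used in research on reconstructing insect phylogeny. It points out the advantages, disadvantages, and scope of application of the different methods and analyzes the problems that remain in phylogenetic reconstruction, thereby providing a reference for related research.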

17.
18.
19.
The use of empirical Bayes estimators in a linear regression model   Cited by: 1 (self-citations: 0, citations by others: 1)

20.
The computationally challenging problem of reconstructing the phylogeny of a set of contemporary data, such as DNA sequences or morphological attributes, was treated by an extended version of the neighbor-joining (NJ) algorithm. The original NJ algorithm provides a single-tree topology, after a cascade of greedy pairing decisions that tries to simultaneously optimize the minimum evolution and the least squares criteria. Given that some sub-trees are more stable than others, and that the minimum evolution tree may not be achieved by the original NJ algorithm, we propose a multi-neighbor-joining (MNJ) algorithm capable of performing multiple pairing decisions at each level of the tree reconstruction, keeping various partial solutions along the recursive execution of the NJ algorithm. The main advantages of the new reconstruction procedure are: 1) as is the case for the original NJ algorithm, the MNJ algorithm is still a low-cost reconstruction method; 2) a further investigation of the alternative topologies may reveal stable and unstable sub-trees; 3) the chance of achieving the minimum evolution tree is greater; 4) tree topologies with very similar performances will be simultaneously presented at the output. When there are multiple unrooted tree topologies to be compared, a visualization tool is also proposed, using a radial layout to uniformly distribute the branches with the help of well-known metaheuristics used in computer science.
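The greedy pairing decision described in this abstract can be sketched with the standard NJ selection criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), where the pair with the smallest Q is joined. The sketch below ranks all candidate pairs by Q. Keeping the best few pairs instead of only the first is one possible reading of "multiple pairing decisions" and is an assumption of this example, not the authors' MNJ procedure; all names are illustrative.

```python
from itertools import combinations


def symmetric_distances(pairs):
    """Build a nested dict of symmetric distances from {(a, b): d} entries."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def rank_pairs(dist):
    """Return (Q, (taxon_a, taxon_b)) tuples sorted from best to worst.

    Uses the standard NJ criterion; lower Q means a better pair to join.
    """
    taxa = sorted(dist)
    n = len(taxa)
    row_sum = {t: sum(dist[t][u] for u in taxa if u != t) for t in taxa}
    scored = [
        ((n - 2) * dist[a][b] - row_sum[a] - row_sum[b], (a, b))
        for a, b in combinations(taxa, 2)
    ]
    scored.sort()
    return scored


def candidate_pairs(dist, keep=2):
    """Keep the `keep` best pairs (hypothetical multi-pairing step)."""
    return [pair for _, pair in rank_pairs(dist)[:keep]]
```

Original NJ would join only the first ranked pair and recompute the distances; a multi-pairing variant could branch on each pair returned by candidate_pairs and continue the reconstruction separately for each partial solution.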
