1.
Inferring evolutionary processes from phylogenies (total citations: 23, self-citations: 0, citations by others: 23)
MARK PAGEL 《Zoologica scripta》1997,26(4):331-348
Evolutionary processes shape the regular trends of evolution and are responsible for the diversity and distribution of contemporary species. They include correlated evolutionary change and trajectories of trait evolution, convergent and parallel evolution, differential rates of evolution, speciation and extinction, the order and direction of change in characters, and the nature of the evolutionary process itself: does change accumulate gradually, episodically, or in punctuational bursts? Phylogenies, in combination with information on species, contain the imprint of these historical evolutionary processes. By applying comparative methods based upon statistical models of evolution to well resolved phylogenies, it is possible to infer the historical evolutionary processes that must have existed in the past, given the patterns of diversity seen in the present. I describe a set of maximum likelihood statistical methods for inferring such processes. The methods estimate parameters of statistical models for inferring correlated evolutionary change in continuously varying characters, for detecting correlated evolution in discrete characters, for estimating rates of evolution, and for investigating the nature of the evolutionary process itself. They also anticipate the wealth of information becoming available to biological scientists from genetic studies that pin down relationships among organisms with unprecedented accuracy.
2.
Inferring speciation rates from phylogenies (total citations: 6, self-citations: 0, citations by others: 6)
Nee S 《Evolution; international journal of organic evolution》2001,55(4):661-668
It is possible to estimate the rate of diversification of clades from phylogenies with a temporal dimension. First, I present several methods for constructing confidence intervals for the speciation rate under the simple assumption of a pure birth process. I discuss the relationships among these methods in the hope of clarifying some fundamental theory in this area. Their performances are compared in a simulation study and one is recommended for use as a result. A variety of other questions that may, in fact, be the questions of primary interest (e.g., Has the rate of cladogenesis been declining?) are then recast as biological variants of the purely statistical question: Is the birth process model appropriate for my data? Seen in this way, a preexisting arsenal of statistical techniques is opened up for use in this area: in particular, techniques developed for the analysis of Poisson processes and the analysis of survival data. These two approaches start from different representations of the data (the branch lengths in the tree), and I explicitly relate the two. Aiming for a synoptic account of useful theory in this area, I briefly discuss some important results from the analysis of two distinct birth-death processes: the one introduced into this area by Hey (1992) is refitted with some powerful statistical tools.
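As a rough illustration of the pure birth setting described in this abstract, the sketch below computes a commonly used maximum likelihood estimate of the speciation rate: the number of speciation events divided by the total branch length of the tree. It is not taken from the paper. The function name `yule_speciation_rate`, the input layout (the time spent at each successive lineage count, starting from two lineages at the crown), and the toy numbers are all assumptions made for the example.

```python
def yule_speciation_rate(waiting_times, start_lineages=2):
    """Estimate the speciation rate under a pure birth (Yule) process.

    waiting_times[i] is the time during which (start_lineages + i) lineages
    exist; the last entry runs from the final speciation event to the present.
    Both the layout and the default of two starting lineages are assumptions.
    """
    # Total branch length: each interval contributes (lineage count) x (duration).
    total_branch_length = sum(
        (start_lineages + i) * t for i, t in enumerate(waiting_times)
    )
    if total_branch_length <= 0:
        raise ValueError("total branch length must be positive")
    # Every interval except the last one ends with a speciation event.
    n_events = len(waiting_times) - 1
    return n_events / total_branch_length


# Hypothetical tree: 2 lineages for 1.0 time units, 3 for 0.5, 4 for 0.5.
print(yule_speciation_rate([1.0, 0.5, 0.5]))
```

The estimate is a point value only; the confidence-interval constructions compared in the paper are not reproduced here.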
3.
MICHAEL HEADS 《Biological journal of the Linnean Society. Linnean Society of London》2009,98(4):757-774
The present study illustrates a method for analysing the biogeography of a group that is based on the group's phylogeny but does not invoke founder dispersal or centre of origin. The case studies presented include groups from many different parts of the world, but most are from the south-west Pacific. The idea that basal groups are ancestral is not valid as a generalization. Neither the basal group, nor the oldest fossil represents the centre of origin, the time of origin or the ancestral ecology. Basal groups comprise less diverse sister groups and their distributions occur around centres of differentiation in already widespread ancestors, and not centres of origin for the whole group. Thus, the sequence of nodes in a phylogeny may indicate the spatial sequence of differentiation in a widespread ancestor rather than a series of founder dispersal events. Allocation of clades to a priori geographic areas, such as the continents, in the initial stages of biogeographic analysis has often involved incorrect assumptions of sympatry. This has led to the idea that the ‘areas of sympatry’ were centres of origin. Areas other than those defined by the taxa themselves need not be used in analysis. The fossil-calibrated molecular clock, with dates transmogrified from minimum to maximum dates, has been used to test for vicariance. Recent work in population genetics, however, indicates that allopatry is caused by vicariance rather than founder dispersal, and so vicariance can instead be used to test the clock. Deriving evolutionary chronology by calibrating spatial vicariance in molecular clades with associated tectonic events is more reasonable than relying on the fossil record to give maximum (absolute) dates. © 2009 The Linnean Society of London, Biological Journal of the Linnean Society, 2009, 98, 757–774.
4.
Martin Lott Andreas Spillner Katharina T Huber Anna Petri Bengt Oxelman Vincent Moulton 《BMC evolutionary biology》2009,9(1):216
Background
Gene trees that arise in the context of reconstructing the evolutionary history of polyploid species are often multiply-labeled, that is, the same leaf label can occur several times in a single tree. This property considerably complicates the task of forming a consensus of a collection of such trees compared to usual phylogenetic trees.
5.
Gadagkar SR Rosenberg MS Kumar S 《Journal of experimental zoology. Part B. Molecular and developmental evolution》2005,304(1):64-74
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
6.
A statistical test of phylogenies estimated from sequence data (total citations: 4, self-citations: 0, citations by others: 4)
W H Li 《Molecular biology and evolution》1989,6(4):424-435
A simple approach to testing the significance of the branching order, estimated from protein or DNA sequence data, of three taxa is proposed. The branching order is inferred by the transformed-distance method, under the assumption that one or two outgroups are available, and the branch lengths are estimated by the least-squares method. The inferred branching order is considered significant if the estimated internodal distance is significantly greater than zero. To test this, a formula for the variance of the internodal distance has been developed. The statistical test proposed has been checked by computer simulation. The same test also applies to the case of four taxa with no outgroup, if one considers an unrooted tree. Formulas for the variances of internodal distances have also been developed for the case of five taxa. Conditions are given under which it is more efficient to add the sequence of a fifth taxon than to do 25% more nucleotide sequencing in each of the original four. A method is presented for combining analyses of disparate data to get a single P value. Finally, the test, applied to the human-chimpanzee-gorilla problem, shows that the issue is not yet resolved.
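The decision rule in this abstract (a branching order is significant when the internodal distance is significantly greater than zero) can be sketched as a one-sided test. The sketch assumes a normal approximation for the estimated distance and takes its standard error as given; the paper's variance formula is not reproduced, and the name `internodal_distance_p_value` and the example numbers are hypothetical.

```python
import math


def internodal_distance_p_value(distance, standard_error):
    """One-sided P value for H0: internodal distance = 0 vs. H1: distance > 0.

    Assumes the estimate is approximately normal with the supplied standard
    error; this is an illustrative assumption, not the paper's derivation.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = distance / standard_error
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical estimate: internodal distance 0.02 with standard error 0.01.
print(internodal_distance_p_value(0.02, 0.01))
```

A small P value would support the inferred branching order; a value near 0.5 corresponds to an internodal distance indistinguishable from zero.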
7.
Using DNA sequence data from pathogens to infer transmission networks has traditionally been done in the context of epidemics and outbreaks. Sequence data could analogously be applied to cases of ubiquitous commensal bacteria; however, instead of inferring chains of transmission to track the spread of a pathogen, sequence data for bacteria circulating in an endemic equilibrium could be used to infer information about host contact networks. Here, we show, using simulated data, that multilocus DNA sequence data, based on multilocus sequence typing schemes (MLST), from isolates of commensal bacteria can be used to infer both local and global properties of the contact networks of the populations being sampled. Specifically, for MLST data simulated from small-world networks, the small world parameter controlling the degree of structure in the contact network can robustly be estimated. Moreover, we show that pairwise distances in the network (degrees of separation) correlate with genetic distances between isolates, so that how far apart two individuals in the network are can be inferred from MLST analysis of their commensal bacteria. This result has important consequences, and we show an example from epidemiology: how this result could be used to test for infectious origins of diseases of unknown etiology.
8.
A new method is developed for calculating sequence substitution probabilities using Markov chain Monte Carlo (MCMC) methods. The basic strategy is to use uniformization to transform the original continuous time Markov process into a Poisson substitution process and a discrete Markov chain of state transitions. An efficient MCMC algorithm for evaluating substitution probabilities by this approach using a continuous gamma distribution to model site-specific rates is outlined. The method is applied to the problem of inferring branch lengths and site-specific rates from nucleotide sequences under a general time-reversible (GTR) model and a computer program BYPASSR is developed. Simulations are used to examine the performance of the new program relative to an existing program BASEML that uses a discrete approximation for the gamma distributed prior on site-specific rates. It is found that BASEML and BYPASSR are in close agreement when inferring branch lengths, regardless of the number of rate categories used, but that BASEML tends to underestimate high site-specific substitution rates, and to overestimate intermediate rates, when fewer than 50 rate categories are used. Rate estimates obtained using BASEML agree more closely with those of BYPASSR as the number of rate categories increases. Analyses of the posterior distributions of site-specific rates from BYPASSR suggest that a large number of taxa are needed to obtain precise estimates of site-specific rates, especially when rates are very high or very low. The method is applied to analyze 45 sequences of the alpha 2B adrenergic receptor gene (A2AB) from a sample of eutherian taxa. In general, the pattern expected for regions under negative selection is observed with third codon positions having the highest inferred rates, followed by first codon positions and with second codon positions having the lowest inferred rates. 
Several sites show exceptionally high substitution rates at second codon positions that may represent the effects of positive selection.
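The uniformization step mentioned in this abstract rewrites a continuous-time Markov process with rate matrix Q as a Poisson process of rate μ driving a discrete chain R = I + Q/μ, so that P(t) = Σ_k e^{-μt}(μt)^k/k! · R^k. The sketch below evaluates that series directly. It is a deterministic illustration only, not the MCMC algorithm of the paper and not the BYPASSR program; the function name, the tolerance, and the Jukes-Cantor example matrix are assumptions.

```python
import math


def transition_probabilities(Q, t, tol=1e-12, max_terms=10_000):
    """Approximate P(t) = exp(Q t) by uniformization (pure Python, small Q)."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-Q[i][i] for i in range(n))  # uniformization rate: fastest exit rate
    if mu <= 0.0 or t == 0.0:
        return identity  # no substitutions can occur
    # One step of the discrete chain: R = I + Q / mu (each row sums to 1).
    R = [[identity[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    weight = math.exp(-mu * t)  # Poisson probability of zero events
    covered = weight            # Poisson mass accounted for so far
    power = identity            # R^k, starting at k = 0
    P = [[weight * power[i][j] for j in range(n)] for i in range(n)]
    k = 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        power = [
            [sum(power[i][m] * R[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
        weight *= mu * t / k  # Poisson probability of exactly k events
        covered += weight
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
    return P


# Illustrative Jukes-Cantor rate matrix with off-diagonal rate 0.1.
a = 0.1
Q = [[-3 * a if i == j else a for j in range(4)] for i in range(4)]
for row in transition_probabilities(Q, 2.0):
    print([round(p, 6) for p in row])
```

For very large μt the leading factor e^{-μt} underflows, so a practical implementation would rescale the weights; that refinement is left out to keep the sketch short.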
9.
Phylogenetic relationships and taxonomic distinctiveness of closely related species and subspecies are most accurately inferred from data derived from multiple independent loci. Here, we apply several approaches for understanding species-level relationships using data from 18 nuclear DNA loci and 1 mitochondrial DNA locus within currently described species and subspecies of Sistrurus rattlesnakes. Collectively, these methods provide evidence that a currently described species, the massasauga rattlesnake (Sistrurus catenatus), consists of two well-supported clades, one composed of the two western subspecies (S. c. tergeminus and S. c. edwardsii) and the other the eastern subspecies (S. c. catenatus). Within pigmy rattlesnakes (S. miliarius), however, there is not strong support across methods for any particular grouping at the subspecific level. Monophyly based tests for taxonomic distinctiveness show evidence for distinctiveness of all subspecies but this support is strongest by far for the S. c. catenatus clade. Because support for the distinctiveness of S. c. catenatus is both strong and consistent across methods, and due to its morphological distinctiveness and allopatric distribution, we suggest that this subspecies be elevated to full species status, which has significant conservation implications. Finally, most divergence time estimates based upon a fossil-calibrated species tree are > 50% younger than those from a concatenated gene tree analysis and suggest that an active period of speciation within Sistrurus occurred within the late Pliocene/Pleistocene eras.
10.
Statistical consistency in phylogenetics has traditionally referred to the accuracy of estimating phylogenetic parameters for a fixed number of species as we increase the number of characters. However, it is also useful to consider a dual type of statistical consistency where we increase the number of species, rather than characters. This raises some basic questions: what can we learn about the evolutionary process as we increase the number of species? In particular, does having more species allow us to infer the ancestral state of characters accurately? This question is particularly important when sequence evolution varies in a complex way from character to character, as methods applicable for i.i.d. models may no longer be valid. In this paper, we assemble a collection of results to analyse various approaches for inferring ancestral information with increasing accuracy as the number of taxa increases.
11.
12.
Inferring a population structure for Staphylococcus epidermidis from multilocus sequence typing data
Miragaia M Thomas JC Couto I Enright MC de Lencastre H 《Journal of bacteriology》2007,189(6):2540-2552
Despite its importance as a human pathogen, information on population structure and global epidemiology of Staphylococcus epidermidis is scarce and the relative importance of the mechanisms contributing to clonal diversification is unknown. In this study, we addressed these issues by analyzing a representative collection of S. epidermidis isolates from diverse geographic and clinical origins using multilocus sequence typing (MLST). Additionally, we characterized the mobile element (SCCmec) carrying the genetic determinant of methicillin resistance. The 217 S. epidermidis isolates from our collection were split by MLST into 74 types, suggesting a high level of genetic diversity. Analysis of MLST data using the eBURST algorithm revealed the existence of nine epidemic clonal lineages that were disseminated worldwide. One single clonal lineage (clonal complex 2) comprised 74% of the isolates, whereas the remaining isolates were clustered into 8 minor clonal lineages and 13 singletons. According to our evolutionary model, SCCmec was acquired at least 56 times by S. epidermidis. Although geographic dissemination of S. epidermidis strains and the value of the index of association between the alleles, 0.2898 (P < 0.05), support the clonality of S. epidermidis species, examination of the sequence changes at MLST loci during clonal diversification showed that recombination gives rise to new alleles approximately twice as frequently as point mutations. We suggest that S. epidermidis has a population with an epidemic structure, in which nine clones have emerged upon a recombining background and evolved quickly through frequent transfer of genetic mobile elements, including SCCmec.
13.
Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
14.
The harmful effects of inbreeding can be reduced if deleterious recessive alleles were removed (purged) by selection against homozygotes in earlier generations. If only a few generations are involved, purging is due almost entirely to recessive alleles that reduce fitness to near zero. In this case the amount of purging and allele frequency change can be inferred approximately from pedigree data alone and are independent of the allele frequency. We examined pedigrees of 59,778 U.S. Jersey cows. Most of the pedigrees were for six generations, but a few went back slightly farther. Assuming recessive homozygotes have fitness 0, the reduction of total genetic load due to purging is estimated at 17%, but most of this is not expressed, being concealed by dominant alleles. Considering those alleles that are currently expressed due to inbreeding, the estimated amount of purging is such as to reduce the expressed load (inbreeding depression) in the current generation by 12.6%. That the reduction is not greater is due mainly to (1) generally low inbreeding levels because breeders in the past have tended to avoid consanguineous matings, and (2) there is essentially no information more than six generations back. The methods used here should be applicable to other populations for which there is pedigree information.
15.
Background
Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al.
16.
17.
Inferring admixture proportions from molecular data (total citations: 19, self-citations: 2, citations by others: 17)
We derive here two new estimators of admixture proportions based on a
coalescent approach that explicitly takes into account molecular
information as well as gene frequencies. These estimators can be applied to
any type of molecular data (such as DNA sequences, restriction fragment
length polymorphisms [RFLPs], or microsatellite data) for which the extent
of molecular diversity is related to coalescent times. Monte Carlo
simulation studies are used to analyze the behavior of our estimators. We
show that one of them (mY) appears suitable for estimating admixture from
molecular data because of its absence of bias and relatively low variance.
We then compare it to two conventional estimators that are based on gene
frequencies. mY proves to be less biased than conventional estimators over
a wide range of situations and especially for microsatellite data. However,
its variance is larger than that of conventional estimators when parental
populations are not very differentiated. The variance of mY becomes smaller
than that of conventional estimators only if parental populations have been
kept separated for about N generations and if the mutation rate is high.
Simulations also show that several loci should always be studied to achieve
a drastic reduction of variance and that, for microsatellite data, the mean
square error of mY rapidly becomes smaller than that of conventional
estimators if enough loci are surveyed. We apply our new estimator to the
case of admixed wolflike Canid populations tested for microsatellite data.
18.
Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants (total citations: 3, self-citations: 2, citations by others: 1)
Likelihood methods and methods using invariants are procedures for inferring the evolutionary relationships among species through statistical analysis of nucleic acid sequences. A likelihood-ratio test may be used to determine the feasibility of any tree for which the maximum likelihood can be computed. The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method. In the case of a small number of species (four or five), these methods may be used to find a confidence set for the correct tree. An exact version of Lake's asymptotic χ² test has been mentioned by Holmquist et al. Under very general assumptions, a one-sided exact test is appropriate, which greatly increases power.
19.
Soltis DE Soltis PS Mort ME Chase MW Savolainen V Hoot SB Morton CM 《Systematic biology》1998,47(1):32-42
To explore the feasibility of parsimony analysis for large data sets, we conducted heuristic parsimony searches and bootstrap analyses on separate and combined DNA data sets for 190 angiosperms and three outgroups. Separate data sets of 18S rDNA (1,855 bp), rbcL (1,428 bp), and atpB (1,450 bp) sequences were combined into a single matrix 4,733 bp in length. Analyses of the combined data set show great improvements in computer run times compared to those of the separate data sets and of the data sets combined in pairs. Six searches of the 18S rDNA + rbcL + atpB data set were conducted; in all cases TBR branch swapping was completed, generally within a few days. In contrast, TBR branch swapping was not completed for any of the three separate data sets, or for the pairwise combined data sets. These results illustrate that it is possible to conduct a thorough search of tree space with large data sets, given sufficient signal. In this case, and probably most others, sufficient signal for a large number of taxa can only be obtained by combining data sets. The combined data sets also have higher internal support for clades than the separate data sets, and more clades receive bootstrap support of ≥ 50% in the combined analysis than in analyses of the separate data sets. These data suggest that one solution to the computational and analytical dilemmas posed by large data sets is the addition of nucleotides, as well as taxa.