Similar literature
20 similar documents found
1.
Taxon sampling and the accuracy of phylogenetic analyses   Total citations: 6, self-citations: 2, citations by others: 4
Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In addition, accuracy of inferences about evolutionary processes obtained from phylogenetic analyses is improved significantly by thorough taxon sampling efforts. Many recent efforts to improve phylogenetic estimates have focused instead on increasing sequence length or the number of overall characters in the analysis, and this often does have a beneficial effect on the accuracy of phylogenetic analyses. However, phylogenetic analyses of few taxa (but each represented by many characters) can be subject to strong systematic biases, which in turn produce high measures of repeatability (such as bootstrap proportions) in support of incorrect or misleading phylogenetic results. Thus, it is important for phylogeneticists to consider both the sampling of taxa and the sampling of characters in designing phylogenetic studies. Taxon sampling also improves estimates of evolutionary parameters derived from phylogenetic trees, and is thus important for improved applications of phylogenetic analyses. Analysis of sensitivity to taxon inclusion, the possible effects of long-branch attraction, and sensitivity of parameter estimation for model-based methods should be a part of any careful and thorough phylogenetic analysis. Furthermore, recent improvements in phylogenetic algorithms and in computational power have removed many constraints on analyzing large, thoroughly sampled data sets. Thorough taxon sampling is thus one of the most practical ways to improve the accuracy of phylogenetic estimates, as well as the accuracy of biological inferences that are based on these phylogenetic trees.

2.
3.
4.
Performance measures of phylogenetic estimation methods such as accuracy, consistency, and power are an attempt at summarizing an ensemble of a given estimator's behavior. These summaries characterize an ensemble behavior with a single number, leading to a variety of definitions. In particular, the relationships between different performance measures such as accuracy and consistency or accuracy and error depend on the exact definition of these measures. In addition, it is relatively common to use large-sample behavior to infer similar behavior for small samples. In fact, large-sample results such as the claimed asymptotic efficiency of the maximum-likelihood estimator are often uninformative for small samples. Conversely, small-sample behavior using simulations is sometimes used to imply large-sample behavior such as consistency. However, such extrapolation is often difficult. How the performance of a phylogenetic estimator scales with the addition of taxa must be qualified with respect to whether the whole tree is being estimated or a fixed subset of taxa is being estimated. It must also be qualified with respect to how tree models are sampled. Over the ensemble of all possible trees of a given size, the performance of the estimators for the whole tree estimate suffers when the tree size becomes larger. However, under certain models of cladogenesis, the estimate can improve with the addition of taxa. In fact, at all numbers of taxa there are subsets of tree models that are easier to estimate than others. This suggests that with judicious addition or subtraction of taxa we can move from tree models that are more difficult to estimate at one number of taxa to those that are easier to estimate at another number of taxa.

5.
JJ Wiens  J Tiu 《PloS one》2012,7(8):e42925

Background

Phylogenies are essential to many areas of biology, but phylogenetic methods may give incorrect estimates under some conditions. A potentially common scenario of this type is when few taxa are sampled and terminal branches for the sampled taxa are relatively long. However, the best solution in such cases (i.e., sampling more taxa versus more characters) has been highly controversial. A widespread assumption in this debate is that added taxa must be complete (no missing data) in order to save analyses from the negative impacts of limited taxon sampling. Here, we evaluate whether incomplete taxa can also rescue analyses under these conditions (empirically testing predictions from an earlier simulation study).

Methodology/Principal Findings

We utilize DNA sequence data from 16 vertebrate species with well-established phylogenetic relationships. In each replicate, we randomly sample 4 species, estimate their phylogeny (using Bayesian, likelihood, and parsimony methods), and then evaluate whether adding in the remaining 12 species (which have 50, 75, or 90% of their data replaced with missing data cells) can improve phylogenetic accuracy relative to analyzing the 4 complete taxa alone. We find that in those cases where sampling few taxa yields an incorrect estimate, adding taxa with 50% or 75% missing data can frequently (>75% of relevant replicates) rescue Bayesian and likelihood analyses, recovering accurate phylogenies for the original 4 taxa. Even taxa with 90% missing data can sometimes be beneficial.

Conclusions

We show that adding taxa that are highly incomplete can improve phylogenetic accuracy in cases where analyses are misled by limited taxon sampling. These surprising empirical results confirm those from simulations, and show that the benefits of adding taxa may be obtained with unexpectedly small amounts of data. These findings have important implications for the debate on sampling taxa versus characters, and for studies attempting to resolve difficult phylogenetic problems.
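
The replicate design described in this abstract (keep 4 of 16 taxa complete, then add the other 12 with 50, 75, or 90% of their data cells set to missing) can be sketched in Python. This is a minimal illustration only, not the authors' pipeline: the function name `build_replicate`, the dictionary-based alignment, and the choice to mask randomly chosen cells with `?` are assumptions, since the abstract does not say how the missing cells were distributed. Tree estimation itself is left out.

```python
import random


def build_replicate(alignment, n_complete=4, missing_fraction=0.5, seed=0):
    """Return (complete_taxa, replicate_alignment) for one replicate.

    alignment: dict mapping taxon name -> aligned sequence string (hypothetical layout).
    n_complete taxa are kept intact; every other taxon has missing_fraction
    of its cells replaced by "?" (random cell choice is an assumption).
    """
    rng = random.Random(seed)
    taxa = sorted(alignment)
    complete = set(rng.sample(taxa, n_complete))

    replicate = {}
    for taxon in taxa:
        seq = alignment[taxon]
        if taxon in complete:
            # Complete taxa are carried over unchanged.
            replicate[taxon] = seq
            continue
        # Pick which cells to blank out for this incomplete taxon.
        n_missing = round(missing_fraction * len(seq))
        masked = set(rng.sample(range(len(seq)), n_missing))
        replicate[taxon] = "".join(
            "?" if i in masked else base for i, base in enumerate(seq)
        )
    return sorted(complete), replicate
```

In a study of this kind, the tree for the 4 complete taxa would be estimated once from those taxa alone and once from the full replicate, and the two estimates compared against the known relationships.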

6.
Bayesian clustering methods have been widely used for studying species delimitation and genetic introgression. In order to test the effect of phylogenetic relationships and sampling scheme on the inferred clustering solution and on the performance of Bayesian clustering analysis, I simulated genotypes of the interfertile oak species Quercus robur, Quercus petraea, and Quercus pubescens and ran analyses using two popular software programs, STRUCTURE and BAPS. First, based on purebred simulations, I compared clustering solutions resulting from different sample size configurations. While the clustering solution generally reflected the taxonomic relationships when equal samples of each species were included, a spurious partition was inferred by STRUCTURE when some species were represented by larger and others by smaller samples. In very unbalanced configurations, STRUCTURE failed to identify the three species, even if three subpopulations were assumed. By contrast, BAPS could properly identify the three species under any sampling scheme. Second, based on simulations of purebreds and hybrids, I tested the performance of individual assignments with a variable number of loci. This analysis showed that STRUCTURE can detect introgressed individuals more efficiently than BAPS. However, BAPS could assign purebreds more efficiently with a lower number of loci. Method performance also depended on phylogenetic relationships. In the case of Q. petraea, Q. pubescens, and their hybrids, method performance was lower due to their phylogenetic affinity. Inclusion of three instead of two species in the analysis led to a reduction in performance, and to misclassification of hybrids, which often reflected the phylogenetic affinity between Q. petraea and Q. pubescens.

7.
We asked whether regenerating hindlimb motor axons would innervate inappropriate hindlimb regions if competition from appropriate innervation were prevented. The three ventral roots that innervate the hindlimb in the bullfrog (Rana catesbeiana) tadpole were transected, and the two more rostral roots were ligated to prevent regeneration. The most caudal root, which primarily supplies more distal limb musculature in unoperated tadpoles, was left free to regenerate. The specificity of regeneration was assessed by retrogradely labeling spinal motoneurons with HRP placed in the ventral thigh, a region that receives most of its innervation from the ligated roots. Despite the lack of competition from appropriate innervation, the regenerating root did not provide substantial innervation to proximal limb musculature. The same result was obtained in tadpoles operated upon at stages when regeneration of motor axons is specific and in tadpoles at stages when regenerating motor axons do not reinnervate their appropriate targets (Farel and Bemelmans, 1986), although the mechanisms in each case are likely different.

8.
We perform Bayesian phylogenetic analyses on cytochrome b sequences from 264 of the 290 extant cetartiodactyl mammals (whales plus even-toed ungulates) and two recently extinct species, the 'Mouse Goat' and the 'Irish Elk'. Previous primary analyses have included only a small portion of the species diversity within Cetartiodactyla, while a complete supertree analysis lacks resolution and branch lengths, limiting its utility for comparative studies. The benefits of using a single-gene approach include rapid phylogenetic estimates for a large number of species. However, single-gene phylogenies often differ dramatically from studies involving multiple datasets, suggesting that they often are unreliable. Nevertheless, based on recovery of benchmark clades (clades supported in prior studies based on multiple independent datasets) and recovery of undisputed traditional taxonomic groups, Cytb performs extraordinarily well in resolving cetartiodactyl phylogeny when taxon sampling is dense. Missing data (taxa with partial sequences), however, can compromise phylogenetic accuracy, suggesting a tradeoff between the benefits of adding taxa and introducing question marks. In the full data, a few species with short sequences appear misplaced; however, sequence length alone seems a poor predictor of this phenomenon, as other taxa with equally short sequences were not conspicuously misplaced. Although we recommend awaiting a better supported phylogeny based on more character data to reconsider classification and taxonomy within Cetartiodactyla, the new phylogenetic hypotheses provided here represent the currently best available tool for comparative species-level studies within this group. Cytb has been sequenced for a large percentage of mammals and appears to be a reliable phylogenetic marker as long as taxon sampling is dense. Therefore, an opportunity exists now to reconstruct detailed phylogenies of most of the major mammalian clades to rapidly provide much needed tools for species-level comparative studies.

9.

Background  

Malagasy tenrecs belong to the Afrotherian clade of placental mammals and comprise three subfamilies divided into eight genera (Tenrecinae: Tenrec, Echinops, Setifer and Hemicentetes; Oryzorictinae: Oryzorictes, Limnogale and Microgale; Geogalinae: Geogale). The diversity of their morphology and incomplete taxon sampling have until now made it difficult to resolve phylogenies based on either morphology or molecular data for this group. Therefore, in order to delineate the evolutionary history of this family, phylogenetic and dating analyses were performed on a dataset of four nuclear genes (ADRA2B, AR, GHR and vWF) including all Malagasy tenrec genera. Moreover, the influence of both taxon sampling and data partitioning on the accuracy of the estimated ages was assessed.

10.
Fossil taxa are critical to inferences of historical diversity and the origins of modern biodiversity, but realizing their evolutionary significance is contingent on restoring fossil species to their correct position within the tree of life. For most fossil species, morphology is the only source of data for phylogenetic inference; this has traditionally been analysed using parsimony, the predominance of which is currently challenged by the development of probabilistic models that achieve greater phylogenetic accuracy. Here, based on simulated and empirical datasets, we explore the relative efficacy of competing phylogenetic methods in terms of clade support. We characterize clade support using bootstrapping for parsimony and Maximum Likelihood, and intrinsic Bayesian posterior probabilities, collapsing branches that exhibit less than 50% support. Ignoring node support, Bayesian inference is the most accurate method in estimating the tree used to simulate the data. After assessing clade support, Bayesian and Maximum Likelihood exhibit comparable levels of accuracy, and parsimony remains the least accurate method. However, Maximum Likelihood is less precise than Bayesian phylogeny estimation, and Bayesian inference recaptures more correct nodes with higher support compared to all other methods, including Maximum Likelihood. We assess the effects of these findings on empirical phylogenies. Our results indicate probabilistic methods should be favoured over parsimony.

11.
We propose a new approach to fitting marginal models to clustered data when cluster size is informative. This approach uses a generalized estimating equation (GEE) that is weighted inversely with the cluster size. We show that our approach is asymptotically equivalent to within-cluster resampling (WCR; Hoffman, Sen, and Weinberg, 2001, Biometrika 73, 13-22), a computationally intensive approach in which replicate data sets containing a randomly selected observation from each cluster are analyzed, and the resulting estimates averaged. Using simulated data and an example involving dental health, we show the superior performance of our approach compared to unweighted GEE, the equivalence of our approach with WCR for large sample sizes, and the superior performance of our approach compared with WCR when sample sizes are small.
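
The relationship between inverse-cluster-size weighting and within-cluster resampling can be illustrated with the simplest possible case. The sketch below assumes an intercept-only marginal model with an independence working correlation, where the weighted estimating equation is solved by the average of the cluster means. The function names and the NumPy-based layout are illustrative assumptions, not the authors' implementation, and no regression covariates or variance estimates are handled.

```python
import numpy as np


def cluster_weighted_mean(clusters):
    """Solve sum_i (1/n_i) * sum_j (y_ij - mu) = 0 for mu.

    Each observation is weighted by the inverse of its cluster size,
    so every cluster contributes equally regardless of how large it is.
    """
    values = np.concatenate([np.asarray(c, dtype=float) for c in clusters])
    weights = np.concatenate([np.full(len(c), 1.0 / len(c)) for c in clusters])
    return float(np.sum(weights * values) / np.sum(weights))


def wcr_mean(clusters, n_resamples=2000, seed=0):
    """Within-cluster resampling: draw one observation per cluster,
    estimate the mean from that draw, and average over many draws."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_resamples):
        draw = [c[rng.integers(len(c))] for c in clusters]
        estimates.append(np.mean(draw))
    return float(np.mean(estimates))


def unweighted_mean(clusters):
    """Ordinary pooled mean, which lets large clusters dominate."""
    values = np.concatenate([np.asarray(c, dtype=float) for c in clusters])
    return float(np.mean(values))
```

When cluster size is related to the outcome, the pooled mean is pulled toward the large clusters, while the weighted and resampled estimates both target the mean of a typical member of a typical cluster; the weighted version gets there without any resampling.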

12.
The pooling robustness property of distance sampling results in unbiased abundance estimation even when sources of variation in detection probability are not modeled. However, this property cannot be relied upon to produce unbiased subpopulation abundance estimates when using a single pooled detection function that ignores subpopulations. We investigate by simulation the effect of differences in subpopulation detectability upon bias in subpopulation abundance estimates. We contrast subpopulation abundance estimates using a pooled detection function with estimates derived using a detection function model employing a subpopulation covariate. Using point transect survey data from a multispecies songbird study, species-specific abundance estimates are compared using pooled detection functions with and without a small number of adjustment terms, and a detection function with species as a covariate. With simulation, we demonstrate the bias of subpopulation abundance estimates when a pooled detection function is employed. The magnitude of the bias is positively related to the magnitude of disparity between the subpopulation detection functions. However, the abundance estimate for the entire population remains unbiased except when there is extreme heterogeneity in detection functions. Inclusion of a detection function model with a subpopulation covariate essentially removes the bias of the subpopulation abundance estimates. The analysis of the songbird point count surveys shows some bias in species-specific abundance estimates when a pooled detection function is used. Pooling robustness is a unique property of distance sampling, producing unbiased abundance estimates at the level of the study area even in the presence of large differences in detectability between subpopulations. 
In situations where subpopulation abundance estimates are required for data-poor subpopulations and where the subpopulations can be identified, we recommend the use of subpopulation as a covariate to reduce bias induced in subpopulation abundance estimates.

13.
The Ixodes ricinus species complex is a group of ticks distributed in almost all geographic regions of the world. Lyme borreliosis spirochetes are primarily transmitted by tick species within this complex. It has been hypothesized that the Lyme vector ticks around the world are closely related and represent a monophyletic group. This implies that vector competence in ixodid ticks for Lyme agents might have evolved only once. To test this hypothesis, we used a molecular phylogenetic approach. Two fragments of mitochondrial 16S ribosomal deoxyribonucleic acid were sequenced from 11 species in the I. ricinus complex and from 16 other species of Ixodes. Phylogenetic analysis using Bayesian methodology indicated that the I. ricinus complex is not a monophyletic group unless 3 additional Ixodes species are included in it. The known major vectors of Lyme disease agents in different areas of the world are not sister taxa. This suggests that acquisition of the ability to transmit borreliosis agents in species of Ixodes may have multiple origins.

14.
Summary: The 16S ribosomal RNA (30S subunit) of Rhodopseudomonas spheroides has been characterized in terms of T1 ribonuclease digestion products. This fingerprint ultimately permits the placement of R. spheroides into a detailed procaryotic phylogenetic tree. Given the number of major procaryotic lines that have been characterized in these terms to date, one can tentatively place the Athiorhodaceae closer to the Vibrio-Enteric group than to the Bacillaceae or Cyanophyta.

15.
Single ion channel currents can be analysed by hidden or aggregated Markov models. A classical result from Fredkin et al. (Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, vol I, pp 269–289, 1985) states that the maximum number of identifiable parameters is bounded by $2 n_o n_c$, where $n_o$ and $n_c$ denote the number of open and closed states, respectively. We show that this bound can be overcome when the probabilities of the initial distribution are known and the data consist of several sweeps.
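
As a small worked example of the bound quoted above (two times the product of the numbers of open and closed states), the following Python sketch compares it with the number of rates in a fully connected continuous-time Markov model, in which every off-diagonal entry of the generator matrix is a free parameter. The function names are hypothetical, and the fully connected topology is an assumption made only for the comparison; real channel models usually have far fewer allowed transitions.

```python
def max_identifiable_parameters(n_open, n_closed):
    """Classical upper bound on identifiable parameters: 2 * n_o * n_c."""
    return 2 * n_open * n_closed


def fully_connected_rate_count(n_open, n_closed):
    """Number of off-diagonal rates in an n-state generator matrix, n = n_o + n_c.

    Assumes every transition between distinct states is allowed.
    """
    n_states = n_open + n_closed
    return n_states * (n_states - 1)


if __name__ == "__main__":
    # Example: 2 open states and 3 closed states.
    print(max_identifiable_parameters(2, 3))   # 12
    print(fully_connected_rate_count(2, 3))    # 20
```

With 2 open and 3 closed states, the bound allows at most 12 identifiable parameters while a fully connected five-state model has 20 rates, so under the classical result such a model could not be fully determined from a single stationary record.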

16.
Sakaguchi M  Inagaki Y  Hashimoto T 《Gene》2007,405(1-2):47-54
Owing to recent advances in evolutionary biology, the majority of eukaryotes are classified into six assemblages called "supergroups". However, several eukaryotic groups show no clear evolutionary affinity to any of the six supergroups. Centrohelida, one of the major heliozoan groups, is one such unresolved lineage. In this study, we newly determined the genes encoding translation elongation factor 2 (EF2), cytosolic heat shock protein 70 (HSP70), and cytosolic heat shock protein 90 (HSP90) from the centroheliozoan Raphidiophrys contractilis. The three Raphidiophrys genes were then combined with previously determined actin, alpha-tubulin, beta-tubulin, and SSU rRNA sequences to phylogenetically analyze the position of Centrohelida in global eukaryotic phylogeny. Although the multi-gene data sets examined in this study are the largest ones including the centroheliozoan sequences, the relationships between Centrohelida and the eukaryotic groups considered were unresolved. Our careful investigation revealed that the phylogenetic estimates were highly sensitive to the genes included in the multi-gene alignment. The signal of SSU rRNA and that of alpha-tubulin appeared to conflict with one another: the former strongly prefers a monophyly of Diplomonadida (e.g., Giardia), Parabasalia (e.g., Trichomonas), Heterolobosea (e.g., Naegleria), and Euglenozoa (e.g., Trypanosoma), while the latter unites Diplomonadida, Parabasalia, Metazoa, and Fungi. In addition, EF2 robustly unites Rhodophyta and Viridiplantae, while the remaining genes considered in this study do not positively support this particular relationship. Thus, it is difficult to identify the phylogenetic relatives of Centrohelida in the present study, since strong (and sometimes conflicting) gene-specific "signals" are predominant in the current multi-gene data. We conclude that larger scale multi-gene phylogenies are necessary to elucidate the origin and evolution of Centrohelida.

17.
Plumage-based phylogenetic analyses of the Merops bee-eaters   Total citations: 1, self-citations: 0, citations by others: 1
D. BRENT BURT 《Ibis》2004,146(3):481-492
I review previous systematic work on the family Meropidae and present phylogenetic hypotheses derived from my analyses of colour, pattern and shape variation in 30 plumage regions among species and subspecies in this family. Consistent patterns are seen across shallow portions of the trees. Uncertainty remains concerning the placement of several deep branches within this group's phylogeny. In particular, the phylogenetic placement of Meropogon forsteni and Merops breweri, M. ornatus, M. hirundineus and M. boehmi remains uncertain. The biogeographical patterns in the resultant trees are similar with either a Southeast Asian or African origin for the family, with most of the early diversification occurring in Africa, and with multiple independent subsequent invasions of non-African areas.

18.
Interest in methods that estimate speciation and extinction rates from molecular phylogenies has increased over the last decade. The application of such methods requires reliable estimates of tree topology and node ages, which are frequently obtained using standard phylogenetic inference combining concatenated loci and molecular dating. However, this practice disregards population-level processes that generate gene tree/species tree discordance. We evaluated the impact of employing concatenation and coalescent-based phylogeny inference in recovering the correct macroevolutionary regime using simulated data based on the well-established diversification rate shift of delphinids in Cetacea. We found that under scenarios of strong incomplete lineage sorting, macroevolutionary analysis of phylogenies inferred by concatenating loci failed to recover the delphinid diversification shift, while the coalescent-based tree consistently retrieved the correct rate regime. We suggest that ignoring microevolutionary processes reduces the power of methods that estimate macroevolutionary regimes from molecular data.

19.
20.
Inferring the relationships among Bilateria has been an active and controversial research area since Haeckel. The lack of a sufficient number of phylogenetically reliable characters was the main limitation of traditional phylogenies based on morphology. With the advent of molecular data, this problem has been replaced by another one, statistical inconsistency, which stems from an erroneous interpretation of convergences induced by multiple changes. The analysis of alignments rich in both genes and species, combined with a probabilistic method (maximum likelihood or Bayesian) using sophisticated models of sequence evolution, should alleviate these two major limitations. We applied this approach to a dataset of 94 genes and 79 species using CAT, a previously developed model accounting for site-specific amino acid replacement patterns. The resulting tree is in good agreement with current knowledge: the monophyly of most major groups (e.g. Chordata, Arthropoda, Lophotrochozoa, Ecdysozoa, Protostomia) was recovered with high support. Two results are surprising and are discussed in an evo-devo framework: the sister-group relationship of Platyhelminthes and Annelida to the exclusion of Mollusca, contradicting the Neotrochozoa hypothesis, and, with a lower statistical support, the paraphyly of Deuterostomia. These results, in particular the status of deuterostomes, need further confirmation, both through increased taxonomic sampling, and future improvements of probabilistic models.
