Similar Articles
20 similar articles found.
1.
Resolution of the debate between total evidence (i.e., character congruence) and consensus (i.e., taxonomic congruence) has been impeded by (1) a failure to apply validation methods consistently to both tree-building and consensus analyses, (2) the incomparability of methods for constructing trees with those for combining them, and (3) indifference to aspects of trees other than their topologies. We demonstrate a uniform, distance-based approach that makes the results of character- and taxonomic-congruence studies comparable, whether or not an identical suite of taxa has been included in all contributing data sets. Our results indicate that total-evidence and consensus trees differ little in topology if branch lengths are taken into account when combining two or more trees. In addition, when character-state data are converted to distances, our method permits their combination with information produced by techniques that generate distances directly. Moreover, treating all data sets or trees as distance matrices avoids the problem that different numbers of characters in contributing studies may confound the conclusions of a total-evidence or consensus analysis. Our protocol is illustrated with an example involving bats, in which the three component studies, based on serology, DNA hybridization, and anatomy, imply distinct phylogenies. However, the total-evidence and consensus trees support a fourth, somewhat different topology that is resolved at all but one node and conforms closely to the currently accepted higher-level classification of Chiroptera.
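The idea of treating every data set as a distance matrix can be sketched as follows. This is not the authors' protocol: the p-distance conversion, the rescaling of each matrix to a mean of 1, and the taxa and values in the usage lines are all assumptions chosen for illustration. Pairs missing from a study are averaged only over the matrices that contain them, which mirrors the point that not every study needs the same taxa.

```python
from itertools import combinations


def p_distance_matrix(chars):
    """Proportion of differing character states for every pair of taxa.

    chars: dict mapping taxon -> sequence of character states (equal lengths).
    Returns a dict keyed by frozenset({taxon_a, taxon_b}).
    """
    dist = {}
    for a, b in combinations(sorted(chars), 2):
        x, y = chars[a], chars[b]
        diffs = sum(1 for s, t in zip(x, y) if s != t)
        dist[frozenset((a, b))] = diffs / len(x)
    return dist


def combine_distance_matrices(matrices):
    """Average several distance matrices after rescaling each to a mean of 1.

    Rescaling by the mean is an assumption of this sketch, meant to stop one
    data set dominating only because its distances are on a larger scale.
    Pairs absent from a matrix are averaged over the matrices that have them.
    """
    totals, counts = {}, {}
    for m in matrices:
        mean = sum(m.values()) / len(m)
        for pair, value in m.items():
            totals[pair] = totals.get(pair, 0.0) + value / mean
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Hypothetical taxa A, B, C: a character matrix and a directly measured distance set.
anatomy = p_distance_matrix({"A": "0011", "B": "0111", "C": "1100"})
hybridization = {frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 6.0,
                 frozenset(("B", "C")): 4.0}
combined = combine_distance_matrices([anatomy, hybridization])
print(combined[frozenset(("A", "B"))])  # -> 0.4375
```

A tree could then be built from `combined` with any distance method; the choice of scaling is the main judgment call.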

2.
Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions that may bias the choice of polymorphisms and incur computational costs that limit the analysis to a few samples or polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results toward minimizing homoplasy. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a "common" polymorphism as ancient without an internal check on whether it is homoplasic or is identified as ancient only because of sampling bias (from oversampling the population that carries the polymorphism). A symptom of this problem is that different methods applied to the same data, or the same method applied to different datasets, result in different tree topologies. When the results of such analyses are combined, the consensus trees show low consensus on internal branches. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variation in the data, and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network that results when the data are split into k = 2, 3, 4, ..., kmax clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations; it is fast, stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample-choice and haplogroup-size bias. The internal branches of our tree have 90% consensus accuracy.
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a "European R clade" containing haplogroups H, V, H/V, J, T, and U and a "Eurasian N subclade" including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in non-neighboring positions, in agreement with the large TMRCA expected from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor-joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy of these methods ranges from 1% to 20%. From a comparison of our haplogroups with two chimpanzee sequences and one bonobo sequence, and assuming a chimpanzee-human coalescent time of 5 million years before present, we estimate a human mtDNA TMRCA of 206,000 ± 14,000 years before present.
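The final step, turning an assumed 5-million-year chimpanzee-human split into a human TMRCA, can be illustrated with a strict molecular clock. This is a simplified sketch, not necessarily the calculation used in the study: it assumes a constant rate, no saturation, and mean pairwise distances as inputs, and the distances in the usage line are hypothetical.

```python
def strict_clock_tmrca(d_ingroup, d_calibration, t_calibration):
    """Strict molecular-clock estimate of an ingroup TMRCA.

    d_ingroup: mean pairwise distance (substitutions/site) between lineages
        on opposite sides of the ingroup root.
    d_calibration: mean pairwise distance between ingroup and outgroup
        sequences whose split time is assumed known.
    t_calibration: assumed divergence time of that split, in years.
    Pairwise distances accumulate along two lineages, hence the factors of 2.
    """
    rate_per_year = d_calibration / (2 * t_calibration)
    return d_ingroup / (2 * rate_per_year)


# Hypothetical distances, not values from the study:
print(round(strict_clock_tmrca(0.02, 0.5, 5_000_000)))  # -> 200000
```

Because the estimate scales linearly with the assumed calibration time, any uncertainty in the 5-million-year figure carries straight through to the TMRCA.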

3.
Majority-rule reduced consensus trees and their use in bootstrapping   Citations: 3 (self-citations: 0; by others: 3)
Bootstrap analyses are usually summarized with majority-rule component consensus trees. This consensus method is based on replicated components and, like all component consensus methods, is insensitive to other kinds of agreement between trees. Recently developed reduced consensus methods can be used to summarize much additional agreement on hypothesized phylogenetic relationships among multiple trees. These methods are "strict" in the sense that they require agreement among all the trees being compared for any relationship to be represented in a consensus tree. Majority-rule reduced consensus methods are described, and their use in bootstrap analyses is illustrated with a hypothetical and a real example. The new methods summarize the bootstrap proportions of all n-taxon statements/partitions and make it easier to identify hypothesized relationships that are supported by high bootstrap proportions despite a lack of support for particular components or clades. In practice, majority-rule reduced consensus profiles may contain many trees. The size of the profile can be reduced by constraints on minimal bootstrap proportions and/or the cardinality of the included trees. Majority-rule reduced consensus trees can also be selected a posteriori from the profile. Surrogates for the majority-rule reduced consensus methods, using partition tables or tree-pruning options provided by widely used phylogenetic inference software, are also described. The methods are designed to produce more informative summaries of bootstrap analyses and thereby foster more informed assessment of the strengths and weaknesses of complex phylogenetic hypotheses.
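For reference, the conventional majority-rule component consensus that the abstract starts from can be sketched in a few lines, representing each bootstrap tree by its set of clusters. The reduced consensus methods go further by also tallying n-taxon statements on pruned trees; that part is not implemented here, and the three replicate trees are hypothetical.

```python
from collections import Counter


def majority_rule_components(trees, threshold=0.5):
    """Majority-rule component consensus of bootstrap trees.

    trees: list of trees, each represented by its set of clusters
        (frozensets of taxon names).
    Returns {cluster: bootstrap proportion} for clusters found in more than
    `threshold` of the trees. Above 0.5 the retained clusters are always
    mutually compatible, so they define a single consensus tree.
    """
    counts = Counter(cluster for tree in trees for cluster in set(tree))
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > threshold}


replicates = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
print(sorted("".join(sorted(c)) for c in majority_rule_components(replicates)))
# -> ['AB', 'ABC']
```

Any agreement that does not take the form of a shared cluster (for example, taxon D being unstable while A and B stay together) is invisible to this summary, which is the gap the reduced methods address.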

4.
In intraspecific studies, reticulated graphs are valuable tools for visualizing, within a single figure, alternative genealogical pathways among haplotypes. Because available software packages implementing the global maximum parsimony (MP) approach can only merge the resulting topologies into less-resolved consensus trees, MP has often been neglected as an alternative to purely algorithmic (i.e., defined solely by an algorithm) "network" construction methods. Here, we propose to search tree space using the MP criterion and present a new algorithm for uniting all equally most parsimonious trees into a single (possibly reticulated) graph. Using simulated sequence data, we compare our method with three widely used, purely algorithmic graph construction approaches (minimum-spanning network, statistical parsimony, and median-joining network). We demonstrate that combining MP trees into a single graph provides a good estimate of the true genealogy. Moreover, our analyses indicate that, when internal-node haplotypes are not sampled, the median-joining and MP methods provide the best estimates of the true genealogy, whereas the minimum-spanning algorithm performs very poorly.
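A much-simplified sketch of uniting equally parsimonious trees into one graph: take the union of their edge sets and count the independent cycles (E − V + C), which is zero for a tree and positive once reticulations appear. It assumes that every node, including reconstructed internal haplotypes, is already labeled consistently across trees; matching unlabeled internal nodes is the hard part the authors' algorithm handles and is not reproduced here. Haplotype names are hypothetical.

```python
def union_graph(trees):
    """Union of the edge sets of several equally parsimonious trees.

    Each tree is a list of (node, node) edges whose nodes are labelled
    consistently across trees (e.g. by haplotype sequence)."""
    nodes, edges = set(), set()
    for tree in trees:
        for u, v in tree:
            nodes.update((u, v))
            edges.add(frozenset((u, v)))
    return nodes, edges


def independent_cycles(nodes, edges):
    """Cyclomatic number E - V + C: 0 for a tree, > 0 once reticulations appear."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(nodes)
    for edge in edges:
        u, v = tuple(edge)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - len(nodes) + components


mp_trees = [
    [("H1", "H2"), ("H2", "H3"), ("H3", "H4")],
    [("H1", "H2"), ("H2", "H4"), ("H4", "H3")],
]
nodes, edges = union_graph(mp_trees)
print(independent_cycles(nodes, edges))  # -> 1
```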

5.
Quartet mapping and the extent of lateral transfer in bacterial genomes   Citations: 4 (self-citations: 0; by others: 4)
Several recent analyses have used quartet-based methods to assess the congruence among phylogenies derived for large sets of genes from prokaryotic genomes. The principal conclusion of these studies is that lateral gene transfer (LGT) has blurred prokaryotic phylogenies to such a degree that the Darwinian scheme of treelike evolution might be abandoned in favor of a net or web. Here, we focus on one of these methods, quartet mapping, and show that its application can lead to overestimation of the inferred extent of LGT in prokaryotes, particularly when it is applied to distantly related taxa.
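Quartet-based methods rest on the fact that four taxa have only three possible unrooted topologies. The sketch below resolves a single quartet from pairwise distances with the four-point criterion; it is a simpler, swapped-in technique shown only to illustrate the quartet idea, not the quartet-mapping procedure examined in the paper. The distances are an illustrative additive example.

```python
def four_point_quartet(d, w, x, y, z):
    """Resolve the unrooted topology of four taxa from pairwise distances
    with the four-point criterion: choose the split whose two within-pair
    distances have the smallest sum.

    d: function returning the distance between two taxa.
    Returns the chosen split as a pair of taxon pairs.
    """
    sums = {
        ((w, x), (y, z)): d(w, x) + d(y, z),
        ((w, y), (x, z)): d(w, y) + d(x, z),
        ((w, z), (x, y)): d(w, z) + d(x, y),
    }
    return min(sums, key=sums.get)


# Additive distances on the tree ((A,B),(C,D)): pendant branches 1, internal branch 2.
D = {frozenset(p): v for p, v in [
    (("A", "B"), 2), (("C", "D"), 2),
    (("A", "C"), 4), (("A", "D"), 4), (("B", "C"), 4), (("B", "D"), 4),
]}
print(four_point_quartet(lambda a, b: D[frozenset((a, b))], "A", "B", "C", "D"))
# -> (('A', 'B'), ('C', 'D'))
```

When the three sums are close, as they tend to be for distantly related taxa with saturated distances, small errors can flip the chosen split, which is one way quartet conflict can arise without any gene transfer.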

6.
Distributional similarity (congruence) between phylogenetically independent taxonomic groups has important biogeographical as well as conservation implications. When multiple groups show congruence, one or two of them can be used as surrogates for diversity in the others, thus simplifying some of the challenges of prioritizing areas for conservation action. Here we test for congruence in complementarity between amphibians, reptiles, and birds across seven tropical rainforest sites in the Eastern Himalaya and Indo-Burma global biodiversity hotspots. The results show that while frogs and lizards are strongly congruent with each other, birds as a whole are not congruent with either of them. However, certain bird subgroups delineated on the basis of broad ecological niche and life-history attributes are significantly congruent with both frogs and lizards. Multiple Mantel regressions between environmental-variable and species-distribution dissimilarity matrices indicate that, along with differential responses to ecological differences between sites, inherent life-history characteristics shared by certain groups contribute to the observed patterns of congruence. Our analyses indicate that examining biologically distinct subsets of larger groups can improve the resolution of congruence analyses. This approach can refine area-prioritization initiatives by revealing fine-scale discordances between otherwise concordant groups, and vice versa. Given that funding does not always allow multiple groups to be included in biodiversity inventories, such analyses also make economic sense, because they can provide better resolution even with single-group data. In the context of conservation in Northeast India, the results highlight the biogeographical complexity of the region and point to future priorities for biodiversity inventories and conservation prioritization, in terms of both areas and taxonomic groups.
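The building block behind the Mantel analyses above is the simple Mantel test between two site-by-site dissimilarity matrices. The sketch below implements that two-matrix test only; the multiple Mantel regression used in the study adds several predictor matrices and is not reproduced. The four-site matrices are hypothetical, and with so few sites only 24 distinct permutations exist, so p-values are coarse (seven sites allow 5040).

```python
import random


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Simple Mantel test between two square distance matrices (lists of lists).

    Returns the Pearson correlation of their upper triangles and a one-sided
    p-value obtained by permuting rows and columns of d2 together.
    """
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in pairs]
    r_obs = _pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= r_obs:
            extreme += 1
    return r_obs, (extreme + 1) / (permutations + 1)


# Hypothetical dissimilarities among four sites for two taxonomic groups.
frogs = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
lizards = [[0, 2, 3, 5], [2, 0, 2, 4], [3, 2, 0, 2], [5, 4, 2, 0]]
r, p = mantel(frogs, lizards, permutations=199)
print(round(r, 3), p)
```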

7.
Many phylogenetic analyses that include numerous terminals but few genes show high resolution and branch support for relatively recently diverged clades but lack resolution and/or support for "basal" clades. The various benefits of increased taxon and character sampling have been widely discussed in the literature, albeit primarily on the basis of simulations rather than empirical data. In this study, we used a well-sampled gene-tree analysis (based on 100 mitochondrial genomes of higher teleost fishes) to test empirically how efficiently different methods of data sampling and phylogenetic inference "correctly" resolve the basal clades of a tree (judged by congruence with the reference tree constructed using all 100 taxa and 7990 characters). By itself, increased character sampling was an inefficient way to decrease the likelihood of "incorrect" resolution (i.e., incongruence with the reference tree) in parsimony analyses. Although increased taxon sampling was a powerful way to alleviate "incorrect" resolution in parsimony analyses, it generally increased the number of, and support for, "incorrectly" resolved clades in the Bayesian analyses. For both parsimony and Bayesian analyses, increased taxon sampling by itself was insufficient to resolve the basal clades, making this sampling strategy ineffective for that purpose. In this empirical study, the most efficient of the six approaches considered for resolving the basal clades, when adding nucleotides to a dataset consisting of a single gene sampled for a small but representative number of taxa, was to increase character sampling and analyze the characters with the Bayesian method.
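Scoring "correct" and "incorrect" resolution against a reference tree reduces to comparing clade sets. The sketch below counts, among the clades of a test tree that reach a chosen support threshold, how many are present in the reference and how many are not. It is an illustrative assumption about how such counts can be made, not the study's scoring script, and the clades and support values are hypothetical.

```python
def compare_to_reference(test_support, reference_clades, min_support=0.0):
    """Split the supported clades of a test tree into those congruent and
    those incongruent with a reference tree.

    test_support: dict mapping each clade (frozenset of taxa) of the test tree
        to its support (bootstrap proportion or posterior probability).
    reference_clades: iterable of clades (frozensets) of the reference tree.
    Returns (number congruent, number incongruent) among clades whose
    support is at least `min_support`.
    """
    ref = set(reference_clades)
    supported = {c for c, s in test_support.items() if s >= min_support}
    return len(supported & ref), len(supported - ref)


reference = [frozenset("AB"), frozenset("ABC"), frozenset("DE")]
test = {frozenset("AB"): 0.98, frozenset("ABD"): 0.91, frozenset("DE"): 0.55}
print(compare_to_reference(test, reference))                   # -> (2, 1)
print(compare_to_reference(test, reference, min_support=0.9))  # -> (1, 1)
```

Raising `min_support` shows whether "incorrect" clades also receive high support, which is the pattern reported for the Bayesian analyses.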

8.
Knowledge of cross-transmission and hybridization between parasites of humans and reservoir hosts is critical for understanding the evolution of the parasite and for implementing control programmes. There is now a consensus that populations of pig and human Ascaris (roundworms) show significant genetic subdivision. However, it is unclear whether this has resulted from a single host shift or from multiple host shifts. Furthermore, previous molecular data have not been sufficient to determine whether sympatric populations of human and pig Ascaris can exchange genes. To disentangle patterns of host colonization and hybridization, we used 23 microsatellite loci to conduct Bayesian clustering analyses of individual worms collected from pigs and humans. We observed strong differentiation between populations, driven primarily by geography, with secondary differentiation resulting from host affiliation within locations. This pattern is consistent with multiple host colonization events. However, support for the short internal branches of the dendrograms is low. The relationships among clusters may partly reflect ongoing hybridization between sympatric human and pig roundworms. Indeed, three Bayesian methods agreed in identifying 4% and 7% of roundworms sampled from Guatemala and China, respectively, as hybrids. These results indicate that there is contemporary cross-transmission between populations of human and pig Ascaris.
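Requiring agreement among several Bayesian clustering methods before calling an individual a hybrid can be sketched as an intersection of per-method classifications. The membership coefficients, method names, and the 0.1/0.9 cut-offs below are illustrative assumptions, not the study's data or thresholds.

```python
def consensus_hybrids(q_by_method, lower=0.1, upper=0.9):
    """Individuals classified as hybrids by every clustering method.

    q_by_method: dict method name -> dict individual -> membership
        coefficient q in one parental cluster (e.g. 'pig'), between 0 and 1.
    An individual counts as a hybrid under a method when lower < q < upper;
    the cut-offs are illustrative assumptions.
    """
    flagged = None
    for q in q_by_method.values():
        hybrids = {ind for ind, value in q.items() if lower < value < upper}
        flagged = hybrids if flagged is None else flagged & hybrids
    return flagged or set()


# Hypothetical membership coefficients from three methods for three worms.
q_values = {
    "method_1": {"w1": 0.95, "w2": 0.50, "w3": 0.30},
    "method_2": {"w1": 0.97, "w2": 0.60, "w3": 0.92},
    "method_3": {"w1": 0.99, "w2": 0.45, "w3": 0.20},
}
print(sorted(consensus_hybrids(q_values)))  # -> ['w2']
```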

9.
Exon-intron structure and evolution of the Lipocalin gene family   Citations: 6 (self-citations: 0; by others: 6)
The lipocalins are an ancient protein family whose expression has so far been confirmed in bacteria, protoctists, plants, arthropods, and chordates. The evolution of this protein family has previously been assessed using amino acid sequence phylogenies. In this report we use an independent set of characters derived from gene structure (exon-intron arrangement) to infer a new lipocalin phylogeny. We also present the novel gene structures of three insect lipocalins. The positions and phases of introns are well preserved among lipocalin clades when mapped onto a protein sequence alignment, suggesting that these introns are homologous. Because of this homology, we use the intron positions and phases of 23 lipocalin genes to reconstruct a phylogeny by maximum parsimony and distance methods. These phylogenies are very similar to those derived from protein sequences. This result is confirmed by congruence analysis, and a consensus tree shows what the two source trees have in common. Interestingly, the intron-arrangement phylogeny shows that metazoan lipocalins have more introns than other eukaryotic lipocalins and that introns have been gained in the C-termini of chordate lipocalins. We also analyze the relationship between intron arrangement and protein tertiary structure, as well as the relationship of lipocalins to members of the proposed calycin structural superfamily. Our congruence analysis validates gene structure data as a source of phylogenetic information and helps to refine further our hypothesis on the evolutionary history of lipocalins.
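Coding introns as phylogenetic characters requires expressing each one as a (position, phase) pair on a shared protein alignment. A minimal sketch, assuming each intron is given as the number of coding nucleotides upstream of it and each gene's protein is available as a gapped aligned string; the gene names and sequences are hypothetical. Phase 0 lies between codons, phase 1 after the first base of a codon, and phase 2 after the second.

```python
def residue_columns(aligned_protein):
    """Alignment column index of each residue in a gapped protein sequence."""
    return [col for col, aa in enumerate(aligned_protein) if aa != "-"]


def intron_sites(cds_offsets, aligned_protein):
    """Convert intron positions into (alignment column, phase) characters.

    cds_offsets: for each intron, the number of coding nucleotides upstream
        of it (assumed to lie inside the coding sequence).
    """
    cols = residue_columns(aligned_protein)
    return {(cols[off // 3], off % 3) for off in cds_offsets}


def presence_absence(genes):
    """0/1 matrix over every (column, phase) site seen in any gene."""
    sites = sorted(set().union(*genes.values()))
    return sites, {g: [int(s in found) for s in sites] for g, found in genes.items()}


genes = {
    "gene_1": intron_sites([3, 7], "M-KV"),
    "gene_2": intron_sites([3], "-MKV"),
}
print(presence_absence(genes))
# -> ([(2, 0), (3, 1)], {'gene_1': [1, 1], 'gene_2': [1, 0]})
```

The resulting 0/1 rows can be analyzed like any binary character matrix with parsimony or distance methods.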

10.
Critical comparison of consensus methods for molecular sequences.   Citations: 6 (self-citations: 0; by others: 6)
Consensus methods are recognized as valuable tools for data analysis, especially when some sort of data aggregation is desired. Although consensus methods for sequences play a vital role in molecular biology, researchers pay little heed to the features and limitations of such methods, and so there are risks that criteria for constructing consensus sequences will be misused or misunderstood. To understand better the issues involved, we conducted a critical comparison of nine consensus methods for sequences, of which eight were used in papers appearing in this journal. We report the results of that comparison, and we make recommendations which we hope will assist researchers when they must select particular consensus methods for particular applications.
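One of the simplest rules of the kind compared in such studies is a threshold (majority) consensus: keep the most frequent symbol in each alignment column if it exceeds a threshold, otherwise emit an ambiguity code. This sketch is one illustrative rule, not a summary of the nine methods compared; the sequences are hypothetical, and gaps are treated as ordinary symbols.

```python
from collections import Counter


def consensus_sequence(aligned, threshold=0.5, ambiguous="N"):
    """Column-wise consensus of equal-length aligned sequences.

    Keeps the most frequent symbol in a column if its frequency exceeds
    `threshold`; otherwise emits `ambiguous`.
    """
    out = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        out.append(symbol if count / len(column) > threshold else ambiguous)
    return "".join(out)


seqs = ["ACGT", "ACGA", "ATGA"]
print(consensus_sequence(seqs))                 # -> ACGA
print(consensus_sequence(seqs, threshold=0.7))  # -> ANGN
```

The two outputs illustrate the abstract's warning: the same alignment yields different consensus sequences depending on a criterion that is easy to leave unstated.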

11.
Independence of alignment and tree search   Citations: 6 (self-citations: 0; by others: 6)
I assert that similarity is the appropriate homology criterion for sequence alignment, as it is with morphology. Methods that select among alignments using parsimony-based tree lengths, as implemented in MALIGN and POY, arrange the data such that they are consistent with a minimum-evolution model. When combining data sets in phylogenetic analyses, we are not trying to reinforce our earlier hypotheses about relationships, but rather to test them. The severity of this test is compromised when congruence with other characters is favored when selecting among alignment parameters.
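Choosing an alignment by similarity alone, without reference to any tree, is what a standard global alignment score does. The sketch below computes a Needleman-Wunsch score with a linear gap penalty; the scoring values are arbitrary assumptions, and real analyses would use substitution matrices and affine gaps.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only the previous row of the dynamic-programming table.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # -> 7
print(global_alignment_score("AC", "A"))            # -> -1
```

Nothing in this score depends on tree length, so alignment and tree search stay independent in the sense the abstract argues for.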

12.
Because it is based on a significance test that takes the tree shape as given, the Rzhetsky/Nei Confidence Probability (CP) can attribute high "confidence" to groups with little or even literally no support. CP further overestimates confidence because it takes no account of alignment reliability, and it is unstable in that small changes in the data can produce drastic changes in the results. Instability can arise when alignment is uncertain, since different alignment strategies can lead to slightly different matrices. Parsimony jackknifing offers a more reliable and stable way of assessing support. To take alignment ambiguities into account with parsimony jackknifing, we suggest "consensus" and "average" methods of summarizing jackknife results from several alignments. Reanalyzing 12S and 16S rRNA data on pelecaniform birds, we find that CP has overestimated support for the Ciconiida, for placing frigatebirds with condors, and for placing tropicbirds with cormorants.
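Two pieces of this procedure can be sketched: jackknife resampling of characters, where each column is deleted independently with probability e^-1 (about 0.37, the value commonly used for parsimony jackknifing), and an "average" summary of group frequencies across alignments. Reading the "average" method as a plain mean, with groups absent from an alignment's results counted as zero, is an assumption of this sketch; the "consensus" summary and the parsimony searches themselves are not shown, and all data are hypothetical.

```python
import math
import random


def jackknife_columns(matrix, seed=None, p_delete=math.exp(-1)):
    """Return one jackknife replicate: each character (column) is deleted
    independently with probability p_delete.

    matrix: dict taxon -> character string (all the same length).
    """
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    keep = [i for i in range(n_chars) if rng.random() >= p_delete]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in matrix.items()}


def average_support(support_by_alignment):
    """Mean jackknife frequency of each group across alignments; a group
    missing from one alignment's results counts as 0 there."""
    groups = set().union(*support_by_alignment)
    n = len(support_by_alignment)
    return {g: sum(s.get(g, 0.0) for s in support_by_alignment) / n for g in groups}


matrix = {"t1": "0101100", "t2": "0111100", "t3": "1100011"}
print(jackknife_columns(matrix, seed=1))
print(sorted(average_support([{"AB": 0.9, "CD": 0.6}, {"AB": 0.7}]).items()))
```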

13.
Within phylogenetics, two methods are known to implement cladistics: parsimony, or maximum parsimony (MP), and three-item analysis (3ia). Despite the lack of suitable software, 3ia is occasionally used in systematics and more regularly in historical biogeography. Here, we present LisBeth, the first and only freely available phylogenetic/biogeographic program that uses the 3ia approach, and offer some insights into its theoretical propositions. LisBeth does not rely on the conventional taxon/character matrix; instead, characters are represented as rooted trees. LisBeth performs 3ia analyses based on maximum congruence of three-item statements and calculates the intersection tree (which differs from the usual consensus). In biogeography, it applies the transparent method to handle widespread taxa and implements paralogy-free subtree analysis to remove redundant distributions. For interoperability, LisBeth can import and export characters from and to a matrix in NEXUS format, allowing comparison with other cladistic programs. LisBeth also imports phylogenetic characters from Xper2 knowledge bases.
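Since 3ia treats each character as a rooted tree, the first step is to decompose that tree into three-item statements: for every informative cluster, any two taxa inside it are more closely related to each other than either is to any taxon outside it. The sketch below performs only this decomposition; it collects statements into a set, so it ignores the weighting of redundant statements that 3ia implementations handle explicitly, and it does not reproduce LisBeth.

```python
from itertools import combinations


def three_item_statements(clusters, taxa):
    """Decompose a rooted tree, given as its informative clusters
    (frozensets), into three-item statements (a, b, c) meaning
    'a and b are more closely related to each other than either is to c'.
    """
    statements = set()
    for cluster in clusters:
        outside = set(taxa) - cluster
        for a, b in combinations(sorted(cluster), 2):
            for c in outside:
                statements.add((a, b, c))
    return statements


# The rooted tree (((A,B),C),D) has informative clusters {A,B} and {A,B,C}.
stmts = three_item_statements([frozenset("AB"), frozenset("ABC")], "ABCD")
print(sorted(stmts))
# -> [('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D')]
```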

14.
Graft failure in the clonal propagation of chestnuts is a practical problem that has arisen in recent years. Several hypotheses have been put forward to explain the failure, but none has focused on the origin and relationships of cultivars. This study was carried out to determine whether the relationships and origin of New Zealand chestnut selections reflect patterns of graft failure within the selections. Two different character data sets, random amplified polymorphic DNA (RAPD) and morpho-nut, were used to analyze the relationships between the chestnut selections. Four different analyses were done to generate trees depicting these relationships: morpho-nut characters, RAPD characters, taxonomic congruence (combination of the morpho-nut and RAPD trees), and character congruence (combination of the morpho-nut and RAPD data sets). When graft failure data were mapped onto the majority-rule consensus tree constructed from the character congruence analysis, self-graft incompatibility was found to reflect the origin and relationships of the chestnut selections. Information on the affinities of the chestnut selections to introduced chestnut species showed that the selections most often implicated in graft failure, which are from the North Island, had affinities with Castanea crenata. By contrast, the South Island selections that were placed with Castanea sativa, as well as the North Island hybrids of Castanea mollissima and C. crenata (1002 and 1007), had no failed grafts. This finding indicates that graft failure in New Zealand chestnut selections does not occur by chance but depends on the origin and/or evolutionary history of the selections.

15.
For current state-of-the-art methods, the accuracy of predicting the correct topology of membrane proteins has been reported to be above 80%. However, this performance has only been observed in small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and on four "genome-scale" data sets, including a recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine how well prediction methods separate membrane from nonmembrane proteins. The results show that methods using multiple sequence alignments are overall superior to those that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods for distinguishing membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the high accuracies reported on the smaller benchmark sets are not quite maintained on larger-scale benchmarks; instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods calls into question earlier estimates of the global properties of the membrane proteome. Finally, we suggest a pipeline, using a combination of the best predictors, to estimate these properties in large-scale proteomics studies of membrane proteins.
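Whether a predicted topology counts as "correct" can be checked from per-residue label strings ('i' inside, 'M' membrane, 'o' outside). In this sketch a prediction is correct if it has the same number of membrane segments as the observed topology, each overlapping its partner by at least `min_overlap` residues, and the same orientation after the first segment. The overlap of 5 residues is an assumption, not necessarily the benchmark's definition, and this is not how TOPCONS itself combines predictors.

```python
def segments(labels, state="M"):
    """(start, end) pairs, end exclusive, for each run of `state` in a
    per-residue topology string such as 'iiiMMMMooo'."""
    runs, start = [], None
    for i, ch in enumerate(labels + " "):  # sentinel closes a final run
        if ch == state and start is None:
            start = i
        elif ch != state and start is not None:
            runs.append((start, i))
            start = None
    return runs


def side_after_first_segment(labels):
    """Label of the residue just after the first membrane segment, if any."""
    segs = segments(labels)
    if not segs or segs[0][1] >= len(labels):
        return None
    return labels[segs[0][1]]


def same_topology(predicted, observed, min_overlap=5):
    """True if segment counts match, each predicted segment overlaps its
    observed partner by >= min_overlap residues, and orientation agrees."""
    p, o = segments(predicted), segments(observed)
    if len(p) != len(o):
        return False
    if side_after_first_segment(predicted) != side_after_first_segment(observed):
        return False
    return all(min(pe, qe) - max(ps, qs) >= min_overlap
               for (ps, pe), (qs, qe) in zip(p, o))


observed = "i" * 5 + "M" * 20 + "o" * 5 + "M" * 20 + "i" * 5
predicted = "i" * 7 + "M" * 18 + "o" * 7 + "M" * 18 + "i" * 5
flipped = "o" * 7 + "M" * 18 + "i" * 7 + "M" * 18 + "o" * 5
print(same_topology(predicted, observed), same_topology(flipped, observed))
# -> True False
```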

16.

Background

Mycobacterium tuberculosis isolates comprise several different lineages, yet epidemiological analyses are usually performed relative to a single reference genome, M. tuberculosis H37Rv, which may bias the results. These analyses are essentially based on genome sequence information and could in principle be performed in silico, using whole-genome sequence (WGS) data available in databases or obtained with next-generation sequencers (NGS). As an approach to establishing higher-resolution methods for such analyses, the whole-genome sequences of M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence, called the consensus sequence (CS), and its feasibility for in silico epidemiological analyses was evaluated.

Results

The consensus sequence (CS) was successfully constructed and used to perform phylogenetic analysis, to evaluate read-mapping efficacy (which is crucial for detecting single nucleotide polymorphisms, SNPs), and to carry out various MTBC typing methods virtually, including spoligotyping, VNTR, large sequence polymorphism (LSP), and Beijing typing. SNPs detected against the CS and against H37Rv were used in concatemer-based phylogenetic analyses, whose reliability was assessed against a phylogenetic tree based on whole-genome alignment as the gold standard. Statistical comparison showed that the phylogenetic trees based on the CS consistently performed better than those based on H37Rv. SNP detection and concatenation with the CS was advantageous because crucial SNPs distinguishing strain lineages occurred at a higher frequency than with H37Rv. Fewer SNPs were detected with the CS than with H37Rv, which significantly reduced computation time. Each virtual typing method performed satisfactorily and agreed with published results where these were available.
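A concatemer-based analysis keeps only the variable positions called against the chosen reference (here, the CS or H37Rv) and concatenates each strain's bases at those positions. A minimal sketch, assuming SNP calls are already available as per-strain dictionaries of 0-based positions; the reference and calls below are hypothetical, and indels and missing data are ignored.

```python
def snp_concatemer(reference, variants):
    """Build a concatenated SNP alignment.

    reference: reference genome string (e.g. a consensus sequence).
    variants: dict strain -> dict {0-based position: alternative base}.
    Returns the sorted variable positions and one concatenated string per
    strain; strains without a call at a position carry the reference base.
    """
    positions = sorted(set().union(*(calls.keys() for calls in variants.values())))
    seqs = {strain: "".join(calls.get(pos, reference[pos]) for pos in positions)
            for strain, calls in variants.items()}
    return positions, seqs


ref = "ACGTACGT"
calls = {"s1": {1: "T"}, "s2": {1: "T", 5: "A"}, "s3": {}}
print(snp_concatemer(ref, calls))
# -> ([1, 5], {'s1': 'TC', 's2': 'TA', 's3': 'CC'})
```

The length of the concatemer equals the number of variable positions, which is why a reference that yields fewer spurious SNPs also shortens computation.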

Conclusions

These results indicate that a virtual CS constructed from genome sequence data is an ideal reference for MTBC studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1368-9) contains supplementary material, which is available to authorized users.

17.
Protein fold is defined by the spatial arrangement of three types of secondary structure (SS): helices, sheets, and coils/loops. Current methods that predict SS from sequence rely on complex machine-learning-derived models and provide a three-state accuracy (Q3) of about 82%. Further improvements in predictive quality could be obtained with a consensus-based approach, which has so far received limited attention. We perform a first-of-its-kind comprehensive design of an SS consensus predictor (SScon), in which we consider 12 modern standalone SS predictors and use a Support Vector Machine (SVM) to combine their predictions. Using a large benchmark dataset with 10 random training-test splits, we show that a simple voting-based consensus of carefully selected base methods improves Q3 by 1.9% compared with the best single predictor. Use of the SVM provides an additional 1.4% improvement, with an overall Q3 of 85.6% and segment overlap (SOV3) of 83.7%, compared with 82.3% and 80.9%, respectively, for the best individual methods. We also show strong improvements when the consensus is based on ab initio methods, with Q3 = 82.3% and SOV3 = 80.7%, matching the results of the best template-based approaches. Our consensus reduces the number of significant errors in which a helix is confused with a strand, gives particularly good results for short helices and strands, and provides the most accurate estimates of the content of individual SS types in the chain. Case studies are used to visualize the improvements offered by the consensus at the residue level. A web server and a standalone implementation of SScon are available at http://biomine.ece.ualberta.ca/SSCon/.
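The two quantities behind the voting baseline can be sketched directly: Q3 is the percentage of residues whose predicted three-state label (H, E, or C) matches the observed one, and a simple consensus takes a per-residue majority vote. Breaking ties in favor of the first listed predictor is an assumption of this sketch; SScon's SVM layer and the SOV3 measure are not reproduced, and the strings are hypothetical.

```python
from collections import Counter


def q3(predicted, observed):
    """Three-state accuracy (percent of residues with matching H/E/C label)."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def vote(predictions):
    """Per-residue majority vote over several predictors' H/E/C strings.

    Ties go to the earliest predictor in the list, so list the strongest
    single predictor first."""
    out = []
    for states in zip(*predictions):
        counts = Counter(states)
        best = max(counts.values())
        out.append(next(s for s in states if counts[s] == best))
    return "".join(out)


consensus = vote(["HHHEC", "HHEEC", "CHHEE"])
print(consensus, q3(consensus, "HHHEE"))  # -> HHHEC 80.0
```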

18.
Biodiversity losses over the next century are predicted to alter ecosystem functions on a scale comparable to other major drivers of global change. Given the seriousness of this issue, global biodiversity needs to be monitored effectively. Because biodiversity censuses of all taxonomic groups are prohibitively costly, indicator groups have been studied as a way to estimate the biodiversity of other taxonomic groups. Quantifying cross-taxon congruence is a method of evaluating the assumption that the diversity of one taxonomic group can be used to predict the diversity of another. To improve the predictive ability of cross-taxon congruence in aquatic ecosystems, we evaluated whether body size, measured as the ratio of average body length between organismal groups, is a significant predictor of their cross-taxon biodiversity congruence. To test this hypothesis, we searched the published literature for studies that used species richness correlations as their metric of cross-taxon congruence. We extracted 96 correlation coefficients from 16 studies, encompassing 784 inland water bodies. With these correlation coefficients, we conducted a categorical meta-analysis, grouping data by the body size ratio of the organisms. Our results showed that cross-taxon congruence varies among sites and between groups (r values ranging from −0.53 to 0.88). In addition, our quantitative meta-analysis demonstrated that organisms most similar in body size showed stronger species richness correlations than organisms that differed increasingly in size (R²adj = 0.94, p = 0.02). We propose that future studies applying biodiversity indicators in aquatic ecosystems consider functional traits such as body size, to increase their success at predicting the biodiversity of taxonomic groups for which cost-effective conservation tools are needed.
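Pooling correlation coefficients in a meta-analysis is usually done on Fisher's z scale, where each z = atanh(r) has variance approximately 1/(n − 3). The sketch below is a fixed-effect pooled estimate with a normal-approximation 95% interval; the study's categorical meta-analysis (pooling within body-size-ratio groups, possibly with a random-effects model) may differ, and the inputs are hypothetical.

```python
import math


def pooled_correlation(rs, ns):
    """Fixed-effect pooled correlation via Fisher's z-transform.

    Each r is transformed to z = atanh(r) and weighted by n - 3 (the inverse
    of its approximate variance); the weighted mean is back-transformed.
    Returns the pooled r and a 95% confidence interval.
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))
    ci = (math.tanh(z_bar - 1.96 * se), math.tanh(z_bar + 1.96 * se))
    return math.tanh(z_bar), ci


# Hypothetical coefficients for one body-size-ratio category.
r_pooled, ci = pooled_correlation([0.62, 0.45, 0.71], [20, 35, 12])
print(round(r_pooled, 3), tuple(round(v, 3) for v in ci))
```

Running the function separately for each body-size-ratio category gives one pooled estimate per category, which is the kind of quantity then related to body size.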

19.
Here a differential geometry (DG) representation of the protein backbone is explored for the analysis of protein conformational ensembles. The backbone is described by per-residue curvature (κ) and torsion (τ) values, and we propose 1) a new dissimilarity and protein flexibility measure and 2) a local conformational clustering method. The methods were applied to the conformational ensembles of ubiquitin and c-Myb-KIX, and the results show that the κ/τ metric space allows protein flexibility to be judged properly while avoiding the superposition problem. The dmax measure performs as well as or better than RMSF, especially for the intrinsically unstructured protein. The clustering method is unique in that it relates global to local protein dynamics by providing a global clustering solution per residue. The proposed methods may be especially useful for analyzing highly flexible proteins. The software written for these analyses is available at https://github.com/AMarinhoSN/FleXgeo for academic use only.
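Curvature and torsion of a space curve follow from its first three derivatives: κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|². The sketch below estimates them with central finite differences on equally spaced points, which for a backbone could mean one step per Cα atom. This is a generic discrete approximation, not necessarily how FleXgeo computes κ and τ, and with a step of one residue the estimates are coarse. The helix check uses known closed-form values.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature_torsion(points, h=1.0):
    """Finite-difference (kappa, tau) for a 3D curve sampled at equal
    parameter steps h. Uses central differences for r', r'' and r''';
    the two points at each end are skipped. Torsion is returned as NaN
    where the curve is locally straight (r' x r'' = 0)."""
    values = []
    for i in range(2, len(points) - 2):
        p_m2, p_m1, p_0, p_p1, p_p2 = points[i - 2:i + 3]
        d1 = tuple((b - a) / (2 * h) for a, b in zip(p_m1, p_p1))
        d2 = tuple((b - 2 * m + a) / h ** 2 for a, m, b in zip(p_m1, p_0, p_p1))
        d3 = tuple((e - 2 * b + 2 * a - s) / (2 * h ** 3)
                   for s, a, b, e in zip(p_m2, p_m1, p_p1, p_p2))
        c = _cross(d1, d2)
        cc = _dot(c, c)
        kappa = math.sqrt(cc) / math.sqrt(_dot(d1, d1)) ** 3
        tau = _dot(c, d3) / cc if cc > 0 else float("nan")
        values.append((kappa, tau))
    return values


# A circular helix of radius 1 and pitch parameter 0.5 has kappa = 0.8, tau = 0.4.
step = 0.01
helix = [(math.cos(k * step), math.sin(k * step), 0.5 * k * step) for k in range(50)]
kappa, tau = curvature_torsion(helix, h=step)[0]
print(round(kappa, 3), round(tau, 3))  # -> 0.8 0.4
```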

20.
Panbiogeography represents the spatial congruence among species distributions by means of generalized tracks. Some critics have suggested that the method fails to evaluate congruence objectively, being neither repeatable nor falsifiable. The MartiTracks software was proposed to address spatial congruence using geometric properties, as a counterpoint to the manual procedures used so far to obtain generalized tracks. To evaluate whether MartiTracks is a reliable answer to the congruence problem in quantitative panbiogeography, we tested the software parameters with three analysis schemes on two real datasets. We then compared the results with those produced by Parsimony Analysis of Endemicity (PAE) and Clique Analysis, two quantitative methods based on predefined biogeographic areas or on grid cells. For PAE we used both analytical units, while Clique Analysis was restricted to grid cells. In this way, we aimed to compare the criteria of spatial congruence in the different approaches. Significantly different tracks resulted for each dataset and method, highlighting the disparate congruence criteria among panbiogeographic approaches. Although PAE proved the most reliable of the tools tested, it is still far from solving panbiogeographic congruence. The main focus of this paper, MartiTracks, does make minimum spanning tree construction a repeatable and easy-to-visualize process, but it is hampered by obscure procedures for obtaining generalized tracks, unclear congruence criteria, subjective parameter definition, the unclear implications of those parameters, and dubious results. Our results suggest that the subjectivity of the parameter setup substantially influences the results, biasing them toward the level of congruence the user desires.
That the software produces fast, easy-to-visualize results does not make it a definitive solution to the problem of quantitative panbiogeography.
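In panbiogeography an individual track is conventionally the minimum spanning tree joining a species' localities. The sketch below builds one with Prim's algorithm on great-circle (haversine) distances. It shows only the repeatable MST step; how MartiTracks derives generalized tracks and sets its parameters is not reproduced, and the coordinates are hypothetical.

```python
import math


def haversine_km(p, q, radius=6371.0):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


def individual_track(localities):
    """Prim's algorithm: minimum spanning tree joining all localities of one
    species. Returns a list of (i, j, distance_km) edges."""
    n = len(localities)
    if n < 2:
        return []
    # best[j] = (distance to the growing tree, index of the nearest tree node)
    best = {j: (haversine_km(localities[0], localities[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        dist, i = best.pop(j)
        edges.append((i, j, dist))
        for k in best:
            d = haversine_km(localities[j], localities[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


track = individual_track([(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)])
print([(i, j) for i, j, _ in track])  # -> [(0, 1), (1, 2)]
```

Given the same localities the tree is fully determined (up to ties), which is the repeatability the abstract credits; the subjectivity it criticizes enters in the later steps.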


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417