Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Sensitivity analyses can be performed with respect to different methodologies, differential analytical parameters or models within a single methodology, or alignment parameters. The latter investigations are particularly relevant when divergence and/or the size of molecular data sets make alignment of sequences difficult. Sensitivity analyses are often performed for analyses incorporating Direct Optimization (via POY), either to select optimal alignment parameters or to investigate the stability of topology across parameter sets. Such investigations are rarely, if ever, performed for Clustal alignments as some manual adjustments are nearly always incorporated in the final alignment. Exploration of the performance of both POY and Clustal for a large insect data set incorporating three genes (18S, 28S, H3) and morphology reveals that the performance of POY, as measured by an ILD metric, is predictable across the landscape topology with minimal incongruence when all parameters are treated equally. In contrast, Clustal alignment followed by parsimony analysis yields a landscape with less overall variance, but less predictable behaviour across the parameter topology. © The Willi Hennig Society 2005.
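The abstract above does not spell out the ILD metric it relies on. As an illustrative sketch only (not code from the cited study), the widely used Mickevich-Farris form divides the extra steps required when partitions are combined by the length of the combined tree. The function name and the tree lengths below are hypothetical.

```python
def ild(combined_length, partition_lengths):
    """Incongruence length difference, normalized by the combined tree length.

    combined_length: length (steps) of the best tree for all data combined.
    partition_lengths: lengths of the best trees for each partition analysed alone.
    """
    if combined_length <= 0:
        raise ValueError("combined tree length must be positive")
    extra_steps = combined_length - sum(partition_lengths)
    return extra_steps / combined_length


# Hypothetical tree lengths for one parameter set: 18S, 28S, H3, morphology
value = ild(1000, [300, 350, 200, 100])
print(value)  # 50 extra steps out of a combined length of 1000
```

In a sensitivity analysis, this value would be computed once per parameter set, and the set with the lowest value taken as the least incongruent.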

2.
The behavior of two topological and four character‐based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation in the data set. Dominance was easily detected, as the character‐based congruence measures approached their optimal value when indel costs were incremented. Dominance of a fragment or data partition was overwhelmed when new sequence length‐variable fragments or data partitions were added. © The Willi Hennig Society 2005.

3.
Exploring a large number of parameter sets in sensitivity analyses of direct optimization parsimony can be costly in terms of time and computing resources, and there is little a priori guidance available for reasonable limits to these search parameters. For this reason, we sought a general‐purpose upper limit for gap costs in the direct optimization program POY to streamline this process. To test the performance of POY as gap costs increase, we simulated data onto a pre‐set topology using a GTR + I + G model modified to include gaps by adding them according to a negative‐binomial model. Gaps were then removed and the data were analysed in POY at increasing gap costs. Increasing gap costs consistently resulted in reduced phylogenetic accuracy across trees of different relative branch lengths. Decoupling gap insertion and gap extension costs recovered a fraction of the accuracy lost by having both high gap insertion and gap extension costs, but only in trees with long internal nodes. To determine whether loss of phylogenetic accuracy was node‐specific, we designed a small dataset with a constrained node, where all possible combinations of cost substitution and different percentages of gap versus nucleotide changes were explored. These analyses showed that the effects of gap insertion and extension are node‐specific, and the minimum threshold for convergence on gap‐supported nodes is similar to the threshold for accuracy loss found in the larger simulated datasets. Subsequent analyses of empirical data revealed that a similar pattern of loss with gap cost increase can occur with ribosomal genes (18S, 28S, 16S and 12S) but this pattern was not seen in the intron data (myoglobin II) examined. In conjunction with previously published congruence‐based studies, the results suggest that POY sensitivity analyses can be streamlined and made more accurate if gap insertion and extension costs follow, as a guideline, a limit of four times the highest base‐transformation cost. 
© The Willi Hennig Society 2008.
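The guideline in this abstract (gap insertion and extension costs capped at four times the highest base-transformation cost) can be used to bound the parameter grid before a sensitivity analysis is run. The following is a minimal sketch under stated assumptions: integer costs in steps of 1 and the function name `gap_cost_grid` are illustrative choices, not part of POY or of the cited study.

```python
from itertools import product


def gap_cost_grid(base_costs, limit_factor=4):
    """List (gap_insertion, gap_extension) pairs up to the suggested ceiling.

    base_costs: base-transformation costs, e.g. [transversion, transition].
    limit_factor: multiple of the highest base cost used as the upper limit
                  (4 follows the guideline quoted in the abstract).
    """
    ceiling = limit_factor * max(base_costs)
    costs = range(1, ceiling + 1)  # assumption: integer costs, step of 1
    return list(product(costs, costs))


# Hypothetical base-transformation costs: transversions = 2, transitions = 1
grid = gap_cost_grid([2, 1])
print(len(grid), max(grid))
```

Each pair in the grid would then be passed to the alignment and tree search as one parameter set.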

4.
Comprehensive phylogenetic analyses utilize data from distinct sources, including nuclear, mitochondrial, and plastid molecular sequences and morphology. Such heterogeneous datasets are likely to require distinct models of analysis, given the different histories of mutational biases operating on these characters. The incongruence length difference (ILD) test is increasingly being used to arbitrate between competing models of phylogenetic analysis in cases where multiple data partitions have been collected. Our work suggests that the ILD test is unlikely to be an effective measure of congruence when two datasets differ markedly in size. We show that models that increase the contribution of one data partition over another are likely to increase congruence, as measured by this test. More alarmingly, for many bipartition comparisons, character congruence increases bimodally - either increasing or decreasing the contribution of one data partition will increase congruence - making it impossible to arrive at a single optimally congruent model of analysis.

5.
Phylogenetic relationships in southern African members of chloridoid grasses were investigated using DNA sequences from the chloroplast trnL (UAA) 5' exon‐trnF (GAA) region and the nuclear ribosomal internal transcribed spacer regions. The two datasets were analysed separately before being combined into a matrix of 50 specimens, representing 38 species. The congruence between the individual data sets was assessed in a conditional combination approach and the congruent data sets were then combined into a single data set. In this analysis, the chloridoid grasses were monophyletic and two large groups, corresponding to the tribes Eragrostideae and Cynodonteae, were polyphyletic; Eragrostis, the largest genus in the subfamily, was polyphyletic. Otherwise, high support levels were found at species and generic level.

6.
The problem of testing for congruence between phylogenetic data has long been debated among phylogeneticists, but reaches a critical point with the availability of large amounts of biological sequence data. Notably in prokaryotes, where the amount of lateral transfer is believed to be important, the inference of phylogenies using multiple genes requires testing for incongruence before concatenating the genes. On another scale, incongruence tests can be used to detect recombination points within single gene alignments. The incongruence length difference test (ILD), based on parsimony, has proved useful for finding incongruent data sets, but its application remains limited to small data sets because of computation time. Here, we have adapted the principle of ILD to the BIONJ algorithm. This algorithm is based on a tree length minimisation criterion and is suitable to replace parsimony in this test when used with uncorrected distance (model-free approach). We show that this new test, ILD-BIONJ, while being much faster, is often more accurate than the ILD test, especially when the alignments compared are simulated under different evolutionary models.

7.
Few botanical studies have explored the potential of nuclear ribosomal DNA (nrDNA) and mitochondrial DNA (mtDNA) data obtained through genome skimming for phylogeny reconstruction. Here, we analyzed the phylogenetic information included in the nrDNA and mtDNA of 44 species of the “Adenocalymma‐Neojobertia” clade (Bignoniaceae). To deal with intraindividual polymorphisms within the nrDNA, different coding schemes were explored through the analyses of four datasets: (i) “nrDNA contig,” with base call following the majority rule; (ii) “nrDNA ambiguous,” with ambiguous base calls; (iii) “nrDNA informative,” with ambiguities converted to multistate characters; and (iv) “mitochondrial,” with 39 mitochondrial genes. Combined analyses using the nrDNA and mtDNA data and previously published “plastid” datasets were also conducted. Trees were obtained using Maximum Likelihood and Bayesian criteria. The congruence among genomes was assessed. The nrDNA datasets were shown to be highly polymorphic within individuals, while the “mitochondrial” dataset was the least informative, with 0.36% of informative bases within the ingroup. The topologies inferred using the nrDNA and mtDNA datasets were broadly congruent with the tree derived from the analyses of the “plastid” dataset. The topological differences recovered were generally poorly supported. The topology that resulted from the analyses of the “combined” dataset largely resembles the “plastid” tree. These results highlight limitations of nuclear ribosomal DNA and mitochondrial genes for phylogeny reconstruction obtained through genome skimming and the need to include more data from both genomes. The different topologies observed among genomes also highlight the importance of exploring data from various genomes in plant phylogenetics.

8.
In the taxonomic congruence approach to systematics, data sets are analyzed separately, and corroboration among data sets is indicated by replicated components in topologies derived from the separate analyses. By contrast, in the total evidence and conditional combination approaches, characters from different data sets are mixed in combined phylogenetic analyses. In optimal topologies derived from such simultaneous analyses, support for a particular node may be attributed to one, some, or all of the individual data sets. Partitioned branch support (PBS) is one technique for describing the distribution of character support and conflict among data sets in simultaneous analysis. PBS is analogous to branch support (BS), but recognizes hidden support and conflicts that emerge with the combination of characters from different data sets. For both BS and PBS, support for a particular node is interpreted as the difference in cost between optimal and suboptimal topologies. A different measure, the clade stability index (CSI), assesses the robustness of a particular node through the successive removal of characters. Here, we introduce variations of the CSI, the data set removal index (DRI) and nodal data set influence (NDI), that indicate the stability of a particular node to the removal of entire data sets. Like PBS, the DRI and NDI summarize the influence of different data sets in simultaneous analysis. However, because these new methods and PBS use different perturbations to assess stability, DRI and NDI scores do not always predict PBS scores and vice versa. In this report, the DRI and NDI are compared to PBS and taxonomic congruence in a cladistic analysis of 17 data sets for Artiodactyla (Mammalia). Five indices of hidden support and conflict are defined and applied to the combined artiodactyl character set. These measures identify substantial hidden support for controversial relationships within Artiodactyla. 
Hidden character support is ignored in the taxonomic congruence approach to systematics, but the DRI, NDI, and PBS utilize this cryptic information in estimates of support among data sets for a given node.
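The abstract describes BS and PBS as differences in cost between optimal and suboptimal topologies. A minimal sketch of that arithmetic, assuming the tree lengths for each data set have already been obtained from a parsimony program, is given below. The function name, the data set names, and the step counts are hypothetical and are not taken from the artiodactyl analysis.

```python
def partitioned_branch_support(optimal_lengths, constrained_lengths):
    """Per-data-set support for one node.

    optimal_lengths: steps each data set requires on the optimal combined tree.
    constrained_lengths: steps each data set requires on the best tree that
                         lacks the node of interest.
    A positive score indicates support; a negative score indicates conflict.
    """
    return {
        name: constrained_lengths[name] - optimal_lengths[name]
        for name in optimal_lengths
    }


# Hypothetical step counts for three data sets at a single node
optimal = {"cytb": 410, "morphology": 120, "nuclear": 275}
without_node = {"cytb": 416, "morphology": 118, "nuclear": 279}

pbs = partitioned_branch_support(optimal, without_node)
branch_support = sum(pbs.values())  # PBS scores sum to the node's overall BS
print(pbs, branch_support)
```

In this toy case the morphological data set conflicts with the node (score of -2) even though the node has positive overall support, which is the kind of pattern that a single BS value would conceal.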

9.
The phylogenetic position of the most speciose meiofaunal polychaete family, Nerillidae, has remained contentious. Recent hypotheses have generally focused on the fact that Nerillidae shares with Aciculata (a major polychaete subgroup) features such as compound chaetae, ventral buccal organ and short ventrolateral palps. Here we present the first phylogenetic analysis of Aciculata, together with Nerillidae, combining morphological and molecular data. We also include Aberrantidae, previously referred to or placed near to spiomorph polychaetes, but recently referred to Aciculata, possibly close to Nerillidae. The data sets of 24 terminals contain 53 morphological characters and nearly complete sequences of 18S rRNA. The sequences were analysed simultaneously with the morphological data by direct optimization in the program POY with a variety of parameter settings (costs of gaps: transversions: transitions). The various settings resulted in markedly different phylogenetic hypotheses, but on the basis of congruence (ILD) the results of two parameter settings were chosen. In all analyses, the three included nerillid species constituted a monophyletic group. Only two analyses provided fully resolved cladograms. The morphological analysis gave poor resolution and the position of the nerillids was equivocal. The two molecular‐based cladograms (minimizing ILD) were also poorly resolved, but one provided a position for nerillids next to Eunice pennata and Nothria conchilega, from the subgroup Eunicida within Aciculata. The two cladograms of the combined analyses (minimizing ILD) were fully resolved and placed nerillids in a terminal position next to Aberranta sp., within a clade of eunicidan species. The study showed that the analytical conditions for the homology assignment of 18S rRNA strongly influenced the phylogenetic results. 
The various previous proposals on the phylogenetic position of the Nerillidae are reviewed, some of which are in accordance with the results of the present study.

10.
Patterson A, Karsi A, Feng J, Liu Z. Gene, 2003, 305(2): 151-160
Ribosomal protein genes have become widely used as markers for phylogenetic studies and comparative genomics, but they have not been available in fish. We have cloned and sequenced a complete set of all 47 60S ribosomal protein cDNAs from channel catfish (Ictalurus punctatus), of which 43 included the complete protein encoding regions. Most ribosomal protein mRNAs in channel catfish are highly similar to their mammalian counterparts. However, L4, L14, and L29 are significantly shorter in channel catfish than in mammals due to deletions in the 3' end of the gene. Two distantly related L5 cDNAs, L5a and L5b, were found in channel catfish. L5a is more similar to L5 in other vertebrates, while L5b showed significant levels of divergence, suggesting independent evolution of the two L5-encoding genes. The 47 ribosomal protein genes are generally highly expressed and together account for 11-14% of overall gene expression, depending on the tissues. Expression levels were highly variable both within a single tissue among different ribosomal protein genes, and among tissues with regard to a single ribosomal protein gene. Strong tissue-preferential expression was also observed for some ribosomal proteins. This set of ribosomal protein gene sequences represents one of the most complete sets from any single organism and will aid in fish phylogenetic and comparative genomic studies.

11.
A phylogeny of the meiofaunal polychaete family Nerillidae based on morphological, molecular and combined data is presented here. The data sets comprise nearly complete sequences of 18S rDNA and 40 morphological characters of 17 taxa. Sequences were analyzed simultaneously with the morphological data by direct optimization in the program POY, with a variety of parameter sets (costs of gaps: transversions: transitions). Three outgroups were selected from the major polychaete group Aciculata and one from Scolecida. The 13 nerillid species from 11 genera were monophyletic in all analyses with very high support, and three new apomorphies for Nerillidae are identified. The topology of the ingroup varied according to the various parameter settings. Reducing the number of outgroups to one decreased the variance among the phylogenetic hypotheses. The congruence among these was tested and a parameter set, with equal weights (222) and extension gap weighted 1, yielded minimum incongruence (ILD). Several terminal clades of the combined analysis were highly supported, as well as the position of Leptonerilla prospera as sister terminal to the other nerillids. The evolution of morphological characters such as segment numbers, chaetae, appendages and ciliation is traced and discussed. A regressive pathway within Nerillidae is indicated for several characters, although this generally implies several convergent losses. Numerous genera are shown to require revision. © The Willi Hennig Society 2005.

12.
Measuring Topological Congruence by Extending Character Techniques   Cited by: 1 (self-citations: 0, citations by others: 1)
A measure of topological congruence which is an extension of the Mickevich–Farris character incongruence metric (i.e., ILD; Mickevich and Farris, 1981) is proposed. Group inclusion characters (1 = member of a clade; 0 = not a member) are constructed for each topology to be considered. The sets of characters derived from the topologies are then compared for character incongruence due to data set combination. Each homoplasy signifies a disagreement among topological statements. The value is normalized for potential maximum incongruence to adjust values for unresolved topologies. This measure is compared to other topological and character congruence techniques and explored in test data.
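The first step described above, recoding each topology as group inclusion characters, is simple to illustrate. The sketch below is an assumption-laden toy, not the authors' implementation: it takes the clades of each topology as sets of taxon names, and the function name and example trees are hypothetical.

```python
def group_inclusion_characters(taxa, clades):
    """Return one binary row per taxon: 1 if the taxon is in the clade, else 0.

    taxa: list of taxon names.
    clades: list of sets, one per clade (group) in the topology.
    """
    return {taxon: [int(taxon in clade) for clade in clades] for taxon in taxa}


taxa = ["A", "B", "C", "D", "E"]
topology_1 = [{"A", "B"}, {"A", "B", "C"}]  # ((A,B),C) with D and E outside
topology_2 = [{"A", "C"}, {"A", "B", "C"}]  # ((A,C),B) with D and E outside

matrix_1 = group_inclusion_characters(taxa, topology_1)
matrix_2 = group_inclusion_characters(taxa, topology_2)

# Concatenate the two character sets, as would be done before combined analysis
for taxon in taxa:
    print(taxon, matrix_1[taxon] + matrix_2[taxon])
```

Measuring the homoplasy that arises when the two character sets are combined requires a parsimony analysis, which this sketch deliberately leaves out.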

13.
Combined analysis of multiple phylogenetic data sets can reveal emergent character support that is not evident in separate analyses of individual data sets. Previous parsimony analyses have shown that this hidden support often accounts for a large percentage of the overall phylogenetic signal in cladistic studies. Here, reanalysis of a large comparative genomic data set for yeast (genus Saccharomyces) demonstrates that hidden support can be an important factor in maximum likelihood analyses of multiple data sets as well. Emergent signal in a concatenation of 106 genes was responsible for up to 64% of the likelihood support at a particular node (the difference in log likelihood scores between optimal topologies that included and excluded a supported clade). A grouping of four yeast species (S. cerevisiae, S. paradoxus, S. mikatae, and S. kudriavzevii) was robustly supported by combined analysis of all 106 genes, but separate analyses of individual genes suggested numerous conflicts. Forty-eight genes strictly contradicted S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii in separate analyses, but combined likelihood analyses that included up to 45 of the "wrong" data sets supported this group. Extensive hidden support also emerged in a combined likelihood analysis of 41 genes that each recovered the exact same topology in separate analyses of the individual genes. These results show that isolated analyses of individual data sets can mask congruence and distort interpretations of clade stability, even in strictly model-based phylogenetic methods. Consensus and supertree procedures that ignore hidden phylogenetic signals are, at best, incomplete.

14.
For bottom‐up proteomics, there is a wide variety of database‐searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid‐search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection, referred to as STEPS, utilizes user‐defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal “parameter set” for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true‐positive identifications are demonstrated using datasets derived from immunoaffinity‐depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
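The grid-search idea described above can be sketched generically. The example below is not the STEPS implementation: the two filter parameters, the decoy-fraction cap used to define "confident", and the toy peptide-spectrum matches are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical peptide-spectrum matches: (search score, mass error in ppm, is_decoy)
PSMS = [
    (3.2, 1.0, False), (2.8, 4.0, False), (2.1, 2.5, False),
    (1.9, 8.0, True), (1.5, 3.0, False), (1.2, 9.5, True),
]


def confident_hits(psms, min_score, max_ppm, max_decoy_fraction=0.2):
    """Count non-decoy matches passing both filters, or 0 if too many decoys pass."""
    kept = [p for p in psms if p[0] >= min_score and p[1] <= max_ppm]
    decoys = sum(1 for p in kept if p[2])
    if not kept or decoys / len(kept) > max_decoy_fraction:
        return 0
    return len(kept) - decoys


def grid_search(psms, score_range, ppm_range):
    """Try every parameter combination and return the one with the most hits."""
    return max(
        product(score_range, ppm_range),
        key=lambda combo: confident_hits(psms, *combo),
    )


best = grid_search(PSMS, score_range=[1.0, 1.5, 2.0, 2.5], ppm_range=[5.0, 10.0])
print(best, confident_hits(PSMS, *best))
```

An exhaustive search is used here because the parameter ranges are small and user-defined. With many parameters the number of combinations grows multiplicatively, so coarser ranges may be needed.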

15.
Tests for incongruence as an indicator of among-data partition conflict have played an important role in conditional data combination. When such tests reveal significant incongruence, this has been interpreted as a rationale for not combining data into a single phylogenetic analysis. In this study of lorisiform phylogeny, we use the incongruence length difference (ILD) test to assess conflict among three independent data sets. A large morphological data set and two unlinked molecular data sets--the mitochondrial cytochrome b gene and the nuclear interphotoreceptor retinoid binding protein (exon 1)--are analyzed with various optimality criteria and weighting mechanisms to determine the phylogenetic relationships among slow lorises (Primates, Loridae). When analyzed separately, the morphological data show impressive statistical support for a monophyletic Loridae. Both molecular data sets resolve the Loridae as paraphyletic, though with different branching orders depending on the optimality criterion or character weighting used. When the three data partitions are analyzed in various combinations, an inverse relationship between congruence and phylogenetic accuracy is observed. Nearly all combined analyses that recover monophyly indicate strong data partition incongruence (P = 0.00005 in the most extreme case), whereas all analyses that recover paraphyly indicate lack of significant incongruence. Numerous lines of evidence verify that monophyly is the accurate phylogenetic result. Therefore, this study contributes to a growing body of information affirming that measures of incongruence should not be used as indicators of data set combinability.

16.
This paper examines the efficiency of the incongruence length difference test (ILD) proposed by Farris et al. (1994) for assessing the incongruence between sets of characters. DNA sequences were simulated under various evolutionary conditions: (1) following symmetric or asymmetric trees, (2) with various mutation rates, (3) with constant or variable evolutionary rates along the branches, and (4) with different among-site substitution rates. We first compared two sets of sequences generated along the same tree and under the same evolutionary conditions. The probability of a Type-I error (wrongly rejecting the true hypothesis of congruence) was substantially below the standard 5% level of significance given by the ILD test; this finding indicates that the choice of the 5% level is rather conservative in this case. We then compared two data sets, still generated along the same tree, but under different evolutionary conditions (constant vs. variable evolutionary rate, homogeneous vs. heterogeneous substitution rates). Under these conditions, the probability of rejecting the true hypothesis of congruence was greater than the 5% given by the ILD test and increased with the number of sites and the degree to which the tree was asymmetric. Finally, the comparison of the two data sets, simulated under contrasting tree structures (symmetric vs. asymmetric) but under the same evolutionary conditions, led us to reject the hypothesis of congruence, albeit weakly, particularly when the number of informative sites was low and the among-site substitution rate heterogeneous. We conclude that the ILD test has only limited power to detect incongruence caused by differences in the evolutionary conditions or in the tree topology, except when numerous characters are present and the substitution rate is homogeneous from site to site.

17.
Previous studies of phylogenetic congruence between aphids and their symbiotic bacteria (Buchnera) supported long-term vertical transmission of symbionts. However, those studies were based on distantly related aphids and would not have revealed horizontal transfer of symbionts among closely related hosts. Aphid species of the genus Uroleucon are closely related phylogenetically and overlap in geographic ranges, habitats, and parasitoids. To examine support for congruence of phylogenies of Buchnera and Uroleucon, sequences from four mitochondrial, one nuclear, and one endosymbiont gene (trpB) were obtained. Congruence of phylogenies based on pooled aphid genes with phylogenies based on trpB was highly significant: Most nodes resolved by trpB corresponded to nodes resolved by the pooled aphid genes. Furthermore, no nodes were both inconsistent between the trees and strongly supported in both trees. Two kinds of analyses testing the null hypothesis of perfect congruence between pairwise combinations of datasets and tree topologies were performed: the Kishino-Hasegawa test and the likelihood-ratio test. Both tests indicated significant disagreement among most pairwise combinations of mitochondrial, nuclear, and symbiont datasets. Because rampant recombination among mitochondrial genomes of different aphid species is unlikely, inaccurate assumptions in the evolutionary models underlying these tests appear to be causing the hypothesis of a shared history to be incorrectly rejected. Moreover, trpB was more consistent with the aphid genes as a set than any single aphid gene was with the others, suggesting that the symbionts show the same phylogeny as the aphids. Overall, analyses support the interpretation that symbionts and aphids have undergone strict cospeciation, with no horizontal transmission of symbionts even among closely related, ecologically similar aphid hosts.

18.
Amaranthus includes approximately 60 species, of which three are cultivated as a grain source. Many wild Amaranthus species possess agriculturally desirable traits such as drought and salt tolerance, and pathogen resistance. We examined relationships among wild and cultivated Amaranthus species based upon restriction-site variation in two chloroplast DNA regions and in a nuclear DNA region. The chloroplast regions consisted of (1) an intergenic spacer in transfer RNA genes and (2) the ribulose-1,5-bisphosphate carboxylase gene with a flanking open reading frame. The nuclear region was the internal transcribed spacers ITS-1 and ITS-2 flanking the 5.8S gene in the ribosomal DNA. These regions were amplified by the polymerase chain reaction and digested with a total of 38 restriction endonucleases. We detected 11 potentially informative restriction-site mutations and seven length polymorphisms among the 28 Amaranthus species. Parsimony analysis was used to find the shortest tree for each separate data set (chloroplast, nuclear, and length) and for two combined matrices (chloroplast/nuclear and all data sets). Overall, there was a low level of variation, which generated poorly resolved trees among the 28 species. Congruence analyses revealed that the chloroplast and nuclear data sets were congruent with each other but not with the length data set. The congruence of the chloroplast and nuclear data sets suggested that cytoplasmic gene flow may not be a confounding factor in our analyses. The phylogeny also suggested that drought tolerance evolved independently several times. The molecular phylogeny provides a basis for selection of species pairs for crop development.

19.
The level of integration between associated partners can range from ectosymbioses to extracellular and intracellular endosymbioses, and this range has been assumed to reflect a continuum from less intimate to evolutionarily highly stable associations. In this study, we examined the specificity and evolutionary history of marine symbioses in a group of closely related sulphur‐oxidizing bacteria, called Candidatus Thiosymbion, that have established ecto‐ and endosymbioses with two distantly related animal phyla, Nematoda and Annelida. Intriguingly, in the ectosymbiotic associations of stilbonematine nematodes, we observed a high degree of congruence between symbiont and host phylogenies, based on their ribosomal RNA (rRNA) genes. In contrast, for the endosymbioses of gutless phallodriline annelids (oligochaetes), we found only a weak congruence between symbiont and host phylogenies, based on analyses of symbiont 16S rRNA genes and six host genetic markers. The much higher degree of congruence between nematodes and their ectosymbionts compared to those of annelids and their endosymbionts was confirmed by cophylogenetic analyses. These revealed 15 significant codivergence events between stilbonematine nematodes and their ectosymbionts, but only one event between gutless phallodrilines and their endosymbionts. Phylogenetic analyses of 16S rRNA gene sequences from 50 Cand. Thiosymbion species revealed seven well‐supported clades that contained both stilbonematine ectosymbionts and phallodriline endosymbionts. This closely coupled evolutionary history of marine ecto‐ and endosymbionts suggests that switches between symbiotic lifestyles and between the two host phyla occurred multiple times during the evolution of the Cand. Thiosymbion clade, and highlights the remarkable flexibility of these symbiotic bacteria.

20.
Gene set analysis allows the inclusion of knowledge from established gene sets, such as gene pathways, and potentially improves the power of detecting differentially expressed genes. However, conventional methods of gene set analysis focus on gene marginal effects in a gene set, and ignore gene interactions which may contribute to complex human diseases. In this study, we propose a method of gene interaction enrichment analysis, which incorporates knowledge of predefined gene sets (e.g. gene pathways) to identify enriched gene interaction effects on a phenotype of interest. In our proposed method, we also discuss the reduction of irrelevant genes and the extraction of a core set of gene interactions for an identified gene set, which contribute to the statistical variation of a phenotype of interest. The utility of our method is demonstrated through analyses on two publicly available microarray datasets. The results show that our method can identify gene sets that show strong gene interaction enrichments. The enriched gene interactions identified by our method may provide clues to new gene regulation mechanisms related to the studied phenotypes. In summary, our method offers a powerful tool for researchers to exhaustively examine the large numbers of gene interactions associated with complex human diseases, and can be a useful complement to classical gene set analyses, which consider only single genes in a gene set.
