Similar literature
20 similar articles found (search time: 31 ms)
1.
Indels in DNA sequences frequently affect more than a single nucleotide, creating problems for alignment, character coding and phylogenetic analysis. However, the size and frequency of multiple‐residue indels are not usually tested, and with popular alignment packages their reconstruction is indirectly achieved by reducing the affine (gap extension) cost. We explored the length distribution of indels in intron sequences of the gene Mp20 by modifying the gap opening and gap extension costs. Given a “known” tree for the study group, global homology levels were greatest under low gap cost, with gap extension costs of roughly 0.4‐fold the opening cost. Different approaches to gap coding and weighting suggested that taxonomic congruence was correlated with high frequencies of multiple‐position indels, with a maximum indel length of 2–5 bp and few indels above 15 bp, but also including a proportion of indels > 100 bp. Only a small minority of indels could be reconstructed as single‐position indels. Consequently, tree topologies improved when homologous multinucleotide indels were recoded as binary characters, which are otherwise highly homoplastic and weighted characters in single‐position coding. In tree‐generating alignment procedures as implemented in POY, where gap penalty determines the character weight during tree search, the problem of assigning inappropriately high weight to multiple‐residue indels could partly be overcome by setting the extension costs to about 0.4‐fold the gap opening costs. We conclude that multiple consecutive gap positions are not independent characters and hence methods for parsimony reconstruction of long indels are required. Finally, we also observed a general lack of correlation between taxonomic and character congruence, demonstrating the difficulties of applying congruence criteria to decide among competing alignments. This highlights the value of recent model‐based alignment procedures which can implement the statistical distributions of indel size classes, and do not rely on potentially circular strategies for optimizing overall congruence. © The Willi Hennig Society 2006.
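The gap-cost scheme discussed in this abstract can be illustrated with a small sketch. It assumes one common convention, in which the first position of a gap pays the opening cost and each further position pays the extension cost. The function names, parameter names and the unit opening cost are illustrative and are not taken from the study.

```python
def affine_gap_cost(length, open_cost=1.0, extension_ratio=0.4):
    """Cost of one indel spanning `length` positions under an affine scheme.

    Assumed convention (not stated in the abstract): the first gap position
    is charged `open_cost`, and every additional position is charged
    `extension_ratio * open_cost`.
    """
    if length < 1:
        raise ValueError("indel length must be at least 1")
    return open_cost + (length - 1) * extension_ratio * open_cost


def independent_gap_cost(length, open_cost=1.0):
    """Cost when every gap position is treated as an independent event."""
    if length < 1:
        raise ValueError("indel length must be at least 1")
    return length * open_cost


if __name__ == "__main__":
    # Compare the two schemes for a few indel lengths.
    for n in (1, 5, 15, 100):
        print(n, affine_gap_cost(n), independent_gap_cost(n))
```

Under these assumptions a long indel costs far less than the same number of independent single-position gaps, which is one way to avoid giving multiple-residue indels inappropriately high weight.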

2.
The behavior of two topological and four character‐based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation in the data set. Dominance was easily detected, as the character‐based congruence measures approached their optimal value when indel costs were incremented. Dominance of a fragment or data partition was overwhelmed when new sequence length‐variable fragments or data partitions were added. © The Willi Hennig Society 2005.

3.
The 60 000 described species of Cyclorrhapha are characterized by an unusual diversity in larval life‐history traits, which range from saprophagy through phytophagy to parasitism and predation. However, the direction of evolutionary change between the different modes remains unclear. Here, we use the Scathophagidae (Diptera) for reconstructing the direction of change in this relatively small family (≈ 250 spp.) whose larval habits mirror the diversity in natural history found in Cyclorrhapha. We subjected a molecular data set for 63 species (22 genera) and DNA sequences from seven genes (12S, 16S, Cytb, COI, 28S, Ef1‐alfa, Pol II) to an extensive sensitivity analysis and compared the performance of three different alignment strategies (manual, Clustal, POY). We find that the default Clustal alignment performs worst as judged by character incongruence, topological congruence and branch support. For this alignment, scoring indels as a fifth character state worsens character incongruence and topological congruence. However, manual alignment and direct optimization perform similarly well and yield near‐identical trees, although branch support is lower for the direct‐optimization trees. All three alignment techniques favor the upweighting of transversions. We furthermore confirm the independence of the concepts “node support” and “node stability” by documenting several cases of poorly supported nodes being very stable and cases of well supported nodes being unstable. We confirm the monophyly of the Scathophagidae, its two constituent subfamilies, and most genera. We demonstrate that phytophagy in the form of leaf mining is the ancestral larval feeding habit for Scathophagidae. From phytophagy, two shifts to saprophagy and one shift to predation have occurred, while a second origin of predation is from a saprophagous ancestor. © The Willi Hennig Society 2006.

4.
The semiaquatic bugs (Hemiptera–Heteroptera, infraorder Gerromorpha), comprising water striders and their allies (c. 1900 described species), are familiar inhabitants of water surfaces in all continents. Recent fossil evidence indicates that the evolutionary history of semiaquatic bugs spans more than 120 million years of geological time. At present, our insight into the phylogeny of higher taxa is based upon Andersen's manual cladistic analysis of a suite of morphological characters. The present work expands the phylogenetic insight with numerical cladistic analyses of morphological and molecular datasets (partial sequences of 16S and 28S rDNA) for forty species of Gerromorpha covering most higher taxa (families, subfamilies), estimates of branch support, character incongruence, and topological congruence (nodal stability). For the molecular data we apply different alignment options (manual vs numerical alignment; multiple alignment vs direct optimization), treat insertion–deletion events (indels) as either missing data or as a fifth character state, subject the data to a sensitivity analysis, and estimate topological congruence between different analysis trees. Relationships change considerably under different analysis conditions, which means that there is little node stability, and for selecting preferred analysis conditions there is conflicting evidence from rescaled incongruence length difference and the key node criterion. Based on the analysis of the combined morphological and molecular datasets, this study supports the close relationship between the families Gerridae, Hermatobatidae and Veliidae (superfamily Gerroidea), but not the monophyly of the family Veliidae. The results suggest that the genus Ocellovelia (Ocelloveliinae) should be excluded from this family and placed as a sister group to Gerridae + the remaining species of Veliidae. Our study also supports a close relationship between the subfamilies Halobatinae and Ptilomerinae (Gerridae), and suggests that the subfamily Veliinae is probably nonmonophyletic.

5.
Sensitivity analysis provides a way to measure robustness of clades in sequence‐based phylogenetic analyses to variation in alignment parameters rather than measuring their branch support. We compared three different approaches to multiple sequence alignment in the context of sensitivity analysis: progressive pairwise alignment, as implemented in MUSCLE; simultaneous multiple alignment of sequence fragments, as implemented in DCA; and direct optimization followed by generation of the implied alignment(s), as implemented in POY. We set out to determine the relative sensitivity of these three alignment methods using rDNA sequences and randomly generated sequences. A total of 36 parameter sets were used to create the alignments, varying the transition, transversion, and gap costs. Tree searches were performed using four different character‐coding and weighting approaches: the cost function used for alignment or equally weighted parsimony with gap positions treated as missing data, separate characters, or as fifth states. POY was found to be at least as sensitive to variation in alignment parameters as DCA and MUSCLE for the three empirical datasets, and POY was found to be more sensitive than MUSCLE, which in turn was found to be at least as sensitive as DCA, when applied to the randomly generated sequences when sensitivity was measured using the averaged jackknife values. When significant differences in relative sensitivity were found between the different ways of weighting character‐state changes, equally weighted parsimony, for all three ways of treating gapped positions, was less sensitive than applying the same cost function used in alignment for phylogenetic analysis. When branch support is incorporated into the sensitivity criterion, our results favour the use of simultaneous alignment and progressive pairwise alignment using the similarity criterion over direct optimization followed by using the implied alignment(s) to calculate branch support.

6.
Although there has been a recent proliferation in maximum‐likelihood (ML)‐based tree estimation methods based on a fixed multiple sequence alignment (MSA), little research has been done on incorporating indel information in this traditional framework. We show, using a simple model on a single character example, that a trivial alignment of a different form than that previously identified for parsimony is optimal in ML under standard assumptions treating indels as “missing” data, but that it is not optimal when indels are incorporated into the character alphabet. We show that the optimality of the trivial alignment is not an artefact of simplified theory assumptions by demonstrating that trivial alignment likelihoods of five different multiple sequence alignment datasets exhibit this phenomenon. These results demonstrate the need for use of indel information in likelihood analysis on fixed MSAs, and suggest that caution must be exercised when drawing conclusions from software implementations claiming improvements in likelihood scores under an indels‐as‐missing assumption. © The Willi Hennig Society 2012.

7.
Nuclear introns are commonly used as phylogenetic markers, but a number of issues related to alignment strategies, indel treatments, and the incorporation of length-variant heterozygotes (LVHs) are not routinely addressed when generating phylogenetic hypotheses. Topological congruence in relation to an extensive mitochondrial DNA multigene phylogeny (derived from 2,423 bp of 12S, 16S, ND4, and CYTB genes) of the Asian pitviper Trimeresurus radiation was used to compare combinations of "by eye" and edited and unedited ClustalX 1.8 alignments of two nuclear introns. Indels were treated as missing data, fifth character states, and assigned simple and multistate codes. Upon recovery of the optimal alignment and indel treatment strategy, a total evidence approach was used to investigate the phylogenetic utility of the indels and test new generic arrangements within Trimeresurus. Approximately one third of the intron data partitions exhibited LVHs, suggesting that they are common in introns. Furthermore, a simple concatenation approach can facilitate the incorporation of LVHs into phylogenetic analyses to make use of all available data and investigate mechanisms of molecular evolution. Analyses of ClustalX 1.8-assisted alignments were generally more congruent than the "by eye" alignment and the analysis of a simple coded, edited ClustalX 1.8 (gap opening cost 5, gap extension cost 1) alignment revealed the most congruent tree. The total evidence approach supported the new arrangements within Trimeresurus, suggesting that the phylogeny should be considered as a working benchmark in Asian pitviper systematics. Finally, a critical appraisal of the diverse array of indels (56 to 57 per intron, ranging from 1 to 151 bp in length) suggested that they are a combination of Hennigian and homoplasious events unrelated to indel size or location within the intron. [Alignment; indels; intron analysis; length-variant heterozygotes; Trimeresurus.]

8.
9.
Implied weighting, a method for phylogenetic inference that actively seeks to downweight supposed homoplasy, has in recent years begun to be widely utilized in palaeontological datasets. Given the method's purported ability to handle widespread homoplasy/convergence, we investigate the effects of implied weighting on modelled phylogenetic data. We generated 100 character matrices consisting of 55 characters each using a Markov Chain morphology model of evolution based on a known phylogenetic tree. Rates of character evolution in these datasets were variable and generated by pulling from a gamma distribution for each character in the matrix. These matrices were then analysed under equal weighting and four settings of implied weights (k = 1, 3, 5, and 10). Our results show that implied weighting is inconsistent in its ability to retrieve a known phylogenetic tree. Equally weighted analyses are found to generally be more conservative, retrieving a higher frequency of polytomies but being less likely to generate erroneous topologies. Implied weighting is found to generally resolve polytomies while also propagating errors, resulting in an increase in both correctly and incorrectly resolved nodes with a tendency towards higher rates of error compared to equal weighting. Our results suggest that equal weights may be a preferable method for parsimony analysis.
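For readers unfamiliar with the settings compared in this abstract, implied weighting is usually formulated (following Goloboff's fit function) so that a character with h extra steps of homoplasy receives a fit of k / (k + h), where k is the concavity constant. The sketch below assumes that standard formulation. The abstract itself does not state the formula, and the function name is illustrative.

```python
def implied_weight_fit(extra_steps, k):
    """Fit of a single character under implied weighting.

    Assumes the usual fit function k / (k + h), where h is the number of
    extra (homoplastic) steps and k is the concavity constant.
    """
    if k <= 0:
        raise ValueError("concavity constant k must be positive")
    if extra_steps < 0:
        raise ValueError("extra steps cannot be negative")
    return k / (k + extra_steps)


if __name__ == "__main__":
    # The four settings listed in the abstract: 1, 3, 5 and 10.
    for k in (1, 3, 5, 10):
        fits = [round(implied_weight_fit(h, k), 3) for h in range(4)]
        print(f"k={k}: fit for 0..3 extra steps = {fits}")
```

Under this formulation a smaller k penalizes homoplastic characters more strongly, so the four settings span a range from strong to mild downweighting.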

10.
In a recent study, the phylogeny of Caseidae (a herbivorous family of Palaeozoic synapsids belonging to the paraphyletic grade known as pelycosaurs) was analysed with a dataset employing more than three hundred continuous morphological characters in an effort to follow the principles of total evidence. Continuous characters are a source of great debate, with disagreements surrounding their suitability for and treatment in phylogenetic analysis. A number of shortcomings were identified in the handling of continuous characters in this study of caseids, including the use of gap weighting to discretize the characters and potential issues with redundancy and character non‐independence. Therefore, an alternative treatment for these characters is suggested here. First, rather than using gap weighting, the continuous characters were analysed in the program TNT, in which the raw values can be treated as continuous rather than discrete. Second, prior to the phylogenetic analysis, the continuous characters were subjected to a log‐ratio principal component analysis, and then the principal components were included in the character matrix rather than the raw ratios. Analysing the original data in TNT produced little difference in the results, but using the principal components as continuous characters resulted in alternative positions for Caseopsis agilis, Ennatosaurus tecton and Caseoides sanangeloensis. The differences are judged to be due to the reduced redundancy of the characters, the smaller number of principal components not overwhelming the discrete characters and the use of a scaling method which allows principal components with a higher variance to have a greater influence on the analysis. The positions of highly fragmentary fossils depended heavily on the method used to treat the missing characters in the principal component analysis, and so the method proposed here is not recommended for analysing very incomplete taxa.

11.
We use a multigene data set (the mitochondrial locus and nine nuclear gene regions) to test phylogenetic relationships in the South American "lava lizards" (genus Microlophus) and describe a strategy for aligning noncoding sequences that accounts for differences in tempo and class of mutational events. We focus on seven nuclear introns that vary in size and frequency of multibase length mutations (i.e., indels) and present a manual alignment strategy that incorporates indels for each intron. Our method is based on mechanistic explanations of intron evolution and does not require a guide tree. We also use a progressive alignment algorithm (Probabilistic Alignment Kit; PRANK) that distinguishes insertions from deletions and avoids the "gapcost" conundrum. We describe an approach to selecting a guide tree purged of ambiguously aligned regions and use this to refine PRANK performance. We show that although manual alignment is successful in finding repeat motifs and the most obvious indels, some regions can only be subjectively aligned, and there are limits to the size and complexity of a data matrix for which this approach can be taken. PRANK alignments identified more parsimony-informative indels while simultaneously increasing nucleotide identity in conserved sequence blocks flanking the indel regions. When comparing manual and PRANK with two widely used methods (CLUSTAL, MUSCLE) for the alignment of the most length-variable intron, only PRANK recovered a tree congruent at deeper nodes with the combined data tree inferred from all nuclear gene regions. We take this concordance as an objective function of alignment quality and present a strongly supported phylogenetic hypothesis for Microlophus relationships. From this hypothesis we show that (1) a coded indel data partition derived from the PRANK alignment contributed significantly to nodal support and (2) the indel data set permitted detection of significant conflict between mitochondrial and nuclear data partitions, which we hypothesize arose from secondary contact of distantly related taxa, followed by hybridization and mtDNA introgression.

12.
The phylogenetic relationships of 22 species of Coelopidae are reconstructed based on a data matrix consisting of morphological and DNA sequence characters (16S rDNA, EF-1alpha). Optimal gap and transversion costs are determined via a sensitivity analysis and both equal weighting and a transversion cost of 2 are found to perform best based on taxonomic congruence, character incongruence, and tree support. The preferred phylogenetic hypothesis is fully resolved and well-supported by jackknife, bootstrap, and Bremer support values, but it is in conflict with the cladogram based on morphological characters alone. Most notably, the Coelopidae and the genus Coelopa are not monophyletic. However, partitioned Bremer Support and an analysis of node stability under different gap and transversion costs reveal that the critical clades rendering these taxa non-monophyletic are poorly supported. Furthermore, the monophyly of Coelopidae and Coelopa is not rejected in analyses using 16S rDNA that was manually aligned. The resolution of the tree based on this reduced data set is, however, lower than for the tree based on the full data set. Partitioned Bremer support values reveal that 16S rDNA characters provide the largest amount of tree support, but the support values are heavily dependent on analysis conditions. Problems with direct comparison of branch support values for trees derived using fixed alignments with those obtained under optimization alignment are discussed. Biogeographic history and available behavioral and genetic data are also discussed in light of this first cladogram for Coelopidae based on a quantitative phylogenetic analysis.

13.
Opinions split when it comes to the significance and thus the weighting of indel characters as phylogenetic markers. This paper attempts to test the phylogenetic information content of indels and nucleotide substitutions by proposing an a priori weighting system of non-protein-coding genes. Theoretically, the system rests on a weighting scheme which is based on a falsificationist approach to cladistic inference. It provides insertions, deletions and nucleotide substitutions weights according to their specific number of identical classes of potential falsifiers, resulting in the following system: nucleotide substitutions weight = 3, deletions of n nucleotides weight = (2n–1), and insertions of n nucleotides weight = (5n–1). This weighting system and the utility of indels as phylogenetic markers are tested against a suitable data set of 18S rDNA sequences of Diptera and Strepsiptera taxa together with other Metazoa species. The indels support the same clades as the nucleotide substitution data, and the application of the weighting system increases the corresponding consistency indices of the differentially weighted character types. As a consequence, applying the weighting system seems to be reasonable, and indels appear to be good phylogenetic markers.

14.
We infer phylogenetic relationships within Teioidea, a superfamily of Nearctic and Neotropical lizards, using nucleotide sequences. Phylogenetic analyses relied on parsimony under tree‐alignment and similarity‐alignment, with length variation (i.e. gaps) treated as evidence and as absence of evidence, and maximum‐likelihood under similarity‐alignment with gaps as absence of evidence. All analyses produced almost completely resolved trees despite 86% missing data. Tree‐alignment produced the shortest trees, the strict consensus of which is more similar to the maximum‐likelihood tree than to any of the other parsimony trees, in terms of number of clades shared, parsimony cost and likelihood scores. Comparisons of tree costs suggest that the pattern of indels inferred by similarity‐alignment drove parsimony analyses on similarity‐aligned sequences away from more optimal solutions. All analyses agree in a majority of clades, although they differ from each other in unique ways, suggesting that neither the criterion of optimality, alignment nor treatment of indels alone can explain all differences. Parsimony rejects the monophyly of Gymnophthalmidae due to the position of Alopoglossinae relative to Teiidae, whereas support of Gymnophthalmidae by maximum‐likelihood was low. We address various nomenclatural issues, including Gymnophthalmidae Fitzinger, 1826 being an older name than Teiidae Gray, 1827. We recognize three families in the arrangement Alopoglossidae + (Teiidae + Gymnophthalmidae). Within Gymnophthalmidae we recognize Cercosaurinae, Gymnophthalminae, Rhachisaurinae and Riolaminae in the relationship Cercosaurinae + (Rhachisaurinae + (Riolaminae + Gymnophthalminae)). Cercosaurinae is composed of three tribes (Bachiini, Cercosaurini and Ecpleopodini) and Gymnophthalminae is composed of three (Gymnophthalmini, Heterodactylini and Iphisini). Within Teiidae we retain the currently recognized three subfamilies in the arrangement: Callopistinae + (Tupinambinae + Teiinae). We also propose several genus‐level changes to restore the monophyly of taxa.

15.
Direct optimization (DO) of 126 nuclear‐encoded SSU rRNA diatom sequences was conducted. The optimal phylogeny indicated several unique relationships with respect to those recovered from a maximum likelihood (ML) analysis of an alignment based on maximizing primary and secondary structural similarity between 126 nuclear‐encoded SSU rRNA diatom sequences (Medlin and Kaczmarska, 2004). Dividing diatoms into the subdivisions Coscinodiscophytina and Bacillariophytina was not supported by the DO phylogeny, due to the paraphyly of the former. The same pertains to Coscinodiscophyceae, Mediophyceae, Thalassiosira, Fragilaria and Amphora. The ordinal‐level classification of the diatoms proposed by Round et al. (1990) was for the most part found to be unsupported. The DO phylogeny represented a more rigorous hypothesis than the ML tree because DO maximized character congruence during the homology testing (i.e., alignment/tree search) process whereas the non‐phylogenetic similarity‐based alignment used in the ML analysis did not. The above statement is supported by “controlled” parsimony analyses of 35 sequences, which strongly suggested that dissimilarities in the DO and ML tree structure were due to the specific homology testing approach used. It could not be precluded that differences in taxon sampling and the use of different optimality criteria contributed to discrepancies in the structure of the optimal ML and DO trees.

16.
The performance of the computer program for phylogenetic analysis, POY, and its two implemented methods, "optimization alignment" and "fixed-states optimization," are explored for four data sets. Four gap costs are analyzed for every partition; some of the partitions (the 18S rRNA) are treated as a single fragment or in increasing fragments of 3, 10, and 30. Comparisons within and among methods are undertaken according to gap cost, number of fragments in which the sequences are divided, tree length, character congruence, topological congruence, primary homology statements, and computation time.

17.
Insertion and deletion events (indels) provide a suite of markers with enormous potential for molecular phylogenetics. Using many more indel characters than those in previous studies, we here for the first time address the impact of indel inclusion on the phylogenetic inferences of Arctoidea (Mammalia: Carnivora). Based on 6843 indel characters from 22 nuclear intron loci of 16 species of Arctoidea, our analyses demonstrate that when the indels were not taken into consideration, both a tree supporting the monophyly of Ursidae and Pinnipedia and a tree supporting the monophyly of Pinnipedia and Musteloidea were recovered, whereas inclusion of indels using three different indel coding schemes gave identical phylogenetic tree topologies supporting the monophyly of Ursidae and Pinnipedia. Our work brings new perspectives on the previously controversial placements among Arctoidea families, and provides another example demonstrating the importance of identifying and incorporating indels in the phylogenetic analyses of introns. In addition, comparison of indel incorporation methods revealed that the three indel coding methods are all advantageous over treating indels as missing data, given that incorporating indels produces consistent results across methods. This is the first report of the impact of different indel coding schemes on phylogenetic reconstruction at the family level in Carnivora, which indicates that indels should be taken into account in future phylogenetic analyses.
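The abstract does not say which three indel coding schemes were compared. As a rough illustration of the general idea, the sketch below scores each distinct gap interval in an alignment as a binary presence/absence character. It is a deliberately simplified variant: published schemes such as simple indel coding add further rules (for example, scoring sequences with a longer overlapping gap as inapplicable) that are omitted here. The toy alignment and the function names are invented for illustration.

```python
import re


def gap_runs(sequence):
    """Return (start, end) intervals of contiguous '-' runs, end exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", sequence)]


def code_indels(alignment):
    """Code every distinct gap interval as one binary character.

    `alignment` maps taxon names to equal-length aligned sequences.
    Returns the sorted list of gap intervals and, per taxon, a list of
    0/1 scores (1 = the taxon has exactly that gap interval).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must have equal length")
    intervals = sorted({run for seq in alignment.values() for run in gap_runs(seq)})
    matrix = {}
    for taxon, seq in alignment.items():
        own = set(gap_runs(seq))
        matrix[taxon] = [1 if interval in own else 0 for interval in intervals]
    return intervals, matrix


if __name__ == "__main__":
    toy = {"A": "ACGT--ACGT", "B": "ACGTTTACGT", "C": "AC----ACGT"}
    intervals, matrix = code_indels(toy)
    print(intervals)
    for taxon, scores in matrix.items():
        print(taxon, scores)
```

Coding a whole gap interval as one character, rather than scoring each gapped column separately, treats a multi-base indel as a single event.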

18.
Summary. In this paper we present an iterative character weighting method for the construction of phyletic trees. An initial tree is used to calculate the character weights, which are the number of mutations normalized so that the possible range is corrected for. The weights obtained are used to adjust the tree; this process is iterated until a stable tree is found. Using data generated according to a model tree, we show that the trees constructed by the iterative character weighting method converge to the true underlying tree. Using biological data, the trees become closer to the systematic classification of the species concerned, and patterns conflicting with the phylogenetic pattern can be singled out. The method involves a combination of minimal length methods and similarity methods, whereby the strict parsimony criterion is relaxed.

19.
In order to investigate the effects of different weighting methods on a phylogeny reconstruction based on DNA sequences and to evaluate the phylogenetic information content of various secondary structures, a fragment of the large ribosomal mitochondrial gene (16S) was sequenced from 13 species of New World monkeys, three species of catarrhines, and Tarsius. The data were analyzed cladistically without weighting characters or changes, and with different weighting methods: a priori differential weights for transitions and transversions, two variants of dynamic weighting for each kind and direction of change, and successive approximations, using both the character consistency index (CI) and the rescaled consistency index (RC). The results were compared with published trees constructed from nuclear sequences of ε-globins and morphological characters by different authors. The result of the analysis of the mtDNA data set with successive approximations, using the RC as weighting function, was the closest to the topology on which all molecular and morphological trees concur. Other relationships were unique to this tree. "Loops" were the type of secondary structure that showed maximum variation in sequence length and sites with the lowest character CI and RC. A large number of sites within loops showed high values for these indices, however, which suggests that uniform downweighting of these regions represents a large loss of phylogenetic information. Successive weighting, which assigns a weight for each particular character, seems to be a desirable alternative to this practice. We propose a new variant of dynamic weighting, which we call homoplasy-correcting dynamic weighting, that, like dynamic weighting, is applicable to any kind of sequence, coding or noncoding.
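The two weighting functions named in this abstract have standard definitions in parsimony analysis. For a character with minimum possible steps m, observed steps s on the tree, and maximum possible steps g, the consistency index is CI = m / s, the retention index is RI = (g - s) / (g - m), and the rescaled consistency index is RC = CI × RI. The sketch below assumes these textbook definitions, which the abstract does not spell out. The function name and the choice to return RI = 0 when g equals m are illustrative.

```python
def consistency_indices(min_steps, observed_steps, max_steps):
    """Return (CI, RI, RC) for one character on a given tree.

    Assumes the textbook definitions CI = m/s, RI = (g - s)/(g - m) and
    RC = CI * RI. When g == m the retention index is undefined (0/0);
    returning 0.0 in that case is an assumption made for this sketch.
    """
    if not 0 < min_steps <= observed_steps <= max_steps:
        raise ValueError("expected 0 < min_steps <= observed_steps <= max_steps")
    ci = min_steps / observed_steps
    if max_steps == min_steps:
        ri = 0.0
    else:
        ri = (max_steps - observed_steps) / (max_steps - min_steps)
    return ci, ri, ci * ri


if __name__ == "__main__":
    # A binary character (m = 1) that changes twice on the tree (s = 2)
    # and could change at most four times (g = 4).
    print(consistency_indices(1, 2, 4))
```

In successive approximations weighting, an index of this kind computed on the current trees is typically used to reweight each character before the next search, and the cycle is repeated until the weights stabilize.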

20.
Joyce, W.G. and Sterli, J. 2010. Congruence, non‐homology, and the phylogeny of basal turtles. Acta Zoologica (Stockholm). Modern cladistic analysis is characterized by the assembly of increasingly larger data sets coupled with the use of congruence as the final test of homology. Some critics of this development have recently called for a return to more detailed primary homology analysis while questioning the utility of congruence. This discussion appears to be central to the debate regarding the phylogenetic relationships of basal turtles, as the large data sets developed by us have been criticized recently for utilizing poorly constructed characters and including too many homoplasy‐prone characters. Our analysis of this critique reveals that (1) new information regarding poorly understood taxa has a greater impact on the outcome of turtle phylogenies than the characters under dispute; (2) most current turtle phylogenies differ in taxon sampling, not character sampling, and so it appears illogical to condemn a particular analysis for its character sampling; (3) even evolutionary taxonomists should agree that key characters utilized to resolve basal turtle relationships cannot be thought to be ‘infallible’; (4) whereas various criteria provide positive evidence for homology, only congruence provides positive evidence for non‐homology; and (5) a stalemate between conflicting camps within a congruence framework is preferable to the ad hoc dismissal of data sets, because authoritative statements are untestable.
