Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The behavior of two topological and four character‐based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation in the data set. Dominance was easily detected, as the character‐based congruence measures approached their optimal value when indel costs were incremented. Dominance of a fragment or data partition was overwhelmed when new sequence length‐variable fragments or data partitions were added. © The Willi Hennig Society 2005.
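The contrast between the two indel treatments in this abstract can be illustrated with a small Python sketch. It is not taken from the paper: the function names, the cost values, and the convention that an affine gap of length n costs `gap_open + gap_extend * (n - 1)` are all assumptions made for illustration, and real analyses charge these costs on tree branches rather than on a single aligned sequence.

```python
import re


def gap_runs(aligned_seq):
    """Lengths of each maximal run of '-' in one aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_seq)]


def fifth_state_cost(aligned_seq, gap_cost):
    """Every gap position is charged independently, as if '-' were a fifth base."""
    return gap_cost * sum(gap_runs(aligned_seq))


def affine_cost(aligned_seq, gap_open, gap_extend):
    """Each contiguous gap run is one event: an opening cost plus a
    (usually smaller) extension cost for every additional position.
    The exact convention is an assumption of this sketch."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(aligned_seq))


if __name__ == "__main__":
    seq = "AC----GT-A"  # hypothetical aligned sequence with gap runs of 4 and 1
    print(gap_runs(seq))
    print(fifth_state_cost(seq, gap_cost=2))
    print(affine_cost(seq, gap_open=2, gap_extend=1))
```

Under the fifth-state treatment a long gap is penalized in proportion to its length, whereas under the affine treatment it is charged mostly once, which is the sense in which contiguous gaps are "single events".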

2.
Nuclear introns are commonly used as phylogenetic markers, but a number of issues related to alignment strategies, indel treatments, and the incorporation of length-variant heterozygotes (LVHs) are not routinely addressed when generating phylogenetic hypotheses. Topological congruence in relation to an extensive mitochondrial DNA multigene phylogeny (derived from 2,423 bp of 12S, 16S, ND4, and CYTB genes) of the Asian pitviper Trimeresurus radiation was used to compare combinations of "by eye" and edited and unedited ClustalX 1.8 alignments of two nuclear introns. Indels were treated as missing data, fifth character states, and assigned simple and multistate codes. Upon recovery of the optimal alignment and indel treatment strategy, a total evidence approach was used to investigate the phylogenetic utility of the indels and test new generic arrangements within Trimeresurus. Approximately one third of the intron data partitions exhibited LVHs, suggesting that they are common in introns. Furthermore, a simple concatenation approach can facilitate the incorporation of LVHs into phylogenetic analyses to make use of all available data and investigate mechanisms of molecular evolution. Analyses of ClustalX 1.8-assisted alignments were generally more congruent than the "by eye" alignment and the analysis of a simple coded, edited ClustalX 1.8 (gap opening cost 5, gap extension cost 1) alignment revealed the most congruent tree. The total evidence approach supported the new arrangements within Trimeresurus, suggesting that the phylogeny should be considered as a working benchmark in Asian pitviper systematics. Finally, a critical appraisal of the diverse array of indels (56 to 57 per intron, ranging from 1 to 151 bp in length) suggested that they are a combination of Hennigian and homoplasious events unrelated to indel size or location within the intron. [Alignment; indels; intron analysis; length-variant heterozygotes; Trimeresurus.].

3.
Although there has been a recent proliferation in maximum‐likelihood (ML)‐based tree estimation methods based on a fixed multiple sequence alignment (MSA), little research has been done on incorporating indel information in this traditional framework. We show, using a simple model on a single character example, that a trivial alignment of a different form than that previously identified for parsimony is optimal in ML under standard assumptions treating indels as “missing” data, but that it is not optimal when indels are incorporated into the character alphabet. We show that the optimality of the trivial alignment is not an artefact of simplified theory assumptions by demonstrating that trivial alignment likelihoods of five different multiple sequence alignment datasets exhibit this phenomenon. These results demonstrate the need for use of indel information in likelihood analysis on fixed MSAs, and suggest that caution must be exercised when drawing conclusions from software implementations claiming improvements in likelihood scores under an indels‐as‐missing assumption. © The Willi Hennig Society 2012.

4.
We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Our software implementation can estimate alignment uncertainty and we describe a method for summarizing this uncertainty in a single plot.

5.
Insertion and deletion events (indels) provide a suite of markers with enormous potential for molecular phylogenetics. Using many more indel characters than those in previous studies, we here for the first time address the impact of indel inclusion on the phylogenetic inferences of Arctoidea (Mammalia: Carnivora). Based on 6843 indel characters from 22 nuclear intron loci of 16 species of Arctoidea, our analyses demonstrate that when the indels were not taken into consideration, both a tree supporting the monophyly of Ursidae and Pinnipedia and a tree supporting the monophyly of Pinnipedia and Musteloidea were recovered, whereas inclusion of indels using three different indel coding schemes gave identical phylogenetic tree topologies supporting the monophyly of Ursidae and Pinnipedia. Our work brings new perspectives on the previously controversial placements among Arctoidea families, and provides another example demonstrating the importance of identifying and incorporating indels in the phylogenetic analyses of introns. In addition, comparison of indel incorporation methods revealed that the three indel coding methods are all preferable to treating indels as missing data, given that incorporating indels produces consistent results across methods. This is the first report of the impact of different indel coding schemes on phylogenetic reconstruction at the family level in Carnivora, which indicates that indels should be taken into account in future phylogenetic analyses.

6.
Rapidly evolving, indel-rich phylogenetic markers play a pivotal role in our understanding of the relationships at multiple levels of the tree of life. There is extensive evidence that indels provide conserved phylogenetic signal; however, the range of phylogenetic depths for which gaps retain tree signal has not been investigated in detail. Here we address this question using the fungal internal transcribed spacer (ITS), which is central in many phylogenetic studies, molecular ecology, detection and identification of pathogenic and non-pathogenic species. ITS is repeatedly criticized for indel-induced alignment problems and the lack of phylogenetic resolution above species level, although these have not been critically investigated. In this study, we examined whether the inclusion of gap characters in the analyses shifts the phylogenetic utility of ITS alignments towards earlier divergences. By re-analyzing 115 published fungal ITS alignments, we found that indels are slightly more conserved than nucleotide substitutions, and when included in phylogenetic analyses, improved the resolution and branch support of phylogenies across an array of taxonomic ranges and extended the resolving power of ITS towards earlier nodes of phylogenetic trees. Our results reconcile previous contradicting evidence for the effects of data exclusion: in the case of more sophisticated indel placement, the exclusion of indel-rich regions from the analyses results in a loss of tree resolution, whereas in the case of simpler alignment methods, the exclusion of gapped sites improves it. Although the empirical datasets do not provide a means to measure alignment accuracy objectively, our results for the ITS region are consistent with previous simulation studies of alignment algorithms.
We suggest that sophisticated alignment algorithms and the inclusion of indels make the ITS region and potentially other rapidly evolving indel-rich loci valuable sources of phylogenetic information, which can be exploited at multiple taxonomic levels.

7.
The performance of POY, the computer program for phylogenetic analysis, and of its two implemented methods, "optimization alignment" and "fixed-states optimization," is explored for four data sets. Four gap costs are analyzed for every partition; some of the partitions (the 18S rRNA) are treated as a single fragment or in increasing fragments of 3, 10, and 30. Comparisons within and among methods are undertaken according to gap cost, number of fragments in which the sequences are divided, tree length, character congruence, topological congruence, primary homology statements, and computation time.

8.
The Sepsidae is, with approximately 300 described species, a relatively small family of cyclorrhaphan flies whose behaviour, morphology, and development have been extensively studied. However, currently the only available tree for Sepsidae is more than 10 years old and was based entirely on morphological characters. Here, we present the results of parsimony and Bayesian analyses based on 75 species, ten genes, and morphology. Parsimony and Bayesian analyses produce largely congruent and well‐supported topologies regardless of whether indels are coded as 5th character states, as missing values, or all sites with indels are removed. The tree confirms the monophyly of Sepsidae and identifies the Ropalomeridae as its sister group. With regard to higher‐level relationships, we identify widespread conflict between the morphological and the DNA sequence data. The proposed hypothesis based on both partitions largely reflects the signal in the molecular data. Particularly surprising is the rejection of two relationship hypotheses with strong morphological support, namely the sister group relationship between Orygma and the remaining Sepsidae and the monophyly of the Sepsis species group. Our partitioned Bremer support (PBS) analyses imply that indel coding has a stronger effect on the relative performance of individual gene partitions than the exclusion of alignment‐ambiguous sequences or the location of a gene on the mitochondrial or nuclear genome. However, these analyses also reveal unexpectedly strong fluctuations in PBS values given that indel treatment has only a minor effect on tree topology and jackknife support. These unexpected fluctuations highlight the need for a comparative study across multiple data sets that investigates the influence of conflict and indel treatment on PBS values. © The Willi Hennig Society 2008.

9.
In this study we use sensitivity analysis sensu Wheeler (1995) for a matrix entirely composed of DNA sequences. We propose that not only congruence but also phylogenetic structure, as measured by character resampling, should be used to choose among competing weighting regimes. An extensive analysis of a five‐gene data set for Themira (Sepsidae: Diptera) reveals that even with different ways of partitioning the data, measures of topological congruence, character incongruence, and phylogenetic structure favor similar weighting regimes involving the down‐weighting of transitions. We furthermore use sensitivity analysis for obtaining empirical evidence that allows us to select weights for third positions, deciding between treating indels as fifth character states or missing values, and choosing between manual and computational alignments. For our data, sensitivity analysis favors manual alignment over a Clustal‐generated numerical alignment, the treatment of indels as fifth character states over considering them missing values, and equal weights for all positions in protein‐encoding genes over the down‐weighting of third positions. Among the topological congruence measures compared, symmetric tree distance performed best. Partitioned Bremer Support analysis reveals that COI contributes the largest amount of support for our phylogenetic tree for Themira. © The Willi Hennig Society 2005.
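The symmetric tree distance mentioned in this abstract can be sketched in a few lines of Python. This is an illustrative version only, not the authors' implementation: it assumes rooted trees written as nested tuples of taxon names, compares non-trivial clades rather than unrooted bipartitions, and uses made-up taxon labels.

```python
def clades(tree):
    """Set of non-trivial clades (frozensets of leaf names) in a rooted
    tree given as nested tuples, e.g. ((("A", "B"), "C"), ("D", "E"))."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the clade of all taxa is shared by every tree
    return found


def symmetric_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))  # hypothetical topologies
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(symmetric_distance(t1, t2))
```

A distance of zero means the two trees imply the same set of groups; in a sensitivity analysis, a lower distance between trees from different data partitions would indicate better topological congruence for that parameter set.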

10.
The phylogenetic relationships of 22 species of Coelopidae are reconstructed based on a data matrix consisting of morphological and DNA sequence characters (16S rDNA, EF-1alpha). Optimal gap and transversion costs are determined via a sensitivity analysis and both equal weighting and a transversion cost of 2 are found to perform best based on taxonomic congruence, character incongruence, and tree support. The preferred phylogenetic hypothesis is fully resolved and well-supported by jackknife, bootstrap, and Bremer support values, but it is in conflict with the cladogram based on morphological characters alone. Most notably, the Coelopidae and the genus Coelopa are not monophyletic. However, partitioned Bremer Support and an analysis of node stability under different gap and transversion costs reveal that the critical clades rendering these taxa non-monophyletic are poorly supported. Furthermore, the monophyly of Coelopidae and Coelopa is not rejected in analyses using 16S rDNA that was manually aligned. The resolution of the tree based on this reduced data set is, however, lower than for the tree based on the full data set. Partitioned Bremer support values reveal that 16S rDNA characters provide the largest amount of tree support, but the support values are heavily dependent on analysis conditions. Problems with direct comparison of branch support values for trees derived using fixed alignments with those obtained under optimization alignment are discussed. Biogeographic history and available behavioral and genetic data are also discussed in light of this first cladogram for Coelopidae based on a quantitative phylogenetic analysis.

11.
To test whether gaps resulting from sequence alignment contain phylogenetic signal concordant with those of base substitutions, we analyzed the occurrence of indel mutations upon a well-resolved, substitution-based tree for three nuclear genes in bumble bees (Bombus, Apidae: Bombini). The regions analyzed were exon and intron sequences of long-wavelength rhodopsin (LW Rh), arginine kinase (ArgK), and elongation factor-1alpha (EF-1alpha) F2 copy genes. LW Rh intron had only a few uninformative gaps, ArgK intron had relatively long gaps that were easily aligned, and EF-1alpha intron had many short gaps, resulting in multiple optimal alignments. The unambiguously aligned gaps within ArgK intron sequences showed no homoplasy upon the substitution-based tree, and phylogenetic signals within ambiguously aligned regions of EF-1alpha intron were highly congruent with those of base substitutions. We further analyzed the contribution of gap characters to phylogenetic reconstruction by incorporating them in parsimony analysis. Inclusion of gap characters consistently improved support for nodes recovered by substitutions, and inclusion of ambiguously aligned regions of EF-1alpha intron resolved several additional nodes, most of which were apical on the phylogeny. We conclude that gaps are an exceptionally reliable source of phylogenetic information that can be used to corroborate and refine phylogenies hypothesized by base substitutions, at least at lower taxonomic levels. At present, full use of gaps in phylogenetic reconstruction is best achieved in parsimony analysis, pending development of well-justified and generally applicable methods for incorporating indels in explicitly model-based methods.

12.
Sensitivity analysis provides a way to measure robustness of clades in sequence‐based phylogenetic analyses to variation in alignment parameters rather than measuring their branch support. We compared three different approaches to multiple sequence alignment in the context of sensitivity analysis: progressive pairwise alignment, as implemented in MUSCLE; simultaneous multiple alignment of sequence fragments, as implemented in DCA; and direct optimization followed by generation of the implied alignment(s), as implemented in POY. We set out to determine the relative sensitivity of these three alignment methods using rDNA sequences and randomly generated sequences. A total of 36 parameter sets were used to create the alignments, varying the transition, transversion, and gap costs. Tree searches were performed using four different character‐coding and weighting approaches: the cost function used for alignment or equally weighted parsimony with gap positions treated as missing data, separate characters, or as fifth states. POY was found to be as sensitive, or more sensitive, to variation in alignment parameters than DCA and MUSCLE for the three empirical datasets, and POY was found to be more sensitive than MUSCLE, which in turn was found to be as sensitive, or more sensitive, than DCA when applied to the randomly generated sequences when sensitivity was measured using the averaged jackknife values. When significant differences in relative sensitivity were found between the different ways of weighting character‐state changes, equally weighted parsimony, for all three ways of treating gapped positions, was less sensitive than applying the same cost function used in alignment for phylogenetic analysis. 
When branch support is incorporated into the sensitivity criterion, our results favour the use of simultaneous alignment and progressive pairwise alignment using the similarity criterion over direct optimization followed by using the implied alignment(s) to calculate branch support.

13.

Background

Several ways of incorporating indels into phylogenetic analysis have been suggested. Simple indel coding has two strengths: (1) biological realism and (2) efficiency of analysis. In the method, each indel with different start and/or end positions is considered to be a separate character. The presence/absence of these indel characters is then added to the data set.

Algorithm

We have written a program, GapCoder, to automate this procedure. The program can input PIR format aligned datasets, find the indels and add the indel-based characters. The output is a NEXUS format file, which includes a table showing what region each indel character is based on. If regions are excluded from analysis, this table makes it easy to identify the corresponding indel characters for exclusion.

Discussion

Manual implementation of the simple indel coding method can be very time-consuming, especially in data sets where indels are numerous and/or overlapping. GapCoder automates this method and is therefore particularly useful during procedures where phylogenetic analyses need to be repeated many times, such as when different alignments are being explored or when various taxon or character sets are being explored. GapCoder is currently available for Windows from http://www.home.duq.edu/~youngnd/GapCoder.
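The simple indel coding procedure described above can be sketched in Python as follows. This is an illustration and not GapCoder's code: the function names and the toy alignment are hypothetical, PIR input and NEXUS output are omitted, and the `?` score for a taxon whose longer gap completely spans a shorter indel is an assumption based on the usual formulation of simple indel coding, which the abstract does not spell out.

```python
import re


def gap_intervals(seq):
    """(start, end) of each maximal gap run in an aligned sequence; end is exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", seq)]


def simple_indel_coding(alignment):
    """Return (indels, matrix) for a dict of taxon name -> aligned sequence.

    Each gap with a distinct start and/or end becomes one character:
    '1' = the taxon has exactly this gap, '0' = it does not,
    '?' = the taxon has a longer gap spanning it (assumed inapplicable).
    """
    per_taxon = {name: gap_intervals(seq) for name, seq in alignment.items()}
    indels = sorted({iv for ivs in per_taxon.values() for iv in ivs})
    matrix = {}
    for name, ivs in per_taxon.items():
        row = []
        for start, end in indels:
            if (start, end) in ivs:
                row.append("1")
            elif any(s <= start and end <= e for s, e in ivs):
                row.append("?")
            else:
                row.append("0")
        matrix[name] = "".join(row)
    return indels, matrix


if __name__ == "__main__":
    toy = {  # hypothetical aligned sequences
        "t1": "ACGTACGTAC",
        "t2": "AC--ACGTAC",
        "t3": "AC----GTAC",
        "t4": "ACGTAC--AC",
    }
    indels, matrix = simple_indel_coding(toy)
    print(indels)
    for name, row in matrix.items():
        print(name, row)
```

The returned list of intervals plays the role of the region table mentioned in the abstract: if an alignment region is excluded, the characters whose intervals fall inside it can be dropped as well. The sketch treats leading and trailing gaps like any other gap, which a real analysis might prefer to score as missing data.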

14.
A search was performed for single-nucleotide polymorphisms (SNP) and short insertions-deletions (indels) in 34 melon (Cucumis melo L.) expressed sequence tag (EST) fragments between two distantly related melon genotypes, a group Inodorus 'Piel de sapo' market class breeding line T111 and the Korean accession PI 161375. In total, we studied 15 kb of melon sequence. The average frequency of SNPs between the two genotypes was one every 441 bp. One indel was also found every 1666 bp. Seventy-five percent of the polymorphisms were located in introns and the 3' untranslated regions. On average, there were 1.26 SNPs plus indels per amplicon. We explored three different SNP detection systems to position five of the SNPs in a melon genetic map. Three of the SNPs were mapped using cleaved amplified polymorphic sequence (CAPS) markers, one SNP was mapped using the single primer extension reaction with fluorescent-labelled dideoxynucleotides, and one indel was mapped using polyacrylamide gel electrophoresis separation. The discovery of SNPs based on ESTs and a suitable system for SNP detection has broad potential utility in melon genome mapping.
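The frequencies quoted in this abstract can be cross-checked with a short back-of-the-envelope calculation. The polymorphism counts below are inferred, not reported: the sketch assumes that "15 kb" means exactly 15,000 bp and that the counts are whole numbers obtained by rounding.

```python
total_bp = 15_000    # "15 kb" of sequence, assumed to be exactly 15,000 bp
n_amplicons = 34     # number of EST fragments stated in the abstract

# Implied counts (an inference from the stated frequencies, not reported values)
snp_count = round(total_bp / 441)     # one SNP every 441 bp
indel_count = round(total_bp / 1666)  # one indel every 1666 bp

per_amplicon = (snp_count + indel_count) / n_amplicons

print(snp_count, indel_count, round(per_amplicon, 2))
```

Under these assumptions the implied totals are consistent with the stated average of 1.26 SNPs plus indels per amplicon.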

15.
Each holotype specimen provides the only objective link to a particular Linnean binomen. Sequence information from them is increasingly valuable due to the growing usage of DNA barcodes in taxonomy. As type specimens are often old, it may only be possible to recover fragmentary sequence information from them. We tested the efficacy of short sequences from type specimens in the resolution of a challenging taxonomic puzzle: the Elachista dispunctella complex which includes 64 described species with minuscule morphological differences. We applied a multistep procedure to resolve the taxonomy of this species complex. First, we sequenced a large number of newly collected specimens and as many holotypes as possible. Second, we used all sequences >400 bp to examine species boundaries. We employed three unsupervised methods (BIN, ABGD, GMYC) with specified criteria on how to handle discordant results and examined diagnostic bases from each delineated putative species (operational taxonomic units, OTUs). Third, we evaluated the morphological characters of each OTU. Finally, we associated short barcodes from types with the delineated OTUs. In this step, we employed various supervised methods, including distance‐based, tree‐based and character‐based. We recovered 658 bp barcode sequences from 194 of 215 fresh specimens and recovered an average of 141 bp from 33 of 42 holotypes. We observed strong congruence among all methods and good correspondence with morphology. We demonstrate potential pitfalls with tree‐, distance‐ and character‐based approaches when associating sequences of varied length. Our results suggest that sequences as short as 56 bp can often provide valuable taxonomic information. The results support significant taxonomic oversplitting of species in the Elachista dispunctella complex.

16.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.
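The difference between contiguous and spaced seeds can be shown with a minimal Python sketch. It covers only those two models; the paper's indel seed model and its waiting time formula are not reproduced here. The seed strings (`1` = position must match, `0` = position is ignored), the function name, and the example sequences are all illustrative assumptions, and the quadratic scan stands in for the hashing a real search tool would use.

```python
def seed_hits(query, target, seed):
    """Return (i, j) pairs where query[i:] and target[j:] agree at every
    position marked '1' in the seed; positions marked '0' may differ."""
    span = len(seed)
    required = [k for k, c in enumerate(seed) if c == "1"]
    hits = []
    for i in range(len(query) - span + 1):
        for j in range(len(target) - span + 1):
            if all(query[i + k] == target[j + k] for k in required):
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    query = "ACGTAC"   # hypothetical sequences differing by one substitution
    target = "ACCTAC"
    print(seed_hits(query, target, "1111"))  # contiguous seed spanning 4 positions
    print(seed_hits(query, target, "1101"))  # spaced seed with the same span
```

The spaced seed still reports a hit across the substituted position because that position is not required to match. Neither model tolerates an inserted or deleted base inside the seed span, which is the limitation the abstract's indel seeds are designed to address.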

17.
We use a multigene data set (the mitochondrial locus and nine nuclear gene regions) to test phylogenetic relationships in the South American "lava lizards" (genus Microlophus) and describe a strategy for aligning noncoding sequences that accounts for differences in tempo and class of mutational events. We focus on seven nuclear introns that vary in size and frequency of multibase length mutations (i.e., indels) and present a manual alignment strategy that incorporates insertions and deletions (indels) for each intron. Our method is based on mechanistic explanations of intron evolution and does not require a guide tree. We also use a progressive alignment algorithm (Probabilistic Alignment Kit; PRANK) that distinguishes insertions from deletions and avoids the "gapcost" conundrum. We describe an approach to selecting a guide tree purged of ambiguously aligned regions and use this to refine PRANK performance. We show that although manual alignment is successful in finding repeat motifs and the most obvious indels, some regions can only be subjectively aligned, and there are limits to the size and complexity of a data matrix for which this approach can be taken. PRANK alignments identified more parsimony-informative indels while simultaneously increasing nucleotide identity in conserved sequence blocks flanking the indel regions. When comparing manual and PRANK with two widely used methods (CLUSTAL, MUSCLE) for the alignment of the most length-variable intron, only PRANK recovered a tree congruent at deeper nodes with the combined data tree inferred from all nuclear gene regions. We take this concordance as an objective function of alignment quality and present a strongly supported phylogenetic hypothesis for Microlophus relationships.
From this hypothesis we show that (1) a coded indel data partition derived from the PRANK alignment contributed significantly to nodal support and (2) the indel data set permitted detection of significant conflict between mitochondrial and nuclear data partitions, which we hypothesize arose from secondary contact of distantly related taxa, followed by hybridization and mtDNA introgression.

18.
19.
Brachytheciaceae is often considered a taxonomically difficult group of mosses. For example, morphological variation has led to difficulty in generic delimitation. We used DNA sequence data (chloroplast psbT‐H and trnL‐F and nuclear ITS2) together with morphology (63 characters) to examine the relationships within this family. The combined unaligned length of the DNA sequences used in the phylogenetic analyses varied between 1277 and 1343 bp. For phylogeny reconstruction we performed direct optimization, as implemented in POY. Analyses were performed with three different gap costs and the morphological data partition was weighted both: (1) equal to gap cost, and (2) with a weight of one. The utility of sensitivity analysis has recently been cast into doubt; hence in this study it was performed only to explore the effects of weighting on homology statements and topologies and to enable more detailed comparisons between earlier studies utilizing the direct optimization method. The wide sequence length variation of non‐coding ITS2 sequences resulted in character optimizations (i.e., “alignments”) of very different lengths when various gap costs were applied. Despite this variation, the topologies of equally parsimonious trees remained fairly stable. The inclusion of several outgroups, instead of only one, was observed to increase the congruence between data sets and to slightly increase the resolution. An inversion event in the 9 bp loop region in the chloroplast psbT‐N spacer in mosses has been postulated to include only uninformative variation, thus possibly negatively impacting the phylogeny reconstruction. Despite this inversion, its variation within Brachytheciaceae was clearly congruent with information from other sources, but inclusion of these 9 bp in the analysis had only a minor effect on the phylogenetic results. 
In the most parsimonious topology, which was obtained with equal weighting of all data, Meteoriaceae and Brachytheciaceae were resolved as monophyletic sister groups, which had recently been suggested based on a few shared morphological characters. Our study revealed some new generic relationships within the Brachytheciaceae, which are discussed in light of the morphological characters traditionally used for generic delimitation.

20.
Simulation with indels was used to produce alignments where true site homologies in DNA sequences were known; the gaps from these datasets were removed and the sequences were then aligned to produce hypothesized alignments. Both alignments were then analyzed under three widely used methods of treating gaps during tree reconstruction under the maximum parsimony principle. With the true alignments, for many cases (82%), there was no difference in topological accuracy for the different methods of gap coding. However, in cases where a difference was present, coding gaps as a fifth state character or as separate presence/absence characters outperformed treating gaps as unknown/missing data nearly 90% of the time. For the hypothesized alignments, on average, all gap treatment approaches performed equally well. Data sets with higher sequence divergence and more pectinate tree shapes with variable branch lengths are more affected by gap coding than datasets associated with shallower non-pectinate tree shapes.
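How the choice of gap treatment changes a parsimony score can be illustrated on a single alignment column. The sketch below is not from the study: it uses the standard Fitch small-parsimony count on a fixed, rooted, binary tree written as nested tuples, and the tree, taxon names, and column are hypothetical.

```python
ALPHABET = set("ACGT")


def column_states(column, gap_as_fifth_state):
    """Map each taxon to its allowed state set for one alignment column.
    A gap is either its own state ('-') or missing data (any base)."""
    coded = {}
    for taxon, base in column.items():
        if base == "-" and not gap_as_fifth_state:
            coded[taxon] = set(ALPHABET)  # missing: compatible with every base
        else:
            coded[taxon] = {base}
    return coded


def fitch_length(tree, states):
    """Minimum number of state changes for one character on a rooted
    binary tree (nested tuples of taxon names), using Fitch's algorithm."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return states[node]
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1          # no shared state: one change is required here
        return left | right

    visit(tree)
    return steps


if __name__ == "__main__":
    tree = (("t1", "t2"), ("t3", "t4"))                   # hypothetical topology
    column = {"t1": "A", "t2": "A", "t3": "-", "t4": "-"}  # hypothetical column
    print(fitch_length(tree, column_states(column, gap_as_fifth_state=False)))
    print(fitch_length(tree, column_states(column, gap_as_fifth_state=True)))
```

Scored as missing data, the shared gap contributes no steps and therefore no support for grouping t3 with t4; scored as a fifth state, it contributes one step that favors that grouping. Separate presence/absence characters, the third treatment in the abstract, could be scored with the same `fitch_length` function by passing state sets such as `{"0"}` and `{"1"}`.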


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号