Similar articles
20 similar articles found (search time: 15 ms)
1.
Most coding strategies that address the problem of questionable alignment (elision, case sensitive, missing, polymorphic, gaps as presence/absence matrix) conflict with phylogenetic principles, particularly those relating to the concept of homology (shared similarity explained by common ancestry). In some cases, the test of conjunction is failed. In other cases, characters that are coded ambiguously can lead to character-state optimization in the terminal taxa that conflicts with the original observations. Only data exclusion and contraction avoid these pitfalls. In highly dissimilar sequences, additional character states can represent the available information. Two new methods that accomplish this, block and stretch coding, are introduced here. These two new coding strategies are not in conflict with the test of conjunction and do not contradict the original observations. They are comparable to coding practices with morphological data once the intrinsic differences due to character-state identity and topographical identity have been taken into account. It is suggested that, of the three recoding methods, the one selected should be that which preserves the maximum potential phylogenetic information, as measured by the minimum number of steps required for the particular part of the data matrix. Received: 1 August 2000 / Accepted: 10 July 2001

2.
To test whether gaps resulting from sequence alignment contain phylogenetic signal concordant with those of base substitutions, we analyzed the occurrence of indel mutations upon a well-resolved, substitution-based tree for three nuclear genes in bumble bees (Bombus, Apidae: Bombini). The regions analyzed were exon and intron sequences of long-wavelength rhodopsin (LW Rh), arginine kinase (ArgK), and elongation factor-1alpha (EF-1alpha) F2 copy genes. LW Rh intron had only a few uninformative gaps, ArgK intron had relatively long gaps that were easily aligned, and EF-1alpha intron had many short gaps, resulting in multiple optimal alignments. The unambiguously aligned gaps within ArgK intron sequences showed no homoplasy upon the substitution-based tree, and phylogenetic signals within ambiguously aligned regions of EF-1alpha intron were highly congruent with those of base substitutions. We further analyzed the contribution of gap characters to phylogenetic reconstruction by incorporating them in parsimony analysis. Inclusion of gap characters consistently improved support for nodes recovered by substitutions, and inclusion of ambiguously aligned regions of EF-1alpha intron resolved several additional nodes, most of which were apical on the phylogeny. We conclude that gaps are an exceptionally reliable source of phylogenetic information that can be used to corroborate and refine phylogenies hypothesized by base substitutions, at least at lower taxonomic levels. At present, full use of gaps in phylogenetic reconstruction is best achieved in parsimony analysis, pending development of well-justified and generally applicable methods for incorporating indels in explicitly model-based methods.
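
As an illustration of how gaps can be turned into characters for a parsimony analysis, the following is a minimal sketch of presence/absence gap coding. The abstract does not say which coding scheme the authors used, so the rule below (each gap with distinct start and end positions becomes one binary character, and a sequence with a longer gap covering it is scored as inapplicable) is an assumption, as are the function names `gap_runs` and `code_gaps`. Terminal gaps are not treated specially here, although in practice they are often scored as missing data.

```python
import re


def gap_runs(seq):
    """Return half-open (start, end) spans for every run of '-' in an aligned sequence."""
    return [m.span() for m in re.finditer(r"-+", seq)]


def code_gaps(alignment):
    """Code each distinct gap as one character scored across all sequences.

    '1' = the sequence has exactly this gap,
    '0' = the sequence has no gap covering this span,
    '?' = the sequence has a longer gap that covers this one (inapplicable).
    """
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    # One character per distinct (start, end) span seen anywhere in the alignment.
    characters = sorted({span for spans in runs.values() for span in spans})
    matrix = {}
    for name, spans in runs.items():
        row = []
        for start, end in characters:
            if (start, end) in spans:
                row.append("1")
            elif any(s <= start and end <= e for s, e in spans):
                row.append("?")
            else:
                row.append("0")
        matrix[name] = "".join(row)
    return characters, matrix
```

The resulting matrix rows could be appended to the nucleotide matrix as extra columns; how they are weighted relative to substitutions is a separate analytical choice.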

3.
Near-full-length 18S and 28S rRNA gene sequences were obtained for 33 nematode species. Datasets were constructed based on secondary structure and progressive multiple alignments, and clades were compared for phylogenies inferred by Bayesian and maximum likelihood methods. Clade comparisons were also made following removal of ambiguously aligned sites as determined using the program ProAlign. Different alignments of these data produced tree topologies that differed, sometimes markedly, when analyzed by the same inference method. With one exception, the same alignment produced an identical tree topology when analyzed by different methods. Removal of ambiguously aligned sites altered the tree topology and also reduced resolution. Nematode clades were sensitive to differences in multiple alignments, and more than doubling the amount of sequence data by addition of 28S rRNA did not fully mitigate this result. Although some individual clades showed substantially higher support when 28S data were combined with 18S data, the combined analysis yielded no statistically significant increases in the number of clades receiving higher support when compared to the 18S data alone. Secondary structure alignment increased accuracy in positional homology assignment and, when used in combination with paired-site substitution models, these structural hypotheses of characters and improved models of character state change yielded high levels of phylogenetic resolution. Phylogenetic results included strong support for inclusion of Daubaylia potomaca within Cephalobidae, whereas the position of Fescia grossa within Tylenchina varied depending on the alignment, and the relationships among Rhabditidae, Diplogastridae, and Bunonematidae were not resolved.

4.

Background  

The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.
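
To make the idea of an alignment trimming step concrete, here is a minimal sketch that drops columns in which the fraction of gap characters exceeds a threshold. This is only one simple criterion chosen for illustration; the abstract does not describe the tool's actual algorithm, and the function name `trim_alignment` and the default threshold of 0.5 are assumptions.

```python
def trim_alignment(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment maps sequence names to equal-length aligned strings.
    Returns the trimmed alignment and the indices of the columns kept.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    trimmed = {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}
    return trimmed, keep
```

Returning the kept column indices alongside the trimmed sequences makes it possible to map results back to positions in the original alignment.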

5.
With the growing number of phylogenetic studies that use length-variable DNA sequences, incorporating information from length-mutational events into phylogenetic analysis is becoming increasingly important. A new method, modified complex indel coding, is described that aims at maximizing the phylogenetic information retained from unambiguously aligned sequence regions or regions where the principal relative position of gaps to one another can be safely established. An algorithm is described that allows application of the method to all theoretically possible gap-nucleotide patterns. A platform-independent computer program is introduced that automates the new method as well as several previously published coding schemes. Differences from previously published indel coding approaches, as well as from the integration of ambiguously aligned regions into phylogenetic analysis, are discussed.

6.
We use a multigene data set (the mitochondrial locus and nine nuclear gene regions) to test phylogenetic relationships in the South American "lava lizards" (genus Microlophus) and describe a strategy for aligning noncoding sequences that accounts for differences in tempo and class of mutational events. We focus on seven nuclear introns that vary in size and frequency of multibase length mutations (i.e., indels) and present a manual alignment strategy that incorporates insertions and deletions for each intron. Our method is based on mechanistic explanations of intron evolution and does not require a guide tree. We also use a progressive alignment algorithm (Probabilistic Alignment Kit; PRANK) that distinguishes insertions from deletions and avoids the "gapcost" conundrum. We describe an approach to selecting a guide tree purged of ambiguously aligned regions and use this to refine PRANK performance. We show that although manual alignment is successful in finding repeat motifs and the most obvious indels, some regions can only be subjectively aligned, and there are limits to the size and complexity of a data matrix for which this approach can be taken. PRANK alignments identified more parsimony-informative indels while simultaneously increasing nucleotide identity in conserved sequence blocks flanking the indel regions. When comparing manual and PRANK alignments with two widely used methods (CLUSTAL, MUSCLE) for the alignment of the most length-variable intron, only PRANK recovered a tree congruent at deeper nodes with the combined data tree inferred from all nuclear gene regions. We take this concordance as an objective function of alignment quality and present a strongly supported phylogenetic hypothesis for Microlophus relationships.
From this hypothesis we show that (1) a coded indel data partition derived from the PRANK alignment contributed significantly to nodal support and (2) the indel data set permitted detection of significant conflict between mitochondrial and nuclear data partitions, which we hypothesize arose from secondary contact of distantly related taxa, followed by hybridization and mtDNA introgression.

7.
Phylogenetic relationships among 95 genera collectively representing 17 of the 18 currently recognized cyclostome braconid wasp subfamilies were investigated based on DNA sequence fragments of the mitochondrial COI and the nuclear 28S rDNA genes, in addition to morphological data. The treatment of sequence length variation of the 28S partition was explored by either excluding ambiguously aligned regions and indel information (28SN) or recoding them (28SA) using the 'fragment-level' alignment method with a modified coding approach. Bayesian MCMC analyses were performed for the separate and combined data sets and a maximum parsimony analysis was also carried out for the simultaneous molecular and morphological data sets. There was a significant incongruence between the two genes and between 28S and morphology, but not for morphology and COI. Different analyses with the 28SA data matrix resulted in topologies that were generally similar to the ones from the 28SN matrix; however, the former topologies recovered a higher number of significantly supported clades and had a higher mean Bayesian posterior probability, thus supporting the inclusion of information from ambiguously aligned regions and indel events in phylogenetic analyses where possible. Based on the significantly supported clades obtained from the simultaneous molecular and morphological analyses, we propose that a total of 17 subfamilies should be recognized within the cyclostome group. The subfamilial placements of several problematic cyclostome genera were also established.

8.
The small subunit of nuclear ribosomal RNA (SSU nrRNA), whose sedimentation is mostly 18S in eukaryotes, is considered a relatively conservative marker for resolving phylogenetic relationships at the order level or higher. Length variation in SSU nrDNA is common, and can be rather large in some groups. In studies of Hexapoda phylogeny, the SSU nrDNA has been repeatedly used as a marker, but Sternorrhyncha has rarely been included. The SSU nrDNAs of sternorrhynchids, the basal group of Hemiptera identified in the previous study, are 0.3-0.6 kb longer than the usual ones in Hexapoda (1.8-1.9 kb). Using the entire SSU nrDNA sequences or the length-variable parts could cause alignment problems and therefore affect phylogenetic results, as shown in this study of Euhemiptera phylogeny. Two problems are particularly noticeable. One is that two hyper-variable regions flanking a short length-conservative region could become overlapped in the alignment, which destroys the positional homology over a larger range. The other is that, when a base pair in a stem of the secondary structure is located near the length-variable regions (LVRs), the simultaneous positional homology of the two bases in the pair is always lost in the alignment results. In this study, the secondary structure model of Hexapoda SSU nrRNA was slightly adjusted and the LVR distributions in it were finely positioned. The noise caused by the hyper LVRs was eliminated and the simultaneous homology for the paired bases was recovered based on the secondary structure model. These corrections improved the quality of the data matrix and hence improved the resolving behavior of the algorithm used. This study provided more convincing evidence for resolving the Euhemiptera suborders phylogeny as (Archaeorrhyncha+(Clypeorrhyncha+(Coleorrhyncha+Heteroptera))). This result provided a more solid background for outgroup determination in phylogenetic studies inside each suborder.
The problems caused by LVRs have seldom been well addressed. As phylogenetic reconstruction depends more on the data matrix itself than on the algorithm, and length variation of SSU/LSU rRNA exists to some degree in any group, it is necessary to closely investigate the effect of rRNA length variation on alignment and phylogenetic reconstruction in more groups.

9.
The ribosomal DNA comprising the ITS1-5.8S-ITS2 regions is widely used as a fungal marker in molecular ecology and systematics but cannot be aligned with confidence across genetically distant taxa. In order to study the diversity of Agaricomycotina in forest soils, we designed primers targeting the more alignable 28S (LSU) gene, which should be more useful for phylogenetic analyses of the detected taxa. This paper compares the performance of the established ITS1F/4B primer pair, which targets basidiomycetes, to that of two new pairs. Key factors in the comparison were the diversity covered, off-target amplification, rarefaction at different Operational Taxonomic Unit (OTU) cutoff levels, sensitivity of the method used to process the alignment to missing data and insecure positional homology, and the congruence of monophyletic clades with OTU assignments and BLAST-derived OTU names. The ITS primer pair yielded no off-target amplification but also exhibited the least fidelity to the expected phylogenetic groups. The LSU primers gave complementary pictures of diversity, but were more sensitive to modifications of the alignment such as the removal of difficult-to-align stretches. The LSU primers also yielded greater numbers of singletons and had a greater tendency to produce OTUs containing sequences from a wider variety of species as judged by BLAST similarity. We introduced some new parameters to describe alignment heterogeneity based on Shannon entropy and the extent and contents of the OTUs in a phylogenetic tree space. Our results suggest that ITS should not be used when calculating phylogenetic trees from genetically distant sequences obtained from environmental DNA extractions and that it is inadvisable to define OTUs on the basis of very heterogeneous alignments.
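
The abstract mentions heterogeneity parameters based on Shannon entropy without defining them. As a hedged illustration of the underlying quantity only, the sketch below computes the per-column Shannon entropy of an alignment, H = -Σ p_i log2 p_i over the character states observed in a column. Treating the gap symbol as an ordinary state is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the character states in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(seqs):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*seqs)]
```

A fully conserved column has entropy 0, and a column of four equally frequent nucleotides has entropy 2 bits, so higher values indicate more heterogeneous positions.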

10.
The MUST package is a phylogenetically oriented set of programs for data management and display, allowing one to handle both raw data (sequences) and results (trees, number of steps, bootstrap proportions). It is complementary to the main available software for phylogenetic analysis (PHYLIP, PAUP, HENNIG86, CLUSTAL), with which it is fully compatible. The first part of MUST consists of the acquisition of new sequences, their storage, modification, and checking of sequence integrity in files of aligned sequences. In order to improve alignment, an editor function for aligned sequences offers numerous options, such as selection of subsets of sequences, display of consensus sequences, and search for similarities over small sequence fragments. For phylogenetic reconstruction, the choice of species and portions of sequences to be analyzed is easy and very rapid, permitting fast testing of numerous combinations of sequences and taxa. The resulting files can be formatted for most programs of tree construction. An interactive tree-display program recovers the output of all these programs. Finally, various modules allow an in-depth analysis of results, such as comparison of distance matrices, variation of bootstrap proportions with respect to various parameters, or comparison of the number of steps per position. All presently available complete sequences of 28S rRNA are furnished aligned in the package. MUST therefore allows the management of all the operations required for phylogenetic reconstructions.

11.
Putative synapomorphy assessment (primary homology assessment) is distinct for DNA strings having a codon structure (hereafter, coding DNA) versus those lacking it (hereafter, non-coding DNA). The first requires the identification of a reading frame and of usually few in-frame insertions and deletions. In non-coding DNA, where length variation is much more common, putative synapomorphy assessment is considerably less straightforward and depends heavily on the alignment method. Appreciating the existence of evolutionary constraints, alignments that consider patterns associated with specific putative evolutionary events are favored. Once the sequences have been aligned, the postulated putative evolutionary events need to be coded as an additional step. In order for the alignments and the alignment coding to be falsifiable, they should be carried out using justified and explicitly formulated criteria. Alternative coding methods for the most common patterns present in alignments of non-coding DNA are discussed here. Simpler putative synapomorphy assessment will not always correlate with more reliable phylogenetic information because simplicity does not necessarily correlate with the degree of homoplasy. The use of non-coding DNA can result in more laborious coding, but at the same time in more corroborated hypotheses, mirroring their accuracy for phylogenetic inference.

12.
The Teloschistaceae is a widespread family with considerable morphological and ecological heterogeneity across genera and species groups. In order to provide a comprehensive molecular phylogeny for this family, phylogenetic analyses were carried out on sequences from the nuclear ribosomal ITS region obtained from 114 individuals that represent virtually all main lineages of Teloschistaceae. Our study confirmed the polyphyly of Caloplaca, Fulgensia and Xanthoria, and revealed that Teloschistes is probably non-monophyletic. We also confirm here that species traditionally included in Caloplaca subgenus Gasparrinia do not form a monophyletic entity. Caloplaca aurantia, C. carphinea and C. saxicola s. str. groups were recovered as monophyletic. The subgenera Caloplaca and Pyrenodesmia were also polyphyletic. In the subgenus Caloplaca, the traditionally recognized C. cerina group was recovered as monophyletic. Because this study is based solely on ITS, to maximize taxon sampling, the inclusion of phylogenetic signal from ambiguously aligned regions in MP (recoded INAASE and arc characters) resulted in the most highly supported phylogenetic reconstruction, compared with Bayesian inference restricted to alignable sites.

13.
A global alignment of EF-G(2) sequences was corrected by reference to protein structure. The selection of characters eligible for construction of phylogenetic trees was optimized by searching for regions arising from the artifactual matching of sequence segments unique to different phylogenetic domains. The spurious matchings were identified by comparing all sections of the global alignment with a comprehensive inventory of significant binary alignments obtained by BLAST probing of the DNA and protein databases with representative EF-G(2) sequences. In three discrete alignment blocks (one in domain II and two in domain IV), the alignment of the bacterial sequences with those of Archaea–Eucarya was not retrieved by database probing with EF-G(2) sequences, and no EF-G homologue of the EF-2 sequence segments was detected by using partial EF-G(2) sequences as probes in BLAST/FASTA searches. The two domain IV regions (one of which comprises the ADP-ribosylatable site of EF-2) are almost certainly due to the artifactual alignment of insertion segments that are unique to Bacteria and to Archaea–Eucarya. Phylogenetic trees have been constructed from the global alignment after deselecting positions encompassing the unretrieved, spuriously aligned regions, as well as positions arising from misalignment of the G′ and G″ subdomain insertion segments flanking the "fifth" consensus motif of the G domain (Ævarsson, 1995). The results show inconsistencies between trees inferred by alternative methods and alternative (DNA and protein) data sets with regard to Archaea being a monophyletic or paraphyletic grouping. Neither maximum-likelihood nor maximum-parsimony methods allow discrimination (by log-likelihood difference and difference in number of inferred substitutions) between the conflicting (monophyletic vs. paraphyletic Archaea) topologies.
No specific EF-2 insertions (or terminal accretions) supporting a crenarchaeal–eucaryal clade are detectable in the new EF-G(2) sequence alignment.

14.
The chromosome arrangement in interphase nuclei is of growing interest; e.g., the spatial vicinity of homologous sequences is decisive for efficient repair of DNA damage by homologous recombination, and close alignment of sister chromatids is considered a prerequisite for their bipolar orientation and subsequent segregation during nuclear division. To study the degree of homologous pairing and of sister chromatid alignment in plants, we applied fluorescent in situ hybridisation with specific bacterial artificial chromosome inserts to interphase nuclei. Previously we found in Arabidopsis thaliana and in A. lyrata positional homologous pairing at random, and, except for centromere regions, sister chromatids were frequently not aligned. To test whether these features are typical for higher plants or depend on genome size, chromosome organisation and/or phylogenetic affiliation, we investigated distinct individual loci in other species. The positional pairing of these loci was mainly random. The highest frequency of sister alignment (in >93% of homologues) was found for centromeres, some rDNA and a few other high copy loci. Apparently, somatic homologous pairing is not a typical feature of angiosperms, and sister chromatid alignment is not obligatory along chromosome arms. Thus, the high frequency of chromatid exchanges at homologous positions after mutagen treatment requires an explanation other than regular somatic pairing of homologues (possibly an active search of damaged sites for homology). For sister chromatid exchanges a continuous sister chromatid alignment is not required. For correct segregation, permanent alignment of sister centromeres is sufficient.

15.
A total of 20%-25% of the proteins in a typical genome are helical membrane proteins. The transmembrane regions of these proteins have markedly different properties when compared with globular proteins. This presents a problem when homology search algorithms optimized for globular proteins are applied to membrane proteins. Here we present modifications of the standard Smith-Waterman and profile search algorithms that significantly improve the detection of related membrane proteins. The improvement is based on the inclusion of information about predicted transmembrane segments in the alignment algorithm. This is done by simply increasing the alignment score if two residues predicted to belong to transmembrane segments are aligned with each other. Benchmarking over a test set of G-protein-coupled receptor sequences shows that the number of false positives is significantly reduced in this way, both when closely related and distantly related proteins are searched for.
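
The scoring modification described above can be sketched as a small change to the Smith-Waterman recurrence: the substitution score for a pair of residues receives a bonus when both are predicted to be transmembrane. The sketch below is a simplification under stated assumptions: it uses a flat match/mismatch score instead of a substitution matrix, a linear rather than affine gap penalty, and arbitrary parameter values; the function name `sw_score` and the 'M' annotation convention are hypothetical.

```python
def sw_score(seq_a, seq_b, tm_a, tm_b, match=2, mismatch=-1, gap=-2, tm_bonus=1):
    """Best local alignment score with a bonus for transmembrane-transmembrane pairs.

    tm_a and tm_b annotate each residue: 'M' = predicted transmembrane,
    any other character = not transmembrane.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # Reward aligning two residues that are both predicted transmembrane.
            if tm_a[i - 1] == "M" and tm_b[j - 1] == "M":
                s += tm_bonus
            h[i][j] = max(
                0,                    # start a new local alignment
                h[i - 1][j - 1] + s,  # align the two residues
                h[i - 1][j] + gap,    # gap in seq_b
                h[i][j - 1] + gap,    # gap in seq_a
            )
            best = max(best, h[i][j])
    return best
```

Setting `tm_bonus=0` recovers the unmodified local alignment score, which makes it easy to compare rankings with and without the transmembrane information.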

16.
Multiple sequence alignment with hierarchical clustering. Total citations: 155 (self-citations: 8; citations by others: 147)
F. Corpet. Nucleic Acids Research, 1988, 16(22): 10881-10890
An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned, creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c.
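
The clustering step that determines the order of alignment can be sketched as follows. Starting from a table of pairwise alignment scores (higher meaning more similar), the two closest groups are merged repeatedly until one group remains; the sequence of merges is the order in which groups would be aligned. The abstract does not state the linkage rule, so average linkage is an assumption here, as are the function name `clustering_order` and the `frozenset`-keyed score table. The group-to-group alignment itself is not shown.

```python
from itertools import combinations


def clustering_order(names, score):
    """Return the list of (left_group, right_group) merges, closest groups first.

    score maps frozenset({a, b}) to the pairwise alignment score of a and b.
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average pairwise score between the members of two groups.
            left, right = pair
            total = sum(score[frozenset((a, b))] for a in left for b in right)
            return total / (len(left) * len(right))

        left, right = max(combinations(clusters, 2), key=linkage)
        merges.append((left, right))
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return merges
```

The iteration mentioned in the abstract would correspond to recomputing the score table from the pairwise alignments implied by the multiple alignment and calling the function again until the merge order stops changing.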

17.
It is at present difficult to accurately position gaps in sequence alignment and to determine substructural homology in structure alignment when reconstructing phylogenies based on highly divergent sequences. Therefore, we have developed a new strategy for inferring phylogenies based on highly divergent sequences. In this new strategy, the whole secondary structure presented as a string in bracket notation is used as phylogenetic characters to infer phylogenetic relationships. It is no longer necessary to decompose the secondary structure into homologous substructural components. In this study, reliable phylogenetic relationships of eight species in Pectinidae were inferred from the structure alignment, but not from sequence alignment, even with the aid of structural information. The results suggest that this new strategy should be useful for inferring phylogenetic relationships based on highly divergent sequences. Moreover, the structural evolution of ITS1 in Pectinidae was also investigated. The whole ITS1 structure could be divided into four structural domains. Compensatory changes were found in all four structural domains. Structural motifs in these domains were identified further. These motifs, especially those in D2 and D3, may have important functions in the maturation of rRNAs.
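
For readers unfamiliar with bracket notation, the sketch below shows how such a string encodes a secondary structure: each '(' pairs with its matching ')', and '.' marks an unpaired position. It only decodes the notation into base-pair indices; how the authors align or score the strings as phylogenetic characters is not described in the abstract, and the function name `base_pairs` is hypothetical.

```python
def base_pairs(structure):
    """Decode a dot-bracket string into a sorted list of (i, j) paired positions (0-based)."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

Because the string has one symbol per nucleotide, two structures of different lengths can be compared with the same alignment machinery used for sequences, which is what makes the notation usable as character data.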

18.
In order to maximise the positional homology in the primary sequence alignment of the second internal transcribed spacer for 30 species of equine strongyloid nematodes, the secondary structures of the precursor ribosomal RNA were predicted using an approach combining an energy minimisation method and comparative sequence analysis. The results indicated that a common secondary structure model of the second internal transcribed spacer of these nematodes was maintained, despite significant interspecific differences (2–56%) in primary sequences. The secondary structure model was then used to refine the primary second internal transcribed spacer sequence alignment. The “manual” and “structure” alignments were both subjected to phylogenetic analysis using three different tree-building methods to compare the effect of using different sequence alignments on phylogenetic inference. The topologies of the phylogenetic trees inferred from the manual second internal transcribed spacer alignment were usually different from those derived from the structure second internal transcribed spacer alignment. The results suggested that the positional homology in the second internal transcribed spacer primary sequence alignment was maximised when the secondary structure model was taken into consideration.

19.
Fulgensia Massal. & De Not. is a widespread genus with considerable morphological and ecological heterogeneity across species. For this reason, the taxonomic delimitation of this genus has been controversial. Relationships among species of Fulgensia, Caloplaca Th. Fr., and Xanthoria (Fr.) Th. Fr. (Lecanorales) were investigated based on a comprehensive phylogenetic analysis of 62 DNA sequences from the nuclear ribosomal internal transcribed spacer (ITS) region using maximum parsimony (MP) and likelihood (ML). Ambiguously aligned (INAASE coded characters) and unambiguous regions were analyzed separately and combined when using MP as the optimization criterion. All our analyses confirm the polyphyly of this genus as three distinct lineages: Fulgensia sensu stricto, F. australis, and F. schistidii. We report here that Caloplaca, Fulgensia, and Xanthoria together form two main sister lineages. One lineage includes Fulgensia schistidii (part of the C. saxicola group), Xanthoria, and most of the lobed Caloplaca species belonging to the Gasparrinia group. A second main lineage comprises the remaining Caloplaca species, Fulgensia sensu stricto, and F. australis. Therefore, the traditional generic level classification schemes for the family Teloschistaceae appear to be highly artificial. All three genera were found to be nonmonophyletic. We demonstrate here that the ITS is appropriate to resolve relationships across the Teloschistaceae. However, a combination of an MP analysis, in which ambiguously aligned regions are accommodated using INAASE, with an ML analysis, in which phylogenetic confidence is estimated using a Bayesian approach, is needed.

20.
An earlier analysis of the trnL intron in the Colletieae (Rhamnaceae) showed polyphyly of the genus Discaria. Polyphyly of Discaria is supported only by an AT-rich region of ambiguous alignment within the trnL intron, so this conclusion relies on extracting the information of the AT-rich region correctly. Ambiguously aligned regions are commonly excluded from phylogenetic analysis. In the present study the question was raised whether random or noisy data could generate a pattern like the one found in the AT-rich region of ambiguous alignment. The original pattern was resistant to changes in alignment parameter costs when submitted to a sensitivity analysis using direct optimization. Artificially generated random or noisy data gave well-resolved trees, but these were found to be extremely sensitive to changes in parameter costs. However, information from additional data, such as conserved regions, restricts the influence of random data. It is here suggested that the information in ambiguously aligned regions need not be dismissed, provided that an appropriate method that finds all possible optimal alignments is used to extract the information. In addition to commonly used support measures, some information on robustness to changes in alignment parameter costs is needed in order to reach the most reliable conclusions.
