Similar literature
 20 similar articles found (search time: 46 ms)
1.
The future of phylogeny reconstruction   Cited by: 1 (self-citations: 0, citations by others: 1)
A new approach to phylogenetic analysis, parsimony jackknifing, uses simple parsimony calculations combined with resampling of characters to arrive at a tree comprising well-supported groups. This is usually much the same as the consensus of most-parsimonious trees found from extensive multiple-tree calculations, but the new method is thousands of times faster, allowing analysis of much larger data matrices, and also provides information on the strength of support for different groups. Jackknife frequencies provide a more reliable assessment of support than do alternative methods, notably "confidence probability" (CP) and T-PTP testing.
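The character-resampling step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the deletion probability; the default of about 1/e used here is an assumption, as are the function names and the dictionary layout of the matrix. The tree search that would run on each replicate is left out.

```python
import random


def jackknife_sample(matrix, p_delete=0.3679, rng=None):
    """Return a copy of a character matrix with columns randomly removed.

    `matrix` maps taxon name -> string of character states (equal lengths).
    Each column is dropped independently with probability `p_delete`
    (assumed value, roughly 1/e; not stated in the abstract).
    """
    rng = rng or random.Random()
    n_chars = len(next(iter(matrix.values())))
    keep = [i for i in range(n_chars) if rng.random() >= p_delete]
    return {taxon: "".join(row[i] for i in keep) for taxon, row in matrix.items()}


def group_frequency(replicate_groups, group):
    """Fraction of replicates whose tree contains `group` (a frozenset of taxa)."""
    hits = sum(1 for groups in replicate_groups if group in groups)
    return hits / len(replicate_groups)
```

Each replicate matrix would be passed to a parsimony search, and the groups found across replicates tallied with `group_frequency`.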

2.
MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to some query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software like BLAST provides the user with a set of scoring matrices available to choose from, and in the literature it is sometimes recommended to try several scoring matrices on the sequences of interest. The significance of an alignment is usually assessed by looking at E-values and p-values. While sequence lengths and database sizes enter the standard calculations of significance, it is much less common to take the use of several scoring matrices on the same sequences into account. Altschul proposed corrections of the p-value that account for the simultaneous use of an infinite number of PAM matrices. Here we consider the more realistic situation where the user may choose from a finite set of popular PAM and BLOSUM matrices, in particular the ones available in BLAST. It turns out that the significance of a result can be considerably overestimated if a set of substitution matrices is used in an alignment problem and the most significant alignment is then quoted. RESULTS: Based on extensive simulations, we study the multiple testing problem that occurs when several scoring matrices for local sequence alignment are used. We consider a simple Bonferroni correction of the p-values and investigate its accuracy. Finally, we propose a more accurate correction based on extreme value distributions fitted to the maximum of the normalized scores obtained from different scoring matrices. For various sets of matrices we provide correction factors which can be easily applied to adjust p- and E-values reported by software packages.
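The simple Bonferroni correction mentioned in this abstract can be sketched as follows. The function name is hypothetical, and the more accurate extreme-value correction the authors propose is not reproduced here because the abstract does not give its correction factors.

```python
def bonferroni_min_p(p_values):
    """Bonferroni-corrected p-value for the best of several alignments.

    `p_values` holds one p-value per scoring matrix tried on the same
    sequence pair. Quoting only the smallest one overstates significance,
    so it is multiplied by the number of matrices and capped at 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    k = len(p_values)
    return min(1.0, k * min(p_values))
```

For example, if three matrices give p-values of 0.01, 0.2 and 0.5, the corrected value for the best alignment is about 0.03 rather than 0.01.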

3.
Leucorrhinia (Odonata, Anisoptera, Libellulidae) consists of 14-15 species with a holarctic distribution. We have combined the morphological characters of a previous study with sequence data from the ITS1, 5.8S rDNA, and ITS2 regions of the nuclear ribosomal repeat. Cloning was used to investigate the intra-individual variation and such variation was found in all investigated species. Parsimony jackknifing was used to identify supported groups. The effect of sequence alignment and gap coding was explored by a modified sensitivity analysis. Loss of spines in Leucorrhinia larvae has occurred twice: once in Europe and once in North America. The role of spines as a defence against predation is discussed in a phylogenetic context.

4.
Alignment of nucleotide and/or amino acid sequences is a fundamental component of sequence‐based molecular phylogenetic studies. Here we examined how different alignment methods affect the phylogenetic trees that are inferred from the alignments. We used simulations to determine how alignment errors can lead to systematic biases that affect phylogenetic inference from those sequences. We compared four approaches to sequence alignment: progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment and direct optimization. When taking into account branch support, implied alignments produced by direct optimization were found to show the most extreme behaviour (based on the alignment programs for which nearly equivalent alignment parameters could be set) in that they provided the strongest support for the correct tree in the simulations in which it was easy to resolve the correct tree and the strongest support for the incorrect tree in our long‐branch‐attraction simulations. When applied to alignment‐sensitive process partitions with different histories, direct optimization showed the strongest mutual influence between the process partitions when they were aligned and phylogenetically analysed together, which makes detecting recombination more difficult. Simultaneous alignment performed well relative to direct optimization and progressive pairwise alignment across all simulations. Rather than relying upon methods that integrate alignment and tree search into a single step without accounting for alignment uncertainty, as with implied alignments, we suggest that simultaneous alignment using the similarity criterion, within the context of information available on biological processes and function, be applied whenever possible for sequence‐based phylogenetic analyses.

5.
In a case study of fungi of the class Sordariomycetes, we evaluated the effect of multiple sequence alignment (MSA) on the reliability of the phylogenetic trees, topology and confidence of major phylogenetic clades. We compared two main approaches for constructing MSA based on (1) the knowledge of the secondary (2D) structure of ribosomal RNA (rRNA) genes, and (2) automatic construction of MSA by four alignment programs characterized by different algorithms and evaluation methods, CLUSTAL, MAFFT, MUSCLE, and SAM. In the primary fungal sequences of the two functional rRNA genes, the nuclear small and large ribosomal subunits (18S and 28S), we identified four and six, respectively, highly variable regions, which correspond mainly to hairpin loops in the 2D structure. These loops are often positioned in expansion segments, which are missing or are not completely developed in the Archaeal and Eubacterial kingdoms. Proper sorting of these sites was key to constructing an accurate MSA. We utilized DNA sequences from 28S as an example for one-gene analysis. Five different MSAs were created and analyzed with maximum parsimony and maximum likelihood methods. The phylogenies inferred from the alignments improved with 2D structure with identified homologous segments, and those constructed using the MAFFT alignment program, with all highly variable regions included, provided the most reliable phylograms with higher bootstrap support for the majority of clades. We illustrate and provide examples demonstrating that re-evaluating ambiguous positions in the consensus sequences using 2D structure and covariance is a promising means of improving the quality and reliability of sequence alignments.

6.
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take into account the separation between two amino acids in the query alignment that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request.
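The standard SP score that this abstract starts from can be sketched as below, under the common definition of the fraction of reference residue pairs that the query alignment also places in the same column. The function names and the gap character "-" are assumptions. SPdist itself is not reproduced, since the abstract does not state its formula.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs that an alignment places in the same column.

    `alignment` is a list of equal-length gapped strings. Each pair is
    ((seq_i, residue_i), (seq_j, residue_j)) with seq_i < seq_j, where
    residue indices count non-gap characters only.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(reference, query):
    """Fraction of reference-aligned residue pairs recovered by the query."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(query)) / len(ref)
```

Because every missed pair costs the same, a query that shifts a residue by one column scores exactly like one that shifts it by fifty, which is the limitation the abstract sets out to address.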

7.

Background  

While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

8.
Improvements to resampling measures of group support   Cited by: 4 (self-citations: 0, citations by others: 4)
Several aspects of current resampling methods to assess group support are reviewed. When the characters have different prior weights or some state transformation costs are different, the frequencies under either bootstrapping or jackknifing can be distorted, producing either under‐ or overestimations of the actual group support. This is avoided by symmetric resampling, where the probability p of increasing the weight of a character equals the probability of decreasing it. Problems with interpreting absolute group frequencies as a measure of the support are discussed; group support does not necessarily vary with the frequency itself, since in some cases groups with positive support may have much lower frequencies than groups with no support at all. Three possible solutions for this problem are suggested. The first is measuring the support as the difference in frequency between the group and its most frequent contradictory group. The second is calculating frequencies for values of p below the threshold under which the frequency ranks the groups in the right order of support (this threshold may vary from data set to data set). The third is estimating the support by using the slope of the frequency as a function of different (low) values of p; when p is low, groups with actual support have negative slopes (closer to 0 when the support is higher), and groups with no support have positive slopes (larger when evidence for and against the group is more abundant).
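Two ideas from this abstract, symmetric resampling and the frequency-difference measure of support, can be sketched as follows. The abstract only says that up- and down-weighting happen with equal probability p; doubling a weight and setting it to zero are assumptions made here for illustration, as are the function names.

```python
import random


def symmetric_resample(weights, p, rng=None):
    """Perturb character weights with equal chance of increase and decrease.

    With probability p a weight is doubled, with probability p it is set
    to zero, and otherwise it is left unchanged (doubling and zeroing are
    illustrative assumptions). Requires 0 <= p <= 0.5.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 0.5]")
    rng = rng or random.Random()
    resampled = []
    for w in weights:
        r = rng.random()
        if r < p:
            resampled.append(2 * w)   # weight increased
        elif r < 2 * p:
            resampled.append(0)       # weight decreased (character removed)
        else:
            resampled.append(w)       # weight unchanged
    return resampled


def frequency_difference(group_freq, contradictory_freqs):
    """Group frequency minus that of its most frequent contradictory group."""
    strongest_rival = max(contradictory_freqs, default=0.0)
    return group_freq - strongest_rival
```

A group found in 60% of replicates whose strongest rival appears in 30% gets a support of about 0.3, while a group at 40% facing a rival at 55% gets a negative value, which absolute frequencies alone would not reveal.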

9.
Both center-of-pressure (CP) displacements under each foot and relative body-weight distribution intervene in the production of resultant CP movements. To better understand their respective involvement, a protocol was set up for young healthy individuals consisting of standing on a double seesaw that favors pitch motions and lies on a dual-force platform. The postural control effects induced by two types of asymmetry, weight-bearing and the CP movement patterns, were investigated. These asymmetries were achieved by associating two seesaws with two different lengths for the radii of the ridges and by requiring specific body-weight distributions. The results indicate that the postural strategies, aimed at controlling anteroposterior sway, are related to the subjects’ capacity to minimize the CP displacements under the less stable support, whatever load is applied. In contrast, the degree of involvement of the more stable support must be viewed as a complement used to secure the appropriate motor output, i.e., the resultant CP movements. Within this objective, both the applied load and the CP amplitudes under the more stable support are taken into account. These data provide additional insights into the compensatory mechanisms in the interaction between the two feet, which are used to produce the adequate resultant CP movements and therefore upright stance control. The specificity of the double seesaw that can induce asymmetric CP patterns and/or asymmetric body-weight distribution makes it a legitimate contender to be used as a rehabilitation device for patients with neurological and/or traumatic diseases.

10.
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. 
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.

11.

Background

The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data.

Methods

Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database - rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1 creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2 creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment.

Results

The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed.

Conclusions

Both CRWAlign programs can be used to align the very extensive amount of RNA sequencing that is generated due to the rapid next-generation sequencing technology. This latter technology is augmenting the new paradigm that RNA is intimately implicated in a significant number of functions within the cell. In addition, the use of bacterial 16S rRNA sequencing in the identification of the microbiome in many different environmental systems creates a need for rapid and highly accurate alignment of bacterial 16S rRNA sequences.

12.
The phylogenetic position of Annelida as well as its ingroup relationships are a matter of ongoing debate. A molecular phylogenetic study of sedentary polychaete relationships was conducted based on 70 sequences of 18S rRNA, including unpublished sequences of 18 polychaete species. The data set was analysed with maximum parsimony and maximum likelihood methods. Clade robustness was estimated by parsimony-bootstrapping and jackknifing, decay index, and clade support, as well as a posteriori probability tests using Bayesian inference. Irrespective of the applied method, some traditional sedentary polychaete taxa, such as Cirratulidae, Opheliidae, Orbiniidae, Siboglinidae and Spionidae, were recovered by our phylogenetic reconstruction. A close relationship between Orbiniidae and Questa received a particularly strong support. Echiura appears to be a polychaete ingroup taxon which is closely related to Dasybranchus (Capitellidae). As in previous molecular analyses, no support was found for the monophyly of Annelida nor for that of Polychaeta. However, we suggest that an increase in taxon sampling may yield additional resolution in the reconstruction of polychaete ingroup phylogeny, although the difficulties in reconstructing the basal phylogenetic relationships within Annelida may be due to their rapid radiation.

13.
PALS db: Putative Alternative Splicing database   Cited by: 6 (self-citations: 0, citations by others: 6)
PALS db is a collection of Putative Alternative Splicing information from 19 936 human UniGene clusters and 16 615 mouse UniGene clusters. Alternative splicing (AS) sites were predicted by using the longest messenger RNA (mRNA) sequence in each UniGene cluster as the reference sequence. This sequence was aligned with related sequences in UniGene and dbEST to reveal the AS. This information was presented with six features: (i) literature aliases were used to improve the result of a gene name search; (ii) the quality of a prediction can be easily judged from the color-coded similarity and the scaled length of an alignment; (iii) we have clustered those EST sequences that support the same AS site together to enhance the users’ confidence in a prediction; (iv) the users can also set up the alignment criteria interactively to recover false negatives; (v) tissue distribution can be displayed by placing the mouse cursor over an alignment; (vi) gene features will be analyzed at foreign sites by submitting the selected mRNA or its encoded protein as a query. Using these features, the users can not only discover putative AS sites in silico, but also make new observations by combining AS information with tissue distributions or with gene features. PALS db is available at http://palsdb.ym.edu.tw/.

14.
The conflicts over sex allocation and male production in insect societies have long served as an important test bed for Hamilton's theory of inclusive fitness, but have for the most part been considered separately. Here, we develop new coevolutionary models to examine the interaction between these two conflicts and demonstrate that sex ratio and colony productivity costs of worker reproduction can lead to vastly different outcomes even in species that show no variation in their relatedness structure. Empirical data on worker-produced males in eight species of Melipona bees support the predictions from a model that takes into account the demographic details of colony growth and reproduction. Overall, these models contribute significantly to explaining behavioural variation that previous theories could not account for.

15.
All popular algorithms for pairwise alignment of protein primary structures (e.g. Smith-Waterman (SW), FASTA, BLAST, etc.) utilize only amino acid sequences. The SW-algorithm is the most accurate among them, i.e. it produces alignments that are most similar to the alignments obtained by superposition of protein 3D-structures. But even the SW-algorithm is unable to restore the 3D-based alignment if the similarity of amino acid sequences (%id) is below 30%. We have proposed a novel alignment method that explicitly takes into account the secondary structure of the compared proteins. We have shown that it creates significantly more accurate alignments compared to the SW-algorithm. In particular, for sequences with %id < 30% the average accuracy of the new method is 58% compared to 35% for the SW-algorithm (the accuracy of an algorithmic sequence alignment is the fraction of positions of a "gold standard" alignment, obtained by superposition of the corresponding 3D-structures, that it restores). The accuracy of the proposed method is approximately the same for experimental and for theoretically predicted secondary structures. Thus the method can be applied to the alignment of protein sequences even if the protein 3D-structure is unknown. The program is available at ftp://194.149.64.196/STRUSWER/.
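The sequence-only Smith-Waterman baseline that this abstract compares against can be sketched as below. This is a simplified illustration, assuming a fixed match/mismatch score and a linear gap penalty rather than a substitution matrix with affine gaps; the parameter values are arbitrary. The secondary-structure-aware method proposed by the authors is not reproduced, since the abstract does not describe its scoring.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the dynamic-programming table at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

The score depends only on residue identities, which is why such an alignment carries little signal once sequence identity falls to the low levels discussed in the abstract.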

16.
The 60 000 described species of Cyclorrhapha are characterized by an unusual diversity in larval life‐history traits, which range from saprophagy through phytophagy to parasitism and predation. However, the direction of evolutionary change between the different modes remains unclear. Here, we use the Scathophagidae (Diptera) for reconstructing the direction of change in this relatively small family (≈ 250 spp.) whose larval habits mirror the diversity in natural history found in Cyclorrhapha. We subjected a molecular data set for 63 species (22 genera) and DNA sequences from seven genes (12S, 16S, Cytb, COI, 28S, Ef1‐alfa, Pol II) to an extensive sensitivity analysis and compared the performance of three different alignment strategies (manual, Clustal, POY). We find that the default Clustal alignment performs worst as judged by character incongruence, topological congruence and branch support. For this alignment, scoring indels as a fifth character state worsens character incongruence and topological congruence. However, manual alignment and direct optimization perform similarly well and yield near‐identical trees, although branch support is lower for the direct‐optimization trees. All three alignment techniques favor the upweighting of transversions. We furthermore confirm the independence of the concepts “node support” and “node stability” by documenting several cases of poorly supported nodes being very stable and cases of well supported nodes being unstable. We confirm the monophyly of the Scathophagidae, its two constituent subfamilies, and most genera. We demonstrate that phytophagy in the form of leaf mining is the ancestral larval feeding habit for Scathophagidae. From phytophagy, two shifts to saprophagy and one shift to predation have occurred, while a second origin of predation is from a saprophagous ancestor. © The Willi Hennig Society 2006.

17.
Ceruloplasmin (CP) is essential for brain iron homeostasis. However, its precise function in brain iron transport has not been definitely determined. In this study, we investigated the effects of soluble CP on iron influx and efflux in primary neuronal culture from the midbrain (the substantia nigra and striatum) and the hippocampus. Our data showed that low concentrations of CP (2, 4, 8 μg/ml) can promote iron influx into iron-deficient neurons, but not the neurons with normal iron status. The same concentrations of CP had no effect on iron efflux from iron-sufficient and normal-iron neurons. Contrary to our expectation, we did not find any regional difference in the effects of CP on iron influx as well as efflux in neurons. The changes in quenching (iron influx) and also dequenching (iron efflux) of intracellular fluorescence, induced by the addition of CP with iron, in the midbrain neurons were no different from those in the hippocampus neurons. The data showed that soluble CP has a role in iron uptake by iron-deficient brain neurons under our experimental conditions. The physiological significance of the results forms the focus for future work.

18.
The jackknife technique was tested by fitting a two-exponential function to the time course of disappearance of radioactivity from the area of a wheat leaf that had been fed 14CO2. The function was fitted by both unweighted and weighted least squares, first without and then with the jackknife. Weighting altered the estimates of the function's parameters, but jackknifing did not. Hence jackknifing did not remove any of the bias introduced by incorrect weighting. The confidence limits of the parameters calculated by jackknifing were greater than those estimated from the variance-covariance matrix of the regression, but similar to those derived from replicate experiments. The jackknife also allowed confidence limits for the rate constants and transit time of the underlying two-compartment model to be derived.
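The delete-one jackknife used in this abstract can be illustrated with a generic sketch that works for any estimator. The function name and interface are assumptions; the study applied the technique to the parameters of a fitted two-exponential function, whereas the usage note here uses a plain mean to keep the example self-contained.

```python
import math


def jackknife(data, estimator):
    """Delete-one jackknife estimate and standard error.

    `data` is a list of observations and `estimator` maps a list to a
    number. Pseudo-values n*theta - (n-1)*theta_(-i) are averaged to get
    the estimate, and their spread gives the standard error.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    full = estimator(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    pseudo = [n * full - (n - 1) * t for t in leave_one_out]
    estimate = sum(pseudo) / n
    variance = sum((v - estimate) ** 2 for v in pseudo) / (n * (n - 1))
    return estimate, math.sqrt(variance)
```

For the mean, `jackknife(values, lambda d: sum(d) / len(d))` returns the ordinary mean and its usual standard error; for a fitted rate constant, `estimator` would refit the curve to each reduced data set, and confidence limits would follow from the returned standard error.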

19.
Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer confidence values by way of nonparametric bootstrapping. Recent developments in phylogenetics suggest the computational burden can be reduced by using Bayesian methods of phylogenetic inference. However, few empirical phylogenetic studies exist that explore the efficiency of Bayesian analysis of large datasets. To this end, we conducted an extensive phylogenetic analysis of the wide-ranging and geographically variable Eastern Fence Lizard (Sceloporus undulatus). Maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses were performed on a combined mitochondrial DNA dataset (12S and 16S rRNA, ND1 protein-coding gene, and associated tRNA; 3,688 bp total) for 56 populations of S. undulatus (78 total terminals including other S. undulatus group species and outgroups). Maximum parsimony analysis resulted in numerous equally parsimonious trees (82,646 from equally weighted parsimony and 335 from weighted parsimony). The majority rule consensus tree derived from the Bayesian analysis was topologically identical to the single best phylogeny inferred from the maximum likelihood analysis, but required approximately 80% less computational time. The mtDNA data provide strong support for the monophyly of the S. undulatus group and the paraphyly of "S. undulatus" with respect to S. belli, S. cautus, and S. woodi. Parallel evolution of ecomorphs within "S. undulatus" has masked the actual number of species within this group. This evidence, along with convincing patterns of phylogeographic differentiation suggests "S. undulatus" represents at least four lineages that should be recognized as evolutionary species.

20.
Recent multi-gene phylogenetic analyses of plastid-encoded genes have recovered a robust monophyly of chlorophyll-c containing plastids (Chl-c plastids) in cryptophytes, haptophytes, photosynthetic stramenopiles, and dinoflagellates. However, all the plastid multi-gene phylogenies published to date utilized the "linked" model, which ignores the heterogeneity of sequence evolution across genes in alignments. Both empirical and simulation studies show that, compared to the linked model, the "unlinked" model, which accounts for gene-specific evolution, can greatly improve multi-gene estimations. Here we newly sequenced 46 genes of Chl-c plastids, and examined the Chl-c plastid evolution by multi-gene analyses under the unlinked model. Unexpectedly, Chl-c plastid monophyly received only low to medium support in our analyses based on multi-gene data sets including up to 4829 alignment positions. Although we systematically surveyed and excluded the genes that could mislead estimation, the (inconclusive) support for Chl-c plastid monophyly was not significantly altered. We conclude that the estimates from the current plastid-encoded gene data are insufficient to resolve Chl-c plastid evolution with confidence, and are highly affected by the genes included in the analyses and by the tree reconstruction methods applied. Thus, future data analyses of larger multi-gene data sets, preferably under the unlinked model, are required to conclusively understand Chl-c plastid evolution.
