Similar literature
1.

Background  

Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study.

2.

Background  

Phylogenetic trees are an important tool for representing evolutionary relationships among organisms. In a phylogram or chronogram, the ordering of taxa is not considered meaningful, since complete topological information is given by the branching order and length of the branches, which are represented in the root-to-node direction. We apply a novel method based on a (λ + μ)-Evolutionary Algorithm to give meaning to the order of taxa in a phylogeny. This method applies random swaps between two taxa connected to the same node, without changing the topology of the tree. The evaluation of a new tree is based on different distance matrices, representing non-phylogenetic information such as other types of genetic distance, geographic distance, or combinations of these. To test our method we use published trees of Vesicular stomatitis virus, West Nile virus and Rice yellow mottle virus.

3.

Background  

Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated with transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n²) in the number of states, making it impractical for large values of n.
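The quadratic dependence on the number of states can be seen in a minimal sketch of the textbook Sankoff recursion. This is an illustration only, not the optimized method the abstract goes on to describe; the tree, the leaf states, and the unit cost matrix below are hypothetical examples.

```python
import math

def sankoff(tree, leaf_states, cost, states):
    """Minimum total cost of state changes on a rooted tree.

    tree: nested tuples of leaf names (hypothetical encoding for this sketch).
    leaf_states: observed state at each leaf.
    cost[s][t]: cost of changing from state s to state t along one branch.
    """
    def solve(node):
        if isinstance(node, str):  # leaf: only the observed state is allowed
            return {s: (0 if s == leaf_states[node] else math.inf) for s in states}
        child_tables = [solve(child) for child in node]
        table = {}
        for s in states:              # n parent states ...
            total = 0
            for child in child_tables:
                # ... times n child states: the O(n^2) step per branch
                total += min(cost[s][t] + child[t] for t in states)
            table[s] = total
        return table

    return min(solve(tree).values())

# Hypothetical example: four taxa, one nucleotide character, unit costs.
STATES = "ACGT"
UNIT_COST = {a: {b: (0 if a == b else 1) for b in STATES} for a in STATES}
TREE = (("human", "chimp"), ("mouse", "rat"))
LEAVES = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}

min_changes = sankoff(TREE, LEAVES, UNIT_COST, STATES)
print(min_changes)
```

Each internal branch evaluates every (parent state, child state) pair, which is where the n² factor comes from when n is the number of states.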

4.
In this article the question of reconstructing a phylogeny from additive distance data is addressed. Previous algorithms used the complete distance matrix of the n OTUs (Operational Taxonomic Units), which correspond to the tips of the tree. This used O(n²) computing time. It is shown that this is wasteful for biologically reasonable trees. If the tree has internal nodes with bounded degrees, an O(n log n) algorithm is possible. It is also shown that if the nodes can have unbounded degrees, the problem has n² as a lower bound.

5.
The accurate reconstruction of phylogenies from short molecular sequences is an important problem in computational biology. Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In Daskalakis et al. (in Probab. Theory Relat. Fields 2010), building on the work of Mossel (Trans. Am. Math. Soc. 356(6):2379–2404, 2004), a tight sequence-length requirement was obtained for the simple CFN model of substitution, that is, the case of a two-state symmetric rate matrix Q. In particular the required sequence length for high-probability reconstruction was shown to undergo a sharp transition (from O(log n) to poly(n), where n is the number of leaves) at the “critical” branch length g_ML(Q) (if it exists) of the ancestral reconstruction problem defined roughly as follows: below g_ML(Q) the sequence at the root can be accurately estimated from sequences at the leaves on deep trees, whereas above g_ML(Q) information decays exponentially quickly down the tree.

6.

Background  

Distance matrix methods constitute a major family of phylogenetic estimation methods, and the minimum evolution (ME) principle (aiming at recovering the phylogeny with shortest length) is one of the most commonly used optimality criteria for estimating phylogenetic trees. The major difficulty for its application is that the number of possible phylogenies grows exponentially with the number of taxa analyzed and the minimum evolution principle is known to belong to the NP-hard class of problems.

7.

Background  

Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involve some sort of correction for multiple substitutions. The problem is to accurately estimate the number of true substitutions given an observed alignment. So far, the most accurate protein distance estimators have looked for the optimal matrix in a series of transition probability matrices, e.g. the Dayhoff series. The evolutionary distance between two aligned sequences is here estimated as the evolutionary distance of the optimal matrix. The optimal matrix can be found either by an iterative search for the Maximum Likelihood matrix, or by integration to find the Expected Distance. As a consequence, these methods are more complex to implement and computationally heavier than correction-based methods. Another problem is that the result may vary substantially depending on the evolutionary model used for the matrices. An ideal distance estimator should produce consistent and accurate distances independent of the evolutionary model used.

8.

Background  

The North American Agalinis are representatives of a taxonomically difficult group that has been subject to extensive taxonomic revision from species level through higher sub-generic designations (e.g., subsections and sections). Previous presentations of relationships have been ambiguous and have not conformed to modern phylogenetic standards (e.g., were not presented as phylogenetic trees). Agalinis contains a large number of putatively rare taxa that have some degree of taxonomic uncertainty. We used DNA sequence data from three chloroplast genes to examine phylogenetic relationships among sections within the genus Agalinis Raf. (= Gerardia), and between Agalinis and closely related genera within Orobanchaceae.

9.
Phylogeny estimation is crucial in the study of molecular evolution. The increase in the amount of available genomic data facilitates phylogeny estimation from multilocus sequence data. Although maximum likelihood and Bayesian methods are available for phylogeny reconstruction using multilocus sequence data, these methods require heavy computation, and their application is limited to the analysis of a moderate number of genes and taxa. Distance matrix methods present suitable alternatives for analyzing huge amounts of sequence data. However, the manner in which distance methods can be applied to multilocus sequence data remains unknown. Here, we suggest new procedures to estimate molecular phylogeny using multilocus sequence data and evaluate their significance in the framework of the distance method. We found that concatenation of the multilocus sequence data may result in incorrect phylogeny estimation with an extremely high bootstrap probability (BP), which is due to incorrect estimation of the distances and to deliberately ignoring the intergene variations. Therefore, we suggest that the distance matrices for multilocus sequence data be estimated separately and that these matrices be subsequently combined to reconstruct the phylogeny, instead of reconstructing the phylogeny from concatenated sequence data. To calculate the BPs of the reconstructed phylogeny, we suggest that 2-stage bootstrap procedures be adopted; in these, genes are resampled, followed by resampling of the sequence columns within the resampled genes. By resampling the genes during calculation of BPs, intergene variations are properly considered. Via simulation studies and empirical data analysis, we demonstrate that our 2-stage bootstrap procedures are more suitable than the conventional bootstrap procedure that is adopted after sequence concatenation.
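The 2-stage resampling described in this abstract can be sketched roughly as follows. This is a minimal illustration of the resampling step only, assuming each gene is stored as a list of equal-length aligned strings (one per taxon); the function name, data layout, and toy alignments are hypothetical and not taken from the paper, and distance estimation and tree building are left out.

```python
import random

def two_stage_bootstrap(genes, rng):
    """Draw one bootstrap replicate from multilocus data.

    genes: dict mapping a gene name to its alignment, given as a list of
           equal-length strings, one per taxon (hypothetical layout).
    Returns a list of (gene name, resampled alignment) pairs.
    """
    names = list(genes)
    # Stage 1: resample genes with replacement.
    sampled_names = [rng.choice(names) for _ in names]
    replicate = []
    for name in sampled_names:
        alignment = genes[name]
        length = len(alignment[0])
        # Stage 2: resample alignment columns with replacement within the gene.
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = ["".join(seq[c] for c in columns) for seq in alignment]
        replicate.append((name, resampled))
    return replicate

# Hypothetical toy data: two genes, three taxa each.
EXAMPLE_GENES = {
    "geneA": ["ACGT", "ACGA", "ACCA"],
    "geneB": ["GGTTAC", "GGTAAC", "GCTAAC"],
}

replicate = two_stage_bootstrap(EXAMPLE_GENES, random.Random(0))
for gene_name, alignment in replicate:
    print(gene_name, alignment)
```

In a full analysis, a distance matrix would be estimated per resampled gene and the matrices combined before tree reconstruction, with the whole procedure repeated many times to obtain bootstrap probabilities.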

10.
A stress-responsive gene, yggG, was introduced into an L-phenylalanine producer, Escherichia coli AJ12741. In shake-flask culture, the yggG-containing recombinant strain (named AJ12741/pHYGG) produced 6.4 g L-phenylalanine l⁻¹ at the end of culture and its yield on glucose was 0.16 g L-phenylalanine g glucose⁻¹. These values are much higher than those of the original AJ12741 strain (3.7 g L-phenylalanine l⁻¹ and 0.09 g L-phenylalanine g glucose⁻¹, respectively). On the other hand, the AJ12741/pHYGG strain produced only 4.5 g acetic acid l⁻¹ and its yield on glucose was about half of that of the AJ12741 culture. Analysis of gene expression revealed that in late growth phase, the expression levels of genes involved in acetic acid production (pta, ackA, and poxB) were relatively low in AJ12741/pHYGG cells. In particular, the level of poxB expression in the AJ12741/pHYGG strain was one-seventh of that of the original strain. These results suggest that the formation of a bottleneck for acetic acid production brings about a metabolic flow favorable to L-phenylalanine synthesis in the recombinant strain over-expressing the yggG gene.

11.
An experimental phylogeny was constructed using bacteriophage T7 and a propagation protocol, in the presence of the mutagen N-methyl-N′-nitro-N′-nitrosoguanidine, based on Hillis et al. [Hillis, D.M., Bull, J.J., White, M.E., Badgett, M.R., Molineux, I.J., 1992. Experimental phylogenetics, generation of a known phylogeny. Science 255, 589–592]. The topology presented in this study has a considerable variation in branch lengths and is less symmetric than the one presented by Hillis et al. [Hillis, D.M., Bull, J.J., White, M.E., Badgett, M.R., Molineux, I.J., 1992. Experimental phylogenetics, generation of a known phylogeny. Science 255, 589–592]. These features are known to present additional difficulties to phylogenetic inference methods. The performance of several phylogenetic methods (conventional and less conventional) was tested using restriction site and nucleotide data. Only methods that encompassed a molecular clock or those based on sequence signatures recovered the true phylogeny. Nevertheless, a likelihood ratio test rejected the hypothesis of the existence of a molecular clock when the whole sequence data set was considered. This fact or the particular substitution pattern (mainly G → A and C → T) may be related to the unexpected performance of distance methods based on sequence signatures. To test whether the results could have been predicted by simulation studies, we estimated the evolution parameters from the real phylogeny and used them to simulate evolution along the same tree (parametric bootstrap). We found that simulation could predict most but not all of the problems encountered by phylogenetic inference methods in the real phylogeny. Short interior branches may be more prone to error than predicted by theoretical studies.

12.
Purified epithelial brush border membrane vesicles (BBMV) were produced from the hepatopancreas of the Atlantic white shrimp, Litopenaeus setiferus, using standard methods originally developed for mammalian tissues and previously applied to other crustacean and echinoderm epithelia. These vesicles were used to study the cation dependency of sugar and amino acid transport across luminal membranes of hepatopancreatic epithelial cells. ³H-D-glucose uptake by BBMV against transient sugar concentration gradients occurred when either transmembrane sodium or potassium gradients were the only driving forces for sugar accumulation, suggesting the presence of a possible coupled transport system capable of using either cation. ³H-L-histidine transport was only stimulated by a transmembrane potassium gradient, while ³H-L-leucine uptake was enhanced by either a sodium or potassium gradient. These responses suggest the possible presence of a potassium-dependent transporter that accommodates either amino acid and a sodium-dependent system restricted only to L-leucine. Uptake of ³H-L-leucine was significantly stimulated (P < 0.05) by several metallic cations (e.g., Zn²⁺, Cu²⁺, Mn²⁺, Cd²⁺, or Co²⁺) at external pH values of 7.0 or 5.0 (internal pH 7.0), suggesting a potential synergistic role of the cations in the transmembrane transfer of amino acids. ³H-L-histidine influxes (15 s uptakes) were hyperbolic functions of external [zinc] or [manganese], following Michaelis–Menten kinetics. The apparent affinity constant (e.g., K_m) for manganese was an order of magnitude smaller (K_m = 0.22 μM Mn) than that for zinc (K_m = 1.80 μM Zn), while no significant difference (P > 0.05) occurred between their maximal transport velocities (e.g., J_max). These results suggest that a number of cation-dependent nutrient transport systems occur on the shrimp brush border membrane and aid in the absorption of these important dietary elements.
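The hyperbolic dependence of influx on external cation concentration can be illustrated with the standard Michaelis–Menten expression J = J_max·[S] / (K_m + [S]). The sketch below uses the two K_m values quoted in the abstract; the abstract does not give J_max, so J_max is normalized to 1.0 here as an assumption (the same value for both cations, consistent with the reported lack of a significant difference).

```python
def transport_rate(substrate_um, k_m_um, j_max):
    """Michaelis-Menten influx: J = J_max * [S] / (K_m + [S])."""
    return j_max * substrate_um / (k_m_um + substrate_um)

K_M_MN = 0.22  # apparent K_m for manganese, in uM (from the abstract)
K_M_ZN = 1.80  # apparent K_m for zinc, in uM (from the abstract)
J_MAX = 1.0    # assumption: normalized maximal velocity, not given in the abstract

for conc in (0.1, 1.0, 10.0):  # hypothetical external cation concentrations, in uM
    j_mn = transport_rate(conc, K_M_MN, J_MAX)
    j_zn = transport_rate(conc, K_M_ZN, J_MAX)
    print(f"[cation] = {conc:5.1f} uM  J(Mn) = {j_mn:.3f}  J(Zn) = {j_zn:.3f}")
```

With equal J_max, the smaller K_m means the manganese curve reaches half-maximal influx at a roughly eightfold lower concentration than the zinc curve.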

13.

Background  

The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.

14.
Dissimilarity measures for (possibly weighted) phylogenetic trees based on the comparison of their vectors of path lengths between pairs of taxa have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path lengths matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics in spaces $\mathcal{M}_n(\mathbb{R})$ of real-valued n × n matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using $L^p$ metrics on $\mathcal{M}_n(\mathbb{R})$, with $p \in \mathbb{R}_{>0}$.

15.
Integrative and replicative plasmids for the expression driven by the P43 promoter and secretion of recombinant proteins in Bacillus subtilis were constructed. The plasmids named pInt and pRep respectively were tested for the production of recombinant human interferon gamma (rhIFN-γ). A synthetic hIFN-γ gene employing the optimized B. subtilis codon usage was fused with the Bacillus licheniformis α-amylase signal peptide (sp-amyL) encoding sequence. The integrative construct produced 2.5 ± 0.2 mg l−1 and the replicative system produced 20.3 ± 0.8 mg l−1 of total recombinant rhIFN-γ. The results showed that secretion of hIFN-γ was the bottleneck for the overexpression of mature rhIFN-γ by B. subtilis.

16.
A reliable phylogeny relating the major groups of Galliformes was sought in order to shed light on an unusual case of coupled amino acid replacements in the lysozymes c of these birds. The New World quail and the African guinea fowl share a unique trio of amino acids at three internal positions but have been separated phylogenetically by the majority of trees based on morphological characters. Alternative hypotheses based on molecular data have suggested an arrangement that would be more parsimonious with regard to the lysozyme data. The entire mitochondrial cytochrome b gene (1,143 bp) was amplified via the polymerase chain reaction (PCR) and sequenced for nine galliforms and a representative anseriform to provide DNA sequence data for a phylogenetic reconstruction. The mode and tempo of change in these sequences were analyzed to determine the characters most appropriate for phylogenetic reconstruction. Our results place the New World quail outside all other representative game birds except the cracids. Although in conflict with various morphological analyses, this finding is consistent with the results of DNA-DNA hybridization studies. A model to account for the coupled replacements in the lysozymes is presented. Our results also suggest a rapid but ancient radiation among the Galliformes such that the majority of cytochrome b sequence differences among taxa have accumulated on the terminal branches of the reconstructed phylogenetic trees. Deceased July 21, 1991. Correspondence to: J.R. Kornegay

17.
We examined relationships between fragrance and phylogeny using a number of approaches to coding fragrance data and comparing the hierarchical information in fragrance data with the phylogenetic signal in a DNA sequence data set. We first used distance analyses to determine which coding method(s) best distinguishes species while grouping conspecifics. Results suggest that interspecific differences in fragrance composition were maximized by coding as presence/absence of fragrance compounds and biosynthetic pathways rather than when quantitative information was also included. Useful systematic information came from both compounds and pathways and from fragrance emitted by both floral and vegetative tissues. The coding methods that emerged from the distance analyses as best distinguishing species were then adapted for use in phylogenetic analysis. Although hierarchical signal among fragrance data sets was congruent, this signal was highly incongruent with the phylogenetic signal in the DNA sequence data. Notably, topologies inferred from fragrance data sets were congruent with the DNA topology only in the most distal portions (e.g., sister group pairs or closely related species that had similar fragrance profiles were often recovered by analyses of fragrance). Examination of consistency and retention indices for individual fragrance compounds and pathways as optimized onto one of the most-parsimonious trees inferred from DNA data revealed that although most compounds were homoplastic, some compounds were perfectly congruent with the DNA phylogeny. In particular, compounds and pathways found in a few taxa were less homoplastic than those found in many taxa. Pathways that synthesize few volatiles also seem to have lower homoplasy than those that produce many. Although fragrance data as a whole may not be useful in phylogeny reconstruction, these data can provide additional support for clades reconstructed with other types of characters. 
Factors other than phylogeny, including pollinator interactions, also likely influence fragrance composition.

18.
The reconstruction of phylogenetic history is predicated on being able to accurately establish hypotheses of character homology, which involves sequence alignment for studies based on molecular sequence data. In an empirical study investigating nucleotide sequence alignment, we inferred phylogenetic trees for 43 species of the Apicomplexa and 3 of Dinozoa based on complete small-subunit rDNA sequences, using six different multiple-alignment procedures: manual alignment based on the secondary structure of the 18S rRNA molecule, and automated similarity-based alignment algorithms using the PileUp, ClustalW, TreeAlign, MALIGN, and SAM computer programs. Trees were constructed using neighbor-joining, weighted-parsimony, and maximum-likelihood methods. All of the multiple sequence alignment procedures yielded the same basic structure for the estimate of the phylogenetic relationship among the taxa, which presumably represents the underlying phylogenetic signal. However, the placement of many of the taxa was sensitive to the alignment procedure used; and the different alignments produced trees that were on average more dissimilar from each other than did the different tree-building methods used. The multiple alignments from the different procedures varied greatly in length, but aligned sequence length was not a good predictor of the similarity of the resulting phylogenetic trees. We also systematically varied the gap weights (the relative cost of inserting a new gap into a sequence or extending an already-existing gap) for the ClustalW program, and this produced alignments that were at least as different from each other as those produced by the different alignment algorithms. Furthermore, there was no combination of gap weights that produced the same tree as that from the structure alignment, in spite of the fact that many of the alignments were similar in length to the structure alignment.
We also investigated the phylogenetic information content of the helical and nonhelical regions of the rDNA, and conclude that the helical regions are the most informative. We therefore conclude that many of the literature disagreements concerning the phylogeny of the Apicomplexa are probably based on differences in sequence alignment strategies rather than differences in data or tree-building methods.

19.
Phylogeny reconstruction is the process of inferring evolutionary relationships from molecular sequences, and methods that are expected to accurately reconstruct trees from sequences of reasonable length are highly desirable. To formalize this concept, the property of fast-convergence has been introduced to describe phylogeny reconstruction methods that, with high probability, recover the true tree from sequences that grow polynomially in the number of taxa n. While provably fast-converging methods have been developed, the neighbor-joining (NJ) algorithm of Saitou and Nei remains one of the most popular methods used in practice. This algorithm is known to converge for sequences that are exponential in n, but no lower bound for its convergence rate has been established. To address this theoretical question, we analyze the performance of the NJ algorithm on a type of phylogeny known as a 'caterpillar tree'. We find that, for sequences of polynomial length in the number of taxa n, the variability of the NJ criterion is sufficiently high that the algorithm is likely to fail even in the first step of the phylogeny reconstruction process, regardless of the degree of polynomial considered. This result demonstrates that, for general n-taxa trees, the exponential bound cannot be improved.
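The "first step" referred to in this abstract is the selection of the pair of taxa that minimizes the NJ criterion Q(i, j) = (n − 2)·d(i, j) − Σ_k d(i, k) − Σ_k d(j, k). The sketch below computes that criterion for a small, hypothetical additive distance matrix; it illustrates the selection step only and does not reproduce the paper's analysis of its variability.

```python
def nj_q_matrix(dist):
    """NJ criterion Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, with r_i the row sum."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    return [
        [(n - 2) * dist[i][j] - row_sums[i] - row_sums[j] if i != j else 0
         for j in range(n)]
        for i in range(n)
    ]

def first_join(dist):
    """Pair (i, j), i < j, with the smallest Q; ties go to the first pair scanned."""
    q = nj_q_matrix(dist)
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda pair: q[pair[0]][pair[1]])

# Hypothetical additive distances for the tree ((A,B),(C,D)) with leaf
# branch lengths A=1, B=2, C=3, D=4 and an internal branch of length 5.
DIST = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

print(first_join(DIST))  # index pair of the taxa joined first
```

On exact additive distances the minimum of Q always falls on a true cherry; with distances estimated from finite sequences, noise in Q can move the minimum to a wrong pair, which is the failure mode the abstract analyzes.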

20.

Background  

The phylogeny of Eumalacostraca (Crustacea) remains elusive, despite over a century of interest. Recent morphological and molecular phylogenies appear highly incongruent, but this has not been assessed quantitatively. Moreover, 18S rRNA trees show striking branch length differences between species, accompanied by a conspicuous clustering of taxa with similar branch lengths. Surprisingly, previous research found no rate heterogeneity. Hitherto, no phylogenetic analysis of all major eumalacostracan taxa (orders) has either combined evidence from multiple loci, or combined molecular and morphological evidence.
