Similar articles
 19 similar articles found (search time: 203 ms)
1.
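A new strategy for constructing phylogenetic trees from DNA molecular marker data   Total citations: 7, self-citations: 0, citations by others: 7
This strategy combines the strengths of the DPS and MEGA software packages to process DNA molecular marker data and build phylogenetic trees. First, DPS is used to run hierarchical cluster analysis on the 0/1 data and obtain a genetic distance matrix. This matrix is then imported into MEGA3, where the tree is built and refined with NJ or UPGMA. The method is simple to carry out and produces clean-looking trees.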
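
The two-step workflow in this abstract (0/1 marker data to a distance matrix, then a distance-based tree) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what DPS or MEGA actually compute: the Jaccard coefficient is assumed as the distance between binary band profiles, UPGMA is written out by hand, and the names `jaccard_distance`, `upgma`, and the sample profiles are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - (bands shared by both) / (bands present in either). Assumed coefficient."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - both / either


def upgma(profiles):
    """Cluster 0/1 profiles with UPGMA and return the tree as nested tuples."""
    # Each cluster label maps to the number of leaves it contains.
    clusters = {name: 1 for name in sorted(profiles)}
    # Pairwise distances, keyed by an unordered pair of cluster labels.
    dist = {
        frozenset((p, q)): jaccard_distance(profiles[p], profiles[q])
        for p, q in combinations(clusters, 2)
    }
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        na, nb = clusters.pop(a), clusters.pop(b)
        # UPGMA update: size-weighted average of the two old distances.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))


# Hypothetical band profiles for four samples scored at five marker loci.
profiles = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 1],
    "D": [0, 0, 1, 0, 0],
}
tree = upgma(profiles)
print(tree)
```

With these made-up profiles, A groups with B and C groups with D before the two pairs are joined. A real analysis would use whichever distance coefficient the chosen software applies to the 0/1 data.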

2.
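A through-path topological distance (通经拓扑距离) between topological trees   Total citations: 1, self-citations: 1, citations by others: 0
A new topological distance between phylogenetic trees is presented. Phylogenetic trees were built with three methods (NJ, MP, and UPGMA) from the DNA sequences of 14 mitochondrial genes (including combined ones) from 13 animal species. The partition topological distance and the through-path topological distance proposed in the paper were used to compare these 14 trees with one another and with the true tree. The results show that NJ recovers the known tree most effectively, followed by MP, with UPGMA the least effective. The topological distance between the trees built from these 14 DNA sequences and the known tree generally decreases as sequence length increases, but the correlation between the two is not significant. The partition topological distance reflects overall differences in topology between trees, but it measures them less precisely than the through-path topological distance.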
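
The abstract does not define its new distance, so only the partition distance it is compared against is sketched here. The sketch assumes the usual definition: the number of leaf bipartitions (internal edges) found in one unrooted tree but not in the other. Trees are written as nested tuples of leaf names, and the function names and example trees are hypothetical.

```python
def bipartitions(tree):
    """Return the non-trivial leaf bipartitions of a tree given as nested tuples."""

    def leaves_of(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves_of(child) for child in node))

    all_leaves = leaves_of(tree)
    ref = min(all_leaves)  # fixed reference leaf used to normalise each split
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        # Store the side of the split that does not contain the reference leaf.
        side = all_leaves - below if ref in below else below
        # Skip trivial splits (a single leaf against everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    walk(tree)
    return splits


def partition_distance(t1, t2):
    """Count the bipartitions present in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


# Two hypothetical five-taxon trees that disagree on one internal edge.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(partition_distance(t1, t2))
```

Because the measure only counts whole bipartitions, two trees that differ by moving a single leaf a long way can score the same as trees that differ by a small local rearrangement, which is one reason finer-grained distances are proposed.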

3.
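The gp41 gene was cloned from the genome of the Japanese isolate (C3) of Spodoptera litura multicapsid nucleopolyhedrovirus (SpltMNPV). Its coding region contains 993 bp and encodes a polypeptide with a molecular weight of 36.9 kDa. The gene was cloned into the prokaryotic expression vector pET28a and expressed in Escherichia coli BL21(DE3) after IPTG induction. CLUSTAL analysis showed that the nucleotide and amino acid sequences of gp41 from the SpltMNPV Japanese strain (C3) are most similar to those of the SpltMNPV Chinese G2 strain, with 99.9% similarity in both cases. Cluster diagrams and molecular phylogenetic trees based on gp41 and on ph were built separately with MEGA and were found to have similar topologies. A tree built from the two gene sequences combined has a structure similar to that of the gp41-based tree. Mutation rate analysis shows that gp41 mutates faster than ph, which suggests that gp41 and ph faced different selection pressures during baculovirus evolution.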

4.
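Cloning and expression of gp41 from the SpltMNPV Japanese isolate and evolutionary analysis of gp41 and ph   Total citations: 1, self-citations: 0, citations by others: 1
The gp41 gene was cloned from the genome of the Japanese isolate (C3) of Spodoptera litura multicapsid nucleopolyhedrovirus (SpltMNPV). Its coding region contains 993 bp and encodes a polypeptide with a molecular weight of 36.9 kDa. The gene was cloned into the prokaryotic expression vector pET28a and expressed in Escherichia coli BL21(DE3) after IPTG induction. CLUSTAL analysis showed that the nucleotide and amino acid sequences of gp41 from the SpltMNPV Japanese strain (C3) are most similar to those of the SpltMNPV Chinese G2 strain, with 99.9% similarity in both cases. Cluster diagrams and molecular phylogenetic trees based on gp41 and on ph were built separately with MEGA and were found to have similar topologies. A tree built from the two gene sequences combined has a structure similar to that of the gp41-based tree. Mutation rate analysis shows that gp41 mutates faster than ph, which suggests that gp41 and ph faced different selection pressures during baculovirus evolution.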

5.
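Chen Zhaobin. 《生物信息学》, 2013, 11(4): 317-320
The drag-line method (DL) discussed in this article is a greedy algorithm. Like Fitch-Margoliash (FM), DL builds a phylogenetic tree from a distance matrix, but compared with FM it has lower complexity, higher fault tolerance, and higher accuracy. When errors are present, DL only increases the distances between gene sequences that do not share a parent node, so it can still judge the relationships among sequences accurately and obtain a perfect tree topology. FM, by contrast, spreads the error evenly across the distances between all gene sequences, which may place sequences that should share a parent node on different branches.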

6.
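Bioinformatic analysis of goat phospholipid hydroperoxide glutathione peroxidase   Total citations: 1, self-citations: 0, citations by others: 1
The aim was to clone the full-length cDNA of the goat phospholipid hydroperoxide glutathione peroxidase (PHGPx) gene and analyze its sequence. Total RNA was extracted from goat testis, the cDNA sequence of the goat PHGPx gene was amplified by RT-PCR and RACE, and bioinformatic analysis was performed. The results show that the goat PHGPx cDNA is 844 bp long and encodes 199 amino acids. The amino acid sequence homology between goat and each of cattle, pig, human, and mouse exceeds 90%. The secondary-structure functional domain of the goat PHGPx protein belongs to the glutathione peroxidase family, and potential signal peptide sites are predicted at amino acid positions 23-24 and 28-29. In the molecular phylogenetic tree of these species built with the UPGMA algorithm, goat first clusters with cattle, then in turn with pig, mouse, human, and chicken, and finally with honeybee, which largely agrees with zoological classification. This is the first cloning of the goat PHGPx gene, which shows the typical features of the GSH-Px family. The results provide a theoretical basis for studying the molecular regulation of PHGPx gene expression.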

7.
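Molecular cloning of tree shrew neuropeptide Y and homology comparison with its primate analogues   Total citations: 1, self-citations: 0, citations by others: 1
Dong L, Lv LB, Lai R. 《动物学研究》 (Zoological Research), 2012, 33(1): 75-78
Tree shrews have attracted considerable attention because they are closely related to primates, are small, and have a short reproductive cycle. Their use as medical laboratory animals in particular has received growing attention in recent years, but their taxonomic position is still debated. In this study, the sequence encoding the tree shrew neuropeptide Y (NPY) precursor was cloned from a tree shrew brain cDNA library. Sequence alignment showed up to 96.9% homology with primate NPY sequences. A phylogenetic tree built from this sequence and the NPY sequences of other species in GenBank placed the tree shrew on the same branch as primates. These results reveal a close relationship between tree shrews and primates.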

8.
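Cloning and molecular characterization of the full-length coding sequence of tree shrew IL-2   Total citations: 1, self-citations: 0, citations by others: 1
Tree shrews have received wide attention as models for a range of human diseases, and immune factors are essential for evaluating such models, yet interleukin-2 (IL-2) in tree shrews has rarely been studied. In this experiment, total RNA from tree shrew lymphocytes cultured under ConA (concanavalin A) induction was used as the template to clone the 465 bp full-length coding sequence of tree shrew IL-2 by RT-PCR, and ClustalW was used to analyze its sequence and molecular characteristics. The results show that the tree shrew IL-2 cDNA encodes a protein of 154 amino acids. Its cDNA and amino acid sequences share 93% and 80% homology with the human sequences, respectively, and its overall structure resembles that of human IL-2. A phylogenetic tree built with MEGA5.0 shows that tree shrews are closely related to humans and rhesus monkeys. Three-dimensional structure models of the tree shrew and human IL-2 amino acid sequences built with Pymol show that the two molecules have broadly similar spatial structures and carry the same charge over most of their surface, but differ considerably in some regions, and the tree shrew protein has one additional glycosylation site. These differences may affect antibody binding. This study lays a foundation for the future preparation of monoclonal antibodies against tree shrew IL-2 and for functional studies.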

9.
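An alignment-free analysis based on the K-tuple distribution of DNA sequences   Total citations: 1, self-citations: 0, citations by others: 1
Shen Juan, Wu Wenwu, Xie Xiaoli, Guo Mancai, Yuan Zhifa. 《遗传》, 2010, 32(6): 606-612
Based on the K-tuple distribution of genomes, this article presents an alignment-free method for estimating the degree of difference between biological sequences. The method can measure the difference in K-tuple distribution between a real DNA sequence and a randomly shuffled sequence. When the method was used to build a phylogenetic tree from the complete mitochondrial genomes of 26 placental mammals, the classification given by the tree matched the biological consensus more closely as K increased. The results indicate that trees built with this method are more reasonable than those built with other alignment-free methods.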
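
The abstract does not give the paper's actual statistic, so what follows is only a generic sketch of an alignment-free comparison built on K-tuple distributions. It counts overlapping K-tuples, normalises them to frequencies, and compares two sequences by the Euclidean distance between the frequency vectors. The Euclidean metric and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def ktuple_frequencies(seq, k):
    """Frequencies of all 4**k DNA K-tuples, counted over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing characters other than A, C, G, T are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid K-tuples")
    return [counts[w] / total for w in words]


def ktuple_distance(seq1, seq2, k):
    """Euclidean distance between two K-tuple frequency vectors (assumed metric)."""
    f1 = ktuple_frequencies(seq1, k)
    f2 = ktuple_frequencies(seq2, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))


print(ktuple_distance("ACGTACGTAC", "ACGTTTGTAC", 2))
```

A matrix of such pairwise distances could then be passed to any distance-based tree method. Larger K gives a finer-grained distribution but needs longer sequences to estimate it reliably.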

10.
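Complete genome sequences of 92 isolates of Turnip mosaic virus (TuMV) have been reported in GenBank, and published analyses indicate that 58 of them contain no recombinant sequences. Hierarchical clustering was applied to the relative synonymous codon usage (RSCU) values of the 92 TuMV complete genomes and of the 58 non-recombinant genomes. The same 92 and 58 complete genome sequences were also analyzed with phylogenetic methods. The codon-bias clustering tree of the 92 isolates agreed poorly with their phylogenetic tree, whereas the codon-bias clustering tree of the 58 non-recombinant isolates agreed very closely with their phylogenetic tree and largely corresponded to host type. This suggests that, in the absence of recombination, TuMV codon usage bias may be a form of selection pressure within the host that influences the direction of point-mutation evolution in the TuMV genome and drives TuMV to adapt to the host's internal environment.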
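
For readers unfamiliar with RSCU, a minimal sketch follows. It uses the common definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The standard genetic code is assumed, stop codons are skipped, and the function name `rscu` and the toy sequence are hypothetical. The clustering step described in the abstract is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(coding_seq):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs."""
    seq = coding_seq.upper().replace("U", "T")
    # Count codons in reading frame 0, ignoring any trailing partial codon.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU is undefined
        for c in codons:
            # observed / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


usage = rscu("GCTGCTGCCAAA")
print(usage["GCT"], usage["GCC"], usage["AAA"])
```

An RSCU of 1 means a codon is used as often as expected with no bias. Each genome then yields one vector of RSCU values, and those vectors are what a hierarchical clustering would compare.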

11.
12.
Tie trees generated by distance methods of phylogenetic reconstruction   Total citations: 2, self-citations: 0, citations by others: 2
In examining genetic data in recent publications, Backeljau et al. showed cases in which two or more different trees (tie trees) were constructed from a single data set for the neighbor-joining (NJ) method and the unweighted pair group method with arithmetic mean (UPGMA). However, it is still unclear how often and under what conditions tie trees are generated. Therefore, I examined these problems by computer simulation. Examination of cases in which tie trees occur shows that tie trees can appear when no substitutions occur along some interior branch(es) on a tree. However, even when some substitutions occur along interior branches, tie trees can appear by chance if parallel or backward substitutions occur at some sites. The simulation results showed that tie trees occur relatively frequently for sequences with low divergence levels or with small numbers of sites. For such data, UPGMA sometimes produced tie trees quite frequently, whereas tie trees for the NJ method were generally rare. In the simulation, bootstrap values for clusters (tie clusters) that differed among tie trees were mostly low (< 60%). With a small probability, relatively high bootstrap values (at most 70%-80%) appeared for tie clusters. The bias of the bootstrap values caused by an input order of sequence can be avoided if one of the different paths in the cycles of making an NJ or UPGMA tree is chosen at random in each bootstrap replication.

13.
Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements. J. Exp. Zool. (Mol. Dev. Evol.) 285:128-139, 1999.

14.
The general problem of representing collections of trees as a single graph has led to many tree summary techniques. Many consensus approaches take sets of trees (either inferred as separate gene trees or gleaned from the posterior of a Bayesian analysis) and produce a single “best” tree. In scenarios where horizontal gene transfer or hybridization are suspected, networks may be preferred, which allow for nodes to have two parents, representing the fusion of lineages. One such construct is the cluster union network (CUN), which is constructed using the union of all clusters in the input trees. The CUN has a number of mathematically desirable properties, but can also present edges not observed in the input trees. In this paper we define a new network construction, the edge union network (EUN), which displays edges if and only if they are contained in the input trees. We also demonstrate that this object can be constructed with polynomial time complexity given arbitrary phylogenetic input trees, and so can be used in conjunction with network analysis techniques for further phylogenetic hypothesis testing.

15.
Accuracy of phylogenetic trees estimated from DNA sequence data   Total citations: 4, self-citations: 1, citations by others: 3
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence.(ABSTRACT TRUNCATED AT 250 WORDS)
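
The distinction between uncorrected (p) and corrected (d) substitutions per site can be made concrete with a short sketch. The abstract does not say which correction was used, so the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), is assumed here purely for illustration. The function names and the two toy sequences are hypothetical.

```python
from math import log


def p_distance(seq1, seq2):
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jukes_cantor(p):
    """Corrected distance under the Jukes-Cantor model (assumed correction)."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")  # one difference in ten sites
d = jukes_cantor(p)
print(p, d)
```

The corrected value is always at least as large as p because it allows for multiple substitutions at the same site, and the gap widens as sequences diverge.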

16.
Comparisons are made of the accuracy of the restricted maximum-likelihood, Wagner parsimony, and UPGMA (unweighted pair-group method using arithmetic averages) clustering methods to estimate phylogenetic trees. Data matrices were generated by constructing simulated stochastic evolution in a multidimensional gene-frequency space using a simple genetic-drift model (Brownian-motion, random-walk) with constant rates of divergence in all lineages. Ten different phylogenetic tree topologies of 20 operational taxonomic units (OTU's), representing a range of tree shapes, were used. Felsenstein's restricted maximum-likelihood method, Wagner parsimony, and UPGMA clustering were used to construct trees from the resulting data matrices. The computations for the restricted maximum-likelihood method were performed on a Cray-1 supercomputer since the required calculations (especially when optimized for the vector hardware) are performed substantially faster than on more conventional computing systems. The overall level of accuracy of tree reconstruction depends on the topology of the true phylogenetic tree. The UPGMA clustering method, especially when genetic-distance coefficients are used, gives the most accurate estimates of the true phylogeny (for our model with constant evolutionary rates). For large numbers of loci, all methods give similar results, but trends in the results imply that the restricted maximum-likelihood method would produce the most accurate trees if sample sizes were large enough.

17.
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may greatly vary. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance of two l long sequences can be calculated in O(l) time. We implemented the algorithm using suffix arrays. Our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allows for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
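
The quantity at the core of this measure, the average length of the longest substring starting at each position of one sequence that also occurs somewhere in the other, can be illustrated with a deliberately naive sketch. This is not the authors' suffix-array algorithm and does not run in O(l) time. It is a brute-force version written only to show what is being averaged, and the function name is hypothetical. The abstract does not give the formula that turns this average into a distance, so that step is left out.

```python
def average_common_substring(a, b):
    """Mean, over start positions i in a, of the longest a[i:i+m] found in b."""
    if not a:
        raise ValueError("first sequence must be non-empty")
    total = 0
    for i in range(len(a)):
        length = 0
        # Extend the match one character at a time while it still occurs in b.
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        total += length
    return total / len(a)


print(average_common_substring("ACGA", "CGAT"))
```

Similar sequences share long substrings and give a large average, while unrelated sequences give a small one. The measure is not symmetric in its two arguments, so a practical distance would need to combine both directions.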

18.
In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree (a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes) called the “species tree.” One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the “concordance” with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping (but not identical) sets of labels, is called “supertree.” In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of “containing as a minor” and “containing as a topological minor” in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time \(2^{O(k^2)} \cdot n\), where n is the total size of the input.

19.
From the DNA sequences for N taxa, the (generally unknown) phylogenetic tree T that gave rise to them is to be reconstructed. Various methods give rise, for each quartet J consisting of exactly four taxa, to a predicted tree L(J) based only on the sequences in J, and these are then used to reconstruct T. The author defines an "error-correcting map" (Ec), which replaces each L(J) with a new tree, Ec(L)(J), which has been corrected using other trees, L(K), in the list L. The "quartet distance" between two trees is defined as the number of quartets J on which the two trees differ, and two distinct trees are shown to always have quartet distance of at least N - 3. If L has quartet distance at most (N - 4)/2 from T, then Ec(L) will coincide with the correct list for T; and this result cannot be improved. In general, Ec can correct many more errors in L. Iteration of the map Ec may produce still more accurate lists. Simulations are reported which often show improvement even when the quartet distance considerably exceeds (N - 4)/2. Moreover, the Buneman tree for Ec(L) is shown to refine the Buneman tree for L, so that strongly supported edges for L remain strongly supported for Ec(L). Simulations show that if methods such as the C-tree or hypercleaning are applied to Ec(L), the resulting trees often have more resolution than when the methods are applied only to L.
