Similar documents
1.
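With more and more genomes fully sequenced, alignment-free phylogenetic analysis based on whole genomes has become an active research topic. The nucleotide composition of genomes differs between species and between individuals, and much of the information carried by a DNA sequence, the language of heredity, is reflected in its k-mer frequencies. Phylogenetic trees built from the k-mer frequencies of genome sequences therefore offer a new perspective on the relationships among species. In this paper we define an information parameter based on k-mer frequencies, use it to characterize genome sequences, compute the distances between the information parameters of different genomes, and construct a phylogenetic tree of 84 viruses with the neighbor-joining method. The resulting tree agrees largely with previously published phylogenetic trees.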
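The abstract does not define its information parameter, so the sketch below substitutes a plain stand-in: each genome is represented by its vector of relative k-mer frequencies over the ACGT alphabet, and genomes are compared by Euclidean distance between those vectors. The function names and the choice of distance are assumptions; the resulting pairwise matrix is the kind of input a neighbor-joining routine would take.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every ACGT k-mer in seq (k-mers containing other symbols are ignored)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers) or 1  # avoid division by zero on empty input
    return {km: counts[km] / total for km in kmers}


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between k-mer frequency vectors (a stand-in, not the paper's parameter)."""
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    return math.sqrt(sum((fa[km] - fb[km]) ** 2 for km in fa))


def distance_matrix(genomes: dict[str, str], k: int = 3) -> dict[tuple[str, str], float]:
    """All pairwise distances, e.g. as input to a neighbor-joining routine (not implemented here)."""
    names = sorted(genomes)
    return {(x, y): kmer_distance(genomes[x], genomes[y], k) for x in names for y in names}
```

Each frequency vector has 4^k entries, so larger k gives sparser, more specific profiles; k is normally chosen with genome length in mind.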

2.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
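The core COG criterion, consistency of genome-specific best hits, can be illustrated with a simplified sketch. It assumes a precomputed best-hit table keyed by (protein, target genome) and keeps only triangles of proteins from three different genomes whose members are all reciprocal best hits. The published procedure is more involved (for example, triangles that share a side are merged into larger groups), and the data layout and function names here are hypothetical.

```python
from itertools import combinations


def reciprocal_pairs(best_hit: dict, genome_of: dict) -> set:
    """Pairs of proteins that are each other's best hit in the other's genome.

    best_hit maps (protein, target_genome) -> best-scoring protein in target_genome.
    genome_of maps protein -> the genome it belongs to.
    """
    pairs = set()
    for (p, _target), q in best_hit.items():
        if best_hit.get((q, genome_of[p])) == p:
            pairs.add(frozenset((p, q)))
    return pairs


def consistent_triangles(best_hit: dict, genome_of: dict) -> set:
    """Triangles of proteins from three different genomes linked by reciprocal best hits."""
    neighbors: dict = {}
    for pair in reciprocal_pairs(best_hit, genome_of):
        a, b = tuple(pair)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    triangles = set()
    for a, linked in neighbors.items():
        for b, c in combinations(sorted(linked), 2):
            in_three_genomes = len({genome_of[a], genome_of[b], genome_of[c]}) == 3
            if c in neighbors.get(b, set()) and in_three_genomes:
                triangles.add(frozenset((a, b, c)))
    return triangles
```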

3.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole genome sequence that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
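The angle-based distance can be sketched in simplified form. Unlike the method described, this sketch skips the singular value decomposition and compares raw tetrapeptide count vectors directly, and it builds an organism vector by summing protein count vectors rather than SVD-derived vectors. Both simplifications, and all names, are assumptions made for illustration.

```python
import math
from collections import Counter


def tetrapeptide_counts(protein: str) -> Counter:
    """Counts of every overlapping 4-residue word in a protein sequence."""
    return Counter(protein[i:i + 4] for i in range(len(protein) - 3))


def organism_vector(proteins: list[str]) -> Counter:
    """Sum of the tetrapeptide count vectors of all proteins of an organism."""
    total: Counter = Counter()
    for p in proteins:
        total.update(tetrapeptide_counts(p))
    return total


def angle_distance(u: Counter, v: Counter) -> float:
    """Angle in radians between two sparse count vectors (0 = same direction)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return math.pi / 2  # treat an empty vector as orthogonal to everything
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp rounding error before acos
    return math.acos(cos)
```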

4.
Through routine and nested PCR amplification, four complete genome sequences of porcine Torque teno virus (TTV) type II were obtained from swine herds. Comparison with the TTV genome sequences deposited in GenBank showed that these are the most divergent types described so far. The level of genetic diversity among these genomes is higher than would be expected within a single virus species. Phylogenetic trees were constructed from the nucleotide and amino acid sequences.

5.
Molecular phylogenetic trees are constructed in three dimensions from the distribution of MW and pI classes and immunocrossreactivity against polyclonal antibodies to lens crystallins, as well as from multiple sequence alignments of the amino acid sequences, coding nucleotide sequences and gene nucleotide sequences of beta-globin. Euclidean distances are estimated to position species in x, y, z space by multidimensional scaling and merged with the bootstrap-tested branching pattern of Fitch & Margoliash plots to obtain a 3-D phylogenetic tree. Compared with trees based on single attributes, phylogenetic trees based on multiple parameters allow significant repositioning of rodents, chiroptera and primates.

6.
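A preliminary study of methods for using rRNA secondary-structure sequences in fungal systematics   Times cited: 1 (self-citations: 0; by others: 1)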
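This paper is the first to use features of nucleic acid secondary structure, instead of nucleotide bases, as the signal for inferring relationships between taxa, and constructs a structure-based phylogenetic tree for selected ascomycete taxa. The method classifies secondary-structure features into six substructure types, coded S (canonical base pair), Q (non-canonical base pair), I (single strand), B (bulge loop), M (multibranch loop) and H (hairpin loop); the secondary structure is then converted into a structure sequence, which is analyzed. The method extends the use of rRNA beyond base comparison and offers clues to the relationship between molecular function and evolution. The results show that structure-sequence analysis can be used in ascomycete systematics; compared with nucleotide sequence analysis, the structural analysis appears to reflect the evolution of the ascocarp more clearly.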
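Once rRNA secondary structures are recoded as strings over the six codes S, Q, I, B, M and H and aligned, they can be compared much like nucleotide sequences. The abstract does not say which distance the authors used, so the sketch below assumes a simple p-distance (proportion of differing positions) and skips alignment gaps written as '-'. The function name is hypothetical.

```python
STRUCTURE_CODES = set("SQIBMH")


def structure_p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions whose structure codes differ (gaps are skipped)."""
    if len(a) != len(b):
        raise ValueError("structure sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in STRUCTURE_CODES and y in STRUCTURE_CODES:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return diffs / compared
```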

7.
The carbohydrate-binding sequences (CBS) in the lectin genes of Trifolium repens, T. pratense, and T. trichocephalum were sequenced. The gene regions encoding the lectin CBS of T. pratense and T. repens displayed considerable similarity; however, the CBS of these species differed substantially. Moreover, T. repens formed a compact cluster with Melilotus albus and M. officinalis in the phylogenetic trees constructed from the nucleotide sequences and the corresponding CBS of legume lectins. T. trichocephalum does not fall into the group of tribe Trifolieae members according to either the amino acid sequence of the lectin carbohydrate-binding region or the nucleotide sequence of the lectin gene.

8.
One of the causes of genome size expansion is considered to be amplification of retrotransposons. We determined nucleotide sequences of 24 PCR products for each of six retrotransposons in Brassica rapa and Brassica oleracea. Phylogenetic trees of these sequences showed species-specific clades. We also sequenced STF7a homologs and Tto1 homologs, 24 PCR products each, in nine diploids and three allopolyploids, and constructed phylogenetic trees. In these phylogenetic trees, species-specific clades of diploid species were also formed, but retrotransposons of allopolyploids were clustered into the clades of their original genomes, indicating that these two retrotransposons amplified after speciation of the nine diploids. Genetic variation in these retrotransposons may have arisen before emergence of allopolyploid species. There was a positive correlation between the genome size and the average number of substitutions of STF7a and Tto1 homologs in at least seven diploids. The implications of these results in the genome evolution of Brassicaceae are herein discussed.

9.
We investigated whether relative rates of divergence were correlated between the mitochondrial and chloroplast genomes as expected under lineage effects or were genome specific as expected with locus-specific effects. Five mitochondrial noncoding regions (nad1B_C, nad4exon1_2, nad7exon2_3, nad7exon3_4, and rps14-cob) for 21 samples from Lecythidaceae were sequenced. Three chloroplast regions (rpl20-5'rps12, trnS-trnG, and psbA-trnH) were sequenced to expand the taxa in an existing data set. Absolute rates of nucleotide and insertion and deletion (indel) changes were 13 times faster in the chloroplast genome than in the mitochondrial genome. Similar indel length frequency distributions for both organelles suggested that common mechanisms were responsible for generating indels. Molecular clock tests applied to phylogenetic trees estimated from mitochondrial and chloroplast sequences revealed global rate heterogeneity of nucleotide substitution. Maximum likelihood and Tajima's 1D relative rate tests show that Lecythis zabucajo exhibited a rate acceleration for both the mitochondrial and chloroplast sequences. Whereas Eschweilera romeu-cardosoi showed a significant rate slowdown for chloroplast sequences, the mitochondrial sequences for 3 Eschweilera taxa showed evidence for a rate slowdown only when compared with L. zabucajo. Significant rate heterogeneity was also observed for indel changes in the mitochondrial genome but not for the chloroplast. The lack of mitochondrial nucleotide changes for some taxa as well as chloroplast indel homoplasy may have limited the power of relative rate tests to detect rate variation. Relative ratio tests consistently indicated rate proportionality among branch lengths between the mitochondrial and chloroplast phylogenetic trees. The relative ratio tests showed that taxa possessing rate heterogeneity had parallel relative divergence rates in both mitochondrial and chloroplast sequences as expected under lineage effects. 
A neutral replication-dependent model of rate heterogeneity for both nucleotide and indel changes provides a simple explanation for common patterns of rate heterogeneity across the 2 organelle genomes in Lecythidaceae. The lineage effects observed here were uncoupled from annual/perennial habit because all the species from this study are perennial.
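Tajima's 1D relative rate test, mentioned above, compares two ingroup sequences against an outgroup by counting sites where only one of the ingroup sequences has changed. A minimal sketch for aligned nucleotide sequences, assuming gaps and ambiguity codes are simply skipped (the function name is hypothetical):

```python
import math

NUCLEOTIDES = set("ACGT")


def tajima_1d(seq1: str, seq2: str, outgroup: str) -> tuple[int, int, float, float]:
    """Tajima's 1D relative rate test for aligned seq1 and seq2 against an outgroup.

    m1: sites where only seq1 differs (seq2 == outgroup != seq1)
    m2: sites where only seq2 differs (seq1 == outgroup != seq2)
    Returns (m1, m2, chi-square with 1 df, p-value).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if not {a, b, c} <= NUCLEOTIDES:
            continue  # skip gaps and ambiguity codes
        if b == c and a != c:
            m1 += 1
        elif a == c and b != c:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    return m1, m2, chi2, p
```

Under rate constancy m1 and m2 have the same expectation, so a large chi-square suggests that one lineage has changed faster than the other since they split.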

10.
Wang Hua, Zhang Zhengxian. Acta Genetica Sinica, 1995, 22(6): 413-423
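The glucose transporters form a multigene family (GLUT1-GLUT5) whose members are structurally similar but functionally distinct. Because these proteins are involved in glucose utilization in the body, the family is considered a candidate gene for insulin resistance in diabetes. This paper compares the amino acid and nucleotide sequences of this gene family across species, predicts the distribution of hydrophilicity and hydrophobicity, calculates protein and nucleotide evolutionary distances, and on that basis constructs molecular phylogenetic trees. The results show that the gene family has high homology, very similar hydrophilicity and hydrophobicity profiles, and structural symmetry, suggesting that it originated from a common ancestor and probably arose through gene duplication. This evolutionary mechanism may favor the stability of amino acid structure and resistance to mutation. Because branch lengths differ in the tree built by the neighbor-joining method, evolutionary rates appear to have differed among branches during the evolution of this family. Differences between the trees built from protein distances and from nucleotide distances suggest that hidden substitutions may exist in the genome. Trees from both methods indicate that GLUT1, 3 and 4 are more conserved in structure and function.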
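The abstract does not state which distance correction was used. As an illustration, the sketch below computes the uncorrected p-distance between two aligned sequences and the Poisson-corrected distance d = -ln(1 - p), a common choice for amino acid data; treat the correction, and the function names, as assumptions. Matrices of such pairwise distances are the usual input to neighbor-joining.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected amino acid distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)
```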

11.
Whole-genome or multiple gene phylogenetic analysis is of interest since single gene analysis often results in poorly resolved trees. Here, the use of spectral techniques for analyzing multigene data sets is explored. The protein sequences are treated as categorical time series, and a measure of similarity between a pair of sequences, the spectral covariance, is based on the common periodicity between these two sequences. Unlike other methods, the spectral covariance method focuses on the relationship between the sites of genetic sequences. By properly scaling the dissimilarity measures derived from different genes between a pair of species, we can use the mean of these scaled dissimilarity measures as a summary statistic to measure the taxonomic distances across multiple genes. The methods are applied to three different data sets, one noncontroversial and two with some dispute over the correct placement of the taxa in the tree. Trees are constructed using two distance-based methods, BIONJ and FITCH. A variation of the block bootstrap sampling method is used for inference. The methods are able to recover all major clades in the corresponding reference trees with moderate to high bootstrap support. Through simulations, we show that the covariance-based methods effectively capture phylogenetic signal even when structural information is not fully retained. Comparisons of simulation results with the bootstrap permutation results indicate that the covariance-based methods are fairly robust under perturbations in sequence similarity but more sensitive to perturbations in structural similarity.
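The step of combining several genes into one taxonomic distance can be sketched as follows. The abstract says only that per-gene dissimilarities are "properly scaled" before averaging; dividing each gene's values by their mean, as done here, is an assumption, and the spectral covariance itself is not computed. The data layout and function name are hypothetical.

```python
from statistics import mean


def combine_gene_distances(per_gene: list[dict[frozenset, float]]) -> dict[frozenset, float]:
    """Scale each gene's dissimilarities by their mean, then average over genes.

    Each dict maps frozenset({taxon_a, taxon_b}) to that gene's dissimilarity.
    Only taxon pairs present for every gene are kept.
    """
    scaled = []
    for d in per_gene:
        m = mean(d.values())
        if m == 0:
            raise ValueError("a gene has all-zero dissimilarities")
        scaled.append({pair: v / m for pair, v in d.items()})
    common = set.intersection(*(set(d) for d in scaled))
    return {pair: mean(d[pair] for d in scaled) for pair in common}
```

Scaling first keeps fast-evolving genes, whose raw dissimilarities are larger, from dominating the average.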

12.
Sampling properties of DNA sequence data in phylogenetic analysis   Times cited: 20 (self-citations: 6; by others: 20)
We inferred phylogenetic trees from individual genes and random samples of nucleotides from the mitochondrial genomes of 10 vertebrates and compared the results to those obtained by analyzing the whole genomes. Individual genes are poor samples in that they infrequently lead to the whole-genome tree. A large number of nucleotide sites is needed to exactly determine the whole-genome tree. A relatively small number of sites, however, often results in a tree close to the whole-genome tree. We found that blocks of contiguous sites were less likely to lead to the whole-genome tree than samples composed of sites drawn individually from throughout the genome. Samples of contiguous sites are not representative of the entire genome, a condition that violates a basic assumption of the bootstrap method as it is applied in phylogenetic studies.
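The contrast between sites drawn individually and blocks of contiguous sites can be made concrete with two sampling functions over an alignment stored as a taxon-to-sequence dict. The names are hypothetical. With replace=True the site sampler makes the with-replacement draw used by the standard phylogenetic bootstrap; the block sampler shows why contiguous samples can be unrepresentative, since all of their columns come from one region of the genome.

```python
import random


def alignment_length(alignment: dict[str, str]) -> int:
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return lengths.pop()


def sample_sites(alignment: dict[str, str], n_sites: int, rng: random.Random,
                 replace: bool = False) -> dict[str, str]:
    """Columns drawn individually from the whole alignment.

    replace=False: a random subsample of distinct sites (kept in genome order);
    replace=True: a bootstrap-style draw with replacement.
    """
    length = alignment_length(alignment)
    if replace:
        cols = rng.choices(range(length), k=n_sites)
    else:
        cols = sorted(rng.sample(range(length), n_sites))
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def sample_block(alignment: dict[str, str], n_sites: int, rng: random.Random) -> dict[str, str]:
    """One block of n_sites contiguous columns starting at a random position."""
    length = alignment_length(alignment)
    start = rng.randrange(length - n_sites + 1)
    return {taxon: seq[start:start + n_sites] for taxon, seq in alignment.items()}
```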

13.
TOPD/FMTS: a new software to compare phylogenetic trees   Times cited: 1 (self-citations: 0; by others: 1)
SUMMARY: TOPD/FMTS has been developed to evaluate similarities and differences between phylogenetic trees. The software implements several new algorithms (including the Disagree method, which returns the taxa that disagree between two trees, and the Nodal method, which compares two trees using nodal information) and several previously described methods (such as the Partition method, Triplets or Quartets) to compare phylogenetic trees. One of the novelties of this software is that the FMTS (From Multiple to Single) program allows the comparison of trees that contain both orthologs and paralogs. Each option is also complemented with a randomization analysis to test the null hypothesis that the similarity between two trees is not better than chance expectation. AVAILABILITY: The Perl source code of TOPD/FMTS is available at http://genomes.urv.es/topd.
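A split-based comparison in the spirit of the Partition method, with a label-shuffling randomization test, can be sketched in Python. This is not TOPD/FMTS code (which is written in Perl); the nested-tuple tree format, the normalized split difference, and the choice to randomize by shuffling leaf labels are all assumptions made for illustration.

```python
import random


def leaves(tree) -> frozenset:
    """Leaf labels of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree) -> set:
    """Non-trivial bipartitions, each stored as the side that excludes a reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(below) <= len(taxa) - 2:
            found.add(taxa - below if ref in below else below)
        return below

    walk(tree)
    return found


def split_difference(t1, t2) -> float:
    """Fraction of splits not shared by both trees (0 = identical unrooted topology)."""
    s1, s2 = splits(t1), splits(t2)
    if not s1 and not s2:
        return 0.0
    return len(s1 ^ s2) / (len(s1) + len(s2))


def relabel(tree, mapping: dict):
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def randomization_p_value(t1, t2, n: int = 1000, seed: int = 0) -> float:
    """Share of leaf-label shuffles of t2 that match t1 at least as well as t2 does."""
    rng = random.Random(seed)
    observed = split_difference(t1, t2)
    taxa = sorted(leaves(t2))
    hits = 0
    for _ in range(n):
        shuffled = taxa[:]
        rng.shuffle(shuffled)
        if split_difference(t1, relabel(t2, dict(zip(taxa, shuffled)))) <= observed:
            hits += 1
    return (hits + 1) / (n + 1)
```

A small p-value means random relabelings rarely come as close to t1 as t2 does, i.e. the similarity is better than chance.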

14.
The genes of ribosomal RNA are the most popular and frequently used markers for bacterial phylogeny and reconstruction of insect-symbiont coevolution. In primary symbionts, such as Buchnera and Wigglesworthia, genome economization leads to the establishment of a single copy of these sequences. In phylogenetic studies, they provide sufficient information and yield phylogenetic trees congruent with host evolution. In contrast, other symbiotic lineages (e.g., the genus Arsenophonus) carry a higher number of rRNA copies in their genomes, which may have serious consequences for phylogenetic inference. In this study, we show that in Arsenophonus triatominarum the degree of heterogeneity can affect reconstruction of phylogenetic relationships and mask possible coevolution between the symbiont and its host. Phylogenetic arrangement of individual rRNA copies was used, together with a calculation of their divergence time, to demonstrate that the incongruent 16S rDNA trees and low nucleotide diversity in the secondary symbiont could be reconciled with the coevolutionary scenario.
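Heterogeneity among the rRNA copies of one genome can be summarized by nucleotide diversity, the mean proportion of differing sites over all pairs of copies. A minimal sketch, assuming the copies are already aligned to equal length with no gaps; the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(copies: list[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) among aligned, gap-free sequences."""
    if len(copies) < 2:
        raise ValueError("need at least two sequences")
    length = len(copies[0])
    if length == 0 or any(len(s) != length for s in copies):
        raise ValueError("sequences must be non-empty and of equal length")
    per_pair = [
        sum(x != y for x, y in zip(s, t)) / length
        for s, t in combinations(copies, 2)
    ]
    return sum(per_pair) / len(per_pair)
```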

15.
Here, we present the results of a computational analysis of a group of hypothetical GH10 endo-β-xylanases from the Planctomycetes, a bacterial phylum with poorly characterized functional capabilities. These proteins are encoded in all analyzed genomes of heterotrophic Planctomycetes and form a phylogenetically distinct and tight cluster. In addition, we determined nucleotide sequences for endo-β-xylanase genes from five strains of Isosphaera-Singulisphaera group of the Planctomycetes. The trees constructed for the 16S rRNA genes and the inferred amino acid sequences of endo-β-xylanases were highly congruent, thus suggesting the vertical transfer of endo-β-xylanase genes and their functional importance in Planctomycetes.

16.
In silico genomic fingerprints were produced by virtual hybridization of 191 fully sequenced bacterial genomes using a set of 15,264 13-mer probes specially designed to produce universal whole genome fingerprints. A novel approach for constructing phylogenetic trees, based on comparative analysis of genomic fingerprints, was developed. The resultant bacterial phylogenetic tree had strong similarities to those produced from the alignment of conserved sequences. Notably, the trees derived from the alignment of other conserved COG genes divided the Bacillus and Corynebacterium genera into the same subgroups produced by the novel bacterial tree. A number of discrepancies between the two techniques were observed for the grouping of some Lactobacillus species. However, a detailed analysis of the alignment of these genomes using other bioinformatics tools revealed that the grouping of these organisms in the novel tree was more satisfactory than the groupings from previous classifications, which used only a few conserved genes. All these data suggest that the bacterial taxonomy produced by genomic fingerprints is satisfactory, but sometimes different from classical taxonomies. Discrepancies probably arise because the fingerprinting technique analyzes genomic sequences and reveals more information than previously used approaches.
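Virtual hybridization can be approximated as exact string matching: a genome's fingerprint is the set of probes that occur in it on either strand. The sketch below assumes exact matches and uses Jaccard distance between fingerprints; the study's actual matching criteria and distance are not given in the abstract, the function names are hypothetical, and the short probes in the test stand in for the 13-mer probe set.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def fingerprint(genome: str, probes: list[str]) -> frozenset:
    """Probes found as exact matches on either strand of the genome."""
    genome = genome.upper()
    return frozenset(p for p in probes if p in genome or reverse_complement(p) in genome)


def jaccard_distance(fa: frozenset, fb: frozenset) -> float:
    """1 - |shared probes| / |probes hit in either genome|."""
    union = fa | fb
    if not union:
        return 0.0
    return 1.0 - len(fa & fb) / len(union)
```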

17.
Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. 
Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.
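Building a reference tree from concatenated protein alignments first requires joining the per-gene alignments into one supermatrix. A minimal sketch, assuming each alignment is a taxon-to-sequence dict; taxa missing from a gene are padded with gaps, which matters when single-copy orthologs are present in most but not all genomes. The function name is hypothetical.

```python
def concatenate(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments taxon by taxon, padding absent taxa with '-'."""
    taxa = sorted(set().union(*alignments))
    parts: dict[str, list[str]] = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(s) for s in aln.values()}
        if len(widths) != 1:
            raise ValueError("each gene alignment must have a single, consistent length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}
```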

18.
We describe a novel method for efficient reconstruction of phylogenetic trees based on sequences of whole genomes or proteomes, whose lengths may vary greatly. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information-theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in O(l) time. We implemented the algorithm using suffix arrays; our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were also compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using the "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allow for a whole-genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method but also suggest a number of novel phylogenetic insights.
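The average common substring idea can be sketched with a naive matcher in place of the suffix arrays the authors used. For each position i of A, l_i is the length of the longest substring starting at i that also occurs in B, and L(A,B) is the mean of the l_i. The normalization d(A,B) = log|B| / L(A,B) - 2 log|A| / |A|, symmetrized by averaging both directions, is our reading of the published measure and should be treated as an assumption, as should the function names.

```python
import math


def match_lengths(a: str, b: str) -> list[int]:
    """For each position i of a, the length of the longest prefix of a[i:] that occurs in b.

    Naive version that rescans b for every prefix; the authors use suffix arrays instead.
    """
    out = []
    for i in range(len(a)):
        length = 0
        # if a prefix occurs in b, every shorter prefix does too, so extend until one fails
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        out.append(length)
    return out


def acs_distance(a: str, b: str) -> float:
    """Symmetrized average-common-substring distance (normalization assumed, see above)."""

    def one_way(x: str, y: str) -> float:
        avg = sum(match_lengths(x, y)) / len(x)
        if avg == 0:
            raise ValueError("sequences share no symbols")
        return math.log(len(y)) / avg - 2 * math.log(len(x)) / len(x)

    return (one_way(a, b) + one_way(b, a)) / 2
```

Long shared substrings make L(A,B) large and the distance small; because this matcher is far slower than a suffix-array implementation, it is only suitable for short examples.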

19.
Genomes of 23 cyanobacterial strains were compared using quantitative measures of gene order similarity. Reconstructions of cyanobacterial phylogeny based on the order of genes in chromosomes and on nucleotide sequences turned out to be similar, which confirms that quantitative measures of gene order similarity are applicable to phylogenetic reconstruction. In the evolution of marine unicellular planktonic cyanobacteria, genome rearrangements are fixed at a low rate (about 3% of gene order changes per 1% of 16S rRNA changes), whereas in other groups of cyanobacteria the gene order can change several times faster. The gene orders in genomes of cyanobacteria and chloroplasts retain a considerable degree of similarity. Among the analyzed cyanobacteria, the closest relatives of chloroplasts are likely to be hot-spring strains of the genus Synechococcus. Comparative analysis of gene orders and nucleotide sequences strongly suggests that Synechococcus strains from different environments (sea, fresh water, hot springs) are not related and belong to evolutionarily distant lineages.
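One simple quantitative measure of gene order similarity is the fraction of gene adjacencies conserved between two genomes. The abstract does not specify which measures were used, so the sketch below is an illustrative assumption: unsigned adjacencies, circular chromosomes, unique gene IDs, and comparison restricted to genes shared by both genomes. The function names are hypothetical.

```python
def adjacencies(order: list[str], circular: bool = True) -> set[frozenset]:
    """Unordered pairs of neighboring genes (orientation ignored)."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 2:
        pairs.append((order[-1], order[0]))  # close the circular chromosome
    return {frozenset(p) for p in pairs}


def adjacency_similarity(order_a: list[str], order_b: list[str], circular: bool = True) -> float:
    """Share of gene adjacencies conserved between two genomes, over their shared genes."""
    shared = set(order_a) & set(order_b)
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    adj_a, adj_b = adjacencies(a, circular), adjacencies(b, circular)
    return len(adj_a & adj_b) / len(adj_a)
```

Because both genomes are restricted to the same gene set, they have the same number of adjacencies and the measure is symmetric; a reversed or rotated circular order scores 1.0.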
