Similar articles
20 similar articles found
1.
A new sequence distance measure for phylogenetic tree construction   Cited: 5 (self-citations: 0; by others: 5)
MOTIVATION: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole-genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. RESULTS: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets. AVAILABILITY: Available on request from the authors.
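The abstract does not spell out the distance formula. As a minimal sketch, the code below computes the Lempel-Ziv (1976) complexity c(·) of a string with the Kaspar-Schuster scanning algorithm and combines the complexities of the concatenations SQ and QS into one normalized distance, max(c(SQ) − c(S), c(QS) − c(Q)) / max(c(SQ), c(QS)). That particular normalization is an assumption chosen for illustration, and the function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) parse of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1   # i: candidate copy start, k: match length, l: start of current component
    c, k_max = 1, 1     # the first symbol is always a component of its own
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                  # component is still reproducible from earlier text
            if l + k > n:           # reached the end while copying: count the last component
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:              # no earlier start reproduces it: close the component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_distance(s: str, q: str) -> float:
    """Assumed normalization: new information each sequence adds to the other, scaled to [0, 1]."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)
```

Filling a matrix with `lz_distance` for every pair of sequences gives the kind of input a distance-based tree builder expects. Note that under this normalization the distance of a sequence to itself is small but not zero.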

2.
It is at present difficult to position gaps accurately in sequence alignment and to determine substructural homology in structure alignment when reconstructing phylogenies from highly divergent sequences. We have therefore developed a new strategy for inferring phylogenies from highly divergent sequences. In this strategy, the whole secondary structure, written as a string in bracket notation, is used as the phylogenetic character, so it is no longer necessary to decompose the secondary structure into homologous substructural components. In this study, reliable phylogenetic relationships among eight species of Pectinidae were inferred from the structure alignment, but not from the sequence alignment, even with the aid of structural information. The results suggest that this new strategy should be useful for inferring phylogenetic relationships from highly divergent sequences. The structural evolution of ITS1 in Pectinidae was also investigated. The whole ITS1 structure could be divided into four structural domains, and compensatory changes were found in all four. Structural motifs in these domains were further identified; these motifs, especially those in D2 and D3, may have important functions in the maturation of rRNAs.
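Bracket (dot-bracket) notation writes each base pair as a matching "(" and ")" and each unpaired base as ".". The sketch below is not the authors' structure-alignment procedure; it only shows what such a string encodes, parsing it into base pairs and comparing two equal-length structures with the simple base-pair distance. The function names are illustrative.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Parse dot-bracket notation into a set of (i, j) base pairs, 0-based."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)                 # opening base waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))     # pair with the most recent open base
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of two equal-length structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))
```

Structures of different lengths, as with real ITS1 sequences, need an alignment step first, which is exactly the part this sketch leaves out.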

3.
Deng M  Yu C  Liang Q  He RL  Yau SS 《PloS one》2011,6(3):e17293

Background

Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences.

Methodology

To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance, which makes a global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV) and mammalian mitochondrial genomes. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses.
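A minimal sketch of a natural vector, assuming the 12-dimensional form: for each nucleotide k, the count n_k, the mean 1-based position μ_k, and the normalized second central moment D_2^k = Σ(t − μ_k)² / (n_k · n), where n is the sequence length. Treat that normalization and the Euclidean metric as assumptions; the function names are illustrative.

```python
import math


def natural_vector(seq: str) -> list[float]:
    """Counts, mean positions and normalized 2nd central moments for A, C, G, T."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in "ACGT":
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]  # 1-based positions
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        d2_k = sum((t - mu_k) ** 2 for t in positions) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def natural_vector_distance(a: str, b: str) -> float:
    """Euclidean distance between natural vectors (assumed metric)."""
    return math.dist(natural_vector(a), natural_vector(b))
```

Because each vector depends only on its own sequence, vectors can be computed once and stored, which matches the abstract's point that new genomes can be added without realignment.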

Conclusions

Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than with multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.

4.
Coronavirus phylogeny based on a geometric approach   Cited: 5 (self-citations: 0; by others: 5)

5.
Besides the complete genome, different partial genomic sequences of hepatitis E virus (HEV) have been used in genotyping studies, which makes it difficult to compare their results. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of every partial genomic sequence across the whole genome, comparing evolutionary distances from genomic regions with those from the full-length genomes of 101 HEV isolates, in order to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3′ terminus of the papain-like cysteine protease domain, showed relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions performed identically to the full-length genome in genotyping, dividing the HEV isolates into reasonable genotypes. This analysis may be of value in developing a partial-sequence-based consensus classification of HEV species.
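As an illustration of a genome-wide scan (not the authors' statistical method, which the abstract does not detail), the sketch below slides a window along a pre-computed alignment, computes uncorrected p-distances for every pair of isolates within each window, and ranks windows by the Pearson correlation of those distances with the full-length distances. The window width, step, p-distance model and function names are assumptions.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0  # no comparable sites; treated as no difference in this sketch
    return sum(x != y for x, y in sites) / len(sites)


def pairwise_distances(seqs: list[str]) -> list[float]:
    """Flattened upper triangle of the pairwise p-distance matrix."""
    return [p_distance(a, b) for a, b in combinations(seqs, 2)]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # constant distances carry no phylogenetic signal
    return sxy / math.sqrt(sxx * syy)


def rank_windows(alignment: list[str], width: int, step: int) -> list[tuple[int, float]]:
    """Correlate each window's distances with full-length distances; best windows first."""
    full = pairwise_distances(alignment)
    length = len(alignment[0])
    scores = []
    for start in range(0, length - width + 1, step):
        window = [seq[start:start + width] for seq in alignment]
        r = pearson(pairwise_distances(window), full)
        if not math.isnan(r):
            scores.append((start, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

High-ranking windows are only candidates; as in the abstract, trees built from them would still need to be checked against full-genome genotype assignments.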

6.
With the development of genome sequencing, more whole genomes of microorganisms have been completed, and many methods have been introduced to reconstruct the phylogenetic tree of these microorganisms from information extracted from their whole genomes, by transforming or mapping the genome sequences into other forms that describe evolutionary distance in a new way. We think there may be information buried in the whole genome and transferred along lineages that remains stable and is more essential than the sequence conservation of individual genes or the arrangement of a selected set of genes. We need a measurement that can involve as many phylogenetic features beyond the genome sequence itself as possible. We converted each microbial genome sequence into another linear sequence representing its functional structure, used a new information function to calculate the discrepancy between sequences and obtain a distance matrix of the genomes, and built a phylogenetic tree with the neighbor-joining method. The major lineages of the resulting tree are consistent with those based on 16S rRNA sequences. Our method identified a phylogenetic feature, derived from the genome sequences and their encoded genes, that can rebuild the phylogenetic tree correctly. Mapping a genome sequence to a new form that represents the relative positions of its functional genes provides a new way to measure phylogenetic relationships, and with a more specific classification of gene functions the results could be more sensitive.
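The abstract names neighbor joining as the tree builder. The sketch below is a textbook implementation of the Saitou-Nei algorithm over a symmetric distance matrix, not the authors' code: it repeatedly joins the pair minimizing Q(i, j) = (n − 2)·d(i, j) − r_i − r_j, where r_i is the row sum of distances, and returns a Newick string with branch lengths. The function name and output formatting are assumptions.

```python
def neighbor_joining(dist: list[list[float]], labels: list[str]) -> str:
    """Return an unrooted Newick tree built by neighbor joining."""
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    active = list(labels)
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}  # net divergence per node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = active[i], active[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from the joined pair to their new parent node
        branch_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        node = f"({a}:{branch_a:.3f},{b}:{branch_b:.3f})"
        d[node] = {node: 0.0}
        for c in active:
            if c not in (a, b):
                d[node][c] = d[c][node] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [node]
    a, b = active
    # the last edge is placed entirely on the first remaining node
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"
```

On a perfectly additive matrix, such as one derived from the tree ((A:1,B:2),(C:3,D:4)) with an internal edge of length 1, this recovers the original topology and branch lengths.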

7.
Individual genes or regions are still commonly used to estimate the phylogenetic relationships among viral isolates. Genomic regions that faithfully reproduce the assessments obtained from full-length genome sequences would be good candidate phylogenetic markers for molecular epidemiological studies of many viruses. Here we employed a statistical method that evaluates the evolutionary relationships between individual viral genes and full-length genomes without tree construction, as a way to determine which gene matches the genome well in phylogenetic analyses. The method calculates linear correlations between the genetic distance matrices of aligned individual gene sequences and of aligned genome sequences. We applied it to phylogenetic analyses of porcine circovirus 2 (PCV2), measles virus (MV), hepatitis E virus (HEV) and Japanese encephalitis virus (JEV). Phylogenetic trees were constructed for comparison, and possible factors affecting the method's accuracy in these calculations were also discussed. The results were consistent with those of previous studies on which consensus sequences can be used successfully as phylogenetic markers. Our results also suggest that these evolutionary correlations can provide useful information for identifying genes that can be used effectively to infer genetic relationships.
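A minimal sketch of the matrix-correlation idea, assuming the gene-region and full-genome distance matrices have already been computed for the same isolates in the same order. The Pearson correlation is taken over the upper triangles. The Mantel-style permutation p-value is our addition, which the abstract does not mention, and all names are hypothetical.

```python
import math
import random


def upper_triangle(m: list[list[float]]) -> list[float]:
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def matrix_correlation(gene: list[list[float]], genome: list[list[float]]) -> float:
    """Linear correlation between two distance matrices over the same taxa."""
    return pearson(upper_triangle(gene), upper_triangle(genome))


def mantel_p_value(gene, genome, permutations: int = 999, seed: int = 0) -> float:
    """One-sided permutation p-value: relabel taxa in one matrix and recompute r."""
    rng = random.Random(seed)
    observed = matrix_correlation(gene, genome)
    n = len(genome)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[genome[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if matrix_correlation(gene, permuted) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)
```

Permuting rows and columns together keeps each matrix a valid distance matrix, which is why entries are not shuffled independently.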

8.
Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, and more recently by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distance between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation is a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all currently sequenced organisms shows a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility of a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server: .

9.
10.
The distribution of freshwater taxa is a good biogeographic model for studying the patterns and processes of vicariance and dispersal. The subfamily Leuciscinae (Cyprinidae, Teleostei) consists of many species distributed widely in Eurasia and North America. Leuciscinae have been divided into two phyletic groups, leuciscin and phoxinin. The phylogenetic relationships between major clades within the subfamily are poorly understood, largely because of the overwhelming diversity of the group. The origin of the Far Eastern phoxinin is an interesting question in the evolutionary history of Leuciscinae. Here we present a phylogenetic analysis of 31 species of Leuciscinae and outgroups based on complete mitochondrial genome sequences to clarify the phylogenetic relationships and to infer the evolutionary history of the subfamily.

11.
Conflicting results often accompany phylogenetic analyses of RNA, DNA or protein sequences across diverse species. Causes of these conflicts include ambiguities in identifying homologous characters in alignments, sensitivity of tree-building methods to unequal evolutionary rates, biases in species sampling, unrecognized paralogy, functional differentiation, loss of phylogenetic information due to long branches or fast evolution, and difficulties with the assumptions and approximations used to infer phylogenetic relationships. Attempts to surmount these conflicts by averaging over many proteins are problematic because of inherent biases in selected families, lack of signal in others, and events of lateral transfer, fusion and/or chimerism. Assessing the reliability of results with the bootstrap method is fraught with obstacles because molecular data lack independence and are inhomogeneous. Problems inherent to the three major procedures for building phylogenetic trees (parsimony, likelihood and distance methods) are reviewed. Special attention is given to the problem of inferring evolutionary distances from patterns of similarity among sequences. The difficulties encountered by phylogenetic reconstruction methods based on the analysis of divergent sequence families make new methods based on the analysis of complete genomes reasonable alternatives. Several of these are considered, including the signature sequences of Gupta and associates, the study of genome profiles, and the genomic signature set forth by Karlin and colleagues.
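Karlin's genomic signature is commonly summarized by dinucleotide relative abundances, ρ*_XY = f*_XY / (f*_X · f*_Y), computed over a sequence together with its reverse complement, and compared with δ* = (1/16) Σ |ρ*_XY(f) − ρ*_XY(g)|. The sketch below follows that description. It counts the two strands separately rather than concatenating them, assumes each sequence has at least two unambiguous nucleotides, and uses illustrative function names.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def relative_abundance(seq: str) -> dict[str, float]:
    """Symmetrized dinucleotide relative abundances rho*_XY = f*_XY / (f*_X * f*_Y)."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):  # both strands contribute counts
        mono.update(ch for ch in strand if ch in NUCLEOTIDES)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1)
                  if strand[i] in NUCLEOTIDES and strand[i + 1] in NUCLEOTIDES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for x, y in product(NUCLEOTIDES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        f_xy = di[x + y] / n_di
        rho[x + y] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def delta_star(seq_a: str, seq_b: str) -> float:
    """Mean absolute difference of the 16 relative abundances."""
    ra, rb = relative_abundance(seq_a), relative_abundance(seq_b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / 16
```

Because both strands are counted, a sequence and its reverse complement have identical signatures, so δ* between them is zero.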

12.
Sequence alignment is a standard method for inferring evolutionary, structural and functional relationships among sequences. The quality of alignments depends on the substitution matrix used. Here we derive matrices from structural superimpositions of protein pairs that have similar structures but low or no sequence similarity. In a performance test, the matrices are compared with 12 other previously published matrices. The structure-derived matrices are found to be applicable to comparisons of distantly related sequences. We also investigate the influence of the evolutionary relationships of protein pairs on alignment accuracy.
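The abstract does not give the derivation, but substitution matrices of this kind are conventionally log-odds scores. As a sketch under that assumption, the function below takes aligned residue pairs (for example, structurally equivalent positions from superimposed protein pairs), estimates pair frequencies q_ab and background frequencies p_a, and returns scores round(scale · log2(q_ab / (p_a · p_b))), which are in half-bit units with the default scale. No pseudocounts are added, so unobserved pairs are simply absent. The name and the scale are illustrative.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale: float = 2.0) -> dict[tuple[str, str], int]:
    """Integer log-odds scores s(a, b) = round(scale * log2(q_ab / (p_a * p_b)))."""
    counts = Counter()
    for a, b in aligned_pairs:
        counts[(a, b)] += 1   # count each aligned pair in both orientations
        counts[(b, a)] += 1   # so the resulting matrix is symmetric
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    p = Counter()
    for (a, _), q_ab in q.items():
        p[a] += q_ab          # background frequency = marginal of the pair distribution
    return {(a, b): round(scale * math.log2(q_ab / (p[a] * p[b])))
            for (a, b), q_ab in q.items()}
```

Positive scores mark pairs seen more often in superimposed structures than chance predicts; negative scores mark pairs that structural equivalence disfavors.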

13.
Ribosomal DNA internal transcribed spacers (ITS) and partial external transcribed spacers (ETSf) are widely used to infer evolutionary hypotheses. However, little consideration is generally given to the secondary structures of these small RNA molecules and their potential effects on sequence alignment and phylogenetic analyses. Intergeneric relationships among three of the four major lineages in the Sapindaceae (the Dodonaeoideae, Hippocastanoideae and Xanthoceroideae) were assessed in two steps. First, secondary structure predictions were generated for the ITS and partial ETSf sequences, and these predictions were used to assist the alignment of the sequences. Second, the alignment was analyzed using RNA-specific models of sequence evolution that account for the variation in nucleotide evolution between the independent loop regions and the covarying stem regions of the ribosomal spacers. These models, and the phylogeny drawn from these analyses, were compared with those from analyses using ‘traditional’ 4-state models and with previous plastid analyses. These analyses showed that paired-site models, developed specifically to deal with stem structures in RNA-encoding sequences, account for the evolutionary history of the sequences more appropriately than traditional 4-state substitution models.
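Paired-site (16-state) models treat the two bases of a stem pair as a single doublet character, while loop positions remain ordinary 4-state characters. The sketch below, with hypothetical names, splits a structure-annotated alignment into those two data partitions, assuming a single consensus structure in dot-bracket notation that spans every alignment column.

```python
def split_stems_and_loops(alignment: list[str], structure: str):
    """Return (loop_columns, stem_doublets) for an alignment annotated with one structure."""
    if any(len(seq) != len(structure) for seq in alignment):
        raise ValueError("every sequence must span the structure annotation")
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    paired = {pos for pair in pairs for pos in pair}
    # 4-state characters: one column of single bases per unpaired position
    loops = [[seq[j] for seq in alignment] for j in range(len(structure)) if j not in paired]
    # 16-state characters: one doublet (5' base + 3' base) per stem pair
    stems = [[seq[i] + seq[j] for seq in alignment] for i, j in sorted(pairs)]
    return loops, stems
```

A compensatory change in a stem (say GC to AU in one taxon) then appears as one change in a single doublet character instead of two apparently independent substitutions.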

14.
Determining the mode, or geographical context, of speciation is a critical first step to understanding the evolutionary mechanisms that cause new species to arise. In this study, we estimated phylogenetic relationships in the cerasina species group of the Hawaiian cricket genus Laupala (Orthoptera: Gryllidae) to test competing phylogeographical hypotheses and thus infer the mode of speciation. A previous phylogenetic result based on nuclear sequence data suggested that populations of L. cerasina on the Big Island of Hawaii are the result of two independent colonizations from Maui, implying parallel speciation and convergent song evolution, and contradicting systematic hypotheses based on behavioural and morphological data. We used amplified fragment length polymorphisms to investigate further the relationships among species and populations in the cerasina species group. Results of these analyses provide a robust estimate of phylogenetic relationships and support the phylogeographical history indicated by behavioural and morphological data.

15.
Finding correct species relationships using phylogeny reconstruction based on molecular data is dependent on several empirical and technical factors. These include the choice of DNA sequence from which phylogeny is to be inferred, the establishment of character homology within a sequence alignment, and the phylogeny algorithm used. Nevertheless, sequencing and phylogeny tools provide a way of testing certain hypotheses regarding the relationship among the organisms for which phenotypic characters demonstrate conflicting evolutionary information. The protozoan family Sarcocystidae is one such group for which molecular data have been applied phylogenetically to resolve questionable relationships. However, analyses carried out to date, particularly based on small-subunit ribosomal DNA, have not resolved all of the relationships within this family. Analysis of more than one gene is necessary in order to obtain a robust species signal, and some DNA sequences may not be appropriate in terms of their phylogenetic information content. With this in mind, we tested the informativeness of our chosen molecule, the large-subunit ribosomal DNA (lsu rDNA), by using subdivisions of the sequence in phylogenetic analysis through PAUP, fastDNAml, and neighbor joining. The segments of sequence applied correspond to areas of higher nucleotide variation in a secondary-structure alignment involving 21 taxa. We found that subdivision of the entire lsu rDNA is inappropriate for phylogenetic analysis of the Sarcocystidae. There are limited informative nucleotide sites in the lsu rDNA for certain clades, such as the one encompassing the subfamily Toxoplasmatinae. Consequently, the removal of any segment of the alignment compromises the final tree topology. We also tested the effect of using two different alignment procedures (CLUSTAL W and the structure alignment using DCSE) and three different tree-building methods on the final tree topology. 
This work shows that congruence between different methods in the formation of clades may be a feature of robust topology; however, a sequence alignment based on primary structure may not be comparing homologous nucleotides even when the expected topology is obtained. Our results support previous findings showing the paraphyly of the current genera Sarcocystis and Hammondia and again call into question the relationships of Sarcocystis muris, Isospora felis, and Neospora caninum. In addition, results based on phylogenetic analysis of the structure alignment suggest that Sarcocystis zamani and Sarcocystis singaporensis, which have reptilian definitive hosts, are monophyletic with Sarcocystis species using mammalian definitive hosts if the genus Frenkelia is synonymized with Sarcocystis.
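One standard way to quantify the congruence of topologies produced by different alignments or tree-building methods is the Robinson-Foulds distance. The abstract does not use it; it is swapped in here for illustration. The sketch below assumes rooted trees written as nested tuples of taxon names and counts the clades found in only one of the two trees. The function names are hypothetical.

```python
def collect_clades(tree, clades: set) -> frozenset:
    """Add every internal clade (as a frozenset of leaf names) to `clades`; return the leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.add(leaves)
    return leaves


def rf_distance(tree_a, tree_b) -> int:
    """Rooted Robinson-Foulds distance: clades present in exactly one tree."""
    clades_a, clades_b = set(), set()
    taxa_a = collect_clades(tree_a, clades_a)
    taxa_b = collect_clades(tree_b, clades_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must have the same taxa")
    clades_a.discard(taxa_a)   # the all-taxa root clade is shared trivially
    clades_b.discard(taxa_b)
    return len(clades_a ^ clades_b)
```

A distance of 0 means two methods recovered the same clades. As the abstract cautions, agreement alone does not prove that the aligned nucleotides were homologous.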

16.
The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next-generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of large numbers of short (90 bp) paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesions were observed, they did not affect sequencing accuracy. Our results demonstrate that de novo assembly of shotgun sequence reads can be a potent approach to sequencing mitogenomes and offers an efficient way to infer the evolutionary history of extinct species.
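The abstract does not name the assembler. As a toy illustration of de novo assembly, the sketch below builds a de Bruijn graph from the k-mers of the reads and reports maximal non-branching paths (unitigs). It ignores sequencing errors, read orientation and isolated cycles, so it is far from a production mitogenome assembler. The k value and function names are assumptions.

```python
from collections import Counter, defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Edges (k-1)-mer -> (k-1)-mer for every distinct k-mer in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal non-branching paths of the graph."""
    indegree = Counter(v for targets in graph.values() for v in targets)
    nodes = set(graph) | set(indegree)
    outdegree = {node: len(graph.get(node, ())) for node in nodes}

    def simple(node: str) -> bool:
        return indegree[node] == 1 and outdegree[node] == 1

    contigs = []
    for start in sorted(nodes):
        if simple(start) or outdegree[start] == 0:
            continue  # paths start only at branching or source nodes
        for nxt in sorted(graph[start]):
            path = start + nxt[-1]
            while simple(nxt):            # extend through 1-in/1-out nodes
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs
```

Two overlapping reads such as "ACGTTG" and "GTTGCA" with k = 4 are merged into the single contig "ACGTTGCA".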

17.
Mitochondrial DNA sequences can be used to estimate phylogenetic relationships among animal taxa and to analyze molecular evolution. With the development of sequencing technology, more and more mitochondrial sequences, including complete mitochondrial genomes, have been made available in public databases. These data have been used for phylogenetic analyses of animal species and for studies of evolutionary processes. We performed phylogenetic analyses of 19 species of Cervidae, with Bos taurus as the outgroup, applying neighbor-joining, maximum likelihood, maximum parsimony and Bayesian inference methods to whole mitochondrial genome sequences. The consensus phylogenetic trees supported the monophyly of the family Cervidae, which was divided into two subfamilies, Plesiometacarpalia and Telemetacarpalia, and four tribes, Cervinae, Muntiacinae, Hydropotinae and Odocoileinae. Divergence times within these groups were estimated by Bayesian phylogenetic analysis under a relaxed molecular clock; the results were consistent with those of previous studies. We conclude that the evolutionary structure of the family Cervidae can be reconstructed by phylogenetic analysis of whole mitochondrial genomes, and that this approach could be used broadly in phylogenetic analyses of animal taxa.

18.
SUMMARY: TreeMos is a novel high-throughput graphical analysis application that allows the user to search for phylogenetic mosaicism among one or more DNA or protein sequence multiple alignments and additional unaligned sequences. TreeMos uses a sliding window and local alignment algorithm to identify the nearest neighbour of each sequence segment, and visualizes instances of sequence segments whose nearest neighbour is anomalous to that identified using the global alignment. Data sets can include whole genome sequences allowing phylogenomic analyses in which mosaicism may be attributed to recombination between any two points in the genome. TreeMos can be run from the command line, or within a web browser allowing the relationships between taxa to be explored by drill-through. AVAILABILITY: http://www2.warwick.ac.uk/fac/sci/whri/research/archaeobotany.
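TreeMos itself uses local alignment. The simpler sketch below instead assumes the sequences are already aligned, uses uncorrected p-distances, and flags each window in which a sequence's nearest neighbour differs from its nearest neighbour over the whole alignment. The function names, window width and step are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return float("inf")   # nothing comparable: never chosen as nearest neighbour
    return sum(x != y for x, y in sites) / len(sites)


def nearest_neighbour(name: str, seqs: dict[str, str]) -> str:
    others = [other for other in seqs if other != name]
    return min(others, key=lambda other: p_distance(seqs[name], seqs[other]))


def anomalous_windows(seqs: dict[str, str], width: int, step: int):
    """Flag (window_start, sequence, window_neighbour, global_neighbour) when they disagree."""
    global_nn = {name: nearest_neighbour(name, seqs) for name in seqs}
    length = len(next(iter(seqs.values())))
    flags = []
    for start in range(0, length - width + 1, step):
        window = {name: seq[start:start + width] for name, seq in seqs.items()}
        for name in seqs:
            local_nn = nearest_neighbour(name, window)
            if local_nn != global_nn[name]:
                flags.append((start, name, local_nn, global_nn[name]))
    return flags
```

Ties in distance are broken by input order, so short windows on near-identical sequences can raise spurious flags; a real tool needs a more careful tie rule.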

19.
Composition Vector Tree (CVTree) is an alignment-free algorithm for inferring phylogenetic relationships from genome sequences. It has been successfully applied to the phylogeny and taxonomy of viruses, prokaryotes and fungi based on whole genomes, as well as to chloroplast genomes, mitochondrial genomes and metagenomes. Here we present standalone software for the CVTree algorithm, in which an extensible parallel workflow was designed. Based on this workflow, new alignment-free methods were also implemented. By examining the phylogeny and taxonomy of 13,903 prokaryotes based on 16S rRNA sequences, we showed that the CVTree software is an efficient and effective tool for studying phylogeny and taxonomy based on genome sequences. The source code of the CVTree software is available at https://github.com/ghzuo/cvtree.
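CVTree is usually described as subtracting a Markov-model background from K-string frequencies: f0(w) = f(w minus its last letter) · f(w minus its first letter) / f(w minus both ends), a(w) = (f(w) − f0(w)) / f0(w), and distance D = (1 − C) / 2, where C is the cosine of two composition vectors. The sketch below follows that description for single sequences and is not taken from the CVTree code base. Components with f0(w) = 0 are omitted, which treats them as zero, and the function names are illustrative.

```python
import math
from collections import Counter


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {word: c / n for word, c in counts.items()}


def composition_vector(seq: str, k: int) -> dict[str, float]:
    """Sparse vector a(w) = (f(w) - f0(w)) / f0(w) over K-strings w, with k >= 3."""
    if k < 3 or len(seq) < k:
        raise ValueError("need k >= 3 and a sequence of length >= k")
    fk = kmer_frequencies(seq, k)
    fk1 = kmer_frequencies(seq, k - 1)
    fk2 = kmer_frequencies(seq, k - 2)
    alphabet = sorted(set(seq))
    vector = {}
    for prefix in fk1:
        for ch in alphabet:
            word = prefix + ch
            if word[1:] not in fk1:
                continue  # f0(word) = 0: component treated as zero
            f0 = fk1[prefix] * fk1[word[1:]] / fk2[word[1:-1]]  # Markov background
            vector[word] = (fk.get(word, 0.0) - f0) / f0
    return vector


def cv_distance(seq_a: str, seq_b: str, k: int = 5) -> float:
    """D = (1 - cosine) / 2 between composition vectors."""
    a, b = composition_vector(seq_a, k), composition_vector(seq_b, k)
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # cosine taken as 0 when a vector is all zeros
    return (1 - dot / (norm_a * norm_b)) / 2
```

Subtracting the (K−1)-order background is what lets K-strings that are over- or under-represented relative to shorter-word statistics carry the phylogenetic signal.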

20.
Abstract

Molecular sequence data have become prominent tools for inferring phylogenetic relationships and are particularly useful in the analysis of highly diverse taxonomic orders. Ribosomal RNA sequences provide markers for the study of phylogeny because their function and structure have been largely conserved throughout the evolutionary history of organisms. These sequences are inferred from cloned or enzymatically amplified gene sequences, or determined by direct RNA sequencing. The first step in the phylogenetic interpretation of nucleic acid sequence variation is proper alignment of the corresponding sequences from various organisms. For ribosomal RNAs, alignment based on similarity criteria is greatly reinforced by secondary-structure homologies. Distance matrix methods for inferring evolutionary trees assume that the phylogenetic distance between each pair of organisms is proportional to the number of nucleotide substitution events. Computational tree inference methods usually take into account the possibility of unequal mutation rates among lineages. Divergence times can be estimated on the tree, provided that at least one lineage has been dated by fossil records. We have used this approach, based on ribosomal RNA sequence comparison, to investigate the phylogenetic relationship between dinoflagellates and other eukaryotic protists and to refine controversial phylogenies of the class Dinophyceae.
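Observed differences underestimate substitution events once sites are hit more than once, so distance methods usually correct the proportion of differing sites p before building trees. As one example, since the abstract does not say which model the authors used, the Jukes-Cantor correction gives d = −(3/4) · ln(1 − 4p/3) substitutions per site. The sketch assumes pre-aligned sequences and skips gapped columns; the function names are illustrative.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among columns without gaps."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(a: str, b: str) -> float:
    """Estimated substitutions per site: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For one difference in four sites (p = 0.25) the corrected distance is about 0.304, slightly above the raw 0.25; the gap widens rapidly as p approaches 0.75.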


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417