Similar literature
 Found 20 similar articles (search time: 22 ms)
1.
We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole genome data using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes composed of 832 total proteins. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vector representations of genes from individual species can be summed, allowing species trees to be produced.
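
The vector-space picture in this abstract can be written compactly. The notation below is an assumption introduced for illustration (the abstract defines no symbols): $A$ is the peptide-by-protein frequency matrix, $k$ is the number of retained axes, $p_j$ is the reduced vector of protein $j$, and $S$ is the set of proteins belonging to one species.

```latex
\begin{align}
A &= U \Sigma V^{\mathsf{T}}, \qquad A \in \mathbb{R}^{m \times n}
  && \text{($m$ peptide strings, $n$ proteins)} \\
p_j &= \Sigma_k V_k^{\mathsf{T}} e_j \in \mathbb{R}^{k}
  && \text{(protein $j$ on the $k$ leading axes)} \\
\cos\theta_{ij} &= \frac{p_i \cdot p_j}{\lVert p_i \rVert \, \lVert p_j \rVert}
  && \text{(relative orientation of two proteins)} \\
s &= \sum_{j \in S} p_j
  && \text{(summed species vector)}
\end{align}
```

Under this reading, each retained axis would correspond to one "correlated peptide motif", and a small angle $\theta_{ij}$ indicates similar sequences. How $k$ is chosen and whether the frequencies are weighted is not stated in the abstract.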

2.
3.
Genome data have accumulated rapidly in recent years, doubling roughly every 6 months owing to the influx of next-generation sequencing technologies. A plethora of plant genomes are available in comprehensive public databases. This easy access to data provides an opportunity to explore genome datasets and recruit new genes in various plant species in ways that were not possible a decade ago. In the past few years, studies of many gene families have been published using these public datasets. These genome-wide studies identify and characterize gene members, gene structures, evolutionary relationships, expression patterns, protein interactions and gene ontologies, and predict putative gene functions using various computational tools. Such studies provide meaningful information and an initial framework for further functional elucidation. This review provides a concise layout of approaches used in these gene family studies and outlines how various plant genome datasets can be employed in future studies.

4.
The order Archaeognatha is an ancient group of Hexapoda and is considered the most primitive group of living insects. Its two extant families (Meinertellidae and Machilidae) comprise approximately 500 species. This study determined three complete and two nearly complete mitochondrial genome sequences of bristletails. The five mitochondrial genome sequences were relatively modest in size, each containing 13 protein-coding genes (PCGs), 2 ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes and one control region. The gene orders were identical to those of Drosophila yakuba and most bristletail species, suggesting conserved genome evolution within the Archaeognatha. To estimate archaeognathan evolutionary relationships, phylogenetic analyses were conducted using concatenated nucleotide sequences of the 13 protein-coding genes, with four different computational algorithms (NJ, MP, ML and BI). Based on the results, the monophyly of the family Machilidae was challenged by both datasets (W12 and G12). The relationships among archaeognathan subfamilies appeared to be tangled, and the subfamily Machilinae also appeared to be a paraphyletic group in our study.

5.
6.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole genome sequence that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
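
The pipeline described in this abstract (tetrapeptide counts, SVD, angle-based distances, summed species vectors) can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names, the dense matrix, the use of the raw angle as the distance, and the choice of `rank` are all assumptions made here, and the abstract does not say whether the counts are weighted or normalized.

```python
import numpy as np

def kmer_matrix(proteins, k=4):
    """Rows: observed k-peptides; columns: proteins; entries: overlapping counts."""
    vocab = sorted({p[i:i + k] for p in proteins for i in range(len(p) - k + 1)})
    row = {kmer: r for r, kmer in enumerate(vocab)}
    A = np.zeros((len(vocab), len(proteins)))
    for c, p in enumerate(proteins):
        for i in range(len(p) - k + 1):
            A[row[p[i:i + k]], c] += 1
    return A

def svd_vectors(A, rank):
    """Protein vectors on the `rank` leading singular axes (one row per protein)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (s[:rank, None] * Vt[:rank]).T

def angular_distance(x, y):
    """Angle in radians between two vectors (assumed distance measure)."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def species_vector(protein_vectors):
    """Sum the vectors of all proteins belonging to one organism."""
    return np.sum(protein_vectors, axis=0)

# Toy input (hypothetical sequences): the first two differ by one residue.
proteins = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWWPLHNC"]
vecs = svd_vectors(kmer_matrix(proteins), rank=2)
print(angular_distance(vecs[0], vecs[1]), angular_distance(vecs[0], vecs[2]))
```

A matrix of such pairwise angles could then be passed to any distance-based tree builder such as neighbor joining. For real genomes a sparse matrix and a truncated SVD routine would be needed, since a full tetrapeptide vocabulary has 20^4 = 160,000 rows.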

7.
The phylogenetic positions of the families Campynemataceae and Corsiaceae within the order Liliales remain unclear. To date, molecular data from the plastid genome of Corsiaceae have been obtained exclusively from Arachnitis, for which alignment and phylogenetic inference have proved difficult. The extent of gene conservation among mycoheterotrophic species within Corsiaceae remains unknown. To clarify the phylogenetic position of Campynemataceae and Corsiaceae within Liliales, functional plastid-coding genes of species representing both families have been analyzed. Examination of two phylogenetic data sets of plastid genes employing parsimony, maximum-likelihood, and Bayesian inference methods strongly supported both families forming a clade basal to the remaining taxa of Liliales. The first data set consists of five functional plastid-encoded genes (matK, rps7, rps2, rps19, and rpl2) sequenced from Corsia dispar (Corsiaceae). The data set included 31 species representing all families within Liliales, as well as selected orders that are closely related to Liliales (10 outgroup species from Asparagales, Dioscoreales, and Pandanales). The second phylogenetic analysis was based on 75 plastid genes. This data set included 18 species from Liliales, representing major clades within the order, and 10 outgroup species from Asparagales, Dioscoreales, and Pandanales. In this latter data set, Campynemataceae was represented by 60 plastid-encoded genes sequenced from herbarium material of Campynema lineare. A large proportion of the plastid genome of C. dispar was also sequenced and compared to the plastid genomes of photosynthetic plants within Liliales and mycoheterotrophic plants within Asparagales to explore plastid genome reduction. The plastid genome of C. dispar is in the advanced stages of reduction, which signifies its high dependency on mycorrhizal fungi and is suggestive of a loss in photosynthetic ability. Functional plastid genes found in C. dispar may be applicable to other species in Corsiaceae, which will provide a basis for in-depth molecular analyses of interspecies relationships within the family, once molecular data from other members become available.

8.
9.
Lateral gene transfer (LGT) is an important mechanism of natural variation among prokaryotes. Over the full course of evolution, most or all of the genes resident in a given prokaryotic genome have been affected by LGT, yet the frequency of LGT can vary greatly across genes and across prokaryotic groups. The proteobacteria are among the most diverse of prokaryotic taxa. The prevalence of LGT in their genome evolution calls for the application of network-based methods instead of tree-based methods to investigate the relationships among these species. Here, we report networks that capture both vertical and horizontal components of evolutionary history among 1,207,272 proteins distributed across 329 sequenced proteobacterial genomes. The network of shared proteins reveals modularity structure that does not correspond to current classification schemes. On the basis of shared protein-coding genes, the five classes of proteobacteria fall into two main modules, one including the alpha-, delta-, and epsilonproteobacteria and the other including beta- and gammaproteobacteria. The first module is stable over different protein identity thresholds. The second shows more plasticity with regard to the sequence conservation of proteins sampled, with the gammaproteobacteria showing the most chameleon-like evolutionary characteristics within the present sample. Using a minimal lateral network approach, we compared LGT rates at different phylogenetic depths. In general, gene evolution by LGT within proteobacteria is very common. At least one LGT event was inferred to have occurred in at least 75% of the protein families. The average LGT rate at the species and class depth is about one LGT event per protein family, the rate doubling at the phylum level to an average of two LGT events per protein family. Hence, our results indicate that the rate of gene acquisition per protein family is similar at the level of species (by recombination) and at the level of classes (by LGT). 
The frequency of LGT per genome strongly depends on the species lifestyle, with endosymbionts showing far lower LGT frequencies than free-living species. Moreover, the nature of the transferred genes suggests that gene transfer in proteobacteria is frequently mediated by conjugation.

10.
Tetraodontiformes includes approximately 350 species assigned to nine families, sharing several reduced morphological features of higher teleosts. The order has been accepted as a monophyletic group by many authors, although several alternative hypotheses exist regarding its phylogenetic position within the higher teleosts. To date, acanthuroids, zeiforms, and lophiiforms have been proposed as sister-groups of the tetraodontiforms. The monophyly and sister-group status were investigated using whole mitochondrial genome (mitogenome) sequences from 44 purposefully chosen species (26 sequences newly determined during the study) that fully represent the major tetraodontiform lineages plus all the groups that have been hypothesized as being close relatives. Partitioned Bayesian analyses were conducted with three datasets that comprised concatenated nucleotide sequences from 13 protein-coding genes (with and without, or with RY-coding, 3rd codon positions), plus 22 transfer RNA and two ribosomal RNA genes. The resultant trees were well resolved and largely congruent, with most internal branches being supported by high posterior probabilities. Mitogenomic data strongly supported the monophyly of tetraodontiform fishes, placing them as a sister-group of either Lophiiformes plus Caproidei or Caproidei only. The sister-group relationship between Acanthuroidei and Tetraodontiformes was statistically rejected using Bayes factors. These results were confirmed by a reanalysis of the previously published nuclear RAG1 gene sequences using the Bayesian method. Within the Tetraodontiformes, however, monophylies of the three superfamilies were not recovered, and further taxonomic sampling and subsequent efforts should clarify these relationships.
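
The three treatments of 3rd codon positions mentioned in this abstract (kept, removed, or RY-coded) are simple string transformations. The sketch below shows the two non-trivial ones, assuming each coding sequence is in frame starting at its first base. The function names are hypothetical and are not taken from the study.

```python
# Purines (A, G) become R; pyrimidines (C, T) become Y.
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

def ry_code_third_positions(cds):
    """Recode every 3rd codon position as R or Y; other positions are kept."""
    out = []
    for i, base in enumerate(cds.upper()):
        if i % 3 == 2:
            out.append(RY.get(base, base))  # gaps and ambiguity codes pass through
        else:
            out.append(base)
    return "".join(out)

def drop_third_positions(cds):
    """Remove every 3rd codon position."""
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)

print(ry_code_third_positions("ATGGCCTTA"))  # ATRGCYTTR
print(drop_third_positions("ATGGCCTTA"))     # ATGCTT
```

RY-coding keeps transversion information at fast-evolving 3rd positions while discarding transitions, which is why it is often used as a middle ground between keeping and removing those sites.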

11.
12.
13.
14.
Avoidance of 4-, 5-, and 6-letter palindromes is observed in many prokaryotic genomes. A large fraction of such palindromes are restriction sites of the species itself or of a closely related species. One possible reason for this is the horizontal transfer of genes encoding restriction-modification systems. In organisms isolated from the action of such systems (e.g., in Mycoplasma), palindromes are not avoided. The general tendencies in preferences and avoidance of palindromes were studied for 33 available prokaryotic genomes. The results obtained provide additional insight into the relationships within and between taxonomic groups.
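
Palindrome avoidance is usually assessed by comparing how often a word occurs with how often it would be expected to occur by chance. The sketch below enumerates even-length DNA palindromes (words equal to their own reverse complement) and compares observed counts with an expectation based on independent base frequencies. That null model and the function names are assumptions made here; the abstract does not state which statistic was used. The 5-letter palindromes it mentions presumably have an unconstrained central base, a case this sketch does not cover.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    return word.translate(COMPLEMENT)[::-1]

def palindromes(n):
    """All length-n DNA words equal to their reverse complement (n even)."""
    words = ("".join(w) for w in product("ACGT", repeat=n))
    return [w for w in words if w == reverse_complement(w)]

def count_overlapping(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def observed_expected(seq, word):
    """Observed count and expected count under independent base composition."""
    n = len(seq)
    freq = {b: seq.count(b) / n for b in "ACGT"}
    p = 1.0
    for b in word:
        p *= freq[b]
    return count_overlapping(seq, word), p * (n - len(word) + 1)

# Toy example on a hypothetical sequence.
obs, exp = observed_expected("ACGTACGT", "ACGT")
print(obs, exp)
```

An observed/expected ratio well below 1 across a genome would indicate avoidance of that word. A higher-order Markov model is a common alternative to the simple composition-based expectation used here.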

15.
16.
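The chloroplast DNA (cpDNA) of algae is structurally complex. Inverted repeat (IR) sequences are commonly absent, and in algal species that retain them the cpDNA shows signs of IR shortening and degeneration. Algal cpDNA generally contains more genes than that of higher plants and has a greater coding capacity. Complete algal cpDNA sequences are determined mainly by Fosmid library construction combined with Long-PCR. This paper reviews domestic and international work on algal chloroplast genome structure, chloroplast-encoded genes, the application of chloroplast genomes in algal phylogenetics, and methods for extracting and sequencing algal chloroplast genomes, providing a theoretical basis for research on algal phylogeny, chloroplast origin, and functional genomics.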

17.
Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well-defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species.

18.
Microbial systematics and phylogeny should form the foundation and guiding light for a comprehensive understanding of different aspects of microbiology. However, there are many critical issues in microbial systematics that are currently not resolved. Some of these include: how to define and delimit a prokaryotic species; development of rational criteria for the assignment of higher taxonomic ranks; understanding what unique properties distinguish species from different groups; and understanding the branching order and interrelationships among higher prokaryotic clades. The sequencing of genomes from large numbers of cultured as well as uncultured microbes covering prokaryotic diversity provides a unique means to achieve these important objectives. Prokaryotic genomes are found to be very diverse and dynamic, and horizontal gene transfers (HGTs) are indicated to have played an important role in species/genome evolution. Although HGT adds a layer of complexity to understanding genome and species evolution, it is contended that the vast majority of genes and genetic characteristics that are distinctive of higher prokaryotic taxa are vertically inherited, and that a solid foundation for microbial systematics can be developed on the basis of them. We describe two kinds of molecular markers, conserved indels in protein sequences and whole proteins that are specific for different groups, which are proving particularly valuable in defining different prokaryotic groups in clear molecular terms and in understanding their interrelationships. Genetic and biochemical studies on these taxa-specific molecular markers also open the way to discovering novel biochemical and physiological characteristics that are unique properties of these groups.

19.
Environmental genomics, the big picture?   Total citations: 14, self-citations: 0, citations by others: 14
The enormous sequencing capabilities of our times might be reaching the point of overflowing our capacity to analyse the data and to provide feedback on where to focus the available resources. We now have a foreseeable future in which most bacterial species will have an annotated genome. However, we also know that most prokaryotic diversity would not be included there. On the one hand, there is the problem of many groups not being easily amenable to culture and hence not represented in culture-centred microbial taxonomy. On the other hand, the gene pools present in one species can be orders of magnitude larger than the genome of one strain (selected for genome sequencing). In contrast to eukaryotic genomes, the repertoire of genes present in one prokaryotic cell genome does not correlate stringently with its taxonomic identity. Hence gene catalogues from one environment might provide more meaningful information than the classical species catalogues. Metagenomics or microbial environmental genomics provide a different tool that gravitates around the habitat rather than the species. Such a tool could be just the right way to complement "organismal genomics". Its potential to advance our understanding of microbial ecology and prokaryotic diversity and evolution is discussed.

20.
D. G. Naumoff 《Microbiology》2013,82(4):415-422
α-L-Rhamnosidases are an important group of glycoside hydrolases represented in many organisms from various prokaryotic phyla. Based on the homology of catalytic domains, all these proteins are assigned to the GH78 and GH106 families of glycoside hydrolases. However, most prokaryotic genomes contain no genes encoding proteins from these two families. We found that the unique genome of Clostridium methylpentosum DSM5476 contains 83 genes of proteins from these families and undertook an investigation of their phylogeny. The absence of homologous genes in most strains of the genus Clostridium suggests an important ecological role of these genes, in C. methylpentosum in particular. Phylogenetic analysis revealed multiple lateral transfers and duplications of the corresponding genes.
