Similar literature
 20 similar articles found (search time: 31 ms)
1.
The phylogenetic profile method has been widely applied to the prediction of protein-protein interactions (PPIs). Studies often use all of the available complete genomes for this method. With more than 400 complete genomes available and new ones on the horizon, it remains unclear how reference organisms should be selected for profile construction and how that selection influences PPI prediction. Here, we performed a systematic assessment of reference organism selection from 225 complete genomes using their evolutionary tree. Our results suggest that reference organisms should be selected from moderately and highly genetically distant organisms, from all three domains (Bacteria, Archaea, and Eukarya), and so that they are evenly distributed at the fifth hierarchical level of the evolutionary tree. Our study provides important guidance on the construction of phylogenetic profiles for PPI prediction and functional genomics, a task that has become challenging due to the large and increasing number of available candidate organisms.
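The profile comparison that this abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the protein names, the toy profiles, the Hamming-style similarity and the `min_similarity` cutoff are all assumptions chosen for the example. Each protein gets a presence/absence vector over the chosen reference genomes, and proteins with matching vectors are proposed as functionally linked.

```python
from itertools import combinations


def hamming_similarity(p, q):
    """Fraction of reference genomes in which two profiles agree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same reference genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def predict_links(profiles, min_similarity=1.0):
    """Return protein pairs whose profiles are at least min_similarity alike.

    profiles maps a protein name to a tuple of 0/1 flags, one flag per
    reference genome (1 = a homolog was found in that genome).
    """
    links = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        # Profiles that are all 1s or all 0s carry no co-occurrence signal.
        if len(set(prof_a)) < 2 or len(set(prof_b)) < 2:
            continue
        if hamming_similarity(prof_a, prof_b) >= min_similarity:
            links.append((name_a, name_b))
    return links


# Hypothetical toy data: four proteins profiled across five reference genomes.
profiles = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 0, 1, 1, 0),
    "protC": (0, 1, 0, 0, 1),
    "protD": (1, 1, 1, 1, 1),
}
print(predict_links(profiles))
```

Changing which columns (reference genomes) are included changes which pairs match, which is the sensitivity the abstract sets out to assess.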

2.
Sun J  Xu J  Liu Z  Liu Q  Zhao A  Shi T  Li Y 《Bioinformatics (Oxford, England)》2005,21(16):3409-3415
MOTIVATION: The increasing availability of complete genome sequences provides an excellent opportunity for the further development of tools for functional studies in proteomics. Several experimental approaches and in silico algorithms have been developed to cluster proteins into networks of biological significance that may provide new biological insights, especially into the functions of many uncharacterized proteins. Among these methods, the phylogenetic profiles method has been widely used to predict protein-protein interactions. It involves the selection of reference organisms and the identification of homologous proteins. Up to now, no published report has systematically studied the effects of reference genome selection and homologous protein identification on the accuracy of this method. RESULTS: In this study, we optimized the phylogenetic profiles method by integrating phylogenetic relationships among reference organisms and sequence homology information to improve prediction accuracy. Our results revealed that the selection of the reference organism set and the criteria for homology identification are two critical factors that significantly affect the prediction accuracy of this method. Our refined phylogenetic profiles method shows better performance and potentially provides more reliable functional linkages than previous methods.

3.
It is desirable to estimate a tree of life, a species tree including all available species in the 3 superkingdoms (Archaea, Bacteria, and Eukaryota), using full-scale genome information rather than a limited number of genes. Here, we report a new method for constructing a tree of life based on protein domain organizations, that is, the sequential order of domains in a protein, for all proteins detected in the genome of an organism. The new method does not depend on the identification of orthologous gene sets and therefore avoids that burdensome and error-prone computation. By pairwise comparisons of the repertoires of protein domain organizations of 17 archaeal, 136 bacterial, and 14 eukaryotic organisms, we computed evolutionary distances among them and constructed a tree of life. Our tree shows monophyly of Archaea, Bacteria, and Eukaryota, and also monophyly of each of the eukaryotic kingdoms and of most bacterial phyla. In addition, the branching pattern of the bacterial phyla in our tree is consistent with the widely accepted bacterial taxonomy and is very close to other genome-based trees. A few inconsistencies between the traditional trees and the genome-based trees, including ours, may nevertheless call for a revision of the conventional view, particularly regarding the phylogenetic positions of hyperthermophiles.

4.
Here, we used data of complete genomes to study comparatively the metabolism of different species. We built phenetic trees based on the enzymatic functions present in different parts of metabolism. Seven broad metabolic classes, comprising a total of 69 metabolic pathways, were comparatively analyzed for 27 fully sequenced organisms of the domains Eukarya, Bacteria and Archaea. Phylogenetic profiles based on the presence/absence of enzymatic functions for each metabolic class were determined and distance matrices for all the organisms were then derived from the profiles. Unrooted phenetic trees based upon the matrices revealed the distribution of the organisms according to their metabolic capabilities, reflecting the ecological pressures and adaptations that those species underwent during their evolution. We found that organisms that are closely related in phylogenetic terms could be distantly related metabolically and that the opposite is also true. For example, obligate bacterial pathogens were usually grouped together in our metabolic trees, demonstrating that obligate pathogens share common metabolic features regardless of their diverse phylogenetic origins. The branching order of proteobacteria often did not match their classical phylogenetic classification and Gram-positive bacteria showed diverse metabolic affinities. Archaea were found to be metabolically as distant from free-living bacteria as from eukaryotes, and sometimes were placed close to the metabolically highly specialized group of obligate bacterial pathogens. Metabolic trees represent an integrative approach for the comparison of the evolution of the metabolism and its correlation with the evolution of the genome, helping to find new relationships in the tree of life.
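The step from presence/absence profiles to a distance matrix can be sketched as below. The abstract does not say which distance was used, so the Jaccard distance here is an assumption, and the organism names and EC numbers are hypothetical toy data.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of enzymatic functions."""
    union = a | b
    if not union:
        return 0.0  # two empty repertoires are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(enzymes_by_organism):
    """Return (sorted organism names, square matrix of pairwise distances)."""
    names = sorted(enzymes_by_organism)
    matrix = [
        [jaccard_distance(enzymes_by_organism[x], enzymes_by_organism[y]) for y in names]
        for x in names
    ]
    return names, matrix


# Hypothetical enzyme repertoires for one metabolic class.
enzymes_by_organism = {
    "org1": {"1.1.1.1", "2.7.1.1", "4.1.2.13"},
    "org2": {"1.1.1.1", "2.7.1.1"},
    "org3": {"3.5.1.5"},
}
names, matrix = distance_matrix(enzymes_by_organism)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

A matrix of this kind is the usual input to a distance-based tree builder such as neighbor joining; one matrix per metabolic class would give one tree per class.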

5.
Transcriptome research in non-model organisms   Total citations: 7, self-citations: 0, citations by others: 7
刘红亮  郑丽明  刘青青  权富生  张涌 《遗传》2013,35(8):955-970

6.
The advent of high-throughput sequencing (HTS) has made genomic-level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species have publicly available reference genomes. Therefore, a common practice is to align reads to the genome of an organism related to the target species; however, this could affect read alignment and bias genotyping. In this study, I conducted an experiment using empirical RADseq datasets generated for two species of salmonids (Actinopterygii; Teleostei; Salmonidae) to address these questions. There are currently reference genomes for six salmonids of varying phylogenetic distance. I aligned the RADseq data to all six genomes and identified variants with several different genotypers; the variants were then fed into population genetic analyses. Increasing phylogenetic distance between target species and reference genome reduced both the proportion of reads that aligned successfully and the mapping quality. The reference genome also influenced the number of SNPs that were generated and the depth at those SNPs, although the effect varied by genotyper. Inferences of population structure were mixed: increasing reference genome divergence reduced estimates of differentiation, but similar patterns of population relationships were found across scenarios. These findings reveal how the choice of reference genome can influence the output of bioinformatic pipelines. They also emphasize the need to identify best practices and guidelines for the burgeoning field of biodiversity genomics.

7.
Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distance between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation represents a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms exhibits a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility of a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server.

8.
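A preliminary study on the molecular classification of some species of Trametes based on ITS sequences   Total citations: 2, self-citations: 0, citations by others: 2
Some closely related species of the genus Trametes are very similar in macroscopic and microscopic morphology, which makes it difficult to classify and place them accurately with traditional taxonomic methods. The ITS sequences (including 5.8S rDNA) of 34 taxa were determined, and a phylogenetic analysis was performed on the resulting ITS sequences of 43 taxa, from which a cluster-analysis dendrogram was constructed. The dendrogram shows that the Trametes group is clearly separated from the groups of other genera, and that Trametes versicolor clusters into an independent branch with high support. Specimens identified morphologically as T. hirsuta and T. pubescens cluster into the same highly supported independent branch, and the analysis indicates that these two species should be regarded as a single species.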

9.

Background

One of the crucial steps toward understanding the biological functions of a cellular system is to investigate protein–protein interaction (PPI) networks. As an increasing number of reliable PPIs become available, there is a growing need for discovering PPIs to reconstruct the PPI networks of organisms of interest. Some interolog-based methods and homologous PPI families have been proposed for predicting PPIs from the known PPIs of source organisms.

Results

Here, we propose a multiple-strategy scoring method to identify reliable PPIs for reconstructing the mouse PPI network from two well-known organisms: human and fly. We first identified the PPI candidates of target organisms based on homologous PPIs, sharing significant sequence similarities (joint E-value ≤ 1 × 10^−40), from source organisms using generalized interolog mapping. These PPI candidates were evaluated by our multiple-strategy scoring method, which combines sequence similarities, normalized ranks, and conservation scores across multiple organisms. On 106,825 PPI candidates in yeast derived from human and fly, our scoring method achieves high prediction accuracy and outperforms generalized interolog mapping. Experimental results show that our multiple-strategy score avoids the influence of protein family size and length, significantly improving PPI prediction accuracy and reflecting the biological functions. In addition, the top-ranked and conserved PPIs are often orthologous/essential interactions and share functional similarity. Based on these reliable predicted PPIs, we reconstructed a comprehensive mouse PPI network, which is a scale-free network and reflects the biological functions and high connectivity of 292 KEGG modules, including 216 pathways and 76 structural complexes.

Conclusions

Experimental results show that our scoring method can improve prediction accuracy based on the normalized rank and evolutionary conservation from multiple organisms. Our predicted PPIs share similar biological processes and cellular components, and the reconstructed genome-wide PPI network can reflect network topology and modularity. We believe that our method is useful for inferring reliable PPIs and reconstructing a comprehensive PPI network of an organism of interest.

10.

Background

Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches that delineate molecular operational taxonomic units have been applied for this purpose, with arbitrary choices of distance threshold values and clustering algorithms.

Methodology

Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews.

Conclusions

A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.
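The idea of clustering optimization, choosing the clustering setting that agrees best with reference data rather than fixing it arbitrarily, can be sketched as follows. This is only an illustration under stated assumptions: single-linkage clustering, the Rand index as the agreement score, the candidate thresholds and the toy distance matrix are all choices made for this example and are not taken from the abstract.

```python
from itertools import combinations


def single_linkage(n, dist, threshold):
    """Cluster items 0..n-1 by joining every pair with distance <= threshold."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def rand_index(labels_a, labels_b):
    """Share of item pairs on which two partitions agree (needs >= 2 items)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def best_threshold(dist, reference, thresholds):
    """Return (threshold, score) with the highest agreement; ties go to the smallest threshold."""
    n = len(reference)
    scored = [(rand_index(single_linkage(n, dist, t), reference), t) for t in thresholds]
    best_score = max(score for score, _ in scored)
    return min(t for score, t in scored if score == best_score), best_score


# Hypothetical pairwise distances for four sequences and their reference taxonomy.
dist = [
    [0.00, 0.01, 0.08, 0.09],
    [0.01, 0.00, 0.07, 0.08],
    [0.08, 0.07, 0.00, 0.02],
    [0.09, 0.08, 0.02, 0.00],
]
reference = ["sp1", "sp1", "sp2", "sp2"]
print(best_threshold(dist, reference, thresholds=[0.005, 0.03, 0.10]))
```

Extending the search to several distance functions and linkage rules, as the abstract describes, would add outer loops over those settings around the same scoring step.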

11.
A decade of progress in plant molecular phylogenetics   Total citations: 8, self-citations: 0, citations by others: 8
Over the past decade, botanists have produced several thousand phylogenetic analyses based on molecular data, with particular emphasis on sequencing rbcL, the plastid gene encoding the large subunit of Rubisco (ribulose bisphosphate carboxylase). Because phylogenetic trees retrieved from the three plant genomes (plastid, nuclear and mitochondrial) have been highly congruent, the ‘Angiosperm Phylogeny Group’ has used these DNA-based phylogenetic trees to reclassify all families of flowering plants. However, in addition to taxonomy, these major phylogenetic efforts have also helped to define strategies to reconstruct the ‘tree of life’, and have revealed the size of the ancestral plant genome, uncovered potential candidates for the ancestral flower, identified molecular living fossils, and linked the rate of neutral substitutions with species diversity. With an increased interest in DNA sequencing programmes in non-model organisms, the next decade will hopefully see these phylogenetic findings integrated into new genetic syntheses, from genomes to taxa.

12.
The ranks higher than the species in the prokaryotic taxonomy are primarily designated based on phylogenetic analysis of the 16S rRNA gene sequences, but no definite standards exist for the absolute relatedness (measured by 16S rRNA or other means) between the ranks. Accordingly, it remains unknown how comparable the ranks are between different organisms. To gain insights into this question, we studied the relationship between shared gene content and genetic relatedness for 175 fully sequenced strains, using as a robust measure of relatedness the average amino acid identity (AAI) of the shared genes. Our results reveal that adjacent ranks (e.g., phylum versus class) frequently show extensive overlap in terms of genetic and gene content relatedness of the grouped organisms, and hence, the current system is of limited predictive power in this respect. The overlap between nonadjacent ranks (e.g., phylum versus family) is generally limited and attributable to clear inconsistencies of the taxonomy. In addition to providing means for standardizing taxonomy, our AAI-based approach provides a means to evaluate the robustness of alternative genetic markers for phylogenetic purposes. For instance, the 23S rRNA gene was found to be as good a marker as the 16S rRNA gene, while several of the widely distributed protein-coding genes, such as the RNA polymerase and gyrase subunits, show a strong phylogenetic signal, albeit less strong than the rRNA genes (0.78 > R^2 > 0.69 for the protein-coding genes versus R^2 = 0.84 for the rRNA genes). The AAI approach outlined here could contribute significantly to a genome-based taxonomy for all microbial organisms.

13.
Although genetic methods of species identification, especially DNA barcoding, are strongly debated, tests of these methods have been restricted to a few empirical cases for pragmatic reasons. Here we use simulation to test the performance of methods based on sequence comparison (BLAST and genetic distance) and tree topology over a wide range of evolutionary scenarios. Sequences were simulated on a range of gene trees spanning almost three orders of magnitude in tree depth and in coalescent depth; that is, deep or shallow trees with deep or shallow coalescences. When the query's conspecific sequences were included in the reference alignment, the rate of positive identification was related to the degree to which different species were genetically differentiated. The BLAST, distance, and liberal tree-based methods returned higher rates of correct identification than did the strict tree-based requirement that the query was within, but not sister to, a single-species clade. Under this more conservative approach, ambiguous outcomes occurred in inverse proportion to the number of reference sequences per species. When the query's conspecific sequences were not in the reference alignment, only the strict tree-based approach was relatively immune to making false-positive identifications. Thresholds affected the rates at which false-positive identifications were made when the query's species was unrepresented in the reference alignment but did not otherwise influence outcomes. A conservative approach using the strict tree-based method should be used initially in large-scale identification systems, with effort made to maximize sequence sampling within species. Once the genetic variation within a taxonomic group is well characterized and the taxonomy resolved, then the choice of method used should be dictated by considerations of computational efficiency. The requirement for extensive genetic sampling may render these techniques inappropriate in some circumstances.

14.
Discordance between mitochondrial and nuclear phylogenies is being increasingly recognized in animals and may confound DNA-based taxonomy. This is especially relevant for taxa whose microscopic size often challenges any effort to distinguish between cryptic species without the assistance of molecular data. Regarding mitonuclear discordance, two strikingly contrasting scenarios have been recently demonstrated in the monogonont rotifers of the genus Brachionus. While strict mitonuclear concordance was observed in the marine B. plicatilis species complex, widespread hybridization-driven mitonuclear discordance was revealed in the freshwater B. calyciflorus species complex. Here, we investigated the frequency of occurrence and the potential drivers of mitonuclear discordance in three additional freshwater monogonont rotifer taxa, and assessed its potential impact on the reliability of DNA taxonomy results based on commonly used single markers. We studied the cryptic species complexes of Keratella cochlearis, Polyarthra dolichoptera and Synchaeta pectinata. Phylogenetic reconstructions were based on the mitochondrial barcoding marker cytochrome c oxidase subunit I gene and the nuclear internal transcribed spacer 1 locus, which currently represent the two most typical genetic markers used in rotifer DNA taxonomy. Species were delimited according to each marker separately using a combination of tree-based coalescent, distance-based and allele-sharing-based approaches. Mitonuclear discordance was observed in all species complexes with incomplete lineage sorting and unresolved phylogenetic reconstructions recognized as the likely drivers. Evidence from additional sources, such as morphology and ecology, is thus advisable for deciding between often contrasting mitochondrial and nuclear species scenarios in these organisms.

15.
D-H Kim  D Heber  D W Still 《Génome》2004,47(1):102-111
The taxonomy of Echinacea is based on morphological characters and has varied depending on the monographer. The genus consists of either nine species and four varieties or four species and eight varieties. We have used amplified fragment length polymorphisms (AFLP) to assess genetic diversity and phenetic relationships among nine species and three varieties of Echinacea (sensu McGregor). A total of 1086 fragments, of which approximately 90% were polymorphic among Echinacea taxa, were generated from six primer combinations. Nei and Li's genetic distance coefficient and the neighbor-joining algorithm were employed to construct a phenetic tree. Genetic distance results indicate that all Echinacea species are closely related, and the average pairwise distance between populations was approximately three times the intrapopulation distances. The topology of the neighbor-joining tree strongly supports two major clades, one containing Echinacea purpurea, Echinacea sanguinea, and Echinacea simulata and the other containing the remainder of the Echinacea taxa (sensu McGregor). The species composition within the clades differs between our AFLP data and the morphometric treatment offered by Binns and colleagues. We also discuss the suitability of AFLP in determining phylogenetic relationships.
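For band data such as AFLP, the Nei and Li coefficient is commonly computed as one minus twice the number of shared bands divided by the total number of bands in the two samples. The sketch below assumes that form, since the abstract does not spell out the formula, and the band vectors are hypothetical.

```python
def nei_li_distance(bands_x, bands_y):
    """Nei-Li distance for two 0/1 band vectors scored at the same loci.

    distance = 1 - 2 * n_xy / (n_x + n_y), where n_x and n_y are the band
    counts of each sample and n_xy is the number of bands present in both.
    """
    if len(bands_x) != len(bands_y):
        raise ValueError("band vectors must be scored at the same loci")
    n_x, n_y = sum(bands_x), sum(bands_y)
    if n_x + n_y == 0:
        raise ValueError("at least one band is required")
    shared = sum(1 for a, b in zip(bands_x, bands_y) if a and b)
    return 1.0 - 2.0 * shared / (n_x + n_y)


# Hypothetical AFLP fingerprints at six loci (1 = fragment present).
sample_x = [1, 1, 0, 1, 0, 1]
sample_y = [1, 0, 0, 1, 1, 1]
print(nei_li_distance(sample_x, sample_y))  # 3 shared bands of 4 + 4 -> 0.25
```

Shared absences do not count toward similarity in this coefficient, which is one reason it is often preferred for dominant markers.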

16.
We propose that what makes an organism is nearly complete cooperation, with strong control of intraorganism conflicts, and no affiliations above the level of the organism as unified as those at the organism level. Organisms can be made up of like units, which we call fraternal organisms, or different units, making them egalitarian organisms. Previous definitions have concentrated on the factors that favor high cooperation and low conflict, or on the adapted outcomes of organismality. Our approach brings these definitions together, conceptually unifying our understanding of organismality. Although the organism is a concerted cluster of adaptations, nearly all directed toward the same end, some conflict may remain. To understand such conflict, we extend Leigh's metaphor of the parliament of genes to include parties with different interests and committees that work on particular tasks.

17.
18.
DNA barcoding has become a promising means for the identification of organisms of all life-history stages. Currently, distance-based and tree-based methods are most widely used to define species boundaries and uncover cryptic species. However, there is no universal threshold of genetic distance values that can be used to distinguish taxonomic groups. Alternatively, DNA barcoding can deploy a “character-based” method, whereby species are identified through discrete nucleotide substitutions. Our research focuses on the delimitation of moth species using DNA-barcoding methods. We analyzed 393 lepidopteran specimens belonging to 80 morphologically recognized species with a standard cytochrome c oxidase subunit I (COI) sequencing approach, and deployed tree-based, distance-based, and diagnostic character-based methods to identify the taxa. The tree-based method divided the 393 specimens into 79 taxa (species), and the distance-based method divided them into 84 taxa (species). Although the diagnostic character-based method found diagnostic characters for only 39 of the 80 species, its accuracy improved substantially when the sample size was reduced. For example, in the Arctiidae subset, all 12 species had diagnostic characters. Compared with the traditional morphological method, molecular taxonomy performed well. All three methods enable the rapid delimitation of species, although they have different characteristics and different strengths. The tree-based and distance-based methods can be used for accurate species identification and biodiversity studies in large data sets, while the character-based method performs well in small data sets and can also be used as the foundation of species-specific biochips.
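The character-based idea mentioned in this abstract can be illustrated with a small sketch. It assumes a strict definition of a diagnostic character (a nucleotide fixed within one species and absent from every other species at that alignment position); the abstract does not state the exact criterion, and the species names and sequences are hypothetical.

```python
def diagnostic_sites(alignment_by_species):
    """Map each species to its list of (position, nucleotide) diagnostic characters.

    alignment_by_species maps a species name to a list of aligned sequences
    of equal length.
    """
    lengths = {len(seq) for seqs in alignment_by_species.values() for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    length = lengths.pop()

    result = {species: [] for species in alignment_by_species}
    for pos in range(length):
        # Nucleotides observed at this position, per species.
        states = {sp: {seq[pos] for seq in seqs} for sp, seqs in alignment_by_species.items()}
        for sp, own in states.items():
            if len(own) != 1:
                continue  # variable within the species, so not diagnostic
            others = set().union(*(st for other, st in states.items() if other != sp))
            if not own & others:
                result[sp].append((pos, next(iter(own))))
    return result


# Hypothetical aligned barcode fragments for three species.
alignment = {
    "sp_a": ["ACGTA", "ACGTA"],
    "sp_b": ["ACCTA", "ACCTG"],
    "sp_c": ["TCTTA", "TCTTA"],
}
print(diagnostic_sites(alignment))
```

Adding more species to the comparison can only remove diagnostic characters under this definition, which is consistent with the abstract's observation that the method works better on smaller data sets.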

19.
Plant taxonomy based on molecular phylogenetic and/or chemosystematic studies has become increasingly important in exploring and utilizing medicinal resources with the advent of the big data era. In this study, we propose a classification approach that combines DNA and chemical metabolites for the prediction of new medicinal resources. Specifically, we obtained 104 ITS2 barcodes and 847 chemical metabolites from 104 species in Ranunculaceae. Then, a phylogenetic tree based on the ITS2 barcodes and a clustering tree based on the structural similarity of metabolites were constructed separately. In addition, we tested the classification accuracy of the two methods with Baker's correlation coefficient; the phylogenetic tree based on the ITS2 barcodes was more accurate, with a score of 0.627, whereas the clustering tree based on chemical metabolites obtained a lower score of 0.301. Therefore, the natural products of plants might be described using the clades found by ITS2-based methods, and new metabolites of plants might be predicted from the close relationships within a given clade. Using this combined method, 53 plants with structurally similar metabolites were grouped into 9 plant groups, and currently unknown species-metabolite relations were predicted. Finally, 26.92% of the species in Ranunculaceae were found to contain the predicted metabolites after verification using two alternative databases, KNApSAcKCore and ChEBI. As a whole, the combined approach can successfully classify plants and predict specialized natural products based on plant taxa.

20.
Gomaa F  Todorov M  Heger TJ  Mitchell EA  Lara E 《Protist》2012,163(3):389-399
The systematics of lobose testate amoebae (Arcellinida), a diverse group of shelled free-living unicellular eukaryotes, is still mostly based on morphological criteria such as shell shape and composition. Few molecular phylogenetic studies have been performed on these organisms to date, and their phylogeny suffers from typical under-sampling artefacts, resulting in a still mostly unresolved tree. In order to clarify the phylogenetic relationships among arcellinid testate amoebae at the inter-generic and inter-specific level, and to evaluate the validity of the criteria used for taxonomy, we amplified and sequenced the SSU rRNA gene of nine taxa: Difflugia bacillariarum, D. hiraethogii, D. acuminata, D. lanceolata, D. achlora, Bullinularia gracilis, Netzelia oviformis, Physochila griseola and Cryptodifflugia oviformis. Our results, combined with existing data, demonstrate the following: 1) most arcellinids are divided into two major clades, 2) the genus Difflugia is not monophyletic, and the genera Netzelia and Arcella are closely related, and 3) Cryptodifflugia branches at the base of the Arcellinida clade. These results contradict the traditional taxonomy based on shell composition, and emphasize the importance of general shell shape in the taxonomy of arcellinid testate amoebae.

