20 similar articles found.
1.
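Reconstructing the tree of life has long been a dream of evolutionary biologists. The sequencing of complete genomes for a large number of species makes it possible to build phylogenetic trees at the whole-genome level and to study the evolutionary relationships among species. This paper uses 2 statistical methods and 3 distance measures to build phylogenetic trees based on protein structure at the whole-genome level. The complete genomes of 93 species were selected for analysis, covering three superkingdoms: eukaryotes, bacteria, and archaea. The results correctly divide these species into three major groups, and the clustering of species within each major branch is largely consistent with their morphological classification. Comparing the clustering results of these methods with the taxonomic classification shows that the combination of the abundance statistic with a distance based on the angle between two vectors builds better trees than the other combinations.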
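The combination this abstract singles out, abundance counts compared by the angle between two vectors, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the genome names, the domain counts, and the function name `angle_distance` are all hypothetical, and the abstract does not say how the abundance vectors were built or normalized.

```python
import math


def angle_distance(u, v):
    """Return the angle in radians between two non-zero abundance vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("abundance vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical abundance counts of four structural domains in three genomes.
genomes = {
    "genome_a": [10, 0, 3, 1],
    "genome_b": [20, 0, 6, 2],  # same proportions as genome_a, twice the size
    "genome_c": [0, 5, 0, 7],
}

# Pairwise distance matrix that a tree-building method could take as input.
names = sorted(genomes)
distance_matrix = {
    (p, q): angle_distance(genomes[p], genomes[q]) for p in names for q in names
}
```

Because the angle ignores vector length, `genome_a` and `genome_b` come out nearly identical despite the difference in total counts. That size-independence is one plausible reason an angle-based distance suits genomes of very different sizes.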
2.
Aquerium: A web application for comparative exploration of domain‐based protein occurrences on the taxonomically clustered genome tree
Gene duplication and loss are major driving forces in evolution. While many important genomic resources provide information on gene presence, there is a lack of tools giving equal importance to presence and absence information as well as web platforms enabling easy visual comparison of multiple domain‐based protein occurrences at once. Here, we present Aquerium, a platform for visualizing genomic presence and absence of biomolecules with a focus on protein domain architectures. The web server offers advanced domain organization querying against the database of pre‐computed domains for ~26,000 organisms and it can be utilized for identification of evolutionary events, such as fusion, disassociation, duplication, and shuffling of protein domains. The tool also allows alternative inputs of custom entries or BLASTP results for visualization. Aquerium will be a useful tool for biologists who perform comparative genomic and evolutionary analyses. The web server is freely accessible at http://aquerium.utk.edu. Proteins 2016; 85:72–77. © 2016 Wiley Periodicals, Inc.
3.
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new kazal, Fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405–420, 1997. © 1997 Wiley-Liss, Inc.
4.
Harald Brüssow. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 2009, 364(1527): 2263-2274
Darwin provided a great unifying theory for biology; its visual expression is the universal tree of life. The tree concept is challenged by the occurrence of horizontal gene transfer and, as summarized in this review, by the omission of viruses. Microbial ecologists have demonstrated that viruses are the most numerous biological entities on earth, outnumbering cells by a factor of 10. Viral genomics has revealed an unexpected size and distinctness of the viral DNA sequence space. Comparative genomics has shown elements of vertical evolution in some groups of viruses. Furthermore, structural biology has demonstrated links between viruses infecting the three domains of life, pointing to a very ancient origin of viruses. However, viruses presently do not find a place on the universal tree of life, which is thus only a tree of cellular life. In view of the polythetic nature of current life definitions, viruses cannot be dismissed as non-living material. On earth we therefore have at least two large DNA sequence spaces, one represented by capsid-encoding viruses and another by ribosome-encoding cells. Despite their probable distinct evolutionary origin, both spheres were and are connected by intensive two-way gene transfers.
5.
A domain interaction map based on phylogenetic profiling
Phylogenetic profiling is a well established method for predicting functional relations and physical interactions between proteins. We present a new method for finding such relations based on phylogenetic profiling of conserved domains rather than proteins, avoiding computationally expensive all versus all sequence comparisons among genomes. The resulting domain interaction map (DIMA) can be explored directly or mapped to a genome of interest. We demonstrate that the performance of DIMA is comparable to that of classical phylogenetic profiling and its predictions often yield information that cannot be detected by profiling of entire protein chains. We provide a list of novel domain associations predicted by our method.
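The idea of profiling domains rather than whole proteins can be illustrated with a small sketch. Each domain is represented by the set of genomes in which it occurs, and domains with matching occurrence patterns are reported as candidate partners. The data, the Jaccard score, and the 0.8 threshold are assumptions made for illustration; the abstract does not state which similarity measure or cutoff DIMA uses.

```python
from itertools import combinations

# Hypothetical presence data: domain -> set of genomes in which it occurs.
domain_genomes = {
    "domain_A": {"g1", "g2", "g4"},
    "domain_B": {"g1", "g2", "g4"},
    "domain_C": {"g3", "g5"},
    "domain_D": {"g1", "g2", "g3", "g4", "g5"},
}


def jaccard(a, b):
    """Jaccard similarity of two genome sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profile_pairs(profiles, threshold=0.8):
    """Return domain pairs whose phylogenetic profiles score at or above threshold."""
    pairs = []
    for d1, d2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[d1], profiles[d2])
        if score >= threshold:
            pairs.append((d1, d2, score))
    return pairs


predicted = profile_pairs(domain_genomes)
```

In this toy data only `domain_A` and `domain_B` share a profile. `domain_D` occurs in every genome, so it carries little signal and scores below the threshold against all the others.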
6.
Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.
7.
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest number of independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
8.
Research progress on the internalization mechanism of protein transduction domains
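Protein transduction domains (PTDs) can carry exogenous biological macromolecules into cells. They show great promise and are widely used in basic research in molecular and cell biology and in biotechnological applications, but their mechanism of action remains unclear. All known PTDs contain specific key amino acids and a strong positive charge distribution, and have distinctive secondary structures and spatial conformations; these structural features are decisive for the internalization mechanism. Macropinocytosis is currently considered the main mechanism of PTD entry into cells: after tight, rapid binding to cell-surface glycosaminoglycans and electrostatic interaction, the PTD is internalized by lipid raft protein-mediated macropinocytosis, after which the lipid bilayer of the macropinosome ruptures and releases the PTD-macromolecule complex into the cytoplasm and nucleus.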
9.
Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome
The identification of the whole set of protein interactions taking place in an organism is one of the main tasks in genomics, proteomics and systems biology. One of the computational techniques used by many investigators for studying and predicting protein interactions is the comparison of evolutionary histories (phylogenetic trees), under the hypothesis that interacting proteins would be subject to a similar evolutionary pressure resulting in a similar topology of the corresponding trees. Here, we present a new approach to predict protein interactions from phylogenetic trees, which incorporates information on the overall evolutionary histories of the species (i.e. the canonical "tree of life") in order to correct for the expected background similarity due to the underlying speciation events. We test the new approach on the largest set of annotated interacting proteins for Escherichia coli. This assessment of co-evolution in the context of the tree of life leads to a highly significant improvement (P(N) by sign test approximately 10E-6) in predicting interaction partners with respect to the previous technique, which does not incorporate information on the overall speciation tree. For half of the proteins we found a real interactor among the top 6.4% of scores, compared with the top 16.5% by the previous method. We applied the new method to the whole E. coli proteome and propose functions for some hypothetical proteins based on their predicted interactors. The new approach also allows us to detect non-canonical evolutionary events, in particular horizontal gene transfers. We also show that taking these non-canonical evolutionary events into account when assessing the similarity between evolutionary trees improves the performance of the method for predicting interactions.
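A rough sketch of the underlying idea, under loudly stated assumptions: each protein family is summarized by a matrix of pairwise evolutionary distances between the same ordered set of species, the species tree is summarized the same way, and the background is removed by simple subtraction before the two families are correlated. The subtraction step, the Pearson score, and all the numbers below are illustrative assumptions; the abstract does not describe the exact correction the authors apply.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def corrected_similarity(dist_a, dist_b, species_dist):
    """Correlate two families after subtracting species-tree distances (assumed correction)."""
    background = upper_triangle(species_dist)
    a = [d - s for d, s in zip(upper_triangle(dist_a), background)]
    b = [d - s for d, s in zip(upper_triangle(dist_b), background)]
    return pearson(a, b)


# Hypothetical distances between four species (same species order in every matrix).
species = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 2, 0, 1], [3, 3, 1, 0]]
family_a = [[0, 1.5, 2, 3.5], [1.5, 0, 2, 3.5], [2, 2, 0, 1], [3.5, 3.5, 1, 0]]
family_b = [[0, 2, 2, 4], [2, 0, 2, 4], [2, 2, 0, 1], [4, 4, 1, 0]]
family_c = [[0, 1, 2.5, 3], [1, 0, 2.5, 3], [2.5, 2.5, 0, 1.5], [3, 3, 1.5, 0]]

raw_ac = pearson(upper_triangle(family_a), upper_triangle(family_c))
corrected_ab = corrected_similarity(family_a, family_b, species)
corrected_ac = corrected_similarity(family_a, family_c, species)
```

In this toy data, families A and C look similar before correction only because both follow the species tree. Once that shared background is subtracted, A still tracks B but no longer tracks C.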
10.
Frankenfield KN, Powers ET, Kelly JW. Protein Science: A Publication of the Protein Society, 2005, 14(8): 2154-2166
Prion diseases appear to be caused by the aggregation of the cellular prion protein (PrP(C)) into an infectious form denoted PrP(Sc). The in vitro aggregation of the prion protein has been extensively investigated, yet many of these studies utilize truncated polypeptides. Because the C-terminal portion of PrP(Sc) is protease-resistant and retains infectivity, it is assumed that studies on this fragment are most relevant. The full-length protein can be distinguished from the truncated protein because it contains a largely structured, alpha-helical, C-terminal region in addition to an N terminus that is unstructured in the absence of metal ion binding. Herein, the in vitro aggregation of a truncated portion of the prion protein (PrP 90-231) and a full-length version (PrP 23-231) were compared. In each case, concentration-dependent aggregation was analyzed to discern whether it proceeds by a nucleation-dependent pathway. Both protein constructs appear to aggregate via a nucleated polymerization with a small nucleus size, yet the later steps differ. The full-length protein forms larger aggregates than the truncated protein, indicating that the N terminus may mediate higher-order aggregation processes. In addition, the N terminus has an influence on the assembly state of PrP before aggregation begins, causing the full-length protein to adopt several oligomeric forms in a neutral pH buffer. Our results emphasize the importance of studying the full-length protein in addition to the truncated forms for in vitro aggregation studies in order to make valid hypotheses about the mechanisms of prion aggregation and the distribution of aggregates in vivo.
11.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.
12.
13.
14.
The overall function of a multi‐domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment‐based methods commonly utilize domain‐level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of the other domains in a protein or of domain‐linker regions, and therefore cannot classify multi‐domain proteins. An alignment‐free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi‐domain protein sequences without requiring domain boundaries or the sequential order of domains to be defined. Through this method we aim to achieve a biologically meaningful classification scheme for multi‐domain protein sequences. In this article, CLAP‐based classification has been explored on 5 datasets of multi‐domain proteins and we present detailed analysis for proteins containing (1) Tyrosine phosphatase and (2) SH3 domain. At the domain level, the CLAP‐based classification scheme resulted in a clustering similar to that obtained from an alignment‐based method. CLAP‐based clusters obtained for full‐length datasets were shown to consist of proteins with similar functions and domain architectures. Our study demonstrates that multi‐domain proteins can be classified effectively by considering full‐length sequences, without requiring the identification of domains in the sequence.
15.
16.
Selenoproteins serve important functions in many organisms, usually providing essential oxidoreductase enzymatic activity, often for defense against toxic xenobiotic substances. Most eukaryotic genomes possess a small number of these proteins, usually not more than 20. Selenoproteins belong to various structural classes, often related to oxidoreductase function, yet a few of them are completely uncharacterised. Here, the structural and functional prediction for the uncharacterised selenoprotein O (SELO) is presented. Using bioinformatics tools, we predict that SELO protein adopts a three-dimensional fold similar to protein kinases. Furthermore, we argue that despite the lack of conservation of the "classic" catalytic aspartate residue of the archetypical His-Arg-Asp motif, SELO kinases might have retained catalytic phosphotransferase activity, albeit with an atypical active site. Lastly, the role of the selenocysteine residue is considered and the possibility of an oxidoreductase-regulated kinase function for SELO is discussed. The novel kinase prediction is discussed in the context of functional data on SELO orthologues in model organisms, FMP40 a.k.a. YPL222W (yeast), and ydiU (bacteria). Expression data from bacteria and yeast suggest a role in oxidative stress response. Analysis of genomic neighbourhoods of SELO homologues in the three domains of life points toward a role in regulation of ABC transport, in oxidative stress response, or in basic metabolism regulation. Among bacteria possessing SELO homologues, there is a significant over-representation of aquatic organisms, and also of aerobic ones. The selenocysteine residue in SELO proteins occurs in only a few members of this protein family, including proteins from Metazoa and a few small eukaryotes (Ostreococcus, stramenopiles). It is also demonstrated that enterobacterial mchC proteins involved in maturation of bactericidal antibiotics, microcins, form a distant subfamily of the SELO proteins. The new protein structural domain, with a putative kinase function assigned, expands the known kinome and deserves experimental determination of its biological role within the cell-signaling network.
17.
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for the experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the position-specific scoring matrix (PSSM) generated by PSI-BLAST. We have presented a method, PPRODO (Prediction of PROtein DOmain boundaries) that predicts domain boundaries of proteins from sequence information by a neural network. The network is trained and tested using the values obtained from the PSSM generated by PSI-BLAST. A 10-fold cross-validation technique is performed to obtain the parameters of neural networks using a nonredundant set of 522 proteins containing 2 contiguous domains. PPRODO provides good and consistent results for the prediction of domain boundaries, with accuracy of about 66% using the ±20 residue criterion. The PPRODO source code, as well as all data sets used in this work, are available from http://gene.kias.re.kr/~jlee/pprodo/.
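The accuracy figure quoted here depends on a tolerance window around the true boundary. A minimal sketch of such a scoring rule follows, assuming one boundary per protein (consistent with a data set of two-domain proteins) and counting a prediction as correct when it falls within 20 residues of the annotated position. The function name `boundary_accuracy` and the boundary positions are hypothetical, and the precise definition used by PPRODO may differ.

```python
def boundary_accuracy(predicted, true, window=20):
    """Fraction of proteins whose predicted boundary is within `window` residues of the true one."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for p, t in zip(predicted, true) if abs(p - t) <= window)
    return hits / len(true)


# Hypothetical boundary positions (residue indices) for three two-domain proteins.
predicted_boundaries = [118, 240, 95]
true_boundaries = [130, 205, 100]

accuracy = boundary_accuracy(predicted_boundaries, true_boundaries)
```

The first and third predictions fall inside the window (off by 12 and 5 residues) and the second misses by 35, so two of the three proteins count as correct.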
18.
Alex Sabogal, Donald C Rio. Protein Science: A Publication of the Protein Society, 2010, 19(11): 2210-2218
Guanosine triphosphate (GTP) binding and hydrolysis events often act as molecular switches in proteins, modulating conformational changes between active and inactive states in many signaling molecules and transport systems. The P element transposase of Drosophila melanogaster requires GTP binding to proceed along its reaction pathway, following initial site‐specific DNA binding. GTP binding is unique to P elements and may represent a novel form of transpositional regulation, allowing the bound transposase to find a second site, looping the transposon DNA for strand cleavage and excision. The GTP‐binding activity has been previously mapped to the central portion of the transposase protein; however, the P element transposase contains little sequence identity with known GTP‐binding folds. To identify soluble, active transposase domains, a GFP solubility screen was used to test the solubility of random P element gene fragments in E. coli. The screen produced a single clone spanning known GTP‐binding residues in the central portion of the transposase coding region. This clone, amino acids 275–409 in the P element transposase, was soluble, highly expressed in E. coli and active for GTP binding, and is therefore a candidate for future biochemical and structural studies. In addition, the chimeric screen revealed a minimal N‐terminal THAP DNA‐binding domain attached to an extended leucine zipper coiled‐coil dimerization domain in the P element transposase, precisely delineating the DNA‐binding and dimerization activities on the primary sequence. This study highlights the use of a GFP‐based solubility screen on a large multidomain protein to identify highly expressed, soluble truncated domain subregions.
19.
Topology predictions for integral membrane proteins can be substantially improved if parts of the protein can be constrained to a given in/out location relative to the membrane using experimental data or other information. Here, we have identified a set of 367 domains in the SMART database that, when found in soluble proteins, have compartment-specific localization of a kind relevant for membrane protein topology prediction. Using these domains as prediction constraints, we are able to provide high-quality topology models for 11% of the membrane proteins extracted from 38 eukaryotic genomes. Two-thirds of these proteins are single spanning, a group of proteins for which current topology prediction methods perform particularly poorly.
20.
Schäfer K, Magnusson U, Scheffel F, Schiefner A, Sandgren MO, Diederichs K, Welte W, Hülsmann A, Schneider E, Mowbray SL. Journal of Molecular Biology, 2004, 335(1): 261-274
Maltose-binding proteins act as primary receptors in bacterial transport and chemotaxis systems. We report here crystal structures of the thermoacidostable maltose-binding protein from Alicyclobacillus acidocaldarius, and explore its modes of binding to maltose and maltotriose. Further, comparison with the structures of related proteins from Escherichia coli (a mesophile), and two hyperthermophiles (Pyrococcus furiosus and Thermococcus litoralis) allows an investigation of the basis of thermo- and acidostability in this family of proteins. The thermoacidophilic protein has fewer charged residues than the other three structures, which is compensated by an increase in the number of polar residues. Although the content of acidic and basic residues is approximately equal, more basic residues are exposed on its surface whereas most acidic residues are buried in the interior. As a consequence, this protein has a highly positive surface charge. Fewer salt bridges are buried than in the other MBP structures, but the number exposed on its surface does not appear to be unusual. These features appear to be correlated with the acidostability of the A. acidocaldarius protein rather than its thermostability. An analysis of cavities within the proteins shows that the extremophile proteins are more closely packed than the mesophilic one. Proline content is slightly higher in the hyperthermophiles and thermoacidophiles than in mesophiles, and this amino acid is more common at the second position of beta-turns, properties that are also probably related to thermostability. Secondary structural content does not vary greatly in the different structures, and so is not a contributing factor.