Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
    
The ever increasing speed of DNA sequencing widens the discrepancy between the number of known gene products, and the knowledge of their function and structure. Proper annotation of protein sequences is therefore crucial if the missing information is to be deduced from sequence‐based similarity comparisons. These comparisons become exceedingly difficult as the pairwise identities drop to very low values. To improve the accuracy of domain identification, we exploit the fact that the three‐dimensional structures of domains are much more conserved than their sequences. Based on structure‐anchored multiple sequence alignments of low identity homologues we constructed 850 structure‐anchored hidden Markov models (saHMMs), each representing one domain family. Since the saHMMs are highly family specific, they can be used to assign a domain to its correct family and clearly distinguish it from domains belonging to other families, even within the same superfamily. This task is not trivial and becomes particularly difficult if the unknown domain is distantly related to the rest of the domain sequences within the family. In a search with full length protein sequences, harbouring at least one domain as defined by the structural classification of proteins database (SCOP), version 1.71, versus the saHMM database based on SCOP version 1.69, we achieve an accuracy of 99.0%. All of the few hits outside the family fall within the correct superfamily. Compared to Pfam_ls HMMs, the saHMMs obtain about 11% higher coverage. A comparison with BLAST and PSI‐BLAST demonstrates that the saHMMs have consistently fewer errors per query at a given coverage. Within our recommended E‐value range, the same is true for a comparison with SUPERFAMILY. Furthermore, we are able to annotate 232 proteins with 530 nonoverlapping domains belonging to 102 different domain families among human proteins labelled “unknown” in the NCBI protein database. 
Our results demonstrate that the saHMM database represents a versatile and reliable tool for identification of domains in protein sequences. With the aid of saHMMs, homology on the family level can be assigned, even for distantly related sequences. Due to the construction of the saHMMs, the hits they provide are always associated with high quality crystal structures. The saHMM database can be accessed via the FISH server at http://babel.ucmp.umu.se/fish/ . Proteins 2009. © 2008 Wiley‐Liss, Inc.

2.
  Total citations: 2 (self-citations: 0; citations by others: 2)
One hundred forty-five full-length aldehyde dehydrogenase-related sequences were aligned to determine relationships within the aldehyde dehydrogenase (ALDH) extended family. The alignment reveals only four invariant residues: two glycines, a phenylalanine involved in NAD binding, and a glutamic acid that coordinates the nicotinamide ribose in certain E-NAD binary complex crystal structures, but which may also serve as a general base for the catalytic reaction. The cysteine that provides the catalytic thiol and its closest neighbor in space, an asparagine residue, are conserved in all ALDHs with demonstrated dehydrogenase activity. Sixteen residues are conserved in at least 95% of the sequences; 12 of these cluster into seven sequence motifs conserved in almost all ALDHs. These motifs cluster around the active site of the enzyme. Phylogenetic analysis of these ALDHs indicates at least 13 ALDH families, most of which have previously been identified but not grouped separately by alignment. ALDHs cluster into two main trunks of the phylogenetic tree. The largest, the "Class 3" trunk, contains mostly substrate-specific ALDH families, as well as the class 3 ALDH family itself. The other trunk, the "Class 1/2" trunk, contains mostly variable-substrate ALDH families, including the class 1 and 2 ALDH families. Divergence of the substrate-specific ALDHs occurred earlier than the division between ALDHs with broad substrate specificities. A site on the World Wide Web has also been devoted to this alignment project.

3.
Referee: Dr. Philip Becraft, Zoology and Genetics/Agronomy Depts., 2116 Molecular Building, Iowa State University, Ames, IA 50011.
Forty-two lectin receptor kinase (lecRK)-related sequences and nine related soluble legume lectin sequences were identified in the Arabidopsis thaliana genome. The genes are scattered as single or clustered copies at different loci throughout the five chromosomes, and four predicted lecRK probably correspond to pseudogenes. Both structural alignments and molecular modeling revealed striking similarities between the lectin-like domain of lecRK and related A. thaliana soluble lectins and legume lectins. The hydrophobic cavity is highly conserved, whereas most of the residues forming the monosaccharide-binding site and the bivalent cation-binding site of legume lectins are poorly conserved. LecRK should therefore be unable to bind the simple sugars usually recognized by genuine legume lectins. Molecular modeling of the kinase domain suggests that, except for two apparently inactive receptors, all other lecRK contain a putative functional Ser/Thr kinase catalytic domain. Both the juxtamembrane and C-terminal domains, which are considered important regions for regulating the kinase activity, exhibit a few specific stretches of amino acid residues. Some phylogenetic relationships are inferred from the phylogenetic trees built from the different lecRK domain sequences. LecRK cluster in three distinct classes (A, B, C), one of them (B) being more closely related to soluble lectins of A. thaliana and legume lectins.

4.
    
Members of a new molecular family of bacterial nonspecific acid phosphatases (NSAPs), indicated as class C, were found to share significant sequence similarities to bacterial class B NSAPs and to some plant acid phosphatases, representing the first example of a family of bacterial NSAPs that has a relatively close eukaryotic counterpart. Despite the lack of an overall similarity, conserved sequence motifs were also identified among the above enzyme families (class B and class C bacterial NSAPs, and related plant phosphatases) and several other families of phosphohydrolases, including bacterial phosphoglycolate phosphatases, histidinol-phosphatase domains of the bacterial bifunctional enzymes imidazole-glycerolphosphate dehydratases, and bacterial, eukaryotic, and archaeal phosphoserine phosphatases and trehalose-6-phosphatases. These conserved motifs are clustered within two domains, separated by a variable spacer region, according to the pattern [FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-{YFWHKR}-X-{YFWHNQ}-X(102,191)-{KRHNQ}-G-D-{FYWHILVMC}-{QNH}-{FWYGP}-D-{PSNQYW}. The dephosphorylating activity common to all these proteins supports the definition of this phosphatase motif and the inclusion of these enzymes into a superfamily of phosphohydrolases that we propose to indicate as "DDDD" after the presence of the four invariant aspartate residues. Database searches retrieved various hypothetical proteins of unknown function containing this or similar motifs, for which a phosphohydrolase activity could be hypothesized.
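
A motif written in this style can be searched for with an ordinary regular expression. The following is a minimal sketch, assuming PROSITE-like conventions (square brackets list allowed residues, curly braces list excluded residues, X matches any residue, and a parenthesised range gives a variable spacer length). The function name prosite_to_regex and the short test pattern are hypothetical and are not taken from the article.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Assumed conventions: [ABC] = allowed residues, {ABC} = excluded residues,
    X = any residue, X(m,n) or X(m) = repeat count, '-' separates positions.
    """
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unrecognised element: {element!r}")
        core, low, high = match.groups()
        if core in ("X", "x"):
            regex = "."                          # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                         # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"      # excluded residues
        elif len(core) == 1 and core.isalpha():
            regex = core                         # single invariant residue
        else:
            raise ValueError(f"unrecognised element: {element!r}")
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Hypothetical toy pattern, used only to illustrate the conversion.
toy_regex = prosite_to_regex("[AC]-D-{P}-X(2,3)-G")
print(toy_regex)
print(bool(re.search(toy_regex, "MMADAKKGLL")))
```

With a long spacer such as X(102,191), re.search scans every start position, so for database-scale searches a dedicated motif scanner would probably be a better fit than this sketch.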

5.
Secreted and cell-surface-localized members of the immunoglobulin superfamily (IgSF) play central roles in regulating adaptive and innate immune responses and are prime targets for the development of protein-based therapeutics. An essential activity of the ectodomains of these proteins is the specific recognition of cognate ligands, which are often other members of the IgSF. In this work, we provide functional insight for this important class of proteins through the development of a clustering algorithm that groups together extracellular domains of the IgSF with similar binding preferences. Information from hidden Markov model-based sequence profiles and domain architecture is calibrated against manually curated protein interaction data to define functional families of IgSF proteins. The method is able to assign 82% of the 477 extracellular IgSF proteins to a functional family, while the rest are either single proteins with unique function or proteins that could not be assigned with the current technology. The functional clustering of IgSF proteins generates hypotheses regarding the identification of new cognate receptor–ligand pairs and reduces the pool of possible interacting partners to a manageable level for experimental validation.

6.
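Knowledge-based protein structure prediction   Total citations: 5 (self-citations: 0; citations by others: 5)
This paper reviews recent knowledge-based methods for predicting the three-dimensional structure of proteins and their progress. Knowledge-based structure prediction currently falls into two main classes. The first is homology modelling, a relatively mature technique whose models are fairly reliable but which applies only to target sequences with relatively high homology. The second is inverse protein folding, which mainly comprises the 3D profile method and potential-function-based methods; it yields the spatial trace of the target protein and is mainly useful for predicting the structure of proteins with low sequence homology.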

7.
  Total citations: 7 (self-citations: 3; citations by others: 7)

8.
The ProDom database is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases. An associated database, ProDom-CG, has been derived as a restriction of ProDom to completely sequenced genomes. The ProDom construction method is based on iterative PSI-BLAST searches and multiple alignments are generated for each domain family. The ProDom web server provides the user with a set of tools to visualise multiple alignments, phylogenetic trees and domain architectures of proteins, as well as a BLAST-based server to analyse new sequences for homologous domains. The comprehensive nature of ProDom makes it particularly useful to help sustain the growth of InterPro.

9.
An approach to discover sequence patterns characteristic of ligand classes is described and applied to aminergic G protein-coupled receptors (GPCRs). Putative ligand-binding residue positions were inferred from three lines of evidence: conservation in the subfamily that is absent or underrepresented in the superfamily, any available mutation data, and the physicochemical properties of the ligand. For aminergic GPCRs, the motif is composed of a conserved aspartic acid in the third transmembrane (TM) domain (rhodopsin position 117) and a conserved tryptophan in the seventh TM domain (rhodopsin position 293); the roles of each are readily justified by molecular modeling of ligand-receptor interactions. This minimally defined motif is an appropriate computational tool for identifying additional, potentially novel aminergic GPCRs from a set of experimentally uncharacterized "orphan" GPCRs, complementing existing sequence matching, clustering, and machine-learning techniques. Motif sensitivity stems from the stepwise addition of residues characteristic of an entire class of ligand (and not tailored for any particular biogenic amine). This sensitivity is balanced by careful consideration of residues (evidence drawn from mutation data, correlation of ligand properties to residue properties, and location with respect to the extracellular face), thereby maintaining specificity for the aminergic class. A number of orphan GPCRs assigned to the aminergic class by this motif were later discovered to form a novel subfamily of trace amine GPCRs, and the motif also correctly classified the histamine H4 receptor.

10.
    
In this paper, we present an updated classification of the ubiquitous MIP (Major Intrinsic Protein) family proteins, including 153 fully or partially sequenced members available in public databases. Presently, about 30 of these proteins have been functionally characterized, exhibiting essentially two distinct types of channel properties: (1) specific water transport by the aquaporins, and (2) transport of small neutral solutes, such as glycerol, by the glycerol facilitators. Sequence alignments were used to predict amino acids and motifs that discriminate channel specificity. The protein sequences were also analyzed using statistical tools (comparisons of means and correspondence analysis). Five key positions were clearly identified where the residues are specific for each functional subgroup and exhibit highly dissimilar physico-chemical properties. Moreover, we have found that the putative channels for small neutral solutes clearly differ from the aquaporins in the amino acid content and the length of predicted loop regions, suggesting a substrate filter function for these loops. From these results, we propose a signature pattern for water transport.

11.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins of similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
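
The cRMS (coordinate root-mean-square) distance mentioned above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the two coordinate lists hold equivalent atoms in the same order, and the structures are already optimally superimposed (the superposition step itself is not shown). The function name crms and the coordinates are hypothetical.

```python
import math


def crms(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed and that atom i in
    coords_a is equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: only the second atom is displaced.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(crms(reference, displaced))
```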

12.
    
Shatsky M, Nussinov R, Wolfson HJ. Proteins 2006, 62(1):209-217
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computing amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.

13.
  Total citations: 4 (self-citations: 0; citations by others: 4)
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
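
The consensus-embedding idea can be sketched in a few lines. The example below is an illustration only, not the authors' procedure: it assumes an ungapped alignment block for one conserved region, takes the most common residue in each column as the consensus, and substitutes it into the representative sequence at a known offset. The names column_consensus and embed_consensus and all sequences are hypothetical.

```python
from collections import Counter


def column_consensus(block):
    """Most common residue in each column of an ungapped alignment block."""
    if not block or len({len(row) for row in block}) != 1:
        raise ValueError("block must contain equally long sequences")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*block)
    )


def embed_consensus(representative, start, block):
    """Replace representative[start:start+width] with the block consensus."""
    consensus = column_consensus(block)
    end = start + len(consensus)
    if start < 0 or end > len(representative):
        raise ValueError("block does not fit inside the representative")
    return representative[:start] + consensus + representative[end:]


# Hypothetical family block covering positions 3-6 of the representative.
block = ["GDSL", "GDSV", "GESL"]
print(embed_consensus("MKTAESVAAQ", 3, block))
```

The resulting sequence can then be used as an ordinary single-sequence query; regions outside the block keep the representative's own residues, as the abstract describes.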

14.
Consensus design is an appealing strategy for the stabilization of proteins. It exploits amino acid conservation in sets of homologous proteins to identify likely beneficial mutations. Nevertheless, its success depends on the phylogenetic diversity of the sequence set available. Here, we show that randomization of a single protein represents a reliable alternative source of sequence diversity that is essentially free of phylogenetic bias. A small number of functional protein sequences selected from binary-patterned libraries suffice as input for the consensus design of active enzymes that are easier to produce and substantially more stable than individual members of the starting data set. Although catalytic activity correlates less consistently with sequence conservation in these extensively randomized proteins, less extreme mutagenesis strategies might be adopted in practice to augment stability while maintaining function.

15.
The evolution of the prototypical (βα)8-barrel protein imidazole glycerol phosphate synthase (HisF) was studied by complementary computational and experimental approaches. The 4-fold symmetry of HisF suggested that its constituting (βα)2 quarter-barrels have a common evolutionary origin. This conclusion was supported by the computational reconstruction of the HisF sequence of the last common ancestor, which showed that its quarter-barrels were more similar to each other than are those of extant HisF proteins. A comprehensive sequence analysis identified HisF-N1 [corresponding to (βα)1-2] as the slowest evolving quarter-barrel. This finding indicated that it is the closest relative of the common (βα)2 predecessor, which must have been a stable and presumably tetrameric protein. In accordance with this prediction, a recombinantly produced HisF-N1 protein was properly folded and formed a tetramer stabilised by disulfide bonds. The introduction of a disulfide bond in HisF-C1 [corresponding to (βα)5-6] also resulted in the formation of a stable tetramer. The fusion of two identical HisF-N1 quarter-barrels yielded the stable dimeric half-barrel HisF-N1N1. Our findings suggest a two-step evolutionary pathway in which a HisF-N1-like predecessor was duplicated and fused twice to yield HisF. Most likely, the (βα)2 quarter-barrel and (βα)4 half-barrel intermediates on this pathway were stabilised by disulfide bonds that became dispensable upon consolidation of the (βα)8-barrel.

16.
17.
    
The ability to generate and design antibodies recognizing specific targets has revolutionized the pharmaceutical industry and medical imaging. Engineering antibody therapeutics in some cases requires modifying their constant domains to enable new and altered interactions. Engineering novel specificities into antibody constant domains has proved challenging due to the complexity of inter‐domain interactions. Covarying networks of residues that tend to cluster on the protein surface and near binding sites have been identified in some proteins. However, the underlying role these networks play in the protein resulting in their conservation remains unclear in most cases. Resolving their role is crucial, because residues in these networks are not viable design targets if their role is to maintain the fold of the protein. Conversely, these networks of residues are ideal candidates for manipulating specificity if they are primarily involved in binding, such as the myriad interdomain interactions maintained within antibodies. Here, we identify networks of evolutionarily‐related residues in C‐class antibody domains by evaluating covariation, a measure of propensity with which residue pairs vary dependently during evolution. We computationally test whether mutation of residues in these networks affects stability of the folded antibody domain, determining their viability as design candidates. We find that members of covarying networks cluster at domain‐domain interfaces, and that mutations to these residues are diverse and frequent during evolution, precluding their importance to domain stability. These results indicate that networks of covarying residues exist in antibody domains for functional reasons unrelated to thermodynamic stability, making them ideal targets for antibody design. Proteins 2013. © 2012 Wiley Periodicals, Inc.

18.

Background

The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees.
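
Global alignment of two character strings is classically done with Needleman-Wunsch dynamic programming. The sketch below shows the unmodified textbook recurrence and returns only the optimal score; the abstract's modified algorithm is not specified here, so the scoring values (match +1, mismatch -1, gap -1), the function name global_alignment_score, and the example strings are all assumptions made for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (Needleman-Wunsch).

    Uses a linear gap penalty and keeps only two rows of the DP table.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    previous = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * gap]  # column 0: prefix of a against the empty string
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


# Hypothetical branch-type strings; the letters are placeholders only.
print(global_alignment_score("AACT", "ACT"))
```

A pairwise similarity matrix built from such scores could then feed the cluster analysis described in the Results, although the normalisation used for that step is not stated in the abstract.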

Results

The space generated from pairwise similarities is capable of classifying neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that significantly correspond with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex associate with pyramidal axons but apart from the (more topologically symmetric) axons of perisomatic-targeting interneurons.

Conclusions

Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporation of additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information respectively revealing global and local features.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0605-1) contains supplementary material, which is available to authorized users.

19.
    
Valdar WS. Proteins 2002, 48(2):227-241
The importance of a residue for maintaining the structure and function of a protein can usually be inferred from how conserved it appears in a multiple sequence alignment of that protein and its homologues. A reliable metric for quantifying residue conservation is desirable. Over the last two decades many such scores have been proposed, but none has emerged as a generally accepted standard. This work surveys the range of scores that biologists, biochemists, and, more recently, bioinformatics workers have developed, and reviews the intrinsic problems associated with developing and evaluating such a score. A general formula is proposed that may be used to compare the properties of different particular conservation scores or as a measure of conservation in its own right.
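
One of the simplest members of this family of scores is a normalised Shannon entropy of an alignment column. The sketch below illustrates that single score only; it is not the general formula proposed in the article. It assumes a gap-free column, a 20-letter alphabet, and normalisation by the logarithm of the smaller of the column size and the alphabet size. The function name entropy_conservation is hypothetical.

```python
import math
from collections import Counter


def entropy_conservation(column, alphabet_size=20):
    """Conservation of one alignment column: 1 = invariant, 0 = maximally varied.

    Computes 1 - H / H_max, where H is the Shannon entropy of the residue
    frequencies and H_max = log(min(column size, alphabet size)).
    """
    total = len(column)
    if total == 0:
        raise ValueError("column must not be empty")
    counts = Counter(column)
    entropy = -sum(
        (n / total) * math.log(n / total) for n in counts.values()
    )
    max_entropy = math.log(min(total, alphabet_size))
    if max_entropy == 0:          # a single sequence is trivially conserved
        return 1.0
    return 1.0 - entropy / max_entropy


# Hypothetical columns: invariant, half-and-half, and fully varied.
for column in ("AAAA", "AACC", "ACDE"):
    print(column, entropy_conservation(column))
```

This score ignores residue similarity and sequence weighting, two of the problems such surveys typically discuss, so it serves as a baseline rather than a recommended measure.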

20.
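Clustal W: software for protein and nucleic acid sequence analysis   Total citations: 2 (self-citations: 1; citations by others: 2)
Sequence analysis of proteins and nucleic acids plays an important role in modern biology and bioinformatics, and new algorithms and software appear constantly. This article introduces ClustalW, a completely free multiple sequence alignment program that runs on a PC. It not only performs multiple alignment of protein and nucleic acid sequences and analyses the similarity relationships among sequences, but can also draw phylogenetic trees. Owing to its flexible input and output formats, convenient parameter setting and selection, detailed online help, and good portability, ClustalW has been widely used in protein and nucleic acid sequence analysis.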
