Similar articles
20 similar articles found
1.
Chaudhuri I, Söding J, Lupas AN. Proteins, 2008, 71(2): 795-803
beta-Propellers are toroidal folds in which repeated four-stranded beta-meanders are arranged in a circular and slightly tilted fashion, like the blades of a propeller. They are found in all domains of life, with a strong preponderance among eukaryotes. Propellers show considerable sequence diversity and are classified into six separate structural groups by the SCOP and CATH databases. Despite this diversity, they often show similarities across groups, not only in structure but also in sequence, raising the possibility of a common origin. In agreement with this hypothesis, most propellers group together in a cluster map of all-beta folds generated by sequence similarity, because of numerous pairwise matches, many of which are individually nonsignificant. In total, 45 of 60 propellers in the SCOP25 database, covering four SCOP folds, are clustered in this group, and analysis with sensitive sequence comparison methods shows that they are similar at a level indicative of homology. Two mechanisms appear to contribute to the evolution of beta-propellers: amplification from single blades and subsequent functional differentiation. The observation of propellers with nearly identical blades in genomic sequences shows that these mechanisms are still operating today.

2.
Bernsel A, Viklund H, Elofsson A. Proteins, 2008, 71(3): 1387-1399
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/.
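SHRIMP's actual scoring model is not described in the abstract. To illustrate what a "hydrophobicity pattern" along a sequence can look like, here is a minimal sketch of a different and much simpler technique: a classic Kyte-Doolittle sliding-window hydropathy profile that flags windows whose mean exceeds a threshold. The window of 19 residues and the threshold of 1.6 are the conventional values for spotting candidate transmembrane segments and are used here as assumptions; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window; entry i covers residues i..i+window-1."""
    values = [KD[aa] for aa in sequence.upper()]  # KeyError for non-standard residues
    if len(values) < window:
        return []
    running = sum(values[:window])
    scores = [running / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new residue, drop the oldest one.
        running += values[i] - values[i - window]
        scores.append(running / window)
    return scores

def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices of windows hydrophobic enough to suggest a membrane segment."""
    return [i for i, s in enumerate(hydropathy_profile(sequence, window)) if s > threshold]

print(candidate_tm_windows("K" * 10 + "L" * 19 + "K" * 10))
```

A profile like this captures only hydrophobicity; the method in the abstract combines such membrane-specific information with sequence profiles inside an HMM framework.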

3.
Cadherins are cell surface adhesion proteins important for tissue development and integrity. Type I and type II, or classical, cadherins form adhesive dimers via an interface formed through the exchange, or “swapping”, of the N-terminal β-strands from their membrane-distal EC1 domains. Here, we ask which sequence and structural features in EC1 domains are responsible for β-strand swapping and whether members of other cadherin families form similar strand-swapped binding interfaces. We created a comprehensive database of multiple alignments of each type of cadherin domain. We used the known three-dimensional structures of classical cadherins to identify conserved positions in multiple sequence alignments that appear to be crucial determinants of the cadherin domain structure. We identified features that are unique to EC1 domains. On the basis of our analysis, we conclude that all cadherin domains have very similar overall folds but, with the exception of classical and desmosomal cadherin EC1 domains, most of them do not appear to bind through a strand-swapping mechanism. Thus, non-classical cadherins that function in adhesion are likely to use different protein-protein interaction interfaces. Our results have implications for the evolution of molecular mechanisms of cadherin-mediated adhesion in vertebrates.

4.
Protein functional annotation relies on the identification of accurate relationships, with sequence divergence being a key factor. This is especially evident when distant protein relationships can be demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through the directed design of protein-like “linker” sequences. To do this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and the other additionally containing the designed sequences. Searches of the augmented database gave up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the designed sequences between other families within the same fold established a sequence continuum that demonstrated 373 difficult relationships. Finally, as a practical and realistic extension, we demonstrate that such protein-like sequences can be “plugged into” routine, generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches of both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold.
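Roulette-wheel selection picks each item with probability proportional to its weight, like a wheel whose slices are sized by weight. Below is a minimal sketch of applying it position by position to a profile to generate a protein-like sequence. The profile format (one dict of residue weights per position) and the names `roulette_pick` and `design_sequence` are hypothetical; how the published method derives weights from HMM-HMM alignments is not reproduced here.

```python
import random

def roulette_pick(weights, rng):
    """Pick one key of `weights` with probability weight / total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    spin = rng.random() * total  # a point on the wheel in [0, total)
    cumulative = 0.0
    for key, weight in weights.items():
        cumulative += weight
        if spin < cumulative:  # zero-weight keys can never be chosen
            return key
    # Floating-point round-off can leave `spin` just past the last boundary.
    return [k for k, w in weights.items() if w > 0][-1]

def design_sequence(profile, rng):
    """Sample one residue per position of a (hypothetical) weight profile."""
    return "".join(roulette_pick(column, rng) for column in profile)

# Hypothetical 4-position profile: residue weights per position.
profile = [
    {"M": 1.0},
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.3, "V": 0.2},
    {"K": 0.7, "R": 0.3},
]
print(design_sequence(profile, random.Random(42)))
```

Passing an explicit `random.Random` instance keeps a design run reproducible from its seed, which is useful when millions of sequences are generated.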

5.
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only a few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.

7.
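Applications of hidden Markov processes in bioinformatics   Total citations: 3 | Self-citations: 0 | Citations by others: 3
The hidden Markov process (hidden Markov model, HMM) is a statistical method proposed in the 1970s that was previously used mainly for speech recognition. In 1989, Churchill introduced it into computational biology. HMMs are now among the most widely used statistical methods in bioinformatics, applied mainly to linear sequence analysis, model analysis, and gene finding. This paper gives a concise description of HMMs and a brief overview of their applications in these areas.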
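The core computation behind most of these applications is scoring a sequence under an HMM. As a minimal illustration (not taken from the paper), the forward algorithm below computes the probability that a two-state HMM emits a given DNA string, summing over all hidden state paths. The state names and all probabilities are invented for the example.

```python
def forward(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | model), summed over every hidden state path."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    # alpha[s]: probability of emitting seq[:t+1] and being in state s at step t
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state model: a GC-rich "island" state and a "background" state.
states = ("island", "background")
start_p = {"island": 0.5, "background": 0.5}
trans_p = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.1, "background": 0.9},
}
emit_p = {
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}
print(forward("CGCGA", states, start_p, trans_p, emit_p))
```

This plain-probability version underflows on long sequences; practical implementations work in log space or rescale `alpha` at each step.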

9.
Hou Y, Hsu W, Lee ML, Bystroff C. Proteins, 2004, 57(3): 518-530
Remote homology detection refers to the detection of structural homology in proteins when there is little or no sequence similarity. In this article, we present a remote homolog detection method called SVM-HMMSTR that overcomes the reliance on detectable sequence similarity by transforming the sequences into strings of hidden Markov states that represent local folding motif patterns. These state strings are transformed into fixed-dimension feature vectors for input to a support vector machine. Two sets of features are defined: an order-independent feature set that captures the amino acid and local structure composition; and an order-dependent feature set that captures the sequential ordering of the local structures. Tests using the Structural Classification of Proteins (SCOP) 1.53 data set show that the SVM-HMMSTR gives a significant improvement over several current methods.
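A minimal sketch of an order-independent feature vector of the kind described above: amino-acid composition concatenated with hidden-state composition, giving a fixed length regardless of protein length. It assumes each residue has already been assigned one predicted state label written as a single character; the three-letter state alphabet `"HEL"` is hypothetical and does not reproduce HMMSTR's actual states. The SVM training step is omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(symbols, alphabet):
    """Fraction of each alphabet symbol in `symbols`, in alphabet order."""
    counts = Counter(symbols)
    n = len(symbols)
    return [counts[a] / n for a in alphabet]

def order_independent_features(sequence, state_string, state_alphabet):
    """Concatenate residue composition and hidden-state composition into one
    vector of length 20 + len(state_alphabet)."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    if len(sequence) != len(state_string):
        raise ValueError("each residue needs exactly one predicted state")
    return composition(sequence, AMINO_ACIDS) + composition(state_string, state_alphabet)

# Hypothetical example: 4 residues with made-up state labels.
features = order_independent_features("ACDA", "HHEL", "HEL")
print(len(features))
```

Because every protein maps to a vector of the same length, vectors like this can be fed directly to a standard SVM; order-dependent features would additionally encode which states follow which.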

10.
  Total citations: 1 | Self-citations: 0 | Citations by others: 1
We designed a simple position-specific hidden Markov model to predict protein structure. Our new framework iterates naturally to converge to a final target, combining fragment assembly, clustering, target selection, refinement, and consensus in one process. Our initial implementation of this theory converges to within 6 Å of the native structures for 100% of decoys on all six standard benchmark proteins used in ROSETTA (discussed by Simons and colleagues in a recent paper), which achieved only 14%-94% on the same data. The qualities of the best decoys and of the final decoys to which our method converges are also notably better.

11.
Karchin R, Cline M, Karplus K. Proteins, 2004, 55(3): 508-518
Residue burial, which describes a protein residue's exposure to solvent and neighboring atoms, is key to protein structure prediction, modeling, and analysis. We assessed 21 alphabets representing residue burial according to their predictability from amino acid sequence, their conservation in structural alignments, and their utility in one fold-recognition scenario. This follows our previous work assessing nine representations of backbone geometry [1]. The alphabet found to be most effective overall has seven states and is based on a count of Cβ atoms within a 14 Å-radius sphere centered at the Cβ of the residue of interest. When incorporated into a hidden Markov model (HMM), this alphabet gave a 38% performance boost in fold recognition and 23% in alignment quality.
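The core of the best-performing alphabet is a neighbor count. The sketch below counts, for each residue, the other Cβ atoms within 14 Å of its own Cβ, using plain coordinate tuples. Several details are assumptions not stated in the abstract: the residue's own Cβ is excluded, the 14 Å boundary is inclusive, and glycine (which has no Cβ) is assumed to be given a virtual Cβ upstream. The seven-state cut-offs in `HYPOTHETICAL_THRESHOLDS` are placeholders, not the published values.

```python
import bisect
import math

# Placeholder cut-offs: 6 boundaries split counts into 7 states (NOT the published values).
HYPOTHETICAL_THRESHOLDS = [10, 16, 22, 28, 34, 40]

def cb_neighbor_counts(cb_coords, radius=14.0):
    """For each residue, count other residues whose Cβ lies within `radius` Å of its Cβ."""
    counts = []
    for i, ci in enumerate(cb_coords):
        counts.append(
            sum(1 for j, cj in enumerate(cb_coords) if j != i and math.dist(ci, cj) <= radius)
        )
    return counts

def burial_state(count, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Map a neighbor count to a state index in 0..len(thresholds)."""
    return bisect.bisect_right(thresholds, count)

# Toy "structure": four Cβ atoms spaced 10 Å apart on a line.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
counts = cb_neighbor_counts(coords)
print(counts, [burial_state(c) for c in counts])
```

The pairwise loop is quadratic in protein length, which is fine for single proteins; a spatial index such as a k-d tree would be the usual choice for large-scale processing.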

12.
We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that, under appropriate conditions, the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of multivariate Gaussian hidden Markov models, which we use to illustrate the applicability of our general results. Finally, the methodology is illustrated by means of a real data set.
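Under the standard asymptotic theory referred to above, the likelihood-ratio statistic for nested models is twice the gain in maximized log-likelihood, and it is compared with a chi-square distribution whose degrees of freedom equal the number of restricted parameters. Below is a minimal sketch for the one-degree-of-freedom case using only the standard library. The log-likelihood values in the usage line are made up, and fitting the HMMs themselves is out of scope. The paper's "appropriate conditions" matter: testing the number of hidden states, for example, is a known case where this chi-square approximation does not apply directly.

```python
import math

def lr_statistic(loglik_null, loglik_alt):
    """Likelihood-ratio statistic 2 * (l_alt - l_null) for nested models."""
    return 2.0 * (loglik_alt - loglik_null)

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for X ~ chi-square with 1 degree of freedom.
    Uses X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Made-up maximized log-likelihoods for a restricted and a full model.
stat = lr_statistic(loglik_null=-1052.7, loglik_alt=-1049.1)
p_value = chi2_sf_df1(stat)
print(stat, p_value)
```

For more than one restricted parameter, a general chi-square survival function (for example from a statistics library) would replace `chi2_sf_df1`.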

13.
  Total citations: 3 | Self-citations: 1 | Citations by others: 3
Using a variety of homology search methods and multiple alignments, a new extracellular module was identified in (1) agrin, (2) enterokinase, (3) a 63-kDa sea urchin sperm protein, (4) perlecan, (5) the breast cancer marker MUC1 (episialin), (6) the cell surface antigen 114/A10, and (7/8) two functionally uncharacterized, probably extracellular, Caenorhabditis elegans proteins. Despite the functional diversity of these adhesive proteins, a common denominator seems to be their existence in heavily glycosylated environments. In addition, the better-characterized proteins mentioned above all contain O-glycosidically linked carbohydrates, such as heparan sulfate, that contribute considerably to their molecular masses. The common module might regulate or assist binding to neighboring carbohydrate moieties.

14.
The mitochondrial inner and outer membranes are composed of a variety of integral membrane proteins, assembled into the membranes posttranslationally. The small translocase of the inner mitochondrial membranes (TIMs) are a group of approximately 10 kDa proteins that function as chaperones to ferry the imported proteins across the mitochondrial intermembrane space to the outer and inner membranes. In yeast, there are 5 small TIM proteins: Tim8, Tim9, Tim10, Tim12, and Tim13, with equivalent proteins reported in humans. Using hidden Markov models, we find that many eukaryotes have proteins equivalent to the Tim8 and Tim13 and the Tim9 and Tim10 subunits. Some eukaryotes provide "snapshots" of evolution, with a single protein showing the features of both Tim8 and Tim13, suggesting that a single progenitor gene has given rise to each of the small TIMs through duplication and modification. We show that no "Tim12" family of proteins exists, but rather that variant forms of the cognate small TIMs have been recently duplicated and modified to provide new functions: the yeast Tim12 is a modified form of Tim10, whereas in humans and some protists variant forms of Tim9, Tim8, and Tim13 are found instead. Sequence motif analysis reveals acidic residues conserved in the Tim10 substrate-binding tentacles, whereas more hydrophobic residues are found in the equivalent substrate-binding region of Tim13. The substrate-binding regions of Tim10 and Tim13 represent structurally independent domains: when the acidic domain from Tim10 is attached to Tim13, the Tim8-Tim13(10) complex becomes essential and the Tim9-Tim10 complex becomes dispensable. The conserved features in the Tim10 and Tim13 subunits provide distinct binding surfaces to accommodate the broad range of substrate proteins delivered to the mitochondrial inner and outer membranes.

15.
  Total citations: 5 | Self-citations: 0 | Citations by others: 5
Using computer methods for database search and multiple alignment, statistically significant sequence similarities were identified between several nitrilases with distinct substrate specificity, cyanide hydratases, aliphatic amidases, beta-alanine synthase, and a few other proteins with unknown molecular function. All these proteins appear to be involved in the reduction of organic nitrogen compounds and ammonia production. Sequence conservation over the entire length, as well as the similarity in the reactions catalyzed by the known enzymes in this family, points to a common catalytic mechanism. The new family of enzymes is characterized by several conserved motifs, one of which contains an invariant cysteine that is part of the catalytic site in nitrilases. Another highly conserved motif includes an invariant glutamic acid that might also be involved in catalysis.

16.
Reversible protein phosphorylation by protein kinases and phosphatases is a ubiquitous signaling mechanism in all eukaryotic cells. A multilevel hidden Markov model library is presented which is able to classify protein kinases into one of 12 families, with a misclassification rate of zero on the characterized kinomes of H. sapiens, M. musculus, D. melanogaster, C. elegans, S. cerevisiae, D. discoideum, and P. falciparum. The Library is shown to outperform BLASTP and a general Pfam hidden Markov model of the kinase catalytic domain in the retrieval and family-level classification of protein kinases. The application of the Library to the 38 unclassified kinases of yeast enriches the yeast kinome in protein kinases of the families AGC (5), CAMK (17), CMGC (4), and STE (1), thereby raising the family-level classification of yeast conventional protein kinases from 66.96% to 90.43%. The application of the Library to 21 eukaryotic genomes shows seven families (AGC, CAMK, CK1, CMGC, STE, PIKK, and RIO) to be present in all genomes analyzed; these families are therefore likely to be essential to eukaryotes. Putative tyrosine kinases (TKs) are found in the plants A. thaliana (2), O. sativa ssp. Indica (6), and O. sativa ssp. Japonica (7), and in the amoeba E. histolytica (7). To our knowledge, TKs have not been predicted in plants before. This also suggests that a primitive set of TKs might have predated the radiation of eukaryotes. Putative tyrosine kinase-like kinases (TKLs) are found in the fungi C. neoformans (2) and P. chrysosporium (4), in the Apicomplexans C. hominis (4), P. yoelii (4), and P. falciparum (6), in the amoeba E. histolytica (109), and in the alga T. pseudonana (6). TKLs are found to be abundant in plants (776 in A. thaliana, 1010 in O. sativa ssp. Indica, and 969 in O. sativa ssp. Japonica). TKLs might also have predated the radiation of eukaryotes and have been lost secondarily from some fungi.
The application of the Library facilitates the annotation of kinomes and has provided novel insights into the early evolution and subsequent adaptations of the various protein kinase families in eukaryotes.

17.
Protein family databases are an important resource for protein annotation and understanding protein evolution and function. In recent years hidden Markov models (HMMs) have become one of the key technologies used for detection of members of these families. This paper reviews the Pfam, TIGRFAMs and SMART databases that use the profile-HMMs provided by the HMMER package.

19.
Rehmsmeier M, Vingron M. Proteins, 2001, 45(4): 360-371
We present a database search method that is based on phylogenetic trees (treesearch). The method is used to search a protein sequence database for homologs to a protein family. In preparation for the search, a phylogenetic tree is constructed from a given multiple alignment of the family. During the search, each database sequence is temporarily inserted into the tree, thus adding a new edge to the tree. Homology between family and sequence is then judged from the length of this edge. In a comparison of our method with profiles (ISREC pfsearch), two implementations of hidden Markov models (HMMER hmmsearch and SAM hmmscore), and the family pairwise search (FPS) method on 43 families from the SCOP database, based on minimum false-positive counts (min-FPCs), we found a considerable gain in sensitivity. In 69% of the test cases, treesearch showed a min-FPC of at most 50, whereas the two second-best methods (hmmsearch and FPS) showed this performance in only 53% of cases. A similar picture holds over a large range of min-FPC thresholds. The results demonstrate that phylogenetic information can significantly improve the detection of distant homologies and justify our method as a useful alternative to existing methods.

20.
Elofsson A. Proteins, 2002, 46(3): 330-339
One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, large-scale benchmarks examining the quality of these alignments have so far been scarce. On the other hand, several recent large-scale studies of the capacity of different methods to identify related sequences have led to new insights into the performance of fold recognition methods. To increase our understanding of fold recognition methods, we present a large-scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI-BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference lies in the exact position of the sharp rise in alignment quality, that is, around 15-20% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI-BLAST. One interesting observation is that, for different pairs, many different methods create the best alignments. This finding implies that if a method existed that could select the best alignment method for each pair, a significant improvement in alignment quality could be gained.
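Because the benchmark's main finding is framed in terms of sequence identity, it helps to be explicit about how identity is computed from a pairwise alignment. The sketch below uses one common convention, identical residues divided by the number of columns in which neither sequence has a gap. Other conventions (for example, dividing by the length of the shorter sequence) give different numbers, and the abstract does not say which one the benchmark used; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

print(percent_identity("AC-GT", "ACTGA"))  # 3 identical of 4 gap-free columns -> 75.0
```

With this convention, a short, highly similar local alignment can report high identity, so identity thresholds such as the 15-20% range above are only comparable between studies that use the same definition.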

Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP filing: 京ICP备09084417号