Similar literature
 Found 20 similar documents; search took 31 ms
1.
Given a transmembrane protein, we wish to find related ones by a database search. Due to the strongly hydrophobic amino acid composition of transmembrane domains, suboptimal results are obtained when general-purpose scoring matrices such as BLOSUM are used. Recently, a transmembrane-specific score matrix called PHAT was shown to perform much better than BLOSUM. In this article, we derive a transmembrane score matrix family, called SLIM, which has several distinguishing features. In contrast to currently used matrices, SLIM is non-symmetric. The asymmetry arises because different background compositions are assumed for the transmembrane query and the unknown database sequences. We describe the mathematical model behind SLIM in detail and show that SLIM outperforms PHAT both on simulated data and in a realistic setting. Since non-symmetric score matrices are a new concept in database search methods, we discuss some important theoretical and practical issues.
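
To illustrate why assuming different background compositions for query and database makes a score matrix non-symmetric, here is a minimal sketch using the generic log-odds form s(a, b) = log2(q(a, b) / (p_query(a) * p_db(b))). The two-letter alphabet and all frequencies are made-up toy numbers, and the function name is hypothetical; this is not the SLIM model itself, whose details are in the article.

```python
import math

def log_odds_matrix(joint, query_bg, db_bg):
    """Return s[a][b] = log2(joint[a][b] / (query_bg[a] * db_bg[b]))."""
    return {
        a: {b: math.log2(joint[a][b] / (query_bg[a] * db_bg[b])) for b in db_bg}
        for a in query_bg
    }

# Toy alphabet: L (hydrophobic) and K (charged). All numbers are illustrative only.
# Rows index the query residue, columns the database residue.
joint = {
    "L": {"L": 0.45, "K": 0.25},
    "K": {"L": 0.05, "K": 0.25},
}
query_bg = {"L": 0.7, "K": 0.3}  # row sums: hydrophobic-rich transmembrane query
db_bg = {"L": 0.5, "K": 0.5}     # column sums: generic database composition

scores = log_odds_matrix(joint, query_bg, db_bg)
print(scores["L"]["K"], scores["K"]["L"])  # the two off-diagonal scores differ
```

In this toy setting, aligning a query L to a database K is penalized less than aligning a query K to a database L, which is the kind of asymmetry the abstract describes.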

2.
MOTIVATION: The database of transmembrane protein (TMP) structures is still very small. At the same time, more and more TMP sequences are being determined. Molecular modeling is an interim answer that may bridge the gap between the two databases. The first step in homology modeling is to achieve a good alignment between the target sequences and the template structure. However, since most alignment algorithms were constructed with data derived from globular proteins, they perform poorly when applied to TMPs. In our application, we automate the alignment procedure and design it specifically for TMPs. We first identify segments likely to form transmembrane alpha-helices. We then apply different sets of criteria to transmembrane and non-transmembrane segments. For example, the penalty for insertions/deletions in transmembrane segments is much higher than in loop regions. Different substitution matrices are used, since the frequencies of occurrence of the various amino acids differ between transmembrane segments and water-soluble domains. RESULTS: This program leads to better models since it does not treat the protein as a single entity with uniform properties, but accounts for the different physical properties of the various segments. STAM is the first multisequence alignment program that is directly targeted at transmembrane proteins. AVAILABILITY: Source code and installation package are available on request from the authors. Web access is currently implemented.

3.
A total of 20%-25% of the proteins in a typical genome are helical membrane proteins. The transmembrane regions of these proteins have markedly different properties when compared with globular proteins. This presents a problem when homology search algorithms optimized for globular proteins are applied to membrane proteins. Here we present modifications of the standard Smith-Waterman and profile search algorithms that significantly improve the detection of related membrane proteins. The improvement is based on the inclusion of information about predicted transmembrane segments in the alignment algorithm. This is done by simply increasing the alignment score if two residues predicted to belong to transmembrane segments are aligned with each other. Benchmarking over a test set of G-protein-coupled receptor sequences shows that the number of false positives is significantly reduced in this way, both when closely related and distantly related proteins are searched for.
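
The scoring modification described above can be sketched as a Smith-Waterman recurrence with an extra bonus whenever both aligned residues are predicted to be transmembrane. This is a minimal illustration under several assumptions that are not from the abstract: a simple match/mismatch score instead of a substitution matrix, a linear gap penalty, arbitrary parameter values, and a hypothetical function name.

```python
def smith_waterman_tm(seq_a, seq_b, tm_a, tm_b,
                      match=2, mismatch=-1, gap=-2, tm_bonus=1):
    """Best local alignment score, with a bonus for aligned predicted-TM residues.

    tm_a and tm_b are strings as long as seq_a and seq_b, with 'M' marking
    residues predicted to lie in a transmembrane segment.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # Raise the score when both residues are predicted transmembrane.
            if tm_a[i - 1] == "M" and tm_b[j - 1] == "M":
                s += tm_bonus
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,   # align the two residues
                          h[i - 1][j] + gap,     # gap in seq_b
                          h[i][j - 1] + gap)     # gap in seq_a
            best = max(best, h[i][j])
    return best

with_tm = smith_waterman_tm("LLVA", "LLVA", "MMMM", "MMMM")
without_tm = smith_waterman_tm("LLVA", "LLVA", "----", "----")
```

With identical four-residue sequences, the score rises from 8 to 12 when all positions are marked as transmembrane, so matching topology increases the separation between related and unrelated hits.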

4.
MOTIVATION: Amino acid substitution matrices play a central role in protein alignment methods. Standard log-odds matrices, such as those of the PAM and BLOSUM series, are constructed from large sets of protein alignments having implicit background amino acid frequencies. However, these matrices are frequently used to compare proteins with markedly different amino acid compositions, such as transmembrane proteins or proteins from organisms with strongly biased nucleotide compositions. It has been argued elsewhere that standard matrices are not ideal for such comparisons and, furthermore, a rationale has been presented for transforming a standard matrix for use in a non-standard compositional context. RESULTS: This paper presents the mathematical details underlying the compositional adjustment of amino acid or DNA substitution matrices.

5.
Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20-22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even controlling for overall hydrophobic content. We report here on an analysis of the frequencies of different lengths of contiguous blocks of hydrophobic residues in a database of amino acid sequences of proteins of known structure. Sequences of three or more consecutive hydrophobic residues are found to be significantly less common in actual globular proteins than would be predicted if residues were selected independently. The result may reflect selection against long blocks of hydrophobic residues within globular proteins relative to what would be expected if residue hydrophobicities were independent of those of nearby residues in the sequence.
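
The comparison described above, observed run lengths against an independence model, can be sketched as follows. The hydrophobic residue set, the function names, and the example sequence are assumptions for illustration; the study's own definitions may differ. Under independence with hydrophobic probability p, and ignoring sequence-end effects, a maximal run has length k with probability p**(k-1) * (1-p).

```python
from collections import Counter
from itertools import groupby

# Assumed hydrophobic set for illustration; not taken from the study.
HYDROPHOBIC = set("AVLIMFWC")

def hydrophobic_run_lengths(sequence):
    """Count maximal runs of consecutive hydrophobic residues, keyed by run length."""
    counts = Counter()
    for is_hydrophobic, group in groupby(sequence, key=lambda r: r in HYDROPHOBIC):
        if is_hydrophobic:
            counts[len(list(group))] += 1
    return counts

def expected_fraction_of_runs(length, p):
    """Fraction of runs with exactly this length if residues were independent."""
    return p ** (length - 1) * (1 - p)

runs = hydrophobic_run_lengths("KAVLKKIIDGA")  # made-up sequence
```

Comparing the observed fraction of runs of each length in a sequence database with `expected_fraction_of_runs` would show whether long hydrophobic blocks are depleted, as the abstract reports for globular proteins.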

6.
Bernsel A, Viklund H, Elofsson A. Proteins 2008, 71(3): 1387-1399
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/.

7.
For the past 50 years, the Ramachandran map has been used effectively to study protein structure and folding. However, though extensive analysis has been done on the dihedral angle preferences of residues in globular proteins, related studies and reports on membrane proteins are limited. It is of interest to explore the conformational preferences of residues in the transmembrane regions of membrane proteins, which are involved in several important and diverse biological processes. Hence, in the present work, a systematic comparative computational analysis has been made of the dihedral angle preferences of alanine and glycine in alpha and beta transmembrane regions (the two major classes of transmembrane proteins) with the aid of the Ramachandran map. Further, the conformational preferences of residues in transmembrane regions were compared with those in non-transmembrane regions. We have extracted cation-pi interacting residues present in transmembrane regions and explored their dihedral angle preferences. From our observations, we report a higher percentage of occurrences of glycine in alpha and beta transmembrane regions than of other hydrophobic residues. Further, we noted a clear shift in the ψ-angle preferences of glycine residues from negative bins in alpha transmembrane regions to positive bins in beta transmembrane regions. Also, cation-pi interacting residues in beta transmembrane regions avoid ψ-angles in the range of -59° to -30°. In this article, we argue that studies of dihedral angle preferences in transmembrane regions, together with a thorough understanding of the structure and folding of transmembrane proteins, can lead to the modeling of novel transmembrane regions for designing membrane proteins.

8.
We assessed the disease-causing potential of single nucleotide polymorphisms (SNPs) based on a simple set of sequence-based features. We focused on SNPs from the dbSNP database in G-protein-coupled receptors (GPCRs), a large class of important transmembrane (TM) proteins. Apart from the location of the SNP in the protein, we evaluated the predictive power of three major classes of features to differentiate between disease-causing mutations and neutral changes: (i) properties derived from amino-acid scales, such as volume and hydrophobicity; (ii) position-specific phylogenetic features reflecting evolutionary conservation, such as normalized site entropy, residue frequency and SIFT score; and (iii) substitution-matrix scores, such as those derived from the BLOSUM62, GRANTHAM and PHAT matrices. We validated our approach using a control dataset consisting of known disease-causing mutations and neutral variations. Logistic regression analyses indicated that position-specific phylogenetic features that describe the conservation of an amino acid at a specific site are the best discriminators of disease mutations versus neutral variations, and integration of all our features improves discrimination power. Overall, we identify 115 SNPs in GPCRs from dbSNP that are likely to be associated with disease and thus are good candidates for genotyping in association studies.

9.
MOTIVATION: Integral membrane proteins play important roles in living cells. Although these proteins are estimated to constitute 25% of proteins at a genomic scale, the Protein Data Bank (PDB) contains only a few hundred membrane proteins due to the difficulties with experimental techniques. The presence of transmembrane proteins in the structure data bank, however, is quite invisible, as the annotation of these entries is rather poor. Even if a protein is identified as a transmembrane one, the possible location of the lipid bilayer is not indicated in the PDB because these proteins are crystallized without their natural lipid bilayer, and currently no method is publicly available to detect the possible membrane plane using the atomic coordinates of membrane proteins. RESULTS: Here, we present a new geometrical approach to distinguish between transmembrane and globular proteins using structural information only and to locate the most likely position of the lipid bilayer. An automated algorithm (TMDET) is given to determine the membrane planes relative to the position of atomic coordinates, together with a discrimination function which is able to separate transmembrane and globular proteins even in cases of low resolution or incomplete structures such as fragments or parts of large multi chain complexes. This method can be used for the proper annotation of protein structures containing transmembrane segments and paves the way to an up-to-date database containing the structure of all known transmembrane proteins and fragments (PDB_TM) which can be automatically updated. The algorithm is equally important for the purpose of constructing databases purely of globular proteins.

10.
The genomic era has seen a remarkable increase in the number of genomes being sequenced and annotated. Nonetheless, annotation remains a serious challenge for compositionally biased genomes. For the preliminary annotation, popular nucleotide and protein comparison methods such as BLAST are widely employed. These methods use matrices, such as amino acid substitution matrices, to score alignments. Since a nucleotide bias leads to an overall bias in the amino acid composition of proteins, it is possible that a genome with nucleotide bias may have introduced atypical amino acid substitutions in its proteome. Consequently, standard matrices fail to perform well in sequence analysis of these genomes. To address this issue, we examined amino acid substitution in the AT-rich genome of Plasmodium falciparum, chosen as a reference, and reconstituted a substitution matrix in the genome's context. The matrix was used to generate protein sequence alignments for the parasite proteins that improved across the functional regions. We attribute this to the consistency that may have been achieved between the target and background frequencies calculated exclusively in our study. This study has important implications for the annotation of proteins that are of experimental interest but give poor sequence alignments with standard matrices.

11.
The packing of helices spanning lipid bilayers is crucial for the stability and function of alpha-helical membrane proteins. Using a modified Voronoi procedure, we calculated packing densities for helix-helix contacts in membrane spanning domains. Our results show that the transmembrane helices of protein channels and transporters are significantly more loosely packed than helices in globular proteins. The observed packing deficiencies of these membrane proteins are also reflected by a higher number of cavities at functionally important sites. The cavities positioned along the gated pores of membrane channels and transporters are noticeably lined by polar amino acids that should be exposed to the aqueous medium when the protein is in the open state. In contrast, nonpolar amino acids surround the cavities in those protein regions where large rearrangements are supposed to take place, such as near the hinge regions of transporters or at restriction sites of protein channels. We presume that the observed deficiencies of helix-helix packing are essential for the helical mobility that sustains the function of many membrane protein channels and transporters.

12.
The analysis of inter-residue interactions in protein structures provides considerable insight into their folding and stability. We have previously analyzed the role of medium- and long-range interactions in the folding of globular proteins. In this work, we study the distinct role of such interactions in the three-dimensional structures of membrane proteins. We observed a higher number of long-range contacts in the termini of transmembrane helical (TMH) segments, implying their role in the stabilization of helix-helix interactions. Transmembrane strand (TMS) proteins have appreciably more long-range contacts than the all-beta class of globular proteins, indicating closer packing of the strands in TMS proteins. The residues in membrane spanning segments of TMH proteins have 1.3 times more medium-range contacts than long-range contacts, whereas those of TMS proteins have 14 times more long-range contacts than medium-range contacts. Residue-wise analysis indicates that in TMH proteins, the residues Cys, Glu, Gly, Pro, Gln, Ser and Tyr have more long-range contacts than medium-range contacts, in contrast with the all-alpha class of globular proteins. Charged residue pairs have more medium-range contacts in all-alpha proteins, whereas hydrophobic residue pairs are dominant in TMH proteins. The information on the preference of residue pairs to form medium-range contacts has been successfully used to discriminate TMH proteins from all-alpha proteins. The statistical significance of the results obtained from the present study has been verified using randomized structures of TMH and TMS protein templates.

13.
We have analyzed 29 published substitution matrices (SMs) and five statistical protein contact potentials (CPs) for comparison. We find that popular, 'classical' SMs obtained mainly from sequence alignments of globular proteins are mostly correlated by at least a value of 0.9. The BLOSUM62 is the central element of this group. A second group includes SMs derived from alignments of remote homologs or transmembrane proteins. These matrices correlate better with classical SMs (0.8) than among themselves (0.7). A third group consists of intermediate links between SMs and CPs - matrices and potentials that exhibit mutual correlations of at least 0.8. Next, we show that SMs can be approximated with a correlation of 0.9 by expressions c(0) + x(i)x(j) + y(i)y(j) + z(i)z(j), 1 …

14.
Crystal structure data of globular proteins were used to prepare (phi, psi) probability maps of the 20 proteinogenic amino acids. These maps were compared grid-wise with each other and a conformational similarity index was calculated for each pair of amino acids. A weight matrix, called the Conformational Similarity Weight (CSW) matrix, was prepared using the conformational similarity index. This weight matrix was used to align sequences of 21 pairs of proteins whose crystal structures are known. The aligned regions with more than seven contiguous amino acids were further analysed by plotting average weight (W) values of overlapping heptapeptides in these regions and carrying out curve fitting by a Fourier series with ten harmonics. The protein fragments corresponding to the half-linewidth of peaks were predicted as fragments having similar conformation in the protein pair under consideration. Such an approach allows us to pick up conformationally similar protein fragments with more than 67% accuracy.

15.
We studied the low-frequency terahertz spectroscopy of two photoactive protein systems, rhodopsin and bacteriorhodopsin, as a means to characterize collective low-frequency motions in helical transmembrane proteins. From this work, we found that the nature of the vibrational motions activated by terahertz radiation is surprisingly similar between these two structurally similar proteins. Specifically, at the lowest frequencies probed, the cytoplasmic loop regions of the proteins are highly active; and at the higher terahertz frequencies studied, the extracellular loop regions of the protein systems become vibrationally activated. In the case of bacteriorhodopsin, the calculated terahertz spectra are compared with the experimental terahertz signature. This work illustrates the importance of terahertz spectroscopy in identifying vibrational degrees of freedom that correlate with known conformational changes in these proteins.

16.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, residues on the binding surfaces also have very different substitution rates from residues in the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.

17.
Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological research. However, alignment in the ‘twilight zone’ remains an open issue. It is feasible and necessary to construct a new score matrix as more protein structures are resolved. Three structural class-specific score matrices (all-alpha, all-beta and alpha/beta) were constructed based on the structure alignment of low identity proteins of the corresponding structural classes. The class-specific score matrices were significantly better than a structure-derived matrix (HSDM) and three other generalized matrices (BLOSUM30, BLOSUM60 and Gonnet250) in alignment performance tests. The optimized gap penalties presented here also improve alignment performance. The results indicate that different protein classes have distinct amino acid substitution patterns, and an amino acid score matrix should be constructed based on different structural classes. The class-specific score matrices could also be used in profile construction to improve homology detection.

18.
We have used three reference sequences representative of bacterial drug resistance pumps and sugar transport proteins to collect the 91 most closely related sequences from a composite, nonredundant protein sequence database. Having eliminated certain very close relatives, the remainder were subjected to analysis and alignment by using two different similarity matrices: one of these was a matrix based on structural conservation of amino acid residues in proteins of known conformation and the other was based on the more familiar mutational matrix. Unrooted similarity trees for these proteins were constructed for each matrix and compared. A systematic analysis of the differences between these trees was undertaken and the sequences were analyzed for the presence or absence of certain sequence motifs. The results show that the clades created by the two methods are broadly comparable but that there are some clusters of sequences that are significantly different. Further analysis confirmed that (1) the sequences collected by this objective method are all known or putative 12-helix (in some cases reported as 14-helix) transmembrane proteins, (2) there is evidence for few cases of an origin based on gene duplication, (3) the bacterial drug resistance pumps are distributed in more than one clade and cannot be regarded as a definitive subset of these proteins, and that (4) the diversity is such that there is no evidence of a single ancestral protein. The possible extension of the methods to other cases of divergent protein sequences is discussed.

19.
A growing number of proteins are being identified that are biologically active though intrinsically disordered, in sharp contrast with the classic notion that proteins require a well-defined globular structure in order to be functional. At the same time, recent work showed that aggregation and amyloidosis are initiated in amino acid sequences that have specific physico-chemical properties in terms of secondary structure propensities, hydrophobicity and charge. In intrinsically disordered proteins (IDPs) such sequences would be almost exclusively solvent-exposed and therefore cause serious solubility problems. Further, some IDPs such as the human prion protein, synuclein and Tau protein are related to major protein conformational diseases. However, this scenario contrasts with the large number of unstructured proteins identified, especially in higher eukaryotes, and the fact that the solubility of these proteins is often particularly good. We have used the algorithm TANGO to compare the beta-aggregation tendency of a set of globular proteins derived from SCOP with that of a set of 296 experimentally verified, non-redundant IDPs and of a set of IDPs predicted by the algorithms DisEMBL and GlobPlot. Our analysis shows that the beta-aggregation propensity of all-alpha, all-beta and mixed alpha/beta globular proteins as well as membrane-associated proteins is fairly similar. This illustrates firstly that globular structures possess an appreciable amount of structural frustration and secondly that beta-aggregation is not determined by hydrophobicity and beta-sheet propensity alone. We also show that globular proteins contain almost three times as many aggregation-nucleating regions as IDPs and that the formation of highly structured globular proteins comes at the cost of a higher beta-aggregation propensity because both structure and aggregation obey very similar physico-chemical constraints. Finally, we discuss the fact that although IDPs have a much lower aggregation propensity than globular proteins, this does not necessarily mean that they have a lower potential for amyloidosis.

20.
Peroxin 2 (PEX2) is a 35-kDa integral peroxisomal membrane protein with two transmembrane regions and a zinc RING domain within its cytoplasmically exposed C-terminus. Although its role in peroxisome biogenesis and function is poorly understood, it seems to be involved in peroxisomal matrix protein import. PEX2 is synthesized on free cytosolic ribosomes and is posttranslationally imported into the peroxisome membrane by specific targeting information. While a clear picture of the basic targeting mechanisms for peroxisomal matrix proteins has emerged over the past years, the targeting processes for peroxisomal membrane proteins are less well understood. We expressed various deletion constructs of PEX2 in fusion with the green fluorescent protein in COS-7 cells and determined their intracellular localization. We found that the minimum peroxisomal targeting signal of human PEX2 consists of an internal protein region of 30 amino acids (AA130 to AA159) and the first transmembrane domain, and that adding the second transmembrane domain increases targeting efficiency. Within the minimum targeting region we identified the motif "KX6(I/L)X(L/F/I)LK(L/F/I)" that includes important targeting information and is also present in the targeting regions of the 22-kDa peroxisomal membrane protein (PMP22) and the 70-kDa peroxisomal membrane protein (PMP70). Mutations in this targeting motif mislocalize PEX2 to the cytosol. In contrast, the second transmembrane domain does not seem to have specific peroxisomal membrane targeting information. Replacing the second transmembrane domain of human PEX2 with the transmembrane domain of human cytochrome c oxidase subunit IV does not alter PEX2 peroxisome targeting function and efficiency.
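
The targeting motif quoted above, "KX6(I/L)X(L/F/I)LK(L/F/I)", maps directly onto a regular expression if X is read as any residue and X6 as six of them. The sketch below scans a sequence for it; the function name is hypothetical and the test sequence is synthetic, built only to satisfy the pattern, not a real PEX2, PMP22 or PMP70 sequence.

```python
import re

# K, six arbitrary residues, I or L, one arbitrary residue, L/F/I, L, K, L/F/I.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(K[A-Z]{6}[IL][A-Z][LFI]LK[LFI]))")

def find_motif(sequence):
    """Return (0-based start, matched segment) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence)]

hits = find_motif("MMKAAAAAAIALLKFGG")   # synthetic sequence containing the motif
misses = find_motif("MMKAAAAAAIALLKAGG")  # last motif position changed to A
```

Changing a single constrained position removes the match, which mirrors in a purely textual way the abstract's observation that mutations in the motif disrupt targeting.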
