Similar literature
 Found 20 similar articles; search time 986 ms
1.
Mishra P, Pandey PN. Bioinformation 2011, 6(10):372-374
The number of amino acid sequences in protein databases such as Swiss-Prot, UniProt, PIR and others is increasing very rapidly, but structures are available in the Protein Data Bank for only some of these sequences. Thus, an important problem in genomics is the automatic clustering of homologous protein sequences when only sequence information is available. Here, we use graph-theoretic techniques for clustering amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks to group amino acid sequences into subsets based on a distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that, for a given set of proteins, the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
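The clustering idea in this abstract can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes a precomputed similarity score for each sequence pair and a hypothetical cutoff `threshold`, and the function name `cluster_by_similarity` and the toy scores are invented for the example. Clusters are read off as connected components of the thresholded similarity graph.

```python
from collections import defaultdict


def cluster_by_similarity(names, scores, threshold):
    """Group sequences into connected components of a similarity graph.

    names: list of sequence identifiers.
    scores: dict mapping (name_a, name_b) -> similarity score.
    threshold: hypothetical cutoff; pairs scoring at or above it get an edge.
    """
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    clusters = []
    for name in names:
        if name in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [name]
        component = []
        seen.add(name)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Toy example with made-up scores (not real data).
toy_names = ["p1", "p2", "p3", "p4", "p5"]
toy_scores = {
    ("p1", "p2"): 0.9,
    ("p2", "p3"): 0.8,
    ("p3", "p4"): 0.1,
    ("p4", "p5"): 0.7,
}
toy_clusters = cluster_by_similarity(toy_names, toy_scores, threshold=0.5)
```

With these toy scores, p1 and p3 end up in the same cluster through p2 even though they have no direct edge. This transitive grouping is what lets a connected-component approach pick up remote homologs, and also why the choice of cutoff matters.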

2.
The amino acid composition of human alcohol dehydrogenase (ADH) was compared with that of alcohol dehydrogenases from different organisms and with that of other proteins. Similar amino acid sequences in human ADH (the template protein) and in other proteins were identified by means of an original computer program. Analysis of amino acid motifs reveals that ADHs from evolutionarily closer organisms share more amino acid sequences. The quantitative measure of amino acid similarity was the number of similar motifs in the analyzed protein per unit of protein length. This value was measured for ADHs and for other proteins. For ADHs, this quotient was higher than for proteins with different functions; for vertebrates, it correlated with evolutionary closeness. A similar motif comparison was carried out with the MEME program package. The analysis of ADHs revealed 4 motifs common to 6 of the 10 tested organisms and no such motifs for proteins of different function. The conclusion is that overall amino acid composition is more important for protein function than amino acid order, and that for enzymes of similar function it correlates better with the evolutionary distance between organisms.

3.
Penicillin-binding proteins 1A and 1B of Escherichia coli are the major peptidoglycan transglycosylase-transpeptidases that catalyse the polymerisation and insertion of peptidoglycan precursors into the bacterial cell wall during cell elongation. The nucleotide sequence of a 2764-base-pair fragment of DNA that contained the ponA gene, encoding penicillin-binding protein 1A, was determined. The sequence predicted that penicillin-binding protein 1A had a relative molecular mass of 93 500 (850 amino acids). The amino-terminus of the protein had the features of a signal peptide but it is not known if this peptide is removed during insertion of the protein into the cytoplasmic membrane. The nucleotide sequence of a 2758-base-pair fragment of DNA that contained the ponB gene, encoding penicillin-binding protein 1B, was also determined. Penicillin-binding protein 1B consists of two major components which were shown to result from the use of alternative sites for the initiation of translation. The large and small forms of penicillin-binding protein 1B were predicted to have relative molecular masses of 94 100 and 88 800 (844 and 799 amino acids). The amino acid sequences of penicillin-binding proteins 1A and 1B could be aligned if two large gaps were introduced into the latter sequence and the two proteins then showed about 30% identity. The amino acid sequences of the proteins showed no extensive similarity to the sequences of penicillin-binding proteins 3 or 5, or to the class A or class C beta-lactamases. Two short regions of amino acid similarity were, however, found between penicillin-binding proteins 1A and 1B and the other penicillin-binding proteins and beta-lactamases. One of these included the predicted active-site serine residue which was located towards the middle of the sequences of penicillin-binding proteins 1A, 1B and 3, within the conserved sequence Gly-Ser-Xaa-Xaa-Lys-Pro. 
The other region was 19-40 residues to the amino-terminal side of the active-site serine and may be part of a conserved penicillin-binding site in these proteins.

4.
Two glycoprotein bands isolated from the cyst wall protein pattern of two colpodid ciliates, Colpoda inflata (gp46CI) and Colpoda cucullus (gp46CC), were analysed for their amino acid composition. Both glycoproteins are very rich in glycine and have a relatively high hydrophobicity, additionally containing many leucine and alanine residues. Their high degree of similarity is both quantitative and qualitative. Compared with just two previously published reports, their amino acid compositions are similar to those found in the hydrolysed total cyst wall proteins from the ciliates C. steinii and Paraurostyla spp. The amino acid composition corroborates that they are indeed glycoproteins, because asparagine, an amino acid residue whose amide group can attach to N-acetylglucosamine (N-glycan), is abundant in both proteins. We discuss our data in relation to other glycine-rich proteins and compare them with protein amino acid composition databases.

5.
A special matrix of amino acid antigenic similarity is proposed for the computer detection of potential antigenic proximity between unrelated proteins. The matrix was built from data on the affinities of amino acid residue interactions between subunits in oligomeric proteins. The diagonal elements of the matrix characterize the recognition of amino acid residues, and the off-diagonal elements represent a relative measure of similarity in the specificity of antibody-amino acid residue interactions. Applying the new matrix to protein comparison allows hydrophilic, potentially immunologically active regions of sequences to be picked out as similar fragments. When the influenza virus hemagglutinin was compared with 116 human proteins, eight fragments were picked out that could not be detected with the routinely used MDM78 matrix. It is proposed that the antigenic similarity matrix be used to define forbidden structures when preparing peptide antiviral vaccines.

6.
Li T, Fan K, Wang J, Wang W. Protein Engineering 2003, 16(5):323-330
It is well known that there are some similarities among the various naturally occurring amino acids. Thus, the complexity of protein systems can be reduced by sorting similar amino acids into groups, and protein sequences can then be simplified using reduced alphabets. This paper discusses how to group similar amino acids and whether there is a minimal amino acid alphabet with which proteins can be folded. Various reduced alphabets are obtained by retaining the maximal information in the simplified protein sequence relative to the parent sequence, using global sequence alignment. With these reduced alphabets and simplified similarity matrices, we achieve recognition of the protein fold based on the similarity score of the sequence alignment. The coverage in the SCOP40 dataset is obtained for various levels of reduction in the number of amino acid types; coverage is the ratio of the number of homologous pairs detected by the BLAST program to the number marked by SCOP40. For the reduced alphabets containing 10 types of amino acids, the ability to detect distantly related folds remains at almost the same level as with the alphabet of 20 types of amino acids, which implies that 10 types of amino acids may represent the degrees of freedom needed to characterize the complexity of proteins.
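As a rough illustration of what simplifying a sequence with a reduced alphabet means, the sketch below maps each residue to a representative letter of its group. The 10-group partition `EXAMPLE_GROUPS` is a hypothetical grouping by broad physicochemical class, chosen only for this example; it is not one of the alphabets derived in the paper, and the helper names are likewise invented.

```python
# Hypothetical 10-letter grouping by broad physicochemical class.
# It is an illustration only, not one of the alphabets derived in the paper.
EXAMPLE_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every amino acid to the first letter of its group."""
    mapping = {}
    for group in groups:
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues missing from the mapping (for example X) are kept unchanged.
    """
    return "".join(mapping.get(residue, residue) for residue in sequence.upper())


MAPPING = build_mapping(EXAMPLE_GROUPS)
reduced = reduce_sequence("MKVLDEFW", MAPPING)  # "LKLLEEFF"
```

Sequences reduced in this way can be compared with an ordinary alignment program and a correspondingly simplified similarity matrix, which is the setting the abstract evaluates.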

7.
The degree of similarity in the three-dimensional structures of two proteins can be examined by comparing the patterns of hydrophobicity found in their amino acid sequences. Each type of amino acid residue is assigned a numerical hydrophobicity, and the correlation coefficient rH is computed between all pairs of residues in the two sequences. In tests on sequences from two properly aligned proteins of similar three-dimensional structures, rH is found in the range 0.3 to 0.7. Improperly aligned sequences or unrelated sequences give rH near zero. By considering the observed frequency of amino acid replacements among related structures, a set of optimal matching hydrophobicities (OMHs) was derived. With this set of OMHs, significant correlation coefficients are calculated for similar three-dimensional structures, even though the two sequences contain few identical residues. An example is the two similar folding domains of rhodanese (rH = 0.5). Predictions are made of similar three-dimensional structures for the alpha and beta chains of the various phycobiliproteins, and for delta hemolysin and melittin.
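A hydrophobicity correlation of this kind can be sketched as a Pearson correlation over aligned residue pairs. Two assumptions are made here: the OMH values are not given in this listing, so the Kyte-Doolittle hydropathy scale is substituted as a stand-in, and "pairs of residues" is taken to mean residues at the same position of a pre-computed alignment. The function name is invented for the example.

```python
import math

# Kyte-Doolittle hydropathy values, used here as a stand-in scale.
# The OMH values from the abstract are not reproduced in this listing.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_correlation(seq_a, seq_b, scale=KYTE_DOOLITTLE):
    """Pearson correlation of hydrophobicity over aligned residue pairs.

    seq_a and seq_b must be pre-aligned and equal in length; positions with
    a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (scale[a], scale[b])
        for a, b in zip(seq_a, seq_b)
        if a != "-" and b != "-"
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two aligned residue pairs")
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("hydrophobicity is constant in one sequence")
    return cov / math.sqrt(var_a * var_b)


# Toy aligned fragments: no identical residues except G, but the
# hydrophobic/hydrophilic pattern is preserved.
r_conservative = hydrophobicity_correlation("ILKDEG", "VIRENG")
```

The toy pair shares almost no identical residues yet gives a high correlation, which mirrors the abstract's point that hydrophobicity patterns can reveal similarity when sequence identity is low.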

8.
Using several consensus sequences for the 106-amino-acid-residue alpha-spectrin repeat segment as probes, we searched animal sequence databases with the BLAST program in order to find proteins showing limited but significant similarity to spectrin. Among many spectrins and proteins from the spectrin-alpha-actinin-dystrophin family, as well as sequences showing a rather high degree of similarity in very short stretches, we found seven homologous animal sequences of low overall similarity to spectrin but containing one or more spectrin-repeat motifs. The homology relationship of these sequences to alpha-spectrin was further analysed using the SEMIHOM program. Depending on the probe, these segments contained 6 to 26 identical amino acid residues and a variable number of semihomologous residues. Moreover, we found six protein sequences that contained a sequence fragment homologous to the SH3 (Src homology region 3) domain, with 42-59% similarity. Our data indicate that motifs of significant homology to alpha-spectrin repeat segments occur among animal proteins that are not classical members of the spectrin-alpha-actinin-dystrophin family. This might indicate that these segments, together with the SH3 domain motif, are conserved in proteins which at an early stage of evolution were possibly close cognates of spectrin-alpha-actinin-dystrophin progenitors but then evolved separately.

9.
The probability distributions in the (φ,ψ)-plane obtained for each amino acid residue from crystal structure data of globular proteins are compared. This comparison shows the amino acid residues Pro and Gly to be conformationally unique. Conformational similarity of amino acid residues in the (φ,ψ)-plane does not necessarily mean that they will have the same chemical or biochemical properties or similar secondary structures. A set of amino acid residues is given that can adopt the conformations of other amino acid residues without much difficulty, either in the whole (φ,ψ)-plane or in the regions where the observed conformations are most frequent.

10.
The question of protein homology versus analogy arises when proteins share a common function or a common structural fold without any statistically significant amino acid sequence similarity. Even when two or more proteins do not have similar sequences, if they share a common fold and the same or a closely related function they are assumed to be homologs, descended from a common ancestor. The problem of homolog identification is compounded in the case of proteins of 100 or fewer amino acids. This is due to the limited number of basic single-domain folds and to the likelihood of identifying sequence similarity by chance. The latter arises from two conditions: first, any search of the currently very large protein database is likely to identify short regions of chance match; second, a direct sequence comparison among a small set of short proteins sharing a similar fold can detect many similar patterns of hydrophobicity even if the proteins do not descend from a common ancestor. In an effort to identify distant homologs of the many ubiquitin proteins, we have developed a combined structure and sequence similarity approach that attempts to overcome these limitations of homolog identification. This approach results in the identification of 90 probable ubiquitin-related proteins, including examples from the two prokaryotic domains of life, Archaea and Bacteria.

11.
During screening for antigenic proteins in Burkholderia pseudomallei, a novel insertion sequence, ISBp1, was found by sequence similarity searches. ISBp1 contains two overlapping ORFs of 261 bp (orfA) and 852 bp (orfB), encoding 87 and 284 amino acid residues, respectively, and an imperfect inverted repeat. The putative protein encoded by orfA (OrfA) is similar to the OrfA in insertion sequences of the IS3 family in other bacteria, showing 49% and 76% amino acid identity and similarity, respectively, with the transposase encoded by ISD1 of Desulfovibrio vulgaris vulgaris. The putative protein encoded by orfB (OrfB) is similar to the OrfB in insertion sequences of the IS3 family in other bacteria, showing 43% and 62% amino acid identity and similarity, respectively, with the transposase encoded by IS1222 of Enterobacter agglomerans. Sequence analysis of OrfA showed the presence of an alpha-helix-turn-alpha-helix motif, as well as the putative leucine zipper at its 3' end, for possible DNA binding to the terminal inverted repeats. Sequence analysis of OrfB showed the presence of a DDE motif of aspartic acid, aspartic acid, and glutamic acid, a highly conserved motif present in OrfB of other members of the IS3 family. Furthermore, several other conserved amino acid residues, including the arginine residue located seven amino acids downstream from the glutamic acid residue, were observed. PCR amplification of the ISBp1 gene showed a specific band in 65% of the 26 B. pseudomallei strains tested. Southern blot hybridization after XhoI or SacI digestion showed nine different patterns of hybridization. The number of copies of ISBp1 in those strains that possessed the insertion sequence ranged from three to 12. Using several insertion sequences and a combination of insertion-sequence-based and non-insertion-sequence-based methods such as ribotyping will probably increase the discriminatory power of molecular typing in B. pseudomallei.

12.
A statistically valid similarity was found to exist between the amino acid sequences of poliovirus genome-linked protein VPg and a fragment of bacteriophage Mu transposase (Mu A protein). Based on this observation a hypothesis is proposed that the molecular mechanisms underlying the functions of the two proteins may be analogous. Both proteins are supposed to be site-specific endonucleases which form covalent linkage with the 5'-phosphate group of the nicked DNA or RNA strand. The amino acid residue participating in the formation of this linkage in MuA is tentatively identified as Tyr413.

13.
The globin family has long been known from studies of approximately 150-residue proteins such as vertebrate myoglobins and haemoglobins. Recently, this family has been enriched by the investigation of the sequences and structures of truncated globins, which have the same basic topology but are approximately 30 residues shorter and exhibit functions other than the familiar one of binding diatomic ligands. The divergence of protein sequences, structures and functions reveals Nature's exploration of the potential inherent in a folding pattern, that is, the topology of the native structure. The observation of what remains constant and what varies during the evolution of a protein family reveals essential features of structure and function. Study of proteins with a wide range of divergence can therefore sharpen our understanding of how different amino acid sequences can determine similar three-dimensional structures. Globins have provided, and continue to provide, interesting material for such studies.

14.
Tillier ER, Biro L, Li G, Tillo D. Proteins 2006, 63(4):822-831
Approaches for the determination of interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method, called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.

15.
The beta antigen of the Ibc protein complex of Group B streptococci is a cell-surface receptor which binds the Fc region of human immunoglobulin A (IgA). Determination of the nucleotide sequence of the beta antigen gene shows that it encodes a preprotein having a molecular weight of 130,963 daltons and a polypeptide of 1164 amino acid residues that is typical of other Gram-positive cell-wall proteins. There is a long signal sequence of 37 amino acids at the N-terminus. Four of the five C-terminal amino acid residues are basic and are preceded by a hydrophobic stretch that appears to anchor the C-terminus in the cell membrane. To the N-terminal side of this hydrophobic stretch is a putative cell-wall-spanning region containing proline-rich repeated sequences. An unusual feature of these repeated sequences is a three-residue periodicity, whereby every first residue is a proline, the second residue is alternately positively or negatively charged, and the third residue is uncharged. The IgA-binding activity was approximately localized by expressing subfragments of the beta antigen as fusion proteins. Two distinct but adjacent DNA segments specified peptides that bound IgA, which indicates that the IgA-binding activity is located in two distinct regions of the protein.

16.
Hu X, Kuhlman B. Proteins 2006, 62(3):739-748
Loss of side-chain conformational entropy is an important force opposing protein folding and the relative preferences of the amino acids for being buried or solvent exposed may be partially determined by which amino acids lose more side-chain entropy when placed in the core of a protein. To investigate these preferences, we have incorporated explicit modeling of side-chain entropy into the protein design algorithm, RosettaDesign. In the standard version of the program, the energy of a particular sequence for a fixed backbone depends only on the lowest energy side-chain conformations that can be identified for that sequence. In the new model, the free energy of a single amino acid sequence is calculated by evaluating the average energy and entropy of an ensemble of structures generated by Monte Carlo sampling of amino acid side-chain conformations. To evaluate the impact of including explicit side-chain entropy, sequences were designed for 110 native protein backbones with and without the entropy model. In general, the differences between the two sets of sequences are modest, with the largest changes being observed for the longer amino acids: methionine and arginine. Overall, the identity between the designed sequences and the native sequences does not increase with the addition of entropy, unlike what is observed when other key terms are added to the model (hydrogen bonding, Lennard-Jones energies, and solvation energies). These results suggest that side-chain conformational entropy has a relatively small role in determining the preferred amino acid at each residue position in a protein.
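The relation between average energy, entropy and free energy of an ensemble can be illustrated with a small sketch. It is not RosettaDesign code and does not use its API. The abstract describes Monte Carlo sampling, whereas this example simply enumerates a handful of hypothetical side-chain states and weights them with Boltzmann factors, so that F = <E> - T*S can be checked against -kT ln Z. The function name, temperature and energies are assumptions made for illustration.

```python
import math

# Gas constant (Boltzmann constant per mole) in kcal/(mol*K).
K_B = 0.0019872041


def ensemble_free_energy(energies, temperature=298.15):
    """Return (mean energy, entropy, free energy) for a discrete ensemble.

    energies: energies of enumerated side-chain states in kcal/mol.
    States are Boltzmann weighted, so F = <E> - T*S equals -kT ln Z.
    """
    kt = K_B * temperature
    e_min = min(energies)
    # Shift by the minimum energy for numerical stability.
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -K_B * sum(p * math.log(p) for p in probs if p > 0.0)
    free_energy = mean_energy - temperature * entropy
    return mean_energy, entropy, free_energy


# Hypothetical rotamer energies (kcal/mol), not taken from the paper.
mean_e, s, f = ensemble_free_energy([0.0, 0.5, 1.2])
```

A residue with several accessible low-energy states has a free energy below its lowest single-state energy. That difference is the quantity a lowest-energy-conformation model ignores and an ensemble model captures.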

17.
The DNA sequences of the entire coding regions of the A and C type variable surface protein genes from Paramecium tetraurelia, stock 51, have been determined. The 8151-nucleotide open reading frame of the A gene contains several tandem repeats of 210 nucleotides within the central portion of the molecule, as well as a periodic structure defined by cysteine residues. The 6699-nucleotide open reading frame of the C gene does not contain any identifiable tandem repeats or internal similarity but maintains a periodicity based on the cysteine residue spacing. The deduced amino acid sequences encoded by the two genes are most similar within the 600 amino-terminal and 600 carboxyl-terminal amino acid residues; the central portions show only limited sequence similarity. We conclude that internal repeats are not a conserved feature of variable surface proteins in Paramecium and discuss the possible importance of the regular pattern of cysteine residues.

18.
Naumoff DG. FEBS Letters 1999, 448(1):177-179
Comparison of the amino acid sequences of two families of glycosyl hydrolases reveals that they are related in a region in the central part of the sequences. One of these families (GH family 68) includes levansucrases and the other one (glycosyl hydrolase family 43) includes bifunctional beta-xylosidases and alpha-L-arabinofuranosidases. The similarity of the primary structure of proteins from these families allows us to consider the invariant glutamate residue as a component of their active center. It is shown for the first time that glycosyl hydrolases recognizing different glycofuranoside residues can have a common sequence motif.

19.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences, or predicting their structure, by sequence comparison methods is limited when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from multiple protein alignments using different approaches. "Conservatism-of-Conservatism" is an effective profile analysis method for identifying structural features shared by proteins that have the same fold but no detectable sequence similarity. The information obtained from multiple protein alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes, which were compared using the FASTA program with an ad hoc matrix. We tested the performance of our method in identifying relationships between proteins with similar folds using a nonredundant subset of sequences having less than 40% identity. We then compared our results, using Coverage Versus Error per query curves, with those obtained by methods such as PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), showed higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of the evolutionary information contained in protein profiles.
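To make the notion of an entropy profile computed under an amino acid classification more concrete, here is a minimal sketch. It is not the HIP implementation: the six-class partition `EXAMPLE_CLASSES` is a hypothetical grouping chosen for illustration (the abstract refers to many classifications covering almost 500 attributes), the toy alignment is invented, and the conversion of profiles into pseudocodes is not shown.

```python
import math
from collections import Counter

# Hypothetical classification by broad physicochemical class (illustration only).
EXAMPLE_CLASSES = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def class_lookup(classes):
    """Invert the classification into a residue -> class label table."""
    return {res: label for label, members in classes.items() for res in members}


def entropy_profile(alignment, classes=EXAMPLE_CLASSES):
    """Shannon entropy (bits) of residue classes in each alignment column.

    alignment: list of equal-length aligned sequences; gaps ('-') and
    unknown symbols are ignored. An all-gap column gets entropy 0.0.
    """
    lookup = class_lookup(classes)
    profile = []
    for column in zip(*alignment):
        labels = [lookup[res] for res in column if res in lookup]
        counts = Counter(labels)
        total = sum(counts.values())
        entropy = 0.0
        for count in counts.values():
            p = count / total
            entropy -= p * math.log2(p)
        profile.append(entropy)
    return profile


# Toy alignment (made up): columns 1 and 2 are conserved at class level.
toy_profile = entropy_profile(["AKD", "VRE", "LKF", "IHD"])
```

In the toy alignment the first two columns contain four different residues each, yet their class-level entropy is zero. This is why the choice of classification changes what a profile reports as conserved.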

20.