Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Recognition of homologies may give hints about the structure and function of proteins; therefore, we are developing strategies to aid sequence comparisons. Detecting homology of mosaic proteins is especially difficult since the modules constituting these proteins are usually distantly related and their homology is not readily recognized by conventional computer programs. In the present work we show that the rules of the evolution of mosaic proteins can guide the identification of modules of mosaic proteins and can delineate the group of sequences in which the presence of homologous sequences may be expected. By this approach we can concentrate the search for homology on a limited group of sequences, thus ensuring a more intense and more fruitful search. The power of this approach is illustrated by the fact that it could detect homologies not identified by earlier methods of sequence comparison. In this paper we show that thrombomodulin contains a domain homologous with animal lectins, that complement components C9, C8 alpha and C8 beta have modules homologous with one of the repeat units of thrombospondin and that the somatomedin B module of vitronectin is homologous with the internal repeats of plasma cell membrane glycoprotein PC-1.

2.
A set of aligned homologous protein sequences is divided into two groups consisting of the m and n most related sequences. The position variability for homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible m*n pairs of amino acid residues at that position, divided by m*n. The position variability value plotted versus the sequence position number with a window of 10 positions gives the intergroup local variability profile. Area S of the figure included between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area Sr for 1000 random homologous protein families. If S is greater than Sr by more than 2 standard deviation units sigma r, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus delta S = S-(Sr + 2 sigma r) are cut off by two straight lines to locate significant regions. The difference (S-Sr) given in standard deviation units sigma r is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, OI = (S-Sr)/sigma r. The significant conservative and variable regions of six homologous sequence families (phospholipase A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of photosynthetic bacteria photoreaction centre and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural protein sequences, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, the value of standard substitution overall irregularity for L = 250 is proposed.
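
For illustration, the position variability and the windowed profile described in this abstract might be computed roughly as follows. This is a minimal sketch and not the authors' program: the function names are hypothetical, gap characters are treated like any other symbol, and the 10-position window is assumed to be a simple moving average.

```python
from typing import List, Sequence


def position_variability(group_a: Sequence[str], group_b: Sequence[str]) -> List[float]:
    """Fraction of mismatching intergroup residue pairs at each alignment column."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in list(group_a) + list(group_b)):
        raise ValueError("all aligned sequences must have the same length")
    pair_count = len(group_a) * len(group_b)  # the m*n intergroup pairs
    variability = []
    for pos in range(length):
        mismatches = sum(a[pos] != b[pos] for a in group_a for b in group_b)
        variability.append(mismatches / pair_count)
    return variability


def local_variability_profile(variability: Sequence[float], window: int = 10) -> List[float]:
    """Moving average of the per-position variability (assumed smoothing rule)."""
    if window < 1 or window > len(variability):
        raise ValueError("window must be between 1 and the profile length")
    return [
        sum(variability[i:i + window]) / window
        for i in range(len(variability) - window + 1)
    ]
```

With two toy groups such as `["ACDE", "ACDF"]` and `["ACGE", "TCGE"]`, the first function returns one value per alignment column, and the second smooths those values over the chosen window.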

3.
The technique of model-building a protein of known sequence but unknown tertiary structure from the structures of homologous proteins is probably so far the most reliable means of mapping from primary to tertiary structure. A key step towards this aim is to develop ways of aligning three-dimensional structures of homologous proteins, thereby deriving rules useful for protein modelling. We have developed a generalized differential-geometric representation of protein local conformation for use in a protein comparison program which aligns protein sequences on the basis of their sequence and conformational knowledge. Because the differential-geometric distance measure between local conformations is independent of the coordinate frame and retains chirality information, the comparison program is easily implemented, relatively rational and reasonably fast. The utility of this program for aligning closely and distantly related homologous proteins is demonstrated by multiple alignment of globins, serine proteinases and aspartic proteinase domains. In particular, the method achieves a rational alignment between the mammalian and microbial serine proteinases, as compared with many published alignment programs.

4.
The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more closely related to the eukaryotic (45–49%) than to the eubacterial counterparts (35%).

5.
6.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-letter code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-letter code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
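
The scoring idea in this abstract, a linear combination of amino acid and secondary-structure similarity fed into dynamic programming, can be sketched as below. This is not STRALIGN itself: the identity-based similarity values, the `weight` parameter, the linear gap penalty and all function names are assumptions chosen only to keep the example short.

```python
from typing import List, Tuple

Code = Tuple[str, str]  # (amino acid, annotated secondary structure)


def parse_two_letter_code(text: str) -> List[Code]:
    """Split a string such as 'AhLhGc' into (residue, structure) pairs."""
    if len(text) % 2 != 0:
        raise ValueError("two-letter code string must have even length")
    return [(text[i], text[i + 1]) for i in range(0, len(text), 2)]


def pair_similarity(left: Code, right: Code, weight: float = 0.5) -> float:
    """Linear combination of residue and structure similarity (identity scores assumed)."""
    residue_score = 1.0 if left[0] == right[0] else 0.0
    structure_score = 1.0 if left[1] == right[1] else 0.0
    return weight * residue_score + (1.0 - weight) * structure_score


def global_alignment_score(seq1: List[Code], seq2: List[Code],
                           gap: float = -1.0, weight: float = 0.5) -> float:
    """Needleman-Wunsch style score with a linear gap penalty (assumed)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = dp[i - 1][j - 1] + pair_similarity(seq1[i - 1], seq2[j - 1], weight)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]
```

Setting `weight` to 1.0 reduces the score to a comparison on primary sequence alone, which gives a simple way to compare alignments with and without the secondary-structure annotation.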

7.
Ribosomal proteins were extracted from 50S ribosomal subunits of the archaebacterium Halobacterium marismortui by decreasing the concentration of Mg2+ and K+, and the proteins were separated and purified by ion-exchange column chromatography on DEAE-cellulose. Ten proteins were purified to homogeneity and three of these proteins were subjected to sequence analysis. The complete amino acid sequences of the ribosomal proteins L25, L29 and L31 were established by analyses of the peptides obtained by enzymatic digestion with trypsin, Staphylococcus aureus protease, chymotrypsin and lysylendopeptidase. Proteins L25, L29 and L31 consist of 84, 115 and 95 amino acid residues with the molecular masses of 9472 Da, 12293 Da and 10418 Da respectively. A comparison of their sequences with those of other large-ribosomal-subunit proteins from other organisms revealed that protein L25 from H. marismortui is homologous to protein L23 from Escherichia coli (34.6%), Bacillus stearothermophilus (41.8%), and tobacco chloroplasts (16.3%) as well as to protein L25 from yeast (38.0%). Proteins L29 and L31 do not appear to be homologous to any other ribosomal proteins whose structures are so far known.

8.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure similarity is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomics approach, as well as for genome annotation.

9.
The pattern of nucleotide substitution was examined at 2,129 orthologous loci among five genomes of Staphylococcus aureus, which included two sister pairs of closely related genomes (MW2/MSSA476 and Mu50/N315) and the more distantly related MRSA252. A total of 108 loci were unusual in lacking any synonymous differences among the five genomes; most of these were short genes encoding proteins highly conserved at the amino acid sequence level (including many ribosomal proteins) or unknown predicted genes. In contrast, 45 genes were identified that showed anomalously high divergence at synonymous sites. The latter genes were evidently introduced by homologous recombination from distantly related genomes, and in many cases, the pattern of nucleotide substitution made it possible to reconstruct the most probable recombination event involved. These recombination events introduced genes encoding proteins that differed in amino acid sequence and thus potentially in function. Several of the proteins are known or likely to be involved in pathogenesis (e.g., staphylocoagulase, exotoxin, Ser-Asp fibrinogen-binding bone sialoprotein-binding protein, fibrinogen and keratin-10 binding surface-anchored protein, fibrinogen-binding protein ClfA, and enterotoxin P). Therefore, the results support the hypothesis that exchange of homologous genes among S. aureus genomes can play a role in the evolution of pathogenesis in this species.

10.
The cloning and characterization of the cytoplasmic 7 S RNAs of HeLa cells has provided pure probes to study the organization of the corresponding genomic DNA sequences. Such analysis has shown that the 7 S L and K RNAs are derived from families of middle repetitive DNA (Ullu & Melli, 1982; Ullu et al., 1982). In this work we analyze the evolutionary conservation of these sequences in the RNA and DNA of distantly related species. Hybridization of the 7 S recombinants to the RNA of rodents, birds, amphibians and echinoderms suggests high conservation of these sequences throughout evolution. Southern blot analysis of genomic DNAs from the same species shows the presence of families of repeated sequences homologous to the 7 S recombinants and Alu DNAs. We were unable to hybridize the 7 S probes to the RNAs of Drosophila melanogaster or Dictyostelium discoideum, although sequence(s) homologous to the 7 S L probe were found in the genome of D. discoideum and to both 7 S L and K probes in the genome of D. melanogaster.

11.
The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al, submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T flavus and T thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T thermophilus gene. Protein L5 sequences from T flavus and T thermophilus are 95% homologous to L5 from T aquaticus and 56.5% homologous to the corresponding E coli sequence. The lowest degrees of homology were found between the T flavus/T thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.
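
A statistic like the "G/C in third positions of codons" quoted in this abstract can be computed from an in-frame coding sequence as sketched below. The function name is hypothetical, and the choice to count the stop codon like any other codon is an assumption, since the abstract does not say how the figure was derived.

```python
def gc3_fraction(coding_sequence: str) -> float:
    """Fraction of codons whose third base is G or C (sequence must be in frame)."""
    sequence = coding_sequence.upper()
    if not sequence or len(sequence) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    third_bases = sequence[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

For the toy sequence `ATGGCCAAGTAA` (codons ATG, GCC, AAG, TAA) the third bases are G, C, G and A, so the function returns 0.75.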

12.
A method of identification of significant conservative and variable regions in homologous protein sequences is presented. A set of aligned homologous sequences is divided into two groups consisting of the m and n most related sequences. Each pair of sequences from different groups is compared using a unitary similarity matrix. The superposition of pairwise comparisons scanned by a window of 10 amino acid residues gives the intergroup local variability profile (VP). Area S of the figure between the VP and its mean value line is compared with the averaged area S(r) of 1000 VPs of artificial homologous protein families. The difference (S-S(r)) given in standard deviation units sigma r is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, OI = (S-S(r))/sigma r. If OI is greater than 2, the real VP extrema containing the surplus of area S-(S(r) + 2 sigma r) are cut off. The cut-off stretches are likely to be significant conservative and variable regions. The significant conservative and variable regions of six homologous sequence families (phospholipases A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of photosynthetic bacteria photoreaction centre and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural proteins, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, the value of standard substitution overall irregularity for L = 250 is proposed.
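
The overall irregularity OI = (S-S(r))/sigma r defined in this abstract can be illustrated with a short sketch. It assumes that the area S is approximated by summing absolute deviations of the profile from its mean with unit spacing, and that sigma r is the sample standard deviation of the random-family areas; neither detail is stated in the abstract, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def profile_area(profile: Sequence[float]) -> float:
    """Area between a variability profile and its mean line (unit spacing assumed)."""
    mean_value = mean(profile)
    return sum(abs(value - mean_value) for value in profile)


def overall_irregularity(real_area: float, random_areas: Sequence[float]) -> float:
    """OI = (S - S_r) / sigma_r, with S_r and sigma_r estimated from random families."""
    return (real_area - mean(random_areas)) / stdev(random_areas)


def has_significant_regions(real_area: float, random_areas: Sequence[float]) -> bool:
    """The abstract's criterion: the profile is cut only when OI exceeds 2."""
    return overall_irregularity(real_area, random_areas) > 2.0
```

In practice `random_areas` would hold the areas of the 1000 artificial families mentioned in the abstract; how those families are generated is outside the scope of this sketch.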

13.
Evaluation and improvements in the automatic alignment of protein sequences (total citations: 6; self-citations: 0; citations by others: 6)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins, in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.

14.
The complete amino acid sequences of ribosomal proteins L9, L20, L21/22, L24 and L32 from the archaebacterium Halobacterium marismortui were determined. The comparison of the sequences of these proteins with those from other organisms revealed that proteins L21/22 and L24 are homologous to ribosomal protein Yrp29 from yeast and L19 from rat, respectively, and that H. marismortui L20 is homologous to L30 from eubacteria. H. marismortui ribosomal protein L9 showed sequence homology to both L29 from yeast and L15 from eubacteria. No homologous protein was found for H. marismortui L32. These results are discussed with respect to the phylogenetic relationship between eubacteria, archaebacteria and eukaryotes.

15.
A set of aligned homologous protein sequences is divided into two groups consisting of the m and k most related sequences. The position variability of homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible k × m pairs of amino acid residues at that position, divided by k × m. The position variability value plotted vs the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure included between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area S(r) for 1000 random homologous protein families. If S is greater than S(r) by more than 2 standard deviation units sigma r, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus delta S = S-(S(r) + 2 sigma r) are cut off by two straight lines to locate significant regions. A numerical experiment on the family of homologous phospholipases A2 revealed a linear dependence of the values S(r) and sigma r upon the position variability standard deviation sigma v of the homologous sequences. Furthermore, it was shown for protein families of various lengths (rhodopsins, aspartate aminotransferases, cytochromes b, L- and M-subunits of photosynthetic bacteria photoreaction centre and alpha-subunits of Na,K-ATPase) that delta S = S - n(S'r + 2 sigma r), where S is the area of the local variability profile and n = L/l (L is the length of the given protein family and l is the length of the hypothetical protein domain). If l = 250 then S'r = -1.42 + 62.56 sigma v and sigma'r = -0.14 + 7.46 sigma v.
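
The closing formulas of this abstract can be turned into a small calculation. The sketch below uses the coefficients exactly as quoted for l = 250; it assumes that the standard deviation inside delta S is the primed value sigma'r, which the abstract's notation leaves ambiguous, and the function names are hypothetical.

```python
def expected_random_area(sigma_v: float) -> float:
    """S'r for a hypothetical domain of length l = 250, as quoted in the abstract."""
    return -1.42 + 62.56 * sigma_v


def expected_random_sigma(sigma_v: float) -> float:
    """sigma'r for l = 250, as quoted in the abstract."""
    return -0.14 + 7.46 * sigma_v


def area_surplus(area: float, family_length: float, sigma_v: float,
                 domain_length: float = 250.0) -> float:
    """delta S = S - n * (S'r + 2 * sigma'r), with n = L / l.

    The quoted coefficients apply only to domain_length = 250; using sigma'r
    here is an assumption about the abstract's notation.
    """
    n = family_length / domain_length
    threshold = expected_random_area(sigma_v) + 2.0 * expected_random_sigma(sigma_v)
    return area - n * threshold
```

A positive result would indicate that the profile area exceeds the length-scaled random expectation by more than two standard deviations.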

16.
The nucleotide sequences encoding the matrix (M) proteins of measles virus (MV) and canine distemper virus (CDV) were determined from cDNA clones containing these genes in their entirety. In both cases, single open reading frames specifying basic proteins of 335 amino acid residues were predicted from the nucleotide sequences. Both viral messages were composed of approximately 1,450 nucleotides and contained 400 nucleotides of presumptive noncoding sequences at their respective 3' ends. MV and CDV M-protein-coding regions were 67% homologous at the nucleotide level and 76% homologous at the amino acid level. Only chance homology was observed in the 400-nucleotide trailer sequences. Comparisons of the M protein sequences of MV and CDV with the sequence reported for Sendai virus (B. M. Blumberg, K. Rose, M. G. Simona, L. Roux, C. Giorgi, and D. Kolakofsky, J. Virol. 52:656-663; Y. Hidaka, T. Kanda, K. Iwasaki, A. Nomoto, T. Shioda, and H. Shibuta, Nucleic Acids Res. 12:7965-7973) indicated the greatest homology among these M proteins in the carboxyterminal third of the molecule. Secondary-structure analyses of this shared region indicated a structurally conserved, hydrophobic sequence which possibly interacted with the lipid bilayer.

17.
18.
The software package presented allows: 1) input and editing of amino acid sequences; 2) listing of aligned amino acid sequences of homologous proteins; 3) pairwise comparison of homologous sequences; 4) construction of phylogenetic trees; 5) comparison of two groups of protein sequences from the same family of homologous proteins; 6) graphic identification of conservative and variable regions of homologous sequences. The stepwise application of the programs makes it possible to study the accumulation of amino acid replacements during certain intervals of species evolution.

19.
The amino acid sequences of ribosomal proteins L1, L14, L15, L23, L24 and L29 from Bacillus stearothermophilus have been completely determined. This has been achieved by sequence analyses of peptides derived from enzymatic digestions of the proteins with trypsin, chymotrypsin, pepsin, Staphylococcus aureus protease, and Armillaria mellea protease as well as by chemical cleavage with hydroxylamine and cyanogen bromide. Based on the primary structures of the six proteins, their secondary structures were predicted using four different computer prediction programs. A comparison of the amino acid sequences of the studied proteins from B. stearothermophilus with the homologous proteins from Escherichia coli revealed that in four proteins (L1, L15, L24 and L29) 40-50% of the residues in the sequences are identical, whereas this value is significantly higher (69%) for L14 and lower (28%) for L23. The distribution of those amino acid residues which are identical in the corresponding proteins from the two bacteria is not random along the protein chain: some regions are highly conserved whereas others are not. This finding indicates that the regions which are conserved during evolution are important for the spatial structure and/or function of the protein.
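
Percentages of identical residues such as those reported in this abstract are typically derived from a pairwise alignment. The sketch below shows one common convention, counting identities over aligned columns that contain no gap in either sequence; the abstract does not state which denominator the authors used, so this convention and the function name are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)
```

Other conventions, for example dividing by the length of the shorter sequence, give different percentages for the same alignment, which is worth keeping in mind when comparing figures from different papers.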

20.
A transposon, designated Tn5469, was isolated from mutant strain FdR1 of the filamentous cyanobacterium Fremyella diplosiphon following its insertion into the rcaC gene. Tn5469 is a 4,904-bp noncomposite transposon with 25-bp near-perfect terminal inverted repeats and has three tandemly arranged, slightly overlapping potential open reading frames (ORFs) encoding proteins of 104.6 kDa (909 residues), 42.5 kDa (375 residues), and 31.9 kDa (272 residues). Insertion of Tn5469 into the rcaC gene in strain FdR1 generated a duplicate 5-bp target sequence. On the basis of amino acid sequence identities, the largest ORF, designated tnpA, is predicted to encode a composite transposase protein. A 230-residue domain near the amino terminus of the TnpA protein has 15.4% amino acid sequence identity with a corresponding domain for the putative transposase encoded by Lactococcus lactis insertion sequence S1 (ISS1). In addition, the sequence for the carboxyl-terminal 600 residues of the TnpA protein is 20.0% identical to that for the TniA transposase encoded by Tn5090 on Klebsiella aerogenes plasmid R751. The TnpA and TniA proteins contain the D,D(35)E motif characteristic of a recently defined superfamily consisting of bacterial transposases and integrase proteins of eukaryotic retroelements and retrotransposons. The two remaining ORFs on Tn5469 encode proteins of unknown function. Southern blot analysis showed that wild-type F. diplosiphon harbors five genomic copies of Tn5469. In comparison, mutant strain FdR1 harbors an extra genomic copy of Tn5469 which was localized to the inactivated rcaC gene. Among five morphologically distinct cyanobacterial strains examined, none was found to contain genomic sequences homologous to Tn5469.
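
"Near-perfect terminal inverted repeats" means that one end of the element closely matches the reverse complement of the other end. A minimal check of that property might look like the following; the function names are hypothetical, no indels are allowed between the two ends, and the toy sequences in the note below are invented rather than taken from Tn5469.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_matches(element: str, repeat_length: int) -> int:
    """Count positions where the left end matches the reverse complement of the right end."""
    element = element.upper()
    if repeat_length <= 0 or 2 * repeat_length > len(element):
        raise ValueError("repeat_length must be positive and fit twice in the element")
    left_end = element[:repeat_length]
    right_end = reverse_complement(element[-repeat_length:])
    return sum(x == y for x, y in zip(left_end, right_end))
```

For an invented element `GGCATTAAAAAATGCC` with `repeat_length=6` all six positions match (a perfect repeat), while changing the final base to `A` leaves five of six matching, the kind of result that would be described as near-perfect.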
