Similar literature
 20 similar documents found (search time: 31 ms)
1.
A new algorithm is introduced for analyzing gene-duplication-independent (orthologous) and gene-duplication-dependent amino acid sequence similarities between proteins of different species. It is based on the calculation of an autocorrelation function D(x) as a Fourier series analogous to that used in crystal analysis by X-ray diffraction. The primary structure of the protein is decomposed into "homopolypeptide-defective sequences" containing identical or similar amino acid residues and vacancies corresponding to the missing amino acid residues. The Fourier transforms F(h) simulating the diffraction patterns of defective linear gratings corresponding to the defective homopolypeptide sequences are calculated. The squared F(h) values are then used as coefficients of Fourier series corresponding to the autocorrelation functions D(x). A peak of D(x) corresponds to a vector of length x, which is the distance between two identical amino acid residues. It is pointed out that optical diffraction methods, instead of computer methods, would also be useful. It is shown through a number of examples that this method allows satisfactory pattern recognition of homologies and internal duplications of an initial segment of the polypeptide chain. In the latter case the value of the method may be seen from the fact that it detects repeated duplications in proteins such as spinach ferredoxin and myoglobin, for which other methods had either failed or given inconclusive results. The approach appears most promising for studies of molecular evolution and structure-sequence correlations.
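The pipeline described in this abstract (residue indicator sequence, Fourier transform F(h), squared amplitudes, autocorrelation D(x)) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' program: the function names, the zero-padding to avoid wrap-around, and the toy sequence are all assumptions.

```python
import cmath
import math


def indicator(sequence, residues):
    """Hypothetical 'defective homopolypeptide': 1 where the residue is in
    the chosen set, 0 (a vacancy) everywhere else."""
    return [1.0 if aa in residues else 0.0 for aa in sequence]


def autocorrelation(signal):
    """Return D(x) for x = 0..n-1 via a discrete Fourier transform.

    The signal is zero-padded to length 2n (an assumption of this sketch)
    so that the periodic transform yields the linear autocorrelation.
    """
    n = len(signal)
    m = 2 * n
    padded = signal + [0.0] * n
    # F(h): transform of the defective linear grating; keep |F(h)|^2.
    power = []
    for h in range(m):
        f_h = sum(s * cmath.exp(-2j * math.pi * h * j / m)
                  for j, s in enumerate(padded))
        power.append(abs(f_h) ** 2)
    # D(x): Fourier series with |F(h)|^2 as coefficients.
    return [
        sum(p * math.cos(2 * math.pi * h * x / m) for h, p in enumerate(power)) / m
        for x in range(n)
    ]


d = autocorrelation(indicator("ACAACA", "A"))
print([round(v) for v in d])
```

In this toy case D(x) counts the pairs of identical residues separated by x positions, so a peak at x marks a frequently repeated spacing, which is how the abstract describes the peaks.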

2.

Background

Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats.

Results

ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and during biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences cluster according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development.

Conclusions

We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conserved repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found.

3.
Summary The amino acid sequences of four strains of tobacco mosaic virus isolated in different parts of the world are compared. The differences between the strains are discussed with respect to special protein-chemical features (such as beginning of the chain, deletion of amino acids, number of different amino acids, sizes and distribution of regions with invariable amino acids) and with respect to the possibility of deducing the most probable nucleotide sequence for the coat protein cistron of tobacco mosaic virus. The complete amino acid sequences of the two RNA bacteriophage strains fr and f2 are compared. According to their coat proteins three groups of phages can be formed: 1) MS 2, f2, M 12 and R 17, 2) fr and 3) Qβ.

4.
The nucleotide sequence of a cryptic plasmid (designated pBL90) detected in the cells of Brevibacterium lactofermentum DSM 1412 was determined. The plasmid DNA is 67826 bp long. Comparison of the nucleotide sequence of pBL90 with known plasmid sequences showed no long regions of significant homology. Computer analysis of the plasmid DNA revealed 29 open reading frames (ORFs). The amino acid sequences of 15 ORFs (approximately 25% of plasmid length) have a high (>70%) level of identity to proteins from different plasmids of Corynebacterium representatives, including replicative proteins. Unusual in pBL90 is the presence of replicative genes from two different families and types of replication.

5.
Digital coding of amino acids based on hydrophobic index
Analysis of amino acid sequences can provide useful insights into the tertiary structures of proteins and their biological functions. One of the critical problems in amino acid analysis is how to establish a digital coding system to better reflect the properties of amino acids and their degeneracy. Based on the hydrophobic index, a one-to-one relationship has been established between the amino acid sequence and the digital signal process. Such a "bridge" will make it possible to apply all the existing powerful methods in the signal processing area to analysis of the amino acid sequences.
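As a rough illustration of turning a sequence into a numeric signal, the sketch below maps each residue to a hydropathy value and applies a simple smoothing filter. The abstract does not say which hydrophobic index or coding the authors used; the Kyte-Doolittle scale here is a stand-in chosen for this example, and because several residues share a value on that scale it is not strictly one-to-one in the way the abstract describes. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale, not necessarily
# the index used in the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Turn a one-letter amino acid sequence into a list of numbers."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def moving_average(signal, window):
    """A basic signal-processing step: sliding-window mean of the signal."""
    return [
        sum(signal[i:i + window]) / window
        for i in range(len(signal) - window + 1)
    ]


signal = encode("AIGKLV")
print(signal)
print(moving_average(signal, 3))
```

Once the sequence is a numeric list, any standard signal-processing routine (filters, Fourier analysis, correlation) can be applied to it, which is the "bridge" the abstract refers to.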

6.
    
Two computerized methods of predicting protein secondary structure from amino acid sequences are evaluated by using them on the α-amylase of Aspergillus oryzae, for which the three-dimensional structure has been determined. The methods are then used, with amino acid alignments, to predict the structures of other α-amylases. It is found that all α-amylases of known amino acid sequence have the same basic structure, a barrel of eight parallel stretches of extended chain surrounded by eight helices. Strong similarities are found in those areas of the proteins believed to bind an essential calcium ion and at that part of the active site that catalyzes bond hydrolysis in the substrates. The active site, as a whole, is formed mainly of amino acids situated on loops joining extended chain to the adjacent helix. Variations in the length and amino acid sequence of these loops, from one α-amylase to another, provide the differences in binding the substrates believed to account for the known variations in action pattern of α-amylases of different biological origins.

7.
Archaea, bacteria and eukaryotes represent the main kingdoms of life. Is there any trend for amino acid compositions of proteins found in full genomes of species of different kingdoms? What is the percentage of totally unstructured proteins in various proteomes? We obtained amino acid frequencies for different taxa using 195 known proteomes and all annotated sequences from the Swiss-Prot database. Investigation of the two databases (proteomes and Swiss-Prot) shows that the amino acid compositions of proteins differ substantially for different kingdoms of life, and this difference is larger between different proteomes than between different kingdoms of life. Our data demonstrate that there is a surprisingly small selection for the amino acid composition of proteins for higher organisms (eukaryotes) and their viruses in comparison with the "random" frequency following from a uniform usage of codons of the universal genetic code. On the contrary, lower organisms (bacteria and especially archaea) demonstrate an enhanced selection of amino acids. Moreover, according to our estimates, 12%, 3% and 2% of the proteins in eukaryotic, bacterial and archaean proteomes are totally disordered, and long (> 41 residues) disordered segments are found to occur in 16% of archaean, 20% of eubacterial and 43% of eukaryotic proteins for 19 archaean, 159 bacterial and 17 eukaryotic proteomes, respectively. An analysis of the correlation between amino acid compositions of proteins of various taxa shows that the highest correlation is observed between eukaryotes and their viruses (the correlation coefficient is 0.98), and bacteria and their viruses (the correlation coefficient is 0.96), while the correlation between eukaryotes and archaea is only 0.85.

8.
We determined the complete amino acid sequences of the hemoglobin of two species, guinea fowl and California quail, in Galliformes from intact globin chain and chemical cleavage fragments in order to analyze the molecular evolution of hemoglobin for the classification of Galliformes. Galliformes have two types of hemoglobin components, HbA and HbD, which consist of an identical β chain and different α chains. The sequences are similar to globin chains of Galliformes reported previously. These sequences were compared with those of other Galliformes (Phasianidae, Meleagrididae) using duck and goshawk as out-groups. The phylogenetic tree of major groups of Galliformes based on hemoglobin was similar to the tree model produced based on the amino acid sequence of lysozyme c.

9.
10.
A new D-type retrovirus originally designated SAIDS-D/Washington and here referred to as retrovirus-D/Washington (R-D/W) was recently isolated at the University of Washington Primate Center, Seattle, Wash., from a rhesus monkey with an acquired immunodeficiency syndrome and retroperitoneal fibromatosis. To better establish the relationship of this new D-type virus to the prototype D-type virus, Mason-Pfizer monkey virus (MPMV), we have purified and compared six structural proteins from each virus. The proteins purified from each D-type retrovirus include p4, p10, p12, p14, p27, and a phosphoprotein designated pp18 for MPMV and pp20 for R-D/W. Amino acid analysis and N-terminal amino acid sequence analysis show that the p4, p12, p14, and p27 proteins of R-D/W are distinct from the homologous proteins of MPMV but that these proteins from the two different viruses share a high degree of amino acid sequence homology. The p10 proteins from the two viruses have similar amino acid compositions, and both are blocked to N-terminal Edman degradation. The phosphoproteins from the two viruses each contain phosphoserine but are different from each other in amino acid composition, molecular weight, and N-terminal amino acid sequence. The data thus show that each of the R-D/W proteins examined is distinguishable from its MPMV homolog and that a major difference between these two D-type retroviruses is found in the viral phosphoproteins. The N-terminal amino acid sequences of D-type retroviral proteins were used to search for sequence homologies between D-type and other retroviral amino acid sequences. An unexpected amino acid sequence homology was found between R-D/W pp20 (a gag protein) and a 28-residue segment of the env precursor polyprotein of Rous sarcoma virus. 
The N-terminal amino acid sequences of the D-type major gag protein (p27) and the nucleic acid-binding protein (p14) show only limited amino acid sequence homology to functionally homologous proteins of C-type retroviruses.

11.
The review deals with repeating fragments of amino acid sequences, so-called "motifs", that are important in maintaining the structural integrity and/or function of various proteins, especially those interacting with phospholipid aggregates. The occurrence of the Phe-Leu-Gly motif, characteristic of the amino acid sequence of the fusion peptides of primate immunodeficiency viruses, is analysed in various proteins, as are tripeptide fragments of the general formula Xaa-Xab-Gly (Xaa = Phe, Tyr; Xab = hydrophobic amino acids Ala, Val, Leu, Ile) homologous to the above motif and the retro-sequences Gly-Xab-Xaa. These tripeptide repeats are characteristic of the amino acid sequences of complex membrane proteins, viral envelope proteins, proteinases and proteins connected with energy transfer or interacting with lipids. These repeats are frequently found in conserved regions of amino acid sequences, at sites readily accessible to other molecules at the boundary of or between structured fragments, owing to the preference of the backbone for a semi-coiled form in the given amino acid fragment. This protein motif appears to play an important role in the initial stages of a large protein's interaction with the phospholipid membrane.
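A search for the tripeptide pattern described above, Xaa-Xab-Gly with Xaa in {Phe, Tyr} and Xab in {Ala, Val, Leu, Ile}, together with its retro-sequence Gly-Xab-Xaa, could be sketched with regular expressions over one-letter codes. The function name and the example sequence are made up for illustration; the review itself does not describe a search procedure.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
FORWARD = re.compile(r"(?=([FY][AVLI]G))")   # Xaa-Xab-Gly
RETRO = re.compile(r"(?=(G[AVLI][FY]))")     # Gly-Xab-Xaa


def find_motifs(sequence):
    """Return (forward_hits, retro_hits) as lists of (position, tripeptide).

    Positions are 0-based indices into the one-letter sequence.
    """
    seq = sequence.upper()
    forward = [(m.start(), m.group(1)) for m in FORWARD.finditer(seq)]
    retro = [(m.start(), m.group(1)) for m in RETRO.finditer(seq)]
    return forward, retro


# Hypothetical toy sequence, not taken from any real protein.
forward_hits, retro_hits = find_motifs("MFLGAAGVYK")
print(forward_hits)
print(retro_hits)
```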

12.
Conserved and variable elements in RNA genomes of potexviruses
The nucleotide sequences of genomic RNAs and predicted amino acid sequences of two strains of potato virus X and white clover mosaic potexvirus were compared to each other and to the proteins of different plus-RNA-containing plant viruses. The predicted non-virion proteins of potexviruses have direct sequence homology and common structural peculiarities with those of several 'Sindbis-like' plant viruses. The most conserved amino acid sequences were found to be located in the polypeptide encoded by the long 5'-proximal open reading frame (ORF1). The putative polypeptide encoded by the ORF2 starting beyond the ORF1 stop codon is clearly related to the presumptive NTP-binding domain of the ORF1-coded polypeptide. These results suggest possible functions for all of the potexvirus proteins and also indicate that potexviruses have a genome organization which is considerably different from that of other plant viruses.

13.

Background

The influenza A(H1N1)2009 virus has been the dominant type of influenza A virus in Finland during the 2009–2010 and 2010–2011 epidemic seasons. We analyzed the antigenic characteristics of several influenza A(H1N1)2009 viruses isolated during the two influenza seasons by analyzing the amino acid sequences of the hemagglutinin (HA), modeling the amino acid changes in the HA structure and measuring antibody responses induced by natural infection or influenza vaccination.

Methods/Results

Based on the HA sequences of influenza A(H1N1)2009 viruses we selected 13 different strains for antigenic characterization. The analysis included the vaccine virus, A/California/07/2009 and multiple California-like isolates from 2009–2010 and 2010–2011 epidemic seasons. These viruses had two to five amino acid changes in their HA1 molecule. The mutation(s) were located in antigenic sites Sa, Ca1, Ca2 and Cb region. Analysis of the antibody levels by hemagglutination inhibition test (HI) indicated that vaccinated individuals and people who had experienced a natural influenza A(H1N1)2009 virus infection showed good immune responses against the vaccine virus and most of the wild-type viruses. However, one to two amino acid changes in the antigenic site Sa dramatically affected the ability of antibodies to recognize these viruses. In contrast, the tested viruses were indistinguishable in regard to antibody recognition by the sera from elderly individuals who had been exposed to the Spanish influenza or its descendant viruses during the early 20th century.

Conclusions

According to our results, one to two amino acid changes (N125D and/or N156K) in the major antigenic sites of the hemagglutinin of influenza A(H1N1)2009 virus may lead to significant reduction in the ability of patient and vaccine sera to recognize A(H1N1)2009 viruses.

14.
The amino acid composition of human alcohol dehydrogenase (ADH) was compared with alcohol dehydrogenases from different organisms and with other proteins. Similar amino acid sequences in human ADH (template protein) and in other proteins were determined by means of an original computer program. Analysis of amino acid motifs reveals that the ADHs from evolutionarily closer organisms have more amino acid sequences in common. The quantitative measure of amino acid similarity was the number of similar motifs in the analyzed protein per protein length. This value was measured for ADHs and for different proteins. For ADHs, this quotient was higher than for proteins with different functions; for vertebrates it correlated with evolutionary closeness. A similar motif comparison was made with the help of the MEME program suite. The analysis of ADHs revealed 4 motifs common to 6 of 10 tested organisms and no such motifs for proteins of different function. The conclusion is that general amino acid composition is more important for protein function than amino acid order, and that for enzymes of similar function it correlates better with the evolutionary distance between organisms.
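The similarity measure in this abstract (number of motifs shared with the template protein, divided by the length of the analyzed protein) might be approximated as below. The authors' program and their definition of a "similar motif" are not given, so this sketch assumes a motif is an exact k-residue window; the function names, the value of k and the toy sequences are all hypothetical.

```python
def kmers(sequence, k):
    """Set of all length-k windows (assumed 'motifs') in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def motif_similarity(template, protein, k=4):
    """Windows of `protein` that also occur in `template`, per residue.

    Exact matching is an assumption of this sketch; the original program
    may have allowed conservative substitutions.
    """
    template_kmers = kmers(template, k)
    shared = sum(
        1
        for i in range(len(protein) - k + 1)
        if protein[i:i + k] in template_kmers
    )
    return shared / len(protein)


# Toy sequences for illustration only.
print(motif_similarity("ACDEFGHIK", "XXCDEFXXXX"))
print(motif_similarity("ACDEFGHIK", "ACDEFGHIK"))
```

Dividing by protein length keeps long proteins from scoring higher simply because they contain more windows.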

15.
The genetic code is examined for indications of possible preceding codes that existed during early evolution. Eight of the 20 amino acids are coded by ‘quartets’ of codons with four-fold degeneracy, and 16 such quartets can exist, so that an earlier code could have provided for 15 or 16 amino acids, rather than 20. If two-fold degeneracy is postulated for the first position of the codon, there could have been 10 amino acids in the code. It is speculated that these may have been phenylalanine, valine, proline, alanine, histidine, glutamine, glutamic acid, aspartic acid, cysteine and glycine. There is a notable deficiency of arginine in proteins, despite the fact that it has six codons. Simultaneously, there is more lysine in proteins than would be expected from its two codons, if the four bases in mRNA are equiprobable and are arranged randomly. It is speculated that arginine is an ‘intruder’ into the genetic code, and that it may have displaced another amino acid such as ornithine, or may even have displaced lysine from some of its previous codon assignments. As a result, natural selection has favored lysine against the fact that it has only two codons. The introduction of tRNA into protein synthesis may have been a cataclysmic and comparatively sudden event, since duplication of tRNA takes place readily, and point mutations could rapidly differentiate members of the family of duplicates from each other. Two tRNAs for different amino acids may have a common ancestor that existed more recently than the separation of the prokaryotes and eukaryotes. This is shown by homology of two E. coli tRNAs for glycine and valine, and two yeast tRNAs for arginine and lysine.
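The counting argument about codon quartets can be checked against the standard genetic code. The sketch below builds the standard codon table (written with DNA letters, T in place of U) and lists the amino acids for which all four third-position bases of a codon family give the same residue. The helper names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def quartet_amino_acids(table):
    """Amino acids with at least one four-fold degenerate codon family."""
    found = set()
    for first, second in product(BASES, repeat=2):
        encoded = {table[first + second + third] for third in BASES}
        if len(encoded) == 1 and "*" not in encoded:
            found |= encoded
    return sorted(found)


def codon_counts(table):
    """Number of codons assigned to each amino acid (and to stop, '*')."""
    return Counter(table.values())


print(quartet_amino_acids(CODON_TABLE))
counts = codon_counts(CODON_TABLE)
print(counts["R"], counts["K"])
```

The codon counts for arginine and lysine are the two figures on which the abstract's argument about arginine as an "intruder" rests.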

16.
Summary NTP-motif, a consensus sequence previously shown to be characteristic of numerous NTP-utilizing enzymes, was identified in nonstructural proteins of several groups of positive-strand RNA viruses. These groups include picorna-, alpha-, and coronaviruses infecting animals and como-, poty-, tobamo-, tricorna-, hordei-, and furoviruses of plants, totalling 21 viruses. It has been demonstrated that the viral NTP-motif-containing proteins constitute three distinct families, the sequences within each family being similar to each other at a statistically highly significant level. A lower, but still valid similarity has also been revealed between the families. An overall alignment has been generated, which includes several highly conserved sequence stretches. The two most prominent of the latter contain the so-called A and B sites of the NTP-motif, with four of the five invariant amino acid residues observed within these sequences. These observations, taken together with the results of comparative analysis of the positions occupied by respective proteins (domains) in viral multidomain proteins, suggest that all the NTP-motif-containing proteins of positive-strand RNA viruses are homologous, constituting a highly diverged monophyletic group. In this group the A and B sites of the NTP-motif are the most conserved sequences and, by inference, should play the principal role in the functioning of the proteins. A hypothesis is proposed that all these proteins possess NTP-binding capacity and possibly NTPase activity, performing some NTP-dependent function in viral RNA replication. The importance of phylogenetic analysis for the assessment of the significance of the occurrence of the NTP-motif (and of sequence motifs of this sort in general) in proteins is emphasized.

17.
To facilitate swift structural characterizations, structural genomic/proteomic projects need to divide large multi-domain proteins into structural domains and to determine their structures separately. Thus, the assignment of structural domains based solely on sequence information, especially on the physico-chemical properties of the amino acid sequences, could be very helpful for such projects. In this study, we examined the characteristics of domain linker sequences, which are loop sequences connecting two structural domains. To this end, we prepared a set of 101 non-redundant multi-domain protein sequences with known structures, and performed an analysis of the linker sequences. The analysis revealed that the frequencies of five (Pro, Gly, Asp, Asn, Lys) amino acid residues differed significantly between the linker and non-linker loop sequences. Moreover, we observed a similar deviation for the residue pair frequencies between the two types of loop sequences. Finally, we describe an automated method, based on the above analysis, to detect loops that have high probabilities of being domain linkers in a protein sequence.

18.
Proceeding from the amino acid sequences of a number of proteins, with the help of a special computer program we have determined the frequency of pyrimidine isopliths of different length, the degree of clustering and the degree of asymmetry of complementary chains of the corresponding DNA cistrons, as well as the range of variation of these parameters, which depends on the code degeneracy. The degree of asymmetry of the chains of DNA cistrons (H/L), calculated for 255 proteins of known composition, may vary from 0.7 to 1.8. For 90% of these proteins the mean Py/Pu ratio in the coding chain of DNA is above 1. The conclusion has been made that the majority of amino acids contained in the proteins are coded for by purine triplets. It was found that the distribution of pyrimidine isopliths between DNA cistrons coding for different proteins is other than random and has a "DNA-like" character. The degree of clustering of pyrimidines (beta) in cistrons of different proteins may vary from 6.0 to 14.3. The cistrons of some proteins were found to contain long pyrimidine fragments with about 24 residues. A positive correlation (r2 = 0.74) was found to exist between the degree of clustering of pyrimidines and the degree of asymmetry of the chains of DNA cistrons corresponding to different proteins.

19.
Summary Gamma-carboxyglutamic acid is an amino acid with a dicarboxylic acid side chain. This amino acid, with unique metal binding properties, confers metal binding character to the proteins into which it is incorporated. This amino acid has been discovered in blood coagulation proteins (prothrombin, Factor X, Factor IX, and Factor VII), plasma proteins of unknown function (Protein C, Protein S, and Protein Z), and proteins from calcified tissue (osteocalcin and bone-Gla protein). It has also been observed in renal calculi, atherosclerotic plaque, and the egg chorioallantoic membrane, among other tissues. Gamma-carboxyglutamic acid is synthesized by the post-translational modification of glutamic acid residues. This reaction, catalyzed by a hepatic carboxylase, requires reduced vitamin K, oxygen, and carbon dioxide. The function of γ-carboxyglutamic acid is uncertain. In prothrombin γ-carboxyglutamic acid residues bound to metal ions participate as an intramolecular non-covalent bridge to maintain protein conformation. Additionally, these amino acids participate in the calcium-dependent molecular assembly of proteins on membrane surfaces through intermolecular bridges involving γ-carboxyglutamic acid and metal ions.

20.
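Cui Zhizhong, Qin Aijian. 《生命科学》 (Life Sciences), 2000, 12(4): 155-156
Every protein antigenic epitope contains both a relatively conserved amino acid sequence that serves as the "anchor" for binding to the corresponding MHC molecule, and a specific amino acid sequence that binds specifically to the TCR molecule on a particular cytolytic T cell (CTL) or to an antibody molecule. Molecular immunology has accumulated a considerable body of data on the former, but little has been reported on the latter. By analyzing the reactivity of 9 different strains of Marek's disease virus to 2 monoclonal antibodies, together with their 38 kD phosphoprotein genes, the authors identified the amino acid composition that determines the specificity of two overlapping but distinct antigenic epitopes.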
