Similar literature
 Found 20 similar articles; search time 31 ms
1.
The amino acid compositions of many of the known sequenced proteins were compared to determine if a relationship exists between the amino acid composition and the amount of sequence homology of these proteins. For this purpose, a function, the composition divergence, was defined. A computer program based on relatively simple assumptions has been successfully used to simulate the data distribution found in comparing composition divergence with the sequence dissimilarity. From this characterization of known proteins, it is possible to predict the sequence homology of unsequenced proteins, based on amino acid composition comparisons. The power and limitations of this technique are discussed.
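The abstract does not give the formula for the composition divergence, so the following is only a minimal sketch of the general idea: turn each sequence into a 20-component composition vector and measure a distance between the vectors. The Euclidean distance and the function names `composition` and `composition_divergence` are assumptions made for illustration, not the authors' definition.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq):
    """Return the fraction of each standard amino acid in seq, in AMINO_ACIDS order."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def composition_divergence(seq_a, seq_b):
    """Euclidean distance between two composition vectors.

    Assumed form for illustration; the paper's own function may differ.
    """
    pa, pb = composition(seq_a), composition(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

Two sequences with identical composition give a divergence of zero regardless of residue order, which is why a measure of this kind can only suggest, not establish, sequence homology.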

2.
We have critically evaluated hydrodynamic data from 21 proteins whose molecular dimensions are known from X-ray crystallography. We present two useful equations relating the molecular weights and sedimentation coefficients of globular proteins. The hydrodynamic data combined with data for small molecules from the literature indicate that failure of the Stokes equation occurs only for molecular weights <850. Calculated hydration values for the 21 proteins have a mean value and standard deviation of 0.53 ± 0.26 g H2O/g protein. Furthermore, statistical arguments indicate that only 5.3% of the variance is due to experimental error. The mean value and especially the dispersion of values are in sharp contrast to the values 0.36 ± 0.04 obtained by others from NMR measurements on frozen protein solutions. Hydration values calculated from NMR measurements are closely correlated with the number of charged and polar amino acid residues. In contrast to this result, our analysis of the amino acid compositions of the four proteins with the lowest hydration and the four monomeric proteins with the highest shows that the range of values we observe cannot be accounted for on the basis of amino acid composition. In fact there appears to be a weak correlation between the number of apolar residues and hydrodynamic hydration. We therefore conclude that the dispersion must result from variations in fine details of the surface structures of individual proteins. We propose a model of hemispherical clathrate cages which, if correct, would account for the differences in the data obtained by these two methods.

3.
The outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is an important task for understanding some biochemical processes. In this study, a method that combines increment of diversity with a modified Mahalanobis discriminant, called IDQD, is presented to predict 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs) by using Chou's pseudo amino acid compositions as parameters. The overall accuracy of jackknife cross-validation is 93.2% and 96.1%, respectively, for three datasets (OMPs, TMHPs and GPs) and two datasets (OMPs and non-OMPs). These predicted results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition can better reflect the core features of membrane proteins than the classical amino acid composition.
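The abstract names the increment of diversity but does not state it. The sketch below uses the form commonly given in the increment-of-diversity literature, D(X) = N ln N − Σ nᵢ ln nᵢ and ID(X, Y) = D(X + Y) − D(X) − D(Y); treating this as the measure used in IDQD is an assumption. The modified Mahalanobis discriminant step is not shown, and the function names are hypothetical.

```python
import math


def diversity(counts):
    """Diversity measure D(X) = N ln N - sum(n_i ln n_i) for a vector of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length.

    Small values mean the two count profiles are similar in shape.
    """
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)
```

Under this definition, a query whose feature counts are proportional to a class profile yields an increment near zero, so a classifier of this kind would assign the query to the class with the smallest increment.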

4.
Three abundant proteins of approximate molecular masses of 22, 23, and 24 kilodaltons were purified from potato (Solanum tuberosum L.) tubers by DEAE cellulose and CM-52 cellulose ion exchange column chromatography, electroelution, and high-pressure liquid chromatography (HPLC). Antibodies specific to the gel-purified 22-kilodalton protein were prepared. Immunoblot analysis showed that the 22-, 23-, and 24-kilodalton proteins are immunologically related and that these proteins are present in tubers and as higher molecular mass forms in leaves, but not in stems, roots, and stolons. The ratios of amino acid composition were compared among the three purified proteins, and the amino-terminal amino acid sequences were determined for these three proteins. All three proteins have identical amino-terminal sequences that match the deduced amino acid sequence of an abundant tuber protein cDNA.

5.
Amino acid sequences from several thousand homologous gene pairs were compared for two plant genomes, Oryza sativa and Arabidopsis thaliana. The Arabidopsis genes all have similar G+C (guanine plus cytosine) contents, whereas their homologs in rice span a wide range of G+C levels. The results show that those rice genes that display increased divergence in their nucleotide composition (specifically, increased G+C content) showed a corresponding, predictable change in the amino acid compositions of the encoded proteins relative to their Arabidopsis homologs. This trend was not seen in a "control" set of rice genes that had nucleotide contents closer to their Arabidopsis homologs. In addition to showing an overall difference in the amino acid composition of the homologous proteins, we were also able to investigate the biased patterns of amino acid substitution since the divergence of these two species. We found that the amino acid exchange matrix was highly asymmetric when comparing the High G+C rice genes with their Arabidopsis homologs. Finally, we investigated the possible causes of this biased pattern of sequence evolution. Our results indicate that the biased pattern of protein evolution is the consequence, rather than the cause, of the corresponding changes in nucleotide content. In fact, there is an even more marked asymmetry in the patterns of substitution at synonymous nucleotide sites. Surprisingly, there is a very strong negative correlation between the level of nucleotide bias and the length of the coding sequences within the rice genome. This difference in gene length may provide important clues about the underlying mechanisms.
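The G+C content that this comparison rests on is simple to compute. The following is a minimal sketch, assuming plain DNA strings; ignoring ambiguous bases such as `N` is a choice made here for illustration, and the function name `gc_content` is hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a DNA string."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / unambiguous
```

Applying such a function to each member of a homologous gene pair gives the per-gene G+C difference against which changes in amino acid composition could be examined.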

6.
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or “words”. We first confirmed that the English language very likely follows Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and “compressed” English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
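A rank-frequency analysis of this kind can be sketched as follows. The code counts overlapping fixed-length "words" (k-mers) and estimates a Zipf-style exponent as the negated least-squares slope of log frequency against log rank. Using fixed-length overlapping k-mers as the SCSs, fitting over all ranks rather than excluding the tails, and the function names are all assumptions for illustration; the availability score from the abstract is not reproduced.

```python
from collections import Counter
import math


def kmer_rank_frequency(sequences, k):
    """Count overlapping k-mers across sequences; return counts sorted from most to least frequent."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return sorted(counts.values(), reverse=True)


def zipf_exponent(frequencies):
    """Negated least-squares slope of log(frequency) versus log(rank), ranks starting at 1."""
    n = len(frequencies)
    if n < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

An exponent near 1 corresponds to the classic Zipf distribution; the abstract reports smaller exponents for proteins than for English.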

7.
Data on the amino acid composition of proteins having various functions from organisms representing different evolutionary levels (83 superfamilies) are used in order to elucidate the trends in protein molecular evolution. The relationships between evolutionary rate (rate of mutation acceptance) and amino acid composition, and between the evolutionary level of the organism and amino acid composition (for proteins of the same or very similar function), are studied. The amino acid compositions of proteins jointly performing evolutionarily old functions are also juxtaposed. The mean contemporary protein composition is used as a basis for comparison. The obtained results are evidence in favour of the existence of a trend for an increase of the special amino acids (Met, Ile, Gln, His, Lys, Asn, Phe, Tyr, Trp, Cys) at the expense of the usual ones (Thr, Pro, Ala, Ser, Arg, Gly, Leu, Val, Glu, Asp). The tests of statistical significance of the obtained results (comparison of the mean compositions of proteins from low evolutionary level organisms with that of all sequenced proteins; comparison of the mean contemporary protein composition with that obtained after simulation of the evolutionary process) confirm and universalize the observed trend. The above results direct attention to the concept of a smaller number of amino acids in the ancient proteins and a correspondingly simpler genetic code. A fluctuation around the initial primitive level is suggested to explain the conservatism of proteins of the same function in evolutionarily low level organisms. The observed trend could be applied for designing new proteins.

8.
We have isolated from rabbit liver three cDNA clones of 1400-1800 base pairs that hybridize selectively to RNA from animals treated with phenobarbital. The nucleotide sequences of the cDNAs have been determined. In the protein coding region the nucleotide sequences of two of the cDNAs are 88% homologous, and the third cDNA is about 72-74% homologous to the other two. All three are 55-60% homologous to rat liver cytochrome P-450b cDNA. The amino acid sequences derived from the cDNA sequences are about 50% homologous to those of rat liver cytochrome P-450b and rabbit liver cytochrome P-450 (form 2). The degree of homology differs substantially in different regions of the protein. The hydrophobicity profiles of these five mammalian cytochromes P-450 are very similar and contain up to eight regions of hydrophobicity that are long enough to span a membrane. These results indicate that these three cDNAs code for rabbit liver cytochromes P-450 which are different from any rabbit liver cytochrome P-450 for which amino acid sequence information is published. These cDNAs are part of a family of genes that are related to rabbit liver cytochrome P-450 (form 2) and rat liver cytochrome P-450b which are the major phenobarbital-inducible forms. The divergence of amino acid sequence between the rat and rabbit forms and the divergence of nucleotide sequences of silent sites in the two most closely related rabbit forms suggest that cytochromes P-450 have a relatively high rate of amino acid divergence compared to many other vertebrate proteins.

9.
A method for separating the three human protamines by HPLC of underivatized, total protamine extracts on a Nucleosil RP-C18 column is described. The identities of the three proteins have been confirmed by a combination of disc gel electrophoresis, amino acid composition, and primary sequence analysis. The results show that human protamine 3 elutes first, closely followed by protamine 2. Protamine 1 elutes later. The amino acid compositions and partial amino terminal sequences of human protamines 2 and 3 indicate that these two proteins are very closely related and suggest that they differ only by three amino-terminal amino acids.

10.
As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or fewer of genes can be predicted when a novel genome is decoded. A prediction method independent of the homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map ‘BLSOM’ that clustered genomic fragments according to phylotype with no advance knowledge of phylotype. Using BLSOM for di-, tri- and tetrapeptide compositions, we developed a system to enable separation (self-organization) of proteins by function. Analyzing oligopeptide frequencies in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs could faithfully reproduce the COG classifications. This indicated that proteins whose functions are unknown because of a lack of significant sequence similarity with function-known proteins can be related to function-known proteins based on similarity in oligopeptide composition. BLSOM was applied to predict functions of vast quantities of proteins derived from mixed genomes in environmental samples.
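The input to a map of this kind is an oligopeptide composition vector for each protein. The sketch below builds one for a chosen peptide length k (400 components for dipeptides). It is only the feature-extraction step under stated assumptions: overlapping windows, the 20 standard residues, and normalization to fractions are choices made here, the function name is hypothetical, and the BLSOM training itself is not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def oligopeptide_composition(seq, k=2):
    """Map every possible k-peptide to its fraction among the overlapping k-peptides of seq.

    Windows containing non-standard residues are ignored.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[key] for key in keys)
    if total == 0:
        raise ValueError("sequence contains no standard k-peptides")
    return {key: counts[key] / total for key in keys}
```

The vector grows as 20^k (8,000 components for tripeptides and 160,000 for tetrapeptides), so short proteins give very sparse vectors at larger k.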

11.
12.
13.
Amino and carboxyl terminal groups, amino acid composition, and peptide maps of polyhedral proteins of the nuclear polyhedrosis viruses (NPV) of Bombyx mori and Galleria mellonella were investigated. It is shown that both proteins have a tyrosine residue as their carboxyl terminal group and no amino terminal group. Amino acid compositions of the proteins are similar. The proteins are found to have 242 residues. From the amino acid composition, a molecular weight of 28,000 was calculated. The tryptic peptide maps of the two proteins differed only in a few peptides. It is inferred that the polyhedral proteins of B. mori and G. mellonella NPV have a closely similar primary structure.

14.
The global amino acid compositions as deduced from the complete genomic sequences of six thermophilic archaea, two thermophilic bacteria, 17 mesophilic bacteria and two eukaryotic species were analysed by hierarchical clustering and principal components analysis. Both methods showed an influence of several factors on amino acid composition. Although GC content has a dominant effect, thermophilic species can be identified by their global amino acid compositions alone. This study presents a careful statistical analysis of factors that affect amino acid composition and also yielded specific features of the average amino acid composition of thermophilic species. Moreover, we introduce the first example of a 'compositional tree' of species that takes into account not only homologous proteins, but also proteins unique to particular species. We expect this simple yet novel approach to be a useful additional tool for the study of phylogeny at the genome level.

15.
We have performed an amino acid composition (AAC) analysis of the complete sequences for 235 secondary transport proteins from Escherichia coli, which have functions in the uptake and export of organic and inorganic metabolites, efflux of drugs and in controlling membrane potential. This revealed the trends in content for specific amino acid types and for combinations of amino acids with similar physicochemical properties. In certain proteins or groups of proteins, so-called spikes of high content for a specific amino acid type or combination of amino acids were identified and confirmed statistically, which in some cases could be directly related to function and ligand specificity. This was prevalent in proteins with a function of multidrug or metal ion efflux. Any tool that can help in identifying bacterial multidrug efflux proteins is important for a better understanding of this mechanism of antibiotic resistance. Phylogenetic analysis based on sequence alignments and comparison of sequences at the N- and C-terminal ends confirmed transporter family classification. Locations of specific amino acid types in some of the proteins that have crystal structures (EmrE, LacY, AcrB) were also considered to help link amino acid content with protein function. Though there are limitations, this work has demonstrated that a basic analysis of AAC is a useful tool to use in combination with other computational and experimental methods for classifying and investigating function and ligand specificity in a large group of transport or other membrane proteins, including those that are molecular targets for development of new drugs.

16.
The globin family has long been known from studies of approximately 150-residue proteins such as vertebrate myoglobins and haemoglobins. Recently, this family has been enriched by the investigation of the sequences and structures of truncated globins, which have the same basic topology but are approximately 30 residues shorter and exhibit functions other than the familiar one of binding diatomic ligands. The divergence of protein sequences, structures and functions reveals Nature's exploration of the potential inherent in a folding pattern, that is, the topology of the native structure. The observation of what remains constant and what varies during the evolution of a protein family reveals essential features of structure and function. Study of proteins with a wide range of divergence can therefore sharpen our understanding of how different amino acid sequences can determine similar three-dimensional structures. Globins have provided, and continue to provide, interesting material for such studies.

17.
One of the well-known observations of proteins from thermophilic bacteria is the bias of the amino acid composition in which charged residues are present in large numbers, and polar residues are scarce. On the other hand, it has been reported that the molecular surfaces of proteins are adapted to their subcellular locations, in terms of the amino acid composition. Thus, it would be reasonable to expect that the differences in the amino acid compositions between proteins of thermophilic and mesophilic bacteria would be much greater on the protein surface than in the interior. We performed systematic comparisons between proteins from thermophilic bacteria and mesophilic bacteria, in terms of the amino acid composition of the protein surface and the interior, as well as the entire amino acid chains, by using sequence information from the genome projects. The biased amino acid composition of thermophilic proteins was confirmed, and the differences from those of mesophilic proteins were most obvious in the compositions of the protein surface. In contrast to the surface composition, the interior composition was not distinctive between the thermophilic and mesophilic proteins. The frequency of amino acid pairs located close together in space was also analyzed and showed the same trend as the single amino acid compositions. Interestingly, extracellular proteins from mesophilic bacteria showed the inverse trend relative to thermophilic proteins (i.e. fewer charged residues and more polar residues). Nuclear proteins from eukaryotes, which are known to be abundant in positive charges, showed different compositions as a whole from the thermophiles. These results suggest that the bias of the amino acid composition of thermophilic proteins is due to the residues on the protein surfaces, which may be constrained by the extreme environment.
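The charged-versus-polar bias described here can be quantified per sequence with a short helper. This is a minimal sketch, assuming that charged means Asp, Glu, Lys and Arg and that polar means the uncharged residues Asn, Gln, Ser and Thr; the abstract does not list its residue classes, so this grouping and the function name are assumptions. Separating surface from interior residues would additionally require structural or predicted accessibility data, which is not shown.

```python
CHARGED = set("DEKR")  # assumed charged class: Asp, Glu, Lys, Arg
POLAR = set("NQST")    # assumed uncharged polar class: Asn, Gln, Ser, Thr


def charged_polar_fractions(seq):
    """Return (charged fraction, polar fraction) of the residues in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    charged = sum(1 for residue in seq if residue in CHARGED)
    polar = sum(1 for residue in seq if residue in POLAR)
    return charged / len(seq), polar / len(seq)
```

Comparing the two fractions across thermophilic and mesophilic proteomes would be one simple way to look for the trend the abstract reports.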

18.
Archaea-specific radA primers were used with PCR to amplify fragments of radA genes from 11 cultivated archaeal species and one marine sponge tissue sample that contained essentially an archaeal monoculture. The amino acid sequences encoded by the PCR fragments, three RadA protein sequences previously published (21), and two new complete RadA sequences were aligned with representative bacterial RecA proteins and eucaryal Rad51 and Dmc1 proteins. The alignment supported the existence of four insertions and one deletion in the archaeal and eucaryal sequences relative to the bacterial sequences. The sizes of three of the insertions were found to have taxonomic and phylogenetic significance. Comparative analysis of the RadA sequences, omitting amino acids in the insertions and deletions, shows a cladal distribution of species which mimics to a large extent that obtained by a similar analysis of archaeal 16S rRNA sequences. The PCR technique also was used to amplify fragments of 15 radA genes from uncultured natural sources. Phylogenetic analysis of the amino acid sequences encoded by these fragments reveals several clades with affinity, sometimes only distant, to the putative RadA proteins of several species of Crenarchaeota. The two most deeply branching archaeal radA genes found had some amino acid deletion and insertion patterns characteristic of bacterial recA genes. Possible explanations are discussed. Finally, signature codons are presented to distinguish among RecA protein family members.

19.
Correlations between genomic GC contents and amino acid frequencies were studied in the homologous sequences of 12 eubacterial genomes. Results show that the frequencies of amino acids encoded by GC-rich codons increase significantly with genomic GC content, whereas the opposite trend was observed for amino acids encoded by GC-poor codons. Further studies show that not all amino acids change in the direction predicted from their genomic GC pressure, suggesting that protein evolution is not entirely dictated by nucleotide frequencies. An amino acid substitution matrix calculated among hydrophobic, amphipathic and hydrophilic amino acid groups shows that amphipathic and hydrophilic amino acids are more frequently substituted by hydrophobic amino acids than hydrophobic amino acids are by hydrophilic or amphipathic ones. This indicates that nucleotide bias induces directional changes in proteome composition that strongly alter hydropathy values. In fact, significant increases in hydrophobicity values have also been observed with the increase of genomic GC content. Correlations between GC contents and amino acid compositions in three different predicted protein secondary structures show that hydropathy values increase significantly with GC content in aperiodic and helix structures, whereas strand structure remains insensitive to the genomic GC level. The relative importance of mutation and selection in the evolution of proteins is discussed on the basis of these results.

20.
A superdomain is uniquely defined in this work as a conserved combination of different globular domains in different proteins. The amino acid sequences of 25 structurally and functionally diverse proteins from fungi, plants, and animals have been analyzed in a test of the superdomain hypothesis. Each of the proteins contains a protein tyrosine phosphatase (PTP) domain followed by a C2 domain. Four novel conserved sequence motifs have been identified, one in the PTP domain and three in the C2 domain. All contribute to the PTP-C2 domain interface in PTEN, a tumor suppressor, and all are more conserved than the PTP signature motif, HCX3(K/R)XR, in the 25 sequences. We show that PTP-C2 was formed prior to the divergence of the fungal, plant, and animal kingdoms. A superdomain as defined here does not fit the usual protein structure classification system. The demonstrated existence of one superdomain suggests the existence of others.
