Similar literature
 20 similar articles found (search time: 31 ms)
1.
Feng ZP 《Biopolymers》2001,58(5):491-499
A new representation of protein sequence is presented in this paper, in which each protein is represented by a 20-dimensional (20D) vector of unit length. Inspired by the principle of superposition of states in quantum mechanics, the squares of the 20 components of the vector correspond to the amino acid composition. Using the new representation of the primary sequence and the Bayes discriminant algorithm, the subcellular location of prokaryotic proteins was predicted. The overall predictive accuracy in the jackknife test is 3% higher than the result of using amino acid composition directly for the database in which sequence identity is less than 90%, and 5% higher when sequence identity is less than 80%. The higher predictive accuracy indicates that the current measure of extracting information from the primary sequence is efficient. Since the subcellular location restricts a protein's possible function, the present method should also be a useful measure for the systematic analysis of genome data. The program used in this paper is available on request.
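
The representation described in this abstract can be illustrated with a minimal sketch. It assumes the 20 standard amino acids in alphabetical one-letter order and takes each vector component as the square root of the corresponding composition fraction, so that the squared components sum to one. The function names are hypothetical and the authors' program may differ in detail.

```python
import math
from collections import Counter

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the amino acid composition as 20 fractions that sum to 1."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def unit_vector(seq):
    """Return a 20D vector whose squared components equal the composition."""
    return [math.sqrt(f) for f in composition(seq)]
```

Because the composition fractions sum to one, the resulting vector has unit length by construction, which is the property the abstract relies on.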

2.
The Zp curve, a three-dimensional space curve representation of protein primary sequence based on the hydrophobicity and charge properties of amino acid residues along the primary sequence, is proposed. Relying on the Zp parameters extracted from the three components of the Zp curve and the Bayes discriminant algorithm, the subcellular locations of prokaryotic proteins were predicted. An accuracy of 81.5% in the cross-validation test was achieved using 13 parameters extracted from the curve for the database of 997 prokaryotic proteins. The result is slightly better than that of the neural network method (80.9%) based on the amino acid composition for the same database. By combining the amino acid composition and the Zp parameters, an overall predictive accuracy of 89.6% can be achieved. It is about 3% higher than that of the Bayes discriminant algorithm based merely on the amino acid composition for the same database. The prediction was also performed with a larger dataset derived from version 39 of the SWISS-PROT databank and two datasets with different sequence similarity. Even for the dataset without sequence similarity, the improvement can be 4.4% in the cross-validation test. The results indicate that the Zp parameters are effective in representing the information within a protein primary sequence. The method of extracting information from the primary structure may be useful for other areas of protein studies.

3.
By far the best understood role of the proteasome is to remove ubiquitin-conjugated proteins from eukaryotic cells by hydrolysing them into small peptides of varying lengths. These include both misfolded/abnormal proteins, as well as 'normal' proteins that need to be rapidly removed for regulatory purposes. However, the proteasome is also present in numerous prokaryotic organisms, while ubiquitin, and the ubiquitin conjugating system, are not. The eukaryotic proteasome has been adapted to degrading proteins in a ubiquitin-dependent fashion by the addition of regulatory factors that assemble in different layers onto the proteolytic core of the proteasome, and by increasing the diversity of the core subunits as well. In addition to hydrolysing ubiquitinated proteins into amino acids, the proteasome can also proteolyse selected non-ubiquitinated proteins, process proteins, and possibly refold misfolded proteins. This review will focus on the different proteasome functions, and how these are used in the multiple regulatory roles the proteasome plays in eukaryotic cells.

4.
A census of protein repeats. (Total citations: 20; self-citations: 0; citations by others: 20)
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences are favored that contain small and relatively water-soluble residues. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

5.
As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.
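
The idea of computing compositions separately for the N-terminal, middle, and C-terminal parts can be sketched as follows. The abstract does not state the part lengths, so `n_len` and `c_len` below are purely illustrative assumptions, and the sketch omits the twin amino acid and distance-frequency features as well as the further division of the N-terminal part into four regions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def split_three(seq, n_len=30, c_len=30):
    """Split seq into (N-terminal, middle, C-terminal) parts.

    n_len and c_len are illustrative assumptions, not values from the abstract.
    """
    # The C-terminal part never overlaps the N-terminal part.
    cut = max(n_len, len(seq) - c_len)
    return seq[:n_len], seq[n_len:cut], seq[cut:]

def composition(part):
    """Return 20 composition fractions; all zeros for an empty part."""
    counts = Counter(c for c in part.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def local_composition(seq, n_len=30, c_len=30):
    """Concatenate the compositions of the three parts into 60 features."""
    features = []
    for part in split_three(seq, n_len, c_len):
        features.extend(composition(part))
    return features
```

A feature vector built this way keeps positional information that a single whole-sequence composition discards, such as an N-terminal region enriched in particular residues.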

6.
Jiang Y  Pico A  Cadene M  Chait BT  MacKinnon R 《Neuron》2001,29(3):593-601
The intracellular C-terminal domain structure of a six-transmembrane K+ channel from Escherichia coli has been solved by X-ray crystallography at 2.4 Å resolution. The structure is representative of a broad class of domains/proteins that regulate the conductance of K+ (here referred to as RCK domains) in prokaryotic K+ transporters and K+ channels. The RCK domain has a Rossmann-fold topology with unique positions, not commonly conserved among Rossmann-fold proteins, composing a well-conserved salt bridge and a hydrophobic dimer interface. Structure-based amino acid sequence alignments and mutational analysis are used to demonstrate that an RCK domain is also present and is an important component of the gating machinery in eukaryotic large-conductance Ca2+ activated K+ channels.

7.
The pokeweed antiviral protein (PAP) is a ribosome-inactivating protein that acts on eukaryotic as well as prokaryotic ribosomes and is thus toxic for both cell types. Using the PCR technique to clone the PAP open reading frame, we characterized two cDNAs coding for proteins that inhibit the eukaryotic translation process and are not toxic for Escherichia coli, unlike the wild type protein. The sequence of the two cDNAs showed that the proteins contain only one and two point mutations. This result suggests that the wild type amino acids in the mutated positions participate in prokaryotic ribosome recognition. These mutants might be useful for the construction of immunotoxins containing the pokeweed antiviral protein as toxin.

8.
Ribosomes are the only cell organelles occurring in all organisms. E. coli ribosomes, which are the best characterized particles, consist of three RNAs and 53 proteins. All components have been isolated and characterized by chemical, physical and immunological methods. The primary structures of the RNAs and of all the proteins are known. Information about the secondary structure of the proteins derives from circular dichroism measurements and from secondary structure prediction methods. The tertiary structure is being studied by limited proteolysis, proton magnetic resonance and crystallization followed by X-ray analysis. Various methods are being used to elucidate the architecture of the ribosomal particle: three-dimensional image reconstruction of crystals of bacterial ribosomes and/or their subunits; immune electron microscopy; neutron scattering; protein-protein, protein-RNA and RNA-RNA crosslinking; total reconstitution of ribosomal subunits. The results from these studies yield valuable information on the architecture of the ribosomal particle. Many mutants have been isolated in which one or a few ribosomal proteins are altered or even deleted. The genetic and biochemical characterization of these mutants allows conclusions about the importance of these proteins for the function of the ribosome. Ribosomal proteins from various prokaryotic and eukaryotic species have been compared by two-dimensional gel electrophoresis, immunological methods, reconstitution and amino acid sequence analysis. These studies show a strong homology among prokaryotic ribosomal proteins but only a weak homology between proteins from prokaryotic and eukaryotic ribosomes. Comparison of the primary and secondary structures of the ribosomal RNAs from various organisms shows that the secondary structure of the RNA molecules has been strongly conserved throughout evolution.

9.
J M Bujnicki 《FEBS letters》2001,507(2):123-127
The amino acid sequences of Gcd10p and Gcd14p, the two subunits of the tRNA:(1-methyladenosine-58; m(1)A58) methyltransferase (MTase) of Saccharomyces cerevisiae, have been analyzed using iterative sequence database searches and fold recognition programs. The results suggest that the 'catalytic' Gcd14p and 'substrate binding' Gcd10p are related to each other and to a group of prokaryotic open reading frames, which were previously annotated as hypothetical protein isoaspartate MTases in sequence databases. It is predicted that the prokaryotic proteins are genuine tRNA:m(1)A MTases based on similarity of their predicted active site to the Gcd14p family. In addition to the MTase domain, an additional domain was identified in the N-terminus of all these proteins that may be involved in interaction with tRNA. These results suggest that the eukaryotic tRNA:m(1)A58 MTase is a product of gene duplication and divergent evolution of a possibly homodimeric prokaryotic enzyme.

10.
Given a raw protein sequence, knowing its subcellular location is an important step toward understanding its function and designing further experiments. A novel method is proposed for the prediction of protein subcellular locations from sequences. For four categories of eukaryotic proteins the overall predictive accuracy is 82.0%, which is 2.6% higher than that obtained with the SVM approach. For three subcellular locations of prokaryotic proteins, an overall accuracy of 89.9% is obtained. In accordance with the architecture of cells, a hierarchical prediction approach is designed. Based on amino acid composition, extracellular proteins and intracellular proteins can be identified with an accuracy of 97%.

11.
Zhou XB  Chen C  Li ZC  Zou XY 《Amino acids》2008,35(2):383-388
Apoptosis proteins play an important role in the development and homeostasis of an organism. The accurate prediction of subcellular location for apoptosis proteins is very helpful for understanding the mechanism of apoptosis and their biological functions. However, most of the existing predictive methods are designed around a single classifier, which limits further improvement of their performance. In this paper, a novel predictive method, which is essentially a multi-classifier system, has been proposed by combining a dual-layer support vector machine (SVM) with multiple compositions including amino acid composition (AAC), dipeptide composition (DPC) and amphiphilic pseudo amino acid composition (Am-Pse-AAC). As a demonstration, the predictive performance of our method was evaluated on two datasets of apoptosis proteins: the standard dataset ZD98 generated by Zhou and Doctor, and a larger dataset ZW225 generated by Zhang et al. With the jackknife test, the overall accuracies of our method on the two datasets reach 94.90% and 88.44%, respectively. The promising results indicate that our method can be a complementary tool for the prediction of subcellular location.
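
Two of the feature sets named in this abstract, AAC and DPC, are simple to compute and can be sketched as below. This is a minimal illustration under the usual definitions (20 residue frequencies and 400 adjacent-pair frequencies); the function names are hypothetical, and the sketch does not cover Am-Pse-AAC or the dual-layer SVM itself.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids, "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def amino_acid_composition(seq):
    """AAC: frequency of each of the 20 standard amino acids."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """DPC: frequency of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    valid = set(AMINO_ACIDS)
    # Overlapping pairs; pairs containing a non-standard residue are skipped.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in valid and seq[i + 1] in valid]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]

def combined_features(seq):
    """Concatenate AAC and DPC into one 420-dimensional feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Unlike AAC, DPC retains some local order information, which is one reason the two are often used together as classifier inputs.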

12.
The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between and or ( values are the average values of the coding sequences of each genome analyzed) at both the inter- and intra-genomic level, was re-analyzed on a vastly larger dataset. The results showed a slight, but significant, difference in the vs. correlations exhibited by prokaryotes and eukaryotes. This finding prompted an analysis of the correlation between and the amino acid frequencies in the encoded proteins, which has shown that positive correlations exist between values of coding sequences and the hydropathy of the corresponding proteins. These correlations are due to the fact that hydrophobic and amphipathic amino acids increase, whereas hydrophilic amino acids decrease with increasing values. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotes, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in the amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins compared to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the viewpoint that hydrophobicity plays a structural and functional role as far as protein stability is concerned.
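
A per-protein hydropathy value of the kind discussed in this abstract can be sketched as the mean of per-residue hydropathy indices. The abstract does not say which scale was used; the sketch below assumes the widely used Kyte-Doolittle scale, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy indices (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(seq):
    """Return the average hydropathy index over the standard residues."""
    values = [KYTE_DOOLITTLE[c] for c in seq.upper() if c in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)
```

Positive means indicate a predominantly hydrophobic protein and negative means a predominantly hydrophilic one, so values computed this way could be regressed against a base-composition measure of the coding sequence.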

13.
Evolution of the cytoskeleton (Total citations: 1; self-citations: 0; citations by others: 1)
The eukaryotic cytoskeleton appears to have evolved from ancestral precursors related to prokaryotic FtsZ and MreB. FtsZ and MreB show 40-50% sequence identity across different bacterial and archaeal species. Here I suggest that this represents the limit of divergence that is consistent with maintaining their functions for cytokinesis and cell shape. Previous analyses have noted that tubulin and actin are highly conserved across eukaryotic species, but so divergent from their prokaryotic relatives as to be hardly recognizable from sequence comparisons. One suggestion for this extreme divergence of tubulin and actin is that it occurred as they evolved very different functions from FtsZ and MreB. I will present new arguments favoring this suggestion, and speculate on pathways. Moreover, the extreme conservation of tubulin and actin across eukaryotic species is not due to an intrinsic lack of variability, but is attributed to their acquisition of elaborate mechanisms for assembly dynamics and their interactions with multiple motor and binding proteins. A new structure-based sequence alignment identifies amino acids that are conserved from FtsZ to tubulins. The highly conserved amino acids are not those forming the subunit core or protofilament interface, but those involved in binding and hydrolysis of GTP.

14.
Neural networks have been trained to predict the subcellular location of proteins in prokaryotic or eukaryotic cells from their amino acid composition. For three possible subcellular locations in prokaryotic organisms a prediction accuracy of 81% can be achieved. Assigning a reliability index, 33% of the predictions can be made with an accuracy of 91%. For eukaryotic proteins (excluding plant sequences) an overall prediction accuracy of 66% for four locations was achieved, with 33% of the sequences being predicted with an accuracy of 82% or better. With the subcellular location restricting a protein's possible function, this method should be a useful tool for the systematic analysis of genome data and is available via a server on the world wide web.

15.
We have examined the merits of the three functions based on amino acid compositions which have been proposed to indicate the similarity in amino acid sequences of two proteins; the difference index, the composition divergence and the composition coefficient. We have taken the amino acid compositions and sequences of 41 cytochrome c's and used the 820 values from all possible comparisons in the evaluation. We conclude that the functions do have a limited value in predicting proteins which are closely related in sequence and that the three functions are equivalent in this predictive ability. We have used the composition divergence values obtained from available pyruvate kinase amino acid compositions to generate a phylogenetic tree for this glycolytic enzyme.

16.
The amino acid sequence of the small copper protein auracyanin A isolated from the thermophilic photosynthetic green bacterium Chloroflexus aurantiacus has been determined to be a polypeptide of 139 residues. His58, Cys123, His128, and Met132 are spaced in a way to be expected if they are the evolutionarily conserved metal ligands as in the known small copper proteins plastocyanin and azurin. Secondary structure prediction also indicates that auracyanin has a general beta-barrel structure similar to that of azurin from Pseudomonas aeruginosa and plastocyanin from poplar leaves. However, auracyanin appears to have sequence characteristics of both small copper protein sequence classes. The overall similarity with a consensus sequence of azurin is roughly the same as that with a consensus sequence of plastocyanin, namely 30.5%. We suggest that auracyanin A, together with the B forms, is the first example of a new class of small copper proteins that may be descendants of an ancestral sequence to both the azurin proteins occurring in prokaryotic nonphotosynthetic bacteria and the plastocyanin proteins occurring in both prokaryotic cyanobacteria and eukaryotic algae and plants. The N-terminal sequence region 1-18 of auracyanin is remarkably rich in glycine and hydroxy amino acids, and required mass spectrometric analysis to be determined. The nature of the blocking group X is not yet known, although its mass has been determined to be 220 Da. The auracyanins are the first small blue copper proteins found and studied in anoxygenic photosynthetic bacteria and are likely to mediate electron transfer between the cytochrome bc1 complex and the photosynthetic reaction center.

17.
Evolution of the triplet code is reconstructed on the basis of the consensus temporal order of appearance of amino acids. Several important predictions are confirmed by computational sequence analyses. The earliest amino acids, alanine and glycine, have been encoded by GCC and GGC codons, as today. They were succeeded, respectively, by A- and G-series of amino acids, encoded by pyrimidine-central and purine-central codons. The length of the earliest proteins is estimated to be 6–7 residues. The earliest mRNAs were short G+C-rich molecules. These short sequences could have formed hairpins. This is confirmed by analysis of modern prokaryotic mRNA sequences. The predominant size of detected ancient hairpins also corresponds to 6–7 amino acids, as above. Vestiges of the last common ancestor can be found in extant proteins in the form of entirely conserved short sequences of six to nine residues present in all or almost all sequenced prokaryotic proteomes (omnipresent motifs). The functions of the topmost conserved octamers are not involved in the basic elementary syntheses. This suggests an initial abiotic supply of amino acids, bases and sugars. Presented at: National Workshop on Astrobiology: Search for Life in the Solar System, Capri, Italy, 26 to 28 October, 2005.

18.
A novel dual function (reporter and affinity) tag system has been developed. Expression vectors have been constructed to express polypeptides in Escherichia coli cells as C-terminal fusions with esterase 2, a 34-kDa protein from Alicyclobacillus acidocaldarius. The presence of esterase allows the expression of fusion proteins to be monitored spectrophotometrically or by activity staining in polyacrylamide gels. The fusion proteins can be purified from crude bacterial extracts under non-denaturing conditions by one-step affinity chromatography on trifluoromethyl-alkyl-ketone immobilized on Sepharose CL-6B. The esterase carrier can be cleaved from fusion proteins by digestion with an amino acid sequence-specific protease, blood coagulation factor Xa. The system has been used successfully for the expression and purification of polypeptides from different prokaryotic and eukaryotic organisms.

19.
Molecular sorting of proteins into the cisternal secretory pathway (Total citations: 1; self-citations: 0; citations by others: 1)
G A Scheele 《Biochimie》1988,70(9):1269-1276
Cotranslational translocation of exportable proteins across the RER membrane prior to their release into the extracellular space has been essentially described by use of canine pancreatic microsomal membranes. Intracisternal segregation of nascent secretory proteins was observed to be irreversible and proteolytic removal of signal sequences resulted in conformationally mature and stable proteins. Structural studies on various translocation peptides from both eukaryotic and prokaryotic preparations showed that many of them have a comparable three-domain organization. A hydrophilic amino-terminal domain is followed by a core region of hydrophobic amino acids and by the region in which the proteolytic cleavage occurs. Membrane components involved in the translocation process, namely the signal recognition particle and the SRP receptor, as well as the way the vectorial transport mechanism of nascent secretory proteins occurs, are also discussed.

20.
The GC contents of 2670 prokaryotic genomes that belong to diverse phylogenetic lineages were analyzed in this paper. These genomes had GC contents that ranged from 13.5% to 74.9%. We analyzed the distance of base frequencies at the three codon positions, codon frequencies, and amino acid compositions across genomes with respect to the differences in the GC content of these prokaryotic species. We found that although the phylogenetic lineages were remote among some species, a similar genomic GC content forced them to adopt similar base usage patterns at the three codon positions, codon usage patterns, and amino acid usage patterns. Our work demonstrates that in prokaryotic genomes: a) base usage, codon usage, and amino acid usage change with GC content with a linear correlation; b) the distance of each usage has a linear correlation with the GC content difference; and c) GC content is more essential than phylogenetic lineage in determining base usage, codon usage, and amino acid usage. This work is exceptional in that we adopted intuitively graphic methods for all analyses, and we used these analyses to examine as many as 2670 prokaryotes. We hope that this work is helpful for understanding common features in the organization of microbial genomes.
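
The base usage at the three codon positions mentioned in this abstract can be illustrated with a short sketch that computes overall GC content and the GC fraction at each codon position of a coding sequence. It assumes an in-frame DNA sequence given as a string; the function names are hypothetical and the paper's own distance measures are not reproduced here.

```python
def gc_content(dna):
    """Return the fraction of G and C among the unambiguous bases."""
    bases = [b for b in dna.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "GC" for b in bases) / len(bases)

def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) for an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored, and
    ambiguous bases such as N are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    gc = [0, 0, 0]
    seen = [0, 0, 0]
    for i in range(usable):
        base = cds[i]
        if base in "ACGT":
            seen[i % 3] += 1
            if base in "GC":
                gc[i % 3] += 1
    return tuple(g / n if n else 0.0 for g, n in zip(gc, seen))
```

Computing these values per genome and plotting them against genomic GC content is one way to examine the kind of linear relationship the abstract reports.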
