Found 20 similar articles (search time: 15 ms)
1.
2.
Qing-Bin Gao 《Analytical biochemistry》2010,398(1):52-59
Integral membrane proteins are central to many cellular processes and constitute approximately 50% of potential targets for novel drugs. However, the number of outer membrane proteins (OMPs) present in the public structure database is very limited due to the difficulties in determining structure with experimental methods. Therefore, discriminating OMPs from non-OMPs with computational methods is of medical importance as well as genome sequencing necessity. In this study, some sequence-derived structural and physicochemical features of proteins were incorporated with amino acid composition to discriminate OMPs from non-OMPs using support vector machines. The discrimination performance of the proposed method is evaluated on a benchmark dataset of 208 OMPs, 673 globular proteins, and 206 α-helical membrane proteins. A high overall accuracy of 97.8% was observed in the 5-fold cross-validation test. In addition, the current method distinguished OMPs from globular proteins and α-helical membrane proteins with overall accuracies of 98.2 and 96.4%, respectively. The prediction performance is superior to the state-of-the-art methods in the literature. It is anticipated that the current method might be a powerful tool for the discrimination of OMPs.
3.
T Yamane 《Journal of molecular biology》1965,14(2):616-618
4.
5.
Bogatyreva NS Finkelstein AV Galzitskaya OV 《Journal of bioinformatics and computational biology》2006,4(2):597-608
Archaea, bacteria and eukaryotes represent the main kingdoms of life. Is there any trend for amino acid compositions of proteins found in full genomes of species of different kingdoms? What is the percentage of totally unstructured proteins in various proteomes? We obtained amino acid frequencies for different taxa using 195 known proteomes and all annotated sequences from the Swiss-Prot database. Investigation of the two databases (proteomes and Swiss-Prot) shows that the amino acid compositions of proteins differ substantially for different kingdoms of life, and this difference is larger between different proteomes than between different kingdoms of life. Our data demonstrate that there is a surprisingly small selection for the amino acid composition of proteins for higher organisms (eukaryotes) and their viruses in comparison with the "random" frequency following from a uniform usage of codons of the universal genetic code. On the contrary, lower organisms (bacteria and especially archaea) demonstrate an enhanced selection of amino acids. Moreover, according to our estimates, 12%, 3% and 2% of the proteins in eukaryotic, bacterial and archaean proteomes are totally disordered, and long (> 41 residues) disordered segments are found to occur in 16% of archaean, 20% of eubacterial and 43% of eukaryotic proteins for 19 archaean, 159 bacterial and 17 eukaryotic proteomes, respectively. A correlation between amino acid compositions of proteins of various taxa shows that the highest correlation is observed between eukaryotes and their viruses (the correlation coefficient is 0.98), and bacteria and their viruses (the correlation coefficient is 0.96), while the correlation between eukaryotes and archaea is only 0.85.
6.
Summary: The relative abundances among the amino acids, which are functionally similar to one another, were explained by random partition of a unit interval.
7.
Interaction between the autokinase EpsE and EpsL in the cytoplasmic membrane is required for extracellular secretion in Vibrio cholerae. Cited by: 13 (self-citations: 0; citations by others: 13)
Vibrio cholerae secretes a number of proteins important for virulence, including cholera toxin. This process requires the products of the eps genes which have homologues in genera such as Aeromonas, Klebsiella and Pseudomonas and are thought to form a membrane-associated multiprotein complex. Here we show that the putative nucleotide-binding protein EpsE is associated with and stabilized by the cytoplasmic membrane via interaction with EpsL. Analysis of fusion proteins between EpsE and the homologous ExeE from Aeromonas hydrophila demonstrates that the N-terminus of EpsE contains the EpsL binding domain and determines species specificity. An intact Walker A box, commonly found in ATP-binding proteins, is required for activity of EpsE in vivo and for autophosphorylation of purified EpsE in vitro. These results indicate that both the kinase activity of EpsE as well as its ability to interact with the putative cytoplasmic membrane protein EpsL are required for translocation of toxin across the outer membrane in Vibrio cholerae.
8.
9.
R D Cook 《Theoretical population biology》1975,7(1):64-83
Four quasiloglinear models are proposed for describing relationships between the amino acid composition of proteins and the structure of the genetic code. The models allow estimation of base frequencies in all three codon positions and can be used to investigate “interactions” between any two codon positions. The estimation procedure proposed by Ohta and Kimura (Genetics 64 (1970), 387–395) is discussed and, using two of the proposed quasiloglinear models, an analysis of the amino acid composition of human cytochrome c is presented. The analysis suggests that four of the six codons which code for leucine (CUU, CUC, CUA and CUG) do not occur in human cytochrome c.
10.
Lin H 《Journal of theoretical biology》2008,252(2):350-356
The outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is an important task for understanding some biochemical processes. In this study, a method that combines increment of diversity with modified Mahalanobis Discriminant, called IDQD, is presented to predict 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs) by using Chou's pseudo amino acid compositions as parameters. The overall accuracy of jackknife cross-validation is 93.2% and 96.1%, respectively, for three datasets (OMPs, TMHPs and GPs) and two datasets (OMPs and non-OMPs). These predicted results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition can better reflect the core feature of membrane proteins than the classical amino acid composition.
11.
Pascal G Médigue C Danchin A 《BioEssays : news and reviews in molecular, cellular and developmental biology》2006,28(7):726-738
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.
12.
13.
14.
The structural stability of a protein requires a large number of interresidue interactions. The energetic contribution of these can be approximated by low-resolution force fields extracted from known structures, based on observed amino acid pairing frequencies. The summation of such energies, however, cannot be carried out for proteins whose structure is not known or for intrinsically unstructured proteins. To overcome these limitations, we present a novel method for estimating the total pairwise interaction energy, based on a quadratic form in the amino acid composition of the protein. This approach is validated by the good correlation of the estimated and actual energies of proteins of known structure and by a clear separation of folded and disordered proteins in the energy space it defines. As the novel algorithm has not been trained on unstructured proteins, it substantiates the concept of protein disorder, i.e. that the inability to form a well-defined 3D structure is an intrinsic property of many proteins and protein domains. This property is encoded in their sequence, because their biased amino acid composition does not allow sufficient stabilizing interactions to form. By limiting the calculation to a predefined sequential neighborhood, the algorithm was turned into a position-specific scoring scheme that characterizes the tendency of a given amino acid to fall into an ordered or disordered region. This application we term IUPred and compare its performance with three generally accepted predictors, PONDR VL3H, DISOPRED2 and GlobPlot on a database of disordered proteins.
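As a reading aid, the "quadratic form in the amino acid composition" described in this abstract can be written out generically. The symbols below are assumptions introduced for illustration and are not taken from the abstract: $f_i$ is the fraction of residue type $i$ in the sequence, $L$ is the sequence length, and $P_{ij}$ is a 20 × 20 matrix of fitted interaction parameters.

```latex
\frac{E_{\mathrm{est}}}{L} \;=\; \sum_{i=1}^{20} \sum_{j=1}^{20} f_i \, P_{ij} \, f_j
```

Under this form the estimate depends only on composition, which is why it can be evaluated for proteins of unknown structure. Restricting $f_j$ to a sequential neighborhood around each position would give a per-residue score of the kind the abstract describes.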
15.
The basic chromosomal proteins (SCP) of human, mouse, rabbit and guinea pig sperm nuclei were characterized by polyacrylamide gel electrophoresis and amino acid analysis. Spermatozoa were decapitated with 1% SDS and the nuclei recovered by density gradient centrifugation. Examination by Nomarski and electron microscopy revealed the nuclei to be intact and 99% pure. The basic proteins were extracted from nuclei, aminoethylated and purified by ion exchange chromatography and gel filtration chromatography. The SCP of human, rabbit and guinea pig gave single protein bands with similar mobilities when subjected to polyacrylamide gel electrophoresis. In contrast, aminoethylated mouse SCP consisted of two proteins, SCP·AE1 and SCP·AE2, which had different electrophoretic mobilities. The SCP of these mammalian species were characteristically rich in arginine (47–54.4%) and cysteine (7.7–12.2%). Major differences existed in the amino acid compositions of these proteins. Mouse and human SCP were rich in histidine (12.2 and 7.7%, respectively) and guinea pig was high in tyrosine (11.7%) and phenylalanine (3.5%). Valine was detected only in rabbit SCP and proline in human and guinea pig. Aspartic acid, methionine and tryptophan were not detected in all four species. Studies on the incorporation of [3H]arginine into mouse SCP demonstrated that these basic proteins are synthesized during the terminal stages of spermatogenesis and are subsequently conserved.
16.
Fukuchi S Yoshimune K Wakayama M Moriguchi M Nishikawa K 《Journal of molecular biology》2003,327(2):347-357
The amino acid compositions of proteins from halophilic archaea were compared with those from non-halophilic mesophiles and thermophiles, in terms of the protein surface and interior, on a genome-wide scale. As we previously reported for proteins from thermophiles, a biased amino acid composition also exists in halophiles, in which an abundance of acidic residues was found on the protein surface as compared to the interior. This general feature did not seem to depend on the individual protein structures, but was applicable to all proteins encoded within the entire genome. Unique protein surface compositions are common in both halophiles and thermophiles. Statistical tests have shown that significant surface compositional differences exist among halophiles, non-halophiles, and thermophiles, while the interior composition within each of the three types of organisms does not significantly differ. Although thermophilic proteins have an almost equal abundance of both acidic and basic residues, a large excess of acidic residues in halophilic proteins seems to be compensated by fewer basic residues. Aspartic acid, lysine, asparagine, alanine, and threonine significantly contributed to the compositional differences of halophiles from meso- and thermophiles. Among them, however, only aspartic acid deviated largely from the expected amount estimated from the dinucleotide composition of the genomic DNA sequence of the halophile, which has an extremely high G+C content (68%). Thus, the other residues with large deviations (Lys, Ala, etc.) from their non-halophilic frequencies could have arisen merely as "dragging effects" caused by the compositional shift of the DNA, which would have changed to increase principally the fraction of aspartic acid alone.
17.
18.
Having obtained the amino acid composition of a protein, chemists and molecular biologists may wish to identify the protein from this data alone. In general such data will have errors associated with them and the length of the protein may be known only approximately or not at all. In this paper a method is described which enables searching of protein sequence databases for sequences or fragments of sequences which have a composition similar to the one being sought. Such searches are generally quite discriminating as shown by the examples provided. This method has been implemented as part of the computer program Scrutineer and is being freely distributed. It is simple to use.
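The kind of composition-based search this abstract describes can be illustrated with a minimal sketch. It is not Scrutineer's algorithm: the Euclidean distance, the fixed window length, and all function names (`composition`, `distance`, `best_window`, `search`) are assumptions chosen for illustration, since the abstract does not specify how similarity is scored.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fractional amino acid composition as a 20-element list."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def distance(c1, c2):
    """Euclidean distance between two composition vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(c1, c2)) ** 0.5


def best_window(target, seq, length):
    """Slide a fixed-length window along seq; return (distance, start) of the closest fragment."""
    if length <= 0 or length > len(seq):
        raise ValueError("window length must be between 1 and len(seq)")
    return min(
        (distance(target, composition(seq[i:i + length])), i)
        for i in range(len(seq) - length + 1)
    )


def search(target, database, length):
    """Rank database entries (name -> sequence) by their best-matching fragment."""
    hits = []
    for name, seq in database.items():
        if len(seq) >= length:
            d, start = best_window(target, seq, length)
            hits.append((d, name, start))
    return sorted(hits)


# Hypothetical toy database, for illustration only.
database = {"a": "AAAAGGGG", "b": "KKKKKKKK"}
hits = search(composition("AAGG"), database, 4)
```

Because the abstract notes that the protein length may be unknown, a fuller version would presumably try several window lengths and tolerate measurement error in the target composition; this sketch fixes one length for brevity.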
19.
20.
Membrane proteins: amino acid sequence and membrane penetration. Cited by: 26 (self-citations: 0; citations by others: 26)
A computer study shows that the membrane-penetrating portion of the erythrocyte surface MN-glycoprotein (Winzler, 1969; Marchesi et al., 1972) is distinguishable by informal cluster analysis from other segments of globular proteins when sequence length is plotted against hydrophobicity. This analysis further suggests the possibility that other membrane-penetrating segments of proteins can be identified in the same way.
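The length-versus-hydrophobicity comparison in this abstract can be sketched as follows. The abstract does not say which hydrophobicity scale was used, so the Kyte-Doolittle hydropathy values below are a swapped-in assumption, and the function names and example segments are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale, not the one used in the study).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(segment):
    """Average hydropathy of a segment; raises KeyError on non-standard residues."""
    values = [KD[a] for a in segment.upper()]
    if not values:
        raise ValueError("empty segment")
    return sum(values) / len(values)


def length_vs_hydrophobicity(segments):
    """Return one (length, mean hydrophobicity) point per segment, ready for plotting."""
    return [(len(s), mean_hydrophobicity(s)) for s in segments]


# Hypothetical segments: one strongly hydrophobic, one strongly polar.
points = length_vs_hydrophobicity(["IIII", "DKDK"])
```

Plotting such points for many segments would show whether long, strongly hydrophobic stretches separate from typical globular-protein segments, which is the clustering the abstract reports.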