Found 20 similar articles (search time: 15 ms)
1.
Knowledge of amino acid composition alone is verified here to be sufficient for recognizing the structural class (α, β, α+β, or α/β) of a given protein with an accuracy of 81%. This is supported by results from exhaustive enumerations of all conformations for all sequences of simple, compact lattice models consisting of two types of residues (hydrophobic and polar). Different compositions exhibit strong affinities for certain folds. Within the limits of validity of the lattice models, two factors appear to determine the choice of particular folds: 1) the coordination numbers of individual sites and 2) the size and geometry of non-bonded clusters. These two properties, collectively termed the distribution of non-bonded contacts, are quantitatively assessed by an eigenvalue analysis of the so-called Kirchhoff or adjacency matrices obtained by considering the non-bonded interactions on a lattice. The analysis permits the identification of conformations that possess the same distribution of non-bonded contacts. Furthermore, some distributions of non-bonded contacts are favored entropically, due to their high degeneracies. Thus, a competition between enthalpic and entropic effects determines the choice of a distribution for a given composition. Based on these findings, an analysis of non-bonded contacts in protein structures was made. The analysis shows that proteins belonging to the four distinct folding classes exhibit significant differences in their distributions of non-bonded contacts, which more directly explains the success in predicting structural class from amino acid composition. Proteins 29:172–185, 1997. Published 1997 Wiley-Liss, Inc. This article is a US Government work and, as such, is in the public domain in the United States of America.
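As a rough illustration of the "distribution of non-bonded contacts" described in this abstract, the following Python sketch builds a Kirchhoff matrix for a chain on a square lattice and reads the coordination numbers off its diagonal. The function name, the 2D lattice, and the example conformation are assumptions made for illustration; the abstract does not specify the lattice or the matrix conventions, and the eigenvalue step is left out because it would need a linear-algebra library.

```python
def kirchhoff_matrix(coords):
    """Kirchhoff (graph Laplacian) matrix of non-bonded contacts.

    coords: list of (x, y) integer lattice positions in chain order.
    Two residues are in non-bonded contact when they are nearest
    neighbours on the lattice but not adjacent along the chain.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):  # j >= i + 2 skips chain-bonded pairs
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            if dx + dy == 1:  # nearest neighbours on the square lattice
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical compact 3x3 conformation (a simple back-and-forth "snake").
snake = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2)]
gamma = kirchhoff_matrix(snake)

# Diagonal entries are the coordination numbers of the individual sites.
coordination = [gamma[i][i] for i in range(len(snake))]
```

Two conformations with the same sorted eigenvalues of this matrix would share the same distribution of non-bonded contacts in the sense used above; comparing spectra is the part this sketch does not attempt.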
2.
Alignment-free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied to sequence comparison, phylogeny, detection of horizontal transfers and extraction of representative motifs in regulatory sequences. Transposing such methods to proteins poses several fundamental questions related to the dimensionality of the representation space. Several studies have tackled these points, but none has so far brought the application of CGRs to proteins to its full expected potential. Yet several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches correct-classification scores close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which opens the door to future applications.
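For readers unfamiliar with CGR, the following Python sketch shows the basic iteration on a DNA sequence: starting from the centre of the unit square, each nucleotide moves the current point halfway towards that nucleotide's corner. The corner assignment is one common convention and is an assumption here, as is the function name. The abstract does not give the reverse encoding of amino acids into nucleotides, so that step is not reproduced.

```python
# Assumed corner convention for the unit square (not taken from the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the list of CGR coordinates visited by a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on non-ACGT symbols
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the corner
        points.append((x, y))
    return points


points = cgr_points("ACGT")
```

A sequence signature is then typically obtained by binning these points on a grid, which amounts to counting k-mer frequencies at a resolution set by the grid size.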
3.
General patterns of protein structural organization have emerged from studies of hundreds of structures elucidated by X-ray crystallography and nuclear magnetic resonance. Structural units are commonly identified by visual inspection of molecular models using qualitative criteria. Here, we propose an algorithm for identification of structural units by objective, quantitative criteria based on atomic interactions. The underlying physical concept is maximal interactions within each unit and minimal interaction between units (domains). In a simple harmonic approximation, interdomain dynamics is determined by the strength of the interface and the distribution of masses. The most likely domain decomposition involves units with the most correlated motion, or largest interdomain fluctuation time. The decomposition of a convoluted 3-D structure is complicated by the possibility that the chain can cross over several times between units. Grouping the residues by solving an eigenvalue problem for the contact matrix reduces the problem to a one-dimensional search for all reasonable trial bisections. Recursive bisection yields a tree of putative folding units. Simple physical criteria are used to identify units that could exist by themselves. The units so defined closely correspond to crystallographers' notion of structural domains. The results are useful for the analysis of folding principles, for modular protein design and for protein engineering. © 1994 Wiley-Liss, Inc.
4.
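Graph clustering can give good results for protein classification, provided that the complex relationships among proteins are first converted into a suitable similarity network that serves as the input to the graph-clustering classifier. This paper proposes a method for constructing similarity networks based on BLAST searches: starting from a target protein sequence, several rounds of BLAST searches progressively extract from the database the sequences that are directly or indirectly related to the target protein, forming an association set. The similarity relationships among the sequences in the association set constitute the similarity network, which can serve as the basis for classification by a graph clustering algorithm. Calculations on proteins in the Pfam database that are difficult to classify correctly from direct similarity relationships alone show that similarity networks constructed by this method give fairly satisfactory results.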
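The round-by-round expansion described in this abstract can be sketched in Python as a breadth-first search. Everything named here is hypothetical: `search` stands in for a BLAST query against a database, `TOY_HITS` is made-up data, and recording only query-hit pairs as edges is a simplification of the similarity network the abstract describes.

```python
def build_association_set(target, search, rounds):
    """Expand from a target sequence ID through repeated similarity searches.

    search: callable taking a sequence ID and returning an iterable of hit IDs
            (a stand-in for running BLAST; hypothetical interface).
    rounds: maximum number of search rounds.
    Returns (associated IDs, set of undirected query-hit edges).
    """
    associated = {target}
    frontier = {target}
    edges = set()
    for _ in range(rounds):
        next_frontier = set()
        for query in frontier:
            for hit in search(query):
                if hit == query:
                    continue  # ignore self-hits
                edges.add(frozenset((query, hit)))
                if hit not in associated:
                    associated.add(hit)
                    next_frontier.add(hit)
        frontier = next_frontier
        if not frontier:
            break  # nothing new was found; further rounds add nothing
    return associated, edges


# Made-up hit lists: P1 reaches P3 only indirectly, through P2.
TOY_HITS = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]}
members, edges = build_association_set("P1", lambda q: TOY_HITS.get(q, []), rounds=2)
```

With two rounds the set contains P1, P2 and P3; P4 would only be reached in a third round, which is how the number of rounds controls how far indirect relationships are followed.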
5.
6.
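Application of random forest models in classification and regression analysis  Total citations: 25, self-citations: 0, citations by others: 25
The random forest model is an algorithm based on classification trees, proposed by Breiman and Cutler in 2001. By aggregating a large number of classification trees it improves predictive accuracy, and it is a new model that can replace traditional machine learning methods such as neural networks. Random forests run quickly and perform well on large data sets. They are not affected by the multicollinearity problem faced by ordinary regression analysis and do not require variable selection. Existing random forest software packages report the importance of every variable. In addition, random forests make it easy to compute nonlinear effects of variables and can capture interactions between variables. They are also insensitive to outliers. Through three case studies, this paper introduces the application of random forests to discriminant analysis of insect species, to the analysis of presence/absence data (as a replacement for logistic regression), and to regression analysis. The data formats and R code of the case studies can serve as a reference for studying the application of random forests in classification and regression analysis.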
7.
The causal relationship between protein structural change and ligand binding was classified and annotated for 839 nonredundant pairs of crystal structures in the Protein Data Bank, one with and the other without a bound low-molecular-weight ligand molecule. Protein structural changes were first classified into either domain or local motions depending on the size of the moving protein segments. Whether the protein motion was coupled with ligand binding was then evaluated based on the location of the ligand binding site and by application of the linear response theory of protein structural change. Protein motions coupled with ligand binding were further classified into either closure or opening motions. This classification revealed the following: (i) domain motions coupled with ligand binding are dominated by closure motions, which can be described by the linear response theory; (ii) local motions frequently accompany order-disorder or α-helix-coil conformational transitions; and (iii) transferase activity (Enzyme Commission number 2) is the predominant function among coupled domain closure motions. This could be explained by the closure motion acting to insulate the reaction site of these enzymes from environmental water.
8.
We have developed a simple measuring system for fluorescence-detected linear dichroism and applied it to the structural analysis of the RecA-DNA complex filaments, which are intermediates of the homologous recombination reaction. Taking advantage of the selectivity of fluorescence signals, we distinguished the linear dichroism signals of ethidium bromide and tryptophan residues in the RecA-DNA-ethidium bromide complex, whereas the conventional (absorption-detected) linear dichroism measurement provides only the sum of the signals because they overlap each other and that of DNA. We further observed that the tryptophan residue at position 290 of RecA in the RecA-DNA-adenosine-5'-O-(3-thiotriphosphate) complex was oriented parallel to the long axis of the filament, in good agreement with the previous site-specific linear dichroism analysis, and that this orientation was not significantly modified by the pairing of the complementary DNA strand. These results suggest that the pairing reaction occurs without a large structural change of the RecA filament.
9.
The N-linked glycan in immunoglobulin G is critical for the stability and function of the crystallizable fragment (Fc) region. Alteration of these protein properties upon the removal of the N-linked glycan has often been explained by the alteration of the CH2 domain orientation in the Fc region. To confirm this hypothesis, we examined the small-angle X-ray scattering (SAXS) profile of the glycosylated Fc region (gFc) and aglycosylated Fc region (aFc) in solution. Conformational characteristics of the CH2 domain orientation were validated by comparison with SAXS profiles theoretically calculated from multiple crystal structures of the Fc region with different CH2 domain orientations. The reduced chi-square values from the fitting analyses of gFc and aFc associated with the degree of openness or closure of each crystal structure, as determined from the first principal component that partially governed the variation of the CH2 domain orientation extracted by a singular value decomposition analysis. For both gFc and aFc, the best-fitted SAXS profiles corresponded to ones calculated based on the crystal structure of gFc that formed a “semi-closed” CH2 domain orientation. Collectively, the data indicated that the removal of the N-linked glycan only negligibly affected the CH2 domain orientation in solution. These findings will guide the development of methodology for the production of highly refined functional Fc variants.
10.
11.
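Soil degradation of Bt protein from Bt maize straw and a comparison of fitting models  Total citations: 7, self-citations: 0, citations by others: 7
The soil degradation dynamics of Bt protein released by decomposing straw of four Bt maize varieties (34B24, NK58-D1, R×601RR/YG and Nongda 61) were studied and fitted with a first-order kinetic equation, a bi-exponential model and a shift-log model. The results show that soil degradation of the Bt protein from all four varieties proceeds in two stages: rapid, large-scale negative-exponential degradation in the early period, followed by a middle-to-late period in which a very small amount remains stable. The bi-exponential and shift-log models matched the observations better than the first-order kinetic equation. Goodness of fit should not be judged solely from the statistical indicators of the fit; plotting measured against simulated values is an effective way to avoid mis-estimating DT50 with a model whose statistical fit indicators are high but which does not match reality. When studying the soil degradation of Bt protein in transgenic crop straw, sampling should be more frequent in the early period, given the faster early degradation, so that the accuracy of DT50, a key safety-assessment parameter obtained from the fitted model, can be judged scientifically.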
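To make the role of DT50 concrete, the following Python sketch computes it for two of the model families named in this abstract. The model forms are the textbook ones, C(t) = C0·exp(-k·t) for first-order kinetics and C(t) = a·exp(-k1·t) + b·exp(-k2·t) for the bi-exponential model, and are assumed rather than taken from the paper; all parameter values are made up, and the shift-log model is not covered.

```python
import math


def dt50_first_order(k):
    """DT50 for C(t) = C0 * exp(-k * t): the time at which C = C0 / 2."""
    if k <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / k


def biexp(t, a, k1, b, k2):
    """Bi-exponential decay: a fast and a slow compartment."""
    return a * math.exp(-k1 * t) + b * math.exp(-k2 * t)


def dt50_biexp(a, k1, b, k2, tol=1e-9):
    """DT50 for the bi-exponential model, found by bisection.

    There is no closed form in general, so we search for the time at which
    the curve falls to half of its initial value a + b.
    """
    if min(k1, k2) <= 0 or min(a, b) < 0 or a + b <= 0:
        raise ValueError("need positive rates and non-negative amplitudes")
    target = 0.5 * (a + b)
    lo, hi = 0.0, 1.0
    while biexp(hi, a, k1, b, k2) > target:  # grow the bracket until it contains DT50
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if biexp(mid, a, k1, b, k2) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Made-up parameters: 90% of the protein degrades fast, 10% very slowly.
fast_then_slow = dt50_biexp(a=0.9, k1=1.0, b=0.1, k2=0.01)
single_rate = dt50_first_order(0.1)
```

Because the early fast phase dominates the time to 50%, a single first-order rate fitted across the whole time course can place DT50 far from where the two-phase curve actually crosses the halfway point, which is the kind of mismatch the abstract warns about.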
12.
The growth of gene and protein sequence information is currently so rapid that three-dimensional structural information is lacking for the overwhelming majority of known proteins. In this review, efforts towards rapid and sensitive methods for protein structural characterization are described, complementing existing technologies. Based on chemical cross-linking and offering the analytical speed and sensitivity of mass spectrometry, these methodologies are expected to contribute valuable tools towards future high-throughput protein structure elucidation.
13.
In all cell types, protein homeostasis, or “proteostasis,” is maintained by sophisticated quality control networks that regulate protein synthesis, folding, trafficking, aggregation, disaggregation, and degradation. In one notable example, Escherichia coli employ a proteostasis system that determines whether substrates of the twin-arginine translocation (Tat) pathway are correctly folded and thus suitable for transport across the tightly sealed cytoplasmic membrane. Herein, we review growing evidence that the Tat translocase itself discriminates folded proteins from those that are misfolded and/or aggregated, preferentially exporting only the former. Genetic suppressors that inactivate this mechanism have recently been isolated and provide direct evidence for the participation of the Tat translocase in structural proofreading of its protein substrates. We also discuss how this discriminatory “folding sensor” has been exploited for the discovery of structural probes (e.g., sequence mutations, pharmacologic chaperones, intracellular antibodies) that modulate the folding and solubility of virtually any protein-of-interest, including those associated with aggregation diseases (e.g., α-synuclein, amyloid-β protein). Taken together, these studies highlight the utility of engineered bacteria for rapidly and inexpensively uncovering potent anti-aggregation factors.
14.
Yong Yang Sheng Zhang Kevin Howe David B Wilson Felix Moser Diana Irwin Theodore W Thannhauser 《Journal of biomolecular techniques》2007,18(4):226-237
The use of nLC-ESI-MS/MS in shotgun proteomics experiments and GeLC-MS/MS analysis is well accepted and routinely available in most proteomics laboratories. However, the same cannot be said for nLC-MALDI MS/MS, which has yet to experience such widespread acceptance, despite the fact that the MALDI technology offers several critical advantages over ESI. As an illustration, in an analysis of a moderately complex sample of E. coli proteins, the use of MALDI in addition to ESI in GeLC-MS/MS resulted in a 16% average increase in protein identifications, while with more complex samples the number of additional protein identifications increased by an average of 45%. The unique peptides identified by MALDI were, on average, 25% larger than the unique peptides identified by ESI, and they were found to be slightly more hydrophilic. The insensitivity of MALDI to the presence of ionization suppression agents was shown to be a significant advantage, suggesting it be used as a complement to ESI when ion suppression is a possibility. Furthermore, the higher resolution of the TOF/TOF instrument improved the sensitivity, accuracy, and precision of the data over that obtained using only ESI-based iTRAQ experiments using a linear ion trap. Nevertheless, accurate data can be generated with either instrument. These results demonstrate that coupling nanoLC with both ESI and MALDI ionization interfaces improves proteome coverage, reduces the deleterious effects of ionization suppression agents, and improves quantitation, particularly in complex samples.
15.
Marc Vancanneyt Eddy Van Lerberge Jean-Francois Berny Gregoire L. Hennebert Karel Kersters 《Antonie van Leeuwenhoek》1992,61(1):69-78
The relationships among 65 basidiomycetous yeast strains were determined by one-dimensional electrophoresis of SDS-solubilized whole-cell proteins. Protein profiles were compared by the Pearson product moment correlation coefficient (r). The strains investigated represented species from the genera Cystofilobasidium, Filobasidium, Filobasidiella, Kondoa, Leucosporidium, Mrakia and Rhodosporidium. Except for the genus Mrakia, all species constituted separate protein electrophoretic clusters. The species of the genus Mrakia (M. frigida, M. gelida, M. nivalis and M. stokesii) show highly similar protein patterns, suggesting that these four species may be synonymous. Strains of two varieties of Filobasidiella neoformans, F. neoformans var. neoformans and F. neoformans var. bacillispora, could not be differentiated by protein electrophoresis. For the delineation of the protein electrophoretic clusters of the yeasts studied, literature data relying on other criteria, such as DNA base composition, carbon source utilization patterns, enzymatic protein electrophoregrams, ubiquinone systems, DNA-DNA homology and rRNA sequence data were used. It was demonstrated that a database of SDS-protein patterns provides a valuable tool for the identification of yeasts.
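The comparison step in this abstract relies on the Pearson product moment correlation coefficient between pairs of protein profiles. A minimal Python sketch follows, assuming each profile has already been digitized into an equal-length list of band intensities; the profile values below are made up.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical densitometric traces sampled at the same gel positions.
strain_a = [0.1, 0.8, 0.3, 0.9, 0.2]
strain_b = [0.2, 0.7, 0.35, 0.85, 0.25]
similarity = pearson_r(strain_a, strain_b)
```

Computing r for every pair of strains gives the similarity matrix on which a clustering of the kind reported above could be based.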
16.
Based on available experimental data and using a theoretical model of protein folding, we demonstrate that there is an optimal ratio between the average conformational entropy and the average contact energy per residue for fast protein folding. A statistical analysis of the conformational entropy and the number of contacts per residue for 5829 protein domains from four main classes (α, β, α/β, α+β) shows that each class has its own characteristic average number of contacts per residue and average conformational entropy per residue. These class-specific characteristics determine the protein folding rates: α-proteins are the fastest to fold, β-proteins are the second fastest, α+β-proteins are the third, and α/β-proteins are the last to fold.
17.
18.
A reduced amino acid alphabet for understanding and designing protein adaptation to mutation  Total citations: 1, self-citations: 0, citations by others: 1
Etchebest C Benros C Bornot A Camproux AC de Brevern AG 《European biophysics journal : EBJ》2007,36(8):1059-1069
The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world can be simplified. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs (five residues long) called Protein Blocks (PBs). This alphabet allows a 3D structure to be translated into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., cysteine, histidine, threonine and serine for the alpha-helix N-cap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We also identified interesting alternative associations. This highlights the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to environment) while largely preserving the 3D fold.
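The idea of a reduced amino acid alphabet can be sketched in Python as a simple lookup that maps each of the 20 residues to a group representative. The grouping below is purely illustrative: it keeps the Ile/Val and Phe/Trp/Tyr associations mentioned in the abstract, but the remaining groups are assumptions and do not reproduce any of the reduced sets derived in the paper.

```python
# Illustrative 10-letter grouping (hypothetical, not the paper's alphabet).
GROUPS = ["IVLM", "FWY", "AG", "ST", "C", "P", "H", "DE", "NQ", "KR"]

# Map every amino acid to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())  # KeyError on non-standard residues


reduced = reduce_sequence("MKVIWYFDE")
```

Sequences that differ only by substitutions within a group collapse to the same reduced string, which is the sense in which such an alphabet behaves like a coarse substitution matrix.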
19.
20.
Small globular proteins have many contacts between residues that are distant in primary sequence. These contacts create a complex network between sequence-distant segments of secondary structure, which may be expected to promote the cooperative folding of globular proteins. Although repeat proteins, which are composed of tandem modular units, lack sequence-distant contacts, several of considerable length have been shown to undergo cooperative two-state folding. To explore the limits of cooperativity in repeat proteins, we have studied the unfolding of YopM, a leucine-rich repeat (LRR) protein of over 400 residues. Despite its large size and modular architecture (15 repeats), YopM equilibrium unfolding is highly cooperative, and shows a very strong dependence on the concentration of urea. In contrast, kinetic studies of YopM folding indicate a mechanism that includes one or more transient intermediates. The urea dependence of the folding and unfolding rates suggests a relatively small transition state ensemble. As with the urea dependence, we have found an extreme dependence of the free energy of unfolding on the concentration of salt. This salt dependence likely results from general screening of a large number of unfavorable Coulombic interactions in the folded state, rather than from specific cation binding.