Similar literature
 20 similar articles found.
1.
Bioinformatic software has used various numerical encoding schemes to describe amino acid sequences. Orthogonal encoding, employing 20 numbers to describe the amino acid type of one protein residue, is often used with artificial neural network (ANN) models. However, this can increase the model complexity, thus leading to difficulty in implementation and poor performance. Here, we use ANNs to derive encoding schemes for the amino acid types from protein three-dimensional structure alignments. Each of the 20 amino acid types is characterized with a few real numbers. Our schemes are tested on the simulation of amino acid substitution matrices. These simplified schemes outperform the orthogonal encoding on small data sets. Using one of these encoding schemes, we generate a colouring scheme for the amino acids in which comparable amino acids are in similar colours. We expect it to be useful for visual inspection and manual editing of protein multiple sequence alignments.  相似文献   
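
To make the "20 numbers per residue" point in the abstract above concrete, here is a minimal sketch of orthogonal (one-hot) encoding. The alphabet ordering and the function name `one_hot_encode` are assumptions made for this illustration; the paper's own simplified encodings, derived from structure alignments, are not reproduced here.

```python
# Illustrative sketch of orthogonal (one-hot) encoding of a protein sequence.
# The alphabet order and function name are assumptions made for this example.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def one_hot_encode(sequence):
    """Return one 20-element 0/1 vector per residue of `sequence`."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vector = [0] * len(AMINO_ACIDS)
        vector[index[residue]] = 1  # raises KeyError for non-standard residues
        encoded.append(vector)
    return encoded


vectors = one_hot_encode("ACW")
print(len(vectors), len(vectors[0]))  # number of residues, numbers per residue
```

A window of n residues therefore needs 20 × n inputs, which is the growth in model size that the abstract attributes to this encoding.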

2.
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.  相似文献   

3.
Simple hidden Markov models are proposed for predicting the secondary structure of a protein from its amino acid sequence. Since the length of protein conformation segments varies in a narrow range, we ignore the duration effect of length distribution, and focus on inclusion of short range correlations of residues and of conformation states in the models. Conformation-independent and -dependent amino acid coarse-graining schemes are designed for the models by means of proper mutual information. We compare models of different levels of complexity, and establish a practical model with high prediction accuracy.

4.
The goal of this work is to characterize structurally ambivalent fragments in proteins. We have searched the Protein Data Bank and identified all structurally ambivalent peptides (SAPs) of length five or greater that exist in two different backbone conformations. The SAPs were classified in five distinct categories based on their structure. We propose a novel index that provides a quantitative measure of conformational variability of a sequence fragment. It measures the context-dependent width of the distribution of (phi,psi) dihedral angles associated with each amino acid type. This index was used to analyze the local structural propensity of both SAPs and the sequence fragments contiguous to them. We also analyzed type-specific amino acid composition, solvent accessibility, and overall structural properties of SAPs and their sequence context. We show that each type of SAP has an unusual, type-specific amino acid composition and, as a result, simultaneous intrinsic preferences for two distinct types of backbone conformation. All types of SAPs have lower sequence complexity than average. Fragments that adopt helical conformation in one protein and sheet conformation in another have the lowest sequence complexity and are sampled from a relatively limited repertoire of possible residue combinations. A statistically significant difference between two distinct conformations of the same SAP is observed not only in the overall structural properties of proteins harboring the SAP but also in the properties of its flanking regions and in the pattern of solvent accessibility. These results have implications for protein design and structure prediction.

5.
Here I systematically examine the information complexity of all primary sequences of natural proteins deposited in the Swiss-Prot database. The sequence complexity is assessed by determining the frequency of occurrence of each amino acid type on sequence windows of fixed length, calculating the Shannon entropy of the window and then averaging over all windows covering the sequence. The minimum value in information content obtained from the present-day record imposes a lower limit in the number of letters that a primeval amino acid alphabet must have had.  相似文献   
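
The complexity measure described in the abstract above (residue frequencies in fixed-length windows, Shannon entropy per window, average over windows) can be sketched as follows. This is a rough illustration rather than the author's implementation: the use of overlapping windows with a step of one residue, the base-2 logarithm, and the function names are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (in bits) of the residue frequencies in one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mean_window_entropy(sequence, window_length):
    """Average entropy over all windows of `window_length`.

    Assumption: overlapping windows that slide one residue at a time.
    """
    if window_length > len(sequence):
        raise ValueError("window is longer than the sequence")
    windows = [
        sequence[i:i + window_length]
        for i in range(len(sequence) - window_length + 1)
    ]
    return sum(window_entropy(w) for w in windows) / len(windows)


print(mean_window_entropy("AACC", 2))  # windows: AA, AC, CC
```

A window containing a single residue type scores 0 bits, and a window in which all 20 residue types are equally frequent scores log2(20), about 4.32 bits, so the average lies between those bounds.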

6.
Hexokinase (EC 2.7.1.1) catalyzes the first step in glucose metabolism, using ATP for the phosphorylation of glucose to glucose 6-phosphate. A portion of the HK1 gene was cloned by mixed oligonucleotide primer amplification of cDNA using primers of high complexity. The amino acid sequence for a partial fragment of bovine cardiac muscle HK was determined and used to create primer mixtures of 256- and 1024-fold complexity. Two products were generated from bovine cardiac muscle cDNA which show 82% nucleotide and 93% amino acid identity with a region of rat brain HK1 and cDNA. This work demonstrates that extension and amplification of cDNA probes may be successful even when amino acid sequence data indicate substantial codon degeneracy.  相似文献   
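
For readers unfamiliar with "fold complexity" of a primer mixture, the sketch below shows one common way such a figure arises: multiplying the number of synonymous codons for each residue in the peptide that the primer encodes. This is an assumption about how the 256- and 1024-fold values could be computed, not a statement of the authors' procedure, and the example peptides are hypothetical rather than taken from the bovine HK sequence. The codon counts are those of the standard genetic code.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "V": 4, "P": 4, "T": 4, "A": 4, "G": 4,
    "L": 6, "S": 6, "R": 6,
}


def primer_degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode `peptide`.

    Assumption: the primer covers every codon in full, so the degeneracy
    is simply the product of the per-residue codon counts.
    """
    product = 1
    for residue in peptide:
        product *= CODON_COUNTS[residue]
    return product


# Hypothetical peptides chosen only to reproduce the orders of magnitude.
print(primer_degeneracy("VPTA"))   # four 4-fold degenerate residues
print(primer_degeneracy("VPTAG"))  # five 4-fold degenerate residues
```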

7.
8.
Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.  相似文献   

9.
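Analysis and study of amino acids and trace elements in the Chinese soft-shelled turtle   Total citations: 21 (self-citations: 0, citations by others: 21)
This paper analyzes the amino acids, vitamins, trace elements and other nutrients of the Chinese soft-shelled turtle and compares them with those of ten other aquatic products. Calculation of the chemical score (CS), amino acid score (AS) and essential amino acid index (EAAI) shows that the trace element composition of the Chinese soft-shelled turtle is relatively well balanced and that its amino acid composition is close to the FAO pattern.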

10.
H1 subtypes are involved in chromatin higher-order structure and gene regulation. H1 has a characteristic three-domain structure. We studied the length variation of the available H1 subtypes and showed that the length of the N-terminal and C-terminal domains was more variable than that of the central domain. The N-terminal and C-terminal domains were of low sequence complexity both at the nucleotide and at the amino acid level, whereas the globular domain was of high complexity. In most subtypes, low complexity was due only to cryptic simplicity, which reflects the clustering of a number of short and often imperfect sequence motifs. However, a subset of subtypes from eubacteria, plants, and invertebrates contained tandem repeats of short amino acid motifs (four to 12 residues), which could amount to a large proportion of the terminal domains. In addition, some other subtypes, such as those of Drosophila and mammalian H1t, were only marginally simple. The coexistence of these three kinds of subtypes suggests that the terminal domains could have originated in the amplification of short sequence motifs, which would then have evolved by point mutation and further slippage.  相似文献   

11.
Sequence complexity of disordered protein   Total citations: 27 (self-citations: 0, citations by others: 27)
Intrinsic disorder refers to segments or to whole proteins that fail to self-fold into fixed 3D structure, with such disorder sometimes existing in the native state. Here we report data on the relationships among intrinsic disorder, sequence complexity as measured by Shannon's entropy, and amino acid composition. Intrinsic disorder identified in protein crystal structures, and by nuclear magnetic resonance, circular dichroism, and prediction from amino acid sequence, all exhibit similar complexity distributions that are shifted to lower values compared to, but significantly overlapping with, the distribution for ordered proteins. Compared to sequences from ordered proteins, these variously characterized intrinsically disordered segments and proteins, and also a collection of low-complexity sequences, typically have obviously higher levels of protein-specific subsets of the following amino acids: R, K, E, P, and S, and lower levels of subsets of the following: C, W, Y, I, and V. The Swiss Protein database of sequences exhibits significantly higher amounts of both low-complexity and predicted-to-be-disordered segments as compared to a non-redundant set of sequences from the Protein Data Bank, providing additional data that nature is richer in disordered and low-complexity segments compared to the commonness of these features in the set of structurally characterized proteins.  相似文献   

12.
Regularities in the primary structure of proteins   Total citations: 3 (self-citations: 0, citations by others: 3)
In this paper the latest protein database consisting of more than a million amino acids is analyzed to characterize the short range regularities in the primary structure. The amino acid distributions along the polypeptide chain and among the proteins have been studied first. Their influence on the amino acid pair statistics was taken into account. We are primarily interested in the distances of the covalent structure, where the amino acid pair frequencies show non-random characters. The amino acid pairs separated by at least 20 residues in the covalent structure exhibit an exact Gaussian distribution. We found that there is a range of non-random pairing in the covalent structure. We conclude that the pair preference characters are different for each of the 20 x 20 amino acid pairs. The range of the non-random pairing varies from pair to pair, and in most cases it does not extend beyond the 9th neighbour. The preferences of a certain pair in a certain position can not be derived from the character of that pair in another position. The preference values of 400 amino acid pairs are listed for up to the pairs in 9th neighbour position. Some fields of potential application of these data have also been discussed.  相似文献   
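
The pair statistics discussed in the abstract above can be illustrated with a small sketch that counts ordered amino acid pairs at a given separation along the chain and compares them with the count expected from composition alone. The observed/expected ratio used here is an assumed stand-in for the paper's "preference values", whose exact definition the abstract does not give; the function names are likewise hypothetical.

```python
from collections import Counter


def pair_counts(sequence, separation):
    """Count ordered pairs (a, b) where b lies `separation` residues after a."""
    return Counter(
        (sequence[i], sequence[i + separation])
        for i in range(len(sequence) - separation)
    )


def pair_preference(sequence, separation):
    """Observed/expected ratio per pair, assuming independent residues.

    Assumption: the expected count is (number of pairs) * f(a) * f(b),
    where f is the overall residue frequency in the sequence.
    """
    n_pairs = len(sequence) - separation
    composition = Counter(sequence)
    total = len(sequence)
    preferences = {}
    for (a, b), observed in pair_counts(sequence, separation).items():
        expected = n_pairs * (composition[a] / total) * (composition[b] / total)
        preferences[(a, b)] = observed / expected
    return preferences


print(pair_counts("ABAB", 1))
print(pair_preference("ABAB", 1))
```

A ratio near 1 means the pair occurs about as often as composition alone predicts. Running the function for separations 1 through 9 on a large database would give a table in the spirit of the 400 pair values the abstract mentions.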

13.
Until quite recently, the prevailing view in the literature was that a uniquely folded tertiary structure is required for a protein to function. Lately, however, it has been found that many proteins in a cell have no such structure in an isolated state, although they have a well-defined function under physiological conditions. These proteins have been termed proteins with natural or internal disorder. The disordered regions in such proteins may range from a stretch of several amino acids to a completely disordered sequence of tens to hundreds of amino acids. The main difference between these proteins and structured (globular) ones is that they have no unique tertiary structure in an isolated state and acquire it only after interacting with their partners. Their conformation in such a complex depends on the interacting partner rather than solely on their own amino acid sequence, as is characteristic of structured (globular) proteins. This review discusses the problem of structure-function relationships in structured proteins and in proteins with internal disorder. The complexity of the problem and its potential solutions are illustrated by the example of the elongation factors EF1A.

14.
BACKGROUND AND AIMS: Plant species are considered a good source of dietary proteins, although the nutritional quality of proteins depends on their amino acid composition. In this work the protein content and amino acid composition of nutlets of 21 Teucrium taxa (Lamiaceae) from Spain were analysed and their nutritional quality was compared with the minimum values established by the Food and Agriculture Organization of the United Nations (FAO). In addition, the amino acid composition was evaluated as a chemical character to clarify the taxonomic complexity in this genus. METHODS: Amino acid content of nutlets was determined after derivatization with diethyl ethoxymethylenemalonate by high-performance liquid chromatography. Previously, nutlet samples were hydrolysed and incubated in an oven at 110 °C for 24 h. KEY RESULTS: The protein content was variable, ranging from 6.4 % in T. dunense to 43.8 % in T. algarbiense. According to the FAO values all taxa contain satisfactory amounts of leucine, threonine and valine and are deficient in lysine. The similarity analysis of Teucrium taxa using amino acid composition data did not clearly reflect the infrageneric classification of this genus. CONCLUSIONS: Annual species, such as T. spinosum, T. aristatum and T. resupinatum, showed a better balanced amino acid composition. The dendrogram partly matched the karyological complexity of Teucrium. No correlation between amino acid composition and habitat has been observed, showing that Teucrium nutlet amino acid composition may not be strongly influenced by the environment.

15.
A complete synthetic medium containing 15 amino acids, a minimal synthetic medium (GAMS) containing 4 amino acids, and a supplemented minimal medium (GAMS + calcium pantothenate) have been developed for the cultivation of Hyphomicrobium neptunium ATCC 15444. Depending on the complexity of the synthetic media, generation times were approximately 2 to 3 times longer, and maximum cell densities were 0.3 to 0.9 log10 lower than in ZoBell marine broth 2216. The fates of 14C-labeled amino acids in GAMS were monitored. Results suggested that H. neptunium was auxotrophic for methionine, utilized glutamic acid as a primary energy source, and readily anabolized and catabolized serine and aspartic acid. Individual amino acid concentrations above 125 mM induced prolonged lag periods, whereas only methionine was not growth limiting at a concentration as low as 2 mM.  相似文献   

16.
Low complexity protein sequences are often intrinsically unstructured and many have the potential to polymerize into amyloid aggregates including filaments and hydrogels. RNA-binding proteins are unusually enriched in such sequences raising the question as to what function these domains serve in RNA metabolism. One such yeast protein, Nab3, is an 802 amino acid termination factor that contains an RNA recognition motif and a glutamine/proline rich domain adjacent to a region with structural similarity to a human hnRNP. A portion of the C-terminal glutamine/proline-rich domain assembles into filaments that organize into a hydrogel. Here we analyze the determinants of filament formation of the isolated low complexity domain as well as examine the polymerization properties of full-length Nab3. We found that the C-terminal region with structural homology to hnRNP-C is not required for assembly, nor is an adjacent stretch of 16 glutamines. However, reducing the overall glutamine composition of this 134-amino acid segment from 32% to 14% destroys its polymerization ability. Importantly, full-length wildtype Nab3 also formed filaments with a characteristic cross-β structure which was dependent upon the glutamine/proline-rich region. When full length Nab3 with reduced glutamine content in its low complexity domain was exchanged for wildtype Nab3, cells were not viable. This suggests that polymerization of Nab3 is normally required for its function. In an extension of this idea, we show that the low complexity domain of another yeast termination factor, Pcf11, polymerizes into amyloid fibers and a hydrogel. These findings suggest that, like many other RNA binding proteins, termination factors share a common biophysical trait that may be important for their function.  相似文献   

17.
Summary: Our understanding of amino acid biosynthesis in plants has grown by leaps and bounds in the last decade. It appears that most amino acid biosynthesis takes place in the chloroplast. The recent demonstration of glutamine synthetase and DAHP synthase in the vascular tissue has added a new dimension to the complexity of the nitrogen cycle in plants. Isolation of various genes and transformation of plants with modified forms of these genes are providing tools for understanding the regulation of various pathways. Plant transformation approaches are also going to provide the food of the future with an improved amino acid composition.

18.
Amino acid repeats, or homorepeats, are low complexity protein motifs consisting of tandem repetitions of a single amino acid. Their presence and relative number vary in different proteomes, and some studies have tried to address this variation, proteome by proteome. In this work, we present a full characterization of amino acid homorepeats across evolution. We studied the presence and differential usage of each possible homorepeat in proteomes from various taxonomic groups, using clusters of very similar proteins to eliminate redundancy. The position of each amino acid repeat within proteins, and the order of co-occurring amino acid repeats were also addressed. As a result, we present evidence about the uneven evolution of homorepeats, as well as the functional implications of their relative position in proteins. We discuss some of these cases in their taxonomic context. Collectively, our results show evolutionary and positional signals that suggest that homorepeats have biological function, likely creating unspecific protein interactions or modulating specific interactions in a context dependent manner. In conclusion, our work supports the functional importance of homorepeats and establishes a basis for the study of other low complexity repeats. Proteins 2017; 85:709–719. © 2016 Wiley Periodicals, Inc.
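
As an illustration of the definition given in the abstract above (tandem repetitions of a single amino acid), the following sketch scans a sequence for runs of one residue. The minimum run length of 4 and the function name `find_homorepeats` are assumptions for this example; the abstract does not state the threshold used in the study.

```python
from itertools import groupby


def find_homorepeats(sequence, min_length=4):
    """Return (residue, start, length) for each run of a single residue.

    `start` is a 0-based position. `min_length` is an assumed threshold,
    not a value taken from the paper.
    """
    repeats = []
    position = 0
    for residue, run in groupby(sequence):
        length = len(list(run))
        if length >= min_length:
            repeats.append((residue, position, length))
        position += length
    return repeats


print(find_homorepeats("MQQQQQAAPSSSS"))
```

Because the start position is recorded, the same output could also feed a positional analysis (N-terminal versus C-terminal repeats) or an ordering analysis of co-occurring repeats, both of which the abstract mentions.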

19.
Metabolic networks comprise a multitude of enzymatic reactions carrying out various functions related to cell growth and product formation. Although such reactions are occasionally organized into biochemical pathways, a formal procedure is desired to identify the independent pathways in a bioreaction network and the degree of engagement of each individual reaction in these pathways. We present a procedure for the identification of the independent pathways of bioreaction networks of any size and complexity. The method makes use of the steady-state internal metabolite stoichiometry matrix and defines the independent pathways through the reaction membership of its kernel matrix. Examples from the aromatic amino acid biosynthetic pathway and central carbon metabolism of cells in culture are provided to illustrate the method. Applications to the analysis of the control structure of bioreaction networks are also discussed.  相似文献   
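
The kernel of a stoichiometry matrix, which the abstract above uses to define independent pathways, can be computed for a toy network as sketched below. This shows only the linear-algebra step (a basis of all flux vectors v with S v = 0) using exact fractions; it is not the authors' full procedure, and the two-metabolite, four-reaction network is a hypothetical example.

```python
from fractions import Fraction


def null_space_basis(matrix):
    """Basis of {v : S v = 0}, found by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        scale = rows[r][c]
        rows[r] = [x / scale for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for row_index, pc in enumerate(pivot_cols):
            v[pc] = -rows[row_index][free]
        basis.append(v)
    return basis


# Hypothetical network. Rows: internal metabolites A, B.
# Columns: r1 (-> A), r2 (A -> B), r3 (B ->), r4 (A ->).
S = [
    [1, -1, 0, -1],
    [0, 1, -1, 0],
]
for vector in null_space_basis(S):
    print([int(x) for x in vector])
```

Each basis vector is a steady-state flux distribution, and its nonzero entries show which reactions belong to that pathway: here one pathway uses r1, r2 and r3, and the other uses r1 and r4. The basis of a kernel is not unique, so a different elimination order could yield a different but equivalent set of vectors.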

20.
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
