Similar articles
 Found 20 similar articles; search took 15 ms
1.
Several algorithms have been developed that use amino acid sequences to predict whether or not a protein or a region of a protein is disordered. These algorithms make accurate predictions for disordered regions that are 30 amino acids or longer, but it is unclear whether the predictions can be directly related to the backbone dynamics of individual amino acid residues. The nuclear Overhauser effect between the amide nitrogen and hydrogen (NHNOE) provides an unambiguous measure of backbone dynamics at single-residue resolution and is an excellent tool for characterizing the dynamic behavior of disordered proteins. In this report, we show that the NHNOE values for several members of a family of disordered proteins are highly correlated with the output from three popular algorithms used to predict disordered regions from amino acid sequence. This is the first direct comparison of an experimental measure of residue-specific backbone dynamics with disorder predictions. The results suggest that some disorder predictors can accurately estimate the backbone dynamics of individual amino acids in a long disordered region.

2.
Protein secondary structure predictions and amino acid long-range contact map predictions from the primary sequence of proteins have been explored to aid in modelling protein tertiary structures. To evaluate the usefulness of secondary structure and 3D residue contact prediction methods for modelling protein structures, we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information, as restraints for MODELLER. We present here the results of our modelling studies on the 30 best-resolved single-domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of the residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues have RMSDs of 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that better models can be obtained for proteins with a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program is useful only if a truly homologous structure (or structures) is available as a template, from which it derives numerous restraints that are almost identical to those of the template used. This analysis also clearly indicates that even if several true residue-residue contact distances are satisfied, for up to 30% of the sequence length and with fully known secondary structure information, the predicted model structures remain far from their corresponding native structures.
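The RMSD values quoted in this abstract compare a model with its native structure. As a rough illustration only, the sketch below computes a coordinate RMSD between two sets of matched atoms. It assumes the two structures are already superimposed (no optimal fitting is performed), and the function name `rmsd` is a hypothetical helper, not part of MODELLER or of the authors' protocol.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumption: the structures are already superimposed; this sketch does
    not perform any rotation or translation fitting.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matched atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In practice the two structures would first be optimally superimposed (for example with a Kabsch-style fit) before this quantity is reported.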

4.
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and when different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other available meta-servers, but with much greater flexibility of input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition (FR) results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

5.
Napin is a 2S storage protein found in the seeds of oilseed rape (Brassica napus L.) and related species. Using protein structural prediction programs, we have identified a region in the napin protein sequence which forms a 'hydrophilic loop' composed of amino acid residues located at the protein surface. Targeting this region, we have constructed two napin chimeric genes containing the coding sequence for the peptide hormone leucine-enkephalin as a topological marker. One version has a single enkephalin sequence of 11 amino acids including linkers, and the second contains a tandem repeat of this peptide comprising 22 amino acids, inserted into the napin large subunit. The inserted peptide sequences alter the balance of hydrophilic to hydrophobic amino acids and introduce flexibility into this region of the polypeptide chain. The chimeric genes have been expressed in tobacco plants under the control of the seed-specific napA gene promoter. Analyses indicate that the engineered napin proteins are expressed, transported, post-translationally modified and deposited inside the protein bodies of the transgenic seeds, demonstrating that the altered napin proteins behave in a similar fashion to the authentic napin protein. Detailed immunolocalisation studies indicate that the insertion of the peptide sequences has a significant effect on the distribution of the napin proteins within the tobacco seed protein bodies.

6.
Due to advances in molecular biology, the DNA sequences of structural genes coding for proteins are often known before a protein is characterized or even isolated. The function of a protein whose amino acid sequence has been deduced from a DNA sequence may not even be known. This has created greater interest in the development of methods to predict the tertiary structures of proteins. The a priori prediction of a protein's structure from its amino acid sequence is not yet possible. However, since proteins with similar amino acid sequences are observed to have similar three-dimensional structures, it is possible to use an analogy with a protein of known structure to draw some conclusions about the structure and properties of an uncharacterized protein. The process of predicting the tertiary structure of a protein relies heavily on computer modeling and analysis of the structure. The prediction of the structure of the bacteriophage 434 cro repressor is used as an example illustrating current procedures.

7.
8.
Transcobalamin I (TCI) is a member of the R binder family of vitamin B12 binding proteins. It is a major protein constituent of secondary granules in neutrophils. We have isolated and characterized full-length cDNA clones encoding TCI in order to determine whether its expression is coordinately regulated with the appearance of secondary granules and whether it is consequently a useful marker of granulocyte development. Partial amino acid sequences of human R protein were obtained from tryptic digestion fragments. Using the polymerase chain reaction, a partial TCI cDNA probe was isolated by selective amplification of a region of cDNA located between two oligonucleotides deduced from the available partial amino acid sequences. The amplified probe was then used to obtain full-length clones from a granulocyte cDNA library. The identity of the clones was confirmed by matching the DNA sequence to known peptide amino acid sequences. TCI is transcribed to a single 1.5-kilobase mRNA species. The predicted protein sequence is 433 amino acids long. We have compared the sequence of TCI to that of rat intrinsic factor. The two proteins have areas of extensive homology which implicate regions potentially important for vitamin B12 binding. TCI mRNA was present in late neutrophil precursors but absent from uninduced and induced HL60 cells.

9.
The reliability of the hydropathy method for predicting the formation of membrane-spanning alpha-helices by integral membrane proteins and peptides whose structure is known from X-ray crystallography is analysed. It is shown that Kyte-Doolittle hydropathy plots do not accurately predict the 22 transmembrane alpha-helices in the reaction centres (RC) of the photosynthetic bacteria Rhodopseudomonas viridis and Rhodobacter sphaeroides (R-26). The accuracy of prediction for these proteins was improved using an optimised Kyte-Doolittle hydrophobicity scale. However, this hydrophobicity scale did not improve the predictions for the alpha/beta-peptides of the B800-850 (LH2) complexes of the photosynthetic bacteria Rhodopseudomonas acidophila and Rhodospirillum molischianum, which were excluded from the optimisation procedure. The best and worst predictions of membrane-spanning alpha-helices, for the RC proteins and the LH2 peptides respectively, were obtained with a propensity scale (PRC) calculated from the amino acid sequences and X-ray data for the RC proteins. A propensity scale (PLH) obtained using the amino acid sequences and X-ray data for the alpha/beta-peptides of the LH2 complexes did not give an acceptable prediction of the transmembrane segments in the LH2 peptides; moreover, it markedly contradicted the PRC scale. It is concluded that amino acids have no significant preference for localisation in transmembrane segments. Therefore, the predictive ability of the hydropathy methodology appears to be limited: the number of transmembrane segments can be correctly calculated in the best case only, and the lengths and positions of membrane-spanning alpha-helices in a protein amino acid sequence cannot be predicted exactly.
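For readers unfamiliar with hydropathy plots, the sketch below shows the general sliding-window idea using the standard Kyte-Doolittle scale. The 19-residue window and the 1.6 cut-off are commonly quoted defaults taken here as assumptions; they are not the optimised scale or the propensity scales (PRC, PLH) described in the abstract, and the function names are hypothetical.

```python
# Standard Kyte-Doolittle hydropathy values for the 20 amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def candidate_segments(profile, threshold=1.6):
    """Inclusive (start, end) ranges of window indices scoring above threshold."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i  # a new above-threshold run begins
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments
```

The indices returned by `candidate_segments` refer to window start positions, so a predicted segment covers residues from `start` to `end + window - 1`. As the abstract notes, such boundaries should be treated as approximate.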

10.
The complete nucleotide sequences of the vesicular stomatitis virus (VSV) mRNAs encoding the N and NS proteins have been determined from the sequences of cDNA clones. The mRNA encoding the N protein is 1,326 nucleotides long, excluding polyadenylic acid. It contains an open reading frame for translation which extends from the 5'-proximal AUG codon to encode a protein of 422 amino acids. The N mRNA is known to contain a major ribosome binding site at the 5'-proximal AUG codon and two other minor ribosome binding sites. These secondary sites have been located unambiguously at the second and third AUG codons in the N mRNA sequence. Translational initiation at these sites, if it in fact occurs, would result in synthesis of two small proteins in a second reading frame. The VSV mRNA encoding the NS protein is 815 nucleotides long, excluding polyadenylic acid, and encodes a protein of 222 amino acids. The predicted molecular weight of the NS protein (25,110) is approximately one-half of that predicted from the mobility of NS protein on sodium dodecyl sulfate-polyacrylamide gels. Deficiency of sodium dodecyl sulfate binding to a large negatively charged domain in the NS protein could explain this anomalous electrophoretic mobility.
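As a consistency check on the figures above, a 422-residue protein needs 422 × 3 + 3 = 1,269 nucleotides including the stop codon, which fits within the 1,326-nucleotide N mRNA; likewise 222 × 3 + 3 = 669 fits within 815 nucleotides for NS. The sketch below shows how an open reading frame starting at the 5'-proximal AUG might be located. It is an illustrative helper with hypothetical names, not the authors' analysis, and the short test sequence is made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_orf(mrna):
    """Locate the ORF that begins at the 5'-proximal AUG.

    Returns (start_index, codon_count), where codon_count includes the AUG
    but excludes the stop codon. Returns None if there is no AUG or no
    in-frame stop codon.
    """
    rna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = rna.find("AUG")
    if start == -1:
        return None
    # Walk codon by codon in the frame set by the first AUG
    for pos in range(start, len(rna) - 2, 3):
        if rna[pos:pos + 3] in STOP_CODONS:
            return start, (pos - start) // 3
    return None
```

Applying the same scan from the second or third AUG would show which reading frame those minor initiation sites fall in.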

11.
Prediction of protein sorting signals from the sequence of amino acids is of great importance in the field of proteomics today. Recently, the growth of protein databases, combined with machine learning approaches such as neural networks and hidden Markov models, has made it possible to achieve a level of reliability at which practical use in, for example, automatic database annotation is feasible. In this review, we concentrate on the present status and future perspectives of SignalP, our neural network-based method for prediction of the most well-known sorting signal: the secretory signal peptide. We discuss the problems associated with the use of SignalP on genomic sequences, showing that signal peptide prediction will improve further if integrated with predictions of start codons and transmembrane helices. As a step towards this goal, a hidden Markov model version of SignalP has been developed, making it possible to discriminate between cleaved signal peptides and uncleaved signal anchors. Furthermore, we show how SignalP can be used to characterize putative signal peptides from an archaeon, Methanococcus jannaschii. Finally, we briefly review a few methods for predicting other protein sorting signals and discuss the future of protein sorting prediction in general.

12.
S M Thomas, R A Lamb, R G Paterson. Cell, 1988, 54(6): 891-902
The "P" gene of the paramyxovirus SV5 encodes two known proteins, P (Mr ≈ 44,000) and V (Mr ≈ 24,000). The complete nucleotide sequence of the "P" gene has been obtained and is found to contain two open reading frames, neither of which is large enough to encode the P protein. We have shown that the P and V proteins are translated from two mRNAs that differ by the presence of two nontemplated G residues in the P mRNA. These two additional nucleotides convert the two open reading frames to one of 392 amino acids. The P and V proteins are amino coterminal and have 164 amino acids in common. The unique C terminus of V consists of a cysteine-rich region that resembles a cysteine-rich metal binding domain. An open reading frame that contains this cysteine-rich region exists in all other paramyxovirus "P" gene sequences examined, which suggests that it may have important biological significance.

13.
14.

Background  

Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality.

15.
Remote homology detection refers to the detection of structural homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as the support vector machine (SVM) are currently the most accurate methods. In most of these SVM-based methods, efforts have been dedicated to developing new kernels to make better use of pairwise alignment scores or sequence profiles. Moreover, the physicochemical properties of amino acids are not generally used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using the physicochemical properties of its amino acids and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select the amino acid indices (up to 10 for a protein family) that best characterize the given protein family. The selected amino acid indices may allow a better biological explanation of the protein family classification problem than alignment-based methods provide. An SVM-based classifier then works on the space described by the RQA metrics. The classification scheme is named SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices are able to produce results that are comparable to those obtained by the published state-of-the-art SVM kernels. In the future, better prediction accuracies can be expected by combining the alignment-based features with our amino acid property-based features. Supplementary information, including the raw dataset, the best-performing amino acid indices for each protein family and the computed RQA metrics for all protein sequences, can be downloaded from http://ym151113.ym.edu.tw/svm-rqa.

16.
The advent of whole genome sequencing has led to an increasing number of proteins with known amino acid sequences. Despite many efforts, the number of proteins with resolved three-dimensional structures is still low. One of the challenging tasks structural biologists face is predicting the interaction of a metal ion with a protein whose structure is unknown. Based on the information available in the Protein Data Bank, a site (METALACTIVE INTERACTION) has been generated which displays information on significant high-preference and low-preference combinations of endogenous ligands for 49 metal ions. Users can also obtain information about the residues present in the first and second coordination spheres, as these play a major role in maintaining the structure and function of metalloproteins in biological systems. In this paper, a novel computational tool (ZINCCLUSTER) is developed, which can predict the zinc binding sites of proteins even if only the primary sequence is known. The purpose of this tool is to predict the active site cluster of an uncharacterized protein based on its primary sequence or a 3D structure. The tool can predict amino acids interacting with a metal or vice versa. This tool is based on the occurrence of significant triplets, and in tests it showed higher prediction accuracy than other available techniques.

17.
Structural genomics projects as well as ab initio protein structure prediction methods provide structures of proteins with no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may only include the positions of C alpha atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, C alpha-only protein models. The method uses the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids, and the dipole moment of the molecule. These quantities are used in a linear formula, with coefficients derived from logistic regression performed on a training set, and DNA binding is predicted based on whether the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and provides correct predictions even on inaccurate protein models. We demonstrate that the method is capable of predicting proteins with novel binding site motifs and structures solved in an unbound state. The accuracy of our method is close to that of another published method that uses all-atom structures, time-consuming calculations and information on conserved residues.
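The decision rule described here, a linear combination of features compared with a threshold, can be sketched as follows. The feature names, coefficient values and threshold are all hypothetical placeholders; the abstract does not give the actual features' numeric form or the fitted coefficients.

```python
import math

def linear_score(features, coefficients, intercept):
    """Linear formula: intercept plus the weighted sum of the named features."""
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)

def predict_dna_binding(features, coefficients, intercept, threshold=0.0):
    """Return (is_binding, probability) for one protein.

    The probability is the logistic transform of the linear score; the
    threshold of 0.0 on the score (probability 0.5) is an assumption.
    """
    score = linear_score(features, coefficients, intercept)
    probability = 1.0 / (1.0 + math.exp(-score))
    return score > threshold, probability

# Hypothetical example values, for illustration only
example_coefficients = {"arg_fraction": 2.0, "dipole_moment": 0.5}
example_features = {"arg_fraction": 0.5, "dipole_moment": 2.0}
is_binding, probability = predict_dna_binding(
    example_features, example_coefficients, intercept=-1.0)
```

In a real setting the coefficients and intercept would come from logistic regression on a labelled training set, and the threshold would be tuned to the desired balance of sensitivity and specificity.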

18.
Alpha-helices within proteins are often terminated (capped) by distinctive configurations of the polypeptide chain. Two common arrangements are the Schellman motif and the alternative alpha(L) motif. Rose and coworkers developed stereochemical rules to identify the locations of such motifs in proteins of unknown structure based only on their amino acid sequences. To check the effectiveness of these rules, they made specific predictions regarding the structural and thermodynamic consequences of certain mutations in T4 lysozyme. We have constructed these mutants and show here that they have neither the structure nor the stability that was predicted. The results show the complexity of the protein-folding problem. Comparison of known protein structures may show that a characteristic sequence of amino acids (a sequence motif) corresponds to a conserved structural motif. In any particular protein, however, changes in other parts of the sequence may result in a different conformation. The structure is determined by the sequence as a whole, not by parts considered in isolation.

19.
The large subunit of eukaryotic ribosomes contains acidic phosphoproteins which are related to L7/L12 from Escherichia coli. In the brine shrimp Artemia these proteins are designated eL12 and eL12'. We have isolated cDNA clones for these proteins from a cDNA bank that was constructed using size-fractionated poly(A)-rich RNA (8-10S fraction) from Artemia and a synthetic oligonucleotide as primer. Clones containing DNA sequences coding for eL12 and eL12' were characterized by hybrid-selected translation and DNA sequencing. The proteins eL12 and eL12' share an identical peptide of 22 amino acids at their carboxy termini, whereas the remaining part of the protein shows little sequence homology. The nucleotide sequences show a different codon use for the amino acids in the common carboxy terminus, thereby excluding a common exon coding for this part of both proteins. Despite the differences in amino acid sequence in the major part of eL12 and eL12', the proteins have a considerable degree of homology in the distribution of hydrophobic and hydrophilic amino acids over the polypeptide chains, consistent with a related folding and function of both proteins. Relative levels of mRNA coding for eL12, eL12' and elongation factor 1 alpha were determined during the development of Artemia from a dormant cyst to a nauplius. The data show a coordinate expression of the genes for EF-1 alpha and both ribosomal proteins, excluding a differential expression of the genes for these related ribosomal proteins during embryogenesis. Analysis of the gene copy number for eL12 and eL12' indicates the presence of a few genes for each protein.

20.
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method that exploits these data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, building on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.
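The two measures contrasted in this abstract, overall three-state accuracy (Q3) and the fraction of helix start positions recovered, can be illustrated with a small sketch. It assumes secondary structure is given as strings over H (helix), E (strand) and C (coil), and that a start counts as correct only if it is predicted at exactly the observed position; the abstract does not state the tolerance actually used, and the function names are hypothetical.

```python
def q3(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

def helix_starts(states):
    """Indices where a run of 'H' begins."""
    return {i for i, s in enumerate(states)
            if s == "H" and (i == 0 or states[i - 1] != "H")}

def start_recovery(predicted, observed):
    """Fraction of observed helix starts predicted at exactly the same index."""
    true_starts = helix_starts(observed)
    if not true_starts:
        return 0.0
    return len(true_starts & helix_starts(predicted)) / len(true_starts)

# Made-up example: one helix start is predicted one residue too late
observed_states = "CHHHHCCEEECHHHC"
predicted_states = "CCHHHCCEEECHHHC"
```

In this made-up example a single misplaced residue leaves Q3 high (14 of 15 residues correct) while only one of the two helix starts is recovered, which is why the two measures can move independently.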
