Similar literature
20 similar articles found.
1.
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods based directly on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data sets. To improve the prediction accuracy for such low-similarity proteins, several methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of these novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark data sets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, because they capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on partially disordered proteins, for which it predicts structural classes with reasonable accuracy; this is a more difficult problem than structural class prediction for commonly used benchmark data sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data sets.

2.
Cell-free protein synthesis is a promising technology featuring many advantages compared to in vivo expression techniques. However, most proteins are still synthesized in vivo because of the relatively low protein yields commonly achieved in vitro, especially in the batch mode of reaction. In Escherichia coli S30 extract-based cell-free systems, protein yields are thought to be partially limited by secondary structure formation of the mRNA. In this study we tested promising members of various classes of RNA chaperones and several different RNA helicases for their ability to enhance in vitro translation. The data clearly show that the addition of none of these factors provides a general solution to the problem. However, protein yields can be increased in the presence of a microRNA hybridizing with the 5′ untranslated region of mRNAs, possibly by inducing structural changes that improve accessibility of the Shine-Dalgarno sequence for the ribosomes.

3.
Jia M, Luo L, Liu C. Biopolymers, 2004, 73(1): 16-26
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), incorporating matching mRNA sequence, amino acid sequence, and protein secondary structural data, is constructed. It includes 648 protein domains. Based on the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. It was found that the alpha-helices and beta-strands in proteins tend to be preferably "coded" by mRNA stem regions, while the coils in proteins tend to be preferably "coded" by mRNA loop regions. These tendencies are more obvious if we observe the structural words (SWs). An SW is defined as a four-amino-acid fragment that shows a pronounced secondary structural (alpha-helix or beta-strand) propensity. It is demonstrated that the deduced correlation between protein and mRNA structure can hardly be explained as an effect of stochastic fluctuation.

4.
Systems perspectives on mRNA processing (cited 4 times: 0 self-citations, 4 citations by others)
McKee AE, Silver PA. Cell Research, 2007, 17(7): 581-590

5.
It has been reported that eukaryotic organisms have a nonsense-mediated mRNA decay (NMD) system to exclude aberrant mRNAs that produce truncated proteins. NMD is an RNA surveillance pathway that degrades mRNAs possessing premature translation termination codons (PTCs), thus avoiding production of possibly toxic truncated proteins. Three interacting proteins, UPF1, UPF2 and UPF3, are required for NMD in mammals and yeasts, and their amino acid sequences are well conserved among most eukaryotes, including plants. In this study, 'The Arabidopsis Information Resource' database was searched for mRNAs with premature termination codons. We selected five of these mRNAs and checked for the presence of PTCs in them when translated in vivo. As a result, we identified aberrant mRNAs produced by alternative splicing for each gene. These genes produced at least one alternative splicing variant including a PTC (PTC+) and another variant without a PTC (PTC-). We analyzed their PTC+/PTC- ratios in wild-type Arabidopsis and upf3 mutant plants and showed that the PTC+/PTC- ratios were higher in atupf3 mutant plants than in wild-type plants, and that the atupf3 mutant was less able than wild-type plants to degrade mRNAs with premature termination codons. This indicates that the AtUPF3 gene is required by the plant NMD system to eliminate aberrantly spliced mRNA.
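The notion of a premature termination codon used in this abstract can be illustrated with a minimal sketch. It assumes a deliberately simplified definition (any in-frame stop codon upstream of the annotated stop); the function names and example sequences are hypothetical, and this is not the screening procedure used in the study.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_in_frame_stop(mrna, start=0):
    """Return the index of the first in-frame stop codon at or after `start`, or None."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None

def has_premature_stop(mrna, annotated_stop, start=0):
    """True if an in-frame stop codon occurs upstream of the annotated stop position."""
    stop = first_in_frame_stop(mrna, start)
    return stop is not None and stop < annotated_stop

# Hypothetical transcripts: the first carries an early UAA, the second does not.
ptc_plus = "AUGGCUUAAGGGUGA"   # codons: AUG GCU UAA GGG UGA
ptc_minus = "AUGGCUGGGUGA"     # codons: AUG GCU GGG UGA
```

Here `annotated_stop` is the nucleotide index of the normal stop codon, so `has_premature_stop(ptc_plus, 12)` flags the early UAA while `has_premature_stop(ptc_minus, 9)` does not.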

6.
Prokaryotic ribosomal protein genes are typically grouped within highly conserved operons. In many cases, one or more of the encoded proteins bind not only to a specific site in the ribosomal RNA but also to a motif localized within their own mRNA, and thereby regulate expression of the operon. In this study, we computationally predicted an RNA motif present in many bacterial phyla within the 5′ untranslated region of operons encoding ribosomal proteins S6 and S18. We demonstrated that the S6:S18 complex binds to this motif, which we hereafter refer to as the S6:S18 complex-binding motif (S6S18CBM). This motif is a conserved CCG sequence presented in a bulge flanked by a stem and a hairpin structure. A similar structure containing a CCG trinucleotide forms the S6:S18 complex binding site in 16S ribosomal RNA. We have constructed a 3D structural model of an S6:S18 complex with S6S18CBM, which suggests that the CCG trinucleotide in a specific structural context may be specifically recognized by the S18 protein. This prediction was supported by site-directed mutagenesis of both RNA and protein components. These results provide a molecular basis for understanding protein-RNA recognition and suggest that the S6S18CBM is involved in an auto-regulatory mechanism.

7.
About 200 mRNA sequences from Escherichia coli and human with matching protein secondary structure data were studied. The mRNA folding for each native sequence and for the corresponding randomized sequences was calculated through free energy minimization. We found that the folding energy of mRNA segments differs significantly between protein secondary structures. The average Z score is more negative for regular secondary structure (alpha-helix and beta-strand) than for coil. This suggests that the codon choice in native mRNA sequences coding for regular protein structure contributes more to mRNA folding stability.
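The abstract does not spell out how the Z score is computed. A minimal sketch, assuming the conventional definition (native folding energy minus the mean energy of the randomized sequences, divided by their standard deviation), could look like this; the energies are hypothetical values in kcal/mol, not data from the study.

```python
from statistics import mean, stdev

def folding_z_score(native_energy, randomized_energies):
    """Z score of a native folding energy against a randomized-sequence background."""
    mu = mean(randomized_energies)
    sigma = stdev(randomized_energies)  # sample standard deviation
    if sigma == 0:
        raise ValueError("randomized energies have zero spread")
    return (native_energy - mu) / sigma

# Hypothetical minimum free energies (kcal/mol) for one mRNA segment
native = -12.0
shuffled = [-8.0, -9.0, -10.0, -9.0, -9.0]
z = folding_z_score(native, shuffled)
```

A negative `z` means the native segment folds more stably than its randomized counterparts, which is the direction the abstract reports for helix- and strand-coding segments.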

8.
Secondary structure is a fundamental feature of both non-coding RNAs (ncRNAs) and messenger RNAs (mRNAs). However, our understanding of the secondary structures of mRNAs, especially those of the coding regions, remains limited, likely because of translation and the lack of RNA-binding proteins that sustain a consensus structure, like those binding to ncRNAs. Indeed, mRNAs have recently been found to adopt diverse alternative structures, but the overall functional significance remains untested. We approach this problem by estimating the folding specificity, i.e., the probability that a fragment of an mRNA folds back to the same partner once refolded. We show that the folding specificity of mRNAs is lower than that of ncRNAs and exhibits moderate evolutionary conservation. Notably, we find that specific rather than alternative folding is likely evolutionarily adaptive, since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests the ability to modulate ribosome movement as one potential functional advantage provided by specific folding. Our findings reveal a novel facet of the RNA structurome with important functional and evolutionary implications and indicate a potential method for distinguishing the mRNA secondary structures maintained by natural selection from molecular noise.

9.
With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using inter-secondary-structure distances between C-alpha positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables automatic identification and classification into disjoint, interacting, and conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach provides an effective delineation of boundaries and identifies those proteins that can be considered as a single domain. A quantitative estimate of the interaction between domains has been proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds, and for understanding structure-activity relationships.

10.
A new method of contextual analysis was used to search for long non-random inverted repeats and complementary palindromes in the genes of E. coli and T7 RNA polymerases. These genes were found to contain from 25% to 50% of all the nucleotides involved in such helices. The 5'- and 3'-ends of the mRNA can be protected from nuclease attack by neighbouring double helices. Some double helices compete with one another and are very similar to the attenuator of the E. coli trp operon.
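The abstract does not describe the contextual-analysis method itself. As a stand-in illustration of what an inverted repeat is, the sketch below performs a naive exact-match search for pairs of segments that are reverse complements of each other and could therefore form the two strands of a helix. The function names, the `min_loop` parameter, and the example sequence are assumptions for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse complement of a DNA string written in the ACGT alphabet."""
    return dna.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(dna, stem_len, min_loop=3):
    """Return (i, j) start positions where dna[i:i+stem_len] can pair with dna[j:j+stem_len].

    The downstream arm must begin at least `min_loop` bases after the upstream arm ends.
    """
    hits = []
    for i in range(len(dna) - stem_len + 1):
        target = reverse_complement(dna[i:i + stem_len])
        j = dna.find(target, i + stem_len + min_loop)
        while j != -1:
            hits.append((i, j))
            j = dna.find(target, j + 1)
    return hits

# Hypothetical hairpin: GGGC ... AAAA ... GCCC
hits = find_inverted_repeats("GGGCAAAAGCCC", stem_len=4)
```

A real search for long, statistically non-random repeats would also need to tolerate mismatches and compare the counts with a randomized background, which this sketch leaves out.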

11.
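Single-celled microorganisms with complete genomes were selected from the NCBI COG database, and those of their proteins with known three-dimensional structures were taken as the study set to examine how the content and length of different types of secondary structure affect the thermostability of archaeal and bacterial proteins. The results show that thermostable archaeal proteins contain a considerable number of short 3_10 helices, whereas thermostable bacterial proteins contain shorter loops. This indicates not only that secondary structure has an important influence on protein thermostability, but also that its influence differs between archaeal and bacterial proteins.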

12.
For a long time, NMR chemical shifts have been used to identify protein secondary structures. Currently, this is accomplished by comparing the observed ¹Hα, ¹³Cα, ¹³Cβ, or ¹³C′ chemical shifts with the random coil values. Here, we present a new protocol, which is based on the joint probability of each of the three secondary structural types (beta-strand, alpha-helix, and random coil) derived from chemical-shift data, to identify the secondary structure. In combination with empirical smooth filters/functions, this protocol shows significant improvements in the accuracy and the confidence of identification. Updated chemical-shift statistics are reported, on the basis of which the reliability of using chemical shift to identify protein secondary structure is evaluated for each nucleus. The reliability varies greatly among the 20 amino acids but, on average, is in the order ¹³Cα > ¹³C′ > ¹Hα > ¹³Cβ > ¹⁵N > ¹HN for distinguishing an alpha-helix from a random coil, and ¹Hα > ¹³Cβ > ¹HN ≈ ¹³Cα ≈ ¹³C′ ≈ ¹⁵N for distinguishing a beta-strand from a random coil. Amide ¹⁵N and ¹HN chemical shifts, which are generally excluded from the application, were in fact found to be helpful in distinguishing a beta-strand from a random coil. In addition, the chemical-shift statistical data are compared with those reported previously, and the results are discussed. A Java user interface program has been developed to make the entire procedure fully automated and is available via http://ccsr3150-p3.stanford.edu.

13.
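Application of neural networks to protein secondary structure prediction (cited 3 times: 0 self-citations, 3 citations by others)
This paper introduces the significance of research on protein secondary structure prediction, discusses the design of neural networks for this task, reviews in some detail the main progress made in recent years in applying neural network methods to protein secondary structure prediction, and looks ahead to the prospects for protein structure prediction.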

14.
Homaeian L, Kurgan LA, Ruan J, Cios KJ, Chen K. Proteins, 2007, 69(3): 486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. The significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, when it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. PSSC-core is shown to provide statistically significantly better results than the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups such as exchange groups, chemical groups of the side chains and the hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain the results of other structure prediction methods. At the same time, it also provides useful insight into the design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.
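Of the features listed in this abstract, the composition vector is the simplest to illustrate. The sketch below assumes the usual definition (the fraction of each of the 20 standard amino acids in the sequence); the function name and the handling of non-standard residues are illustrative choices, not details taken from PSSC-core.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def composition_vector(sequence):
    """Fraction of each standard amino acid in `sequence`, in AMINO_ACIDS order.

    Characters outside the 20 standard residues are ignored (an assumption of this sketch).
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]

# Hypothetical four-residue peptide
vec = composition_vector("AACD")
```

In a regression setting such as the one the abstract describes, a vector like `vec` would be one block of the feature row for a sequence, alongside the other property-based features.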

15.
A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to the words in natural language, have been extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and together form the protein secondary structure dictionary. Specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice is used to represent the results of the word segmentation, and a maximum entropy model is used to calculate the probability of a seqlet being tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms separately and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST gives better predictions. The method is also tested on the 50 CASP5 target proteins, with a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.

16.
1 Introduction
The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences is exploding as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

17.
The conformational parameters $P_k$ for each amino acid species ($j$ = 1–20) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residue in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases, mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
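The contrast between the two averaging schemes discussed in this abstract can be made concrete with a short sketch. It computes the geometric mean of per-residue parameters through the mean of their logarithms, alongside the arithmetic mean; the parameter values are hypothetical and do not come from any published propensity table.

```python
import math

def geometric_mean_param(params):
    """Geometric mean of per-residue parameters, computed via the mean of their logs."""
    log_mean = sum(math.log(p) for p in params) / len(params)
    return math.exp(log_mean)

def arithmetic_mean_param(params):
    """Arithmetic mean of per-residue parameters."""
    return sum(params) / len(params)

# Hypothetical helix parameters for a four-residue segment with one strong breaker
segment = [1.5, 1.2, 0.5, 1.4]
geo = geometric_mean_param(segment)
ari = arithmetic_mean_param(segment)
```

The geometric mean never exceeds the arithmetic mean, and the gap widens when a segment mixes strong formers with strong breakers, which is consistent with the abstract's remark about where the two approaches diverge.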

18.
Summary. Protein-specific versus taxon-specific patterns of nucleotide frequencies were studied in histone genes. The third positions of codons have a (well-known) taxon-specific G+C level and a histone type-specific G/C ratio. This ratio counterbalances the G/C ratio in the first and second positions so that the overall G and C levels in the coding region become approximately equal. The compensation of the G/C ratio indicates a selection pressure at the mRNA level rather than a selection pressure or mutation bias at the DNA level or a selection pressure on codon usage. The structure of histone mRNAs is compatible with the hypothesis that the G/C compensation is due to selection pressures on mRNA secondary structure. Nevertheless, no specific motifs seem to have been selected, and the free energy of the secondary structures is only slightly lower than that expected on the basis of nucleotide frequencies. Offprint requests to: M. A. Huynen
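The position-specific G and C levels discussed in this abstract can be tallied with a few lines of code. The sketch below counts G and C separately at each codon position of a coding sequence; the function name and the example sequence are hypothetical, and no histone data are used.

```python
def gc_by_codon_position(cds):
    """Count G and C at codon positions 1, 2 and 3 of a coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = {pos: {"G": 0, "C": 0} for pos in (1, 2, 3)}
    for i in range(usable):
        base = cds[i]
        if base in ("G", "C"):
            counts[i % 3 + 1][base] += 1
    return counts

# Hypothetical three-codon sequence: ATG GCC GGC
counts = gc_by_codon_position("ATGGCCGGC")
```

Comparing `counts[3]` with the combined totals of `counts[1]` and `counts[2]` shows whether an excess of G over C at the first two positions is offset at the third position, the kind of compensation the abstract describes.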

19.
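At present, the accuracy of protein secondary structure prediction hovers around 75% and is difficult to improve further. In this paper, a redundant protein database is analyzed by statistical methods. The analysis shows that the real factor currently limiting further improvement in prediction accuracy is a systematic error of approximately 25% in the protein database itself, and that this error arises from objective limitations of the experimental conditions.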

20.
Gene expression signals derived from Lactococcus lactis were linked to lacZ-fused genes with different 5'-nucleotide sequences. Computer predictions of mRNA secondary structure were combined with lacZ expression studies to direct base substitutions that could possibly influence gene expression. Mutations were made such that the DNA sequence upstream of the ATG start codon was not changed. Moreover, care was taken that the substitutions, which were all within the first six codons, neither affected the amino acid sequence of the gene product nor introduced codons rarely used in L. lactis. The results suggest that mRNA secondary structure contributes to the efficiency of translation initiation in L. lactis.
