20 similar articles found.
1.
Amin Ahmadi Adl Abbas Nowzari-Dalini Bin Xue Vladimir N. Uversky 《Journal of biomolecular structure & dynamics》2013,31(6):1127-1137
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods based directly on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve prediction accuracy for such low-similarity proteins, several recently proposed methods explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark data-sets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, because these features capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on predicting structural classes for partially disordered proteins, with reasonable prediction accuracy. This is a more difficult problem than structural class prediction for the commonly used benchmark data-sets and, to the best of our knowledge, has not been attempted before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select the discriminating features that contribute to high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.
2.
Freischmidt A Liss M Wagner R Kalbitzer HR Horn G 《Protein expression and purification》2012,82(1):26-31
Cell-free protein synthesis is a promising technology with many advantages over in vivo expression techniques. However, most proteins are still synthesized in vivo because of the relatively low protein yields commonly achieved in vitro, especially in the batch mode of reaction. In Escherichia coli S30 extract-based cell-free systems, protein yields are thought to be partially limited by secondary structure formation in the mRNA. In this study, we tested promising members of various classes of RNA chaperones and several different RNA helicases for their ability to enhance in vitro translation. The data clearly show that none of these factors, when added, provides a general solution to the problem. However, protein yields can be increased in the presence of a microRNA hybridizing with the 5′ untranslated region of mRNAs, possibly by inducing structural changes that improve the accessibility of the Shine-Dalgarno sequence for the ribosomes.
3.
Statistical correlation between protein secondary structure and messenger RNA stem-loop structure (times cited: 5; self-citations: 0; by others: 5)
A new integrated sequence-structure database, called IADE (Integrated ASTRAL-DSSP-EMBL), has been constructed. It incorporates matched mRNA sequences, amino acid sequences and protein secondary structure data, and includes 648 protein domains. Using the IADE database, we studied the relation between RNA stem-loop frequencies and protein secondary structure. We found that alpha-helices and beta-strands in proteins tend preferentially to be "coded" by mRNA stem regions, while coils tend preferentially to be "coded" by mRNA loop regions. These tendencies become more pronounced when structural words (SWs) are considered. An SW is defined as a four-amino-acid fragment that shows a pronounced secondary structure (alpha-helix or beta-strand) propensity. We demonstrate that the deduced correlation between protein and mRNA structure can hardly be explained as a stochastic fluctuation effect.
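The structural-word definition above can be illustrated with a short Python sketch. It rests on loudly stated assumptions that may differ from the paper's exact procedure:

- The propensity of a four-residue fragment for a state is taken as P(state | fragment) / P(state).
- A window counts toward helix (H) or strand (E) only when all four residues carry that DSSP-style label.
- The `min_count` and `min_propensity` thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def structural_words(sequences, ss_strings, min_count=2, min_propensity=1.5):
    """Return {(fragment, state): propensity} for helix/strand-enriched 4-mers.

    Assumed (hypothetical) definition: propensity = P(s | frag) / P(s), where a
    window is labelled s only if all four residues share label s; any other
    window is labelled "mixed". ss_strings must align with sequences.
    """
    frag_total = Counter()
    frag_state = defaultdict(Counter)
    state_total = Counter()
    for seq, ss in zip(sequences, ss_strings):
        for i in range(len(seq) - 3):
            frag = seq[i:i + 4]
            labels = set(ss[i:i + 4])
            state = labels.pop() if len(labels) == 1 else "mixed"
            frag_total[frag] += 1
            frag_state[frag][state] += 1
            state_total[state] += 1
    n_windows = sum(state_total.values())
    words = {}
    for frag, n in frag_total.items():
        if n < min_count:
            continue  # too rare to estimate a propensity
        for state in ("H", "E"):
            if state_total[state] == 0:
                continue
            p_cond = frag_state[frag][state] / n
            p_background = state_total[state] / n_windows
            propensity = p_cond / p_background
            if propensity >= min_propensity:
                words[(frag, state)] = round(propensity, 2)
    return words
```

For example, `structural_words(["AAAAGGGG"] * 2, ["HHHHCCCC"] * 2)` flags only `AAAA` as a helix-associated word. Relating such words to mRNA stem or loop regions, as the study does, would be a separate step.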
4.
Systems perspectives on mRNA processing (times cited: 4; self-citations: 0; by others: 4)
5.
It has been reported that eukaryotic organisms have a nonsense-mediated mRNA decay (NMD) system that eliminates aberrant mRNAs encoding truncated proteins. NMD is an RNA surveillance pathway that degrades mRNAs possessing premature translation termination codons (PTCs), thus avoiding the production of potentially toxic truncated proteins. Three interacting proteins, UPF1, UPF2 and UPF3, are required for NMD in mammals and yeasts. Their amino acid sequences are well conserved among most eukaryotes, including plants. In this study, 'The Arabidopsis Information Resource' database was searched for mRNAs with premature termination codons. We selected five of these mRNAs and checked whether they contain PTCs when translated in vivo. As a result, we identified aberrant mRNAs produced by alternative splicing for each gene. Each of these genes produced at least one alternative splicing variant including a PTC (PTC+) and another variant without a PTC (PTC-). We analyzed their PTC+/PTC- ratios in wild-type Arabidopsis and upf3 mutant plants. The ratios were higher in atupf3 mutant plants than in wild-type plants, and the atupf3 mutant was less able than wild-type plants to degrade mRNAs with premature termination codons. This indicates that the AtUPF3 gene is required by the plant NMD system to eliminate aberrantly spliced mRNAs.
6.
Dorota Matelska Elzbieta Purta Sylwia Panek Michal J. Boniecki Janusz M. Bujnicki Stanislaw Dunin-Horkawicz 《RNA (New York, N.Y.)》2013,19(10):1341-1348
Prokaryotic ribosomal protein genes are typically grouped within highly conserved operons. In many cases, one or more of the encoded proteins bind not only to a specific site in the ribosomal RNA but also to a motif located within their own mRNA, thereby regulating expression of the operon. In this study, we computationally predicted an RNA motif present in many bacterial phyla within the 5′ untranslated region of operons encoding ribosomal proteins S6 and S18. We demonstrated that the S6:S18 complex binds to this motif, which we hereafter refer to as the S6:S18 complex-binding motif (S6S18CBM). This motif is a conserved CCG sequence displayed in a bulge flanked by a stem and a hairpin structure. A similar structure containing a CCG trinucleotide forms the S6:S18 complex-binding site in 16S ribosomal RNA. We have constructed a 3D structural model of an S6:S18 complex with S6S18CBM. The model suggests that the CCG trinucleotide in a specific structural context may be specifically recognized by the S18 protein. This prediction was supported by site-directed mutagenesis of both the RNA and protein components. These results provide a molecular basis for understanding protein-RNA recognition and suggest that the S6S18CBM is involved in an autoregulatory mechanism.
7.
About 200 mRNA sequences from Escherichia coli and human with matching protein secondary structure data were studied. The mRNA folding of each native sequence and of the corresponding randomized sequences was calculated by free energy minimization. We found that the folding energy of mRNA segments differs significantly among protein secondary structures. The average Z score is more negative for regular secondary structures (alpha-helix and beta-strand) than for coil. This suggests that codon choice in native mRNA sequences coding for regular protein structure contributes more to mRNA folding stability.
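The Z score above compares a native segment's folding energy with the energies of its randomized versions: Z = (E_native − mean(E_random)) / sd(E_random). The following is a minimal sketch under two explicit assumptions:

- The energy is a crude stand-in for free energy minimization: the negated maximum number of nested base pairs from the Nussinov algorithm. Pairs are Watson-Crick plus G-U, with hairpin loops of at least three nucleotides.
- Randomization is simple shuffling, which preserves base composition.

A real analysis would use a thermodynamic folding program instead.

```python
import random
import statistics

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov dynamic programming).

    Used only as a crude folding-energy proxy (energy ~ -pairs). Each pair
    must enclose at least min_loop unpaired nucleotides.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def folding_z_score(seq, n_shuffles=50, seed=0):
    """Z = (E_native - mean(E_shuffled)) / sd(E_shuffled), with E = -pairs."""
    rng = random.Random(seed)
    native = -nussinov_pairs(seq)
    shuffled = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)  # keeps the base composition of the native segment
        shuffled.append(-nussinov_pairs("".join(letters)))
    sd = statistics.stdev(shuffled)
    return 0.0 if sd == 0 else (native - statistics.mean(shuffled)) / sd
```

A negative Z means the native segment folds more stably than composition-matched random sequences. This is the direction the abstract reports for segments coding for helices and strands.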
8.
《Genomics, Proteomics & Bioinformatics》2021,19(6):882-900
The secondary structure is a fundamental feature of both non-coding RNAs (ncRNAs) and messenger RNAs (mRNAs). However, our understanding of the secondary structures of mRNAs, especially those of the coding regions, remains elusive. This is likely because of translation and because mRNAs lack RNA-binding proteins that sustain a consensus structure, like those binding to ncRNAs. Indeed, mRNAs have recently been found to adopt diverse alternative structures, but the overall functional significance remains untested. Here we approach this problem by estimating the folding specificity, i.e., the probability that a fragment of an mRNA folds back to the same partner once refolded. We show that the folding specificity of mRNAs is lower than that of ncRNAs and exhibits moderate evolutionary conservation. Notably, we find that specific rather than alternative folding is likely evolutionarily adaptive, since specific folding is frequently associated with functionally important genes or sites within a gene. Additional analysis in combination with ribosome density suggests that the ability to modulate ribosome movement is one potential functional advantage provided by specific folding. Our findings reveal a novel facet of the RNA structurome with important functional and evolutionary implications. They also indicate a potential method for distinguishing mRNA secondary structures maintained by natural selection from molecular noise.
9.
An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins.
R. Sowdhamini T. L. Blundell 《Protein science : a publication of the Protein Society》1995,4(3):506-520
With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using inter-secondary-structure distances between Cα positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables automatic identification and classification into disjoint, interacting and conjoint domains. The approach was applied to a database of 83 protein families and 18 unique structures. It effectively delineates domain boundaries and identifies proteins that can be considered single-domain. A quantitative estimate of the interaction between domains is also proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds and for understanding structure-activity relationships.
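Below is a minimal sketch of the clustering step described above. It assumes:

- Each helix or strand is given as a list of Cα coordinates.
- The distance between two elements is their closest Cα to Cα distance.
- Single-linkage merging stops at a hypothetical distance cutoff.

The merge history stands in for the dendrogram. The paper's disjoint factor and its disjoint/interacting/conjoint classification are not reproduced.

```python
import math
from itertools import combinations


def sse_distance(a, b):
    """Closest C-alpha to C-alpha distance between two elements (lists of xyz)."""
    return min(math.dist(p, q) for p in a for q in b)


def cluster_sses(sses, cutoff):
    """Single-linkage agglomerative clustering of secondary structure elements.

    Returns (merges, clusters): merges is the dendrogram as a list of
    (cluster_a, cluster_b, distance) steps; clusters holds the element indices
    remaining in each cluster once the closest pair is farther than cutoff.
    """
    n = len(sses)
    d = [[sse_distance(sses[i], sses[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance = closest pair of member elements.
        dist, x, y = min(
            (min(d[i][j] for i in clusters[x] for j in clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if dist > cutoff:
            break
        merges.append((clusters[x], clusters[y], dist))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges, clusters
```

Cutting the merge history at a fixed distance is the simplest way to read domains off a dendrogram. The published method uses its disjoint factor for that decision instead.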
10.
A new method of contextual analysis was used to search for long non-random inverted repeats and complementary palindromes in the genes of E. coli and T7 RNA polymerases. These genes were found to contain from 25% to 50% of all the nucleotides involved in such helices. The 5'- and 3'-ends of the mRNA can be protected from nuclease attack by neighbouring double helices. Some double helices compete with one another and closely resemble the attenuator of the E. coli trp operon.
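The contextual-analysis method itself is not described in the entry above. The following is therefore only a generic sketch of one ingredient: scanning a DNA sequence for perfect inverted repeats that could form hairpin stems. The function name, parameters and defaults are hypothetical. A search for *non-random* repeats, as in the study, would also need a statistical comparison against shuffled sequences.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def inverted_repeats(seq, min_stem=6, min_loop=3):
    """Find maximal perfect inverted repeats (candidate hairpin stems) in DNA.

    Returns (left_start, right_end, stem_length) tuples with right_end
    inclusive: the left arm is seq[i:i+k], the right arm (its reverse
    complement) is seq[j-k+1:j+1], and at least min_loop bases separate them.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n):
        for j in range(i + 2 * min_stem + min_loop - 1, n):
            # Skip stems that extend outward; report only maximal ones.
            if i > 0 and j < n - 1 and COMPLEMENT.get(seq[i - 1]) == seq[j + 1]:
                continue
            k = 0
            # Grow inward while bases pair and the loop stays >= min_loop.
            while (j - i + 1 - 2 * (k + 1) >= min_loop
                   and COMPLEMENT.get(seq[i + k]) == seq[j - k]):
                k += 1
            if k >= min_stem:
                hits.append((i, j, k))
    return hits
```

This brute-force scan is roughly quadratic in sequence length, which is acceptable for single genes. Longer sequences would call for a suffix-structure approach.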
11.
12.
Probability-based protein secondary structure identification using combined NMR chemical-shift data
NMR chemical shifts have long been used to identify protein secondary structures. Currently, this is done by comparing the observed ¹Hα, ¹³Cα, ¹³Cβ or ¹³C′ chemical shifts with random-coil values. Here, we present a new protocol that identifies secondary structure from the joint probability of each of the three secondary structure types (β-strand, α-helix and random coil), derived from chemical-shift data. Combined with empirical smoothing filters/functions, this protocol significantly improves the accuracy and confidence of identification. Updated chemical-shift statistics are reported, and on this basis the reliability of using chemical shifts to identify protein secondary structure is evaluated for each nucleus. The reliability varies greatly among the 20 amino acids. On average, it follows the order ¹³Cα > ¹³C′ > ¹Hα > ¹³Cβ > ¹⁵N > ¹Hᴺ for distinguishing an α-helix from a random coil, and ¹Hα > ¹³Cβ > ¹Hᴺ ≈ ¹³Cα ≈ ¹³C′ ≈ ¹⁵N for distinguishing a β-strand from a random coil. Amide ¹⁵N and ¹Hᴺ chemical shifts are generally excluded from such applications, but were in fact found to be helpful in distinguishing a β-strand from a random coil. In addition, the chemical-shift statistics are compared with those reported previously, and the results are discussed. A Java user-interface program has been developed to fully automate the procedure and is available via http://ccsr3150-p3.stanford.edu.
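The joint-probability idea above can be sketched as follows. All of these choices are illustrative assumptions, not the published protocol or its statistics:

- Each nucleus's secondary shift (observed minus random coil) is modelled as a Gaussian per state.
- Nuclei are treated as independent, so their log-likelihoods add.
- A sliding-window average of log-likelihoods serves as a simple smoothing filter.
- The nucleus keys `CA`, `CB` and `HA` are shorthand for ¹³Cα, ¹³Cβ and ¹Hα.
- The `STATS` numbers are hypothetical placeholders.

```python
import math

# Hypothetical (mean, sd) of secondary shifts in ppm per state; placeholders only.
STATS = {
    "helix":  {"CA": (2.6, 1.0), "CB": (-0.4, 1.0), "HA": (-0.4, 0.3)},
    "strand": {"CA": (-1.4, 1.2), "CB": (2.2, 1.5), "HA": (0.4, 0.4)},
    "coil":   {"CA": (0.0, 1.0), "CB": (0.0, 1.0), "HA": (0.0, 0.3)},
}


def log_gauss(x, mu, sd):
    """Log density of a normal distribution N(mu, sd**2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def residue_log_probs(shifts):
    """Joint log-likelihood per state, treating nuclei as independent."""
    return {
        state: sum(log_gauss(v, *params[nuc]) for nuc, v in shifts.items() if nuc in params)
        for state, params in STATS.items()
    }


def assign(residues, window=3):
    """Average per-residue log-likelihoods over a window, then take the argmax."""
    scores = [residue_log_probs(r) for r in residues]
    half = window // 2
    labels = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        avg = {s: sum(scores[j][s] for j in range(lo, hi)) / (hi - lo) for s in STATS}
        labels.append(max(avg, key=avg.get))
    return labels
```

Missing nuclei are simply skipped, so residues with partial assignments still receive a label.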
13.
Application of neural networks in protein secondary structure prediction (times cited: 3; self-citations: 0; by others: 3)
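This paper introduces the significance of research on protein secondary structure prediction and discusses design issues for neural networks used in this task. It reviews in considerable detail the main progress of recent years in applying neural network methods to protein secondary structure prediction, and offers an outlook on the prospects of protein structure prediction.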
14.
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands and coils. The significant majority of successful methods for predicting secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, when it is characterized by low (<30%) homology. To address this, we propose PSSC-core, a novel method for predicting secondary structure content through comprehensive sequence representation. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helix and strand in sequences from the twilight zone. PSSC-core was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. PSSC-core is shown to provide statistically significantly better results than the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes:

- composition and composition moment vectors;
- frequencies of tetrapeptides associated with helical and strand conformations;
- various property-based groups, such as exchange groups, chemical groups of the side chains and the hydrophobic group;
- autocorrelations based on hydrophobicity, side-chain masses and hydropathy;
- conformational patterns for beta-sheets.

PSSC-core provides an alternative for predicting secondary structure content that can be used to validate and constrain the results of other structure prediction methods.
At the same time, it provides useful insight into the design of successful protein sequence representations for developing new methods that predict different aspects of secondary protein structure.
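The regression step of PSSC-core can be illustrated with a deliberately small feature set. This minimal sketch assumes:

- Only the 20-dimensional amino acid composition is used as features; the paper's representation is far richer.
- Ordinary least squares is solved through the normal equations.

The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def fit_linear(X, y, intercept=True):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    Solved by Gauss-Jordan elimination with partial pivoting; assumes A^T A is
    non-singular. Returns the coefficients, intercept first if requested.
    """
    A = [([1.0] if intercept else []) + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda row: abs(M[row][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, x, intercept=True):
    """Apply coefficients from fit_linear to one feature vector."""
    if intercept:
        return coef[0] + sum(c * v for c, v in zip(coef[1:], x))
    return sum(c * v for c, v in zip(coef, x))
```

Composition fractions always sum to 1, so an intercept column would make the system singular. With composition features, call `fit_linear(X, y, intercept=False)`, and use enough training sequences that the remaining 20 columns are linearly independent.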
15.
A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and protein secondary structure, and together they form an organism-specific protein secondary structure dictionary. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice represents the results of word segmentation, and a maximum entropy model calculates the probability that a seqlet is tagged as a given secondary structure type. The method is Markovian in the seqlets. This permits efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are taken as the protein secondary structure prediction. The method was applied separately to proteins from four organisms and compared with the PHD method. It outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST gives better predictions. The method was also tested on the 50 CASP5 target proteins, reaching a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
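The segmentation-and-tagging formulation can be sketched as a Viterbi search over a word lattice. This minimal sketch assumes:

- A hypothetical lexicon maps each seqlet to per-tag log-probabilities, standing in for the paper's maximum entropy model.
- Hypothetical first-order tag-transition log-probabilities link consecutive seqlets.

It returns only the single best segmentation and tagging, not the full posterior distribution.

```python
import math


def segment_and_tag(seq, lexicon, trans, max_len=8):
    """Best segmentation of seq into seqlets with secondary structure tags.

    lexicon: {seqlet: {tag: log_prob}} (hypothetical emission scores).
    trans:   {(prev_tag, tag): log_prob}, with prev_tag None at the start.
    Returns a list of (seqlet, tag) pairs, or None if no path covers seq.
    """
    n = len(seq)
    # best[i][tag] = (score, (start, prev_tag)) for the prefix seq[:i]
    best = [dict() for _ in range(n + 1)]
    best[0][None] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = seq[start:end]
            if word not in lexicon or not best[start]:
                continue  # no lattice edge from this start position
            for tag, emit in lexicon[word].items():
                for prev, (score, _) in best[start].items():
                    total = score + trans.get((prev, tag), -math.inf) + emit
                    if total > best[end].get(tag, (-math.inf, None))[0]:
                        best[end][tag] = (total, (start, prev))
    if not best[n]:
        return None
    tag = max(best[n], key=lambda t: best[n][t][0])
    path, end = [], n
    while end > 0:  # follow back-pointers to recover the path
        _, (start, prev) = best[end][tag]
        path.append((seq[start:end], tag))
        end, tag = start, prev
    return path[::-1]
```

Keeping one best score per (position, last tag) is what makes the search exact under a first-order Markov assumption. Summing instead of maximizing over the same lattice would give the forward probabilities needed for posteriors.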
16.
DONG Qiwen, WANG Xiaolong, LIN Lei & GUAN Yi (School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China) 《Science China: Life Sciences (English edition)》2005,48(4):394-405
1 Introduction. The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences explodes as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…
17.
Jen Tsi Yang 《Journal of Protein Chemistry》1996,15(2):185-191
The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$. Here $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn or unordered structure). The average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state. It therefore becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ running from 1 to $n$. For ease of operation, we then used $\ln (P_k)_{av}$ to convert the multiplicative process into a summation, i.e., $\ln (P_k)_{av} = \frac{1}{n}\sum_{i=1}^{n} \ln P_{i,k}$. This is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases. This is mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about half of the 20 amino acids. When stronger conformation formers and breakers are present, the difference becomes larger, and the predictions at the N- and C-termini of an α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
18.
Equal G and C contents in histone genes indicate selection pressures on mRNA secondary structure (times cited: 1; self-citations: 0; by others: 1)
Martijn A. Huynen Danielle A. M. Konings Pauline Hogeweg 《Journal of molecular evolution》1992,34(4):280-291
Protein-specific versus taxon-specific patterns of nucleotide frequencies were studied in histone genes. The third positions of codons have a (well-known) taxon-specific G+C level and a histone type-specific G/C ratio. This ratio counterbalances the G/C ratio in the first and second positions so that the overall G and C levels in the coding region become approximately equal. The compensation of the G/C ratio indicates a selection pressure at the mRNA level rather than a selection pressure or mutation bias at the DNA level or a selection pressure on codon usage. The structure of histone mRNAs is compatible with the hypothesis that the G/C compensation is due to selection pressures on mRNA secondary structure. Nevertheless, no specific motifs seem to have been selected, and the free energy of the secondary structures is only slightly lower than that expected on the basis of nucleotide frequencies.
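The compensation described above can be checked by tallying G and C separately at each codon position. This minimal sketch assumes an in-frame coding sequence and ignores any trailing partial codon. The toy sequence is illustrative, not a histone gene.

```python
def gc_by_codon_position(cds):
    """Count G and C at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    counts = {pos: {"G": 0, "C": 0} for pos in (1, 2, 3)}
    for i, base in enumerate(cds[:usable]):
        if base in ("G", "C"):
            counts[i % 3 + 1][base] += 1
    return counts


def g_minus_c(counts, positions=(1, 2, 3)):
    """Excess of G over C summed over the chosen codon positions."""
    return sum(counts[p]["G"] - counts[p]["C"] for p in positions)


counts = gc_by_codon_position("GCCGGC")
print(g_minus_c(counts, (1, 2)), g_minus_c(counts, (3,)), g_minus_c(counts))
# prints: 2 -2 0
```

In this toy case the C excess at third positions exactly cancels the G excess at first and second positions, the pattern the study reports for histone genes.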
19.
20.
A possible contribution of mRNA secondary structure to translation initiation efficiency in Lactococcus lactis (times cited: 3; self-citations: 0; by others: 3)
Maarten van de Guchte Ted van der Lende Jan Kok Gerard Venema 《FEMS microbiology letters》1991,81(2):201-208
Gene expression signals derived from Lactococcus lactis were linked to lacZ-fused genes with different 5'-nucleotide sequences. Computer predictions of mRNA secondary structure were combined with lacZ expression studies to direct base substitutions that could influence gene expression. Mutations were designed so that the DNA sequence upstream of the ATG start codon was not changed. All substitutions lay within the first six codons. Care was also taken that they neither affected the amino acid sequence of the gene product nor introduced codons rarely used in L. lactis. The results suggest that mRNA secondary structure contributes to the efficiency of translation initiation in L. lactis.