Found 20 similar articles (search time: 0 ms)
1.
MOTIVATION: A new representation for protein secondary structure prediction based on frequent amino acid patterns is described and evaluated. We discuss in detail how to identify frequent patterns in a protein sequence database using a level-wise search technique, how to define a set of features from those patterns and how to use those features in the prediction of the secondary structure of a protein sequence using support vector machines (SVMs). RESULTS: Three different sets of features based on frequent patterns are evaluated in a blind testing setup using 150 targets from the EVA contest and compared to predictions of PSI-PRED, PHD and PROFsec. Despite being trained on only 940 proteins, a simple SVM classifier based on this new representation yields results comparable to PSI-PRED and PROFsec. Finally, we show that the method contributes significant information to consensus predictions. AVAILABILITY: The method is available from the authors upon request.
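The level-wise search mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a pattern is a contiguous run of residues and that support is the number of sequences containing it; the abstract does not specify either, and the names `support`, `frequent_patterns`, `min_support` and `max_len` are hypothetical. The SVM feature construction is not covered.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def support(pattern, sequences):
    """Number of sequences containing `pattern` as a contiguous substring."""
    return sum(1 for s in sequences if pattern in s)


def frequent_patterns(sequences, min_support, max_len=5):
    """Level-wise (Apriori-style) search; assumes contiguous patterns only."""
    # Level 1: single residues that meet the support threshold.
    level = {}
    for a in ALPHABET:
        n = support(a, sequences)
        if n >= min_support:
            level[a] = n
    residues = sorted(level)

    frequent = {}
    length = 1
    while level and length <= max_len:
        frequent.update(level)
        # Extend each frequent pattern by one residue. A candidate can only be
        # frequent if its suffix (same length as the current level) is frequent.
        candidates = [p + a for p in level for a in residues if (p + a)[1:] in level]
        level = {}
        for c in candidates:
            n = support(c, sequences)
            if n >= min_support:
                level[c] = n
        length += 1
    return frequent
```

For example, `frequent_patterns(["ACDA", "ACDE", "CDEA"], min_support=2)` keeps `ACD` and `CDE` but drops `ACDE`, which occurs in only one sequence.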
2.
3.
We propose a novel representation of RNA secondary structure for a quick comparison of different structures. Secondary structure was viewed as a set of stems and each stem was represented by two values according to its position. Using this representation, we improved both the results of the comparative sequence analysis method and the minimum free-energy model. In the comparative sequence analysis method, a novel algorithm independent of multiple sequence alignment was developed to improve performance. When dealing with a single RNA sequence, the minimum free-energy model is improved by combining it with RNA class information. Secondary structure prediction experiments were done on tRNA and RNAse P RNA; sensitivity and specificity were both improved. Furthermore, software programs were developed for non-commercial use.
4.
Dandjinou AT Lévesque N Larose S Lucier JF Abou Elela S Wellinger RJ 《Current biology : CB》2004,14(13):1148-1158
BACKGROUND: Telomerase is a ribonucleoprotein complex whose RNA moiety dictates the addition of specific simple sequences onto chromosome ends. While relevant for certain human genetic diseases, the contribution of the essential telomerase RNA to RNP assembly still remains unclear. Phylogenetic analyses of vertebrate and ciliate telomerase RNAs revealed conserved elements that potentially organize protein subunits for RNP function. In contrast, the yeast telomerase RNA could not be fitted to any known structural model, and the limited number of known sequences from Saccharomyces species did not permit the prediction of a yeast-specific conserved structure. RESULTS: We cloned and analyzed the complete telomerase RNA loci (TLC1) from all known Saccharomyces species belonging to the "sensu stricto" group. Complementation analyses in S. cerevisiae and end mappings of mature RNAs ensured the relevance of the cloned sequences. By using phylogenetic comparative analysis coupled with in vitro enzymatic probing, we derived a secondary structure prediction of the Saccharomyces cerevisiae TLC1 RNA. This conserved secondary structure prediction includes a central domain that is likely to orchestrate DNA synthesis and at least two accessory domains important for RNA stability and telomerase recruitment. The structure also reveals a potential tertiary interaction between two loops in the central core. CONCLUSIONS: The predicted secondary structure of the TLC1 RNA of S. cerevisiae reveals a distinct folding pattern featuring well-separated but conserved functional elements. The predicted structure now allows for a detailed and rationally designed study of the structure-function relationships within the telomerase RNP complex in a genetically tractable system.
5.
In this paper, we propose a 3-D graphical representation of RNA secondary structures. Based on this representation, we outline an approach that constructs a 3-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with an RNA secondary structure. An examination of the similarities/dissimilarities among the secondary structures at the 3'-terminus of different viruses illustrates the utility of the approach.
6.
A permutation-based algorithm is introduced for the representation of closed RNA secondary structures. It is an efficient 'loopless' algorithm, which generates the permutations on base pairs of 'k-noncrossing' set partitions. The proposed algorithm reduces the computational complexity of known similar techniques to O(n), using minimal change ordering and transposition of non-adjacent elements.
7.
Background
It has become increasingly apparent that a comprehensive database of RNA motifs is essential in order to achieve new goals in genomic and proteomic research. Secondary RNA structures have frequently been represented by various modeling methods as graph-theoretic trees. Using graph theory as a modeling tool allows the vast resources of graphical invariants to be utilized to numerically identify secondary RNA motifs. The domination number of a graph is a graphical invariant that is sensitive to even a slight change in the structure of a tree. The invariants selected in this study are variations of the domination number of a graph. These graphical invariants are partitioned into two classes, and we define two parameters based on each of these classes. These parameters are calculated for all small order trees and a statistical analysis of the resulting data is conducted to determine if the values of these parameters can be utilized to identify which trees of orders seven and eight are RNA-like in structure.
8.
MOTIVATION: RNAs play an important role in many biological processes and knowing their structure is important in understanding their function. Due to difficulties in the experimental determination of RNA secondary structure, methods of theoretical prediction for known sequences are often used. Although many different algorithms for such predictions have been developed, this problem has not yet been solved. It is thus necessary to develop new methods for predicting RNA secondary structure. The most-used at present is Zuker's algorithm, which can be used to determine the minimum free energy secondary structure. However, many RNA secondary structures verified by experiments are not consistent with the minimum free energy secondary structures. In order to solve this problem, a method used to search a group of secondary structures whose free energy is close to the global minimum free energy was developed by Zuker in 1989. When considering a group of secondary structures, if there is no experimental data, we cannot tell which one is better than the others. This case also occurs in combinatorial and heuristic methods. These two kinds of methods have several weaknesses. Here we show how the central limit theorem can be used to solve these problems. RESULTS: An algorithm for predicting RNA secondary structure based on helical regions distribution is presented, which can be used to find the most probable secondary structure for a given RNA sequence. It consists of three steps. First, list all possible helical regions. Second, according to the central limit theorem, estimate the occurrence probability of every helical region based on the Monte Carlo simulation. Third, add the helical region with the biggest probability to the current structure and eliminate the helical regions incompatible with the current structure. The above processes can be repeated until no more helical regions can be added. Take the current structure as the final RNA secondary structure. In order to demonstrate the confidence of the program, a test on three RNA sequences (tRNAPhe, Pre-tRNATyr, and Tetrahymena ribosomal RNA intervening sequence) is performed. AVAILABILITY: The program is written in Turbo Pascal 7.0. The source code is available upon request. CONTACT: Wujj@nic.bmi.ac.cn or Liwj@mail.bmi.ac.cn
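The list-then-select loop described in this abstract (enumerate helical regions, repeatedly add the best one, discard incompatible ones) can be sketched as follows. This is only an illustration under stated assumptions: the Monte Carlo probability estimate is not reproduced, and helix length is used as a stand-in score; helices that share a base or cross each other are treated as incompatible, which the abstract does not spell out. All function names and the `min_len` and `min_loop` defaults are hypothetical.

```python
# Canonical and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def list_helices(seq, min_len=3, min_loop=3):
    """Step 1: enumerate maximal helices as (i, j, k): pairs (i+d, j-d), d < k."""
    n = len(seq)
    helices = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip inner pairs of a longer helix; start only at the outermost pair.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            k = 0
            while (j - k) - (i + k) > min_loop and (seq[i + k], seq[j - k]) in PAIRS:
                k += 1
            if k >= min_len:
                helices.append((i, j, k))
    return helices


def pairs_of(helix):
    i, j, k = helix
    return [(i + d, j - d) for d in range(k)]


def compatible(h1, h2):
    """Assumed rule: helices must not share a base and must not cross."""
    for a, b in pairs_of(h1):
        for c, d in pairs_of(h2):
            if len({a, b, c, d}) < 4:
                return False
            if a < c < b < d or c < a < d < b:
                return False
    return True


def greedy_fold(seq, score=lambda h: h[2]):
    """Step 3: add the best-scoring helix, drop incompatible ones, repeat.

    `score` stands in for the occurrence probability of step 2 (an assumption).
    """
    remaining = list_helices(seq)
    chosen = []
    while remaining:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining = [h for h in remaining if compatible(best, h)]
    return chosen


def to_dot_bracket(seq, helices):
    out = ["."] * len(seq)
    for h in helices:
        for a, b in pairs_of(h):
            out[a], out[b] = "(", ")"
    return "".join(out)
```

A real probability estimate can be plugged in through the `score` argument without changing the selection loop.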
9.
This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures is comprised of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism also can be applied to DNA.
10.
On a six-dimensional representation of RNA secondary structures Total citations: 2, self-citations: 0, citations by others: 2
In this paper, we propose a 6-D representation of RNA secondary structures. The use of the 6-D representation is illustrated by constructing structure invariants. Comparisons with the similarity/dissimilarity results based on the 6-D representation for a set of RNA secondary structures are considered to illustrate the use of our structure invariants, which are based on the entries in derived sequence matrices restricted to a selected width of a band along the main diagonal.
11.
We describe a computational method for the prediction of RNA secondary structure that uses a combination of free energy and comparative sequence analysis strategies. Using a homology-based sequence alignment as a starting point, all favorable pairings with respect to the Turner energy function are identified. Each potentially paired region within a multiple sequence alignment is scored using a function that combines both predicted free energy and sequence covariation with optimized weightings. High scoring regions are ranked and sequentially incorporated to define a growing secondary structure. Using a single set of optimized parameters, it is possible to accurately predict the foldings of several test RNAs defined previously by extensive phylogenetic and experimental data (including tRNA, 5 S rRNA, SRP RNA, tmRNA, and 16 S rRNA). The algorithm correctly predicts approximately 80% of the secondary structure. A range of parameters have been tested to define the minimal sequence information content required to accurately predict secondary structure and to assess the importance of individual terms in the prediction scheme. This analysis indicates that prediction accuracy most strongly depends upon covariational information and only weakly on the energetic terms. However, relatively few sequences prove sufficient to provide the covariational information required for an accurate prediction. Secondary structures can be accurately defined by alignments with as few as five sequences and predictions improve only moderately with the inclusion of additional sequences.
12.
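Research on algorithms for predicting RNA secondary structure has developed over nearly 40 years, and pseudoknots have been studied for nearly 30 years. Over this period, RNA secondary structure prediction algorithms have made great progress, but the accuracy of pseudoknot prediction remains low. Heuristic algorithms handle complex pseudoknots comparatively well, which makes them the class of algorithms most likely to be the first to solve the pseudoknot prediction problem. To date, no report has systematically and specifically summarized the various heuristic algorithms for pseudoknot prediction together with their strengths and weaknesses. This paper describes in detail five algorithms that have been popular internationally in recent years: the greedy algorithm, the genetic algorithm, the ILM algorithm, the HotKnots algorithm, and the FlexStem algorithm. It summarizes and analyzes the strengths and shortcomings of each, and finally proposes that, in the near future, efforts to improve pseudoknot prediction accuracy with heuristic algorithms should start from building more complete pseudoknot models, incorporating more influencing factors, and drawing on the strengths of different algorithms. The review is intended as a reference for research on predicting RNA secondary structures that contain pseudoknots.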
13.
The function of many RNAs depends crucially on their structure. Therefore, the design of RNA molecules with specific structural properties has many potential applications, e.g. in the context of investigating the function of biological RNAs, of creating new ribozymes, or of designing artificial RNA nanostructures. Here, we present a new algorithm for solving the following RNA secondary structure design problem: given a secondary structure, find an RNA sequence (if any) that is predicted to fold to that structure. Unlike the (pseudoknot-free) secondary structure prediction problem, this problem appears to be hard computationally. Our new algorithm, "RNA Secondary Structure Designer (RNA-SSD)", is based on stochastic local search, a prominent general approach for solving hard combinatorial problems. A thorough empirical evaluation on computationally predicted structures of biological sequences and artificially generated RNA structures as well as on empirically modelled structures from the biological literature shows that RNA-SSD substantially outperforms the best known algorithm for this problem, RNAinverse from the Vienna RNA Package. In particular, the new algorithm is able to solve structures, consistently, for which RNAinverse is unable to find solutions. The RNA-SSD software is publicly available under the name of RNA Designer at the RNASoft website (www.rnasoft.ca).
14.
A conserved secondary structure for telomerase RNA. Total citations: 41, self-citations: 0, citations by others: 41
The RNA moiety of the ribonucleoprotein enzyme telomerase contains the template for telomeric DNA synthesis. We present a secondary structure model for telomerase RNA, derived by a phylogenetic comparative analysis of telomerase RNAs from seven tetrahymenine ciliates. The telomerase RNA genes from Tetrahymena malaccensis, T. pyriformis, T. hyperangularis, T. pigmentosa, T. hegewishii, and Glaucoma chattoni were cloned, sequenced, and compared with the previously cloned RNA gene from T. thermophila and with each other. To define secondary structures of these RNAs, homologous complementary sequences were identified by the occurrence of covariation among putative base pairs. Although their primary sequences have diverged rapidly overall, a strikingly conserved secondary structure was identified for all these telomerase RNAs. Short regions of nucleotide conservation include a block of 22 totally conserved nucleotides that contains the telomeric templating region.
15.
Does a protein's secondary structure determine its three-dimensional fold? This question is tested directly by analyzing proteins of known structure and constructing a taxonomy based solely on secondary structure. The taxonomy is generated automatically, and it takes the form of a tree in which proteins with similar secondary structure occupy neighboring leaves. Our tree is largely in agreement with results from the structural classification of proteins (SCOP), a multidimensional classification based on homologous sequences, full three-dimensional structure, information about chemistry and evolution, and human judgment. Our findings suggest a simple mechanism of protein evolution.
16.
Hofacker IL 《Nucleic acids research》2003,31(13):3429-3431
The Vienna RNA secondary structure server provides a web interface to the most frequently used functions of the Vienna RNA software package for the analysis of RNA secondary structures. It currently offers prediction of secondary structure from a single sequence, prediction of the consensus secondary structure for a set of aligned sequences and the design of sequences that will fold into a predefined structure. All three services can be accessed via the Vienna RNA web server at http://rna.tbi.univie.ac.at/.
17.
An RNA secondary structure workbench Total citations: 2, self-citations: 4, citations by others: 2
H M Martinez 《Nucleic acids research》1988,16(5):1789-1798
A multiple approach to the study of RNA secondary structure is described which provides for the independent drawing of structures using base-pairing lists, for the generation of local structures in the form of hairpins, and for the generation of global structures by both Monte Carlo and dynamic programming methodologies. User-adjustable parameters provide for limiting the size of hairpin loops, bulges and inner loops, and constraints can be imposed relative to position-dependent base pairing.
18.
Jin Li Hong Liang Wang Cong Ying Wang 《Computer methods in biomechanics and biomedical engineering》2017,20(12):1261-1272
Owing to their structural diversity, RNAs perform many diverse biological functions in the cell. RNA secondary structure is thus important for predicting RNA function. Here, we propose a new combinatorial optimization algorithm, named RGRNA, to improve the accuracy of predicting RNA secondary structure. Following the establishment of a stempool, the stems are sorted by length, and chosen from largest to smallest. If the stem selected is the true stem, the secondary structure of this stem when combined with another stem selected at random will have low free energy, and the free energy will tend to gradually diminish. The free energy is considered as a parameter and the structure is converted into binary numbers to determine stem compatibility, for step-by-step prediction of the secondary structure for all combinations of stems. The RNA secondary structure can be predicted by the RGRNA method. Our experimental results show that the proposed algorithm outperforms RNAfold in terms of sensitivity, specificity, and Matthews correlation coefficient value.
19.
Measuring the (dis)similarity between RNA secondary structures is critical for the study of RNA secondary structures and has implications to RNA functional characterization. Although a number of methods have been developed for comparing RNA structural similarities, their applications have been limited by the complexity of the required computation. In this paper, we present a novel method for comparing the similarity of RNA secondary structures generated from the same RNA sequence, i.e., a secondary structure ensemble, using a matrix representation of the RNA structures. Relevant features of the RNA secondary structures can be easily extracted through singular value decomposition (SVD) of the representing matrices. We have mapped the feature vectors of the singular values to a kernel space, where (dis)similarities among the mapped feature vectors become more evident, making clustering of RNA secondary structures easier to handle. The pair-wise comparison of RNA structures is achieved through computing the distance between the singular value vectors in the kernel space. We have applied a fuzzy kernel clustering method, using this similarity metric, to cluster the RNA secondary structure ensembles. Our application results suggest that our fuzzy kernel clustering method is highly promising for classifications of RNA structure ensembles, because of its low computational complexity and high clustering accuracy.
20.
The total number of RNA secondary structures of a given length with minimal hairpin loop length m (m > 0) and with minimal stack length l (l > 0) is computed, under the assumption that all base pairs can occur. Asymptotics are derived from the determination of recurrence relations of decomposition properties.
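The kind of recurrence this abstract refers to can be illustrated for the simplest case. The sketch below counts secondary structures of length n with minimal hairpin loop length m and covers only minimal stack length l = 1; the general-l recurrence from the paper is not reproduced. It uses the standard decomposition on the last base, which is either unpaired or paired with a base j that leaves at least m bases in between. The function name `count_structures` is hypothetical.

```python
def count_structures(n, m=1):
    """Count secondary structures on n bases, every base pair allowed.

    Assumes minimal stack length l = 1 (isolated pairs permitted).
    S(k) = S(k-1) + sum over j = 1 .. k-m-1 of S(j-1) * S(k-j-1)
    """
    S = [1] * (n + 1)  # S[0] = 1: the empty structure
    for k in range(1, n + 1):
        total = S[k - 1]  # base k is unpaired
        for j in range(1, k - m):  # base k pairs with base j, enclosing >= m bases
            total += S[j - 1] * S[k - j - 1]
        S[k] = total
    return S[n]
```

With m = 1 the counts for n = 0 to 7 are 1, 1, 1, 2, 4, 8, 17, 37.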