Similar Documents
1.
We present a machine learning method (a hierarchical network of k-nearest neighbor classifiers) that uses an RNA sequence alignment in order to predict a consensus RNA secondary structure. The input to the network is the mutual information, the fraction of complementary nucleotides, and a novel consensus RNAfold secondary structure prediction of a pair of alignment columns and its nearest neighbors. Given this input, the network computes a prediction as to whether a particular pair of alignment columns corresponds to a base pair. By using a comprehensive test set of 49 RFAM alignments, the program KNetFold achieves an average Matthews correlation coefficient of 0.81. This is a significant improvement compared with the secondary structure prediction methods PFOLD and RNAalifold. By using the example of archaeal RNase P, we show that the program can also predict pseudoknot interactions.
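Two of the column-pair features named above, mutual information and the fraction of complementary nucleotides, can be sketched in Python. This is a minimal illustration, not KNetFold's code: it assumes that rows containing a gap ('-') are skipped, that "complementary" means Watson-Crick plus G-U wobble pairs, and it reports mutual information in bits.

```python
import math
from collections import Counter

# Assumption: "complementary" = Watson-Crick pairs plus G-U wobble pairs.
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _ungapped_rows(col_i, col_j):
    """Pair up residues of two alignment columns, skipping rows with a gap ('-')."""
    return [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns."""
    rows = _ungapped_rows(col_i, col_j)
    n = len(rows)
    if n == 0:
        return 0.0
    joint = Counter(rows)
    left = Counter(a for a, _ in rows)
    right = Counter(b for _, b in rows)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def fraction_complementary(col_i, col_j):
    """Fraction of ungapped rows in which the two columns can base-pair."""
    rows = _ungapped_rows(col_i, col_j)
    if not rows:
        return 0.0
    return sum((a, b) in COMPLEMENTARY for a, b in rows) / len(rows)


# Each column is a string with one character per aligned sequence.
print(mutual_information("AGCU", "UCGA"))      # 2.0 (perfect covariation)
print(fraction_complementary("AGCU", "UCGA"))  # 1.0
```

A perfectly conserved pair of columns (all A opposite all U) has a mutual information of 0 but a complementary fraction of 1.0, which illustrates why the two features can carry different information.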

2.
Prediction of protein-RNA interactions at the atomic level of detail is crucial for our ability to understand and interfere with processes such as gene expression and regulation. Here, we investigate protein binding pockets that accommodate extruded nucleotides not involved in RNA base pairing. We observed that most of the protein-interacting nucleotides are part of a consecutive fragment of at least two nucleotides whose rings have significant interactions with the protein. Many of these share the same protein binding cavity and more than 30% of such pairs are π-stacked. Since these local geometries cannot be inferred from the nucleotide identities, we present a novel framework for their prediction from the properties of protein binding sites. First, we present a classification of known RNA nucleotide and dinucleotide protein binding sites and identify the common types of shared 3-D physicochemical binding patterns. These are recognized by a new classification methodology that is based on spatial multiple alignment. The shared patterns reveal novel similarities between dinucleotide binding sites of proteins with different overall sequences, folds and functions. Given a protein structure, we use these patterns for the prediction of its RNA dinucleotide binding sites. Based on the binding modes of these nucleotides, we further predict an RNA fragment that interacts with those protein binding sites. With these knowledge-based predictions, we construct an RNA fragment that can have a previously unknown sequence and structure. In addition, we provide a drug design application in which the database of all known small-molecule binding sites is searched for regions similar to nucleotide and dinucleotide binding patterns, suggesting new fragments and scaffolds that can target them.

3.
The prediction of RNA structure is useful for understanding evolution in both in silico and in vitro studies. Experimental methods such as NMR spectroscopy for determining RNA secondary structure are expensive and difficult, whereas computational prediction is easier. Comparative sequence analysis provides the best solution, but predicting the secondary structure of a single RNA sequence remains challenging. RNA-SSPT is a tool that computationally predicts the secondary structure of a single RNA sequence. Most RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. RNA-SSPT implements the Nussinov dynamic programming algorithm. The current study shows that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs lowering the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW, written in C, is used and was ported to C# to meet the tool's requirements. RNA-SSPT is built in C# using .NET 2.0 in Microsoft Visual Studio 2005 Professional Edition. The accuracy of RNA-SSPT was tested in terms of sensitivity and positive predictive value. The tool serves both secondary structure prediction and secondary structure visualization purposes.
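For reference, the classic Nussinov recursion can be sketched as below. This is the textbook base-pair-maximization form, assuming a minimum hairpin loop of three unpaired bases and allowing G-U wobble pairs; it is not RNA-SSPT's code and does not include the free-energy modification described above.

```python
# Assumption: canonical pairs plus G-U wobble are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (dot-bracket structure, number of pairs) maximizing base pairs."""
    n = len(seq)
    if n == 0:
        return "", 0
    # dp[i][j] = maximum number of nested pairs in seq[i..j]; short spans stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: recover one optimal set of pairs.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break

    structure = ["."] * n
    for i, j in pairs:
        structure[i], structure[j] = "(", ")"
    return "".join(structure), dp[0][n - 1]


print(nussinov("GGGAAAUCC"))  # ('(((...)))', 3)
```

Pair maximization ignores stacking and loop energies, which is why energy-based variants (such as the modification mentioned in the abstract) are generally preferred for realistic predictions.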

4.
5.

Background  

The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties.

6.
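Small RNA deep sequencing analysis of a hollyhock isolate of Watermelon mosaic virus   Total citations: 1; self-citations: 0; citations by others: 1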
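Viral diseases of hollyhock (Althaea rosea) seriously affect its growth, so identifying the viruses involved and their variation and evolution is important for controlling these diseases. Small RNA deep sequencing was used to analyze hollyhock leaves showing pronounced vein-clearing and mosaic symptoms. The diseased plants were found to be infected by Watermelon mosaic virus (WMV), Malva vein clearing virus (MVCV), and a novel RNA virus, tentatively named Althaea rosea virus 1 (ArV1). To further clarify the evolutionary relationships of the hollyhock WMV isolate (WMV-Tg), its complete genome was amplified, yielding a full-length sequence of 10,046 nucleotides (nt). Sequence analysis showed that the WMV-Tg genome shares 83.3%–90.2% nucleotide sequence identity with previously reported WMV isolates. Phylogenetic analysis showed that WMV-Tg clusters with WMV-Pg, its closest relative. Analysis of the length distribution, 5′-terminal nucleotide preference, polarity distribution, and hotspot distribution of small RNAs derived from WMV-Tg in hollyhock (WMV-derived small interfering RNAs, WMV-vsiRNAs) deepens our understanding of WMV-vsiRNAs and lays a theoretical foundation for further research on the function of virus-derived small interfering RNAs (vsiRNAs) in antiviral defense and for the control of hollyhock viral diseases.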
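Three of the vsiRNA statistics mentioned above (length distribution, 5′-terminal nucleotide preference, and polarity) reduce to simple counts once reads have been mapped to the viral genome. The sketch below assumes a hypothetical input of (sequence, strand) tuples, where '+' marks reads matching the genomic (sense) strand and '-' marks the complementary strand; read mapping and hotspot analysis are out of scope, and the toy reads are not data from the study.

```python
from collections import Counter


def vsirna_summary(reads):
    """Summarize mapped small RNA reads given as (sequence, strand) tuples.

    Returns the read-length distribution, the counts of each 5'-terminal
    nucleotide, and the fraction of reads on the sense ('+') strand.
    """
    lengths = Counter(len(seq) for seq, _ in reads)
    first_nt = Counter(seq[0] for seq, _ in reads if seq)
    sense = sum(1 for _, strand in reads if strand == "+")
    sense_fraction = sense / len(reads) if reads else 0.0
    return lengths, first_nt, sense_fraction


# Hypothetical toy reads, not data from the study.
toy = [("A" * 21, "+"), ("U" * 22, "-"), ("U" * 21, "+"), ("G" * 24, "-")]
lengths, first_nt, sense_fraction = vsirna_summary(toy)
print(sorted(lengths.items()))        # [(21, 2), (22, 1), (24, 1)]
print(first_nt["U"], sense_fraction)  # 2 0.5
```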

7.
Genome‐wide association studies (GWAS) and whole‐exome sequencing (WES) generate massive amounts of genomic variant information, and a major challenge is to identify which variations drive disease or contribute to phenotypic traits. Because the majority of known disease‐causing mutations are exonic non‐synonymous single nucleotide variations (nsSNVs), most studies focus on whether these nsSNVs affect protein function. Computational studies show that the impact of nsSNVs on protein function reflects sequence homology and structural information and predict the impact through statistical methods, machine learning techniques, or models of protein evolution. Here, we review impact prediction methods and discuss their underlying principles, their advantages and limitations, and how they compare to and complement one another. Finally, we present current applications and future directions for these methods in biological research and medical genetics.
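Whether an exonic SNV is non-synonymous can be decided from the standard genetic code alone. The sketch below classifies a single-base change within one codon; it assumes the codon is given on the coding strand in DNA letters and ignores reading-frame and splicing context, which real annotation pipelines must handle.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon, position, alt_base):
    """Classify a single-base substitution at `position` (0-2) of `codon`."""
    codon = codon.upper()
    alt_codon = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"    # non-synonymous: premature stop codon
    if ref_aa == "*":
        return "stop-loss"   # non-synonymous: stop codon lost
    return "missense"        # non-synonymous: amino acid change


print(classify_snv("GAA", 2, "G"))  # synonymous (GAA Glu -> GAG Glu)
print(classify_snv("GAA", 1, "T"))  # missense (GAA Glu -> GTA Val)
```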

8.
9.
Viruses are serious threats to human and animal health. Vaccines can prevent viral diseases, but few antiviral treatments are available to control evolving infections. Among new antiviral therapies, RNA interference (RNAi) has been the focus of intensive research. However, along with the development of efficient RNAi-based therapeutics comes the risk of emergence of resistant viruses. In this study, we tested in vitro the propensity of a morbillivirus (peste des petits ruminants virus), a stable RNA virus, to escape the inhibition conferred by single or multiple small interfering RNAs (siRNAs) targeting conserved regions of the N gene. Except when a combination of three different siRNAs was used, the virus systematically escaped RNAi after 3 to 20 consecutive passages. The genetic modifications involved consisted of single or multiple point nucleotide mutations and a deletion of a stretch of six nucleotides, illustrating that this virus has an unusual genomic malleability.

10.
Protein function prediction with high-throughput data   Total citations: 1; self-citations: 0; citations by others: 1
Zhao XM, Chen L, Aihara K. Amino acids. 2008;35(3):517-530.

11.
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
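The Position Weight Matrix idea, together with the point that a variant's effect is better expressed as a change in splice-signal score than as a binary site call, can be illustrated with a log-odds PWM. The training sites below are hypothetical donor-like 9-mers, and the pseudocount and uniform 0.25 background are assumptions; this is not the PWM or MaxEntScan implementation evaluated in the study.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in ALPHABET})
    return pwm


def pwm_score(pwm, site):
    """Sum of per-position log-odds scores for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


def variant_delta(pwm, ref_site, alt_site):
    """Change in splice-signal score caused by a variant (alt minus ref)."""
    return pwm_score(pwm, alt_site) - pwm_score(pwm, ref_site)


# Hypothetical training sites (3 exonic + 6 intronic bases), not real data.
training = ["CAGGTAAGT", "AAGGTAAGT", "CAGGTGAGT", "AAGGTAAGA"]
pwm = build_pwm(training)
ref, alt = "CAGGTAAGT", "CAGATAAGT"  # G>A at the first intronic base
print(round(variant_delta(pwm, ref, alt), 2))  # -2.32
```

A negative delta indicates that the variant weakens the site relative to the training consensus; scores like this are the kind of per-tool output that the ensemble models combine.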

12.
Heterogeneity and evolution rates of delta virus RNA sequences.   Total citations: 12; self-citations: 3; citations by others: 9
F Imazeki, M Omata, M Ohto. Journal of virology. 1990;64(11):5594-5599.
To investigate the geographical divergence of delta virus RNA sequences, 868 nucleotides (nt), including the delta antigen-coding region, were determined in isolates from two Japanese patients, M and S, by polymerase chain reaction and direct sequencing, and compared with three previously reported nucleotide sequences. The hepatitis delta virus RNA sequence from patient M was approximately 92% identical to sequences previously obtained for two other strains of hepatitis delta virus, whereas the sequence from patient S was approximately 81% identical to the previously sequenced strains. This suggests that the delta agent in Japan has a heterogeneous origin and that the delta virus RNA sequence from Japanese patient S is the most divergent delta virus isolate yet analyzed. To study the evolution rate of delta virus RNA, viral isolates obtained 3 and 4 years apart from each of the two patients were also sequenced. For the delta antigen gene, the substitution rate of viral RNA was estimated at 0.57 × 10⁻³ nucleotide substitutions per site per year in patient M and 0.64 × 10⁻³ substitutions per site per year in patient S.
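In its simplest form, an evolution-rate estimate is the proportion of differing sites divided by the time between samples. The sketch assumes two already aligned sequences from the same patient, skips gapped columns, and applies no correction for multiple substitutions; the paper may have used a different estimator, and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def substitution_rate(seq_a, seq_b, years):
    """Substitutions per site per year between isolates sampled `years` apart."""
    return p_distance(seq_a, seq_b) / years


# Hypothetical example: 2 differences over 1,000 sites, sampled 4 years apart.
early = "A" * 1000
late = "A" * 998 + "GG"
print(substitution_rate(early, late, 4))  # 0.0005, i.e. 0.5 x 10^-3 per site per year
```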

13.
To predict alterations in single-strand DNA mobility in non-denaturing electrophoretic gels, Zuker's RNA folding program was modified. Energy files utilized by the LRNA RNA folding algorithm were modified to emulate folding of single-strand DNA. Energy files were modified to disallow G-T base pairing. Stacking energies were corrected for DNA thermodynamics. Constraints on loop nucleotide sequences were removed. The LRNA RNA folding algorithm using the DNA fold energy files was applied to predict folding of PCR generated single-strand DNA molecules from polymorphic human ALDH2 and TPH alleles. The DNA-Fold version 1.0 program was used to design primers to create and abolish SSCP mobility shifts. Primers were made that add a 5' tag sequence or alter complementarity to an internal sequence. Differences in DNA secondary structure were assessed by SSCP analysis and compared to single-strand DNA secondary structure predictions. Results demonstrate that alterations in single-strand DNA conformation may be predicted using DNA-Fold 1.0.
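The pairing-rule change described above (RNA permits the G-U wobble, whereas the DNA energy files disallow G-T pairs) can be expressed as two pairing sets. The `forms_stem` helper is a hypothetical illustration, not DNA-Fold code: it only checks whether two segments could form an antiparallel stem under each rule set and does not model stacking energies.

```python
WATSON_CRICK_RNA = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def allowed_pairs(molecule):
    """Base pairs permitted for 'RNA' (with G-U wobble) or 'DNA' (no G-T)."""
    if molecule == "RNA":
        return WATSON_CRICK_RNA | {("G", "U"), ("U", "G")}
    if molecule == "DNA":
        # Same Watson-Crick pairs written with T; the G-T wobble is excluded.
        return {(a.replace("U", "T"), b.replace("U", "T")) for a, b in WATSON_CRICK_RNA}
    raise ValueError("molecule must be 'RNA' or 'DNA'")


def forms_stem(seq, i, j, length, molecule):
    """True if seq[i:i+length] can pair antiparallel with seq[j-length+1:j+1]."""
    pairs = allowed_pairs(molecule)
    return all((seq[i + k], seq[j - k]) in pairs for k in range(length))


print(forms_stem("GGAAAAUC", 0, 7, 2, "RNA"))  # True: G-C plus a G-U wobble
print(forms_stem("GGAAAATC", 0, 7, 2, "DNA"))  # False: G-T is not allowed
```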

14.
Nucleotide sequence of human influenza A/PR/8/34 segment 2.   Total citations: 9; self-citations: 2; citations by others: 7
The nucleotide sequence of RNA segment 2 of human influenza strain A/PR/8/34 has been determined. Segment 2 is 2341 nucleotides long and encodes a protein of 757 amino acids (molecular weight 86,500 Da) that is involved in RNA synthesis. Although segment 2 is identical in size to segment 1, which encodes a protein of related function, neither the nucleotide sequences of these two RNA segments nor the amino acid sequences of the encoded proteins appear to be homologous. The sequence of segment 2 completes the sequence of the virus (13,588 nucleotides in total).
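A quick consistency check on the reported lengths: an open reading frame encoding 757 amino acids spans 3 × (757 + 1) = 2274 nt including the stop codon, leaving 67 nt of the 2341-nt segment outside the ORF. Because influenza A genome segments are negative-sense RNA, an ORF search is run on the complementary, mRNA-sense sequence. The longest-ORF scan below is a generic sketch on a hypothetical toy sequence, not the method used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-to-stop ORF in the three forward frames of `seq`.

    Returns (protein_length_aa, start, end), where end is exclusive and includes
    the stop codon. U is treated as T, so mRNA-sense RNA can be passed directly.
    """
    seq = seq.upper().replace("U", "T")
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                protein_len = (i - start) // 3
                if protein_len > best[0]:
                    best = (protein_len, start, i + 3)
                start = None
    return best


print(longest_orf("CCAUGAAAUUUUAAGG"))  # (3, 2, 14): Met-Lys-Phe, then a UAA stop

# Arithmetic from the abstract: 757 codons plus one stop codon.
orf_nt = 3 * (757 + 1)
print(orf_nt, 2341 - orf_nt)  # 2274 67
```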

15.
The nucleotide sequence of T4 band D RNA, a stable RNA species encoded by bacteriophage T4, has been deduced from analysis of the 32P-labeled RNA and comparison with the DNA sequence of the T4 genome in the region encoding the RNA. The sequence is: pA-U-G-A-G-A-A-A-C-C-G-G-G-U-C-G-C-U-A-C-C-G-G-U-A-A-G-U-C-G-U-C-G-G-A-C-U-G-A-U-G-G-U-U-C-C-C-U-G-A-G-U-A-A-G-G-A-A-U-U-G-C-G-U-U-A-A-U-A-A-U-C-U-U-U-G-C-G-U-U-U-A-U-U-G-A-U-G-C-C-C-U-C-U-U-A-C-A-U-C-A-C-A-G-C-A-G-A-A-A-C-G-G-C-G-C-A-C-C-AOH. Band D RNA is 120 nucleotides long, and contains no modified nucleotides. The sequence can be arranged in a secondary structure consistent with the results of limited digestion with nuclease S1, but shows no striking similarities to tRNAs. While a biological function for band D RNA is unknown, similar molecules are encoded by bacteriophages T2 and T6, indicating that the molecule has been preserved during evolution. This retention may reflect a significant function for the RNA.
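The reported length can be checked directly against the sequence. The sketch below stores the band D sequence transcribed from the abstract in 10-nt blocks, shows how to strip the 5′-phosphate (p) and 3′-OH labels from the hyphenated notation, and computes nucleotide composition; the helper names are illustrative.

```python
from collections import Counter

# Band D RNA sequence from the abstract, written in 10-nt blocks (5' -> 3').
BAND_D = "".join([
    "AUGAGAAACC", "GGGUCGCUAC", "CGGUAAGUCG", "UCGGACUGAU", "GGUUCCCUGA",
    "GUAAGGAAUU", "GCGUUAAUAA", "UCUUUGCGUU", "UAUUGAUGCC", "CUCUUACAUC",
    "ACAGCAGAAA", "CGGCGCACCA",
])


def parse_hyphenated(notation):
    """Convert 'pA-U-G-...-AOH' notation into a plain 5'->3' sequence string."""
    s = "".join(notation.split())  # drop stray whitespace from line wrapping
    if s.startswith("p"):
        s = s[1:]   # 5'-phosphate marker
    if s.endswith("OH"):
        s = s[:-2]  # 3'-hydroxyl marker
    return s.replace("-", "")


def composition(seq):
    """Nucleotide counts and G+C fraction of a sequence."""
    counts = Counter(seq)
    gc_fraction = (counts["G"] + counts["C"]) / len(seq)
    return counts, gc_fraction


print(len(BAND_D))                          # 120
print(parse_hyphenated("pA-U-G-A -G-AOH"))  # AUGAGA
```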

16.
Primary and secondary structure of 7-3 (K) RNA of Novikoff hepatoma   Total citations: 5; self-citations: 0; citations by others: 5
7-3 RNA (also known as K-RNA and 7SK-RNA) is a distinct small RNA found in cells ranging from insects to mammals. Previous studies showed that this RNA is not capped, contains no modified nucleotides, is conserved through evolution, is synthesized by RNA polymerase III, and is partly associated with polyribosomes. In this study, the complete nucleotide sequence of 7-3 RNA was determined by RNA-sequencing methods, and the sequence was compared with several small RNAs and repetitive DNA sequences for homology. This 330-nucleotide RNA contained pppGp at its 5' terminus and exhibited heterogeneity with respect to the 3'-terminal AOH. The nucleotide sequence is: (sequence in text). The RNA is G-C rich, and evidence is presented that 7-3 RNA is in a ribonucleoprotein particle in the cytoplasm.

17.
18.
Single-point mutations are one of the most frequent causes of genetic variability in both humans and closely related species. The recent availability of different bioinformatics tools for annotating human single nucleotide polymorphisms (SNPs) has opened the possibility of using them to score SNPs from species of biomedical interest, in particular mice and other models of human disease. In addition, the ability to predict the pathogenicity of single point mutations in one species based on data from another opens the possibility of predicting the pathological character of single point mutations in humans using data from well-characterized model systems of human disease. This could provide a valuable alternative to more traditional population-genetics approaches. However, the transfer of prediction tools may be limited by several factors, from a species bias in the training set to a large sequence divergence between the proteomes of the training and target species. Here we study the conditions under which prediction tools can be transferred among species, focusing on the case of mice. We find that for the majority of human-mouse homolog pairs, the sequence similarity is generally large enough to preserve the pathological character of mutations across species. We then establish that prediction/annotation tools developed for one organism can be used to predict the neutral/pathological character of mutations/SNPs in the other organism.
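Transferring a prediction from a human protein to its mouse homolog involves two routine steps: measuring how similar the aligned pair is, and mapping a residue position from one sequence to the other. The sketch assumes a precomputed pairwise alignment with '-' for gaps; the function names and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def map_position(aln_a, aln_b, pos_a):
    """Map a 0-based residue index in sequence A to sequence B (None if B has a gap)."""
    idx_a = idx_b = -1
    for a, b in zip(aln_a, aln_b):
        if a != "-":
            idx_a += 1
        if b != "-":
            idx_b += 1
        if a != "-" and idx_a == pos_a:
            return idx_b if b != "-" else None
    return None


# Hypothetical toy alignment of a "human" and a "mouse" fragment.
human = "MKV-LDE"
mouse = "MRVQLDE"
print(round(percent_identity(human, mouse), 1))  # 83.3
print(map_position(human, mouse, 3))             # 4 (the L residue)
```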

19.
20.
Despite the biological importance of non-coding RNAs, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
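One way to combine coevolution scores with a Nussinov-style recursion is to maximize the total score of a nested set of pairs instead of the pair count. The sketch assumes that this is roughly what a "generalized Nussinov algorithm" means here. It takes hypothetical DCA scores as a sparse {(i, j): score} dictionary with i < j and considers only positive-scoring pairs. It is not the authors' implementation.

```python
def weighted_nussinov(n, score, min_loop=3):
    """Maximize the summed score of nested pairs (i, j) over positions 0..n-1.

    `score` maps (i, j) with i < j to a coupling score; missing or non-positive
    entries are never paired. Returns (best_total, sorted list of pairs).
    """
    if n == 0:
        return 0.0, []
    dp = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # j left unpaired
            for k in range(i, j - min_loop):
                w = score.get((k, j), 0.0)
                if w > 0:
                    left = dp[i][k - 1] if k > i else 0.0
                    best = max(best, left + dp[k + 1][j - 1] + w)
            dp[i][j] = best

    # Traceback recomputes the same sums, so exact float equality is safe here.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            w = score.get((k, j), 0.0)
            if w > 0:
                left = dp[i][k - 1] if k > i else 0.0
                if left + dp[k + 1][j - 1] + w == dp[i][j]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], sorted(pairs)


# Hypothetical scores: (0, 5) crosses (1, 6), so the nested pair set wins.
toy_scores = {(0, 7): 2.0, (1, 6): 1.5, (0, 5): 1.0}
print(weighted_nussinov(8, toy_scores))  # (3.5, [(0, 7), (1, 6)])
```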


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417