High accuracy is paramount when predicting biochemical characteristics using Quantitative Structure-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds, which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by applying it to a set of human vitamin D receptor ligands that are currently being considered for their potential anticancer effects.
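The chiral tree pattern kernel itself is not reproduced here. The Python below is a minimal sketch of the underlying idea under stated assumptions. It swaps in a simpler labeled-walk count kernel, which is not the authors' method, and applies it to a hypothetical CHFClBr molecule. It then shows that appending a chirality tag to the stereocenter's label is enough for the kernel to tell enantiomers apart. The tags "@" and "@@" are borrowed from SMILES notation as an assumed encoding.

```python
from collections import Counter

# Stand-in for the chiral tree pattern kernel: a labeled-walk count kernel.
# Molecules are (labels, adjacency) pairs; the "@"/"@@" tags are an assumed encoding.


def walk_features(labels, adj, max_len):
    """Count the label sequences of all walks with 0..max_len edges."""
    features = Counter()
    # frontier[v] counts label sequences of walks that currently end at atom v
    frontier = {v: Counter({(labels[v],): 1}) for v in labels}
    for _ in range(max_len + 1):
        extended = {v: Counter() for v in labels}
        for v, walks in frontier.items():
            features.update(walks)
            for u in adj[v]:
                for seq, count in walks.items():
                    extended[u][seq + (labels[u],)] += count
        frontier = extended
    return features


def walk_kernel(mol_a, mol_b, max_len=2):
    """Inner product of the walk-count feature vectors of two molecules."""
    fa = walk_features(*mol_a, max_len)
    fb = walk_features(*mol_b, max_len)
    return sum(count * fb[seq] for seq, count in fa.items())


def tag_stereocenter(labels, atom, tag):
    """Return a copy of the atom labels with a chirality tag on one atom."""
    tagged = dict(labels)
    tagged[atom] = labels[atom] + tag
    return tagged


# Hypothetical example: bromochlorofluoromethane, stereocenter at atom 0.
LABELS = {0: "C", 1: "H", 2: "F", 3: "Cl", 4: "Br"}
ADJ = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

plain = (LABELS, ADJ)  # without tags, both enantiomers map to this same graph
form_a = (tag_stereocenter(LABELS, 0, "@"), ADJ)
form_b = (tag_stereocenter(LABELS, 0, "@@"), ADJ)

print(walk_kernel(plain, plain))    # 33
print(walk_kernel(form_a, form_a))  # 33
print(walk_kernel(form_a, form_b))  # 4: only walks of length 0 on H, F, Cl, Br match
```

A Gram matrix built from such a kernel could then be passed to a regressor that accepts precomputed kernels, for example scikit-learn's `SVR(kernel="precomputed")`.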

RNA structural motifs are recurrent three-dimensional (3D) components of RNA architecture. They play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analyzing RNA 3D structures and elucidating their molecular functions rely heavily on efficient and accurate identification of these motifs. However, efficient search tools for RNA structural motifs are lacking because of the high complexity of these motifs. In this work, we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm automatically identifies both partially and fully matched motif instances. RNAMotifScanX considers noncanonical base-pairing interactions, base-stacking interactions, and the sequence conservation of motifs, which significantly improves sensitivity and specificity compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique that enables ultra-fast searches for large kink-turn motifs in a 23S rRNA. The software package RNAMotifScanX is implemented in GNU C++ and is freely available from http://genome.ucf.edu/RNAMotifScanX.

Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical RNA secondary structures and are used for structural alignment of RNA sequences. However, such stochastic models cannot adequately discriminate member sequences of an RNA family from nonmembers, and hence cannot reliably detect noncoding RNA regions in genome sequences. A novel kernel function, the stem kernel, is proposed for the discrimination and detection of functional RNA sequences using support vector machines (SVMs). The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structure. It examines all possible common base pairs and stem structures of arbitrary length between two RNA sequences, including pseudoknots, and calculates the inner product of the counts of common stem structures. An efficient dynamic-programming algorithm is developed to calculate the stem kernel. The stem kernel is then used with SVMs to discriminate members of an RNA family from nonmembers. The results indicate that the stem kernel has strong discrimination ability compared with conventional methods. Furthermore, because the string kernel has been proven to work for remote homology detection of protein sequences, a potential application of the stem kernel is demonstrated: detecting remotely homologous RNA families in terms of secondary structure. These experimental results encourage us to apply the stem kernel to finding novel RNA families in genome sequences.
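The stem kernel's own recursion over base pairs and stems is not reproduced here. The Python below is a minimal sketch of the all-subsequences kernel that it extends. It uses dynamic programming to compute, in O(|s||t|) time, the number of pairs of matching (possibly non-contiguous) subsequences of two sequences. The function name is illustrative.

```python
def all_subsequences_kernel(s, t):
    """K(s, t) = sum over strings u of phi_u(s) * phi_u(t).

    phi_u(x) counts the index sets at which u occurs as a subsequence of x;
    the empty subsequence is included, so K(s, t) >= 1.
    """
    n, m = len(s), len(t)
    # K[i][j] holds the kernel value for the prefixes s[:i] and t[:j]
    K = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        a = s[i - 1]
        partial = 0  # running sum of K[i-1][k] over k < j with t[k] == a
        for j in range(1, m + 1):
            if t[j - 1] == a:
                partial += K[i - 1][j - 1]
            # common subsequences that do not use s[i-1], plus those that end
            # with s[i-1] matched to an occurrence of the same symbol in t[:j]
            K[i][j] = K[i - 1][j] + partial
    return K[n][m]


print(all_subsequences_kernel("GAC", "GAC"))  # 8: every subsequence of GAC is distinct
print(all_subsequences_kernel("aa", "a"))     # 3: empty string once, "a" twice
```

Replacing single-symbol matches with matches of complementary base pairs, nested or crossing, is roughly the step that turns this string kernel into a structure-aware kernel such as the stem kernel.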



Gene set testing has become an important analysis technique in high-throughput microarray and next-generation sequencing studies for uncovering patterns of differential expression across biological processes. Because a large number of gene sets are usually tested simultaneously, some form of multiplicity correction is required. This work provides a substantial computational improvement, which we call the Short Focus Level, to an existing familywise error rate controlling multiplicity approach (the Focus Level method) for gene set testing with Gene Ontology graphs in high-throughput microarray and next-generation sequencing studies.


The Short Focus Level procedure, a shortcut to the full Focus Level procedure, is obtained by extending graphical weighted Bonferroni testing to closed testing situations in which restricted hypotheses are present, as in Gene Ontology graphs. The Short Focus Level multiplicity adjustment can perform the full top-down approach of the original Focus Level procedure, overcoming a significant disadvantage of the otherwise powerful Focus Level multiplicity adjustment. The differences in computation time and power between the Short Focus Level procedure and the original Focus Level procedure are demonstrated both through simulation and with real data.
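The Short Focus Level extension to restricted hypotheses is not shown here. The Python below is a minimal sketch of the underlying graphical weighted Bonferroni idea and assumes the standard update rules of a sequentially rejective graph. A hypothesis is rejected when its p-value is at most its weight times alpha. Its weight is then passed along the graph's edges, and the transition weights among the remaining hypotheses are updated. All names are illustrative.

```python
def graphical_bonferroni(pvalues, weights, transitions, alpha=0.05):
    """Sequentially rejective graphical weighted Bonferroni test.

    weights: initial local significance weights (sum <= 1).
    transitions: m x m matrix G with zero diagonal and row sums <= 1; G[j][l] is
    the fraction of hypothesis j's weight passed to hypothesis l once j is rejected.
    Returns the sorted list of rejected hypothesis indices.
    """
    m = len(pvalues)
    w = list(weights)
    g = [list(row) for row in transitions]
    active = set(range(m))
    rejected = []
    while True:
        candidates = [j for j in sorted(active) if pvalues[j] <= w[j] * alpha]
        if not candidates:
            return sorted(rejected)
        j = candidates[0]
        active.remove(j)
        rejected.append(j)
        # pass the weight of the rejected hypothesis along its outgoing edges
        for l in active:
            w[l] += w[j] * g[j][l]
        # reroute paths l -> j -> k onto direct edges l -> k, then renormalise
        new_g = [row[:] for row in g]
        for l in active:
            for k in active:
                denom = 1.0 - g[l][j] * g[j][l]
                if l == k or denom <= 0.0:
                    new_g[l][k] = 0.0
                else:
                    new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
        g = new_g


# Equal weights with all weight shared evenly on rejection reproduces Holm's method.
holm_graph = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
print(graphical_bonferroni([0.04, 0.01, 0.02], [1 / 3] * 3, holm_graph))  # [0, 1, 2]
```

The Holm special case used in the usage line has equal weights 1/m and off-diagonal transitions 1/(m-1). It is a convenient sanity check before encoding a more structured graph.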


The Short Focus Level procedure shows a significant increase in computation speed over the original Focus Level procedure (as much as ∼15,000 times faster). The Short Focus Level should be used in place of the Focus Level procedure whenever the logical assumptions of the Gene Ontology graph structure are appropriate for the study objectives and when either no a priori focus level of interest can be specified or the focus level is selected at a higher level of the graph, where the Focus Level procedure is computationally intractable.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0349-3) contains supplementary material, which is available to authorized users.

MOTIVATION: Current methods for multiplicity adjustment do not make use of the graph structure of Gene Ontology (GO) when testing for association of expression profiles of GO terms with a response variable. RESULTS: We propose a multiple testing method, called the focus level procedure, that preserves the graph structure of GO. The procedure is constructed as a combination of a Closed Testing procedure with Holm's method. It requires the user to choose a 'focus level' in the GO graph, which reflects the level of specificity of terms in which the user is most interested. This choice also determines the level in the GO graph at which the procedure has the most power. We prove that the procedure strongly controls the family-wise error rate without any additional assumptions on the joint distribution of the test statistics used. We also present an algorithm to calculate multiplicity-adjusted P-values. Because the focus level procedure preserves the structure of the GO graph, it does not generally preserve the ordering of the raw P-values in the adjusted P-values. AVAILABILITY: The focus level procedure has been implemented in the globaltest and GlobalAncova packages, both of which are available on www.bioconductor.org.
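The full focus level procedure, which combines closed testing on the GO graph with Holm's method, is not reproduced here. The Python below is a minimal sketch of the Holm component alone and computes Holm step-down adjusted p-values. Unlike the focus level procedure, plain Holm adjustment always preserves the ordering of the raw p-values. The function name is illustrative.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p-value first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the rank-th smallest p-value (0-based) is tested at level alpha / (m - rank)
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # adjusted values must not decrease
        adjusted[i] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```

A hypothesis is rejected at family-wise level alpha exactly when its adjusted value is at most alpha.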

Disease gene identification by using graph kernels and Markov random fields
Genes associated with similar diseases are often functionally related. This principle is supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective way to identify disease genes for many genetic diseases. To capture gene-disease associations from biological networks, we propose a kernel-based MRF method that combines graph kernels with the Markov random field (MRF) method. In the proposed method, three kinds of kernels are used to describe the overall relationships among vertices in five biological networks, and a novel weighted MRF method is developed to integrate these data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments integrate known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. Evaluated by leave-one-out cross-validation, the kernel-based MRF method achieves an AUC of 0.771 when all of these biological data are integrated, which indicates that the method is very promising compared with many existing methods.
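The abstract does not name its three graph kernels or give the MRF details. The Python below therefore only illustrates one commonly used network kernel, chosen as an assumption: the diffusion kernel K = exp(-beta * L), where L is the graph Laplacian and beta is a hypothetical diffusion parameter. It computes the kernel for a small network with a truncated Taylor series, which is adequate only for small graphs and modest beta.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted network."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def diffusion_kernel(n, edges, beta=0.5, terms=40):
    """Approximate K = exp(-beta * L) by the series sum_k (-beta * L)^k / k!."""
    L = laplacian(n, edges)
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]  # k = 0 term: the identity matrix
    for k in range(1, terms):
        # turn (-beta*L)^(k-1)/(k-1)! into (-beta*L)^k/k!
        term = [[-beta * x / k for x in row] for row in matmul(term, L)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Hypothetical 4-gene path network 0-1-2-3: nearer genes get larger kernel values.
K = diffusion_kernel(4, [(0, 1), (1, 2), (2, 3)])
print(round(K[0][1], 3), round(K[0][3], 3))
```

Each row of K sums to 1 because L annihilates the all-ones vector. Such a kernel summarizes global network proximity, not just direct neighbours, which is the kind of "overall relationship of vertices" the abstract refers to.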

We present a simple method for analyzing large networks based on their graph spectral properties. One advantage of this method is that it uses a single numerical computation to identify subclusters in a connected graph, which can considerably reduce the complexity of analyzing large graphs. We illustrate this with a network of protein chains constructed on the basis of their structural similarities, and present the large-scale network properties and the cluster and subcluster organization of this protein chain network. We summarize the results of structural and functional analyses of the nodes in these clusters and elucidate the implications of structural similarity in the protein chain universe.
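The abstract does not spell out its numerical procedure, so the following is a generic example of spectral clustering rather than the authors' scheme. The Python splits a graph into two subclusters by the signs of the Fiedler vector, the Laplacian eigenvector of the second-smallest eigenvalue. It finds that vector by power iteration on a shifted Laplacian so that only the standard library is needed. The function name and the example network are hypothetical.

```python
def fiedler_split(n, edges, iterations=500):
    """Split vertices 0..n-1 of a connected graph by the signs of the Fiedler vector."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    degree = [sum(row) for row in adj]
    # Laplacian eigenvalues lie in [0, 2 * max degree], so M = shift*I - L is
    # positive semidefinite and its eigenvalue order is the reverse of L's.
    shift = 2.0 * max(degree)

    def apply_m(x):
        lx = [degree[i] * x[i] - sum(adj[i][k] * x[k] for k in range(n)) for i in range(n)]
        return [shift * x[i] - lx[i] for i in range(n)]

    # Deterministic start; assumed not to be orthogonal to the Fiedler vector.
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the constant eigenvector (eigenvalue 0 of L)
        x = apply_m(x)
        norm = sum(v * v for v in x) ** 0.5
        x = [v / norm for v in x]
    negative = [i for i in range(n) if x[i] < 0]
    positive = [i for i in range(n) if x[i] >= 0]
    return negative, positive


# Hypothetical network: two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(fiedler_split(6, edges))  # ([0, 1, 2], [3, 4, 5])
```

Repeating the split inside each part, or reading several low-order eigenvectors at once, is one common way to go from clusters to subclusters.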

There are several methods for locating the site on an RNA where a protein binds. One of the less common is directed cleavage of the RNA by an EDTA-Fe reagent tethered to the protein. Reaction of EDTA-Fe(III) with ascorbate or hydrogen peroxide produces reactive oxygen species, such as hydroxyl radicals, localized within a 10-Å radius of the iron center. These reactive oxygen species attack the ribose or deoxyribose of nucleic acids as well as nearby polypeptide backbones. One EDTA-Fe reagent, (EDTA-2-aminoethyl)-2-pyridyl disulfide complexed to iron (EPD-Fe), has been tethered to several proteins through a disulfide linkage to engineered cysteine thiols and used to cleave DNA, proteins, and RNA. A second tethered EDTA-Fe reagent, 1-(p-bromoacetamidobenzyl)-EDTA-Fe (BABE), has also been used to cleave RNA. Here we describe the issues involved in using these reagents with any RNA-binding protein.

Fast and reliable assessment of the structural rigidity of biomacromolecular complexes, as a measure of structural stability, can be useful in systematic studies to predict molecular function, and can also enable the design of rapid scoring functions to rank automatically generated biomolecular complexes. Building on the graph-theoretical approach of Jacobs et al. [Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF (2001) Protein flexibility predictions using graph theory. Proteins: Struct Funct Genet 44:150–165] for expressing molecular flexibility, we propose a new scheme to analyze the structural stability of biomolecular complexes. The analysis identifies, in the interacting subunits, clusters of floppy amino acids (those constituting regions of potential internal motion) that become more rigid upon complex formation. The gain in structural rigidity of the interacting subunits upon complex formation can be evaluated by expanding the network of intramolecular interatomic interactions to include intermolecular interatomic interaction terms. We propose two indices to quantify this change: a local index, which expresses structural rigidity at the amino acid level, and a global index, which expresses the overall structural stability of the complex. The new scheme is validated on a series of protein complex structures from the Protein Data Bank. Finally, the indices are used as scoring coefficients to rank automatically generated protein complex decoys.

Chemical probing is a common method for the structural characterization of RNA. Typically, RNA is radioactively end-labelled, subjected to probing conditions, and the pattern of cleavage fragments is analysed by gel electrophoresis. In recent years, many chemical modifications, such as fluorophores, have been introduced into RNA, but methods are lacking to detect the influence of a modification on RNA structure with single-nucleotide resolution. Here, we first demonstrate that a 5′-terminal ³²P label can be replaced by a dye label for in-line probing of riboswitch RNAs. Next, we show that small, highly structured, FRET-labelled Diels-Alderase ribozymes can be probed directly, using the internal or terminal FRET dyes as reporters. The probing patterns reveal whether or not the attachment of the dyes influences the structure. The presence of two dye labels in typical FRET constructs turns out to be beneficial, as 'duplexing' allows the complete RNA to be observed on a single gel. Structural information can be derived from the probing gels by deconvolution of the superimposed band patterns. Finally, we use fluorescent in-line probing to experimentally validate the structural consequences of photocaging, unambiguously demonstrating the intentional destruction of selected elements of secondary or tertiary structure.

Understanding the structural repertoire of RNA is crucial for RNA genomics research. Yet current methods for finding novel RNAs are limited to small or known RNA families. To expand the set of known RNA structural motifs, we develop a two-dimensional graphical representation for describing and estimating the size of RNA’s secondary-structure repertoire, including naturally occurring and other possible RNA motifs. We use tree graphs to describe RNA tree motifs and more general (dual) graphs to describe both RNA tree and pseudoknot motifs. Our estimates of RNA’s structural space are vastly smaller than the nucleotide sequence space, suggesting a new avenue for finding novel RNAs. Specifically, our survey shows that known RNA trees and pseudoknots represent only a small subset of all possible motifs, implying that some of the ‘missing’ motifs may represent novel RNAs. To help pinpoint RNA-like motifs, we show that the motifs of existing functional RNAs are clustered in a narrow range of topological characteristics. We also illustrate applications of our approach to the design of novel RNAs and to automated comparison of RNA structures, and we report several occurrences of RNA motifs within larger RNAs. Thus, our graph-theoretic approach to RNA structure has implications for RNA genomics, structure analysis and design.
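The Python below is a minimal sketch of how such a repertoire might be sized. It assumes, for illustration only, that candidate tree motifs with V vertices correspond one-to-one to unlabeled (unrooted) trees on V vertices; the paper's own enumeration rules may differ. Rooted trees are counted with the classical divisor-sum recurrence, and the counts are converted to unrooted trees with Otter's formula. The function names are illustrative.

```python
def rooted_tree_counts(n_max):
    """r[n] = number of unlabeled rooted trees with n vertices, for n <= n_max."""
    r = [0, 1]
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            # s_k: sum of d * r[d] over the divisors d of k
            s_k = sum(d * r[d] for d in range(1, k + 1) if k % d == 0)
            total += s_k * r[n - k + 1]
        r.append(total // n)  # the recurrence gives r[n + 1] = total / n exactly
    return r


def unrooted_tree_counts(n_max):
    """t[n] = number of unlabeled (free) trees with n vertices, via Otter's formula."""
    r = rooted_tree_counts(n_max)
    t = [0]
    for n in range(1, n_max + 1):
        pairs = sum(r[i] * r[n - i] for i in range(1, n))
        if n % 2 == 0:
            pairs -= r[n // 2]  # drop the symmetric edge-rooted case
        t.append(r[n] - pairs // 2)
    return t


print(unrooted_tree_counts(10)[1:])  # [1, 1, 1, 2, 3, 6, 11, 23, 47, 106]
```

Under this assumption, there are only 106 distinct topologies even at 10 vertices, which illustrates how coarse a topological description is compared with the sequence space.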

RNA editing by adenosine deaminases acting on RNA (ADARs) can be either specific or non-specific, depending on the substrate. Specific editing of particular adenosines may depend on the overall sequence and structural context. However, the detailed mechanisms underlying these preferences are not fully understood. Here, we show that duplex structures mimicking an editing site in the Gabra3 pre-mRNA unexpectedly fail to support RNA editing at the Gabra3 I/M site, although phylogenetic analysis suggests an evolutionarily conserved duplex structure essential for efficient RNA editing. These unusual results led us to revisit the structural requirements for this editing by mutagenesis analysis. In vivo nuclear injection experiments with mutated editing substrates demonstrate that a non-conserved structure is a determinant of editing. This structure contains bulges on either the same strand as the edited adenosine or the opposing strand. The position of these bulges and their distance from the edited base regulate editing. Moreover, an elevated folding temperature can lead to a switch in RNA editing, suggesting an RNA structural change. Our results indicate the importance of RNA tertiary structure in determining RNA editing.

Small hairpin RNA-mediated RNA interference in COS-7 cells
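Using the U6 promoter to transcribe small hairpin RNAs that mediate RNA interference is a recently developed technique for specifically suppressing the expression of a chosen gene in mammalian cells, and experiments have shown that it strongly suppresses gene expression in cell lines such as mouse teratocarcinoma P19 cells. This paper investigates RNA interference in the COS-7 cell line mediated by small hairpin RNAs against the GFP gene transcribed from the U6 promoter. The results show that small hairpin RNAs transcribed from the U6 promoter have RNA interference activity: in COS-7 cells they specifically suppress expression of the GFP gene, which contains the corresponding sequence. This result lays the groundwork for using RNA interference to study the function of target genes in the COS-7 cell line.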


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417