Identifying non-coding RNA regions on the genome using computational methods is currently receiving a lot of attention. In general, it is more difficult than detecting protein-coding genes because non-coding RNA regions have only weak statistical signals. On the other hand, most functional RNA families have conserved sequences and secondary structures that are characteristic of their molecular function in a cell; these are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method that extends a pairwise structural alignment method for RNA sequences to handle position-specific scoring matrices (PSSMs), which we use to model sequence motifs, and hence incorporates motifs into the structural alignment of RNA sequences. Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially when biological knowledge such as sequence motifs is available. K. Sato and K. Morita contributed equally to this work.
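As a rough illustration of what a PSSM is, the sketch below builds a log-odds matrix from a few aligned RNA motif instances and scores a candidate window. It assumes a uniform background, a pseudocount of 1 and log base 2; these choices, and the names `build_pssm` and `score_window`, are illustrative and not taken from the paper, which goes further and folds such scores into a pairwise structural alignment.

```python
import math

ALPHABET = "ACGU"  # RNA nucleotides

def build_pssm(instances, pseudocount=1.0, background=None):
    """Build a log2-odds PSSM from equal-length, aligned motif instances."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}  # assumed uniform background
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    pssm = []
    for i in range(length):
        column = [s[i] for s in instances]
        total = len(column) + pseudocount * len(ALPHABET)
        row = {}
        for b in ALPHABET:
            # Smoothed frequency of base b at position i, relative to background.
            freq = (column.count(b) + pseudocount) / total
            row[b] = math.log2(freq / background[b])
        pssm.append(row)
    return pssm

def score_window(pssm, window):
    """Sum the position-specific scores of a window of the same length as the PSSM."""
    if len(window) != len(pssm):
        raise ValueError("window length must equal PSSM length")
    return sum(row[base] for row, base in zip(pssm, window))
```

For example, with `pssm = build_pssm(["GAUC", "GAUC", "GACC"])`, the window `"GAUC"` receives a positive score and `"CUAG"` a negative one.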



In biological sequence analysis, position-specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching complete genomes or large sequence databases with PSSMs is a common but computationally expensive task.
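The cost mentioned above comes from scoring every window of the sequence. A minimal Python sketch of such a scan follows; it adds a simple early-termination check that abandons a window once even the best remaining column scores cannot reach the threshold. The PSSM format (a list of per-position dicts), the threshold and the function name `scan` are assumptions for illustration, not the method of the cited work.

```python
from typing import Dict, List, Tuple

PSSM = List[Dict[str, float]]

def scan(sequence: str, pssm: PSSM, threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at least `threshold`."""
    m = len(pssm)
    # best_rest[i] = maximum achievable score from positions i..m-1
    best_rest = [0.0] * (m + 1)
    for i in range(m - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(pssm[i].values())
    hits = []
    for start in range(len(sequence) - m + 1):
        score = 0.0
        for i in range(m):
            # Symbols missing from the PSSM (e.g. 'N') can never match.
            score += pssm[i].get(sequence[start + i], float("-inf"))
            # Stop early if even the best remaining scores cannot reach the threshold.
            if score + best_rest[i + 1] < threshold:
                break
        else:
            hits.append((start, score))
    return hits
```

With a toy matrix that rewards `TATA` (`pssm = [{b: (2.0 if b == c else -1.0) for b in "ACGT"} for c in "TATA"]`), `scan("GGTATAGGTATT", pssm, 5.0)` returns `[(2, 8.0), (8, 5.0)]`. The early exit does not change the result, only how many columns are visited per window.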

Cluster-Buster: Finding dense clusters of motifs in DNA sequences
Frith MC, Li MC, Weng Z. Nucleic Acids Research, 2003, 31(13):3666-3668

Consider common gene clusters of closely related species, where cluster sizes can be as large as 400 genes drawn from an alphabet of 25,000 genes. This paper addresses the problem of computing the statistical significance of such large clusters, whose individual elements occur with very low frequency (on the order of the number of species in this case) and whose alphabet of elements is relatively large. We present a model that studies the structure of a cluster in terms of the smaller sub-clusters, nested or otherwise, contained within it. We give a probability estimate based on the expected structure of such clusters (rather than on some form of product of the individual elements' probabilities). We also give an exact probability computation based on a dynamic programming algorithm that runs in polynomial time.

Membrane proteins play important roles in biochemical processes such as signal transduction and transmembrane transport. Membrane proteins are usually classified into five types [Chou, K.C., Elrod, D.W., 1999. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet. 34, 137-153] or six types [Chou, K.C., Cai, Y.D., 2005. J. Chem. Inf. Modelling 45, 407-413]. Designing in silico methods to identify and classify membrane proteins can help us understand the structure and function of unknown proteins. This paper introduces an integrative approach, IAMPC, that classifies membrane proteins based on protein sequences and protein profiles. Its modules extract the amino acid composition of the whole profile, the amino acid composition of the N-terminal and C-terminal profiles, the amino acid composition of profile segments, and the dipeptide composition of the whole profile. In computational experiments, the overall accuracy of the proposed approach is comparable to that of the functional-domain-based method. In addition, the two methods are complementary, performing differently across membrane protein types.
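To make the composition features concrete, the sketch below computes whole-sequence, N-terminal and C-terminal amino acid composition plus dipeptide composition. Note the simplification: IAMPC computes these features on protein profiles, whereas this hypothetical `feature_vector` works on a raw sequence, uses an assumed terminal segment length of 25 residues, and omits the profile-segment features.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 20 standard amino acids (other symbols ignored)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for aa in seq:
        if aa in counts:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: (c / total if total else 0.0) for aa, c in counts.items()}

def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 400 ordered pairs of adjacent standard amino acids."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return {p: (c / total if total else 0.0) for p, c in counts.items()}

def feature_vector(seq: str, terminal_len: int = 25) -> List[float]:
    """Concatenate whole, N-terminal and C-terminal composition with dipeptide composition.

    terminal_len is an assumed, hypothetical segment length.
    """
    seq = seq.upper()
    parts = [
        aa_composition(seq),
        aa_composition(seq[:terminal_len]),   # N-terminal segment
        aa_composition(seq[-terminal_len:]),  # C-terminal segment
        dipeptide_composition(seq),
    ]
    return [value for part in parts for value in part.values()]
```

The resulting vector has 20 + 20 + 20 + 400 = 460 entries and could be fed to any standard classifier.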

This review covers common molecular-biological approaches for determining specific nucleotide sequences in genetic material. Particular attention is paid to methods for constructing DNA biosensors. The types of such biosensors are described in detail, and the characteristics of the devices developed so far are reported. The use of instrumental analytical approaches to identify the genetic material of individual pathogenic microorganisms is discussed separately.

MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to a query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software such as BLAST offers the user a set of scoring matrices to choose from, and the literature sometimes recommends trying several scoring matrices on the sequences of interest. The significance of an alignment is usually assessed by looking at E-values and p-values. While sequence lengths and database sizes enter the standard significance calculations, it is much less common to account for the use of several scoring matrices on the same sequences. Altschul proposed corrections of the p-value that account for the simultaneous use of an infinite number of PAM matrices. Here we consider the more realistic situation in which the user may choose from a finite set of popular PAM and BLOSUM matrices, in particular those available in BLAST. It turns out that the significance of a result can be considerably overestimated if a set of substitution matrices is used in an alignment problem and only the most significant alignment is then reported. RESULTS: Based on extensive simulations, we study the multiple testing problem that occurs when several scoring matrices are used for local sequence alignment. We consider a simple Bonferroni correction of the p-values and investigate its accuracy. Finally, we propose a more accurate correction based on extreme value distributions fitted to the maximum of the normalized scores obtained from different scoring matrices. For various sets of matrices we provide correction factors that can easily be applied to adjust the p- and E-values reported by software packages.
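The following sketch shows the simple corrections discussed above: converting an E-value to a p-value with the Karlin-Altschul relation p = 1 - exp(-E), a Bonferroni adjustment for the number of matrices tried, and a method-of-moments Gumbel fit that could be applied to simulated maxima of normalized scores. It does not reproduce the paper's correction factors; the function names and the moment-based fitting method are assumptions.

```python
import math
import statistics
from typing import Sequence, Tuple

EULER_GAMMA = 0.5772156649015329

def evalue_to_pvalue(e: float) -> float:
    """Karlin-Altschul relation between E-value and p-value: p = 1 - exp(-E)."""
    return -math.expm1(-e)

def bonferroni(p: float, n_matrices: int) -> float:
    """Bonferroni-adjusted p-value when the best of n_matrices searches is reported."""
    return min(1.0, n_matrices * p)

def fit_gumbel_moments(maxima: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a Gumbel (type I extreme value) distribution."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi  # scale
    mu = statistics.mean(maxima) - EULER_GAMMA * beta          # location
    return mu, beta

def gumbel_pvalue(x: float, mu: float, beta: float) -> float:
    """P(X >= x) = 1 - exp(-exp(-(x - mu) / beta)) for a Gumbel(mu, beta) variable."""
    return -math.expm1(-math.exp(-(x - mu) / beta))
```

In a simulation study, `maxima` would hold the maximum normalized score over the chosen matrix set for each random sequence pair; the Bonferroni bound is conservative because scores from different matrices on the same sequences are strongly correlated.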

Restriction endonuclease cleavage analysis and blot hybridization of nuclear DNA and RNA to cloned avian sarcoma and murine leukemia virus genes (pol, src and abl) demonstrated the presence and expression of retrovirus-specific sequences in baker's yeast cells. A relationship exists between the pol-specific yeast sequences and cloned Ty fragments. The results are discussed in light of the evolutionary role of retroviral genes in cell division control and transposition.

A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. Run distributions differ considerably among the DNA sequences of prokaryotes, invertebrates and vertebrates. Short runs (1-2 nucleotides long) are abundant in coding sequences and deficient in noncoding regions. However, there are some interesting exceptions to this rule: the run distribution of adenine in prokaryotes and the arrangement of purine-pyrimidine runs in eukaryotes. The similarity of these run distributions in coding and noncoding regions may be due to structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) three to six nucleotides long occur predominantly in noncoding DNA regions of eukaryotes, especially vertebrates.
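Run distributions like those analysed above can be tabulated with a short Python sketch. `run_lengths` counts maximal runs as (symbol, length) pairs; passing the hypothetical `PURINE_PYRIMIDINE` map first recodes bases to R/Y so that purine-pyrimidine runs are counted instead. Separating coding from noncoding regions is assumed to happen before this step.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Optional

# Hypothetical recoding map: purines -> R, pyrimidines -> Y.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

def run_lengths(seq: str, alphabet_map: Optional[Dict[str, str]] = None) -> Counter:
    """Count maximal runs of identical symbols as (symbol, length) pairs.

    If alphabet_map is given, each base is recoded first; unknown bases become 'N'.
    """
    seq = seq.upper()
    if alphabet_map is not None:
        seq = "".join(alphabet_map.get(b, "N") for b in seq)
    # groupby yields one group per maximal run of equal consecutive symbols.
    return Counter((symbol, len(list(group))) for symbol, group in groupby(seq))
```

For example, `run_lengths("AAGCCCT")` counts one run each of `("A", 2)`, `("G", 1)`, `("C", 3)` and `("T", 1)`, while `run_lengths("AGACTTC", PURINE_PYRIMIDINE)` sees a single purine run of length 3 followed by a pyrimidine run of length 4.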

Suppression subtractive hybridization, a cost-effective approach for targeting unique DNA, was used to identify a 41.7-kb Yersinia pestis-specific region. One primer pair designed from this region amplified PCR products from natural isolates of Y. pestis and produced no false positives for near neighbors, an important criterion for unambiguous bacterial identification.

Compared with DNA and proteins, there have been almost no standard methods for the computational analysis of glycan structures. In this paper, we present a novel method for extracting functional motifs from glycan structures using the KEGG/GLYCAN database. First, we developed a new similarity measure for comparing glycan structures that takes into account the characteristic mechanisms of glycan biosynthesis, and we tested its ability to classify glycans of different blood components in the framework of support vector machines (SVMs). The results show that our method can successfully classify glycans from four types of human blood components: leukemic cells, erythrocytes, serum, and plasma. Next, we extracted characteristic functional motifs of glycans considered to be specific to each blood component. We predicted the substructure alpha-D-Neup5Ac-(2-->3)-beta-D-Galp-(1-->4)-D-GlcpNAc to be a leukemia-specific glycan motif. Because the Agrocybe cylindracea galectin (ACG) specifically binds the same substructure, we conducted a cell agglutination assay and confirmed that this fungal lectin specifically recognizes human leukemic cells.

DNA restriction fragments 120-650 base pairs (bp) long with 5'-GCGC-3', 5'-GGCC-3' or 3'-GCGC-5' single-stranded overhanging termini give rise to diffuse bands of unusual electrophoretic mobility in non-denaturing polyacrylamide gels. This mobility shift is observed at 4-12 °C but not at higher temperatures; it is, however, stabilized by 5-10 mM Mg²⁺, even at 37 °C. The nucleotide sequence of the adjoining double-stranded part of the fragment does not affect this phenomenon, which is not caused by dimerization. The altered mobility may be due to an unusual terminal DNA structure that depends on cooperative interactions among more than two neighboring G and C residues. This structure is stabilized by cytidine methylation. The biological role of such fragment structures in DNA repair and recombination is presently unknown.

Determination of nucleotide sequences in DNA

The distribution of RNA motifs in natural sequences
Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for RNA structures in sequence databases, the study of their distribution and, eventually, the determination of their physiological roles. We illustrate the utility of this approach by identifying RNA motifs with known binding or chemical activity in the GenBank database. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the fate of these motifs may be more elaborate. For example, the hammerhead motif has a skewed distribution across organisms and is phylogenetically stable, and recent work on a schistosome version confirms its biological activity in vivo. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and location of these motifs may provide critical guidance in designing experiments aimed at understanding and manipulating RNA complexes and activities in vivo.
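As a simplified illustration of scanning database records for a motif, the sketch below converts an IUPAC pattern to a regular expression and reports all (possibly overlapping) matches. It only checks primary sequence; finding structural motifs such as the hammerhead would additionally require secondary-structure constraints. The pattern `GNRA` used below is just an example, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expressed over the RNA alphabet (T is treated as U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]", "K": "[GU]", "M": "[AC]",
    "B": "[CGU]", "D": "[AGU]", "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}

def iupac_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile an IUPAC pattern; the zero-width lookahead lets matches overlap."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    return re.compile(f"(?=({body}))")

def find_motif(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (start, matched text) for every occurrence of pattern in sequence."""
    seq = sequence.upper().replace("T", "U")  # treat DNA records as RNA
    return [(m.start(), m.group(1)) for m in iupac_to_regex(pattern).finditer(seq)]
```

For example, `find_motif("GGAAGCGAUU", "GNRA")` returns `[(0, 'GGAA'), (4, 'GCGA')]`; counting such hits across many records gives the kind of frequency data discussed above.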

The use of Chaos Game Representation (CGR), or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unlock the use of iterative maps as a phase-state representation of sequences in which their statistical properties can be conveniently investigated. In this study, a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary length in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special cases. The fractal kernel described is therefore a generalization that provides a common framework for a diverse suite of sequence analysis techniques.
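For readers unfamiliar with CGR, the sketch below computes the standard Chaos Game Representation of a DNA sequence: starting from the centre of the unit square, each base moves the point halfway towards that base's corner. The corner assignment (A at (0,0), C at (0,1), G at (1,1), T at (1,0)) is one common convention and is an assumption here; the kernel density described in the abstract is not implemented.

```python
from typing import List, Tuple

# One common corner convention (assumed): A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr(sequence: str) -> List[Tuple[float, float]]:
    """Return the CGR point reached after each base of the sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]  # raises KeyError for non-ACGT symbols
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Because each step halves the distance to a corner, two sequences that share their last k bases end up within 2^-k of each other in each coordinate. This suffix-based nesting of squares is what makes CGR a multi-scale representation of sequence motifs.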
