Similar Documents
1.
Identifying non-coding RNA regions in genomes with computational methods is currently receiving a lot of attention. In general, this is considerably harder than detecting protein-coding genes, because non-coding RNA regions carry only weak statistical signals. On the other hand, most functional RNA families have conserved sequences and secondary structures that are characteristic of their molecular function in the cell; these are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method that extends a pairwise structural alignment method for RNA sequences to handle position specific scoring matrices (PSSMs), which we use to model sequence motifs, and thereby incorporates motifs into the structural alignment of RNA sequences. Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially when biological knowledge such as sequence motifs is available. K. Sato and K. Morita contributed equally to this work.
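A PSSM scores each motif position independently with log-odds values. The minimal sketch below builds such a matrix from aligned motif instances, assuming a uniform background over A, C, G, U, a pseudocount of 1 per base and base-2 logarithms (all illustrative choices). It shows only a generic PSSM, not how the authors integrate it into pairwise structural alignment, and the example sites are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # RNA alphabet (assumption for this sketch)


def build_pssm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PSSM (one dict per motif column) from aligned, equal-length sites."""
    if background is None:
        background = {b: 1.0 / len(ALPHABET) for b in ALPHABET}  # uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        # log2(observed frequency / background frequency), smoothed by pseudocounts
        pssm.append({b: math.log2(((counts[b] + pseudocount) / total) / background[b])
                     for b in ALPHABET})
    return pssm


def score(pssm, word):
    """Sum of per-column log-odds scores for a word of the motif's length."""
    if len(word) != len(pssm):
        raise ValueError("word length must equal motif length")
    return sum(column[base] for column, base in zip(pssm, word))


sites = ["GGAC", "GGAC", "GGAU", "GCAC"]  # hypothetical aligned motif instances
pssm = build_pssm(sites)
print(score(pssm, "GGAC") > 0, score(pssm, "UUUU") < 0)
```

Positive scores mean a word looks more like the motif than like background; pseudocounts keep unseen bases from producing a score of minus infinity.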

2.

Background  

In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common but computationally expensive task.
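Because every window of the sequence must be scored against the matrix, a naive scan costs O(nm) for a sequence of length n and a motif of width m. One common way to cut this cost is a lookahead filter: precompute the best score still attainable from each column onward, and abandon a window as soon as the partial score plus that bound falls below the threshold. This sketch assumes a PSSM stored as a list of per-column dicts; the toy matrix and threshold are made up for illustration, and only the forward strand is scanned.

```python
def max_suffix_scores(pssm):
    """best[i] = highest score attainable from columns i..end (best[len(pssm)] = 0)."""
    best = [0.0] * (len(pssm) + 1)
    for i in range(len(pssm) - 1, -1, -1):
        best[i] = best[i + 1] + max(pssm[i].values())
    return best


def scan(sequence, pssm, threshold):
    """Return (start, score) for every window scoring at least `threshold`."""
    m = len(pssm)
    best = max_suffix_scores(pssm)
    hits = []
    for start in range(len(sequence) - m + 1):
        total = 0.0
        for i in range(m):
            # Unknown symbols (e.g. N) get -inf, which rejects the window.
            total += pssm[i].get(sequence[start + i], float("-inf"))
            if total + best[i + 1] < threshold:
                break  # even a perfect remainder cannot reach the threshold
        else:
            hits.append((start, total))
    return hits


# Hypothetical 3-column DNA matrix favouring "TAT".
toy_pssm = [
    {"A": -1, "C": -2, "G": -2, "T": 2},
    {"A": 2, "C": -2, "G": -2, "T": -1},
    {"A": -1, "C": -2, "G": -2, "T": 2},
]
print(scan("GGTATCCTATA", toy_pssm, 5.0))  # → [(2, 6.0), (7, 6.0)]
```

The filter never discards a true hit, because `best` is an upper bound on what the remaining columns can contribute.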

3.
Statistical analysis of nucleotide sequences.
In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate that selecting appropriate parameters is essential for the method to work at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k ≠ 0, as is usually the case in databases. As a test of these modifications, we show that E. coli sequences are biased against palindromic hexamers that correspond to known restriction enzyme recognition sites.
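One common way to obtain expected word counts under a Markov chain of order m is to estimate them from the sequence's own (m+1)-mer and m-mer counts: E(w) = N(w[0:m+1]) · ∏ N(w[i:i+m+1]) / N(w[i:i+m]). The sketch below applies that estimate to every reverse-complement palindromic hexamer and reports observed/expected ratios; ratios well below 1 indicate avoidance. This is a generic estimator, not necessarily the modified model of the paper, and the Markov order is called `order` here to avoid a clash with word length.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, word, order):
    """Expected count of `word` under a Markov chain of the given order,
    estimated from (order+1)-mer and order-mer counts of seq."""
    k, m = len(word), order
    if not 1 <= m <= k - 2:
        raise ValueError("need 1 <= order <= len(word) - 2")
    long_counts = kmer_counts(seq, m + 1)
    short_counts = kmer_counts(seq, m)
    expected = float(long_counts[word[0:m + 1]])
    for i in range(1, k - m):
        denominator = short_counts[word[i:i + m]]
        if denominator == 0:
            return 0.0
        expected *= long_counts[word[i:i + m + 1]] / denominator
    return expected


def is_palindrome(word):
    """True if word equals its own reverse complement (e.g. GAATTC)."""
    return word == word.translate(COMPLEMENT)[::-1]


def palindrome_ratios(seq, k=6, order=4):
    """Observed/expected ratio for every reverse-complement palindrome of even length k."""
    observed = kmer_counts(seq, k)
    ratios = {}
    for half in product("ACGT", repeat=k // 2):
        left = "".join(half)
        word = left + left.translate(COMPLEMENT)[::-1]  # palindrome by construction
        expected = expected_count(seq, word, order)
        if expected > 0:
            ratios[word] = observed[word] / expected
    return ratios


print(expected_count("ACGACG", "ACG", 1))  # → 2.0
```

With `order = k - 2` (here 4 for hexamers) the model reproduces all shorter-word frequencies, so any remaining deviation is specific to the hexamer itself.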

4.
5.
Cluster-Buster: Finding dense clusters of motifs in DNA sequences
Frith MC, Li MC, Weng Z. Nucleic Acids Research, 2003, 31(13):3666-3668

6.
Consider common gene clusters in closely related species, where a cluster may contain as many as 400 genes drawn from an alphabet of 25,000 genes. This paper addresses the problem of computing the statistical significance of such large clusters, whose individual elements occur with very low frequency (on the order of the number of species in this case) and whose alphabet is relatively large. We present a model in which we study the structure of a cluster in terms of smaller sub-clusters, nested or otherwise, contained within it. We give a probability estimate based on the expected structure of such clusters (rather than on some form of product of the individual probabilities of the elements). We also give an exact probability computation based on a dynamic programming algorithm that runs in polynomial time.

7.
Membrane proteins play important roles in biochemical processes such as signal transduction and transmembrane transport. Membrane proteins are usually classified into five types [Chou, K.C., Elrod, D.W., 1999. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet. 34, 137-153] or six types [Chou, K.C., Cai, Y.D., 2005. J. Chem. Inf. Modelling 45, 407-413]. Designing in silico methods to identify and classify membrane proteins can help us understand the structure and function of unknown proteins. This paper introduces an integrative approach, IAMPC, that classifies membrane proteins based on protein sequences and protein profiles. Its modules extract the amino acid composition of the whole profile, the amino acid composition of the N-terminal and C-terminal profiles, the amino acid composition of profile segments, and the dipeptide composition of the whole profile. In computational experiments, the overall accuracy of the proposed approach is comparable to that of the functional-domain-based method. In addition, across different membrane protein types, the performance of the proposed approach is complementary to that of the functional-domain-based method.
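The composition features listed above can be illustrated on a plain amino acid sequence. IAMPC computes them on protein profiles rather than raw sequences, so this is only a simplified sketch; the terminal length of 25 residues and the split into 3 segments are hypothetical choices, not values taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition(seq):
    """20-dim amino acid composition (fractions); non-standard residues are ignored."""
    counts = Counter(a for a in seq.upper() if a in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim composition of adjacent residue pairs (pairs with non-standard residues skipped)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 2] for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    )
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def feature_vector(seq, terminal_len=25, n_segments=3):
    """Concatenate whole, N-terminal, C-terminal, per-segment and dipeptide compositions."""
    n = len(seq)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    features = (composition(seq)
                + composition(seq[:terminal_len])     # N-terminal window
                + composition(seq[-terminal_len:]))   # C-terminal window
    for lo, hi in zip(bounds, bounds[1:]):
        features += composition(seq[lo:hi])          # equal-length segments
    return features + dipeptide_composition(seq)


print(len(feature_vector("ACDEFGHIKLMNPQRSTVWY" * 3)))  # → 520
```

With these defaults the vector has 20 × 3 + 20 × 3 + 400 = 520 dimensions and could be passed to any standard classifier.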

8.
This review summarizes common molecular-biological approaches for determining specific nucleotide sequences in genetic material. The main focus is on ways of constructing DNA biosensors: the types of such biosensors are described in detail, and the characteristics of developed devices are given. The use of instrumental analytical approaches for identifying the genetic material of individual pathogenic microorganisms is considered separately.

9.
Restriction endonuclease cleavage analysis and blot hybridization of nuclear DNA and RNA to cloned avian sarcoma and murine leukemia virus genes (pol, src and abl) demonstrated the presence and expression of retrovirus-specific sequences in baker's yeast cells. The pol-specific yeast sequences were found to be related to cloned Ty fragments. The results are discussed in light of the evolutionary role of retroviral genes in cell division control and transposition.

10.
A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. Run distributions differ considerably among DNA sequences of prokaryotes, invertebrates and vertebrates. Short runs (1-2 nucleotides long) are abundant in coding sequences and deficient in noncoding regions. However, there are some interesting exceptions to this rule in the run distribution of adenine in prokaryotes and in the arrangement of purine-pyrimidine runs in eukaryotes. The similarity of the distributions of such runs in coding and noncoding regions may be due to structural features of the DNA molecule as a whole. Runs of three to six guanines (or cytosines) occur predominantly in noncoding DNA regions in eukaryotes, especially in vertebrates.
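A run is a maximal stretch of identical symbols, so a run-length distribution can be tallied by grouping consecutive equal characters. The minimal sketch below does this per nucleotide and, after recoding bases as purines (R) or pyrimidines (Y), for purine-pyrimidine runs; non-ACGT symbols are recoded as N and break runs. The function names are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(seq, alphabet="ACGT"):
    """For each symbol in `alphabet`, count how many maximal runs of each length occur."""
    dist = {symbol: Counter() for symbol in alphabet}
    for symbol, group in groupby(seq.upper()):
        if symbol in dist:
            dist[symbol][sum(1 for _ in group)] += 1
    return dist


def purine_pyrimidine_runs(seq):
    """Run-length distribution after recoding A/G as R and C/T as Y (other symbols as N)."""
    recoded = "".join("R" if b in "AG" else "Y" if b in "CT" else "N" for b in seq.upper())
    return run_length_distribution(recoded, alphabet="RY")


dist = run_length_distribution("AACGGGTA")
print(dict(dist["A"]), dict(dist["G"]))  # → {2: 1, 1: 1} {3: 1}
```

Comparing these histograms between coding and noncoding regions gives the kind of contrast the abstract reports.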

11.
MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to a query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software such as BLAST offers the user a set of scoring matrices to choose from, and the literature sometimes recommends trying several scoring matrices on the sequences of interest. The significance of an alignment is usually assessed by looking at E-values and p-values. While sequence lengths and database sizes enter the standard calculations of significance, it is much less common to take into account the use of several scoring matrices on the same sequences. Altschul proposed corrections of the p-value that account for the simultaneous use of an infinite number of PAM matrices. Here we consider the more realistic situation where the user may choose from a finite set of popular PAM and BLOSUM matrices, in particular those available in BLAST. It turns out that the significance of a result can be considerably overestimated if a set of substitution matrices is used in an alignment problem and the most significant alignment is then quoted. RESULTS: Based on extensive simulations, we study the multiple testing problem that occurs when several scoring matrices are used for local sequence alignment. We consider a simple Bonferroni correction of the p-values and investigate its accuracy. Finally, we propose a more accurate correction based on extreme value distributions fitted to the maximum of the normalized scores obtained from different scoring matrices. For various sets of matrices we provide correction factors that can easily be applied to adjust the p- and E-values reported by software packages.
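The simple Bonferroni baseline mentioned above can be written in a few lines: convert the best E-value to a p-value with p = 1 − exp(−E) (the usual Poisson relation) and multiply by the number of matrices tried, capping at 1. This sketch does not implement the more accurate extreme-value correction the authors propose; the matrix names are real BLAST matrices, but the E-values are hypothetical.

```python
import math


def evalue_to_pvalue(e):
    """Probability of at least one chance hit this good: p = 1 - exp(-E)."""
    return -math.expm1(-e)  # numerically stable for small E


def bonferroni_best(evalues):
    """Given E-values for the same sequence pair under several matrices,
    return the best matrix and its Bonferroni-corrected p-value."""
    best_matrix, best_e = min(evalues.items(), key=lambda kv: kv[1])
    corrected = min(1.0, len(evalues) * evalue_to_pvalue(best_e))
    return best_matrix, corrected


evalues = {"BLOSUM62": 0.01, "BLOSUM45": 0.002, "PAM30": 0.5}  # hypothetical results
name, p = bonferroni_best(evalues)
print(name, round(p, 6))
```

Bonferroni is conservative here because scores from different matrices on the same pair are strongly correlated, which is why a fitted correction can be tighter.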

12.
13.
14.
15.
An immunological method was developed that uses a monoclonal antibody recognizing bromouracil to separate DNA fragments containing bromouracil in repair patches from unrepaired DNA. Cultured monkey cells were exposed either to UV light or to the activated carcinogen aflatoxin B1 (AFB1), and excision repair of damage in DNA fragments containing the integrated and transcribed E. coli gpt gene was compared with that in the genome overall. Both UV and AFB1 damage were repaired more rapidly in the DNA fragments containing the E. coli gpt gene. The more efficient repair of UV damage was not due to a difference in the initial level of pyrimidine dimers, as determined with a UV damage-specific endonuclease. Consistent with previous observations made with different methodology, repair of UV damage in alpha sequences occurred at the same rate as in the genome overall, whereas repair of AFB1 damage was deficient in alpha DNA. The preferential repair of damage in the gpt gene may be related to the functional state of the sequence and/or to changes in chromatin conformation produced by the integration of plasmid sequences carrying the gene.

16.
The major simple sequence repeats present in the Arabidopsis genome were identified by Southern hybridizations with 49 oligonucleotide probes matching all possible combinations of motifs up to 4 nucleotides long. The method allowed all hybridizations to be performed under the same temperature conditions. A good correlation was observed with data obtained from database analysis, indicating that the method can be useful for identifying the major classes of microsatellite loci in species for which few or no sequence data are available. AG/CT, AAG/CTT, ATG/CAT and GTG/CAC are the major motifs in the Arabidopsis genome and can be used as convenient probes to isolate microsatellite loci by screening libraries. AAG/CTT is the most frequent of these motifs, and its relative frequency in Arabidopsis is much higher than the average for the plant kingdom. About 8% of the cDNA clones from an immature silique library contain AG/CT, AAG/CTT or ATG/CAT microsatellite loci. Several microsatellite loci were isolated by screening genomic and cDNA libraries. Twenty-six trinucleotide loci were PCR-amplified from four different ecotypes, and polymorphism was observed for 12 of them: 10 loci showed two alleles and 2 loci showed three alleles.
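The figure of 49 probes is consistent with the usual way of counting microsatellite motif classes: two motifs are equivalent if one is a cyclic rotation of the other or of its reverse complement (so AG, GA, CT and TC form the single class AG/CT), and motifs that merely repeat a shorter unit (such as ACAC) are excluded. Assuming the probes were defined this way, which the abstract does not state explicitly, the sketch below enumerates the classes and recovers 2 + 4 + 10 + 33 = 49.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    return s.translate(COMPLEMENT)[::-1]


def is_primitive(motif):
    """True unless motif is a repetition of a shorter unit (e.g. 'ACAC')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def canonical(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    variants = []
    for s in (motif, reverse_complement(motif)):
        variants.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(variants)


def motif_classes(max_len=4):
    """Map motif length -> sorted list of canonical repeat classes."""
    classes = {}
    for n in range(1, max_len + 1):
        words = ("".join(p) for p in product("ACGT", repeat=n))
        classes[n] = sorted({canonical(w) for w in words if is_primitive(w)})
    return classes


counts = {n: len(v) for n, v in motif_classes(4).items()}
print(counts, sum(counts.values()))  # → {1: 2, 2: 4, 3: 10, 4: 33} 49
```

The dinucleotide classes come out as AC/GT, AG/CT, AT and CG, matching the slash notation used in the abstract.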

17.
Suppression subtractive hybridization, a cost-effective approach for targeting unique DNA, was used to identify a 41.7-kb Yersinia pestis-specific region. One primer pair designed from this region amplified PCR products from natural isolates of Y. pestis and produced no false positives for near neighbors, an important criterion for unambiguous bacterial identification.

18.
Compared with DNA and proteins, there have been almost no standard methods for the computational analysis of glycan structures. In this paper, we present a novel method for extracting functional motifs from glycan structures using the KEGG/GLYCAN database. First, we developed a new similarity measure for comparing glycan structures that takes into account the characteristic mechanisms of glycan biosynthesis, and we tested its ability to classify glycans of different blood components within a support vector machine (SVM) framework. The results show that our method can successfully classify glycans from four types of human blood components: leukemic cells, erythrocytes, serum, and plasma. Next, we extracted characteristic functional motifs of glycans considered to be specific to each blood component. We predicted the substructure alpha-D-Neup5Ac-(2-->3)-beta-D-Galp-(1-->4)-D-GlcpNAc as a leukemia-specific glycan motif. Because the Agrocybe cylindracea galectin (ACG) specifically binds to the same substructure, we conducted a cell agglutination assay and confirmed that this fungal lectin specifically recognizes human leukemic cells.

19.
20.
Several methods have been developed for identifying more or less complex RNA structures in a genome. All of them are based on searching for conserved primary and secondary substructures. In this paper, we present a simple formal representation of a helix, which combines sequence and folding constraints, as a constrained regular expression. This representation allows us to develop a well-founded algorithm that searches for all approximate matches of a helix in a genome. The algorithm is based on an alignment graph constructed from several copies of a pushdown automaton, stacked one on top of another. This is a first attempt to exploit pushdown automata in the context of approximate matching. The worst-case time complexity is O(krpn), where k is the error threshold, n the size of the genome, p the size of the secondary expression, and r its number of union symbols. We then extend the algorithm to search for pseudoknots and for secondary structures containing an arbitrary number of helices.
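For contrast with the automaton-based algorithm, the simplest form of helix search is a brute-force scan: try every start position and loop length, pair the stem's 5′ arm against its 3′ arm, and accept the window if at most a given number of positions fail to form Watson-Crick or G-U pairs. This naive sketch handles substitutions only (no insertions or deletions) and is not the pushdown-automaton method of the paper; the function name and parameters are illustrative.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(seq, stem_len, min_loop, max_loop, max_mismatches):
    """Return (start, end, mismatches) for hairpins whose stem of `stem_len` pairs
    closes a loop of min_loop..max_loop nucleotides (end is inclusive)."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    hits = []
    n = len(seq)
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop - 1
            if end >= n:
                break  # longer loops only push `end` further out
            mismatches = 0
            for t in range(stem_len):
                if (seq[start + t], seq[end - t]) not in PAIRS:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
            if mismatches <= max_mismatches:
                hits.append((start, end, mismatches))
    return hits


print(find_helices("GGGAAAACCC", 3, 4, 4, 0))  # → [(0, 9, 0)]
```

Its cost grows with the number of loop lengths times the stem length per position, and it cannot model bulges, which is what an alignment-graph formulation with an error threshold addresses.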
