Similar articles
 20 similar articles found (search time: 31 ms)
1.
We study an algorithm which allows sequences of binary numbers (strings) to interact with each other. The simplest system of this kind with a population of 4-bit sequences is considered here. Previously proposed folding methods are used to generate alternative two-dimensional forms of the binary sequences. The interaction of two-dimensional and one-dimensional forms of strings is simulated in a serial computer. The reaction network for the N = 4 system is established. Development of string populations initially generated randomly is observed. Nonlinear rate equations are proposed which provide a model for this simplest system. Dedicated to Professor Hermann Haken on the occasion of his 65th birthday.

2.
Chen PC. Bio Systems, 2005, 81(2): 155-163
This article presents an approach for synthesizing target strings in a class of computational models of DNA recombination. The computational models are formalized as splicing systems in the context of formal languages. Given a splicing system (of a restricted type) and a target string to be synthesized, we construct (i) a rule-embedded splicing automaton that recognizes languages containing strings embedded with symbols representing splicing rules, and (ii) an automaton that implicitly recognizes the target string. By manipulating these two automata, we extract all rule sequences that lead to the production of the target string (if that string belongs to the splicing language). An algorithm for synthesizing a certain type of target string based on such rule sequences is presented.

3.
An oligopurine sequence bias occurs in eukaryotic viruses.   Cited by: 10 (self-citations: 6; citations by others: 4)
Twenty-four DNA and RNA viral nucleotide sequences, comprising over 346 kilobases, have been analyzed for the occurrence of strings of contiguous purine or pyrimidine residues. On average, strings of 10 or more contiguous purines or pyrimidines are found three and a half times more frequently than would be expected for a random distribution of bases. Detailed analysis of the 172 kilobase Epstein-Barr viral sequence shows that the bias in favor of contiguous purine residues increases with the length of the purine string. These findings are similar to those seen for genomic DNA from higher eukaryotes. In contrast, no overrepresentation of oligopurine or oligopyrimidine strings is observed in 52 kilobases from eight bacteriophage and E. coli DNA sequences.
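The run counting described in this abstract can be illustrated with a minimal sketch. It only counts maximal runs of contiguous purines or pyrimidines above a length threshold; the comparison with a random base distribution reported in the abstract is not reproduced. The function name `count_runs` and the default threshold are assumptions made for illustration, not taken from the paper.

```python
import re

PURINES = "AG"        # adenine, guanine
PYRIMIDINES = "CTU"   # cytosine, thymine, uracil (covers DNA and RNA)


def count_runs(seq, min_len=10):
    """Count maximal runs of contiguous purines and of contiguous
    pyrimidines that are at least min_len residues long.

    Hypothetical helper for illustration; not the authors' code.
    """
    seq = seq.upper()
    # The greedy quantifier {min_len,} makes each match a maximal run.
    purine_runs = re.findall(rf"[{PURINES}]{{{min_len},}}", seq)
    pyrimidine_runs = re.findall(rf"[{PYRIMIDINES}]{{{min_len},}}", seq)
    return {"purine": len(purine_runs), "pyrimidine": len(pyrimidine_runs)}


# A 12-residue purine run, a short purine run, and a 10-residue pyrimidine run.
example = "AGAGAGAGAGAG" + "C" + "AGA" + "CTCTCTCTCT"
print(count_runs(example))
```

Only runs that reach the threshold are counted, so the three-residue purine stretch in the example is ignored.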

4.
5.
Messenger RNA sequences possess specific nucleotide patterns distinguishing them from non-coding genomic sequences. In this study, we explore the utilization of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. In order to analyze nucleotide sequences of this length, their information content is first reduced by conversion into shorter binary patterns via the application of numerous abstraction schemes. After the conversion of genomic sequences to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code such as ncRNAs and 5'-untranslated regions.

6.
Many primate populations exhibit forms of organization that are both complex and highly dynamic. A prototype of a general purpose primate population computer modelling system has been developed; this modelling system provides data structures and operators that facilitate computer representation of many static and dynamic features of primate population organization. In this system, primate group structures are represented by text strings known as key strings. A key string begins with a label or key character that identifies its population element type. The label character is followed by data fields contained between bounds marker characters. Nested key strings can be used to concisely represent many of the structural features of social groups in different primate species. Changes in group structures are accomplished by key string insertion, deletion and move operations. Models of structures and processes in island, rhesus monkey and hamadryas baboon populations built with this prototype modelling system are discussed. In these pilot applications, use of key string data structures and operators greatly simplifies many aspects of model construction.

7.
The language of RNA: a formal grammar that includes pseudoknots   Cited by: 9 (self-citations: 0; citations by others: 9)
MOTIVATION: In a previous paper, we presented a polynomial time dynamic programming algorithm for predicting optimal RNA secondary structure including pseudoknots. However, a formal grammatical representation for RNA secondary structure with pseudoknots was still lacking. RESULTS: Here we show a one-to-one correspondence between that algorithm and a formal transformational grammar. This grammar class encompasses the context-free grammars and goes beyond to generate pseudoknotted structures. The pseudoknot grammar avoids the use of general context-sensitive rules by introducing a small number of auxiliary symbols used to reorder the strings generated by an otherwise context-free grammar. This formal representation of the residue correlations in RNA structure is important because it means we can build full probabilistic models of RNA secondary structure, including pseudoknots, and use them to optimally parse sequences in polynomial time.

8.
MOTIVATION: The availability of the whole genomic sequences of HIV-1 viruses provides an excellent resource for studying the HIV-1 phylogenies using all the genetic materials. However, such huge volumes of data create computational challenges in both memory consumption and CPU usage. RESULTS: We propose the complete composition vector representation for an HIV-1 strain, and a string scoring method to extract the nucleotide composition strings that contain the richest evolutionary information for phylogenetic analysis. In this way, a large-scale whole genome phylogenetic analysis for thousands of strains can be done both efficiently and effectively. By using 42 carefully curated strains as references, we apply our method to subtype 1156 HIV-1 strains (10.5 million nucleotides in total), which include 825 pure subtype strains and 331 recombinants. Our results show that our nucleotide composition string selection scheme is computationally efficient, and is able to define both pure subtypes and recombinant forms for HIV-1 strains using the 5000 top ranked nucleotide strings. AVAILABILITY: The Java executable and the HIV-1 datasets are accessible through http://www.cs.ualberta.ca/~ghlin/src/WebTools/hiv.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
Two approaches to the understanding of biological sequences are confronted. While the recognition of particular signals in sequences relies on complex physical interactions, the problem is often analysed in terms of the presence or absence of literal motifs (strings) in the sequence. We present here a test-case for evaluating the potential of this approach. We classify DNA sequences as positive or negative depending on whether they contain a single melted domain in the middle of the sequence, which is a global physical property. Two sets of positive "biological" sequences were generated by a computer simulation of evolutionary divergence along the branches of a phylogenetic tree, under the constraint that each intermediate sequence be positive. These two sets and a set of random positive sequences were subjected to pattern analysis. The observed local patterns were used to construct expert systems to discriminate positive from negative sequences. The experts achieved 79% to 90% success on random positive sequences and up to 99% on the biological sets, while making fewer than 2% errors on negative sequences. Thus, the global constraints imposed on sequences by a physical process may generate local patterns that are sufficient to predict, with a reasonable probability, the behaviour of the sequences. However, rather large sets of biological sequences are required to generate patterns free of illegitimate constraints. Furthermore, depending upon the initial sequence, the sets of sequences generated on a phylogenetic tree may be amenable or refractory to string analysis, while obeying identical physical constraints. Our study clarifies the relationship between experts' errors on positive and negative sequences, and the contributions of legitimate and illegitimate patterns to these errors. The test-case appears suitable both for further investigations of problems in the theory of sequence evolution and for further testing of pattern analysis techniques.

10.
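The X chromosome undergoes X-chromosome inactivation, but about 30% of the genes on Xp escape inactivation, compared with fewer than 3% on Xq. To investigate the molecular mechanisms by which X-linked gene inactivation and escape from inactivation arise and are maintained, we compared the simulated RNA-binding strength of Xq and Xp DNA sequences. The nucleotide sequence of the X chromosome was divided into 50-kb segments, and each segment was analyzed for 7-nucleotide (7-nt) strings (4^7 = 16,384 possible combinations), recording the frequency of each 7-nt string in each 50-kb segment. We selected 120 genes highly expressed in germinal center B cells and computed the frequencies of 7-nt strings in their introns; this profile, termed intron 7nt, was used to represent the RNAs (the RNA population, simulating the totality of small RNA fragments in the cell). Given the 7-nt frequencies of a DNA segment and intron 7nt, the binding strength between that segment and intron 7nt can be calculated. The binding strength of each 50-kb segment depends on the frequency of strings in the segment that are complementary to intron 7nt: the more complementary sequences, the greater the binding strength. This simulated binding strength is called the RNA-binding strength and is intended to model the total amount of small RNA fragments that the DNA segment can bind. Strings of 7 nt were chosen because seven consecutive complementary nucleotides can form a relatively stable duplex. We found that (1) the mean RNA-binding strength of DNA segments on Xp is significantly greater than on Xq (P < 0.001); (2) the number of DNA segments with high RNA binding is significantly higher on Xp than on Xq (P < 0.001); and (3) clusters of high-RNA-binding DNA segments are associated with the regions of the X chromosome whose genes escape inactivation. Evidence indicates that RNA can, by altering chromatin ...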
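The windowed 7-nt counting described in this abstract can be sketched as follows. The abstract does not give the exact formula for the binding strength, so `binding_score` below is an assumption: it sums, over all 7-mers in a DNA segment, the segment count multiplied by the count of the reverse-complementary 7-mer in the intron profile. All function names are hypothetical, and intron sequences are assumed to be written in the DNA alphabet (T rather than U).

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def windows(seq, size=50_000):
    """Yield consecutive non-overlapping segments of `size` bases."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def kmer_counts(seq, k=7):
    """Count every overlapping k-mer, skipping k-mers with ambiguous bases."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(kmer for kmer in kmers if set(kmer) <= VALID)


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def binding_score(dna_counts, rna_counts):
    """ASSUMED scoring rule (not stated in the abstract): each 7-mer in the
    DNA segment is weighted by the abundance of its reverse complement in
    the intron-derived profile."""
    return sum(n * rna_counts.get(reverse_complement(kmer), 0)
               for kmer, n in dna_counts.items())


# Toy usage: score each window of a chromosome against an intron profile.
intron_profile = kmer_counts("TTTTTTTTT")          # stands in for "intron 7nt"
chromosome = "AAAAAAAA" + "CCCCCCCC"
scores = [binding_score(kmer_counts(w), intron_profile)
          for w in windows(chromosome, size=8)]
print(scores)
```

With four bases and k = 7 there are 4**7 = 16,384 possible strings, matching the figure in the abstract, so a plain `Counter` per window stays small.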

11.
12.
Efficient methods for multiple sequence alignment with guaranteed error bounds   Cited by: 11 (self-citations: 0; citations by others: 11)
Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationally efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obvious lower bound on the value of the optimal alignment.

13.
We present two parameterized algorithms for the closest string problem. The first runs in O(nL + nd · 17.97^d) time for DNA strings and in O(nL + nd · 61.86^d) time for protein strings, where n is the number of input strings, L is the length of each input string, and d is the given upper bound on the number of mismatches between the center string and each input string. The second runs in O(nL + nd · 13.92^d) time for DNA strings and in O(nL + nd · 47.21^d) time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in O((n - 1)m^2(L + d · 17.97^d · m^[log2(d+1)])) time for DNA strings and in O((n - 1)m^2(L + d · 61.86^d · m^[log2(d+1)])) time for protein strings, where n is the number of input strings, L is the length of the center substring, L - 1 + m is the maximum length of a single input string, and d is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All the algorithms significantly improve on the previous best results. To verify experimentally the theoretical improvements in the time complexity, we implement our algorithm in C and apply the resulting program to the planted (L, d)-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, namely PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our algorithm uses less memory too.
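To make the problem statement concrete, the following minimal sketch defines the closest string condition and solves it by exhaustive search. This is deliberately not the parameterized algorithm from the abstract: it enumerates all |alphabet|^L candidates and is only usable on toy inputs. The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(candidate, strings, d):
    """True if candidate is within d mismatches of every input string."""
    return all(hamming(candidate, s) <= d for s in strings)


def closest_string_bruteforce(strings, d, alphabet="ACGT"):
    """Exhaustive search over all strings of length L (exponential in L).

    Illustrative baseline only; returns the first center string found in
    lexicographic order of `alphabet`, or None if none exists.
    """
    length = len(strings[0])
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        if is_center(candidate, strings, d):
            return candidate
    return None


inputs = ["AAAA", "AATT", "TTAA"]
print(closest_string_bruteforce(inputs, d=2))
print(closest_string_bruteforce(inputs, d=1))
```

In the toy input, "AATT" and "TTAA" differ in four positions, so no center string can exist for d = 1, while d = 2 admits one.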

14.
The key string algorithm (KSA) can be viewed as a robust computational generalization of the restriction enzyme method. KSA enables robust and effective identification and structural analysis of any given genomic sequence, as in the case of the NCBI assembly of the human genome. We have developed a method, using the total frequency distribution of all r-bp key strings as a function of the fragment length l, to determine the exact size of all repeats within the given genomic sequence, both of monomeric and HOR type. Subsequently, for particular fragment lengths equal to each of these repeat sizes we compute the partial frequency distribution of r-bp key strings; the key string with the highest frequency is the dominant key string, optimal for segmentation of a given genomic sequence into repeat units. We illustrate how a wide class of 3-bp key strings leads to a key-string-dependent periodic cell which enables simple identification and consensus length determination of HORs, or of any other highly convergent repeat of monomeric or HOR type, whether tandem or dispersed. We illustrate the application of KSA to HORs in the human genome and determine consensus HORs in the Build 35.1 assembly. In the next step we compute the suprachromosomal family classification and CENP-B box / pJalpha distributions for HORs. In the case of less convergent repeats, for example monomeric alpha satellites (20-40% divergence), we searched for an optimal compact key string using the frequency method and developed the concept of a composite key string (GAAAC--CTTTG) or flexible relaxation (28-bp key string), which provides both monomeric alpha satellites and alpha monomer segmentation of internal HOR structure. This method is also convenient for studying R-strand (direct) / S-strand (reverse complement) alpha monomer alternations. Using KSA we identified 16 alternating regions of R-strand and S-strand monomers in one contig in chromosome 7. Use of the CENP-B box and/or pJalpha motif as key string is suitable both for identification of HORs and monomeric patterns and for studies of the CENP-B box / pJalpha distribution. As an example of the application of KSA to sequences outside of HOR regions, we present our finding of a tandem with a highly convergent 3434-bp long monomer in chromosome 5 (divergence less than 0.3%).
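The restriction-enzyme analogy can be sketched in a few lines: cut the sequence at every occurrence of a key string and tabulate the resulting fragment lengths. This is a simplified illustration under the assumption that a cut is placed at the start of each occurrence; it does not reproduce the frequency-distribution method, the composite key strings, or the HOR analysis described in the abstract. Function names are hypothetical.

```python
from collections import Counter


def key_string_positions(seq, key):
    """Start positions of every (possibly overlapping) occurrence of key."""
    positions = []
    start = seq.find(key)
    while start != -1:
        positions.append(start)
        start = seq.find(key, start + 1)
    return positions


def fragment_lengths(seq, key):
    """Distribution of distances between consecutive key string occurrences,
    analogous to fragment sizes after a restriction digest."""
    pos = key_string_positions(seq, key)
    return Counter(b - a for a, b in zip(pos, pos[1:]))


# Toy tandem repeat with an 8-bp unit; a peak at length 8 is expected.
sequence = "GAAACTTT" * 5
lengths = fragment_lengths(sequence, "GAAAC")
print(lengths.most_common(1))
```

A sharp peak in the fragment-length distribution suggests a repeat unit of that size, which is the intuition behind choosing a dominant key string for segmentation.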

15.
We consider the problem of finding the optimal combination of string patterns, which characterizes a given set of strings that have a numeric attribute value assigned to each string. Pattern combinations are scored based on the correlation between their occurrences in the strings and the numeric attribute values. The aim is to find the combination of patterns which is best with respect to an appropriate scoring function. We present an O(N^2) time algorithm for finding the optimal pair of substring patterns combined with Boolean functions, where N is the total length of the sequences. The algorithm looks for all possible Boolean combinations of the patterns, e.g., patterns of the form p ∧ ¬q, which indicates that the pattern pair is considered to occur in a given string s if p occurs in s and q does not occur in s. An efficient implementation using suffix arrays is presented, and we further show that the algorithm can be adapted to find the best k-pattern Boolean combination in O(N^k) time. The algorithm is applied to mRNA sequence data sets of moderate size combined with their turnover rates for the purpose of finding regulatory elements that cooperate, complement, or compete with each other in enhancing and/or silencing mRNA decay.
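The Boolean combination p ∧ ¬q can be illustrated with a naive sketch that evaluates one pattern pair against a set of strings with numeric attributes. The scoring function here (absolute difference between the mean attribute of matching and non-matching strings) is an assumption chosen for simplicity; the abstract only says that an appropriate scoring function is used. The suffix-array implementation and the O(N^2) search are not reproduced, and all names are hypothetical.

```python
from statistics import mean


def matches_p_and_not_q(s, p, q):
    """The pair (p, q) 'occurs' in s when p is a substring and q is not."""
    return p in s and q not in s


def score_pair(strings, values, p, q):
    """ASSUMED score: gap between the mean attribute value of strings that
    match p AND NOT q and the mean of those that do not."""
    hit = [v for s, v in zip(strings, values) if matches_p_and_not_q(s, p, q)]
    miss = [v for s, v in zip(strings, values) if not matches_p_and_not_q(s, p, q)]
    if not hit or not miss:
        return 0.0  # the pair does not split the data set
    return abs(mean(hit) - mean(miss))


# Toy data: sequences paired with made-up numeric attribute values.
seqs = ["AUUUA", "AUUUAGG", "CCCC"]
attrs = [1.0, 5.0, 5.0]
print(score_pair(seqs, attrs, p="AUUUA", q="GG"))
```

A full search would evaluate this score over all candidate pairs and Boolean functions; the sketch only shows how a single combination partitions the strings.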

16.
Ribonucleic acid (RNA) structures can be viewed as a special kind of string in which characters can bond with each other. The question of aligning two RNA structures has been studied for a while, and there are several successful algorithms based upon different models. In this paper, by adopting the model introduced in Wang and Zhang (19), we propose two algorithms to attack the question of aligning multiple RNA structures. Our methods reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. We also show that the framework of the sequence center star alignment algorithm can be applied to the problem of multiple RNA structure alignment, and that if the triangle inequality is met in the scoring matrix, the approximation ratio of the algorithm remains 2 - 2/n, where n is the total number of structures.

17.
In all eukaryotes, the ribosomal RNA genes are stably inherited redundant elements. In Drosophila melanogaster, the presence of a Ybb(-) chromosome in males, or the maternal presence of the Ribosomal exchange (Rex) element, induces magnification: a heritable increase of rDNA copy number. To date, several alternative classes of mechanisms have been proposed for magnification: in situ replication or extra-chromosomal replication, either of which might act on short or extended strings of rDNA units, or unequal sister chromatid exchange. To eliminate some of these hypotheses, none of which has been clearly proven, we examined molecular-variant composition and compared genetic maps of the rDNA in the bb(2) mutant and in some magnified bb(+) alleles. The genetic markers used are molecular-length variants of IGS sequences and of R1 and R2 mobile elements present in many 28S sequences. Direct comparison of PCR products does not reveal any particularly intensified electrophoretic bands in magnified alleles compared to the nonmagnified bb(2) allele. Hence, the increase of rDNA copy number is diluted among multiple variants. We can therefore reject mechanisms of magnification based on multiple rounds of replication of short strings. Moreover, we find no changes of marker order when pre- and postmagnification maps are compared. Thus, we can further restrict the possible mechanisms to two: replication in situ of an extended string of rDNA units or unequal exchange between sister chromatids.

18.
Nickel-chelating lipid monolayers were used to generate two-dimensional crystals from yeast RNA polymerase I that was histidine-tagged on one of its subunits. The interaction of the enzyme with the spread lipid layers was found to be imidazole dependent, and the formation of two-dimensional crystals required small amounts of imidazole, probably to select the specific interaction of the engineered tag with the nickel. Two distinct preparations of RNA polymerase I tagged on different subunits yielded two different crystal forms, indicating that the position of the tag determines the crystallization process. The orientation of the enzyme in both crystal forms is correlated with the location of the tagged subunits in a three-dimensional model which shows that the tagged subunits are in contact with the lipid layer.

19.
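Transformation and expression of an exogenous RNA interference gene in tobacco   Cited by: 1 (self-citations: 1; citations by others: 0)
Based on the RNA interference mechanism, the TMV replicase gene was chosen as the target gene. Primers were designed against a sequence that is highly homologous among the replicase genes of five TMV strains, the target sequence was obtained by RT-PCR, and an RNA interference binary vector carrying an inverted repeat of the target sequence was constructed. The exogenous gene was transferred into the genome of tobacco cultivar K326 by Agrobacterium tumefaciens-mediated transformation to produce RNA interference transgenic tobacco. The plants were artificially inoculated with the virus to test the effect of the exogenous gene on virus resistance, and the resistance of the transgenic tobacco was analyzed by real-time fluorescent quantitative PCR. The results showed that 67% of the RNA interference transgenic tobacco plants were highly resistant to TMV. Quantitative PCR showed that, in the highly resistant transgenic plants, the mRNA transcripts of the viral replicase gene were degraded to a large extent, confirming the effectiveness of RNA interference for breeding virus-resistant tobacco varieties.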

20.
Suzuki H. Bio Systems, 2003, 69(2-3): 211-221
As an example of the optimization of an evolutionary system design, a string rewriting system is studied. A set of rewriting rules that defines the growth of a string is experimentally optimized in terms of maximizing the 'replicative capacity', that is, the occurrence ratio of self-replicating strings. It is shown that the most optimized rule set allows many strings to self-replicate by using a special character able to copy an original string sequentially. Then, using various different rewriting rule sets, the connectivity between self-replicating strings is studied. A set of 'hyperblobs' covering the self-replicating strings is extracted and their connectivity is studied. The experimental results show that a large replicative capacity assures strong connectivity between self-replicating genotypes, making the system highly evolvable.
