Similar literature
 Found 20 similar articles; search time 31 ms
1.
W Saurin  P Marlière 《Biochimie》1985,67(5):517-521
A set of sequences can be defined by their common subsequences, and the length of these is a measure of the overall resemblance of the set. Each subsequence corresponds to a succession of symbols embedded in every sequence, following the same order but not necessarily contiguous. Determining the longest common subsequence (LCS) requires the exhaustive testing of all possible common subsequences, which sum up to about 2^L, if L is the length of the shortest sequence. We present a polynomial algorithm (O(n × L^4), where n is the number of sequences) for generating strings related to the LCS and constructed with the sequence alphabet and an indetermination symbol. Such strings are iteratively improved by deleting indetermination symbols and concomitantly introducing the greatest number of alphabet symbols. Processed accordingly, nucleic acid and protein sequences lead to key-words encompassing the salient positions of homologous chains, which can be used for aligning or classifying them, as well as for finding related sequences in data banks.
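To make the notion of a longest common subsequence concrete, here is a minimal sketch of the standard textbook dynamic program for two sequences. It is not the polynomial multi-sequence algorithm described in the abstract above; the function name `lcs` and the example strings are illustrative assumptions.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP, two sequences only)."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    symbols = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            symbols.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(symbols))


if __name__ == "__main__":
    # Hypothetical example strings; symbols need not be contiguous in either input.
    print(lcs("AGGTAB", "GXTXAYB"))
```

The table costs O(len(a) × len(b)) time and memory for two sequences. Extending the same table to n sequences grows exponentially in n, which is the kind of cost a polynomial heuristic such as the one in the abstract is meant to avoid.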

2.
We study properties of the symbolic sequences extracted from the fractals generated by the arc-fractal system introduced earlier by Huynh and Chew. The sequences consist of only a few symbols yet possess several nontrivial properties. First, using an operator approach, we show that the sequences are not periodic, even though they are constructed from very simple rules. Second, by employing the ϵ-machine approach developed by Crutchfield and Young, we measure the complexity and randomness of the sequences and show that they are indeed complex, i.e. neither periodic nor random, with the value of the complexity measure being significant as compared to the known example of the logistic map at the edge of chaos. The complexity and randomness of the sequences are then discussed in relation to the properties of the associated fractal objects, such as their fractal dimension, symmetry and orientations of the arcs.

3.
Chen PC 《Bio Systems》2005,81(2):155-163
This article presents an approach for synthesizing target strings in a class of computational models of DNA recombination. The computational models are formalized as splicing systems in the context of formal languages. Given a splicing system (of a restricted type) and a target string to be synthesized, we construct (i) a rule-embedded splicing automaton that recognizes languages containing strings embedded with symbols representing splicing rules, and (ii) an automaton that implicitly recognizes the target string. By manipulating these two automata, we extract all rule sequences that lead to the production of the target string (if that string belongs to the splicing language). An algorithm for synthesizing a certain type of target strings based on such rule sequences is presented.

4.
Exemplar longest common subsequence   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we investigate the computational and approximation complexity of the Exemplar Longest Common Subsequence of a set of sequences (ELCS problem), a generalization of the Longest Common Subsequence problem, where the input sequences are over the union of two disjoint sets of symbols, a set of mandatory symbols and a set of optional symbols. We show that different versions of the problem are APX-hard even for instances with two sequences. Moreover, we show that the related problem of determining the existence of a feasible solution of the Exemplar Longest Common Subsequence of two sequences is NP-hard. On the positive side, we first present an efficient algorithm for the ELCS problem over instances of two sequences where each mandatory symbol can appear in total at most three times in the sequences. Furthermore, we present two fixed-parameter algorithms for the ELCS problem over instances of two sequences where the parameter is the number of mandatory symbols.

5.
The language of RNA: a formal grammar that includes pseudoknots   Total citations: 9, self-citations: 0, citations by others: 9
MOTIVATION: In a previous paper, we presented a polynomial time dynamic programming algorithm for predicting optimal RNA secondary structure including pseudoknots. However, a formal grammatical representation for RNA secondary structure with pseudoknots was still lacking. RESULTS: Here we show a one-to-one correspondence between that algorithm and a formal transformational grammar. This grammar class encompasses the context-free grammars and goes beyond to generate pseudoknotted structures. The pseudoknot grammar avoids the use of general context-sensitive rules by introducing a small number of auxiliary symbols used to reorder the strings generated by an otherwise context-free grammar. This formal representation of the residue correlations in RNA structure is important because it means we can build full probabilistic models of RNA secondary structure, including pseudoknots, and use them to optimally parse sequences in polynomial time.

6.
The most commonly accepted secondary structure models for 5S RNA differ for molecules of eubacterial origin, where the four-helix model of Fox and Woese is generally cited, and those of eukaryotic origin, where a fifth helix is assumed to exist. We have carefully aligned all available sequences from eukaryotes, eubacteria, chloroplasts, archaebacteria and plant mitochondria. We could thus derive a unified secondary structure model applicable to all 5S RNA sequences known to date. It contains the five helices already present in the eukaryotic model, extended by additional segments that were not previously assumed to be universally present. One of the helices can be written in two equilibrium forms, which could reflect the existence of a flexible, dynamic structure. For the derivation of the model and the estimation of the free energies we followed a set of rules optimized to predict the tRNA cloverleaf. The stability of the unified model is higher than that of nearly all previously proposed sequence-specific and general models.

7.
The study of protein structure has been driven largely by the careful inspection of experimental data by human experts. However, the rapid determination of protein structures from structural-genomics projects will make it increasingly difficult to analyse (and determine the principles responsible for) the distribution of proteins in fold space by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the increasing emphasis on high-throughput experimental initiatives, machine-learning and other automated methods of analysis will become increasingly important for many biological problems.

8.
F Rodier  J Sallantin 《Biochimie》1985,67(5):533-539
Learning processes are applied to the recognition of protein coding regions in prokaryotes. Non-contradictory, statistical and logical rules are deduced from a set of known examples of coding sequences. These rules make it possible to build characteristic patterns on the mRNA upstream of the initiating codon. The rules are applied successfully to recognize more than 180 coding sequences and to detect and/or eliminate hypothetical reading frames or unknown genes.

9.
Hundreds of thousands of putative quadruplex sequences have been found in the human genome. It is important to understand the rules that govern the stability of these intramolecular structures. In this report, we analysed sequence effects in a 3-base-long central loop, keeping the rest of the quadruplex unchanged. A first series of 36 different sequences were compared; they correspond to the general formula GGGTTTGGGHNHGGGTTTGGG. One clear rule emerged from the comparison of all sequence motifs: the presence of an adenine at the first position of the loop was significantly detrimental to stability. In contrast, adenines have no detrimental effect when present at the second or third position of the loop. Cytosines may either have a stabilizing or destabilizing effect depending on their position. In general, the correlation between the Tm or ΔG° in sodium and potassium was weak. To determine if these sequence effects could be generalized to different quadruplexes, specific loops were tested in different sequence contexts. Analysis of 26 extra sequences confirmed the general destabilizing effect of adenine as the first base of the loop(s). Finally, analysis of some of the sequences by microcalorimetry (DSC) confirmed the differences found between the sequence motifs.
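The count of 36 sequences follows from the IUPAC nucleotide codes in the formula GGGTTTGGGHNHGGGTTTGGG: H stands for A, C or T (not G) and N for any base, so the central loop HNH has 3 × 4 × 3 = 36 expansions. The sketch below enumerates them; the helper name `expand` and the reduced code table are illustrative assumptions, not something taken from the paper.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes needed for this formula.
IUPAC = {
    "A": "A",
    "C": "C",
    "G": "G",
    "T": "T",
    "H": "ACT",   # any base except G
    "N": "ACGT",  # any base
}


def expand(pattern: str) -> list[str]:
    """List every concrete sequence matched by a pattern written with IUPAC codes."""
    choices = [IUPAC[symbol] for symbol in pattern]
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    sequences = expand("GGGTTTGGGHNHGGGTTTGGG")
    print(len(sequences))   # number of distinct loop variants
    print(sequences[:3])    # first few expansions, in A/C/G/T order
```

Because H excludes guanine at the first and third loop positions, none of the generated loops extends a flanking G-tract.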

10.
When two strings of symbols are aligned it is important to know whether the observed number of matches is better than that expected between two independent sequences with the same frequency of symbols. When strings are of different lengths, nulls need to be inserted in order to align the sequences. One approach is to use simple approximations of sampling with replacement. We describe an algorithm for exactly determining the frequencies of given numbers of matches, sampling without replacement. This does not lead to a simple closed-form expression. However, we show examples where sampling with, or without, replacement gives very similar results, and the simple approach may be adequate for all but the smallest cases.
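The contrast between the two sampling models can be illustrated on a toy case. The sketch below is a brute-force enumeration, not the exact algorithm of the abstract: it assumes two strings of equal length (no nulls inserted), fixes the first, and shuffles the second over all orderings to get the without-replacement distribution. The with-replacement figure is the usual expected number of matches from symbol frequencies. All function names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def count_matches(a, b) -> int:
    """Number of positions at which the two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def exact_match_distribution(a: str, b: str) -> dict[int, Fraction]:
    """Without replacement: distribution of matches over all orderings of b.

    Brute force over len(b)! orderings, so only usable for very short strings.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length strings (no nulls)")
    counts = Counter(count_matches(a, ordering) for ordering in permutations(b))
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


def expected_matches_with_replacement(a: str, b: str) -> Fraction:
    """With replacement: n * sum over symbols of freq_a(s) * freq_b(s)."""
    freq_a, freq_b = Counter(a), Counter(b)
    per_position = sum(
        Fraction(freq_a[s], len(a)) * Fraction(freq_b[s], len(b)) for s in freq_a
    )
    return len(a) * per_position


if __name__ == "__main__":
    dist = exact_match_distribution("AAB", "ABB")
    print(dist)
    print(sum(k * p for k, p in dist.items()))
    print(expected_matches_with_replacement("AAB", "ABB"))
```

The two models share the same expected number of matches by linearity of expectation; what differs is the shape of the distribution, which is why the difference matters mainly for very short strings.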

11.
《L'Anthropologie》2023,127(1):103113
The Acheulean of the southern Iberian Peninsula is markedly similar to the north African Acheulean. However, the characteristics of the stone tool assemblages are heterogeneous and represent complex cultural phenomena. From MIS 15, the lithic assemblages in fluvial (Guadiana, Guadalquivir and Guadalete rivers), fluvio-lacustrine (Solana del Zamborino) and karstic (Cueva del Ángel, Bolomor, Cueva Negra del río Quípar, Cueva Horá and Santa Ana) contexts exhibit analogies and technical differences representative of a phenomenon of multiplicity. Contributing to this phenomenon is the perception of technological stasis or conservatism of the Acheulean technocomplex and the different technical responses articulated by hominins to achieve equivalent results. These equivalences generate the uniformity that allows us to recognise typologies of large cutting tools (LCTs) regardless of the lithic materials used or the organisational structures of the operational sequences. These diversified typologies include handaxes, picks, and cleavers, which maintain a consistent presence despite innovations such as the Levallois flaking method. In some cases, the presence of cleavers and spheroids affects the range of represented typologies. Beneath the uniformity of the handaxes lie organisational differences in the operational sequences. The changes and differences in the use of flakes to shape handaxes, the representation of cleavers and diversification of shaped-tool typologies all suggest differential cultural behaviours linked in part to divergent contexts. These aspects indicate that this multiplicity is related to diffusion, adaptation and cultural changes produced at the margins of the conservatism of this technocomplex. Observed changes could indicate inter-group cultural replacements, most of which retained a similar techno-typological diversity to that seen in the north African Acheulean until MIS 5. Cyclical climate change during the Middle Pleistocene affected the Strait of Gibraltar, regulating its function and conditioning the circulation of hominins and affecting cultural interactions between southern Iberian groups.

12.
Sequence similarity is the most common measure currently used to infer homology between proteins. Typically, homologous protein domains show sequence similarity over their entire lengths. Here we identify Asp box motifs, initially found as repeats in sialidases and neuraminidases, in new structural and sequence contexts. These motifs represent significantly similar sequences, localized to beta hairpins within proteins that are otherwise different in sequence and three-dimensional structure. By performing a combined sequence- and structure-based analysis we detect Asp boxes in more than nine protein families, including bacterial ribonucleases, sulfite oxidases, reelin, netrins, some lipoprotein receptors, and a variety of glycosyl hydrolases. Although the function common to each of these proteins, if any, remains unclear, we discuss possible functions of Asp boxes on the basis of previously determined experimental results and discuss different evolutionary scenarios for the origin of Asp-box containing proteins.

13.
The primary structure of a ferredoxin isolated from D. desulfuricans Norway strain, which we called ferredoxin II (Fd II), has been elucidated. This ferredoxin is a dimer constituted of two identical subunits of molecular weight 6000. In ferredoxin II two (4 Fe-4 S) centers are present per subunit instead of one (Fe-S) center as is the case for the other ferredoxins isolated from Desulfovibrio and for Fd I from the same organism. The comparison of amino-acid sequences shows that ferredoxin II presents more homologies with clostridial type ferredoxin than with the ferredoxins from D. gigas and D. africanus.

14.
Through multiple sequence alignment and phylogenetic analysis, the subgrouping of the crustacean hyperglycemic hormone (CHH) family was updated using the most complete, nonredundant sequence data set. All sequences from insects were clustered into a distinct subbranch with characters closer to CHH subfamily I. Several sequences that are controversial in their nomenclature and classification are discussed. The motif configuration of CHHs differs from that of molt-inhibiting hormone or gonad-inhibiting hormone in both N and C termini. These two motifs approach each other in tertiary structure models, and the motif preference reveals the critical roles of these regions in functional specificity. Two types of exon organizations of the CHH family genes were observed. Four-exon Chh genes were found in a wide range of pan-crustacean (crustacean and hexapod) taxa, except for the penaeid species, from which the 3-exon Chh genes were reported. Meanwhile, the 3-exon structure was found in the Mih gene and Moih genes from one brachyuran species. Combining gene scan skill and exon splicing rules found in this study, we define three more novel sequences from two insect genomes. The pattern of the exon-exon junction within the mature peptide segment is preserved in all CHH family members. The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.

15.
16.
17.
18.
19.
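Association rule discovery in protein sequences and its applications   Total citations: 2, self-citations: 0, citations by others: 2
As the machine-learning algorithms used in protein sequence-structure analysis grow more complex, interpreting their results and the discovery process becomes correspondingly complicated, so simple and theoretically sound methods are needed. By introducing an association rule discovery algorithm that is simple in principle, theoretically sound, and yields results of strong practical significance, tens of thousands of patterns were found in protein sequences. Worked examples demonstrate how these patterns can be applied to protein sequence analysis, such as conserved region discovery and secondary structure prediction. Based on these results, a secondary structure rule base and a simple secondary structure prediction algorithm were constructed; experimental results show that about 81% of secondary structures can be predicted by at least one association rule.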

20.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
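For readers unfamiliar with the 30% threshold mentioned above, the following minimal sketch computes percentage sequence identity from a pairwise alignment. It assumes one common convention (identical residues divided by the number of columns in which neither sequence has a gap); other denominators are also used in practice, and this is not the sequence-space distance defined in the abstract. The function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are not counted (assumed convention)
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("ACDE-FG", "ACDQWFG"))
    print(percent_identity("ACDEFGHIKL", "AYYYYYYYYL"))
```

The second call returns an identity below 30%, the range in which the abstract says alignments alone become statistically less reliable for recognizing structural similarity.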
