Found 20 similar articles (search time: 0 ms)
1.
James A. Lake 《Journal of molecular evolution》1987,26(1-2):59-73
Summary: Operator metrics are explicitly designed to measure evolutionary distances from nucleic acid sequences when substitution rates differ greatly among the organisms being compared, or when substitutions have been extensive. Unlike lengths calculated by the distance matrix and parsimony methods, in which substitutions in one branch of a tree can alter the measured length of another branch, lengths determined by operator metrics are not affected by substitutions outside the branch. In the method, lengths (operator metrics) corresponding to each of the branches of an unrooted tree are calculated. The metric length of a branch reconstructs the number of (transversion) differences between sequences at a tip and a node (or between nodes) of a tree. The theory is general and is fundamentally independent of differences in substitution rates among the organisms being compared. Mathematically, the independence has been obtained because the metrics are eigenvectors of fundamental equations which describe the evolution of all unrooted trees. Even under conditions when both the distance matrix method and a simple parsimony length method are shown to indicate lengths that are an order of magnitude too large or too small, the operator metrics are accurate. Examples, using data calculated with evolutionary rates and branchings designed to confuse the measurement of branch lengths and to camouflage the topology of the true tree, demonstrate the validity of operator metrics. The method is robust. Operator metric distances are easy to calculate, can be extended to any number of taxa, and provide a statistical estimate of their variances. The utility of the method is demonstrated by using it to analyze the origins and evolution of chloroplasts, mitochondria, and eubacteria.
2.
Design of nucleic acid sequences for DNA computing based on a thermodynamic approach (total citations: 14; self-citations: 0; citations by others: 14)
We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphical user interface-based program written in Java.
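The generate-and-test loop described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it is not the authors' Java program, it omits the melting-temperature constraint, and it uses the Hamming distance (the baseline the abstract compares against) as a stand-in for the ΔGmin test. The names `design_sequences`, `min_dist`, and `max_tries` are hypothetical.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[c] for c in reversed(seq))


def design_sequences(count, length, min_dist, seed=0, max_tries=100_000):
    """Random generate-and-test: propose a candidate, keep it if it passes.

    Assumption: a candidate passes when its Hamming distance to every
    accepted sequence and to that sequence's reverse complement is at
    least `min_dist`. The paper tests ΔGmin instead.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        cand = "".join(rng.choice(BASES) for _ in range(length))
        if all(
            hamming(cand, s) >= min_dist
            and hamming(cand, revcomp(s)) >= min_dist
            for s in accepted
        ):
            accepted.append(cand)
    return accepted
```

A call such as `design_sequences(5, 12, 4)` returns up to five 12-mers that are mutually at least four mismatches apart. Swapping the test for a free-energy calculation would change only the predicate inside the loop.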
3.
4.
Base sequence influences the structure, mechanics, dynamics, and interactions of nucleic acids. However, studying all possible sequences for a given fragment leads to a number of base combinations that increases exponentially with length. We present here a novel methodology based on a multi-copy approach enabling us to determine which base sequence favors a given structural change or interaction via a single energy minimization. This methodology, termed ADAPT, has been implemented starting from the JUMNA molecular mechanics program by adding special nucleotides, "lexides," containing all four bases, whose contribution to the energy of the system is weighted by continuously variable coefficients. We illustrate the application of this approach in the case of double-stranded DNA by determining the optimal sequences satisfying structural (B-Z transition), mechanical (intrinsic curvature), and interaction (ligand-binding) properties.
5.
Estimation of evolutionary distances between nucleotide sequences (total citations: 11; self-citations: 0; citations by others: 11)
Andrey Zharkikh 《Journal of molecular evolution》1994,39(3):315-329
A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. By using matrix algebra theory, the theoretical foundation of Barry and Hartigan's (Stat. Sci. 2:191–210, 1987) and Lanave et al.'s (J. Mol. Evol. 20:86–93, 1984) methods was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al.'s (J. Mol. Evol. 20:86–93, 1984), Gojobori et al.'s (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan's (Stat. Sci. 2:191–210, 1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (Mol. Biol. Evol. 1:269–285, 1984) method is superior to others.
6.
Sampling strategies for distances between DNA sequences (total citations: 2; self-citations: 0; citations by others: 2)
An international effort is now underway to obtain the DNA sequence for the entire human genome (Watson and Jordan, 1989, Genomics 5, 654-656; Barnhart, 1989, Genomics 5, 657-660). This Human Genome Initiative will generate sequence data from several species other than humans, and will result in several copies per species of at least some regions of the genome. Although the project has generated much interest, it is but one aspect of the widespread effort to generate DNA sequence data. Published sequences are collected in common databases, and release 63 of GenBank in March 1990 contained 40,127,752 bases from 33,337 reported sequences (News from GenBank 3; Mountain View, California: Intelligenetics, Inc., 1990). Large though this database is, it is only about 1% of the number of bases in the human genome. Interpretations of data of such magnitude are going to require the collaborative efforts of biometricians and molecular biologists, and an aim of this paper is to show that there is also a role for readers of this journal in the design of surveys of DNA sequences. Discussion here will center on the use of sequence data in evolutionary studies, where some region of DNA is sequenced in several different species. The object is to infer the evolutionary history of that particular region, or of the species themselves. Statistical issues in the very important studies on sequences to locate and characterize regions responsible for human diseases will not be addressed here. We will discuss appropriate ways of measuring distances between DNA sequences and of predicting the sampling properties of the distances. There are procedures for inferring evolutionary histories for a set of elements that depend on a matrix of distances between each pair of elements, and the precision of resulting trees must be influenced by the precision of the distances. 
We will show that account needs to be taken of two sampling processes: the sampling of sequences by the investigator ("statistical sampling"), and the sampling of genetic material involved in the formation of offspring from a parental population ("genetic sampling").
7.
When, in a nucleic acid sequence, the four letters C, G, A, T (or U) are replaced by suitable graphical symbols, some patterns become immediately apparent. Two sets of symbols, constructed for the analysis of either purine/pyrimidine alternations, or of regions of complementarity within a sequence are shown. In addition, another mode of coding is presented, in which the four letters are represented by vectors. The sequence is thus transformed into a planar trajectory. We show, in the case of the gene for human beta hemoglobin, that such a coding enables an easy discrimination between introns and exons.
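The vector coding mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which vector is assigned to which base, so the `STEP` table below is an assumption chosen for illustration (unit steps along the axes), and `trajectory` is a hypothetical name.

```python
# Assumed assignment of unit vectors to bases; the abstract does not give it.
STEP = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}


def trajectory(seq):
    """Turn a nucleotide sequence into a planar path of cumulative steps.

    Returns the list of visited (x, y) points, starting at the origin.
    U is treated as T so that RNA sequences can be used as well.
    """
    x = y = 0
    points = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEP["T" if base == "U" else base]
        x += dx
        y += dy
        points.append((x, y))
    return points
```

With this assignment, `trajectory("ACGT")` returns `[(0, 0), (-1, 0), (-1, -1), (-1, 0), (0, 0)]`. Plotting the points for a long sequence shows compositional drift as the overall direction of the path.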
8.
9.
Recent progress in synthetic and computational chemistry has made it possible to develop certain novel drug candidates. Drug candidates for genetic diseases, such as cancer, may also be designed on the basis of structural information obtained using X-ray analysis and NMR, as well as evidence from biological techniques applied to natural products - DNA (or RNA) complexes and conjugates. The resulting designed drug candidates exhibit promising performance based on the recognition of nucleic acid sequences.
10.
Sébastien Angibaud Guillaume Fertin Irena Rusu Stéphane Vialette 《Journal of computational biology》2007,14(4):379-393
Computing genomic distances between whole genomes is a fundamental problem in comparative genomics. Recent research has resulted in different genomic distance definitions, for example, number of breakpoints, number of common intervals, number of conserved intervals, and Maximum Adjacency Disruption number. Unfortunately, it turns out that, in the presence of duplications, most problems are NP-hard, and hence several heuristics have been recently proposed. However, while it is relatively easy to compare heuristics with one another, until now very little has been known about the absolute accuracy of these heuristics. Therefore, there is a great need for algorithmic approaches that compute exact solutions for these genomic distances. In this paper, we present a novel generic pseudo-boolean approach for computing the exact genomic distance between two whole genomes in the presence of duplications, and put strong emphasis on common intervals under the maximum matching model. Of particular importance, we show three heuristics which provide very good results on a well-known public dataset of gamma-Proteobacteria.
11.
12.
Fast algorithms for analysing sequence data are presented. An algorithm for strict homologies finds all common subsequences of length greater than or equal to 6 in two given sequences. With it, nucleic acid pieces five thousand nucleotides long can be compared in five seconds on a CDC 6600. Secondary structure algorithms generate the N most stable secondary structures of an RNA molecule, taking into account all loop contributions, and the formation of all possible base-pairs in stems, including odd pairs (G.G, C.U, etc.). They allow a typical 100-nucleotide sequence to be analysed in 10 seconds. The homology and secondary structure programs are respectively illustrated with a comparison of two phage genomes, and a discussion of Drosophila melanogaster 5S RNA folding.
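One common way to find all exact shared segments of length 6 or more is to index the fixed-length words of one sequence and extend each hit in the other. The sketch below follows that approach. It is an assumption about how such a search can be done, not a reconstruction of the algorithm in the paper, and `common_segments` is a hypothetical name.

```python
from collections import defaultdict


def common_segments(a, b, min_len=6):
    """Find maximal exact matches of length >= min_len between a and b.

    Returns a list of (start_in_a, start_in_b, length) tuples.
    """
    # Index every word of length min_len in the first sequence.
    index = defaultdict(list)
    for i in range(len(a) - min_len + 1):
        index[a[i:i + min_len]].append(i)

    matches = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], ()):
            # Skip hits that merely continue a match starting further left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right as far as the sequences agree.
            n = min_len
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            matches.append((i, j, n))
    return matches
```

For example, `common_segments("TTACGTACGGAA", "CCACGTACGGTT")` returns `[(2, 2, 8)]`, the shared segment `ACGTACGG`. Indexing one sequence keeps the scan of the other close to linear when repeated words are rare.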
13.
This work studied the relationship of any two nucleotides in genomic sequences, coding sequences and full-length cDNAs. We made the statistical hypothesis that there are no interactions between any two nucleotides in sequences; a hypothetical combination distribution of two nucleotides is therefore considered, and the difference between the hypothetical combination distribution and the actual distribution is used to measure the average interaction between the two nucleotides. As a result, we found that the interactions between any two nucleotides are clearly and closely related with dampable wavelike patterns along the sequences. Based on the results, we venture some hypotheses on several biological topics. Further studies of the wave may provide new clues for gene prediction and genome structure study.
14.
15.
In addition to the sequence homologies and statistical patterns identified among numerous genetic sequences, there are subtler classes of patterns for which most current computer search methods offer very limited utility. This class includes various presumptive eukaryotic regulatory sites. A critique of the often employed consensus and local homology methods suggests the need for new tools. In particular, such new methods should use the positional and structural data now becoming available on exactly what it is that is recognized in the DNA sequence by sequence-specific binding proteins.
16.
We describe a program that efficiently searches sequence databanks for complex patterns where sites are linked by common relations such as identity, complementarity or span. Its algorithm is closer to those of automatic demonstration than to the finite state machines used in fast pattern matching. The repertory of relations can be enriched at will without rewriting the core of the program. The program is written in Pascal-ISO and runs on a microcomputer. Received on September 25, 1986; accepted on April 30, 1987
17.
Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. (total citations: 8; self-citations: 7; citations by others: 8)
A Cornish-Bowden 《Nucleic acids research》1985,13(9):3021-3030
18.
We describe two measures of a nucleic acid sequence, derived from Information Theory, which characterize the constraints toward nonuniform base composition, and the constraints on the ordering of the bases. These two measures distinguish extra-chromosomal coding sequences from all other coding sequences examined. The two measures separate eukaryotic coding sequences into two groups: those with introns and those without introns. We have also found a relationship between the general constraints of a subsequence and its degree of conservation in related genes.
19.
R Staden 《DNA sequence》1991,1(6):369-374
We describe programs that can screen nucleic acid and protein sequences against libraries of motifs and patterns. Such comparisons are likely to play an important role in interpreting the function of sequences determined during large scale sequencing projects. In addition we report programs for converting the Prosite protein motif library into a form that is compatible with our searching programs. The programs work on VAX and SUN computers.
20.
The programs offer the possibility of comparing pairs of homologous sequences in order to find the percentage of homology, the number of identical and deviating nucleotides, the number of transitions and transversions and, derived from these, KNUC-values according to Kimura (1) and the corresponding standard error sigmaK. The sequences can be printed in pairs underneath each other, with homologies indicated by asterisks between the identical nucleotides. Out of a set of homologous sequences stored on a disk, any number of sequences can be compared in pairs in this way, and a matrix containing either the percentage of homology values, the number of deviating nucleotides or the KNUC-values together with the corresponding standard errors can be sent to screen, printer or disk. A program will be available soon which creates a dendrogram representing the similarity between the sequences by use of an average linkage clustering method deduced from this matrix. The programs are written for Apple II computers using UCSD-PASCAL and for Sirius I/Victor 9000 computers using TURBO-PASCAL.
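The quantities listed in this abstract can be computed from a pair of aligned sequences as follows. The sketch assumes that "KNUC according to Kimura" refers to Kimura's two-parameter distance, K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions, with his large-sample standard error. It also assumes gap-free sequences over A, C, G, T. The function name `kimura_distance` is hypothetical; this is not the original Pascal code.

```python
import math

PURINES = {"A", "G"}


def kimura_distance(seq1, seq2):
    """Return (K, standard error, transitions, transversions).

    Assumes two aligned, equal-length sequences containing only A, C, G, T.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq1)
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    p = transitions / n
    q = transversions / n
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for this estimator")
    k = -0.5 * math.log(w1 * math.sqrt(w2))
    a = 1 / w1
    b = 0.5 * (1 / w1 + 1 / w2)
    variance = (a * a * p + b * b * q - (a * p + b * q) ** 2) / n
    return k, math.sqrt(variance), transitions, transversions
```

The percentage of homology described above is then `100 * (n - transitions - transversions) / n`. Applying the function to every pair in a set fills the distance matrix that a clustering step would consume.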