Similar literature
 20 similar articles found (search time: 31 ms)
1.
Locally optimal subalignments using nonlinear similarity functions (total citations: 2; self-citations: 0; citations by others: 2)
Nonlinear similarity functions are often better than linear functions at distinguishing interesting subalignments from those due to chance. Nonlinear similarity functions useful for comparing biological sequences are developed. Several new algorithms are presented for finding locally optimal subalignments of two sequences. Unlike previous algorithms, they may use any reasonable similarity function as a selection criterion. Among these algorithms are VV-1, which finds all and only the locally optimal subalignments of two sequences, and CC-1, which finds all and only the weakly locally optimal subalignments of two sequences. The VV-1 algorithm is slow and interesting only for theoretical reasons. In contrast, the CC-1 algorithm has average time complexity O(MN) when used to find only very good subalignments.

2.
Amino acid similarity often needs to be considered in DNA sequence comparison to elucidate gene functions. We propose a Smith-Waterman-like algorithm which considers amino acid similarity and insertions/deletions in sequences at the DNA level and at the protein level in a hybrid manner. The algorithm is applied to cDNA sequences of Oryza sativa and those of Arabidopsis thaliana. The results are compared with the results of application of NCBI's tblastx program (which compares the sequences in the BLAST manner after translation). It is shown that the present algorithm is very helpful in discovering nucleotide insertions/deletions originating from experimental errors as well as amino acid insertions/deletions due to evolutionary reasons.
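
For orientation, the plain Smith-Waterman recurrence that such an algorithm builds on can be sketched as follows. This is a minimal illustration at the nucleotide level only, not the hybrid DNA/protein method of the abstract; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Plain Smith-Waterman with a linear gap penalty (illustrative scores).
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Recovering the alignment itself, rather than only its score, would require keeping the full table and tracing back from the best cell.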

3.
A class of non-linear similarity functions s_1 has been proposed for comparing subalignments of biological sequences. The distribution of maximal s_1-similarities is well approximated by the extreme value distribution. The significance levels of s_1 are studied for a variety of nucleotide frequency distributions as well as for several matrices of amino acid substitution costs. Also, the significance levels of s_1 are explored for comparing three biological sequences. Several previously described subalignments of bovine proenkephalin and porcine prodynorphin are shown to be highly significant.

4.
A local algorithm for DNA sequence alignment with inversions (total citations: 1; self-citations: 0; citations by others: 1)
A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm we restrict candidate inversions to the K highest scoring inversions. An algorithm to find the J best non-intersecting alignments with inversions is also described. The new algorithm is applied to the regions of mitochondrial DNA of Drosophila yakuba and mouse coding for URF6 and cytochrome b and the inversion of the URF6 gene is found. The open problem of intersecting inversions is discussed.
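
The inversion operation described above (replacing a substring by its reversed complement) can be illustrated with a short sketch. The helper names and the half-open index convention are assumptions for this example; the alignment algorithm itself is not reproduced here.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reversed complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Replace seq[start:end] (half-open interval) by its reversed complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


if __name__ == "__main__":
    print(invert_segment("TTAACGTT", 2, 6))
```

Applying `invert_segment` twice to the same interval restores the original sequence, which is a convenient sanity check.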

5.

Background: We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphism (SNP), insertions and deletions, and second, to group them according to sequence similarity, eventually pointing to phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphism such as insertions or deletions and the number and positions of SNPs.

6.
Optimal sequence alignment allowing for long gaps (total citations: 7; self-citations: 0; citations by others: 7)
A new algorithm for optimal sequence alignment allowing for long insertions and deletions is developed. The algorithm requires O((L+C)MN) computational steps, O(LN) primary memory and O(MN) secondary memory storage, where M and N (M≥N) are sequence lengths, L (typically L≤3) is the number of segments specifying the gap weighting function, and C is a constant. We have also modified our earlier traceback algorithm so that it finds all and only the optimal alignments in a compact form of a directed graph. The current versions accept a set of aligned sequences as input, which facilitates multiple sequence alignment by some iterative procedures. Dedicated to Professor Akiyoshi Wada on the occasion of his 60th birthday.
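
A gap weighting function specified by L segments can be pictured as follows. This sketch assumes one common formulation, in which the gap weight is the minimum of L affine functions of the gap length; whether that matches the paper's exact definition is an assumption, and the segment values are invented for illustration.

```python
def gap_weight(k, segments):
    """Weight of a gap of length k under a piecewise-linear function.

    `segments` is a list of (opening, extension) pairs; the weight is the
    minimum over the affine segments, so long gaps are charged at the
    cheaper extension rate of the later segments.
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return min(opening + extension * k for opening, extension in segments)


if __name__ == "__main__":
    # Two illustrative segments (L = 2): steep for short gaps, shallow for long ones.
    segments = [(2, 2.0), (10, 0.5)]
    for k in (1, 5, 6, 10):
        print(k, gap_weight(k, segments))
```

With these values the first segment applies up to length 5 and the second from length 6 onward, so long gaps cost far less than a single steep affine penalty would charge.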

7.
The nucleotide sequences of 280–360-bp domains of lectin genes from 20 legume species belonging to 17 genera have been determined. A computer analysis of the sequences has been performed with the LASERGENE package. Based on this analysis, we constructed the phylogenetic tree of the lectins, which reflects their phylogenetic and evolutionary relationships, and predicted the amino-acid sequences of the corresponding protein domains. Features of the structure of the hydrocarbon-binding lectin domains were elucidated in some species of legume genera from the temperate climatic zone. The domains were highly variable and contained the consensus sequence AspTrePheXxxAsxXxxXxxTrpAspProXxxXxxIns/DelArgHis bearing the bulk of amino acid replacements, insertions, and deletions. An association between legume groups (including species from different genera and tribes) symbiotic with the same rhizobium species and the similarity between the hydrocarbon-binding domains of lectins from these plants was found.

8.
To study the mechanisms for local evolutionary changes in DNA sequences involving slippage-type insertions and deletions, an alignment approach is explored that can consider the posterior probabilities of alignment models. Various patterns of insertion and deletion that can link the ancestor and descendant sequences are proposed and evaluated by simulation and compared by the Markov chain Monte Carlo (MCMC) method. Analyses of pseudogenes reveal that the introduction of the parameters that control the probability of slippage-type events markedly augments the probability of the observed sequence evolution, arguing that a cryptic involvement of slippage occurrences is manifested as insertions and deletions of short nucleotide segments. Strikingly, approximately 80% of insertions in human pseudogenes and approximately 50% of insertions in murid pseudogenes are likely to be caused by the slippage-mediated process, as represented by BC in ABCD --> ABCBCD. We suggest that, in both humans and murids, even very short repetitive motifs, such as CAGCAG, CACACA, and CCCC, have approximately 10- to 15-fold susceptibility to insertions and deletions, compared to nonrepetitive sequences. Our protocol, namely, indel-MCMC, thus seems to be a reasonable approach for statistical analyses of the early phase of microsatellite evolution.

9.
From a human genomic library we have isolated and sequenced a beta-actin-related pseudogene (Hbeta Ac-psi l) which is free of intervening sequences. Several nucleotide insertions and deletions and translational stop codons generated within the protein-coding region indicate that this gene is functionless.

10.
The H1° gene has a long 3′ untranslated region (3′UTR) of 1,125 nucleotides in the rat and 1,310 in humans. Analysis of the sequences shows that they have features of simple DNA that suggest involvement of replication slippage in their evolution. These features include the length imbalance between the rat and human sequences; the abundance of single-base repeats, two-base runs and other simple motifs clustered along the sequence; and the presence of single-base repeat length polymorphisms in the rat and mouse sequences. Pairwise comparisons show numerous short insertions/deletions, often flanked by direct repeats. In addition, a proportion of short insertions/deletions results from length differences in conserved single-base repeats. Quantification of the sequence simplicity shows that simple sequences have been more actively incorporated in the human lineage than in the rodent lineage. The combination of insertions/deletions and nucleotide substitutions along the sequence gives rise to three main regions of homology: a highly variable central region flanked by more conserved regions nearest the coding region and the polyA addition site. Correspondence to: P. Suau

11.
Cytochromes c2 are the nearest bacterial homologs of mitochondrial cytochrome c. The sequences of the known cytochromes c2 can be placed in two subfamilies based upon insertions and deletions: one subfamily is most like mitochondrial cytochrome c (the small C2s, without significant insertions and deletions), and the other, designated large C2, shares 3- and 8-residue insertions as well as a single-residue deletion. C2s generally function between cytochrome bc1 and cytochrome oxidase in respiration (ca 80 examples known to date) and between cytochrome bc1 and the reaction center in nonsulfur purple bacterial photosynthesis (ca 21 examples). However, members of the large C2 subfamily are almost always involved in photosynthesis (12 of 14 examples). In addition, the gene for the large C2 (cycA) is associated with those for the photosynthetic reaction center (pufBALM). We hypothesize that the insertions in the large C2s, which were already functioning in photosynthesis, allowed them to replace the membrane-bound tetraheme cytochrome, PufC, that otherwise mediates between the small C2 or other redox proteins and photosynthetic reaction centers. Based upon our analysis, we propose that the involvement of C2 in nonsulfur purple bacterial photosynthesis was a metabolic feature subsequent to the evolution of oxygen respiration.

12.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
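
The simplest form of the average permutation distance can be sketched as below. Two assumptions are made loudly: the distance is taken to be plain edit (Levenshtein) distance, and the permutations are ordinary shuffles that preserve only base composition. The dinucleotide- and codon-preserving shuffles via random Eulerian walks, which are the actual subject of the abstract, are not implemented here; function names and defaults are illustrative.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def average_permutation_distance(a, b, trials=200, seed=0):
    """Mean edit distance between independently shuffled copies of a and b.

    Plain shuffling preserves base composition only, not dinucleotide usage.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pa, pb = list(a), list(b)
        rng.shuffle(pa)
        rng.shuffle(pb)
        total += edit_distance(pa, pb)
    return total / trials


if __name__ == "__main__":
    x, y = "ACGTACGTACGTAAAA", "ACGTACGTACGTTTTT"
    print(edit_distance(x, y), average_permutation_distance(x, y))
```

If the observed distance is much smaller than the shuffled average, nucleotide order (and not composition alone) contributes to the similarity.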

13.
Insertions and deletions are responsible for gaps in aligned nucleotide sequences, but they have usually been ignored when the number of nucleotide substitutions was estimated. We compared six sets of nuclear and mitochondrial noncoding DNA sequences of primates and obtained estimates of the evolutionary rate of insertion and deletion. The maximum-parsimony principle was applied to locate insertions and deletions on a given phylogenetic tree. Deletions were about twice as frequent as insertions for nuclear DNA, and single-nucleotide insertions and deletions were the most frequent of all events. The rate of insertion and deletion was found to be rather constant among branches of the phylogenetic tree, and the rate (approximately 2.0/kb/Myr) for mitochondrial DNA was found to be much higher than that (approximately 0.2/kb/Myr) for nuclear DNA. The rates of nucleotide substitution were about 10 times higher than the rate of insertion and deletion for both nuclear and mitochondrial DNA.

14.
The genes for testis-specific protein Y (TSPY) were sequenced from chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), and baboon (Papio hamadryas). The sequences were compared with each other and with the published human sequence. Substitutions were detected at 144 of the 755 nucleotide positions compared. Across the five sequences, one deletion in human, four successive nucleotide insertions in orangutan, and seven deletions/insertions in the baboon sequence were noted. The present sequences differed from that of human by 1.9% (chimpanzee), 4.0% (gorilla), 8.2% (orangutan), and 16.8% (baboon), respectively. The phylogenetic tree constructed by the neighbor-joining method suggests that human and chimpanzee are more closely related to each other than either of them is to gorilla, and this result is also supported by maximum likelihood and strict consensus maximum parsimony trees. The numbers of nucleotide substitutions per site between human and chimpanzee, gorilla, and orangutan for the TSPY intron were 0.024, 0.048, and 0.094, respectively. The rates of nucleotide substitutions per site per year were higher in the TSPY intron than in the TSPY exon, and higher in the TSPY intron than in the ZFY (Zinc Finger Y) intron in human and apes. © 1996 Wiley-Liss, Inc.

15.
Although oligonucleotide probes complementary to single nucleotide substitutions are commonly used in microarray-based screens for genetic variation, little is known about the hybridization properties of probes complementary to small insertions and deletions. It is necessary to define the hybridization properties of these latter probes in order to improve the specificity and sensitivity of oligonucleotide microarray-based mutational analysis of disease-related genes. Here, we compare and contrast the hybridization properties of oligonucleotide microarrays consisting of 25mer probes complementary to all possible single nucleotide substitutions and insertions, and one and two base deletions in the 9168 bp coding region of the ATM (ataxia telangiectasia mutated) gene. Over 68 different dye-labeled single-stranded nucleic acid targets representing all ATM coding exons were applied to these microarrays. We assess hybridization specificity by comparing the relative hybridization signals from probes perfectly matched to ATM sequences to those containing mismatches. Probes complementary to two base substitutions displayed the highest average specificity followed by those complementary to single base substitutions, single base deletions and single base insertions. In all cases, hybridization specificity was strongly influenced by sequence context and possible intra- and intermolecular probe and/or target structure. Furthermore, single nucleotide substitution probes displayed the most consistent hybridization specificity data followed by single base deletions, two base deletions and single nucleotide insertions. Overall, these studies provide valuable empirical data that can be used to more accurately model the hybridization properties of insertion and deletion probes and improve the design and interpretation of oligonucleotide microarray-based resequencing and mutational analysis.

16.
We have compared the partial nucleotide and derived amino acid sequences of a phaseolin seed storage protein gene of Phaseolus vulgaris (1) and a conglycinin storage protein gene of Glycine max (2). Although these proteins are not antigenically related to one another, the architecture of the genes is similar throughout the sequences compared here. Intervening sequences interrupt the same amino acid positions in both genes. Within the 28% of the G. max gene and the 38% of the P. vulgaris gene represented in this comparison, 73% of the nucleotides in the coding and intervening sequences are identical, excluding the insertions and deletions. The nucleotide mismatches found in the coding sequences are distributed throughout the three codon positions with little bias towards the third codon position. In addition to the single nucleotide differences, six insertions or deletions, ranging from three to twenty-seven nucleotides in length, occur in this portion of the coding region and these are partially responsible for the molecular weight differences of the conglycinin α′-subunit and the phaseolin subunit.

17.
Aita T, Husimi Y, Nishigaki K. Bio Systems, 2011, 106(2-3): 67-75
To measure the similarity or dissimilarity between two given biological sequences, several papers proposed metrics based on the "word-composition vector". The essence of these metrics is as follows. First, we count the appearance frequencies of all the K-tuple words throughout each of two given sequences. Then, the two given sequences are transformed into their respective word-composition vectors. Next, the distance metrics, for example the angle between the two vectors, are calculated. A significant issue is to determine the optimal word size K. With a mathematical model of mutational events (including substitutions, insertions, deletions and duplications) that occur in sequences, we analyzed how the angle between the composition vectors depends on the mutational events. We also considered the optimal word size (=resolution) from our original approach. Our results were verified by computational experiments using artificially generated sequences, amino acid sequences of hemoglobin and nucleotide sequences of 16S ribosomal RNA.
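
The three steps described above (count K-tuple words, form composition vectors, take the angle) can be sketched as follows. This is a minimal illustration using raw overlapping word counts; the function names are hypothetical, and any normalization or background correction used in the cited papers is not reproduced.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition_angle(a, b, k):
    """Angle in radians between the k-word composition vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(count * cb[word] for word, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one word of length k")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


if __name__ == "__main__":
    print(composition_angle("ACGTACGTAC", "ACGTTCGTAC", 2))
```

An angle near 0 means nearly identical word composition, while an angle of π/2 means the two sequences share no words of length K.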

18.
We studied the occurrence of mammalian interspersed repeats (MIRs) in DNA and RNA of vertebrates, invertebrates, and bacteria using the data from GenBank. A special algorithm based on a weight position matrix with optimal alignment using dynamic programming was developed to search for the traces of MIR dissemination. This allowed us to search for highly divergent MIRs carrying deletions and insertions. MIRs were detected in genomes of various fishes, including Latimeria. This suggests that the origin of MIRs dates back more than 400 million years. The method to search for similarity between highly divergent sequences may be used to find the genome fragments from various ancient repeat families and from various gene families.

19.
The concept of the phase shift of triplet periodicity (TP) was used for searching potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detection of these insertions has been developed. This approach can detect potential insertions and deletions with lengths that are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases). A new similarity measure between triplet matrices was employed to improve the sensitivity for detecting the TP phase shift. Sequences of 17,220 bacterial genes, each consisting of more than 1,200 bases, were analyzed, and the presence of a TP phase shift has been shown in ~16% of analyzed genes (2,809 genes), which is about 4 times more than that detected in our previous work. We propose that shifts of the TP phase may indicate the shifts of reading frame in genes after insertions of the DNA fragments with lengths that are not multiples of three bases. A relationship between the phase shifts of TP and the frame shifts in genes is discussed.
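
A triplet matrix of the kind referred to above can be pictured as a table of base counts per codon position. The sketch below assumes the simplest version, a 4 x 3 count matrix indexed by base and by position modulo 3; the function name is hypothetical, and the paper's similarity measure between such matrices is not reproduced.

```python
def triplet_matrix(seq, alphabet="ACGT"):
    """Count each base at each of the three positions i mod 3.

    Returns a dict mapping base -> [count at phase 0, phase 1, phase 2].
    Characters outside the alphabet are ignored.
    """
    counts = {base: [0, 0, 0] for base in alphabet}
    for i, base in enumerate(seq.upper()):
        if base in counts:
            counts[base][i % 3] += 1
    return counts


if __name__ == "__main__":
    gene = "ATGATGATG"
    print(triplet_matrix(gene))
    # A one-base insertion at the start moves every count one phase to the right.
    print(triplet_matrix("C" + gene))
```

Comparing matrices computed on the two sides of a candidate position, under each of the three cyclic column shifts, is one way to see why an insertion whose length is not a multiple of three shows up as a phase shift.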

20.
In this work we report a simple way to measure the similarity between two nucleotide sequences by using graph theory and information theory. The reported method allows for theoretical comparisons of naturally occurring nucleotide sequences.
