Similar articles
 Found 20 similar articles (search time: 640 ms)
1.
OWEN: aligning long collinear regions of genomes   Cited by: 8 (self-citations: 0, citations by others: 8)
OWEN is an interactive tool for aligning two long DNA sequences that represents similarity between them by a chain of collinear local similarities. OWEN employs several methods for constructing and editing local similarities and for resolving conflicts between them. Alignments of sequences of lengths over 10^6 can often be produced in minutes. OWEN requires memory below 20 L, where L is the sum of lengths of the compared sequences.
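The idea of representing similarity as a chain of collinear local similarities can be illustrated with a small chaining routine. This is only a sketch under stated assumptions, not OWEN's actual algorithm: the `Segment` tuple layout and the function name `best_collinear_chain` are hypothetical, segments are assumed to be ungapped with positive length and score, and a simple quadratic-time dynamic program is used.

```python
from typing import List, Tuple

# Hypothetical layout: (start_in_a, start_in_b, length, score) of one ungapped local similarity.
Segment = Tuple[int, int, int, float]


def best_collinear_chain(segments: List[Segment]) -> List[Segment]:
    """Return the highest-scoring chain of non-overlapping, collinear segments."""
    if not segments:
        return []
    # Sort by start in sequence a; any valid predecessor then appears earlier in the list.
    segs = sorted(segments, key=lambda s: (s[0], s[1]))
    best = [s[3] for s in segs]      # best chain score ending at segment i
    prev = [-1] * len(segs)          # back-pointer to the previous segment in that chain
    for i, (a_i, b_i, _, score_i) in enumerate(segs):
        for j in range(i):
            a_j, b_j, len_j, _ = segs[j]
            # Segment j may precede i only if it ends before i starts in BOTH sequences.
            if a_j + len_j <= a_i and b_j + len_j <= b_i:
                if best[j] + score_i > best[i]:
                    best[i] = best[j] + score_i
                    prev[i] = j
    # Trace back from the best-scoring end point.
    i = max(range(len(segs)), key=lambda k: best[k])
    chain = []
    while i != -1:
        chain.append(segs[i])
        i = prev[i]
    return chain[::-1]
```

A segment that conflicts with the chain (for example, one that lies off the main diagonal) is simply left out; an interactive tool could instead present such conflicts to the user for resolution.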

2.
We discuss the statistical significance of local similarities found between DNA sequences, and illustrate the procedure with reference to the Queen and Korn algorithm. If the longest similarity found for two sequences has length L, this length is said to be significant at the 5% level if there is a probability of no more than 0.05 of finding a length of L or greater between a pair of sequences consisting of randomly chosen bases with the same overall base frequencies. The distribution of longest lengths is related to that of lengths from any particular pair of starting positions on the two sequences. For our implementation of the Queen and Korn algorithm, this latter distribution is constructed by combining the five different blocks of bases that may be added to extend a similarity. A table is given to assess the significance of longest similarities in sequences of length up to 1000 bases. Quite long similarities are expected to occur by chance alone. The critical values we calculate for assessing significance are preferable to expected numbers of similarities used by some commercial computer packages.
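The 5% criterion described in this abstract can be approximated by simulation. The sketch below is a deliberate simplification: it measures the longest exact common run of bases instead of the Queen and Korn similarity, and it estimates the critical length by Monte Carlo sampling rather than from the analytical distribution the authors construct. The function names and the parameters `trials`, `alpha`, and `seed` are assumptions made for illustration.

```python
import random


def longest_common_run(a, b):
    """Length of the longest exact run of matching bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1   # extend the run along the diagonal
                best = max(best, curr[j])
        prev = curr
    return best


def critical_length(n, m, freqs, trials=200, alpha=0.05, seed=0):
    """Smallest L whose empirical chance probability P(longest run >= L) is <= alpha."""
    rng = random.Random(seed)
    bases = list(freqs)
    weights = [freqs[base] for base in bases]
    observed = []
    for _ in range(trials):
        # Random sequences drawn with the requested overall base frequencies.
        a = rng.choices(bases, weights, k=n)
        b = rng.choices(bases, weights, k=m)
        observed.append(longest_common_run(a, b))
    length = 1
    while sum(x >= length for x in observed) / trials > alpha:
        length += 1
    return length
```

For example, `critical_length(1000, 1000, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})` gives an empirical threshold for two 1000-base sequences; an observed run shorter than that threshold would not count as significant at the 5% level under this simplified model.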

3.
We have developed a method of searching for similar spatial arrangements of atoms around a given chemical moiety in proteins that bind a common ligand. The first step in this method is to consider a set of atoms that closely surround a given chemical moiety. Then, to compare the spatial arrangements of such surrounding atoms in different proteins, they are translated and rotated so that the chemical moieties are superposed on each other. Spatial arrangements of surrounding atoms in a pair of proteins are judged to be similar, when there are many corresponding atoms occupying similar spatial positions. Because the method focuses on the arrangements of surrounding atoms, it can detect structural similarities of binding sites in proteins that are dissimilar in their amino acid sequences or in their chain folds. We have applied this method to identify modes of nucleotide base recognition by proteins. An all-against-all comparison of the arrangements of atoms surrounding adenine moieties revealed an unexpected structural similarity between protein kinases, cAMP-dependent protein kinase (cAPK), and casein kinase-1 (CK1), and D-Ala:D-Ala ligase (DD-ligase) at their adenine-binding sites, despite a lack of similarity in their chain folds. The similar local structure consists of a four-residue segment and three sequentially separated residues. In particular the four-residue segments of these enzymes were found to have nearly identical conformations in their backbone parts, which are involved in the recognition of adenine. This common local structure was also found in substrate-free three-dimensional structures of other proteins that are similar to DD-ligase in the chain fold and of other protein kinases. As the proteins with different folds were found to share a common local structure, these proteins seem to constitute a remarkable example of convergent evolution for the same recognition mechanism. Received: 9 December 1996 / Accepted: 7 February 1997

4.
Protein sequence comparison based on the wavelet transform approach   Cited by: 4 (self-citations: 0, citations by others: 4)
A protein's chemical properties, the chain conformation, the function of the protein and its species specificity are determined by the information contained in the amino acid sequence. Proteins of similar functions have at some level sequential identical amino acid sequences. The closer the phylogenetic relationship, the more similar are the sequences. To find the similarities between two or more protein sequences is of great importance for protein sequence analysis. The differences in the amino acid sequences permit the construction of a family tree of evolution. In this work, a comparison method was devised that is capable of analysing a protein sequence 'hierarchically', i.e. it can examine a protein sequence at different spatial resolutions. Based on a wavelet decomposition of protein sequences and a cross-correlation study, a sequence-scale similarity concept is proposed for generating a similarity vector, which renders the comparison of two sequences feasible at different spatial resolutions (scales). This new similarity concept is an expansion of the conventional sequence similarity, which only takes into account the local pairwise amino acid match and ignores the information contained in coarser spatial resolutions.

5.
The review considers the original works on the primary structure of biopolymers, which were carried out from 1983 to 2003. Most works were supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983-1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchic algorithm aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of sequences sized about 10^6 nucleotides takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms aligning protein sequences are described and compared with the Smith-Waterman algorithm, which is now most accurate. The ANCHOR hierarchic algorithm generates alignments of much the same accuracy and is twice as rapid as the Smith-Waterman one. The STRSWer algorithm takes account of the secondary structures of proteins under study. With the secondary structures predicted using the PSI-PRED software for pairs of proteins having 10-30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith-Waterman algorithm.

6.
R Dhar, C J Lai, G Khoury. Cell, 1978, 13(2): 345-358
DNA and RNA sequencing techniques were used to obtain the sequence surrounding the origin of DNA replication for human papovavirus BKV. The structure is characterized by a true palindrome of 17 residues followed by two sets of symmetrical sequences and a stretch of 20 AT residues. Within the two symmetrical sequences is a segment containing a strong purine bias, 23 of 26 nucleotides. These structures are similar, if not identical, to those found in the region of the SV40 replication origin. Within the homologous DNA segments, 60-80% of the BKV and SV40 nucleotides are the same. The remarkable similarity of BKV and SV40 sequences containing the origins of DNA replication would appear to confirm our previous suggestion of an evolutionary relationship between the two genomes. In addition, topological similarities between these sequences suggest the possibility of certain structural requirements for bidirectional replication origins in these superhelical DNAs.

7.
The review considers the original works on the primary structure of biopolymers carried out from 1983 to 2003. Most works were supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983–1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchic algorithm aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of megabase sequences takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms aligning protein sequences are described and compared with the Smith–Waterman algorithm, which is now most accurate. The ANCHOR hierarchic algorithm generates alignments of much the same accuracy and is twice as rapid as the Smith–Waterman one. The STRSWer algorithm takes into account the secondary structures of proteins under study. With the secondary structures predicted using the PSI-PRED software for pairs of proteins having 10–30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith–Waterman algorithm.

8.
A new approach to sequence comparison: normalized sequence alignment   Cited by: 3 (self-citations: 0, citations by others: 3)
The Smith-Waterman algorithm for local sequence alignment is one of the most important techniques in computational molecular biology. This ingenious dynamic programming approach was designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the existing notion of local similarity has a serious flaw: it does not discard poorly conserved intermediate segments. The Smith-Waterman algorithm finds the local alignment with maximal score but it is unable to find local alignment with maximum degree of similarity (e.g. maximal percent of matches). Moreover, there is still no efficient algorithm that answers the following natural question: do two sequences share a (sufficiently long) fragment with more than 70% of similarity? As a result, the local alignment sometimes produces a mosaic of well-conserved fragments artificially connected by poorly-conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction as recently pointed out by Zhang et al. (Bioinformatics, 15, 1012-1019, 1999). In this paper we propose a new sequence comparison algorithm (normalized local alignment) that reports the regions with maximum degree of similarity. The algorithm is based on fractional programming and its running time is O(n^2 log n). In practice, normalized local alignment is only 3-5 times slower than the standard Smith-Waterman algorithm.
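For reference, the standard Smith-Waterman recurrence that this abstract takes as its baseline can be sketched as follows. This is the classical maximal-score algorithm, not the normalized local alignment the paper proposes; the scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b under a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)          # previous row of the dynamic programming table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what discards poorly conserved initial segments.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the recurrence maximizes the total score only, two well-conserved fragments can still be joined through a weak middle section whenever the fragments' scores outweigh the penalty of the section between them. That is the mosaic effect the abstract describes.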

9.
Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histograms of local bitmap patterns of images. Our method shows linear time complexity for the length of DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
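The two measures singled out in this abstract are easy to state for any pair of frequency histograms. The sketch below assumes the histograms are given as equal-length lists of counts; how the authors build histograms from local bitmap patterns is not reproduced here, and the function names are hypothetical.

```python
def normalize(counts):
    """Scale a list of non-negative counts so that the entries sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def manhattan_distance(p, q):
    """Sum of absolute differences between corresponding bins (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def histogram_intersection(p, q):
    """Sum of the bin-wise minima; larger values mean more similar histograms."""
    return sum(min(x, y) for x, y in zip(p, q))
```

When both histograms are normalized to sum to 1, the two measures carry the same information: the intersection equals 1 minus half the Manhattan distance. Whether the paper normalizes its histograms in this way is not stated in the abstract.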

10.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30% and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

11.
Transposable elements derived from the 7SL RNA gene, such as Alu elements in primates, have had remarkable success in several mammalian lineages. The results presented here show a broad spectrum of functions for genomic segments that display sequence composition similarities with the 7SL RNA gene. Using thoroughly documented loci, we report that DNaseI-hypersensitive sites can be singled out in large genomic sequences by an assessment of sequence composition similarities with the 7SL RNA gene. We apply a root word frequency approach to illustrate a distinctive relationship between the sequence of the 7SL RNA gene and several classes of functional genomic features that are not presumed to be of transposable origin. Transposable elements that show noticeable similarities with the 7SL sequence include Alu sequences, as expected, but also long terminal repeats and the 5′-untranslated regions of long interspersed repetitive elements. In sequences masked for repeated elements, we find, when using the 7SL RNA gene as query sequence, distinctive similarities with promoters, exons and distal gene regulatory regions. The latter being the most notoriously difficult to detect, this approach may be useful for finding genomic segments that have regulatory functions and that may have escaped detection by existing methods.

12.
A database search often will find a seemingly strong sequence similarity between two fragments of proteins that are not expected to have an evolutionary or functional relationship. It is tempting to suggest that the two fragments will adopt a similar conformation due to a common pattern of residues that dictate a particular substructure. To investigate the likelihood of such a structural similarity, local sequence similarities between proteins of known conformation were identified by a standard database search algorithm. A sequence similarity was considered significant when the chance probability of obtaining the relatedness score from a scan of the entire database was less than 1%. In this region both true homologies and false homologies are detected. A total of 69 false homologies was located of length between 20 and 262 aligned positions. Many of these alignments had approximately 25% sequence identity and a further 25% of conservative changes. However, the results show that in general these aligned fragments did not have a significant similarity in secondary or tertiary structure. Thus local sequence similarity does not indicate a structural similarity when there is neither an evolutionary nor functional explanation to support this. Accordingly, structure predictions based on finding a local sequence similarity with an evolutionarily unrelated protein of known conformation are unlikely to be valid.

13.
Dynein heavy chains are involved in microtubule-dependent transport processes. While cytoplasmic dyneins are involved in chromosome or vesicle movement, axonemal dyneins are essential for motility of cilia and flagella. Here we report the isolation of dynein heavy chain (DHC)-like sequences in man and mouse. Using polymerase chain reaction and reverse-transcribed human and mouse testis RNA, cDNA fragments encoding the conserved ATP binding region of dynein heavy chains were amplified. We identified 11 different mouse and eight human dynein-like sequences in testis which show high similarity to known dyneins of different species such as rat, sea urchin or green algae. Sequence similarities suggest that two of the mouse clones and one human clone encode putative cytoplasmic dynein heavy chains, whereas the other sequences show higher similarity to axonemal dyneins. Two of nine axonemal dynein isoforms identified in the mouse testis are more closely related to known outer arm dyneins, while seven clones seem to belong to the inner arm dynein group. Of the isolated human isoforms three clones were classified as outer arm and four clones as inner arm dynein heavy chains. Each of the DHC cDNAs corresponds to an individual gene as determined by Southern blot experiments. The alignment of the deduced protein sequences between human (HDHC) and mouse (MDHC) dynein fragments reveals higher similarity between single human and mouse sequences than between two sequences of the same species. Human and mouse cDNA fragments were used to isolate genomic clones. Two of these clones, gHDHC7 and gMDHC7, are homologous genes encoding axonemal inner arm dyneins. While the human clone is assigned to 3p21, the mouse gene maps to chromosome 14.

14.
A new approach to search for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a ‘basic’ one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, given the common segments are present in the basic sequence). Working time of an algorithm presented is proportional to n·L^2 when n sequences of length L are analyzed. The algorithm proposed is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures as well as the dependence of the results on the choice of basic sequence are discussed.

15.
MOTIVATION: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off between efficiency and sensitivity to features without long exact matches. RESULTS: We introduce a new algorithm, LSH-ALL-PAIRS, to find ungapped local alignments in genomic sequence with up to a specified fraction of substitutions. The length and substitution rate of these alignments can be chosen so that they appear frequently in significant similarities yet still remain rare in the background sequence. The algorithm finds ungapped alignments efficiently using a randomized search technique, locality-sensitive hashing. We have found LSH-ALL-PAIRS to be both efficient and sensitive for finding local similarities with as little as 63% identity in mammalian genomic sequences up to tens of megabases in length.
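The general principle of locality-sensitive hashing for ungapped matches can be sketched as follows. This is not the LSH-ALL-PAIRS implementation; it is a minimal illustration that assumes fixed-length windows, a hash formed by sampling `k` random positions within the window, and a Hamming-distance check on colliding windows. All names and default parameters are hypothetical.

```python
import random
from collections import defaultdict


def hamming(x, y):
    """Number of positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


def lsh_similar_windows(a, b, window=8, k=4, rounds=10, max_subs=2, seed=0):
    """Return (i, j) start pairs whose windows differ by at most max_subs substitutions.

    Windows are compared only when they agree on k randomly sampled positions,
    so similar windows are likely (not guaranteed) to be found within `rounds` tries.
    """
    rng = random.Random(seed)
    hits = set()
    for _ in range(rounds):
        positions = sorted(rng.sample(range(window), k))
        buckets = defaultdict(list)
        # Hash every window of a on the sampled positions.
        for i in range(len(a) - window + 1):
            key = "".join(a[i + p] for p in positions)
            buckets[key].append(i)
        # Probe with every window of b and verify each collision.
        for j in range(len(b) - window + 1):
            key = "".join(b[j + p] for p in positions)
            for i in buckets.get(key, ()):
                if hamming(a[i:i + window], b[j:j + window]) <= max_subs:
                    hits.add((i, j))
    return hits
```

Identical windows collide in every round, while a window pair with a few substitutions collides only when none of the sampled positions lands on a substitution. Raising `rounds` increases sensitivity at the cost of more hashing work, which is the efficiency and sensitivity trade-off the abstract refers to.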

16.
Recently, several multiple plot similarity indices have been presented that cure some of the problems associated with the approaches for the calculation of compositional similarity for groups of plots by averaging pairwise similarities. These new indices calculate the similarity between more than two plots whilst considering the species composition on all compared plots. The resulting similarity value is true for the whole group of plots considered (called neighborhood in the following). Here, we review the possibilities for multiple plot similarity calculation and additionally explore coefficients that examine multiple plot similarity between a reference plot (named focal plot in the following) and any number of surrounding plots. The latter represent measures of singularity. Further, we establish a framework for applying these two kinds of multiple plot measures to gridded data including an algorithm for testing the significance of calculated values against random expectations. The capability of multiple plot measures for detecting species compositional gradients and local/regional hotspots within this framework is tested. For this purpose, several artificial data sets with known gradients in species composition (random, gradient, central hotspot, hotspot bottom right) are constructed on the basis of a real data set from a Tundra ecosystem in northern Sweden (Abisko). The coefficients that best reflect the positions of the plots on the realized gradients in species composition are considered as performing best with regard to pattern detection. The tested measures of multiple plot similarity and singularity produced considerably different results when applied to one real and 4 artificial data sets. The newly proposed symmetric singularity coefficient has the best overall performance which makes it suitable for local/regional hotspot detection and for incorporating local to regional similarity analyses in reserve selection procedures.

17.
A new development is introduced here in the use of dynamic programming in finding pattern similarities in genetic sequences, as was first done by Needleman and Wunsch (1969). A condition of pattern similarity is defined and an algorithm is given which scans any set of similarities and screens out those which fail to meet the condition. When the set to be scanned contains every pair of segments, one from each of two given sequences of lengths m and n (i.e. every possible location for a pattern similarity), then it completes the scan in a number of computational steps proportional to m·n, leaving those pairs of segments which satisfy the similarity condition. The algorithm is based on the concept of match density, as suggested by Goad and Kanehisa (1982).

18.
The V region sequences of two anti-DNA (A52, D42) and two anti-RNA (D44, D444) autoantibodies, derived from lupus prone NZB/NZW F1 female mice, were determined by mRNA sequencing. The sequences had the following features: 1) there was no clear sequence relationship between anti-DNA and anti-RNA antibodies; 2) there were no major similarities between any of the L chain sequences and each VL gene segment belonged to a different mouse VK subgroup; 3) the H chains of the two anti-RNA antibodies showed closely related sequences of VH gene segments and very similar third complementarity determining regions (CDR3); 4) the H chains of the two anti-DNA antibodies had VH segments belonging to different VH gene families but had a unique and similar combination of D segments and junctional sequences, suggesting a common recognition element for Ag and/or for idiotypic regulation in the H chain CDR3; and 5) the VH gene segment of one anti-DNA antibody (D42) was found to be very similar to the VH gene segment of a CBA mouse hybridoma antibody (6G6) which binds to the environmental Ag phosphocholine. The three-dimensional structure of the Fv-region of the anti-DNA antibody (D42) was modeled by computer and a stretch of poly(dT), ssDNA was docked to a cleft in the antibody combining site, formed by the three H chain CDR and by CDR1 and CDR3 of the L chain. The cleft is characterized by a preponderance of arginine and tyrosine residues, lining both the walls and base of the cleft.

19.
We developed a new method which searches for sequence segments responsible for the recognition of a given chemical structure. These segments are detected as those locally conserved among a sequence to be analyzed (target sequence) and a set of sequences (reference sequences). Reference sequences are the sequences of functionally related proteins whose ligands contain a common chemical substructure in their molecular structures. 'Similarity graphing' cuts target sequences into segments, aligns them pairwise with the reference sequences, calculates the degree of similarity for each alignment, and graphically shows cumulative similarity values along the target sequence. Any locally conserved regions, short or long in length and weak or strong in similarity, are detected at their optimal conditions by adjusting three parameters. The 'enzyme-reaction database' contains chemical structures and their related enzymes. When a chemical substructure is input into the database, sequences of the enzymes related to the input substructure are systematically searched from the NBRF sequence database and output as reference sequences. Examples of analysis using similarity graphing in combination with the enzyme-reaction database showed great potential for the systematic analysis of the relationships between sequences and molecular recognition in protein engineering.

20.
Hybrid transfer RNA genes in phage T4   Cited by: 2 (self-citations: 0, citations by others: 2)
W H McClain, K Foss. Cell, 1984, 38(1): 225-231
We describe the isolation and characterization of two unusual amber suppressor forms of T4 tRNALeu. The sequences of the suppressor tRNAs can be described as hybrids of wild-type tRNALeu and suppressor tRNAGln molecules: the chain lengths and majority of the nucleotide residues corresponded to tRNALeu, but CUA anticodons flanked by 2-14 residues were identical to tRNAGln. The uncertainty as to the exact number of flanking residues correlated with tRNAGln is due to the similarity of the two tRNA sequences in this region. No evidence was found for changes in other T4 tRNAs. We propose that genes for the hybrid tRNAs were produced by mispairing of DNAs at anticodon segments of tRNALeu and tRNAGln with a double crossover flanking those segments.
