Similar Documents
20 similar documents found; search took 568 ms
1.
We discuss the statistical significance of local similarities found between DNA sequences, and illustrate the procedure with reference to the Queen and Korn algorithm. If the longest similarity found for two sequences has length L, this length is said to be significant at the 5% level if there is a probability of no more than 0.05 of finding a length of L or greater between a pair of sequences consisting of randomly chosen bases with the same overall base frequencies. The distribution of longest lengths is related to that of lengths from any particular pair of starting positions on the two sequences. For our implementation of the Queen and Korn algorithm, this latter distribution is constructed by combining the five different blocks of bases that may be added to extend a similarity. A table is given to assess the significance of longest similarities in sequences of up to 1000 bases. Quite long similarities are expected to occur by chance alone. The critical values we calculate for assessing significance are preferable to the expected numbers of similarities reported by some commercial software packages.
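As an illustration of the significance criterion above, the following Python sketch estimates by simulation the probability that two random sequences with the same base frequencies share a match of length L or greater. It assumes similarities are exact, ungapped matches; it does not reproduce the Queen and Korn algorithm or the authors' analytical table, and the function names are hypothetical.

```python
import random


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest exact (ungapped, mismatch-free) common substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at the previous a character and b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def random_sequence(length: int, base_freqs: dict, rng: random.Random) -> str:
    """Random sequence whose bases are drawn independently at the given overall frequencies."""
    bases = list(base_freqs)
    return "".join(rng.choices(bases, weights=[base_freqs[b] for b in bases], k=length))


def empirical_p_value(observed: int, len_a: int, len_b: int, base_freqs: dict,
                      trials: int = 200, seed: int = 0) -> float:
    """Fraction of random sequence pairs whose longest exact match is >= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = random_sequence(len_a, base_freqs, rng)
        b = random_sequence(len_b, base_freqs, rng)
        if longest_common_run(a, b) >= observed:
            hits += 1
    return hits / trials
```

Under the criterion above, an observed length L is significant at the 5% level when `empirical_p_value(L, ...)` is at most 0.05. Simulation is slow for 1000-base sequences in pure Python, which is one reason a precomputed table is attractive.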

2.
The review considers original work on the primary structure of biopolymers carried out from 1983 to 2003. Most of this work was supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983-1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchical algorithm for aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of sequences of about $10^6$ nucleotides takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms for aligning protein sequences are described and compared with the Smith-Waterman algorithm, currently the most accurate. The ANCHOR hierarchical algorithm generates alignments of about the same accuracy and is twice as fast as Smith-Waterman. The STRSWer algorithm takes into account the secondary structures of the proteins under study. With secondary structures predicted by the PSI-PRED software for pairs of proteins with 10-30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith-Waterman algorithm.
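For reference, the Smith-Waterman baseline that ANCHOR and STRSWer are compared against can be sketched as a local-alignment dynamic program. This minimal Python version assumes a linear gap penalty and illustrative scores (match 2, mismatch -1, gap -2) rather than the affine gaps and substitution matrices normally used for proteins, and it returns only the best score.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty.

    Runs in O(len(a) * len(b)) time and keeps only two DP rows (O(len(b)) space).
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Recovering the aligned residues, not just the score, would additionally require a traceback.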

3.
Dot-matrix sequence similarity searches can be greatly sped up by using a table that lists all locations of short oligomers in one sequence to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of $L \times M \times N / S^k$, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, where S = 4, a tetranucleotide table (k = 4) gives an efficiency of $L \times M \times N / 4^4 = L \times M \times N / 256$. The simplicity of the approach allows a straightforward calculation of the level of similarity expected to be found for given search parameters. Furthermore, the storage required is minimal, so even large sequences can be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
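The lookup-table idea can be sketched in Python: index every k-mer position of the first sequence once, then read each k-mer of the second sequence and fetch its matching positions directly instead of scanning. For random sequences the expected number of hits is about M·N/S^k, which is where the speed-up comes from. This is an assumed simplification that reports only exact k-mer seed hits and does not implement the L-residue window comparison; the names are hypothetical.

```python
from collections import defaultdict


def kmer_table(seq: str, k: int) -> dict:
    """Map each k-mer to the list of positions where it starts in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def dot_hits(seq1: str, seq2: str, k: int = 4) -> list:
    """All (i, j) with seq1[i:i+k] == seq2[j:j+k], found via a lookup table on seq1."""
    table = kmer_table(seq1, k)
    hits = []
    for j in range(len(seq2) - k + 1):
        # .get() does not trigger the defaultdict factory, so absent k-mers cost nothing.
        for i in table.get(seq2[j:j + k], ()):
            hits.append((i, j))
    return hits
```

Each hit marks a diagonal (i - j) on which a longer similarity could then be examined.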

4.
The review considers original work on the primary structure of biopolymers carried out from 1983 to 2003. Most of this work was supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983–1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchical algorithm for aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of megabase sequences takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms for aligning protein sequences are described and compared with the Smith–Waterman algorithm, currently the most accurate. The ANCHOR hierarchical algorithm generates alignments of about the same accuracy and is twice as fast as Smith–Waterman. The STRSWer algorithm takes into account the secondary structures of the proteins under study. With secondary structures predicted by the PSI-PRED software for pairs of proteins with 10–30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith–Waterman algorithm.

5.
Ogurtsov AIu, Biofizika, 2005, 50(3): 475-479
A protocol for automatic hierarchical alignment of long DNA sequences using the program OWEN is described. The protocol is based on the command-line mode of OWEN and makes it possible to align a large number of pairs of moderately similar sequences automatically. We used this protocol to align 8623 orthologous pairs of intergenic regions in the human and mouse genomes.

6.
MOTIVATION: As a first approximation, similarity between two long orthologous regions of genomes can be represented by a chain of local similarities. Within such a chain, pairs of successive similarities are collinear (non-conflicting), i.e. in both sequences the segments involved in the nth similarity precede the segments involved in the (n+1)th similarity. However, when all similarities between two long sequences are considered, there are usually many conflicts between them. Although some conflicts can be avoided by masking transposons or low-complexity sequences, selecting only those similarities that reflect orthology, and thus belong to the evolutionarily true chain, is not trivial. RESULTS: We propose a simple, hierarchical algorithm for finding the true chain of local similarities. Starting from similarities with low P-values, we resolve each pairwise conflict by deleting the similarity with the higher P-value. This greedy approach constructs a chain of similarities faster than searching for a chain that is optimal with respect to some global criterion, and makes more sense biologically.
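The greedy conflict-resolution rule can be sketched in Python: process similarities in order of increasing P-value and keep each one only if it is collinear with every similarity already kept, which discards the higher-P-value member of each conflict. The `Similarity` record with inclusive coordinates is a hypothetical representation for illustration, not a published data format.

```python
from typing import List, NamedTuple


class Similarity(NamedTuple):
    start1: int   # first position in sequence 1 (inclusive)
    end1: int     # last position in sequence 1 (inclusive)
    start2: int   # first position in sequence 2 (inclusive)
    end2: int     # last position in sequence 2 (inclusive)
    p_value: float


def collinear(a: Similarity, b: Similarity) -> bool:
    """True when one similarity precedes the other in BOTH sequences (no conflict)."""
    return ((a.end1 < b.start1 and a.end2 < b.start2)
            or (b.end1 < a.start1 and b.end2 < a.start2))


def greedy_chain(sims: List[Similarity]) -> List[Similarity]:
    """Keep similarities best-first by P-value, dropping any that conflict with a kept one."""
    kept: List[Similarity] = []
    for s in sorted(sims, key=lambda s: s.p_value):
        if all(collinear(s, k) for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start1)  # report the chain in sequence order
```

This is quadratic in the number of kept similarities; an interval index would be needed for very large inputs.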

7.
Dot-matrix calculation is a widespread tool in the search for sequence similarities. When sequences are distant, even this approach may fail to reveal common regions. If several plots calculated for all members of a sequence set consistently displayed a similarity between them, this would increase its credibility. We present an algorithm to delineate dot-plot agreement. A novel procedure based on matrix multiplication is developed to identify common patterns and reliably aligned regions in a set of distantly related sequences. The algorithm finds motifs independently of input sequence lengths and reduces the dependence on gap penalties. When sequences share greater similarity, the same approach becomes a multiple sequence alignment procedure.
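One way to picture how matrix multiplication can test dot-plot agreement: if the A-B and B-C dot plots are Boolean matrices, their product is nonzero at (i, j) exactly when some window of B matches both window i of A and window j of C, so intersecting it with the A-C plot keeps only dots supported by a third sequence. The Python sketch below is an assumed illustration of that idea using NumPy and a windowed identity threshold (hypothetical parameters `w` and `t`); it is not the published procedure.

```python
import numpy as np


def window_dot_matrix(x: str, y: str, w: int = 3, t: int = 2) -> np.ndarray:
    """Boolean dot plot: (i, j) is True when x[i:i+w] and y[j:j+w] share >= t identical positions."""
    xa = np.frombuffer(x.encode("ascii"), dtype=np.uint8)
    ya = np.frombuffer(y.encode("ascii"), dtype=np.uint8)
    eq = (xa[:, None] == ya[None, :]).astype(int)  # single-residue identity matrix
    n, m = len(x) - w + 1, len(y) - w + 1
    score = np.zeros((n, m), dtype=int)
    for d in range(w):  # sum identities along each diagonal over the window
        score += eq[d:d + n, d:d + m]
    return score >= t


def transitive_agreement(a: str, b: str, c: str, w: int = 3, t: int = 2) -> np.ndarray:
    """Keep A-C dots that are also supported by some path A -> B -> C."""
    d_ab = window_dot_matrix(a, b, w, t).astype(int)
    d_bc = window_dot_matrix(b, c, w, t).astype(int)
    d_ac = window_dot_matrix(a, c, w, t)
    via_b = (d_ab @ d_bc) > 0  # True if some window of B matches both window i of A and j of C
    return d_ac & via_b
```

With a threshold `t < w`, windowed similarity is not transitive, which is what makes the third-sequence check informative.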

8.
A method for comparing amino acid compositions of proteins (Cornish-Bowden, 1977) has been extended to allow proteins of unequal lengths to be compared. The method has been tested by applying it to proteins of known sequence. It tends to exaggerate the amount of difference between unrelated proteins. It is therefore a reliable guide to possible sequence similarities, in that it does not suggest that sequences are similar when they are not, though it sometimes fails to detect genuine similarities. When applied to related proteins the method gives results in good agreement with those predicted. A phylogenetic tree for 37 snake venom toxins has been constructed from their compositions and is similar in most important respects to one constructed from the corresponding sequences.
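Because proteins of unequal length have different total residue counts, a natural first step is to compare mole fractions rather than raw counts. The Python sketch below uses a plain sum of squared differences of fractions as the difference index; this is an assumed stand-in for illustration, not the statistic of Cornish-Bowden (1977) or its extension.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(seq: str) -> list:
    """Mole fraction of each standard amino acid; other letters are ignored."""
    counts = Counter(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_difference(seq_a: str, seq_b: str) -> float:
    """Sum of squared differences between the two composition vectors (0 = identical)."""
    ca, cb = composition(seq_a), composition(seq_b)
    return sum((x - y) ** 2 for x, y in zip(ca, cb))
```

Working with fractions makes a protein and its doubled sequence score as identical, which is the sense in which length differences are factored out.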

9.
In this paper, a bilateral similarity function is designed for analyzing the similarity of biological sequences such as DNA, RNA secondary structures and proteins. The defined function performs a comprehensive comparison between sequences remarkably well, in terms of both the Hamming distance between the two compared sequences and the corresponding location differences. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities shows that the proposed method, with a computational complexity of O(N), is effective for all three kinds of biological sequences and is universally applicable to them.

10.
A set of aligned homologous protein sequences is divided into two groups consisting of the m and n most closely related sequences. The position variability of the homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible $m \cdot n$ pairs of amino acid residues at that position, divided by $m \cdot n$. Plotting the position variability against the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure enclosed between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area $S_r$ for 1000 random homologous protein families. If S exceeds $S_r$ by more than 2 standard deviation units $\sigma_r$, the local variability profile is assumed to contain peaks and hollows corresponding to significantly variable and conserved regions of the sequences. The profile extrema containing the area surplus $\Delta S = S - (S_r + 2\sigma_r)$ are cut off by two straight lines to locate the significant regions. The difference $S - S_r$, expressed in standard deviation units $\sigma_r$, is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, $OI = (S - S_r)/\sigma_r$. The significant conserved and variable regions of six homologous sequence families (phospholipase A2, cytochromes b, α-subunits of Na,K-ATPase, L- and M-subunits of the photosynthetic bacterial reaction centre, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural protein sequences, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, a standard overall substitution irregularity for L = 250 is proposed.
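The definitions above translate fairly directly into code. This Python sketch computes the position variability for two pre-aligned groups, a windowed profile, the discrete area between the profile and its mean, and OI. Assumptions: the 10-position window is treated as a sliding mean, the area is approximated by a sum of absolute deviations, and $S_r$ and $\sigma_r$ must be supplied from separately generated random families, which are not simulated here.

```python
def position_variability(group1: list, group2: list, pos: int) -> float:
    """Mismatching intergroup residue pairs at `pos`, divided by m * n."""
    m, n = len(group1), len(group2)
    mismatches = sum(1 for s in group1 for t in group2 if s[pos] != t[pos])
    return mismatches / (m * n)


def variability_profile(group1: list, group2: list, window: int = 10) -> list:
    """Sliding-window mean of position variability along the alignment (assumed smoothing)."""
    length = len(group1[0])
    v = [position_variability(group1, group2, p) for p in range(length)]
    return [sum(v[i:i + window]) / window for i in range(length - window + 1)]


def area_from_mean(profile: list) -> float:
    """Discrete area S between the profile and its mean line."""
    mean = sum(profile) / len(profile)
    return sum(abs(x - mean) for x in profile)


def overall_irregularity(s: float, s_r: float, sigma_r: float) -> float:
    """OI = (S - S_r) / sigma_r."""
    return (s - s_r) / sigma_r
```

A profile is then treated as containing significant regions when `area_from_mean(profile) > s_r + 2 * sigma_r`.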

11.
We have compiled and aligned the DNA sequences of 554 promoter regions from Escherichia coli and analysed the alignment for sequence similarities. We have focused on the similarities and differences between promoters that either do or do not contain an extended –10 element. The distribution of –10 and –35 hexamer element sequences, the range of spacer lengths between these elements and the frequencies of occurrence of different nucleotides, dinucleotides and trinucleotides were investigated. Extended –10 promoters, which contain a 5′-TG-3′ element, tend to have longer spacer lengths than promoters that do not. They also tend to show fewer matches to the consensus –35 hexamer element and contain short runs of T residues in the spacer region. We have shown experimentally that the extended –10 5′-TG-3′ motif contributes to promoter activity at seven different promoters. The importance of the motif at different promoters is dependent on the sequence of other promoter elements.
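A crude way to flag the extended –10 motif is to locate the hexamer that best matches the –10 consensus TATAAT and check for 5′-TG-3′ just upstream of it, in the TGn arrangement with one spacer base before the hexamer. This Python sketch assumes simple consensus-match counting rather than a position weight matrix, and assumes the input is a short region around the –10 element; the function names are hypothetical.

```python
CONSENSUS_10 = "TATAAT"  # -10 hexamer consensus


def best_minus10(seq: str) -> tuple:
    """Return (start, matches) of the first hexamer with the most matches to TATAAT."""
    best = (0, -1)
    for i in range(len(seq) - 5):
        matches = sum(a == b for a, b in zip(seq[i:i + 6], CONSENSUS_10))
        if matches > best[1]:
            best = (i, matches)
    return best


def has_extended_minus10(seq: str) -> bool:
    """True if 'TG' sits at positions start-3, start-2 (TGn) upstream of the best hexamer."""
    start, _ = best_minus10(seq)
    return start >= 3 and seq[start - 3:start - 1] == "TG"
```

Real promoter scans would also weigh the –35 element and spacer length, which the abstract shows interact with the TG motif.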

12.
Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities/dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and the similarities result from comparing the matrices of the sequences of interest. Three methods of analyzing the resulting difference matrix are considered: a three-dimensional representation giving both local and global information, a numerical characterization by defining an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammalian species, taking human albumin as the reference.
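The chaos game representation itself is easy to sketch. Each base moves the current point halfway toward its corner of the unit square; binning the points on a $2^k \times 2^k$ grid then counts k-letter words, which gives one natural choice of matrix for comparing sequences. The corner assignment, the use of raw counts, and the element-wise difference below are assumptions for illustration in Python, not necessarily the authors' choices.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}  # assumed layout


def cgr_points(seq: str) -> list:
    """CGR trajectory: start at the centre, move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        pts.append((x, y))
    return pts


def cgr_matrix(seq: str, k: int) -> list:
    """Counts of CGR points on a 2^k x 2^k grid; each cell corresponds to one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # Skip the first k-1 points: a cell is fully determined only after k bases.
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


def difference_matrix(seq_a: str, seq_b: str, k: int) -> list:
    """Element-wise difference of the two CGR count matrices."""
    ma, mb = cgr_matrix(seq_a, k), cgr_matrix(seq_b, k)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(ma, mb)]
```

For sequences of different lengths, dividing each matrix by its total before subtracting would be a reasonable variant.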

13.
MOTIVATION: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off efficiency against sensitivity to features without long exact matches. RESULTS: We introduce a new algorithm, LSH-ALL-PAIRS, to find ungapped local alignments in genomic sequence with up to a specified fraction of substitutions. The length and substitution rate of these alignments can be chosen so that they appear frequently in significant similarities yet still remain rare in the background sequence. The algorithm finds ungapped alignments efficiently using a randomized search technique, locality-sensitive hashing. We have found LSH-ALL-PAIRS to be both efficient and sensitive for finding local similarities with as little as 63% identity in mammalian genomic sequences up to tens of megabases in length.
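The locality-sensitive hashing step can be illustrated with generic bit-sampling LSH for Hamming distance: hash each fixed-length window by the bases at a random subset of positions, treat windows that share a bucket as candidates, and verify candidates by counting substitutions. This Python sketch is an assumed illustration of that technique; the parameter names and defaults (`window`, `n_positions`, `n_hashes`, `max_mismatches`) are hypothetical and not taken from LSH-ALL-PAIRS.

```python
import random
from collections import defaultdict


def lsh_candidate_pairs(seq1, seq2, window=32, n_positions=12, n_hashes=8, seed=0):
    """Window pairs (i, j) that agree on all sampled positions in at least one hash round."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(n_hashes):
        positions = sorted(rng.sample(range(window), n_positions))
        buckets = defaultdict(list)
        for i in range(len(seq1) - window + 1):
            buckets["".join(seq1[i + p] for p in positions)].append(i)
        for j in range(len(seq2) - window + 1):
            key = "".join(seq2[j + p] for p in positions)
            for i in buckets.get(key, ()):
                candidates.add((i, j))
    return candidates


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def ungapped_matches(seq1, seq2, window=32, max_mismatches=8, **kw):
    """Verified ungapped window matches with at most max_mismatches substitutions."""
    return sorted((i, j) for i, j in lsh_candidate_pairs(seq1, seq2, window, **kw)
                  if hamming(seq1[i:i + window], seq2[j:j + window]) <= max_mismatches)
```

Windows that differ by few substitutions agree on all sampled positions with reasonable probability, so more hash rounds raise sensitivity at the cost of time; `n_positions=0` degenerates to exhaustive comparison of every window pair.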

14.
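Sox8 gene containing an intron from Paramisgurnus dabryanus (in English)   Total citations: 3; self-citations: 0; citations by others: 3
Fish play a pivotal, transitional role in vertebrate evolution, and their sex determination is primitive, diverse and plastic. In-depth study of sex determination and differentiation in fish is therefore of great theoretical significance. The Sox gene family is a new gene family discovered in the 1990s, and many of its members are directly involved in sex determination and differentiation. The large-scale loach (Paramisgurnus dabryanus) is a common small fish of the family Cobitidae (order Cypriniformes) and one of the few fish species with heteromorphic sex chromosomes. Based on published Sox protein sequences, we designed degenerate primers targeting the segments most highly conserved across species. These primers specifically amplify the HMG box region of Sox genes. With P. dabryanus genomic DNA as the template, the amplification products contained three major bands of 220 bp, 550 bp and 1500 bp, plus a considerably weaker band of 700 bp (Fig. 1). The amplification results were identical for male and female individuals. Cloning and DNA sequence analysis of the 550 bp band yielded a new 500 bp gene fragment encoding 53 amino acids; the remaining 340 bp is probably an intron and conforms to the "GT…AG" rule (Fig. 2). The putative encoded amino acid sequence shares 96%, 94%, 90% and 47% similarity with mouse Sox8, Sox9, Sox10 and SRY, respectively, and 64%, 94%, 58% and 40% similarity with human Sox8, Sox9, Sox10 and SRY, respectively (Fig. 3).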

15.
A new approach to searching for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered the ‘basic’ one, and all its similarities with the other sequences are found. Multiple similarities are then reconstructed from these data. This approach allows one to search for similar segments that can differ by both substitutions and deletions/insertions. These segments can be situated at different positions in the various sequences. No regions of complete or strong similarity within the segments are required, and the other parts of the sequences can have no similarity at all. The only requirement is that the similar segments be found in all the sequences (or in the majority of them, provided the common segments are present in the basic sequence). The running time of the algorithm is proportional to $n \cdot L^2$ when n sequences of length L are analyzed. The algorithm is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures, as well as the dependence of the results on the choice of the basic sequence, are discussed.

16.
Expressed sequence tag (EST) libraries from members of the Penaeidae family and brine shrimp (Artemia franciscana) are currently the primary source of sequence data for shrimp species. Penaeid shrimp are the most commonly farmed worldwide, but selection methods for improving shrimp are limited. A better understanding of shrimp genomics is needed for farmers to use genetic markers to select the best breeding animals. The ESTs from Litopenaeus vannamei have been previously mined for single nucleotide polymorphisms (SNPs). This present study took publicly available ESTs from nine shrimp species, excluding L. vannamei, clustered them with CAP3, predicted SNPs within them using SNPidentifier, and then analyzed whether the SNPs were intra- or interspecies. Major goals of the project were to predict SNPs that may distinguish shrimp species, locate SNPs that may segregate in multiple species, and determine the genetic similarities between L. vannamei and the other shrimp species based on their EST sequences. Overall, 4,597 SNPs were predicted from 4,600 contigs with 703 of them being interspecies SNPs, 735 of them possibly predicting species' differences, and 18 of them appearing to segregate in multiple species. While sequences appear relatively well conserved, SNPs do not appear to be well conserved across shrimp species.

17.
Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histograms of local bitmap patterns of images. Our method shows linear time complexity for the length of DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
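The two best-performing measures are simple to compute once each image has been reduced to a histogram of local bitmap patterns. This Python sketch assumes 2×2 binary patterns (16 bins) and an arbitrary 0/1 image as input; how a DNA sequence is rendered as a binary image is not specified here, so `local_pattern_histogram` is a hypothetical stand-in.

```python
def local_pattern_histogram(image: list) -> list:
    """Counts of each 2x2 binary pattern (encoded as a 4-bit code) in a 0/1 image."""
    rows, cols = len(image), len(image[0])
    hist = [0] * 16
    for r in range(rows - 1):
        for c in range(cols - 1):
            code = ((image[r][c] << 3) | (image[r][c + 1] << 2)
                    | (image[r + 1][c] << 1) | image[r + 1][c + 1])
            hist[code] += 1
    return hist


def normalize(hist: list) -> list:
    """Convert counts to frequencies so images of different sizes are comparable."""
    total = sum(hist)
    return [h / total for h in hist]


def histogram_intersection(h1: list, h2: list) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


def manhattan_distance(h1: list, h2: list) -> float:
    """L1 distance between normalized histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(normalize(h1), normalize(h2)))
```

For normalized histograms, since min(a, b) = (a + b - |a - b|)/2, the intersection equals 1 minus half the Manhattan distance, so the two measures rank sequence pairs identically.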

18.
MOTIVATION: Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. RESULTS: We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. AVAILABILITY: The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
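The time bound proportional to the product of the lengths and the space bound proportional to their sum are the same as for a standard global alignment that keeps only two rows of the dynamic-programming table. The Python sketch below shows that linear-space scoring for the ordinary global model with a linear gap penalty; it does not implement the generalized model of GAP3, which treats long dissimilar regions differently, and the scores are illustrative.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score in O(len(a)*len(b)) time and O(len(b)) space."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b prefix aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)           # column 0: a prefix aligned against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur                                # only two rows are ever kept
    return prev[-1]
```

Keeping two rows yields the score only; recovering the alignment itself in linear space needs a divide-and-conquer scheme such as Hirschberg's.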

19.
P. IRIARTE AND R. J. OWEN. 1996. Forty-seven strains of Campylobacter jejuni were examined by PCR-RFLP analysis of 23S rRNA genes. Seven different molecular profiles were detected by combining HpaII, AluI and DdeI digest analysis. Most (83%) strains, including those with different Penner serotypes and from different hosts, had the same molecular profile. The high level of conservation apparent within the 23S rDNA sequences confirmed their value as targets in species-specific PCR identification assays, but not for subtype discrimination within Camp. jejuni.
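An in silico counterpart of the PCR-RFLP step is a restriction digest that reports fragment lengths. This Python sketch uses the recognition sites HpaII C^CGG, AluI AG^CT and DdeI C^TNAG, cuts the top strand only, and treats the amplicon as a linear sequence; the 23S rDNA amplicon sequences themselves are not given here, and `digest` is a hypothetical helper.

```python
import re

# enzyme -> (recognition site, cut offset from the start of the site on the top strand)
ENZYMES = {
    "HpaII": ("CCGG", 1),   # C^CGG
    "AluI": ("AGCT", 2),    # AG^CT
    "DdeI": ("CTNAG", 1),   # C^TNAG (N = any base)
}


def digest(seq: str, enzyme: str) -> list:
    """Fragment lengths, in order, after cutting a linear sequence at every site."""
    site, offset = ENZYMES[enzyme]
    pattern = site.replace("N", "[ACGT]")
    # A zero-width lookahead finds overlapping sites as well.
    cuts = [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]
```

Combining the fragment-length lists from the three enzymes gives a per-strain profile that can be compared in the same way as gel patterns.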

20.
Recent work has shown that elaborate secondary sexual traits and the corresponding preferences for them may be transmitted culturally rather than by genetic inheritance. Evidence for such cultural transmission commonly invokes spatial patterns of local similarity, with neighbouring individuals or populations appearing similar to each other. Alternative explanations for local similarity include ecological similarity of neighbouring environments and confounding genetic effects caused by aggregations of kin. We found that bowers built by male spotted bowerbirds, Chlamydera maculata, within a single population showed fine-scale similarities between neighbours in the decorations displayed on them. Such similarities did not covary with local decoration availability, local display environment or kinship and could not be explained by stealing behaviour by neighbours. Instead, we suggest that these similarities are products of local tradition, either culturally transmitted by neighbouring males who regularly inspect neighbours' bowers, or as localized responses to variable individual female preferences.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417