首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Motivation: Sequence alignment is the problem of finding theoptimal character-by-character correspondence between two sequences.It can be readily solved in O(n2) time and O(n2) space on aserial machine, or in O(n) time with O(n) space per O(n) processingelements on a parallel machine. Hirschberg's divide-and-conquerapproach for finding the single best path reduces space useby a factor of n while inducing only a small constant slowdownto the serial version. Results: This paper presents a family of methods for computingsequence alignments with reduced memory that are well suitedto serial or parallel implementation. Unlike the divide-and-conquerapproach, they can be used in the forward-backward (Baum-Welch)training of linear hidden Markov models, and they avoid data-dependentrepartitioning, making them easier to parallelize. The algorithmsfeature, for an arbitrary integer L, a factor proportional toL slowdown in exchange for reducing space requirement from O(n2)to O(n). A single best path member of this algorithm familymatches the quadratic time and linear space of the divide-and-conqueralgorithm. Experimentally, the O(n1.5)-space member of the familyis 15–40% faster than the O(n)-space divide-and-conqueralgorithm. Availability: The methods will soon be incorporated in the SAMhidden Markov modeling package http: //www.cse.ucs-c.edu/research/compbio/sam.html. Contact: wzrph{at}cse.ucsc.edu  相似文献   

2.

Background  

We propose a multiple sequence alignment (MSA) algorithm and compare the alignment-quality and execution-time of the proposed algorithm with that of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned.  相似文献   

3.
Fast, optimal alignment of three sequences using linear gap costs   总被引:2,自引:0,他引:2  
Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a fixed cost to each possible point mutation-mismatch, deletion, insertion. These algorithms tend to find optimal alignments that have many small gaps. It is more biologically plausible to have fewer longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(n(k)) time. This is impractical if k>/=3 and n is of any reasonable length. Thus, for this problem there are many heuristics for aligning k sequences, however, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983) which he presented for two sequences and simple costs.  相似文献   

4.
Multiple sequence alignment plays an important role in molecular sequence analysis. An alignment is the arrangement of two (pairwise alignment) or more (multiple alignment) sequences of 'residues' (nucleotides or amino acids) that maximizes the similarities between them. Algorithmically, the problem consists of opening and extending gaps in the sequences to maximize an objective function (measurement of similarity). A simple genetic algorithm was developed and implemented in the software MSA-GA. Genetic algorithms, a class of evolutionary algorithms, are well suited for problems of this nature since residues and gaps are discrete units. An evolutionary algorithm cannot compete in terms of speed with progressive alignment methods but it has the advantage of being able to correct for initially misaligned sequences; which is not possible with the progressive method. This was shown using the BaliBase benchmark, where Clustal-W alignments were used to seed the initial population in MSA-GA, improving outcome. Alignment scoring functions still constitute an open field of research, and it is important to develop methods that simplify the testing of new functions. A general evolutionary framework for testing and implementing different scoring functions was developed. The results show that a simple genetic algorithm is capable of optimizing an alignment without the need of the excessively complex operators used in prior study. The clear distinction between objective function and genetic algorithms used in MSA-GA makes extending and/or replacing objective functions a trivial task.  相似文献   

5.
Sequence alignment by cross-correlation.   总被引:1,自引:0,他引:1  
Many recent advances in biology and medicine have resulted from DNA sequence alignment algorithms and technology. Traditional approaches for the matching of DNA sequences are based either on global alignment schemes or heuristic schemes that seek to approximate global alignment algorithms while providing higher computational efficiency. This report describes an approach using the mathematical operation of cross-correlation to compare sequences. It can be implemented using the fast fourier transform for computational efficiency. The algorithm is summarized and sample applications are given. These include gene sequence alignment in long stretches of genomic DNA, finding sequence similarity in distantly related organisms, demonstrating sequence similarity in the presence of massive (approximately 90%) random point mutations, comparing sequences related by internal rearrangements (tandem repeats) within a gene, and investigating fusion proteins. Application to RNA and protein sequence alignment is also discussed. The method is efficient, sensitive, and robust, being able to find sequence similarities where other alignment algorithms may perform poorly.  相似文献   

6.
MOTIVATION: Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in (gamman2) memory, where gamma is the number of the constraints and n is the maximum of the lengths of sequences. As a result, such a high memory requirement limits the overall programs to align short sequences only. RESULTS: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only O(alphan) space, where alpha is the sum of the lengths of constraints and usually alpha < n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. AVAILABILITY: http://genome.life.nctu.edu.tw/MUSICME.  相似文献   

7.
This article extends the use of dynamic programming algorithms in molecular sequence comparison to the alignment of the α-carbon (Cα-) coordinates of two protein structures in three dimensions. The algorithm is described in detail and is applied to the comparison of α-lactalbumin with both hen egg white lysozyme and T4 lysozyme. In the first case, the structures are similar, while the second comparison is between two distantly related molecules. References are made to the usual sequence alignments. A variety of complementary methods are introduced to display the results. National Research Council of Canada Publication No. 29461.  相似文献   

8.
Sequence analysis is the basis of bioinformatics, while sequence alignment is a fundamental task for sequence analysis. The widely used alignment algorithm, Dynamic Programming, though generating optimal alignment, takes too much time due to its high computation complexity O(N(2)). In order to reduce computation complexity without sacrificing too much accuracy, we have developed a new approach to align two homologous sequences. The new approach presented here, adopting our novel algorithm which combines the methods of probabilistic and combinatorial analysis, reduces the computation complexity to as low as O(N). The computation speed by our program is at least 15 times faster than traditional pairwise alignment algorithms without a loss of much accuracy. We hence named the algorithm Super Pairwise Alignment (SPA). The pairwise alignment execution program based on SPA and the detailed results of the aligned sequences discussed in this article are available upon request.  相似文献   

9.
The constrained multiple sequence alignment problem is to align a set of sequences of maximum length n subject to a given constrained sequence, which arises from some knowledge of the structure of the sequences. This paper presents new algorithms for this problem, which are more efficient in terms of time and space (memory) than the previous algorithms, and with a worst-case guarantee on the quality of the alignment. Saving the space requirement by a quadratic factor is particularly significant as the previous O(n4)-space algorithm has limited application due to its huge memory requirement. Experiments on real data sets confirm that our new algorithms show improvements in both alignment quality and resource requirements.  相似文献   

10.
Pairwise sequence alignments aim to decide whether two sequences are related and, if so, to exhibit their related domains. Recent works have pointed out that a significant number of true homologous sequences are missed when using classical comparison algorithms. This is the case when two homologous sequences share several little blocks of homology, too small to lead to a significant score. On the other hand, classical alignment algorithms, when detecting homologies, may fail to recognize all the significant biological signals. The aim of the paper is to give a solution to these two problems. We propose a new scoring method which tends to increase the score of an alignment when "blocks" are detected. This so-called Block-Scoring algorithm, which makes use of dynamic programming, is worth being used as a complementary tool to classical exact alignments methods. We validate our approach by applying it on a large set of biological data. Finally, we give a limit theorem for the score statistics of the algorithm.  相似文献   

11.

Background  

Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences. Exact costs of optimum alignments are therefore rarely computed. Consequently much effort has been invested in algorithms for alignment that are heuristic, or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations.  相似文献   

12.
High-resolution reflectance spectra in the range of 400–850nm were obtained from Lake Kinneret during a period when densepopulations of the dinoflagellate Peridinium gatunense dominatedthe phytoplankton. Chlorophyll (Chl) concentrations ranged from5.1 to 185 mg m–3 and from 2.4 to 187.5 mg m–3 inthe samples of two independent experiments. The most prominentfeatures of the reflectance spectra were: (i) a wide minimumfrom 400 to 500 nm; (ii) a maximum at 550–570 nm, whichdid not surpass 3% in samples with high Chl concentration (>20mgm–3), indicating a strong absorption by pigments in thegreen range of the spectrum; (iii) a minimum at 676 nm; thiswas {small tilde}1% and was almost insensitive to variationin Chl concentration >10 mg m–3; (iv) a maximum reflectanceshowed near 700 nm; its magnitude and position were highly dependenton chlorophyll concentration. High-spectral-resolution datawere used as a guideline for selection of the most suitablespectral bands for chlorophyll remote sensing. Models were devised,based on the calculation of the integrated area above the baselinefrom 670 to 850 nm and the reflectance maximal height withinthis range. Some algorithms already used m previous studieswere tested and showed a plausible degree of accuracy when appliedto the current data base. However, novel models devised in thisstudy improved substantially the accuracy of Chl estimationby remotely sensed data, by reducing the estimation error from>11 to 6.5 mg m–3 Those models were validated by anindependent data set where Chl concentration ranged over twoorders of magnitude. The use of three relatively narrow spectralbands was sufficient for Chl mapping in Lake Kinneret. Therefore,a relatively simple sensor, measuring only a few bands willbe employed in future applications for Chl monitoring in inlandwaters. Radiometric data were also used to simulate radiancesin the channels of TM Landsat and to find the algorithm forChl assessment. The ratio of channel 4 to channel 3 was usedand enabled Chl estimation with an error of <15mg m–3This algorithm was employed to map Chl in the entire area ofLake Kinneret with 10 gradations.  相似文献   

13.
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. We propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step - semigapped alignment - compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic - restricted insertion alignment - that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in blast. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.  相似文献   

14.
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.  相似文献   

15.
CO2-exchange rates (CER) of the sixth and the flag leaves oftwo spring-wheat varieties, Kolibri and Famos, were comparedusing an open-circuit infrared gas analysing system. Measurementswere repeated every two weeks starting when leaf blades werefully expanded. Single plants were grown in a controlled environmenthaving a photopuiod of 15 h and a day/night temperature of 24/19°C(H), 18/13 °C (M), and 12/7 °C (L) respectively untilapprox. 2 weeks after anthesis and at 18/13 °C until maturity.The photosynthetic photon-flux density (PPFD) at the top ofthe plants was 500 µE m–2 sec–1. During themeasurements PPFD was gradually reduced from 2000 to 0 µEm–2 sec–1 whereas the temperature was maintainedat the respctive growth-temperatures during the light period.The CER of the sixth leaf declined fairly similarly for bothvarieties, except for Kolibri where a faster decline was observedduring the first two weeks after full leaf expansion. The CERof the flag leaf declined more slowly than that of the sixthleaf. With the flag leaf of Famos, the decline was nearly linear,whereas with Kolibri it was very slow during the first few weeksbut rapid as the leaves further senesced. This pattern becamemore pronounced as the growth temperature decreased. The declinein relation to leaf age was much smaller at low PPFD than athigh PPFD during the same period. At full leaf expansion Kolibrireached higher maximum CER than Famos except at H. As the PPFDwas reduced the difference became smaller and at very low PPFDsuch as 50 µE m–2 sec–1 was reversed for thesixth leaf. Under optimum growth conditions maximum values ofCER were greater than 50mg CO2 dm–2h–1 and PPFDfor light saturation was close to 2000 µE m–2 sec–1.A comparison between the actual CER and a fitted curve widelyused, PN=(a+b/l)–1–DR, showed that the goodnessof fit strongly depends on cultivar, treatment and leaf ageas well as on the number and the level of PPFD from which datafor calculations are taken. Triticum aestivum, L., wheat, photosynthesis, photon-flux density, light response, carbon, dioxide exchange  相似文献   

16.

Background  

Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels.  相似文献   

17.
The aim of the work is to develop a common method for estimating the pairwise alignment quality versus the evolutionary distance (degree of homology) between the sequences being compared and versus the type of alignment procedure. 3D alignments or any data on 3D protein structure are not used in the study. Based on the accepted protein sequences evolution model, it is possible to estimate the capability of the concrete alignment algorithm to recover the genuine alignment. In this study a classical Needleman and Wunsch global alignment algorithm has been tested on a set of sequences from the Prefab database. Accuracy and confidence of a global alignment procedure were calculated as dependent on the shares of insertions/deletions and mutations.  相似文献   

18.
A fast and sensitive multiple sequence alignment algorithm   总被引:4,自引:0,他引:4  
A two-step multiple alignment strategy is presented that allowsrapid alignment of a set of homologous sequences and comparisonof pre-aligned groups of sequences. Examples are given demonstratingthe improvement in the quality of alignments when comparingentire groups instead of single sequences. The modular designof computer programs based on this algorithm allows for storageof aligned sequences and successive alignment of any numberof sequences. Received on August 23, 1988; accepted on December 6, 1988  相似文献   

19.
In manganese-retaining PS II membranes, photooxidized iodide-125labels a site close to the Cl- and/or manganese (in S2state)-binding sites in D1 protein, whereas in manganese-depletedPS II membranes it labels a site close to the Z$-binding sitein D1 protein (Ikeuchi et al.(1988) Biochim. Biophys. Acta 932:160–169). Amino acid analysis revealed that monoiodotyrosineis the sole amino acid iodinated, and peptide mapping analysisshowed that the iodination site is located between proline-141and methionine-172 of D1 in both samples. These results implythat the tyrosine residue at 147 and/or 161 of D1 is the targetof iodination irrespective of the presence or absence of manganese.Although both of the two tyrosine residues stay in membrane-spanning-helix based on proposed D1 structure, only the tyrosine-161residue is close to the lumen surface and seems to be the mostlikely candidate for iodination site. It could be also assumedin turn that Cl-, manganese- and Z$-binding sites areclose to this tyrosine-161 residue in D1 protein. (Received February 5, 1988; Accepted March 26, 1988)  相似文献   

20.

Background  

The performance of alignment programs is traditionally tested on sets of protein sequences, of which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone – the similarity range where alignment quality drops drastically – starts at 60 % for RNAs in comparison to 20 % for proteins. In this study we enhance the previous benchmark.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号