
1.

Zhang Taotao, Guo Maozu, Zou Quan 《生物信息学》 (Chinese Journal of Bioinformatics) 2008,6(2):65-68

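Sequence alignment is an important task in bioinformatics: it can reveal functional, structural, and evolutionary information in biological sequences. The biological significance of an alignment result depends closely on the scores chosen for matches, mismatches, insertions, and deletions, and on the gap penalty function. This paper introduces a parametric sequence alignment method that treats the optimal alignment as a function of the weights and penalties, so that the effect of parameter choice on the optimal alignment can be obtained systematically. The method is then applied to RNA sequence alignment to analyze how different parameter choices affect the alignment result. Finally, the paper points out applications of parametric sequence alignment algorithms and directions for future development.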
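To make the idea of "the optimal alignment as a function of the penalties" concrete, the following is a minimal sketch, not the method of the paper above. It assumes a plain Needleman-Wunsch global alignment with a per-symbol gap score; the function names `global_score` and `sweep_gap` and the default scores are hypothetical choices for illustration.

```python
def global_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Optimal global alignment score (Needleman-Wunsch), two rows of memory.

    Assumption: every gap symbol contributes the same score `gap`.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # a[i-1] against a gap
                           cur[j - 1] + gap))   # b[j-1] against a gap
        prev = cur
    return prev[-1]


def sweep_gap(a, b, gaps):
    """Optimal score for each candidate gap score, all else held fixed."""
    return {g: global_score(a, b, gap=g) for g in gaps}


if __name__ == "__main__":
    # Two short RNA-like strings; the optimal score changes with the gap score.
    for g, s in sweep_gap("ACGU", "AGU", [-0.5, -1.0, -3.0]).items():
        print(f"gap={g:5.1f}  optimal score={s:5.1f}")
```

Sweeping one parameter by brute force, as here, only samples the optimal score at chosen points; a parametric method instead characterizes how the optimum changes over the whole parameter space.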

2.

Fast, optimal alignment of three sequences using linear gap costs (cited by: 2; self-citations: 0; cited by others: 2)

Powell DR Allison L Dix TI 《Journal of theoretical biology》2000,207(3):325-336

Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a fixed cost to each possible point mutation: mismatch, deletion, or insertion. These algorithms tend to find optimal alignments that have many small gaps. It is more biologically plausible to have fewer, longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(n^k) time. This is impractical if k ≥ 3 and n is of any reasonable length. Thus, there are many heuristics for aligning k sequences; however, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983), which he presented for two sequences and simple costs.
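The difference between simple costs and linear gap costs can be illustrated for two sequences. The sketch below is the standard three-state pairwise dynamic program, not the three-sequence algorithm of the paper above; it assumes a gap of length k costs `gap_open + k * gap_extend`, and the function name and default costs are hypothetical.

```python
INF = float("inf")


def affine_gap_cost(s, t, mismatch=1, gap_open=2, gap_extend=1):
    """Minimum-cost global alignment of s and t under linear gap costs.

    Three tables track how the alignment of s[:i] and t[:j] ends:
      M: s[i-1] aligned with t[j-1]
      X: s[i-1] aligned with a gap
      Y: t[j-1] aligned with a gap
    """
    n, m = len(s), len(t)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend
    start = gap_open + gap_extend  # cost of the first symbol of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            X[i][j] = min(M[i - 1][j] + start, Y[i - 1][j] + start,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = min(M[i][j - 1] + start, X[i][j - 1] + start,
                          Y[i][j - 1] + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    # One gap of length 2 is cheaper than two separate gaps of length 1.
    print(affine_gap_cost("AAAA", "AA"))
    # With gap_open=0 the model reduces to simple per-symbol costs.
    print(affine_gap_cost("AAAA", "AA", gap_open=0))
```

The same three-state idea extends to three sequences, but the table then has one dimension per sequence, which is why the abstract treats the O(n^k) cost as the obstacle.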

3.

Locality and gaps in RNA comparison.

Rolf Backofen Shihyen Chen Danny Hermelin Gad M Landau Mikhail A Roytberg Oren Weimann Kaizhong Zhang 《Journal of computational biology》2007,14(8):1074-1087

Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected molecule. This makes the computation more challenging and usually restricts the comparison to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in O(m^2 n (1 + lg(n/m))) time, while our local algorithms run in O(m^2 n (1 + lg(n/m))) and O(n^2 m) time, respectively, where m

4.

Comparative physical mapping between Oryza sativa (AA genome type) and O. punctata (BB genome type)

Kim H San Miguel P Nelson W Collura K Wissotski M Walling JG Kim JP Jackson SA Soderlund C Wing RA 《Genetics》2007,176(1):379-390

A comparative physical map of the AA genome (Oryza sativa) and the BB genome (O. punctata) was constructed by aligning a physical map of O. punctata, deduced from 63,942 BAC end sequences (BESs) and 34,224 fingerprints, onto the O. sativa genome sequence. The level of conservation of each chromosome between the two species was determined by calculating a ratio of BES alignments. The alignment result suggests more divergence of intergenic and repeat regions in comparison to gene-rich regions. Further, this characteristic enabled localization of heterochromatic and euchromatic regions for each chromosome of both species. The alignment identified 16 locations containing expansions, contractions, inversions, and transpositions. By aligning 40% of the punctata BES on the map, 87% of the punctata FPC map covered 98% of the O. sativa genome sequence. The genome size of O. punctata was estimated to be 8% larger than that of O. sativa with individual chromosome differences of 1.5-16.5%. The sum of expansions and contractions observed in regions >500 kb were similar, suggesting that most of the contractions/expansions contributing to the genome size difference between the two species are small, thus preserving the macro-collinearity between these species, which diverged approximately 2 million years ago.

5.

Multiple RNA structure alignment

Wang Z Zhang K 《Journal of bioinformatics and computational biology》2005,3(3):609-626

Ribonucleic acid (RNA) structures can be viewed as a special kind of string in which characters can bond with each other. The problem of aligning two RNA structures has been studied for some time, and there are several successful algorithms based on different models. In this paper, adopting the model introduced in Wang and Zhang (19), we propose two algorithms for aligning multiple RNA structures. Our methods reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. We also show that the framework of the sequence center star alignment algorithm can be applied to multiple RNA structure alignment and that, if the scoring matrix satisfies the triangle inequality, the approximation ratio of the algorithm remains 2 - 2/n, where n is the total number of structures.
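The first step of the center star framework can be sketched on plain strings. This is an illustration under simplifying assumptions, not the RNA structure method of the paper above: it uses unit-cost edit distance (which satisfies the triangle inequality) in place of a structure-aware score, and the names `edit_distance`, `choose_center`, and `approximation_bound` are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two strings, two rows of memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # substitute or match
                           prev[j] + 1,               # delete from a
                           cur[j - 1] + 1))           # insert into a
        prev = cur
    return prev[-1]


def choose_center(seqs):
    """Index of the sequence with the smallest total distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)


def approximation_bound(n):
    """The 2 - 2/n ratio quoted in the abstract, for n input structures."""
    return 2 - 2 / n


if __name__ == "__main__":
    seqs = ["GAUC", "GAUU", "GACC", "CAUC"]
    c = choose_center(seqs)
    print("center:", seqs[c], "bound:", approximation_bound(len(seqs)))
```

After the center is chosen, the framework aligns every other input to it pairwise and merges those alignments; that merging step is omitted here.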

6.

PCMA: fast and accurate multiple sequence alignment based on profile consistency (cited by: 8; self-citations: 0; cited by others: 8)

Pei J Sadreyev R Grishin NV 《Bioinformatics (Oxford, England)》2003,19(3):427-428

PCMA (profile consistency multiple sequence alignment) is a progressive multiple sequence alignment program that combines two different alignment strategies. Highly similar sequences are aligned in a fast way as in ClustalW, forming pre-aligned groups. The T-Coffee strategy is applied to align the relatively divergent groups based on profile-profile comparison and consistency. The scoring function for local alignments of pre-aligned groups is based on a novel profile-profile comparison method that is a generalization of the PSI-BLAST approach to profile-sequence comparison. PCMA balances speed and accuracy in a flexible way and is suitable for aligning large numbers of sequences. AVAILABILITY: PCMA is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from ftp://iole.swmed.edu/pub/PCMA/.

7.

Learning scoring schemes for sequence alignment from partial examples

Kim E Kececioglu J 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(4):546-556

When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.

8.

Space-efficient whole genome comparisons with Burrows-Wheeler transforms.

Ross A Lippert 《Journal of computational biology》2005,12(4):407-415


9.

A simple algorithm to infer gene duplication and speciation events on a gene tree (cited by: 7; self-citations: 0; cited by others: 7)

Zmasek CM Eddy SR 《Bioinformatics (Oxford, England)》2001,17(9):821-828

MOTIVATION: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree. RESULTS: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees. AVAILABILITY: http://www.genetics.wustl.edu/eddy/forester.
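The comparison of a gene tree against a trusted species tree can be sketched as follows. This is a minimal illustration of the general idea (map each gene-tree node to the lowest common ancestor of its children's species, and call it a duplication when it maps to the same species node as one of its children); it is not the authors' implementation. The tree encodings and the names `lca` and `infer_events` are assumptions made for this example.

```python
def depth_of(node, parent):
    """Number of edges from `node` up to the species-tree root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def lca(u, v, parent):
    """Lowest common ancestor in a species tree given as a child -> parent map."""
    du, dv = depth_of(u, parent), depth_of(v, parent)
    while du > dv:
        u, du = parent[u], du - 1
    while dv > du:
        v, dv = parent[v], dv - 1
    while u != v:
        u, v = parent[u], parent[v]
    return u


def infer_events(gene_tree, parent, events=None):
    """Post-order walk of a binary gene tree.

    A gene tree is a species name (leaf) or a (left, right) tuple.
    Returns (species node this subtree maps to, events in post-order).
    """
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return gene_tree, events
    left, _ = infer_events(gene_tree[0], parent, events)
    right, _ = infer_events(gene_tree[1], parent, events)
    here = lca(left, right, parent)
    events.append("duplication" if here in (left, right) else "speciation")
    return here, events


if __name__ == "__main__":
    # Hypothetical species tree: root -> (primates -> (human, chimp), mouse)
    species = {"root": None, "primates": "root", "mouse": "root",
               "human": "primates", "chimp": "primates"}
    gene = (("human", "mouse"), ("human", "chimp"))
    print(infer_events(gene, species)[1])
```

Climbing parent pointers makes each lowest-common-ancestor query linear in the tree height, which is where a quadratic worst case of the kind the abstract mentions can arise on very unbalanced trees.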

10.

Multiple alignment by aligning alignments

Wheeler TJ Kececioglu JD 《Bioinformatics (Oxford, England)》2007,23(13):i559-i568

MOTIVATION: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what is the best choice of method for each stage to obtain the highest quality alignment. RESULTS: We study six stages in the form-and-polish strategy for multiple alignment: parameter choice, distance estimation, merge-tree construction, sequence-pair weighting, alignment merging, and polishing. For each stage, we consider novel approaches as well as standard ones. Interestingly, the greatest gains in alignment quality come from (i) estimating distances by a new approach using normalized alignment costs, and (ii) polishing by a new approach using 3-cuts. Experiments with a parameter-value oracle suggest large gains in quality may be possible through an input-dependent choice of alignment parameters, and we present a promising approach for building such an oracle. Combining the best approaches to each stage yields a new tool we call Opal that on benchmark alignments matches the quality of the top tools, without employing alignment consistency or hydrophobic gap penalties. AVAILABILITY: Opal, a multiple alignment tool that implements the best methods in our study, is freely available at http://opal.cs.arizona.edu.

11.

A memory-efficient algorithm for multiple sequence alignment with constraints

Lu CL Huang YP 《Bioinformatics (Oxford, England)》2005,21(1):20-30

MOTIVATION: Recently, the concept of constrained sequence alignment was proposed to incorporate biologists' knowledge about the structures/functionalities/consensuses of their datasets into sequence alignment, such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in O(γn^2) memory, where γ is the number of constraints and n is the maximum of the lengths of the sequences. As a result, this high memory requirement limits these programs to aligning short sequences only. RESULTS: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only O(αn) space, where α is the sum of the lengths of the constraints and usually α < n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. AVAILABILITY: http://genome.life.nctu.edu.tw/MUSICME.
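The divide-and-conquer idea behind memory-efficient alignment can be sketched for the unconstrained pairwise case. The code below is a Hirschberg-style linear-space alignment under fixed scores (match +1, mismatch -1, gap -1); it omits constraints entirely and is not the algorithm of the paper above. The names `last_row` and `hirschberg` are hypothetical.

```python
def last_row(a, b):
    """Last row of the global alignment score table, using O(len(b)) memory.

    Fixed scores (an assumption of this sketch): match +1, mismatch -1, gap -1.
    """
    prev = [-j for j in range(len(b) + 1)]
    for ch in a:
        cur = [prev[0] - 1]
        for j, bj in enumerate(b, 1):
            sub = 1 if ch == bj else -1
            cur.append(max(prev[j - 1] + sub, prev[j] - 1, cur[j - 1] - 1))
        prev = cur
    return prev


def hirschberg(a, b):
    """Optimal global alignment of a and b as two gapped strings."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Under these scores it is optimal to pair the single character with
        # a matching character of b if one exists, otherwise with b[0].
        j = max(b.find(a), 0)
        return "-" * j + a + "-" * (len(b) - j - 1), b
    mid = len(a) // 2
    left = last_row(a[:mid], b)                 # prefixes of b vs first half
    right = last_row(a[mid:][::-1], b[::-1])    # suffixes of b vs second half
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    a1, b1 = hirschberg(a[:mid], b[:split])
    a2, b2 = hirschberg(a[mid:], b[split:])
    return a1 + a2, b1 + b2


if __name__ == "__main__":
    x, y = hirschberg("AGTACGCA", "TATGC")
    print(x)
    print(y)
```

Each level of recursion recomputes score rows rather than storing the full table, which trades a constant factor of extra time for memory proportional to the sequence length, the same trade-off the abstract describes.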

12.

Local sequence-structure motifs in RNA

Backofen R Will S 《Journal of bioinformatics and computational biology》2004,2(4):681-698

Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has a worst-case time complexity of O(n^2 · m^2 · max(n, m)) and a space complexity of only O(n · m). An implementation of our algorithm is available at http://www.bio.inf.uni-jena.de. Its runtime is competitive with global sequence-structure alignment.

13.

Comparison of additive trees using circular orders.

V Makarenkov B Leclerc 《Journal of computational biology》2000,7(5):731-744

It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered which is called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise) scanning of the subset X of vertices of a tree drawn on a plane. This paper describes an optimal algorithm using circular orders to compare the topology of two trees given by their distance matrices. This algorithm allows us to compute the Robinson and Foulds topologic distance between two trees. It employs circular order tree reconstruction to compute an ordered bipartition table of the tree edges for both given distance matrices. These bipartition tables are then compared to determine the Robinson and Foulds topologic distance, known to be an important criterion of tree similarity. The described algorithm has optimal time complexity, requiring O(n^2) time when performed on two n x n distance matrices. It can be generalized to get another optimal algorithm, which enables the strict consensus tree of k unrooted trees, given their distance matrices, to be constructed in O(kn^2) time.
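The final comparison step, counting bipartitions that appear in one tree but not the other, can be sketched directly on tree topologies. This example assumes the trees are already given as nested tuples of leaf names rather than as distance matrices, and it does not use circular orders, so it illustrates the Robinson and Foulds distance itself rather than the algorithm of the paper above. The function names are hypothetical.

```python
def leaves_of(tree):
    """All leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves_of(child)]


def splits(tree, all_leaves, ref):
    """Non-trivial bipartitions of the leaf set induced by the tree's edges.

    Each bipartition is stored as the side that does NOT contain `ref`,
    so that the same split always gets the same representation.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        if 1 < len(side) < len(all_leaves) - 1:  # both sides have >= 2 leaves
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Robinson and Foulds distance: size of the symmetric difference of splits."""
    all_leaves = frozenset(leaves_of(t1))
    if all_leaves != frozenset(leaves_of(t2)):
        raise ValueError("trees must have the same leaf set")
    ref = min(all_leaves)
    return len(splits(t1, all_leaves, ref) ^ splits(t2, all_leaves, ref))


if __name__ == "__main__":
    print(rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))
```

Storing each bipartition as a set of leaf names is simple but not the most compact representation; the ordered bipartition table in the abstract serves the same role within the stated O(n^2) bound.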

14.

Retrieval and on-the-fly alignment of sequence fragments from the HIV database

Gaschen B Kuiken C Korber B Foley B 《Bioinformatics (Oxford, England)》2001,17(5):415-418


15.

NdPASA: a pairwise sequence alignment server for distantly related proteins

Li W Wang J Feng JA 《Bioinformatics (Oxford, England)》2005,21(19):3803-3805

SUMMARY: NdPASA is a web server specifically designed to optimize sequence alignment between distantly related proteins. The program integrates structure information of the template sequence into a global alignment algorithm by employing neighbor-dependent propensities of amino acids as a unique parameter for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. NdPASA is most effective in aligning homologous proteins sharing low percentage of sequence identity. The server is designed to aid homologous protein structure modeling. A PSI-BLAST search engine was implemented to help users identify template candidates that are most appropriate for modeling the query sequences.

16.

Comparative study of sequence aligners for detecting antibiotic resistance in bacterial metagenomes


C. McCall I. Xagoraraki 《Letters in applied microbiology》2018,66(3):162-168

We aim to compare the performance of Bowtie2, bwa-mem, blastn and blastx when aligning bacterial metagenomes against the Comprehensive Antibiotic Resistance Database (CARD). Simulated reads were used to evaluate the performance of each aligner under the following four performance criteria: correctly mapped, false positives, multi-reads and partials. The optimal alignment approach was applied to samples from two wastewater treatment plants to detect antibiotic resistance genes using next generation sequencing. blastn mapped with the greatest accuracy among the four sequence alignment approaches considered, followed by Bowtie2. blastx generated the greatest number of false positives and multi-reads when aligned against the CARD. The performance of each alignment tool was also investigated using error-free reads. Although each aligner mapped a greater number of error-free reads as compared to Illumina-error reads, in general, the introduction of sequencing errors had little effect on alignment results when aligning against the CARD. Across all performance criteria, blastn was found to be the most favourable alignment tool and was therefore used to assess resistance genes in sewage samples. Beta-lactam and aminoglycoside were found to be the most abundant classes of antibiotic resistance genes in each sample.

Significance and Impact of the Study

Antibiotic resistance genes (ARGs) are pollutants known to persist in wastewater treatment plants among other environments; thus, methods for detecting these genes have become increasingly relevant. Next generation sequencing has brought about a host of sequence alignment tools that provide a comprehensive look into antimicrobial resistance in environmental samples. However, standardizing practices in ARG metagenomic studies is challenging since results produced from alignment tools can vary significantly. Our study provides sequence alignment results of synthetic and authentic bacterial metagenomes mapped against an ARG database using multiple alignment tools, and the best practice for detecting ARGs in environmental samples.

17.

COACH: profile-profile alignment of protein families using hidden Markov models (cited by: 1; self-citations: 0; cited by others: 1)

Edgar RC Sjölander K 《Bioinformatics (Oxford, England)》2004,20(8):1309-1318

MOTIVATION: Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed and have been shown to improve sensitivity and alignment quality compared with sequence-sequence methods (such as BLAST) and profile-sequence methods (e.g. PSI-BLAST). Here we present a new approach to profile-profile alignment we call Comparison of Alignments by Constructing Hidden Markov Models (HMMs) (COACH). COACH aligns two multiple sequence alignments by constructing a profile HMM from one alignment and aligning the other to that HMM. RESULTS: We compare the alignment accuracy of COACH with two recently published methods: Yona and Levitt's prof_sim and Sadreyev and Grishin's COMPASS. On two sets of reference alignments selected from the FSSP database, we find that COACH is able, on average, to produce alignments giving the best coverage or the fewest errors, depending on the chosen parameter settings. AVAILABILITY: COACH is freely available from www.drive5.com/lobster

18.

MGAlignIt: A web service for the alignment of mRNA/EST and genomic sequences


Lee BT Tan TW Ranganathan S 《Nucleic acids research》2003,31(13):3533-3536


19.

VOSTORG: a package of microcomputer programs for sequence analysis and construction of phylogenetic trees (cited by: 4; self-citations: 0; cited by others: 4)

A A Zharkikh P S Rzhetsky A Yu Morosov T L Sitnikova J S Krushkal 《Gene》1991,101(2):251-254

VOSTORG is a new, versatile package of programs for the inference and presentation of phylogenetic trees, as well as an efficient tool for nucleotide (nt) and amino acid (aa) sequence analysis (sequence input, verification, alignment, construction of consensus, etc.). On appropriately equipped systems, these data can be displayed on a video monitor or printed as required. The programs are implemented on IBM PC/XT/AT/PS-2 or compatible computers, and hardware graphic support is recommended. The package is designed to be easily handled by occasional computer users, yet it is powerful enough for experienced professionals.

20.

A new approach to sequence comparison: normalized sequence alignment (cited by: 3; self-citations: 0; cited by others: 3)

Arslan AN Eğecioğlu O Pevzner PA 《Bioinformatics (Oxford, England)》2001,17(4):327-337

The Smith-Waterman algorithm for local sequence alignment is one of the most important techniques in computational molecular biology. This ingenious dynamic programming approach was designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the existing notion of local similarity has a serious flaw: it does not discard poorly conserved intermediate segments. The Smith-Waterman algorithm finds the local alignment with maximal score but it is unable to find local alignment with maximum degree of similarity (e.g. maximal percent of matches). Moreover, there is still no efficient algorithm that answers the following natural question: do two sequences share a (sufficiently long) fragment with more than 70% of similarity? As a result, the local alignment sometimes produces a mosaic of well-conserved fragments artificially connected by poorly-conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction as recently pointed out by Zhang et al. (Bioinformatics, 15, 1012-1019, 1999). In this paper we propose a new sequence comparison algorithm (normalized local alignment) that reports the regions with maximum degree of similarity. The algorithm is based on fractional programming and its running time is O(n^2 log n). In practice, normalized local alignment is only 3-5 times slower than the standard Smith-Waterman algorithm.
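For reference, the standard Smith-Waterman baseline that the abstract contrasts with can be sketched in a few lines. This is the classical maximal-score recurrence only, not the normalized algorithm the paper proposes; the function name and the default scores (match +1, mismatch -1, gap -1) are assumptions for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between a and b.

    Every cell is clamped at 0, which is what lets the alignment discard
    poorly conserved initial and terminal segments.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            sub = match if ch == bj else mismatch
            cur.append(max(0,                   # start a fresh local alignment
                           prev[j - 1] + sub,   # extend with a (mis)match
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        best = max(best, max(cur))
        prev = cur
    return best


if __name__ == "__main__":
    # The shared core "ACGT" is found despite unrelated flanking characters.
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

Because only the total score is maximized, two strong fragments joined by a weak stretch can still form the best-scoring alignment, which is the mosaic effect the abstract describes.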