1.

A statistical score for assessing the quality of multiple sequence alignments

Virpi Ahola Tero Aittokallio Mauno Vihinen Esa Uusipaikka 《BMC bioinformatics》2006,7(1):484

Background

Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments.

2.

A weighted average difference method for detecting differentially expressed genes from microarray data

Koji Kadota Yuji Nakai Kentaro Shimizu 《Algorithms for molecular biology : AMB》2008,3(1):1-12

3.

Improved alignment quality by combining evolutionary information, predicted secondary structure and self-organizing maps

Tomas Ohlson Varun Aggarwal Arne Elofsson Robert M MacCallum 《BMC bioinformatics》2006,7(1):357

Background

Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is plausible that predicting other structural features could also provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment.

4.

Optimizing substitution matrix choice and gap parameters for sequence alignment

Robert C Edgar 《BMC bioinformatics》2009,10(1):396

Background

While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments.

5.

Pitfalls of the most commonly used models of context dependent substitution

Helen Lindsay Von Bing Yap Hua Ying Gavin A Huttley 《Biology direct》2008,3(1):52

Background

Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates.

6.

Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

Michael L Sierk Michael E Smoot Ellen J Bass William R Pearson 《BMC bioinformatics》2010,11(1):146

Background

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.

7.

Accuracy of structure-based sequence alignment of automatic methods

Changhoon Kim Byungkook Lee 《BMC bioinformatics》2007,8(1):355

Background

Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy.

8.

The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction

Jonathan R Manning Emily R Jefferson Geoffrey J Barton 《BMC bioinformatics》2008,9(1):51

Background

Amino acids responsible for structure, core function or specificity may be inferred from multiple protein sequence alignments where a limited set of residue types is tolerated. The rise in available protein sequences continues to increase the power of techniques based on this principle.

9.

Reticular alignment: A progressive corner-cutting method for multiple sequence alignment

Adrienn Szabó Ádám Novák István Miklós Jotun Hein 《BMC bioinformatics》2010,11(1):570

Background

In this paper, we introduce a progressive corner-cutting method called Reticular Alignment for multiple sequence alignment. Unlike previous corner-cutting methods, our approach does not define a compact part of the dynamic programming table. Instead, it defines a set of optimal and suboptimal alignments at each step during the progressive alignment. The set of alignments is represented as a network so that the alignments can be stored and used efficiently during the progressive alignment. The program contains a threshold parameter on which the size of the network depends. The larger the threshold parameter and thus the network, the deeper the search in the alignment space for better scored alignments.

10.

Statistical distributions of optimal global alignment scores of random protein sequences

Hongxia Pang Jiaowei Tang Su-Shing Chen Shiheng Tao 《BMC bioinformatics》2005,6(1):257

Background

The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined.

11.

Effect of the assignment of ancestral CpG state on the estimation of nucleotide substitution rates in mammals

Daniel J Gaffney Peter D Keightley 《BMC evolutionary biology》2008,8(1):265

Background

Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately. Frequently, in alignments of two sequences, the division of sites into CpG and non-CpG classes is based simply on the presence or absence of a CpG dinucleotide in either sequence, a procedure that we refer to as CpG/non-CpG assignment. Although it is likely that this procedure is biased, it is generally assumed that the bias is negligible if species are very closely related.
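The CpG/non-CpG assignment described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' code: the function names `cpg_assignment` and `divergence_by_class` are hypothetical, the alignment is assumed to be ungapped, and a site is treated as CpG if it is the C or the G of a CG dinucleotide in either sequence.

```python
def cpg_assignment(seq_a: str, seq_b: str) -> list[bool]:
    """Return one flag per aligned site: True if the site belongs to a
    CG dinucleotide in either sequence (assumes an ungapped alignment)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    flags = [False] * len(seq_a)
    for seq in (seq_a.upper(), seq_b.upper()):
        for i in range(len(seq) - 1):
            if seq[i] == "C" and seq[i + 1] == "G":
                # Both positions of the dinucleotide are classed as CpG sites.
                flags[i] = True
                flags[i + 1] = True
    return flags


def divergence_by_class(seq_a: str, seq_b: str) -> dict[str, tuple[int, int]]:
    """Count (differences, sites) separately for CpG and non-CpG sites."""
    counts = {"CpG": [0, 0], "non-CpG": [0, 0]}
    flags = cpg_assignment(seq_a, seq_b)
    for a, b, is_cpg in zip(seq_a.upper(), seq_b.upper(), flags):
        key = "CpG" if is_cpg else "non-CpG"
        counts[key][1] += 1          # one more site in this class
        if a != b:
            counts[key][0] += 1      # the two sequences differ here
    return {key: (diffs, sites) for key, (diffs, sites) in counts.items()}
```

Because a site is labelled CpG when either sequence carries the dinucleotide, a C-to-T change at an ancestral CpG is always counted in the CpG class, which is one way the bias discussed in the abstract can arise.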

12.

Heuristics for multiobjective multiple sequence alignment

Maryam Abbasi Luís Paquete Francisco B. Pereira 《Biomedical engineering online》2016,15(1):70

Background

Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment.
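As a rough illustration of the two objectives named in this abstract, the sketch below computes a sum-of-pairs substitution score and an indel count for a given alignment. The function name `sum_of_pairs_objectives`, the match/mismatch scoring, and the convention of counting each residue-gap pair as one indel are assumptions made for this example; the paper's exact scoring scheme is not given here.

```python
from itertools import combinations


def sum_of_pairs_objectives(alignment, match=1, mismatch=-1):
    """Return (substitution_score, indel_count) over all pairs of rows.

    `alignment` is a list of equal-length strings using '-' for gaps.
    The match/mismatch values are placeholders for a substitution matrix.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    substitution = 0
    indels = 0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue                 # gap against gap: ignored
            if a == "-" or b == "-":
                indels += 1              # residue against gap
            else:
                substitution += match if a == b else mismatch
    return substitution, indels


score, gaps = sum_of_pairs_objectives(["AC-T", "ACGT", "A-GT"])
print(score, gaps)
```

One alignment is preferable to another on both objectives only if it has a substitution score at least as high and an indel count at least as low; otherwise the two represent different points on the trade-off.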

Methods

We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments.

Results and conclusions

The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method with reference alignments in terms of the ratio of correctly aligned pairs and the ratio of correctly aligned columns. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
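The hypervolume indicator mentioned above measures the region of objective space dominated by a set of score vectors relative to a reference point. The sketch below is a minimal two-objective version, assuming both objectives are maximised (an indel count would be negated first). The function name `hypervolume_2d` and the choice of reference point are assumptions for illustration, not details taken from the paper.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by the reference point `ref`.

    Both objectives are maximised; points that do not strictly improve on
    `ref` in both coordinates contribute nothing.
    """
    candidates = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    candidates.sort(key=lambda p: (-p[0], -p[1]))
    area = 0.0
    best_second = ref[1]
    for first, second in candidates:
        if second > best_second:
            # New horizontal slab not covered by any earlier point.
            area += (first - ref[0]) * (second - best_second)
            best_second = second
    return area


# Score vectors as (substitution score, negated indel count).
front = [(3, 1), (2, 2), (1, 3), (1, 1)]
print(hypervolume_2d(front, ref=(0, 0)))
```

A larger hypervolume means the returned set covers the trade-off more completely, which is why it can be used to compare parameter choices.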

13.

Comparing sequences without using alignments: application to HIV/SIV subtyping

Gilles Didier Laurent Debomy Maude Pupin Ming Zhang Alexander Grossmann Claudine Devauchelle Ivan Laprevotte 《BMC bioinformatics》2007,8(1):1

Background

In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.

14.

Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

Gayathri Jayaraman Rahul Siddharthan 《BMC bioinformatics》2010,11(1):464

Background

While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

15.

GASP: Gapped Ancestral Sequence Prediction for proteins

Richard J Edwards Denis C Shields 《BMC bioinformatics》2004,5(1):123

Background

The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments.

16.

Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

Ruslan I Sadreyev Nick V Grishin 《BMC bioinformatics》2004,5(1):106

Background

Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities (1) between an MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families.

17.

AIR: A batch-oriented web program package for construction of supermatrices ready for phylogenomic analyses

Surendra Kumar Åsmund Skjæveland Russell JS Orr Pål Enger Torgeir Ruden Bjørn-Helge Mevik Fabien Burki Andreas Botnen Kamran Shalchian-Tabrizi 《BMC bioinformatics》2009,10(1):357

Background

Large multigene sequence alignments have over recent years been increasingly employed for phylogenomic reconstruction of the eukaryote tree of life. Such supermatrices of sequence data are preferred over single gene alignments as they contain vastly more information about ancient sequence characteristics, and are thus more suitable for resolving deeply diverging relationships. However, as alignments are expanded, increasing numbers of sites with misleading phylogenetic information are also added. Therefore, a major goal in phylogenomic analyses is to maximize the ratio of information to noise; this can be achieved by the reduction of fast evolving sites.

18.

High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABERTOOTH

Florian Teichert Jonas Minning Ugo Bastolla Markus Porto 《BMC bioinformatics》2010,11(1):251

Background

Protein alignments are an essential tool for many bioinformatics analyses. While sequence alignments are accurate for proteins of high sequence similarity, they become unreliable as they approach the so-called 'twilight zone' where sequence similarity becomes indistinguishable from random. For such distant pairs, structure alignment is of much better quality. Nevertheless, sequence alignment is the only choice in the majority of cases where structural data is not available. This situation demands development of methods that extend the applicability of accurate sequence alignment to distantly related proteins.

19.

A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities

Olivier Bastien Philippe Ortet Sylvaine Roy Eric Maréchal 《BMC bioinformatics》2005,6(1):49

Background

Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction.
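A pair-wise Z-score of the kind referred to here is commonly estimated by shuffling one sequence many times and comparing the observed alignment score with the distribution of shuffled scores. The sketch below illustrates that general idea only. The simple global alignment scorer, its match, mismatch and linear gap values, and the function names `global_alignment_score` and `z_score` are all assumptions for this example, not the scoring used in the paper.

```python
import random
import statistics


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (score only)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, res_a in enumerate(a, 1):
        cur = [i * gap]
        for j, res_b in enumerate(b, 1):
            diagonal = prev[j - 1] + (match if res_a == res_b else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def z_score(a, b, n_shuffles=100, seed=0):
    """Monte Carlo Z-score: (observed - mean) / standard deviation of the
    scores obtained after shuffling sequence `b`."""
    rng = random.Random(seed)            # seeded for reproducibility
    observed = global_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)            # preserves the composition of b
        shuffled_scores.append(global_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd
```

Shuffling keeps the amino acid composition of the second sequence fixed, so a high Z-score reflects residue order rather than composition alone.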

20.

A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny

Huai-Chun Wang Karen Li Edward Susko Andrew J Roger 《BMC evolutionary biology》2008,8(1):331

Background

Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g. JTT + F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation.