1.
The many faces of sequence alignment (total citations: 9; self-citations: 0; citations by others: 9)
Batzoglou S. Briefings in Bioinformatics, 2005, 6(1): 6-22
Starting with the sequencing of the mouse genome in 2002, we have entered a period where the main focus of genomics will be to compare multiple genomes in order to learn about human biology and evolution at the DNA level. Alignment methods are the main computational component of this endeavour. This short review aims to summarise the current status of research in alignments, emphasising large-scale genomic comparisons and suggesting possible directions that will be explored in the near future.
2.
A workbench for multiple alignment construction and analysis (total citations: 126; self-citations: 0; citations by others: 126)
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining "blocks" of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are located by a new search algorithm that avoids many of the limitations of previous techniques. (2) The statistical significance of blocks of similarity is evaluated using a recently developed mathematical theory. (3) Candidate blocks may be evaluated for potential inclusion in a multiple alignment using a variety of visualization tools. (4) A user interface permits each block to be edited by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment. No completely automatic program is likely to deal effectively with all the complexities of the multiple alignment problem; by combining a powerful similarity search algorithm with flexible editing, analysis and display tools, MACAW allows the alignment strategy to be tailored to the problem at hand.
3.
A "Long Indel" model for evolutionary sequence alignment (total citations: 7; self-citations: 0; citations by others: 7)
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this effect, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.
4.
There is a lack of programs available that focus on providing an overview of an aligned set of sequences such that the comparison of homologous sites becomes comprehensible and intuitive. Being able to identify similarities, differences, and patterns within a multiple sequence alignment is biologically valuable because it permits visualization of the distribution of a particular feature and inferences about the structure, function, and evolution of the sequences in question. We have therefore created a web server, fingerprint, which combines the characteristics of existing programs that represent identity, variability, charge, hydrophobicity, solvent accessibility, and structure along with new visualizations based on composition, heterogeneity, heterozygosity, dN/dS and nucleotide diversity. fingerprint is easy to use and globally accessible through any computer using any major browser. fingerprint is available at http://evol.mcmaster.ca/fingerprint/ .
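Two of the per-site quantities that fingerprint displays, identity and variability, can be sketched for an aligned set of sequences as below. This is a minimal illustration, not fingerprint's code: the function name `column_profile` is hypothetical, identity is taken as the frequency of the most common residue in a column, variability is measured as Shannon entropy in bits, and gap characters ('-') are excluded from both, which is an assumption (tools treat gaps differently).

```python
import math
from collections import Counter


def column_profile(msa):
    """Return (identity, entropy_bits) for every column of an MSA.

    msa: list of equal-length aligned strings; '-' marks a gap.
    Gaps are ignored (assumption); an all-gap column gets (0.0, 0.0).
    """
    if not msa or len({len(s) for s in msa}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    profile = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        if not residues:
            profile.append((0.0, 0.0))
            continue
        counts = Counter(residues)
        n = len(residues)
        identity = max(counts.values()) / n          # share of the most common residue
        freqs = [k / n for k in counts.values()]
        entropy = sum(-p * math.log2(p) for p in freqs)  # Shannon entropy in bits
        profile.append((identity, entropy))
    return profile


for i, (ident, ent) in enumerate(column_profile(["ACDE", "ACDF", "AC-G", "GCDH"])):
    print(f"column {i}: identity={ident:.2f} entropy={ent:.2f}")
```

A fully conserved column has identity 1 and entropy 0; a column in which every sequence differs has the lowest identity and the highest entropy.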
5.
Clustal W: software for protein and nucleic acid sequence analysis (total citations: 2; self-citations: 1; citations by others: 2)
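Protein and nucleic acid sequence analysis plays an important role in modern biology and bioinformatics, and new algorithms and software continue to appear. This paper introduces ClustalW, a completely free multiple sequence alignment program that runs on a PC. It can perform multiple alignment of protein and nucleic acid sequences and analyze the similarity relationships among different sequences, and it can also draw phylogenetic trees. Owing to its flexible input and output formats, convenient parameter setting and selection, detailed online help, and good portability, ClustalW has been widely used in protein and nucleic acid sequence analysis.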
6.
Interspecific comparisons of protein sequences can reveal regions of evolutionary conservation that are under purifying selection because of functional constraints. Interpreting these constraints requires combining evolutionary information with structural, biochemical, and physiological data to understand the biological function of conserved regions. We take this integrative approach to investigate the evolution and function of the nuclear-encoded subunits of cytochrome c oxidase (COX). We find that the nuclear-encoded subunits evolved subsequent to the origin of mitochondria and that the subunit composition of the holoenzyme varies across diverse taxa that include animals, yeasts, and plants. By mapping conserved amino acids onto the crystal structure of bovine COX, we show that conserved residues are structurally organized into functional domains. These domains correspond to some known functional sites as well as to other uncharacterized regions. We find that amino acids that are important for structural stability are conserved at frequencies higher than expected within each taxon, and that groups of conserved residues cluster together at distances of less than 5 Å more frequently than do randomly selected residues. We therefore suggest that selection is acting to maintain the structural foundation of COX across taxa, whereas active sites vary or coevolve within lineages.
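The clustering comparison described above (conserved residues within 5 Å of each other versus randomly selected residues) can be sketched as follows. The details are assumptions, not taken from the study: each residue is represented by a single coordinate (for example its Cα atom), the statistic is the fraction of residue pairs closer than the cutoff, and the null model draws residue sets of the same size uniformly at random. The helper names are hypothetical.

```python
import itertools
import math
import random


def close_pair_fraction(coords, cutoff=5.0):
    """Fraction of residue pairs whose representative points lie closer than `cutoff` (Å)."""
    pairs = list(itertools.combinations(coords, 2))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if math.dist(a, b) < cutoff)
    return close / len(pairs)


def random_baseline(all_coords, k, cutoff=5.0, trials=1000, seed=0):
    """Mean close-pair fraction for k residues drawn at random (null model)."""
    rng = random.Random(seed)
    total = sum(close_pair_fraction(rng.sample(all_coords, k), cutoff)
                for _ in range(trials))
    return total / trials
```

In use, `close_pair_fraction(conserved_coords)` would be compared with `random_baseline(all_coords, len(conserved_coords))`; a markedly higher value for the conserved set is the kind of clustering the abstract reports.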
7.
An evolutionary model for maximum likelihood alignment of DNA sequences (total citations: 16; self-citations: 0; citations by others: 16)
Jeffrey L. Thorne, Hirohisa Kishino, Joseph Felsenstein. Journal of Molecular Evolution, 1991, 33(2): 114-124
Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.
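To illustrate the difference between scoring one alignment and accounting for all possible alignments, the toy dynamic program below sums the weight of every global alignment of two sequences. It is not the Thorne-Kishino-Felsenstein model: the per-column weights (`match_weight`, `mismatch_weight`, `gap_weight`) are hypothetical constants rather than transition probabilities derived from an evolutionary model, and different orderings of adjacent insertions and deletions are counted as separate paths, which a proper pair model would handle explicitly.

```python
def total_alignment_weight(a, b, match_weight=0.5, mismatch_weight=0.05, gap_weight=0.1):
    """Sum, over all global alignments of a and b, of the product of column weights.

    Z[i][j] holds the summed weight of all alignments of the prefixes a[:i] and b[:j];
    each cell adds the three ways the last column can end.
    """
    n, m = len(a), len(b)
    Z = [[0.0] * (m + 1) for _ in range(n + 1)]
    Z[0][0] = 1.0  # weight of the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # last column pairs a[i-1] with b[j-1]
                pair = match_weight if a[i - 1] == b[j - 1] else mismatch_weight
                total += Z[i - 1][j - 1] * pair
            if i > 0:  # last column puts a[i-1] against a gap
                total += Z[i - 1][j] * gap_weight
            if j > 0:  # last column puts b[j-1] against a gap
                total += Z[i][j - 1] * gap_weight
            Z[i][j] = total
    return Z[n][m]
```

Replacing each `+` with `max` turns this sum into the score of the single best alignment, which is exactly the quantity the abstract warns against relying on alone. For realistic sequence lengths the sum should be carried out in log space to avoid underflow.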
8.
MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment (total citations: 431; self-citations: 0; citations by others: 431)
With its theoretical basis firmly established in molecular evolutionary and population genetics, comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods, which require easy-to-use computer programs. One such effort has been the Molecular Evolutionary Genetics Analysis (MEGA) software, which focuses on facilitating the exploration and analysis of DNA and protein sequence variation from an evolutionary perspective. Now in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of phylogenetic trees, estimation of evolutionary distances, and testing of evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and for the results obtainable in MEGA.
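As an example of the evolutionary distances mentioned above, the sketch below computes the p-distance and the Jukes-Cantor (1969) correction, d = -(3/4) ln(1 - 4p/3), for two aligned DNA sequences. It is an independent illustration of these standard formulas, not MEGA's code; excluding every site that has a gap in either sequence (pairwise deletion) is an assumption.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping sites with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(seq1.upper(), seq2.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3); undefined for p >= 0.75."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The correction always exceeds p for p > 0 because it accounts for multiple substitutions at the same site, and the gap between the two grows as sequences diverge.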
9.
Bruno B. Amati, Michel Goldschmidt-Clermont, Carmichael J. A. Wallace, Jean-David Rochaix. Journal of Molecular Evolution, 1988, 28(1-2): 151-160
We have isolated complementary DNA (cDNA) clones for apocytochrome c from the green alga Chlamydomonas reinhardtii and shown that they are encoded by a single nuclear gene termed cyc. Cyc mRNA levels are found to depend primarily on the presence of acetate as a reduced carbon source in the culture medium. The deduced amino acid sequence shows that, apart from the probable removal of the initiating methionine, C. reinhardtii apocytochrome c is synthesized in its mature form. Its structure is generally similar to that of cytochromes c from higher plants. Several point deviations from the general pattern of cytochrome c sequences found in other organisms have interesting structural and functional implications. These include, in particular, valines 19 and 39, asparagine 78, and alanine 83. A phylogenetic tree was constructed by the matrix method from cytochrome c data for a representative range of species. The results suggest that C. reinhardtii diverged from higher plants approximately 700–750 million years ago; they are also not easy to reconcile with the current attribution of Chlamydomonas reinhardtii and Enteromorpha intestinalis to a single phylum, because these two species probably diverged from one another at about the same time as they diverged from the line leading to higher plants.
10.
11.
The accuracy of protein sequence alignments depends on the score matrix and gap penalties used to perform the alignment. Most score functions are designed to find homologs in the various databases rather than to generate accurate alignments between known homologs. We describe the optimization of a score function for the purpose of generating accurate alignments, as evaluated by a merit function based on coordinate root-mean-square deviation (RMSD). We show that the resulting score matrix, which we call STROMA, generates more accurate alignments than other commonly used score matrices, and that this difference is not due to differences in the gap penalties. In fact, in contrast to most of the other matrices, the alignment accuracies obtained with STROMA are relatively insensitive to the choice of gap penalty parameters.
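The roles of the score matrix and the gap penalties can be made concrete with a global alignment score under affine gap penalties (Gotoh's three-state dynamic programming). This is a minimal sketch: the toy `match_mismatch` scorer stands in for a real substitution matrix such as BLOSUM or STROMA (whose values are not reproduced here), and the default penalties are arbitrary placeholders.

```python
NEG_INF = float("-inf")


def match_mismatch(x, y):
    """Toy substitution score standing in for a real score matrix (hypothetical values)."""
    return 2.0 if x == y else -1.0


def affine_global_score(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Optimal global alignment score with affine gaps.

    A gap of length L costs gap_open + gap_extend * (L - 1).
    M: last column pairs two residues; X: a residue of `a` against a gap;
    Y: a residue of `b` against a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -gap_open - gap_extend * (i - 1)
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -gap_open - gap_extend * (j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

Because extending an open gap is cheaper than opening a new one, this scheme prefers one long gap over several short ones. Changing `gap_open` and `gap_extend` while holding the scorer fixed gives a quick way to probe how sensitive an alignment is to the gap parameters.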
12.
I. I. Litvinov, M. Yu. Lobanov, A. A. Mironov, A. V. Finkelshtein, M. A. Roytberg. Molecular Biology, 2006, 40(3): 474-480
The most popular algorithms for the pairwise alignment of protein primary structures (the Smith-Waterman (SW) algorithm, FASTA, BLAST, etc.) analyze only the amino acid sequence. The SW algorithm is the most accurate of these, yielding alignments that agree best with superpositions of the corresponding protein spatial structures. However, even the SW algorithm fails to reproduce the structural alignment when sequence identity is below 30%. The objective of this work was to develop a new, more accurate algorithm that takes the secondary structure of proteins into account. The maximum-weight alignments generated by this algorithm, with secondary structure taken into account, proved more accurate than SW alignments. For sequences with less than 30% identity, the accuracy of the new algorithm (i.e., the fraction of positions of a reference alignment, obtained by superimposing the protein spatial structures, that are reproduced) is 58%, versus 35% for the SW algorithm. The accuracy of the new algorithm is about the same whether the secondary structures are determined experimentally or predicted theoretically. Hence, the algorithm is applicable to proteins with unknown spatial structures. The program is available at ftp://194.149.64.196/STRUSWER/.
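For reference, the sequence-only Smith-Waterman baseline discussed above can be sketched as follows, with a linear gap penalty and a traceback. The match, mismatch, and gap values are hypothetical placeholders for a real substitution matrix and gap model, and the secondary-structure-aware weighting of the new algorithm is not reproduced.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment by Smith-Waterman with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTT", "GGACGGG"))
```

One way to add secondary structure to such a scheme is to make the per-pair score depend on the structural states of the two residues as well as their identities. That is an illustration of the general idea only, not the authors' weighting.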
13.
L. Kruckenhauser, W. Pinsker, E. Haring, W. Arnold. Journal of Zoological Systematics and Evolutionary Research, 1999, 37(1): 49-56
We established the phylogeny of 11 species of the genus Marmota based on the entire sequence of the mitochondrial cytochrome b (cyt-b) gene (1.1 kb) and a partial sequence of the NADH dehydrogenase subunit 4 (ND4) gene (1.2 kb). In three species (Marmota caligata, Marmota olympus, and Marmota bobac), full-sized nuclear pseudogenes of the mitochondrial cyt-b were identified. The mitochondrial cyt-b genes and the three pseudogenes form separate clusters in the maximum parsimony dendrogram. This finding suggests that the pseudogenes originated from a single transfer to the nucleus that may have occurred prior to the radiation of the genus Marmota. Notably, the pseudogenes show a much lower substitution rate than their functional mitochondrial equivalents. In the dendrograms deduced from the mitochondrial sequences, two distinct clusters are apparent: one consists of the North-west American species, and the other contains the Eurasian species together with the North American species Marmota monax. The position of M. monax as a member of the Eurasian clade is in accordance with the evolution of chromosome numbers. The results are of special interest with respect to the evolution of social systems in the genus, which range from solitary species (M. monax) to highly social species living in family groups (e.g. Marmota marmota). The molecular phylogeny suggests a diphyletic origin of high sociality in the genus Marmota.
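Maximum parsimony prefers the tree that requires the fewest character changes. The sketch below scores one fixed rooted binary tree with Fitch's algorithm. It does not search over trees, which a full parsimony analysis such as the one above must do. The tree representation (nested 2-tuples of leaf names) and the function names are assumptions made for illustration.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for a single alignment column.

    tree: a leaf name (str) or a 2-tuple of subtrees, e.g. (("a", "b"), ("c", "d")).
    states: dict mapping each leaf name to its character at this column.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:  # children can agree: no extra change needed here
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disagreement costs one change


def parsimony_length(tree, alignment):
    """Total Fitch changes over all columns; alignment maps leaf name -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))
```

A tree search would call `parsimony_length` on many candidate topologies and keep those with the smallest total.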
14.
Background
The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees.
Results
The space generated from pairwise similarities is capable of classifying neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that significantly correspond with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex cluster with pyramidal axons rather than with the (more topologically symmetric) axons of perisomatic-targeting interneurons.
Conclusions
Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporation of additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information, revealing global and local features, respectively.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0605-1) contains supplementary material, which is available to authorized users.
15.
Quality assessment of multiple alignment programs (total citations: 7; self-citations: 0; citations by others: 7)
A renewed interest in the multiple sequence alignment problem has given rise to several new algorithms. In contrast to traditional progressive methods, computationally expensive score optimization strategies are now predominantly employed. We systematically tested four methods (Poa, Dialign, T-Coffee and ClustalW) for the speed and quality of their alignments. As test sequences we used structurally derived alignments from BAliBASE and synthetic alignments generated by Rose. The tests included alignments of variable numbers of domains embedded in random spacer sequences. Overall, Dialign was the most accurate in cases with low sequence identity, while T-Coffee won in cases with high sequence identity. The fast Poa algorithm was almost as accurate, while ClustalW could compete only in strictly global cases with high sequence similarity.
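Accuracy against a reference such as BAliBASE is commonly scored by counting how many residue pairs aligned in the reference are also aligned in the test alignment (a sum-of-pairs style measure). The sketch below implements that idea. It is not necessarily the exact score used in the study above, and the function names are hypothetical. Both alignments must list the same sequences in the same row order.

```python
def aligned_pairs(row_a, row_b):
    """Set of (i, j) residue-index pairs placed in the same column of two gapped rows."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1  # advance the residue index only on non-gap characters
        if y != "-":
            j += 1
    return pairs


def pair_accuracy(test, reference):
    """Fraction of residue pairs aligned in `reference` that `test` also aligns."""
    ref_total, hits = 0, 0
    for p in range(len(reference)):
        for q in range(p + 1, len(reference)):
            ref = aligned_pairs(reference[p], reference[q])
            got = aligned_pairs(test[p], test[q])
            ref_total += len(ref)
            hits += len(ref & got)
    return hits / ref_total if ref_total else 0.0
```

Because pairs are compared by residue index rather than by column number, two alignments that differ only in where all-gap columns fall receive the same score.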
16.
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematic to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
17.
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. Because the quality of each STAMP alignment is characterized by its Sc and Pij' values, STAMP alignments provide a reproducible resource for studies of residue conservation within structural motifs.
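STAMP computes a similarity tree and then merges alignments from the branches to the root. The abstract does not say how the tree is built, so the sketch below substitutes UPGMA (average-linkage clustering on a distance matrix) as one common choice. The order in which it merges clusters is the order in which a progressive method would combine sub-alignments. The distances would come from pairwise structural comparison (for example, one minus a normalized similarity score); that conversion, and the function name, are assumptions.

```python
def upgma(names, matrix):
    """UPGMA clustering.

    names: leaf labels; matrix: symmetric list-of-lists of pairwise distances.
    Returns the rooted tree as nested 2-tuples of leaf labels.
    """
    clusters = [(name, 1) for name in names]  # (subtree, number of leaves)
    d = [row[:] for row in matrix]             # working copy of the distance matrix
    while len(clusters) > 1:
        # Closest pair of clusters, with i < j.
        i, j = min(((p, q) for p in range(len(d)) for q in range(p + 1, len(d))),
                   key=lambda pq: d[pq[0]][pq[1]])
        (ti, ni), (tj, nj) = clusters[i], clusters[j]
        # Size-weighted average distance from the merged cluster to each remaining cluster.
        new_row = [(ni * d[i][k] + nj * d[j][k]) / (ni + nj)
                   for k in range(len(d)) if k not in (i, j)]
        for idx in (j, i):  # delete j first so that index i stays valid
            del clusters[idx]
            del d[idx]
            for row in d:
                del row[idx]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        clusters.append(((ti, tj), ni + nj))
    return clusters[0][0]


dist = [[0, 2, 10, 10], [2, 0, 10, 10], [10, 10, 0, 4], [10, 10, 4, 0]]
print(upgma(["A", "B", "C", "D"], dist))
```

Here A and B are merged first, then C and D, and finally the two pairs are joined at the root.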
18.
Large residual ¹⁵N-¹H dipolar couplings have been measured in a Src homology II domain aligned at Pf1 bacteriophage concentrations an order of magnitude lower than those used to induce a similar degree of alignment of nucleic acids and highly acidic proteins. An increase in ¹H and ¹⁵N protein linewidths and a decrease in T2 and T1 relaxation time constants implicate a binding interaction between the protein and phage as the mechanism of alignment. However, the associated increase in linewidth does not preclude the accurate measurement of large dipolar couplings in the aligned protein. A good correlation is observed between measured dipolar couplings and predicted values based on the high-resolution NMR structure of the SH2 domain. The observation of binding-induced protein alignment promises to broaden the scope of alignment techniques by extending their applicability to proteins that are able to interact weakly with the alignment medium.
19.
BCL::Align is a multiple sequence alignment tool that utilizes the dynamic programming method in combination with a customizable scoring function for sequence alignment and fold recognition. The scoring function is a weighted sum of the traditional PAM and BLOSUM scoring matrices, position-specific scoring matrices output by PSI-BLAST, secondary structure predicted by a variety of methods, chemical properties, and gap penalties. By adjusting the weights, the method can be tailored for fold recognition or sequence alignment tasks at different levels of sequence identity. A Monte Carlo algorithm was used to determine optimized weight sets for sequence alignment and fold recognition that most accurately reproduced the SABmark reference alignment test set. In an evaluation of sequence alignment performance, BCL::Align ranked best in alignment accuracy (Cline score of 22.90 for sequences in the Twilight Zone) when compared with Align-m, ClustalW, T-Coffee, and MUSCLE. ROC curve analysis indicates BCL::Align's ability to correctly recognize protein folds with over 80% accuracy. The flexibility of the program allows it to be optimized for specific classes of proteins (e.g. membrane proteins) or fold families (e.g. TIM-barrel proteins). BCL::Align is free for academic use and available online at http://www.meilerlab.org/.
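A scoring function of this kind is a weighted sum of terms, and its weights can be tuned by Monte Carlo search against a benchmark. The sketch below shows both pieces under stated assumptions. Term names such as "blosum" or "pssm" are hypothetical dictionary keys. The `objective` callback stands in for an alignment-accuracy measure on a reference set such as SABmark. The Metropolis acceptance rule is one common Monte Carlo choice and is not necessarily the one BCL::Align used.

```python
import math
import random


def combined_score(features, weights):
    """Weighted sum of per-position-pair score terms (keys are hypothetical term names)."""
    return sum(weights[name] * value for name, value in features.items())


def monte_carlo_weights(objective, weights, steps=1000, step_size=0.1,
                        temperature=0.01, seed=0):
    """Metropolis search over weight sets that maximizes objective(weights).

    Returns (best_weights, best_objective_value).
    """
    rng = random.Random(seed)
    current = dict(weights)
    current_val = objective(current)
    best, best_val = dict(current), current_val
    for _ in range(steps):
        trial = dict(current)
        name = rng.choice(sorted(trial))           # perturb one randomly chosen weight
        trial[name] += rng.gauss(0.0, step_size)
        val = objective(trial)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if val >= current_val or rng.random() < math.exp((val - current_val) / temperature):
            current, current_val = trial, val
            if val > best_val:
                best, best_val = dict(trial), val
    return best, best_val
```

Occasionally accepting worse weight sets helps the search escape local optima, which a purely greedy update would not.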