Similar Documents
20 similar documents found.
1.

Background

Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem can be solved efficiently when only two sequences are considered, exact inference of the optimal alignment quickly becomes computationally intractable for multiple sequence alignment. To cope with the high computational cost, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. However, these methods cannot revise the alignment of an already aligned group of sequences in light of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of alignments produced by popular aligners. We call ReformAlign a meta-aligner because it requires an initial alignment, which can be produced by a variety of alignment programs. The main idea behind ReformAlign is straightforward: first, an existing alignment is used to construct a standard profile that summarizes the initial alignment; then every sequence is individually re-aligned against this profile. The alignment of each sequence against the profile is recorded, and the final alignment is inferred indirectly by merging all the individual sub-alignments into a unified set. Applying ReformAlign often yields alignments that are significantly more accurate than the starting alignments.
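The first two steps described above (summarizing an existing alignment as a profile, then re-aligning each sequence against it) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ReformAlign's actual implementation: the profile is a plain per-column residue-frequency table, the scoring (`2 * frequency - 1` per matched column, linear gap penalty `-1`) is a hypothetical choice, and the final step of merging the sub-alignments is omitted. The names `build_profile` and `align_to_profile` are illustrative.

```python
from collections import Counter

GAP = "-"


def build_profile(alignment):
    """Summarise an existing MSA as per-column residue frequencies (gaps not counted)."""
    profile = []
    for j in range(len(alignment[0])):
        counts = Counter(row[j] for row in alignment if row[j] != GAP)
        profile.append({res: c / len(alignment) for res, c in counts.items()})
    return profile


def align_to_profile(seq, profile, gap=-1.0):
    """Global DP alignment of one ungapped sequence against the profile columns.

    Returns (row, inserted): `row` has exactly one character per profile column
    (a residue or GAP); `inserted` lists residues placed between columns."""
    def match(i, j):
        # hypothetical score in [-1, 1]: 2 * frequency of the residue in column j - 1
        return 2.0 * profile[j - 1].get(seq[i - 1], 0.0) - 1.0

    n, m = len(seq), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + match(i, j),
                              score[i - 1][j] + gap,   # residue outside every column
                              score[i][j - 1] + gap)   # column skipped by this sequence

    # trace back from the bottom-right cell, recomputing the same expressions
    row, inserted = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + match(i, j):
            row.append(seq[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            inserted.append(seq[i - 1])
            i -= 1
        else:
            row.append(GAP)
            j -= 1
    return "".join(reversed(row)), inserted[::-1]


initial = ["ACGT", "AC-T", "ACGT"]
profile = build_profile(initial)
print(align_to_profile("ACGGT", profile))
```

Residues that fall between profile columns are returned separately; a full meta-aligner would have to reconcile these insertions across all sequences when merging the sub-alignments.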

Results

We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods, using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign yields more substantial improvements when the starting alignment is of relatively poor quality or when the input sequences are harder to align.

Conclusions

The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-265) contains supplementary material, which is available to authorized users.

2.
Quality assessment of multiple alignment programs (Total citations: 7; self-citations: 0; citations by others: 7)
A renewed interest in the multiple sequence alignment problem has given rise to several new algorithms. In contrast to traditional progressive methods, computationally expensive score-optimization strategies are now predominantly employed. We systematically tested four methods (Poa, Dialign, T-Coffee and ClustalW) for the speed and quality of their alignments. As test sequences we used structurally derived alignments from BAliBASE and synthetic alignments generated by Rose. The tests included alignments of variable numbers of domains embedded in random spacer sequences. Overall, Dialign was the most accurate in cases with low sequence identity, while T-Coffee won in cases with high sequence identity. The fast Poa algorithm was almost as accurate, while ClustalW could compete only in strictly global cases with high sequence similarity.
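Alignment quality against a structural reference such as BAliBASE is commonly measured with a sum-of-pairs (SP) score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. The abstract does not state which measure was used, so the sketch below only illustrates the SP idea; the function names are hypothetical, and both alignments are assumed to contain the same ungapped sequences in the same order.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """All residue pairs that share a column.

    A residue is identified as (sequence index, position in the ungapped sequence),
    so the same pair can be recognised in two different alignments."""
    pairs = set()
    seen = [0] * len(alignment)  # residues consumed so far in each row
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, seen[s]))
                seen[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(candidate, reference):
    """Fraction of the reference's aligned residue pairs reproduced by the candidate."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(candidate)) / len(ref)


reference = ["AC-GT", "ACTGT"]
candidate = ["ACG-T", "ACTGT"]
print(sp_score(candidate, reference))  # prints 0.75
```

Scoring pairs rather than whole columns gives partial credit when only some sequences in a column are misplaced.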

3.
Protein structure alignment plays a key role in protein structure prediction and fold family classification. An efficient, mathematically formulated method for multiple protein structure alignment based on deterministic annealing is presented. The alignment problem is mapped onto a nonlinear continuous optimization problem (NCOP) whose variables are a common consensus chain, matching assignment matrices and atomic coordinates. At each step of the annealing procedure, the NCOP is decomposed into as many subproblems as there are protein chains; each subproblem is an independent pairwise structure alignment between one protein chain and the consensus chain, so the subproblems can be solved efficiently in parallel. The proposed method is robust to the choice of iteration parameters for a wide range of proteins and performs well in both multiple and pairwise structure alignment compared with existing alignment methods.

4.
Summary: We examined two extensive families of protein sequences using four different alignment schemes that employ various degrees of weighting, in order to determine which approach is most sensitive in establishing relationships. All alignments used a similarity approach based on the general algorithm devised by Needleman and Wunsch. The approaches included a simple program, UM (unitary matrix), in which only identities are scored; a scheme in which the genetic code is used as the basis for weighting (GC); another that employs a matrix based on the structural similarity of amino acids together with the genetic basis of mutation (SG); and a fourth that uses the empirical log-odds matrix (LOM) developed by Dayhoff on the basis of observed amino acid replacements. The two sequence families examined were (a) nine different globins and (b) nine different tyrosine kinase-like proteins. It was assumed a priori that all members of a family share common ancestry. When two sequences were more than 30% identical, alignments by all four methods were almost always the same. When the percentage identity was less than 20%, however, there were often significant differences between the alignments. On average, the Dayhoff LOM approach was the most effective in verifying distant relationships, as judged by an empirical jumbling test. This was not universally the case, however, and in some instances the simple UM was as good or better. Trees constructed from the various alignments differed in their limb lengths but had essentially the same branching orders. We suggest some reasons for the differing effectiveness of the four approaches in the two sequence settings, and offer some rules of thumb for assessing the significance of sequence relationships.
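The simplest of the four schemes, UM, scores only identities. A minimal Python sketch of a Needleman-Wunsch global alignment score under such a unitary matrix, together with a jumbling test that compares the real score with scores obtained after shuffling one sequence, might look like this. The linear gap penalty of `-1`, the 100 shuffles and the z-score summary are assumptions for illustration, not the paper's parameters; the function names are hypothetical.

```python
import random
import statistics


def nw_score(a, b, match=1, mismatch=0, gap=-1):
    """Needleman-Wunsch global alignment score with a unitary (identity-only) matrix."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def jumbling_z(a, b, trials=100, seed=0):
    """Z-score of the real score against scores for shuffled ("jumbled") copies of b."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    scores = []
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)
        scores.append(nw_score(a, "".join(letters)))
    sd = statistics.stdev(scores)
    return (real - statistics.mean(scores)) / sd if sd > 0 else float("inf")


print(jumbling_z("MVHLTPEEKSAVTALWGKV", "MVLSPADKTNVKAAWGKV"))
```

Shuffling preserves amino acid composition, so a high z-score indicates that the similarity depends on residue order rather than composition alone. Swapping in a GC, SG or LOM scheme would only change how `diag` is scored.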

5.
Protein multiple sequence alignment is an important bioinformatics tool, with important applications in the analysis of biological evolution and in protein structure prediction. A variety of alignment algorithms in this field have achieved great success; however, each has its own inherent deficiencies. In this paper, permutation similarity is proposed as a way to evaluate several widely used protein multiple sequence alignment algorithms. Because the permutation similarity method considers only the relative order of the evolutionary distances between proteins and ignores small differences in their values, it yields more robust evaluations. The longest common subsequence method is used to define the similarity between permutations. Using these methods, we assessed Dialign, Tcoffee, ClustalW and Muscle and compared them with one another.
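A sketch of the permutation-similarity idea: each alignment method yields a list of evolutionary distances for the same protein pairs; the lists are converted into permutations (pair indices ordered by distance) and compared through the length of their longest common subsequence (LCS). Normalizing the LCS length by the number of pairs is an assumption made here, and the function names are hypothetical.

```python
def to_permutation(distances):
    """Pair indices ordered from smallest to largest evolutionary distance."""
    return sorted(range(len(distances)), key=lambda k: distances[k])


def lcs_length(p, q):
    """Classic dynamic-programming longest common subsequence length."""
    prev = [0] * (len(q) + 1)
    for x in p:
        cur = [0]
        for j, y in enumerate(q, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def permutation_similarity(d1, d2):
    """1.0 when both methods rank the pairs identically; lower as the orders diverge."""
    p, q = to_permutation(d1), to_permutation(d2)
    return lcs_length(p, q) / len(p)


print(permutation_similarity([0.1, 0.2, 0.3, 0.4], [0.15, 0.25, 0.35, 0.45]))
```

Only the ranking enters the comparison, which is why small shifts in the distance values do not change the score.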

6.
As a basic tool of modern biology, sequence alignment can provide useful information about a protein's fold, function, and active site. In many cases, a higher-quality alignment means better performance. The motivation of the present work is to increase the power of existing scoring schemes and algorithms by better accounting for residue-residue correlations. Using a coarse-grained approach, the hydrophobic force between each pair of residues is derived from the protein sequence. This yields an intramolecular hydrophobic force network that describes all residue-residue interactions of each protein molecule and characterizes the protein's biological properties in terms of hydrophobicity. Previous work has suggested that such a network can capture the top-weighted hydrophobicity feature. Moreover, for the homologous proteins of a family, the corresponding networks share some common, representative family characteristics that ultimately govern the conservation of biological properties during protein evolution. In the present work, we score these family-representative characteristics of a protein by the deviation of its intramolecular hydrophobic force network from that of the background. This score can assist existing scoring schemes and algorithms and improve multiple sequence alignment, e.g. achieving a prominent increase (∼50%) in finding structurally similar residue segments at low identity levels. Because its theoretical basis is different, the present scheme can complement most existing algorithms and improve their efficiency remarkably.

7.
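Applying particle swarm optimization (PSO) to sequence alignment, we propose an improved PSO algorithm that, as the swarm evolves, dynamically adjusts the inertia weight and the range of particle flight velocities according to the particles' fitness values, which improves the algorithm's convergence speed and accuracy. To counter the premature convergence to which PSO is prone, a reinitialization mechanism is introduced that strengthens the algorithm's search ability. Experiments show that the algorithm is effective.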
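The abstract does not give the adaptation rules or how alignments are encoded as particles, so the following Python sketch only illustrates the two ideas on a generic continuous objective (the sphere function stands in for an alignment score). The specific rules are hypothetical: particles fitter than the swarm average receive an inertia weight between 0.4 and 0.9 scaled by their relative fitness, worse particles keep 0.9, the maximum velocity shrinks in proportion to the weight, and every particle except the best is re-initialized after `stall_limit` iterations without improvement of the global best.

```python
import random


def sphere(x):
    """Stand-in objective to minimise; a real aligner would score an alignment."""
    return sum(v * v for v in x)


def adaptive_pso(f, dim, n=20, iters=200, lo=-5.0, hi=5.0,
                 c1=1.5, c2=1.5, stall_limit=20, seed=1):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    stall = 0

    for _ in range(iters):
        fit = [f(p) for p in pos]
        for k in range(n):
            if fit[k] < pbest_f[k]:
                pbest[k], pbest_f[k] = pos[k][:], fit[k]
        g = min(range(n), key=pbest_f.__getitem__)
        if pbest_f[g] < gbest_f:
            gbest, gbest_f, stall = pbest[g][:], pbest_f[g], 0
        else:
            stall += 1

        if stall >= stall_limit:
            # hypothetical re-initialisation against premature convergence:
            # scatter every particle except the one holding the best memory
            for k in range(n):
                if k != g:
                    pos[k] = [rng.uniform(lo, hi) for _ in range(dim)]
                    vel[k] = [0.0] * dim
            stall = 0
            continue

        f_min, f_avg = min(fit), sum(fit) / n
        for k in range(n):
            # hypothetical fitness-dependent inertia weight in [0.4, 0.9]
            if f_avg > f_min and fit[k] <= f_avg:
                w = 0.4 + 0.5 * (fit[k] - f_min) / (f_avg - f_min)
            else:
                w = 0.9
            vmax = w * (hi - lo) / 2  # velocity range shrinks together with w
            for d in range(dim):
                v = (w * vel[k][d]
                     + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                     + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-vmax, min(vmax, v))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
    return gbest, gbest_f


best_x, best_f = adaptive_pso(sphere, dim=2)
print(best_x, best_f)
```

A small weight lets already-good particles refine locally, while the re-initialization keeps the swarm from collapsing onto a single point; the global best is never discarded, so the returned value can only improve.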

8.
In this study, we present an approach to identify residues that act as pivot points for conformational changes between the open (unliganded) and closed (liganded) forms of a protein. First, an angle formed by four consecutive Cα atoms in the polypeptide backbone was introduced. The difference in this angle between equivalent residues of the open and closed forms was used to represent local torsional changes in the protein structure, and the residue with the largest angle difference was identified as a pivot residue. We demonstrate the method by identifying the pivot residues of five proteins: Lysozyme mutants, Lactoferrin, Lys/Arg/Orn-binding protein, Calmodulin and Catabolite gene activator protein. These pivot residues are located at the hinges of the proteins and serve as hinge points for domain motion. The examples also show that pivot residues are useful for distinguishing between shear motion and hinge motion in a protein.
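Assuming that the angle formed by four consecutive Cα atoms is the Cα pseudo-dihedral (torsion) angle, the procedure can be sketched in Python as below. The wrap-around handling of angle differences and the convention of assigning the angle over atoms i..i+3 to residue i+1 are assumptions; the function names are hypothetical, and the inputs are plain (x, y, z) coordinates of equivalent residues in the two forms.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, in [-180, 180]) defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, _cross(b1, b2))
    x = _dot(_cross(b0, b1), _cross(b1, b2))
    return math.degrees(math.atan2(y, x))


def ca_torsions(ca):
    """Pseudo-torsion over every window of four consecutive C-alpha atoms."""
    return [dihedral(*ca[i:i + 4]) for i in range(len(ca) - 3)]


def pivot_residue(ca_open, ca_closed):
    """Return (pivot residue index, per-window torsion changes in degrees)."""
    t_open, t_closed = ca_torsions(ca_open), ca_torsions(ca_closed)
    # wrap each difference into [-180, 180) before taking its magnitude
    deltas = [abs((a - b + 180.0) % 360.0 - 180.0) for a, b in zip(t_open, t_closed)]
    k = max(range(len(deltas)), key=deltas.__getitem__)
    return k + 1, deltas  # assumed convention: window i..i+3 -> residue i+1


open_form = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 0, 2)]
closed_form = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 0, 0)]
print(pivot_residue(open_form, closed_form))
```

In this toy pair only the last Cα moves, flipping the second pseudo-torsion from trans to cis, so that window is reported as the pivot.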

9.
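Biological sequence alignment plays an important role in bioinformatics research. Multiple sequence alignment is an NP-complete problem, so exact solutions cannot be obtained within practical time and memory limits. This paper briefly introduces the basic idea of the multiple sequence alignment algorithm proposed by Feng and Doolittle and improves the algorithm to achieve better alignment accuracy. Experimental results show that the new algorithm is effective against the local-optimum problem encountered by ordinary progressive multiple sequence alignment methods.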

10.
Class I phosphoinositide (PI) 3-kinases act through effector proteins whose 3-PI selectivity is mediated by a limited repertoire of structurally defined lipid recognition domains. We describe here the lipid preferences and crystal structure of a new class of PI binding modules exemplified by select IQGAPs (IQ motif containing GTPase-activating proteins), which are known to coordinate cellular signaling events and cytoskeletal dynamics. This module is defined by a C-terminal region of 105-107 amino acids; the corresponding region of IQGAP1 and -2, but not of IQGAP3, binds preferentially to phosphatidylinositol 3,4,5-trisphosphate (PtdInsP(3)). Its binding affinity for PtdInsP(3) and its other, secondary target-recognition characteristics are comparable with those of the pleckstrin homology domain of cytohesin-3 (general receptor for phosphoinositides 1), an established PtdInsP(3) effector protein. Importantly, the IQGAP1 C-terminal domain and the cytohesin-3 pleckstrin homology domain, each tagged with enhanced green fluorescent protein, were both relocalized from the cytosol to the cell periphery following activation of PI 3-kinase in Swiss 3T3 fibroblasts, consistent with their common, selective recognition of endogenous 3-PI(s). The crystal structure of the C-terminal IQGAP2 PI binding module reveals unexpected topological similarity to an integral fold of C2 domains, including a putative basic binding pocket. We propose that this module integrates select IQGAP proteins with PI 3-kinase signaling and constitutes a novel, atypical phosphoinositide binding domain that may be the first of a larger group, each member perhaps structurally unique but collectively dissimilar from the known PI recognition modules.

11.
Tusnády GE, Sarkadi B, Simon I, Váradi A. FEBS Letters, 2006, 580(4):1017-1022
In this review, we summarize the currently available information on the membrane topology of some key members of the human ABC protein subfamilies and present their predicted domain arrangements. In the absence of high-resolution structures for eukaryotic ABC transporters, this topology is based solely on prediction algorithms and on biochemical data on the location of various segments of the polypeptide chain relative to the membrane. We suggest that topology models generated by the available prediction methods be used only as guidelines, providing a basis for experimental strategies to elucidate membrane topology.

12.

Background

The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees.

Results

The space generated from pairwise similarities classifies neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that correspond significantly with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex group with pyramidal axons, separately from the (more topologically symmetric) axons of perisomatic-targeting interneurons.

Conclusions

Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporation of additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information respectively revealing global and local features.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0605-1) contains supplementary material, which is available to authorized users.

13.
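Evaluation of the negative air ion effects of different urban vegetation configuration types (Total citations: 26; self-citations: 2; citations by others: 26)
Negative air ion concentrations were measured in the urban area, suburban green spaces and farmland development zones of Nanning to identify differences in the negative-ion effects of different vegetation configuration types. The results show that, in terms of negative air ions, air quality ranks as follows: large-scale suburban green space > farmland development zone > urban area; and multi-layer vegetation structures (tree-shrub-grass) > simple structures (tree-shrub, tree-grass, shrub-grass) > single-component structures (lawn, sparse trees, sparse shrub-grass). Streams and waterfalls markedly increased negative-ion concentrations. Negative air ion concentration tended to rise with increasing elevation and canopy closure, and it fluctuated to some extent with the seasons.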

14.
Using a laser confocal microscope, chromatin arrangements in intact interphase nuclei were investigated in four plant species. Chromosomes in these plants have specific segments that can be stained with the fluorescent dye chromomycin A3 (CMA). We stained centromeres in Hordeum vulgare, sub-telomeric regions in Secale cereale, satellites in Chrysanthemum multicore, and the satellites and short arms of satellite-bearing chromosomes in Hemerocallis middendorfii. The following points were shown: (1) In mitotic interphase nuclei, the centromere and the telomeres of both arms touched the nuclear membrane and showed evident polarity; some CMA-bodies in sub-telomeric regions did not contact the nuclear membrane. (2) Differentiated nuclei had a non-random organization: chromosome polarity was maintained, but the chromosomes lay far from the nuclear membrane. (3) Associations of sub-telomeric regions in interphase nuclei of Secale cereale were probably due to the association of heterochromatic regions carrying identical repeated sequences rather than to telomere associations. (4) In interphase nuclei of Chrysanthemum multicore, satellites fused during interphase.

15.
A frequency domain approach and a time domain approach have been combined in an investigation of the behaviour of the primary and secondary endings of an isolated muscle spindle in response to the activity of two static fusimotor axons when the parent muscle is held at a fixed length and when it is subjected to random length changes. The frequency domain analysis has an associated error process which provides a measure of how well the input processes can be used to predict the output processes and is also used to specify how the interactions between the recorded processes contribute to this error. Without assuming stationarity of the input, the time domain approach uses a sequence of probability models of increasing complexity in which the number of input processes to the model is progressively increased. This feature of the time domain approach was used to identify a preferred direction of interaction between the processes underlying the generation of the activity of the primary and secondary endings. In the presence of fusimotor activity and dynamic length changes imposed on the muscle, it was shown that the activity of the primary and secondary endings carried different information about the effects of the inputs imposed on the muscle spindle. The results presented in this work emphasise that the analysis of the behaviour of complex systems benefits from a combination of frequency and time domain methods. This article is part of a special issue on Neuronal Dynamics of Sensory Coding.

16.
17.
Summary: Rhodopsins share a limited number of amino acid identities with a variety of other integral membrane proteins. Most of these proteins have seven putative transmembrane segments and are likely to play a role in transmembrane signaling. We have undertaken a systematic series of comparisons of primary and secondary structure in order to clarify the functional and evolutionary significance of these sequence similarities. On the basis of consistently high similarity scores, we find that the most internally consistent definition of the rhodopsin gene family would include vertebrate rhodopsins, α- and β-adrenergic receptors, M1 and M2 muscarinic acetylcholine receptors, substance K receptors and insect rhodopsins, while excluding bacteriorhodopsin, the mas human oncogene, vertebrate and insect nicotinic acetylcholine receptors, and the yeast STE2 and STE3 peptide receptors. The rhodopsin gene family is highly diverged at the primary sequence level but has maintained a conserved secondary structure, including a previously unidentified hierarchy of transmembrane segment hydrophobicity. We have developed new computer algorithms for progressive multiple sequence alignment and for the analysis of local conservation of protein domains, and we have used these algorithms to examine the phylogeny of the rhodopsin gene family and the changing domains of sequence conservation. The results show striking differences and similarities in the conserved domains in each of the three main branches of the rhodopsin gene family, and indicate that color vision arose independently in the lines of descent leading to modern humans and fruit flies.

18.
A fuzzy clustering method for recognizing protein domains is presented. The algorithm identifies domains globally. It was tested on a set of protein structures: of 219 proteins, 66.7% yielded results that agreed with the reference definitions, 30.6% showed minor differences, and only 2.7% (six proteins) showed major differences from the reference. The new method is more than 20 times faster than previous algorithms. Received: 9 November 1998 / Revised version: 20 December 1999 / Accepted: 20 December 1999

19.
The zinc finger associated domain (ZAD), present in almost 100 distinct proteins, characterizes the largest subgroup of C2H2 zinc finger proteins in Drosophila melanogaster and was initially found to be encoded only by arthropod genomes. Here, we report that the ZAD was also present in the last common ancestor of arthropods and vertebrates, and that vertebrate genomes contain a single conserved gene that codes for a ZAD-like peptide. Comparison of the ZAD proteomes of several arthropod species revealed an extensive, species-specific expansion of ZAD-coding genes in higher holometabolous insects, and showed that only a few ZAD-coding genes with essential functions in Drosophila melanogaster are conserved. Furthermore, at least 50% of the ZAD-coding genes of Drosophila melanogaster are expressed in the female germline, suggesting a function in oocyte development and/or a requirement during early embryogenesis. Since the majority of the essential ZAD-coding genes of Drosophila melanogaster were not conserved during arthropod, or at least insect, evolution, we propose that the lineage-specific expansion (LSE) of ZAD-coding genes shown here may provide the raw material for the evolution of new functions that allow organisms to pursue novel evolutionary paths.

20.
Members of the superfamily of G-protein-coupled neurotransmitter receptors have a conserved secondary structure, a moderate and reasonably steady rate of sequence change, and usually lack introns within the coding sequence. These properties are advantageous for evolutionary studies. The duplication and divergence of the genes in this gene family led to the formation of distinct neurotransmitter pathways and may have facilitated the evolution of complex nervous systems. I have analyzed this evolutionary divergence by quantitative multiple sequence alignment, bootstrap resampling, and statistical analysis of 49 adrenergic, muscarinic cholinergic, dopamine, and octopamine receptor sequences from 12 animal species. The results indicate that the first event to occur within this gene family was the divergence of the catecholamine receptors from the muscarinic acetylcholine receptors, which occurred prior to the divergence of the arthropod and vertebrate lineages. Subsequently, the ability to activate specific second-messenger pathways diverged independently in both the muscarinic and the catecholamine receptors. This appears to have occurred after the divergence of the arthropod and vertebrate lineages but before the divergence of the avian and mammalian lineages. However, the second-messenger pathways activated by adrenergic and dopamine receptors did not diverge independently. Rather, the ability of the catecholamine receptors to bind to specific ligands, such as epinephrine, norepinephrine, dopamine, or octopamine, was repeatedly modified in evolutionary history, and in some cases was modified after the divergence of the second-messenger pathways.
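Bootstrap resampling of a multiple sequence alignment is usually done by sampling alignment columns with replacement, building a tree from each replicate, and counting how often each grouping recurs. A minimal sketch of the resampling step only (tree building is not shown; the function name and the toy alignment are hypothetical):

```python
import random


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # every row uses the same column indices, so sites stay aligned across sequences
    return ["".join(row[c] for c in cols) for row in alignment]


aln = ["MKV-LA", "MRVQLA", "MKIQLS"]
rng = random.Random(42)
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
print(replicates[0])
```

Because whole columns are resampled, each replicate keeps the alignment's width and its per-site homology, and only the weighting of sites changes between replicates.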

