Similar articles
20 similar articles found (search time: 15 ms)
1.
Improved sequence alignment at low pairwise identity is important for identifying potential remote homologues in database searches and for obtaining accurate alignments as a prelude to modeling structures by homology. Our work is motivated by two observations: structural data provide superior training examples for developing techniques to improve the alignment of remote homologues; and general substitution patterns for remote homologues differ from those of closely related proteins. We introduce a new set of amino acid residue interchange matrices built from structural superposition data. These matrices exploit known structural homology as a means of characterizing the effect evolution has on residue-substitution profiles. Given their origin, it is not surprising that the individual residue-residue interchange frequencies are chemically sensible. The structural interchange matrices show a significant increase both in pairwise alignment accuracy and in functional annotation/fold recognition accuracy across distantly related sequences. We demonstrate improved pairwise alignment by using superpositions of homologous domains extracted from a structural database as a gold standard and go on to show an increase in fold recognition accuracy using a database of homologous fold families. This was applied to the unassigned open reading frames from the genome of Helicobacter pylori to identify five matches, two of which are not represented by new annotations in the sequence databases. In addition, we describe a new cyclic permutation strategy to identify distant homologues that experienced gene duplication and subsequent deletions. Using this method, we have identified a potential homologue to one additional previously unassigned open reading frame from the H. pylori genome.
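The abstract does not say how the cyclic permutation strategy works, so the following is only a minimal sketch of one plausible reading: try every rotation of a query sequence against a target and keep the best-scoring one. The function names and the toy identity scorer are hypothetical and stand in for a real alignment scorer.

```python
def cyclic_permutations(seq):
    """Yield every rotation of seq, starting with seq itself."""
    for i in range(len(seq)):
        yield seq[i:] + seq[:i]


def best_rotation(query, target, score):
    """Return (best_score, rotation) for the rotation of query that scores highest.

    `score` is any caller-supplied function of two sequences; it is an
    assumption of this sketch, not the scoring used in the paper.
    """
    return max(((score(rot, target), rot) for rot in cyclic_permutations(query)),
               key=lambda pair: pair[0])


def identity_count(a, b):
    """Toy scorer: number of identical positions in an ungapped comparison."""
    return sum(x == y for x, y in zip(a, b))
```

For example, `best_rotation("CDAB", "ABCD", identity_count)` picks the rotation `"ABCD"`, which a linear comparison of the unrotated query would miss.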

2.
With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. 
Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracy and computational cost.

3.
Dickson RJ  Gloor GB 《PloS one》2012,7(6):e37645
The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High quality alignments are difficult to build and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate the causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/.

4.
Determining structural similarities between proteins is an important problem since it can help identify functional and evolutionary relationships. In this paper, an algorithm is proposed to align two protein structures. Given the protein backbones, the algorithm finds a rigid motion of one backbone onto the other such that large substructures are matched. The algorithm uses a representation of the backbones that is independent of their relative orientations in space and applies dynamic programming to this representation to compute an initial alignment, which is then refined iteratively. Experiments indicate that the algorithm is competitive with two well-known algorithms, namely DALI and LOCK.

5.
A new set of DNA base-nucleic acid codes and their hypercomplex number representation are introduced to take the probability of each nucleotide fully into account. A new scoring system is proposed to suit the hypercomplex number representation of the DNA base-nucleic acid codes and is combined with dot matrix analysis and various sequence alignment algorithms. The problem of DNA sequence alignment can then be handled in much the same way as pairwise alignment of protein sequences.

6.
Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution were conducted, and results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and than database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH on a benchmark database, but with pretrained PSSMs, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.

7.
Identifying common local segments, also called motifs, in multiple protein sequences plays an important role in establishing homology between proteins. Homology is easy to establish when sequences are similar (sharing an identity > 25%). However, for distant proteins, it is much more difficult to align motifs that are not similar in sequence but still share common structures or functions. This paper is a first attempt to align multiple protein sequences using both primary and secondary structure information. A new sequence model is proposed that assigns high probabilities not only to motifs that contain conserved amino acids but also to motifs that present common secondary structures. The proposed method is tested on the structural alignment database BAliBASE. We show that information brought by the predicted secondary structures greatly improves motif identification. A website of this program is available at www.stat.purdue.edu/~junxie/2ndmodel/sov.html.

8.
MOTIVATION: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) is a major challenge in gene finding today, as these are often conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to detect pairs of genes with low sequence similarity, where the genes are part of a larger genomic region. RESULTS: Here we present such an approach for pairwise local alignment which is based on foldalign and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new foldalign implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and where the ncRNAs and eleRNAs are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign, and the performances are very similar, but foldalign is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. AVAILABILITY: The program is available online at http://foldalign.kvl.dk/

9.
Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs relies on the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thus further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family, since fuzzy sets can handle uncertainties better than classical methods.
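For readers unfamiliar with the forward variable mentioned above, here is a minimal sketch of the classical (non-fuzzy) forward recursion for a plain discrete HMM. It is not the profile HMM with match, insert, and delete states, and it is not the fuzzy variant proposed in the paper; the parameter names `start`, `trans`, and `emit` are assumptions of this sketch.

```python
def forward_probability(obs, start, trans, emit):
    """P(obs) under a discrete HMM via the forward recursion.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n = len(start)
    # Initialisation: alpha_1(i) = start_i * b_i(o_1)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    # Termination: sum over the final states
    return sum(alpha)
```

The products inside the recursion are where the independence assumption enters: each step multiplies transition and emission probabilities as if they were independent given the state.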

10.

Background  

Structural alignment of RNAs has become increasingly important since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized.

11.

Background  

Sequence alignment is one of the most important techniques for analyzing biological systems. However, current alignment methods are far from perfect, and more accurate methods are still needed. In particular, the alignment of homologous sequences with low sequence similarity is not yet satisfactory. Most recent methods for aligning protein sequences use an empirically determined measure. Typically, the measure is defined as a combination of two quantities: (1) the sum of substitution scores between two residue segments, and (2) the sum of gap penalties in insertion/deletion regions. Such a measure is determined on the assumption that there is no intersite correlation in the sequences. In this paper, we improve the alignment by taking into account the correlation between consecutive residues.
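The conventional measure described above, substitution scores minus gap penalties, can be sketched as follows. This is an illustration only: the affine gap convention (the opening cost covers the first gap position, each further position costs `gap_extend`), the default penalty values, and the dictionary-based substitution table are all assumptions of this sketch, and it scores each column independently, which is exactly the no-correlation assumption the paper sets out to relax.

```python
def alignment_score(row_a, row_b, substitution, gap_open=10, gap_extend=1):
    """Score two equal-length aligned rows, where '-' marks a gap.

    score = sum of substitution scores over aligned residue pairs
            - sum of affine gap penalties over each run of gaps
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_in = None  # which row the current gap run is in, if any
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Opening a new run costs gap_open; continuing it costs gap_extend
            score -= gap_extend if gap_in == side else gap_open
            gap_in = side
        else:
            score += substitution[(x, y)]
            gap_in = None
    return score
```

With a toy table that gives +2 for a match and -1 for a mismatch, and `gap_open=3`, the alignment `AC-GT` over `ACCGA` scores 2 + 2 - 3 + 2 - 1 = 2.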

12.
Alpha-amylase is an enzyme of great significance to industry, but most alpha-amylases are unstable at lower pH. In this paper, we study the dipeptides related to, and characteristic of, the optimal pH of alpha-amylase. The analysis gives the following results: (1) Ten dipeptides are associated with the optimal pH of alpha-amylase. AH, DV, EH, HR, and YV are positively correlated; AM, IC, NG, NL, and PS are negatively correlated. (2) GE, RE, GS, and KS are higher pH alpha-amylase characteristic dipeptides; AS, GS, DY, and GI are high pH alpha-amylase characteristic dipeptides; TE, VR, DS, and ET are middle pH alpha-amylase characteristic dipeptides; DK, NT, PT, and RV are low pH alpha-amylase characteristic dipeptides; AT, DS, GR, and SR are lower pH alpha-amylase characteristic dipeptides.
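The abstract does not describe how dipeptide composition was computed. A common starting point, shown here purely as an assumed sketch, is to count overlapping residue pairs in each sequence and convert the counts to relative frequencies, which can then be correlated with optimal pH.

```python
from collections import Counter


def dipeptide_frequencies(sequence):
    """Return the relative frequency of each overlapping dipeptide."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}
```

For the toy sequence `AHAH` the overlapping pairs are AH, HA, AH, so AH has frequency 2/3 and HA has frequency 1/3.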

13.
14.
We have analyzed sequence covariation in an alignment of 266 non-redundant SH3 domain sequences using chi-squared statistical methods. Artifactual covariations arising from close evolutionary relationships among certain sequence subgroups were eliminated using empirically derived sequence diversity thresholds. This covariation detection method was able to predict residue-residue contacts (side-chain centres of mass within 8 Å) in the structure of the SH3 domain with an accuracy of 85%, which is greater than that achieved in many previous covariation studies. In examining the positions involved most frequently in covariations, we discovered a dramatic over-representation of a subset of five hydrophobic core positions. This covariation information was used to design second and third site substitutions that could compensate for highly destabilizing hydrophobic core substitutions in the Fyn SH3 domain, thus providing experimental data to validate the covariation analysis. The testing of our covariation detection method on 15 other alignments showed that the accuracy of contact prediction is highly variable depending on which sequence alignment is used, and useful levels of prediction accuracy were obtained with only approximately one-third of alignments. The results presented here provide insight into the difficulties inherent in covariation analysis, and suggest that it may have limited usefulness in tertiary structure prediction. On the other hand, our ability to use covariation analysis to design stabilizing combinations of hydrophobic core substitutions attests to its potential utility for gaining deeper insight into the stability determinants and functional mechanisms of proteins with known three-dimensional structures.
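As an illustration of the kind of statistic involved, the sketch below computes a plain Pearson chi-squared value for two alignment columns. It is an assumption that this matches the authors' statistic in detail, and it omits the sequence diversity thresholds the abstract describes.

```python
from collections import Counter


def chi_squared(col_i, col_j):
    """Pearson chi-squared statistic for two alignment columns.

    col_i and col_j hold the residue each sequence has at the two positions.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_i)
    observed = Counter(zip(col_i, col_j))
    count_i, count_j = Counter(col_i), Counter(col_j)
    stat = 0.0
    for a in count_i:
        for b in count_j:
            # Expected pair count if the two positions varied independently
            expected = count_i[a] * count_j[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat
```

Two perfectly coupled columns such as `AABB` and `CCDD` give a statistic of 4.0, while the uncoupled pair `AABB` and `CDCD` gives 0.0.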

15.
A Hybrid Pairwise Likelihood Method   Total citations: 3; self-citations: 0; citations by others: 3
Kuk  Anthony Y. C. 《Biometrika》2007,94(4):939-952
A modification to the pairwise likelihood method is proposed, which aims to improve the estimation of the marginal distribution parameters. This is achieved by replacing the pairwise likelihood score equations, for estimating such parameters, by the optimal linear combinations of the marginal score functions. A further advantage of the proposed estimator of marginal parameters, over pairwise likelihood, is that it is robust to misspecification of the bivariate distributions as long as the univariate marginal distributions are correctly specified. While alternating logistic regression can be seen as a special case of the proposed method, it is shown that an existing generalization of alternating logistic regression applicable to ordinal data is not the same as and is inferior to the proposed method because it replaces certain conditional densities by pseudodensities that assume working independence. The fitting of the multivariate negative binomial distribution is another scenario involving intractable likelihood that calls for the use of pairwise likelihood methods, and the superiority of the modified method is demonstrated in a simulation study. Two examples, based on the analyses of salamander mating and patient-controlled analgesia data, demonstrate the usefulness of the proposed method. The possibility of combining optimally the pairwise, rather than marginal, scores is also considered and its difficulty and potential are discussed.

16.
A method for incorporating dipolar coupling restraints into structure calculations is described which follows closely on methodology that has been recently presented for orienting peptide planes using dipolar couplings [Mueller et al. (2000) J. Mol. Biol., 300, 197–212] and is specifically developed for use in cases of an axially symmetric alignment tensor. Modeling studies on an all α-helical protein, farnesyl diphosphate synthase, establish the utility of the approach. A global fold of the 370-residue maltose binding protein in complex with β-cyclodextrin is obtained from experimentally derived restraints. The average pairwise rmsd values between the N- and C-terminal domains in this NMR structure and the corresponding regions in the X-ray structure of the protein are 2.8 and 3.1 Å, respectively.

17.
Pairwise curve synchronization for functional data   Total citations: 1; self-citations: 0; citations by others: 1
Tang  Rong; Muller  Hans-Georg 《Biometrika》2008,95(4):875-889
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis.

18.
19.
Spite in Hamilton's sense is defined as the willingness to harm oneself in order to harm another more. The standard replicator dynamic predicts that evolutionarily stable strategies are payoff-maximizing equilibria of the underlying game, and hence rules out the evolution of spiteful behavior. We propose a modified replicator dynamic, where selection is based on local outcomes, rather than on the population 'state', as in standard models. We show that under this new model spite can evolve readily. The new dynamic suggests conditions under which spite in animals might be found.
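For reference, the standard replicator dynamic that the abstract contrasts with can be sketched as below. This is only the textbook baseline, integrated with a simple Euler step; the paper's modified, locally based dynamic is not specified in the abstract and is not attempted here. The function names, step size, and step count are assumptions of this sketch.

```python
def replicator_step(x, payoff, dt=0.1):
    """One Euler step of the standard replicator dynamic.

    x[i]         : population share of strategy i (shares sum to 1)
    payoff[i][j] : payoff to strategy i when it meets strategy j
    """
    n = len(x)
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    # dx_i/dt = x_i * (f_i - mean fitness)
    return [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]


def simulate(x, payoff, steps=2000, dt=0.1):
    """Iterate replicator_step and return the final population shares."""
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x
```

In a prisoner's dilemma payoff table such as `[[3, 0], [5, 1]]`, the second strategy earns more against every opponent, so its share grows toward 1 from any mixed starting point. Selection here depends only on payoffs relative to the population average, which is why this baseline leaves no room for spite.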

20.
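Progress in the research and development of amino acid dipeptide sweeteners   Total citations: 11; self-citations: 0; citations by others: 11
Fan Changsheng 《Industrial Microbiology》2002,32(2):37-40,51
This paper reviews the research and development of the dipeptide sweeteners aspartame, alitame, and neotame, which are produced from amino acids, and gives an overview of the applications and market status of these three sweeteners.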
