1.
The protein sequence database was analyzed for evidence that some distinct sequence families might be distantly related in evolution through changes in the frame of translation. Sequences were compared using special amino acid substitution matrices for the alternate frames of translation. The statistical significance of alignment scores was computed in the true database and in shuffled versions of the database that preserve any potential codon bias. Comparing the results from these two databases provides a very sensitive method for detecting remote relationships. We find a weak but measurable relatedness within the database as a whole, supporting the notion that some proteins may have evolved from others through changes in frame of translation. We also quantify residual homology in the ordinary sense within a database of generally unrelated sequences.
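To make "alternate frames of translation" and a codon-bias-preserving shuffle concrete, here is a minimal sketch. It assumes the standard genetic code and in-frame DNA input; `translate` and `codon_shuffle` are hypothetical helpers, and neither the frame-specific substitution matrices nor the study's exact shuffling procedure is reproduced.

```python
import random

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at offset `frame` (0, 1 or 2).

    A trailing partial codon is dropped; codons containing non-ACGT
    characters are translated as 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def codon_shuffle(dna: str, seed: int = 0) -> str:
    """Permute whole codons of an in-frame sequence so codon usage is unchanged.

    A trailing partial codon, if any, is dropped.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    random.Random(seed).shuffle(codons)
    return "".join(codons)


if __name__ == "__main__":
    seq = "ATGGCCATTGTAATGGGCCGC"
    for f in range(3):
        print(f, translate(seq, f))
```

Shuffling whole codons rather than single nucleotides is one simple way to keep codon usage fixed while destroying sequence order.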
2.
MOTIVATION: Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed and have been shown to improve sensitivity and alignment quality compared with sequence-sequence methods (such as BLAST) and profile-sequence methods (e.g. PSI-BLAST). Here we present a new approach to profile-profile alignment we call Comparison of Alignments by Constructing Hidden Markov Models (HMMs) (COACH). COACH aligns two multiple sequence alignments by constructing a profile HMM from one alignment and aligning the other to that HMM. RESULTS: We compare the alignment accuracy of COACH with two recently published methods: Yona and Levitt's prof_sim and Sadreyev and Grishin's COMPASS. On two sets of reference alignments selected from the FSSP database, we find that COACH is able, on average, to produce alignments giving the best coverage or the fewest errors, depending on the chosen parameter settings. AVAILABILITY: COACH is freely available from www.drive5.com/lobster
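COACH starts by turning one alignment into a profile HMM. As a much simpler, hypothetical illustration of where a profile's position-specific information comes from, the sketch below estimates per-column residue frequencies from an MSA with a flat pseudocount. It omits the insert/delete states and transition probabilities of a real profile HMM and is not COACH's code.

```python
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_profile(msa: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column residue frequencies from an MSA.

    Gaps ('-') and non-standard letters are ignored, and a flat pseudocount
    is added to every amino acid so that no probability is zero.
    """
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for j in range(length):
        counts = Counter(s[j] for s in msa if s[j] != "-")
        total = sum(counts[a] for a in AMINO) + pseudocount * len(AMINO)
        profile.append({a: (counts[a] + pseudocount) / total for a in AMINO})
    return profile
```

For the toy alignment `["ACD", "A-D", "GCD"]`, the first column gives A a probability of (2 + 1) / (3 + 20) = 3/23, and every column sums to 1.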
3.
Daria V. Dibrova, Kirill A. Konovalov, Vadim V. Perekhvatov, Konstantin V. Skulachev, Armen Y. Mulkidjanian. Biology Direct, 2017, 12(1):29
Background
The Clusters of Orthologous Groups (COGs) of proteins systematize evolutionarily related proteins into specific groups with similar functions. However, the available databases do not provide a means to assess the extent of similarity between the COGs.
Aim
We intended to provide a method for identifying and visualizing evolutionary relationships between the COGs, as well as a corresponding web server.
Results
Here we introduce the COGcollator, a web tool for identification of evolutionarily related COGs and their further analysis. We demonstrate the utility of this tool by identifying the COGs that contain distant homologs of (i) the catalytic subunit of bacterial rotary membrane ATP synthases and (ii) the DNA/RNA helicases of superfamily 1.
Reviewers
This article was reviewed by Drs. Igor N. Berezovsky, Igor Zhulin and Yuri Wolf.
4.
It is suspected that correlated motions among a subset of spatially separated residues drive conformational dynamics not only in multidomain but also in single-domain proteins. Sequence- and structure-based methods have been proposed to determine covariation between two sites on a protein. The statistical coupling analysis (SCA), which compares the changes in probability at two sites in a multiple sequence alignment (MSA) and in a subset of the MSA, has been used to infer the network of residues that encodes allosteric signals in protein families. The structural perturbation method (SPM), which probes the response of all other sites to a local perturbation, has been used to probe the allosteric wiring diagram in biological machines and enzymes. To assess the efficacy of the SCA, we used an exactly soluble two-dimensional lattice model and performed double-mutant cycle (DMC) calculations to predict the extent of physical coupling between two sites. Comparing the SCA predictions with the DMC results shows that only residues that are in contact in the native state are accurately identified. In addition, covariations among strongly interacting residues are most easily identified by the SCA. These conclusions are consistent with the DMC experiments on the PDZ family. Good correlation between the SCA and the DMC is obtained only by performing multiple experiments that vary the nature of amino acids at a given site. In contrast, the energetic coupling found in experiments for the PDZ domain is recovered using the SPM. We also predict, using the SPM, several residues that are coupled energetically. Proteins 2009. © 2009 Wiley-Liss, Inc.
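The double-mutant cycle used above as the reference measure of coupling reduces to simple arithmetic on four free energies. The sketch below assumes one common convention (each mutational effect is measured relative to wild type, and the coupling is the non-additive remainder); sign conventions differ between papers, and `coupling_energy` is a hypothetical helper.

```python
def coupling_energy(dg_wt: float, dg_a: float, dg_b: float, dg_ab: float) -> float:
    """Double-mutant-cycle coupling energy between sites A and B.

    Each argument is a stability (or binding) free energy of the wild type,
    the two single mutants and the double mutant. The result is
        ddG_int = (dG_ab - dG_wt) - (dG_a - dG_wt) - (dG_b - dG_wt),
    i.e. the part of the double mutant's effect not explained by adding the
    two single-mutant effects. A value near zero means the sites act independently.
    """
    return (dg_ab - dg_wt) - (dg_a - dg_wt) - (dg_b - dg_wt)
```

With energies of -5.0, -4.0, -4.5 and -3.5 (wild type, A, B, AB) the effects are purely additive and the coupling is 0; changing the double mutant to -4.5 gives a coupling of -1.0.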
5.
We have examined the merits of the three functions based on amino acid composition that have been proposed to indicate the similarity in amino acid sequence of two proteins: the difference index, the composition divergence and the composition coefficient. We have taken the amino acid compositions and sequences of 41 cytochromes c and used the 820 values from all possible pairwise comparisons in the evaluation. We conclude that the functions do have a limited value in predicting proteins that are closely related in sequence and that the three functions are equivalent in this predictive ability. We have used the composition divergence values obtained from available pyruvate kinase amino acid compositions to generate a phylogenetic tree for this glycolytic enzyme.
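The abstract does not give the formulas of the difference index, composition divergence or composition coefficient, so the sketch below uses a hypothetical stand-in (half the summed absolute difference in mole percent) purely to show the all-pairs evaluation. With 41 sequences there are 41 × 40 / 2 = 820 unordered pairs, which matches the 820 values quoted above.

```python
from itertools import combinations


def composition_difference(x: dict[str, float], y: dict[str, float]) -> float:
    """Illustrative composition distance (NOT one of the three published functions).

    Inputs map amino acid -> mole percent. Returns half the summed absolute
    difference: 0 for identical compositions, 100 for non-overlapping ones.
    """
    keys = set(x) | set(y)
    return 0.5 * sum(abs(x.get(k, 0.0) - y.get(k, 0.0)) for k in keys)


def all_pairs(compositions: dict[str, dict[str, float]]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of named compositions once."""
    return {
        (a, b): composition_difference(compositions[a], compositions[b])
        for a, b in combinations(sorted(compositions), 2)
    }
```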
6.
7.
We develop a new threading algorithm, MUSTER, by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms that can be conveniently used in a dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins, which performs better than conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training set. After removing homologous templates with a sequence identity to the target >30%, the first template alignment has the correct topology (TM-score >0.5) in 224 cases. Even with a more stringent cutoff, removing templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER identifies correct folds in 137 cases with a first model of TM-score >0.5. Depending on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed-rank test (P-value < 1.0 × 10^-13), which demonstrates the effect of additional structural information on protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
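Folding several single-body terms into one score that a dynamic-programming aligner can consume can be sketched as a weighted sum per query/template position pair. The feature functions, weights and linear gap penalty below are hypothetical, and a plain global (Needleman-Wunsch) recursion is used for brevity; MUSTER's actual terms, gap model and grid-searched weights are not reproduced.

```python
from typing import Callable, Sequence

# A single-body term: score for aligning query position i to template position j.
Term = Callable[[int, int], float]


def combined_score(terms: Sequence[Term], weights: Sequence[float]) -> Term:
    """Fold several single-body terms into one position-pair score (weighted sum)."""
    def score(i: int, j: int) -> float:
        return sum(w * t(i, j) for w, t in zip(weights, terms))
    return score


def global_align_score(n: int, m: int, score: Term, gap: float = -1.0) -> float:
    """Needleman-Wunsch score with a linear gap penalty over an arbitrary score(i, j)."""
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(i - 1, j - 1),  # align i with j
                           dp[i - 1][j] + gap,                      # gap in template
                           dp[i][j - 1] + gap)                      # gap in query
    return dp[n][m]


if __name__ == "__main__":
    query_seq, templ_seq = "ACDE", "ACE"
    query_ss, templ_ss = "HHCC", "HHC"  # hypothetical secondary-structure strings
    score = combined_score(
        [lambda i, j: 1.0 if query_seq[i] == templ_seq[j] else -1.0,  # sequence term
         lambda i, j: 1.0 if query_ss[i] == templ_ss[j] else 0.0],    # SS agreement term
        [1.0, 0.5],
    )
    print(global_align_score(len(query_seq), len(templ_seq), score))
```

Because every term depends only on one query position and one template position, the combined score slots into the standard recursion unchanged, which is what makes single-body terms convenient for dynamic programming.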
8.
We present FORTE, a profile-profile comparison tool for protein fold recognition. Users can submit a protein sequence to explore the possibilities of structural similarity existing in known structures. Results are reported via email in the form of pairwise alignments.
9.
Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships. Total citations: 3 (self-citations: 0, citations by others: 3)
Espadaler J, Romero-Isart O, Jackson RM, Oliva B. Bioinformatics (Oxford, England), 2005, 21(16):3360-3368
MOTIVATION: Given that the association and dissociation of protein molecules are crucial in most biological processes, several in silico methods have recently been developed to predict protein-protein interactions. Structural evidence has shown that interacting pairs of close homologs (interologs) usually interact physically in the same way. Moreover, conservation of an interaction depends on conservation of the interface between the interacting partners. In this article we make use of both structural similarities among domains of known interacting proteins found in the Database of Interacting Proteins (DIP) and conservation of pairs of sequence patches involved in protein-protein interfaces to predict putative interacting protein pairs. RESULTS: We have obtained a large number of putative protein-protein interactions (approximately 130,000). The list is independent of other techniques, both experimental and theoretical. We separated the list of predictions into three sets according to their relationship with known interacting proteins found in DIP. For each set, only a small fraction of the predicted protein pairs could be independently validated by cross-checking with the Human Protein Reference Database (HPRD). The fraction of validated protein pairs was always larger than that expected for random protein pairs. Furthermore, a correlation map of interacting protein pairs was calculated with respect to molecular function, as defined in the Gene Ontology database. It shows good consistency of the predicted interactions with data in the HPRD database. The intersection between the lists of interactions from other methods and ours produces a network of potentially high-confidence interactions.
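The interolog idea behind these predictions can be sketched as homology transfer over a set of known interactions. This minimal sketch assumes homolog assignments are already available as a dictionary; it leaves out the structural-domain similarity and interface-patch conservation that the method actually relies on, and `transfer_interactions` is a hypothetical name.

```python
def transfer_interactions(known_pairs: set[tuple[str, str]],
                          homologs: dict[str, set[str]]) -> set[frozenset[str]]:
    """Interolog-style transfer: if A-B interact, A2 is a homolog of A and
    B2 a homolog of B, predict A2-B2. Pairs already known are not reported."""
    known = {frozenset(p) for p in known_pairs}  # unordered view of known pairs
    predicted: set[frozenset[str]] = set()
    for a, b in known_pairs:
        for a2 in homologs.get(a, set()):
            for b2 in homologs.get(b, set()):
                candidate = frozenset((a2, b2))
                if candidate not in known:
                    predicted.add(candidate)
    return predicted
```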
10.
STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments.
11.
T. C. Elleman. Journal of Molecular Evolution, 1978, 11(2):143-161
Summary: A computer method has been devised for detecting homology between two protein or nucleic acid sequences that require insertions or deletions for optimum alignment. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random), a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected because their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions.
Application of this optimum match method of sequence comparison shows that the sensitivity of the maximum match method of Needleman and Wunsch (1970) decreases quite dramatically for sequence comparisons that require only a few gaps for a reasonable alignment, or when sequences differ greatly in length.
The maximum match method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The constrained match method of Sankoff and Cedergren (1973) is seen to be misleading, since large increments in the alignment score from added gaps do not necessarily result in the high total alignment score required to demonstrate sequence homology.
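The core of the Monte Carlo test, comparing the real alignment score with the scores of composition-preserving shuffles, can be sketched as follows. This is a simplification: it uses one fixed linear gap penalty instead of Elleman's gap-by-gap optimisation over per-gap-count score distributions, and the scoring values (+1 match, -1 mismatch, -2 gap) are arbitrary assumptions.

```python
import random


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty),
    computed row by row to keep memory linear in len(b)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def monte_carlo_pvalue(a: str, b: str, n_shuffles: int = 99, seed: int = 0) -> float:
    """Estimate how often a shuffled `b` (same composition) aligns to `a` at least
    as well as the real `b`. The +1 terms keep the estimate strictly above zero."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    hits = 0
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)
        if nw_score(a, "".join(letters)) >= real:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)
```

With 99 shuffles the smallest attainable estimate is 0.01, so the number of shuffles sets the resolution of the test.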
12.
13.
There are currently at least nine distinct glycosidase sequence families, all of which are known to adopt a TIM barrel fold [Henrissat, B. and Davies, G. (1997) Curr. Opin. Struct. Biol., 7, 637-644]. To explore the relationships between these enzymes and their evolution, comprehensive sequence and structure comparisons were performed, generating four distinct clusters. The first cluster, S1, comprises the alpha-amylase-related enzymes, all with the retention mechanism (axial→axial). The second cluster, S2, includes two functional subgroups: one composed of various kinds of glucosidases, all with the retention mechanism (equatorial→equatorial) (the so-called 4/7 superfamily), and the other including the beta-amylases with the inversion mechanism (axial→equatorial). The third cluster, S3, with the retention mechanism (equatorial→equatorial), could be subdivided, based on the catalytic residues and mechanisms, into two functional subgroups: the chitinase group, catalysed by two acidic residues on the C-termini of beta-4 and beta-6, and the hevamine group, using two acidic residues on the C-termini of beta-4 for catalysis. The fourth cluster, S4, is composed of chitobiase with the retention mechanism (equatorial→equatorial). These clusters are compared with the sequence families derived by Henrissat and coworkers. PSI-BLAST profiles and multiple alignments of tertiary structures suggest that S1 and S2 are distantly related, as are S3 and S4, which have N-acetylated substrates. This work highlights the difficulty of untangling distant evolutionary relationships in ubiquitous folds such as the TIM barrel.
14.
Background
Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis.
Results
In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent.
Conclusion
Incorporating new structural information such as predicted solvent accessibility and torsion angles into profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.
15.
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is of the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity.
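The abstract does not state how two profile columns are scored against each other. One common choice, assumed here for illustration only, is the Jensen-Shannon divergence between the two columns' residue distributions, which lies in [0, 1] with base-2 logarithms and can be turned into a similarity.

```python
from math import log2


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (base 2, so 0 <= JS <= 1) between two probability
    vectors over the same alphabet, e.g. two profile columns."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing,
        # and y_i > 0 whenever x_i > 0 because y is the mixture m.
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def column_similarity(p: list[float], q: list[float]) -> float:
    """Similarity of two columns: 1 for identical distributions, 0 for disjoint ones."""
    return 1.0 - jensen_shannon(p, q)
```

Such a column score would take the place of a substitution-matrix lookup inside an ordinary gapped dynamic-programming alignment.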
16.
Two independent stationary P-related neogenes have previously been described in the Drosophila obscura species group and in the Drosophila montium species subgroup. In Drosophila melanogaster, P transposable elements can encode an 87 kDa transposase and a 66 kDa repressor, but the P-neogenes have conserved only the capacity to encode a 66 kDa repressor-like protein specified by the first three exons. We have previously analyzed the genomic modifications associated with the transition of a P-element into the montium P-neogene, whose coding capacity has been conserved for around 20 Myr (Nouaud, D., and D. Anxolabéhère. 1997. Mol. Biol. Evol. 14:1132-1144). Here we show that the P-neogene of some species of the montium subgroup presents a new structure involving the capture of an additional exon from a very distant P-element subfamily. This additional exon is inserted either upstream or downstream of the first exon of the P-neogene. As a result of alternative splicing, these modified neogenes can produce, in addition to the repressor-like protein, a new protein that differs only in the NH2-terminal region. We hypothesize that this protein diversity within an organism results in a functional diversification due to the selective advantage associated with the domestication of the P-neogene in these species. Moreover, the autonomous P-element that provides the additional exons is still present in the genome. Its nucleotide sequence is more than 45% divergent from the previously defined P-type elements (M-type, O-type, T-type) and defines a new P-type element subfamily referred to as the K-type.
17.
Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L. Nucleic Acids Research, 2003, 31(13):3804-3807
ORFeus is a fully automated, sensitive protein sequence similarity search server available to the academic community via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/). The goal of developing ORFeus was to increase the sensitivity of detecting distantly related protein families. Predicted secondary structure information was added to the information about sequence conservation and variability, a technique known from hybrid threading approaches. The accuracy of the meta profiles created this way is compared with that of profiles containing only sequence information and with the standard approach of aligning a single sequence with a profile. Additionally, aligning meta profiles is more sensitive in detecting remote homology between protein families than aligning two sequence-only profiles or aligning a profile with a sequence. The specificity of the alignment score is improved in the lower specificity range compared with the robust sequence-only profiles.
18.
19.
Accurate sequence alignments are crucial for modelling and for providing an evolutionary picture of related proteins. It is well known that accurate alignments are hard to obtain for distantly related proteins. Three thousand and fifty-two alignments of 218 pairs of protein domain structural entries with <40% sequence identity, belonging to different structural classes and covering diverse domain sizes as well as length-rigid and length-variable domains, were performed using 12 programs. Structural parameters such as root mean square deviation, secondary-structural content and equivalences were considered for critical assessment. Methods that compare fragments and permit twists and translations align well for distant relationships and length variations.
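Root mean square deviation, one of the structural parameters used in the assessment, is straightforward once equivalent atoms have been paired. The sketch below assumes the two structures are already superposed; a real assessment would first find the optimal rotation and translation (for example with the Kabsch algorithm), which is not done here.

```python
from math import sqrt


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """RMSD between two equally long lists of (x, y, z) coordinates of equivalent
    atoms. No superposition is performed; the inputs are assumed to be aligned."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return sqrt(sq / len(coords_a))
```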
20.
The discovery of cis-element control motifs in noncoding DNA poses a difficult problem in genome analysis. Functional analysis by means of reporter constructs expressed in transgenic organisms is the most reliable method, but is by itself time-consuming and expensive. Searching noncoding DNA for known control motifs by sequence analysis is problematic, since protein binding motifs are short, in the range of 8-10 bp, and occur frequently by chance. Heretofore, the most reliable sequence analysis method has been the comparison of homologous sequence domains in related but moderately evolutionarily divergent species such as, for example, mouse and human. In such pairwise combinations, control regions are conserved because they serve a vital function and can be identified by their similar sequences. Single pairwise comparisons, however, allow the discovery of conserved sequence strings only at low resolution and without specific identity. We have investigated the possibility of using multiple sequence comparisons to correct these shortcomings. We applied this method to the Hoxc8 early enhancer region that has been previously analyzed in depth by functional methods and through its application successfully identified known protein binding cis-element motifs. Candidate protein binding sites could also be identified. This method, based on evolutionarily related sequence comparisons, should be quite useful as a prescreening step prior to functional analysis with corresponding savings in time and resources.
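As a crude illustration of using several sequences at once to locate short conserved elements, the sketch below reports exact k-mers shared by every input sequence, with a default k of 8 matching the lower end of the 8-10 bp motif length mentioned above. It is a hypothetical stand-in, not the study's procedure: it ignores mismatches, reverse complements and positional context, all of which a practical method would have to handle.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of `seq` (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(seqs: list[str], k: int = 8) -> set[str]:
    """Exact k-mers present in every input sequence."""
    common = kmers(seqs[0], k)
    for s in seqs[1:]:
        common &= kmers(s, k)  # keep only k-mers also found in this sequence
    return common
```

Adding more species shrinks the intersection, which mirrors the argument above: chance matches that survive one pairwise comparison rarely survive several.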