Found 20 similar articles (search time: 15 ms)
1.
Guillermo Thode, Juan Antonio García-Renea, Juan Jimenez 《Journal of molecular evolution》1996,42(2):224-233
Proteins of related functions are often similar in sequence, reflecting a common phylogenetic origin. Proteins with no known homology are probably diversified proteins, too distantly related to known sequences in databases to retain significant similarity. All proteins, however, probably share common ancestries if one moves far enough back in evolution; therefore, given the huge accumulation of protein sequences in current databases, it could be expected that some proteins with no obvious sequence resemblance to any other share some residues that could represent footprints of ancient common ancestries. To identify such putative footprints, we have searched for short stretches of amino acids present in a given protein sequence that are also found in a significant number of nonrelated proteins in the database. The significantly high frequency of occurrence of these patterns in the database would support a common evolutionary source, and a diversity of non-related proteins that contain the pattern would express their ancient origin. Using this strategy, significant patterns were found in actual exons, but not in randomized amino acid sequences, nor in translated sequences of noncoding DNA, suggesting that this strategy actually leads to the identification of patterns with a biological significance. These significant patterns are not randomly positioned along the sequences analyzed, but they tend to accumulate within specific regions, producing a profile of discrete domains. In some well-known proteins analyzed in this study, some of these domains are coincident with known motifs. Thus, the procedure described in this paper could be useful for identifying ancient patterns and domains in protein sequences, some of which could also have a functional or structural significance.
2.
Shi S, Zhong Y, Majumdar I, Sri Krishna S, Grishin NV 《Bioinformatics (Oxford, England)》2007,23(11):1331-1338
MOTIVATION: Many evolutionarily distant, but functionally meaningful links between proteins come to light through comparison of spatial structures. Most programs that assess structural similarity compare two proteins to each other and find regions in common between them. Structural classification experts look for a particular structural motif instead. Programs base similarity scores on superposition or closeness of either Cartesian coordinates or inter-residue contacts. Experts pay more attention to the general orientation of the main chain and mutual spatial arrangement of secondary structural elements. There is a need for a computational tool to find proteins with the same secondary structures, topological connections and spatial architecture, regardless of subtle differences in 3D coordinates. RESULTS: We developed ProSMoS--a Protein Structure Motif Search program that emulates an expert. Starting from a spatial structure, the program uses previously delineated secondary structural elements. A meta-matrix of interactions between the elements (parallel or antiparallel) minding handedness of connections (left or right) and other features (e.g. element lengths and hydrogen bonds) is constructed prior to or during the searches. All structures are reduced to such meta-matrices that contain just enough information to define a protein fold, but this definition remains very general and deviations in 3D coordinates are tolerated. The user supplies a meta-matrix for a structural motif of interest, and ProSMoS finds all proteins in the protein data bank (PDB) that match the meta-matrix. ProSMoS performance is compared to other programs and is illustrated on a beta-Grasp motif. A brief analysis of all beta-Grasp-containing proteins is presented. Program availability: ProSMoS is freely available for non-commercial use from ftp://iole.swmed.edu/pub/ProSMoS.
3.
The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of protein domains in various families. The latest updated version (Release 2.1) comprises 844 families of homologous proteins involving 3863 protein domain structures with each of these families having at least two members. Each member in a family has been structurally aligned with every other member in the same family using two proteins at a time. In addition, an alignment of multiple structures has also been performed using all the members in a family. Every family with at least three members is associated with two dendrograms, one based on a structural dissimilarity metric and the other based on similarity of topologically equivalenced residues for every pairwise alignment. Apart from these multi-member families, there are 817 single member families in the updated version of PALI. A new feature in the current release of PALI is the integration, with 3-D structural families, of sequences of homologues from the sequence databases. Alignments between homologous proteins of known 3-D structure and those without an experimentally derived structure are also provided for every family in the enhanced version of PALI. The database with several web interfaced utilities can be accessed at: http://pauling.mbu.iisc.ernet.in/~pali.
4.
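Cluster structure analysis of protein sequences is carried out based on the granular space of a fuzzy proximity relation. The alignment distances between selected xylanase sequences are computed with the MEGA software and converted into a fuzzy proximity relation (or matrix) by introducing an inner product; an algorithm is then applied to derive the corresponding granular space, which is used to analyze the cluster structure of the sequences and to determine the optimal clustering. These studies provide a tool for the quantitative analysis of protein sequences.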
5.
I. Jonassen, J. F. Collins, D. G. Higgins 《Protein science : a publication of the Protein Society》1995,4(8):1587-1595
We present a new method for the identification of conserved patterns in a set of unaligned related protein sequences. It is able to discover patterns of a quite general form, allowing for both ambiguous positions and for variable length wildcard regions. It allows the user to define a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class scoring highest according to a significance measure defined. Identified patterns may be refined using one of two new algorithms. We present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.
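As an illustration of the kind of pattern this abstract refers to (ambiguous positions plus variable-length wildcard regions), the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. This is not the discovery method of the paper, only a way to match an already known pattern; the function name `prosite_to_regex`, the example pattern, and the toy sequence are assumptions made for illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles residues, x (any residue), [ABC] (any of), {ABC} (none of),
    repeat counts such as x(3) or x(2,4), and the < / > terminus anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":                      # wildcard position
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # [ABC] classes and single residue letters are already valid regex.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# Illustrative pattern with ambiguity and variable-length wildcards.
PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
REGEX = prosite_to_regex(PATTERN)

# Toy sequence constructed to contain one match (not real data).
toy_sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
hit = re.search(REGEX, toy_sequence)
```

Variable-length wildcards map directly onto bounded regex repetition, which is why a regular-expression engine is a convenient stand-in for a dedicated pattern scanner in small examples.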
6.
This work presents a method to compare local clusters of interacting residues as observed in a known three-dimensional protein structure with corresponding clusters inferred from homologous protein sequences, assuming conserved protein folding. For this purpose the local environment of a selected residue in a known protein structure is defined as the ensemble of amino acids in contact with it in the folded state. Using a multiple sequence alignment to identify corresponding residues in homologous proteins, a detailed comparison can be performed between the local environment of a selected amino acid in the template protein structure and the expected local environments at the sets of equivalent residues, derived from the aligned protein sequences. The comparison makes it possible to detect conserved local features such as hydrogen bonding or complementarity in residue substitution. A global measure of environmental similarity is also defined, to search for conserved amino acid clusters subject to functional or structural constraints. The proposed approach is useful for investigating protein function as well as for site-directed mutagenesis experiments, where appropriate amino acid substitutions can be suggested by observing naturally occurring protein variants.
7.
Structural class characterizes the overall folding type of a protein or its domain. This paper develops an accurate method for in silico prediction of structural classes from low homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier and a custom-designed, feature-based sequence representation to provide predictions. The main advantages of the LLSC-PRED are the comprehensive representation that includes 58 features describing composition and physicochemical properties of the sequences and transparency of the prediction model. The representation also includes predicted secondary structure content, thus for the first time exploring synergy between these two related predictions. Based on tests performed with a large set of 1673 twilight zone domains, the LLSC-PRED's prediction accuracy, which equals over 62%, is shown to be better than accuracy of over a dozen recently published competing in silico methods and similar to accuracy of other, non-transparent classifiers that use the proposed representation.
8.
The use of folding patterns in the search of protein structural similarities; a three-dimensional model of phosphoribosyl transferases (cited by: 2; self-citations: 0; citations by others: 2)
B Busetta 《Biochimica et biophysica acta》1988,957(1):21-33
A new way to predict the topologies of proteins of unknown three-dimensional structure is derived from the comparison of the distribution of the strongest predicted secondary structures with equivalent distributions recorded for proteins of known X-ray structures. As an illustration the tentative three-dimensional model of phosphoribosyl transferases which was proposed by Argos et al. is rediscussed.
9.
10.
Background
A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure.
11.
Background
Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences.
12.
In the field of evolutionary structural genomics, methods are needed to evaluate why genomes evolved to contain the fold distributions that are observed. In order to study the effects of population dynamics in the evolved genomes we need fast and accurate evolutionary models which can analyze the effects of selection, drift and fixation of a protein sequence in a population that are grounded by physical parameters governing the folding and binding properties of the sequence. In this study, various knowledge-based, force field, and statistical methods for protein folding have been evaluated with four different folds: SH2 domains, SH3 domains, Globin-like, and Flavodoxin-like, to evaluate the speed and accuracy of the energy functions. Similarly, knowledge-based and force field methods have been used to predict ligand binding specificity in SH2 domain. To demonstrate the applicability of these methods, the dynamics of evolution of new binding capabilities by an SH2 domain is demonstrated.
13.
A new approach to search for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a basic one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, given the common segments are present in the basic sequence). The running time of the algorithm presented is proportional to n·L² when n sequences of length L are analyzed. The algorithm proposed is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures as well as the dependence of the results on the choice of basic sequence are discussed.
14.
Information on the structural classes of proteins has been proven to be important in many fields of bioinformatics. Prediction of protein structural class for low-similarity sequences is a challenging problem. In this study, 11 features (including 8 re-used features and 3 newly-designed features) are rationally utilized to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 and 25PDB with sequence similarity lower than 40% and 25%, respectively. Comparison of our results with other methods shows that our proposed method is very promising and may provide a cost-effective alternative to predict protein structural class in particular for low-similarity datasets.
15.
Several stratagems are used in protein bioinformatics for the classification of proteins based on sequence, structure or function. We explore the concept of a minimal signature embedded in a sequence that defines the likely position of a protein in a classification. Specifically, we address the derivation of sparse profiles for the G-protein coupled receptor (GPCR) clan of integral membrane proteins. We present an evolutionary algorithm (EA) for the derivation of sparse profiles (signatures) without the need to supply a multiple alignment. We also apply an evolution strategy (ES) to the problem of pattern and profile refinement. Patterns were derived for the GPCR 'superfamily' and GPCR families 1-3 individually from starting populations of randomly generated signatures, using a database of integral membrane protein sequences and an objective function using a modified receiver operator characteristic (ROC) statistic. The signature derived for the family 1 GPCR sequences was shown to perform very well in a stringent cross-validation test, detecting 76% of unseen GPCR sequences at 5% error. Application of the ES refinement method to a signature developed by a previously described method [Sadowski, M.I., Parish, J.H., 2003. Automated generation and refinement of protein signatures: case study with G-protein coupled receptors. Bioinformatics 19, 727-734] resulted in a 6% increase of coverage for 5% error as measured in the validation test. We note that there might be a limit to this or any classification of proteins based on patterns or schemata.
16.
Sunyaev SR, Bogopolsky GA, Oleynikova NV, Vlasov PK, Finkelstein AV, Roytberg MA 《Proteins》2004,54(3):569-582
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D)-structure. We investigated correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that the decrease of alignment accuracy is not necessarily a price for the computational efficiency.
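For readers unfamiliar with the baseline discussed in this abstract, the following is a minimal sketch of the Smith-Waterman local alignment score. It is not the hierarchical algorithm proposed by the authors. The scoring scheme (fixed match and mismatch scores with a linear gap penalty) is a simplifying assumption; protein alignments in practice typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses the standard dynamic-programming recurrence with a floor of 0,
    keeping only two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: start fresh (0), extend diagonally, or open a gap.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Toy inputs (not real sequences): the shared core "ACGT" drives the score.
score_identical = smith_waterman_score("ACGT", "ACGT")
score_embedded = smith_waterman_score("XXACGTXX", "YYACGTYY")
score_unrelated = smith_waterman_score("AAAA", "TTTT")
```

The floor of 0 in the recurrence is what makes the alignment local: a low-scoring region resets the running score instead of dragging down a later high-scoring segment.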
17.
R Staden 《DNA sequence》1991,1(6):369-374
We describe programs that can screen nucleic acid and protein sequences against libraries of motifs and patterns. Such comparisons are likely to play an important role in interpreting the function of sequences determined during large scale sequencing projects. In addition we report programs for converting the Prosite protein motif library into a form that is compatible with our searching programs. The programs work on VAX and SUN computers.
18.
19.
J G Guillet, J Hoebeke, R Lengagne, K Tate, F Borras-Herrera, A D Strosberg, F Borras-Cuesta 《Journal of molecular recognition : JMR》1991,4(1):17-25
We present a homology scanning microcomputer program to predict functional T-cell epitopes within proteins. By taking into account particular human or mouse restriction elements the predictions are made haplotype-specific. The generality of this approach is confirmed by (i) identification of well-characterized immunogenic T-cell determinants in lysozyme, (ii) search for potential T epitopes on unanalysed proteins like the human beta 2-adrenoreceptor, and (iii) modification of non-immunogenic peptide sequences in order to generate T-cell determinants.
20.
SUMMARY: MASIA is a software tool for pattern recognition in multiple aligned protein sequences. MASIA converts a sequence to a properties matrix that can be scanned in both vertical and horizontal steps. Consistent patterns are recognized based on the statistical significance of their occurrence. Preset macros can be altered on-line to seek any combination of amino acid properties or sequence characteristics. MASIA output can be used directly by our programs to predict the 3D structure of proteins. AVAILABILITY: Access MASIA at http://www.scsb.utmb.edu/masia/masia.html.