20 similar documents found; search took 15 ms
1.
MOTIVATION: As protein structure databases expand, protein loop modeling remains an important yet challenging problem. Knowledge-based protein loop prediction methods face two methodological challenges: (1) loop boundaries in protein structures are frequently problematic when constructing length-dependent loop databases for loop prediction; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the resulting sequence-template matches. RESULTS: We developed a knowledge-based loop prediction method that circumvents the need to construct hierarchically clustered, length-dependent loop libraries. The method first predicts local structural fragments of a query loop sequence and then structurally aligns the predicted fragments to a set of non-redundant loop structural templates, regardless of loop length. The sequence-template alignments are then evaluated quantitatively with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicated that the new procedure provides an alternative approach that overcomes these challenges of knowledge-based loop prediction. AVAILABILITY: http://cmb.genomics.sinica.edu.tw
2.
A generic geometric transformation that unifies a wide range of natural and abstract shapes (total citations: 1; self-citations: 0; citations by others: 1)
Gielis J. American Journal of Botany, 2003, 90(3): 333-338
To study forms in plants and other living organisms, several mathematical tools are available, most of which are general tools that do not take into account valuable biological information. In this report I present a new geometrical approach for modeling and understanding various abstract, natural, and man-made shapes. Starting from the concept of the circle, I show that a large variety of shapes can be described by a single and simple geometrical equation, the Superformula. Modification of the parameters permits the generation of various natural polygons. For example, applying the equation to logarithmic or trigonometric functions modifies the metrics of these functions and all associated graphs. As a unifying framework, all these shapes are proven to be circles in their internal metrics, and the Superformula provides the precise mathematical relation between Euclidean measurements and the internal non-Euclidean metrics of shapes. Looking beyond Euclidean circles and Pythagorean measures reveals a novel and powerful way to study natural forms and phenomena.
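The abstract does not write the Superformula out. The sketch below uses the form commonly cited for Gielis (2003), r(φ) = (|cos(mφ/4)/a|^n2 + |sin(mφ/4)/b|^n3)^(-1/n1), and samples the resulting closed curve; the function names and the sampling density are illustrative choices, not part of the paper.

```python
import math
from typing import List, Tuple

def superformula(phi: float, m: float, n1: float, n2: float, n3: float,
                 a: float = 1.0, b: float = 1.0) -> float:
    """Radius r(phi) of the Superformula in its commonly cited form."""
    t1 = abs(math.cos(m * phi / 4.0) / a) ** n2
    t2 = abs(math.sin(m * phi / 4.0) / b) ** n3
    return (t1 + t2) ** (-1.0 / n1)

def sample_shape(m: float, n1: float, n2: float, n3: float,
                 a: float = 1.0, b: float = 1.0,
                 points: int = 360) -> List[Tuple[float, float]]:
    """Cartesian points tracing the curve for 0 <= phi < 2*pi (hypothetical helper)."""
    coords = []
    for k in range(points):
        phi = 2.0 * math.pi * k / points
        r = superformula(phi, m, n1, n2, n3, a, b)
        coords.append((r * math.cos(phi), r * math.sin(phi)))
    return coords
```

Setting m = 0 gives r = 1 for every angle, i.e. the unit circle the paper starts from; m = 4 with n1 = n2 = n3 = 1 gives the diamond |x| + |y| = 1, one of the simplest "polygons" obtained by changing the parameters.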
3.
A statistical analysis of known structures is made to assess the utility of short-range energy considerations. For each type of amino acid, the potentials governing (1) the torsions and bond angle changes of virtual Cα-Cα bonds and (2) the coupling between torsion and bond angle changes are derived. These contribute approximately −2 RT per residue to the stability of native proteins, approximately half of which is due to coupling effects. The torsional potentials for the α-helical states of different residues are verified to be strongly correlated with free-energy changes measured upon single-site mutations at solvent-exposed regions. Likewise, a satisfactory correlation is shown between the β-sheet potentials of different amino acids and scales derived from free-energy measurements, despite the role of tertiary context in stabilizing β-sheets. Furthermore, there is excellent agreement between our residue-specific potentials for the α-helical state and other thermodynamics-based scales. Threading experiments performed with an inverse folding protocol show that 50 of 62 test structures correctly recognize their native sequence on the basis of short-range potentials alone. Performance improves to 55 of 62 when short-range potentials and the nonbonded interaction potentials between sequentially distant residues are considered simultaneously. Interactions between residues that are near each other along the primary structure, i.e., local or short-range interactions, are known to be insufficient on their own for understanding the tertiary structural preferences of proteins. Yet knowledge of short-range conformational potentials permits rationalizing secondary structure propensities and aids in discriminating between correct and incorrect tertiary folds. Proteins 29:292–308, 1997. © 1997 Wiley-Liss, Inc.
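The potentials above are functions of two coordinates of the virtual Cα-Cα chain: the bond angle at each Cα and the torsion about each virtual bond. The sketch below shows one way to measure them from a Cα trace. It does not reproduce the statistical potentials themselves, and the conventions (angle reported directly rather than as its supplement, IUPAC torsion sign, which residue each value is assigned to) are assumptions, since the abstract does not state them.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(p0: Vec, p1: Vec, p2: Vec) -> float:
    """Angle (degrees) at p1 between the virtual bonds p1->p0 and p1->p2."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise

def torsion(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral (degrees) about the p1-p2 virtual bond, IUPAC sign convention."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def virtual_bond_geometry(ca: Sequence[Vec]) -> Tuple[List[float], List[float]]:
    """Bond angles at interior C-alphas and torsions about interior virtual bonds."""
    angles = [bond_angle(ca[i - 1], ca[i], ca[i + 1]) for i in range(1, len(ca) - 1)]
    torsions = [torsion(ca[i - 1], ca[i], ca[i + 1], ca[i + 2])
                for i in range(1, len(ca) - 2)]
    return angles, torsions
```

A per-residue histogram of these (angle, torsion) pairs over a structure set is the kind of raw statistic from which knowledge-based potentials of this type are usually derived.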
4.
FUGUE, a program for recognizing distant homologues by sequence-structure comparison (http://www-cryst.bioc.cam.ac.uk/fugue/), has three key features. (1) Improved environment-specific substitution tables. Substitutions of an amino acid in a protein structure are constrained by its local structural environment, which can be defined in terms of secondary structure, solvent accessibility, and hydrogen bonding status. The environment-specific substitution tables have been derived from structural alignments in the HOMSTRAD database (http://www-cryst.bioc.cam.ac.uk/homstrad/). (2) Automatic selection of the alignment algorithm, with detailed structure-dependent gap penalties. FUGUE uses the global-local algorithm to align a sequence-structure pair when the two differ greatly in length, and the global algorithm otherwise. The gap penalty at each position of the structure is determined by its solvent accessibility, its position relative to the secondary structure elements (SSEs), and the conservation of the SSEs. (3) Combined information from both multiple sequences and multiple structures. FUGUE is designed to align multiple sequences against multiple structures to enrich the conservation/variation information. We demonstrate that the combination of these three key features in FUGUE improves both homology recognition performance and alignment accuracy.
5.
MOTIVATION: Local structure segments (LSSs) are small structural units shared by unrelated proteins. They are used extensively in protein structure comparison, and predicted LSSs (PLSSs) are used very successfully in ab initio folding simulations. However, predicted or real LSSs are rarely exploited by protein sequence comparison programs, which are based on position-by-position alignments. RESULTS: We developed a SEgment Alignment algorithm (SEA) to compare proteins described as collections of predicted local structure segments (PLSSs), each of which can be represented as an unweighted graph (network). Any specific structure, real or predicted, corresponds to a specific path in this network. SEA then uses a network matching approach to find the two most similar paths in the networks representing two proteins. SEA exploits the uncertainty and diversity of predicted local structure information to search for a globally optimal solution. It simultaneously solves two related problems: the alignment of two proteins and the local structure prediction for each of them. On a benchmark of protein pairs with low sequence similarity, we show that the SEA algorithm improves alignment quality compared with FFAS profile-profile alignment, and that in some cases SEA alignments can match the structural alignments, a feat previously impossible for any sequence-based alignment method.
6.
A multiple alignment program for protein sequences (total citations: 1; self-citations: 0; citations by others: 1)
A program for the multiple alignment of protein sequences is presented. The program is an extension of the fast alignment program by Wilbur et al. (1984) into higher dimensions. The use of hash procedures on fragments of the protein sequences increases the speed of calculation. This also lets us take into account fragments that are present in some, but not all, of the sequences considered. The results of some multiple alignments are given. Received on September 11, 1986; accepted on March 18, 1987
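The hashing idea can be illustrated with a k-tuple index in the spirit of Wilbur and Lipman: every fragment of length k is hashed to the list of places where it occurs, so fragments shared by several sequences are found without comparing all position pairs. This is a minimal sketch, not the program described above; `k`, `min_seqs`, and the function name are hypothetical, and the extension of the dynamic-programming step to higher dimensions is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def shared_ktuples(seqs: Sequence[str], k: int = 2,
                   min_seqs: int = 2) -> Dict[str, List[Tuple[int, int]]]:
    """Return k-tuples occurring in at least min_seqs sequences, with their hits."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    # Hash every fragment of length k to its (sequence index, start position) hits.
    for s_idx, seq in enumerate(seqs):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((s_idx, pos))
    shared = {}
    for frag, hits in index.items():
        # Keep fragments present in some (>= min_seqs), not necessarily all, sequences.
        if len({s for s, _ in hits}) >= min_seqs:
            shared[frag] = hits
    return shared
```

For example, with the sequences `ACDE`, `XCDE`, and `ACDF` and k = 3, the fragments `ACD` and `CDE` are each shared by two of the three sequences, while `CD` (k = 2) is common to all three.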
7.
A rapid method of protein structure alignment (total citations: 5; self-citations: 0; citations by others: 5)
A reduction in the time required to compare two protein structures has been achieved for a previously developed structure alignment method by reducing the number of residue pair comparisons that must be performed between the two structures. Subsets of residue pairs are selected by an iterative procedure. Initially, selection is based on similarities in solvent-accessible surface areas, torsional angles, or a combination of both properties, giving subsets containing approximately 2% of the total number of residue pairs. Using these subsets, the structural alignment program generates a rough comparison of the two structures. The information it returns can be used to identify topologically equivalent residues in the two proteins more accurately, enabling a new and much smaller subset (less than 0.2% of the total number of residue pairs) to be selected. The refinement of the residue pair subsets is then repeated once more, after which the correct alignment of the proteins was obtained in 95% of the structure comparisons tested. The time required to compare the structures using the refined subsets is negligible compared with the initial comparison, so considerable increases in speed are possible. The method was tested on two groups of proteins: a set of remotely related alpha/beta nucleotide-binding proteins and the variable and constant domains of the immunoglobulins. Speed increases ranging from 50-fold to more than 150-fold were obtained, depending on the degree of similarity of the two structures. In some comparisons the alignment was improved, owing to the reduction in noise obtained by comparing mainly equivalent residues.
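The first selection step can be sketched as ranking all residue pairs by how similar their backbone torsion angles are and keeping a small fraction. This is only an illustration under stated assumptions: the similarity measure (summed wrap-around differences of φ and ψ), the fixed 2% default echoing the figure quoted above, and the function names are hypothetical, and the iterative refinement using the rough alignment is not shown.

```python
from typing import List, Sequence, Tuple

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def select_pairs(tors_a: Sequence[Tuple[float, float]],
                 tors_b: Sequence[Tuple[float, float]],
                 fraction: float = 0.02) -> List[Tuple[int, int]]:
    """Keep the given fraction of residue pairs (i, j) with the most similar (phi, psi)."""
    scored = []
    for i, (phi_a, psi_a) in enumerate(tors_a):
        for j, (phi_b, psi_b) in enumerate(tors_b):
            d = angle_diff(phi_a, phi_b) + angle_diff(psi_a, psi_b)
            scored.append((d, i, j))
    scored.sort()  # most similar pairs first
    keep = max(1, round(fraction * len(scored)))
    return [(i, j) for _, i, j in scored[:keep]]
```

Only the retained pairs would then be passed to the expensive structure comparison, which is where the speed-up described in the abstract comes from.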
8.
MOTIVATION: This work aims to develop computational methods for annotating protein structures automatically. We employ a support vector machine (SVM) classifier to map structures in a given class to their corresponding structural (SCOP) or functional (Gene Ontology) annotation. In particular, we build on recent work describing various kernels for protein structures, where a kernel is a similarity function that the classifier uses to compare pairs of structures. RESULTS: We describe a kernel that is derived in a straightforward fashion from an existing structural alignment program, MAMMOTH. In our benchmark experiments, this kernel significantly outperforms a variety of other kernels, including several previously described ones. Furthermore, in both benchmarks, classifying structures using MAMMOTH alone does not work as well as using an SVM with the MAMMOTH kernel. AVAILABILITY: http://noble.gs.washington.edu/proj/3dkernel
9.
A holistic approach to protein structure alignment (total citations: 4; self-citations: 0; citations by others: 4)
A previously developed method of protein structure comparison is extended to incorporate other aspects of protein structure in addition to the inter-atomic vectors on which it was originally based. Each additional aspect, which included hydrogen bonding, solvent exposure, torsional angles and sequence, was introduced separately and evaluated for its ability to improve alignment quality. The components were then combined, suitably weighted, to produce a more holistic comparison method. The method was tested on a group of remotely related beta/alpha-type proteins that share a common feature in their overall chain fold. The results indicated that while the original inter-atomic vector component was sufficient to give the correct alignment for most pairs of topologically equivalent proteins, including hydrogen bonds, torsion angles and a measure of solvent exposure improved the more difficult comparisons. Considering amino acid properties, including hydrophobicity, had no beneficial effect. The failure of this component was not unexpected, given the almost total lack of sequence similarity among the proteins considered.
10.
Shibberu Y, Holder A. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011, 8(4): 867-875
A new intrinsic geometry based on a spectral analysis is used to motivate methods for aligning protein folds. The geometry is induced by the fact that a distance matrix can be scaled so that its eigenvalues are positive. We provide a mathematically rigorous development of the intrinsic geometry underlying our spectral approach and use it to motivate two alignment algorithms. The first uses eigenvalues alone and dynamic programming to quickly compute a fold alignment. Family identification results are reported for the Skolnick40 and Proteus300 data sets. The second algorithm extends our spectral method by iterating between our intrinsic geometry and the 3D geometry of a fold to make high-quality alignments. Results and comparisons are reported for several difficult fold alignments. The second algorithm's ability to correctly identify fold families in the Skolnick40 and Proteus300 data sets is also established.
11.
A parameterized algorithm for protein structure alignment (total citations: 2; self-citations: 0; citations by others: 2)
This paper proposes a parameterized polynomial time approximation scheme (PTAS) for aligning two protein structures, in the case where one protein structure is represented by a contact map graph and the other by a contact map graph or a distance matrix. If the sequential order of alignment is not required, the time complexity is polynomial in the protein size and exponential with respect to two parameters, D_u/D_l and D_c/D_l, which can usually be treated as constants. Here D_u is the distance threshold that determines whether two residues are in contact, D_c is the maximum allowed distance between two matched residues after the two proteins are superimposed, and D_l is the minimum inter-residue distance in a typical protein. This result demonstrates that the computational hardness of contact map based protein structure alignment is related not to protein size but to several parameters modeling the problem. The result is achieved by decomposing the protein structure using tree decomposition and discretizing the rigid-body transformation space. Preliminary experimental results indicate that on a Linux PC it takes from ten minutes to one hour to align two proteins of approximately 100 residues.
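The two parameters that govern the running time are ratios of simple distances. The sketch below builds the contact map graph with threshold D_u and measures the smallest inter-residue distance, which stands in for D_l, so that D_u/D_l can be computed for a given structure. It does not implement the PTAS; whether contacts use `<=` or `<`, and whether sequence neighbours are excluded (they are not here), are assumptions, as are the function names and the toy coordinates.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple

Point = Tuple[float, float, float]

def contact_map(coords: Sequence[Point], d_u: float) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, of the contact map graph: residues within d_u of each other."""
    return {(i, j) for i, j in combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) <= d_u}

def min_distance(coords: Sequence[Point]) -> float:
    """Smallest inter-residue distance in the structure (plays the role of D_l)."""
    return min(math.dist(a, b) for a, b in combinations(coords, 2))

# Toy C-alpha positions with 3.8 A spacing (hypothetical data).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
edges = contact_map(ca, d_u=4.0)
ratio = 4.0 / min_distance(ca)  # D_u / D_l, about 1.05 for this toy chain
```

Because D_l is bounded below by steric constraints and D_u, D_c are chosen by the user, these ratios stay small and roughly constant across proteins, which is why the exponential dependence on them is tolerable.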
12.
Pei J. Current Opinion in Structural Biology, 2008, 18(3): 382-386
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality.
13.
Bioinformatics (2007) 23(7), 789–792. The authors wish to apologize for the omission
14.
Summary: Blood leukocytes exhibit specific cell-type recognition. Neutrophils adhere to neutrophils, eosinophils to eosinophils, basophils to basophils and monocytes to monocytes, forming rather large homotypic aggregates. These aggregates are almost abolished by prior treatment of the cells with trypsin, so it is assumed that a protein is involved in this type of cell recognition. Protein monomer-monomer interaction could provide the specificity required for homotypic aggregate formation.
15.
A signal encoded in vertebrate DNA that influences nucleosome positioning and alignment (total citations: 1; self-citations: 2; citations by others: 1)
Evidence is provided that the nucleotide triplet consensus non-T(A/T)G (abbreviated VWG) influences nucleosome positioning and the alignment of nucleosomes into regular arrays. This triplet consensus was recently found to exhibit a fairly strong 10-bp periodicity in human DNA, implicating it in anisotropic DNA bendability. It is demonstrated that the experimentally determined preferences for nucleosome positioning in native SV40 chromatin can, to a large extent, be predicted simply by counting occurrences of the period-10 VWG consensus. Nucleosomes tend to form in regions of the SV40 genome that contain high counts of period-10 VWG and/or to avoid regions with low counts. In contrast, periodic occurrences of the dinucleotides AA/TT, implicated in the rotational positioning of DNA in nucleosomes, did not correlate with the preferred nucleosome locations in SV40 chromatin. Periodic occurrences of AA did, however, correlate with preferred nucleosome locations in a region of SV40 DNA where VWG occurrences are low. Regular oscillations in period-10 VWG counts with a dinucleosome period were found in vertebrate DNA regions that aligned nucleosomes into regular arrays in vitro in the presence of linker histone. Escherichia coli and plasmid DNA, which fail to align nucleosomes in vitro, lacked these regular VWG oscillations.
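In IUPAC notation V is any base except T (A, C or G) and W is A or T, so VWG matches triplets such as AAG, CTG or GAG. The abstract does not define its period-10 count precisely; the sketch below assumes a simple stand-in, the number of VWG occurrences whose start is followed by another VWG exactly 10 bp downstream, and computes it in sliding windows. The window length, step, and function names are hypothetical.

```python
import re
from typing import List

# Lookahead so that overlapping VWG triplets are all reported.
VWG = re.compile(r"(?=[ACG][AT]G)")

def vwg_positions(seq: str) -> List[int]:
    """Start positions of every VWG triplet (V = A/C/G, W = A/T) in the sequence."""
    return [m.start() for m in VWG.finditer(seq.upper())]

def period10_vwg_count(seq: str, period: int = 10) -> int:
    """Count VWG triplets that have another VWG exactly `period` bp downstream."""
    pos = set(vwg_positions(seq))
    return sum(1 for p in pos if p + period in pos)

def windowed_counts(seq: str, window: int = 150, step: int = 10,
                    period: int = 10) -> List[int]:
    """Period-10 VWG counts in sliding windows along the sequence."""
    return [period10_vwg_count(seq[s:s + window], period)
            for s in range(0, max(len(seq) - window, 0) + 1, step)]
```

Under this assumption, a profile from `windowed_counts` along a genome would be compared with mapped nucleosome positions: peaks would mark candidate positioning regions, and oscillations with a dinucleosome period would correspond to the regular arrays described above.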
16.
The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques so that increasingly distant relationships can be detected. Developing, tuning, and testing these methods benefits from appropriate benchmarks for assessing alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all currently known fold types. The benchmark is challenging in that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid-body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure-derived alignments. Using this benchmark, we estimate that the best results are obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.
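Shift-based scoring can be sketched by representing each alignment as a mapping from residue indices in the first protein to residue indices in the second. For every residue aligned in the structure-derived reference, the shift is the offset between where the test alignment places it and where the reference places it. The exact scoring used by the benchmark's evaluation engine is not given in the abstract; the "fraction within a maximum shift" summary and the function names below are assumptions.

```python
from typing import Dict, Optional

def shift_errors(test: Dict[int, Optional[int]],
                 ref: Dict[int, int]) -> Dict[int, Optional[int]]:
    """For each residue aligned in ref, return test[i] - ref[i], or None if unaligned in test."""
    out: Dict[int, Optional[int]] = {}
    for i, j_ref in ref.items():
        j_test = test.get(i)
        out[i] = None if j_test is None else j_test - j_ref
    return out

def fraction_within(test: Dict[int, Optional[int]], ref: Dict[int, int],
                    max_shift: int = 0) -> float:
    """Fraction of reference-aligned residues placed within max_shift of the reference."""
    errs = shift_errors(test, ref)
    if not errs:
        return 0.0
    ok = sum(1 for e in errs.values() if e is not None and abs(e) <= max_shift)
    return ok / len(errs)
```

Reporting accuracy at several `max_shift` values (0, 1, 2, ...) distinguishes alignments that are exactly right from those that are off by a register shift, which a plain identical-pair count would treat as equally wrong.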
17.
MOTIVATION: Currently, the most accurate fold-recognition method is to perform profile-profile alignments and estimate the statistical significance of those alignments by calculating a Z-score or E-value. Although this scheme is reliable in recognizing relatively close homologs related at the family level, it has difficulty finding remote homologs related at the superfamily or fold level. RESULTS: In this paper, we present an alternative method for estimating the significance of alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n + 1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to the posterior probability that the query sequence is related to the template. Results show that the new method performs significantly better than PSI-BLAST and profile-profile alignment with the Z-score scheme. While PSI-BLAST and the Z-score scheme detect 16% and 20% of superfamily-related proteins, respectively, at 90% specificity, the new method detects 46% of these proteins, more than a 2-fold increase in sensitivity. More significantly, at the fold level the new method detects 14% of remotely related proteins at 90% specificity, a remarkable result given that the other methods detect almost none at the same specificity.
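The abstract does not say how the SVM output is turned into a posterior probability. A common choice is Platt-style sigmoid scaling, P(related | f) = 1 / (1 + exp(a·f + b)), with a and b fitted on held-out labelled scores. The sketch below assumes that technique in a plain form (no target smoothing, simple gradient descent on the log loss); it is not claimed to be the authors' procedure, and the learning rate and step count are arbitrary.

```python
import math
from typing import Sequence, Tuple

def platt_probability(f: float, a: float, b: float) -> float:
    """P(related | SVM output f) = 1 / (1 + exp(a*f + b)); a < 0 means larger f is more likely related."""
    z = a * f + b
    if z >= 0:  # numerically stable evaluation of the logistic function
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, steps: int = 5000) -> Tuple[float, float]:
    """Fit (a, b) by gradient descent on the log loss; labels are 1 (related) or 0."""
    a, b = -1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for f, y in zip(scores, labels):
            p = platt_probability(f, a, b)
            # With z = a*f + b and p = 1/(1+exp(z)), d(log loss)/dz = y - p.
            ga += (y - p) * f
            gb += (y - p)
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b
```

Fitting (a, b) on data not used to train the SVM keeps the probabilities from being overconfident; the 90% specificity operating point quoted above would then correspond to a threshold on this probability.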
18.
19.
20.
MOTIVATION: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination over sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity ≤30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles built from both PSI-BLAST and SAM-T99 alignments. Structural alignments are constructed from a consensus between the FSSP database and the CE structural aligner. We compare the results with sequence-sequence and sequence-profile methods, including BLAST and PSI-BLAST. RESULTS: We find that, averaged over our test set, profile-profile alignment typically improves on profile-sequence alignment by 2-3% and on sequence-sequence alignment by approximately 40%. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments. AVAILABILITY: Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/
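A profile-profile scoring function compares two profile columns, each a vector of amino-acid frequencies, and plugs into ordinary dynamic programming in place of a substitution matrix. The sketch below shows two simple column scores (dot product and Pearson correlation) and a global alignment with a linear gap penalty. Neither score is claimed to be one of the 23 functions evaluated above, and the gap value and function names are hypothetical; the evaluated methods also tune affine gap parameters.

```python
import math
from typing import Callable, List, Sequence

Column = Sequence[float]  # amino-acid frequencies at one profile position

def dot_score(p: Column, q: Column) -> float:
    """Dot product of two frequency columns."""
    return sum(a * b for a, b in zip(p, q))

def pearson_score(p: Column, q: Column) -> float:
    """Pearson correlation between two frequency columns (0.0 if either is constant)."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return 0.0 if sp == 0 or sq == 0 else cov / (sp * sq)

def align_profiles(P: Sequence[Column], Q: Sequence[Column],
                   score: Callable[[Column, Column], float] = dot_score,
                   gap: float = -0.5) -> float:
    """Global (Needleman-Wunsch) alignment score of two profiles, linear gap penalty."""
    n, m = len(P), len(Q)
    F: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(P[i - 1], Q[j - 1]),
                          F[i - 1][j] + gap,   # gap in Q
                          F[i][j - 1] + gap)   # gap in P
    return F[n][m]
```

Swapping `score` is all it takes to compare scoring functions under otherwise identical alignment settings, which mirrors the controlled comparison the study describes.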