Found 20 similar articles (search time: 15 ms)
1.
The role of repeating motifs in protein structures is thought to be that of modular building blocks that allow an economical way of constructing complex proteins. In this work novel wavelet transform analysis techniques are used to detect and characterize repeating motifs in protein sequence and structure data, where the Kyte-Doolittle hydrophobicity scale (HΦ) and relative accessible surface area (rASA) data provide residue information about the protein sequence and structure, respectively. We analyze a variety of repeating protein motifs: TIM barrels, propeller blades, coiled coils and leucine-rich repeat structures. Detection and characterization of these motifs is performed using techniques based on the continuous wavelet transform (CWT). Results indicate that the wavelet transform techniques developed herein are a promising approach for the detection and characterization of repeating motifs in structural data and, in some instances, sequence data.
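As a rough illustration of the idea in this abstract, the sketch below maps a sequence onto the Kyte-Doolittle scale and scans it with a Morlet-like wavelet (a Gaussian-windowed complex exponential) at a range of candidate periods. It is a minimal sketch, not the authors' method: the function names, the choice of window width (one period), the integer period grid and the synthetic test sequence are all assumptions made here for illustration.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq):
    """Map each residue to its Kyte-Doolittle value."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def wavelet_power(signal, period):
    """Mean squared wavelet coefficient at one period (hypothetical helper).

    The wavelet is a complex exponential of the given period under a
    Gaussian window whose standard deviation equals the period, truncated
    at three standard deviations. Only positions where the window fits
    entirely inside the signal are used.
    """
    sigma = float(period)
    half = int(3 * sigma)
    offsets = range(-half, half + 1)
    window = [math.exp(-(t * t) / (2 * sigma * sigma)) for t in offsets]
    norm = sum(window)
    kernel = [w * cmath.exp(-2j * math.pi * t / period) / norm
              for w, t in zip(window, offsets)]
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]  # remove the constant offset
    powers = []
    for centre in range(half, len(signal) - half):
        coeff = sum(k * centred[centre + t] for k, t in zip(kernel, offsets))
        powers.append(abs(coeff) ** 2)
    return sum(powers) / len(powers) if powers else 0.0


def dominant_period(seq, periods=range(2, 16)):
    """Return the candidate period with the largest mean wavelet power."""
    signal = hydrophobicity_profile(seq)
    return max(periods, key=lambda p: wavelet_power(signal, p))


if __name__ == "__main__":
    # Synthetic 7-residue repeat: three hydrophobic, four charged residues.
    print(dominant_period("IIIKKKK" * 20))
```

A real analysis would use a finer scale grid and keep the position-resolved coefficients, since the point of a wavelet transform is to localize where in the sequence a repeat occurs; this sketch averages over position only to keep the example short.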
2.
An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible, more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins where the repeating fragments have been derived from information additional to the protein sequences. © 1993 Wiley-Liss, Inc.
3.
4.
Mandrich L Pezzullo M Del Vecchio P Barone G Rossi M Manco G 《Journal of molecular biology》2004,335(1):357-369
The recently solved three-dimensional (3D) structures of two thermostable members of the carboxylesterase/lipase HSL family, namely the Alicyclobacillus (formerly Bacillus) acidocaldarius and Archaeoglobus fulgidus carboxylesterases (EST2 and AFEST, respectively), were compared with that of the mesophilic homologous counterpart, brefeldin A esterase from Bacillus subtilis. Since the 3D homology models of other members of the HSL family were also available, we performed a structural alignment with all these sequences. The resulting alignment was used to assess the amino acid “traffic rule” in the HSL family. Quite surprisingly, the data were in very good agreement with those recently reported from two independent groups and based on the comparison of a huge number of homologous sequences from the genera Bacillus, Methanococcus and Deinococcus/Thermus. Taken as a whole, the data point to the statistical meaning of defined amino acid conversions going from psychrophilic to hyperthermophilic sequences. We identified and mapped several such changes onto the EST2 structure and observed that such mutations were localized mostly in loop regions or α-helices and were mostly excluded from the active site. Site-directed mutagenesis of two of the identified residues confirmed they were involved in thermal stability.
5.
M. A. Rodionov M. S. Johnson 《Protein science : a publication of the Protein Society》1994,3(12):2366-2377
We report the derivation of scores that are based on the analysis of residue-residue contact matrices from 443 3-dimensional structures aligned structurally as 96 families, which can be used to evaluate sequence-structure matches. Residue-residue contacts and the more than 3 × 10^6 amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (approximately 75%) even when the sequence identity is approaching 10%. In a comparison involving a globin structure and the search of a sequence databank (>21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top scoring sequence-structure matches, where between 69% and 84% of the unrelated matches are eliminated. The search of an aligned set of 2 globins against a sequence databank and the subsequent residue contact-based evaluation of matches locates all 618 globin sequences before the first non-globin match. From a single bacterial serine proteinase structure, the structural template approach coupled with residue-residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank.
6.
Andrej Šali John P. Overington 《Protein science : a publication of the Protein Society》1994,3(9):1582-1596
We describe a database of protein structure alignments as well as methods and tools that use this database to improve comparative protein modeling. The current version of the database contains 105 alignments of similar proteins or protein segments. The database comprises 416 entries, 78,495 residues, 1,233 equivalent entry pairs, and 230,396 pairs of equivalent alignment positions. At present, the main application of the database is to improve comparative modeling by satisfaction of spatial restraints implemented in the program MODELLER (Šali A, Blundell TL, 1993, J Mol Biol 234:779–815). To illustrate the usefulness of the database, the restraints on the conformation of a disulfide bridge provided by an equivalent disulfide bridge in a related structure are derived from the alignments; the prediction success of the disulfide dihedral angle classes is increased to approximately 80%, compared to approximately 55% for modeling that relies on the stereochemistry of disulfide bridges alone. The second example of the use of the database is the derivation of the probability density function for comparative modeling of the cis/trans isomerism of the proline residues; the prediction success is increased from 0% to 82.9% for cis-proline and from 93.3% to 96.2% for trans-proline. The database is available via electronic mail.
7.
Christoph Berbalk Christine S. Schwaiger Peter Lackner 《Protein science : a publication of the Protein Society》2009,18(10):2027-2035
Protein structure alignment methods are essential for many different challenges in protein science, such as the determination of relations between proteins in the fold space or the analysis and prediction of their biological function. A number of different pairwise and multiple structure alignment (MStA) programs have been developed and provided to the community. Prior knowledge of the expected alignment accuracy is desirable for the user of such tools. To retrieve an estimate of the performance of current structure alignment methods, we compiled a test suite, taken from the literature and the SISYPHUS database, consisting of proteins that are difficult to align. Subsequently, different MStA programs were evaluated regarding alignment correctness and general limitations. The analysis shows that there are large differences in success between the methods in terms of applicability and correctness. The latter ranges from 44 to 75% correct core positions. Taking only the best method result per test case, this number increases to 84%. We conclude that the methods available are applicable to difficult cases, but also that there is still room for improvement in both practicability and alignment correctness. An approach that combines the currently available methods supported by a proper score would be useful. Until then, a user should not rely on just a single program.
8.
Comparison of two protein structures often results in not only a global alignment but also a number of distinct local alignments; the latter, referred to as alternative alignments, are however usually ignored in existing protein structure comparison analyses. Here, we used a novel method of protein structure comparison to extensively identify and characterize the alternative alignments obtained for structure pairs of a fold classification database. We showed that all alternative alignments can be classified into one of just a few types, and used these to illustrate the potential of using alternative alignments to identify recurring protein substructures, including the internal structural repeats of a protein. Furthermore, we showed that among the alternative alignments obtained, permuted alignments, which included both circular and scrambled permutations, are as prevalent as topological alignments. These results demonstrated that the so far largely unattended alternative alignments of protein structures have implications and applications for research on protein classification and evolution.
9.
The formation of fibril aggregates by long polyglutamine sequences is assumed to play a major role in neurodegenerative diseases such as Huntington's disease. Here, we model peptides rich in glutamine through a series of molecular dynamics simulations. Starting from a rigid nanotube-like conformation, we have obtained a new conformational template that shares structural features of a tubular helix and of a beta-helix conformational organization. Our new model can be described as a super-helical arrangement of flat beta-sheet segments linked by planar turns or bends. Interestingly, our comprehensive analysis of the Protein Data Bank reveals that this is a common motif in beta-helices (termed beta-bend), although it has not been identified so far. The motif is based on the alternation of beta-sheet and helical conformation as the protein sequence is followed from the N to the C termini (beta-alpha(R)-beta-polyPro-beta). We further identify this motif in the ssNMR structure of the protofibril of the amyloidogenic peptide Abeta(1-40). The recurrence of the beta-bend suggests a general mode of connecting long parallel beta-sheet segments that would allow the growth of partially ordered fibril structures. The design allows the peptide backbone to change direction with a minimal loss of main chain hydrogen bonds. The identification of a coherent organization beyond that of the beta-sheet segments in different folds rich in parallel beta-sheets suggests a higher degree of ordered structure in protein fibrils, in agreement with their low solubility and dense molecular packing.
10.
11.
To understand how protein segments are inserted and deleted during divergent evolution, a set of pairwise alignments containing exactly one gap, and therefore arising from the first insertion-deletion (indel) event in the time separating the homologs, was examined. The alignments showed that "structure breaking" amino acids (PGDNS) were preferred within and flanking gapped regions, as were two residues with hydrophilic side-chains (QE) that frequently occur at the surface of protein folds. Conversely, hydrophobic residues (FMILYVW) occur infrequently within and flanking the gapped region. These preferences are modestly different in protein pairs separated by an episode of adaptive evolution than in pairs diverging under strong functional constraints. Surprisingly, regions near an indel have not evolved more rapidly than the sequence pair overall, showing no evidence that an indel event must be compensated by local amino acid replacement. The gap lengths are best approximated by a Zipfian distribution, with the probability of a gap of length L decreasing as a function of L^(-1.8). These features are largely independent of the length of the gap and the extent of divergence (measured by both silent and non-silent sequence changes) separating the two proteins. Surprisingly, amino acid repeats were discovered in more than a third of the polypeptide segments in and around the gap. These correspond to repeats in the DNA sequence. This suggests that a signature of the mechanism by which indels occur in the DNA sequence remains in the encoded protein sequences. These data suggest specific tools to score gap placement in an alignment. They also suggest tools that distinguish true indels from gaps created by mistaken gene finding, including under-predicted and over-predicted introns. By providing mechanisms to identify errors, the tools will enhance the value of genome sequence databases in support of integrated paleogenomics strategies used to extract functional information in a post-genomic environment.
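The Zipfian gap-length model quoted in this abstract, P(L) proportional to L^(-1.8), can be made concrete with a few lines of code. The sketch below normalizes the power law over a finite range of gap lengths and turns it into a log-probability gap score. The truncation at a maximum length, the function names and the use of a negative log-probability as a penalty are assumptions made here for illustration; the abstract does not specify how a scoring tool would be built.

```python
import math


def zipf_gap_probabilities(max_length, exponent=1.8):
    """Probabilities for gap lengths 1..max_length under P(L) ~ L**-exponent.

    The distribution is truncated at max_length (an assumption) so that it
    can be normalized over a finite set of lengths.
    """
    weights = [length ** -exponent for length in range(1, max_length + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gap_penalty(length, max_length=100, exponent=1.8):
    """Hypothetical gap score: negative log-probability of the gap length."""
    probs = zipf_gap_probabilities(max_length, exponent)
    return -math.log(probs[length - 1])


if __name__ == "__main__":
    probs = zipf_gap_probabilities(100)
    for length in (1, 2, 5, 10):
        print(length, round(probs[length - 1], 4), round(gap_penalty(length), 3))
```

Under this model the penalty grows with the logarithm of the gap length, so long gaps are penalized far less steeply than under a linear (affine) gap cost.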
12.
User-driven in silico RNA homology search is still a nontrivial task. In part, this is the consequence of a limited precision of the computational tools in spite of recent exciting progress in this area, and to a certain extent, computational costs are still problematic in practice. An important, and as we argue here, dominating issue is the dependence on well-curated (secondary) structural alignments of the RNAs. These are often hard to obtain, not so much because of an inherent limitation in the available data, but because they require substantial manual curation, an effort that is rarely acknowledged. Here, we qualitatively describe a realistic scenario for what a “regular user” (i.e., a nonexpert in a particular RNA family) can do in practice, and what kind of results are likely to be achieved. Despite the indisputable advances in computational RNA biology, the conclusion is discouraging: BLAST still works better than, or as well as, other methods unless extensive expert knowledge on the RNA family is included. However, when well-curated data are available, recent developments yield further improvements in finding remote homologs. Homology search beyond the reach of BLAST is hence not at all a routine task.
13.
Mohd. Quadir Siddiqui Rajan Kumar Choudhary Pankaj Thapa Neha Kulkarni Yogendra S. Rajpurohit Hari S. Misra 《Journal of biomolecular structure & dynamics》2017,35(14):3032-3042
Fanconi anemia complementation group I (FANCI) protein facilitates DNA interstrand cross-link (ICL) repair and plays a crucial role in genomic integrity. FANCI is a 1328-amino-acid protein which contains armadillo (ARM) repeats and an EDGE motif at the C-terminus. ARM repeats are functionally diverse, evolutionarily conserved domains that play a pivotal role in protein–protein and protein–DNA interactions. Considering the importance of ARM repeats, we used a comprehensive in silico and in vitro approach to examine their folding pattern. Size exclusion chromatography, dynamic light scattering (DLS) and glutaraldehyde crosslinking studies suggest that the FANCI ARM repeat exists in monomeric as well as oligomeric forms. Circular dichroism (CD) and fluorescence spectroscopy results demonstrate that the protein has predominantly α-helices and a well-folded tertiary structure. DNA binding was analysed using an electrophoretic mobility shift assay with autoradiography. Temperature-dependent CD, fluorescence spectroscopy and DLS studies showed that the protein unfolds and starts forming oligomers from 30°C. The existence of a stable portion within the FANCI ARM repeat was examined using limited proteolysis and mass spectrometry. Normal mode analysis, molecular dynamics and principal component analysis demonstrated that the helix-turn-helix (HTH) motif present in the ARM repeat is highly dynamic and has anti-correlated motion. Furthermore, the FANCI ARM repeat has an HTH structural motif which binds to double-stranded DNA.
14.
Morphological diversity of Old World rats and mice (Rodentia, Muridae) mandible in relation with phylogeny and adaptation (total citations: 2; self-citations: 0; citations by others: 2)
J. Michaux P. Chevret S. Renaud 《Journal of Zoological Systematics and Evolutionary Research》2007,45(3):263-279
The respective roles of the phylogenetic and ecological components in an adaptive radiation are tested on a sample of Old World rats and mice (Muridae, Murinae). Phylogeny was established on nuclear and mitochondrial genes and reconstructed by maximum likelihood and Bayesian methods. This phylogeny is congruent with previous larger scale ones recently published, but includes some new results: Bandicota and Nesokia are sister taxa and Micromys would be closely related to the Rattus group. The ecological diversification is investigated through one factor, the diet, and the mandible outline provides the morphological marker. Elliptic and radial Fourier transforms are used for quantifying size and shape differences among species. Univariate size and shape parameters indicate that phylogeny is more influential on size than diet, and the reverse holds for shape; robust patterns are recognized by multivariate analyses of the data sets provided by the Fourier methods. Omnivorous and herbivorous groups are well separated despite some overlap, as are other Murinae with a specialized diet (insects, seeds). Phylogeny is also influential, as shown by the segregation of several groups (Praomys, Arvicanthini, Rattus, Apodemus). Allometric shape variation was investigated, and although present it does not overwhelm the effects of either phylogeny or diet. Massive mandibles characterize herbivorous Murinae and slender mandibles the insectivorous ones. A strong angular process relative to the coronoid process characterizes seedeaters, and the reverse characterizes Murinae with a diet based largely on animal matter. Such changes in morphology are clearly related to the functioning of the mandible and to the forces required by the nature of the food: the need for a stronger occlusal force in herbivorous species would explain massive mandibles, and an increase in the grasping and piercing function of the incisors in insectivorous species would explain slender mandibles.
15.
Jaroszewski L Li W Godzik A 《Protein science : a publication of the Protein Society》2002,11(7):1702-1713
A major bottleneck in comparative modeling is the alignment quality; this is especially true for proteins whose distant relationships could be reliably recognized only by recent advances in fold recognition. The best algorithms excel in recognizing distant homologs but often produce incorrect alignments for over 50% of protein pairs in large fold-prediction benchmarks. The alignments obtained by sequence-sequence or sequence-structure matching algorithms differ significantly from the structural alignments. To study this problem, we developed a simplified method to explicitly enumerate all possible alignments for a pair of proteins. This allowed us to estimate the number of significantly different alignments for a given scoring method that score better than the structural alignment. Using several examples of distantly related proteins, we show that for standard sequence-sequence alignment methods, the number of significantly different alignments is usually large, often about 10^10 alternatives. This distance decreases when the alignment method is improved, but the number is still too large for the brute force enumeration approach. More effective strategies were needed, so we evaluated and compared two well-known approaches for searching the space of suboptimal alignments. We combined their best features and produced a hybrid method, which yielded alignments that surpassed the original alignments for about 50% of protein pairs with minimal computational effort.
16.
K. Mizuguchi C. M. Deane T. L. Blundell J. P. Overington 《Protein science : a publication of the Protein Society》1998,7(11):2469-2471
We describe a database of protein structure alignments for homologous families. The database HOMSTRAD presently contains 130 protein families and 590 aligned structures, which have been selected on the basis of quality of the X-ray analysis and accuracy of the structure. For each family, the database provides a structure-based alignment derived using COMPARER and annotated with JOY in a special format that represents the local structural environment of each amino acid residue. HOMSTRAD also provides a set of superposed atomic coordinates obtained using MNYFIT, which can be viewed with a graphical user interface or used for comparative modeling studies. The database is freely available on the World Wide Web at http://www-cryst.bioc.cam.ac.uk/~homstrad/, with search facilities and links to other databases.
17.
A technique for prediction of protein membrane topology (intra- and extracellular sidedness) has been developed. Membrane-spanning segments are first predicted using an algorithm based upon multiply aligned amino acid sequences. The compositional differences in the protein segments exposed at each side of the membrane are then investigated. The ratios are calculated for Asn, Asp, Gly, Phe, Pro, Trp, Tyr, and Val, mostly found on the extracellular side, and for Ala, Arg, Cys, and Lys, mostly occurring on the intracellular side. The consensus over these 12 residue distributions is used for sidedness prediction. The method was developed with a set of 42 protein families for which all but one were correctly predicted with the new algorithm. This represents an improvement over previous techniques. The new method, applied to a set of 12 membrane protein families different from the test set and with recently determined topologies, performed well, with 11 of 12 sidedness assignments agreeing with experimental results. The method has also been applied to several membrane protein families for which the topology has yet to be determined. An electronic prediction service is available at the E-mail address tmap@embl-heidelberg.de and on WWW via http://www.embl-heidelberg.de.
18.
Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than in most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate across the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
19.
《Structure (London, England : 1993)》2022,30(8):1169-1177.e4
20.
A method to determine the mean pollen dispersal of individual plants growing within a large pollen source (total citations: 1; self-citations: 0; citations by others: 1)
C. Lavigne B. Godelle X. Reboud P. H. Gouyon 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1996,93(8):1319-1326
Pollen dispersal has recently come into focus as a major issue in the risk assessment of transgenic crop plants. The shape of the pollen dispersal of individual plants is hard to determine since a very large number of plants must be monitored in order to track rare long-distance dispersal events. Conversely, studies using large plots as a pollen source provide a pollen distribution that depends on the shape of the source plot. We report here on a method based on the use of Fourier transforms by which the pollen dispersal of a single, average individual can be obtained from data using large plots as pollen sources, thus allowing the estimation of the probability of long-distance dispersal for single plants. This method is subsequently implemented on simulated data to test its susceptibility to random noise and edge effects. Its conditions of application and value for use in ecological studies, in particular risk assessment of the deliberate release of transgenic plants, are discussed.
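One way to read the Fourier-based idea in this abstract is as a deconvolution: if the pollen distribution observed around a plot is the plot's shape convolved with the dispersal curve of an average individual, then dividing the two Fourier transforms recovers that curve. The sketch below shows this on a small one-dimensional, circular toy example. It is only a sketch under that assumption; the abstract does not give the authors' actual procedure, and the function names, the circular boundary and the toy numbers are all hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t, v in enumerate(values))
           for k in range(n)]
    return [x / n for x in out] if inverse else out


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def deconvolve(observed, source, eps=1e-9):
    """Recover the kernel k from observed = source (*) k by Fourier division.

    Raises ValueError when the source spectrum has a (near-)zero component,
    because the division is then ill-defined and amplifies noise.
    """
    obs_f = dft(observed)
    src_f = dft(source)
    if any(abs(s) < eps for s in src_f):
        raise ValueError("source spectrum has a near-zero component")
    ratio = [o / s for o, s in zip(obs_f, src_f)]
    return [x.real for x in dft(ratio, inverse=True)]


if __name__ == "__main__":
    source = [1, 1, 1, 0, 0, 0, 0, 0]                  # plot covering 3 cells
    kernel = [0.5, 0.2, 0.1, 0.05, 0, 0, 0.05, 0.1]    # toy individual dispersal
    observed = circular_convolve(source, kernel)
    print([round(x, 6) for x in deconvolve(observed, source)])
```

With noisy field data a plain division like this is fragile, which is presumably why the abstract mentions testing susceptibility to random noise and edge effects; a practical version would need some form of regularization.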