Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Bioinformatics analysis of tandem mass spectrometry data requires efficient storage and sorting of semi-ordered datasets. To meet this need, a new data structure based on dynamic arrays was designed and implemented in an algorithm that parses semi-ordered data produced by Mascot, a separate software program that matches peptide tandem mass spectra to protein sequences in a database. By accommodating the special features of these large datasets, the combined dynamic array (CDA) provides efficient searching and insertion operations. Operations on real datasets using this new data structure are hundreds of times faster than operations using binary tree and red-black tree structures. The difference becomes more significant as the dataset size grows. This data structure may be useful for improving the speed of other related types of protein assembling software or other types of software that operate on datasets with similar semi-ordered features.
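
As a rough illustration of why semi-ordered input suits an array-backed structure, the sketch below keeps keys in a sorted Python list, appends directly when the next key arrives in order, and falls back to a binary-search insertion otherwise. This is not the CDA described in the abstract, whose internal layout is not given here; the class name `SemiOrderedArray` and its methods are hypothetical.

```python
import bisect


class SemiOrderedArray:
    """Sorted dynamic array with a fast path for in-order arrivals (illustrative only)."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        # Fast path: semi-ordered data mostly arrives in ascending order,
        # so the new key usually belongs at the end of the array.
        if not self._keys or key >= self._keys[-1]:
            self._keys.append(key)
        else:
            # Slow path: binary search for the position, then shift the tail.
            bisect.insort(self._keys, key)

    def __contains__(self, key):
        # Binary search for membership.
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def __len__(self):
        return len(self._keys)

    def to_list(self):
        return list(self._keys)


# Mostly ascending input with a few out-of-order keys.
arr = SemiOrderedArray()
for k in [1, 2, 3, 7, 5, 8, 9, 6, 10]:
    arr.insert(k)
```

The fewer keys that arrive out of order, the more insertions take the append path, which is the property a structure tuned to semi-ordered data would be expected to exploit.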

2.
Protein structures cluster into families of folds that can result from extremely different amino acid sequences [1]. Because the enormous amount of genetic information generates a limited number of protein folds [2], a particular domain structure often assumes numerous functions. How new protein structures and new functions evolve under these limitations remains elusive. Molecular evolution may be driven by the ability of biomacromolecules to adopt multiple conformations as a bridge between different folds [3-6]. This could allow proteins to explore new structures and new tasks while part of the structural ensemble retains the initial conformation and function as a safeguard [7]. Here we show that a global structural switch can arise from single amino acid changes in cysteine-rich domains (CRD) of cnidarian nematocyst proteins. The ability of these CRDs to form two structures with different disulfide patterns from an identical cysteine pattern is distinctive [8]. By applying a structure-based mutagenesis approach, we demonstrate that a cysteine-rich domain can interconvert between two natively occurring domain structures via a bridge state containing both structures. Comparing cnidarian CRD sequences leads us to believe that the mutations we introduced to stabilize each structure reflect the birth of new protein folds in evolution.

3.
The score matrix from a structure comparison program (SAP) was used to search for repeated structures using a Fourier analysis. When tested with artificial data, a simple Fourier transform of the smoothed matrix provided a clear signal of the repeat periodicity that could be used to extract the repeating units with the SAP program. The strength of the Fourier signal was calibrated against the signal from model proteins. The most useful of these was the novel random-walk approach employed to generate realistic 'fake' structures. On the basis of these it was possible to conclude that only a small proportion of protein structures have an unexpected degree of symmetry. Artificially generated 'ideal' folds provided an upper limit on the strength of signal that could be expected from a 'perfectly' repeating compact structure. Unexpectedly, some of the very regular beta-propeller folds attained the same strength, but the majority of symmetric structures lay below this region. When native proteins were ranked by the power of their spectrum, a wide variety of fold types were seen to score highly. In the beta-alpha class, these included the globular beta-alpha proteins and the more repetitive leucine-rich beta-alpha folds. In the all-beta class, beta-propellers, beta-prisms and beta-helices were found, as well as the more globular gamma-crystallin domains. When this ranked list was filtered to remove proteins that contained detectable internal sequence similarity (using the program REPRO), the list became composed exclusively of globular beta-alpha class proteins, and in the top 50 re-ranked proteins only a single 4-fold propeller structure remained.
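
The idea of reading a repeat period from a comparison matrix can be sketched as follows. The code collapses a square self-comparison matrix into a profile of mean score per diagonal offset and takes a plain discrete Fourier transform of that profile. The collapse to a one-dimensional profile, the function names, and the toy cosine matrix are all assumptions made for illustration; the abstract does not specify how SAP scores were smoothed or transformed.

```python
import cmath
import math


def offset_profile(matrix):
    """Mean score along each diagonal offset k = j - i (k >= 1) of a square matrix."""
    n = len(matrix)
    profile = []
    for k in range(1, n):
        vals = [matrix[i][i + k] for i in range(n - k)]
        profile.append(sum(vals) / len(vals))
    return profile


def dominant_period(signal):
    """Period (in samples) of the strongest non-zero frequency in a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    best_f, best_power = None, -1.0
    for f in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * f * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_f, best_power = f, power
    return n / best_f


# Toy "score matrix": similarity depends only on the offset and repeats every 8 residues.
size, true_period = 41, 8
toy = [
    [math.cos(2 * math.pi * (j - i) / true_period) for j in range(size)]
    for i in range(size)
]
period = dominant_period(offset_profile(toy))
```

On real score matrices the peak would be broader and would need calibrating against a background, which is the role the model and random-walk structures play in the abstract.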

4.
Because of their large size and complex nature, few large macromolecular complexes have been solved to atomic resolution. This has led to an under-representation of these structures, which are composed of novel and/or homologous folds, in the library of known structures and folds. While it is often difficult to achieve a high-resolution model for these structures, X-ray crystallography and electron cryomicroscopy are capable of determining structures of large assemblies at low to intermediate resolutions. To aid in the interpretation and analysis of such structures, we have developed two programs: helixhunter and foldhunter. Helixhunter is capable of reliably identifying helix position, orientation and length using a five-dimensional cross-correlation search of a three-dimensional density map followed by feature extraction. Helixhunter's results can in turn be used to probe a library of secondary structure elements derived from the structures in the Protein Data Bank (PDB). From this analysis, it is then possible to identify potential homologous folds or suggest novel folds based on the arrangement of alpha helix elements, resulting in a structure-based recognition of folds containing alpha helices. Foldhunter uses a six-dimensional cross-correlation search allowing a probe structure to be fitted within a region or component of a target structure. The structural fitting therefore provides a quantitative means to further examine the architecture and organization of large, complex assemblies. These two methods have been successfully tested with simulated structures modeled from the PDB at resolutions between 6 and 12 Å. With the integration of helixhunter and foldhunter into sequence and structural informatics techniques, we have the potential to deduce or confirm known or novel folds in domains or components within large complexes.

5.
Protein evolution is imprinted in both the sequence and the structure of evolutionary building blocks known as protein domains. These domains share a common ancestry and can be unified into a comparatively small set of folding architectures, the protein folds. We have traced the distribution of protein folds between and within proteomes belonging to Eukarya, Archaea, and Bacteria along the branches of a universal phylogeny of protein architecture. This tree was reconstructed from global fold-usage statistics derived from a structural census of proteomes. We found that folds shared by the three organismal domains were placed almost exclusively at the base of the rooted tree and that there were marked heterogeneities in fold distribution and clear evolutionary patterns related to protein architecture and organismal diversification. These include a relative timing for the emergence of prokaryotes, congruent episodes of architectural loss and diversification in Archaea and Bacteria, and a late and quite massive rise of architectural novelties in Eukarya perhaps linked to multicellularity. Reviewing Editor: Dr. David Pollock

6.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not yet reliably enough. Each team correctly predicts a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments corresponding to correctly recognized folds is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading cannot at present be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc.

7.
Ciliary folds form the dorsolateral walls of the foregut in numerous polychaetes. These feeding structures have not been recognized earlier. They are described here for 26 species in 16 families. The folds consist of ciliated cells, usually associated with gland cells, and have no intrinsic muscular system. Protraction of the dorsolateral folds to make contact with the substratum during uptake of food is mainly achieved by contractions of the musculature of the body wall in the anterior part of the body. These folds either occur alone or are associated with a ventral pharyngeal organ. Dorsolateral ciliated folds are structures originally adapted to microphagy. From the present study and the literature it is obvious that these structures are widespread among polychaetes of various taxa. This distribution and their similar structure suggest that dorsolateral folds are phylogenetically old structures which might already have been present in the stem species of polychaetes.

8.
Vincent J. Hilser. Proteins 2016, 84(4):435-447
Knowing the determinants of conformational specificity is essential for understanding protein structure, stability, and fold evolution. To address this issue, a novel statistical measure of energetic compatibility between sequence and structure was developed using an experimentally validated model of the energetics of the native state ensemble. This approach successfully matched sequences from a diverse subset of the human proteome to their respective folds. Unexpectedly, significant energetic compatibility between ostensibly unrelated sequences and structures was also observed. Interrogation of these matches revealed a general framework for understanding the origins of conformational specificity within a proteome: specificity is a complex function of both the ability of a sequence to adopt folds other than the native, and the ability of a fold to accommodate sequences other than the native. The regional variation in energetic compatibility indicates that the compatibility is dominated by incompatibility of sequence for alternative fold segments, suggesting that evolution of protein sequences has involved substantial negative selection, with certain segments serving as “gatekeepers” that presumably prevent alternative structures. Beyond these global trends, a size dependence exists in the degree to which the energetic compatibility is determined by negative selection, with smaller proteins displaying more negative selection. This partially explains how short sequences can adopt unique folds, despite the higher probability in shorter proteins for small numbers of mutations to increase compatibility with other folds. In providing evolutionary ground rules for the thermodynamic relationship between sequence and fold, this framework imparts valuable insight for rational design of unique folds or fold switches. Proteins 2016; 84:435–447. © 2016 Wiley Periodicals, Inc.

9.
Protein structure prediction is limited by the inaccuracy of the simplified energy functions necessary for efficient sorting over many conformations. It was recently suggested (Finkelstein, Phys Rev Lett 1998;80:4823-4825) that these errors can be reduced by energy averaging over a set of homologous sequences. This conclusion is confirmed in this study by testing protein structure recognition in gapless threading. The accuracy of recognition was estimated by the Z-score values obtained in gapless threading tests. For threading, we used 20 target proteins, each having from 20 to 70 homologs taken from the HSSP sequence base. The energy of the native structures was compared with the energy of 34 to 75 thousand alternative structures generated by threading. The energy calculations were done with our recently developed Cα atom-based phenomenological potentials. We show that averaging of protein energies over homologs lowers the Z-score from approximately -6.1 (average Z-score for individual chains) to approximately -8.1. This means that a correct fold can be found among 3 × 10^9 random folds in the first case and among 3 × 10^15 in the second. Such an increase in selectivity is important for recognition of protein folds.
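
The link between a Z-score and the number of random folds it can discriminate against can be checked with a short calculation. The sketch below assumes that decoy energies are normally distributed, so that the chance of a random structure scoring at or below the native one is the lower Gaussian tail at Z; that assumption, the function names, and the toy energies are illustrative and are not taken from the paper.

```python
import math
import statistics


def z_score(native_energy, decoy_energies):
    """Native energy in units of the decoy standard deviation, relative to the decoy mean."""
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    return (native_energy - mu) / sigma


def decoys_per_false_hit(z):
    """Expected number of random structures per one scoring at or below z (Gaussian assumption)."""
    tail = 0.5 * math.erfc(-z / math.sqrt(2.0))  # lower-tail probability of a standard normal
    return 1.0 / tail


# Toy gapless-threading result: one native energy against five decoy energies.
z_toy = z_score(-10.0, [0.0, 2.0, -2.0, 1.0, -1.0])

single_chain = decoys_per_false_hit(-6.1)
homolog_averaged = decoys_per_false_hit(-8.1)
```

Under this assumption the two values come out on the order of 10^9 and 10^15 respectively, the same orders of magnitude as the figures quoted in the abstract.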

10.
11.
Newly determined protein structures are classified as belonging to a new fold if they are sufficiently dissimilar from all protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ to three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.

12.
It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60 amino acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, which is on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Besides their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds.
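
Contact order, the quantity that separates natural from merely possible folds in this abstract, can be computed along the following lines. The sketch uses the common relative form, the mean sequence separation of contacting residue pairs divided by chain length. The contact definition (one point per residue, 8 Å cutoff, minimum separation of 2) and the function name are assumptions chosen for illustration; the abstract does not state which definition the authors used.

```python
import math


def relative_contact_order(coords, cutoff=8.0, min_separation=2):
    """Mean sequence separation of contacting pairs, normalised by chain length.

    coords: one (x, y, z) point per residue, in sequence order.
    """
    n = len(coords)
    total_separation = 0
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                contacts += 1
    if contacts == 0:
        return 0.0
    return total_separation / (n * contacts)


# Toy four-residue chain laid out on the corners of a 3.8 Å square.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
rco = relative_contact_order(square)
```

A lower value means that contacts are mostly between residues close in sequence, which corresponds to the shorter loops between contacting residues described in the abstract.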

13.
This paper presents and discusses evidence suggesting how the diversity of domain folds in existence today might have evolved from peptide ancestors. We apply a structure similarity detection method to detect instances where localized regions of different protein folds contain highly similar sequences and structures. Results of performing an all-on-all comparison of known structures are described and compared with other recently published findings. The numerous instances of local sequence and structure similarities within different protein folds, together with evidence from proteins containing sequence and structure repeats, argue in favor of the evolution of modern single polypeptide domains from ancient short peptide ancestors (antecedent domain segments (ADSs)). In this model, ancient protein structures were formed by self-assembling aggregates of short polypeptides. Subsequently, and perhaps concomitantly with the evolution of higher fidelity DNA replication and repair systems, single polypeptide domains arose from the fusion of ADS genes. Thus modern protein domains may have a polyphyletic origin.

14.
It is well known that structures are currently available for only a small fraction of known protein sequences. It is therefore important to discover the key features of known protein sequences on the basis of existing protein structures. Here, we report a study on the size distribution of protein families within different types of folds. The fold of a protein means the global arrangement of its main secondary structures, both in terms of their relative orientations and their topological connections, which specify a certain biochemical and biophysical aspect. We first search protein families in the structural database SCOP against the sequence-based database Pfam, and acquire a pool of corresponding Pfam families whose structures can be deemed as known. This pool of Pfam families is called the sample space for short. Then the size distributions of protein families involving the sample space, the Pfam database and the SCOP database are obtained. The results indicate that the size distributions of protein families under different kinds of folds follow a similar power law. In particular, the largest families are spread evenly across different kinds of folds. This may help to better understand the relationship between protein sequence, structure and function. We also show that the set of proteins with known structures can be considered a random sample from the whole space of protein sequences, which is an essential but unsettled assumption for related predictions, such as estimating the number of protein folds in nature. Finally, using a simple method, we conclude that about 2957 folds are needed to cover all Pfam families.

15.
Protein crystallography has become a major technique for understanding cellular processes. This has come about through great advances in the technology of data collection and interpretation, particularly the use of synchrotron radiation. The ability to express eukaryotic genes in Escherichia coli is also important. Analysis of known structures shows that all proteins are built from about 1000 primeval folds. The collection of all primeval folds provides a basis for predicting structure from sequence. At present about 450 are known. Of the presently sequenced genomes only a fraction can be related to known proteins on the basis of sequence alone. Attempts are being made to determine all (or as many as possible) of the structures from some bacterial genomes in the expectation that structure will point to function more reliably than does sequence. Membrane proteins present a special problem. The next 20 years may see the experimental determination of another 40,000 protein structures. This will make considerable demands on synchrotron sources and will require many more biochemists than are currently available. The availability of massive structure databases will alter the way biochemistry is done.

16.
Zhang C, Kim SH. Proteins 2000, 40(3):409-419
Greek key motifs are the topological signature of many beta-barrels and a majority of beta-sandwich structures. An updated survey of these structures integrates many early observations and newly emerging patterns and provides a better understanding of the unique role of Greek keys in protein structures. A stereotypical Greek key beta-barrel accommodates five or six strands and can have 12 possible topologies. All but one of the six-stranded topologies have been observed, and only one five-stranded topology has been seen in actual structures. Of the representative beta-barrel structures analyzed here, half have left-handed Greek keys. This result challenges the empirical claim of the handedness regularity of Greek keys in beta-barrels. One of the five-stranded topologies that has not been observed in beta-barrels comprises two overlapping Greek keys. The two three-dimensional forms of this topology constitute a structural unit that is present in a vast majority of known beta-sandwich structures. Using this unit as the root, we have built a new taxonomy tree for the beta-sandwich folds and deduced a set of rules that appear to constrain how other beta-strands adjoin the unit to form a larger double-layered structure. These rules, though derived from a larger data set, are essentially the same as those drawn from earlier studies, suggesting that they may reflect the true topological constraints in the design of beta-sandwich structures. Finally, a novel variant of the Greek key motif (defined here as the twisted Greek key) has emerged which introduces loop crossings into the folded structures. Proteins 2000;40:409-419.

17.
Yo Matsuo, Ken Nishikawa. Proteins 1995, 23(3):370-375
A protein fold recognition method was tested by the blind prediction of the structures of a set of proteins. The method evaluates the compatibility of an amino acid sequence with a three-dimensional structure using four evaluation functions: side-chain packing, solvation, hydrogen-bonding, and local conformation functions. The structures of 14 proteins containing 19 sequences were predicted. The predictions were compared with the experimental structures. The experimental results showed that 9 of the 19 target sequences have known folds or portions of known folds. Among them, the folds of Klebsiella aerogenes urease β subunit (KAUB) and pyruvate phosphate dikinase domain 4 (PPDK4) were successfully recognized; our method predicted that KAUB and PPDK4 would adopt the folds of macromomycin (Ig-fold) and phosphoribosylanthranilate isomerase:indoleglycerol-phosphate synthase (TIM barrel), respectively, and the experimental structures revealed that they actually adopt the predicted folds. The predictions for the other targets were not successful, but they often gave secondary structural patterns similar to those of the experimental structures. © 1995 Wiley-Liss, Inc.

18.
Mark Gerstein. Proteins 1998, 33(4):518-534
Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori, and E. coli are compared in terms of patterns of fold usage—whether a given fold occurs in a particular organism. Of the ∼340 soluble protein folds currently in the structure databank (PDB), 240 occur in at least one of the eight genomes, and 30 are shared amongst all eight. The shared folds are depleted in all-helical structure and enriched in mixed helix-sheet structure compared to the folds in the PDB. The top-10 most common of the shared 30 are enriched in superfolds, uniting many non-homologous sequence families, and are especially similar in overall architecture—eight having helices packed onto a central sheet. They are also very different from the common folds in the PDB, highlighting databank biases. Folds can be ranked in terms of expression as well as genome duplication. In yeast the top-10 most highly expressed folds are considerably different from the most highly duplicated folds. A tree can be constructed grouping genomes in terms of their shared folds. This has a remarkably similar topology to more conventional classifications, based on very different measures of relatedness. Finally, folds of membrane proteins can be analyzed through transmembrane-helix (TM) prediction. All the genomes appear to have similar usage patterns for these folds, with the occurrence of a particular fold falling off rapidly with increasing numbers of TM-elements, according to a “Zipf-like” law. This implies there are no marked preferences for proteins with particular numbers of TM-helices (e.g. 7-TM) in microbial genomes. Further information pertinent to this analysis is available at http://bioinfo.mbb.yale.edu/genome. Proteins 33:518–534, 1998. © 1998 Wiley-Liss, Inc.
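
The fold-usage comparison described above reduces to set operations on presence/absence data, as the sketch below shows. It finds the folds shared by every genome and builds a pairwise distance table that a tree-building method could take as input. The genome and fold labels are placeholders, and the Jaccard distance is an assumed measure; the abstract does not say which distance was used to build the tree.

```python
from itertools import combinations

# Hypothetical presence/absence data: genome -> set of folds found in it.
usage = {
    "genome_1": {"fold_A", "fold_B", "fold_C", "fold_D"},
    "genome_2": {"fold_A", "fold_B", "fold_C"},
    "genome_3": {"fold_A", "fold_B", "fold_E"},
}


def shared_folds(usage):
    """Folds that occur in every genome."""
    return set.intersection(*usage.values())


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; 0 for identical fold sets, 1 for disjoint ones."""
    return 1.0 - len(a & b) / len(a | b)


def distance_table(usage):
    """Pairwise distances between genomes, keyed by sorted genome-name pairs."""
    return {
        (g1, g2): jaccard_distance(usage[g1], usage[g2])
        for g1, g2 in combinations(sorted(usage), 2)
    }


core = shared_folds(usage)
distances = distance_table(usage)
```

In this toy data `genome_1` and `genome_2` are the closest pair, so a distance-based clustering would join them first.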

19.
Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes.

20.
Despite their seemingly endless diversity, proteins adopt a limited number of structural forms. It has been estimated that 80% of proteins will be found to adopt one of only about 400 folds, most of which are already known. These folds are largely formed by a limited 'vocabulary' of recurring supersecondary structure elements, often by repetition of the same element and, increasingly, elements similar in both structure and sequence are discovered. This suggests that modern proteins evolved by fusion and recombination from a more ancient peptide world and that many of the core folds observed today may contain homologous building blocks. The peptides forming these building blocks would not in themselves have had the ability to fold, but would have emerged as cofactors supporting RNA-based replication and catalysis (the 'RNA world'). Their association into larger structures and eventual fusion into polypeptide chains would have allowed them to become independent of their RNA scaffold, leading to the evolution of a novel type of macromolecule: the folded protein.
