Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Summary: The nature of the amino acids whose codons border introns in ferritin genes is novel; the disposition of these intron boundaries within the three-dimensional structure of the 24-subunit molecule differs significantly from that of other proteins. These observations are discussed in relation to the functions of isoferritins.

2.
Rahul Kaushik, Kam Y. J. Zhang. Proteins 2020, 88(10): 1271-1284
The infinitesimally small sequence space naturally scouted in the millions of years of evolution suggests that natural proteins are constrained by some functional prerequisites and should differ from randomly generated sequences. We have developed a protein sequence fitness scoring function that implements sequence and corresponding secondary structural information at the tripeptide level to differentiate natural and nonnatural proteins. The proposed fitness function is extensively validated on a dataset of about 210,000 natural and nonnatural protein sequences and benchmarked against existing methods for differentiating natural and nonnatural proteins. The high sensitivity, specificity, and percentage accuracy (0.81, 0.95, and 91%, respectively) of the fitness function demonstrate its potential application for sampling protein sequences with a higher probability of mimicking natural proteins. Moreover, the four major classes of proteins (α proteins, β proteins, α/β proteins, and α + β proteins) are analyzed separately, and β proteins are found to score slightly lower than the other classes. Further, an analysis of about 250 designed proteins (adopted from previously reported cases) helped to define the boundaries for sampling the ideal protein sequences. The protein sequence characterization aided by the proposed fitness function could facilitate the exploration of new perspectives in the design of novel functional proteins.
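
The three figures of merit quoted in this abstract are standard confusion-matrix quantities. The sketch below shows one way to compute them; the function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: natural sequences classified correctly / incorrectly.
    tn/fp: nonnatural sequences classified correctly / incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one example")
    sensitivity = tp / (tp + fn)            # fraction of natural sequences recovered
    specificity = tn / (tn + fp)            # fraction of nonnatural sequences rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }
```

For example, `classification_metrics(tp=81, fn=19, tn=95, fp=5)` uses made-up counts of 100 sequences per class; sensitivity and specificity come out as fractions between 0 and 1, and accuracy can be multiplied by 100 to report a percentage.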

3.
4.
Sim J, Kim SY, Lee J. Proteins 2005, 59(3): 627-632
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the position-specific scoring matrix (PSSM) generated by PSI-BLAST. We have presented a method, PPRODO (Prediction of PROtein DOmain boundaries), that predicts domain boundaries of proteins from sequence information by a neural network. The network is trained and tested using the values obtained from the PSSM generated by PSI-BLAST. A 10-fold cross-validation technique is performed to obtain the parameters of the neural networks using a nonredundant set of 522 proteins containing 2 contiguous domains. PPRODO provides good and consistent results for the prediction of domain boundaries, with an accuracy of about 66% using the ±20 residue criterion. The PPRODO source code, as well as all data sets used in this work, are available from http://gene.kias.re.kr/~jlee/pprodo/.
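
The ±20 residue criterion mentioned above can be read as follows: a predicted boundary counts as correct when it lies within 20 residues of the reference boundary. The sketch below assumes one boundary per two-domain protein; the function name and the positions in the usage note are hypothetical, and this is not the PPRODO code.

```python
def boundary_accuracy(predicted: list[int], reference: list[int], tolerance: int = 20) -> float:
    """Fraction of proteins whose predicted domain boundary lies within
    `tolerance` residues of the reference boundary (one boundary per protein)."""
    if len(predicted) != len(reference):
        raise ValueError("need one predicted and one reference boundary per protein")
    if not predicted:
        raise ValueError("no proteins to evaluate")
    # A prediction is a hit when |predicted - reference| <= tolerance.
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predicted, reference))
    return hits / len(predicted)
```

With the made-up positions `boundary_accuracy([100, 150, 210], [115, 200, 205])`, the first and third predictions fall inside the window and the second does not, so two of three proteins count as correct.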

5.
Haspel N, Tsai CJ, Wolfson H, Nussinov R. Proteins 2003, 51(2): 203-215
We have previously presented a building block folding model. The model postulates that protein folding is a hierarchical top-down process. The basic unit from which a fold is constructed, referred to as a hydrophobic folding unit, is the outcome of combinatorial assembly of a set of "building blocks." Results obtained by the computational cutting procedure yield fragments that are in agreement with those obtained experimentally by limited proteolysis. Here we show that, as expected, proteins from the same family give very similar building blocks. However, different proteins can also give building blocks that are similar in structure. In such cases the building blocks differ in sequence, stability, contacts with other building blocks, and in their 3D locations in the protein structure. This result, which we have repeatedly observed in many cases, leads us to conclude that while a building block is influenced by its environment, it can nevertheless be viewed as a stand-alone unit. For small-sized building blocks existing in multiple conformations, interactions with sister building blocks in the protein will increase the population time of the native conformer. With this conclusion in hand, it is possible to develop an algorithm that predicts the building block assignment of a protein sequence whose structure is unknown. Toward this goal, we have created sequentially nonredundant databases of building block sequences. A protein sequence can be aligned against these in order to be matched to a set of potential building blocks.

6.
7.
8.
9.
The identification of protein domains within multi-domain proteins is a persistent problem. Here, we describe an experimental method (shotgun proteolysis) based on random DNA fragmentation and protease selection of the encoded polypeptides on phage for this purpose. We applied the method to the Escherichia coli genome and identified 124 protease-resistant fragments; several were re-cloned for expression as soluble fragments in bacteria, and corresponded to autonomously folding units with folding energies similar to natural protein domains (ΔG(u) = 3.8-6.6 kcal/mol). Structural information was available for approximately half of the selected proteins, which corresponded to compact, globular and domain-sized units that had been derived from a wide range of protein superfamilies. Furthermore, boundaries of the selected fragments correlated with domain boundaries as defined by bioinformatics predictions (R² = 0.82; p = 0.016). However, predictions were incomplete or entirely lacking for the remaining fragments, reflecting the limited proteome coverage of current bioinformatics methods. Shotgun proteolysis therefore provides a means to identify domains and other autonomously folding units on a genome-wide scale, without any prior knowledge of sequence or structure. Shotgun proteolysis should be particularly valuable for structural studies of proteins and represents a high-throughput alternative to the classical limited proteolysis method for the isolation of stable components of multi-domain proteins.

10.
Centripetal modules and ancient introns (cited 10 times; self-citations: 0, citations by others: 10)
Roy SW, Nosaka M, de Souza SJ, Gilbert W. Gene 1999, 238(1): 85-91
We have created an algorithm which instantiates the centripetal definition of modules, compact regions of protein structure, as introduced by Go and Nosaka (M. Go and M. Nosaka, 1987. Protein architecture and the origin of introns. Cold Spring Harbor Symp. Quant. Biol. 52, 915-924). That definition seeks the minima of a function that sums the squares of C-alpha carbon distances over a window around each amino acid residue in a three-dimensional protein structure and identifies such minima with module boundaries. We analyze a set of 44 ancient conserved proteins, with known three-dimensional structures, which have intronless homologues in bacteria and intron-containing homologues in the eukaryotes, with a corresponding set of 988 intron positions. We show that the phase zero intron positions are significantly correlated with the module boundaries (p = 0.0002), while the intron positions that lie within codons, in phase one and phase two, are not correlated with these 'centripetal' module boundaries. Furthermore, we analyze the phylogenetic distribution of intron positions and identify a subset of putatively 'ancient' intron positions: phase zero positions in one phylogenetic kingdom which have an associated intron either in an identical position or within three codons in another phylogenetic kingdom (a notion of intron sliding). This subset of 120 'ancient' introns lies closer to the module boundaries than does the full set of phase zero introns, with high significance (p = 0.008). We conclude that the behavior of this set of introns supports the prediction of a mixed theory: that some introns are very old and were used for exon shuffling in the progenote, while many introns have been lost and added since.
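
The centripetal definition described above (sum the squared C-alpha distances over a window around each residue, then take the minima as module boundaries) can be sketched as follows. This is an illustration, not the authors' algorithm: the window half-width, the truncation of windows at the chain ends, and the simple local-minimum test are all assumptions.

```python
Coord = tuple[float, float, float]

def centripetal_profile(ca_coords: list[Coord], half_window: int) -> list[float]:
    """For each residue i, sum the squared distances from its C-alpha atom to
    every C-alpha atom within `half_window` positions along the chain.
    Windows are truncated at the chain ends (an assumption of this sketch)."""
    n = len(ca_coords)
    profile = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        total = 0.0
        for j in range(lo, hi):
            total += sum((a - b) ** 2 for a, b in zip(ca_coords[i], ca_coords[j]))
        profile.append(total)
    return profile

def local_minima(profile: list[float]) -> list[int]:
    """Indices of interior positions lower than the left neighbour and no
    higher than the right neighbour; candidate module boundaries."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]
    ]
```

In practice the coordinates would come from a structure file, and the profile would probably need smoothing and end-effect handling before minima are read off as boundaries.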

11.
Human complement component C9 is a multidomain protein for which a large number of surface topographical features have been determined. We have analyzed the exon-intron boundaries of the human C9 gene and find a good correlation between splice sites and surface features of the protein but little correlation with the putative protein domain structure, even in the cysteine-rich sequence homology with the low-density lipoprotein (LDL) receptor which is likely to be an independently folded structural motif. This is surprising because in the LDL receptor the same sequence is precisely bounded by introns, and it has been assumed that this sequence is present in both proteins as a result of exon shuffling. We deduce that substantial rearrangement of the exon-intron structure of the C9 gene must have occurred before the exchange of cysteine-rich domains, possibly linked to the process of exon duplication which was required to generate the repeats in the LDL receptor.

12.
13.
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.

14.
Topology predictions for integral membrane proteins can be substantially improved if parts of the protein can be constrained to a given in/out location relative to the membrane using experimental data or other information. Here, we have identified a set of 367 domains in the SMART database that, when found in soluble proteins, have compartment-specific localization of a kind relevant for membrane protein topology prediction. Using these domains as prediction constraints, we are able to provide high-quality topology models for 11% of the membrane proteins extracted from 38 eukaryotic genomes. Two-thirds of these proteins are single spanning, a group of proteins for which current topology prediction methods perform particularly poorly.

15.
Newly determined protein structures are classified as belonging to a new fold if they are sufficiently dissimilar from all protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the use of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ to three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.

16.
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms, and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known, and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the data set. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed that 55.7% of assignments could be made automatically, and of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
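
A minimal sketch of the consensus idea described above: accept an assignment only when every method reports the same number of domains and their boundaries lie close together. The abstract does not state the agreement criterion, so the 10-residue tolerance, the averaging of boundary positions, and the function name are assumptions made for illustration.

```python
from typing import Optional

def consensus_boundaries(
    assignments: dict[str, list[int]], tolerance: int = 10
) -> Optional[list[int]]:
    """Return averaged domain boundaries if all methods agree, else None.

    `assignments` maps a method name to its sorted list of boundary positions;
    an empty list means the method calls the chain single-domain.
    """
    boundary_lists = list(assignments.values())
    if not boundary_lists:
        return None
    # All methods must report the same number of boundaries (same domain count).
    if len({len(b) for b in boundary_lists}) != 1:
        return None
    # Compare the k-th boundary across methods; the spread must stay within tolerance.
    columns = list(zip(*boundary_lists))
    if any(max(col) - min(col) > tolerance for col in columns):
        return None
    return [round(sum(col) / len(col)) for col in columns]
```

Called with hypothetical outputs such as `{"PUU": [100], "DETECTIVE": [104], "DOMAK": [98]}`, the function returns a single averaged boundary; if one method instead reports no boundary, it returns `None` and the chain would be left for manual assignment.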

17.
Protein C inhibitor (plasminogen activator inhibitor-3) is a plasma glycoprotein and a member of the serine proteinase inhibitor superfamily. In the present study, the human gene for protein C inhibitor was isolated and characterized from three independent phage that contained overlapping inserts coding for the entire gene. The genomic DNA was isolated and studied by restriction mapping, polymerase chain reaction analysis, and DNA sequencing. The gene was 11.5 kilobases in length and consisted of five exons separated by four introns. In addition, 0.8 kilobases of DNA from the 5'-flanking region were sequenced. The exon-intron boundaries all observed the "GT-AG" rule. The gene for protein C inhibitor was assigned to chromosome 14 by polymerase chain reaction analysis of human/hamster hybrid cell lines. The organization of the gene for protein C inhibitor is similar to the genes coding for alpha 1-antitrypsin and alpha 1-antichymotrypsin. The genes for these two proteins are also localized on chromosome 14 suggesting a recent evolution of the genes for these three proteins from a common ancestor.
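
The "GT-AG" rule mentioned above says that an intron begins with the dinucleotide GT and ends with AG. A minimal check, assuming the intron is given as a DNA string on the coding strand (the function name and example sequences are hypothetical):

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron sequence starts with GT and ends with AG."""
    seq = intron.strip().upper()
    # Require at least four bases so the donor and acceptor sites do not overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")
```

For instance, `follows_gt_ag_rule("GTAAGTCTTTAG")` holds, whereas a sequence starting with GC would fail the check.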

18.
19.
W Wang, R Skopp, M Scofield, C Price. Nucleic Acids Research 1992, 20(24): 6621-6629
We have identified two 1.6 kb macronuclear DNA molecules from Euplotes crassus that hybridize to the alpha subunit of the Oxytricha telomere protein. We have shown that one of these molecules encodes the 51 kDa Euplotes telomere protein while the other appears to encode a homolog of the telomere protein. Although this homolog clearly differs in sequence from the Euplotes telomere protein, the two proteins share extensive amino acid sequence identity with each other and with the alpha subunit of the Oxytricha telomere protein. In all three proteins 35-36% of the amino acids are identical, while 54-56% are similar. The most extended regions of sequence conservation map within the N-terminal section; this section has been shown to comprise the DNA-binding domain in the Euplotes telomere protein. Our findings suggest that some of the conserved amino acids may be involved in DNA recognition and binding. The gene encoding the telomere protein homolog contains two introns; one of these introns is only 24 bp in length. This is the smallest mRNA intron reported to date.

20.