Similar articles
Found 20 similar articles; search time: 0 ms
1.
We describe a database of protein structure alignments as well as methods and tools that use this database to improve comparative protein modeling. The current version of the database contains 105 alignments of similar proteins or protein segments. The database comprises 416 entries, 78,495 residues, 1,233 equivalent entry pairs, and 230,396 pairs of equivalent alignment positions. At present, the main application of the database is to improve comparative modeling by satisfaction of spatial restraints implemented in the program MODELLER (Šali A, Blundell TL, 1993, J Mol Biol 234:779–815). To illustrate the usefulness of the database, the restraints on the conformation of a disulfide bridge provided by an equivalent disulfide bridge in a related structure are derived from the alignments; the prediction success of the disulfide dihedral angle classes is increased to approximately 80%, compared to approximately 55% for modeling that relies on the stereochemistry of disulfide bridges alone. The second example of the use of the database is the derivation of the probability density function for comparative modeling of the cis/trans isomerism of the proline residues; the prediction success is increased from 0% to 82.9% for cis-proline and from 93.3% to 96.2% for trans-proline. The database is available via electronic mail.

2.
The conditional probability, P(sigma/x), is a statement of the probability that the value of sigma will be found given the prior information that a value of x has been observed. Here sigma represents any one of the secondary structure types, alpha, beta, tau, and rho for helix, sheet, turn, and random, respectively, and x represents a sequence attribute, including, but not limited to: (1) hydropathy; (2) hydrophobic moments assuming helix and sheet; (3) Richardson and Richardson helical N-cap and C-cap values; (4) Chou-Fasman conformational parameters for helix, P alpha, for sheet, P beta, and for turn, P tau; and (5) Garnier, Osguthorpe, and Robson (GOR) information values for helix, I alpha, for sheet, I beta, for turn, I tau, and for random structure, I rho. Plots of P(sigma/x) vs. x are demonstrated to provide information about the correlation between structure and attribute, sigma and x. The separations between different P(sigma/x) vs. x curves indicate the capacity of a given attribute to discriminate between different secondary structural types and permit comparison of different attributes. P(alpha/x), P(beta/x), P(tau/x) and P(rho/x) vs. x plots show that the most useful attributes for discriminating helix are, in order: hydrophobic moment assuming helix greater than P alpha much greater than N-cap greater than C-cap approximately I alpha approximately I tau. The information value for turns, I tau, was found to discriminate helix better than turns. Discrimination for sheet was found to be in the following order: I beta much greater than P beta approximately hydropathy greater than I rho approximately hydrophobic moment assuming sheet. Three attributes, at their low values, were found to give significant discrimination for the absence of helix: I alpha approximately P alpha approximately hydrophobic moment assuming helix. Also, three other attributes were found to indicate the absence of sheet: P beta much greater than I rho approximately hydropathy. 
Indications of the absence of sigma could be as useful for some applications as the indication of the presence of sigma.
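The conditional probability P(sigma/x) described in this abstract can be illustrated with a small counting sketch. It assumes the attribute x is discretized into fixed-width bins and that each residue contributes one (x, sigma) observation; the function name, the bin width, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def conditional_probabilities(observations, bin_width):
    """Estimate P(sigma | x) from (x, sigma) pairs by binning x.

    Returns {bin_index: {sigma: probability}}; the binning scheme is an
    assumption made for illustration only.
    """
    counts = defaultdict(Counter)
    for x, sigma in observations:
        bin_index = int(x // bin_width)
        counts[bin_index][sigma] += 1

    table = {}
    for bin_index, state_counts in counts.items():
        total = sum(state_counts.values())
        table[bin_index] = {s: n / total for s, n in state_counts.items()}
    return table


# Hypothetical toy data: (attribute value, secondary-structure state).
toy = [
    (0.1, "alpha"), (0.2, "alpha"), (0.3, "beta"),
    (1.2, "beta"), (1.4, "beta"), (1.7, "rho"),
]
table = conditional_probabilities(toy, bin_width=1.0)
```

Plotting each state's probability against the bin index would give curves of the kind the abstract compares: widely separated curves suggest an attribute that discriminates well, and a probability near zero at some x indicates the absence of that state.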

3.
We describe a database of protein structure alignments for homologous families. The database HOMSTRAD presently contains 130 protein families and 590 aligned structures, which have been selected on the basis of quality of the X-ray analysis and accuracy of the structure. For each family, the database provides a structure-based alignment derived using COMPARER and annotated with JOY in a special format that represents the local structural environment of each amino acid residue. HOMSTRAD also provides a set of superposed atomic coordinates obtained using MNYFIT, which can be viewed with a graphical user interface or used for comparative modeling studies. The database is freely available on the World Wide Web at http://www-cryst.bioc.cam.ac.uk/~homstrad/, with search facilities and links to other databases.

4.
In the recent past, there has been a resurgence of interest in Chikungunya virus (CHIKV), attributed to massive outbreaks of Chikungunya fever in the South-East Asia Region. This is reflected in a substantial increase in submissions of CHIKV genome sequences to the NCBI (National Center for Biotechnology Information) database. Here we present a database, "CHIKVPRO", containing structural and functional annotation of Chikungunya virus proteins (25 strains) submitted to the NCBI repository. The CHIKV genome encodes 9 proteins: 4 non-structural and 5 structural. The CHIKVPRO database aims to provide the virology community with a single-access, authoritative resource for the CHIKV proteome, covering physicochemical and molecular properties, proteolytic cleavage sites, hydrophobicity, transmembrane prediction, and classification into functional families using SVMProt and other Expasy tools. AVAILABILITY: The database is freely available at http://www.chikvpro.info/

5.
The globin derived from the monomer Component IV hemoglobin of the marine annelid, Glycera dibranchiata, has been completely sequenced, and the resulting information has been used to create a structural model of the protein. The most important result is that the consensus sequence of Component IV differs by 3 amino acids from a cDNA-predicted amino acid sequence thought earlier to encode the Component IV hemoglobin. This work reveals that the histidine (E7), typical of most heme-containing globins, is replaced by leucine in Component IV. Also significant is that this sequence is not identical to any of the previously reported Glycera dibranchiata monomer hemoglobin sequences, including the sequence from a previously reported crystal structure, but has high identity to all. A three-dimensional structural model for monomer Component IV hemoglobin was constructed using the published 1.5 Å crystal structure of a monomer hemoglobin from Glycera dibranchiata as a template. The model shows several interesting features: (1) a Phe31 (B10) that is positioned in the active site; (2) a His39 that occurs in an interhelical region occupied by Pro in 98.2% of reported globin sequences; and (3) a Met41 that is found at a position that emerges from this work as a previously unrecognized heme contact.
Abbreviations used: GMHX, the holo-protein (including b-type heme), Glycera dibranchiata monomer hemoglobin Component X (X=2, 3, or 4); GMGX, the apo-protein, or globin, Glycera dibranchiata monomer globin derived from Component X (X=2, 3, or 4); rec-gmg, the globin derived from a recombinant holoprotein of a Glycera dibranchiata monomer hemoglobin, rec-gmh, whose sequence has been inferred from an isolated cDNA insert; CB, label refers to peptides generated from cyanogen bromide cleavage of GMG4; HPLC, high-performance liquid chromatography; T, label refers to peptides generated from trypsin digests of GMG4; Mb, myoglobin; MCS, monomer hemoglobin crystal structure from Glycera dibranchiata; H, N-terminal sequence of GMG4; SWMb, sperm whale myoglobin.

6.
Profile search methods based on protein domain alignments have proven to be useful tools in comparative sequence analysis. Domain alignments used by currently available search methods have been computed by sequence comparison. With the growth of the protein structure database, however, alignments of many domain pairs have also been computed by structure comparison. Here, we examine the extent to which information from these two sources agrees. We measure agreement with respect to identification of homologous regions in each protein, that is, with respect to the location of domain boundaries. We also measure agreement with respect to identification of homologous residue sites by comparing alignments and assessing the accuracy of the molecular models they predict. We find that domain alignments in publicly available collections based on sequence and structure comparison are largely consistent. However, the homologous regions identified by sequence comparison are often shorter than those identified by 3D structure comparison. In addition, when overall sequence similarity is low, alignments from sequence comparison produce less accurate molecular models, suggesting that they less accurately identify homologous sites. These observations suggest that structure comparison results might be used to improve the overall accuracy of domain alignment collections and the performance of profile search methods based on them.

7.
A rapid increase in protein sequence information from genome sequencing projects demands the intervention of bioinformatics tools to recognize interesting gene products and associated function. Often, multiple algorithms need to be employed to improve accuracy in predictions, and several structure prediction algorithms are in the public domain. Here, we report the availability of an Integrated Web-server (IWS), a bioinformatics online package dedicated to in-silico analysis of protein sequence and structure data. IWS provides a web interface to both in-house and widely accepted programs from major bioinformatics groups, organized as 10 different modules. IWS also provides interactive images for the Analysis Work Flow, which give the user transparency to carry out analysis by moving across modules seamlessly and to perform predictions rapidly. AVAILABILITY: IWS is available from the URL: http://caps.ncbs.res.in/iws.

8.
BACKGROUND INFORMATION: The MIPs (major intrinsic proteins) constitute a large family of membrane proteins that facilitate the passive transport of water and small neutral solutes across cell membranes. Since water is the most abundant molecule in all living organisms, the discovery of selective water-transporting channels called AQPs (aquaporins) has led to new knowledge on both the physiological and molecular mechanisms of membrane permeability. The MIPs are identified in Archaea, Bacteria and Eukaryota, and the rapid accumulation of new sequences in the database provides an opportunity for large-scale analysis, to identify functional and/or structural signatures or to infer evolutionary relationships. To help perform such an analysis, we have developed MIPDB (database for MIP proteins), a relational database dedicated to members of the MIP family. RESULTS: MIPDB is a motif-oriented database that integrates data on 785 MIP proteins from more than 200 organisms and contains 230 distinct sequence motifs. MIPDB proposes the classification of MIP proteins into three functional subgroups: AQPs, glycerol-uptake facilitators and aquaglyceroporins. Plant MIPs are classified into three specific subgroups according to their subcellular distribution in the plasma membrane, tonoplast or the symbiosome membrane. Some motifs of the database are highly selective and can be used to predict the transport function or subcellular localization of unknown MIP proteins. CONCLUSIONS: MIPDB offers a user-friendly and intuitive interface for a rapid and easy access to MIP resources and to sequence analysis tools. MIPDB is a web application, publicly accessible at http://idefix.univ-rennes1.fr:8080/Prot/index.html.

9.
The cytoplasmic hemoglobin II from the gill of the clam Lucina pectinata consists of 150 amino acid residues, has a calculated Mm of 17,476, including heme and an acetylated N-terminal residue. It retains the invariant residues Phe 44 at position CD1 and His 65 at the proximal position F8, as well as the highly conserved Trp 15 at position A12 and Pro 38 at position C2. The most likely candidate for the distal residue at position E7, based on the alignment with other globins, is Gln 65. However, optical and EPR spectroscopic studies of the ferri Hb II (Kraus, D. W., Wittenberg, J. B., Lu, J. F., and Peisach, J., J. Biol. Chem. 265, 16054–16059, 1990) have implicated a tyrosinate oxygen as the distal ligand. Modeling of the Lucina Hb II sequence, using the crystal structure of sperm whale aquometmyoglobin, showed that Tyr 30 substituting for the Leu located at position B10 can place its oxygen within 2.8 Å of the water molecule occupying the distal ligand position. This structural alteration is facilitated by the coordinate mutation of the residue at position CD4, from Phe 46 in the sperm whale myoglobin sequence to Leu 47 in Lucina Hb II.

10.
Use of o-phthaldialdehyde to chemically reduce the newly generated amino termini responsible for the progressively increasing background during an extended amino acid sequence analysis in a liquid phase sequencer has been described. The results have been compared with Fluram blocking using apomyoglobin and rabbit C-reactive protein as standard and unknown samples, respectively.

11.
12.
Babnigg G, Giometti CS. Proteomics 2006, 6(16):4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
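A sequence-derived identifier of this kind can be sketched in a few lines. The abstract states only that SEGUIDs are derived from the primary sequence; the sketch below assumes the commonly used convention of a SHA-1 digest of the upper-cased sequence, Base64-encoded with the trailing padding removed. The database names and accession strings are hypothetical.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Checksum identifier for a protein sequence.

    Assumed convention: SHA-1 of the upper-cased sequence, Base64-encoded,
    with the trailing '=' padding stripped.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def link_identifiers(*databases):
    """Group database-specific identifiers that share the same sequence."""
    links = {}
    for name, records in databases:
        for identifier, sequence in records.items():
            links.setdefault(seguid(sequence), []).append((name, identifier))
    return links


# Hypothetical records: the same sequence under two different accessions.
db_a = ("databaseA", {"A0001": "MEHVAFGSEDIENTLAK"})
db_b = ("databaseB", {"B9999": "mehvafgsedientlak", "B0002": "MKKALSGDSYWVFVKRV"})
links = link_identifiers(db_a, db_b)
```

Because the identifier depends only on the residues, it stays the same when a database renames an entry or revises its annotation, which is the property the abstract relies on for linking databases.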

13.
In the present work, we use structural information to characterize a set of disease-associated single amino acid polymorphisms exhaustively. The analysis of different properties, such as substitution matrix elements, secondary structure, accessibility, free energies of transfer from water to octanol, amino acid volume, etc., suggests that many disease-causing mutations are associated with extreme changes in the value of parameters relating to protein stability. Overall, our results indicate that, while knowledge of protein structure clearly helps in understanding these mutations, a finer understanding can come only from a quantitative knowledge of protein stability and of the protein environment in the cell. Interestingly, evolutionary information from multiple sequence alignments can be used to increase our knowledge of disease-associated mutations.

14.
An improved generalized comparative modeling method, GENECOMP, for the refinement of threading models is developed and validated on the Fischer database of 68 probe-template pairs, a standard benchmark used to evaluate threading approaches. The basic idea is to perform ab initio folding using a lattice protein model, SICHO, near the template provided by the new threading algorithm PROSPECTOR. PROSPECTOR also provides predicted contacts and secondary structure for the template-aligned regions, and possibly for the unaligned regions by garnering additional information from other top-scoring threaded structures. Since the lowest-energy structure generated by the simulations is not necessarily the best structure, we employed two structure-selection protocols: distance geometry and clustering. In general, clustering is found to generate somewhat better quality structures in 38 of 68 cases. When applied to the Fischer database, the protocol does no harm and in a significant number of cases improves upon the initial threading model, sometimes dramatically. The procedure is readily automated and can be implemented on a genomic scale.

15.
The complete amino acid sequence of the 125-residue photoactive yellow protein (PYP) from Ectothiorhodospira halophila has been determined to be MEHVAFGSEDIENTLAKMDDGQLDGLAFGAIQLDGDGNILQYNAAEGDITGRDPKEVIGKNFFKDVAPCTDSPEFYGKFKEGVASGNLNTMFEYTFDYQMTPTKVKVHMKKALSGDSYWVFVKRV. This is the first sequence to be reported for this class of proteins. There is no obvious sequence homology to any other protein, although the crystal structure, known at 2.4 Å resolution (McRee, D.E., et al., 1989, Proc. Natl. Acad. Sci. USA 86, 6533-6537), indicates a relationship to the similarly sized fatty acid binding protein (FABP), a representative of a family of eukaryotic proteins that bind hydrophobic molecules. The amino acid sequence exhibits no greater similarity between PYP and FABP than for proteins chosen at random (8%). The photoactive yellow protein contains an unidentified chromophore that is bleached by light but recovers within a second. Here we demonstrate that the chromophore is bound covalently to Cys 69 instead of Lys 111 as deduced from the crystal structure analysis. The partially exposed side chains of Tyr 76, 94, and 118, plus Trp 119 appear to be arranged in a cluster and probably become more exposed due to a conformational change of the protein resulting from light-induced chromophore bleaching. The charged residues are not uniformly distributed on the protein surface but are arranged in positive and negative clusters on opposite sides of the protein. The exact chemical nature of the chromophore remains undetermined, but we here propose a possible structure based on precise mass analysis of a chromophore-binding peptide by electrospray ionization mass spectrometry and on the fact that the chromophore can be cleaved off the apoprotein upon reduction with a thiol reagent. The molecular mass of the chromophore, including an SH group, is 147.6 Da (+/- 0.5 Da); the cysteine residue to which it is bound is at sequence position 69.
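The residue numbering quoted in this abstract can be checked directly against the sequence. The short sketch below assumes the sequence is the one given above, written as a single string of one-letter codes, and uses 1-based positions as in the text; the helper name is hypothetical.

```python
# PYP sequence as quoted in the abstract, joined into one string.
PYP = (
    "MEHVAFGSEDIENTLAKMDDGQLDGLAFGAIQLDGDGNILQYNAAEGDITGRDPKEVIGKNFFKDVAP"
    "CTDSPEFYGKFKEGVASGNLNTMFEYTFDYQMTPTKVKVHMKKALSGDSYWVFVKRV"
)


def residue_at(sequence: str, position: int) -> str:
    """Return the one-letter residue code at a 1-based sequence position."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]


# Positions named in the abstract and the residue type each should hold.
named_positions = {69: "C", 111: "K", 76: "Y", 94: "Y", 118: "Y", 119: "W"}
mismatches = {
    pos: residue_at(PYP, pos)
    for pos, expected in named_positions.items()
    if residue_at(PYP, pos) != expected
}
```

With the sequence as written, the length is 125 and `mismatches` is empty, so Cys 69, Lys 111, Tyr 76/94/118 and Trp 119 all agree with the numbering in the text.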

16.
Mark E. Snow. Proteins 1993, 15(2):183-190
A novel scheme for the parameterization of a type of “potential energy” function for protein molecules is introduced. The function is parameterized based on the known conformations of previously determined protein structures and their sequence similarity to a molecule whose conformation is to be calculated. Once parameterized, minima of the potential energy function can be located using a version of simulated annealing which has been previously shown to locate global and near-global minima with the given functional form. As a test problem, the potential was parameterized based on the known structures of the rubredoxins from Desulfovibrio vulgaris, Desulfovibrio desulfuricans, and Clostridium pasteurianum, which vary from 45 to 54 amino acids in length, and the sequence alignments of these molecules with the rubredoxin sequence from Desulfovibrio gigas. Since the Desulfovibrio gigas rubredoxin conformation has also been determined, it is possible to check the accuracy of the results. Ten simulated-annealing runs from random starting conformations were performed. Seven of the 10 resultant conformations have an all-Cα rms deviation from the crystallographically determined conformation of less than 1.7 Å. For five of the structures, the rms deviation is less than 0.8 Å. Four of the structures have conformations which are virtually identical to each other except for the position of the carboxy-terminal residue. This is also the conformation which is achieved if the determined crystal structure is minimized with the same potential. The all-Cα rms difference between the crystal and minimized crystal structures is 0.6 Å. It is further observed that the “energies” of the structures according to the potential function exhibit a strong correlation with rms deviation from the native structure. The conformations of the individual model structures and the computational aspects of the modeling procedure are discussed. © 1993 Wiley-Liss, Inc.
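The all-Cα rms deviation used above to score the models can be written out as a short function. This is a minimal sketch that assumes the two coordinate sets list the same residues in the same order and have already been superposed; it performs no optimal superposition, which a real comparison would normally do first. The function name and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) C-alpha coordinates, assumed to be already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy coordinates in angstroms: the model is the reference
# shifted by 1 A along x, so every atom deviates by exactly 1 A.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 1.0, y, z) for x, y, z in reference]
deviation = ca_rmsd(reference, model)
```

Under these assumptions, a value below 0.8 Å or 1.7 Å would correspond to the two accuracy thresholds reported for the simulated-annealing runs.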

17.
18.
The availability of fast and robust algorithms for protein structure comparison provides an opportunity to produce a database of three-dimensional comparisons, called families of structurally similar proteins (FSSP). The database currently contains an extended structural family for each of 154 representative (below 30% sequence identity) protein chains. Each data set contains: the search structure; all its relatives with 70-30% sequence identity, aligned structurally; and all other proteins from the representative set that contain substructures significantly similar to the search structure. Very close relatives (above 70% sequence identity) rarely have significant structural differences and are excluded. The alignments of remote relatives are the result of pairwise all-against-all structural comparisons in the set of 154 representative protein chains. The comparisons were carried out with each of three novel automatic algorithms that cover different aspects of protein structure similarity. The user of the database has the choice between strict rigid-body comparisons and comparisons that take into account interdomain motion or geometrical distortions; and, between comparisons that require strictly sequential ordering of segments and comparisons, which allow altered topology of loop connections or chain reversals. The data sets report the structurally equivalent residues in the form of a multiple alignment and as a list of matching fragments to facilitate inspection by three-dimensional graphics. If substructures are ignored, the result is a database of structure alignments of full-length proteins, including those in the twilight zone of sequence similarity. (ABSTRACT TRUNCATED AT 250 WORDS)

19.
The accelerated pace of genomic sequencing has increased the demand for structural models of gene products. Improved quantitative methods are needed to study the many systems (e.g., macromolecular assemblies) for which data are scarce. Here, we describe a new molecular dynamics method for protein structure determination and molecular modeling. An energy function, or database potential, is derived from distributions of interatomic distances obtained from a database of known structures. X-ray crystal structures are refined by molecular dynamics with the new energy function replacing the van der Waals potential. Compared to standard methods, this method improved the atomic positions, interatomic distances, and side-chain dihedral angles of structures randomized to mimic the early stages of refinement. The greatest enhancement in side-chain placement was observed for groups that are characteristically buried. More accurate calculated model phases will follow from improved interatomic distances. Details usually seen only in high-resolution refinements were improved, as is shown by an R-factor analysis. The improvements were greatest when refinements were carried out using X-ray data truncated at 3.5 Å. The database potential should therefore be a valuable tool for determining X-ray structures, especially when only low-resolution data are available.

20.
There is indirect evidence that the amino acid composition of proteins depends on their dimension. The amino acid composition of a nonredundant set of about 550,000 proteins was determined and it was observed that, in the range of 50-200 residues, the percentage of occurrence of most of the residue types significantly depends on protein dimension. This result should prove useful in analyzing protein sequences and genomics.
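The kind of tabulation described here, residue percentages as a function of protein length, can be sketched as follows. The sketch assumes sequences are pooled into fixed-width length bins before percentages are computed; the bin size, function name, and toy sequences are hypothetical and do not reproduce the paper's data set or statistics.

```python
from collections import Counter


def composition_by_length_bin(sequences, bin_size=50):
    """Pool sequences into length bins and return, for each bin's lower
    bound, the percentage of occurrence of each residue type."""
    pooled = {}
    for sequence in sequences:
        bin_start = (len(sequence) // bin_size) * bin_size
        pooled.setdefault(bin_start, Counter()).update(sequence.upper())

    percentages = {}
    for bin_start, counts in pooled.items():
        total = sum(counts.values())
        percentages[bin_start] = {
            residue: 100.0 * n / total for residue, n in counts.items()
        }
    return percentages


# Hypothetical toy sequences of lengths 60, 60 and 120.
toy_sequences = ["A" * 60, "A" * 30 + "G" * 30, "G" * 120]
composition = composition_by_length_bin(toy_sequences, bin_size=50)
```

Comparing the per-bin percentages for one residue type across the 50-200 residue range would show whether its frequency shifts with protein size, which is the dependence the abstract reports.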
