Found 20 similar articles (search time: 0 ms)
1.
2.
BACKGROUND: Several methods of structural classification have been developed to introduce some order to the large amount of data present in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three unique methods of classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. In order to develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures. RESULTS: Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods, and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. Discrepancies and inconsistencies are accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented. CONCLUSIONS: Using these databases requires an understanding of the rules upon which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also has an impact on reliability of prediction methods that employ these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database, encompassing agreements between SCOP, CATH and FSSP.
3.
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification is assigned to the fold level. Because the FSSP database is updated weekly, this method also makes it possible to update CATH and SCOP with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic to any protein structure classification that relies on structural information alone. Hence, we introduce the "twilight zone for structure classification." We further suggest that, to resolve these ambiguous cases, other classification criteria based on sequence and function information must also be used.
4.
The F2CS server provides access to the software, F2CS2.00, which implements an automated method for predicting the SCOP and CATH classifications of proteins from their FSSP Z-scores. AVAILABILITY: Free at http://www.weizmann.ac.il/physics/complex/compphys/f2cs/ SUPPLEMENTARY INFORMATION: The site contains links to additional figures and tables.
5.
Background
Domain experts manually construct the Structural Classification of Proteins (SCOP) database to categorize and compare protein structures. Although the SCOP database is believed to be more reliable than classification results from other methods, building it is labor intensive. To mimic human classification processes, we develop an automatic SCOP fold classification system to assign possible known SCOP folds and recognize novel folds for newly discovered proteins.
6.
Proteins are highly flexible molecules. Prediction of molecular flexibility aids in the comprehension and prediction of protein function and in providing details of functional mechanisms. The ability to predict the locations, directions, and extent of molecular movements can assist in fitting atomic resolution structures to low-resolution EM density maps and in predicting the complex structures of interacting molecules (docking). There are several types of molecular movements. In this work, we focus on the prediction of hinge movements. Given a single protein structure, the method automatically divides it into the rigid parts and the hinge regions connecting them. The method employs the Elastic Network Model, which is very efficient and was validated against a large data set of proteins. The output can be used in applications such as flexible protein-protein and protein-ligand docking, flexible docking of protein structures into cryo-EM maps, and refinement of low-resolution EM structures. The web server of HingeProt provides convenient visualization of the results and is available with two mirror sites at http://www.prc.boun.edu.tr/appserv/prc/HingeProt3 and http://bioinfo3d.cs.tau.ac.il/HingeProt/.
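The idea of splitting a structure into rigid parts and hinges with an elastic network can be illustrated with a small sketch. This is not HingeProt's actual procedure, which the abstract does not spell out; it assumes a Gaussian Network Model built from C-alpha coordinates with a hypothetical 10 Å contact cutoff, and treats sign changes in the slowest non-trivial mode as hinge candidates. The function names are illustrative only.

```python
import numpy as np


def gnm_slowest_mode(coords, cutoff=10.0):
    """Slowest non-trivial mode of a Gaussian Network Model.

    coords: (N, 3) array of C-alpha positions (assumed input).
    cutoff: contact distance in angstroms (assumed value, not from the source).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Residues are in contact if they lie within the cutoff (self-pairs excluded).
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff (graph Laplacian) matrix: degree on the diagonal, -1 per contact.
    kirchhoff = np.diag(contact.sum(axis=1)) - contact.astype(float)
    _, eigvecs = np.linalg.eigh(kirchhoff)  # eigenvalues in ascending order
    # Column 0 is the trivial uniform mode; column 1 is the slowest motion.
    return eigvecs[:, 1]


def hinge_candidates(mode):
    """Indices i where the mode changes sign between residues i and i + 1."""
    signs = np.sign(mode)
    return [i for i in range(len(mode) - 1) if signs[i] * signs[i + 1] < 0]


# Toy structure: two compact four-residue "domains" joined by a single contact.
toy = [[x, 0.0, 0.0] for x in (0, 3, 6, 9, 18, 21, 24, 27)]
hinges = hinge_candidates(gnm_slowest_mode(toy))
```

On the toy input the two halves move in opposite directions in the slowest mode, so the only candidate hinge lies between the fourth and fifth residues.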
7.
8.
MOTIVATION: The Monte Carlo fragment insertion method for protein tertiary structure prediction (ROSETTA) of Baker and others has been merged with the I-SITES library of sequence structure motifs and the HMMSTR model for local structure in proteins, to form a new public server for the ab initio prediction of protein structure. The server performs several tasks in addition to tertiary structure prediction, including a database search, amino acid profile generation, fragment structure prediction, and backbone angle and secondary structure prediction. Meeting reasonable service goals required improvements in efficiency, in particular for the ROSETTA algorithm. RESULTS: The new server was used for blind predictions of 40 protein sequences as part of the CASP4 blind structure prediction experiment. The results for 31 of those predictions are presented here. Overall, 61% of the residues were found in topologically correct predictions, which are defined as fragments of 30 residues or more with a root-mean-square deviation in superimposed alpha carbons of less than 6 Å. HMMSTR 3-state secondary structure predictions were 73% correct overall. Tertiary structure predictions did not improve the accuracy of secondary structure prediction.
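The "topologically correct" criterion above rests on the root-mean-square deviation of superimposed alpha carbons. A minimal sketch of that quantity follows, assuming the standard Kabsch superposition; the abstract does not say which superposition routine the authors used, and the function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Uses the Kabsch algorithm (an assumption; the source names no method).
    Rows of p and q must correspond to the same residues.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum(axis=1).mean()))


# A rigidly rotated and translated copy superimposes with an RMSD of about zero.
native = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = native @ rot_z.T + np.array([5.0, -2.0, 1.0])
rmsd_rigid = kabsch_rmsd(native, moved)
```

Under the definition in the abstract, a fragment of 30 or more residues would count as topologically correct when this value is below 6 Å.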
9.
SCOP: a structural classification of proteins database (total citations: 17; self-citations: 0; citations by others: 17)
Lo Conte L Ailey B Hubbard TJ Brenner SE Murzin AG Chothia C 《Nucleic acids research》2000,28(1):257-259
10.
11.
Gennarino VA Sardiello M Mutarelli M Dharmalingam G Maselli V Lago G Banfi S 《Gene》2011,482(1-2):51-58
An Antarctic strain (NJ-7) of Chlorella vulgaris possesses the same 18S rRNA sequence as that of a temperate strain (UTEX259), but shows significantly higher freezing tolerance than the latter. Suppression subtractive hybridization (SSH) was performed to identify genes of intensified expression in NJ-7 relative to UTEX259. Among the genes identified, Ccor1 and Ccor2, co-organized in the same gene cluster Ccor1-Ccor2-Ccor1-Ccor2, showed much higher expression levels in NJ-7 than in UTEX259 at both 20°C and 4°C. As detected by Northern blot and Western blot analyses, the two genes were cold-inducible in NJ-7 but barely expressed in UTEX259. Their encoded products are predicted to share 55.7% identity with each other and possess physicochemical characteristics similar to those of late embryogenesis abundant (LEA) proteins in plants. The purified recombinant Ccor1 and Ccor2 showed high heat-stability and could act as cryoprotectants for lactate dehydrogenase in vitro. Based on their expression patterns and protein characteristics, we propose that Ccor1 and Ccor2 are two novel LEA proteins and are related to the greatly enhanced freezing tolerance in the Antarctic strain.
12.
13.
Improved and automated prediction of effective siRNA (total citations: 11; self-citations: 0; citations by others: 11)
Chalk AM Wahlestedt C Sonnhammer EL 《Biochemical and biophysical research communications》2004,319(1):264-274
Short interfering RNAs are used in functional genomics studies to knock down a single gene in a reversible manner. The results of siRNA experiments are highly dependent on the choice of siRNA sequence. In order to evaluate siRNA design rules, we collected a database of 398 siRNAs of known efficacy from 92 genes. We used this database to evaluate previously proposed rules from smaller datasets, and to find a new set of rules that are optimal for the entire database. We also trained a regression tree with full cross-validation. It was, however, difficult to obtain the same precision as methods previously tested on small datasets from one or two genes. We show that those methods are overfitting, as they work poorly on independent validation datasets from multiple genes. Our new design rules can predict siRNAs with efficacy ≥50% in 91% of cases, and with efficacy ≥90% in 52% of cases, which is more than a twofold improvement over random selection. Software for designing siRNAs is available online via a web server or as a standalone version for high-throughput applications.
14.
15.
Assigning macroinvertebrate tolerance classifications using generalised additive models (total citations: 1; self-citations: 0; citations by others: 1)
Lester L. Yuan 《Freshwater Biology》2004,49(5):662-677
1. Macroinvertebrates are frequently classified in terms of their tolerance to human disturbance and pollution. These tolerance values have been used effectively to assess the biological condition of running waters. 2. Generalised additive models were used to associate the presence and absence of different macroinvertebrate genera with different environmental gradients. The model results were then used to classify each genus as sensitive, intermediately tolerant or tolerant to different stressor gradients as quantified by total phosphorus concentration, sulphate ion concentration, qualitative habitat score and stream pH. The analytical approach provided a means of estimating stressor‐specific tolerance classifications while controlling for covarying, natural environmental gradients. 3. Computed tolerance classifications generally conformed with expectations and provided some capacity for distinguishing between different stressors in test data.
16.
Mapping using unique sequences (total citations: 5; self-citations: 0; citations by others: 5)
D C Torney 《Journal of molecular biology》1991,217(2):259-264
Theoretical predictions are given for the progress expected, when mapping DNA by identifying clones containing specific unique sequences. Progress is measured in three ways; however, all results depend on (dimensionless counterparts of) the number of clones and the number of unique sequences used. Furthermore, the effects of clone length dispersion are included in the theoretical predictions. Both the clones in the library and the unique sequences are assumed to be generated randomly, with uniform probability of originating at any base in the region to be mapped. The first measure of progress is the expected length fraction of the region to be mapped covered by at least one clone, when clones containing at least one unique sequence are included in the map. The second measure of progress is the expected length fraction of the region to be mapped in "covered intervals", an interval being the region between adjacent unique sequences. Alternative definitions for clones covering an interval are analyzed. The third measure of progress is the expected number of clone islands generated; an island covers successive intervals. Finally, using these measures of progress, we compare the efficiency of this new mapping strategy with conventional clone mapping strategies.
17.
18.
Consensus clustering involves combining multiple clusterings of the same set of objects to achieve a single clustering that will, hopefully, provide a better picture of the groupings that are present in a dataset. This Letter reports the use of consensus clustering methods on sets of chemical compounds represented by 2D fingerprints. Experiments with DUD, IDAlert, MDDR and MUV data suggest that consensus methods are unlikely to result in significant improvements in clustering effectiveness as compared to the use of a single clustering method.
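One common way to combine several clusterings into a single one is through a co-association matrix. The sketch below shows that generic approach; the abstract does not state which consensus methods were tested, so the thresholded single-link grouping, the 0.5 threshold and the function names are all assumptions made for illustration.

```python
import numpy as np


def coassociation_matrix(labelings):
    """Fraction of input clusterings in which each pair of objects co-clusters.

    labelings: (n_clusterings, n_objects) array of cluster labels.
    """
    labelings = np.asarray(labelings)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)


def consensus_clusters(labelings, threshold=0.5):
    """Group objects linked by co-association above the threshold.

    Connected components of the thresholded matrix form the consensus
    clusters (a single-link style rule; the threshold is an assumed value).
    """
    co = coassociation_matrix(labelings)
    n = co.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i, j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three clusterings of four objects; label values need not match across rows.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
consensus = consensus_clusters(runs)
```

Because only co-membership is counted, the arbitrary label names used by each input clustering do not affect the result.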
19.
Tobi D 《Proteins》2012,80(4):1167-1176
A novel methodology for comparison of protein dynamics is presented. Protein dynamics is calculated using the Gaussian network model and the modes of motion are globally aligned using the dynamic programming algorithm of Needleman and Wunsch, commonly used for sequence alignment. The alignment is fast and can be used to analyze large sets of proteins. The methodology is applied to the four major classes of the SCOP database: "all alpha proteins," "all beta proteins," "alpha and beta proteins," and "alpha/beta proteins." We show that different domains may have similar global dynamics. In addition, we report that the dynamics of "all alpha proteins" domains are less specific to structural variations within a given fold or superfamily compared with the other classes. We report that domain pairs with the most similar and the least similar global dynamics tend to be of similar length. The significance of the methodology is that it suggests a new and efficient way of mapping between the global structural features of protein families/subfamilies and their encoded dynamics.
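The Needleman and Wunsch dynamic programming step can be sketched generically. The version below aligns any two sequences given a pairwise scoring function and a linear gap penalty; how the paper scores pairs of mode entries is not stated in the abstract, so the scoring function and gap value here are placeholders.

```python
def needleman_wunsch(a, b, score, gap=-1.0):
    """Global alignment of sequences a and b.

    score(x, y) gives the reward for pairing x with y; gap is the linear gap
    penalty. Both are assumed placeholders, not the paper's parameters.
    Returns (best_score, pairs), where pairs holds (i, j) index tuples and
    None marks a gap.
    """
    n, m = len(a), len(b)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Boundaries: aligning a prefix against nothing costs one gap per element.
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] + gap
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                f[i - 1][j] + gap,                            # gap in b
                f[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or f[i][j] == f[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return f[n][m], pairs


def match_score(x, y):
    """Toy scoring: +1 for identical symbols, -1 otherwise."""
    return 1.0 if x == y else -1.0


best, alignment = needleman_wunsch("AB", "A", match_score)
```

For comparing dynamics rather than letters, `score` would compare numerical descriptors of the modes at each position instead of testing symbols for equality.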
20.
MaxSub: an automated measure for the assessment of protein structure prediction quality (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Evaluating the accuracy of predicted models is critical for assessing structure prediction methods. Because this problem is not trivial, a large number of different assessment measures have been proposed by various authors, and it has already become an active subfield of research. The CASP (Moult et al., 1997, 1999) and CAFASP (Fischer et al., 1999) prediction experiments have demonstrated that it has been difficult to choose one single, 'best' method to be used in the evaluation. Consequently, the CASP3 evaluation was carried out using an extensive set of specially developed numerical measures, coupled with human-expert intervention. As part of our efforts towards a higher level of automation in the structure prediction field, here we investigate the suitability of a fully automated, simple, objective, quantitative and reproducible method that can be used in the automatic assessment of models in the upcoming CAFASP2 experiment. Such a method should (a) produce one single number that measures the quality of a predicted model and (b) perform similarly to human-expert evaluations. RESULTS: MaxSub is a new and independently developed method that builds on and extends some of the evaluation methods introduced at CASP3. MaxSub aims at identifying the largest subset of Cα atoms of a model that superimpose 'well' over the experimental structure, and produces a single normalized score that represents the quality of the model. Because there exists no evaluation method for assessment measures of predicted models, it is not easy to evaluate how good our new measure is. Even though an exact comparison of MaxSub and the CASP3 assessment is not straightforward, here we use a test-bed extracted from the CASP3 fold-recognition models. A rough qualitative comparison of the performance of MaxSub vis-a-vis the human-expert assessment carried out at CASP3 shows that there is a good agreement for the more accurate models and for the better predicting groups. As expected, some differences were observed among the medium to poor models and groups. Overall, the top six predicting groups ranked using the fully automated MaxSub are also the top six groups ranked at CASP3. We conclude that MaxSub is a suitable method for the automatic evaluation of models.