Similar articles
 20 similar articles found (search time: 0 ms)
1.
Oldfield TJ 《Proteins》2002,49(4):510-528
The protein databank contains a vast wealth of structural and functional information. The analysis of this macromolecular information has been the subject of considerable work in order to advance knowledge beyond the collection of molecular coordinates. This article presents a method that determines local structural information within proteins using mathematical data mining techniques. The mine program described returns many known configurations of residues such as the catalytic triad, metal binding sites and the N-linked glycosylation site; as well as many other multiple residue interactions not previously categorized. Because mathematical constructs are used as targets, this method can identify new information not previously known, and also provide unbiased results of typical structure and their expected deviations. Because the results are defined mathematically, they cannot indicate the biological implications of the results. Therefore two support programs are described that provide insight into the biological context for the mine results. The first allows a weighted RMSD search between a template set of coordinates and a list of PDB files, and the second allows the labeling of a protein with the template results from mining to aid in the classification of this protein.
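
The weighted RMSD comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the template and candidate coordinates are already superposed and paired atom by atom; the function name `weighted_rmsd` and the weighting scheme are hypothetical and are not taken from the mine program itself.

```python
import math


def weighted_rmsd(template, candidate, weights):
    """Weighted RMSD between two paired, already superposed coordinate lists."""
    if not (len(template) == len(candidate) == len(weights)):
        raise ValueError("template, candidate and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq = 0.0
    for (x1, y1, z1), (x2, y2, z2), w in zip(template, candidate, weights):
        # Each atom pair contributes its squared distance scaled by its weight.
        weighted_sq += w * ((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    return math.sqrt(weighted_sq / total_weight)
```

Raising the weight of an atom makes a deviation at that position count more heavily, which is one plausible way to emphasize the residues that define a template.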

2.
3.
4.
We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front‐end graphical user interface that combines several Thermo RAW format to MASCOT Generic Format extractors (EasierMgf), two graphical user interfaces for the search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three further applications: one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and a third for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and for feeding our LymPHOS database. The applications were designed modularly and can be used standalone. These tools are written in the Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode.

5.
We introduce the computer tool “Know Your Samples” (KYSS) for assessment and visualisation of large scale proteomics datasets, obtained by mass spectrometry (MS) experiments. KYSS facilitates the evaluation of sample preparation protocols, LC peptide separation, and MS and MS/MS performance by monitoring the number of missed cleavages, precursor ion charge states, number of protein identifications and peptide mass error in experiments. KYSS generates several different protein profiles based on protein abundances, and allows for comparative analysis of multiple experiments. KYSS was adapted for blood plasma proteomics and provides concentrations of identified plasma proteins. We demonstrate the utility of the KYSS tool for MS based proteome analysis of blood plasma and for assessment of hydrogel particles for depletion of abundant proteins in plasma. The KYSS software is open source and is freely available at http://kyssproject.github.io/.
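
One of the quality metrics named above, the number of missed cleavages, can be sketched as follows. This is an illustration only: it assumes tryptic digestion with the common rule that cleavage occurs after K or R unless the next residue is P, and the function names are hypothetical rather than part of KYSS.

```python
from collections import Counter


def count_missed_cleavages(peptide):
    """Count internal tryptic sites (K or R not followed by P) left uncleaved."""
    missed = 0
    # The last residue is the peptide's own C-terminus, so it is never a missed site.
    for i in range(len(peptide) - 1):
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            missed += 1
    return missed


def missed_cleavage_profile(peptides):
    """Histogram mapping a missed-cleavage count to the number of peptides with it."""
    return Counter(count_missed_cleavages(p) for p in peptides)
```

A profile dominated by zero missed cleavages would suggest efficient digestion, whereas a large share of peptides with one or more missed sites may point to a problem in sample preparation.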

6.
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms, and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known, and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the dataset. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed 55.7% of assignments could be made automatically, and of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
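
The idea of accepting an assignment only when the individual algorithms agree can be sketched as below. The abstract does not state the agreement criterion, so this sketch assumes one: every method must report the same number of domain boundaries, and each boundary must lie within a tolerance of the first method's boundary. The function name, the data layout and the default tolerance of 20 residues are all hypothetical.

```python
def consensus_assignment(assignments, tolerance=20):
    """Return averaged boundaries if every method agrees, otherwise None.

    assignments maps a method name to a sorted list of domain boundary
    positions (residue numbers); an empty list means a single domain.
    """
    boundary_lists = list(assignments.values())
    if not boundary_lists:
        return None
    reference = boundary_lists[0]
    for boundaries in boundary_lists[1:]:
        # Methods must first agree on how many domains there are.
        if len(boundaries) != len(reference):
            return None
        # Each boundary is compared against the first method's boundary.
        if any(abs(a - b) > tolerance for a, b in zip(reference, boundaries)):
            return None
    n = len(boundary_lists)
    return [round(sum(column) / n) for column in zip(*boundary_lists)]
```

Note that a consensus single-domain chain yields an empty list, which is distinct from `None` (no consensus), so callers should test the result with `is None`.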

7.
Analysis of metal-protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal-binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. By measuring the frequency of each amino acid in metal ion-binding sites, the positive or negative preferences of each residue for each type of cation were identified. Our approach may be used for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone. The analysis compares data derived separately from high- and medium-resolution structures from the PDB with those from very high-resolution small-molecule structures in the Cambridge Structural Database (CSD). For high-resolution protein structures, the distribution of metal-protein or metal-water interaction distances agrees quite well with data from the CSD, but the distribution is unrealistically wide for medium-resolution (2.0-2.5 Å) data. Our analysis of cation B-factors versus average B-factors of atoms in the cation environment reveals that substantial numbers of structures contain either an incorrect metal ion assignment or an unusual coordination pattern. A correlation between data resolution and completeness of the metal coordination spheres is also found.
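
A residue preference of the kind described above can be expressed as the ratio of a residue's frequency in binding sites to its frequency overall. The sketch below uses a log2 ratio, which is an assumption made for illustration; the abstract does not specify the statistic, and the function name is hypothetical.

```python
import math
from collections import Counter


def residue_preferences(site_residues, all_residues):
    """Log2 ratio of each residue's frequency in binding sites to its overall frequency."""
    site_counts = Counter(site_residues)
    background_counts = Counter(all_residues)
    site_total = sum(site_counts.values())
    background_total = sum(background_counts.values())
    preferences = {}
    for residue, count in site_counts.items():
        background = background_counts.get(residue, 0)
        if background == 0:
            continue  # no background estimate, so no ratio can be formed
        site_freq = count / site_total
        background_freq = background / background_total
        preferences[residue] = math.log2(site_freq / background_freq)
    return preferences
```

Positive values indicate enrichment in binding sites and negative values indicate depletion. Residues that never occur in a site are omitted here rather than assigned negative infinity.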

8.
Data analysis, not data production, is becoming the bottleneck in gene expression research. Data integration is necessary to cope with an ever-increasing amount of data, to cross-validate noisy data sets, and to gain broad interdisciplinary views of large biological data sets. New Internet resources may help researchers to combine data sets across different gene expression platforms. However, noise and disparities in experimental protocols strongly limit data integration. A detailed review of four selected studies reveals how some of these limitations may be circumvented and illustrates what can be achieved through data integration.

9.
Short peptides connecting α-helices and β-strands have been analyzed in 240 proteins refined at resolutions of 0.25 nm or better. Connecting peptides of lengths between one and five residues have been classified as part of supersecondary motifs of four types: αα, αβ, βα, and ββ. Careful consideration has been given to the definition of secondary structures on the basis of hydrogen bonds and main-chain conformational angles. Using five classes of residue conformation (a, b, e, l, t) in the nonregular structure regions of φ, ψ space, 34 classes of supersecondary motifs occurring at least five times have been identified. Among these 34 classes, 11 classes that occur more than 25 times are commonly occurring supersecondary structure motifs. The patterns and conformations of the 11 commonly occurring supersecondary structure motifs have been characterized, demonstrating that patterns and conformations adopted by supersecondary structure motifs are limited. The results have relevance to structure prediction, comparative modeling, and protein folding.

10.
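Design and implementation of a sample-set database for protein secondary structure prediction   Cited by: 1 (self-citations: 0, citations by others: 1)
Zhang Ning, Zhang Tao 《生物信息学》 (Bioinformatics) 2006,4(4):163-166
Database technology is applied to the processing and analysis of sample sets for protein secondary structure prediction, and a sample-set database for secondary structure prediction has been built. The construction scheme of the database is described using the CB513 sample set as an example. Building a sample database not only makes it easier to store, manage, and retrieve data, but also supports some simple sequence analysis tasks, replacing much of the programming that was previously required. This greatly improves working efficiency and reduces the occurrence of errors.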
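
The point that a database query can replace ad hoc programming can be sketched with Python's built-in `sqlite3` module. The table name, column names and toy records below are hypothetical; they are not the schema from the paper and not real CB513 entries.

```python
import sqlite3


def build_sample_db(records):
    """Create an in-memory database of (name, sequence, structure) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample ("
        "name TEXT PRIMARY KEY, sequence TEXT NOT NULL, structure TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def helix_fraction(conn, name):
    """Fraction of positions labelled 'H' for one entry, computed entirely in SQL."""
    row = conn.execute(
        "SELECT (LENGTH(structure) - LENGTH(REPLACE(structure, 'H', ''))) * 1.0"
        " / LENGTH(structure) FROM sample WHERE name = ?",
        (name,),
    ).fetchone()
    return None if row is None else row[0]
```

Here a simple composition statistic is obtained with a single query instead of a dedicated parsing script, which is the kind of saving the abstract describes.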

11.
Absolute protein concentration determination is becoming increasingly important in a number of fields including diagnostics, biomarker discovery and systems biology modeling. The recently introduced quantification concatamer methodology provides a novel approach to performing such determinations, and it has been applied to both microbial and mammalian systems. While a number of software tools exist for performing analyses of quantitative data generated by related methodologies such as SILAC, there is currently no analysis package dedicated to the quantification concatamer approach. Furthermore, most tools that are currently available in the field of quantitative proteomics do not manage storage and dissemination of such data sets.

12.
An algorithm is described for automatically detecting hydrophobic cores in proteins of known structure. Three pieces of information are considered in order to achieve this goal. These are: secondary structure, side-chain accessibility, and side-chain-side-chain contacts. Residues are considered to contribute to a core when they occur in regular secondary structure and have buried side chains that form predominantly nonpolar contacts with one another. This paper describes the algorithm's application to families of proteins with conserved topologies but low sequence similarities. The aim of this investigation is to determine the efficacy of the algorithm as well as to study the extent to which similar cores are identified within a common topology.
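
The three criteria above (regular secondary structure, buried side chain, predominantly nonpolar contacts) can be combined in a short filter. This is only a sketch: the nonpolar residue set, the accessibility cutoff of 0.1, the majority threshold of 0.5 and the input layout are assumptions chosen for illustration, not values from the paper.

```python
# Assumed set of nonpolar residues; other classifications are possible.
NONPOLAR = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP"}


def core_residues(residues, contacts, max_accessibility=0.1, min_nonpolar_fraction=0.5):
    """Select residue ids meeting all three core criteria.

    residues maps id -> (name, secondary_structure, relative_accessibility),
    with 'H' for helix and 'E' for strand; contacts maps id -> list of
    residue ids whose side chains touch this residue's side chain.
    """
    core = []
    for res_id, (name, sec_struct, accessibility) in residues.items():
        if sec_struct not in ("H", "E"):
            continue  # not in regular secondary structure
        if accessibility > max_accessibility:
            continue  # side chain is not buried
        partners = contacts.get(res_id, [])
        if not partners:
            continue  # no side-chain contacts to evaluate
        nonpolar = sum(1 for p in partners if residues[p][0] in NONPOLAR)
        if nonpolar / len(partners) > min_nonpolar_fraction:
            core.append(res_id)
    return core
```

In practice the secondary structure and accessibility values would come from a structure analysis program; here they are simply passed in as precomputed inputs.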

13.
Accompanying the discovery of an increasing number of proteins, there is the need to provide functional annotation that is both highly accurate and consistent. The Gene Ontology (GO) provides consistent annotation in a computer readable and usable form; hence, GO annotation (GOA) has been assigned to a large number of protein sequences based on direct experimental evidence and through inference determined by sequence homology. Here we show that this annotation can be extended and corrected for cases where protein structures are available. Specifically, using the Combinatorial Extension (CE) algorithm for structure comparison, we extend the protein annotation currently provided by GOA at the European Bioinformatics Institute (EBI) to further describe the contents of the Protein Data Bank (PDB). Specific cases of biologically interesting annotations derived by this method are given. Given that the relationship between sequence, structure, and function is complicated, we explore the impact of this relationship on assigning GOA. The effect of superfolds (folds with many functions) is considered, as are the individual effects of family, superfamily, and fold, by comparison to the Structural Classification of Proteins (SCOP).

14.
There has been an increased interest in computational methods for amyloid and/or aggregate prediction, due to the prevalence of these aggregates in numerous diseases and their recently discovered functional importance. To evaluate these methods, several datasets have been compiled. Typically, aggregation‐prone regions of proteins, which form aggregates or amyloids in vivo, are more than 15 residues long and intrinsically disordered. However, the number of such experimentally established amyloid forming and non‐forming sequences is limited, not exceeding one hundred entries in existing databases. In this work, we parsed all available NMR‐resolved protein structures from the PDB and assembled a new, sevenfold larger, dataset of unfolded sequences, soluble at high concentrations. We propose to use these sequences as a negative set for evaluating methods for predicting aggregation in vivo. We also present the results of benchmarking cutting-edge tools for the prediction of aggregation versus solubility propensity.

15.
Recognition of regions on the surface of one protein that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Bank and make functional predictions.
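
The phrase "hashing triangles of physico-chemical properties" can be illustrated with a generic geometric-hashing sketch. It is not SiteEngine's actual scheme: the key construction, the 1.0 Å bin width, the property labels and all function names are assumptions. Each triangle of labelled surface points is reduced to a key that does not depend on the order of the points or on rigid motion of the structure.

```python
import math
from collections import defaultdict
from itertools import combinations


def triangle_key(points, bin_width=1.0):
    """Order-independent key for three (property, (x, y, z)) points."""
    (p0, c0), (p1, c1), (p2, c2) = points
    # Pair each vertex label with the binned length of the side opposite to it.
    pairs = [
        (p0, int(math.dist(c1, c2) // bin_width)),
        (p1, int(math.dist(c0, c2) // bin_width)),
        (p2, int(math.dist(c0, c1) // bin_width)),
    ]
    return tuple(sorted(pairs))


def build_triangle_table(site_name, points, table=None, bin_width=1.0):
    """Index every triangle of a known site under its key."""
    table = defaultdict(set) if table is None else table
    for triple in combinations(points, 3):
        table[triangle_key(triple, bin_width)].add(site_name)
    return table


def matching_sites(table, query_points, bin_width=1.0):
    """Count, per known site, how many query triangles hit the table."""
    votes = defaultdict(int)
    for triple in combinations(query_points, 3):
        for site in table.get(triangle_key(triple, bin_width), ()):
            votes[site] += 1
    return dict(votes)
```

Hard binning means that two nearly identical triangles can fall into different bins when a side length lies close to a bin edge; a practical implementation would need to handle this, for example by also probing neighbouring bins.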

16.
One of the main goals in proteomics is to solve biological and molecular questions regarding a set of identified proteins. In order to achieve this goal, one has to extract and collect the existing biological data from public repositories for every protein and afterward, analyze and organize the collected data. Due to the complexity of this task and the huge amount of data available, it is not possible to gather this information by hand, making it necessary to find automatic methods of data collection. Within a proteomic context, we have developed Protein Information and Knowledge Extractor (PIKE), which solves this problem by automatically accessing several public information systems and databases across the Internet. The PIKE bioinformatics tool starts with a set of identified proteins, listed by the accession codes of the most common protein databases, and retrieves all relevant and updated information from the most relevant databases. Once the search is complete, PIKE summarizes the information for every single protein using several file formats that share and exchange the information with other software tools. It is our opinion that PIKE represents a great step forward for information procurement and drastically reduces manual database validation for large proteomic studies. It is available at http://proteo.cnb.csic.es/pike .

17.
18.
19.
A study is presented of the conformational characteristics of NMR-derived protein structures in the Protein Data Bank compared to X-ray structures. Both ensemble and energy-minimized average structures are analyzed. We have addressed the problem using the methods developed for crystal structures by examining the distribution of φ, ψ, and χ angles as indicators of global conformational irregularity. All these features in NMR structures occur to varying degrees in multiple conformational states. Some measures of local geometry are very tightly constrained by the methods used to generate the structure, e.g., proline φ angles, α-helix φ, ψ angles, ω angles, and Cα chirality. The more lightly restrained torsion angles do show increased clustering as the number of overall experimental observations increases. φ, ψ, and χ1 angle conformational heterogeneity is strongly correlated with accessibility but shows additional differences which reflect the differing number of observations possible in NMR for the various side chains (e.g., many for Trp, few for Ser). In general, we find that the core is defined to a notional resolution of 2.0 to 2.3 Å. Of real interest is the behavior of surface residues and in particular the side chains where multiple rotameric states in different structures can vary from 10% to 88%. Later generation structures show a much tighter definition which correlates with increasing use of J-coupling information, stereospecific assignments, and heteronuclear techniques. A suite of programs is being developed to address the special needs of NMR-derived structures which will take into account the existence of increased mobility in solution. © 1993 Wiley-Liss, Inc.
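
The φ, ψ, χ and ω values examined in this study are all torsion (dihedral) angles defined by four consecutive atoms; φ, for example, is defined by C(i-1), N(i), Cα(i) and C(i). A minimal sketch of the standard calculation follows. The helper names are hypothetical, and the code is not taken from the program suite mentioned in the abstract.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Applying this function to each model of an NMR ensemble and comparing the resulting angles per residue gives a simple view of the conformational heterogeneity discussed above.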

20.
Prof. Haruki Nakamura, who is the former head of Protein Data Bank Japan (PDBj) and an expert in computational biology, retired from Osaka University at the end of March 2018. He founded PDBj at the Institute for Protein Research, together with other faculty members, researchers, engineers, and annotators in 2000, and subsequently established the worldwide Protein Data Bank (wwPDB) in 2003 to manage the core archive of the Protein Data Bank (PDB), collaborating with RCSB-PDB in the USA and PDBe in Europe. As the former head of PDBj and also an expert in structural bioinformatics, he has grown PDBj into a well-known data center within the structural biology community and developed several related databases and tools, integrating new technologies such as the semantic web, as primary services offered by PDBj.
