Similar articles
 20 similar articles found (search time: 8 ms)
1.
FORTE: a profile-profile comparison tool for protein fold recognition   Total citations: 1, self-citations: 0, citations by others: 1
We present FORTE, a profile-profile comparison tool for protein fold recognition. Users can submit a protein sequence to explore the possibilities of structural similarity existing in known structures. Results are reported via email in the form of pairwise alignments.

2.
High throughput proteome screening for biomarker detection   Total citations: 6, self-citations: 0, citations by others: 6
Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Current methods, while highly developed and powerful, are falling short of their goal of routinely analyzing whole proteomes mainly because the wealth of proteomic information accumulated from prior studies is not used for the planning or interpretation of present experiments. The consequence of this situation is that in every proteomic experiment the proteome is rediscovered. In this report we describe an approach for quantitative proteomics that builds on the extensive prior knowledge of proteomes and a platform for the implementation of the method. The method is based on the selection and chemical synthesis of isotopically labeled reference peptides that uniquely identify a particular protein and the addition of a panel of such peptides to the sample mixture consisting of tryptic peptides from the proteome in question. The platform consists of a peptide separation module for the generation of ordered peptide arrays from the combined peptide sample on the sample plate of a MALDI mass spectrometer, a high throughput MALDI-TOF/TOF mass spectrometer, and a suite of software tools for the selective analysis of the targeted peptides and the interpretation of the results. Applying the method to the analysis of the human blood serum proteome we demonstrate the feasibility of using mass spectrometry-based proteomics as a high throughput screening technology for the detection and quantification of targeted proteins in a complex system.
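The quantification idea behind spiked-in labeled reference peptides can be sketched as a simple isotope-dilution ratio. This is only an illustration, not the authors' software: the function name, the single-point ratio, and the example numbers are assumptions, and it presumes that the labeled reference peptide and its endogenous counterpart give the same signal per mole.

```python
def endogenous_amount(light_intensity: float,
                      heavy_intensity: float,
                      spiked_amount_fmol: float) -> float:
    """Estimate the endogenous peptide amount from a light/heavy signal ratio.

    Assumption (not from the abstract): the isotopically labeled ("heavy")
    reference peptide and the endogenous ("light") peptide respond equally.
    """
    if heavy_intensity <= 0:
        raise ValueError("reference peptide signal must be positive")
    return spiked_amount_fmol * light_intensity / heavy_intensity


# Illustrative numbers only: 100 fmol of reference peptide spiked in,
# endogenous signal half as intense as the reference signal.
estimate = endogenous_amount(light_intensity=3.0e4,
                             heavy_intensity=6.0e4,
                             spiked_amount_fmol=100.0)
print(estimate)
```

A real workflow would likely average over several spectra and check linearity of the response; the single ratio here is the simplest possible case.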

3.
The objective of this study is to automatically identify regions of the human proteome that are suitable for 3D structure determination by X-ray crystallography and to annotate them according to their likelihood to produce diffraction quality crystals. The results provide a powerful tool for structural genomics laboratories who wish to select human proteins based on the statistical likelihood of crystallisation success. Combining fold recognition and crystallisation prediction algorithms enables the efficient calculation of the crystallisability of the entire human proteome. This novel study estimates that there are approximately 40,000 crystallisable regions in the human proteome. Currently, only 15% of these regions (approx. 6,000 sequences) have been solved to at least 95% sequence identity. The remaining unsolved regions have been categorised into 5 crystallisation classes and an integral membrane protein (IMP) class, based on established structure prediction, crystallisation prediction and transmembrane (TM) helix prediction algorithms. Approximately 750 unsolved regions (2% of the proteome) have been identified as having a PDB fold representative (template) and an ‘optimal’ likelihood of crystallisation. At the other end of the spectrum, more than 10,500 non-IMP regions with a PDB template are classified as ‘very difficult’ to crystallise (26%) and almost 2,500 regions (6%) were predicted to contain at least 3 TM helices. The 3D-SPECS (3D Structural Proteomics Explorer with Crystallisation Scores) website contains crystallisation predictions for the entire human proteome and can be found at .

4.

Background  

Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments improves performance of fold recognition and remote homolog detection remarkably. However, it is not clear which parts of sequences are essential for the performance improvement.
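To make the factorization concrete, here is a minimal pure-Python sketch of NMF using the widely used multiplicative update rules of Lee and Seung. The abstract does not say which NMF algorithm or feature matrix the authors used, so the update rule, the toy matrix, the rank, and all function names are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) into W (n x rank), H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy feature matrix with two obvious "parts".
V = [[5, 4, 0, 0, 1],
     [4, 5, 0, 1, 0],
     [0, 0, 5, 4, 4],
     [0, 1, 4, 5, 5]]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 3))
```

Because every update multiplies by a nonnegative ratio, W and H stay nonnegative throughout, which is what gives the part-based reading of the factors.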

5.
A technique that combines ion mobility spectrometry (IMS) with reversed-phase liquid chromatography (LC), collision-induced dissociation (CID) and mass spectrometry (MS) has been developed. The approach is described as a high throughput means of analysing complex mixtures of peptides that arise from enzymatic digestion of protein mixtures. In this approach, peptides are separated by LC and, as they elute from the column, they are introduced into the gas phase and ionised by electrospray ionisation. The beam of ions is accumulated in an ion trap and then the concentrated ion packet is injected into a drift tube where the ions are separated again in the gas phase by IMS, a technique that differentiates ions based on their mobilities through a buffer gas. As ions exit the drift tube, they can be subjected to collisional activation to produce fragments prior to being introduced into a mass spectrometer for detection. The IMS separation can be carried out in only a few milliseconds and offers a number of advantages compared with LC-MS alone. An example of a single 21-minute LC-IMS-(CID)-MS analysis of the human plasma proteome reveals approximately 20,000 parent ions and approximately 600,000 fragment ions and evidence for 227 unique protein assignments.

6.
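Hu Shichang  Jiang Yi  Lin Chen  Zou Quan 《生物信息学》2012,10(2):112-115
The protein folding problem is regarded as a major topic of "21st-century biophysics" and remains an important unsolved question in the central dogma of molecular biology, so predicting protein fold patterns is a complex, difficult, and challenging task. To address it, we introduce classifier ensembles: this paper combines three classifiers (LMT, RandomForest, and SMO) with 188-dimensional combined physicochemical features to predict protein classes. Experiments show that the method effectively characterizes protein fold patterns and classifies protein sequence data accurately. Both cross-validation and independent testing show a prediction accuracy above 70%, nearly 10 percentage points higher than previous work.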

7.
The wealth of protein sequence and structure data is greater than ever, thanks to the ongoing Genomics and Structural Genomics projects. The information available through such efforts needs to be analysed by new methods that combine both databases. One important result of genomic sequence analysis is the inference of functional homology among proteins. Until recently sequence similarity comparison was the only method for homologue inference. The new fold recognition approach reviewed in this paper enhances sequence comparison methods by including structural information in the process of protein comparison. This additional information often allows for the detection of similarities that cannot be found by methods that only use sequence information.

8.
9.
MOTIVATION: We present an extensive evaluation of different methods and criteria to detect remote homologs of a given protein sequence. We investigate two associated problems: first, to develop a sensitive searching method to identify possible candidates and, second, to assign a confidence to the putative candidates in order to select the best one. For searching methods where the score distributions are known, p-values are used as confidence measure with great success. For the cases where such theoretical backing is absent, we propose empirical approximations to p-values for searching procedures. RESULTS: As a baseline, we review the performances of different methods for detecting remote protein folds (sequence alignment and threading, with and without sequence profiles, global and local). The analysis is performed on a large representative set of protein structures. For fold recognition, we find that methods using sequence profiles generally perform better than methods using plain sequences, and that threading methods perform better than sequence alignment methods. In order to assess the quality of the predictions made, we establish and compare several confidence measures, including raw scores, z-scores, raw score gaps, z-score gaps, and different methods of p-value estimation. We work our way from the theoretically well backed local scores towards more explorative global and threading scores. The methods for assessing the statistical significance of predictions are compared using specificity--sensitivity plots. For local alignment techniques we find that p-value methods work best, albeit computationally cheaper methods such as those based on score gaps achieve similar performance. For global methods where no theory is available methods based on score gaps work best. By using the score gap functions as the measure of confidence we improve the more powerful fold recognition methods for which p-values are unavailable. 
AVAILABILITY: The benchmark set is available upon request.
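The confidence measures named in this abstract (z-scores, score gaps, and empirical p-value approximations) can be sketched as follows. The abstract does not give the exact definitions used in the paper, so these are common textbook forms; the function names, the +1 smoothing, and the example scores are assumptions.

```python
import statistics


def z_score(scores, index):
    """Z-score of scores[index] relative to all the other scores."""
    others = scores[:index] + scores[index + 1:]
    mean = statistics.mean(others)
    stdev = statistics.pstdev(others)
    return (scores[index] - mean) / stdev if stdev > 0 else 0.0


def score_gap(scores):
    """Difference between the best and the second-best raw score."""
    best, second = sorted(scores, reverse=True)[:2]
    return best - second


def empirical_p_value(score, background):
    """Fraction of background scores at least as high as `score`.

    The +1 in numerator and denominator is an assumed smoothing choice
    that keeps the estimate away from exactly zero.
    """
    hits = sum(1 for b in background if b >= score)
    return (hits + 1) / (len(background) + 1)


# Hypothetical raw scores of one query against five candidate folds.
scores = [42.0, 18.0, 17.0, 15.0, 14.0]
print(score_gap(scores))
print(round(z_score(scores, 0), 2))
print(empirical_p_value(42.0, [10.0] * 99))
```

The score gap needs only the ranked list of raw scores, which is why it stays usable for global and threading scores where no theoretical score distribution is available.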

10.
The Sequence Alignment Benchmark (SABmark) provides sets of multiple alignment problems derived from the SCOP classification. These sets, Twilight Zone and Superfamilies, both cover the entire known fold space using sequences with very low to low, and low to intermediate similarity, respectively. In addition, each set has an alternate version in which unalignable but apparently similar sequences are added to each problem.

11.
McGuffin LJ  Jones DT 《Proteins》2003,52(2):166-175
If secondary structure predictions are to be incorporated into fold recognition methods, an assessment of the effect of specific types of errors in predicted secondary structures on the sensitivity of fold recognition should be carried out. Here, we present a systematic comparison of different secondary structure prediction methods by measuring frequencies of specific types of error. We carry out an evaluation of the effect of specific types of error on secondary structure element alignment (SSEA), a baseline fold recognition method. The results of this evaluation indicate that missing out whole helix or strand elements, or predicting the wrong type of element, is more detrimental than predicting the wrong lengths of elements or overpredicting helix or strand. We also suggest that SSEA scoring is an effective method for assessing accuracy of secondary structure prediction and perhaps may also provide a more appropriate assessment of the "usefulness" and quality of predicted secondary structure, if secondary structure alignments are to be used in fold recognition.

12.
Ensemble classifier for protein fold pattern recognition   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, with each trained in different parameter systems, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through a weighted voting to give a final determination for classifying a query protein. The recognition was to find the true fold among the 27 possible patterns. RESULTS: The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural networks) and SVM (support vector machines) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics. AVAILABILITY: The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
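The weighted-voting step that combines the basic classifiers can be sketched as below. This is a generic illustration rather than the PFP-Pred implementation: the function name, the fold labels, the weights, and the tie-breaking rule are all assumptions, and the abstract does not say how the weights were chosen.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine class labels from several classifiers by weighted voting.

    predictions: predicted fold labels, one per basic classifier.
    weights: non-negative weights in the same order as `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken by label order so the result is deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


# Hypothetical outputs of four basic classifiers for one query protein.
predictions = ["fold_07", "fold_12", "fold_07", "fold_03"]
weights = [0.9, 1.5, 0.8, 0.4]
print(weighted_vote(predictions, weights))
```

Here two moderately weighted classifiers agreeing on one fold outvote a single classifier with the highest individual weight, which is the behaviour that distinguishes weighted voting from simply trusting the best member.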

13.
Improvement of the GenTHREADER method for genomic fold recognition   Total citations: 10, self-citations: 0, citations by others: 10
MOTIVATION: In order to enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network which was trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from FSSP, and now also makes use of PSIPRED predicted secondary structure and bi-directional scoring in order to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment which are then used as inputs to a multi-layer, feed-forward neural network, along with the alignment score, alignment length and sequence length. The neural network has also been expanded to accommodate the secondary structure element alignment (SSEA) score as an extra input and it is now trained to learn the FSSP Z-score as a measurement of similarity between two proteins. RESULTS: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying higher reliability of score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected with low error rate per query. Total MaxSub score is doubled at low false positive rates using the improved method. AVAILABILITY: http://www.psipred.net.

14.
Reducing the complexity of plasma proteome through complex multidimensional fractionation protocols is critical for the detection of low abundance proteins that have the potential to be the most specific disease biomarkers. Therefore, we examined a four dimension profiling method, which includes low abundance protein enrichment, tryptic digestion and peptide fractionation by IEF, SCX and RP-LC. Applying peptide pI filtering as an additional criterion for validating the identifications makes it possible to minimize the false discovery rate and to optimize the settings of the protein identification database search engine. This sequential approach allows for the identification of low abundance proteins, such as angiogenin (10⁻⁹ g/L), pigment epithelium growth factor (10⁻⁸ g/L), hepatocyte growth factor activator (10⁻⁷ g/L) and thrombospondin-1 (10⁻⁶ g/L), having concentrations similar to those of many other growth factors and cytokines involved in disease pathophysiology.
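The pI-filtering criterion can be sketched as a consistency check between a peptide's predicted pI and the pI of the IEF fraction it was found in. Everything concrete here is hypothetical: the record layout, the 0.5 pH-unit tolerance, and the example peptides are illustrative assumptions, and the predicted pI is taken as an input computed elsewhere.

```python
from typing import NamedTuple


class PeptideHit(NamedTuple):
    sequence: str
    predicted_pi: float   # pI predicted from the sequence by an external tool
    fraction_pi: float    # nominal pI of the IEF fraction the peptide came from


def filter_by_pi(hits, tolerance=0.5):
    """Keep only identifications whose predicted pI matches their IEF fraction.

    `tolerance` (in pH units) is an assumed value, not one from the abstract.
    """
    return [h for h in hits if abs(h.predicted_pi - h.fraction_pi) <= tolerance]


# Hypothetical search-engine hits.
hits = [
    PeptideHit("LVNEVTEFAK", predicted_pi=4.5, fraction_pi=4.7),
    PeptideHit("KQTALVELVK", predicted_pi=8.6, fraction_pi=4.7),
    PeptideHit("AEFAEVSK", predicted_pi=4.5, fraction_pi=4.4),
]
kept = filter_by_pi(hits)
print([h.sequence for h in kept])
```

A hit whose predicted pI disagrees strongly with its fraction is more likely to be a false identification, so discarding it lowers the false discovery rate at the cost of some sensitivity.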

15.
The development of high‐throughput methods for gene discovery has paved the way for the design of new strategies for genome‐scale protein analysis. Lawrence Livermore National Laboratory and Onyx Pharmaceuticals, Inc., have produced an automatable system for the expression and purification of large numbers of proteins encoded by cDNA clones from the IMAGE (Integrated Molecular Analysis of Genomes and Their Expression) collection. This high‐throughput protein expression system has been developed for the analysis of the human proteome, the protein equivalent of the human genome, comprising the translated products of all expressed genes. Functional and structural analysis of novel genes identified by EST (Expressed Sequence Tag) sequencing and the Human Genome Project will be greatly advanced by the application of this high‐throughput expression system for protein production. A prototype was designed to demonstrate the feasibility of our approach. Using a PCR‐based strategy, 72 unique IMAGE cDNA clones have been used to create an array of recombinant baculoviruses in a 96‐well microtiter plate format. Forty‐two percent of these cDNAs successfully produced soluble, recombinant protein. All of the steps in this process, from PCR to protein production, were performed in 96‐well microtiter plates, and are thus amenable to automation. Each recombinant protein was engineered to incorporate an epitope tag at the amino terminal end to allow for immunoaffinity purification. Proteins expressed from this system are currently being analyzed for functional and biochemical properties. J. Cell. Biochem. 80:187–191, 2000. © 2000 Wiley‐Liss, Inc.

16.
An immunoassay for interferon-gamma (IFN-gamma) using homogeneous time-resolved fluorescence (HTRF) has been developed. In this assay, IFN-gamma can be detected by simply adding a mixture of three reagents (biotinylated polyclonal antibody, europium cryptate (fluorescence donor, EuK)-labeled monoclonal antibody, and crosslinked allophycocyanin (fluorescence acceptor, XL665) conjugated with streptavidin) and then measuring the time-resolved fluorescence. The detection limit of IFN-gamma by the proposed method is about 625 pg/ml. We applied the method to the detection of IFN-gamma secreted from NK3.3 cells and employed it in high throughput screening for IFN-gamma production inhibitors. With this screening format, IFN-gamma can be measured by directly adding the above reagents to microplate wells where NK3.3 cells are being cultured and stimulated with interleukin-12. This "in situ" immunoassay requires only pipetting reagents, with no need to transfer the culture supernatant to another microplate or wash the plate. Therefore, this screening format makes possible full automation of cell-based immunoassay, thus reducing cost and experimental time while increasing accuracy and throughput.

17.
Taylor WR  Jonassen I 《Proteins》2004,56(2):222-234
A method (SPREK) was developed to evaluate the register of a sequence on a structure based on the matching of structural patterns against a library derived from the protein structure databank. The scores obtained were normalized against random background distributions derived from sequence shuffling and permutation methods. 'Random' structures were also used to evaluate the effectiveness of the method. These were generated by a simple random-walk and a more sophisticated structure prediction method that produced protein-like folds. For comparison with other methods, the performance of the method was assessed using collections of models including decoys and models from the CASP-5 exercise. The performance of SPREK on the decoy models was equivalent to (and sometimes better than) those obtained with more complex approaches. An exception was the two smallest proteins, for which SPREK did not perform well due to a lack of patterns. Using the best parameter combination from trials on decoy models, the CASP models of intermediate difficulty were evaluated by SPREK and the quality of the top scoring model was evaluated by its CASP ranking. Of the 14 targets in this class, half lie in the top 10% (out of around 140 models for each target). The two worst rankings resulted from the selection by our method of a well-packed model that was based on the wrong fold. Of the other poor rankings, one was the smallest protein and the others were the four largest (all over 250 residues).
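Normalizing a score against a background built from shuffled sequences can be sketched as a z-score over the shuffled copies. This is not the SPREK scoring function: `adjacent_pairs` is a deliberately trivial stand-in, and the function names, the number of shuffles, and the seed are assumptions.

```python
import random
import statistics


def background_z_score(sequence, score_fn, n_shuffles=200, seed=0):
    """Normalize score_fn(sequence) against scores of shuffled copies."""
    rng = random.Random(seed)
    residues = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        background.append(score_fn("".join(residues)))
    stdev = statistics.pstdev(background)
    if stdev == 0:
        return 0.0  # the score does not depend on residue order
    return (score_fn(sequence) - statistics.mean(background)) / stdev


def adjacent_pairs(seq):
    """Toy stand-in for a real scoring function: count identical neighbours."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a == b)


print(background_z_score("AAAAACCCCC", adjacent_pairs) > 0)
print(background_z_score("AAAAACCCCC", len))
```

Shuffling preserves amino acid composition, so the resulting z-score measures how much of the raw score comes from residue order rather than from composition alone.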

18.
A new approach based on the implementation of support vector machine (SVM) with the error correcting output codes (ECOC) is presented for recognition of multi-class protein folds. The experimental results show that the proposed method can improve prediction accuracy by 4%-10% on two datasets containing 27 SCOP folds.
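The decoding side of ECOC can be sketched as nearest-codeword lookup by Hamming distance. In the scheme described, each bit would come from one binary SVM; here the bits are given directly. The 4-class, 7-bit codebook and all names are hypothetical and are not the code matrix used in the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, codebook):
    """Return the class whose codeword is closest to `bits`.

    Labels are visited in sorted order so ties resolve deterministically.
    """
    return min(sorted(codebook),
               key=lambda label: hamming(bits, codebook[label]))


# Hypothetical codebook: any two codewords differ in 4 positions,
# so a single wrong binary classifier can be corrected.
CODEBOOK = {
    "fold_a": (0, 0, 0, 0, 0, 0, 0),
    "fold_b": (1, 1, 1, 1, 0, 0, 0),
    "fold_c": (1, 1, 0, 0, 1, 1, 0),
    "fold_d": (0, 0, 1, 1, 1, 1, 0),
}

# Outputs of seven binary classifiers; the sixth bit is wrong for fold_c.
observed = (1, 1, 0, 0, 1, 0, 0)
print(ecoc_decode(observed, CODEBOOK))
```

The redundancy in the codewords is what lets the multi-class decision survive an error in one of the underlying binary classifiers.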

19.
20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号