Similar articles
 20 similar articles found (search time: 31 ms)
1.
SUMMARY: Voro3D is an original easy-to-use tool, which provides a brand new point of view on protein structures through three-dimensional (3D) Voronoi tessellations. To construct the Voronoi cells associated with each amino acid by a number of different tessellation methods, Voro3D uses a protein structure file in the PDB format as input. After calculation, different structural properties of interest, such as secondary structure assignment, environment accessibility and exact contact matrices, can be derived without any geometrical cut-off. Voro3D also provides a visualization of these tessellations superimposed on the associated protein structure, from which it is possible to model a polygonal protein surface using a model solvent or to quantify, for instance, the contact areas between a protein and a ligand. AVAILABILITY: The software executable file for PC using Windows 98, 2000, NT, XP can be freely downloaded at http://www.lmcp.jussieu.fr/~mornon/voronoi.html CONTACT: franck.dupuis@sanofi-aventis.com; jean-paul-mornon@imcp.jussieu.fr.

2.
MOTIVATION: The solvent accessibility of amino acid residues plays an important role in tertiary structure prediction, especially in the absence of significant sequence similarity of a query protein to those with known structures. The prediction of solvent accessibility is less accurate than secondary structure prediction in spite of improvements in recent research. The k-nearest neighbor method, a simple but powerful classification algorithm, has never been applied to the prediction of solvent accessibility, although it has been used frequently for the classification of biological and medical data. RESULTS: We applied the fuzzy k-nearest neighbor method to solvent accessibility prediction, using PSI-BLAST profiles as feature vectors, and achieved high prediction accuracies. With leave-one-out cross-validation on the ASTRAL SCOP reference dataset constructed by sequence clustering, our method achieved 64.1% accuracy for a 3-state (buried/intermediate/exposed) prediction (thresholds of 9% for buried/intermediate and 36% for intermediate/exposed) and 86.7, 82.0, 79.0 and 78.5% accuracies for 2-state (buried/exposed) predictions (thresholds of 0, 5, 16 and 25%, respectively, for buried/exposed). Our method also showed slightly better accuracies than other methods, by about 2-5%, on the RS126 dataset and a benchmarking dataset with 229 proteins. AVAILABILITY: Program and datasets are available at http://biocom1.ssu.ac.kr/FKNNacc/ CONTACT: jul@ssu.ac.kr.
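The state definitions used in this abstract (9% and 36% cut-offs for three states, a single cut-off for two states) can be illustrated with a small Python sketch. This is only an illustration of the thresholding step, not the authors' predictor. The function name `classify_rsa` is hypothetical, and the convention that a value exactly equal to a threshold falls in the lower (more buried) class is an assumption that the abstract does not state.

```python
from bisect import bisect_left


def classify_rsa(rsa_percent, thresholds=(9.0, 36.0)):
    """Map a relative solvent accessibility value (in %) to a state label.

    One threshold gives a 2-state label, two thresholds give a 3-state label.
    Assumption: a value equal to a threshold belongs to the lower class.
    """
    if len(thresholds) == 1:
        labels = ("buried", "exposed")
    elif len(thresholds) == 2:
        labels = ("buried", "intermediate", "exposed")
    else:
        raise ValueError("expected one or two thresholds")
    # bisect_left counts the thresholds that are strictly below the value,
    # which is exactly the index of the state the value falls into.
    return labels[bisect_left(sorted(thresholds), rsa_percent)]


if __name__ == "__main__":
    for value in (5.0, 20.0, 50.0):
        print(value, classify_rsa(value))
    # 2-state labelling with the 25% cut-off mentioned in the abstract
    print(30.0, classify_rsa(30.0, thresholds=(25.0,)))
```

With a 0% threshold, this convention labels only residues with zero accessibility as buried.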

3.
Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.

4.
5.
CS-PSeq-Gen is a program derived from PSeq-Gen, designed to perform simulations of the evolution of protein sequences under the constraints of a reconstructed phylogeny. It also provides a basis for the investigation of the correlated evolution of sites. AVAILABILITY: http://condor.urbb.jussieu.fr/CS-PSeq-Gen.html

6.
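Prediction of protein solvent accessibility based on the support vector machine method   Total citations: 1, self-citations: 0, citations by others: 1
Residues in protein sequences were divided according to their relative solvent accessibility into two classes (surface/interior) and three classes (surface/intermediate/interior) for prediction. Different window widths and parameters were chosen for training and prediction to ensure the best classification performance, and the results were compared with those of other existing methods. Prediction results on the same dataset with different classification thresholds show that the overall performance of the support vector machine method for predicting protein solvent accessibility is better than that of neural network and information theory methods. The best classification accuracy reached 79.0% for the two-class data and 67.5% for the three-class data, indicating that the support vector machine is an effective method for predicting the solvent accessibility of protein residues.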

7.
Qin S  He Y  Pan XM 《Proteins》2005,61(3):473-480
We have improved the multiple linear regression (MLR) algorithm for protein secondary structure prediction by combining it with the evolutionary information provided by multiple sequence alignment of PSI-BLAST. On the CB513 dataset, the three-state average overall per-residue accuracy, Q(3), reached 76.4%, while segment overlap accuracy, SOV99, reached 73.2%, using a rigorous jackknife procedure and the strictest reduction of the eight-state DSSP definition to three states. This represents an improvement of approximately 5% on overall per-residue accuracy compared with previous work. The relative solvent accessibility prediction also benefited from this combination of methods. The system achieved 77.7% average jackknifed accuracy for two-state prediction based on a 25% relative solvent accessibility mode, with a Matthews correlation coefficient of 0.548. The improved MLR secondary structure and relative solvent accessibility prediction server is available at http://spg.biosci.tsinghua.edu.cn/.
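For readers unfamiliar with the Matthews correlation coefficient quoted above, the following Python sketch computes it from the four counts of a two-state confusion matrix using the standard definition. The function name `matthews_cc` and the choice to return 0.0 when the denominator vanishes are assumptions made here for illustration. The abstract does not describe how the authors computed their value.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a two-state prediction.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives (e.g. "exposed" as positive).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: report 0.0 when a whole row or column is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(matthews_cc(tp=40, tn=38, fp=12, fn=10))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), so it is less sensitive to class imbalance than plain accuracy.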

8.
Prediction of protein stability upon amino acid substitutions is an important problem in molecular biology, and solving it would help in designing stable mutants. In this work, we have analyzed the stability of protein mutants using two different datasets of 1396 and 2204 mutants obtained from the ProTherm database, respectively for free energy change due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturation. We have used a set of 48 physical, chemical, energetic and conformational properties of amino acid residues and computed the difference of amino acid properties for each mutant in both sets of data. These differences in amino acid properties have been related to protein stability (ΔΔG and ΔΔG(H2O)) and used to train a classification and regression tool for predicting the stability of protein mutants. Further, we have tested the method with 4-fold, 5-fold and 10-fold cross-validation procedures. We found that the physical properties, shape and flexibility are important determinants of protein stability. The classification of mutants based on secondary structure (helix, strand, turn and coil) and solvent accessibility (buried, partially buried, partially exposed and exposed) distinguished the stabilizing/destabilizing mutants at an average accuracy of 81% and 80%, respectively for ΔΔG and ΔΔG(H2O). The correlation between the experimental and predicted stability change is 0.61 for ΔΔG and 0.44 for ΔΔG(H2O). Further, the free energy change due to the replacement of an amino acid residue has been predicted within an average error of 1.08 kcal/mol and 1.37 kcal/mol for thermal and chemical denaturation, respectively. The relative importance of secondary structure and solvent accessibility, and the influence of the dataset on prediction of protein mutant stability, have been discussed.

9.
MOSAIC is a set of tools for the segmentation of multiple aligned DNA sequences into homogeneous zones. The segmentation is based on the distribution of mutational events along the alignment. As an example, the analysis of one repeated sequence belonging to the subtelomeric regions of the yeast genome is presented. AVAILABILITY: Free access from ftp://ftp.biomath.jussieu.fr/pub/papers/MOSAIC

10.
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.

11.
Thermoregulatory responses to heat exposure were studied in 12 hand-reared, acclimated pigeons (Columba livia). Measurements of body temperature (Tcl), brain temperature (Tbr), cutaneous water evaporation (CWE) and respiratory frequency (fr) were carried out in intact, conscious, heat-exposed birds. In a second group of lightly restrained birds, fr and CWE were taken when temperatures of the trunk, brain and air (Ta) were independently changed. Increasing Tbr to 43.5–43.8°C induced a pronounced polypnea (deep and fast, (300 breaths min−1) when Tcl was regulated at 42.4°C. Moreover, when hyperthermia (Tcl = 43.0°C) was combined with increased Tbr (43.0–43.8°C), shallow and fast panting (>500 breaths min−1) was evoked. CWE was probably elicited by inputs generated by the skin warm receptors as a result of increased Ta. Moreover, it was demonstrated that warming the brain to 42.5°C elicits cutaneous water evaporation in birds exposed to 26°C. When a high Ta (60°C) is accompanied by a high relative humidity (17%), the combined effect generates inputs eliciting intensive panting. The integration of the present and earlier data allows us to generate a model demonstrating the distinguished significance of the trunk, skin and brain thermosensors in the regulation of both respiratory and cutaneous latent heat dissipation. The present model also emphasizes the fact that the highly thermosensitive pigeon brain responds in a similar pattern to that found in mammals.

12.
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging within 70.6-73.9% depending on the radius adopted to discriminate contacts (6 Å-12 Å). These performances represent gains of 16-20% over the baseline statistical predictor, always assigning an amino acid to the largest class, and are 4-7% better than any previous method. A combination of different radius predictors further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors, by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/.

13.
Intragenic duplications of genetic material have important biological roles because of their protein sequence and structural consequences. We developed Swelfe to find internal repeats at three levels. Swelfe quickly identifies statistically significant internal repeats in DNA and amino acid sequences and in 3D structures using dynamic programming. The associated web server also shows the relationships between repeats at each level and facilitates visualization of the results. AVAILABILITY: http://bioserv.rpbs.jussieu.fr/swelfe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

14.
Joo K  Lee SJ  Lee J 《Proteins》2012,80(7):1791-1797
We present a method to predict the solvent accessibility of proteins which is based on a nearest neighbor method applied to the sequence profiles. Using the method, continuous real-value prediction as well as two-state and three-state discrete predictions can be obtained. The method utilizes the z-score value of the distance measure in the feature vector space to estimate the relative contribution among the k-nearest neighbors for prediction of the discrete and continuous solvent accessibility. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a cutoff of 25% sequence identity. Using optimal parameters, prediction accuracies (for discrete predictions) of 78.38% (two-state prediction with the threshold of 25%) and 65.1% (three-state prediction with the thresholds of 9 and 36%), and a Pearson correlation coefficient (between the predicted and true RSAs for continuous prediction) of 0.676 are achieved. An independent benchmark test was performed with the CASP8 targets, where we find that the proposed method outperforms existing methods. The prediction accuracies are 80.89% (for two-state prediction with the threshold of 25%) and 67.58% (three-state prediction), and the Pearson correlation coefficient is 0.727 (for continuous prediction) with a mean absolute error of 0.148. We have also investigated the effect of increasing database sizes on the prediction accuracy, where additional improvement in the accuracy is observed as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/.

15.
Repseek, a tool to retrieve approximate repeats from large DNA sequences   Total citations: 2, self-citations: 0, citations by others: 2
Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases, at both the seed detection and alignment levels. AVAILABILITY: http://wwwabi.snv.jussieu.fr/public/RepSeek/

16.
A strategy for finding regions of similarity in complete genome sequences   Total citations: 3, self-citations: 2, citations by others: 1
MOTIVATION: Complete genomic sequences will become available in the future. New methods to deal with very large sequences (sizes beyond 100 kb) efficiently are required. One of the main aims of such work is to increase our understanding of genome organization and evolution. This requires studies of the locations of regions of similarity. RESULTS: We present here a new tool, ASSIRC ('Accelerated Search for SImilarity Regions in Chromosomes'), for finding regions of similarity in genomic sequences. The method involves three steps: (i) identification of short exact chains of fixed size, called 'seeds', common to both sequences, using hashing functions; (ii) extension of these seeds into putative regions of similarity by a 'random walk' procedure; (iii) final selection of regions of similarity by assessing alignments of the putative sequences. We used simulations to estimate the proportion of regions of similarity not detected for particular region sizes, base identity proportions and seed sizes. This approach can be tailored to the user's specifications. We looked for regions of similarity between two yeast chromosomes (V and IX). The efficiency of the approach was compared to those of the conventional programs BLAST and FASTA, by assessing the CPU time required and the regions of similarity found for the same data set. AVAILABILITY: Source programs are freely available at the following address: ftp://ftp.biologie.ens.fr/pub/molbio/assirc.tar.gz CONTACT: vincens@biologie.ens.fr, hazout@urbb.jussieu.fr
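Step (i) of the method described above, finding fixed-size exact 'seeds' common to both sequences by hashing, can be sketched in Python as follows. This is a generic illustration, not the ASSIRC source code. The function name `find_seeds`, the seed size `k=8` and the use of a Python dictionary as the hash table are all assumptions. The 'random walk' extension of step (ii) and the alignment assessment of step (iii) are not shown.

```python
from collections import defaultdict


def find_seeds(seq_a, seq_b, k=8):
    """Return (pos_in_a, pos_in_b) pairs where both sequences share an exact k-mer.

    A dictionary keyed by k-mer plays the role of the hash table:
    seq_a is indexed once, then every k-mer of seq_b is looked up.
    """
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)

    seeds = []
    for j in range(len(seq_b) - k + 1):
        # .get avoids creating empty entries for k-mers absent from seq_a
        for i in index.get(seq_b[j:j + k], ()):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    print(find_seeds("ACGTACGT", "TTACGTAA", k=5))
```

Indexing one sequence and scanning the other keeps the work roughly linear in the total sequence length, plus the number of seed hits, which is why seed-based filtering is attractive for chromosome-scale comparisons.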

17.
The available epidemiological studies of lung cancer and exposure to other people's tobacco smoke, in which exposure was assessed by whether or not a person classified as a non-smoker lived with a smoker, were identified and the results combined. There were 10 case-control studies and three prospective studies. Overall, there was a highly significant 35% increase in the risk of lung cancer among non-smokers living with smokers compared with non-smokers living with non-smokers (relative risk 1.35, 95% confidence interval 1.19 to 1.54). Part of this increase was almost certainly caused by the misclassification of some smokers as non-smokers. As smokers, who are more likely to get lung cancer than non-smokers, tend to live with smokers this misclassification probably exaggerated the estimated increase in risk. Adjustment for this error reduced the estimate to 30% (relative risk 1.30), but as people who live with non-smokers may still be exposed to other people's smoke this estimate was revised again to allow for the fact that a truly unexposed reference group was not used. The increase in risk among non-smokers living with smokers compared with a completely unexposed group was thus estimated as 53% (relative risk of 1.53). This analysis, and the fact that non-smokers breathe environmental tobacco smoke, which contains carcinogens, into their lungs and that the generally accepted view is that there is no safe threshold for the effect of carcinogens, leads to the conclusion that breathing other people's tobacco smoke is a cause of lung cancer. About a third of the cases of lung cancer in non-smokers who live with smokers, and about a quarter of the cases in non-smokers in general, may be attributed to such exposure.
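The "about a third" figure is consistent with the standard formula for the attributable fraction among the exposed, applied to the final relative risk of 1.53. The abstract does not state which formula the authors used, so the calculation below is an assumption offered as a plausibility check. The symbol $\mathrm{AF}_e$ is introduced here and does not appear in the source.

```latex
\mathrm{AF}_e \;=\; \frac{RR - 1}{RR} \;=\; \frac{1.53 - 1}{1.53} \;\approx\; 0.35
```

That is, roughly 35% of lung cancer cases among non-smokers living with smokers would be attributed to the exposure under this formula. The "about a quarter" figure for non-smokers in general additionally depends on the prevalence of exposure, which the abstract does not report.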

18.
Predicting surface exposure of amino acids from protein sequence   Total citations: 8, self-citations: 0, citations by others: 8
The amino acid residues on a protein surface play a key role in interaction with other molecules, determine many physical properties, and constrain the structure of the folded protein. A database of monomeric protein crystal structures was used to teach computer-simulated neural networks rules for predicting surface exposure from local sequence. These trained networks are able to correctly predict surface exposure for 72% of residues in a testing set using a binary model (buried/exposed) and for 54% of residues using a ternary model (buried/intermediate/exposed). In the ternary model, only 11% of the exposed residues are predicted as buried and only 5% of the buried residues are predicted as exposed. Also, since the networks are able to predict exposure with a quantitative confidence estimate, it is possible to assign exposure for over half of the residues in a binary model with greater than 80% accuracy. Even more accurate predictions are obtained by making a consensus prediction of exposure for a homologous family. The effect of the local environment of an amino acid on its accessibility, though smaller than expected, is significant and accounts for the higher success rate of prediction than obtained with previously used criteria. In the absence of a three-dimensional structure, the ability to predict surface accessibility of amino acids directly from the sequence is a valuable tool in choosing sites of chemical modification or specific mutations and in studies of molecular interaction.

19.
We present BTMX (Beta barrel TransMembrane eXposure), a computational method to predict the exposure status (i.e. exposed to the bilayer or hidden in the protein structure) of transmembrane residues in transmembrane beta barrel proteins (TMBs). BTMX predicts the exposure status of known TM residues with an accuracy of 84.2% over 2,225 residues and provides a confidence score for all predictions. Predictions made are in concert with the fact that hydrophobic residues tend to be more exposed to the bilayer. The biological relevance of the input parameters is also discussed. The highest prediction accuracy is obtained when a sliding window comprising three residues with similar C(α)-C(β) vector orientations is employed. The prediction accuracy of the BTMX method on a separate unseen non-redundant test dataset is 78.1%. By employing out-pointing residues that are exposed to the bilayer, we have identified various physico-chemical properties that show statistically significant differences between the beta strands located at the oligomeric interfaces compared to the non-oligomeric strands. The BTMX web server generates colored, annotated snake-plots as part of the prediction results and is available under the BTMX tab at http://service.bioinformatik.uni-saarland.de/tmx-site/. Exposure status prediction of TMB residues may be useful in 3D structure prediction of TMBs.

20.
Yuan Z  Burrage K  Mattick JS 《Proteins》2002,48(3):566-570
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
