Similar literature
 Found 20 similar documents (search time: 15 ms)
1.

Background  

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even if no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural network based algorithm to utilize evolutionary information of amino acid sequences in terms of their position specific scoring matrices (PSSMs) for a better prediction of DNA-binding sites.
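The idea of describing a residue by its own PSSM row plus the rows of its sequence neighbors can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `window_features`, the window half-width, the zero padding at the chain termini, and the two-column toy matrix (a real PSSM has 20 columns) are all assumptions made for the example.

```python
def window_features(pssm, center, half_width=3):
    """Concatenate the PSSM rows of a residue and its sequence neighbors.

    pssm is a list of rows, one row of scores per residue. Positions that
    fall outside the chain are filled with zeros (an assumed convention).
    """
    n_cols = len(pssm[0])
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0] * n_cols)
    return features


# Toy 4-residue "PSSM" with 2 columns, used only to show the layout.
toy_pssm = [[1, -2], [0, 3], [-1, 1], [2, 0]]
features = window_features(toy_pssm, center=0, half_width=1)
print(features)  # [0, 0, 1, -2, 0, 3]
```

Each residue then yields one fixed-length vector, which is the kind of input a neural network or other classifier would receive to label the residue as binding or non-binding.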

2.
Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physicochemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil that are obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the critical assessment of protein structure prediction experiment-9 (CASP9). PSP_MCSVM with a brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96 for randomly selected proteins from CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are available freely in public domain at: and

3.
Summary We isolated a gene encoding a 218 kDa myosin-like protein from Saccharomyces cerevisiae using a monoclonal antibody directed against human platelet myosin as a probe. The protein sequence encoded by the MLP1 gene (for myosin-like protein) contains extensive stretches of a heptad-repeat pattern suggesting that the protein can form coiled coils typical of myosins. Immunolocalization experiments using affinity-purified antibodies raised against a TrpE-MLP1 fusion protein showed a dot-like structure adjacent to the nucleus in yeast cells bearing the MLP1 gene on a multicopy plasmid. In mouse epithelial cells the yeast anti-MLP1 antibodies stained the nucleus. Mutants bearing disruptions of the MLP1 gene were viable, but more sensitive to ultraviolet light than wild-type strains, suggesting an involvement of MLP1 in DNA repair. The MLP1 gene was mapped to chromosome 11, 25 cM from met1.

4.
MOTIVATION: Position specific scoring matrices (PSSMs) corresponding to aligned sequences of homologous proteins are commonly used in homology detection. A PSSM is generated on the basis of one of the homologues as a reference sequence, which is the query in the case of PSI-BLAST searches. The reference sequence is chosen arbitrarily while generating PSSMs for reverse BLAST searches. In this work we demonstrate that the use of multiple PSSMs corresponding to a given alignment and variable reference sequences is more effective than using traditional single PSSMs and hidden Markov models. RESULTS: Searches for proteins with known 3-D structures have been made against three databases of protein family profiles corresponding to known structures: (1) One PSSM per family; (2) multiple PSSMs corresponding to an alignment and variable reference sequences for every family; and (3) hidden Markov models. A comparison of the performances of these three approaches suggests that the use of multiple PSSMs is most effective. CONTACT: ns@mbu.iisc.ernet.in.

5.
6.

Background  

In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task.
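To make the cost of PSSM searching concrete, the sketch below shows a naive scan: every window of the sequence is scored by summing one matrix entry per motif position, so the work grows with sequence length times motif length. The function name `scan`, the dictionary-based matrix layout, the threshold and the toy nucleotide motif are assumptions for illustration only; the abstract does not describe a specific algorithm.

```python
def scan(pssm, sequence, threshold):
    """Return (start, score) for every window scoring at least threshold.

    pssm is a list with one dict per motif position, mapping each symbol
    to its score at that position.
    """
    m = len(pssm)
    hits = []
    for start in range(len(sequence) - m + 1):
        # Sum the position-specific score of each symbol in the window.
        score = sum(pssm[i][sequence[start + i]] for i in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 3-position motif that favors "ACG" (hypothetical scores).
toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
hits = scan(toy_pssm, "TTACGACG", threshold=6)
print(hits)  # [(2, 6), (5, 6)]
```

Faster approaches typically avoid rescoring every window in full, but the naive loop is a useful baseline for seeing where the expense comes from.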

7.
Ho SY  Yu FC  Chang CY  Huang HL 《Bio Systems》2007,90(1):234-241
In this paper, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. As a result, we propose a hybrid method using a support vector machine (SVM) in conjunction with evolutionary information of amino acid sequences in terms of their position-specific scoring matrices (PSSMs) for prediction of DNA-binding sites. Because the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights as well as SVM parameters are analyzed and adopted to maximize net prediction (NP, an average of sensitivity and specificity) accuracy. To evaluate the generalization ability of the proposed method SVM-PSSM, a DNA-binding dataset PDC-59 consisting of 59 protein chains with low sequence identity to each other is additionally established. The SVM-based method using the same six-fold cross-validation procedure and PSSM features has NP=80.15% for the training dataset PDNA-62 and NP=69.54% for the test dataset PDC-59, which are much better than the existing neural network-based method, increasing the NP values for training and test accuracies by up to 13.45% and 16.53%, respectively. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
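The net prediction (NP) measure is defined in the abstract as the average of sensitivity and specificity. A minimal sketch of that definition follows; the function name `net_prediction` and the confusion-matrix counts are made up for illustration and are not taken from the paper.

```python
def net_prediction(tp, fn, tn, fp):
    """Average of sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    return (sensitivity + specificity) / 2


# Hypothetical counts: 10 binding residues, 100 non-binding residues.
np_balanced = net_prediction(tp=8, fn=2, tn=60, fp=40)
# A classifier that labels every residue as non-binding.
np_trivial = net_prediction(tp=0, fn=10, tn=100, fp=0)
```

With these invented counts the trivial classifier would reach about 91% plain accuracy yet only 0.5 NP, which illustrates why an averaged measure is a reasonable target when binding residues are rare.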

8.
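Objective: The flexible motion of proteins is of great significance for many reactions in living organisms, and predicting this flexible motion from a protein's spatial structure is an important problem in the study of protein structure-function relationships. Convolutional neural networks (CNNs) have already been applied successfully in research on protein structure-function relationships. Methods: Drawing on the idea of the PointNet method from computer vision research, this study proposes a protein...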

9.

Background

BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch.

Results

We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI's Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST.

Conclusions

DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the "Protein BLAST" link at http://blast.ncbi.nlm.nih.gov.

Reviewers

This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.

10.
11.
Selection of the best-quality embryo is key to successful implantation in in vitro fertilization (IVF) practice. However, evaluating the numerous images captured by a time-lapse imaging (TLI) system is time-consuming, and some important features cannot be recognized by the naked eye. Convolutional neural networks (CNNs) are used in medical imaging but not yet in IVF. This study aims to apply a CNN to day-one human embryo TLI. We first present a CNN algorithm for day-one human embryo segmentation on three distinct features: zona pellucida (ZP), cytoplasm and pronucleus (PN). We tested the CNN's performance side by side against manual labelling by a clinical embryologist, then measured the segmented day-one human embryo parameters and compared them with values reported in the literature. The segmentation precision was over 97% for cytoplasm, over 84% for PN and around 80% for ZP. The morphometric data for cytoplasm, ZP and PN were comparable with those reported in the literature, showing high reproducibility and consistency. The CNN system provides fast and stable analytical outcomes to improve work efficiency in the IVF setting. To conclude, our CNN system has the potential to be applied in practice for day-one human embryo segmentation as a robust tool with high precision, reproducibility and speed.

12.
Designing protein sequences that fold to a given three-dimensional (3D) structure has long been a challenging problem in computational structural biology with significant theoretical and practical implications. In this study, we first formulated this problem as predicting the residue type given the 3D structural environment around the Cα atom of a residue, which is repeated for each residue of a protein. We designed a nine-layer 3D deep convolutional neural network (CNN) that takes as input a gridded box with the atomic coordinates and types around a residue. Several CNN layers were designed to capture structure information at different scales, such as bond lengths, bond angles, torsion angles, and secondary structures. Trained on a very large number of protein structures, the method, called ProDCoNN (protein design with CNN), achieved state-of-the-art performance when tested on large numbers of test proteins and benchmark datasets.

13.
Approximate Bayesian computation (ABC) is useful for parameterizing complex models in population genetics. In this study, ABC was applied to simultaneously estimate parameter values for a model of metapopulation coalescence and test two alternatives to a strict metapopulation model in the well-studied network of Daphnia magna populations in Finland. The models shared four free parameters: the subpopulation genetic diversity (θS), the rate of gene flow among patches (4Nm), the founding population size (N0) and the metapopulation extinction rate (e) but differed in the distribution of extinction rates across habitat patches in the system. The three models had either a constant extinction rate in all populations (strict metapopulation), one population that was protected from local extinction (i.e. a persistent source), or habitat-specific extinction rates drawn from a distribution with specified mean and variance. Our model selection analysis favoured the model including a persistent source population over the two alternative models. Of the closest 750 000 data sets in Euclidean space, 78% were simulated under the persistent source model (estimated posterior probability = 0.769). This fraction increased to more than 85% when only the closest 150 000 data sets were considered (estimated posterior probability = 0.774). Approximate Bayesian computation was then used to estimate parameter values that might produce the observed set of summary statistics. Our analysis provided posterior distributions for e that included the point estimate obtained from previous data from the Finnish D. magna metapopulation. Our results support the use of ABC and population genetic data for testing the strict metapopulation model and parameterizing complex models of demography.
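The model-comparison step described above (ranking simulated data sets by Euclidean distance to the observed summary statistics and counting which model produced the closest ones) can be sketched as follows. This is a toy illustration under stated assumptions: the function name `model_fractions`, the model labels and the two-dimensional summary statistics are invented, and the raw fraction computed here is only a crude stand-in for the estimated posterior probabilities reported in the abstract, whose estimation procedure is not described there.

```python
import math
from collections import Counter


def model_fractions(simulations, observed, n_closest):
    """Fraction of the n_closest simulations contributed by each model.

    simulations is a list of (model_name, summary_stats) pairs; distance
    to the observed summary statistics is plain Euclidean distance.
    """
    ranked = sorted(simulations, key=lambda s: math.dist(s[1], observed))
    counts = Counter(name for name, _ in ranked[:n_closest])
    return {name: count / n_closest for name, count in counts.items()}


# Hypothetical simulated summary statistics from three candidate models.
simulations = [
    ("strict", (0.9, 0.1)),
    ("source", (0.5, 0.5)),
    ("source", (0.6, 0.4)),
    ("variable", (0.1, 0.9)),
    ("source", (0.45, 0.55)),
    ("strict", (0.7, 0.3)),
]
fractions = model_fractions(simulations, observed=(0.5, 0.5), n_closest=4)
print(fractions)  # {'source': 0.75, 'strict': 0.25}
```

In practice the summary statistics would usually be rescaled before distances are computed, and the number of retained simulations is a tuning choice, as the two cut-offs in the abstract suggest.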

14.
Two novel myb-like genes (atmyb6 and atmyb7) were isolated from an Arabidopsis thaliana cDNA library. The entire proteins or the Myb domains encoded by the genes were expressed as fusion proteins in Escherichia coli. The DNA-binding domain of the murine c-Myb was also expressed in the same way for use in comparative studies. The fusion proteins were examined for their DNA-binding activity using the animal c-Myb DNA-binding site (MBS) and the binding site of the maize P gene product (PBS). The Myb domain of Atmyb6 bound to PBS more efficiently than to MBS. Complete Atmyb6 and Atmyb7 proteins preferentially bound to PBS but not MBS. This suggests that the in vitro binding consensus sequences for both Atmyb6 and Atmyb7 are similar to PBS. The binding of the Myb domain of Atmyb6 to both PBS and MBS raises the possibility that the protein recognizes multiple sequences in vivo. The third α-helix and three adjacent amino acids in the third repeat (R3) of c-Myb were replaced with the analogous sequence of Atmyb6 to create a chimeric Myb protein. This chimeric protein bound to PBS with a low affinity but failed to bind to MBS. Thus the binding pattern of the chimeric Myb protein is similar to that of Atmyb6. This result suggests that the last 20 amino acids in the R3 repeat of Atmyb6 play a major role in DNA binding.

15.
Luminol-enhanced chemiluminescence (CL) was used to examine the response of various leukocyte populations following stimulation with a crude extract of Phaseolus vulgaris, namely phytohaemagglutinin (PHA-C). Populations stimulated included a human peripheral mixed leukocyte preparation (MLP), and purified preparations of lymphocytes, monocytes and polymorphonuclear leukocytes (PMNL). Mouse peritoneal exudate cells and the lymphocytic cell lines Molt #4 and Daudi were also stimulated. Following stimulation, a characteristic three-peaked chemiluminescent response was obtained from the MLP population. Little or no response was obtained from the purified lymphocytes. Monocytes produced a sharp peak corresponding to the second peak of the MLP response and PMNL produced a broad peak corresponding to the third peak of the MLP response. Mouse peritoneal exudate cells containing lymphocytes and monocytes/macrophages showed a two-peaked stimulation which corresponded to the first two peaks of the MLP response. Molt #4 and Daudi showed no chemiluminescence if stimulated individually, but if added to an MLP, substantial enhancement of the first and second peaks was observed. These results indicate some form of lymphocyte/monocyte interaction leading to enhanced CL following PHA-C stimulation.

16.
17.
Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies. Herein, we outline an effective method for fold recognition using sets of PSSMs, each of which is constructed for different protein folds. Our analyses demonstrate that FSL (Fold-specific Position Specific Scoring Matrix Libraries) can predict/relate structures of highly divergent proteins given only their amino acid sequences. This ability to detect distant relationships is dependent on low-identity sequence alignments obtained from FSL. Results from our experiments demonstrate that FSL perform well in recognizing folds from the "twilight-zone" SABmark dataset. Further, this method is capable of accurate fold prediction in newly determined structures. We suggest that by building complete PSSM libraries for all unique folds within the Protein Data Bank (PDB), FSL can be used to rapidly and reliably annotate a large subset of protein folds at proteomic level. The related programs and fold-specific PSSMs for our FSL are publicly available at: http://ccp.psu.edu/download/FSLv1.0/.

18.
In the post-genome era, the prediction of protein function is one of the most demanding tasks in the study of bioinformatics. Machine learning methods, such as support vector machines (SVMs), greatly help to improve the classification of protein function. In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties into the study of nucleic-acid-binding protein prediction. We developed binary classifiers for rRNA-, RNA- and DNA-binding proteins, which play an important role in the control of many cell processes. Each SVM predicts whether a protein belongs to the rRNA-, RNA-, or DNA-binding protein class. Self-consistency and jackknife tests were performed on the protein data sets in which the sequence identity was < 25%. Test results show that the accuracies of the rRNA-, RNA- and DNA-binding SVM predictions are approximately 84%, 78% and 72%, respectively. The predictions were also performed on the ambiguous and negative data sets. The results demonstrate that the scores predicted for proteins in the ambiguous data set by the RNA- and DNA-binding SVM models were distributed around zero, while most proteins in the negative data set were given negative scores by all three SVMs. The score distributions agree well with the prior knowledge of those proteins and show the effectiveness of sequence-associated physicochemical properties in protein function prediction. The software is available from the author upon request.

19.
The endosperm starch of the wheat grain is composed of amylose and amylopectin. Genetic manipulation of the ratio of amylose to amylopectin or the amylose content could bring about improved texture and quality of wheat flour. The chromosomal locations of genes affecting amylose content were investigated using a monosomic series of Chinese Spring (CS) and a set of Cheyenne (CNN) chromosome substitution lines in the CS genetic background. Trials over three seasons revealed that a decrease in amylose content occurred in monosomic 4A and an increase in monosomic 7B. Allelic variation between CS and CNN was suggested for the genes on chromosomes 4A and 7B. To examine the effects of three Waxy (Wx) genes which encode a granule-bound starch synthase (Wx protein), the Wx proteins from CS monosomics of interest were analyzed using SDS-PAGE. The amount of the Wx protein coded by the Wx-B1 gene on chromosome arm 4AL was reduced in monosomic 4A, and thus accounted for its decreased amylose content. The amounts of two other Wx proteins coded by the Wx-A1 and Wx-D1 genes on chromosome arms 7AS and 7DS, respectively, showed low levels of protein in the monosomics but no effect on amylose content. The effect of chromosome 7B on the level of amylose suggested the presence of a regulator gene which suppresses the activities of the Wx genes.

20.
In the absence of interlogs, building docking models is a time-intensive task, involving generation of a large pool of docking decoys followed by refinement and screening to identify near-native docking solutions. This restricts researchers interested in building docking methods to benchmarking on only a limited number of protein complexes. We have created a repository called dockYard (), which allows modelers interested in protein-protein interaction to access a large volume of information on protein dimers and their interlogs, and also to download decoys for their work if they are interested in building modeling methods. dockYard currently offers four categories of docking decoys derived from: Bound (native dimer co-crystallized), Unbound (individual subunits are crystallized, as well as the target dimer), Variants (match the previous two categories in at least one subunit with 100% sequence identity), and Interlogs (match the previous categories in at least one subunit with ≥90% or ≥50% sequence identity). The web service offers options for full or selective download based on search parameters. Our portal also serves as a repository for modelers who may want to share their decoy sets with the community.
