Similar literature
 Found 20 similar documents; search took 31 ms
1.
2.
3.
A detailed analysis of the DNA-binding sites of 26 proteins is presented using data from the Nucleic Acid Database (NDB) and the Protein Data Bank (PDB). Chemical and physical properties of the protein-DNA interface, such as polarity, size, shape, and packing, were analysed. The DNA-binding sites shared common features, comprising many discontinuous sequence segments forming hydrophilic surfaces capable of direct and water-mediated hydrogen bonds. These interface sites were compared to those of protein-protein binding sites, revealing them to be more polar, with many more intermolecular hydrogen bonds and buried water molecules than the protein-protein interface sites. By looking at the number and positioning of protein residue-DNA base interactions in a series of interaction footprints, three modes of DNA binding were identified (single-headed, double-headed and enveloping). Six of the eight enzymes in the data set bound in the enveloping mode, with the protein presenting a large interface area effectively wrapped around the DNA. A comparison of structural parameters of the DNA revealed that some values for the bound DNA (including twist, slide and roll) were intermediate between those observed for unbound B-DNA and A-DNA. The distortion of bound DNA was evaluated by calculating a root-mean-square deviation on fitting to a canonical B-DNA structure. Major distortions were commonly caused by specific kinks in the DNA sequence, some resulting in the overall bending of the helix. The helix bending affected the dimensions of the grooves in the DNA, allowing the binding of protein elements that would otherwise be unable to make contact. From this structural analysis, a preliminary set of rules governing the bending of the DNA in protein-DNA complexes is proposed.
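
The distortion measure mentioned in the abstract above, a root-mean-square deviation against a canonical B-DNA structure, can be illustrated with a minimal sketch. This is not the authors' code: it assumes the two coordinate sets are already superimposed and matched atom by atom (the fitting step itself is omitted), and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched, pre-superimposed
    lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy, made-up coordinates standing in for canonical and bound DNA atoms
canonical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bound = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]
print(rmsd(canonical, bound))
```

A larger value indicates a greater departure from the reference geometry; in practice the superposition would be computed first, for example with a least-squares fitting routine.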

4.
Feature selection from DNA microarray data is a major challenge because of the high dimensionality of expression data. The number of samples in a microarray data set is much smaller than the number of genes, so the raw data is ill-suited for use as the training set of a classifier. It is therefore important to select features before training the classifier. Only a small subset of genes in the data set exhibits a strong correlation with the class, and finding these relevant genes is often non-trivial. Thus there is a need to develop robust and reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for gene finding in expression data. These approaches include filtering (filter out the best genes from the data set) and wrapper (best subset of genes from the data set) phases. The methods use information gain (IG) and Pearson Product Moment Correlation (PPMC) as the filtering parameters and biogeography based optimization (BBO) as the wrapper approach. The K nearest neighbour algorithm (KNN) and a back propagation neural network are used for evaluating the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs best across different data sets, with high accuracy (>90%) and a low error rate.
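
The filtering phase described in the abstract above can be illustrated with a small sketch that ranks genes by the absolute Pearson correlation between their expression values and the class label, then keeps the top `k`. This is only an assumption about how such a filter might look, not the authors' implementation; the names `pearson` and `filter_top_genes` and the toy data are hypothetical, and the wrapper phase (BBO) is not shown.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / math.sqrt(vx * vy)

def filter_top_genes(expression, labels, k):
    """expression maps gene name -> expression values (one per sample).
    Returns the k genes with the largest |r| against the class labels."""
    ranked = sorted(expression,
                    key=lambda g: abs(pearson(expression[g], labels)),
                    reverse=True)
    return ranked[:k]

# Toy, made-up data: four samples, two classes
labels = [0, 0, 1, 1]
expression = {
    "geneA": [1.0, 1.2, 3.1, 2.9],  # rises with the class label
    "geneB": [2.0, 2.1, 1.9, 2.0],  # nearly flat
    "geneC": [5.0, 4.8, 1.0, 1.1],  # falls with the class label
}
print(filter_top_genes(expression, labels, 2))
```

Using the absolute value keeps anti-correlated genes, which can be as informative for classification as positively correlated ones.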

5.
The docking of repressor proteins to DNA starting from the unbound protein and model-built DNA coordinates is modeled computationally. The approach was evaluated on eight repressor/DNA complexes that employed different modes for protein/DNA recognition. The global search is based on a protein-protein docking algorithm that evaluates shape and electrostatic complementarity, which was modified to consider the importance of electrostatic features in DNA-protein recognition. Complexes were then ranked by an empirical score for the observed amino acid/nucleotide pairings (i.e., protein-DNA pair potentials) derived from a database of 20 protein/DNA complexes. A good prediction had at least 65% of the correct contacts modeled. This approach was able to identify a good solution at rank four or better for three out of the eight complexes. Predicted complexes were filtered by a distance constraint based on experimental data defining the DNA footprint. This improved coverage to four out of eight complexes having a good model at rank four or better. The additional use of amino acid mutagenesis and phylogenetic data defining residues on the repressor resulted in between 2 and 27 models that would have to be examined to find a good solution for seven of the eight test systems. This study shows that starting with unbound coordinates one can predict three-dimensional models for protein/DNA complexes that do not involve gross conformational changes on association. Proteins 33:535–549, 1998. © 1998 Wiley-Liss, Inc.
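
The success criterion in the abstract above (at least 65% of the correct contacts modeled) can be illustrated with a short sketch. It assumes a contact is represented as a (residue, nucleotide) pair; how contacts were actually defined is not stated in the abstract, and the function names and the example contacts are hypothetical.

```python
def fraction_correct_contacts(native, predicted):
    """Fraction of native contacts that also appear in the predicted model."""
    native = set(native)
    if not native:
        raise ValueError("native contact set must not be empty")
    return len(native & set(predicted)) / len(native)

def is_good_prediction(native, predicted, threshold=0.65):
    """Apply the 65% threshold quoted in the abstract."""
    return fraction_correct_contacts(native, predicted) >= threshold

# Made-up contacts: (protein residue, DNA nucleotide)
native = {("ARG45", "G7"), ("GLN44", "A6"), ("SER41", "T5"), ("LYS4", "C9")}
model = {("ARG45", "G7"), ("GLN44", "A6"), ("SER41", "T5"), ("ASN52", "T10")}

print(fraction_correct_contacts(native, model))
print(is_good_prediction(native, model))
```

Here three of the four native contacts are reproduced, so this toy model would count as a good prediction under the stated threshold.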

6.
7.
BACKGROUND: We present a novel strategy for classification of DNA molecules using measurements from an alpha-Hemolysin channel detector. The proposed approach provides excellent classification performance for five different DNA hairpins that differ in only one base-pair. For multi-class DNA classification problems, practitioners usually adopt approaches that use decision trees consisting of binary classifiers. Finding the best tree topology requires exploring all possible tree topologies and is computationally prohibitive. We propose a computational framework based on feature primitives that eliminates the need for a decision tree of binary classifiers. In the first phase, we generate a pool of weak features from nanopore blockade current measurements by using HMM analysis, principal component analysis and various wavelet filters. In the next phase, feature selection is performed using AdaBoost. AdaBoost provides an ensemble of weak learners of various types learned from feature primitives. RESULTS AND CONCLUSION: We show that our technique, despite its inherent simplicity, provides a performance comparable to recent multi-class DNA molecule classification results. Unlike the approach presented by Winters-Hilt et al., where weaker data is dropped to obtain better classification, the proposed approach provides comparable classification accuracy without any need for rejection of weak data. A weakness of this approach, on the other hand, is the very "hands-on" tuning and feature selection that is required to obtain good generalization. Simply put, this method obtains a more informed set of features and provides better results for that reason. The strength of this approach appears to be in its ability to identify strong features, an area where further results are actively being sought.
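
To make the AdaBoost-based feature selection in the abstract above more concrete, here is a minimal sketch of discrete AdaBoost with decision stumps, where the features picked by the stumps serve as the selected features. It is a simplification under stated assumptions: the problem is reduced to two classes labeled -1 and +1 (the paper addresses five classes), the weak learners are single-feature threshold stumps rather than the authors' feature primitives, and all names and data are hypothetical.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Predict +polarity when the feature is at or above the threshold."""
    return polarity if row[feature] >= threshold else -polarity

def best_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds):
    """Train an ensemble of (alpha, feature, threshold, polarity) stumps."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy, made-up data: feature 0 separates the classes, feature 1 is noise
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.8, 2.0], [0.9, 5.0], [0.7, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
selected_features = {feature for _, feature, _, _ in ensemble}
print(selected_features)
```

Reading off which features the stumps use is one simple way boosting doubles as feature selection: uninformative features are never chosen by a weak learner.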

8.
9.
A major challenge in the field of protein-protein docking is to discriminate between the many wrong and few near-native conformations, i.e. scoring. Here, we introduce combinatorial complex-type-dependent scoring functions for different types of protein-protein complexes: protease/inhibitor, antibody/antigen, enzyme/inhibitor and others. The scoring functions incorporate both physical and knowledge-based potentials, i.e. atomic contact energy (ACE), the residue pair potential (RP), and electrostatic and van der Waals interactions. For each complex type, the weights of the scoring function were optimized by multiple linear regression, in which only the top 300 structures with ligand root mean square deviation (L_RMSD) less than 20 Å from the bound (co-crystallized) docking of 57 complexes were used to construct a training set. We employed bound docking studies to examine the quality of the scoring functions, and also extended the evaluation to unbound (separately crystallized) docking studies and 8 additional protein-protein complexes. In bound docking of the 57 cases, the first hits of protease/inhibitor cases are all ranked in the top 5. For the cases of antibody/antigen, enzyme/inhibitor and others, there are 17/19, 5/6 and 13/15 cases with the first hits ranked in the top 10, respectively. In unbound docking studies, the first hits of 9/17 protease/inhibitor, 6/19 antibody/antigen, 1/6 enzyme/inhibitor and 6/15 other complexes are ranked in the top 10. Additionally, for the 8 extra cases, the first hits of the two protease/inhibitor cases are ranked at the top in both the bound and unbound tests. For the two enzyme/inhibitor cases, the first hits are ranked 1st in the bound test, and 119th and 17th in the unbound test. For the others, the first hits are ranked 1st in the bound test and 12th in the 1WQ1 unbound test. To some extent, the results validate our divide-and-conquer strategy in the docking study, which may shed light on the prediction of protein-protein interactions.
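
The weight optimization by multiple linear regression mentioned in the abstract above can be sketched as an ordinary least-squares fit of four energy terms (ACE, RP, electrostatic, van der Waals) against a target such as L_RMSD. This is an illustrative assumption, not the authors' procedure: the fit has no intercept, the solver uses normal equations with Gaussian elimination, and the function names and numbers are invented.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_weights(terms, target):
    """Least-squares weights for score = sum(w_i * term_i), via the normal equations."""
    k = len(terms[0])
    AtA = [[sum(row[i] * row[j] for row in terms) for j in range(k)] for i in range(k)]
    Atb = [sum(row[i] * t for row, t in zip(terms, target)) for i in range(k)]
    return solve(AtA, Atb)

# Made-up training rows: [ACE, RP, electrostatic, van der Waals] per docked structure
terms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 1.0, 0.0, 1.0],
]
true_weights = [0.5, -1.0, 2.0, 0.25]  # invented, used only to generate the target
target = [sum(w * t for w, t in zip(true_weights, row)) for row in terms]

weights = fit_weights(terms, target)
print(weights)
```

Fitting a separate weight vector per complex type (protease/inhibitor, antibody/antigen, and so on) would amount to calling `fit_weights` once per type on that type's training rows.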

10.
11.
12.
Several groups of investigators are using external detection of radiolabeled protein to study the flux of protein from plasma into the pulmonary interstitium. A basic assumption for these studies has been that the unbound (free) tracer concentration is small and insignificant. The purpose of this study is to evaluate how free tracer influences the determination of normalized slope index. A five-compartment model for the lung was used with transport equations for both unbound and bound nuclide flux. Parameters of the unbound and bound transport equations were varied to evaluate the sensitivity of normalized slope index to each parameter. The model was also compared with published protein flux data to investigate the validity of the transport model. Application of the model to external scan data provides a sensitive method for evaluating the flux of bound and unbound tracers into the pulmonary interstitium. We conclude that because the distribution volume for unbound tracer is large with respect to protein distribution volume, even a small amount of unbound tracer (2-5%) can create large errors in the determination of normalized slope index.

13.
Computational docking approaches are important as a source of protein-protein complex structures and as a means to understand the principles of protein association. A key element in designing better docking approaches, including search procedures, potentials, and scoring functions, is their validation on experimentally determined structures. Thus, the databases of such structures (benchmark sets) are important. The previous, first release of the DOCKGROUND resource (Douguet et al., Bioinformatics 2006; 22:2612-2618) implemented a comprehensive database of cocrystallized (bound) protein-protein complexes in a relational database of annotated structures. The current release adds important features to the set of bound structures, such as regularly updated downloadable datasets: an automatically generated nonredundant set, built according to the most common criteria, and a manually curated set that includes only biological nonobligate complexes along with a number of additional useful characteristics. The main focus of the current release is unbound (experimental and simulated) protein-protein complexes. Complexes from the bound dataset are used to identify crystallized unbound analogs. If such analogs do not exist, the unbound structures are simulated by rotamer library optimization. Thus, the database contains comprehensive sets of complexes suitable for large scale benchmarking of docking algorithms. Advanced methodologies for simulating unbound conformations are being explored for the next release. Future releases will include datasets of modeled protein-protein complexes, and systematic sets of docking decoys obtained by different docking algorithms. The growing DOCKGROUND resource is designed to become a comprehensive public environment for developing and validating new docking methodologies.

14.
DNA‐binding proteins play critical roles in biological processes including gene expression, DNA packaging and DNA repair. They bind to DNA target sequences with different degrees of binding specificity, ranging from highly specific (HS) to nonspecific (NS). Alterations of DNA‐binding specificity, due to either genetic variation or somatic mutations, can lead to various diseases. In this study, a comparative analysis of protein–DNA complex structures was carried out to investigate the structural features that contribute to binding specificity. Protein–DNA complexes were grouped into three general classes based on degrees of binding specificity: HS, multispecific (MS), and NS. Our results show a clear trend of structural features among the three classes, including amino acid binding propensities, simple and complex hydrogen bonds, major/minor groove and base contacts, and DNA shape. We found that aspartate is enriched in HS DNA‐binding proteins and predominantly binds to a cytosine through a single hydrogen bond or to two consecutive cytosines through bidentate hydrogen bonds. The aromatic residues histidine and tyrosine are highly enriched in the HS and MS groups and may contribute to specific binding through different mechanisms. To further investigate the role of protein flexibility in specific protein–DNA recognition, we analyzed the conformational changes between the bound and unbound states of DNA‐binding proteins and structural variations. The results indicate that HS and MS DNA‐binding domains have larger conformational changes upon DNA binding and a larger degree of flexibility in both bound and unbound states. Proteins 2016; 84:1147–1161. © 2016 Wiley Periodicals, Inc.

15.
We present an updated version of the protein–RNA docking benchmark, which we first published four years ago. The non‐redundant protein–RNA docking benchmark version 2.0 consists of 126 test cases, a threefold increase compared to the previous version. The present version consists of 21 unbound–unbound cases, of which, in 12 cases, the unbound RNAs are taken from another complex. It also consists of 95 unbound–bound cases where only the protein is available in the unbound state. In addition, we introduce 10 new bound–unbound cases where only the RNA is found in the unbound state. Based on the degree of conformational change of the interface residues upon complex formation, the benchmark is classified into 72 rigid‐body cases, 25 semiflexible cases and 19 fully flexible cases. It also covers a wide range of conformational flexibility, from small side chain movements to large domain swapping in protein structures, as well as flipping and restacking of RNA bases. This benchmark should provide the docking community with more test cases for evaluating rigid‐body as well as flexible docking algorithms. It will also facilitate the development of new algorithms that require a large training set. The protein–RNA docking benchmark version 2.0 can be freely downloaded from http://www.csb.iitkgp.ernet.in/applications/PRDBv2 . Proteins 2017; 85:256–267. © 2016 Wiley Periodicals, Inc.

16.
Baber JL, Levens D, Libutti D, Tjandra N. Biochemistry 2000, 39(20):6022-6032
The K homology (KH) motif is one of the major classes of nucleic acid binding proteins. Some members of this family have been shown to interact with DNA while others have RNA targets. There have been no reports containing direct experimental evidence regarding the nature of KH module-DNA interaction. In this study, the interaction of the C-terminal KH domain of heterogeneous nuclear ribonucleoprotein K (KH3) with its cognate single-stranded DNA (ssDNA) is investigated. Chemical shift perturbation mapping indicates that the first two helices, the conserved GxxG loop, beta 1, and beta 2 are the primary regions involved in DNA binding for KH3. The nature of the KH3-ssDNA interaction is further illuminated by a comparison of backbone 15N relaxation data for the bound and unbound KH3. Relaxation data are also used to confirm that the backbone of wild-type KH3 is structurally identical to that of the G26R mutant KH3, which was previously published. Amide proton exchange experiments indicate that the two helices involved in DNA binding are less stable than other regions of secondary structure and that a large portion of KH3 backbone amide hydrogens are protected in some manner upon ssDNA binding. The major backbone dynamics features of KH3 are similar to those of the structurally comparable human papillomavirus-31 E2 DNA binding domain. Secondary structure information for ssDNA-bound wild-type KH3 is also presented and shows that binding results in no global changes in the protein fold.

17.
Current feature selection methods for supervised classification of tissue samples from microarray data generally fail to exploit complementary discriminatory power that can be found in sets of features. Using a feature selection method with the computational architecture of the cross-entropy method, including an additional preliminary step ensuring a lower bound on the number of times any feature is considered, we show when testing on a human lymph node data set that there are a significant number of genes that perform well when their complementary power is assessed, but “pass under the radar” of popular feature selection methods that only assess genes individually on a given classification tool. We also show that this phenomenon becomes more apparent as diagnostic specificity of the tissue samples analysed increases.

18.
19.
Both proteins and DNA undergo conformational changes in order to form functional complexes and also to facilitate interactions with other molecules. These changes have direct implications for the stability and specificity of the complex, as well as the cooperativity of interactions between multiple entities. In this work, we have extensively analyzed conformational changes in DNA‐binding proteins by superimposing DNA‐bound and unbound pairs of protein structures in a curated database of 90 proteins. We manually examined each of these pairs, unified the authors' annotations, and summarized our observations by classifying conformational changes into six structural categories. We explored the relationship between conformational changes and functional classes, binding motifs, target specificity, biophysical features of unbound proteins, and stability of the complex. In addition, we investigated the degree to which intrinsic flexibility can explain conformational changes in a subset of 52 proteins with high quality coordinate data. Our results indicate that conformational changes in DNA‐binding proteins contribute significantly to both the stability of the complex and the specificity of the targets they recognize. We also conclude that most conformational changes occur in proteins interacting with specific DNA targets, even though unbound protein structures may have sufficient information to interact with DNA in a nonspecific manner. Proteins 2014; 82:841–857. © 2013 Wiley Periodicals, Inc.

20.