Similar articles
Found 20 similar articles; search took 15 ms
1.
MOTIVATION: Direct recognition, or direct readout, of DNA bases by a DNA-binding protein involves amino acids that interact directly with features specific to each base. Experimental evidence also shows that in many cases the protein achieves partial sequence specificity by indirect recognition, i.e., by recognizing structural properties of the DNA. (1) Could threading a DNA sequence onto a crystal structure of bound DNA help explain the indirect recognition component of sequence specificity? (2) Might the resulting pure-structure computational motif manifest itself in familiar sequence-based computational motifs? RESULTS: The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences and was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and so, to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences. 
The conclusion is that deformation energy explains part of indirect recognition, which explains part of IHF sequence-specific binding.
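The linear-time claim in the abstract above can be illustrated with a minimal sketch. It assumes the threaded energy is a sum of position-specific dinucleotide step energies and that sequence probabilities follow a Boltzmann distribution; the abstract states neither detail. The names `sequence_energy`, `partition_function`, `sequence_probability`, and `toy_energy` are hypothetical, and the toy energies are invented rather than taken from the IHF crystal structure.

```python
import math
from itertools import product

BASES = "ACGT"

def sequence_energy(seq, step_energy):
    """Sum the position-specific dinucleotide step energies along seq."""
    return sum(step_energy[i][seq[i] + seq[i + 1]] for i in range(len(seq) - 1))

def partition_function(step_energy, kT=1.0):
    """Forward recursion over the chain: O(L * 16) work instead of O(4**L)."""
    # forward[b] = total Boltzmann weight of all prefixes that end in base b
    forward = {b: 1.0 for b in BASES}
    for energies in step_energy:
        forward = {
            b: sum(forward[a] * math.exp(-energies[a + b] / kT) for a in BASES)
            for b in BASES
        }
    return sum(forward.values())

def sequence_probability(seq, step_energy, kT=1.0):
    """Boltzmann probability of seq under the chain model (assumed form)."""
    weight = math.exp(-sequence_energy(seq, step_energy) / kT)
    return weight / partition_function(step_energy, kT)

# Hypothetical toy energies for a 4-bp site (3 steps); illustration only.
toy_energy = [
    {a + b: 0.1 * ((BASES.index(a) + 2 * BASES.index(b) + i) % 5)
     for a, b in product(BASES, repeat=2)}
    for i in range(3)
]
```

Because each step couples only neighbouring positions, the normalising constant is obtained in one pass over the chain, which is what makes the joint distribution over sequences tractable.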

2.
3.
TBP recognizes its target sites, TATA boxes, by recognizing their sequence-dependent structure and flexibility. Studying this mode of TATA-box recognition, termed ‘indirect readout’, is important for elucidating the binding mechanism in this system, as well as for developing methods to locate new binding sites in genomic DNA. We determined the binding stability and TBP-induced TATA-box bending for consensus-like TATA boxes. In addition, we calculated the individual information score of all studied sequences. We show that various non-additive effects exist in TATA boxes, dependent on their structural properties. Using several criteria, we divide TATA boxes into two main groups. The first group contains sequences with 3–4 consecutive adenines. Sequences in this group have a rigid context-independent cooperative structure, best described by a nearest-neighbor non-additive model. Sequences in the second group have a flexible, context-dependent conformation, which cannot be described by an additive model or by a nearest-neighbor non-additive model. Classifying TATA boxes by these and other structural rules clarifies the different recognition pathways and binding mechanisms used by TBP upon binding to different TATA boxes. We discuss the structural and evolutionary sources of the difficulties in predicting new binding sites by probabilistic weight-matrix methods for proteins in which indirect readout is dominant.

4.
Although genome-editing enzymes such as TALEN and CRISPR/Cas9 are widely used, they have an essential limitation: their relatively high molecular weight makes them difficult to deliver into cells. To develop a novel genome-editing enzyme with a smaller molecular weight, we focused on the engrailed homeodomain (EHD). We designed and constructed proteins composed of two EHDs connected by a linker to increase sequence specificity. In bacterial one-hybrid assays and electrophoretic mobility shift assays, the created proteins exhibited good affinity for DNA sequences consisting of two tandemly aligned EHD target sequences. However, they also bound to individual EHD targets. To avoid binding to single target sites, we introduced amino acid mutations to reduce the protein–DNA affinity of each EHD monomer and successfully created a small protein with high specificity for tandem EHD target sequences.

5.
6.
Characterization of recombinant murine leukemia virus integrase.   Cited by: 6 (self-citations: 6; citations by others: 0)
Retroviral integration involves two DNA substrates that play different roles. The viral DNA substrate is recognized by virtue of specific nucleotide sequences near the end of a double-stranded DNA molecule. The target DNA substrate is recognized at internal sites with little sequence preference; nucleosomal DNA appears to be preferred for this role. Despite this apparent asymmetry in the sequence, structure, and roles of the DNA substrates in the integration reaction, the existence of distinct binding sites for viral and target DNA substrates has been controversial. In this report, we describe the expression in Escherichia coli and purification of Moloney murine leukemia virus integrase as a fusion protein with glutathione S-transferase, characterization of its activity by using several model DNA substrates, and the initial kinetic characterization of its interactions with a model viral DNA substrate. We provide evidence for functionally and kinetically distinct binding sites for viral and target DNA substrates and describe a cross-linking assay for DNA binding at a site whose specificity is consistent with the target DNA binding site.

7.
8.
9.
Structure-based prediction of DNA target sites by regulatory proteins   Cited by: 15 (self-citations: 0; citations by others: 15)
Kono H, Sarai A. Proteins, 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms, by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on protein-DNA complexes provide clues to the mechanism of target recognition by regulatory proteins. The analyses of the propensities of base-amino acid interactions observed in those structural data show that there is no one-to-one correspondence in the interaction, but clear preferences exist. On the other hand, the analysis of the spatial distribution of amino acids around bases shows that even those amino acids with strong base preference, such as Arg with G, are distributed in a wide space around bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility in the interaction suggest that there are no simple rules in the sequence recognition, and its prediction is not straightforward. However, the spatial distributions of amino acids around bases indicate a possibility that the structural data can be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been successfully used to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could be used to predict DNA target sites for regulatory proteins, because determining DNA sequences that bind to a particular protein structure should be similar to finding amino acid sequences that fold into a particular structure. Here we demonstrate that the structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were empirically derived from the structural data.
These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict DNA binding sites recognized by regulatory proteins. To test the applicability of this method in target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on the binding specificity, and predicted binding sites in real promoters and compared them with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.
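The derive-then-thread idea described above can be sketched in a deliberately simplified form. The sketch assumes a potential that depends only on which base contacts which amino acid, computed as -ln(observed/expected) from contact counts, whereas the abstract describes potentials derived from spatial distributions. The function names, the counts in `toy_counts`, and the contact map `toy_contacts` are all hypothetical.

```python
import math
from itertools import product
from statistics import mean, pstdev

BASES = "ACGT"

def contact_potential(counts, pseudocount=1.0):
    """Turn base/amino-acid contact counts into energies -ln(observed / expected)."""
    residues = sorted({aa for _, aa in counts})
    table = {(b, aa): counts.get((b, aa), 0) + pseudocount
             for b in BASES for aa in residues}
    total = sum(table.values())
    base_tot = {b: sum(table[(b, aa)] for aa in residues) for b in BASES}
    aa_tot = {aa: sum(table[(b, aa)] for b in BASES) for aa in residues}
    return {
        (b, aa): -math.log(table[(b, aa)] * total / (base_tot[b] * aa_tot[aa]))
        for b in BASES for aa in residues
    }

def thread(seq, contacts, potential):
    """Energy of seq placed on a fixed contact map [(position, amino acid), ...]."""
    return sum(potential[(seq[pos], aa)] for pos, aa in contacts)

def z_scores(contacts, potential, length):
    """Thread every sequence of the given length and standardise the energies."""
    energies = {"".join(s): thread("".join(s), contacts, potential)
                for s in product(BASES, repeat=length)}
    mu, sigma = mean(energies.values()), pstdev(energies.values())
    return {s: (e - mu) / sigma for s, e in energies.items()}

# Hypothetical counts and contact map, for illustration only.
toy_counts = {("G", "Arg"): 40, ("A", "Arg"): 8, ("C", "Arg"): 6, ("T", "Arg"): 10,
              ("A", "Asn"): 25, ("G", "Asn"): 7, ("C", "Asn"): 9, ("T", "Asn"): 8}
toy_contacts = [(0, "Arg"), (2, "Asn")]
potential = contact_potential(toy_counts)
scores = z_scores(toy_contacts, potential, length=3)
best = min(scores, key=scores.get)  # lowest Z-score = most compatible sequence
```

Enumerating all sequences is feasible only for short sites; the point of the sketch is the scoring scheme, not an efficient search.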

10.
Non-additive genetic variation is usually ignored when genome-wide markers are used to study the genetic architecture and genomic prediction of complex traits in humans, wildlife, model organisms, or farm animals. However, non-additive genetic effects may contribute substantially to the total genetic variation of complex traits. This study presented a genomic BLUP model including additive and non-additive genetic effects, in which additive and non-additive genetic relationship matrices were constructed from information on genome-wide dense single nucleotide polymorphism (SNP) markers. In addition, this study for the first time proposed a method to construct a dominance relationship matrix using SNP markers and demonstrated it in detail. The proposed model was used to investigate the amounts of additive genetic, dominance, and epistatic variation and to assess the accuracy and unbiasedness of genomic predictions for daily gain in pigs. In the analysis of daily gain, four linear models were used: 1) a simple additive genetic model (MA), 2) a model including both additive and additive by additive epistatic genetic effects (MAE), 3) a model including both additive and dominance genetic effects (MAD), and 4) a full model including all three genetic components (MAED). Estimates of narrow-sense heritability were 0.397, 0.373, 0.379 and 0.357 for models MA, MAE, MAD and MAED, respectively. Estimated dominance variance and additive by additive epistatic variance accounted for 5.6% and 9.5% of the total phenotypic variance, respectively. Based on model MAED, the estimate of broad-sense heritability was 0.506. Reliabilities of genomic predicted breeding values for the animals without performance records were 28.5%, 28.8%, 29.2% and 29.5% for models MA, MAE, MAD and MAED, respectively. In addition, models including non-additive genetic effects improved unbiasedness of genomic predictions.
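The abstract does not give the formulas for its relationship matrices, so the following is only a sketch of one commonly used construction and should not be read as the study's exact method. It assumes genotypes coded 0/1/2 (copies of one allele), an additive matrix built from frequency-centred genotypes scaled by the sum of 2pq, and a dominance matrix built from a heterozygosity indicator centred by 2pq and scaled by the sum of 2pq(1 - 2pq). All function names and the `toy_genotypes` data are hypothetical.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele at each SNP (genotypes coded 0/1/2)."""
    n = len(genotypes)
    return [sum(row[j] for row in genotypes) / (2 * n)
            for j in range(len(genotypes[0]))]

def gram(rows, scale):
    """Scaled cross-product matrix rows * rows^T / scale."""
    return [[sum(a * b for a, b in zip(r1, r2)) / scale for r2 in rows]
            for r1 in rows]

def additive_matrix(genotypes):
    """Additive relationships from genotypes centred by 2p (assumed scaling: sum of 2pq)."""
    p = allele_frequencies(genotypes)
    z = [[g - 2 * pj for g, pj in zip(row, p)] for row in genotypes]
    return gram(z, sum(2 * pj * (1 - pj) for pj in p))

def dominance_matrix(genotypes):
    """Dominance relationships from heterozygosity indicators centred by 2pq (assumed form)."""
    p = allele_frequencies(genotypes)
    het = [2 * pj * (1 - pj) for pj in p]
    h = [[(1.0 if g == 1 else 0.0) - hj for g, hj in zip(row, het)]
         for row in genotypes]
    return gram(h, sum(hj * (1 - hj) for hj in het))

# Hypothetical genotypes: 4 animals x 3 SNPs.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
G = additive_matrix(toy_genotypes)
D = dominance_matrix(toy_genotypes)
```

Both matrices require at least one polymorphic SNP, since the scaling terms are zero when every marker is fixed.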

11.
We show that the cAMP receptor protein (Crp) binds to DNA as several different conformers. This situation has precluded discovering a high correlation between any sequence property and binding affinity for proteins that bend DNA. Experimentally quantified affinities of the Synechocystis sp. PCC 6803 cAMP receptor protein (SyCrp1) and the Escherichia coli Crp (EcCrp, also known as CAP) for DNA were analyzed to mathematically describe, and make human-readable, the relationship between DNA sequence and binding affinity in a given system. Here, sequence logos and weight matrices were built to model SyCrp1 binding sequences. Comparing the weight matrix model to binding affinity revealed several distinct binding conformations. These Crp/DNA conformations were asymmetrical (non-palindromic).
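As a minimal sketch of the two representations named above, the code below builds a log-odds weight matrix from aligned sites and computes per-column information content, which is the stack height in a sequence logo. It assumes a uniform background, a fixed pseudocount, and no small-sample correction. The sites in `toy_sites` are hypothetical and are not SyCrp1 data.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=0.5, background=0.25):
    """Log2-odds weight matrix from equal-length aligned binding sites."""
    n = len(sites)
    matrix = []
    for column in zip(*sites):
        freqs = {b: (column.count(b) + pseudocount) / (n + 4 * pseudocount)
                 for b in BASES}
        matrix.append({b: math.log2(f / background) for b, f in freqs.items()})
    return matrix

def score(seq, matrix):
    """Additive score of seq: one matrix entry per position."""
    return sum(col[b] for b, col in zip(seq, matrix))

def column_information(sites):
    """Information content per column in bits (2 minus the Shannon entropy)."""
    info = []
    for column in zip(*sites):
        freqs = [column.count(b) / len(column) for b in BASES]
        entropy = -sum(f * math.log2(f) for f in freqs if f > 0)
        info.append(2.0 - entropy)
    return info

# Hypothetical aligned sites, for illustration only.
toy_sites = ["TGTGA", "TGTGA", "TGCGA", "AGTGA"]
pwm = weight_matrix(toy_sites)
bits = column_information(toy_sites)
```

A weight matrix scores each position independently, so it cannot by itself represent several binding conformations; plotting such scores against measured affinity is one way that departures from a single additive model become visible.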

12.
13.
14.
15.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often, though not always, follow a universal code of amino acid-base recognition, and the effects of amino acid mutations can be most easily understood for these interactions.

16.
The TATA box-binding protein (TBP) recognizes its target sites (TATA boxes) by indirectly reading the DNA sequence through its conformational effects (indirect readout). Here, we explore the molecular mechanisms underlying indirect readout of TATA boxes by TBP by studying the binding of TBP to adenovirus major late promoter (AdMLP) sequence variants, including alterations inside as well as in the sequences flanking the TATA box. We measure the dissociation kinetics of complexes of TBP with AdMLP targets and, by phase-sensitive assay, the intrinsic bending in the TATA box sequences as well as the bending of the same sequences induced by TBP binding. In these experiments we observe a correlation of the kinetic stability with sequence changes within the TATA recognition elements. Comparison of the kinetic data with structural properties of TATA boxes in known crystalline TBP/TATA box complexes reveals several "signals" for TATA box recognition, at both the single base-pair level and the level of larger DNA tracts within the TATA recognition element. The DNA bending induced by TBP on its binding sites is not correlated with the stability of TBP/TATA box complexes. Moreover, we observe a significant influence on the kinetic stability of alterations in the region flanking the TATA box. This effect is limited, however, to target sites with alternating TA sequences, whereas the AdMLP target, containing an A tract, is not influenced by these changes.

17.
We report measurements of the relative binding affinity of CAP for DNA sequences which have been systematically mutated in the region flanking the consensus binding site. Our experiments focus on the locus one helical turn from the dyad axis where DNA bending toward the minor groove is induced upon CAP binding. The binding free energy and extent of bending are moderately well correlated for the set of 56 sequences. Changes in binding affinity spanning a factor of about 50 could be accounted for by additive contributions of dinucleotides; with a few exceptions, the relative ranking of dinucleotide contributions to binding and bending are similar. We conclude that dinucleotides are the smallest independent unit required for quantitative interpretation of CAP-induced DNA bending and binding in the distal domains of the CAP consensus binding site. The imperfect correlation between binding strength and extent of bending implies that sequence changes affect protein binding strength not only by altering the DNA deformation energy required to form the complex, but also by directly affecting the free energy of interaction between protein and DNA.
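To put the 50-fold affinity range on an energy scale, and to show what an additive dinucleotide model looks like, here is a small sketch. The temperature (25 °C) is an assumption, since the abstract does not state the experimental temperature. Summing over overlapping dinucleotide steps is likewise an assumption, and the contribution values in `toy_contributions` are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def ddg_from_ratio(k_ratio, temperature=298.15):
    """Free energy difference (kcal/mol) for a given ratio of association constants."""
    return -R * temperature * math.log(k_ratio)

def additive_ddg(seq, contributions):
    """Sum dinucleotide contributions over the overlapping steps of seq (assumed scheme)."""
    return sum(contributions.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1))

# Hypothetical contributions in kcal/mol, for illustration only.
toy_contributions = {"TG": -0.4, "GT": -0.1, "CA": -0.3, "AA": 0.2}
span = ddg_from_ratio(50.0)                       # energy span of a 50-fold affinity change
predicted = additive_ddg("TGTCA", toy_contributions)
```

Under these assumptions a 50-fold change in affinity corresponds to roughly 2.3 kcal/mol, so the dinucleotide contributions in such a model are individually a fraction of a kcal/mol.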

18.
19.
20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号