Found 20 similar articles (search time: 15 ms)
1.
Detecting DNA-binding helix-turn-helix structural motifs using sequence and structure information

In this work, we analyse the potential for using structural knowledge to improve the detection of the DNA-binding helix–turn–helix (HTH) motif from sequence. Starting from a set of DNA-binding protein structures that include a functional HTH motif and have no apparent sequence similarity to each other, two different libraries of hidden Markov models (HMMs) were built. One library included sequence models of whole DNA-binding domains, which incorporate the HTH motif; the second library included shorter models of ‘partial’ domains, representing only the fraction of the domain that corresponds to the functionally relevant HTH motif itself. The libraries were scanned against a dataset of protein sequences, some containing the HTH motifs, others not. HMM predictions were compared with the results obtained from a previously published structure-based method and subsequently combined with it. The combined method proved more effective than either of the single-featured approaches, showing that the information carried by motif sequences and motif structures is to some extent complementary and can successfully be used together for the detection of DNA-binding HTHs in proteins of unknown function.
2.
Giri Narasimhan Changsong Bu Yuan Gao Xuning Wang Ning Xu Kalai Mathee 《Journal of computational biology》2002,9(5):707-720
We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence.
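The pattern-dictionary idea described in this abstract can be illustrated with a minimal sketch. This is not the GYM implementation: restricting "good combinations" to residue pairs, the `min_support` cutoff, the scoring threshold, and all function names are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_dictionary(aligned_motifs, min_support):
    """Collect residue pairs ((i, a), (j, b)) that occur in at least
    `min_support` of the aligned training motifs (hypothetical criterion)."""
    counts = Counter()
    for motif in aligned_motifs:
        # Each (position, residue) pair combination is counted once per motif.
        counts.update(combinations(enumerate(motif), 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def score_window(window, dictionary):
    """Number of dictionary pairs present in a candidate window."""
    return sum(1 for pair in combinations(enumerate(window), 2)
               if pair in dictionary)


def detect(sequence, dictionary, width, threshold):
    """Slide a window over the sequence and report (start, score) hits."""
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(sequence[start:start + width], dictionary)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy aligned training set (invented sequences, not real HTH motifs).
training = ["QAELA", "QSEIA", "QTDLA"]
dictionary = build_dictionary(training, min_support=2)
hits = detect("MMQGELAMM", dictionary, width=5, threshold=4)
```

In a real setting the cutoffs would be chosen statistically, as the abstract states; here they are fixed by hand to keep the sketch short.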
3.
A method for discerning protein structures containing the DNA-binding helix-turn-helix (HTH) motif has been developed. The method uses statistical models based on geometrical measurements of the motif. With a decision tree model, key structural features required for DNA binding were identified. These include a high average solvent-accessibility of residues within the recognition helix and a conserved hydrophobic interaction between the recognition helix and the second alpha helix preceding it. The Protein Data Bank was searched using a more accurate model of the motif created using the Adaboost algorithm to identify structures that have a high probability of containing the motif, including those that had not been reported previously.
4.
M D Yudkin 《Protein engineering》1987,1(5):371-372
5.
6.
7.
This paper presents a simple program for interactive searching for nucleotide sequences that may code for the helix-turn-helix, zinc finger or leucine zipper motifs in proteins. The helix-turn-helix motifs are predicted using the recently published method of Dodd and Egan, while zinc fingers and leucine zippers are searched for by our original methods. DNABIND is shown to detect all four known helix-turn-helix motifs in bacteriophage lambda genes and both zinc fingers of the ADR1 gene of yeast.
8.
A method, called "protein blotting," for the detection of DNA-binding proteins is described. Proteins are separated on an SDS-polyacrylamide gel. The gel is sandwiched between 2 nitrocellulose filters and the proteins allowed to diffuse out of the gel and onto the filters. The proteins are tightly bound to each filter, producing a replica of the original gel pattern. The replica is used to detect DNA-binding proteins, RNA-binding proteins or histone-binding proteins by incubation of the filter with [32P]DNA, [125I]RNA, or [125I] histone. Evidence is also presented that specific protein-DNA interactions may be detected by this technique; under appropriate conditions, the lac repressor binds only to DNA containing the lac operator. Strategies for the detection of specific protein-DNA interactions are discussed.
9.
Background
Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desired to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually largely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to largely reduce the mining cost.
10.
Ronaghi M 《Analytical biochemistry》2000,286(2):282-288
In modern biology, there is a critical need to develop a high-throughput and inexpensive platform for DNA sequencing. Pyrosequencing is a nonelectrophoretic single-tube DNA sequencing method that takes advantage of cooperativity between four enzymes to monitor DNA synthesis. In these studies, single-stranded DNA-binding protein (SSB) was added to the primed DNA template prior to the Pyrosequencing reaction. The addition of SSB to a Pyrosequencing reaction system resulted in a read length of more than 30 nucleotides. Improvements were observed as: (i) increased efficiency of the enzymes, (ii) reduced mispriming, as measured by nonspecific signals, (iii) an increase in signal intensity during the reaction, (iv) higher accuracy in reading the number of identical adjacent nucleotides in difficult templates, and (v) longer reads. The usefulness of these results for future Pyrosequencing applications is discussed.
11.
Ribosomal protein L7/L12 has a helix-turn-helix motif similar to that found in DNA-binding regulatory proteins.

Inspection of the structure of the C-terminal domain of ribosomal protein L7/L12 (1) reveals a helix-turn-helix motif similar to the one found in many DNA-binding regulatory proteins (2-5). The 19 alpha-carbon atoms of the L7/L12 alpha-helices superimpose on the DNA binding helices of CAP and cro with root-mean-square distances between corresponding alpha carbons of 1.45 and 1.55 Å, respectively. These helices in L7/L12 are within a patch of highly conserved residues on the surface of L7/L12 whose role is as yet uncertain. We raise the possibility that they may constitute a binding site for nucleic acids, most probably RNA. Consistent with this hypothesis are calculations of the electrostatic charge potential surrounding the protein, which show a region of positive potential centered on the first of these helices.
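The root-mean-square distance quoted in this abstract can be illustrated with a short sketch. It assumes the two sets of alpha-carbon coordinates are already superimposed and listed in corresponding order; the optimal superposition step itself (for example a Kabsch fit) is not shown, and the coordinates below are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between corresponding atoms.

    Both arguments are equal-length lists of (x, y, z) tuples in angstroms,
    assumed to be already superimposed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two toy atom pairs, each displaced by exactly 1 angstrom along z.
helix_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
helix_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
distance = rmsd(helix_a, helix_b)
```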
12.
13.
Facchiano AM 《Bioinformatics (Oxford, England)》2000,16(3):292-293
SUMMARY: HELM is a web tool designed to automate the analysis of protein sequences searching for alpha helix motifs. This analysis can be useful in protein engineering studies, aimed at the identification of regions to be modified in order to obtain more suitable features of local and/or global stability. AVAILABILITY: The tool is available to academic and commercial institutions at the URL http://crisceb.area.na.cnr.it/angelo/PROTEIN_TOOLS/HELM/ CONTACT: angelo@crisceb.area.na.cnr.it
14.
Allegra Via Pier Federico Gherardini Enrico Ferraro Gabriele Ausiello Gianpaolo Scalia Tomba Manuela Helmer-Citterich 《BMC bioinformatics》2007,8(1):68
Background
False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution?
Results
Here we analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. Most motifs exhibit a number of false positives significantly similar to the number of times they appear in randomized proteomes (=expected number of false positives). Interestingly, about 3% of the analysed motifs show a different kind of behaviour and appear in biological proteomes less than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities which could interfere with cellular mechanisms.
Conclusion
Our thorough statistical and biological analysis showed that there are several mechanisms and evolutionary constraints both of which affect the appearance of functional motifs in protein sequences.
15.
A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.
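Steps (1) and (2) of this abstract, the leave-one-out conservation measure and the selection of the M most conserved windows, can be sketched as follows. The neural network is replaced here by a simple per-position residue-frequency profile, which is a deliberate stand-in and not the authors' model; the function names and the toy alignment are likewise assumptions.

```python
from collections import Counter


def train_profile(peptides):
    """Per-position residue counts; a stand-in for training a neural net."""
    length = len(peptides[0])
    columns = [Counter(p[i] for p in peptides) for i in range(length)]
    return columns, len(peptides)


def recognise(profile, peptide):
    """Mean frequency of the peptide's residues in the profile (0.0 to 1.0)."""
    columns, n = profile
    return sum(col[res] / n for col, res in zip(columns, peptide)) / len(peptide)


def loo_conservation(alignment, start, window):
    """Average recognition of each held-out span by a model of the other N-1."""
    spans = [seq[start:start + window] for seq in alignment]
    scores = []
    for i, held_out in enumerate(spans):
        profile = train_profile(spans[:i] + spans[i + 1:])
        scores.append(recognise(profile, held_out))
    return sum(scores) / len(scores)


def most_conserved(alignment, window, m):
    """Start positions of the m windows with the highest conservation."""
    starts = range(len(alignment[0]) - window + 1)
    ranked = sorted(starts,
                    key=lambda s: loo_conservation(alignment, s, window),
                    reverse=True)
    return ranked[:m]


# Toy gap-free alignment (invented): only the first three columns are conserved.
alignment = ["ACDEFG", "ACDKLM", "ACDPQR"]
top = most_conserved(alignment, window=3, m=1)
```

Step (3), the databank search and retraining loop, is omitted; it would apply the per-window models to every span of each database sequence and combine the best outputs in N- to C-terminal order.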
16.
17.
18.
19.
Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.
20.
λ-Repressor-operator sites interaction, particularly O(R)1 and O(R)2, is a key component of the λ-genetic switch. FRET from the dansyl bound to the C-terminal domain of the protein, to the intercalated EtBr in the operator DNA indicates that the structure of the protein is more compact in the O(R)2 complex than in the O(R)1 complex. Fluorescence anisotropy reveals enhanced flexibility of the C-terminal domain of the repressor at fast timescales after complex formation with O(R)1. In contrast, O(R)2 bound repressor shows no significant enhancement of protein dynamics at these timescales. These differences are shown to be important for correct protein-protein interactions. Altered protein dynamics upon specific DNA sequence recognition may play important roles in assembly of regulatory proteins at the correct positions.