Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
HIV-1 protease is a major drug target against AIDS because it permits viral maturation by processing the gag and pol polyproteins of the virus. The cleavage sites in these polyproteins have no obvious sequence homology or binding motif, so the specificity of the protease is not easily determined. We used various threading approaches, with the crystal structures of substrate complexes serving as templates, to study the substrate specificity of HIV-1 protease, with the aim of better differentiating binding from nonbinding sequences. The predictions from threading improved when distance-dependent interaction energy functions were used instead of contact matrices. To rank the peptides and properly account for the peptide's conformation in the total energy, the results from using short-range potentials on multiple template structures were averaged. Finally, a dynamic threading approach is introduced that is potentially useful when only one template structure is available. The conformational energy of the peptide, especially the term accounting for the side chains, was found to be important in differentiating between binding and nonbinding sequences. Hence, the substrate specificity, and thus the ability of the virus to mature, is affected by how well the substrate peptide fits within the limited conformational space of the active site groove.

2.
Non-additivity in protein-DNA binding   Total citations: 3 (self-citations: 0, citations by others: 3)

3.
The structure of DNA sites responsible for binding the glucocorticoid-receptor complex (GlRC) was analyzed. Using frequency matrices and a variant of the perceptron method, it was established that the GlRC binding site contains, on both sides of the known conserved nucleotide sequence (the core), additional conserved elements that appear able to modulate the efficiency of GlRC binding. A criterion was developed for detecting potential GlRC binding sites in given sequences. It is based on the simultaneous use of several perceptron matrices. Detection of GlRC binding sites with the proposed criterion is an order of magnitude more efficient than detection based on the GlRC binding-site consensus (Beato et al. [2]).

4.
5.
MOTIVATION: Direct recognition, or direct readout, of DNA bases by a DNA-binding protein involves amino acids that interact directly with features specific to each base. Experimental evidence also shows that in many cases the protein achieves partial sequence specificity by indirect recognition, i.e., by recognizing structural properties of the DNA. (1) Could threading a DNA sequence onto a crystal structure of bound DNA help explain the indirect recognition component of sequence specificity? (2) Might the resulting pure-structure computational motif manifest itself in familiar sequence-based computational motifs? RESULTS: The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences and was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and so, to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences. 
The conclusion is that deformation energy explains part of indirect recognition, which explains part of IHF sequence-specific binding.

6.
We describe a new method for identifying the sequences that signal the start of translation, and the boundaries between exons and introns (donor and acceptor sites) in human mRNA. According to the mandatory keyword, ORGANISM, and feature key, CDS, a large set of standard data for each signal site was extracted from the ASCII flat file, gbpri.seq, in the GenBank release 108.0. This was used to generate the scoring matrices, which summarize the sequence information for each signal site. The scoring matrices take into account the independent nucleotide frequencies between adjacent bases in each position within the signal site regions, and the relative weight on each nucleotide in proportion to their probabilities in the known signal sites. Using a scoring scheme that is based on the nucleotide scoring matrices, the method has great sensitivity and specificity when used to locate signals in uncharacterized human genomic DNA. These matrices are especially effective at distinguishing true and false sites.
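
The scoring-matrix idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes independent positions, a uniform background, a pseudocount of 1, and a handful of made-up donor-like sites (`TOY_SITES`), and all function names are hypothetical. It builds a log-odds matrix from aligned sites and slides it along a sequence to find the best-scoring window.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_log_odds_matrix(sites, background=None, pseudocount=1.0):
    """Build one log2-odds column per position from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        column = {
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        }
        matrix.append(column)
    return matrix

def score_window(matrix, window):
    """Sum the per-position log-odds scores of one window."""
    return sum(column[base] for column, base in zip(matrix, window))

def best_site(matrix, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(matrix)
    return max(
        (score_window(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Toy, made-up training sites and query sequence (illustration only).
TOY_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]
TOY_MATRIX = build_log_odds_matrix(TOY_SITES)
TOY_HIT = best_site(TOY_MATRIX, "TTTTCAGGTAAGTCCCC")
```

A real signal-site matrix would be trained on many annotated sites and could include adjacent-base dependencies, which this sketch leaves out.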

7.
8.
9.
Goonesekere NC, Lee B. Proteins, 2008, 71(2): 910-919
Sequence homology detection relies on score matrices, which reflect the frequency of amino acid substitutions observed in a dataset of homologous sequences. The substitution matrices in popular use today are usually constructed without consideration of the structural context in which the substitution takes place. Here, we present amino acid substitution matrices specific to the particular polar-nonpolar environment of the amino acid. As expected, these matrices [context-specific substitution matrices (CSSMs)] show striking differences from the popular BLOSUM62 matrix, which does not include structural information. When incorporated into BLAST and PSI-BLAST, CSSMs outperformed BLOSUM matrices as assessed by ROC curve analyses of the number of true and false hits and by the accuracy of the sequence alignments to the hit sequences. These findings are also of relevance to profile-profile-based methods of homology detection, since CSSMs may help build a better profile. Profiles generated for protein sequences in PDB using CSSM-PSI-BLAST will be made available for searching via RPSBLAST through our web site http://lmbbi.nci.nih.gov/.

10.
11.
Nucleic acid-based biochemical assays are crucial to modern biology. Key applications, such as detection of bacterial, viral and fungal pathogens, require detailed knowledge of assay sensitivity and specificity to obtain reliable results. Improved methods to predict assay performance are needed for exploiting the exponentially growing amount of DNA sequence data and for reducing the experimental effort required to develop robust detection assays. Toward this goal, we present an algorithm for the calculation of sequence similarity based on DNA thermodynamics. In our approach, search queries consist of one to three oligonucleotide sequences representing either a hybridization probe, a pair of Padlock probes or a pair of PCR primers with an optional TaqMan™ probe (i.e. in silico or 'virtual' PCR). Matches are reported if the query and target satisfy both the thermodynamics of the assay (binding at a specified hybridization temperature and/or change in free energy) and the relevant biological constraints (assay sequences binding to the correct target duplex strands in the required orientations). The sensitivity and specificity of our method are evaluated by comparing predicted to known sequence tagged sites in the human genome. Free energy is shown to be a more sensitive and specific match criterion than hybridization temperature.

12.
The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

13.
MOTIVATION: DNA repeats are a common feature of most genomic sequences. Their de novo identification is still difficult despite being a crucial step in genomic analysis and oligonucleotide design. Several efficient algorithms based on word counting are available, but words that are too short decrease specificity while long words decrease sensitivity, particularly in degenerated repeats. RESULTS: The Repeat Analysis Program (RAP) is based on a new word-counting algorithm optimized for high resolution repeat identification using gapped words. Many different overlapping gapped words can be counted at the same genomic position, thus producing a better signal than the single ungapped word. This results in better specificity both in terms of low-frequency detection, being able to identify sequences repeated only once, and highly divergent detection, producing a generally high score in most intron sequences. AVAILABILITY: The program is freely available for non-profit organizations, upon request to the authors. CONTACT: giorgio.valle@unipd.it SUPPLEMENTARY INFORMATION: The program has been tested on the Caenorhabditis elegans genome using word lengths of 12, 14 and 16 bases. The full analysis has been implemented in the UCSC Genome Browser and is accessible at http://genome.cribi.unipd.it.
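
To make the gapped-word idea concrete, here is a small sketch. It is not RAP's algorithm: the `'1'`/`'0'` mask notation, the function names, and the scoring rule (number of other occurrences of each gapped word, summed over masks) are assumptions chosen for illustration. It shows how a gapped word can link two copies of a repeat that differ by one base, which an ungapped word of the same span would miss.

```python
from collections import Counter

def gapped_words(sequence, mask):
    """Yield (start, word) for each window, keeping only bases where mask is '1'."""
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    width = len(mask)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        yield start, "".join(window[i] for i in keep)

def repeat_scores(sequence, masks):
    """Score each start position by how often its gapped words recur elsewhere."""
    scores = [0] * len(sequence)
    for mask in masks:
        counts = Counter(word for _, word in gapped_words(sequence, mask))
        for start, word in gapped_words(sequence, mask):
            scores[start] += counts[word] - 1  # exclude the occurrence itself
    return scores

# Toy sequence: ACGT at position 0 and ACAT at position 7 differ at one base.
TOY_SEQUENCE = "ACGTTTTACAT"
UNGAPPED = repeat_scores(TOY_SEQUENCE, ["1111"])
GAPPED = repeat_scores(TOY_SEQUENCE, ["1101"])
```

With the ungapped mask every word in the toy sequence is unique, whereas the mask that skips the third base gives positions 0 and 7 a nonzero score. Counting several different masks per position, as the abstract describes, would strengthen the signal further.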

14.
15.
16.
Many important cellular protein interactions are mediated by peptide recognition domains. The ability to predict a domain's binding specificity directly from its primary sequence is essential to understanding the complexity of protein-protein interaction networks. One such recognition domain is the PDZ domain, functioning in scaffold proteins that facilitate formation of signaling networks. Predicting the PDZ domain's binding specificity was a part of the DREAM4 Peptide Recognition Domain challenge, the goal of which was to describe, as position weight matrices, the specificity profiles of five multi-mutant ERBB2IP-1 domains. We developed a method that derives multi-mutant binding preferences by generalizing the effects of single point mutations on the wild type domain's binding specificities. Our approach, trained on publicly available ERBB2IP-1 single-mutant phage display data, combined linear regression-based prediction for ligand positions whose specificity is determined by few PDZ positions, and single-mutant position weight matrix averaging for all other ligand columns. The success of our method as the winning entry of the DREAM4 competition, as well as its superior performance over a general PDZ-ligand binding model, demonstrates the advantages of training a model on a well-selected domain-specific data set.
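
The averaging step mentioned in this abstract can be sketched as follows. This is only an illustration of column-wise averaging of position weight matrices; the data layout (a list of residue-to-probability dictionaries), the function name, and the toy numbers are assumptions, and the regression-based part of the published method is not shown.

```python
def average_pwms(pwms):
    """Average several position weight matrices column by column.

    Each PWM is a list of columns; each column maps residue -> probability.
    Residues missing from a column are treated as probability 0.
    """
    n = len(pwms)
    width = len(pwms[0])
    averaged = []
    for pos in range(width):
        residues = set()
        for pwm in pwms:
            residues.update(pwm[pos])
        averaged.append(
            {r: sum(pwm[pos].get(r, 0.0) for pwm in pwms) / n for r in sorted(residues)}
        )
    return averaged

# Toy single-mutant PWMs for a two-position ligand (made-up numbers).
MUTANT_A = [{"V": 0.8, "L": 0.2}, {"T": 1.0}]
MUTANT_B = [{"V": 0.4, "L": 0.6}, {"T": 0.5, "S": 0.5}]
MULTI_MUTANT_ESTIMATE = average_pwms([MUTANT_A, MUTANT_B])
```

Because each input column sums to 1, each averaged column does too, so the result is again a valid position weight matrix.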

17.
18.
Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
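
A mixing matrix, as described here, gives nucleotide transition rates that are applied to a starting sequence to generate a nonrandom pool. The sketch below shows one plausible reading of that step; the matrix values, the function names, and the independent per-position sampling are assumptions for illustration and do not reproduce the five matrix classes defined by the authors.

```python
import random

BASES = "ACGU"

def mutate(sequence, mixing_matrix, rng):
    """Replace each base by sampling from its row of the mixing matrix."""
    result = []
    for base in sequence:
        weights = [mixing_matrix[base][b] for b in BASES]
        result.append(rng.choices(BASES, weights=weights, k=1)[0])
    return "".join(result)

def design_pool(start, mixing_matrix, size, seed=0):
    """Generate a pool of sequences from one starting sequence."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [mutate(start, mixing_matrix, rng) for _ in range(size)]

# Identity matrix: every base is kept, so the pool only contains the start.
IDENTITY = {a: {b: 1.0 if a == b else 0.0 for b in BASES} for a in BASES}

# Hypothetical matrix: keep each base with probability 0.7, otherwise
# switch to one of the other three bases with probability 0.1 each.
TOY_MIXING = {a: {b: 0.7 if a == b else 0.1 for b in BASES} for a in BASES}

TOY_POOL = design_pool("GGCAUCGAUGCC", TOY_MIXING, size=5)
```

Rows closer to the identity keep the pool near the starting sequence, while flatter rows push it toward a random pool, which is the trade-off a designed pool can tune.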

19.
Two gamma-glutamyl transpeptidase mRNAs (mRNAI and mRNAII), with alternate 5'-untranslated regions, are expressed in the rat kidney. Oligonucleotides were designed based upon these two alternate 5' sequences and used as primers to amplify GGT genomic DNA sequences. The genomic organization of the mRNAI and mRNAII 5'-untranslated sequences reveals that the mRNAs are encoded from two separate promoters which are 2.1 kbp apart on the single GGT gene. A 2775 base pair genomic sequence, which contains the proximal GGT promoter, was cloned from two overlapping amplified fragments. S1 mapping analysis shows that the kidney GGT mRNAI is transcribed from several start sites on this promoter which displays neither a classical TATA box nor Sp1 binding sites. Chimeric plasmids, including the GGT promoter region for mRNAI, associated with the chloramphenicol acetyltransferase (CAT) reporter gene, were transiently expressed in a kidney (LLCPK) and in a hepatoma (HTC) cell line. A sequence extending 308 bases upstream from the major GGT mRNAI start site drives a promoter activity which is 5-fold higher in LLCPK than in HTC cells and is sufficient to confer cell specificity to the GGT proximal promoter.

20.
All the detectable metallo-beta-lactamase fold proteins were identified in the publicly available sequence databases and complete genome sequences using iterative profile searches with the PSI-BLAST program and motif searches with position specific weight matrices. The catalytic site/mechanism and the corresponding structural elements were characterized for these proteins based on the available structure of the Bacillus zinc-dependent beta-lactamase. Based on pair-wise sequence and phylogenetic analysis an evolutionary classification for enzymes of this fold was developed and discussed in terms of implications for substrate specificity. Finally, some predicted inactive members which have been recruited for non-enzymatic functions such as microtubule binding in a cytoskeletal MAP1 are described.
