Similar articles
 Found 20 similar articles; search took 15 ms
1.
Kim S  Wang Z  Dalkilic M 《Proteins》2007,66(3):671-681
The motif prediction problem, an important problem in biology, is to predict short, conserved subsequences shared by a family of sequences. Gibbs is one of the first successful motif algorithms; it runs very fast compared with other algorithms, and its search behavior is based on well-studied Gibbs random sampling. However, motif prediction is a very difficult problem, and Gibbs may fail to predict true motifs in some cases. The authors therefore explored the possibility of improving the prediction accuracy of Gibbs while retaining its fast runtime performance. In this paper, the authors considered Gibbs only for proteins, not for DNA binding sites. The authors have developed iGibbs, an integrated motif search framework for proteins that employs two previous techniques of their own: one guides motif search by clustering sequences and the other by pattern refinement. These two techniques are combined into a new double clustering approach to guiding motif search. The unique feature of their framework is that users do not have to specify the number of motifs to be predicted when motifs occur in different subsets of the input sequences, since it automatically groups the input sequences into clusters and predicts motifs from the clusters. Tests on the PROSITE database show that their framework improved the prediction accuracy of Gibbs significantly. Compared with more exhaustive search methods like MEME, iGibbs predicted motifs more accurately and ran one order of magnitude faster.
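
To make the Gibbs random sampling mentioned in this abstract concrete, here is a minimal Python sketch of a basic site sampler. It is not the iGibbs framework and not the published Gibbs program: the function name `gibbs_site_sampler`, its parameters, and the simplifications (one motif occurrence per sequence, fixed width, no background model, no clustering or pattern refinement) are all assumptions made for illustration.

```python
import random
from collections import Counter


def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Toy Gibbs site sampler: one motif occurrence of fixed width per sequence.

    Hypothetical helper for illustration only; returns one start index per sequence.
    """
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build per-column residue counts from the sites in all other sequences.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            columns = [Counter(site[k] for site in others) for k in range(width)]
            total = len(others) + pseudocount * len(alphabet)
            # Weight every candidate start in the held-out sequence by its
            # probability under the current motif model.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= (columns[k][seq[p + k]] + pseudocount) / total
                weights.append(w)
            # Sample the new start in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Because each new position is sampled rather than chosen greedily, the search can leave a poor local alignment, which is the property that sampling-based motif finders rely on. A practical implementation would normally also divide each weight by a background probability.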

2.
3.
4.
5.
INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements using probabilistic sequence models and Gibbs sampling. All tools are available via different web pages and as web services. The web pages are connected and integrated to reflect a methodology and facilitate complex analysis using different tools. The web services can be invoked using standard SOAP messaging. Example clients are available for download to invoke the services from a remote computer or to be integrated with other applications. All services are catalogued and described in a web service registry. The INCLUSive web portal is available for academic purposes at http://www.esat.kuleuven.ac.be/inclusive.

6.
7.
8.
9.

Background  

With the advent of high-throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts at using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs remains an unmet challenge, as high degeneracy diminishes the statistical significance of biological signals and increasing motif size causes combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids the costly enumeration present in existing combinatorial methods and is amenable to parallel processing.

10.
11.
A computer-based system termed MBIS (the Molecular Biological Information Service), written in FORTRAN77 and Digital Command Language (DCL) and running on a Digital Equipment Corporation VAX computer under the VMS operating system (V4.1) is in use at the Division of Molecular Biology. MBIS consists of three main sections: 1) The utility section, used by the system's manager to tailor the five commonly available databases so that they are useable by the applications programmes running on the system; 2) The retrieval section, used to find and extract specific sequences or bibliographic information, and 3) The analytical section, used to analyse and compare sequences either extracted from the databases or input by the user. The nucleotide databases maintained are GenBank, EMBL and PIR (Protein Identification Resource, National Biomedical Research Foundation) and the peptide databases are PIR and NEWAT. In addition, users can originate and maintain their own databases. Those programmes which feature graphics output are compatible with most emulators of the Tektronix 4010 terminal.

12.
13.
14.
SPXX, a frequent sequence motif in gene regulatory proteins   Total citations: 48 (self-citations: 0; citations by others: 48)
A new DNA-binding unit, composed of four amino acid residues and common in gene regulatory proteins, is proposed. The occurrences of the sequences Ser-Pro-X-X (SPXX) and Thr-Pro-X-X (TPXX) in gene regulatory proteins are compared with those in general proteins. These sequences are found more frequently in gene regulatory proteins, including homoeotic gene products, segmentation gene products, steroid hormone receptors and certain oncogene products, than they are in DNA-binding proteins that are not directly involved in gene regulation, such as the core histones, or in general proteins. It is therefore suggested that these sequences contribute to DNA-binding in a manner important for gene regulation. Amino acid residues characteristic of the types of proteins are found as the variable residues X: basic residues, Lys and Arg, in histones, H1 and sea urchin spermatogenous H2B; Tyr in RNA polymerase II; and Ser, Thr, Ala, Leu and Pro in other gene regulatory proteins. S(T)PXX sequences are located on either side of other DNA-recognizing units such as Zn fingers, helix-turn-helices, and cores of histones. The structure of a S(T)PXX sequence is presumed to be a beta-turn I stabilized by two hydrogen bonds, and its potential mode of DNA-binding is discussed.
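
The occurrence comparison described in this abstract comes down to counting Ser-Pro-X-X and Thr-Pro-X-X tetrapeptides in protein sequences. The following Python sketch shows one way such a count could be done. The function names and the example sequence are hypothetical, and this is not the authors' procedure.

```python
import re

# Lookahead so that overlapping occurrences (e.g. "SPSPAA") are all reported.
STPXX = re.compile(r"(?=([ST]P[A-Z]{2}))")


def find_stpxx(protein):
    """Return (0-based start, tetrapeptide) for every S/T-P-X-X site."""
    return [(m.start(), m.group(1)) for m in STPXX.finditer(protein.upper())]


def stpxx_per_100_residues(protein):
    """Occurrences per 100 residues, a simple density for comparing protein sets."""
    return 100.0 * len(find_stpxx(protein)) / len(protein) if protein else 0.0


# Made-up sequence for illustration, not taken from any database.
print(find_stpxx("MSPKKTPAASPRR"))
```

Normalizing by sequence length lets proteins of different sizes be compared, which matters when a regulatory protein set is set against general proteins.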

15.
It has recently been shown that certain oligodeoxynucleotides (ODNs) designed as catalytic DNA molecules (DNAzymes) exhibit potent cytotoxicity independent of RNA-cleavage activity in a number of cell lines. These cytotoxic ODNs all featured a 5′ G-rich sequence and induced cell death by a TLR9-independent mechanism. In this study, we examined the sequence and length dependence of ODNs for cytotoxicity. A G-rich sequence at the 5′ terminus of the molecule was necessary for cytotoxicity, and the potency of ODNs with active 5′ sequences was length dependent. Cytotoxicity appeared to be generally independent of 3′ sequence composition, although 3′ sequences totally lacking G-nucleotides were mostly inactive. Nucleolin, elongation factor 1-alpha (eEF1A) and vimentin were identified as binding to a cytotoxic ODN (Dz13) using protein pull-down assays and LC-MS/MS. Although these proteins have previously been described to bind G-rich ODNs, the binding of eEF1A correlated with cytotoxicity, whereas binding of nucleolin and vimentin did not. Quiescent non-proliferating cells were resistant to cytotoxicity, indicating cytotoxicity may be cell cycle dependent. Although the exact mechanism of cytotoxicity remains unknown, the marked potency of the longer (25 nt) ODNs in particular indicates the potential of these molecules for treatment of diseases associated with abnormal cell proliferation.

16.
MOTIVATION: Information about a particular protein or protein family is usually distributed among multiple databases and often in more than one entry in each database. Retrieval and organization of this information can be a laborious task. This task is complicated even further by the existence of alternative terms for the same concept. RESULTS: The PDB, SWISS-PROT, ENZYME, and CATH databases have been imported into a combined relational database, BIOMOLQUEST. A powerful search engine has been built using this database as a back end. The search engine achieves significant improvements in query performance by automatically utilizing cross-references between the legacy databases. The results of the queries are presented in an organized, hierarchical way.

17.
Zhu J  Xie L  Honig B 《Proteins》2006,65(2):463-479
In this article, we present an iterative, modular optimization (IMO) protocol for the local structure refinement of protein segments containing secondary structure elements (SSEs). The protocol is based on three modules: a torsion-space local sampling algorithm, a knowledge-based potential, and a conformational clustering algorithm. Alternative methods are tested for each module in the protocol. For each segment, random initial conformations were constructed by perturbing the native dihedral angles of loops (and SSEs) of the segment to be refined while keeping the protein body fixed. Two refinement procedures based on molecular mechanics force fields, using either energy minimization or molecular dynamics, were also tested but were found to be less successful than the IMO protocol. We found that DFIRE is a particularly effective knowledge-based potential and that clustering algorithms that are biased by the DFIRE energies improve the overall results. Results were further improved by adding an energy minimization step to the conformations generated with the IMO procedure, suggesting that hybrid strategies that combine both knowledge-based and physical effective energy functions may prove to be particularly effective in future applications.

18.
SUMMARY: CisML is an XML-based format for sequence motif detection software. This proposed standard is applicable to many types of sequence motif detection programs. It is intended to facilitate the integration of data and the comparison of results from different software packages, and to simplify the development of downstream tools. XSL stylesheets are provided for easy generation of text, html and graphical reports from CisML-formatted data. AVAILABILITY: http://zlab.bu.edu/CisML/ SUPPLEMENTARY INFORMATION: Example CisML-formatted data and XSL stylesheets for report generation are available along with the sample output.

19.
Based on a database search, the sequences of 32 Bovidae retroposon elements have been compared. Two conserved areas are identified, and one of the corresponding sequences of the derived bovine consensus was used to design oligonucleotides as primer molecules for random DNA amplification of Bovidae DNA. Such a primer binding site should occur on average every 10,000 bp in the bovine genome, as suggested by a survey of published sequences. This estimate of the distribution of these possible primer binding sites was experimentally substantiated by mapping four of these primer binding sites within 40 kb of contiguous bovine DNA, carrying the heretofore undescribed bovine lactoferrin gene. Furthermore, these conserved, ubiquitous sequence motifs prove to be useful for mapping of bovine DNA.
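
The consistency between the estimated spacing (one site per 10,000 bp) and the four sites mapped within 40 kb can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and the Poisson model for site counts is an assumption that the abstract does not state.

```python
import math


def expected_sites(region_bp, mean_spacing_bp):
    """Expected number of sites in a region, given the average spacing."""
    return region_bp / mean_spacing_bp


def poisson_pmf(k, lam):
    """Probability of exactly k sites if counts were Poisson (an assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = expected_sites(40_000, 10_000)  # 4.0 sites expected in 40 kb
print(lam, round(poisson_pmf(4, lam), 4))
```

An expectation of four sites in 40 kb matches the four sites that were mapped, which is the sense in which the experiment supports the estimate.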

20.
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles.
