首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
3.
4.
We report an unsupervised structural motif discovery algorithm, FoldMiner, which is able to detect global and local motifs in a database of proteins without the need for multiple structure or sequence alignments and without relying on prior classification of proteins into families. Motifs, which are discovered from pairwise superpositions of a query structure to a database of targets, are described probabilistically in terms of the conservation of each secondary structure element's position and are used to improve detection of distant structural relationships. During each iteration of the algorithm, the motif is defined from the current set of homologs and is used both to recruit additional homologous structures and to discard false positives. FoldMiner thus achieves high specificity and sensitivity by distinguishing between homologous and nonhomologous structures by the regions of the query to which they align. We find that when two proteins of the same fold are aligned, highly conserved secondary structure elements in one protein tend to align to highly conserved elements in the second protein, suggesting that FoldMiner consistently identifies the same motif in members of a fold. Structural alignments are performed by an improved superposition algorithm, LOCK 2, which detects distant structural relationships by placing increased emphasis on the alignment of secondary structure elements. LOCK 2 obeys several properties essential in automated analysis of protein structure: It is symmetric, its alignments of secondary structure elements are transitive, its alignments of residues display a high degree of transitivity, and its scoring system is empirically found to behave as a metric.  相似文献   

5.
In this paper, a new immune genetic algorithm for motif discovery is proposed. The algorithm adopts concentration regulation mechanism to maintain the population diversity and vaccine mechanism to inhibit degeneracy during evolution. Experimental results have demonstrated the method's capacity to find known motifs in relatively long promoter sequences and multiple motifs within a single run.  相似文献   

6.
The evolution of 2, a C-4-methylcarbonate analogue of paclitaxel with minimal oral bioavailability and oral efficacy, into its C-3'-t-butyl-3'-N-t-butyloxycarbonyl analogue (15i), a novel taxane with oral efficacy in preclinical models that is comparable to iv administered paclitaxel, is described.  相似文献   

7.
8.
9.
10.
11.
12.
MOTIVATION: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissues types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. RESULTS: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. AVAILABILITY: Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).  相似文献   

13.
14.
15.
16.
17.
18.
MOTIVATION: Most de novo motif identification methods optimize the motif model first and then separately test the statistical significance of the motif score. In the first stage, a motif abundance parameter needs to be specified or modeled. In the second stage, a Z-score or P-value is used as the test statistic. Error rates under multiple comparisons are not fully considered. Methodology: We propose a simple but novel approach, fdrMotif, that selects as many binding sites as possible while controlling a user-specified false discovery rate (FDR). Unlike existing iterative methods, fdrMotif combines model optimization [e.g. position weight matrix (PWM)] and significance testing at each step. By monitoring the proportion of binding sites selected in many sets of background sequences, fdrMotif controls the FDR in the original data. The model is then updated using an expectation (E)- and maximization (M)-like procedure. We propose a new normalization procedure in the E-step for updating the model. This process is repeated until either the model converges or the number of iterations exceeds a maximum. RESULTS: Simulation studies suggest that our normalization procedure assigns larger weights to the binding sites than do two other commonly used normalization procedures. Furthermore, fdrMotif requires only a user-specified FDR and an initial PWM. When tested on 542 high confidence experimental p53 binding loci, fdrMotif identified 569 p53 binding sites in 505 (93.2%) sequences. In comparison, MEME identified more binding sites but in fewer ChIP sequences than fdrMotif. When tested on 500 sets of simulated 'ChIP' sequences with embedded known p53 binding sites, fdrMotif, compared to MEME, has higher sensitivity with similar positive predictive value. Furthermore, fdrMotif is robust to noise: it selected nearly identical binding sites in data adulterated with 50% added background sequences and the unadulterated data. We suggest that fdrMotif represents an improvement over MEME. AVAILABILITY: C code can be found at: http://www.niehs.nih.gov/research/resources/software/fdrMotif/.  相似文献   

19.
20.
MOTIVATION: Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. RESULTS: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. As well, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acids sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. AVAILABILITY: Gemoda is freely available at http://web.mit.edu/bamel/gemoda  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号