Similar literature
20 similar documents found (search time: 15 ms)
1.
2.
Gene expression array technology has made it possible to assay the expression levels of tens of thousands of genes at a time; large databases of such measurements are currently under construction. One important use of such databases is to search for experiments whose gene expression levels are similar to those of a query, potentially identifying previously unsuspected relationships among cellular states. Such searches depend crucially on the metric used to assess the similarity between pairs of experiments. The complex joint distribution of gene expression levels, particularly its correlational structure and non-normality, makes simple similarity metrics such as Euclidean distance or correlational similarity scores suboptimal for this application. We present a similarity metric for gene expression array experiments that takes into account the complex joint distribution of expression values. We provide a computationally tractable approximation to this measure and have implemented a database search tool based on it. We discuss implementation issues and efficiency, and we compare our new metric to other standard metrics.

3.
MOTIVATION: The large-scale comparison of protein-ligand binding sites is problematic, in that measures of structural similarity are difficult to quantify and are not easily understood in terms of statistical similarity that can ultimately be related to structure and function. We present a binding site matching score, the Poisson Index (PI), based upon a well-defined statistical model. PI requires only the number of matching atoms between two sites and the size of the two sites, the same information used by the Tanimoto Index (TI), a comparable and widely used measure of molecular similarity. We apply PI and TI to a previously automatically extracted set of binding sites to determine the robustness and usefulness of both scores. RESULTS: We found that PI outperforms TI; moreover, site similarity is poorly defined for TI at values around the 99.5% confidence level, for which PI is well defined. A difference map at this confidence level shows that PI gives much more meaningful information than TI. We show individual examples where TI fails to distinguish either a false or a true site pairing, in contrast to PI, which performs much better. TI cannot handle large or small sites very well, or the comparison of large and small sites, in contrast to PI, which is shown to be much more robust. Despite the difficulty of determining a biological 'ground truth' for binding site similarity, we conclude that PI is a suitable measure of binding site similarity and could form the basis for a binding site classification scheme comparable to existing protein domain classification schemes.
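
The abstract above does not give the formula for PI, so the following minimal sketch covers only the Tanimoto Index in its standard form (matches divided by the size of the union), computed from the same three inputs the abstract names. The function name `tanimoto_index` and the example site sizes are illustrative assumptions, not taken from the paper.

```python
def tanimoto_index(n_matched: int, size_a: int, size_b: int) -> float:
    """Standard Tanimoto Index: matched atoms over the union of both sites.

    Assumption: a "match" pairs one atom from each site, so n_matched can
    never exceed the size of the smaller site.
    """
    if n_matched < 0 or n_matched > min(size_a, size_b):
        raise ValueError("n_matched must lie between 0 and the smaller site size")
    union = size_a + size_b - n_matched
    if union == 0:
        raise ValueError("both sites are empty")
    return n_matched / union


# Hypothetical sizes: a 10-atom site fully matched inside a 100-atom site
# scores far lower than two 12-atom sites sharing 10 atoms.
small_in_large = tanimoto_index(10, 10, 100)
similar_sized = tanimoto_index(10, 12, 12)
```

The example values illustrate the size sensitivity that the abstract attributes to TI: a complete match of a small site against a large one is penalised purely because the union is large.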

4.
SUMMARY: We present a tool called MRSD (Metabolic Route Search and Design) to search and design routes based on the weighted compound transform digraph. The search submodule returns routes between a source and a product compound within seconds in the network of one or multiple organisms, based on data from KEGG. The design submodule designs a route from an appointed compound in an interactive mode. The two complementary functions, Metabolic Route Search and Design, can be broadly used in biosynthesis, bio-pharmaceuticals and other related fields. AVAILABILITY: bioinfo.ustc.edu.cn/softwares/MRSD/.

5.
MOTIVATION: In the present work we combine computational analysis and experimental data to explore the extent to which binding site similarities between members of the human cytosolic sulfotransferase family correlate with small-molecule binding profiles. Conversely, from a small-molecule point of view, we explore the extent to which structural similarities between small molecules correlate with protein binding profiles. RESULTS: The comparison of binding site structural similarities and small-molecule binding profiles shows that proteins with similar small-molecule binding profiles tend to have a higher degree of binding site similarity, but the latter is not sufficient to predict small-molecule binding patterns, highlighting the difficulty of predicting small-molecule binding patterns from sequence or structure. Likewise, from a small-molecule perspective, small molecules with similar protein binding profiles tend to be topologically similar, but topological similarity is not sufficient to predict their protein binding patterns. These observations have important consequences for function prediction and drug design.

6.
7.

Background  

Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel.

8.
9.
10.
In this paper, a bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structure or protein. The function compares sequences comprehensively, in terms of both the Hamming distance between the two compared sequences and the corresponding location difference. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities illustrates that the proposed method, which has a computational complexity of O(N), is effective for all three kinds of biological sequences and is generally applicable to them.

11.
Based on 2D-connectivity molecular similarity and cluster analyses, a dataset for HSA binding is divided into a training set and a test set. 4D-fingerprint similarity measures were applied to this dataset. Four different predictive schemes (SM, SA, SR, and SC) were applied to the test set based on the similarity measures of each compound to the compounds in the training set. The first algorithmic scheme (SM), which takes only the most similar compound in the training set into consideration, predicts the binding affinity of a test compound. This scheme has relatively poor predictivity based on 4D-fingerprint similarity analyses. The other three algorithmic schemes (SA, SR, and SC), which assign a weighting coefficient to each of the top-ten most similar training set compounds, have reasonable predictivity for the test set. The algorithmic scheme that categorizes the most similar compounds into different weighted clusters predicts the test set best. The 4D-fingerprints provide 36 different individual IPE/IPE type molecular similarity measures. Further investigation shows that the NP/HA, HS/HA, and HA/HA IPE/IPE type measures predict the test set well. Moreover, these three IPE/IPE type similarity measures are very similar to one another for the particular training and test sets investigated. The 4D-fingerprints have relatively high predictivity for this particular dataset.
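
The difference between a most-similar-compound scheme and a weighted top-ten scheme can be sketched as follows. The abstract does not state how the weighting coefficients are chosen, so weighting each neighbour by its similarity is purely an assumption here, and the function names are hypothetical. The input is a list of similarities between one test compound and each training compound, alongside the training compounds' binding affinities.

```python
from typing import Sequence


def predict_most_similar(sims: Sequence[float], affinities: Sequence[float]) -> float:
    """Use the affinity of the single most similar training compound."""
    best = max(range(len(sims)), key=lambda i: sims[i])
    return affinities[best]


def predict_weighted_top_k(
    sims: Sequence[float], affinities: Sequence[float], k: int = 10
) -> float:
    """Similarity-weighted mean over the k most similar training compounds.

    Assumption: weights are the similarities themselves; the paper's actual
    coefficients are not given in the abstract.
    """
    ranked = sorted(zip(sims, affinities), key=lambda p: p[0], reverse=True)[:k]
    total = sum(s for s, _ in ranked)
    if total == 0:
        raise ValueError("all selected similarities are zero")
    return sum(s * a for s, a in ranked) / total
```

With similarities `[0.9, 0.5, 0.1]` and affinities `[1.0, 2.0, 3.0]`, the first function returns the affinity of the first compound alone, while the second blends the nearest neighbours in proportion to their similarity.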

12.
A number of complementary methods have been developed for predicting protein-protein interaction sites. We sought to increase prediction robustness and accuracy by combining results from different predictors, and report here a meta web server, meta-PPISP, that is built on three individual web servers: cons-PPISP (http://pipe.scs.fsu.edu/ppisp.html), Promate (http://bioportal.weizmann.ac.il/promate), and PINUP (http://sparks.informatics.iupui.edu/PINUP/). A linear regression method, using the raw scores of the three servers as input, was trained on a set of 35 nonhomologous proteins. Cross validation showed that meta-PPISP outperforms all three individual servers. At coverages identical to those of the individual methods, the accuracy of meta-PPISP is higher by 4.8 to 18.2 percentage points. Similar improvements in accuracy are also seen on CAPRI and other targets. AVAILABILITY: meta-PPISP can be accessed at http://pipe.scs.fsu.edu/meta-ppisp.html
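
Combining three raw predictor scores by linear regression can be sketched in pure Python as an ordinary least-squares fit. This is not the meta-PPISP implementation: the abstract gives neither the fitted coefficients nor whether an intercept is used, so the intercept, the function names, and the toy training rows in the usage note are all assumptions.

```python
def fit_linear_combiner(score_rows, labels):
    """Least-squares fit of label ~ w0 + w1*s1 + w2*s2 + ... (intercept assumed)."""
    X = [[1.0, *row] for row in score_rows]
    n = len(X[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(X, labels))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("scores are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def combine(weights, scores):
    """Meta score for one residue from its per-server raw scores."""
    return weights[0] + sum(w * s for w, s in zip(weights[1:], scores))
```

In use, each row of `score_rows` would hold the three servers' raw scores for one residue, and `labels` would mark whether that residue is a known interface residue; `combine` then scores unseen residues with the fitted weights.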

13.
14.
ScanMoment is a web server designed to identify the presence of the basic faced α-helix (BFAH) motif in the nucleic acid binding sites of proteins. The program calculates the 'Basic Moment', a parameter that quantifies the distribution of basic residues on the surface of an α-helix. A sliding window is used to generate a plot displaying regions of the protein sequence that possess a high Basic Moment and are thus likely to possess a BFAH motif. The user may vary the periodicity from that of an α-helix (100°) to those of other secondary structures such as beta sheets and 3₁₀ helices. The program can also plot the periodicity of basic residues in a protein sequence using a Fourier transformation. The procedure has been used to characterize the presence of BFAHs in the N-terminal extensions of the eukaryotic aminoacyl-tRNA synthetases and to indicate the presence of a BFAH in the tRNA binding site of alanyl-tRNA synthetase.
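
A sliding-window moment of this kind can be sketched by analogy with the familiar hydrophobic moment: each basic residue contributes a unit vector rotated by the helical periodicity, and the moment is the length of the vector sum. The abstract does not give ScanMoment's exact formula, residue weights, or window length, so the weights in `BASIC`, the default window of 18, and the function names below are assumptions.

```python
import math

# Assumption: Lys and Arg count equally as basic residues; His is ignored.
BASIC = {"K": 1.0, "R": 1.0}


def basic_moment(window: str, angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of basic residues placed around a helix."""
    x = y = 0.0
    for i, aa in enumerate(window):
        weight = BASIC.get(aa, 0.0)
        theta = math.radians(i * angle_deg)
        x += weight * math.cos(theta)
        y += weight * math.sin(theta)
    return math.hypot(x, y)


def scan(sequence: str, window: int = 18, angle_deg: float = 100.0) -> list:
    """Moment for every full window along the sequence (empty if too short)."""
    return [
        basic_moment(sequence[i : i + window], angle_deg)
        for i in range(len(sequence) - window + 1)
    ]
```

Passing a different `angle_deg` corresponds to varying the periodicity as the abstract describes. Basic residues spaced so that they fall on the same helical face reinforce one another, whereas residues spread evenly around the helix largely cancel.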

15.
Issac B, Raghava GP. BioTechniques, 2002, 33(3): 548-50, 552, 554-6
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server, which was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analysis makes GWFASTA a powerful tool for biologists.

16.
SUMMARY: We present an algorithmic tool for the identification of biologically significant amino acids in proteins of known three-dimensional structure. We estimate the degree of purifying selection and positive Darwinian selection at each site and project these estimates onto the molecular surface of the protein. Thus, patches of functional residues (undergoing either positive or purifying selection), which may be discontinuous in the linear sequence, are revealed. We test for the statistical significance of the site-specific scores in order to obtain reliable and valid estimates. AVAILABILITY: The Selecton web server is available at: http://selecton.bioinfo.tau.ac.il SUPPLEMENTARY INFORMATION: More information is available at http://selecton.bioinfo.tau.ac.il/overview.html. A set of examples is available at http://selecton.bioinfo.tau.ac.il/gallery.html.

17.
SUMMARY: PreDs is a WWW server that predicts the dsDNA-binding sites on protein molecular surfaces generated from atomic coordinates in PDB format. The prediction is made by evaluating the electrostatic potential, the local curvature and the global curvature on the surfaces. Results of the prediction can be checked interactively with our original surface viewer. AVAILABILITY: PreDs is available free of charge from http://pre-s.protein.osaka-u.ac.jp/~preds/ CONTACT: kino@ims.u-tokyo.ac.jp.

18.
The current challenge in synthetic vaccine design is the development of a methodology to identify and test short antigen peptides as potential T-cell epitopes. Recently, we described an HLA-peptide binding model (using structural properties) capable of predicting peptides binding to any HLA allele. Consequently, we have developed a web server named T-EPITOPE DESIGNER to facilitate HLA-peptide binding prediction. The prediction server is based on a model that defines peptide binding pockets using information gleaned from X-ray crystal structures of HLA-peptide complexes, followed by the estimation of peptide binding to binding pockets. Thus, the prediction server enables the calculation of peptide binding to HLA alleles. This model is superior to many existing methods because of its potential application to any given HLA allele whose sequence is clearly defined. The web server finds potential application in T-cell epitope vaccine design. AVAILABILITY: http://www.bioinformation.net/ted/

19.
20.
MOTIVATION: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to all of the proteins in the database to find similarities. Therefore, the cost of similarity queries increases linearly as the volume of the protein databases increases. As the sizes of experimentally determined and theoretically estimated protein structure databases grow, there is a need for scalable searching techniques. RESULTS: Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements). These feature vectors are then indexed using a multidimensional index structure. For a given query protein, this index structure is used to quickly prune away unpromising proteins in the database. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times while maintaining similar sensitivity.
