Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
SUMMARY: PreDs is a WWW server that predicts the dsDNA-binding sites on protein molecular surfaces generated from the atomic coordinates in a PDB format. The prediction was done by evaluating the electrostatic potential, the local curvature and the global curvature on the surfaces. Results of the prediction can be interactively checked with our original surface viewer. AVAILABILITY: PreDs is available free of charge from http://pre-s.protein.osaka-u.ac.jp/~preds/ CONTACT: kino@ims.u-tokyo.ac.jp.

2.
MOTIVATION: Clustering of protein sequences is widely used for the functional characterization of proteins. However, it is still not easy to cluster distantly related proteins, which have only regional similarity among their sequences. It is therefore necessary to develop an algorithm for clustering such distantly related proteins. RESULTS: We have developed a time- and space-efficient clustering algorithm. It uses a graph representation in which vertices denote proteins and edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges that have small weights, which correspond to low sequence similarities. To find the appropriate partitions, we introduce a score combining the normalized cut and locally minimal cut capacities. Our method is applied to all 40,703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show a 76% recall (20,529 proteins) of the 26,917 classified by InterPro. The method also finds relationships not found by other clustering methods. AVAILABILITY: The complete result of our algorithm for all the human proteins in SWISS-PROT and TrEMBL, and other supplementary information, are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
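The normalized-cut term mentioned in this abstract can be illustrated with a small sketch. It computes the standard normalized cut of a two-way partition of a weighted similarity graph. The toy graph, the protein labels and the function name are hypothetical, and the abstract's combined score with locally minimal cut capacities is not reproduced here.

```python
def normalized_cut(edges, part_a, part_b):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V).

    `edges` maps an undirected pair (u, v) to a similarity weight.
    Assumes both parts touch at least one edge (otherwise division by zero).
    """
    part_a, part_b = set(part_a), set(part_b)
    cut = assoc_a = assoc_b = 0.0
    for (u, v), w in edges.items():
        # Each endpoint adds the edge weight to the association of its part.
        for node in (u, v):
            if node in part_a:
                assoc_a += w
            elif node in part_b:
                assoc_b += w
        # Edges that cross the partition make up the cut.
        if (u in part_a and v in part_b) or (u in part_b and v in part_a):
            cut += w
    return cut / assoc_a + cut / assoc_b


# Hypothetical similarity graph: two tight groups joined by one weak edge.
toy_edges = {
    ("p1", "p2"): 0.9,
    ("p2", "p3"): 0.8,
    ("p1", "p3"): 0.7,
    ("p4", "p5"): 0.9,
    ("p3", "p4"): 0.1,
}
good = normalized_cut(toy_edges, {"p1", "p2", "p3"}, {"p4", "p5"})
bad = normalized_cut(toy_edges, {"p1", "p2"}, {"p3", "p4", "p5"})
```

Cutting the single weak edge gives a much lower score than splitting a tight group, which is why a low normalized cut is a reasonable signal for where to partition.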

3.
Estimating transcription factor bindability on DNA.   Total citations: 4, self-citations: 0, citations by others: 4

4.
A hidden Markov model for progressive multiple alignment   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Progressive algorithms are widely used heuristics for the production of alignments among multiple nucleic-acid or protein sequences. Probabilistic approaches providing measures of global and/or local reliability of individual solutions would constitute valuable developments. RESULTS: We present here a new method for multiple sequence alignment that combines an HMM approach, a progressive alignment algorithm, and a probabilistic evolution model describing the character substitution process. Our method works by iterating pairwise alignments according to a guide tree and defining each ancestral sequence from the pairwise alignment of its child nodes, thus progressively constructing a multiple alignment. Our method allows for the computation of each column's minimum posterior probability, and we show that this value correlates with the correctness of the result, hence providing an efficient means by which unreliably aligned columns can be filtered out from a multiple alignment.

5.
6.
A grid layout algorithm for automatic drawing of biochemical networks   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Visualization is indispensable in the research of complex biochemical networks. Available graph layout algorithms are not adequate for satisfactorily drawing such networks. New methods are required to visualize automatically the topological architectures and facilitate the understanding of the functions of the networks. RESULTS: We propose a novel layout algorithm to draw complex biochemical networks. A network is modeled as a system of interacting nodes on square grids. A discrete cost function between each node pair is designed based on the topological relation and the geometric positions of the two nodes. The layouts are produced by minimizing the total cost. We design a fast algorithm to minimize the discrete cost function, by which candidate layouts can be produced efficiently. A simulated annealing procedure is used to choose better candidates. Our algorithm demonstrates its ability to exhibit cluster structures clearly in relatively compact layout areas without any prior knowledge. We developed Windows software to implement the algorithm for CADLIVE. AVAILABILITY: All materials can be freely downloaded from http://kurata21.bio.kyutech.ac.jp/grid/grid_layout.htm; http://www.cadlive.jp/ SUPPLEMENTARY INFORMATION: http://kurata21.bio.kyutech.ac.jp/grid/grid_layout.htm; http://www.cadlive.jp/

7.
Large-scale genome projects generate an unprecedented number of protein sequences, most of which are experimentally uncharacterized. Predicting the 3D structures of sequences provides important clues as to their functions. We constructed the Genomes TO Protein structures and functions (GTOP) database, containing protein fold predictions of a huge number of sequences. Predictions are mainly carried out with the homology search program PSI-BLAST, currently the most popular among high-sensitivity profile search methods. GTOP also includes the results of other analyses, e.g. homology and motif search, detection of transmembrane helices and repetitive sequences. We have completed analyzing the sequences of 41 organisms, with the number of proteins exceeding 120 000 in total. GTOP uses a graphical viewer to present the analytical results of each ORF in one page in a ‘color-bar’ format. The assigned 3D structures are presented by Chime plug-in or RasMol. The binding sites of ligands are also included, providing functional information. The GTOP server is available at http://spock.genes.nig.ac.jp/~genome/gtop.html.

8.
Accelerated off-target search algorithm for siRNA   Total citations: 7, self-citations: 0, citations by others: 7
MOTIVATION: Designing highly effective short interfering RNA (siRNA) sequences with maximum target-specificity for mammalian RNA interference (RNAi) is one of the hottest topics in molecular biology. The relationship between siRNA sequences and RNAi activity has been studied extensively to establish rules for selecting highly effective sequences. However, there is a pressing need to compute siRNA sequences that minimize off-target silencing effects efficiently and to match any non-targeted sequences with mismatches. RESULTS: The enumeration of potential cross-hybridization candidates is non-trivial, because siRNA sequences are short, ca. 19 nt in length, and at least three mismatches with non-targets are required. With at least three mismatches, there are typically four or five contiguous matches, so that a BLAST search frequently overlooks off-target candidates. By contrast, existing accurate approaches are expensive to execute; thus we need to develop an accurate, efficient algorithm that uses seed hashing, the pigeonhole principle, and combinatorics to identify mismatch patterns. Tests show that our method can list potential cross-hybridization candidates for any siRNA sequence of a selected human gene rapidly, outperforming traditional methods by orders of magnitude in terms of computational performance. AVAILABILITY: http://design.RNAi.jp CONTACT: yamada@cb.k.u-tokyo.ac.jp.
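The pigeonhole argument named in this abstract can be sketched generically: if a 19-nt query is split into k + 1 non-overlapping seeds, any window that differs from it in at most k positions must contain at least one seed exactly. The sketch below hashes fixed-length substrings of a transcript, looks up each seed, and verifies candidates by Hamming distance. It is a minimal illustration under those assumptions, not the authors' algorithm; the sequences and function names are hypothetical.

```python
from collections import defaultdict


def split_seeds(length, max_mismatches):
    """Split [0, length) into max_mismatches + 1 contiguous (start, end) blocks."""
    parts = max_mismatches + 1
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def index_kmers(text, k):
    """Hash every substring of length k to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(query, transcript, max_mismatches):
    """Return start positions where query matches with <= max_mismatches."""
    n = len(query)
    indexes = {}  # one k-mer index per distinct seed length
    hits = set()
    for start, end in split_seeds(n, max_mismatches):
        k = end - start
        if k not in indexes:
            indexes[k] = index_kmers(transcript, k)
        for pos in indexes[k].get(query[start:end], []):
            window_start = pos - start
            if 0 <= window_start <= len(transcript) - n:
                window = transcript[window_start:window_start + n]
                # Exact seed hit is only a candidate; verify the full window.
                if hamming(query, window) <= max_mismatches:
                    hits.add(window_start)
    return sorted(hits)


# Hypothetical 19-nt query and a transcript holding a copy with 2 mismatches.
query = "AUGGCUAGCUUACGGAUCC"
transcript = "GGGGG" + "AUCGCUAGCUAACGGAUCC" + "AAAAA"
hits_k2 = find_off_targets(query, transcript, 2)
hits_k1 = find_off_targets(query, transcript, 1)
```

Because only windows that share an exact seed are ever compared in full, the expensive Hamming check runs on a small candidate set rather than on every position.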

9.
DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of a query and a database, and have implemented it as GHOSTX. GHOSTX achieved approximately 131–165 times acceleration over a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We offer this tool as a potential solution to this problem.
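To make the role of a suffix array in seed finding concrete, here is a minimal sketch of exact seed lookup in a database sequence. It builds the array naively by sorting suffixes and then binary-searches for the seed. This is not GHOSTX's implementation, which works with suffix arrays of both the query and the database; the sequence and function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text, suffix_array, seed):
    """Return all start positions of `seed` in `text` via binary search."""
    k = len(seed)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + k]

    # Lower bound: first suffix whose k-length prefix is >= seed.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < seed:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose k-length prefix is > seed.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= seed:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


# Hypothetical protein database sequence containing the seed twice.
database = "MKVLAAGIVKVLA"
sa = build_suffix_array(database)
positions = find_seed(database, sa, "KVLA")
missing = find_seed(database, sa, "GGG")
```

All suffixes that begin with the seed sit in one contiguous block of the sorted array, so each lookup needs only two binary searches once the array is built.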

10.
MOTIVATION: To construct a multiple sequence alignment (MSA) of a large number (> approximately 10,000) of sequences, the calculation of a guide tree with a complexity of O(N^2) to O(N^3), where N is the number of sequences, is the most time-consuming process. RESULTS: To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align approximately 60,000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam. AVAILABILITY: The present algorithm has been implemented in the MAFFT sequence alignment package (http://align.bmr.kyushu-u.ac.jp/mafft/software/). SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.

11.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.

12.
MOTIVATION: Maximum likelihood-based methods to estimate site-by-site substitution rate variability in aligned homologous protein sequences rely on the formulation of a phylogenetic tree and generally assume that the patterns of relative variability follow a pre-determined distribution. We present a phylogenetic tree-independent method to estimate the relative variability of individual sites within large datasets of homologous protein sequences. It is based upon two simple assumptions: first, that substitutions observed between two closely related sequences are likely, in general, to occur at the most variable sites; second, that non-conservative amino acid substitutions tend to occur at more variable sites. Our methodology makes no assumptions regarding the underlying pattern of relative variability between sites. RESULTS: We have compared, using data simulated under a non-gamma distributed model, the performance of this approach to that of a maximum likelihood method that assumes gamma distributed rates. At low mean rates of evolution our method inferred site-by-site relative substitution rates more accurately than the maximum likelihood approach in the absence of prior assumptions about the relationships between sequences. Our method does not directly account for the effects of mutational saturation. However, we have incorporated an 'ad-hoc' modification that allows the accurate estimation of relative site variability in fast evolving and saturated datasets.

13.
Minai R, Matsuo Y, Onuki H, Hirota H. Proteins 2008, 72(1):367-381
Many drugs, even ones that are designed to act selectively on a target protein, bind unintended proteins. These unintended bindings can explain side effects or indicate additional mechanisms for a drug's medicinal properties. Structural similarity between binding sites is one of the reasons for binding to multiple targets. We developed a method for the structural alignment of atoms in the solvent-accessible surface of proteins that uses similarities in the local atomic environment, and carried out all-against-all structural comparisons for 48,347 potential ligand-binding regions from a nonredundant protein structure subset (nrPDB, provided by NCBI). The relationships between the similarity of ligand-binding regions and the similarity of the global structures of the proteins containing the binding regions were examined. We found 10,403 known ligand-binding region pairs whose structures were similar despite having different global folds. Of these, we detected 281 region pairs that had similar ligands with similar binding modes. These proteins are good examples of convergent evolution. In addition, we found a significant correlation between Z-score of structural similarity and true positive rate of "active" entries in the PubChem BioAssay database. Moreover, we confirmed the interaction between ibuprofen and a new target, porcine pancreatic elastase, by NMR experiment. Finally, we used this method to predict new drug-target protein interactions. We obtained 540 predictions for 105 drugs (e.g., captopril, lovastatin, flurbiprofen, metyrapone, and salicylic acid), and calculated the binding affinities using AutoDock simulation. The results of these structural comparisons are available at http://www.tsurumi.yokohama-cu.ac.jp/fold/database.html.

14.
15.
Gene recognition by combination of several gene-finding programs   Total citations: 8, self-citations: 1, citations by others: 7
MOTIVATION: A number of programs have been developed to predict the eukaryotic gene structures in DNA sequences. However, gene finding is still a challenging problem. RESULTS: We have explored the effectiveness of re-analyzing and combining the results of several gene-finding programs. We studied several methods with four programs (FEXH, GeneParser3, GENSCAN and GRAIL2). With the HIGHEST-policy combination method or the BOUNDARY method, approximate correlation (AC) improved by 3-5% in comparison with the best single gene-finding program. From another viewpoint, OR-based combination of the four programs is the most reliable way to determine whether a candidate exon overlaps with the real exon, although it is less sensitive than GENSCAN for exon-intron boundaries. Our methods can easily be extended to combine other programs. AVAILABILITY: We have developed a server program (Shirokane System) and a client program (GeneScope) to use the methods. GeneScope is available through a WWW site (http://gf.genome.ad.jp/). CONTACT: katsu,takagi@ims.u-tokyo.ac.jp

16.
MOTIVATION: An unmanageably large amount of data on genome sequences is accumulating, prompting researchers to develop new methods to analyze them. We have devised a novel measure designated oligostickiness, which is roughly proportional to the binding affinity of an oligonucleotide to a DNA of interest, in order to analyze genome sequences as a whole. RESULTS: Fifteen representative genomes such as Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, H. sapiens and others were analyzed by this method using more than 50 probe dodecanucleotides, offering the following findings: (i) Genome sequences can be specifically characterized by way of oligostickiness maps. (ii) Oligostickiness analysis, which is similar to but more informative than (G + C) content or repetitive sequence analysis, can reveal intra-genomic structures such as mosaic structures (E. coli and B. subtilis) and highly sticky/non-sticky regions of biological meaning. (iii) Some probe oligonucleotides such as dC(12) and dT(12) can be used for classifying genomes, clearly discriminating prokaryotes and eukaryotes. (iv) Based on global oligostickiness, which is the average value of the local oligostickinesses, the features of a genome could be visualized in spider web mode. The pattern of a spider web as well as a set of oligostickiness maps is highly characteristic of each genome or chromosome. We therefore called it chromosome texture, leading to a finding that all the chromosomes contained in a cell, so far investigated, have a common texture. AVAILABILITY: Oligostickiness maps used in this work are available at http://gp.fms.saitama-u.ac.jp/ CONTACT: koichi@fms.saitama-u.ac.jp

17.
18.
The hydrophobic cores of proteins predicted by wavelet analysis   Total citations: 7, self-citations: 0, citations by others: 7
MOTIVATION: In the process of protein construction, buried hydrophobic residues tend to assemble in a core of a protein. Methods used to predict these cores involve use or no use of sequential alignment. In the case of a close homology, prediction was more accurate if sequential alignment was used. If the homology was weak, predictions would be unreliable. A hydrophobicity plot involving the hydropathy index is useful for purposes of prediction, and smoothing is essential. However, the proposed methods are insufficient. We attempted to predict hydrophobic cores with a low frequency extracted from the hydrophobicity plot, using wavelet analysis. RESULTS: The cores were predicted at a rate of 68.7%, by cross-validation. Using wavelet analysis, the cores of non-homologous proteins can be predicted with close to 70% accuracy, without sequential alignment. AVAILABILITY: The program used in this study is available from Intergalactic Reality (http://www.intergalact.com). CONTACT: hirakawa@grt.kyushu-u.ac.jp, kuhara@grt.kyushu-u.ac.jp
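The idea of extracting a low-frequency component from a hydrophobicity plot can be sketched as follows. The abstract does not say which wavelet or hydropathy scale was used, so this sketch assumes the Kyte-Doolittle scale and a Haar wavelet, whose level-j approximation is simply the mean over blocks of 2^j residues. The sequence and function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (an assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence):
    """Per-residue hydropathy values for a one-letter amino acid sequence."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def haar_approximation(signal, levels):
    """Low-frequency Haar approximation: block means over 2**levels samples.

    A trailing block shorter than 2**levels is averaged on its own, which is
    an ad hoc boundary choice made for this sketch.
    """
    block = 2 ** levels
    smoothed = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


# Hypothetical peptide: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("LLLLDDDD")
smooth = haar_approximation(profile, 2)
# Residues whose smoothed value is positive are candidate core positions.
core_candidates = [i for i, value in enumerate(smooth) if value > 0]
```

Thresholding the smoothed curve at zero is only one possible decision rule; the abstract does not describe how core residues were called from the low-frequency signal.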

19.
ADAPTSITE: detecting natural selection at single amino acid sites.   Total citations: 12, self-citations: 0, citations by others: 12
ADAPTSITE is a program package for detecting natural selection at single amino acid sites, using a multiple alignment of protein-coding sequences for a given phylogenetic tree. The program infers ancestral codons at all interior nodes, and computes the total numbers of synonymous (c(S)) and nonsynonymous (c(N)) substitutions as well as the average numbers of synonymous (s(S)) and nonsynonymous (s(N)) sites for each codon site. The probabilities of occurrence of synonymous and nonsynonymous substitutions are approximated by s(S) / (s(S) + s(N)) and s(N) / (s(S) + s(N)), respectively. The null hypothesis of selective neutrality is tested for each codon site, assuming a binomial distribution for the probability of obtaining c(S) and c(N). AVAILABILITY: ADAPTSITE is available free of charge at the World-Wide Web sites http://mep.bio.psu.edu/adaptivevol.html and http://www.cib.nig.ac.jp/dda/yossuzuk/welcome.html. The package includes the source code written in C, binary files for UNIX operating systems, manual, and example files.
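The per-site test described in this abstract can be written out as a short calculation. Under neutrality, each of the c(S) + c(N) substitutions at a codon site is nonsynonymous with probability s(N) / (s(S) + s(N)), so the counts follow a binomial distribution. The sketch below computes one-sided tail probabilities for an excess of nonsynonymous or of synonymous changes. The function names and the example counts are hypothetical, and the choice of one-sided tails is an assumption, not a description of the ADAPTSITE source code.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def neutrality_test(c_s, c_n, s_s, s_n):
    """Return (p_positive, p_negative) for one codon site.

    c_s, c_n: observed synonymous / nonsynonymous substitution counts.
    s_s, s_n: average numbers of synonymous / nonsynonymous sites.
    """
    p_nonsyn = s_n / (s_s + s_n)   # neutral chance a change is nonsynonymous
    total = c_s + c_n
    # Small p_positive: more nonsynonymous changes than neutrality predicts.
    p_positive = binomial_upper_tail(c_n, total, p_nonsyn)
    # Small p_negative: more synonymous changes than neutrality predicts.
    p_negative = binomial_upper_tail(c_s, total, 1 - p_nonsyn)
    return p_positive, p_negative


# Hypothetical site: 3 substitutions, all nonsynonymous, with s(S)=1, s(N)=2.
p_positive, p_negative = neutrality_test(c_s=0, c_n=3, s_s=1.0, s_n=2.0)
```

With only three substitutions the probability of all being nonsynonymous is (2/3)^3, far from significant, which illustrates why single-site tests of this kind need many inferred changes to have power.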

20.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号