Found 20 similar articles.
1.
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure for semantic similarity independent of external databases of functional-annotation observations.
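The two structural ingredients named in this abstract (shortest path between two terms, and depth of their nearest common ancestor) can be illustrated on a toy graph. This is only a sketch under stated assumptions: the ontology `PARENTS`, the function names, and the choice of "depth = shortest distance to the root" are all hypothetical, and the abstract does not say how SSA combines these quantities with the definition-based score, so no combined score is computed here.

```python
from collections import deque

# Hypothetical toy ontology (not real GO terms): child -> list of parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A"],
    "E": ["C", "B"],
}

def upward_distances(term):
    """Map each ancestor of `term` (and the term itself) to its minimum edge distance."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        cur = queue.popleft()
        for parent in PARENTS[cur]:
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist

def depth(term):
    """Assumed definition: number of edges on the shortest path from `term` to the root."""
    return upward_distances(term)["root"]

def nearest_common_ancestor(t1, t2):
    """Common ancestor minimizing the summed upward distance (ties broken by name)."""
    up1, up2 = upward_distances(t1), upward_distances(t2)
    common = set(up1) & set(up2)
    return min(common, key=lambda a: (up1[a] + up2[a], a))

def shortest_path_length(t1, t2):
    """Breadth-first search over the ontology treated as an undirected graph."""
    neighbors = {t: set(ps) for t, ps in PARENTS.items()}
    for child, ps in PARENTS.items():
        for parent in ps:
            neighbors[parent].add(child)
    dist = {t1: 0}
    queue = deque([t1])
    while queue:
        cur = queue.popleft()
        if cur == t2:
            return dist[cur]
        for nxt in neighbors[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # the two terms are not connected
```

For the toy terms `E` and `D`, the shortest path runs E, C, A, D, and the nearest common ancestor is `A`.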
2.
Background
Signaling pathways can be reconstructed by identifying ‘effect types’ (i.e. activation/inhibition) of protein-protein interactions (PPIs). Effect types are composed of ‘directions’ (i.e. upstream/downstream) and ‘signs’ (i.e. positive/negative), thereby requiring directions as well as signs of PPIs to predict signaling events from PPI networks. Here, we propose a computational method for systematically annotating effect types to PPIs using relations between functional information of proteins.
Results
We used regulates, positively regulates, and negatively regulates relations in Gene Ontology (GO) to predict directions and signs of PPIs. These relations indicate both directions and signs between GO terms so that we can project directions and signs between relevant GO terms to PPIs. Independent test results showed that our method is effective for predicting both directions and signs of PPIs. Moreover, our method outperformed a previous GO-based method that did not consider the relations between GO terms. We annotated effect types to human PPIs and validated several highly confident effect types against literature. The annotated human PPIs are available in Additional file 2 to aid signaling pathway reconstruction and network biology research.
Conclusions
We annotated effect types to PPIs by using regulates, positively regulates, and negatively regulates relations in GO. We demonstrated that those relations are effective for predicting not only signs, but also directions of PPIs. The usefulness of those relations suggests their potential applications to other types of interactions such as protein-DNA interactions.
3.
4.
Blattner FR 《Nucleic acids research》1984,12(2):1201-1202
A simple method is described for speeding up the computation of base pairing interactions. It is especially effective on microcomputers.
5.
False positive reduction in protein-protein interaction predictions using gene ontology annotations (total citations: 1; self-citations: 0; citations by others: 1)
Background
Many crucial cellular operations such as metabolism, signalling, and regulation are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets, which in turn comes from the large number of false positive predictions in computational approaches. Reducing false positive predictions and enhancing the true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results has not been adequately investigated.
6.
Using an efficient iterative method, we have developed a distance-dependent knowledge-based scoring function to predict protein-protein interactions. The function, referred to as ITScore-PP, was derived using the crystal structures of a training set of 851 protein-protein dimeric complexes containing true biological interfaces. The key idea of the iterative method for deriving ITScore-PP is to improve the interatomic pair potentials by iteration, until the pair potentials can distinguish true binding modes from decoy modes for the protein-protein complexes in the training set. The iterative method circumvents the challenging reference state problem in deriving knowledge-based potentials. The derived scoring function was used to evaluate the ligand orientations generated by ZDOCK 2.1 and the native ligand structures on a diverse set of 91 protein-protein complexes. For the bound test cases, ITScore-PP yielded a success rate of 98.9% if the top 10 ranked orientations were considered. For the more realistic unbound test cases, the corresponding success rate was 40.7%. Furthermore, for faster orientational sampling purpose, several residue-level knowledge-based scoring functions were also derived following the similar iterative procedure. Among them, the scoring function that uses the side-chain center of mass (SCM) to represent a residue, referred to as ITScore-PP(SCM), showed the best performance and yielded success rates of 71.4% and 30.8% for the bound and unbound cases, respectively, when the top 10 orientations were considered. ITScore-PP was further tested using two other published protein-protein docking decoy sets, the ZDOCK decoy set and the RosettaDock decoy set. In addition to binding mode prediction, the binding scores predicted by ITScore-PP also correlated well with the experimentally determined binding affinities, yielding a correlation coefficient of R = 0.71 on a test set of 74 protein-protein complexes with known affinities. 
ITScore-PP is computationally efficient. The average run time for ITScore-PP was about 0.03 second per orientation (including optimization) on a personal computer with 3.2 GHz Pentium IV CPU and 3.0 GB RAM. The computational speed of ITScore-PP(SCM) is about an order of magnitude faster than that of ITScore-PP. ITScore-PP and/or ITScore-PP(SCM) can be combined with efficient protein docking software to study protein-protein recognition.
7.
8.
MOTIVATION: Function annotation of an unclassified protein on the basis of its interaction partners is well documented in the literature. Reliable predictions of interactions from other data sources such as gene expression measurements would provide a useful route to function annotation. We investigate the global relationship of protein-protein interactions with gene expression. This relationship is studied in four evolutionarily diverse species, for which substantial information regarding their interactions and expression is available: human, mouse, yeast and Escherichia coli. RESULTS: In E. coli the expression of interacting pairs is highly correlated in comparison to random pairs, while in the other three species, the correlation of expression of interacting pairs is only slightly stronger than that of random pairs. To strengthen the correlation, we developed a protocol to integrate ortholog information into the interaction and expression datasets. In all four genomes, the likelihood of predicting protein interactions from highly correlated expression data is increased using our protocol. In yeast, for example, the likelihood of predicting a true interaction, when the correlation is > 0.9, increases from 1.4 to 9.4. The improvement demonstrates that protein interactions are reflected in gene expression and the correlation between the two is strengthened by evolutionary information. The results establish that co-expression of interacting protein pairs is more conserved than that of random ones.
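The basic comparison described in this abstract, expression correlation of interacting pairs versus other pairs, can be sketched as follows. This is a minimal illustration that assumes Pearson correlation as the co-expression measure (the abstract does not name one); the gene names, expression values, and function names are hypothetical, and the ortholog-integration protocol is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def mean_pair_correlation(expression, pairs):
    """Average correlation over a list of (gene, gene) pairs."""
    values = [pearson(expression[g1], expression[g2]) for g1, g2 in pairs]
    return sum(values) / len(values)

# Hypothetical expression profiles across four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
interacting_pairs = [("g1", "g2")]   # assumed known interactions
background_pairs = [("g1", "g3")]    # assumed non-interacting comparison set
```

Calling `mean_pair_correlation(expression, interacting_pairs)` and the same function on `background_pairs` gives the two averages to compare.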
9.
MOTIVATION: The Gene Ontology (GO) is a widely used terminology for gene product characterization in, for example, interpretation of biology underlying microarray experiments. The current GO defines term relationships within each of the independent subontologies: molecular function, biological process and cellular component. However, it is evident that there also exist biological relationships between terms of different subontologies. Our aim was to connect the three subontologies to enable GO to cover more biological knowledge, enable a more consistent use of GO and provide new opportunities for biological reasoning. RESULTS: We propose a new structure, the Second Gene Ontology Layer, capturing biological relations not directly reflected in the present ontology structure. Given molecular functions, these paths identify biological processes where the molecular functions are involved and cellular components where they are active. The current Second Layer contains 6271 validated paths, covering 54% of the molecular functions of GO and can be used to render existing gene annotation sets more complete and consistent. Applying Second Layer paths to a set of 4223 human genes increased biological process annotations by 24% compared to publicly available annotations and reproduced 30% of them. AVAILABILITY: The Second GO is publicly available through the GO Annotation Toolbox (GOAT.no): http://www.goat.no.
10.
Jiajie Peng Xuanshuo Zhang Weiwei Hui Junya Lu Qianqian Li Shuhui Liu Xuequn Shang 《BMC systems biology》2018,12(2):18
Background
Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations.
Results
We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to decrease the noise information. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity.
Conclusions
Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
11.
MOTIVATION: The increasing availability of complete genome sequences provides an excellent opportunity for the further development of tools for functional studies in proteomics. Several experimental approaches and in silico algorithms have been developed to cluster proteins into networks of biological significance that may provide new biological insights, especially into understanding the functions of many uncharacterized proteins. Among these methods, the phylogenetic profiles method has been widely used to predict protein-protein interactions. It involves the selection of reference organisms and identification of homologous proteins. Up to now, no published report has systematically studied the effects of the reference genome selection and the identification of homologous proteins upon the accuracy of this method. RESULTS: In this study, we optimized the phylogenetic profiles method by integrating phylogenetic relationships among reference organisms and sequence homology information to improve prediction accuracy. Our results revealed that the selection of the reference organism set and the criteria for homology identification are two critical factors for the prediction accuracy of this method. Our refined phylogenetic profiles method shows greater performance and potentially provides more reliable functional linkages compared with previous methods.
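The general idea behind phylogenetic profiles can be sketched as follows: each protein is represented by a presence/absence vector over reference genomes, and proteins with similar vectors are proposed as functionally linked. This is a toy illustration only. The Jaccard index, the 0.8 threshold, the protein names, and the profiles are assumptions made here; the abstract does not specify the similarity measure, and the refinements it describes (phylogenetic weighting of reference organisms, homology criteria) are not modeled.

```python
from itertools import combinations

def jaccard(profile_a, profile_b):
    """Jaccard index of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

def linked_pairs(profiles, threshold=0.8):
    """Return protein pairs whose profile similarity reaches the (assumed) threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if jaccard(profiles[p], profiles[q]) >= threshold
    ]

# Hypothetical profiles: 1 = homolog present in that reference genome, 0 = absent.
profiles = {
    "protX": [1, 1, 0, 1, 0],
    "protY": [1, 1, 0, 1, 0],
    "protZ": [0, 0, 1, 0, 1],
}
```

Here `linked_pairs(profiles)` returns only the pair (`protX`, `protY`), whose profiles are identical.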
12.
The dynein molecular motor is a highly complex enzyme containing up to 15 different protein components and consists of several distinct domains identifiable by electron microscopy. One of the current challenges is to understand the supramolecular organization of this motor and to determine the location and function of the various components. Recently, we have used covalent crosslinking by amine-selective reagents and a carbodiimide, which results in a zero-length crosslink, to investigate protein-protein associations within Chlamydomonas flagellar dynein. This approach also has enabled us to identify previously undescribed interactions between the dynein arms and other components of the flagellar axoneme. In this report, we detail methods we have developed to probe intradynein and intraaxonemal interactions and discuss the variety of factors that need to be addressed to perform a successful crosslinking experiment.
13.
Guo Z Wang L Li Y Gong X Yao C Ma W Wang D Li Y Zhu J Zhang M Yang D Rao S Wang J 《Bioinformatics (Oxford, England)》2007,23(16):2121-2128
MOTIVATION: Current high-throughput protein-protein interaction (PPI) data do not provide information about the condition(s) under which the interactions occur. Thus, the identification of condition-responsive PPI sub-networks is of great importance for investigating how a living cell adapts to changing environments. RESULTS: In this article, we propose a novel edge-based scoring and searching approach to extract a PPI sub-network responsive to conditions related to some investigated gene expression profiles. Using this approach, what we constructed is a sub-network connected by the selected edges (interactions), instead of only a set of vertices (proteins) as in previous works. Furthermore, we suggest a systematic approach to evaluate the biological relevance of the identified responsive sub-network by its ability to capture condition-relevant functional modules. We apply the proposed method to analyze a human prostate cancer dataset and a yeast cell cycle dataset. The results demonstrate that the edge-based method is able to efficiently capture relevant protein interaction behaviors under the investigated conditions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
14.
Takahiro Morio Hiroyuki Adachi Kazuo Sutoh Kaichiro Yanagisawa Yoshimasa Tanaka 《Journal of plant research》1995,108(1):111-114
Using a plasmid pBsr2 which carries a blasticidin S-resistant gene, we have improved the method of REMI (restriction enzyme-mediated integration) provided for insertional mutagenesis in Dictyostelium discoideum (bsr-REMI). To confirm the usefulness of bsr-REMI, transformation efficiency, copy number of integrated DNA, and randomness of integration into the genome were examined.
15.
Neumüller RA Wirtz-Peitz F Lee S Kwon Y Buckner M Hoskins RA Venken KJ Bellen HJ Mohr SE Perrimon N 《Genetics》2012,190(3):931-940
In Drosophila, collections of green fluorescent protein (GFP) trap lines have been used to probe the endogenous expression patterns of trapped genes or the subcellular localization of their protein products. Here, we describe a method, based on nonoverlapping, highly specific, shRNA transgenes directed against GFP, that extends the utility of these collections to loss-of-function studies. Furthermore, we used a MiMIC transposon to generate GFP traps in Drosophila cell lines with distinct subcellular localization patterns, which will permit high-throughput screens using fluorescently tagged proteins. Finally, we show that fluorescent traps, paired with recombinant nanobodies and mass spectrometry, allow the study of endogenous protein complexes in Drosophila.
16.
Protein-protein interactions (PPIs) are frequently mediated by the binding of a modular domain in one protein to a short, linear peptide motif in its partner. The advent of proteomic methods such as peptide and protein arrays has led to the accumulation of a wealth of interaction data for modular interaction domains. Although several computational programs have been developed to predict modular domain-mediated PPI events, they are often restricted to a given domain type. We describe DomPep, a method that can potentially be used to predict PPIs mediated by any modular domains. DomPep combines proteomic data with sequence information to achieve high accuracy and high coverage in PPI prediction. Proteomic binding data were employed to determine a simple yet novel parameter Ligand-Binding Similarity which, in turn, is used to calibrate Domain Sequence Identity and Position-Weighted-Matrix distance, two parameters that are used in constructing prediction models. Moreover, DomPep can be used to predict PPIs for both domains with experimental binding data and those without. Using the PDZ and SH2 domain families as test cases, we show that DomPep can predict PPIs with accuracies superior to existing methods. To evaluate DomPep as a discovery tool, we deployed DomPep to identify interactions mediated by three human PDZ domains. Subsequent in-solution binding assays validated the high accuracy of DomPep in predicting authentic PPIs at the proteome scale. Because DomPep makes use of only interaction data and the primary sequence of a domain, it can be readily expanded to include other types of modular domains.
17.
The ability of a chimeric HP1-Polycomb (Pc) protein to bind both to heterochromatin and to euchromatic sites of Pc protein binding was exploited to detect stable protein-protein interactions in vivo. Previously, we showed that endogenous Pc protein was recruited to ectopic heterochromatic binding sites by the chimeric protein. Here, we examine the association of other Pc group (Pc-G) proteins. We show that Posterior sex combs (Psc) protein also is recruited to heterochromatin by the chimeric protein, demonstrating that Psc protein participates in direct protein-protein interaction with Pc protein or Pc-associated protein. In flies carrying temperature-sensitive alleles of Enhancer of zeste [E(z)], the general decondensation of polytene chromosomes that occurs at the restrictive temperature is associated with loss of binding of endogenous Pc and chimeric HP1-Polycomb protein to euchromatin, but binding of HP1 and chimeric HP1-Polycomb protein to the heterochromatin is maintained. The E(z) mutation also results in the loss of chimera-dependent binding to heterochromatin by endogenous Pc and Psc proteins at the restrictive temperature, suggesting that interaction of these proteins is mediated by E(z) protein. A myc-tagged full-length Suppressor 2 of zeste [Su(z)2] protein interacts poorly or not at all with ectopic Pc-G complexes, but a truncated Su(z)2 protein is strongly recruited to all sites of chimeric protein binding. Trithorax protein is not recruited to the heterochromatin by the chimeric HP1-Polycomb protein, suggesting either that this protein does not interact directly with Pc-G complexes or that such interactions are regulated. Ectopic binding of chimeric chromosomal proteins provides a useful tool for distinguishing specific protein-protein interactions from specific protein-DNA interactions important for complex assembly in vivo.
18.
MOTIVATION: The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods can usually cover less than half of known human genes. RESULTS: We propose to rely on the biological process domain of the gene ontology to construct a gene semantic similarity network and then use the network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By analyzing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores if the corresponding proteins are closer to each other in the PPI network. By analyzing the gene semantic similarity network with a phenotype similarity network, we show that semantic similarity scores of genes associated with similar diseases are significantly different from those of genes selected at random, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes.
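A random walk with restart on a weighted network, the model named in this abstract, can be sketched in a few lines. This is a generic textbook formulation and not the authors' implementation: the restart probability of 0.5, the convergence tolerance, the function name, and the three-gene toy network are all assumptions. Each node is assumed to have at least one neighbor, otherwise probability mass is lost at that node.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0, where W spreads each node's
    probability to its neighbors in proportion to edge weight."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total_weight = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total_weight
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical similarity network: gene -> {neighbor: similarity weight}.
network = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
}
# Seed the walk at a (hypothetical) known disease gene and rank the rest by score.
scores = random_walk_with_restart(network, seeds={"a"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Candidate genes would then be prioritized by their steady-state score; in this toy chain the seed's direct neighbor `b` ranks above the more distant `c`.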
19.
Prediction of protein-protein interaction is a difficult and important problem in biology. In this paper, we propose a new method based on an ensemble of K-local hyperplane distance nearest neighbor (HKNN) classifiers, where each HKNN is trained using a different physicochemical property of the amino acids. Moreover, we propose a new encoding technique that combines the amino acid indices together with the 2-Grams amino acid composition. A fusion of HKNN classifiers combined with the 'Sum rule' enables us to obtain an improvement over other state-of-the-art methods. The approach is demonstrated by building a learning system based on experimentally validated protein-protein interactions in the human gastric bacterium Helicobacter pylori and in a human dataset.
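The 2-gram (dipeptide) amino acid composition mentioned in this abstract is commonly computed as the relative frequency of each of the 400 ordered residue pairs in a sequence. The sketch below shows only that plain composition, under the assumption that non-standard residues are skipped; the function name and example sequence are hypothetical, and the paper's combination with amino acid indices and the HKNN ensemble are not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def two_gram_composition(sequence):
    """Return {dipeptide: relative frequency} over all 400 ordered residue pairs."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: count / total for pair, count in counts.items()}

# Hypothetical toy sequence: overlapping pairs are AC, CA, AC.
features = two_gram_composition("ACAC")
```

The resulting 400-dimensional vector has a fixed length regardless of sequence length, which is what makes it usable as classifier input.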
20.
Predicting the interactions between all the possible pairs of proteins in a given organism (making a protein-protein interaction map) is a crucial subject in bioinformatics. Most of the previous methods based on supervised machine learning use datasets containing approximately the same number of interacting pairs of proteins (positives) and non-interacting pairs of proteins (negatives) for training a classifier and are estimated to yield a large number of false positives. Thinking that the negatives used in previous studies cannot adequately represent all the negatives that need to be taken into account, we have developed a method based on multiple Support Vector Machines (SVMs) that uses more negatives than positives for predicting interactions between pairs of yeast proteins and pairs of human proteins. We show that the performance of a single SVM improved as we increased the number of negatives used for training and that, if more than one CPU is available, an approach using multiple SVMs is useful not only for improving the performance of classifiers but also for reducing the time required for training them. Our approach can also be applied to assessing the reliability of high-throughput interactions.