Similar literature
 Found 20 similar articles (search time: 93 ms)
1.
Extracting protein-protein interactions (PPIs) from biomedical literature is an important task in biomedical text mining (BioTM). In this paper, we propose a hash subgraph pairwise (HSP) kernel-based approach for this task. The key to the novel kernel is the use of hierarchical hash labels to express the structural information of subgraphs in linear time. We apply the graph kernel to dependency graphs that represent sentence structure for the PPI extraction task; it makes efficient use of the full graph structural information and, in particular, captures the contiguous topological and label information that earlier approaches ignored. We evaluate the proposed approach on five publicly available PPI corpora. The experimental results show that our approach significantly outperforms the all-path kernel approach on all five corpora and achieves state-of-the-art performance.

2.
The development of kernel-based inhomogeneous random graphs has provided models that are flexible enough to capture many observed characteristics of real networks, and that are also mathematically tractable. We specify a class of inhomogeneous random graph models, called random kernel graphs, that produces sparse graphs with tunable graph properties, and we develop an efficient generation algorithm to sample random instances from this model. As real-world networks are usually large, it is essential that the run-time of generation algorithms scales better than quadratically in the number of vertices n. We show that for many practical kernels our algorithm runs in time at most O(n (log n)^2). As a practical example we show how to generate samples of power-law degree distribution graphs with tunable assortativity.
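For orientation, the sketch below samples a kernel-based inhomogeneous random graph the naive way, by testing every vertex pair. It is a quadratic-time baseline, not the efficient algorithm the abstract describes. The edge probability min(1, kernel(x_i, x_j) / n) follows the usual definition of kernel-based inhomogeneous random graphs; the vertex positions x_i = (i + 1) / n and the function `example_kernel` are assumptions made only for illustration.

```python
import random


def sample_kernel_graph(n, kernel, seed=None):
    """Naive O(n^2) sampler for a kernel-based inhomogeneous random graph.

    Edge {i, j} is included independently with probability
    min(1, kernel(x_i, x_j) / n), where x_i = (i + 1) / n (an assumed layout).
    """
    rng = random.Random(seed)
    xs = [(i + 1) / n for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, kernel(xs[i], xs[j]) / n)
            if rng.random() < p:
                edges.append((i, j))
    return edges


def example_kernel(x, y):
    # Hypothetical kernel, chosen only to give a skewed degree distribution.
    return 2.0 / (x * y) ** 0.5


if __name__ == "__main__":
    graph = sample_kernel_graph(200, example_kernel, seed=1)
    print(len(graph), "edges sampled")
```

Because every pair is visited, the cost grows as n^2, which is exactly the scaling the abstract argues a practical generator has to beat.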

3.
Choosing an appropriate kernel is critical when applying a Support Vector Machine (SVM) to a new classification problem. So far, more attention has been paid to constructing new kernels and choosing suitable parameter values for a specific kernel function than to kernel selection. Furthermore, most current kernel selection methods seek the single kernel with the highest classification accuracy via cross-validation; they are time-consuming and ignore the differences in the number of support vectors and the CPU time of SVMs with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions that perform equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge database is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge database with the multi-label classification method. Finally, the appropriate kernel functions are recommended for a new data set by the recommendation model according to the characteristics of that data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.
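To make the candidate set concrete, here is a minimal sketch of four of the eleven kernels named above (Linear, Polynomial, Radial Basis Function, Laplace). The parameter names `gamma`, `coef0` and `degree` and their defaults are assumptions; the abstract does not state which parameterization the authors used, and the Laplace kernel in particular is written here with the L1 distance, which is one common convention among several.

```python
import math


def linear_kernel(x, y):
    """Plain dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||_2^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def laplace_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||_1); an assumed convention."""
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1_dist)


if __name__ == "__main__":
    u, v = [1.0, 2.0], [3.0, 4.0]
    for k in (linear_kernel, polynomial_kernel, rbf_kernel, laplace_kernel):
        print(k.__name__, k(u, v))
```

A recommendation method of the kind described would choose among functions like these per data set rather than defaulting to the RBF kernel.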

4.
Yang Z  Lin Y  Wu J  Tang N  Lin H  Li Y 《Proteomics》2011,11(19):3811-3817
Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. However, the volume and content of published biomedical literature on protein interactions are expanding rapidly, making it increasingly difficult for interaction database curators to detect and curate protein interaction information manually. We present a multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines feature-based, tree, and graph kernels and combines their outputs with a Ranking support vector machine (SVM). Experimental evaluations show that the features in the individual kernels are complementary and that the kernel combined with Ranking SVM achieves better performance than the individual kernels, the equal-weight combination, and the optimal-weight combination. Our approach achieves state-of-the-art performance with respect to comparable evaluations, with 64.88% F-score and 88.02% AUC on the AImed corpus.

5.
6.
Yue Cao  Yang Shen 《Proteins》2020,88(8):1091-1099
Structural information about protein-protein interactions, often missing at the interactome scale, is important for mechanistic understanding of cells and rational discovery of therapeutics. Protein docking provides a computational alternative for such information. However, ranking near-native docked models high among a large number of candidates, often known as the scoring problem, remains a critical challenge. Moreover, estimating model quality, also known as the quality assessment problem, is rarely addressed in protein docking. In this study, the two challenging problems in protein docking are regarded as relative and absolute scoring, respectively, and addressed in one physics-inspired deep learning framework. We represent protein and complex structures as intra- and inter-molecular residue contact graphs with atom-resolution node and edge features. We also propose a novel graph convolutional kernel that aggregates interacting nodes’ features through edges so that generalized interaction energies can be learned directly from 3D data. The resulting energy-based graph convolutional networks (EGCN) with multihead attention are trained to predict intra- and inter-molecular energies, binding affinities, and quality measures (interface RMSD) for encounter complexes. Compared to a state-of-the-art scoring function for model ranking, EGCN significantly improves ranking for a critical assessment of predicted interactions (CAPRI) test set involving homology docking, and is comparable or slightly better for Score_set, a CAPRI benchmark set generated by diverse community-wide docking protocols not known to training data. For Score_set quality assessment, EGCN shows about 27% improvement over our previous efforts. Directly learning from 3D structure data in graph representation, EGCN represents the first successful development of graph convolutional networks for protein docking.

7.
Drug-drug interaction (DDI) detection is particularly important for patient safety. However, the amount of biomedical literature regarding drug interactions is increasing rapidly. Therefore, there is a need to develop an effective approach for the automatic extraction of DDI information from the biomedical literature. In this paper, we present a Stacked Generalization-based approach for automatic DDI extraction. The approach combines the feature-based, graph and tree kernels and, therefore, reduces the risk of missing important features. In addition, it introduces some domain knowledge-based features (the keyword, semantic type, and DrugBank features) into the feature-based kernel, which contribute to the performance improvement. More specifically, the approach applies Stacked Generalization to automatically learn the weights from the training data and assign them to the three individual kernels, achieving much better performance than each individual kernel. The experimental results show that our approach achieves an F-score of 69.24%, better than the other systems in the DDI Extraction 2011 challenge task.

8.
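In recent years, a growing number of biological experiments have shown that microRNAs (miRNAs) play an important role in the development of complex human diseases. Predicting associations between miRNAs and diseases therefore supports accurate diagnosis and effective treatment. Because traditional biological experiments are expensive and time-consuming, many computational models based on biological data have been proposed to predict miRNA-disease associations. This study proposes an end-to-end deep learning model, called MDAGAC, for predicting miRNA-disease associations. First, similarity graphs of miRNAs and diseases are constructed by integrating disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. Then, graph autoencoders and co-training are used to improve the effect of label propagation. The model builds two graph autoencoders, one on the miRNA graph and one on the disease graph, and trains them collaboratively. The graph autoencoders on the miRNA graph and the disease graph reconstruct a score matrix from the initial association matrix, which is equivalent to propagating labels on the graphs. The predicted probability of a miRNA-disease association is obtained from the score matrix. Experimental results based on five-fold cross-validation show that MDAGAC is reliable and effective, and that it outperforms several existing methods for predicting miRNA-disease associations.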
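The Gaussian interaction profile (GIP) kernel similarity mentioned above can be sketched as follows, assuming its commonly used form: each row of the association matrix is an interaction profile, and the bandwidth is normalized by the mean squared norm of the profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and its default of 1.0 are assumptions; the abstract does not give MDAGAC's exact settings.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), where
    gamma = gamma_prime / (mean over rows of ||p_i||^2).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy miRNA-by-disease association matrix (hypothetical data).
    associations = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
    for row in gip_kernel(associations):
        print([round(v, 4) for v in row])
```

Applying the function to the association matrix gives a miRNA similarity; applying it to the transposed matrix gives the corresponding disease similarity.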

9.
Structural chirality is an inherent feature of fully synthetic boron cluster compounds, which sometimes exhibit unique biochemical effects. HPLC studies with zwitterionic boron cluster compounds and electrophoretic studies with boron cluster anions reveal that the chiral separability of these species is remarkably dissimilar to that of organic species if uncharged cyclodextrins are used as chiral selectors. Furthermore, marked differences were found between the analytical characteristics of the chiral separations of the boron cluster species and those of the organic species with uncharged cyclodextrins. The present-day experimental database indicates that the rules valid for the chiral separations of organic species cannot be applied to the chiral separations of boron cluster species without experimental verification. The current extent of research devoted to the chirality and chiral separations of boron cluster species is negligibly small in comparison with that devoted to organic species. At the moment, this makes it difficult to explain reliably both the particularities observed in chiral separations of boron cluster species with cyclodextrins as chiral selectors and the strange effects related to these separations.

10.
This paper proposes a neural network model for predicting olfactory glomerular activity, aimed at future application to the evaluation of odor qualities. The model's input is the structure of an odorant molecule expressed as a labeled graph, and it employs the graph kernel method to quantify structural similarities between odorants and the function of olfactory receptor neurons. An artificial neural network then converts odorant molecules into glomerular activity expressed as Gaussian mixture functions. The authors also propose a learning algorithm that adjusts the parameters of the model using a learning data set composed of pairs of odorants and measured glomerular activity patterns. We observed that the defined similarity between odorant structures has a correlation of 0.3-0.9 with that of glomerular activity. A glomerular activity prediction simulation showed a certain level of predictive ability: on average, the predicted glomerular activity patterns showed medium to high correlation with the measured ones for data sets containing 363 odorants.

11.
Characterization of QTL for oil content in maize kernel   Cited by: 2 (self-citations: 0, citations by others: 2)
Kernel oil content in maize is a complex quantitative trait. Phenotypic variation in kernel oil content can be dissected into its component traits such as oil metabolism and physical characteristics of the kernel, including embryo size and embryo-to-endosperm weight ratio (EEWR). To characterize quantitative trait loci (QTL) for kernel oil content, a recombinant inbred population derived from a cross between normal line B73 and high-oil line By804 was genotyped using 228 molecular markers and phenotyped for kernel oil content and its component traits [embryo oil content, embryo oil concentration, EEWR, embryo volume, embryo width, embryo length, and embryo width-to-length ratio (EWLR)]. A total of 58 QTL were identified for kernel oil content and its component traits in 26 genomic regions across all chromosomes. Eight main-effect QTL were identified for kernel oil content, embryo oil content, embryo oil concentration, EEWR, embryo weight, and EWLR, each accounting for over 10% of the phenotypic variation in six genomic regions. Over 90% of QTL identified for kernel oil content co-localized with QTL for component traits, validating their molecular contribution to kernel oil content. On chromosome 1, the QTL that had the largest effect on kernel oil content (qKO1-1) was associated with embryo width; on chromosome 9, the QTL for kernel oil content (qKO9) was related to EEWR (qEEWR9). Embryo oil concentration and embryo width were identified as the most important component traits controlling the second largest QTL for kernel oil content on chromosome 6 (qKO6) and a minor QTL for kernel oil content on chromosome 5 (qKO5-2), respectively. The dissection of kernel oil QTL will facilitate future cloning and/or functional validation of kernel oil content, and help to elucidate the genetic basis of kernel oil content in maize.

12.
Hodges JS  Carlin BP  Fan Q 《Biometrics》2003,59(2):317-322
Bayesian analyses of spatial data often use a conditionally autoregressive (CAR) prior, which can be written as the kernel of an improper density that depends on a precision parameter tau that is typically unknown. To include tau in the Bayesian analysis, the kernel must be multiplied by tau^k for some k. This article rigorously derives k = (n - I)/2 for the L2 norm CAR prior (also called a Gaussian Markov random field model) and k = n - I for the L1 norm CAR prior, where n is the number of regions and I the number of "islands" (disconnected groups of regions) in the spatial map. Since I = 1 for a spatial structure defining a connected graph, this supports Knorr-Held's (2002, in Highly Structured Stochastic Systems, 260-264) suggestion that k = (n - 1)/2 in the L2 norm case, instead of the more common k = n/2. We illustrate the practical significance of our results using a periodontal example.
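The exponent k stated above depends only on the number of regions n and the number of islands I, so it can be computed directly from a map's adjacency list. The sketch below counts islands as connected components with a small union-find and then applies k = (n - I)/2 for the L2 norm prior or k = n - I for the L1 norm prior. The function names and the edge-list input format are assumptions made for illustration.

```python
def count_islands(n, edges):
    """Number of connected components among regions 0..n-1."""
    parent = list(range(n))

    def find(a):
        # Follow parent links to the root, compressing the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(i) for i in range(n)})


def car_precision_exponent(n, edges, norm="L2"):
    """Exponent k of tau in the CAR prior, per the formulas in the abstract."""
    islands = count_islands(n, edges)
    if norm == "L2":
        return (n - islands) / 2
    if norm == "L1":
        return n - islands
    raise ValueError("norm must be 'L2' or 'L1'")


if __name__ == "__main__":
    # Hypothetical map: regions 0-1-2 are mutually reachable, 3-4 form a second island.
    neighbours = [(0, 1), (1, 2), (3, 4)]
    print(car_precision_exponent(5, neighbours, "L2"))  # (5 - 2) / 2 = 1.5
    print(car_precision_exponent(5, neighbours, "L1"))  # 5 - 2 = 3
```

For a connected map (I = 1) this reduces to k = (n - 1)/2 in the L2 case, which is the value the abstract contrasts with the more common n/2.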

13.
One of the major challenges in single-cell data analysis is determining cellular developmental trajectories from single-cell data. Although numerous studies have been conducted in recent years, more effective methods are still needed to infer developmental processes accurately. This work devises a new method, named DTFLOW, for determining pseudotemporal trajectories with multiple branches. DTFLOW consists of two major steps: a new method called Bhattacharyya kernel feature decomposition (BKFD) to reduce the data dimensions, and a novel approach named Reverse Searching on k-nearest neighbor graph (RSKG) to identify the multi-branching processes of cellular differentiation. In BKFD, we first establish a stationary distribution for each cell to represent the transition of cellular developmental states based on the random walk with restart algorithm, and then propose a new distance metric for calculating the pseudotime of single cells by introducing the Bhattacharyya kernel matrix. The effectiveness of DTFLOW is rigorously examined using four single-cell datasets. We compare the efficiency of DTFLOW with published state-of-the-art methods. Simulation results suggest that DTFLOW has superior accuracy and strong robustness for constructing pseudotime trajectories. The Python source code of DTFLOW can be freely accessed at https://github.com/statway/DTFLOW.
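The two ingredients named in BKFD can be sketched generically: a random walk with restart (RWR) gives each cell a stationary distribution over all cells, and the Bhattacharyya coefficient measures the overlap between two such distributions. This is a textbook illustration under assumed conventions (a row-stochastic transition matrix, a restart probability of 0.1, fixed-point iteration), not the DTFLOW implementation; the published source code at the URL above is the authoritative reference.

```python
import math


def rwr_distribution(transition, seed, restart=0.1, iters=200):
    """Stationary distribution of a random walk that restarts at `seed`.

    transition[u][v] is the probability of stepping from u to v; each row
    must sum to 1. Iterates s <- restart * e_seed + (1 - restart) * P^T s.
    """
    n = len(transition)
    s = [1.0 if k == seed else 0.0 for k in range(n)]
    for _ in range(iters):
        nxt = [restart if k == seed else 0.0 for k in range(n)]
        for u in range(n):
            if s[u] == 0.0:
                continue
            for v in range(n):
                nxt[v] += (1.0 - restart) * s[u] * transition[u][v]
        s = nxt
    return s


def bhattacharyya_coefficient(p, q):
    """Overlap sum_k sqrt(p_k * q_k) between two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical 3-cell chain: cell 1 sits between cells 0 and 2.
    P = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
    dists = [rwr_distribution(P, i) for i in range(3)]
    print(bhattacharyya_coefficient(dists[0], dists[1]))
    print(bhattacharyya_coefficient(dists[0], dists[2]))
```

On this toy chain the overlap between cells 0 and 1 comes out larger than between cells 0 and 2, which is the kind of ordering a pseudotime distance needs to preserve.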

14.
In brain imaging, solving learning problems in multi-subject settings is difficult because of the differences that exist across individuals. Here we introduce a novel classification framework based on group-invariant graphical representations, which makes it possible to overcome the inter-subject variability present in functional magnetic resonance imaging (fMRI) data and to perform multivariate pattern analysis across subjects. Our contribution is twofold: first, we propose an unsupervised representation learning scheme that encodes all relevant characteristics of distributed fMRI patterns into attributed graphs; second, we introduce a custom-designed graph kernel that exploits all these characteristics and makes it possible to perform supervised learning (here, classification) directly in graph space. The well-foundedness of our technique and the robustness of the performance to the parameter setting are demonstrated through inter-subject classification experiments conducted on both artificial data and a real fMRI experiment aimed at characterizing local cortical representations. Our results show that our framework produces accurate inter-subject predictions and that it outperforms a wide range of state-of-the-art vector- and parcel-based classification methods. Moreover, the genericity of our method makes it easily adaptable to a wide range of potential applications. The dataset used in this study and an implementation of our framework are available at http://dx.doi.org/10.6084/m9.figshare.1086317.

15.
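Wang Kairong  Zhang Lei 《Chinese Journal of Applied Ecology (应用生态学报)》2008,19(12):2757-2762
Peanut is one of the world's major oil crops and an important source of plant protein and raw material for food processing. As direct consumption of peanuts and their use in food processing continue to grow, the cadmium (Cd) content of peanut kernels is attracting increasing international attention. China is a major peanut producer and exporter. In recent years, elevated Cd content in peanuts has become an important factor constraining China's export trade. This paper reviews the current status and problems of research on Cd contamination of peanuts, covering the Cd enrichment capacity of peanut kernels, intraspecific variation in peanut Cd content, the distribution of Cd within the kernel, the mechanisms affecting Cd accumulation in peanut kernels, and techniques for reducing kernel Cd content. Two strategies can be considered for controlling Cd contamination of peanuts: reducing the uptake of soil Cd by the plant, and limiting the translocation and enrichment of Cd in the kernel. To this end, research on the mechanisms of Cd accumulation in peanut kernels should be strengthened in three areas: the characteristic parameters of peanut root activity and their relationship to kernel Cd accumulation; the mechanism of Cd uptake by the peanut pod and its contribution to kernel Cd content; and the mechanism of Cd translocation within the peanut plant and its relationship to kernel Cd content.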

16.
The directed Hamiltonian path (DHP) problem is one of the hard computational problems for which no practical algorithm is available on a conventional computer. Many problems, including the traveling salesperson problem and the longest path problem, can be translated into the DHP problem, which implies that an algorithm for DHP can also solve all the translated problems. To study the robustness of the laboratory protocol of the pioneering DNA computation for the DHP problem performed by Leonard Adleman (1994), we investigated how the graph size, the multiplicity of the Hamiltonian paths, and the size of the oligonucleotides that encode the vertices affect the laboratory procedures. We applied Adleman's protocol with 18-mer oligonucleotides per node to a graph with 8 vertices and 14 edges containing two Hamiltonian paths (Adleman used 20-mer oligonucleotides for a graph with 7 nodes, 14 edges and one Hamiltonian path). We found that, depending on graph characteristics such as the number of short cycles, the size of the oligonucleotides used to encode the graph, and the hybridization conditions, the protocol should be executed with parameters different from Adleman's.
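For readers unfamiliar with the problem itself, the sketch below enumerates directed Hamiltonian paths between a given start and end vertex by exhaustive depth-first search on a conventional computer. It only illustrates the problem statement; it has nothing to do with the DNA laboratory protocol, and the small example graph is hypothetical, not the 8-vertex graph used in the study.

```python
def hamiltonian_paths(n, edges, start, end):
    """All directed paths from start to end that visit each of the
    vertices 0..n-1 exactly once. Worst-case time is exponential in n."""
    successors = {v: [] for v in range(n)}
    for u, v in edges:
        successors[u].append(v)

    paths = []

    def extend(path, visited):
        last = path[-1]
        if len(path) == n:
            if last == end:
                paths.append(list(path))
            return
        for nxt in successors[last]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    extend([start], {start})
    return paths


if __name__ == "__main__":
    # Hypothetical 4-vertex directed graph.
    graph_edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]
    print(hamiltonian_paths(4, graph_edges, start=0, end=3))
    # prints [[0, 1, 2, 3], [0, 2, 1, 3]]
```

The search tries every partial path, which is why the problem becomes impractical on conventional hardware as the graph grows.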

17.
18.
Although chiral distinction plays a pervasive role in chemistry, a complete understanding of how this takes place is still lacking. In this work, we expand the earlier described minimal requirement of so-called four‐point interactions (vide infra). We focus on chiral point charge model systems as a means to aid in the dissection of the underlying, operative principles. We also construct models with defined symmetry characteristics. By considering extensive constellations of diastereomeric complexes, we are able to identify emerging principles for chiral distinction. As previously postulated, all the diastereomeric complexes, regardless of their nominal contact‐points, possess a chiral distinction energy. In the comparison of complexes, we find that, contrary to chemical intuition, the magnitude of chiral distinction does not correlate with the stability of the complexes, i.e., consideration of low energy complexes may not be an effective way to evaluate chiral distinction. Similarly, we do not find a correlation between the number of contact‐points and chiral distinction. Moreover, favorable interactions and facile chiral distinction appear to be unrelated. We also see some tendency for greater chiral distinction in less symmetric systems, although this may not be general. These studies can now form the basis for folding higher levels of complexity into the models so as to gain further insights into the nature of chiral distinction. Chirality, 2010. © 2009 Wiley‐Liss, Inc.

19.
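Differences in protein expression profiles of HUVEC cells induced by propranolol enantiomers   Cited by: 2 (self-citations: 0, citations by others: 2)
Chiral drugs interact selectively with specific biological macromolecules only through strict chiral recognition, and they show chiral characteristics in pharmacokinetics, pharmacodynamics, and other respects. Using the enantiomers R(+)/S(-)-PRO of the non-selective β-adrenergic receptor blocker propranolol (PRO) as model drugs, human umbilical vein endothelial cells (HUVEC) were treated with each enantiomer separately, whole-cell proteins were extracted, and differentially expressed proteins were analyzed and identified by two-dimensional electrophoresis, MALDI-TOF-MS, and the SWISSPROT database. In total, 22 differentially expressed protein spots were screened out, and 10 proteins were identified: HSP86, HSP84, GRP75, KLC18, KBTB2, TGM2, GBLP, GCNT2, RAB36, and KLH34. The study shows that PRO enantiomers can cause widespread changes in gene expression involving signaling molecules, metabolic enzymes, cytoskeletal proteins, chaperone proteins, and others, and that these changes have marked chiral characteristics. This may be closely related to the marked chiral biological characteristics of PRO, but further in-depth research is needed to clarify the multiple pathways and mechanisms that give rise to them. Proteomic techniques offer new ideas and strategies for understanding the chiral biological characteristics of drugs and their mechanisms of action, and are of great significance for chiral drug development and rational clinical drug use.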

20.
A detailed computational study of a derivatized quinine chiral stationary phase (CSP) interacting with enantiomeric 3,5-dinitrobenzoyl derivatives of leucine was carried out to understand where and how chiral discrimination takes place. A conformer search gave a most stable CSP structure whose geometry agrees with an X-ray structure (RMSD 0.6 Å). The computed retention order and enantiodiscriminating free energy differences also agree with chromatographic data. The location and characteristics of the analyte binding site were assessed. An evaluation of the total energies and intermolecular energies responsible for complex formation and for chiral discrimination was performed. Molecular dynamics trajectories of those intermolecular forces, as well as distributions of the stabilizing and destabilizing forces, are presented. A partitioning of the CSP into molecular fragments and the role each fragment plays in complexation and chiral recognition are also described.
