Found 20 similar articles (search time: 0 ms)
1.
Background
Understanding the relationship between diseases based on the underlying biological mechanisms is one of the greatest challenges in modern biology and medicine. Exploring disease-disease associations by using system-level biological data is expected to improve our current knowledge of disease relationships, which may lead to further improvements in disease diagnosis, prognosis and treatment.
Results
We took advantage of diverse biological data including disease-gene associations and a large-scale molecular network to gain novel insights into disease relationships. We analysed and compared four publicly available disease-gene association datasets, then applied three disease similarity measures, namely an annotation-based measure, a function-based measure and a topology-based measure, to estimate the similarity scores between diseases. We systematically evaluated disease associations obtained by these measures against a statistical measure of comorbidity which was derived from a large number of medical patient records. Our results show that the correlation between our similarity measures and comorbidity scores is substantially higher than expected at random, confirming that our similarity measures are able to recover comorbidity associations. We also demonstrated that our predicted disease associations correlated significantly more strongly than expected at random with disease associations generated from genome-wide association studies. Furthermore, we evaluated our predicted disease associations by mining the literature on PubMed, and presented case studies to demonstrate how these novel disease associations can be used to enhance our current knowledge of disease relationships.
Conclusions
We present three similarity measures for predicting disease associations. The strong correlation between our predictions and known disease associations demonstrates the ability of our measures to provide novel insights into disease relationships.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-304) contains supplementary material, which is available to authorized users.
2.
3.
ABSTRACT: BACKGROUND: Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBN employ either local search such as hill-climbing, or a meta stochastic global optimization framework such as genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small sized networks. RESULTS: To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For such DBN models that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved performance over other techniques. CONCLUSIONS: Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.
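The abstract above scores inter-time-slice arcs with a mutual-information-based metric. As a rough illustration only, the sketch below computes the empirical mutual information between a candidate parent gene at time t - lag and a child gene at time t, for expression values that are assumed to be already discretized. The function names are hypothetical; the complexity penalty of the MIT score and the polynomial-time structure search of GlobalMIT+ are not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def lagged_mi(parent, child, lag=1):
    """MI between parent at time t - lag and child at time t (lag must be >= 1)."""
    return mutual_information(parent[:-lag], child[lag:])
```

A higher lagged value suggests that the parent's past state is informative about the child's current state, which is the kind of time-delayed dependency the abstract describes.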
4.
Recently, the use of the Bayesian network as an alternative to existing tools for similarity-based virtual screening has received noticeable attention from researchers in the chemoinformatics field. The main aim of the Bayesian network model is to improve the retrieval effectiveness of similarity-based virtual screening. To this end, different models of the Bayesian network have been developed. In our previous works, the retrieval performance of the Bayesian network was observed to improve significantly when multiple reference structures or fragment weightings were used. In this article, the authors enhance the Bayesian inference network (BIN) using the relevance feedback information. In this approach, a few high-ranking structures of unknown activity were filtered from the outputs of BIN, based on a single active reference structure, to form a set of active reference structures. This set of active reference structures was used in two distinct techniques for carrying out such BIN searching: reweighting the fragments in the reference structures and group fusion techniques. Simulated virtual screening experiments with three MDL Drug Data Report data sets showed that the proposed techniques provide simple ways of enhancing the cost-effectiveness of ligand-based virtual screening searches, especially for higher diversity data sets.
5.
With advances in network function virtualization and cloud computing technologies, a number of network services are implemented across data centers by creating a service chain from different virtual network functions (VNFs) running on virtual machines. Due to the complexity of the network infrastructure, creating a service chain incurs a high operational cost, especially for carrier-grade network service providers, and supporting stringent QoS requirements from users is also a complicated task. Various research efforts have addressed these problems, but they focus on only one optimization goal, either from the users' side, such as latency minimization and QoS-based optimization, or from the service providers' side, such as resource optimization and cost minimization. However, efficiently meeting the requirements of both users and service providers is still a challenging issue. This paper proposes a VNF placement algorithm called VNF-EQ that allows users to meet their service latency requirements while minimizing energy consumption at the same time. The proposed algorithm is dynamic in the sense that the locations or the service chains of VNFs are reconfigured to minimize energy consumption when the traffic passing through the chain falls below a pre-defined threshold. We use a genetic algorithm to solve this problem because it is a variation of the multi-constrained path selection problem, which is known to be NP-complete. The benchmarking results show that the proposed approach outperforms other heuristic algorithms by as much as 49% and reduces energy consumption by rearranging VNFs.
6.
7.
Summary. The support vector machine, a machine-learning method, is used to predict the four structural classes, i.e. mainly α, mainly β, α–β and fss, from the topology level of the CATH protein structure database. For binary classification, any two structural classes which do not share any secondary structure, such as α and β elements, could be classified with as high as 90% accuracy. The accuracy, however, decreases to less than 70% if the structural classes to be classified contain structure elements in common. Our study also shows that feature spaces of dimension 20² = 400 (for dipeptides) and 20³ = 8000 (for tripeptides) give nearly the same prediction accuracy. Among these 4 structural classes, multi-class classification gives an overall accuracy of about 52%, indicating that the multi-class classification technique for support vector machines may still need to be further improved in future investigation.
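The feature-space dimensions quoted above follow from counting every possible dipeptide (20 × 20) or tripeptide (20 × 20 × 20) over the 20 standard amino acids. The sketch below shows one plausible way to build such a composition vector; the function name `kmer_composition` and the normalization by window count are assumptions made for illustration, not details taken from the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k):
    """Return normalized k-peptide frequencies as a list of length 20**k."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    # Slide a window of width k along the sequence and count each k-peptide.
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in index:  # skip windows containing non-standard residues
            counts[index[window]] += 1
            total += 1
    return [c / total for c in counts] if total else counts
```

Calling `kmer_composition(seq, 2)` yields a 400-dimensional vector and `kmer_composition(seq, 3)` an 8000-dimensional one, either of which could be passed to an SVM as input features.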
8.
Murtada Khalafallah Elbashir Jianxin Wang Fang-Xiang Wu Lusheng Wang 《Proteome science》2013,11(Z1):S5
Background
β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design.
Results
We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features.
Conclusions
In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.
9.
10.
A major challenge of the protein docking problem is to define scoring functions that can distinguish near‐native protein complex geometries from a large number of non‐native geometries (decoys) generated with noncomplexed protein structures (unbound docking). In this study, we have constructed a neural network that employs the information from atom‐pair distance distributions of a large number of decoys to predict protein complex geometries. We found that docking prediction can be significantly improved using two different types of polar hydrogen atoms. To train the neural network, 2000 near‐native decoys of even distance distribution were used for each of the 185 considered protein complexes. The neural network normalizes the information from different protein complexes using an additional protein complex identity input neuron for each complex. The parameters of the neural network were determined such that they mimic a scoring funnel in the neighborhood of the native complex structure. The neural network approach avoids the reference state problem, which occurs in deriving knowledge‐based energy functions for scoring. We show that a distance‐dependent atom pair potential performs much better than a simple atom‐pair contact potential. We have compared the performance of our scoring function with other empirical and knowledge‐based scoring functions such as ZDOCK 3.0, ZRANK, ITScore‐PP, EMPIRE, and RosettaDock. In spite of the simplicity of the method and its functional form, our neural network‐based scoring function achieves a reasonable performance in rigid‐body unbound docking of proteins. Proteins 2010. © 2009 Wiley‐Liss, Inc.
11.
The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This study introduces a sequence-learning technology called side effect machines. It also applies a model of evolution which simulates the evolution of a ring species to the training of the side effect machines. A comparison is done between side effect machines evolved in the ring structure and side effect machines evolved using a standard evolutionary algorithm based on tournament selection. At the core of the training of side effect machines is a nearest neighbor classifier. A parameter study was performed to investigate the impact of the division of training data into examples for nearest neighbor assessment and training cases. The parameter study demonstrates that parameter setting is important in the baseline runs but had little impact in the ring-optimization runs. The ring optimization technique was also found to exhibit improved and also more reliable training performance. Side effect machines are tested on two types of synthetic data, one based on GC-content and the other checking for the ability of side effect machines to recognize an embedded motif. Three types of biological data are used, a data set with different types of immune-system genes, a data set with normal and retro-virally derived human genomic sequence, and standard and nonstandard initiation regions from the cytochrome-oxidase subunit one in the mitochondrial genome.
12.
Live migration of virtual machines (VMs) provides a significant benefit for virtual server mobility without disrupting service. It is widely used for system management in virtualized data centers. However, migration costs may vary significantly for different workloads due to the variety of VM configurations and workload characteristics. To take the migration overhead into account in migration decision-making, we investigate design methodologies to quantitatively predict the migration performance and energy consumption. We thoroughly analyze the key parameters that affect the migration cost from theory to practice. We construct application-oblivious models for the cost prediction by using learned knowledge about the workloads at the hypervisor (also called VMM) level. To our knowledge, this is the first work to estimate VM live migration cost in terms of both performance and energy in a quantitative approach. We evaluate the models using five representative workloads on a Xen virtualized environment. Experimental results show that the refined model yields higher than 90% prediction accuracy in comparison with measured cost. Model-guided decisions can significantly reduce the migration cost by more than 72.9% at an energy saving of 73.6%.
13.
Background
Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles.
14.
Prediction of protein structural classes by neural network (total citations: 6, self-citations: 0, citations by others: 6)
15.
Chang JH Hwang KB Oh SJ Zhang BT 《Journal of bioinformatics and computational biology》2005,3(1):61-77
Combined analysis of the microarray and drug-activity datasets has the potential of revealing valuable knowledge about various relations among gene expressions and drug activities in the malignant cell. In this paper, we apply Bayesian networks, a tool for compact representation of the joint probability distribution, to such analysis. For the alleviation of data dimensionality problem, the huge datasets were condensed using a feature abstraction technique. The proposed analysis method was applied to the NCI60 dataset (http://discover.nci.nih.gov) consisting of gene expression profiles and drug activity patterns on human cancer cell lines. The Bayesian networks, learned from the condensed dataset, identified most of the salient pairwise correlations and some known relationships among several features in the original dataset, confirming the effectiveness of the proposed feature abstraction method. Also, a survey of the recent literature confirms the several relationships appearing in the learned Bayesian network to be biologically meaningful.
16.
Software reliability prediction using recurrent neural network with Bayesian regularization (total citations: 1, self-citations: 0, citations by others: 1)
A recurrent neural network modeling approach for software reliability prediction with respect to cumulative failure time is proposed. Our proposed network structure has the capability of learning and recognizing the inherent internal temporal property of cumulative failure time sequence. Further, by adding a penalty term of sum of network connection weights, Bayesian regularization is applied to our network training scheme to improve the generalization capability and lower the susceptibility of overfitting. The performance of our proposed approach has been tested using four real-time control and flight dynamic application data sets. Numerical results show that our proposed approach is robust across different software projects, and has a better performance with respect to both goodness-of-fit and next-step-predictability compared to existing neural network models for failure time prediction.
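The weight penalty mentioned in this abstract can be illustrated with a small sketch. It assumes the common formulation in which the training objective is a weighted sum of the squared prediction errors and the squared connection weights; the abstract does not state the exact form, so the squared terms, the function name and the parameter names `alpha` and `beta` are assumptions. In a full Bayesian regularization scheme the two coefficients are re-estimated during training, which is not shown here.

```python
def regularized_loss(errors, weights, alpha, beta):
    """Objective = beta * (sum of squared errors) + alpha * (sum of squared weights)."""
    data_term = sum(e * e for e in errors)      # goodness of fit
    weight_term = sum(w * w for w in weights)   # penalty on large connection weights
    return beta * data_term + alpha * weight_term
```

Raising `alpha` relative to `beta` pushes training toward smaller weights, which is the mechanism by which such a penalty can reduce overfitting.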
17.
《Biochimica et Biophysica Acta - Proteins and Proteomics》2020,1868(10):140477
The subcellular location of a protein is highly related to its function. Identifying the location of a given protein is an essential step for investigating its related problems. Traditional experimental methods can produce solid determination. However, their limitations, such as high cost and low efficiency, are evident. Computational methods provide an alternative means to address these problems. Most previous methods extract features from protein sequences or structures for building prediction models. In this study, we use two types of features and combine them to construct the model. The first feature type is extracted from a protein–protein interaction network to abstract the relationship between the encoded protein and other proteins. The second type is obtained from gene ontology and biological pathways to indicate the existing functions of the encoded protein. These features are analyzed using some feature selection methods. The final optimum features are adopted to build the model with a recurrent neural network as the classification algorithm. This model yields good performance, with a Matthews correlation coefficient of 0.844. A decision tree is used as a rule learning classifier to extract decision rules. Although the performance of decision rules is poor, they are valuable in revealing the molecular mechanism of proteins with different subcellular locations. The final analysis confirms the reliability of the extracted rules. The source code of the proposed method is freely available at https://github.com/xypan1232/rnnloc
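For readers unfamiliar with the Matthews correlation coefficient reported above, the following sketch computes its standard binary form from confusion-matrix counts. The study itself classifies several subcellular locations and may use a multi-class variant, so this is only an illustration of the two-class definition, and the zero-denominator convention is an assumption.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from true/false positive and negative counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than accuracy on imbalanced classes.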
18.
Chris J Needham James R Bradford Andrew J Bulpitt Matthew A Care David R Westhead 《BMC bioinformatics》2006,7(1):405-14
Background
A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either one of the two types of data are missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction.
19.
Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines
In the post-genome era, the prediction of protein function is one of the most demanding tasks in the study of bioinformatics. Machine learning methods, such as support vector machines (SVMs), greatly help to improve the classification of protein function. In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties into the study of nucleic-acid-binding protein prediction. We developed binary classifiers for rRNA-, RNA-, and DNA-binding proteins, which play an important role in the control of many cell processes. Each SVM predicts whether a protein belongs to the rRNA-, RNA-, or DNA-binding protein class. Self-consistency and jackknife tests were performed on protein data sets in which the sequence identity was < 25%. Test results show that the accuracies of the rRNA-, RNA-, and DNA-binding SVM predictions are approximately 84%, 78%, and 72%, respectively. The predictions were also performed on the ambiguous and negative data sets. The results demonstrate that the scores predicted for proteins in the ambiguous data set by the RNA- and DNA-binding SVM models were distributed around zero, while most proteins in the negative data set were given negative scores by all three SVMs. The score distributions agree well with the prior knowledge of those proteins and show the effectiveness of sequence-associated physicochemical properties in protein function prediction. The software is available from the author upon request.
20.
G-protein coupled receptors (GPCRs) represent one of the most important classes of drug targets for the pharmaceutical industry and play important roles in cellular signal transduction. Predicting the coupling specificity of GPCRs to G-proteins is vital for further understanding the mechanism of signal transduction and the function of the receptors within a cell, which can provide new clues for pharmaceutical research and development. In this study, the features of amino acid compositions and physicochemical properties of the full-length GPCR sequences have been analyzed and extracted. Based on these features, classifiers have been developed to predict the coupling specificity of GPCRs to G-proteins using support vector machines. The testing results show that this method could obtain better prediction accuracy.