1.

Shortest path analysis using partial correlations for classifying gene functions from gene expression data

Fitch AM Jones MB 《Bioinformatics (Oxford, England)》2009,25(1):42-47

MOTIVATION: Gaussian graphical models (GGMs) are a popular tool for representing gene association structures. We propose using estimated partial correlations from these models to attach lengths to the edges of the GGM, where the length of an edge is inversely related to the partial correlation between the gene pair. Graphical lasso is used to fit the GGMs and obtain partial correlations. The shortest paths between pairs of genes are found. Where the terminal genes of a path have the same biological function, the intermediate genes on the path are classified as having that function. We validate the method on genes of known function using the Rosetta Compendium of yeast (Saccharomyces cerevisiae) gene expression profiles. We also compare our results with those obtained using a graph constructed from correlations. RESULTS: Using a partial correlation graph, we are able to classify approximately twice as many genes to the same level of accuracy as when using a correlation graph. More importantly, when both methods are tuned to classify a similar number of genes, the partial correlation approach can increase the accuracy of the classifications.
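
The classification rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the partial correlations are hypothetical numbers (in practice they would come from a fitted GGM), the gene names and function label are made up, and the length function 1/|partial correlation| is an assumption, since the abstract says only that length is inversely related to partial correlation.

```python
import heapq

def shortest_path(lengths, source, target):
    """Dijkstra on an undirected graph given as {(u, v): length}."""
    adj = {}
    for (u, v), w in lengths.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical partial correlations between gene pairs.
pcor = {("A", "B"): 0.8, ("B", "C"): 0.5, ("A", "C"): 0.1, ("C", "D"): 0.4}

# Assumed length function: reciprocal of the absolute partial correlation.
lengths = {edge: 1.0 / abs(r) for edge, r in pcor.items() if r != 0}

# Hypothetical annotations for the two terminal genes.
known_function = {"A": "ribosome biogenesis", "C": "ribosome biogenesis"}

path = shortest_path(lengths, "A", "C")
predicted = {}
if known_function["A"] == known_function["C"]:
    # Intermediate genes inherit the function shared by the terminal genes.
    for gene in path[1:-1]:
        predicted[gene] = known_function["A"]
```

Here the strong A-B and B-C links make the two-step route shorter than the weak direct A-C edge, so gene B is the one that receives the shared label.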

2.

Duplication models for biological networks. Cited by: 11 (self-citations: 0; citations by others: 11)

Fan Chung Linyuan Lu T Gregory Dewey David J Galas 《Journal of computational biology》2003,10(5):677-687

Are biological networks different from other large complex networks? Both large biological and nonbiological networks exhibit power-law graphs (number of nodes with degree k, \(N(k) \sim k^{-\beta}\)), yet the exponents, \(\beta\), fall into different ranges. This may be because duplication of the information in the genome is a dominant evolutionary force in shaping biological networks (like gene regulatory networks and protein-protein interaction networks) and is fundamentally different from the mechanisms thought to dominate the growth of most nonbiological networks (such as the Internet). The preferential choice models used for nonbiological networks like web graphs can only produce power-law graphs with exponents greater than 2. We use combinatorial probabilistic methods to examine the evolution of graphs by node duplication processes and derive exact analytical relationships between the exponent of the power law and the parameters of the model. Both full duplication of nodes (with all their connections) and partial duplication (with only some connections) are analyzed. We demonstrate that partial duplication can produce power-law graphs with exponents less than 2, consistent with current data on biological networks. The power-law exponent for large graphs depends only on the growth process, not on the starting graph.
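
A node duplication process of the kind analyzed here can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the starting graph (a single edge), the uniform choice of the node to copy, and the retention probability `p` are choices made for this example, and the function name is hypothetical. Setting `p = 1.0` gives full duplication; `p < 1.0` gives partial duplication.

```python
import random
from collections import Counter

def partial_duplication(n_final, p, seed=0):
    """Grow an undirected graph by repeatedly copying a random node.

    Each edge of the copied node is retained by the new node
    independently with probability p.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed starting graph: one edge
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.randrange(new)
        kept = {nb for nb in sorted(adj[parent]) if rng.random() < p}
        adj[new] = kept
        for nb in kept:
            adj[nb].add(new)
    return adj

adj = partial_duplication(n_final=500, p=0.3)

# Empirical degree distribution N(k): number of nodes with degree k.
degree_counts = Counter(len(nbrs) for nbrs in adj.values())
```

Plotting `degree_counts` on log-log axes for several values of `p` is one way to see how the growth parameter changes the shape of the degree distribution.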

3.

Assessing the validity domains of graphical Gaussian models in order to infer relationships among components of complex biological systems

Villers F Schaeffer B Bertin C Huet S 《Statistical applications in genetics and molecular biology》2008,7(1):Article 14

The study of the interactions of cellular components is an essential first step toward understanding the structure and dynamics of biological networks. Various methods were recently developed for this purpose. While most of them combine different types of data and a priori knowledge, methods based on graphical Gaussian models are capable of learning the network directly from raw data. To model direct links between variables, they consider the full-order partial correlations, which are partial correlations between two variables given all the remaining ones. Statistical methods were developed for estimating these links when the number of observations is larger than the number of variables. However, the rapid advance of new technologies that allow the simultaneous measurement of genome expression has led to large-scale datasets where the number of variables is far larger than the number of observations. To get around this dimensionality problem, different strategies and new statistical methods were proposed. In this study we focused on recently published statistical methods. All are based on the assumption that the number of direct relationships between pairs of variables is very small relative to the number of possible relationships, p(p-1)/2. In the biological context, this assumption is not always satisfied over the whole graph. It is essential to know precisely how the methods behave with respect to the characteristics of the studied object before applying them. For this purpose, we evaluated the validity domain of each method on wide-ranging simulated datasets. We then illustrated our results using recently published biological data.
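
For reference, the full-order partial correlation mentioned above has a standard closed form in terms of the inverse covariance matrix. The symbols \(\Sigma\) (covariance matrix of the p variables), \(\Omega\) (its inverse), and \(V\) (the set of all variables) are notation introduced here, not taken from the abstract.

```latex
\rho_{ij \mid V \setminus \{i,j\}}
  \;=\; -\,\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},
\qquad \Omega = (\omega_{ij}) = \Sigma^{-1},
\qquad \#\{\text{possible edges}\} \;=\; \binom{p}{2} \;=\; \frac{p(p-1)}{2}.
```

When the number of observations is smaller than p, the sample covariance matrix is singular and cannot be inverted directly, which is the dimensionality problem the abstract refers to.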

4.

Semi-Markov graph dynamics

Raberto M Rapallo F Scalas E 《PloS one》2011,6(8):e23370

In this paper, we outline a model of graph (or network) dynamics based on two ingredients. The first ingredient is a Markov chain on the space of possible graphs. The second ingredient is a semi-Markov counting process of renewal type. The model consists of subordinating the Markov chain to the semi-Markov counting process. Put simply, this means that the chain transitions occur at random time instants called epochs. The model is quite rich, and its possible connections with algebraic geometry are briefly discussed. For the sake of simplicity, we focus on the space of undirected graphs with a fixed number of nodes. However, in an example, we present an interbank market model where it is meaningful to use directed graphs or even weighted graphs.
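
The two ingredients can be combined in a short simulation. This is a sketch under assumptions that are not in the abstract: the Markov chain toggles one uniformly chosen node pair per transition, and the waiting times between epochs are Weibull distributed (any non-exponential renewal distribution would serve). The function name and parameters are hypothetical.

```python
import itertools
import random

def simulate(n_nodes, t_max, shape=0.8, seed=0):
    """Markov chain on undirected graphs, subordinated to a renewal process.

    Returns a list of (epoch, edge_set) pairs, starting from the empty graph.
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_nodes), 2))
    edges = set()
    t = 0.0
    history = [(0.0, frozenset(edges))]
    while True:
        # Renewal-type waiting time until the next epoch (scale 1.0).
        t += rng.weibullvariate(1.0, shape)
        if t > t_max:
            break
        # One step of the chain: add the chosen edge if absent, else remove it.
        edges ^= {rng.choice(pairs)}
        history.append((t, frozenset(edges)))
    return history

history = simulate(n_nodes=5, t_max=20.0)
```

With `shape=1.0` the waiting times are exponential and the construction reduces to an ordinary continuous-time Markov chain, which makes a convenient baseline for comparison.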

5.

SeinFit, a Computer Program for the Estimation of the Seinhorst Equation

N. M. Viaene P. Simoens G. S. Abawi 《Journal of nematology》1997,29(4):474-477

A computer program, "SeinFit," was created to determine the Seinhorst equation that best fits experimental data on the relationship between preplant nematode densities and plant growth. Data, which can be entered manually or imported from a text file, are displayed in a data window while the corresponding graph is shown in a graph window. Various options are available to manipulate the data and the graph settings. The best-fitting Seinhorst equation can be calculated by two methods that are both based on the evaluation of the residual sum of squares. Depending on the method, a range of values for different parameters of the Seinhorst equation can be chosen, as well as the number of steps in each range. Data, graphs, and values of the parameters of the Seinhorst equation can be printed. The program allows for quick calculation of the damage threshold density, one of the parameters of the Seinhorst model. Versions written for Macintosh or DOS-compatible machines are currently available through the Society of Nematologists' World Wide Web site (http://ianrwww.unl.edu/ianr/plntpath/nematode/SOFTWARE/nemasoft.htm).
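
A grid search over parameter ranges with a fixed number of steps, scored by the residual sum of squares, might look like the sketch below. It is not SeinFit's code. It assumes the commonly cited form of the Seinhorst equation, y = m + (1 - m) z^(P - T) for P > T and y = 1 for P <= T, where P is the preplant density, T the damage threshold density, m the minimum relative yield, and z a constant below 1; the abstract itself does not state the equation. The data are synthetic.

```python
import itertools

def seinhorst(P, T, m, z):
    """Relative yield at preplant density P (assumed equation form)."""
    return 1.0 if P <= T else m + (1.0 - m) * z ** (P - T)

def rss(data, T, m, z):
    """Residual sum of squares of the model over (P, y) observations."""
    return sum((y - seinhorst(P, T, m, z)) ** 2 for P, y in data)

def grid(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]

def fit(data, T_range, m_range, z_range, steps):
    """Return the (T, m, z) grid point with the smallest RSS."""
    candidates = itertools.product(
        grid(*T_range, steps), grid(*m_range, steps), grid(*z_range, steps)
    )
    return min(candidates, key=lambda params: rss(data, *params))

# Synthetic, noise-free observations generated with T=2, m=0.4, z=0.9.
densities = [0, 1, 2, 4, 8, 16, 32]
data = [(P, seinhorst(P, 2.0, 0.4, 0.9)) for P in densities]

best_T, best_m, best_z = fit(
    data, T_range=(0.0, 4.0), m_range=(0.0, 0.8), z_range=(0.8, 1.0), steps=5
)
```

An exhaustive grid costs `steps ** 3` model evaluations, so a coarse first pass followed by a finer grid around the best point keeps the search cheap.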

6.

Evaluating intraspecific "network" construction methods using simulated sequence data: do existing algorithms outperform the global maximum parsimony approach?

Cassens I Mardulyn P Milinkovitch MC 《Systematic biology》2005,54(3):363-372

In intraspecific studies, reticulated graphs are valuable tools for visualization, within a single figure, of alternative genealogical pathways among haplotypes. As available software packages implementing the global maximum parsimony (MP) approach only allow the resulting topologies to be merged into less-resolved consensus trees, MP has often been neglected as an alternative to purely algorithmic (i.e., defined solely on the basis of an algorithm) "network" construction methods. Here, we propose to search tree space using the MP criterion and present a new algorithm for uniting all equally most parsimonious trees into a single (possibly reticulated) graph. Using simulated sequence data, we compare our method with three purely algorithmic and widely used graph construction approaches (minimum-spanning network, statistical parsimony, and median-joining network). We demonstrate that the combination of MP trees into a single graph provides a good estimate of the true genealogy. Moreover, our analyses indicate that, when internal node haplotypes are not sampled, the median-joining and MP methods provide the best estimate of the true genealogy, whereas the minimum-spanning algorithm performs very poorly.

7.

Making Large-Scale Networks from fMRI Data

Verena D. Schmittmann Sara Jahfari Denny Borsboom Alexander O. Savi Lourens J. Waldorp 《PloS one》2015,10(9)

Pairwise correlations are currently a popular way to estimate a large-scale network (> 1000 nodes) from functional magnetic resonance imaging data. However, this approach generally results in a poor representation of the true underlying network. The reason is that pairwise correlations cannot distinguish between direct and indirect connectivity. As a result, pairwise correlation networks can lead to fallacious conclusions; for example, one may conclude that a network is a small-world when it is not. In a simulation study and an application to resting-state fMRI data, we compare the performance of pairwise correlations in large-scale networks (2000 nodes) against three other methods that are designed to filter out indirect connections. Recovery methods are evaluated in four simulated network topologies (small world or not, scale-free or not) in scenarios where the number of observations is very small compared to the number of nodes. Simulations clearly show that pairwise correlation networks are fragmented into separate unconnected components with excessive connectedness within components. This often leads to erroneous estimates of network metrics, like small-world structures or low betweenness centrality, and produces too many low-degree nodes. We conclude that using partial correlations, informed by a sparseness penalty, results in more accurate networks and corresponding metrics than pairwise correlation networks. However, even with these methods, the presence of hubs in the generating network can be problematic if the number of observations is too small. Additionally, we show for resting-state fMRI that partial correlations are more robust than correlations to different parcellation sets and to different lengths of time-series.
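
A three-node example shows why pairwise correlation confuses direct and indirect connectivity. The sketch below uses the standard first-order partial correlation formula; the chain A -> B -> C and the correlation values are hypothetical, and the assumption that the A-C correlation equals the product of the two direct correlations holds for a linear Gaussian chain. It is far simpler than the penalized estimators compared in the study.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical chain A -> B -> C with no direct A-C connection.
r_ab, r_bc = 0.8, 0.8
r_ac = r_ab * r_bc  # induced purely by the path through B (assumption)

# Pairwise correlation suggests an A-C edge; conditioning on B removes it.
direct_ac = partial_corr(r_ac, r_ab, r_bc)

# The genuine A-B link survives conditioning on C.
direct_ab = partial_corr(r_ab, r_ac, r_bc)
```

The pairwise A-C correlation of 0.64 would pass most thresholds, whereas the partial correlation given B is zero up to rounding.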

8.

Partial correlation analysis for the identification of synaptic connections

Eichler M Dahlhaus R Sandkühler J 《Biological cybernetics》2003,89(4):289-302

In this paper, we investigate the use of partial correlation analysis for the identification of functional neural connectivity from simultaneously recorded neural spike trains. Partial correlation analysis allows one to distinguish between direct and indirect connectivities by removing the portion of the relationship between two neural spike trains that can be attributed to linear relationships with recorded spike trains from other neurons. As an alternative to the common frequency domain approach based on the partial spectral coherence we propose a new statistic in the time domain. The new scaled partial covariance density provides additional information on the direction and the type, excitatory or inhibitory, of the connectivities. In simulation studies, we investigated the power and limitations of the new statistic. The simulations show that the detectability of various connectivity patterns depends on various parameters such as connectivity strength and background activity. In particular, the detectability decreases with the number of neurons included in the analysis and increases with the recording time. Further, we show that the method can also be used to detect multiple direct connectivities between two neurons. Finally, the methods of this paper are illustrated by an application to neurophysiological data from spinal dorsal horn neurons.

9.

The Edge-Disjoint Path Problem on Random Graphs by Message-Passing

Fabrizio Altarelli Alfredo Braunstein Luca Dall’Asta Caterina De Bacco Silvio Franz 《PloS one》2015,10(12)

We present a message-passing algorithm to solve a series of edge-disjoint path problems on graphs based on the zero-temperature cavity equations. Edge-disjoint path problems are important in the general context of routing, which can be defined by incorporating both traffic optimization and total path length minimization in a single framework. The computation of the cavity equations can be performed efficiently by exploiting a mapping of a generalized edge-disjoint path problem on a star graph onto a weighted maximum matching problem. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering nontrivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the hypothesis underlying the validity of message-passing is expected to hold less well. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a suboptimal solution, we were able to always outperform the other algorithms, with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separate regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behavior of both the number of paths to be accommodated and their minimum total length.

10.

Identifying Cognitive States Using Regularity Partitions

Ioannis Pappas Panos Pardalos 《PloS one》2015,10(8)

Functional magnetic resonance imaging (fMRI) data can be used to depict functional connectivity of the brain. Standard techniques have been developed to construct brain networks from these data; typically, nodes are voxels or sets of voxels, with weighted edges between them representing measures of correlation. Identifying cognitive states based on fMRI data involves recording voxel activity over a certain time interval. Using this information, network and machine learning techniques can be applied to discriminate the cognitive states of the subjects by exploring different features of the data. In this work we wish to describe and understand the organization of brain connectivity networks under cognitive tasks. In particular, we use a regularity partitioning algorithm that finds clusters of vertices such that they all behave with each other almost like random bipartite graphs. Based on the random approximation of the graph, we calculate a lower bound on the number of triangles as well as the expectation of the distribution of the edges in each subject and state. We investigate the results by comparing them to state-of-the-art algorithms for exploring connectivity, and we argue that during epochs in which the subject is exposed to a stimulus, the inspected part of the brain is organized in an efficient way that enables enhanced functionality.

11.

Inference for nonparanormal partial correlation via regularized rank-based nodewise regression

Haoyan Hu Yumou Qiu 《Biometrics》2023,79(2):1173-1186

Partial correlation is a common tool in studying conditional dependence for Gaussian distributed data. However, partial correlation being zero may not be equivalent to conditional independence under non-Gaussian distributions. In this paper, we propose a statistical inference procedure for partial correlations under the high-dimensional nonparanormal (NPN) model where the observed data are normally distributed after certain monotone transformations. The NPN partial correlation is the partial correlation of the normal transformed data under the NPN model, which is a more general measure of conditional dependence. We estimate the NPN partial correlations by regularized nodewise regression based on the empirical ranks of the original data. A multiple testing procedure is proposed to identify the nonzero NPN partial correlations. The proposed method can be carried out by a simple coordinate descent algorithm for lasso optimization. It is easy to implement and computationally more efficient than existing methods for estimating NPN graphical models. Theoretical results are developed to show the asymptotic normality of the proposed estimator and to justify the proposed multiple testing procedure. Numerical simulations and a case study on brain imaging data demonstrate the utility of the proposed procedure and evaluate its performance compared to the existing methods. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

12.

A simple visualization technique to understand the system dynamics in bioreactors

Patil KR Kulkarni AJ 《Biotechnology progress》2007,23(5):1101-1105

In this article, we present a graph-theoretic method to visualize and analyze the system behavior under different operating conditions. The system attributes (or variables) are the nodes in the graphs, and the partial correlation between a pair of attributes defines the distance between the corresponding nodes, resulting in a fully connected graph. Then, the redundant links are reduced using the Pathfinder Network Scaling technique to uncover the latent network structure. We use a simulated biological reactor dataset in normal and faulty operation to validate our method. The method is general and can be used to analyze several different systems.
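
Pathfinder Network Scaling can be sketched for one common parameter setting. The code assumes PFNET(r = infinity, q = n - 1), in which the length of a path is its largest edge and a direct link is kept only if no alternative path is shorter in that sense; the abstract does not say which parameters the authors used, nor how partial correlations are mapped to distances, so the distance matrix below is simply a hypothetical input.

```python
def pathfinder(dist):
    """Return the set of links (i, j), i < j, kept by PFNET(r=inf, q=n-1).

    dist is a symmetric matrix of pairwise distances with a zero diagonal.
    """
    n = len(dist)
    # Minimax path distances via a Floyd-Warshall style recursion:
    # the cost of a path is its largest edge.
    mm = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(mm[i][k], mm[k][j])
                if via < mm[i][j]:
                    mm[i][j] = via
    # Keep a direct link only if no indirect path beats it.
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] <= mm[i][j]
    }

# Hypothetical distances among three attributes.
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
links = pathfinder(dist)
```

In this toy matrix the long 0-2 link is redundant because the route through node 1 never uses an edge longer than 1.0, so only the two short links remain.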

13.

Recent developments in quantitative graph theory: information inequalities for networks

Dehmer M Sivakumar L 《PloS one》2012,7(2):e31395

In this article, we tackle a challenging problem in quantitative graph theory. We establish relations between graph entropy measures representing the structural information content of networks. In particular, we prove formal relations between quantitative network measures based on Shannon's entropy to study the relatedness of those measures. In order to establish such information inequalities for graphs, we focus on graph entropy measures based on information functionals. To prove such relations, we use known graph classes whose instances have been proven useful in various scientific areas. Our results extend the foregoing work on information inequalities for graphs.
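
A graph entropy based on an information functional assigns each vertex a probability proportional to a functional value and then takes the Shannon entropy of that distribution. The sketch below uses vertex degree as the functional, which is an assumption chosen for simplicity; the article may study other functionals. The two small graphs are hypothetical examples.

```python
import math

def degree_entropy(adj):
    """Shannon entropy (bits) of p(v) = deg(v) / sum of all degrees."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    total = sum(degrees)
    probs = [d / total for d in degrees if d > 0]
    return -sum(p * math.log2(p) for p in probs)

# A 4-cycle (every vertex has degree 2) and a star with three leaves.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

h_cycle = degree_entropy(cycle4)
h_star = degree_entropy(star4)
```

The regular graph attains the maximum log2(4) = 2 bits, while the star falls below it; comparing such values across graph classes is the simplest instance of an information inequality.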

14.

Detecting hierarchical structure in molecular characteristics of disease using transitive approximations of directed graphs

Jacob J Jentsch M Kostka D Bentink S Spang R 《Bioinformatics (Oxford, England)》2008,24(7):995-1001

MOTIVATION: Molecular diagnostics aims at classifying diseases into clinically relevant sub-entities based on molecular characteristics. Typically, the entities are split into subgroups, which might contain several variants, yielding a hierarchical model of the disease. Recent years have introduced a plethora of new molecular screening technologies to molecular diagnostics. As a result, molecular profiles of patients have become complex and the classification task more difficult. RESULTS: We present a novel tool for detecting hierarchical structure in binary datasets. We aim to identify molecular characteristics that stochastically imply other characteristics. The final hierarchical structure is encoded in a directed transitive graph where nodes represent molecular characteristics and a directed edge from a node A to a node B denotes that almost all cases with characteristic B also display characteristic A. Naturally, these graphs need to be transitive. At the core of our modeling approach lies the problem of calculating good transitive approximations of given directed but not necessarily transitive graphs. By a good transitive approximation we mean a transitive graph that differs from the reference graph in only a small number of edges. It is known that the problem of finding an optimal transitive approximation is NP-complete. Here we develop an efficient heuristic for generating good transitive approximations. We evaluate the computational efficiency of the algorithm in simulations, and demonstrate its use in the context of a large genome-wide study on mature aggressive lymphomas. AVAILABILITY: The software used in our analysis is freely available from http://compdiag.uni-regensburg.de/software/transApproxs.shtml.
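
The quantity being minimized, the number of edges by which a transitive graph differs from the reference graph, is easy to compute for any candidate. The sketch below scores two candidates for a hypothetical four-node chain: the transitive closure (computed with Warshall's algorithm) and a graph obtained by deleting one edge. It is not the authors' heuristic; it only shows that the closure, the most obvious transitive graph, need not be the closest one.

```python
def transitive_closure(nodes, edges):
    """Warshall's algorithm on a set of directed edges (u, v)."""
    reach = set(edges)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach

def is_transitive(edges):
    """True if a->b and b->d always imply a->d."""
    return all(
        (a, d) in edges for (a, b) in edges for (c, d) in edges if b == c
    )

def edit_distance(g1, g2):
    """Number of edges present in exactly one of the two graphs."""
    return len(g1 ^ g2)

nodes = ["A", "B", "C", "D"]
reference = {("A", "B"), ("B", "C"), ("C", "D")}  # hypothetical chain

closure = transitive_closure(nodes, reference)
pruned = reference - {("B", "C")}  # alternative: delete the middle edge

cost_closure = edit_distance(reference, closure)
cost_pruned = edit_distance(reference, pruned)
```

The closure has to add three edges, whereas deleting the middle edge yields a transitive graph at a cost of one, which is why the search over approximations is a genuine optimization problem.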

15.

Clustering gene expression data using graph separators

Kaba B Pinet N Lelandais G Sigayret A Berry A 《In silico biology》2007,7(4-5):433-452

Recent work has used graphs to model expression data from microarray experiments, with a view to partitioning the genes into clusters. In this paper, we introduce the use of a decomposition by clique separators. Our aim is to improve the classical clustering methods in two ways: first, we want to allow an overlap between clusters, as this seems biologically sound, and second, we want to be guided by the structure of the graph to define the number of clusters. We test this approach with a well-known yeast database (Saccharomyces cerevisiae). Our results are good, as the expression profiles of the clusters we find are very coherent. Moreover, we are able to organize the clusters we find into another graph, and order them in a fashion which turns out to respect the chronological order defined by the sporulation process.

16.

Predicting miRNA-disease associations based on graph autoencoders and co-training

刘立伟 刘晓兰 谭者斌 《Chinese Journal of Bioinformatics (生物信息学)》2024,22(2):116-123

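In recent years, a growing number of biological experimental studies have shown that microRNAs (miRNAs) play an important role in the development of complex human diseases. Predicting associations between miRNAs and diseases therefore contributes to accurate diagnosis and effective treatment. Because traditional biological experiments are expensive and time-consuming, many computational models based on biological data have been proposed to predict miRNA-disease associations. This study proposes an end-to-end deep learning model, called MDAGAC, for predicting miRNA-disease associations. First, similarity graphs for miRNAs and diseases are constructed by integrating disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. Then, graph autoencoders and co-training are used to improve the effect of label propagation. The model builds two graph autoencoders, one on the miRNA graph and one on the disease graph, and trains them collaboratively. The graph autoencoders on the miRNA graph and the disease graph reconstruct the score matrix from the initial association matrix, which is equivalent to propagating labels on the graph. The predicted probabilities of miRNA-disease associations are obtained from the score matrix. Experimental results based on five-fold cross-validation show that MDAGAC is reliable and effective and outperforms several existing methods for predicting miRNA-disease associations.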

17.

Tetramer protein complex interface residue pairs prediction with LSTM combined with graph representations

《Biochimica et Biophysica Acta - Proteins and Proteomics》2020,1868(11):140504

MOTIVATION: Protein-protein interactions are important for many biological processes. A theoretical understanding of the structural factors that determine interaction sites will help to explain the underlying mechanism of protein-protein interactions, and advanced mathematical methods that correctly predict interaction sites would be useful. Although some previous studies have addressed the interaction interface of protein monomers and the interface residues between the chains of protein dimers, very few have addressed interface residue prediction for protein multimers, including trimers, tetramers, and larger complexes. Many proteins function as multibody protein complexes, and the structural complexity of protein multimers makes interface residue prediction difficult. We therefore set out to build a method for predicting protein tetramer interface residue pairs. RESULTS: We developed a new deep network that combines an LSTM network with graph representations to predict interface residue pairs in protein tetramers. Protein structure data differ from image or video data, which are well-arranged matrices (the Euclidean structure mentioned in many studies). Because non-Euclidean data do not preserve translation invariance, and because we wished to extract spatial features from such data for deep learning, we developed an algorithm that predicts interface residue pairs from a topological graph, which relates vertices and edges as in graph theory, combined with a multilayer Long Short-Term Memory network. First, we selected training and test samples from the Protein Data Bank and extracted the physicochemical and geometric features of surface residues associated with interfacial properties. Subsequently, we transformed the protein multimer data into topological graphs and predicted interface residue pairs using the model. Several types of evaluation indicators verified its validity.

18.

A novel analytical method for evolutionary graph theory problems

Paulo Shakarian Patrick Roos Geoffrey Moores 《Bio Systems》2013

Evolutionary graph theory studies the evolutionary dynamics of populations structured on graphs. A central problem is determining the probability that a small number of mutants overtake a population. Currently, Monte Carlo simulations are used for estimating such fixation probabilities on general directed graphs, since no good analytical methods exist. In this paper, we introduce a novel deterministic framework for computing fixation probabilities for strongly connected, directed, weighted evolutionary graphs under neutral drift. We show how this framework can also be used to calculate the expected number of mutants at a given time step (even if we relax the assumption that the graph is strongly connected), how it can extend to other related models (e.g. the voter model), how it can provide non-trivial bounds for fixation probability in the case of an advantageous mutant, and how it can be used to find a non-trivial lower bound on the mean time to fixation. We provide various experimental results determining fixation probabilities and expected number of mutants on different graphs. Among these, we show that our method consistently outperforms Monte Carlo simulations in speed by several orders of magnitude. Finally, we show how our approach can provide insight into synaptic competition in neurology.

19.

An empirical Bayes approach to inferring large-scale gene association networks. Cited by: 8 (self-citations: 0; citations by others: 8)

Schäfer J Strimmer K 《Bioinformatics (Oxford, England)》2005,21(6):754-764

MOTIVATION: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem. METHODS: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degrees of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. RESULTS: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
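
Edge selection by false discovery rate control, as in step (3), can be illustrated with the Benjamini-Hochberg step-up procedure. This is a stand-in chosen because it is the most widely known FDR method; the abstract does not say which FDR procedure the authors use, and the p-values and the level `q` below are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Find the largest rank whose p-value lies under the step-up line.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical p-values, one per candidate edge of the network.
edge_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
selected_edges = benjamini_hochberg(edge_pvalues, q=0.05)
```

With p(p-1)/2 candidate edges for p genes, testing each edge at a fixed per-test level would admit many false edges, which is the reason for controlling the false discovery rate across all edges at once.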

20.

Adjusted regularization of cortical covariance

Giuseppe Vinci Valérie Ventura Matthew A. Smith Robert E. Kass 《Journal of computational neuroscience》2018,45(2):83-101

It is now common to record dozens to hundreds or more neurons simultaneously, and to ask how the network activity changes across experimental conditions. A natural framework for addressing questions of functional connectivity is to apply Gaussian graphical modeling to neural data, where each edge in the graph corresponds to a non-zero partial correlation between neurons. Because the number of possible edges is large, one strategy for estimating the graph has been to apply methods that aim to identify large sparse effects using an \(L_{1}\) penalty. However, the partial correlations found in neural spike count data are neither large nor sparse, so techniques that perform well in sparse settings will typically perform poorly in the context of neural spike count data. Fortunately, the correlated firing for any pair of cortical neurons depends strongly on both their distance apart and the features for which they are tuned. We introduce a method that takes advantage of these known, strong effects by allowing the penalty to depend on them: thus, for example, the connection between pairs of neurons that are close together will be penalized less than pairs that are far apart. We show through simulations that this physiologically-motivated procedure performs substantially better than off-the-shelf generic tools, and we illustrate by applying the methodology to populations of neurons recorded with multielectrode arrays implanted in macaque visual cortex areas V1 and V4.