1.
Community structure detection is an important tool in graph analysis. This can be done, among other ways, by solving for the partition set which optimizes the modularity score. Here it is shown that topological constraints in correlation graphs induce over-fragmentation of community structures. A refinement step to this optimization based on Linear Discriminant Analysis (LDA) and a statistical test for significance is proposed. In structured simulation constrained by topology, this novel approach performs better than the optimization of modularity alone. This method was also tested with two empirical datasets: the Roll-Call voting in the 110th US Senate constrained by geographic adjacency, and a biological dataset of 135 protein structures constrained by inter-residue contacts. The former dataset showed sub-structures in the communities that revealed a regional bias in the votes that transcends party affiliations. This is an interesting pattern given that the 110th Legislature was assumed to be a highly polarized government. The -amylase catalytic domain dataset (biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints differed from the topology-constrained ones, but the LDA filtering did not change the outcome of the latter. This suggests that the LDA filtering is a robust way to solve the possible over-fragmentation when present, and that this method will not affect the results where there is no evidence of over-fragmentation.
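For reference, the modularity score mentioned in the abstract above is commonly written in the Newman-Girvan form below. This is the standard textbook definition and may not be the exact variant the authors optimize; the symbols $A_{ij}$, $k_i$, $m$, and $c_i$ are notation introduced here, not taken from the abstract.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Here $A_{ij}$ is the (possibly weighted) adjacency matrix, $k_i = \sum_j A_{ij}$ is the degree of node $i$, $m = \tfrac{1}{2}\sum_{i,j} A_{ij}$ is the total edge weight, and $\delta(c_i, c_j) = 1$ when nodes $i$ and $j$ are assigned to the same community and $0$ otherwise.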
2.
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It does not make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
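The decomposition described in this abstract (one regression problem per target gene, importance scores pooled into a single ranking) can be sketched roughly as follows. This is not GENIE3 itself: to stay dependency-free, the sketch swaps the Random Forests or Extra-Trees ensembles for a single one-split regression stump per input gene, which scores each input separately rather than jointly. The function names `stump_score` and `rank_links` and the `toy` dataset are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def stump_score(x, y):
    """Fraction of the variance of y removed by the best single threshold on x.

    Stand-in (assumption) for a tree-ensemble importance score.
    """
    total = variance(y)
    if total == 0:
        return 0.0
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(ys)
    best = 0.0
    for cut in range(1, n):
        if xs[cut - 1] == xs[cut]:
            continue  # no threshold can separate equal input values
        left, right = ys[:cut], ys[cut:]
        within = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, total - within)
    return best / total


def rank_links(expression):
    """expression: list of samples, each a list of p gene expression values.

    Solves one scoring problem per target gene and returns all directed
    (score, regulator, target) links, strongest first.
    """
    p = len(expression[0])
    columns = [[sample[g] for sample in expression] for g in range(p)]
    links = []
    for target in range(p):
        for regulator in range(p):
            if regulator != target:
                score = stump_score(columns[regulator], columns[target])
                links.append((score, regulator, target))
    return sorted(links, reverse=True)


# Hypothetical toy data: gene 1 switches on when gene 0 exceeds 0.5,
# and gene 2 follows an unrelated pattern.
toy = []
for i in range(40):
    g0 = i / 39
    g1 = 1.0 if g0 > 0.5 else 0.0
    g2 = ((i * 7) % 11) / 10
    toy.append([g0, g1, g2])

links = rank_links(toy)
print(links[0])  # strongest putative link as (score, regulator, target)
```

On this toy data the top-ranked link should point from gene 0 to gene 1, and the ranking is directed: the reverse link scores lower because a threshold on gene 1 explains only part of the variation in gene 0. Dividing each score by the target's total variance is one simple way to make scores comparable across targets before pooling them.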
3.
Background
The transmission networks of Plasmodium vivax characterize how the parasite transmits from one location to another, information that helps public health policy makers accurately predict the patterns of its geographical spread. However, such networks are not apparent from surveillance data because P. vivax transmission can be affected by many factors, such as the biological characteristics of mosquitoes and the mobility of human beings. Here, we pay special attention to the problem of how to infer the underlying transmission networks of P. vivax based on available tempo-spatial patterns of reported cases.
Methodology
We first define a spatial transmission model, which involves representing both the heterogeneous transmission potential of P. vivax at individual locations and the mobility of infected populations among different locations. Based on the proposed transmission model, we further introduce a recurrent neural network model to infer the transmission networks from surveillance data. Specifically, in this model, we take into account multiple real-world factors, including the length of P. vivax incubation period, the impact of malaria control at different locations, and the total number of imported cases.
Principal Findings
We implement our proposed models by focusing on the P. vivax transmission among 62 towns in Yunnan province, People's Republic of China, which have been experiencing high malaria transmission in the past years. By conducting scenario analysis with respect to different numbers of imported cases, we can (i) infer the underlying P. vivax transmission networks, (ii) estimate the number of imported cases for each individual town, and (iii) quantify the roles of individual towns in the geographical spread of P. vivax.
Conclusion
The demonstrated models have presented a general means for inferring the underlying transmission networks from surveillance data. The inferred networks will offer new insights into how to improve the predictability of P. vivax transmission.
4.
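Interaction-based protein function prediction  Total citations: 1, self-citations: 0, citations by others: 1
Protein function prediction is a hot topic in post-genomic research. Prediction methods based on protein interactions are now widely used, but their accuracy is low when the number k of interacting partners is small. Starting from the protein interaction network and exploiting its small-world property, this work effectively addresses the low accuracy at small k. In predictions on the yeast (Saccharomyces cerevisiae) protein interaction network, accuracy for k≤4 improved somewhat over the GO (global optimization) method under the same conditions. The experimental results show that the method can be applied effectively to protein function prediction when the number of interacting partners is small.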
5.
The neural patterns recorded during a neuroscientific experiment reflect complex interactions between many brain regions, each comprising millions of neurons. However, the measurements themselves are typically abstracted from that underlying structure. For example, functional magnetic resonance imaging (fMRI) datasets comprise a time series of three-dimensional images, where each voxel in an image (roughly) reflects the activity of the brain structure(s) located at the corresponding point in space at the time the image was collected. fMRI data often exhibit strong spatial correlations, whereby nearby voxels behave similarly over time as the underlying brain structure modulates its activity. Here we develop topographic factor analysis (TFA), a technique that exploits spatial correlations in fMRI data to recover the underlying structure that the images reflect. Specifically, TFA casts each brain image as a weighted sum of spatial functions. The parameters of those spatial functions, which may be learned by applying TFA to an fMRI dataset, reveal the locations and sizes of the brain structures activated while the data were collected, as well as the interactions between those structures.
6.
Regina H. Magierowski Steve M. Read Steven J. B. Carter Danielle M. Warfe Laurie S. Cook Edward C. Lefroy Peter E. Davies 《PloS one》2015,10(3)
Identifying land-use drivers of changes in river condition is complicated by spatial scale, geomorphological context, land management, and correlations among responding variables such as nutrients and sediments. Furthermore, variations in standard metrics, such as substratum composition, do not necessarily relate causally to ecological impacts. Consequently, the absence of a significant relationship between a hypothesised driver and a dependent variable does not necessarily indicate the absence of a causal relationship. We conducted a gradient survey to identify impacts of catchment-scale grazing by domestic livestock on river macroinvertebrate communities. A standard correlative approach showed that community structure was strongly related to the upstream catchment area under grazing. We then used data from a stream mesocosm experiment that independently quantified the impacts of nutrients and fine sediments on macroinvertebrate communities to train artificial neural networks (ANNs) to assess the relative influence of nutrients and fine sediments on the survey sites from their community composition. The ANNs developed to predict nutrient impacts did not find a relationship between nutrients and catchment area under grazing, suggesting that nutrients were not an important factor mediating grazing impacts on community composition, or that these ANNs had no generality or insufficient power at the landscape-scale. In contrast, ANNs trained to predict the impacts of fine sediments indicated a significant relationship between fine sediments and catchment area under grazing. Macroinvertebrate communities at sites with a high proportion of land under grazing were thus more similar to those resulting from high fine sediments in a mesocosm experiment than to those resulting from high nutrients. 
Our study confirms that 1) fine sediment is an important mediator of land-use impacts on river macroinvertebrate communities, 2) ANNs can successfully identify subtle effects and separate the effects of correlated variables, and 3) data from small-scale experiments can generate relationships that help explain landscape-scale patterns.
7.
The paper is concerned with methods for the estimation of the coalescence time (time since the most recent common ancestor) of a sample of intraspecies DNA sequences. The methods take advantage of prior knowledge of population demography, in addition to the molecular data. While some theoretical results are presented, a central focus is on computational methods. These methods are easy to implement, and, since explicit formulae tend to be either unavailable or unilluminating, they are also more useful and more informative in most applications. Extensions are presented that allow for the effects of uncertainty in our knowledge of population size and mutation rates, for variability in population sizes, for regions of different mutation rate, and for inference concerning the coalescence time of the entire population. The methods are illustrated using recent data from the human Y chromosome.
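As background for the coalescence time discussed above, the simplest textbook baseline is the standard neutral coalescent with constant population size. This is an assumed reference case, not the demography-aware method of the paper. With time measured in units of $N$ generations, where $N$ is the number of gene copies in the population, the waiting time $T_k$ while $k$ lineages remain is exponential with rate $\binom{k}{2}$, so for a sample of $n$ sequences:

```latex
\mathbb{E}[T_{\mathrm{MRCA}}]
  = \sum_{k=2}^{n} \mathbb{E}[T_k]
  = \sum_{k=2}^{n} \frac{2}{k(k-1)}
  = 2\left(1 - \frac{1}{n}\right)
```

The sum telescopes because $\frac{2}{k(k-1)} = 2\left(\frac{1}{k-1} - \frac{1}{k}\right)$. The expectation approaches 2 as $n$ grows, and half of it is spent waiting for the last two lineages to coalesce ($\mathbb{E}[T_2] = 1$).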
8.
Joseph T. Wu Kathy Leung Ranawaka A. P. M. Perera Daniel K. W. Chu Cheuk Kwong Lee Ivan F. N. Hung Che Kit Lin Su-Vui Lo Yu-Lung Lau Gabriel M. Leung Benjamin J. Cowling J. S. Malik Peiris 《PLoS pathogens》2014,10(4)
Seroprevalence survey is the most practical method for accurately estimating infection attack rate (IAR) in an epidemic such as influenza. These studies typically entail selecting an arbitrary titer threshold for seropositivity (e.g. microneutralization [MN] 1∶40) and assuming the probability of seropositivity given infection (infection-seropositivity probability, ISP) is 100% or similar to that among clinical cases. We hypothesize that such conventions are not necessarily robust because different thresholds may result in different IAR estimates and serologic responses of clinical cases may not be representative. To illustrate our hypothesis, we used an age-structured transmission model to fully characterize the transmission dynamics and seroprevalence rises of 2009 influenza pandemic A/H1N1 (pdmH1N1) during its first wave in Hong Kong. We estimated that while 99% of pdmH1N1 infections became MN1∶20 seropositive, only 72%, 62%, 58% and 34% of infections among age 3–12, 13–19, 20–29, 30–59 became MN1∶40 seropositive, which was much lower than the 90%–100% observed among clinical cases. The fitted model was consistent with prevailing consensus on pdmH1N1 transmission characteristics (e.g. initial reproductive number of 1.28 and mean generation time of 2.4 days which were within the consensus range), hence our ISP estimates were consistent with the transmission dynamics and temporal buildup of population-level immunity. IAR estimates in influenza seroprevalence studies are sensitive to seropositivity thresholds and ISP adjustments which in current practice are mostly chosen based on conventions instead of systematic criteria. Our results thus highlighted the need for reexamining conventional practice to develop standards for analyzing influenza serologic data (e.g. 
real-time assessment of bias in ISP adjustments by evaluating the consistency of IAR across multiple thresholds and with mixture models), especially in the context of pandemics when robustness and comparability of IAR estimates are most needed for informing situational awareness and risk assessment. The same principles are broadly applicable for seroprevalence studies of other infectious disease outbreaks.
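To make the sensitivity to ISP concrete, here is a small arithmetic sketch. It assumes the simple adjustment in which the infection attack rate is the observed rise in seroprevalence divided by the ISP; the abstract does not spell out the authors' estimator, so treat this relation as an assumption. The 10% seroprevalence rise is a hypothetical input, while the 72% ISP is the value the abstract reports for ages 3–12 at MN1∶40. The function name `adjusted_attack_rate` is illustrative.

```python
def adjusted_attack_rate(seroprevalence_rise, isp):
    """Estimate the infection attack rate from a rise in seroprevalence.

    Assumes IAR = seroprevalence rise / ISP, where ISP is the probability
    that an infected person becomes seropositive at the chosen titer threshold.
    """
    if not 0 < isp <= 1:
        raise ValueError("ISP must be in (0, 1]")
    return seroprevalence_rise / isp


rise = 0.10  # hypothetical observed rise in seroprevalence

naive = adjusted_attack_rate(rise, 1.0)      # conventional assumption: ISP = 100%
adjusted = adjusted_attack_rate(rise, 0.72)  # ISP reported for ages 3-12 at MN 1:40

print(f"IAR assuming ISP = 100%: {naive:.3f}")
print(f"IAR assuming ISP = 72%:  {adjusted:.3f}")
```

Under these assumptions, the same serologic data imply an attack rate of about 14% instead of 10%, which illustrates why the choice of threshold and ISP matters.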
9.
The form of an organism is the combination of its size and its shape. For a sample of forms, biologists wish to characterize both mean form and the variation in form. For geometric data, where form is characterized as the spatial locations of homologous points, the first step in analysis superimposes the forms, which requires an assumption about what measure of size is appropriate. Geometric morphometrics adopts centroid size as the natural measure of size, and assumes that variation around the mean form is isometric with size. These assumptions limit the interpretation of the resulting estimates of mean and variance in form. We illustrate these problems using allometric variation in shape. We show that superimposition based on subsets of relatively isometric points can yield superior inferences about the overall pattern of variation. We propose and demonstrate two superimposition techniques based on this idea. In subset superimposition, landmarks are progressively discarded from the data used for superimposition if they result in significant decreases in the variation among the remaining landmarks. In outline superimposition, regularly distributed pseudolandmarks on the continuous outline of a form are used as the basis for superimposition of the landmarks contained within it. Simulations show that these techniques can result in dramatic improvements in the accuracy of estimated variance-covariance matrices among landmarks when our assumptions are roughly satisfied. The pattern of variation inferred by means of our superimposition techniques can be quite different from that recovered from full generalized Procrustes superimposition. The pattern of shape variation in the wings of drosophilid flies appears to meet these assumptions. Adoption of superimposition procedures that incorporate biological assumptions about the nature of size and of the variation in shape can dramatically improve the ability to infer the pattern of variation in geometric morphometric data. 
10.
Arvind Rao Alfred O Hero III David J States James Douglas Engel 《EURASIP Journal on Bioinformatics and Systems Biology》2007,2007(1):51947
Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster, to infer a network adjacency matrix. Finally, we report our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.
11.
12.
Extracting network-based functional relationships within genomic datasets is an important challenge in the computational analysis of large-scale data. Although many methods, both public and commercial, have been developed, the problem of identifying networks of interactions that are most relevant to the given input data still remains an open issue. Here, we have leveraged the method of random walks on graphs as a powerful platform for scoring network components based on simultaneous assessment of the experimental data as well as local network connectivity. Using this method, NetWalk, we can calculate the distribution of Edge Flux values associated with each interaction in the network, which reflects the relevance of interactions based on the experimental data. We show that network-based analyses of genomic data are simpler and more accurate using NetWalk than with some of the currently employed methods. We also present NetWalk analysis of microarray gene expression data from MCF7 cells exposed to different doses of doxorubicin, which reveals a switch-like pattern in the p53 regulated network in cell cycle arrest and apoptosis. Our analyses demonstrate the use of NetWalk as a valuable tool in generating high-confidence hypotheses from high-content genomic data.
13.
Signalling network inference is a central problem in systems biology. Previous studies investigate this problem by independently inferring local signalling networks and then linking them together via crosstalk. Since a cellular signalling system is in fact indivisible, this reductionistic approach may have an impact on the accuracy of the inference results. Preferably, a cell-scale signalling network should be inferred as a whole. However, the holistic approach suffers from three practical issues: scalability, measurement and overfitting. Here we make this approach feasible based on two key observations: 1) variations of concentrations are sparse due to separations of timescales; 2) several species can be measured together using cross-reactivity. We propose a method, CCELL, for cell-scale signalling network inference from time series generated by immunoprecipitation using Bayesian compressive sensing. A set of benchmark networks with varying numbers of time-variant species is used to demonstrate the effectiveness of our method. Instead of exhaustively measuring all individual species, high accuracy is achieved from relatively few measurements.
14.
Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a social network.
15.
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene–gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
16.
17.
Researchers have recently paid attention to social contact patterns among individuals due to their useful applications in such areas as epidemic evaluation and control, public health decisions, chronic disease research and social network research. Although some studies have estimated social contact patterns from social networks and surveys, few have considered how to infer the hierarchical structure of social contacts directly from census data. In this paper, we focus on inferring an individual’s social contact patterns from detailed census data, and generate various types of social contact patterns such as hierarchical-district-structure-based, cross-district and age-district-based patterns. We evaluate newly generated contact patterns derived from detailed 2011 Hong Kong census data by incorporating them into a model and simulation of the 2009 Hong Kong H1N1 epidemic. We then compare the newly generated social contact patterns with the mixing patterns that are often used in the literature, and draw the following conclusions. First, the generation of social contact patterns based on a hierarchical district structure allows for simulations at different district levels. Second, the newly generated social contact patterns reflect individuals’ social contacts. Third, the newly generated social contact patterns improve the accuracy of the SEIR-based epidemic model.
18.
Inferring Behavioral States of Grazing Livestock from High-Frequency Position Data Alone  Total citations: 1, self-citations: 0, citations by others: 1
Studies of animal behavior are crucial to understanding animal-ecosystem interactions, but require substantial efforts in visual observation or sensor measurement. We investigated how classifying behavioral states of grazing livestock using global positioning data alone depends on the classification approach, the preselection of training data, and the number and type of movement metrics. Positions of grazing cows were collected at intervals of 20 seconds in six upland areas in Switzerland along with visual observations of animal behavior for comparison. A total of 87 linear and cumulative distance metrics and 15 turning angle metrics across multiple time steps were used to classify position data into the behavioral states of walking, grazing, and resting. Five random forest classification models, a linear discriminant analysis, a support vector machine, and a state-space model were evaluated. The most accurate classification of the observed behavioral states in an independent validation dataset was 83%, obtained using random forest with all available movement metrics. However, the state-specific accuracy was highly unequal (walking: 36%, grazing: 95%, resting: 58%). Random undersampling led to a prediction accuracy of 77%, with more balanced state-specific accuracies (walking: 68%, grazing: 82%, resting: 68%). The other evaluated machine-learning approaches had lower classification accuracies. The state-space model, based on distance to the preceding position and turning angle, produced a relatively low accuracy of 64%, slightly lower than a random forest model with the same predictor variables. Given the successful classification of behavioral states, our study promotes the more frequent use of global positioning data alone for animal behavior studies under the condition that data is collected at high frequency and complemented by context-specific behavioral observations. 
Machine-learning algorithms, notably random forest, were found very useful for classification and easy to implement. Moreover, the use of measures across multiple time steps is clearly necessary for a satisfactory classification.
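The two families of movement metrics named in this abstract, distances between successive positions and turning angles, can be computed from a track roughly as follows. This is a minimal sketch that assumes positions are already projected to planar coordinates in metres; the function names and the sample track are hypothetical, and the study's 87 distance and 15 turning-angle metrics over multiple time steps are not reproduced here.

```python
import math


def step_distances(points):
    """Straight-line distance between each pair of consecutive fixes."""
    return [math.hypot(b[0] - a[0], b[1] - a[1])
            for a, b in zip(points, points[1:])]


def turning_angles(points):
    """Signed change in heading, in degrees within [-180, 180), at each interior fix.

    Positive values are counter-clockwise (left) turns.
    """
    headings = [math.atan2(b[1] - a[1], b[0] - a[0])
                for a, b in zip(points, points[1:])]
    angles = []
    for h1, h2 in zip(headings, headings[1:]):
        turn = math.degrees(h2 - h1)
        angles.append((turn + 180.0) % 360.0 - 180.0)  # wrap into [-180, 180)
    return angles


def cumulative_distance(points, start, steps):
    """Path length covered over `steps` consecutive intervals from index `start`."""
    return sum(step_distances(points[start:start + steps + 1]))


# Hypothetical track: one fix every 20 seconds, coordinates in metres.
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(step_distances(track))
print(turning_angles(track))
print(cumulative_distance(track, 0, 3))
```

Per-fix features like these, computed over windows of several time steps, are the kind of input a classifier such as a random forest would receive. Short steps with frequent large turns might suggest grazing and long straight steps walking, though that mapping is an illustrative guess, not a result from the study.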
19.
Verena D. Schmittmann Sara Jahfari Denny Borsboom Alexander O. Savi Lourens J. Waldorp 《PloS one》2015,10(9)
Pairwise correlations are currently a popular way to estimate a large-scale network (> 1000 nodes) from functional magnetic resonance imaging data. However, this approach generally results in a poor representation of the true underlying network. The reason is that pairwise correlations cannot distinguish between direct and indirect connectivity. As a result, pairwise correlation networks can lead to fallacious conclusions; for example, one may conclude that a network is a small-world when it is not. In a simulation study and an application to resting-state fMRI data, we compare the performance of pairwise correlations in large-scale networks (2000 nodes) against three other methods that are designed to filter out indirect connections. Recovery methods are evaluated in four simulated network topologies (small world or not, scale-free or not) in scenarios where the number of observations is very small compared to the number of nodes. Simulations clearly show that pairwise correlation networks are fragmented into separate unconnected components with excessive connectedness within components. This often leads to erroneous estimates of network metrics, like small-world structures or low betweenness centrality, and produces too many low-degree nodes. We conclude that using partial correlations, informed by a sparseness penalty, results in more accurate networks and corresponding metrics than pairwise correlation networks. However, even with these methods, the presence of hubs in the generating network can be problematic if the number of observations is too small. Additionally, we show for resting-state fMRI that partial correlations are more robust than correlations to different parcellation sets and to different lengths of time-series.
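The distinction between direct and indirect connectivity can be illustrated with the smallest possible case: three nodes in a chain X, Y, Z, where X and Z are linked only through Y. The sketch below uses the standard first-order partial correlation formula for three variables; it omits the sparseness penalty the abstract recommends, and the correlation values are hypothetical.

```python
import math


def partial_correlation(r_xz, r_xy, r_yz):
    """Correlation between X and Z after removing the linear effect of Y."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


# Hypothetical chain X - Y - Z: each direct link has correlation 0.8, and
# X and Z are related only through Y, so their correlation is 0.8 * 0.8.
r_xy = 0.8
r_yz = 0.8
r_xz = r_xy * r_yz

print(f"pairwise correlation X-Z: {r_xz:.2f}")
print(f"partial correlation X-Z given Y: {partial_correlation(r_xz, r_xy, r_yz):.2f}")
```

A pairwise correlation network thresholded at, say, 0.5 would draw a spurious X-Z edge here, while the partial correlation is zero once Y is conditioned on. With thousands of nodes and few observations, the analogous quantity comes from a regularized estimate of the inverse covariance matrix, not from this closed-form three-variable expression.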