1.

A Spark-based genetic algorithm for sensor placement in large scale drinking water distribution systems

Chengyu Hu Guo Ren Chao Liu Ming Li Wei Jie 《Cluster computing》2017,20(2):1089-1099

Water pollution incidents have occurred frequently in recent years, causing severe damage, economic loss, and long-lasting societal impact. A viable solution is to install water quality monitoring sensors in water supply networks (WSNs) for real-time pollution detection, thereby mitigating the risk of catastrophic contamination incidents. Given the significant cost of placing sensors at all locations in a network, a critical issue is where to deploy sensors within WSNs while still achieving rapid detection of contamination events. Existing studies have mainly focused on sensor placement in water distribution systems (WDSs). However, the problem is still not adequately addressed, especially for large-scale WSNs. In this paper, we investigate the sensor placement problem in large-scale WDSs with the objective of minimizing the impact of contamination events. Specifically, we propose a two-phase Spark-based genetic algorithm (SGA). Experimental results show that SGA outperforms other traditional algorithms in both accuracy and efficiency, which validates the feasibility and effectiveness of our proposed approach.

2.

Multi-way clustering of microarray data using probabilistic sparse matrix factorization

Dueck D Morris QD Frey BJ 《Bioinformatics (Oxford, England)》2005,21(Z1):i144-i151

MOTIVATION: We address the problem of multi-way clustering of microarray data using a generative model. Our algorithm, probabilistic sparse matrix factorization (PSMF), is a probabilistic extension of a previous hard-decision algorithm for this problem. PSMF allows for varying levels of sensor noise in the data, uncertainty in the hidden prototypes used to explain the data and uncertainty as to the prototypes selected to explain each data vector. RESULTS: We present experimental results demonstrating that our method can better recover functionally-relevant clusterings in mRNA expression data than standard clustering techniques, including hierarchical agglomerative clustering, and we show that by computing probabilities instead of point estimates, our method avoids converging to poor solutions.

3.

Algorithms for energy efficient mobile object tracking in wireless sensor networks

Li Liu Bin Hu Lian Li 《Cluster computing》2010,13(2):181-197

Wireless sensor networks (WSNs) have found more and more applications in a variety of pervasive computing environments, where they serve as the data acquisition layer for pervasive applications. However, achieving better performance for data acquisition over WSNs remains a nontrivial and challenging task. Network lifetime and application requirements are two fundamental, yet conflicting, design objectives in wireless sensor networks for tracking mobile objects. The application requirement is often correlated with the delay time within which the application can send its sensing data back to the users in tracking networks. In this paper we study the network lifetime maximization problem and the delay time minimization problem together. To make both problems tractable, we assume that each sensor node keeps working once it is turned on. We formulate the network lifetime maximization problem as maximizing the number of sensor nodes that are never turned on, and the delay time minimization problem as minimizing the routing path length, subject to achieving the required tracking tasks. Since we prove the problems are NP-complete and APX-complete, we propose three heuristic algorithms to solve them. We present several experiments that show the advantages and disadvantages of these three algorithms with respect to network lifetime and delay time on three models: random graphs, grids and hypercubes. Furthermore, we implement distributed versions of these algorithms.

4.

Link Community Detection Using Generative Model and Nonnegative Matrix Factorization

Dongxiao He Di Jin Carlos Baquero Dayou Liu 《PloS one》2014,9(1)

Discovery of communities in complex networks is a fundamental data analysis problem with applications in various domains. While most of the existing approaches have focused on discovering communities of nodes, recent studies have shown the advantages and uses of link community discovery in networks. Generative models provide a promising class of techniques for the identification of modular structures in networks, but most generative models mainly focus on the detection of node communities rather than link communities. In this work, we propose a generative model, which is based on the importance of each node when forming links in each community, to describe the structure of link communities. We proceed to fit the model parameters by taking it as an optimization problem, and solve it using nonnegative matrix factorization. Thereafter, in order to automatically determine the number of communities, we extend the above method by introducing a strategy of iterative bipartition. This extended method not only finds the number of communities all by itself, but also obtains high efficiency, and thus it is more suitable to deal with large and unexplored real networks. We test this approach on both synthetic benchmarks and real-world networks including an application on a large biological network, and compare it with two highly related methods. Results demonstrate the superior performance of our approach over competing methods for the detection of link communities.

5.

Distance estimation by mining characteristics in anisotropic sensor networks

Kai Li Yun Wang 《Cluster computing》2010,13(2):167-180

Localization is useful for many position-dependent applications in wireless sensor networks, where distance estimation from sensor nodes to beacon nodes plays a fundamental role. Most current ranging methods rely on an assumption that deployed WSNs are isotropic. Hence, adjustments on measured distances are the same in all directions. Unfortunately, this assumption does not hold in practice. Present methods introduce such great ranging errors that they are not feasible for real applications. In order to obtain better distance estimation in anisotropic WSNs, we propose a new metric, Dominating Degree, to describe the local deployment characteristics of sensor nodes, and to identify turning nodes along paths. We further propose a method to scale deployment irregularities of WSNs as global characteristics. Finally, appropriate adjustments to distance measurements are performed by synthesizing both local and global characteristics. Simulation results show that the proposed method outperforms PDM and DV-distance especially when beacon nodes are not deployed uniformly.

6.

CogTime_RMF: regularized matrix factorization with drifting cognition degree for collaborative filtering

JieMin Chen Feiyi Tang Jing Xiao JianGuo Li Jing He Yong Tang 《Cluster computing》2016,19(2):821-835

Due to the exponential growth of information, recommender systems have become a widely used technique for tackling information overload. Collaborative filtering (CF) is the most successful and extensively employed recommendation approach. However, current CF methods recommend items mainly from a user-item matrix that contains the individual preferences of users for items in a collection, so they suffer from problems such as the sparsity of the available data and low prediction accuracy. To address these issues, borrowing the idea of cognition degree from cognitive psychology and employing regularized matrix factorization (RMF) as the basic model, we propose a novel drifting cognition degree-based RMF collaborative filtering method named CogTime_RMF that incorporates both the user-item matrix and users’ cognition degree as it drifts over time. Moreover, we conduct experiments on the real datasets MovieLens 1M and MovieLens 100K, and the method is compared with three similarity-based methods and three other recent matrix factorization-based methods. Empirical results demonstrate that our proposal yields better recommendation accuracy than the other methods. In addition, results show that CogTime_RMF can alleviate data sparsity, particularly when few ratings are observed.
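
The regularized matrix factorization (RMF) base model named in this abstract can be illustrated with a minimal sketch. It assumes plain L2-regularized factorization trained by stochastic gradient descent; the drifting cognition-degree term of CogTime_RMF is not modelled, and the function names, hyperparameters and toy ratings are hypothetical and not taken from the paper.

```python
import random

def train_rmf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.05, epochs=300, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given triples."""
    k = len(P[0])
    se = sum((r - sum(P[u][f] * Q[i][f] for f in range(k))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Toy data (hypothetical): 4 users, 3 items, ratings on a 1-5 scale.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 5.0), (2, 2, 2.0), (3, 0, 1.0), (3, 2, 5.0)]
P0, Q0 = train_rmf(ratings, 4, 3, epochs=0)   # untrained baseline
P, Q = train_rmf(ratings, 4, 3)
print("RMSE before:", rmse(ratings, P0, Q0), "after:", rmse(ratings, P, Q))
```

The regularization weight `reg` is what keeps the factors small when a user or item has only a few observed ratings, which is the sparsity setting the abstract emphasizes.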

7.

Uncovering Community Structures with Initialized Bayesian Nonnegative Matrix Factorization

Xianchao Tang Tao Xu Xia Feng Guoqing Yang 《PloS one》2014,9(9)

Uncovering community structures is important for understanding networks. Currently, several nonnegative matrix factorization algorithms have been proposed for discovering community structure in complex networks. However, these algorithms exhibit some drawbacks, such as unstable results and inefficient running times. In view of the problems, a novel approach that utilizes an initialized Bayesian nonnegative matrix factorization model for determining community membership is proposed. First, based on singular value decomposition, we obtain simple initialized matrix factorizations from approximate decompositions of the complex network’s adjacency matrix. Then, within a few iterations, the final matrix factorizations are achieved by the Bayesian nonnegative matrix factorization method with the initialized matrix factorizations. Thus, the network’s community structure can be determined by judging the classification of nodes with a final matrix factor. Experimental results show that the proposed method is highly accurate and offers competitive performance to that of the state-of-the-art methods even though it is not designed for the purpose of modularity maximization.

8.

Estimating missing data: an iterative regression approach

Holt B Benfer RA 《Journal of human evolution》2000,39(3):289-296

The problem of missing data is common in all fields of science. Various methods of estimating missing values in a dataset exist, such as deletion of cases, insertion of sample mean, and linear regression. Each approach presents problems inherent in the method itself or in the nature of the pattern of missing data. We report a method that (1) is more general in application and (2) provides better estimates than traditional approaches, such as one-step regression. The model is general in that it may be applied to singular matrices, such as small datasets or those that contain dummy or index variables. The strength of the model is that it builds a regression equation iteratively, using a bootstrap method. The precision of the regressed estimates of a variable increases as regressed estimates of the predictor variables improve. We illustrate this method with a set of measurements of European Upper Paleolithic and Mesolithic human postcranial remains, as well as a set of primate anthropometric data. First, simulation tests using the primate data set involved randomly turning 20% of the values to "missing". In each case, the first iteration produced significantly better estimates than other estimating techniques. Second, we applied our method to the incomplete set of human postcranial measurements. MISDAT estimates always perform better than replacement of missing data by means and better than classical multiple regression. As with classical multiple regression, MISDAT performs when squared multiple correlation values approach the reliability of the measurement to be estimated, e.g., above about 0.8.
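
The idea that regressed estimates improve as the predictor estimates improve can be sketched with two variables. This is a simplified illustration, not MISDAT itself: it assumes a single predictor per regression, starts from mean imputation, and omits the bootstrap step the abstract mentions. All names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def iterative_impute(x, y, rounds=20):
    """Fill None entries of x and y by alternating regressions."""
    obs_x = [i for i, v in enumerate(x) if v is not None]
    obs_y = [i for i, v in enumerate(y) if v is not None]
    miss_x = [i for i, v in enumerate(x) if v is None]
    miss_y = [i for i, v in enumerate(y) if v is None]
    # Step 0: start from sample means, the naive baseline.
    mean_x = sum(x[i] for i in obs_x) / len(obs_x)
    mean_y = sum(y[i] for i in obs_y) / len(obs_y)
    x = [mean_x if v is None else v for v in x]
    y = [mean_y if v is None else v for v in y]
    for _ in range(rounds):
        # Regress y on x over rows where y was observed, then refresh missing y.
        a, b = fit_line([x[i] for i in obs_y], [y[i] for i in obs_y])
        for i in miss_y:
            y[i] = a + b * x[i]
        # Regress x on y over rows where x was observed, then refresh missing x.
        a, b = fit_line([y[i] for i in obs_x], [x[i] for i in obs_x])
        for i in miss_x:
            x[i] = a + b * y[i]
    return x, y

# Toy data (hypothetical) lying exactly on y = 2x + 1, with one gap per column.
x_raw = [1.0, 2.0, 3.0, 4.0, None, 6.0]
y_raw = [3.0, None, 7.0, 9.0, 11.0, 13.0]
x_filled, y_filled = iterative_impute(x_raw, y_raw)
print(x_filled[4], y_filled[1])
```

Each pass uses the other column's current estimates as predictors, so an improved fill in one column tightens the regression for the other.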

9.

Reverse engineering gene regulatory networks from measurement with missing values

Oyetunji E. Ogundijo Abdulkadir Elmas Xiaodong Wang 《EURASIP Journal on Bioinformatics and Systems Biology》2017,2017(1):2

Background

Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: either the expression values of some genes at some time points, or the entire set of expression values at a single time point or at several consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take the complete matrix of gene expression measurements as input. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for the inference of gene regulatory networks from gene expression data with missing values.

Results

We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and the nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and the real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of yeast Saccharomyces cerevisiae.

Conclusion

PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence, more accurate prediction of the underlying GRN compared to when using the conventional Gaussian approximation (GA) filters ignoring the missing data points.

10.

Bayes optimal informer sets for early-stage drug discovery

Peng Yu Spencer Ericksen Anthony Gitter Michael A. Newton 《Biometrics》2023,79(2):642-654

An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer-based ranking (IBR) methods address the prioritization problem when the compounds have provided bioactivity data on other potentially relevant targets. An IBR method selects an informer set of compounds, and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-step design problem. We evaluate BOISE and compare it to other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anticancer drug sensitivity. In both empirical settings BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance.

11.

Multi objective-based incremental clustering by fast search technique for dynamically creating and updating clusters in large data

Balakrishna Sivadi 《Cluster computing》2022,25(2):1441-1457

With the prevailing advancements in sensor technologies such as the Internet of Things (IoTs), cyber–physical-systems (CPSs), wireless sensor networks (WSNs), and many...

12.

Integrative factorization of bidimensionally linked matrices

Jun Young Park Eric F. Lock 《Biometrics》2020,76(1):61-74

Advances in molecular “omics” technologies have motivated new methodologies for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (eg, multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose bidimensional integrative factorization (BIDIFAC) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes data into (a) globally shared, (b) row-shared, (c) column-shared, and (d) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation, we use a penalized objective function that extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from the random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (messenger RNA and microRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from the Cancer Genome Atlas. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.

13.

Selective integration of multiple biological data for supervised network inference

Kato T Tsuda K Asai K 《Bioinformatics (Oxford, England)》2005,21(10):2488-2495

MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.

14.

Distributed Gaussian mixture model-based particle filter method for chemical pollution source localization with sensor network

Yong Zhang Liyi Zhang Jianfeng Han Zhe Ban 《Cluster computing》2017,20(4):2905-2917

Chemical pollution source localization using statistical estimation algorithms in sensor networks, also known as source parameter estimation, is important in fields such as environmental pollution monitoring and control. In this paper, a distributed particle filter method based on a Gaussian mixture dispersion model is proposed for the chemical pollution source localization problem. We also design a composite information objective function for the sensor scheduling scheme, which comprises an information utility measure and an energy consumption measure. Finally, to balance source localization accuracy against energy consumption, a dynamic sensor radius adjustment method is given for sensor node scheduling. Simulation and experimental results show that the proposed method can determine the position of the chemical pollution source. Compared with the UKF, the distributed Gaussian mixture particle filter method is recommended because it significantly reduces the number of sensor nodes required and uses less energy and less time to achieve the desired performance.

15.

An iterative method for multiple protein structure superposition with missing data

Jianbo Lu Huafang Gao Shihua Zhang Benzhuo Lu Xu Ma 《Chinese Journal of Biochemistry and Molecular Biology》2017,33(6):630-637

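The main problem facing three-dimensional protein structure superposition is that the target proteins to be superimposed are missing some amino acid residues, yet most multiple-structure superposition methods require complete amino acid sequences. The common practice today is to delete the missing amino acid sequences outright, which makes the superposition results inaccurate. Because homologous proteins have similar structures, a region missing from one protein structure may be present in another homologous structure. On this basis, we propose a new, simple, and effective method for protein structure superposition with missing data (ITEMDM). The method computes the superposition using an iterative treatment of the missing data, and combines an optimized least-squares algorithm with the SVD matrix decomposition to obtain the rotation matrices and translation vectors. The method successfully superimposed proteins of the cytochrome C family and proteins from the standard Fischer’s database (67 protein pairs), and was compared with other methods. Numerical experiments show that the algorithm has the following advantages: (1) compared with the THESEUS algorithm, it runs faster and needs fewer iterations; (2) compared with the PSSM algorithm, its results are accurate and its computation time is shorter. The results show that the method can better superimpose three-dimensional protein structures with missing data.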
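
For reference, the least-squares-plus-SVD step can be written out for a single pair of structures. The abstract does not give the formulas, so the block below is the standard closed-form solution (often called the Kabsch algorithm) and is assumed rather than quoted from the paper. The symbols $p_i$ and $q_i$ (matched atom coordinates of the mobile and reference structures), $R$ (rotation) and $t$ (translation) are notation introduced here.

```latex
\[
\begin{aligned}
&\min_{R,\,t}\ \sum_{i=1}^{n} \lVert R\,p_i + t - q_i \rVert^2,
  \qquad R^{\top}R = I,\ \det R = 1,\\
&\bar p = \tfrac{1}{n}\sum_{i} p_i,\quad \bar q = \tfrac{1}{n}\sum_{i} q_i,\quad
  H = \sum_{i=1}^{n} (p_i-\bar p)(q_i-\bar q)^{\top} = U\,\Sigma\,V^{\top},\\
&R = V\,\operatorname{diag}(1,\,1,\,d)\,U^{\top},\quad
  d = \operatorname{sign}\det\!\left(V U^{\top}\right),\qquad
  t = \bar q - R\,\bar p .
\end{aligned}
\]
```

The factor $d$ prevents the solution from being a reflection. In an iterative missing-data scheme, the sums would run only over positions currently available (observed or estimated) in both structures.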

16.

A Stochastic Model for Detecting Overlapping and Hierarchical Community Structure

Xiaochun Cao Xiao Wang Di Jin Xiaojie Guo Xianchao Tang 《PloS one》2015,10(3)

Community detection is a fundamental problem in the analysis of complex networks. Recently, many researchers have concentrated on the detection of overlapping communities, where a vertex may belong to more than one community. However, most current methods require the number (or the size) of the communities as a priori information, which is usually unavailable in real-world networks. Thus, a practical algorithm should not only find the overlapping community structure, but also automatically determine the number of communities. Furthermore, it is preferable if this method is able to reveal the hierarchical structure of networks as well. In this work, we first propose a generative model that employs a nonnegative matrix factorization (NMF) formulation with an l_2,1 norm regularization term, balanced by a resolution parameter. The NMF naturally provides overlapping community structure by assigning soft membership variables to each vertex; the l_2,1 regularization term is a group sparsity technique that can automatically determine the number of communities by penalizing too many nonempty communities; and hence the resolution parameter enables us to explore the hierarchical structure of networks. Thereafter, we derive the multiplicative update rule to learn the model parameters, and offer the proof of its correctness. Finally, we test our approach on a variety of synthetic and real-world networks, and compare it with some state-of-the-art algorithms. The results validate the superior performance of our new method.
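
As a rough illustration of what a multiplicative update rule for NMF looks like, the sketch below applies the classic Lee-Seung updates for the squared-error objective to a small adjacency matrix. It is an assumption-laden simplification: the l_2,1 regularization term and the resolution parameter from the abstract are left out, the number of communities `k` is fixed by hand, and the toy graph (with self-loops added on the diagonal) is hypothetical.

```python
import random

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius distance between V and W @ H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rw in zip(V, WH) for v, wh in zip(rv, rw))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 0, 0],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
W0, H0 = nmf(A, 2, iters=0)          # random starting point
W, H = nmf(A, 2)
# Soft memberships are the rows of W; a hard label is the largest entry per row.
labels = [row.index(max(row)) for row in W]
print(labels, frob_error(A, W0, H0), frob_error(A, W, H))
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout. Each row of W can then be read as soft memberships, so a vertex with sizeable weight in more than one column belongs to overlapping communities.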

17.

Assessing the sensitivity of divergence time estimates to locus sampling,calibration points,and model priors in a RAD-seq phylogeny of Carex section Schoenoxiphium

Tamara Villaverde Enrique Maguilla Modesto Luceño Andrew L. Hipp 《Journal of Systematics and Evolution》2021,59(4):687-697

Restriction site-associated DNA sequencing (RAD-seq) and related methods have become relatively common approaches to resolve species-level phylogeny. It is not clear, however, whether RAD-seq data matrices are well suited to relaxed clock inference of divergence times, given the size of the matrices and the abundance of missing data. We investigated the sensitivity of Bayesian relaxed clock estimates of divergence times to alternative analytical decisions on an empirical RAD-seq phylogenetic matrix. We explored the relative contribution of secondary calibration strategies, amount of missing data, and the data partition analyzed to overall variance in divergence times inferred using BEAST MCMC analyses of Carex section Schoenoxiphium (Cyperaceae), a recent radiation for which we have nearly complete species sampling of RAD-seq data. The crown node for Schoenoxiphium was estimated to be 15.22 (9.56–21.18) Ma using a single calibration point and low missing data, 11.93 (8.07–16.03) Ma using multiple calibration points and low missing data, and 8.34 (5.41–11.22) Ma using multiple calibrations but high missing data. We found that using matrices with more than half of the individuals with missing data inferred younger mean ages for all nodes. Moreover, we have found that our molecular clock estimates are sensitive to the positions of the calibration(s) in our phylogenetic tree (using matrices with low missing data), especially when only a single calibration was applied to estimate divergence times. These results argue for sensitivity analyses and caution in interpreting divergence time estimates from RAD-seq data.

18.

Inferring phenotypes from substance use via collaborative matrix completion

Jin Lu Jiangwen Sun Xinyu Wang Henry Kranzler Joel Gelernter Jinbo Bi 《BMC systems biology》2018,12(6):104

Background

Although substance use disorders (SUDs) are heritable, few genetic risk factors for them have been identified, in part due to the small sample sizes of study populations. To address this limitation, researchers have aggregated subjects from multiple existing genetic studies, but these subjects can have missing phenotypic information, including diagnostic criteria for certain substances that were not originally a focus of study. Recent advances in addiction neurobiology have shown that comorbid SUDs (e.g., the abuse of multiple substances) have similar genetic determinants, which makes it possible to infer missing SUD diagnostic criteria using criteria from another SUD and patient genotypes through statistical modeling.

Results

We propose a new approach based on matrix completion techniques to integrate features of comorbid health conditions and individual’s genotypes to infer unreported diagnostic criteria for a disorder. This approach optimizes a bi-linear model that uses the interactions between known disease correlations and candidate genes to impute missing criteria. An efficient stochastic and parallel algorithm was developed to optimize the model with a speed 20 times greater than the classic sequential algorithm. It was tested on 3441 subjects who had both cocaine and opioid use disorders and successfully inferred missing diagnostic criteria with consistently better accuracy than other recent statistical methods.

Conclusions

The proposed matrix completion imputation method is a promising tool to impute unreported or unobserved symptoms or criteria for disease diagnosis. Integrating data at multiple scales or from heterogeneous sources may help improve the accuracy of phenotype imputation.

19.

The impact of anchored phylogenomics and taxon sampling on phylogenetic inference in narrow‐mouthed frogs (Anura,Microhylidae)

Pedro L.V. Peloso Darrel R. Frost Stephen J. Richards Miguel T. Rodrigues Stephen Donnellan Masafumi Matsui Cristopher J. Raxworthy S.D. Biju Emily Moriarty Lemmon Alan R. Lemmon Ward C. Wheeler 《Cladistics : the international journal of the Willi Hennig Society》2016,32(2):113-140

Despite considerable progress in unravelling the phylogenetic relationships of microhylid frogs, relationships among subfamilies remain largely unstable and many genera are not demonstrably monophyletic. Here, we used five alternative combinations of DNA sequence data (ranging from seven loci for 48 taxa to up to 73 loci for as many as 142 taxa) generated using the anchored phylogenomics sequencing method (66 loci, derived from conserved genome regions, for 48 taxa) and Sanger sequencing (seven loci for up to 142 taxa) to tackle this problem. We assess the effects of character sampling, taxon sampling, analytical methods and assumptions in phylogenetic inference of microhylid frogs. The phylogeny of microhylids shows high susceptibility to different analytical methods and datasets used for the analyses. Clades inferred from maximum‐likelihood are generally more stable across datasets than those inferred from parsimony. Parsimony trees inferred within a tree‐alignment framework are generally better resolved and better supported than those inferred within a similarity‐alignment framework, even under the same cost matrix (equally weighted) and same treatment of gaps (as a fifth nucleotide state). We discuss potential causes for these differences in resolution and clade stability among discovery operations. We also highlight the problem that commonly used algorithms for model‐based analyses do not explicitly model insertion and deletion events (i.e. gaps are treated as missing data). Our results corroborate the monophyly of Microhylidae and most currently recognized subfamilies but fail to provide support for relationships among subfamilies. Several taxonomic updates are provided, including naming of two new subfamilies, both monotypic.

20.

A binary matrix factorization algorithm for protein complex prediction

Tu S Chen R Xu L 《Proteome science》2011,9(Z1):S18

BACKGROUND: Identifying biologically relevant protein complexes from a large protein-protein interaction (PPI) network is essential to understanding the organization of biological systems. However, high-throughput experimental techniques that can produce a large amount of PPIs are known to yield non-negligible rates of false positives and false negatives, making the protein complexes difficult to identify. RESULTS: We propose a binary matrix factorization (BMF) algorithm under Bayesian Ying-Yang (BYY) harmony learning to detect protein complexes by clustering proteins that share similar interactions, through factorizing the binary adjacency matrix of a PPI network. The proposed BYY-BMF algorithm automatically determines the cluster number, whereas this number must be given in advance for most existing BMF algorithms. Also, BYY-BMF's clustering results do not depend on any parameters or thresholds, unlike the Markov Cluster Algorithm (MCL), which relies on a so-called inflation parameter. On synthetic PPI networks, the predictions evaluated against the known annotated complexes indicate that BYY-BMF is more robust than MCL in most cases. On real PPI networks from the MIPS and DIP databases, BYY-BMF obtains better balanced prediction accuracies than MCL and a spectral analysis method, while MCL has its own advantages, e.g., good separation values.