Similar Articles
1.
With the rapid growth of the Internet and the overwhelming amount of information and choices that people are confronted with, recommender systems have been developed to effectively support users’ decision-making process in online systems. However, many recommendation algorithms suffer from the data sparsity problem, i.e., the user-object bipartite networks are so sparse that algorithms cannot accurately recommend objects for users. This data sparsity problem makes many well-known recommendation algorithms perform poorly. To solve the problem, we propose a recommendation algorithm based on the semi-local diffusion process on the user-object bipartite network. Simulation results on two sparse datasets, Amazon and Bookcross, show that our method significantly outperforms the state-of-the-art methods, especially for small-degree users. Two personalized semi-local diffusion methods are also proposed, which further improve the recommendation accuracy. Finally, our work indicates that sparse online systems are essentially different from dense online systems, so algorithms and conclusions previously drawn from dense data need to be reexamined in sparse systems.
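The abstract does not spell out the semi-local diffusion rule. The sketch below therefore shows only the standard two-step mass-diffusion (ProbS) recommender on a user-object bipartite network, which is a common baseline for diffusion-based recommendation. It is not the authors' semi-local method. The dictionary input format and the toy network are assumptions made for illustration.

```python
from collections import defaultdict


def mass_diffusion_scores(user_items, target):
    """Two-step mass diffusion (ProbS) scores for one target user.

    user_items maps each user to the set of objects they collected.
    Returns {object: score} for objects the target has NOT collected.
    """
    # Invert the bipartite network: object -> users who collected it.
    object_users = defaultdict(set)
    for user, items in user_items.items():
        for obj in items:
            object_users[obj].add(user)

    # Step 1 (objects -> users): every object the target collected holds one unit
    # of resource and splits it equally among the users who collected it.
    user_resource = defaultdict(float)
    for obj in user_items[target]:
        share = 1.0 / len(object_users[obj])
        for user in object_users[obj]:
            user_resource[user] += share

    # Step 2 (users -> objects): every user splits its resource equally
    # among all the objects it collected.
    object_score = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(user_items[user])
        for obj in user_items[user]:
            object_score[obj] += share

    # Candidates are uncollected objects; objects never reached score 0.
    return {obj: object_score.get(obj, 0.0)
            for obj in object_users if obj not in user_items[target]}


# Toy network (hypothetical data): u0 collected A and B.
network = {"u0": {"A", "B"}, "u1": {"B", "C"}, "u2": {"C", "D"}}
scores = mass_diffusion_scores(network, "u0")
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

With sparse data, many candidate objects receive no resource at all, as D does here. That is one way to see why the abstract singles out small-degree users.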

2.
As one of the major challenges, the cold-start problem plagues nearly all recommender systems. In particular, new items will be overlooked, impeding the development of new products online. Given limited resources, how to use the knowledge of recommender systems to design an efficient marketing strategy for new items is extremely important. In this paper, we convert this thorny issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that simply pushing new items to active users is not a good strategy. Interestingly, experiments on real recommender systems indicate that connecting new items with some less active users statistically yields better performance; namely, these new items have a higher chance of appearing in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to this observation. In short, an in-depth understanding of recommender systems could pave the way for owners to popularize their cold-start products at low cost.
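Item-based collaborative filtering scores an unseen item by how similar it is to the items a user already has. The sketch below uses cosine similarity on the user-item bipartite network, which is one common variant; the abstract does not say which similarity the studied systems use. The sketch also makes the cold-start issue concrete: an item with no users overlaps with nothing, so it never receives a score. The purchase data are hypothetical.

```python
import math
from collections import defaultdict


def item_cf_scores(user_items, target):
    """Item-based CF: score each unowned item by summed cosine similarity
    to the items the target user already owns."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    def cosine(i, j):
        # |users(i) & users(j)| / sqrt(|users(i)| * |users(j)|)
        overlap = len(item_users[i] & item_users[j])
        return overlap / math.sqrt(len(item_users[i]) * len(item_users[j]))

    owned = user_items[target]
    # Items with no users never enter item_users, so they are never scored.
    return {i: sum(cosine(i, j) for j in owned)
            for i in item_users if i not in owned}


purchases = {"u0": {"A", "B"}, "u1": {"A", "B", "C"}, "u2": {"C"}}
cf_scores = item_cf_scores(purchases, "u0")
print(cf_scores)
```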

3.
Liu  Xi  Liu  Jun 《Cluster computing》2022,25(2):1095-1109

We address the problem of online virtual machine (VM) provisioning and allocation with multiple types of resources. Formulating this problem in an auction-based setting, we propose an accurate mathematical model that incorporates the ability to preempt and resume a given task for the sake of the best overall use of resources. Our objective is to efficiently provision and allocate multiple VMs so as to maximize social welfare and encourage users to declare truthful requests. We first design an offline optimal mechanism based on the VCG mechanism; this mechanism has full knowledge of all users and offers ideal solutions. We also design an online greedy mechanism that considers only current knowledge and offers near-optimal solutions instead. Our proposed greedy mechanism consists of winner determination and payment algorithms. Furthermore, we show that the winner determination algorithm is monotonic and that the payment algorithm implements the critical payment. Both of our allocation methods give users an incentive to report their true values, since doing so yields the best utility. We performed extensive experiments to compare the performance of our proposed greedy mechanism with that of the optimal mechanism. Experimental results demonstrate that our proposed greedy mechanism obtains near-optimal solutions in a reasonable time.
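The abstract describes a greedy mechanism that combines a monotone winner-determination rule with critical payments. As a hedged illustration only, the sketch below reduces the problem to a single resource type with single-minded users and no preemption. These are simplifying assumptions: the paper handles multiple resource types and preempt/resume. In the sketch, bids are accepted in order of value per unit while they fit, and each winner pays the lowest bid with which it would still have won.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    name: str
    demand: int    # units of one resource type (simplifying assumption)
    value: float   # declared bid


def density(b):
    return b.value / b.demand


def greedy_order(bids):
    # Highest value per unit first; ties broken by name for determinism.
    return sorted(bids, key=lambda b: (-density(b), b.name))


def allocate(bids, capacity):
    """Greedy winner determination: accept bids in density order while they fit."""
    winners, remaining = [], capacity
    for b in greedy_order(bids):
        if b.demand <= remaining:
            winners.append(b)
            remaining -= b.demand
    return winners


def critical_payment(winner, bids, capacity):
    """Lowest bid (up to tie-breaking) with which `winner` would still be accepted."""
    others = [b for b in bids if b.name != winner.name]
    remaining = capacity
    for b in greedy_order(others):
        if b.demand <= remaining:
            remaining -= b.demand
        # From here on the winner no longer fits, so it must outrank b to win.
        if remaining < winner.demand:
            return winner.demand * density(b)
    return 0.0  # the winner fits wherever it is ranked


bids = [Bid("A", 5, 50.0), Bid("B", 5, 40.0), Bid("C", 5, 20.0)]  # hypothetical
winners = allocate(bids, capacity=10)
payments = {w.name: critical_payment(w, bids, 10) for w in winners}
print([w.name for w in winners], payments)
```

Monotonicity holds because raising a bid only moves it earlier in the order, and the capacity left after a shorter prefix is never smaller. Together with the critical payment, this is what makes truthful bidding a best response in this simplified setting.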


4.
Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters, and we provide an extended knee plot to select the optimal number of clusters in both the experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users. J. A. Hageman and R. A. van den Berg contributed equally to this paper.
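A genetic algorithm for two-mode clustering needs a fitness function that scores a candidate pair of partitions, one over rows and one over columns. The sketch below uses the within-bicluster sum of squared deviations from each bicluster mean as a homogeneity criterion. The abstract does not give the paper's exact criterion, so this choice and the toy matrix are assumptions. A GA would evolve the two label vectors and minimize this value.

```python
def two_mode_sse(data, row_labels, col_labels):
    """Sum of squared deviations of each entry from the mean of its bicluster.

    data: list of rows (e.g. experiments x metabolites). row_labels[i] and
    col_labels[j] give the cluster of row i and column j. Lower = more homogeneous.
    """
    sums, counts = {}, {}
    for i, row in enumerate(data):
        for j, x in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + x
            counts[key] = counts.get(key, 0) + 1
    means = {k: sums[k] / counts[k] for k in sums}
    return sum((x - means[(row_labels[i], col_labels[j])]) ** 2
               for i, row in enumerate(data) for j, x in enumerate(row))


# Hypothetical 3 x 3 data matrix with an obvious block structure.
X = [[1.0, 1.0, 5.0],
     [1.0, 1.0, 5.0],
     [9.0, 9.0, 2.0]]
good_fit = two_mode_sse(X, [0, 0, 1], [0, 0, 1])   # matches the blocks
poor_fit = two_mode_sse(X, [0, 1, 1], [0, 1, 1])   # cuts across them
print(good_fit, poor_fit)
```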

5.
Radicchi F 《PloS one》2011,6(2):e17249
We considered all matches played by professional tennis players between 1968 and 2010 and, on the basis of this data set, constructed a directed and weighted network of contacts. The resulting graph showed complex features, typical of many real networked systems studied in the literature. We developed a diffusion algorithm and applied it to the tennis contact network in order to rank professional players. Jimmy Connors was identified as the best player in the history of tennis according to our ranking procedure. We performed a complete analysis by determining the best players on specific playing surfaces as well as the best ones in each of the years covered by the data set. The results of our technique were compared to those of two other well-established methods. In general, we observed that our ranking method performed better: it had higher predictive power and did not require the arbitrary introduction of external criteria for the correct assessment of the quality of players. The present work provides novel evidence of the utility of tools and methods of network theory in real applications.
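A diffusion ranking on a directed, weighted contact network can be illustrated with a PageRank-style iteration in which each loss passes part of the loser's score to the winner. The sketch assumes a teleportation parameter q = 0.15 and redistributes the score of players who never lost uniformly across all players. These choices, and the head-to-head counts, are illustrative rather than taken from the abstract.

```python
def prestige(wins, players, q=0.15, iterations=200):
    """PageRank-style diffusion on a loser -> winner network.

    wins[(loser, winner)] = number of matches the winner won against the loser.
    Players with no losses (dangling nodes) spread their score uniformly.
    """
    n = len(players)
    out_strength = {p: 0.0 for p in players}
    for (loser, _), w in wins.items():
        out_strength[loser] += w
    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        dangling = sum(score[p] for p in players if out_strength[p] == 0)
        new = {p: q / n + (1 - q) * dangling / n for p in players}
        for (loser, winner), w in wins.items():
            # Each loss passes a share of the loser's score to the winner.
            new[winner] += (1 - q) * score[loser] * w / out_strength[loser]
        score = new
    return score


# Hypothetical head-to-head record among three players.
matches = {("B", "A"): 3, ("C", "A"): 2, ("C", "B"): 1, ("A", "C"): 1}
ranks = prestige(matches, ["A", "B", "C"])
print(sorted(ranks, key=ranks.get, reverse=True))
```

A win counts for more when it comes against a player who is themselves highly ranked. Raw win counts do not capture that effect.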

6.
SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0758-2) contains supplementary material, which is available to authorized users.
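SomaticSeq trains an adaptively boosted decision-tree classifier on per-site features. The sketch below is a minimal discrete AdaBoost with one-level decision stumps. It is written from scratch so that it runs without external libraries, and it is not SomaticSeq's code. The two features (variant allele fraction, and the number of callers reporting the site) and the toy labels are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] >= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * t * (sign if row[f] >= thr else -sign))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    s = sum(a * (sign if row[f] >= thr else -sign) for a, f, thr, sign in ensemble)
    return 1 if s >= 0 else -1


# Hypothetical candidate sites: [variant allele fraction, callers agreeing].
X = [[0.05, 1], [0.10, 1], [0.40, 4], [0.35, 5], [0.08, 2], [0.45, 3]]
y = [-1, -1, 1, 1, -1, 1]   # +1 = true somatic mutation
model = adaboost(X, y)
print([predict(model, r) for r in X])
```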

7.
The aim of model calibration is to estimate unique parameter values from available experimental data, here applied to a biocatalytic process. The traditional approach of first gathering data and then performing a model calibration is inefficient, since the information gathered during experimentation is not actively used to optimize the experimental design. By applying an iterative robust model‐based optimal experimental design, the limited amount of data collected is used to design additional informative experiments. The algorithm is used here to calibrate the initial reaction rate of an ω‐transaminase catalyzed reaction more accurately. The parameter confidence region estimated from the Fisher Information Matrix is compared with the likelihood confidence region, which is more accurate but also computationally more expensive. A substantial deviation between the two approaches is found, confirming that linearization methods should be applied with care to nonlinear models. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:1278–1293, 2017
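The Fisher Information Matrix approach can be made concrete for a two-parameter rate law. The abstract does not give the ω‐transaminase model, so the sketch assumes Michaelis-Menten kinetics v = Vmax S / (Km + S) with additive Gaussian noise, for illustration only. Under that assumption, the FIM is the sum of outer products of the parameter sensitivities divided by sigma^2, and its inverse gives linearized standard errors. A greedy D-optimal step then picks the next substrate level that most increases det(FIM). The paper's robust iterative design is more elaborate, and all numbers below are hypothetical.

```python
import math


def fisher_information(S_values, vmax, km, sigma):
    """2x2 FIM for (vmax, km) of v = vmax*S/(km+S) with Gaussian noise sigma."""
    F = [[0.0, 0.0], [0.0, 0.0]]
    for S in S_values:
        # Sensitivities dv/dvmax and dv/dkm at this substrate level.
        g = (S / (km + S), -vmax * S / (km + S) ** 2)
        for a in range(2):
            for b in range(2):
                F[a][b] += g[a] * g[b] / sigma ** 2
    return F


def det2(F):
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]


def standard_errors(F):
    """Linearized standard errors: square roots of the diagonal of F^-1."""
    d = det2(F)
    return math.sqrt(F[1][1] / d), math.sqrt(F[0][0] / d)


def next_experiment(design, candidates, vmax, km, sigma):
    """Greedy D-optimal step: the candidate S that maximizes det(FIM)."""
    return max(candidates,
               key=lambda s: det2(fisher_information(design + [s], vmax, km, sigma)))


spread = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]        # substrate levels (hypothetical)
saturating = [15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
se_spread = standard_errors(fisher_information(spread, 1.0, 2.0, 0.02))
se_sat = standard_errors(fisher_information(saturating, 1.0, 2.0, 0.02))
suggested = next_experiment(saturating, [0.5, 2.0, 20.0], 1.0, 2.0, 0.02)
print(se_spread, se_sat, suggested)
```

These standard errors come from a local linearization. The abstract warns that such linearized confidence regions can deviate substantially from likelihood-based ones for nonlinear models.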

8.
In this paper, based on coupled social networks (CSN), we propose a hybrid algorithm that nonlinearly integrates both the social and the behavioral information of online users. The filtering algorithm, based on the coupled social networks, considers the effects of both social similarity and personalized preference. Experimental results on two real datasets, Epinions and Friendfeed, show that the hybrid pattern not only provides more accurate recommendations but also enlarges the recommendation coverage under a global metric. Further empirical analyses demonstrate that mutual reinforcement and the rich-club phenomenon can also be found in coupled social networks, where the identical individuals occupy the core positions of the online system. This work may shed light on the structure and function of coupled social networks.
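The abstract reports that the hybrid method enlarges recommendation coverage but does not define the metric. One common definition, assumed here, is the fraction of all items that appear in at least one user's top-L recommendation list. The lists below are hypothetical.

```python
def coverage(recommendation_lists, n_items):
    """Fraction of all n_items that appear in at least one user's top-L list."""
    distinct = set()
    for items in recommendation_lists.values():
        distinct.update(items)
    return len(distinct) / n_items


# Hypothetical top-2 lists for three users in a catalogue of 10 items.
top_lists = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["b", "c"]}
cov = coverage(top_lists, n_items=10)
print(cov)
```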

9.
Online users nowadays face a serious information overload problem. In recent years, recommender systems have been widely studied to help people find relevant information. Adaptive social recommendation is one such system, in which the connections in online social networks are optimized for information propagation so that users can receive interesting news or stories from their leaders. Validation of such adaptive social recommendation methods in the literature assumes a uniform distribution of users' activity frequency. In this paper, our empirical analysis shows that the distribution of online users' activity is actually heterogeneous. Accordingly, we propose a more realistic multi-agent model in which users' activity frequencies are drawn from a power-law distribution. We find that previous social recommendation methods lead to serious delays in information propagation, since many users are connected to inactive leaders. To solve this problem, we design a new similarity measure that takes users' activity frequencies into account. With this similarity measure, the average delay is significantly shortened and the recommendation accuracy is substantially improved.
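The model in which activity frequencies are drawn from a power-law distribution can be simulated with inverse-transform sampling. The abstract does not give fitted parameters, so the following values are illustration-only assumptions: exponent alpha = 2.5, lower cutoff f_min = 0.01, and a cap at 1 so that a frequency can serve as a per-step activation probability.

```python
import random


def sample_activity(n_users, alpha=2.5, f_min=0.01, seed=0):
    """Draw activity frequencies with density p(f) ~ f**(-alpha) for f >= f_min.

    Inverse-CDF sampling: f = f_min * (1 - u) ** (-1 / (alpha - 1)), u ~ U[0, 1).
    Values are capped at 1.0 so they can be used as activation probabilities.
    """
    rng = random.Random(seed)
    return [min(1.0, f_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)))
            for _ in range(n_users)]


def active_users(freqs, rng):
    """One simulation step: user i acts with probability freqs[i]."""
    return [i for i, f in enumerate(freqs) if rng.random() < f]


freqs = sample_activity(10_000)
median = sorted(freqs)[len(freqs) // 2]
print(f"median={median:.4f}, max={max(freqs):.4f}")
step = active_users(freqs, random.Random(1))
```

The gap between the median and the maximum is the heterogeneity that a uniform-activity assumption misses. In such a model, a follower attached to a low-frequency leader waits many steps before receiving anything.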

10.
In this paper, we propose an iterative beam hardening correction method that is applicable to the case of multiple materials. By assuming that the materials composing the scanned object are known and that they are distinguishable by their linear attenuation coefficients at some given energy, the beam hardening correction problem is converted into a nonlinear system problem, which is then solved iteratively. The reconstructed image is the distribution of the linear attenuation coefficient of the scanned object at a given energy, so in theory the image contains no beam hardening artifacts. The proposed iterative scheme combines an accurate polychromatic forward projection with a linearized backprojection. Both the forward projection and the backprojection have a high degree of parallelism and are suitable for acceleration on parallel systems. Numerical experiments with both simulated and real data verify the validity of the proposed method: the beam hardening artifacts are effectively alleviated. In addition, the proposed method tolerates errors in the estimated x-ray spectrum well.
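The non-linearity behind beam hardening lies in the polychromatic forward projection. The detector integrates intensity over the spectrum, so the log-attenuation is no longer linear in path length. The sketch below computes that projection for known materials along a single ray. The two-bin spectrum and the attenuation coefficients are made-up values, and the paper's iterative correction with a linearized backprojection is not reproduced.

```python
import math


def polychromatic_projection(spectrum, mu, lengths):
    """Log-attenuation -ln(I/I0) of a polychromatic ray through known materials.

    spectrum[e]: normalized weight of energy bin e (weights sum to 1).
    mu[m][e]: linear attenuation coefficient of material m in bin e (1/cm).
    lengths[m]: intersection length of the ray with material m (cm).
    """
    transmitted = 0.0
    for e, weight in enumerate(spectrum):
        line_integral = sum(mu[m][e] * lengths[m] for m in range(len(lengths)))
        transmitted += weight * math.exp(-line_integral)
    return -math.log(transmitted)


# Monoenergetic beam: the projection is exactly mu * L (no beam hardening).
mono = polychromatic_projection([1.0], [[0.3]], [2.0])

# Two-bin spectrum, one material (hypothetical values): low energies are
# attenuated more, so doubling the path length less than doubles the projection.
spectrum = [0.5, 0.5]
mu = [[0.4, 0.2]]
p1 = polychromatic_projection(spectrum, mu, [1.0])
p2 = polychromatic_projection(spectrum, mu, [2.0])
print(mono, p1, p2, 2 * p1)
```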

11.
Online estimation of unknown state variables is a key component in the accurate modelling of biological wastewater treatment processes due to a lack of reliable online measurement systems. The extended Kalman filter (EKF) algorithm has been widely applied to wastewater treatment processes. However, the series approximations in the EKF algorithm are not valid, because biological wastewater treatment processes are highly nonlinear with a time-varying characteristic. This work proposes an alternative online estimation approach using sequential Monte Carlo (SMC) methods for recursive online state estimation of a biological sequencing batch reactor for wastewater treatment. SMC is an algorithm that makes it possible to recursively construct the posterior probability density of the state variables, with respect to all available measurements, through a random exploration of the states by entities called ‘particles’. In this work, a simplified and modified Activated Sludge Model No. 3 with nonlinear biological kinetic models is used as the process model and formulated as a dynamic state-space model for the SMC method. The performance of the SMC method for online state estimation of a biological sequencing batch reactor with online and offline measured data is encouraging. The results indicate that the SMC method could emerge as a powerful tool for solving online state and parameter estimation problems without any model linearization or restrictive assumptions pertaining to the type of nonlinear models for biological wastewater treatment processes.
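The sequential Monte Carlo idea can be shown with a bootstrap particle filter: propagate the particles through the process model, weight them by the measurement likelihood, and resample. In place of the modified ASM3 model, the sketch uses a one-dimensional random-walk state observed with Gaussian noise. The model, the noise levels and the particle count are all assumptions.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, process_std=0.05,
                              obs_std=0.1, init_range=(0.0, 10.0), seed=0):
    """Bootstrap SMC filter for the toy model
    x_t = x_{t-1} + N(0, process_std^2),  y_t = x_t + N(0, obs_std^2).
    Returns the posterior-mean estimate of x_t after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the (here: random-walk) process model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # 2. Weight by the Gaussian measurement likelihood (log-space for stability).
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        estimates.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # 3. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates


# Hypothetical measurements of a constant true state of 5.0.
noise = random.Random(42)
ys = [5.0 + noise.gauss(0.0, 0.1) for _ in range(50)]
est = bootstrap_particle_filter(ys)
print(f"final estimate: {est[-1]:.3f}")
```

No linearization of the process or measurement model is needed at any step, which is the property the abstract contrasts with the EKF.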

12.
A key step in the analysis of circadian data is to make an accurate estimate of the underlying period. There are many different techniques and algorithms for determining period, all with different assumptions and with differing levels of complexity. Choosing which algorithm, which implementation and which measures of accuracy to use can present many pitfalls, especially for the non-expert. We have developed the BioDare system, an online service allowing data-sharing (including public dissemination), data-processing and analysis. Circadian experiments are the main focus of BioDare, hence performing period analysis is a major feature of the system. Six methods have been incorporated into BioDare: Enright and Lomb-Scargle periodograms, FFT-NLLS, mFourfit, MESA and Spectrum Resampling. Here we review those six techniques, explain the principles behind each algorithm and evaluate their performance. In order to quantify the methods' accuracy, we examine the algorithms against artificial mathematical test signals and model-generated mRNA data. Our re-implementation of each method in Java allows meaningful comparisons of the computational complexity and computing time associated with each algorithm. Finally, we provide guidelines on which algorithms are most appropriate for which data types, and recommendations on experimental design to extract optimal data for analysis.
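Of the six methods, the Lomb-Scargle periodogram is the simplest to sketch, and unlike a plain FFT it handles unevenly spaced samples. The version below is the classical normalized periodogram evaluated on a grid of trial periods. It is an independent illustration, not BioDare's Java implementation, and the sampling scheme and the 24 h test signal are assumptions.

```python
import math


def lomb_scargle(t, y, periods):
    """Classical normalized Lomb-Scargle power for each trial period."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    powers = []
    for period in periods:
        w = 2 * math.pi / period
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum((v - mean) * ci for v, ci in zip(y, c))
        ys = sum((v - mean) * si for v, si in zip(y, s))
        powers.append((yc ** 2 / sum(ci * ci for ci in c)
                       + ys ** 2 / sum(si * si for si in s)) / (2 * var))
    return powers


# Samples every 2 h over 5 days, with some time points missing (uneven spacing).
times = [h for h in range(0, 120, 2) if h % 14 != 0]
signal = [1.0 + math.cos(2 * math.pi * h / 24.0) for h in times]
trial_periods = [18.0 + 0.1 * k for k in range(121)]   # 18 h to 30 h
power = lomb_scargle(times, signal, trial_periods)
best_period = trial_periods[power.index(max(power))]
print(f"estimated period: {best_period:.1f} h")
```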

13.
In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection can therefore not only reduce the dimensionality of the problem but also reveal knowledge about the functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game that performs stable functional feature selection. In particular, the approach is based on ranking feature subsets with a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. Experiments on both synthetic and real complex data illustrate that the proposed method is competitive in terms of both predictive performance and stability.
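A thresholding bandit spends a sampling budget to decide, for each arm, whether its mean lies above or below a threshold. The paper's sparse zero-sum game is not reproduced here. As a stand-in, the sketch implements the APT (Anytime Parameter-free Thresholding) rule of Locatelli et al. on Bernoulli arms. Reading each arm as a candidate feature subset whose reward reflects its usefulness is a hypothetical mapping, and the arm means, threshold and epsilon are illustrative.

```python
import math
import random


def apt_threshold_bandit(arm_means, threshold, budget, epsilon=0.05, seed=0):
    """APT on simulated Bernoulli arms.
    Returns the set of arms whose empirical mean ends at or above the threshold."""
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    totals = [0.0] * k

    def pull(i):
        pulls[i] += 1
        totals[i] += 1.0 if rng.random() < arm_means[i] else 0.0

    for i in range(k):              # initialize: pull every arm once
        pull(i)
    for _ in range(budget - k):
        # Pull the arm whose classification is least certain: few pulls and/or
        # an empirical mean close to the threshold.
        index = [math.sqrt(pulls[i]) * (abs(totals[i] / pulls[i] - threshold) + epsilon)
                 for i in range(k)]
        pull(min(range(k), key=index.__getitem__))
    return {i for i in range(k) if totals[i] / pulls[i] >= threshold}


# Hypothetical arms (e.g. candidate feature groups) with unknown success rates.
selected = apt_threshold_bandit([0.1, 0.3, 0.7, 0.9], threshold=0.5, budget=2000)
print(selected)
```

Most of the budget goes to the arms whose means lie close to the threshold, because those are the hardest to classify.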

14.
15.
Parameter estimation in dynamic systems finds applications in various disciplines, including systems biology. The well-known expectation-maximization (EM) algorithm is a popular method and has been widely used to solve system identification and parameter estimation problems. However, the conventional EM algorithm cannot exploit sparsity. On the other hand, in gene regulatory network inference problems, the parameters to be estimated often exhibit sparse structure. In this paper, we propose a regularized expectation-maximization (rEM) algorithm for sparse parameter estimation in nonlinear dynamic systems; it is based on maximum a posteriori (MAP) estimation and can incorporate a sparse prior. The expectation step involves forward Gaussian approximation filtering and backward Gaussian approximation smoothing. The maximization step employs a re-weighted iterative thresholding method. The proposed algorithm is then applied to gene regulatory network inference. Results based on both synthetic and real data show the effectiveness of the proposed algorithm.
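The maximization step's re-weighted iterative thresholding can be illustrated on a plain sparse least-squares problem. The sketch alternates iterative soft-thresholding (ISTA) with the common re-weighting w_i = 1/(|x_i| + eps). It omits the Gaussian filtering/smoothing E-step entirely. The matrix A, lambda, eps and the step size are assumptions; the step must not exceed 1/L, where L is the largest eigenvalue of A^T A (4 for this A).

```python
def soft_threshold(v, t):
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def reweighted_ista(A, y, lam=0.01, step=0.25, outer=5, inner=500, eps=0.1):
    """Re-weighted l1 via ISTA for  min 0.5*||Ax - y||^2 + lam * sum_i w_i |x_i|."""
    n = len(A[0])
    x = [0.0] * n
    weights = [1.0] * n                  # first pass is an ordinary lasso
    for _ in range(outer):
        for _ in range(inner):
            residual = [sum(a * xj for a, xj in zip(row, x)) - yi
                        for row, yi in zip(A, y)]
            grad = [sum(A[r][j] * residual[r] for r in range(len(A)))
                    for j in range(n)]
            # Gradient step on the smooth part, then weighted soft-thresholding.
            x = [soft_threshold(x[j] - step * grad[j], step * lam * weights[j])
                 for j in range(n)]
        # Small coefficients get large weights (pushed to zero), large ones small.
        weights = [1.0 / (abs(v) + eps) for v in x]
    return x


# Hypothetical example: y is generated by the sparse vector [2, 0, 0].
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
y = [2, 0, 0, 2]
x_hat = reweighted_ista(A, y)
print(x_hat)
```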

16.
In this report, we compare and contrast three previously published Bayesian methods for inferring haplotypes from genotype data in a population sample. We review the methods, emphasizing the differences between them in terms of both the models ("priors") they use and the computational strategies they employ. We introduce a new algorithm that combines the modeling strategy of one method with the computational strategies of another. In comparisons using real and simulated data, this new algorithm outperforms all three existing methods. The new algorithm is included in the software package PHASE, version 2.0, available online (http://www.stat.washington.edu/stephens/software.html).
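Haplotype inference is hard because an unphased genotype with k heterozygous sites (k >= 1) is consistent with 2^(k-1) distinct haplotype pairs. The sketch below enumerates those pairs for biallelic sites coded 0/1/2. Bayesian methods such as PHASE weight these resolutions using a prior over the population's haplotypes rather than listing them exhaustively. The sketch therefore shows only the search space, not PHASE's algorithm.

```python
from itertools import product


def haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0/1/2 = copies of allele '1' at each biallelic site.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]   # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for site, allele in zip(het, phase):
            h1[site], h2[site] = allele, 1 - allele
        # Sort so that (h1, h2) and (h2, h1) count as the same resolution.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


resolutions = haplotype_pairs([1, 1, 0, 2])
print(sorted(resolutions))
```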

17.
Predicting protein functions computationally from massive protein-protein interaction (PPI) data generated by high-throughput technology is one of the fundamental challenges of the post-genomic era. Although many approaches have been developed for computationally predicting protein functions, the mutual correlations among proteins in terms of protein functions have not been thoroughly investigated and incorporated into existing prediction methods, especially voting-based prediction methods. In this paper, we propose an innovative method to predict protein functions from PPI data by aggregating the functional correlations among relevant proteins using the Choquet integral from fuzzy theory. This functional aggregation measures the real impact of each relevant protein function on the final prediction results and reduces the impact of repeated functional information on the prediction. Accordingly, a new protein similarity measure and a new iterative prediction algorithm are proposed. Experimental evaluations on real PPI datasets demonstrate the effectiveness of our method.
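The aggregation relies on the discrete Choquet integral, which weights sorted scores by a fuzzy measure defined on coalitions of sources. The sketch below implements the standard definition. A sub-additive measure, where a coalition is worth less than the sum of its members, is one way to discount repeated functional information. The source names, scores and measure values are hypothetical, not the paper's.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` (dict source -> score in [0, 1])
    with respect to a fuzzy measure given as a function on frozensets."""
    order = sorted(values, key=values.get)      # ascending scores
    total, previous = 0.0, 0.0
    for k, src in enumerate(order):
        # Coalition of sources whose score is >= the current one.
        coalition = frozenset(order[k:])
        total += (values[src] - previous) * measure(coalition)
        previous = values[src]
    return total


def additive(weights):
    """Additive measure: the Choquet integral reduces to a weighted mean."""
    return lambda s: sum(weights[x] for x in s)


# Evidence for one function from two hypothetical neighbouring proteins.
scores = {"p1": 0.2, "p2": 0.8}
# Sub-additive measure: p1 and p2 carry partly redundant information.
redundant = {frozenset(): 0.0, frozenset({"p1"}): 0.7,
             frozenset({"p2"}): 0.7, frozenset({"p1", "p2"}): 1.0}

mean_like = choquet_integral(scores, additive({"p1": 0.5, "p2": 0.5}))
discounted = choquet_integral(scores, redundant.__getitem__)
print(mean_like, discounted)
```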

18.
Michael K. Gilson 《Proteins》1993,15(3):266-282
Computer models of proteins frequently treat the energies and forces associated with ionizable groups as if they were purely electrostatic. This paper examines the validity of the purely electrostatic approach and concludes that significant errors in energies can result from the neglect of ionization changes. However, a complete treatment of ionizable groups presents substantial computational obstacles, because of the large number of ionization states that must be examined in systems having multiple interacting titratable groups. In order to address this problem, two novel methods for treating the energetics and forces associated with ionizable groups with a minimum of computer time have been developed. The most rapid method yields approximate energies by computing the free energy of a single highly occupied ionization state. The second method separates ionizable groups into clusters and treats intracluster interactions exactly, but intercluster interactions approximately. This method yields both accurate energies and fractional charges. Good results are obtained in tests of both methods on proteins having as many as 123 ionizable groups. The more rapid method requires computer times of 0.01 to 0.34 sec, while the more accurate method requires 0.7 to 15 sec. These methods may be fast enough to permit the incorporation of ionization effects in iterative computations, such as energy minimizations and conformational searches. © 1993 Wiley-Liss, Inc.
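A complete treatment of coupled ionizable groups averages over all 2^n protonation states, which is why the cost explodes as the number of titratable sites grows. The sketch below performs that exact enumeration for a small cluster, in the spirit of the exact intracluster treatment, under a simplified energy model that is an assumption. Each site contributes ln(10)(pH - pKa_i) kT when protonated. A pairwise term W_ij kT is added when both sites i and j are protonated. With W = 0 the result reduces to the Henderson-Hasselbalch equation.

```python
import math
from itertools import product

LN10 = math.log(10.0)


def protonation_fractions(pka, interaction, ph):
    """Exact Boltzmann average over all 2^n protonation states of n coupled sites.

    pka[i]: intrinsic pKa of site i. interaction[i][j]: extra free energy (kT)
    when sites i and j are both protonated. Returns the fraction protonated per site.
    """
    n = len(pka)
    states, weights = [], []
    for x in product((0, 1), repeat=n):
        g = sum(x[i] * LN10 * (ph - pka[i]) for i in range(n))
        g += sum(interaction[i][j] * x[i] * x[j]
                 for i in range(n) for j in range(i + 1, n))
        states.append(x)
        weights.append(math.exp(-g))      # Boltzmann factor, energies in kT
    z = sum(weights)
    return [sum(w * x[i] for w, x in zip(weights, states)) / z for i in range(n)]


single = protonation_fractions([7.0], [[0.0]], ph=7.0)
independent = protonation_fractions([4.0, 6.0], [[0.0, 0.0], [0.0, 0.0]], ph=5.0)
coupled = protonation_fractions([4.0, 6.0], [[0.0, 2.0], [2.0, 0.0]], ph=5.0)
print(single, independent, coupled)
```

Keeping only the state with the largest weight, instead of summing over all states, corresponds roughly to the faster single-state approximation described in the abstract.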

19.
Ryu  Minho  Lee  Geonseok  Lee  Kichun 《Cluster computing》2021,24(3):1975-1987

In the new era of big data, numerous information and technology systems can store huge amounts of streaming data in real time, for example, in server-access logs on web application servers. The importance of anomaly detection in voluminous quantities of streaming data from such systems is rapidly increasing. One of the biggest challenges in the detection task is to carry out real-time contextual anomaly detection in streaming data with varying patterns that are visually detectable but unsuitable for a parametric model. Most anomaly detection algorithms have weaknesses in dealing with streaming time-series data containing such patterns. In this paper, we propose a novel method for online contextual anomaly detection in streaming time-series data using generalized extreme studentized deviate (GESD) tests. The GESD test is relatively accurate and efficient because it performs statistical hypothesis testing, but it cannot handle streaming time-series data. Thus, focusing on streaming time-series data, we propose an online version of the test capable of detecting outliers under varying patterns. We perform extensive experiments with simulated data, synthetic data, and real online traffic data from Yahoo Webscope, showing a clear advantage of the proposed method, particularly for analyzing streaming data with varying patterns.
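The batch generalized ESD test (Rosner, 1983), which the paper extends, can be sketched in pure Python. The standard library has no Student-t quantile function, so the sketch computes one by Simpson integration and bisection. That numerical shortcut is an assumption of this example; SciPy's t distribution would normally be used instead. The paper's online variant for streaming data with varying patterns is not reproduced, and re-running this test on a sliding window would only be a naive baseline.

```python
import math


def t_cdf(x, df, steps=2000):
    """Student-t CDF for x >= 0 via Simpson's rule on the density (steps is even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = x / steps
    s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return 0.5 + s * h / 3


def t_ppf(p, df):
    """Upper quantile (p > 0.5) of the t distribution by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gesd(data, max_outliers, alpha=0.05):
    """Generalized ESD test. Returns the values flagged as outliers."""
    n = len(data)
    remaining, removed, n_outliers = list(data), [], 0
    for i in range(1, max_outliers + 1):
        mean = sum(remaining) / len(remaining)
        sd = math.sqrt(sum((v - mean) ** 2 for v in remaining) / (len(remaining) - 1))
        if sd == 0:
            break
        candidate = max(remaining, key=lambda v: abs(v - mean))
        r_i = abs(candidate - mean) / sd                     # test statistic R_i
        t = t_ppf(1 - alpha / (2 * (n - i + 1)), n - i - 1)
        lam_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        if r_i > lam_i:
            n_outliers = i        # the largest i with R_i > lambda_i wins
        removed.append(candidate)
        remaining.remove(candidate)
    return removed[:n_outliers]


sample = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]   # hypothetical request counts
print(gesd(sample, max_outliers=3))
```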


20.
MOTIVATION: In analyses of microarray data with a design of different biological conditions, ranking genes by their differential 'importance' is often desired so that biologists can focus research on a small subset of genes that are most likely related to the experimental conditions. Permutation methods are often recommended and used in place of their parametric counterparts, due to the small sample sizes of microarray experiments and possible non-normality of the data. The recommendations, however, are based on classical knowledge in the hypothesis test setting. RESULTS: We explore the relationship between hypothesis testing and gene ranking. We show that the permutation method does not provide a metric for the distance between two underlying distributions. In our simulation studies, permutation methods tend to be equally or less accurate than parametric methods in ranking genes. This is partially due to the discreteness of the permutation distributions, as well as the non-metric property. In data analysis, the variability in ranking genes can be assessed by bootstrap. It turns out that the variability is much lower for permutation than for parametric methods, which agrees with the known robustness of permutation methods to individual outliers in the data.
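An exact two-sample permutation test makes the discreteness of permutation distributions easy to see. With three samples per condition there are only C(6, 3) = 20 relabellings, so the smallest attainable two-sided p-value is 2/20 = 0.1. Two genes with very different separations can therefore receive the same p-value, which is the non-metric behaviour the abstract describes. Using the difference in means as the test statistic is an assumption, and the expression values are made up.

```python
from itertools import combinations


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    total = sum(pooled)
    hits = count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        hits += diff >= observed - 1e-12   # tolerance for floating-point ties
        count += 1
    return hits / count


# Hypothetical expression values for two genes, 3 arrays per condition.
p_close = permutation_pvalue([1, 2, 3], [10, 11, 12])
p_far = permutation_pvalue([1, 2, 3], [100, 200, 300])
print(p_close, p_far)   # both genes hit the 0.1 floor
```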
