Similar literature
20 similar documents found.
1.
Recommender systems are designed to help individual users navigate the rapidly growing amount of information. One of the most successful recommendation techniques is collaborative filtering, which has been extensively investigated and has already found wide application in e-commerce. One of the challenges in this algorithm is how to accurately quantify the similarities of user pairs and item pairs. In this paper, we employ the multidimensional scaling (MDS) method to measure the similarities between nodes in user-item bipartite networks. The MDS method can extract the essential similarity information from the networks by smoothing out noise, which provides a graphical display of the structure of the networks. With the similarity measured from MDS, we find that the item-based collaborative filtering algorithm can outperform the diffusion-based recommendation algorithms. Moreover, we show that this method tends to recommend unpopular items and increase the global diversification of the networks in the long term.
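
The item-based collaborative filtering step mentioned in this abstract can be sketched on a toy user-item bipartite network. This is only an illustration: it uses plain cosine similarity between items, not the MDS-derived similarity the paper proposes, and the names `item_cosine`, `recommend`, and the toy data are hypothetical.

```python
from math import sqrt

def item_cosine(users_a, users_b):
    """Cosine similarity between two items, each given as the set of users who collected it."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

def recommend(user_items, target, top_n=2):
    """Rank uncollected items by their summed similarity to the target user's items."""
    # Invert the bipartite network: item -> set of users who collected it.
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)
    owned = user_items[target]
    scores = {}
    for candidate, cand_users in item_users.items():
        if candidate in owned:
            continue
        scores[candidate] = sum(item_cosine(cand_users, item_users[o]) for o in owned)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Toy data (assumed for illustration only).
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "d"},
    "u4": {"c"},
}
print(recommend(user_items, "u1"))
```

Swapping `item_cosine` for another similarity function is the only change needed to try a different similarity measure in this sketch.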

2.
With the development of IT convergence technologies, users can now more easily access useful information. These days, diverse and far-reaching information is being rapidly produced and distributed instantly in digitized format. Studies are continuously seeking to develop more efficient methods of delivering information to a greater number of users. Image filtering, which extracts features of interest from images, was developed to address the weakness of collaborative filtering, which is limited to superficial data analysis. However, image filtering has its own weakness of requiring complicated calculations to obtain the similarity between images. In this study, to resolve these problems, we propose associative image filtering based on a mining method utilizing the harmonic mean. Using the Apriori algorithm from data mining, this study investigated the association among preferred images from an associative image group and obtained a prediction based on the user preference mean. In so doing, we observed a positive relationship between the various image preferences and the various distances between the images' color histograms. The preference mean was calculated based on the arithmetic mean, geometric mean, and harmonic mean. We found through performance analysis that the harmonic mean had the highest accuracy. In associative image filtering, we used the harmonic mean to anticipate preferences. In testing accuracy with MAE, the proposed method demonstrated an improvement of approximately 12% on average compared to previous collaborative image filtering.
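
The three preference means compared in this abstract can be computed with Python's standard library. This is a minimal sketch; the function name `preference_mean` and the sample ratings are assumptions, not taken from the paper.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Map each kind of mean named in the abstract to its stdlib implementation.
MEANS = {
    "arithmetic": mean,
    "geometric": geometric_mean,
    "harmonic": harmonic_mean,
}

def preference_mean(ratings, kind="harmonic"):
    """Aggregate positive preference ratings with the chosen mean."""
    return MEANS[kind](ratings)

# Hypothetical ratings for a group of associated images.
ratings = [2, 4, 4]
for kind in MEANS:
    print(kind, round(preference_mean(ratings, kind), 3))
```

For positive values the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the harmonic mean gives the most conservative prediction of the three.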

3.
Due to the exponential growth of information, recommender systems have become a widely exploited technique for solving the problem of information overload. Collaborative filtering (CF) is the most successful and extensively employed recommendation approach. However, current CF methods recommend suitable items for users mainly by using a user-item matrix that contains the individual preferences of users for items in a collection. These methods therefore suffer from problems such as the sparsity of the available data and low accuracy in predictions. To address these issues, borrowing the idea of cognition degree from cognitive psychology and employing regularized matrix factorization (RMF) as the basic model, we propose a novel drifting cognition degree-based RMF collaborative filtering method named CogTime_RMF that incorporates both the user-item matrix and users' cognition degree as it drifts over time. Moreover, we conduct experiments on the real datasets MovieLens 1M and MovieLens 100K, and the method is compared with three similarity-based methods and three other recent matrix factorization-based methods. Empirical results demonstrate that our proposal yields better recommendation accuracy than the other methods. In addition, results show that CogTime_RMF can alleviate data sparsity, particularly when few ratings are observed.
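
The base model named in this abstract, regularized matrix factorization, can be sketched with stochastic gradient descent as below. This covers plain RMF only; the cognition-degree and time-drift terms of CogTime_RMF are not described in the abstract and are not modelled. All names, hyperparameters, and the toy ratings are assumptions.

```python
import random
from math import sqrt

def train_rmf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples by SGD."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given triples."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return sqrt(total / len(ratings))

# Toy observed ratings (user index, item index, rating); assumed for illustration.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = train_rmf(ratings, n_users=3, n_items=3)
print(round(rmse(ratings, P, Q), 3))
```

The `reg` term is what makes the factorization "regularized": it shrinks the factors toward zero, which matters most when a user or item has few observed ratings.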

4.
Fast algorithms for pairwise biosequence similarity search frequently use filtering and indexing strategies to identify potential matches between a query sequence and a database. For the most part, these strategies are not informed by the substitution score matrices commonly used by comparison algorithms to assign numerical scores to pairs of aligned residues. Consequently, although many filtering strategies offer strong formal guarantees about their ability to detect pairs of sequences differing by few substitutions, these methods can make no guarantee of detecting pairs with high similarity scores. We describe a general technique, score simulation, to help resolve the tension between existing filtering techniques and the use of score matrices. Score simulation, using score matrices, maps ungapped similarity search problems to the simpler problem of finding pairs of strings that differ by few substitutions. Score simulation leads to indexing schemes for biosequences that permit efficient ungapped similarity search with arbitrary score matrices while maintaining strong formal guarantees of sensitivity. We introduce the LSH-ALL-PAIRS-SIM algorithm for finding local similarities in large biosequence collections and show that it is both computationally feasible and sensitive in practice.

5.
This research analyzes some aspects of the relationship between gene expression, gene function, and gene annotation. Many recent studies are implicitly based on the assumption that gene products that are biologically and functionally related would maintain this similarity both in their expression profiles and in their gene ontology (GO) annotation. We analyze how accurate this assumption proves to be using real publicly available data. We also aim to validate a measure of semantic similarity for GO annotation. We use the Pearson correlation coefficient and its absolute value as a measure of similarity between expression profiles of gene products. We explore a number of semantic similarity measures (Resnik, Jiang, and Lin) and compute the similarity between gene products annotated using the GO. Finally, we compute correlation coefficients to compare gene expression similarity against GO semantic similarity. Our results suggest that the Resnik similarity measure outperforms the others and seems better suited for use in gene ontology. We also deduce that there seems to be a correlation between semantic similarity in the GO annotation and gene expression for the three GO ontologies. We show that this correlation is negligible up to a certain semantic similarity value; then, for higher similarity values, the relationship trend becomes almost linear. These results can be used to augment the knowledge provided by clustering algorithms and in the development of bioinformatic tools for finding and characterizing gene products.
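
The expression-similarity measure described in this abstract, the Pearson correlation coefficient and its absolute value, can be sketched as follows. The function name `pearson` and the two toy expression profiles are assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as 0
    return cov / (sx * sy)

# Hypothetical expression profiles of two gene products across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]

r = pearson(gene_a, gene_b)
print(r, abs(r))
```

Taking the absolute value treats strongly anti-correlated profiles, such as the two above, as just as similar as strongly correlated ones.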

6.
Document similarity has important real-life applications such as finding duplicate web sites and identifying plagiarism. While basic techniques such as k-similarity algorithms have long been known, the overwhelming amount of data being collected, such as in big data settings, calls for novel algorithms to find highly similar documents in a reasonably short amount of time. In particular, pairwise comparison of documents' features, a key operation in calculating document similarity, requires prohibitively high storage and computation power. In this paper, we propose a new filtering technique that decreases the number of comparisons between the query set and the search set to find highly similar documents. The proposed filtering technique utilizes a Z-order prefix, based on the cosine similarity measure, in which only the most important features are used first to find highly similar documents. We propose a three-phase approach, where the phases are near-duplicate detection, common important terms, and the join phase. We utilize the Hadoop distributed file system and the MapReduce parallel programming model to scale our techniques to big data settings. Our experimental results on real data show that the proposed method performs better than previous work in the literature in terms of the number of joins and, therefore, speed.

7.
Analyzing polychoric correlations via principal component analysis and exploratory factor analysis is a well-known approach to determining the dimensionality of ordered categorical items. However, the application of these approaches has been considered critical due to the possible indefiniteness of the polychoric correlation matrix. A possible solution to this problem is the application of smoothing algorithms. This study compared the effects of three smoothing algorithms, based on the Frobenius norm, the adaptation of the eigenvalues and eigenvectors, and minimum-trace factor analysis, on the accuracy of several variations of parallel analysis by means of a simulation study. We simulated different datasets which varied with respect to the size of the respondent sample, the size of the item set, the underlying factor model, the skewness of the response distributions, and the number of response categories in each item. We found that a parallel analysis and principal component analysis of smoothed polychoric and Pearson correlations led to the most accurate results in detecting the number of major factors in simulated datasets when compared to the other methods we investigated. Of the methods used for smoothing polychoric correlation matrices, we recommend the algorithm based on minimum-trace factor analysis.

8.
The user-based collaborative filtering (CF) algorithm is one of the most popular approaches for making recommendations. Despite its success, the traditional user-based CF algorithm suffers from one serious problem: it measures the influence between two users only through their symmetric similarity, calculated from their consumption histories. This means that, for a pair of users, the influences on each other are the same, which may not be true. Intuitively, an expert may have an impact on a novice user, but a novice user may not affect an expert at all. Besides, each user may possess a global importance factor that affects his/her influence on the remaining users. To this end, in this paper, we propose an asymmetric user influence model to measure the directed influence between two users and adopt the PageRank algorithm to calculate the global importance value of each user. The directed influence values and the global importance values are then integrated to deduce the final influence values between two users. Finally, we use the final influence values to improve the performance of the traditional user-based CF algorithm. Extensive experiments have been conducted, and the results confirm that both the asymmetric user influence model and the global importance value play key roles in improving recommendation accuracy. Hence the proposed method significantly outperforms the existing recommendation algorithms, in particular the user-based CF algorithm, on datasets of high rating density.
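
The global importance step in this abstract relies on PageRank over a directed user graph. A minimal power-iteration sketch follows; how the paper builds the directed influence graph is not given in the abstract, so the edge convention here (an edge from a user to whoever influences them) and all names are assumptions.

```python
def pagerank(out_links, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed graph given as node -> list of targets."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[u] / n
                for v in nodes:
                    new[v] += share
        rank = new
    return rank

# Hypothetical graph: both novices are influenced by the expert, not the reverse.
influenced_by = {"novice1": ["expert"], "novice2": ["expert"], "expert": []}
importance = pagerank(influenced_by)
print(importance)
```

In this toy graph the expert ends up with the largest importance value, matching the intuition in the abstract that influence between an expert and a novice is not symmetric.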

9.
Molecular similarity and molecular diversity techniques lie at the heart of attempts to design structurally diverse combinatorial libraries for the identification of novel bioactive compounds. Recent advances include the development of new types of selection algorithm, the validation of such algorithms, the use of filtering systems to screen out undesirable molecules prior to the design of a library, and the integration of similarity and diversity analysis with other methods for computer-aided molecular design.

10.
As one of the major challenges, the cold-start problem plagues nearly all recommender systems. In particular, new items will be overlooked, impeding the development of new products online. Given limited resources, how to utilize the knowledge of recommender systems and design an efficient marketing strategy for new items is extremely important. In this paper, we convert this ticklish issue into a clear mathematical problem based on a bipartite network representation. Under the most widely used algorithm in real e-commerce recommender systems, the so-called item-based collaborative filtering, we show that simply pushing new items to active users is not a good strategy. Interestingly, experiments on real recommender systems indicate that connecting new items with some less active users will statistically yield better performance; namely, these new items will have more chance to appear in other users' recommendation lists. Further analysis suggests that the disassortative nature of recommender systems contributes to this observation. In short, an in-depth understanding of recommender systems could pave the way for owners to popularize their cold-start products at low cost.

11.
Estimating pairwise correlation from replicated genome-scale (a.k.a. OMICS) data is fundamental to cluster functionally relevant biomolecules to a cellular pathway. The popular Pearson correlation coefficient estimates bivariate correlation by averaging over replicates. It is not completely satisfactory since it introduces strong bias while reducing variance. We propose a new multivariate correlation estimator that models all replicates as independent and identically distributed (i.i.d.) samples from the multivariate normal distribution. We derive the estimator by maximizing the likelihood function. For small sample data, we provide a resampling-based statistical inference procedure, and for moderate to large sample data, we provide an asymptotic statistical inference procedure based on the Likelihood Ratio Test (LRT). We demonstrate advantages of the new multivariate correlation estimator over Pearson bivariate correlation estimator using simulations and real-world data analysis examples. AVAILABILITY: The estimator and statistical inference procedures have been implemented in an R package 'CORREP' that is available from CRAN [http://cran.r-project.org] and Bioconductor [http://www.bioconductor.org/]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
13.
Among real-time collaborative graphical editing systems, bitmap-based systems are a special and practically useful class, and Do and Undo/Redo operations are intricate problems in this field. However, existing research on graphical editing systems is scarce. In this paper, based on a multi-version strategy, we propose a new approach to solve the Do and Undo/Redo consistency maintenance problems with due consideration of three possible cases: all-causal, all-independent, and causal-independent-mixed operations. Compared with previous collaborative algorithms, the algorithms proposed in this paper support Do and Undo/Redo operations without requiring additional space. In addition, two example analyses are given to demonstrate the effectiveness of each algorithm separately. Furthermore, both algorithms have O(n) time complexity. Finally, a system prototype called bitmap-based Co-Graphical Editor is implemented to verify them in practice.

14.
A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex existing in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that, using Voronoi constructions and a well-chosen set of parameters, an accurate scoring function can be designed and optimized. However, to be able to perform large-scale in silico exploration of the interactome, a near-native solution has to be found in the ten best-ranked solutions. This cannot yet be guaranteed by any of the existing scoring functions. In this work, we introduce a new procedure for conformation ranking. We previously developed a set of scoring functions where learning was performed using a genetic algorithm. These functions were used to assign a rank to each possible conformation. We now obtain a refined rank using different classifiers (decision trees, rules, and support vector machines) in a collaborative filtering scheme. The newly obtained scoring function is evaluated using 10-fold cross-validation and compared to the functions obtained using either genetic algorithms or collaborative filtering taken separately. This new approach was successfully applied to the CAPRI scoring ensembles. We show that for 10 targets out of 12, we are able to find a near-native conformation in the 10 best-ranked solutions. Moreover, for 6 of them, the near-native conformation selected is of high accuracy. Finally, we show that this function dramatically enriches the 100 best-ranking conformations in near-native structures.

15.
Online users nowadays face a serious information overload problem. In recent years, recommender systems have been widely studied to help people find relevant information. Adaptive social recommendation is one such system, in which the connections in online social networks are optimized for information propagation so that users can receive interesting news or stories from their leaders. Validation of such adaptive social recommendation methods in the literature assumes a uniform distribution of users' activity frequency. In this paper, our empirical analysis shows that the distribution of online users' activity is actually heterogeneous. Accordingly, we propose a more realistic multi-agent model in which users' activity frequencies are drawn from a power-law distribution. We find that previous social recommendation methods lead to serious delays in information propagation since many users are connected to inactive leaders. To solve this problem, we design a new similarity measure which takes into account users' activity frequencies. With this similarity measure, the average delay is significantly shortened and the recommendation accuracy is largely improved.

16.
17.
The causal relationship between genes and diseases has been investigated with the development of DNA sequencing. Polymorphisms incorporated in the HapMap Project have enabled fine mapping with linkage disequilibrium (LD), and prior clustering of the haplotypes on the basis of a similarity measure has often been performed in an attempt to capture coalescent events because it can reduce the amount of computation. However, an inappropriate choice of similarity measure can lead to wrong conclusions, so we propose a new haplotype-based clustering algorithm for fine-scale mapping using a Bayesian partition model. To handle phase-unknown genotypes, we propose a new algorithm based on a Metropolized Gibbs sampler, implemented in C++. Our simulation studies found that the proposed method improves the accuracy of the estimator for the disease susceptibility locus. We illustrate the practical implications of the new analysis method with an application to fine-scale mapping of CYP2D6 in drug metabolism.

18.
Background: MicroRNAs (miRNAs) are a significant type of non-coding RNA, usually encoded by endogenous genes and about 22 nucleotides (nt) long. Accumulating biological experiments have shown that miRNAs are closely associated with various human diseases. Although traditional experimental methods have achieved great success in identifying miRNA-disease interactions, these methods also have some limitations. Therefore, it is necessary to develop computational methods to predict miRNA-disease interactions. Methods: Here, we propose a computational framework (MDVSI) to predict interactions between miRNAs and diseases by integrating miRNA topological similarity and functional similarity. First, the CosRA index is used to measure miRNA similarity based on network topological features. Then, to enhance the reliability of miRNA similarity, the functional similarity and CosRA similarity are integrated using a linear weighting method. Further, potential miRNA-disease associations are predicted using a recommendation method. In addition, to overcome a limitation of the recommendation method for new diseases, a new strategy is proposed to predict potential interactions between miRNAs and a new disease based on disease functional similarity. Results: To evaluate the performance of different methods, we conduct ten-fold cross-validation and a de novo test and compare MDVSI with two state-of-the-art methods. The experimental results show that MDVSI achieves an AUC of 0.91, which is at least 0.012 higher than the other compared methods. Conclusions: In summary, we propose a computational framework (MDVSI) for miRNA-disease interaction prediction. The experimental results demonstrate that it outperforms other state-of-the-art methods. A case study shows that it can effectively identify potential miRNA-disease interactions.

19.
MOTIVATION: Word-matching algorithms such as BLAST are routinely used for sequence comparison. These algorithms typically use areas of matching words to seed alignments which are then used to assess the degree of sequence similarity. In this paper, we show that by formally separating the word-matching and sequence-alignment process, and using information about word frequencies to generate alignments and similarity scores, we can create a new sequence-comparison algorithm which is both fast and sensitive. The formal split between word searching and alignment allows users to select an appropriate alignment method without affecting the underlying similarity search. The algorithm has been used to develop software for identifying entries in DNA sequence databases which are contaminated with vector sequence. RESULTS: We present three algorithms, RAPID, PHAT and SPLAT, which together allow vector contaminations to be found and assessed extremely rapidly. RAPID is a word search algorithm which uses probabilities to modify the significance attached to different words; PHAT and SPLAT are alignment algorithms. An initial implementation has been shown to be approximately an order of magnitude faster than BLAST. The formal split between word searching and alignment not only offers considerable gains in performance, but also allows alignment generation to be viewed as a user interface problem, allowing the most useful output method to be selected without affecting the underlying similarity search. Receiver Operator Characteristic (ROC) analysis of an artificial test set allows the optimal score threshold for identifying vector contamination to be determined. ROC curves were also used to determine the optimum word size (nine) for finding vector contamination. An analysis of the entire expressed sequence tag (EST) subset of EMBL found a contamination rate of 0.27%. A more detailed analysis of the 50 000 ESTs in est10.dat (an EST subset of EMBL) finds an error rate of 0.86%, principally due to two large-scale projects. AVAILABILITY: A Web page for the software exists at http://bioinf.man.ac.uk/rapid, or it can be downloaded from ftp://ftp.bioinf.man.ac.uk/RAPID CONTACT: crispin@cs.man.ac.uk
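
The word-matching stage that this abstract separates from alignment can be illustrated with a plain fixed-length word index. This is a generic sketch, not RAPID itself: it omits the probability-based weighting of words described above, and the function names, toy sequences, and word size of 4 (the paper reports nine as optimal) are assumptions.

```python
def build_word_index(seq, k):
    """Map every length-k word in seq to the list of positions where it starts."""
    index = {}
    for pos in range(len(seq) - k + 1):
        index.setdefault(seq[pos:pos + k], []).append(pos)
    return index

def shared_words(query, index, k):
    """Return (query_pos, indexed_pos) pairs for every word the two sequences share."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits

# Toy "vector" sequence and a query that contains part of it.
vector = "ACGTACGTTAGC"
query = "TTACGTTA"
k = 4

index = build_word_index(vector, k)
print(shared_words(query, index, k))
```

The hit list is the only thing an alignment stage would need from the search stage, which is the separation of concerns the abstract argues for.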

20.
This paper analyses the problem of prey choice for a forager that can collect only one item per foraging trip. In the terminology of Orians and Pearson (1979) this is central-place foraging by single-prey loaders. Our analysis goes beyond that of Orians and Pearson by including a cost of time away from the central place that can be constant per unit time or can increase. Two new effects emerge: (i) it can be optimal to abandon the foraging trip without having collected an item; (ii) the minimum acceptable item depends on the time spent foraging. We show how our framework encompasses cases not usually thought of as central-place foraging, such as diving animals that must surface for air.
