Similar literature
 20 similar documents found (search time: 15 ms)
1.
In this paper, a novel variable selection method based on Radial Basis Function (RBF) neural networks and genetic algorithms is presented. The fuzzy means algorithm is used as the training method for the RBF networks because of its inherent speed, its deterministic approach to selecting the hidden node centers, and the fact that it involves only a single tuning parameter. The trade-off between the accuracy and parsimony of the produced model is handled by using the Final Prediction Error criterion, based on the RBF training and validation errors, as the fitness function of the proposed genetic algorithm. The tuning parameter required by the fuzzy means algorithm is treated as a free variable by the genetic algorithm. The proposed method was tested on benchmark data sets from the time-series prediction and medicinal chemistry communities and produced promising results.

2.
Tu S  Chen R  Xu L 《Proteome science》2011,9(Z1):S18
BACKGROUND: Identifying biologically relevant protein complexes from a large protein-protein interaction (PPI) network is essential to understanding the organization of biological systems. However, high-throughput experimental techniques that can produce a large number of PPIs are known to yield non-negligible rates of false positives and false negatives, making the protein complexes difficult to identify. RESULTS: We propose a binary matrix factorization (BMF) algorithm under Bayesian Ying-Yang (BYY) harmony learning to detect protein complexes by clustering proteins that share similar interactions, through factorizing the binary adjacency matrix of a PPI network. The proposed BYY-BMF algorithm automatically determines the cluster number, whereas this number must be given in advance for most existing BMF algorithms. Also, BYY-BMF's clustering results do not depend on any parameters or thresholds, unlike the Markov Cluster Algorithm (MCL), which relies on a so-called inflation parameter. On synthetic PPI networks, predictions evaluated against the known annotated complexes indicate that BYY-BMF is more robust than MCL in most cases. On real PPI networks from the MIPS and DIP databases, BYY-BMF obtains better-balanced prediction accuracies than MCL and a spectral analysis method, while MCL has its own advantages, e.g., good separation values.

3.
This paper applies and studies the behavior of three learning algorithms, i.e., the Support Vector Machine (SVM), the Radial Basis Function network (RBF network), and k-Nearest Neighbor (k-NN), for predicting HIV-1 drug resistance from genotype data. In addition, a new algorithm for classifier combination is proposed. A comparison of the predictive performance of the three learning algorithms shows that SVM yields the highest average accuracy, the RBF network gives the highest sensitivity, and k-NN yields the best specificity. Finally, a comparison of the composite classifier with the three learning algorithms demonstrates that the proposed composite classifier provides the highest average accuracy.
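The three figures of merit compared in this abstract (accuracy, sensitivity, specificity) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the 1 = resistant / 0 = susceptible label encoding, and the toy labels are assumptions made here.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels; 1 is assumed to mean 'resistant'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity and specificity.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),   # fraction of all calls that are right
        "sensitivity": tp / (tp + fn),          # fraction of resistant cases detected
        "specificity": tn / (tn + fp),          # fraction of susceptible cases cleared
    }


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    guess = [1, 0, 1, 0, 0, 1, 0, 1]
    print(summarize(truth, guess))
```

Reporting all three numbers matters because, as the abstract notes, different classifiers can lead on different metrics.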

4.
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, simple MP or ML tree search algorithms such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning regrafting (SPR) search are as efficient as extensive search algorithms such as the SA plus tree bisection-reconnection (TBR) search in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model generally performs better than the HKY model, even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that, at least in the present case, selecting a substitution model by using the likelihood ratio test or the AIC index is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often performs better than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.
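For readers unfamiliar with the two simplest distance measures mentioned above, the sketch below computes the p distance (proportion of differing sites) and the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names and the toy sequences are assumptions made here; gaps and ambiguous bases are not handled.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from a p distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Hypothetical 8-site alignment with one difference.
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as p, and the gap widens as divergence grows.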

5.
MOTIVATION: Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. RESULTS: The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. AVAILABILITY: A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/

6.
Plasma concentrations of the nitric oxide synthase inhibitor asymmetric dimethylarginine (ADMA) and of symmetric dimethylarginine (SDMA) already increase in the early stages of renal insufficiency. There is no agreement as to whether reduced renal plasma clearance (RPCL) contributes to this increase. Therefore, we investigated the relationship between estimated glomerular filtration rate (eGFR), RPCL, and plasma ADMA and SDMA in essential hypertensive patients with mild to moderate renal insufficiency. In 171 patients who underwent renal angiography, we drew blood samples from the aorta and both renal veins and measured mean renal blood flow (MRBF) using the ¹³³Xe washout technique. RPCL was calculated using arteriovenous concentration differences and MRBF. After correction for potential confounders, reduced eGFR was associated with higher plasma ADMA and SDMA [standardized regression coefficient (β) = -0.22 (95% confidence interval: -0.41, -0.04) and β = -0.66 (95% confidence interval: -0.83, -0.49), respectively]. However, eGFR was not independently associated with RPCL of ADMA. Moreover, reduced RPCL of ADMA was not associated with higher plasma ADMA. In contrast to ADMA, reduced eGFR was indeed associated with lower RPCL of SDMA [β = 0.21 (95% confidence interval: 0.02, 0.40)]. In conclusion, our findings indicate that RPCL of ADMA is independent of renal function in hypertensive patients with mild to moderate renal insufficiency. Unlike the case for SDMA, reduced RPCL of ADMA is of minor importance for the increase in plasma ADMA in these patients, which indicates that increased plasma ADMA in this population is not a direct consequence of the kidneys failing as a plasma ADMA-regulating organ.

7.
This paper reports the development and application of three powerful algorithms for the analysis and simulation of mathematical models consisting of ordinary differential equations. First, we describe an extended parameter sensitivity analysis: we measure the relative sensitivities of many dynamical behaviors of the model to perturbations of each parameter. We check sensitivities to parameter variation over both small and large ranges. These two extensions of a common technique have applications in parameter estimation and in experimental design. Second, we compute sensitivity functions, using an efficient algorithm requiring just one model simulation to obtain all sensitivities of state variables to all parameters as functions of time. We extend the analysis to a behavior which is not a state variable. Third, we present an unconstrained global optimization algorithm, and apply it in a novel way: we determine the input to the model, given an optimality criterion and typical outputs. The algorithm itself is efficient for high-order problems and does not get stuck at local extrema. We apply the sensitivity analysis, sensitivity functions, and optimization algorithm to a sixth-order nonlinear ordinary differential equation model for human eye movements. This application shows that the algorithms are not only practicable for high-order models, but also useful as conceptual tools.
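The idea of a relative sensitivity, (k / y) · dy/dk, can be illustrated on a toy model. The sketch below uses a plain central finite difference on a hypothetical first-order decay model integrated with forward Euler; it is not the one-simulation sensitivity-function algorithm the abstract describes, and the model, step sizes, and function names are all assumptions made here.

```python
def simulate_decay(k, x0=1.0, t_end=2.0, steps=2000):
    """Integrate dx/dt = -k * x with forward Euler and return x(t_end)."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)
    return x


def relative_sensitivity(model, k, rel_step=1e-4):
    """Central-difference estimate of (k / y) * dy/dk for a scalar output y = model(k)."""
    h = k * rel_step                      # perturbation proportional to the parameter
    y = model(k)
    dy_dk = (model(k + h) - model(k - h)) / (2.0 * h)
    return k * dy_dk / y


if __name__ == "__main__":
    # For x(T) = x0 * exp(-k * T) the exact relative sensitivity is -k * T.
    print(relative_sensitivity(simulate_decay, 0.5))
```

A value near -1 here means a 1% increase in k lowers the output by roughly 1%; a larger `rel_step` would probe the "large range" behavior the abstract mentions.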

8.
One of the main obstacles to the widespread use of artificial neural networks is the difficulty of adequately defining values for their free parameters. This article discusses how the parameters of Radial Basis Function (RBF) networks can be defined by genetic algorithms. To that end, it presents an overview of the problems involved and the different approaches used to genetically optimize RBF networks. A new strategy for optimizing RBF networks using genetic algorithms is proposed, which includes a new representation, a new crossover operator, and the use of a multiobjective optimization criterion. Experiments using a benchmark problem are performed, and the results achieved using this model are compared to those achieved by other approaches.

9.
Reinforcement learning algorithms have provided some of the most influential computational theories for behavioral learning that depends on reward and penalty. After briefly reviewing supporting experimental data, this paper tackles three difficult theoretical issues that remain to be explored. First, plain reinforcement learning is much too slow to be considered a plausible brain model. Second, although the temporal-difference error has an important role both in theory and in experiments, how to compute it remains an enigma. Third, the function of all brain areas, including the cerebral cortex, cerebellum, brainstem and basal ganglia, seems to necessitate a new computational framework. Computational studies that emphasize meta-parameters, hierarchy, modularity and supervised learning to resolve these issues are reviewed here, together with the related experimental data.
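The temporal-difference error referred to above has a standard textbook form, δ = r + γ·V(s') − V(s). The sketch below shows it together with a tabular TD(0) value update; the state names, learning rate `alpha`, and discount `gamma` are illustrative assumptions, and nothing here is specific to the brain models the abstract reviews.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


def td0_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) update to `values` in place and return delta."""
    delta = td_error(reward, values[next_state], values[state], gamma)
    values[state] += alpha * delta        # move V(s) a small step toward the target
    return delta


if __name__ == "__main__":
    # Hypothetical two-state value table.
    values = {"A": 0.0, "B": 1.0}
    delta = td0_update(values, "A", "B", reward=0.5)
    print(delta, values)
```

A positive δ means the outcome was better than predicted, so the estimate for the current state is raised.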

10.
The development of high-throughput technology has generated a massive amount of high-dimensional data, much of which is of discrete type. Robust and efficient learning algorithms such as LASSO [1] are required for feature selection and overfitting control. However, most feature selection algorithms are only applicable to continuous data. In this paper, we propose a novel method for sparse support vector machines (SVMs) with L_{p} (p ≪ 1) regularization. Efficient algorithms (LpSVM) are developed for learning a classifier that is applicable to high-dimensional data sets with both discrete and continuous data types. The regularization parameters are estimated by maximizing the area under the ROC curve (AUC) on the cross-validation data. Experimental results on protein sequence and SNP data attest to the accuracy, sparsity, and efficiency of the proposed algorithm. Biomarkers identified with our methods are compared with those from other methods in the literature. The software package in Matlab is available upon request.
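Choosing a regularization parameter by maximizing cross-validated AUC can be sketched as below. The AUC is computed with the usual pairwise (Mann-Whitney) definition; the helper `select_by_auc`, the candidate parameter values, and the decision scores are hypothetical and stand in for whatever the LpSVM package actually produces.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_by_auc(cv_scores_by_param, labels):
    """Return the parameter whose cross-validation scores give the highest AUC."""
    return max(cv_scores_by_param, key=lambda prm: auc(cv_scores_by_param[prm], labels))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    # Hypothetical held-out decision scores for two candidate parameter values.
    cv_scores = {0.1: [0.9, 0.8, 0.4, 0.3], 1.0: [0.9, 0.2, 0.8, 0.1]}
    print(select_by_auc(cv_scores, labels))
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch but would be replaced by a rank-based computation on large data sets.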

11.
Community detection is a fundamental problem in the analysis of complex networks. Recently, many researchers have concentrated on the detection of overlapping communities, where a vertex may belong to more than one community. However, most current methods require the number (or the size) of the communities as a priori information, which is usually unavailable in real-world networks. Thus, a practical algorithm should not only find the overlapping community structure, but also automatically determine the number of communities. Furthermore, it is preferable if the method can reveal the hierarchical structure of networks as well. In this work, we first propose a generative model that employs a nonnegative matrix factorization (NMF) formulation with an l2,1 norm regularization term, balanced by a resolution parameter. The NMF naturally provides overlapping community structure by assigning soft membership variables to each vertex; the l2,1 regularization term is a group-sparsity technique that can automatically determine the number of communities by penalizing too many nonempty communities; and the resolution parameter enables us to explore the hierarchical structure of networks. Thereafter, we derive the multiplicative update rule to learn the model parameters and offer a proof of its correctness. Finally, we test our approach on a variety of synthetic and real-world networks and compare it with some state-of-the-art algorithms. The results validate the superior performance of our new method.

12.
In recent years, a large number of ontology learning algorithms have been applied across disciplines and engineering fields. The ontology model is represented as a graph, and the key to ontology algorithms is measuring the similarity between concepts. In these learning frameworks, the information of each ontology vertex is expressed as a vector, so the similarity can be determined from the distance between the corresponding vectors. In this paper, we study how to obtain an optimal distance function in the ontology setting. The techniques we present are divided into two parts: first, ontology distance learning in the setting where the ontology data have no labels; then, distance learning approaches in the setting where the given ontology data carry real numbers as their labels. The results of the four simulation experiments show that our new ontology technique has high efficiency and accuracy in ontology similarity measurement and ontology mapping in specific engineering applications.

13.
Zou  Hui 《Biometrika》2008,95(1):241-247
We propose an efficient and adaptive shrinkage method for variable selection in the Cox model. The method constructs a piecewise-linear regularization path connecting the maximum partial likelihood estimator and the origin. Then a model is selected along the path. We show that the constructed path is adaptive in the sense that, with a proper choice of regularization parameter, the fitted model works as well as if the true underlying submodel were given in advance. A modified algorithm of the least-angle-regression type efficiently computes the entire regularization path of the new estimator. Furthermore, we show that, with a proper choice of shrinkage parameter, the method is consistent in variable selection and efficient in estimation. Simulation shows that the new method tends to outperform the lasso and the smoothly-clipped-absolute-deviation estimators with moderate samples. We apply the methodology to data concerning nursing homes.

14.
We consider the efficient initialization of the structure and parameters of generalized Gaussian radial basis function (RBF) networks using fuzzy decision trees generated by fuzzy ID3-like induction algorithms. The initialization scheme is based on the proposed functional equivalence property of fuzzy decision trees and generalized Gaussian RBF networks. The resulting RBF network is compact, easy to induce, and comprehensible, and it has acceptable classification accuracy with a stochastic gradient descent learning algorithm.

15.
16.
Freshwater crayfish are among the most important aquatic organisms: they play a pivotal role in the aquatic food chain and serve as bioindicators for assessing the health of aquatic ecosystems. Hemocytes, the blood cells of crustaceans, can be considered stress and health indicators in crayfish and are used to evaluate the health response. Total hemocyte cell numbers (THCs) are therefore useful parameters for showing the health of crustaceans and serve as stress indicators for judging the quality of the habitat. Because catching fish and other aquatic organisms and collecting data for further assessment are time-consuming and frustrating, scientists now tend to use faster, more sophisticated, and more reliable methods for modeling ecosystem stressors based on bioindicators. One tool that has attracted the attention of the scientific community in recent decades is machine learning, which offers reliable and accurate methods for solving classification and regression problems. In this study, a support vector machine (SVM) is used as the machine learning algorithm to classify healthy and unhealthy crayfish based on physiological characteristics. To handle the non-linearity of the data by mapping it to a high-dimensional space, different kernel functions, including polynomial (PK), Pearson VII function-based universal (PUK), and radial basis function (RBF) kernels, are used, and their effect on the performance of the SVM model is evaluated. Both the PK and PUK functions performed well in classifying the crayfish. The RBF kernel, however, had an adverse impact on the performance of the model. The PUK kernel exhibited outstanding performance (accuracy = 100%) in classifying healthy and unhealthy crayfish.
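The three kernel families named in this abstract can be written down compactly. The sketch below uses common textbook forms for the polynomial and RBF kernels and the usual published form of the Pearson VII universal kernel; the parameter names (`degree`, `coef0`, `gamma`, `sigma`, `omega`) and their default values are assumptions made here, not the settings used in the study.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * squared_distance(x, y))


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel; sigma sets the width, omega the tail shape."""
    dist = math.sqrt(squared_distance(x, y))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


if __name__ == "__main__":
    # Hypothetical two-feature samples.
    a, b = [0.0, 0.0], [1.0, 0.0]
    print(polynomial_kernel(a, b), rbf_kernel(a, b), puk_kernel(a, b))
```

Both the RBF and PUK kernels equal 1 for identical inputs and decay with distance; PUK's extra shape parameter lets it range between Lorentzian-like and Gaussian-like profiles.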

17.
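Research on the relationship between support vector machines and neural networks   Total citations: 2, self-citations: 0, citations by others: 2
The support vector machine is a novel machine learning method based on statistical learning theory. Owing to its excellent learning performance, it has become a research focus in the international machine learning community and has been widely used to solve classification and regression problems. This paper applies the structural risk function to the learning of radial basis function networks and discusses the relationship between the support vector regression model and radial basis function networks. Simulation examples show that the proposed algorithm improves the generalization performance of radial basis function networks.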

18.
This paper presents an approach based on Rival Penalized Competitive Learning (RPCL) rules for discrete-valued source separation. In this approach, we first build a connection between the source number and the cluster number of the observations. Then, we use the RPCL rule to automatically find the correct number of clusters, so that the source number is determined. Moreover, we tune the de-mixing matrix based on the cluster centers instead of the observations themselves, whereby the noise interference is considerably reduced. The experiments show that this new approach not only determines the number of sources quickly and automatically, but is also insensitive to noise when performing blind source separation.
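The core idea of a rival-penalized update can be sketched in a few lines: the closest center (the winner) is pulled toward the sample, while the second closest (the rival) is pushed away at a much smaller rate, so surplus centers drift out of the data. This is a simplified illustration that omits the winning-frequency weighting used in the full RPCL rule; the learning rates and function names are assumptions made here, not values from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def rpcl_step(centers, x, lr_winner=0.05, lr_rival=0.005):
    """One simplified RPCL update in place; returns (winner_index, rival_index)."""
    if len(centers) < 2:
        raise ValueError("RPCL needs at least two centers")
    order = sorted(range(len(centers)), key=lambda i: sq_dist(centers[i], x))
    winner, rival = order[0], order[1]
    # Winner learns: move toward the sample.
    centers[winner] = [c + lr_winner * (xi - c) for c, xi in zip(centers[winner], x)]
    # Rival is penalized: move away from the sample at a smaller rate.
    centers[rival] = [c - lr_rival * (xi - c) for c, xi in zip(centers[rival], x)]
    return winner, rival


if __name__ == "__main__":
    # Hypothetical 2-D centers and one observation.
    centers = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    print(rpcl_step(centers, [0.2, 0.0]), centers)
```

Repeating this step over many observations is what lets the number of effective clusters, and hence the number of sources in the abstract's setting, emerge without being fixed in advance.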

19.
This paper presents pruning and model-selection algorithms for support vector learning in sample classification and function regression. When constructing an RBF network by support vector learning, we occasionally obtain redundant support vectors that do not significantly affect the final classification and function approximation results. The pruning algorithms are primarily based on a sensitivity measure and a penalty term. The kernel function parameters and the position of each support vector are updated so as to minimize the increase in error, which makes the structure of the SVM network more flexible. We illustrate this approach with a synthetic data simulation and a face detection problem to demonstrate the effectiveness of the pruning.

20.
Multi-marker approaches have received a lot of attention recently in genome-wide association studies and can enhance power to detect new associations under certain conditions. Gene-, gene-set- and pathway-based association tests are increasingly being viewed as useful supplements to the more widely used single-marker association analyses, which have successfully uncovered numerous disease variants. A major drawback of single-marker methods is that they do not look at the joint effects of multiple genetic variants that individually may have weak or moderate signals. Here, we describe novel tests for multi-marker association analyses that are based on phenotype predictions obtained from machine learning algorithms. Instead of assuming a linear or logistic regression model, we propose the use of ensembles of diverse machine learning algorithms for prediction. We show that phenotype predictions obtained from ensemble learning algorithms provide a new framework for multi-marker association analysis. They can be used for constructing tests for the joint association of multiple variants, adjusting for covariates, and testing for the presence of interactions. To demonstrate the power and utility of this new approach, we first apply our method to simulated SNP datasets. We show that the proposed method has the correct Type-1 error rates and can be considerably more powerful than alternative approaches in some situations. Then, we apply our method to previously studied asthma-related genes in two independent asthma cohorts to conduct association tests.
