Similar articles
 Found 20 similar articles (search time: 78 ms)
1.
Statistical modeling of the links between genetic profiles and environmental and clinical data to aid medical diagnosis is a challenge. Here, we present a computational approach for rapidly selecting important clinical data to assist medical decisions based on personalized genetic profiles. What could take hours or days of computing is available on the fly, making this strategy feasible to implement routinely without demanding great computing power. The key is to query a precomputed solution database, which rapidly yields an optimal or nearly optimal mathematical function that evaluates the "disease stage" by combining genetic profiles with personal clinical data. The database is generated beforehand by a new hybrid feature selection method that uses support vector machines, recursive feature elimination and random subspace search. To evaluate the method, data on polymorphisms in the renin-angiotensin-aldosterone system genes, together with clinical data, were obtained from patients with hypertension and control subjects. The disease "risk" was determined by classifying each patient's data with a support vector machine model based on the optimized features and then measuring the Euclidean distance to the decision hyperplane. Our results showed an association of renin-angiotensin-aldosterone system gene haplotypes with hypertension. The association of polymorphism patterns with different ethnic groups was also tracked by the feature selection process. A demonstration of this method is available online on the project's web site.
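The "risk" measure in the abstract above is a distance to the SVM decision hyperplane. The following is a minimal sketch of that one step, assuming a linear decision function w·x + b = 0 (the abstract does not state the kernel); the weights, bias and sample points are made up for illustration and are not taken from the study.

```python
import math

def signed_distance(w, b, x):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical linear model (assumption, not from the paper).
w = [3.0, 4.0]
b = -5.0
print(signed_distance(w, b, [3.0, 4.0]))  # (9 + 16 - 5) / 5 = 4.0
print(signed_distance(w, b, [0.0, 0.0]))  # -5 / 5 = -1.0
```

The sign indicates the side of the hyperplane (the predicted class) and the magnitude indicates how far the sample lies from the decision boundary.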

2.
Gene expression profiling has been widely used to study molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. Gene selection, an important step toward improved diagnostics, screens tens of thousands of genes and identifies a small subset that discriminates between disease types. A two-step gene selection method is proposed to identify informative gene subsets for accurate classification of multiclass phenotypes. In the first step, individually discriminatory genes (IDGs) are identified using a one-dimensional weighted Fisher criterion (wFC). In the second step, jointly discriminatory genes (JDGs) are selected by sequential search methods, based on their joint class separability as measured by a multidimensional wFC. The performance of the selected gene subsets for multiclass prediction is evaluated by artificial neural networks (ANNs) and/or support vector machines (SVMs). By applying the proposed IDG/JDG approach to two microarray studies, namely small round blue cell tumors (SRBCTs) and muscular dystrophies (MDs), we identified a much smaller yet efficient set of JDGs for diagnosing SRBCTs and MDs with high prediction accuracies (96.9% for SRBCTs and 92.3% for MDs, respectively). These experimental results demonstrate that the two-step gene selection method is able to identify a subset of highly discriminative genes for improved multiclass prediction.
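The first step above ranks genes one at a time by a Fisher-type criterion. The abstract does not define the weighted multiclass form (wFC), so the sketch below substitutes the plain two-class Fisher criterion (squared mean difference over summed variances) as a stand-in; the gene names and expression values are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(values_a, values_b):
    """Two-class Fisher criterion for one gene: squared mean gap over summed variances."""
    gap = mean(values_a) - mean(values_b)
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        raise ValueError("both classes have zero variance")
    return gap * gap / spread

def rank_genes(expr_a, expr_b):
    """Rank gene names by descending score; expr_* map gene name -> expression values."""
    scores = {gene: fisher_score(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: g2 separates the classes far better than g1.
class_a = {"g1": [1.0, 2.0, 3.0], "g2": [5.0, 5.5, 4.5]}
class_b = {"g1": [1.5, 2.5, 3.5], "g2": [1.0, 0.5, 1.5]}
print(rank_genes(class_a, class_b))  # ['g2', 'g1']
```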

3.
Identifying informative genes has always been a major step in microarray data analysis, and the complexity of various cancer datasets keeps this issue challenging. In this paper, a novel bio-inspired multi-objective algorithm is proposed for gene selection in microarray data classification, specifically in the binary domain of feature selection. The presented method extends the traditional Bat Algorithm with refined formulations, effective multi-objective operators, and novel local search strategies that employ social learning concepts in designing random walks. A hybrid model using the Fisher criterion is then applied to three widely used microarray cancer datasets to explore significant biomarkers, which reveals the effectiveness of the proposed method for genomic analysis. Experimental results unveil new combinations of informative biomarkers that have associations with other studies.

4.
Because DNA-binding proteins play vital roles in gene regulation, an efficient method for identifying them is highly desirable and would be invaluable for advancing our understanding of protein function. In this study, we propose a new method for predicting DNA-binding proteins that ranks features with random forest and then applies wrapper-based feature selection with a forward best-first search strategy. The features comprise information from the primary sequence, predicted secondary structure, predicted relative solvent accessibility, and the position-specific scoring matrix. The proposed method, called DBPPred, uses Gaussian naïve Bayes as the underlying classifier because it outperformed five other classifiers: decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 in five-fold cross-validation with ten runs on the training benchmark dataset PDB594. Blind tests on the independent dataset PDB186 were then performed with the proposed model trained on the entire PDB594 dataset and with five existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large non-DNA-binding protein dataset and two RNA-binding protein datasets also showed improved or comparable quality relative to the relevant prediction methods. Moreover, we observed that for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that DBPPred can be a prospective alternative predictor for large-scale determination of DNA-binding proteins.
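The abstract above reports MCC (Matthews correlation coefficient) alongside accuracy. As a reminder of how that score is obtained from a binary confusion matrix, here is a minimal sketch; the counts in the usage lines are hypothetical and are not the paper's results.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc(50, 50, 0, 0))    # perfect prediction: 1.0
print(mcc(25, 25, 25, 25))  # no better than chance: 0.0
```

Unlike accuracy, MCC stays near 0 for a classifier that simply predicts the majority class, which is why it is often reported for unbalanced benchmark sets.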

5.
Bio-support vector machines for computational proteomics   Cited by: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: One of the most important issues in computational proteomics is to produce a prediction model for the classification or annotation of the biological function of novel protein sequences. To improve prediction accuracy, much attention has been paid to improving the performance of the algorithms used, but little to the fundamental issue of amino acid encoding, even though most existing pattern recognition algorithms are unable to recognize amino acids in protein sequences. Importantly, the most commonly used amino acid encoding method has a flaw that leads to large computational cost and recognition bias. RESULTS: By replacing the kernel functions of support vector machines (SVMs) with amino acid similarity measurement matrices, we have modified SVMs into a new type of pattern recognition algorithm for analysing protein sequences, particularly for proteolytic cleavage site prediction. We refer to the modified SVMs as bio-support vector machines. When applied to the prediction of HIV protease cleavage sites, the new method has shown a remarkable advantage in reducing model complexity and enhancing model robustness.

6.
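The fitting accuracy and generalization ability of a support vector regression (SVR) model depend on the choice of its parameters, and parameter selection is essentially an optimization search process. Exploiting the efficiency of the heuristic breadth-first search (HBFS) algorithm on optimization problems, we propose an SVR parameter selection method that uses HBFS as the search strategy with the objective of minimizing the k-fold cross-validation error. Simulation experiments on three benchmark datasets show that the method substantially shortens training and modeling time while maintaining prediction accuracy, providing a new and effective solution for SVR parameter selection on large samples.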
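The objective minimized in the abstract above is a k-fold cross-validation error. The sketch below shows only that objective, under the assumption of contiguous folds and mean squared error; the HBFS search itself is not described in the abstract and is not reproduced. The `fit` and `predict` callables are hypothetical placeholders into which a candidate SVR parameter setting would be baked.

```python
def kfold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

def cv_mse(fit, predict, xs, ys, k):
    """Mean squared validation error pooled over k folds."""
    total, count = 0.0, 0
    for train, validation in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in validation:
            total += (predict(model, xs[i]) - ys[i]) ** 2
            count += 1
    return total / count

# Placeholder model (assumption): predict the training mean regardless of x.
def mean_fit(xs, ys):
    return sum(ys) / len(ys)

def mean_predict(model, x):
    return model

print(cv_mse(mean_fit, mean_predict, [0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0], 2))  # 0.0
```

A parameter search would call `cv_mse` once per candidate setting and keep the setting with the smallest value.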

7.
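Research on the relationship between support vector machines and neural networks   Cited by: 2 (self-citations: 0, citations by others: 2)
The support vector machine is a novel machine learning method based on statistical learning theory. Owing to its excellent learning performance, it has become a research hotspot in the international machine learning community and has been widely used to solve classification and regression problems. This paper applies the structural risk function to the learning of radial basis function networks and discusses the relationship between the support vector regression model and radial basis function networks. Simulation examples show that the proposed algorithm improves the generalization performance of radial basis function networks.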

8.
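Based on known human Pol II promoter sequence data, we combine sequence content and sequence signal features of promoters to construct a support vector machine classifier for promoters. The 6-mer frequencies of promoter sequences are used as discrete source parameters to build the sequence content features, and the 3-mer frequencies at 24 sites are selected as sequence signal features to build a PWM; the two resulting sets of parameters are fed into a support vector machine to predict human promoters. Prediction ability is assessed by 10-fold cross-validation and on independent datasets, and the correlation coefficient exceeds 95%. The results show that the increment of diversity algorithm combined with a support vector machine effectively improves the prediction success rate and is a very effective method for eukaryotic promoter prediction.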
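The sequence content features above start from k-mer frequency counts (6-mers in the abstract). A minimal sketch of overlapping k-mer counting follows; the example sequence is hypothetical, and the later steps (increment of diversity, PWM construction, SVM training) are not shown.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count overlapping k-mers in a nucleotide sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Hypothetical short sequence; k=2 keeps the output readable.
counts = kmer_counts("atatat", 2)
print(counts["AT"], counts["TA"])  # 3 2
```

For the 6-mer features described in the abstract one would call `kmer_counts(promoter, 6)` on each promoter sequence.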

9.
10.
ConsNet is a comprehensive software package for the design of conservation area networks (CANs). The software selects areas to be potentially placed under conservation management for the representation of biodiversity surrogates. Additionally, ConsNet optimizes spatial criteria including compactness, connectivity, replication, and alignment, as well as socio-economic criteria as specified by users. ConsNet uses an advanced tabu search engine to identify efficient alternatives quickly, offering capabilities beyond existing planning software. The ability to perform ongoing interactive analysis with multi-criteria objectives makes ConsNet an ideal decision support tool for large scale planning exercises.

11.
Tabu search is a meta-heuristic approach that has proven useful in solving combinatorial optimization problems. We implement the adaptive memory features of tabu search to refine a multiple sequence alignment. Adaptive memory helps the search process avoid local optima and explore the solution space economically and effectively without getting trapped in cycles. The algorithm is further enhanced by introducing extended tabu search features such as intensification and diversification. The neighbors of a solution are generated stochastically, and a consistency-based objective function is employed to measure their quality. The algorithm is tested on datasets from the BAliBASE benchmarking database. Our experiments show that tabu search is able to improve the quality of multiple alignments generated by other software such as ClustalW and T-Coffee. The source code of our algorithm is available at http://www.bii.a-star.edu.sg/~tariq/tabu/.
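To illustrate the short-term memory idea mentioned above, here is a generic toy tabu search over bit tuples. It is a sketch under stated assumptions, not the authors' alignment refinement: the neighborhood (single-bit flips, enumerated exhaustively rather than stochastically), the tabu tenure, the aspiration rule and the objective are all chosen for illustration, and intensification and diversification are omitted.

```python
from collections import deque

def tabu_search(score, start, iterations=50, tenure=3):
    """Maximize score over bit tuples using single-bit flips and a short tabu list."""
    current = tuple(start)
    best, best_score = current, score(current)
    tabu = deque(maxlen=tenure)  # positions flipped in the last `tenure` moves
    for _ in range(iterations):
        candidates = []
        for i in range(len(current)):
            neighbor = current[:i] + (1 - current[i],) + current[i + 1:]
            s = score(neighbor)
            # Aspiration: a tabu move is still allowed if it beats the best so far.
            if i not in tabu or s > best_score:
                candidates.append((s, i, neighbor))
        if not candidates:
            break
        # Take the best admissible move even if it worsens the current solution.
        s, i, current = max(candidates, key=lambda c: c[0])
        tabu.append(i)
        if s > best_score:
            best, best_score = current, s
    return best, best_score

# Hypothetical objective: number of positions matching a target pattern.
target = (1, 0, 1, 1, 0)

def matches(bits):
    return sum(1 for a, b in zip(bits, target) if a == b)

print(tabu_search(matches, (0, 0, 0, 0, 0)))  # ((1, 0, 1, 1, 0), 5)
```

Because the best admissible move is always taken, the search keeps moving after reaching an optimum; the tabu list is what prevents it from immediately undoing its most recent flips.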

12.
13.
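Zhao Xuetong, Yang Yadong, Qu Hongzhu, Fang Xiangdong. Hereditas (Beijing) (《遗传》) 2018, 40(9): 693-703
With the continuing development of omics technologies, methods for acquiring biological data of different levels and types are increasingly mature. Large amounts of data are generated during disease diagnosis and treatment. It is essential to analyze this complex, multidimensional, multiscale disease big data with artificial intelligence methods such as machine learning, and to build clinical decision support tools that help physicians find fast and effective diagnosis and treatment plans. In this process, the choice of machine learning and other artificial intelligence methods is particularly important. This paper therefore first briefly reviews, from the perspectives of type and algorithm, the machine learning and related methods commonly used in clinical decision support, introducing support vector machines, logistic regression, clustering algorithms, bagging, random forests and deep learning. It then summarizes and classifies the applications of these methods in clinical decision support and discusses their respective strengths and weaknesses, providing a useful reference for choosing machine learning and other artificial intelligence methods in clinical decision support.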

14.
A method that uses support vector machines (SVMs) to identify the positions in an amino acid sequence that are critical for the catalytic activity of a protein is introduced and analysed. SVMs are supported by an efficient learning algorithm and can utilize some prior knowledge about the structure of the problem. To predict the critical positions, the method requires as input the amino acid sequences of variants of the protein, created by inducing mutations, along with their fitness. To investigate the performance of this algorithm, variants of the beta-lactamase enzyme were created in silico using simulations of both mutagenesis and recombination protocols. Results from the literature on beta-lactamase were used to test the accuracy of this method. It was also compared with the results from a simple search algorithm. The algorithm was also shown to be able to predict critical positions that can tolerate two different amino acids and retain function.

15.
Random peptide libraries displayed on the surface of filamentous bacteriophage are widely used as tools for the discovery of ligands for biologically relevant macromolecules, including antibodies, enzymes, and cell surface receptors. Phage display results in linkage of an affinity-selectable function (the displayed peptide) to the DNA encoding that function, allowing selection of individual binding clones by iterative cycles of in vitro panning and in vivo amplification. Critical to the success of a panning experiment is the complexity of the library: the greater the diversity of clones within the library, the more likely the library contains sequences that will bind a given target with useful affinity. A method for construction of high-complexity (≥ 10^9 independent clones) random peptide libraries is presented. The key steps are highly efficient binary ligation under conditions where the vector is relatively dilute, with only a modest molar excess of insert, followed by efficient electrotransformation into Escherichia coli. Library design strategies and a protocol for rapid sequence characterization are also presented.

16.
We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm, at the level of its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results.

17.
Xu Z, Zhang C, Liu S, Zhou Y. Proteins 2006, 63(4): 961-966
Solvent accessibility, one of the key properties of amino acid residues in proteins, can be used to assist protein structure prediction. Various approaches such as neural network, support vector machines, probability profiles, information theory, Bayesian theory, logistic function, and multiple linear regression have been developed for solvent accessibility prediction. In this article, a much simpler quadratic programming method based on the buriability parameter set of amino acid residues is developed. The new method, called QBES (Quadratic programming and Buriability Energy function for Solvent accessibility prediction), is reasonably accurate for predicting the real value of solvent accessibility. By using a dataset of 30 proteins to optimize three parameters, the average correlation coefficients between the predicted and actual solvent accessibility are about 0.5 for all four independent test sets ranging from 126 to 513 proteins. The method is efficient. It takes only 20 min for a regular PC to obtain results of 30 proteins with an average length of 263 amino acids. Although the proposed method is less accurate than a few more sophisticated methods based on neural network or support vector machines, this is the first attempt to predict solvent accessibility by energy optimization with constraints. Possible improvements and other applications of the method are discussed.

18.
Complex cellular machines and processes are commonly believed to be products of selection, and it is typically understood to be the job of evolutionary biologists to show how selective advantage can account for each step in their origin and subsequent growth in complexity. Here, we describe how complex machines might instead evolve in the absence of positive selection through a process of "presuppression," first termed constructive neutral evolution (CNE) more than a decade ago. If an autonomously functioning cellular component acquires mutations that make it dependent for function on another, pre-existing component or process, and if there are multiple ways in which such dependence may arise, then dependence inevitably will arise and reversal to independence is unlikely. Thus, CNE is a unidirectional evolutionary ratchet leading to complexity, if complexity is equated with the number of components or steps necessary to carry out a cellular process. CNE can explain "functions" that seem to make little sense in terms of cellular economy, like RNA editing or splicing, but it may also contribute to the complexity of machines with clear benefit to the cell, like the ribosome, and to organismal complexity overall. We suggest that CNE-based evolutionary scenarios are in these and other cases less forced than the selectionist or adaptationist narratives that are generally told.

19.
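Protein fold recognition based on a support vector machine fusion network   Cited by: 11 (self-citations: 1, citations by others: 11)
Protein fold recognition is an important method for analyzing protein structure without relying on sequence similarity. We propose a three-layer support vector machine fusion network that recognizes 27 fold classes starting from the amino acid sequence of a protein. The fusion network uses support vector machines as member classifiers and adopts a "many-versus-many" multiclass classification strategy. Six kinds of fold features are divided into primary and secondary features, multiple distinct fusion schemes are constructed, and the final decision is obtained by dynamic selection among these schemes. When it is difficult to determine before classification which combination of feature types gives the best result, the method provides a reliable way to automatically select the combination with the most complementary feature information, ensuring the best classification result. On independent test samples, the recognition system achieves an overall classification accuracy of 61.04%. The results and comparisons show that this is an effective fold recognition method.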

20.
The patterns of phenotypic association between mated males and females depend on the decision rules that individuals employ during search for a mate. We generalize the sequential search rule and examine how the shape of the function that relates a male character to the benefit of a mating decision influences the threshold value of the male trait that induces females to terminate search. If the fitness function is linear the optimal threshold value of a male character increases with the slope of the function. The phenotypic threshold criterion declines, all else being equal, if the fitness function is made more concave (or less convex) by an increase of the risk of the function. The expression of the trait in females has no effect on the optimal threshold value of a male character if the fitness function is linear and phenotypic values combine additively to influence the benefit of a mating decision; the phenotypic threshold criterion is ubiquitous among females. A convex fitness function induces females with high trait values to adopt a relatively high phenotypic threshold criterion, whereas a concave fitness function induces such females to adopt a low threshold value for the male trait. Thus, linear, convex and concave fitness functions effect random, assortative and disassortative combinations of phenotypes among mated individuals, respectively. Changes of female search behavior induced by changes of the distribution of a male character similarly depend on the shape of the fitness function. A variance-preserving increase of male trait values produces a relatively small increase of the threshold criterion for the male character if the fitness function is concave, relative to conditions in which the fitness function is either linear or convex. Our results suggest that a sequential search rule can in principle induce the kinds of mating patterns observed in nature and that the phenotypic association between mated individuals is likely to depend on how a male character translates into fitness, the distribution of the trait among males and attributes of searching females. Received: 20 September 1997 / Revised version: 13 August 1998
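The decision rule analysed above can be stated very compactly: a searching female inspects males one at a time and stops at the first whose trait meets her threshold. The sketch below shows only that stopping rule; how the optimal threshold depends on the fitness function is the subject of the paper and is not modeled here. The trait values and threshold are hypothetical.

```python
def sequential_search(trait_values, threshold):
    """Return (index, trait) of the first male meeting the threshold, or None."""
    for index, trait in enumerate(trait_values):
        if trait >= threshold:
            return index, trait
    return None

# Hypothetical encounter sequence of male trait values.
males = [0.2, 0.5, 0.9, 0.7]
print(sequential_search(males, 0.6))  # (2, 0.9)
print(sequential_search(males, 1.0))  # None
```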
