
1.

Personalized diagnosis by cached solutions with hypertension as a study model

Carvalho PC Freitas SS Lima AB Barros M Bittencourt I Degrave W Cordovil I Fonseca R Carvalho MG Moura Neto RS Cabello PH 《Genetics and molecular research : GMR》2006,5(4):856-867

Statistical modeling of links between genetic profiles and environmental and clinical data to aid medical diagnosis is a challenge. Here, we present a computational approach for rapidly selecting important clinical data to assist in medical decisions based on personalized genetic profiles. What could take hours or days of computing is available on-the-fly, making this strategy feasible to implement as a routine without demanding great computing power. The key to rapidly obtaining an optimal or nearly optimal mathematical function that can evaluate the "disease stage" by combining information from genetic profiles with personal clinical data is querying a precomputed solution database. The database is generated beforehand by a new hybrid feature selection method that makes use of support vector machines, recursive feature elimination and random sub-space search. Here, to evaluate the method, data from polymorphisms in the renin-angiotensin-aldosterone system genes together with clinical data were obtained from patients with hypertension and control subjects. The disease "risk" was determined by classifying the patients' data with a support vector machine model based on the optimized features and then measuring the Euclidean distance to the hyperplane decision function. Our results showed the association of renin-angiotensin-aldosterone system gene haplotypes with hypertension. The association of polymorphism patterns with different ethnic groups was also tracked by the feature selection process. A demonstration of this method is also available online on the project's web site.
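The "risk" measure described in this abstract, a distance to the SVM decision hyperplane, can be illustrated with a minimal sketch. It assumes a linear decision function w·x + b = 0; the weights, bias, and feature values below are hypothetical and are not taken from the paper, whose exact model is not given in the abstract.

```python
import math


def hyperplane_distance(weights, bias, features):
    """Signed Euclidean distance from a feature vector to the plane w.x + b = 0.

    Assumes a linear decision function; a positive value lies on the side
    the weight vector points to, a negative value on the opposite side.
    """
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score / norm


# Hypothetical two-feature model: w = (3, 4), b = -5.
print(hyperplane_distance([3.0, 4.0], -5.0, [3.0, 4.0]))  # (9 + 16 - 5) / 5 = 4.0
```

The sign indicates the predicted class and the magnitude indicates how far the sample lies from the decision boundary, which is one plausible reading of a graded "risk" score.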

2.

Gene Selection for Multiclass Prediction by Weighted Fisher Criterion

Jianhua Xuan Yue Wang Yibin Dong Yuanjian Feng Bin Wang Javed Khan Maria Bakay Zuyi Wang Lauren Pachman Sara Winokur Yi-Wen Chen Robert Clarke Eric Hoffman 《EURASIP Journal on Bioinformatics and Systems Biology》2007,2007(1):64628

Gene expression profiling has been widely used to study molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. Gene selection, as an important step for improved diagnostics, screens tens of thousands of genes and identifies a small subset that discriminates between disease types. A two-step gene selection method is proposed to identify informative gene subsets for accurate classification of multiclass phenotypes. In the first step, individually discriminatory genes (IDGs) are identified by using one-dimensional weighted Fisher criterion (wFC). In the second step, jointly discriminatory genes (JDGs) are selected by sequential search methods, based on their joint class separability measured by multidimensional weighted Fisher criterion (wFC). The performance of the selected gene subsets for multiclass prediction is evaluated by artificial neural networks (ANNs) and/or support vector machines (SVMs). By applying the proposed IDG/JDG approach to two microarray studies, that is, small round blue cell tumors (SRBCTs) and muscular dystrophies (MDs), we successfully identified a much smaller yet efficient set of JDGs for diagnosing SRBCTs and MDs with high prediction accuracies (96.9% for SRBCTs and 92.3% for MDs, respectively). These experimental results demonstrated that the two-step gene selection method is able to identify a subset of highly discriminative genes for improved multiclass prediction.
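The first step of this abstract ranks genes one at a time by a Fisher-type criterion. The sketch below shows the classical, unweighted one-dimensional Fisher score (between-class scatter over within-class scatter) for a single gene. The class weighting that defines the paper's wFC is not specified in the abstract and is deliberately not reproduced here; the expression values and labels are made up for illustration.

```python
from statistics import fmean


def fisher_score(values, labels):
    """Classical one-dimensional Fisher score for a single gene.

    Returns between-class scatter divided by within-class scatter.
    This is the unweighted criterion, not the weighted variant (wFC).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    overall = fmean(values)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


# Hypothetical expression values for one gene across two classes.
expression = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
classes = ["a", "a", "a", "b", "b", "b"]
print(fisher_score(expression, classes))  # 54 / 4 = 13.5
```

Genes with larger scores separate the classes better when considered individually; a second, joint selection step would still be needed to account for redundancy between genes.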

3.

Gene selection for tumor classification using a novel bio-inspired multi-objective approach

M. Dashtban Mohammadali Balafar Prashanth Suravajhala 《Genomics》2018,110(1):10-17

Identifying the informative genes has always been a major step in microarray data analysis. The complexity of various cancer datasets makes this issue still challenging. In this paper, a novel Bio-inspired Multi-objective algorithm is proposed for gene selection in microarray data classification, specifically in the binary domain of feature selection. The presented method extends the traditional Bat Algorithm with refined formulations, effective multi-objective operators, and novel local search strategies employing social learning concepts in designing random walks. A hybrid model using the Fisher criterion is then applied to three widely-used microarray cancer datasets to explore significant biomarkers, which reveals the effectiveness of the proposed method for genomic analysis. Experimental results unveil new combinations of informative biomarkers that have associations with other studies.

4.

Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes

Wangchao Lou Xiaoqing Wang Fan Chen Yixiao Chen Bo Jiang Hua Zhang 《PloS one》2014,9(1)

An efficient method for identifying DNA-binding proteins is highly desirable, given their vital roles in gene regulation, since it would be invaluable for advancing our understanding of protein functions. In this study, we proposed a new method for the prediction of DNA-binding proteins that performs feature ranking using random forest and wrapper-based feature selection using a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five other existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between the DNA-binding and the non DNA-binding proteins.
All of the experimental results indicate that the proposed DBPPred can be an alternative predictor for large-scale determination of DNA-binding proteins.

5.

Bio-support vector machines for computational proteomics

Yang ZR Chou KC 《Bioinformatics (Oxford, England)》2004,20(5):735-741

MOTIVATION: One of the most important issues in computational proteomics is to produce a prediction model for the classification or annotation of biological function of novel protein sequences. In order to improve the prediction accuracy, much attention has been paid to improving the performance of the algorithms used, but little to solving the fundamental issue, namely amino acid encoding, as most existing pattern recognition algorithms are unable to recognize amino acids in protein sequences. Importantly, the most commonly used amino acid encoding method has a flaw that leads to large computational cost and recognition bias. RESULTS: By replacing kernel functions of support vector machines (SVMs) with amino acid similarity measurement matrices, we have modified SVMs, a new type of pattern recognition algorithm for analysing protein sequences, particularly for proteolytic cleavage site prediction. We refer to the modified SVMs as bio-support vector machine. When applied to the prediction of HIV protease cleavage sites, the new method has shown a remarkable advantage in reducing the model complexity and enhancing the model robustness.

6.

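SVR parameter optimization based on heuristic breadth-first search

向昌盛 周子英 夏艳军 《生物信息学》2010,8(3):219-222

The fitting accuracy and generalization ability of a support vector regression (SVR) model depend on the choice of its parameters, and parameter selection is essentially an optimization search process. Exploiting the efficiency of the heuristic breadth-first search (HBFS) algorithm on optimization problems, the authors propose an SVR parameter selection method that takes the minimum k-fold cross-validation error as the objective and HBFS as the search strategy. Simulation experiments on three benchmark datasets show that the method greatly shortens training and modeling time while maintaining prediction accuracy, offering a new and effective solution for SVR parameter selection on large samples.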

7.

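A study of the relationship between support vector machines and neural networks

沈正维 李秋菊 《生物数学学报》2006,21(2):204-208

The support vector machine is a novel machine learning method based on statistical learning theory. Owing to its excellent learning performance, it has become a research focus in the international machine learning community and has been widely used to solve classification and regression problems. This paper applies the structural risk function to the training of radial basis function networks and discusses the relationship between the support vector regression model and radial basis function networks. A simulation example shows that the proposed algorithm improves the generalization performance of radial basis function networks.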

8.

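Theoretical prediction of human Pol II promoters based on sequence features

杨科利 许强 《生命科学研究》2009,13(5):403-407

Based on known human Pol II promoter sequence data, promoter sequence content and sequence signal features were selected jointly to build a support vector machine classifier for promoters. The 6-mer frequencies of promoter sequences were used as diversity source parameters to construct the sequence content features. Meanwhile, the 3-mer frequencies at 24 sites were selected as sequence signal features to build a PWM. The two resulting sets of parameters were fed into a support vector machine to predict human promoters. Prediction ability was assessed by 10-fold cross-validation and an independent dataset, with the correlation coefficient exceeding 95%. The results show that the increment of diversity algorithm combined with a support vector machine can effectively improve the prediction success rate and is an effective method for eukaryotic promoter prediction.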
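The sequence-content features in this abstract are built from k-mer counts (6-mers for content, 3-mers for signal sites). As a minimal sketch of the counting step only, the helper names below are hypothetical and the toy sequence is not real promoter data; how the paper turns these counts into its final parameters is not described in the abstract.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a nucleotide sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(sequence, k):
    """Relative frequency of each k-mer; empty if the sequence is shorter than k."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Toy example with k = 3; a real content feature would use k = 6.
print(kmer_counts("ACGACG", 3))
print(kmer_frequencies("ACGACG", 3)["ACG"])
```

Counting overlapping windows means a sequence of length n yields n - k + 1 k-mers, so frequency vectors from sequences of different lengths remain comparable after normalization.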

9.

Using recurrence quantification analysis descriptors for protein sequence classification with support vector machines

Mitra J Mundra P Kulkarni BD Jayaraman VK 《Journal of biomolecular structure & dynamics》2007,25(3):289-298


10.

ConsNet: new software for the selection of conservation area networks with spatial and multi-criteria analyses

Michael Ciarleglio J. Wesley Barnes Sahotra Sarkar 《Ecography》2009,32(2):205-209

ConsNet is a comprehensive software package for the design of conservation area networks (CANs). The software selects areas to be potentially placed under conservation management for the representation of biodiversity surrogates. Additionally, ConsNet optimizes spatial criteria including compactness, connectivity, replication, and alignment, as well as socio-economic criteria as specified by users. ConsNet uses an advanced tabu search engine to identify efficient alternatives quickly, offering capabilities beyond existing planning software. The ability to perform ongoing interactive analysis with multi-criteria objectives makes ConsNet an ideal decision support tool for large scale planning exercises.

11.

A tabu search algorithm for post-processing multiple sequence alignment

Riaz T Yi W Li KB 《Journal of bioinformatics and computational biology》2005,3(1):145-156

Tabu search is a meta-heuristic approach that has proven useful in solving combinatorial optimization problems. We implement the adaptive memory features of tabu search to refine a multiple sequence alignment. Adaptive memory helps the search process to avoid local optima and to explore the solution space economically and effectively without getting trapped in cycles. The algorithm is further enhanced by introducing extended tabu search features such as intensification and diversification. The neighborhoods of a solution are generated stochastically and a consistency-based objective function is employed to measure its quality. The algorithm is tested with the datasets from the BAliBASE benchmarking database. We have observed through experiments that tabu search is able to improve the quality of multiple alignments generated by other software such as ClustalW and T-Coffee. The source code of our algorithm is available at http://www.bii.a-star.edu.sg/~tariq/tabu/.
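The core loop of tabu search, with a short-term memory of recent moves and an aspiration rule, can be sketched on a toy bit-vector problem. This is a generic illustration, not the authors' alignment refiner: the neighborhood here is deterministic (all single-bit flips) rather than stochastic, the objective is a made-up mismatch count rather than a consistency score, and the names `tabu_search`, `tenure`, and `max_iters` are hypothetical.

```python
from collections import deque


def tabu_search(initial, objective, tenure=3, max_iters=100):
    """Minimize `objective` over bit vectors using single-bit-flip moves.

    The tabu list stores recently flipped positions so the search does not
    immediately undo a move; a tabu move is still allowed if it beats the
    best cost seen so far (aspiration).
    """
    current = list(initial)
    best, best_cost = list(current), objective(current)
    tabu = deque(maxlen=tenure)  # oldest entries drop out automatically
    for _ in range(max_iters):
        candidates = []
        for i in range(len(current)):
            neighbor = current[:]
            neighbor[i] ^= 1
            cost = objective(neighbor)
            if i not in tabu or cost < best_cost:
                candidates.append((cost, i, neighbor))
        if not candidates:
            break
        # Take the best admissible move even if it worsens the current cost.
        cost, i, current = min(candidates)
        tabu.append(i)
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost


# Toy objective: number of positions that differ from a hidden target.
target = [1, 0, 1, 1, 0]
mismatches = lambda bits: sum(b != t for b, t in zip(bits, target))
print(tabu_search([0, 0, 0, 0, 0], mismatches))
```

Accepting the best admissible neighbor even when it is worse than the current solution is what lets the search leave a local optimum, while the tabu list prevents it from cycling straight back.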

12.

Using Recurrence Quantification Analysis Descriptors for Protein Sequence Classification with Support Vector Machines

Joydeep Mitra Piyushkumar Mundra B. D. Kulkarni Valadi K. Jayaraman 《Journal of biomolecular structure & dynamics》2013,31(3):289-297


13.

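Application of machine learning methods in clinical decision support in the omics era

赵学彤 杨亚东 渠鸿竹 方向东 《遗传》2018,40(9):693-703

With the continuing development of omics technologies, methods for acquiring biological data of different levels and types are becoming increasingly mature. Large amounts of data are generated during disease diagnosis and treatment. It is therefore essential to use machine learning and other artificial intelligence methods to analyze complex, multidimensional, multiscale disease big data and to build clinical decision support tools that help physicians find fast and effective diagnosis and treatment plans. In this process, the choice of machine learning or other artificial intelligence method is particularly important. Accordingly, this paper first gives a brief review, by type and algorithm, of the machine learning methods commonly used in clinical decision support, introducing support vector machines, logistic regression, clustering algorithms, Bagging, random forests, and deep learning. It then summarizes and categorizes the applications of these methods in clinical decision support and discusses their respective strengths and weaknesses, providing a useful reference for choosing machine learning and other artificial intelligence methods in clinical decision support.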

14.

Support vector machines for learning to identify the critical positions of a protein

Dubey A Realff MJ Lee JH Bommarius AS 《Journal of theoretical biology》2005,234(3):351-361

A method for identifying the positions in the amino acid sequence that are critical for the catalytic activity of a protein using support vector machines (SVMs) is introduced and analysed. SVMs are supported by an efficient learning algorithm and can utilize some prior knowledge about the structure of the problem. The amino acid sequences of the variants of a protein, created by inducing mutations, along with their fitness are required as input data by the method to predict its critical positions. To investigate the performance of this algorithm, variants of the beta-lactamase enzyme were created in silico using simulations of both mutagenesis and recombination protocols. Results from the literature on beta-lactamase were used to test the accuracy of this method. It was also compared with the results from a simple search algorithm. The algorithm was also shown to be able to predict critical positions that can tolerate two different amino acids and retain function.

15.

Construction of high-complexity combinatorial phage display peptide libraries

Noren KA Noren CJ 《Methods (San Diego, Calif.)》2001,23(2):169-178

Random peptide libraries displayed on the surface of filamentous bacteriophage are widely used as tools for the discovery of ligands for biologically relevant macromolecules, including antibodies, enzymes, and cell surface receptors. Phage display results in linkage of an affinity-selectable function (the displayed peptide) to the DNA encoding that function, allowing selection of individual binding clones by iterative cycles of in vitro panning and in vivo amplification. Critical to the success of a panning experiment is the complexity of the library: the greater the diversity of clones within the library, the more likely the library contains sequences that will bind a given target with useful affinity. A method for construction of high-complexity (≥ 10^9 independent clones) random peptide libraries is presented. The key steps are highly efficient binary ligation under conditions where the vector is relatively dilute, with only a modest molar excess of insert, followed by efficient electrotransformation into Escherichia coli. Library design strategies and a protocol for rapid sequence characterization are also presented.

16.

A Scatter-Based Prototype Framework and Multi-Class Extension of Support Vector Machines

Robert Jenssen Marius Kloft Alexander Zien Sören Sonnenburg Klaus-Robert Müller 《PloS one》2012,7(10)

We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm, at the level of its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results.

17.

QBES: predicting real values of solvent accessibility from sequences by efficient, constrained energy optimization

Xu Z Zhang C Liu S Zhou Y 《Proteins》2006,63(4):961-966

Solvent accessibility, one of the key properties of amino acid residues in proteins, can be used to assist protein structure prediction. Various approaches such as neural networks, support vector machines, probability profiles, information theory, Bayesian theory, logistic function, and multiple linear regression have been developed for solvent accessibility prediction. In this article, a much simpler quadratic programming method based on the buriability parameter set of amino acid residues is developed. The new method, called QBES (Quadratic programming and Buriability Energy function for Solvent accessibility prediction), is reasonably accurate for predicting the real value of solvent accessibility. By using a dataset of 30 proteins to optimize three parameters, the average correlation coefficients between the predicted and actual solvent accessibility are about 0.5 for all four independent test sets ranging from 126 to 513 proteins. The method is efficient. It takes only 20 min for a regular PC to obtain results of 30 proteins with an average length of 263 amino acids. Although the proposed method is less accurate than a few more sophisticated methods based on neural networks or support vector machines, this is the first attempt to predict solvent accessibility by energy optimization with constraints. Possible improvements and other applications of the method are discussed.

18.

How a neutral evolutionary ratchet can build cellular complexity

Lukeš J Archibald JM Keeling PJ Doolittle WF Gray MW 《IUBMB life》2011,63(7):528-537

Complex cellular machines and processes are commonly believed to be products of selection, and it is typically understood to be the job of evolutionary biologists to show how selective advantage can account for each step in their origin and subsequent growth in complexity. Here, we describe how complex machines might instead evolve in the absence of positive selection through a process of "presuppression," first termed constructive neutral evolution (CNE) more than a decade ago. If an autonomously functioning cellular component acquires mutations that make it dependent for function on another, pre-existing component or process, and if there are multiple ways in which such dependence may arise, then dependence inevitably will arise and reversal to independence is unlikely. Thus, CNE is a unidirectional evolutionary ratchet leading to complexity, if complexity is equated with the number of components or steps necessary to carry out a cellular process. CNE can explain "functions" that seem to make little sense in terms of cellular economy, like RNA editing or splicing, but it may also contribute to the complexity of machines with clear benefit to the cell, like the ribosome, and to organismal complexity overall. We suggest that CNE-based evolutionary scenarios are in these and other cases less forced than the selectionist or adaptationist narratives that are generally told.

19.

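Protein fold recognition based on a support vector machine fusion network

施建宇 潘泉 张绍武 梁彦 《生物化学与生物物理进展》2006,33(2):155-162

Protein fold recognition is an important method for analyzing protein structure without relying on sequence similarity. A three-layer support vector machine fusion network is proposed that recognizes 27 fold classes starting from the amino acid sequence of a protein. The fusion network uses support vector machines as member classifiers and adopts a "many-versus-many" multiclass classification strategy. It divides six kinds of fold features into primary and secondary features, constructs multiple distinct fusion schemes, and then selects among these schemes dynamically to reach the final decision. When it is hard to determine before classification which combination of feature types will give the best result, the method provides a reliable way to automatically select the combination with the most complementary feature information, ensuring the best classification result. Finally, the recognition system reaches an overall classification accuracy of 61.04% on independent test samples. The results and comparisons show that this is an effective fold recognition method.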

20.

Sequential search and the influence of male quality on female mating decisions

Daniel D. Wiegmann Kajal Mukhopadhyay Leslie A. Real 《Journal of mathematical biology》1999,39(3):193-216

The patterns of phenotypic association between mated males and females depend on the decision rules that individuals employ during search for a mate. We generalize the sequential search rule and examine how the shape of the function that relates a male character to the benefit of a mating decision influences the threshold value of the male trait that induces females to terminate search. If the fitness function is linear the optimal threshold value of a male character increases with the slope of the function. The phenotypic threshold criterion declines, all else being equal, if the fitness function is made more concave (or less convex) by an increase of the risk of the function. The expression of the trait in females has no effect on the optimal threshold value of a male character if the fitness function is linear and phenotypic values combine additively to influence the benefit of a mating decision; the phenotypic threshold criterion is ubiquitous among females. A convex fitness function induces females with high trait values to adopt a relatively high phenotypic threshold criterion, whereas a concave fitness function induces such females to adopt a low threshold value for the male trait. Thus, linear, convex and concave fitness functions effect random, assortative and disassortative combinations of phenotypes among mated individuals, respectively. Changes of female search behavior induced by changes of the distribution of a male character similarly depend on the shape of the fitness function. A variance-preserving increase of male trait values produces a relatively small increase of the threshold criterion for the male character if the fitness function is concave, relative to conditions in which the fitness function is either linear or convex. 
Our results suggest that a sequential search rule can in principle induce the kinds of mating patterns observed in nature and that the phenotypic association between mated individuals is likely to depend on how a male character translates into fitness, the distribution of the trait among males and attributes of searching females. Received: 20 September 1997 / Revised version: 13 August 1998