Similar literature
 20 similar documents found (search time: 31 ms)
1.
Numerous studies have contributed to efforts to boost the accuracy of the credit scoring model. Especially interesting are recent studies which have successfully developed the hybrid approach, which advances classification accuracy by combining different machine learning techniques. However, to achieve better credit decisions, it is not enough merely to increase the accuracy of the credit scoring model. It is necessary to conduct meaningful supplementary analyses in order to obtain knowledge of causal relations, particularly in terms of significant conceptual patterns or structures involving attributes used in the credit scoring model. This paper proposes a solution of integrating data preprocessing strategies and the Bayesian network classifier with the tree augmented Naïve Bayes search algorithm, in order to improve classification accuracy and to obtain improved knowledge of causal patterns, thus enhancing the validity of credit decisions.

2.
Kaur H  Raghava GP 《Proteins》2004,55(1):83-90
In this paper a systematic attempt has been made to develop a better method for predicting alpha-turns in proteins. Most of the commonly used approaches in the field of protein structure prediction have been tried in this study, including the statistical approach "Sequence Coupled Model" and machine learning approaches: (i) artificial neural network (ANN); (ii) Weka (Waikato Environment for Knowledge Analysis) classifiers; and (iii) Parallel Exemplar Based Learning (PEBLS). We have also used multiple sequence alignment obtained from PSI-BLAST and secondary structure information predicted by PSIPRED. The training and testing of all methods has been performed on a data set of 193 non-homologous protein X-ray structures using five-fold cross-validation. It has been observed that ANN with multiple sequence alignment and predicted secondary structure information outperforms other methods. Based on our observations we have developed an ANN-based method for predicting alpha-turns in proteins. The main components of the method are two feed-forward back-propagation networks with a single hidden layer. The first sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position specific scoring matrices. The initial predictions obtained from the first network and PSIPRED predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. The final network yields an overall prediction accuracy of 78.0% and MCC of 0.16. A web server AlphaPred (http://www.imtech.res.in/raghava/alphapred/) has been developed based on this approach.
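The abstract above reports both an overall accuracy and a Matthews correlation coefficient (MCC). As a minimal sketch of how the two are computed from a binary confusion matrix, and of why they can diverge when one class is rare, the following may help. The counts in the usage note are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts with a rare positive class (100 positives, 900 negatives).
print(accuracy(50, 50, 850, 50))
print(mcc(50, 50, 850, 50))
```

With these made-up counts the accuracy is high because the many true negatives dominate, while the MCC is much lower because only half of the positives are recovered.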

3.

Background

Knee osteoarthritis (OA) is the most common joint disease of adults worldwide. Since the treatments for advanced radiographic knee OA are limited, clinicians face a significant challenge of identifying patients who are at high risk of OA in a timely and appropriate way. Therefore, we developed a simple self-assessment scoring system and an improved artificial neural network (ANN) model for knee OA.

Methods

The Fifth Korea National Health and Nutrition Examination Surveys (KNHANES V-1) data were used to develop a scoring system and ANN for radiographic knee OA. A logistic regression analysis was used to determine the predictors of the scoring system. The ANN was constructed using 1777 participants and validated internally on 888 participants in the KNHANES V-1. The predictors of the scoring system were selected as the inputs of the ANN. External validation was performed using 4731 participants in the Osteoarthritis Initiative (OAI). Area under the curve (AUC) of the receiver operating characteristic was calculated to compare the prediction models.

Results

The scoring system and ANN were built using the independent predictors including sex, age, body mass index, educational status, hypertension, moderate physical activity, and knee pain. In the internal validation, both scoring system and ANN predicted radiographic knee OA (AUC 0.73 versus 0.81, p<0.001) and symptomatic knee OA (AUC 0.88 versus 0.94, p<0.001) with good discriminative ability. In the external validation, both scoring system and ANN showed lower discriminative ability in predicting radiographic knee OA (AUC 0.62 versus 0.67, p<0.001) and symptomatic knee OA (AUC 0.70 versus 0.76, p<0.001).

Conclusions

The self-assessment scoring system may be useful for identifying the adults at high risk for knee OA. The performance of the scoring system is improved significantly by the ANN. We provided an ANN calculator to simply predict the knee OA risk.

4.
Chang YJ  Yeh ML  Li YC  Hsu CY  Lin CC  Hsu MS  Chiu WT 《PloS one》2011,6(8):e23137

Background

Hospital-acquired infections (HAI) are associated with increased attributable morbidity, mortality, prolonged hospitalization, and economic costs. A simple, reliable prediction model for HAI has great clinical relevance. The objective of this study is to develop a scoring system to predict HAI that was derived from Logistic Regression (LR) and validated by Artificial Neural Networks (ANN) simultaneously.

Methodology/Principal Findings

A total of 476 patients from all the 806 HAI inpatients were included for the study between 2004 and 2005. A sample of 1,376 non-HAI inpatients was randomly drawn from all the admitted patients in the same period of time as the control group. External validation of 2,500 patients was abstracted from another academic teaching center. Sixteen variables were extracted from the Electronic Health Records (EHR) and fed into ANN and LR models. With stepwise selection, the following seven variables were identified by LR models as statistically significant: Foley catheterization, central venous catheterization, arterial line, nasogastric tube, hemodialysis, stress ulcer prophylaxes and systemic glucocorticosteroids. Both ANN and LR models displayed excellent discrimination (area under the receiver operating characteristic curve [AUC]: 0.964 versus 0.969, p = 0.507) to identify infection in internal validation. During external validation, high AUC was obtained from both models (AUC: 0.850 versus 0.870, p = 0.447). The scoring system also performed extremely well in the internal (AUC: 0.965) and external (AUC: 0.871) validations.

Conclusions

We developed a scoring system to predict HAI with simple parameters validated with ANN and LR models. Armed with this scoring system, infectious disease specialists can more efficiently identify patients at high risk for HAI during hospitalization. Further, using parameters either by observation of medical devices used or data obtained from EHR also provided good prediction outcome that can be utilized in different clinical settings.

5.
In the present paper, a hybrid technique involving artificial neural network (ANN) and genetic algorithm (GA) has been proposed for performing modeling and optimization of complex biological systems. In this approach, first an ANN approximates (models) the nonlinear relationship(s) existing between its input and output example data sets. Next, the GA, which is a stochastic optimization technique, searches the input space of the ANN with a view to optimize the ANN output. The efficacy of this formalism has been tested by conducting a case study involving optimization of DNA curvature characterized in terms of the RL value. Using the ANN-GA methodology, a number of sequences possessing high RL values have been obtained and analyzed to verify the existence of features known to be responsible for the occurrence of curvature. A couple of sequences have also been tested experimentally. The experimental results validate qualitatively and also near-quantitatively, the solutions obtained using the hybrid formalism. The ANN-GA technique is a useful tool to obtain, ahead of experimentation, sequences that yield high RL values. The methodology is a general one and can be suitably employed for optimizing any other biological feature.
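The second stage described above, a GA searching the input space of a fitted model, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `surrogate` is a hypothetical stand-in for a trained ANN, and the selection, crossover and mutation settings are arbitrary choices.

```python
import random

def surrogate(x):
    # Hypothetical stand-in for a trained ANN: one smooth peak at (0.3, 0.7).
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)

def ga_maximize(model, dim=2, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Search inputs in [0, 1]^dim that maximize model(x)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=model, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p, q)]
            # Gaussian mutation, clipped back into the unit interval.
            child = [min(1.0, max(0.0, g + rng.gauss(0, sigma))) for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=model)

best = ga_maximize(surrogate)
print(best, surrogate(best))
```

Because the better half of the population is carried over unchanged, the best model output found never decreases from one generation to the next.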

7.
Natt NK  Kaur H  Raghava GP 《Proteins》2004,56(1):11-18
This article describes a method developed for predicting transmembrane beta-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using "leave one out cross-validation" (LOOCV). Based on this study, we have developed a Web server, TBBPred, for predicting transmembrane beta-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).

8.
The particulate matter (PM) concentration has been one of the most relevant environmental concerns in recent decades due to its prejudicial effects on living beings and the earth’s atmosphere. High PM concentration affects the human health in several ways leading to short and long term diseases. Thus, forecasting systems have been developed to support decisions of the organizations and governments to alert the population. Forecasting systems based on Artificial Neural Networks (ANNs) have been highlighted in the literature due to their performances. In general, three ANN-based approaches have been found for this task: ANN trained via learning algorithms, hybrid systems that combine search algorithms with ANNs, and hybrid systems that combine ANN with other forecasters. Independent of the approach, it is common to suppose that the residuals (error series), obtained from the difference between actual series and forecasting, have a white noise behavior. However, it is possible that this assumption is infringed due to: misspecification of the forecasting model, complexity of the time series or temporal patterns of the phenomenon not captured by the forecaster. This paper proposes an approach to improve the performance of PM forecasters from residuals modeling. The approach analyzes the remaining residuals recursively in search of temporal patterns. At each iteration, if there are temporal patterns in the residuals, the approach generates the forecasting of the residuals in order to improve the forecasting of the PM time series. The proposed approach can be used with either only one forecaster or by combining two or more forecasting models. In this study, the approach is used to improve the performance of a hybrid system (HS) composed by genetic algorithm (GA) and ANN from residuals modeling performed by two methods, namely, ANN and own hybrid system. 
Experiments were performed for PM2.5 and PM10 concentration series in Kallio and Vallila stations in Helsinki and evaluated from six metrics. Experimental results show that the proposed approach improves the accuracy of the forecasting method in terms of fitness function for all cases, when compared with the method without correction. The correction via HS obtained a superior performance, reaching the best results in terms of fitness function and in five out of six metrics. These results also were found when a sensitivity analysis was performed varying the proportions of the sets of training, validation and test. The proposed approach reached consistent results when compared with the forecasting method without correction, showing that it can be an interesting tool for correction of PM forecasters.
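The residual-correction loop described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's method: an AR(1) model fitted by least squares stands in for the ANN or hybrid residual forecaster, a lag-1 autocorrelation threshold stands in for the test for temporal patterns, and the correction is applied in-sample. All function names and the threshold are assumptions.

```python
def lag1_autocorr(x):
    """Lag-1 sample autocorrelation; 0.0 for a constant series."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    return cov / var

def fit_ar1(x):
    """Least-squares slope through the origin for x[t] ~ phi * x[t-1]."""
    den = sum(v * v for v in x[:-1])
    if den == 0:
        return 0.0
    return sum(x[t] * x[t - 1] for t in range(1, len(x))) / den

def corrected_forecasts(actual, base, threshold=0.2, max_rounds=3):
    """Add residual forecasts to the base forecasts while residuals look structured."""
    forecast = list(base)
    for _ in range(max_rounds):
        resid = [a - f for a, f in zip(actual, forecast)]
        if abs(lag1_autocorr(resid)) < threshold:
            break  # residuals look close to white noise; stop correcting
        phi = fit_ar1(resid)
        # The first point has no previous residual, so it is left unchanged.
        forecast = [forecast[0]] + [
            forecast[t] + phi * resid[t - 1] for t in range(1, len(forecast))
        ]
    return forecast
```

In practice the residual model would be fitted on a training split and applied out of sample; the in-sample form here only keeps the sketch short.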

12.
The computational approach for identifying promoters on increasingly large genomic sequences has led to many false positives. The biological significance of promoter identification lies in the ability to locate true promoters with and without prior sequence contextual knowledge. Prior approaches to promoter modelling have involved artificial neural networks (ANNs) or hidden Markov models (HMMs), each producing adequate results on small scale identification tasks, i.e. narrow upstream regions. In this work, we present an architecture to support prokaryote promoter identification on large scale genomic sequences, i.e. not limited to narrow upstream regions. The significant contribution involved the hybrid formed via aggregation of the profile HMM with the ANN, via Viterbi scoring optimizations. The benefit obtained using this architecture includes the modelling ability of the profile HMM with the ability of the ANN to associate elements composing the promoter. We present the high effectiveness of the hybrid approach in comparison to profile HMMs and ANNs when used separately. The contribution of Viterbi optimizations is also highlighted for supporting the hybrid architecture in which gains in sensitivity (+0.3), specificity (+0.65) and precision (+0.54) are achieved over existing approaches.

13.
This contribution presents a novel method for the direct integration of a-priori knowledge in a neural network and its application for the online determination of a secondary metabolite during industrial yeast fermentation. Hereby, existing system knowledge is integrated in an artificial neural network (ANN) by means of 'functional nodes'. A generalized backpropagation algorithm is presented. For illustration, a set of ordinary differential equations describing the diacetyl formation and degradation during the cultivation is incorporated in a functional node and integrated in a dynamic feedforward neural network in a hybrid manner. The results show that a hybrid modelling approach exploiting available a-priori knowledge and experimental data can considerably outperform a pure data-based modelling approach with respect to robustness, generalization and necessary amount of training data. The number of training sets was decreased by 50%, obtaining the same accuracy as in a conventional approach. All incorrect decisions, according to defined cost criteria obtained with the conventional ANN, were avoided.

14.
Comparison of methods for searching protein sequence databases.
We have compared commonly used sequence comparison algorithms, scoring matrices, and gap penalties using a method that identifies statistically significant differences in performance. Search sensitivity with either the Smith-Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45-55, and optimized gap penalties instead of the conventional PAM250 matrix. More dramatic improvement can be obtained by scaling similarity scores by the logarithm of the length of the library sequence (ln()-scaling). With the best modern scoring matrix (BLOSUM55 or JO93) and optimal gap penalties (-12 for the first residue in the gap and -2 for additional residues), Smith-Waterman and FASTA performed significantly better than BLASTP. With ln()-scaling and optimal scoring matrices (BLOSUM45 or Gonnet92) and gap penalties (-12, -1), the rigorous Smith-Waterman algorithm performs better than either BLASTP or FASTA, although with the Gonnet92 matrix the difference with FASTA was not significant. Ln()-scaling performed better than normalization based on other simple functions of library sequence length. Ln()-scaling also performed better than scores based on normalized variance, but the differences were not statistically significant for the BLOSUM50 and Gonnet92 matrices. Optimal scoring matrices and gap penalties are reported for Smith-Waterman and FASTA, using conventional or ln()-scaled similarity scores. Searches with no penalty for gap extension, or no penalty for gap opening, or an infinite penalty for gaps performed significantly worse than the best methods. Differences in performance between FASTA and Smith-Waterman were not significant when partial query sequences were used. However, the best performance with complete query sequences was obtained with the Smith-Waterman algorithm and ln()-scaling.
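To make the gap-penalty convention in this abstract concrete (-12 for the first residue in a gap, -2 for each additional residue), here is a minimal sketch of a Smith-Waterman local alignment score with affine gaps, using the standard Gotoh recurrences. The match and mismatch values are placeholder assumptions standing in for a real scoring matrix such as BLOSUM55; only the score is computed, not the alignment itself.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_first=12, gap_next=2):
    """Best local alignment score; a gap of length k costs gap_first + (k - 1) * gap_next."""
    neg = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols      # best score of an alignment ending at (i-1, j)
    f_prev = [neg] * cols    # best score ending at (i-1, j) with a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg] * cols
        e = neg              # best score ending at (i, j) with a gap in a
        for j in range(1, cols):
            e = max(h_cur[j - 1] - gap_first, e - gap_next)
            f_cur[j] = max(h_prev[j] - gap_first, f_prev[j] - gap_next)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best

# Eight matches (8 * 5 = 40) joined across a three-residue gap (12 + 2 + 2 = 16).
print(smith_waterman_affine("AAAACCCC", "AAAAGGGCCCC"))  # 24
```

The affine form makes one long gap cheaper than several short ones, which is why the first-residue and additional-residue penalties are reported separately.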

15.
One of the challenges faced by all molecular docking algorithms is that of being able to discriminate between correct results and false positives obtained in the simulations. The scoring or energetic function is the one that must fulfill this task. Several scoring functions have been developed and new methodologies are still under development. In this paper, we have employed the Compactly Supported Radial Basis Functions (CSRBF) to create analytical representations of molecular surfaces, which are then included as key components of a new scoring function for molecular docking. The method proposed here achieves a better ranking of the solutions produced by the program DOCK, as compared with the ranking done by its native contact scoring function. Our new analytical scoring function based on CSRBF can be easily included in different available docking programs as a reliable and quick filter in large-scale docking simulations.

16.
In this work, the development of an Artificial Neural Network (ANN) based soft estimator is reported for the estimation of static-nonlinearity associated with the transducers. Under the realm of ANN based transducer modeling, only two neural models have been suggested to estimate the static-nonlinearity associated with the transducers with quite successful results. The first existing model is based on the concept of a functional link artificial neural network (FLANN) trained with mu-LMS (Least Mean Squares) learning algorithm. The second one is based on the architecture of a single layer linear ANN trained with alpha-LMS learning algorithm. However, both these models suffer from the problem of slow convergence (learning). In order to circumvent this problem, it is proposed to synthesize the direct model of transducers using the concept of a Polynomial-ANN (polynomial artificial neural network) trained with Levenberg-Marquardt (LM) learning algorithm. The proposed Polynomial-ANN oriented transducer model is implemented based on the topology of a single-layer feed-forward back-propagation-ANN. The proposed neural modeling technique provided an extremely fast convergence speed with increased accuracy for the estimation of transducer static nonlinearity. The results of convergence are very stimulating with the LM learning algorithm.

17.
Using a fermentation database for Escherichia coli producing green fluorescent protein (GFP), we have implemented a novel three-step optimization method to identify the process input variables most important in modeling the fermentation, as well as the values of those critical input variables that result in an increase in the desired output. In the first step of this algorithm, we use either decision-tree analysis (DTA) or information theoretic subset selection (ITSS) as a database mining technique to identify which process input variables best classify each of the process outputs (maximum cell concentration, maximum product concentration, and productivity) monitored in the experimental fermentations. The second step of the optimization method is to train an artificial neural network (ANN) model of the process input-output data, using the critical inputs identified in the first step. Finally, a hybrid genetic algorithm (hybrid GA), which includes both gradient and stochastic search methods, is used to identify the maximum output modeled by the ANN and the values of the input conditions that result in that maximum. The results of the database mining techniques are compared, both in terms of the inputs selected and the subsequent ANN performance. For the E. coli process used in this study, we identified 6 inputs from the original 13 that resulted in an ANN that best modeled the GFP fluorescence outputs of an independent test set. Values of the six inputs that resulted in a modeled maximum fluorescence were identified by applying a hybrid GA to the ANN model developed. When these conditions were tested in laboratory fermentors, an actual maximum fluorescence of 2.16E6 AU was obtained. The previous high value of fluorescence that was observed was 1.51E6 AU. Thus, this input condition set that was suggested by implementing the proposed optimization scheme on the available historical database increased the maximum fluorescence by 55%.

18.
Cholesterol oxidase (COD) is a bi-functional FAD-containing oxidoreductase which catalyzes the oxidation of cholesterol into 4-cholesten-3-one. The wider biological functions and clinical applications of COD have urged the screening, isolation and characterization of newer microbes from diverse habitats as a source of COD and optimization and over-production of COD for various uses. The practicability of statistical/artificial intelligence techniques, such as response surface methodology (RSM), artificial neural network (ANN) and genetic algorithm (GA), has been tested to optimize the medium composition for the production of COD from novel strain Streptomyces sp. NCIM 5500. All experiments were performed according to the five factor central composite design (CCD) and the generated data was analysed using RSM and ANN. GA was employed to optimize the models generated by RSM and ANN. Based upon the predicted COD concentration, the model developed with ANN was found to be superior to the model developed with RSM. The RSM-GA approach predicted maximum of 6.283 U/mL COD production, whereas the ANN-GA approach predicted a maximum of 9.93 U/mL COD concentration. The optimum concentrations of the medium variables predicted through ANN-GA approach were: 1.431 g/50 mL soybean, 1.389 g/50 mL maltose, 0.029 g/50 mL MgSO4, 0.45 g/50 mL NaCl and 2.235 mL/50 mL glycerol. The experimental COD concentration was concurrent with the GA predicted yield and led to 9.75 U/mL COD production, which was nearly two times higher than the yield (4.2 U/mL) obtained with the un-optimized medium. This is the very first time we are reporting the statistical versus artificial intelligence based modeling and optimization of COD production by Streptomyces sp. NCIM 5500.

19.
In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model specifies simply which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
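The core idea of the abstract above, keeping a running best score while scanning exons simultaneously by acceptor and donor position, can be sketched as follows. This is a reduced illustration under stated assumptions: exons are hypothetical `(acceptor, donor, score)` tuples with acceptor <= donor, the only compatibility rule is non-overlap, and reading frames and the Gene Model are ignored. After the two sorts, the sweep itself is linear in the number of exons.

```python
def best_gene_score(exons):
    """Highest additive score of a chain of non-overlapping exons."""
    order_acc = sorted(range(len(exons)), key=lambda k: exons[k][0])
    order_don = sorted(range(len(exons)), key=lambda k: exons[k][1])
    chain = [0.0] * len(exons)  # best score of a gene ending at exon k
    best_closed = 0.0           # best gene ending before the current acceptor
    best = 0.0
    d = 0
    for k in order_acc:
        acceptor = exons[k][0]
        # Absorb every exon whose donor lies strictly upstream of this acceptor.
        # Its chain score is already final, because its own acceptor is
        # smaller and was therefore visited earlier in this loop.
        while d < len(order_don) and exons[order_don[d]][1] < acceptor:
            best_closed = max(best_closed, chain[order_don[d]])
            d += 1
        chain[k] = best_closed + exons[k][2]
        best = max(best, chain[k])
    return best

# Hypothetical candidate exons: (acceptor, donor, score).
candidates = [(1, 10, 5), (5, 20, 8), (12, 30, 4), (25, 40, 3), (32, 50, 6)]
print(best_gene_score(candidates))  # 15.0, from exons 1-10, 12-30 and 32-50
```

A quadratic version would compare each exon against every preceding exon; the stored `best_closed` value is what removes that inner comparison.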

20.
A protein-protein docking procedure traditionally consists in two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex existing in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that using Voronoi constructions and a well chosen set of parameters, an accurate scoring function could be designed and optimized. However, to be able to perform large-scale in silico exploration of the interactome, a near-native solution has to be found in the ten best-ranked solutions. This cannot yet be guaranteed by any of the existing scoring functions. In this work, we introduce a new procedure for conformation ranking. We previously developed a set of scoring functions where learning was performed using a genetic algorithm. These functions were used to assign a rank to each possible conformation. We now have a refined rank using different classifiers (decision trees, rules and support vector machines) in a collaborative filtering scheme. The scoring function newly obtained is evaluated using 10-fold cross-validation, and compared to the functions obtained using either genetic algorithms or collaborative filtering taken separately. This new approach was successfully applied to the CAPRI scoring ensembles. We show that for 10 targets out of 12, we are able to find a near-native conformation in the 10 best ranked solutions. Moreover, for 6 of them, the near-native conformation selected is of high accuracy. Finally, we show that this function dramatically enriches the 100 best-ranking conformations in near-native structures.
