Similar literature
 Found 20 similar articles; search took 46 ms
1.
Due to the increasing gap between structure-determined and sequenced proteins, prediction of protein structural classes has become an important problem. Because of the close sequence-structure relationship, it is important to use efficient sequence parameters when developing class predictors. The multinomial logistic regression model was used for the first time to evaluate the contribution of sequence parameters in determining the protein structural class. An in-house program generated parameters including single amino acid and all dipeptide composition frequencies. Then, the most effective parameters were selected by multinomial logistic regression. The variables selected in the multinomial logistic model were Valine among the single amino acid composition frequencies and Ala-Gly, Cys-Arg, Asp-Cys, Glu-Tyr, Gly-Glu, His-Tyr, Lys-Lys, Leu-Asp, Leu-Arg, Pro-Cys, Gln-Met, Gln-Thr, Ser-Trp, Val-Asn and Trp-Asn among the dipeptide composition frequencies. A neural network model was also constructed and fed with the parameters selected by multinomial logistic regression to build a hybrid predictor. In this study, self-consistency and jackknife tests on a database constructed by Zhou [1998. An intriguing controversy over protein structural class prediction. J. Protein Chem. 17(8), 729-738] containing 498 proteins were used to verify the performance of this hybrid method, and the results were compared with some prior works. The results showed that our two-stage hybrid model approach is very promising and may play a complementary role to the existing powerful approaches.
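The sequence parameters named in this abstract (single amino acid and dipeptide composition frequencies) can be illustrated with a minimal sketch. The authors' in-house program is not described, so the function name `composition_features` and the handling of non-standard residues below are assumptions made only for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_features(sequence):
    """Return 20 single-residue and 400 dipeptide composition frequencies.

    Assumption: residues outside the 20 standard codes are simply skipped.
    """
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    features = {aa: 0.0 for aa in AMINO_ACIDS}
    features.update({a + b: 0.0 for a, b in product(AMINO_ACIDS, repeat=2)})
    if not seq:
        return features
    for aa in seq:
        features[aa] += 1.0 / len(seq)
    n_pairs = len(seq) - 1  # number of overlapping adjacent pairs
    for a, b in zip(seq, seq[1:]):
        features[a + b] += 1.0 / n_pairs
    return features
```

Each protein then becomes a 420-dimensional vector, from which a selection step such as the multinomial logistic regression described above would keep only a handful of entries (for example `features["V"]` or `features["AG"]`).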

2.
A genetic algorithm (GA) for feature selection in conjunction with a neural network was applied to predict protein structural classes based on single amino acid and all dipeptide composition frequencies. These sequence parameters were encoded as input features for the GA in the feature selection procedure and classified with a three-layered neural network to predict protein structural classes. The system was established by optimizing the classification performance of the neural network, which was used as the evaluation function. In this study, self-consistency and jackknife tests on a database containing 498 proteins were used to verify the performance of this hybrid method, and the results were compared with some prior works. The hybrid model, which combines genetic and neural technologies, proved to be a promising approach to protein structural class prediction.

3.
In this paper, a robust algorithm for disease type determination in brain magnetic resonance images (MRI) is presented. The proposed method classifies an MRI as normal or as one of seven different diseases. First, the two-level two-dimensional discrete wavelet transform (2D DWT) of the input image is calculated. Our analysis shows that the wavelet coefficients of the detail sub-bands can be modeled by the generalized autoregressive conditional heteroscedasticity (GARCH) statistical model. The parameters of the GARCH model are taken as the primary feature vector. After feature vector normalization, principal component analysis (PCA) and linear discriminant analysis (LDA) are used to extract the proper features and remove redundancy from the primary feature vector. Finally, the extracted features are applied to the K-nearest neighbor (KNN) and support vector machine (SVM) classifiers separately to identify the image as normal or determine the disease type. Experimental results indicate that the proposed algorithm achieves a high classification rate and outperforms recently introduced methods while needing fewer features for classification.

4.
In this paper, EEG signals of 20 schizophrenic patients and 20 age-matched control participants are analyzed with the objective of determining the more informative channels and finally distinguishing the two groups. For each case, 22 channels of EEG were recorded. A two-stage feature selection algorithm is designed such that the more informative channels are first selected to enhance the discriminative information. Two methods, the bidirectional search and plus-L minus-R (LRS) techniques, are employed to select these informative channels. Interestingly, most of the selected channels are located in the temporal lobes (containing the limbic system), which confirms the neuropsychological differences in these areas between the schizophrenic and normal participants. After channel selection, a genetic algorithm (GA) is employed to select the best features from the selected channels. In this way, in addition to elimination of the less informative channels, the redundant and less discriminant features are also eliminated. A computationally fast algorithm with excellent classification results is obtained. Implementation of this efficient approach involves several features including autoregressive (AR) model parameters, band power, fractal dimension and wavelet energy. To test the performance of the final subset of features, classifiers including linear discriminant analysis (LDA) and support vector machine (SVM) are employed to classify the reduced feature set of the two groups. Using the bidirectional search for channel selection, classification accuracies of 84.62% and 99.38% are obtained for LDA and SVM, respectively. Using the LRS technique for channel selection, classification accuracies of 88.23% and 99.54% are obtained for LDA and SVM, respectively.
Finally, the results are compared and contrasted with two well-known methods, namely single-stage feature selection (evolutionary feature selection) and principal component analysis (PCA)-based feature selection. The results show improved classification accuracy in relatively low computational time with the two-stage feature selection.
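The plus-L minus-R (LRS) channel selection mentioned in this abstract can be sketched as a generic greedy search. The abstract does not state the selection criterion or the values of L and R, so the `score` callback, the defaults `l=2, r=1`, and the final trimming step are assumptions for illustration only.

```python
def plus_l_minus_r(candidates, score, target_size, l=2, r=1):
    """Plus-L minus-R search: add l items greedily, then drop r, and repeat.

    `score` is a hypothetical criterion mapping a list of items to a number
    (higher is better), e.g. a cross-validated classification accuracy.
    """
    if l <= r:
        raise ValueError("l must be greater than r so the subset can grow")
    pool = list(candidates)
    selected = []

    def drop_least_useful():
        # Remove the item whose absence leaves the best-scoring subset.
        victim = max(selected, key=lambda c: score([x for x in selected if x != c]))
        selected.remove(victim)

    while len(selected) < target_size:
        for _ in range(l):  # plus-L: sequential forward steps
            remaining = [c for c in pool if c not in selected]
            if not remaining:
                break
            selected.append(max(remaining, key=lambda c: score(selected + [c])))
        if len(selected) == len(pool):
            break  # nothing left to add; trim below if needed
        for _ in range(r):  # minus-R: sequential backward steps
            drop_least_useful()
    while len(selected) > target_size:
        drop_least_useful()
    return selected
```

The backward steps let the search discard a channel that looked useful early on but became redundant once others were added, which plain forward selection cannot do.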

5.
Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is useful for solving such problems. It is well known that outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods (including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)) do not work well if the data set has outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characters as discriminant variables, and a satisfactory result is obtained by the method of robust discriminant analysis.

6.
A novel direct antibody-free electrochemical approach for acute myocardial infarction (AMI) diagnosis has been developed. For this purpose, a combination of the electrochemical assay of plasma samples with chemometrics was proposed. Screen-printed carbon electrodes modified with didodecyldimethylammonium bromide were used for plasma characterization by cyclic voltammetry (CV) and square wave voltammetry (SWV). It was shown that the cathodic peak in voltammograms at about -250 mV vs. Ag/AgCl can be associated with AMI. In parallel tests, cardiac myoglobin and troponin I, the AMI biomarkers, were determined in each sample by RAMP immunoassay. The applicability of the electrochemical testing for AMI diagnostics was confirmed by statistical methods: generalized linear model (GLM), linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), artificial neural net (multi-layer perceptron, MLP), and support vector machine (SVM), all of which were created to obtain the "True-False" distribution prediction, where "True" and "False" are, respectively, positive and negative decisions about an illness event.

7.
We have previously shown the usefulness of historical data for fermentation process optimization. The methodology developed includes identification of important process inputs, training of an artificial neural network (ANN) process model, and ultimately use of the ANN model with a genetic algorithm to find the optimal values of each critical process input. However, this approach ignores the time-dependent nature of the system and therefore does not fully utilize the information available within a database. In this work, we propose a method for incorporating time-dependent optimization into our previously developed three-step optimization routine. This is achieved by an additional step that uses a fermentation model (consisting of coupled ordinary differential equations (ODE)) to interpret important time-course features of the collected data through adjustments in model parameters. Important process variables not explicitly included in the model were then identified for each model parameter using automatic relevance determination (ARD) with Gaussian process (GP) models. The developed GP models were then combined with the fermentation model to form a hybrid neural network model that predicted the time-course activity of the cell and protein concentrations of novel fermentation conditions. A hybrid genetic algorithm was then used in conjunction with the hybrid model to suggest optimal time-dependent control strategies. The presented method was applied to an E. coli fermentation database generated in our laboratory. Optimization of two different criteria (final protein yield and a simplified economic criterion) was attempted. While the overall protein yield was not increased using this methodology, we were successful in increasing a simplified economic criterion by 15% compared to what had been previously observed.
These process conditions included using 35% less arabinose (the inducer) and 33% less tryptone in the media and reducing the time required to reach the maximum protein concentration by 10% while producing approximately the same level of protein as the previous optimum.

8.
MOTIVATION: So far, various statistical and machine learning techniques have been applied to the prediction of beta-turns. The majority of these techniques have focused only on predicting the location of beta-turns in proteins. We developed a hybrid approach for analysis and prediction of different types of beta-turn. RESULTS: A two-stage hybrid model was developed to predict the beta-turn Types I, II, IV and VIII. Multinomial logistic regression was used for the first time to select significant parameters in the prediction of beta-turn types using a self-consistency test procedure. The extracted parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the beta-turn sequence. The most significant parameters were then selected using the multinomial logistic regression model. Among these, the occurrences of glutamine, histidine, glutamic acid and arginine, respectively, in positions i, i + 1, i + 2 and i + 3 of the beta-turn sequence had an overall relationship with five beta-turn types. A neural network model was then constructed and fed with the parameters selected by multinomial logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains by 9-fold cross-validation. The hybrid model gives a Matthews correlation coefficient (MCC) of 0.235, 0.473, 0.103 and 0.124, respectively, for beta-turn Types I, II, IV and VIII. Our model also distinguished the different types of beta-turn in the embedded binary logit comparisons, which have not been carried out so far. AVAILABILITY: Available on request from the authors.
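For readers unfamiliar with the Matthews correlation coefficient reported above, the following is a minimal sketch of the standard binary definition computed from confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator vanishes are choices made here, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it is often preferred over raw accuracy when one turn type is much rarer than the others.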

9.
The number of protein 3D structures without function annotation in the Protein Data Bank (PDB) has steadily increased. This has led in turn to increased demand for theoretical models that give a quick characterization of these proteins. In this work, we present a new and fast Markov chain model (MCM) to predict the enzyme classification (EC) number. We used both linear discriminant analysis (LDA) and artificial neural networks (ANN) in order to compare linear vs. non-linear classifiers. The LDA model found is very simple (three variables) and at the same time is able to predict the first EC number with an overall accuracy of 79% for a data set of 4755 proteins (859 enzymes and 3896 non-enzymes) divided into training and external validation series. In addition, the best non-linear ANN model is notably more complex but has an overall accuracy of 98.85%. It is important to emphasize that this method may help us not only to predict new enzyme proteins but also to select peptide candidates found in the peptide mass fingerprints (PMFs) of new proteins that may improve enzyme activity. To illustrate the use of the model in this regard, we first report the 2D electrophoresis (2DE) and MALDI-TOF mass spectra characterization of the PMF of a new possible malate dehydrogenase sequence from Leishmania infantum. Next, we used the models to predict the contribution to a specific enzyme action of 30 peptides found in the PMF of the new protein. We implemented the present model in a server at the Bio-AIMS portal (http://miaja.tic.udc.es/Bio-AIMS/EnzClassPred.php). This free online tool is based on PHP/HTML/Python and MARCH-INSIDE routines. This combined strategy may be used to identify and predict peptides of prokaryote and eukaryote parasites and their hosts as well as other higher organisms, which may be of interest in drug development or target identification.

10.
Many drugs with very different affinities for a large number of receptors have been described. Thus, in this work, we selected drug-target pairs (DTPs/nDTPs) of drugs with high affinity/nonaffinity for different targets. Quantitative structure-activity relationship (QSAR) models become a very useful tool in this context because they substantially reduce time- and resource-consuming experiments. Unfortunately, most QSAR models predict activity against only one protein target and/or have not yet been implemented on a public Web server freely available online to the scientific community. To solve this problem, we developed a multitarget QSAR (mt-QSAR) classifier combining the MARCH-INSIDE software for the calculation of the structural parameters of drug and target with the linear discriminant analysis (LDA) method in order to seek the best model. The accuracy of the best LDA model was 94.4% (3,859/4,086 cases) for training and 94.9% (1,909/2,012 cases) for the external validation series. In addition, we implemented the model in the Web portal Bio-AIMS as an online server entitled MARCH-INSIDE Nested Drug-Bank Exploration & Screening Tool (MIND-BEST), located at http://miaja.tic.udc.es/Bio-AIMS/MIND-BEST.php . This online tool is based on PHP/HTML/Python and MARCH-INSIDE routines. Finally, we illustrated two practical uses of this server with two different experiments. In experiment 1, we report for the first time a MIND-BEST prediction, synthesis, characterization, and MAO-A and MAO-B pharmacological assay of eight rasagiline derivatives, promising for anti-Parkinson drug design. In experiment 2, we report sampling, parasite culture, sample preparation, 2-DE, MALDI-TOF and -TOF/TOF MS, MASCOT search, 3D structure modeling with LOMETS, and MIND-BEST prediction for different peptides of a new protein found in the proteome of the bird parasite Trichomonas gallinae, which is promising for the discovery of antiparasite drug targets.
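The accuracy figures quoted in this abstract follow directly from the case counts given in parentheses. As a quick check, a short sketch (the helper name `accuracy_percent` is introduced here for illustration):

```python
def accuracy_percent(correct, total):
    """Percentage of correctly classified cases, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


training = accuracy_percent(3859, 4086)    # training series counts from the abstract
validation = accuracy_percent(1909, 2012)  # external validation series counts
print(training, validation)  # prints: 94.4 94.9
```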

11.
In some classifications the importance of classes varies and it is desirable to weight allocation to selected classes. This is common in classifications of remotely sensed imagery, especially as class occurrence can vary markedly. If, for instance, there is prior knowledge of the distribution of class occurrence, this weighting can be achieved with widely used statistical classifiers by setting appropriate a priori probabilities of class membership. With an artificial neural network the incorporation of prior knowledge is more problematic. An approach to weighting class allocation in an artificial neural network classification by replicating selected training patterns is discussed. In a comparison against discriminant analysis for the classification of synthetic aperture radar imagery, the results showed that training pattern replication could be used to weight class allocation with an effect similar to that of incorporating a priori probabilities of class membership into the discriminant analysis, and that it resulted in a significant increase in classification accuracy.
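The idea of weighting class allocation by replicating training patterns can be sketched in a few lines. The abstract does not give the replication factors or the data layout, so the function names, the integer `replication` mapping, and the class labels used in the note below are hypothetical.

```python
from collections import Counter


def replicate_patterns(patterns, labels, replication):
    """Repeat each training pattern according to its class's replication factor.

    `replication` maps a class label to an integer number of copies;
    classes not listed keep a single copy.
    """
    out_patterns, out_labels = [], []
    for pattern, label in zip(patterns, labels):
        copies = replication.get(label, 1)
        out_patterns.extend([pattern] * copies)
        out_labels.extend([label] * copies)
    return out_patterns, out_labels


def class_proportions(labels):
    """Fraction of the training set occupied by each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

For example, calling `replicate_patterns` with `{"forest": 3}` on a set holding one "forest" and two "water" patterns raises the forest share of the training set from one third to three fifths, which plays a role loosely analogous to raising that class's a priori probability in a statistical classifier.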

12.
In this study, we evaluated whether applying multivariate analysis to the data obtained from two-dimensional protein maps could improve the search for protein markers. First, we performed a classical proteomic study of the differential expression of serum N-glycoproteins in colorectal cancer patients. Then, applying principal component analysis (PCA), we assessed the utility of the 2-D protein pattern and certain subsets of spots as a tool to distinguish control and case samples, and tested the accuracy of the classification model by linear discriminant analysis (LDA). On the other hand, we looked for altered spots by univariate statistics and then analysed them as a cluster by PCA and LDA. We found that those proteins combined presented a theoretical sensitivity and specificity of 100%. Finally, the spots with known protein identity were analysed by multivariate methods, finding a subgroup that behaved as the most obvious candidates for further validation trials.

13.
14.
15.
Owing to the modest success of protein secondary structure prediction using various algorithmic and non-algorithmic techniques, similar techniques have been developed for predicting γ-turns in proteins by Kaur and Raghava [2003. A neural-network based method for prediction of γ-turns in proteins from multiple sequence alignment. Protein Sci. 12, 923-929]. However, the major limitation of previous methods was their inability to predict γ-turn types. In a recent investigation we introduced a sequence-based predictor model for predicting γ-turn types in proteins [Jahandideh, S., Sabet Sarvestani, A., Abdolmaleki, P., Jahandideh, M., Barfeie, M, 2007a. γ-turn types prediction in proteins using the support vector machines. J. Theor. Biol. 249, 785-790]. In the present work, in order to analyze the effect of sequence and structure on the formation of γ-turn types and to predict γ-turn types in proteins, we applied a novel hybrid neural discriminant modeling procedure. This study clarified the efficiency of using statistical model preprocessors in determining the effective parameters. Moreover, a preprocessor in the first stage of the hybrid approach can simplify the optimal structure of the neural network, thereby reducing the time needed for neural network training in the second stage, decreasing the probability of overfitting, and yielding high precision and reliability.

16.
17.
In the present study, three different physicochemical molecular properties for peptides were calculated using the program MARCH-INSIDE: atomic polarizability, partition coefficient, and polarity. These measures were used as input parameters of a linear discriminant analysis (LDA) in order to develop three different quantitative structure–property relationship (QSPR)-perturbation models for the prediction of B-epitopes reported in the immune epitope database (IEDB) given perturbations in peptide sequence, in vivo process, experimental techniques, and source or host organisms. The accuracy, sensitivity and specificity of the models were >90% for both training and cross-validation series. The statistical parameters of the models were compared to the results achieved with the electronegativity QSPR-perturbation model previously reported by González-Díaz et al. (J Immunol Res. doi: 10.1155/2014/768515, 2014). The results indicate that this type of approach may constitute a potentially valuable route for predicting "in silico" new optimal peptide sequences and/or boundary conditions for vaccine development.

18.
Using a fermentation database for Escherichia coli producing green fluorescent protein (GFP), we have implemented a novel three-step optimization method to identify the process input variables most important in modeling the fermentation, as well as the values of those critical input variables that result in an increase in the desired output. In the first step of this algorithm, we use either decision-tree analysis (DTA) or information theoretic subset selection (ITSS) as a database mining technique to identify which process input variables best classify each of the process outputs (maximum cell concentration, maximum product concentration, and productivity) monitored in the experimental fermentations. The second step of the optimization method is to train an artificial neural network (ANN) model of the process input-output data, using the critical inputs identified in the first step. Finally, a hybrid genetic algorithm (hybrid GA), which includes both gradient and stochastic search methods, is used to identify the maximum output modeled by the ANN and the values of the input conditions that result in that maximum. The results of the database mining techniques are compared, both in terms of the inputs selected and the subsequent ANN performance. For the E. coli process used in this study, we identified 6 inputs from the original 13 that resulted in an ANN that best modeled the GFP fluorescence outputs of an independent test set. Values of the six inputs that resulted in a modeled maximum fluorescence were identified by applying a hybrid GA to the ANN model developed. When these conditions were tested in laboratory fermentors, an actual maximum fluorescence of 2.16E6 AU was obtained. The previous high value of fluorescence that was observed was 1.51E6 AU. Thus, this input condition set that was suggested by implementing the proposed optimization scheme on the available historical database increased the maximum fluorescence by 55%.

19.
Physico-chemical, chemical and biological parameters were studied throughout the composting process of four winery and distillery composts, and the data set of compost characteristics was analysed using multivariate techniques, factorial analysis (FA) and linear discriminant analysis (LDA), in order to classify the different parameters studied and thus establish those that better describe the composting process of this type of waste. Through factorial analysis (FA) of the parameters studied throughout the composting process, four components that explained 85.6% of the variability were established. The parameters associated with compost maturity, agronomic character, water-soluble fraction, and ammonia and temperature increment were grouped in the components F1, F2, F3 and F4, respectively, which can reduce the number of determinations needed to ascertain the maturity and quality of the composts. In addition, linear discriminant analysis on the factorial components makes it possible to classify the four composts with a success rate of around 95%.

20.
Process Biochemistry, 2014, 49(4): 583-588
To achieve real-time smell monitoring of the solid-state fermentation (SSF) of protein feed in relation to its degree of fermentation, the electronic nose (e-nose) technique, with the help of chemometric analysis, was attempted in this study. Linear discriminant analysis (LDA), K-nearest neighbors (KNN), and support vector machines (SVM) were each used to calibrate discrimination models in order to evaluate the influence of different linear and non-linear classification algorithms on the identification results. Experimental results showed that the predictive precision of the SVM model was superior to that of the other two, and the optimum SVM model was obtained when five PCs were included. The discrimination rates of the SVM model were 97.14% and 91.43% in the training and testing sets, respectively. The overall results demonstrate excellent promise for the e-nose technique, combined with an appropriate chemometric method, to be applied in the SSF industry.
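Of the three classifiers compared in this abstract, K-nearest neighbors is the simplest to illustrate. The sketch below is a generic KNN with Euclidean distance and majority voting; the abstract does not specify the value of k, the distance metric, or the sensor features, so those details and the labels used in the note below are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With hypothetical two-dimensional feature vectors (for instance, the first two principal component scores of an e-nose reading) labeled "early" or "late" fermentation, a new reading is assigned the label that dominates among its three closest training readings.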
