Similar Documents
1.
In this paper, we present an effective and efficient diagnosis system for Parkinson's disease (PD) based on fuzzy k-nearest neighbor (FKNN) classification enhanced by particle swarm optimization (PSO). In the proposed system, named PSO–FKNN, both the continuous and the binary versions of PSO are used to perform parameter optimization and feature selection simultaneously. On the one hand, continuous PSO adaptively specifies the neighborhood size k and the fuzzy strength parameter m of the FKNN classifier. On the other hand, binary PSO selects the most discriminative subset of features for prediction. The effectiveness of the PSO–FKNN model has been rigorously evaluated on the PD data set in terms of classification accuracy, sensitivity, specificity and the area under the receiver operating characteristic (ROC) curve (AUC). Compared with existing methods from previous studies, the proposed system achieved the highest classification accuracy reported so far in 10-fold cross-validation, with a mean accuracy of 97.47%. The proposed diagnosis system may therefore serve as a promising new tool for diagnosing PD.
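
The two FKNN parameters that PSO tunes, k and m, enter the classifier's decision rule. The sketch below shows how, using the standard fuzzy k-NN membership rule. It assumes crisp training memberships (1 for the true class, 0 otherwise). The PSO search and feature selection are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def fknn_predict(X_train, y_train, x, k=5, m=2.0, n_classes=None):
    """Return fuzzy class memberships of query x (hypothetical helper).

    Assumes crisp training memberships and m > 1. k (neighborhood size) and
    m (fuzzy strength) are the parameters the abstract says PSO tunes.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    if n_classes is None:
        n_classes = int(y_train.max()) + 1
    dist = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    idx = np.argsort(dist)[:k]                      # k nearest neighbors
    # Inverse-distance weights 1 / d^(2/(m-1)); epsilon guards against d = 0
    w = 1.0 / (dist[idx] ** (2.0 / (m - 1.0)) + 1e-12)
    u = np.zeros(n_classes)
    for weight, label in zip(w, y_train[idx]):
        u[label] += weight                          # crisp membership of the neighbor
    return u / w.sum()                              # memberships sum to 1
```

A larger m flattens the distance weighting, and a larger k draws on more neighbors. This is why the two parameters are worth optimizing jointly.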

2.
MOTIVATION: The solvent accessibility of amino acid residues plays an important role in tertiary structure prediction, especially when a query protein lacks significant sequence similarity to proteins of known structure. Despite recent improvements, solvent accessibility prediction remains less accurate than secondary structure prediction. The k-nearest neighbor method, a simple but powerful classification algorithm, has frequently been used to classify biological and medical data but has never been applied to solvent accessibility prediction. RESULTS: We applied the fuzzy k-nearest neighbor method to solvent accessibility prediction, using PSI-BLAST profiles as feature vectors, and achieved high prediction accuracies. With leave-one-out cross-validation on the ASTRAL SCOP reference dataset constructed by sequence clustering, our method achieved 64.1% accuracy for 3-state (buried/intermediate/exposed) prediction (thresholds of 9% for buried/intermediate and 36% for intermediate/exposed). It achieved 86.7, 82.0, 79.0 and 78.5% accuracies for 2-state (buried/exposed) prediction at buried/exposed thresholds of 0, 5, 16 and 25%, respectively. On the RS126 dataset and a benchmarking dataset of 229 proteins, our method also showed slightly better accuracies than other methods, by about 2-5%. AVAILABILITY: Program and datasets are available at http://biocom1.ssu.ac.kr/FKNNacc/ CONTACT: jul@ssu.ac.kr.
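
The 3-state and 2-state accuracies above depend on how relative solvent accessibility (in percent) is discretized at the stated thresholds. The sketch below shows that labeling step and a plain per-residue accuracy. The tie handling at the thresholds is an assumption, not taken from the paper: a residue exactly at a threshold is assigned to the less exposed state. The function names are hypothetical.

```python
def three_state(rsa, t_buried=9.0, t_exposed=36.0):
    """Map relative solvent accessibility (%) to B/I/E.

    Assumption: rsa <= t_buried is buried, rsa <= t_exposed is intermediate.
    """
    if rsa <= t_buried:
        return "B"
    if rsa <= t_exposed:
        return "I"
    return "E"

def two_state(rsa, threshold):
    """Buried/exposed split at one threshold (the study used 0, 5, 16 and 25%)."""
    return "B" if rsa <= threshold else "E"

def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

With the `<=` convention, the 0% threshold labels only fully inaccessible residues as buried.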

3.
We present a machine learning method (a hierarchical network of k-nearest neighbor classifiers) that uses an RNA sequence alignment to predict a consensus RNA secondary structure. The inputs to the network are the mutual information, the fraction of complementary nucleotides and a novel consensus RNAfold secondary structure prediction, computed for a pair of alignment columns and its nearest neighbors. Given this input, the network predicts whether the pair of alignment columns corresponds to a base pair. On a comprehensive test set of 49 RFAM alignments, the program KNetFold achieves an average Matthews correlation coefficient of 0.81, a significant improvement over the secondary structure prediction methods PFOLD and RNAalifold. Using archaeal RNase P as an example, we show that the program can also predict pseudoknot interactions.
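
Two of the network inputs named above can be computed directly from a pair of alignment columns: their mutual information and the fraction of complementary nucleotides. The sketch below shows both. It assumes that each column is a string with one character per sequence, that G-U wobble pairs count as complementary, and that gaps are treated like any other symbol. These choices may differ from KNetFold's, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

# Pairs counted as complementary; including G-U wobble pairs is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

def fraction_complementary(col_i, col_j):
    """Share of sequences whose two nucleotides can form a base pair."""
    return sum((a, b) in PAIRS for a, b in zip(col_i, col_j)) / len(col_i)
```

Perfectly covarying, fully complementary columns such as "GCAU" and "CGUA" give 2 bits of mutual information and a complementary fraction of 1.0. A conserved column paired with any other column gives zero mutual information.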

4.
Xia XY, Ge M, Wang ZX, Pan XM. PLoS ONE, 2012, 7(6): e37653
Because of the increasing gap between the data from sequencing and structural genomics, accurately predicting the structural class of a protein domain solely from its primary sequence remains a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% on the widely used, non-homologous benchmark dataset 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain into a 4-dimensional (4D) structural feature vector, which can then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than that of methods based on predicted secondary structure.

5.
Liu T, Geng X, Zheng X, Li R, Wang J. Amino Acids, 2012, 42(6): 2243-2249
Computational prediction of protein structural class based solely on sequence data remains a challenging problem in protein science. Existing methods differ in the sequence representation models and prediction engines they adopt. In this study, we introduce a powerful feature extraction method that combines the position-specific scoring matrix (PSSM) with an auto covariance (AC) transformation. A sample protein is thus represented by a series of discrete components that partially incorporate long-range sequence-order information and the evolutionary information reflected in the PSI-BLAST profile. To verify the performance of our method, jackknife cross-validation tests were performed on four widely used benchmark datasets. Comparison with existing methods shows that our method provides state-of-the-art performance for structural class prediction. A web server that implements the proposed method is freely available at http://202.194.133.5/xinxi/AAC_PSSM_AC/index.htm.
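
The auto covariance transformation turns a variable-length L x 20 PSSM into a fixed-length vector. It does this by correlating each amino-acid column with itself at increasing sequence separations (lags). The sketch below uses one common AC formulation:

AC(g, j) = 1/(L - g) * sum over i of (P[i, j] - mean_j) * (P[i + g, j] - mean_j)

The paper's exact normalization (for example, any prior transformation of the raw PSSM scores) is not stated in the abstract. This form is therefore an assumption, and the function name is hypothetical.

```python
import numpy as np

def pssm_auto_covariance(pssm, max_lag):
    """Auto-covariance features from an L x 20 PSSM (one common formulation).

    Returns a vector of length 20 * max_lag, ordered lag by lag.
    """
    P = np.asarray(pssm, dtype=float)
    L = P.shape[0]
    if max_lag >= L:
        raise ValueError("max_lag must be smaller than the sequence length")
    centered = P - P.mean(axis=0)          # subtract each column's mean over the sequence
    features = []
    for g in range(1, max_lag + 1):
        # Pair residue i with residue i + g in every column, then average
        features.append((centered[:-g] * centered[g:]).sum(axis=0) / (L - g))
    return np.concatenate(features)
```

Because the output length depends only on max_lag, proteins of different lengths map to vectors of the same size. This is what lets them feed a standard classifier.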

6.
Several accurate systems have been developed for predicting class I major histocompatibility complex (MHC):peptide binding. Most of these are trained on binding affinity data for primarily 9mer peptides. Here, we show how prediction methods trained on 9mer data can be used to accurately predict the binding affinity of peptides of length 8, 10 and 11. The method makes it possible to predict binding of peptides with lengths other than nine for MHC alleles for which no such peptides have been measured. For validation, the performance of this approach is compared with that of predictors trained on peptides of the length in question. In this validation, the approximation method has an accuracy comparable to or better than that of methods trained on peptides of the same length as those being predicted. AVAILABILITY: The algorithm has been implemented in the web-accessible servers NetMHC-3.0: http://www.cbs.dtu.dk/services/NetMHC-3.0, and NetMHCpan-1.1: http://www.cbs.dtu.dk/services/NetMHCpan-1.1

7.
Accurate and large‐scale prediction of protein–protein interactions directly from amino‐acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino‐acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two‐component systems and comprehensively reconstruct two‐component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome‐wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome‐wide two‐component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of ‘hub’ nodes that distribute and integrate signals to and from up to tens of different interaction partners.

8.
MOTIVATION: An important area of research in biochemistry and molecular biology focuses on the characterization of enzyme mutants. However, synthesizing and analyzing experimental mutants is time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity. RESULTS: Using a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector that measures environmental changes relative to the wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, model performance measured by the area under the receiver operating characteristic (ROC) curve (AUC) exceeds 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating the statistical significance of the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve mean accuracies of 77.0% and 80.8%, respectively. Next, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants. A subsequent search for publications reporting on dozens of these test mutants reveals that 79% and 86% of the predictions, respectively, match the experimental results. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance. AVAILABILITY: Prediction databases at http://proteins.gmu.edu/automute/
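
The AUC values reported above summarize how well a model's scores rank active mutants above inactive ones. The sketch below computes AUC through its probabilistic interpretation: the chance that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. It does not reproduce the paper's Delaunay-based features or its classifier, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC = P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores [0.9, 0.2, 0.5, 0.1] with labels [1, 1, 0, 0] give 0.75, because three of the four positive/negative pairs are ordered correctly. The double loop is quadratic. A rank-based (Mann-Whitney) computation gives the same value faster on large data sets.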

11.
MOTIVATION: It is well understood that successful clustering of expression profiles provides useful clues to the functions of uncharacterized genes. To achieve such clustering, we investigate in this report a clustering method based on adaptive resonance theory (ART). RESULTS: We apply Fuzzy ART as a clustering method for analyzing time-series expression data during sporulation of Saccharomyces cerevisiae. The clustering result of Fuzzy ART was compared with those of other clustering methods such as hierarchical clustering, the k-means algorithm and self-organizing maps (SOMs). In terms of mathematical validation measures, Fuzzy ART achieved the most reasonable clustering. We also verified the robustness of Fuzzy ART using noisy data. Furthermore, we defined a correctness ratio of clustering based on genes whose temporal expression patterns have been characterized biologically. Using this definition, we showed that the clustering ability of Fuzzy ART was superior to that of other clustering methods such as hierarchical clustering, the k-means algorithm and SOMs. Finally, we validated the clustering results of Fuzzy ART in terms of biological functions and evidence. AVAILABILITY: The software is available at http://www.nubio.nagoya-u.ac.jp/proc/index.html
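
For readers unfamiliar with Fuzzy ART, the sketch below implements its standard steps: complement coding, the choice function, the vigilance test and the learning rule. It assumes that each expression profile has already been scaled to [0, 1] (for example, by min-max scaling). The parameter values (vigilance rho, choice parameter alpha, learning rate beta) are placeholders, not the settings used in the study, and the function name is hypothetical.

```python
import numpy as np

def fuzzy_art(data, rho=0.75, alpha=0.001, beta=1.0, epochs=1):
    """Cluster rows of `data` (values in [0, 1]) with Fuzzy ART.

    Returns (labels, weights). beta = 1.0 corresponds to fast learning.
    """
    X = np.asarray(data, dtype=float)
    inputs = np.hstack([X, 1.0 - X])           # complement coding: I = [a, 1 - a]
    weights = []
    labels = np.full(len(X), -1)
    for _ in range(epochs):
        for n, I in enumerate(inputs):
            norm_I = I.sum()
            # Rank existing categories by the choice function |I ^ w| / (alpha + |w|)
            order = sorted(range(len(weights)),
                           key=lambda j: -np.minimum(I, weights[j]).sum()
                           / (alpha + weights[j].sum()))
            chosen = None
            for j in order:
                # Vigilance test: |I ^ w| / |I| >= rho means resonance
                if np.minimum(I, weights[j]).sum() / norm_I >= rho:
                    chosen = j
                    break
            if chosen is None:
                weights.append(I.copy())        # no category resonates: create a new one
                chosen = len(weights) - 1
            else:
                w = weights[chosen]
                weights[chosen] = beta * np.minimum(I, w) + (1.0 - beta) * w
            labels[n] = chosen
    return labels, np.array(weights)
```

Raising rho produces more, tighter clusters. Unlike k-means, the number of clusters does not have to be fixed in advance.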

12.
With the rapid growth of protein sequence data, it is essential to develop automated and reliable predictive methods for protein function annotation. One approach to protein function prediction is to classify proteins into functional families from their primary sequence. Because enzymes are the most important group of proteins, accurate prediction of enzyme family and subfamily classes is closely related to their biological functions. In this paper, for the prediction of enzyme subfamily classes, Chou's amphiphilic pseudo-amino acid composition [Chou, K.C., 2005. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19] was adopted to represent the protein samples for training a 'one-versus-rest' support vector machine. As a demonstration, the jackknife test was performed on a dataset of 2640 oxidoreductase sequences classified into 16 subfamily classes [Chou, K.C., Elrod, D.W., 2003. Prediction of enzyme family classes. J. Proteome Res. 2, 183-190]. The resulting overall accuracy was 80.87%. The significant enhancement in accuracy indicates that the current method might play a complementary role to existing methods.

15.
For a quantitative understanding of the process of adaptation, we need to understand its "raw material," that is, the frequency and fitness effects of beneficial mutations. At present, most empirical evidence suggests an exponential distribution of fitness effects of beneficial mutations, as predicted for Gumbel-domain distributions by extreme value theory. Here, we study the distribution of mutation effects on cefotaxime (Ctx) resistance and fitness of 48 unique beneficial mutations in the bacterial enzyme TEM-1 β-lactamase, which were obtained by screening the products of random mutagenesis for increased Ctx resistance. Our contributions are threefold. First, based on the frequency of unique mutations among more than 300 sequenced isolates and correcting for mutation bias, we conservatively estimate that the total number of first-step mutations that increase Ctx resistance in this enzyme is 87 [95% CI 75-189], or 3.4% of all 2,583 possible base-pair substitutions. Of the 48 mutations, 10 are synonymous and the majority of the 38 non-synonymous mutations occur in the pocket surrounding the catalytic site. Second, we estimate the effects of the mutations on Ctx resistance by determining survival at various Ctx concentrations, and we derive their fitness effects by modeling reproduction and survival as a branching process. Third, we find that the distribution of both measures follows a Fréchet-type distribution characterized by a broad tail of a few exceptionally fit mutants. Such distributions have fundamental evolutionary implications, including an increased predictability of evolution, and may provide a partial explanation for recent observations of striking parallel evolution of antibiotic resistance.
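
The percentage quoted above can be checked directly. The total of 2,583 possible substitutions equals 3 x 861, which is what three alternative bases at each of 861 nucleotide positions would give. The 861-position figure is inferred from that arithmetic and is not stated in the abstract. The sketch below also converts the confidence bounds into the same units, and the function name is hypothetical.

```python
def fraction_of_substitutions(n_mutations, n_sites=861, alternatives_per_site=3):
    """Share of all possible single-base substitutions represented by n_mutations.

    n_sites=861 is an assumption inferred from 2,583 / 3 (three alternative bases per site).
    """
    total = n_sites * alternatives_per_site   # 861 * 3 = 2,583 possible substitutions
    return n_mutations / total

for label, n in [("estimate", 87), ("95% CI low", 75), ("95% CI high", 189)]:
    print(f"{label}: {n} mutations = {fraction_of_substitutions(n):.1%}")
```

Running it prints 3.4% for the point estimate and 2.9% to 7.3% for the confidence interval.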

16.
Sequence alignments of multiple genes are routinely used to infer phylogenetic relationships among species. Analysis of their concatenation is more likely to give correct results under the assumption of homotachy (i.e., the evolutionary rate within each lineage in each of the concatenated genes is constant through time). Here, we examine how violation of homotachy (i.e., the presence of within-site rate variation, called heterotachy) distorts species phylogenies. A theoretical examination using a four-taxon case and the neighbor joining (NJ) method concludes that NJ recovers the incorrect tree when the concatenated genes exhibit heterotachy. Applying average and weighted-average distance approaches, in which gene boundaries are kept intact, overcomes the detrimental effect of heterotachy in multigene analysis with the NJ method.
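
In the average-distance approach described above, distances are estimated gene by gene and then combined. This differs from estimating a single distance from the concatenated alignment. The sketch below averages per-gene distance matrices (optionally weighted, for example by gene length, which is an assumed weighting) and then applies the neighbor-joining Q criterion to find the first pair joined. With four taxa, that first pair fixes the unrooted topology. The paper's distance estimator and weighting scheme are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def average_distance(matrices, weights=None):
    """Combine per-gene distance matrices; weights give the weighted average."""
    D = np.asarray(matrices, dtype=float)      # shape (n_genes, n_taxa, n_taxa)
    return np.average(D, axis=0, weights=weights)

def nj_first_pair(D):
    """Pair (i, j) that neighbor joining joins first, i.e. the minimum of
    Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, where r_i is the row sum of D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    r = D.sum(axis=1)
    Q = (n - 2) * D - r[:, None] - r[None, :]
    np.fill_diagonal(Q, np.inf)                # never join a taxon with itself
    i, j = np.unravel_index(np.argmin(Q), Q.shape)
    return tuple(sorted((int(i), int(j))))
```

For an additive four-taxon matrix from the tree ((0,1),(2,3)), the function returns (0, 1). Taxa 2 and 3 are then the other cherry.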

17.
Drug permeability across cellular membranes determines the oral availability of drugs. Poor permeability makes a drug unsuitable for further development. Permeability may be estimated from the free energy change that a drug must overcome to cross the membrane. In this paper, drug permeation was simulated using molecular dynamics, and the free energy profile was calculated with the potential of mean force (PMF) method. The membrane was modeled as a DPPC bilayer, and three drugs with different permeabilities were tested. PMF studies of these three drugs show that doxorubicin (low permeability) must overcome a higher free energy barrier from water to the DPPC bilayer center, whereas ibuprofen (high permeability) faces a lower barrier. Our calculations indicate that the simulation model we built is suitable for predicting drug permeability.

18.
MOTIVATION: With the emerging success of protein secondary structure prediction using various statistical and machine learning techniques, similar techniques have been applied to protein beta-turn prediction. In this study, we perform protein beta-turn prediction using a k-nearest neighbor method combined with a filter that uses predicted protein secondary structure information. The traditional k-nearest neighbor approach to beta-turn prediction is modified to account for the unbalanced natural occurrence of beta-turns and non-beta-turns. RESULTS: Our prediction scheme is tested on a set of 426 non-homologous protein sequences. It consists of two stages: a k-nearest neighbor stage and a filtering stage. Variations of the k-nearest neighbor method were used to take the properties of beta-turns into account. Our filtering method combines the beta-turn/non-beta-turn estimates from the k-nearest neighbor stage with secondary structure predicted by PSI-PRED to obtain new beta-turn/non-beta-turn estimates. Compared with the previously best-known beta-turn prediction method on the dataset of 426 non-homologous protein sequences, our method gives slightly superior performance at significantly lower computational complexity. AVAILABILITY: Contact the author for information on the source code of the programs used.
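
Beta-turns are much rarer than non-turns, so a plain k-nearest-neighbor vote is biased toward "non-turn". The sketch below shows one common way to offset this: each class's neighbor votes are divided by that class's frequency in the training set. This reweighting is an assumption chosen for illustration, not necessarily the paper's modification, and the PSI-PRED filtering stage is not reproduced. The function name is hypothetical, and the training set must contain both classes.

```python
import numpy as np

def knn_turn_score(X_train, y_train, x, k=7):
    """Prior-corrected fraction of beta-turn votes (label 1) among the k nearest neighbors.

    Assumption: votes are divided by class frequency to offset class imbalance.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    d = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    neighbors = y_train[np.argsort(d)[:k]]
    prior_turn = y_train.mean()                 # share of turns in the training data
    v_turn = (neighbors == 1).sum() / prior_turn
    v_non = (neighbors == 0).sum() / (1.0 - prior_turn)
    return v_turn / (v_turn + v_non)
```

Suppose turns make up 25% of the training data and a query's four nearest neighbors split 2:2. The raw turn fraction would be 0.5, but the corrected score is 0.75.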

19.
Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances than for NAGNAGs. To improve the latter, we used Bayesian networks (BN) together with extensive experimental validation of the predictions. Using carefully constructed training and test datasets, we achieved a balanced sensitivity and specificity of ≥92%. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Applying a BN learned on human data to six other genomes, we show that performance on the vertebrate genomes matches that achieved on human data, whereas there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy established by experimental validation, we estimate the number of yet-undiscovered alternative NAGNAGs. State-of-the-art classifiers can produce highly accurate predictions of AS at NAGNAGs, indicating that we have identified the major features of the ‘NAGNAG-splicing code’ within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond.
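
Candidate NAGNAG tandem acceptors can be located with a simple pattern scan before any classifier is applied. In the motif, N is any base, and the two AGs are alternative acceptor sites three nucleotides apart. The sketch below uses a regular-expression lookahead so that overlapping motifs (as in AAGAAGAAG) are all reported. It only finds motif occurrences. It does not use the Bayesian network features from the study, and the function name is hypothetical.

```python
import re

# Zero-width lookahead so overlapping NAGNAG occurrences are all reported.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")

def find_nagnag(seq):
    """Return 0-based start positions of NAGNAG motifs in a DNA string."""
    return [m.start() for m in NAGNAG.finditer(seq.upper())]
```

For example, `find_nagnag("ttcagcagGT")` returns [2], and `find_nagnag("AAGAAGAAG")` returns [0, 3].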

20.
MOTIVATION: Prediction of catalytic residues provides useful information for research on enzyme function. Most existing prediction methods are based on structural information, which limits their use. We propose a sequence-based catalytic residue predictor whose predictions are comparable in quality to those of modern structure-based methods and exceed the quality of state-of-the-art sequence-based methods. RESULTS: Our method (CRpred) uses sequence-based features and the sequence-derived PSI-BLAST profile. We used feature selection to reduce the dimensionality of the input (and to explain the input) to the support vector machine (SVM) classifier that makes the predictions. Tests on eight datasets and side-by-side comparison with six modern structure- and sequence-based predictors show that CRpred provides predictions with quality comparable to current structure-based methods and better than sequence-based methods. The proposed method obtains 15-19% precision and a 48-58% TP (true positive) rate, depending on the dataset used. CRpred also provides confidence values that allow selection of a subset of predictions with higher precision. The improved quality is due to newly designed features and careful parameterization of the SVM. The features incorporate the amino acids with the highest and lowest propensities to constitute catalytic residues, Gly (which provides flexibility for catalytic sites), and sequence motifs characteristic of certain catalytic reactions. Our features indicate that catalytic residues are on average more conserved than the general population of residues and that highly conserved amino acids with high catalytic propensity are likely to form catalytic sites. We also show that local (with respect to the sequence) hydrophobicity contributes to the prediction.
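
The precision and TP-rate figures above, and the use of confidence values to select a higher-precision subset, follow from thresholding per-residue confidences. The sketch below computes both metrics at a given threshold. The confidences and labels in the usage note are made up for illustration, the 0.5 default threshold is an assumption, and the function name is hypothetical.

```python
def precision_and_tp_rate(confidences, is_catalytic, threshold=0.5):
    """Precision and TP rate (recall) when confidence >= threshold means 'catalytic'."""
    tp = fp = fn = 0
    for conf, actual in zip(confidences, is_catalytic):
        predicted = conf >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    return precision, tp_rate
```

Take confidences [0.9, 0.8, 0.6, 0.4, 0.2] with labels [1, 0, 1, 1, 0]. A threshold of 0.5 gives a precision and TP rate of 2/3 each. Raising the threshold to 0.85 gives a precision of 1.0 but a TP rate of only 1/3, which is the precision-for-coverage trade-off that the confidence values allow.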
