Similar literature
 20 similar articles found (search time: 15 ms)
1.
Prediction of protein structural class by discriminant analysis   Total citations: 7, self-citations: 0, citations by others: 7
Protein structural class--alpha, beta, mixed (alpha/beta or alpha + beta), irregular--can be predicted from the amino acid sequence by discriminant analysis. Discrimination is based on distributions, in the classes, of vectors of attributes characterizing the sequences. In this paper, two sets of attributes and two methods of estimating their distributions are compared using more than 100 proteins from the Protein Data Bank. The best results were obtained when canonical variates of the frequencies of occurrence of 20 amino acids and non-parametric estimates of their distributions were used. Three variates are sufficient to allocate proteins to one of four classes with 83% reliability (estimated by cross-validation) and four variates allowed allocation to one of five classes with 78% reliability.
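
As a rough illustration of the "frequencies of occurrence of 20 amino acids" used as attributes above, the following minimal sketch computes a 20-element composition vector from a sequence. The function name `composition` and the toy sequence are assumptions made for illustration; the abstract does not describe the authors' implementation, and the canonical-variate and density-estimation steps are not shown.

```python
from collections import Counter

# The 20 standard residues in one-letter code (fixed order for the vector).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical toy sequence, not taken from the Protein Data Bank.
example = composition("MKTAYIAKQR")
```

A vector like `example` would be one row of the attribute matrix passed on to a discriminant analysis.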

2.
Carboxy-terminal α-amidation is a post-translational modification of proteins found widely in vertebrates and invertebrates. The α-amide group is required for full biological activity: by preventing ionization of the C-terminus, it may render a peptide more hydrophobic and thus better able to bind to other proteins. However, C-terminal amidation is very difficult to detect because experimental methods are often labor-intensive, time-consuming and expensive. In silico methods may therefore complement experiments owing to their high efficiency. In this study, a computational method was developed to predict protein amidation sites by incorporating the maximum relevance minimum redundancy method and the incremental feature selection method based on the nearest neighbor algorithm. From a total of 735 features, 41 optimal features were selected and used to construct the final predictor. The predictor achieved an overall Matthews correlation coefficient of 0.8308. Feature analysis showed that PSSM conservation scores and amino acid factors played the most important roles in α-amidation site prediction. Site-specific feature analyses showed that features derived from the amidation site itself and adjacent sites were most significant. The method presented could be used as an efficient tool to predict amidated peptides computationally, and the selected features could shed some light on the mechanisms of the amidation modification, providing guidelines for experimental validation.
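
For readers unfamiliar with the Matthews correlation coefficient reported above, here is a minimal sketch of the standard formula for a binary confusion matrix. The function name and the example counts are hypothetical; they are not the study's data.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for illustration only.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
```

The coefficient ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it more informative than plain accuracy when positive and negative sites are unbalanced.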

3.
Sexing birds by discriminant analysis: further considerations   Total citations: 1, self-citations: 0, citations by others: 1

4.
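[Objective] The Colorado potato beetle is a devastating pest in potato production. Temperature is an important factor affecting its occurrence; clarifying how temperatures during the beetle's overwintering and occurrence periods affect its occurrence can provide theoretical support for predicting and controlling future outbreaks of this pest. [Methods] Stepwise discriminant analysis was used to classify the occurrence grade and emergence time of the Colorado potato beetle in Qapqal County, Xinjiang, from 1994 to 2021, covering the overwintering and occurrence period (December of the previous year to September of the current year), and occurrence prediction models were established. [Results] In the training group, the discrimination accuracies for occurrence grade and emergence time were 100.00% and 80.00%, respectively; in the prediction group, the overall discrimination accuracies for occurrence grade and emergence time were 69.23% and 76.92%, respectively, and the discrimination results were considered fairly reliable. [Conclusion] Screening of the factors that affect the discrimination of occurrence severity and emergence time showed that both the emergence and the occurrence of the Colorado potato beetle in Qapqal County are influenced by April temperature.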

5.
Using multivariate discriminant function analysis, the sex of 232 Finnish crania of known sex was determined. Eight measurements were used to form two discriminant functions. In 80% of cases the sex determination by means of the discriminant functions was identical with the original information. The applicability of the sex discriminant function of Giles and Elliot ('63) for American white and Negro crania was also tested on the Finnish crania. An accuracy of only 65% was attained.

6.
7.
The approach adopted involved two stages. First the 11205 measurements in the mass spectrometry data were reduced to 14 scores by a principal component analysis of the centered but otherwise untreated and unscaled data matrix. Then a linear classifier was derived by linear discriminant analysis using these 14 scores as inputs. This number of scores was chosen by leave-one-out cross-validation on the training set, where it gave an overall error rate of 14%. Some indication of the information used in the classification may be obtained from an inspection of the coefficients of the linear classifier.
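
The way leave-one-out cross-validation can pick the number of scores may be sketched as follows. This is a simplified illustration under stated assumptions: the scores are taken as precomputed columns, a nearest-centroid rule stands in for the linear discriminant classifier, and the toy data and all names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)

def loo_error_rate(xs, ys, n_scores):
    """Leave-one-out error using only the first n_scores columns."""
    errors = 0
    for i in range(len(xs)):
        train_x = [row[:n_scores] for j, row in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        if nearest_centroid_predict(train_x, train_y, xs[i][:n_scores]) != ys[i]:
            errors += 1
    return errors / len(xs)

# Hypothetical toy scores: column 0 separates the classes, column 1 is noise.
xs = [[0.0, 5.0], [0.2, -4.0], [0.1, 3.0],
      [1.0, -5.0], [1.2, 4.0], [0.9, -3.0]]
ys = ["a", "a", "a", "b", "b", "b"]

# Choose the number of scores with the lowest leave-one-out error.
best_n = min(range(1, 3), key=lambda n: loo_error_rate(xs, ys, n))
```

Each sample is held out once, the classifier is fitted on the rest, and the candidate number of scores with the fewest held-out misclassifications is kept.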

8.
Hu LL  Wan SB  Niu S  Shi XH  Li HP  Cai YD  Chou KC 《Biochimie》2011,93(3):489-496
Palmitoylation is a universal and important lipid modification, involving a series of basic cellular processes, such as membrane trafficking, protein stability and protein aggregation. With the avalanche of new protein sequences generated in the post-genomic era, it is highly desirable to develop computational methods for rapidly and effectively identifying the potential palmitoylation sites of uncharacterized proteins, so as to provide timely information for revealing the mechanism of protein palmitoylation. By using the Incremental Feature Selection approach based on amino acid factors, conservation, disorder features, and specific features of the palmitoylation site, a new predictor named IFS-Palm was developed in this regard. The overall success rate achieved by the jackknife test on a newly constructed benchmark dataset was 90.65%. An in-depth analysis showed that palmitoylation was intimately correlated with the features of the upstream residue directly adjacent to the cysteine site as well as with the conservation of the amino acid cysteine. Meanwhile, the protein disorder region might also play an important role in the post-translational modification. These findings may provide useful insights for revealing the mechanisms of palmitoylation.

9.
10.
Hu LL  Niu S  Huang T  Wang K  Shi XH  Cai YD 《PloS one》2010,5(12):e15917

Background

Hydroxylation is an important post-translational modification and closely related to various diseases. Besides the biotechnology experiments, in silico prediction methods are alternative ways to identify the potential hydroxylation sites.

Methodology/Principal Findings

In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites – hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features computed in a sliding window of 13 amino acids: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and the structural disorder of amino acids. The prediction model was then built using the incremental feature selection method. As a result, the prediction accuracies are 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline dataset and hydroxylysine dataset, respectively. Feature analysis suggested that the physicochemical properties, biochemical properties and evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence than other sites on hydroxylation determination.

Conclusions/Significance

These findings may provide useful insights for exploiting the mechanisms of hydroxylation.
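
The sliding window of 13 amino acids mentioned in the methodology can be illustrated with a short sketch that extracts a window centered on each candidate proline (P) or lysine (K). The padding character "X", the function name, and the toy sequence are assumptions; the abstract does not say how sequence ends were handled.

```python
def site_windows(sequence, residues="PK", flank=6, pad="X"):
    """Return (position, window) pairs centered on each candidate residue.

    With flank=6 each window has 2 * 6 + 1 = 13 residues. Sequence ends
    are padded with `pad`, which is an assumption made for this sketch.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for pos, aa in enumerate(sequence):
        if aa in residues:
            # Position pos in the sequence sits at pos + flank in `padded`,
            # so this slice places the candidate residue at the window center.
            windows.append((pos, padded[pos:pos + 2 * flank + 1]))
    return windows

# Hypothetical toy peptide for illustration.
toy = site_windows("GPAGK")
```

Per-residue features such as AAindex values or PSSM scores would then be computed for each of the 13 window positions.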

11.
12.
Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is useful for solving such problems. It is well known that outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods (including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)) do not work well if the data set has outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characters as discriminant variables, and a satisfactory result is obtained by the method of robust discriminant analysis.

13.
Mouse strains were identified by the aid of discriminant functions obtained from discriminant analysis of values measured at 13 sites of the mandible. They consisted of nine inbred strains of mice, AA, DDD, DDK, DDY, DSD, KK, NC, RR, and SS, and one mutant strain, NC-brp, maintained exactly in the National Institute of Animal Health, Ministry of Agriculture, Forestry and Fisheries. As a result, the probability of erroneous discrimination was 1 head/246 head, or 0.41%, for the males and 2 head/238 head, or 0.84%, for the females. Therefore, almost all the mouse strains were identified correctly. These results seemed to indicate that the strains of mice would be identified more correctly than before, if the present method by the aid of discriminant functions was applied in addition to the methods of identification based on the coat color, biochemical marker-genes, and histocompatibility genes.

14.
15.
In this study, membrane proteins were classified using the information hidden in their sequences. This was achieved by applying wavelet analysis to the sequences and extracting several features, each of them revealing a proportion of the information content present in the sequence. The resultant features were normalized and subsequently fed into a cascaded model developed to reduce the effect of the existing bias in the dataset, arising from the difference in size of the membrane protein classes. The results indicate an improvement in the prediction accuracy of the model in comparison with similar works. The application of the presented model can be extended to other fields of structural biology due to its efficiency, simplicity and flexibility.

16.
Based on the 210 non-homologous proteins (domains) classified manually by Michie et al. (J. Mol. Biol. 262, 168-185, 1996), a new structure classification criterion of globular proteins relying on the content of helix/strand has been proposed, using a quadratic discriminant method. Each protein is classified into one of three classes, i.e. alpha class, beta class and alphabeta class (including alpha/beta and alpha+beta classes). According to the new structure classification criterion, of the 210 proteins in the training set, 207 are correctly classified and thus the accuracy is 207/210=98.57%. Multiple cross-validation tests are performed. The jackknife test shows that of the 210 proteins 207 are correctly classified with an accuracy of 98.57%. To test the method further, of 3577 proteins (domains) extracted from SCOP, 91.39% are correctly reclassified by the new classification criterion. On average, the accuracy of the new criterion is about 8 percentage points higher than that of the criterion proposed by Nakashima et al. (J. Biochem. 99, 153-162, 1986). Our result shows that the classification based solely on structures is basically consistent with that combining both structural and evolutionary information. A more complete automated classification scheme should consider both structures and evolutionary relationships. The methodology presented provides an appropriate mathematical format to reach this goal.

17.
In this paper, we intend to predict protein structural classes (α, β, α+β, or α/β) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as the feature representation of protein sequences. Based on this feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods. In particular, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than those used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.
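
To give a feel for recurrence quantification analysis, the sketch below builds a recurrence matrix for a scalar time series and computes the recurrence rate, one of the simplest RQA measures. It assumes no time-delay embedding, a fixed threshold `eps`, and that the main diagonal is counted; these choices, the function names, and the toy series are illustrative assumptions, and the abstract does not list which 16 parameters the authors used.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when points i and j of the series lie within eps."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]

def recurrence_rate(series, eps):
    """Fraction of recurrent point pairs, main diagonal included."""
    n = len(series)
    matrix = recurrence_matrix(series, eps)
    return sum(map(sum, matrix)) / (n * n)

# Hypothetical toy series that alternates between two values.
rate = recurrence_rate([0.0, 1.0, 0.0, 1.0], eps=0.1)
```

Other RQA measures, such as determinism or laminarity, are derived from the diagonal and vertical line structures of the same matrix.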

18.
Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression of a set Y of binary variables describing the categories of a categorical variable on a set X of predictor variables. It is a compromise between the usual discriminant analysis and a discriminant analysis on the significant principal components of the predictor variables. This technique is especially suited to deal with a much larger number of predictors than observations and with multicollinearity, two of the main problems encountered when analysing microarray expression data. We explore the performance of PLS-DA with published data from breast cancer (Perou et al. 2000). Several such analyses were carried out: (1) before vs after chemotherapy treatment, (2) estrogen receptor positive vs negative tumours, and (3) tumour classification. We found that the performance of PLS-DA was extremely satisfactory in all cases and that the discriminant cDNA clones often had a sound biological interpretation. We conclude that PLS-DA is a powerful yet simple tool for analysing microarray data.
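
The "set Y of binary variables describing the categories" can be made concrete with a small sketch that dummy-codes class labels into an indicator matrix. The function name and the labels are hypothetical; the partial least squares regression of Y on X itself is not shown.

```python
def dummy_code(labels):
    """Return the sorted class list and a 0/1 indicator row per label."""
    classes = sorted(set(labels))
    indicator = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, indicator

# Hypothetical labels, e.g. estrogen receptor status of three tumours.
classes, Y = dummy_code(["positive", "negative", "positive"])
```

Each row of `Y` has a single 1 marking the sample's category, and this matrix plays the role of the response block in the regression.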

19.
Facing the requirements of refined paleodemographical analyses, the access to the early ontogenetic sex ratio of skeletal populations is an important feature. Using raw data provided by Fazekas & Kosa (1978) for a sample of known sex, discriminant functions are derived from hip and thigh bone dimensions that allow an almost unbiased classification of more than 70% of fetal and neonate individuals.

20.