20 similar documents found.
1.
2.
Identifying prokaryotes in silico is commonly based on DNA sequences. In experiments where DNA sequences may not be immediately available, a different approach is needed to detect prokaryotes from RNA or protein sequences. N-formylmethionine (fMet) is known as a typical characteristic of prokaryotes. Here, a web tool has been implemented for predicting prokaryotes by detecting N-formylmethionine residues in protein sequences. The predictor is constructed using a support vector machine, and the online predictor has been implemented in Python. It achieves a total prediction accuracy of 80%, with a specificity of 80% and a sensitivity of 81%.
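The three figures reported above (total accuracy, specificity, sensitivity) all derive from the confusion matrix of a binary classifier. The minimal sketch below shows the arithmetic; the counts in the usage line are hypothetical and chosen only to illustrate the formulas, not taken from the study.

```python
def prediction_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Total accuracy, sensitivity and specificity from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # recall on the positive class (e.g. prokaryote)
        "specificity": tn / (tn + fp),   # recall on the negative class
    }

# Hypothetical counts, for illustration only
m = prediction_metrics(tp=81, tn=80, fp=20, fn=19)
```

With these counts, sensitivity is 81/100 and specificity 80/100, so the total accuracy (161/200) sits between them, as in the reported figures.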
3.
Plewczynski D Tkacz A Wyrwicz LS Godzik A Kloczkowski A Rychlewski L 《Journal of molecular modeling》2006,12(4):453-461
Our algorithm predicts short linear functional motifs in proteins using only sequence information. Statistical models for short linear functional motifs are built from a database of short sequence fragments taken from proteins in the current release of the Swiss-Prot database; each fragment has been experimentally confirmed to carry a single-residue post-translational modification. The classification sensitivities for the various types of short linear motifs are around 70%. The query protein sequence is dissected into short overlapping fragments, each of which is represented as a vector. Each vector is then classified by a machine learning algorithm (Support Vector Machine) as potentially modifiable or not, and the resulting list of plausible post-translational modification sites in the query protein is returned to the user. We also present a study of the human protein kinase C family as a biological application of our method.
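The dissection step described above (overlapping fragments, each turned into a vector) can be sketched as follows. This is an illustration under stated assumptions, not the authors' encoding: the window length of 9, the padding symbol `X`, and the one-hot scheme are all hypothetical choices, and the SVM classification step is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fragments(seq: str, window: int = 9):
    """Yield (centre_index, fragment) for every overlapping window (hypothetical odd length)."""
    half = window // 2
    padded = "X" * half + seq + "X" * half  # pad both ends so every residue can be a centre
    for i in range(len(seq)):
        yield i, padded[i:i + window]

def one_hot(fragment: str) -> list:
    """Flat 0/1 vector with 20 positions per residue; the padding 'X' encodes as all zeros."""
    vec = []
    for aa in fragment:
        vec.extend(1 if aa == a else 0 for a in AMINO_ACIDS)
    return vec

# Each vector would then be passed to a trained classifier (not shown)
vectors = [one_hot(frag) for _, frag in fragments("MKTAYIAKQR")]
```

Padding the ends lets every residue, including the first and last, appear at the centre of a window, so each position in the query receives a prediction.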
4.
High-dimensional data increase the dimensionality of the feature space, which raises computational complexity and lowers generalization. Microarray data classification is one such problem. Microarrays contain genetic and biological data that can be used to diagnose diseases, including various types of cancers and tumors. Because these data have intractably many dimensions, dimension reduction is necessary. The main goal of this paper is to provide a method for dimension reduction and classification of genetic data sets. The proposed approach has several stages. In the first stage, several feature ranking methods are fused to enhance the robustness and stability of the feature selection process. A wrapper method is combined with the proposed hybrid ranking method to capture interactions between genes. The classification is then performed using a support vector machine; before the data are fed to the SVM classifier, the class imbalance in the training data must be addressed. Experimental results of the proposed approach on five microarray databases show that the robustness metric of the feature selection process lies in the interval [0.70, 0.88], and the classification accuracy lies in the range [91%, 96%].
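The abstract does not say how the individual feature rankings are fused. The sketch below uses mean-rank aggregation, one common fusion rule, purely as an assumption; it also assumes every ranking lists all features. The wrapper search, imbalance handling, and SVM stages are not shown, and the gene names are hypothetical.

```python
def fuse_rankings(rankings):
    """Fuse several feature rankings (each a list of names, best first) by mean position.

    Assumes every ranking contains the same set of features.
    """
    positions = {}
    for ranking in rankings:
        for pos, feat in enumerate(ranking):
            positions.setdefault(feat, []).append(pos)
    mean_rank = {f: sum(p) / len(p) for f, p in positions.items()}
    # Lower mean position = ranked highly more consistently; ties broken by name for determinism
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))

# Three hypothetical rankers that disagree slightly
fused = fuse_rankings([["g1", "g2", "g3"], ["g2", "g1", "g3"], ["g1", "g3", "g2"]])
```

Averaging positions damps the idiosyncrasies of any single ranker, which is the stability argument the abstract makes for fusion.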
5.
《IRBM》2022,43(4):300-308
Objectives: This study investigates the performance of the Support Vector Machine (SVM) in classifying EMG signals in non-real-time and real-time settings. It also compares training performance using personalized data and generalized data pooled from all subjects, which gives guidance on which data sets to use when training the real-time classification model. In addition, real-time classification results were obtained over ten days to observe how the subject's own practice affects the classification results. Materials and methods: EMG data were acquired from 8 healthy subjects for 7 hand gestures to create the data set: fist, fingers spread, wave-in, wave-out, pronation, supination, and rest. Subjects repeated each gesture 30 times. A Myo armband with 8 dry surface electrodes was used for data acquisition. Results: 14 features were extracted from the EMG signals and non-real-time classification was performed for each feature; the highest accuracy, 96.38%, was obtained using the root mean square (RMS) and integrated EMG features. Three SVM kernel functions were tested in non-real-time classification, and the highest accuracy was obtained with Cubic SVM (a 3rd-order polynomial kernel). Cubic SVM was therefore used for real-time classification with the features that gave the best non-real-time results. One subject repeated the gestures and real-time classification was performed; the highest accuracy, 99.05%, was obtained with the mean absolute value (MAV) feature. Real-time classification was then performed on eight subjects using the best-performing MAV feature, giving an average accuracy of 95.83% with the personalized data set and 91.79% with the generalized data set. Conclusion: The greatest accuracy is obtained by training the classifier with the subject's own data, suggesting that EMG signals are personal, just like fingerprints and retinas.
In addition, the tests repeated over 10 days showed that activation of the relevant muscle set is repeatable and that training takes place, and indicated how this could be applied to prosthetic-hand users who need to perform specific gestures.
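The three time-domain features singled out above (RMS, integrated EMG, and MAV) have simple standard definitions. The sketch below computes them per channel for one analysis window; the window length, overlap, and the remaining 11 features used in the study are not specified here, and the Cubic SVM stage is omitted.

```python
import math

def mav(x):
    """Mean absolute value of one channel's samples."""
    return sum(abs(v) for v in x) / len(x)

def rms(x):
    """Root mean square of one channel's samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def iemg(x):
    """Integrated EMG: sum of absolute sample values over the window."""
    return sum(abs(v) for v in x)

def channel_features(window):
    """window: list of channels (e.g. 8 for the Myo armband), each a list of samples.

    Returns [MAV, RMS, IEMG] for channel 0, then channel 1, and so on.
    """
    return [f(ch) for ch in window for f in (mav, rms, iemg)]
```

For a real-time classifier, such a feature vector would be recomputed on each incoming window and passed to the trained model.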
6.
Park JN Lee DJ Kwon O Oh DB Bahn YS Kang HA 《The Journal of biological chemistry》2012,287(23):19501-19515
The encapsulated fungal pathogen Cryptococcus neoformans causes cryptococcosis in immunocompromised individuals. Although cell surface mannoproteins have been implicated in C. neoformans pathogenicity, the structure of the N-linked glycans assembled on mannoproteins has not yet been elucidated. By analyzing oligosaccharide profiles combined with exoglycosidase treatment, we report here that C. neoformans has serotype-specific high mannose-type N-glycans with or without a β1,2-xylose residue, which is attached to the trimannosyl core of the N-glycans. Interestingly, the neutral N-glycans of serotypes A and D were shown to contain a xylose residue, whereas those of serotype B appeared to be much shorter and devoid of a xylose residue. Moreover, analysis of the C. neoformans uxs1Δ mutant demonstrated that UDP-xylose is utilized as a donor sugar in N-glycan biosynthesis. We also constructed and analyzed a set of C. neoformans mutant strains lacking genes putatively assigned to the reconstructed N-glycan biosynthesis pathway. The outer chain of the N-glycan was shown to be initiated by CnOch1p with the addition of an α1,6-mannose residue and subsequently extended by CnMnn2p with multiple additions of α1,2-mannose residues. Finally, comparative analysis of acidic N-glycans from wild-type, Cnoch1Δ, Cnmnn2Δ, and Cnuxs1Δ strains strongly indicated the presence of xylose phosphate attached to mannose residues in the core and outer region of the N-glycans. Our data constitute the first report of the unique structure and biosynthesis pathway of N-glycans in C. neoformans.
7.
Jingbo X Silan Z Feng S Huijuan X Xuehai H Xiaohui N Zhi L 《Journal of theoretical biology》2011,284(1):16-23
To evaluate the possibility that an unknown protein is a resistance gene against Xanthomonas oryzae pv. oryzae, a different mode of pseudo amino acid composition (PseAAC) is proposed to formulate the protein samples by integrating the amino acid composition with the chaos game representation (CGR) method. Numerical comparisons of triangle, quadrangle and 12-vertex polygon CGR are carried out to evaluate the efficiency of using these fractal figures in classifiers. The numerical results show that among the three polygon methods, the triangle method gives a good fractal visualization and performs best in classifier construction. Using triangle + 12-vertex polygon CGR as the mathematical feature, the classifier achieves 98.13% in the jackknife test, with a Matthews correlation coefficient (MCC) of 0.8462.
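Chaos game representation places each symbol of a sequence at a vertex of a polygon and moves a point part of the way toward the vertex of each successive symbol. The sketch below assumes a regular n-gon, the classic halfway step (`ratio=0.5`), and a hypothetical three-group amino acid split for the triangle case. The paper's actual residue-to-vertex mapping, step ratio, and the way CGR coordinates enter the PseAAC vector are not given in the abstract.

```python
import math

def polygon_vertices(n):
    """Vertices of a regular n-gon on the unit circle, first vertex at the top."""
    return [(math.cos(math.pi / 2 + 2 * math.pi * k / n),
             math.sin(math.pi / 2 + 2 * math.pi * k / n)) for k in range(n)]

def cgr(sequence, groups, ratio=0.5):
    """Chaos game representation: from the centre, step toward each residue's vertex.

    groups: list of strings; the k-th string lists the residues mapped to vertex k.
    """
    verts = polygon_vertices(len(groups))
    lookup = {aa: k for k, g in enumerate(groups) for aa in g}
    x, y = 0.0, 0.0
    points = []
    for aa in sequence:
        vx, vy = verts[lookup[aa]]
        x, y = x + ratio * (vx - x), y + ratio * (vy - y)
        points.append((x, y))
    return points

# Hypothetical triangle mapping (hydrophobic / polar / charged), for illustration only
TRIANGLE_GROUPS = ["AVLIMFWC", "GSTYNQP", "DEKRH"]
points = cgr("MKTAYIAKQRQISFVKSHFSRQ", TRIANGLE_GROUPS)
```

Because every step is a convex combination of the current point and a polygon vertex, all points stay inside the polygon, which is what produces the fractal patterns compared in the study.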
8.
Jiang Lin Qin Donald Rundquist Anatoly Gitelson Mark Steele Christopher Harkins Rebecca Briles 《Acta Ecologica Sinica》2010,30(6):297-303
Vegetation is a key element of our ecosystem. Leaf area and leaf thickness provide valuable information about the status of our environment, so accurate, efficient and practical methods are needed to estimate these parameters. Hyperspectral measurement is a means of quickly assessing leaf parameters in situ. Over the past decades, much work (e.g., Boyd et al.) has focused on measuring leaf area index, but very little on measuring leaf thickness. In this paper, the reflectance of grape leaves was measured over the spectral range 350–1010 nm. The corresponding thickness of leaves from four grapevine cultivars was also measured during seventeen field campaigns undertaken in the summer of 2007. An artificial-intelligence technique, the support vector machine (SVM) model, was introduced to establish the relationship between leaf thickness and red-edge/near-infrared (NIR) reflectance, with variability examined among individual cultivars as well as across growth stages. The best wavelengths varied with grape cultivar and growth stage. The SVM model allows factors such as cultivar and growth stage to be combined with spectral information to yield a superior result.
9.
A non-linear model for measuring grapevine leaf thickness by means of red-edge/near-infrared spectral reflectance
Qin J L Donald Rundquist Anatoly Gitelson Mark Steele Christopher Harkins Rebecca Briles 《Agricultural Engineering》2010,30(6):297-303
10.
Because of the structural and functional importance of tight turns, several methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. Pi-turns have been studied in the past, but no prediction approach has been developed for them so far, so a method for identifying pi-turns in a protein sequence would be useful. In this paper, the support vector machine (SVM) method is introduced to predict pi-turns from the amino acid sequence. Training and testing are performed on a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes were explored to investigate their effects on prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 in 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.
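The Matthews correlation coefficient quoted above summarizes all four cells of the binary confusion matrix in a single value between -1 and 1, which makes it more informative than accuracy when, as with pi-turns, positives are rare. A minimal sketch of the standard formula follows; returning 0 when the denominator is zero is a common convention, not something the abstract specifies.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In k-fold cross-validation the counts are usually pooled over all folds before computing a single MCC.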
11.
12.
13.
Ion channels are integral membrane proteins that control the movement of ions into or out of cells. They are key components in a wide range of biological processes, and different types of ion channels have different biological functions. With the appearance of vast proteomic data, it is highly desirable for both basic research and drug-target discovery to develop a computational method for reliably predicting ion channels and their types. In this study, we developed a support vector machine-based method to predict ion channels and their types from primary sequence information. A feature selection technique, analysis of variance (ANOVA), was introduced to remove feature redundancy and find an optimized feature set that improves predictive performance. Jackknife cross-validation results show that the proposed method can discriminate ion channels from non-ion channels with an overall accuracy of 86.6%, classify voltage-gated and ligand-gated ion channels with an overall accuracy of 92.6%, and predict four types (potassium, sodium, calcium and anion) of voltage-gated ion channels with an overall accuracy of 87.8%. These results indicate that the proposed method can correctly identify ion channels and provide useful guidance for drug-target discovery. The predictor can be freely downloaded from http://cobi.uestc.edu.cn/people/hlin/tools/IonchanPred/.
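ANOVA-based feature selection typically scores each feature by its one-way F statistic: the ratio of between-class to within-class variance. The sketch below computes that score and ranks features by it. How the study turns these scores into the final optimized feature set (for example, a threshold or an incremental search) is not stated in the abstract, so `rank_features` is a hypothetical helper.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        # Perfect separation gives an unbounded score; a constant feature scores 0
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(X, y):
    """Feature indices sorted by decreasing F score; X is a list of samples (rows)."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

A high F score means the class means differ strongly relative to the spread within each class, so the feature is likely to help the SVM discriminate.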
14.
The active site is the core of an enzyme molecule from the viewpoints of both academic research and industrial application. To reveal the structural and functional mechanism of an enzyme, one needs to know its active site; to conduct structure-based drug design that regulates the function of an enzyme, one also needs to know the active site and its microenvironment. Given the atomic coordinates of an enzyme molecule, how can we predict its active site? To tackle this problem, a distance group approach was proposed, and the support vector machine algorithm was applied to predict the catalytic triad of the serine hydrolase family. The jackknife success rate for the 139 serine hydrolases was 85%, implying that the method is quite promising and may become a useful tool in structural bioinformatics.
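The jackknife test mentioned above is leave-one-out validation: each sample is predicted by a model trained on all the others, and the success rate is the fraction predicted correctly. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM used in the study; the distance-group features themselves are not reproduced, and any classifier with the same `predict(train_X, train_y, x)` shape could be plugged in.

```python
import math

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not the study's SVM): label of the closest class centroid."""
    centroids = {}
    for c in set(train_y):
        members = [row for row, lab in zip(train_X, train_y) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def jackknife_success_rate(X, y, predict=nearest_centroid_predict):
    """Leave-one-out: predict each sample from a model built on all remaining samples."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)
```

The jackknife is deterministic (there is no random fold assignment), which is one reason it is often preferred for reporting success rates on small data sets such as 139 enzymes.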
15.
Ikeda K Tomii K Yokomizo T Mitomo D Maruyama K Suzuki S Higo J 《Protein science : a publication of the Protein Society》2005,14(5):1253-1265
Analyzing the conformational distribution of polypeptide segments in a conformational space is the first step toward understanding the principles of structural diversity in proteins. Here, we present a statistical analysis of protein local structures based on interatomic C(alpha) distances. Using principal component analysis (PCA) on the intrasegment C(alpha)-C(alpha) atomic distances, we visualized the conformational space of protein segments, which we call the protein segment universe, and identified three essential coordinate axes suitable for describing it. These three axes correspond to the radius of gyration, structural symmetry, and the separation of hairpin structures from other structures. The axes were found to be conserved for segments of arbitrary length from 6 to 22 residues. Further application of PCA to the two largest clusters in the universe revealed local structural motifs. Although some of these motifs have already been reported, we identified a possibly novel strand motif. We also showed that the capping box, one of the helix capping motifs, separates into independent subclusters based on C(alpha) geometry. Implications of the strand motif, which may play a role in protein-protein interaction, are discussed. The proposed method is useful not only for mapping the immense universe of protein structures but also for identifying structural motifs.
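The input to the PCA described above is, for each segment, the vector of all intrasegment C(alpha)-C(alpha) distances. The sketch below builds that vector and also computes the radius of gyration, the quantity the first essential axis is reported to track. The PCA step itself is omitted, and the coordinates in the test are a hypothetical straight three-residue segment with 3.8 Å spacing.

```python
import math
from itertools import combinations

def intrasegment_distances(ca_coords):
    """All pairwise C-alpha distances within one segment, in fixed (i < j) order.

    A segment of n residues yields n * (n - 1) / 2 distances, so segments of equal
    length map to vectors of equal dimension, as PCA requires.
    """
    return [math.dist(a, b) for a, b in combinations(ca_coords, 2)]

def radius_of_gyration(ca_coords):
    """Root-mean-square distance of the C-alpha atoms from their centroid."""
    n = len(ca_coords)
    centre = [sum(c[k] for c in ca_coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in ca_coords) / n)
```

Distances, unlike raw coordinates, are invariant to rotation and translation, so no structural superposition is needed before the analysis.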
16.
McDowell (2004) instantiated the Darwinian principles of selection, recombination, and mutation in a computational model of selection by consequences. The model has been tested under a variety of conditions, and its outcome is quantitatively indistinguishable from that displayed by live organisms. The computational model animates a virtual organism with a repertoire of 100 behaviors, selected from the integers 0 to 1023, whose binary representations constitute the behaviors' genotypes. Using strings of binary digits raises the specific problem of Hamming distances: the number of bits that must be changed from 1 to 0 or from 0 to 1 to obtain another string of equal length. McDowell hypothesized that the Hamming distance may be computationally equivalent to the changeover delay used in experiments with live organisms. The results of the present experiments confirmed this hypothesis and revealed a robust rule about the effects of Hamming distances within the model: to obtain good matching, the difference between the Hamming distance that separates the target classes and the largest Hamming distance within a class must be three or more.
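Because the behaviors are integers from 0 to 1023, each genotype is a 10-bit string, and the Hamming distance between two behaviors is the number of set bits in their XOR. The sketch below computes that distance and checks the reported matching rule. How the study measures "the Hamming distance that separates the target classes" is not spelled out in the abstract; taking the smallest cross-class distance is an assumption, and `satisfies_matching_rule` is a hypothetical helper.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two behavior genotypes (10-bit integers 0..1023)."""
    return bin(a ^ b).count("1")

def satisfies_matching_rule(class_a, class_b, margin=3):
    """Check the reported rule under one assumed reading.

    between: smallest Hamming distance from a member of class_a to a member of class_b.
    within:  largest Hamming distance between two members of the same class.
    """
    between = min(hamming(a, b) for a in class_a for b in class_b)
    within = max(hamming(x, z) for cls in (class_a, class_b) for x in cls for z in cls)
    return between - within >= margin
```

XOR sets exactly the bit positions where the two genotypes differ, so counting its 1 bits gives the number of single-bit mutations separating them.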
17.
Avid M. Afzal Fawzia Al‐Shubailly David P. Leader E. James Milner‐White 《Proteins》2014,82(11):3023-3031
The nest is a protein motif of three consecutive amino acid residues with dihedral angles 1,2-αRαL (RL nests) or 1,2-αLαR (LR nests). Many nests form a depression in which an anion or δ-negative acceptor atom is bound by hydrogen bonds from the main-chain NH groups. We have determined the extent and nature of this bridging in a database of protein structures using a computer program written for the purpose. Acceptor anions are bound by a pair of bridging hydrogen bonds in 40% of RL nests and 20% of LR nests. Two thirds of the bridges are between the NH groups at Positions 1 and 3 of the motif (N1N3-bridging), which gives the nest a concavity; one third are of the N2N3 type, which does not. In bridged LR nests N2N3-bridging predominates (14% N1N3: 75% N2N3), whereas in bridged RL nests the reverse is true (69% N1N3: 25% N2N3). Most bridged nests occur within larger motifs: 45% in (hexapeptide) Schellman loops with an additional 4 → 0 hydrogen bond (N1N3); 11% in Schellman loops with an additional 5 → 1 hydrogen bond (N2N3); 12% in a composite structure including a type 1 β-bulge loop and an asx- or ST-motif (N1N3), which is remarkably homologous to the N1N3-bridged Schellman loop; and 3% in a composite structure including a type 2 β-bulge loop and an asx-motif (N2N3). A third hydrogen bond is a previously unrecognized feature of Schellman loops, as those lacking bridged nests have an additional 4 → 0 hydrogen bond.
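The nest definition above depends only on the backbone conformations of residues 1 and 2 (αR then αL for RL nests, αL then αR for LR nests), with residue 3 supplying the third main-chain NH. The sketch below scans a list of (φ, ψ) pairs for such nests. The rectangular Ramachandran windows in `region` are crude hypothetical bounds, not the criteria used by the authors' program, and the hydrogen-bond bridging analysis is not attempted.

```python
def region(phi: float, psi: float) -> str:
    """Crude, hypothetical Ramachandran assignment: 'R' (alpha-R), 'L' (alpha-L) or '-'."""
    if -160 < phi < -20 and -120 < psi < 50:
        return "R"
    if 20 < phi < 160 and -50 < psi < 100:
        return "L"
    return "-"

def find_nests(dihedrals):
    """Return (start_index, 'RL' or 'LR') for residues i, i+1 with alternating alpha conformations.

    dihedrals: list of (phi, psi) in degrees, one per residue. A nest spans residues
    i..i+2, so the last two residues cannot start one.
    """
    labels = [region(phi, psi) for phi, psi in dihedrals]
    nests = []
    for i in range(len(labels) - 2):
        pair = labels[i] + labels[i + 1]
        if pair in ("RL", "LR"):
            nests.append((i, pair))
    return nests
```

Overlapping hits (for example an RLR run reporting both an RL and an LR nest) are kept deliberately, since consecutive nests can share residues.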
18.
19.
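Based on amino acid sequences, β-hairpin motif information was represented using scores, increment of diversity, autocorrelation function values and distance values; this information was fused by quadratic discriminant analysis to predict the β-hairpin motifs in the ArchDB40 and EVA databases. The β-hairpin motifs used in this article have loops of 2–10 amino acids. With a sequence pattern length of 17 amino acids, 5-fold cross-validation gave overall prediction accuracies of 83.1% and 80.7% on the two databases, with correlation coefficients of 0.59 and 0.61, better than previous prediction results.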
20.
A complex-disease-driven integrated gene mining method fusing SDA-SVM
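A novel complex-disease-driven integrated gene mining method fusing SDA-SVM (Stepwise Discriminant Analysis-Support Vector Machine) techniques is proposed. The integrated method combines the advantages of stepwise discriminant analysis and support vector machines and can effectively perform in-depth mining of genes related to complex diseases, so that the mined genes identify disease types and subtypes well. The method was applied to a DNA expression profile data set of diffuse large B-cell lymphoma and compared with other gene mining methods; the results show that the genes it mines have high disease relevance and a strong ability to identify disease types.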