Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Currently, the most accurate fold-recognition method is to perform profile-profile alignments and estimate the statistical significance of those alignments by calculating a Z-score or E-value. Although this scheme is reliable in recognizing relatively close homologs related at the family level, it has difficulty finding remote homologs that are related at the superfamily or fold level. RESULTS: In this paper, we present an alternative method to estimate the significance of the alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n + 1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to a posterior probability that the query sequence is related to the template. The new method performs significantly better than PSI-BLAST and profile-profile alignment with the Z-score scheme. While PSI-BLAST and the Z-score scheme detect 16 and 20% of superfamily-related proteins, respectively, at 90% specificity, the new method detects 46% of these proteins, a more than 2-fold increase in sensitivity. More significantly, at the fold level, the new method detects 14% of remotely related proteins at 90% specificity, a remarkable result considering that the other methods detect almost none at the same level of specificity.
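The abstract above does not say how the SVM output is turned into a posterior probability. One common choice is a Platt-style sigmoid, sketched here as an assumption and not as the authors' procedure. The function name `sigmoid_posterior` and the parameter values `a` and `b` are hypothetical; in practice the two parameters would be fitted on held-out alignments.

```python
import math


def sigmoid_posterior(svm_output: float, a: float = -1.0, b: float = 0.0) -> float:
    """Map a raw SVM decision value to a probability with a sigmoid.

    a and b are hypothetical placeholder values (assumption); a real system
    would fit them on held-out data, which this sketch does not do.
    """
    return 1.0 / (1.0 + math.exp(a * svm_output + b))


# Larger decision values map to probabilities closer to 1 when a is negative.
for score in (-2.0, 0.0, 2.0):
    print(f"SVM output {score:+.1f} -> posterior {sigmoid_posterior(score):.3f}")
```

With these placeholder parameters a decision value of zero maps to a probability of 0.5, so the sigmoid only rescales the SVM margin; the calibration itself comes from fitting `a` and `b`.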
2.
3.
Glotsos D Tohka J Ravazoula P Cavouras D Nikiforidis G 《International journal of neural systems》2005,15(1-2):1-11
A computer-aided diagnosis system was developed to assist in grading the malignancy of brain astrocytomas. Microscopy images from 140 astrocytic biopsies were digitized and cell nuclei were automatically segmented using a Probabilistic Neural Network pixel-based clustering algorithm. A decision tree classification scheme was constructed to discriminate low, intermediate and high-grade tumours by analyzing nuclear features extracted from segmented nuclei with a Support Vector Machine classifier. Nuclei were segmented with an average accuracy of 86.5%. Low, intermediate, and high-grade tumours were identified with 95%, 88.3%, and 91% accuracies respectively. The proposed algorithm could be used as a second opinion tool for histopathologists.
4.
Full‐field optical coherence tomography (FF‐OCT) has been reported to offer label‐free subcellular imaging. To realize quantitative cancer detection, a support vector machine model for classifying normal and cancerous human liver tissue from en face tomographic images is proposed. Twenty samples (10 normal and 10 cancerous) were surgically obtained from humans, yielding 285 en face tomographic images. Six histogram features and one proposed fractal dimension parameter, which reveal the refractive index inhomogeneities of tissue, were extracted to form the training set. A further 16 samples (8 normal and 8 cancerous) were imaged (190 images) and used as the test set with the same features. First, a subcellular‐resolution tomographic image library for four histopathological areas in liver tissue was established. Second, areas under the receiver operating characteristic curve of 0.9378, 0.9858, 0.9391, and 0.9517 for prediction of the cancerous hepatic cell, central vein, fibrosis, and portal vein, respectively, were measured with the test set. The results indicate that the proposed classifier based on FF‐OCT images shows promise for label‐free, quantitative tumor detection, suggesting that the fractal dimension‐based classifier could in the future aid clinicians in detecting tumor boundaries for surgical resection.
5.
Andrew S Peek 《BMC bioinformatics》2007,8(1):182
Background
RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences, a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities.
6.
7.
8.
Zheng M Liu Z Xue C Zhu W Chen K Luo X Jiang H 《Bioinformatics (Oxford, England)》2006,22(17):2099-2106
MOTIVATION: Mutagenicity is among the toxicological end points that pose the highest concern. The accelerated pace of drug discovery has heightened the need for efficient prediction methods. Currently, most available tools fall short of the desired degree of accuracy and can only provide a binary classification. It is therefore important to develop a discriminative and informative model for mutagenicity prediction. RESULTS: To address this problem, we developed a mutagenic probability prediction model based on datasets covering a large chemical space. A novel molecular electrophilicity vector (MEV) is first devised to represent the structure profile of chemical compounds. An extended support vector machine (SVM) method is then used to derive the posterior probabilistic estimation of mutagenicity from the MEVs of the training set. The results show that our model performs better than TOPKAT (http://www.accelrys.com) and other previously published methods. In addition, a confidence level for each prediction can be provided, which may help users make more flexible decisions on chemical ordering or synthesis. AVAILABILITY: The binary program (ZGTOX_1.1) based on our model and samples of input datasets on Windows PC are available at http://dddc.ac.cn/adme upon request from the authors.
9.
10.
Latson L Sebek B Powell KA 《Analytical and quantitative cytology and histology / the International Academy of Cytology [and] American Society of Cytology》2003,25(6):321-331
OBJECTIVE: To develop an automated, reproducible epithelial cell nuclear segmentation method to quantify cytologic features quickly and accurately from breast biopsy. STUDY DESIGN: The method, based on fuzzy c-means clustering of the hue band of color images and the watershed transform, was applied to 39 images from 3 histologic types (typical hyperplasia, atypical hyperplasia, and ductal carcinoma in situ [cribriform and solid]). RESULTS: The performance of the segmentation algorithm was evaluated by visually determining the percentage of badly segmented nuclei (approximately 25% for all types), the percentage of nuclei that remained in clumps (4.5-16.7%) and the percentage of missed nuclei (0.4-1.5%) for each image. CONCLUSION: The segmentation algorithm was sensitive in that a small percentage of nuclei were missed. However, the percentage of badly segmented nuclei was on the order of 25%, and the percentage of nuclei that remained in clumps was on the order of 10% of the total number of nuclei in the duct. Even so, > 600 nuclei per duct, on average, were segmented correctly; that was a sufficient number by which to calculate accurate quantitative, cytologic, morphometric measurements of epithelial cell nuclei in stained tissue sections of breast biopsy.
11.
Transporters play key roles in cellular transport and metabolic processes, and in facilitating drug delivery and excretion. These proteins are classified into families based on the transporter classification (TC) system. Determination of the TC family of transporters facilitates the study of their cellular and pharmacological functions. Methods for predicting TC family without sequence alignments or clustering are particularly useful for studying novel transporters whose function cannot be determined by sequence similarity. This work explores the use of a machine learning method, support vector machines (SVMs), for predicting the family of transporters from their sequence without the use of sequence similarity. A total of 10,636 transporters in 13 TC subclasses, 1914 transporters in eight TC families, and 168,341 nontransporter proteins are used to train and test the SVM prediction system. Testing results by using a separate set of 4351 transporters and 83,151 nontransporter proteins show that the overall accuracy for predicting members of these TC subclasses and families is 83.4% and 88.0%, respectively, and that of nonmembers is 99.3% and 96.6%, respectively. The accuracies for predicting members and nonmembers of individual TC subclasses are in the range of 70.7-96.1% and 97.6-99.9%, respectively, and those of individual TC families are in the range of 60.6-97.1% and 91.5-99.4%, respectively. A further test by using 26,139 transmembrane proteins outside each of the 13 TC subclasses shows that 90.4-99.6% of these are correctly predicted. Our study suggests that the SVM is potentially useful for facilitating functional study of transporters irrespective of sequence similarity.
12.
Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA, forming the core of the nucleosome (the repeating structure of chromatin), and H1/H5 bind to its DNA linker sequence. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make the classification of histones difficult using standard sequence similarity approaches. Therefore, in this paper, we applied a support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. On evaluation through five-fold cross-validation, the SVM-based method was able to distinguish histones from nonhistones (nuclear proteins) with an accuracy of around 98%. Similarly, we obtained an overall accuracy of >95% in discriminating the five classes of histones through the application of a 1-versus-rest (1-v-r) SVM. Finally, we applied this SVM-based method to the detection of histones in whole proteomes and found a sensitivity comparable to that achieved by hidden Markov model (HMM) profiles.
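Amino acid and dipeptide composition, the features named in the abstract above, can be computed directly from a sequence string. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 20-letter alphabet order is an arbitrary choice, and non-standard residue codes are counted in the denominators but get no feature of their own. It is not the authors' code.

```python
from itertools import product

# The 20 standard amino acids in one-letter code; the ordering is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid (20 values)."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Return the fraction of each ordered residue pair (400 values)."""
    seq = seq.upper()
    counts: dict[str, int] = {}
    # Overlapping pairs: a sequence of length n has n - 1 dipeptides.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total for a, b in product(AMINO_ACIDS, repeat=2)]


def composition_features(seq: str) -> list[float]:
    """Concatenate both compositions into one 420-dimensional vector."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = composition_features("AAKK")
print(len(features))
```

Each sequence, whatever its length, becomes a fixed-length vector, which is what allows a standard SVM to be trained on proteins of different sizes.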
13.
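Antifreeze proteins are a class of proteins that enhance an organism's resistance to freezing. They bind specifically to ice crystals, thereby preventing the formation and growth of ice nuclei in body fluids. Bioinformatics research on antifreeze proteins can therefore do much to advance bioengineering and to improve the freezing tolerance of crops. In this work, a dataset of 400 antifreeze protein sequences and 400 non-antifreeze protein sequences was used, with pseudo amino acid composition as the features and a support vector machine classifier to predict antifreeze proteins. The prediction accuracy reached 91.3% on the training set and 78.8% on the test set. These results show that pseudo amino acid composition reflects the characteristics of antifreeze proteins well and can be used to predict them.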
14.
In classifying cells in tissue sections, one must consider the fact that only random projections of cells and of subcellular structures are available in the two-dimensional image. Therefore, measurement values that solely reflect the size of such projections cannot be taken on their own as a basis for cell classification. More complex morphologic features such as shape, texture and distribution pattern of cells and their components should be analyzed. Using cell nuclei as an example, the relationship between such features and geometric measurement values is evaluated. It can be shown that a well-balanced combination of geometric parameters provides a suitable basis for reproducing the visual preclassification of lymphocytes in tissue sections. Moreover, a cluster algorithm that allows different levels of similarity to be defined yields a hierarchical sequence of subclusters, indicating the heterogeneity of the visually determined cell classes. Whether or not these subclusters can be correlated to functionally defined subpopulations of lymphocytes remains a matter for further investigation.
15.
16.
Structural class characterizes the overall folding type of a protein or its domain. Most of the existing methods for determining the structural class of a protein are based on a group of features that carries only one kind of discriminative information for the prediction of protein structure class. However, other types of discriminative information associated with the primary sequence have been completely missed, which has undoubtedly reduced the success rate of prediction. We present a novel method for the prediction of protein structure class that couples an improved genetic algorithm (GA) with the support vector machine (SVM). The improved GA was applied to the selection of an optimized feature subset and the optimization of SVM parameters. Jackknife tests on the working datasets indicated that the prediction accuracies for the different classes were in the range of 97.8–100% with an overall accuracy of 99.5%. The results indicate that the approach has a high potential to become a useful tool in bioinformatics.
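A jackknife test, as used in the abstract above, holds out each sample in turn, trains on the rest, and checks the prediction for the held-out sample. The sketch below illustrates only that evaluation loop. The 1-nearest-neighbour predictor is a stand-in chosen to keep the example dependency-free (an assumption, not the GA-optimized SVM of the paper), and all names are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, query):
    """Stand-in classifier: return the label of the closest training sample."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Leave each sample out once and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(features)):
        # Train on everything except sample i, then test on sample i.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data: two well-separated one-dimensional clusters.
toy_x = [[0.0], [0.1], [1.0], [1.1]]
toy_y = ["alpha", "alpha", "beta", "beta"]
print(jackknife_accuracy(toy_x, toy_y))
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed in place of the stand-in.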
17.
Background
The prediction of protein-protein binding sites can provide structural annotation for the protein interaction data from proteomics studies. This is very important for the biological application of protein interaction data, which is increasing rapidly. Moreover, methods for predicting protein interaction sites can also provide crucial information for improving the speed and accuracy of protein docking methods.
18.
We describe a protocol for fully automated detection and segmentation of asymmetric, presumed excitatory, synapses in serial electron microscopy images of the adult mammalian cerebral cortex, taken with the focused ion beam, scanning electron microscope (FIB/SEM). The procedure is based on interactive machine learning and only requires a few labeled synapses for training. The statistical learning is performed on geometrical features of 3D neighborhoods of each voxel and can fully exploit the high z-resolution of the data. On a quantitative validation dataset of 111 synapses in 409 images of 1948×1342 pixels with manual annotations by three independent experts the error rate of the algorithm was found to be comparable to that of the experts (0.92 recall at 0.89 precision). Our software offers a convenient interface for labeling the training data and the possibility to visualize and proofread the results in 3D. The source code, the test dataset and the ground truth annotation are freely available on the website http://www.ilastik.org/synapse-detection.
19.
Prediction of RNA-binding proteins from primary sequence by a support vector machine approach (cited 3 times: 0 self-citations, 3 by others)
Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.
20.
MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from long (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. Numerical features have been extracted from precursor sequences and their secondary structures to suit some classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) for detecting true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key parameters in the SVM are selected by 5-fold cross validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets from 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNAs, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNAs. In particular, pre-miRNAs with multiple loops that were usually excluded in the previous work are correctly identified in this study with an accuracy of 92.66%.
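The abstract above builds an exponential kernel on a weighted Levenshtein distance. A minimal sketch of that idea follows, assuming the kernel has the form exp(-gamma * d) with uniform per-operation weights. The weights, the `gamma` value, and the function names are hypothetical; the paper's actual weighting scheme and its encoding of secondary structure are not given in the abstract.

```python
import math


def weighted_levenshtein(s: str, t: str, w_ins: float = 1.0,
                         w_del: float = 1.0, w_sub: float = 1.0) -> float:
    """Edit distance between s and t with per-operation weights (assumed uniform)."""
    # prev[j] holds the cost of turning the processed prefix of s into t[:j].
    prev = [j * w_ins for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        curr = [i * w_del]
        for j, ct in enumerate(t, 1):
            sub_cost = 0.0 if cs == ct else w_sub
            curr.append(min(prev[j] + w_del,          # delete cs
                            curr[j - 1] + w_ins,      # insert ct
                            prev[j - 1] + sub_cost))  # match or substitute
        prev = curr
    return prev[-1]


def exponential_kernel(s: str, t: str, gamma: float = 0.1, **weights: float) -> float:
    """Similarity that decays exponentially with the weighted edit distance."""
    return math.exp(-gamma * weighted_levenshtein(s, t, **weights))


print(weighted_levenshtein("kitten", "sitting"))
print(exponential_kernel("ACGU", "ACGU"))
```

Identical strings have distance zero and therefore kernel value 1. A kernel of this form is not guaranteed to be positive semidefinite in general, so a practical implementation may need to check or correct the resulting kernel matrix before passing it to an SVM solver.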