Found 20 similar articles (search time: 15 ms)
1.
Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To account for such effects, a new predictor, called “iTIS-PseTNC,” was developed by incorporating the physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. A rigorous cross-validation test on the benchmark dataset showed that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through detailed mathematical equations, which are presented in this paper just for the integrity of the new prediction method.
2.
Translation initiation sites (TISs) are important signals in cDNA sequences. In many previous attempts to predict TISs in cDNA sequences, three major factors affect the prediction performance: the nature of the cDNA sequence sets, the relevant features selected, and the classification methods used. In this paper, we examine different approaches to select and integrate relevant features for TIS prediction. The top selected significant features include the features from the position weight matrix and the propensity matrix, the number of nucleotide C in the sequence downstream of the ATG, the number of downstream stop codons, the number of upstream ATGs, and the number of some amino acids, such as amino acids A and D. With the numerical data generated from these features, different classification methods, including decision tree, naive Bayes, and support vector machine, were applied to three independent sequence sets. The identified significant features were found to be biologically meaningful, while the experiments showed promising results.
3.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines support vector machine with PSI-BLAST profile, is proposed to predict protein structural class for low-similarity sequences. The evolutionary information encoded in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence similarity lower than 40% and 25%, respectively. The overall accuracies attain 70.7% and 72.9% for 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising for predicting protein structural class, particularly for low-similarity datasets, and may at least play an important complementary role to existing methods.
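The abstract above does not spell out how the profile is turned into a fixed-length vector. A minimal sketch of one common way to do it, assuming the profile is an L x 20 matrix of already-normalized scores: the 20 composition features are column averages, and the 400 dipeptide features are averaged products of scores at adjacent positions. The function name `pssm_composition` and both formulas are assumptions for illustration, not the authors' confirmed procedure.

```python
def pssm_composition(pssm):
    """Convert an L x 20 profile (list of rows) into a 420-dimensional vector.

    Assumed definitions (not taken from the abstract):
      - composition[a]   = average of column a over all positions
      - dipeptide[a, b]  = average over i of pssm[i][a] * pssm[i + 1][b]
    """
    n = len(pssm)
    if n < 2:
        raise ValueError("profile needs at least two positions")
    if any(len(row) != 20 for row in pssm):
        raise ValueError("every profile row must have 20 scores")

    # 20 features: mean score of each amino acid column
    composition = [sum(row[a] for row in pssm) / n for a in range(20)]

    # 400 features: mean product of scores at consecutive positions
    dipeptide = [
        sum(pssm[i][a] * pssm[i + 1][b] for i in range(n - 1)) / (n - 1)
        for a in range(20)
        for b in range(20)
    ]
    return composition + dipeptide
```

Because the output length is always 420 regardless of sequence length, vectors from different proteins can be passed to the same classifier.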
4.
Engineering support vector machine kernels that recognize translation initiation sites Total citations: 22, self-citations: 0, citations by others: 22
Zien A, Rätsch G, Mika S, Schölkopf B, Lengauer T, Müller KR 《Bioinformatics (Oxford, England)》2000,16(9):799-807
MOTIVATION: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS). RESULTS: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.
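To illustrate what "engineering a kernel" with biological prior knowledge can look like, here is a simplified sketch of a locality-aware string kernel: matches between two equal-length sequences are summed inside small windows, each window sum is raised to a power so that nearby matches reinforce each other, and the window scores are added up. The uniform window weights and the parameter names `half_window` and `d1` are assumptions for this sketch; the abstract itself does not give the kernel's formula.

```python
def locality_kernel(x, y, half_window=2, d1=2):
    """Similarity of two equal-length nucleotide strings.

    For every position p, count the matching positions within
    `half_window` of p, raise the count to the power `d1`, and sum
    over all p. Windows are truncated at the sequence ends.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")

    # 1 where the two sequences agree, 0 elsewhere
    match = [1 if a == b else 0 for a, b in zip(x, y)]
    n = len(match)

    total = 0
    for p in range(n):
        lo = max(0, p - half_window)
        hi = min(n, p + half_window + 1)
        total += sum(match[lo:hi]) ** d1
    return total
```

With `d1 = 1` and `half_window = 0` the function reduces to a plain count of matching positions; larger values reward matches that occur close together.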
5.
As one important post-translational modification of prokaryotic proteins, pupylation plays a key role in regulating various biological processes. The accurate identification of pupylation sites is crucial for understanding the underlying mechanisms of pupylation. Although several computational methods have been developed for the identification of pupylation sites, their prediction accuracy is still unsatisfactory. Here, a novel bioinformatics tool named IMP–PUP is proposed to improve the prediction of pupylation sites. IMP–PUP is constructed on the composition of k-spaced amino acid pairs and trained with a modified semi-supervised self-training support vector machine (SVM) algorithm. The proposed algorithm iteratively trains a series of support vector machine classifiers on both annotated and non-annotated pupylated proteins. Computational results show that IMP–PUP achieves areas under the receiver operating characteristic curve of 0.91, 0.73, and 0.75 on our training set, Tung's testing set, and our testing set, respectively, which are better than those of the different error costs SVM algorithm and the original self-training SVM algorithm. Independent tests also show that IMP–PUP significantly outperforms three other existing pupylation site predictors: GPS–PUP, iPUP, and pbPUP. Therefore, IMP–PUP can be a useful tool for accurate prediction of pupylation sites. A MATLAB software package for IMP–PUP is available at https://juzhe1120.github.io/.
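The composition of k-spaced amino acid pairs mentioned above can be sketched as follows: for each gap k, count every ordered residue pair separated by exactly k intervening residues and divide by the number of such pairs in the fragment. The maximum gap `k_max` and the function name `cksaap` are assumptions for this sketch; the tool itself is distributed as a MATLAB package, so this Python version is only illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(seq, k_max=2):
    """Return 400 * (k_max + 1) frequencies of k-spaced residue pairs.

    For gap k, the pair at position i is (seq[i], seq[i + k + 1]).
    Pairs containing non-standard residues are not counted.
    """
    features = []
    for k in range(k_max + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        n_pairs = len(seq) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        # Normalize by the number of pair positions for this gap
        features.extend(
            c / n_pairs if n_pairs > 0 else 0.0 for c in counts.values()
        )
    return features
```

Each gap contributes a block of 400 values in the fixed order AA, AC, ..., YY, so fragments of any length map to vectors of the same size.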
6.
Identifying prokaryotes in silico is commonly based on DNA sequences. In experiments where DNA sequences may not be immediately available, a different approach is needed to detect prokaryotes based on RNA or protein sequences. N-formylmethionine (fMet) is known as a typical characteristic of prokaryotes. A web tool has been implemented here for predicting prokaryotes by detecting N-formylmethionine residues in protein sequences. The predictor is constructed using a support vector machine, and the online version has been implemented in Python. The predictor achieves an overall prediction accuracy of 80%, with a specificity of 80% and a sensitivity of 81%.
7.
Plewczynski D, Tkacz A, Wyrwicz LS, Rychlewski L, Ginalski K 《Journal of molecular modeling》2008,14(1):69-76
We present here the recent update of AutoMotif Server (AMS 2.0) that predicts post-translational modification sites in protein sequences. The support vector machine (SVM) algorithm was trained on data gathered in 2007 from various sets of proteins containing experimentally verified chemical modifications of proteins. Short sequence segments around a modification site were dissected from a parent protein, and represented in the training set as binary or profile vectors. The updated efficiency of the SVM classification for each type of modification and the predictive power of both representations were estimated using leave-one-out tests for a model of general phosphorylation and for modifications catalyzed by several specific protein kinases. The accuracy of the method was improved in comparison to the previous version of the service (Plewczynski et al., “AutoMotif server: prediction of single residue post-translational modifications in proteins”, Bioinformatics 21: 2525–7, 2005). The precision of the updated version reached over 90% for selected types of phosphorylation and was optimized at the cost of a lower recall value of the classification model. The AutoMotif Server version 2007 is freely available at . Additionally, the reference dataset for optimization of prediction of phosphorylation sites, collected from the UniProtKB, was also provided and can be accessed at .
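The binary representation of a sequence segment mentioned in this abstract can be sketched as a one-hot encoding of the residues around a candidate site. The window size (`flank` residues on each side), the zero padding at the protein ends, and the function name `encode_window` are assumptions for illustration; the abstract does not state these details.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(protein, site, flank=4):
    """One-hot encode the residues from site - flank to site + flank.

    Each position contributes 20 binary values. Positions that fall
    outside the protein, or hold a non-standard residue, stay all-zero.
    """
    vector = []
    for pos in range(site - flank, site + flank + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(protein):
            idx = AMINO_ACIDS.find(protein[pos])
            if idx >= 0:
                one_hot[idx] = 1
        vector.extend(one_hot)
    return vector
```

For example, `encode_window("MKSAY", 2, flank=1)` encodes the three residues K, S, and A as a 60-element vector with exactly three ones.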
8.
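Recognition of translation initiation sites (TIS) is one of the key steps in eukaryotic gene prediction and has received considerable attention from researchers in recent years. Based on the statistical properties of the sequences near the TIS, several discriminant methods for identifying TIS have emerged, but their recognition accuracy still needs to be improved. To address the shortcomings of the traditional support vector machine (SVM) approach, an SVM based on a data optimization method is proposed, which optimizes the training dataset by means of other statistical models and thereby improves the recognition accuracy of the classifier. Experimental results show that the SVM classifier based on the data optimization method achieves better results in TIS recognition than other discriminant methods.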
9.
Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes Total citations: 1, self-citations: 1, citations by others: 1
With the rapid increase of protein sequence data, it is indispensable to develop automated and reliable predictive methods for protein function annotation. One approach for facilitating protein function prediction is to classify proteins into functional families from primary sequence. As enzymes are the most important group of all proteins, the accurate prediction of enzyme family classes and subfamily classes is closely related to their biological functions. In this paper, for the prediction of enzyme subfamily classes, Chou's amphiphilic pseudo-amino acid composition [Chou, K.C., 2005. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21, 10-19] has been adopted to represent the protein samples for training the 'one-versus-rest' support vector machine. As a demonstration, the jackknife test was performed on the dataset that contains 2640 oxidoreductase sequences classified into 16 subfamily classes [Chou, K.C., Elrod, D.W., 2003. Prediction of enzyme family classes. J. Proteome Res. 2, 183-190]. The overall accuracy thus obtained was 80.87%. The significant enhancement in the accuracy indicates that the current method might play a complementary role to the existing methods.
10.
A change in the normal concentration of essential trace elements in the human body might lead to major health disturbances. In this study, hair samples were collected from 115 human subjects, including 55 healthy people and 60 patients with prostate cancer. The concentrations of 20 trace elements (TEs) in these samples were measured by inductively coupled plasma-mass spectrometry. A support vector machine was used to investigate the relationship between TEs and prostate cancer. It is found that, among the 20 TEs, 10 (Mg, P, K, Ca, Cr, Mn, Fe, Cu, Zn, and Se) are related to the risk of prostate cancer. These 10 TEs were used to build the prediction model for prostate cancer. The model obtained can satisfactorily distinguish the healthy samples from the cancer samples. Furthermore, leave-one-out cross-validation showed that the prediction ability of this model reaches as high as 95.8%. It is practical to predict the risk of prostate cancer using this model in the clinic.
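The leave-one-out procedure used to validate the model above can be sketched generically: each sample is held out once, a classifier is trained on the rest, and the held-out sample is predicted. In this sketch the classifier is passed in as a callable, and a one-nearest-neighbour rule stands in for the support vector machine purely to keep the example self-contained; both function names are hypothetical.

```python
def leave_one_out_accuracy(samples, labels, fit_predict):
    """Fraction of samples predicted correctly when each is held out once.

    `fit_predict(train_x, train_y, x)` must train on the given data and
    return the predicted label for the single sample `x`.
    """
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier (not an SVM): label of the closest training sample."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]
```

In a setting like the one described, each sample would be a vector of measured element concentrations and `fit_predict` would wrap an SVM from whichever library is in use.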
11.
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts of the same length. Then we use the novel representation of protein sequences and adopt a support vector machine to predict subcellular location. The jackknife test shows that the overall prediction accuracy is significantly improved.
12.
Tikole S, Sankararamakrishnan R 《Biochemical and biophysical research communications》2008,369(4):1166-1168
Translation of eukaryotic mRNAs is often regulated by nucleotides around the start codon. A purine at position −3 and a guanine at position +4 contribute significantly to enhance the translation efficiency. Algorithms to predict the translation initiation site often fail to predict the start site if the sequence context is not present. We have developed a neural network method to predict the initiation site of mRNA sequences that lack the preferred nucleotides at the positions −3 and +4 surrounding the translation initiation site. Neural networks of various architectures comprising different numbers of hidden layers were designed and tested for various sizes of windows of nucleotides surrounding translation initiation sites. We found that the neural network with two hidden layers showed a sensitivity of 83% and specificity of 73%, indicating a vastly improved performance in successfully predicting the translation initiation site of mRNA sequences with weak Kozak context. WeakAUG server is freely available at http://bioinfo.iitk.ac.in/AUGPred/.
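The sequence context described above can be checked directly on a sequence. A minimal sketch, assuming the conventional numbering in which the A of the start codon is +1 and there is no position 0, so that −3 lies three nucleotides upstream of the A and +4 immediately follows the AUG. The function name `kozak_context` and its return shape are hypothetical.

```python
def kozak_context(mrna, aug_index):
    """Report the two context features for the AUG starting at aug_index.

    Returns (purine_at_minus3, guanine_at_plus4). DNA input is accepted
    and converted to RNA letters. A flag is False when the position
    falls outside the sequence.
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    # Position -3 is three nucleotides before the A of AUG
    minus3 = seq[aug_index - 3] if aug_index >= 3 else None
    # Position +4 is the nucleotide right after the G of AUG
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else None

    return minus3 in ("A", "G"), plus4 == "G"
```

Under this sketch, a start codon for which both flags are `False` would be the kind of weak-context site the abstract targets.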
13.
Cancers are regarded as malignant proliferations of tumor cells present in many tissues and organs, which can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, leading to the need to map the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and the mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancers: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on the mutated genes of patients suffering from one of the aforementioned cancer types, the patients were encoded into a large number of numerical values based upon the enrichment theory of gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method to identify functional alteration features that could be used to build the support vector machine (SVM)-based classifier for distinguishing the five types of cancers. Our results showed that the optimal classifier with the selected 344 features had the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method can yield an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments.
This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
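The incremental feature selection step named in this abstract can be sketched as a loop over growing prefixes of a ranked feature list: each prefix is scored, and the best-scoring one is kept. The scoring function is passed in as a callable, so the sketch makes no claim about how the classifier was trained or evaluated; `incremental_feature_selection`, `evaluate`, and `step` are hypothetical names.

```python
def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Pick the best-scoring prefix of a ranked feature list.

    `ranked_features` is ordered from most to least important, for
    example as produced by a feature-ranking method.
    `evaluate(subset)` returns a score where higher is better, for
    example a cross-validated Matthews correlation coefficient.
    """
    best_score = float("-inf")
    best_subset = []
    for n in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:n]
        score = evaluate(subset)
        if score > best_score:
            best_score = score
            best_subset = list(subset)
    return best_subset, best_score
```

Ties are resolved in favour of the smaller subset, since a later prefix replaces the current best only when its score is strictly higher.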
14.
Fast Fourier transform-based support vector machine for subcellular localization prediction using different substitution models Total citations: 2, self-citations: 0, citations by others: 2
There are approximately 10^9 proteins in a cell. A hot topic in bioinformatics is how to identify a protein's subcellular localization when its sequence is known. In this paper, a method using a fast Fourier transform-based support vector machine is developed to predict the subcellular localization of proteins from their physicochemical properties and structural parameters. The prediction accuracies reached 83% in prokaryotic organisms and 84% in eukaryotic organisms with the substitution model of the c-p-v matrix (c, composition; p, polarity; and v, molecular volume). The overall prediction accuracy was also evaluated using the "leave-one-out" jackknife procedure. The influence of the substitution model on prediction accuracy is also discussed. The source code of the new program is available on request from the authors.
15.
Using pseudo-amino acid composition and support vector machine to predict protein structural class Total citations: 4, self-citations: 0, citations by others: 4
As a result of genome and other sequencing projects, the gap between the number of known protein sequences and the number of known protein structural classes is widening rapidly. In order to narrow this gap, it is vitally important to develop a computational prediction method for determining the protein structural class quickly and accurately. In this paper, a novel predictor is developed for predicting protein structural class. It employs a support vector machine learning system and a different pseudo-amino acid composition (PseAA), which was introduced to take sequence-order effects into account, to some extent, when representing protein samples. As a demonstration, the jackknife cross-validation test was performed on a working dataset that contains 204 non-homologous proteins. The predicted results are very encouraging, indicating that the current predictor featuring the PseAA may play an important complementary role to the elegant covariant discriminant predictor and other existing algorithms.
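The abstract above does not give the formula for the pseudo-amino acid composition, so the following is only a simplified sketch of the general idea: the 20 amino acid frequencies are extended with a few sequence-order correlation factors, each the mean squared difference of a normalized property between residues a fixed number of positions apart. Using a single property (the Kyte-Doolittle hydropathy scale as a stand-in), and the parameter names `lam` and `w` with their default values, are assumptions of this sketch.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as a stand-in property
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Normalize the property to zero mean and unit deviation over the 20 residues
_MU, _SD = mean(KD.values()), pstdev(KD.values())
H = {aa: (v - _MU) / _SD for aa, v in KD.items()}


def pseaac(seq, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional vector whose components sum to 1."""
    residues = [aa for aa in seq if aa in H]  # skip non-standard residues
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    freqs = [residues.count(aa) / n for aa in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k apart
    thetas = [
        sum((H[residues[i]] - H[residues[i + k]]) ** 2 for i in range(n - k))
        / (n - k)
        for k in range(1, lam + 1)
    ]

    denom = 1 + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The first 20 components carry composition and the last `lam` components carry sequence-order information; `w` controls how much weight the latter receive.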
16.
I. V. Boni 《Molecular Biology》2006,40(4):587-596
More than 30 years ago Shine and Dalgarno proposed a classic model of prokaryotic translation initiation, based on the central role of the mRNA-16S rRNA interactions. Since then basic research has greatly extended the view of this process, owing to rapid progress in experimental techniques and genome sequencing. This review focuses on bioinformatic data and experimental results obtained in vitro and in vivo, demonstrating the diversity of molecular mechanisms for ribosome recruitment in prokaryotes.
17.
Yu-Hang Zhang, Yu Hu, Yuchao Zhang, Lan-Dian Hu, Xiangyin Kong 《Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease》2018,1864(6):2255-2265
Hematopoiesis is a complicated process involving a series of biological sub-processes that lead to the formation of various blood components. A widely accepted model of early hematopoiesis proceeds from long-term hematopoietic stem cells (LT-HSCs) to multipotent progenitors (MPPs) and then to lineage-committed progenitors. However, the molecular mechanisms of early hematopoiesis have not been fully characterized. In this study, we applied a computational strategy to identify the gene expression signatures distinguishing three types of closely related hematopoietic cells collected in recent studies: (1) hematopoietic stem cell/multipotent progenitor cells; (2) LT-HSCs; and (3) hematopoietic progenitor cells. Each cell in these cell types was represented by its gene expression profile among a total number of 20,475 genes. The expression features were analyzed by a Monte-Carlo Feature Selection (MCFS) method, resulting in a feature list. Then, the incremental feature selection (IFS) and a support vector machine (SVM) optimized with a sequential minimal optimization (SMO) algorithm were employed to obtain the optimal classifier with the highest Matthews correlation coefficient (MCC) value of 0.889, in which 6698 features were used to represent cells. In addition, through an updated program of the MCFS method, seventeen decision rules were obtained, which can classify the three cell types with an overall accuracy of 0.812. Through a literature review, both the rules and the top features used for building the optimal classifier were confirmed to be commonly used or potential biological markers for distinguishing the three cell types of HSPCs. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
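The Matthews correlation coefficient reported above is defined for more than two classes as well. A minimal sketch of the standard multi-class generalization computed from a confusion matrix follows; whether the authors used exactly this formulation is not stated in the abstract, and the function name `multiclass_mcc` is hypothetical.

```python
from math import sqrt


def multiclass_mcc(confusion):
    """Multi-class MCC from a square confusion matrix.

    confusion[t][p] is the number of samples of true class t that were
    predicted as class p. Returns 0.0 when the coefficient is undefined
    (for example when every sample is assigned to one class).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)           # all samples
    correct = sum(confusion[i][i] for i in range(k))     # diagonal
    true_counts = [sum(confusion[i]) for i in range(k)]  # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(
        p * t for p, t in zip(pred_counts, true_counts)
    )
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0
```

A value of 1 means perfect agreement, 0 means no better than chance; for two classes the expression reduces to the familiar binary MCC.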
18.
Local secondary structures in coding sequences have important functions across various translational processes. To date, however, the local structures and their functions in the early stage of translation elongation remain poorly understood. Here, we surveyed the structural stability in the first 180 nucleotides of the coding sequence of 27 species using a computational method. We found that the structural stability in the 30–80 nucleotide interval was significantly higher than that in other regions in eukaryotes and most prokaryotes. No significant correlation between local translation efficiency and structural stability was observed, suggesting that this structural region has undergone selection pressure directly to maintain high stability. Furthermore, the ribosome was blocked by this region, providing an opportunity for co-translational regulation. Remarkably, in eukaryotes, we found that mRNAs with higher structural stability in the 30–80 nucleotide interval tended to encode secreted proteins. Overall, our results revealed a previously unappreciated correlation between structural stability and protein localization.
19.
The support vector machine (SVM) is introduced as a method for the classification of proteins into functionally distinguished classes. Studies are conducted on a number of protein classes including RNA-binding proteins, protein homodimers, proteins responsible for drug absorption, proteins involved in drug distribution and excretion, and drug metabolizing enzymes. Testing accuracy for the classification of these protein classes is found to be in the range of 84-96%. This suggests the usefulness of SVM in the classification of protein functional classes and its potential application in protein function prediction.