Similar literature
1.
In this paper, a bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structures, or proteins. The function compares sequences comprehensively, in terms of both the Hamming distance between the two compared sequences and the corresponding location differences. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities shows that the proposed method, which has a computational complexity of O(N), is effective for these three kinds of biological sequences and applies to all of them.
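The abstract does not define the bilateral similarity function itself, so the following is only a minimal sketch of its two named ingredients: the Hamming distance and the locations at which two sequences differ. The function names are hypothetical, and the sketch assumes equal-length sequences; each function makes a single O(N) pass.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_positions(a, b):
    """Return the 0-based positions at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(hamming_distance("GATTACA", "GACTATA"))
    print(mismatch_positions("GATTACA", "GACTATA"))
```

How the published method combines the distance with the location information is not stated in the abstract and is not reproduced here.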

2.

Background

Large amounts of data are being generated by high-throughput genome sequencing methods, but the rate of experimental functional characterization falls far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOFigure, and Gotcha, are based on BLAST searches. Unfortunately, the sequence coverage of these methods is low because they cannot detect remote homologues. In addition, the lack of annotation specificity underscores the need to improve automated protein function prediction.

Results

We designed a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. First, we predict the most similar target protein for a given query protein and then assign its GO term to the query sequence. When assessed on a test set, our method ranked the actual leaf GO term among the top 5 probable GO terms with an accuracy of 86.93%.

Conclusions

The proposed algorithm is the first application of the neural response algorithm in the biological domain. The use of HMM profiles along with secondary structure information to define the neural response gives our method an edge over other available methods in annotation accuracy. Results of the 5-fold cross-validation and the comparison with the PFP and FFPred servers indicate the strong performance of our method. The program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/.
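The top-5 figure reported in the Results can be read as a top-k accuracy. The sketch below shows how such a measure is commonly computed; it is an assumption about the evaluation, not the authors' code, and the term labels are hypothetical placeholders rather than real GO identifiers.

```python
def top_k_accuracy(ranked_predictions, true_terms, k=5):
    """Fraction of queries whose true term appears among the k best-ranked terms.

    ranked_predictions: one list of candidate terms per query, best first.
    true_terms: the known term for each query, in the same order.
    """
    if len(ranked_predictions) != len(true_terms) or not true_terms:
        raise ValueError("need one non-empty ranking per true term")
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_predictions, true_terms))
    return hits / len(true_terms)


if __name__ == "__main__":
    rankings = [["term_a", "term_b", "term_c"], ["term_d", "term_e", "term_f"]]
    truths = ["term_c", "term_x"]
    print(top_k_accuracy(rankings, truths, k=3))
```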

3.
4.
Following concerns over the potential for insect resistance to insecticidal Bacillus thuringiensis toxins expressed in transgenic plants, there has been recent interest in novel biological insecticides. Over the past year there has been considerable progress in the cloning of several alternative toxin genes from the bacteria Photorhabdus luminescens and Xenorhabdus nematophilus. These genes encode large insecticidal toxin complexes with little homology to other known toxins.

5.
6.
Pore-forming protein toxins: from structure to function   (cited 4 times in total; self-citations: 0; citations by others: 4)
Pore-forming protein toxins (PFTs) are one of Nature's most potent biological weapons. An essential feature of their toxicity is the remarkable property that PFTs can exist either in a stable water-soluble state or as an integral membrane pore. In order to convert from the water-soluble to the membrane state, the toxin must undergo large conformational changes. There are now more than a dozen PFTs for which crystal structures have been determined and the nature of the conformational changes they must undergo is beginning to be understood. Although they differ markedly in their primary, secondary, tertiary and quaternary structures, nearly all can be classified into one of two families based on the types of pores they are thought to form: alpha-PFTs or beta-PFTs. Recent work suggests a number of common features in the mechanism of membrane insertion may exist for each class.

7.
In the study of in silico functional genomics, improving the performance of protein function prediction is the ultimate goal for identifying proteins associated with defined cellular functions. The classical prediction approach is to employ pairwise sequence alignments. However, this method often faces difficulties when no statistically significant homologous sequences are identified. An alternative is to predict protein function from sequence-derived features using machine learning. In this case, the choice of features that can be derived from the sequence is of vital importance to ensure adequate discrimination for predicting function. In this paper we have successfully selected biologically significant features for protein function prediction. This was performed using a new feature selection method (FrankSum) that avoids data distribution assumptions, uses a data-independent measurement (p-value) within the feature, identifies redundancy between features, and uses an appropriate ranking criterion for feature selection. We have shown that classifiers generated from features selected by FrankSum outperform classifiers generated from full feature sets, randomly selected features, and features selected by the Wrapper method. We have also shown that the features are concordant across all species and that the top-ranking features are biologically informative. We conclude that feature selection is vital for successful protein function prediction and that FrankSum is one of the feature selection methods that can be applied successfully to such a domain.

8.
Phosphopeptide-binding domains, including the FHA, SH2, WW, WD40, MH2, and Polo-box domains, as well as the 14-3-3 proteins, exert control functions in important processes such as cell growth, division, differentiation, and apoptosis. Structures and mechanisms of phosphopeptide binding are generally diverse, revealing few general principles. A computational method for analysis of phosphopeptide-binding domains was therefore developed to elucidate the physical and chemical nature of phosphopeptide binding, given this lack of structural similarity. The surfaces of nine phosphopeptide-binding proteins, representing seven distinct classes of phosphopeptide-binding modules, were discretized, and encoded with information about amino acid identity, surface curvature, and electrostatic potential at every point on the surface in order to identify local surface properties enriched in phosphoresidue contact sites. Cross-validation indicated that propensities corresponding to this enrichment calculated from a subset of the training data could be used to predict the phosphoresidue contact site on proteins not used in training with no false negative results, and with few unconfirmed positive predictions. The locations of phosphoresidue contact sites were then predicted on the surfaces of the checkpoint kinase Chk1 and the BRCA1 BRCT repeat domain, and these predictions are consistent with recent experimental evidence.

9.
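Starting from protein sequences, protein function is predicted using Encoding Based on Grouped Weight (EBGW) combined with the nearest neighbor algorithm. Predictions were made for 1826 protein sequences from yeast (Saccharomyces cerevisiae), and the overall prediction accuracy is comparable to that of other protein function prediction methods based on sequence information. The experimental results show that the new method based on the EBGW encoding scheme can be applied effectively to protein function prediction.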

10.

Background

The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.

Results

To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.

Conclusion

We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components that are fundamental to the field of sequence analysis. This not only facilitates the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
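The first example mentioned in the Results compares exact string matching algorithms. SeqAn is a C++ library, and the sketch below does not use or imitate its API. It is a plain Python illustration, under that assumption, of the kind of comparison described: a naive scan and the Horspool algorithm (chosen here only as a familiar example), both returning the same match positions.

```python
def naive_search(text, pattern):
    """Check every alignment of the pattern against the text."""
    m = len(pattern)
    if m == 0:
        return []
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def horspool_search(text, pattern):
    """Horspool matching: skip ahead using the text character under the pattern's last position."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For each character except the pattern's last one, the distance from its
    # rightmost occurrence to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters absent from the table allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    print(naive_search("GATTACACA", "ACA"))
    print(horspool_search("GATTACACA", "ACA"))
```

Both functions report overlapping occurrences, which makes the naive version a convenient correctness check for the faster one.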

11.
FUS1/TUSC2 is a mitochondrial tumor suppressor that regulates cellular oxidative stress by maintaining balanced ROS production and mitochondrial homeostasis. Fus1 expression is inhibited by ROS, suggesting that individuals with a high level of ROS may have lower Fus1 in normal tissues and, thus, may be more prone to oxidative stress-induced side effects of cancer treatment, including radiotherapy. As the role of Fus1 in the modulation of cellular radiosensitivity is unknown, we set out to determine the molecular mechanisms of Fus1 involvement in the IR response in normal tissues. Mouse whole-body irradiation methodology was employed to determine the role of Fus1 in the radiation response and to explore the underlying molecular mechanisms. Fus1−/− mice were more susceptible to radiation than Fus1+/+ mice, exhibiting increased mortality and accelerated apoptosis of the GI crypt epithelial cells. Following untimely reentrance into the cell cycle, the Fus1−/− GI crypt cells died at an accelerated rate via mitotic catastrophe, which resulted in diminished and/or delayed crypt regeneration after irradiation. At the molecular level, dysregulated dynamics of activation of the main IR response proteins (p53, NFκB, and GSK-3β), as well as of key signaling pathways involved in the oxidative stress response (SOD2, PRDX1, and cytochrome c), apoptosis (BAX and PARP1), the cell cycle (Cyclins B1 and D1), and DNA repair (γH2AX), were found in Fus1−/− cells after irradiation. Increased radiosensitivity of other tissues, such as immune cells and hair follicles, was also detected in Fus1−/− mice. Our findings demonstrate a previously unknown radioprotective function of the mitochondrial tumor suppressor Fus1 in normal tissues and suggest new individualized therapeutic approaches based on Fus1 expression.

12.
《Biochimie》2013,95(9):1741-1744
In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specially designed to improve the prediction accuracies for the α/β and α + β classes based on the distributions of α-helices and β-strands and the characteristics of parallel and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is employed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 40% and 25%, respectively. Our method outperforms the recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on the prediction accuracies, especially for the α/β and α + β classes.
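The jackknife test holds out each sample once and predicts it from all the others. The sketch below illustrates that procedure only. The 1-nearest-neighbor classifier is a stand-in chosen for brevity and is an assumption, since the abstract does not name the classifier, and the toy feature vectors are hypothetical rather than the 12 features of the paper.

```python
import math


def nearest_neighbor_label(query, training):
    """Return the label of the training vector closest to the query (Euclidean distance)."""
    closest = min(training, key=lambda item: math.dist(query, item[0]))
    return closest[1]


def jackknife_accuracy(samples):
    """samples: list of (feature_vector, label); each sample is held out exactly once."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # everything except the held-out sample
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    toy = [((0.0, 0.0), "alpha"), ((0.1, 0.0), "alpha"),
           ((1.0, 1.0), "beta"), ((1.1, 1.0), "beta")]
    print(jackknife_accuracy(toy))
```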

13.
Monoclonal antibody samples derived from transgenic plants (plantibodies) may often contain significant amounts of aglycosylated variants. Because glycosylated and non-/de-glycosylated proteins exhibit different functional and pharmacokinetic properties, accurate measurement of non- and de-glycosylated glycoprotein abundances is important. Glycosylation of plant-derived glycoproteins presents specific challenges. Here we describe a novel method to accurately measure relative and absolute amounts of non-glycosylated, de-glycosylated, and total glycosylated protein using an HPLC-UV-MS methodology. Additionally, these results were compared with glycopeptide profiling by MALDI MS. Our studies demonstrated that the quantitative aspect of the HPLC-UV method was superior to MALDI MS profiling, which significantly overestimated the relative amounts of aglycosylated species in the isolated glycopeptide fractions.

14.
The NetAcet method has been developed to predict N-terminal acetylation sites, but more information from the data set could be utilized to improve the performance of the model. By employing a new way to extract patterns from sequences and using a sample balancing mechanism, we obtained a correlation coefficient of 0.85 and a sensitivity of 93% on an independent mammalian data set. A web server utilizing this method has been constructed and is available at http://166.111.24.5/acetylation.html.
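The two reported measures can be computed from a confusion matrix. The sketch below assumes that the "correlation coefficient" is the Matthews correlation coefficient, which is common for binary site prediction but is not stated in the abstract. The counts in the usage example are invented for illustration and are not the paper's data.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true sites that were predicted as sites."""
    return tp / (tp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: 90 true positives, 80 true negatives,
    # 20 false positives, 10 false negatives.
    print(sensitivity(90, 10))
    print(round(matthews_corrcoef(90, 80, 20, 10), 4))
```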

15.

Background  

Machine-learning tools have gained considerable attention during the last few years for analyzing biological networks for protein function prediction. Kernel methods are suitable for learning from graph-based data such as biological networks, as they only require the abstraction of the similarities between objects into the kernel matrix. One key issue in kernel methods is the selection of a good kernel function. Diffusion kernels, the discretization of the familiar Gaussian kernel of Euclidean space, are commonly used for graph-based data.
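For readers unfamiliar with diffusion kernels, the sketch below uses the standard definition K = exp(-βL), where L = D - A is the graph Laplacian of an undirected graph with adjacency matrix A. It evaluates the matrix exponential with a truncated Taylor series in plain Python, which is adequate only for small graphs and moderate β. The function names, the value of β, and the number of series terms are assumptions made for illustration.

```python
def graph_laplacian(adj):
    """L = D - A for a symmetric adjacency matrix with a zero diagonal."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta, terms=30):
    """Approximate exp(-beta * L) by summing the first Taylor terms."""
    n = len(adj)
    h = [[-beta * v for v in row] for row in graph_laplacian(adj)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term holds h**k / k! after this update
        term = [[v / k for v in row] for row in mat_mul(term, h)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


if __name__ == "__main__":
    # Two proteins joined by one interaction edge.
    for row in diffusion_kernel([[0, 1], [1, 0]], beta=0.5):
        print(row)
```

In practice a numerical library's matrix exponential or an eigendecomposition would replace the Taylor series; the series is used here only to keep the example dependency-free.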

16.
17.
Guo J, Chen H, Sun Z, Lin Y 《Proteins》2004,54(4):738-743
A high-performance method was developed for protein secondary structure prediction based on the dual-layer support vector machine (SVM) and position-specific scoring matrices (PSSMs). SVM is a new machine learning technology that has been successfully applied to problems in the field of bioinformatics. The SVM's performance is usually better than that of traditional machine learning approaches. The performance was further improved by combining PSSM profiles with the SVM analysis. The PSSMs were generated from PSI-BLAST profiles, which contain important evolutionary information. The final prediction results were generated from the output of the second SVM layer. On the CB513 data set, the three-state overall per-residue accuracy, Q3, reached 75.2%, while the segment overlap (SOV) accuracy increased to 80.0%. On the CB396 data set, the Q3 of our method reached 74.0% and the SOV reached 78.1%. A web server utilizing the method has been constructed and is available at http://www.bioinfo.tsinghua.edu.cn/pmsvm.
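Q3 is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed one. A minimal sketch of that calculation follows; the function name and the example strings are hypothetical, and the SOV measure, which scores overlapping segments rather than single residues, is not covered.

```python
def q3_accuracy(predicted, observed):
    """Per-residue three-state accuracy, as a percentage."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # One residue out of eight is mispredicted (E instead of C).
    print(q3_accuracy("HHHEEECC", "HHHEECCC"))
```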

18.
MOTIVATION: The detection of function-related local 3D-motifs in protein structures can provide insights into protein function in the absence of sequence or fold similarity. Protein loops are known to play important roles in protein function and several loop classifications have been described, but the automated identification of putative functional 3D-motifs in such classifications has not yet been addressed. This identification can be used for sequence annotation. RESULTS: We evaluated three different scoring methods for their ability to identify known motifs from the PROSITE database in ArchDB. More than 500 new putative function-related motifs not reported in PROSITE were identified. Sequence patterns derived from these motifs were especially useful for predicting precise annotations. The number of reliable sequence annotations could be increased by up to 100% with respect to standard BLAST. CONTACT: boliva@imim.es SUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.

19.
20.
MOTIVATION: With the increasing availability of diverse biological information, protein function prediction approaches have converged towards the integration of heterogeneous data. Many adapt existing techniques, such as machine-learning and probabilistic methods, that have proven successful on specific data types. However, the impact of these approaches is hindered by a couple of factors. First, there is little comparison between existing approaches. This is in part due to a divergence in the focus adopted by different works, which makes comparison difficult or even fuzzy. Second, there seems to be an over-emphasis on the use of computationally demanding machine-learning methods, which runs counter to the surge in biological data. Analogous to the success of BLAST for sequence homology search, we believe that the ability to tap the escalating quantity, quality and diversity of biological data is crucial to the success of automated function prediction as a useful instrument for the advancement of proteomic research. We address these problems by: (1) providing a useful comparison between some prominent methods; (2) proposing Integrated Weighted Averaging (IWA), a scalable, efficient and flexible function prediction framework that integrates diverse information using simple weighting strategies and a local prediction method. The simplicity of the approach makes it possible to make predictions based on on-the-fly information fusion. RESULTS: In addition to its greater efficiency, IWA performs exceptionally well against existing approaches. In the presence of cross-genome information, which is overwhelming for existing approaches, IWA makes even better predictions. We also demonstrate the significance of appropriate weighting strategies in data integration.
