Similar Articles
 20 similar articles found; search took 31 ms
1.
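Application of Data Mining in Bioinformatics   Total citations: 6; self-citations: 0; citations by others: 6
Using applied mathematics and computer techniques to exploit the large volumes of accumulated biological data that urgently need processing, and to uncover the patterns hidden in that data, is a current focus of bioinformatics research both in China and internationally. Data mining techniques play a major role in this research.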

2.
3.
4.
MS/MS is a widely used method for proteome‐wide analysis of protein expression and PTMs. The thousands of MS/MS spectra produced from a single experiment pose a major challenge for downstream analysis. Standard programs, such as MASCOT, provide peptide assignments for many of the spectra, including identification of PTM sites, but these results are plagued by false‐positive identifications. In phosphoproteomic experiments, only a single peptide assignment is typically available to support identification of each phosphorylation site, and hence minimizing false positives is critical. Thus, tedious manual validation is often required to increase confidence in the spectral assignments. We have developed phoMSVal, an open‐source platform for managing MS/MS data and automatically validating identified phosphopeptides. We tested five classification algorithms with 17 extracted features to separate correct peptide assignments from incorrect ones using over 2600 manually curated spectra. The naïve Bayes algorithm was among the best classifiers with an AUC value of 97% and PPV of 97% for phosphotyrosine data. This classifier required only three features to achieve a 76% decrease in false positives as compared with MASCOT while retaining 97% of true positives. This algorithm was able to classify an independent phosphoserine/threonine data set with AUC value of 93% and PPV of 91%, demonstrating the applicability of this method for all types of phospho‐MS/MS data. PhoMSVal is available at http://csbi.ltdk.helsinki.fi/phomsval .
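To make the classification step concrete, here is a minimal Gaussian naïve Bayes sketch that labels a spectrum's peptide assignment as correct (1) or incorrect (0) from a few numeric features, plus the PPV metric quoted in the abstract. It is not phoMSVal's code: the Gaussian likelihood, the variance floor, and the idea of three generic numeric features per spectrum are all assumptions made for illustration.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian (mean, variance) pairs."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        params = []
        for j in range(len(X[0])):
            column = [row[j] for row in rows]
            # A small variance floor avoids division by zero on constant features.
            params.append((mean(column), pvariance(column) + 1e-9))
        model[label] = (prior, params)
    return model


def log_posterior(model, x, label):
    """Unnormalized log P(label | x) under the feature-independence assumption."""
    prior, params = model[label]
    lp = math.log(prior)
    for value, (mu, var) in zip(x, params):
        lp += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
    return lp


def predict(model, x):
    """Return the label with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model, x, label))


def ppv(y_true, y_pred):
    """Positive predictive value = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else float("nan")
```

Each training row would be a tuple of feature values for one manually curated spectrum; the features themselves (for example a search-engine score or the fraction of fragment intensity explained) are hypothetical here, since the abstract does not name the three that phoMSVal used.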

5.
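Bioinformatics is an interdisciplinary field of great importance to modern biological research, and database technology is one of its foundations. This paper describes the current state of teaching database technology courses in bioinformatics programs at agricultural and forestry universities and analyzes the problems that exist in database teaching for bioinformatics majors. Drawing on teaching practice, it proposes specific, targeted measures for reforming the course.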

6.
7.
Rapid development of genomic and proteomic methodologies has provided a wealth of data for deciphering the biomolecular circuitry of a living cell. The main areas of computational research of proteomes outlined in this review are: understanding the system, its features and parameters to help plan the experiments; data integration, to help produce more reliable data sets; visualization and other forms of data representation to simplify interpretation; modeling of the functional regulation; and systems biology. With false-positive rates reaching 50% even in the more reliable data sets, handling the experimental error remains one of the most challenging tasks. Integrative approaches, incorporating results of various genome- and proteome-wide experiments, allow for minimizing the error and bring with them significant predictive power.
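One simple form of the integrative approach described above is to keep only interactions reported by several independent experiments. The sketch below assumes each data set is a list of undirected protein pairs and uses a support threshold (`min_support`, a hypothetical parameter); real integration schemes usually weight sources by reliability rather than counting votes.

```python
from collections import Counter
from typing import Iterable


def integrate(datasets: Iterable[Iterable[tuple[str, str]]],
              min_support: int = 2) -> set[frozenset]:
    """Return interactions observed in at least `min_support` data sets."""
    support = Counter()
    for ds in datasets:
        # frozenset makes (A, B) and (B, A) the same undirected interaction;
        # the set comprehension counts each interaction once per data set.
        support.update({frozenset(pair) for pair in ds})
    return {pair for pair, n in support.items() if n >= min_support}
```

Raising `min_support` trades coverage for reliability: an interaction seen in only one noisy screen is dropped, while one confirmed by independent methods is kept.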

8.
9.
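Because of its invasive and metastatic behavior, ovarian cancer has the highest mortality of all gynecologic malignancies. With the rapid development of high-throughput sequencing and bioinformatics methods in recent years, a growing number of biomacromolecules that regulate invasion and metastasis in ovarian cancer have been discovered. This article reviews the background and current state of research on the mechanisms of ovarian cancer invasion and metastasis, summarizes the regulatory factors involved, and compiles and introduces bioinformatics analysis tools and databases for proteomics and single-cell omics, with the aim of providing a theoretical basis and research leads for further study of invasion and metastasis in ovarian tumor cells.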

10.
11.
Shi L, Zhang Q, Rui W, Lu M, Jing X, Shang T, Tang J. Regulatory Peptides, 2004, 120(1-3): 1-3
Bioactive peptide database (BioPD) is a web-based knowledge base that contains more than 1100 protein sequences from human, mouse and rat that are known or putative bioactive peptides. In addition to peptide sequences and their annotations, the database contains annotated gene sequences as well as protein interaction and disease data related to the peptides. Each entry cites as many references as possible to support the information it presents. BioPD consists of six parts: PROTEIN, GENE, DISEASE, LINKS, INTERACTION, and REFERENCE. The database can be searched by keyword, gene and protein name, receptor name, and other fields, and each entry links to PDB, InterPro, Pfam, OMIM, and other resources. BioPD thus serves as an information center for bioactive peptides and a gateway for exploring them. The database can be accessed at http://biopd.bjmu.edu.cn.
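The six-part layout and keyword search described for BioPD can be pictured with a small in-memory model. This is a hypothetical sketch, not BioPD's actual schema or query interface: the field contents, the `search` helper, and its case-insensitive substring matching are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PeptideEntry:
    """Hypothetical record with one field per BioPD part."""
    protein: dict                                  # e.g. name, sequence, receptor
    gene: dict                                     # e.g. gene symbol, annotation
    disease: list = field(default_factory=list)
    links: dict = field(default_factory=dict)      # external DB name -> accession
    interaction: list = field(default_factory=list)
    reference: list = field(default_factory=list)


def search(entries, keyword):
    """Case-insensitive substring search over protein, receptor, gene and disease fields."""
    kw = keyword.lower()
    hits = []
    for entry in entries:
        fields = [
            entry.protein.get("name", ""),
            entry.protein.get("receptor", ""),
            entry.gene.get("symbol", ""),
            *entry.disease,
        ]
        if any(kw in str(value).lower() for value in fields):
            hits.append(entry)
    return hits
```

For example, an entry for insulin (gene INS, receptor INSR) would be returned by a search for either "INSR" or a disease term stored in its DISEASE part.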

12.
We have developed a proteome database (DB), BiomarkerDigger (http://biomarkerdigger.org), that automates data analysis, searching, and metadata gathering. The metadata-gathering function searches proteome DBs for protein–protein interaction, Gene Ontology, protein domain, Online Mendelian Inheritance in Man, and tissue expression profile information and integrates it into protein data sets that are accessed through a search function in BiomarkerDigger. This DB also facilitates cross-proteome comparisons by classifying proteins based on their annotation. BiomarkerDigger highlights relationships between a given protein in a proteomic data set and any known biomarkers or biomarker candidates. The newly developed BiomarkerDigger system is useful for multi-level synthesis, comparison, and analysis of data sets obtained from currently available web sources. We demonstrate the application of this resource to the identification of a serological biomarker for hepatocellular carcinoma by comparing plasma and tissue proteomic data sets from healthy volunteers and cancer patients.
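The cross-proteome comparison in the hepatocellular carcinoma example can be reduced, in its simplest form, to set operations over protein identifiers. The sketch below is an assumption about the logic, not BiomarkerDigger's implementation: it keeps proteins seen in patient plasma but not healthy plasma, requires them to appear in tumor tissue too, and splits the result into known and novel candidates. Apart from AFP (a long-established serum marker for HCC), the protein names in the usage note are hypothetical.

```python
def candidate_biomarkers(patient_plasma, healthy_plasma, patient_tissue, known_markers):
    """Compare plasma and tissue data sets and flag known vs. novel candidates."""
    # Proteins detected in patient plasma but absent from healthy plasma...
    plasma_specific = set(patient_plasma) - set(healthy_plasma)
    # ...that are also detected in the patients' tumor tissue.
    candidates = plasma_specific & set(patient_tissue)
    known = set(known_markers)
    return {
        "known": sorted(candidates & known),
        "novel": sorted(candidates - known),
    }
```

With patient plasma `["AFP", "ALB", "PROT_A", "PROT_B"]`, healthy plasma `["ALB", "PROT_B"]`, tissue `["AFP", "PROT_A", "PROT_C"]` and known markers `["AFP"]`, the function reports AFP as known and PROT_A as a novel candidate.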

13.

Background  

High-throughput sequencing makes it possible to rapidly obtain thousands of 16S rDNA sequences from environmental samples. Bioinformatic tools for the analyses of large 16S rDNA sequence databases are needed to comprehensively describe and compare these datasets.
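As an illustration of "describing and comparing" 16S data sets, two widely used summaries are the Shannon diversity of one sample and the Jaccard similarity between the sets of taxa in two samples. This sketch assumes reads have already been clustered into OTUs (the OTU labels are hypothetical); it is not the tool this article introduces.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTU relative abundances."""
    total = sum(otu_counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in otu_counts.values() if n > 0)


def jaccard(otus_a, otus_b):
    """Fraction of OTUs shared between two samples (presence/absence only)."""
    a, b = set(otus_a), set(otus_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

A sample split evenly between two OTUs has H = ln 2, while a sample dominated by a single OTU has H = 0.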

14.
Channelrhodopsins are microbial-type rhodopsins that function as light-gated cation channels. Understanding how the detailed architecture of the protein governs its dynamics and specificity for ions is important, because it has the potential to assist in designing site-directed channelrhodopsin mutants for specific neurobiology applications. Here we use bioinformatics methods to derive accurate alignments of channelrhodopsin sequences, assess the sequence conservation patterns and find conserved motifs in channelrhodopsins, and use homology modeling to construct three-dimensional structural models of channelrhodopsins. The analyses reveal that helices C and D of channelrhodopsins contain Cys, Ser, and Thr groups that can engage in both intra- and inter-helical hydrogen bonds. We propose that these polar groups participate in inter-helical hydrogen-bonding clusters important for the protein conformational dynamics and for the local water interactions. This article is part of a Special Issue entitled: Retinal Proteins — You can teach an old dog new tricks.
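Assessing conservation patterns in an alignment often starts with a per-column score. The sketch below uses Shannon entropy (one common choice, assumed here; the article may use another measure) and flags well-conserved columns whose dominant residue is Cys, Ser or Thr, the residue types highlighted above. The entropy cutoff and the toy sequences in the usage note are hypothetical, not real channelrhodopsin data.

```python
import math
from collections import Counter

POLAR_HBOND = set("CST")  # Cys, Ser, Thr: residue types highlighted in the abstract


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_polar_positions(alignment, max_entropy=0.5):
    """Return (index, residue, entropy) for conserved C/S/T columns."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for i in range(length):
        column = [s[i] for s in alignment]
        h = column_entropy(column)
        top = Counter(c for c in column if c != "-").most_common(1)
        # NaN entropy (all-gap column) fails the <= test and is skipped.
        if top and h <= max_entropy and top[0][0] in POLAR_HBOND:
            hits.append((i, top[0][0], h))
    return hits
```

On the toy alignment `["ACSTG", "ACSAG", "GCSTG"]`, only the fully conserved Cys (index 1) and Ser (index 2) columns are reported; the conserved Gly column is skipped because Gly is not in the polar set.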

15.
16.
To date, only a handful of phosphoproteins with important biological functions have been identified and characterized in oral fluids, and these include some of the abundant protein constituents of saliva. Whole saliva (WS) samples were trypsin digested, followed by dithiothreitol (DTT) derivatization of the phosphoserine/phosphothreonine-containing peptides. The DTT-phosphopeptides were enriched by covalent disulfide-thiol interchange chromatography and analyzed by nanoflow liquid chromatography and electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS). The specificity of DTT derivatization was evaluated separately by varying the base catalyst (NaOH or Ba(OH)2), blocking cysteine residues with iodoacetamide, and performing enzymatic O-deglycosylation before the DTT reaction. Further analysis of WS samples subjected to each of these conditions provided supporting evidence for the phosphoprotein identifications. The combined chemical strategies and mass spectrometric analyses identified 65 phosphoproteins in WS; of these, 28 were identified with high confidence on the basis of two or more peptide identification criteria and 37 on the basis of a single phosphopeptide identification. Most of the identified proteins (∼80%) were previously unknown phosphoprotein components. This study represents the first large-scale documentation of the phosphoproteins of WS. The origins and identities of the WS phosphoproteome have significant implications both for basic science and for the development of novel biomarkers and diagnostic tools for systemic and oral disease states.
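The two confidence tiers reported above (28 proteins with multiple lines of peptide evidence plus 37 with a single phosphopeptide, 65 in total) can be illustrated by grouping peptide identifications by protein. This is a simplification: it counts distinct supporting peptides, which is an assumption and may not match the study's exact identification criteria; protein and peptide labels in the usage note are placeholders.

```python
from collections import defaultdict


def partition_by_support(identifications, min_peptides=2):
    """Split proteins into multi-peptide and single-peptide identifications.

    `identifications` is an iterable of (protein, peptide) pairs; repeated
    observations of the same peptide count only once per protein.
    """
    peptides = defaultdict(set)
    for protein, peptide in identifications:
        peptides[protein].add(peptide)
    multi = sorted(p for p, peps in peptides.items() if len(peps) >= min_peptides)
    single = sorted(p for p, peps in peptides.items() if len(peps) < min_peptides)
    return multi, single
```

Given `[("P1", "pepA"), ("P1", "pepB"), ("P2", "pepC"), ("P2", "pepC"), ("P3", "pepD")]`, P1 falls in the multi-peptide tier and P2 and P3 in the single-peptide tier, because P2's two observations are of the same peptide.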

17.
Babnigg G, Giometti CS. Proteomics, 2003, 3(5): 584-600
The analysis of proteomes, i.e., the proteins expressed by biological organisms under a given set of conditions at a given time, requires separating complex protein mixtures into discrete protein components, measuring their relative abundances, and identifying the individual protein components. Many types of data are generated during the course of proteome analysis, including graphic images of the protein profiles, flat files containing numeric data, spreadsheets for assimilating numeric data, and relational database tables for integrating data from multiple experiments. As part of a project to describe the proteomes of microbes of interest to the U.S. Department of Energy, a web-based interface has been developed for displaying protein profiles generated by two-dimensional gel electrophoresis. The interface can retrieve protein identifications on the fly, query the quantitative data in the context of available genome sequence information, and relate the proteome data to existing metabolic pathway databases. This speeds up the analysis of protein expression profiles, making it possible to efficiently determine the gene locations of proteins whose abundance changes in response to different growth conditions and to locate those proteins within specific metabolic pathways. The proteome of the archaeon Methanococcus jannaschii, a microbe for which the complete genome sequence is available, is used to demonstrate the capabilities of this evolving web interface (http://proteomeweb.anl.gov).
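Finding proteins "modulated in abundance" between growth conditions and placing them in pathways can be sketched as a log2 fold-change filter followed by a pathway lookup. Everything here is an assumption for illustration: the dictionaries of spot abundances, the `min_abs_log2fc` cutoff of 1.0 (a two-fold change), and the gene-to-pathway mapping are hypothetical and do not reflect the interface's actual data model.

```python
import math


def modulated_proteins(abundance_a, abundance_b, pathways, min_abs_log2fc=1.0):
    """List (protein, log2 fold change B/A, pathways) for strongly changed proteins."""
    results = []
    for protein in sorted(set(abundance_a) & set(abundance_b)):
        a, b = abundance_a[protein], abundance_b[protein]
        if a <= 0 or b <= 0:
            continue  # skip spots not quantified in both conditions
        lfc = math.log2(b / a)
        if abs(lfc) >= min_abs_log2fc:
            results.append((protein, round(lfc, 3), pathways.get(protein, [])))
    return results
```

For hypothetical genes g1 (100 to 400), g2 (50 to 45) and g3 (80 to 20), only g1 (log2 fold change 2.0) and g3 (-2.0) pass the two-fold cutoff, and each is reported with whatever pathways the lookup table assigns to it.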

18.
19.
20.
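From an information-processing perspective, many problems in bioinformatics closely resemble problems in natural language processing (NLP), so some classic NLP methods can be applied to bioinformatics. This paper introduces problems common to NLP and bioinformatics, such as alignment, classification, and prediction, together with methods for solving them. An analysis of these analogous problems in the two fields shows that effective NLP techniques can also be used to solve bioinformatics problems, and that some natural language understanding techniques not yet applied in bioinformatics also have potential value there. Finally, a solution to a classification problem is presented to demonstrate how algorithms can be applied to biological data in experiments.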
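Alignment is a clear case of the shared problems mentioned above: the same dynamic program scores both a pair of DNA strings and a pair of tokenized sentences. The sketch below is the standard Needleman–Wunsch global alignment score with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions, and this is not the classification method the paper itself demonstrates.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score for any two sequences of symbols.

    Works on strings (DNA, protein) or on lists of tokens (words).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]
```

For example, "ACGT" against "AGT" scores 2 (three matches and one gap), and the word lists `"the cat sat".split()` and `"the dog sat".split()` score 1 (two matches and one mismatch).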


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417