Similar articles
 20 similar articles found (search time: 296 ms)
1.
XML, bioinformatics and data integration   Total citations: 15, self-citations: 0, citations by others: 15
Motivation: The eXtensible Markup Language (XML) is an emerging standard for structuring documents, notably for the World Wide Web. In this paper, the authors present XML and examine its use as a data language for bioinformatics. In particular, XML is compared to other languages, and some of the potential uses of XML in bioinformatics applications are presented. The authors propose to adopt XML for data interchange between databases and other sources of data. Finally the discussion is illustrated by a test case of a pedigree data model in XML. Contact: Emmanuel.Barillot@infobiogen.fr  相似文献   
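
As a minimal sketch of what a pedigree expressed in XML might look like to a consuming program, the example below parses a small document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`pedigree`, `individual`, `father`, `mother`) are illustrative assumptions; they are not the schema used in the paper's test case.

```python
import xml.etree.ElementTree as ET

# Hypothetical pedigree document. Element and attribute names are
# assumptions made for illustration, not the paper's data model.
PEDIGREE_XML = """
<pedigree id="fam1">
  <individual id="I1" sex="M"/>
  <individual id="I2" sex="F"/>
  <individual id="I3" sex="F" father="I1" mother="I2"/>
</pedigree>
"""


def parents_by_child(xml_text):
    """Map each individual's id to a (father, mother) tuple.

    Founders, who have no recorded parents, map to (None, None).
    """
    root = ET.fromstring(xml_text.strip())
    return {
        ind.get("id"): (ind.get("father"), ind.get("mother"))
        for ind in root.iter("individual")
    }


relations = parents_by_child(PEDIGREE_XML)
print(relations["I3"])
```

Because parent links are stored as id references, a receiving database can rebuild the family graph without knowing how the sending database stores it, which is the interchange role the abstract proposes for XML.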

2.
In the analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic studies with gene expression measurements as a representative example, but note that the analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain the variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent 'non-standard' applications of PCA, including accommodating interactions among genes, pathways and network modules, and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including supervised PCA, sparse PCA and functional PCA. Supervised PCA and sparse PCA have been shown to have better empirical performance than standard PCA. Functional PCA can analyze time-course gene expression data. Last, we raise awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and, more importantly, its most recent developments, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.
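
To make "linear combinations of gene expressions" concrete, the sketch below computes the first principal component of a tiny expression matrix by power iteration on the sample covariance matrix. It is an illustrative pure-Python version under simplifying assumptions (made-up data, a fixed starting vector, no handling of a start vector orthogonal to the leading component). For real expression data with thousands of genes, an SVD routine from an existing software package would be the usual choice.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (loadings, variance) of the first PC via power iteration.

    rows: list of samples, each a list of expression values (one per gene).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p), using the n - 1 denominator.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance at all in the data
            break
        v = [x / norm for x in w]
    # Variance explained by this component: v^T * cov * v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance


# Made-up data: 4 samples x 3 genes. Gene 2 is twice gene 1; gene 3 is constant.
samples = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, 8, 5]]
loadings, variance = first_principal_component(samples)
print(loadings, variance)
```

In this toy data the constant gene receives a loading of zero and the two correlated genes share a single component, which is the dimension reduction the abstract describes.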

3.
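With the continuing depletion of non-renewable resources such as petroleum and the worsening of environmental pollution, using industrial biocatalysis to retrofit or replace traditional chemical processes has become a research focus for the sustainable development of the chemical industry in the new century. Industrial biocatalysis studies biocatalysts and the processes they catalyze. Recently, the use of bioinformatics techniques in industrial biocatalysis research has attracted growing attention. As industrial biocatalysis develops, bioinformatics will directly guide and accelerate the discovery and functional engineering of novel, highly efficient biocatalysts.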

4.
5.
MOTIVATION: Grid computing is used to solve large-scale bioinformatics problems involving gigabyte-scale databases by distributing the computation across multiple platforms. Until now, developing bioinformatics grid applications has been extremely tedious: developers must design and implement the component algorithms and parallelization techniques for different classes of problems, and access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. RESULTS: GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two bioinformatics benchmark applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

6.
About five years ago, ontology was almost unknown in bioinformatics, even more so in molecular biology. Nowadays, many bioinformatics articles mention it in connection with text mining, data integration or as a metaphysical cure for problems in standardisation of nomenclature and other applications. This article attempts to give an account of what concept ontologies in the domain of biology and bioinformatics are; what they are not; how they can be constructed; how they can be used; and some fallacies and pitfalls creators and users should be aware of.  相似文献   

7.
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research—translating basic science results into new interventions—and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

What to Learn in This Chapter

Text mining is an established field, but its application to translational bioinformatics is quite new and it presents myriad research opportunities. It is made difficult by the fact that natural (human) language, unlike computer language, is characterized at all levels by rampant ambiguity and variability. Important sub-tasks include gene name recognition, or finding mentions of gene names in text; gene normalization, or mapping mentions of genes in text to standard database identifiers; phenotype recognition, or finding mentions of phenotypes in text; and phenotype normalization, or mapping mentions of phenotypes to concepts in ontologies. Text mining for translational bioinformatics can necessitate dealing with two widely varying genres of text—published journal articles, and prose fields in electronic medical records. Research into the latter has been impeded for years by lack of public availability of data sets, but this has very recently changed and the field is poised for rapid advances. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

8.
The cross-disciplinary nature of bioinformatics entails co-evolution with other biomedical disciplines, whereby some bioinformatics applications become popular in certain disciplines and, in turn, these disciplines influence the focus of future bioinformatics development efforts. We observe here that the growth of computational approaches within various biomedical disciplines is not merely a reflection of a general extended usage of computers and the Internet, but due to the production of useful bioinformatics databases and methods for the rest of the biomedical scientific community. We have used the abstracts stored both in the MEDLINE database of biomedical literature and in NIH-funded project grants, to quantify two effects. First, we examine the biomedical literature as a whole and find that the use of computational methods has become increasingly prevalent across biomedical disciplines over the past three decades, while use of databases and the Internet have been rapidly increasing over the past decade. Second, we study the recent trends in the use of bioinformatics topics. We observe that molecular sequence databases are a widely adopted contribution in biomedicine from the field of bioinformatics, and that microarray analysis is one of the major new topics engaged by the bioinformatics community. Via this analysis, we were able to identify areas of rapid growth in the use of informatics to aid in curriculum planning, development of computational infrastructure and strategies for workforce education and funding.  相似文献   

9.
There are many ftp or http servers storing data required for biological research. While some download applications are available, there is no user-friendly download application with a graphical interface specifically designed and adapted to meet the requirements of bioinformatics. BioDownloader is a program for downloading and updating files from ftp and http servers. It is optimized to work robustly with large numbers of files. It allows the selective retrieval of only the required files (batch downloads, multiple file masks, ls-lR file parsing, recursive search, recent updates, etc.). BioDownloader has a built-in repository containing the settings for common bioinformatics file-synchronization needs, including the Protein Data Bank (PDB) and National Center for Biotechnology Information (NCBI) databases. It can post-process downloaded files, including archive extraction and file conversions. AVAILABILITY: The program can be installed from http://dunbrack.fccc.edu/BioDownloader. The software is freely available for both non-commercial and commercial users under the BSD license.  相似文献   

10.
Storing biological sequence databases in relational form   Total citations: 2, self-citations: 0, citations by others: 2
SUMMARY: We have created a set of applications using Perl and Java in combination with XML technology to install biological sequence databases into an Oracle RDBMS. An easy-to-use Java interface has been created for database queries, and other tools have been developed to integrate with our in-house bioinformatics applications. AVAILABILITY: The database schema, DTD file, and source code are available from the authors via email. CONTACT: guochun_xie@merck.com
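
The general idea of holding sequence records in relational form can be sketched as follows. This is not the authors' Perl/Java/XML pipeline or their Oracle schema, which are available only from the authors. It is a small stand-in that uses Python's built-in `sqlite3` module, a hypothetical one-table schema, and FASTA-formatted input as assumptions.

```python
import sqlite3


def parse_fasta(text):
    """Yield (accession, description, residues) for each FASTA record."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield (*header, "".join(chunks))
            accession, _, description = line[1:].partition(" ")
            header, chunks = (accession, description), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield (*header, "".join(chunks))


# Hypothetical schema: one row per sequence record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence ("
    "accession TEXT PRIMARY KEY, description TEXT, residues TEXT NOT NULL)"
)

# Made-up records for illustration.
FASTA = ">seq1 hypothetical protein\nMKT\nAYI\n>seq2\nACGT\n"
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", parse_fasta(FASTA))
conn.commit()

lengths = conn.execute(
    "SELECT accession, LENGTH(residues) FROM sequence ORDER BY accession"
).fetchall()
print(lengths)
```

Once the records are rows, questions such as "which sequences are longer than N residues" become ordinary SQL queries rather than custom flat-file parsing.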

11.
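Problems and countermeasures in teaching bioinformatics to non-major graduate students   Total citations: 5, self-citations: 0, citations by others: 5
Bioinformatics is a core interdisciplinary subject that greatly advances biomedical research and development. This paper analyzes the teaching of bioinformatics courses to graduate students from other majors, describes the intended audience of such courses, summarizes the mismatch between supply and demand and the problems in course teaching, and proposes corresponding countermeasures, providing a reference for further improving the quality of bioinformatics teaching for non-major graduate students.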

12.
A review of feature selection techniques in bioinformatics   Total citations: 13, self-citations: 0, citations by others: 13
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques and discussing their use, variety and potential in a number of both common and upcoming bioinformatics applications.

13.
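Current status and applications of agricultural bioinformatics databases   Total citations: 4, self-citations: 0, citations by others: 4
Agricultural bioinformatics databases are a basic tool for agricultural researchers; the wealth of information they contain facilitates the improvement and conservation of agricultural organisms. This paper describes the development status and applications of agricultural bioinformatics databases and discusses their current problems.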

14.
The ability of bioinformatics to characterize genomic and proteomic sequences from Bacillus sp. bacteria for the prediction of genes and proteins has been evaluated. Genomics coupled with proteomics, which relies on integrating the significant advances recently achieved in two-dimensional (2-D) electrophoretic separation of proteins and mass spectrometry (MS), now provides important high-throughput techniques for qualifying and analyzing gene and protein expression, discovering new gene or protein products, and understanding gene and protein functions, including post-genomic studies. In addition, bioinformatics data on Bacillus sp. are incorporated into many databases, which facilitates rapid searching of both genomic and proteomic information on Bacillus sp. It is also possible to highlight sites for post-translational modifications based on the specific protein sequence motifs that play important roles in the structure, activity and compartmentalization of proteins. Moreover, proteins secreted by Bacillus sp. are of interest and are widely used in many applications, especially biomedical applications, where their potential therapeutic value is a major advantage.

15.
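Advances in high-throughput sequencing have promoted the wide application of omics technologies in environmental microbiology, and metagenomics is currently the most critical and mature omics approach. Bioinformatics plays a vital role in microbial metagenomics. It runs through every stage of metagenomics, from data collection and storage to data processing and analysis; it is both the greatest bottleneck to the wider adoption of metagenomics and the key to its current development. This paper introduces and summarizes the bioinformatics analysis platforms commonly used in high-throughput metagenomic sequencing and their important analysis tools. Over the next few years, falling sequencing costs and increasing sequencing depth will further increase the difficulty of data storage, data processing and data mining in metagenomics, so research on and development of the corresponding bioinformatics techniques and methods is imperative. In the near term, we should first strengthen the construction of basic analysis and storage platforms so that ordinary environmental microbiology researchers can use them easily, while concentrating on breakthroughs in the bottleneck steps and key tasks of current bioinformatics analysis and developing step by step.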

16.
Novel omics technologies in nutrition research   Total citations: 1, self-citations: 0, citations by others: 1

17.
There are many bioinformatics tools that deal with input/output (I/O) issues by using filing systems from the most common operating systems, such as Linux or MS Windows. However, as data volumes increase, there is a need for more efficient disk access, ad hoc memory management and specific page-replacement policies. We propose a device driver that can be used by multiple applications. It keeps the application code unchanged, providing a non-intrusive and flexible strategy for I/O calls that may be adopted in a straightforward manner. With our approach, database developers can define their own I/O management strategies. We used our device driver to manage Basic Local Alignment Search Tool (BLAST) I/O calls. Based on preliminary experimental results with National Center for Biotechnology Information (NCBI) BLAST, this approach can provide database management systems-like data management features, which may be used for BLAST and many other computational biology applications.  相似文献   

18.
The explosive growth of the bioinformatics field has led to a large amount of data and software applications publicly available as web resources. However, the lack of persistence of web references is a barrier to comprehensive shared access. We conducted a study of the current availability and other features of primary bioinformatics web resources (such as software tools and databases). The majority (95%) of the examined bioinformatics web resources were found running on UNIX/Linux operating systems, and the most widely used web server was found to be Apache (or Apache-related products). Of the overall 1,130 Uniform Resource Locators (URLs) examined, 91% were highly available (more than 90% of the time), while only 4% showed low accessibility (less than 50% of the time) during the survey. Furthermore, the most common URL failure modes are presented and analyzed.
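
The two thresholds quoted in the abstract (available more than 90% of the time, and less than 50% of the time) can be turned into a small classifier, sketched below. The "intermediate" label, the function names, and the example URLs and probe counts are assumptions added for illustration; the abstract does not say how the survey probed each URL.

```python
from collections import Counter


def availability_class(successes, attempts):
    """Classify a URL by the fraction of probes that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    ratio = successes / attempts
    if ratio > 0.9:
        return "high"
    if ratio < 0.5:
        return "low"
    return "intermediate"  # label assumed here; not named in the abstract


def summarize(probe_counts):
    """Count URLs per class. probe_counts maps url -> (successes, attempts)."""
    return Counter(availability_class(s, a) for s, a in probe_counts.values())


# Hypothetical probe results for three made-up URLs.
probes = {
    "http://example.org/tool-a": (95, 100),
    "http://example.org/tool-b": (90, 100),
    "http://example.org/tool-c": (40, 100),
}
summary = summarize(probes)
print(summary)
```

Both comparisons are strict, so a URL reachable exactly 90% of the time falls outside the "high" class, matching the "more than 90%" wording.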

19.
The large amount of biological data now available makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known bioinformatics applications with respect to scaling, different virtual environments and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and with each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.
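
The quantities this abstract reports, virtualization overhead relative to bare metal and scaling with added compute resources, are commonly derived from wall-clock timings as sketched below. The function names and all timing values are hypothetical; they are not part of BOOTABLE and are not measurements from the study.

```python
def overhead_percent(bare_metal_seconds, virtual_seconds):
    """Extra runtime in a virtual environment, as a percentage of bare metal."""
    return 100.0 * (virtual_seconds - bare_metal_seconds) / bare_metal_seconds


def speedup(single_thread_seconds, multi_thread_seconds):
    """How many times faster the multithreaded run is."""
    return single_thread_seconds / multi_thread_seconds


def parallel_efficiency(single_thread_seconds, multi_thread_seconds, threads):
    """Speedup per thread; 1.0 corresponds to perfectly linear scaling."""
    return speedup(single_thread_seconds, multi_thread_seconds) / threads


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
overhead = overhead_percent(bare_metal_seconds=100.0, virtual_seconds=115.0)
efficiency = parallel_efficiency(
    single_thread_seconds=120.0, multi_thread_seconds=40.0, threads=4
)
print(overhead, efficiency)
```

An efficiency well below 1.0 at higher thread counts is the kind of signal that a tool does not benefit from more resources, whereas values near 1.0 correspond to the almost linear scaling the abstract mentions.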

20.
Accurate protein structure prediction remains an active objective of research in bioinformatics. Membrane proteins comprise approximately 20% of most genomes. They are, however, poorly tractable targets of experimental structure determination. Their analysis using bioinformatics thus makes an important contribution to their on-going study. Using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we have addressed the alignment-free discrimination of membrane from non-membrane proteins. The method successfully identifies prokaryotic and eukaryotic alpha-helical membrane proteins at 94.4% accuracy, beta-barrel proteins at 72.4% accuracy, and distinguishes assorted non-membranous proteins with 85.9% accuracy. The method here is an important potential advance in the computational analysis of membrane protein structure. It represents a useful tool for the characterisation of membrane proteins with a wide variety of potential applications.  相似文献   
