Similar literature
Found 20 similar articles (search time: 46 ms)
1.
Given the growing amount of biological data, data mining methods have become an integral part of bioinformatics research. Unfortunately, standard data mining tools are often not sufficiently equipped for handling raw data such as amino acid sequences. One popular and freely available framework that contains many well-known data mining algorithms is the Waikato Environment for Knowledge Analysis (Weka). In the BioWeka project, we introduce various input formats for bioinformatics data and bioinformatics methods like alignments to Weka. This allows users to easily combine them with Weka's classification, clustering, validation and visualization facilities on a single platform and therefore reduces the overhead of converting data between different data formats as well as the need to write custom evaluation procedures that can deal with many different programs. We encourage users to participate in this project by adding their own components and data formats to BioWeka. Availability: The software, documentation and tutorial are available at http://www.bioweka.org.

2.

Background  

The BioMoby project aims to identify and deploy standards and conventions that aid in the discovery, execution, and pipelining of distributed bioinformatics Web Services. As of August 2006, approximately 680 bioinformatics resources were available through the BioMoby interoperability platform. There are a variety of clients that can interact with BioMoby-style services. Here we describe a Web-based browser-style client, Gbrowse Moby, that allows users to discover and "surf" from one bioinformatics service to the next using a semantically-aided browsing interface.

3.
MOTIVATION: The development of an integrated genetic and physical map for the maize genome involves the generation of an enormous amount of data. Managing this data requires a system to aid in genotype scoring for different types of markers coming from both local and remote users. In addition, researchers need an efficient way to interact with genetic mapping software and with data files from automated DNA sequencing. They also need ways to manage primer data for mapping and sequencing and provide views of the integrated physical and genetic map and views of genetic map comparisons. RESULTS: The MMP-LIMS system has been used successfully in a high-throughput mapping environment. The genotypes from 957 SSR, 1023 RFLP, 189 SNP, and 177 InDel markers have been entered and verified via MMP-LIMS. The system is flexible, and can be easily modified to manage data for other species. The software is freely available. AVAILABILITY: To receive a copy of the iMap or cMap software, please fill out the form on our website. The other MMP-LIMS software is freely available at http://www.maizemap.org/bioinformatics.htm.

4.

Background

Analyzing high-throughput genomics data is a complex and compute-intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best-practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet the demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.

Results

We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.

Conclusions

This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

5.
Integr8 (http://www.ebi.ac.uk/integr8/) provides an integration layer for the exploitation of genomic and proteomic data by drawing on databases maintained at major bioinformatics centres in Europe. The main aims are to store the relationships of biological entities to each other and to entries in other databases, to provide a framework that allows for new kinds of data to be integrated, and to offer an entity-centric view of complete genomes and proteomes. Basic tools for data integration comprise the Proteome Analysis database, the International Protein Index (IPI), the Universal Protein sequence archive (UniParc) and the Genome Reviews. Entry points for the Integr8 portal depend on the user's entity of interest: from browsing the taxonomy or with a predetermined species of interest, the species page can be used, and a simple search page leads to different applications when looking for certain protein sequences or genes. Customisable statistics data are available from the BioMart application, and pre-prepared data can be downloaded from the FTP site.

6.
The large amount of biological data now available makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known bioinformatics applications with respect to scaling, different virtual environments and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and to each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users to find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.
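
The two quantities this abstract reports, virtualization overhead relative to bare metal and how close scaling comes to linear, can be illustrated with a minimal sketch. The function names and all timings below are hypothetical and chosen only for illustration; they are not taken from the BOOTABLE suite or from the study's measurements.

```python
def overhead_percent(virtual_seconds, bare_metal_seconds):
    """Relative slowdown of a virtualized run versus bare metal, in percent."""
    return (virtual_seconds - bare_metal_seconds) / bare_metal_seconds * 100.0


def parallel_efficiency(single_thread_seconds, multi_thread_seconds, threads):
    """Speedup divided by thread count; 1.0 corresponds to linear scaling."""
    speedup = single_thread_seconds / multi_thread_seconds
    return speedup / threads


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds (invented, not measured data).
    bare_metal = 1000.0
    virtual_run_a = 1070.0
    virtual_run_b = 1250.0

    print("overhead A (%):", overhead_percent(virtual_run_a, bare_metal))
    print("overhead B (%):", overhead_percent(virtual_run_b, bare_metal))

    # A tool that scales almost linearly versus one that barely benefits.
    print("efficiency, 8 threads:", parallel_efficiency(1000.0, 130.0, 8))
    print("efficiency, 8 threads:", parallel_efficiency(1000.0, 800.0, 8))
```

An efficiency well below 1.0 at a given thread count suggests that the additional threads are largely wasted for that tool, which is the kind of guidance on resource sizing the abstract describes.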

7.
The development of glycan-related databases and bioinformatics applications lags considerably behind the wealth of available data and software tools in genomics and proteomics. Because the encoding of glycan structures is more complex, most bioinformatics approaches cannot be applied to glycan structures. No standard procedures exist by which glycan structures found in various species, organs, tissues or cells can be routinely deposited. In this article the concepts of the GLYCOSCIENCES.de portal are described. It is demonstrated how an efficient structure-based cross-linking of various glycan-related data originating from different resources can be accomplished using a single user interface. The structure-oriented retrieval options (exact structure, substructure, motif, composition and sugar components) are discussed. The types of available data (references, composition, spatial structures, nuclear magnetic resonance (NMR) shifts (experimental and estimated), theoretically calculated fragments and Protein Data Bank (PDB) entries) are exemplified for Man(3). The free availability and unrestricted use of glycan-related data is an absolute prerequisite for efficiently sharing distributed resources. Additionally, there is an urgent need to agree on a generally accepted exchange format as well as on a common software interface. An open access repository for glyco-related experimental data will ensure that the loss of primary data is considerably reduced.

8.
MOTIVATION: As more whole genome sequences become available, comparing multiple genomes at the sequence level can provide insight into new biological discovery. However, there are significant challenges for genome comparison. These include the requirement for computational resources owing to the large volume of genome data. More importantly, since the choice of genomes to be compared is entirely subjective, there are too many choices for genome comparison. For these reasons, there is a pressing need for bioinformatics systems for comparing multiple genomes in which users can freely choose the genomes to be compared. RESULTS: PLATCOM (Platform for Computational Comparative Genomics) is an integrated system for the comparative analysis of multiple genomes. The system is built on several public databases, and a suite of genome analysis applications is provided as exemplary genome data mining tools over these internal databases. Researchers are able to visually investigate genomic sequence similarities, conserved gene neighborhoods, conserved metabolic pathways and putative gene fusion events among a set of selected multiple genomes. AVAILABILITY: http://platcom.informatics.indiana.edu/platcom

9.
High-throughput MS-based proteomic experiments generate large volumes of complex data and necessitate bioinformatics tools to facilitate their handling. Needs include means to archive data, to disseminate them to the scientific communities, and to organize and annotate them to facilitate their interpretation. We present here an evolution of PROTICdb, a database software that now handles MS data, including quantification. PROTICdb has been developed to be as independent as possible from the tools used to produce the data. Biological samples and proteomics data are described using ontology terms. A Taverna workflow is embedded, permitting automatic retrieval of information related to identified proteins by querying external databases. Stored data can be displayed graphically and a "Query Builder" allows users to make sophisticated queries without knowledge of the underlying database structure. All resources can be accessed programmatically using a Java client API or RESTful web services, allowing the integration of PROTICdb in any portal. An example application is presented, in which proteins extracted from a maize leaf sample by four different methods were compared using a label-free shotgun method. Data are available at http://moulon.inra.fr/protic/public. PROTICdb thus provides means for data storage, enrichment, and dissemination of proteomics data.

10.
EMBnet is a consortium of collaborating bioinformatics groups located mainly within Europe (http://www.embnet.org). Each member country is represented by a 'node', a group responsible for the maintenance of local services for their users (e.g. education, training, software, database distribution, technical support, helpdesk). Among these services a web portal with links and access to locally developed and maintained software is essential and different for each node. Our web portal targets biomedical scientists in Switzerland and elsewhere, offering them access to a collection of important sequence analysis tools mirrored from other sites or developed locally. We describe here the Swiss EMBnet node web site (http://www.ch.embnet.org), which presents a number of original services not available anywhere else.

11.
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services.

12.
A web-based microarray data analysis tool, ArrayOU (freely available at www.bioinformatics.plantbio.ohiou.edu), has been developed at the Ohio University Genomics Facility for the research and education community to analyze Agilent microarray data. Agilent's microarray pipeline has gained in popularity as a result of its ease of use and the low cost of customized arrays. The current version of the ArrayOU pipeline allows users to visualize, analyze, and annotate microarray data from commercially available and customized Agilent expression arrays and is extendable for further implementations.

13.
14.
15.
The MaxOcc web portal is presented for the characterization of the conformational heterogeneity of two-domain proteins, through the calculation of the Maximum Occurrence that each protein conformation can have in agreement with experimental data. Whatever the real ensemble of conformations sampled by a protein, the weight of any conformation cannot exceed the calculated corresponding Maximum Occurrence value. The present portal allows users to compute these values using any combination of restraints like pseudocontact shifts, paramagnetism-based residual dipolar couplings, paramagnetic relaxation enhancements and small angle X-ray scattering profiles, given the 3D structure of the two domains as input. MaxOcc is embedded within the NMR grid services of the WeNMR project and is available via the WeNMR gateway at http://py-enmr.cerm.unifi.it/access/index/maxocc. It can be used freely upon registration to the grid with a digital certificate.

16.
The extensive germplasm resource collections that are now available for major crop plants and their wild relatives will increasingly provide valuable biological and bioinformatics resources for plant physiologists and geneticists to dissect the molecular basis of key traits and to develop highly adapted plant material to sustain future breeding programs. A key to the efficient deployment of these resources is the development of information systems that will enable the collection and storage of biological information for these plant lines to be integrated with the molecular information that is now becoming available through the use of high-throughput genomics and post-genomics technologies. The GERMINATE database has been designed to hold a diverse variety of data types, ranging from molecular to phenotypic, and to allow querying between such data for any plant species. Data are stored in GERMINATE in a technology-independent manner, such that new technologies can be accommodated in the database as they emerge, without modification of the underlying schema. Users can access data in GERMINATE databases either via a lightweight Perl-CGI Web interface or by the more complex Genomic Diversity and Phenotype Connection software. GERMINATE is released under the GNU General Public License and is available at http://germinate.scri.sari.ac.uk/germinate/.

17.
Here we describe the Immunogenetic Management Software (IMS) system, a novel web-based application that permits multiplexed analysis of complex immunogenetic traits that are necessary for the accurate planning and execution of experiments involving large animal models, including nonhuman primates. IMS is capable of housing complex pedigree relationships, microsatellite-based MHC typing data, as well as MHC pyrosequencing expression analysis of class I alleles. It includes a novel, automated MHC haplotype naming algorithm and provides an innovative visualization protocol that allows users to view multiple familial and MHC haplotype relationships through a single, interactive graphical interface. Detailed DNA and RNA-based data can also be queried and analyzed in a highly accessible fashion, and flexible search capabilities allow experimental choices to be made based on multiple, individualized and expandable immunogenetic factors. This web application is implemented in Java, MySQL, Tomcat, and Apache, with supported browsers including Internet Explorer and Firefox on Windows and Safari on Mac OS. The software is freely available for distribution to noncommercial users by contacting Leslie.kean@emory.edu. A demonstration site for the software is available at http://typing.emory.edu/typing_demo, user name: imsdemo7@gmail.com and password: imsdemo.

18.
With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet.

19.
Several systems have been presented in recent years to manage the complexity of large microarray experiments. Although good results have been achieved, most systems tend to fall short in one or more areas. A Grid-based approach may provide a shared, standardized and reliable solution for storage and analysis of biological data, in order to maximize the results of experimental efforts. A Grid framework has therefore been adopted because of the need to remotely access large amounts of distributed data and to scale computational performance for terabyte datasets. Two different biological studies have been planned in order to highlight the benefits that can emerge from our Grid-based platform. The described environment relies on storage services and computational services provided by the gLite Grid middleware. The Grid environment is also able to exploit the added value of metadata to let users better classify and search experiments. A state-of-the-art Grid portal has been implemented to hide the complexity of the framework from end users and to enable them to easily access available services and data. The functional architecture of the portal is described. As a first test of the system's performance, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip Rat Expression Array RAE230A, from the ArrayExpress database. The analysis comprises three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model-based gene expression (based on the PM/MM difference model). Two different Linux versions (sequential and parallel) of the dChip software have been developed to implement the analysis and have been tested on a cluster. The results show that parallelizing the analysis and executing parallel jobs on distributed computational resources improves performance. Moreover, the Grid environment has been tested both for the possibility of uploading and accessing distributed datasets through the Grid middleware and for its ability to manage the execution of jobs on distributed computational resources. Results from the Grid test will be discussed in a further paper.

20.