Similar Literature
20 similar documents retrieved.
1.
In order to make use of the emerging grid and network services offered by various institutes and mandated by many current research projects, some kind of user-accessible client is required. In contrast with attempts to build generic workbenches, Talisman is designed to allow a bioinformatics expert to rapidly build custom applications, immediately visible using standard web technology, for users who wish to concentrate on the biology of their problem rather than on the informatics aspects. As a component of the MyGrid project, it is intended to allow access to arbitrary resources, including but not limited to relational, object and flat-file data sources, analysis programs, and grid-based storage, tracking and distributed annotation systems.

2.
This paper provides an overview of methods and current applications of distributed computing in bioinformatics. Distributed computing is a strategy of dividing a large workload among multiple computers to reduce processing time, or to make use of resources such as programs and databases that are not available on all computers. Participating computers may be connected either through a local high-speed network or through the Internet.
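As a minimal illustration of the divide-the-workload idea summarized above (not code from the paper), the sketch below splits a list of sequences into chunks and processes them in parallel worker processes on one machine; in a real distributed setup each chunk would be dispatched to a remote host instead.

```python
# Minimal sketch of "divide a large workload among multiple computers".
# Worker processes on one machine stand in for remote hosts here.
from concurrent.futures import ProcessPoolExecutor

def analyze(sequence):
    # Stand-in for an expensive per-sequence analysis (here: GC content).
    return sequence, sum(1 for c in sequence if c in "GC") / len(sequence)

def analyze_chunk(chunk):
    # One unit of distributed work: analyze every sequence in the chunk.
    return [analyze(s) for s in chunk]

def split(items, n):
    # Divide the workload into n roughly equal chunks.
    size = (len(items) + n - 1) // n
    return [items[i:i + size] for i in range(0, len(items), size)]

if __name__ == "__main__":
    sequences = ["ATGCGC", "TTTTAA", "GGGCCC", "ATATAT", "GCGCGC"]
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = [r for chunk in pool.map(analyze_chunk, split(sequences, 2)) for r in chunk]
    print(results)
```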

3.
MOTIVATION: Grid computing is used to solve large-scale bioinformatics problems involving gigabyte-scale databases by distributing the computation across multiple platforms. Until now, developing bioinformatics grid applications has been extremely tedious: component algorithms and parallelization techniques must be designed and implemented for different classes of problems, and remotely located sequence database files of varying formats must be accessed across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. RESULTS: GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two benchmark bioinformatics applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

4.
Both distributed systems and multicore systems are difficult programming environments. Although the expert programmer may be able to carefully tune these systems to achieve high performance, the non-expert may struggle. We argue that high-level abstractions are an effective way of making parallel computing accessible to the non-expert. An abstraction is a regularly structured framework into which a user may plug in simple sequential programs to create very large parallel programs. By virtue of a regular structure and declarative specification, abstractions may be materialized on distributed, multicore, and distributed multicore systems with robust performance across a wide range of problem sizes. In previous work, we presented the All-Pairs abstraction for computing on distributed systems of single CPUs. In this paper, we extend All-Pairs to multicore systems, and introduce the Wavefront and Makeflow abstractions, which represent a number of problems in economics and bioinformatics. We demonstrate good scaling of both abstractions up to 32 cores on one machine and hundreds of cores in a distributed system.
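The All-Pairs abstraction mentioned above can be pictured as follows; this is only an illustrative sketch (the function names and the plug-in comparison are invented), not the authors' implementation, which targets distributed and multicore systems rather than a thread pool.

```python
# Illustrative sketch of the All-Pairs abstraction: the user supplies a simple
# sequential function F, and the framework evaluates F(a, b) for every pair
# drawn from two sets. A thread pool stands in for a cluster here.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def all_pairs(set_a, set_b, compare, workers=8):
    pairs = list(product(set_a, set_b))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: compare(*p), pairs)
    # The result is the full matrix of F(a, b) values, keyed by the pair.
    return {pair: score for pair, score in zip(pairs, scores)}

def hamming(a, b):
    # A trivial plug-in sequential comparison function (hypothetical example).
    return sum(x != y for x, y in zip(a, b))

matrix = all_pairs(["ACGT", "AGGT"], ["ACGT", "TTTT"], hamming)
```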

5.
ABCGrid: Application for Bioinformatics Computing Grid
We have developed a package named Application for Bioinformatics Computing Grid (ABCGrid). ABCGrid was designed to let biology laboratories use heterogeneous computing resources and access bioinformatics applications from one master node. ABCGrid is easy to install and maintain while remaining robust and high-performing. We implement a mechanism to install and update all applications and databases on worker nodes automatically, reducing the burden of manual maintenance. We use a backup-task method and a self-adaptive job dispatch approach to improve performance. Currently, ABCGrid integrates NCBI_BLAST, Hmmpfam and CE, running on a number of computing platforms including UNIX/Linux, Windows and Mac OS X. AVAILABILITY: The source code, executables and documentation can be downloaded from http://abcgrid.cbi.pku.edu.cn
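One way to read the abstract's "self-adaptive job dispatch" is that the master assigns the next job to whichever worker node has recently been fastest. The sketch below shows only that scheduling idea, with hypothetical node names and timings; it is not ABCGrid's actual code.

```python
# Sketch of a self-adaptive dispatch idea: track each worker node's observed
# speed and hand the next job to the node expected to finish soonest.
# Node names and timings are hypothetical; this is not ABCGrid's code.
import heapq

class Dispatcher:
    def __init__(self, workers):
        # Min-heap of (estimated seconds per job, worker name).
        self.heap = [(1.0, w) for w in workers]
        heapq.heapify(self.heap)

    def assign(self):
        # Pop the currently fastest node; it stays off the heap while busy.
        est, worker = heapq.heappop(self.heap)
        return worker, est

    def report(self, worker, seconds_per_job):
        # Self-adaptation: replace the estimate with the observed speed.
        heapq.heappush(self.heap, (seconds_per_job, worker))

d = Dispatcher(["node-a", "node-b", "node-c"])
worker, _ = d.assign()      # run the next job on this node
d.report(worker, 2.5)       # node turned out slower; it will be picked later
```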

6.
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

7.
Global computing, the collaboration of idle PCs via the Internet in the style of SETI@home, is emerging as a new form of massively parallel multiprocessing with potentially enormous CPU power. Its relation to the broader, fast-moving field of Grid computing is discussed without attempting a review of the latter. This review (i) includes a short table of milestones in global computing history, (ii) lists the opportunities global computing offers for bioinformatics, (iii) describes the structure of problems well suited to such an approach, (iv) analyses the anatomy of successful projects and (v) points to existing software frameworks. Finally, an evaluation of the various costs shows that global computing indeed has merit, provided the problem to be solved is already coded appropriately and a suitable global computing framework can be found. Then, either significant amounts of computing power can be recruited from the general public, or, if employed on an enterprise-wide intranet for security reasons, idle desktop PCs can substitute for an expensive dedicated cluster.

8.
The cross-disciplinary nature of bioinformatics entails co-evolution with other biomedical disciplines, whereby some bioinformatics applications become popular in certain disciplines and, in turn, these disciplines influence the focus of future bioinformatics development efforts. We observe here that the growth of computational approaches within various biomedical disciplines is not merely a reflection of a generally extended usage of computers and the Internet, but is due to the production of useful bioinformatics databases and methods for the rest of the biomedical scientific community. We have used the abstracts stored both in the MEDLINE database of biomedical literature and in NIH-funded project grants to quantify two effects. First, we examine the biomedical literature as a whole and find that the use of computational methods has become increasingly prevalent across biomedical disciplines over the past three decades, while use of databases and the Internet has been rapidly increasing over the past decade. Second, we study recent trends in the use of bioinformatics topics. We observe that molecular sequence databases are a widely adopted contribution to biomedicine from the field of bioinformatics, and that microarray analysis is one of the major new topics engaged by the bioinformatics community. Via this analysis, we were able to identify areas of rapid growth in the use of informatics to aid curriculum planning, the development of computational infrastructure, and strategies for workforce education and funding.
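A toy version of the abstract-counting approach described above: for each year, compute the fraction of abstracts that mention a term such as "microarray". The abstracts and years below are invented purely for illustration and bear no relation to the paper's MEDLINE results.

```python
# Toy version of the literature-trend analysis: for each year, compute the
# fraction of abstracts mentioning a given term. The sample data is made up.
from collections import defaultdict

def term_trend(abstracts, term):
    # abstracts: iterable of (year, text) pairs
    mentions, totals = defaultdict(int), defaultdict(int)
    for year, text in abstracts:
        totals[year] += 1
        if term.lower() in text.lower():
            mentions[year] += 1
    return {year: mentions[year] / totals[year] for year in sorted(totals)}

sample = [(1999, "We sequenced ..."), (2004, "Microarray analysis of ..."),
          (2004, "A clinical study ..."), (2009, "Microarray and pathway data ...")]
print(term_trend(sample, "microarray"))   # {1999: 0.0, 2004: 0.5, 2009: 1.0}
```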

9.
ProteoMix is a suite of Java programs for identifying, annotating and predicting regions of interest in large sets of amino acid sequences, according to systematic and consistent criteria. It is based on two concepts: (1) integrating the results of different sequence analysis tools increases prediction reliability; and (2) the integration protocol is critical and needs to be easily adaptable on a case-by-case basis. ProteoMix was designed to analyze multiple protein sequences simultaneously using several bioinformatics tools, merge the results of the analyses using logical functions and display them in an integrated viewer. In addition, new sequences can be added seamlessly to an analysis performed on an initial set of sequences. ProteoMix has a modular design, and bioinformatics tools are run on remote servers accessed over the Internet using the Simple Object Access Protocol (SOAP), ensuring the swift implementation of additional tools. ProteoMix has a user-friendly, interactive graphical user interface and runs on PCs running a Microsoft operating system. AVAILABILITY: ProteoMix is freely available for academic users at http://bio.gsc.riken.jp/ProteoMix/
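The abstract's idea of merging per-residue predictions from several tools with logical functions can be illustrated as follows; the tool outputs below are hypothetical, and ProteoMix itself is a Java/SOAP application rather than this Python sketch.

```python
# Illustration of merging predictions from different tools with logical
# functions: each tool flags residues of interest as a boolean list, and the
# integration rule (AND for consensus, OR for union) is easy to swap.
def combine(predictions, rule=all):
    # predictions: list of equal-length boolean lists, one per tool.
    # rule=all -> keep a residue only if every tool flags it (AND);
    # rule=any -> keep a residue if at least one tool flags it (OR).
    return [rule(flags) for flags in zip(*predictions)]

tool_a = [True,  True, False, True]   # e.g. a hypothetical transmembrane predictor
tool_b = [True, False, False, True]   # e.g. a hypothetical hydrophobicity predictor
print(combine([tool_a, tool_b], rule=all))  # [True, False, False, True]
print(combine([tool_a, tool_b], rule=any))  # [True, True, False, True]
```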

10.
11.
Given the growing amount of biological data, data mining methods have become an integral part of bioinformatics research. Unfortunately, standard data mining tools are often not sufficiently equipped to handle raw data such as amino acid sequences. One popular and freely available framework that contains many well-known data mining algorithms is the Waikato Environment for Knowledge Analysis (Weka). In the BioWeka project, we introduce various input formats for bioinformatics data, as well as bioinformatics methods such as alignments, to Weka. This allows users to easily combine them with Weka's classification, clustering, validation and visualization facilities on a single platform, and therefore reduces both the overhead of converting data between different formats and the need to write custom evaluation procedures that can deal with many different programs. We encourage users to participate in this project by adding their own components and data formats to BioWeka. AVAILABILITY: The software, documentation and tutorial are available at http://www.bioweka.org.

12.
13.
Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the rapidly growing amount of data on biological networks, performance and scalability are becoming critical limiting factors in applications. Meanwhile, GPU computing, which uses the CUDA toolkit to implement a massively parallel computing environment on the GPU card, is becoming a powerful, efficient and low-cost option for achieving substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers latency, circumventing a major issue in other parallel computing environments such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform the parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations that are at the heart of MCL. We utilize the ELLPACK-R sparse format to allow effective, fine-grained massively parallel processing that copes with the sparse nature of the interaction network data sets found in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on a CPU. Thus, large-scale parallel computation that was previously only possible on supercomputing architectures can now be done on off-the-shelf desktop machines, which can significantly change the way bioinformaticians and biologists deal with their data.
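For readers unfamiliar with MCL, its core iteration alternates expansion (a sparse matrix-matrix product) and inflation (an elementwise power followed by column normalization). The CPU sketch below, using SciPy sparse matrices, shows those two operations only; it is not the CUDA/ELLPACK-R implementation described in the paper.

```python
# CPU sketch of the two operations at the heart of MCL: expansion (sparse
# matrix-matrix product) and inflation (elementwise power + column
# normalization). Illustrative only, not the CUDA-MCL code.
import numpy as np
from scipy import sparse

def column_normalize(m):
    # Scale each column so it sums to 1 (a column-stochastic Markov matrix).
    inv = 1.0 / np.asarray(m.sum(axis=0)).ravel()
    return sparse.csr_matrix(m @ sparse.diags(inv))

def mcl(adjacency, inflation=2.0, iterations=20, prune=1e-6):
    m = sparse.csr_matrix(adjacency, dtype=float)
    m = column_normalize(m + sparse.identity(m.shape[0]))   # add self-loops
    for _ in range(iterations):
        m = m @ m                        # expansion
        m = m.power(inflation)           # inflation: elementwise power
        m = column_normalize(m)
        m.data[m.data < prune] = 0.0     # drop tiny entries to keep the matrix sparse
        m.eliminate_zeros()
    # Nodes attached to the same surviving (attractor) row form one cluster.
    return [np.flatnonzero(m.getrow(i).toarray().ravel()).tolist()
            for i in range(m.shape[0]) if m.getrow(i).nnz > 0]

# Tiny example: two triangles joined by a single edge separate into two clusters.
a = np.array([[0,1,1,0,0,0],[1,0,1,0,0,0],[1,1,0,1,0,0],
              [0,0,1,0,1,1],[0,0,0,1,0,1],[0,0,0,1,1,0]])
print(mcl(a))
```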

14.
MOTIVATION: In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However, for a large number of taxa, it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs, which are severely limiting the widespread availability and use of parallel computing in maximum-likelihood-based phylogenetic analysis. RESULTS: We have identified the suitability of phylogenetic analysis for large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree-building program called Distributed Phylogeny Reconstruction by Maximum Likelihood. It uses an already proven maximum-likelihood tree-building algorithm and a popular phylogenetic analysis library for all its likelihood calculations, and it offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree-building program that can achieve near-linear speedup while using only the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.

15.
The identification and characterization of peptides from MS/MS data represents a critical aspect of proteomics. It has been the subject of extensive research in bioinformatics, resulting in a fair number of identification software tools. Most often, only one program with a specific and unvarying set of parameters is selected for identifying proteins. Hence, a significant proportion of the experimental spectra do not match the peptide sequences in the screened database, owing to inappropriate parameters or scoring schemes. The Swiss protein identification toolbox (swissPIT) project provides the scientific community with an expandable multi-tool platform for automated in-depth analysis of MS data that is also able to handle data from high-throughput experiments. swissPIT addresses several problems: missing standards for input and output formats (A), creation of analysis workflows (B), unified result visualization (C) and simplicity of the user interface (D). Currently, swissPIT supports four different programs implementing two different search strategies to identify MS/MS spectra. Conceived to handle the calculation-intensive needs of each of these programs, swissPIT uses the distributed resources of a Swiss-wide computing Grid (http://www.swing-grid.ch).

16.
In the analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic studies with gene expression measurements as a representative example, but note that the analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation in gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent 'non-standard' applications of PCA, including accommodating interactions among genes, pathways and network modules, and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including supervised PCA, sparse PCA and functional PCA. Supervised PCA and sparse PCA have been shown to have better empirical performance than standard PCA. Functional PCA can analyze time-course gene expression data. Last, we raise awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and, more importantly, its most recent developments, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.
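As a quick reminder of the standard technique the article reviews, the sketch below computes the leading principal components of a small samples-by-genes expression matrix by centering and singular value decomposition; the expression values are randomly generated for illustration only.

```python
# Standard PCA on a samples-by-genes expression matrix via SVD: center each
# gene, decompose, and project samples onto the leading principal components.
import numpy as np

def pca(expression, n_components=2):
    centered = expression - expression.mean(axis=0)        # center each gene
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                          # orthogonal loading vectors
    scores = centered @ components.T                        # low-dimensional sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)                   # variance explained per PC
    return scores, components, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))                               # 10 samples x 50 genes (made up)
scores, pcs, var = pca(X, n_components=2)
print(scores.shape, var)                                    # (10, 2) and two variance ratios
```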

17.
caCORE: a common infrastructure for cancer informatics
MOTIVATION: Sites with substantive bioinformatics operations are challenged to build data processing and delivery infrastructure that provides reliable access and enables data integration. Locally generated data must be processed and stored such that relationships to external data sources can be presented. Consistency and comparability across data sets require annotation with controlled vocabularies and, further, metadata standards for data representation. Programmatic access to the processed data should be supported to ensure that the maximum possible value is extracted. Confronted with these challenges at the National Cancer Institute Center for Bioinformatics, we decided to develop a robust infrastructure for data management and integration that supports advanced biomedical applications. RESULTS: We have developed an interconnected set of software and services called caCORE. Enterprise Vocabulary Services (EVS) provide controlled vocabulary, dictionary and thesaurus services. The Cancer Data Standards Repository (caDSR) provides a metadata registry for common data elements. Cancer Bioinformatics Infrastructure Objects (caBIO) implements an object-oriented model of the biomedical domain and provides Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. caCORE has been used to develop scientific applications that bring together data from distinct genomic and clinical science sources. AVAILABILITY: caCORE downloads and web interfaces can be accessed from links on the caCORE web site (http://ncicb.nci.nih.gov/core). caBIO software is distributed under an open source license that permits unrestricted academic and commercial use. Vocabulary and metadata content in the EVS and caDSR, respectively, is similarly unrestricted, and is available through web applications and FTP downloads. SUPPLEMENTARY INFORMATION: http://ncicb.nci.nih.gov/core/publications contains links to the caBIO 1.0 class diagram and the caCORE 1.0 Technical Guide, which provide detailed information on the present caCORE architecture, data sources and APIs. Updated information appears on a regular basis on the caCORE web site (http://ncicb.nci.nih.gov/core).

18.
MOTIVATION: The (my)Grid project aims to exploit Grid technology, with an emphasis on the Information Grid, and to provide middleware layers that make it appropriate for the needs of bioinformatics. (my)Grid is building high-level services for data and application integration such as resource discovery, workflow enactment and distributed query processing. Additional services are provided to support the scientific method and the best practice found at the bench but often neglected at the workstation, notably provenance management, change notification and personalisation. RESULTS: We give an overview of these services and their metadata, in particular the semantically rich metadata, expressed using ontologies, that is necessary to discover, select and compose services into dynamic workflows.

19.
Utility library for structural bioinformatics

20.
birgHPC, a bootable Linux Live CD, has been developed to create high-performance clusters for bioinformatics and molecular dynamics studies using any computers networked over a Local Area Network (LAN). birgHPC features automated hardware and slot detection and provides a simple job submission interface. The latest versions of GROMACS, NAMD, mpiBLAST and ClustalW-MPI can be run in parallel simply by booting the birgHPC CD or flash drive from the head node, which immediately positions the rest of the PCs on the network as computing nodes. Thus, a temporary, affordable, scalable and high-performance computing environment can be built by researchers without a computing background using low-cost commodity hardware. AVAILABILITY: The birgHPC Live CD and the relevant user guide are available for free at http://birg1.fbb.utm.my/birghpc.
