20 similar documents found
1.
Iterative applications are known to run as slow as their slowest computational component. This paper introduces malleability, a new dynamic reconfiguration strategy to overcome this limitation. Malleability is the ability to dynamically change the
data size and number of computational entities in an application. Malleability can be used by middleware to autonomously reconfigure
an application in response to dynamic changes in resource availability in an architecture-aware manner, allowing applications
to optimize the use of multiple processors and diverse memory hierarchies in heterogeneous environments.
The modular Internet Operating System (IOS) was extended to reconfigure applications autonomously using malleability. Two
different iterative applications were made malleable. The first, used in astronomical modeling and representative of maximum-likelihood
applications, was made malleable in the SALSA programming language. The second models the diffusion of heat over a two-dimensional
object, and is representative of applications such as partial differential equations and some types of distributed simulations.
Versions of the heat application were made malleable both in SALSA and MPI. Algorithms for concurrent data redistribution
are given for each type of application. Results show that reconfiguration using malleability is 10 to 100 times faster than reconfiguration by migration
on the tested environments. The algorithms are also shown to be highly scalable with respect to the quantity of data involved.
While previous work has shown the utility of dynamically reconfigurable applications using only computational component migration,
malleability is shown to provide up to a 15% speedup over component migration alone on a dynamic cluster environment.
This work is part of an ongoing research effort to enable applications to be highly reconfigurable and autonomously modifiable
by middleware in order to efficiently utilize distributed environments. Grid computing environments are becoming increasingly
heterogeneous and dynamic, placing new demands on applications’ adaptive behavior. This work shows that malleability is a
key aspect in enabling effective dynamic reconfiguration of iterative applications in these environments.
Carlos A. Varela
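The crux of making an iterative application malleable is redistributing data when the number of computational entities changes. The following toy Python sketch (not the paper's IOS/SALSA algorithms) shows the basic idea for a one-dimensional iteration space: re-partition the global data into near-equal contiguous blocks whenever the worker count changes.

```python
# Toy sketch of data redistribution for malleability (not the paper's
# IOS/SALSA algorithms): when the worker count changes from n to m,
# re-partition the global data into near-equal contiguous blocks.
def partition(n_items, n_workers):
    """Return [lo, hi) index bounds giving each worker a near-equal block."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def redistribute(blocks, new_worker_count):
    """Concatenate the old workers' blocks and split them for the new count."""
    flat = [x for block in blocks for x in block]
    return [flat[lo:hi] for lo, hi in partition(len(flat), new_worker_count)]

data = list(range(10))
old = [data[lo:hi] for lo, hi in partition(len(data), 4)]  # 4 workers
new = redistribute(old, 3)                                 # shrink to 3
print(old)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(new)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```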
2.
MOTIVATION: The human genome project and the development of new high-throughput technologies have created unparalleled opportunities to study the mechanism of diseases, monitor the disease progression and evaluate effective therapies. Gene expression profiling is a critical tool to accomplish these goals. The use of nucleic acid microarrays to assess the gene expression of thousands of genes simultaneously has seen phenomenal growth over the past five years. Although commercial sources of microarrays exist, investigators wanting more flexibility in the genes represented on the array will turn to in-house production. The creation and use of cDNA microarrays is a complicated process that generates an enormous amount of information. Effective data management of this information is essential to efficiently access, analyze, troubleshoot and evaluate the microarray experiments. RESULTS: We have developed a distributable software package designed to track and store the various pieces of data generated by a cDNA microarray facility. This includes the clone collection storage data, annotation data, workflow queues, microarray data, data repositories, sample submission information, and project/investigator information. This application was designed using a 3-tier client server model. The data access layer (1st tier) contains the relational database system tuned to support a large number of transactions. The data services layer (2nd tier) is a distributed COM server with full database transaction support. The application layer (3rd tier) is an internet based user interface that contains both client and server side code for dynamic interactions with the user. AVAILABILITY: This software is freely available to academic institutions and non-profit organizations at http://www.genomics.mcg.edu/niddkbtc.
3.
A scalable and fast OPTICS for clustering trajectory big data
4.
Frequent itemset mining is widely used as a fundamental data mining technique. Recently, a number of MapReduce-based frequent itemset mining methods have been proposed to overcome the limits on data size and mining speed that sequential methods face. However, the existing MapReduce-based methods still scale poorly due to high workload skewness, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting based only on transaction chunks and bitwise operations, without generating and shuffling intermediate data. As a result, BIGMiner achieves very high scalability due to no workload skewness, no intermediate data, and small network communication overhead. Through extensive experiments using large-scale datasets of up to 6.5 billion transactions, we have shown that BIGMiner consistently and significantly outperforms the state-of-the-art methods without any memory problems.
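To make the chunk-plus-bitwise-counting idea concrete, here is a small illustrative Python sketch (not BIGMiner's code): each item in a transaction chunk maps to a bit vector, a candidate itemset's support within the chunk is the popcount of the AND of its items' vectors, and chunk counts are summed.

```python
# Illustrative sketch (not BIGMiner itself): bitmap support counting over
# fixed-size transaction chunks, so counting needs only bitwise AND + popcount.
from itertools import combinations

def chunk_bitmaps(chunk, items):
    """One integer per item; bit t is set iff transaction t contains the item."""
    bitmaps = {item: 0 for item in items}
    for t, transaction in enumerate(chunk):
        for item in transaction:
            bitmaps[item] |= 1 << t
    return bitmaps

def chunk_support(itemset, bitmaps, n_transactions):
    v = (1 << n_transactions) - 1        # start with all transactions in the chunk
    for item in itemset:
        v &= bitmaps[item]               # keep transactions containing every item
    return bin(v).count("1")             # popcount = support within this chunk

transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"}]
chunks = [transactions[:3], transactions[3:]]   # equal-sized "transaction chunks"
items = sorted({i for t in transactions for i in t})
min_support = 3
for size in (1, 2):
    for candidate in combinations(items, size):
        total = sum(chunk_support(candidate, chunk_bitmaps(c, items), len(c))
                    for c in chunks)
        if total >= min_support:
            print(candidate, total)
```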
5.
Yuan Tian Cong Xu Weikuan Yu Jeffrey S. Vetter Scott Klasky Honggao Liu Saad Biaz 《Cluster computing》2014,17(2):475-486
Advances in multicore technology are leading to processors with tens and soon hundreds of cores in a single socket, resulting in an ever-growing gap between computing power and the memory and I/O bandwidth available for data handling. It would be beneficial if some of that computing power could be converted into gains in I/O efficiency, thereby reducing the speed disparity between computing and I/O. In this paper, we design and implement a NEarline data COmpression and DECompression (neCODEC) scheme for data-intensive parallel applications. Several salient techniques are introduced in neCODEC, including asynchronous compression threads, elastic file representation, distributed metadata handling, and balanced subfile distribution. Our performance evaluation indicates that neCODEC improves the performance of a variety of data-intensive microbenchmarks and scientific applications. In particular, neCODEC increases the effective bandwidth of S3D, a combustion simulation code, by more than 5 times.
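As a rough illustration of the asynchronous-compression idea (names and structure here are invented, not neCODEC's API): the compute path hands raw blocks to background compressor threads, so compression overlaps computation and far fewer bytes reach the file system.

```python
# Invented sketch of the asynchronous-compression idea (not neCODEC's API):
# compute code enqueues raw blocks; background threads compress them before
# they are "written", overlapping compression with computation.
import queue
import threading
import zlib

def compressor(q, out, lock):
    while True:
        block = q.get()
        if block is None:          # poison pill: no more blocks
            return
        data = zlib.compress(block)
        with lock:
            out.append(data)       # stand-in for writing to a subfile

blocks = [bytes([i % 7] * 100_000) for i in range(16)]  # compressible "output"
q = queue.Queue(maxsize=4)         # bounded queue throttles the producer
out, lock = [], threading.Lock()
workers = [threading.Thread(target=compressor, args=(q, out, lock))
           for _ in range(2)]      # two "compression threads"
for w in workers:
    w.start()
for b in blocks:                   # the compute loop just enqueues and moves on
    q.put(b)
for _ in workers:
    q.put(None)
for w in workers:
    w.join()
print(sum(map(len, blocks)), "raw ->", sum(map(len, out)), "compressed bytes")
```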
6.
Abdallah Mohamed Feroz Sainab Alani Sama Sayed Enas Taha Shanableh Abdallah 《Reviews in Environmental Science and Biotechnology》2019,18(3):543-578
The depletion of conventional energy sources has motivated countries to shift towards renewable and eco-friendly sources of energy. One of the...
7.
Background
Very often, genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate their interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interfaces are not machine-friendly, 3) they use non-standard formats for data input and output, 4) they do not exploit standards to define application interfaces and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow.
8.
9.
M. B. Dale 《Plant Ecology》1989,81(1-2):41-60
Although there are many measures of similarity existing in the phytosociological literature, these almost all apply to data for which the describing attributes have only single values. In many cases, however, there can be a richer structure in the attribute values, either directly from the nature of the attributes or derived from relationships between the stands. In this paper, I first examine a range of possible sources of such structure in phytosociological data, and then propose a similarity measure sufficiently general to be applicable to all the variant types. Finally I present some examples of applying such measures to frequency data from tropical grasslands and to successional data from subtropical rain forest.
10.
Some approaches have been developed to retrieve data automatically from one or multiple remote biological data sources. However, most of them require researchers to remain online and wait for returned results, which not only demands a highly available network connection but may also overload the network. Moreover, none of the existing approaches has been designed to address the following problems when retrieving remote data in a mobile network environment: (1) the resources of mobile devices are limited; (2) network connections are of relatively low quality; and (3) mobile users are not always online. To address these problems, we integrate an agent migration approach with a multi-agent system, overcoming the high-latency and limited-bandwidth problems by moving computations to the required resources or services. More importantly, the approach is well suited to mobile computing environments. Also presented in this paper are the system architecture, the migration strategy, and the security authentication of agent migration. As a demonstration, remote data retrieval from GenBank is used to illustrate the feasibility of the proposed approach.
11.
12.
Werner T 《Current opinion in biotechnology》2008,19(1):50-54
13.
Sevinsky JR Cargile BJ Bunger MK Meng F Yates NA Hendrickson RC Stephenson JL 《Journal of proteome research》2008,7(1):80-88
High-throughput genome sequencing continues to accelerate the rate at which complete genomes are available for biological research. Many of these new genome sequences have little or no genome annotation currently available and hence rely upon computational predictions of protein coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein coding genes, but the techniques for whole genome searching with MS/MS data have not been adequately developed to date. Here we describe GENQUEST, a novel method using peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive whole human genome searching possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were identified when searching genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides exclusive to the genomic searches can be used to reannotate protein coding genes. GENQUEST represents an experimental tool that can be used by the proteomics community at large for validating computational approaches to genome annotation.
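The search-space reduction itself is simple to picture. A hypothetical sketch (toy pI and mass values, not GENQUEST's code): a candidate peptide survives only if its predicted isoelectric point falls within the observed IEF fraction and its mass lies within a tight ppm tolerance of the measured precursor mass.

```python
# Hypothetical sketch of the two-filter search-space reduction (toy pI and
# mass values; not GENQUEST's code or data).
candidates = [
    ("SAMPLER",  802.42, 6.1),   # (sequence, monoisotopic mass, predicted pI)
    ("PEPTIDEK", 927.47, 4.3),
    ("GENQUEST", 891.40, 4.0),
]

def plausible(measured_mass, pi_range, ppm_tol=10.0):
    """Keep candidates inside the IEF fraction's pI range and within
    ppm_tol of the measured precursor mass."""
    lo_pi, hi_pi = pi_range
    keep = []
    for seq, mass, pi in candidates:
        if not lo_pi <= pi <= hi_pi:
            continue                                   # wrong IEF fraction
        if abs(mass - measured_mass) / measured_mass * 1e6 > ppm_tol:
            continue                                   # outside mass window
        keep.append(seq)
    return keep

print(plausible(927.47, (4.0, 4.6)))  # -> ['PEPTIDEK']
```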
14.
Background
Protein-protein interaction (PPI) plays a core role in cellular functions. Massively parallel supercomputing systems have been actively developed over the past few years, which enable large-scale biological problems to be solved, such as PPI network prediction based on tertiary structures.
Results
We have developed a high-throughput and ultra-fast PPI prediction system based on rigid docking, "MEGADOCK", by employing a hybrid parallelization (MPI/OpenMP) technique designed for massively parallel supercomputing systems. MEGADOCK delivers significantly faster processing speed in the rigid-body docking process, enabling full utilization of protein tertiary structural data for large-scale and network-level problems in systems biology. Moreover, the system was shown to be scalable by measurements carried out on two supercomputing environments. We then conducted prediction of biological PPI networks using post-docking analysis.
Conclusions
We present a new protein-protein docking engine aimed at exhaustive docking of mega-order numbers of protein pairs. The system was shown to be scalable by running on thousands of nodes. The software package is available at: http://www.bi.cs.titech.ac.jp/megadock/k/.
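A minimal sketch of the hybrid parallel pattern the abstract describes, transliterated to Python with mpi4py (MEGADOCK itself is implemented with MPI/OpenMP; `dock` and the pair list here are placeholders): MPI ranks split the all-vs-all pair list, and a process pool inside each rank plays the role of the intra-node thread level.

```python
# Hypothetical Python/mpi4py transliteration of the hybrid pattern:
# MPI ranks partition the protein-pair list (inter-node level), and a
# process pool inside each rank mimics the OpenMP intra-node level.
# `dock` is a placeholder for the FFT rigid-body docking of one pair.
from concurrent.futures import ProcessPoolExecutor
from mpi4py import MPI

def dock(pair):
    receptor, ligand = pair
    return receptor, ligand, 0.0   # placeholder (receptor, ligand, score)

def main():
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    pairs = [(r, l) for r in range(100) for l in range(100)]  # toy all-vs-all
    my_pairs = pairs[rank::size]              # cyclic split across MPI ranks
    with ProcessPoolExecutor() as pool:       # intra-node parallelism
        results = list(pool.map(dock, my_pairs))
    gathered = comm.gather(results, root=0)
    if rank == 0:
        print(sum(len(r) for r in gathered), "pairs docked")

if __name__ == "__main__":
    main()
```

Run with, e.g., `mpiexec -n 8 python sketch.py` (assuming mpi4py is installed); in MEGADOCK proper the per-pair work is the FFT docking itself.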
15.
With the number of satellite sensors and data centers growing continuously, managing and processing massive remote sensing data from multiple distributed sources is becoming the norm. However, combining multiple satellite data centers for collaborative processing of massive remote sensing (RS) data still faces many challenges. To reduce the huge amounts of data migration and improve the efficiency of multi-datacenter collaborative processing, this paper presents the infrastructure and services for data management and workflow management in massive remote sensing data production. A dynamic data scheduling strategy is employed to reduce duplicated data requests and data processing. By combining remote sensing spatial metadata repositories with the Gfarm grid file system, unified management of raw data, intermediate products, and final products is achieved during co-processing. In addition, multi-level task order repositories and workflow templates are used to construct the production workflow automatically. With the help of specific heuristic scheduling rules, production tasks are executed quickly. Ultimately, the Multi-datacenter Collaborative Process System (MDCPS) was implemented for large-scale remote sensing data production based on effective management of data and workflows. Experiments show that these strategies significantly enhance the efficiency of co-processing across multiple data centers.
16.
Genetic interference means that the occurrence of one crossover affects the occurrence and/or location of other crossovers in its neighborhood. Of the three components of genetic interference, two are well modeled: the distribution of the number of chiasmata and the distribution of their locations. For the third component, chromatid interference, only one model exists, and its application to real data has not yet been published. A further, new model for chromatid interference is presented here. In contrast to the existing model, it assumes that chromatid interference acts only in the neighborhood of a chiasma. The appropriateness of this model is demonstrated by its application to three sets of recombination data. Both models for chromatid interference increased fit significantly compared to assuming no chromatid interference, at least for parts of the chromosomes. Interference does not necessarily act homogeneously: after extending both models to allow for heterogeneity of chromatid interference, a further improvement in fit was achieved.
17.
18.
Daniel Paulino René L. Warren Benjamin P. Vandervalk Anthony Raymond Shaun D. Jackman Inanç Birol 《BMC bioinformatics》2015,16(1)
Background
While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes.
Results
Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8% and 13.8% of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively – a feat that is not possible with other leading tools with the breadth of data used in our study.
Conclusion
Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including those of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0663-4) contains supplementary material, which is available to authorized users.
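To illustrate the core mechanism (a toy, not Sealer's implementation): read k-mers are inserted into a Bloom filter, and a gap is closed by greedily walking the implied de Bruijn graph from the k-mer on the gap's left flank until the right-flank k-mer is reached.

```python
# Toy illustration of Bloom-filter-backed gap closing (not Sealer's code):
# insert read k-mers into a Bloom filter, then extend the left flank one
# base at a time through k-mers the filter reports as present.
BASES = "ACGT"

class BloomFilter:
    def __init__(self, size=1 << 20, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size // 8 + 1)
    def _positions(self, item):
        return [hash((i, item)) % self.size for i in range(self.hashes)]
    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def close_gap(bf, left_kmer, right_kmer, max_steps=500):
    """Greedy walk: extend left_kmer through k-mers present in the Bloom
    filter until the path ends with right_kmer."""
    path = left_kmer
    for _ in range(max_steps):
        if path.endswith(right_kmer):
            return path
        suffix = path[-(len(left_kmer) - 1):]
        ext = [b for b in BASES if suffix + b in bf]
        if len(ext) != 1:   # dead end or branch: a real tool would backtrack
            return None
        path += ext[0]
    return None

k = 5
bf = BloomFilter()
read = "ATCGGAGCTTAC"
for i in range(len(read) - k + 1):
    bf.add(read[i:i + k])
print(close_gap(bf, "ATCGG", "CTTAC"))  # -> ATCGGAGCTTAC
```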
19.
Fengying Sun;Haoyan Li;Dongqing Sun;Shaliu Fu;Lei Gu;Xin Shao;Qinqin Wang;Xin Dong;Bin Duan;Feiyang Xing;Jun Wu;Minmin Xiao;Fangqing Zhao;Jing-Dong J. Han;Qi Liu;Xiaohui Fan;Chen Li;Chenfei Wang;Tieliu Shi 《Science China Life Sciences》2025,(1):5-102
Cells are the fundamental units of biological systems, exhibiting unique developmental trajectories and molecular features. Exploring how genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA sequencing technology, single-cell sequencing technologies have advanced rapidly in recent years. They have expanded horizontally to the single-cell genome, epigenome, proteome, and metabolome, while vertically they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on methodology. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
20.
Clustering by soft-constraint affinity propagation: applications to gene-expression data
MOTIVATION: Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g. in analyzing gene expression data. RESULTS: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints relative to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene expression signatures for each cluster.
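For orientation, here is a compact NumPy sketch of the original hard-constraint AP message updates that SCAP relaxes (the soft-constraint variant replaces the "exemplars must choose themselves" rule with a finite penalty controlled by the new parameter; that extension is not shown here):

```python
# Minimal hard-constraint Affinity Propagation sketch (the baseline SCAP
# relaxes), following the standard responsibility/availability updates.
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """S: (n, n) similarity matrix, diagonal = preferences.
    Returns, for each point, the index of its chosen exemplar."""
    n = S.shape[0]
    R, A = np.zeros((n, n)), np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx].copy()
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum_{i'!=i,k} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new
    return (A + R).argmax(axis=1)

# Two tight groups far apart -> two exemplars (labels like [1 1 3 3]).
X = np.array([[0.0], [0.2], [5.0], [5.2]])
S = -(X - X.T) ** 2
np.fill_diagonal(S, np.median(S))   # shared preference on the diagonal
print(affinity_propagation(S))
```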