首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
ClustalW-MPI: ClustalW analysis using distributed and parallel computing   总被引:7,自引:0,他引:7  
ClustalW is a tool for aligning multiple protein or nucleotide sequences. The alignment is achieved via three steps: pairwise alignment, guide-tree generation and progressive alignment. ClustalW-MPI is a distributed and parallel implementation of ClustalW. All three steps have been parallelized to reduce the execution time. The software uses a message-passing library called MPI (Message Passing Interface) and runs on distributed workstation clusters as well as on traditional parallel computers.  相似文献   

2.
Cluster, consisting of a group of computers, is to act as a whole system to provide users with computer resources. Each computer is a node of this cluster. Cluster computer refers to a system consisting of a complete set of computers connected to each other. With the rapid development of computer technology, cluster computing technique with high performance–cost ratio has been widely applied in distributed parallel computing. For the large-scale close data in group enterprise, a heterogeneous data integration model was built under cluster environment based on cluster computing, XML technology and ontology theory. Such model could provide users unified and transparent access interfaces. Based on cluster computing, the work has solved the heterogeneous data integration problems by means of Ontology and XML technology. Furthermore, good application effect has been achieved compared with traditional data integration model. Furthermore, it was proved that this model improved the computing capacity of system, with high performance–cost ratio. Thus, it is hoped to provide support for decision-making of enterprise managers.  相似文献   

3.
Heterogeneous networked clusters are being increasingly used as platforms for resource-intensive parallel and distributed applications. The fundamental underlying idea is to provide large amounts of processing capacity over extended periods of time by harnessing the idle and available resources on the network in an opportunistic manner. In this paper we present the design, implementation and evaluation of a framework that uses JavaSpaces to support this type of opportunistic adaptive parallel/distributed computing over networked clusters in a non-intrusive manner. The framework targets applications exhibiting coarse grained parallelism and has three key features: (1) portability across heterogeneous platforms, (2) minimal configuration overheads for participating nodes, and (3) automated system state monitoring (using SNMP) to ensure non-intrusive behavior. Experimental results presented in this paper demonstrate that for applications that can be broken into coarse-grained, relatively independent tasks, the opportunistic adaptive parallel computing framework can provide performance gains. Furthermore, the results indicate that monitoring and reacting to the current system state minimizes the intrusiveness of the framework.  相似文献   

4.
We propose a novel parallel computing framework for a nonlinear finite element method (FEM)-based cell model and apply it to simulate avascular tumor growth. We derive computation formulas to simplify the simulation and design the basic algorithms. With the increment of the proliferation generations of tumor cells, the FEM elements may become larger and more distorted. Then, we describe a remesh and refinement processing of the distorted or over large finite elements and the parallel implementation based on Message Passing Interface to improve the accuracy and efficiency of the simulation. We demonstrate the feasibility and effectiveness of the FEM model and the parallelization methods in simulations of early tumor growth.  相似文献   

5.
6.
MOSIX is a cluster management system that supports preemptive process migration. This paper presents the MOSIX Direct File System Access (DFSA), a provision that can improve the performance of cluster file systems by allowing a migrated process to directly access files in its current location. This capability, when combined with an appropriate file system, could substantially increase the I/O performance and reduce the network congestion by migrating an I/O intensive process to a file server rather than the traditional way of bringing the file's data to the process. DFSA is suitable for clusters that manage a pool of shared disks among multiple machines. With DFSA, it is possible to migrate parallel processes from a client node to file servers for parallel access to different files. Any consistent file system can be adjusted to work with DFSA. To test its performance, we developed the MOSIX File-System (MFS) which allows consistent parallel operations on different files. The paper describes DFSA and presents the performance of MFS with and without DFSA.  相似文献   

7.
SUMMARY: TREE-PUZZLE is a program package for quartet-based maximum-likelihood phylogenetic analysis (formerly PUZZLE, Strimmer and von Haeseler, Mol. Biol. Evol., 13, 964-969, 1996) that provides methods for reconstruction, comparison, and testing of trees and models on DNA as well as protein sequences. To reduce waiting time for larger datasets the tree reconstruction part of the software has been parallelized using message passing that runs on clusters of workstations as well as parallel computers. AVAILABILITY: http://www.tree-puzzle.de. The program is written in ANSI C. TREE-PUZZLE can be run on UNIX, Windows and Mac systems, including Mac OS X. To run the parallel version of PUZZLE, a Message Passing Interface (MPI) library has to be installed on the system. Free MPI implementations are available on the Web (cf. http://www.lam-mpi.org/mpi/implementations/).  相似文献   

8.
We present a framework to design efficient and portable HPF applications which exploit a mixture of task and data parallelism. According to the framework proposed, data parallelism is restricted within HPF modules, and task parallelism is achieved by the concurrent execution of several data-parallel modules cooperating through COLTHPF, a coordination layer implemented on top of PVM. COLTHPF can be used independently of the HPF compilation system exploited, and it allows instances of cooperating HPF tasks to be created either statically or at run-time. We claim that COLTHPF can be exploited by means of a simple skeleton-based coordination language and associated compiler to easily express mixed data and task parallel applications runnable on either multicomputers or cluster of workstations. We used a physics application as a test case of our approach for mixing task and data parallelism, and we present the results of several experiments conducted on a cluster of Linux SMPs. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

9.
Many research institutions are deploying computing clusters based on a shared/buy-in paradigm. Such clusters combine shared computers, which are free to be used by all users, and buy-in computers, which are computers purchased by users for semi-exclusive use. The purpose of this paper is to characterize the typical behavior and performance of a shared/buy-in computing cluster, using data traces from the Shared Computing Cluster (SCC) at Boston University that runs under this paradigm as a case study. Among our main findings, we show that the semi-exclusive policy, which allows any SCC user to use idle buy-in resources for a limited time, increases the utilization of buy-in resources by 17.4%, thus significantly improving the performance of the system as a whole. We find that jobs allowed to run on idle buy-in resources arrive more frequently and run for a shorter time than other jobs. Finally, we identify the run time limit (i.e., the maximum time during which a job is allowed to use resources) and the type of parallel environment as two factors that have a significant impact on the different performance experienced by shared and buy-in jobs.  相似文献   

10.

Background

Use of missing genotype imputations and haplotype reconstructions are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required.

Results

We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, and Chinese in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo.

Conclusions

ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.  相似文献   

11.
Over the past few years, cluster/distributed computing has been gaining popularity. The proliferation of the cluster/distributed computing is due to the improved performance and increased reliability of these systems. Many parallel programming languages and related parallel programming models have become widely accepted. However, one of the major shortcomings of running parallel applications on cluster/distributed computing environments is the high communication overhead incurred. To reduce the communication overhead, and thus the completion time of a parallel application, this paper describes a simple, efficient and portable Key Message (KM) approach to support parallel computing on cluster/distributed computing environments. To demonstrate the advantage of the KM approach, a prototype runtime system has been implemented and evaluated. Our preliminary experimental results show that the KM approach has better improvement on communication of a parallel application when network background load increases or the computation to communication ratio of the application decreases.  相似文献   

12.
Workstation clusters are emerging as a general-purpose computing platform for the execution of workloads comprising parallel and sequential applications. The scalability and flexibility typical of implicit coscheduling strategies makes them a very promising solution to the scheduling needs of workstation clusters. In this paper we present a simulation study that compares, for a variety of workloads (that include both parallel and sequential applications) and operating system schedulers, 12 implicit coscheduling strategies in terms of the performance they are able to deliver to applications. By using a detailed simulator, we evaluate the performance of different coscheduling alternatives for a variety of simulation scenarios, and we identify the set of strategies that deliver the best performance to all the applications composing typical cluster workloads. Moreover, we show that for schedulers providing immediate preemption, the best strategies are also the simplest ones to implement.  相似文献   

13.
We describe a system for creating personal clusters in user-space to support the submission and management of thousands of compute-intensive serial jobs to the network-connected compute resources on the NSF TeraGrid. The system implements a robust infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute resources to personal clusters created dynamically for a user on-demand. The personal clusters then adapt to the prevailing job load conditions at the distributed sites by migrating job proxies to sites expected to provide resources more quickly. Furthermore, the system allows multiple instances of these personal clusters to be created as containers for individual scientific experiments, allowing the submission environment to be customized for each instance. The version of the system described in this paper allows users to build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Users then manage their scientific jobs, within each personal cluster, with a single uniform interface using the feature-rich functionality found in these job management environments.
Evan L. TurnerEmail:
  相似文献   

14.
Cloud computing environment came about in order to effectively manage and use enormous amount of data that have become available with the development of the Internet. Cloud computing service is widely used not only to manage the users’ IT resources, but also to use enterprise IT resources in an effective manner. Various security threats have occurred while using cloud computing and plans for reaction are much needed, since they will eventually elevate to security threats to enterprise information. Plans to strengthen the security of enterprise information by using cloud security will be proposed in this research. These cloud computing security measures must be supported by the governmental policies. Publications on guidelines to information protection will raise awareness among the users and service providers. System of reaction must be created in order to constantly monitor and to promptly respond to any security accident. Therefore, both technical countermeasures and governmental policy must be supported at the same time. Cloud computing service is expanding more than ever, thus active research on cloud computing security is expected.  相似文献   

15.
Recently, software distributed shared memory systems have successfully provided an easy user interface to parallel user applications on distributed systems. In order to prompt program performance, most of DSM systems usually were greedy to utilize all of available processors in a computer network to execute user programs. However, using more processors to execute programs cannot necessarily guarantee to obtain better program performance. The overhead of paralleling programs is increased by the addition in the number of processors used for program execution. If the performance gain from program parallel cannot compensate for the overhead, increasing the number of execution processors will result in performance degradation and resource waste. In this paper, we proposed a mechanism to dynamically find a suitable system scale to optimize performance for DSM applications according to run-time information. The experimental results show that the proposed mechanism can precisely predict the processor number that will result in the best performance and then effectively optimize the performance of the test applications by adapting system scale according to the predicted result. Yi-Chang Zhuang received his B.S., M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University in 1995, 1997, and 2004. He is currently working as an engineer at Industrial Technology Research Institute in Taiwan. His research interests include object-based storage, file systems, distributed systems, and grid computing. Jyh-Biau Chang is currently an assistant professor at the Information Management Department of Leader University in Taiwan. He received his B.S., M.S. and Ph.D. degrees from Electrical Engineering Department of National Cheng Kung University in 1994, 1996, and 2005. His research interest is focused on cluster and grid computing, parallel and distributed system, and operating system. Tyng-Yeu Liang is currently an assistant professor who teaches and studies at Department of Electrical Engineering, National Kaohsiung University of Applied Sciences in Taiwan. He received his B.S., M.S. and Ph.D. degrees from National Cheng Kung University in 1992, 1994, and 2000. His study is interested in cluster and grid computing, image processing and multimedia. Ce-Kuen Shieh currently is a professor at the Electrical Engineering Department of National Cheng Kung University in Taiwan. He is also the chief of computation center at National Cheng Kung University. He received his Ph.D. degree from the Department of Electrical Engineering of National Cheng Kung University in 1988. He was the chairman of the Electrical Engineering Department of National Cheng Kung University from 2002 to 2005. His research interest is focused on computer network, and parallel and distributed system. Laurence T. Yang is a professor at the Department of Computer Science, St. Francis Xavier University, Canada. His research includes high performance computing and networking, embedded systems, ubiquitous/pervasive computing and intelligence, and autonomic and trusted computing.  相似文献   

16.
I/O bottlenecks are already a problem in many large-scale applications that manipulate huge datasets. This problem is expected to get worse as applications get larger, and the I/O subsystem performance lags behind processor and memory speed improvements. At the same time, off-the-shelf clusters of workstations are becoming a popular platform for demanding applications due to their cost-effectiveness and widespread deployment. Caching I/O blocks is one effective way of alleviating disk latencies, and there can be multiple levels of caching on a cluster of workstations. Previous studies have shown the benefits of caching—whether it be local to a particular node, or a shared global cache across the cluster—for certain applications. However, we show that while caching is useful in some situations, it can hurt performance if we are not careful about what to cache and when to bypass the cache. This paper presents compilation techniques and runtime support to address this problem. These techniques are implemented and evaluated on an experimental Linux/Pentium cluster running a parallel file system. Our results using a diverse set of applications (scientific and commercial) demonstrate the benefits of a discretionary approach to caching for I/O subsystems on clusters, providing as much as 48% savings in overall execution time over indiscriminately caching everything in some applications. Parts of this paper have appeared in the Proceedings of the 3rd IEEE/ACM Symposium on Cluster Computing and the Grid (CCGrid'03). This paper is an extension of these prior results, and includes a more extensive performance evaluation. Murali Vilayannur is a Ph.D. student in the Department of Computer Science and Engineering at The Pennsylvania State University. His research interests are in High-Performance Parallel I/O, File Systems, Virtual Memory Algorithms and Operating Systems. Anand Sivasubramaniam received his B.Tech. in Computer Science from the Indian Institute of Technology, Madras, in 1989, and the M.S. and Ph.D. degrees in Computer Science from the Georgia Institute of Technology in 1991 and 1995 respectively. He has been on the faculty at The Pennsylvania State University since Fall 1995 where he is currently an Associate Professor. Anand's research interests are in computer architecture, operating systems, performance evaluation, and applications for both high performance computer systems and embedded systems. Anand's research has been funded by NSF through several grants, including the CAREER award, and from industries including IBM, Microsoft and Unisys Corp. He has several publications in leading journals and conferences, and is on the editorial board of IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems. He is a recipient of the 2002 IBM Faculty Award. Anand is a member of the IEEE, IEEE Computer Society, and ACM. Mahmut Kandemir received the B.Sc. and M.Sc. degrees in control and computer engineering from Istanbul Technical University, Istanbul, Turkey, in 1988 and 1992, respectively. He received the Ph.D. from Syracuse University, Syracuse, New York in electrical engineering and computer science, in 1999. He has been an assistant professor in the Computer Science and Engineering Department at the Pennsylvania State University since August 1999. His main research interests are optimizing compilers, I/O intensive applications, and power-aware computing. He is a member of the IEEE and the ACM. Rajeev Thakur is a Computer Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. He received a B.E. from the University of Bombay, India, in 1990, M.S. from Syracuse University in 1992, and Ph.D. from Syracuse University in 1995, all in computer engineering. His research interests are in the area of high-performance computing in general and high-performance networking and I/O in particular. He was a member of the MPI Forum and participated actively in the definition of the I/O part of the MPI-2 standard. He is the author of a widely used, portable implementation of MPI-IO, called ROMIO. He is also a co-author of the book “Using MPI-2: Advanced Features of the Message Passing Interface” published by MIT Press. Robert Ross received his Ph.D. in Computer Engineering from Clemson University in 2000. He is now an Assistant Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. His research interests are in message passing and storage systems for high performance computing environments. He is the primary author and lead developer for the Parallel Virtual File System (PVFS), a parallel file system for Linux clusters. Current projects include the ROMIO MPI-IO implementation, PVFS, PVFS2, and the MPICH2 implementation of the MPI message passing interface.  相似文献   

17.
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate on the use of only a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that making use of all of the heterogeneous computing resources can significantly improve application performance. Our approach, which consists of optimizing applications at run-time by efficiently coordinating application task execution on all available processing units is evaluated in the context of replicated dataflow applications. The proposed techniques were developed and implemented in an integrated run-time system targeting both intra- and inter-node parallelism. The experimental results with a real-world complex biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster.  相似文献   

18.
Mainstream computing equipment and the advent of affordable multi-Gigabit communication technology permit us to address data acquisition and processing problems with clusters of COTS machinery. Such networks typically contain heterogeneous platforms, real-time partitions and even custom devices. Vital overall system requirements are high efficiency and flexibility. In preceding projects we experienced the difficulties to meet both requirements at once. Intelligent I/O (I2O) is an industry specification that defines a uniform messaging format and execution environment for hardware and operating system independent device drivers in systems with processor based communication equipment. Mapping this concept to a distributed computing environment and encapsulating the details of the specification into an application-programming framework allow us to provide architectural support for (i) efficient and (ii) extensible cluster operation. This paper portrays our view of applying I2O to high-performance clusters. We demonstrate the feasibility of this approach and report on the efficiency of our XDAQ software framework for distributed data acquisition systems.  相似文献   

19.
Clusters of Symmetrical Multiprocessors (SMPs) have recently become the norm for high-performance economical computing solutions. Multiple nodes in a cluster can be used for parallel programming using a message passing library. An alternate approach is to use a software Distributed Shared Memory (DSM) to provide a view of shared memory to the application programmer. This paper describes Strings, a high performance distributed shared memory system designed for such SMP clusters. The distinguishing feature of this system is the use of a fully multi-threaded runtime system, using kernel level threads. Strings allows multiple application threads to be run on each node in a cluster. Since most modern UNIX systems can multiplex these threads on kernel level light weight processes, applications written using Strings can exploit multiple processors on a SMP machine. This paper describes some of the architectural details of the system and illustrates the performance improvements with benchmark programs from the SPLASH-2 suite, some computational kernels as well as a full fledged application. It is found that using multiple processes on SMP nodes provides good speedups only for a few of the programs. Multiple application threads can improve the performance in some cases, but other programs show a slowdown. If kernel threads are used additionally, the overall performance improves significantly in all programs tested. Other design decisions also have a beneficial impact, though to a lesser degree. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

20.
Yin  Fei  Shi  Feng 《Cluster computing》2022,25(4):2601-2611

With the rapid development of network technology and parallel computing, clusters formed by connecting a large number of PCs with high-speed networks have gradually replaced the status of supercomputers in scientific research and production and high-performance computing with cost-effective advantages. The research purpose of this paper is to integrate the Kriging proxy model method and energy efficiency modeling method into a cluster optimization algorithm of CPU and GPU hybrid architecture. This paper proposes a parallel computing model for large-scale CPU/GPU heterogeneous high-performance computing systems, which can effectively describe the computing capabilities and various communication behaviors of CPU/GPU heterogeneous systems, and finally provide algorithm optimization for CPU/GPU heterogeneous clusters. According to the GPU architecture, an efficient method of constructing a Kriging proxy model and an optimized search algorithm are designed. The experimental results in this paper show that the construction of the Kriging proxy model can obtain a 220 times speedup ratio, and the search algorithm can reach an 8 times speedup ratio. It can be seen that this heterogeneous cluster optimization algorithm has high feasibility.

  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号