期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An efficient algorithm for DNA fragment assembly in MapReduce

Baomin Xu Jin GaoChunyan Li 《Biochemical and biophysical research communications》2012,426(3):395-398

Fragment assembly is one of the most important problems of sequence assembly. Algorithms for DNA fragment assembly using de Bruijn graph have been widely used. These algorithms require a large amount of memory and running time to build the de Bruijn graph. Another drawback of the conventional de Bruijn approach is the loss of information. To overcome these shortcomings, this paper proposes a parallel strategy to construct de Bruijin graph. Its main characteristic is to avoid the division of de Bruijin graph. A novel fragment assembly algorithm based on our parallel strategy is implemented in the MapReduce framework. The experimental results show that the parallel strategy can effectively improve the computational efficiency and remove the memory limitations of the assembly algorithm based on Euler superpath. This paper provides a useful attempt to the assembly of large-scale genome sequence using Cloud Computing. 相似文献

2.

Is bigger always better? A critical appraisal of the use of volumetric analysis in the study of the hippocampus

Timothy C. Roth II Anders Brodin Tom V. Smulders Lara D. LaDage Vladimir V. Pravosudov 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2010,365(1542):915-931

A well-developed spatial memory is important for many animals, but appears especially important for scatter-hoarding species. Consequently, the scatter-hoarding system provides an excellent paradigm in which to study the integrative aspects of memory use within an ecological and evolutionary framework. One of the main tenets of this paradigm is that selection for enhanced spatial memory for cache locations should specialize the brain areas involved in memory. One such brain area is the hippocampus (Hp). Many studies have examined this adaptive specialization hypothesis, typically relating spatial memory to Hp volume. However, it is unclear how the volume of the Hp is related to its function for spatial memory. Thus, the goal of this article is to evaluate volume as a main measurement of the degree of morphological and physiological adaptation of the Hp as it relates to memory. We will briefly review the evidence for the specialization of memory in food-hoarding animals and discuss the philosophy behind volume as the main currency. We will then examine the problems associated with this approach, attempting to understand the advantages and limitations of using volume and discuss alternatives that might yield more specific hypotheses. Overall, there is strong evidence that the Hp is involved in the specialization of spatial memory in scatter-hoarding animals. However, volume may be only a coarse proxy for more relevant and subtle changes in the structure of the brain underlying changes in behaviour. To better understand the nature of this brain/memory relationship, we suggest focusing on more specific and relevant features of the Hp, such as the number or size of neurons, variation in connectivity depending on dendritic and axonal arborization and the number of synapses. These should generate more specific hypotheses derived from a solid theoretical background and should provide a better understanding of both neural mechanisms of memory and their evolution. 相似文献

3.

A Parallel Algorithm for Nonequilibrium Molecular Dynamics Simulation of Shear Flow on Distributed Memory Machines

David P. Hansen Denis J. Evans 《Molecular simulation》2013,39(6):375-393

Abstract

An algorithm is described which allows Nonequilibrium Molecular Dynamics (NEMD) simulations of a fluid undergoing planar Couette flow (shear flow) to be carried out on a distributed memory parallel processor using a (spatial) domain decomposition technique. Unlike previous algorithms, this algorithm uses a co-moving, or Lagrangian, simulation box. Also, the shape of the simulation box changes throughout the course of the simulation. The algorithm, which can be used for two or three dimensional systems, has been tested on a Fujitsu AP1000 Parallel computer with 128 processors. 相似文献

4.

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

Minchao Wang Wu Zhang Wang Ding Dongbo Dai Huiran Zhang Hao Xie Luonan Chen Yike Guo Jiang Xie 《PloS one》2014,9(4)

Backgrounds

Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs.

Methods

Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes.

Result

A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. 相似文献

5.

Visual computation and saccadic eye movements: a theoretical perspective

J M Findlay 《Spatial Vision》1987,2(3):175-189

A simple instance of parallel computation in neural networks occurs when the eye orients to a novel visual target. Consideration of target-elicited saccadic eye movements opens the question of how spatial position is represented in the visual pathways involved in this response. It is argued that a point-for-point retinotopic coding of spatial position (the 'local sign' approach) is inadequate to account for the characteristics of the response. An alternative approach based on distributed coding is developed. 相似文献

6.

Reverse engineering and analysis of genome-wide gene regulatory networks from gene expression profiles using high-performance computing

Belcastro V Gregoretti F Siciliano V Santoro M D'Angelo G Oliva G di Bernardo D 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):668-678

Regulation of gene expression is a carefully regulated phenomenon in the cell. “Reverse-engineering” algorithms try to reconstruct the regulatory interactions among genes from genome-scale measurements of gene expression profiles (microarrays). Mammalian cells express tens of thousands of genes; hence, hundreds of gene expression profiles are necessary in order to have acceptable statistical evidence of interactions between genes. As the number of profiles to be analyzed increases, so do computational costs and memory requirements. In this work, we designed and developed a parallel computing algorithm to reverse-engineer genome-scale gene regulatory networks from thousands of gene expression profiles. The algorithm is based on computing pairwise Mutual Information between each gene-pair. We successfully tested it to reverse engineer the Mus Musculus (mouse) gene regulatory network in liver from gene expression profiles collected from a public repository. A parallel hierarchical clustering algorithm was implemented to discover “communities” within the gene network. Network communities are enriched for genes involved in the same biological functions. The inferred network was used to identify two mitochondrial proteins. 相似文献

7.

Key Message Approach to Optimize Communication of Parallel Applications on Clusters

Ming Zhu Wentong Cai Bu-Sung Lee 《Cluster computing》2003,6(3):253-265

Over the past few years, cluster/distributed computing has been gaining popularity. The proliferation of the cluster/distributed computing is due to the improved performance and increased reliability of these systems. Many parallel programming languages and related parallel programming models have become widely accepted. However, one of the major shortcomings of running parallel applications on cluster/distributed computing environments is the high communication overhead incurred. To reduce the communication overhead, and thus the completion time of a parallel application, this paper describes a simple, efficient and portable Key Message (KM) approach to support parallel computing on cluster/distributed computing environments. To demonstrate the advantage of the KM approach, a prototype runtime system has been implemented and evaluated. Our preliminary experimental results show that the KM approach has better improvement on communication of a parallel application when network background load increases or the computation to communication ratio of the application decreases. 相似文献

8.

Fingerprint matching using recurrent autoassociative memory

Poorna B Easwarakumar KS 《International journal of neural systems》2003,13(4):263-271

An efficient method for fingerprint searching using recurrent autoassociative memory is proposed. This algorithm uses recurrent autoassociative memory, which uses a connectivity matrix to find if the pattern being searched is already stored in the database. The advantage of this memory is that a big database is to be searched only if there is a matching pattern. Fingerprint comparison is usually based on minutiae matching, and its efficiency depends on the extraction of minutiae. This process may reduce the speed, when large amount of data is involved. So, in the proposed method, a simple approach has been adopted, wherein first determines the closely matched fingerprint images, and then determines the minutiae of only those images for finding the more appropriate one. The gray level value of pixels along with its neighboring ones are considered for the extraction of minutiae, which is more easier than using ridge information. This approach is best suitable when database size is large. 相似文献

9.

A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

Yaser Afshar Ivo F. Sbalzarini 《PloS one》2016,11(4)

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collectively solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10¹⁰ pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. 相似文献

10.

Distributed Function Mining for Gene Expression Programming Based on Fast Reduction

Song Deng Dong Yue Le-chan Yang Xiong Fu Ya-zhou Feng 《PloS one》2016,11(1)

For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining. 相似文献

11.

The Text-mining based PubChem Bioassay neighboring analysis

Lianyi Han Tugba O Suzek Yanli Wang Steve H Bryant 《BMC bioinformatics》2010,11(1):549

相似文献

12.

Estimation of parameters of allometric equations.

M Valente P Murino T De Leo A Amato 《Bollettino della Società italiana di biologia sperimentale》1992,68(12):721-734

Accurate parameter estimation of allometric equations is a question of considerable interest. Various techniques that address this problem exist. In this paper it is assumed that the measured values are normally distributed and a maximum likelihood estimation approach is used. The computations involved in this procedure are reducible to relatively simple forms, and an efficient numerical algorithm is used. A listing of the computer program is included as an appendix. 相似文献

13.

Natively unstructured regions in proteins identified from contact predictions 总被引：4，自引：0，他引：4

Schlessinger A Punta M Rost B 《Bioinformatics (Oxford, England)》2007,23(18):2376-2384

MOTIVATION: Natively unstructured (also dubbed intrinsically disordered) regions in proteins lack a defined 3D structure under physiological conditions and often adopt regular structures under particular conditions. Proteins with such regions are overly abundant in eukaryotes, they may increase functional complexity of organisms and they usually evade structure determination in the unbound form. Low propensity for the formation of internal residue contacts has been previously used to predict natively unstructured regions. RESULTS: We combined PROFcon predictions for protein-specific contacts with a generic pairwise potential to predict unstructured regions. This novel method, Ucon, outperformed the best available methods in predicting proteins with long unstructured regions. Furthermore, Ucon correctly identified cases missed by other methods. By computing the difference between predictions based on specific contacts (approach introduced here) and those based on generic potentials (realized in other methods), we might identify unstructured regions that are involved in protein-protein binding. We discussed one example to illustrate this ambitious aim. Overall, Ucon added quality and an orthogonal aspect that may help in the experimental study of unstructured regions in network hubs. AVAILABILITY: http://www.predictprotein.org/submit_ucon.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 相似文献

14.

The effects of different mesh generation methods on computational fluid dynamic analysis and power loss assessment in total cavopulmonary connection 总被引：1，自引：0，他引：1

Liu Y Pekkan K Jones SC Yoganathan AP 《Journal of biomechanical engineering》2004,126(5):594-603

The flow field and energetic efficiency of total cavopulmonary connection (TCPC) models have been studied by both in vitro experiment and computational fluid dynamics (CFD). All the previous CFD studies have employed the structured mesh generation method to create the TCPC simulation model. In this study, a realistic TCPC model with complete anatomical features was numerically simulated using both structured and unstructured mesh generation methods. The flow fields and energy losses were compared in these two meshes. Two different energy loss calculation methods, the control volume and viscous dissipation methods, were investigated. The energy losses were also compared to the in vitro experimental results. The results demonstrated that: (1) the flow fields in the structured model were qualitatively similar to the unstructured model; (2) more vortices were present in the structured model than in the unstructured model; (3) both models had the least energy loss when flow was equally distributed to the left and right pulmonary arteries, while high losses occurred for extreme pulmonary arterial flow splits; (4) the energy loss results calculated using the same method were significantly different for different meshes; and (5) the energy loss results calculated using different methods were significantly different for the same mesh. 相似文献

15.

Modular implementation of dynamic algorithm switching in parallel simulations

Pilsung Kang 《Cluster computing》2012,15(3):321-332

We present a modular approach to implementing dynamic algorithm switching for parallel scientific software. By using a compositional framework based on function call interception techniques, our proposed method transparently integrates algorithm switching code with a given program without directly modifying the original code structure. Through fine-grained control of algorithmic behavior of an application at the level of functions, our approach supports design and implementation of application-specific switching scenarios in a modular way. Our approach encourages algorithm switching to dynamically perform at the loop end of a parallel simulation, where cooperating processes in concurrent execution typically synchronize and intermediate computation results are consistent. In this way, newly added switching operations do not cause race conditions that may produce unreliable computation results in parallel simulations. By applying our method to a real-world scientific application and adapting its algorithmic behavior to the properties of input problems, we demonstrate the applicability and effectiveness of our approach to constructing efficient parallel simulations. 相似文献

16.

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA 总被引：1，自引：0，他引：1

Ali Akoglu Gregory M. Striemer 《Cluster computing》2009,12(3):341-352

Program development environments have enabled graphics processing units (GPUs) to become an attractive high performance computing platform for the scientific community. A commonly posed problem in computational biology is protein database searching for functional similarities. The most accurate algorithm for sequence alignments is Smith-Waterman (SW). However, due to its computational complexity and rapidly increasing database sizes, the process becomes more and more time consuming making cluster based systems more desirable. Therefore, scalable and highly parallel methods are necessary to make SW a viable solution for life science researchers. In this paper we evaluate how SW fits onto the target GPU architecture by exploring ways to map the program architecture on the processor architecture. We develop new techniques to reduce the memory footprint of the application while exploiting the memory hierarchy of the GPU. With this implementation, GSW, we overcome the on chip memory size constraint, achieving 23× speedup compared to a serial implementation. Results show that as the query length increases our speedup almost stays stable indicating the solid scalability of our approach. Additionally this is a first of a kind implementation which purely runs on the GPU instead of a CPU-GPU integrated environment, making our design suitable for porting onto a cluster of GPUs. 相似文献

17.

On the parallelisation of bioinformatics applications

Trelles O 《Briefings in bioinformatics》2001,2(2):181-194

This paper surveys the computational strategies followed to parallelise the most used software in the bioinformatics arena. The studied algorithms are computationally expensive and their computational patterns range from regular, such as database-searching applications, to very irregularly structured patterns (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. This overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches and scheduling strategies for a broad range of computer architectures. In particular, it deals with shared, distributed and shared/distributed memory architectures. 相似文献

18.

Parallel hash-based EST clustering algorithm for gene sequencing 总被引：2，自引：0，他引：2

Mudhireddy R Ercal F Frank R 《DNA and cell biology》2004,23(10):615-623

EST clustering is a simple, yet effective method to discover all the genes present in a variety of species. Although using ESTs is a cost-effective approach in gene discovery, the amount of data, and hence the computational resources required, make it a very challenging problem. Time and storage requirements for EST clustering problems are prohibitively expensive. Existing tools have quadratic time complexity resulting from all against all sequence comparisons. With the rapid growth of EST data we need better and faster clustering tools. In this paper, we present HECT (Hash based EST Clustering Tool), a novel time- and memory-efficient algorithm for EST clustering. We report that HECT can cluster a 10,000 Human EST dataset (which is also used in benchmarking d2_cluster), in 207 minutes on a 1 GHz Pentium III processor which is 36 times faster than the original d2_cluster algorithm. A parallel version of HECT (PECT) is also developed and used to cluster 269,035 soybean EST sequences on IA-32 Linux cluster at National Center for Supercomputing Applications at UIUC. The parallel algorithm exhibited excellent speedup over its sequential counterpart and its memory requirements are almost negligible making it suitable to run virtually on any data size. The performance of the proposed clustering algorithms is compared against other known clustering techniques and results are reported in the paper. 相似文献

19.

The search for a hippocampal engram

Mark Mayford 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2014,369(1633)

Understanding the molecular and cellular changes that underlie memory, the engram, requires the identification, isolation and manipulation of the neurons involved. This presents a major difficulty for complex forms of memory, for example hippocampus-dependent declarative memory, where the participating neurons are likely to be sparse, anatomically distributed and unique to each individual brain and learning event. In this paper, I discuss several new approaches to this problem. In vivo calcium imaging techniques provide a means of assessing the activity patterns of large numbers of neurons over long periods of time with precise anatomical identification. This provides important insight into how the brain represents complex information and how this is altered with learning. The development of techniques for the genetic modification of neural ensembles based on their natural, sensory-evoked, activity along with optogenetics allows direct tests of the coding function of these ensembles. These approaches provide a new methodological framework in which to examine the mechanisms of complex forms of learning at the level of the neurons involved in a specific memory. 相似文献

20.

Speed improvements of peptide-spectrum matching using single-instruction multiple-data instructions

Zhang J McQuillan I Wu FX 《Proteomics》2011,11(19):3779-3785

Peptide-spectrum matching is one of the most time-consuming portion of the database search method for assignment of tandem mass spectra to peptides. In this study, we develop a parallel algorithm for peptide-spectrum matching using Single-Instruction Multiple Data (SIMD) instructions. Unlike other parallel algorithms in peptide-spectrum matching, our algorithm parallelizes the computation of matches between a single spectrum and a given peptide sequence from the database. It also significantly reduces the number of comparison operations. Extra improvements are obtained by using SIMD instructions to avoid conditional branches and unnecessary memory access within the algorithm. The implementation of the developed algorithm is based on the Streaming SIMD Extensions technology that is embedded in most Intel microprocessors. Similar technology also exists in other modern microprocessors. A simulation shows that the developed algorithm achieves an 18-fold speedup over the previous version of Real-Time Peptide-Spectrum Matching algorithm [F. X. Wu et al., Rapid Commun. Mass Sepctrom. 2006, 20, 1199-1208]. Therefore, the developed algorithm can be employed to develop real-time control methods for MS/MS. 相似文献