Similar Documents
20 similar documents found (search time: 31 ms)
1.
Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands. Parallelization must be employed to maintain acceptable running times. Graphics Processing Units (GPUs) have been demonstrated to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper by Xu et al. (2010), a GPU implementation strategy was presented that obtains a speedup of an order of magnitude over a previously proposed GPU-based electron tomography implementation. In this technical note, we demonstrate that by making alternative design decisions in the GPU implementation, an additional speedup can be obtained, again of an order of magnitude. By carefully considering memory access locality when dividing the workload among blocks of threads, we use the GPU’s cache more efficiently and thus make more effective use of the available memory bandwidth.
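The locality argument can be illustrated with a toy count. The sketch below is not the tomography kernel from the note: it assumes a hypothetical stencil in which each thread computes one output pixel from a 3×3 input neighborhood, and it counts how many distinct input elements a 256-thread block touches when its work is laid out as a 1×256 strip versus a 16×16 tile. A smaller footprint means each cached element is reused by more threads.

```python
def footprint(block_rows, block_cols, radius=1):
    """Count distinct input elements read by one thread block.

    Hypothetical stencil: each thread computes one output pixel (i, j)
    from the (2*radius+1) x (2*radius+1) input neighborhood around it.
    """
    touched = set()
    for i in range(block_rows):
        for j in range(block_cols):
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    touched.add((i + di, j + dj))
    return len(touched)


# Same number of threads (256) per block, two different work shapes.
strip = footprint(1, 256)  # one row of 256 outputs
tile = footprint(16, 16)   # a 16 x 16 square tile
print(strip, tile)         # prints 774 324
```

The square tile reads fewer than half as many distinct elements for the same amount of work, which is the kind of effect that workload-to-block assignment can have on cache hit rates.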

2.
The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computation. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at relatively low cost. This makes it an ideal candidate for accelerating a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become more and more demanding, parallel implementations of learning algorithms are crucial for practical applications. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper, we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms and describe the GPU kernels needed for this task. Results on well-known benchmarks show faster training times and improved performance compared with the implementation on traditional hardware, owing to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed that aims to automatically find high-quality NN-based solutions for a given problem.
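For readers unfamiliar with BP, the following is a minimal sequential sketch of standard back-propagation for a one-hidden-layer sigmoid network with squared-error loss. It is not the authors' GPU code and does not cover MBP; the layer sizes and learning rate are arbitrary assumptions. The per-neuron sums in `forward` and `backward` are the independent operations that GPU kernels would typically parallelize.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """One hidden layer, sigmoid activations everywhere; returns (hidden, output)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, y


def loss(y, t):
    """Squared error 0.5 * sum((y - t)^2)."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))


def backward(x, t, W1, b1, W2, b2):
    """Back-propagate the error and return gradients (gW1, gb1, gW2, gb2)."""
    h, y = forward(x, W1, b1, W2, b2)
    # Output-layer deltas: dE/dz = (y - t) * y * (1 - y) for sigmoid units.
    d_out = [(yi - ti) * yi * (1.0 - yi) for yi, ti in zip(y, t)]
    # Hidden-layer deltas: output deltas weighted by W2, times the sigmoid derivative.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
             for j, hj in enumerate(h)]
    gW2 = [[dk * hj for hj in h] for dk in d_out]
    gW1 = [[dj * xi for xi in x] for dj in d_hid]
    return gW1, list(d_hid), gW2, list(d_out)


def sgd_step(x, t, W1, b1, W2, b2, lr=0.5):
    """In-place gradient-descent update for one training pattern (online BP)."""
    gW1, gb1, gW2, gb2 = backward(x, t, W1, b1, W2, b2)
    for W, gW in ((W1, gW1), (W2, gW2)):
        for row, grow in zip(W, gW):
            for i, g in enumerate(grow):
                row[i] -= lr * g
    for b, gb in ((b1, gb1), (b2, gb2)):
        for i, g in enumerate(gb):
            b[i] -= lr * g
```

A quick way to validate such an implementation is a finite-difference gradient check on a single weight before training.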

3.
An effective approach termed Recursive Gaussian Maximum Likelihood Estimation (RGMLE) is developed in this paper to suppress 2-D impulse noise. Two algorithms, RGMLE-C and RGMLE-CS, are derived using spatially adaptive variances, which are estimated from certainty information and from joint certainty and similarity information, respectively. To make the RGMLE-C and RGMLE-CS algorithms reliable in practice, a novel recursion-stopping strategy is proposed that evaluates the estimation error of uncorrupted pixels. Numerical experiments at different noise densities show that the two proposed algorithms yield significantly better results than several typical median-type filters. An efficient implementation is also realized via GPU (Graphics Processing Unit)-based parallelization techniques.

4.
The graphics processing unit (GPU) is becoming a powerful computational tool in science and engineering. In this paper, unlike previous molecular dynamics (MD) simulations with pair potentials and many-body potentials, we present two MD simulation algorithms implemented on a single GPU for a special category of many-body potentials: bond order potentials, which are used frequently for solid covalent materials, such as the Tersoff potentials for silicon crystals. The simulation results reveal that the GPU implementations clearly outperform their CPU counterparts. Furthermore, the proposed algorithms are generalised, transferable and scalable, and can be extended to simulations with general many-body interactions, such as the Stillinger–Weber potential.

5.
Neuroimage registration is crucial for brain morphometric analysis and treatment efficacy evaluation. However, existing advanced registration algorithms such as FLIRT and ANTs are not efficient enough for clinical use. In this paper, we develop a GPU implementation of FLIRT with the correlation ratio (CR) as the similarity metric, and a GPU-accelerated correlation coefficient (CC) calculation for the symmetric diffeomorphic registration of ANTs. Comparison with the corresponding original tools shows that our accelerated algorithms greatly outperform the originals in computational efficiency. This paper demonstrates the great potential of applying these registration tools in clinical settings.
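The two similarity metrics can be written down in a few lines. The sketch below computes the Pearson correlation coefficient and the correlation ratio η² = Var(E[mov | ref]) / Var(mov) between two flattened intensity lists. Grouping reference intensities into equal-width bins is our assumption; FLIRT's exact binning and cost definition may differ (a cost to be minimized would typically be 1 − η²). These per-voxel sums are the reductions a GPU version would parallelize.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation between two equally sized intensity sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_ratio(ref, mov, n_bins=8):
    """eta^2 = between-bin variance of mov / total variance of mov.

    Reference intensities are grouped into n_bins equal-width bins
    (the binning scheme is an assumption made for this sketch).
    """
    lo, hi = min(ref), max(ref)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a flat image
    groups = {}
    for r, m in zip(ref, mov):
        k = min(int((r - lo) / width), n_bins - 1)
        groups.setdefault(k, []).append(m)
    mean = sum(mov) / len(mov)
    total = sum((m - mean) ** 2 for m in mov)
    if total == 0.0:
        return 0.0
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    return between / total
```

Unlike CC, the correlation ratio reaches 1 for any functional (not necessarily linear) intensity mapping, which is why it suits images whose intensities are related non-linearly.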

6.
Previous univariate studies of the fly Sepsis cynipsea (Diptera: Sepsidae) have demonstrated spatiotemporally variable and consequently overall weak sexual selection favouring large male size, which is nevertheless stronger on average than fecundity selection favouring larger females. To identify specific target(s) of selection on body size and additional traits possibly affecting mating success, two multivariate field studies of sexual selection were conducted. In one study using seasonal replicates from three populations, we assessed 15 morphological traits. No clear targets of sexual selection on male size could be detected, perhaps because spatiotemporal variation in selection was again strong. In particular, there was no (current) selection on male abdomen length or fore coxa length, the only traits for which S. cynipsea males are not smaller than females. Interestingly, copulating males had a consistently shorter fore femur base, a secondary sexual trait, and a wider clasper (hypopygium) gap, an external genital trait. In a second study using daily and seasonal replicates from one population, we included physiological measures of energy reserves (lipids, glucose, glycogen), in addition to hind tibia length and fluctuating asymmetry (FA) of all pairs of legs. This study again confirmed the mating advantage of large males and additionally suggested independent positive effects of lipids (the long-term energy stores), whereas glucose and glycogen (the short-term energy stores) tended to have negative effects. FA of paired traits was not associated with male mating success. Our study suggests that including physiological measures and genital traits in phenomenological studies of selection, which is rarely done, would be fruitful in other species as well.

7.
Multivariate linear models are increasingly important in quantitative genetics. In high-dimensional specifications, factor analysis (FA) may provide an avenue for structuring (co)variance matrices, thus reducing the number of parameters needed to describe (co)dispersion. We describe how FA can be used to model genetic effects in the context of a multivariate linear mixed model. An orthogonal common factor structure is used to model genetic effects under Gaussian assumptions, so that the marginal likelihood is multivariate normal with a structured genetic (co)variance matrix. Under standard prior assumptions, all fully conditional distributions have closed forms, and samples from the joint posterior distribution can be obtained via Gibbs sampling. The model and the algorithm developed for its Bayesian implementation were used to describe five repeated records of milk yield in dairy cattle, and a one-common-factor FA model was compared with a standard multiple-trait model. The Bayesian Information Criterion favored the FA model.
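To make the parameter saving concrete, a generic orthogonal common-factor structure for a genetic (co)variance matrix of t traits with m factors can be written as follows; we assume the paper's parametrization is of this standard form, which the abstract does not spell out. Here Λ holds the factor loadings and Ψ the trait-specific variances, and the count is for the five milk-yield records with one common factor.

```latex
\mathbf{G} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},
\qquad \boldsymbol{\Lambda} \in \mathbb{R}^{t \times m},
\qquad \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_t)

\underbrace{\tfrac{t(t+1)}{2}}_{\text{unstructured}} = \tfrac{5 \cdot 6}{2} = 15
\qquad \text{vs.} \qquad
\underbrace{tm + t}_{\text{one factor}} = 5 + 5 = 10
\qquad (t = 5,\; m = 1)
```

With more than one factor, m(m − 1)/2 of the loadings are usually fixed to remove rotational indeterminacy, so the effective count is slightly lower than tm + t.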

8.
Space is a very important aspect in the simulation of biochemical systems, and the need for simulation algorithms able to cope with space has recently become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transport phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time-consuming, especially if we want to capture the system's behavior reliably using stochastic methods in conjunction with high spatial resolution. To deliver on the promise of systems biology to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, and the most common cases of molecule-surface interaction) on the GPU, computing them in parallel for each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
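The per-molecule structure that makes this workload GPU-friendly can be seen in a minimal Brownian-dynamics sketch. Each coordinate of each molecule receives an independent Gaussian displacement with variance 2DΔt, and an A + B reaction is triggered when two molecules end up closer than a binding radius. This is a simplified, sequential illustration, not Smoldyn's or the paper's code: in Smoldyn the binding radius is derived from the rate constant, D and Δt, whereas here it is simply an input, and the O(N·M) pair search is naive.

```python
import math
import random


def diffuse(positions, D, dt, rng=random):
    """One Brownian-dynamics step: every coordinate gets an independent
    N(0, 2*D*dt) displacement. Each molecule is updated independently,
    which is what makes a one-thread-per-molecule GPU mapping natural."""
    sigma = math.sqrt(2.0 * D * dt)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in positions]


def bimolecular_pairs(a_positions, b_positions, binding_radius):
    """Greedy A + B check: pair each A with the first unused B closer than
    binding_radius. Naive O(N*M) scan, for illustration only."""
    r2 = binding_radius ** 2
    used_b = set()
    pairs = []
    for i, pa in enumerate(a_positions):
        for j, pb in enumerate(b_positions):
            if j not in used_b and sum((x - y) ** 2 for x, y in zip(pa, pb)) < r2:
                pairs.append((i, j))
                used_b.add(j)
                break
    return pairs
```

A useful sanity check is that the mean squared displacement after one step in three dimensions approaches 6DΔt.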

9.
A major consideration in multitrait analysis is which traits should be jointly analyzed. As a common strategy, multitrait analysis is performed either on pairs of traits or on all traits. To fully exploit the power of multitrait analysis, we propose variable selection to choose a subset of informative traits for multitrait quantitative trait locus (QTL) mapping. The proposed method is very useful for achieving optimal statistical power for QTL identification and for disclosing the most relevant traits. It is also a practical strategy to effectively take advantage of multitrait analysis when the number of traits under consideration is too large, making the usual multivariate analysis of all traits challenging. We study the impact of selection bias and the use of permutation tests in the context of variable selection, and develop a powerful implementation procedure of variable selection for genome scanning. We demonstrate the proposed method and selection procedure in a backcross population, using both simulated and real data. The extension to other experimental mapping populations is straightforward.

10.
Inconsistencies in the relationship between fluctuating asymmetry (FA) and fitness may be due to selection acting on the degree of trait asymmetry that differs among populations or among traits. We assessed relationships between parasite susceptibility and fluctuating asymmetry in the number of bony lateral plates among 83 populations of freshwater Gasterosteus aculeatus (three-spined stickleback), and among lateral plate positions that vary in the selection they experience for symmetry. The correlation between FA and parasite infection was highly variable among samples. The excess of infected asymmetric G. aculeatus increased significantly as the robustness of structural predator defences decreased. This effect was found for only one parasite species (Eustrongylides sp.) and was slightly stronger in females. In addition, there was a trend toward an excess of infected females that were asymmetric in lateral plate positions that did not experience selection for symmetry, although this trend only approached significance. These results suggest that selection for trait symmetry can obscure relationships between fitness and individual-wide developmental stability, providing one possible explanation for some of the heterogeneity in FA/fitness relationships seen in the literature. These results are also consistent with previous reports showing that ecological segregation between symmetric and asymmetric G. aculeatus and between sexes can alter the FA/fitness relationship.

11.
The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection on various software platforms. Traditional approaches that involve only central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphics processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can prevent the GPU from reaching optimal efficiency. In this paper, we propose a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspection workload between a CPU and a GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that for random-payload traffic, the matching speed of the proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
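The two-stage idea (cheap CPU pre-filter, expensive matcher only for suspicious packets) can be sketched as follows. The abstract does not describe HPMA's actual pre-filter, so we assume a hypothetical one that checks whether any 2-byte window of the payload equals the first two bytes of some signature, and we stand in for the GPU Aho-Corasick stage with plain substring search. Signatures are assumed to be at least k bytes long.

```python
def build_prefilter(signatures, k=2):
    """Hypothetical pre-filter key: the first k bytes of every signature."""
    return {sig[:k] for sig in signatures}


def prefilter(payload, prefixes, k=2):
    """Cheap CPU-side test: could any k-byte window start a signature?
    No false negatives: a payload containing a signature contains its prefix."""
    return any(payload[i:i + k] in prefixes for i in range(len(payload) - k + 1))


def full_match(payload, signatures):
    """Stand-in for the expensive stage (e.g. Aho-Corasick on the GPU)."""
    return [sig for sig in signatures if sig in payload]


def inspect(packets, signatures, k=2):
    """Return (indices forwarded to the expensive stage, {index: matched signatures})."""
    prefixes = build_prefilter(signatures, k)
    forwarded = [i for i, p in enumerate(packets) if prefilter(p, prefixes, k)]
    alerts = {}
    for i in forwarded:
        hits = full_match(packets[i], signatures)
        if hits:
            alerts[i] = hits
    return forwarded, alerts
```

The speedup of such a design depends on how few benign packets pass the pre-filter, since only forwarded packets pay the transfer and full-matching cost.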

12.

Background  

Inferring gene networks from time-course microarray experiments with a vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, although VAR modeling with the elastic net increases the number of true positives, it also increases the number of false positives.
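As a sketch of how the network problem becomes variable selection, assume a first-order VAR model over the expression vector x_t of all genes. Row i of the coefficient matrix A is then estimated by an elastic-net regression of gene i at time t on all genes at time t − 1, and a nonzero entry A_ij is read as an edge from gene j to gene i. The penalty shown is the (naive) elastic-net form combining L1 and squared L2 terms; the lag order and penalty weights λ1, λ2 are assumptions of this illustration.

```latex
\mathbf{x}_t = \mathbf{A}\,\mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t,
\qquad t = 2, \dots, T

\hat{\mathbf{a}}_i = \arg\min_{\boldsymbol{\beta}}
\sum_{t=2}^{T} \bigl( x_{i,t} - \mathbf{x}_{t-1}^{\top} \boldsymbol{\beta} \bigr)^2
+ \lambda_1 \lVert \boldsymbol{\beta} \rVert_1
+ \lambda_2 \lVert \boldsymbol{\beta} \rVert_2^2
```

The L1 term drives most coefficients to exactly zero, while the L2 term lets correlated genes enter together, which helps explain both the extra true positives and the extra false positives noted above.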

13.
Yang X, Belin TR, Boscardin WJ. Biometrics, 2005, 61(2): 498-506
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.

14.
Hybrid functional Petri nets are a widespread tool for representing and simulating biological models. Because they can provide virtual drug-testing environments, biological simulations have a growing impact on pharmaceutical research. Continuous research advances in biology and medicine have led to exponentially increasing simulation times, raising the demand for efficient and inexpensive parallel computing solutions to accelerate them. Recent developments in general-purpose computation on graphics processing units (GPGPU) have enabled the scientific community to port a variety of compute-intensive algorithms onto the graphics processing unit (GPU). This work presents the first scheme for mapping biological hybrid functional Petri net models, which can handle both discrete and continuous entities, onto Compute Unified Device Architecture (CUDA)-enabled GPUs. GPU-accelerated simulations are observed to run up to 18 times faster than sequential implementations. Simulating the cell boundary formation by Delta-Notch signaling on a CUDA-enabled GPU results in a speedup of approximately 7x for a model containing 1,600 cells.
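To illustrate what "functional" means for the continuous part of such nets, here is a minimal Euler-integration sketch of a continuous Petri net in which each transition's firing speed is an arbitrary function of the current marking. It is a simplified illustration, not the paper's CUDA scheme: discrete transitions are not modeled, and the fixed step size dt is an assumption. All transition speeds are evaluated on the same marking before any update, so they are independent of one another, which is the property a data-parallel mapping would exploit.

```python
def simulate(marking, transitions, dt, steps):
    """Euler integration of a continuous Petri net.

    marking: dict place -> real-valued amount.
    transitions: list of (pre, post, rate_fn); pre/post map place -> arc weight,
    and rate_fn(marking) returns the firing speed (the 'functional' rate).
    """
    m = dict(marking)
    for _ in range(steps):
        # Evaluate every transition's speed on the same marking (independent work items).
        speeds = [rate(m) for _, _, rate in transitions]
        delta = {p: 0.0 for p in m}
        for (pre, post, _), v in zip(transitions, speeds):
            for p, w in pre.items():
                delta[p] -= w * v * dt
            for p, w in post.items():
                delta[p] += w * v * dt
        for p in m:
            m[p] = max(0.0, m[p] + delta[p])  # markings cannot go negative
    return m


# Example: A -> B with mass-action speed 0.5 * [A].
result = simulate({"A": 1.0, "B": 0.0},
                  [({"A": 1}, {"B": 1}, lambda mk: 0.5 * mk["A"])],
                  dt=0.01, steps=100)
```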

15.
The graphics processing unit (GPU), which was originally used exclusively for visualization, has evolved into an extremely powerful co-processor. Meanwhile, through the development of elaborate programming interfaces, the GPU can be used to process data and handle computationally intensive applications. The speed-up factors attained relative to the central processing unit (CPU) depend on the particular application, as the GPU architecture gives the best performance for algorithms that exhibit high data parallelism and high arithmetic intensity. Here, we evaluate the performance of the GPU on a number of common algorithms used for three-dimensional image processing. The algorithms were developed on a new software platform called "CUDA", which allows C code to be translated directly to the GPU. The implemented algorithms include spatial transformations, real-space and Fourier operations, as well as pattern recognition procedures, reconstruction algorithms and classification procedures. In our implementation, directly porting C code to the GPU achieves typical speed-ups on the order of 10-20 times compared with a state-of-the-art conventional processor, although these vary with the type of algorithm. The speed-up comes at no additional cost, since the software runs on the GPU of the graphics card in common workstations.

16.
Scanning protein sequence databases is a frequently repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. To exploit the GPU’s capabilities for accelerating BLASTP, we use a compressed deterministic finite state automaton for hit detection as well as a hybrid parallelization scheme. Our implementation achieves speedups of up to 10.0 on an NVIDIA GeForce GTX 295 GPU compared with the sequential NCBI BLASTP 2.2.22. The CUDA-BLASTP source code is available at https://sites.google.com/site/liuweiguohome/software.
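The hit-detection stage that the compressed DFA accelerates can be sketched with a plain dictionary. The code below indexes every length-w word of the query, scans each subject sequence once to collect (query position, subject position) hits, and then applies a simplified two-hit rule: a hit becomes a seed when an earlier non-overlapping hit lies on the same diagonal within a window. This is an illustrative simplification: real BLASTP also counts neighborhood words that score above a threshold under a substitution matrix, whereas here only exact word matches are used, and the dictionary stands in for the paper's DFA.

```python
def build_word_index(query, w=3):
    """Map every length-w word of the query to the positions where it occurs."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index


def find_hits(subject, index, w=3):
    """Scan the subject once; each indexed window yields (query_pos, subject_pos) hits."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


def two_hit_seeds(hits, w=3, window=40):
    """Keep hits preceded by a non-overlapping hit on the same diagonal within `window`."""
    last_on_diag = {}
    seeds = []
    for i, j in sorted(hits, key=lambda h: h[1]):
        d = j - i
        prev = last_on_diag.get(d)
        if prev is not None and j - prev < w:
            continue  # overlaps the previous hit on this diagonal; ignore it
        if prev is not None and j - prev <= window:
            seeds.append((i, j))
        last_on_diag[d] = j
    return seeds
```

Because each subject sequence is scanned independently, a natural GPU mapping assigns database sequences to threads.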

17.
Gene co-expression networks are a valuable type of biological network. Many methods and tools have been published to construct gene co-expression networks; however, most of them are inconvenient and time-consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphics Processing Unit) architectures. Genetic entropies were used to filter out genes with no or small expression changes in the raw-data preprocessing step. Pearson correlation coefficients were then calculated. Next, we normalized these coefficients and employed the False Discovery Rate to control for multiple testing. Finally, module identification was performed to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with that of multi-core CPU implementations with 16 CPU threads, a single-thread C/C++ implementation and a single-thread R implementation. Our results show that the GPU implementation greatly outperforms the single-thread C/C++ and R implementations, and outperforms the multi-core CPU implementation as the number of genes increases. On the test dataset containing 16,000 genes and 590 individuals, the GPU implementation is more than 63 times faster than the single-thread R implementation when 50 percent of genes are filtered out, and about 80 times faster when no genes are filtered out.
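The correlation and FDR steps of such a pipeline can be sketched sequentially as follows. The abstract does not say how the coefficients are normalized, so we assume Fisher's z-transform with a normal approximation to obtain p-values, followed by the Benjamini-Hochberg step-up procedure; entropy filtering and module identification are not shown. The all-pairs correlation loop is the part that maps naturally onto GPU threads.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pvalue(r, n):
    """Two-sided p-value via Fisher's z-transform (normal approximation, an assumption)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


def coexpression_edges(profiles, alpha=0.05):
    """Gene pairs whose correlation survives FDR control at level alpha."""
    n = len(profiles[0])
    pairs = [(i, j) for i in range(len(profiles)) for j in range(i + 1, len(profiles))]
    pvals = [correlation_pvalue(pearson(profiles[i], profiles[j]), n) for i, j in pairs]
    return [pairs[k] for k in benjamini_hochberg(pvals, alpha)]
```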

18.
Han KyungHyun, Lee Wai-Kong, Hwang Seong Oun. Cluster Computing, 2022, 25(1): 433-450

Recently, the U.S. National Institute of Standards and Technology (NIST) initiated a global-scale competition to standardize lightweight authenticated encryption with associated data (AEAD) and hash functions. Gimli is one of the Round 2 candidates and is designed to be efficiently implementable across various platforms, including hardware (VLSI and FPGA), microprocessors, and microcontrollers. However, the performance of Gimli on massively parallel architectures such as Graphics Processing Units (GPUs) is still unknown. A high-performance Gimli implementation on GPUs can be especially useful for Internet of Things (IoT) applications, in which gateway devices and cloud servers need to handle a massive number of communications protected by AEAD. In this paper, we show that, with careful optimization, Gimli can be efficiently implemented on desktop and embedded GPUs to achieve extremely high throughput. Our experiments show that the proposed Gimli implementation can achieve 661.44 KB/s (encryption), 892.24 KB/s (decryption), and 4344.46 KB/s (hashing) on state-of-the-art GPUs.
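For reference, the core of Gimli is a 384-bit permutation on twelve 32-bit words arranged as a 3×4 matrix: 24 rounds of a column-wise SP-box, with small and big word swaps and a round constant every few rounds. The sketch below follows the published specification as we understand it; it is not the paper's GPU code and should be checked against the official test vectors before any use. On a GPU, the usual approach is to let each thread run this permutation on an independent message, since the state fits in registers.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """32-bit left rotation."""
    return ((x << n) | (x >> (32 - n))) & MASK


def gimli(state):
    """Gimli permutation on 12 32-bit words (rows: words 0-3, 4-7, 8-11).
    Returns a new list; the input is left unchanged."""
    s = list(state)
    for rnd in range(24, 0, -1):
        # SP-box applied independently to each of the 4 columns.
        for col in range(4):
            x = rotl(s[col], 24)
            y = rotl(s[4 + col], 9)
            z = s[8 + col]
            s[8 + col] = (x ^ (z << 1) ^ ((y & z) << 2)) & MASK
            s[4 + col] = (y ^ x ^ ((x | z) << 1)) & MASK
            s[col] = (z ^ y ^ ((x & y) << 3)) & MASK
        if rnd % 4 == 0:    # small swap
            s[0], s[1], s[2], s[3] = s[1], s[0], s[3], s[2]
        elif rnd % 4 == 2:  # big swap
            s[0], s[1], s[2], s[3] = s[2], s[3], s[0], s[1]
        if rnd % 4 == 0:    # round constant
            s[0] ^= 0x9E377900 ^ rnd
    return s


# Input pattern used in the specification's example.
example_in = [(i * i * i + i * 0x9E3779B9) & MASK for i in range(12)]
example_out = gimli(example_in)
```

The AEAD and hash modes are sponge constructions built on top of this permutation, so the permutation's throughput largely determines the figures reported above.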


19.
Gene expression data usually contain a large number of genes but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminate biological samples of different types. Traditional machine-learning gene selection based on empirical mutual information suffers from data sparseness because of the small number of samples. To overcome this issue, we propose a model-based approach that estimates the entropy of class variables on the model instead of on the data themselves. Here, we use multivariate normal distributions to fit the data, because multivariate normal distributions have maximum entropy among all real-valued distributions with a specified mean and covariance and are widely used to approximate various distributions. Assuming the data follow a multivariate normal distribution, the conditional distribution of the class variables given the selected features is also normal, so its entropy can be computed from the log-determinant of its covariance matrix. Because of the large number of genes, computing all possible log-determinants is inefficient. We propose several algorithms that substantially reduce the computational cost. Experiments on seven gene data sets and comparisons with five other approaches demonstrate the accuracy of the multivariate Gaussian generative model for feature selection and the efficiency of our algorithms.
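The log-determinant formulation can be made concrete. For jointly Gaussian variables, H = ½(d·log(2πe) + log det Σ), and the conditional entropy of the class variables Y given features X follows from H(Y | X) = H(X, Y) − H(X), which avoids forming the conditional covariance explicitly. The sketch below computes log-determinants via a pure-Python Cholesky factorization and adds a hypothetical greedy forward selection that adds the feature giving the largest entropy reduction; the paper's cost-reducing algorithms are not described in the abstract and are not reproduced here.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T; raises ValueError if a is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def logdet(a):
    """log det(a) = 2 * sum(log L_ii) from the Cholesky factor."""
    L = cholesky(a)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(a)))


def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance cov."""
    d = len(cov)
    return 0.5 * (d * math.log(2.0 * math.pi * math.e) + logdet(cov))


def submatrix(cov, idx):
    return [[cov[i][j] for j in idx] for i in idx]


def conditional_entropy(cov, y_idx, x_idx):
    """H(Y | X) = H(X, Y) - H(X) for jointly Gaussian variables."""
    if not x_idx:
        return gaussian_entropy(submatrix(cov, y_idx))
    joint = submatrix(cov, list(x_idx) + list(y_idx))
    return gaussian_entropy(joint) - gaussian_entropy(submatrix(cov, x_idx))


def greedy_select(cov, y_idx, candidates, k):
    """Hypothetical forward selection: repeatedly add the feature minimizing H(Y | X)."""
    selected = []
    for _ in range(k):
        best = min((c for c in candidates if c not in selected),
                   key=lambda c: conditional_entropy(cov, y_idx, selected + [c]))
        selected.append(best)
    return selected
```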

20.
Near-infrared spectroscopy (NIRS) is known to be a suitable technique for rapid fermentation monitoring. Industrial fermentation media are complex, both chemically (ill-defined composition) and physically (multiphase sample matrix), which poses an additional challenge to the development of robust NIRS calibration models. We investigated the use of NIRS for at-line monitoring of clavulanic acid concentration during an industrial fermentation. An industrial strain of Streptomyces clavuligerus was cultivated at 200-L scale for the production of clavulanic acid. Partial least squares (PLS) regression was used to develop calibration models between spectral and analytical data. In this work, two variable selection methods, genetic algorithms (GA) and PLS-bootstrap, were studied and compared with models built using all the spectral variables. Calibration models for clavulanic acid concentration performed well in both internal and external validation. The two variable selection methods improved the predictive ability of the models by up to 20% relative to the calibration model built using the whole spectra.
