首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Neural networks are usually considered as naturally parallel computing models. But the number of operators and the complex connection graph of standard neural models can not be directly handled by digital hardware devices. More particularly, several works show that programmable digital hardware is a real opportunity for flexible hardware implementations of neural networks. And yet many area and topology problems arise when standard neural models are implemented onto programmable circuits such as FPGAs, so that the fast FPGA technology improvements can not be fully exploited. Therefore neural network hardware implementations need to reconcile simple hardware topologies with complex neural architectures. The theoretical and practical framework developed, allows this combination thanks to some principles of configurable hardware that are applied to neural computation: Field Programmable Neural Arrays (FPNA) lead to powerful neural architectures that are easy to map onto FPGAs, thanks to a simplified topology and an original data exchange scheme. This paper shows how FPGAs have led to the definition of the FPNA computation paradigm. Then it shows how FPNAs contribute to current and future FPGA-based neural implementations by solving the general problems that are raised by the implementation of complex neural networks onto FPGAs.  相似文献   

2.
The Reconfigurable Computing Cluster (RCC) project has been investigating unconventional architectures for high end computing using a cluster of FPGA devices connected by a high-speed, custom network. Most applications use the FPGAs to realize an embedded System-on-a-Chip (SoC) design augmented with application-specific accelerators to form a message-passing parallel computer. Other applications take a single accelerator core and tessellate the core across all of the devices, treating them like a large virtual FPGA. The experimental hardware has also been used for basic computer research by emulating novel architectures. This article discusses the genesis of the over-arching project, summarizes results of individual investigations that have been completed, and how this approach may prove useful in the investigation of future Exascale systems.  相似文献   

3.
4.
MOTIVATION: High-resolution mass spectrometers generate large data files that are complex, noisy and require extensive processing to extract the optimal data from raw spectra. This processing is readily achieved in software and is often embedded in manufacturers' instrument control and data processing environments. However, the speed of this data processing is such that it is usually performed off-line, post data acquisition. We have been exploring strategies that would allow real-time advanced processing of mass spectrometric data, making use of the reconfigurable computing paradigm, which exploits the flexibility and versatility of Field Programmable Gate Arrays (FPGAs). This approach has emerged as a powerful solution for speeding up time-critical algorithms. We describe here a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-ToF instruments. The hardware-implemented algorithms for de-noising, baseline correction, peak identification and deisotoping, running on a Xilinx Virtex 2 FPGA at 180 MHz, generate a mass fingerprint over 100 times faster than an equivalent algorithm written in C, running on a Dual 3 GHz Xeon workstation.  相似文献   

5.
A shift from a traditional biogeographical paradigm in cladistic biogeography to a chronobiogeographical paradigm is proposed. The chronobiogeographical paradigm aims to utilize temporal data in conjunction with spatial data in the detection of discrete historical events, such as vicariance and vicariant speciation, in cladograms. The concepts of primary and secondary congruency are introduced in relation to the distinction between repeated area relationships (primary congruency) and common extrinsic causality (secondary congruency). Simple hypothetical examples demonstrate that area cladograms cannot be safely interpreted purely as representing the sequence of area fragmentation; rather, they reflect recency of biotic interaction. Temporal data are shown to have a direct and potentially profound influence on the results of traditional cladistic biogeographical analyses, indicating the necessity of developing a chronobiogeographical approach. The implementation of the paradigm is considered first from a theoretical viewpoint and then in the context of the type of empirical data usually available. An as yet undevised "time/space algorithm" is deemed necessary for the latter, and guidelines are presented for the development of such an algorithm. Finally, we argue that the most rigorous and philosophically justified approach to the detection of phylogenetic causal events can be found only when temporal and spatial data are considered simultaneously. Consequently, the chronobiogeographical paradigm is seen as a logical elaboration of, not a replacement for, the biogeographical paradigm.  相似文献   

6.
A hardware architecture of a Probabilistic Logic Neuron (PLN) is presented. The suggested model facilitates the on-chip learning of pyramidal Weightless Neural Networks using a modified probabilistic search reward/penalty training algorithm. The penalization strategy of the training algorithm depends on a predefined parameter called the probabilistic search interval. A complete Weightless Neural Network (WNN) learning system is modeled and implemented on Xilinx XC4005E Field Programmable Gate Array (FPGA), allowing its architecture to be configurable. Various experiments have been conducted to examine the feasibility and performance of the WNN learning system. Results show that the system has a fast convergence rate and good generalization ability.  相似文献   

7.
Aligning hundreds of sequences using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. We present a new approach to compute multiple sequence alignments in far shorter time using reconfigurable hardware. This results in an implementation of ClustalW with significant runtime savings on a standard off-the-shelf FPGA.  相似文献   

8.
Exhaustive prediction of physicochemical properties of peptide sequences is used in different areas of biological research. One example is the identification of selective cationic antibacterial peptides (SCAPs), which may be used in the treatment of different diseases. Due to the discrete nature of peptide sequences, the physicochemical properties calculation is considered a high-performance computing problem. A competitive solution for this class of problems is to embed algorithms into dedicated hardware. In the present work we present the adaptation, design and implementation of an algorithm for SCAPs prediction into a Field Programmable Gate Array (FPGA) platform. Four physicochemical properties codes useful in the identification of peptide sequences with potential selective antibacterial activity were implemented into an FPGA board. The speed-up gained in a single-copy implementation was up to 108 times compared with a single Intel processor cycle for cycle. The inherent scalability of our design allows for replication of this code into multiple FPGA cards and consequently improvements in speed are possible. Our results show the first embedded SCAPs prediction solution described and constitutes the grounds to efficiently perform the exhaustive analysis of the sequence-physicochemical properties relationship of peptides.  相似文献   

9.
MOTIVATION: Sequence database searching is among the most important and challenging tasks in bioinformatics. The ultimate choice of sequence-search algorithm is that of Smith-Waterman. However, because of the computationally demanding nature of this method, heuristic programs or special-purpose hardware alternatives have been developed. Increased speed has been obtained at the cost of reduced sensitivity or very expensive hardware. RESULTS: A fast implementation of the Smith-Waterman sequence-alignment algorithm using Single-Instruction, Multiple-Data (SIMD) technology is presented. This implementation is based on the MultiMedia eXtensions (MMX) and Streaming SIMD Extensions (SSE) technology that is embedded in Intel's latest microprocessors. Similar technology exists also in other modern microprocessors. Six-fold speed-up relative to the fastest previously known Smith-Waterman implementation on the same hardware was achieved by an optimized 8-way parallel processing approach. A speed of more than 150 million cell updates per second was obtained on a single Intel Pentium III 500 MHz microprocessor. This is probably the fastest implementation of this algorithm on a single general-purpose microprocessor described to date.  相似文献   

10.
Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a 'fingerprint' that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution.  相似文献   

11.
心脏起搏器程控与遥测系统的研究   总被引:1,自引:0,他引:1  
目的通过搭建一个心脏起搏器程控与遥测的实验平台,研究高效的程控遥测方案。方法系统由计算机显示终端、FPGA开发板、无线传输硬件电路构成。讨论除无线传输硬件电路之外的设计:计算机显示终端主要实现发送程控命令和显示遥测数据;FPGA开发板分别模仿程控仪和心脏起搏器,主要实现程控遥测信息的编码、解码和纠错功能。结论本系统提出了一种脉冲位置调制(PPM)编码方式——多位PPM编码方式,用于产生包含信息的脉冲波。该方法可以提高无线通讯的传输效率,若应用于植入式心脏起搏器,可延长其电池的使用寿命。实际测试结果证实了本方案的可行性。  相似文献   

12.
A multiple exposure laser speckle contrast imaging (MELSCI) setup for visualizing blood perfusion was developed using a field programmable gate array (FPGA), connected to a 1000 frames per second (fps) 1‐megapixel camera sensor. Multiple exposure time images at 1, 2, 4, 8, 16, 32 and 64 milliseconds were calculated by cumulative summation of 64 consecutive snapshot images. The local contrast was calculated for all exposure times using regions of 4 × 4 pixels. Averaging of multiple contrast images from the 64‐millisecond acquisition was done to improve the signal‐to‐noise ratio. The results show that with an effective implementation of the algorithm on an FPGA, contrast images at all exposure times can be calculated in only 28 milliseconds. The algorithm was applied to data recorded during a 5 minutes finger occlusion. Expected contrast changes were found during occlusion and the following hyperemia in the occluded finger, while unprovoked fingers showed constant contrast during the experiment. The developed setup is capable of massive data processing on an FPGA that enables processing of MELSCI data in 15.6 fps (1000/64 milliseconds). It also leads to improved frame rates, enhanced image quality and enables the calculation of improved microcirculatory perfusion estimates compared to single exposure time systems.   相似文献   

13.
Vigelius M  Meyer B 《PloS one》2012,7(4):e33384
For many biological applications, a macroscopic (deterministic) treatment of reaction-drift-diffusion systems is insufficient. Instead, one has to properly handle the stochastic nature of the problem and generate true sample paths of the underlying probability distribution. Unfortunately, stochastic algorithms are computationally expensive and, in most cases, the large number of participating particles renders the relevant parameter regimes inaccessible. In an attempt to address this problem we present a genuine stochastic, multi-dimensional algorithm that solves the inhomogeneous, non-linear, drift-diffusion problem on a mesoscopic level. Our method improves on existing implementations in being multi-dimensional and handling inhomogeneous drift and diffusion. The algorithm is well suited for an implementation on data-parallel hardware architectures such as general-purpose graphics processing units (GPUs). We integrate the method into an operator-splitting approach that decouples chemical reactions from the spatial evolution. We demonstrate the validity and applicability of our algorithm with a comprehensive suite of standard test problems that also serve to quantify the numerical accuracy of the method. We provide a freely available, fully functional GPU implementation. Integration into Inchman, a user-friendly web service, that allows researchers to perform parallel simulations of reaction-drift-diffusion systems on GPU clusters is underway.  相似文献   

14.
Peng  Bo  Li  Lei 《Cognitive neurodynamics》2015,9(2):249-256
Wireless sensor network (WSN) are widely used in many applications. A WSN is a wireless decentralized structure network comprised of nodes, which autonomously set up a network. The node localization that is to be aware of position of the node in the network is an essential part of many sensor network operations and applications. The existing localization algorithms can be classified into two categories: range-based and range-free. The range-based localization algorithm has requirements on hardware, thus is expensive to be implemented in practice. The range-free localization algorithm reduces the hardware cost. Because of the hardware limitations of WSN devices, solutions in range-free localization are being pursued as a cost-effective alternative to more expensive range-based approaches. However, these techniques usually have higher localization error compared to the range-based algorithms. DV-Hop is a typical range-free localization algorithm utilizing hop-distance estimation. In this paper, we propose an improved DV-Hop algorithm based on genetic algorithm. Simulation results show that our proposed algorithm improves the localization accuracy compared with previous algorithms.  相似文献   

15.
We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused into the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high-performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, a many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x than the original Clustal W algorithm, and more than 7x than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification/traceability), including the protected designation of origin, among other applications.  相似文献   

16.
A special-purpose processor for gene sequence analysis   总被引:1,自引:1,他引:0  
Advances in computational biology have occurred primarily inthe areas of software and algorithm development; new designsof hardware to support biological computing are extremely scarce.This is due, we believe, to the presence of a non-trivial knowledgegap between molecular biologists and computer designers. Theexistence of this gap is unfortunate, as it has long been knownthat for certain problems, special-purpose computers can achievesignificant cost/performance gains over general-purpose machines.We describe one such computer here: a custom accelerator forgene sequence analysis. The accelerator implements a versionof the Needleman – Wunsch algorithm for nucleotide sequencealignment. Sequence lengths are constrained only by availablememory; the product of sequence lengths in the current implementationcan be up to 222. The machine is implemented as two NuBus boardsconnected to a Mac IIf/x, using a mixture of TTL and FPGA technologyclocked at 10 MHz. The boards are completely functional, andyield a 15-fold performance improvement over an unassisted host.  相似文献   

17.

Background

Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein database. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design.

Results

In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330.

Conclusions

The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-propose CPUs.
  相似文献   

18.
Díaz J  Ros E  Sabatini SP  Solari F  Mota S 《Bio Systems》2007,87(2-3):314-321
A simple and fast technique for depth estimation based on phase measurement has been adopted for the implementation of a real-time stereo system with sub-pixel resolution on an FPGA device. The technique avoids the attendant problem of phase warping. The designed system takes full advantage of the inherent processing parallelism and segmentation capabilities of FPGA devices to achieve a computation speed of 65megapixels/s, which can be arranged with a customized frame-grabber module to process 211frames/s at a size of 640x480 pixels. The processing speed achieved is higher than conventional camera frame rates, thus allowing the system to extract multiple estimations and be used as a platform to evaluate integration schemes of a population of neurons without increasing hardware resource demands.  相似文献   

19.

Background

Principal component analysis (PCA) has been widely employed for automatic neuronal spike sorting. Calculating principal components (PCs) is computationally expensive, and requires complex numerical operations and large memory resources. Substantial hardware resources are therefore needed for hardware implementations of PCA. General Hebbian algorithm (GHA) has been proposed for calculating PCs of neuronal spikes in our previous work, which eliminates the needs of computationally expensive covariance analysis and eigenvalue decomposition in conventional PCA algorithms. However, large memory resources are still inherently required for storing a large volume of aligned spikes for training PCs. The large size memory will consume large hardware resources and contribute significant power dissipation, which make GHA difficult to be implemented in portable or implantable multi-channel recording micro-systems.

Method

In this paper, we present a new algorithm for PCA-based spike sorting based on GHA, namely stream-based Hebbian eigenfilter, which eliminates the inherent memory requirements of GHA while keeping the accuracy of spike sorting by utilizing the pseudo-stationarity of neuronal spikes. Because of the reduction of large hardware storage requirements, the proposed algorithm can lead to ultra-low hardware resources and power consumption of hardware implementations, which is critical for the future multi-channel micro-systems. Both clinical and synthetic neural recording data sets were employed for evaluating the accuracy of the stream-based Hebbian eigenfilter. The performance of spike sorting using stream-based eigenfilter and the computational complexity of the eigenfilter were rigorously evaluated and compared with conventional PCA algorithms. Field programmable logic arrays (FPGAs) were employed to implement the proposed algorithm, evaluate the hardware implementations and demonstrate the reduction in both power consumption and hardware memories achieved by the streaming computing

Results and discussion

Results demonstrate that the stream-based eigenfilter can achieve the same accuracy and is 10 times more computationally efficient when compared with conventional PCA algorithms. Hardware evaluations show that 90.3% logic resources, 95.1% power consumption and 86.8% computing latency can be reduced by the stream-based eigenfilter when compared with PCA hardware. By utilizing the streaming method, 92% memory resources and 67% power consumption can be saved when compared with the direct implementation of GHA.

Conclusion

Stream-based Hebbian eigenfilter presents a novel approach to enable real-time spike sorting with reduced computational complexity and hardware costs. This new design can be further utilized for multi-channel neuro-physiological experiments or chronic implants.  相似文献   

20.
目的:本文研究了基于现场可编程门阵列(Field Programmable Gate Array,FPGA)超声成像系统中数字动态滤波器的实现方法和过程。方法:动态滤波器中FIR滤波器采用分布式算法(Distributed Arithmetic,DA)实现结构,并在应用中对DA算法进行了改进,包括数据并行处理结构的设计、对查找表(Look Up Table,LUT)输入字长N大小的控制和具有对称系数的FIR滤波器的采用。改进后的DA实现在FPGA资源占用和处理速度之间达到了平衡。同时,结合多级流水线结构,动态滤波器实现了数字超声信号并行处理。结果:采用常值滤波器(远场匹配参数)进行滤波后,超声回波图像远场分辨率达到了要求,但越靠近近场效果越差。相比之下,本文设计的基于FPGA超声信号动态数字滤波器达到了很好的滤波效果,使回声图像近场和远场都有最佳分辨率。结论:利用FPGA实现超声系统中动态滤波器是完全可行的,并且有助于提高系统的稳定性和可靠性,并大大减低系统成本。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号