Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands. Parallelization must be employed to maintain acceptable running times. Graphics Processing Units (GPUs) have been demonstrated to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper by Xu et al. (2010), a GPU implementation strategy was presented that obtains a speedup of an order of magnitude over a previously proposed GPU-based electron tomography implementation. In this technical note, we demonstrate that by making alternative design decisions in the GPU implementation, an additional speedup can be obtained, again of an order of magnitude. By carefully considering memory access locality when dividing the workload among blocks of threads, the GPU’s cache is used more efficiently, making more effective use of the available memory bandwidth.

2.
The question as to whether cultures evolve in a manner analogous to that of genetic evolution can be addressed by attempting to reconstruct population histories using cultural data. As others have argued, this can only succeed if cultures are isolated enough to maintain and pass on a central core of traditions that can be modified over time. In this study we used a set of cultural data (canoe design traits from Polynesia) to look for the kinds of patterns and relationships normally found in population genetic studies. After developing new techniques to accommodate the peculiarities of cultural data, we were able to infer an ancestral region (Fiji) and a sequence of cultural origins for these Polynesian societies. In addition, we found evidence of cultural exchange, migration and a serial founder effect. Results were stronger when analyses were based on functional traits (presumably subject to natural selection and convergence) rather than symbolic or stylistic traits (probably subject to cultural selection for rapid divergence). These patterns strongly suggest that cultural evolution, while clearly affected by cultural exchange, is also subject to some of the same processes and constraints as genetic evolution.

3.
Tensor contractions are generalized multidimensional matrix multiplication operations that occur widely in quantum chemistry. Efficient execution of tensor contractions on Graphics Processing Units (GPUs) requires several challenges to be addressed, including index permutation and small dimension sizes that reduce thread-block utilization. Moreover, to apply the same optimizations to various expressions, a code generation tool is needed. In this paper, we present our approach to automatically generating CUDA code to execute tensor contractions on GPUs, including management of data movement between CPU and GPU. To evaluate our tool, GPU-enabled code is generated for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporated into NWChem, a popular computational chemistry suite. For this method, we demonstrate a speedup of over a factor of 8.4 using one GPU compared to one CPU core, and of over 2.6 for the entire system when using a hybrid CPU+GPU solution with 2 GPUs and 5 cores per node (instead of 7 cores per node). We further investigate tensor contraction code on a new series of GPUs, the Fermi GPUs, and provide several effective optimization algorithms. For the same CCSD(T) computation, a cluster with Fermi GPUs achieves a speedup of 3.4 over a cluster with T10 GPUs. With a single Fermi GPU on each node, we achieve a speedup of 43 over the sequential CPU version.
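The kind of contraction described above can be sketched with NumPy's einsum; the index names and tensor shapes below are invented for illustration and are not taken from NWChem's actual CCSD(T) kernels.

```python
import numpy as np

# Illustrative contraction of the kind that arises in coupled cluster
# methods: C[h3,h4,p1,p2] = sum over h1,p5 of A[h1,h3,p1,p5] * B[h4,p5,h1,p2].
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6, 7))   # indices (h1, h3, p1, p5)
B = rng.standard_normal((3, 7, 4, 2))   # indices (h4, p5, h1, p2)

# einsum contracts the repeated indices (h1, p5) and permutes the output --
# the two operations a CUDA code generator must fuse efficiently.
C = np.einsum('abcd,edaf->becf', A, B)  # result indices (h3, h4, p1, p2)

# Reference computation with explicit loops (slow, for verification only).
C_ref = np.zeros((5, 3, 6, 2))
for h1 in range(4):
    for h3 in range(5):
        for p1 in range(6):
            for p5 in range(7):
                for h4 in range(3):
                    for p2 in range(2):
                        C_ref[h3, h4, p1, p2] += A[h1, h3, p1, p5] * B[h4, p5, h1, p2]

assert np.allclose(C, C_ref)
```

The subscript string makes the index permutation explicit, which is exactly the part that is awkward to generate and optimize by hand on a GPU.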

4.
Parallel hash-based EST clustering algorithm for gene sequencing
EST clustering is a simple yet effective method to discover all the genes present in a variety of species. Although using ESTs is a cost-effective approach to gene discovery, the amount of data, and hence the computational resources required, make it a very challenging problem. Time and storage requirements for EST clustering are prohibitively expensive: existing tools have quadratic time complexity resulting from all-against-all sequence comparisons, and with the rapid growth of EST data we need better and faster clustering tools. In this paper, we present HECT (Hash-based EST Clustering Tool), a novel time- and memory-efficient algorithm for EST clustering. We report that HECT can cluster a 10,000-sequence human EST dataset (also used in benchmarking d2_cluster) in 207 minutes on a 1 GHz Pentium III processor, which is 36 times faster than the original d2_cluster algorithm. A parallel version of HECT (PECT) was also developed and used to cluster 269,035 soybean EST sequences on an IA-32 Linux cluster at the National Center for Supercomputing Applications at UIUC. The parallel algorithm exhibited excellent speedup over its sequential counterpart, and its memory requirements are almost negligible, making it suitable for virtually any data size. The performance of the proposed clustering algorithms is compared against other known clustering techniques and the results are reported in the paper.
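The hash-based idea can be sketched as follows: bucket sequences by shared k-mers and merge buckets with a union-find structure. This is a minimal illustration of hash-based clustering in general, not HECT's actual data structures or heuristics.

```python
from collections import defaultdict

def kmer_cluster(seqs, k=8):
    """Cluster sequences that share at least one exact k-mer.

    A minimal sketch: hashing k-mers means only sequences that land in
    the same bucket are ever related, avoiding all-against-all comparison.
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    buckets = defaultdict(list)            # k-mer -> list of sequence ids
    for idx, s in enumerate(seqs):
        for pos in range(len(s) - k + 1):
            buckets[s[pos:pos + k]].append(idx)

    for ids in buckets.values():           # merge all sequences in a bucket
        for other in ids[1:]:
            union(ids[0], other)

    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())
```

Bucketing is linear in total sequence length, and merging touches only sequences that actually share a k-mer, which is what breaks the quadratic all-against-all barrier.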

5.
In recent times, improvements in smart mobile devices have led to new functionalities built on their embedded positioning capabilities. Many applications that use positioning data have been introduced and are widely used. However, the positioning data acquired by such devices are prone to erroneous values caused by environmental factors. In this research, a detection algorithm with several configurable options is implemented to detect erroneous records in a continuous positioning data set. Our algorithm is based on a moving window over speed values derived from consecutive positioning records. The moving average of the speed and its standard deviation within the window together compose a moving significant interval at a given time, which is used, along with other parameters, to detect erroneous positioning data by checking each newly obtained speed value. To fulfill the designated operation, we examine the physical parameters and also determine the parameters of the moving windows. Along with the detection of erroneous speed data, estimation of the correct positions is presented: the proposed algorithm first estimates the speed, then the correct positions, and removes the effect of errors on the moving window statistics in order to maintain accuracy. Experimental verification of our algorithm is presented in various ways. We hope that our approach can help other researchers with positioning applications and human mobility research.
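A minimal sketch of the windowed detection idea, assuming a trailing window and a mean ± k·std interval; the paper's actual parameter choices and its position-correction step are not reproduced here.

```python
import math

def flag_speed_outliers(speeds, window=5, k=3.0):
    """Flag speed samples falling outside mean +/- k*std of a trailing window.

    Flagged samples are excluded from the window statistics, so one
    erroneous value does not distort the intervals used for later samples.
    """
    kept, flags = [], []
    for v in speeds:
        recent = kept[-window:]
        if len(recent) < window:
            flags.append(False)          # not enough history yet
            kept.append(v)
            continue
        mean = sum(recent) / window
        std = math.sqrt(sum((x - mean) ** 2 for x in recent) / window)
        bad = abs(v - mean) > k * std + 1e-9
        flags.append(bad)
        if not bad:
            kept.append(v)               # only clean samples update the stats
    return flags
```

A spike of 50 in a stream of speeds near 10 is flagged, while the following normal value is accepted because the spike never entered the window.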

6.
In order to accelerate computing the convex hull of a set of n points, a heuristic procedure is often applied first to reduce the n points to a set of s points, s ≤ n, that contains the same hull. We present an algorithm to precondition 2D data with integer coordinates bounded by a box of size p × q before building a 2D convex hull, with three distinct advantages. First, we prove that under the condition min(p, q) ≤ n the algorithm executes in time within O(n); second, no explicit sorting of the data is required; and third, the reduced set of s points forms a simple polygonal chain and thus can be piped directly into an O(n)-time convex hull algorithm. This paper empirically evaluates and quantifies the speedup gained by preconditioning a set of points with a method based on the proposed algorithm before running common convex hull algorithms to build the final hull. A speedup factor of at least four is consistently found in experiments on various datasets when the condition min(p, q) ≤ n holds; the smaller the ratio min(p, q)/n in the dataset, the greater the speedup factor achieved.
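One plausible O(n) preconditioning in this spirit (not necessarily the paper's exact algorithm) keeps, for each integer x column, only the lowest and highest point: every hull vertex has the minimum or maximum y among points sharing its x, so the reduction cannot discard a hull vertex, and the survivors can be walked as a simple polygonal chain.

```python
def precondition(points):
    """Keep, for each x column, only the lowest and highest point.

    Returns the survivors ordered left to right along the upper extremes
    and back along the lower ones, forming a simple polygonal chain that
    still contains every vertex of the original convex hull.
    """
    lo, hi = {}, {}
    for x, y in points:
        if x not in lo or y < lo[x]:
            lo[x] = y
        if x not in hi or y > hi[x]:
            hi[x] = y
    xs = sorted(lo)
    return ([(x, hi[x]) for x in xs] +
            [(x, lo[x]) for x in reversed(xs) if lo[x] != hi[x]])
```

With at most two survivors per column, the reduced set has at most 2(p + 1) points, matching the intuition that the speedup grows as min(p, q)/n shrinks.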

7.
It has been known for some time that the variability of the R-R intervals in the electrocardiogram yields valuable information concerning the types of arrhythmia which might be present. In this paper, an investigation is made into the application of zero-crossing analysis to the study of such variability. The number of times that the R-R interval crosses its mean value over a specified interval of time is counted, and may be associated with a particular characteristic frequency, related to the dominant frequency components of the power spectrum of R-R intervals. Higher order crossing counts may be computed by taking combinations of sum and difference operations on the original time series. The advantage of using zero-crossing analysis over spectral analysis is the computational simplicity of the former. It is demonstrated, by analysing data taken from the MIT-BIH Arrhythmia database, that zero crossing analysis can sometimes be used to distinguish between different arrhythmias, but forethought concerning the number of sum and difference operations to be taken on the original data set is required when computing the higher order crossing counts.
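The basic crossing count and the difference-based higher-order counts can be sketched as follows; this is a minimal illustration of the technique, omitting the sum (low-pass) operators the abstract also mentions.

```python
import numpy as np

def crossing_count(x):
    """Count crossings of the series about its own mean."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    signs = np.sign(centered)
    return int(np.sum(signs[1:] * signs[:-1] < 0))

def higher_order_crossings(x, orders=3):
    """Crossing counts after repeated differencing.

    Differencing acts as a high-pass filter, so the sequence of counts
    reflects the dominant frequency content of the series, much as the
    abstract describes for R-R interval data.
    """
    counts, series = [], np.asarray(x, dtype=float)
    for _ in range(orders):
        counts.append(crossing_count(series))
        series = np.diff(series)   # difference operator raises the order
    return counts
```

On an alternating (highest-frequency) series of length 6 the counts come out as [5, 4, 3], near the maximum, while a monotone ramp yields a single crossing: the counts track the dominant frequency.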

8.
In the present study, an artificial neural network was trained with the Stuttgart Neural Network Simulator to identify Corynebacterium species by analyzing their pyrolysis patterns. An earlier study described the combination of pyrolysis, gas chromatography and atomic emission detection we used on whole-cell bacteria; carbon, sulfur and nitrogen were detected in the pyrolysis compounds. Pyrolysis patterns were obtained from 52 Corynebacterium strains belonging to 5 closely related species. These data had previously been analyzed by Euclidean distance calculation followed by the unweighted pair group method with arithmetic mean (UPGMA), a clustering method. With that earlier method, strains from 3 of the 5 species (C. xerosis, C. freneyi and C. amycolatum) were correctly characterized, although the 29 strains of C. amycolatum were split into 2 subgroups, while strains from the 2 remaining species (C. minutissimum and C. striatum) could not be separated. To build an artificial neural network able to discriminate the 5 species, the pyrolysis data of 42 selected strains were used as the learning set and the 10 remaining strains as the testing set. The chosen learning algorithm was back-propagation with momentum. The parameters used to train a correct network are described here and the results analyzed. The resulting artificial neural network has the following cone-shaped structure: 144 input nodes, 25 and 9 nodes in 2 successive hidden layers, and 5 output nodes. It classified all the strains into their species groups. This network completes a chemotaxonomic method for Corynebacterium identification.
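The named learning rule, back-propagation with momentum, updates each weight with a velocity term that accumulates an exponentially decaying sum of past gradients. A minimal sketch of the update on a one-dimensional quadratic (the network sizes and simulator settings above are not reproduced):

```python
def momentum_step(w, grad, velocity, lr=0.05, mu=0.9):
    """One gradient-descent update with momentum: the velocity smooths
    successive gradients, damping oscillation and accelerating descent."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    g = 2.0 * (w - 3.0)
    w, v = momentum_step(w, g, v)
```

In a real network the same update is applied per weight, with the gradient supplied by back-propagation.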

9.
Desktop grids (DGs) offer large amounts of computing power drawn from Internet-based volunteer networks, but they suffer from the free-riding phenomenon: users may consume resources donated by others while donating none of their own. In this paper, we present PGTrust, our decentralized free-riding prevention model designed for PastryGrid, a decentralized DG system that manages resources over a decentralized P2P network. PGTrust relies on the notion of a score, a reputation metric used to evaluate the QoS level of a peer. We conducted experiments on the Grid’5000 testbed, and the results demonstrate the benefits of our free-riding prevention model: PGTrust improves application running time by discouraging free riders and motivating selfish peers to contribute, offering a considerable speedup for distributed applications.

10.
Convergence in nucleotide composition (CNC) in unrelated lineages is a factor potentially affecting the performance of most phylogeny reconstruction methods. Such convergence has deleterious effects because unrelated lineages show similarities due to similar nucleotide compositions and not shared histories. While some methods (such as the LogDet/paralinear distance measure) avoid this pitfall, the amount of convergence in nucleotide composition necessary to deceive other phylogenetic methods has never been quantified. We examined analytically the relationship between convergence in nucleotide composition and the consistency of parsimony as a phylogenetic estimator for four taxa. Our results show that rather extreme amounts of convergence are necessary before parsimony begins to prefer the incorrect tree. Ancillary observations are that (for unweighted Fitch parsimony) transition/transversion bias contributes to the impact of CNC and, for a given amount of CNC and fixed branch lengths, data sets exhibiting substantial site-to-site rate heterogeneity present fewer difficulties than data sets in which rates are homogeneous. We conclude by reexamining a data set originally used to illustrate the problems caused by CNC. Using simulations, we show that in this case the convergence in nucleotide composition alone is insufficient to cause any commonly used methods to fail, and accounting for other evolutionary factors (such as site-to-site rate heterogeneity) can give a correct inference without accounting for CNC.

11.

Background

Clinical decision support systems can effectively overcome the limitations of doctors’ knowledge and reduce the possibility of misdiagnosis, enhancing health care. Traditional genetic data storage and analysis methods based on a stand-alone environment, however, can hardly meet the computational requirements imposed by rapid genetic data growth because of their limited scalability.

Methods

In this paper, we propose a distributed gene clinical decision support system named GCDSS, and implement a prototype based on cloud computing technology. We also present CloudBWA, a novel distributed read mapping algorithm that leverages a batch processing strategy to map reads on Apache Spark.

Results

Experiments show that the distributed gene clinical decision support system GCDSS and the distributed read mapping algorithm CloudBWA have outstanding performance and excellent scalability. Compared with state-of-the-art distributed algorithms, CloudBWA achieves up to 2.63 times speedup over SparkBWA. Compared with stand-alone algorithms, CloudBWA with 16 cores achieves up to 11.59 times speedup over BWA-MEM with 1 core.

Conclusions

GCDSS is a distributed gene clinical decision support system based on cloud computing techniques. In particular, we incorporated a distributed genetic data analysis pipeline framework into the proposed GCDSS system. To boost the data processing of GCDSS, we propose CloudBWA, a novel distributed read mapping algorithm that leverages a batch processing technique in the mapping stage on the Apache Spark platform.

12.
Klionsky DJ, Kumar A. Autophagy. 2006;2(1):12-23
With its relevance to our understanding of eukaryotic cell function in the normal and disease state, autophagy is an important topic in modern cell biology; yet, few textbooks discuss autophagy beyond a two- or three-sentence summary. Here, we report an undergraduate/graduate class lesson for the in-depth presentation of autophagy using an active learning approach. By our method, students will work in small groups to solve problems and interpret an actual data set describing genes involved in autophagy. The problem-solving exercises and data set analysis will instill within the students a much greater understanding of the autophagy pathway than can be achieved by simple rote memorization of lecture materials; furthermore, the students will gain a general appreciation of the process by which data are interpreted and eventually formed into an understanding of a given pathway. As the data sets used in these class lessons are largely genomic and complementary in content, students will also understand first-hand the advantage of an integrative or systems biology study: no single data set can be used to define the pathway in full; the information from multiple complementary studies must be integrated in order to recapitulate our present understanding of the pathways mediating autophagy. In total, our teaching methodology offers an effective presentation of autophagy as well as a general template for the discussion of nearly any signaling pathway within the eukaryotic kingdom.

13.
We present several qualitative mathematical models to describe the early evolution of water transport systems in plants. To do this systematically, we apply methods developed in phenomenological synergetics, which rest on the fact that the macroscopic behavior of a complex system can be described by a set of suitably identified control and order parameters. Our presentation is addressed to a community with interdisciplinary interests.

14.
15.
Autophagy. 2013;9(1):12-23
With its relevance to our understanding of eukaryotic cell function in the normal and disease state, autophagy is an important topic in modern cell biology; yet, few textbooks discuss autophagy beyond a two- or three-sentence summary. Here, we report an undergraduate/graduate class lesson for the in-depth presentation of autophagy using an active learning approach. By our method, students will work in small groups to solve problems and interpret an actual data set describing genes involved in autophagy. The problem-solving exercises and data set analysis will instill within the students a much greater understanding of the autophagy pathway than can be achieved by simple rote memorization of lecture materials; furthermore, the students will gain a general appreciation of the process by which data are interpreted and eventually formed into an understanding of a given pathway. As the data sets used in these class lessons are largely genomic and complementary in content, students will also understand first-hand the advantage of an integrative or systems biology study: no single data set can be used to define the pathway in full; the information from multiple complementary studies must be integrated in order to recapitulate our present understanding of the pathways mediating autophagy. In total, our teaching methodology offers an effective presentation of autophagy as well as a general template for the discussion of nearly any signaling pathway within the eukaryotic kingdom.

16.
Operons, co-transcribed and co-regulated contiguous sets of genes, are poorly conserved over short periods of evolutionary time. The gene order, gene content and regulatory mechanisms of operons can be very different, even in closely related species. Here, we present several lines of evidence which suggest that, although an operon and its individual genes and regulatory structures are rearranged when comparing the genomes of different species, this rearrangement is a conservative process. Genomic rearrangements invariably maintain individual genes in very specific functional and regulatory contexts. We call this conserved context an uber-operon.

17.
K-ary clustering with optimal leaf ordering for gene expression data
MOTIVATION: A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering, which allows a user to view relationships at scales ranging from single genes to large sets of genes while providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, usually lacks a method to actually identify distinct clusters, and produces a large number of possible leaf orderings of the hierarchical clustering tree. In this paper we propose a new hierarchical clustering algorithm which reduces susceptibility to noise, permits up to k siblings to be directly related, and provides a single optimal order for the resulting tree. RESULTS: We present an algorithm that efficiently constructs a k-ary tree, where each node can have up to k children, and then optimally orders the leaves of that tree. By combining k clusters at each step our algorithm becomes more robust against noise and missing values. By optimally ordering the leaves of the resulting tree we maintain the pairwise relationships that appear in the original method, without sacrificing robustness. Our k-ary construction algorithm runs in O(n^3) regardless of k, and our ordering algorithm runs in O(4^k n^3). We present several examples showing that our k-ary clustering algorithm achieves results superior to the binary tree results in both global presentation and cluster identification. AVAILABILITY: We have implemented the above algorithms in C++ on the Linux operating system.

18.
Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be efficiently scanned against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/.
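Profile-based filtering of the kind an SSP builds on can be sketched with a plain position weight matrix scan; the secondary structure component of an SSP is not modelled here, and the matrix below is a made-up example.

```python
import math

def pwm_scan(seq, pwm, threshold):
    """Slide a position weight matrix over seq, keeping windows scoring at
    or above threshold. In a filtering pipeline, only these surviving
    windows would be passed to the expensive covariance model search."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j].get(seq[i + j], -math.inf) for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Toy 3-column matrix favouring the motif "ACG" (scores are invented).
pwm = [{'A': 1.0, 'C': -1.0, 'G': -1.0, 'T': -1.0},
       {'A': -1.0, 'C': 1.0, 'G': -1.0, 'T': -1.0},
       {'A': -1.0, 'C': -1.0, 'G': 1.0, 'T': -1.0}]
hits = pwm_scan("TTACGTT", pwm, threshold=3.0)
```

The scan is linear in sequence length, which is what makes profile filters cheap enough to run genome-wide before the CM.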

19.
Biological systems often display modularity, in the sense that they can be decomposed into nearly independent subsystems. Recent studies have suggested that modular structure can spontaneously emerge if goals (environments) change over time, such that each new goal shares the same set of sub-problems with previous goals. Such modularly varying goals can also dramatically speed up evolution, relative to evolution under a constant goal. These studies were based on simulations of model systems, such as logic circuits and RNA structure, which are generally not easy to treat analytically. We present, here, a simple model for evolution under modularly varying goals that can be solved analytically. This model helps to understand some of the fundamental mechanisms that lead to rapid emergence of modular structure under modularly varying goals. In particular, the model suggests a mechanism for the dramatic speedup in evolution observed under such temporally varying goals.

20.
Two-part regression models are frequently used to analyze longitudinal count data with excess zeros, where the same set of subjects is repeatedly observed over time. In this context, several sources of individual-level heterogeneity may affect the observed process. Further, longitudinal studies often suffer from missing values: individuals drop out of the study before its completion and thus present incomplete data records. In this paper, we propose a finite mixture of hurdle models to address the heterogeneity problem, which is handled by introducing random effects with a discrete distribution; a pattern-mixture approach is specified to deal with non-ignorable missing values. This approach lets us accommodate overdispersed counts while allowing for association between the two parts of the model and for non-ignorable dropouts. The effectiveness of the proposal is tested through a simulation study. Finally, an application to real data on skin cancer is provided.
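The two-part (hurdle) structure can be sketched by simulation: a Bernoulli draw decides between a structural zero and a zero-truncated Poisson count. This illustrates only the basic hurdle mechanism, not the paper's finite mixture, random effects, or dropout components.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's simple Poisson sampler (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def sample_hurdle(p_zero, lam, rng):
    """One draw from a hurdle model: a structural zero with probability
    p_zero, otherwise a zero-truncated Poisson(lam) count obtained by
    rejecting zero draws."""
    if rng.random() < p_zero:
        return 0
    while True:
        k = poisson(lam, rng)
        if k > 0:
            return k

rng = random.Random(42)
draws = [sample_hurdle(0.4, 2.0, rng) for _ in range(2000)]
```

Because the positive part is zero-truncated, the zero fraction is governed entirely by the first part of the model, which is exactly what makes the two parts separable in estimation.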


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号