Found 20 similar articles (search time: 0 ms)
1.
ABSTRACT: BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel. RESULTS: We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. An NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS. CONCLUSIONS: This accelerated version of PaPaRa (available at www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the "competing programmer approach" is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.
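The GCUPS figures quoted in this abstract count dynamic programming cell updates per second. As a rough illustration only, the sketch below scores one read against one reference with a plain Needleman-Wunsch recurrence and counts the cells it fills. This is not PaPaRa's profile-based kernel, and the names `align_score` and `gcups` and the scoring parameters are assumptions made for this example.

```python
def align_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Global alignment score (linear gap cost) and number of DP cells updated.

    Illustrative only: the scoring values are arbitrary assumptions.
    """
    prev = [j * gap for j in range(len(ref) + 1)]  # DP row for the empty read prefix
    cells = 0
    for i in range(1, len(read) + 1):
        curr = [i * gap]  # first column: read prefix aligned to gaps only
        for j in range(1, len(ref) + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align the two characters
                            prev[j] + gap,       # gap in the reference
                            curr[j - 1] + gap))  # gap in the read
            cells += 1
        prev = curr
    return prev[-1], cells


def gcups(cells, seconds):
    """Giga cell updates per second for a run that updated `cells` cells."""
    return cells / seconds / 1e9


score, cells = align_score("ACGT", "AGT")
print(score, cells)
```

With this definition, a run that updates 18.1e9 cells in one second corresponds to 18.1 GCUPS, the peak CPU figure reported above.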
2.
3.
In this paper we introduce methods to build a SOM that can be used as an isometric map for mobile robots. That is, given a dataset of sensor readings collected at points uniformly distributed with respect to the ground, we wish to build a SOM whose neurons (prototype vectors in sensor space) correspond to points uniformly distributed on the ground. Manifold learning techniques have already been used for dimensionality reduction of sensor space in navigation systems. Our focus is on the isometric property of the SOM. For reliable path-planning and information sharing between several robots, it is desirable that the robots build an internal representation of the sensor manifold, a map, that is isometric with the environment. We show experimentally that standard Non-Linear Dimensionality Reduction (NLDR) algorithms do not provide isometric maps for range data and bearing data. However, the auxiliary low dimensional manifolds created can be used to improve the distribution of the neurons of a SOM (that is, make the neurons more evenly distributed with respect to the ground). We also describe a method to create an isometric map from sensor readings collected along a polygonal line random walk.
4.
Lila Rami Patrick Auguste Noélie B. Thebaud Reine Bareille Richard Daculsi Jean Ripoche Laurence Bordenave 《PloS one》2013,8(11)
Shear stress is one of the mechanical constraints exerted by blood flow on endothelial cells (ECs). To adapt to shear stress, ECs align in the direction of flow through adherens junction (AJ) remodeling. However, the mechanisms regulating EC alignment under shear stress are poorly understood. IQ domain GTPase activating protein 1 (IQGAP1) is a scaffold protein which couples cell signaling to the actin and microtubule cytoskeletons and is involved in cell migration and adhesion. IQGAP1 also plays a role in AJ organization in epithelial cells. In this study, we investigated the potential involvement of IQGAP1 in endothelial cell alignment under shear stress. Progenitor-derived endothelial cells (PDECs), transfected (or not) with IQGAP1 small interfering RNA, were exposed to a laminar shear stress (1.2 N/m²), and AJ proteins (VE-cadherin and β-catenin) and IQGAP1 were labeled by immunofluorescence. We show that IQGAP1 is essential for EC alignment under shear stress. We studied the role of IQGAP1 in AJ remodeling of PDECs exposed to shear stress by studying cell localization and IQGAP1 interactions with VE-cadherin and β-catenin by immunofluorescence and Proximity Ligation Assays. In static conditions, IQGAP1 interacts with VE-cadherin but not with β-catenin at the cell membrane. Under shear stress, IQGAP1 shifted its interaction from VE-cadherin to β-catenin. This “switch” was concomitant with the loss of β-catenin/VE-cadherin interaction at the cell membrane. This work shows that IQGAP1 is essential to EC alignment under shear stress and that AJ remodeling represents one of the mechanisms involved. These results provide a new approach to understanding EC alignment under shear stress.
5.
The dimension of the population genetics data produced by next-generation sequencing platforms is extremely high. However, the "intrinsic dimensionality" of sequence data, which determines the structure of populations, is much lower. This motivates us to use locally linear embedding (LLE), which projects high-dimensional genomic data into a low-dimensional, neighborhood-preserving embedding, as a general framework for population structure and historical inference. To facilitate application of the LLE to population genetic analysis, we systematically investigate several important properties of the LLE and reveal the connection between the LLE and principal component analysis (PCA). Identifying a set of markers and genomic regions which could be used for population structure analysis will provide invaluable information for population genetics and association studies. In addition to identifying the LLE-correlated or PCA-correlated structure-informative markers, we have developed a new statistic that integrates genomic information content in a genomic region for collectively studying its association with the population structure, and a LASSO algorithm to search for such regions across the genomes. We applied the developed methodologies to a low coverage pilot dataset in the 1000 Genomes Project and a PHASE III Mexico dataset of the HapMap. We observed that 25.1%, 44.9% and 21.4% of the common variants and 89.2%, 92.4% and 75.1% of the rare variants were the LLE-correlated markers in CEU, YRI and ASI, respectively. This showed that rare variants, which are often private to specific populations, have much higher power to identify population substructure than common variants. The preliminary results demonstrated that next generation sequencing offers a rich resource and that LLE provides a powerful tool for population structure analysis.
6.
Roytberg Mikhail Gambin Anna Noe Laurent Lasota Slawomir Furletova Eugenia Szczurek Ewa Kucherov Gregory 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(3):483-494
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.
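To make the idea of a seed alphabet concrete, here is a minimal sketch in which each seed letter is modeled as a partition of the amino acids, and two residues match under a letter when they fall in the same group. The letters `#`, `@` and `_`, the residue groupings, and the function names are all hypothetical choices for illustration. They are not the alphabets designed in the paper, and the paper's formalism may be more general than partitions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical seed letters, each defined by a partition of the amino acids.
EXACT = [{aa} for aa in AMINO_ACIDS]               # identity only
GROUPED = [set("ILMV"), set("FWY"), set("KRH"),    # made-up similarity groups
           set("DE"), set("ST"), set("NQ"),
           {"A"}, {"C"}, {"G"}, {"P"}]
ANY = [set(AMINO_ACIDS)]                           # wildcard: every pair matches
SEED_ALPHABET = {"#": EXACT, "@": GROUPED, "_": ANY}


def letter_accepts(letter, a, b):
    """True if residues a and b lie in the same group of the letter's partition."""
    return any(a in group and b in group for group in SEED_ALPHABET[letter])


def seed_hits(seed, s, t):
    """Offsets i where the seed accepts s[i:i+len(seed)] against t[i:i+len(seed)].

    The two sequences are compared position by position (ungapped).
    """
    n = min(len(s), len(t))
    hits = []
    for i in range(n - len(seed) + 1):
        if all(letter_accepts(c, s[i + k], t[i + k]) for k, c in enumerate(seed)):
            hits.append(i)
    return hits


print(seed_hits("#@_#", "MKVLA", "MRALA"))
```

Because each letter only needs a group lookup per position, a seed of this kind is cheap to test, which reflects the "less costly to implement" trade-off the abstract mentions.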
7.
Stanislav Kadlčík Tomáš Kučera Dominika Chalupská Radek Gažák Markéta Koběrská Dana Ulanová Jan Kopecky Eva Kutejová Lucie Najmanová Jiří Janata 《PloS one》2013,8(12)
Clinically used lincosamide antibiotic lincomycin incorporates in its structure 4-propyl-L-proline (PPL), an unusual amino acid, while celesticetin, a less efficient related compound, makes use of proteinogenic L-proline. Biochemical characterization, as well as phylogenetic analysis and homology modelling combined with molecular dynamics simulation, were employed for complex comparative analysis of the orthologous protein pair LmbC and CcbC from the biosynthesis of lincomycin and celesticetin, respectively. The analysis proved the compared proteins to be stand-alone adenylation domains strictly preferring their own natural substrate, PPL or L-proline. The LmbC substrate binding pocket is adapted to accommodate a rare PPL precursor. When compared with L-proline specific ones, several large amino acid residues were replaced by smaller ones, opening a channel which allowed the alkyl side chain of PPL to be accommodated. One of the most important differences, that of the residue corresponding to V306 in CcbC changing to G308 in LmbC, was investigated in vitro and in silico. Moreover, the substrate binding pocket rearrangement also allowed LmbC to effectively adenylate 4-butyl-L-proline and 4-pentyl-L-proline, substrates with even longer alkyl side chains, producing more potent lincosamides. A shift of LmbC substrate specificity appears to be an integral part of biosynthetic pathway adaptation to the PPL acquisition. A set of genes presumably coding for the PPL biosynthesis is present in the lincomycin cluster but not in the celesticetin cluster; their homologs are found in biosynthetic clusters of some pyrrolobenzodiazepines (PBD) and hormaomycin. Whereas in the PBD and hormaomycin pathways the arising precursors are condensed to another amino acid moiety, the LmbC protein is the first functionally proved part of a unique condensation enzyme connecting PPL to the specialized amino sugar building unit.
8.
9.
Kernel approaches for genic interaction extraction
10.
Summary: Closely related proteins show an obvious kinship by having numerous matching amino acids in their aligned sequences. Kinship between anciently separated proteins requires a statistical evaluation to rule out fortuitous similarities. A simple statistic is developed which assumes equal probability for all codon pairs, and a table of critical values for amino acid sequence alignments of length 200 or less is presented. Applying this statistic to V and C regions of immunoglobulin chains, aligned on the basis of shared features of three-dimensional structure, provides evidence that the V and C sequences descended from a common ancestor. Similarly the distant evolutionary relationship of dehydrogenases, flavodoxin, and subtilisin, suggested by structural alignments, is verified. On the other hand, the statistic does not verify a common evolutionary origin for the heme binding pocket in globins and cytochrome b5. Empirical evidence from the distribution of MMD values of amino acid pairs in comparisons of misaligned polypeptide chains and from Monte Carlo trials of sequences aligned with arbitrary gaps supports the validity of the statistic.
11.
Jensen GJ 《Journal of structural biology》2001,133(2-3):143-155
To determine the structure of a biological particle to high resolution by electron microscopy, image averaging is required to combine information from different views and to increase the signal-to-noise ratio. Starting from the number of noiseless views necessary to resolve features of a given size, four general factors are considered that increase the number of images actually needed: (1) the physics of electron scattering introduces shot noise, (2) thermal motion and particle inhomogeneity cause the scattered electrons to describe a mixture of structures, (3) the microscope system fails to usefully record all the information carried by the scattered electrons, and (4) image misalignment leads to information loss through incoherent averaging. The compound effect of factors 2-4 is approximated by the product of envelope functions. The problem of incoherent image averaging is developed in detail through derivation of five envelope functions that account for small errors in 11 "alignment" parameters describing particle location, orientation, defocus, magnification, and beam tilt. The analysis provides target error tolerances for single particle analysis to near-atomic (3.5 Å) resolution, and this prospect is shown to depend critically on image quality, defocus determination, and microscope alignment.
12.
13.
Albert K. Hoang Duc Marc Modat Kelvin K. Leung M. Jorge Cardoso Josephine Barnes Timor Kadir Sébastien Ourselin for the Alzheimer’s Disease Neuroimaging Initiative 《PloS one》2013,8(8)
Multi-atlas segmentation has been widely used to segment various anatomical structures. The success of this technique partly relies on the selection of atlases that are best mapped to a new target image after registration. Recently, manifold learning has been proposed as a method for atlas selection. Each manifold learning technique seeks to optimize a unique objective function. Therefore, different techniques produce different embeddings even when applied to the same data set. Previous studies used a single technique in their method and gave no reason for the choice of the manifold learning technique employed nor the theoretical grounds for the choice of the manifold parameters. In this study, we compare side-by-side the results given by 3 manifold learning techniques (Isomap, Laplacian Eigenmaps and Locally Linear Embedding) on the same data set. We assess the ability of those 3 different techniques to select the best atlases to combine in the framework of multi-atlas segmentation. First, a leave-one-out experiment is used to optimize our method on a set of 110 manually segmented atlases of hippocampi and find the manifold learning technique and associated manifold parameters that give the best segmentation accuracy. Then, the optimal parameters are used to automatically segment 30 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). For our dataset, the selection of atlases with Locally Linear Embedding gives the best results. Our findings show that selection of atlases with manifold learning leads to segmentation accuracy close to or significantly higher than the state-of-the-art method and that accuracy can be increased by fine tuning the manifold learning process.
14.
Kernel density estimation for length biased data
15.
Byung-Jun Yoon 《EURASIP Journal on Bioinformatics and Systems Biology》2009,2009(1):491074
When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the “seed” regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.
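The effect of position-range constraints on a dynamic programming alignment can be sketched with a much simpler problem than the one in the paper. The example below computes a plain edit distance in which each row of the table may only use a given range of columns, and it counts how many cells are filled. It is not the profile-csHMM algorithm: the function names are hypothetical, and the ranges here come from a fixed band around the diagonal rather than from a profile-HMM.

```python
import math


def constrained_edit_distance(a, b, ranges):
    """Edit distance where DP row i may only use columns ranges[i] = (lo, hi).

    Returns (distance, cells_filled). The distance is math.inf when the
    constraints exclude every alignment path.
    """
    prev = {}
    cells = 0
    for i in range(len(a) + 1):
        lo, hi = ranges[i]
        curr = {}
        for j in range(max(lo, 0), min(hi, len(b)) + 1):
            if i == 0 and j == 0:
                best = 0
            else:
                best = math.inf
                if i > 0 and j > 0:  # match or substitution
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    best = min(best, prev.get(j - 1, math.inf) + cost)
                if i > 0:            # deletion from a
                    best = min(best, prev.get(j, math.inf) + 1)
                if j > 0:            # insertion into a
                    best = min(best, curr.get(j - 1, math.inf) + 1)
            curr[j] = best
            cells += 1
        prev = curr
    return prev.get(len(b), math.inf), cells


def diagonal_band(n, width):
    """Hypothetical constraint: row i may use columns i-width .. i+width."""
    return [(i - width, i + width) for i in range(n + 1)]


a, b = "GCAUCG", "GCUUCG"
full = [(0, len(b))] * (len(a) + 1)
print(constrained_edit_distance(a, b, full))
print(constrained_edit_distance(a, b, diagonal_band(len(a), 1)))
```

When the constraints still contain an optimal path, the result is unchanged while fewer cells are computed. That is the kind of saving the abstract refers to, although the 11-fold figure applies to the authors' method and not to this toy example.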
16.
Robert K. Bradley Adam Roberts Michael Smoot Sudeep Juvekar Jaeyoung Do Colin Dewey Ian Holmes Lior Pachter 《PLoS computational biology》2009,5(5)
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment (previously available only with alignment programs which use computationally expensive Markov Chain Monte Carlo approaches), yet it can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
17.
18.
19.
We propose a class of kernel stick-breaking processes for uncountable collections of dependent random probability measures. The process is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the process are described, including a covariate-dependent prediction rule. A retrospective Markov chain Monte Carlo algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiological application.
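The construction described above, with stick-breaking probabilities formed as a kernel multiplied by beta weights, can be sketched for a finite truncation. In the example below, location h breaks off a fraction K(x, location_h) * V_h of the stick that remains at predictor value x. The Gaussian kernel, the Beta(1, 1) weights, the uniform locations, the truncation to five locations, and the function names are all assumptions made for illustration. The process in the abstract uses an infinite sequence of locations.

```python
import math
import random


def gaussian_kernel(x, location, bandwidth=1.0):
    """Kernel bounded in [0, 1], equal to 1 when x coincides with the location."""
    return math.exp(-((x - location) ** 2) / (2.0 * bandwidth ** 2))


def ksb_weights(x, locations, betas, kernel=gaussian_kernel):
    """Truncated kernel stick-breaking weights at predictor value x.

    Location h receives remaining_stick * kernel(x, location_h) * V_h.
    """
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for location, v in zip(locations, betas):
        fraction = kernel(x, location) * v
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


rng = random.Random(0)
locations = [rng.uniform(0.0, 10.0) for _ in range(5)]   # assumed location prior
betas = [rng.betavariate(1.0, 1.0) for _ in range(5)]    # assumed Beta(1, 1) weights
for x in (2.0, 8.0):
    w = ksb_weights(x, locations, betas)
    print(x, [round(p, 3) for p in w], "unassigned mass:", round(1.0 - sum(w), 3))
```

Because the kernel depends on x, locations close to a predictor value receive more weight there, which is how the dependence across predictor values arises. In a finite truncation the weights sum to less than one, and the leftover mass would be spread over the locations that were cut off.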
20.
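Genome-wide association analysis of kernel ratio in maize
Kernel ratio is closely related to single-ear yield in maize, and dissecting its genetic mechanism is of great importance for high-yield maize breeding. Using 309 maize inbred lines as the association panel, this study used fixed and random model circulating probability unification (FarmCPU), the compressed mixed linear model (CMLM) and the multi-locus mixed linear model (MLMM) on the kernel ratio from Yuanyang (Xinxiang, Henan), Dancheng (Zhoukou, Henan) and Sanya (Hainan) in 2017 and 2019, together with the best linear unbiased estimate (BLUE) values...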