首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 640 毫秒
1.
Scanning protein sequence database is an often repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU’s capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as well as a hybrid parallelization scheme. Our implementation achieves speedups up to 10.0 on an NVIDIA GeForce GTX 295 GPU compared to the sequential NCBI BLASTP 2.2.22. CUDA-BLASTP source code which is available at https://sites.google.com/site/liuweiguohome/software.  相似文献   

2.
3.
4.
GenBank          下载免费PDF全文
The GenBank sequence database incorporates publicly available DNA sequences of more than 105 000 different organisms, primarily through direct submission of sequence data from individual laboratories and large-scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI’s integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov.  相似文献   

5.
This study describes the development of aptamers as a therapy against influenza virus infection. Aptamers are oligonucleotides (like ssDNA or RNA) that are capable of binding to a variety of molecular targets with high affinity and specificity. We have studied the ssDNA aptamer BV02, which was designed to inhibit influenza infection by targeting the hemagglutinin viral protein, a protein that facilitates the first stage of the virus’ infection. While testing other aptamers and during lead optimization, we realized that the dominant characteristics that determine the aptamer’s binding to the influenza virus may not necessarily be sequence-specific, as with other known aptamers, but rather depend on general 2D structural motifs. We adopted QSAR (quantitative structure activity relationship) tool and developed computational algorithm that correlate six calculated structural and physicochemical properties to the aptamers’ binding affinity to the virus. The QSAR study provided us with a predictive tool of the binding potential of an aptamer to the influenza virus. The correlation between the calculated and actual binding was R2 = 0.702 for the training set, and R2 = 0.66 for the independent test set. Moreover, in the test set the model’s sensitivity was 89%, and the specificity was 87%, in selecting aptamers with enhanced viral binding. The most important properties that positively correlated with the aptamer’s binding were the aptamer length, 2D-loops and repeating sequences of C nucleotides. Based on the structure-activity study, we have managed to produce aptamers having viral affinity that was more than 20 times higher than that of the original BV02 aptamer. Further testing of influenza infection in cell culture and animal models yielded aptamers with 10 to 15 times greater anti-viral activity than the BV02 aptamer. Our insights concerning the mechanism of action and the structural and physicochemical properties that govern the interaction with the influenza virus are discussed.  相似文献   

6.
We present a prototype of a new database tool, GeneCensus, which focuses on comparing genomes globally, in terms of the collective properties of many genes, rather than in terms of the attributes of a single gene (e.g. sequence similarity for a particular ortholog). The comparisons are presented in a visual fashion over the web at GeneCensus.org. The system concentrates on two types of comparisons: (i) trees based on the sharing of generalized protein families between genomes, and (ii) whole pathway analysis in terms of activity levels. For the trees, we have developed a module (TreeViewer) that clusters genomes in terms of the folds, superfamilies or orthologs—all can be considered as generalized ‘families’ or ‘protein parts’—they share, and compares the resulting trees side-by-side with those built from sequence similarity of individual genes (e.g. a traditional tree built on ribosomal similarity). We also include comparisons to trees built on whole-genome dinucleotide or codon composition. For pathway comparisons, we have implemented a module (PathwayPainter) that graphically depicts, in selected metabolic pathways, the fluxes or expression levels of the associated enzymes (i.e. generalized ‘activities’). One can, consequently, compare organisms (and organism states) in terms of representations of these systemic quantities. Develop ment of this module involved compiling, calculating and standardizing flux and expression information from many different sources. We illustrate pathway analysis for enzymes involved in central metabolism. We are able to show that, to some degree, flux and expression fluctuations have characteristic values in different sections of the central metabolism and that control points in this system (e.g. hexokinase, pyruvate kinase, phosphofructokinase, isocitrate dehydrogenase and citric synthase) tend to be especially variable in flux and expression. Both the TreeViewer and PathwayPainter modules connect to other information sources related to individual-gene or organism properties (e.g. a single-gene structural annotation viewer).  相似文献   

7.
The review considers the original works on the primary structure of biopolymers carried out from 1983 to 2003. Most works were supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983–1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchic algorithm aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of megabase sequences takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms aligning protein sequences are described and compared with the Smith–Waterman algorithm, which is now most accurate. The ANCHOR hierarchic algorithm generates alignments of much the same accuracy and is twice as rapid as the Smith–Waterman one. The STRSWer algorithm takes into account the secondary structures of proteins under study. With the secondary structures predicted using the PSI-PRED software for pairs of proteins having 10–30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith–Waterman algorithm.  相似文献   

8.

Background

Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, current approaches for predicting function must be improved upon. One problem in particular is overly-specific function predictions which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity.

Methodology

Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity.

Significance

Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value >1e−62, non-redundant NCBI protein database (NRDB)) and only small likelihood of specific function match for proteins with low sequence similarity (bit score <54.6, e-value <1e−05, NRDB). For sequence similarity ranges in between our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function, and predicted functions for thousands of these proteins ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions that were electronically based. We show that, on average, these prior function predictions are more specific (quite possibly overly-specific) compared to predictions from our model that is based on proteins with experimentally determined function.  相似文献   

9.
In silico prediction of a protein’s tertiary structure remains an unsolved problem. The community-wide Critical Assessment of Protein Structure Prediction (CASP) experiment provides a double-blind study to evaluate improvements in protein structure prediction algorithms. We developed a protein structure prediction pipeline employing a three-stage approach, consisting of low-resolution topology search, high-resolution refinement, and molecular dynamics simulation to predict the tertiary structure of proteins from the primary structure alone or including distance restraints either from predicted residue-residue contacts, nuclear magnetic resonance (NMR) nuclear overhauser effect (NOE) experiments, or mass spectroscopy (MS) cross-linking (XL) data. The protein structure prediction pipeline was evaluated in the CASP11 experiment on twenty regular protein targets as well as thirty-three ‘assisted’ protein targets, which also had distance restraints available. Although the low-resolution topology search module was able to sample models with a global distance test total score (GDT_TS) value greater than 30% for twelve out of twenty proteins, frequently it was not possible to select the most accurate models for refinement, resulting in a general decay of model quality over the course of the prediction pipeline. In this study, we provide a detailed overall analysis, study one target protein in more detail as it travels through the protein structure prediction pipeline, and evaluate the impact of limited experimental data.  相似文献   

10.
The variety and complexity of human-made tools are unique in the animal kingdom. Research investigating why human tool use is special has focused on the role of social learning: while non-human great apes acquire tool-use behaviours mostly by individual (re-)inventions, modern humans use imitation and teaching to accumulate innovations over time. However, little is known about tool-use behaviours that humans can invent individually, i.e. without cultural knowledge. We presented 2- to 3.5-year-old children with 12 problem-solving tasks based on tool-use behaviours shown by great apes. Spontaneous tool use was observed in 11 tasks. Additionally, tasks which occurred more frequently in wild great apes were also solved more frequently by human children. Our results demonstrate great similarity in the spontaneous tool-use abilities of human children and great apes, indicating that the physical cognition underlying tool use shows large overlaps across the great ape species. This suggests that humans are neither born with special physical cognition skills, nor that these skills have degraded due to our species’ long reliance of social learning in the tool-use domain.  相似文献   

11.
Fermat’s principle of least time states that light rays passing through different media follow the fastest (and not the most direct) path between two points, leading to refraction at medium borders. Humans intuitively employ this rule, e.g., when a lifeguard has to infer the fastest way to traverse both beach and water to reach a swimmer in need. Here, we tested whether foraging ants also follow Fermat’s principle when forced to travel on two surfaces that differentially affected the ants’ walking speed. Workers of the little fire ant, Wasmannia auropunctata, established “refracted” pheromone trails to a food source. These trails deviated from the most direct path, but were not different to paths predicted by Fermat’s principle. Our results demonstrate a new aspect of decentralized optimization and underline the versatility of the simple yet robust rules governing the self-organization of group-living animals.  相似文献   

12.
Large-scale artificial neural networks have many redundant structures, making the network fall into the issue of local optimization and extended training time. Moreover, existing neural network topology optimization algorithms have the disadvantage of many calculations and complex network structure modeling. We propose a Dynamic Node-based neural network Structure optimization algorithm (DNS) to handle these issues. DNS consists of two steps: the generation step and the pruning step. In the generation step, the network generates hidden layers layer by layer until accuracy reaches the threshold. Then, the network uses a pruning algorithm based on Hebb’s rule or Pearson’s correlation for adaptation in the pruning step. In addition, we combine genetic algorithm to optimize DNS (GA-DNS). Experimental results show that compared with traditional neural network topology optimization algorithms, GA-DNS can generate neural networks with higher construction efficiency, lower structure complexity, and higher classification accuracy.  相似文献   

13.
14.
Standard deviational ellipse (SDE) has long served as a versatile GIS tool for delineating the geographic distribution of concerned features. This paper firstly summarizes two existing models of calculating SDE, and then proposes a novel approach to constructing the same SDE based on spectral decomposition of the sample covariance, by which the SDE concept is naturally generalized into higher dimensional Euclidean space, named standard deviational hyper-ellipsoid (SDHE). Then, rigorous recursion formulas are derived for calculating the confidence levels of scaled SDHE with arbitrary magnification ratios in any dimensional space. Besides, an inexact-newton method based iterative algorithm is also proposed for solving the corresponding magnification ratio of a scaled SDHE when the confidence probability and space dimensionality are pre-specified. These results provide an efficient manner to supersede the traditional table lookup of tabulated chi-square distribution. Finally, synthetic data is employed to generate the 1-3 multiple SDEs and SDHEs. And exploratory analysis by means of SDEs and SDHEs are also conducted for measuring the spread concentrations of Hong Kong’s H1N1 in 2009.  相似文献   

15.
In Compressed Sensing (CS) of MRI, optimization of the regularization parameters is not a trivial task. We aimed to establish a method that could determine the optimal weights for regularization parameters in CS of time-of-flight MR angiography (TOF-MRA) by comparing various image metrics with radiologists’ visual evaluation. TOF-MRA of a healthy volunteer was scanned using a 3T-MR system. Images were reconstructed by CS from retrospectively under-sampled data by varying the weights for the L1 norm of wavelet coefficients and that of total variation. The reconstructed images were evaluated both quantitatively by statistical image metrics including structural similarity (SSIM), scale invariant feature transform (SIFT) and contrast-to-noise ratio (CNR), and qualitatively by radiologists’ scoring. The results of quantitative metrics and qualitative scorings were compared. SSIM and SIFT in conjunction with brain masks and CNR of artery-to-parenchyma correlated very well with radiologists’ visual evaluation. By carefully selecting a region to measure, we have shown that statistical image metrics can reflect radiologists’ visual evaluation, thus enabling an appropriate optimization of regularization parameters for CS.  相似文献   

16.
17.
In recent years, comprehensive learning particle swarm optimization (CLPSO) has attracted the attention of many scholars for using in solving multimodal problems, as it is excellent in preserving the particles’ diversity and thus preventing premature convergence. However, CLPSO exhibits low solution accuracy. Aiming to address this issue, we proposed a novel algorithm called LILPSO. First, this algorithm introduced a Lagrange interpolation method to perform a local search for the global best point (gbest). Second, to gain a better exemplar, one gbest, another two particle’s historical best points (pbest) are chosen to perform Lagrange interpolation, then to gain a new exemplar, which replaces the CLPSO’s comparison method. The numerical experiments conducted on various functions demonstrate the superiority of this algorithm, and the two methods are proven to be efficient for accelerating the convergence without leading the particle to premature convergence.  相似文献   

18.
BackgroundSimilarity based computational methods are a useful tool for predicting protein functions from protein–protein interaction (PPI) datasets. Although various similarity-based prediction algorithms have been proposed, unsatisfactory prediction results have occurred on many occasions. The purpose of this type of algorithm is to predict functions of an unannotated protein from the functions of those proteins that are similar to the unannotated protein. Therefore, the prediction quality largely depends on how to select a set of proper proteins (i.e., a prediction domain) from which the functions of an unannotated protein are predicted, and how to measure the similarity between proteins. Another issue with existing algorithms is they only believe the function prediction is a one-off procedure, ignoring the fact that interactions amongst proteins are mutual and dynamic in terms of similarity when predicting functions. How to resolve these major issues to increase prediction quality remains a challenge in computational biology.ResultsIn this paper, we propose an innovative approach to predict protein functions of unannotated proteins iteratively from a PPI dataset. The iterative approach takes into account the mutual and dynamic features of protein interactions when predicting functions, and addresses the issues of protein similarity measurement and prediction domain selection by introducing into the prediction algorithm a new semantic protein similarity and a method of selecting the multi-layer prediction domain. The new protein similarity is based on the multi-layered information carried by protein functions. The evaluations conducted on real protein interaction datasets demonstrated that the proposed iterative function prediction method outperformed other similar or non-iterative methods, and provided better prediction results.ConclusionsThe new protein similarity derived from multi-layered information of protein functions more reasonably reflects the intrinsic relationships among proteins, and significant improvement to the prediction quality can occur through incorporation of mutual and dynamic features of protein interactions into the prediction algorithm.  相似文献   

19.
To address the void in the availability of high-quality proteomic data traversing the animal tree, we have implemented a pipeline for generating de novo assemblies based on publicly available data from the NCBI Sequence Read Archive, yielding a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. We have also created the Animal Proteome Database (AniProtDB), a resource providing open access to this collection of high-quality metazoan proteomes, along with information on predicted proteins and protein domains for each taxonomic classification and the ability to perform sequence similarity searches against all proteomes generated using this pipeline. This solution vastly increases the utility of these data by removing the barrier to access for research groups who do not have the expertise or resources to generate these data themselves and enables the use of data from nontraditional research organisms that have the potential to address key questions in biomedicine.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号