Similar literature
1.
The choice of technology and bioinformatics approach is critical in obtaining accurate and reliable information from next‐generation sequencing (NGS) experiments. An increasing number of software and methodological guidelines are being published, but deciding upon which approach and experimental design to use can depend on the particularities of the species and on the aims of the study. This leaves researchers unable to produce informed decisions on these central questions. To address these issues, we developed pipeliner – a tool to evaluate, by simulation, the performance of NGS pipelines in resequencing studies. Pipeliner provides a graphical interface allowing the users to write and test their own bioinformatics pipelines with publicly available or custom software. It computes a number of statistics summarizing the performance in SNP calling, including the recovery, sensitivity and false discovery rate for heterozygous and homozygous SNP genotypes. Pipeliner can be used to answer many practical questions, for example, for a limited amount of NGS effort, how many more reliable SNPs can be detected by doubling coverage and halving sample size or what is the false discovery rate provided by different SNP calling algorithms and options. Pipeliner thus allows researchers to carefully plan their study's sampling design and compare the suitability of alternative bioinformatics approaches for their specific study systems. Pipeliner is written in C++ and is freely available from http://github.com/brunonevado/Pipeliner.
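
As a rough illustration of the summary statistics named in this abstract, the sketch below computes sensitivity and false discovery rate by comparing called SNP positions against the simulated truth. The function name `snp_call_stats` and the set-of-positions representation are assumptions made for this example. They are not taken from Pipeliner, whose exact definitions may differ.

```python
def snp_call_stats(true_snps, called_snps):
    """Return (sensitivity, fdr) for called SNP positions against the simulated truth.

    Hypothetical helper for illustration only; not Pipeliner's code.
    """
    true_snps, called_snps = set(true_snps), set(called_snps)
    tp = len(true_snps & called_snps)   # true SNPs that were called
    fp = len(called_snps - true_snps)   # calls that are not true SNPs
    fn = len(true_snps - called_snps)   # true SNPs that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sensitivity, fdr
```

Running the same comparison for two simulated designs (for example, doubled coverage against doubled sample size) would give the kind of side-by-side figures the abstract describes.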

2.
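A WebGIS for tobacco germplasm resources was developed using a B/S (browser/server) architecture and ASP.NET technology. The system uses the Google Maps API as its map service and Microsoft SQL Server as its database; the server-side program was developed in C++, and the browser side is based on XHTML+JavaScript. The system integrates map operations with germplasm resource queries, and users can select regions and extents of interest through maps of 4 types and 5 scales. Its map annotation and address-resolution (geocoding) functions intuitively display the geographic distribution of germplasm resources, together with the address names and the latitude and longitude of their locations. In addition, users can obtain detailed data on germplasm resources through the WebGIS query function. The tobacco germplasm resource WebGIS developed in this study therefore provides a new technology and strategy for the research and utilization of germplasm resources.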

3.
SUMMARY: MuSeqBox is a program to parse BLAST output and store attributes of BLAST hits in tabular form. The user can apply a number of selection criteria to filter out hits with particular attributes. MuSeqBox provides a powerful annotation tool for large sets of query sequences that are simultaneously compared against a database with any of the standard stand-alone or network-client BLAST programs. We discuss such application to the problem of annotation and analysis of EST collections. AVAILABILITY: The program was written in standard C++ and is freely available to noncommercial users by request from the authors. The program is also available over the web at http://bioinformatics.iastate.edu/bioinformatics2go/mb/MuSeqBox.html.

4.
SUMMARY: Multiple sequence alignment is a frequently used technique for analyzing sequence relationships. Compilation of large alignments is computationally expensive, but processing time can be considerably reduced when the computational load is distributed over many processors. Parallel processing functionality in the form of single-instruction multiple-data (SIMD) technology was implemented into the multiple alignment program Praline by using 'message passing interface' (MPI) routines. Over the alignments tested here, the parallelized program performed up to ten times faster on 25 processors compared to the single processor version. AVAILABILITY: Example program code for parallelizing pairwise alignment loops is available from http://mathbio.nimr.mrc.ac.uk/~jkleinj/tools/mpicode. The 'message passing interface' package (MPICH) is available from http://www.unix.mcs.anl.gov/mpi/mpich. CONTACT: jhering@nimr.mrc.ac.uk SUPPLEMENTARY INFORMATION: Praline is accessible at http://mathbio.nimr.mrc.ac.uk/praline.

5.
The deluge of data emerging from high-throughput sequencing technologies poses large analytical challenges when testing for association to disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression across multiple Graphics Processing Units. Open source code and documentation can be found at a Google Code repository under the URL http://bioinformatics.oxfordjournals.org/content/early/2012/01/10/bioinformatics.bts015.abstract. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
Zhang W, Duan S, Dolan ME. Bioinformation, 2008, 2(8): 322-324
The International HapMap Project provides a resource of genotypic data on single nucleotide polymorphisms (SNPs), which can be used in various association studies to identify the genetic determinants for phenotypic variations. Prior to the association studies, the HapMap dataset should be preprocessed in order to reduce the computation time and control the multiple testing problem. The less informative SNPs including those with very low genotyping rate and SNPs with rare minor allele frequencies to some extent in one or more population are removed. Some research designs only use SNPs in a subset of HapMap cell lines. Although the HapMap website and other association software packages have provided some basic tools for optimizing these datasets, a fast and user-friendly program to generate the output for filtered genotypic data would be beneficial for association studies. Here, we present a flexible, straight-forward bioinformatics program that can be useful in preparing the HapMap genotypic data for association studies by specifying cell lines and two common filtering criteria: minor allele frequencies and genotyping rate. The software was developed for Microsoft Windows and written in C++. AVAILABILITY: The Windows executable and source code in Microsoft Visual C++ are available at Google Code (http://hapmap-filter-v1.googlecode.com/) or upon request. Their distribution is subject to GNU General Public License v3.
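
The two filtering criteria named in this abstract, genotyping rate and minor allele frequency, can be sketched for a single SNP as follows. This is a minimal illustration in Python and is not the published C++ program. The function name `passes_filters`, the default thresholds, and the use of `"NN"` to mark a missing genotype are all assumptions made for the example.

```python
def passes_filters(genotypes, min_maf=0.05, min_call_rate=0.95):
    """Decide whether one SNP passes genotyping-rate and MAF filters.

    genotypes: one two-letter string per sample, e.g. "AG", "AA", or "NN"
    for a missing call (assumed encoding). Thresholds are illustrative defaults.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if "N" not in g]
    call_rate = len(called) / len(genotypes)
    if call_rate < min_call_rate:
        return False
    # Count alleles over all called genotypes.
    counts = {}
    for g in called:
        for allele in g:
            counts[allele] = counts.get(allele, 0) + 1
    if len(counts) < 2:
        maf = 0.0  # monomorphic SNP: no minor allele observed
    else:
        maf = min(counts.values()) / sum(counts.values())
    return maf >= min_maf
```

Restricting the input list to the genotypes of selected cell lines before calling the function would correspond to the cell-line subsetting the abstract mentions.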

7.
We propose a C++ class library developed to make the implementation of sequence analysis algorithms easier and faster when genomic annotations and variations need to be considered. The library provides a class hierarchy that seamlessly binds annotations of genomic elements to sequences and to algorithm results; it allows the effect of mutations/variations to be evaluated in terms of both element position shifts and algorithm results, limiting recalculation to the minimum. Particular care has been taken to keep memory and time overhead within acceptable limits. AVAILABILITY AND IMPLEMENTATION: A complete tutorial, detailed doxygen-generated documentation and source code are freely available at http://bioinformatics.emedea.it/geco, under the GPL license. The library was written in standard ISO C++ and does not depend on external libraries.

8.
Structure_threader is a program to parallelize multiple runs of genetic clustering software that does not make use of multithreading technology (structure, fastStructure and MavericK) on multicore computers. Our approach was benchmarked across multiple systems and displayed great speed improvements relative to the single‐threaded implementation, scaling very close to linearly with the number of physical cores used. Structure_threader was compared to previous software written for the same task (ParallelStructure and StrAuto) and proved to be the faster wrapper (up to 25% faster) under all tested scenarios. Furthermore, Structure_threader can perform several automatic and convenient operations, assisting the user in assessing the most biologically likely value of ‘K’ via implementations such as the “Evanno” or “Thermodynamic Integration” tests, and can automatically draw the “meanQ” plots (static or interactive) for each value of K (or even combined plots). Structure_threader is written in Python 3 and licensed under the GPLv3. It can be downloaded free of charge at https://github.com/StuntsPT/Structure_threader.

9.
MOTIVATION: Sequence database searching is among the most important and challenging tasks in bioinformatics. The ultimate choice of sequence-search algorithm is that of Smith-Waterman. However, because of the computationally demanding nature of this method, heuristic programs or special-purpose hardware alternatives have been developed. Increased speed has been obtained at the cost of reduced sensitivity or very expensive hardware. RESULTS: A fast implementation of the Smith-Waterman sequence-alignment algorithm using Single-Instruction, Multiple-Data (SIMD) technology is presented. This implementation is based on the MultiMedia eXtensions (MMX) and Streaming SIMD Extensions (SSE) technology that is embedded in Intel's latest microprocessors. Similar technology exists also in other modern microprocessors. Six-fold speed-up relative to the fastest previously known Smith-Waterman implementation on the same hardware was achieved by an optimized 8-way parallel processing approach. A speed of more than 150 million cell updates per second was obtained on a single Intel Pentium III 500 MHz microprocessor. This is probably the fastest implementation of this algorithm on a single general-purpose microprocessor described to date.
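
For readers unfamiliar with the "cell updates" this abstract counts, the following is a plain scalar sketch of the Smith-Waterman local alignment score, in which each assignment to `curr[j]` is one cell update. It is not the SIMD implementation described above, which processes several cells per instruction. The linear gap penalty and the scoring values are assumptions chosen for brevity, and the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (scalar, linear gap penalty).

    Scoring parameters are illustrative assumptions, not those of the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # One cell update: restart at 0, extend diagonally, or open a gap.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The work grows with the product of the two sequence lengths, which is why database-scale searches motivate vectorized implementations such as the one reported.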

10.
We present GranatumX, a next-generation software environment for single-cell RNA sequencing (scRNA-seq) data analysis. GranatumX is inspired by the interactive webtool Granatum. GranatumX enables biologists to access the latest scRNA-seq bioinformatics methods in a web-based graphical environment. It also offers software developers the opportunity to rapidly promote their own tools with others in customizable pipelines. The architecture of GranatumX allows for easy inclusion of plugin modules, named Gboxes, which wrap around bioinformatics tools written in various programming languages and on various platforms. GranatumX can be run on the cloud or private servers and generate reproducible results. It is a community-engaging, flexible, and evolving software ecosystem for scRNA-seq analysis, connecting developers with bench scientists. GranatumX is freely accessible at http://garmiregroup.org/granatumx/app.

11.
MOTIVATION: High-throughput technologies now allow the acquisition of biological data, such as comprehensive biochemical time-courses at unprecedented rates. These temporal profiles carry topological and kinetic information regarding the biochemical network from which they were drawn. Retrieving this information will require systematic application of both experimental and computational methods. RESULTS: S-systems are non-linear mathematical approximative models based on the power-law formalism. They provide a general framework for the simulation of integrated biological systems exhibiting complex dynamics, such as genetic circuits, signal transduction and metabolic networks. We describe how the heuristic optimization technique simulated annealing (SA) can be effectively used for estimating the parameters of S-systems from time-course biochemical data. We demonstrate our methods using three artificial networks designed to simulate different network topologies and behavior. We then end with an application to a real biochemical network by creating a working model for the cadBA system in Escherichia coli. AVAILABILITY: The source code written in C++ is available at http://www.engg.upd.edu.ph/~naval/bioinformcode.html. All the necessary programs including the required compiler are described in a document archived with the source code. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
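
For reference, the power-law form usually meant by "S-system" can be written as below. The symbols are the conventional ones from the S-system literature and are assumed here, since the abstract itself does not give the equations: $X_i$ is the concentration of species $i$, $\alpha_i, \beta_i \ge 0$ are rate constants, and $g_{ij}$, $h_{ij}$ are kinetic orders.

```latex
\frac{dX_i}{dt} \;=\; \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n .
```

Under this form, parameter estimation from time-course data means searching for the $\alpha_i$, $\beta_i$, $g_{ij}$ and $h_{ij}$ whose simulated trajectories best match the measurements, which is the search that the abstract assigns to simulated annealing.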

12.
13.
This paper investigates scheduling strategies for divisible jobs/loads originating from multiple sites in hierarchical networks with heterogeneous processors and communication channels. In contrast, most previous work in the divisible load scheduling theory (DLT) literature mainly addressed scheduling problems with loads originating from a single processor. This is one of the first works that address scheduling multiple loads from multiple sites in the DLT paradigm. In addition, scheduling multi-site jobs is common in Grids and other general distributed systems for resource sharing and coordination. An efficient static scheduling algorithm PPDD (Processor-set Partitioning and Data Distribution Algorithm) is proposed to near-optimally distribute multiple loads among all processors so that the overall processing time of all jobs is minimized. The PPDD algorithm is applied to two cases: when processors are equipped with front-ends and when they are not equipped with front-ends. The application of the algorithm to homogeneous systems is also studied. Further, several important properties exhibited by the PPDD algorithm are proven through lemmas. To implement the PPDD algorithm, we propose a communication strategy. In addition, we compare the performance of the PPDD algorithm with a Round-robin Scheduling Algorithm (RSA), which is most commonly used. Extensive case studies through numerical analysis have been conducted to verify the theoretical findings.

14.
Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at the authors' website and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge, and had better spatial resolution compared to the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.
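
As a reminder of what a zero-inflated negative binomial model for a read count $Y$ looks like, one common parameterization is given below. The symbols are assumptions made for this note: $\pi$ is the probability of an extra (structural) zero, $\mu$ is the mean and $\phi$ is the dispersion of the negative binomial component. The paper's own notation and parameterization may differ.

```latex
\Pr(Y = y) \;=\;
\begin{cases}
\pi + (1 - \pi)\,\mathrm{NB}(0;\, \mu, \phi), & y = 0,\\[4pt]
(1 - \pi)\,\mathrm{NB}(y;\, \mu, \phi), & y = 1, 2, \dots
\end{cases}
\qquad
\mathrm{NB}(y;\, \mu, \phi) \;=\; \frac{\Gamma(y + \phi)}{y!\,\Gamma(\phi)}
\left(\frac{\phi}{\mu + \phi}\right)^{\phi}
\left(\frac{\mu}{\mu + \phi}\right)^{y}.
```

The first case shows how the mixture places additional mass on zero, while a small $\phi$ allows the variance $\mu + \mu^2/\phi$ to exceed the mean, covering the over-dispersion mentioned in the abstract.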

15.
Background: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest.

16.
MOTIVATION: The efficiency of bioinformatics programmers can be greatly increased through the provision of ready-made software components that can be rapidly combined, with additional bespoke components where necessary, to create finished programs. The new standard for C++ includes an efficient and easy to use library of generic algorithms and data-structures, designed to facilitate low-level component programming. The extension of this library to include functionality that is specifically useful in compute-intensive tasks in bioinformatics and molecular modelling could provide an effective standard for the design of reusable software components within the biocomputing community. RESULTS: A novel application of generic programming techniques in the form of a library of C++ components called the Bioinformatics Template Library (BTL) is presented. This library will facilitate the rapid development of efficient programs by providing efficient code for many algorithms and data-structures that are commonly used in biocomputing, in a generic form that allows them to be flexibly combined with application specific object-oriented class libraries. AVAILABILITY: The BTL is available free of charge from our web site http://www.cryst.bbk.ac.uk/~classlib/ and the EMBL file server http://www.embl-ebi.ac.uk/FTP/index.html

17.
Cloud computing took a step forward in the efficient use of hardware through virtualization technology. As a result, the cloud brings evident benefits for both users and providers: users can acquire computational resources elastically on demand, while cloud vendors can make maximal use of their investment in data center infrastructure. In the Internet era, the number of appliances and services migrated to the cloud environment is increasing exponentially. This leads to the expansion of data centers, which become bigger and bigger. Moreover, these data centers must have a highly elastic architecture in order to serve the huge upsurge of tasks and balance energy consumption. Although many recent research works have dealt with a finite-capacity single job queue in data centers, architectures with multiple finite-capacity queues have received less attention. In reality, the multiple-queue architecture is widely used in large data centers. In this paper, we propose a novel three-state model for cloud servers. The model is deployed with both single and multiple finite-capacity queues. We also bring forward several strategies to control multiple queues at the same time. This approach reduces the service waiting time for jobs and allows the service capability of the whole system to be managed elastically. We use CloudSim to simulate the cloud environment and to carry out experiments that demonstrate the operability and effectiveness of the proposed method and strategies. The power consumption is also evaluated to provide insights into the system performance with respect to the performance-energy trade-off.

18.
Both distributed systems and multicore systems are difficult programming environments. Although the expert programmer may be able to carefully tune these systems to achieve high performance, the non-expert may struggle. We argue that high level abstractions are an effective way of making parallel computing accessible to the non-expert. An abstraction is a regularly structured framework into which a user may plug in simple sequential programs to create very large parallel programs. By virtue of a regular structure and declarative specification, abstractions may be materialized on distributed, multicore, and distributed multicore systems with robust performance across a wide range of problem sizes. In previous work, we presented the All-Pairs abstraction for computing on distributed systems of single CPUs. In this paper, we extend All-Pairs to multicore systems, and introduce the Wavefront and Makeflow abstractions, which represent a number of problems in economics and bioinformatics. We demonstrate good scaling of both abstractions up to 32 cores on one machine and hundreds of cores in a distributed system.
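
The idea of an abstraction into which a user plugs a simple sequential function can be sketched for the All-Pairs case as follows: the user supplies `func`, and the framework applies it to every pair drawn from two collections. This is a toy illustration and not the authors' system. The name `all_pairs`, its signature, and the use of a Python thread pool as a stand-in for a real multicore or distributed back end are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def all_pairs(func, set_a, set_b, workers=4):
    """Return {(a, b): func(a, b)} for every a in set_a and b in set_b.

    Hypothetical sketch: a thread pool stands in for the real execution back end.
    """
    pairs = list(product(set_a, set_b))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with pairs.
        results = pool.map(lambda p: func(*p), pairs)
        return dict(zip(pairs, results))
```

The caller only writes the sequential comparison, for example `all_pairs(lambda x, y: abs(x - y), [1, 2], [3, 4])`, and how the pairs are scheduled is left to the framework, which is the division of labour the abstract argues for.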

19.
CROW (Columns and Rows Of Workstations - http://www.sicmm.org/crow/) is a parallel computer cluster based on the Beowulf (http://www.beowulf.org/) idea, modified to support a larger number of processors. Its architecture is based on point-to-point network architecture, which does not require the use of any network switching equipment in the system. Thus, the cost is lower, and there is no degradation in network performance even for a larger number of processors.

20.
SUMMARY: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the 'large p, small n' case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte Carlo. Our implementation is open source, allowing community-based alterations and improvements. AVAILABILITY: C++ source code and documentation including compilation instructions are available under GNU licence at http://bgx.org.uk/software/ESS.html.
