期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets

Mark J Alston John Seers Jay CD Hinton Sacha Lucchini 《BMC bioinformatics》2010,11(1):73

相似文献

2.

R/parallel – speeding up bioinformatics analysis with R

Gonzalo Vera Ritsert C Jansen Remo L Suppi 《BMC bioinformatics》2008,9(1):390

Background

R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are generated, for example using high-throughput screening devices, the processing time required to analyze data is often quite long. A solution to reduce the processing time is the use of parallel computing technologies. Because R does not support parallel computations, several tools have been developed to enable such technologies. However, these tools require multiple modications to the way R programs are usually written or run. Although these tools can finally speed up the calculations, the time, skills and additional resources required to use them are an obstacle for most bioinformaticians. 相似文献

3.

<Emphasis Type="Italic">ClustalXeed</Emphasis>: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

Taeho Kim Hyun Joo 《BMC bioinformatics》2010,11(1):467

Background

There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. 相似文献

4.

A web services choreography scenario for interoperating bioinformatics applications

Remko?de Knikker Youjun?Guo Jin-long?Li Albert?KH?Kwan Kevin?Y?Yip David?W?Cheung Kei-Hoi?Cheung Email author 《BMC bioinformatics》2004,5(1):25

Background

Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interface is not machine-friendly, 3) they use a non-standard format for data input and output, 4) they do not exploit standards to define application interface and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. 相似文献

5.

Yabi: An online research environment for grid,high performance and cloud computing

Hunter AA Macgregor AB Szabo TO Wellington CA Bellgard MI 《Source code for biology and medicine》2012,7(1):1

Background

There is a significant demand for creating pipelines or workflows in the life science discipline that chain a number of discrete compute and data intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools and inability for users to simply manage files across heterogenous HPC storage infrastructure. 相似文献

6.

Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

Ilya Kupershmidt Qiaojuan Jane Su Anoop Grewal Suman Sundaresh Inbal Halperin James Flynn Mamatha Shekar Helen Wang Jenny Park Wenwu Cui Gregory D. Wall Robert Wisotzkey Satnam Alag Saeid Akhtari Mostafa Ronaghi 《PloS one》2010,5(9)

Background

The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.

Methodology/Results

We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.

Conclusions

Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypothesis. 相似文献

7.

A factor model to analyze heterogeneity in gene expression

Yuna Blum Guillaume Le Mignon Sandrine Lagarrigue David Causeur 《BMC bioinformatics》2010,11(1):368

相似文献

8.

A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data 总被引：1，自引：0，他引：1

Xiaofeng Dai Timo Erkkil? Olli Yli-Harja Harri L?hdesm?ki 《BMC bioinformatics》2009,10(1):165

Background

Cluster analysis has become a standard computational method for gene function discovery as well as for more general explanatory data analysis. A number of different approaches have been proposed for that purpose, out of which different mixture models provide a principled probabilistic framework. Cluster analysis is increasingly often supplemented with multiple data sources nowadays, and these heterogeneous information sources should be made as efficient use of as possible. 相似文献

9.

Biana: a software framework for compiling biological interactions and analyzing networks

Javier Garcia-Garcia Emre Guney Ramon Aragues Joan Planas-Iglesias Baldo Oliva 《BMC bioinformatics》2010,11(1):56

Background

The analysis and usage of biological data is hindered by the spread of information across multiple repositories and the difficulties posed by different nomenclature systems and storage formats. In particular, there is an important need for data unification in the study and use of protein-protein interactions. Without good integration strategies, it is difficult to analyze the whole set of available data and its properties. 相似文献

10.

Goulphar: rapid access and expertise for standard two-color microarray normalization methods

Sophie Lemoine Florence Combes Nicolas Servant Stéphane Le Crom 《BMC bioinformatics》2006,7(1):467-5

相似文献

11.

SNPFile – A software library and file format for large scale association mapping and population genetics studies

Jesper Nielsen Thomas Mailund 《BMC bioinformatics》2008,9(1):526

Background

High-throughput genotyping technology has enabled cost effective typing of thousands of individuals in hundred of thousands of markers for use in genome wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to efficiently store and manipulate the data. While spreadsheets and at text files were adequate solutions earlier, the increased data size mandates more efficient solutions. 相似文献

12.

Empirical Bayes analysis of single nucleotide polymorphisms

Holger Schwender Katja Ickstadt 《BMC bioinformatics》2008,9(1):144

Background

An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses when considering expression values, i.e. continuous predictors. 相似文献

13.

Microarray data analysis: a practical approach for selecting differentially expressed genes

下载免费PDF全文

David M Mutch Alvin Berger Robert Mansourian Andreas Rytz Matthew-Alan Roberts 《Genome biology》2001,2(12):preprint00-29

Background

The biomedical community is rapidly developing new methods of data analysis for microarray experiments, with the goal of establishing new standards to objectively process the massive datasets produced from functional genomic experiments. Each microarray experiment measures thousands of genes simultaneously producing an unprecedented amount of biological information across increasingly numerous experiments; however, in general, only a very small percentage of the genes present on any given array are identified as differentially regulated. The challenge then is to process this information objectively and efficiently in order to obtain knowledge of the biological system under study and by which to compare information gained across multiple experiments. In this context, systematic and objective mathematical approaches, which are simple to apply across a large number of experimental designs, become fundamental to correctly handle the mass of data and to understand the true complexity of the biological systems under study. 相似文献

14.

Automatically visualise and analyse data on pathways using PathVisioRPC from any programming environment

Anwesha Bohler Lars M. T. Eijssen Martijn P. van Iersel Christ Leemans Egon L. Willighagen Martina Kutmon Magali Jaillard Chris T. Evelo 《BMC bioinformatics》2015,16(1)

相似文献

15.

DOVIS: an implementation for high-throughput virtual screening using AutoDock

Shuxing Zhang Kamal Kumar Xiaohui Jiang Anders Wallqvist Jaques Reifman 《BMC bioinformatics》2008,9(1):126

Background

Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to in silico screen millions of compounds in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial. 相似文献

16.

ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use

Piotr Kraj Ashok Sharma Nikhil Garge Robert Podolsky Richard A McIndoe 《BMC bioinformatics》2008,9(1):200

相似文献

17.

New analysis for consistency among markers in the study of genetic diversity: development and application to the description of bacterial diversity

Sandrine Pavoine Xavier Bailly 《BMC evolutionary biology》2007,7(1):156

Background

The development of post-genomic methods has dramatically increased the amount of qualitative and quantitative data available to understand how ecological complexity is shaped. Yet, new statistical tools are needed to use these data efficiently. In support of sequence analysis, diversity indices were developed to take into account both the relative frequencies of alleles and their genetic divergence. Furthermore, a method for describing inter-population nucleotide diversity has recently been proposed and named the double principal coordinate analysis (DPCoA), but this procedure can only be used with one locus. In order to tackle the problem of measuring and describing nucleotide diversity with more than one locus, we developed three versions of multiple DPCoA by using three ordination methods: multiple co-inertia analysis, STATIS, and multiple factorial analysis. 相似文献

18.

ANGSD: Analysis of Next Generation Sequencing Data

Thorfinn Sand Korneliussen Anders Albrechtsen Rasmus Nielsen 《BMC bioinformatics》2014,15(1)

Background

High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.

Results

We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

Conclusions

The open source c/c++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. The program facilitates multiple input formats including BAM and imputed beagle genotype probability files. The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0356-4) contains supplementary material, which is available to authorized users. 相似文献

19.

Pathomx: an interactive workflow-based tool for the analysis of metabolomic data

Martin A Fitzpatrick Catherine M McGrath Stephen P Young 《BMC bioinformatics》2014,15(1)

相似文献

20.

MASQOT: a method for cDNA microarray spot quality control

Max?Bylesj?Email author Daniel?Eriksson Andreas?Sj?din Michael?Sj?str?m Stefan?Jansson Henrik?Antti Johan?Trygg 《BMC bioinformatics》2005,6(1):250

Background

cDNA microarray technology has emerged as a major player in the parallel detection of biomolecules, but still suffers from fundamental technical problems. Identifying and removing unreliable data is crucial to prevent the risk of receiving illusive analysis results. Visual assessment of spot quality is still a common procedure, despite the time-consuming work of manually inspecting spots in the range of hundreds of thousands or more. 相似文献