Found 20 similar articles (search time: 0 ms)
1.
Statistical tools for synthesizing lists of differentially expressed features in related experiments

We propose a novel approach for finding a list of features that are commonly perturbed in two or more experiments, quantifying the evidence of dependence between the experiments by a ratio. We present a Bayesian analysis of this ratio, which leads us to suggest two rules for choosing a cut-off on the ranked list of p values. We evaluate and compare the performance of these statistical tools in a simulation study, and show their usefulness on two real datasets.
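As a rough illustration of the kind of ratio described above, the sketch below compares the number of features called significant in both experiments with the number expected if the two experiments were independent. The abstract does not give the exact definition, so the formula, the helper name `overlap_ratio`, and the threshold `q` are assumptions made purely for illustration.

```python
def overlap_ratio(p_a, p_b, q):
    """Observed / expected number of features with p <= q in both experiments.

    Hypothetical helper: this ratio definition is an assumption and is not
    taken from the abstract.
    """
    if len(p_a) != len(p_b):
        raise ValueError("both experiments must report the same features")
    n = len(p_a)
    sig_a = sum(1 for p in p_a if p <= q)  # significant in experiment A
    sig_b = sum(1 for p in p_b if p <= q)  # significant in experiment B
    both = sum(1 for a, b in zip(p_a, p_b) if a <= q and b <= q)
    if sig_a == 0 or sig_b == 0:
        return float("nan")  # ratio is undefined without significant features
    expected = sig_a * sig_b / n  # expected overlap under independence
    return both / expected


# Toy p values for four features measured in two experiments.
p_a = [0.01, 0.02, 0.50, 0.90]
p_b = [0.03, 0.01, 0.70, 0.60]
ratio = overlap_ratio(p_a, p_b, q=0.05)
print(ratio)
```

Under this assumed definition, a ratio well above 1 would suggest more shared features than chance alone produces, while a value near 1 is consistent with independent experiments.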
2.
TEQC is an R/Bioconductor package for quality assessment of target enrichment experiments. Quality measures comprise specificity and sensitivity of the capture, enrichment, per-target read coverage and its relation to hybridization probe characteristics, coverage uniformity and reproducibility, and read duplicate analysis. Several diagnostic plots allow visual inspection of the data quality. AVAILABILITY AND IMPLEMENTATION: TEQC is implemented in the R language (version >2.12.0) and is available as a Bioconductor package for Linux, Windows and MacOS from www.bioconductor.org.
3.
Sergio Picart-Armada Francesc Fernández-Albert Maria Vinaixa Oscar Yanes Alexandre Perera-Lluna 《BMC bioinformatics》2018,19(1):538
Background
Pathway enrichment techniques are useful for understanding experimental metabolomics data. Their purpose is to give context to the affected metabolites in terms of the prior knowledge contained in metabolic pathways. However, the interpretation of a prioritized pathway list is still challenging, as pathways show overlap and cross talk effects.
Results
We introduce FELLA, an R package to perform a network-based enrichment of a list of affected metabolites. FELLA builds a hierarchical representation of an organism's biochemistry from the Kyoto Encyclopedia of Genes and Genomes (KEGG), containing pathways, modules, enzymes, reactions and metabolites. In addition to providing a list of pathways, FELLA reports intermediate entities (modules, enzymes, reactions) that link the input metabolites to them. This sheds light on pathway cross talk and potential enzymes or metabolites as targets for the condition under study. FELLA has been applied to six public datasets (three from Homo sapiens, two from Danio rerio and one from Mus musculus) and has reproduced findings from the original studies and from independent literature.
Conclusions
The R package FELLA offers an innovative enrichment concept starting from a list of metabolites, based on a knowledge graph representation of the KEGG database that focuses on interpretability. Besides reporting a list of pathways, FELLA suggests intermediate entities that are of interest per se. Its usefulness has been shown at several molecular levels on six public datasets, including human and animal models. The user can run the enrichment analysis through a simple interactive graphical interface or programmatically. FELLA is publicly available in Bioconductor under the GPL-3 license.
4.
González JR Armengol L Solé X Guinó E Mercader JM Estivill X Moreno V 《Bioinformatics (Oxford, England)》2007,23(5):644-645
5.
SUMMARY: Gene Ontology (GO) annotations have become a major tool for analysis of genome-scale experiments. We have created OntologyTraverser, an R package for GO analysis of gene lists. Our system is a major advance over previous work because (1) the system can be installed as an R package, (2) the system uses Java to instantiate the GO structure and the SJava system to integrate R and Java and (3) the system is also deployed as a publicly available web tool. AVAILABILITY: Our software is academically available through http://franklin.imgen.bcm.tmc.edu/OntologyTraverser/. Both the R package and the web tool are accessible. CONTACT: cashaw@bcm.tmc.edu
6.
ABSTRACT: BACKGROUND: Many methods for dimensionality reduction of large data sets such as those generated in microarray studies boil down to the Singular Value Decomposition (SVD). Although singular vectors associated with the largest singular values have strong optimality properties and can often be quite useful as a tool to summarize the data, they are linear combinations of up to all of the data points, and thus it is typically quite hard to interpret those vectors in terms of the application domain from which the data are drawn. Recently, an alternative dimensionality reduction paradigm, CUR matrix decompositions, has been proposed to address this problem and has been applied to genetic and internet data. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Since they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn. RESULTS: We present an implementation to perform CUR matrix decompositions, in the form of a freely available, open source R-package called rCUR. This package will help users to perform CUR-based analysis on large-scale data, such as those obtained from different high-throughput technologies, in an interactive and exploratory manner. We show two examples that illustrate how CUR-based techniques make it possible to reduce significantly the number of probes, while at the same time maintaining major trends in data and keeping the same classification accuracy. CONCLUSIONS: The package rCUR provides functions for the users to perform CUR-based matrix decompositions in the R environment. In gene expression studies, it gives an additional way of analysis of differential expression and discriminant gene selection based on the use of statistical leverage scores. These scores, which have been used historically in diagnostic regression analysis to identify outliers, can be used by rCUR to identify the most informative data points with respect to which to express the remaining data points.
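To make the idea of leverage scores concrete, here is a minimal sketch in plain Python, restricted to rank 1 for brevity. It is not rCUR's interface: the function names are hypothetical, and power iteration stands in for a full SVD. For rank 1, the leverage score of a column is assumed to be the squared entry of the top right singular vector, so the scores sum to 1.

```python
import math


def top_right_singular_vector(matrix, iters=200):
    """Approximate the top right singular vector by power iteration on A^T A.

    Hypothetical helper; a real analysis would use a proper SVD routine.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # uniform unit-norm start vector
    for _ in range(iters):
        # u = A v
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        # w = A^T u
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def leverage_scores_rank1(matrix):
    """Rank-1 column leverage scores: squared entries of the top right singular vector."""
    return [x * x for x in top_right_singular_vector(matrix)]


# Toy data matrix: the first column carries almost all of the signal.
A = [
    [10.0, 0.1, 0.0],
    [9.0, 0.2, 0.1],
    [11.0, 0.0, 0.1],
]
scores = leverage_scores_rank1(A)
print(scores)
```

Columns with the largest scores are the natural candidates to keep in a CUR-style column selection. A rank-k version would average the squared entries over the top k right singular vectors.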
7.
RefPlus: an R package extending the RMA Algorithm  Total citations: 1 (self-citations: 0, citations by others: 1)
RMA has become a widely used methodology to pre-process Affymetrix gene expression microarrays. A limitation of RMA is that the calculated probeset intensities change when a set of microarrays is re-pre-processed after the inclusion of additional microarrays into the analysis set. Here we report the availability of the RefPlus package containing functions to perform the Extrapolation Strategy and Extrapolation Averaging algorithms which address these issues. AVAILABILITY: The software is implemented in the R language and can be downloaded from the Bioconductor project website (http://www.bioconductor.org). SUPPLEMENTARY INFORMATION: Further details of the workings and evaluation of these functions are given in the documentation available on the Bioconductor website.
8.
POLYSAT: an R package for polyploid microsatellite analysis  Total citations: 4 (self-citations: 0, citations by others: 4)
We present an R package to help remedy the lack of software for manipulating and analysing autopolyploid and allopolyploid microsatellite data. POLYSAT can handle genotype data of any ploidy, including populations of mixed ploidy, and assumes that allele copy number is always ambiguous in partial heterozygotes. It can import and export genotype data in eight different formats, calculate pairwise distances between individuals using a stepwise mutation and infinite alleles model, estimate ploidy based on allele counts and estimate allele frequencies and pairwise F(ST) values. This software is freely available through the Comprehensive R Archive Network (http://cran.r-project.org/) and includes a thorough tutorial.
9.
APCluster: an R package for affinity propagation clustering  Total citations: 3 (self-citations: 0, citations by others: 3)
10.
Background
The proteomics literature has seen a proliferation of publications that seek to apply the rapidly improving technology of 2D gels to study various biological systems. However, there is a dearth of systematic studies that have investigated appropriate statistical approaches to analyse the data from these experiments.
11.
Boulesteix AL 《Bioinformatics (Oxford, England)》2007,23(13):1702-1704
12.
WGCNA: an R package for weighted correlation network analysis  Total citations: 12 (self-citations: 0, citations by others: 12)
Background
Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints.
Results
A supervised multivariate modelling approach with the objective to model the time-related variation in the data for short and sparsely sampled time-series is described. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models are estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes which are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time related changes are successfully modelled and predicted.
Conclusion
The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.
13.
Tad Dallas 《Ecography》2014,37(4):402-405
Metacommunity theory is an extension of metapopulation theory with the goal of understanding how ecological communities vary through space and time. One off‐shoot of metacommunity theory deals with understanding how community structure varies along biotic or environmental gradients. The Elements of Metacommunity Structure framework is a three‐tiered analysis of metacommunity structure that enables the user to identify metacommunity properties that arise in site‐by‐species incidence matrices. These properties can then be related to underlying variables that influence species distributions. The EMS framework is now implemented in metacom, an open source R package that allows for the analysis and plotting of metacommunities.
14.
Quentin Grimonprez Alain Celisse Samuel Blanck Meyling Cheok Martin Figeac Guillemette Marot 《BMC bioinformatics》2014,15(1)
Background
The latest generations of Single Nucleotide Polymorphism (SNP) arrays make it possible to study copy-number variations in addition to genotyping measures.
Results
MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic markers from multi-patient copy number and SNP data profiles. It provides wrappers from commonly used packages to streamline their repeated (sometimes difficult) manipulation, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is performed with an automatic choice of parameters involved in the wrapped packages. Considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given outcome.
Conclusions
MPAgenomics provides an easy tool to analyze data from SNP arrays in R. The R-package MPAgenomics is available on CRAN.
15.
SUMMARY: Pvclust is an add-on package for the statistical software R to assess the uncertainty in hierarchical cluster analysis. Pvclust can be used easily for general statistical problems, such as DNA microarray analysis, to perform the bootstrap analysis of clustering, which has been popular in phylogenetic analysis. Pvclust calculates probability values (p-values) for each cluster using bootstrap resampling techniques. Two types of p-values are available: the approximately unbiased (AU) p-value and the bootstrap probability (BP) value. Multiscale bootstrap resampling is used to calculate the AU p-value, which is less biased than the BP value calculated by ordinary bootstrap resampling. In addition, the computation time can be greatly reduced with the parallel computing option.
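The ordinary bootstrap probability (BP) mentioned above can be illustrated with a deliberately small sketch. It is not Pvclust's implementation and does not attempt the multiscale AU correction. It assumes that the "cluster" of interest is simply the first pair merged by agglomerative clustering (the closest pair under squared Euclidean distance) and that features are resampled with replacement. The helper names and the toy data are hypothetical.

```python
import random


def sq_dist(x, y, cols):
    """Squared Euclidean distance restricted to the resampled feature indices."""
    return sum((x[c] - y[c]) ** 2 for c in cols)


def bootstrap_probability(data, pair, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which `pair` is the closest pair,
    i.e. the first cluster an agglomerative procedure would form."""
    rng = random.Random(seed)
    names = sorted(data)
    n_features = len(data[names[0]])
    target = frozenset(pair)
    hits = 0
    for _ in range(n_boot):
        # Resample feature indices with replacement.
        cols = [rng.randrange(n_features) for _ in range(n_features)]
        # Find the closest pair of samples on the resampled features.
        best = min(
            ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
            key=lambda ab: sq_dist(data[ab[0]], data[ab[1]], cols),
        )
        if frozenset(best) == target:
            hits += 1
    return hits / n_boot


# Toy data: s1 and s2 are nearly identical on every feature, s3 is far away.
data = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.1, 3.1, 4.1],
    "s3": [5.0, 0.0, 7.0, 1.0],
}
bp = bootstrap_probability(data, ("s1", "s2"))
print(bp)
```

A BP close to 1 means the cluster reappears in nearly every resampled dataset. The AU p-value described in the abstract refines this estimate by varying the bootstrap sample size, which this sketch omits.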
16.
SUMMARY: OrderedList is a Bioconductor compliant package for meta-analysis based on ordered gene lists like those resulting from differential gene expression analysis. Our package quantifies the similarity between gene lists. The significance of the similarity score is estimated from random scores computed on perturbed data. OrderedList illustrates list similarity in intuitive plots and determines the score-driving genes for further analysis. AVAILABILITY: http://www.bioconductor.org CONTACT: claudio.lottaz@molgen.mpg.de SUPPLEMENTARY INFORMATION: Please visit our webpage on http://compdiag.molgen.mpg.de/software.
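One plausible way to score the similarity of two ordered gene lists is a rank-weighted overlap, sketched below. This is an illustration only and not necessarily the score OrderedList computes. The function name `weighted_overlap`, the decay parameter `alpha`, and the gene identifiers are assumptions.

```python
import math


def weighted_overlap(list_a, list_b, alpha=0.1, max_rank=None):
    """Sum over ranks n of exp(-alpha * n) * |top-n of A intersect top-n of B|.

    Hypothetical scoring function: agreement near the top of the lists
    counts more than agreement further down.
    """
    n = min(len(list_a), len(list_b))
    if max_rank is not None:
        n = min(n, max_rank)
    seen_a, seen_b = set(), set()
    score = 0.0
    for rank in range(1, n + 1):
        seen_a.add(list_a[rank - 1])
        seen_b.add(list_b[rank - 1])
        overlap = len(seen_a & seen_b)  # genes shared by both top-`rank` sets
        score += math.exp(-alpha * rank) * overlap
    return score


ranked = ["g1", "g2", "g3", "g4"]
same = weighted_overlap(ranked, ranked)
flipped = weighted_overlap(ranked, list(reversed(ranked)))
print(same, flipped)
```

To judge significance in the spirit of the abstract, the observed score could be compared with scores obtained after randomly permuting one of the lists. That step is left out here.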
17.
18.
SUMMARY: OTUbase is an R package designed to facilitate the analysis of operational taxonomic unit (OTU) data and sequence classification (taxonomic) data. Currently there are programs that will cluster sequence data into OTUs and/or classify sequence data into known taxonomies. However, there is a need for software that can take the summarized output of these programs and organize it into easily accessed and manipulated formats. OTUbase provides this structure and organization within R, to allow researchers to easily manipulate the data with the rich library of R packages currently available for additional analysis. AVAILABILITY: OTUbase is an R package available through Bioconductor. It can be found at http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html.
19.
20.