20 similar documents found (search time: 15 ms).
1.
MOTIVATION: Microarray-based expression profiles have become a standard methodology in any high-throughput analysis. Several commercial platforms are available, each with its strengths and weaknesses. The R platform for statistical analysis and graphics is a powerful environment for the analysis of microarray data, because it offers many integrated statistical methods as well as Bioconductor, a project specialized in microarray analysis. Many packages have been added in the last few years, increasing the range of possible analyses. Here, we report the availability of a package for reading and analyzing data from GE Healthcare Gene Expression Bioarrays within the R environment. AVAILABILITY: The software is implemented in the R language, is open source and is available for download free of charge through the Bioconductor (http://www.bioconductor.org) project.
2.
Culhane AC, Thioulouse J, Perrière G, Higgins DG. Bioinformatics (Oxford, England), 2005, 21(11): 2789-2790
SUMMARY: MADE4, microarray ade4, is a software package that facilitates multivariate analysis of microarray gene-expression data. MADE4 accepts a wide variety of gene-expression data formats. MADE4 takes advantage of the extensive multivariate statistical and graphical functions in the R package ade4, extending these for application to microarray data. In addition, MADE4 provides new graphical and visualization tools that aid in interpretation of multivariate analysis of microarray data.
3.
Lian H. Biostatistics (Oxford, England), 2008, 9(3): 411-418
We propose a new statistic for detecting differentially expressed genes when the genes are activated in only a subset of the samples. Statistics designed for this unconventional setting have proved valuable in most cancer studies, where oncogenes are activated in only a small number of disease samples. Previous efforts in this direction include cancer outlier profile analysis (Tomlins and others, 2005), the outlier sum (Tibshirani and Hastie, 2007), and the outlier robust t-statistic (Wu, 2007). The proposed statistic, the maximum ordered subset t-statistic (MOST), seems natural when the number of activated samples is unknown. We compare MOST with other statistics and find that the proposed method often has more power than its competitors.
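A minimal sketch of the ordered-subset idea described above, under stated assumptions: each gene is standardized with the pooled median and a rescaled MAD (as in the outlier sum), the disease values are sorted in decreasing order, the sum of the k largest is standardized by its null mean and standard deviation, and the maximum over k is taken. Estimating those null moments by simulating standard-normal samples, and the names `null_moments` and `most_statistic`, are illustrative choices, not the published implementation.

```python
import random
import statistics


def null_moments(n, n_sim=2000, seed=0):
    """Monte Carlo mean and SD of S_k, the sum of the k largest of n N(0, 1) draws.

    Assumption (illustrative): standardized disease-group values follow N(0, 1)
    under the null. Returns a list of (mean, sd) pairs for k = 1..n.
    """
    rng = random.Random(seed)
    sums = [[] for _ in range(n)]
    for _ in range(n_sim):
        draws = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        running = 0.0
        for k, value in enumerate(draws):
            running += value
            sums[k].append(running)
    return [(statistics.fmean(s), statistics.stdev(s)) for s in sums]


def most_statistic(normal, disease, moments=None):
    """Max over k of (S_k - mean_k) / sd_k for the sorted, standardized disease values."""
    pooled = normal + disease
    center = statistics.median(pooled)
    # Robust scale: MAD rescaled to match the SD of a normal distribution (fallback 1.0).
    scale = 1.4826 * statistics.median(abs(x - center) for x in pooled) or 1.0
    z = sorted(((x - center) / scale for x in disease), reverse=True)
    if moments is None:
        moments = null_moments(len(z))
    best = float("-inf")
    running = 0.0
    for k, value in enumerate(z):
        running += value  # running is the sum of the k + 1 largest standardized values
        mean_k, sd_k = moments[k]
        best = max(best, (running - mean_k) / sd_k)
    return best
```

Because the maximum is taken over every subset size k, a gene activated in only two of six disease samples still receives a large score, while a gene with no activated samples does not; the number of activated samples never has to be specified.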
4.
Background
So far, many algorithms have been proposed for detecting significant genes in microarray analysis. Several of these approaches are freely available as R packages, though using them for gene expression analysis is usually a frustrating task for non-bioinformaticians. Moreover, only some of these packages offer a complete suite of tools, from initial data import to the final analysis report. Here we present an R/Bioconductor package that implements a hybrid gene selection method along with a set of functions that facilitate thorough and convenient gene expression profiling analysis.
Results
mAPKL is an open-source R/Bioconductor package that implements the mAP-KL hybrid gene selection method. The advantage of this method is that it selects a small number of gene exemplars while achieving classification results comparable to those of other well-established algorithms on a variety of datasets and dataset sizes. The mAPKL package comes with additional functionality, including (i) reliable data import; (ii) data sampling following a user-defined proportion; (iii) preprocessing through several normalization and transformation alternatives; (iv) classification with SVM and performance evaluation; (v) network analysis of the significant genes (exemplars), including degree centrality, closeness, betweenness and clustering coefficient, as well as construction of an edge list table; (vi) gene annotation analysis; (vii) pathway analysis; and (viii) auto-generated analysis reports.
Conclusions
Users can run a thorough gene expression analysis in a timely manner, starting from raw data and ending with the network characteristics of the selected gene exemplars. Detailed instructions and example data are provided in the R package, which is freely available from Bioconductor under the GPL-2 or later license: http://www.bioconductor.org/packages/3.1/bioc/html/mAPKL.html.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0719-5) contains supplementary material, which is available to authorized users.
5.
Differential network analysis provides a framework for examining whether there is sufficient statistical evidence to conclude that the structure of a network differs under two experimental conditions, or that the structures of two networks differ. The R package dna provides tools and procedures for differential network analysis of genomic data. The package focuses on gene-gene networks, but the methods are easily adaptable to more general biological processes. It includes preprocessing tools for simultaneously preparing a pair of networks for analysis, procedures for computing connectivity scores between pairs of genes based on many available statistical techniques, and tools for handling modules of genes based on these scores. Procedures are also provided for performing permutation tests based on these scores to determine whether the connectivity of a gene differs between the two networks, whether the connectivity of a particular set of important genes differs between the two networks, and whether the overall module structure differs between the two networks. Several built-in options are available for the types of scores and distances used in the testing procedures; in addition, the procedures are flexible enough to let the user define custom scores and distances.
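The gene-level permutation test described above can be sketched as follows, under explicit assumptions: a gene's connectivity is taken to be the sum of its absolute Pearson correlations with every other gene (only one of many possible scores), the test statistic is the absolute difference in that connectivity between the two conditions, and the null distribution comes from randomly reassigning samples to conditions. The function names are hypothetical and are not the dna package's API.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def connectivity(samples, gene):
    """Sum of |correlation| between `gene` and every other gene.

    `samples` is a list of samples; each sample lists one expression value per gene.
    """
    columns = list(zip(*samples))
    target = columns[gene]
    return sum(abs(pearson(target, col)) for j, col in enumerate(columns) if j != gene)


def connectivity_permutation_test(group1, group2, gene, n_perm=1000, seed=0):
    """Return (observed statistic, permutation p-value) for one gene."""
    observed = abs(connectivity(group1, gene) - connectivity(group2, gene))
    pooled = group1 + group2
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two conditions
        perm1, perm2 = pooled[:len(group1)], pooled[len(group1):]
        if abs(connectivity(perm1, gene) - connectivity(perm2, gene)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Swapping `connectivity` for another score leaves the permutation logic unchanged, which mirrors the flexibility the abstract describes for custom scores.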
Availability
dna is freely available from the Comprehensive R Archive Network, http://CRAN.R-project.org/package=dna
6.
7.
Begun A. Bioinformatics (Oxford, England), 2006, 22(23): 2905-2909
MOTIVATION: A steadily increasing number of microarray experiments stimulates further development of statistical methods for the analysis of gene expression data. One of the central problems in this area is detecting differential gene expression under two or more conditions. Unfortunately, how correlations between related individuals, such as twins, influence the estimates of differential gene expression has not been studied until now. RESULTS: In this paper, we discuss this problem and propose a new method that is robust with respect to correlations of gene expression data for twins.
8.
Jesper R. Gådin, Ferdinand M. van’t Hooft, Per Eriksson, Lasse Folkersen. BMC Bioinformatics, 2015, 16(1)
Background
One aspect in which RNA sequencing is more valuable than microarray-based methods is the ability to examine the allelic imbalance of the expression of a gene. This process is often a complex task that entails quality control, alignment, and the counting of reads over heterozygous single-nucleotide polymorphisms. Allelic imbalance analysis is subject to technical biases, due to differences in the sequences of the measured alleles. Flexible bioinformatics tools are needed to ease the workflow while retaining as much RNA sequencing information as possible throughout the analysis to detect and address the possible biases.
Results
We present AllelicImbalance, a software program designed to detect, manage, and visualize allelic imbalances comprehensively. The purpose of this software is to allow users to pose genetic questions in any RNA sequencing experiment quickly, enhancing the general utility of RNA sequencing. The visualization features can reveal notable, non-trivial allelic imbalance behavior over specific regions, such as exons.
Conclusions
The software provides a complete framework to perform allelic imbalance analyses of aligned RNA sequencing data, from detection to visualization, within the robust and versatile management class, ASEset.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0620-2) contains supplementary material, which is available to authorized users.
9.
SUMMARY: Gene Ontology (GO) annotations have become a major tool for analysis of genome-scale experiments. We have created OntologyTraverser--an R package for GO analysis of gene lists. Our system is a major advance over previous work because (1) the system can be installed as an R package, (2) the system uses Java to instantiate the GO structure and the SJava system to integrate R and Java and (3) the system is also deployed as a publicly available web tool. AVAILABILITY: Our software is academically available through http://franklin.imgen.bcm.tmc.edu/OntologyTraverser/. Both the R package and the web tool are accessible. CONTACT: cashaw@bcm.tmc.edu
10.
SUMMARY: Gene copy number and DNA methylation alterations are key regulators of gene expression in cancer. Accordingly, genes that show simultaneous methylation, copy number and expression alterations are likely to have a key role in tumor progression. We have implemented a novel software package (CNAmet) for integrative analysis of high-throughput copy number, DNA methylation and gene expression data. To demonstrate the utility of CNAmet, we use copy number, DNA methylation and gene expression data from 50 glioblastoma multiforme and 188 ovarian cancer primary tumor samples. Our results reveal a synergistic effect of DNA methylation and copy number alterations on gene expression for several known oncogenes as well as novel candidate oncogenes. AVAILABILITY: CNAmet R-package and user guide are freely available under GNU General Public License at http://csbi.ltdk.helsinki.fi/CNAmet.
11.
POLYSAT: an R package for polyploid microsatellite analysis. Total citations: 4; self-citations: 0; citations by others: 4
We present an R package to help remedy the lack of software for manipulating and analysing autopolyploid and allopolyploid microsatellite data. POLYSAT can handle genotype data of any ploidy, including populations of mixed ploidy, and assumes that allele copy number is always ambiguous in partial heterozygotes. It can import and export genotype data in eight different formats, calculate pairwise distances between individuals using a stepwise mutation model or an infinite alleles model, estimate ploidy based on allele counts, and estimate allele frequencies and pairwise F(ST) values. This software is freely available through the Comprehensive R Archive Network (http://cran.r-project.org/) and includes a thorough tutorial.
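Because allele copy number is treated as ambiguous in partial heterozygotes, a genotype at one locus can be represented simply as the set of alleles observed. A minimal sketch under that assumption computes a band-sharing dissimilarity, 1 - 2 x (shared alleles) / (alleles in A + alleles in B), which, like an infinite alleles model, only asks whether alleles are identical, averaged over loci. This is an illustrative stand-in, not POLYSAT's code; the function names and the averaging over loci are assumptions, and a stepwise mutation distance is not shown.

```python
def band_sharing_distance(genotype_a, genotype_b):
    """Distance 1 - 2 * shared / (|A| + |B|) for one locus.

    Genotypes are sets of allele sizes. Copy number is deliberately ignored, so
    {120, 124} stands for any partial heterozygote carrying exactly those alleles.
    """
    a, b = set(genotype_a), set(genotype_b)
    if not a or not b:
        raise ValueError("both genotypes must contain at least one allele")
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def mean_distance(individual_a, individual_b):
    """Average the per-locus distance over loci genotyped in both individuals."""
    shared_loci = [locus for locus in individual_a if locus in individual_b]
    if not shared_loci:
        raise ValueError("no locus is genotyped in both individuals")
    total = sum(band_sharing_distance(individual_a[locus], individual_b[locus])
                for locus in shared_loci)
    return total / len(shared_loci)
```

For example, two individuals genotyped as {120, 124} and {120, 128} at one locus and identically as {200, 204, 208} at another are at distance 0.5 and 0.0 respectively, so `mean_distance` returns 0.25, regardless of each individual's ploidy.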
12.
APCluster: an R package for affinity propagation clustering. Total citations: 3; self-citations: 0; citations by others: 3
13.
ABSTRACT: BACKGROUND: Many methods for dimensionality reduction of large data sets, such as those generated in microarray studies, boil down to the Singular Value Decomposition (SVD). Although singular vectors associated with the largest singular values have strong optimality properties and are often quite useful for summarizing the data, they are linear combinations of up to all of the data points, so it is typically hard to interpret them in terms of the application domain from which the data are drawn. Recently, an alternative dimensionality reduction paradigm, CUR matrix decompositions, has been proposed to address this problem and has been applied to genetic and internet data. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Since they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn. RESULTS: We present an implementation of CUR matrix decompositions in the form of a freely available, open-source R package called rCUR. This package helps users perform CUR-based analysis of large-scale data, such as those obtained from different high-throughput technologies, in an interactive and exploratory manner. We show two examples that illustrate how CUR-based techniques make it possible to significantly reduce the number of probes while maintaining the major trends in the data and keeping the same classification accuracy. CONCLUSIONS: The rCUR package provides functions for performing CUR-based matrix decompositions in the R environment. In gene expression studies, it offers an additional way to analyze differential expression and select discriminant genes based on statistical leverage scores. These scores, which have historically been used in diagnostic regression analysis to identify outliers, can be used by rCUR to identify the most informative data points, in terms of which the remaining data points are expressed.
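The column leverage scores used in CUR decompositions can be computed from the top-k right singular vectors v_1, ..., v_k of the data matrix as pi_j = (1/k) * sum over xi of v_{j,xi}^2, which sum to one over columns. The sketch below, assuming a dense row-major matrix of rank at least k with distinct leading singular values, obtains those vectors as eigenvectors of A^T A by power iteration with deflation (in practice a library SVD would be used) and then keeps the c highest-scoring columns. Deterministic top-c selection is a simplification; randomized sampling proportional to the scores is also common. None of the names below are rCUR's API.

```python
import random


def gram(matrix):
    """A^T A for a row-major matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n_cols)]
            for i in range(n_cols)]


def top_eigenvectors(sym, k, iters=500, seed=0):
    """Leading k unit eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(sym)
    b = [row[:] for row in sym]
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; subtract it out before the next vector.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        vectors.append(v)
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def column_leverage(matrix, k):
    """Normalized leverage score of each column with respect to the top-k subspace."""
    vectors = top_eigenvectors(gram(matrix), k)
    return [sum(v[j] ** 2 for v in vectors) / k for j in range(len(matrix[0]))]


def select_columns(matrix, k, c):
    """Indices of the c columns with the highest leverage scores."""
    scores = column_leverage(matrix, k)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:c]
```

A column that dominates the leading singular direction receives a leverage close to one, which is why leverage-based selection retains actual, interpretable columns (for example, probes) rather than abstract linear combinations.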
14.
Cordonnier-Pratt MM, Liang C, Wang H, Kolychev DS, Sun F, Freeman R, Sullivan R, Pratt LH. Comparative and Functional Genomics, 2004, 5(3): 268-275
The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
15.
Probe-level measurement error improves accuracy in detecting differential gene expression. Total citations: 1; self-citations: 0; citations by others: 1
MOTIVATION: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level measurement error provides useful information which can help in the identification of differentially expressed genes. RESULTS: We propose a Bayesian method to include probe-level measurement error in the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression. AVAILABILITY: The MAP approximation and variational inference described in this paper have been implemented in an R package pplr. The MCMC method is implemented in Matlab. Both implementations are available from http://umber.sbs.man.ac.uk/resources/puma.
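To see why measurement error matters for PPLR, assume (as a simplification; the paper fits its posteriors by variational inference) that the posterior mean expression in each condition is independent and normal, N(m1, v1) and N(m2, v2). Then the probability of a positive log-ratio is P(mu1 - mu2 > 0) = Phi((m1 - m2) / sqrt(v1 + v2)). The function name below is hypothetical.

```python
import math


def pplr(mean_1, var_1, mean_2, var_2):
    """P(mu_1 - mu_2 > 0) for independent normal posteriors N(mean_i, var_i)."""
    sd = math.sqrt(var_1 + var_2)
    if sd == 0.0:
        raise ValueError("posterior variances must not both be zero")
    z = (mean_1 - mean_2) / sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Two genes with the same log-ratio of 1.0 can then be ranked very differently: with posterior variances of 0.01 the PPLR is essentially 1, whereas with variances of 1.0 it is about 0.76, reflecting the larger probe-level uncertainty.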
16.
SUMMARY: OrderedList is a Bioconductor compliant package for meta-analysis based on ordered gene lists like those resulting from differential gene expression analysis. Our package quantifies the similarity between gene lists. The significance of the similarity score is estimated from random scores computed on perturbed data. OrderedList illustrates list similarity in intuitive plots and determines the score-driving genes for further analysis. AVAILABILITY: http://www.bioconductor.org CONTACT: claudio.lottaz@molgen.mpg.de SUPPLEMENTARY INFORMATION: Please visit our webpage on http://compdiag.molgen.mpg.de/software.
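A simplified sketch of a similarity score for two ordered gene lists: at each depth n, count the genes that appear in the top n of both lists, weight that overlap by exp(-alpha * n) so that agreement near the top counts most, and sum over depths. Significance is then estimated by comparing against scores for randomly shuffled lists. The weighting, the parameter alpha, the use of only the top end of each list, and random shuffling as a stand-in for the package's perturbation scheme are all assumptions; OrderedList's own score may differ in detail.

```python
import math
import random


def overlap_score(list_a, list_b, alpha=0.1):
    """Sum over depths n of exp(-alpha * n) * |top_n(A) & top_n(B)|."""
    depth = min(len(list_a), len(list_b))
    seen_a, seen_b = set(), set()
    shared = 0  # size of the overlap between the two current top-n sets
    score = 0.0
    for n in range(1, depth + 1):
        gene_a, gene_b = list_a[n - 1], list_b[n - 1]
        seen_a.add(gene_a)
        if gene_a in seen_b:
            shared += 1
        seen_b.add(gene_b)
        if gene_b in seen_a:
            shared += 1
        score += math.exp(-alpha * n) * shared
    return score


def permutation_p_value(list_a, list_b, n_perm=1000, alpha=0.1, seed=0):
    """Return (observed score, fraction of shuffled lists scoring at least as high)."""
    rng = random.Random(seed)
    observed = overlap_score(list_a, list_b, alpha)
    shuffled = list(list_b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if overlap_score(list_a, shuffled, alpha) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Identical lists reach the maximum possible score, a reversed list scores much lower, and larger values of alpha concentrate the comparison on the genes at the very top of both lists.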
17.
Bickel DR. Bioinformatics (Oxford, England), 2004, 20(5): 682-688
MOTIVATION: Many methods of identifying differential expression in genes depend on testing the null hypotheses of exactly equal means or distributions of expression levels for each gene across groups, even though a statistically significant difference in the expression level does not imply the occurrence of any difference of biological or clinical significance. This is because a mathematical definition of 'differential expression' as any non-zero difference does not correspond to the differential expression biologists seek. Furthermore, while some current methods account for multiple comparisons in hypothesis tests, they do not accordingly adjust estimates of the degrees to which genes are differentially expressed. Both problems lead to overstating the relevance of findings. RESULTS: Testing whether genes have relevant differential expression can be accomplished with customized null hypotheses, thereby redefining 'differential expression' in a way that is more biologically meaningful. When such tests control the false discovery rate, they effectively discover genes based on a desired quantile of differential gene expression. Estimation of the degree to which genes are differentially expressed has been corrected for multiple comparisons. AVAILABILITY: R code is freely available from http://www.davidbickel.com and may become available from www.r-project.org or www.bioconductor.org SUPPLEMENTARY INFORMATION: Applications to cancer microarrays, an application in the absence of differential expression, pseudocode, and a guide to customizing the methods may be found at www.davidbickel.com and www.mathpreprints.com
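One simple way to express a customized null hypothesis is to test H0: |mean1 - mean2| <= delta0 instead of H0: mean1 = mean2, where delta0 is the smallest difference considered biologically relevant, and then apply the Benjamini-Hochberg step-up procedure across genes to control the false discovery rate. The sketch below uses a large-sample normal approximation and is a swapped-in illustration, not Bickel's procedure; delta0, the normal approximation, and the function names are assumptions.

```python
import math
import statistics


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relevance_p_value(group1, group2, delta0):
    """p-value for H0: |mean1 - mean2| <= delta0 (large-sample normal approximation)."""
    d = statistics.fmean(group1) - statistics.fmean(group2)
    se = math.sqrt(statistics.variance(group1) / len(group1)
                   + statistics.variance(group2) / len(group2))
    if se == 0.0:
        raise ValueError("both groups have zero variance")
    # P(|D| >= |d|) grows with the true |difference|, so the least favourable null
    # is |difference| = delta0; both tails are evaluated there.
    return (1.0 - normal_cdf((abs(d) - delta0) / se)) + normal_cdf((-abs(d) - delta0) / se)


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

With delta0 = 0 this reduces to an ordinary two-sided test. A gene whose groups differ by a consistent 0.3 units is then highly significant, yet with delta0 = 1 its p-value is close to 1, illustrating the gap between statistical and relevant differential expression described above.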
18.
In previous work, we proposed a method for detecting differential gene expression based on change-points in expression profiles. This non-parametric change-point method gave promising results both in a simulation study and in experiments on public datasets. However, its performance is still limited by its low sensitivity to the right bound, and the statistical significance of the statistic has not been fully explored. To overcome the insensitivity to the right bound, we modified the original method by adding a weight function to the D(n) statistic. A simulation study showed that the weighted change-point statistic method is significantly better than the original NPCPS in terms of ROC, false positive rate (FPR), and change-point estimation. The mean absolute error of the change-point estimated by the weighted method was 0.03, a reduction of more than 50% compared with 0.06 for the original method, and the mean FPR was reduced by more than 55%. The experiment on microarray Dataset I identified 3974 differentially expressed genes out of 5293 genes in total; the experiment on microarray Dataset II identified 9983 differentially expressed genes out of 12576 genes in total. In summary, the proposed method is an effective modification of the previous method, especially when only a small subset of cancer samples shows differential gene expression (DGE).
19.
WGCNA: an R package for weighted correlation network analysis. Total citations: 12; self-citations: 0; citations by others: 12
Background
Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints.
Results
A supervised multivariate modelling approach is described whose objective is to model the time-related variation in short, sparsely sampled time series. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models is estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes that are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time-related changes are successfully modelled and predicted.
Conclusion
The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.
20.