Similar literature
 Found 20 similar articles; search took 78 ms
1.
Normalization is an essential step in the analysis of high-throughput data. Multi-sample global normalization methods, such as quantile normalization, have been successfully used to remove technical variation. However, these methods rely on the assumption that observed global changes across samples are due to unwanted technical variability. Applying global normalization methods has the potential to remove biologically driven variation. Currently, it is up to the subject matter experts to determine if the stated assumptions are appropriate. Here, we propose a data-driven alternative. We demonstrate the utility of our method (quantro) through examples and simulations. A software implementation is available from http://www.bioconductor.org/packages/release/bioc/html/quantro.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0679-0) contains supplementary material, which is available to authorized users.
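
The global quantile normalization that this abstract refers to can be sketched as follows. This is a minimal illustration of the general technique only, not the quantro method or its API; the function name `quantile_normalize` and the list-of-lists input are assumptions made for the example.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples given as lists of numbers.

    Hypothetical helper for illustration: every sample is forced onto the
    same reference distribution, which is exactly why global biological
    differences between samples can be erased.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(col) / len(samples) for col in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # Order of indices by value within this sample (ties broken by position).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every sample contains the same set of values, so a test of whether the raw distributions differ between groups has to be run before this step, not after it.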

2.
New normalization methods for cDNA microarray data   Total citations: 7, self-citations: 0, citations by others: 7
MOTIVATION: The focus of this paper is on two new normalization methods for cDNA microarrays. After the image analysis has been performed on a microarray and before differentially expressed genes can be detected, some form of normalization must be applied to the microarrays. Normalization removes biases towards one or other of the fluorescent dyes used to label each mRNA sample allowing for proper evaluation of differential gene expression. RESULTS: The two normalization methods that we present here build on previously described non-linear normalization techniques. We extend these techniques by firstly introducing a normalization method that deals with smooth spatial trends in intensity across microarrays, an important issue that must be dealt with. Secondly we deal with normalization of a new type of cDNA microarray experiment that is coming into prevalence, the small scale specialty or 'boutique' array, where large proportions of the genes on the microarrays are expected to be highly differentially expressed. AVAILABILITY: The normalization methods described in this paper are available via http://www.pi.csiro.au/gena/ in a software suite called tRMA: tools for R Microarray Analysis upon request of the authors. Images and data used in this paper are also available via the same link.

3.
4.
5.
We present a fast, versatile and adaptive-multiscale algorithm for analyzing a wide variety of DNA microarray data. Its primary application is in normalization of array data as well as subsequent identification of 'enriched targets', e.g. differentially expressed genes in expression profiling arrays and enriched sites in ChIP-on-chip experimental data. We show how to accommodate the unique characteristics of ChIP-on-chip data, where the set of 'enriched targets' is large, asymmetric and whose proportion to the whole data varies locally. SUPPLEMENTARY INFORMATION: Supplementary figures, related preprint, free software as well as our raw DNA microarray data with PCR validations are available at http://www.math.umn.edu/~lerman/supp/bioinfo06 as well as Bioinformatics online.

6.
AMarge     
AMarge is a web tool for the automatic quality assessment of Affymetrix GeneChip data. It is essential to have a trustworthy set of chips in order to derive gene expression data for phenotypic analysis, and AMarge provides a complete and rigorous web-accessible tool to fulfill this need. The quality assessment steps include image plots of weights derived from a robust linear model fit of the data, a 3'/5' RNA digestion plot, and Affymetrix Microarray Suite version 5.0 (MAS 5.0) quality standard procedures. Furthermore, robust multi-array average expression values are generated in order to have a start-up expression set for the subsequent analysis. The results of the complete analysis are summarised and returned as an HTML report. AVAILABILITY: The AMarge web interface is accessible at http://nin.crg.es/cgi-binf/AMargeWeb.cgi. A mirror server is also available at http://bioinformatics.istge.it/AMarge-bin/AMargeWeb.cgi. The software implementing all these methods is part of the Bioconductor project (http://www.bioconductor.org).

7.
Introduction: The technological and scientific progress made in the Human Proteome Project (HPP) has provided the scientific community with a new set of experimental and bioinformatic methods in the challenging field of shotgun and SRM/MRM-based proteomics. The requirements for a protein to be considered experimentally validated are now well established, and the information about the human proteome is available in the neXtProt database, while targeted proteomic assays are stored in SRMAtlas. However, the study of the missing proteins remains an outstanding issue.

Areas covered: This review focuses on the implementation of proteogenomic methods designed to improve the detection and validation of the missing proteins. The evolution of the methodological strategies, based on the combination of different omic technologies and the use of huge publicly available datasets, is shown taking the Chromosome 16 Consortium as a reference.

Expert commentary: Proteogenomics and other data analysis strategies implemented within the C-HPP initiative could serve as guidance for completing the catalog of human proteins in the near future. In addition, in the coming years we will probably see them used in the B/D-HPP initiative to take a step forward in understanding the roles of proteins in human biology and disease.


8.
PartiGene--constructing partial genomes   Total citations: 4, self-citations: 0, citations by others: 4
Expressed sequence tags (ESTs) offer a low-cost approach to gene discovery and are being used by an increasing number of laboratories to obtain sequence information for a wide variety of organisms. The challenge lies in processing and organizing this data within a genomic context to facilitate large scale analyses. Here we present PartiGene, an integrated sequence analysis suite that uses freely available public domain software to (1) process raw trace chromatograms into sequence objects suitable for submission to dbEST; (2) place these sequences within a genomic context; (3) perform customizable first-pass annotation of the data; and (4) present the data as HTML tables and an SQL database resource. PartiGene has been used to create a number of non-model organism database resources including NEMBASE (http://www.nematodes.org) and LumbriBase (http://www.earthworms.org/). The packages are readily portable, freely available and can be run on simple Linux-based workstations. AVAILABILITY: PartiGene is available from http://www.nematodes.org/PartiGene and also forms part of the EST analysis software associated with the Natural Environment Research Council (UK) Bio-Linux project (http://envgen.nox.ac.uk/biolinux.html).

9.
10.
MOTIVATION: There is a very large and growing level of effort toward improving the platforms, experiment designs, and data analysis methods for microarray expression profiling. Along with a growing richness in the approaches there is a growing confusion among most scientists as to how to make objective comparisons and choices between them for different applications. There is a need for a standard framework for the microarray community to compare and improve analytical and statistical methods. RESULTS: We report on a microarray data set comprising 204 in-situ synthesized oligonucleotide arrays, each hybridized with two-color cDNA samples derived from 20 different human tissues and cell lines. Design of the approximately 24 000 60mer oligonucleotides that report approximately 2500 known genes on the arrays, and design of the hybridization experiments, were carried out in a way that supports the performance assessment of alternative data processing approaches and of alternative experiment and array designs. We also propose standard figures of merit for success in detecting individual differential expression changes or expression levels, and for detecting similarities and differences in expression patterns across genes and experiments. We expect this data set and the proposed figures of merit will provide a standard framework for much of the microarray community to compare and improve many analytical and statistical methods relevant to microarray data analysis, including image processing, normalization, error modeling, combining of multiple reporters per gene, use of replicate experiments, and sample referencing schemes in measurements based on expression change. AVAILABILITY/SUPPLEMENTARY INFORMATION: Expression data and supplementary information are available at http://www.rii.com/publications/2003/HE_SDS.htm

11.
Data Analysis Tool Extension (DAnTE) is a statistical tool designed to address challenges associated with quantitative bottom-up, shotgun proteomics data. This tool has also been demonstrated for microarray data and can easily be extended to other high-throughput data types. DAnTE features selected normalization methods, missing value imputation algorithms, peptide-to-protein rollup methods, an extensive array of plotting functions and a comprehensive hypothesis-testing scheme that can handle unbalanced data and random effects. The graphical user interface (GUI) is designed to be very intuitive and user friendly. AVAILABILITY: DAnTE may be downloaded free of charge at http://omics.pnl.gov/software/. SUPPLEMENTARY INFORMATION: An example dataset with instructions on how to perform a series of analysis steps is available at http://omics.pnl.gov/software/

12.
RESULTS: A new algorithm, named CoexpressionFinder, has been developed to find groups of genes whose expression values change in a concordant manner across a series of DNA array experiments. It can find more complete and internally coordinated groups of gene expression vectors than hierarchical clustering. It also finds more genes with coordinated expression. The algorithm's design allows parallel execution. AVAILABILITY: The algorithm is implemented as a Java application which is freely available at: http://www.bioinformatics.ru/cf/index.jsp and http://bioinformatics.ru/cf/index.jsp.

13.
MOTIVATION: Methods for analyzing cancer microarray data often face two distinct challenges: the models they infer need to perform well when classifying new tissue samples while at the same time providing an insight into the patterns and gene interactions hidden in the data. State-of-the-art supervised data mining methods often cover well only one of these aspects, motivating the development of methods where predictive models with a solid classification performance would be easily communicated to the domain expert. RESULTS: Data visualization may provide an excellent approach to knowledge discovery and analysis of class-labeled data. We have previously developed an approach called VizRank that can score and rank point-based visualizations according to the degree of separation of data instances of different classes. We here extend VizRank with techniques to uncover outliers, score features (genes) and perform classification, and demonstrate that the proposed approach is well suited for cancer microarray analysis. Using VizRank and radviz visualization on a set of previously published cancer microarray data sets, we were able to find simple, interpretable data projections that include only a small subset of genes yet do clearly differentiate among different cancer types. We also report that our approach to classification through visualization achieves performance that is comparable to state-of-the-art supervised data mining techniques. AVAILABILITY: VizRank and radviz are implemented as part of the Orange data mining suite (http://www.ailab.si/orange). SUPPLEMENTARY INFORMATION: Supplementary data are available from http://www.ailab.si/supp/bi-cancer.

14.
MOTIVATION: Genome-wide association studies (GWAS) based on single nucleotide polymorphism (SNP) arrays are the most widely used approach to detect loci associated with human traits. Due to the complexity of the methods and software packages available, each with its particular format requiring intricate management workflows, the analysis of GWAS usually confronts scientists with steep learning curves. Indeed, the wide variety of tools makes the parsing and manipulation of data the most time-consuming and error-prone part of a study. To help resolve these issues, we present GWASpi, a user-friendly, multiplatform, desktop-able application for the management and analysis of GWAS data, with a novel approach to database technologies that makes the most of commonly available desktop hardware. GWASpi aims to be a start-to-finish GWAS management application, from raw data to results, containing the most common analysis tools. As a result, GWASpi is easy to use and reduces by up to two orders of magnitude the time needed to perform the fundamental steps of a GWAS. AVAILABILITY: Freely available on the web at http://www.gwaspi.org. Implemented in Java, Apache-Derby and NetCDF-3, with all major operating systems supported. CONTACT: gwaspi@upf.edu; arcadi.navarro@upf.edu.

15.
Background: Epiphyte removal forms part of routine management in shade coffee plantations.

Aims: Assess the current status of three orchid populations growing in Mexican shaded coffee plantations and evaluate the effect of perturbing the transient behaviour of different life stages.

Methods: We modelled the short-term response of eliminating I) non-reproductive juveniles, or II) reproductive adult plants from coffee bushes, on populations of Oncidium poikilostalix, Lepanthes acuminata and Telipogon helleri (Orchidaceae). First, we calculated the transient dynamics per se and second, we made a perturbation analysis on population inertia. Finally, we made a comparison with a traditional sensitivity analysis.

Results: All three species showed positive but differing asymptotic growth rates: O. poikilostalix (λmax = 1.106), L. acuminata (λmax = 1.209), and T. helleri (λmax = 1.012). Eliminating the major part of the juvenile or adult orchids gave population inertia, relative to the steady state, of (+19%, -24%) for O. poikilostalix, (+17%, -28%) for T. helleri and (+57%, -35%) for L. acuminata, respectively.

Conclusions: Eliminating juveniles or adults affects the short-term dynamics in different ways, owing to the differential impact on size stages, which show the non-linear effects associated with the major disturbances currently affecting orchids growing in coffee plantations.
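
The asymptotic growth rate λmax reported above is, in standard matrix population models, the dominant eigenvalue of the stage projection matrix. A minimal sketch of how it can be estimated by power iteration is given below; the two-stage matrix (juvenile, adult) and its entries are invented for illustration and are not taken from the study, and the function name `dominant_eigenvalue` is an assumption.

```python
def dominant_eigenvalue(matrix, iterations=500):
    """Estimate the dominant eigenvalue of a non-negative square matrix.

    Power iteration: repeatedly project a stage vector whose entries sum
    to 1; the total after each projection converges to the growth rate.
    Assumes the matrix is primitive, as typical projection matrices are.
    """
    n = len(matrix)
    stage = [1.0 / n] * n
    growth = 0.0
    for _ in range(iterations):
        projected = [
            sum(matrix[i][j] * stage[j] for j in range(n)) for i in range(n)
        ]
        growth = sum(projected)  # population multiplier for this step
        stage = [x / growth for x in projected]  # renormalize to sum to 1
    return growth


# Hypothetical projection matrix, columns = current stage (juvenile, adult).
# Row 0: recruits produced per individual; row 1: transition/survival to adult.
example_matrix = [
    [0.0, 1.5],
    [0.6, 0.4],
]
lambda_max = dominant_eigenvalue(example_matrix)
```

A value above 1 indicates a population that grows in the long run. The transient quantities discussed in the abstract, such as population inertia, describe how far the short-term trajectory departs from this asymptotic behaviour and are not computed here.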


16.
The integration of genomic and epigenomic data is an increasingly popular approach for studying the complex mechanisms driving cancer development. We have developed a method for evaluating both methylation and copy number from high-density DNA methylation arrays. Comparing copy number data from Infinium HumanMethylation450 BeadChips and SNP arrays, we demonstrate that Infinium arrays detect copy number alterations with the sensitivity of SNP platforms. These results show that high-density methylation arrays provide a robust and economic platform for detecting copy number and methylation changes in a single experiment. Our method is available in the ChAMP Bioconductor package: http://www.bioconductor.org/packages/2.13/bioc/html/ChAMP.html.

17.
SUMMARY: SelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection. SelSim allows simulation from either a fully stochastic model of, or deterministic approximations to, natural selection within a coalescent framework. A number of different mutation models are available for simulating surrounding neutral variation. The package enables a detailed exploration of the effects of different models and strengths of selection on patterns of diversity. This provides a tool for the statistical analysis of both empirical data and methods designed to detect natural selection. AVAILABILITY: http://www.stats.ox.ac.uk/mathgen/software.html. SUPPLEMENTARY INFORMATION: http://www.stats.ox.ac.uk/mathgen/software.html.

18.
Domain-enhanced analysis of microarray data using GO annotations   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: New biological systems technologies give scientists the ability to measure thousands of bio-molecules including genes, proteins, lipids and metabolites. We use domain knowledge, e.g. the Gene Ontology, to guide analysis of such data. By focusing on domain-aggregated results at, say, the molecular function level, increased interpretability is available to biological scientists beyond what is possible if results are presented at the gene level. RESULTS: We use a 'top-down' approach to perform domain aggregation by first combining gene expressions before testing for differentially expressed patterns. This is in contrast to the more standard 'bottom-up' approach, where genes are first tested individually then aggregated by domain knowledge. The benefit is greater sensitivity for detecting signals. Our method, domain-enhanced analysis (DEA), is assessed and compared to other methods using simulation studies and analysis of two publicly available leukemia data sets. AVAILABILITY: Our DEA method uses functions available in R (http://www.r-project.org/) and SAS (http://www.sas.com/). The two experimental data sets used in our analysis are available in R as Bioconductor packages, 'ALL' and 'golubEsets' (http://www.bioconductor.org/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
The use of high-density SNP arrays for investigating copy number alterations in clinical tumor samples, with intra-tumor heterogeneity and varying degrees of normal cell contamination, imposes several problems for commonly used segmentation algorithms. This calls for flexibility when setting thresholds for calling gains and losses. In addition, sample normalization can induce artifacts in the copy-number ratios for the non-changed genomic elements in the tumor samples. RESULTS: We present an open source R package, Rseg, which allows the user to define sample-specific thresholds to call gains and losses. It also allows the user to correct for normalization artifacts. AVAILABILITY: The R package, Rseg, is available at: http://www.cs.au.dk/~plamy/Rseg/ and runs on Linux and MS-Windows.
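
The idea of sample-specific thresholds can be illustrated with a short sketch. It is written in Python rather than R and does not reproduce the Rseg interface; the function `call_segments`, its parameters, and the simple baseline shift used to stand in for a normalization-artifact correction are all assumptions made for the example.

```python
def call_segments(log_ratios, gain_threshold, loss_threshold, baseline=0.0):
    """Label segment log-ratios as 'gain', 'loss' or 'neutral'.

    Hypothetical helper: thresholds are supplied per sample, so a sample
    with heavy normal-cell contamination can use narrower cut-offs.
    `baseline` is subtracted first to re-centre unchanged regions that a
    normalization step may have shifted away from zero.
    """
    if loss_threshold >= gain_threshold:
        raise ValueError("loss_threshold must be below gain_threshold")
    calls = []
    for ratio in log_ratios:
        centred = ratio - baseline
        if centred >= gain_threshold:
            calls.append("gain")
        elif centred <= loss_threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Same segment means, two hypothetical samples with different thresholds.
segments = [0.5, -0.4, 0.15, 0.02]
pure_sample = call_segments(segments, gain_threshold=0.3, loss_threshold=-0.3)
diluted_sample = call_segments(segments, gain_threshold=0.1, loss_threshold=-0.1)
```

With the narrower thresholds the segment at 0.15 is called a gain, whereas the stricter thresholds leave it neutral, which is the kind of per-sample flexibility the abstract argues for.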

20.
MOTIVATION: Traditional sequence distances require an alignment and therefore are not directly applicable to the problem of whole genome phylogeny where events such as rearrangements make full length alignments impossible. We present a sequence distance that works on unaligned sequences using the information theoretical concept of Kolmogorov complexity and a program to estimate this distance. RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals. AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
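
Kolmogorov complexity is not computable, so alignment-free distances of this kind are estimated in practice with a compressor. The sketch below uses the normalized compression distance with Python's general-purpose `zlib` as a stand-in; this is an assumption for illustration, not the authors' GenCompress program, and the formula may differ from the exact distance defined in the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


def compression_distance(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest the sequences
    share most of their information; values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A matrix of such pairwise distances can then be passed to any distance-based tree-building method. A general-purpose compressor captures much less of the structure in DNA than a dedicated one, so the values from this sketch are only indicative.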
