期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Data rotation improves genomotyping efficiency

Repsilber D Mira A Lindroos H Andersson S Ziegler A 《Biometrical journal. Biometrische Zeitschrift》2005,47(4):585-598

Unsequenced bacterial strains can be characterized by comparing their genomic DNA to a sequenced reference genome of the same species. This comparative genomic approach, also called genomotyping, is leading to an increased understanding of bacterial evolution and pathogenesis. It is efficiently accomplished by comparative genomic hybridization on custom-designed cDNA microarrays. The microarray experiment results in fluorescence intensities for reference and sample genome for each gene. The log-ratio of these intensities is usually compared to a cut-off, classifying each gene of the sample genome as a candidate for an absent or present gene with respect to the reference genome. Reducing the usually high rate of false positives in the list of candidates for absent genes is decisive for both time and costs of the experiment. We propose a novel method to improve efficiency of genomotyping experiments in this sense, by rotating the normalized intensity data before setting up the list of candidate genes. We analyze simulated genomotyping data and also re-analyze an experimental data set for comparison and illustration. We approximately halve the proportion of false positives in the list of candidate absent genes for the example comparative genomic hybridization experiment as well as for the simulation experiments. 相似文献

2.

A Regression-Based Differential Expression Detection Algorithm for Microarray Studies with Ultra-Low Sample Size

Daniel Vasiliu Samuel Clamons Molly McDonough Brian Rabe Margaret Saha 《PloS one》2015,10(3)

Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality. 相似文献

3.

Assigning significance to peptides identified by tandem mass spectrometry using decoy databases 总被引：1，自引：0，他引：1

Käll L Storey JD MacCoss MJ Noble WS 《Journal of proteome research》2008,7(1):29-34

Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power. 相似文献

4.

An integrated approach for the analysis of biological pathways using mixed models

下载免费PDF全文

Wang L Zhang B Wolfinger RD Chen X 《PLoS genetics》2008,4(7):e1000115

Gene class, ontology, or pathway testing analysis has become increasingly popular in microarray data analysis. Such approaches allow the integration of gene annotation databases, such as Gene Ontology and KEGG Pathway, to formally test for subtle but coordinated changes at a system level. Higher power in gene class testing is gained by combining weak signals from a number of individual genes in each pathway. We propose an alternative approach for gene-class testing based on mixed models, a class of statistical models that: a) provides the ability to model and borrow strength across genes that are both up and down in a pathway, b) operates within a well-established statistical framework amenable to direct control of false positive or false discovery rates, c) exhibits improved power over widely used methods under normal location-based alternative hypotheses, and d) handles complex experimental designs for which permutation resampling is difficult. We compare the properties of this mixed models approach with nonparametric method GSEA and parametric method PAGE using a simulation study, and illustrate its application with a diabetes data set and a dose-response data set. 相似文献

5.

Identifying genes of gene regulatory networks using formal concept analysis

Jutta Gebert Susanne Motameny Ulrich Faigle Christian V Forst Rainer Schrader 《Journal of computational biology》2008,15(2):185-194

In order to understand the behavior of a gene regulatory network, it is essential to know the genes that belong to it. Identifying the correct members (e.g., in order to build a model) is a difficult task even for small subnetworks. Usually only few members of a network are known and one needs to guess the missing members based on experience or informed speculation. It is beneficial if one can additionally rely on experimental data to support this guess. In this work we present a new method based on formal concept analysis to detect unknown members of a gene regulatory network from gene expression time series data. We show that formal concept analysis is able to find a list of candidate genes for inclusion into a partially known basic network. This list can then be reduced by a statistical analysis so that the resulting genes interact strongly with the basic network and therefore should be included when modeling the network. The method has been applied to the DNA repair system of Mycobacterium tuberculosis. In this application, our method produces comparable results to an already existing method of component selection while it is applicable to a broader range of problems. 相似文献

6.

Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information 总被引：2，自引：0，他引：2

Al-Shahrour F Díaz-Uriarte R Dopazo J 《Bioinformatics (Oxford, England)》2005,21(13):2988-2993

MOTIVATION: The analysis of genome-scale data from different high throughput techniques can be used to obtain lists of genes ordered according to their different behaviours under distinct experimental conditions corresponding to different phenotypes (e.g. differential gene expression between diseased samples and controls, different response to a drug, etc.). The order in which the genes appear in the list is a consequence of the biological roles that the genes play within the cell, which account, at molecular scale, for the macroscopic differences observed between the phenotypes studied. Typically, two steps are followed for understanding the biological processes that differentiate phenotypes at molecular level: first, genes with significant differential expression are selected on the basis of their experimental values and subsequently, the functional properties of these genes are analysed. Instead, we present a simple procedure which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties. The method proposed constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested. RESULTS: We propose the use of a method to scan ordered lists of genes. The method allows the understanding of the biological processes operating at molecular level behind the macroscopic experiment from which the list was generated. This procedure can be useful in situations where it is not possible to obtain statistically significant differences based on the experimental measurements (e.g. low prevalence diseases, etc.). Two examples demonstrate its application in two microarray experiments and the type of information that can be extracted. 相似文献

7.

Finding common genes in multiple cancer types through meta-analysis of microarray experiments: a rank aggregation approach

Pihur V Datta S Datta S 《Genomics》2008,92(6):400-403

Discovering genes involved in multiple types of cancers is of significant therapeutic importance. We show that collective evidence for such genes can be obtained via a form of meta-analysis that aggregates the results (rankings and p values) from various cancer-specific microarray experiments. This method is illustrated by a combined analysis of 20 microarray experiments. In the aggregated list of top-50 genes, 36 of them have been implicated in cancer (often multiple cancers) genesis in past studies, which also suggests that this list may contain some novel cancer genes that may deserve further scrutiny in the future. 相似文献

8.

Approaches to reduce false positives and false negatives in the analysis of microarray data: applications in type 1 diabetes research

Wu J Lenchik NI Gerling IC 《BMC genomics》2008,9(Z2):S12

Background

As studies of molecular biology system attempt to achieve a comprehensive understanding of a particular system, Type 1 errors may be a significant problem. However, few investigators are inclined to accept the increase in Type 2 errors (false positives) that may result when less stringent statistical cut-off values are used. To address this dilemma, we developed an analysis strategy that used a stringent statistical analysis to create a list of differentially expressed genes that served as "bait" to "fish out" other genes with similar patterns of expression.

Results

Comparing two strains of mice (NOD and C57Bl/6), we identified 93 genes with statistically significant differences in their patterns of expression. Hierarchical clustering identified an additional 39 genes with similar patterns of expression differences between the two strains. Pathway analysis was then employed: 1) identify the central genes and define biological processes that may be regulated by the genes identified, and 2) identify genes on the lists that could not be connected to each other in pathways (potential false positives). For networks created by both gene lists, the most connected (central) genes were interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α). These two cytokines are relevant to the biological differences between the two strains of mice. Furthermore, the network created by the list of 39 genes also suggested other biological differences between the strains.

Conclusion

Taken together, these data demonstrate how stringent statistical analysis, combined with hierarchical clustering and pathway analysis may offer deeper insight into the biological processes reflected from a set of expression array data. This approach allows us to 'recapture" false negative genes that otherwise would have been missed by the statistical analysis.

相似文献

9.

Parallel multiplicity and error discovery rate (EDR) in microarray experiments

Wayne Wenzhong Xu Clay J Carter 《BMC bioinformatics》2010,11(1):465

Background

In microarray gene expression profiling experiments, differentially expressed genes (DEGs) are detected from among tens of thousands of genes on an array using statistical tests. It is important to control the number of false positives or errors that are present in the resultant DEG list. To date, more than 20 different multiple test methods have been reported that compute overall Type I error rates in microarray experiments. However, these methods share the following dilemma: they have low power in cases where only a small number of DEGs exist among a large number of total genes on the array. 相似文献

10.

Diffprot - software for non-parametric statistical analysis of differential proteomics data

Malinowska A Kistowski M Bakun M Rubel T Tkaczyk M Mierzejewska J Dadlez M 《Journal of Proteomics》2012,75(13):4062-4073

Mass spectrometry-based global proteomics experiments generate large sets of data that can be converted into useful information only with an appropriate statistical approach. We present Diffprot - a software tool for statistical analysis of MS-derived quantitative data. With implemented resampling-based statistical test and local variance estimate, Diffprot allows to draw significant results from small scale experiments and effectively eliminates false positive results. To demonstrate the advantages of this software, we performed two spike-in tests with complex biological matrices, one label-free and one based on iTRAQ quantification; in addition, we performed an iTRAQ experiment on bacterial samples. In the spike-in tests, protein ratios were estimated and were in good agreement with theoretical values; statistical significance was assigned to spiked proteins and single or no false positive results were obtained with Diffprot. We compared the performance of Diffprot with other statistical tests - widely used t-test and non-parametric Wilcoxon test. In contrast to Diffprot, both generated many false positive hits in the spike-in experiment. This proved the superiority of the resampling-based method in terms of specificity, making Diffprot a rational choice for small scale high-throughput experiments, when the need to control the false positive rate is particularly pressing. 相似文献

11.

Linear Mixed Model Selection for False Discovery Rate Control in Microarray Data Analysis

Cumhur Yusuf Demirkale Dan Nettleton Tapabrata Maiti 《Biometrics》2010,66(2):621-629

Summary In a microarray experiment, one experimental design is used to obtain expression measures for all genes. One popular analysis method involves fitting the same linear mixed model for each gene, obtaining gene‐specific p‐values for tests of interest involving fixed effects, and then choosing a threshold for significance that is intended to control false discovery rate (FDR) at a desired level. When one or more random factors have zero variance components for some genes, the standard practice of fitting the same full linear mixed model for all genes can result in failure to control FDR. We propose a new method that combines results from the fit of full and selected linear mixed models to identify differentially expressed genes and provide FDR control at target levels when the true underlying random effects structure varies across genes. 相似文献

12.

An associative analysis of gene expression array data

Dozmorov I Centola M 《Bioinformatics (Oxford, England)》2003,19(2):204-211

MOTIVATION: We face the absence of optimized standards to guide normalization, comparative analysis, and interpretation of data sets. One aspect of this is that current methods of statistical analysis do not adequately utilize the information inherent in the large data sets generated in a microarray experiment and require a tradeoff between detection sensitivity and specificity. RESULTS: We present a multistep procedure for analysis of mRNA expression data obtained from cDNA array methods. To identify and classify differentially expressed genes, results from standard paired t-test of normalized data are compared with those from a novel method, denoted an associative analysis. This method associates experimental gene expressions presented as residuals in regression analysis against control averaged expressions to a common standard-the family of similarly computed residuals for low variability genes derived from control experiments. By associating changes in expression of a given gene to a large family of equally expressed genes of the control group, this method utilizes the large data sets inherent in microarray experiments to increase both specificity and sensitivity. The overall procedure is illustrated by tabulation of genes whose expression differs significantly between Snell dwarf mice (dw/dw) and their phenotypically normal littermates (dw/+, +/+). Of the 2,352 genes examined only 450-500 were expressed above the background levels observed in nonexpressed genes and of these 120 were established as differentially expressed in dwarf mice at a significance level that excludes appearance of false positive determinations. 相似文献

13.

OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

Lawrence Hunter Zhiyong Lu James Firby William A Baumgartner Jr Helen L Johnson Philip V Ogren K Bretonnel Cohen 《BMC bioinformatics》2008,9(1):1-11

Background

Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identification of genes differentially expressed in microarray experiments is challenging because of their potentially high type I error rate. Methods for large-scale statistical analyses have been developed but most of them are applicable to two-sample or two-condition data.

Results

We developed a large-scale multiple-group F-test based method, named ranking analysis of F-statistics (RAF), which is an extension of ranking analysis of microarray data (RAM) for two-sample t-test. In this method, we proposed a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate. Simulation results suggested that it has higher efficiency in finding differentially expressed genes among multiple classes at a lower false discovery rate than some commonly used methods. By applying our method to the experimental data, we found 107 genes having significantly differential expressions among 4 treatments at <0.7% FDR, of which 31 belong to the expressed sequence tags (ESTs), 76 are unique genes who have known functions in the brain or central nervous system and belong to six major functional groups.

Conclusion

Our method is suitable to identify differentially expressed genes among multiple groups, in particular, when sample size is small. 相似文献

14.

Meta-analysis based on control of false discovery rate: combining yeast ChIP-chip datasets

Pyne S Futcher B Skiena S 《Bioinformatics (Oxford, England)》2006,22(20):2516-2522

相似文献

15.

Statistical assessment of functional categories of genes deregulated in pathological conditions by using microarray data 总被引：1，自引：0，他引：1

Maglietta R Piepoli A Catalano D Licciulli F Carella M Liuni S Pesole G Perri F Ancona N 《Bioinformatics (Oxford, England)》2007,23(16):2063-2072

相似文献

16.

A stochastic downhill search algorithm for estimating the local false discovery rate

Scheid S Spang R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(3):98-108

Screening for differential gene expression in microarray studies leads to difficult large-scale multiple testing problems. The local false discovery rate is a statistical concept for quantifying uncertainty in multiple testing. We introduce a novel estimator for the local false discovery rate that is based on an algorithm which splits all genes into two groups, representing induced and noninduced genes, respectively. Starting from the full set of genes, we successively exclude genes until the gene-wise p-values of the remaining genes look like a typical sample from a uniform distribution. In comparison to other methods, our algorithm performs compatibly in detecting the shape of the local false discovery rate and has a smaller bias with respect to estimating the overall percentage of noninduced genes. Our algorithm is implemented in the Bioconductor compatible R package TWILIGHT version 1.0.1, which is available from http://compdiag.molgen.mpg.de/software or from the Bioconductor project at http://www.bioconductor.org. 相似文献

17.

Modularized learning of genetic interaction networks from biological annotations and mRNA expression data

Lee PH Lee D 《Bioinformatics (Oxford, England)》2005,21(11):2739-2747

MOTIVATION: Inferring the genetic interaction mechanism using Bayesian networks has recently drawn increasing attention due to its well-established theoretical foundation and statistical robustness. However, the relative insufficiency of experiments with respect to the number of genes leads to many false positive inferences. RESULTS: We propose a novel method to infer genetic networks by alleviating the shortage of available mRNA expression data with prior knowledge. We call the proposed method 'modularized network learning' (MONET). Firstly, the proposed method divides a whole gene set to overlapped modules considering biological annotations and expression data together. Secondly, it infers a Bayesian network for each module, and integrates the learned subnetworks to a global network. An algorithm that measures a similarity between genes based on hierarchy, specificity and multiplicity of biological annotations is presented. The proposed method draws a global picture of inter-module relationships as well as a detailed look of intra-module interactions. We applied the proposed method to analyze Saccharomyces cerevisiae stress data, and found several hypotheses to suggest putative functions of unclassified genes. We also compared the proposed method with a whole-set-based approach and two expression-based clustering approaches. 相似文献

18.

Differential analysis of DNA microarray gene expression data 总被引：6，自引：0，他引：6

Hatfield GW Hung SP Baldi P 《Molecular microbiology》2003,47(4):871-877

Here, we review briefly the sources of experimental and biological variance that affect the interpretation of high-dimensional DNA microarray experiments. We discuss methods using a regularized t-test based on a Bayesian statistical framework that allow the identification of differentially regulated genes with a higher level of confidence than a simple t-test when only a few experimental replicates are available. We also describe a computational method for calculating the global false-positive and false-negative levels inherent in a DNA microarray data set. This method provides a probability of differential expression for each gene based on experiment-wide false-positive and -negative levels driven by experimental error and biological variance. 相似文献

19.

Network analysis of gene lists for finding reproducible prognostic breast cancer gene signatures

Ulykbek Kairov Tatyana Karpenyuk Erlan Ramanculov Andrei Zinovyev 《Bioinformation》2012,8(16):773-776

Many genome-scale studies in molecular biology deliver results in the form of a ranked list of gene names, accordingly to some scoring method. There is always the question how many top-ranked genes to consider for further analysis, for example, in order creating a diagnostic or predictive gene signature for a disease. This question is usually approached from a statistical point of view, without considering any biological properties of top-ranked genes or how they are related to each other functionally. Here we suggest a new method for selecting a number of genes in a ranked gene list such that this set forms the Optimally Functionally Enriched Network (OFTEN), formed by known physical interactions between genes or their products. The method allows associating a network with the gene list, providing easier interpretation of the results and classifying the genes or proteins accordingly to their position in the resulting network. We demonstrate the method on four breast cancer datasets and show that 1) the resulting gene signatures are more reproducible from one dataset to another compared to standard statistical procedures and 2) the overlap of these signatures has significant prognostic potential. The method is implemented in BiNoM Cytoscape plugin (http://binom.curie.fr). 相似文献

20.

Increased power of microarray analysis by use of an algorithm based on a multivariate procedure

Krohn K Eszlinger M Paschke R Roeder I Schuster E 《Bioinformatics (Oxford, England)》2005,21(17):3530-3534

MOTIVATION: The power of microarray analyses to detect differential gene expression strongly depends on the statistical and bioinformatical approaches used for data analysis. Moreover, the simultaneous testing of tens of thousands of genes for differential expression raises the 'multiple testing problem', increasing the probability of obtaining false positive test results. To achieve more reliable results, it is, therefore, necessary to apply adjustment procedures to restrict the family-wise type I error rate (FWE) or the false discovery rate. However, for the biologist the statistical power of such procedures often remains abstract, unless validated by an alternative experimental approach. RESULTS: In the present study, we discuss a multiplicity adjustment procedure applied to classical univariate as well as to recently proposed multivariate gene-expression scores. All procedures strictly control the FWE. We demonstrate that the use of multivariate scores leads to a more efficient identification of differentially expressed genes than the widely used MAS5 approach provided by the Affymetrix software tools (Affymetrix Microarray Suite 5 or GeneChip Operating Software). The practical importance of this finding is successfully validated using real time quantitative PCR and data from spike-in experiments. AVAILABILITY: The R-code of the statistical routines can be obtained from the corresponding author. CONTACT: Schuster@imise.uni-leipzig.de 相似文献