Similar Articles
20 similar articles found (search time: 31 ms)
1.
Data from gene expression arrays are influenced by many experimental parameters that lead to variations not readily accessible by standard quantification methods. To compare measurements from gene expression array experiments, quantitative data are commonly normalised using reference genes or global normalisation methods based on mean or median values. These methods rest on the assumption that (i) the selected reference genes are expressed at a standard level in all experiments or (ii) the mean or median expression signal provides a quantitative reference for each individual experiment. We introduce here a new ranking diagram with which we can show how the different normalisation methods compare, and how they are influenced by the variations in measurements (noise) that occur in every experiment. Furthermore, we show by comparative analysis that an upper trimmed mean provides a simple and robust method for normalising larger sets of experiments.
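As an illustration of the normalisation strategies compared above, the sketch below scales a set of arrays to a common trimmed-mean reference. It is a minimal Python example, not the authors' implementation; the 10% trim fraction and the use of the grand mean as the common target are arbitrary assumptions.

```python
import numpy as np

def trimmed_mean(signals, upper_trim=0.10):
    """Mean after discarding the highest `upper_trim` fraction of signals,
    which makes the reference robust to saturated spots and outliers."""
    s = np.sort(np.asarray(signals, dtype=float))
    n_drop = int(round(len(s) * upper_trim))
    return s[:len(s) - n_drop].mean() if n_drop else s.mean()

def normalise(arrays, upper_trim=0.10):
    """Scale each array so its trimmed mean matches a common reference level."""
    refs = np.array([trimmed_mean(a, upper_trim) for a in arrays])
    target = refs.mean()                                   # common reference level
    return [np.asarray(a, dtype=float) * (target / r) for a, r in zip(arrays, refs)]

# Toy usage: three "experiments" measuring the same genes at different overall scales.
rng = np.random.default_rng(0)
arrays = [rng.lognormal(5, 1, 500) * scale for scale in (1.0, 1.8, 0.6)]
normalised = normalise(arrays)
```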

2.
A modified Bonferroni method for discrete data (total citations: 5; self-citations: 1; by others: 4)
R. E. Tarone, Biometrics, 1990, 46(2): 515-522
The Bonferroni adjustment for multiple comparisons is a simple and useful method of controlling the overall false positive error rate when several significance tests are performed in the evaluation of an experiment. In situations with categorical data, the test statistics have discrete distributions. The discreteness of the null distributions can be exploited to reduce the number of significance tests taken into account in the Bonferroni procedure. This reduction is accomplished by using only the information contained in the marginal totals.
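The sketch below contrasts the standard Bonferroni threshold with the modification for discrete tests: tests whose smallest attainable p-value can never reach the adjusted threshold are dropped from the correction. This is a hedged reconstruction of the idea from the abstract, assuming the minimum attainable p-values (e.g. from exact tests on the fixed marginal totals) have already been computed; it is not Tarone's published code.

```python
def modified_bonferroni(p_values, min_attainable, alpha=0.05):
    """Bonferroni correction that ignores discrete tests which can never be significant.

    K is chosen as the smallest k for which at most k tests have a minimum
    attainable p-value below alpha / k; only those tests enter the correction.
    """
    m = len(p_values)
    eligible, k = [], m
    for k in range(1, m + 1):
        eligible = [i for i in range(m) if min_attainable[i] <= alpha / k]
        if len(eligible) <= k:
            break
    return [i for i in eligible if p_values[i] <= alpha / k]

# Toy usage: three of the five exact tests cannot reach significance at all,
# so the effective correction is alpha/2 rather than alpha/5.
p    = [0.004, 0.020, 0.50, 0.60, 0.30]
pmin = [0.001, 0.010, 0.50, 0.60, 0.25]   # smallest p each discrete test can attain
print(modified_bonferroni(p, pmin))        # -> [0, 1]; plain Bonferroni keeps only [0]
```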

3.
4.
Mass spectrometry-based global proteomics experiments generate large sets of data that can be converted into useful information only with an appropriate statistical approach. We present Diffprot, a software tool for statistical analysis of MS-derived quantitative data. With its resampling-based statistical test and local variance estimate, Diffprot makes it possible to draw significant results from small-scale experiments and effectively eliminates false positive results. To demonstrate the advantages of this software, we performed two spike-in tests with complex biological matrices, one label-free and one based on iTRAQ quantification; in addition, we performed an iTRAQ experiment on bacterial samples. In the spike-in tests, the estimated protein ratios were in good agreement with theoretical values; statistical significance was assigned to the spiked proteins, and single or no false positive results were obtained with Diffprot. We compared the performance of Diffprot with other statistical tests: the widely used t-test and the non-parametric Wilcoxon test. In contrast to Diffprot, both generated many false positive hits in the spike-in experiment. This demonstrates the superiority of the resampling-based method in terms of specificity, making Diffprot a rational choice for small-scale high-throughput experiments in which the need to control the false positive rate is particularly pressing.
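To make the resampling idea concrete, here is a generic two-sample permutation test on per-protein intensities. It illustrates the resampling principle only; Diffprot's own statistic and its local variance estimate are not reproduced here.

```python
import numpy as np

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sample permutation test on the difference of means.

    Shuffling the group labels builds the null distribution directly from the
    data, which is the resampling principle behind tools such as Diffprot
    (whose actual test statistic and variance model differ from this sketch).
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:a.size].mean() - pooled[a.size:].mean()) >= observed
    return (hits + 1) / (n_perm + 1)        # add-one correction avoids p = 0

# Toy usage: log2 intensities of one protein in two conditions.
print(permutation_test([5.1, 5.3, 4.9], [6.0, 6.4, 6.1]))
```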

5.
6.
7.
An objective of many functional genomics studies is to estimate treatment-induced changes in gene expression. cDNA arrays interrogate each tissue sample for the levels of mRNA for hundreds to tens of thousands of genes, and the use of this technology leads to a multitude of treatment contrasts. By-gene hypothesis tests evaluate the evidence supporting no effect, but selecting a significance level requires dealing with the multitude of comparisons. The p-values from these tests order the genes such that a p-value cutoff divides the genes into two sets. Ideally one set would contain the affected genes and the other would contain the unaffected genes. However, the set of genes selected as affected will have false positives, i.e., genes that are not affected by treatment. Likewise, the other set of genes, selected as unaffected, will contain false negatives, i.e., genes that are affected. A plot of the observed p-values (1 - p) versus their expectation under a uniform [0, 1] distribution allows one to estimate the number of true null hypotheses. With this estimate, the false positive and false negative rates associated with any p-value cutoff can be estimated. When computed for a range of cutoffs, these rates summarize the ability of the study to resolve effects. In our work, we are more interested in selecting most of the affected genes than in protecting against a few false positives. An optimum cutoff, i.e., the best set given the data, depends upon the relative cost of falsely classifying a gene as affected versus the cost of falsely classifying a gene as unaffected. We select the cutoff by a decision-theoretic method analogous to methods developed for receiver operating characteristic curves. In addition, we estimate the false discovery rate and the false nondiscovery rate associated with any cutoff value. Two functional genomics studies that were designed to assess a treatment effect are used to illustrate how the methods allowed the investigators to determine a cutoff to suit their research goals.
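The sketch below shows how an estimate of the number of true nulls turns a p-value cutoff into estimated false discovery and false nondiscovery rates. It uses a Storey-type estimator for the number of true nulls rather than the plot-based estimate described in the abstract, and the decision-theoretic choice of cutoff is not reproduced; treat both as illustrative assumptions.

```python
import numpy as np

def error_rates_at_cutoff(p, cutoff, lam=0.5):
    """Estimate FDR and the false nondiscovery rate (FNR) at a p-value cutoff.

    The number of true null hypotheses m0 is estimated from the p-values above
    `lam`, which should be mostly nulls; the expected false positives at the
    cutoff are then m0 * cutoff.
    """
    p = np.asarray(p, float)
    m = p.size
    m0 = min(m, np.sum(p > lam) / (1.0 - lam))     # estimated true nulls
    R = np.sum(p <= cutoff)                        # genes declared affected
    efp = m0 * cutoff                              # expected false positives among them
    fdr = efp / max(R, 1)
    missed = max((m - m0) - (R - efp), 0.0)        # affected genes left undeclared
    fnr = missed / max(m - R, 1)
    return m0, fdr, fnr

# Toy usage: 800 null p-values mixed with 200 genes that respond to treatment.
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=800), rng.beta(0.5, 8, size=200)])
print(error_rates_at_cutoff(p, cutoff=0.05))
```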

8.
Unsequenced bacterial strains can be characterized by comparing their genomic DNA to a sequenced reference genome of the same species. This comparative genomic approach, also called genomotyping, is leading to an increased understanding of bacterial evolution and pathogenesis. It is efficiently accomplished by comparative genomic hybridization on custom-designed cDNA microarrays. The microarray experiment yields fluorescence intensities for the reference and sample genomes for each gene. The log-ratio of these intensities is usually compared to a cut-off, classifying each gene of the sample genome as a candidate for an absent or present gene with respect to the reference genome. Reducing the usually high rate of false positives in the list of candidates for absent genes is decisive for both the time and cost of the experiment. We propose a novel method to improve the efficiency of genomotyping experiments in this sense, by rotating the normalized intensity data before setting up the list of candidate genes. We analyze simulated genomotyping data and also re-analyze an experimental data set for comparison and illustration. We approximately halve the proportion of false positives in the list of candidate absent genes for the example comparative genomic hybridization experiment as well as for the simulation experiments.
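The baseline classification step the abstract refers to can be sketched as a simple log-ratio threshold, as below. The cutoff of -1 (a two-fold drop) is an arbitrary illustration, and the paper's actual contribution, rotating the normalized intensities before thresholding, is not reproduced.

```python
import numpy as np

def candidate_absent_genes(ref_intensity, sample_intensity, cutoff=-1.0):
    """Flag candidate absent genes in a comparative genomic hybridization.

    Genes whose log2(sample / reference) intensity ratio falls below `cutoff`
    become candidates for genes absent from the sampled strain.
    """
    log_ratio = np.log2(np.asarray(sample_intensity, float) /
                        np.asarray(ref_intensity, float))
    return np.where(log_ratio < cutoff)[0], log_ratio

# Toy usage: gene 1 hybridizes poorly in the sample and is flagged as absent.
ref = [1200, 900, 1500, 1100]
smp = [1150, 80, 1400, 1000]
candidates, ratios = candidate_absent_genes(ref, smp)
print(candidates)          # -> [1]
```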

9.
MOTIVATION: Tandem mass spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identification, ≥2 unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified by only a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and the use of logistic regression models with cross-validation. Applied to three bacterial samples, it enables recovery of 68-98% of the correct single-hit proteins with an error rate of <2%. This results in a 22-65% increase in the number of identified proteins. Identifying true single-hit proteins will lead to discovering many crucial regulators, biomarkers and other low-abundance proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
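As an illustration of scoring single-hit identifications with a cross-validated logistic regression, the sketch below trains on hits from a randomized (decoy) search as the negative class. The feature set (search score, mass error, peptide length), the labelling scheme and the 0.95 acceptance threshold are all hypothetical simplifications, not the features or thresholds used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical features per single-hit match: [search score, mass error (ppm), peptide length].
# Matches from the randomized database are labelled 0, target matches 1; in reality the
# target class is itself a mixture of true and false hits, which the paper's models address.
rng = np.random.default_rng(0)
decoy  = np.column_stack([rng.normal(20, 5, 300), rng.normal(8, 3, 300), rng.integers(7, 25, 300)])
target = np.column_stack([rng.normal(35, 8, 300), rng.normal(3, 2, 300), rng.integers(7, 25, 300)])
X = np.vstack([decoy, target])
y = np.array([0] * 300 + [1] * 300)

# Out-of-fold probabilities, so each match is scored by a model that never saw it.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")[:, 1]
accepted = proba > 0.95
print(int(accepted.sum()), "single-hit matches accepted")
```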

10.
Exome sequencing constitutes an important technology for the study of human hereditary diseases and cancer. However, the ability of this approach to identify copy number alterations in primary tumor samples has not been fully addressed. Here we show that somatic copy number alterations can be reliably estimated from exome sequencing data through a strategy that we have termed exome2cnv. Using data from 86 paired normal and primary tumor samples, we identified losses and gains of complete chromosomes or large genomic regions, as well as smaller regions affecting a minimum of one gene. Comparison with high-resolution comparative genomic hybridization (CGH) arrays revealed a high sensitivity and a low number of false positives in the copy number estimation. We explore the main factors affecting sensitivity and false positives with real data, and provide a side-by-side comparison with CGH arrays. Together, these results underscore the utility of exome sequencing for studying cancer samples by allowing not only the identification of substitutions and indels, but also the accurate estimation of copy number alterations.
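A bare-bones version of the coverage-ratio step underlying exome-based copy number estimation is sketched below. It is not exome2cnv: segmentation and the corrections a real tool applies (GC content, mappability, tumor purity) are deliberately omitted, and the median-centring is a simplifying assumption.

```python
import numpy as np

def copy_number_log_ratio(tumor_counts, normal_counts, pseudocount=1.0):
    """Per-exon log2 copy-number ratio from paired tumor/normal exome read counts.

    The ratio is centred on its median so that copy-neutral exons sit near 0;
    positive values suggest gains, strongly negative values suggest losses.
    """
    t = np.asarray(tumor_counts, float) + pseudocount
    n = np.asarray(normal_counts, float) + pseudocount
    ratio = t / n
    ratio /= np.median(ratio)          # centre on the copy-neutral majority
    return np.log2(ratio)

# Toy usage: the third exon is gained, the fourth is lost.
print(copy_number_log_ratio([100, 95, 400, 10], [105, 100, 110, 95]))
```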

11.
12.
MOTIVATION: When running experiments that involve multiple high-density oligonucleotide arrays, it is important to remove sources of variation between arrays that are of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays, and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. Using the variability and bias of an expression measure, these algorithms are compared to two methods that make use of a baseline array: a one-number scaling-based algorithm and a method that uses a non-linear normalizing relation. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
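Quantile normalization, the simplest and fastest of the complete data methods, can be written in a few lines; the sketch below is a minimal version, not the code in the Affy package (ties are not rank-averaged here).

```python
import numpy as np

def quantile_normalize(X):
    """Quantile normalization across arrays (columns of X).

    Every column is forced onto the same empirical distribution: sort each
    array, average across arrays at each rank, then map the averaged values
    back to each array's original probe order.
    """
    X = np.asarray(X, float)
    order = np.argsort(X, axis=0)                 # probe ranks within each array
    mean_profile = np.sort(X, axis=0).mean(axis=1)
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        out[order[:, j], j] = mean_profile
    return out

# Toy usage: three probes measured on three arrays with different overall scales.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.5, 6.0]])
print(quantile_normalize(X))
```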

13.
Effects of filtering by Present call on analysis of microarray experiments (total citations: 1; self-citations: 0; by others: 1)

Background

Affymetrix GeneChips® are widely used for expression profiling of tens of thousands of genes. The large number of comparisons can lead to false positives. Various methods have been used to reduce false positives, but they have rarely been compared or quantitatively evaluated. Here we describe and evaluate a simple method that uses the detection (Present/Absent) call generated by the Affymetrix microarray suite version 5 software (MAS5) to remove data that is not reliably detected before further analysis, and compare this with filtering by expression level. We explore the effects of various thresholds for removing data in experiments of different size (from 3 to 10 arrays per treatment), as well as their relative power to detect significant differences in expression.

Results

Our approach sets a threshold for the fraction of arrays called Present in at least one treatment group. This method removes a large percentage of probe sets called Absent before carrying out the comparisons, while retaining most of the probe sets called Present. It preferentially retains the more significant probe sets (p ≤ 0.001) and those probe sets that are turned on or off, and improves the false discovery rate. Permutations to estimate false positives indicate that probe sets removed by the filter contribute a disproportionate number of false positives. Filtering by fraction Present is effective when applied to data generated either by the MAS5 algorithm or by other probe-level algorithms, for example RMA (robust multichip average). Experiment size greatly affects the ability to reproducibly detect significant differences, and also impacts the effect of filtering; smaller experiments (3–5 samples per treatment group) benefit from more restrictive filtering (≥50% Present).

Conclusion

Use of a threshold fraction of Present detection calls (derived by MAS5) provided a simple method that effectively eliminated from analysis probe sets that are unlikely to be reliable while preserving the most significant probe sets and those turned on or off; it thereby increased the ratio of true positives to false positives.
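A minimal implementation of the filtering rule described above (keep a probe set if it is called Present on at least a given fraction of the arrays in at least one treatment group) might look like the following; the 50% default mirrors the threshold suggested for small experiments but is an adjustable choice.

```python
import numpy as np

def filter_by_present_call(calls, groups, min_fraction=0.5):
    """Keep probe sets called Present on >= min_fraction of the arrays
    of at least one treatment group.

    `calls` is a probe-set x array matrix of MAS5 detection calls
    ('P', 'M' or 'A'); `groups` assigns each array (column) to a treatment.
    """
    calls, groups = np.asarray(calls), np.asarray(groups)
    keep = np.zeros(calls.shape[0], dtype=bool)
    for g in np.unique(groups):
        cols = groups == g
        keep |= (calls[:, cols] == 'P').mean(axis=1) >= min_fraction
    return keep

# Toy usage: 4 probe sets, 6 arrays, two treatment groups of 3 arrays each.
calls = [['P', 'P', 'A', 'A', 'A', 'A'],    # Present in group 1 only -> kept
         ['A', 'A', 'A', 'A', 'A', 'A'],    # Absent everywhere       -> removed
         ['P', 'P', 'P', 'P', 'P', 'P'],    # Present everywhere      -> kept
         ['A', 'P', 'A', 'A', 'A', 'P']]    # below threshold in both -> removed
groups = [1, 1, 1, 2, 2, 2]
print(filter_by_present_call(calls, groups))
```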

14.
Hornberg JJ, de Haas RR, Dekker H, Lankelma J. BioTechniques, 2002, 33(1): 108, 110, 112, passim
Nylon membrane-based macroarrays form a widely available alternative to microarrays for the collection of large-scale gene expression data. To carry out repetitive hybridization experiments with nylon cDNA arrays, we used phosphorothioate 33P-cDNA, followed by stripping under relatively mild conditions. We were able to use the same membranes more than 10 times without a measurable reduction in their performance. Thus, our protocol allows for more comparative studies of multiple data sets obtained from sequential hybridizations of the same set of membranes. We demonstrate how to analyze repetitive macroarray experiments and how to determine the reliability, or statistical significance, of the gene expression data obtained. Both the averaging of signals per gene and the reversal of nylon membranes had a favorable effect on accuracy. By self-self comparisons, we show that in a duplicate experiment with four membranes, a 2-fold change in gene expression can be measured reliably.

15.
The search for pairs (dyads) of related individuals in large databases of DNA profiles has become an increasingly important inference tool in ecology. However, the many, partly dependent, pairwise comparisons introduce statistical issues. We show that the false discovery rate (FDR) procedure is well suited to controlling the proportion of false positives, i.e. dyads of unrelated individuals that would otherwise have been labelled as related. We verify the behaviour of the standard FDR procedure by simulation, demonstrating that it works satisfactorily in spite of the many dependent pairwise comparisons involved in an exhaustive database screening. A computer program that implements this method is available online. In addition, we propose to implement a second stage in the procedure, in which additional independent genetic markers are used to identify the false positives. We demonstrate the application of the approach in an analysis of a DNA database consisting of 3300 individual minke whales (Balaenoptera acutorostrata), each typed at ten microsatellite loci. Applying the standard procedure with an FDR of 50% led to the identification of 74 putative dyads of 1st- or 2nd-order relatives. However, introducing the second step, which involved additional genotypes at 15 microsatellite loci, revealed that only 21 of the putative dyads can be claimed with high certainty to be true dyads.
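The standard FDR procedure referred to above is the Benjamini-Hochberg step-up rule; a minimal version is sketched below. The permissive q = 0.5 in the usage example mirrors the 50% first-stage FDR of the study, under the assumption that a second, independent marker set will later weed out the false dyads.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the indices of comparisons declared significant while keeping the
    expected proportion of false positives among them at or below q.
    """
    p = np.asarray(p_values, float)
    m = p.size
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    return order[:passed.max() + 1]          # reject everything up to the largest passing rank

# Toy usage: pairwise relatedness p-values from a small database screen.
p = [0.001, 0.30, 0.004, 0.60, 0.02, 0.85]
print(benjamini_hochberg(p, q=0.5))          # permissive first-stage screen
```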

16.
Sequencing by hybridization (SBH) is a DNA sequencing technique in which the sequence is reconstructed from its k-mer content. This content, called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods, for two main reasons: the lack of tools to handle realistic levels of hybridization errors, and an inherent limitation on the length of sequence uniquely reconstructible by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct, with high probability, random sequences whose length is lower only by a squared log factor than the information-theoretic bound. Our algorithm is very robust to errors and has provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
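For readers unfamiliar with the spectrum formalism, the sketch below computes the k-mer spectrum of a sequence and performs a greedy, error-free reconstruction. It is only an illustration of the data SBH works with; the algorithms described above additionally tolerate false positive and false negative spectrum errors, which this sketch does not.

```python
def spectrum(seq, k=8):
    """The k-mer content a universal array would report (membership only:
    multiplicities and positions are lost in the hybridization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reconstruct(spec, start, length, k=8):
    """Greedy reconstruction from an error-free spectrum.

    The current (k-1)-suffix is extended by the unique matching k-mer;
    the function gives up as soon as the extension is missing or ambiguous.
    """
    seq = start
    while len(seq) < length:
        nxt = [m for m in spec if m[:k - 1] == seq[-(k - 1):]]
        if len(nxt) != 1:
            return None
        seq += nxt[0][-1]
    return seq

target = "ACGTACGGTTCAGGCATTTACGGA"
spec = spectrum(target, k=8)
print(reconstruct(spec, target[:8], len(target), k=8) == target)   # True
```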

17.
Life history patterns are usually identified by comparisons of extant species. Because of inferred phylogenetic constraints, comparative data are often not statistically independent. In order to remove phylogenetic patterns embedded in life history data completely, we adopted a phylogenetic autoregressive method to reanalyse a data set of the ovipositional and developmental rates of 45 Phytoseiidae species. We first calculated the phylogenetic correlation at different taxonomic levels using Moran's I statistic. Significant and positive phylogenetic correlations were found at the subgenus and subfamily levels, indicating that some variation in both of these life-history traits could be accounted for by phylogeny. Phylogenetic associations were therefore removed with the phylogenetic autoregressive method. Using data corrected by this method, the specific components of the ovipositional rate are positively correlated with the specific components of the developmental rate. The method we have used reaches the same conclusion as others, but differs in the way the phylogenetic effect is allowed to influence the relationship between comparative data. Because the phylogenetic autoregressive method involves no data reduction, the specific components are more useful than mean values derived from higher taxonomic nodes for testing ecological and evolutionary hypotheses about life history patterns.
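Moran's I, the autocorrelation statistic used above to test for phylogenetic signal at a given taxonomic level, can be computed as in the sketch below; the use of a same-subgenus indicator as the weight matrix is an illustrative assumption.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I autocorrelation statistic.

    x holds one trait value per species; w is a species-by-species weight
    matrix, e.g. w[i, j] = 1 when species i and j share a subgenus, else 0.
    Values near +1 indicate that related species have similar trait values.
    """
    x, w = np.asarray(x, float), np.asarray(w, float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

# Toy usage: two subgenera of three species each; the trait clusters within subgenus.
trait = [1.0, 1.2, 0.9, 3.1, 2.8, 3.0]
same_subgenus = np.kron(np.eye(2), np.ones((3, 3))) - np.eye(6)   # within-subgenus pairs, no self-pairs
print(morans_i(trait, same_subgenus))    # close to 1: strong phylogenetic correlation
```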

18.
MOTIVATION: We face an absence of optimized standards to guide normalization, comparative analysis, and interpretation of data sets. One aspect of this is that current methods of statistical analysis do not adequately utilize the information inherent in the large data sets generated in a microarray experiment and require a trade-off between detection sensitivity and specificity. RESULTS: We present a multistep procedure for the analysis of mRNA expression data obtained from cDNA array methods. To identify and classify differentially expressed genes, results from a standard paired t-test of normalized data are compared with those from a novel method, denoted associative analysis. This method associates experimental gene expressions, presented as residuals in a regression against control averaged expressions, to a common standard: the family of similarly computed residuals for low-variability genes derived from control experiments. By associating changes in expression of a given gene to a large family of equally expressed genes of the control group, this method utilizes the large data sets inherent in microarray experiments to increase both specificity and sensitivity. The overall procedure is illustrated by tabulating genes whose expression differs significantly between Snell dwarf mice (dw/dw) and their phenotypically normal littermates (dw/+, +/+). Of the 2,352 genes examined, only 450-500 were expressed above the background levels observed in non-expressed genes, and of these, 120 were established as differentially expressed in dwarf mice at a significance level that excludes the appearance of false positive determinations.
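The sketch below is one way to read the associative idea: compare each gene's regression residual against the residual family of low-variability reference genes. It is a hedged reconstruction from the abstract, not the authors' procedure, and the z-test against the reference residuals is a simplifying assumption.

```python
import numpy as np
from scipy import stats

def associative_pvalues(control, experiment, reference_idx):
    """Compare each gene's residual to the residual family of stable reference genes.

    Experimental expression is regressed on the control-averaged expression;
    each residual is then converted to a z-score against the residuals of the
    low-variability reference genes and to a two-sided normal p-value.
    """
    control, experiment = np.asarray(control, float), np.asarray(experiment, float)
    slope, intercept, *_ = stats.linregress(control, experiment)
    residuals = experiment - (intercept + slope * control)
    ref = residuals[reference_idx]                     # residual family of stable genes
    z = (residuals - ref.mean()) / ref.std(ddof=1)
    return 2 * stats.norm.sf(np.abs(z))

# Toy usage: gene 0 is up-regulated, the remaining genes track the control.
rng = np.random.default_rng(2)
control = rng.uniform(5, 12, 50)
experiment = control + rng.normal(0, 0.1, 50)
experiment[0] += 2.0
p = associative_pvalues(control, experiment, reference_idx=np.arange(10, 50))
print(p[0], p[1])      # gene 0 stands far outside the reference family
```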

19.
Researchers have several options when designing proteomics experiments. Primary among these are the choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with the Mascot and SEQUEST algorithms, with and without the requirement that all returned peptides be tryptic. Using a composite target-decoy database strategy, we selected scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, the low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.
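The target-decoy thresholding referred to above can be sketched as follows: the decoy hits above a score cutoff estimate the false target hits above it, and the lowest cutoff whose estimated error stays at or below 1% is kept. This is the generic strategy only; the actual filtering criteria in the study (charge state, tryptic termini, score differences) are richer than a single score, and the toy scores below are simulated.

```python
import numpy as np

def score_threshold_at_fdr(target_scores, decoy_scores, fdr=0.01):
    """Lowest score cutoff whose estimated false positive proportion is <= fdr.

    With a composite target-decoy database of equal size, the number of decoy
    matches above a cutoff estimates the number of false target matches above
    it, so the error at a cutoff is approximately decoys / targets.
    """
    t = np.sort(np.asarray(target_scores, float))
    d = np.asarray(decoy_scores, float)
    for cutoff in t:                    # ascending: the first passing cutoff is the most sensitive
        n_t, n_d = np.sum(t >= cutoff), np.sum(d >= cutoff)
        if n_t and n_d / n_t <= fdr:
            return cutoff
    return None

# Toy usage: 800 true-like and 200 false-like target scores against 1000 decoys.
rng = np.random.default_rng(3)
targets = np.concatenate([rng.normal(40, 8, 800), rng.normal(15, 5, 200)])
decoys = rng.normal(15, 5, 1000)
print(score_threshold_at_fdr(targets, decoys, fdr=0.01))
```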

20.
Solving complex photocycle kinetics: theory and direct method (total citations: 4; self-citations: 2; by others: 2)
A direct nonlinear least squares method is described that obtains the true kinetic rate constants and the temperature-independent spectra of n intermediates from spectroscopic data taken in the visible at three or more temperatures. A theoretical analysis, which is independent of the implementation of the direct method, proves that well-determined local solutions are not possible for fewer than three temperatures. This analysis also proves that measurements at more than n wavelengths are redundant, although the direct method indicates that convergence is faster if n + m wavelengths are measured, where m is of order one. This suggests that measurements should concentrate on high precision for a few measuring wavelengths, rather than lower precision for many wavelengths. Globally, false solutions occur, and the ability to reject them depends upon the precision of the data, as shown by explicit example. An optimized way to analyze vibrational spectroscopic data is also presented. Such data yield unique results, which are comparable in accuracy to those obtained from data taken in the visible with comparable noise. It is discussed how the use of both kinds of data is advantageous if the data taken in the visible are significantly less noisy.
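A much reduced illustration of fitting shared kinetic rate constants by nonlinear least squares is given below: two rate constants are fitted jointly across three measuring wavelengths of synthetic data. It is not the paper's direct method, which additionally couples measurements at three or more temperatures and recovers the intermediate spectra themselves; the biexponential model and all numerical values are assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 5.0, 200)      # time points of the synthetic traces
n_wl = 3                            # number of measuring wavelengths

def model(params):
    """Biexponential decay with rate constants shared across all wavelengths."""
    k1, k2 = params[:2]
    amps = params[2:].reshape(n_wl, 2)                  # per-wavelength amplitudes
    decays = np.vstack([np.exp(-k1 * t), np.exp(-k2 * t)])
    return amps @ decays                                # n_wl x len(t) traces

# Synthetic "measurements" generated with known rate constants 2.0 and 0.5.
true = np.concatenate([[2.0, 0.5], [1.0, 0.3, 0.2, 0.9, 0.6, 0.6]])
rng = np.random.default_rng(4)
data = model(true) + rng.normal(0, 0.01, (n_wl, t.size))

fit = least_squares(lambda p: (model(p) - data).ravel(),
                    x0=np.array([3.0, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))
print(np.sort(fit.x[:2]))           # recovered rate constants, close to [0.5, 2.0]
```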
