Similar Documents
20 similar documents retrieved
1.
In quantitative proteomics, the false discovery rate (FDR) can be defined as the expected proportion of false positives among the expression changes called statistically significant. False positives accumulate during the simultaneous testing of expression changes across hundreds or thousands of protein or peptide species when univariate tests such as Student's t test are used. Currently, most researchers rely solely on the estimation of p values and a significance threshold, but this approach may result in false positives because it does not account for the multiple testing effect. For each species, a measure of significance in terms of the FDR can be calculated, producing individual q values. The q value maintains power by allowing the investigator to achieve an acceptable balance of true and false positives within the calls of significance. The q value approach relies on the use of the correct statistical test for the experimental design: when there are no differences in expression between two samples, the p value frequency distribution should be uniform. Here we report a bias in the p value distribution of a three-dye DIGE experiment in which no changes in expression are occurring. The bias was shown to arise from correlation introduced into the data by the use of a common internal standard. With a two-dye schema, where each sample has its own internal standard, this bias was removed, enabling the application of the q value to two different proteomics studies. In the first study, we demonstrate that 80% of the calls of significance made by the more traditional method are false positives. In the second, we show that calculating the q value gives the user control over the FDR. These studies demonstrate the power and ease of use of the q value in correcting for multiple testing. This work also highlights the need for robust experimental design that includes the appropriate application of statistical procedures.
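As a rough illustration of the q value idea summarized above (a generic sketch, not the authors' implementation; the p values below are hypothetical), Benjamini-Hochberg-style q values can be computed from a vector of per-protein p values as follows:

    import numpy as np

    def bh_q_values(p):
        """Convert p values into Benjamini-Hochberg adjusted p values ('q values')."""
        p = np.asarray(p, dtype=float)
        m = p.size
        order = np.argsort(p)                                 # ascending p values
        ranked = p[order] * m / np.arange(1, m + 1)           # raw step-up adjustment
        q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
        q = np.empty(m)
        q[order] = np.minimum(q_sorted, 1.0)
        return q

    # Hypothetical per-protein p values; call changes significant at q < 0.05
    p_values = np.array([0.0002, 0.004, 0.03, 0.2, 0.5, 0.8])
    q_values = bh_q_values(p_values)
    significant = q_values < 0.05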

2.
The common scenario in computational biology in which a community of researchers conduct multiple statistical tests on one shared database gives rise to the multiple hypothesis testing problem. Conventional procedures for solving this problem control the probability of false discovery by sacrificing some of the power of the tests. We suggest a scheme for controlling false discovery without any power loss by adding new samples for each use of the database and charging the user with the expenses. The crux of the scheme is a carefully crafted pricing system that fairly prices different user requests based on their demands while keeping the probability of false discovery bounded. We demonstrate this idea in the context of HIV treatment research, where multiple researchers conduct tests on a repository of HIV samples.

3.
BACKGROUND: Quantitative proteomics is an emerging field that encompasses multiplexed measurement of many known proteins in groups of experimental samples in order to identify differences between groups. Antibody arrays are a novel technology that is increasingly being used for quantitative proteomics studies due to their highly multiplexed content, scalability, matrix flexibility and economy of sample consumption. Key applications of antibody arrays in quantitative proteomics studies are identification of novel diagnostic assays, biomarker discovery in trials of new drugs, and validation of qualitative proteomics discoveries. These applications require performance benchmarking, standardization and specification. RESULTS: Six dual-antibody, sandwich immunoassay arrays that measure 170 serum or plasma proteins were developed and experimental procedures refined in more than thirty quantitative proteomics studies. This report provides detailed information and specifications for manufacture, qualification, assay automation, performance, assay validation and data processing for antibody arrays in large scale quantitative proteomics studies. CONCLUSION: The present report describes development of first-generation standards for antibody arrays in quantitative proteomics. Specifically, it describes the requirements of a comprehensive validation program to identify and minimize antibody cross-reaction under highly multiplexed conditions; provides the rationale for the application of standardized statistical approaches to manage the data output of highly replicated assays; defines design requirements for controls to normalize sample replicate measurements; emphasizes the importance of stringent quality control testing of reagents and antibody microarrays; recommends the use of real-time monitors to evaluate sensitivity, dynamic range and platform precision; and presents survey procedures to reveal the significance of biomarker findings.

4.
D. Brusick, Mutation Research, 1988, 205(1-4): 69-78
Shortly following the inception of genetic toxicology as a distinct discipline within toxicology, questions arose regarding the type and number of tests needed to classify a chemical as a mutagenic hazard or as a potential carcinogen. To some degree the discipline separated into two sub-specialties, (1) genetic risk assessment and (2) cancer prediction, since data from experimental oncology also support the existence of a genotoxic step in tumor initiation. The issue of which and how many tests has continued to be debated, but is now focused more tightly on two independent approaches. Tier (sequential) testing was initially proposed as a logical and cost-effective method, but was discarded on the grounds that the lower-tier tests appeared to give too many false responses to reliably force or exclude further testing of the test agent. Matrix (battery) testing was proposed for screening on the hypothesis that combinations of endpoints and multiple phylogenetic target organisms were needed to achieve satisfactory predictability. As the results from short-term test 'validation' studies for carcinogen prediction and evaluations of EPA's Gene-Tox data accumulated, it became obvious that qualitative differences remained between predictive and definitive tests, and that assembling different combinations of short-term assays did not resolve the lack of concordance. Recent trends in genetic toxicology testing have focused on mathematical models for test selection and standardized systems for multi-test data assessment.

5.
Rodent tumorigenicity experiments are conducted to determine the safety of substances for human exposure. The carcinogenicity of a substance is generally determined by statistical tests that compare the effects of treatment on the rate of tumor development at several body sites. The statistical analysis of such studies often includes hypothesis testing of the dose effect at each of the sites. However, the multiplicity of the significance tests may cause an excess overall false positive rate. In consideration of this problem, recent interest has focused on developing methods to test simultaneously for the treatment effect at multiple sites. In this paper, we propose a test that is based on the count of tumor-bearing sites. The test is appropriate regardless of tumor lethality or of treatment-related differences in the underlying mortality. Simulations are given which compare the performance of the proposed test to several other tests including a Bonferroni adjustment of site-specific tests, and the test is illustrated using the data from the large ED01 experiment.

6.

Background  

Genomics and proteomics analyses regularly involve the simultaneous testing of hundreds of hypotheses, on either numerical or categorical data. To correct for the occurrence of false positives, validation tests based on multiple testing correction, such as the Bonferroni and Benjamini-Hochberg procedures, and on re-sampling, such as permutation tests, are frequently used. Despite the known power of permutation-based tests, most available tools offer such tests only for the t test or ANOVA. Less attention has been given to tests for categorical data, such as the chi-square test. This project takes a first step by developing an open-source software tool, Ptest, that addresses the need for public software incorporating these and other statistical tests with options for multiple testing correction.

7.
Precise protein quantification is essential in comparative proteomics. Currently, quantification bias is inevitable in proteotypic peptide-based quantitative proteomics strategies because of differences in peptide measurability. To improve quantification accuracy, we proposed an "empirical rule for linearly correlated peptide selection" (ERLPS) for quantitative proteomics in our previous work. However, a systematic evaluation of the general applicability of ERLPS under diverse experimental conditions remained to be conducted. In this study, the practical workflow of ERLPS is explicitly illustrated; different experimental variables, such as MS systems, sample complexities, sample preparations, elution gradients, matrix effects, and loading amounts, were comprehensively investigated to evaluate the applicability, reproducibility, and transferability of ERLPS. The results demonstrated that ERLPS was highly reproducible and transferable within appropriate loading amounts, and that linearly correlated response peptides should be selected for each specific experiment. ERLPS was applied to proteome samples ranging from yeast to mouse and human, and to quantitative methods ranging from label-free to 18O/16O labeling and SILAC analysis, and it enabled accurate measurements for all proteotypic peptide-based quantitative proteomics over a large dynamic range.

8.
Quantitative proteomics has become one of the focal points of omics research. Continuous innovation in experimental techniques and computational methods has greatly accelerated its development. Commonly used quantitative proteomics strategies can be divided into two broad classes, label-free and label-based, depending on whether stable isotope labeling is required. Each class has given rise to numerous quantification methods and tools, which on the one hand have driven quantitative proteomics forward, and on the other hand are themselves continually updated as experimental strategies and techniques evolve. A systematic summary of these quantification strategies and methods is therefore valuable for quantitative proteomics research. From a methodological perspective, this review comprehensively surveys the strategies and algorithms currently used in quantitative proteomics, describes the algorithmic workflows of label-free and label-based quantification in detail and compares their characteristics, summarizes absolute quantification algorithms aimed at estimating absolute protein abundance, lists commonly used quantification software and tools, outlines quality control methods for quantification results, and finally discusses the outlook for the development of quantitative proteomics methods.

9.
In this article, we consider the probabilistic identification of amino acid positions that evolve under positive selection as a multiple hypothesis testing problem. The null hypothesis "H0,s: site s evolves under negative selection or under a neutral process of evolution" is tested at each codon site of the alignment of homologous coding sequences. Standard hypothesis testing is based on the control of the expected proportion of falsely rejected null hypotheses, or type-I error rate. As the number of tests increases, however, the power of an individual test may become unacceptably low. Recent advances in statistics have shown that the false discovery rate (in this case, the expected proportion of sites that do not evolve under positive selection among those that are estimated to evolve under this selection regime) is a quantity that can be controlled. Keeping the proportion of false positives low among the significant results generally leads to an increase in power. In this article, we show that controlling the false discovery rate is relevant when searching for positively selected sites. We also compare this new approach to traditional methods using extensive simulations.

10.
Quantitation is an inherent requirement in comparative proteomics and there is no exception to this for plant proteomics. Quantitative proteomics has high demands on the experimental workflow, requiring a thorough design and often a complex multi-step structure. It has to include sufficient numbers of biological and technical replicates and methods that are able to facilitate a quantitative signal read-out. Quantitative plant proteomics in particular poses many additional challenges but because of the nature of plants it also offers some potential advantages. In general, analysis of plants has been less prominent in proteomics. Low protein concentration, difficulties in protein extraction, genome multiploidy, high Rubisco abundance in green tissue, and an absence of well-annotated and completed genome sequences are some of the main challenges in plant proteomics. However, the latter is now changing with several genomes emerging for model plants and crops such as potato, tomato, soybean, rice, maize and barley. This review discusses the current status in quantitative plant proteomics (MS-based and non-MS-based) and its challenges and potentials. Both relative and absolute quantitation methods in plant proteomics from DIGE to MS-based analysis after isotope labeling and label-free quantitation are described and illustrated by published studies. In particular, we describe plant-specific quantitative methods such as metabolic labeling methods that can take full advantage of plant metabolism and culture practices, and discuss other potential advantages and challenges that may arise from the unique properties of plants.

11.
Western blot data are widely used in quantitative applications such as statistical testing and mathematical modelling. To ensure accurate quantitation and comparability between experiments, Western blot replicates must be normalised, but it is unclear how the available methods affect statistical properties of the data. Here we evaluate three commonly used normalisation strategies: (i) by fixed normalisation point or control; (ii) by sum of all data points in a replicate; and (iii) by optimal alignment of the replicates. We consider how these different strategies affect the coefficient of variation (CV) and the results of hypothesis testing with the normalised data. Normalisation by fixed point tends to increase the mean CV of normalised data in a manner that naturally depends on the choice of the normalisation point. Thus, in the context of hypothesis testing, normalisation by fixed point reduces false positives and increases false negatives. Analysis of published experimental data shows that choosing normalisation points with low quantified intensities results in a high normalised data CV and should thus be avoided. Normalisation by sum or by optimal alignment redistributes the raw data uncertainty in a mean-dependent manner, reducing the CV of high intensity points and increasing the CV of low intensity points. This causes the effect of normalisations by sum or optimal alignment on hypothesis testing to depend on the mean of the data tested; for high intensity points, false positives are increased and false negatives are decreased, while for low intensity points, false positives are decreased and false negatives are increased. These results will aid users of Western blotting to choose a suitable normalisation strategy and also understand the implications of this normalisation for subsequent hypothesis testing.
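To make the comparison of strategies concrete, the following Python sketch (using made-up replicate intensities, not the study's data) implements two of the normalisation schemes discussed above, by fixed point and by replicate sum, together with the per-point coefficient of variation used to compare them:

    import numpy as np

    # Hypothetical quantified intensities: rows = replicate blots, columns = bands
    raw = np.array([[1200.0, 340.0, 55.0],
                    [1500.0, 400.0, 70.0],
                    [ 900.0, 310.0, 40.0]])

    def normalise_by_fixed_point(data, index=0):
        """Divide each replicate by a single chosen band (e.g. a loading control)."""
        return data / data[:, [index]]

    def normalise_by_sum(data):
        """Divide each replicate by the sum of all its quantified points."""
        return data / data.sum(axis=1, keepdims=True)

    def coefficient_of_variation(data):
        """Per-band CV across replicates, in percent."""
        return 100.0 * data.std(axis=0, ddof=1) / data.mean(axis=0)

    print(coefficient_of_variation(normalise_by_fixed_point(raw, index=0)))
    print(coefficient_of_variation(normalise_by_sum(raw)))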

12.
Multilocus DNA fingerprinting methods have been used extensively to address genetic issues in wildlife populations. Hypotheses concerning population subdivision and differing levels of diversity can be addressed through the use of the similarity index (S), a band-sharing coefficient, and many researchers construct hypothesis tests with S based on the work of Lynch. It is shown in the present study, through mathematical analysis and through simulations, that estimates of the variance of a mean S based on Lynch's work are downwardly biased. An unbiased alternative is presented and mathematically justified. It is shown further, however, that even when the bias in Lynch's estimator is corrected, the estimator is highly imprecise compared with estimates based on an alternative approach such as 'parametric bootstrapping' of allele frequencies. Also discussed are permutation tests and their construction given the interdependence of S values computed from pairs that share individuals. A simulation illustrates how some published misuses of these tests can lead to incorrect conclusions in hypothesis testing.
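For reference, the band-sharing similarity index S discussed above is simply twice the number of shared bands divided by the total number of bands scored in the two individuals; a minimal Python sketch on hypothetical presence/absence fingerprints (not the simulation framework of the study) is:

    import numpy as np
    from itertools import combinations

    def band_sharing(a, b):
        """Similarity index S = 2 * (shared bands) / (bands in a + bands in b)."""
        a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
        return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

    # Hypothetical 0/1 fingerprints: rows = individuals, columns = band positions
    profiles = np.array([[1, 0, 1, 1, 0, 1],
                         [1, 1, 1, 0, 0, 1],
                         [0, 1, 0, 1, 1, 0]])

    pairwise_S = [band_sharing(profiles[i], profiles[j])
                  for i, j in combinations(range(len(profiles)), 2)]
    mean_S = np.mean(pairwise_S)   # note: pairs sharing an individual are not independent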

13.
Zhang SD, PLoS ONE, 2011, 6(4): e18874
BACKGROUND: Biomedical researchers are now often faced with situations where it is necessary to test a large number of hypotheses simultaneously, e.g., in comparative gene expression studies using high-throughput microarray technology. To properly control false positive errors, the false discovery rate (FDR) approach has become widely used in multiple testing. Accurate estimation of the FDR requires that the proportion of true null hypotheses be accurately estimated. To date many methods for estimating this quantity have been proposed. Typically, when a new method is introduced, some simulations are carried out to show its improved accuracy; however, these simulations often cover only a few points in the parameter space. RESULTS: Here I have carried out extensive in silico experiments to compare some commonly used methods for estimating the proportion of true null hypotheses. The coverage of these simulations over the parameter space is unprecedentedly thorough compared with typical simulation studies in the literature. This work therefore enables global conclusions to be drawn about the performance of these different methods. It was found that a very simple method gives the most accurate estimation over a dominant portion of the parameter space. Given its simplicity and its overall superior accuracy, I recommend its use as the first choice for estimating the proportion of true null hypotheses in multiple testing.
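The quantity at stake, the proportion of true null hypotheses (often written pi0), can be illustrated with a simple lambda-threshold estimator of the Storey type; the sketch below uses simulated p values and is not necessarily the specific method recommended in the study:

    import numpy as np

    def estimate_pi0(p, lam=0.5):
        """Storey-type estimator: p values above lam are (approximately) uniform
        under the null, so their relative excess estimates pi0."""
        p = np.asarray(p, dtype=float)
        return min(1.0, np.mean(p > lam) / (1.0 - lam))

    rng = np.random.default_rng(0)
    null_p = rng.uniform(size=800)               # 800 true null hypotheses
    alt_p = rng.beta(0.3, 4.0, size=200)         # 200 alternatives, p values near 0
    print(estimate_pi0(np.concatenate([null_p, alt_p])))   # should be close to 0.8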

14.
MS/MS combined with database search methods can identify the proteins present in complex mixtures. High throughput methods that infer probable peptide sequences from enzymatically digested protein samples create a challenge in how best to aggregate the evidence for candidate proteins. Typically the results of multiple technical and/or biological replicate experiments must be combined to maximize sensitivity. We present a statistical method for estimating probabilities of protein expression that integrates peptide sequence identifications from multiple search algorithms and replicate experimental runs. The method was applied to create a repository of 797 non-homologous zebrafish (Danio rerio) proteins, at an empirically validated false identification rate under 1%, as a resource for the development of targeted quantitative proteomics assays. We have implemented this statistical method as an analytic module that can be integrated with an existing suite of open-source proteomics software.

15.
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit the complex dependence structure between hypotheses that vary across spectral, temporal and spatial dimensions. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute the FDR as the posterior probability that an observed statistic belongs to the null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach to PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR, and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both the simulation and the experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising the power of PLV synchrony inference. Applying locFDR to the experimental data detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
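A minimal empirical-Bayes local FDR computation in the spirit of the two-group model described above (a generic sketch on simulated z statistics, not the PLV pipeline of the paper; the kernel estimate of the mixture density and the theoretical N(0,1) null are simplifying assumptions):

    import numpy as np
    from scipy import stats

    def local_fdr(z, pi0=1.0):
        """locFDR(z) ~ pi0 * f0(z) / f(z): posterior probability of the null,
        with the mixture density f estimated by a Gaussian kernel and the
        theoretical null f0 = N(0, 1). pi0 = 1 is a conservative choice."""
        z = np.asarray(z, dtype=float)
        f_hat = stats.gaussian_kde(z)          # estimate of the mixture density f
        f0 = stats.norm.pdf(z)                 # theoretical null density
        return np.minimum(1.0, pi0 * f0 / f_hat(z))

    rng = np.random.default_rng(1)
    z = np.concatenate([rng.normal(0.0, 1.0, 900),    # null statistics
                        rng.normal(3.0, 1.0, 100)])   # non-null statistics
    fdr = local_fdr(z)
    discoveries = np.flatnonzero(fdr < 0.2)           # low local FDR => significant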

16.
Structure-based virtual screening followed by selection of a top fraction of the rank-ordered result list suffers from many false positives and false negatives because the general scoring functions are not accurate enough. Many approaches have emerged to address this problem by including knowledge about the specific target in the scoring and selection steps. This target bias can include requirements for critical interactions, use of pharmacophore patterns or interaction patterns found in known co-crystal structures, and similarity to known ligands. Such biases are implemented in methods that vary from filtering tools for pre- or post-processing, to expert systems, quantitative (re)scoring functions, and docking tools that generate target-biased poses.

17.
Many recently developed nonparametric jump tests can be viewed as multiple hypothesis testing problems. For such multiple hypothesis tests, it is well known that controlling the type I error of each individual test often yields a large proportion of erroneous rejections, and the situation becomes even worse when jump occurrence is a rare event. To obtain more reliable results, we aim to control the false discovery rate (FDR), an efficient compound error measure for erroneous rejections in multiple testing problems. We perform the test via the Barndorff-Nielsen and Shephard (BNS) test statistic, and control the FDR with the Benjamini and Hochberg (BH) procedure. We provide asymptotic results for the FDR control. Using simulations, we examine the relevant theoretical results and demonstrate the advantages of controlling the FDR. The hybrid approach is then applied to an empirical analysis of high-frequency data on two benchmark stock indices.
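For orientation, one common form of the BNS jump statistic (the ratio-adjusted version often attributed to Huang and Tauchen) compares realized variance with bipower variation within a trading day; the Python sketch below, on simulated intraday returns, reflects my reading of that form and is not the exact statistic or data of the study. The resulting per-day p values would then be fed into the BH step-up procedure to control the FDR across days.

    import numpy as np
    from scipy.special import gamma
    from scipy.stats import norm

    def bns_ratio_statistic(returns):
        """Ratio-adjusted BNS jump statistic for one day of intraday returns.
        Under 'no jump' it is asymptotically standard normal; large positive
        values indicate a jump."""
        r = np.asarray(returns, dtype=float)
        n = r.size
        mu1 = np.sqrt(2.0 / np.pi)                                  # E|Z| for Z ~ N(0,1)
        mu43 = 2.0 ** (2.0 / 3.0) * gamma(7.0 / 6.0) / gamma(0.5)   # E|Z|^(4/3)
        rv = np.sum(r ** 2)                                         # realized variance
        bv = mu1 ** -2 * np.sum(np.abs(r[1:]) * np.abs(r[:-1]))     # bipower variation
        tq = n * mu43 ** -3 * np.sum(
            (np.abs(r[2:]) * np.abs(r[1:-1]) * np.abs(r[:-2])) ** (4.0 / 3.0))
        theta = (np.pi / 2.0) ** 2 + np.pi - 5.0
        z = (1.0 - bv / rv) / np.sqrt(theta / n * max(1.0, tq / bv ** 2))
        return z, norm.sf(z)                        # statistic and one-sided p value

    rng = np.random.default_rng(2)
    returns = rng.normal(0.0, 0.001, 390)           # one hypothetical trading day
    z_stat, p_value = bns_ratio_statistic(returns)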

18.
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.

19.

Background  

In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplication of the individual experiments, thereby correcting for experimental noise. The main difficulty consists in designing the pools in a manner that is both efficient and robust: few pools should be necessary to correct the errors and identify the positives, yet the experiment should not be too vulnerable to biological shakiness. For example, some information should still be obtained even if there are slightly more positives or errors than expected. This is known as the group testing problem, or pooling problem.
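To show the basic idea of a pooling design (without the error-correcting redundancy that is the real subject of the paper), here is a minimal Python sketch of a binary design that identifies a single positive from about log2(N) + 1 pools, assuming error-free tests; the plate size and item index are hypothetical:

    import numpy as np

    def binary_pools(n_items):
        """Pool b contains every item whose b-th bit is 1, plus one pool of all items
        so that 'no positive' can be told apart from 'item 0 is positive'."""
        n_bits = max(1, int(np.ceil(np.log2(n_items))))
        design = [[i for i in range(n_items) if (i >> b) & 1] for b in range(n_bits)]
        design.append(list(range(n_items)))
        return design

    def decode(pool_results):
        """Recover the single positive item from pool outcomes (error-free case)."""
        if not pool_results[-1]:
            return None                             # no positive present
        return sum(1 << b for b, hit in enumerate(pool_results[:-1]) if hit)

    design = binary_pools(96)                       # e.g. one 96-well plate
    true_positive = 11
    pool_results = [true_positive in pool for pool in design]
    assert decode(pool_results) == true_positive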

20.
Ryman N, Jorde PE, Molecular Ecology, 2001, 10(10): 2361-2373
A variety of statistical procedures are commonly employed when testing for genetic differentiation. In a typical situation two or more samples of individuals have been genotyped at several gene loci by molecular or biochemical means, and in a first step a statistical test for allele frequency homogeneity is performed at each locus separately, using, e.g., the contingency chi-square test, Fisher's exact test, or some modification thereof. In a second step the results from the separate tests are combined for evaluation of the joint null hypothesis that there is no allele frequency difference at any locus, corresponding to the important case where the samples would be regarded as drawn from the same statistical and, hence, biological population. Presently, there are two conceptually different strategies in use for testing the joint null hypothesis of no difference at any locus. One approach is based on the summation of chi-square statistics over loci. Another method is employed by investigators applying the Bonferroni technique (adjusting the P-value required for rejection to account for the elevated alpha errors when performing multiple tests simultaneously) to test whether the heterogeneity observed at any particular locus can be regarded as significant when considered separately. Under this approach the joint null hypothesis is rejected if one or more of the component single-locus tests is considered significant under the Bonferroni criterion. We used computer simulations to evaluate the statistical power and realized alpha errors of these strategies when evaluating the joint hypothesis after scoring multiple loci. We find that the 'extended' Bonferroni approach is generally associated with low statistical power and should not be applied in the current setting. Further, and contrary to what might be expected, we find that 'exact' tests typically behave poorly when combined in existing procedures for joint hypothesis testing. Thus, while exact tests are generally to be preferred over approximate ones when testing each particular locus, approximate tests such as the traditional chi-square seem preferable when addressing the joint hypothesis.
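As a concrete illustration of the two strategies compared above (a generic sketch with hypothetical allele-count tables at three loci, not the study's simulation code):

    import numpy as np
    from scipy.stats import chi2, chi2_contingency

    # Hypothetical allele counts at three loci: rows = samples, columns = alleles
    loci = [np.array([[30, 20], [25, 25]]),
            np.array([[10, 15, 25], [12, 20, 18]]),
            np.array([[40, 10], [33, 17]])]

    tests = [chi2_contingency(table, correction=False) for table in loci]
    chi2_stats = [t[0] for t in tests]
    p_locus = [t[1] for t in tests]
    dfs = [t[2] for t in tests]

    # Strategy 1: sum chi-square statistics (and degrees of freedom) over loci
    joint_p = chi2.sf(sum(chi2_stats), sum(dfs))

    # Strategy 2: 'extended' Bonferroni -- reject the joint null hypothesis if any
    # single-locus p value falls below alpha divided by the number of loci
    alpha = 0.05
    reject_joint_bonferroni = min(p_locus) < alpha / len(loci)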
