Similar Literature
A total of 20 similar records were found.
1.
Chen L, Storey JD. Genetics, 2006, 173(4): 2371-2381
Linkage analysis involves performing significance tests at many loci located throughout the genome. Traditional criteria for declaring a linkage statistically significant have been formulated with the goal of controlling the rate at which any single false positive occurs, called the genomewise error rate (GWER). As complex traits have become the focus of linkage analysis, it is increasingly common to expect that a number of loci are truly linked to the trait. This is especially true in mapping quantitative trait loci (QTL), where sometimes dozens of QTL may exist. Therefore, alternatives to the strict goal of preventing any single false positive have recently been explored, such as the false discovery rate (FDR) criterion. Here, we characterize some of the challenges that arise when defining relaxed significance criteria that allow for at least one false positive linkage to occur. In particular, we show that the FDR suffers from several problems when applied to linkage analysis of a single trait. We therefore conclude that the general applicability of FDR for declaring significant linkages in the analysis of a single trait is dubious. Instead, we propose a significance criterion that is more relaxed than the traditional GWER, but does not appear to suffer from the problems of the FDR. A generalized version of the GWER is proposed, called GWERk, that allows one to provide a more liberal balance between true positives and false positives at no additional cost in computation or assumptions.
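The GWERk idea can be made concrete with a permutation sketch: to tolerate at most k false positive linkages genome-wide, one thresholds at a quantile of the (k+1)-th largest null statistic rather than the maximum. The code below is a minimal illustration under assumed chi-square null statistics; the function name and the permutation layout are my own, not the authors' implementation.

import numpy as np

def gwer_k_threshold(null_stats, k=1, alpha=0.05):
    """Permutation-based threshold aiming to control GWERk, the probability
    of more than k false positive linkages across the genome.
    null_stats: array of shape (n_permutations, n_loci) holding test
    statistics computed on permuted (null) data."""
    # (k+1)-th largest statistic in each permuted genome scan
    order_stat = np.sort(null_stats, axis=1)[:, -(k + 1)]
    # exceeding this threshold yields more than k false positives in at
    # most a fraction alpha of null genome scans
    return np.quantile(order_stat, 1 - alpha)

# toy example: 1000 permutations of a 500-locus scan under the null
rng = np.random.default_rng(0)
print(gwer_k_threshold(rng.chisquare(df=1, size=(1000, 500)), k=1))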

2.

Background  

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons such as family-wise error rate (FWER) have been found to be too conservative in analyzing large-screening microarray data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has been recently suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates.
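For orientation, the Benjamini-Hochberg step-up procedure referenced throughout these entries can be written in a few lines. This is a generic sketch with my own function and variable names, not the estimation approach evaluated in the study above.

import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH procedure: returns a boolean mask of hypotheses rejected
    at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # find the largest i with p_(i) <= (i / m) * alpha
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.nonzero(below)[0].max()
        reject[order[: cutoff + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
print(benjamini_hochberg(pvals, alpha=0.05))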

3.
Microsatellite loci are widely used in population genetic studies, but the presence of null alleles may lead to biased results. Here, we assessed five methods that indirectly detect null alleles and found large inconsistencies among them. Our analysis was based on 20 microsatellite loci genotyped in a natural population of Microtus oeconomus sampled during 8 years, together with 1200 simulated populations without null alleles, but experiencing bottlenecks of varying duration and intensity, and 120 simulated populations with known null alleles. In the natural population, 29% of positive results were consistent between the methods in pairwise comparisons, and in the simulated data set, this proportion was 14%. The positive results were also inconsistent between different years in the natural population. In the null‐allele‐free simulated data set, the number of false positives increased with increased bottleneck intensity and duration. We also found a low concordance in null allele detection between the original simulated populations and their 20% random subsets. In the populations simulated to include null alleles, between 22% and 42% of true null alleles remained undetected, which highlighted that detection errors are not restricted to false positives. None of the evaluated methods clearly outperformed the others when both false‐positive and false‐negative rates were considered. Accepting only the positive results consistent between at least two methods should considerably reduce the false‐positive rate, but this approach may increase the false‐negative rate. Our study demonstrates the need for novel null allele detection methods that could be reliably applied to natural populations.

4.
Many exploratory microarray data analysis tools such as gene clustering and relevance networks rely on detecting pairwise gene co-expression. Traditional screening of pairwise co-expression either controls biological significance or statistical significance, but not both. The former approach does not provide stochastic error control, and the latter approach screens many co-expressions with excessively low correlation. We have designed and implemented a statistically sound two-stage co-expression detection algorithm that controls both statistical significance (false discovery rate, FDR) and biological significance (minimum acceptable strength, MAS) of the discovered co-expressions. Based on estimation of pairwise gene correlation, the algorithm provides an initial co-expression discovery that controls only FDR, which is then followed by a second-stage co-expression discovery that controls both FDR and MAS. It also computes and thresholds the set of FDR p-values for each correlation that satisfied the MAS criterion. Using simulated data, we validated asymptotic null distributions of the Pearson and Kendall correlation coefficients and the two-stage error-control procedure; we also compared our two-stage test procedure with another two-stage test procedure using the receiver operating characteristic (ROC) curve. We then used yeast galactose metabolism data to illustrate the advantage of our method for clustering genes and constructing a relevance network. The method has been implemented in an R package "GeneNT" that is freely available from the Comprehensive R Archive Network (CRAN): www.cran.r-project.org/.
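The two-stage logic (screen for non-zero correlation at a given FDR, then additionally require the correlation to exceed a minimum acceptable strength) can be sketched as follows. This is an illustration of the general idea using Fisher's z transform and a stock BH routine, not the GeneNT implementation; the function name and the equal treatment of the two stages are my assumptions.

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

def two_stage_coexpression(X, mas=0.5, fdr=0.05):
    """Stage 1 keeps gene pairs whose Pearson correlation is significantly
    non-zero (BH at `fdr`); stage 2 further requires evidence that |rho|
    exceeds the minimum acceptable strength `mas`.
    X: (n_samples, n_genes) expression matrix."""
    n, g = X.shape
    r = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices(g, k=1)
    rv = r[i, j]
    z0 = np.arctanh(rv) * np.sqrt(n - 3)                      # H0: rho = 0
    p0 = 2 * stats.norm.sf(np.abs(z0))
    z1 = (np.arctanh(np.abs(rv)) - np.arctanh(mas)) * np.sqrt(n - 3)
    p1 = stats.norm.sf(z1)                                    # H0: |rho| <= mas
    stage1 = multipletests(p0, alpha=fdr, method="fdr_bh")[0]
    stage2 = stage1 & multipletests(p1, alpha=fdr, method="fdr_bh")[0]
    return list(zip(i[stage2], j[stage2], rv[stage2]))

# toy usage: 50 samples x 20 genes of random expression (few or no hits expected)
rng = np.random.default_rng(1)
print(two_stage_coexpression(rng.normal(size=(50, 20))))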

5.
The Newman-Keuls (NK) procedure for testing all pairwise comparisons among a set of treatment means, introduced by Newman (1939) and in a slightly different form by Keuls (1952), was proposed as a reasonable way to alleviate the inflation of error rates when a large number of means are compared. It was proposed before the concepts of different types of multiple error rates were introduced by Tukey (1952a, b; 1953). Although it was popular in the 1950s and 1960s, once control of the familywise error rate (FWER) was accepted generally as an appropriate criterion in multiple testing, and it was realized that the NK procedure does not control the FWER at the nominal level at which it is performed, the procedure gradually fell out of favor. Recently, a more liberal criterion, control of the false discovery rate (FDR), has been proposed as more appropriate in some situations than FWER control. This paper notes that the NK procedure and a nonparametric extension control the FWER within any set of homogeneous treatments. It proves that the extended procedure controls the FDR when there are well-separated clusters of homogeneous means and between-cluster test statistics are independent, and extensive simulation provides strong evidence that the original procedure controls the FDR under the same conditions and some dependent conditions when the clusters are not well-separated. Thus, the test has two desirable error-controlling properties, providing a compromise between FDR control with no subgroup FWER control and global FWER control. Yekutieli (2002) developed an FDR-controlling procedure for testing all pairwise differences among means, without any FWER-controlling criteria when there is more than one cluster. The empirical example in Yekutieli's paper was used to compare the Benjamini-Hochberg (1995) method with apparent FDR control in this context, Yekutieli's proposed method with proven FDR control, the Newman-Keuls method that controls FWER within equal clusters with apparent FDR control, and several methods that control FWER globally. The Newman-Keuls procedure is shown to be intermediate in number of rejections between the FWER-controlling methods and the FDR-controlling methods in this example, although it is not always more conservative than the other FDR-controlling methods.
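As a reminder of how the NK procedure operates, the sketch below implements the step-down range logic for equal group sizes: ordered means are compared against studentized-range critical values whose argument shrinks with the span of the range, and no pair inside a range already declared homogeneous is tested. Function and argument names are my own, and this is not the nonparametric extension studied in the paper.

import numpy as np
from scipy.stats import studentized_range

def newman_keuls(means, n, mse, df_error, alpha=0.05):
    """Newman-Keuls stepwise range test for k group means with common
    replicate number n and ANOVA error mean square mse on df_error df.
    Returns pairs of group indices declared significantly different."""
    means = np.asarray(means, dtype=float)
    order = np.argsort(means)
    m = means[order]
    se = np.sqrt(mse / n)
    significant, homogeneous = set(), []

    def test(lo, hi):
        r = hi - lo + 1
        # skip single means and ranges inside an already-homogeneous range
        if r < 2 or any(a <= lo and hi <= b for a, b in homogeneous):
            return
        q = (m[hi] - m[lo]) / se
        if q > studentized_range.ppf(1 - alpha, r, df_error):
            significant.add((int(order[lo]), int(order[hi])))  # endpoints differ
            test(lo, hi - 1)
            test(lo + 1, hi)
        else:
            homogeneous.append((lo, hi))                       # protect sub-ranges

    test(0, len(m) - 1)
    return significant

# toy example: 5 treatment means, 10 replicates each, MSE from the ANOVA
print(newman_keuls([10.1, 10.4, 12.0, 12.2, 15.3], n=10, mse=1.5, df_error=45))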

6.
Associations between microsatellite markers and traits related to growth and fatness were investigated using a resource broiler population. A sire-line x dam-line F1 male was backcrossed to 12 dam-line females to produce 24 sires and 47 dams of the backcross 1 (BC1) generation. These 71 parents were genotyped with 76 microsatellite markers. Following full-sib mating among the parents, 234 BC1-F2 progeny were phenotyped for five growth traits (body weight at 49 days from hatch, wog weight, front half weight, breast weight and tender weight) and abdominal fat weight. Maximum likelihood analysis was used to estimate the marker effects and to evaluate their statistical significance. Individual marker-trait analysis revealed 44 significant associations out of the 456 marker-trait combinations. Correction for multiple comparisons by controlling the false discovery rate (FDR) resulted in 12 significant associations at FDR = 10% with markers on chromosomes 1, 2, 5 and 13. Seventy-five percent of the 44 significant associations displayed no dependence on either hatch or gender; half of the remaining associations displayed dependence of the quantitative trait loci (QTL) effect on the hatch x gender interaction. Thus, the analysed traits in this study may be dependent on external factors.

7.
In quantitative proteomics, the false discovery rate (FDR) can be defined as the number of false positives within statistically significant changes in expression. False positives accumulate during the simultaneous testing of expression changes across hundreds or thousands of protein or peptide species when univariate tests such as Student's t test are used. Currently, most researchers rely solely on the estimation of p values and a significance threshold, but this approach may result in false positives because it does not account for the multiple testing effect. For each species, a measure of significance in terms of the FDR can be calculated, producing individual q values. The q value maintains power by allowing the investigator to achieve an acceptable level of true or false positives within the calls of significance. The q value approach relies on the use of the correct statistical test for the experimental design. When the correct test is used, the p value frequency distribution should be uniform if there are no differences in expression between two samples. Here we report a bias in the p value distribution in the case of a three-dye DIGE experiment where no changes in expression are occurring. The bias was shown to arise from correlation in the data from the use of a common internal standard. With a two-dye schema, where each sample has its own internal standard, such bias was removed, enabling the application of the q value to two different proteomics studies. In the case of the first study, we demonstrate that 80% of calls of significance by the more traditional method are false positives. In the second, we show that calculating the q value gives the user control over the FDR. These studies demonstrate the power and ease of use of the q value in correcting for multiple testing. This work also highlights the need for robust experimental design that includes the appropriate application of statistical procedures.
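A minimal q value computation in the spirit of Storey's approach is sketched below, purely to show how q values follow from a well-behaved p value distribution; the pi0 estimator, the lambda tuning value and the function name are assumptions of mine, not the authors' pipeline.

import numpy as np

def qvalues(pvals, pi0=None, lam=0.5):
    """Convert p values to q values. pi0 (the null proportion) is estimated
    from the p values above `lam` unless supplied."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    if pi0 is None:
        pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    q = pi0 * m * p[order] / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]   # enforce monotonicity in p
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

# toy usage: 95 null p values plus 5 strong signals
rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=95), rng.uniform(0, 1e-3, size=5)])
print(np.sort(qvalues(p))[:10])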

8.
9.
From a partial genomic library enriched for GATA short tandem repeats, we developed 12 polymorphic microsatellite loci from the green‐backed tit (Parus monticolus). We characterized these loci by genotyping 30 adult individuals with unknown relationship. The number of alleles ranged from four to 17 per locus (mean = 9.3 alleles) and the observed heterozygosity for each locus ranged from 0.633 to 0.933 (mean = 0.789). All loci conformed to Hardy–Weinberg expectations. Four of 66 possible pairwise comparisons between loci showed significant gametic disequilibrium.
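For readers unfamiliar with how such locus summaries are derived, the rough sketch below computes observed and expected heterozygosity and a chi-square test of Hardy-Weinberg proportions for one multi-allelic locus. It is a generic illustration (exact tests are preferable when alleles are rare), not the analysis used in this study.

import numpy as np
from scipy.stats import chi2

def locus_summary(genotypes):
    """Observed heterozygosity, expected heterozygosity, and a chi-square
    goodness-of-fit p value for Hardy-Weinberg proportions at one locus.
    `genotypes` is a list of (allele1, allele2) pairs."""
    genotypes = [tuple(sorted(g)) for g in genotypes]
    n = len(genotypes)
    alleles, counts = np.unique([a for g in genotypes for a in g], return_counts=True)
    freqs = dict(zip(alleles, counts / (2 * n)))
    ho = np.mean([a != b for a, b in genotypes])          # observed heterozygosity
    he = 1 - sum(p ** 2 for p in freqs.values())          # expected heterozygosity
    chisq = 0.0
    for i, a in enumerate(alleles):                       # all genotype classes
        for b in alleles[i:]:
            exp = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
            obs = sum(g == tuple(sorted((a, b))) for g in genotypes)
            chisq += (obs - exp) ** 2 / exp
    k = len(alleles)
    pval = chi2.sf(chisq, k * (k - 1) // 2)
    return ho, he, pval

# toy usage: 30 individuals at one locus with alleles labelled 1..6
rng = np.random.default_rng(6)
geno = [tuple(rng.integers(1, 7, size=2)) for _ in range(30)]
print(locus_summary(geno))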

10.
Ten highly polymorphic microsatellite loci are described for Amazonian red‐handed howlers (Alouatta belzebul), an endemic Brazilian primate species subject to intense hunting pressure. The number of alleles observed in 30 individuals ranged from nine to 20, and observed heterozygosity varied from 0.35 to 0.93. No linkage associations were evident from pairwise comparisons of loci. These microsatellites offer a powerful tool for fine‐scale studies of genetic structure in both captive colonies and wild populations of red‐handed howlers.

11.
The use of multiple hypothesis testing procedures has recently been receiving a lot of attention from statisticians in DNA microarray analysis. The traditional FWER controlling procedures are not very useful in this situation since the experiments are exploratory by nature and researchers are more interested in controlling the rate of false positives rather than controlling the probability of making a single erroneous decision. This has led to increased use of FDR (False Discovery Rate) controlling procedures. Genovese and Wasserman proposed a single-step FDR procedure that is an asymptotic approximation to the original Benjamini and Hochberg stepwise procedure. In this paper, we modify the Genovese-Wasserman procedure to force the FDR control closer to the level alpha in the independence setting. Assuming that the data come from a mixture of two normals, we also propose to make this procedure adaptive by first estimating the parameters using the EM algorithm and then plugging these estimated parameters into the above modification of the Genovese-Wasserman procedure. We compare this procedure with the original Benjamini-Hochberg and the SAM thresholding procedures. The FDR control and other properties of this adaptive procedure are verified numerically.

12.
The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to single-marker analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of penalized regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by false discovery rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA, using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons, our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini–Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively, while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization by adapting penalties developed for covariates measured on graphs can improve power but also generate more false positives or wider regions for follow-up. We recommend the elastic net with a mixing weight for the Lasso penalty near 0.5 as the best method.
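The closing recommendation (an elastic net with the Lasso mixing weight near 0.5) can be illustrated on a small simulated SNP matrix. The sketch below only shows the penalized-regression fit itself; the FDR-based penalty parameter selection proposed in the paper is not reproduced, and the penalty value, data dimensions and effect sizes are arbitrary assumptions of mine.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(8)
n, p = 500, 2000                                        # individuals x SNPs
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)     # additive 0/1/2 genotype coding
beta = np.zeros(p)
beta[:10] = 0.5                                         # 10 causal SNPs
y = X @ beta + rng.normal(size=n)                       # continuous phenotype

# l1_ratio=0.5 corresponds to the recommended Lasso mixing weight near 0.5;
# alpha is a fixed example penalty, not an FDR-selected one
enet = ElasticNet(alpha=0.05, l1_ratio=0.5, max_iter=5000)
enet.fit(X, y)
selected = np.nonzero(enet.coef_)[0]
print(len(selected), "SNPs selected; first few:", selected[:10])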

13.
Thirty‐three microsatellite loci were isolated for the Australian rainforest tree Macadamia integrifolia. Genotyping across a test panel of 43 commercial cultivars generated an average polymorphic information content of 0.480. Five loci showed no polymorphism across cultivars. Significant linkage disequilibrium was detected in 10 pairwise comparisons, including two pairs of loci identified from the same clone sequence. The 33 microsatellite loci represent a significant tool for genome mapping and population genetic studies.

14.
MOTIVATION: False discovery rate (FDR) is defined as the expected percentage of false positives among all the claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods under the condition that the estimated FDR approximates the true FDR well, or at least, it does not improperly favor or disfavor any particular method. Permutation methods have become popular to estimate FDR in genomic studies. The purpose of this paper is twofold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a fairer criterion to evaluate various statistical methods. RESULTS: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, the SAM statistic and Student's t-statistic, are considered. The results show that the standard permutation method overestimates FDR. The overestimation is most severe for the sample mean statistic and least severe for the t-statistic, with the SAM statistic lying between the two extremes, suggesting that one has to be cautious when using the standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
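The standard permutation-based estimator that the paper examines can be written compactly: the average number of null (permutation) statistics exceeding a cutoff, divided by the number of observed statistics exceeding it. The sketch below is a generic version with my own names and simulated inputs, not the authors' modified estimator.

import numpy as np

def permutation_fdr(obs, perm, cutoff):
    """Standard permutation-based FDR estimate at a given cutoff.
    obs: observed statistics, one per gene.
    perm: array of shape (n_permutations, n_genes) of null statistics."""
    r_obs = np.sum(np.abs(obs) >= cutoff)
    if r_obs == 0:
        return 0.0
    r_null = np.mean(np.sum(np.abs(perm) >= cutoff, axis=1))
    return min(1.0, r_null / r_obs)

# toy example: 1000 genes, 50 of which carry a real shift
rng = np.random.default_rng(4)
obs = np.concatenate([rng.normal(3, 1, 50), rng.normal(0, 1, 950)])
perm = rng.normal(0, 1, size=(200, 1000))
print(permutation_fdr(obs, perm, cutoff=2.5))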

15.
Use of microsatellite loci to classify individuals by relatedness
This study investigates the use of microsatellite loci for estimating relatedness between individuals in wild, outbred, vertebrate populations. We measured allele frequencies at 20 unlinked, dinucleotide-repeat microsatellite loci in a population of wild mice (Mus musculus), and used these observed frequencies to generate the expected distributions of pairwise relatedness among full sib, half sib, and unrelated pairs of individuals, as would be estimated from the microsatellite data. In this population one should be able to discriminate between unrelated and full-sib dyads with at least 97% accuracy, and to discriminate half-sib pairs from unrelated pairs or from full-sib pairs with better than 80% accuracy. If one uses the criterion that parent-offspring pairs must share at least one allele per locus, then only 15% of full-sib pairs, 2% of half-sib pairs, and 0% of unrelated pairs in this population would qualify as potential parent-offspring pairs. We verified that the simulation results (which assume a random mating population in Hardy-Weinberg and linkage equilibrium) accurately predict results one would obtain from this population in real life by scoring laboratory-bred full- and half-sib families whose parents were wild-caught mice from the study population. We also investigated the effects of using different numbers of loci, or loci of different average heterozygosities (He), on misclassification frequencies. Both variables have strong effects on misclassification rate. For example, it requires almost twice as many loci of He = 0.62 to achieve the same accuracy as a given number of loci of He = 0.75. Finally, we tested the ability of UPGMA clustering to identify family groups in our population. Clustering of allele matching scores among the offspring of four sets of independent maternal half sibships (four females, each mated to two different males) perfectly recovered the true family relationships.
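Chance-sharing rates of the kind quoted above (how many pairs share at least one allele at every locus) are easy to reproduce by simulation. The toy sketch below assumes equally frequent alleles and independent loci rather than the observed mouse allele frequencies, so its numbers are illustrative only.

import numpy as np

rng = np.random.default_rng(7)

def share_at_every_locus(n_pairs=20000, n_loci=20, n_alleles=8, full_sibs=False):
    """Fraction of simulated pairs that share at least one allele at every
    locus, i.e. that would pass the parent-offspring screening criterion."""
    hits = 0
    for _ in range(n_pairs):
        if full_sibs:
            mum = rng.integers(n_alleles, size=(n_loci, 2))
            dad = rng.integers(n_alleles, size=(n_loci, 2))
            draw = lambda par: par[np.arange(n_loci), rng.integers(2, size=n_loci)]
            a = np.stack([draw(mum), draw(dad)], axis=1)
            b = np.stack([draw(mum), draw(dad)], axis=1)
        else:                                    # unrelated pair
            a = rng.integers(n_alleles, size=(n_loci, 2))
            b = rng.integers(n_alleles, size=(n_loci, 2))
        shared = [len(set(a[l]) & set(b[l])) > 0 for l in range(n_loci)]
        hits += all(shared)
    return hits / n_pairs

print("unrelated:", share_at_every_locus())
print("full sibs:", share_at_every_locus(full_sibs=True))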

16.
Parentage analysis in natural populations presents a valuable yet unique challenge because of large numbers of pairwise comparisons, marker set limitations and few sampled true parent-offspring pairs. These limitations can result in the incorrect assignment of false parent-offspring pairs that share alleles across multi-locus genotypes by chance alone. I first define a probability, Pr(δ), to estimate the expected number of false parent-offspring pairs within a data set. This probability can be used to determine whether one can accept all putative parent-offspring pairs with strict exclusion. I next define the probability Pr(φ|λ), which employs Bayes' theorem to determine the probability of a putative parent-offspring pair being false given the frequencies of shared alleles. This probability can be used to separate true parent-offspring pairs from false pairs that occur by chance when a data set lacks sufficient numbers of loci to accept all putative parent-offspring pairs. Finally, I propose a method to quantitatively determine how many loci to let mismatch for study-specific error rates and demonstrate that few data sets should need to allow more than two loci to mismatch. I test all theoretical predictions with simulated data and find that, first, Pr(δ) and Pr(φ|λ) have very low bias, and second, that power increases with lower sample sizes, uniform allele frequency distributions, and higher numbers of loci and alleles per locus. Comparisons of Pr(φ|λ) to strict exclusion and CERVUS demonstrate that this method may be most appropriate for large natural populations when supplemental data (e.g. genealogies, candidate parents) are absent.
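The role of a quantity like Pr(δ) can be illustrated by computing, from allele frequencies under Hardy-Weinberg, the probability that an unrelated pair shares at least one allele at every locus, and multiplying by the number of pairwise comparisons. The sketch below is an analogue with my own function names and toy allele frequencies, not the paper's exact definition.

import itertools
import numpy as np

def prob_share_at_locus(freqs):
    """Probability that two unrelated individuals (under HWE) share at
    least one allele at a locus with the given allele frequencies."""
    k = len(freqs)
    genotypes = [(i, j) for i in range(k) for j in range(i, k)]
    def gprob(g):
        i, j = g
        return freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
    return sum(gprob(g1) * gprob(g2)
               for g1, g2 in itertools.product(genotypes, repeat=2)
               if set(g1) & set(g2))

def expected_false_pairs(freq_list, n_comparisons):
    """Expected number of unrelated pairs that pass the 'share >= 1 allele
    at every locus' criterion by chance, assuming independent loci."""
    p_all = np.prod([prob_share_at_locus(f) for f in freq_list])
    return n_comparisons * p_all, p_all

# toy example: 10 loci, each with 8 equally frequent alleles,
# screened across 100,000 candidate parent-offspring comparisons
freqs = [np.full(8, 1 / 8)] * 10
print(expected_false_pairs(freqs, 100_000))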

17.
The statistical validation of database search results is a complex issue in bottom-up proteomics. The correct and incorrect peptide spectrum match (PSM) scores overlap significantly, making an accurate assessment of true peptide matches challenging. Since complete separation between the true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. By fitting a simple linear regression to the decoy hits for different charge states, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and Yeast) obtained from a wide variety of MS platforms. The net positive gain for correct spectral and peptide identifications was up to 14.81% and 6.2%, respectively. The approach is applicable to different search methodologies: separate as well as concatenated database searches, high mass accuracy, and semi-tryptic and modification searches. FlexiFDR was also applied to Mascot results and showed better performance than before. We have shown that an appropriate threshold learnt from decoys can be very effective in improving the database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms. It learns from the decoy hits and sets a flexible threshold that automatically aligns itself to the underlying variables of data quality and size.

18.
The target-decoy database search strategy is widely accepted as a standard method for estimating the false discovery rate (FDR) of peptide identification, based on which peptide-spectrum matches (PSMs) from the target database are filtered. To improve the sensitivity of protein identification given a fixed accuracy (frequently defined by a protein FDR threshold), a postprocessing procedure is often used that integrates results from different peptide search engines that had assayed the same data set. In this work, we show that PSMs grouped by precursor charge, the number of missed internal cleavage sites, modification state, and the number of protease termini, as well as proteins grouped by their unique peptide count, should be filtered separately according to the given FDR. We also develop an iterative procedure to filter the PSMs and proteins simultaneously, according to the given FDR. Finally, we present a general framework to integrate the results from different peptide search engines using the same FDR threshold. Our method was tested with several shotgun proteomics data sets that were acquired by multiple LC/MS instruments from two different biological samples. The results showed satisfactory performance. We implemented the method in a user-friendly software package called BuildSummary, which can be downloaded for free from http://www.proteomics.ac.cn/software/proteomicstools/index.htm as part of the software suite ProteomicsTools.
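The core target-decoy estimate (decoy matches above a score threshold divided by target matches above it) can be sketched as follows. This simplified version ignores the grouped, iterative filtering that the paper advocates; the function name and the toy score distributions are assumptions.

import numpy as np

def target_decoy_fdr(scores, is_decoy):
    """For each PSM, estimate the FDR at its own score threshold as
    #decoys >= threshold / #targets >= threshold, then enforce
    monotonicity to obtain crude q-values."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    order = np.argsort(-scores)                       # best score first
    decoy_cum = np.cumsum(is_decoy[order])
    target_cum = np.cumsum(~is_decoy[order])
    fdr = decoy_cum / np.maximum(target_cum, 1)
    qval = np.minimum.accumulate(fdr[::-1])[::-1]     # monotone in score
    out = np.empty(scores.size)
    out[order] = qval
    return out

# toy usage: target PSMs score higher on average than decoys
rng = np.random.default_rng(5)
scores = np.concatenate([rng.normal(8, 2, 400), rng.normal(5, 2, 400)])
decoy = np.concatenate([np.zeros(400, bool), np.ones(400, bool)])
q = target_decoy_fdr(scores, decoy)
print(np.sum((q <= 0.01) & ~decoy), "target PSMs pass 1% FDR")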

19.
MOTIVATION: Statistical tests for the detection of differentially expressed genes lead to a large collection of p-values, one for each gene comparison. Without any further adjustment, these p-values may lead to a large number of false positives, simply because the number of genes to be tested is huge, which might mean wastage of laboratory resources. To account for multiple hypotheses, these p-values are typically adjusted using a single-step method or a step-down method in order to achieve an overall control of the error rate (the so-called familywise error rate). In many applications, this may lead to an overly conservative strategy leading to too few genes being flagged. RESULTS: In this paper, we introduce a novel empirical Bayes screening (EBS) technique to inspect a large number of p-values in an effort to detect additional positive cases. In effect, each case borrows strength from an overall picture of the alternative hypotheses computed from all the p-values, while the entire procedure is calibrated by a step-down method so that the familywise error rate at the complete null hypothesis is still controlled. It is shown that the EBS has substantially higher sensitivity than the standard step-down approach for multiple comparison at the cost of a modest increase in the false discovery rate (FDR). The EBS procedure also compares favorably with existing FDR control procedures for multiple testing. The EBS procedure is particularly useful in situations where it is important to identify all possible potentially positive cases which can be subjected to further confirmatory testing in order to eliminate the false positives. We illustrated this screening procedure using a data set on human colorectal cancer where we show that the EBS method detected additional genes related to colon cancer that were missed by other methods. This novel empirical Bayes procedure is advantageous over our earlier proposed empirical Bayes adjustments due to the following reasons: (i) it offers an automatic screening of the p-values the user may obtain from a univariate (i.e., gene by gene) analysis package, making it extremely easy to use for a non-statistician, (ii) since it applies to the p-values, the tests do not have to be t-tests; in particular they could be F-tests which might arise in certain ANOVA formulations with expression data or even nonparametric tests, (iii) the empirical Bayes adjustment uses nonparametric function estimation techniques to estimate the marginal density of the transformed p-values rather than using a parametric model for the prior distribution and is therefore robust against model mis-specification. AVAILABILITY: R code for EBS is available from the authors upon request. SUPPLEMENTARY INFORMATION: http://www.stat.uga.edu/~datta/EBS/supp.htm

20.
Sabatti C, Service S, Freimer N. Genetics, 2003, 164(2): 829-833
We explore the implications of the false discovery rate (FDR) controlling procedure in disease gene mapping. With the aid of simulations, we show how, under models commonly used, the simple step-down procedure introduced by Benjamini and Hochberg controls the FDR for the dependent tests on which linkage and association genome screens are based. This adaptive multiple comparison procedure may offer an important tool for mapping susceptibility genes for complex diseases.
