Similar Literature
 Found 20 similar documents (search time: 31 ms)
1.
Genetic association studies routinely involve massive numbers of statistical tests accompanied by P-values. Whole genome sequencing technologies have increased the potential number of tested variants to tens of millions. The more tests are performed, the smaller a P-value must be to be deemed significant. However, a small P-value is not equivalent to a small chance of a spurious finding, and significance thresholds may fail to serve as efficient filters against false results. While the Bayesian approach can provide a direct assessment of the probability that a finding is spurious, its adoption in association studies has been slow, due in part to the ubiquity of P-values and the automated way they are, as a rule, produced by software packages. Attempts to design simple ways to convert an association P-value into the probability that a finding is spurious have been met with difficulties. The False Positive Report Probability (FPRP) method has gained increasing popularity. However, FPRP is not designed to estimate the probability for a particular finding, because it is defined for an entire region of hypothetical findings with P-values at least as small as the one observed for that finding. Here we propose a method that lets researchers extract the probability that a finding is spurious directly from a P-value. Considering the counterpart of that probability, we term this method POFIG: the Probability that a Finding is Genuine. Our approach shares FPRP's simplicity, but gives a valid probability that a finding is spurious given a P-value. In addition to straightforward interpretation, POFIG has desirable statistical properties. The POFIG average across a set of tentative associations provides an estimated proportion of false discoveries in that set. POFIGs are easily combined across studies and are immune to multiple testing and selection bias. We illustrate an application of the POFIG method via an analysis of GWAS associations with Crohn's disease.
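The abstract does not reproduce the POFIG formula, but the FPRP it critiques has a well-known closed form (from Wacholder et al.), which a short sketch can make concrete. The parameter values below are illustrative and not taken from the study:

```python
def fprp(alpha, power, prior):
    """False Positive Report Probability.

    alpha : significance level (or observed P-value) of the test
    power : probability of detecting a true association (1 - beta)
    prior : prior probability that the association is genuine
    """
    false_pos = alpha * (1.0 - prior)   # mass of spurious "significant" results
    true_pos = power * prior            # mass of genuine detections
    return false_pos / (false_pos + true_pos)

# With a GWAS-scale prior of 1e-4, even p = 5e-8 leaves a nonzero
# chance the finding is spurious when power is modest.
print(fprp(alpha=5e-8, power=0.5, prior=1e-4))
```

This illustrates the abstract's point that a small P-value is not by itself a small probability of a spurious finding: the answer also depends on power and on the prior plausibility of the association.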

2.
Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. In view of this goal, we consider a covariate-adjusted threshold-based interventional estimand, which happens to equal the binary treatment-specific mean estimand from the causal inference literature obtained by dichotomizing the continuous biomarker or treatment as above or below a threshold. The unadjusted version of this estimand was considered in Donovan et al. Expanding upon Stitelman et al., we show that this estimand, under conditions, identifies the expected outcome of a stochastic intervention that sets the treatment dose of all participants above the threshold. We propose a novel nonparametric efficient estimator of the covariate-adjusted threshold-response function for the case of informative outcome missingness, which utilizes machine learning and targeted minimum-loss estimation (TMLE). We prove that the estimator is efficient and characterize its asymptotic distribution and robustness properties. Construction of simultaneous 95% confidence bands for the threshold-specific estimand across a set of thresholds is discussed. In the Supporting Information, we discuss how to adjust our estimator when the biomarker is missing at random, as occurs in clinical trials with biased sampling designs, using inverse probability weighting. Efficiency and bias reduction of the proposed estimator are assessed in simulations. The methods are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials.

3.
We propose a general working strategy to deal with incomplete reference libraries in the DNA barcoding identification of species. Considering that (1) queries with a large genetic distance from their best DNA barcode match are more likely to be misidentified and (2) imposing a distance threshold profitably reduces identification errors, we modelled relationships between identification performance and distance thresholds in four DNA barcode libraries of Diptera (n = 4270), Lepidoptera (n = 7577), Hymenoptera (n = 2067) and Tephritidae (n = 602 DNA barcodes). In all cases, more restrictive distance thresholds produced a gradual increase in the proportion of true negatives, a gradual decrease in false positives and more abrupt variations in the proportions of true positives and false negatives. More restrictive distance thresholds improved precision, yet negatively affected accuracy due to the higher proportions of queries discarded (viz. having a query-best match distance above the threshold). Using a simple linear regression we calculated an ad hoc distance threshold for the tephritid library producing an estimated relative identification error <0.05. In line with expectations, when we used this threshold for the identification of 188 independently collected tephritids, less than 5% of queries with a query-best match distance below the threshold were misidentified. Ad hoc thresholds can be calculated for each particular reference library of DNA barcodes and should be used as a cut-off mark defining whether we can proceed to identify the query with a known estimated error probability (e.g. 5%) or whether we should discard the query and consider alternative/complementary identification methods.
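A minimal sketch of the ad hoc threshold idea: regress the relative identification error on the distance threshold, then invert the fit for the target error level. The (threshold, error) pairs below are invented for illustration and are not the study's tephritid data:

```python
# Hypothetical (distance threshold, relative identification error) pairs
# of the kind derived from a reference library.
thresholds = [0.005, 0.010, 0.015, 0.020, 0.025, 0.030]
rel_error  = [0.010, 0.035, 0.060, 0.085, 0.110, 0.135]

# Ordinary least squares fit: error = a + b * threshold
n = len(thresholds)
xbar = sum(thresholds) / n
ybar = sum(rel_error) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(thresholds, rel_error)) \
    / sum((x - xbar) ** 2 for x in thresholds)
a = ybar - b * xbar

# Invert the fit for the threshold yielding a target relative error of 0.05
target = 0.05
ad_hoc_threshold = (target - a) / b
print(round(ad_hoc_threshold, 4))
```

Queries whose best-match distance falls below `ad_hoc_threshold` would then be identified with a known estimated error probability; queries above it would be discarded.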

4.
Amplified fragment length polymorphism (AFLP) has been widely used for clone identification, but numerous studies have shown that clonemates do not always present identical AFLP fingerprints. Pairwise AFLP distances that distinguish known clones from nonclones have been used to identify a threshold genetic dissimilarity distance below which samples are considered to represent a single clone. Most studies to date have reported threshold values between 2% and 4%. Here, I determine the consistency of the clonal threshold across five species in the tropical plant genus Piper, and evaluate the sensitivity of genetic diversity indices and estimates of the frequency of clonal reproduction to the threshold value selected. I sampled multiple ramets per individual from widely distributed plants for each of the five Piper species to set a threshold at the point where the error rate of clonal assignments was lowest. I then sampled all individuals of each shade‐tolerant species in a 1‐ha plot, and of each light‐demanding species in a 25 × 35‐m plot, to estimate the frequency of asexual recruitment in natural populations using a series of different thresholds including the threshold set with the preliminary sampling. Clonal threshold values for the different species ranged from 0% to 5% AFLP genetic dissimilarity distance. To determine the sensitivity of estimates of clonal reproduction, I calculated several clonal diversity indices for the natural populations of each of the five species, guided by the range in clonal threshold values observed across the five Piper species. I show that small changes in the value of the clonal threshold can lead to very different conclusions regarding the level of clonal reproduction in natural populations.
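The threshold-setting step (placing the cut-off where clonal misassignment is lowest) can be sketched as follows; the pairwise AFLP distances are invented, not drawn from the Piper data:

```python
# Pairwise AFLP dissimilarity distances for pairs of known status.
clone_dists    = [0.00, 0.01, 0.02, 0.02, 0.03]   # ramets of the same genet
nonclone_dists = [0.05, 0.06, 0.08, 0.10, 0.12]   # distinct genets

def assignment_errors(threshold):
    # A clonemate pair above the threshold, or a nonclone pair at/below it,
    # counts as a misassignment.
    fn = sum(d > threshold for d in clone_dists)
    fp = sum(d <= threshold for d in nonclone_dists)
    return fn + fp

# Scan the observed distances and keep the threshold with fewest errors.
candidates = sorted(set(clone_dists + nonclone_dists))
best = min(candidates, key=assignment_errors)
print(best, assignment_errors(best))
```

With overlapping clone and nonclone distance distributions (as the abstract reports for some species), no threshold achieves zero errors, and small shifts in `best` change which pairs are called clones.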

5.
In this project I investigate the use and possible misuse of p values in papers published in five (high-ranked) journals in experimental psychology. I use a data set of over 135,000 p values from more than five thousand papers. I inspect (1) the way in which the p values are reported and (2) their distribution. The main findings are as follows: first, it appears that some authors choose the mode of reporting their results in an arbitrary way. Moreover, they often end up doing so in a way that makes their findings seem more statistically significant than they really are (which is well known to improve the chances of publication). Specifically, they frequently report p values “just above” significance thresholds directly, whereas other values are reported by means of inequalities (e.g. “p<.1”); they round p values down more eagerly than up; and they appear to choose between significance thresholds, and between one- and two-sided tests, only after seeing the data. Further, about 9.2% of reported p values are inconsistent with their underlying statistics (e.g. F or t), and there appear to be “too many” “just significant” values. One interpretation of this is that researchers tend to choose the model, or include/discard observations, so as to bring the p value to the right side of the threshold.
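The consistency check described above (comparing a reported p value with the one implied by its test statistic) can be sketched without any statistics library; `p_from_t` below recomputes a two-sided p by numerically integrating the t density, and the rounding convention in `is_consistent` is an assumption about reporting precision:

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def p_from_t(t_value, df, steps=20000, upper=60.0):
    """Two-sided p-value implied by a reported t statistic
    (trapezoidal tail integration; ample precision for this check)."""
    t0 = abs(t_value)
    h = (upper - t0) / steps
    tail = sum(t_pdf(t0 + i * h, df) for i in range(1, steps)) * h
    tail += (t_pdf(t0, df) + t_pdf(upper, df)) * h / 2
    return 2.0 * tail

def is_consistent(reported_p, t_value, df, decimals=3):
    """Treat a report as consistent if the recomputed p matches the reported
    one at the precision it was reported to (an assumed convention)."""
    return round(p_from_t(t_value, df), decimals) == round(reported_p, decimals)

# t(28) = 2.05 implies p just under .05; reporting "p = .04" would be flagged.
print(is_consistent(0.050, 2.05, 28))
print(is_consistent(0.040, 2.05, 28))
```

Applied at scale, a check of this kind is how one arrives at figures like the 9.2% inconsistency rate reported above.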

6.

Introduction

Positive results have a greater chance of being published, and outcomes that are statistically significant have a greater chance of being fully reported. One consequence of such underreporting is that it may bias the sample of studies available for a meta-analysis. Smaller studies are often characterized by larger effects in published meta-analyses, which can plausibly be explained by publication bias. We investigated the association between the statistical significance of results and the probability of their inclusion in recent meta-analyses.

Methods

For meta-analyses of clinical trials, we defined the relative risk as the ratio of the probability of including statistically significant results favoring the treatment to the probability of including other results. For meta-analyses of other studies, we defined the relative risk as the ratio of the probability of including biologically plausible statistically significant results to the probability of including other results. We applied a Bayesian selection model for meta-analyses that included at least 30 studies and were published in four major general medical journals (BMJ, JAMA, Lancet, and PLOS Medicine) between 2008 and 2012.

Results

We identified 49 meta-analyses. The estimate of the relative risk was greater than one in 42 meta-analyses, greater than two in 16 meta-analyses, greater than three in eight meta-analyses, and greater than five in four meta-analyses. In 10 out of 28 meta-analyses of clinical trials, there was strong evidence that statistically significant results favoring the treatment were more likely to be included. In 4 out of 19 meta-analyses of observational studies, there was strong evidence that plausible statistically significant outcomes had a higher probability of being included.

Conclusions

Publication bias was present in a substantial proportion of large meta-analyses that were recently published in four major medical journals.

7.

Background

Beef carcass conformation and fat cover scores are measured by subjective grading performed by trained technicians. The discrete nature of these scores is taken into account in genetic evaluations using a threshold model, which assumes an underlying continuous distribution called liability that can be modelled by different methods.

Methods

Five threshold models were compared in this study. The first three were threshold linear models that included slaughterhouse and sex effects along with other systematic effects: one with homogeneous thresholds, and two extensions with heterogeneous thresholds varying either across slaughterhouses or across slaughterhouse-sex combinations. The fourth was a generalised linear model with reversed extreme value errors, for which the underlying variable follows a Weibull distribution and which can be expressed both as a log-linear model and as a grouped data model. The fifth model extended the grouped data models with score-dependent effects, allowing heterogeneous thresholds that vary across slaughterhouse and sex. Goodness-of-fit of these models was tested using the bootstrap methodology. Field data included 2,539 carcasses of the Bruna dels Pirineus beef cattle breed.

Results

Differences in carcass conformation and fat cover scores among slaughterhouses could not be fully captured by a systematic slaughterhouse effect, as fitted in the threshold linear model with homogeneous thresholds, so different thresholds per slaughterhouse were estimated using a slaughterhouse-specific threshold model. This model corrected most of the fitting deficiencies when the data were stratified by slaughterhouse, but it still failed to correctly fit frequencies stratified by sex, especially for fat cover, as 5 of the 8 observed percentages were not included within the bootstrap interval. This indicates that scoring varied with sex, and that a sex-by-slaughterhouse-specific threshold linear model should be used in order to guarantee the goodness-of-fit of the genetic evaluation model. The same was observed in grouped data models, which avoided fitting deficiencies when slaughterhouse and sex effects were score-dependent.

Conclusions

Both threshold linear models and grouped data models can guarantee the goodness-of-fit of the genetic evaluation for carcass conformation and fat cover, but our results highlight the need for specific thresholds by sex and slaughterhouse in order to avoid fitting deficiencies.

8.
The objectives of this study were to quantify the errors in economic values (EVs) for traits affected by cost or price thresholds when skewed or kurtotic distributions of varying degree are assumed to be normal, and when data with a normal distribution are subject to censoring. EVs were estimated for a continuous trait with dichotomous economic implications because of a price premium or penalty arising from a threshold ranging between −4 and 4 standard deviations from the mean. To evaluate the impacts of skewness and of positive and negative excess kurtosis, the standard skew normal, Pearson and raised cosine distributions were used, respectively. For the various evaluable levels of skewness and kurtosis, the results showed that EVs can be underestimated or overestimated by more than 100% when price-determining thresholds fall within a range from the mean that might be expected in practice. Estimates of EVs were also very sensitive to censoring or missing data. In contrast to practical genetic evaluation, economic evaluation is very sensitive to lack of normality and missing data. Although in some special situations the presence of multiple thresholds may attenuate the combined effect of errors at each threshold point, in practical situations there is a tendency for a few key thresholds to dominate the EV, and there are many situations where errors could be compounded across multiple thresholds. In the development of breeding objectives for non-normal continuous traits influenced by value thresholds, it is necessary to select a transformation that will resolve problems of non-normality, or to consider alternative methods that are less sensitive to non-normality.
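As a minimal illustration of why skewness distorts threshold-driven EVs: the marginal value of shifting the trait mean is proportional to the density at the price threshold, and that density changes markedly once skew is introduced. The premium, threshold and shape values below are invented, and for simplicity the skew-normal is not re-standardized to match the normal's mean and variance, as a full comparison would require:

```python
import math

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def skewnorm_pdf(x, shape):
    # Standard skew-normal density: 2 * phi(x) * Phi(shape * x)
    return 2 * norm_pdf(x) * norm_cdf(shape * x)

# EV proxy for a premium paid above a threshold: premium times the
# density at the threshold, under normal vs skew-normal assumptions.
premium, threshold = 100.0, 2.0
ev_normal = premium * norm_pdf(threshold)
ev_skewed = premium * skewnorm_pdf(threshold, shape=3.0)
print(round(ev_normal, 3), round(ev_skewed, 3), round(ev_skewed / ev_normal, 2))
```

Here assuming normality halves the EV relative to the skewed truth, consistent with the abstract's finding that errors can exceed 100% for thresholds at plausible distances from the mean.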

9.
A V Leonidov  A K Ezhov 《Biofizika》1991,36(4):703-707
An analysis of mathematical models implementing the discrete-threshold concept and the probabilistic (signal detection) concept of the sensory threshold shows that, when the discrete model is described using the apparatus of probability and Dirac delta functions, and the root-mean-square spread of the noise distribution in the continuous model is reduced to zero, the analytical expressions of the two models become identical. This demonstrates the inner unity of the threshold models examined and the universal nature of the threshold model of Swets, Tanner and Birdsall, built on the statistical theory of signal detectability. It provides a solution to one of the central problems of psychophysics: that of the threshold of the sensory systems.

10.
The diffusion model for a population subject to Malthusian growth is generalized to include regulation effects. This is done by incorporating a logarithmic term in the regulation function in such a way as to obtain, in the absence of noise, an S-shaped growth law retaining the qualitative features of the logistic growth curve. The growth phenomenon is modeled as a diffusion process whose transition p.d.f. is obtained in closed form. Its steady state behavior turns out to be described by the lognormal distribution. The expected values and the mode of the transition p.d.f. are calculated, and it is proved that their time course is also represented by monotonically increasing functions asymptotically approaching saturation values. The first passage time problem is then considered. The Laplace transform of the first passage time p.d.f. is obtained for arbitrary thresholds and is used to calculate the expected value of the first passage time. The inverse Laplace transform is then determined for a threshold equal to the saturation value attained by the population size in the absence of random components. The probability of absorption for an arbitrary barrier is finally calculated as the limit of the absorption probability in a two-barrier problem.
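A plausible concrete form of such a model is the Gompertz-type diffusion dX = X(a − b ln X)dt + σX dW (an assumption; the abstract does not give the equation, and the parameter values below are invented). On Y = ln X this is an Ornstein-Uhlenbeck process, so the stationary law of X is lognormal, which a short Euler-Maruyama simulation can check:

```python
import math
import random

# Simulate Y = ln X for dX = X (a - b*ln X) dt + sigma * X dW.
a, b, sigma = 1.0, 0.5, 0.2
dt, n_steps, burn_in = 0.01, 100_000, 20_000
rng = random.Random(42)

y = math.log(0.1)                         # start far below saturation
samples = []
for step in range(n_steps):
    drift = (a - sigma**2 / 2) - b * y    # Ito-corrected drift of Y = ln X
    y += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
    if step >= burn_in:
        samples.append(y)

pred_mean = (a - sigma**2 / 2) / b        # stationary mean of ln X
pred_var = sigma**2 / (2 * b)             # stationary variance of ln X
emp_mean = sum(samples) / len(samples)
print(round(pred_mean, 2), round(emp_mean, 2))
```

The long-run samples of ln X settle around the predicted normal law, i.e. X itself is asymptotically lognormal, matching the steady-state behavior described above.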

11.
12.
13.
The pure-tone thresholds of four domestic female chickens were determined from 2 Hz to 9 kHz using the method of conditioned suppression/avoidance. At a level of 60 dB sound pressure level (re 20 μN/m²), their hearing range extends from 9.1 Hz to 7.2 kHz, with a best sensitivity of 2.6 dB at 2 kHz. Chickens have better sensitivity than humans for frequencies below 64 Hz; indeed, their sensitivity to infrasound exceeds that of the homing pigeon. However, when threshold testing moved to the lower frequencies, the animals required additional training before their final thresholds were obtained, suggesting that they may perceive frequencies below 64 Hz differently than higher frequencies.

14.
Quantifying the impact of scientific research is almost always controversial, and there is a need for a uniform method that can be applied across all fields. Increasingly, however, the quantification has been summed up in the impact factor of the journal in which the work is published, which is known to show differences between fields. Here the h-index, a way to summarize an individual's highly cited work, was calculated for journals over a twenty-year time span and compared to the size of the journal in four fields: Agriculture, Condensed Matter Physics, Genetics and Heredity, and Mathematical Physics. There is a linear log-log relationship between the h-index and the size of the journal: the larger the journal, the more likely it is to have a high h-index. The four fields cannot be separated from each other, suggesting that this relationship applies to all fields. A strike rate index (SRI) based on the log relationship of the h-index and the size of the journal shows a similar distribution in the four fields, with similar thresholds for quality, allowing journals across diverse fields to be compared to each other. The SRI explains more than four times the variation in citation counts compared to the impact factor.
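The h-index underlying the journal comparison has a simple operational definition, which can be sketched directly (the citation counts below are invented):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # -> 4
print(h_index([25, 8, 5, 3, 3]))   # -> 3
```

Note how the second list has more total citations yet a lower h-index: the measure rewards a body of consistently cited papers rather than a single highly cited one, which is why journal size matters for it.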

15.
Publication bias leads consumers of research to observe a selected sample of the statistical estimates calculated by producers of research. We calculate critical values for statistical significance that could help to adjust after the fact for the distortions created by this selection effect, assuming that the only source of publication bias is file drawer bias. These adjusted critical values are easy to calculate and differ from unadjusted critical values by approximately 50%: rather than rejecting a null hypothesis when the t-ratio exceeds 2, the analysis suggests rejecting it when the t-ratio exceeds 3. Samples of published social science research indicate that on average, across research fields, approximately 30% of published t-statistics fall between the standard and adjusted cutoffs.
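Under the file-drawer-only assumption above, the adjusted critical value can be reproduced at least approximately with a normal reference distribution (an assumption; the paper works with t-ratios): keep only |t| > 1.96 "published" results and find the cutoff whose conditional tail probability is 5%:

```python
import math

def two_sided_tail(c):
    """P(|Z| > c) under a standard normal reference distribution."""
    return math.erfc(c / math.sqrt(2))

# File-drawer selection: under the null, only |t| > t_std results are
# published, so published t-ratios follow a truncated null. The adjusted
# critical value c solves P(|Z| > c) / P(|Z| > t_std) = alpha.
alpha, t_std = 0.05, 1.96
target = alpha * two_sided_tail(t_std)

lo, hi = t_std, 10.0
for _ in range(200):                 # bisection; the tail is monotone
    mid = (lo + hi) / 2
    if two_sided_tail(mid) > target:
        lo = mid                     # tail still too heavy: move right
    else:
        hi = mid
adjusted = (lo + hi) / 2
print(round(adjusted, 2))
```

The result, roughly 3.02, is consistent with the abstract's statement that the cutoff rises by approximately 50%, from about 2 to about 3.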

16.
Species distribution models are used for a range of ecological and evolutionary questions, but often are constructed from few and/or biased species occurrence records. Recent work has shown that the presence‐only model Maxent performs well with small sample sizes. While the apparent accuracy of such models with small samples has been studied, less emphasis has been placed on the effect of small or biased species records on the secondary modeling steps, specifically accuracy assessment and threshold selection, particularly with profile (presence‐only) modeling techniques. When testing the effects of small sample sizes on distribution models, accuracy assessment has generally been conducted with complete species occurrence data, rather than similarly limited (e.g. few or biased) test data. Likewise, selection of a probability threshold – the probability value that classifies model output into discrete areas of presence and absence – has also generally been conducted with complete data. In this study we subsampled distribution data for an endangered rodent across multiple years to assess the effects of different sample sizes and types of bias on threshold selection, and to examine the differences between apparent and actual accuracy of the models. Although some previously recommended threshold selection techniques showed little difference in threshold selection, the most commonly used methods performed poorly. Apparent model accuracy calculated from limited data was much higher than true model accuracy, but the true model accuracy was lower than it could have been with a more optimal threshold. That is, models with thresholds and accuracy calculated from biased and limited data had inflated reported accuracy, yet were less accurate than they could have been if better data on species distribution were available and an optimal threshold were used.
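One widely used threshold-selection rule, maximizing sensitivity plus specificity, can be sketched in a few lines; the predicted probabilities below are invented, and this rule is only one of the techniques such studies evaluate:

```python
# Model output (predicted probability of presence) at evaluation sites.
presence_scores = [0.9, 0.8, 0.75, 0.6, 0.55]   # known presences
absence_scores  = [0.7, 0.5, 0.4, 0.3, 0.1]     # known absences

def sens_plus_spec(threshold):
    sens = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    spec = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sens + spec

# Scan the observed scores and keep the threshold with the best trade-off.
candidates = sorted(set(presence_scores + absence_scores))
best = max(candidates, key=sens_plus_spec)
print(best, round(sens_plus_spec(best), 2))
```

With few or biased evaluation records, both the scores and the chosen threshold shift, which is the mechanism behind the inflated apparent accuracy described above.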

17.
Goldringer I  Bataillon T 《Genetics》2004,168(1):563-568
The effective population size (Ne) is frequently estimated using temporal changes in allele frequencies at neutral markers. Such temporal changes in allele frequencies are usually estimated from the standardized variance in allele frequencies (Fc). We simulate Wright-Fisher populations to generate expected distributions of Fc and of its multilocus average (Fc averaged over several loci). We explore the adjustment of these simulated Fc distributions to a chi-square distribution and evaluate the resulting precision of the estimation of Ne for various scenarios. Next, we outline a procedure to test for the homogeneity of individual Fc values across loci and to identify markers exhibiting extreme Fc values compared to the rest of the genome. Such loci are likely to lie in genomic areas undergoing selection, driving Fc to values greater (or smaller) than expected under drift alone. Our procedure assigns a P-value to each locus under the null hypothesis (drift is homogeneous throughout the genome) and simultaneously controls the rate of false positives among loci declared to depart significantly from the null. The procedure is illustrated using two published data sets: (i) an experimental wheat population subject to natural selection and (ii) a maize population undergoing recurrent selection.
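A moment-style sketch of the temporal method: after subtracting the sampling contributions from Fc, Ne is estimated from the remaining drift variance. This follows the Waples-type estimator (an assumption; the abstract does not spell out the formula), with invented inputs:

```python
def ne_from_fc(fc, t_gen, s0, st):
    """Temporal Ne estimate from the standardized allele-frequency variance Fc.

    The sampling terms 1/(2*S0) and 1/(2*St) are subtracted from Fc so that
    only the drift contribution remains; t_gen generations separate samples.
    """
    drift_fc = fc - 1.0 / (2 * s0) - 1.0 / (2 * st)
    return t_gen / (2.0 * drift_fc)

# 10 generations of drift, 50 individuals sampled at each time point.
print(ne_from_fc(fc=0.07, t_gen=10, s0=50, st=50))
```

A locus whose Fc greatly exceeds the value implied by this Ne (i.e. a deficit of drift-only variance) is exactly the kind of outlier the homogeneity test above is designed to flag.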

18.
Duplication of previously published text or figures in the scientific literature without adequate citation is plagiarism or, in the case of an author's own work, self-plagiarism. It breaches the ethical standards that are expected in science and threatens the integrity of scientific journals. Three examples of duplication are noted, one of which involves Palaeontology. Redundant publication lowers the quality of the scientific literature, damages the good standing of journals, and reduces the intellectual impact of a study. Multiple papers on a particular theme are acceptable only if each builds significantly upon previous work and contains only as much background information as is necessary to put the new data and observations into perspective.

19.
20.
The search for generality in ecology should include assessing the influence of studies done in one system on those done in other systems. Assuming generality is reflected in citation patterns, we analyzed frequencies of terrestrial, marine, and freshwater citations in papers categorized as terrestrial, marine and freshwater in high-impact “general” ecological journals. Citation frequencies were strikingly asymmetric. Aquatic researchers cited terrestrial papers ~ 10 times more often than the reverse, implying uneven cross-fertilization of information between aquatic and terrestrial ecologists. Comparisons between citation frequencies in the early 1980s and the early 2000s for two of the seven journals yielded similar results. Summing across all journals, 60% of all research papers (n = 5824) published in these journals in 2002–2006 were terrestrial vs. 9% freshwater and 8% marine. Since total numbers of terrestrial and aquatic ecologists are more similar than these proportions suggest, the representation of publications by habitat in “general” ecological journals appears disproportional and unrepresentative of the ecological science community at large. Such asymmetries are a concern because (1) aquatic and terrestrial systems can be tightly integrated, (2) pressure for across-system understanding to meet the challenge of climate change is increasing, (3) citation asymmetry implies barriers to among-system flow of understanding, thus (4) impeding scientific and societal progress. Changing this imbalance likely depends on a bottom-up approach originating from the ecological community, through pressure on societies, journals, editors and reviewers.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.), ICP licence 京ICP备09084417号