
1.

Modelling publication bias and p-hacking

Jonas Moss Riccardo De Bin 《Biometrics》2023,79(1):319-331

Publication bias and p-hacking are two well-known phenomena that strongly affect the scientific literature and cause severe problems in meta-analyses. Due to these phenomena, the assumptions of meta-analyses are seriously violated and the results of the studies cannot be trusted. While publication bias is very often captured well by the weighting function selection model, p-hacking is much harder to model and no definitive solution has been found yet. In this paper, we advocate the selection model approach to model publication bias and propose a mixture model for p-hacking. We derive some properties for these models, and we compare them formally and through simulations. Finally, two real data examples are used to show how the models work in practice.

2.

Accurate Computation of Survival Statistics in Genome-Wide Studies

Fabio Vandin Alexandra Papoutsaki Benjamin J. Raphael Eli Upfal 《PLoS computational biology》2015,11(5)

A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because the two populations determined by a genetic variant may have very different sizes, and because the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.

3.

False discovery rate control for multiple testing based on discrete p-values

Xiongzhi Chen 《Biometrical journal. Biometrische Zeitschrift》2020,62(4):1060-1079

For multiple testing based on discrete p-values, we propose a false discovery rate (FDR) procedure “BH+” with proven conservativeness. BH+ is at least as powerful as the BH (i.e., Benjamini-Hochberg) procedure when they are applied to superuniform p-values. Further, when applied to mid-p-values, BH+ can be more powerful than when it is applied to conventional p-values. An easily verifiable necessary and sufficient condition for this is provided. BH+ is perhaps the first conservative FDR procedure applicable to mid-p-values and to p-values with general distributions. It is applied to multiple testing based on discrete p-values in a methylation study, an HIV study and a clinical safety study, where it makes considerably more discoveries than the BH procedure. In addition, we propose an adaptive version of the BH+ procedure, prove its conservativeness under certain conditions, and provide evidence on its excellent performance via simulation studies.
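
For orientation, the baseline that BH+ is compared against can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up rule only, not the BH+ procedure of the paper; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up rule (illustrative sketch, not BH+).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis whose rank is at most largest_k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the rule looks for the largest qualifying rank, a p-value that fails its own threshold can still be rejected when a larger p-value passes.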

4.

Periodontal Disease and Risk of Head and Neck Cancer: A Meta-Analysis of Observational Studies

Xian-Tao Zeng Ai-Ping Deng Cheng Li Ling-Yun Xia Yu-Ming Niu Wei-Dong Leng 《PloS one》2013,8(10)

Background

Many epidemiological studies have found a positive association of periodontal disease (PD) with risk of head and neck cancer (HNC), but the findings are varied or even contradictory. In this work, we performed a meta-analysis to ascertain the relationship between PD and HNC risk.

Methods

We searched the PubMed, Embase, and Cochrane Library databases for relevant observational studies on the association between PD and HNC risk published up to March 23, 2013. Data from the included studies were extracted and analyzed independently by two authors. Meta-analysis was performed using RevMan 5.2 software.

Results

We obtained seven observational studies involving two cohort and six case-control studies. Random-effects meta-analysis indicated a significant association between PD and HNC risk (odds ratio = 2.63, 95% confidence interval = 1.68 - 4.14; p < 0.001), with sensitivity analysis showing that the result was robust. Subgroup analyses based on adjustment for covariates, study design, PD assessment, tumor site, and ethnicity also revealed a significant association.

Conclusions

Based on current evidence, PD is probably a significant and independent risk factor for HNC.

5.

Trials and tribulations of statistical significance in biochemistry and omics

《Trends in biochemical sciences》2023,48(6):503-512

Over recent years many statisticians and researchers have highlighted that statistical inference would benefit from a better use and understanding of hypothesis testing, p-values, and statistical significance. We highlight three recommendations in the context of biochemical sciences. First recommendation: to improve the biological interpretation of biochemical data, do not use p-values (or similar test statistics) as thresholded values to select biomolecules. Second recommendation: to improve comparison among studies and to achieve robust knowledge, perform complete reporting of data. Third recommendation: statistical analyses should be reported completely with exact numbers (not as asterisks or inequalities). Owing to the high number of variables, a better use of statistics is of special importance in omic studies.

6.

Beyond the E-Value: Stratified Statistics for Protein Domain Prediction

Alejandro Ochoa John D. Storey Manuel Llinás Mona Singh 《PLoS computational biology》2015,11(11)

E-values have been the dominant statistic for protein sequence analysis for the past two decades: from identifying statistically significant local sequence alignments to evaluating matches to hidden Markov models describing protein domain families. Here we formally show that for “stratified” multiple hypothesis testing problems (that is, those in which statistical tests can be partitioned naturally), controlling the local False Discovery Rate (lFDR) per stratum, or partition, yields the most predictions across the data at any given threshold on the FDR or E-value over all strata combined. For the important problem of protein domain prediction, a key step in characterizing protein structure, function and evolution, we show that stratifying statistical tests by domain family yields excellent results. We develop the first FDR-estimating algorithms for domain prediction, and evaluate how well thresholds based on q-values, E-values and lFDRs perform in domain prediction using five complementary approaches for estimating empirical FDRs in this context. We show that stratified q-value thresholds substantially outperform E-values. Contradicting our theoretical results, q-values also outperform lFDRs; however, our tests reveal a small but coherent subset of domain families, biased towards models for specific repetitive patterns, for which weaknesses in random sequence models yield notably inaccurate statistical significance measures. Usage of lFDR thresholds outperforms q-values for the remaining families, which have as-expected noise, suggesting that further improvements in domain predictions can be achieved with improved modeling of random sequences. Overall, our theoretical and empirical findings suggest that the use of stratified q-values and lFDRs could result in improvements in a host of structured multiple hypothesis testing problems arising in bioinformatics, including genome-wide association studies, orthology prediction, and motif scanning.

7.

A comparative review of estimates of the proportion unchanged genes and the false discovery rate

Per Broberg 《BMC bioinformatics》2005,6(1):199

Background

In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion of unchanged genes and the false discovery rate (FDR) and how to make inferences based on these concepts. Six published methods for estimating the proportion of unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value are illustrated with examples. Five published estimates of the FDR and one new estimate are presented and tested. Implementations in R code are available.

8.

A Brief Review: The Z-curve Theory and its Application in Genome Analysis

Ren Zhang Chun-Ting Zhang 《Current Genomics》2014,15(2):78-94

In theoretical physics, there exist two basic mathematical approaches, algebraic and geometrical methods, which, in most cases, are complementary. In the area of genome sequence analysis, however, algebraic approaches have been widely used, while geometrical approaches have been less explored for a long time. The Z-curve theory is a geometrical approach to genome analysis. The Z-curve is a three-dimensional curve that represents a given DNA sequence in the sense that each can be uniquely reconstructed given the other. The Z-curve, therefore, contains all the information that the corresponding DNA sequence carries. The analysis of a DNA sequence can then be performed through studying the corresponding Z-curve. The Z-curve method has found applications in a wide range of areas in the past two decades, including the identification of protein-coding genes, replication origins, horizontally-transferred genomic islands, promoters, translational start sites and isochores, as well as studies on phylogenetics, genome visualization and comparative genomics. Here, we review the progress of Z-curve studies from aspects of both theory and applications in genome analysis.
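
The one-to-one correspondence between a DNA sequence and its Z-curve can be illustrated with a short sketch. It assumes the commonly stated component definitions (x contrasts purines with pyrimidines, y contrasts amino with keto bases, z contrasts weak with strong hydrogen bonding); the abstract does not spell these out, so treat the step table and the names `z_curve` and `reconstruct` as assumptions made for illustration.

```python
# Assumed per-base steps: x = (A+G)-(C+T), y = (A+C)-(G+T), z = (A+T)-(C+G).
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}
# Each base has a distinct step, so the mapping can be inverted.
BASE = {step: base for base, step in STEP.items()}


def z_curve(seq):
    """Return the cumulative (x, y, z) points of the Z-curve of seq."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def reconstruct(points):
    """Recover the DNA sequence from consecutive Z-curve points."""
    prev = (0, 0, 0)
    bases = []
    for point in points:
        step = tuple(c - p for c, p in zip(point, prev))
        bases.append(BASE[step])
        prev = point
    return "".join(bases)


if __name__ == "__main__":
    curve = z_curve("ACGTTGCA")  # hypothetical example sequence
    print(curve)
    print(reconstruct(curve))
```

Under these assumptions the round trip returns the original sequence, which is the sense in which the curve carries all the information in the sequence.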

9.

Does Publication Bias Inflate the Apparent Efficacy of Psychological Treatment for Major Depressive Disorder? A Systematic Review and Meta-Analysis of US National Institutes of Health-Funded Trials

Ellen Driessen Steven D. Hollon Claudi L. H. Bockting Pim Cuijpers Erick H. Turner 《PloS one》2015,10(9)

Background

The efficacy of antidepressant medication has been shown empirically to be overestimated due to publication bias, but this has only been inferred statistically with regard to psychological treatment for depression. We assessed directly the extent of study publication bias in trials examining the efficacy of psychological treatment for depression.

Methods and Findings

We identified US National Institutes of Health grants awarded to fund randomized clinical trials comparing psychological treatment to control conditions or other treatments in patients diagnosed with major depressive disorder for the period 1972–2008, and we determined whether those grants led to publications. For studies that were not published, data were requested from investigators and included in the meta-analyses. Thirteen (23.6%) of the 55 funded grants that began trials did not result in publications, and two others never started. Among comparisons to control conditions, adding unpublished studies (Hedges’ g = 0.20; 95% CI -0.11 to 0.51; k = 6) to published studies (g = 0.52; 0.37 to 0.68; k = 20) reduced the psychotherapy effect size point estimate (g = 0.39; 0.08 to 0.70) by 25%. Moreover, these findings may overestimate the "true" effect of psychological treatment for depression as outcome reporting bias could not be examined quantitatively.

Conclusion

The efficacy of psychological interventions for depression has been overestimated in the published literature, just as it has been for pharmacotherapy. Both are efficacious but not to the extent that the published literature would suggest. Funding agencies and journals should archive both original protocols and raw data from treatment trials to allow the detection and correction of outcome reporting bias. Clinicians, guideline developers, and decision makers should be aware that the published literature overestimates the effects of the predominant treatments for depression.

10.

Structural analysis of 1l-chiro-inositol diester from Taraxacum udum

Klaudia Michalska Jolanta Marciniuk 《Carbohydrate research》2010,345(1):172-174

1l-1,5-Di-O-p-hydroxyphenylacetyl-chiro-inositol was isolated from the leaves of Taraxacum udum, along with seven other secondary metabolites. Identification of the inositol derivative, based on extensive spectroscopic analyses (¹H, ¹³C and 2D NMR) in two solvents, allowed the correction of previously published data and conformational studies. This is the second report on the presence of inositol esters with p-hydroxyphenylacetic acid in plants.

11.

On Fisher's Method of Combining p-Values

R. C. Elston 《Biometrical journal. Biometrische Zeitschrift》1991,33(3):339-345

The problem of combining p-values from independent experiments is discussed. It is shown that Fisher's solution to the problem can be derived from a “weight-free” method that has been suggested for the purpose of ranking vector observations (Biometrics 19: 85–97, 1963). The method implies that the value p = 0.37 is a critical one: p-values below 0.37 suggest that the null hypothesis is more likely to be false, whereas p-values above 0.37 suggest that it is more likely to be true.
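
As a point of reference, Fisher's combination rule itself can be sketched in a few lines. The sketch uses the textbook statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null, together with the closed-form chi-square tail for even degrees of freedom. The function name `fisher_combine` and the example inputs are hypothetical, and the code illustrates the classical method rather than the derivation given in the paper.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (sketch).

    Uses the exact tail of a chi-square with 2k degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j  # (x/2)^j / j!, built incrementally
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(fisher_combine([0.05]))
    print(fisher_combine([0.5, 0.5]))
```

One possible reading of the 0.37 threshold, offered here as an assumption rather than as the paper's argument: each term -2 ln p has null expectation 2, and it exceeds 2 exactly when p < 1/e ≈ 0.368.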

12.

A Note on Plots of P-Values to Evaluate Many Tests Simultaneously

Dr. James A. Koziol 《Biometrical journal. Biometrische Zeitschrift》1989,31(8):969-972

Schweder and Spjøtvoll (1982) proposed an informal graphical procedure for simultaneous evaluation of possibly related tests, based on a plot of cumulative p-values using the observed significance probabilities. We formalize this notion by application of Holm's (1979) sequentially rejective Bonferroni procedure: this maintains an overall experimentwise significance level, and yields an immediate estimate of the number of true hypotheses.
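
Holm's sequentially rejective Bonferroni procedure, on which the note builds, can be sketched as below. This shows only the standard step-down rule; it does not reproduce the graphical procedure or the estimate of the number of true hypotheses described in the note. The function name `holm` and the example p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        # The (step + 1)-th smallest p-value is compared with alpha / (m - step).
        if p_values[idx] > alpha / (m - step):
            break  # stop at the first non-rejection; all later ones are retained
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    print(holm([0.01, 0.04, 0.03, 0.005]))
```

The procedure stops at the first p-value that misses its threshold, so every hypothesis ranked after that point is retained regardless of its own p-value.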

13.

A review of prostate cancer incidence and mortality studies of farmers and non-farmers, 2002–2013

《Cancer epidemiology》2014,38(6):654-662

Objectives

To review the recent literature on the incidence and mortality of prostate cancer in farmers compared to non-farmers.

Methods

Searches were conducted in seven electronic databases for observational studies published from 2002 to 2013. Studies were assessed against eligibility criteria and a narrative summary of findings presented.

Results

Eighteen primary research articles were included in the review. Four of ten mortality studies and two of nine incidence studies reported statistically significant increases in prostate cancer risk in farmers. However, nearly half of all studies reported non-significant reductions in farmers’ risk. Additionally, one study reported significantly increased and decreased risk using different outcome measures. Results varied considerably by geographic region, study design and degree of control for confounders, affecting comparability and strength of findings.

Conclusions

The overall evidence for increased prostate cancer risk in farmers was weak.

14.

Exponential growth, random transitions and progress through the G₁ phase: computer simulation of experimental data

R. Sennerstam J.-O. Strömberg 《Cell proliferation》1996,29(11):609-622

Abstract. At a time of increasing knowledge of gene and molecular regulation of cell cycle progression, a re-evaluation is presented concerning a phenomenon discussed before the present expanding era of cell cycle research. 'Random transition' and exponential slopes of α- and β-curves were conceived in the 1970s and early 1980s to explain cell cycle progression. An exponential behaviour of the β-curve was claimed as being necessary and sufficient for a 'random transition' in the cell cycle. In our present work, similar slopes of those curves were shown to materialize when the increase in mass of single cells was set as exponential in a structured cell cycle model where DNA replication and increase in cell mass were postulated to be two loosely coupled subcycles of the cell cycle, without introducing any 'random transition'. Findings published in the 1980s demonstrating the effect of serum depletion of 3T3 Balb-c cells were simulated and the shallower slope of the α- and β-curves found experimentally could be attributed to the reduced rate of exponential growth in cell mass, rather than to a reduced 'transition probability'.

15.

A weighted FDR procedure under discrete and heterogeneous null distributions

Xiongzhi Chen R. W. Doerge Sanat K. Sarkar 《Biometrical journal. Biometrische Zeitschrift》2020,62(6):1544-1563

Multiple testing (MT) with false discovery rate (FDR) control has been widely conducted in the “discrete paradigm” where p-values have discrete and heterogeneous null distributions. However, in this scenario existing FDR procedures often lose some power and may yield unreliable inference, and for this scenario there does not seem to be an FDR procedure that partitions hypotheses into groups, employs data-adaptive weights and is nonasymptotically conservative. We propose a weighted p-value-based FDR procedure, “weighted FDR (wFDR) procedure” for short, for MT in the discrete paradigm that efficiently adapts to both heterogeneity and discreteness of p-value distributions. We theoretically justify the nonasymptotic conservativeness of the wFDR procedure under independence, and show via simulation studies that, for MT based on p-values of the binomial test or Fisher's exact test, it is more powerful than six other procedures. The wFDR procedure is applied to two examples based on discrete data, a drug safety study and a differential methylation study, where it makes more discoveries than two existing methods.

16.

Use of meta-analysis to combine candidate gene association studies: application to study the relationship between the ESR PvuII polymorphism and sow litter size

Leopoldo Alfonso 《Genetics Selection Evolution》2005,37(5):417-435

This article investigates the application of meta-analysis on livestock candidate gene effects. The PvuII polymorphism of the ESR gene is used as an example. The association among ESR PvuII alleles with the number of piglets born alive and total born in the first (NBA1, TNB1) and later parities (NBA, TNB) is reviewed by conducting a meta-analysis of 15 published studies including 9329 sows. Under a fixed effects model, litter size values were significantly lower in the "AA" genotype groups when compared with "AB" and "BB" homozygotes. Under the random effects model, the results were similar although differences between "AA" and "AB" genotype groups were not clearly significant for NBA and TNB. Nevertheless, the most noticeable result was the high and significant heterogeneity estimated among studies. This heterogeneity could be assigned to sampling error, genotype by environment interaction, linkage or epistasis, as referred to in the literature, but also to the hypothesis of population admixture/stratification. It is concluded that meta-analysis can be considered as a helpful analytical tool to synthesise and discuss livestock candidate gene effects. The main difficulty found was the insufficient information on the standard errors of the estimated genotype effects in several publications. Consequently, publishing the standard errors or the concrete P-values instead of the test significance level should be recommended to guarantee the quality of candidate gene effect meta-analyses.

17.

Exercise prescription for patients with a Fontan circulation: current evidence and future directions

T. Takken H. J. Hulzebos A. C. Blank M. H. P. Tacken P. J. M. Helders J. L. M. Strengers 《Netherlands heart journal》2007,15(4):142-147

It is well documented that children with a Fontan circulation have a reduced exercise capacity. One of the modalities to improve exercise capacity might be exercise training. We performed a systematic literature review on the effects of exercise training in patients with a Fontan circulation. Six published studies were included that reported on the effects of exercise training in 40 patients. All studies had a small sample size and/or did not include a control group. Based on the six published studies we can conclude that children who have undergone a Fontan operation and who are in a stable haemodynamic condition can safely participate in an exercise training programme and that exercise training results in an improved exercise capacity. However, more research is needed to establish the optimal exercise mode, dose-response relation, and the effects of exercise training on cardiac function, peripheral muscle function, physical activity, and health-related quality of life. (Neth Heart J 2007; 15:142-7.)

18.

rPCMP: robust p-value combination by multiple partitions with applications to ATAC-seq data

Menglan Cai Limin Li 《BMC systems biology》2018,12(9):141

Background

Evaluating the significance of a group of genes or proteins in a pathway or biological process for a disease could help researchers understand the mechanism of the disease. For example, identifying related pathways or gene functions for chromatin states of tumor-specific T cells will help determine whether T cells could reprogram or not, and further help design the cancer treatment strategy. Some existing p-value combination methods can be used in this scenario. However, these methods suffer from different disadvantages, and thus it is still challenging to design a more powerful and robust statistical method.

Results

The existing method of Group combined p-value (GCP) first partitions p-values into several groups using a set of several truncation points, but the method is often sensitive to these truncation points. Another method, the adaptive rank truncated product method (ARTP), makes use of multiple truncation integers to adaptively combine the smallest p-values, but the method loses statistical power since it ignores the larger p-values. To tackle these problems, we propose a robust p-value combination method (rPCMP) by considering multiple partitions of p-values with different sets of truncation points. The proposed rPCMP statistic has a three-layer hierarchical structure. The inner layer considers a statistic which combines p-values in a specified interval defined by two threshold points, the intermediate layer uses a GCP statistic which optimizes the statistic from the inner layer for a partition set of threshold points, and the outer layer integrates the GCP statistic from multiple partitions of p-values. The empirical distribution of the statistic under the null distribution could be estimated by a permutation procedure.

Conclusions

Our proposed rPCMP method has been shown to be more robust and to have higher statistical power. A simulation study shows that our method can effectively control the type I error rate and has higher statistical power than the existing methods. We finally apply our rPCMP method to an ATAC-seq dataset for discovering the related gene functions with chromatin states in mouse tumor T cells.

19.

Comparative efficacy and safety of pharmacological interventions for the treatment of COVID-19: A systematic review and network meta-analysis

Min Seo Kim Min Ho An Won Jun Kim Tae-Ho Hwang 《PLoS medicine》2020,17(12)

Background

Numerous clinical trials and observational studies have investigated various pharmacological agents as potential treatment for Coronavirus Disease 2019 (COVID-19), but the results are heterogeneous and sometimes even contradictory to one another, making it difficult for clinicians to determine which treatments are truly effective.

Methods and findings

We carried out a systematic review and network meta-analysis (NMA) to systematically evaluate the comparative efficacy and safety of pharmacological interventions and the level of evidence behind each treatment regimen in different clinical settings. Both published and unpublished randomized controlled trials (RCTs) and confounding-adjusted observational studies which met our predefined eligibility criteria were collected. We included studies investigating the effect of pharmacological management of patients hospitalized for COVID-19 management. Mild patients who do not require hospitalization or have self-limiting disease courses were not eligible for our NMA. A total of 110 studies (40 RCTs and 70 observational studies) were included. PubMed, Google Scholar, MEDLINE, the Cochrane Library, medRxiv, SSRN, WHO International Clinical Trials Registry Platform, and ClinicalTrials.gov were searched from the beginning of 2020 to August 24, 2020. Studies from Asia (41 countries, 37.2%), Europe (28 countries, 25.4%), North America (24 countries, 21.8%), South America (5 countries, 4.5%), and Middle East (6 countries, 5.4%), and additional 6 multinational studies (5.4%) were included in our analyses. The outcomes of interest were mortality, progression to severe disease (severe pneumonia, admission to intensive care unit (ICU), and/or mechanical ventilation), viral clearance rate, QT prolongation, fatal cardiac complications, and noncardiac serious adverse events.
Based on RCTs, the risk of progression to severe course and mortality was significantly reduced with corticosteroids (odds ratio (OR) 0.23, 95% confidence interval (CI) 0.06 to 0.86, p = 0.032, and OR 0.78, 95% CI 0.66 to 0.91, p = 0.002, respectively) and remdesivir (OR 0.29, 95% CI 0.17 to 0.50, p < 0.001, and OR 0.62, 95% CI 0.39 to 0.98, p = 0.041, respectively) compared to standard care for moderate to severe COVID-19 patients in non-ICU; corticosteroids were also shown to reduce mortality rate (OR 0.54, 95% CI 0.40 to 0.73, p < 0.001) for critically ill patients in ICU. In analyses including observational studies, interferon-alpha (OR 0.05, 95% CI 0.01 to 0.39, p = 0.004), itolizumab (OR 0.10, 95% CI 0.01 to 0.92, p = 0.042), sofosbuvir plus daclatasvir (OR 0.26, 95% CI 0.07 to 0.88, p = 0.030), anakinra (OR 0.30, 95% CI 0.11 to 0.82, p = 0.019), tocilizumab (OR 0.43, 95% CI 0.30 to 0.60, p < 0.001), and convalescent plasma (OR 0.48, 95% CI 0.24 to 0.96, p = 0.038) were associated with reduced mortality rate in non-ICU setting, while high-dose intravenous immunoglobulin (IVIG) (OR 0.13, 95% CI 0.03 to 0.49, p = 0.003), ivermectin (OR 0.15, 95% CI 0.04 to 0.57, p = 0.005), and tocilizumab (OR 0.62, 95% CI 0.42 to 0.90, p = 0.012) were associated with reduced mortality rate in critically ill patients. Convalescent plasma was the only treatment option that was associated with improved viral clearance rate at 2 weeks compared to standard care (OR 11.39, 95% CI 3.91 to 33.18, p < 0.001). The combination of hydroxychloroquine and azithromycin was shown to be associated with increased QT prolongation incidence (OR 2.01, 95% CI 1.26 to 3.20, p = 0.003) and fatal cardiac complications in cardiac-impaired populations (OR 2.23, 95% CI 1.24 to 4.00, p = 0.007). No drug was significantly associated with increased noncardiac serious adverse events compared to standard care. 
The quality of evidence of collective outcomes was estimated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) framework. The major limitation of the present study is the overall low level of evidence that reduces the certainty of recommendations. Besides, the risk of bias (RoB) measured by RoB2 and ROBINS-I framework for individual studies was generally low to moderate. The outcomes deduced from observational studies could not infer causality and can only imply associations. The study protocol is publicly available on PROSPERO (CRD42020186527).

Conclusions

In this NMA, we found that anti-inflammatory agents (corticosteroids, tocilizumab, anakinra, and IVIG), convalescent plasma, and remdesivir were associated with improved outcomes of hospitalized COVID-19 patients. Hydroxychloroquine did not provide clinical benefits while posing cardiac safety risks when combined with azithromycin, especially in the vulnerable population. Only 29% of current evidence on pharmacological management of COVID-19 is supported by moderate or high certainty and can be translated to practice and policy; the remaining 71% are of low or very low certainty and warrant further studies to establish firm conclusions.

In this meta-analysis, Min Seo Kim and colleagues synthesise results from randomized trials and observational studies on COVID-19 treatments.

20.

Feature Engineering and a Proposed Decision-Support System for Systematic Reviewers of Medical Evidence

Tanja Bekhuis Eugene Tseytlin Kevin J. Mitchell Dina Demner-Fushman 《PloS one》2014,9(1)

Objectives

Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance.

Methods

We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric⁺, indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests.

Results

All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric⁺ features were associated with best performance; mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall.

Conclusions

A computer-assisted, decision support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers. Additionally, such a system could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration.