Similar Articles
20 similar articles were retrieved.
1.
Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry–based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach, which is based on a target-decoy competition strategy at the protein level that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides, such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy and, second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the dataset. The validation analysis also uncovered that applying the commonly used Occam’s razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
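To make the picked competition concrete, here is a minimal Python sketch (an illustration under stated assumptions, not the authors' implementation): each target protein group is paired with its decoy counterpart, only the higher-scoring member of each pair survives, and the FDR is then estimated from the surviving decoy counts. The pairing key and scoring details are hypothetical.

```python
# Minimal sketch of a picked target-decoy competition at the protein-group level.
# Assumptions (not from the paper): each target group and its decoy counterpart share
# a 'pair_id'; scores are "higher is better" (e.g., derived from the best peptide).

def picked_group_fdr(groups):
    """groups: iterable of dicts with keys 'pair_id', 'is_decoy', 'score'."""
    # 1) Target/decoy competition: keep only the better-scoring member of each pair.
    best = {}
    for g in groups:
        cur = best.get(g["pair_id"])
        if cur is None or g["score"] > cur["score"]:
            best[g["pair_id"]] = g

    # 2) Rank the survivors and estimate FDR = decoys / targets while walking down.
    survivors = sorted(best.values(), key=lambda g: g["score"], reverse=True)
    rows, targets, decoys = [], 0, 0
    for g in survivors:
        decoys += g["is_decoy"]
        targets += not g["is_decoy"]
        rows.append((g["pair_id"], g["is_decoy"], g["score"], decoys / max(targets, 1)))

    # 3) Convert running FDRs into monotone q-values (minimum from the bottom up).
    qvals, running_min = [], 1.0
    for _, _, _, fdr in reversed(rows):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return [row + (q,) for row, q in zip(rows, qvals)]
```

Reporting all target groups with q-value ≤ 0.01 would then correspond to the 1% protein group-level FDR threshold mentioned above.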

2.
Calculating the number of confidently identified proteins and estimating the false discovery rate (FDR) are challenging when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further adds to the challenge, and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target–decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel, target–decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprising ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The “picked” protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the higher score. We investigated the performance of this approach in combination with q-value-based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The “picked” target–decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein, yielding a stable number of true-positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used “classic” protein FDR approach that causes overprediction of false-positive protein identifications in large data sets. The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications, and is readily implemented in proteomics analysis software.

Shotgun proteomics is the most popular approach for large-scale identification and quantification of proteins. The rapid evolution of high-end mass spectrometers in recent years (1–5) has made feasible proteomic studies that identify and quantify as many as 10,000 proteins in a sample (6–8) and has enabled many new lines of scientific research, including, for example, the analysis of many human proteomes and proteome-wide protein–drug interaction studies (9–11). One fundamental step in most proteomic experiments is the identification of proteins in the biological system under investigation. To achieve this, proteins are digested into peptides, analyzed by LC-MS/MS, and tandem mass spectra are used to interrogate protein sequence databases using search engines that match experimental data to data generated in silico (12, 13). Peptide spectrum matches (PSMs) are commonly assigned by a search engine using either a heuristic or a probabilistic scoring scheme (14–18). Proteins are then inferred from identified peptides, and a protein score or probability is derived as a measure of the confidence in the identification (13, 19).

Estimating the proportion of false matches (false discovery rate; FDR) in an experiment is important to assess and maintain the quality of protein identifications. Owing to its conceptual and practical simplicity, the most widely used strategy to estimate FDR in proteomics is the target–decoy database search strategy (target–decoy strategy; TDS) (20). 
The main assumption underlying this idea is that random matches (false positives) should occur with similar likelihood in the target database and the decoy (reversed, shuffled, or otherwise randomized) version of the same database (21, 22). The number of matches to the decoy database, therefore, provides an estimate of the number of random matches one should expect to obtain in the target database. The number of target and decoy hits can then be used to calculate either a local or a global FDR for a given data set (21–26). This general idea can be applied to control the FDR at the level of PSMs, peptides, and proteins, typically by counting the number of target and decoy observations above a specified score.

Despite the significant practical impact of the TDS, it has been observed that a peptide FDR that results in an acceptable protein FDR (of, say, 1%) for a small or medium-sized data set turns into an unacceptably high protein FDR when the data set grows larger (22, 27). This is because the basic assumption of the classical TDS is compromised when a large proportion of the true-positive proteins have already been identified. In small data sets, containing, say, only a few hundred to a few thousand proteins, random peptide matches will be distributed roughly equally over all decoy and “leftover” target proteins, allowing for a reasonably accurate estimation of false-positive target identifications by using the number of decoy identifications. However, in large experiments comprising hundreds to thousands of LC-MS/MS runs, 10,000 or more target proteins may be genuinely and repeatedly identified, leaving an ever smaller number of (target) proteins to be hit by new false-positive peptide matches. In contrast, decoy proteins are only hit by the occasional random peptide match but fully count toward the number of false-positive protein identifications estimated from the decoy hits. The higher the number of genuinely identified target proteins gets, the larger this imbalance becomes. If this is not corrected for in the decoy space, an overestimation of false positives will occur.

This problem has been recognized, and Reiter and colleagues, for example, suggested a correction for the overestimation of false-positive protein hits termed MAYU (27). Following the main assumption that protein identifications containing false-positive PSMs are uniformly distributed over the target database, MAYU models the number of false-positive protein identifications using a hypergeometric distribution. Its parameters are estimated from the number of protein database entries and the total number of target and decoy protein identifications. The protein FDR is then estimated by dividing the number of expected false-positive identifications (the expectation value of the hypergeometric distribution) by the total number of target identifications. Although this approach was specifically designed for large data sets (tested on ∼1300 LC-MS/MS runs from digests of C. elegans proteins), it is not clear how far the approach actually scales. Another correction strategy for the overestimation of false-positive rates, the R factor, was suggested initially for peptides (28) and more recently for proteins (29). A ratio, R, of forward and decoy hits is calculated in the low-probability range, where the number of true peptide or protein identifications is expected to be close to zero and, hence, R should approximate one. The number of decoy hits is then multiplied (corrected) by the R factor when performing FDR calculations. 
The approach is conceptually simpler than the MAYU strategy and easy to implement, but it is also based on the assumption that the inflation of decoy hits intrinsic to the classic target–decoy strategy occurs to the same extent in all probability ranges.

In the context of the above, it is interesting to note that there is currently no consensus in the community regarding whether and how protein FDRs should be calculated for data of any size. One perhaps extreme view is that, owing to issues and assumptions related to the peptide-to-protein inference step and ways of constructing decoy protein sequences, protein-level FDRs cannot be meaningfully estimated at all (30). This is somewhat unsatisfactory, as an estimate of protein-level error in proteomic experiments is highly desirable. Others have argued that target–decoy searches are not even needed when accurate p-values of individual PSMs are available (31), whereas others choose to tighten the PSM or peptide FDRs obtained from TDS analysis to whatever threshold is necessary to obtain a desired protein FDR (32). This is likely too conservative.

We have recently proposed an alternative protein FDR approach termed the “picked” target–decoy strategy (picked TDS) that indicated improved performance over the classical TDS in a very large proteomic data set (9), but a systematic investigation of the idea had not been performed at the time. In this study, we further characterized the picked TDS for protein FDR estimation and investigated its scalability compared with that of the classic TDS FDR method in data sets of increasing size up to ∼19,000 LC-MS/MS runs. The results show that the picked TDS is effective in preventing decoy protein over-representation, identifies more true-positive hits, and works equally well for small and large proteomic data sets.
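The difference between the classic and the picked estimates can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: proteins are scored by their best peptide q-value (lower is better), target and decoy entries are assumed to share an accession key so they can be paired, and the only difference between the two estimators is whether the pairwise competition is applied before counting.

```python
# Sketch contrasting the classic and "picked" target-decoy protein FDR estimates.
# Each protein entry is a dict: {'accession': str, 'is_decoy': bool, 'best_q': float},
# where 'best_q' is the best peptide q-value of that protein (lower is better) and a
# decoy entry is assumed to carry the same accession key as its target partner.

def classic_protein_fdr(proteins, q_threshold):
    """Classic TDS: every decoy above the threshold counts, regardless of its partner."""
    targets = sum(1 for p in proteins if not p["is_decoy"] and p["best_q"] <= q_threshold)
    decoys = sum(1 for p in proteins if p["is_decoy"] and p["best_q"] <= q_threshold)
    return decoys / max(targets, 1)

def picked_protein_fdr(proteins, q_threshold):
    """Picked TDS: target and decoy compete first; only the winner of each pair counts."""
    winners = {}
    for p in proteins:
        cur = winners.get(p["accession"])
        if cur is None or p["best_q"] < cur["best_q"]:
            winners[p["accession"]] = p
    targets = sum(1 for p in winners.values()
                  if not p["is_decoy"] and p["best_q"] <= q_threshold)
    decoys = sum(1 for p in winners.values()
                 if p["is_decoy"] and p["best_q"] <= q_threshold)
    return decoys / max(targets, 1)
```

In a large data set where most true targets are already identified, the classic count keeps accumulating decoys that have long since lost their pairwise competition, which is exactly the over-representation the picked variant removes.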

3.
4.
Summary. Pharmacovigilance systems aim at early detection of adverse effects of marketed drugs. They maintain large spontaneous reporting databases for which several automatic signaling methods have been developed. One limitation of these methods is that the decision rules for signal generation are based on arbitrary thresholds. In this article, we propose a new signal-generation procedure. The decision criterion is formulated in terms of a critical region for the P-values resulting from the reporting odds ratio method as well as from Fisher's exact test. For the latter, we also study the use of mid-P-values. The critical region is defined by the false discovery rate, which can be estimated by adapting P-value mixture-model-based procedures to one-sided tests. The methodology is mainly illustrated with the location-based estimator procedure. It is studied through a large simulation study and applied to the French pharmacovigilance database.
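The ingredients described above can be sketched as follows in Python; this is a hedged illustration, not the authors' procedure. For each drug–event pair, a 2×2 contingency table from the spontaneous reporting database yields a reporting odds ratio and a one-sided Fisher (and mid-) P-value; a false-discovery-rate criterion then replaces a fixed threshold. The Benjamini–Hochberg step-up used here is a simple stand-in for the location-based estimator of the paper.

```python
# Hedged sketch: disproportionality statistics for one drug-event pair and an
# FDR-based decision rule over many pairs (BH step-up as a stand-in estimator).
import numpy as np
from scipy.stats import fisher_exact, hypergeom

def ror_and_pvalues(a, b, c, d):
    """2x2 table counts: a = reports with drug and event, b = drug without event,
    c = event without drug, d = neither."""
    ror = (a * d) / max(b * c, 1e-12)                      # reporting odds ratio
    _, p_greater = fisher_exact([[a, b], [c, d]], alternative="greater")
    # mid-P-value: subtract half the point probability of the observed table
    point = hypergeom.pmf(a, a + b + c + d, a + b, a + c)
    return ror, p_greater, p_greater - 0.5 * point

def fdr_signals(pvalues, fdr_level=0.05):
    """Indices of drug-event pairs flagged as signals at the given FDR level."""
    p = np.asarray(pvalues)
    order = np.argsort(p)
    m = p.size
    passed = p[order] <= fdr_level * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    return order[:k]
```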

5.
A fundamental aspect of proteomics is the analysis of post-translational modifications, of which phosphorylation is an important class. Numerous nonradioactivity-based methods have been described for high-sensitivity phosphorylation site mapping. The ABRF Proteomics Research Group has conducted a study to help determine how many laboratories are equipped to take on such projects, which methods they choose to apply, and how successful the laboratories are in implementing particular methodologies. The ABRF-PRG03 sample was distributed as a tryptic digest of a mixture of two proteins with two synthetic phosphopeptides added. Each sample contained 5 pmol of unphosphorylated protein digest, 1 pmol of each phosphopeptide from the same protein, and 200 fmol of a minor protein component. Study participants were challenged to identify the two proteins and the two phosphorylated peptides, and determine the site of phosphorylation in each peptide. Almost all respondents successfully identified the major protein component, whereas only 10% identified the minor protein component. Phosphorylation site analysis proved surprisingly difficult, with only 3 of the 54 laboratories correctly determining both sites of phosphorylation. Various strategies and instruments were applied to this task with mixed success; chromatographic separation of the peptides was clearly helpful, whereas enrichment by metal affinity chromatography met with surprisingly little success. We conclude that locating sites of phosphorylation remains a significant challenge at this level of sample abundance.

6.
7.
A method of geographic mapping of the stationary (limiting) gene migration rate has been developed. The method is based on approximation of the empirical distribution of gene frequencies by a theoretical steady-state distribution. The maximum likelihood method and the χ2 minimization method are used to obtain consistent estimates of the gene migration rate as a parameter of the steady-state distribution. The new method makes it possible to determine the geographical distribution of the ratio between the properties of the population migration structure described by the stepping-stone and island models and to construct a geographical map of χ2 values. This map approximately reflects the distribution of natural selection pressure on the gene pool if genetic processes are quasi-steady.

8.

Background

The recent guidelines for preventing atherosclerotic cardiovascular events are an important advancement. For primary prevention, statins are recommended if the ten-year risk is ≥ 5% (consideration for therapy) or ≥ 7.5% (definitive treatment after discussion, unless contraindicated). We reasoned that a significant cohort with ten-year risk below the treatment thresholds would predictably surpass them within the recommended 4–6 year window for reassessing the ten-year risk. As atherosclerosis is a progressive disease, these individuals may therefore benefit from more aggressive therapies even at baseline.

Methods and Findings

We used the publicly available NHANES dataset for ten-year risk calculation; there were 1805 participants. To evaluate the ten-year risk change at five years, we considered two scenarios: no change in the baseline parameters except an increase in age by five years (No Change) and, alternatively, a 10% improvement in systolic BP, total cholesterol, and HDL-c plus no smoking, with a five-year increase in age (Reduced Risk Profile). Amongst non-diabetics with <5% risk at baseline, 35% reached or exceeded 5% risk in five years (5% reached or exceeded the 7.5% risk) with No Change, and 9% reached or exceeded 5% risk in five years (none reached 7.5% risk) with Reduced Risk Profile; furthermore, 94% of the non-diabetic cohort with baseline risk between 3.5% and 5% would exceed the 5% and/or 7.5% boundary limit with No Change. Amongst non-diabetics with 5–7.5% baseline risk, 87% reached or exceeded 7.5% with No Change, while 30% reached or exceeded 7.5% risk with Reduced Risk Profile.
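A small Python sketch of the two scenario transformations, as we read them from the description above; the parameter names and the risk-function hook are illustrative assumptions, and the guideline ten-year risk calculator itself is not reimplemented here.

```python
# Hedged sketch of the two five-year scenarios applied to a baseline risk profile.
# Keys are illustrative; ten_year_risk() stands for a guideline risk calculator
# and is not provided here.

def no_change(profile):
    """Age advances by five years; all other risk factors stay at baseline."""
    p = dict(profile)
    p["age"] += 5
    return p

def reduced_risk_profile(profile):
    """Age advances by five years, systolic BP and total cholesterol improve by 10%,
    HDL-c improves (rises) by 10%, and the subject no longer smokes."""
    p = dict(profile)
    p["age"] += 5
    p["sbp"] *= 0.9
    p["total_chol"] *= 0.9
    p["hdl"] *= 1.1
    p["smoker"] = False
    return p

# Example of the threshold check described above (hypothetical usage):
# baseline = {"age": 55, "sbp": 138, "total_chol": 210, "hdl": 45, "smoker": True}
# crossed_5pct = ten_year_risk(no_change(baseline)) >= 0.05
```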

Conclusions

A significant population cohort at levels below the treatment thresholds will predictably exceed these limits over time, with or without improvement in modifiable risk factors, and may benefit from more aggressive therapy at baseline. We provide an improved risk calculator that allows expected risk modification to be integrated into the discussion with an individual. This needs to be prospectively tested in clinical trials.

9.
10.

Introduction

The existence of partial volume effects in brain MR images makes it challenging to understand the physio-pathological alterations underlying pathology-related signal changes across groups of healthy subjects and patients. In this study, we implement a new approach to disentangle gray and white matter alterations in the thalamus and the basal ganglia. The proposed method was applied to a cohort of early multiple sclerosis (MS) patients and healthy subjects to evaluate tissue-specific alterations related to diffuse inflammatory or neurodegenerative processes.

Method

Forty-three relapsing-remitting MS patients and nineteen healthy controls underwent 3T MRI including (i) fluid-attenuated inversion recovery, double inversion recovery, and magnetization-prepared gradient echo sequences for lesion count, and (ii) T1 relaxometry. We applied a partial volume estimation algorithm to the T1 relaxometry maps to estimate gray and white matter local concentrations as well as T1 values characteristic of gray and white matter in the thalamus and the basal ganglia. Statistical tests were performed to compare groups in terms of global T1 values, tissue-characteristic T1 values, and tissue concentrations.
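One plausible way to write down the decomposition described here is a linear two-tissue mixing model per voxel; this formulation is an assumption of this note for illustration, not necessarily the exact model used in the study.

```latex
% Hedged sketch of a two-tissue partial volume model for a voxel v in a deep gray
% matter structure: c denotes the local tissue concentration and T_{1,GM}, T_{1,WM}
% the tissue-characteristic relaxation times estimated over the structure.
\begin{aligned}
  T_1(v) &\approx c_{\mathrm{GM}}(v)\, T_{1,\mathrm{GM}} + c_{\mathrm{WM}}(v)\, T_{1,\mathrm{WM}},\\
  &\phantom{\approx}\; c_{\mathrm{GM}}(v) + c_{\mathrm{WM}}(v) = 1, \qquad 0 \le c_{\mathrm{GM}}(v),\, c_{\mathrm{WM}}(v) \le 1 .
\end{aligned}
```

Under such a model, a group difference in the global T1 of a structure can be attributed either to a shift in the tissue-characteristic T1 values (as reported for thalamic gray matter below) or to a change in the tissue concentrations themselves.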

Results

Significant increases in global T1 values were observed in the thalamus (p = 0.038) and the putamen (p = 0.026) in relapsing-remitting MS patients compared to healthy controls. In the thalamus, the T1 increase was associated with a significant increase in the gray matter characteristic T1 (p = 0.0016), with no significant effect in white matter.

Conclusion

The presented methodology provides information beyond standard MR signal averaging approaches and holds promise for identifying the presence and nature of diffuse pathology in neuro-inflammatory and neurodegenerative diseases.

11.
12.
The domestic dog, Canis familiaris, exhibits profound phenotypic diversity and is an ideal model organism for the genetic dissection of simple and complex traits. However, some of the most interesting phenotypes are fixed in particular breeds and are therefore less tractable to genetic analysis using classical segregation-based mapping approaches. We implemented an across-breed mapping approach using a moderately dense SNP array, a low number of animals, and breeds carefully selected for the phenotypes of interest to identify genetic variants responsible for breed-defining characteristics. Using a modest number of affected (10–30) and control (20–60) samples from multiple breeds, the correct chromosomal assignment was identified in a proof-of-concept experiment using three previously defined loci: hyperuricosuria, white spotting, and chondrodysplasia. Genome-wide association was performed in a similar manner for one of the most striking morphological traits in dogs: brachycephalic head type. Although candidate gene approaches based on comparable phenotypes in mice and humans have been utilized for this trait, the causative gene has remained elusive using this method. Samples from nine affected breeds and thirteen control breeds identified strong genome-wide associations for brachycephalic head type on Cfa 1. Two independent datasets identified the same genomic region. Levels of relative heterozygosity in the associated region indicate that it has been subjected to a selective sweep, consistent with it being a breed-defining morphological characteristic. Genotyping additional dogs in the region confirmed the association. To date, the genetic structure of dog breeds has primarily been exploited for genome-wide association for segregating traits. These results demonstrate that non-segregating traits under strong selection are equally tractable to genetic analysis using small sample numbers.

13.
Antisense oligonucleotides have been studied for many years as a tool for gene silencing. One of the most difficult cases of selective RNA silencing involves the alleles of single nucleotide polymorphisms, in which the allele sequence is differentiated by a single nucleotide. A new approach to improve the allele selectivity of antisense oligonucleotides is proposed. It is based on the simultaneous application of two oligonucleotides. One is complementary to the mutated form of the targeted RNA and is able to activate RNase H to cleave the RNA. The other oligonucleotide, which is complementary to the wild-type allele of the targeted RNA, is able to inhibit RNase H cleavage. Five types of SNPs, C/G, G/C, G/A, A/G, and C/U, were analyzed within the sequence context of genes associated with neurodegenerative disorders such as Alzheimer’s disease, Parkinson’s disease, ALS (Amyotrophic Lateral Sclerosis), and Machado-Joseph disease. For most analyzed cases, the application of the tandem approach increased allele-selective RNA degradation 1.5- to 15-fold relative to the use of a single antisense oligonucleotide. The presented study demonstrates that differentiation between single substitutions depends strongly on the nature of the SNP and the surrounding nucleotides. These variables are crucial for determining the proper length of the inhibitor antisense oligonucleotide. In the tandem approach, comparing the thermodynamic stability of the favorable duplexes (WT RNA–inhibitor and Mut RNA–gapmer) with that of the other possible duplexes allows the chances of allele-selective RNA degradation to be evaluated. A larger difference in thermodynamic stability between the favorable duplexes and those that could possibly form usually results in better allele selectivity of RNA degradation.

14.
Several lines of evidence suggest that genome-wide association studies (GWAS) have the potential to explain more of the “missing heritability” of common complex phenotypes. However, reliable methods to identify a larger proportion of single nucleotide polymorphisms (SNPs) that impact disease risk are currently lacking. Here, we use a genetic pleiotropy-informed conditional false discovery rate (FDR) method on GWAS summary statistics data to identify new loci associated with schizophrenia (SCZ) and bipolar disorder (BD), two highly heritable disorders with significant missing heritability. Epidemiological and clinical evidence suggest similar disease characteristics and overlapping genes between SCZ and BD. Here, we computed conditional Q–Q curves of data from the Psychiatric Genomics Consortium (SCZ: n = 9,379 cases and n = 7,736 controls; BD: n = 6,990 cases and n = 4,820 controls) to show enrichment of SNPs associated with SCZ as a function of association with BD and vice versa, with a corresponding reduction in FDR. Applying the conditional FDR method, we identified 58 loci associated with SCZ and 35 loci associated with BD below the conditional FDR level of 0.05. Of these, 14 loci were associated with both SCZ and BD (conjunction FDR). Together, these findings show the feasibility of genetic pleiotropy-informed methods to improve gene discovery in SCZ and BD and indicate overlapping genetic mechanisms between these two disorders.
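A hedged sketch of the conditioning idea in Python: the FDR of trait-1 P-values is re-estimated within the stratum of SNPs that also show association with trait 2, which is what pulls the FDR down for pleiotropic loci. The empirical-CDF formulation below is a simplified illustration, not the consortium's pipeline.

```python
# Minimal, illustrative conditional FDR: cFDR(p1 | p2 <= t) = p1 / F_hat(p1 | p2 <= t),
# where F_hat is the empirical CDF of trait-1 P-values among SNPs that pass the
# trait-2 conditioning threshold t. The threshold and handling of excluded SNPs are
# assumptions of this sketch.
import numpy as np

def conditional_fdr(p1, p2, p2_threshold=0.01):
    """p1, p2: arrays of GWAS P-values for the same SNPs in two traits.
    Returns a conditional FDR estimate for each SNP's trait-1 association
    (set to 1.0 for SNPs outside the conditioning subset)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    cfdr = np.ones_like(p1)
    subset = p2 <= p2_threshold
    if not subset.any():
        return cfdr
    p_sub = p1[subset]
    # Empirical CDF of trait-1 P-values within the conditioning subset.
    ecdf = np.searchsorted(np.sort(p_sub), p_sub, side="right") / p_sub.size
    # Under the null, trait-1 P-values are uniform, so p / ECDF(p) estimates the FDR.
    cfdr[subset] = np.clip(p_sub / np.maximum(ecdf, 1e-12), 0.0, 1.0)
    return cfdr

# Loci with conditional_fdr(...) <= 0.05 would be reported, analogous to the 0.05
# conditional FDR level used above; swapping p1 and p2 gives the reverse analysis.
```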

15.

Objective

To establish a simple two-compartment model for glomerular filtration rate (GFR) and renal plasma flow (RPF) estimation by dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI).

Materials and Methods

A total of eight New Zealand white rabbits underwent DCE-MRI. In this study, the two-compartment model was modified with an impulse residue function. First, the reliability of GFR measurement with the proposed model was compared with that of other published models in a Monte Carlo simulation at different noise levels. Then, functional parameters were estimated in six healthy rabbits to test the feasibility of the new model. Moreover, to investigate the validity of its GFR estimation, two rabbits underwent an acute ischemia surgical procedure in one kidney before DCE-MRI, and pixel-wise measurements were implemented to detect cortical GFR alterations between the normal and abnormal kidneys.
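For orientation, a generic two-compartment tracer-kinetic fit of this kind can be sketched as below: the tissue concentration curve is modeled as the arterial input function convolved with an impulse residue function and fitted per region or per pixel. The biexponential residue function, parameter names, starting values, and bounds are illustrative assumptions, not the specific model proposed in the paper.

```python
# Hedged sketch: fit a two-compartment impulse-residue model to a DCE-MRI renal curve.
# tissue(t) ~ dt * (AIF * IRF)(t), with a biexponential IRF standing in for the
# fast (vascular) and slow (filtration/tubular) compartments.
import numpy as np
from scipy.optimize import curve_fit

def fit_two_compartment(t, aif, tissue, dt):
    """t: time points (s); aif, tissue: concentration curves; dt: sampling interval (s)."""
    def model(t, f_fast, tau_fast, f_slow, tau_slow):
        irf = f_fast * np.exp(-t / tau_fast) + f_slow * np.exp(-t / tau_slow)
        return dt * np.convolve(aif, irf)[: len(t)]

    p0 = (0.2, 5.0, 0.02, 120.0)                        # rough starting values
    bounds = ([0.0, 0.1, 0.0, 1.0], [5.0, 60.0, 1.0, 1000.0])
    popt, _ = curve_fit(model, t, tissue, p0=p0, bounds=bounds)
    f_fast, tau_fast, f_slow, tau_slow = popt
    # In filtration-type models the slow component's inflow tracks a GFR-like rate
    # (per voxel or per gram, depending on normalization), while the fast component
    # tracks plasma flow; the mapping to absolute GFR/RPF depends on the chosen model.
    return {"flow_like": f_fast, "filtration_like": f_slow, "params": popt}
```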

Results

In the comparison, the lowest variability of GFR and RPF measurements was found with the proposed model. Mean GFR was 3.03±1.1 ml/min and mean RPF was 2.64±0.5 ml/g/min in the normal animals, in good agreement with published values. Moreover, a large GFR decline was found in the dysfunctional kidneys compared with the contralateral controls.

Conclusion

The results of our study demonstrate that measurement of renal kinetic parameters based on the proposed model is feasible and that the model can discriminate GFR changes between healthy and diseased kidneys.

16.
The traditional q1* methodology for constructing upper confidence limits (UCLs) for the low-dose slopes of quantal dose-response functions has two limitations: (i) it is based on an asymptotic statistical result that has been shown via Monte Carlo simulation not to hold in practice for small, real bioassay experiments (Portier and Hoel, 1983); and (ii) it assumes that the multistage model (which represents cumulative hazard as a polynomial function of dose) is correct. This paper presents an uncertainty analysis approach for fitting dose-response functions to data that does not require specific parametric assumptions or depend on asymptotic results. It has the advantage that the resulting estimates of the dose-response function (and uncertainties about it) no longer depend on the validity of an assumed parametric family nor on the accuracy of the asymptotic approximation. The method derives posterior densities for the true response rates in the dose groups, rather than deriving posterior densities for model parameters, as in other Bayesian approaches (Sielken, 1991), or resampling the observed data points, as in the bootstrap and other resampling methods. It does so by conditioning constrained maximum-entropy priors on the observed data. Monte Carlo sampling of the posterior (constrained, conditioned) probability distributions generates values of response probabilities that might be observed if the experiment were repeated with very large sample sizes. A dose-response curve is fit to each such simulated dataset. If no parametric model has been specified, then a generalized representation (e.g., a power-series or orthonormal polynomial expansion) of the unknown dose-response function is fit to each simulated dataset using “model-free” methods. The simulation-based frequency distribution of all the dose-response curves fit to the simulated datasets yields a posterior distribution function for the low-dose slope of the dose-response curve. An upper confidence limit on the low-dose slope is obtained directly from this posterior distribution. This “Data Cube” procedure is illustrated with a real dataset for benzene, and is seen to produce more policy-relevant insights than does the traditional q1* methodology. For example, it shows how far apart the 90%, 95%, and 99% limits are and reveals how uncertainty about total and incremental risk varies with dose level (typically being dominated at low doses by uncertainty about the response of the control group, and being dominated at high doses by sampling variability). Strengths and limitations of the Data Cube approach are summarized, and potential decision-analytic applications to making better informed risk management decisions are briefly discussed.
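The simulate-fit-summarize loop can be illustrated with a deliberately simplified Python sketch. Here, Jeffreys Beta posteriors per dose group and a quadratic-in-dose fit stand in for the constrained maximum-entropy priors and "model-free" expansions described above, so this is only a structural analogue of the procedure, not a reimplementation.

```python
# Simplified Monte Carlo sketch of the simulate-and-fit idea: draw plausible dose-group
# response probabilities, fit a dose-response curve to each draw, and read the UCL of
# the low-dose slope off the resulting distribution. All modeling choices here
# (Beta posteriors, quadratic excess-risk fit) are illustrative stand-ins.
import numpy as np

def ucl_low_dose_slope(doses, responders, group_sizes, n_draws=5000, q=0.95, seed=0):
    rng = np.random.default_rng(seed)
    doses = np.asarray(doses, dtype=float)
    responders = np.asarray(responders, dtype=float)
    group_sizes = np.asarray(group_sizes, dtype=float)
    slopes = np.empty(n_draws)
    for i in range(n_draws):
        # Posterior draw of each dose group's response probability (Jeffreys prior).
        p = rng.beta(0.5 + responders, 0.5 + group_sizes - responders)
        excess = p - p[0]                                 # excess risk over the control group
        coeffs = np.polyfit(doses, excess, deg=2)
        slopes[i] = np.polyval(np.polyder(coeffs), 0.0)   # slope as dose -> 0
    return np.quantile(slopes, q)

# Example with made-up bioassay counts (4 dose groups of 50 animals each):
# ucl_low_dose_slope([0, 1, 3, 10], [2, 4, 9, 20], [50, 50, 50, 50], q=0.95)
```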

17.
Object localization plays a key role in many popular applications of Wireless Multimedia Sensor Networks (WMSN) and, as a result, it has acquired significant status in the research community. A significant body of research performs this task without considering node orientation, object geometry, and environmental variations; as a result, the localized object does not reflect real-world scenarios. In this paper, a novel object localization scheme for WMSN is proposed that utilizes range-free localization, computer vision, and principal component analysis-based algorithms. The proposed approach provides the best possible approximation of the distance between a WMSN sink and an object, and of the orientation of the object, using image-based information. Simulation results report 99% efficiency and an error ratio of 0.01 (around 1 ft) when compared to other popular techniques.

18.
We employed deep genome sequencing of two parents and 12 of their offspring to estimate the mutation rate per site per generation in a full-sib family of Drosophila melanogaster recently sampled from a natural population. Sites that were homozygous for the same allele in the parents and heterozygous in one or more offspring were categorized as candidate mutations and subjected to detailed analysis. In 1.23 × 10⁹ callable sites from 12 individuals, we confirmed six single nucleotide mutations. We estimated the false negative rate in the experiment by generating synthetic mutations using the empirical distributions of numbers of nonreference bases at heterozygous sites in the offspring. The proportion of synthetic mutations at callable sites that we failed to detect was <1%, implying that the false negative rate was extremely low. Our estimate of the point mutation rate is 2.8 × 10⁻⁹ (95% confidence interval = 1.0 × 10⁻⁹ to 6.1 × 10⁻⁹) per site per generation, which is at the low end of the range of previous estimates, and suggests an effective population size for the species of ∼1.4 × 10⁶. At one site, point mutations were present in two individuals, indicating that there had been a premeiotic mutation cluster, although surprisingly one individual had a G→A transition and the other a G→T transversion, possibly associated with error-prone mismatch repair. We also detected three short deletion mutations and no insertions, giving a deletion mutation rate of 1.2 × 10⁻⁹ (95% confidence interval = 0.7 × 10⁻⁹ to 11 × 10⁻⁹).
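The effective population size quoted above follows from the standard neutral relationship between nucleotide diversity and mutation rate. As a worked example, assuming a silent-site diversity of roughly π ≈ 0.016 for D. melanogaster (a typical published figure, not stated in the abstract):

```latex
% Worked example; pi ~ 0.016 is an assumed diversity value, mu is from the abstract.
N_e \approx \frac{\pi}{4\mu} \approx \frac{0.016}{4 \times 2.8 \times 10^{-9}} \approx 1.4 \times 10^{6}
```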

19.
《IRBM》2020,41(1):58-70
Objectives: The objective of this paper is to present a reliable and accurate technique for myocardial infarction (MI) detection and localization.
Material and methods: The stationary wavelet transform was used to decompose the ECG signal. Energy-, entropy-, and slope-based features were extracted at specific wavelet bands from a selected ECG lead. k-Nearest Neighbors (kNN) with the Mahalanobis distance function was used for classification. Sensitivity (Se), specificity (Sp), positive predictivity (+P), accuracy (Acc), and area under the receiver operating characteristic curve (AUC), analyzed over 200 subjects (52 healthy controls, 148 with MI) from the Physikalisch-Technische Bundesanstalt (PTB) database, were used for performance analysis. To handle the imbalanced data, the adaptive synthetic (ADASYN) sampling approach was adopted.
Results: For detection of MI, the proposed technique showed AUC = 0.99, Se = 98.62%, Sp = 99.40%, +P = 99.41%, and Acc = 99.00% using the 12 top-ranked features extracted from multiple ECG leads, and AUC = 0.99, Se = 98.34%, Sp = 99.77%, +P = 99.77%, and Acc = 99.05% using 12 features extracted from a single ECG lead (lead V5). For localization of MI, the proposed technique achieved AUC = 0.99, Se = 98.78%, Sp = 99.86%, +P = 98.80%, and Acc = 99.76% using the 5 top-ranked features from multiple ECG leads, and AUC = 0.98, Se = 96.47%, Sp = 99.60%, +P = 96.49%, and Acc = 99.28% using 8 features extracted from a single ECG lead (lead V3).
Conclusion: For MI detection and localization, the proposed technique is therefore independent of time-domain ECG fiducial markers and can work using specific ECG leads.
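A hedged Python sketch of the pipeline described above: stationary wavelet decomposition of an ECG segment, energy/entropy/slope features per sub-band, ADASYN resampling of the imbalanced classes, and a kNN classifier with a Mahalanobis distance. The wavelet family, decomposition level, and k are illustrative assumptions, not the paper's tuned settings.

```python
# Hedged sketch of an SWT + ADASYN + Mahalanobis-kNN pipeline for MI detection.
import numpy as np
import pywt
from imblearn.over_sampling import ADASYN
from sklearn.neighbors import KNeighborsClassifier

def swt_features(segment, wavelet="db6", level=4):
    """Energy, Shannon entropy and slope of each detail band of one ECG segment."""
    x = np.asarray(segment, dtype=float)
    x = x[: len(x) - len(x) % 2**level]                # SWT needs length % 2^level == 0
    coeffs = pywt.swt(x, wavelet, level=level)          # list of (cA, cD) per level
    feats = []
    for _, cd in coeffs:
        energy = np.sum(cd**2)
        p = cd**2 / max(energy, 1e-12)
        entropy = -np.sum(p * np.log2(p + 1e-12))
        slope = np.polyfit(np.arange(len(cd)), cd, 1)[0]
        feats.extend([energy, entropy, slope])
    return feats

def train_mi_classifier(segments, labels, k=3):
    """segments: ECG segments from one lead; labels: 0 = healthy control, 1 = MI."""
    X = np.array([swt_features(s) for s in segments])
    y = np.asarray(labels)
    X_bal, y_bal = ADASYN(random_state=0).fit_resample(X, y)     # rebalance the classes
    vi = np.linalg.inv(np.cov(X_bal, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
    clf = KNeighborsClassifier(n_neighbors=k, metric="mahalanobis",
                               metric_params={"VI": vi}, algorithm="brute")
    return clf.fit(X_bal, y_bal)
```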

20.

Background

Non-inferiority trials are performed when the main therapeutic effect of the new therapy is expected to be not unacceptably worse than that of the standard therapy, and the new therapy is expected to have advantages over the standard therapy in costs or other (health) consequences. These advantages, however, are not included in the classic frequentist approach to sample size calculation for non-inferiority trials. In contrast, the decision theory approach to sample size calculation does include these factors. The objective of this study is to compare the conceptual and practical aspects of the frequentist and decision theory approaches to sample size calculation for non-inferiority trials, thereby demonstrating that the decision theory approach is more appropriate for sample size calculation of non-inferiority trials.

Methods

The frequentist and decision theory approaches to sample size calculation for non-inferiority trials are compared and applied to the case of a non-inferiority trial of individually tailored duration of elastic compression stocking therapy versus two years of elastic compression stocking therapy for the prevention of post-thrombotic syndrome after deep vein thrombosis.

Results

The two approaches differ substantially in conceptual background, analytical approach, and input requirements. The frequentist approach yielded a sample size of 788 patients, using a power of 80% and a one-sided significance level of 5%. The decision theory approach indicated that the optimal sample size was 500 patients, with a net value of €92 million.

Conclusions

This study demonstrates and explains the differences between the classic frequentist approach and the decision theory approach to sample size calculation for non-inferiority trials. We argue that the decision theory approach to sample size estimation is more suitable for sample size calculation of non-inferiority trials.
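For reference, the classic frequentist calculation mentioned above can be sketched as follows for a binary endpoint; the event rates and margin in the example are placeholders, not the trial's actual inputs, and this is a generic textbook formula rather than the study's exact computation.

```python
# Hedged sketch of a frequentist sample size calculation for a non-inferiority trial
# with a binary endpoint, one-sided alpha and power 1 - beta.
from math import ceil
from scipy.stats import norm

def n_per_group(p_std, p_new, margin, alpha=0.05, power=0.80):
    """Patients per arm needed to show the new therapy is not worse than the standard
    by more than `margin` (absolute difference in event probability)."""
    z_alpha = norm.ppf(1 - alpha)            # one-sided significance level
    z_beta = norm.ppf(power)
    variance = p_std * (1 - p_std) + p_new * (1 - p_new)
    effect = margin - (p_new - p_std)        # distance of assumed difference from margin
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Example with illustrative inputs: 25% event rate in both arms, 7.5% margin,
# 80% power, one-sided 5% significance level.
# print(n_per_group(0.25, 0.25, 0.075))   # ~413 patients per arm under these assumptions
```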
