Similar Articles
20 similar articles found.
1.
Gosselin F. PLoS ONE 2011; 6(3): e14770

Background

Recent approaches mixing frequentist principles with Bayesian inference propose internal goodness-of-fit (GOF) p-values that might be valuable for critical analysis of Bayesian statistical models. However, GOF p-values developed to date only have known probability distributions under restrictive conditions. As a result, no known GOF p-value has a known probability distribution for any discrepancy function.

Methodology/Principal Findings

We show mathematically that a new GOF p-value, called the sampled posterior p-value (SPP), asymptotically has a uniform probability distribution whatever the discrepancy function. In a moderate finite sample context, simulations also showed that the SPP appears stable to relatively uninformative misspecifications of the prior distribution.

Conclusions/Significance

These reasons, together with its numerical simplicity, make the SPP a better canonical GOF p-value than existing GOF p-values.
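The idea behind the SPP can be sketched from this abstract's description alone: draw a single parameter value from the posterior, then compute a classical Monte Carlo p-value for a discrepancy conditional on that draw. The function name, interface, and the simple normal-mean model in the usage below are our own illustrative assumptions, not the paper's code.

```python
import numpy as np

def sampled_posterior_p_value(y, posterior_draws, simulate, discrepancy,
                              n_rep=1000, rng=None):
    """Illustrative sampled posterior p-value (SPP).

    Draws a SINGLE parameter value theta from the posterior, then
    compares the observed discrepancy D(y, theta) against discrepancies
    of replicated datasets simulated under that same theta. This is an
    assumed reading of the procedure described in the abstract.
    """
    rng = np.random.default_rng(rng)
    theta = posterior_draws[rng.integers(len(posterior_draws))]
    d_obs = discrepancy(y, theta)
    d_rep = np.array([discrepancy(simulate(theta, rng), theta)
                      for _ in range(n_rep)])
    # (b + 1) / (m + 1) keeps the Monte Carlo p-value strictly positive.
    return (np.sum(d_rep >= d_obs) + 1) / (n_rep + 1)
```

For a normal-mean model, `simulate` would generate replicate data under theta and `discrepancy` could be the absolute difference between the sample mean and theta.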

2.
MOTIVATION: The most commonly utilized microarrays for mRNA profiling (Affymetrix) include 'probe sets' of a series of perfect match and mismatch probes (typically 22 oligonucleotides per probe set). There is an increasing number of reported 'probe set algorithms' that differ in their interpretation of a probe set to derive a single normalized 'signal' representative of expression of each mRNA. These algorithms are known to differ in accuracy and sensitivity, and optimization has been done using a small set of standardized control microarray data. We hypothesized that different mRNA profiling projects have varying sources and degrees of confounding noise, and that these should alter the choice of a specific probe set algorithm. Also, we hypothesized that use of the Microarray Suite (MAS) 5.0 probe set detection p-value as a weighting function would improve the performance of all probe set algorithms. RESULTS: We built an interactive visual analysis software tool (HCE2W) to test and define parameters in Affymetrix analyses that optimize the ratio of signal (desired biological variable) to noise (confounding uncontrolled variables). Five probe set algorithms were studied with and without statistical weighting of probe sets using the MAS 5.0 probe set detection p-values. The signal-to-noise ratio optimization method was tested in two large novel microarray datasets with different levels of confounding noise: a 105-sample U133A human muscle biopsy dataset (11 groups: mutation-defined, extensive noise), and a 40-sample U74A inbred mouse lung dataset (8 groups: little noise). Performance was measured by the ability of the specific probe set algorithm, with and without detection p-value weighting, to cluster samples into the appropriate biological groups (unsupervised agglomerative clustering with F-measure values).
Of the total random sampling analyses, 50% showed a highly statistically significant difference between probe set algorithms by ANOVA [F(4,10) > 14, p < 0.0001], with weighting by MAS 5.0 detection p-value showing significance in the mouse data by ANOVA [F(1,10) > 9, p < 0.013] and paired t-test [t(9) = -3.675, p = 0.005]. Probe set detection p-value weighting had the greatest positive effect on the performance of the dChip difference model, ProbeProfiler and RMA algorithms. Importantly, probe set algorithms did indeed perform differently depending on the specific project, most probably due to the degree of confounding noise. Our data indicate that significantly improved data analysis of mRNA profile projects can be achieved by matching the choice of probe set algorithm to the noise levels intrinsic to a project, with the dChip difference model with MAS 5.0 detection p-value continuous weighting showing the best overall performance in both projects. Furthermore, both existing and newly developed probe set algorithms should incorporate detection p-value weighting to improve performance. AVAILABILITY: The Hierarchical Clustering Explorer 2.0 is available at http://www.cs.umd.edu/hcil/hce/ Murine arrays (40 samples) are publicly available at the PEPR resource (http://microarray.cnmcresearch.org/pgadatatable.asp http://pepr.cnmcresearch.org Chen et al., 2004).

3.
CONSEL: for assessing the confidence of phylogenetic tree selection.
CONSEL is a program for assessing the confidence of phylogenetic tree selection by computing p-values for the candidate trees. The main thrust of the program is to calculate the p-value of the Approximately Unbiased (AU) test using the multiscale bootstrap technique. This p-value is less biased than other conventional p-values such as the Bootstrap Probability (BP), and those of the Kishino-Hasegawa (KH), Shimodaira-Hasegawa (SH), and Weighted Shimodaira-Hasegawa (WSH) tests. CONSEL calculates all these p-values from the output of phylogeny program packages such as Molphy, PAML, and PAUP*. Furthermore, CONSEL is applicable to a wide class of problems where BPs are available. AVAILABILITY: The programs are written in the C language. The source code for Unix and the executable binary for DOS are found at http://www.ism.ac.jp/~shimo/ CONTACT: shimo@ism.ac.jp

4.
MOTIVATION: Human clinical projects typically require a priori statistical power analyses. Towards this end, we sought to build a flexible and interactive power analysis tool for microarray studies integrated into our public domain HCE 3.5 software package. We then sought to determine if probe set algorithms or organism type strongly influenced power analysis results. RESULTS: The HCE 3.5 power analysis tool was designed to import any pre-existing Affymetrix microarray project, and interactively test the effects of user-defined definitions of alpha (significance), beta (1-power), sample size and effect size. The tool generates a filter for all probe sets or more focused ontology-based subsets, with or without noise filters that can be used to limit analyses of a future project to appropriately powered probe sets. We studied projects from three organisms (Arabidopsis, rat, human), and three probe set algorithms (MAS5.0, RMA, dChip PM/MM). We found large differences in power results based on probe set algorithm selection and noise filters. RMA provided high sensitivity for low numbers of arrays, but this came at a cost of high false positive results (24% false positive in the human project studied). Our data suggest that a priori power calculations are important for both experimental design in hypothesis testing and hypothesis generation, as well as for the selection of optimized data analysis parameters. AVAILABILITY: The Hierarchical Clustering Explorer 3.5 with the interactive power analysis functions is available at www.cs.umd.edu/hcil/hce or www.cnmcresearch.org/bioinformatics. CONTACT: jseo@cnmcresearch.org

5.
6.
A chemiluminescent approach for sequential DNA hybridizations to high-density filter arrays of cDNAs, using a biotin-based random priming method followed by a streptavidin/alkaline phosphatase/CDP-Star detection protocol, is presented. The method has been applied to the Brugia malayi genome project, wherein cDNA libraries, cosmid and bacterial artificial chromosome (BAC) libraries have been gridded at high density onto nylon filters for subsequent analysis by hybridization. Individual probes and pools of rRNA probes, ribosomal protein probes and expressed sequence tag probes show correct specificity and high signal-to-noise ratios even after ten rounds of hybridization, detection, stripping of the probes from the membranes and rehybridization with additional probe sets. This approach provides a subtraction method that leads to a reduction in redundant DNA sequencing, thus increasing the rate of novel gene discovery. The method is also applicable for detecting target sequences, which are present in one or only a few copies per cell; it has proven useful for physical mapping of BAC and cosmid high-density filter arrays, wherein multiple probes have been hybridized at one time (multiplexed) and subsequently "deplexed" into individual components for specific probe localizations.

7.
The analysis of co-occurrence matrices is a common practice for evaluating community structure. The observed data are compared with a "null model", a randomised co-occurrence matrix derived from the observation, using a statistic, e.g. the C-score, sensitive to the pattern investigated. The most frequently used algorithm, "sequential swap", has been criticised for not sampling matrices with equal frequencies, thereby calling into question the results of earlier analyses. The bias of the "sequential swap" algorithm when used with the C-score was assessed by analysing 291 published presence-absence matrices. In 152 cases, the true p-value differed by >5% from the p-value generated by an uncorrected "sequential swap". However, the absolute value of the difference was rather small. Out of the 291 matrices, there were only 5 cases in which an incorrect statistical decision would have been reached by using the uncorrected p-value (3 at the p<0.05 and 2 at the p<0.01 level), and in all 5 of these cases the true p-value was close to the significance level. Our results confirm analytical studies of Miklos and Podani which show that the uncorrected swap gives slightly conservative results in tests for competitive segregation. However, the bias is very small and should not distort the ecological interpretation. We also estimated the number of iterations needed for the "sequential swap" to generate accurate p-values. While most authors do not exceed 10^4 iterations, the suggested minimum number of swaps for 29 of the 291 tested matrices is greater than 10^4. We recommend using 30,000 "sequential swaps" if the required sample size is not assessed otherwise.
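The C-score and the sequential-swap randomization discussed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are our own, and a production analysis would run many randomized matrices and compare their C-scores to the observed one to obtain a p-value.

```python
import numpy as np

def c_score(m):
    """Mean number of 'checkerboard units' over all species (row) pairs.

    For rows i, j with row sums r_i, r_j sharing s_ij sites, the pair
    contributes (r_i - s_ij) * (r_j - s_ij).
    """
    r = m.sum(axis=1)
    s = m @ m.T
    n = m.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (r[i] - s[i, j]) * (r[j] - s[i, j])
    return total / (n * (n - 1) / 2)

def sequential_swap(m, n_swaps=30000, rng=None):
    """Randomize a presence-absence matrix by checkerboard swaps,
    preserving every row and column total."""
    rng = np.random.default_rng(rng)
    m = m.copy()
    n_rows, n_cols = m.shape
    for _ in range(n_swaps):
        i, j = rng.choice(n_rows, 2, replace=False)
        k, l = rng.choice(n_cols, 2, replace=False)
        a, b, c, d = m[i, k], m[i, l], m[j, k], m[j, l]
        # Only a 2x2 checkerboard ([[1,0],[0,1]] or [[0,1],[1,0]])
        # can be flipped without changing any marginal total.
        if a == d and b == c and a != b:
            m[i, k], m[i, l] = b, a
            m[j, k], m[j, l] = d, c
    return m
```

The bias analysed in the abstract concerns how uniformly this swap procedure samples the space of matrices with fixed marginals, not the marginal-preserving property itself, which holds by construction.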

8.
Permutation tests are amongst the most commonly used statistical tools in modern genomic research, a process by which p-values are attached to a test statistic by randomly permuting the sample or gene labels. Yet permutation p-values published in the genomic literature are often computed incorrectly, understated by about 1/m, where m is the number of permutations. The same is often true in the more general situation when Monte Carlo simulation is used to assign p-values. Although the p-value understatement is usually small in absolute terms, the implications can be serious in a multiple testing context. The understatement arises from the intuitive but mistaken idea of using permutation to estimate the tail probability of the test statistic. We argue instead that permutation should be viewed as generating an exact discrete null distribution. The relevant literature, some of which is likely to have been relatively inaccessible to the genomic community, is reviewed and summarized. A computation strategy is developed for exact p-values when permutations are randomly drawn. The strategy is valid for any number of permutations and samples. Some simple recommendations are made for the implementation of permutation tests in practice.
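The correction described above can be illustrated with a difference-in-means permutation test. The function below is a sketch (its name and interface are our own): it counts the observed labelling as one draw from the exact discrete null, giving (b + 1) / (m + 1) rather than the understated b / m, so the estimate can never be zero.

```python
import numpy as np

def permutation_p_value(stat_obs, x, y, n_perm=10000, rng=None):
    """Two-sided permutation p-value for a difference-in-means statistic.

    Uses the (b + 1) / (n_perm + 1) form: the m random permutations
    plus the observed labelling are treated as m + 1 draws from the
    exact discrete null distribution, avoiding the ~1/m understatement
    of the naive b / m estimate.
    """
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x, y]).astype(float)
    n_x = len(x)
    b = 0  # permutations at least as extreme as the observed statistic
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat_perm = pooled[:n_x].mean() - pooled[n_x:].mean()
        if abs(stat_perm) >= abs(stat_obs):
            b += 1
    return (b + 1) / (n_perm + 1)
```

Note the smallest attainable p-value is 1 / (n_perm + 1), which matters when many tests are corrected for multiplicity.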

9.
Statistical analysis of domains in interacting protein pairs
MOTIVATION: Several methods have recently been developed to analyse large-scale sets of physical interactions between proteins in terms of physical contacts between the constituent domains, often with a view to predicting new pairwise interactions. Our aim is to combine genomic interaction data, in which domain-domain contacts are not explicitly reported, with the domain-level structure of individual proteins, in order to learn about the structure of interacting protein pairs. Our approach is driven by the need to assess the evidence for physical contacts between domains in a statistically rigorous way. RESULTS: We develop a statistical approach that assigns p-values to pairs of domain superfamilies, measuring the strength of evidence within a set of protein interactions that domains from these superfamilies form contacts. A set of p-values is calculated for SCOP superfamily pairs, based on a pooled data set of interactions from yeast. These p-values can be used to predict which domains come into contact in an interacting protein pair. This predictive scheme is tested against protein complexes in the Protein Quaternary Structure (PQS) database, and is used to predict domain-domain contacts within 705 interacting protein pairs taken from our pooled data set.

10.
Pvclust: an R package for assessing the uncertainty in hierarchical clustering
SUMMARY: Pvclust is an add-on package for the statistical software R that assesses the uncertainty in hierarchical cluster analysis. Pvclust can be used easily for general statistical problems, such as DNA microarray analysis, to perform bootstrap analysis of clustering, which has been popular in phylogenetic analysis. Pvclust calculates probability values (p-values) for each cluster using bootstrap resampling techniques. Two types of p-value are available: the approximately unbiased (AU) p-value and the bootstrap probability (BP) value. Multiscale bootstrap resampling is used for the calculation of the AU p-value, which is less biased than the BP value calculated by ordinary bootstrap resampling. In addition, computation time can be greatly reduced with the parallel computing option.

11.
Environmental DNA (eDNA) is a promising tool for rapid and noninvasive biodiversity monitoring. eDNA density is low in environmental samples, and a capture method, such as filtration, is often required to concentrate eDNA for downstream analyses. In this study, six treatments, with differing filter types and pore sizes for eDNA capture, were compared for their efficiency and accuracy in assessing fish community structure with known fish abundance and biomass via eDNA metabarcoding. Our results showed that the different filters (with the exception of 20-μm large-pore filters) were broadly consistent in their DNA capture ability. The 0.45-μm filters performed best in terms of total DNA yield, probability of species detection, repeatability within ponds and consistency between ponds. However, the performance of 0.45-μm filters was only marginally better than that of 0.8-μm filters, while filtration time was significantly longer. Given this trade-off, the 0.8-μm filter is the optimal pore size of membrane filter for the turbid, eutrophic and high-fish-density ponds analysed here. The 0.45-μm Sterivex enclosed filters performed reasonably well and are suitable in situations where on-site filtration is required. Finally, prefilters should be applied only if absolutely essential for reducing the filtration time or increasing the throughput volume of the capture filters. In summary, we found encouraging similarity in the results obtained from different filtration methods, but the optimal pore size or filter type may strongly depend on the water type under study.

12.
13.
Pay-for-performance programs often aim to improve the management of chronic diseases. We evaluate the impact of a local pay-for-performance programme (QOF+), which financially rewarded more ambitious quality targets ('stretch targets') than those used nationally in the Quality and Outcomes Framework (QOF). We focus on targets for intermediate outcomes in patients with cardiovascular disease and diabetes. A difference-in-difference approach is used to compare practice-level achievements before and after the introduction of the local pay-for-performance program. In addition, we analysed patient-level data on exception reporting and intermediate outcomes utilizing an interrupted time series analysis. The local pay-for-performance program led to significantly higher target achievements (hypertension: p-value <0.001, coronary heart disease: p-values <0.001, diabetes: p-values <0.061, stroke: p-values <0.003). However, the increase was driven by higher rates of exception reporting (hypertension: p-value <0.001, coronary heart disease: p-values <0.03, diabetes: p-values <0.05) in patients with all conditions except for stroke. Exception reporting allows practitioners to exclude patients from target calculations if certain criteria are met, e.g. informed dissent of the patient for treatment. There were no statistically significant improvements in mean blood pressure, cholesterol or HbA1c levels. Thus, achievement of higher payment thresholds in the local pay-for-performance scheme was mainly attributable to increased exception reporting by practices, with no discernible improvements in overall clinical quality. Hence, active monitoring of exception reporting should be considered when setting more ambitious quality targets. More generally, the study suggests a trade-off between additional incentives for better care and monitoring costs.

14.
The assessment of the effectiveness of a treatment in a clinical trial depends on calculating p-values. However, p-values are only indirect and partial indicators of a genuine effect. Particularly in situations where publication bias is very likely, assessment using a p-value of 0.05 may not be sufficiently cautious. In other situations it seems reasonable to believe that assessment based on p-values may be unduly conservative. Assessments could be improved by using prior information. This implies using a Bayesian approach to take account of prior probability. However, the use of prior information in the form of expert opinion can introduce bias. A method is given here that applies to assessments already included or likely to be included in the Cochrane Collaboration, excluding those reviews concerning new drugs. This method uses prior information and a Bayesian approach, but the prior information comes not from expert opinion but simply from the distribution of effectiveness apparent in a random sample of summary statistics in the Cochrane Collaboration. The method takes certain types of summary statistics and their confidence intervals and, with the help of a graph, translates these into probabilities that the treatments being trialled are effective.

15.
In order to detect linkage of the simulated complex disease Kofendrerd Personality Disorder across studies from multiple populations, we performed a genome scan meta-analysis (GSMA). Using the 7-cM microsatellite map, nonparametric multipoint linkage analyses were performed separately on each of the four simulated populations to determine p-values. The genome of each population was divided into 20-cM bin regions, and each bin was rank-ordered based on the most significant linkage p-value for that population in that region. The bin ranks were then averaged across all four studies to determine the most significant 20-cM regions over all studies. Statistical significance of the averaged bin ranks was determined from a normal distribution of randomly assigned rank averages. To narrow the region of interest for fine-mapping, the meta-analysis was repeated two additional times, with the 20-cM bins offset by 7 cM and 13 cM, respectively, creating regions of overlap with the original method. The 6-7 cM shared regions, where the highest averaged 20-cM bins from each of the three offsets overlap, designate the minimum region of maximum significance (MRMS). Application of the GSMA-MRMS method revealed genome-wide significance (p-values refer to the average rank assigned to the bin) at regions including or adjacent to all of the simulated disease loci: chromosome 1 (p < 0.0001 for 160-167 cM, including D1), chromosome 3 (p < 0.0000001 for 287-294 cM, including D2), chromosome 5 (p < 0.001 for 0-7 cM, including D3), and chromosome 9 (p < 0.05 for 7-14 cM, the region adjacent to D4). This GSMA analysis approach demonstrates the power of linkage meta-analysis to detect multiple genes simultaneously for a complex disorder. The MRMS method enhances this powerful tool to focus on more localized regions of linkage.
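The core bin-ranking step of the GSMA can be sketched numerically. The helper below is hypothetical (name and interface are our own); it follows the convention that, within each study, the bin with the most significant p-value receives the highest rank, and ranks are then averaged across studies.

```python
import numpy as np

def gsma_average_ranks(bin_pvalues):
    """Average GSMA bin ranks across studies.

    bin_pvalues: array of shape (n_studies, n_bins) holding, for each
    study, the most significant linkage p-value observed in each bin.
    Within each study, bins are ranked so that the smallest p-value
    gets the largest rank (n_bins); ranks are averaged across studies.
    """
    p = np.asarray(bin_pvalues, dtype=float)
    n_bins = p.shape[1]
    # argsort of argsort yields 0-based positions in ascending order
    # of p-value; invert so the most significant bin ranks highest.
    order = p.argsort(axis=1).argsort(axis=1)
    ranks = n_bins - order
    return ranks.mean(axis=0)
```

Significance of the averaged ranks would then be assessed against the distribution of rank averages under random assignment, as the abstract describes.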

16.
In this report, we describe the results of an extensive investigation of the effects of the conformations of proteins on the solvency of the bulk-phase water in which the proteins are dissolved. The concentrations of the proteins used were usually between 20 and 40%; the temperature was 25 ± 1 °C. To probe the solvency of the water, the apparent equilibrium distribution coefficients (or p-values) of 4 solutes were studied: Na+ (sulfate), glycine, sucrose, and urea. From 8 to 14 isolated proteins in three types of conformations were investigated: native; denatured by agents that unravel the secondary structure (e.g., alpha-helix, beta-pleated sheet) of the protein (i.e., 9 M urea, 3 M guanidine HCl); and denatured by agents that only disrupt the tertiary structure but leave the secondary structure intact or even strengthened (i.e., 0.1 M sodium dodecylsulfate or SDS, 2 M n-propanol). The results are as follows: (1) as a rule, native proteins have no or only a weak effect on the solvency of the water for all 4 probes; (2) exposure to 0.1 M SDS and to 2 M n-propanol, as a rule, does not significantly decrease the p-value of any of the 4 probes; (3) exposure to 9 M urea and to 3 M guanidine HCl consistently lowers the p-values of sucrose, glycine and Na+ (sulfate) and equally consistently produces no effect on the p-value of urea. Sucrose, glycine, and Na+ are found in low concentrations in cell water while urea is not. These experiments were designed and carried out primarily to test two subsidiary theories of the AI hypothesis: the polarized multilayer (PM) theory of cell water, and the theory of size-dependent solute exclusion. (ABSTRACT TRUNCATED AT 400 WORDS)

17.
Expression levels of mRNAs are, among other factors, regulated by microRNAs. A particular microRNA can bind specifically to several target mRNAs and lead to their degradation. Expression levels of both mRNAs and microRNAs can be obtained by microarray experiments. In order to increase the power of detecting microRNAs that are differentially expressed between two different groups of samples, we incorporate expression levels of their related target gene sets. Group effects are determined individually for each microRNA, and by enrichment tests and global tests for target gene sets. The resulting lists of p-values from individual and set-wise testing are combined by means of meta-analysis. We propose a new approach to connect microRNA-wise and gene-set-wise information by means of p-value combination, as often used in meta-analysis. In this context, we evaluate the usefulness of different approaches to gene set tests. In a simulation study we show that our combination approach is more powerful than microRNA-wise testing alone. Furthermore, we show that combining microRNA-wise results with 'competitive' gene set tests maintains a pre-specified false discovery rate. In contrast, a combination with 'self-contained' gene set tests can harm the false discovery rate, particularly when gene sets are not disjoint.

18.
Summary: A new type of fluorescence cytophotometer has been developed for multi-parameter cell analysis (Olympus BH2-QRFL). For multi-color fluorescence cytophotometry, this instrument is equipped with four sets of interchangeable filters, each consisting of an excitation filter, a dichroic mirror with a barrier filter, and a measuring filter. To permit automatic operation of the filter sets, the cytophotometer is connected on-line with a personal computer (HP 85F). A desired sequence of filter sets can be memorized in the software, and multiple cellular constituents can be rapidly and consecutively determined on a single-cell basis. All data are stored in the same computer and can be retrieved for further statistical analysis and display, either in tabular form or as a histogram, correlation histogram, two-dimensional scatter plot, or two-dimensional frequency distribution histogram, on the CRT (cathode-ray tube) with simultaneous hard copy. As an example of multiparameter cell analysis, combined protein and DNA measurements were performed on normal, borderline, and cancerous gynecological cytology specimens using the ninhydrin-Schiff and Feulgen techniques.

19.
A three-stage algorithm for filtering erroneous Argos satellite locations
Several methods have been used to identify erroneous animal locations based on Argos satellite data. Using 15,987 satellite locations for 37 gray seals (Halichoerus grypus), we tested a three-stage filtering algorithm designed to address shortcomings of other filters. In stage 1, for each location, four rates of travel were calculated: the rate to each of the two previous locations and the two subsequent locations. If all four rates exceeded 2 m/sec (95th percentile of our data), the location was removed (7.25% of total locations). Stage 2 incorporated the filtering algorithm developed by McConnell et al. (1992), resulting in the rejection of 22.75% of total locations based on reasonable assumptions of straight-line travel. At stage 3, the remaining data were evaluated against a distance threshold, defined as the 99th percentile of realized distance traveled over a period of seven days. Locations exceeding this threshold were rejected (0.69% of total locations). Overall, the three-stage filter eliminated fewer locations (30.7 ± 1.62%) than the stage 2 filter alone. Most standard locations were retained, as were 85.7% of location class 0, 76.6% of class A, and 41.9% of class B. These location classes account for most of the data routinely collected but not used.
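Stage 1 of the filter can be sketched as follows. This is an illustrative reconstruction, not the authors' code: for simplicity it assumes projected (x, y) coordinates in metres, whereas real Argos work would use great-circle distances between latitude/longitude fixes.

```python
import numpy as np

def stage1_rate_filter(times, positions, max_rate=2.0):
    """Flag locations whose travel rates to the two previous and two
    subsequent locations ALL exceed max_rate (m/s).

    times: seconds; positions: (x, y) metres in a projected coordinate
    system (an assumption of this sketch). Returns a boolean keep-mask.
    """
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    n = len(times)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        neighbours = [k for k in (i - 2, i - 1, i + 1, i + 2) if 0 <= k < n]
        rates = []
        for k in neighbours:
            dt = abs(times[k] - times[i])
            if dt == 0:
                continue  # duplicate timestamps give no usable rate
            dist = np.linalg.norm(positions[k] - positions[i])
            rates.append(dist / dt)
        # Reject only if every computable neighbour rate is implausible.
        if rates and all(r > max_rate for r in rates):
            keep[i] = False
    return keep
```

Requiring all four rates to be implausible keeps genuine fast movements between two fixes while removing isolated spikes, which is the shortcoming of single-rate filters the abstract alludes to.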

20.
We study an adaptive statistical approach to analyzing brain networks represented by brain connection matrices of interregional connectivity (connectomes). Our approach sits at a middle level between a global analysis and an analysis of single connections, by considering subnetworks of the global brain network. These subnetworks represent either the inter-connectivity between two brain anatomical regions or the intra-connectivity within the same brain anatomical region. An appropriate summary statistic, one that characterizes a meaningful feature of the subnetwork, is evaluated. Based on this summary statistic, a statistical test is performed to derive the corresponding p-value. Reformulating the problem in this way reduces the number of statistical tests in an orderly fashion based on our understanding of the problem. Considering the global testing problem, the p-values are corrected to control the rate of false discoveries. Finally, the procedure is followed by a local investigation within the significant subnetworks. We contrast this strategy with one based on the individual measures in terms of power. We show that this strategy has great potential, in particular in cases where the subnetworks are well defined and the summary statistics are properly chosen. As an application example, we compare structural brain connection matrices of two groups of subjects with the 22q11.2 deletion syndrome, distinguished by their IQ scores.
