Similar literature (20 records)
1.
Microarrays allow researchers to measure the expression of thousands of genes in a single experiment. Before statistical comparisons can be made, the data must be assessed for quality and normalisation procedures must be applied, of which many have been proposed. Methods of comparing the normalised data are also abundant, and no clear consensus has yet been reached. The purpose of this paper was to compare the methods used by the EADGENE network on a very noisy simulated data set. With a priori knowledge of which genes are differentially expressed, it is possible to compare the success of each approach quantitatively. Use of an intensity-dependent normalisation procedure was common, as was correction for multiple testing. Most of the variation in performance resulted from differing approaches to data quality and the use of different statistical tests. Very few of the methods used any kind of background correction. A number of approaches achieved a success rate of 95% or above, with relatively small numbers of false positives and negatives. Applying stringent spot selection criteria and eliminating data did not improve the false positive rate and greatly increased the false negative rate. However, most approaches performed well, and it is encouraging that widely available techniques can achieve such good results on a very noisy data set.
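With the simulation's truth known, each method's gene calls can be scored directly. A minimal sketch of such scoring (hypothetical boolean arrays; not the EADGENE evaluation code):

```python
import numpy as np

def error_rates(truth, called):
    """Score a method's calls against the known truth of the simulation.
    truth, called: boolean arrays; True = differentially expressed
    (truth) or flagged as differentially expressed (called)."""
    truth, called = np.asarray(truth), np.asarray(called)
    tp = np.sum(truth & called)        # true positives
    fp = np.sum(~truth & called)       # false positives
    fn = np.sum(truth & ~called)       # false negatives
    tn = np.sum(~truth & ~called)      # true negatives
    return {
        "success_rate": (tp + tn) / truth.size,
        "false_positive_rate": fp / max(fp + tn, 1),
        "false_negative_rate": fn / max(tp + fn, 1),
    }
```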

2.
When detecting positive selection in proteins, the prevalence of errors resulting from misalignment and the ability of alignment filters to mitigate such errors are not well understood, but filters are commonly applied to try to avoid false positive results. Focusing on the sitewise detection of positive selection across a wide range of divergence levels and indel rates, we performed simulation experiments to quantify the false positives and false negatives introduced by alignment error and the ability of alignment filters to improve performance. We found that some aligners led to many false positives, whereas others resulted in very few. False negatives were a problem for all aligners, increasing with sequence divergence. Of the aligners tested, PRANK's codon-based alignments consistently performed the best and ClustalW performed the worst. Of the filters tested, GUIDANCE performed the best and Gblocks performed the worst. Although some filters showed good ability to reduce the error rates from ClustalW and MAFFT alignments, none were found to substantially improve the performance of PRANK alignments under most conditions. Our results revealed distinct trends in error rates and power levels for aligners and filters within a biologically plausible parameter space. With the best aligner, a low false positive rate was maintained even with extremely divergent indel-prone sequences. Controls using the true alignment and an optimal filtering method suggested that performance improvements could be gained by improving aligners or filters to reduce the prevalence of false negatives, especially at higher divergence levels and indel rates.

3.
A severe drawback in the high-throughput screening (HTS) process is the unintentional (random) presence of false positives and negatives. Their rates depend, among other factors, on the screening process being applied and the target class. Although false positives can be sorted out in subsequent process steps, their occurrence can lead to increased project cost. More fundamentally, it is not possible to rescue false nonhits. In this article, we investigate the prediction of the primary hit rate, hit confirmation rate, and false-positive and false-negative rates. We consider results for approximately 2800 compounds tested as a pilot screen ahead of the primary screening work. The pilot screen was run at several concentrations and in replicates. The rates are predicted as a function of the proposed hit threshold by having the replicates serve as each other's confirmers, and confidence limits for the prediction are attached by means of a resampling scheme. A comparison of the rates resulting from the resampling with the primary hit rate and the confirmation rates obtained during the screening campaign shows how accurate this method is. Hence, the "optimal" compound concentration for the screen as well as the optimal hit threshold corresponding to low false rates can be determined prior to starting the subsequent screening campaign.
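As an illustration of replicates confirming each other with resampled confidence limits, a sketch (hypothetical numpy arrays `rep1`, `rep2` of per-compound activity readouts; the paper's exact resampling scheme is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

def confirmation_rate(rep1, rep2, threshold):
    """Replicates serve as each other's confirmers: a compound is a
    primary hit if rep1 >= threshold, and confirmed if rep2 also is."""
    hits = rep1 >= threshold
    confirmed = hits & (rep2 >= threshold)
    return confirmed.sum() / max(hits.sum(), 1)

def bootstrap_ci(rep1, rep2, threshold, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence limits on the confirmation rate."""
    n = len(rep1)
    rates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)     # resample compounds with replacement
        rates.append(confirmation_rate(rep1[idx], rep2[idx], threshold))
    lo, hi = np.quantile(rates, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```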

4.
Environmental DNA (eDNA) metabarcoding is increasingly used to study present and past biodiversity. eDNA analyses often rely on amplification of very small quantities of DNA or of degraded DNA. To avoid missing taxa that are actually present (false negatives), multiple extractions and amplifications of the same samples are often performed. However, the level of replication needed for reliable estimates of presence/absence patterns remains an unaddressed topic. Furthermore, degraded DNA and PCR/sequencing errors might produce false positives. We used simulations and empirical data to evaluate the level of replication required for accurate detection of targeted taxa in different contexts and to assess the performance of methods used to reduce the risk of false detections. Furthermore, we evaluated whether statistical approaches developed to estimate occupancy in the presence of observational errors can successfully estimate true prevalence, detection probability and false-positive rates. Replication reduced the rate of false negatives; the optimal level of replication was strongly dependent on the detection probability of taxa. Occupancy models successfully estimated true prevalence, detection probability and false-positive rates, but their performance increased with the number of replicates. At least eight PCR replicates should be performed if detection probability is not high, such as in ancient DNA studies. Multiple DNA extractions from the same sample yielded consistent results; in some cases, collecting multiple samples from the same locality allowed detection of more species. The optimal level of replication for accurate species detection varies strongly among studies and could be explicitly estimated to improve the reliability of results.
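The dependence of the optimal replication level on detection probability follows from a simple calculation, assuming independent replicates and no false positives (a deliberate simplification of the occupancy machinery used in the study):

```python
import math

def replicates_needed(p_detect, target=0.95):
    """Smallest number of PCR replicates such that a taxon present in
    the sample is detected at least once with probability >= target:
    1 - (1 - p_detect)**n >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_detect))

# e.g., per-replicate detection probability 0.3, as might occur with
# degraded/ancient DNA:
print(replicates_needed(0.3))   # -> 9, in line with the "at least eight"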

5.
Acquisition of microarray data is prone to systematic errors. A correction, called normalisation, must be applied to the data before further analysis is performed. With many normalisation techniques published and in use, the best way of executing this correction remains an open question. In this study, a variety of single-slide normalisation techniques, and different parameter settings for these techniques, were compared over many replicated microarray experiments. Different normalisation techniques were assessed through the distribution of the standard deviation of replicates from one biological sample across different slides. It is shown that local normalisation outperformed global normalisation, and intensity-based 'LOWESS' outperformed trimmed mean and median normalisation techniques. Overall, the top performing normalisation technique was a print-tip-based LOWESS with zero robust iterations. Lastly, we validated this evaluation methodology by examining the ability to predict oestrogen receptor-positive and -negative breast cancer samples with data that had been normalised using different techniques.
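In the spirit of the top performer, a print-tip LOWESS with zero robust iterations (`it=0`) can be sketched with statsmodels; the function and argument names below are this sketch's, not the study's:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def printtip_lowess_normalise(red, green, tip, frac=0.4):
    """Subtract a per-print-tip LOWESS fit of M on A from the log-ratios.
    red, green: channel intensities per spot; tip: print-tip label per spot."""
    M = np.log2(red) - np.log2(green)          # log-ratio
    A = 0.5 * (np.log2(red) + np.log2(green))  # mean log-intensity
    M_norm = np.empty_like(M)
    for t in np.unique(tip):
        sel = tip == t
        # it=0 corresponds to zero robust iterations
        fit = lowess(M[sel], A[sel], frac=frac, it=0, return_sorted=False)
        M_norm[sel] = M[sel] - fit
    return M_norm, A
```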

6.
Statistical methods for ranking differentially expressed genes
In the analysis of microarray data the identification of differential expression is paramount. Here I outline a method for finding an optimal test statistic with which to rank genes with respect to differential expression. Tests of the method show that it allows generation of top gene lists that give few false positives and few false negatives. Estimation of the false-negative as well as the false-positive rate lies at the heart of the method.
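The abstract does not spell out the statistic itself; as a stand-in, a SAM-style penalised t-statistic illustrates ranking genes while damping small-variance artefacts (a sketch, not the author's method):

```python
import numpy as np

def ranked_genes(x_treat, x_ctrl, s0=None):
    """Rank genes by a penalised t-statistic d = diff / (s + s0).
    x_treat, x_ctrl: genes x replicates arrays of log-expression."""
    diff = x_treat.mean(axis=1) - x_ctrl.mean(axis=1)
    s = np.sqrt(x_treat.var(axis=1, ddof=1) / x_treat.shape[1]
                + x_ctrl.var(axis=1, ddof=1) / x_ctrl.shape[1])
    if s0 is None:
        s0 = np.median(s)              # one simple choice of fudge factor
    d = diff / (s + s0)
    return np.argsort(-np.abs(d))      # most differential genes first
```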

7.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and how this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be planned carefully so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. DISCUSSION: The tools of statistical experimental design can be applied to microarray experiments to improve both the efficiency and the validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
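Why whole-array technical replication is inefficient when arrays dominate cost can be seen from a two-component error model (my notation, a simplification of the paper's full model):

```latex
% n_b biological replicates, each hybridised to n_t technical replicate arrays:
\operatorname{Var}(\bar{y}) \;=\; \frac{\sigma_b^{2}}{n_b} \;+\; \frac{\sigma_t^{2}}{n_b\,n_t}
% Extra technical replicates shrink only the second term, while each extra
% biological replicate shrinks both; so arrays spent on n_t buy little once
% \sigma_t^2/(n_b n_t) is small relative to \sigma_b^2/n_b.
```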

8.
Though surface electromyography (EMG) has been widely used in studies of occupational exposure, its precision in terms of between-days and between-subjects variance has seldom been evaluated. This study aimed at such an evaluation. Six women performed three different work tasks: 'materials picking', 'light assembly', and 'heavy assembly', repeated on 3 different days. EMG was recorded from m. trapezius, m. infraspinatus and the forearm extensors. Normalisation was performed against a maximal (MVE) and a submaximal (RVE) reference contraction. Variance components between days (within subjects) and between subjects were derived for the 10th, 50th and 90th percentiles, as well as for muscular rest parameters. For the task 'heavy assembly', the coefficient of variation between days (CV(BD)) was 8% for m. trapezius (right side, 50th percentile, MVE-normalised values). Larger variabilities were found for m. infraspinatus (CV(BD) 15%) and the forearm extensors (CV(BD) 33%). Between-subjects variability (CV(BS)) was greater: 16% for m. trapezius, 57% for m. infraspinatus, and 29% for the forearm extensors. RVE normalisation resulted in larger CV(BD) while reducing CV(BS). The between-days and between-subjects variability may be used to optimise sampling strategy and to assess bias in epidemiological studies. The bias caused by measurement procedures per se is acceptable.
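The reported CVs follow from a variance-components decomposition (a standard formulation consistent with the abstract, not necessarily the authors' exact model):

```latex
% Exposure percentile x_{ij} for subject i on day j:
x_{ij} = \mu + s_i + d_{ij}, \qquad
s_i \sim (0,\,\sigma^2_{BS}), \quad d_{ij} \sim (0,\,\sigma^2_{BD})
% Between-days and between-subjects coefficients of variation:
CV_{BD} = \sigma_{BD}/\mu, \qquad CV_{BS} = \sigma_{BS}/\mu
```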

9.
MOTIVATION: Experimental limitations in high-throughput protein-protein interaction detection methods have resulted in low quality interaction datasets that contained sizable fractions of false positives and false negatives. Small-scale, focused experiments are then needed to complement the high-throughput methods to extract true protein interactions. However, the naturally vast interactomes would require much more scalable approaches. RESULTS: We describe a novel method called IRAP* as a computational complement for repurification of the highly erroneous experimentally derived protein interactomes. Our method involves an iterative process of removing interactions that are confidently identified as false positives and adding interactions detected as false negatives into the interactomes. Identification of both false positives and false negatives is performed in IRAP* using interaction confidence measures based on network topological metrics. Potential false positives are identified amongst the detected interactions as those with very low computed confidence values, while potential false negatives are discovered as the undetected interactions with high computed confidence values. Our results from applying IRAP* on large-scale interaction datasets generated by the popular yeast-two-hybrid assays for yeast, fruit fly and worm showed that the computationally repurified interaction datasets contained potentially lower fractions of false positive and false negative errors based on functional homogeneity. AVAILABILITY: The confidence indices for PPIs in yeast, fruit fly and worm as computed by our method can be found at our website http://www.comp.nus.edu.sg/~chenjin/fpfn.
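The abstract does not specify IRAP*'s topological metric; a generic shared-neighbour confidence makes the iterate-remove-add loop concrete (a sketch with illustrative thresholds, not the published algorithm):

```python
def confidence(adj, u, v):
    """Shared-neighbour confidence for a candidate interaction (u, v).
    adj: dict mapping protein name -> set of interaction partners."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    denom = len(nu) + len(nv)
    return 2 * len(nu & nv) / denom if denom else 0.0

def repurify(adj, low=0.05, high=0.8, rounds=3):
    """Iteratively drop detected edges with very low confidence (likely
    false positives) and add undetected pairs with high confidence
    (likely false negatives)."""
    for _ in range(rounds):
        edges = {(u, v) for u in adj for v in adj[u] if u < v}
        drop = {e for e in edges if confidence(adj, *e) < low}
        add = {(u, v) for u in adj for v in adj
               if u < v and v not in adj[u]
               and confidence(adj, u, v) >= high}
        for u, v in drop:
            adj[u].discard(v); adj[v].discard(u)
        for u, v in add:
            adj[u].add(v); adj[v].add(u)
    return adj
```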

10.
Amniotic-fluid intestinal alkaline phosphatase activity, gamma-glutamyltranspeptidase activity, and leucine-aminopeptidase activity were quantitated to assess their reliability for the prenatal diagnosis of cystic fibrosis. The results indicate that for each of these enzymes an arbitrary cutoff point could be chosen that would enable one to correctly predict the outcome for the majority of at-risk pregnancies. However, some false positives and false negatives occurred with each enzyme. To obtain optimal diagnostic discrimination, the three enzyme values obtained for each sample were combined into a single linear discriminant function that proved to be a more accurate indicator of the outcome of the pregnancy. This was especially important for those cases in which the predicted outcomes based on the individual enzyme results were in disagreement. From the cases studied here, it appears that this method can be expected to give a correct prediction in approximately 96.5% of all 25%-at-risk pregnancies. False positives can be expected in approximately 1.4% of the pregnancies and false negatives in approximately 2.2%.
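A modern stand-in for combining the three enzyme values into a single linear discriminant function, using scikit-learn on illustrative numbers (not the study's data or coefficients):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: rows = amniotic-fluid samples, columns = the three
# enzyme activities (intestinal ALP, GGT, LAP); y = 1 for an affected
# pregnancy.  Values are illustrative only.
X = np.array([[1.2, 0.8, 1.1],
              [1.0, 0.9, 1.3],
              [0.3, 0.2, 0.4],
              [0.4, 0.3, 0.5]])
y = np.array([0, 0, 1, 1])

lda = LinearDiscriminantAnalysis().fit(X, y)
# One discriminant score replaces three separate per-enzyme cut-offs,
# which is what resolves cases where the individual enzymes disagree.
print(lda.predict([[0.35, 0.25, 0.45]]))   # -> [1]
```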

11.
Comptes Rendus Palevol, 2013, 12(6): 381-387
I propose an approach to identify, among several strategies of phylogenetic analysis, those producing the most accurate results. This approach is based on the hypothesis that the more a result is reproduced from independent data, the more it reflects the historical signal common to the analysed data. Under this hypothesis, the capacity of an analytical strategy to extract historical signal should correlate positively with the coherence of the obtained results. I apply this approach to a series of analyses of empirical data, basing the coherence measure on the Robinson–Foulds distances between the obtained trees. To a first approximation, the analytical strategies most suitable for the data produce the most coherent results. However, risks of false positives and false negatives are identified, and these are difficult to rule out.
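A coherence measure of this kind could be computed with DendroPy, assuming the trees from one strategy share a taxon set (`newick_trees` is a hypothetical input, and averaging pairwise distances is this sketch's choice):

```python
from itertools import combinations

import dendropy
from dendropy.calculate import treecompare

def coherence(newick_trees):
    """Mean pairwise Robinson-Foulds distance among the trees one
    analytical strategy produces from independent data sets
    (lower = more coherent).  newick_trees: Newick strings, same taxa."""
    tns = dendropy.TaxonNamespace()   # shared namespace is required
    trees = [dendropy.Tree.get(data=s, schema="newick", taxon_namespace=tns)
             for s in newick_trees]
    pairs = list(combinations(trees, 2))
    return sum(treecompare.symmetric_difference(a, b)
               for a, b in pairs) / len(pairs)
```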

12.
The ability of plant genotoxicity assays to predict carcinogenicity
A number of assays have been developed which use higher plants for measuring mutagenic or cytogenetic effects of chemicals, as an indication of carcinogenicity. Plant assays require less extensive equipment, materials and personnel than most other genotoxicity tests, which is a potential advantage, particularly in less developed parts of the world. We have analyzed data on 9 plant genotoxicity assays evaluated by the Gene-Tox program of the U.S. Environmental Protection Agency, using methodologies we have recently developed to assess the capability of assays to predict carcinogenicity and carcinogenic potency. All 9 of the plant assays appear to have high sensitivity (few false negatives). Specificity (rate of true negatives) was more difficult to evaluate because of limited testing on non-carcinogens; however, available data indicate that only the Arabidopsis mutagenicity (ArM) test appears to have high specificity. Based upon their high sensitivity, plant genotoxicity tests are most appropriate for a risk-averse testing program, because although many false positives will be generated, the relatively few negative results will be quite reliable.
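For reference, the two rates being contrasted (standard definitions):

```latex
\text{sensitivity} = \frac{TP}{TP + FN} \qquad
\text{specificity} = \frac{TN}{TN + FP}
% High sensitivity = few carcinogens missed; specificity is hard to pin
% down here because few non-carcinogens were tested.
```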

13.
Although the aldosterone/renin ratio (ARR) is the most reliable screening test for primary aldosteronism, false positives and negatives occur. Dietary salt restriction, concomitant malignant or renovascular hypertension, pregnancy and treatment with diuretics (including spironolactone), dihydropyridine calcium blockers, angiotensin converting enzyme inhibitors, and angiotensin receptor antagonists can produce false negatives by stimulating renin. We recently reported that selective serotonin reuptake inhibitors lower the ratio. Because potassium regulates aldosterone, uncorrected hypokalemia can lead to false negatives. Beta-blockers, alpha-methyldopa, clonidine, and nonsteroidal anti-inflammatory drugs suppress renin, raising the ARR with potential for false positives. False positives may also occur in patients with renal dysfunction or advancing age. We recently showed that (1) females have higher ratios than males, and (2) false positive ratios can occur during the luteal menstrual phase and while taking an oral ethynylestradiol/drospirenone (but not implanted subdermal etonogestrel) contraceptive, but only if the ratio is calculated using direct renin concentration rather than plasma renin activity. Where feasible, diuretics should be ceased at least 6 weeks, and other interfering medications at least 2 weeks, before ARR measurement, substituting noninterfering agents (e.g., verapamil slow-release ± hydralazine, and prazosin or doxazosin) where required. Hypokalemia should be corrected and a liberal salt diet encouraged. Collecting blood midmorning from seated patients following 2-4 h of upright posture improves sensitivity. The ARR is a screening test only and should be repeated once or more before deciding whether to proceed to confirmatory suppression testing. Liquid chromatography-tandem mass spectrometry aldosterone assays represent a major advance towards addressing inaccuracies inherent in other available methods.

14.
Normalisation of data to minimise the impact of technical variation on comparative sample analysis is often carried out. Using SELDI data as a model, we have examined the effects of normalisation by total ion current (TIC), which is commonly used for MS data. Significant intergroup differences in the normalisation factor were found for serum profiles that could not be explained by experimental factors, implying that normalisation by TIC may in some situations also normalise away biological differences and should be systematically evaluated.
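TIC normalisation itself is a one-line rescaling, and the per-spectrum factors it produces are exactly the quantities whose intergroup differences the study examines. A sketch (plain array layout assumed, not a SELDI-specific API):

```python
import numpy as np

def tic_normalise(spectra):
    """Scale each spectrum so its total ion current (summed intensity)
    equals the mean TIC across spectra.
    spectra: 2-D array, one row per spectrum."""
    tic = spectra.sum(axis=1)
    factors = tic.mean() / tic          # per-spectrum normalisation factor
    return spectra * factors[:, None], factors
    # Comparing `factors` between sample groups before applying them is
    # the systematic evaluation the abstract recommends.
```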

15.
A comparison was made of four statistically based schemes for classifying epithelial cells from 243 fine needle aspirates of breast masses as benign or malignant. Two schemes were computer-generated decision trees and two were user-generated. Eleven cytologic characteristics described in the literature as being useful in distinguishing benign from malignant breast aspirates were assessed on a scale of 1 to 10, with 1 being closest to that described as benign and 10 closest to that described as malignant. The original computer-generated dichotomous decision tree gave 6 false negatives and 12 false positives on the data set; another tree generated from the current data improved performance slightly, with 5 false negatives and 10 false positives. Maximum diagnostic overlap occurred at the cut-point of the original dichotomous tree. The insertion of a third node evaluating additional parameters resulted in one false negative and seven false positives. This performance was matched by summing the scores of the eight characteristics that individually were most effective in separating benign from malignant. We conclude that, while statistically designed, computer-generated dichotomous decision trees identify a starting sequence for applying cytologic characteristics to distinguish between benign and malignant breast aspirates, modifications based on human expert knowledge may result in schemes that improve diagnostic performance.
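The sum-of-scores rule that matched the three-node tree is simple enough to state directly; a sketch (the cutoff below is a placeholder, not the paper's):

```python
import numpy as np

def sum_score_classifier(scores, cutoff=40):
    """Sum the 1-10 scores of the eight most discriminating cytologic
    characteristics and call malignant above the cutoff.
    scores: samples x 8 array of characteristic scores."""
    total = np.asarray(scores).sum(axis=1)
    return np.where(total > cutoff, "malignant", "benign")
```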

16.
Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPI networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. The validation on two human PPI datasets of high quality supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability of predicting large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up PPI prediction with high quality. Such reliable prediction may provide a solid platform for other studies such as protein function prediction and the roles of PPIs in disease susceptibility.
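The core contrast with naïve Bayes can be seen in a weighted log-odds combination: naïve Bayes fixes all source weights at one, whereas NBEL in effect learns them. A sketch of the combination step only (not the nonparametric machinery itself):

```python
import numpy as np

def weighted_log_odds(log_lr, weights, prior_odds):
    """Combine per-source evidence for one candidate PPI.
    log_lr: array of log-likelihood ratios, one per data source;
    weights: one weight per source (all 1.0 recovers naive Bayes;
    an ensemble learner can up-weight informative sources and
    down-weight biased ones).  Returns P(true interaction)."""
    logit = np.log(prior_odds) + np.dot(weights, log_lr)
    return 1.0 / (1.0 + np.exp(-logit))

# e.g. three sources, the second deemed twice as informative:
print(weighted_log_odds(np.array([1.2, 0.4, -0.3]),
                        np.array([1.0, 2.0, 1.0]),
                        prior_odds=0.01))
```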

17.
18.
Large-scale presence-absence monitoring programs have great promise for many conservation applications. Their value can be limited by incorrect inferences owing to observational errors, especially when data are collected by the public. To combat this, previous analytical methods have focused on addressing non-detection in public survey data. Misclassification errors have received less attention but are also likely to be a common component of public surveys, as well as of many other data types. We derive estimators for dynamic occupancy parameters (extinction and colonization), focusing on the case where certainty can be assumed for a subset of detections. We demonstrate how to simultaneously account for non-detection (false negatives) and misclassification (false positives) when estimating occurrence parameters for gray wolves in northern Montana from 2007 to 2010. Our primary data source for the analysis was observations by deer and elk hunters, reported as part of the state's annual hunter survey. These data were supplemented with data from known locations of radio-collared wolves. We found that occupancy was relatively stable during the years of the study and wolves were largely restricted to the highest quality habitats in the study area. Transitions in the occupancy status of sites were rare, as occupied sites almost always remained occupied and unoccupied sites remained unoccupied. Failing to account for false positives led to overestimation of both the area inhabited by wolves and the frequency of turnover. The ability to properly account for both false negatives and false positives is an important step towards improving inferences for conservation from large-scale public surveys. The approach we propose will improve our understanding of the status of wolf populations and is relevant to many other data types where false positives are a component of observations.
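The single-season skeleton of such a model shows how false positives enter the likelihood; the study's dynamic version adds extinction/colonization transitions between years and a certain-detection data subset (notation mine):

```latex
% \psi: occupancy probability; p_{11}: detection probability at occupied
% sites; p_{10}: false-positive detection probability at unoccupied sites;
% y of n surveys at a site record a detection:
L(\psi, p_{11}, p_{10} \mid y)
  = \psi \binom{n}{y} p_{11}^{\,y} (1-p_{11})^{\,n-y}
  + (1-\psi) \binom{n}{y} p_{10}^{\,y} (1-p_{10})^{\,n-y}
```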

19.

Background: Yeast two-hybrid (Y2H) screens have been among the most powerful methods to detect and analyze protein-protein interactions. However, they suffer from a significant degree of false negatives, i.e. true interactions that are not detected, and to a certain degree from false positives, i.e. interactions that appear to take place only in the context of the Y2H assay. While the fraction of false positives remains difficult to estimate, the fraction of false negatives in typical Y2H screens is on the order of 70-90%. Here we present novel Y2H vectors that significantly decrease the number of false negatives and help to mitigate the false positive problem.

20.
MOTIVATION: The occurrence of false positives and false negatives in a microarray analysis could be easily estimated if the distribution of p-values were approximated and then expressed as a mixture of null and alternative densities. Essentially any distribution of p-values can be expressed as such a mixture by extracting a uniform density from it. AVAILABILITY: An S-plus function library is available from http://www.stjuderesearch.org/statistics.
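One concrete form of such a mixture is the beta-uniform model, from which the null fraction, and hence the expected error counts at any p-value cutoff, follow directly. A Python sketch in that spirit (the original library is S-plus; names and starting values here are this sketch's):

```python
import numpy as np
from scipy.optimize import minimize

def fit_bum(pvals):
    """Fit a beta-uniform mixture f(p) = lam + (1 - lam) * a * p**(a-1)
    to p-values by maximum likelihood, splitting the distribution into
    a uniform null and an alternative component."""
    pvals = np.clip(np.asarray(pvals), 1e-12, 1.0)

    def nll(theta):
        lam, a = theta
        return -np.sum(np.log(lam + (1 - lam) * a * pvals ** (a - 1)))

    res = minimize(nll, x0=[0.5, 0.5], bounds=[(1e-6, 1 - 1e-6)] * 2)
    return res.x                       # (lam, a)

def error_counts(pvals, lam, a, alpha):
    """Expected false and true positives among calls at level alpha."""
    m = len(pvals)
    pi0 = lam + (1 - lam) * a          # upper bound on the null fraction
    fp = m * pi0 * alpha               # nulls expected below alpha
    called = np.sum(np.asarray(pvals) <= alpha)
    return fp, max(called - fp, 0.0)
```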
