Similar Literature
20 similar documents retrieved
1.
DNA microarrays are an important tool for the study of gene activity, but the resulting data, consisting of thousands of points, are error-prone. A serious limitation in microarray analysis is the unreliability of data generated from low signal intensities. Such data may produce erroneous gene expression ratios and cause unnecessary validation or post-analysis follow-up tasks. In this study, we describe an approach based on normal mixture modeling for determining optimal signal intensity thresholds to identify reliable measurements of the microarray elements and subsequently eliminate false expression ratios. We used univariate and bivariate mixture modeling to segregate the microarray data into two classes, a low signal intensity population and a reliable signal intensity population, and applied Bayesian decision theory to find the optimal signal thresholds. The bivariate approach was found to be more accurate than the univariate approach; both were superior to a conventional method when validated against a reference set of biological data consisting of true and false gene expression data. Eliminating unreliable signal intensities should improve the quality of microarray data, including the reproducibility and reliability of gene expression ratios.
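A minimal sketch of the kind of thresholding described above, assuming synthetic log intensities and using scikit-learn's GaussianMixture rather than the authors' own implementation: fit a two-component mixture and take the point where the posterior probabilities of the two components cross as the Bayes decision threshold.

```python
# Minimal sketch (not the authors' code): univariate two-component mixture
# thresholding of log signal intensities, using synthetic data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data: a low-intensity (noise) population and a reliable one.
low = rng.normal(loc=6.0, scale=0.8, size=2000)
high = rng.normal(loc=10.0, scale=1.2, size=3000)
log_intensity = np.concatenate([low, high]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(log_intensity)

# Bayes decision rule: assign each point to the component with the higher
# posterior probability; the threshold is where the posteriors cross.
grid = np.linspace(log_intensity.min(), log_intensity.max(), 5000).reshape(-1, 1)
post = gm.predict_proba(grid)
low_comp = int(np.argmin(gm.means_.ravel()))  # index of the low-intensity component
crossing = np.where(np.diff((post[:, low_comp] > 0.5).astype(int)) != 0)[0]
threshold = float(grid[crossing[0], 0]) if crossing.size else float("nan")
print(f"estimated log-intensity threshold: {threshold:.2f}")
```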

2.
Western blot data are widely used in quantitative applications such as statistical testing and mathematical modelling. To ensure accurate quantitation and comparability between experiments, Western blot replicates must be normalised, but it is unclear how the available methods affect statistical properties of the data. Here we evaluate three commonly used normalisation strategies: (i) by fixed normalisation point or control; (ii) by sum of all data points in a replicate; and (iii) by optimal alignment of the replicates. We consider how these different strategies affect the coefficient of variation (CV) and the results of hypothesis testing with the normalised data. Normalisation by fixed point tends to increase the mean CV of normalised data in a manner that naturally depends on the choice of the normalisation point. Thus, in the context of hypothesis testing, normalisation by fixed point reduces false positives and increases false negatives. Analysis of published experimental data shows that choosing normalisation points with low quantified intensities results in a high normalised data CV and should thus be avoided. Normalisation by sum or by optimal alignment redistributes the raw data uncertainty in a mean-dependent manner, reducing the CV of high intensity points and increasing the CV of low intensity points. This causes the effect of normalisations by sum or optimal alignment on hypothesis testing to depend on the mean of the data tested; for high intensity points, false positives are increased and false negatives are decreased, while for low intensity points, false positives are decreased and false negatives are increased. These results will aid users of Western blotting to choose a suitable normalisation strategy and also understand the implications of this normalisation for subsequent hypothesis testing.
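The first two strategies amount to simple arithmetic on a replicates-by-conditions matrix of band intensities. The sketch below, using made-up numbers, applies normalisation by a fixed point and by sum and reports the resulting per-condition CV; it only illustrates the definitions and is not the authors' analysis.

```python
# Illustrative only: effect of two normalisation strategies on the CV of
# hypothetical Western blot intensities (rows = replicates, columns = conditions).
import numpy as np

raw = np.array([
    [1.00, 2.10, 0.40, 3.30],
    [1.30, 2.60, 0.55, 4.10],
    [0.80, 1.70, 0.30, 2.70],
])

def cv(x, axis=0):
    return x.std(axis=axis, ddof=1) / x.mean(axis=axis)

# (i) Normalisation by a fixed point: divide each replicate by one chosen column.
fixed = raw / raw[:, [0]]

# (ii) Normalisation by sum: divide each replicate by its total signal.
by_sum = raw / raw.sum(axis=1, keepdims=True)

print("CV raw:        ", np.round(cv(raw), 3))
print("CV fixed point:", np.round(cv(fixed), 3))
print("CV by sum:     ", np.round(cv(by_sum), 3))
```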

3.
An increasing number of studies are using landscape genomics to investigate local adaptation in wild and domestic populations. Implementation of this approach requires the sampling phase to consider the complexity of environmental settings and the burden of logistical constraints. These important aspects are often underestimated in the literature dedicated to sampling strategies. In this study, we computed simulated genomic data sets to run against actual environmental data in order to trial landscape genomics experiments under distinct sampling strategies. These strategies differed in design approach (to enhance environmental and/or geographical representativeness at study sites), number of sampling locations and sample sizes. We then evaluated how these elements affected statistical performance (power and false discoveries) under two antithetical demographic scenarios. Our results highlight the importance of selecting an appropriate sample size, which should be modified based on the demographic characteristics of the studied population. For species with limited dispersal, sample sizes above 200 units are generally sufficient to detect most adaptive signals, while in random mating populations this threshold should be increased to 400 units. Furthermore, we describe a design approach that maximizes both environmental and geographical representativeness of sampling sites and show how it systematically outperforms random or regular sampling schemes. Finally, we show that although having more sampling locations (between 40 and 50 sites) increases statistical power and reduces the false discovery rate, similar results can be achieved with a moderate number of sites (20 sites). Overall, this study provides valuable guidelines for optimizing sampling strategies for landscape genomics experiments.

4.
Problems involving thousands of null hypotheses have been addressed by estimating the local false discovery rate (LFDR). A previous LFDR approach to reporting point and interval estimates of an effect-size parameter uses an estimate of the prior distribution of the parameter conditional on the alternative hypothesis. That estimated prior is often unreliable, and yet strongly influences the posterior intervals and point estimates, causing the posterior intervals to differ from fixed-parameter confidence intervals, even for arbitrarily small estimates of the LFDR. That influence of the estimated prior manifests the failure of the conditional posterior intervals, given the truth of the alternative hypothesis, to match the confidence intervals. Those problems are overcome by changing the posterior distribution conditional on the alternative hypothesis from a Bayesian posterior to a confidence posterior. Unlike the Bayesian posterior, the confidence posterior equates the posterior probability that the parameter lies in a fixed interval with the coverage rate of the coinciding confidence interval. The resulting confidence-Bayes hybrid posterior supplies interval and point estimates that shrink toward the null hypothesis value. The confidence intervals tend to be much shorter than their fixed-parameter counterparts, as illustrated with gene expression data. Simulations nonetheless confirm that the shrunken confidence intervals cover the parameter more frequently than stated. Generally applicable sufficient conditions for correct coverage are given. In addition to having those frequentist properties, the hybrid posterior can also be motivated from an objective Bayesian perspective by requiring coherence with some default prior conditional on the alternative hypothesis. That requirement generates a new class of approximate posteriors that supplement Bayes factors modified for improper priors and that dampen the influence of proper priors on the credibility intervals. While that class of posteriors intersects the class of confidence-Bayes posteriors, neither class is a subset of the other. In short, two first principles generate both classes of posteriors: a coherence principle and a relevance principle. The coherence principle requires that all effect size estimates comply with the same probability distribution. The relevance principle means effect size estimates given the truth of an alternative hypothesis cannot depend on whether that truth was known prior to observing the data or whether it was learned from the data.

5.
Statistical procedures underpin the process of scientific discovery. As researchers, one way we use these procedures is to test the validity of a null hypothesis. Often, we test the validity of more than one null hypothesis. If we fail to use an appropriate procedure to account for this multiplicity, then we are more likely to reach a wrong scientific conclusion: we are more likely to make a mistake. In physiology, experiments that involve multiple comparisons are common: of the original articles published in 1997 by the American Physiological Society, approximately 40% cite a multiple comparison procedure. In this review, I demonstrate the statistical issue embedded in multiple comparisons, and I summarize the philosophies of handling this issue. I also illustrate the three procedures cited most often in my literature review (Newman-Keuls, Bonferroni, and least significant difference); each of these procedures is of limited practical value. Last, I demonstrate the false discovery rate procedure, a promising development in multiple comparisons. The false discovery rate procedure may be the best practical solution to the problems of multiple comparisons that exist within physiology and other scientific disciplines.
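For reference, the false discovery rate procedure recommended here is the Benjamini-Hochberg step-up rule, which can be written in a few lines; the sketch below applies it to an arbitrary, hypothetical vector of p-values.

```python
# Benjamini-Hochberg step-up procedure (sketch): returns a boolean mask of
# rejected null hypotheses at FDR level q.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= (k/m) * q; reject hypotheses 1..k.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))
```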

6.
Explanations for the emergence of monogamous marriage have focused on the cross-cultural distribution of marriage strategies, thus failing to account for their history. In this paper I reconstruct the pattern of change in marriage strategies in the history of societies speaking Indo-European languages, using cross-cultural data in the systematic and explicitly historical framework afforded by the phylogenetic comparative approach. The analysis provides evidence in support of Proto-Indo-European monogamy, and that this pattern may have extended back to Proto-Indo-Hittite. These reconstructions push the origin of monogamous marriage into prehistory, well beyond the earliest instances documented in the historical record; this, in turn, challenges notions that the cross-cultural distribution of monogamous marriage reflects features of social organization typically associated with Eurasian societies, and with "societal complexity" and "modernization" more generally. I discuss implications of these findings in the context of the archaeological and genetic evidence on prehistoric social organization.

7.
Many recently developed nonparametric jump tests can be viewed as multiple hypothesis testing problems. For such multiple hypothesis tests, it is well known that controlling the type I error alone often yields a large proportion of erroneous rejections, and the situation becomes even worse when jump occurrences are rare events. To obtain more reliable results, we aim to control the false discovery rate (FDR), an efficient compound error measure for erroneous rejections in multiple testing problems. We perform the test via the Barndorff-Nielsen and Shephard (BNS) test statistic and control the FDR with the Benjamini and Hochberg (BH) procedure. We provide asymptotic results for the FDR control. From simulations, we examine the relevant theoretical results and demonstrate the advantages of controlling the FDR. The hybrid approach is then applied to an empirical analysis of two benchmark stock indices with high frequency data.

8.
Polanski A, Kimmel M. Genetics, 2003, 165(1): 427-436
We present new methodology for calculating sampling distributions of single-nucleotide polymorphism (SNP) frequencies in populations with time-varying size. Our approach is based on deriving analytical expressions for the frequencies of SNPs. Analytical expressions allow for computations that are faster and more accurate than Monte Carlo simulations. In contrast to other articles presenting analytical formulas for frequencies of SNPs, we derive expressions whose coefficients do not explode when the genealogy size increases. We also provide analytical formulas describing how the ascertainment procedure modifies SNP distributions. Using our methods, we study the power to test the hypothesis of exponential population expansion vs. the hypothesis of evolution with constant population size. We also analyze some of the available SNP data and compare our estimates of demographic parameters to those obtained in previous population genetics studies. The analyzed data seem consistent with the hypothesis of past population growth of modern humans. The analysis also shows a very strong sensitivity of the estimated demographic parameters to changes in the model of the ascertainment procedure.

9.
For the approval of biosimilars, it is, in most cases, necessary to conduct large Phase III clinical trials in patients to convince the regulatory authorities that the product is comparable in terms of efficacy and safety to the originator product. As the originator product has already been studied in several trials beforehand, it seems natural to include this historical information in the demonstration of equivalent efficacy. Since all studies for the regulatory approval of biosimilars are confirmatory studies, it is required that the statistical approach has reasonable frequentist properties, most importantly that the Type I error rate is controlled, at least in all scenarios that are realistic in practice. However, it is well known that the incorporation of historical information can lead to an inflation of the Type I error rate in the case of a conflict between the distribution of the historical data and the distribution of the trial data. We illustrate this issue and confirm, using the Bayesian robustified meta-analytic-predictive (MAP) approach as an example, that simultaneously controlling the Type I error rate over the complete parameter space and gaining power in comparison to a standard frequentist approach that only considers the data in the new study is not possible. We propose a hybrid Bayesian-frequentist approach for binary endpoints that controls the Type I error rate in the neighborhood of the center of the prior distribution while improving the power. We study the properties of this approach in an extensive simulation study and provide a real-world example.

10.
Direct optimization of unaligned sequence characters provides a natural framework to explore the sensitivity of phylogenetic hypotheses to variation in analytical parameters. Phenotypic data, when combined into such analyses, are typically analyzed with static homology correspondences, unlike the dynamic homology sequence data. Static homology characters may be expected to constrain the direct optimization and thus potentially increase the similarity of phylogenetic hypotheses under different cost sets. However, whether a total-evidence approach increases phylogenetic stability remains largely unexplored empirically. Here, I studied the impact of static homology data on sensitivity using six empirical data sets composed of several molecular markers and phenotypic data. The inclusion of static homology phenotypic data increased the average stability of phylogenetic hypotheses in five of the six data sets. To investigate whether any static homology characters would have a similar effect, the analyses were repeated with randomized phenotypic data and with one of the molecular markers fixed as static homology characters. These analyses had, on average, almost no effect on phylogenetic stability, although the randomized phenotypic data sometimes resulted in even higher stability than the empirical phenotypic data. The impact was related to the strength of the phylogenetic signal in the phenotypic data: higher average jackknife support of the phenotypic tree correlated with a stronger stabilizing effect in the total-evidence analysis. Phenotypic data with a strong signal made the total-evidence trees topologically more similar to the phenotypic trees; thus, they constrained the dynamic homology correspondences of the sequence data. Characters that increase phylogenetic stability are particularly valuable for phylogenetic inference. These results indicate an important role and additive value of phenotypic data in increasing the stability of phylogenetic hypotheses in total-evidence analyses.

11.
Genome scans with many genetic markers provide the opportunity to investigate local adaptation in natural populations and identify candidate genes under selection. In particular, SNPs are dense throughout the genome of most organisms and are commonly observed in functional genes, making them ideal markers to study adaptive molecular variation. This approach has become commonly employed in ecological and population genetics studies to detect outlier loci that are putatively under selection. However, there are several challenges to address with outlier approaches, including genotyping errors, underlying population structure and false positives, variation in mutation rate, and limited sensitivity (false negatives). In this study, we evaluated multiple outlier tests and their type I (false positive) and type II (false negative) error rates in a series of simulated data sets. Comparisons included simulation procedures (FDIST2, ARLEQUIN v.3.5 and BAYESCAN) as well as more conventional tools such as global F(ST) histograms. Of the three simulation methods, FDIST2 and BAYESCAN typically had the lowest type II error, BAYESCAN had the least type I error, and Arlequin had the highest type I and II error. High error rates in Arlequin with a hierarchical approach were partially due to confounding scenarios where patterns of adaptive variation were contrary to neutral structure; however, Arlequin consistently had the highest type I and type II error in all four simulation scenarios tested in this study. Given the results provided here, it is important that outlier loci are interpreted cautiously and that the error rates of the various methods are taken into consideration in studies of adaptive molecular variation, especially when hierarchical structure is included.

12.
Ewald (1994) has suggested that vector-borne parasites are expected to evolve a higher level of host exploitation than directly transmitted parasites, and this should thereby result in them being more virulent. Indeed, some data do conform to this general pattern. Nevertheless, his hypothesis has generated some debate about the extent to which it is valid. I explore this issue quantitatively within the framework of mathematical epidemiology. In particular, I present a dynamic optimization model for the evolution of parasite replication strategies that explicitly explores the validity of this hypothesis. A few different model assumptions are explored and it is found that Ewald's hypothesis has only qualified support as a general explanation for why vector-borne parasites are more virulent than those that are directly transmitted. I conclude by suggesting that an alternative explanation might lie in differences in inoculum size between these two types of transmission.

13.
MOTIVATION: The power of microarray analyses to detect differential gene expression strongly depends on the statistical and bioinformatical approaches used for data analysis. Moreover, the simultaneous testing of tens of thousands of genes for differential expression raises the 'multiple testing problem', increasing the probability of obtaining false positive test results. To achieve more reliable results, it is, therefore, necessary to apply adjustment procedures to restrict the family-wise type I error rate (FWE) or the false discovery rate. However, for the biologist the statistical power of such procedures often remains abstract, unless validated by an alternative experimental approach. RESULTS: In the present study, we discuss a multiplicity adjustment procedure applied to classical univariate as well as to recently proposed multivariate gene-expression scores. All procedures strictly control the FWE. We demonstrate that the use of multivariate scores leads to a more efficient identification of differentially expressed genes than the widely used MAS5 approach provided by the Affymetrix software tools (Affymetrix Microarray Suite 5 or GeneChip Operating Software). The practical importance of this finding is successfully validated using real time quantitative PCR and data from spike-in experiments. AVAILABILITY: The R-code of the statistical routines can be obtained from the corresponding author. CONTACT: Schuster@imise.uni-leipzig.de
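As a generic illustration of an adjustment that strictly controls the FWE (not the multivariate-score procedure proposed in the paper), the sketch below implements the Holm step-down correction for a hypothetical set of p-values.

```python
# Holm step-down adjustment (sketch): a simple procedure that strictly
# controls the family-wise error rate, used here only to illustrate the idea.
import numpy as np

def holm(pvals, alpha=0.05):
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(order):
        if p[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

print(holm([0.0001, 0.004, 0.019, 0.095, 0.201]))
```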

14.
Use of historical data and real-world evidence holds great potential to improve the efficiency of clinical trials. One major challenge is to effectively borrow information from historical data while maintaining a reasonable type I error and minimal bias. We propose the elastic prior approach to address this challenge. Unlike existing approaches, this approach proactively controls the behavior of information borrowing and type I errors by incorporating the well-known concept of a clinically significant difference through an elastic function, defined as a monotonic function of a congruence measure between historical data and trial data. The elastic function is constructed to satisfy a set of prespecified criteria such that the resulting prior strongly borrows information when historical and trial data are congruent, but refrains from information borrowing when they are incongruent. The elastic prior approach has the desirable property of being information-borrowing consistent; that is, it asymptotically controls the type I error at the nominal value whether or not the historical data are congruent with the trial data. Our simulation study evaluating the finite-sample characteristics confirms that, compared to existing methods, the elastic prior has better type I error control and yields competitive or higher power. The proposed approach is applicable to binary, continuous, and survival endpoints.

15.
MOTIVATION: Multiple hypothesis testing is a common problem in genome research, particularly in microarray experiments and genomewide association studies. Failure to account for the effects of multiple comparisons would result in an abundance of false positive results. The Bonferroni correction and Holm's step-down procedure are overly conservative, whereas the permutation test is time-consuming and is restricted to simple problems. RESULTS: We developed an efficient Monte Carlo approach to approximating the joint distribution of the test statistics along the genome. We then used the Monte Carlo distribution to evaluate the commonly used criteria for error control, such as familywise error rates and positive false discovery rates. This approach is applicable to any data structures and test statistics. Applications to simulated and real data demonstrate that the proposed approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated.
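A stripped-down illustration of the idea, with equicorrelated normal statistics standing in for genome-wide test statistics and all parameters invented for the example: simulate the joint null distribution, take the Monte Carlo quantile of the maximum statistic as the family-wise critical value, and compare it with the Bonferroni cut-off.

```python
# Toy Monte Carlo illustration (not the paper's algorithm): approximate the
# null distribution of the maximum of correlated test statistics and compare
# the resulting family-wise critical value with the Bonferroni cut-off.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, rho, alpha, n_sim = 200, 0.7, 0.05, 20000

# Equicorrelated null statistics: Z_j = sqrt(rho)*W + sqrt(1-rho)*E_j.
w = rng.standard_normal((n_sim, 1))
e = rng.standard_normal((n_sim, m))
z = np.sqrt(rho) * w + np.sqrt(1 - rho) * e

max_abs = np.abs(z).max(axis=1)
mc_crit = np.quantile(max_abs, 1 - alpha)        # Monte Carlo FWER critical value
bonf_crit = stats.norm.ppf(1 - alpha / (2 * m))  # Bonferroni critical value

print(f"Monte Carlo critical value: {mc_crit:.2f}")
print(f"Bonferroni critical value:  {bonf_crit:.2f}")
```

With strongly correlated statistics the Monte Carlo critical value is noticeably smaller than the Bonferroni one, which is the gain in power the abstract describes.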

16.
The problem of constructing a dendrogram depicting phylogenetic relationships for a collection of contemporary species is considered. An approach was developed based on the additive hypothesis, in which each “length” between two species can be described by the shortest sum of lengths of the individual links on the dendrogram topology that connect the two species. The additive hypothesis holds equally well if the dendrogram is replaced by its corresponding (rootless) network. Network topologies are defined set-theoretically in terms of the initial, contemporary species, and a coefficient is defined for each point of any conceivable network. It is proved mathematically that each point of an additive network gives a coefficient value of zero, whereas each point not belonging to an additive network gives a coefficient value greater than zero. This suggests an iterative procedure in which “false” network points are replaced by “true” ones, or more generally in which “very false” network points are replaced by “nearly true” ones. The first procedure follows from the mathematical proof and the second is confirmed by simulation. Since most real data sets are not additive in the strict sense, a real-data example is presented in which the iterative procedure produced a plausible network topology.
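Additivity of a distance matrix is commonly checked with the four-point condition; the sketch below uses that standard test (not the point coefficient defined in the paper) on a small tree metric and on a perturbed copy of it.

```python
# Generic four-point-condition check for additivity of a distance matrix
# (illustrative; not the point coefficient defined in the paper).
from itertools import combinations
import numpy as np

def violates_four_point(d, tol=1e-9):
    """Return the quartets whose two largest pairwise sums differ by more than tol."""
    n = d.shape[0]
    bad = []
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i, j] + d[k, l], d[i, k] + d[j, l], d[i, l] + d[j, k]])
        if sums[2] - sums[1] > tol:
            bad.append((i, j, k, l))
    return bad

# A tree metric on four taxa (additive), then a perturbed copy.
d = np.array([[0, 3, 7, 8],
              [3, 0, 6, 7],
              [7, 6, 0, 5],
              [8, 7, 5, 0]], dtype=float)
print("additive matrix violations:", violates_four_point(d))
d[0, 3] = d[3, 0] = 9.5
print("perturbed matrix violations:", violates_four_point(d))
```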

17.
Determining the residency of an aquatic species is important but challenging, and it remains unclear which sampling methodology is best. Photo-identification has been used extensively to estimate patterns of animals' residency and is arguably the most common approach, but it may not be the most effective approach in marine environments. To examine this, in 2005 we deployed acoustic transmitters on 22 white sharks (Carcharodon carcharias) in Mossel Bay, South Africa, to quantify the probability of detecting these tagged sharks by photo-identification and by different deployment strategies of acoustic telemetry equipment. Using the data collected by the different sampling approaches (detections from an acoustic listening station deployed under a chumming vessel versus those from visual sightings and photo-identification), we quantified each methodology's probability of detection and determined whether the sampling approaches, also including an acoustic telemetry array, produce comparable results for patterns of residency. Photo-identification had the lowest probability of detection and underestimated residency. The underestimation is driven by various factors, primarily that acoustic telemetry monitors a large area, which reduces the occurrence of false negatives. We therefore propose that researchers need to use acoustic telemetry and continue to develop new sampling approaches, as photo-identification techniques are inadequate to determine residency. Using the methods presented in this paper will allow researchers to further refine sampling approaches, enabling them to collect more accurate data that will result in better research and more informed management efforts and policy decisions.

18.
Aim Various methods are employed to recover patterns of area relationships in extinct and extant clades. The fidelity of these patterns can be adversely affected by sampling error in the form of missing data. Here we use simulation studies to evaluate the sensitivity of an analytical biogeographical method, namely tree reconciliation analysis (TRA), to this form of sampling failure. Location Simulation study. Methods To approximate varying degrees of taxonomic sampling failure within phylogenies varying in size and in redundancy of biogeographical signal, we applied sequential pruning protocols to artificial taxon–area cladograms displaying congruent patterns of area relationships. Initial trials assumed equal probability of sampling failure among all areas. Additional trials assigned weighted probabilities to each of the areas in order to explore the effects of uneven geographical sampling. Pruned taxon–area cladograms were then analysed with TRA to determine if the optimal area cladograms recovered match the original biogeographical signal, or if they represent false, ambiguous or uninformative signals. Results The results indicate a period of consistently accurate recovery of the true biogeographical signal, followed by a nonlinear decrease in signal recovery as more taxa are pruned. At high levels of sampling failure, false biogeographical signals are more likely to be recovered than the true signal. However, randomization testing for statistical significance greatly decreases the chance of accepting false signals. The primary inflection of the signal recovery curve, and its steepness and slope depend upon taxon–area cladogram size and area redundancy, as well as on the evenness of sampling. Uneven sampling across geographical areas is found to have serious deleterious effects on TRA, with the accuracy of recovery of biogeographical signal varying by an order of magnitude or more across different sampling regimes. Main conclusions These simulations reiterate the importance of taxon sampling in biogeographical analysis, and attest to the importance of considering geographical, as well as overall, sampling failure when interpreting the robustness of biogeographical signals. In addition to randomization testing for significance, we suggest the use of randomized sequential taxon deletions and the construction of signal decay curves as a means to assess the robustness of biogeographical signals for empirical data sets.

19.
Hemoglobin (Hb) is probably the most thoroughly studied protein in the human body. However, it has recently been proposed that in addition to its well-known function as a dioxygen and carbon dioxide transporter, one of the main roles of hemoglobin is to store and transport nitrogen monoxide. This hypothesis is highly disputed and contrasts with the proposal that hemoglobin serves as an NO• scavenger in the blood. In this short review, I present the current status of research on the much-debated mechanism of the reaction between circulating hemoglobin and NO•. Despite the fact that oxyHb is extremely rapidly oxidized by NO•, under basal physiological conditions the biological activity of NO• in the blood vessels is not completely lost. It has been shown that three factors reduce the efficiency of hemoglobin as an NO• scavenger: a so-called red blood cell-free zone created close to the vessel wall by intravascular flow, an undisturbed layer around the red blood cells (where the NO• concentration is much smaller than the bulk concentration), and/or the red blood cell membrane. Alternatively, it has been proposed that NO• binds to Cys beta 93 of oxyHb, is liberated after deoxygenation of Hb, and consequently allows a more effective delivery of O2 to peripheral tissues. However, because of the extremely fast rate of the reaction between NO• and oxyHb, experiments in vitro lead to artefactual production of large amounts of S-nitroso-hemoglobin. These results, together with other data that challenge most steps of the NO•-transporter hypothesis, are discussed.

20.
In quantitative proteomics work, the differences in expression of many separate proteins are routinely examined to test for significant differences between treatments. This leads to the multiple hypothesis testing problem: when many separate tests are performed, many will be significant by chance and will be false positive results. Statistical methods that deal with this problem, such as the false discovery rate method, have been disseminated for more than a decade. However, a survey of proteomics journals shows that such tests are not widely implemented in one commonly used technique, quantitative proteomics using two-dimensional electrophoresis. We outline a selection of multiple hypothesis testing methods, including some that are well known and some lesser known, and present a simple strategy for their use by the experimental scientist in quantitative proteomics work generally. The strategy focuses on the desirability of the simultaneous use of several different methods, with the choice and emphasis dependent on research priorities and the results in hand. This approach is demonstrated using case scenarios with experimental and simulated model data.
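The "several methods side by side" strategy is straightforward to script; the sketch below runs Bonferroni, Holm and Benjamini-Hochberg adjustments on one hypothetical vector of spot-level p-values using statsmodels, so that the numbers of significant spots can be compared directly.

```python
# Side-by-side multiple-testing adjustments on hypothetical spot p-values
# (illustration of the 'several methods at once' strategy, not the paper's data).
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
# 90 'unchanged' spots with uniform p-values plus 10 'changed' spots with small ones.
pvals = np.concatenate([rng.uniform(size=90), rng.uniform(0, 0.002, size=10)])

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s} -> {reject.sum()} significant spots")
```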
