Similar Articles
20 similar articles retrieved.
1.
Bounds on the minimum number of recombination events in a sample history
Myers SR, Griffiths RC. Genetics 2003, 163(1):375-394
Recombination is an important evolutionary factor in many organisms, including humans, and understanding its effects is an important task facing geneticists. Detecting past recombination events is thus important; this article introduces statistics that give a lower bound on the number of recombination events in the history of a sample, on the basis of the patterns of variation in the sample DNA. Such lower bounds are appropriate, since many recombination events in the history are typically undetectable, so the true number of historical recombinations is unobtainable. The statistics can be calculated quickly by computer and improve upon the earlier bound of Hudson and Kaplan (1985). A method is developed to combine bounds on local regions in the data to produce more powerful improved bounds. The method is flexible with respect to different models of recombination occurrence. The approach gives recombination event bounds between all pairs of sites, to help identify regions with more detectable recombinations, and these bounds can be viewed graphically. Under coalescent simulations, there is a substantial improvement over the earlier method (of up to a factor of 2) in the expected number of recombination events detected by one of the new minima, across a wide range of parameter values. The method is applied to data from a region within the lipoprotein lipase gene and the amount of detected recombination is substantially increased. Further, there is strong clustering of detected recombination events in an area near the center of the region. A program implementing these statistics, which was used for this article, is available from http://www.stats.ox.ac.uk/mathgen/programs.html.
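The Hudson and Kaplan (1985) bound that these statistics improve upon can be illustrated with a short sketch (a minimal Python illustration of the classic four-gamete approach, not the authors' program; the function names and 0/1 haplotype encoding are our assumptions): an interval between two sites must contain at least one recombination if all four gametic types occur there, and the bound counts a maximal set of non-overlapping such intervals.

```python
from itertools import combinations

def four_gametes(haplotypes, i, j):
    # True if all four gametic types 00, 01, 10, 11 occur at site pair (i, j),
    # which (under the infinite-sites model) forces a recombination between them
    return {(h[i], h[j]) for h in haplotypes} == {(0, 0), (0, 1), (1, 0), (1, 1)}

def hudson_kaplan_rm(haplotypes):
    """Lower bound on recombination events: size of a maximal set of
    disjoint site intervals, each failing the four-gamete test."""
    n_sites = len(haplotypes[0])
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if four_gametes(haplotypes, i, j)]
    intervals.sort(key=lambda ij: ij[1])  # greedy interval scheduling
    count, last_end = 0, -1
    for i, j in intervals:
        if i >= last_end:  # starts at or after the previous interval's end
            count += 1
            last_end = j
    return count
```

The article's new statistics sharpen this baseline by combining local bounds; the sketch only reproduces the earlier bound being improved upon.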

2.
The multiple testing problem arising in gene expression analysis is challenging not only because of its size, but also because of possible dependence between the expression levels of different genes resulting from coregulation of the genes. Furthermore, the measurement errors of these expression levels may be dependent as well, since they are subject to several technical factors. Multiple testing of such data faces the challenge of correlated test statistics. In such a case, the control of the False Discovery Rate (FDR) is not straightforward, and thus demands new approaches and solutions that address multiplicity while accounting for this dependency. This paper investigates the effects of dependency between normal test statistics on FDR control in two-sided testing, using the linear step-up procedure (BH) of Benjamini and Hochberg (1995). The case of two multiple hypotheses is examined first. A simulation study offers primary insight into the behavior of the FDR subject to different levels of correlation and distance between null and alternative means. A theoretical analysis follows in order to obtain explicit upper bounds on the FDR. These results are then extended to more than two multiple tests, thereby offering a better perspective on the effect of the proportion of false null hypotheses, as well as the structure of the test statistics' correlation matrix. An example from gene expression data analysis is presented.
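The linear step-up (BH) procedure studied here is simple to state: reject the hypotheses with the k smallest p-values, where k is the largest i such that the i-th smallest p-value is at most (i/m)q. A minimal NumPy sketch (the function name is ours, not code from the paper):

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Linear step-up (BH) procedure: reject H_(1), ..., H_(k) where k is the
    largest i with p_(i) <= (i/m) * q; returns a boolean rejection mask."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)
    thresholds = (np.arange(1, m + 1) / m) * q  # (i/m) * q for i = 1..m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```

The FDR guarantee of this procedure under independence is classical; how it behaves when the test statistics are correlated is exactly what the paper investigates.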

3.
In this paper, we derive score test statistics to discriminate between proportional hazards and proportional odds models for grouped survival data. These models are embedded within a power family transformation in order to obtain the score tests. In simple cases, some small-sample results are obtained for the score statistics using Monte Carlo simulations. Score statistics have distributions well approximated by the chi-squared distribution. Real examples illustrate the proposed tests.

4.
The computation of surface correlations using a variety of molecular models has been applied to the unbound protein docking problem. Because of the computational complexity involved in examining all possible molecular orientations, the fast Fourier transform (FFT), a fast numerical implementation of the discrete Fourier transform (DFT), is generally applied to minimize the number of calculations. This approach is rooted in the convolution theorem, which allows one to inverse transform the product of two DFTs in order to perform the correlation calculation. However, such a DFT calculation results in a cyclic or "circular" correlation which, in general, does not lead to the same result as the linear correlation desired for the docking problem. In this work, we provide computational bounds for constructing molecular models used in the molecular surface correlation problem. The derived bounds are then shown to be consistent with various intuitive guidelines previously reported in the protein docking literature. Finally, these bounds are applied to different molecular models in order to investigate their effect on the correlation calculation.
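The circular-versus-linear distinction can be made concrete in a few lines of NumPy (function names are ours; this is a sketch of the standard zero-padding remedy, not the article's molecular models): padding each axis of length N to at least 2N-1 cells before transforming removes the wrap-around, so the cyclic result agrees with the linear correlation.

```python
import numpy as np

def circular_correlation(a, b):
    # Inverse transform of conj(FFT(a)) * FFT(b): cyclic (wrap-around) correlation
    return np.real(np.fft.ifftn(np.conj(np.fft.fftn(a)) * np.fft.fftn(b)))

def padded_correlation(a, b):
    # Zero-padding each axis of length N to 2N - 1 removes the wrap-around,
    # so the cyclic result matches the linear correlation at every lag
    shape = tuple(2 * n - 1 for n in a.shape)
    A = np.fft.fftn(a, shape)
    B = np.fft.fftn(b, shape)
    return np.real(np.fft.ifftn(np.conj(A) * B))
```

In a 1-D example, the unpadded version contaminates the large lags with wrapped contributions, while the padded version reproduces the linear correlation exactly; the grid-size bounds derived in the article serve the same purpose for discretized molecular models.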

5.
Asymptotically correct 90% and 95% percentage points are given for multiple comparisons with a control and for all pairwise comparisons of several independent samples of equal size from multinomial distributions. Test statistics are the maxima of the χ²-statistics for single comparisons. For only two categories, the asymptotic distributions of these test statistics result from DUNNETT's many-one tests and TUKEY's range test (cf. MILLER, 1981). The percentage points for comparisons with a control are computed from the limit distribution of the test statistic under the overall hypothesis H0. To some extent the applicability of these bounds is investigated by simulation. The bounds can also be used to improve Holm's sequentially rejective Bonferroni test procedure (cf. HOLM, 1979). The percentage points for all pairwise comparisons are obtained by large simulations. Especially for 3×3 tables, the limit distribution of the test statistic under H0 is also derived for samples of unequal size. These bounds too can improve the corresponding Bonferroni-Holm procedure. Finally, from ŠIDÁK's probability inequality for normal random vectors (cf. ŠIDÁK, 1967), a similar inequality is derived for dependent χ²-variables applicable to simultaneous χ²-tests.

6.
Correlation coefficients among multiple variables are commonly described in the form of matrices. Applications of such correlation matrices can be found in many fields, such as finance, engineering, statistics, and medicine. This article proposes an efficient way to sequentially obtain the theoretical bounds of correlation coefficients, together with an algorithm to generate n × n correlation matrices using any bounded random variables. Interestingly, the correlation matrices generated by this method using uniform random variables as an example produce more extreme relationships among the variables than other methods, which might be useful for modeling complex biological systems where rare cases are very important.
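For the smallest nontrivial case, a 3 × 3 matrix, the theoretical bound on one coefficient given the other two follows directly from positive semidefiniteness (the determinant must be nonnegative). A sketch of that standard fact (the function name is ours, not from the article):

```python
import math

def r23_bounds(r12, r13):
    """Feasible range of r23 that keeps the 3x3 correlation matrix
    positive semidefinite: r12*r13 +/- sqrt((1 - r12^2)(1 - r13^2)).
    Follows from det = 1 - r12^2 - r13^2 - r23^2 + 2*r12*r13*r23 >= 0."""
    slack = math.sqrt((1 - r12 ** 2) * (1 - r13 ** 2))
    return r12 * r13 - slack, r12 * r13 + slack
```

Sequentially applying bounds of this kind, one coefficient at a time, is the flavor of the article's construction; sampling each coefficient within its current bounds yields a valid correlation matrix.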

7.
Bowersox MA, Brown DG. Plant Ecology 2001, 156(1):89-103
The use of statistics of landscape pattern to infer ecological process at ecotones requires knowledge of the specific sensitivities of statistics to ecotone characteristics. In this study, sets of patch-based and boundary-based statistics were evaluated to assess their suitability as measures of abruptness on simulated ecotone landscapes. We generated 50 realizations each for 25 groups of ecotones that varied systematically in their degree of abruptness and patchiness. Factorial ANOVA was used to evaluate the sensitivity of statistics to the known differences among the simulated groups. Suitability of each index for measuring abruptness was evaluated using the ANOVA results. The statistics were then ranked in order of their suitability as abruptness statistics based on their sensitivity to abruptness, the consistency of the relationship, and their lack of sensitivity to patchiness. The two best statistics for quantifying abruptness were those we developed based on lattice delineation methods, and are called cumulative boundary elements and boundary element dispersion. The results of this research provide support for studies of ecotone process that rely on the interpretation of patch or boundary statistics.

8.
In this paper we derive entropy bounds for hierarchical networks. More precisely, starting from a recently introduced measure to determine the topological entropy of non-hierarchical networks, we provide bounds for estimating the entropy of hierarchical graphs. Apart from bounds to estimate the entropy of a single hierarchical graph, we show that the derived bounds can also be used for characterizing graph classes. Our contribution is an important extension of previous results on the entropy of non-hierarchical networks, because in practical applications hierarchical networks play an important role in chemistry and biology. In addition to the derivation of the entropy bounds, we provide a numerical analysis for two special graph classes, rooted trees and generalized trees, and thereby demonstrate not only the computational feasibility of our method but also its characteristics and interpretability with respect to data analysis.

9.
Alignment of sequences is an important routine in various areas of science, notably molecular biology. Multiple sequence alignment is a computationally hard optimization problem which involves the consideration of different possible alignments in order to find an optimal one, given a measure of goodness of alignments. Dynamic programming algorithms are generally well suited for the search of optimal alignments, but are constrained by unwieldy space requirements for large numbers of sequences. Carrillo and Lipman devised a method that helps to reduce the search space for an optimal alignment under a sum-of-pairs measure using bounds on the scores of its pairwise projections. In this paper, we generalize the Carrillo and Lipman bounds and demonstrate a novel approach for finding optimal sum-of-pairs multiple alignments that allows incremental pruning of the optimal alignment search space. This approach can result in a drastic pruning of the final search space polytope (where we search for the optimal alignment) when compared to Carrillo and Lipman's approach and hence allows many runs that are not feasible with the original method.

10.
In multiple linear regression, the test for the discordancy of a single outlier in the response variable is usually based on the 'maximum studentized residual' statistic. Exact critical values for the test statistic t are not available. Upper bounds for the critical values have been found by SRIKANTAN (1961), PRESCOTT (1975) and LUND (1975). In this note we show that all these upper bounds are algebraically equivalent.

11.
We implement a Bayesian Markov chain Monte Carlo algorithm for estimating species divergence times that uses heterogeneous data from multiple gene loci and accommodates multiple fossil calibration nodes. A birth-death process with species sampling is used to specify a prior for divergence times, which allows easy assessment of the effects of that prior on posterior time estimates. We propose a new approach for specifying calibration points on the phylogeny, which allows the use of arbitrary and flexible statistical distributions to describe uncertainties in fossil dates. In particular, we use soft bounds, so that the probability that the true divergence time is outside the bounds is small but nonzero. A strict molecular clock is assumed in the current implementation, although this assumption may be relaxed. We apply our new algorithm to two data sets concerning divergences of several primate species, to examine the effects of the substitution model and of the prior for divergence times on Bayesian time estimation. We also conduct computer simulation to examine the differences between soft and hard bounds. We demonstrate that divergence time estimation is intrinsically hampered by uncertainties in fossil calibrations, and the error in Bayesian time estimates will not go to zero with increased amounts of sequence data. Our analyses of both real and simulated data demonstrate potentially large differences between divergence time estimates obtained using soft versus hard bounds and a general superiority of soft bounds. Our main findings are as follows. (1) When the fossils are consistent with each other and with the molecular data, and the posterior time estimates are well within the prior bounds, soft and hard bounds produce similar results. (2) When the fossils are in conflict with each other or with the molecules, soft and hard bounds behave very differently; soft bounds allow sequence data to correct poor calibrations, while poor hard bounds are impossible to overcome by any amount of data. (3) Soft bounds eliminate the need for "safe" but unrealistically high upper bounds, which may bias posterior time estimates. (4) Soft bounds allow more reliable assessment of estimation errors, while hard bounds generate misleadingly high precisions when fossils and molecules are in conflict.

12.
This paper addresses treatment effect heterogeneity (also referred to, more compactly, as 'treatment heterogeneity') in the context of a controlled clinical trial with binary endpoints. Treatment heterogeneity, variation in the true (causal) individual treatment effects, is explored using the concept of the potential outcome. This framework supposes the existence of latent responses for each subject corresponding to each possible treatment. In the context of a binary endpoint, treatment heterogeneity may be represented by the parameter π2, the probability that an individual would have a failure on the experimental treatment, if received, and would have a success on control, if received. Previous research derived bounds for π2 based on matched-pairs data. The present research extends this method to the blocked data context. Estimates (and their variances) and confidence intervals for the bounds are derived. We apply the new method to data from a renal disease clinical trial. In this example, bounds based on the blocked data are narrower than the corresponding bounds based only on the marginal success proportions. Some remaining challenges (including the possibility of further reducing bound widths) are discussed.

13.
Colin B. Fogarty. Biometrics 2023, 79(3):2196-2207
We develop sensitivity analyses for the sample average treatment effect in matched observational studies while allowing unit-level treatment effects to vary. The methods may be applied to studies using any optimal without-replacement matching algorithm. In contrast to randomized experiments and to paired observational studies, we show for general matched designs that over a large class of test statistics, any procedure bounding the worst-case expectation while allowing for arbitrary effect heterogeneity must be unnecessarily conservative if treatment effects are actually constant across individuals. We present a sensitivity analysis which bounds the worst-case expectation while allowing for effect heterogeneity, and illustrate why it is generally conservative if effects are constant. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and that is valid for testing the sample average effect under additional restrictions which may be deemed benign by practitioners. Simulations demonstrate that this alternative procedure results in a valid sensitivity analysis for the weak null hypothesis under a host of reasonable data-generating processes. The procedures allow practitioners to assess robustness of estimated sample average treatment effects to hidden bias while allowing for effect heterogeneity in matched observational studies.

14.

Background:
In order to compute pattern statistics in computational biology, a Markov model is commonly used to take into account the sequence composition. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...).

15.
In this note we present improved Bonferroni-type inequalities for the union of a set of nonexchangeable events. These new bounds are stronger than or equivalent to the bounds proposed by KOUNIAS (1968), HUNTER (1976), MARGOLIN & MAURER (1976), GALAMBOS (1977) and by MĂRGĂRITESCU (1983). In the particular case of exchangeable events, the bounds given by SOBEL & UPPULURI (1972) can be derived. A generalization in terms of a partition of {1, 2, …, n} is also established.

16.
Summary statistics of various hematologic and serum biochemical measures are presented for a colony of 74 chimpanzees (Pan troglodytes). Covariance analysis of longitudinal values revealed a progression of some measures with maturity. Equations for evaluating these measures as they relate to the health of individual colony members and new additions to the colony were formulated. From these equations, confidence bounds (95%), which can be regarded as normative ranges, were established for each of the measures. The literature on hematologic and serum biochemical values in the chimpanzee, especially as they pertain to the evaluation and progression of values, is reviewed.

17.
Chen H, Stasny EA, Wolfe DA. Biometrics 2006, 62(1):150-158
The application of ranked set sampling (RSS) techniques to data from a dichotomous population is currently an active research topic, and it has been shown that balanced RSS leads to improvement in precision over simple random sampling (SRS) for estimation of a population proportion. Balanced RSS, however, is not in general optimal in terms of variance reduction for this setting. The objective of this article is to investigate the application of unbalanced RSS in estimation of a population proportion under perfect ranking, where the probabilities of success for the order statistics are functions of the underlying population proportion. In particular, the Neyman allocation, which assigns sample units for each order statistic proportionally to its standard deviation, is shown to be optimal in the sense that it leads to minimum variance within the class of RSS estimators that are simple averages of the means of the order statistics. We also use a substantial data set, the National Health and Nutrition Examination Survey III (NHANES III) data, to demonstrate the feasibility and benefits of Neyman allocation in RSS for binary variables.
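The Neyman allocation described above can be sketched in a few lines (the function name and the integer-rounding rule are our assumptions; in the article the success probabilities of the order statistics are functions of the underlying population proportion): units go to each order-statistic stratum in proportion to its binomial standard deviation sqrt(p_h(1 - p_h)).

```python
import math

def neyman_allocation(order_stat_probs, n_total):
    """Allocate n_total sample units across order-statistic strata in
    proportion to their standard deviations sqrt(p_h * (1 - p_h))."""
    sds = [math.sqrt(p * (1 - p)) for p in order_stat_probs]
    total = sum(sds)
    raw = [n_total * s / total for s in sds]
    alloc = [int(x) for x in raw]  # floor, then hand out the remainder
    remainder = n_total - sum(alloc)
    by_fraction = sorted(range(len(raw)),
                         key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in by_fraction[:remainder]:  # largest fractional parts first
        alloc[h] += 1
    return alloc
```

Extreme order statistics (success probabilities near 0 or 1) receive fewer units, which is where the variance reduction over balanced RSS comes from.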

18.
In a clinical trial, statistical reports are typically concerned with the mean difference between two groups. There is now increasing interest in the heterogeneity of the treatment effect, which has important implications for treatment evaluation and selection. The treatment harm rate (THR), defined as the proportion of people who have a worse outcome on the treatment than on the control, was used to characterize this heterogeneity. Since the THR involves the joint distribution of the two potential outcomes, it cannot be identified without further assumptions, even in randomized trials. We can only derive simple bounds from the observed data, but the simple bounds are usually too wide. In this paper, we use a secondary outcome that satisfies a monotonicity assumption to tighten the bounds. It is shown that the bounds we derive cannot be wider than the simple bounds. We also conduct simulation studies to assess the performance of our bounds in finite samples. The results show that a secondary outcome that is more closely related to the primary outcome leads to narrower bounds. Finally, we illustrate the application of the proposed bounds in a randomized clinical trial of whether intensive glycemic control reduces the risk of development or progression of diabetic retinopathy.
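The simple bounds from the observed data are the Fréchet bounds obtained from the two marginal success proportions alone (a minimal sketch; the function name is ours, and the paper's contribution is precisely to tighten these using a secondary outcome):

```python
def thr_simple_bounds(p_treat, p_ctrl):
    """Frechet bounds on the treatment harm rate
    THR = P(failure on treatment AND success on control).
    Identified only up to an interval from the marginal
    success proportions p_treat and p_ctrl."""
    lower = max(0.0, p_ctrl - p_treat)
    upper = min(1.0 - p_treat, p_ctrl)
    return lower, upper
```

Note that even when the treatment is better on average (p_treat > p_ctrl), the upper bound can be far from zero, which is why tighter, assumption-based bounds are of interest.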

19.
We present an approach for using kinetic theory to capture first and second order statistics of neuronal activity. We coarse grain neuronal networks into populations of neurons and calculate the population average firing rate and output cross-correlation in response to time varying correlated input. We derive coupling equations for the populations based on first and second order statistics of the network connectivity. This coupling scheme is based on the hypothesis that second order statistics of the network connectivity are sufficient to determine second order statistics of neuronal activity. We implement a kinetic theory representation of a simple feed-forward network and demonstrate that the kinetic theory model captures key aspects of the emergence and propagation of correlations in the network, as long as the correlations do not become too strong. By analyzing the correlated activity of feed-forward networks with a variety of connectivity patterns, we provide evidence supporting our hypothesis of the sufficiency of second order connectivity statistics. Action Editor: Carson C. Chow

20.
Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discovery and clinical utility. Troublingly, we are in the midst of a replicability crisis. A key to replicability is that multiple measurements of the same item (e.g., experimental sample or clinical participant) under fixed experimental constraints are relatively similar to one another. Thus, statistics that quantify the relative contributions of accidental deviations, such as measurement error, as compared to systematic deviations, such as individual differences, are critical. We demonstrate that existing replicability statistics, such as intra-class correlation coefficient and fingerprinting, fail to adequately differentiate between accidental and systematic deviations in very simple settings. We therefore propose a novel statistic, discriminability, which quantifies the degree to which an individual's samples are relatively similar to one another, without restricting the data to be univariate, Gaussian, or even Euclidean. Using this statistic, we introduce the possibility of optimizing experimental design via increasing discriminability and prove that optimizing discriminability improves performance bounds in subsequent inference tasks. In extensive simulated and real datasets (focusing on brain imaging and demonstrating on genomics), only optimizing data discriminability improves performance on all subsequent inference tasks for each dataset. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the replicability crisis, and more generally, mitigating accidental measurement error.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号