Similar literature
Found 20 similar articles; search time 31 ms
1.
Innan H, Zhang K, Marjoram P, Tavaré S, Rosenberg NA. Genetics 2005, 169(3):1763-1777
Several tests of neutral evolution employ the observed number of segregating sites and properties of the haplotype frequency distribution as summary statistics and use simulations to obtain rejection probabilities. Here we develop a “haplotype configuration test” of neutrality (HCT) based on the full haplotype frequency distribution. To enable exact computation of rejection probabilities for small samples, we derive a recursion under the standard coalescent model for the joint distribution of the haplotype frequencies and the number of segregating sites. For larger samples, we consider simulation-based approaches. The utility of the HCT is demonstrated in simulations of alternative models and in application to data from Drosophila melanogaster.

2.
Multilocus genotype probabilities, estimated using the assumption of independent association of alleles within and across loci, are subject to sampling fluctuation, since allele frequencies used in such computations are derived from samples drawn from a population. We derive exact sampling variances of estimated genotype probabilities and provide a simple approximation of the sampling variances. Computer simulations conducted using real DNA typing data indicate that, while the sampling distribution of estimated genotype probabilities is not symmetric around the point estimate, the confidence interval of estimated (single-locus or multilocus) genotype probabilities can be obtained from the sampling of a logarithmic transformation of the estimated values. This, in turn, allows an examination of heterogeneity of estimators derived from data on different reference populations. Applications of this theory to DNA typing data at VNTR loci suggest that use of different reference population data may yield significantly different estimates. However, significant differences generally occur with rare (less than 1 in 40,000) genotype probabilities. Conservative estimates of five-locus DNA profile probabilities are always less than 1 in 1 million in an individual from the United States, irrespective of the racial/ethnic origin.

3.
Protein folding, stability, and function are usually influenced by pH, and free energy plays a fundamental role in the analysis of such pH-dependent properties. An electrostatics-based theoretical framework that uses a dielectric solvent continuum model and solves the Poisson-Boltzmann equation numerically has been shown to be very successful in understanding the pH-dependent properties. However, in this approach the exact computation of the pH-dependent free energy becomes impractical for proteins possessing more than several tens of ionizable sites (e.g., >30), because exact evaluation of the partition function requires a summation over a vast number of possible protonation microstates. Here we present a method which computes the free energy using the average energy and the protonation probabilities of ionizable sites obtained by the well-established Monte Carlo sampling procedure. The key feature is to calculate the entropy by using the protonation probabilities. We used this method to examine a well-studied protein (lysozyme) and produced results which agree very well with the exact calculations. Applications to the optimum pH of maximal stability of proteins and protein–DNA interactions have also resulted in good agreement with experimental data. These examples recommend our method for application to the elucidation of the pH-dependent properties of proteins.
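The entropy step in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the ionizable sites are treated as independent two-state (protonated or deprotonated) variables, so that the entropy is a sum of binary mixing entropies; the abstract does not state the exact entropy expression used by the authors. The function names and the choice of energy units (kT = 1 by default) are hypothetical.

```python
import math


def site_entropy(p):
    """Binary mixing entropy (in units of k_B) of one site with protonation probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a site that is always (de)protonated contributes no entropy
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def free_energy(avg_energy, protonation_probs, kT=1.0):
    """F = <E> - T*S, with S approximated from per-site protonation probabilities.

    Assumption (not from the source): sites are treated as independent.
    avg_energy and kT must be given in the same energy units.
    """
    entropy = sum(site_entropy(p) for p in protonation_probs)
    return avg_energy - kT * entropy


# Two half-protonated sites with zero average energy: F = -2 ln 2 (in units of kT)
example = free_energy(0.0, [0.5, 0.5])
```

In practice the average energy and the probabilities would both come from the same Monte Carlo run over protonation microstates.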

4.
Sensitivity and specificity have traditionally been used to assess the performance of a diagnostic procedure. Diagnostic procedures with both high sensitivity and high specificity are desirable, but these procedures are frequently too expensive, hazardous, and/or difficult to operate. A less sophisticated procedure may be preferred, if the loss of the sensitivity or specificity is determined to be clinically acceptable. This paper addresses the problem of simultaneous testing of sensitivity and specificity for an alternative test procedure with a reference test procedure when a gold standard is present. The hypothesis is formulated as a compound hypothesis of two non-inferiority (one-sided equivalence) tests. We present an asymptotic test statistic based on the restricted maximum likelihood estimate in the framework of comparing two correlated proportions under the prospective and retrospective sampling designs. The sample size and power of an asymptotic test statistic are derived. The actual type I error and power are calculated by enumerating the exact probabilities in the rejection region. For applications that require high sensitivity as well as high specificity, a large number of positive subjects and a large number of negative subjects are needed. We also propose a weighted sum statistic as an alternative test by comparing a combined measure of sensitivity and specificity of the two procedures. The sample size determination is independent of the sampling plan for the two tests.

5.
Assuming a lognormally distributed measure of bioavailability, individual bioequivalence is defined as originally proposed by Anderson and Hauck (1990) and Wellek (1990; 1993). For the posterior probability of the associated statistical hypothesis with respect to a noninformative reference prior, a numerically efficient algorithm is constructed which serves as the building block of a procedure for computing exact rejection probabilities of the Bayesian test under arbitrary parameter constellations. By means of this tool, the Bayesian test can be shown to maintain the significance level without being over-conservative and to yield gains in power of up to 30% as compared to the distribution-free procedure which gained some popularity under the name TIER. Moreover, it is shown that the Bayesian construction also allows scaling of the probability-based criterion with respect to the proportion of subjects exhibiting bioequivalent responses to repeated administrations of the reference formulation of the drug under study.

6.
A structure for representing problems in decision analysis and in expert systems, which reason under uncertainty, is the influence diagram or causal network. A causal network consists of an underlying joint probability distribution and a directed acyclic graph in which a propositional variable that represents a marginal distribution is stored at each vertex in the graph. This paper is concerned with two of the problems in applications that use causal networks. The first problem is the determination of the conditional probabilities of the values of the remaining propositional variables in the network given that certain variables are instantiated for particular values. This is called probability propagation. The second problem is the determination of the most probable, second most probable, third most probable, and so on sets of values of a particular set of variables (called the explanation set) given that certain variables are instantiated for particular values. This problem is called abductive inference. There exists a class of causal networks, in which each variable has only two parents, for which the time required by any known method for probability propagation is exponential relative to the number of vertices in the network. The determination of a new method that would be efficient for all causal networks appears unlikely, because probability propagation has been shown to be #P-complete. In many medical applications, networks are often large and not sparsely connected. Therefore a method for the exact determination of probability values appears unlikely for such applications, and the development of approximation methods seems to be the best solution. The current approximation methods obtain interval bounds for the probability values. When such intervals are obtained, it is not possible in general to rank the alternatives.
In this paper, a method is developed for obtaining expected values for the point probabilities from interval constraints on the probabilities. The method is based on an application of the principle of indifference to the probability values themselves. The distributions obtained with the principle of indifference are a generalization of the symmetric Dirichlet distribution in which prior ignorance is assumed.

7.
The procedures currently used for testing the bioequivalence of two drug formulations achieve control over the error probability of erroneously accepting bioequivalence or over the probability of erroneous rejection, but not over both error probabilities. A two-stage procedure that rectifies this drawback is presented, assuming that the performance of the drug is characterized by a normally distributed variate.

8.
The evolution of the probabilities of genetic identity within and between tandemly repeated loci of a multigene family is investigated analytically and numerically. Unbiased intrachromosomal gene conversion, equal crossing over, random genetic drift, and mutation to new alleles are incorporated. Generations are discrete and nonoverlapping; the diploid, monoecious population mates at random. Under the restriction that there is at most one crossover in the multigene family per individual per generation, the dependence on location of the probabilities of identity is treated exactly. In the "homogeneous" approximation to this "exact" model, end effects are disregarded; in the "exchangeable" approximation, to which all previous work was confined, all position dependence is neglected. Numerical results indicate that the exchangeable and homogeneous models are both qualitatively correct, the exchangeable model is sometimes too inaccurate for quantitative conclusions, and the homogeneous model is always more accurate than the exchangeable one and is always sufficiently accurate for quantitative conclusions.

9.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Scalar-Gibbs is easy to implement, and it is widely used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. These problems do not arise if the genotypes are sampled jointly from the entire pedigree. This paper proposes a method to jointly sample genotypes. The method combines the Elston-Stewart algorithm and iterative peeling, and is called the ESIP sampler. For a hypothetical pedigree, genotype probabilities are estimated from samples obtained using ESIP and also scalar-Gibbs. Approximate probabilities were also obtained by iterative peeling. Comparisons of these with exact genotypic probabilities obtained by the Elston-Stewart algorithm showed that ESIP and iterative peeling yielded genotypic probabilities that were very close to the exact values. Nevertheless, estimated probabilities from scalar-Gibbs with a chain of length 235 000, including a burn-in of 200 000 steps, were less accurate than probabilities estimated using ESIP with a chain of length 10 000, with a burn-in of 5 000 steps. The effective chain size (ECS) was estimated from the last 25 000 elements of the chain of length 125 000. For one of the ESIP samplers, the ECS ranged from 21 579 to 22 741, while for the scalar-Gibbs sampler, the ECS ranged from 64 to 671. Genotype probabilities were also estimated for a large real pedigree consisting of 3 223 individuals. For this pedigree, it is not feasible to obtain exact genotype probabilities by the Elston-Stewart algorithm. ESIP and iterative peeling yielded very similar results. However, results from scalar-Gibbs were less accurate.

10.
Stephens and Donnelly have introduced a simple yet powerful importance sampling scheme for computing the likelihood in population genetic models. Fundamental to the method is an approximation to the conditional probability of the allelic type of an additional gene, given those currently in the sample. As noted by Li and Stephens, the product of these conditional probabilities for a sequence of draws that gives the frequency of allelic types in a sample is an approximation to the likelihood, and can be used directly in inference. The aim of this note is to demonstrate the high level of accuracy of the "product of approximate conditionals" (PAC) likelihood when used with microsatellite data. Results obtained on simulated microsatellite data show that this strategy leads to a negligible bias over a wide range of the scaled mutation parameter theta. Furthermore, the sampling variance of likelihood estimates as well as the computation time are lower than those obtained with importance sampling over the whole range of theta. It follows that this approach represents an efficient substitute for IS algorithms in computer-intensive (e.g. MCMC) inference methods in population genetics.

11.
Efficiently computing the Robinson-Foulds metric. (Total citations: 1; self-citations: 0; citations by others: 1)
The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, in sublinear time and with high probability, a (1 + epsilon) approximation of the true RF metric. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We complement our algorithm by presenting an efficient embedding procedure, thereby resolving an open issue from the preliminary version of this paper. We have also improved the performance of Day's (exact) algorithm in practice by using techniques discovered while implementing our approximation scheme. Indeed, we give a unified framework for edge-based tree algorithms in which implementation tradeoffs are clear. Finally, we present detailed experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach. Our new implementation, FastRF, is available as an open-source tool for phylogenetic analysis.
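For readers unfamiliar with the RF metric itself, the following sketch computes it by direct comparison of bipartition sets. This is the naive set-based definition, not Day's linear-time algorithm and not the randomized scheme or the FastRF implementation described in the abstract. The tree encoding (nested tuples with string leaf labels) and all function names are assumptions made for this illustration.

```python
def _collect_clades(tree, clades):
    """Return the leaf set below `tree`, recording every internal node's leaf set."""
    if not isinstance(tree, tuple):
        return frozenset([tree])  # a leaf label
    leaves = frozenset()
    for child in tree:
        leaves |= _collect_clades(child, clades)
    clades.add(leaves)
    return leaves


def splits(tree):
    """Nontrivial bipartitions of the unrooted tree, each stored as one canonical side."""
    clades = set()
    all_leaves = _collect_clades(tree, clades)
    reference = min(all_leaves)  # arbitrary but fixed leaf used to pick a canonical side
    result = set()
    for clade in clades:
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:  # skip trivial splits
            result.add(other if reference in clade else clade)
    return result


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
distance = rf_distance(t1, t2)  # the two quartet topologies share no internal split
```

Both trees must be defined on the same leaf set. The set operations make this roughly quadratic in the number of leaves, which is the cost the abstract's linear and sublinear methods are designed to avoid.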

12.
Cheung YK. Biometrics 2005, 61(2):524-531
When comparing follow-up measurements from two independent populations, missing records may arise due to censoring by events whose occurrence is associated with baseline covariates. In these situations, inferences based only on the completely followed observations may be biased if the follow-up measurements and the covariates are correlated. This article describes exact inference for a class of modified U-statistics under covariate-dependent dropouts. The method involves weighting each permutation according to the retention probabilities, and thus requires estimation of the missing data mechanism. The proposed procedure is nonparametric in that no distributional assumption is necessary for the outcome variables and the missingness patterns. Monte Carlo approximation by the Gibbs sampler is proposed, and is shown to be fast and accurate via simulation. The method is illustrated in two small data sets for which asymptotic inferential procedures may not be appropriate.

13.
Determination of the relative gene order on chromosomes is of critical importance in the construction of human gene maps. In this paper we develop a sequential algorithm for gene ordering. We start by comparing three sequential procedures to order three genes on the basis of Bayesian posterior probabilities, maximum-likelihood ratio, and minimal recombinant class. In the second part of the paper we extend the sequential procedure based on the posterior probabilities to the general case of g genes. We present a theorem that states that the predicted average probability of committing a decision error, associated with a Bayesian sequential procedure that accepts the hypothesis of a gene-order configuration with posterior probability equal to or greater than π*, is smaller than 1 - π*. This theorem holds irrespective of the number of genes, the genetic model, and the source of genetic information. The theorem is an extension of a classical result of Wald, concerning the sum of the actual and the nominal error probabilities in the sequential probability ratio test of two hypotheses. A stepwise strategy for ordering a large number of genes, with control over the decision-error probabilities, is discussed. An asymptotic approximation is provided, which facilitates the calculation, with existing computer software for gene mapping, of the posterior probabilities of an order and the error probabilities. We illustrate with some simulations that the stepwise ordering is an efficient procedure.

14.
Innan H, Nordborg M. Genetics 2003, 165(1):437-444
Various expressions related to the length of a conserved haplotype around a polymorphism of known frequency are derived. We obtain exact expressions for the probability that no recombination has occurred in a sample or subsample. We obtain an approximation for the probability that no recombination that could give rise to a detectable recombination event (through the four-gamete test) has occurred. The probabilities can be used to obtain approximate distributions for the length of variously defined haplotypes around a polymorphic site. The implications of our results for data analysis, and in particular for detecting selection, are discussed.
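The four-gamete test mentioned in the abstract above can be sketched as follows: for a pair of biallelic sites, observing all four allele combinations in a sample implies (under an infinite-sites mutation model) that at least one recombination event occurred between them. The haplotype encoding as strings of "0" and "1" and the function names are assumptions for this illustration; the sketch shows only the detection rule, not the probability calculations derived in the paper.

```python
from itertools import combinations


def four_gamete_violation(haplotypes, i, j):
    """True if all four allele combinations occur at sites i and j."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def incompatible_pairs(haplotypes):
    """All site pairs that fail the four-gamete test in the sample."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_violation(haplotypes, i, j)
    ]


# Each string is one sampled haplotype; each character is the allele at one site.
recombinant_sample = ["00", "01", "10", "11"]
compatible_sample = ["000", "011", "111"]
detected = incompatible_pairs(recombinant_sample)
```

A recombination event that does not produce a fourth gamete type goes undetected by this rule, which is why the abstract distinguishes between "no recombination" and "no detectable recombination".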

15.
The solution of the Boltzmann equation for plasma in a magnetic field with arbitrarily degenerate electrons and nondegenerate nuclei is obtained by the Chapman-Enskog method. Functions generalizing Sonine polynomials are used for obtaining an approximate solution. Fully ionized plasma is considered. The tensor of the heat conductivity coefficients in a nonquantized magnetic field is calculated. For nondegenerate and strongly degenerate plasma the asymptotic analytic formulas are obtained and compared with results of previous authors. The Lorentz approximation, which neglects electron-electron encounters, is asymptotically exact for strongly degenerate plasma. For the first time, analytical expressions for the heat conductivity tensor for nondegenerate electrons in the presence of a magnetic field are obtained in the three-polynomial approximation with account of electron-electron collisions. Including the third polynomial substantially improved the precision of the results. In the two-polynomial approximation, the obtained solution coincides with the published results. For strongly degenerate electrons, an asymptotically exact analytical solution for the heat conductivity tensor in the presence of a magnetic field is obtained for the first time. This solution has a considerably more complicated dependence on the magnetic field than those in previous publications and gives a several times smaller relative value of the thermal conductivity across the magnetic field at ωτ * 0.8.

16.
Using techniques from optimization theory, we have developed a computer program that approximates a desired probability distribution for amino acids by imposing a probability distribution on the four nucleotides in each of the three codon positions. These base probabilities allow for the generation of biased codons for use in mutational studies and in the design of biologically encoded libraries. The dependencies between codons in the genetic code often make the exact generation of the desired probability distribution for amino acids impossible. Compromises are often necessary. The program, therefore, not only solves for the "optimal" approximation to the desired distribution (where the definition of "optimal" is influenced by several types of parameters entered by the user), but also solves for a number of "sub-optimal" solutions that are classified into families of similar solutions. A representative of each family is presented to the program user, who can then choose the type of approximation that is best for the intended application. The Combinatorial Codons program is available for use over the web from http://www.wi.mit.edu/kim/computing.html.
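To make the relationship between base probabilities and amino acid probabilities concrete, the sketch below computes the forward mapping only: given a nucleotide distribution at each of the three codon positions, it returns the induced amino acid distribution under the standard genetic code. It does not reproduce the optimization performed by the Combinatorial Codons program; the input format (three dictionaries) and the function name are assumptions for this illustration.

```python
# Standard genetic code with bases ordered T, C, A, G at each codon position;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def amino_acid_distribution(position_probs):
    """Amino acid distribution induced by independent base choices per position.

    position_probs: list of three dicts mapping base -> probability
    (a hypothetical input format chosen for this sketch).
    """
    dist = {}
    for codon, amino_acid in CODON_TABLE.items():
        p = 1.0
        for base, probs in zip(codon, position_probs):
            p *= probs.get(base, 0.0)
        if p > 0.0:
            dist[amino_acid] = dist.get(amino_acid, 0.0) + p
    return dist


# Example: an "NNK" codon (any base, any base, then G or T with equal probability)
uniform = {b: 0.25 for b in BASES}
nnk = amino_acid_distribution([uniform, uniform, {"G": 0.5, "T": 0.5}])
```

Because the three positions are sampled independently, only 9 free parameters are available, which is why an arbitrary target distribution over 20 amino acids generally cannot be matched exactly.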

17.
An asymptotic approximation of the density function of the 2-locus 2-allele model with mutual neutral mutations was obtained by invoking the small disturbance asymptotic theory. Comparing the approximate formula with simulations showed that this asymptotic method gives a good approximation over the whole time evolution when the mutation rates are high, though it does not give good approximations near the stationary state when the mutation rates are low. At the stationary state, the squared standard linkage deviation computed from the approximate formula was compared with the exact one obtained by Ohta and Kimura (1969b). It gave a good approximation when the recombination rate is high, even under low mutation rates. Furthermore, as an application of the asymptotic method, the ancestral recombination graph (ARG) was considered.

18.

The inherent stochasticity of gene expression in the context of regulatory networks profoundly influences the dynamics of the involved species. Mathematically speaking, the propagators which describe the evolution of such networks in time are typically defined as solutions of the corresponding chemical master equation (CME). However, it is not possible in general to obtain exact solutions to the CME in closed form, which is due largely to its high dimensionality. In the present article, we propose an analytical method for the efficient approximation of these propagators. We illustrate our method on the basis of two categories of stochastic models for gene expression that have been discussed in the literature. The requisite procedure consists of three steps: a probability-generating function is introduced which transforms the CME into (a system of) partial differential equations (PDEs); application of the method of characteristics then yields (a system of) ordinary differential equations (ODEs) which can be solved using dynamical systems techniques, giving closed-form expressions for the generating function; finally, propagator probabilities can be reconstructed numerically from these expressions via the Cauchy integral formula. The resulting ‘library’ of propagators lends itself naturally to implementation in a Bayesian parameter inference scheme, and can be generalised systematically to related categories of stochastic models beyond the ones considered here.
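The final step described in abstract 18, recovering probabilities from a generating function via the Cauchy integral formula, can be sketched generically. For a probability-generating function G(z) = Σ P(n) zⁿ, the coefficient P(n) equals the contour integral of G(z)/zⁿ⁺¹ around the origin divided by 2πi, which the code below approximates with an equally spaced sum on a circle. The function name, the default radius, and the number of quadrature points are assumptions; the Poisson generating function is used only as a test case with a known answer and is not one of the gene expression models from the paper.

```python
import cmath
import math


def probability_from_pgf(pgf, n, radius=1.0, num_points=256):
    """Approximate P(n) from a generating function via the Cauchy integral formula.

    Uses the trapezoidal rule on the circle |z| = radius, which must lie inside
    the region where pgf is analytic.
    """
    total = 0.0 + 0.0j
    for k in range(num_points):
        theta = 2.0 * math.pi * k / num_points
        z = radius * cmath.exp(1j * theta)
        total += pgf(z) * cmath.exp(-1j * n * theta)
    return (total / (num_points * radius ** n)).real


# Test case: Poisson distribution with mean 2, G(z) = exp(2 (z - 1))
def poisson_pgf(z):
    return cmath.exp(2.0 * (z - 1.0))


approx = probability_from_pgf(poisson_pgf, 3)
exact = math.exp(-2.0) * 2.0 ** 3 / math.factorial(3)
```

The discretization error comes from aliasing with P(n + num_points), P(n + 2·num_points), and so on, so it is negligible whenever the distribution has little mass that far out; a radius below 1 can suppress it further at the cost of amplifying rounding error for large n.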

19.
Consider a positively regular, slightly supercritical branching process with K types. An approximation to the probability of survival of a line descended from a single individual of type i has recently been derived by Hoppe. If K is large, however, this approximation may not be easy to compute. A further approximation that is easily computable is given. The result is used to estimate probabilities of survival of an allele A that is originally present in one male or one female in a large, random mating, age-structured population. Both autosomal and sex-linked loci are considered. Another application of the approximation is also discussed. Journal paper no. J-13183 of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa, project 2588.

20.
We begin with a review of the areas of application of the signed-rank tests (SRTs) and we conclude that the results are exact only if no ties of non-null differences exist. In order to apply the SRTs according to WILCOXON and according to PRATT also in the presence of ties, by assigning midranks, we derive their null distributions. As special cases the null distributions for the problem without ties are obtained. In order to save the practising statistician the time-consuming calculations of the distribution functions, we compute tables of critical values (for reasons of volume they will be published as part of the reprints only). For N0 = 0 (1) 5 null differences and M = 1 (1) 10 non-null differences the critical values of all distributions with all possible tie vectors are calculated. Instructions are provided and an example serves to illustrate the use of the table. Extensions of the tables are obtained by means of counting formulas given in the text. Approximations are provided in order to make the application of tests possible for larger samples as well. It is shown that the approximation of the null distribution in the presence of ties by the null distributions under the assumption of no ties in some cases overstates and sometimes understates the exact rejection probability. For N0 = 0 (1) 10 and M = 1 (1) 10 all distributions with all possible tie vectors for the SRTs with WILCOXON and PRATT ranking are examined with respect to the lattice type of the test statistic. The result is given in table 6. It is evident that the portion of PRATT-distributions with lattice character decreases as the number of null differences increases. Continuity corrections are obtained for the asymptotic normal distribution which take into account the lattice character of the distribution of the test statistic.
