Similar Articles
20 similar articles found (search time: 31 ms)
1.
It is shown that the moments of order statistics in samples drawn from a continuous population with pdf f(x) symmetric about zero and containing a single outlier with pdf g(x), also symmetric about zero, can be expressed in terms of (i) the moments of order statistics in samples from the population obtained by folding f(x) at zero and (ii) the moments of order statistics in samples from that folded population containing a single outlier whose pdf is obtained by folding g(x) at zero. The cumulative round-off error incurred when the moments of order statistics for the symmetric-outlier model are evaluated numerically from tables of moments for the folded population and for the folded-outlier model is not serious.
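The abstract gives an analytical reduction to folded-population moments. As an independent numerical check, not the paper's method, the Monte Carlo sketch below estimates E[X_{r:n}^k] directly for a symmetric single-outlier sample. The choice f = N(0, 1) and g = N(0, 3^2) is an assumption for illustration only; any pair of densities symmetric about zero would do.

```python
import random


def order_stat_moments(n, k=1, reps=20000, outlier_sd=3.0, seed=1):
    """Monte Carlo estimate of E[X_{r:n}^k] for r = 1..n.

    Each simulated sample has n - 1 draws from f = N(0, 1) (assumed) and one
    outlier from g = N(0, outlier_sd^2) (assumed); both are symmetric about zero.
    """
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        sample.append(rng.gauss(0.0, outlier_sd))  # the single outlier
        sample.sort()
        for r, x in enumerate(sample):
            sums[r] += x ** k
    return [s / reps for s in sums]


means = order_stat_moments(5)
# Symmetry of f and g implies E[X_{r:n}] = -E[X_{n-r+1:n}], up to Monte Carlo error
for r in range(5):
    print(r + 1, round(means[r], 3), round(-means[4 - r], 3))
```

The two printed columns should agree up to simulation error, because negating every observation leaves the joint distribution of a symmetric-outlier sample unchanged.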

2.
Many East Asian human populations harbor a high-frequency deficiency allele for the aldehyde dehydrogenase 2 (ALDH2) enzyme, a critical protein involved in the metabolism of ethanol. Here we use resequencing and long-range SNP haplotype data from a Japanese sample to test whether patterns of nucleotide diversity and linkage disequilibrium at this locus are compatible with a standard neutral model of evolution. Examining the pattern of polymorphism at a locus such as this, where the frequency of a common allele is known a priori, introduces an ascertainment bias that must be corrected for in analyses of the frequency spectrum of polymorphisms. We apply a flexible and generally applicable simulation approach to correct for this bias in our ALDH2 data and also to explore the effect of the bias on the commonly used summary statistics Tajima’s D, Fu and Li’s D, and Fay and Wu’s H. We find no evidence that the pattern of genetic variation at ALDH2 differs from that expected under a standard neutral model. However, our general examination of ascertainment bias indicates that a priori knowledge of segregating alleles greatly affects the expected distributions of summary statistics. Under many parameter combinations, ascertainment bias elevates the rate of false positives when summary statistics are used to test for deviations from a standard neutral model. We also show, however, that over a wide range of conditions the power of all summary statistics can be greatly increased by incorporating prior knowledge of segregating alleles. [Reviewing Editor: Dr. Martin Kreitman]
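Tajima's D, one of the summary statistics examined above, compares the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a_1 = Σ_{i=1}^{n-1} 1/i. A minimal sketch of the standard formula follows; it does not implement the ascertainment-bias correction described in the abstract, and the toy alignment is invented.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (no gaps or missing data)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: average number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Invented alignment: one singleton site and one intermediate-frequency site
print(round(tajimas_d(["AA", "AA", "AT", "TT"]), 3))
```

An excess of rare variants pushes D below zero and an excess of intermediate-frequency variants pushes it above zero; conditioning on a known common allele, as at ALDH2, shifts the null distribution of D.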

3.
Mortality statistics from five populations of small New World monkeys (including Callithrix jacchus, Leontopithecus rosalia, Saguinus fuscicollis, and Saguinus oedipus) were combined to generate a standard model life table reflecting the mortality patterns of these primates. The model is applied to three individual populations to illustrate a strategy for smoothing and interpolating mortality statistics of varying completeness and quality. © 1993 Wiley-Liss, Inc.
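A model life table like the one described above is built from a small set of standard columns. The sketch below derives l_x, d_x, L_x, T_x and e_x from a vector of age-specific death probabilities q_x; the mid-interval death convention and the q_x values are assumptions for illustration, not the paper's data or smoothing method.

```python
def life_table(qx, radix=100000.0):
    """Single-year life table from age-specific probabilities of death q_x.

    Assumes deaths fall, on average, at mid-interval (L_x = l_x - d_x / 2)
    and that the last age class is closed (q = 1).
    """
    if qx[-1] != 1.0:
        raise ValueError("last q_x must be 1 to close the table")
    lx, dx, Lx = [], [], []
    alive = float(radix)
    for q in qx:
        deaths = alive * q
        lx.append(alive)
        dx.append(deaths)
        Lx.append(alive - 0.5 * deaths)  # person-years lived in [x, x+1)
        alive -= deaths
    # T_x: person-years remaining from age x onward; e_x = T_x / l_x
    Tx, running = [], 0.0
    for L in reversed(Lx):
        running += L
        Tx.append(running)
    Tx.reverse()
    ex = [T / l if l > 0 else 0.0 for T, l in zip(Tx, lx)]
    return {"lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}


# Hypothetical q_x values, not taken from the paper
table = life_table([0.3, 0.1, 0.15, 0.4, 1.0])
print([round(e, 2) for e in table["ex"]])
```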

4.
Exact test statistics and confidence intervals for a general split-block ANOCOVA model are derived. With a single covariate, each statistic for testing main effect A, main effect B, and the A×B interaction has one fewer numerator degree of freedom than its counterpart in the ordinary ANOVA without a covariate. Sufficient conditions on the model parameters that allow these lost numerator degrees of freedom to be regained are given, as are exact statistics and confidence intervals for the corresponding reduced models. A note of caution is offered about constructing test statistics for reduced versions of the general model by generalized least squares. General analysis of covariance models for two other block designs are also presented.

5.
Renewable energy from lignocellulosic biomass has been regarded as an alternative to depleting fossil fuels. To improve this technology, we aim to develop robust mathematical models of the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to a Weibull distribution, we found that Weibull statistics accurately predict lignocellulose saccharification data regardless of the type of substrate, enzyme, and saccharification conditions. A mathematical model of enzymatic lignocellulose degradation was then constructed based on Weibull statistics. Further analysis of the model's mathematical structure and of experimental saccharification data showed the significance of the model's two parameters. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This interpretation was further supported by statistical analysis of experimental saccharification data and by analysis of glucose production levels as the λ and n values change. In conclusion, the Weibull statistics‐based model can accurately predict lignocellulose hydrolysis behavior, and the λ parameter can be used to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and of the λ value in assessing saccharification performance are discussed.
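The abstract does not state the model equation. A common form for a Weibull-based hydrolysis curve, assumed here, is y(t) = y_max [1 - exp(-(t/λ)^n)], in which λ is the time at which 1 - e^{-1} ≈ 63.2% of y_max has been reached, consistent with its description as a characteristic time. The sketch fits λ and n by least squares on the linearized form ln(-ln(1 - y/y_max)) = n ln t - n ln λ; a nonlinear fit may be preferable for noisy data.

```python
import math


def weibull_yield(t, lam, n, y_max=1.0):
    """Assumed cumulative Weibull curve: fraction of substrate hydrolysed by time t."""
    return y_max * (1.0 - math.exp(-((t / lam) ** n)))


def fit_weibull(times, yields, y_max=1.0):
    """Estimate (lambda, n) by linear least squares on the linearized Weibull curve.

    Requires 0 < y < y_max and t > 0 for every observation.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - y / y_max)) for y in yields]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx  # intercept = -n * ln(lambda)
    n = slope
    lam = math.exp(-intercept / n)
    return lam, n


# Synthetic (noise-free) data generated from assumed parameters lambda = 10 h, n = 0.8
hours = [1, 2, 4, 8, 16, 32]
data = [weibull_yield(t, 10.0, 0.8) for t in hours]
print(fit_weibull(hours, data))
```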

6.
Bayesian inference assumes that the empirical data are a good statistical fit to the analytical model, but this assumption can be difficult to evaluate. Here we introduce a novel R package that uses posterior predictive simulation to evaluate the fit of the multispecies coalescent model used to estimate species trees. We conduct a simulation study to evaluate the consistency of different summary statistics in comparing posterior and posterior predictive distributions, the use of simulation replication in reducing error rates, and the utility of parallel process invocation for improving computation times. We also test P2C2M on two empirical data sets in which hybridization and gene flow are suspected of contributing to shared polymorphism, in violation of the coalescent model: Tamias chipmunks and Myotis bats. Our results indicate that (i) probability‐based summary statistics display the lowest error rates, (ii) simulation replication decreases the rate of type II errors, and (iii) our R package displays improved statistical power compared to previous implementations of this approach. When probabilistic summary statistics are used, P2C2M corroborates the assumption that genealogies collected from Tamias and Myotis are not a good fit to the multispecies coalescent model. Taken as a whole, our findings argue that an assessment of the fit of the multispecies coalescent model should accompany any phylogenetic analysis that estimates a species tree.

7.
8.
Most species data display spatial autocorrelation, which can affect the accuracy statistics of ecological niche models (ENMs) and hence their ability to infer geographic distributions. Here we evaluate whether the spatial autocorrelation underlying species data affects accuracy statistics, and we map the uncertainties that spatial autocorrelation introduces into species range predictions under past and future climate models. As an example, ENMs were fitted to Qualea grandiflora (Vochysiaceae), a widely distributed plant of the Brazilian Cerrado. We corrected for spatial autocorrelation in ENMs by selecting sampling sites equidistant in geographical (GEO) and environmental (ENV) spaces. Distributions were modelled using 13 ENMs evaluated by two accuracy statistics (TSS and AUC), which were compared with uncorrected ENMs. Null models and the similarity statistic I were used to evaluate the effects of spatial autocorrelation. Moreover, we applied a hierarchical ANOVA to partition and map the uncertainties from the time component (across the last glacial maximum, pre‐industrial, and 2080 time periods) and from the methodological components (ENMs and autocorrelation corrections). The GEO and ENV models had the highest accuracy‐statistic values, although only the ENV model had values higher than expected by chance alone for most of the 13 ENMs. Uncertainties from the time component were higher in the core region of the Brazilian Cerrado where Q. grandiflora occurs, whereas the methodological components produced higher uncertainties in the extreme northern and southern regions of South America (i.e. outside the Brazilian Cerrado). Our findings show that accounting for autocorrelation in environmental space is more efficient than doing so in geographical space. Methodological uncertainties were concentrated outside the core region of Q. grandiflora's habitat. Conversely, the uncertainty due to the time component within the Brazilian Cerrado indicates that the ENMs were able to capture the effects of climate change on Q. grandiflora distributions.
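TSS and AUC, the two accuracy statistics named above, can be computed from observed presences/absences and model predictions. The sketch below uses the usual definitions (TSS = sensitivity + specificity - 1; AUC as the Mann-Whitney probability that a presence scores higher than an absence); the data and the 0.5 threshold are invented for illustration.

```python
def tss(observed, predicted):
    """True skill statistic for binary presence (1) / absence (0) data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(observed, scores):
    """AUC: probability that a random presence scores above a random absence (ties = 1/2)."""
    pos = [s for o, s in zip(observed, scores) if o]
    neg = [s for o, s in zip(observed, scores) if not o]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Invented presences/absences and model suitability scores
obs = [1, 1, 1, 0, 0, 0, 1, 0]
score = [0.9, 0.7, 0.4, 0.3, 0.6, 0.1, 0.8, 0.2]
pred = [1 if s >= 0.5 else 0 for s in score]
print(round(tss(obs, pred), 3), round(auc(obs, score), 3))
```

TSS depends on the chosen threshold, whereas AUC summarizes ranking performance across all thresholds, which is one reason the two are often reported together.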

9.
Stability analysis of multilocation trials is often based on a mixed two-way model. Two stability measures in frequent use are the environmental variance (S_i^2) and the ecovalence (W_i). Under the two-way model, the rank orders of the expected values of these two statistics are identical for a given set of genotypes. By contrast, empirical rank correlations between these measures are consistently low, which suggests that the two-way mixed model may not be appropriate for describing real data. To check this hypothesis, a Monte Carlo simulation was conducted. It revealed that the low empirical rank correlation between S_i^2 and W_i is most likely due to sampling error. It is concluded that the observed low rank correlation does not invalidate the two-way model. The paper also discusses tests for homogeneity of S_i^2 as well as implications of the two-way model for the classification of stability statistics.
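Both stability measures can be computed from a genotype-by-environment table of means. The sketch assumes S_i^2 is the sample variance of genotype i across environments (divisor e - 1) and W_i is Wricke's ecovalence, the sum of squared genotype-by-environment interaction residuals for genotype i; divisor conventions vary between authors, and the yields are invented.

```python
def stability_measures(y):
    """Environmental variance S_i^2 and Wricke's ecovalence W_i per genotype.

    y[i][j] is the mean of genotype i in environment j (complete table).
    """
    g, e = len(y), len(y[0])
    gen_mean = [sum(row) / e for row in y]
    env_mean = [sum(y[i][j] for i in range(g)) / g for j in range(e)]
    grand = sum(gen_mean) / g
    s2 = [sum((y[i][j] - gen_mean[i]) ** 2 for j in range(e)) / (e - 1)
          for i in range(g)]
    # Interaction residual: y_ij - ybar_i. - ybar_.j + ybar_..
    w = [sum((y[i][j] - gen_mean[i] - env_mean[j] + grand) ** 2 for j in range(e))
         for i in range(g)]
    return s2, w


# Invented yields: 3 genotypes x 4 environments
yields = [[5.0, 6.1, 4.2, 7.3],
          [5.5, 5.9, 5.1, 6.0],
          [4.1, 6.8, 3.5, 8.2]]
s2, w = stability_measures(yields)
print([round(v, 3) for v in s2], [round(v, 3) for v in w])
```

In a purely additive table every W_i is zero while S_i^2 still reflects the environmental spread, which is why the two measures can rank genotypes differently in a single data set.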

10.
The pair correlation function g(r) is an important tool for exploratory data analysis and model choice in point process statistics. For cluster processes, the behaviour of g(r) for small r is particularly interesting, yet precisely these values of g(r) are difficult to estimate. This paper aims to show that kernel estimators yield reliable results, and that it is useful to work with variable bandwidths. An example in which the points are the positions of pine trees in a forest illustrates the application of the method.
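As an illustration of kernel estimation of g(r), the sketch below uses a fixed-bandwidth Epanechnikov kernel with the translation edge correction for points observed in the unit square. The fixed bandwidth is a simplification (the abstract recommends variable bandwidths), and the window, bandwidth and simulated pattern are assumptions.

```python
import math
import random


def pair_correlation(points, r, h):
    """Kernel estimate of g(r) for a point pattern observed in the unit square.

    Fixed-bandwidth Epanechnikov kernel, translation edge correction;
    lambda^2 is estimated by n(n - 1) / |W|^2 with |W| = 1.
    """
    n = len(points)
    lam2 = n * (n - 1)  # |W| = 1
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            u = (r - math.hypot(dx, dy)) / h
            if abs(u) < 1.0:
                kernel = 0.75 * (1.0 - u * u) / h  # Epanechnikov k_h(r - d_ij)
                # Area of W intersected with W shifted by (dx, dy), for the unit square
                overlap = (1.0 - abs(dx)) * (1.0 - abs(dy))
                total += kernel / overlap
    return total / (2.0 * math.pi * r * lam2)


# Completely random pattern: the estimate should be close to 1
rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(400)]
print(round(pair_correlation(pts, r=0.1, h=0.025), 2))
```

Cluster processes give g(r) > 1 at small r; shrinking h there reduces bias but increases variance, which is the trade-off that motivates variable bandwidths.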

11.
12.
To test a hypothesized mean vector, and to test the equality of two mean vectors, robust statistics are developed that have exactly the same form as the Hotelling T^2 statistics. These statistics are shown to have remarkable type I error robustness and power.
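For reference, the classical one-sample Hotelling statistic, whose form the robust statistics above share, is T^2 = n (x̄ - μ_0)' S^{-1} (x̄ - μ_0). The sketch below is restricted to two variables so that S can be inverted by hand; it uses the ordinary sample mean and covariance, not the robust estimators developed in the paper, and the data are invented.

```python
def hotelling_t2_bivariate(data, mu0):
    """One-sample Hotelling T^2 for bivariate observations.

    data: list of (x1, x2) pairs; mu0: hypothesized mean vector (m1, m2).
    Returns T^2 and the equivalent F statistic with (2, n - 2) df.
    """
    n, p = len(data), 2
    xbar = [sum(row[k] for row in data) / n for k in range(p)]
    # Sample covariance matrix (divisor n - 1)
    s = [[sum((row[a] - xbar[a]) * (row[b] - xbar[b]) for row in data) / (n - 1)
          for b in range(p)] for a in range(p)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("sample covariance matrix is singular")
    s_inv = [[s[1][1] / det, -s[0][1] / det],
             [-s[1][0] / det, s[0][0] / det]]
    d = [xbar[k] - mu0[k] for k in range(p)]
    quad = sum(d[a] * s_inv[a][b] * d[b] for a in range(p) for b in range(p))
    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2  # ~ F(p, n - p) under H0 for normal data
    return t2, f_stat


# Invented example data
obs = [(5.1, 3.4), (4.8, 3.1), (5.4, 3.9), (5.0, 3.3), (4.6, 3.2), (5.3, 3.6)]
print(hotelling_t2_bivariate(obs, mu0=(5.0, 3.0)))
```

Under multivariate normality the returned F value is compared with an F(2, n - 2) critical value; the robustness results in the abstract concern behaviour when that assumption fails.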

13.
Hong Zhang, Zheyang Wu. Biometrics, 2023, 79(2): 1159-1172
Combining dependent tests of significance has broad applications, but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (e.g., Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. This problem could lead to significant false discoveries in big data analyses. This paper makes two main contributions. First, it presents a general family of Fisher-type statistics, referred to as GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, the weighted Z-score combination, and so forth. GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts proper weights and statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under the multivariate Gaussian distribution and more robust under the generalized linear model and the multivariate t-distribution. The applications of GFisher and the new p-value calculation methods are demonstrated in a gene-based single nucleotide polymorphism (SNP)-set association study. The relevant computations have been implemented in the R package GFisher, available on the Comprehensive R Archive Network.
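The baseline that GFisher generalizes is Fisher's combination statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom when the k p-values are independent and uniform under the null. The sketch below covers only that independent case, using the closed-form chi-square tail for even degrees of freedom; it is not the GFisher package, and for dependent tests (the paper's focus) this p-value is not valid.

```python
import math


def fisher_combination(pvalues):
    """Fisher's method for INDEPENDENT p-values.

    X = -2 * sum(ln p_i) ~ chi-square(2k) under H0. For even df = 2k,
    P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, tail = 1.0, 1.0
    for j in range(1, k):
        term *= half / j  # (x/2)^j / j!
        tail += term
    return x, math.exp(-half) * tail


print(fisher_combination([0.01, 0.20, 0.04]))
```

With correlated tests the chi-square(2k) reference distribution is wrong, which is where approximations such as Brown's, and the new methods in the abstract, come in.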

14.
The Trojan Y-Chromosome (TYC) strategy, an autocidal genetic biocontrol method, has been proposed to eliminate invasive alien species. In this work, we develop a Markov jump process model for this strategy and verify that there is a positive probability that wild-type females go extinct within a finite time. Moreover, when sex-reversed Trojan females are introduced at a constant population size, we formulate a stochastic differential equation (SDE) model as an approximation to the proposed Markov jump process model. Using the SDE model, we investigate the probability distribution and the expectation of the extinction time of wild-type females by solving the Kolmogorov equations associated with these quantities. The results indicate how the probability distribution and expectation of the extinction time are shaped by the initial conditions and the model parameters.

15.
We have evaluated the power of seven statistics to detect a common trait determined by two loci; five of these statistics are implemented in the computer program SimWalk2 and two in GENEHUNTER. Unlike most previous reports, which evaluate the power of allele-sharing statistics for a single disease locus, we used a simulated data set of general pedigrees in which a two-locus disease is segregating, and evaluated several nonparametric linkage statistics implemented in the two programs. We found that the power to detect linkage using the S_all statistic in GENEHUNTER (GH, version 2.1), which is implemented as statistic E in SimWalk2 (version 2.82), differs between the two programs. The P values associated with statistic E output by SimWalk2 are consistently more conservative than those from GENEHUNTER, except when the underlying model includes heterogeneity at a level of 50%, where the P values are very similar. On the other hand, when the thresholds are determined empirically under the null hypothesis, S_all in GENEHUNTER and statistic E have similar power.

16.
Weighted logrank testing procedures for comparing r treatments with a control when some of the data are randomly censored are discussed. Four kinds of test statistics for the simple tree alternatives are considered. A weighted logrank statistic based on a pairwise ranking scheme is proposed, and the covariances of the test statistics are obtained explicitly. This class of test statistics can be viewed as a general framework for constructing test procedures for various order-restricted alternatives by modifying the weights. The four kinds of weighted logrank tests are illustrated with an example. Simulation studies are performed to compare the sizes and powers of the tests considered.

17.
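Objective: To construct an evaluation model for the supervision of public health services in public hospitals and to evaluate the capacity of public hospitals' public health service supervision. Methods: The analytic hierarchy process, the comprehensive index method, mathematical statistical analysis and fuzzy mathematics were used to construct the evaluation model and evaluate supervision capacity, thereby establishing a yardstick for government supervision of public health work in public hospitals. Results: An evaluation model of supervision capacity for public health services in public hospitals was produced, comprising 5 first-level indicators and 18 second-level indicators. Conclusion: The constructed model provides a content basis and quantitative criteria for comprehensively evaluating the supervision capacity of public health services in public hospitals, and a scientific basis for the government to evaluate the quality of public health services effectively.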
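The analytic hierarchy process (AHP) mentioned in the Methods derives indicator weights from a pairwise-comparison matrix. The sketch below uses the row geometric-mean approximation to the principal eigenvector and Saaty's consistency ratio; the comparison matrix is hypothetical and unrelated to the 5 first-level and 18 second-level indicators of the model.

```python
import math

# Saaty's random consistency index for matrix orders 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Priority weights and consistency ratio for a reciprocal comparison matrix.

    Weights use the row geometric-mean approximation to the principal
    eigenvector; lambda_max is estimated as the mean of (A w)_i / w_i.
    """
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    w = [g / total for g in geo]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]  # KeyError for n > 9: extend the table if needed
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical judgments for three indicators (1 = equal, 3 = moderately more important, ...)
A = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
weights, cr = ahp_weights(A)
print([round(x, 3) for x in weights], round(cr, 3))
```

A consistency ratio below about 0.1 is conventionally taken as acceptable; larger values suggest the pairwise judgments should be revisited.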

18.
Questions: Can probability of occurrence and dominance be accurately estimated for six important conifer species with varying range sizes? Does range size affect the accuracy of species probability of occurrence models? Is species predicted probability of occurrence significantly related to observed dominance? Location: Pacific Northwest region, North America (60°–40°N, 140°–110°W). Methods: This study develops near range‐wide predictive distribution maps for six important conifer species (Pseudotsuga menziesii, Tsuga heterophylla, Pinus contorta, Thuja plicata, Larix occidentalis, and Picea glauca) using forest inventory data collected across the United States and Canada. Species model accuracies are compared with range size using a rank scoring system. A suite of climate and topographic predictor variables is used to investigate environmental constraints that limit species range and to quantify relationships between species predicted probability of occurrence and dominance at both plot and landscape scales. Results: Evaluation statistics revealed that significant and accurate probability of occurrence models were developed for all six species. Based on ranked evaluation statistics, Tsuga heterophylla had the highest overall model accuracy (statistic rank score=5) and Pinus contorta the lowest (statistic rank score=17). Across species, ranked evaluation statistics also revealed a pattern of decreasing model accuracy with increasing range size. At the plot level, correlations between dominance and probability of occurrence were weakly positive for all species, and only half of the species had statistically significant correlations. Pseudotsuga menziesii had the highest correlation (r=0.36, P<0.001) and Thuja plicata the lowest (r=0.038, P=0.799). At the 50‐km scale, correlations between dominance and probability of occurrence improved for all species except Pinus contorta. Pseudotsuga menziesii displayed the highest correlation (r=0.68, P<0.001) and Thuja plicata the lowest (r=0.07, P>0.709). Conclusions: Species probability of occurrence model accuracy decreased with increasing range size. The strength and significance of correlations between probability of occurrence and dominance varied considerably by species and across spatial scales. Apart from Pseudotsuga menziesii and L. occidentalis, the results suggest that probability of occurrence is not a consistently reliable surrogate for species dominance in Pacific Northwest forests. We demonstrate how the degree of correlation between species occurrence and dominance can be used as an indicator of how well predictions of occurrence characterize the optimal niche of a species.

19.
In meta-analysis, hypothesis testing is a commonly used approach for assessing whether effects are heterogeneous between studies. The literature has concluded that the Q-statistic is clearly the best choice and has criticized the likelihood ratio test for its type I error control and power. However, all the criticism of the likelihood ratio test rests on the use of a mixture of two chi-square distributions with 0 and 1 degrees of freedom, which is justified only asymptotically. In this study, we develop a novel method to derive the finite-sample distributions of the likelihood ratio test and restricted likelihood ratio test statistics for testing a zero variance component in the random-effects model for meta-analysis. We also extend this result to the heterogeneity test when meta-regression is applied. A numerical study shows that the proposed statistics perform better than the Q-statistic, especially when the number of studies collected for the meta-analysis is small to moderate.
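Cochran's Q, the comparator in the abstract, weights each study's effect estimate y_i by w_i = 1/v_i and sums the squared deviations from the weighted mean; under homogeneity it is referred to a chi-square distribution with k - 1 degrees of freedom. The sketch below computes Q and the derived I^2; it does not reproduce the paper's finite-sample likelihood ratio tests, and the example inputs are invented.

```python
def cochran_q(effects, variances):
    """Cochran's Q heterogeneity statistic and I^2 for k study estimates.

    effects: per-study effect estimates y_i; variances: within-study variances v_i.
    """
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)  # fixed-effect mean
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Invented log odds ratios and variances from four studies
print(cochran_q([0.20, 0.35, -0.10, 0.50], [0.04, 0.09, 0.05, 0.12]))
```

With few studies, Q has little power against a nonzero between-study variance, which is the small-k regime where the abstract reports the proposed statistics doing better.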

20.
Longitudinal samples of DNA sequences are DNA sequences sampled from the same population at different time points. For fast-evolving organisms, e.g. RNA viruses, these kinds of samples have increasingly been used to study the evolutionary process in action. Longitudinal samples provide some interesting new summary statistics of genetic variation, such as the frequency of mutations of size i in one sample and size j in the other, the average number of mutations accumulated since the common ancestor of two sequences drawn from different samples, and the numbers of private, shared and fixed mutations within samples. To make the results more broadly applicable, we used a general two-sample model, which assumes that two longitudinal samples were taken from the same measurably evolving population. Inspired by the HIV study, we also studied a two-sample-two-stage model, a special case of the two-sample model that assumes a treatment after the first sampling instantaneously changes the population size. We derived formulas for calculating statistical properties (e.g. expectations, variances and covariances) of these new summary statistics under the two models. Potential applications of these results are discussed.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417