Similar literature
20 similar articles found (search time: 46 ms).
1.
Guan Y. Biometrics, 2011, 67(3): 926-936
Summary: We introduce novel regression-extrapolation-based methods to correct the often large bias in subsampling variance estimation as well as hypothesis testing for spatial point and marked point processes. For variance estimation, our proposed estimators are linear combinations of the usual subsampling variance estimator based on subblock sizes in a continuous interval. We show that they can achieve better rates in mean squared error than the usual subsampling variance estimator. In particular, for n×n observation windows, the optimal rate of n^(-2) can be achieved if the data have a finite dependence range. For hypothesis testing, we apply the proposed regression extrapolation directly to the test statistics based on different subblock sizes, and therefore avoid the need to conduct bias correction for each element in the covariance matrix used to set up the test statistics. We assess the numerical performance of the proposed methods through simulation, and apply them to analyze a tropical forest data set.

2.
The statistical modelling of count data permeates the discipline of ecology. Such data often exhibit overdispersion compared with a standard Poisson distribution, so that the variance of the counts is greater than the mean. Whereas modelling to reveal the effects of explanatory variables on the mean is commonplace, overdispersion is generally regarded as a nuisance parameter to be accounted for and subsequently ignored. Instead, we propose a method that models the overdispersion as a biologically interesting property of a data set and show how novel inference is provided as a result. We adapted the double hierarchical generalized linear model approach to create an easily extensible model structure that quantifies the influence of explanatory variables on the overdispersion of count data, and apply it to farmland birds. These data were from a study within Irish agricultural ecosystems, in which total bird species abundance and the abundance of farmland indicator species were compared on dairy and non‐dairy farms in the winter and breeding seasons. In general, overdispersion in bird counts was greater on dairy farms than on non‐dairy farms, and for total bird numbers, overdispersion was greatest on dairy farms in winter. Our model is fitted using the Bayesian package Rstan, and we make all code and data available in a GitHub repository. Within a Bayesian framework, this approach facilitates a meaningful quantification of the effects of categorical explanatory variables on any response variable with a tendency to overdispersion that has a meaningful biological or ecological explanation.
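As a rough illustration of what overdispersion relative to a Poisson distribution means, the sketch below computes the variance-to-mean ratio of a set of counts. It is not the double hierarchical generalized linear model that the abstract describes; the function name dispersion_index and the example counts are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance / sample mean.

    Values near 1 are consistent with a Poisson distribution;
    values well above 1 suggest overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical bird counts, for illustration only (not data from the study).
counts = [0, 0, 1, 2, 12, 0, 7, 1, 0, 15]
print(dispersion_index(counts))
```

A ratio well above 1, as for these made-up counts, is the pattern that a Poisson model cannot reproduce without an extra dispersion component.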

3.
The R² statistic is a popular and very widely used statistic in regression analysis for estimating the squared multiple correlation (SMC), ρ², between a response variable Y and p predictor variables, X1, …, Xp. Numerous articles are available in the statistical literature on the properties of R² as an estimator of ρ² when the observations are uncorrelated. However, relatively little is known about the behavior of R² when the available observations are correlated, such as the data that result from complex sampling schemes. In this paper, we study the behavior of R² in the presence of two-stage sampling data. Approximate expressions for the variance and the bias of R² in the presence of two-stage cluster sampling data with positive intracluster correlation (ρ*) are obtained. It is evident from these formulas and from a simulation study that R² is a poor estimator of ρ² except when ρ* is small. As such, we consider several alternative estimators of ρ² and evaluate their theoretical properties and finite sample performance using a simulation study.

4.
Abstract: The assumption of independent sample units is potentially violated in survival analyses where siblings comprise a high proportion of the sample. Violation of the independence assumption causes sample data to be overdispersed relative to a binomial model, which leads to underestimates of sampling variances. A variance inflation factor, c, is therefore required to obtain appropriate estimates of variances. We evaluated overdispersion in fetal and neonatal mule deer (Odocoileus hemionus) datasets where more than half of the sample units were comprised of siblings. We developed a likelihood function for estimating fetal survival when the fates of some fetuses are unknown, and we used several variations of the binomial model to estimate neonatal survival. We compared theoretical variance estimates obtained from these analyses with empirical variance estimates obtained from data-bootstrap analyses to estimate the overdispersion parameter, c. Our estimates of c for fetal survival ranged from 0.678 to 1.118, which indicate little to no evidence of overdispersion. For neonatal survival, 3 different models indicated that ĉ ranged from 1.1 to 1.4 and averaged 1.24–1.26, providing evidence of limited overdispersion (i.e., limited sibling dependence). Our results indicate that fates of sibling mule deer fetuses and neonates may often be independent even though they have the same dam. Predation tends to act independently on sibling neonates because of dam-neonate behavioral adaptations. The effect of maternal characteristics on sibling fate dependence is less straightforward and may vary by circumstance. We recommend that future neonatal survival studies incorporate additional sampling intensity to accommodate modest overdispersion (i.e., ĉ = 1.25), which would facilitate a corresponding ĉ adjustment in a model selection analysis using quasi-likelihood without a reduction in power. 
Our computational approach could be used to evaluate sample unit dependence in other studies where fates of individually marked siblings are monitored.
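The comparison of theoretical and bootstrap variances described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the overdispersion parameter is estimated as the ratio of a bootstrap (empirical) variance to a simple binomial (theoretical) variance, and that whole sibling groups are resampled. The function name c_hat_cluster_bootstrap and the toy data are hypothetical.

```python
import random
from statistics import variance


def c_hat_cluster_bootstrap(litters, n_boot=2000, seed=0):
    """Estimate a variance inflation factor for a survival proportion.

    litters: list of lists of 0/1 fates, one inner list per sibling group.
    Returns (bootstrap variance) / (binomial variance assuming independence).
    """
    rng = random.Random(seed)
    fates = [f for litter in litters for f in litter]
    n = len(fates)
    p = sum(fates) / n
    if p in (0.0, 1.0):
        raise ValueError("binomial variance is zero; ratio is undefined")
    theoretical = p * (1 - p) / n  # treats every individual as independent

    estimates = []
    for _ in range(n_boot):
        # Resample whole sibling groups so within-group dependence is kept.
        sample = [rng.choice(litters) for _ in litters]
        flat = [f for litter in sample for f in litter]
        estimates.append(sum(flat) / len(flat))
    empirical = variance(estimates)
    return empirical / theoretical


# Toy data: twins that always share the same fate (complete dependence).
dependent_twins = [[1, 1], [0, 0]] * 20
print(c_hat_cluster_bootstrap(dependent_twins))
```

With completely dependent twins the ratio should come out near 2, because each pair carries the information of a single independent unit; values near 1 would indicate that sibling fates behave as if independent.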

5.
Driving X chromosomes (XDs) bias their own transmission through males by killing Y‐bearing gametes. These chromosomes can in theory spread rapidly in populations and cause extinction, but many are found as balanced polymorphisms or as “cryptic” XDs shut down by drive suppressors. The relative likelihood of these outcomes and the evolutionary pathways through which they come about are not well understood. An XD was recently discovered in the mycophagous fly, Drosophila testacea, presenting the opportunity to compare this XD with the well‐studied XD of its sister species, Drosophila neotestacea. Comparing features of independently evolved XDs in young sister species is a promising avenue towards understanding how XDs and their counteracting forces change over time. In contrast to the XD of D. neotestacea, we find that the XD of D. testacea is old, with its origin predating the radiation of three species: D. testacea, D. neotestacea and their shared sister species, Drosophila orientacea. Motivated by the suggestion that older XDs should be more deleterious to carriers, we assessed the effect of the XD on both male and female fertility. Unlike what is known from D. neotestacea, we found a strong fitness cost in females homozygous for the XD in D. testacea: a large proportion of homozygous females failed to produce offspring after being housed with males for several days. Our male fertility experiments show that although XD male fertility is lower under sperm‐depleting conditions, XD males have comparable fertility to males carrying a standard X chromosome under a free‐mating regime, which may better approximate conditions in wild populations of D. testacea. Lastly, we demonstrate the presence of autosomal suppression of X chromosome drive. 
Our results provide support for a model of XD evolution where the dynamics of young XDs are governed by fitness consequences in males, whereas in older XD systems, both suppression and fitness consequences in females likely supersede male fitness costs.

6.
For some applications of the Wilcoxon-Mann-Whitney statistic, its variance has to be estimated. Examples are the test of Potthoff (1963) for detecting differences in the medians of two symmetric distributions and the computation of approximate confidence bounds for the probability P(X1 < X2), cf. Govindarajulu (1968). In the present paper an easy-to-compute variance estimator is proposed that uses only the ranks of the data and has the additional property that it is unbiased for the finite variance. Because of its invariance under any monotone transformation of the data, its applicability is not confined to quantitative data. The estimator may be applied to ordinal data just as well. Some properties are discussed and a numerical example is given.

7.
Experimental data for the induction of dicentric chromosomes in phytohemagglutinin (PHA)-stimulated human T lymphocytes by 241Am alpha-particles obtained by Schmid et al. have been analyzed in the light of biophysical theory. As usual in experiments with alpha-particles, the relative variance of the intercellular distribution of the number of aberrations per cell exceeds unity, and the multiplicity of the aberrations per particle traversal through the cell is understood as the basic effect causing this overdispersion. However, the clearly expressed dose dependence of the relative variance differs from the dose-independent relative variance predicted by the multiplicity effect alone. Since such dose dependence is often observed in experiments with alpha-particles, protons, and high-energy neutrons, the interpretation of the overdispersion needs to be supplemented. In a new, more general statistical model, the distribution function of the number of aberrations is interpreted as resulting from the convolution of a Poisson distribution for the spontaneous aberrations with the overdispersed distributions for the aberrations caused by intratrack or intertrack lesion interaction, and the fluctuation of the cross-sectional area of the cellular chromatin must also be considered. Using a suitable mathematical formulation of the resulting dose-dependent overdispersion, the mean number λ 1 of the aberrations produced by a single particle traversal through the cell nucleus and the mean number λ 2 of the aberrations per pairwise approach between two alpha-particle tracks could be estimated. Coefficient α of the dose-proportional yield component, when compared between 241Am alpha-particle irradiation and 137Cs gamma-ray exposure, is found to increase approximately in proportion to dose-mean restricted linear energy transfer, which indicates an underlying pairwise molecular lesion interaction on the nanometer scale. 
Received: 17 December 1996 / Accepted in revised form: 20 April 1997

8.
John Graunt (1662) was the first to estimate the ratio y/x, where y represents the total population and x the known total number of registered births in the same areas during the preceding year. About 1765 Messance (Stephan, 1948) and Moheau (1778) published very carefully prepared estimates for France based on enumeration of population in certain districts and on the count of births, deaths and marriages as reported for the whole country. The districts from which the ratio of inhabitants to births was determined only constituted a sample. Laplace (1786) prepared similar estimates in 1802 based on a two-stage sampling plan. Recently Hansen and Hurwitz (1943) showed that the ratio estimate (yi/ni)X of Y is unbiased where all xi's are known and the nth cluster is selected with p.p.s. More recently Hájek (1949), Lahiri (1951), Midzuno (1952) and Sen (1952) developed independently the sampling of n clusters with p.p.s. to the totals of the sizes of the sample clusters S(xi). Des Raj (1954) and Sen (1952, 1953) gave an unbiased estimate of the variance of the estimator which was generally non-negative for samples with smaller probabilities. Rao and Vijayan (1977) gave an unbiased estimator which is non-negative for samples with larger probabilities. Hájek (1949) provided an almost unbiased estimator of the variance of the estimator. The paper discusses situations where Hájek's estimator of variance should be preferred to the Rao-Vijayan estimator and vice versa.

9.
Consider the two linear regression models of Yij on Xij, namely Yij = βi0 + βi1 Xij + εij, j = 1, 2, …, ni, i = 1, 2, where εij are assumed to be normally distributed with zero mean and common unknown variance σ². The estimated value of a mean of Y1 for a given value of X1 is made to depend on a preliminary test of significance of the hypothesis β11 = β21. The bias and the mean square error of the estimator for the conditional mean of Y1 are given. The relative efficiency of the estimator to the usual estimator is computed and is used to determine a proper choice of the significance level of the preliminary test.

10.
Consider the two linear regression models of Yij on Xij, namely Yij = βi0 + βi1 Xij + εij, j = 1, 2, …, ni, i = 1, 2, where εij are assumed to be normally distributed with zero mean and common unknown variance σ². The problem of estimating the conditional mean of Y1 for a given value of X1 is considered when it is a priori suspected that β10 = β20 and β11 = β21. The preliminary test estimator is proposed. The exact expressions for the bias and the mean square error of the estimator are derived. The relative efficiency of the new estimator to the usual least squares estimator based on the first regression alone is computed and is used to determine the appropriate value of the significance level of the preliminary test of β10 = β20 and β11 = β21.

11.
Reynolds J, Weir BS, Cockerham CC. Genetics, 1983, 105(3): 767-779
A distance measure for populations diverging by drift only is based on the coancestry coefficient θ, and three estimators of the distance D = -ln(1 - θ) are constructed for multiallelic, multilocus data. Simulations of a monoecious population mating at random showed that a weighted ratio of single-locus estimators performed better than an unweighted average or a least squares estimator. Jackknifing over loci provided satisfactory variance estimates of distance values. In the drift situation, in which mutation is excluded, the weighted estimator of D appears to be a better measure of distance than others that have appeared in the literature.
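The transformation from the coancestry coefficient to the distance stated above, D = -ln(1 - θ), can be written as a one-line function. This sketch covers only that transformation, not the three estimators of θ or the jackknife that the abstract mentions; the function name coancestry_distance is hypothetical.

```python
import math


def coancestry_distance(theta):
    """Return D = -ln(1 - theta) for a coancestry coefficient 0 <= theta < 1."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


# Illustrative values of theta, not estimates from real data.
for theta in (0.0, 0.05, 0.5):
    print(theta, coancestry_distance(theta))
```

For small θ the distance is close to θ itself, and it grows without bound as θ approaches 1.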

12.
The relationship between developmental stability and morphological asymmetry is derived under the standard view that structures on each side of an individual develop independently and are normally distributed. I use developmental variance of sizes of parts, VD, as the converse of developmental stability, and assume that VD follows a gamma distribution. Repeatability of asymmetry, a measure of how informative asymmetry is about VD, is quite insensitive to the variance in VD, for example only reaching 20% when the coefficient of variation of VD is 100%. The coefficient of variation of asymmetry, CVFA, also increases very slowly with increasing population variation in VD. CVFA values from empirical data are sometimes over 100%, implying that developmental stability is sometimes more variable than any previously studied type of trait. This result suggests that alternatives to this model may be needed.

13.
Abstract: Count data with means <2 are often assumed to follow a Poisson distribution. However, in many cases these kinds of data, such as number of young fledged, are more appropriately considered to be multinomial observations due to naturally occurring upper truncation of the distribution. We evaluated the performance of several versions of multinomial regression, plus Poisson and normal regression, for analysis of count data with means <2 through Monte Carlo simulations. Simulated data mimicked observed counts of number of young fledged (0, 1, 2, or 3) by California spotted owls (Strix occidentalis occidentalis). We considered size and power of tests to detect differences among 10 levels of a categorical predictor, as well as tests for trends across 10-year periods. We found regular regression and analysis of variance procedures based on a normal distribution to perform satisfactorily in all cases we considered, whereas failure rate of multinomial procedures was often excessively high, and the Poisson model demonstrated inappropriate test size for data where the variance/mean ratio was <1 or >1.2. Thus, managers can use simple statistical methods with which they are likely already familiar to analyze the kinds of count data we described here.

14.
ABSTRACT

Proportion data from dose-response experiments are often overdispersed, characterised by a larger variance than assumed by the standard binomial model. Here, we present different models proposed in the literature that incorporate overdispersion. We also discuss how to select the best model to describe the data and present, using R software, specific code used to fit and interpret binomial, quasi-binomial, beta-binomial, and binomial-normal models, as well as to assess goodness-of-fit. We illustrate applications of these generalized linear models and generalized linear mixed models with a case study from a biological control experiment, where different isolates of Isaria fumosorosea (Hypocreales: Cordycipitaceae) were used to assess which ones presented higher resistance to UV-B radiation. We show how to test for differences between isolates and also how to statistically group isolates presenting a similar behaviour.

15.
For estimating the finite population mean Ȳ of the study character y, an estimator using a transformed auxiliary variable has been defined. The bias and mean-squared error (MSE) of the proposed estimator have been obtained. The regions of preference have been obtained under which it is better than the usual unbiased estimator ȳ, the ratio estimator ȳR = ȳX̄/x̄, the Sisodia and Dwivedi (1981) estimator ȳs = ȳ(X̄ + Cx)/(x̄ + Cx) and the Singh and Kakran (1993) estimator ȳk = ȳ[X̄ + β2(x)]/[x̄ + β2(x)]. An empirical study has been carried out to demonstrate the superiority of the suggested estimator over the others.
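The three comparison estimators named above share one form: the sample mean of y is multiplied by the ratio of the population mean to the sample mean of the auxiliary variable x, after both are shifted by a known constant. The sketch below implements only that shared form, not the newly proposed estimator. The function name shifted_ratio_estimate and all numbers are hypothetical, and reading Cx as the coefficient of variation of x and β2(x) as its kurtosis is an assumption not stated in the abstract.

```python
def shifted_ratio_estimate(y_bar, x_bar, X_bar, shift=0.0):
    """Return y_bar * (X_bar + shift) / (x_bar + shift).

    shift = 0        gives the classical ratio estimator,
    shift = Cx       gives the Sisodia and Dwivedi (1981) form,
    shift = beta2(x) gives the Singh and Kakran (1993) form.
    """
    denominator = x_bar + shift
    if denominator == 0:
        raise ValueError("x_bar + shift must be non-zero")
    return y_bar * (X_bar + shift) / denominator


# Hypothetical summary values, for illustration only.
y_bar, x_bar, X_bar = 10.0, 5.0, 6.0
print(shifted_ratio_estimate(y_bar, x_bar, X_bar))             # ratio estimator
print(shifted_ratio_estimate(y_bar, x_bar, X_bar, shift=1.0))  # shifted form
```

A larger shift pulls the multiplier towards 1, so the shifted estimators adjust the sample mean less strongly than the classical ratio estimator does.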

16.
Many investigators use the reduced major axis (RMA) instead of ordinary least squares (OLS) to define a line of best fit for a bivariate relationship when the variable represented on the X‐axis is measured with error. OLS frequently is described as requiring the assumption that X is measured without error while RMA incorporates an assumption that there is error in X. Although an RMA fit actually involves a very specific pattern of error variance, investigators have prioritized the presence versus the absence of error rather than the pattern of error in selecting between the two methods. Another difference between RMA and OLS is that RMA is symmetric, meaning that a single line defines the bivariate relationship, regardless of which variable is X and which is Y, while OLS is asymmetric, so that the slope and resulting interpretation of the data are changed when the variables assigned to X and Y are reversed. The concept of error is reviewed and expanded from previous discussions, and it is argued that the symmetry‐asymmetry issue should be the criterion by which investigators choose between RMA and OLS. This is a biological question about the relationship between variables. It is determined by the investigator, not dictated by the pattern of error in the data. If X is measured with error but OLS should be used because the biological question is asymmetric, there are several methods available for adjusting the OLS slope to reflect the bias due to error. RMA is being used in many analyses for which OLS would be more appropriate. Am J Phys Anthropol, 2009. © 2009 Wiley‐Liss, Inc.
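The symmetry property discussed above can be checked numerically. The sketch below uses the standard formulas: the OLS slope of Y on X is r·sY/sX, and the RMA slope is sY/sX carrying the sign of r. The function name ols_and_rma_slopes and the data are hypothetical and serve only to illustrate the point.

```python
import math
from statistics import mean, stdev


def ols_and_rma_slopes(x, y):
    """Return (OLS slope of y on x, RMA slope) for paired samples x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    sx, sy = stdev(x), stdev(y)
    r = sxy / (sx * sy)               # Pearson correlation
    ols = r * sy / sx                 # depends on which variable is the response
    rma = math.copysign(sy / sx, r)   # ratio of standard deviations, sign of r
    return ols, rma


# Made-up measurements, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_yx, rma_yx = ols_and_rma_slopes(x, y)   # y regressed on x
ols_xy, rma_xy = ols_and_rma_slopes(y, x)   # variables reversed

print(rma_yx * rma_xy)   # the two RMA slopes are reciprocals: one line
print(ols_yx * ols_xy)   # equals r squared, below 1: two different lines
```

Reversing the variables leaves the RMA line unchanged, since the slopes are exact reciprocals, whereas the two OLS slopes multiply to r² and so describe different lines whenever the correlation is imperfect.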

17.
Jingru Zhang, Wei Lin. Biometrics, 2019, 75(4): 1098-1108
Clustered multinomial data are prevalent in a variety of applications such as microbiome studies, where metagenomic sequencing data are summarized as multinomial counts for a large number of bacterial taxa per subject. Count normalization with ad hoc zero adjustment tends to result in poor estimates of abundances for taxa with zero or small counts. To account for heterogeneity and overdispersion in such data, we suggest using the logistic normal multinomial (LNM) model with an arbitrary correlation structure to simultaneously estimate the taxa compositions by borrowing information across subjects. We overcome the computational difficulties in high dimensions by developing a stochastic approximation EM algorithm with Hamiltonian Monte Carlo sampling for scalable parameter estimation in the LNM model. The ill‐conditioning problem due to unstructured covariance is further mitigated by a covariance‐regularized estimator with a condition number constraint. The advantages of the proposed methods are illustrated through simulations and an application to human gut microbiome data.

18.
RAPD analysis was carried out to study the genetic variation and phylogenetic relationships of polyploid Aegilops species, which contain the D genome as a component of the alloploid genome, and diploid Aegilops tauschii, which is a putative donor of the D genome for common wheat. In total, 74 accessions of six D-genome Aegilops species were examined. The highest intraspecific variation (0.03–0.21) was observed for Ae. tauschii. Intraspecific distances between accessions ranged 0.007–0.067 in Ae. cylindrica, 0.017–0.047 in Ae. vavilovii, and 0–0.053 in Ae. juvenalis. Likewise, Ae. ventricosa and Ae. crassa showed low intraspecific polymorphism. The among-accession difference in alloploid Ae. ventricosa (genome DvNv) was similar to that of one parental species, Ae. uniaristata (N), and substantially lower than in the other parent, Ae. tauschii (D). The among-accession difference in Ae. cylindrica (CcDc) was considerably lower than in either parent, Ae. tauschii (D) or Ae. caudata (C). With the exception of Ae. cylindrica, all D-genome species, namely Ae. tauschii (D), Ae. ventricosa (DvNv), Ae. crassa (XcrDcr1 and XcrDcr1Dcr2), Ae. juvenalis (XjDjUj), and Ae. vavilovii (XvaDvaSva), formed a single polymorphic cluster, which was distinct from clusters of other species. The only exception, Ae. cylindrica (CcDc), did not group with the other D-genome species, but clustered with Ae. caudata (C), a donor of the C genome. The cluster of these two species was clearly distinct from the cluster of the other D-genome species and close to a cluster of Ae. umbellulata (genome U) and Ae. ovata (genome UgMg). Thus, RAPD analysis was used for the first time to estimate and compare the interpopulation polymorphism and to establish the phylogenetic relationships of all diploid and alloploid D-genome Aegilops species.

19.
The 1H-NMR chemical shifts and the spin–spin coupling constants of the common amino acid residues were measured in solutions of the linear tetrapeptides H-Gly-Gly-X-L-Ala-OH in D2O and H2O, and the influence of X on the NMR parameters of the neighboring residues Gly 2 and Ala 4 was investigated. The titration parameters for the side chains of Asp, Glu, Lys, Tyr, and His were determined. The pKa values obtained in D2O, with the use of pH-meter readings with a combination glass electrode uncorrected for isotope effects, were 0.06 pH units higher in the acidic range and 0.10 pH units higher in the basic range than the corresponding pKa values in H2O. This suggests that the present data are suitable “random-coil” 1H-NMR parameters for conformational studies of polypeptide chains in D2O and H2O solutions.

20.
The relationship between development of light leaf spot and yield loss in winter oilseed rape was analysed, initially using data from three experiments at sites near Aberdeen in Scotland in the seasons 1991/92, 1992/93 and 1993/94, respectively. Over the three seasons, single-point models relating yield to light leaf spot incidence (% plants with leaves with light leaf spot) at GS 3.3 (flower buds visible) generally accounted for more of the variance than single-point models at earlier or later growth stages. Only in 1992/93, when a severe light leaf spot epidemic developed on leaves early in the season, did the single-point model for disease severity on leaves at GS 3.5/4.0 account for more of the variance than that for disease incidence at GS 3.3. In 1991/92 and 1992/93, when reasonably severe epidemics developed on stems, the single-point model for light leaf spot incidence (stems) at GS 6.3 accounted for as much of the variance. Two-point (disease severity at GS 3.3 and GS 4.0) and AUDPC models (disease incidence/severity) accounted for more of the variance than the single-point model based on disease incidence at GS 3.3 in 1992/93 but not in the other two seasons. Therefore, a simple model using the light leaf spot incidence at GS 3.3 (x) as the explanatory variable was selected as a predictive model to estimate % yield loss (yr): yr = 0.32x - 0.57. This model fitted all three data sets from Scotland. When data sets from Rothamsted, Rosemaund and Thurloxton in England were used to test it, this single-point predictive model generally fitted the data well, except when yield loss was clearly not related to occurrence of light leaf spot. However, the regression lines relating observed yield loss to light leaf spot incidence at GS 3.3 often had smaller slopes than the line produced by the model based on Scottish data.
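The single-point predictive model quoted above, with % yield loss equal to 0.32 times the light leaf spot incidence at GS 3.3 minus 0.57, can be evaluated directly. The sketch below is a plain transcription of that equation; the function name predicted_yield_loss, the range check and the example incidences are assumptions added for illustration.

```python
def predicted_yield_loss(incidence_pct):
    """Return predicted % yield loss from % plants with light leaf spot at GS 3.3.

    Implements yr = 0.32 * x - 0.57 as quoted in the abstract.
    """
    if not 0.0 <= incidence_pct <= 100.0:
        raise ValueError("incidence must be a percentage between 0 and 100")
    return 0.32 * incidence_pct - 0.57


# Hypothetical incidence values, for illustration only.
for x in (0.0, 10.0, 50.0, 100.0):
    print(x, predicted_yield_loss(x))
```

The intercept makes the prediction slightly negative at very low incidence, and the abstract notes that English data sets often showed smaller slopes, so the output is best read as a rough guide rather than a site-specific forecast.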
