Similar articles
1.
The R² statistic is a popular and widely used statistic in regression analysis for estimating the squared multiple correlation (SMC), ρ², between a response variable Y and p predictor variables X1, …, Xp. Numerous articles in the statistical literature examine the properties of R² as an estimator of ρ² when the observations are uncorrelated. However, relatively little is known about the behavior of R² when the observations are correlated, as with data that result from complex sampling schemes. In this paper, we study the behavior of R² in the presence of two-stage sampling data. Approximate expressions are obtained for the variance and the bias of R² for two-stage cluster sampling data with positive intracluster correlation (ρ*). These formulas and a simulation study show that R² is a poor estimator of ρ² except when ρ* is small. We therefore consider several alternative estimators of ρ² and evaluate their theoretical properties and finite-sample performance in a simulation study.
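To make the baseline quantity concrete, here is a minimal pure-Python sketch of R² for the single-predictor case (p = 1), where it equals the squared sample correlation between x and y. The function name `r_squared` is hypothetical. The function treats observations as independent. It does not implement the paper's two-stage cluster-sampling expressions or its alternative estimators.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (one predictor, p = 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                  # slope
    a = my - b * mx                # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # 1 - SS_res / SS_tot; for p = 1 this equals sxy**2 / (sxx * syy)
    return 1.0 - ss_res / syy
```

For example, `r_squared([1, 2, 3, 4], [1, 3, 2, 4])` gives about 0.64. With clustered data, the same number is computed, but its sampling behavior changes as the abstract describes.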

2.
This paper presents an analysis of variance (ANOVA) approach for estimating F-statistics from data with an arbitrary s-level hierarchical population structure. Assuming a complete random-effects model, a general ANOVA procedure is developed to estimate F-statistics as ratios of variance components for all levels of population subdivision in the hierarchy. A generalized relationship among F-statistics is also derived, extending the well-known relationship originally found by Sewall Wright. The ANOVA estimators are not entirely free from the bias associated with small numbers of subdivisions at each level of the hierarchy and with extreme gene frequencies. However, they account for sampling effects at each level of the hierarchy. This removes the bias incurred by other commonly used estimators, which directly substitute sample estimates for unknown gene frequencies. The ANOVA estimation procedure presented here may therefore become increasingly useful for analyzing complex population structure, given the growing use of estimated hierarchical F-statistics to infer the genetic and demographic structure of natural populations within and among species.
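The "ratios of variance components" idea and the generalized Wright relationship can be illustrated with a short sketch. It assumes that the variance components have already been estimated, ordered from the top of the hierarchy down. The ANOVA step that produces them from mean squares is not shown. In the three-level case (among populations, among individuals within populations, within individuals), `adjacent[0]` plays the role of F_ST, `adjacent[1]` that of F_IS, and `overall` that of F_IT. The relation 1 − F_IT = (1 − F_IS)(1 − F_ST) then appears as a telescoping product. The function name `hierarchical_f` is hypothetical, and this is an illustration of the ratio definitions, not the paper's estimator.

```python
from math import isclose, prod


def hierarchical_f(components):
    """F-statistics from variance components of a nested random-effects model.

    components: variance components ordered from the top level of the
    hierarchy down to the lowest level.
    Returns (adjacent, overall): adjacent[k] is the share of the variance at
    level k among all variance at or below level k; overall is
    1 - lowest / total.
    """
    # tail[k] = sum of the components from level k down to the bottom
    tail = [sum(components[k:]) for k in range(len(components))]
    adjacent = [components[k] / tail[k] for k in range(len(components) - 1)]
    overall = 1.0 - components[-1] / tail[0]
    return adjacent, overall


adjacent, f_it = hierarchical_f([0.1, 0.2, 0.7])
# Wright's relation, generalized to any number of levels:
# 1 - F_overall = product over levels of (1 - F_adjacent)
assert isclose(1 - f_it, prod(1 - f for f in adjacent))
```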

3.
Extensions of linear models are very commonly used in the analysis of biological data. Goodness-of-fit measures such as the coefficient of determination (R²) and the adjusted R² are well established for linear models. It is not obvious, however, how such measures should be defined for generalized linear and mixed models. Several proposals exist, but no consensus has yet emerged on the best unified approach in these settings. In particular, it is an open question how best to account for heteroscedasticity and for covariance among observations, whether present in the residual error or induced by random effects. This paper proposes a new approach that addresses this issue and is applicable to arbitrary variance–covariance structures, including spatial models and repeated measures. The approach is illustrated with three biological examples.
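For reference, the adjusted R² that the abstract calls "well established" for ordinary linear models is a one-line correction of R² for the number of predictors. The sketch below assumes a model with an intercept plus p predictors fitted to n observations. The function name is hypothetical, and nothing here reflects the paper's generalization to mixed models.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a linear model with an intercept and p predictors:
    1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

For example, R² = 0.5 from 21 observations and 4 predictors adjusts down to 1 − 0.5 × 20/16 = 0.375.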

4.
The case-cohort study involves two-phase sampling: simple random sampling from an infinite superpopulation at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solving inverse probability weighted (IPW) estimating equations, with weights determined by the known phase-two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model-based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design-based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights. One option is calibration to known cohort totals of auxiliary variables correlated with the IF contributions. The other is estimating the weights using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for constructing auxiliary variables. These are evaluated by simulating case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametrically efficient, estimators based on adjusted weights may come close to full efficiency within the class of augmented IPW estimators.
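The weighting idea can be sketched in a few lines. Design weights are the reciprocals of the known phase-two sampling fractions, and the IPW (Horvitz–Thompson) estimate of a cohort total is a weighted sum. Calibration then rescales the weights so that the weighted total of an auxiliary variable reproduces its known cohort total. The sketch below uses only the simplest one-variable ratio form of calibration. The R survey package supports more general calibration, for example through its `calibrate()` function. All function names here are hypothetical.

```python
def design_weights(fractions):
    """Design weights 1/f from the known phase-two sampling fractions."""
    return [1.0 / f for f in fractions]


def weighted_total(values, weights):
    """IPW (Horvitz-Thompson) estimate of a cohort total."""
    return sum(v * w for v, w in zip(values, weights))


def ratio_calibrate(weights, aux, aux_cohort_total):
    """Rescale weights so the weighted auxiliary total equals the known
    cohort total (ratio calibration with a single auxiliary variable)."""
    g = aux_cohort_total / weighted_total(aux, weights)
    return [w * g for w in weights]


# Cases sampled with fraction 1, non-cases with fraction 0.5 (toy numbers)
w = design_weights([1, 1, 0.5, 0.5])
print(weighted_total([1, 2, 3, 4], w))        # IPW total before calibration
w_cal = ratio_calibrate(w, [1, 1, 1, 1], 8)   # cohort size known to be 8
```

Calibrated weights reduce the design-based variance only to the extent that the auxiliary variable tracks the quantity being totaled. This is why the abstract stresses auxiliaries correlated with the influence-function contributions.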

5.
The estimation of the contribution of an individual quantitative trait locus (QTL) to the variance of a quantitative trait is considered in the framework of analysis of variance (ANOVA). ANOVA expected mean squares appropriate to the specific case of QTL mapping experiments are derived. These expectations account for the features associated with the limited number of genotypes at a given locus. Discrepancies from the classical expectations are particularly large for two-class experiments (backcross, recombinant inbred lines, doubled haploid populations) and for F2 populations. First, the result allows us to reconsider the power of experiments (i.e., the probability of detecting a QTL with a given contribution to the variance of the trait). It shows that using the classical formulae for expected mean squares leads to a strong underestimation of the power of the experiments. Second, the observed mean squares make it possible to estimate directly the variance associated with a locus and the fraction of the total variance associated with this locus (r_l²). Compared with other methods, the values estimated using this method are unbiased. Unbiased estimation becomes more important when (1) the experimental size is limited; (2) the number of genotypes at the locus of interest is large; and (3) the fraction of the variation associated with this locus is small. Finally, the specific expected mean squares allow us to propose a simple analytical method for estimating the confidence interval of r_l². This point is particularly important because the results indicate that 95% confidence intervals for r_l² can be rather wide: 2–23% for a 10% estimate and 8–34% for a 20% estimate when 100 individuals are considered.
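For comparison, the sketch below shows the classical one-way random-effects route from mean squares to a locus variance and its share of the total. The paper argues that this baseline is not appropriate as-is for QTL experiments with few genotype classes. The function assumes a balanced design with `n_per_class` individuals per genotype class and the classical expectation E(MS_between) = σ²_e + n·σ²_g. It does not implement the paper's QTL-specific expectations, and the function name is hypothetical.

```python
def classical_locus_variance(ms_between, ms_within, n_per_class):
    """Classical one-way random-effects estimates from ANOVA mean squares.

    Assumes a balanced design and E(MS_between) = s2_e + n * s2_g.
    Returns (locus variance, fraction of total variance r^2).
    """
    var_locus = max(0.0, (ms_between - ms_within) / n_per_class)  # truncate at 0
    r2 = var_locus / (var_locus + ms_within)
    return var_locus, r2
```

With MS_between = 30, MS_within = 10, and 10 individuals per class, this gives a locus variance of 2 and r² = 2/12 ≈ 0.17.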

6.
The assumption of independent sample units is potentially violated in survival analyses where siblings make up a high proportion of the sample. Violation of the independence assumption causes sample data to be overdispersed relative to a binomial model, which leads to underestimates of sampling variances. A variance inflation factor, c, is therefore required to obtain appropriate variance estimates. We evaluated overdispersion in fetal and neonatal mule deer (Odocoileus hemionus) datasets in which more than half of the sample units were siblings. We developed a likelihood function for estimating fetal survival when the fates of some fetuses are unknown, and we used several variations of the binomial model to estimate neonatal survival. We compared theoretical variance estimates from these analyses with empirical variance estimates from data-bootstrap analyses to estimate the overdispersion parameter, c. Our estimates of c for fetal survival ranged from 0.678 to 1.118, indicating little to no evidence of overdispersion. For neonatal survival, 3 different models indicated that ĉ ranged from 1.1 to 1.4 and averaged 1.24–1.26, providing evidence of limited overdispersion (i.e., limited sibling dependence). Our results indicate that the fates of sibling mule deer fetuses and neonates may often be independent even though they share the same dam. Predation tends to act independently on sibling neonates because of dam–neonate behavioral adaptations. The effect of maternal characteristics on sibling fate dependence is less straightforward and may vary by circumstance. We recommend that future neonatal survival studies increase sampling intensity to accommodate modest overdispersion (i.e., ĉ = 1.25). This would allow a corresponding ĉ adjustment in a model selection analysis using quasi-likelihood without a reduction in power.
Our computational approach could be used to evaluate sample-unit dependence in other studies where the fates of individually marked siblings are monitored.
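The comparison of theoretical and bootstrap variances can be sketched for the simplest possible model, a single pooled survival proportion. The theoretical variance is the binomial p(1 − p)/n, which assumes independent fates. The empirical variance comes from resampling whole sibling groups (litters) with replacement, so any shared fate within a litter shows up as extra variance. Their ratio estimates c. The pooled-binomial model and the function name `c_hat` are simplifying assumptions. The study itself used a custom fetal-survival likelihood and several binomial model variants.

```python
import random


def c_hat(litters, n_boot=2000, seed=1):
    """Variance inflation factor for a pooled binomial survival rate.

    litters: list of litters, each a list of fates (1 = survived, 0 = died).
    Returns bootstrap variance (resampling whole litters) divided by the
    binomial variance p(1 - p)/n that assumes independent fates.
    """
    rng = random.Random(seed)
    fates = [f for litter in litters for f in litter]
    n = len(fates)
    p = sum(fates) / n
    if p in (0.0, 1.0):
        raise ValueError("binomial variance is zero when p is 0 or 1")
    binomial_var = p * (1 - p) / n
    boot = []
    for _ in range(n_boot):
        sample = [rng.choice(litters) for _ in litters]   # resample litters
        flat = [f for litter in sample for f in litter]
        boot.append(sum(flat) / len(flat))
    mean_b = sum(boot) / n_boot
    boot_var = sum((b - mean_b) ** 2 for b in boot) / (n_boot - 1)
    return boot_var / binomial_var
```

If twins always share a fate, ĉ comes out near 2. If fates within a litter are unrelated, ĉ stays near 1, which is the situation the abstract reports for most of the mule deer data.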

7.
The heritability (h²) of fitness traits is often low. This has been attributed to directional selection eroding genetic variation in direct proportion to the strength of selection. However, heritability does not necessarily reflect a trait's additive genetic variance and evolutionary potential ("evolvability"). Recent studies suggest that the low h² of fitness traits in wild populations is caused not by a paucity of additive genetic variance (VA) but by greater environmental or nonadditive genetic variance (VR). We examined the relationship between h² and variance-standardized selection intensities (i or βσ). We also examined the relationship between evolvability (IA: VA divided by the squared phenotypic trait mean) and mean-standardized selection gradients (βμ). Using 24 years of data from an island population of Savannah sparrows, we show that, across diverse traits, h² declines with the strength of selection. In contrast, IA and IR (VR divided by the squared trait mean) are independent of the strength of selection. Within trait types (morphological, reproductive, life-history), h², IA, and IR are all independent of the strength of selection. This indicates that certain traits have low heritability because of increased residual variance, rather than because of their association with fitness. The increased residual variance reflects the age at which these traits are expressed or the multiple factors influencing their expression.
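The distinction between heritability and mean-standardized variances can be made concrete with a toy calculation. The sketch assumes that phenotypic variance is simply VP = VA + VR. It shows that raising the residual variance lowers h² while leaving the evolvability IA unchanged. The function name and the numbers are illustrative only.

```python
def variance_scaled(va, vr, trait_mean):
    """Heritability and mean-standardized variances (assumes VP = VA + VR)."""
    vp = va + vr
    h2 = va / vp                    # heritability
    ia = va / trait_mean ** 2       # evolvability I_A
    ir = vr / trait_mean ** 2       # mean-standardized residual variance I_R
    return h2, ia, ir


# Same additive variance and trait mean, different residual variance
low_resid = variance_scaled(1.0, 1.0, 10.0)
high_resid = variance_scaled(1.0, 9.0, 10.0)
```

Here h² drops from 0.5 to 0.1 while IA stays at 0.01 in both cases, mirroring the argument that low h² can come from large VR rather than small VA.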

8.
Tower‐based eddy covariance measurements of forest‐atmosphere carbon dioxide (CO2) exchange from many sites around the world indicate that there is considerable year‐to‐year variation in net ecosystem exchange (NEE). Here, we use a statistical modeling approach to partition the interannual variability in NEE (and its component fluxes, ecosystem respiration, Reco, and gross photosynthesis, Pgross) into two main effects: variation in environmental drivers (air and soil temperature, solar radiation, vapor pressure deficit, and soil water content) and variation in the biotic response to this environmental forcing (as characterized by the model parameters). The model is applied to a 9‐year data set from the Howland AmeriFlux site, a spruce‐dominated forest in Maine, USA. Gap‐filled flux measurements at this site indicate that the forest has been sequestering, on average, 190 g C m−2 yr−1, with a range from 130 to 270 g C m−2 yr−1. Our fitted model predicts somewhat more uptake (mean 270 g C m−2 yr−1), but interannual variation is similar, and wavelet variance analyses indicate good agreement between tower measurements and model predictions across a wide range of timescales (hours to years). Associated with the interannual variation in NEE are clear differences among years in model parameters for both Reco and Pgross. Analysis of model predictions suggests that, at the annual time step, about 40% of the variance in modeled NEE can be attributed to variation in environmental drivers, and 55% to variation in the biotic response to this forcing. As model predictions are aggregated at longer timescales (from individual days to months to calendar year), variation in environmental drivers becomes progressively less important, and variation in the biotic response becomes progressively more important, in determining the modeled flux. 
There is a strong negative correlation between modeled annual Pgross and Reco (r = −0.93, P ≤ 0.001); two possible explanations for this correlation are discussed. The correlation promotes homeostasis of NEE: the interannual variation in modeled NEE is substantially smaller than that of either Pgross or Reco.
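The homeostasis argument follows from the variance of a sum. The abstract does not state the sign convention, so assume here that gross photosynthesis is reported as a negative (uptake) flux. Then NEE = Reco + Pgross, and a negative correlation means that years with higher respiration tend to have more uptake. With σ_R and σ_P the interannual standard deviations and r their correlation:

```latex
\operatorname{Var}(\mathrm{NEE}) = \sigma_R^2 + \sigma_P^2 + 2\,r\,\sigma_R\,\sigma_P
\quad\xrightarrow{\;\sigma_R = \sigma_P = \sigma,\ r = -0.93\;}\quad
\operatorname{Var}(\mathrm{NEE}) = 2\sigma^2\,(1 - 0.93) = 0.14\,\sigma^2
```

The equal standard deviations are purely illustrative. Under that assumption, the interannual standard deviation of NEE is about 0.37σ, well below that of either component flux.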

9.
Several authors have suggested that plant biotechnologists use regression or trend analysis to compare the means of related quantitative treatments (e.g., doses of inositol). The present paper compares two statistical strategies for determining the effect of inositol (0–400 mg l−1) on proteolytic activity in the culture medium during pineapple growth in temporary immersion bioreactors. Strategy 1 involved one-way analysis of variance (ANOVA) followed by Tukey's Honestly Significant Difference (HSD) test. Strategy 2 consisted of fitting different regression equations and selecting those that best described the experimental results, using Curvefit software (version 2.10-0, May 15, 1987, Thomas S. Cox). The Cauchy, Normal, Parabola, and Hoerl equations gave the best fits according to their coefficients of determination (R²). The optimal inositol concentrations for increasing proteolytic activity were determined from these equations. Strategy 2 gave widely divergent results: 126.76 mg l−1 inositol from the Cauchy, 131.29 mg l−1 from the Normal, 145.06 mg l−1 from the Parabola, and 14.05 mg l−1 from the Hoerl equation. In contrast, the experimental data identified 200 mg l−1 inositol as the most suitable concentration for increasing proteolytic activity in the culture medium. Strategy 1, one-way ANOVA followed by Tukey's HSD, clearly supported this biological observation. In this study, regression analysis was not useful for describing the experimental results.
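How an "optimal concentration" falls out of a fitted equation can be seen with the parabola case. Fit y = a + bx + cx² by least squares, then take the vertex x* = −b/(2c). The sketch below solves the 3×3 normal equations with Gaussian elimination in pure Python. It is not Curvefit's algorithm, and the function names and data are hypothetical. Because the optimum is a property of the chosen equation, different equations can place it at very different doses, which is the paper's point.

```python
def fit_parabola(x, y):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations: sum_j (sum x^(i+j)) coef_j = sum x^i * y
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum((xi ** k) * yi for xi, yi in zip(x, y)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coef)


def parabola_optimum(a, b, c):
    """Dose at the vertex: a maximum if c < 0, a minimum if c > 0."""
    return -b / (2 * c)


doses = [0, 100, 200, 300, 400]                       # mg/l, hypothetical
response = [100 - (d - 150) ** 2 / 1000 for d in doses]
a, b, c = fit_parabola(doses, response)
```

For this synthetic response the vertex is recovered at 150 mg l−1.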

10.
Suppose the variance, V = V(μ, θ), is a known function of the mean, μ = μ(β), where θ and β may include unknown parameters. Given empirical data, this paper describes how to estimate the unknown parameters by choosing them to satisfy the variance/mean relationship while simultaneously requiring that the sampling probability distribution have maximum entropy. Bounds for the estimated values of the unknown parameters can be obtained by a further application of the maximum entropy principle. The power variance function, V(μ) = λμ^θ, is discussed, including some special cases of λ and θ. The procedure is briefly compared with quasi-likelihood and illustrated with some numerical examples.
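As a point of comparison, the power variance function V(μ) = λμ^θ (exponent written here as θ) is often fitted by a simple moment-based method. Regress log sample variance on log sample mean: the slope estimates θ and the exponentiated intercept estimates λ. This is a different technique from the paper's maximum-entropy procedure, shown only to make the variance/mean relationship concrete. The function name is hypothetical.

```python
import math


def fit_power_variance(means, variances):
    """Fit V = lam * mu**theta by least squares on log V versus log mu."""
    lx = [math.log(m) for m in means]
    ly = [math.log(v) for v in variances]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    theta = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    lam = math.exp(my - theta * mx)
    return lam, theta
```

Special cases are easy to read off: θ = 1 with λ = 1 is the Poisson relationship, and θ = 2 corresponds to a constant coefficient of variation.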

11.
R-st is an unstable allele of R that exhibits a mosaic phenotype in the aleurone, consisting of heavily pigmented spots on a colorless background. This variegated phenotype is presumably caused by frequent somatic reversion of R from an inactive to an active form. Data are presented showing that such reversion can take place at different times during plant development. Various stippled derivatives have been selected that differ in the number and size of the dots formed in the endosperm. The reversion rate of R-st and its derivatives toward self-colored (R-sc) has been estimated in germinal and somatic tissues. This analysis indicates that some of the stippled derivatives differ in their capacity to revert toward R-sc in both somatic and germ cells. The effect of internal factors on the R-st reversion rate has been measured. On the basis of these data, the possible mechanisms causing the genetic instability of R-st are briefly discussed. This investigation was partially supported by the Consiglio Nazionale delle Ricerche, Roma.

12.
Few studies have examined the spatial variance of nutrient limitation at the scale of an entire set of headwater streams. We used nutrient-diffusing substrata experiments (control, nitrogen addition, phosphorus addition, and nitrogen + phosphorus addition) to examine how nutrient limitation varied throughout the five creeks that make up the McLeod River headwaters (Alberta, Canada). To assess the consistency and scale of nutrient limitation, we assessed the variance of chlorophyll a accrual at several spatial scales: within reaches, within creeks, among creeks, and across linear distance within the entire watershed. We analyzed the importance of spatial scale using several methods. These were the coefficient of variation (CV) at different scales, the spatial covariance of nitrogen and phosphorus deficiency indices using a spline correlogram, and traditional analysis of variance methods. Chlorophyll a accrual responded significantly to nutrients in all creeks, although the magnitude of the response and the limiting nutrient varied among reaches and among creeks. Variance in chlorophyll a accrual was due primarily to creek (R² = 0.40) and secondarily to reach (R² = 0.07). The CV was 31.4% among creeks, 18.4% among reaches, and 17.9% within reaches. The N deficiency index was positively correlated between sites located less than 4 km apart and negatively correlated between sites more than 6.5 km apart. The P deficiency index showed no discernible spatial correlation. Our results suggest that nutrient limitation varies on small scales and is often driven by local processes.
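Two of the scale summaries in this abstract can be sketched directly. One is a coefficient of variation computed at a given scale. The other is the share of variance attributable to a grouping factor, which is the R² of a one-way ANOVA. The sketch computes the among-group CV from group means, which is an assumption, since the abstract does not say exactly how each CV was formed. Function names and data are hypothetical.

```python
import statistics as st


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) in percent."""
    return 100.0 * st.stdev(values) / st.mean(values)


def factor_r2(groups):
    """Share of the total sum of squares explained by a grouping factor."""
    allv = [v for g in groups for v in g]
    grand = st.mean(allv)
    ss_total = sum((v - grand) ** 2 for v in allv)
    ss_between = sum(len(g) * (st.mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical chlorophyll a accrual values grouped by creek
creeks = [[1, 2, 3], [4, 5, 6]]
among_creek_cv = cv_percent([st.mean(g) for g in creeks])
within_creek_cvs = [cv_percent(g) for g in creeks]
```

A factor with a large `factor_r2` and a large among-group CV, like creek in this study, dominates the spatial pattern.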

13.
Comparing fluctuating asymmetry (FA) between different traits can be difficult because traits vary on different scales. FA is generally quantified either as the variance of the difference between left and right, σ²(L − R), or as the mean of the absolute value of this difference, μ(|R − L|). Corrections for scale differences are obtained by dividing by the trait size mean. We show that a third index, one minus the correlation coefficient between left and right, 1 − r(L,R), is equivalent to σ²(L − R) standardized by trait size variance. The indices are compared using Monte Carlo simulations. All achieve the expected correction for scale differences. High type I error rates (false indications of differences) occur only for σ²(L − R) and μ(|R − L|), and only when trait sizes close to or below 0 occur. 1 − r(L,R) with a bootstrap test always has low error rates. The choice of index should depend on whether standardization of FA by trait size mean or by trait size variance is preferred. A survey of 36 traits in the Speckled Wood butterfly (Pararge aegeria) indicated that σ²(L − R) is slightly more strongly correlated with trait size variance than with trait size mean. Thus 1 − r(L,R) appears to be the superior index and should be reported when the FA of different traits is compared.
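The three indices are easy to compute from paired left/right measurements, and the equivalence claim has a one-line basis. If both sides share the same variance s², then Var(L − R) = 2s²(1 − r). So 1 − r is σ²(L − R) divided by twice the trait size variance. The sketch below computes all three indices. The test data are chosen so that both sides have equal variance and the identity holds exactly. The function name is hypothetical.

```python
import math
import statistics as st


def fa_indices(left, right):
    """Return (var(L - R), mean |R - L|, 1 - r(L, R)) for paired measurements."""
    d = [l - r for l, r in zip(left, right)]
    ml, mr = st.mean(left), st.mean(right)
    cov = sum((l - ml) * (r - mr) for l, r in zip(left, right))
    r_lr = cov / math.sqrt(sum((l - ml) ** 2 for l in left)
                           * sum((r - mr) ** 2 for r in right))
    return st.variance(d), st.mean(abs(x) for x in d), 1.0 - r_lr


left, right = [1, 2, 3, 4], [2, 1, 4, 3]   # equal side variances
var_d, mean_abs, one_minus_r = fa_indices(left, right)
# var(L - R) / (2 * trait size variance) reproduces 1 - r(L, R)
scaled = var_d / (2 * st.variance(left))
```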

14.
Stability analysis of multilocation trials is often based on a mixed two-way model. Two stability measures in frequent use are the environmental variance (S_i²) and the ecovalence (W_i). Under the two-way model, the rank orders of the expected values of these two statistics are identical for a given set of genotypes. By contrast, empirical rank correlations between these measures are consistently low. This suggests that the two-way mixed model may not be appropriate for describing real data. To check this hypothesis, a Monte Carlo simulation was conducted. It revealed that the low empirical rank correlation between S_i² and W_i is most likely due to sampling error. It is concluded that the observed low rank correlation does not invalidate the two-way model. The paper also discusses tests for homogeneity of S_i² as well as implications of the two-way model for the classification of stability statistics.
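For readers unfamiliar with the two measures, here is a sketch of their usual definitions for a genotype × environment table of means y_ij. The environmental variance is S_i² = Σ_j (y_ij − ȳ_i.)²/(q − 1), where q is the number of environments. Wricke's ecovalence is W_i = Σ_j (y_ij − ȳ_i. − ȳ_.j + ȳ..)², the genotype's contribution to the interaction sum of squares. The function name and table are hypothetical.

```python
def stability_measures(table):
    """Environmental variance S_i^2 and ecovalence W_i per genotype.

    table[i][j]: mean of genotype i in environment j.
    """
    g, e = len(table), len(table[0])
    row = [sum(r) / e for r in table]                             # genotype means
    col = [sum(table[i][j] for i in range(g)) / g for j in range(e)]  # env means
    grand = sum(row) / g
    s2 = [sum((y - row[i]) ** 2 for y in table[i]) / (e - 1) for i in range(g)]
    w = [sum((table[i][j] - row[i] - col[j] + grand) ** 2 for j in range(e))
         for i in range(g)]
    return s2, w
```

S_i² measures a genotype's spread across environments in absolute terms, whereas W_i measures deviation from the average environmental response. That is why the two can rank genotypes differently in real data.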

15.
The need for a new analytical approach arose while characterizing newly developed tomato lines resistant to late blight. Late blight-resistant tomato lines were created in independent breeding programs using the accession Solanum pimpinellifolium L. (formerly Lycopersicon pimpinellifolium (L.) Miller) L3708 as the source of resistance. However, initial field observations suggested that the late blight resistance of the lines produced by two independent breeding programs differed. Possible causes included partial transfer of the late blight resistance derived from S. pimpinellifolium L3708, or race specificity of this resistance. A crucial issue was determining the most appropriate and robust analytical method for data from laboratory analyses of the responses of nine tomato lines to five P. infestans isolates. Prior analysis by standard ANOVA revealed significant differences among tomato lines. However, it could not determine whether the disease responses of the CLN-R lines differed from those of the heterozygous F1 hybrids created by crossing susceptible tomatoes with the fixed CU-R lines. A different analytical method was needed. Therefore, data on sporangia numbers per leaflet and diseased area were analyzed using a half-normal probability plot and regression analysis. The results of this analysis show its utility for genetic or pathology studies. For the uniform tomato lines alone, this method confirms the results obtained with a standard ANOVA. It also gives a clearer picture of how individuals are distributed within the populations and of how this distribution affects the variance and the differences among populations. The method also allows a joint analysis of the uniform lines with an additional population that is less uniform because it is segregating. Such an analysis would be invalid using a standard ANOVA.
The results of this joint analysis showed that the additional population diverged from the fixed CU-R lines and, against some isolates, from the CLN-R lines as well. The half-normal probability plot method should be applicable beyond the analysis of disease resistance data. It could be useful for data from populations that are not normally distributed and for traits affected by epistatic gene action, and could be useful for selecting extremes.
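A half-normal probability plot pairs the sorted absolute values of a sample with quantiles of the half-normal distribution. For normally distributed data the points fall near a straight line through the origin whose slope estimates the standard deviation. Departures (curvature, outlying extremes) show up directly. The sketch below makes two assumptions: it plots absolute deviations from the population mean, and it uses plotting positions (i − 0.5)/n. The abstract does not specify either choice, and the function names are hypothetical.

```python
from statistics import NormalDist, mean


def half_normal_points(values):
    """Pair sorted |deviations from the mean| with half-normal quantiles."""
    m = mean(values)
    dev = sorted(abs(v - m) for v in values)
    n = len(dev)
    nd = NormalDist()
    # Half-normal quantile for plotting position p is Phi^-1(0.5 + p/2)
    q = [nd.inv_cdf(0.5 + 0.5 * (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(q, dev))


def slope_through_origin(points):
    """Least-squares slope of |deviation| on quantile for a line through 0."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)
```

Fitting one line per population and comparing slopes or residual patterns is one way to turn the plot into a regression comparison. This also works for a segregating population whose distribution is not a single normal.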

16.
As monitoring plans for the restoration of Pinus ponderosa forests in the southwestern United States evolve toward examining multifactor ecosystem responses to ecological restoration, designing efficient sampling procedures for understory vegetation will become increasingly important. The objective of this study was to compare understory composition and diversity between thin/burn and control treatments in a P. ponderosa restoration. At the same time, we examined how sampling design and multivariate analyses affected the conclusions drawn. Using multi-response permutation procedures (MRPP), we tested the null hypothesis of no difference in understory species composition among treatments, using different data matrices (e.g., frequency and cover) for two different sampling methods. Treatment differences were subtle. An intensive method of fifty 1-m² subplots detected them for all data matrices, whereas a less intensive point-intercept method did not detect them for any matrix. The sampling method influenced the results of the multivariate analyses more than the data matrix used to summarize the data it generated. We partitioned the data into plant life-form and native/exotic species categories for MRPP, and this partitioning isolated the plant groups most responsible for treatment differences. We also examined how the number of 1-m² subplots sampled affected estimates of mean species richness per m². Estimates based on 10 subplots and on 50 subplots were highly correlated (r = 0.99). Species–area curves indicated that the fifty-subplot method detected the common species at each site but failed to detect the majority of rare species.
Additional sampling-design studies are needed to develop single sampling designs that produce multifactor data on plant composition, diversity, and spatial patterns amenable to multivariate analyses, as part of plans for monitoring vegetation responses to ecological restoration.
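MRPP tests whether sample units are closer to others in their own treatment group than expected by chance. The test statistic δ is the group-size-weighted mean of within-group pairwise distances. The p-value is the share of random relabelings giving a δ at least as small as the observed one. The sketch below assumes Euclidean distance and weights n_g/N. Community analyses often use other distances (e.g., Sørensen) and other weights, and the function names are hypothetical.

```python
import math
import random


def mrpp_delta(points, labels):
    """MRPP statistic: group-size-weighted mean within-group pairwise distance."""
    n = len(points)
    delta = 0.0
    for g in set(labels):
        idx = [i for i in range(n) if labels[i] == g]
        pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
        if pairs:
            mean_d = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
            delta += len(idx) / n * mean_d
    return delta


def mrpp_test(points, labels, n_perm=999, seed=0):
    """Permutation p-value: share of relabelings with delta <= observed."""
    rng = random.Random(seed)
    observed = mrpp_delta(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mrpp_delta(points, shuffled) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Each point would be a plot's vector of species frequencies or covers. Swapping in a cover matrix versus a frequency matrix changes only `points`, which matches the study's comparison of data matrices.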

17.

Background

Genomic prediction of breeding values involves a so-called training analysis that predicts the influence of small genomic regions by regressing observed information on marker genotypes for a given population of individuals. Available observations may take the form of individual phenotypes, repeated observations, records on close family members such as progeny, estimated breeding values (EBV), or their deregressed counterparts from genetic evaluations. The literature indicates that researchers are inconsistent in whether they use EBV or deregressed data, and in how they weight some data sources to account for heterogeneous variance.

Methods

A logical approach to using information for genomic prediction is introduced. It shows the appropriate weights for analyzing observations with heterogeneous variance. It also explains why and how EBV should have parent average effects removed and then be deregressed and weighted.

Results

An appropriate deregression for genomic regression analyses is EBV/r², where EBV excludes parent information and r² is the reliability of that EBV. The appropriate weights for deregressed breeding values are neither the reliability nor the prediction error variance, two alternatives that have been used in published studies. Instead, the appropriate weight is the ratio (1 − h²)/[(c + (1 − r²)/r²)h²], where c > 0 is the fraction of genetic variance not explained by markers.

Conclusions

Phenotypic information on some individuals and deregressed data on others can be combined in genomic analyses using appropriate weighting.
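The two formulas stated in the Results translate directly into code. The sketch assumes that the input EBV has already had parent-average information removed, as the abstract requires. That removal step is not shown, and the function names are hypothetical.

```python
def deregress(ebv, r2):
    """Deregressed breeding value EBV / r^2 (EBV assumed to exclude parent info)."""
    return ebv / r2


def deregressed_weight(h2, r2, c):
    """Weight (1 - h^2) / [(c + (1 - r^2) / r^2) * h^2] for a deregressed value.

    c: fraction of genetic variance not explained by markers (c > 0).
    """
    if not (0 < r2 <= 1 and 0 < h2 < 1 and c > 0):
        raise ValueError("need 0 < r2 <= 1, 0 < h2 < 1, c > 0")
    return (1 - h2) / ((c + (1 - r2) / r2) * h2)
```

For example, with h² = 0.25, r² = 0.5, and c = 0.5, the weight is 0.75/(1.5 × 0.25) = 2. Raising r² to 0.9 increases the weight, but it stays bounded because of c. A reliability-only weight would lack this bound.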

18.
Random amplified polymorphic DNA (RAPD) phenotypes generated by 13 primers were scored for 101 individuals in 14 populations of the endangered red-cockaded woodpecker Picoides borealis. Although no population-specific markers were found, the frequencies of several markers differed significantly among populations. We applied the recently developed AMOVA method (analysis of molecular variance; Excoffier, Smouse & Quattro 1992). It showed that more than 90% of phenotypic variance occurred among individuals within populations. Of the remaining variance, half was attributable to differences among groups of geographically adjacent populations and half to differences among populations within those groups. The statistical significance of these patterns was supported by Monte Carlo sampling simulations and permutation tests. Estimation of allele frequencies from phenotypes provided somewhat weaker evidence for population structure, although among-population variance in allele frequencies was detectable (FST = 0.19; χ²(169) = 509.3, P < 0.0001). UPGMA cluster analyses based on Rogers' (1972) genetic distance revealed grouping of some geographically proximate populations. A Mantel test indicated a positive (r = 0.16), although not significant, correlation between geographic and genetic distances. We compared a subset of our RAPD data with data from a previous study that used allozymes (Stangel, Lennartz & Smith 1992). RAPD (n = 75) and allozyme (n = 245) results based on samples from the same ten populations showed similar patterns. Our study indicates that RAPDs can help differentiate populations at the phenotypic level. This holds even when small sample sizes, estimation bias, and the inability to test for Hardy–Weinberg equilibrium complicate genotypic interpretation.
The lack of large differences among populations of red-cockaded woodpeckers may allow flexibility in interpopulation translocations, provided that factors such as habitat preference, latitudinal direction of translocation, and the status of donor populations are considered.
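A Mantel test asks whether two distance matrices over the same objects are correlated, for example geographic versus genetic distances among populations. Because the entries of a distance matrix are not independent, significance comes from permuting the object labels of one matrix (rows and columns together) rather than from a standard correlation test. The sketch below is a generic one-tailed version using Pearson r. Function names are hypothetical, and the software and settings used in the study are not stated in the abstract.

```python
import itertools
import math
import random


def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def mantel(d1, d2, n_perm=999, seed=0):
    """One-tailed Mantel test: Pearson r between the upper triangles of two
    distance matrices; p-value from joint row/column permutations of d2."""
    n = len(d1)
    pairs = list(itertools.combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]
    r_obs = _pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)                       # relabel the objects in d2
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

With only 14 populations there are 91 distance pairs. A modest r such as the reported 0.16 can easily fail to reach significance under this kind of permutation null.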

19.
Several studies relating land cover to stream properties have used samples of more than 100 watersheds, but the variance they explain is moderate to low (R² < 50%). This limits the predictive value of their models when applied to watersheds that were not used in model development. We hypothesize that this is due to the increase in variation that accompanies increases in sample size and in the geographic extent over which the watersheds are distributed. Land cover alone cannot explain all of that variation; more predictors must be considered. Conversely, models with high explanatory power would require relatively small samples distributed over small areas. We evaluated this hypothesis by sampling 17 watersheds in southern Chile's Lake Region. For these watersheds, we developed regression models relating land cover, watershed area, precipitation, and geomorphology to stream properties (i.e., conductivity, temperature). At the regional scale (maximum n = 15 watersheds), the models explained little of the variation in hydrologic variables (mean R² 37–49%). R² increased slightly, to 45–52%, when precipitation was included as a predictor. In half of the cases analyzed, the models improved when geomorphology was added as a predictor (R² 60–66%), supporting our hypothesis. Furthermore, when the analysis was restricted to a narrower latitudinal span (n = 9), R² was much higher (68–87%) with only land cover and watershed area as predictors, and it increased further when more predictors were incorporated. Nevertheless, some variance remained unexplained, which would require additional predictors such as geology and edaphology. The documented trade-off provides evidence against the spatial generality of land cover/stream property models.

20.
Stepwise regression is often used in ecology to identify critical factors. From a large number of possible predictors, the procedure selects the subset yielding the highest coefficient of determination, R². This work presents a method for testing the significance of this coefficient. Monte Carlo simulations are used to calculate the statistical distribution of R² under the null hypothesis that the response variable is independent of the predictors. The method is illustrated by an application to a previously published analysis of the Canadian lynx population cycle, in which more than 75% of the variance could be explained by four meteorological factors.
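The key idea is that selection inflates R². Even pure noise, screened over many candidate predictors, yields a sizable "best" R². So the observed value must be compared with the distribution of selected R² values under the null. The sketch below rests on two assumptions. It uses forward selection (via Gram–Schmidt orthogonalization) as the stepwise procedure. It builds the null distribution by randomly permuting the response while keeping the predictors fixed. The paper's exact selection rule and simulation scheme may differ, and all function names are hypothetical.

```python
import random


def _project_out(v, basis):
    """Subtract from v its projections on the orthonormal vectors in basis."""
    for q in basis:
        dot = sum(a * b for a, b in zip(v, q))
        v = [a - dot * b for a, b in zip(v, q)]
    return v


def forward_r2(y, predictors, k):
    """R^2 after k steps of forward selection (intercept always included)."""
    n = len(y)
    basis = [[1.0 / n ** 0.5] * n]              # normalized intercept column
    resid = _project_out(list(y), basis)
    tss = sum(r * r for r in resid)
    remaining = list(range(len(predictors)))
    for _ in range(k):
        best, best_gain, best_q = None, -1.0, None
        for j in remaining:
            q = _project_out(list(predictors[j]), basis)
            norm = sum(a * a for a in q) ** 0.5
            if norm < 1e-12:                     # collinear with chosen terms
                continue
            q = [a / norm for a in q]
            gain = sum(a * b for a, b in zip(resid, q)) ** 2   # drop in RSS
            if gain > best_gain:
                best, best_gain, best_q = j, gain, q
        if best is None:
            break
        remaining.remove(best)
        basis.append(best_q)
        resid = _project_out(resid, [best_q])
    return 1.0 - sum(r * r for r in resid) / tss


def null_r2(y, predictors, k, n_sim=500, seed=0):
    """Null distribution of the selected R^2, permuting y to break any link."""
    rng = random.Random(seed)
    y_perm = list(y)
    draws = []
    for _ in range(n_sim):
        rng.shuffle(y_perm)
        draws.append(forward_r2(y_perm, predictors, k))
    return draws


def mc_p_value(r2_obs, draws):
    """Monte Carlo p-value: share of null draws at least as large as observed."""
    return (1 + sum(d >= r2_obs for d in draws)) / (1 + len(draws))
```

With, say, 4 predictors chosen from many candidates and a short time series, the null draws themselves can be large. A headline figure like 75% explained variance is meaningful only relative to that distribution.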


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417