Similar literature
20 similar articles found.
1.
An attempt has been made to derive the theory of successive sampling for estimating a regression coefficient. The situations considered are estimation of the regression coefficient on the current occasion, estimation of the change in regression coefficients over two occasions, and estimation of the average of the regression coefficients over two occasions. Expressions for the optimum estimators, along with their variances, have been worked out. An empirical comparison of their efficiencies shows that, as in the estimation of a mean, successive sampling can also be used to advantage for estimating a regression coefficient.

2.
Summary: Expanding the regression coefficient as a stability parameter (Finlay and Wilkinson 1963) requires an unbiased interpretation of the parameter. Information on the covariances among the genotypes of the population must be specific, particularly when the assumed relatedness of the genotypes appears questionable. Such a problem, however, is not expected when the covariances between the genotypes are either zero or equal and possibly nonzero.

3.
In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, such as in stratified sampling. Design-based statistical analysis tools are appropriate for seamless integration of the sample design into the statistical analysis. However, it is also common and necessary, after a sampling design has been implemented, to use datasets to address questions that, in many cases, were not considered during the sampling design phase. Questions may arise requiring the use of model-based statistical tools such as multiple regression, quantile regression, or regression tree analysis. However, to ensure unbiased estimation, such model-based tools may require data from simple random samples, which can be problematic when analyzing data from unequal probability designs. Despite the numerous method-specific tools available to properly account for sampling design, too often in the analysis of ecological data the sample design is ignored and the consequences are not properly considered. We demonstrate here that violation of this assumption can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). Inverse probability bootstrapping is an easily implemented method for obtaining equal probability re-samples from a probability sample, from which unbiased model-based estimates can be made. We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and boosted regression tree analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
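
The resampling idea behind IPB can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function names `ipb_resample` and `ipb_estimate`, the number of bootstrap replicates, and the choice to average the replicate estimates are all assumptions made for this example. Each observed unit is redrawn with probability proportional to the inverse of its inclusion probability, so units from under-sampled strata are no longer under-represented in the resample.

```python
import numpy as np


def ipb_resample(data, inclusion_prob, rng):
    """Draw one resample (with replacement, same size as the sample).

    Row i is selected with probability proportional to
    1 / inclusion_prob[i], which approximates an equal-probability
    sample from the underlying population.
    """
    data = np.asarray(data)
    weights = 1.0 / np.asarray(inclusion_prob, dtype=float)
    p = weights / weights.sum()
    idx = rng.choice(len(data), size=len(data), replace=True, p=p)
    return data[idx]


def ipb_estimate(data, inclusion_prob, estimator, n_boot=200, seed=0):
    """Average a model-based estimator over n_boot IPB resamples.

    `estimator` is any function that maps a resample to a number,
    for example np.mean or a function that fits a regression and
    returns one coefficient. Averaging the replicates is an
    assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    estimates = [
        estimator(ipb_resample(data, inclusion_prob, rng))
        for _ in range(n_boot)
    ]
    return float(np.mean(estimates))
```

As a usage example, `ipb_estimate(values, inclusion_prob, np.mean)` can be compared with the naive `values.mean()` for a sample in which one stratum was heavily oversampled; the two should differ whenever the strata have different means.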

4.
In this paper, properties of an estimator of the population mean on the current occasion under a successive sampling scheme, when the various weights (φ_h's) and regression coefficients (β_{h,h-1}) are estimated for h ≥ 2, are studied. Some empirical results on the estimation of the variance of an unbiased estimator of the population mean for h = 2 are also given.

5.
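β-diversity describes the variation in species composition among sites in a geographic region and is a key concept for understanding ecosystem functioning, biodiversity conservation, and ecosystem management. This paper describes how to analyse β-diversity from community composition and the associated environmental and spatial data. β-diversity can be studied by computing a diversity index for each site and then testing hypotheses about the factors that may explain the differences among sites. Alternatively, the community composition data table covering all sites can be treated as a function of sets of environmental and spatial variables and analysed directly. Such analyses use statistical methods to partition the variation of the diversity indices or of the community composition data table with respect to the environmental and spatial variables. This paper describes variation partitioning. Variation partitioning is a method for interpreting β-diversity using environmental and spatial variables. β-diversity is a means by which ecologists compare different sites or different ecological communities at the same site. Variation partitioning decomposes, without bias, the total variation of the community composition data table into the components determined by each set of explanatory variables. The adjusted coefficient of determination provides an unbiased estimate for both multiple regression and canonical redundancy analysis. After partitioning, one can test the significance of the fractions of interest and plot maps of the predicted values based on those fractions.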

6.
Gerow K  McCulloch CE 《Biometrics》2000,56(3):873-878
This paper proposes a class of inferential procedures (incorporating both design and estimation elements) that yield estimates of means that are simultaneously model unbiased and design unbiased. Classical regression procedures yield conditionally unbiased estimators for the mean (conditioning on the model and choice of observation points). In contrast, design-based methods yield estimators that are unconditionally unbiased no matter what the form of the underlying model. Variance properties of the proposed class are examined, and applications to bioavailability, water quality from mine run-off, and finite population regression estimation are considered. The proposed procedures perform well, especially in the typical case where a model is only approximately correct.

7.
It is well known for direct response (DR) surveys, where the responses are obtained from the respondents directly, that the sample mean based on the distinct units of a sample selected by simple random sampling with replacement (SRSWR) is more efficient than the sample mean based on all the units, including repetitions. In this paper, it is shown that a linear unbiased estimator based on distinct units is inadmissible for estimating a finite population mean when the sample is selected by an arbitrary with-replacement (WR) sampling scheme and the responses are obtained independently by some randomized response (RR) technique. Efficiencies of a few linear unbiased estimators are compared under SRSWR sampling.

8.
A novel feature screening method is proposed to examine the correlation between latent responses and potential predictors in ultrahigh-dimensional data analysis. First, a confirmatory factor analysis (CFA) model is used to characterize latent responses through multiple observed variables. The expectation-maximization algorithm is employed to estimate the parameters in the CFA model. Second, R-Vector (RV) correlation is used to measure the dependence between the multivariate latent responses and covariates of interest. Third, a feature screening procedure is proposed on the basis of an unbiased estimator of the RV coefficient. The sure screening property of the proposed screening procedure is established under certain mild conditions. Monte Carlo simulations are conducted to assess the finite-sample performance of the feature screening procedure. The proposed method is applied to an investigation of the relationship between psychological well-being and the human genome.

9.
We present a novel sampling approach to explore large protein conformational transitions by determining unique substates from instantaneous normal modes calculated from an elastic network model, and applied to a progression of atomistic molecular dynamics snapshots. This unbiased sampling scheme allows us to direct the path sampling between the conformational end states over simulation timescales that are greatly reduced relative to the known experimental timescales. We use adenylate kinase as a test system to show that instantaneous normal modes can be used to identify substates that drive the structural fluctuations of adenylate kinase from its closed to open conformations, in which we observe 16 complete transitions in 4 μs of simulation time, reducing the timescale over conventional simulation timescales by two orders of magnitude. Analysis shows that the unbiased determination of substates is consistent with known pathways determined experimentally.

10.
Pan W 《Biometrics》2000,56(1):199-203
We propose a general semiparametric method based on multiple imputation for Cox regression with interval-censored data. The method consists of iterating the following two steps. First, from finite-interval-censored (but not right-censored) data, exact failure times are imputed using Tanner and Wei's poor man's or asymptotic normal data augmentation scheme based on the current estimates of the regression coefficient and the baseline survival curve. Second, a standard statistical procedure for right-censored data, such as the Cox partial likelihood method, is applied to imputed data to update the estimates. Through simulation, we demonstrate that the resulting estimate of the regression coefficient and its associated standard error provide a promising alternative to the nonparametric maximum likelihood estimate. Our proposal is easily implemented by taking advantage of existing computer programs for right-censored data.

11.
Wang X  Zhou H 《Biometrics》2006,62(4):1149-1160
We consider a semiparametric inference procedure for data from epidemiologic studies conducted with a two-component sampling scheme where both a simple random sample and multiple outcome- or outcome-/auxiliary-dependent samples are observed. This sampling scheme allows the investigators to oversample certain subpopulations believed to have more information about the regression model while still gaining insights about the underlying population through the simple random sample. We focus on settings where there is no additional information about the parent cohort and the sampling probability is nonidentifiable. We motivate our problem with an ongoing study to assess the association between the mutation level of epidermal growth factor receptor (EGFR) and the antitumor response to EGFR-targeted therapy among nonsmall cell lung cancer patients. The proposed method applies to both binary and multicategorical outcome data and allows an arbitrary link function in the framework of generalized linear models. Simulation studies show that the proposed estimator has nice small sample properties. The proposed method is illustrated with a data example.

12.
Summary: Nested case–control (NCC) design is a popular sampling method in large epidemiological studies for its cost effectiveness to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to NCC data and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling and show that it can be well adapted to NCC designs where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare the efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.

13.
Informative drop-out arises in longitudinal studies when the subject's follow-up time depends on the unobserved values of the response variable. We specify a semiparametric linear regression model for the repeatedly measured response variable and an accelerated failure time model for the time to informative drop-out. The error terms from the two models are assumed to have a common, but completely arbitrary joint distribution. Using a rank-based estimator for the accelerated failure time model and an artificial censoring device, we construct an asymptotically unbiased estimating function for the linear regression model. The resultant estimator is shown to be consistent and asymptotically normal. A resampling scheme is developed to estimate the limiting covariance matrix. Extensive simulation studies demonstrate that the proposed methods are suitable for practical use. Illustrations with data taken from two AIDS clinical trials are provided.

14.
Aims: Beta diversity is the variation in species composition among sites in a geographic region. Beta diversity is a key concept for understanding the functioning of ecosystems, for the conservation of biodiversity and for ecosystem management. The present report describes how to analyse beta diversity from community composition and associated environmental and spatial data tables. Methods: Beta diversity can be studied by computing diversity indices for each site and testing hypotheses about the factors that may explain the variation among sites. Alternatively, one can carry out a direct analysis of the community composition data table over the study sites, as a function of sets of environmental and spatial variables. These analyses are carried out by the statistical method of partitioning the variation of the diversity indices or the community composition data table with respect to environmental and spatial variables. Variation partitioning is briefly described herein. Important findings: Variation partitioning is a method of choice for the interpretation of beta diversity using tables of environmental and spatial variables. Beta diversity is an interesting ‘currency’ for ecologists to compare either different sampling areas or different ecological communities co-occurring in an area. Partitioning must be based upon unbiased estimates of the variation of the community composition data table that is explained by the various tables of explanatory variables. The adjusted coefficient of determination provides such an unbiased estimate in both multiple regression and canonical redundancy analysis. After partitioning, one can test the significance of the fractions of interest and plot maps of the fitted values corresponding to these fractions.
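
The partitioning described in this abstract can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the authors' code: the function names and the dictionary keys are hypothetical, the explanatory tables are assumed to have full column rank, and the response table `Y` is fitted by ordinary least squares with an intercept. The adjusted coefficient of determination is taken as 1 - (1 - R²)(n - 1)/(n - m - 1) for n sites and m explanatory variables, and the fractions are obtained by subtracting adjusted values.

```python
import numpy as np


def _as_2d(a):
    """Return a float array with one row per site."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def r_squared(X, Y):
    """Share of the total variation of Y explained by X (with intercept)."""
    X, Y = _as_2d(X), _as_2d(Y)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    ss_resid = ((Y - design @ coef) ** 2).sum()
    ss_total = ((Y - Y.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_resid / ss_total


def adjusted_r_squared(X, Y):
    """Adjusted R^2; assumes the columns of X are linearly independent."""
    X = _as_2d(X)
    n, m = X.shape
    return 1.0 - (1.0 - r_squared(X, Y)) * (n - 1) / (n - m - 1)


def partition_variation(env, space, Y):
    """Split the variation of Y into four fractions of adjusted R^2."""
    env, space = _as_2d(env), _as_2d(space)
    r_env = adjusted_r_squared(env, Y)
    r_space = adjusted_r_squared(space, Y)
    r_both = adjusted_r_squared(np.hstack([env, space]), Y)
    return {
        "pure_env": r_both - r_space,        # environment only
        "shared": r_env + r_space - r_both,  # jointly explained
        "pure_space": r_both - r_env,        # space only
        "residual": 1.0 - r_both,            # unexplained
    }
```

The four fractions sum to one by construction. Significance testing of the fractions and the mapping of fitted values, both mentioned in the abstract, are not covered by this sketch.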

15.
A finite population consists of kN individuals of N different categories with k individuals each. It is required to estimate the unknown parameter N, the number of different classes in the population. A sequential sampling scheme is considered in which individuals are sampled until a preassigned number of repetitions of already observed categories occur in the sample. Corresponding fixed sample size schemes were considered by Charalambides (1981). The sequential sampling scheme has the advantage of always allowing unbiased estimation of the size parameter N. It is shown that relative to Charalambides' fixed sample size scheme only minor adjustments are required to account for the sequential scheme. In particular, MVU estimators of parametric functions are expressible in terms of the C-numbers introduced by Charalambides.

16.
The coefficient of variation, the standard deviation divided by the mean, has some essential defects. Its density, expectation, and variance are too complex for statistical inference on the coefficient. The coefficient of stabilization is defined as the reciprocal of the coefficient of variation, that is, the mean divided by the standard deviation. This coefficient has a simple expectation and a simple variance, and it is an asymptotically unbiased and consistent estimator of its true value. Furthermore, the coefficient of stabilization is asymptotically normal. Owing to these statistical advantages, the coefficient of stabilization is easy to test statistically. In some applied fields, an increasing standard deviation usually accompanies an increasing mean, and the coefficient of stabilization can be used in practice for comparison studies in such fields. Illustrations comparing microorganism strains are given in this paper. The robustness of the coefficient of stabilization is satisfactory.
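
The two definitions in this abstract are reciprocals of each other, which a few lines of Python make concrete. This is a minimal sketch: the function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the abstract does not say which version is meant.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by mean (sample SD, an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)


def coefficient_of_stabilization(values):
    """Mean divided by standard deviation: the reciprocal of the above."""
    return statistics.mean(values) / statistics.stdev(values)
```

Both functions fail when their denominator is zero: the coefficient of variation is undefined for a zero mean, and the coefficient of stabilization for constant data.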

17.
Kinjo AR  Horimoto K  Nishikawa K 《Proteins》2005,58(1):158-165
The contact number of an amino acid residue in a protein structure is defined by the number of C(beta) atoms around the C(beta) atom of the given residue, a quantity similar to, but different from, solvent accessible surface area. We present a method to predict the contact numbers of a protein from its amino acid sequence. The method is based on a simple linear regression scheme and predicts the absolute values of contact numbers. When single sequences are used for both parameter estimation and cross-validation, the present method predicts the contact numbers with a correlation coefficient of 0.555 on average. When multiple sequence alignments are used, the correlation increases to 0.627, which is a significant improvement over previous methods. In terms of discrete states prediction, the accuracies for 2-, 3-, and 10-state predictions are, respectively, 71.4%, 54.1%, and 18.9% with residue type-dependent unbiased thresholds, and 76.3%, 59.2%, and 21.8% with residue type-independent unbiased thresholds. The difference between accessible surface area and contact number from a prediction viewpoint and the application of contact number prediction to three-dimensional structure prediction are discussed.
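
The definition of the contact number at the start of this abstract can be sketched in Python with NumPy. The abstract only says "around" the C(beta) atom, so counting atoms within a fixed cutoff distance is an assumption of this example, as are the function name `contact_numbers` and the `radius` parameter; any exclusion of sequence neighbours that the paper may apply is not modelled here.

```python
import numpy as np


def contact_numbers(cb_coords, radius):
    """Count, for each residue, the other C(beta) atoms within `radius`.

    cb_coords: array of shape (n_residues, 3) with C(beta) coordinates.
    radius: cutoff distance in the same units as the coordinates
            (a hypothetical parameter; the abstract gives no value).
    """
    coords = np.asarray(cb_coords, dtype=float)
    # Pairwise distances between all C(beta) atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = dist < radius
    np.fill_diagonal(within, False)  # a residue does not count itself
    return within.sum(axis=1)
```

The pairwise distance matrix needs memory quadratic in the number of residues, which is acceptable for single protein chains but would call for a neighbour-search structure on much larger systems.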

18.
Case–control designs are commonly employed in genetic association studies. In addition to the case–control status, data on secondary traits are often collected. Directly regressing secondary traits on genetic variants from a case–control sample often leads to biased estimation. Several statistical methods have been proposed to address this issue. The inverse probability weighting (IPW) approach and the semiparametric maximum-likelihood (SPML) approach are the most commonly used. A new weighted estimating equation (WEE) approach is proposed to provide unbiased estimation of genetic associations with secondary traits, by combining observed and counterfactual outcomes. Compared to the existing approaches, WEE is more robust against biased sampling and disease model misspecification. We conducted simulations to evaluate the performance of the WEE under various models and sampling schemes. The WEE demonstrated robustness in all scenarios investigated, had appropriate type I error, and was as powerful or more powerful than the IPW and SPML approaches. We applied the WEE to an asthma case–control study to estimate the associations between the thymic stromal lymphopoietin gene and two secondary traits: overweight status and serum IgE level. The WEE identified two SNPs associated with overweight in logistic regression, three SNPs associated with serum IgE levels in linear regression, and an additional four SNPs that were missed in linear regression to be associated with the 75th quantile of IgE in quantile regression. The WEE approach provides a general and robust secondary analysis framework, which complements the existing approaches and should serve as a valuable tool for identifying new associations with secondary traits.

19.
Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and yields essentially unbiased estimates with minimal sampling variances with moderately high depths of coverage regardless of a mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy–Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.

20.
Missing observations in the estimation of a difference of means create the need for subsampling among the nonrespondents. Ranked set sampling is used for the subsampling. The information provided on one of the variables by the nonrespondents at the first attempt allows them to be ranked. The behavior of a ranked set sampling model with respect to other alternatives is studied in this paper. An unbiased estimator is derived and its expected variance is obtained. The proposed model is compared with the use of simple random sampling and two-phase sampling for stratification.
