Similar articles
20 similar articles found
1.
In the era of big data, univariate models have been widely used as a workhorse tool for quickly producing marginal estimators, even in a high-dimensional dense setting in which many features carry true but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimator is popular, it has long been criticized for ignoring the correlation structure of genetic variants (i.e., the linkage disequilibrium [LD] pattern). In this paper, we study the effects of the LD pattern on the GWAS marginal estimator and investigate whether additionally accounting for LD can improve the prediction accuracy of complex traits. We consider a general high-dimensional dense setting for GWAS and study a class of ridge-type estimators that includes the popular marginal estimator and the best linear unbiased prediction (BLUP) estimator as two special cases. We show that the performance of the GWAS marginal estimator depends on the LD pattern through the first three moments of its eigenvalue distribution. Furthermore, we find that the relative performance of the GWAS marginal and BLUP estimators depends strongly on the ratio of the GWAS sample size to the number of genetic variants. In particular, the marginal estimator can easily become near-optimal within this class when the sample size is relatively small, even though it ignores the LD pattern. On the other hand, the BLUP estimator performs substantially better than the marginal estimator as the sample size increases toward the number of genetic variants, which is typically in the millions. Therefore, adjusting for LD (as in BLUP) is most needed when the GWAS sample size is large. We illustrate the importance of our results using simulated data and real GWAS data.
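A minimal sketch of the ridge-type family described above, assuming column-standardized genotypes so that X'X/n plays the role of the sample LD matrix. The parameterization (X'X/n + λI)^{-1} X'y/n, the BLUP penalty λ = p(1 - h²)/(n h²) under β_j ~ N(0, h²/p) with noise variance 1 - h², and all function names are illustrative assumptions, not the paper's notation. As λ grows, λ times the estimate approaches the marginal estimator X'y/n, which amounts to replacing the LD matrix by the identity.

```python
# Hypothetical sketch: one ridge-type family containing both the GWAS
# marginal estimator and BLUP. Genotype columns are assumed standardized.


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def marginal(x, y):
    """GWAS marginal estimator X'y/n: one univariate fit per variant, LD ignored."""
    n, p = len(x), len(x[0])
    return [sum(x[i][j] * y[i] for i in range(n)) / n for j in range(p)]


def ridge(x, y, lam):
    """Ridge-type estimator (X'X/n + lam*I)^{-1} X'y/n."""
    n, p = len(x), len(x[0])
    # X'X/n: the sample LD (correlation) matrix when columns are standardized.
    ld = [[sum(x[i][j] * x[i][k] for i in range(n)) / n for k in range(p)]
          for j in range(p)]
    for j in range(p):
        ld[j][j] += lam
    return solve(ld, marginal(x, y))


def blup_lambda(n, p, h2):
    """Penalty giving BLUP if beta_j ~ N(0, h2/p) and the noise variance is 1 - h2."""
    return p * (1.0 - h2) / (n * h2)


# lam -> infinity: lam * ridge(...) -> marginal(...), i.e. LD is ignored.
# lam = blup_lambda(...): the BLUP member of the family.
# lam = 0 (with n > p): ordinary least squares.
x = [[1.0, 0.9], [-1.0, -0.8], [0.5, 0.7], [-0.5, -0.8]]
y = [1.2, -0.9, 0.4, -0.7]
print(marginal(x, y), ridge(x, y, blup_lambda(len(x), 2, 0.5)))
```

The toy design has two strongly correlated columns, which is where the marginal and BLUP members of the family differ most.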

2.
A logistic regression model with random effects is commonly applied to analyze clustered binary data, with each cluster assumed to have a different proportion of success. However, it may be of interest to obtain the proportion of success over clusters (i.e., the marginal proportion of success). Furthermore, the degree of correlation among data from the same cluster (intraclass correlation) is also relevant to assess, but with random-effects logistic regression no analytical expression is available for the estimators of the marginal proportion and the intraclass correlation. In this paper, we assess and compare approaches based on different kinds of approximations: the logistic-normal mixed effects model (LN), the linear mixed model (LMM), and generalized estimating equations (GEE). The comparisons use two real data examples and a simulation study. The results show that the performance of the approaches depends strongly on the magnitude of the marginal proportion, the intraclass correlation, and the sample size. In general, the reliability of the approaches worsens with a low marginal proportion and a large intraclass correlation. The LMM and GEE approaches emerge as reliable when the sample size is large.
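To make concrete why these quantities lack closed forms, the sketch below simply integrates over the random effect numerically. It assumes a random-intercept model P(Y = 1 | b) = expit(β0 + b) with b ~ N(0, σ²) and uses the proportion-scale ICC Var(p_i) / [μ(1 - μ)]; the trapezoid grid is an arbitrary choice and is not one of the LN, LMM or GEE approximations compared in the paper.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def marginal_proportion_and_icc(beta0, sigma, n_grid=2001, width=8.0):
    """Marginal P(Y=1) and proportion-scale ICC under a logistic-normal model.

    Assumed model: P(Y_ij = 1 | b_i) = expit(beta0 + b_i), b_i ~ N(0, sigma^2).
    ICC = Var(p_i) / (mu * (1 - mu)) with p_i = expit(beta0 + b_i)."""
    if sigma == 0:
        return expit(beta0), 0.0
    step = 2.0 * width * sigma / (n_grid - 1)
    total = m1 = m2 = 0.0
    for k in range(n_grid):
        b = -width * sigma + k * step
        w = math.exp(-0.5 * (b / sigma) ** 2)  # unnormalized normal density
        if k in (0, n_grid - 1):
            w *= 0.5  # trapezoid end-point weights
        p = expit(beta0 + b)
        total += w
        m1 += w * p
        m2 += w * p * p
    mu = m1 / total
    var_p = m2 / total - mu * mu
    return mu, var_p / (mu * (1.0 - mu))


print(marginal_proportion_and_icc(-2.0, 1.0))
```

In this example the marginal proportion is pulled toward 0.5 relative to expit(β0), which is one reason a cluster-specific fit cannot be read directly as a marginal proportion.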

3.
Manatunga AK, Chen S. Biometrics, 2000, 56(2): 616-621
We present a method for computing sample size for cluster-randomized studies involving a large number of clusters with relatively small numbers of observations within each cluster. For multivariate survival data, only the marginal bivariate distribution is assumed to be known. The validity of this assumption is also discussed.

4.
We present a model for estimating the size of an unknown population from several lists that applies when the assumptions of (a) homogeneous capture probabilities across individuals and (b) marginal independence of lists are violated. This situation typically occurs in epidemiological studies, where the heterogeneity of individuals is severe and researchers cannot control the independence between sources of ascertainment. We discuss the situation in which categorical covariates are available and the interest lies not only in the total undercount but also in the undercount within each stratum resulting from the cross-classification of the covariates. We also present several techniques for determining confidence intervals for the undercount within each stratum using the profile log likelihood, thereby extending the work of Cormack (1992, Biometrics 48, 567-576).
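For orientation, the classical two-list (Chapman) estimator below is the naive baseline that relies on exactly the homogeneity and list-independence assumptions the model above relaxes; applied stratum by stratum it gives a naive per-stratum undercount. When those assumptions fail, this baseline can be badly biased. The stratum labels and counts are hypothetical.

```python
def chapman(n1, n2, m):
    """Chapman's two-list estimate of population size.

    n1, n2: individuals on list 1 and list 2; m: individuals on both lists.
    Valid only under homogeneous capture and independent lists."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def undercount_by_stratum(strata):
    """Estimated number missed by both lists, computed per stratum.

    strata: {label: (n1, n2, m)} for a hypothetical covariate cross-classification."""
    return {label: chapman(n1, n2, m) - (n1 + n2 - m)
            for label, (n1, n2, m) in strata.items()}


print(undercount_by_stratum({"female": (100, 80, 40), "male": (60, 50, 10)}))
```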

5.
We propose a new approach to fitting marginal models to clustered data when cluster size is informative. This approach uses a generalized estimating equation (GEE) weighted by the inverse of the cluster size. We show that our approach is asymptotically equivalent to within-cluster resampling (WCR; Hoffman, Sen, and Weinberg, 2001, Biometrika 73, 13-22), a computationally intensive approach in which replicate data sets, each containing one randomly selected observation from each cluster, are analyzed and the resulting estimates averaged. Using simulated data and an example involving dental health, we show the superior performance of our approach compared with unweighted GEE, its equivalence with WCR for large sample sizes, and its superior performance compared with WCR when sample sizes are small.
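With an intercept-only marginal mean, identity link and independence working correlation (all assumptions of this sketch), the inverse-cluster-size weighted GEE has a closed-form solution, the average of the cluster means, while WCR targets the same quantity by Monte Carlo. The toy clusters are hypothetical and built so that larger clusters have larger outcomes, i.e., informative cluster size.

```python
import random


def unweighted_gee_mean(clusters):
    """Ordinary GEE mean: every observation weighted equally."""
    obs = [y for c in clusters for y in c]
    return sum(obs) / len(obs)


def cluster_weighted_gee_mean(clusters):
    """Solve sum_i (1/n_i) sum_j (y_ij - mu) = 0, giving the mean of cluster means."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


def wcr_mean(clusters, n_resamples=2000, seed=0):
    """Within-cluster resampling: draw one observation per cluster, average, repeat."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        draw = [rng.choice(c) for c in clusters]
        estimates.append(sum(draw) / len(draw))
    return sum(estimates) / len(estimates)


clusters = [[0.0, 1.0], [1.0, 2.0, 3.0, 2.0], [5.0, 4.0, 6.0, 5.0, 5.0, 5.0]]
print(unweighted_gee_mean(clusters), cluster_weighted_gee_mean(clusters), wcr_mean(clusters))
```

The unweighted mean is dominated by the largest cluster, whereas the weighted GEE and WCR agree; the weighted version needs no resampling.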

6.
We consider sample size calculations for testing differences in means between two samples and allowing for different variances in the two groups. Typically, the power functions depend on the sample size and a set of parameters assumed known, and the sample size needed to obtain a prespecified power is calculated. Here, we account for two sources of variability: we allow the sample size in the power function to be a stochastic variable, and we consider estimating the parameters from preliminary data. An example of the first source of variability is nonadherence (noncompliance). We assume that the proportion of subjects who will adhere to their treatment regimen is not known before the study, but that the proportion is a stochastic variable with a known distribution. Under this assumption, we develop simple closed form sample size calculations based on asymptotic normality. The second source of variability is in parameter estimates that are estimated from prior data. For example, we account for variability in estimating the variance of the normal response from existing data which are assumed to have the same variance as the study for which we are calculating the sample size. We show that we can account for the variability of the variance estimate by simply using a slightly larger nominal power in the usual sample size calculation, which we call the calibrated power. We show that the calculation of the calibrated power depends only on the sample size of the existing data, and we give a table of calibrated power by sample size. Further, we consider the calculation of the sample size in the rarer situation where we account for the variability in estimating the standardized effect size from some existing data. This latter situation, as well as several of the previous ones, is motivated by sample size calculations for a Phase II trial of a malaria vaccine candidate.
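As a rough numerical companion to the first source of variability, the sketch computes the usual normal-approximation power for two means with unequal variances and then averages it over an assumed discrete distribution of the adherence proportion. Treating the analyzed per-group size as the adherent fraction times n is a simplifying assumption; this is not the closed-form calculation or the calibrated-power table derived in the paper.

```python
from statistics import NormalDist

_STD = NormalDist()


def power_two_sample(n, delta, sd1, sd2, alpha=0.05):
    """Approximate power of a two-sided z-test with n subjects per group
    (unequal variances allowed; the negligible opposite tail is ignored)."""
    se = ((sd1 ** 2 + sd2 ** 2) / n) ** 0.5
    return _STD.cdf(delta / se - _STD.inv_cdf(1 - alpha / 2))


def expected_power(n, delta, sd1, sd2, adherence, alpha=0.05):
    """Power averaged over an assumed adherence distribution.

    adherence: (fraction, probability) pairs with fraction > 0. Simplifying
    assumption: only adherers are analyzed, so the per-group size is fraction * n."""
    return sum(prob * power_two_sample(frac * n, delta, sd1, sd2, alpha)
               for frac, prob in adherence)


def smallest_n(delta, sd1, sd2, target=0.80, alpha=0.05, adherence=((1.0, 1.0),)):
    """Smallest per-group n whose (expected) power reaches the target."""
    n = 2
    while expected_power(n, delta, sd1, sd2, adherence, alpha) < target:
        n += 1
    return n


print(smallest_n(0.5, 1.0, 1.0))  # adherence fixed at 100%
print(smallest_n(0.5, 1.0, 1.0, adherence=((0.7, 0.5), (0.9, 0.5))))  # random adherence
```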

7.
Microarray studies aimed at identifying genes associated with an outcome of interest usually produce noisy measurements of a large number of gene expression features from a small number of subjects. One common approach to analyzing such high-dimensional data is to use linear errors-in-variables (EIV) models; however, current methods for fitting such models are computationally expensive. In this paper, we present two efficient screening procedures, namely corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc), to reduce the number of variables for final model building. Both procedures are based on fitting corrected marginal regression models relating the outcome to each contaminated covariate separately, which can be computed efficiently even with a large number of features. Under mild conditions, we show that these procedures achieve screening consistency and reduce the number of features substantially, even when the number of covariates grows exponentially with the sample size. In addition, if the true covariates are weakly correlated, we show that PMSc can achieve full variable selection consistency. Through a simulation study and an analysis of gene expression data for bone mineral density in Norwegian women, we demonstrate that the two new screening procedures make estimation of linear EIV models computationally scalable in high-dimensional settings and improve finite-sample estimation and selection performance compared with estimators that do not employ a screening stage.
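The corrected marginal regressions can be illustrated, assuming classical additive measurement error W = X + U with a known error variance per covariate, by dividing each marginal covariance by the error-corrected variance of W and ranking covariates by the absolute corrected slope, in the spirit of sure independence screening. The exact PMSc and SISc statistics are not given in the abstract, so this only illustrates the correction-plus-ranking idea; the data are hypothetical.

```python
def corrected_marginal_slopes(w_cols, y, error_vars):
    """Marginal slope of y on each error-prone covariate, corrected for
    additive measurement error W = X + U with known Var(U) (assumed)."""
    n = len(y)
    ybar = sum(y) / n
    slopes = []
    for w, s2u in zip(w_cols, error_vars):
        wbar = sum(w) / n
        sxy = sum((a - wbar) * (b - ybar) for a, b in zip(w, y)) / (n - 1)
        sww = sum((a - wbar) ** 2 for a in w) / (n - 1)
        denom = sww - s2u  # estimated Var(X); the naive slope uses sww instead
        if denom <= 0:
            raise ValueError("error variance exceeds the observed variance")
        slopes.append(sxy / denom)
    return slopes


def screen(w_cols, y, error_vars, keep):
    """Indices of the `keep` covariates with the largest |corrected slope|."""
    slopes = corrected_marginal_slopes(w_cols, y, error_vars)
    order = sorted(range(len(slopes)), key=lambda j: abs(slopes[j]), reverse=True)
    return sorted(order[:keep])


w_cols = [[1, 2, 3, 4, 5], [2, 1, 2, 1, 2], [5, 3, 4, 4, 4]]
y = [2, 4, 6, 8, 10]
print(screen(w_cols, y, [0.5, 0.1, 0.1], keep=2))
```

Each slope needs only one pass over one column, which is why marginal screening stays cheap when the number of features is very large.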

8.
Manly. Ecology Letters, 1998, 1(2): 104-111
Testing for a relationship between the body size of animals and a gradient such as latitude is complicated by the fact that typically there is a single average size for each species, and each species occurs at several sample stations over the gradient. This results in standard tests for statistical significance being invalid. This problem can be overcome by using a randomization test. A more difficult problem, however, is determining whether the relationship between size and latitude is the same for two subfamilies of species. In this paper a general method for relating body size to latitude and subfamily differences is proposed, with the significance of effects determined by randomization. A simulation study suggests that this procedure has good properties. This approach to data analysis has promise both for the particular situation considered and for other related problems in biogeography.
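One way to set up such a randomization test, assumed here rather than taken from the paper, is to keep each species' set of stations fixed and permute the single average body sizes among species, recomputing the size-latitude correlation over all species-station records each time. The subfamily comparison is not covered, and the species and latitudes are hypothetical.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def size_latitude_stat(sizes, occurrences):
    """Correlation, over all (species, station) records, of species size with latitude."""
    lats = [lat for _, lat in occurrences]
    body = [sizes[sp] for sp, _ in occurrences]
    return corr(lats, body)


def randomization_test(sizes, occurrences, n_perm=999, seed=1):
    """Two-sided p-value from permuting species sizes among species."""
    rng = random.Random(seed)
    observed = size_latitude_stat(sizes, occurrences)
    species = list(sizes)
    values = list(sizes.values())
    extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = dict(zip(species, values))
        if abs(size_latitude_stat(permuted, occurrences)) >= abs(observed):
            extreme += 1
    return observed, extreme / (n_perm + 1)


sizes = {f"sp{i}": float(i) for i in range(1, 9)}
occurrences = [(f"sp{i}", lat) for i in range(1, 9) for lat in (10 * i, 10 * i + 5)]
print(randomization_test(sizes, occurrences))
```

Permuting whole species, rather than individual records, respects the fact that one size value is shared by all stations of a species.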

9.
Simple, defensible sample sizes based on cost efficiency
The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices. Costs, however, are often impossible for investigators and funders to ignore in actual practice. Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost. By showing that a study's projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size. The first is to choose the sample size that minimizes the average cost per subject. The second is to choose sample size to minimize total cost divided by the square root of sample size. This latter method is theoretically more justifiable for innovative studies, but also performs reasonably well and has some justification in other cases. For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does. These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.
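Both rules can be applied by a direct search once a cost model is chosen; the cost functions in the usage lines are hypothetical. For a linear cost F + vn, setting the derivative of (F + vn)/√n to zero gives n = F/v, while the cost per subject, F/n + v, keeps decreasing, so the first rule only yields an interior choice when per-subject costs eventually rise.

```python
def cost_efficient_sample_sizes(total_cost, n_max=10000):
    """Two cost-efficiency choices of n, searched over 1..n_max.

    total_cost: function n -> total study cost (an assumed cost model).
    Returns (n minimizing cost per subject, n minimizing cost / sqrt(n))."""
    ns = range(1, n_max + 1)
    n_per_subject = min(ns, key=lambda n: total_cost(n) / n)
    n_root = min(ns, key=lambda n: total_cost(n) / n ** 0.5)
    return n_per_subject, n_root


# Linear cost: the sqrt rule gives F / v = 100; cost per subject has no interior minimum.
print(cost_efficient_sample_sizes(lambda n: 10000 + 100 * n))
# Costs that rise faster than linearly: both rules give interior choices.
print(cost_efficient_sample_sizes(lambda n: 10000 + 100 * n + 0.5 * n * n))
```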

10.
A fundamental goal of ecology and evolution is to explain patterns of species distribution and abundance. However, how stable distribution ranges are shaped by natural selection is still poorly understood; in particular, whether patterns of resource allocation have contributed to range size and to the formation of range boundaries has received little attention. For annual herbs, the maximum reproductive allocation is predicted to be 50%, and because plants increase allocation to reproduction in stressful environments, we predicted that reproductive allocation might contribute to the formation of range boundaries. In this study, we present data on resource allocation in a population from a glacial refugium and in marginal populations of Gymnaconitum gymnandrum, an alpine biennial native to the Qinghai-Tibet Plateau, to assess the contribution of resource allocation to the formation of the range boundary. Resource allocation to vegetative organs, including roots, plant height, and stem and leaf biomass, was significantly higher in the refugium population than in the two marginal populations, and allocation to reproductive organs, including flower number and flower biomass, was significantly lower in one marginal population (Haibei) than in the other marginal population (Xinghai) and the refugium population (Tongren). However, reproductive allocation was significantly higher in the marginal populations than in the refugium population. In addition, in each of the three populations, we found a positive relationship between plant size and flower biomass but a negative relationship between plant size and reproductive allocation. Our results indicate size-dependent reproductive allocation in G. gymnandrum, but we did not find a size threshold for reproduction in any of the three populations, which might be attributable to the life history of this biennial herb.
We also suggest that reproductive allocation increased during range expansion and may have risen to the optimal reproductive allocation in the marginal populations, which points to the important role of sexual reproduction for plants in more stressful environments and in the formation of range boundaries. However, these conclusions need to be tested in other plant species.

11.
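The formation of species distribution ranges is one of the fundamental questions in evolutionary ecology, but whether plant resource allocation strategies are related to the formation of species range boundaries has not been studied. Gymnaconitum gymnandrum, a plant endemic to the Qinghai-Tibet Plateau, had four refugia during the Last Glacial Maximum, but after the glaciation only the population of one refugium expanded, eventually forming the present-day distribution pattern. Using the refugium population (Tongren) and two populations near the range margin after the expansion (Xinghai and Haibei), we compared resource allocation between the refugium and marginal populations to explore the relationship between resource allocation and the formation of this species' range and range boundary. The results showed that: 1) vegetative structures (roots, plant height, and stem and leaf biomass) were significantly smaller in the Xinghai and Haibei populations than in the Tongren population; reproductive structures (flower number and flower biomass) were significantly smaller in the Haibei population than in the Tongren and Xinghai populations, yet reproductive allocation in both Haibei and Xinghai was significantly higher than in Tongren; 2) in all three populations, reproductive investment was significantly positively correlated with individual size, whereas the proportion of resources invested in reproduction (reproductive allocation) was significantly negatively correlated with individual size. On the one hand, these results further support size-dependent reproductive allocation, but they do not support the conclusion that "a plant must reach a certain size (threshold) before it begins to reproduce", which may be related to the life-history characteristics of G. gymnandrum. On the other hand, G. gymnandrum gradually increased the proportion of resources invested in reproduction during its expansion, suggesting that sexual reproduction is more important for this plant in stressful habitats. G. gymnandrum may have gradually maximized its reproductive output during expansion and may have reached optimal reproductive allocation in the marginal populations, ultimately forming the boundary of its range; however, this conclusion still needs to be verified in more plant groups.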

12.
The child-rearing climate of Mandrillus sphinx and Theropithecus gelada was studied for two weeks in a zoo setting. The social interactions between the young male of each group and the other group members are described and examined in detail, noting similarities and differences. Possible factors affecting the findings are suggested, notably the small sample size, age differences, and differences in group composition.

14.
Marginal technologies are defined as the technologies actually affected by the small changes in demand typically studied in prospective, comparative life cycle assessments. Using data on marginal technologies thus gives the best reflection of the actual consequences of a decision. Furthermore, data on marginal technologies are easier to collect, more precise, and more stable over time than data on average technologies. A five-step procedure is suggested for identifying the marginal technologies. The procedure first clarifies the situation in which the marginal technology should apply and then identifies which specific technology is marginal in that situation. The procedure is illustrated with two examples: European electricity production and pulp and paper production.

15.
Maximum likelihood estimation of the model parameters for a spatial population based on data collected from a survey sample is usually straightforward when sampling and non-response are both non-informative, since the model can then usually be fitted using the available sample data, and no allowance is necessary for the fact that only a part of the population has been observed. Although for many regression models this naive strategy yields consistent estimates, this is not the case for some models, such as spatial auto-regressive models. In this paper, we show that for a broad class of such models, a maximum marginal likelihood approach that uses both sample and population data leads to more efficient estimates since it uses spatial information from sampled as well as non-sampled units. Extensive simulation experiments based on two well-known data sets are used to assess the impact of the spatial sampling design, the auto-correlation parameter and the sample size on the performance of this approach. When compared to some widely used methods that use only sample data, the results from these experiments show that the maximum marginal likelihood approach is much more precise.

16.
An outcome-adaptive Bayesian design is proposed for choosing the optimal dose pair of a chemotherapeutic agent and a biological agent used in combination in a phase I/II clinical trial. Patient outcome is characterized as a vector of two ordinal variables accounting for toxicity and treatment efficacy. A generalization of the Aranda-Ordaz model (1981, Biometrika 68, 357-363) is used for the marginal outcome probabilities as functions of a dose pair, and a Gaussian copula is assumed to obtain joint distributions. Numerical utilities of all elementary patient outcomes, allowing the possibility that efficacy is inevaluable due to severe toxicity, are obtained using an elicitation method aimed at establishing consensus among the physicians planning the trial. For each successive patient cohort, a dose pair is chosen to maximize the posterior mean utility. The method is illustrated by a trial in bladder cancer, including simulation studies of the method's sensitivity to prior parameters, the numerical utilities, correlation between the outcomes, sample size, cohort size, and starting dose pair.
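Two pieces of this design can be sketched in isolation: the asymmetric Aranda-Ordaz inverse link in its basic one-parameter form (the paper uses a generalization), and the step that picks the dose pair with the highest mean utility. The binary outcomes, the independence assumption used in place of the Gaussian copula, the utilities, and the dose probabilities are all hypothetical simplifications.

```python
import math


def aranda_ordaz(eta, lam):
    """Asymmetric Aranda-Ordaz inverse link: 1 - (1 + lam * e^eta)^(-1/lam).

    lam = 1 gives the logistic curve; lam -> 0 gives the complementary log-log."""
    if lam == 0:
        return 1.0 - math.exp(-math.exp(eta))
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def independent_joint(p_tox, p_eff):
    """Joint probabilities of binary (toxicity, efficacy) outcomes assuming
    independence, a simplification of the Gaussian copula."""
    return {(t, e): (p_tox if t else 1 - p_tox) * (p_eff if e else 1 - p_eff)
            for t in (0, 1) for e in (0, 1)}


def best_dose_pair(outcome_probs, utility):
    """Dose pair with the highest mean utility.

    outcome_probs maps each dose pair to posterior-mean outcome probabilities.
    Utility is linear in those probabilities, so this equals the posterior
    mean utility."""
    def mean_utility(dose):
        return sum(p * utility[o] for o, p in outcome_probs[dose].items())
    return max(outcome_probs, key=mean_utility)


utility = {(0, 1): 100.0, (0, 0): 40.0, (1, 1): 30.0, (1, 0): 0.0}  # hypothetical
doses = {(1, 1): independent_joint(0.1, 0.3), (2, 2): independent_joint(0.4, 0.7)}
print(best_dose_pair(doses, utility))
```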

17.
Some drawbacks of the classical Mather's linkage test χ²_L are considered, and a simple contingency analysis is suggested as an alternative. The former test is conditional on Mendelian segregation at both loci, whereas the simple contingency test is not. Furthermore, the contingency test and the tests for Mendelian segregation at each locus are orthogonal when performed using the G statistic. Simulation results show that, when χ²_L is used, the actual type I error probability (α_a) can be dramatically perturbed. As expected, no perturbation of α_a is observed when the G contingency test is used. On the other hand, when segregation is Mendelian at both loci, χ²_L is more powerful than the contingency G test when the sample size is small and strong marginal distortion is observed. Because strong marginal distortion may itself suggest non-Mendelian segregation, χ²_L is in general discouraged in favor of the simple contingency analysis.
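A minimal G test of independence for a contingency table of genotype classes at two loci. Because the expected counts come only from the observed margins, distorted segregation at either locus does not by itself produce a large G; the 90:10 by 90:10 table below shows strong marginal distortion yet gives G = 0. The one-degree-of-freedom tail uses P(χ²₁ > x) = erfc(√(x/2)); the example tables are hypothetical.

```python
import math


def g_test_independence(table):
    """G statistic and degrees of freedom for independence in an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    g = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            if obs > 0:  # empty cells contribute nothing to G
                expected = rows[i] * cols[j] / total
                g += 2.0 * obs * math.log(obs / expected)
    df = (len(rows) - 1) * (len(cols) - 1)
    return g, df


def chi2_sf_df1(x):
    """Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


print(g_test_independence([[81, 9], [9, 1]]))  # distorted margins, no linkage
g, df = g_test_independence([[40, 10], [10, 40]])  # Mendelian margins, linkage
print(g, df, chi2_sf_df1(g))
```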

18.
19.
The present study explored the statistical power of ergonomic intervention studies using electromyography (EMG) of the upper trapezius muscle. Data from a previous study of cyclic assembly work were reanalyzed with respect to exposure variability between subjects, between days, and within days. On the basis of this information, the precision and power of different data collection strategies were explored. A sampling strategy comprising four recordings of about 2 min each (i.e., two work cycles) on one day per subject resulted in between-subject coefficients of variation of 0.44, 0.31, and 0.29 for the 10th, 50th, and 90th APDF (amplitude probability distribution function) percentiles, respectively. The corresponding numbers of subjects needed in a study aiming to detect a 20% exposure difference between two independent groups of equal size were 154, 78, and 68, respectively (p ≤ 0.05, power 0.80). Multiple measurement days per subject would improve power, but only marginally beyond 4 days of recording. Increasing the number of recordings per day would have minor effects. Bootstrap resampling of the data set revealed that the estimates of variability and power were associated with considerable uncertainty. The present results, combined with an overview of other occupational studies, showed that investigations of common size using trapezius EMG percentiles are at great risk of insufficient statistical power, even if the expected intervention effect is substantial. The paper suggests a procedure for retrieving and using exposure variability information as an aid when planning studies, and for allocating measurements efficiently.
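The required numbers of subjects can be approximated from the between-subject coefficient of variation with the standard normal-approximation formula for two independent groups of equal size, assuming σ = CV × mean in both groups so that a 20% difference is 0.2 × mean. This sketch gives 152, 76 and 68 subjects for CVs of 0.44, 0.31 and 0.29: the same as the reported 68 for the last, and slightly below the reported 154 and 78 for the first two. The abstract does not state the exact method used.

```python
import math
from statistics import NormalDist


def total_subjects(cv, rel_diff=0.20, alpha=0.05, power=0.80):
    """Normal-approximation total N for two equal, independent groups.

    Assumes both groups share the between-subject coefficient of variation cv,
    so sigma = cv * mean and the detectable difference is rel_diff * mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    per_group = math.ceil(2 * (z * cv / rel_diff) ** 2)
    return 2 * per_group


for cv in (0.44, 0.31, 0.29):  # 10th, 50th, 90th APDF percentiles
    print(cv, total_subjects(cv))
```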

20.
The Exact Test for Cytonuclear Disequilibria
Basten CJ, Asmussen MA. Genetics, 1997, 146(3): 1165-1171
We extend the analysis of the statistical properties of cytonuclear disequilibria in two major ways. First, we develop the asymptotic sampling theory for the nonrandom associations between the alleles at a haploid cytoplasmic locus and the alleles and genotypes at a diploid nuclear locus, when there is an arbitrary number of alleles at each marker. This includes the derivation of the maximum likelihood estimators and their sampling variances for each disequilibrium measure, together with simple tests of the null hypothesis of no disequilibrium. In addition to these new asymptotic tests, we provide the first implementation of Fisher's exact test for the genotypic cytonuclear disequilibria and some approximations of the exact test. We also outline an exact test for allelic cytonuclear disequilibria in multiallelic systems. An exact test should be used for data sets when either the marginal frequencies are extreme or the sample size is small. The utility of this new sampling theory is illustrated through applications to recent nuclear-mtDNA and nuclear-cpDNA data sets. The results also apply to population surveys of nuclear loci in conjunction with markers in cytoplasmically inherited microorganisms.
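For a cytoplasmic locus with two alleles (rows) and a nuclear locus summarized as k allele or genotype classes (columns), Fisher's exact test can be computed by enumerating every table with the observed margins, as sketched below. The 2 × k restriction and the function names are assumptions of this sketch; the multiallelic cytoplasmic case and the approximations to the exact test mentioned above are not shown, and the usage counts are hypothetical.

```python
from math import comb


def _first_rows(col_totals, row1_total):
    """Yield every first row (a_1, ..., a_k) with 0 <= a_j <= c_j summing to row1_total."""
    if not col_totals:
        if row1_total == 0:
            yield ()
        return
    head, rest = col_totals[0], col_totals[1:]
    for a in range(min(head, row1_total) + 1):
        for tail in _first_rows(rest, row1_total - a):
            yield (a,) + tail


def fisher_exact_2xk(table):
    """Two-sided Fisher exact test for a 2 x k table with fixed margins:
    sum the probabilities of all tables no more likely than the observed one."""
    row1 = list(table[0])
    cols = [a + b for a, b in zip(table[0], table[1])]
    n, r1 = sum(cols), sum(row1)

    def weight(first_row):
        # Proportional to the multivariate hypergeometric probability of the table.
        w = 1
        for c, a in zip(cols, first_row):
            w *= comb(c, a)
        return w

    w_obs = weight(row1)
    # Integer weights avoid floating-point ties; divide once at the end.
    tail = sum(w for w in map(weight, _first_rows(cols, r1)) if w <= w_obs)
    return tail / comb(n, r1)


# Rows: cytotypes; columns: nuclear genotypes AA, Aa, aa.
print(fisher_exact_2xk([[12, 5, 1], [3, 6, 9]]))
```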


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417