Similar literature
 20 similar articles found (search time: 15 ms)
1.
2.
In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, such as stratified sampling. Design-based statistical analysis tools are appropriate for seamless integration of sample design into the statistical analysis. However, it is also common and necessary, after a sampling design has been implemented, to use datasets to address questions that, in many cases, were not considered during the sampling design phase. Questions may arise requiring the use of model-based statistical tools such as multiple regression, quantile regression, or regression tree analysis. However, such model-based tools may require data from simple random samples to ensure unbiased estimation, which can be problematic when analyzing data from unequal probability designs. Despite numerous method-specific tools available to properly account for sampling design, too often in the analysis of ecological data, sample design is ignored and consequences are not properly considered. We demonstrate here that violation of this assumption can lead to biased parameter estimates in ecological research. In addition to the set of tools available for researchers to properly account for sampling design in model-based analysis, we introduce inverse probability bootstrapping (IPB). Inverse probability bootstrapping is an easily implemented method for obtaining equal probability re-samples from a probability sample, from which unbiased model-based estimates can be made. We demonstrate the potential for bias in model-based analyses that ignore sample inclusion probabilities, and the effectiveness of IPB sampling in eliminating this bias, using both simulated and actual ecological data. For illustration, we considered three model-based analysis tools: linear regression, quantile regression, and boosted regression tree analysis. In all models, using both simulated and actual ecological data, we found inferences to be biased, sometimes severely, when sample inclusion probabilities were ignored, while IPB sampling effectively produced unbiased parameter estimates.
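The resampling idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so the code assumes one plausible reading: each sampled unit is redrawn, with replacement, with probability proportional to the inverse of its inclusion probability, and the model is then fitted to each resample. The function names `ipb_resample` and `ipb_estimate` and the toy stratified data are hypothetical.

```python
import random
from statistics import mean


def ipb_resample(sample, inclusion_probs, size=None, rng=None):
    """Draw one resample with selection weights proportional to 1 / inclusion probability.

    Assumption: resampling is done with replacement (not confirmed by the abstract).
    """
    if len(sample) != len(inclusion_probs):
        raise ValueError("sample and inclusion_probs must have the same length")
    rng = rng or random.Random()
    weights = [1.0 / p for p in inclusion_probs]
    k = len(sample) if size is None else size
    return rng.choices(sample, weights=weights, k=k)


def ipb_estimate(sample, inclusion_probs, estimator, n_boot=200, seed=0):
    """Average an estimator over n_boot inverse-probability resamples."""
    rng = random.Random(seed)
    return mean(
        estimator(ipb_resample(sample, inclusion_probs, rng=rng))
        for _ in range(n_boot)
    )


# Hypothetical stratified sample: stratum A has 900 units (y = 0), stratum B has
# 100 units (y = 10), and 50 units were sampled from each stratum.
# The population mean is therefore 1.0.
sample = [0.0] * 50 + [10.0] * 50
inclusion_probs = [50 / 900] * 50 + [50 / 100] * 50

naive = mean(sample)  # 5.0: ignores the design and over-represents stratum B
corrected = ipb_estimate(sample, inclusion_probs, mean)  # close to 1.0 in expectation
```

Any estimator that takes a list of observations (a regression fit, a quantile) could be passed in place of `mean`; the mean is used here only because its true value is easy to check by hand.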

3.
In biomedical or public health research, it is common for both survival time and longitudinal categorical outcomes to be collected for a subject, along with the subject’s characteristics or risk factors. Investigators are often interested in finding important variables for predicting both survival time and longitudinal outcomes which could be correlated within the same subject. Existing approaches for such joint analyses deal with continuous longitudinal outcomes. New statistical methods need to be developed for categorical longitudinal outcomes. We propose to simultaneously model the survival time with a stratified Cox proportional hazards model and the longitudinal categorical outcomes with a generalized linear mixed model. Random effects are introduced to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. The Expectation–Maximization (EM) algorithm is used to derive the point estimates for the model parameters, and the observed information matrix is adopted to estimate their asymptotic variances. Asymptotic properties for our proposed maximum likelihood estimators are established using the theory of empirical processes. The method is demonstrated to perform well in finite samples via simulation studies. We illustrate our approach with data from the Carolina Head and Neck Cancer Study (CHANCE) and compare the results based on our simultaneous analysis and the separately conducted analyses using the generalized linear mixed model and the Cox proportional hazards model. Our proposed method identifies more predictors than the separate analyses do.

4.
Analysis of categorical outcomes in a longitudinal study has been an important statistical issue. A continuous outcome in a similar study design is commonly handled by the mixed effects model. The longitudinal binary or Poisson-like outcome analysis is often handled by the generalized estimating equation (GEE) method. Neither method is appropriate for analyzing a multinomial outcome in a longitudinal study, although the cross-sectional multinomial outcome is often analyzed by generalized linear models. One reason that these methods are not used is that the correlation structure of two multinomial variables cannot be easily specified. In addition, methods that rely upon GEE or mixed effects models are unsuitable in instances when the focus of a longitudinal study is on the rate of moving from one category to another. In this research, a longitudinal model that has three categories in the outcome variable will be examined. A continuous-time Markov chain model will be used to examine the transition from one category to another. This model permits an unbalanced number of measurements collected on individuals and an uneven duration between pairs of consecutive measurements. In this study, the explicit expression for the transition probability is derived that provides an algebraic form of the likelihood function and hence allows the implementation of the maximum likelihood method. Using this approach, the instantaneous transition rate that is assumed to be a function of the linear combination of independent variables can be estimated. For a comparison between two groups, the odds ratios of occurrence at a particular category and their confidence intervals can be calculated. Empirical studies will be performed to compare the goodness of fit of the proposed method with other available methods. An example will also be used to demonstrate the application of this method.
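To make the role of the transition probability concrete, here is a rough sketch of how a three-state continuous-time Markov chain turns a rate matrix Q into transition probabilities P(t) = exp(Qt), and how unevenly spaced observations enter the likelihood. The abstract derives an explicit algebraic expression; this sketch substitutes a numerical matrix exponential (scaling and squaring with a truncated Taylor series) instead. The rate matrix, the observations, and all function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=20, squarings=6):
    """Approximate P(t) = exp(Q t) by a truncated Taylor series plus repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def log_likelihood(q, observations):
    """Sum log P(dt)[i][j] over observed (from_state, to_state, elapsed_time) triples."""
    return sum(math.log(transition_matrix(q, dt)[i][j]) for i, j, dt in observations)


# Hypothetical instantaneous rate matrix for three categories (each row sums to zero).
Q = [[-0.30, 0.20, 0.10],
     [0.10, -0.40, 0.30],
     [0.05, 0.15, -0.20]]

# Hypothetical observations with uneven gaps between consecutive measurements.
observations = [(0, 1, 0.5), (1, 1, 2.0), (1, 2, 1.25)]
ll = log_likelihood(Q, observations)
```

In a fitted model the entries of Q would depend on covariates and be chosen to maximize this log-likelihood; the sketch only evaluates it for fixed rates.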

5.
Lou XY, Yang MC. Genetica 2006, 128(1-3): 471-484
A genetic model is developed with additive and dominance effects of a single gene and polygenes as well as general and specific reciprocal effects for the progeny from a diallel mating design. The methods of ANOVA, minimum norm quadratic unbiased estimation (MINQUE), restricted maximum likelihood estimation (REML), and maximum likelihood estimation (ML) are suggested for estimating variance components, and the methods of generalized least squares (GLS) and ordinary least squares (OLS) for fixed effects, while best linear unbiased prediction, linear unbiased prediction (LUP), and adjusted unbiased prediction are suggested for analyzing random effects. Monte Carlo simulations were conducted to evaluate the unbiasedness and efficiency of statistical methods involving two diallel designs with commonly used sample sizes, 6 and 8 parents, without and with missing crosses, respectively. Simulation results show that GLS and OLS are almost equally efficient for estimation of fixed effects, while MINQUE(1) and REML are better estimators of the variance components and LUP is the most practical method for prediction of random effects. Data from a Drosophila melanogaster experiment (Gilbert 1985a, Theor Appl Genet 69:625–629) were used as a working example to demonstrate the statistical analysis. The new methodology is also applicable to screening candidate gene(s) and to other mating designs with multiple parents, such as nested (NC Design I) and factorial (NC Design II) designs. Moreover, this methodology can serve as a guide to develop new methods for detecting indiscernible major genes and mapping quantitative trait loci based on mixture distribution theory. The computer program for the methods suggested in this article is freely available from the authors.

6.
7.
Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and an algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
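As a small illustration of the pairwise similarity measure mentioned in this abstract, the sketch below computes a cosine coefficient between two texts using only the terms that occur in them, so no dense document-by-term matrix is built. The whitespace tokenization and the function name `cosine_coefficient` are assumptions for illustration; the abstract does not describe the paper's actual text representation or the SSAP algorithm itself.

```python
import math
from collections import Counter


def cosine_coefficient(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Two of the three terms are shared, so the coefficient is 2 / 3.
similarity = cosine_coefficient("gene expression microarray", "microarray gene design")
```

A clustering method such as affinity propagation would take a matrix of these pairwise similarities as input.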

8.
In order to maximize control of heterogeneity within complete blocks, an experimenter could use incomplete blocks of size k = 2 or 3. In certain situations, incomplete blocks of this nature would eliminate the need for such spatial types of analyses as nearest neighbor. The intrablock efficiency factors for such designs are relatively low. However, with recovery of interblock information, Federer and Speed (1987) have presented measures of design efficiency factors which demonstrate that efficiency factors approach unity for certain ratios of the intrablock and interblock variance components. Hence with recovery of interblock information, even incomplete block designs with k = 2 or 3 have relatively high efficiency factors. The reduction in the intrablock error variance over the complete block error variance in many situations will provide designs with high efficiency. A simple procedure for constructing incomplete blocks of sizes 2 and 3 is presented. It is shown how to obtain additional zero-one association confounding arrangements when v = 4t, t an integer, and for v = pk, k ≤ p. It is indicated how to do the statistical analysis for these designs.

9.
In recent years, biostatistical research has begun to apply linear models and design theory to develop efficient experimental designs and analysis tools for gene expression microarray data. With two-colour microarrays, direct comparisons of RNA-targets are possible and lead to incomplete block designs. In this setting, efficient designs for simple and factorial microarray experiments have mainly been proposed for technical replicates. But for biological replicates, which are crucial to obtain inference that can be generalised to a biological population, this question has only been discussed recently and is not fully solved yet. In this paper, we propose efficient designs for independent two-sample experiments using two-colour microarrays enabling biologists to measure their biological random samples in an efficient manner to draw generalisable conclusions. We give advice for experimental situations with differing group sizes and show the impact of different designs on the variance and degrees of freedom of the test statistics. The designs proposed in this paper can be evaluated using SAS PROC MIXED or S+/R lme.

10.
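Experimental design and statistical analysis in biomedical animal experiments   (Total citations: 1; self-citations: 0; citations by others: 1)
Experimental design and statistical analysis play a key role in the initiation, implementation, and evaluation of results of animal experiment research. We review the factors, principles, and types of experimental design, explain the importance of statistical analysis at every stage of a study, and point out statistical analysis issues that are easily overlooked in biomedical animal experiments.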

11.
12.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression methods can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes do not differentially express. Methods based on spatial location and/or intensity also require that the nondifferentially expressing genes are distributed at random with respect to location and intensity. Spotting designs should be carefully done so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered. DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.

13.
Analyzing incomplete longitudinal clinical trial data   (Total citations: 1; self-citations: 0; citations by others: 1)
Using standard missing data taxonomy, due to Rubin and co-workers, and simple algebraic derivations, it is argued that some simple but commonly used methods to handle incomplete longitudinal clinical trial data, such as complete case analyses and methods based on last observation carried forward, require restrictive assumptions and stand on a weaker theoretical foundation than likelihood-based methods developed under the missing at random (MAR) framework. Given the availability of flexible software for analyzing longitudinal sequences of unequal length, implementation of likelihood-based MAR analyses is not limited by computational considerations. While such analyses are valid under the comparatively weak assumption of MAR, the possibility of data missing not at random (MNAR) is difficult to rule out. It is argued, however, that MNAR analyses are, themselves, surrounded with problems and therefore, rather than ignoring MNAR analyses altogether or blindly shifting to them, their optimal place is within sensitivity analysis. The concepts developed here are illustrated using data from three clinical trials, where it is shown that the analysis method may have an impact on the conclusions of the study.

14.
Statistical methods suitable for the analysis of plant tissue culture data   (Total citations: 1; self-citations: 0; citations by others: 1)
Statistical analyses are an essential part of biological research. Statistical methods are available to biological researchers that range from very simple to extremely complex. Therefore, caution should be used when selecting a statistical method. When possible it is best to avoid complicated statistical procedures that are difficult to interpret and may hinder the researcher's ability to make treatment comparisons. Instead a method should be chosen that complements a logical and practical treatment design. Statistics should be used as a tool to compare treatments of interest and should not dictate the treatments. Experimental designs should take into account the eventual analysis, otherwise one could conceive of a design that could not be analyzed or, when analyzed, would not answer the desired questions. Therefore, time should be spent before conducting an experiment to plan an experimental design and analysis that best complements the treatment scheme and questions to be answered. The purpose of this paper is to present examples of experimental designs, means separation procedures, data transformations and presentation methods suitable for plant cell and tissue culture data.
Abbreviations: ANOVA, analysis of variance; BA, benzyladenine; CV, coefficient of variation; DF, degrees of freedom; IAA, indole-3-acetic acid; IBA, indole-3-butyric acid; LOF, lack-of-fit; MSE, mean square error; P-ITB, phenyl indole-3-thiolobutyrate; S, standard deviation; SE, standard error of the mean; TDZ, thidiazuron

15.
The physical input‐output table (PIOT) is a useful tool for analyzing the environmental sustainability of cities. Taking Chinese statistical sources as an example in this study, we discuss data acquisition methods for applying the PIOT to cities. We propose several methods and present a case study of Suzhou City to illustrate the proposed methods. These methods can provide foundations for constructing the PIOT of cities in other countries.

16.
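Raman spectroscopy has been widely studied and applied in the life sciences. How to extract and make full use of the information contained in low signal-to-noise, poor-quality Raman spectra is crucial for subsequent spectral analysis, sample classification, and related tasks. This paper first clarifies several important concepts in the statistical analysis of Raman spectra of biological tissue, and then summarizes, compares, and analyzes the preprocessing of Raman spectral data and several multivariate statistical methods commonly used in spectral analysis research.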

17.
18.
The determination of a list of differentially expressed genes is a basic objective in many cDNA microarray experiments. We present a statistical approach that allows direct control over the percentage of false positives in such a list and, under certain reasonable assumptions, improves on existing methods with respect to the percentage of false negatives. The method accommodates a wide variety of experimental designs and can simultaneously assess significant differences between multiple types of biological samples. Two interconnected mixed linear models are central to the method and provide a flexible means to properly account for variability both across and within genes. The mixed model also provides a convenient framework for evaluating the statistical power of any particular experimental design and thus enables a researcher to select an appropriate number of replicates a priori. We also suggest some basic graphics for visualizing lists of significant genes. Analyses of published experiments studying human cancer and yeast cells illustrate the results.

19.
20.