Similar articles
20 similar articles found (search time: 31 ms)
1.
In randomized trials, an analysis of covariance (ANCOVA) is often used to analyze post-treatment measurements with pre-treatment measurements as a covariate to compare two treatment groups. Random allocation guarantees only equal variances of pre-treatment measurements. We hence consider data with unequal covariances and variances of post-treatment measurements without assuming normality. Recently, we showed that the actual type I error rate of the usual ANCOVA assuming equal slopes and equal residual variances is asymptotically at a nominal level under equal sample sizes, and that of the ANCOVA with unequal variances is asymptotically at a nominal level, even under unequal sample sizes. In this paper, we investigated the asymptotic properties of the ANCOVA with unequal slopes for such data. The estimators of the treatment effect at the observed mean are identical between equal and unequal variance assumptions, and these are asymptotically normal estimators for the treatment effect at the true mean. However, the variances of these estimators based on standard formulas are biased, and the actual type I error rates are not at a nominal level, irrespective of variance assumptions. With equal sample sizes, the efficiency of the usual ANCOVA assuming equal slopes and equal variances is asymptotically the same as that of the ANCOVA with unequal slopes and higher than that of the ANCOVA with equal slopes and unequal variances. Therefore, the use of the usual ANCOVA is appropriate when sample sizes are equal.

2.
Testing for unequal variances is usually performed in order to check the validity of the assumptions that underlie standard tests for differences between means (the t-test and ANOVA). However, existing methods for testing for unequal variances (Levene's test and Bartlett's test) are notoriously non-robust to normality assumptions, especially for small sample sizes. Moreover, although these methods were designed to deal with one hypothesis at a time, modern applications (such as to microarrays and fMRI experiments) often involve parallel testing over a large number of levels (genes or voxels). In addition, in these settings a shift in variance may be biologically relevant, perhaps even more so than a change in the mean. This paper proposes a parsimonious model for parallel testing of the equal variance hypothesis. It is designed to work well when the number of tests is large, typically much larger than the sample sizes. The tests are implemented using an empirical Bayes estimation procedure which 'borrows information' across levels. The method is shown to be quite robust to deviations from normality, and to substantially increase the power to detect differences in variance over the more traditional approaches even when the normality assumption is valid.

3.
Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.

4.
Microarray experiments are being increasingly used in molecular biology. A common task is to detect genes with differential expression across two experimental conditions, such as two different tissues or the same tissue at two time points of biological development. To take proper account of statistical variability, some statistical approaches based on the t-statistic have been proposed. In constructing the t-statistic, one needs to estimate the variance of gene expression levels. With a small number of replicated array experiments, the variance estimation can be challenging. For instance, although the sample variance is unbiased, it may have large variability, leading to a large mean squared error. For duplicated array experiments, a new approach based on simple averaging has recently been proposed in the literature. Here we consider two more general approaches based on nonparametric smoothing. Our goal is to assess the performance of each method empirically. The three methods are applied to a colon cancer data set containing 2,000 genes. Using two arrays, we compare the variance estimates obtained from the three methods. We also consider their impact on the t-statistics. Our results indicate that the three methods give variance estimates close to each other. Due to its simplicity and generality, we recommend the use of the smoothed sample variance for data with a small number of replicates.

5.
Accurately identifying differentially expressed genes from microarray data is not a trivial task, partly because of poor variance estimates of gene expression signals. Here, after analyzing 380 replicated microarray experiments, we found that probesets have typical, distinct variances that can be estimated based on a large number of microarray experiments. These probeset-specific variances depend at least in part on the function of the probed gene: genes for ribosomal or structural proteins often have a small variance, while genes implicated in stress responses often have large variances. We used these variance estimates to develop a statistical test for differentially expressed genes called EVE (external variance estimation). The EVE algorithm performs better than the t-test and LIMMA on some real-world data, where external information from appropriate databases is available. Thus, EVE helps to maximize the information gained from a typical microarray experiment. Nonetheless, only a large number of replicates will guarantee the identification of nearly all truly differentially expressed genes. However, our simulation studies suggest that even limited numbers of replicates will usually result in good coverage of strongly differentially expressed genes.

6.
Data from an experimental mouse population selected from 18 generations to increase weight gain were used to estimate the genetic parameters associated with environmental variability. The analysis involved three traits: weight at 21 days, weight at 42 days and weight gain between 21 and 42 days. A dataset of 5273 records for males was studied. Data were analysed using Bayesian procedures by comparing the Deviance Information Criterion (DIC) value of two different models: one assuming homogeneous environmental variances and another assuming them as heterogeneous. The model assuming heterogeneity was better in all cases and also showed higher additive genetic variances and lower common environmental variances. The heterogeneity of residual variance was associated with systematic and additive genetic effects, thus making reduction by selection possible. Genetic correlations between the additive genetic effects on mean and environmental variance of the traits analysed were always negative, ranging from -0.19 to -0.38. An increase in the heritability of the traits was found when considering the genetic determination of the environmental variability. A suggested correlated canalised response was found in terms of coefficient of variation but it could be insufficient to compensate for the scale effect associated with an increase of the mean.

7.
The traditional quantitative genetics model was used as the unifying approach to derive six existing and new definitions of genomic additive and dominance relationships. The theoretical differences of these definitions were in the assumptions of equal SNP effects (equivalent to across-SNP standardization), equal SNP variances (equivalent to within-SNP standardization), and expected or sample SNP additive and dominance variances. The six definitions of genomic additive and dominance relationships on average were consistent with the pedigree relationships, but had individual genomic specificity and large variations not observed from pedigree relationships. These large variations may allow finding least related genomes even within the same family for minimizing genomic relatedness among breeding individuals. The six definitions of genomic relationships generally had similar numerical results in genomic best linear unbiased predictions of additive effects (GBLUP) and similar genomic REML (GREML) estimates of additive heritability. Predicted SNP dominance effects and GREML estimates of dominance heritability were similar within definitions assuming equal SNP effects or within definitions assuming equal SNP variance, but had differences between these two groups of definitions. We proposed a new measure of genomic inbreeding coefficient based on parental genomic co-ancestry coefficient and genomic additive correlation as a genomic approach for predicting offspring inbreeding level. This genomic inbreeding coefficient had the highest correlation with pedigree inbreeding coefficient among the four methods evaluated for calculating genomic inbreeding coefficient in a Holstein sample and a swine sample.

8.
MOTIVATION: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances. Often a complex relationship between mean expression value and variance is seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve overall accuracy of test statistics. RESULTS: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which improves test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data. AVAILABILITY: Sufficient details for our CART procedure are given so that the interested reader can program the method for themselves. The algorithm is also accessible within the Java software package BAMarray(TM), which is freely available to non-commercial users at www.bamarray.com. CONTACT: hemant.ishwaran@gmail.com.

9.
In studies designed to compare different methods of measurement where more than two methods are compared or replicate measurements by each method are available, standard statistical approaches such as computation of limits of agreement are not directly applicable. A model is presented for comparing several methods of measurement in the situation where replicate measurements by each method are available. Measurements are viewed as classified by method, subject and replicate. Models assuming exchangeable as well as non-exchangeable replicates are considered. A fitting algorithm is presented that allows the estimation of linear relationships between methods as well as relevant variance components. The algorithm only uses methods already implemented in most statistical software.

10.
Wang J, Jia M, Zhu L, Yuan Z, Li P, Chang C, Luo J, Liu M, Shi T. PLoS ONE 2010, 5(10): e13721
Many methods, including parametric, nonparametric, and Bayesian methods, have been used for detecting differentially expressed genes based on the assumption that biological systems are linear, which ignores the nonlinear characteristics of most biological systems. More importantly, those methods do not simultaneously consider means, variances, and high moments, resulting in a relatively high false positive rate. To overcome these limitations, the SWang test is proposed to determine differentially expressed genes according to the equality of distributions between case and control. Our method not only latently incorporates functional relationships among genes to account for nonlinear biological systems but also considers the mean, variance, skewness, and kurtosis of expression profiles simultaneously. To illustrate the biological significance of high moments, we construct a nonlinear gene interaction model, demonstrating that skewness and kurtosis could contain useful information about functional association among genes in microarrays. Simulations and real microarray results show that the false positive rate of SWang is lower than that of currently popular methods (T-test, F-test, SAM, and Fold-change), with much higher statistical power. Additionally, SWang can uniquely detect significant genes in real microarray data with imperceptible differential expression but higher variety in kurtosis and skewness. Those identified genes were confirmed with previously published literature or RT-PCR experiments performed in our lab.

11.
Summary: The effect of gene association (or dispersion) and linkage on the estimation of genetic variances in a diallel experiment involving doubled haploid lines is evaluated. It is shown that the estimates of the additive and the additive × additive genetic variances, as obtained by Choo et al. (1979), are biased if genes are linked or are not independently distributed in the parents. However, this bias only occurs in the presence of interaction between homozygous loci. Gene association (or dispersion) and linkage, if present, can be detected by comparing the parental vs the crosses mean, the parental vs the doubled haploid lines variance, and the among vs the within crosses variance.

12.
Three two-trait selection methods were analyzed for their effects on genetic variance and correlation by multivariate methods, two-locus methods and computer simulation. The two-trait selection methods studied were independent culling levels (ICL), index (IND) and extreme (EXT) selection. The effects of the selection methods on genetic variance and correlation were partitioned into permanent effects due to changes in gene frequencies and temporary effects due to nonrandom association of alleles at different loci. Multivariate methods were used to predict temporary effects from a single generation of selection by each method and from several generations of index selection. Two-locus theory was used to determine the stability and rank of temporary effects on genetic correlation for all three methods. Predictions were compared to computer simulation results. When selection increased the means of both traits, EXT had the lowest (closest to -1.0) genetic correlation and highest variances, while ICL tended to have the highest (closest to 1.0) genetic correlation. When selection increased the mean of one trait and decreased the mean of the other, EXT had the highest genetic variances and correlation, while ICL had the lowest genetic variances and correlation.

13.
For continuous variables in randomized controlled trials, longitudinal analysis of pre- and posttreatment measurements as bivariate responses has recently become one of the analytical methods used to compare two treatment groups. Under random allocation, means and variances of pretreatment measurements are expected to be equal between groups, but covariances and posttreatment variances are not. Under random allocation with unequal covariances and posttreatment variances, we compared asymptotic variances of the treatment effect estimators in three longitudinal models. The data-generating model has equal baseline means and variances, and unequal covariances and posttreatment variances. The model with equal baseline means and unequal variance–covariance matrices has a redundant parameter. In large sample sizes, these two models keep a nominal type I error rate and have high efficiency. The model with equal baseline means and equal variance–covariance matrices wrongly assumes equal covariances and posttreatment variances. This model keeps a nominal type I error rate only under equal sample sizes. It has the same high efficiency as the data-generating model under equal sample sizes. In conclusion, longitudinal analysis with equal baseline means performed well in large sample sizes. We also compared asymptotic properties of longitudinal models with those of the analysis of covariance (ANCOVA) and t-test.

14.
ABSTRACT: A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded the opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.

15.
M. I. Chiu, T. L. Mason, G. R. Fink. Genetics 1992, 132(4): 987-1001
Wright's method of estimating the number of genes contributing to the difference in a quantitative character between two populations involves observing the means and variances of the two parental populations and their hybrid populations. Although simple, Wright's method provides seriously biased estimates, largely due to linkage and unequal effects of alleles. A method is suggested to evaluate the bias of Wright's estimate, which relies on estimation of the mean recombination frequency between a pair of loci and a composite parameter of variability of allelic effects and frequencies among loci. Assuming that the loci are uniformly distributed in the genome, the mean recombination frequency can be calculated for some organisms. Theoretical analysis and an analysis of the Drosophila data on distributions of effects of P element inserts on bristle numbers indicate that the value of the composite parameter is likely to be about three or larger for many quantitative characters. There are, however, some serious problems with the current method, such as the irregular behavior of the statistic and large sampling variances of estimates. Because of that, the method is generally not recommended for use unless several favorable conditions are met. These conditions are: the two parental populations are many phenotypic standard deviations apart, linkage is not tight, and the sample size is very large. An example is given on the fruit weight of tomato from a cross with parental populations differing in means by more than 14 phenotypic standard deviations. It is estimated that the number of loci which account for 95% of the genic variance in the F2 population is 16, with a 95% confidence interval of 7-28, and the effect of the leading locus is 13% of the parental difference, with 95% confidence interval 8.5-25.7%.

16.
One of the key hypothesized drivers of gradients in species richness is environmental filtering, where environmental stress limits which species from a larger species pool gain membership in a local community owing to their traits. Whereas most studies focus on small-scale variation in functional traits along environmental gradients, the effect of large-scale environmental filtering is less well understood. Furthermore, it has rarely been tested whether the factors that constrain the niche space limit the total number of coexisting species. We assessed the role of environmental filtering in shaping tree assemblages across North America north of Mexico by testing the hypothesis that colder, drier, or seasonal environments (stressful conditions for most plants) constrain tree trait diversity and thereby limit species richness. We assessed geographic patterns in trait filtering and their relationships to species richness patterns using a comprehensive set of tree range maps. We focused on four key plant functional traits reflecting major life history axes (maximum height, specific leaf area, seed mass, and wood density) and four climatic variables (annual mean and seasonality of temperature and precipitation). We tested for significant spatial shifts in trait means and variances using a null model approach. While we found significant shifts in mean species' trait values at most grid cells, trait variances at most grid cells did not deviate from the null expectation. Measures of environmental harshness (cold, dry, seasonal climates) and lower species richness were weakly associated with a reduction in variance of seed mass and specific leaf area. The pattern in variance of height and wood density was, however, opposite. These findings do not support the hypothesis that more stressful conditions universally limit species and trait diversity in North America. Environmental filtering does, however, structure assemblage composition, by selecting for certain optimum trait values under a given set of conditions.

17.
Experiments using quantitative real-time PCR to test hypotheses are limited by technical and biological variability; we seek to minimise sources of confounding variability through optimum use of biological and technical replicates. The quality of an experiment design is commonly assessed by calculating its prospective power. Such calculations rely on knowledge of the expected variances of the measurements of each group of samples and the magnitude of the treatment effect; the estimation of which is often uninformed and unreliable. Here we introduce a method that exploits a small pilot study to estimate the biological and technical variances in order to improve the design of a subsequent large experiment. We measure the variance contributions at several 'levels' of the experiment design and provide a means of using this information to predict both the total variance and the prospective power of the assay. A validation of the method is provided through a variance analysis of representative genes in several bovine tissue-types. We also discuss the effect of normalisation to a reference gene in terms of the measured variance components of the gene of interest. Finally, we describe a software implementation of these methods, powerNest, that gives the user the opportunity to input data from a pilot study and interactively modify the design of the assay. The software automatically calculates expected variances, statistical power, and optimal design of the larger experiment. powerNest enables the researcher to minimise the total confounding variance and maximise prospective power for a specified maximum cost for the large study.

18.
D F Moore, A Tsiatis. Biometrics 1991, 47(2): 383-401
When faced with data in the form of overdispersed counts or proportions, moment methods allow consistent parameter estimation when only the form of the mean and variance is specified. If the variance form is misspecified, these methods still yield consistent parameter estimates, though with lower efficiency, and the variances of the estimates will be inconsistent. A variance correction is available that yields consistent variance estimates in these circumstances. The asymptotic and small-sample efficiencies of this correction are calculated, and its performance under variance misspecification is studied. A group-randomized breast self-examination prevention study that is now underway serves as a focal point for the study of these properties. The use of the variance correction in modelling is illustrated on a teratology data set.

19.
A neglected life-history trait: clutch-size variance in snakes (cited 3 times in total: 0 self-citations, 3 citations by others)
Most analyses of life-history traits have focused on mean values rather than their associated variance. We review published and original data on snakes, including records gathered over many years on single populations, to examine patterns in clutch-size variability in these animals. Within single populations, the coefficient of variation of clutch size did not vary significantly with maternal body size, or among years. The stability of clutch-size variance through time is consistent with experimental studies showing no significant influence of food intake rates on this characteristic. Clutch-size variances did not differ between viviparous and oviparous snakes, but were dependent upon allometric relationships involving maternal body size and the relationship between clutch size and body size. Clutch-size variability was highest in species with relatively variable female sizes, and with a high rate of increase in clutch size with increasing body size. These two factors acted to magnify the extent of clutch-size variability engendered by variability in maternal body sizes. The relationships among these variables were similar in the two squamate Suborders, but the larger body sizes and mean clutch sizes of snakes resulted in clutch-size variances being higher in snakes than in lizards.

20.
In a bioassay, under certain experimental circumstances, information on concentration (dose rate) and time to response for some subjects can be combined in a single analysis. An underlying logistic random variable is assumed and the resulting mixed- (continuous-quantal) response model is analyzed by likelihood methods. The estimation procedure for the mean and the variance is described, and expressions for asymptotic variances are obtained. A comparison of results from the mixed model and from the standard quantal-response model shows that there is a substantial reduction in the variance of the estimators for the mixed model. On the basis of the table of asymptotic variances, some design implications are discussed. An example from insect pheromone research is used to illustrate the main ideas.
