Similar Documents
20 similar documents found.
1.
Generalized linear mixed models (GLMMs) have become a frequently used tool for the analysis of non-Gaussian longitudinal data. Estimation is based on maximum likelihood theory, which assumes that the underlying probability model is correctly specified. Recent research shows that the results obtained from these models are not always robust against departures from their underlying assumptions. In the present work, we used simulations with a logistic random-intercept model to study the impact of misspecifying the random-effects distribution on the type I and type II errors of tests for the mean structure in GLMMs. We found that misspecification can either increase or decrease the power of the tests, depending on the shape of the underlying random-effects distribution, and that it can considerably inflate the type I error rate. Additionally, we obtained a theoretical result stating that whenever a subset of fixed-effects parameters not included in the random-effects structure equals zero, the corresponding maximum likelihood estimator consistently estimates zero. This implies that, under certain conditions, a significant effect can be considered a reliable result even if the random-effects distribution is misspecified.
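A minimal data-generation sketch of such a simulation study; the two-point mixture used as the misspecified random-effects distribution is an assumption for illustration, not necessarily the shape studied by the authors:

```python
# Simulate a logistic random-intercept model where the true random effects
# are either Gaussian or a misspecified (two-point mixture) distribution.
import numpy as np

rng = np.random.default_rng(1)

def simulate(n_subjects=100, n_obs=5, beta=(0.5, 0.0), gaussian=True):
    """Return (y, x, subject) for y_ij ~ Bernoulli(logit^-1(b0 + b1*x + u_i))."""
    if gaussian:
        u = rng.normal(0.0, 1.0, n_subjects)     # correctly specified
    else:
        # Misspecified: symmetric two-point mixture with the same variance.
        u = rng.choice([-1.0, 1.0], n_subjects)
    subject = np.repeat(np.arange(n_subjects), n_obs)
    x = rng.normal(size=n_subjects * n_obs)
    eta = beta[0] + beta[1] * x + u[subject]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return y, x, subject

# With beta[1] = 0, refitting a Gaussian-random-intercept GLMM to many such
# replicates and counting rejections of H0: beta1 = 0 estimates the type I
# error rate under correct (gaussian=True) vs misspecified (False) effects.
y, x, subject = simulate(gaussian=False)
```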

2.

Background  

Complexity and noise in expression quantitative trait loci (eQTL) studies make it difficult to distinguish potential regulatory relationships among the many interactions. The predominant method of identifying eQTLs finds associations that are significant at a genome-wide level. The vast number of statistical tests carried out on these data makes false negatives very likely: corrections for multiple-testing error render genome-wide eQTL techniques unable to detect modest regulatory effects.

3.

The statistical analysis of enzyme kinetic reactions usually involves response-function models that are well defined on the basis of Michaelis–Menten-type equations. The error structure, however, is often assumed, without good justification, to be additive Gaussian noise. This simple assumption may lead to undesirable properties of the analysis, particularly when simulations are involved, because negative simulated reaction rates may occur. In this study, we investigate the effect of assuming multiplicative log-normal errors instead. While there is typically little impact on the estimates, the experimental designs and their efficiencies are decisively affected, particularly for model-discrimination problems.

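A small sketch contrasting the two error structures on a Michaelis–Menten response v(S) = Vmax·S / (Km + S); all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
Vmax, Km, sigma = 1.0, 0.5, 0.25          # assumed values, for illustration
S = np.linspace(0.05, 2.0, 50)
v = Vmax * S / (Km + S)                   # Michaelis-Menten response

y_additive = v + rng.normal(0.0, sigma, S.size)           # can go negative
y_multiplicative = v * rng.lognormal(0.0, sigma, S.size)  # always positive

print("negative simulated rates (additive):", np.sum(y_additive < 0))
print("negative simulated rates (log-normal):", np.sum(y_multiplicative < 0))
```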

4.
Swerup, C. Biological Cybernetics, 1978, 29(2): 97-104
The cross-correlation between output and input of a system containing nonlinearities, when that system is stimulated with Gaussian white noise, is a good estimate of the linear properties of the system. In practice, however, when sequences of pseudonoise are used, large errors may be introduced into the estimate of the linear part, depending on the properties of the noise. This consideration assumes special importance in the analysis of the linear properties of the peripheral auditory system, where the rectifying properties of the hair cells constitute a second-order nonlinearity. To explore this problem, a simple model has been designed, consisting of a memoryless second-order nonlinearity sandwiched between two bandpass filters. Different types of pseudonoise are used as input, and it is shown that noise based on binary m-sequences, which is commonly used in noise generators, yields totally incorrect information about this system. Somewhat better results are achieved with other types of noise, and by using inverse-repeat sequences the results are greatly improved. Furthermore, certain anomalies obtained in the analysis of responses from single fibers in the auditory nerve are viewed in the light of the present results. The theoretical analysis of these anomalies reveals some information about the organization of the peripheral auditory system; for example, the possibility of a second bandpass filter in the auditory periphery seems to be excluded.
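A minimal sketch of the kind of experiment described, assuming a simple squarer as the memoryless second-order nonlinearity and arbitrary illustrative filter coefficients; scipy's max_len_seq generates the binary m-sequence:

```python
import numpy as np
from scipy.signal import max_len_seq, lfilter

# Illustrative FIR filters (stand-ins for the two bandpass filters;
# coefficients are arbitrary assumptions, not the paper's model).
b1 = np.array([0.2, 0.5, 0.2])
b2 = np.array([0.1, 0.3, 0.3, 0.1])

def cascade(x):
    """Filter -> static second-order nonlinearity (squarer) -> filter."""
    u = lfilter(b1, [1.0], x)
    return lfilter(b2, [1.0], u**2)   # squarer models hair-cell rectification

def xcorr_estimate(x, y, lags=32):
    """First-order kernel estimate: cross-correlate output with input."""
    return np.array([np.mean(y[k:] * x[:len(x) - k]) for k in range(lags)])

mseq = 2.0 * max_len_seq(10)[0] - 1.0   # +/-1 binary m-sequence, length 1023
irs = np.concatenate([mseq, -mseq])     # inverse-repeat sequence

h_mseq = xcorr_estimate(mseq, cascade(mseq))
h_irs = xcorr_estimate(irs, cascade(irs))
# For an even nonlinearity the true first-order kernel is ~0; the m-sequence
# estimate shows anomaly spikes (shift-and-multiply property), while the
# inverse-repeat estimate cancels the even-order contamination.
print(np.max(np.abs(h_mseq)), np.max(np.abs(h_irs)))
```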

5.
Ecological data sets often record the abundance of species together with a set of explanatory variables. Multivariate statistical methods are optimal for analyzing such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites-by-species matrix, which stands in stark contrast to univariate statistics, where data models assuming specific distributions are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation-based performance evaluations of these methods are important as guides for practitioners but are still lacking. Here, we compare two model-based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance-based methods, distance-based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the ability of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables: the former had the lowest false-positive rate (0.008), while the latter had the lowest false-negative rate (0.027). CQO and CCA had the highest false-negative rate (0.291) and false-positive rate (0.256), respectively, and these error rates were typically high for data sets with linear responses. Our study shows that both model- and distance-based methods have their place in the ecologist's statistical toolbox: MvGLM and dbRDA are reliable for analyzing species-environment relations, whereas CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.

6.
Wei Zou, Zhao-Bang Zeng. Genetica, 2009, 137(2): 125-134
To find correlations between genome-wide gene expression variation and sequence polymorphisms in inbred-cross populations, we developed a statistical method for claiming expression quantitative trait loci (eQTL) in a genome. The method is based on multiple interval mapping (MIM), a model selection procedure, and uses the false discovery rate (FDR) to measure the statistical significance of the large number of eQTL. We compared our method with a similar procedure proposed by Storey et al. and found that our method can be more powerful. We identified the features of the two methods that resulted in different statistical powers for eQTL detection and confirmed them by simulation. We organized our computational procedure in an R package, MIM-eQTL, which can estimate FDR for positive findings from similar model selection procedures. The R package can be found at .
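As a point of reference, a generic Benjamini–Hochberg step-up sketch for turning many eQTL test p-values into FDR-controlled calls; this is standard FDR machinery, not necessarily the exact procedure implemented in MIM-eQTL:

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Return a boolean mask of tests rejected at FDR level q."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = p.size
    thresh = q * np.arange(1, m + 1) / m          # BH step-up thresholds
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True                        # reject the k smallest p-values
    return mask

pvals = np.random.default_rng(3).uniform(size=10000)   # null p-values, for demo
print(bh_reject(pvals).sum(), "eQTL calls at FDR 0.05")
```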

7.
8.
Molecular techniques for detecting microorganisms, macroorganisms and infectious agents are susceptible to false-negative and false-positive errors. If left unaddressed, these observational errors may yield misleading inference concerning occurrence, prevalence, sensitivity, specificity and covariate relationships. Occupancy models are widely used to account for false-negative errors and, more recently, have been used to address false-positive errors as well. Current modelling options assume false-positive errors occur only in truly negative samples, an assumption that yields biased inference concerning detection, because a sample could be classified as positive not because the target agent was successfully detected but because of a false-positive test result. We present an extension of the occupancy modelling framework that allows false-positive errors in both negative and positive samples, thereby providing unbiased inference concerning occurrence and detection, as well as reliable conclusions about the efficacy of sampling designs, handling protocols and diagnostic tests. We apply the model to simulated data, showing that it recovers known parameters and outperforms approaches commonly used when confronted with observation errors. We then apply the model to an experimental data set on Batrachochytrium dendrobatidis, a pathogenic fungus implicated in the global decline or extinction of hundreds of amphibian species. The model-based approach we present is not only useful for obtaining reliable inference when data are contaminated with observational errors, but also eliminates the need to establish arbitrary thresholds or decision rules that have hidden and unintended consequences.
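A hedged sketch of the site-level likelihood for such a model. The parameterization below (per-test detection probability p_det, with a false-positive probability p_fp applying at both occupied and unoccupied sites) is one simple way to allow false positives in positive samples; it is an assumption, not necessarily the authors' exact formulation:

```python
import numpy as np
from scipy.stats import binom

def neg_log_lik(params, y, K):
    """y[i] = number of positive results out of K tests at site i."""
    psi, p_det, p_fp = params
    # At an occupied site, a test is positive via true detection or,
    # failing that, via a false positive.
    p_occ = p_det + (1.0 - p_det) * p_fp
    lik = psi * binom.pmf(y, K, p_occ) + (1.0 - psi) * binom.pmf(y, K, p_fp)
    return -np.sum(np.log(lik))

# Minimizing over (psi, p_det, p_fp) in (0,1)^3, e.g. with
# scipy.optimize.minimize and bounds, yields the maximum likelihood estimates.
y = np.array([0, 0, 1, 3, 5, 0, 2, 4])   # toy counts at 8 sites, K = 5 tests
print(neg_log_lik((0.5, 0.7, 0.05), y, K=5))
```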

9.

Background

RNA sequencing (RNA-seq) is the current gold-standard method for quantifying gene expression in expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of variant loci can have a lower probability of mapping correctly to the reference genome, which could bias gene quantifications and cause false-positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias on eQTL discovery.

Results

We simulate RNA-seq read mapping over 9.5 M common SNPs and indels, with 15.6% of variants showing a biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and on eQTL discovery. We detect only a handful of likely false-positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.

Conclusion

Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias and that it does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better control of mapping bias and more accurate results in future RNA-seq studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0467-2) contains supplementary material, which is available to authorized users.
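A toy simulation of the mapping-bias mechanism described above, with an assumed 8% mapping loss for non-reference reads (the paper's per-locus rates differ):

```python
import numpy as np

rng = np.random.default_rng(6)
n_reads = 10000
carries_alt = rng.random(n_reads) < 0.5        # het site: half ref, half alt
p_map = np.where(carries_alt, 0.92, 1.00)      # alt reads lose ~8% (assumed)
mapped = rng.random(n_reads) < p_map

ref_fraction = np.mean(~carries_alt[mapped])
print(f"mapped ref fraction: {ref_fraction:.3f} (unbiased would be ~0.5)")
```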

10.
The quality of fit of sedimentation velocity data is critical for judging the veracity of the sedimentation model and the accuracy of the derived macromolecular parameters. Absolute statistical measures are usually complicated by the presence of characteristic systematic errors and run-to-run variation in the stochastic noise of data acquisition. We present a new graphical approach for visualizing systematic deviations between data and model in the form of a histogram of residuals. Compared with the ideally expected Gaussian distribution, it provides a robust measure of fit quality and can be used to flag poor models.
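A minimal sketch of the diagnostic on synthetic residuals, assuming the acquisition noise level sigma is known; a mismatch between the histogram and the Gaussian curve flags a poor model:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
sigma = 0.01
residuals = rng.normal(0.0, sigma, 5000)   # stand-in for data minus model
residuals[:200] += 0.02                    # injected systematic deviation

grid = np.linspace(-4 * sigma, 6 * sigma, 200)
gauss = np.exp(-0.5 * (grid / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

plt.hist(residuals, bins=60, density=True, alpha=0.6, label="residuals")
plt.plot(grid, gauss, label="ideal Gaussian")
plt.legend(); plt.xlabel("residual"); plt.show()
```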

11.
This work evaluates three techniques for calibrating capacitance (dielectric) spectrometers used for on-line monitoring of biomass: modeling of cell properties using the theoretical Cole–Cole equation, linear regression of dual-frequency capacitance measurements on biomass concentration, and multivariate (PLS) modeling of scanning dielectric spectra. The performance and robustness of each technique are assessed over a sequence of validation batches in two experimental settings with differing signal noise. Under noisier conditions, the Cole–Cole model had significantly higher biomass-concentration prediction errors than the linear and multivariate models, and the PLS model was the most robust in handling signal noise. Under less noisy conditions, the three models performed similarly. The mean cell size was additionally estimated using the Cole–Cole and PLS models, the latter technique giving more satisfactory results.
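A minimal PLS-calibration sketch in the spirit of the third technique, using synthetic spectra and scikit-learn's PLSRegression; the data-generating model here is an assumption for illustration only:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(11)
n_batches, n_freqs = 60, 30
biomass = rng.uniform(1.0, 20.0, n_batches)
loadings = rng.normal(size=n_freqs)        # assumed spectral signature
spectra = np.outer(biomass, loadings) + rng.normal(0.0, 0.5, (n_batches, n_freqs))

# Calibrate on 40 batches, validate on the remaining 20.
pls = PLSRegression(n_components=3).fit(spectra[:40], biomass[:40])
pred = pls.predict(spectra[40:]).ravel()
rmsep = np.sqrt(np.mean((pred - biomass[40:]) ** 2))
print(f"RMSEP on held-out batches: {rmsep:.2f}")
```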

12.
The reliability of parsimony-based and likelihood-based methods for inferring positive selection at single amino acid sites was studied using the nucleotide sequences of human leukocyte antigen (HLA) genes, in which positive selection is known to operate at the antigen recognition site. The results indicate that inference by parsimony-based methods is robust to the use of different evolutionary models and is generally more reliable than that by likelihood-based methods. In contrast, the results obtained by likelihood-based methods depend on the models and on the initial parameter values used. It is sometimes difficult to obtain the maximum likelihood estimates of parameters for a given model, and the results obtained may be false negatives or false positives depending on the initial parameter values. It is therefore preferable to use parsimony-based methods as long as the number of sequences is relatively large and the branch lengths of the phylogenetic tree are relatively small.

13.
Copt, S., Heritier, S. Biometrics, 2007, 63(4): 1045-1052
Mixed linear models are commonly used to analyze data in many settings. These models are generally fitted by means of (restricted) maximum likelihood techniques that rely heavily on normality. The sensitivity of the resulting estimators and related tests to this underlying assumption has been identified as a weakness that can even lead to wrong interpretations. Very recently, a highly robust estimator based on a scale estimate (an S-estimator) was proposed for general mixed linear models. It has the advantage of being easy to compute and allows the computation of a robust score test. However, this proposal cannot be used to define a likelihood-ratio-type test, which is certainly the most direct route to robustifying an F-test. As the latter is usually a key tool of hypothesis testing in mixed linear models, we propose two new robust estimators that allow the desired extension. They also lead to resistant Wald-type tests useful for testing contrasts and covariate effects. We study their properties theoretically and by means of simulations. The analysis of a real data set illustrates the advantage of the new approach in the presence of outlying observations.
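A generic Wald-type test sketch; computing the robust S-estimates themselves is beyond a few lines, but given any estimate, its asymptotic covariance, and a contrast matrix, the statistic takes the same form:

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta_hat, V, C):
    """Test H0: C @ beta = 0 given estimate beta_hat with covariance V."""
    Cb = C @ beta_hat
    W = float(Cb @ np.linalg.solve(C @ V @ C.T, Cb))
    return W, chi2.sf(W, df=C.shape[0])

# Plugging in robust estimates and their covariance gives the resistant
# Wald-type tests described above; with MLEs it is the classical test.
beta_hat = np.array([1.2, 0.3, -0.1])               # illustrative values
V = np.diag([0.04, 0.02, 0.02])
C = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])    # test both covariate effects
print(wald_test(beta_hat, V, C))
```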

14.
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, because commonly used models such as linear and logistic regression often fail to capture the actual relationships between variables, and incorrectly specified models can lead to incorrect conclusions. In this article, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such hypothesis tests have correct type I error for large samples. Our main result is that for a surprisingly large class of commonly used regression models, standard regression-based hypothesis tests (using robust variance estimators) are guaranteed to have correct type I error for large samples, even when the models are incorrectly specified. To the best of our knowledge, this robustness of model-based hypothesis tests to incorrect specification was previously unknown for Poisson regression models and for other commonly used models we consider. Our results have practical implications for understanding the reliability of commonly used, model-based tests for analyzing randomized trials.
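A minimal illustration with statsmodels: an intentionally misspecified linear model for a randomized treatment, tested with a heteroskedasticity-robust (sandwich) variance estimator; the data-generating process below is an arbitrary assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
n = 500
treat = rng.binomial(1, 0.5, n)            # randomized assignment
age = rng.normal(50, 10, n)
# True outcome: no treatment effect, mean nonlinear in age, so the linear
# model fitted below is misspecified.
y = 1.0 + 0.001 * age**2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([treat, age]))
fit = sm.OLS(y, X).fit(cov_type="HC3")     # robust (sandwich) variance
print(fit.pvalues[1])   # treatment test retains correct type I error
```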

15.
Likelihood analysis for regression models with measurement errors in explanatory variables typically involves integrals that have no closed-form solution. In this case, numerical methods such as Gaussian quadrature are generally employed. However, when the dimension of the integral is large, these methods become computationally demanding or even infeasible. This paper proposes the use of the Laplace approximation for measurement-error problems in which the likelihood function involves high-dimensional integrals. The cases considered are generalized linear models with multiple covariates measured with error and generalized linear mixed models with measurement error in the covariates. The asymptotic order of the approximation and the asymptotic properties of the Laplace-based estimator for these models are derived. The method is illustrated using simulations and real-data analysis.
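A minimal Laplace-approximation sketch for integrals of the generic form $I = \int e^{h(u)}\,du$: expanding $h$ around its mode $u^*$ gives $I \approx e^{h(u^*)} (2\pi)^{d/2} \lvert -H \rvert^{-1/2}$, where $H$ is the Hessian of $h$ at $u^*$. The finite-difference Hessian below is for illustration only:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_integral(h, u0):
    """Approximate log of integral of exp(h(u)) for smooth, unimodal h."""
    res = minimize(lambda u: -h(u), u0)        # find the mode u*
    u_star, d = res.x, res.x.size
    # Numerical Hessian of -h at the mode (central-difference sketch).
    eps = 1e-4
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (-h(u_star + e_i + e_j) + h(u_star + e_i)
                       + h(u_star + e_j) - h(u_star)) / eps**2
    sign, logdet = np.linalg.slogdet(H)
    return h(u_star) + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet

# Check against a known answer: h(u) = -u'u/2 gives (d/2) * log(2*pi).
print(laplace_log_integral(lambda u: -0.5 * np.sum(u**2), np.zeros(3)))
```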

16.
17.
Concurrent coding is an encoding scheme with 'holographic'-type properties that is shown here to be robust against a significant amount of noise and signal loss. This single encoding scheme can correct for random errors and burst errors simultaneously without relying on cyclic codes. A simple and practical scheme has been tested that displays perfect decoding when the signal-to-noise ratio is of order -18 dB. The same scheme also displays perfect reconstruction when a contiguous block of 40% of the transmission is missing. In addition, this scheme is 50% more efficient in terms of transmitted-power requirements than equivalent cyclic codes. A simple model is presented that describes the decoding process and can determine the expected computational load, as well as the critical levels of noise and missing data at which false messages begin to be generated.

18.
Most genome-wide association studies consider the genes located closest to single nucleotide polymorphisms (SNPs) that are highly significant in those studies. However, the significance of the associations between SNPs and candidate genes has not been fully determined. An alternative approach using SNPs in expression quantitative trait loci (eQTL) was previously reported for Crohn's disease; it was shown that eQTL-based preselection for follow-up studies is a useful approach for identifying risk loci from the results of moderately sized GWAS. In this study, we propose an approach that uses eQTL SNPs to support the functional relationship between an SNP and a candidate gene in a genome-wide association study. Genome-wide SNP genotypes and 10 biochemical measures (fasting glucose, BUN, serum albumin, AST, ALT, gamma-GTP, total cholesterol, HDL cholesterol, triglycerides, and LDL cholesterol) were obtained from the Korean Association Resource (KARE) consortium. The eQTL SNPs were isolated from the SNP dataset based on the RegulomeDB eQTL-SNP data from the ENCODE projects and two recent eQTL reports. A total of 25,658 eQTL SNPs were tested for association with the 10 metabolic traits in two Korean populations (Ansung and Ansan). The proportion of phenotypic variance explained by eQTL and non-eQTL SNPs showed that eQTL SNPs are more likely to be genetically associated with the metabolic traits than non-eQTL SNPs. Finally, via a meta-analysis of the two Korean populations, we identified 14 eQTL SNPs significantly associated with metabolic traits. These results suggest that our approach can be extended to other genome-wide association studies.
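The final step corresponds to a standard inverse-variance fixed-effects meta-analysis across the two cohorts; a sketch with illustrative effect sizes (not values from the study):

```python
import numpy as np
from scipy.stats import norm

def meta_fixed(betas, ses):
    """Inverse-variance fixed-effects meta-analysis of per-cohort estimates."""
    w = 1.0 / np.asarray(ses) ** 2
    beta = np.sum(w * np.asarray(betas)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    z = beta / se
    return beta, se, 2 * norm.sf(abs(z))   # combined effect, SE, p-value

# e.g. one eQTL SNP's effect on a lipid trait in Ansung and Ansan (made up):
print(meta_fixed([0.15, 0.11], [0.05, 0.04]))
```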

19.
20.