Similar literature
 Found 20 similar articles; search took 203 ms
1.
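The plug-in algorithm commonly used to quantify DNA (deoxyribonucleic acid) evidence has a shortcoming: its result does not depend on sample size, so it overstates the weight of DNA evidence when the sample is small. This paper takes the effect of sample size into account and introduces a Bayes model. A formula for the likelihood ratio under the Bayes model is given, and the results of the two methods are compared on a real case. The numerical results show that the algorithm based on the Bayes model is more accurate and reasonable than the plug-in algorithm.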
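
The sample-size effect described in this abstract can be illustrated with a minimal sketch. The abstract does not reproduce the paper's likelihood-ratio formula, so everything below is an assumption made for illustration: the likelihood ratio is simplified to 1/p for a single matching allele, and the Bayesian estimate uses the posterior mean of p under a Beta(alpha, beta) prior. The function names are hypothetical.

```python
def plug_in_lr(count, n):
    """Likelihood ratio 1/p with p estimated by the raw frequency count/n.

    Assumes count > 0; the result depends only on the ratio count/n.
    """
    return 1.0 / (count / n)


def bayes_lr(count, n, alpha=1.0, beta=1.0):
    """Likelihood ratio 1/p with p set to its posterior mean.

    Assumed prior: p ~ Beta(alpha, beta), so the posterior mean is
    (count + alpha) / (n + alpha + beta) and shrinks toward the prior
    mean when the database size n is small.
    """
    p = (count + alpha) / (n + alpha + beta)
    return 1.0 / p


if __name__ == "__main__":
    # Same observed frequency (10%) from a small and a large database.
    for count, n in [(1, 10), (100, 1000)]:
        print(n, plug_in_lr(count, n), bayes_lr(count, n))
```

In this toy setting the plug-in value is identical for both database sizes, while the Bayesian value is more conservative for the small database and approaches the plug-in value as the database grows.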

2.
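Nonlinear reproductive dispersion random effects models include nonlinear random effects models and exponential family nonlinear random effects models, among others. By treating the random effects in the model as hypothetical missing data and applying the Metropolis-Hastings (MH) algorithm, a stochastic approximation algorithm is proposed for maximum likelihood estimation of the model parameters. A simulation study and a real-data example demonstrate the feasibility of the algorithm.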
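
The idea of treating random effects as missing data and drawing them with Metropolis-Hastings can be sketched on a toy model. This is not the paper's model: it assumes a single observation y with y | b ~ N(b, sigma2) and a random effect b ~ N(0, tau2), and it uses a random-walk proposal. All names and default values are illustrative assumptions.

```python
import math
import random


def log_posterior(b, y, tau2=1.0, sigma2=1.0):
    """Unnormalised log density of the random effect b given y.

    Toy model (an assumption): y | b ~ N(b, sigma2), b ~ N(0, tau2).
    """
    return -0.5 * (y - b) ** 2 / sigma2 - 0.5 * b ** 2 / tau2


def mh_sample(y, n_iter=20000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings draws of b from p(b | y)."""
    rng = random.Random(seed)
    b = 0.0
    current = log_posterior(b, y)
    draws = []
    for _ in range(n_iter):
        proposal = b + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            b, current = proposal, candidate
        draws.append(b)
    return draws


if __name__ == "__main__":
    kept = mh_sample(2.0)[2000:]  # discard burn-in draws
    print(sum(kept) / len(kept))  # should be near the exact posterior mean 1.0
```

In a stochastic approximation or Monte Carlo EM scheme, averages over such draws would stand in for the expectations over the unobserved random effects; the toy model is chosen so that the exact posterior, N(1, 0.5) for y = 2, is available as a check.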

3.
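Nonlinear reproductive dispersion random effects models generalize and extend exponential family nonlinear random effects models and nonlinear reproductive dispersion models. By treating the random effects in the model as hypothetical missing data and applying the Metropolis-Hastings (MH) algorithm, a Monte Carlo EM (MCEM) algorithm is proposed for maximum likelihood estimation of the model parameters. A simulation study and a real-data example illustrate the feasibility of the algorithm.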

4.
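To address missing data in gene microarrays, this paper exploits the intrinsic link between protein-protein interactions and gene expression and proposes a method that uses protein interaction information to improve the accuracy of missing-value estimation. Protein interaction relationships are combined with distances between gene expression profiles to compute the expression similarity between genes, and this new similarity measure is used to select a more suitable set of genes for estimating the missing values of a gene with missing data. The new measure is combined with the traditional KNNimpute and LLSimpute methods, yielding the improved algorithms PPI-KNNimpute and PPI-LLSimpute. Tests on real data sets show that protein interaction information effectively improves the accuracy of missing-value estimation for gene expression data.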
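
A minimal sketch of k-nearest-neighbour imputation with an interaction-aware distance follows. The abstract does not state how interaction information and expression distance are combined, so the rule used here (multiplying the distance between interacting genes by a factor `ppi_discount` below 1) is purely an assumption, as are the function names and the input layout.

```python
import math


def expr_distance(a, b):
    """Root-mean-square distance over columns observed in both profiles."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(data, interactions=frozenset(), k=2, ppi_discount=0.5):
    """Fill None entries from the k nearest genes that observe that column.

    data: dict mapping gene name -> list of floats or None (missing).
    interactions: set of frozenset({gene_a, gene_b}) pairs (hypothetical input).
    ppi_discount: assumed factor shrinking the distance of interacting genes.
    """
    filled = {gene: list(profile) for gene, profile in data.items()}
    for gene, profile in data.items():
        for col, value in enumerate(profile):
            if value is not None:
                continue
            candidates = []
            for other, other_profile in data.items():
                if other == gene or other_profile[col] is None:
                    continue
                d = expr_distance(profile, other_profile)
                if math.isinf(d):
                    continue  # no shared observed columns, cannot compare
                if frozenset((gene, other)) in interactions:
                    d *= ppi_discount
                candidates.append((d, other_profile[col]))
            candidates.sort(key=lambda item: item[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # leave the entry missing
            # Inverse-distance weights; the small constant avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            filled[gene][col] = (
                sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
            )
    return filled
```

With `interactions` left empty this reduces to a plain KNN-style imputation, which makes it easy to compare the two neighbour selections on the same data.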

5.
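Missing-value estimation for gene expression data based on total least squares   Total citations: 2 (self-citations: 0, citations by others: 2)
Missing data are an unavoidable feature of gene microarray experiments and, to some extent, affect the accuracy of downstream analyses of the array data. When the number of experiments cannot be increased, missing-value estimation is an effective way to reduce that impact. Because gene expression data are noisy, a missing-value estimation algorithm based on total least squares is proposed. Experimental results show that the new algorithm is more stable and more accurate than traditional missing-value estimation algorithms.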
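
Total least squares differs from ordinary least squares in that it allows noise in the predictor as well as the response. A minimal sketch for the one-predictor case (orthogonal regression, which has a closed form) is given below. The paper's multi-gene formulation is not given in the abstract, so the setting here, predicting a target gene's missing value from a single neighbour gene, and the function names are assumptions.

```python
import math


def tls_line(x, y):
    """Fit y = slope * x + intercept by total least squares.

    Minimises the sum of squared perpendicular distances to the line,
    assuming equal noise variance in x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


def tls_predict(x, y, x_new):
    """Estimate a missing y value at x_new from the fitted TLS line."""
    slope, intercept = tls_line(x, y)
    return slope * x_new + intercept
```

A useful property of this fit is symmetry: swapping the roles of the two genes gives the reciprocal slope, which is not true of ordinary least squares.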

6.
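Bayesian methods for estimating genomic breeding values   Total citations: 1 (self-citations: 0, citations by others: 1)
Estimating genomic breeding values is a key step in genomic selection. The accuracy of those values is critical to the successful application of genomic selection, and it depends largely on the estimation method. The most widely studied and applied methods fall into two families: Bayesian (Bayes) methods and best linear unbiased prediction (BLUP). This article systematically reviews the Bayes methods proposed to date and summarizes their estimation performance and the various improvements made to them. Studies on both simulated and real data show that Bayes methods estimate genomic breeding values more accurately than BLUP methods, and the advantage is more pronounced for traits affected by QTL with large effects. Because the theory and computation of Bayes methods are relatively complex, they are currently used less widely in practical breeding than BLUP methods, but the computational problems are expected to be resolved as fast algorithms are developed and computer hardware improves. In addition, deeper research on the genome and on the genetic architecture of traits will supply more accurate prior information for Bayes methods, which should make their accuracy advantage more prominent and their application more widespread.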

7.
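The eddy covariance method is an important technique for long-term continuous observation of CH4 flux over lakes. For a variety of reasons, CH4 flux observations contain many gaps, and a suitable gap-filling method is needed to reconstruct a complete CH4 flux time series. This study used routine meteorological data and eddy covariance CH4 flux data from 2014 to 2017 at the Bifenggang site, in the eastern part of the Lake Taihu eddy flux observation network. After analyzing the factors that influence CH4 flux at half-hourly and daily scales, it tested the feasibility of nonlinear regression, the random forest algorithm, and the error back-propagation algorithm for filling gaps in CH4 flux data at both scales. The results show that, at the half-hourly scale, CH4 flux at Bifenggang during the growing season was mainly affected by sediment temperature, friction velocity, air temperature, relative humidity, latent heat flux, and water temperature at 20 cm; during the non-growing season it was mainly affected by relative humidity, latent heat flux, wind speed, sensible heat flux, and sediment temperature. At the daily scale, CH4 flux was mainly affected by latent heat flux and relative humidity. The random forest model gave the best gap-filling performance at all time scales. A random forest model with day of year, solar elevation angle, sediment temperature, friction velocity, air temperature, water temperature at 20 cm, relative humidity, air pressure, and wind speed as inputs was better suited to half-hourly gaps; one with day of year, sediment temperature, friction velocity, air temperature, water temperature at 20 cm, relative humidity, air pressure, wind speed, and downward shortwave radiation as inputs was better suited to daily gaps. Overall, the gap-filling models performed better at the daily scale than at the half-hourly scale.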

8.
9.
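Particle filter decoding has been widely applied in neural information decoding, but it has very rarely been used to reconstruct movement trajectories from the population code of hippocampal place cells. Based on the response characteristics of place cells in the rat hippocampus, a quadratic exponential Poisson equation was used to build a state-space encoding model of the place cell population for the rat's movement trajectory. Simulated and recorded data were then used to study the performance of the particle filter in reconstructing the trajectory, and the results were compared with those of extended Kalman and unscented Kalman reconstruction algorithms. On simulated data, the particle filter needed fewer place cells than the other two algorithms to reach the same reconstruction accuracy. On recorded data, the correlation coefficient and root-mean-square error between the reconstructed and true trajectories were both better for the particle filter than for the extended Kalman and unscented Kalman algorithms. These results indicate that the particle filter not only uses the information encoded by the place cell population efficiently but also reconstructs trajectories with higher accuracy, and that it will provide strong technical support for further study of the neural mechanisms of spatial cognition.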

10.
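Species distribution models are built on species presence or absence data, but the real distribution data that can be obtained have various shortcomings (for example, species misidentification, coordinate errors, sampling bias, and missing data) that affect the predictive performance, stability, and application of these models. Evaluating species distribution models with real distribution data therefore introduces considerable uncertainty. To avoid this uncertainty, a growing number of studies use virtual species to evaluate model performance and to assess the merits of new methods. A virtual species is a form of artificial life built within a real (or virtual) geographic information system. It is a simplified and abstracted species: by simulating the species' response to environmental variables, it evaluates the probability of occurrence under different environmental conditions and artificially generates virtual species distribution data. Virtual species offer easily obtained data, controllable data quality, and avoidance of over-simulation, and they are now widely used to assess how species traits, sampling bias, geographic information, presence/absence criteria, and other factors affect the performance of species distribution models. Virtual species are an indispensable tool in large-scale research and help to address scientific questions that real data cannot resolve. Commonly used construction algorithms are the additive method, the multiplicative method, and the combined method, but these may suffer from compensation effects that enlarge the species' distribution range. In view of these shortcomings, possible future directions for virtual species are proposed: avoiding excessive departure from reality, improving the construction algorithms, and building virtual model organisms, communities, and ecosystems. To help researchers build virtual species quickly, a virtual species construction package (SDMvspecies) was developed in the R environment. Virtual species can be combined with real species and, through improved model construction methods, help to solve problems that real data cannot; their application will also give rise to new theories and support a better understanding of ecological principles.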

11.
Huang Y, Dagne G. Biometrics 2012, 68(3): 943-953
Summary: It is common practice to analyze complex longitudinal data using semiparametric nonlinear mixed-effects (SNLME) models with a normal distribution. The normality assumption for model errors may unrealistically obscure important features of subject variations. To partially explain between- and within-subject variations, covariates are usually introduced in such models, but some covariates may often be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. Inferential procedures can become dramatically more complicated when data with skewness, missing values, and measurement error are observed. In the literature, there has been considerable interest in accommodating either skewness, incompleteness, or covariate measurement error in such models, but there has been relatively little study concerning all three features simultaneously. In this article, our objective is to address the simultaneous impact of skewness, missingness, and covariate measurement error by jointly modeling the response and covariate processes based on a flexible Bayesian SNLME model. The method is illustrated using a real AIDS data set to compare potential models with various scenarios and different distribution specifications.

12.
iTRAQ (isobaric Tags for Relative and Absolute Quantitation) is a technique that allows simultaneous quantitation of proteins in multiple samples. In this paper, we describe a Bayesian hierarchical model-based method to infer the relative protein expression levels and hence to identify differentially expressed proteins from iTRAQ data. Our model assumes that the measured peptide intensities are affected by both protein expression levels and peptide-specific effects. The values of these two effects across experiments are modeled as random effects. The nonrandom missingness of peptide data is modeled with a logistic regression that relates the missingness probability for a peptide to the expression level of the protein that produces this peptide. We propose a Markov chain Monte Carlo (MCMC) method for the inference of model parameters, including the relative expression levels across samples. Our simulation results suggest that the estimates of relative protein expression levels based on the MCMC samples have smaller bias than those estimated from ANOVA models or fold changes. We apply our method to an iTRAQ dataset studying the roles of caveolae in postnatal cardiovascular function.

13.
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.

14.
Wang YG. Biometrics 1999, 55(3): 984-989
Troxel, Lipsitz, and Brennan (1997, Biometrics 53, 857-869) considered parameter estimation from survey data with nonignorable nonresponse and proposed weighted estimating equations to remove the biases in the complete-case analysis that ignores missing observations. This paper suggests two alternative modifications for unbiased estimation of regression parameters when a binary outcome is potentially observed at successive time points. The weighting approach of Robins, Rotnitzky, and Zhao (1995, Journal of the American Statistical Association 90, 106-121) is also modified to obtain unbiased estimating functions. The suggested estimating functions are unbiased only when the missingness probability is correctly specified, and misspecification of the missingness model will result in biases in the estimates. Simulation studies are carried out to assess the performance of different methods when the covariate is binary or normal. For the simulation models used, the relative efficiency of the two new methods to the weighting methods is about 3.0 for the slope parameter and about 2.0 for the intercept parameter when the covariate is continuous and the missingness probability is correctly specified. All methods produce substantial biases in the estimates when the missingness model is misspecified or underspecified. Analysis of data from a medical survey illustrates the use and possible differences of these estimating functions.

15.
We present a Bayesian approach to analyze matched "case-control" data with multiple disease states. The probability of disease development is described by a multinomial logistic regression model. The exposure distribution depends on the disease state and could vary across strata. In such a model, the number of stratum effect parameters grows in direct proportion to the sample size, leading to inconsistent MLEs for the parameters of interest even when one uses a retrospective conditional likelihood. We adopt a semiparametric Bayesian framework instead, assuming a Dirichlet process prior with a mixing normal distribution on the distribution of the stratum effects. We also account for possible missingness in the exposure variable in our model. The actual estimation is carried out through a Markov chain Monte Carlo numerical integration scheme. The proposed methodology is illustrated through simulation and an example of a matched study on low birth weight of newborns (Hosmer, D. A. and Lemeshow, S., 2000, Applied Logistic Regression) with two possible disease groups matched with a control group.

16.
Generalized additive models (GAMs) have been widely used for flexible modeling of various types of outcomes. When the outcome in a GAM is subject to missingness, practical analyses often assume that the data are missing at random (MAR). This assumption can be questionable when the missingness is not by design. Evaluating the potential effects of alternative nonignorable missing data mechanisms on the MAR inference from a GAM can be important but is often challenging because of the complexity of alternative nonignorable models. We apply the index approach to local sensitivity (Troxel, Ma, and Heitjan, 2004, Statistica Sinica 14, 1221–1237) to evaluate the potential changes of the GAM estimates in the neighborhood of the MAR model. The approach avoids fitting any complicated nonignorable GAM. Only MAR estimates are required to calculate the resulting sensitivity index and adjust the GAM estimates to account for nonignorable missingness. Thus the proposed approach is considerably simpler to conduct than the alternative methods. The simulation study shows that the index provides a valid assessment of the local sensitivity of the GAM estimates to nonignorable missingness. We then illustrate the method using a rheumatoid arthritis clinical trial data set.

17.
Toledano AY, Gatsonis C. Biometrics 1999, 55(2): 488-496
We propose methods for regression analysis of repeatedly measured ordinal categorical data when there is nonmonotone missingness in these responses and when a key covariate is missing depending on observables. The methods use ordinal regression models in conjunction with generalized estimating equations (GEEs). We extend the GEE methodology to accommodate arbitrary patterns of missingness in the responses when this missingness is independent of the unobserved responses. We further extend the methodology to provide correction for possible bias when missingness in knowledge of a key covariate may depend on observables. The approach is illustrated with the analysis of data from a study in diagnostic oncology in which multiple correlated receiver operating characteristic curves are estimated and corrected for possible verification bias when the true disease status is missing depending on observables.

18.
Summary: In this article, we study the estimation of the mean response and regression coefficient in semiparametric regression problems when the response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response, while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance of our estimators with that of other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.
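
The double robustness property mentioned in this abstract can be illustrated with the standard augmented inverse-probability-weighted (AIPW) estimator of a mean. This is a textbook estimator used here only as a sketch; it is not the condensed-covariate estimator the paper proposes. The function name and argument layout are assumptions, and the propensities and predictions are taken as given rather than fitted.

```python
def aipw_mean(y, observed, propensity, prediction):
    """Augmented inverse-probability-weighted estimate of E[Y].

    y: responses, with None where the response is missing.
    observed: 1 if the response was observed, else 0.
    propensity: assumed P(observed = 1 | covariates) for each unit, in (0, 1].
    prediction: outcome-model prediction of E[Y | covariates] for each unit.

    The estimate is consistent if either the propensities or the
    predictions come from a correctly specified model.
    """
    n = len(y)
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, propensity, prediction):
        total += mi  # outcome-model contribution for every unit
        if ri:
            total += (yi - mi) / pi  # weighted residual correction
    return total / n
```

Two limiting cases show the structure: when the predictions reproduce the observed responses exactly, the propensities drop out of the estimate; when the predictions are all zero, the formula reduces to plain inverse probability weighting.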

19.
Longitudinal data often encounter missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) has been popularly employed for analysis of missing longitudinal data. In particular, the MI-GEE method has been proposed for inference of generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the “quasi-likelihood under the independence model criterion” (QIC) and the “missing longitudinal information criterion” (MLIC), to accommodate multiply imputed datasets for selection of the MI-GEE mean model. According to real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with a moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI-GEE analysis; (ii) the MI-based GEE model selection methods with a suitable number of imputations generally perform well, while the naive application of existing model selection methods by simply ignoring missing observations may lead to very poor performance; (iii) the model selection criteria based on improper (frequentist) multiple imputation generally perform better than their analogues based on proper (Bayesian) multiple imputation.
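
As background to MI-based inference, the usual way to combine a scalar estimate across imputed datasets is Rubin's rules. The sketch below shows that pooling step only; it is not the model selection criterion this abstract proposes, and the function name is an assumption.

```python
def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: point estimate from each of the m imputed datasets.
    variances: squared standard error from each imputed dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    # The (1 + 1/m) factor accounts for using a finite number of imputations.
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```

The between-imputation term is estimated from only m values, which is one intuition for conclusion (i) above that a handful of imputed datasets may not be enough for stable results.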

20.
Summary: With advances in modern medicine and clinical diagnosis, case–control data with characterization of finer subtypes of cases are often available. In matched case–control studies, missingness in exposure values often leads to deletion of entire strata, and thus entails a significant loss of information. When subtypes of cases are treated as categorical outcomes, the data are further stratified and deletion of observations becomes even more expensive in terms of precision of the category-specific odds-ratio parameters, especially using the multinomial logit model. The stereotype regression model for categorical responses lies intermediate between the proportional odds and the multinomial or baseline category logit model. The use of this class of models has been limited, as the structure of the model implies certain inferential challenges with nonidentifiability and nonlinearity in the parameters. We illustrate how to handle missing data in matched case–control studies with finer disease subclassification within the cases under a stereotype regression model. We present both a Monte Carlo based full Bayesian approach and an expectation/conditional maximization algorithm for the estimation of model parameters in the presence of a completely general missingness mechanism. We illustrate our methods by using data from an ongoing matched case–control study of colorectal cancer. Simulation results are presented under various missing data mechanisms and departures from modeling assumptions.
