共查询到20条相似文献,搜索用时 15 毫秒
1.
Longitudinal data usually consist of a number of short time series. A group of subjects or groups of subjects are followed over time and observations are often taken at unequally spaced time points, and may be at different times for different subjects. When the errors and random effects are Gaussian, the likelihood of these unbalanced linear mixed models can be directly calculated, and nonlinear optimization used to obtain maximum likelihood estimates of the fixed regression coefficients and parameters in the variance components. For binary longitudinal data, a two state, non-homogeneous continuous time Markov process approach is used to model serial correlation within subjects. Formulating the model as a continuous time Markov process allows the observations to be equally or unequally spaced. Fixed and time varying covariates can be included in the model, and the continuous time model allows the estimation of the odds ratio for an exposure variable based on the steady state distribution. Exact likelihoods can be calculated. The initial probability distribution on the first observation on each subject is estimated using logistic regression that can involve covariates, and this estimation is embedded in the overall estimation. These models are applied to an intervention study designed to reduce children's sun exposure. 相似文献
2.
Background
Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. 相似文献3.
Quasi-consensus-based comparison of profile hidden Markov models for protein sequences 总被引:2,自引:0,他引:2
A simple approach for the sensitive detection of distant relationships among protein families and for sequence-structure alignment via comparison of hidden Markov models based on their quasi-consensus sequences is presented. Using a previously published benchmark dataset, the approach is demonstrated to give better homology detection and yield alignments with improved accuracy in comparison to an existing state-of-the-art dynamic programming profile-profile comparison method. This method also runs significantly faster and is therefore suitable for a server covering the rapidly increasing structure database. A server based on this method is available at http://liao.cis.udel.edu/website/servers/modmod 相似文献
4.
Messenger RNA sequences possess specific nucleotide patterns distinguishing them from non-coding genomic sequences. In this study, we explore the utilization of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. In order to analyze nucleotide sequences of this length, their information content is first reduced by conversion into shorter binary patterns via the application of numerous abstraction schemes. After the conversion of genomic sequences to binary strings, homogenous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code such as ncRNAs and 5'-untranslated regions. 相似文献
5.
6.
We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods. 相似文献
7.
Vergne N 《Statistical applications in genetics and molecular biology》2008,7(1):Article6
In this article, we introduce the drifting Markov models (DMMs) which are inhomogeneous Markov models designed for modeling the heterogeneities of sequences (in our case DNA or protein sequences) in a more flexible way than homogeneous Markov chains or even hidden Markov models (HMMs). We focus here on the polynomial drift: the transition matrix varies in a polynomial way. To show the reliability of our models on DNA, we exhibit high similarities between the probability distributions of nucleotides obtained by our models and the frequencies of these nucleotides computed by using a sliding window. In a further step, these DMMs can be used as the states of an HMM: on each of its segments, the observed process can be modeled by a drifting Markov model. Search of rare words in DNA sequences remains possible with DMMs and according to the fits provided, DMMs turn out to be a powerful tool for this purpose. The software is available on request from the author. It will soon be integrated on seq++ library (http://stat.genopole.cnrs.fr/seqpp/). 相似文献
8.
Identifying and characterizing the structure in genome sequences is one of the principal challenges in modern molecular biology, and comparative genomics offers a powerful tool. In this paper, we introduce a hidden Markov model that allows a comparative analysis of multiple sequences related by a phylogenetic tree, and we present an efficient method for estimating the parameters of the model. The model integrates structure prediction methods for one sequence, statistical multiple alignment methods, and phylogenetic information. This unified model is particularly useful for a detailed characterization of DNA sequences with a common gene. We illustrate the model on a variety of homologous sequences. 相似文献
9.
10.
Cost-efficient study designs for binary response data with Gaussian covariate measurement error. 总被引:2,自引:0,他引:2
When mismeasurement of the exposure variable is anticipated, epidemiologic cohort studies may be augmented to include a validation study, where a small sample of data relating the imperfect exposure measurement method to the better method is collected. Optimal study designs (i.e., least expensive subject to specified power constraints) are developed that give the overall sample size and proportion of the overall sample size allocated to the validation study. If better exposure measurements can be collected on a sample of subjects, an optimal design can be suggested that conforms to realistic budgetary constraints. The properties of three designs--those that include an internal validation study, those where the validated subsample is derived from subjects external to the primary investigation, and those that use the better method of exposure assessment on all subjects--are compared. The proportion of overall study resources allocated to the validation substudy increases with increasing sample disease frequency, decreasing unit cost of the superior exposure measurement relative to the imperfect one, increasing unit cost of outcome ascertainment, increasing distance between two alternative values of the relative risk between which the study is designed to discriminate, and increasing magnitude of hypothesized values. This proportion also depends in a nonlinear fashion on the severity of measurement error, and when the validation study is internal, measurement error reaches a point after which the optimal design is the smaller, fully validated one. 相似文献
11.
12.
Summary . Joint models are formulated to investigate the association between a primary endpoint and features of multiple longitudinal processes. In particular, the subject-specific random effects in a multivariate linear random-effects model for multiple longitudinal processes are predictors in a generalized linear model for primary endpoints. Li, Zhang, and Davidian (2004, Biometrics 60 , 1–7) proposed an estimation procedure that makes no distributional assumption on the random effects but assumes independent within-subject measurement errors in the longitudinal covariate process. Based on an asymptotic bias analysis, we found that their estimators can be biased when random effects do not fully explain the within-subject correlations among longitudinal covariate measurements. Specifically, the existing procedure is fairly sensitive to the independent measurement error assumption. To overcome this limitation, we propose new estimation procedures that require neither a distributional or covariance structural assumption on covariate random effects nor an independence assumption on within-subject measurement errors. These new procedures are more flexible, readily cover scenarios that have multivariate longitudinal covariate processes, and can be implemented using available software. Through simulations and an analysis of data from a hypertension study, we evaluate and illustrate the numerical performances of the new estimators. 相似文献
13.
Markov chain models for threshold exceedances 总被引:7,自引:0,他引:7
14.
We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that, under appropriate conditions, the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of multivariate Gaussian hidden Markov models, which we use to illustrate the applicability of our general results. Finally, the methodology is illustrated by means of a real data set. 相似文献
15.
16.
Lunter G 《Bioinformatics (Oxford, England)》2007,23(18):2485-2487
17.
18.
Rathouz PJ 《Biostatistics (Oxford, England)》2007,8(2):345-356
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem. 相似文献
19.
This paper studies a class of Poisson mixture models that includes covariates in rates. This model contains Poisson regression and independent Poisson mixtures as special cases. Estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, a model selection procedure, residual analysis, and goodness-of-fit test are discussed. A Monte Carlo study investigates implementation and model choice issues. This methodology is used to analyze seizure frequency and Ames salmonella assay data. 相似文献
20.
On errors-in-variables for binary regression models 总被引:2,自引:0,他引:2
CARROLL RAYMOND J.; SPIEGELMAN CLIFFORD H.; LAN K. K. GORDON; BAILEY KENT T.; ABBOTT ROBERT D. 《Biometrika》1984,71(1):19-25