
1.

Continuous time Markov models for binary longitudinal data

Jones RH Xu S Grunwald GK 《Biometrical journal. Biometrische Zeitschrift》2006,48(3):411-419

Longitudinal data usually consist of a number of short time series. A group of subjects or groups of subjects are followed over time and observations are often taken at unequally spaced time points, and may be at different times for different subjects. When the errors and random effects are Gaussian, the likelihood of these unbalanced linear mixed models can be directly calculated, and nonlinear optimization used to obtain maximum likelihood estimates of the fixed regression coefficients and parameters in the variance components. For binary longitudinal data, a two state, non-homogeneous continuous time Markov process approach is used to model serial correlation within subjects. Formulating the model as a continuous time Markov process allows the observations to be equally or unequally spaced. Fixed and time varying covariates can be included in the model, and the continuous time model allows the estimation of the odds ratio for an exposure variable based on the steady state distribution. Exact likelihoods can be calculated. The initial probability distribution on the first observation on each subject is estimated using logistic regression that can involve covariates, and this estimation is embedded in the overall estimation. These models are applied to an intervention study designed to reduce children's sun exposure.
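As an illustration of why a continuous time formulation handles unequally spaced observations, the following is a minimal sketch of a two-state Markov process likelihood. It is a simplification, not the authors' model: the rates are held constant (homogeneous), there are no covariates, and the initial probability is passed in directly rather than estimated by logistic regression. The names `rate_01`, `rate_10`, and `p_initial_1` are hypothetical.

```python
import math


def two_state_transition(rate_01, rate_10, dt):
    """Transition matrix P(dt) of a homogeneous two-state continuous time
    Markov chain with rate_01 (state 0 -> 1) and rate_10 (state 1 -> 0)."""
    total = rate_01 + rate_10
    decay = math.exp(-total * dt)
    pi1 = rate_01 / total  # steady-state probability of state 1
    pi0 = rate_10 / total  # steady-state probability of state 0
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def log_likelihood(times, states, rate_01, rate_10, p_initial_1):
    """Exact log-likelihood of one subject's binary series observed at
    arbitrary (possibly unequally spaced) times."""
    ll = math.log(p_initial_1 if states[0] == 1 else 1.0 - p_initial_1)
    for k in range(1, len(times)):
        # The gap between visits enters only through dt, so spacing is free.
        P = two_state_transition(rate_01, rate_10, times[k] - times[k - 1])
        ll += math.log(P[states[k - 1]][states[k]])
    return ll


def steady_state_odds(rate_01, rate_10):
    """Odds of state 1 under the steady-state distribution."""
    return rate_01 / rate_10


# Hypothetical subject observed at irregular times.
example_ll = log_likelihood([0.0, 0.5, 2.0, 2.2], [0, 1, 1, 0], 0.4, 0.2, 0.3)
```

Under these assumptions, an odds ratio for an exposure variable would be the ratio of `steady_state_odds` computed with the rates of the exposed and unexposed groups.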

2.

Clustering metagenomic sequences with interpolated Markov models

David R Kelley Steven L Salzberg 《BMC bioinformatics》2010,11(1):544

Background

Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects.

3.

Quasi-consensus-based comparison of profile hidden Markov models for protein sequences (total citations: 2; self-citations: 0; citations by others: 2)

Kahsay RY Wang G Gao G Liao L Dunbrack R 《Bioinformatics (Oxford, England)》2005,21(10):2287-2293

A simple approach for the sensitive detection of distant relationships among protein families and for sequence-structure alignment via comparison of hidden Markov models based on their quasi-consensus sequences is presented. Using a previously published benchmark dataset, the approach is demonstrated to give better homology detection and yield alignments with improved accuracy in comparison to an existing state-of-the-art dynamic programming profile-profile comparison method. This method also runs significantly faster and is therefore suitable for a server covering the rapidly increasing structure database. A server based on this method is available at http://liao.cis.udel.edu/website/servers/modmod

4.

Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models

Shepard SS McSweeny A Serpen G Fedorov A 《Nucleic acids research》2012,40(11):4765-4773

Messenger RNA sequences possess specific nucleotide patterns distinguishing them from non-coding genomic sequences. In this study, we explore the utilization of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. In order to analyze nucleotide sequences of this length, their information content is first reduced by conversion into shorter binary patterns via the application of numerous abstraction schemes. After the conversion of genomic sequences to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code such as ncRNAs and 5'-untranslated regions.
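The general idea of abstracting nucleotides to binary symbols and then scoring with two Markov models can be sketched as follows. This is only an illustration under stated assumptions: the purine/pyrimidine mapping `PURINE_SCHEME`, the model order, the pseudocount, and the toy training strings are all hypothetical and are not the abstraction schemes or settings selected in the paper, and the support vector machine combination step is omitted.

```python
import math
from collections import defaultdict

# Hypothetical abstraction scheme: purines -> 1, pyrimidines -> 0.
PURINE_SCHEME = {"A": "1", "G": "1", "C": "0", "T": "0"}


def abstract(seq, scheme=PURINE_SCHEME):
    """Convert a nucleotide string into a binary string."""
    return "".join(scheme[base] for base in seq.upper())


def train_markov(binary_seqs, order=2, pseudocount=1.0):
    """Estimate a homogeneous Markov model of the given order on binary strings."""
    counts = defaultdict(lambda: {"0": pseudocount, "1": pseudocount})
    for s in binary_seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0
    model = {}
    for context, c in counts.items():
        total = c["0"] + c["1"]
        model[context] = {"0": c["0"] / total, "1": c["1"] / total}
    return model


def log_likelihood(model, s, order=2):
    """Log-likelihood of a binary string; unseen contexts fall back to 0.5."""
    ll = 0.0
    for i in range(order, len(s)):
        probs = model.get(s[i - order:i], {"0": 0.5, "1": 0.5})
        ll += math.log(probs[s[i]])
    return ll


def classify(seq, exon_model, intron_model, order=2):
    """Label a sequence by the sign of the log-likelihood ratio."""
    b = abstract(seq)
    score = log_likelihood(exon_model, b, order) - log_likelihood(intron_model, b, order)
    return "exon" if score > 0 else "intron"


# Toy training data, for illustration only.
exon_model = train_markov([abstract("AGAGAGAGAG")])
intron_model = train_markov([abstract("CTCTCTCTCT")])
```

Because the alphabet shrinks from four symbols to two, a model of a given order needs far fewer parameters, which is what makes longer contexts feasible.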

5.

Bivariate models for dependence of angular observations and a related Markov process (total citations: 2; self-citations: 0; citations by others: 2)

WEHRLY THOMAS E.; JOHNSON RICHARD A. 《Biometrika》1980,67(1):255-256


6.

Semiparametric models for missing covariate and response data in regression models

Chen Q Ibrahim JG 《Biometrics》2006,62(1):177-184

We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.

7.

Drifting Markov models with polynomial drift and applications to DNA sequences

Vergne N 《Statistical applications in genetics and molecular biology》2008,7(1):Article6

In this article, we introduce the drifting Markov models (DMMs), which are inhomogeneous Markov models designed for modeling the heterogeneities of sequences (in our case DNA or protein sequences) in a more flexible way than homogeneous Markov chains or even hidden Markov models (HMMs). We focus here on the polynomial drift: the transition matrix varies in a polynomial way. To show the reliability of our models on DNA, we exhibit high similarities between the probability distributions of nucleotides obtained by our models and the frequencies of these nucleotides computed by using a sliding window. In a further step, these DMMs can be used as the states of an HMM: on each of its segments, the observed process can be modeled by a drifting Markov model. Searching for rare words in DNA sequences remains possible with DMMs and, according to the fits provided, DMMs turn out to be a powerful tool for this purpose. The software is available on request from the author. It will soon be integrated into the seq++ library (http://stat.genopole.cnrs.fr/seqpp/).
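A transition matrix that "varies in a polynomial way" can be illustrated with the simplest case, a degree-1 (linear) drift, where the matrix at position t of a length-n sequence is interpolated between a start matrix and an end matrix. The sketch below assumes that form; the two-letter alphabet, the matrices `P_START` and `P_END`, and the function names are hypothetical and are not taken from the author's software.

```python
def drifting_matrix(P_start, P_end, t, n):
    """Transition matrix at position t (0 <= t <= n) under linear drift:
    (1 - t/n) * P_start + (t/n) * P_end."""
    w = t / n
    return [
        [(1.0 - w) * a + w * b for a, b in zip(row_start, row_end)]
        for row_start, row_end in zip(P_start, P_end)
    ]


def propagate(initial, P_start, P_end, n):
    """Marginal state distribution at each position 0..n of the sequence."""
    dist = list(initial)
    history = [dist]
    for t in range(1, n + 1):
        P = drifting_matrix(P_start, P_end, t, n)
        # Row vector times matrix: new_j = sum_i dist_i * P[i][j].
        dist = [sum(dist[i] * P[i][j] for i in range(len(dist)))
                for j in range(len(P[0]))]
        history.append(dist)
    return history


# Hypothetical two-letter alphabet whose composition drifts along the sequence.
P_START = [[0.9, 0.1], [0.8, 0.2]]
P_END = [[0.2, 0.8], [0.1, 0.9]]
marginals = propagate([0.5, 0.5], P_START, P_END, 10)
```

Each interpolated matrix remains a valid stochastic matrix because it is a convex combination of two stochastic matrices. The list `marginals` plays the role of the model-based letter distributions that the abstract compares with sliding-window frequencies.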

8.

Applications of hidden Markov models for characterization of homologous DNA sequences with a common gene.

Asger Hobolth Jens Ledet Jensen 《Journal of computational biology》2005,12(2):186-203

Identifying and characterizing the structure in genome sequences is one of the principal challenges in modern molecular biology, and comparative genomics offers a powerful tool. In this paper, we introduce a hidden Markov model that allows a comparative analysis of multiple sequences related by a phylogenetic tree, and we present an efficient method for estimating the parameters of the model. The model integrates structure prediction methods for one sequence, statistical multiple alignment methods, and phylogenetic information. This unified model is particularly useful for a detailed characterization of DNA sequences with a common gene. We illustrate the model on a variety of homologous sequences.

9.

Random-effects models for binary responses

D Gianola R L Fernando 《Biometrics》1986,42(1):217-218


10.

Cost-efficient study designs for binary response data with Gaussian covariate measurement error. (total citations: 2; self-citations: 0; citations by others: 2)

D Spiegelman R Gray 《Biometrics》1991,47(3):851-869

When mismeasurement of the exposure variable is anticipated, epidemiologic cohort studies may be augmented to include a validation study, where a small sample of data relating the imperfect exposure measurement method to the better method is collected. Optimal study designs (i.e., least expensive subject to specified power constraints) are developed that give the overall sample size and proportion of the overall sample size allocated to the validation study. If better exposure measurements can be collected on a sample of subjects, an optimal design can be suggested that conforms to realistic budgetary constraints. The properties of three designs (those that include an internal validation study, those where the validated subsample is derived from subjects external to the primary investigation, and those that use the better method of exposure assessment on all subjects) are compared. The proportion of overall study resources allocated to the validation substudy increases with increasing sample disease frequency, decreasing unit cost of the superior exposure measurement relative to the imperfect one, increasing unit cost of outcome ascertainment, increasing distance between two alternative values of the relative risk between which the study is designed to discriminate, and increasing magnitude of hypothesized values. This proportion also depends in a nonlinear fashion on the severity of measurement error, and when the validation study is internal, measurement error reaches a point after which the optimal design is the smaller, fully validated one.

11.

Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models

Jonas Maaskola Nikolaus Rajewsky 《Nucleic acids research》2014,42(21):12995-13011


12.

Joint models for a primary endpoint and multiple longitudinal covariate processes

Li E Wang N Wang NY 《Biometrics》2007,63(4):1068-1078

Joint models are formulated to investigate the association between a primary endpoint and features of multiple longitudinal processes. In particular, the subject-specific random effects in a multivariate linear random-effects model for multiple longitudinal processes are predictors in a generalized linear model for primary endpoints. Li, Zhang, and Davidian (2004, Biometrics 60, 1–7) proposed an estimation procedure that makes no distributional assumption on the random effects but assumes independent within-subject measurement errors in the longitudinal covariate process. Based on an asymptotic bias analysis, we found that their estimators can be biased when random effects do not fully explain the within-subject correlations among longitudinal covariate measurements. Specifically, the existing procedure is fairly sensitive to the independent measurement error assumption. To overcome this limitation, we propose new estimation procedures that require neither a distributional or covariance structural assumption on covariate random effects nor an independence assumption on within-subject measurement errors. These new procedures are more flexible, readily cover scenarios that have multivariate longitudinal covariate processes, and can be implemented using available software. Through simulations and an analysis of data from a hypertension study, we evaluate and illustrate the numerical performances of the new estimators.

13.

Markov chain models for threshold exceedances (total citations: 7; self-citations: 0; citations by others: 7)

SMITH RICHARD L.; TAWN JONATHAN A.; COLES STUART G. 《Biometrika》1997,84(2):249-268


14.

Likelihood-ratio tests for hidden Markov models

Giudici P Rydén T Vandekerkhove P 《Biometrics》2000,56(3):742-747

We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that, under appropriate conditions, the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of multivariate Gaussian hidden Markov models, which we use to illustrate the applicability of our general results. Finally, the methodology is illustrated by means of a real data set.

15.

Dynamical reweighting methods for Markov models

《Current opinion in structural biology》2020


16.

HMMoC--a compiler for hidden Markov models

Lunter G 《Bioinformatics (Oxford, England)》2007,23(18):2485-2487


17.

Markov encoding for detecting signals in genomic sequences

Rajapakse JC Ho LS 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(2):131-142


18.

Identifiability assumptions for missing covariate data in failure time regression models

Rathouz PJ 《Biostatistics (Oxford, England)》2007,8(2):345-356

Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.

19.

Mixed Poisson regression models with covariate dependent rates

Wang P Puterman ML Cockburn I Le N 《Biometrics》1996,52(2):381-400

This paper studies a class of Poisson mixture models that includes covariates in rates. This model contains Poisson regression and independent Poisson mixtures as special cases. Estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, a model selection procedure, residual analysis, and goodness-of-fit test are discussed. A Monte Carlo study investigates implementation and model choice issues. This methodology is used to analyze seizure frequency and Ames salmonella assay data.

20.

On errors-in-variables for binary regression models (total citations: 2; self-citations: 0; citations by others: 2)

CARROLL RAYMOND J.; SPIEGELMAN CLIFFORD H.; LAN K. K. GORDON; BAILEY KENT T.; ABBOTT ROBERT D. 《Biometrika》1984,71(1):19-25
