
1.

A context dependent pair hidden Markov model for statistical alignment

Arribas-Gil A, Matias C. Statistical Applications in Genetics and Molecular Biology. 2012, 11(1): Article 5

This article proposes a novel approach to the statistical alignment of nucleotide sequences by introducing a context-dependent structure on the substitution process in the underlying evolutionary model. We propose to estimate alignments and context-dependent mutation rates from the observation of two homologous sequences. The procedure is based on a generalized pair hidden Markov structure in which, conditional on the alignment path, the nucleotide sequences follow a Markov distribution. We use a stochastic approximation expectation maximization (SAEM) algorithm to obtain accurate estimators of parameters and alignments. We provide results on both simulated data and vertebrate genomes, which are known to have a high mutation rate from the CG dinucleotide. In particular, we establish that the method improves the accuracy of the alignment of a human pseudogene and its functional gene.
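As a rough illustration of the pair hidden Markov structure mentioned in the abstract, the sketch below runs the forward recursion of a basic, context-independent pair HMM with match, insert and delete states. It is not the authors' context-dependent model and does not implement SAEM. The function name `pair_hmm_likelihood`, the gap parameters `delta` and `epsilon`, and all emission probabilities are assumptions chosen only for the example.

```python
def pair_hmm_likelihood(x, y, delta=0.1, epsilon=0.3):
    """Sum the probabilities of all alignment paths of x and y.

    Basic three-state pair HMM (assumed, not the paper's model):
      M emits an aligned pair, X emits a symbol of x only, Y a symbol of y only.
    delta is the gap-opening probability, epsilon the gap-extension probability.
    """
    def p_pair(a, b):
        # Hypothetical joint emission: 4 * 0.22 + 12 * 0.01 = 1 over DNA pairs.
        return 0.22 if a == b else 0.01

    q_gap = 0.25  # hypothetical uniform emission in the gap states

    n, m = len(x), len(y)
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # paths start in a virtual match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                f_m[i][j] = p_pair(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta) * f_m[i - 1][j - 1]
                    + (1 - epsilon) * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:
                f_x[i][j] = q_gap * (delta * f_m[i - 1][j] + epsilon * f_x[i - 1][j])
            if j > 0:
                f_y[i][j] = q_gap * (delta * f_m[i][j - 1] + epsilon * f_y[i][j - 1])

    return f_m[n][m] + f_x[n][m] + f_y[n][m]


same = pair_hmm_likelihood("ACGT", "ACGT")
different = pair_hmm_likelihood("ACGT", "TGCA")
```

Under these assumed parameters, `same` is larger than `different`, since identical sequences admit a high-probability all-match path. A context-dependent variant would replace `p_pair` with an emission that also conditions on neighbouring nucleotides.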

2.

Robust depth-based estimation in the time warping model

Arribas-Gil A, Romo J. Biostatistics (Oxford, England). 2012, 13(3): 398-414

In functional data analysis, the time warping model aims to represent a set of curves exhibiting phase and amplitude variation with respect to a common continuous process. Many biological processes, when observed over time across different individuals, fit this framework. The observed curves are modeled as the composition of an "amplitude process," which governs the common behavior, and a "warping process," which induces time distortion among the individuals. We aim to characterize the former. Because of the phase variation present among the curves, classical sample statistics computed on the observed sample provide poor representations of the amplitude process. Existing methods to estimate the mean behavior of the amplitude process consist of aligning the curves, that is, eliminating time variation, before estimation. However, since they rely on sample means, they are very sensitive to the presence of outliers. In this article, we propose the use of a functional depth-based median as a robust estimator of the central behavior of the amplitude process. We investigate its properties in the time warping model and evaluate its performance in several simulation studies, where we compare it to existing estimators and show its robustness against atypical observations. Finally, we illustrate its use on a yeast time course microarray data set.
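To make the idea of a depth-based median concrete, here is a minimal sketch that assumes the modified band depth (with bands formed by pairs of curves) as the depth notion; the abstract does not say which depth the authors use, and the sketch ignores the warping step entirely. The function names and the toy curves are hypothetical.

```python
from itertools import combinations


def modified_band_depth(curves):
    """Modified band depth of each curve, using bands of two curves.

    curves: list of equal-length lists sampled on a common time grid.
    For each curve, average over all pairs of curves the fraction of
    time points at which the curve lies inside the band of that pair.
    """
    n_points = len(curves[0])
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for x in curves:
        total = 0.0
        for j, k in pairs:
            inside = sum(
                1
                for t in range(n_points)
                if min(curves[j][t], curves[k][t]) <= x[t] <= max(curves[j][t], curves[k][t])
            )
            total += inside / n_points
        depths.append(total / len(pairs))
    return depths


def depth_median(curves):
    """Return the deepest curve of the sample (the depth-based median)."""
    depths = modified_band_depth(curves)
    deepest = max(range(len(curves)), key=lambda i: depths[i])
    return curves[deepest]


def pointwise_mean(curves):
    """Classical cross-sectional mean, shown for comparison."""
    n = len(curves)
    return [sum(c[t] for c in curves) / n for t in range(len(curves[0]))]


curves = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.1, 1.1, 2.1, 1.1, 0.1],
    [0.2, 1.2, 2.2, 1.2, 0.2],
    [-0.1, 0.9, 1.9, 0.9, -0.1],
    [50.0, 50.0, 50.0, 50.0, 50.0],  # one atypical curve
]
median_curve = depth_median(curves)
mean_curve = pointwise_mean(curves)
```

In this toy sample the deepest curve is the second one, which sits in the middle of the four typical curves, whereas the pointwise mean is pulled far upward by the single atypical curve.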

3.

Statistical Alignment with a Sequence Evolution Model Allowing Rate Heterogeneity along the Sequence

Arribas-Gil Ana, Metzler Dirk, Plouhinec Jean-Louis. IEEE/ACM Transactions on Computational Biology and Bioinformatics / IEEE, ACM. 2009, 6(2): 281-295

We present a stochastic sequence evolution model to obtain alignments and estimate mutation rates between two homologous sequences. The model allows two possible evolutionary behaviors along a DNA sequence in order to determine conserved regions and take the sequence's heterogeneity into account. In our model, the sequence is divided into slow and fast evolution regions. The boundaries between these regions are unknown, and our aim is to detect them. The evolution model is based on a fragment insertion and deletion process acting on fast regions only and on a substitution process acting on both fast and slow regions with different rates. This model induces a pair hidden Markov structure at the level of alignments, thus making efficient statistical alignment algorithms possible. We propose two complementary estimation methods, namely, a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve the sampling of alignments. We propose a partial alignment sampler, which is computationally less expensive than the typical whole alignment sampler. We show the convergence of the two estimation algorithms when used with this partial sampler. Our algorithms provide consistent estimates for the mutation rates and plausible alignments and sequence segmentations on both simulated and real data.