Peptide Microarray Immunoassay (PMI for brevity) is a novel technology that enables researchers to map a large number of proteomic measurements at the peptide level, providing information on the relationship between antibody response and clinical sensitivity. PMI studies aim to recognize antigen-specific antibodies in serum samples and to detect epitope regions of the protein antigen. PMI data pose new challenges for statistical analysis, mainly because of the structural dependence among peptides. A PMI consists of a complete library of consecutive peptides, synthesized by systematically shifting a window of a fixed number of amino acids along the antigen's amino-acid sequence, in the order given by the protein's primary structure. Consecutive peptides therefore share a certain number of amino acids and are structurally dependent. We propose a new, flexible Bayesian hierarchical modeling framework that detects recognized peptides and bound epitope regions jointly, taking the structural dependence between peptides into account through a suitable latent Markov structure. The proposed model is illustrated using PMI data from a recent study of egg allergy. A simulation study shows that the proposed model is more powerful and robust for epitope detection than simpler models that ignore part of the dependence structure.
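The sliding-window construction of a PMI library can be sketched in a few lines of Python. The window length, the shift, and the toy sequence are hypothetical values chosen for illustration, not the design of the egg-allergy study. With a window of `length` residues moved `shift` positions at a time, consecutive peptides share `length - shift` residues, which is the structural dependence the model has to account for.

```python
def peptide_library(protein: str, length: int, shift: int = 1) -> list[str]:
    """Slide a window of `length` residues along `protein`, moving `shift` positions per peptide."""
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    return [protein[i:i + length] for i in range(0, len(protein) - length + 1, shift)]


def shared_residues(length: int, shift: int) -> int:
    """Number of amino acids shared by two consecutive peptides in the library."""
    return max(length - shift, 0)


if __name__ == "__main__":
    toy = "MRVLKAFTEQ"  # hypothetical sequence, not a real allergen
    print(peptide_library(toy, length=5, shift=2))  # ['MRVLK', 'VLKAF', 'KAFTE']
    print(shared_residues(5, 2))  # 3
```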

MOTIVATION: Cellular processes cause changes over time, and observing and measuring those changes gives insight into how and why regulation happens. Microarray technology provides the experimental platform for the large-scale experiments needed to obtain time courses of expression levels. However, the proper way to analyze the resulting time-course data is still very much under investigation. The inherent time dependencies in the data suggest that clustering techniques which reflect those dependencies should perform better. RESULTS: We propose to use Hidden Markov Models (HMMs) to account for the horizontal dependencies along the time axis in time-course data and to cope with the prevalent errors and missing values. The HMMs are used within a model-based clustering framework. Given a number of clusters, each represented by one HMM from a finite collection encompassing typical qualitative behavior, our method iteratively finds cluster models and an assignment of data points to these models that maximize the joint likelihood of clustering and models. Partially supervised learning (adding groups of labeled data to the initial collection of clusters) is supported. A graphical user interface allows an expression-profile dataset to be queried for time courses similar to a prototype defined graphically as a sequence of levels and durations. We also propose a heuristic approach to automatically determine the number of clusters. We evaluate the method on published yeast cell-cycle and fibroblast serum-response datasets and compare it, with favorable results, to the autoregressive curves method.
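The assignment half of the iterative procedure can be sketched as follows, assuming each cluster is summarized by a fixed HMM with one Gaussian emission per state: every time course is scored with the forward algorithm (in log space) and assigned to the model with the highest likelihood. The method described above alternates this step with re-estimating the cluster models; that re-estimation, missing-value handling, and the exact emission family are not shown, and the two toy models (`flat`, `rising`) are hypothetical.

```python
import math


def safe_log(p):
    # log(0) = -inf, so forbidden transitions (e.g. in left-to-right models) are allowed
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_likelihood(series, model):
    """Forward algorithm in log space for an HMM with one Gaussian emission per state."""
    init, trans, means, sds = model
    k = len(init)
    alpha = [safe_log(init[j]) + log_gauss(series[0], means[j], sds[j]) for j in range(k)]
    for x in series[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(k)])
                 + log_gauss(x, means[j], sds[j]) for j in range(k)]
    return logsumexp(alpha)


def assign(series_list, models):
    """Index of the maximum-likelihood model for each time course."""
    return [max(range(len(models)), key=lambda m: log_likelihood(s, models[m])) for s in series_list]


# Hypothetical two-state models: "flat" stays at level 0; "rising" moves from level 0 to level 2.
flat = ([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.5, 0.5])
rising = ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [0.0, 2.0], [0.5, 0.5])
print(assign([[0.1, -0.1, 0.0, 0.2], [0.0, 0.1, 2.1, 1.9]], [flat, rising]))  # [0, 1]
```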

Time-course microarray experiments designed to characterize the dynamic regulation of gene expression in biological systems are becoming increasingly important. One critical issue in examining time-course microarray data is identifying genes that show different temporal expression patterns across biological conditions. Here we propose a Bayesian hierarchical model that incorporates important experimental factors and accounts for gene expression measurements that are correlated over time and across genes. We also present a new gene selection algorithm, used with the model, that simultaneously identifies genes whose expression changes across biological conditions, over time, and in response to other experimental factors of interest. In simulation studies, the algorithm performs well in terms of false positive and false negative rates. The methodology is applied to a mouse-model time-course experiment to correlate temporal changes in azoxymethane-induced gene expression profiles with colorectal cancer susceptibility.

Electronic telemetry is frequently used to document animal movement through time. Methods that can identify the underlying behaviors driving specific movement patterns can help us understand how and why animals use available space, thereby aiding conservation and management efforts. For aquatic animal tracking data with significant measurement error, a Bayesian state-space model called the first-Difference Correlated Random Walk with Switching (DCRWS) has often been used for this purpose. However, highly accurate tracking data for aquatic animals are now becoming more common. We developed a new hidden Markov model (HMM), the hidden Markov movement model (HMMM), for identifying behavioral states from animal tracks with negligible error. The HMMM is built on the process equation of the DCRWS, but is fitted by maximum likelihood using the R package TMB, which allows rapid model fitting. We compared the HMMM with a version of the DCRWS modified for highly accurate tracks, with the original DCRWS, and with a common HMM for animal tracks fitted with the R package moveHMM. By fitting it to simulated data and to real tracks from a grey seal, a lake trout, and a blue shark, we show that the HMMM is both accurate and suitable for multiple species. The HMMM is a fast and reliable tool for making meaningful inference from animal movement data, ideally suited to ecologists who want to use the popular DCRWS implementation and have highly accurate tracking data. It also provides groundwork for developing more complex models of animal movement with TMB. To facilitate its uptake, we make it available through the R package swim.
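In its commonly cited form, the DCRWS process equation models the first difference of location, d_t = x_t - x_{t-1}, as d_t = gamma_b T(theta_b) d_{t-1} + eta_t, where b is the current behavioral state, gamma_b is a move-persistence parameter, T(theta_b) rotates the previous step by the mean turning angle theta_b, and eta_t is Gaussian noise. The sketch below only simulates a switching track from that equation, assuming isotropic noise with standard deviation `sigma`; it does not reproduce the TMB-based fitting or the interface of the swim package, and the function name and parameter values are hypothetical.

```python
import math
import random


def simulate_dcrws(n_steps, gammas, thetas, trans, sigma, d0=(1.0, 0.0), seed=0):
    """Simulate locations from d_t = gamma_b * T(theta_b) d_{t-1} + eta_t with Markov switching of b."""
    rng = random.Random(seed)
    n_states = len(gammas)
    state, d, x = 0, list(d0), (0.0, 0.0)
    track, states = [x], [state]
    for _ in range(n_steps):
        # switch behavioral state according to row `state` of the transition matrix
        state = rng.choices(range(n_states), weights=trans[state])[0]
        g, c, s = gammas[state], math.cos(thetas[state]), math.sin(thetas[state])
        # rotate the previous step by theta, scale it by gamma, add isotropic Gaussian noise
        d = [g * (c * d[0] - s * d[1]) + rng.gauss(0.0, sigma),
             g * (s * d[0] + c * d[1]) + rng.gauss(0.0, sigma)]
        x = (x[0] + d[0], x[1] + d[1])
        track.append(x)
        states.append(state)
    return track, states


# Hypothetical states: 0 = persistent, nearly straight travel; 1 = low persistence with reversals.
track, states = simulate_dcrws(200, gammas=[0.9, 0.2], thetas=[0.0, math.pi],
                               trans=[[0.95, 0.05], [0.10, 0.90]], sigma=0.1)
```

Fitting an HMM to such a simulated track and checking how well it recovers `states` is one simple way to explore the kind of accuracy comparison the abstract reports.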

Determining whether a gene is differentially expressed between two samples remains an important statistical problem. Prior work in this area has used t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not behave consistently across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically.
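A pooled-variance t-test of the kind described above can be sketched as follows: each gene's sample variance is shrunk toward the average variance of genes with similar mean expression, and the shrunk variances enter a two-sample t statistic. The neighborhood size `window` and the pseudo-count `v0` are hypothetical hyperparameters, exactly the kind of heuristically chosen settings the abstract warns can bias such methods. The weighted-average shrinkage formula is one common choice, not necessarily the one used in the prior work cited.

```python
import math
import statistics


def pooled_variances(means, variances, n, window, v0):
    """Shrink each gene's variance toward the mean variance of the `window` genes with the closest means."""
    order = sorted(range(len(means)), key=lambda g: means[g])
    shrunk = [0.0] * len(means)
    for rank, gene in enumerate(order):
        # a block of `window` neighbors in mean order, clipped at both ends of the ranking
        lo = min(max(0, rank - window // 2), max(0, len(order) - window))
        prior = statistics.fmean(variances[i] for i in order[lo:lo + window])
        # weighted average: v0 acts as a pseudo-count of "prior" degrees of freedom
        shrunk[gene] = (v0 * prior + (n - 1) * variances[gene]) / (v0 + n - 1)
    return shrunk


def pooled_t(cond_a, cond_b, window=5, v0=4.0):
    """Two-sample t statistics; cond_a/cond_b hold one list of replicates per gene (>= 2 each)."""
    summaries = []
    for cond in (cond_a, cond_b):
        n = len(cond[0])  # assumes the same number of replicates for every gene
        means = [statistics.fmean(reps) for reps in cond]
        raw = [statistics.variance(reps) for reps in cond]
        summaries.append((means, pooled_variances(means, raw, n, window, v0), n))
    (m_a, v_a, n_a), (m_b, v_b, n_b) = summaries
    return [(m_a[g] - m_b[g]) / math.sqrt(v_a[g] / n_a + v_b[g] / n_b) for g in range(len(m_a))]
```

With `v0 = 0` the statistic reduces to an ordinary unpooled two-sample t; as `v0` grows, each gene's variance is replaced by its neighborhood average.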

Recent developments in microarray technology enable researchers to study the expression of thousands of genes simultaneously from one cell line or tissue sample. This technology is often used to assess changes in mRNA expression following a specified transfection of a cell line in order to identify target genes. In such experiments the range of differential expression is moderate, so teasing out the modified genes is challenging and calls for detailed modeling. The aim of this paper is to propose a methodological framework, based on a fully Bayesian mixture approach (Richardson and Green, 1997), for studies that investigate differential gene expression with microarray technology. A case study of genes differentially expressed between two cell lines (a normal line and one modified by gene transfection) illustrates the performance and usefulness of this approach.
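At the core of a mixture approach to differential expression is the posterior probability that a gene's log expression ratio belongs to each component. The sketch below computes these membership probabilities for a hypothetical three-component Gaussian mixture (down-regulated, unchanged, up-regulated) with fixed parameters. A fully Bayesian treatment in the style of Richardson and Green (1997) would instead sample the weights, the component parameters, and even the number of components by reversible-jump MCMC, which this sketch does not attempt.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def component_posteriors(x, weights, means, sds):
    """P(component k | x) = w_k f_k(x) / sum_j w_j f_j(x) for a Gaussian mixture with fixed parameters."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]


# Hypothetical components for a log expression ratio: down-regulated, unchanged, up-regulated.
weights, means, sds = [0.1, 0.8, 0.1], [-1.0, 0.0, 1.0], [0.3, 0.3, 0.3]
post = component_posteriors(1.5, weights, means, sds)
print(f"P(up-regulated | x = 1.5) = {post[2]:.3f}")
```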

We propose a hidden Markov model for multivariate continuous longitudinal responses with covariates that accounts for three types of missingness pattern: (I) partially missing outcomes at a given time occasion, (II) completely missing outcomes at a given time occasion (intermittent pattern), and (III) dropout before the end of the observation period (monotone pattern). The missing-at-random (MAR) assumption is adopted to handle the first two types of missingness, while informative dropout is accounted for by an extra absorbing state. Model parameters are estimated by maximum likelihood, using an expectation-maximization (EM) algorithm based on suitable recursions. The proposal is illustrated by a Monte Carlo simulation study and by an application to historical data on primary biliary cholangitis.
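How the three missingness patterns can enter the likelihood is sketched below with a scaled forward recursion. Assumptions for illustration: Gaussian emissions with diagonal covariance, so a partially missing vector contributes only its observed coordinates; an emission probability of 1 for a completely missing occasion, consistent with MAR; and an extra absorbing state, indexed `K`, that the unit enters at the dropout occasion. Covariates and the EM updates are omitted, and the function names and parameter values are hypothetical.

```python
import math


def diag_gauss(y, means, sds):
    """Density of the observed coordinates of y; None marks a missing coordinate, which is integrated out."""
    dens = 1.0
    for v, m, s in zip(y, means, sds):
        if v is not None:
            dens *= math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return dens  # equals 1.0 when every coordinate is missing


def forward_loglik(obs, init, trans, means, sds, dropout_at=None):
    """
    Log-likelihood of one unit. Latent states are 0..K-1; state K is the absorbing dropout state,
    so init and trans have K + 1 entries/rows and trans[K][K] == 1.
    dropout_at: index of the first occasion after dropout (>= 1), or None if the unit completes follow-up.
    """
    K = len(means)
    n_obs = len(obs) if dropout_at is None else dropout_at
    n_steps = n_obs + (0 if dropout_at is None else 1)
    loglik, alpha = 0.0, None
    for t in range(n_steps):
        if t < n_obs:
            # an observed (or intermittently missing) occasion cannot be in the dropout state
            e = [diag_gauss(obs[t], means[j], sds[j]) for j in range(K)] + [0.0]
        else:
            # the dropout occasion: only the absorbing state is compatible
            e = [0.0] * K + [1.0]
        if t == 0:
            a = [init[j] * e[j] for j in range(K + 1)]
        else:
            a = [e[j] * sum(alpha[i] * trans[i][j] for i in range(K + 1)) for j in range(K + 1)]
        c = sum(a)
        loglik += math.log(c)
        alpha = [x / c for x in a]
    return loglik
```

Because the dropout state is absorbing, occasions after dropout add nothing further to the likelihood, so the recursion can stop at the dropout occasion.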

In the decade since their invention, spotted microarrays have undergone technical advances that have increased their utility, scope and precision in measuring gene expression. At the same time, more researchers are taking advantage of the fundamentally quantitative nature of these tools through refined experimental designs and sophisticated statistical analyses. These new approaches utilise the power of microarrays to estimate differences in gene expression levels, rather than just categorising genes as up- or down-regulated, and allow expression data to be compared across multiple samples. This review highlights some of the technical aspects of spotted microarrays that can affect statistical inference and discusses how several methods for estimating gene expression levels across multiple samples deal with these challenges. The focus is on a Bayesian analysis method, BAGEL, which is easy to implement and produces easily interpreted results.

Protein-protein interactions play a defining role in protein function. Identifying the interaction sites of a protein is critical for understanding its functional mechanisms and for drug design. To predict the sites within a protein chain that participate in protein complexes, we have developed a novel method based on a hidden Markov model that combines several biological characteristics of the sequence neighboring a target residue: structural information, accessible surface area, and transition probabilities among amino acids. Evaluated by 5-fold cross-validation on 139 unique proteins, the method achieved a precision of 66% and a recall of 61% in identifying interfaces. These results are better than those achieved by other methods for interface identification.
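Assuming the metrics are computed per residue, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN count true-positive, false-positive, and false-negative interface residues. The sketch below computes both, plus their harmonic mean (F1), from per-residue labels; the function name and toy labels are hypothetical. Applied to the reported figures, F1 = 2(0.66)(0.61) / (0.66 + 0.61), or about 0.63.

```python
def interface_metrics(predicted, actual):
    """Precision, recall and F1 from per-residue labels (True = predicted / actual interface residue)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for a five-residue stretch.
print(interface_metrics([True, True, False, False, True], [True, False, True, False, True]))
```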

Exposure to air pollution is associated with increased morbidity and mortality. Recent technological advances permit the collection of time-resolved personal exposure data. Such data are often incomplete, with missing observations and exposures below the limit of detection, which limits their use in health effects studies. In this paper, we develop an infinite hidden Markov model for multiple asynchronous multivariate time series with missing data. Our model is designed to include covariates that can inform transitions among hidden states. We implement beam sampling, a combination of slice sampling and dynamic programming, to sample the hidden states, and a Bayesian multiple imputation algorithm to impute missing data. In simulation studies, our model excels at estimating hidden states and state-specific means and at imputing observations that are missing at random or below the limit of detection. We validate our imputation approach on data from the Fort Collins Commuter Study and show that the estimated hidden states improve imputations for data that are missing at random compared with existing approaches. In a case study of the Fort Collins Commuter Study, we describe the inferential gains obtained from our model, including improved imputation of missing data and the ability to identify shared patterns in activity and exposure among repeated sampling days for individuals and among distinct individuals.
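The two ingredients of beam sampling can be sketched for a finite-state HMM. For the current hidden path, a slice variable u_t is drawn uniformly below the probability of each transition the path uses; only transitions above their slice level remain (in the infinite HMM this reduces the countably many states to a finite set at each sweep), and the path is then resampled by forward filtering and backward sampling over those transitions. In this sketch a fixed finite transition matrix stands in for the infinite model's stick-breaking construction, covariate-dependent transitions and missing-data imputation are omitted, and all names and values are hypothetical.

```python
import random


def sample_slices(states, init, trans, rng):
    """u_0 ~ U(0, init[s_0]) and u_t ~ U(0, trans[s_{t-1}][s_t]) along the current state path."""
    u = [init[states[0]] * rng.random()]
    for prev, cur in zip(states, states[1:]):
        u.append(trans[prev][cur] * rng.random())
    return u


def beam_resample(states, init, trans, emit, rng):
    """One beam-sampling update of the hidden path; emit[t][k] = p(y_t | state k)."""
    n_states, n_times = len(init), len(states)
    u = sample_slices(states, init, trans, rng)
    # Forward filtering: keep only transitions whose probability exceeds the slice level,
    # each with weight 1 (the slice variable absorbs the transition probability).
    filt = []
    for t in range(n_times):
        if t == 0:
            a = [emit[0][k] if init[k] > u[0] else 0.0 for k in range(n_states)]
        else:
            a = [emit[t][k] * sum(filt[-1][i] for i in range(n_states) if trans[i][k] > u[t])
                 for k in range(n_states)]
        z = sum(a)
        filt.append([x / z for x in a])
    # Backward sampling of a new path restricted to the same allowed transitions.
    new = [0] * n_times
    new[-1] = rng.choices(range(n_states), weights=filt[-1])[0]
    for t in range(n_times - 2, -1, -1):
        w = [filt[t][i] if trans[i][new[t + 1]] > u[t + 1] else 0.0 for i in range(n_states)]
        new[t] = rng.choices(range(n_states), weights=w)[0]
    return new, u
```

The current path always survives the truncation, so each forward pass has at least one nonzero filtered probability and the sampler can never get stuck with an empty state set.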

A hidden Markov model for progressive multiple alignment (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: Progressive algorithms are widely used heuristics for producing alignments of multiple nucleic-acid or protein sequences. Probabilistic approaches that provide measures of the global and/or local reliability of individual solutions would be valuable developments. RESULTS: We present a new method for multiple sequence alignment that combines an HMM approach, a progressive alignment algorithm, and a probabilistic evolution model describing the character substitution process. Our method iterates pairwise alignments according to a guide tree and defines each ancestral sequence from the pairwise alignment of its child nodes, thus progressively constructing a multiple alignment. It allows the minimum posterior probability of each column to be computed, and we show that this value correlates with the correctness of the result, providing an efficient means of filtering unreliably aligned columns out of a multiple alignment.
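Filtering unreliable columns by their minimum posterior probability amounts to a threshold on a per-column score. The sketch assumes that each column already comes with a list of posterior probabilities (how the method derives them is not shown), takes the smallest as the column's score, and drops columns below a threshold; the function names, the toy alignment, and the threshold of 0.5 are hypothetical.

```python
def column_min_posteriors(posteriors_by_column):
    """Reliability score of each column: the smallest posterior probability recorded for it."""
    return [min(col) for col in posteriors_by_column]


def filter_columns(alignment, scores, threshold):
    """Drop alignment columns whose score falls below `threshold`; rows are equal-length gapped strings."""
    keep = [c for c, s in enumerate(scores) if s >= threshold]
    return ["".join(row[c] for c in keep) for row in alignment]


alignment = ["AC-GT", "ACTGT", "A--GT"]
scores = column_min_posteriors([[0.99, 0.98], [0.97, 0.95], [0.40, 0.62], [0.93, 0.90], [0.97, 0.99]])
print(filter_columns(alignment, scores, threshold=0.5))  # ['ACGT', 'ACGT', 'A-GT']
```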

Surveillance systems that track health patterns in animals have potential for early warning of infectious disease in humans, yet many challenges remain before this potential can be realized. In particular, there remains the challenge of detecting early warning signals for diseases that are not known or are not part of routine surveillance for named diseases. This paper reports on the development of a hidden Markov model for analyzing frontline veterinary sentinel surveillance data from Sri Lanka. Field veterinarians collected data on syndromes and diagnoses using mobile phones. A model for submission patterns accounts for both sentinel-related and disease-related variability. Models for commonly reported cattle diagnoses were estimated separately. Region-specific weekly average prevalence was estimated for each diagnosis and partitioned into normal and abnormal periods. Visualization of state probabilities was used to indicate areas and times of unusual disease prevalence. The analysis suggests that hidden Markov modelling is a useful approach for surveillance datasets from novel populations and/or with little historical baseline data.
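Partitioning a weekly series into normal and abnormal periods, and visualizing state probabilities, rests on the posterior probability of each hidden state given the whole series, which the forward-backward algorithm computes. The sketch assumes a two-state HMM with Poisson-distributed weekly case counts and fixed, hypothetical parameters; the paper's model for sentinel submission patterns and its estimation of prevalence are not reproduced.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def state_probabilities(counts, init, trans, rates):
    """Posterior P(state_t = k | all counts) for an HMM with Poisson emissions (scaled forward-backward)."""
    n_states, n_times = len(init), len(counts)
    e = [[poisson_pmf(c, r) for r in rates] for c in counts]
    # forward pass: each alpha_t is normalized and its scale factor stored
    alphas, scales = [], []
    for t in range(n_times):
        if t == 0:
            a = [init[k] * e[0][k] for k in range(n_states)]
        else:
            a = [e[t][k] * sum(alphas[-1][i] * trans[i][k] for i in range(n_states))
                 for k in range(n_states)]
        scales.append(sum(a))
        alphas.append([x / scales[-1] for x in a])
    # backward pass using the same scale factors
    betas = [[1.0] * n_states for _ in range(n_times)]
    for t in range(n_times - 2, -1, -1):
        betas[t] = [sum(trans[i][j] * e[t + 1][j] * betas[t + 1][j] for j in range(n_states)) / scales[t + 1]
                    for i in range(n_states)]
    posts = []
    for a, b in zip(alphas, betas):
        g = [x * y for x, y in zip(a, b)]
        posts.append([x / sum(g) for x in g])
    return posts


# Hypothetical weekly counts; state 0 = normal (rate 2.5), state 1 = abnormal (rate 15).
weekly = [2, 3, 2, 15, 18, 3, 2]
probs = state_probabilities(weekly, [0.9, 0.1], [[0.9, 0.1], [0.2, 0.8]], [2.5, 15.0])
print([round(p[1], 2) for p in probs])
```

Weeks whose abnormal-state probability is close to 1 would be the ones flagged; plotting these probabilities by region and week is one way to produce the kind of visualization the abstract describes.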

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417