Similar literature
1.
Hidden Markov models (HMMs) are one of various methods that have been applied to the prediction of major histocompatibility complex (MHC) binding peptides. In terms of model topology, a fully-connected HMM (fcHMM) has the greatest potential to predict binders, at the cost of intensive computation. While a profile HMM (pHMM) performs dramatically fewer computations, it potentially merges overlapping patterns into one, which results in some patterns being missed. In a profile HMM a state corresponds to a position on a peptide, while in an fcHMM a state has no specific biological meaning. This work proposes optimally-connected HMMs (ocHMMs), which do not merge overlapping patterns and yet, through topological reductions, have far lower connectivity than an fcHMM. The parameters of ocHMMs are initialized using a novel amino acid grouping approach called "multiple property grouping." Each group represents a state in an ocHMM. The proposed ocHMMs are compared to a pHMM implementation using HMMER, based on performance tests on two MHC alleles, HLA (Human Leukocyte Antigen)-A*0201 and HLA-B*3501. The results show that the heuristic approaches can be adjusted to make an ocHMM achieve higher predictive accuracy than HMMER. Hence, the ocHMMs obtained in this way are worth trying for predicting MHC-binding peptides.

2.
Motivation: Most genome-wide association studies rely on single-nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with per-SNP significance threshold typically 10⁻⁷) means that many real signals will be missed. Thus it is still highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as abundance of rare haplotypes and ambiguity in defining haplotype block boundaries. Results: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC. Availability: The software can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin Contact: I.coin{at}imperial.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop
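
The per-cluster test described in this abstract compares how many cases and controls carry 0, 1 or 2 chromosomes in a cluster. The abstract does not say which test statistic AncesHC uses, so the sketch below assumes a plain Pearson chi-square test on the resulting 2x3 table; the function name `cluster_association` and the example counts are hypothetical and only illustrate the idea.

```python
from math import exp


def cluster_association(cases, controls):
    """Pearson chi-square test on a 2x3 table of cluster-copy counts.

    cases[k] and controls[k] are the numbers of individuals carrying
    k chromosomes (k = 0, 1, 2) assigned to the cluster under test.
    Assumption: a generic chi-square test, not necessarily the
    statistic used by the AncesHC software.
    """
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected counts for 0, 1 and 2 chromosomes")
    rows = [list(cases), list(controls)]
    row_totals = [sum(r) for r in rows]
    col_totals = [c + d for c, d in zip(cases, controls)]
    n = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # A 2x3 table has (2-1)*(3-1) = 2 degrees of freedom, and the
    # chi-square survival function for 2 df is exactly exp(-x/2).
    p_value = exp(-chi2 / 2.0)
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts: cases enriched for two copies of the cluster.
    print(cluster_association([10, 20, 30], [30, 20, 10]))
```

In a genome-wide setting the resulting p-value would still need to be compared against a multiple-testing threshold such as the per-SNP level quoted in the abstract.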

3.
4.
5.
KEGG Mapper for inferring cellular functions from protein sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
KEGG is a reference knowledge base for biological interpretation of large‐scale molecular datasets, such as genome and metagenome sequences. It accumulates experimental knowledge about high‐level functions of the cell and the organism represented in terms of KEGG molecular networks, including KEGG pathway maps, BRITE hierarchies, and KEGG modules. By the process called KEGG mapping, a set of protein coding genes in the genome, for example, can be converted to KEGG molecular networks enabling interpretation of cellular functions and other high‐level features. Here we report a new version of KEGG Mapper, a suite of KEGG mapping tools available at the KEGG website ( https://www.kegg.jp/ or https://www.genome.jp/kegg/ ), together with the KOALA family tools for automatic assignment of KO (KEGG Orthology) identifiers used in the mapping.

6.
As hidden Markov models (HMMs) become increasingly important in the analysis of biological sequences, databases of HMMs have expanded in size, number and importance. While the standard paradigm a short while ago was the analysis of one or a few sequences at a time, it has now become standard procedure to submit an entire microbial genome. In the future, it will be common to submit large groups of completed genomes to run simultaneously against a dozen public databases and any number of internally developed targets. This paper looks at some of the readily available HMM (or HMM-like) algorithms and several publicly available HMM databases, and outlines methods by which the reader may develop custom HMM targets.

7.
A simple approach for the sensitive detection of distant relationships among protein families and for sequence-structure alignment via comparison of hidden Markov models based on their quasi-consensus sequences is presented. Using a previously published benchmark dataset, the approach is demonstrated to give better homology detection and yield alignments with improved accuracy in comparison to an existing state-of-the-art dynamic programming profile-profile comparison method. This method also runs significantly faster and is therefore suitable for a server covering the rapidly increasing structure database. A server based on this method is available at http://liao.cis.udel.edu/website/servers/modmod

8.
Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
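
The integrated likelihood mentioned in this abstract can be illustrated with a small sketch. It assumes a symmetric Dirichlet prior on each context's transition probabilities and a uniform prior over orders; neither choice is stated in the abstract, and the function names `log_evidence` and `order_posterior` are hypothetical. All orders are scored on the same symbols (those after the first `max_order`) so that the evidences are comparable.

```python
from collections import Counter, defaultdict
from math import exp, lgamma, log


def log_evidence(seq, order, alphabet, alpha=1.0, skip=None):
    """Log integrated likelihood of seq[skip:] for a Markov chain of the
    given order, with a symmetric Dirichlet(alpha) prior per context
    (an assumed prior, chosen because it integrates in closed form)."""
    if skip is None:
        skip = order
    if skip < order:
        raise ValueError("skip must be at least the Markov order")

    # Count how often each symbol follows each length-`order` context.
    counts = defaultdict(Counter)
    for i in range(skip, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1

    a = len(alphabet)
    total = 0.0
    for ctx_counts in counts.values():
        n = sum(ctx_counts.values())
        # Dirichlet-multinomial marginal for this context; symbols that
        # never occur contribute lgamma(alpha) - lgamma(alpha) = 0.
        total += lgamma(a * alpha) - lgamma(a * alpha + n)
        for c in ctx_counts.values():
            total += lgamma(alpha + c) - lgamma(alpha)
    return total


def order_posterior(seq, max_order, alphabet, alpha=1.0):
    """Posterior over orders 0..max_order under a uniform prior on order."""
    logs = [log_evidence(seq, k, alphabet, alpha, skip=max_order)
            for k in range(max_order + 1)]
    top = max(logs)
    weights = [exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


if __name__ == "__main__":
    # Toy sequence of left/right orientations, not real microsaccade data.
    print(order_posterior("LR" * 100, max_order=2, alphabet="LR"))
```

For a strictly alternating toy sequence, order 0 receives essentially no posterior mass, while orders 1 and 2 explain the data equally well; real data with noise would be needed to separate them.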

9.
Hiroshi Mamitsuka 《Proteins》1998,33(4):460-474
The binding of a major histocompatibility complex (MHC) molecule to a peptide originating in an antigen is essential to recognizing antigens in immune systems, and it has proved to be important to use computers to predict the peptides that will bind to an MHC molecule. The purpose of this paper is twofold: First, we propose to apply supervised learning of hidden Markov models (HMMs) to this problem, which can surpass existing methods for the problem of predicting MHC-binding peptides. Second, we generate peptides that have high probabilities to bind to a certain MHC molecule, based on our proposed method using peptides binding to MHC molecules as a set of training data. From our experiments, in a type of cross-validation test, the discrimination accuracy of our supervised learning method is usually approximately 2–15% better than those of other methods, including backpropagation neural networks, which have been regarded as the most effective approach to this problem. Furthermore, using an HMM trained for HLA-A2, we present new peptide sequences that are provided with high binding probabilities by the HMM and that are thus expected to bind to HLA-A2 proteins. Peptide sequences not shown in this paper but with rather high binding probabilities can be obtained from the author (E-mail: mami@ccm.cl.nec.co.jp). Proteins 33:460–474, 1998. © 1998 Wiley-Liss, Inc.

10.
SUMMARY: I describe a parallel implementation of Rogers' mismatch algorithm, a method for making inferences about demographic history from DNA sequence data. The program is distributed on clusters of workstations, providing a substantial speedup and low execution times on large numbers of nodes. AVAILABILITY: Source code and documentation are available at http://mombasa.anthro.utah.edu/wooding/ CONTACT: stephen.wooding@anthro.utah.edu

11.
Lee K  Daniels MJ 《Biometrics》2007,63(4):1060-1067
Generalized linear models with serial dependence are often used for short longitudinal series. Heagerty (2002, Biometrics 58, 342-351) has proposed marginalized transition models for the analysis of longitudinal binary data. In this article, we extend this work to accommodate longitudinal ordinal data. Fisher-scoring algorithms are developed for estimation. Methods are illustrated on quality-of-life data from a recent colorectal cancer clinical trial.

12.
13.
14.
Microarray expression profiles are inherently noisy and many different sources of variation exist in microarray experiments. It is still a significant challenge to develop stochastic models to realize noise in microarray expression profiles, which has profound influence on the reverse engineering of genetic regulation. Using the target genes of the tumour suppressor gene p53 as the test problem, we developed stochastic differential equation models and established the relationship between the noise strength of stochastic models and parameters of an error model for describing the distribution of the microarray measurements. Numerical results indicate that the simulated variance from stochastic models with a stochastic degradation process can be represented by a monomial in terms of the hybridization intensity and the order of the monomial depends on the type of stochastic process. The developed stochastic models with multiple stochastic processes generated simulations whose variance is consistent with the prediction of the error model. This work also established a general method to develop stochastic models from experimental information.

15.
16.
Exposure to air pollution is associated with increased morbidity and mortality. Recent technological advancements permit the collection of time-resolved personal exposure data. Such data are often incomplete with missing observations and exposures below the limit of detection, which limit their use in health effects studies. In this paper, we develop an infinite hidden Markov model for multiple asynchronous multivariate time series with missing data. Our model is designed to include covariates that can inform transitions among hidden states. We implement beam sampling, a combination of slice sampling and dynamic programming, to sample the hidden states, and a Bayesian multiple imputation algorithm to impute missing data. In simulation studies, our model excels in estimating hidden states and state-specific means and imputing observations that are missing at random or below the limit of detection. We validate our imputation approach on data from the Fort Collins Commuter Study. We show that the estimated hidden states improve imputations for data that are missing at random compared to existing approaches. In a case study of the Fort Collins Commuter Study, we describe the inferential gains obtained from our model including improved imputation of missing data and the ability to identify shared patterns in activity and exposure among repeated sampling days for individuals and among distinct individuals.

17.
Titman AC 《Biometrics》2011,67(3):780-787
Methods for fitting nonhomogeneous Markov models to panel-observed data using direct numerical solution to the Kolmogorov Forward equations are developed. Nonhomogeneous Markov models occur most commonly when baseline transition intensities depend on calendar time, but may also occur with deterministic time-dependent covariates such as age. We propose transition intensities based on B-splines as a smooth alternative to piecewise constant intensities and also as a generalization of time transformation models. An expansion of the system of differential equations allows first derivatives of the likelihood to be obtained, which can be used in a Fisher scoring algorithm for maximum likelihood estimation. The method is evaluated through a small simulation study and demonstrated on data relating to the development of cardiac allograft vasculopathy in posttransplantation patients.

18.

Background  

EST sequencing is a versatile approach for rapidly gathering protein coding sequences. They provide direct access to an organism's gene repertoire bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source for biological sequence data from taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies and particularly in molecular systematics studies is still hindered by the lack of efficient and reliable approaches for automated ortholog predictions in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data.

19.
20.
Modeling of bioprocesses for engineering applications is a very difficult and time-consuming task, due to their complex nonlinear dynamic behavior. In recent years several proposals for hybrid models, and especially serial approaches, have been published and discussed, in order to combine analytical prior knowledge with the learning capabilities of Artificial Neural Networks (ANN). These approaches often require synchronously and equidistantly sampled training data. However, in practice concentrations are mostly measured off-line, rarely, and asynchronously. In this paper a new training method especially suited for very few asynchronously sampled data is presented and applied for modeling animal cell cultures. The resulting model is able to predict the concentrations of the reaction components inside a stirred tank bioreactor.
