Similar literature
20 similar articles found (search time: 0 ms)
1.
As hidden Markov models (HMMs) have become increasingly important in the analysis of biological sequences, databases of HMMs have expanded in size, number and importance. While the standard paradigm a short while ago was the analysis of one or a few sequences at a time, it has now become standard procedure to submit an entire microbial genome. In the future, it will be common to submit large groups of completed genomes to run simultaneously against a dozen public databases and any number of internally developed targets. This paper looks at some of the readily available HMM (or HMM-like) algorithms and several publicly available HMM databases, and outlines methods by which the reader may develop custom HMM targets.

2.
3.
4.
Hidden Markov models (HMMs) are one of various methods that have been applied to the prediction of major histocompatibility complex (MHC) binding peptides. In terms of model topology, a fully-connected HMM (fcHMM) has the greatest potential to predict binders, at the cost of intensive computation. A profile HMM (pHMM) performs dramatically fewer computations, but it can merge overlapping patterns into one, so that some patterns are missed. In a profile HMM a state corresponds to a position on a peptide, while in an fcHMM a state has no specific biological meaning. This work proposes optimally-connected HMMs (ocHMMs), which do not merge overlapping patterns yet, through topological reductions, have far lower connectivity than an fcHMM. The parameters of ocHMMs are initialized using a novel amino acid grouping approach called "multiple property grouping." Each group represents a state in an ocHMM. The proposed ocHMMs are compared to a pHMM implementation using HMMER, based on performance tests on two MHC alleles, HLA (Human Leukocyte Antigen)-A*0201 and HLA-B*3501. The results show that the heuristic approaches can be adjusted to make an ocHMM achieve higher predictive accuracy than HMMER. Hence, ocHMMs obtained in this way are worth trying for the prediction of MHC-binding peptides.

5.
Motivation: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with per-SNP significance threshold typically 10^-7) means that many real signals will be missed. Thus it is still highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as abundance of rare haplotypes and ambiguity in defining haplotype block boundaries. Results: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC. Availability: The software can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin Contact: I.coin{at}imperial.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop
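
The per-cluster association step described in this abstract (comparing cases and controls carrying 0, 1 or 2 chromosomes in a cluster) can be illustrated with a small sketch. The abstract does not say which test statistic AncesHC uses, so the Pearson chi-square test on a 2x3 table below is an assumption, and the function name `cluster_association` and all counts are hypothetical.

```python
import math


def cluster_association(cases, controls):
    """Pearson chi-square test on a 2x3 table of cluster-copy counts.

    cases / controls: numbers of individuals carrying 0, 1 and 2
    chromosomes in one haplotype cluster (hypothetical inputs).
    Returns (statistic, p_value). This is an assumed test, not
    necessarily the one used by AncesHC.
    """
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected counts for 0, 1 and 2 copies")
    rows = [cases, controls]
    row_totals = [sum(r) for r in rows]
    col_totals = [cases[j] + controls[j] for j in range(3)]
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one individual")
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / total
            stat += (rows[i][j] - expected) ** 2 / expected
    # A 2x3 table has 2 degrees of freedom, and for 2 degrees of freedom
    # the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical counts for one cluster: [0 copies, 1 copy, 2 copies].
stat, p = cluster_association([50, 30, 20], [20, 30, 50])
significant = p < 1e-7  # the genome-wide threshold quoted in the abstract
```

Using the closed-form survival function keeps the sketch free of third-party dependencies; a table with a different number of columns would need a general chi-square distribution instead.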

6.
7.
Surface proteins in Gram-positive bacteria are frequently implicated in virulence. We have focused on a group of extracellular cell wall-attached proteins (CWPs), containing an LPXTG motif for cleavage and covalent coupling to peptidoglycan by sortase enzymes. A hidden Markov model (HMM) approach for predicting the LPXTG-anchored cell wall proteins of Gram-positive bacteria was developed and compared against existing methods. The HMM is parsimonious in terms of the number of freely estimated parameters, and it has proved to be very sensitive and specific in a training set of 55 experimentally verified LPXTG-anchored cell wall proteins as well as in reliable data sets of globular and transmembrane proteins. In order to identify such proteins in Gram-positive bacteria, a comprehensive analysis of 94 completely sequenced genomes has been performed. We identified, in total, 860 LPXTG-anchored cell wall proteins, a number that is significantly higher than those obtained by other available methods. Of these proteins, 237 are hypothetical proteins according to the annotation of SwissProt, and 88 had no homologs in the SwissProt database; this might be evidence that they are members of newly identified families of CWPs. The prediction tool, the database with the proteins identified in the genomes, and supplementary material are available online at http://bioinformatics.biol.uoa.gr/CW-PRED/.

8.
We present a new method for inferring hidden Markov models from noisy time sequences without the necessity of assuming a model architecture, thus allowing for the detection of degenerate states. This is based on the statistical prediction techniques developed by Crutchfield et al. and generates so-called causal state models, equivalent in structure to hidden Markov models. The new method is applicable to any continuous data which clusters around discrete values and exhibits multiple transitions between these values, such as tethered particle motion data or Fluorescence Resonance Energy Transfer (FRET) spectra. The algorithms developed have been shown to perform well on simulated data, demonstrating the ability to recover the model used to generate the data under high-noise, sparse-data conditions and the ability to infer the existence of degenerate states. They have also been applied to new experimental FRET data of Holliday Junction dynamics, extracting the expected two-state model and providing values for the transition rates in good agreement with previous results and with results obtained using existing maximum likelihood based methods. The method differs markedly from previous Markov-model reconstructions in being able to uncover truly hidden states.

9.
10.
Ubiquitin regulates protein turnover in a cell by closely regulating the degradation of specific proteins. Because this regulatory role is very important, I have analyzed ubiquitin-like proteins using an artificial neural network, support vector machines and a hidden Markov model (HMM). The methods were trained and tested on a set of 373 ubiquitin proteins and 373 non-ubiquitin proteins obtained from the Entrez protein database. The artificial neural network and support vector machine are trained and tested using both physicochemical properties and PSSM matrices generated from PSI-BLAST, while in the HMM-based method the sequences are used directly for training and testing. The performance of each method on the test sequences is measured by accuracy, specificity, sensitivity and the Matthews correlation coefficient. The highest accuracy of 90.2%, specificity of 87.04% and sensitivity of 94.08% were achieved using the support vector machine model with PSSM matrices. Accuracies of 86.82%, 83.37%, 80.18% and 72.11% were obtained for the support vector machine with physicochemical properties, neural network with PSSM matrices, neural network with physicochemical properties, and hidden Markov model, respectively. As the accuracy of the SVM model is better with both the physicochemical properties and the PSSM matrices, it is concluded that kernel methods such as SVM outperform neural networks and hidden Markov models.
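
The four performance measures named in this abstract follow directly from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and do not reproduce the figures reported in the abstract.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation
    coefficient (MCC) from binary confusion-matrix counts.

    Assumes at least one positive and one negative example.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical test-set counts, for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

MCC is worth reporting alongside accuracy because it stays informative when the two classes are of unequal size, which accuracy alone does not.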

11.

Background  

This paper considers the problem of identifying pathways through metabolic networks that relate to a specific biological response. Our proposed model, HME3M, first identifies frequently traversed network paths using a Markov mixture model. Then by employing a hierarchical mixture of experts, separate classifiers are built using information specific to each path and combined into an ensemble prediction for the response.

12.
A hidden Markov model (HMM) has been used to predict and generate artificial secretory signal peptide sequences. The strength of signal peptides of proteins from different subcellular locations via Lactococcus lactis bacteria correlated with their HMM bit scores in the model. The results show that an HMM bit score of +12 is the threshold for discriminating secretory signal sequences from the others. The model is used to generate artificial signal peptides with different bit scores for secretory proteins. The signal peptide with the maximum bit score strongly directs protein secretion.
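
A bit score is a log-odds ratio in base 2 between the likelihood of a sequence under the model and under a null model. The sketch below applies the +12 threshold quoted in the abstract; the function names, the use of natural-log likelihoods as input, and the choice of treating a score exactly at the threshold as secretory are assumptions made for illustration.

```python
import math


def bit_score(log_p_model, log_p_null):
    """Log-odds score in bits from two natural-log likelihoods."""
    return (log_p_model - log_p_null) / math.log(2)


def is_secretory(score_bits, threshold=12.0):
    """Classify a signal peptide by its bit score.

    The default threshold of +12 bits comes from the abstract; using
    >= rather than > at the boundary is an assumption.
    """
    return score_bits >= threshold


# Hypothetical likelihoods: the model explains the sequence 2**15 times
# better than the null model, so the score is 15 bits.
score = bit_score(-50.0 + 15 * math.log(2), -50.0)
label = is_secretory(score)
```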

13.
MOTIVATION: Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To provide help in this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases. RESULTS: First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search our databases for the gene families whose tree topology matches a particular tree pattern. This algorithm of unordered tree pattern matching has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern, and set constraints on its nodes and leaves. Then, this pattern is compared with all the phylogenetic trees of the database, to retrieve the families in which one or several occurrences of this pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
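
Tree reconciliation can be sketched with the widely used LCA-mapping rule: each gene-tree node is mapped to the last common ancestor, in the species tree, of the species below it, and a node is called a duplication when it maps to the same species-tree node as one of its children. The abstract does not describe the authors' algorithm in detail, so this is a generic illustration rather than their implementation. Trees are nested tuples with string leaves, and all names and example data are hypothetical.

```python
def species_paths(tree, path=(), out=None):
    """Map each species leaf to its root-to-leaf path of child indices."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = path
    else:
        for i, child in enumerate(tree):
            species_paths(child, path + (i,), out)
    return out


def common_prefix(a, b):
    """Path of the last common ancestor of two species-tree nodes."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def leaf_names(tree):
    """All leaf labels below a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def reconcile(gene_tree, gene_to_species, paths, events=None):
    """Return (species-tree node for this gene node, list of events).

    Events are (genes below the node, 'speciation' or 'duplication'),
    listed in post-order.
    """
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return paths[gene_to_species[gene_tree]], events
    child_maps = [reconcile(c, gene_to_species, paths, events)[0]
                  for c in gene_tree]
    node_map = child_maps[0]
    for m in child_maps[1:]:
        node_map = common_prefix(node_map, m)
    # Duplication: the node maps to the same species node as a child.
    kind = "duplication" if node_map in child_maps else "speciation"
    events.append((leaf_names(gene_tree), kind))
    return node_map, events


# Hypothetical species tree and gene family with one duplication that
# predates the human/mouse split.
species_tree = (("human", "mouse"), "chicken")
gene_tree = ((("h1", "m1"), ("h2", "m2")), "c1")
gene_to_species = {"h1": "human", "h2": "human", "m1": "mouse",
                   "m2": "mouse", "c1": "chicken"}
_, events = reconcile(gene_tree, gene_to_species, species_paths(species_tree))
```

In this example `h1` and `m1` are separated by a speciation node and are therefore orthologs, while `h1` and `h2` are separated by the duplication node and are paralogs.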

14.
15.
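Comparative genomics research based on orthologous sequences   Total citations: 2, self-citations: 0, citations by others: 2
Orthologous sequences have similar or even identical functions and similar regulatory pathways in different species, and play similar or even identical roles. Moreover, the vast majority of core biological functions are carried out by a considerable number of orthologous genes, which makes orthologs the most reliable choice for the functional annotation and analysis of genome sequences. Because of these particular biological properties, comparative genomics studies based on orthologous sequences will provide clues for detecting the emergence, expression and loss of important functional genes during the evolution of different organisms. This article reviews the basic properties of orthologous genes, the relationship between orthologous sequences and comparative genomics, and the methods and current status of comparative genomics research based on orthologous sequences. Keywords: orthologs; comparative genomics; biological properties; databases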

16.
17.
Polymerase chain reaction (PCR) is a major DNA amplification technology from molecular biology. The quantitative analysis of PCR aims at determining the initial amount of DNA molecules from the observation of, typically, several PCR amplification curves. The mainstream observation scheme for DNA amplification during PCR involves fluorescence intensity measurements. Under the classical assumption that the measured fluorescence intensity is proportional to the amount of DNA molecules present, and under the assumption that these measurements are corrupted by additive Gaussian noise, we analyze a single amplification curve using a hidden Markov model (HMM). The unknown parameters of the HMM may be separated into two parts. On the one hand, the parameters of the amplification process are the initial number of DNA molecules and the replication efficiency, which is the probability that a molecule is duplicated. On the other hand, the parameters of the observational scheme are the scale parameter that converts the fluorescence intensity into the number of DNA molecules, and the mean and variance characterizing the Gaussian noise. We use maximum likelihood estimation to infer the unknown parameters of the model from the exponential phase of a single amplification curve, the main parameter of interest for quantitative PCR being the initial amount of DNA molecules. An illustrative example is provided. This research was financed by the Swedish Foundation for Strategic Research through the Gothenburg Mathematical Modelling Centre.
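
The generative model behind this abstract can be simulated in a few lines: the hidden state is the number of molecules, each of which is duplicated with a fixed probability per cycle, and the observation is a scaled molecule count plus Gaussian noise. The sketch below only simulates the model and does not perform the maximum likelihood estimation described in the abstract. The function name, the direction of the scale factor (fluorescence = scale x molecules + noise) and all parameter values are assumptions.

```python
import random


def simulate_pcr(n0, efficiency, cycles, scale, noise_mean, noise_sd, seed=0):
    """Simulate hidden molecule counts and observed fluorescence.

    n0:          initial number of DNA molecules
    efficiency:  probability that a molecule is duplicated in a cycle
    scale:       assumed fluorescence units per molecule
    noise_mean, noise_sd: parameters of the additive Gaussian noise
    """
    rng = random.Random(seed)
    molecules = [n0]
    for _ in range(cycles):
        n = molecules[-1]
        # Binomial(n, efficiency) draw, written as n Bernoulli trials
        # so the sketch needs only the standard library.
        duplicated = sum(1 for _ in range(n) if rng.random() < efficiency)
        molecules.append(n + duplicated)
    fluorescence = [scale * n + rng.gauss(noise_mean, noise_sd)
                    for n in molecules]
    return molecules, fluorescence


# Hypothetical run: 10 starting molecules, 80% efficiency, 12 cycles.
molecules, fluorescence = simulate_pcr(10, 0.8, 12, scale=0.5,
                                       noise_mean=0.0, noise_sd=1.0)
```

In the exponential phase the expected count after k cycles is n0 * (1 + efficiency)^k, which is why the initial amount and the efficiency can both be inferred from one amplification curve.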

18.
We describe a hidden Markov model, HMMSTR, for general protein sequence based on the I-sites library of sequence-structure motifs. Unlike the linear hidden Markov models used to model individual protein families, HMMSTR has a highly branched topology and captures recurrent local features of protein sequences and structures that transcend protein family boundaries. The model extends the I-sites library by describing the adjacencies of different sequence-structure motifs as observed in the protein database and, by representing overlapping motifs in a much more compact form, achieves a great reduction in parameters. The HMM attributes a considerably higher probability to coding sequence than does an equivalent dipeptide model, predicts secondary structure with an accuracy of 74.3%, backbone torsion angles better than any previously reported method and the structural context of beta strands and turns with an accuracy that should be useful for tertiary structure prediction.

19.
Because membrane proteins play a key role in drug targeting, transmembrane protein prediction is an active and challenging area of the biological sciences. Location-based prediction of transmembrane proteins is significant for the functional annotation of protein sequences. Hidden Markov model based methods have been widely applied to transmembrane topology prediction. Here we present a revised and more easily understood model than an existing one for transmembrane protein prediction. Scripts were written and compiled in MATLAB to estimate the model parameters, and the model was applied to amino acid sequences to locate the transmembrane regions and their adjacent locations. The estimated model of transmembrane topology was based on the TMHMM model architecture. Only 7 super states are defined in the given dataset, which were converted to 96 states on the basis of their length in the sequence. The prediction accuracy of the model was about 74%, which is good enough in the area of transmembrane topology prediction. We therefore conclude that the hidden Markov model plays a crucial role in transmembrane helix prediction on the MATLAB platform and could also be useful for drug discovery strategies. AVAILABILITY: The database is available for free at bioinfonavneet@gmail.com, vinaysingh@bhu.ac.in.
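
HMM-based topology predictors typically label each residue with the most probable hidden state path, which the Viterbi algorithm computes. The sketch below is a deliberately small two-state toy, not the 96-state TMHMM-style model of the abstract, and it is written in Python rather than MATLAB. The state names, the two-letter residue alphabet (`h` for hydrophobic, `p` for polar) and every probability are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for a non-empty observation sequence,
    computed in log space to avoid numerical underflow."""
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    scores = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
               for s in states}]
    back = []
    for obs in observations[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this position.
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            row[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][obs])
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy parameters (all hypothetical): M = membrane, O = outside membrane.
STATES = ["M", "O"]
START = {"M": 0.5, "O": 0.5}
TRANS = {"M": {"M": 0.8, "O": 0.2}, "O": {"M": 0.2, "O": 0.8}}
EMIT = {"M": {"h": 0.9, "p": 0.1}, "O": {"h": 0.1, "p": 0.9}}

topology = "".join(viterbi("pphhhhpp", STATES, START, TRANS, EMIT))
```

A realistic model would replace the two states with length-constrained chains of states for helix cores, caps and loops, which is how a small number of super states can expand into a much larger state space.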

20.
MOTIVATION: Repeat sequences in ESTs are a source of problems, in particular for clustering. ESTs are therefore commonly masked against a library of known repeats. High quality repeat libraries are available for the widely studied organisms, but for most other organisms the lack of such libraries is likely to compromise the quality of EST analysis. RESULTS: We present a fast, flexible and library-less method for masking repeats in EST sequences, based on match statistics within the EST collection. The method is not linked to a particular clustering algorithm. Extensive testing on datasets using different clustering methods and a genomic mapping as reference shows that this method gives results that are better than or as good as those obtained using RepeatMasker with a repeat library. AVAILABILITY: The implementation of RBR is available under the terms of the GPL from http://www.ii.uib.no/~ketil/bioinformatics CONTACT: ketil.malde@bccs.uib.no SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
