Similar articles
 Found 20 similar articles; search took 140 ms
1.
2.
Accelerated Profile HMM Searches   Cited by: 4 (self-citations: 0; citations by others: 4)
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
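The abstract above mentions rescaling in the Forward algorithm without saying how "sparse rescaling" works. As background only, the following is a minimal sketch of the textbook Forward recursion with rescaling at every step, for a small discrete HMM. It is not the HMMER3 method and not a profile HMM; the function name and the list-based model layout (`init`, `trans`, `emit`) are assumptions made for illustration.

```python
import math


def forward_log_likelihood(init, trans, emit, observations):
    """Return log P(observations) for a discrete HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    Rescales the forward vector at every step (ordinary, non-sparse rescaling).
    """
    n = len(init)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Standard Forward update: sum over predecessor states, then emit.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n))
                * emit[j][observations[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the observations are impossible under the model
        # Normalise so alpha never underflows; accumulate the scale in log space.
        alpha = [a / scale for a in alpha]
        log_likelihood += math.log(scale)
    return log_likelihood
```

Without the rescaling step the forward values shrink geometrically with sequence length and eventually underflow to zero in floating point, which is why some form of rescaling (or log-space arithmetic) is needed for long sequences.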

3.
Profile hidden Markov models (HMMs) are amongst the most successful procedures for detecting remote homology between proteins. There are two popular profile HMM programs, HMMER and SAM. Little is known about their performance relative to each other and to the recently improved version of PSI-BLAST. Here we compare the two programs to each other and to non-HMM methods, to determine their relative performance and the features that are important for their success. The quality of the multiple sequence alignments used to build models was the most important factor affecting the overall performance of profile HMMs. The SAM T99 procedure is needed to produce high quality alignments automatically, and the lack of an equivalent component in HMMER makes it less complete as a package. Using the default options and parameters as would be expected of an inexpert user, it was found that from identical alignments SAM consistently produces better models than HMMER and that the relative performance of the model-scoring components varies. On average, HMMER was found to be between one and three times faster than SAM when searching databases larger than 2000 sequences, SAM being faster on smaller ones. Both methods were shown to have effective low complexity and repeat sequence masking using their null models, and the accuracy of their E-values was comparable. It was found that the SAM T99 iterative database search procedure performs better than the most recent version of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times faster than scoring of SAM models.

4.
Profile hidden Markov models (HMMs) are used to model protein families and for detecting evolutionary relationships between proteins. Such a profile HMM is typically constructed from a multiple alignment of a set of related sequences. Transition probability parameters in an HMM are used to model insertions and deletions in the alignment. We show here that taking into account unrelated sequences when estimating the transition probability parameters helps to construct more discriminative models for the global/local alignment mode. After normal HMM training, a simple heuristic is employed that adjusts the transition probabilities between match and delete states according to observed transitions in the training set relative to the unrelated (noise) set. The method is called adaptive transition probabilities (ATP) and is based on the HMMER package implementation. It was benchmarked in two remote homology tests based on the Pfam and the SCOP classifications. Compared to the HMMER default procedure, the rate of misclassification was reduced significantly in both tests and across all levels of error rate.

5.
BACKGROUND: A variety of methods for prediction of peptide binding to major histocompatibility complex (MHC) have been proposed. These methods are based on binding motifs, binding matrices, hidden Markov models (HMM), or artificial neural networks (ANN). There has been little prior work on the comparative analysis of these methods. MATERIALS AND METHODS: We performed a comparison of the performance of six methods applied to the prediction of two human MHC class I molecules, including binding matrices and motifs, ANNs, and HMMs. RESULTS: The selection of the optimal prediction method depends on the amount of available data (the number of peptides of known binding affinity to the MHC molecule of interest), the biases in the data set and the intended purpose of the prediction (screening of a single protein versus mass screening). When little or no peptide data are available, binding motifs are the most useful alternative to random guessing or use of a complete overlapping set of peptides for selection of candidate binders. As the number of known peptide binders increases, binding matrices and HMM become more useful predictors. ANN and HMM are the predictive methods of choice for MHC alleles with more than 100 known binding peptides. CONCLUSION: The ability of bioinformatic methods to reliably predict MHC binding peptides, and thereby potential T-cell epitopes, has major implications for clinical immunology, particularly in the area of vaccine design.

6.
Protein homology detection by HMM-HMM comparison   Cited by: 22 (self-citations: 4; citations by others: 18)
MOTIVATION: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. RESULTS: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%. Sensitivity: When the predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile-profile comparison methods is attributable to the use of profile HMMs in place of simple profiles. Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7 and 3.3 times more good alignments ('balanced' score >0.3) than the next best method (COMPASS), and 1.6, 2.9 and 9.4 times more than PSI-BLAST, at the family, superfamily and fold level, respectively. Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 2GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than COMPASS.

7.
Hiroshi Mamitsuka. Proteins, 1998, 33(4): 460-474
The binding of a major histocompatibility complex (MHC) molecule to a peptide originating in an antigen is essential to recognizing antigens in immune systems, and it has proved to be important to use computers to predict the peptides that will bind to an MHC molecule. The purpose of this paper is twofold: First, we propose to apply supervised learning of hidden Markov models (HMMs) to this problem, which can surpass existing methods for the problem of predicting MHC-binding peptides. Second, we generate peptides that have high probabilities to bind to a certain MHC molecule, based on our proposed method using peptides binding to MHC molecules as a set of training data. From our experiments, in a type of cross-validation test, the discrimination accuracy of our supervised learning method is usually approximately 2–15% better than those of other methods, including backpropagation neural networks, which have been regarded as the most effective approach to this problem. Furthermore, using an HMM trained for HLA-A2, we present new peptide sequences that are provided with high binding probabilities by the HMM and that are thus expected to bind to HLA-A2 proteins. Peptide sequences not shown in this paper but with rather high binding probabilities can be obtained from the author (E-mail: mami@ccm.cl.nec.co.jp). Proteins 33:460–474, 1998. © 1998 Wiley-Liss, Inc.

8.
9.
MOTIVATION: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome. RESULTS: We use PROSITE-like patterns as a filter to speed up the comparison between protein sequence and profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude. AVAILABILITY: Contact the first author at the address below.
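The filtering idea in the abstract above (run the expensive HMM comparison only on sequences that match at least one cheap pattern) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pattern-design algorithm: the function names `prosite_to_regex` and `prefilter` are hypothetical, the converter handles only the common PROSITE pattern elements, and the patterns would in practice come from the design step the abstract describes.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common elements only: '-' separators, 'x' wildcards,
    [ABC] allowed sets, {ABC} forbidden sets, (n) and (n,m) repeats,
    and the '<' / '>' sequence-end anchors.
    """
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")                        # element separators
    regex = regex.replace("x", ".")                       # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")    # {P} = any residue but P
    regex = regex.replace("(", "{").replace(")", "}")     # x(2,4) = repeat count
    regex = regex.replace("<", "^").replace(">", "$")     # N- and C-terminal anchors
    return regex


def prefilter(sequences, patterns):
    """Keep only the sequences that match at least one pattern.

    Survivors would then be passed to the full (expensive) profile HMM
    comparison; everything else is discarded without dynamic programming.
    """
    compiled = [re.compile(prosite_to_regex(p)) for p in patterns]
    return [s for s in sequences if any(c.search(s) for c in compiled)]
```

The trade-off is the one the abstract states: a stricter pattern set rejects more sequences and runs faster, but risks discarding true family members, so the patterns have to be chosen with a bound on lost sensitivity in mind.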

10.
SUMMARY: Hidden Markov models (HMMs) are widely used for biological sequence analysis because of their ability to incorporate biological information in their structure. An automatic means of optimizing the structure of HMMs would be highly desirable. However, this raises two important issues: first, the new HMMs should be biologically interpretable, and second, we need to control the complexity of the HMM so that it has good generalization performance on unseen sequences. In this paper, we explore the possibility of using a genetic algorithm (GA) for optimizing the HMM structure. GAs are sufficiently flexible to allow incorporation of other techniques such as Baum-Welch training within their evolutionary cycle. Furthermore, operators that alter the structure of HMMs can be designed to favour interpretable and simple structures. In this paper, a training strategy using GAs is proposed, and it is tested on finding HMM structures for the promoter and coding region of the bacterium Campylobacter jejuni. The proposed GA for hidden Markov models (GA-HMM) allows HMMs with different numbers of states to evolve. To prevent over-fitting, a separate dataset is used for comparing the performance of the HMMs to that used for the Baum-Welch training. The GA-HMM was capable of finding an HMM comparable to a hand-coded HMM designed for the same task, which has been published previously.

11.
The computational approach for identifying promoters on increasingly large genomic sequences has led to many false positives. The biological significance of promoter identification lies in the ability to locate true promoters with and without prior sequence contextual knowledge. Prior approaches to promoter modelling have involved artificial neural networks (ANNs) or hidden Markov models (HMMs), each producing adequate results on small scale identification tasks, i.e. narrow upstream regions. In this work, we present an architecture to support prokaryote promoter identification on large scale genomic sequences, i.e. not limited to narrow upstream regions. The significant contribution involved the hybrid formed via aggregation of the profile HMM with the ANN, via Viterbi scoring optimizations. The benefit obtained using this architecture includes the modelling ability of the profile HMM with the ability of the ANN to associate elements composing the promoter. We present the high effectiveness of the hybrid approach in comparison to profile HMMs and ANNs when used separately. The contribution of Viterbi optimizations is also highlighted for supporting the hybrid architecture in which gains in sensitivity (+0.3), specificity (+0.65) and precision (+0.54) are achieved over existing approaches.

12.
Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, and thus further extends the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in the protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family, since the fuzzy set can handle uncertainties better than classical methods.

13.
Designing Patterns and Profiles for Faster HMM Search   Cited by: 1 (self-citations: 0; citations by others: 1)
Profile HMMs are powerful tools for modeling conserved motifs in proteins. They are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome. It is highly desirable to speed up HMM search in large databases. We design PROSITE-like patterns and short profiles that are used as filters to rapidly eliminate protein-motif pairs for which a full profile HMM comparison does not yield a significant match. The design of the pattern-based filters is formulated as a multichoice knapsack problem. Profile-based filters with high sensitivity are extracted from a profile HMM based on their theoretical sensitivity and false positive rate. Experiments show that our profile-based filters achieve high sensitivity (near 100 percent) while keeping a speedup of around 20 times with respect to the unfiltered search program. Pattern-based filters typically retain at least 90 percent of the sensitivity of the source HMM with a speedup of 30-40 times. The profile-based filters have sensitivity comparable to the multistage filtering strategy HMMERHEAD and are faster in most of our experiments.

14.

Background  

Profile Hidden Markov Models (HMM) are statistical representations of protein families derived from patterns of sequence conservation in multiple alignments and have been used in identifying remote homologues with considerable success. These conservation patterns arise from fold specific signals, shared across multiple families, and function specific signals unique to the families. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimizing the threshold cutoff and by modifying emission probabilities to minimize the influence of fold-specific signals. A protocol to generate family specific HMMs is described that first constructs a profile HMM from an alignment of the family's sequences and then uses this model to identify sequences belonging to other classes that score above the default threshold (false positives). Ten-fold cross validation is used to optimise the discrimination threshold score for the model. The advent of fast multiple alignment methods enables the use of the profile alignments to align the true and false positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model.

15.
Myelin oligodendrocyte glycoprotein (MOG) is an Ag present in the myelin sheath of the CNS thought to be targeted by the autoimmune T cell response in multiple sclerosis (MS). In this study, we have for the first time characterized the T cell epitopes of human MOG restricted by HLA-DR4 (DRB1*0401), an MHC class II allele associated with MS in a subpopulation of patients. Using MHC binding algorithms, we have predicted MOG peptide binding to HLA-DR4 (DRB1*0401) and subsequently defined the in vivo T cell reactivity to overlapping MOG peptides by testing HLA-DR4 (DRB1*0401) transgenic mice immunized with recombinant human (rh)MOG. The data indicated that MOG peptide 97-108 (core 99-107, FFRDHSYQE) was the immunodominant HLA-DR4-restricted T cell epitope in vivo. This peptide has a high in vitro binding affinity for HLA-DR4 (DRB1*0401) and upon immunization induced severe experimental autoimmune encephalomyelitis in the HLA-DR4 transgenic mice. Interestingly, the same peptide was presented by human B cells expressing HLA-DR4 (DRB1*0401), suggesting a role for the identified MOG epitopes in the pathogenesis of human MS.

16.
Prediction of protein secondary structure by the hidden Markov model   Cited by: 4 (self-citations: 0; citations by others: 4)
The purpose of this paper is to introduce a new method for analyzing the amino acid sequences of proteins using the hidden Markov model (HMM), which is a type of stochastic model. Secondary structures such as helix, sheet and turn are learned by HMMs, and these HMMs are applied to new sequences whose structures are unknown. The output probabilities from the HMMs are used to predict the secondary structures of the sequences. The authors tested this prediction system on 100 sequences from a public database (Brookhaven PDB). Although the implementation is ‘without grammar’ (no rule for the appearance patterns of secondary structure), the result was reasonable.

17.
Pisatin is the major phytoalexin produced by pea upon microbial infection. The enzyme that catalyzes the terminal step in the pisatin biosynthetic pathway is (+)6a-hydroxymaackiain 3-O-methyltransferase (HMM). We report here the isolation and characterization of two HMM cDNA clones (pHMM1 and pHMM2) made from RNA obtained from Nectria haematococca-infected pea tissue. The two clones were confirmed to encode HMM activity by heterologous expression in Escherichia coli. The substrate specificity of the methyltransferases in E. coli was similar to the activity detected in CuCl2-treated pea tissue. Nucleotide sequence analysis of Hmm1 and Hmm2 revealed an open reading frame of 1080 bp and 360 amino acid residues which would encode 40.36 kDa and 40.41 kDa polypeptides, respectively. The deduced amino acid sequence of HMM1 has 95.8% identity to HMM2, 40.6% identity to Zrp4, a putative O-methyltransferase (OMT) in maize root, and 39.1% to pBH72-F1, a putative OMT induced in barley by fungal pathogens or UV light. Comparison of the deduced amino acid sequences of the cDNA clones to OMTs from other higher plants identified the binding sites of S-adenosylmethionine (AdoMet). Southern blot analysis showed two closely linked genes with strong homology to Hmm in the pea genome.

18.
The hidden Markov model (HMM) is a framework for time series analysis widely applied to single-molecule experiments. Although initially developed for applications outside the natural sciences, the HMM has traditionally been used to interpret signals generated by physical systems, such as single molecules, evolving in a discrete state space observed at discrete time levels dictated by the data acquisition rate. Within the HMM framework, transitions between states are modeled as occurring at the end of each data acquisition period and are described using transition probabilities. Yet, whereas measurements are often performed at discrete time levels in the natural sciences, physical systems evolve in continuous time according to transition rates. It then follows that the modeling assumptions underlying the HMM are justified if the transition rates of a physical process from state to state are small as compared to the data acquisition rate. In other words, HMMs apply to slow kinetics. The problem is, because the transition rates are unknown in principle, it is unclear, a priori, whether the HMM applies to a particular system. For this reason, we must generalize HMMs for physical systems, such as single molecules, because these switch between discrete states in “continuous time”. We do so by exploiting recent mathematical tools developed in the context of inferring Markov jump processes and propose the hidden Markov jump process. We explicitly show in what limit the hidden Markov jump process reduces to the HMM. Resolving the discrete time discrepancy of the HMM has clear implications: we no longer need to assume that processes, such as molecular events, must occur on timescales slower than data acquisition and can learn transition rates even if these are on the same timescale or otherwise exceed data acquisition rates.
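The "slow kinetics" condition in the abstract above can be made concrete with a standard result for continuous-time Markov chains. The notation is assumed here for illustration and is not taken from the abstract: Q is the rate matrix with off-diagonal entries q_ij (the rate from state i to state j) and diagonal entries q_ii equal to minus the sum of the other entries in row i, Δt is the data acquisition period, and P(Δt) is the matrix of transition probabilities over one period.

```latex
P(\Delta t) \;=\; e^{Q\,\Delta t}
\;=\; I + Q\,\Delta t + \tfrac{1}{2}\,(Q\,\Delta t)^{2} + \cdots
\;\approx\; I + Q\,\Delta t
\qquad \text{when } |q_{ii}|\,\Delta t \ll 1 \text{ for all } i .
```

In that regime more than one jump per acquisition period is unlikely, which is the situation a discrete-time HMM with one transition per period describes. When the rates are comparable to or larger than 1/Δt, the higher-order terms (several jumps within one period) are no longer negligible.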

19.
Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs (“vFams”) to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format. 
We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).

20.
Interaction of tropomyosin with F-actin-heavy meromyosin complex   Cited by: 1 (self-citations: 0; citations by others: 1)
The effect of phosphorylated and dephosphorylated heavy meromyosins (HMMs) saturated with Ca2+ or Mg2+ on the binding of tropomyosin to F-actin and on the conformational changes of tropomyosin on actin was investigated. The experimental data were analysed on the basis of the model of cooperative binding of tropomyosin to F-actin with overlapping binding sites. In general, attachment of both HMMs to F-actin increased around 100-fold the tropomyosin-binding affinity but concomitantly reduced the cooperativity of binding. In the presence of Ca2+ and in the absence of ATP the binding of tropomyosin to F-actin in a "doubly contiguous" manner was three-fold stronger for F-actin saturated with dephosphorylated HMM as compared to phosphorylated HMM. Under the same rigor conditions but in the absence of Ca2+ the reverse was true but the difference was about 1.5-fold. The binding stoichiometry of tropomyosin to actin was 7:1 in the presence of dephosphorylated HMM saturated with Ca2+ or phosphorylated-saturated with Mg2+ and tended to be about 6:1 for both after the exchange of the cation bound to myosin heads. Bound HMM was also found to influence the fluorescence polarization of 1,5-IAEDANS-labelled tropomyosin complexed with F-actin in muscle ghost fibres. In the presence of Ca2+, the amount of randomly arranged tropomyosin fluorophores decreased when dephosphorylated HMM was bound to ghost fibres, in contrast to an observed increase in the case of bound phosphorylated HMM. Thus HMM induced conformational changes of tropomyosin in the actin-tropomyosin complex that was reflected in an alteration of the geometrical arrangement between tropomyosin and actin.(ABSTRACT TRUNCATED AT 250 WORDS)
