Found 20 similar articles (search time: 15 ms)
1.
Payal Singh, Pradipta Bandyopadhyay, Sudha Bhattacharya, A Krishnamachari, Supratim Sengupta 《BMC bioinformatics》2009,10(1):325
Background
Riboswitches are a type of noncoding RNA that regulate gene expression by switching from one structural conformation to another on ligand binding. The various classes of riboswitches discovered so far are differentiated by the ligand, which on binding induces a conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform that undergoes conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved in riboswitches belonging to the same class. We propose a method for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMM). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain.
2.
Accurate classification of HIV-1 subtypes is essential for studying the dynamic spatial distribution pattern of HIV-1 subtypes and also for developing effective methods of treatment that can be targeted to attack specific subtypes. We propose a classification method based on profile Hidden Markov Models that can accurately identify an unknown strain. We show that a standard method that relies on the construction of a positive training set only, to capture unique features associated with a particular subtype, can accurately classify sequences belonging to all subtypes except B and D. We point out the drawbacks of the standard method, namely an arbitrary choice of threshold to distinguish between true positives and true negatives, and the inability to discriminate between closely related subtypes. We then propose an improved classification method based on construction of a positive as well as a negative training set to improve discriminating ability between closely related subtypes like B and D. Finally, we show how the improved method can be used to accurately determine the subtype composition of Common Recombinant Forms of the virus that are made up of two or more subtypes. Our method provides a simple and highly accurate alternative to other classification methods and will be useful in accurately annotating newly sequenced HIV-1 strains.
3.
4.
Ensembles are a well-established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield improved predictive performance compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to, and evaluate its performance on, a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics than individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.
5.
6.
7.
This paper examines recent developments and applications of Hidden Markov Models (HMMs) to various problems in computational biology, including multiple sequence alignment, homology detection, protein sequences classification, and genomic annotation.
8.
Byung-Jun Yoon 《Current Genomics》2009,10(6):402-415
Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as pairwise and multiple sequence alignments, gene annotation, classification, similarity search, and many others.
Key Words: Hidden Markov model (HMM), pair-HMM, profile-HMM, context-sensitive HMM (csHMM), profile-csHMM, sequence analysis.
9.
Background
Hidden Markov Models (HMMs) have proven very useful in computational biology for such applications as sequence pattern matching, gene-finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or the aspects of structure that could be represented by character strings).
10.
11.
Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs (“vFams”) to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format. 
We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).
12.
Biolog EcoPlates™ can be used to measure the carbon substrate utilisation patterns of microbial communities. This method results in a community-level physiological profile (CLPP), which yields a very large amount of data that may be difficult to interpret. In this work, we explore a combination of statistical techniques (particularly the use of generalised additive models [GAMs]) to improve the exploitation of CLPP data. The strength of GAMs lies in their ability to address highly non-linear relationships between the response and the set of explanatory variables. We studied the impact of earthworms (Aporrectodea caliginosa Savigny 1826) and cadmium (Cd) on the CLPP of soil bacteria. The results indicated that both Cd and earthworms modified the CLPP. GAMs were used to assess time-course changes in the diversity of substrate utilisation (DSU) using the Shannon-Wiener index. GAMs revealed significant differences for all treatments (compared to control -S-). The Cd-exposed microbial community presented very high metabolic capacities on a few substrata, resulting in an initial acute decrease of DSU (i.e. intense utilisation of a few carbon substrata). After 54 h, and over the next 43 h, the increase of the DSU suggests that other, less dominant taxa reached high numbers in the wells containing sources that are less suitable for the Cd-tolerant taxa. Earthworms were a much more determining factor than Cd in explaining time-course changes in DSU. Accordingly, Ew and EwCd soils presented similar trends, regardless of the presence of Cd. Moreover, both treatments presented similar numbers of bacteria, higher than those in Cd-treated soils. This experimental approach, based on the use of DSU and GAMs, allowed for a global and statistically relevant interpretation of the changes in carbon source utilisation, highlighting the key role of earthworms in the protection of microbial communities against Cd.
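The diversity of substrate utilisation mentioned in the abstract above rests on the Shannon-Wiener index, H' = -Σ p_i ln p_i. The sketch below is only an illustration: it assumes that p_i is the share of each well in the summed, blank-corrected absorbance of a plate, a detail the abstract does not state. The function name `shannon_index` and the input values are hypothetical.

```python
import math


def shannon_index(absorbances):
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)).

    Assumption (not from the abstract): p_i is each well's share of the
    total blank-corrected absorbance; wells with no signal are ignored.
    """
    positive = [a for a in absorbances if a > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in positive)


# Even use of four substrates gives the maximum for four wells, ln(4).
even = shannon_index([1.0, 1.0, 1.0, 1.0])
# Intense use of a single substrate gives zero diversity.
single = shannon_index([5.0, 0.0, 0.0, 0.0])
```

Under this assumption, a community that concentrates on a few substrates scores lower than one that uses many substrates evenly, which matches the direction of the DSU decrease described for the Cd-exposed community.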
13.
14.
15.
Background
G-protein coupled receptors (GPCRs) comprise the largest group of eukaryotic cell surface receptors with great pharmacological interest. A broad range of native ligands interact with and activate GPCRs, leading to signal transduction within cells. Most of these responses are mediated through the interaction of GPCRs with heterotrimeric GTP-binding proteins (G-proteins). Due to the information explosion in biological sequence databases, the development of software algorithms that could predict properties of GPCRs is important. Experimental data reported in the literature suggest that heterotrimeric G-proteins interact with parts of the activated receptor at the transmembrane helix-intracellular loop interface. Utilizing this information and membrane topology information, we have developed an intensive exploratory approach to generate a refined library of statistical models (Hidden Markov Models) that predict the coupling preference of GPCRs to heterotrimeric G-proteins. The method predicts the coupling preferences of GPCRs to Gs, Gi/o and Gq/11, but not G12/13 subfamilies.
16.
Characterization of single channel currents using digital signal processing techniques based on Hidden Markov Models. Cited by: 14 (self-citations: 0; citations by others: 14)
S H Chung, J B Moore, L G Xia, L S Premkumar, P W Gage 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》1990,329(1254):265-285
Techniques for extracting small, single channel ion currents from background noise are described and tested. It is assumed that single channel currents are generated by a first-order, finite-state, discrete-time, Markov process to which is added 'white' background noise from the recording apparatus (electrode, amplifiers, etc). Given the observations and the statistics of the background noise, the techniques described here yield a posteriori estimates of the most likely signal statistics, including the Markov model state transition probabilities, duration (open- and closed-time) probabilities, histograms, signal levels, and the most likely state sequence. Using variations of several algorithms previously developed for solving digital estimation problems, we have demonstrated that: (1) artificial, small, first-order, finite-state, Markov model signals embedded in simulated noise can be extracted with a high degree of accuracy, (2) processing can detect signals that do not conform to a first-order Markov model but the method is less accurate when the background noise is not white, and (3) the techniques can be used to extract from the baseline noise single channel currents in neuronal membranes. Some studies have been included to test the validity of assuming a first-order Markov model for biological signals. This method can be used to obtain directly from digitized data, channel characteristics such as amplitude distributions, transition matrices and open- and closed-time durations.
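One of the quantities listed in the abstract above is the most likely state sequence of a finite-state Markov signal observed in white noise. A common way to compute such a sequence is the Viterbi algorithm; the sketch below uses it as an illustration and should not be read as the authors' own procedure. The Gaussian noise model with a known standard deviation, the function name `viterbi_gaussian`, and all numeric values are assumptions made for the example.

```python
import math


def viterbi_gaussian(obs, levels, trans, init, sigma):
    """Most likely state path for a Markov signal in white Gaussian noise.

    obs    : sampled current values
    levels : current level of each state (e.g. closed, open)
    trans  : trans[r][s] = P(next state s | current state r), all > 0
    init   : initial state probabilities, all > 0
    sigma  : standard deviation of the noise (assumed known)
    """
    n_states = len(levels)

    def log_emit(x, s):
        # Gaussian log-density up to a constant shared by all states.
        return -((x - levels[s]) ** 2) / (2.0 * sigma ** 2)

    score = [math.log(init[s]) + log_emit(obs[0], s) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev = score
        pointers = []
        score = []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + log_emit(x, s))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical record: closed level 0.0, open level 1.0, noise sigma 0.2.
record = [0.1, -0.2, 0.05, 1.1, 0.9, 1.05, 0.0, 0.1]
path = viterbi_gaussian(record, levels=[0.0, 1.0],
                        trans=[[0.9, 0.1], [0.1, 0.9]],
                        init=[0.5, 0.5], sigma=0.2)
```

Open- and closed-time durations can then be read off as the run lengths of each state in `path`.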
17.
Bulla I, Schultz AK, Meinicke P 《Statistical applications in genetics and molecular biology》2012,11(1):Article 1
Profile Hidden Markov Models (pHMMs) are widely used to model nucleotide or protein sequence families. In many applications, a sequence family classified into several subfamilies is given and each subfamily is modeled separately by one pHMM. A major drawback of this approach is the difficulty of coping with subfamilies composed of very few sequences. Correct subtyping of human immunodeficiency virus-1 (HIV-1) sequences is one of the most crucial bioinformatic tasks affected by this problem of small subfamilies, i.e., HIV-1 subtypes with a small number of known sequences. To deal with small samples for particular subfamilies of HIV-1, we employ a machine learning approach. More precisely, we make use of an existing HMM architecture and its associated inference engine, while replacing the unsupervised estimation of emission probabilities by a supervised method. For that purpose, we use regularized linear discriminant learning together with a balancing scheme to account for the widely varying sample size. After training the multiclass linear discriminants, the corresponding weights are transformed to valid probabilities using a softmax function. We apply this modified algorithm to classify HIV-1 sequence data (in the form of partial-length HIV-1 sequences and semi-artificial recombinants) and show that the performance of pHMMs can be significantly improved by the proposed technique.
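The last modelling step in the abstract above, turning discriminant weights into valid probabilities with a softmax function, can be sketched in a few lines. This is a generic softmax only; how the weights are indexed per model position and symbol is not given in the abstract, so the four-value example (one hypothetical weight per nucleotide at a single position) is an assumption.

```python
import math


def softmax(weights):
    """Map real-valued weights to probabilities that sum to 1."""
    m = max(weights)  # subtract the maximum for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical discriminant weights for A, C, G, T at one model position.
emission = softmax([2.0, 0.5, -1.0, 0.5])
```

The output is non-negative, sums to 1, and preserves the ordering of the weights, so it can stand in for a row of emission probabilities.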
18.
Classification and Mapping of Riparian Systems Using Airborne Multispectral Videography. Cited by: 1 (self-citations: 0; citations by others: 1)
Christopher M. U. Neale 《Restoration Ecology》1997,5(4S):103-112
19.
β-lactamase mediated antibiotic resistance is an important health issue and the discovery of new β-lactam type antibiotics or β-lactamase inhibitors is an area of intense research. Today, there are about a thousand β-lactamases due to the evolutionary pressure exerted by these ligands. While β-lactamases hydrolyse the β-lactam ring of antibiotics, rendering them ineffective, Penicillin-Binding Proteins (PBPs), which share high structural similarity with β-lactamases, also confer antibiotic resistance to their host organism by acquiring mutations that allow them to continue their participation in cell wall biosynthesis. In this paper, we propose a novel approach to include ligand sharing information for classifying and clustering β-lactamases and PBPs in an effort to elucidate the ligand induced evolution of these β-lactam binding proteins. We first present a detailed summary of the β-lactamase and PBP families in the Protein Data Bank, as well as the compounds they bind to. Then, we build two different types of networks in which the proteins are represented as nodes, and two proteins are connected by an edge with a weight that depends on the number of shared identical or similar ligands. These models are analyzed under three different edge weight settings, namely unweighted, weighted, and normalized weighted. A detailed comparison of these six networks showed that the use of ligand sharing information to cluster proteins resulted in modules comprising proteins with not only sequence similarity but also functional similarity. Consideration of ligand similarity highlighted some interactions that were not detected in the identical ligand network. Analysing the β-lactamases and PBPs using ligand-centric network models enabled the identification of novel relationships, suggesting that these models can be used to examine other protein families to obtain information on their ligand induced evolutionary paths.
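The identical-ligand network described in the abstract above, with its three edge weight settings, can be sketched as follows. The abstract does not define the normalization, so the Jaccard ratio (shared ligands divided by the union of both ligand sets) is an assumption chosen for illustration. The function name `ligand_network` and the protein and ligand identifiers are hypothetical.

```python
from itertools import combinations


def ligand_network(ligands_by_protein):
    """Connect proteins that share at least one identical ligand.

    Returns {(protein_a, protein_b): weights}, where weights holds the
    three settings: unweighted (1), weighted (number of shared ligands),
    and normalized (Jaccard ratio; an assumption, not from the abstract).
    """
    edges = {}
    for a, b in combinations(sorted(ligands_by_protein), 2):
        shared = ligands_by_protein[a] & ligands_by_protein[b]
        if not shared:
            continue  # no shared ligand, no edge
        union = ligands_by_protein[a] | ligands_by_protein[b]
        edges[(a, b)] = {
            "unweighted": 1,
            "weighted": len(shared),
            "normalized": len(shared) / len(union),
        }
    return edges


# Hypothetical proteins and ligand identifiers.
example = {
    "protein_1": {"lig_a", "lig_b", "lig_c"},
    "protein_2": {"lig_b", "lig_c"},
    "protein_3": {"lig_d"},
}
network = ligand_network(example)
```

A similar-ligand network would follow the same pattern, with the set intersection replaced by a count of ligand pairs that exceed a chosen chemical similarity threshold.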
20.
Reliability analysis of the electrical control system of a subsea blowout preventer (BOP) stack is carried out based on the Markov method. For the subsea BOP electrical control system used in the current work, the 3-2-1-0 and 3-2-0 input voting schemes are available. The effects of the voting schemes on system performance are evaluated based on Markov models. In addition, the effects of failure rates of the modules and repair time on system reliability indices are also investigated.