Similar literature
 20 similar documents found
1.
Adaptive introgression—the flow of adaptive genetic variation between species or populations—has attracted significant interest in recent years and it has been implicated in a number of cases of adaptation, from pesticide resistance and immunity, to local adaptation. Despite this, methods for identification of adaptive introgression from population genomic data are lacking. Here, we present Ancestry_HMM-S, a hidden Markov model-based method for identifying genes undergoing adaptive introgression and quantifying the strength of selection acting on them. Through extensive validation, we show that this method performs well on moderately sized data sets for realistic population and selection parameters. We apply Ancestry_HMM-S to a data set of an admixed Drosophila melanogaster population from South Africa and we identify 17 loci which show signatures of adaptive introgression, four of which have previously been shown to confer resistance to insecticides. Ancestry_HMM-S provides a powerful method for inferring adaptive introgression in data sets that are typically collected when studying admixed populations. This method will enable powerful insights into the genetic consequences of admixture across diverse populations. Ancestry_HMM-S can be downloaded from https://github.com/jesvedberg/Ancestry_HMM-S/.

2.

Background  

Riboswitches are a type of noncoding RNA that regulate gene expression by switching from one structural conformation to another on ligand binding. The various classes of riboswitches discovered so far are differentiated by the ligand, which on binding induces a conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform that undergoes conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved in riboswitches belonging to the same class. We propose a method for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMM). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain.

3.
Dwivedi SK, Sengupta S. PLoS ONE, 2012, 7(5): e36566
Accurate classification of HIV-1 subtypes is essential for studying the dynamic spatial distribution pattern of HIV-1 subtypes and also for developing effective methods of treatment that can be targeted to attack specific subtypes. We propose a classification method based on profile Hidden Markov Model that can accurately identify an unknown strain. We show that a standard method that relies on the construction of a positive training set only, to capture unique features associated with a particular subtype, can accurately classify sequences belonging to all subtypes except B and D. We point out the drawbacks of the standard method; namely, an arbitrary choice of threshold to distinguish between true positives and true negatives, and the inability to discriminate between closely related subtypes. We then propose an improved classification method based on construction of a positive as well as a negative training set to improve discriminating ability between closely related subtypes like B and D. Finally, we show how the improved method can be used to accurately determine the subtype composition of Common Recombinant Forms of the virus that are made up of two or more subtypes. Our method provides a simple and highly accurate alternative to other classification methods and will be useful in accurately annotating newly sequenced HIV-1 strains.

4.
H. Zhao, J. Li, W. P. Robinson. Biometrics, 2001, 57(4): 1074-1079
Genetic studies of uniparental disomy (UPD) employing many markers have helped geneticists to gain a better understanding of the molecular mechanisms underlying nondisjunction. However, most existing methods cannot simultaneously analyze all genetic markers and consistently incorporate crossover interference; they thus fail to make the most use of genetic information in the data. In the present article, we describe a hidden Markov model for multilocus uniparental disomy data. This method is based on the chi-square model for the crossover process and can simultaneously incorporate all marker information including untyped and uninformative markers. We then apply this novel method to analyze a set of UPD15 data.

5.
This paper proposes the use of hidden Markov time series models for the analysis of the behaviour sequences of one or more animals under observation. These models have advantages over the Markov chain models commonly used for behaviour sequences, as they can allow for time-trend or expansion to several subjects without sacrificing parsimony. Furthermore, they provide an alternative to higher-order Markov chain models if a first-order Markov chain is unsatisfactory as a model. To illustrate the use of such models, we fit multivariate and univariate hidden Markov models allowing for time-trend to data from an experiment investigating the effects of feeding on the locomotory behaviour of locusts (Locusta migratoria).

6.

Background

The advent of various high-throughput experimental techniques for measuring molecular interactions has enabled the systematic study of biological interactions on a global scale. Since biological processes are carried out by elaborate collaborations of numerous molecules that give rise to a complex network of molecular interactions, comparative analysis of these biological networks can bring important insights into the functional organization and regulatory mechanisms of biological systems.

Methodology/Principal Findings

In this paper, we present an effective framework for identifying common interaction patterns in the biological networks of different organisms based on hidden Markov models (HMMs). Given two or more networks, our method efficiently finds the top matching paths in the respective networks, where the matching paths may contain a flexible number of consecutive insertions and deletions.

Conclusions/Significance

Based on several protein-protein interaction (PPI) networks obtained from the Database of Interacting Proteins (DIP) and other public databases, we demonstrate that our method is able to detect biologically significant pathways that are conserved across different organisms. Our algorithm has a polynomial complexity that grows linearly with the size of the aligned paths. This enables the search for very long paths with more than 10 nodes within a few minutes on a desktop computer. The software program that implements this algorithm is available upon request from the authors.

7.
8.
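Applications of hidden Markov processes in bioinformatics   Total citations: 3 (self-citations: 0, citations by others: 3)
The hidden Markov model (HMM) is a statistical method proposed in the 1970s that was previously used mainly for speech recognition. Churchill introduced it into computational biology in 1989. HMMs are now among the more widely used statistical methods in bioinformatics, applied mainly to linear sequence analysis, model analysis, and gene finding. This paper gives a concise description of HMMs and a brief overview of their applications in these areas.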

9.
Ensembles are a well-established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase of the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles with sampling domain-specific knowledge instead of sampling data. We apply the proposed method to and evaluate its performance on a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.

10.
11.
12.
13.
Recent applications of Hidden Markov Models in computational biology   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper examines recent developments and applications of Hidden Markov Models (HMMs) to various problems in computational biology, including multiple sequence alignment, homology detection, protein sequence classification, and genomic annotation.

14.
Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as pairwise and multiple sequence alignments, gene annotation, classification, similarity search, and many others. Key Words: Hidden Markov model (HMM), pair-HMM, profile-HMM, context-sensitive HMM (csHMM), profile-csHMM, sequence analysis.

15.
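Detection of deletions and insertions in coding sequences based on a hidden Markov model (in English)   Total citations: 2 (self-citations: 0, citations by others: 2)
Since the completion of genome sequencing, gene identification and gene structure prediction with computational tools have attracted growing attention. Many software tools have been developed, such as GenScan, Genemark, and GRAIL, and they provide important clues for finding new genes. However, the problem of gene identification and prediction is still not fully solved: when the coding sequence of a target gene contains deletions or insertions, the predicted structure differs greatly from the actual gene structure. To eliminate the effect of sequencing errors on prediction, it is desirable to locate the sequencing errors in coding regions. Based on this idea, the authors use statistical properties of DNA sequences and a hidden Markov model extended with deletion and insertion states, then apply the Viterbi algorithm to find exon fragments that contain deletions and insertions. Tested on the commonly used Burset/Guigo test set, the method achieves sensitivity (Sn) and specificity (Sp) both above 84% at the exon level.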
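
The Viterbi step mentioned in this abstract can be illustrated with a minimal sketch. It is a generic log-space Viterbi decoder for a discrete HMM, not the authors' model: the two states (`coding`, `noncoding`), the transition and emission probabilities, and the example sequence are all hypothetical, and the insertion and deletion states described in the abstract are not modelled here.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most likely state sequence."""

    def log(p):
        # log(0) is treated as -inf so impossible moves are never chosen
        return math.log(p) if p > 0 else float("-inf")

    # score[s]: best log-probability of any path that ends in state s
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            ptr[s] = best
            new_score[s] = (
                score[best] + log(trans_p[best][s]) + log(emit_p[s][symbol])
            )
        score = new_score
        back.append(ptr)

    # Trace back from the best final state
    state = max(states, key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best_score, path


# Hypothetical toy parameters, chosen for illustration only
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "noncoding": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

if __name__ == "__main__":
    log_prob, path = viterbi("GGCAATT", STATES, START, TRANS, EMIT)
    print(log_prob, path)
```

Working in log space avoids numerical underflow on long sequences; a model with insertion and deletion states would need additional states and, for silent states, a modified recursion.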

16.

Background  

Hidden Markov Models (HMMs) have proven very useful in computational biology for such applications as sequence pattern matching, gene-finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or the aspects of structure that could be represented by character strings).

17.
18.
Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR-based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs (“vFams”) to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format. We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).

19.
Biolog EcoPlates™ can be used to measure the carbon substrate utilisation patterns of microbial communities. This method results in a community-level physiological profile (CLPP), which yields a very large amount of data that may be difficult to interpret. In this work, we explore a combination of statistical techniques (particularly the use of generalised additive models [GAMs]) to improve the exploitation of CLPP data. The strength of GAMs lies in their ability to address highly non-linear relationships between the response and the set of explanatory variables. We studied the impact of earthworms (Aporrectodea caliginosa Savigny 1826) and cadmium (Cd) on the CLPP of soil bacteria. The results indicated that both Cd and earthworms modified the CLPP. GAMs were used to assess time-course changes in the diversity of substrate utilisation (DSU) using the Shannon-Wiener index. GAMs revealed significant differences for all treatments (compared to the control, S). The Cd-exposed microbial community presented very high metabolic capacities on a few substrata, resulting in an initial acute decrease of DSU (i.e. intense utilisation of a few carbon substrata). After 54 h, and over the next 43 h, the increase of the DSU suggests that other, less dominant taxa reached high numbers in the wells containing sources that are less suitable for the Cd-tolerant taxa. Earthworms were a much more determining factor than Cd in explaining time-course changes in DSU. Accordingly, Ew and EwCd soils presented similar trends, regardless of the presence of Cd. Moreover, both treatments presented similar numbers of bacteria, higher than those in Cd-treated soils. This experimental approach, based on the use of DSU and GAMs, allowed for a global and statistically relevant interpretation of the changes in carbon source utilisation, highlighting the key role of earthworms in the protection of microbial communities against Cd.
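
The Shannon-Wiener index used for DSU in this abstract has the standard form H = -Σ p_i ln p_i. A minimal sketch follows, assuming that p_i is the share of one substrate well in the summed plate signal and that non-positive readings are treated as zero; the abstract specifies neither choice, and the function name and sample readings are hypothetical.

```python
import math


def shannon_index(well_values):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over positive readings."""
    # Assumption: readings at or below zero contribute nothing
    positive = [v for v in well_values if v > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    h = 0.0
    for v in positive:
        p = v / total  # share of this substrate in the total signal
        h -= p * math.log(p)
    return h


if __name__ == "__main__":
    even_use = [0.5, 0.5, 0.5, 0.5]        # all substrates used equally
    skewed_use = [1.9, 0.05, 0.03, 0.02]   # one substrate dominates
    print(shannon_index(even_use), shannon_index(skewed_use))
```

An even profile over n substrates gives the maximum value ln n, while intense use of a few substrates pushes the index toward zero, which matches the initial drop in DSU described for the Cd-exposed community.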

20.
Hidden Markov models have been successfully applied in various fields of time series analysis, especially for analyzing ion channel recordings. The maximum likelihood estimator (MLE) has recently been proven to be asymptotically normally distributed. Here, we investigate finite sample properties of the MLE and of different types of likelihood ratio tests (LRTs) by means of simulation studies. The MLE is shown to reach the asymptotic behavior within sample sizes that are common for various applications. Thus, reliable estimates and confidence intervals can be obtained. We give an approximate scaling function for the estimation error for finite samples, and investigate the power of different LRTs suitable for applications to ion channels, including tests for superimposed hidden Markov processes. Our results are applied to physiological sodium channel data.
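
Both the MLE and the LRTs in this abstract rest on the HMM log-likelihood. A minimal sketch of how that quantity can be computed with the scaled forward algorithm is given below, together with the usual LRT statistic 2(ℓ_alt − ℓ_null). It assumes a discrete-observation HMM with symbols coded as integer indices; the function names are hypothetical, and the reference distribution for the statistic depends on the hypotheses being compared and is not covered here.

```python
import math


def forward_loglik(obs, start_p, trans_p, emit_p):
    """Log-likelihood of obs under a discrete HMM (scaled forward algorithm).

    obs: list of symbol indices; start_p[i], trans_p[i][j], emit_p[i][k]
    are plain lists. Assumes the sequence has non-zero probability.
    """
    n = len(start_p)
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability
            alpha = [
                sum(alpha[i] * trans_p[i][j] for i in range(n)) * emit_p[j][obs[t]]
                for j in range(n)
            ]
        # Rescale to avoid underflow; the scale factors carry the likelihood
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic for nested models fitted by maximum likelihood."""
    return 2.0 * (loglik_alt - loglik_null)


if __name__ == "__main__":
    # Hypothetical two-state (closed/open) channel with binarised readings
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.2, 0.8]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(forward_loglik([0, 1, 1, 0, 0], start, trans, emit))
```

In practice the parameters would be estimated by maximising `forward_loglik` (for example with Baum-Welch or a numerical optimiser) under each hypothesis before forming the statistic.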
