Found 20 similar articles; search took 0 ms
1.
2.
The detection of outer membrane proteins (OMPs) in whole genomes is a topical question, and their sequence characteristics have therefore been studied intensively. This class of proteins shares a common beta-barrel architecture formed by adjacent antiparallel strands. However, because few structures are available, few structural studies have been made of this class. Here we propose a novel investigation of OMP local structure based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. Comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than that of globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of the pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMPs than in globular structures. Moreover, SA20-OMP also captures sequence information, which can be integrated into a scoring function for ranking structural models, with very promising results.
3.
Prediction of the experimental unfolding rates of two-state proteins, and models describing those rates, remain quite limited because of the complexity of the unfolding mechanism and the scarcity of experimental unfolding data compared with folding data. In this work, 25 two-state proteins characterized by Maxwell et al. (Protein Sci 2005;14:602–616) using a consensus set of experimental conditions were taken, and the long-range order (LRO) parameter derived from their three-dimensional structures was related to their experimental unfolding rates ln(k(u)). From the total data set of 30 proteins used by Maxwell et al. (Protein Sci 2005;14:602–616), five slow-unfolding proteins with very low unfolding rates were considered outliers and were not included in our data set. For the all-alpha and mixed-class proteins, LRO showed a strong inverse correlation with experimental ln(k(u)), with r = -0.99 and -0.88, respectively. For all-beta proteins, the correlation between LRO and experimental ln(k(u)) was weaker, at -0.62. To predict the unfolding rates, a simple statistical method was used: linear regression equations based on LRO were developed for the individual structural classes of proteins, and the results showed better agreement with experimental results.
4.
5.
6.
Dhoha Triki, Sandrine Fartek, Benoit Visseaux, Diane Descamps, Anne-Claude Camproux 《Journal of biomolecular structure & dynamics》2013,31(17):4658-4670
The HIV-2 protease (PR2) is an important target for designing new drugs against HIV-2 infection. In this study, we explored the structural backbone variability of all available PR2 structures complexed with various inhibitors using a structural alphabet approach. 77% of PR2 positions are structurally variable, meaning they exhibit different local conformations in PR2 structures. This variability was observed all along the structure, particularly in the elbow and flap regions. Some of the backbone changes observed among the 18 PR2 structures are induced by intrinsic flexibility, while others, occurring in the binding pocket, are putatively induced by ligand binding. These latter changes could be important for PR2 adaptation to diverse ligands and are accompanied by changes outside the binding pocket. In addition, studying the link between the structural variability of the pocket and PR2–ligand interactions allowed us to localize three kinds of pocket regions: regions important for ligand binding and catalytic function, regions important for ligand recognition that adjust their backbone in response to ligand binding, and regions important for pocket opening and closing that have large intrinsic flexibility. Finally, we suggest that differences in ligand effectiveness for PR2 could be partially explained by the different backbone deformations these ligands induce. To conclude, this study is the first characterization of PR2 structural variability that considers ligand diversity. It provides information about how PR2 recognizes various ligands and adapts its local conformation to bound ligands, which could help explain the resistance of PR2 to its inhibitors, a major antiretroviral class. Communicated by Ramaswamy H. Sarma.
7.
We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that, under appropriate conditions, the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of multivariate Gaussian hidden Markov models, which we use to illustrate the applicability of our general results. Finally, the methodology is illustrated by means of a real data set.
8.
Residue burial, which describes a protein residue's exposure to solvent and neighboring atoms, is key to protein structure prediction, modeling, and analysis. We assessed 21 alphabets representing residue burial, according to their predictability from amino acid sequence, conservation in structural alignments, and utility in one fold-recognition scenario. This follows upon our previous work in assessing nine representations of backbone geometry [1]. The alphabet found to be most effective overall has seven states and is based on a count of C(beta) atoms within a 14 Å-radius sphere centered at the C(beta) of a residue of interest. When incorporated into a hidden Markov model (HMM), this alphabet gave us a 38% performance boost in fold recognition and 23% in alignment quality.
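The burial measure described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `burial_counts`, the choice to exclude the residue's own C(beta) from its count, and the toy coordinates are all hypothetical. The abstract does not give the thresholds that map counts to the seven alphabet states, so that step is left out.

```python
import math

def burial_counts(cb_coords, radius=14.0):
    """For each residue, count the other C-beta atoms within `radius` angstroms.

    Assumption: the residue's own C-beta is not counted.
    """
    counts = []
    for i, a in enumerate(cb_coords):
        n = 0
        for j, b in enumerate(cb_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts

# Toy C-beta coordinates in angstroms (hypothetical, not from a real structure).
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(burial_counts(toy))
```

The double loop is quadratic in the number of residues, which is acceptable for single proteins; a spatial index would be the usual choice for larger inputs.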
9.
It is often desired to identify further homologs of a family of biological sequences from the ever-growing sequence databases. Profile hidden Markov models excel at capturing the common statistical features of a group of biological sequences. With these common features, we can search the biological database and find new homologous sequences. Most general profile hidden Markov model methods, however, treat the evolutionary relationships between the sequences in a homologous group in an ad-hoc manner. We hereby introduce a method to incorporate phylogenetic information directly into hidden Markov models, and demonstrate that the resulting model performs better than most of the current multiple sequence-based methods for finding distant homologs.
10.
This work presents a novel pairwise statistical alignment method based on an explicit evolutionary model of insertions and deletions (indels). Indel events of any length are possible according to a geometric distribution. The geometric distribution parameter, the indel rate, and the evolutionary time are all estimated by maximum likelihood from the sequences being aligned. Probability calculations are done using a pair hidden Markov model (HMM) with transition probabilities calculated from the indel parameters. Equations for the transition probabilities make the pair HMM closely approximate the specified indel model. The method provides an optimal alignment, its likelihood, the likelihood of all possible alignments, and the reliability of individual alignment regions. Human alpha- and beta-hemoglobin sequences are aligned as an illustration of the potential utility of this pair HMM approach.
11.
Knowledge of the pattern of selection in natural populations is fundamental for our understanding of the evolutionary process. Selection at higher levels has gained considerable theoretical support in recent years, and one possible level of selection is the breeding pair, where fitness is a function of the pair and cannot be reduced to single individuals. We analyzed the importance of pair‐level selection over 25 years in a natural population of the collared flycatcher. Pair‐level selection was significant in 5 years and probably in another 9 years. The relative importance of pair‐level selection varied over years, and it can be as strong as, or stronger than, directional selection. This means that selection can act on the combination of the breeding pair in addition to selection on each individual separately. Overall, the conservative estimates obtained here show that this is a potentially important form of selection.
12.
Methylated non-CpGs (mCpHs) in mammalian cells yield weak enrichment signals and colocalize with methylated CpGs (mCpGs), and thus have been considered byproducts of hyperactive methyltransferases. However, mCpHs are cell type-specific and associated with epigenetic regulation, although their dependency on mCpGs remains to be elucidated. In this study, we demonstrated that mCpHs colocalize with mCpGs in pluripotent stem cells, but not in brain cells. In addition, profiling genome-wide methylation patterns using a hidden Markov model revealed abundant genomic regions in which CpGs and CpHs are differentially methylated in brain. These regions were frequently located in putative enhancers, and mCpHs within the enhancers increased in correlation with brain age. The enhancers with hypermethylated CpHs were associated with genes functionally enriched in immune responses, and some of the genes were related to neuroinflammation and degeneration. This study provides insight into the roles of non-CpG methylation as an epigenetic code in the mammalian brain genome.
13.
Selection for new favorable variants can lead to selective sweeps. However, such sweeps might be rare in the evolution of different species for which polygenic adaptation or selection on standing variation might be more common. Still, strong selective sweeps have been described in domestic species such as chicken lines or dog breeds. The goal of our study was to use a panel of individuals from 12 different cattle breeds genotyped at high density (800K SNPs) to perform a whole‐genome scan for selective sweeps defined as unexpectedly long stretches of reduced heterozygosity. To that end, we developed a hidden Markov model in which one of the hidden states corresponds to regions of reduced heterozygosity. Some unexpectedly long regions were identified. Among those, six contained genes known to affect traits with simple genetic architecture such as coat color or horn development. However, there was little evidence for sweeps associated with genes underlying production traits.
14.
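With genome sequencing completed, the use of computational tools for gene identification and gene structure prediction has attracted growing attention. Many application programs have been developed, such as GenScan, Genemark, and GRAIL, and they provide important clues for finding new genes. However, the problem of gene identification and prediction has not been fully solved: when the coding sequence of a target gene contains deletions and insertions, the predicted structure differs greatly from the actual gene structure. To eliminate the effect of sequencing errors on the prediction, it is desirable to locate the sequencing errors in coding regions. Based on this idea, we use statistical properties of DNA sequences and a hidden Markov model (HMM) extended with deletion and insertion states, and then apply the Viterbi algorithm to find exon segments that contain deletions and insertions. Tested on the commonly used Burset/Guigo test set, the method achieves sensitivity (Sn) and specificity (Sp) both above 84% at the exon level.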
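The Viterbi decoding step mentioned in this abstract can be illustrated with a generic log-space implementation. This is a sketch only: the two-state toy model below (states `C` for coding-like and `N` for other) and all of its probabilities are hypothetical, and it omits the insertion and deletion states that the abstract describes.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `obs` (log-space Viterbi)."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy model: "C" prefers G/C bases, "N" prefers A/T bases.
STATES = ("C", "N")
START = {"C": 0.5, "N": 0.5}
TRANS = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
EMIT = {
    "C": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},
    "N": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4},
}

print("".join(viterbi("GGCCAATT", STATES, START, TRANS, EMIT)))
```

Working in log space avoids numerical underflow on long sequences; all probabilities in the toy model are nonzero so that every logarithm is defined.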
15.
Detilleux JC 《Animal : an international journal of animal bioscience》2011,5(2):175-181
In many countries, high somatic cell scores (SCS) in milk are used as an indicator for mastitis because they are collected on a routine basis. However, individual test-day SCS are not very accurate in identifying infected cows. Mathematical models may improve the accuracy of the biological marker by making better use of the information contained in the available data. Here, a simple hidden Markov model (HMM) is described mathematically and applied to SCS recorded monthly on cows with or without clinical mastitis to evaluate its accuracy in estimating parameters (mean, variance and transition probabilities) under healthy or diseased states. The SCS means were estimated at 1.96 (s.d. = 0.16) and 4.73 (s.d. = 0.71) for the hidden healthy and infected states, and the common variance at 0.83 (s.d. = 0.11). The probability of remaining uninfected, recovering from infection, getting newly infected and remaining infected between consecutive test days was estimated at 78.84%, 60.49%, 11.70% and 15%, respectively. Three different health-related states were compared: clinical stages observed by farmers, subclinical cases defined for somatic cell counts below or above 250 000 cells/ml and infected stages obtained from the HMM. The results showed that HMM identifies infected cows before the appearance of clinical and subclinical signs, which may critically improve the power of the studies on the genetic determinants of SCS and reduce biases in predicting breeding values for SCS.
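A two-state Gaussian HMM of the kind described in this abstract can be used to track the probability of infection across test days. The sketch below is an illustration under stated assumptions, not the paper's procedure: the state means and common variance are the estimates quoted in the abstract, while the transition matrix `TRANS`, the initial distribution `START`, and the function names are hypothetical placeholders.

```python
import math

MEANS = {"healthy": 1.96, "infected": 4.73}  # SCS means quoted in the abstract
VARIANCE = 0.83                              # common variance quoted in the abstract

# Hypothetical transition probabilities and initial distribution (illustration only).
TRANS = {
    "healthy": {"healthy": 0.9, "infected": 0.1},
    "infected": {"healthy": 0.5, "infected": 0.5},
}
START = {"healthy": 0.5, "infected": 0.5}

def gauss_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def filter_infected(scores):
    """Return P(infected on day t | scores up to day t) for each test day."""
    probs = []
    belief = dict(START)
    for t, x in enumerate(scores):
        if t > 0:
            # Prediction step: push the previous belief through the transitions.
            belief = {s: sum(belief[r] * TRANS[r][s] for r in belief) for s in MEANS}
        # Update step: weight by the emission density, then normalize.
        weighted = {s: belief[s] * gauss_pdf(x, MEANS[s], VARIANCE) for s in MEANS}
        total = sum(weighted.values())
        belief = {s: w / total for s, w in weighted.items()}
        probs.append(belief["infected"])
    return probs

print(filter_infected([2.0, 2.1, 4.8]))
```

This is the forward (filtering) recursion with per-step normalization, so each output uses only the scores observed up to that test day.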
16.
17.
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.
18.
Cadherins are cell surface adhesion proteins important for tissue development and integrity. Type I and type II, or classical, cadherins form adhesive dimers via an interface formed through the exchange, or “swapping”, of the N-terminal β-strands from their membrane-distal EC1 domains. Here, we ask which sequence and structural features in EC1 domains are responsible for β-strand swapping and whether members of other cadherin families form similar strand-swapped binding interfaces. We created a comprehensive database of multiple alignments of each type of cadherin domain. We used the known three-dimensional structures of classical cadherins to identify conserved positions in multiple sequence alignments that appear to be crucial determinants of the cadherin domain structure. We identified features that are unique to EC1 domains. On the basis of our analysis, we conclude that all cadherin domains have very similar overall folds but, with the exception of classical and desmosomal cadherin EC1 domains, most of them do not appear to bind through a strand-swapping mechanism. Thus, non-classical cadherins that function in adhesion are likely to use different protein-protein interaction interfaces. Our results have implications for the evolution of molecular mechanisms of cadherin-mediated adhesion in vertebrates.
19.
《Cell reports》2020,30(11):3644-3654.e6
20.
Johann C Detilleux 《Genetics Selection Evolution》2008,40(5):491-509
A mixed hidden Markov model (HMM) was developed for predicting breeding values of a biomarker (here, somatic cell score) and the individual probabilities of health and disease (here, mastitis) based upon the measurements of the biomarker. At a first level, the unobserved disease process (Markov model) was introduced, and at a second level, the measurement process was modeled, making the link between the unobserved disease states and the observed biomarker values. This hierarchical formulation allows joint estimation of the parameters of both processes. The flexibility of this approach is illustrated on simulated data. First, lactation curves for the biomarker were generated based upon published parameters (mean, variance, and probabilities of infection) for cows with known clinical conditions (health or mastitis due to Escherichia coli or Staphylococcus aureus). Next, estimation of the parameters was performed via Gibbs sampling, assuming the health status was unknown. Results from the simulations and mathematics show that the mixed HMM is appropriate for estimating the quantities of interest, although the accuracy of the estimates is moderate when the prevalence of the disease is low. The paper ends with some indications for further developments of the methodology.