G-protein coupled receptors (GPCRs) comprise the largest group of eukaryotic cell surface receptors and are of great pharmacological interest. A broad range of native ligands interact with and activate GPCRs, leading to signal transduction within cells. Most of these responses are mediated through the interaction of GPCRs with heterotrimeric GTP-binding proteins (G-proteins). Due to the information explosion in biological sequence databases, the development of software algorithms that can predict properties of GPCRs is important. Experimental data reported in the literature suggest that heterotrimeric G-proteins interact with parts of the activated receptor at the transmembrane helix-intracellular loop interface. Utilizing this information and membrane topology information, we have developed an intensive exploratory approach to generate a refined library of statistical models (Hidden Markov Models) that predict the coupling preference of GPCRs to heterotrimeric G-proteins. The method predicts the coupling preferences of GPCRs to the Gs, Gi/o and Gq/11 subfamilies, but not to the G12/13 subfamily.

G-protein coupled receptors (GPCRs) represent one of the most important classes of drug targets for the pharmaceutical industry and play important roles in cellular signal transduction. Predicting the coupling specificity of GPCRs to G-proteins is vital for further understanding the mechanism of signal transduction and the function of the receptors within a cell, which can provide new clues for pharmaceutical research and development. In this study, the features of amino acid compositions and physicochemical properties of the full-length GPCR sequences have been analyzed and extracted. Based on these features, classifiers have been developed to predict the coupling specificity of GPCRs to G-proteins using support vector machines. The testing results show that this method could obtain better prediction accuracy.
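
As an illustration of the amino acid composition features mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the feature vector is simply the fraction of each of the 20 standard residues in the full-length sequence; the function name `amino_acid_composition` and the residue ordering are hypothetical choices made for this example.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard characters (for example 'X' or gaps) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this kind could be passed to any support vector machine library as one row of the training matrix; the physicochemical features described in the abstract would be appended as additional columns.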

MOTIVATION: Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed and have been shown to improve sensitivity and alignment quality compared with sequence-sequence methods (such as BLAST) and profile-sequence methods (e.g. PSI-BLAST). Here we present a new approach to profile-profile alignment we call Comparison of Alignments by Constructing Hidden Markov Models (HMMs) (COACH). COACH aligns two multiple sequence alignments by constructing a profile HMM from one alignment and aligning the other to that HMM. RESULTS: We compare the alignment accuracy of COACH with two recently published methods: Yona and Levitt's prof_sim and Sadreyev and Grishin's COMPASS. On two sets of reference alignments selected from the FSSP database, we find that COACH is able, on average, to produce alignments giving the best coverage or the fewest errors, depending on the chosen parameter settings. AVAILABILITY: COACH is freely available from www.drive5.com/lobster

Ubiquitin functions to regulate protein turnover in a cell by closely regulating the degradation of specific proteins. Such a regulatory role is very important, and thus I have analyzed the proteins that are ubiquitin-like, using an artificial neural network, support vector machines and a hidden Markov model (HMM). The methods were trained and tested on a set of 373 ubiquitin proteins and 373 non-ubiquitin proteins, obtained from the Entrez protein database. The artificial neural network and support vector machine are trained and tested using both the physicochemical properties and PSSM matrices generated from PSI-BLAST, while in the HMM-based method the sequences are used directly for training and testing. Further, the performance measures of the methods, i.e. accuracy, specificity, sensitivity and Matthews correlation coefficient, are calculated for the test sequences. The highest accuracy of 90.2%, specificity of 87.04% and sensitivity of 94.08% were achieved using the support vector machine model with PSSM matrices. Accuracies of 86.82%, 83.37%, 80.18% and 72.11% were obtained for the support vector machine with physicochemical properties, neural network with PSSM matrices, neural network with physicochemical properties, and hidden Markov model, respectively. As the accuracy of the SVM model is better using both the physicochemical properties and the PSSM matrices, it is concluded that kernel methods such as SVMs outperform neural networks and hidden Markov models.
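
The four performance measures named above follow directly from the counts in a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the dictionary keys are hypothetical, and the counts in the usage note are made-up numbers for illustration, not results from the study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity and Matthews correlation.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives. The function assumes at least one
    actual positive and one actual negative so the ratios are defined.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # fraction of actual positives recovered
    specificity = tn / (tn + fp)  # fraction of actual negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=90, tn=80, fp=20, fn=10)` gives an accuracy of 0.85, a sensitivity of 0.9 and a specificity of 0.8. Because the Matthews correlation coefficient uses all four cells of the confusion matrix, it is less easily inflated by class imbalance than accuracy alone.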

Predicting hand and finger posture during grasping tasks is an important issue in biomechanics. In this paper, a technique based on neural networks is proposed to learn the inverse kinematics mapping between the fingertip 3D position and the corresponding joint angles. Finger movements are obtained by an instrumented glove and are mapped to a multichain model of the hand. From the desired fingertip position, the neural networks predict the corresponding finger joint angles while preserving the subject-specific coordination patterns. Two sets of movements are considered in this study. The first one, the training set, consists of free finger movements and is used to construct the mapping between fingertip position and joint angles. The second one, constructed for testing purposes, is composed of a sequence of grasping tasks with everyday objects. The maximal mean error between the measured fingertip position and the fingertip position obtained from simulated joint angles and forward kinematics is 0.99 ± 0.76 mm for the training set and 1.49 ± 1.62 mm for the test set. Also, the maximal RMS error of joint angle prediction is 2.85 degrees and 5.10 degrees for the training and test sets, respectively, while the maximal mean joint angle prediction error is -0.11 ± 4.34 degrees and -2.52 ± 6.71 degrees for the training and test sets, respectively. Results relative to the learning and generalization capabilities of this architecture are also presented and discussed.

Artificial neural networks (ANNs) have been used for the recognition of non-linear patterns, a characteristic of bioprocesses like wine production. In this work, ANNs were tested to predict problems of wine fermentation. A database of about 20,000 data points from industrial fermentations of Cabernet Sauvignon and 33 variables was used. Two different ways of inputting data into the model were studied, by points and by fermentation. Additionally, different sub-cases were studied by varying the predictor variables (total sugar, alcohol, glycerol, density, organic acids and nitrogen compounds) and the time of fermentation (72, 96 and 256 h). The input of data by fermentations gave better results than the input of data by points. In fact, it was possible to predict 100% of normal and problematic fermentations using three predictor variables: sugars, density and alcohol at 72 h (3 days). Overall, ANNs were capable of reaching 80% prediction using only one predictor variable at 72 h; however, it is recommended to add more fermentations to confirm this promising result.

A model was developed for novel prediction of N-linked glycan branching pattern classification for CHO-derived N-linked glycoproteins. The model consists of 30 independent recurrent neural networks and uses predicted quantities of secondary structure elements and residue solvent accessibility as an input vector. The model was designed to predict the major component of a heterogeneous mixture of CHO-derived glycoforms of a recombinant protein under normal growth conditions. Resulting glycosylation prediction is classified as either complex-type or high mannose. The incorporation of predicted quantities in the input vector allowed for theoretical mutant N-linked glycan branching predictions without initial experimental analysis of protein structures. Primary amino acid sequence data were effectively eliminated from the input vector space based on neural network prediction analyses. This provided further evidence that localized protein secondary structure elements and conformational structure may play more important roles in determining glycan branching patterns than does the primary sequence of a polypeptide. A confidence interval parameter was incorporated into the model to enable identification of false predictions. The model was further tested using published experimental results for mutants of the tissue-type plasminogen activator protein [J. Wilhelm, S.G. Lee, N.K. Kalyan, S.M. Cheng, F. Wiener, W. Pierzchala, P.P. Hung, Alterations in the domain structure of tissue-type plasminogen activator change the nature of asparagine glycosylation. Biotechnology (N.Y.) 8 (1990) 321-325].

In order to investigate the relationship between glycosyltransferase families and their motifs, we classified 47 glycosyltransferase families in the CAZy database into four superfamilies, GTS-A, -B, -C, and -D, using a profile Hidden Markov Model method. On the basis of the classification and the similarity between GTS-A and the nucleotidylyltransferase family catalyzing the synthesis of nucleotide-sugar, we proposed that ancient oligosaccharides might have been synthesized by the origin of GTS-B, whereas the origin of GTS-A might be the gene encoding the synthesis of nucleotide-sugar as the donor, which then evolved into glycosyltransferases that catalyze the synthesis of divergent carbohydrates. We also suggested that the divergent evolution of each superfamily in the corresponding subcellular component has increased the complexity of eukaryotic carbohydrate structure.

Selective knockdown of gene expression by short interfering RNAs (siRNAs) has allowed rapid validation of gene functions and made possible a high-throughput, genome-scale approach to interrogate gene function. However, randomly designed siRNAs display different knockdown efficiencies of target genes. Hence, various prediction algorithms based on siRNA functionality have recently been constructed to increase the likelihood of selecting effective siRNAs, thereby reducing the experimental cost. Toward this end, we have trained three back-propagation and Bayesian neural network models, previously not used in this context, to predict the knockdown efficiencies of 180 experimentally verified siRNAs on their corresponding target genes. Using our input coding, based primarily on RNA structure thermodynamic parameters, and a cross-validation method, we showed that our neural network models outperformed most other methods and are comparable to the best prediction algorithm published thus far. Furthermore, our neural network models correctly classified 74% of all siRNAs into different efficiency categories, with a correlation coefficient of 0.43 and a receiver operating characteristic curve score of 0.78, thus highlighting the potential utility of this method to complement other existing siRNA classification and prediction schemes.

Segmentation of yeast DNA using hidden Markov models

Traditional regression analysis of body weight growth curves encounters problems when the data are extremely variable. While transformations are often employed to meet the criteria of the analysis, some transformations are inadequate for normalizing the data. Regression analysis also requires presuppositions regarding the model to be fit and the techniques to be used in the analysis. An alternative approach using artificial neural networks is presented which may be suitable for developing predictive models of growth. Neural networks are simulators of the processes that occur in the biological brain during the learning process. They are trained on the data, developing the necessary algorithms within their internal architecture, and produce a predictive model based on the learned facts. A dataset of Sprague–Dawley rat (Rattus norvegicus) weights is analyzed by both traditional regression analysis and neural network training. Predictions of body weight are made from both models. While both methods produce models that adequately predict the body weights, the neural network model is superior in that it combines accuracy and precision, being less influenced by longitudinal variability in the data. Thus, the neural network provides another tool for researchers to analyze growth curve data.

This work presents a novel pairwise statistical alignment method based on an explicit evolutionary model of insertions and deletions (indels). Indel events of any length are possible according to a geometric distribution. The geometric distribution parameter, the indel rate, and the evolutionary time are all maximum likelihood estimated from the sequences being aligned. Probability calculations are done using a pair hidden Markov model (HMM) with transition probabilities calculated from the indel parameters. Equations for the transition probabilities make the pair HMM closely approximate the specified indel model. The method provides an optimal alignment, its likelihood, the likelihood of all possible alignments, and the reliability of individual alignment regions. Human alpha and beta-hemoglobin sequences are aligned, as an illustration of the potential utility of this pair HMM approach.

In this work the advantages of using artificial neural networks (ANNs) combined with experimental design (ED) to optimize the separation of amino acid enantiomers, with α‐cyclodextrin as chiral selector, were demonstrated. The results obtained with the ED‐ANN approach were compared with those of either the partial least‐squares (PLS) method or the response surface methodology where experimental design and the regression equation were used. The ANN approach is quite general, no explicit model is needed, and the amount of experimental work can be decreased considerably. Chirality 11:616–621, 1999. © 1999 Wiley‐Liss, Inc.

1.  Linking the movement and behaviour of animals to their environment is a central problem in ecology. Through the use of electronic tagging and tracking (ETT), collection of in situ data from free-roaming animals is now commonplace, yet statistical approaches enabling direct relation of movement observations to environmental conditions are still in development.
2.  In this study, we examine the hidden Markov model (HMM) for behavioural analysis of tracking data. HMMs allow for prediction of latent behavioural states while directly accounting for the serial dependence prevalent in ETT data. Updating the probability of behavioural switches with tag or remote-sensing data provides a statistical method that links environmental data to behaviour in a direct and integrated manner.
3.  It is important to assess the reliability of state categorization over the range of time-series lengths typically collected from field instruments and when movement behaviours are similar between movement states. Simulation with varying lengths of time-series data and varying contrast between average movements within each state was used to test the HMM's ability to estimate movement parameters.
4.  To demonstrate the methods in a realistic setting, the HMMs were used to categorize resident and migratory phases and to examine the relationship between movement behaviour and ocean temperature using electronic tagging data from southern bluefin tuna (Thunnus maccoyii). Diagnostic tools to evaluate the suitability of different models and inferential methods for investigating differences in behaviour between individuals are also demonstrated.
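
The prediction of latent behavioural states described in points 2 to 4 can be illustrated with Viterbi decoding, the standard way of recovering the most probable state sequence from an HMM. This is a minimal sketch, not the model fitted in the study: it assumes two states labelled "resident" and "migratory", observations already discretised into "short" and "long" daily steps, and fixed, strictly positive probabilities. All names and numbers in it are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for `observations`.

    All probabilities must be strictly positive, because the recursion is
    carried out in log space to avoid numerical underflow on long tracks.
    """
    log = math.log
    # Log-probability of the best path ending in each state at time 0.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log(trans_p[p][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

In a full analysis the transition probabilities would themselves depend on covariates such as ocean temperature, which is how the abstract describes linking environment to behaviour; the fixed `trans_p` here is a simplification.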

Surveillance data for communicable nosocomial pathogens usually consist of short time series of low-numbered counts of infected patients. These often show overdispersion and autocorrelation. To date, almost all analyses of such data have ignored the communicable nature of the organisms and have used methods appropriate only for independent outcomes. Inferences that depend on such analyses cannot be considered reliable when patient-to-patient transmission is important. We propose a new method for analysing these data based on a mechanistic model of the epidemic process. Since important nosocomial pathogens are often carried asymptomatically with overt infection developing in only a proportion of patients, the epidemic process is usually only partially observed by routine surveillance data. We therefore develop a 'structured' hidden Markov model where the underlying Markov chain is generated by a simple transmission model. We apply both structured and standard (unstructured) hidden Markov models to time series for three important pathogens. We find that both methods can offer marked improvements over currently used approaches when nosocomial spread is important. Compared to the standard hidden Markov model, the new approach is more parsimonious, is more biologically plausible, and allows key epidemiological parameters to be estimated.

Hidden Markov models have been used to restore recorded signals of single ion channels buried in background noise. Parameter estimation and signal restoration are usually carried out through likelihood maximization by using variants of the Baum-Welch forward-backward procedures. This paper presents an alternative approach for dealing with this inferential task. The inferences are made by using a combination of the framework provided by Bayesian statistics and numerical methods based on Markov chain Monte Carlo stochastic simulation. The reliability of this approach is tested by using synthetic signals of known characteristics. The expectations of the model parameters estimated here are close to those calculated using the Baum-Welch algorithm, but the present methods also yield estimates of their errors. Comparisons of the results of the Bayesian Markov Chain Monte Carlo approach with those obtained by filtering and thresholding demonstrate clearly the superiority of the new methods.
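
Both likelihood maximization and Markov chain Monte Carlo sampling need the likelihood of the recorded signal under a given set of HMM parameters, which the forward pass of the forward-backward procedure provides. The sketch below is an assumption-laden illustration rather than the paper's method: it supposes each channel state emits a fixed current level corrupted by Gaussian noise with a common standard deviation, and the names `gaussian_pdf` and `forward_log_likelihood` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def forward_log_likelihood(signal, start_p, trans_p, means, sd):
    """Log-likelihood of `signal` under an HMM with Gaussian emissions.

    start_p[i]    : probability of starting in state i
    trans_p[i][j] : probability of moving from state i to state j
    means[i]      : noise-free current level of state i
    sd            : standard deviation of the background noise (shared)
    The forward variables are rescaled at every step so that long
    recordings do not underflow; the log of each scale factor is summed.
    """
    n = len(start_p)
    alpha = [start_p[i] * gaussian_pdf(signal[0], means[i], sd) for i in range(n)]
    log_lik = 0.0
    for t in range(len(signal)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans_p[i][j] for i in range(n))
                * gaussian_pdf(signal[t], means[j], sd)
                for j in range(n)
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

In a Metropolis-style sampler, a function of this kind would be evaluated at each proposed parameter set and combined with the log-prior to decide acceptance; the spread of the accepted samples is what yields the error estimates that the abstract contrasts with Baum-Welch point estimates.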

MacKay Altman R. Biometrics 2004, 60(2): 444-450
In this article, we propose a graphical technique for assessing the goodness-of-fit of a stationary hidden Markov model (HMM). We show that plots of the estimated distribution against the empirical distribution detect lack of fit with high probability for large sample sizes. By considering plots of the univariate and multidimensional distributions, we are able to examine the fit of both the assumed marginal distribution and the correlation structure of the observed data. We provide general conditions for the convergence of the empirical distribution to the true distribution, and demonstrate that these conditions hold for a wide variety of time-series models. Thus, our method allows us to compare not only the fit of different HMMs, but also that of other models. We illustrate our technique using a multiple sclerosis data set.

