Similar Articles
1.
Predicting residue-residue contacts using random forest models (times cited: 1; self-citations: 0; citations by others: 1)

2.
A comprehensive statistical analysis of residue-residue contacts and residue environments in protein 3-D structures is presented. In the present work, the range of inter-residue interactions (the effective radius of influence) in protein tertiary structures is examined and found to be 10 Å. This result is obtained by correlating the average number of residues within spherical volumes of different radii (contact numbers) with hydrophobicity; the best correlations are obtained with a radius of 10 Å. The same result is obtained when (i) only long-range interactions are considered and (ii) representative side-chain atoms, instead of the usual Cα atoms, are used to represent the tertiary structure. Residue environments have been investigated using similar methods. Environmental hydrophobicity varies only within a narrow range across all residue types. Other physicochemical properties show similar trends, and only five hydrophobic residues (Leu, Val, Met, Phe and Ile) show a decrease of about 10% from the expected mean of the physicochemical distance between a residue type and its average environment. An information-theoretic approach is proposed for comparing domains that takes into account both the effective radius of influence of residues and sequence similarity.
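The contact-number analysis above can be sketched in a few lines. This is a minimal illustration, assuming one representative point per residue (Cα or a side-chain atom) and a hydrophobicity scale supplied by the caller; the scale values, toy coordinates and function names are hypothetical, not taken from the paper.

```python
import math

def contact_numbers(coords, radius):
    """For each residue, count the other residues whose representative atom
    lies within `radius` angstroms (a sphere centred on the residue)."""
    return [sum(1 for j, b in enumerate(coords) if j != i and math.dist(a, b) <= radius)
            for i, a in enumerate(coords)]

def mean_contact_by_type(seq, coords, radius):
    """Average contact number for each residue type present in `seq`."""
    totals, counts = {}, {}
    for aa, n in zip(seq, contact_numbers(coords, radius)):
        totals[aa] = totals.get(aa, 0) + n
        counts[aa] = counts.get(aa, 0) + 1
    return {aa: totals[aa] / counts[aa] for aa in totals}

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def correlation_by_radius(seq, coords, scale, radii):
    """Correlate per-type mean contact numbers with a hydrophobicity scale
    (dict: residue letter -> value) for each candidate radius."""
    result = {}
    for r in radii:
        means = mean_contact_by_type(seq, coords, r)
        types = sorted(means)
        result[r] = pearson([means[t] for t in types], [scale[t] for t in types])
    return result
```

Scanning a range of radii and keeping the one with the highest correlation mirrors the procedure described; on real data the residues would be pooled over many structures rather than one chain.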

3.
Rath A, Johnson RM, Deber CM. Biopolymers, 2007, 88(2): 217-232
Although the structural analysis of membrane proteins is advancing, our understanding of the basic principles underlying their folding and assembly remains limited because of the high intrinsic insolubility of these molecules and the concomitant difficulty of obtaining crystals. Fortunately, from an experimental standpoint, membrane protein folding can be approximated as the rigid-body docking of pre-formed alpha-helical transmembrane segments with one another to form the final functional protein structure. Peptides derived from the sequences of native alpha-helical transmembrane segments, and peptides that mimic their properties, are therefore valuable for the experimental evaluation of protein folding within the membrane. Here we present an overview of the progress made in our laboratory and elsewhere in using peptide models to define the sequence requirements and the forces that stabilize membrane protein folds.

4.

Background

The majority of influenza A viruses reside and circulate in animal populations and seldom infect humans because of host range restriction. Yet when some avian strains acquire the ability to overcome the species barrier, they may become adapted to humans, replicate efficiently and cause disease, potentially leading to a pandemic. Given the huge influenza A virus reservoir in wild birds, the emergence of a new strain able to cross the host species barrier is a cause for concern, as the recent H7N9 outbreak in China has shown. Several influenza proteins have been shown to be major determinants of host tropism. A better understanding of host tropism and its determinants would be important for identifying zoonotic influenza strains capable of crossing the species barrier and infecting humans.

Results

In this study, computational models for 11 influenza proteins were constructed with the random forest machine learning algorithm to predict host tropism. The models were trained on influenza protein sequences isolated from both avian and human samples, which were transformed into feature vectors of amino acid physicochemical properties. The resulting models were highly accurate (ACC > 96.57; AUC > 0.980; MCC > 0.916) and capable of determining the host tropism of individual influenza proteins. In addition, features from all 11 proteins were used to construct a combined model that predicts the host tropism of influenza virus strains, which would help assess the host range of a novel influenza strain.
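The accuracy (ACC), area under the ROC curve (AUC) and Matthews correlation coefficient (MCC) quoted above have standard definitions. The sketch below computes them in plain Python; it is an illustration rather than the authors' evaluation code, and which host (avian or human) is treated as the positive class is left to the caller.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))
```

MCC is often preferred alongside ACC because it stays informative when one class (here, one host) dominates the dataset.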

Conclusions

All of the prediction models constructed achieved high performance, indicating clear distinctions between avian and human proteins. When the models are used together as a host tropism prediction system, zoonotic strains could potentially be identified from the prediction results for the different proteins. Understanding and predicting the host tropism of influenza proteins lays an important foundation for future work on computational models capable of directly predicting interspecies transmission of influenza viruses. The models are available for prediction at http://fluleap.bic.nus.edu.sg.

5.
Aromatic residues have previously been shown to mediate the self-assembly of various soluble proteins through π-π interactions (McGaughey, G. B., Gagne, M., and Rappe, A. K. (1998) J. Biol. Chem. 273, 15458-15463). However, their role in transmembrane (TM) assembly is not yet clear. In this study, we performed a statistical analysis of the frequency of aromatic pairs in a bacterial TM database, which provided an initial indication that the occurrence of a specific aromatic pattern, Aromatic-XX-Aromatic, is not coincidental, similar to the well-characterized QXXS motif. The QXXS motif was previously shown to be both critical and sufficient for stabilizing TM self-assembly. Using the ToxR system, we monitored the dimerization propensities of TM domains in which interacting residues were mutated to aromatic amino acids, and demonstrated that aromatic residues can adequately stabilize self-association. Importantly, we provide an example of a natural TM domain, that of the cholera toxin secretion protein EpsM, whose self-assembly is mediated by an aromatic motif (WXXW). This is the first evidence that aromatic residues are involved in the dimerization of a wild-type TM domain. Association mediated by aromatic residues was found to be sensitive to the TM sequence, suggesting that aromatic motifs can provide a general means of specificity in TM assembly. Molecular dynamics simulations provided a structural explanation for this sensitivity to the backbone sequence.
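A motif scan of the kind used in the statistical analysis above can be sketched with regular expressions. Treating Phe, Trp and Tyr as the aromatic set, scanning single sequences, and the simple independence-based expected count are assumptions for illustration; the WXXW motif of EpsM is one instance of the Aromatic-XX-Aromatic pattern.

```python
import re

AROMATIC = "FWY"  # assumption: Phe, Trp and Tyr count as aromatic; His is excluded

# A zero-width lookahead reports overlapping hits (e.g. both WXXW in "WAAWAAW").
AROMATIC_XX_AROMATIC = re.compile(rf"(?=([{AROMATIC}]..[{AROMATIC}]))")
QXXS = re.compile(r"(?=(Q..S))")

def find_motifs(seq, pattern):
    """Return (0-based start, matched residues) for every occurrence of `pattern`."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

def expected_aromatic_pairs(seq, residues=AROMATIC, spacing=3):
    """Expected number of (i, i + spacing) aromatic pairs if residues were placed
    independently at their observed overall frequency (a crude null model)."""
    f = sum(seq.count(a) for a in residues) / len(seq)
    return f * f * max(len(seq) - spacing, 0)
```

Comparing `len(find_motifs(...))` summed over a database with the summed expected counts gives a first, rough indication of whether a pattern is over-represented.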

6.
Fuchs A, Kirschner A, Frishman D. Proteins, 2009, 74(4): 857-871
Despite the rapidly increasing number of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a considerably broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods tailored specifically to this class of molecules. One important prediction target that captures all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction in soluble proteins, such as windowed residue profiles and residue separation in sequence, our network also incorporates features that apply only to membrane proteins, such as residue position within the transmembrane segment and orientation toward the lipophilic environment. The resulting neural network predicts contacts between residues in transmembrane segments with nearly 26% accuracy, making it the first published contact predictor developed specifically for membrane proteins that performs as accurately as state-of-the-art contact predictors for soluble proteins. In a second step, the predicted helix-helix contacts were used to identify interacting helices; on our dataset of 62 membrane proteins of known structure, this achieved an accuracy of 78.1%.
Because reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.
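The contact-map evaluation and the second step (from residue contacts to interacting helices) can be illustrated with a small sketch. The 8 Å cutoff, the single point per residue, the TM segment boundaries and the `min_contacts` rule are illustrative assumptions, not the paper's definitions; the neural network itself is not reproduced.

```python
import math

def contact_map(coords, cutoff=8.0, min_sep=1):
    """Set of residue pairs (i, j), i < j, at least `min_sep` apart in sequence,
    whose representative atoms are closer than `cutoff` angstroms."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

def precision(predicted, true_contacts):
    """Fraction of predicted contacts that are real (the usual 'accuracy' of contact predictors)."""
    predicted = set(predicted)
    return len(predicted & true_contacts) / len(predicted) if predicted else 0.0

def interacting_helices(contacts, segments, min_contacts=1):
    """segments: list of (start, end) inclusive residue ranges of TM helices
    (hypothetical input). Returns helix index pairs supported by at least
    `min_contacts` residue contacts."""
    def helix_of(res):
        for h, (s, e) in enumerate(segments):
            if s <= res <= e:
                return h
        return None
    support = {}
    for i, j in contacts:
        hi, hj = helix_of(i), helix_of(j)
        if hi is not None and hj is not None and hi != hj:
            key = (min(hi, hj), max(hi, hj))
            support[key] = support.get(key, 0) + 1
    return {pair for pair, n in support.items() if n >= min_contacts}
```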

7.
8.
Han Shi, Simin Liu, Junqi Chen, Xuan Li, Qin Ma, Bin Yu. Genomics, 2019, 111(6): 1839-1852
The identification of drug-target interactions is of great significance for pharmaceutical research. Because traditional experimental methods for identifying drug-target interactions are costly and time-consuming, the use of machine learning to predict potential drug-target interactions has attracted widespread attention. This paper presents a novel drug-target interaction prediction method called LRF-DTIs. First, the pseudo-position-specific scoring matrix (PsePSSM) and FP2 molecular fingerprints were used to extract the features of targets and drugs, respectively. Second, Lasso was used to reduce the dimensionality of the extracted features, and the Synthetic Minority Oversampling Technique (SMOTE) was then used to deal with the imbalanced data. Finally, the processed feature vectors were input into a random forest (RF) classifier to predict drug-target interactions. In 10 trials of 5-fold cross-validation, the overall prediction accuracies on the enzyme, ion channel (IC), G-protein-coupled receptor (GPCR) and nuclear receptor (NR) datasets reached 98.09%, 97.32%, 95.69% and 94.88%, respectively, and these results were compared with those of other prediction methods. In addition, we verified that our method can not only predict new interactions but also obtain satisfactory results on a new dataset. All the experimental results indicate that our method can significantly improve the prediction accuracy for drug-target interactions and can play a vital role in new drug research and target protein development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LRF-DTIs/ for academic use.
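The SMOTE step mentioned above can be sketched as follows. This is a minimal, assumption-laden illustration rather than the LRF-DTIs code: samples are plain tuples of floats, Euclidean distance defines the nearest neighbours, and a single random interpolation factor is drawn per synthetic sample so that each new point lies on the segment between two minority samples.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the segment between
    a randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # one interpolation factor in [0, 1) per synthetic sample
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

In a cross-validated pipeline, oversampling is normally applied only to the training portion of each fold, so that synthetic points derived from test samples do not inflate the reported accuracy.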

9.
Predicting domain-domain interactions using a parsimony approach (times cited: 4; self-citations: 2; citations by others: 2)
We propose a novel approach to predict domain-domain interactions from a protein-protein interaction network. In our method we apply a parsimony-driven explanation of the network, where the domain interactions are inferred using linear programming optimization, and false positives in the protein network are handled by a probabilistic construction. This method outperforms previous approaches by a considerable margin. The results indicate that the parsimony principle provides a correct approach for detecting domain-domain contacts.
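The parsimony idea (explain every observed protein interaction with as few domain-domain interactions as possible) amounts to a minimum set-cover problem. The paper solves it with linear programming and adds a probabilistic treatment of false-positive protein interactions; the sketch below swaps both for an exhaustive exact search on a toy network, so it is only practical for a handful of candidate domain pairs. Protein and domain names are hypothetical.

```python
from itertools import combinations

def candidate_pairs(domains, a, b):
    """Unordered domain pairs (one domain from each protein) that could explain a-b."""
    return {tuple(sorted((d, e))) for d in domains.get(a, ()) for e in domains.get(b, ())}

def most_parsimonious(domains, interactions):
    """Smallest set of domain pairs such that every observed protein interaction
    contains at least one chosen pair. Exhaustive search: exponential time."""
    # Interactions involving a protein without annotated domains cannot be explained; skip them.
    options = [c for c in (candidate_pairs(domains, a, b) for a, b in interactions) if c]
    universe = sorted(set().union(*options))
    for size in range(1, len(universe) + 1):
        for chosen in combinations(universe, size):
            chosen = set(chosen)
            if all(opts & chosen for opts in options):
                return chosen
    return set()
```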

10.
11.
Sequence motifs are responsible for ensuring the proper assembly of transmembrane (TM) helices in the lipid bilayer. To understand how the affinity of a common TM-TM interaction motif is controlled at the sequence level, we compared two well-characterized GXXXG motif-containing homodimers: that formed by the human erythrocyte protein glycophorin A (GpA, a high-affinity dimer) and that formed by the bacteriophage M13 major coat protein (MCP, a low-affinity dimer). In both constructs, the GXXXG motif is necessary for TM-TM association. Although the remaining interfacial residues in GpA (LIXXGVXXGVXXT) differ from those in MCP (VVXXGAXXGIXXF), molecular modeling performed here indicated that the GpA and MCP dimers share the same overall fold. We could therefore introduce GpA interfacial residues, alone and in combination, into the MCP sequence to help decipher the determinants of dimer affinity. Using both in vivo TOXCAT assays and the SDS-PAGE migration rates of synthetic peptides derived from the TM regions of the proteins, we found that the most distal interfacial sites, 12 residues apart (approximately 18 Å in structural space), act in concert to control TM-TM affinity synergistically.
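The conversion from sequence separation to distance follows from ideal α-helix geometry. Assuming a rise of about h = 1.5 Å per residue along the helix axis, a separation of n = 12 residues gives:

```latex
d \approx n \times h = 12 \times 1.5\ \text{\AA} = 18\ \text{\AA}
```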

12.
MOTIVATION: Protein interactions are of biological interest because they orchestrate many cellular processes, such as metabolic pathways and immunological recognition. Domains are the building blocks of proteins; therefore, proteins are assumed to interact as a result of their interacting domains. Many domain-based models for protein interaction prediction have been developed, and preliminary results have demonstrated their feasibility. Most existing domain-based methods, however, consider only single-domain pairs (one domain from each protein) and assume that domain-domain interactions are independent. RESULTS: In this paper, we introduce a domain-based random forest of decision trees to infer protein interactions. Our method can explore all possible domain interactions and make predictions based on all of the protein domains. Experimental results on a Saccharomyces cerevisiae dataset demonstrate that our approach predicts protein-protein interactions with higher sensitivity (79.78%) and specificity (64.38%) than the maximum likelihood approach. Furthermore, our model can infer interactions not only for single-domain pairs but also for multiple domain pairs.
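One way to give a random forest access to all domain pairs, rather than single pairs assumed to be independent, is to encode each protein pair as a binary vector over a fixed vocabulary of domain pairs. The encoding below is an illustrative assumption, not necessarily the feature design used in the paper, and the domain identifiers are placeholders.

```python
from itertools import product

def build_vocabulary(protein_domains):
    """All unordered domain pairs (including self-pairs) over the annotated domains."""
    names = sorted(set().union(*protein_domains.values()))
    return [(d, e) for i, d in enumerate(names) for e in names[i:]]

def domain_pair_features(domains_a, domains_b, vocabulary):
    """Binary vector: 1 if the protein pair contains that domain pair, else 0."""
    present = {tuple(sorted(p)) for p in product(domains_a, domains_b)}
    return [1 if pair in present else 0 for pair in vocabulary]
```

The resulting vectors, labelled as interacting or non-interacting, could then be passed to any random forest implementation, such as scikit-learn's `RandomForestClassifier`.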

13.
14.
Alanine insertions into the glycophorin A transmembrane helix are found to disrupt helix-helix dimerization in a way that is fully consistent with earlier saturation mutagenesis data, suggesting that Ala-insertion scanning can be used to rapidly map the approximate location of structurally and/or functionally important segments in transmembrane helices.
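Generating the mutant set for an Ala-insertion scan is straightforward. Because an ideal α-helix advances about 100° per residue, each insertion rotates every downstream residue by roughly that angle relative to the upstream segment, which is why an insertion within an interface disrupts packing. The helper below is a hypothetical sketch; insertions are placed only between residues, not at the termini.

```python
def alanine_insertion_scan(helix):
    """Return (k, mutant) pairs, one per internal insertion point, where the
    mutant carries an extra Ala directly after residue k (1-based)."""
    return [(k, helix[:k] + "A" + helix[k:]) for k in range(1, len(helix))]
```

Each mutant would then be assayed for dimerization; positions where insertion abolishes association bracket the interface.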

15.
Transmembrane beta-barrel (TMB) proteins are embedded in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts. Their cellular location and functional diversity make beta-barrel outer membrane proteins (OMPs) an important protein class. At present, very few nonhomologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty of crystallizing transmembrane proteins. A novel method that uses pairwise interstrand residue statistical potentials derived from globular (non-outer-membrane) proteins is introduced to predict the supersecondary structure of transmembrane beta-barrel proteins. The algorithm, transFold, employs a generalized hidden Markov model (i.e., a multitape S-attribute grammar) to describe potential beta-barrel supersecondary structures and then uses dynamic programming to compute the minimum free energy beta-barrel structure. The approach can thus be viewed as a "wrapping" component that may capture folding processes with an initiation stage followed by progressive interaction of the sequence with the already-formed motifs. It differs significantly from other approaches that use traditional machine learning for this problem, because it requires no training phase on known TMB structures and is the first to explicitly capture and predict long-range interactions. TransFold outperforms previous programs for predicting TMBs on smaller (…
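transFold's grammar-based dynamic programming is beyond a short example, but its basic ingredient, scoring a strand pairing with pairwise interstrand residue potentials, can be illustrated. In this toy sketch the potential values are hypothetical, only antiparallel pairing of two strands is considered, and residue pairs missing from the table contribute zero; it is not the transFold algorithm.

```python
def antiparallel_energy(s1, s2, potential, offset=0):
    """Sum of pair potentials for residues facing each other when strand s2
    runs antiparallel to s1; `offset` shifts the register by whole residues."""
    rev = s2[::-1]  # antiparallel: the last residue of s2 faces the first residue of s1
    total = 0.0
    for i, a in enumerate(s1):
        j = i + offset
        if 0 <= j < len(rev):
            b = rev[j]
            total += potential.get((a, b), potential.get((b, a), 0.0))
    return total

def best_register(s1, s2, potential, max_shift=2):
    """Offset in [-max_shift, max_shift] with the lowest (most favourable) energy."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda k: antiparallel_energy(s1, s2, potential, k))
```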

16.
Because protein complexes play a crucial role in biological cells, one of the major goals in bioinformatics is their elucidation. A general approach is to build a prediction rule based on multiple data sources, e.g. gene expression data and protein interaction data, to assess the likelihood that two proteins are associated in a complex. We critically revisit the step of predictor construction, i.e. the determination of a proper training set, an optimal classifier and, most importantly, an optimal feature set. We use an exhaustive set of features, including the 2hop-feature introduced by Wong et al. for predicting synthetic sick or lethal interactions. Post-processing of the protein interaction likelihoods is then required to extract protein complexes. We propose a new protocol for combining these likelihood estimates: it interprets the complex-association probabilities output by the prediction rule as distances and employs hierarchical clustering to find groups of interacting proteins. In contrast to the computationally expensive search-and-score approach of Sharan et al., this protocol is very fast and can be applied to fully connected graphs. The protocol identifies trusted protein complexes with high confidence. We show that the 2hop-feature is relevant for predicting protein complexes. Furthermore, several interesting hypotheses about new protein complexes have been generated. For example, our approach linked the protein FYV4 to the mitochondrial ribosomal subunit; interestingly, this protein is known to be located in the mitochondrion, but its biological role is unknown. Vid22 and YGR071C were also linked, which is consistent with the new TAP data of Krogan et al.
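The post-processing protocol described above (probabilities turned into distances, then hierarchical clustering) can be sketched as follows. The distance 1 - p, the average-linkage criterion and the stopping threshold are assumptions chosen for illustration, since the abstract does not specify them; protein pairs missing from the table are treated as having probability 0.

```python
def cluster_complexes(proteins, prob, threshold=0.5):
    """Average-linkage agglomerative clustering with distance 1 - P(complex
    association). Merging stops once the closest pair of clusters is farther
    apart than `threshold`. `prob` maps (protein, protein) tuples to probabilities."""
    def dist(a, b):
        return 1.0 - prob.get((a, b), prob.get((b, a), 0.0))

    clusters = [[p] for p in proteins]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist(a, b) for a in clusters[i] for b in clusters[j]) / (
                    len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Each merge only needs the pairwise probabilities, which is why this kind of protocol scales to fully connected interaction graphs without a search-and-score loop.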

17.
18.
The amino acid distribution and residue-residue contacts in molecular chaperones differ from those in normal globular proteins. The study of molecular chaperones reveals a distinct surrounding environment for the residues Cys, Trp and His, which may play an important role in determining chaperone structures. Unlike in globular proteins, a one-to-one correspondence has been observed between the amino acid distribution in a sequence and the structure of molecular chaperones. The preferences of the amino acid residues surrounding each of the 20 residue types in secondary structures, together with their accessible surface areas, have been analysed.
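A residue-environment preference of the kind analysed above can be expressed as an observed/expected ratio. In this sketch, the neighbour radius (6.5 Å), the single point per residue and the background composition taken from the same sequence are all assumptions for illustration, not the study's definitions.

```python
import math
from collections import Counter

def environment_preference(seq, coords, centre, radius=6.5):
    """Observed/expected ratio of each residue type among the spatial neighbours
    (within `radius` angstroms) of residues of type `centre`.
    Values > 1 mean the type is enriched around `centre`."""
    neighbours = Counter()
    for i, (aa, xyz) in enumerate(zip(seq, coords)):
        if aa != centre:
            continue
        for j, (bb, other) in enumerate(zip(seq, coords)):
            if j != i and math.dist(xyz, other) <= radius:
                neighbours[bb] += 1
    total = sum(neighbours.values())
    if total == 0:
        return {}
    background = Counter(seq)
    return {bb: (count / total) / (background[bb] / len(seq))
            for bb, count in neighbours.items()}
```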

19.
Genomics, 2020, 112(6): 4666-4674
Natural antioxidant proteins are found mainly in plants and animals, where they act to eliminate excess free radicals, protect cells and DNA from damage, and help prevent and treat some diseases. Accurate identification of antioxidant proteins is therefore important for the development of new drugs and for research on related diseases. This article proposes a novel method, based on the combination of random forest and hybrid features, that can accurately predict antioxidant proteins. Four single feature extraction methods (188D, profile-based auto-cross covariance (ACC-PSSM), N-gram and g-gap) and hybrid feature representation methods were used for feature extraction. Three feature selection methods (MRMD, t-SNE and optimal feature set selection) were adopted to determine the optimal features. The new hybrid feature vectors obtained by combining 188D with each of the other three features all achieved performance indicators ranging from 0.9550 to 0.9990. The novel method showed better performance than the other methods.
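Of the feature types listed, N-gram and g-gap compositions are simple to compute. The sketch below builds a 400-dimensional g-gap dipeptide composition, in which residues i and i + g + 1 form a pair (g = 0 gives the ordinary 2-gram composition); this follows a common definition and may differ in detail from the one used in the article. The 188D, ACC-PSSM, MRMD and t-SNE steps are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def g_gap_composition(seq, g):
    """Frequency of each ordered residue pair (seq[i], seq[i + g + 1]).
    Pairs containing non-standard residues are ignored."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]
```

Vectors for several values of g can be concatenated with other descriptors to form a hybrid feature vector of the kind the article describes.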

20.

Background

Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium- or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results

We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions

Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low- to medium-coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
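The key idea, integrating over genotype uncertainty instead of calling genotypes, can be illustrated with an expectation-maximization (EM) estimate of the allele frequency from per-individual genotype likelihoods under Hardy-Weinberg equilibrium. This is a minimal sketch: how genotype likelihoods are computed from reads and error rates is not shown, EM is one standard way to maximize this likelihood rather than necessarily the authors' optimizer, and the starting value and tolerance are arbitrary.

```python
def allele_frequency_em(genotype_likelihoods, p=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of the frequency p of one allele from per-individual genotype
    likelihoods [P(D|G=0), P(D|G=1), P(D|G=2)], where G counts copies of that
    allele in a diploid individual. Assumes Hardy-Weinberg priors, a start value
    0 < p < 1, and at least one non-zero likelihood per individual."""
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        priors = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
        expected_copies = 0.0
        for lik in genotype_likelihoods:
            joint = [l * q for l, q in zip(lik, priors)]
            total = sum(joint)
            # E-step: posterior expected number of allele copies for this individual
            expected_copies += (joint[1] + 2 * joint[2]) / total
        new_p = expected_copies / (2 * n)  # M-step
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p
```

With one-hot likelihoods (genotypes known exactly) the estimate reduces to the sample allele frequency; flat likelihoods carry no information and leave the starting value unchanged.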
