MOTIVATION: Short well-defined domains known as peptide recognition modules (PRMs) regulate many important protein-protein interactions involved in the formation of macromolecular complexes and biochemical pathways. Since high-throughput experiments like yeast two-hybrid and phage display are expensive and intrinsically noisy, it would be desirable to more specifically target or partially bypass them with complementary in silico approaches. In the present paper, we present a probabilistic discriminative approach to predicting PRM-mediated protein-protein interactions from sequence data. The model is motivated by the discriminative model of Segal and Sharan as an alternative to the generative approach of Reiss and Schwikowski. In our evaluation, we focus on predicting the interaction network. As proposed by Williams, we overcome the problem of susceptibility to over-fitting by adopting a Bayesian a posteriori approach based on a Laplacian prior in parameter space. RESULTS: The proposed method was tested on two datasets of protein-protein interactions involving 28 SH3 domain proteins in Saccharomyces cerevisiae, where the datasets were obtained with different experimental techniques. The predictions were evaluated with out-of-sample receiver operating characteristic (ROC) curves. In both cases, Laplacian regularization turned out to be crucial for achieving a reasonable generalization performance. The Laplacian-regularized discriminative model outperformed the generative model of Reiss and Schwikowski in terms of the area under the ROC curve on both datasets. The performance was further improved with a hybrid approach, in which our model was initialized with the motifs obtained with the method of Reiss and Schwikowski. AVAILABILITY: Software and supplementary material are available from http://lehrach.com/wolfgang/dmf.
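The abstract above compares models by the area under the out-of-sample ROC curve. As a minimal sketch of how that number can be computed from held-out predictions, the Python below uses the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and the example scores are illustrative assumptions, not part of the authors' software.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair count.

    scores: predicted interaction scores (higher = more likely to interact)
    labels: 1 for an observed interaction, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count one half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out scores: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The quadratic pair loop is adequate for a network of a few dozen domains; a rank-based version would be preferable for large prediction sets.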

Protein-DNA interactions are crucial for many biological processes. Attempts to model these interactions have generally taken the form of amino acid-base recognition codes or purely sequence-based profile methods, which depend on the availability of extensive sequence and structural information for specific structural families, neglect side-chain conformational variability, and lack generality beyond the structural family used to train the model. Here, we take advantage of recent advances in rotamer-based protein design and the large number of structurally characterized protein-DNA complexes to develop and parameterize a simple physical model for protein-DNA interactions. The model shows considerable promise for redesigning amino acids at protein-DNA interfaces, as design calculations recover the amino acid residue identities and conformations at these interfaces with accuracies comparable to sequence recovery in globular proteins. The model shows promise also for predicting DNA-binding specificity for fixed protein sequences: native DNA sequences are selected correctly from pools of competing DNA substrates; however, incorporation of backbone movement will likely be required to improve performance in homology modeling applications. Interestingly, optimization of zinc finger protein amino acid sequences for high-affinity binding to specific DNA sequences results in proteins with little or no predicted specificity, suggesting that naturally occurring DNA-binding proteins are optimized for specificity rather than affinity. When combined with algorithms that optimize specificity directly, the simple computational model developed here should be useful for the engineering of proteins with novel DNA-binding specificities.

A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% falls into many small groups (with a maximum size of 0.05%). Studying these groups we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. They can be classified after corrections as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors that were corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors that corrected to AT-AC), one case that was produced from a non-existent intron, seven cases that were found in HTG that were deposited to GenBank and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites.
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
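The abstract above groups introns by their terminal dinucleotides (4² donor × 4² acceptor = 256 possible pairs) and builds weight matrices for the major groups. The Python sketch below illustrates both ideas. The function names, the pseudocount, the uniform 0.25 background and the toy donor sequences are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def classify_pair(donor, acceptor):
    """Label an intron by its first two (donor) and last two (acceptor) bases."""
    pair = (donor.upper(), acceptor.upper())
    if pair == ("GT", "AG"):
        return "canonical GT-AG"
    if pair == ("GC", "AG"):
        return "non-canonical GC-AG"
    if pair == ("AT", "AC"):
        return "non-canonical AT-AC"
    return "other non-canonical"


def weight_matrix(sites, pseudocount=1.0):
    """Log-odds weight matrix from aligned, equal-length sites (A/C/G/T only).

    Assumes a uniform background of 0.25 per base (an illustrative choice).
    Returns one dict per position mapping base -> log2(frequency / 0.25).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        matrix.append(
            {b: math.log2((counts[b] + pseudocount) / total / 0.25) for b in "ACGT"}
        )
    return matrix


def score(matrix, site):
    """Sum the per-position log-odds of a candidate site."""
    return sum(column[base] for column, base in zip(matrix, site.upper()))


# Toy donor-site alignment (hypothetical sequences, not SpliceDB entries).
donors = ["GTAAGT", "GTGAGT", "GTAAGA"]
pwm = weight_matrix(donors)
print(classify_pair("gc", "ag"), score(pwm, "GTAAGT") > score(pwm, "CCCCCC"))
```

A gene prediction program would typically compare such a score against a threshold or combine it with other signals; the threshold choice is outside this sketch.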

A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) falls into many small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors corrected to AT-AC), one case that was produced from a non-existent intron, seven cases that were found in HTG that were deposited to GenBank and finally only two other remaining cases of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.

Biomechanics and Modeling in Mechanobiology - There is a growing interest in the development of patient-specific finite element models of the human lumbar spine for both the assessment of injury...

Background: There is a growing body of evidence associating microRNAs (miRNAs) with human diseases. MiRNAs are new key players in the disease paradigm demonstrating roles in several human diseases. The functional association between miRNAs and diseases remains largely unclear and far from complete. With the advent of high-throughput functional genomics techniques that infer genes and biological pathways dysregulated in diseases, it is now possible to infer functional association between diseases and biological molecules by integrating disparate biological information. Results: Here, we first used a Lasso regression model to identify miRNAs associated with disease signature as a proof of concept. Then we proposed an integrated approach that uses disease-gene associations from microarray experiments and text mining, and miRNA-gene association from computational predictions and protein networks to build a functional association network between miRNAs and diseases. The findings of the proposed model were validated against gold standard datasets using ROC analysis and results were promising (AUC=0.81). Our protein network-based approach discovered 19 new functional associations between prostate cancer and miRNAs. The 19 new associations were validated using miRNA expression data and clinical profiles and were shown to act as diagnostic and prognostic prostate biomarkers. The proposed integrated approach allowed us to reconstruct functional associations between miRNAs and human diseases and uncovered functional roles of newly discovered miRNAs. Conclusions: Lasso regression was used to find associations between diseases and miRNAs using their gene signature. Defining miRNA gene signature by integrating the downstream effect of miRNAs demonstrated better performance than the miRNA signature alone. Integrating biological networks and multiple data to define miRNA and disease gene signature demonstrated high performance to uncover new functional associations between miRNAs and diseases.
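The abstract above relies on Lasso regression to select the miRNAs associated with a disease signature. As a rough sketch of the technique (not the authors' pipeline), the Python below minimizes (1/2n)·||y − Xβ||² + λ·||β||₁ by cyclic coordinate descent with soft-thresholding. It assumes centered data with no intercept, and the function names and toy matrix are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the source of Lasso sparsity."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept term)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = [float(v) for v in y]  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Toy example: the response depends only on the first of two orthogonal features.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

In this toy case the irrelevant second coefficient is set exactly to zero and the first is shrunk from 2 to 1.5, which is the selection-plus-shrinkage behavior that makes the Lasso suitable for picking a small set of associated features.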

Osteoarthritis is the most prevalent form of arthritis in the world and it is becoming a major public health problem. Osteoarthritic chondrocytes undergo morphological and biochemical changes that lead to de-differentiation. The involvement of signaling pathways, such as the Wnt pathway, during cartilage pathology has been reported. Wnt signaling regulates critical biological processes. Wnt signals are transduced through at least three intracellular signaling pathways including the canonical Wnt/β-catenin pathway, the Wnt/Ca2+ pathway and the Wnt/planar cell polarity pathway. We investigated the involvement of the Wnt canonical and non-canonical pathways in human articular chondrocyte de-differentiation in vitro. Human articular chondrocytes were cultured through four passages with no treatment, or with sFRP3 treatment, an inhibitor of Wnt pathways, or with DKK1 treatment, an inhibitor of the canonical pathway. Chondrocyte-secreted markers and Wnt pathway components were analyzed using western blotting and qPCR. Inhibition of the Wnt pathway showed that canonical Wnt signaling is probably responsible for inhibition of collagen II expression, activation of metalloproteinase 13 expression and regulation of Wnt7a and c-jun expression during chondrocyte de-differentiation in vitro. Our results also suggest that expressions of eNOS, Wnt5a and cyclinE1 are regulated by non-canonical Wnt signaling.

Weighted least-squares regression has been programmed in Pascal for a microcomputer. A double precision Pascal compiler and the Motorola 6809 assembler produce a fast machine-code program occupying 22,000 bytes of memory when appended to the Pascal run-time module. Large data sets fit in the remaining memory. A regression with 72 observations and 24 parameters runs in 7 min, excluding optional printout of large matrices. The maximum dimensions of the design matrix, X, can be altered by modifying two Pascal constants. Minor changes to the Pascal source program will make it compatible with other Pascal compilers. The program optionally orthogonalizes the X matrix to detect linearly dependent columns in X, and/or generate orthogonal parameter estimates. After orthogonalizing X and fitting the model, the parameter estimates for the original X can be retrieved by the program. Regressions on a repeatedly reduced model are performed through elimination of columns in X until the minimum adequate model is obtained.
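The abstract above mentions orthogonalizing the design matrix X to detect linearly dependent columns. The original program is in Pascal and its algorithm is not described here, so the Python sketch below only illustrates the general idea with modified Gram-Schmidt under a weighted inner product. The function name, the tolerance and the example matrix are assumptions.

```python
def dependent_columns(X, weights=None, tol=1e-10):
    """Indices of columns of X that are numerically linear combinations of earlier ones.

    Uses modified Gram-Schmidt with the weighted inner product
    <u, v> = sum(w_i * u_i * v_i), as would be natural in weighted least squares.
    """
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * n

    def dot(u, v):
        return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))

    basis = []       # orthonormal columns accepted so far
    dependent = []   # indices of columns that added nothing new
    for j in range(p):
        col = [float(X[i][j]) for i in range(n)]
        original_norm = dot(col, col) ** 0.5
        # Remove the component of this column along each accepted direction.
        for q in basis:
            coef = dot(col, q)
            col = [c - coef * qi for c, qi in zip(col, q)]
        norm = dot(col, col) ** 0.5
        if original_norm == 0.0 or norm <= tol * original_norm:
            dependent.append(j)
        else:
            basis.append([c / norm for c in col])
    return dependent


# The third column is the sum of the first two, so it is flagged.
X = [[1, 0, 1], [1, 1, 2], [1, 2, 3], [1, 3, 4]]
print(dependent_columns(X))
```

Dropping the flagged columns before fitting is one way to arrive at a reduced model of full rank; the criteria the Pascal program uses for its minimum adequate model are not specified in the abstract.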



Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly to the disease, and may restrict the selection of influential genes.

Weekly weighings of the laboratory rats are required to determine the correct dosage for mixing in the food. This creates problems in that the food mixing must be done immediately after the weighings and staff are often heavily taxed to perform the task. A discounted least squares growth prediction model allows for prediction of weights a week ahead of time, obviating the necessity for instantaneously processing the weight data. When dosages were prepared based on these predictions, for 10 treatment combinations 100% of the doses proved to be within 8.0% of the required dosage; 98.4% were within 5% of the required dosage; 78.7% were within 2% of the required dosage; and 51.6% were within 1% of the required dosage. The quadratic weight prediction model can also be incorporated into a model for predicting food consumption.
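The abstract above predicts next week's body weight with a discounted least squares fit of a quadratic growth curve. The Python sketch below shows one plausible reading: older weighings receive geometrically smaller weights and a quadratic in time is fitted by weighted normal equations. The discount factor 0.8, the function names and the synthetic weights are assumptions, not values from the study.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least three distinct weeks")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_discounted_quadratic(weeks, grams, discount=0.8):
    """Fit weight = a + b*t + c*t^2, weighting a weighing k weeks old by discount**k."""
    latest = max(weeks)
    w = [discount ** (latest - t) for t in weeks]
    rows = [[1.0, float(t), float(t) ** 2] for t in weeks]
    # Weighted normal equations: (X' W X) coeffs = X' W y.
    A = [[sum(wi * row[a] * row[b] for wi, row in zip(w, rows)) for b in range(3)]
         for a in range(3)]
    rhs = [sum(wi * row[a] * y for wi, row, y in zip(w, rows, grams)) for a in range(3)]
    return solve_linear_system(A, rhs)


def predict_weight(coeffs, week):
    a, b, c = coeffs
    return a + b * week + c * week ** 2


# Synthetic growth record (hypothetical numbers), then a one-week-ahead prediction.
weeks = [0, 1, 2, 3, 4, 5]
grams = [100 + 20 * t - 0.5 * t * t for t in weeks]
print(predict_weight(fit_discounted_quadratic(weeks, grams), 6))
```

Because the prediction is available before the next weighing, the dose for the coming week can be mixed in advance, which is the practical benefit the abstract describes.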

The helix/coil equilibrium of a peptide in solution can be modulated by a variety of side-chain interactions that are not incorporated into the standard statistical mechanical models for prediction of peptide helical content. In this report, we describe a recursive formulation of the Lifson-Roig model that facilitates incorporation of specific pairwise side-chain interactions as well as nonspecific individual side-chain capping interactions. Application of this extended model to a series of host/guest peptides indicates that the apparent delta G value for a pairwise apolar interaction is dependent upon the spacing and orientation but not the sequential location of the participating residues. The apparent delta G values for such interactions are about 40% greater than the corresponding apparent delta delta G values obtained from difference measurements.

Song J, Tan H, Wang M, Webb GI, Akutsu T. PLoS ONE 2012, 7(2): e30361
Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the C(α)-N bond (Phi) and the C(α)-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those obtained with ANGLOR, one of the state-of-the-art prediction tools. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with p-values <1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
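The abstract above reports accuracy as the mean absolute error (MAE) of predicted Phi and Psi angles. Torsion angles are periodic, so a common convention is to measure each error along the shorter arc of the circle. Whether TANGLE applies exactly this convention is an assumption here, and the function names and example angles in this Python sketch are illustrative.

```python
def angular_difference(predicted, observed):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(predicted - observed) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error(predicted, observed):
    """MAE over paired angle lists, with each error wrapped to the shorter arc."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(angular_difference(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)


# 179 and -179 degrees are only 2 degrees apart on the circle, not 358.
print(angular_difference(179.0, -179.0))
print(mean_absolute_error([179.0, -60.0], [-179.0, -45.0]))
```

Without the wrap-around, predictions near the ±180° boundary would be penalized as if they were almost maximally wrong, which would inflate the error for extended conformations.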

The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are quite efficient for capturing complex non-linear patterns that conventional linear regression models cannot. Furthermore, these methods are also powerful for leveraging environmental covariates, such as genotype × environment (G×E) prediction, among others. In this study we provide the building process of seven kernel methods: linear, polynomial, sigmoid, Gaussian, Exponential, Arc-cosine 1 and Arc-cosine L. Additionally, we highlight illustrative examples for implementing exact kernel methods for genomic prediction under a single-environment, a multi-environment and multi-trait framework, as well as for the implementation of sparse kernel methods under a multi-environment framework. These examples are followed by a discussion on the strengths and limitations of kernel methods and, subsequently by conclusions about the main contributions of this paper. Subject terms: Genomics, Plant sciences
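The abstract above lists several kernels used for genomic prediction. The paper itself works in R; the Python sketch below only illustrates four of them using their textbook definitions (linear, polynomial, Gaussian and the order-1 arc-cosine kernel). The parameter names `gamma`, `coef0` and `degree`, their defaults and the toy marker matrix are assumptions and may not match the parameterization used in the paper.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * dot(x, z) + coef0) ** degree


def gaussian_kernel(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def arc_cosine1_kernel(x, z):
    """Order-1 arc-cosine kernel: |x||z| (sin t + (pi - t) cos t) / pi, t = angle(x, z)."""
    norm_x, norm_z = math.sqrt(dot(x, x)), math.sqrt(dot(z, z))
    if norm_x == 0.0 or norm_z == 0.0:
        return 0.0
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot(x, z) / (norm_x * norm_z)))
    theta = math.acos(cos_theta)
    return norm_x * norm_z * (math.sin(theta) + (math.pi - theta) * cos_theta) / math.pi


def kernel_matrix(X, kernel):
    """Gram matrix K[i][j] = kernel(X[i], X[j]) over the rows (e.g. lines) of X."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


# Toy marker matrix: three lines scored at two markers (hypothetical values).
markers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in kernel_matrix(markers, gaussian_kernel):
    print(row)
```

The resulting Gram matrix plays the role of a relationship matrix among lines; a kernel regression model then uses it in place of the linear marker-based covariance.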

Phytochemistry Reviews - Strigolactones (SLs) are natural products with promising applications as agrochemicals to prevent infestation with parasitic weeds due to their ability to trigger seed...

We have previously identified a single inhibitory Ca2+-binding site in the first EF-hand of the essential light chain of Physarum conventional myosin (Farkas, L., Malnasi-Csizmadia, A., Nakamura, A., Kohama, K., and Nyitray, L. (2003) J. Biol. Chem. 278, 27399-27405). As a general rule, conformation of the EF-hand-containing domains in the calmodulin family is "closed" in the absence and "open" in the presence of bound cations; a notable exception is the unusual Ca2+-bound closed domain in the essential light chain of the Ca2+-activated scallop muscle myosin. Here we have reported the 1.8 Å resolution structure of the regulatory domain (RD) of Physarum myosin II in which Ca2+ is bound to a canonical EF-hand that is also in a closed state. The 12th position of the EF-hand loop, which normally provides a bidentate ligand for Ca2+ in the open state, is too far in the structure to participate in coordination of the ion. The structure includes a second Ca2+ that only mediates crystal contacts. To reveal the mechanism behind the regulatory effect of Ca2+, we compared conformational flexibilities of the liganded and unliganded RD. Our working hypothesis, i.e. the modulatory effect of Ca2+ on conformational flexibility of RD, is in line with the observed suppression of hydrogen-deuterium exchange rate in the Ca2+-bound form, as well as with results of molecular dynamics calculations. Based on this evidence, we concluded that Ca2+-induced change in structural dynamics of RD is a major factor in Ca2+-mediated regulation of Physarum myosin II activity.
