Similar literature
20 similar documents found (search time: 31 ms)
1.
As one of the most common post-translational modifications, ubiquitination regulates the quantity and function of a variety of proteins. Experimental and clinical investigations have also suggested the crucial roles of ubiquitination in several human diseases. The complicated sequence context of human ubiquitination sites revealed by proteomic studies highlights the need to develop effective computational strategies to predict human ubiquitination sites. Here we report the establishment of a novel human-specific ubiquitination site predictor through the integration of multiple complementary classifiers. First, a Support Vector Machine (SVM) classifier was constructed based on the composition of k-spaced amino acid pairs (CKSAAP) encoding, which has been utilized in our previous yeast ubiquitination site predictor. To further exploit the pattern and properties of the ubiquitination sites and their flanking residues, three additional SVM classifiers were constructed using the binary amino acid encoding, the AAindex physicochemical property encoding and the protein aggregation propensity encoding, respectively. Through an integration that relied on logistic regression, the resulting predictor, termed hCKSAAP_UbSite, achieved an area under the ROC curve (AUC) of 0.770 in a 5-fold cross-validation test on a class-balanced training dataset. When tested on a class-balanced independent testing dataset that contains 3419 ubiquitination sites, hCKSAAP_UbSite also achieved a robust performance with an AUC of 0.757. Specifically, it consistently performed better than the predictor using the CKSAAP encoding alone and two other publicly available predictors which are not human-specific. Given its promising performance on our large-scale datasets, hCKSAAP_UbSite has been made publicly available at our server (http://protein.cau.edu.cn/cksaap_ubsite/).
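The integration step described above, in which the outputs of several SVM classifiers are combined through logistic regression, can be sketched roughly as follows. This is only an illustration of the general idea: the function name `integrate_scores`, the weights, the bias and the example scores are hypothetical placeholders, not values reported for hCKSAAP_UbSite.

```python
import math


def integrate_scores(scores, weights, bias):
    """Combine per-classifier scores into one probability with a logistic model.

    All parameters are hypothetical; in practice the weights and bias would be
    fitted on training data rather than chosen by hand.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Linear combination of the individual classifier outputs.
    z = bias + sum(w * s for w, s in zip(weights, scores))
    # Logistic (sigmoid) function maps the combination to the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical decision values from four classifiers for one candidate lysine.
example_scores = [0.8, -0.1, 0.3, 0.5]
example_weights = [1.2, 0.6, 0.7, 0.4]
probability = integrate_scores(example_scores, example_weights, bias=-0.2)
```

A candidate site would then be called positive when `probability` exceeds a chosen threshold; the threshold itself is another assumption that the abstract does not specify.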

2.
Preferred in vivo ubiquitination sites   (total citations: 1, self-citations: 0, citations by others: 1)
MOTIVATION: The conjugation of ubiquitin to target molecules involves several enzymatic steps. Little is known about the specificity of ubiquitination. How E3 ligases select their substrate and which lysines are targeted for ubiquitin conjugation is largely an enigma. The object of this study is to identify preferred ubiquitination sites. Genetic approaches to study this question have proven difficult, because of the redundancy of ligases and the lack of strictly required motifs. However, a better understanding of acceptor site selection could help to predict ubiquitination sites and clarify yet unsolved structure-function relationships of the transfer reaction. RESULTS: In an effort to define preferences for ubiquitination, we systematically analyzed structure and sequence of 135 known ubiquitination sites in 95 proteins in Saccharomyces cerevisiae. The results show clear structural preferences for ubiquitin ligation to target proteins, and compartment-specific amino acid patterns in close proximity to the modified side chain. SUPPLEMENTARY INFORMATION: http://www.people.fas.harvard.edu/~catic.

3.
Ubiquitination is an important post-translational event responsible for the half-life and turnover of proteins inside the cell. Proteins are ubiquitinated by forming an isopeptide bond between their lysine residue and the C-terminal glycine residue of ubiquitin, leading to rapid degradation of proteins by the 26S proteasome complex. Deregulation of ubiquitination is manifested by aberrant expression of E3-ligase activity or mutation in the surroundings of ubiquitination sites. Many new experimentally validated ubiquitinated lysines have recently been identified, which motivated the study of the environments surrounding the ubiquitinated lysines. With the help of known ubiquitinated proteins, here we present a comprehensive study of the sequence and spatial environment of ubiquitination sites of human and yeast proteins. To identify position-specific features, this work distinguishes the spatial environments as proximal and distal regions. Certain amino acids specific to these regions that clearly differentiate ubiquitination sites from non-ubiquitination sites are revealed. Additionally, amino acid signatures that contribute to protein disordered regions and solvent accessibility of amino acids are found to be contributing factors in ubiquitination sites. These results suggest that the ubiquitination site environment of the substrate determines the recognition and unfolding of the substrate to facilitate its entry into the 26S proteasomal complex. We believe that these findings will help in better prediction of ubiquitination sites using sequence and spatial information.

4.
We have investigated the target choice of the related transposable elements Tc1 and Tc3 of the nematode C. elegans. The exact locations of 204 independent Tc1 insertions and 166 Tc3 insertions in a 1 kbp region of the genome were determined. There was no phenotypic selection for the insertions. All insertions were into the sequence TA. Both elements have a strong preference for certain positions in the 1 kbp region. Hot sites for integration are not clustered or regularly spaced. The orientation of the integrated transposon has no effect on the distribution pattern. We tested several explanations for the target site preference. If simple structural features of the DNA (e.g. bends) marked hot sites, we would expect the patterns of the two related transposons Tc1 and Tc3 to be similar; however, we found them to be completely different. Furthermore, we found that the sequence at the donor site has no effect on the choice of the new insertion site, because the insertion pattern of a transposon that jumps from a transgenic donor site is identical to the insertion pattern of transposons jumping from endogenous genomic donor sites. The most likely explanation for the target choice is therefore that the primary sequence of the target site is recognized by the transposase. However, alignment of the Tc1 and Tc3 integration sites does not reveal a strong consensus sequence for either transposon.

5.
Alternative pre-mRNA splicing may be the most efficient and widespread mechanism to generate multiple protein isoforms from single genes. Here, we describe the genomic analysis of one of the most frequent types of alternative pre-mRNA splicing, alternative 5'- and 3'-splice-site selection. Using an EST-based alternative splicing database recording >47,000 alternative splicing events, we determined the frequency and location of alternative 5'- and 3'-splice sites within the human genome. The most common alternative splice sites used in the human genome are located within 6 nucleotides (nt) of the dominant splice site. We show that the EST database overrepresents alternative splicing events that maintain the reading frame, thus supporting the concept that RNA quality-control steps ensure that mRNAs that encode potentially harmful protein products are destroyed and do not serve as templates for translation. The most frequent location for alternative 5'-splice sites is 4 nt upstream or downstream from the dominant splice site. Sequence analysis suggests that this preference is a consequence of the U1 snRNP binding sequence at the 5'-splice site, which frequently contains a GU dinucleotide 4 nt downstream from the dominant splice site. Surprisingly, approximately 50% of duplicated 3'-YAG splice junctions are subject to alternative splicing. This high probability of alternative 3'-splice-site activation in close proximity to the dominant 3'-splice site suggests that the second step of splicing may be prone to violate splicing fidelity.
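The notion of an alternative splice site that "maintains the reading frame" can be illustrated with a very small check. It assumes, for illustration only, that the shift between the alternative and the dominant splice site falls entirely within coding sequence, so that the frame is kept exactly when the shift is a multiple of three nucleotides; the function name is hypothetical and the sketch ignores other effects such as newly introduced stop codons.

```python
def preserves_reading_frame(shift_nt: int) -> bool:
    """Return True if a splice-site shift of `shift_nt` nucleotides keeps the frame.

    Assumes the shifted nucleotides are all coding sequence (an assumption of
    this sketch, not a statement from the abstract).
    """
    return shift_nt % 3 == 0


# Shifts of 3 and 6 nt keep the frame; a 4 nt shift does not.
frame_status = {d: preserves_reading_frame(d) for d in (3, 4, 6)}
```

Under this simplification, the frequent 4 nt shifts at 5'-splice sites mentioned above would change the reading frame, which is the kind of event the abstract links to RNA quality control.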

6.
7.
Cai Y, Huang T, Hu L, Shi X, Xie L, Li Y. Amino acids, 2012, 42(4): 1387-1395
Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino acid protein) is attached to lysine on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, it is time-consuming and labor-intensive to identify ubiquitination sites using conventional experimental approaches. To efficiently discover lysine-ubiquitination sites, a sequence-based predictor of ubiquitination sites was developed based on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized 456 features. The Matthews correlation coefficient (MCC) of our ubiquitination site predictor reached 0.142 in a jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than that of the existing ubiquitination site predictors UbiPred and UbPred. The MCCs of UbiPred and UbPred on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around lysine plays an important role in ubiquitination site prediction. Moreover, disorder and ubiquitination are strongly associated. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future.
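For readers unfamiliar with the metric reported above, the Matthews correlation coefficient can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard textbook formula; the function name and the convention of returning 0.0 when the denominator is zero are choices made here for illustration, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)      # all predictions correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)     # all predictions wrong
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)     # no better than random
```

The metric ranges from -1 (complete disagreement) through 0 (chance level) to +1 (perfect prediction).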

8.
9.
Alternative splicing constitutes a major mechanism creating protein diversity in humans. This diversity can result from the alternative skipping of entire exons or from alternative selection of the 5′ or 3′ splice sites that define the exon boundaries. In this study, we analyze the sequence and evolutionary characteristics of alternative 3′ splice sites conserved between human and mouse genomes for distances ranging from 3 to 100 nucleotides. We show that alternative splicing events can be distinguished from constitutive splicing by a combination of properties which vary depending on the distance between the splice sites. Among the unique features of alternative 3′ splice sites, we observed an unexpectedly high occurrence of events in which a polypyrimidine tract was found to overlap the upstream splice site. By applying a machine-learning approach, we show that we can successfully discriminate true alternative 3′ splice sites from constitutive 3′ splice sites. Finally, we propose that the unique features of the intron flanking alternative splice sites are indicative of a regulatory mechanism that is involved in splice site selection. We postulate that the process of splice site selection is influenced by the distance between the competitive splice sites.

10.
The ErbB protein tyrosine kinases are among the most important cell signaling families and mutation-induced modulation of their activity is associated with diverse functions in biological networks and human disease. We have combined molecular dynamics simulations of the ErbB kinases with the protein structure network modeling to characterize the reorganization of the residue interaction networks during conformational equilibrium changes in the normal and oncogenic forms. Structural stability and network analyses have identified local communities integrated around high centrality sites that correspond to the regulatory spine residues. This analysis has provided a quantitative insight into the mechanism of mutation-induced “superacceptor” activity in oncogenic EGFR dimers. We have found that kinase activation may be determined by allosteric interactions between modules of structurally stable residues that synchronize the dynamics in the nucleotide binding site and the αC-helix with the collective motions of the integrating αF-helix and the substrate binding site. The results of this study have pointed to a central role of the conserved His-Arg-Asp (HRD) motif in the catalytic loop and the Asp-Phe-Gly (DFG) motif as key mediators of structural stability and allosteric communications in the ErbB kinases. We have determined that residues that are indispensable for kinase regulation and catalysis often corresponded to the high centrality nodes within the protein structure network and could be distinguished by their unique network signatures. The optimal communication pathways are also controlled by these nodes and may ensure efficient allosteric signaling in the functional kinase state. Structure-based network analysis has quantified subtle effects of ATP binding on conformational dynamics and stability of the EGFR structures.
Consistent with the NMR studies, we have found that nucleotide-induced modulation of the residue interaction networks is not limited to the ATP site, and may enhance allosteric cooperativity with the substrate binding region by increasing communication capabilities of mediating residues.

11.
12.
Shen X, Mao H, Miao S. Génome, 2011, 54(2): 144-150
CArG cis-elements bound by serum response factor (SRF) are presently being intensively studied, but little is known about the substitution pattern of functional CArG elements. Here, we have performed the first evolutionary analysis of the CArGome in the human and mouse genomes through bioinformatic methods and statistical tests. We calculated the substitution rate at each site of the functional CArG elements. The results showed that the core sites of the functional CArG elements evolved faster than did the background DNA, indicating that these sites were likely to evolve under positive selection. Moreover, a strong TATA "motif" was evident in the core region within the functional CArG elements in both human and mouse promoters. This motif could probably be a major contributor to the formation of the spatial structure, which was important for CArG-SRF recognition. Thus, the study further revealed the sequence character and substitution pattern of CArG elements and provided useful information for the study of the SRF-binding efficiencies of CArG promoters in functional assays.

13.
Chen Z, Chen YZ, Wang XF, Wang C, Yan RX, Zhang Z. PloS one, 2011, 6(7): e22930
As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. With the assistance of a Support Vector Machine (SVM), the highlight of CKSAAP_UbSite is to employ the composition of k-spaced amino acid pairs surrounding a query site (i.e. any lysine in a query sequence) as input. When trained and tested on the dataset of yeast ubiquitination sites (Radivojac et al, Proteins, 2010, 78: 365-380), a 100-fold cross-validation on a 1:1 ratio of positive and negative samples revealed that the accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. The proposed CKSAAP_UbSite has also been intensively benchmarked and exhibits better performance than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we also found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure a reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the proteome of yeast.
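The encoding named above, the composition of k-spaced amino acid pairs around a query lysine, can be sketched as follows. This is a minimal illustration rather than the CKSAAP_UbSite implementation: the function name, the maximum gap `k_max`, the example window, and the normalization by the number of pair positions are all assumptions made for this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature positions are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Encode a sequence window as k-spaced amino acid pair frequencies.

    For each gap k in 0..k_max, count every pair of residues separated by
    exactly k positions and divide by the number of such positions in the
    window (normalization is an assumption of this sketch). Pairs containing
    non-standard characters are skipped.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        total = max(n_positions, 1)  # avoid division by zero for tiny windows
        features.extend(counts[p] / total for p in PAIRS)
    return features


# Hypothetical three-residue window centred on a lysine.
vector = cksaap("AKA", k_max=1)
```

With `k_max = 1` the vector has 2 × 400 = 800 entries; a real predictor would use a longer window around each lysine and pass the resulting vectors to a classifier such as an SVM.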

14.
The study of protein ubiquitination, a post-translational modification by ubiquitin, has emerged as one of the most active areas in biology because of the important role of this type of modification in the regulation of various cellular proteins. Advances in techniques for the determination and site mapping of protein ubiquitination can facilitate the elucidation of molecular mechanisms of this modification. We have recently described a novel method for identifying peptides containing ubiquitinated amino acid residues, based on the MALDI-MS/MS analysis of tryptic peptide derivatives. In particular, we have utilized N-terminal sulfonation of these peptides to provide a unique fragmentation pattern that leads to the direct identification and sequencing of ubiquitin-modified peptides. Here we present an application of this new method to the characterization of ubiquitin-conjugated C-terminal Hsc70-interacting protein (CHIP), a recently identified U-box-containing E3 enzyme. Three peptides bearing ubiquitination sites have been identified from the digest of ubiquitinated CHIP; one of these was a site on CHIP, while the other two were found on the ubiquitin molecules, demonstrating that sulfonation of tryptic peptides is a general and efficient method for characterizing protein ubiquitination.

15.
Human and mouse LSP1 genes code for highly conserved phosphoproteins   (total citations: 4, self-citations: 0, citations by others: 4)
Using the mouse LSP1 cDNA, we isolated a human homologue of the mouse LSP1 gene from a human CTL cDNA library. The predicted protein sequence of human LSP1 is compared with the predicted mouse LSP1 protein sequence and regions of homology are identified in order to predict structural features of the LSP1 protein that might be important for its function. Both the human and mouse LSP1 proteins consist of two domains, an N-terminal acidic domain and a C-terminal basic domain. The C-terminal domains of the mouse and human LSP1 proteins are highly conserved and include several conserved, putative serine/threonine phosphorylation sites. Immunoprecipitation of LSP1 protein from 32P-orthophosphate-loaded cells shows that both the mouse and human LSP1 proteins are phosphoproteins. The sequences of the putative Ca2+-binding sites present in the N-terminal domain of the mouse LSP1 protein are not conserved in the human LSP1 protein; however, a different Ca2+-binding site may exist in the human protein, indicating a functional conservation rather than a strict sequence conservation of the two proteins. The expression of the human LSP1 gene follows the same pattern as the expression of the mouse LSP1 gene. Southern analysis of human genomic DNA shows multiple LSP1-related fragments of varying intensity in contrast to the simple pattern found after similar analysis of mouse genomic DNA. By using different parts of the human LSP1 cDNA as a probe, we show that most of these multiple bands contain sequences homologous to the conserved C-terminal region of the LSP1 cDNA. This suggests that there are several LSP1-related genes present in the human genome.

16.
HIV-1 integration in the human genome favors active genes and local hotspots   (total citations: 73, self-citations: 0, citations by others: 73)
Schröder AR, Shinn P, Chen H, Berry C, Ecker JR, Bushman F. Cell, 2002, 110(4): 521-529

17.
Protein ubiquitination is central to the regulation of various pathways in eukaryotes. The process of ubiquitination and its cellular outcome were investigated in hundreds of proteins to date. Despite this, the evolution of this regulatory mechanism has not yet been addressed comprehensively. Here, we quantify the rates of evolutionary changes of ubiquitination and SUMOylation (Small Ubiquitin-like MOdifier) sites. We estimate the time at which they first appeared, and compare them to acetylation and phosphorylation sites and to unmodified residues. We observe that the various modification sites studied exhibit similar rates. Mammalian ubiquitination sites are weakly more conserved than unmodified lysine residues, and a higher degree of relative conservation is observed when analyzing bona fide ubiquitination sites. Various reasons can be proposed for the limited level of excess conservation of ubiquitination, including shifts in locations of the sites, the presence of alternative sites, and changes in the regulatory pathways. We observe that disappearance of sites may be compensated by the presence of a lysine residue in close proximity, which is significant when compared to evolutionary patterns of unmodified lysine residues, especially in disordered regions. This emphasizes the importance of analyzing a window in the vicinity of functional residues, as well as the capability of the ubiquitination machinery to ubiquitinate residues in a certain region. Using prokaryotic orthologs of ubiquitinated proteins, we study how ubiquitination sites were formed, and observe that while sometimes sequence additions and rearrangements are involved, in many cases the ubiquitination machinery utilizes an already existing sequence without significantly changing it. Finally, we examine the evolution of ubiquitination, which is linked with other modifications, to infer how these complex regulatory modules have evolved. 
Our study gives initial insights into the formation of ubiquitination sites, their degree of conservation in various species, and their co-evolution with other posttranslational modifications.

18.
The non-homologous end-joining (NHEJ) pathway is a mechanism to repair DNA double strand breaks, which can introduce mutations at repair sites. We constructed new cellular systems to specifically analyze sequence modifications occurring at the repair site. In particular, we looked for the presence of telomeric repeats at the repair junctions, since our previous work indicated that telomeric sequences could be inserted at break sites in germ-line cells during primate evolution. To induce specific DNA breaks, we used the I-SceI system of Saccharomyces cerevisiae or digestion with restriction enzymes. We isolated human and hamster cell lines containing the I-SceI target site integrated in a single chromosomal locus and we exposed the cells to a continuous expression of the I-SceI endonuclease gene. Additionally, we isolated human cell lines that constitutively expressed the I-SceI endonuclease and we introduced the target site on an episomal plasmid stably transfected into the cells. These strategies allowed us to recover repair junctions in which the I-SceI target site was modified at high frequency (100% in hamster cells and about 70% in human cells). Finally, we analyzed junctions produced on an episomal plasmid linearized by restriction enzymes. In all the systems studied, sequence analysis of individual repair junctions showed that deletions were the most frequent modifications, being present in more than 80% of the junctions. On the episomal plasmids, the average deletion length was greater than at intrachromosomal sites. Insertions of nucleotides or deletions associated with insertions were rare events. Junction organization suggested different mechanisms of formation. To check for the insertion of telomeric sequences, we screened plasmid libraries representing about 3.5 × 10^5 junctions with a telomeric repeat probe.
No positive clones were detected, suggesting that the addition of telomeric sequences during double strand break repair in somatic cells in culture is either a very rare event or does not occur at all.

19.
Protein–DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein–DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein–DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on the support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. We incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data to predict the protein–DNA interaction sites. Feature analysis showed that 3D structural features indeed contributed to the prediction of DNA-binding sites, and the prediction performance was better with 3D structural features than without them. Analysis of features from each site also showed that the features of the DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying DNA-binding sites, and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein–DNA interaction.

20.
Many alternative splice events result in subtle mRNA changes, and most of them occur at short-distance tandem donor and acceptor sites. The splicing mechanism of such tandem sites likely involves the stochastic selection of either splice site. While tandem splice events are frequent, it is unknown how many are functionally important. Here, we use phylogenetic conservation to address this question, focusing on tandems with a distance of 3-9 nucleotides. We show that previous contradicting results on whether alternative or constitutive tandem motifs are more conserved between species can be explained by a statistical paradox (Simpson's paradox). Applying methods that take biases into account, we found higher conservation of alternative tandems in mouse, dog, and even chicken, zebrafish, and Fugu genomes. We estimated a lower bound for the number of alternative sites that are under purifying (negative) selection. While the absolute number of conserved tandem motifs decreases with the evolutionary distance, the fraction under selection increases. Interestingly, a number of frameshifting tandems are under selection, suggesting a role in regulating mRNA and protein levels via nonsense-mediated decay (NMD). An analysis of the intronic flanks shows that purifying selection also acts on the intronic sequence. We propose that stochastic splice site selection can be an advantageous mechanism that allows constant splice variant ratios in situations where a deviation in this ratio is deleterious.
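Simpson's paradox, invoked above to reconcile contradicting conservation results, arises when a trend seen within every subgroup reverses once the subgroups are pooled. The sketch below demonstrates the effect with counts that are entirely made up for illustration (they are not data from the study); the stratum labels and variable names are likewise hypothetical.

```python
from fractions import Fraction

# (conserved, total) tandem motifs per stratum; all numbers are hypothetical.
counts = {
    "stratum_1": {"alternative": (81, 87), "constitutive": (234, 270)},
    "stratum_2": {"alternative": (192, 263), "constitutive": (55, 80)},
}


def rate(conserved: int, total: int) -> Fraction:
    """Exact conserved fraction, avoiding floating-point rounding."""
    return Fraction(conserved, total)


def pooled_rate(kind: str) -> Fraction:
    """Conserved fraction after ignoring the strata and summing the counts."""
    conserved = sum(counts[s][kind][0] for s in counts)
    total = sum(counts[s][kind][1] for s in counts)
    return Fraction(conserved, total)


# Within each stratum, alternative tandems are the more conserved class...
alternative_higher_per_stratum = {
    s: rate(*counts[s]["alternative"]) > rate(*counts[s]["constitutive"])
    for s in counts
}
# ...yet after pooling, the comparison flips.
alternative_higher_pooled = pooled_rate("alternative") > pooled_rate("constitutive")
```

The reversal happens because the two classes are distributed very differently across the strata, which is why an analysis that accounts for such biases can reach the opposite conclusion from a naive pooled comparison.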
