Similar literature
20 similar articles found.
1.
An algorithm is described for automatically clustering database protein sequences from the Bothrops jararacussu venom gland according to the sequence similarities of their domains. The program was written in C and Perl. The algorithm compares a domain with each ORF protein sequence in the database; each nucleotide FASTA sequence generates six ORFs. As a result, the user obtains a list of all sequences found for a specific domain, together with a display of the sequence, the domain, and the number of hits. The algorithm lists only sequences that show a minimum similarity of 30 hits and the best alignment, a limit that was considered appropriate. The algorithm is available on the Internet (www.compbionet.org.br/cgi-domains/homesnake) and can quickly and accurately organize large databases into classes.
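The step in which each nucleotide sequence yields six protein sequences can be illustrated with a short Python sketch (the original program was written in C and Perl, so this is not the authors' code). It assumes that the "six ORFs" are the three forward and three reverse-complement reading-frame translations under the standard genetic code; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# original tool uses the standard table; the abstract does not say).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse-complement strand of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


frames = six_frames("ATGGCCTAA")
```

Each of the six translated strings would then be compared against the domain of interest; the comparison and the 30-hit threshold are not reproduced here.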

2.
There is remarkable homology between the core structures of plasmin, a fibrin clot-degrading enzyme, and factor D, a complement-activating enzyme, despite markedly different biological functions. We postulated that sequence divergence in the loop structures between these two enzymes mediated the unique substrate and inhibitor interactions of plasmin. Recombinant microplasminogens chimerized with factor D sequences at loops 3, 5, and 7 were cleaved by the plasminogen activator urokinase and developed titratable active sites. Chimerization abolished functional interactions with the plasminogen activator streptokinase but did not block complex formation. The microplasmin chimeras showed enhanced resistance (k_i decreased up to two- to three-fold) to inactivation by α2-antiplasmin. Microplasmin chimerization had minimal (approximately 2-fold) effects on the catalytic efficiency for cleavage of small substrates and did not alter the cleavage of fibrin. However, microplasmin and the microplasmin chimeras showed enhanced abilities to degrade fibrin in plasma clots suspended in human plasma. These studies indicate that loop regions of the protease domain of plasmin are important for interactions with substrates, regulatory molecules, and inhibitors. Because modification of these regions affected substrate and inhibitor interactions, loop chimerization may hold promise for improving the clot-dissolving properties of this enzyme.

3.
We have generated a set of dual-reporter human cell lines and devised a chase protocol to quantify proteasomal degradation of a ubiquitin fusion degradation (UFD) substrate, a ubiquitin ligase CRL2(VHL) substrate, and a ubiquitin-independent substrate. Well-characterized inhibitors that target different aspects of the ubiquitin-proteasome system can be distinguished by their distinctive patterns of substrate stabilization, enabling assignment of test compounds as inhibitors of the proteasome, ubiquitin chain formation or perception, CRL activity, or the UFD-p97 pathway. We confirmed that degradation of the UFD but not the CRL2(VHL) or ubiquitin-independent substrates depends on p97 activity. We optimized our suite of assays to establish conditions suitable for high-throughput screening and then validated their performance by screening against 160 cell-permeable protein kinase inhibitors. This screen identified Syk inhibitor III as an irreversible p97/valosin-containing protein inhibitor (IC50 = 1.7 μM) that acts through Cys-522 within the D2 ATPase domain. Our work establishes a high-throughput screening-compatible pipeline for identification and classification of small molecules, cDNAs, or siRNAs that target components of the ubiquitin-proteasome system.

4.

Background

By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes and other machine learning algorithms we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, or those folding to poorly- or non-designable conformations.

Results

First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If the lowest energy for a given sequence is obtained for one particular lattice conformation, we assume that the sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.
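The enumerate-and-thread procedure can be sketched in Python under simplifying assumptions that differ from the abstract: a small rectangular patch of the square lattice stands in for the hexagon or triangle on the triangular lattice, and the energy function (unspecified in the abstract) is taken to be -1 per non-bonded H-H contact. Conformations are grouped by contact map, since walks with identical contacts cannot be told apart by this energy. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def compact_walks(width, height):
    """All self-avoiding walks visiting every site of a width x height grid."""
    sites = {(x, y) for x in range(width) for y in range(height)}
    walks = []

    def extend(path, visited):
        if len(path) == len(sites):
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for step in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if step in sites and step not in visited:
                visited.add(step)
                path.append(step)
                extend(path, visited)
                path.pop()
                visited.remove(step)

    for start in sites:
        extend([start], {start})
    return walks


def contact_map(walk):
    """Pairs of chain positions that are lattice neighbours but not bonded."""
    position = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for (x, y), i in position.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def energy(sequence, contacts):
    """Assumed energy: -1 for every H-H contact, 0 otherwise."""
    return -sum(1 for i, j in contacts if sequence[i] == "H" and sequence[j] == "H")


def designability(width, height):
    """Count, per contact map, the H/P sequences with that unique ground state."""
    maps = list({contact_map(walk) for walk in compact_walks(width, height)})
    counts = Counter()
    for sequence in product("HP", repeat=width * height):
        energies = [energy(sequence, m) for m in maps]
        best = min(energies)
        if energies.count(best) == 1:  # a degenerate minimum means no unique fold
            counts[maps[energies.index(best)]] += 1
    return counts
```

Calling `designability(3, 3)` returns a counter whose largest values mark the most designable contact maps in this toy setting; sequences with a degenerate minimum are left out, matching the rule that a sequence folds only when one conformation gives the lowest energy.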

Conclusion

By using these machine learning algorithms with ten-fold cross-validation we are able to classify the two classes of sequences with high accuracy – in some cases exceeding 95%.
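Ten-fold cross-validation itself can be sketched in a few lines of plain Python. The `fit` and `predict` callables are hypothetical placeholders for whichever learner is used (SVM, Naïve Bayes, and so on), and the sketch assumes there are at least as many samples as folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(samples, labels, fit, predict, k=10):
    """Mean test accuracy over k folds (requires len(samples) >= k)."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Each sample is used for testing exactly once, so the reported accuracy never comes from a sequence the model saw during training.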

5.
Various sources of protein data, such as knowledgebases and scientific literature, are currently available, as are numerous tools for their analysis. The matter becomes one of choosing the tools that are most appropriate for the specific task and for the specific proteins. A combination of standard and alternative tools may lead to biologically significant results. Here, a computational classification of proteins is made using standard multiple sequence alignment in combination with an alternative method for analysis of hydropathy distribution in proteins. Both of these methods are applied to the Na+/Cl−-dependent neurotransmitter symporters (NSSs), resulting in two alternative classifications. The classifications are validated and interpreted biologically by literature and knowledgebase annotation mining, producing a consensus classification. The classification leads to the identification and functional characterization of three families of largely structurally and functionally uncharacterized orphan NSSs. The literature and knowledgebase annotations are mined to functionally characterize the NSSs in these families. The presented work also demonstrates that, in specific cases, the analysis of the hydropathy distribution in proteins is capable of revealing functional properties of proteins.

6.
MOTIVATION: Misfolding of membrane proteins plays an important role in many human diseases such as retinitis pigmentosa, hereditary deafness and diabetes insipidus. Little is known about membrane proteins as there are only very few high-resolution structures. Single-molecule force spectroscopy is a novel technique, which measures the force necessary to pull a protein out of a membrane. Such force curves contain valuable information on the protein structure, conformation, and inter- and intra-molecular forces. High-throughput force spectroscopy experiments generate hundreds of force curves including spurious ones and good curves, which correspond to different unfolding pathways. Manual analysis of these data is a bottleneck and source of inconsistent and subjective annotation. RESULTS: We propose a novel algorithm for the identification of spurious curves and curves representing different unfolding pathways. Our algorithm proceeds in three stages: first, we reduce noise in the curves by applying dimension reduction; second, we align the curves with dynamic programming and compute pairwise distances and third, we cluster the curves based on these distances. We apply our method to a hand-curated dataset of 135 force curves of bacteriorhodopsin mutant P50A. Our algorithm achieves a success rate of 81% distinguishing spurious from good curves and a success rate of 76% classifying unfolding pathways. As a result, we discuss five different unfolding pathways of bacteriorhodopsin including three main unfolding events and several minor ones. Finally, we link folding barriers to the degree of conservation of residues. Overall, the algorithm tackles the force spectroscopy bottleneck and leads to more consistent and reproducible results paving the way for high-throughput analysis of structural features of membrane proteins.
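The second and third stages (dynamic-programming alignment, pairwise distances, clustering) can be sketched as follows. The abstract does not name the alignment or clustering method, so this sketch substitutes dynamic time warping and threshold-based single-linkage clustering as stand-ins, and it omits the dimension-reduction stage entirely; function names and the threshold are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric curves."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster(curves, threshold):
    """Single-linkage clustering: merge any two curves closer than the threshold."""
    labels = list(range(len(curves)))  # every curve starts in its own cluster
    for i in range(len(curves)):
        for j in range(i + 1, len(curves)):
            if dtw_distance(curves[i], curves[j]) <= threshold:
                old, new = labels[j], labels[i]
                labels = [new if label == old else label for label in labels]
    return labels


curves = [[0, 1, 2, 1, 0], [0, 1, 1, 2, 1, 0], [5, 5, 5, 5]]
labels = cluster(curves, threshold=1.0)
```

In this toy input the first two curves differ only by a stretched segment, so the warping aligns them at zero cost and they share a label, while the flat third curve stays in its own cluster, much as a spurious curve would.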

7.
MOTIVATION: Low-complexity or cryptically simple sequences are widespread in protein sequences but their evolution and function are poorly understood. To date, methods for the detection of low complexity in proteins have been directed towards the filtering of such regions prior to sequence homology searches but not to the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function. RESULTS: We have developed a new tool, based on the SIMPLE algorithm, that facilitates the quantification of the amount of simple sequence in proteins and determines the type of short motifs that show clustering above a certain threshold. By modifying the sensitivity of the program, simple sequence content can be studied at various levels, from highly organised tandem structures to complex combinations of repeats. We compare the relative amount of simplicity in different functional groups of yeast proteins and determine the level of clustering of the different amino acids in these proteins. AVAILABILITY: The program is available on request or online at http://www.biochem.ucl.ac.uk/bsm/SIMPLE.

8.
9.
Proteins employ a wide variety of folds to perform their biological functions. How are these folds first acquired? An important step toward answering this is to obtain an estimate of the overall prevalence of sequences adopting functional folds. Since tertiary structure is needed for a typical enzyme active site to form, one way to obtain this estimate is to measure the prevalence of sequences supporting a working active site. Although the immense number of sequence combinations makes wholly random sampling unfeasible, two key simplifications may provide a solution. First, given the importance of hydrophobic interactions to protein folding, it seems likely that the sample space can be restricted to sequences carrying the hydropathic signature of a known fold. Second, because folds are stabilized by the cooperative action of many local interactions distributed throughout the structure, the overall problem of fold stabilization may be viewed reasonably as a collection of coupled local problems. This enables the difficulty of the whole problem to be assessed by assessing the difficulty of several smaller problems. Using these simplifications, the difficulty of specifying a working beta-lactamase domain is assessed here. An alignment of homologous domain sequences is used to deduce the pattern of hydropathic constraints along chains that form the domain fold. Starting with a weakly functional sequence carrying this signature, clusters of ten side-chains within the fold are replaced randomly, within the boundaries of the signature, and tested for function. The prevalence of low-level function in four such experiments indicates that roughly one in 10^64 signature-consistent sequences forms a working domain. Combined with the estimated prevalence of plausible hydropathic patterns (for any fold) and of relevant folds for particular functions, this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10^77, adding to the body of evidence that functional folds require highly extraordinary sequences.

10.
Advances in sequencing technologies have brought an exponential rise in the number of newly found enzymes. Enzymes are proteins that catalyze biochemical reactions and play an important role in metabolic pathways. Commonly, the function of such enzymes is determined by experiments that can be time-consuming and costly. Hence, there is a need for a computational method that can distinguish protein enzyme sequences from those of non-enzymes and reliably predict the function of the former. To address this problem, approaches that cluster enzymes based on their sequence and structural similarity have been presented. However, these approaches are known to fail for proteins that perform the same function but are dissimilar in sequence and structure. In this article, we present a supervised machine learning model to predict the function class and sub-class of enzymes based on a set of 73 sequence-derived features. The functional classes are as defined by the International Union of Biochemistry and Molecular Biology. Using an efficient data mining algorithm called random forest, we construct a top-down three-layer model in which the top layer classifies a query protein sequence as an enzyme or non-enzyme, the second layer predicts the main function class, and the bottom layer further predicts the sub-function class. The model reported an overall classification accuracy of 94.87% for the first level, 87.7% for the second, and 84.25% for the bottom level. Our results compare very well with existing methods, and in many cases report better performance. Using feature selection methods, we have shown the biological relevance of a few of the top-ranked attributes.
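The top-down control flow of such a three-layer model can be sketched as below. This shows only how the layers are chained; the classifiers are passed in as plain callables (in the abstract they are random forests trained on 73 sequence-derived features), and every name and toy rule here is a hypothetical placeholder rather than the authors' implementation.

```python
def predict_enzyme_function(features, is_enzyme, main_class, sub_class):
    """Route one feature vector through the three layers, stopping early for non-enzymes.

    is_enzyme:  callable returning True/False             (layer 1)
    main_class: callable returning a main class label     (layer 2)
    sub_class:  dict mapping each main class to a callable
                that returns a sub-class label            (layer 3)
    """
    if not is_enzyme(features):
        return ("non-enzyme", None, None)
    main = main_class(features)
    return ("enzyme", main, sub_class[main](features))


# Toy stand-ins for trained models, for illustration only.
toy_is_enzyme = lambda f: f[0] > 0.5
toy_main_class = lambda f: 1 if f[1] < 0.5 else 3
toy_sub_class = {1: lambda f: "1.1", 3: lambda f: "3.4"}

result = predict_enzyme_function([0.9, 0.8], toy_is_enzyme, toy_main_class, toy_sub_class)
```

One consequence of this design is that an error in an upper layer propagates downward, which is consistent with the accuracy reported in the abstract falling from the first level to the bottom level.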

11.
Alignment of amino-acid sequences from the N-terminal and C-terminal halves of transferrin-binding protein B revealed an underlying bilobed nature with several regions of identity. Based on this analysis, purified recombinant fusion proteins of maltose-binding protein (Mbp) with intact TbpB, its N-terminal half or C-terminal half from the human pathogens Neisseria meningitidis and Moraxella catarrhalis were produced. Solid-phase binding assays and affinity isolation assays demonstrated that the N-terminal and C-terminal halves of TbpB could bind independently to human transferrin (hTf). A solid-phase overlapping synthetic peptide library representing the amino-acid sequence of hTf was probed with soluble, labelled Mbp-TbpB fusions to localize TbpB-binding regions on hTf. An essentially identical series of peptides from domains within both lobes of hTf was recognized by intact TbpB from both organisms, demonstrating a conserved TbpB-hTf interaction. Both halves of TbpB from N. meningitidis bound the same series of peptides, which included peptides from equivalent regions on the two hTf lobes, indicating that TbpB interacts with each lobe of hTf in a similar manner. Mapping of the peptide-binding regions on a molecular model of hTf revealed a series of nearly adjacent surface regions that nearly encircled each lobe. Binding studies with chimeric hTf/bTf transferrins demonstrated that regions in the C-lobe of hTf were preferentially recognized by the N-terminal half of TbpB. Collectively, these results provide evidence that TbpB consists of two lobes, each with distinct yet homologous Tf-binding regions.

12.
13.
We present a homology scanning microcomputer program to predict functional T-cell epitopes within proteins. By taking into account particular human or mouse restriction elements, the predictions are made haplotype-specific. The generality of this approach is confirmed by (i) identification of well-characterized immunogenic T-cell determinants in lysozyme, (ii) a search for potential T epitopes on unanalysed proteins such as the human beta 2-adrenoreceptor, and (iii) modification of non-immunogenic peptide sequences in order to generate T-cell determinants.

14.
Many proteins assemble as oligomeric complexes, and in several cases a distinct domain mediates the interaction between the subunits. The identification of new oligomerization modules is relevant both for understanding the architecture and evolution of protein sequences and for protein engineering applications. Using the bacteriophage lambda repressor dimerization assay, we searched Escherichia coli genomic libraries for sequences able to mediate protein oligomerization in vivo. We identified short peptides that can substitute very effectively for the dimerizing domain of the repressor. Most of these peptides belong to open reading frames that are normally not expressed in the bacterial cell.

15.
Association of an atypical protein kinase C (aPKC) with an adapter protein can affect the location, activity, substrate specificity, and physiological role of the phosphotransferase. Knowledge of mechanisms that govern formation and intracellular targeting of aPKC-adapter protein complexes is limited. Caenorhabditis elegans protein kinase C adapter proteins (CKA1 and CKA1S) bind and target aPKCs and provide prototypes for mechanistic analysis. CKA1 binds an aPKC (PKC3) via a phosphotyrosine binding (PTB) domain. A distinct, Arg/Lys-rich N-terminal region targets CKA1 to the cell periphery. We discovered that a short segment (residues 212-224, GGIDNGAFHEHEI) of the V2 (linker) region of PKC3 creates a binding surface that interacts with the PTB domain of CKA1/CKA1S. The docking domain of PKC3 differs from classical PTB ligands by the absence of Tyr and Pro. Substitution of Ile214, Asn216, or Phe219 with Ala abrogates binding of PKC3 with CKA1; these residues cooperatively configure a docking site that complements an apolar surface of the CKA1 PTB domain. Phosphorylation site domains (PSD1, residues 11-25; PSD2, residues 61-77) in CKA1 route the adapter (and tethered PKC3) to the cell periphery. Phosphorylation of Ser17 and Ser65 in PSDs 1 and 2 elicits translocation of CKA1 from the cell surface to cytoplasm. Activities of DAG-stimulated PKCs and opposing protein Ser/Thr phosphatases can dynamically regulate the distribution of adapter protein between the cell periphery and cytoplasm.

16.

Background  

Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.
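One way to read "binomial distribution statistics to optimize sequence similarity cutoffs" is sketched below: hits are ranked by similarity, and the cutoff depth is chosen where the number of hits from the positive training partition is least likely under a binomial null. This is an illustrative interpretation, not the published SIMBAL or PPP code; the function names, the input encoding, and the background probability `p` are assumptions.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff(ranked_hits, p):
    """Return (score, depth) of the most significant cutoff.

    ranked_hits: booleans ordered from best- to worst-scoring hit;
                 True means the hit comes from the positive partition.
    p:           background fraction of positives in the training set.
    """
    best = (1.0, 0)
    positives = 0
    for depth, is_positive in enumerate(ranked_hits, start=1):
        positives += is_positive
        score = binomial_tail(positives, depth, p)
        if score < best[0]:
            best = (score, depth)
    return best


score, depth = best_cutoff([True] * 5 + [False] * 5, p=0.3)
```

In this toy ranking the optimum falls at depth 5, immediately after the last positive hit, because every additional negative hit makes the observed enrichment less surprising.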

17.
Although the small-subunit ribosomal RNA (SSU rRNA) gene is widely used in molecular systematics, few large-subunit (LSU) rRNA gene sequences are known from protostome animals, and the value of the LSU gene for invertebrate systematics has not been explored. The goal of this study is to test whether combined LSU and SSU rRNA gene sequences support the division of protostomes into Ecdysozoa (molting forms) and Lophotrochozoa, as was proposed by Aguinaldo et al. (1997) (Nature 387:489) based on SSU rRNA sequences alone. Nearly complete LSU gene sequences were obtained, and combined LSU + SSU sequences were assembled, for 15 distantly related protostome taxa plus five deuterostome outgroups. When the aligned LSU + SSU sequences were analyzed by tree-building methods (minimum evolution analysis of LogDet-transformed distances, maximum likelihood, and maximum parsimony) and by spectral analysis of LogDet distances, both Ecdysozoa and Lophotrochozoa were indeed strongly supported (e.g., bootstrap values >90%), with higher support than from the SSU sequences alone. Furthermore, with the LogDet-based methods, the LSU + SSU sequences resolved some accepted subgroups within Ecdysozoa and Lophotrochozoa (e.g., the polychaete sequence grouped with the echiuran, and the annelid sequences grouped with the mollusc and lophophorates), subgroups that SSU-based studies do not reveal. Also, the mollusc sequence grouped with the sequences from lophophorates (brachiopod and phoronid). Like SSU sequences, our LSU + SSU sequences contradict older hypotheses that grouped annelids with arthropods as Articulata, that placed flatworms and nematodes as basal bilaterians, and that considered lophophorates, nemerteans, and chaetognaths to be deuterostomes. The position of chaetognaths within protostomes remains uncertain: our chaetognath sequence associated with that of an onychophoran, but this was unstable and probably artifactual. Finally, the benefits of combining LSU with SSU sequences for phylogenetic analyses are discussed: LSU adds signal, it can be used at lower taxonomic levels, and its core region is easy to align across distant taxa, but its base frequencies tend to be nonstationary across such taxa. We conclude that molecular systematists should use combined LSU + SSU rRNA genes rather than SSU alone.

18.
The Escherichia coli McrA protein, a putative C5-methylcytosine/C5-hydroxymethylcytosine-specific nuclease, binds DNA with symmetrically methylated HpaII sequences (Cm5CGG), but its precise recognition sequence remains undefined. To determine McrA’s binding specificity, we cloned and expressed recombinant McrA with a C-terminal StrepII tag (rMcrA-S) to facilitate protein purification and affinity capture of human DNA fragments with m5C residues. Sequence analysis of a subset of these fragments and electrophoretic mobility shift assays with model methylated and unmethylated oligonucleotides suggest that N(Y > R) m5CGR is the canonical binding site for rMcrA-S. In addition to binding HpaII-methylated double-stranded DNA, rMcrA-S binds DNA containing a single, hemimethylated HpaII site; however, it does not bind if A, C, T or U is placed across from the m5C residue, but does if I is opposite the m5C. These results provide the first systematic analysis of McrA’s in vitro binding specificity.

19.
20.
MOTIVATION: A central problem in genomics is to determine the function of a protein using the information contained in its amino acid sequence. Variable length Markov chains (VLMC) are a promising class of models that can effectively classify proteins into families, and they can be estimated in linear time and space. RESULTS: We introduce a new algorithm, called Sparse Probabilistic Suffix Trees (SPST), that identifies equivalences between the contexts of a VLMC. We show that, in many cases, the identification of these equivalences can improve the classification rate of the classical Probabilistic Suffix Trees (PST) algorithm. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains; this variation of the SPST algorithm is called F-SPST.
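The idea of classifying a protein by a variable-length context model can be sketched as below. This is a plain back-off context model with add-one smoothing, not the SPST, F-SPST, or pruned PST algorithms of the abstract: it stores every context up to `max_depth` instead of building a pruned tree, and it does not merge equivalent contexts. All names and parameter values are assumptions.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train(sequences, max_depth=3):
    """Count which symbol follows each context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for depth in range(min(i, max_depth) + 1):
                counts[seq[i - depth:i]][symbol] += 1
    return counts


def log_likelihood(counts, seq, max_depth=3):
    """Score a sequence, backing off to the longest context seen in training."""
    total = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - max_depth):i]
        while context and context not in counts:
            context = context[1:]  # drop the most distant residue
        seen = counts.get(context, {})
        # add-one smoothing so unseen symbols keep a non-zero probability
        total += log((seen.get(symbol, 0) + 1) / (sum(seen.values()) + len(ALPHABET)))
    return total


def classify(models, seq):
    """Return the family whose model gives the sequence the highest likelihood."""
    return max(models, key=lambda family: log_likelihood(models[family], seq))


models = {
    "family_a": train(["ACACACAC", "CACACACA"]),
    "family_b": train(["WYWYWYWY", "YWYWYWYW"]),
}
```

With one such model per family, `classify(models, "ACACAC")` picks the family whose training sequences share the query's local contexts.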
