Found 20 similar articles (search time: 15 ms)
Chou KC. Proteins, 2001, 42(1):136-139
Protein signal sequences play a central role in the targeting and translocation of nearly all secreted proteins and many integral membrane proteins in both prokaryotes and eukaryotes. The knowledge of signal sequences has become a crucial tool for pharmaceutical scientists who genetically modify bacteria, plants, and animals to produce effective drugs. However, to effectively use such a tool, the first important thing is to find a fast and effective method to identify the "zipcode" entity; this is also evoked by both the huge amount of unprocessed data available and the industrial need to find more effective vehicles for the production of proteins in recombinant systems. In view of this, a sequence-encoded algorithm was developed to identify the signal sequences and predict their cleavage sites. The rate of correct prediction for 1,939 secretory proteins and 1,440 nonsecretory proteins by self-consistency test is 90.14% and that by jackknife test is 90.13%. The encouraging results indicate that the signal sequences share some common features although they lack similarity in sequence, length, and even composition and that they are predictable to a considerably accurate extent.

Functioning as an "address tag" or "zip code" that guides nascent proteins (newly synthesized proteins in the cytosol) to wherever they are needed, signal peptides (also called targeting signals or signal sequences) have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To use such a tool effectively and in a timely manner, however, the first important thing is to develop an automated method for quickly and accurately identifying the signal peptide for a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, the challenge has become even more urgent and critical. In this paper, five statistical rulers were derived by performing a mutual information analysis. By combining these statistical rulers, a new prediction algorithm was established and high prediction success rates were observed. The new algorithm may play a complementary role to the existing algorithms in this area. It is anticipated that the mutual information approach introduced here may be very useful for studying many other sequence-coupling problems in molecular biology as well.

Prediction of splice junctions in mRNA sequences.   Total citations: 8, self-citations: 6, citations by others: 2
K Nakata, M Kanehisa, C DeLisi. Nucleic Acids Research, 1985, 13(14):5327-5340
A general method based on the statistical technique of discriminant analysis is developed to distinguish boundaries of coding and non-coding regions in nucleic acid sequences. In particular, the method is applied to the prediction of splicing sites in messenger RNA precursors. Information used for discrimination includes consensus sequence patterns around splice junctions, free energy of snRNA and mRNA base pairing, and statistical differences between coding and non-coding regions such as periodic appearance of specific bases in coding regions reflecting the non-random usage of degenerate codons. Given the reading frame of an exon (but not the exon/intron boundaries), the method will predict the following exon, namely, the intron to be excised. When applied to human sequences in the GenBank database, the method correctly identified 80% of true splice junctions.

For sustainable development, biodiversity conservation and life-quality improvement must be simultaneously considered. Molecular techniques have greatly impacted biotechnology. These methods have, in particular, improved the capability to investigate the fine differences among organisms and, as a consequence, to better investigate the effects of environmental factors on them. We propose an approach to support the optimal selection of molecular probes for barcoding applications in many biotechnological fields. The aim of our work is specificity maximization. To this end, we have integrated a filter system based on wavelet transforms with biological knowledge about the sequence proneness to mutation and post-translational modification. Specifically, we have tested the proposed method on ITS1 sequences, which are a region of the rRNA locus. Our analysis has shown the presence of other relatively stable local conformations in addition to the known cleavage site. Their characteristics differ within the group of mammals selected for our analysis. These variations could be used to design new species-specific barcoding probes or other quick molecular screening tools.

The prediction of protein secondary structure (alpha-helices, beta-sheets and coil) is improved by 9% to 66% using the information available from a family of homologous sequences. The approach is based both on averaging the Garnier et al. (1978) secondary structure propensities for aligned residues and on the observation that insertions and high sequence variability tend to occur in loop regions between secondary structures. Accordingly, an algorithm first aligns a family of sequences and a value for the extent of sequence conservation at each position is obtained. This value modifies a Garnier et al. prediction on the averaged sequence to yield the improved prediction. In addition, from the sequence conservation and the predicted secondary structure, many active site regions of enzymes can be located (26 out of 43) with limited over-prediction (8 extra). The entire algorithm is fully automatic and is applicable to all structural classes of globular proteins.
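The abstract above relies on a per-position conservation value computed from an alignment but does not say how that value is defined. The following is a minimal sketch under an assumed definition: the fraction of sequences carrying the most common non-gap residue in each column. The function name `column_conservation` and the gap symbol `-` are illustrative choices, not taken from the paper.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return one conservation score per alignment column.

    Assumed definition (not from the paper): the share of all sequences
    that carry the most common non-gap residue in that column. Gaps lower
    the score, so insertion-rich columns look variable.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for i in range(length):
        # Count residues in column i, ignoring gap characters.
        residues = Counter(seq[i] for seq in alignment if seq[i] != gap)
        top = max(residues.values(), default=0)
        scores.append(top / len(alignment))
    return scores
```

Columns with low scores or many gaps would, following the observation in the abstract, be candidates for loop regions; how the score then modifies the Garnier et al. propensities is not specified in the abstract and is left out here.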

Identification of functionally important sites (FIS) in proteins is a critical problem and can have profound importance where protein structural information is limited. Machine learning techniques have been very useful in successful classification of many important biological problems. In this paper, we adopt the sparse kernel least squares classifiers (SKLSC) approach for classification and/or prediction of FIS using protein sequence derived features. The SKLSC algorithm was applied to 5435 FIS that have been extracted from 312 reliable alignments for a wide range of protein families. We obtained 68.28% sensitivity and 68.66% specificity for the training dataset and 65.34% sensitivity and 66.88% specificity for the testing dataset. Further, a large-scale benchmarking study using alignments of 101 protein families containing 1899 FIS showed that our method achieved an average ~70% sensitivity in predicting different types of FIS, such as active sites, metal, ligand or protein binding sites. Our findings also indicate that active sites and metal binding sites are comparatively easier to predict than the ligand and protein binding sites. Despite moderate success, our results suggest the usefulness and potential of the SKLSC approach in prediction of FIS using only protein sequence derived information.
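For readers comparing the figures quoted above, sensitivity and specificity follow the standard confusion-matrix definitions. The short sketch below shows how they are computed; the function name and the example counts are hypothetical and are not data from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true sites that were predicted.
    specificity = TN / (TN + FP): share of non-sites correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=70, fn=30, tn=80, fp=20)
```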

MOTIVATION: Motivated by the abundance, importance and unique functionality of zinc, both biologically and physiologically, we have developed an improved method for the prediction of zinc-binding sites in proteins from their amino acid sequences. RESULTS: By combining support vector machine (SVM) and homology-based predictions, our method predicts zinc-binding Cys, His, Asp and Glu with 75% precision (86% for Cys and His only) at 50% recall according to a 5-fold cross-validation on a non-redundant set of protein chains from the Protein Data Bank (PDB) (2727 chains, 235 of which bind zinc). Consequently, our method predicts zinc-binding Cys and His with 10% higher precision at different recall levels compared to a recently published method when tested on the same dataset. AVAILABILITY: The program is available for download at www.fos.su.se/~nanjiang/zincpred/download/

We predicted gamma-turns from amino acid sequences using the first-order Markov chain theory and enlarged representative data sets corresponding to protein chains selected from the Protein Data Bank (PDB). The following data sets were used for training and deriving the probability values: (1) an initial data set containing 315 protein chains comprising 904 gamma-turns and (2) a later data set, assembled to include new entries in the PDB, containing 434 protein chains and comprising 1053 gamma-turns. By excluding 93 protein chains that were common to these two training data sets, we generated two mutually exclusive data sets containing 222 and 341 protein chains for testing our predictions. Applying amino acid probability values derived from the training data sets to the testing data sets yielded overall prediction accuracies in the range 54-57%. We recommend the use of probability values derived from the data set comprising 315 protein chains, which represents more gamma-turns and also provides better predictions.
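A first-order Markov chain scores a sequence by the probability of its first residue multiplied by the probability of each residue given the one before it. The sketch below illustrates that general idea only; the function names, the add-one pseudocount, and the toy training segments are assumptions for illustration and do not reproduce the probability values or the procedure of the study.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_markov(segments, pseudocount=1.0):
    """Estimate start and transition probabilities from example segments.

    The pseudocount (an assumption of this sketch) keeps unseen
    transitions from getting probability zero.
    """
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    for seg in segments:
        if not seg:
            continue
        start[seg[0]] += 1
        for a, b in zip(seg, seg[1:]):
            trans[a][b] += 1

    n = len(ALPHABET)
    start_total = sum(start.values()) + pseudocount * n
    p_start = {a: (start[a] + pseudocount) / start_total for a in ALPHABET}
    p_trans = {}
    for a in ALPHABET:
        row_total = sum(trans[a].values()) + pseudocount * n
        p_trans[a] = {b: (trans[a][b] + pseudocount) / row_total for b in ALPHABET}
    return p_start, p_trans


def log_prob(segment, p_start, p_trans):
    """Log-probability of a non-empty segment under the first-order chain."""
    lp = math.log(p_start[segment[0]])
    for a, b in zip(segment, segment[1:]):
        lp += math.log(p_trans[a][b])
    return lp
```

In a two-class setting one would typically train one chain on turn segments and one on non-turn segments and compare the two log-probabilities for each candidate window; whether the study used exactly that decision rule is not stated in the abstract.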

Isoleucine:RNA sites with associated coding sequences.   Total citations: 6, self-citations: 3, citations by others: 3

DNA sequences at immunoglobulin switch region recombination sites.   Total citations: 21, self-citations: 0, citations by others: 21
The immunoglobulin heavy chain switch from synthesis of IgM to IgG, IgA or IgE is mediated by a DNA recombination event. Recombination occurs within switch regions, 2-10 kb segments of DNA that lie upstream of heavy chain constant region genes. A compilation of DNA sequences at more than 150 recombination sites within heavy chain switch regions is presented. Switch recombination does not appear to occur by homologous recombination. An extensive search for a recognition motif failed to find such a sequence, implying that switch recombination is not a site-specific event. A model for switch recombination that involves illegitimate priming of one switch region on another, followed by error-prone DNA synthesis, is proposed.

Identifying potential tRNA genes in genomic DNA sequences.   Total citations: 16, self-citations: 0, citations by others: 16
We have developed an algorithm that automatically and reproducibly identifies potential tRNA genes in genomic DNA sequences, and we present a general strategy for testing the sensitivity of such algorithms. This algorithm is useful for the flagging and characterization of long genomic sequences that have not been experimentally analyzed for identification of functional regions, and for the scanning of nucleotide sequence databases for errors in the sequences and the functional assignments associated with them. In an exhaustive scan of the GenBank database, 97.5% of the 744 known tRNA genes were correctly identified (true-positives), and 42 previously unidentified sequences were predicted to be tRNAs. A detailed analysis of these latter predictions reveals that 16 of the 42 are very similar to known tRNA genes, and we predict that they do, in fact, code for tRNA, yielding a false-positive rate for the algorithm of 0.003%. The new algorithm and testing strategy are a considerable improvement over any previously described strategies for recognizing tRNA genes, and they allow detection of genes (including introns) embedded in long genomic sequences.

In mammals, the esterification of sterols by ACAT plays a critical role in eukaryotic lipid homeostasis. Using the predominant isoform of the yeast ACAT-related enzyme family, Are2p, as a model, we targeted phylogenetically conserved sequences for mutagenesis in order to identify functionally important motifs. Deletion, truncation, and missense mutations implicate a regulatory role for the amino-terminal domain of Are2p and identified two carboxyl-terminal motifs as required for catalytic activity. A serine-to-leucine mutation in the (H/Y)SF motif (residues 338-340), unique to sterol esterification enzymes, nullified the activity and stability of yeast Are2p. Similarly, a tyrosine-to-alanine change in the FYxDWWN motif of Are2p (residues 523-529) produced an enzyme with decreased activity and apparent affinity for oleoyl-CoA. Mutagenesis of the tryptophan residues in this motif completely abolished activity. In human ACAT1, mutagenesis of the corresponding motifs (residues 268-270, and 403-409, respectively) also nullified enzymatic activity. On the basis of their critical roles in enzymatic activity and their sequence conservation, we propose that these motifs mediate sterol and acyl-CoA binding by this class of enzymes.

In this paper we address the problem of extracting features relevant for predicting protein-protein interaction sites from the three-dimensional structures of protein complexes. Our approach is based on information about evolutionary conservation and surface disposition. We implement a neural network based system, which uses a cross-validation procedure and allows the correct detection of 73% of the residues involved in protein interactions in a selected database comprising 226 heterodimers. Our analysis confirms that the chemico-physical properties of interacting surfaces are difficult to distinguish from those of the whole protein surface. However, neural networks trained with a reduced representation of the interacting patch and sequence profile are sufficient to generalize over the different features of the contact patches and to predict whether a residue in the protein surface is or is not in contact. By using a blind test, we report the prediction of the surface interacting sites of three structural components of the DnaK molecular chaperone system, and find close agreement with previously published experimental results. We propose that the predictor can significantly complement results from structural and functional proteomics.

The number of segregating sites provides an indicator of the degree of DNA sequence variation that is present in a sample, and has been of great interest to the biological, pharmaceutical and medical professions. In this paper, we first provide linear- and expected-sublinear-time algorithms for finding all the segregating sites of a given set of DNA sequences. We also describe a data structure for tracking segregating sites in a set of sequences, such that every time the set is updated with the insertion of a new sequence or removal of an existing one, the segregating sites are updated accordingly without the need to re-scan the entire set of sequences.
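A segregating site is a position at which the aligned sequences do not all carry the same base. The idea of tracking such sites under insertions and removals can be sketched with per-column base counts, as below. This is a simple illustration under stated assumptions (equal-length, already aligned sequences; the class name `SegregatingSiteTracker` is hypothetical) and is not the data structure or the sublinear algorithm from the paper.

```python
from collections import Counter


class SegregatingSiteTracker:
    """Track segregating sites of a multiset of aligned DNA sequences.

    Each update costs time proportional to one sequence length; the
    stored sequences are never re-scanned.
    """

    def __init__(self, length):
        self.length = length
        self.counts = [Counter() for _ in range(length)]  # base counts per column
        self.sites = set()  # indices of columns with two or more distinct bases

    def _check(self, seq):
        if len(seq) != self.length:
            raise ValueError("sequence length does not match the alignment")

    def insert(self, seq):
        self._check(seq)
        for i, base in enumerate(seq):
            column = self.counts[i]
            column[base] += 1
            if len(column) > 1:
                self.sites.add(i)

    def remove(self, seq):
        """Remove a sequence that was previously inserted (caller's duty)."""
        self._check(seq)
        for i, base in enumerate(seq):
            column = self.counts[i]
            column[base] -= 1
            if column[base] == 0:
                del column[base]  # drop bases that no longer occur here
            if len(column) < 2:
                self.sites.discard(i)
```

Usage is straightforward: create a tracker for the alignment length, call `insert` or `remove` as the sample changes, and read `tracker.sites` (or `len(tracker.sites)` for the number of segregating sites) at any time.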

Summary. We have previously established a transgenic Drosophila line with a highly transposable P element insertion. Using this strain we analyzed transposition and excision of the P element at the molecular level. We examined sequences flanking the new insertion sites and those of the remnants after excision. Our results on mobilization of the P element demonstrate that target-site duplication at the original insertion site does not play a role in forward excision and transposition. After P element excision an 8 bp target-site duplication and part of the 31 bp terminal inverted repeat (5–18 bp) remained in all the strains examined. Moreover, in 11 out of 28 strains, extra sequences were found between the two remaining inverted repeats. The double-strand gap repair model does not explain the origin of these extra sequences. The mechanism creating them may be similar to the hairpin model proposed for the transposon Tam in Antirrhinum majus.
