Similar articles
 20 similar articles found (search time: 31 ms)
1.
Protein disulfide isomerase (PDI) has been identified in a protein extract from the venom duct of the marine snail C. amadis. In-gel tryptic digestion of a thick protein band at approximately 55 kDa yields a mixture of peptides. Analysis of the tryptic fragments by MALDI-MS/MS and LC-ESI-MS/MS methods permits sequence assignment. Three tryptic fragments yield two nine-residue sequences (FVQDFLDGK and EPQLGDRVR) and an eleven-residue sequence (DQESTGALAFK). Database analysis using peptides and were consistent with the sequence of PDI and peptide appears to be derived from a co-migrating protein. When proteins are identified from short peptide sequences, the question arises of how reliable identification from peptide fragments is. Here we have also demonstrated the minimum length of peptide fragment necessary for unambiguous protein identification, using fragments obtained from the experimentally derived sequences. Sequences of length ≥7 residues provide unambiguous identification when protein molecular mass is used as a filter. The length of sequence necessary for unambiguous protein identification is also established using randomly chosen tryptic fragments from a standard dataset of proteins. The results are of significance in the identification of proteins from organisms with unsequenced genomes.
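The interplay between tag length and the molecular-mass filter described in this abstract can be illustrated with a toy lookup. This is only a minimal sketch under stated assumptions: the function name `candidates`, the database entries, their sequences and their masses are all hypothetical and made up for illustration; only the peptide FVQDFLDGK and the approximate 55 kDa gel mass come from the abstract.

```python
def candidates(peptide, database, observed_kda=None, tolerance_kda=5.0):
    """Names of database entries that contain the peptide and, optionally,
    whose mass lies within tolerance_kda of the observed gel mass."""
    hits = []
    for name, (sequence, mass_kda) in database.items():
        if peptide not in sequence:
            continue
        if observed_kda is not None and abs(mass_kda - observed_kda) > tolerance_kda:
            continue
        hits.append(name)
    return hits

# Hypothetical toy database: made-up sequences and masses, for illustration only.
toy_db = {
    "protein_A": ("MSTAFVQDFLDGKLLPE", 55.0),
    "protein_B": ("MKKFVQDAAGHTRRW", 54.0),
    "protein_C": ("MPEFVQDFLNNQS", 21.0),
}

peptide = "FVQDFLDGK"  # one of the tryptic sequences reported in the abstract
for k in (4, 6, 9):
    tag = peptide[:k]
    # Print the hits without and with the 55 kDa mass filter.
    print(k, candidates(tag, toy_db), candidates(tag, toy_db, observed_kda=55.0))
```

In this toy setting, short tags match several entries, and the mass filter removes some of the ambiguity before the tag becomes long enough to be unique on its own.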

2.
3.
ABSTRACT: BACKGROUND: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. RESULTS: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. CONCLUSIONS: Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.

4.
Adrenodoxin reductase is an NADP-dependent flavoenzyme which functions as the reductase of mitochondrial P-450 systems. We sequenced two adrenodoxin reductase cDNAs isolated from a bovine adrenal cortex cDNA library. The deduced amino acid sequence shows no similarity to the sequence of the microsomal P-450 systems or other known protein sequences. Nonetheless, by sequence analysis and comparisons with known sequences of dinucleotide-binding folds of two NADP-binding flavoenzymes, two regions of the adrenodoxin reductase sequence were identified as the FAD- and NADP-binding sites. These analyses revealed a consensus sequence for the NADP-binding dinucleotide fold (GXGXXAXXXAXXXXXXG, in one-letter amino acid code) that differs from FAD- and NAD-binding dinucleotide-fold sequences. In the database of protein sequences, the NADP-binding-site sequence appears solely in NADP-dependent enzymes, the binding sites of which were not known to date. Thus, this sequence may be used for identification of a certain type of NADP-binding site of enzymes that show no significant sequence similarity.
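The consensus sequence quoted in this abstract can be turned into a simple pattern search. The following is a minimal sketch, assuming that X stands for any residue and using Python's standard `re` module; the function name `find_motif` and the toy sequence are hypothetical and are not taken from the paper or from any real protein.

```python
import re

# Consensus from the abstract: GXGXXAXXXAXXXXXXG (X is assumed to mean any residue).
CONSENSUS = "GXGXXAXXXAXXXXXXG"

# Translate the consensus into a regular expression ("X" becomes the "." wildcard).
# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile("(?=(" + CONSENSUS.replace("X", ".") + "))")

def find_motif(sequence):
    """Return (1-based start position, matched segment) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real protein.
toy = "MKT" + "GAGLLAVVVAKKKKKKG" + "PLS"
print(find_motif(toy))
```

A plain pattern match like this only flags candidate sites; as the abstract notes, the motif is proposed as an indicator for one type of NADP-binding site, so any hit would still need to be assessed in context.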

5.
Proteomics research is hampered in many organisms due to a lack of an appropriate reference genome sequence that can be used in the interpretation of tandem mass spectrometry data for the identification of proteins. Public DNA sequence repositories have grown to considerable size and can, in most cases, serve to provide at least partial interpretation of a large-scale proteomics dataset. However, when species-specific sequences or sequences from a closely related species are available, a boutique sequence database can provide considerable increases in specificity, confidence, and completeness of protein identification. Here, we describe the development of a protein database from a large-scale expressed sequence tag and full-length complementary DNA sequencing project in the economically and ecologically important spruce (Picea) genus.

6.
The protein identification resource (PIR).
The Protein Identification Resource, which provides the scientific community with an efficient on-line computer system designed for the identification and analysis of protein sequences and their corresponding coding sequences, has been established. The resource consists of an integrated computer system composed of a number of protein and nucleic acid sequence databases and the software necessary to analyze this information effectively.

7.
The use of mass spectrometry data to search molecular sequence databases is a well-established method for protein identification. The technique can be extended to searching raw genomic sequences, providing experimental confirmation or correction of predicted coding sequences, and has the potential to identify novel genes and elucidate splicing patterns.

8.
《Trends in biotechnology》2001,19(10):S17-S22
The use of mass spectrometry data to search molecular sequence databases is a well-established method for protein identification. The technique can be extended to searching raw genomic sequences, providing experimental confirmation or correction of predicted coding sequences, and has the potential to identify novel genes and elucidate splicing patterns.

9.
Peptide mass fingerprint (PMF) matching is a high-throughput method used for protein spot identification in connection with two-dimensional gel electrophoresis (2DE). However, the success of PMF matching largely depends on whether the proteins to be identified exist in the database searched. Consequently, it is often necessary to apply other more sophisticated but also time-consuming technologies to generate sequence-tags for definitive protein identification. On the other hand, modern sequencing technologies are generating a large quantity of DNA sequences, first in unfinished form or with low genome coverage due to the time-consuming and thus limiting steps of finishing and annotation. We recently started to sequence the genome of Bacillus megaterium DSM 319, a bacterium of industrial interest. In this study, we demonstrate that a protein database generated from merely three-fold coverage, unfinished genomic sequences of this bacterium allows a fast and reliable protein spot identification solely based on PMF from high-throughput MALDI-TOF MS analysis. We further show that the strain-specific protein database from low coverage genomic sequence greatly outperforms the commonly used cross-species databases constructed from 13 completely sequenced Bacillus strains for protein spot identification via PMF.
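The principle behind PMF matching can be sketched in a few lines: digest a candidate protein sequence in silico, compute peptide masses, and count how many observed masses fall within a tolerance of a theoretical one. This is a minimal illustration and not the pipeline used in the study; it assumes the common trypsin rule (cleave after K or R unless followed by P), no missed cleavages, neutral monoisotopic masses, and a hypothetical tolerance of 0.2 Da. The function names and the toy sequence are made up for illustration.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide in Da."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER

def count_matches(observed_masses, sequence, tolerance_da=0.2):
    """Number of observed masses within tolerance of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - t) <= tolerance_da for t in theoretical)
        for obs in observed_masses
    )

# Hypothetical toy sequence, not a real protein.
toy_sequence = "MAGKRPSLKDEFR"
print(tryptic_digest(toy_sequence))
```

Ranking every database entry by such a match count is the basic idea; the abstract's point is that the ranking can only succeed if the right protein, or a close enough variant, is present in the database being searched.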

10.
A novel protein identification framework, PILOT_PROTEIN, has been developed to construct a comprehensive list of all unmodified proteins that are present in a living sample. It uses the peptide identification results from the PILOT_SEQUEL algorithm to initially determine all unmodified proteins within the sample. Using a rigorous biclustering approach that groups incorrect peptide sequences with other homologous sequences, the number of false positives reported is minimized. A sequence tag procedure is then incorporated along with the untargeted PTM identification algorithm, PILOT_PTM, to determine a list of all modification types and sites for each protein. The unmodified protein identification algorithm, PILOT_PROTEIN, is compared to the methods SEQUEST, InsPecT, X!Tandem, VEMS, and ProteinProspector using both prepared protein samples and a more complex chromatin digest. The algorithm demonstrates superior protein identification accuracy with a lower false positive rate. All materials are freely available to the scientific community at http://pumpd.princeton.edu.

11.
Protein design has become a powerful approach for understanding the relationship between amino acid sequence and 3-dimensional structure. In the past 5 years, there have been many breakthroughs in the development of computational methods that allow the selection of novel sequences given the structure of a protein backbone. Successful design of protein scaffolds has now paved the way for new endeavors to design function. The ability to design sequences compatible with a fold may also be useful in structural and functional genomics by expanding the range of proteins used for fold recognition and for the identification of functionally important domains from multiple sequence alignments.

12.
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, allowing for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
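The kind of de novo error described in this abstract, where one segment is replaced by another of approximately the same mass, can be made concrete with a small mass comparison. This is only an illustrative sketch and not the SPIDER algorithm; the function names and the 0.02 Da default tolerance are assumptions, and only a subset of the standard monoisotopic residue masses is included.

```python
# Monoisotopic residue masses in Da for a few residues (standard values).
MASS = {
    "G": 57.02146, "A": 71.03711, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "Q": 128.05858, "K": 128.09496,
}

def segment_mass(segment):
    """Sum of residue masses for a short segment."""
    return sum(MASS[r] for r in segment)

def is_isobaric(segment_a, segment_b, tolerance_da=0.02):
    """True if the two segments differ in mass by at most tolerance_da."""
    return abs(segment_mass(segment_a) - segment_mass(segment_b)) <= tolerance_da

# Segments that a mass spectrum cannot easily tell apart.
for a, b in [("GG", "N"), ("AG", "Q"), ("L", "I"), ("Q", "K")]:
    print(a, b, is_isobaric(a, b), is_isobaric(a, b, tolerance_da=0.05))
```

With the tighter tolerance Q and K are distinguishable, while with the looser one they are not, which illustrates why the rate of such substitutions depends on instrument mass accuracy and why a matching procedure that tolerates equal-mass replacements is useful.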

13.
M Ikeuchi  K Takio  Y Inoue 《FEBS letters》1989,242(2):263-269
High resolution gel electrophoresis in the low-molecular-mass region combined with electroblotting using polyvinylidene difluoride membranes enabled us to sequence the low-molecular-mass proteins of photosystem II membrane fragments from spinach and wheat. The determined N-terminal sequences, all showing considerable homology between the two plants, involved two newly determined sequences for the 4.1 kDa protein and one for the 5 kDa proteins. The sequence of the 4.1 kDa protein did not match any part of the chloroplast DNA sequence from tobacco or liverwort, suggesting that it is encoded by the nuclear genome. In contrast, the sequence of the 5 kDa protein matched ORF38, which is located just downstream of psbE and psbF in the chloroplast DNA and is assumed to be co-transcribed with them. These two components were associated with the O2-evolving core complex. Sequences of other low-molecular-mass proteins confirmed the previous identification as photosystem II components.

14.
A new method has been developed to predict the enzymatic attribute of proteins by hybridizing the gene product composition and pseudo amino acid composition. As a demonstration, a working dataset was generated with a cutoff of 60% sequence identity to avoid redundancy and bias in statistical prediction. The dataset thus constructed contains 39989 protein sequences, of which 27469 are non-enzymes and 12520 enzymes that were further classified into 6 enzyme family classes according to their 6 main EC (Enzyme Commission) numbers (2314 are oxidoreductases, 3653 transferases, 3246 hydrolases, 1307 lyases, 676 isomerases, and 1324 ligases). The overall success rate by the jackknife test for the identification between enzyme and non-enzyme was 94%, and that for the identification among the 6 enzyme family classes was 98%. It is anticipated that, with the rapid increase of protein sequences entering into databanks, the current method will become a useful automated tool in identifying the enzymatic attribute of a newly found protein sequence.

15.
A web application for the identification and visualization of protein regions encoded by exons is presented. The Exon Visualiser can be used for visualisation at different levels of protein structure: the primary (sequence) level, the secondary structure level, and the tertiary structure level. The programme is suitable for processing data for all genes whose protein products have entries in the PDB database. The application implements the following steps: I) loading exon sequences and their coordinates from a GenBank file, together with the protein sequences (the CDS from GenBank and the amino acid sequence from PDB); II) consensus sequence creation (comparing the amino acid sequence from the PDB file with the CDS sequence from the GenBank file); III) matching exon coordinates; IV) visualisation in 2D and 3D protein structures. Among other features, the web tool provides a colour-coded graphical display of protein sequences and chains in three-dimensional protein structures, correlated with the corresponding exons.

Availability

http://149.156.12.53/ExonVisualiser/

16.
We describe two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases. FASTS searches with peptide sequences of unknown order, as obtained by mass spectrometry-based sequencing, evaluating all possible arrangements of the peptides. FASTF searches with mixed peptide sequences, as generated by Edman sequencing of unseparated mixtures of peptides. FASTF deconvolutes the mixture, using a greedy heuristic that allows rapid identification of high scoring alignments while reducing the total number of explored alternatives. Both algorithms use the heuristic FASTA comparison strategy to accelerate the search but use alignment probability, rather than similarity score, as the criterion for alignment optimality. Statistical estimates are calculated using an empirical correction to a theoretical probability. These calculated estimates were accurate within a factor of 10 for FASTS and 1000 for FASTF on our test dataset. FASTS requires only 15-20 total residues in three or four peptides to robustly identify homologues sharing 50% or greater protein sequence identity. FASTF requires about 25% more sequence data than FASTS for equivalent sensitivity, but additional sequence data are usually available from mixed Edman experiments. Thus, both algorithms can identify homologues that diverged 100 to 500 million years ago, allowing proteomic identification from organisms whose genomes have not been sequenced.

17.
To understand the molecular basis of glycosyltransferases' (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.

18.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.

19.
Delahunty CM  Yates JR 《BioTechniques》2007,43(5):563, 565, 567 passim
Large-scale biology emerged out of the efforts to sequence genomes of important organisms. Based on resources created by whole genome sequencing, large-scale analyses of messenger RNA (mRNA) and protein expression are now possible. With the availability of large amounts of genomic sequence information, a convenient method for the identification and analysis of proteins based on proteolytic digestion into peptides emerged. Processes to fragment peptides using collision-activated dissociation (CAD) in tandem mass spectrometers and computer algorithms to match the tandem mass spectra of peptides to sequences in databases enable rapid identification of amino acid sequences, and hence proteins, present in mixtures. The inherent complexity of the peptide mixtures has necessitated improvements in methodology for mass spectrometry (MS) analysis of peptides.

20.
Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.
