Similar literature
Found 20 similar articles; search took 31 ms
1.
Babnigg G, Giometti CS. Proteomics, 2006, 6(16): 4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
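The idea of a sequence-derived identifier can be sketched in a few lines. The abstract does not spell out how a SEGUID is computed; the sketch below assumes the commonly described scheme (SHA-1 digest of the uppercase sequence, base64 encoded, trailing padding removed). The accession strings and sequences are hypothetical and serve only to show how records from different databases collapse onto one key.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a SEGUID-style identifier for a protein sequence.

    Assumed scheme: SHA-1 of the uppercase sequence, base64 encoded,
    with the trailing '=' padding stripped.
    """
    normalized = "".join(sequence.split()).upper()  # drop whitespace, ignore case
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical records: two databases holding the same sequence under
# different accessions, plus one record that differs in the last residue.
records = {
    "dbA|P001": "MKTAYIAKQR",
    "dbB|X9": "mktayiakqr",
    "dbB|X10": "MKTAYIAKQL",
}

# Group accessions by sequence-derived identifier.
by_seguid = {}
for accession, seq in records.items():
    by_seguid.setdefault(seguid(seq), []).append(accession)

for key, accessions in by_seguid.items():
    print(key, accessions)
```

Because the key depends only on the sequence, values computed from the sequence (such as pI or Mr) could be cached against it and reused when annotations or accessions change.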

2.
Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes.
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.

3.
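Construction and application of a web interface for the sequence homology analysis software BLAST   Total citations: 5 (self-citations: 1, citations by others: 4)
A web interface for the sequence homology analysis software BLAST was built on a PC/Linux server within a local area network (intranet). Every computer on the network can reach the server through the web to query both public databases and locally built databases. The setup is confidential, efficient and free of charge, and can meet the large-scale, rapid data analysis needs of laboratories and research institutes.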

4.
Identification of modified amino acids can be a challenging part of Edman degradation sequence analysis, largely because they are not included among the commonly used phenylthiohydantoin amino acid standards. Yet many can have unique retention times and can be assigned by an experienced researcher or through the use of a guide showing their typical chromatography characteristics. The Edman Sequencing Research Group (ESRG) 2005 study is a continuation of the 2004 study, in which the participating laboratories were provided a synthetic peptide and asked to identify the modified amino acids present in the sequence. The study sample provided an opportunity to sequence a peptide containing a variety of modified amino acids and note their retention times relative to the common amino acids. It also allowed the ESRG to compile the chromatographic properties and intensities from multiple instruments and tabulate an average elution position for these modified amino acids on commonly used instruments. Participating laboratories were given 2000 pmoles of a synthetic peptide, 18 amino acids long, containing the following modified amino acids: dimethyl- and trimethyl-lysine, 3-methyl-histidine, N-carbamyl-lysine, cystine, N-methyl-alanine, and isoaspartic acid. The modified amino acids were interspersed with standard amino acids to help in the assessment of initial and repetitive yields. In addition to filling in an assignment sheet, which included retention times and peak areas, participants were asked to provide specific details about the parameters used for the sequencing run. References for some of the modified amino acid elution characteristics were provided and the participants had the option of viewing a list of the modified amino acids present in the peptide at the ESRG Web site. The ABRF ESRG 2005 sample is the seventeenth in a series of studies designed to aid laboratories in evaluating their abilities to obtain and interpret amino acid sequence data.

5.
Complete sequence determination of gene 18 encoding the tail sheath protein was carried out mainly by the Maxam-Gilbert method. Approximately 40 peptides contained in a tryptic digest and a lysyl endopeptidase digest of gp 18 were isolated by reversed-phase high-performance liquid chromatography. All the peptides were identified along the nucleotide sequence of gene 18 based on the amino acid compositions. These peptides cover 88% of the total primary structure. Furthermore, the amino acid sequences of 9 of the 40 peptides were determined by a gas-phase protein sequencer; one of them turned out to be the N-terminal one. The C-terminal peptide in the tryptic digest was isolated from the unadsorbed fraction of affinity chromatography on immobilized anhydrotrypsin and the amino acid sequence was also determined. Thus, the complete primary structure of gp 18 was determined; it has 658 amino acid residues and a molecular weight of 71,160. This article was presented during the proceedings of the International Conference on Macromolecular Structure and Function, held at the National Defence Medical College, Tokorozawa, Japan, December 1985.

6.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. As thousands of spectra are generated by a mass spectrometer in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by some unknown or non-specific enzyme, or even when the target peptides have post-translational modifications (PTM). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides of non-specific digestions by unknown enzymes or amino acid modifications. In another case, scientists may choose to use some non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic study, in that not all proteins are amenable to digestion by site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weight suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds costs database search engines a great deal of computational time. OVERVIEW: The present study was designed to speed up the database searching process for both cases. More specifically, we employed an approach combining a suffix tree data structure and a spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold with some statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. Then we rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
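To make the cost of non-specific-enzyme searching concrete, here is a minimal sketch of the naive baseline that such methods aim to beat. It is not the suffix tree and spectrum graph algorithm of the abstract: it simply slides a window over one protein and reports every substring whose mass matches a precursor mass, since with no cleavage rule any substring is a possible peptide. The function names, the example protein and the tolerance are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added to the summed residue masses


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidates(protein: str, target_mass: float, tol: float = 0.02):
    """Return (start, substring) pairs whose mass is within tol of target_mass.

    Two-pointer scan: residue masses are positive, so the window start
    only ever moves forward as the window end advances.
    """
    hits = []
    start = 0
    window = 0.0  # summed residue mass of protein[start:end + 1]
    for end, aa in enumerate(protein):
        window += RESIDUE_MASS[aa]
        # Shrink from the left while the window is too heavy.
        while start <= end and window + WATER > target_mass + tol:
            window -= RESIDUE_MASS[protein[start]]
            start += 1
        if start <= end and abs(window + WATER - target_mass) <= tol:
            hits.append((start, protein[start:end + 1]))
    return hits


protein = "MKTAYIAKQRGASP"  # hypothetical sequence
precursor = peptide_mass("AYIAK")
print(candidates(protein, precursor))
```

Even this linear scan has to be repeated for every protein and every spectrum, and allowing modifications multiplies the candidate masses, which is the motivation the abstract gives for preprocessing the database into an index.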

7.
The alpha- and beta-subunits of the GTP-binding protein (transducin) from cattle retina were cleaved with cyanogen bromide. Twenty-one peptides covering 90-100% of the amino acid sequence of the alpha- and beta-subunits were isolated from the hydrolyzate. The complete or partial amino acid sequences of the cyanogen bromide peptides were determined, and the results were compared with the primary structures of the transducin alpha-subunit that Numa and coworkers [1] and Lochrie et al. [2] deduced from the nucleotide sequence of the cDNA. The structure reported by Lochrie et al. is shown to differ considerably from the true structure of the alpha-subunit; the investigators probably isolated a cDNA corresponding to the gene for some GTP-binding protein homologous to transducin rather than to the gene for the transducin alpha-subunit. Numa's structure also contains an error. The final primary structure of the transducin alpha-subunit is given. The protein polypeptide chain consists of 349 amino acid residues and has an acetylmethionine residue as the N-terminal residue.

8.
The Edman Sequencing Research Group (ESRG) designs studies on the use of Edman degradation for protein and peptide analysis. These studies provide a means for participating laboratories to compare their analyses against a benchmark of those from other laboratories that provide this valuable service. The main purpose of the 2006 study was to determine how accurate Edman sequencing is for quantitative analysis of polypeptides. Secondarily, participants were asked to identify a modified amino acid residue, N-epsilon-acetyl lysine [Lys(Ac)], present within one of the peptides. The ESRG 2006 peptide mixture consisted of three synthetic peptides. The Peptide Standards Research Group (PSRG) provided two peptides, with the following sequences: KAQYARSVLLEKDAEPDILELATGYR (peptide B), and RQAKVLLYSGR (peptide C). The third peptide, peptide C*, synthesized and characterized by ESRG, was identical to peptide C but with acetyl lysine in position 4. The mixture consisted of 20% peptide B and 40% each of peptide C and its acetylated form, peptide C*. Participating laboratories were provided with two tubes, each containing 100 picomoles of the peptide mixture (as determined by quantitative amino acid analysis) and were asked to provide amino acid assignments, peak areas, retention times at each cycle, as well as initial and repetitive yield estimates for each peptide in the mixture. Details about instruments and parameters used in the analysis were also collected. Participants in the study with access to a mass spectrometer (MALDI-TOF or ESI) were asked to provide information about the relative peak areas of the peptides in the mixture as a comparison with the peptide quantitation results from Edman sequencing. Positive amino acid assignments were 88% correct for peptide C and 93% correct for peptide B. The absolute initial sequencing yields were an average of 67% for peptides (C+C*) and 65.6% for peptide B.
The relative molar ratios determined by Edman sequencing were an average of 4.27 (expected ratio of 4) for peptides (C+C*)/B, and 1.49 for peptide C*/C (expected ratio of 1); the seemingly high 49% error in quantification of Lys(Ac) in peptide C* can be attributed to commercial unavailability of its PTH standard. These values compare very favorably with the values obtained by mass spectrometry.
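The expected ratios and the quoted 49% error follow directly from the mixture composition. The short calculation below is a worked check using only the figures in the abstract; the variable and function names are illustrative.

```python
# Mixture composition stated in the abstract (mole fractions).
fractions = {"B": 0.20, "C": 0.40, "C*": 0.40}

# Expected molar ratios implied by that composition.
expected_cc_over_b = (fractions["C"] + fractions["C*"]) / fractions["B"]
expected_cstar_over_c = fractions["C*"] / fractions["C"]

# Average ratios reported from Edman sequencing.
observed_cc_over_b = 4.27
observed_cstar_over_c = 1.49


def percent_error(observed: float, expected: float) -> float:
    """Signed relative error of an observed ratio, in percent."""
    return 100.0 * (observed - expected) / expected


print(f"(C+C*)/B: expected {expected_cc_over_b:.2f}, "
      f"error {percent_error(observed_cc_over_b, expected_cc_over_b):.2f}%")
print(f"C*/C:     expected {expected_cstar_over_c:.2f}, "
      f"error {percent_error(observed_cstar_over_c, expected_cstar_over_c):.2f}%")
```

The combined (C+C*)/B ratio is off by under 7%, while the C*/C ratio is off by 49%, which is consistent with the abstract's explanation that the error comes from quantifying the Lys(Ac) cycle without a PTH standard rather than from the method as a whole.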

9.
Amino Acid Sequence of Porcine Myelin Basic Protein   Total citations: 6 (self-citations: 6, citations by others: 0)
The myelin basic protein (BP) of pig brain was cleaved into its constituent tryptic peptides and the amino acid composition of each was determined. Those tryptic peptides that had not been sequenced previously were cleaved with dipeptidyl peptidases and the resulting dipeptides were trimethylsilylated, separated by gas chromatography, and identified by mass spectrometry. Carboxypeptidases B and Y were used to establish the COOH-terminal sequences of some of the tryptic peptides; one tryptic peptide (sequence 76-92) was cleaved with thermolysin and the thermolytic peptides were analyzed. From the results of the present study together with those reported previously, it has been possible to determine the complete amino acid sequence of the protein. The protein consists of 172 residues and has a theoretical molecular weight of 18,604. Its amino acid sequence is identical with that reported for the homologous bovine protein with the following exceptions: Ser replaces (bovine) Ala2; His-Gly is inserted between Arg9 and Ser10; Ala replaces Ser45; His and Gly replace Gly76 and His77, respectively; Pro replaces Ser131 and Ser135; Ala is inserted between Gly142 and His143; and Gln replaces His143.

10.
Abstract: We previously identified a synaptic membrane-associated protein, PP59, that serves as a substrate for cyclic AMP-dependent protein kinase and is enriched in rat cerebellum. We show here that PP59 can be extracted from synaptic plasma membranes with a combination of 2% Triton X-100 plus 1 M KCl. A 290-fold purification of PP59 was achieved by selective solubilization, followed by continuous-elution preparative gel electrophoresis. To determine the amino acid sequence surrounding the cyclic AMP-dependent protein kinase phosphorylation site within PP59, the partially purified 32P-phosphorylated protein was digested with chymotrypsin, and radiolabeled peptides were purified by sequential reversed-phase HPLC in two different solvent systems. Automated Edman degradation revealed a single phosphorylation site contained within the sequence Ala-Arg-Glu-Arg-Ser-Asp-Ser(P)-Thr-Gly-Ser-Ser-Ser-Val-Tyr. No strong sequence homology could be found between this peptide fragment and other known peptides or proteins in the SwissProt, PIR, or GenPept databases. A synthetic peptide containing this unique 14-amino acid sequence was used to develop polyclonal anti-peptide antibodies that were affinity-purified and shown to recognize intact PP59 as determined by western blotting. These antibodies specifically inhibited the phosphorylation of PP59 by cyclic AMP-dependent protein kinase in an in vitro phosphorylation assay containing synaptic plasma membranes.

11.
Multigenicity is one of the features of cancer/testis-associated genes. In the present study we analyzed the number and expression of genes of the SPANX(CTp11) family of cancer/testis-associated genes. Genomic database analysis revealed, in addition to the four previously described SPANX genes, the presence of a novel gene: SPANXE. Moreover, we detected an allelic variant of SPANXB resulting in one amino acid substitution in the encoded protein: SPANXB'. Most SPANX genes are present on contig NT_011574 located at Xq26.3-Xq27.1. Based on expressed sequence tag databases and RT-PCR analysis, three additional novel SPANX sequences were identified, though they are not so far represented in the human genome sequence. Sequence alignments justify a subdivision of this gene family based on the absence (SPANXA-likes) or presence (SPANXB) of an 18 base pair sequence stretch in the open reading frame. The alignments also reveal an unusually high level (99%) of intron homology. Furthermore, the nucleotide variations in the open reading frame almost all lead to amino acid substitutions. Southern blot and database analyses indicate that SPANX sequences are exclusively present in primates. With RT-PCR analysis on human sperm cell precursors and tumor cell lines most family members could be detected. SPANXB was only found in sperm cell precursors and could not be detected in the tumor cell lines tested. Overall SPANXA was the most frequently expressed SPANX variant in melanoma and glioblastoma cell lines.

12.
Edman degradation sequencing relies on comparing high-performance liquid chromatography retention times of the sample phenylthiohydantoin amino acids with phenylthiohydantoin amino acid standards. The elution characteristics of the twenty common amino acids have been well characterized, which aids in making confident assignments. Modified amino acids may present more of a challenge since they are not part of the commonly used standards and because the protein sequencer analyst may not have experience with them. Laboratories requesting a sample were sent a tube containing approximately 775 pmoles of a 20-amino-acid synthetic peptide composed of several modified amino acids that may be found in proteins or are generated during sample preparation. In addition to filling in an assignment sheet, which included retention times and peak areas, participants were asked to provide specific details about the parameters used for the sequencing run. References for some of the modified amino acid elution characteristics were provided and the participants had the option of viewing a list of the modified amino acids present in the peptide at the Edman Sequencing Research Group (ESRG) website. The study had two goals: assessment of the ability to correctly assign all the amino acids in the peptide, including the modified amino acids; and the collection and compilation of elution time characteristics of modified amino acids for instruments used in the study. The resulting compilation of the modified amino acid elution times and running conditions will be accessible at the Association of Biomolecular Resource Facilities (ABRF) ESRG website for future reference. The ABRF ESRG 2004 sample is the 16th in a series of studies designed to aid laboratories in evaluating their abilities to obtain and interpret amino acid sequence data.

13.
A procedure is presented for the automatic determination of the amino acid sequence of peptides by processing data obtained from mass spectrometry analysis. This is a basic and relevant problem in the field of proteomics. Furthermore, it has an even higher conceptual and applicative interest in peptide research, as well as in other connected fields. The analysis does not rely on known protein databases, but on the computation of all amino acid sequences compatible with the given spectral data. By formulating a mathematical model for such combinatorial problems, the structural limitations of known methods are overcome, and efficient solution algorithms can be developed. The results are very encouraging from both the accuracy and computational points of view.
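The combinatorial nature of database-free sequencing can be illustrated with a toy enumeration. The sketch below is not the mathematical model of the abstract, which works from the full spectral data; it uses only a single total peptide mass and lists every sequence compatible with it. The function name and tolerance are assumptions, and leucine and isoleucine are represented by one letter because their masses are identical.

```python
# Monoisotopic residue masses in daltons. "L" stands for both Leu and Ile,
# which cannot be told apart by mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def sequences_for_mass(target_mass: float, tol: float = 0.01):
    """Enumerate all residue sequences whose peptide mass is within tol.

    Depth-first search with pruning: a branch is abandoned as soon as even
    the lightest residue would overshoot the target.
    """
    residue_target = target_mass - WATER
    lightest = min(RESIDUE_MASS.values())
    results = []

    def extend(prefix: str, mass: float) -> None:
        if prefix and abs(mass - residue_target) <= tol:
            results.append(prefix)
        if mass + lightest > residue_target + tol:
            return  # nothing more can be appended without overshooting
        for aa, m in RESIDUE_MASS.items():
            if mass + m <= residue_target + tol:
                extend(prefix + aa, mass + m)

    extend("", 0.0)
    return results


# Mass of the dipeptide Gly-Ala: 57.02146 + 71.03711 + 18.01056
print(sorted(sequences_for_mass(146.06913)))
```

Even this tiny case is ambiguous: Gly-Ala, Ala-Gly and a single Gln all fit the same mass. The number of candidates grows very quickly with mass, which is why fragment-ion data and an efficient formulation are needed for peptides of realistic length.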

14.
ABRF-PRG04: differentiation of protein isoforms.   Total citations: 1 (self-citations: 1, citations by others: 0)
Accurate protein identification sometimes requires careful discrimination between closely related protein isoforms that may differ by as little as a single amino acid substitution or post-translational modification. The ABRF Proteomics Research Group sent a mixture of three picomoles each of three closely related proteins, in the form of intact proteins, to laboratories that requested it, and participating laboratories were asked to identify the proteins and report their results. The primary goal of the ABRF-PRG04 Study was to give participating laboratories a chance to evaluate their capabilities and practices with regard to sample fractionation (1D- or 2D-PAGE, HPLC, or none), protein digestion methods (in-solution, in-gel, enzyme choice), and approaches to protein identification (instrumentation, use of software, and/or manual techniques to facilitate interpretation), as well as determination of amino acid or post-translational modifications. Of the 42 laboratories that responded, 8 (19%) correctly identified all three isoforms and N-terminal acetylation of each, 16 (38%) labs correctly identified two isoforms, 9 (21%) correctly identified two isoforms but also made at least one incorrect identification, and 9 (21%) made no correct protein identifications. All but one lab used mass spectrometry, and data submitted enabled a comparison of strategies and methods used.

15.
16.
Precise annotation of genes or open reading frames is still a difficult task that results in divergence even for data generated from the same genomic sequence. This has an impact on further proteomic studies, and also compromises the characterization of clinical isolates with many specific genetic variations that may not be represented in the selected database. We recently developed software called multistrain mass spectrometry prokaryotic database builder (MSMSpdbb) that can merge protein databases from several sources and be applied to any prokaryotic organism, in a proteomic-friendly approach. We generated a database for the Mycobacterium tuberculosis complex (using three strains of Mycobacterium bovis and five of M. tuberculosis), and analyzed data collected from two laboratory strains and two clinical isolates of M. tuberculosis. We identified 2561 proteins, of which 24 were present in M. tuberculosis H37Rv samples, but not annotated in the M. tuberculosis H37Rv genome. We were also able to identify 280 nonsynonymous single amino acid polymorphisms and confirm 367 translational start sites. As a proof of concept we applied the database to whole-genome DNA sequencing data of one of the clinical isolates, which allowed the validation of 116 predicted single amino acid polymorphisms and the annotation of 131 N-terminal start sites. Moreover, we identified regions not present in the original M. tuberculosis H37Rv sequence, indicating strain divergence or errors in the reference sequence. In conclusion, we demonstrated the potential of using a merged database to better characterize laboratory or clinical bacterial strains.

17.
The nucleotide sequence of gene 18 of bacteriophage T4 was determined by the Maxam-Gilbert method, partially aided by the dideoxy method. To confirm the deduced amino acid sequence of the tail sheath protein (gp18) that is encoded by gene 18, gp18 was extensively digested by trypsin or lysyl endopeptidase and subjected to reverse-phase high-performance liquid chromatography. Approximately 40 peptides, which cover 88% of the primary structure, were fractionated, the amino acid compositions were determined, and the corresponding sequences in DNA were identified. Furthermore, the amino acid sequences of 10 of the 40 peptides were determined by a gas phase protein sequencer, including N- and C-terminal sequences. Thus, the complete amino acid sequence of gp18, which consists of 658 amino acids with a molecular weight of 71,160, was determined.

18.
In a proteomics experiment, reduction and alkylation of proteins prior to enzymatic digestion ensures high sequence coverage of the protein during a database search. However, the alkylation procedure uses an excess of an alkylating agent such as iodoacetamide (IAA). As a result, other amino acids are also alkylated, yet these modified peptides are not identified in a database search. Here we show that a large proportion of peptides are mono- and di-alkylated by IAA and therefore not identified via a database search. The first alkylation consistently takes place at the N-terminal amino acid. Therefore, we propose that the database search conducted during a proteomics experiment should offer the option of searching for any peptide alkylated at the N-terminal amino acid.

19.
Expressed sequence tags (ESTs) from the marine red alga Gracilaria gracilis   Total citations: 2 (self-citations: 0, citations by others: 2)
Expressed sequence tags (ESTs) are partial sequences of cDNAs, and can be used to characterize gene expression in organisms or tissues. We have constructed a 200-sequence EST database from vegetative thalli of Gracilaria gracilis, the first ESTs reported from any alga. This database contains recognizable ESTs corresponding to genes of carbohydrate metabolism (seven), amino acid metabolism (three), photosynthesis (five), nucleic acid synthesis, repair and processing (three), protein synthesis (14), protein degradation (six), cellular maintenance and stress response (three), other identifiable protein-coding genes (13) and 146 sequences for which significant matches were not found in existing sequence databases. We have already used this EST database to recover genes of carbohydrate biosynthesis from G. gracilis. This revised version was published online in August 2006 with corrections to the Cover Date.

20.
López JL, Marina A, Alvarez G, Vázquez J. Proteomics, 2002, 2(12): 1658-1665
In this work, a novel approach based on proteomics is applied for the analysis of the three European marine mussel species: Mytilus edulis (ME), Mytilus galloprovincialis (MG) and Mytilus trossulus (MT), which are of interest in biotechnology and the food industry. The proteomes of these species are poorly described in databases, and the species are difficult to diagnose and have a controversial taxonomy. To characterise species-specific peptides, we compared 51 matrix-assisted laser desorption/ionization-time of flight peptide mass maps generated from 6 randomly selected prominent spots derived from the two-dimensional electrophoresis analysis of foot protein extracts from several individuals. Minor species-specific differences in the peptide maps were detected in only one of the spots, corresponding to tropomyosin. Two peptides were unique to ME and MG individuals, whereas another peptide was present only in MT individuals. The sequence of these peptides was characterised by nanoelectrospray ionization-ion trap (nanoESI-IT) tandem mass spectrometry (MS/MS) analysis followed by database searching and de novo sequence interpretation. We detected a single T to D amino acid substitution in MT tropomyosin. Unambiguous and highly specific species identification was then demonstrated by analysing peptide extracts from tropomyosin spots by micro high-performance liquid chromatography (microHPLC) ESI-IT mass spectrometry using the selected ion monitoring configuration, focused on these peptides, in continuous MS/MS operation. Our results suggest that proteomics may be successfully applied for the identification of species whose proteome is not present in databases.
