
1.

Mass accuracy and sequence requirements for protein database searching.

M K Green, M V Johnston, B S Larsen. Analytical biochemistry, 1999, 275(1):39-46

To elucidate the role of high mass accuracy in mass spectrometric peptide mapping and database searching, selected proteins were subjected to tryptic digestion and the resulting mixtures were analyzed by electrospray ionization on a 7 Tesla Fourier transform mass spectrometer with a mass accuracy of 1 ppm. Two extreme cases were examined in detail: equine apomyoglobin, which digested easily and gave very few spurious masses, and bovine alpha-lactalbumin, which, under the conditions used, gave many spurious masses. The effectiveness of accurate mass measurements in minimizing false protein matches was examined by varying the mass error allowed in the search over a wide range (2-500 ppm). For the "clean" data obtained from apomyoglobin, very few masses were needed to return valid protein matches, and the mass error allowed in the search had little effect up to 500 ppm. However, in the case of alpha-lactalbumin more mass values were needed, and low mass errors increased the search specificity. Mass errors below 30 ppm were particularly useful in eliminating false protein matches when few mass values were used in the search. Collision-induced dissociation of an unassigned peak in the alpha-lactalbumin digest provided sufficient data to unambiguously identify the peak as a fragment from alpha-lactalbumin and eliminate a large number of spurious proteins found in the peptide mass search. The results show that even with a relatively high mass error (0.8 Da for mass differences between singly charged product ions), collision-induced dissociation can help identify proteins in cases where unfavorable digest conditions or modifications render digest peaks unidentifiable by a simple mass mapping search.
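The effect of the allowed mass error described in this abstract can be illustrated with a minimal sketch. It assumes a simple matching rule (an observed mass counts as matched if it lies within a given ppm tolerance of any theoretical peptide mass). The function names and the mass values are hypothetical and are not taken from the paper.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error of an observed mass, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed_masses, theoretical_masses, tolerance_ppm):
    """Count observed masses lying within tolerance_ppm of any theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tolerance_ppm for theo in theoretical_masses)
        for obs in observed_masses
    )


# Hypothetical peptide masses in Da, invented purely for illustration.
theoretical = [1000.0, 1500.0, 2000.0]
observed = [1000.001, 1500.03, 2000.5]  # errors of about 1, 20 and 250 ppm

for tolerance in (2, 30, 500):
    print(tolerance, count_matches(observed, theoretical, tolerance))
```

Widening the tolerance admits more matches, which is why a loose tolerance tends to let spurious masses match unrelated proteins when the digest is not clean.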

2.

Increased coverage of protein families with the blocks database servers (total citations: 34, self-citations: 0, citations by others: 34)

Henikoff JG, Greene EA, Pietrokovski S, Henikoff S. Nucleic acids research, 2000, 28(1):228-230

The Blocks Database WWW (http://blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments, which represent conserved protein regions. Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS. Other new features include improved Block Searcher statistics, searching with NCBI's IMPALA program and 3D display of blocks on PDB structures.

3.

Automated assembly of protein blocks for database searching. (total citations: 45, self-citations: 7, citations by others: 45)

S Henikoff, J G Henikoff. Nucleic acids research, 1991, 19(23):6565-6572

A system is described for finding and assembling the most highly conserved regions of related proteins for database searching. First, an automated version of Smith's algorithm for finding motifs is used for sensitive detection of multiple local alignments. Next, the local alignments are converted to blocks and the best set of non-overlapping blocks is determined. When the automated system was applied successively to all 437 groups of related proteins in the PROSITE catalog, 1764 blocks resulted; these could be used for very sensitive searches of sequence databases. Each block was calibrated by searching the SWISS-PROT database to obtain a measure of the chance distribution of matches, and the calibrated blocks were concatenated into a database that could itself be searched. Examples are provided in which distant relationships are detected either using a set of blocks to search a sequence database or using sequences to search the database of blocks. The practical use of the blocks database is demonstrated by detecting previously unknown relationships between oxidoreductases and by evaluating a proposed relationship between HIV Vif protein and thiol proteases.

4.

Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics

Xinning Jiang, Xiaogang Jiang, Guanghui Han, Mingliang Ye, Hanfa Zou. BMC bioinformatics, 2007, 8(1):323

Background

In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. The peptide assignments made by the SEQUEST search algorithm are characterized by several scores, including Xcorr, ΔCn, Sp, Rsp and matched ion count. Filtering criteria that combine several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not yet been well optimized.
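A score-based filter of the kind this abstract refers to might look like the sketch below. It is only an illustration: the `PSM` record, the charge-dependent Xcorr cutoffs and the ΔCn cutoff are assumptions chosen for the example, not values or methods reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match with two of the SEQUEST scores named above."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


# Illustrative cutoffs only (hypothetical, not taken from the paper).
MIN_XCORR_BY_CHARGE = {1: 1.9, 2: 2.2, 3: 3.75}
MIN_DELTA_CN = 0.1


def passes_filter(psm: PSM) -> bool:
    """Accept a match only if both Xcorr and delta Cn reach their cutoffs."""
    min_xcorr = MIN_XCORR_BY_CHARGE.get(psm.charge)
    if min_xcorr is None:
        # Charge states without a configured cutoff are rejected outright.
        return False
    return psm.xcorr >= min_xcorr and psm.delta_cn >= MIN_DELTA_CN


# Hypothetical matches, invented for illustration.
matches = [
    PSM("PEPTIDEK", 2, 2.5, 0.15),
    PSM("PEPTIDEK", 2, 2.5, 0.05),
    PSM("SAMPLER", 3, 3.0, 0.20),
]
accepted = [m for m in matches if passes_filter(m)]
print([m.peptide for m in accepted])
```

Fixed cutoffs like these trade sensitivity against the number of random assignments that slip through, which is the balance that an optimized filtering criterion tries to improve.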

5.

HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment (total citations: 2, self-citations: 0, citations by others: 2)

Remmert M, Biegert A, Hauser A, Söding J. Nature methods, 2012, 9(2):173-175

Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM-based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50-100% higher sensitivity and generates more accurate alignments.

6.

Searching the protein sequence database (total citations: 1, self-citations: 0, citations by others: 1)

Bruce C. Orcutt, Winona C. Barker. Bulletin of mathematical biology, 1984, 46(4):545-552

As the volume of protein sequence data grows, rapid methods for searching the protein sequence database become of primary importance. Rigorous comparison of sequences is obtained with the well-known dynamic programming algorithms. However, these algorithms are not rapid enough to use for routinely searching the entire database. In this paper we discuss some methods that can be used for rapid searches.
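The dynamic programming comparison mentioned in this abstract can be sketched as a global alignment score in the style of Needleman and Wunsch. The scoring values (match +1, mismatch -1, gap -1) are assumptions made for this illustration; the abstract does not specify a scoring scheme.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Every cell of the table is filled once, so the cost grows with the product of the two sequence lengths. Repeating that for each database entry is what makes the rigorous comparison too slow for routine whole-database searches.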

7.

Automated protein sequence pattern handling and PROSITE searching

Sibbald Peter R.; Sommerfeldt Hubert; Argos Patrick. Bioinformatics (Oxford, England), 1991, 7(4):535-536

The protein sequence searching program Scrutineer has been modified to search for targets from a file. We are distributing a reformatted file of PROSITES which can be read by Scrutineer. In addition, Scrutineer still accepts targets typed in interactively but can now write them out in the format required as input. Since the input format is the same as the output format, target management and re-use is simple.

8.

EMBOPRO--an automatically generated protein sequence database

Stulich R.; Rohde K. Bioinformatics (Oxford, England), 1989, 5(1):15-18

For the identification of newly sequenced proteins it is necessary to have a large stock of known proteins for comparison. In this paper we present an automatically generated protein sequence database. The translation program introduced allows a periodical translation of every new release of the EMBL database. Possible errors of the translation are discussed as well as the reliability of the nucleotide sequence data, which turns out to be quite good. A comparison of our translated database with some established ones is given. Received on December 15, 1987; accepted on April 19, 1988
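The core step of such a translated database, turning a coding nucleotide sequence into a protein sequence, can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and a sequence already in the correct reading frame; it is not the EMBOPRO translation program, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    sequence = coding_sequence.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(sequence) - 2, 3):
        residue = CODON_TABLE.get(sequence[start:start + 3], "X")  # X = unknown
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

A single wrong or missing base in the nucleotide entry shifts or changes every downstream codon, which is why the reliability of the underlying nucleotide data matters for a database built this way.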

9.

Strategies for searching sequence databases

Nicholas HB, Deerfield DW, Ropelewski AJ. BioTechniques, 2000, 28(6):1174-8, 1180, 1182 passim

We provide a detailed overview of the choices inherent in performing a sequence database search, including the choice of algorithm, substitution matrix and gap model. Each of these choices has implications that can be described as restrictions on the underlying model of sequence evolution, the expected degree of divergence between the query sequence and the database sequences (if one uses an evolutionary based matrix), as well as the sensitivity and selectivity of the search. We conclude with a series of recommendations for researchers performing these searches based on our experience and literature studies.

10.

A large database DNA sequence handling program with generalized searching specifications. (total citations: 1, self-citations: 3, citations by others: 1)

P A Stockwell. Nucleic acids research, 1982, 10(1):115-125

The program described allows for the creation and manipulation of files of DNA sequence data up to very great lengths. The program uses its own paging system to load segments of the sequence into a small internal buffer so that the program does not have excessive memory requirements. The program offers a menu of functions to the user, and has been written to be forgiving of user errors. A code for the generalised specification of bases as a series of groups (i.e. A or T, Purine, etc.) has been devised and can be used in search specifications or in sequence files. Versions of the program have been developed to run with special efficiency under DIGITAL's RT11 operating system or to run under systems with a suitable implementation of FORTRAN VI.
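Searching with a generalised base specification can be sketched as below. The paper devised its own code for base groups, which the abstract does not reproduce, so this example substitutes a subset of the IUPAC ambiguity codes (for example W for A or T, R for purine) as a stand-in. The function name is hypothetical.

```python
# Subset of the IUPAC nucleotide ambiguity codes, used here as a stand-in for
# the paper's own group code: each symbol names the set of bases it accepts.
BASE_GROUPS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",  # purine
    "Y": "CT",  # pyrimidine
    "W": "AT",  # A or T
    "S": "CG",  # C or G
    "N": "ACGT",  # any base
}


def find_sites(sequence: str, specification: str) -> list:
    """Return 0-based start positions where the specification matches."""
    sequence = sequence.upper()
    groups = [BASE_GROUPS[symbol] for symbol in specification.upper()]
    width = len(groups)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if all(sequence[start + offset] in groups[offset] for offset in range(width))
    ]


print(find_sites("GATATCGAATTC", "GAWTTC"))
```

The same lookup works whether the group symbols appear in the search specification or, with the roles reversed, in the stored sequence, which matches the abstract's remark that the code can be used in both places.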

11.

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

Shen Y, Bax A. Journal of biomolecular NMR, 2007, 38(4):289-302

Chemical shifts of nuclei in or attached to a protein backbone are exquisitely sensitive to their local environment. A computer program, SPARTA, is described that uses this correlation with local structure to predict protein backbone chemical shifts, given an input three-dimensional structure, by searching a newly generated database for triplets of adjacent residues that provide the best match in phi/psi/chi(1) torsion angles and sequence similarity to the query triplet of interest. The database contains (15)N, (1)H(N), (1)H(alpha), (13)C(alpha), (13)C(beta) and (13)C' chemical shifts for 200 proteins for which a high resolution X-ray (≤2.4 Å) structure is available. The relative importance of the weighting factors for the phi/psi/chi(1) angles and sequence similarity was optimized empirically. The weighted, average secondary shifts of the central residues in the 20 best-matching triplets, after inclusion of nearest neighbor, ring current, and hydrogen bonding effects, are used to predict chemical shifts for the protein of known structure. Validation shows good agreement between the SPARTA-predicted and experimental shifts, with standard deviations of 2.52, 0.51, 0.27, 0.98, 1.07 and 1.08 ppm for (15)N, (1)H(N), (1)H(alpha), (13)C(alpha), (13)C(beta) and (13)C', respectively, including outliers.

12.

YACOP: Enhanced gene prediction obtained by a combination of existing methods

Tech M, Merkl R. In silico biology, 2003, 3(4):441-451

The performance of gene-predicting tools varies considerably if evaluated with respect to the parameters sensitivity and specificity or their capability to identify the correct start codon. We set out to validate tools for gene prediction and to implement a metatool named YACOP, which combines existing tools and has a higher performance. YACOP parses and combines the output of the three gene-predicting systems Critica, Glimmer and ZCURVE. It outperforms each of the programs tested with its high sensitivity and specificity values combined with a larger number of correctly predicted gene starts. Performance of YACOP and the gene-finding programs was tested by comparing their output with a carefully selected set of annotated genomes. We found that the problem of identifying genes in prokaryotic genomes by means of computational analysis was solved satisfactorily. In contrast, the correct localization of the start codon still appeared to be a problem, as in all cases under test at least 7.8% and up to 32.3% of the positions given in the annotations differed from the locus predicted by any of the programs tested. YACOP can be downloaded from http://www.g2l.bio.uni-goettingen.de.

13.

Defining parameters for homology-tolerant database searching.

J P Kayser, J L Vallet, R L Cerny. Journal of biomolecular techniques, 2004, 15(4):285-295

De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.

14.

Increased sequence diversity coverage improves detection of HIV-specific T cell responses (total citations: 2, self-citations: 0, citations by others: 2)

Frahm N, Kaufmann DE, Yusim K, Muldoon M, Kesmir C, Linde CH, Fischer W, Allen TM, Li B, McMahon BH, Faircloth KL, Hewitt HS, Mackey EW, Miura T, Khatri A, Wolinsky S, McMichael A, Funkhouser RK, Walker BD, Brander C, Korber BT. Journal of immunology (Baltimore, Md. : 1950), 2007, 179(10):6638-6650

The accurate identification of HIV-specific T cell responses is important for determining the relationship between immune response, viral control, and disease progression. HIV-specific immune responses are usually measured using peptide sets based on consensus sequences, which frequently miss responses to regions where test set and infecting virus differ. In this study, we report the design of a peptide test set with significantly increased coverage of HIV sequence diversity by including alternative amino acids at variable positions during the peptide synthesis step. In an IFN-gamma ELISpot assay, these "toggled" peptides detected HIV-specific CD4(+) and CD8(+) T cell responses of significantly higher breadth and magnitude than matched consensus peptides. The observed increases were explained by a closer match of the toggled peptides to the autologous viral sequence. Toggled peptides therefore afford a cost-effective and significantly more complete view of the host immune response to HIV and are directly applicable to other variable pathogens.

15.

Membrane protein identification: N-terminal labeling of nontryptic membrane protein peptides facilitates database searching

Jansson M, Wårell K, Levander F, James P. Journal of proteome research, 2008, 7(2):659-665

Membrane proteins are fairly refractory to digestion, especially by trypsin, and less specific proteases, such as elastase and pepsin, are much more effective. However, database searching using nontryptic peptides is much less effective because of the lack of charge localization at the N and C termini and the absence of sequence specificity. We describe a method for N-terminal-specific labeling of peptides from nontryptic digestions of membrane proteins, which facilitates Mascot database searching and can be used for relative quantitation. The conditions for digestion have been optimized to obtain peptides of a suitable length for mass spectrometry (MS) fragmentation. We show the effectiveness of the method using a plasma membrane preparation from a leukemia cell line and demonstrate a large increase in the number of membrane proteins, with small extra-membranar domains being identified in comparison to previous published methods.

16.

OWL--a non-redundant composite protein sequence database. (total citations: 4, self-citations: 1, citations by others: 4)

A J Bleasby, D Akrigg, T K Attwood. Nucleic acids research, 1994, 22(17):3574-3577

A comprehensive, non-redundant composite protein sequence database is described. The database, OWL, is an amalgam of data from six publicly-available primary sources, and is generated using strict redundancy criteria. The database is updated monthly and its size has increased almost eight-fold in the last six years: the current version contains > 76,000 entries. For added flexibility, OWL is distributed with a tailor-made query language, together with a number of programs for database exploration, information retrieval and sequence analysis, which together form an integrated database and software resource for protein sequences.

17.

CHIKVPRO - a protein sequence annotation database for chikungunya virus

Mishra AK, Jain CK, Agrawal A, Jain SJ, Dudha N, Kumar K, Sharma SK, Gupta S. Bioinformation, 2010, 5(1):4-6

In the recent past, there has been a resurgence of interest in Chikungunya virus (CHIKV), attributed to massive outbreaks of Chikungunya fever in the South-East Asia Region. This is reflected in a substantial increase in submissions of CHIKV genome sequences to the NCBI (National Center for Biotechnology Information) database. Here we present a database, "CHIKVPRO", containing structural and functional annotation of Chikungunya virus proteins (25 strains) submitted to the NCBI repository. The CHIKV genome encodes 9 proteins: 4 non-structural and 5 structural. The CHIKVPRO database aims to provide the virology community with a single-access, authoritative resource for the CHIKV proteome, covering physicochemical and molecular properties, proteolytic cleavage sites, hydrophobicity, transmembrane prediction, and classification into functional families using SVMProt and other Expasy tools. AVAILABILITY: The database is freely available at http://www.chikvpro.info/

18.

Pseudoknots in prion protein mRNAs confirmed by comparative sequence analysis and pattern searching

Barrette I, Poisson G, Gendron P, Major F. Nucleic acids research, 2001, 29(3):753-758

The human prion gene contains five copies of a 24 nt repeat that is highly conserved among species. An analysis of folding free energies of the human prion mRNA, in particular in the repeat region, suggested biased codon selection and the presence of RNA patterns. In particular, pseudoknots, similar to the one predicted by Wills in the human prion mRNA, were identified in the repeat region of all prion mRNAs available in GenBank, but not those of birds and the red slider turtle. An alignment of these mRNAs, which share low sequence homology, shows several co-variations that maintain the pseudoknot pattern. The presence of pseudoknots in yeast Sup35p and Rnq1 suggests acquisition in the prokaryotic era. Computer generated three-dimensional structures of the human prion pseudoknot highlight protein and RNA interaction domains, which suggest a possible effect in prion protein translation. The role of pseudoknots in prion diseases is discussed as individuals with extra copies of the 24 nt repeat develop the familial form of Creutzfeldt–Jakob disease.

19.

Critical comparison of multidimensional separation methods for increasing protein expression coverage

Antberg L, Cifani P, Sandin M, Levander F, James P. Journal of proteome research, 2012, 11(5):2644-2652

We present a comparison of two-dimensional separation methods and how they affect the degree of coverage of protein expression in complex mixtures. We investigated the relative merits of various protein and peptide separations prior to acidic reversed-phase chromatography directly coupled to an ion trap mass spectrometer. The first dimensions investigated were density gradient organelle fractionation of cell extracts, 1D SDS-PAGE protein separation followed by digestion by trypsin or GluC proteases, strong cation exchange chromatography, and off-gel isoelectric focusing of tryptic peptides. The number of fractions from each first dimension and the total RP-HPLC-MS/MS data accumulation time were kept constant, and the experiments were run in triplicate. We find that the most critical parameters are the data accumulation time, which defines the level of under-sampling, and the avoidance of peptides from high expression level proteins eluting over the entire gradient.

20.

Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra

Chen Y, Kwon SW, Kim SC, Zhao Y. Journal of proteome research, 2005, 4(3):998-1005

Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.