Similar literature
 20 similar articles found
1.
LQ Xie  CP Shen  MB Liu  ZD Chen  RY Du  GQ Yan  HJ Lu  PY Yang 《Molecular bioSystems》2012,8(10):2692-2698
Electron transfer dissociation (ETD) is a useful and complementary activation method for peptide fragmentation in mass spectrometry. However, ETD spectra typically receive relatively low scores in the identification of 2+ ions. To overcome this challenge, we, for the first time, systematically interrogated the benefits of combining ion charge enhancing methods (dimethylation, guanidination, m-nitrobenzyl alcohol (m-NBA) or Lys-C digestion) and different search algorithms (Mascot, Sequest, OMSSA, pFind and X!Tandem). A simple sample (BSA) and a complex sample (AMJ2 cell lysate) were selected for benchmark tests. Clearly distinct outcomes were observed with different experimental protocols. In the analysis of AMJ2 cell lines, X!Tandem and pFind revealed 92.65% of the identified spectra; m-NBA adduction led to a 5-10% increase in average charge state and the most significant increase in the number of successful identifications, and Lys-C treatment generated peptides carrying mostly triple charges. Based on the complementary identification results, we suggest that a combination of m-NBA and Lys-C strategies accompanied by X!Tandem and pFind can greatly improve ETD identification.

2.
A software package, IndexToolkit, aimed at overcoming the disadvantage of FASTA-format databases for frequent searching, is developed to utilize an indexing strategy to substantially accelerate sequence queries. IndexToolkit includes user-friendly tools and an Application Programming Interface (API) to facilitate indexing, storage and retrieval of protein sequence databases. As open source software, it provides a development framework for sequence retrieval, which is easily extensible for proteomic applications with high-speed requests, such as database searching or modification discovery. We applied IndexToolkit to the database search engine pFind to demonstrate its effect. Experimental studies show that IndexToolkit is able to support significantly faster searches of protein databases. AVAILABILITY: IndexToolkit is free to use under the open source GNU GPL license. The source code and the compiled binary can be freely accessed through the website http://pfind.jdl.ac.cn/IndexToolkit. More detailed information, including screenshots and documentation for users and developers, is also available on this website.
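
The indexing idea in this abstract can be illustrated with a minimal sketch. It is not IndexToolkit's API: the function names `build_fasta_index` and `fetch_sequence` and the demo records are hypothetical. It only shows, under the assumption of a byte-offset index, why an indexed FASTA file can answer a single-sequence query without rescanning the whole file.

```python
import io

def build_fasta_index(handle):
    """Scan a binary FASTA handle once; map sequence ID -> header byte offset."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:  # ignore headers that carry no identifier
                index[fields[0].decode("ascii")] = offset
    return index

def fetch_sequence(handle, index, seq_id):
    """Jump straight to a record using the index and return its residues."""
    handle.seek(index[seq_id])
    handle.readline()  # skip the header line
    parts = []
    for line in handle:
        if line.startswith(b">"):
            break  # reached the next record
        parts.append(line.strip())
    return b"".join(parts).decode("ascii")

# Hypothetical two-record database held in memory for the demonstration.
demo = io.BytesIO(b">P1 first protein\nMKTAY\nIAKQR\n>P2 second protein\nGSHMK\n")
index = build_fasta_index(demo)
print(fetch_sequence(demo, index, "P2"))
```

The index is built in one pass and each later query costs a single seek plus a read of one record, which is the general reason an index helps when the same database is searched repeatedly.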

3.
Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using SEQUEST, Mascot, and X! Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool.
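
As a rough illustration of combining per-engine peptide probabilities, the sketch below applies a naive Bayes rule in log-odds space. This is an assumption made for illustration, not the framework described in the abstract: it treats the engines as conditionally independent, assumes each probability was computed with the same prior, and omits the expectation maximization step entirely. The function name `combine_probabilities` is hypothetical.

```python
import math

def combine_probabilities(probs, prior=0.5, eps=1e-12):
    """Combine per-engine probabilities for one peptide assignment.

    Assumes conditional independence between engines and a shared prior,
    so each engine contributes its log-odds minus the prior log-odds.
    """
    def logit(p):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        return math.log(p / (1.0 - p))

    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in probs)
    # Numerically stable logistic function for either sign of `total`.
    if total >= 0:
        return 1.0 / (1.0 + math.exp(-total))
    e = math.exp(total)
    return e / (1.0 + e)

# Two engines that each report 0.9 reinforce one another.
print(combine_probabilities([0.9, 0.9]))
```

Under these assumptions, agreement between engines raises the combined probability above either individual value, while an engine reporting exactly the prior leaves the result unchanged.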

4.
This paper surveys the computational strategies followed to parallelise the most widely used software in the bioinformatics arena. The studied algorithms are computationally expensive and their computational patterns range from regular, such as database-searching applications, to very irregularly structured patterns (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. This overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches and scheduling strategies for a broad range of computer architectures. In particular, it deals with shared, distributed and shared/distributed memory architectures.

5.
The B30.2 domain is a conserved region of around 170 amino acids associated with several different protein domains, including the immunoglobulin folds of butyrophilin and the RING finger domain of ret finger protein. We recently reported several novel members of this family as well as previously undescribed protein families possessing the B30.2 domain. Many proteins have subsequently been found to possess this domain, including pyrin/marenostrin and the midline 1 (MID1) protein. Mutations in the B30.2 domain of pyrin/marenostrin are implicated in familial Mediterranean fever, and partial loss of the B30.2 domain of MID1 is responsible for Opitz G/BBB syndrome, characterized by developmental midline defects. In this study, we scrutinized the available sequence databases for the identification of novel B30.2 domain proteins using highly sensitive database-searching tools. In addition, we discuss the chromosomal localization of genes in the B30.2 family, since the encoded proteins are likely to be involved in other forms of periodic fever, autoimmune, and genetic diseases.

6.
SUMMARY: BioQuery is an application that helps scientists automate database searches. Users can build and store queries to public biomedical databases, and receive periodic updates on the results of those queries when new data is available. The application is implemented on a portable object framework that can provide database-searching capability to other applications. This framework is easily extensible, allowing users to develop plug-ins that provide access to new databases. BioQuery thus provides end-users with a complete database searching interface and updating service, and gives developers a toolkit to provide database-searching capability to their applications. AVAILABILITY: Free to all users: http://www.bioquery.org.

7.
MOTIVATION: The correlation among fragment ions in a tandem mass spectrum is crucial in reducing stochastic mismatches for peptide identification by database searching. Until now, an efficient scoring algorithm that considers the correlative information in a tunable and comprehensive manner has been lacking. RESULTS: This paper provides a promising approach to utilizing the correlative information for improving the peptide identification accuracy. The kernel trick, rooted in the statistical learning theory, is exploited to address this issue with low computational effort. The common scoring method, the tandem mass spectral dot product (SDP), is extended to the kernel SDP (KSDP). Experiments on a dataset reported previously demonstrate the effectiveness of the KSDP. The implementation on consecutive fragments shows a decrease of 10% in the error rate compared with the SDP. Our software tool, pFind, using a simple scoring function based on the KSDP, outperforms two SDP-based software tools, SEQUEST and Sonar MS/MS, in terms of identification accuracy. SUPPLEMENTARY INFORMATION: http://www.jdl.ac.cn/user/yfu/pfind/index.html
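
For readers unfamiliar with the baseline score mentioned here, the following is a minimal sketch of a normalized spectral dot product between two peak lists. The kernel extension (KSDP) is not reproduced, since the abstract does not give its form. The 1.0 m/z bin width, the normalization, and the function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned

def spectral_dot_product(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

# Hypothetical peak lists as (m/z, intensity) pairs.
experimental = [(175.1, 30.0), (262.2, 80.0), (375.3, 55.0)]
theoretical = [(175.1, 1.0), (262.2, 1.0), (489.3, 1.0)]
print(spectral_dot_product(experimental, theoretical))
```

Each bin contributes to this score independently of its neighbours, which is the limitation the abstract points to when it argues for using correlations among fragment ions.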

8.
Database searching by flexible protein structure alignment   Total citations: 1 (self-citations: 0, citations by others: 1)
We have recently developed a flexible protein structure alignment program (FATCAT) that identifies structural similarity while accounting for the flexibility of protein structures. One of the most important applications of a structure alignment method is to aid in functional annotations by identifying similar structures in large structural databases. However, none of the flexible structure alignment methods has been applied to this task because of a lack of significance estimation for flexible alignments. In this paper, we developed an estimate of the statistical significance of the FATCAT alignment score, allowing us to use it as a database-searching tool. The results reported here show that (1) the distribution of the similarity score of FATCAT alignment between two unrelated protein structures follows the extreme value distribution (EVD), adding one more example to the current collection of EVDs of sequence and structure similarities; (2) introducing flexibility into structure comparison only slightly influences the sensitivity and specificity of identifying similar structures; and (3) the overall performance of FATCAT as a database-searching tool is comparable to that of the widely used rigid-body structure comparison programs DALI and CE. Two examples illustrating the advantages of using flexible structure alignments in database searching are also presented. The conformational flexibilities that were detected in the first example may be involved with substrate specificity, and the conformational flexibilities detected in the second example may reflect the evolution of structures by block building.
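
To make the significance estimate concrete, the sketch below computes a P-value from a type I extreme value (Gumbel) distribution, P(S >= x) = 1 - exp(-exp(-(x - mu) / beta)). The location `mu` and scale `beta` are hypothetical placeholders, since the abstract does not report the fitted FATCAT parameters, and the function name is illustrative.

```python
import math

def evd_p_value(score, mu, beta):
    """Probability that an unrelated pair scores at least `score`.

    Uses the Gumbel survival function; `mu` (location) and `beta`
    (scale, must be positive) would be fitted to scores of unrelated pairs.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (score - mu) / beta
    if z < -700.0:
        return 1.0  # exp(-z) would overflow; the probability is 1 to machine precision
    # -expm1(-t) equals 1 - exp(-t) but stays accurate for very small t.
    return -math.expm1(-math.exp(-z))

# Hypothetical parameters chosen only to show the shape of the curve.
for s in (10.0, 20.0, 30.0):
    print(s, evd_p_value(s, mu=10.0, beta=3.0))
```

Because the tail decays roughly exponentially in the score, a fitted EVD lets a raw alignment score be converted into a comparable significance value across database hits.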

9.
Fang X  Chen W  Xin Y  Zhang H  Yan C  Yu H  Liu H  Xiao W  Wang S  Zheng G  Liu H  Jin L  Ma H  Ruan S 《Journal of Proteomics》2012,75(13):4074-4090

10.
Proteomic profiles of the lamina of Ecklonia kurome Okamura, one of the dominant laminarialean kelps of Japan, were investigated by two-dimensional electrophoresis (2-DE) and MALDI-TOF/TOF. Due to the absence of E. kurome DNA or protein databases, homology-based cross-species protein identification was performed using a combination of three database-searching algorithms: Mascot peptide mass fingerprinting, Mascot MS/MS ion search, and mass spectrometry-based BLAST. Proteins were extracted from the lamina by an ethanol/phenol method and subjected to 2-DE (pI 4–7, 10% polyacrylamide gel). More than 700 spots were detected in the 2-DE gel with CBB, and 93 spots (24 proteins) were successfully identified by MALDI-TOF/TOF and the cross-species database searching. The identified proteins mainly consisted of cytoplasmic carbohydrate metabolic enzymes, chloroplast proteins involved in photosynthesis, and haloperoxidases. Interestingly, vanadium-dependent bromoperoxidases (vBPOs), which are thought to be involved in halogen uptake, synthesis of halogenated products, and detoxification of reactive oxygen species, were separated into at least 23 different spots. By comparing mass spectra, amino acid sequences predicted from tandem mass spectra and haloperoxidase activities of the vBPOs, we found that (1) at least two types of vBPOs were expressed in the lamina of E. kurome and (2) two pro-vBPOs might be activated by specific cleavage at N- and C-terminal regions.

11.
Liska AJ  Shevchenko A  Pick U  Katz A 《Plant physiology》2004,136(1):2806-2817
Salinity is a major limiting factor for the proliferation of plants and inhibits central metabolic activities such as photosynthesis. The halotolerant green alga Dunaliella can adapt to hypersaline environments and is considered a model photosynthetic organism for salinity tolerance. To clarify the molecular basis for salinity tolerance, a proteomic approach has been applied for identification of salt-induced proteins in Dunaliella. Seventy-six salt-induced proteins were selected from two-dimensional gel separations of different subcellular fractions and analyzed by mass spectrometry (MS). Application of nanoelectrospray mass spectrometry, combined with sequence-similarity database-searching algorithms, MS BLAST and MultiTag, enabled identification of 80% of the salt-induced proteins. Salinity stress up-regulated key enzymes in the Calvin cycle, starch mobilization, and redox energy production; regulatory factors in protein biosynthesis and degradation; and a homolog of bacterial Na(+)-redox transporters. The results indicate that Dunaliella responds to high salinity by enhancement of photosynthetic CO(2) assimilation and by diversion of carbon and energy resources for synthesis of glycerol, the osmotic element in Dunaliella. The ability of Dunaliella to enhance photosynthetic activity at high salinity is remarkable because, in most plants and cyanobacteria, salt stress inhibits photosynthesis. The results demonstrated the power of MS BLAST searches for the identification of proteins in organisms whose genomes are not known and paved the way for dissecting molecular mechanisms of salinity tolerance in algae and higher plants.

12.
MOTIVATION: Peptide sequencing by mass spectrometry uses one of two approaches: database searching or de novo sequencing. The database-searching approach is convenient; however, when the corresponding sequences are not included in the databases, exact identification is difficult. De novo sequencing, on the other hand, requires no preliminary information; however, it requires a continuous series of amino acid sequence peaks and the differentiation of these peaks. In an actual spectrum, it is very difficult to obtain and differentiate the peaks of all amino acids. We propose a novel de novo sequencing approach using not only mass-to-charge ratio but also ion peak intensity and amino acid cleavage intensity ratio (CIR). RESULTS: Our method compensates for any undetectable amino acid peak intervals by estimating the amino acid set and the probability of peak expression based on amino acid CIR. It provides more accurate identification of sequences than the existing methods, with which sequencing is usually difficult.

13.
14.

Background  

MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data.

15.
Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a 'fingerprint' that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution.
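
The second PMF stage described above (searching a protein database with a peptide mass list) can be sketched in software as follows. This is a simplified illustration, not the FPGA design from the abstract: it assumes full tryptic cleavage after K or R except before P, no missed cleavages or modifications, neutral monoisotopic masses, and a fixed 0.01 Da tolerance. The function names and the two-protein database are hypothetical.

```python
# Standard monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER

def count_matches(observed_masses, protein_sequence, tolerance=0.01):
    """Count observed masses lying within `tolerance` Da of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein_sequence)]
    return sum(
        1 for m in observed_masses
        if any(abs(m - t) <= tolerance for t in theoretical)
    )

def best_match(observed_masses, database, tolerance=0.01):
    """Return the database entry whose theoretical fingerprint matches best."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tolerance),
    )

# Hypothetical database and mass list for the demonstration.
database = {"protA": "GAKVLR", "protB": "MMMK"}
observed = [peptide_mass("GAK"), peptide_mass("VLR")]
print(best_match(observed, database))
```

Every protein in the database is scored independently of the others, which is why this kind of search lends itself to the parallel hardware realization the abstract describes.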

16.
A line profile of fluorescent intensities in confocal images is frequently examined. We have developed a computer software tool to analyse the intensity profiles of fluorescent probes in confocal images. The software averages neighbouring pixels, adjacent to the central line, without reducing the spatial resolution of the image. As an experimental model, we have used the skeletal muscle fibre isolated from the rat skeletal muscle extensor digitorum brevis. As a marker of myofibrils' structure, we have used phalloidin–rhodamine staining and the anti-TIM antibody to label mitochondria. We also tested the distribution of the protein kinase B/Akt. Since signalling is ordered in modules and large protein complexes appear to direct signalling to organelles and regulate specific physiological functions, a software tool to analyse such complexes in fluorescent confocal images is required. The software displays the image, and the user defines the line for analysis. The image is rotated by the angle of the line. The line profile is calculated by averaging one dimension of the cropped rotated image matrix. The spatial resolution in the averaged line profile is not decreased compared with the single-pixel line profile, which was confirmed by the discrete Fourier transform computed with a fast Fourier transform algorithm. We conclude that the custom software tool presented here is a useful tool to analyse line profiles of fluorescence intensities in confocal images.

17.
Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here, we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in a compartment but absent in other, nearby, compartments by utilizing a hierarchical structure that mimics the protein sorting mechanism. We show that both discriminative motif finding and the hierarchical structure improve localization prediction on a benchmark data set of yeast proteins. The motifs identified can be mapped to known targeting motifs and they are more conserved than the average protein sequence. Using our motif-based predictions, we can identify potential annotation errors in public databases for the location of some of the proteins. A software implementation and the data set described in this paper are available from http://murphylab.web.cmu.edu/software/2009_TCBB_motif/.

18.
Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.

19.
The study of the protein–protein interactions (PPIs) of unique ORFs is a strategy for deciphering the biological roles of unique ORFs of interest. For uniform reference, we define unique ORFs as those for which no matching protein is found after PDB-BLAST search with default parameters. The uniqueness of the ORFs generally precludes the straightforward use of structure-based approaches in the design of experiments to explore PPIs. Many open-source bioinformatics tools, from the commonly-used to the relatively esoteric, have been built and validated to perform analyses and/or predictions of sorts on proteins. How can these available tools be combined into a protocol that helps the non-expert bioinformaticist researcher to design experiments to explore the PPIs of their unique ORF? Here we define a pragmatic protocol based on accessibility of software to achieve this and we make it concrete by applying it on two proteins, the ImuB and ImuA′ proteins from Mycobacterium tuberculosis. The protocol is pragmatic in that decisions are made largely based on the availability of easy-to-use freeware. We define the following basic and user-friendly software pathway to build testable PPI hypotheses for a query protein sequence: PSI-PRED → MUSTER → metaPPISP → ASAView and ConSurf. Where possible, other analytical and/or predictive tools may be included. Our protocol combines the software predictions and analyses with general bioinformatics principles to arrive at consensus, prioritised and testable PPI hypotheses.

20.
Protein fluorescence is a powerful tool for studying protein structure and dynamics if we have a means to interpret the spectral data in terms of protein structural properties. Our previous research successfully provided this support through the development of individual software modules implementing the algorithms for fluorescence and structural analyses. Now we have integrated the developed software modules, introduced a new program for the assignment of tryptophan residues to spectral-structural classes, and created a web-based toolkit PFAST: protein fluorescence and structural toolkit: http://pfast.phys.uri.edu/. PFAST contains three modules: (1) FCAT is a fluorescence-correlation analysis tool, which decomposes protein fluorescence spectra to reveal the spectral components of individual tryptophan residues or groups of tryptophan residues located close to each other, and assigns spectral components to one of five previously established spectral-structural classes. (2) SCAT is a structural-correlation analysis tool for the calculation of the structural parameters of the environment of tryptophan residues from the atomic structures of the proteins from the Protein Data Bank (PDB), and for the assignment of tryptophan residues to one of five spectral-structural classes. (3) The last module is a PFAST database that contains protein fluorescence and structural data obtained from results of the FCAT and SCAT analyses.
