
1.

Optimal construction of theoretical spectra for MS/MS spectra identification

Fridman T Protopopescu V Hurst G Borziak A Gorin A 《Omics : a journal of integrative biology》2005,9(4):380-390

We derive the optimal number of peaks (defined as the minimum number that provides the required efficiency of spectra identification) in the theoretical spectra as a function of (i) the experimental accuracy, sigma, of the measured ratio m/z; (ii) experimental spectrum density; (iii) size of the database; (iv) number of peaks in the theoretical spectra; and (v) types of ions that the peaks represent. We show that if theoretical spectra are constructed including b and y ions alone, then for sigma = 0.5, which is typical for high-throughput data, peptide chains of eight amino acids or longer can be identified based on the positions of peaks alone, at a rate of false identification below 1%. To discriminate between shorter peptides, additional (e.g., intensity-inferred) information is necessary. We derive the dependence of the probability of false identification on the number of peaks in the theoretical spectra and on the types of ions that the peaks represent. Our results suggest that the class of mass spectrum identification problems, for which more elaborate development of fragmentation rules (such as intensity model) is required, can be reduced to the problems that involve homologous peptides.
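As a rough illustration of what a theoretical spectrum built from b and y ions alone looks like, the Python sketch below lists singly charged b- and y-ion m/z values for a peptide and counts how many fall within sigma of an observed peak. The function names, the choice of singly charged ions, and the simple window-matching rule are assumptions made for illustration; they are not taken from the paper.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to y ions
PROTON = 1.00728   # mass of the charge-carrying proton


def theoretical_spectrum(peptide):
    """Return sorted m/z values of singly charged b and y ions (hypothetical helper)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for i in range(1, len(masses)):
        b_ion = sum(masses[:i]) + PROTON          # N-terminal fragment
        y_ion = sum(masses[i:]) + WATER + PROTON  # C-terminal fragment
        peaks.extend([b_ion, y_ion])
    return sorted(peaks)


def count_matches(theoretical, observed, sigma=0.5):
    """Count theoretical peaks that have an observed peak within +/- sigma."""
    return sum(any(abs(t - o) <= sigma for o in observed) for t in theoretical)
```

A peptide of n residues yields 2(n - 1) peaks under these assumptions, so an eight-residue peptide contributes 14 peak positions to match against.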

2.

Regular expressions of MS/MS spectra for partial annotation of metabolite features

Fumio Matsuda 《Metabolomics : Official journal of the Metabolomic Society》2016,12(7):113


3.

Methods for calculating the probabilities of finding patterns in sequences (Total citations: 1; self-citations: 0; citations by others: 1)

Staden Rodger 《Bioinformatics (Oxford, England)》1989,5(2):89-96

This paper describes the use of probability-generating functions for calculating the probabilities of finding motifs in nucleic acid and protein sequences. Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs. Comparisons are made with searches of random sequences. A higher-level structure, the pattern, is defined as a list of motifs. A pattern also specifies the permitted ranges of spacing allowed between its constituent motifs. Equations for calculating the expected numbers of matches to patterns are given. Received on March 1, 1988; accepted on September 30, 1988
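The simplest case of such a calculation can be sketched in Python: for an exactly specified motif and independent, identically distributed residues, the per-position match probability is a product of residue frequencies, and the expected number of matches scales with the number of start positions. This is only the elementary expectation, not the probability-generating-function machinery the paper describes; the function names and the independence assumption are illustrative.

```python
from math import prod


def motif_probability(motif, freqs):
    """Probability that an exact motif matches at one position.

    Assumes residues are independent with the given background frequencies.
    """
    return prod(freqs[base] for base in motif)


def expected_matches(motif, freqs, seq_len):
    """Expected number of exact matches in a random sequence of length seq_len."""
    positions = max(seq_len - len(motif) + 1, 0)  # number of possible start sites
    return positions * motif_probability(motif, freqs)
```

For example, with uniform base frequencies of 0.25, a four-base motif matches at any one position with probability 0.25^4, so a sequence with 1000 start positions is expected to contain about 3.9 matches.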

4.

A tool for aligning very similar DNA sequences (Total citations: 4; self-citations: 0; citations by others: 4)

Chao Kun-Mao; Zhang Jinghui; Ostell James; Miller Webb 《Bioinformatics (Oxford, England)》1997,13(1):75-80

Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information. Availability: A version of sim3 for UNIX machines can be obtained by anonymous ftp from ncbi.nlm.nih.gov, in the pub/sim3 directory. Contact: For portable versions for Macs and PCs, contact zjing@sunset.nlm.nih.gov.
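The computational problem stated here (find the region of the longer sequence that the shorter one matches with the fewest single-nucleotide changes) can be illustrated with a textbook dynamic program in Python. This is a plain quadratic-time sketch with unit costs, not sim3's algorithm, which the abstract says is fast when differences are few; the function name and return convention are assumptions.

```python
def best_region_edit_distance(short, long):
    """Return (distance, end) for the best-matching region of `long`.

    `distance` is the minimum number of insertions, deletions and
    substitutions turning `short` into some contiguous region of `long`;
    `end` is the exclusive end index of one such region.
    """
    # Row 0 is all zeros: the region may start anywhere in `long` at no cost.
    prev = [0] * (len(long) + 1)
    for i in range(1, len(short) + 1):
        curr = [i] + [0] * len(long)  # aligning short[:i] to an empty region costs i
        for j in range(1, len(long) + 1):
            cost = 0 if short[i - 1] == long[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # base of `short` left unmatched
                          curr[j - 1] + 1)     # extra base in `long`
        prev = curr
    # The region may also end anywhere, so take the minimum over the last row.
    best_end = min(range(len(prev)), key=prev.__getitem__)
    return prev[best_end], best_end
```

Leaving the first row at zero and minimizing over the last row is what makes the start and end of the region free, so only differences inside the matched region are charged.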

5.

A tool for calculating binding-site residues on proteins from PDB structures

Jing Hu Changhui Yan 《BMC structural biology》2009,9(1):52-6

Background

In the research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Thus, this requires researchers to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice.

6.

Evaluation of MALDI-TOF MS as a tool for high-throughput dereplication

Ghyselinck J Van Hoorde K Hoste B Heylen K De Vos P 《Journal of microbiological methods》2011,86(3):327-336

The present study examined the suitability of matrix assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) for the rapid grouping of bacterial isolates, i.e. dereplication. Dereplication is important in large-scale isolation campaigns and screening programs since it can significantly reduce labor intensity, time and costs in further downstream analyses. Still, current dereplication techniques are time consuming and costly. MALDI-TOF MS is an attractive tool since it performs fast and cheap analyses with the potential of automation. However, its taxonomic resolution for a broad diversity of bacteria remains largely unknown. To verify the suitability of MALDI-TOF MS for dereplication, a total of 249 unidentified bacterial isolates retrieved from the rhizosphere of potato plants were analyzed with both MALDI-TOF MS and repetitive element sequence based polymerase chain reaction (rep-PCR). The latter technique was used as a benchmark. Cluster analysis and inspection of the profiles showed that for 204 isolates (82%) the taxonomic resolution of both techniques was comparable, while for 45 isolates (18%) one of the two techniques had a higher taxonomic resolution. Additionally, 16S rRNA gene sequence analysis was performed on all members of each delineated cluster to gain insight into the identity and sequence similarity between members in each cluster. MALDI-TOF MS proved to have higher reproducibility than rep-PCR and seemed to be more promising with respect to high-throughput analyses, automation, and time and cost efficiency. Its taxonomic resolution was situated at the species to strain level. The present study demonstrated that MALDI-TOF MS is a powerful tool for dereplication.

7.

COBALT: constraint-based alignment tool for multiple protein sequences

Papadopoulos JS Agarwala R 《Bioinformatics (Oxford, England)》2007,23(9):1073-1079

MOTIVATION: A tool that simultaneously aligns multiple protein sequences, automatically utilizes information about protein domains, and has a good compromise between speed and accuracy will have practical advantages over current tools. RESULTS: We describe COBALT, a constraint based alignment tool that implements a general framework for multiple alignment of protein sequences. COBALT finds a collection of pairwise constraints derived from database searches, sequence similarity and user input, combines these pairwise constraints, and then incorporates them into a progressive multiple alignment. We show that using constraints derived from the conserved domain database (CDD) and PROSITE protein-motif database improves COBALT's alignment quality. We also show that COBALT has reasonable runtime performance and alignment accuracy comparable to or exceeding that of other tools for a broad range of problems. AVAILABILITY: COBALT is included in the NCBI C++ toolkit. A Linux executable for COBALT, and CDD and PROSITE data used is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt

8.

SWORDS: A statistical tool for analysing large DNA sequences

Probal Chaudhuri Sandip Das 《Journal of biosciences》2002,27(1):1-6

In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software package called SWORDS. Using sequences available in public domain databases on the Internet, we demonstrate how SWORDS can be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences and assess their statistical significance.
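A frequency distribution of DNA words can be sketched in a few lines of Python by counting overlapping words of a fixed length k. The function name, the use of overlapping windows, and the fixed word length are assumptions for illustration; the abstract does not specify how SWORDS defines or counts words.

```python
from collections import Counter


def word_frequencies(sequence, k):
    """Relative frequencies of overlapping length-k words in a DNA sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    # An input shorter than k yields no words and therefore an empty dict.
    return {word: n / total for word, n in counts.items()}
```

Two sequences can then be compared by looking at how far apart their word-frequency dictionaries are, for instance word by word.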

9.

Power spectra of pulse sequences and implications for membrane fluctuations

K. L. Schick 《Acta biotheoretica》1974,23(1):1-17


10.

Validation of peptide MS/MS spectra using metabolic isotope labeling for spectral matching-based shotgun proteome analysis

Xu M Li L 《Journal of proteome research》2011,10(8):3632-3641

We report an isotope labeling shotgun proteome analysis strategy to validate the spectrum-to-sequence assignments generated by using sequence-database searching for the construction of a more reliable MS/MS spectral library. This strategy is demonstrated in the analysis of the E. coli K12 proteome. In the workflow, E. coli cells were cultured in normal and ¹⁵N-enriched media. The differentially labeled proteins from the cell extracts were subjected to trypsin digestion and two-dimensional liquid chromatography quadrupole time-of-flight tandem mass spectrometry (2D-LC QTOF MS/MS) analysis. The MS/MS spectra of the two samples were individually searched using Mascot against the E. coli proteome database to generate lists of peptide sequence matches. The two data sets were compared by overlaying the spectra of unlabeled and labeled matches of the same peptide sequence for validation. Two cutoff filters, one based on the number of common fragment ions and another one on the similarity of intensity patterns among the common ions, were developed and applied to the overlaid spectral pairs to reject the low quality or incorrectly assigned spectra. By examining 257,907 and 245,156 spectra acquired from the unlabeled and ¹⁵N-labeled samples, respectively, an experimentally validated MS/MS spectral library of tryptic peptides was constructed for E. coli K12 that consisted of 9,302 unique spectra with unique sequence and charge state, representing 7,763 unique peptide sequences. This E. coli spectral library could be readily expanded, and the overall strategy should be applicable to other organisms. Even with this relatively small library, it was shown that more peptides could be identified with higher confidence using the spectral search method than by sequence-database searching.

11.

DNannotator: Annotation software tool kit for regional genomic sequences

Liu C Bonner TI Nguyen T Lyons JL Christian SL Gershon ES 《Nucleic acids research》2003,31(13):3729-3735

Sequence annotation is essential for genomics-based research. Investigators of a specific genomic region who have developed abundant local discoveries such as genes and genetic markers, or have collected annotations from multiple resources, can be overwhelmed by the difficulty in creating local annotation and the complexity of integrating all the annotations. Presenting such integrated data in a form suitable for data mining and high-throughput experimental design is even more daunting. DNannotator, a web application, was designed to perform batch annotation on a sizeable genomic region. It takes annotation source data, such as SNPs, genes, primers, and so on, prepared by the end-user and/or a specified target of genomic DNA, and performs de novo annotation. DNannotator can also robustly migrate existing annotations in GenBank format from one sequence to another. Annotation results are provided in GenBank format and in tab-delimited text, which can be imported and managed in a database or spreadsheet and combined with existing annotation as desired. Graphic viewers, such as Genome Browser or Artemis, can display the annotation results. Reference data (reports on the process) facilitating the user's evaluation of annotation quality are optionally provided. DNannotator can be accessed at http://sky.bsd.uchicago.edu/DNannotator.htm.

12.

VirtualSpectrum, a tool for simulating peak list for multi-dimensional NMR spectra

Jakob Toudahl Nielsen Niels Chr. Nielsen 《Journal of biomolecular NMR》2014,60(1):51-66

NMR spectroscopy is a widely used technique for characterizing the structure and dynamics of macromolecules. Often large amounts of NMR data are required to characterize the structure of proteins. To save valuable time and resources on data acquisition, simulated data is useful in the developmental phase, for data analysis, and for comparison with experimental data. However, existing tools for this purpose can be difficult to use, are sometimes specialized for certain types of molecules or spectra, or produce too idealized data. Here we present a fast, flexible and robust tool, VirtualSpectrum, for generating peak lists for most multi-dimensional NMR experiments for both liquid and solid state NMR. It is possible to tune the quality of the generated peak lists to include sources of artifacts from peak overlap, noise and missing signals. VirtualSpectrum uses an analytic expression to represent the spectrum and derive the peak positions, seamlessly handling overlap between signals. We demonstrate our tool by comparing simulated and experimental spectra for different multi-dimensional NMR spectra and systematically analyzing three cases where overlap between peaks is particularly relevant: solid state NMR data, liquid state NMR homonuclear ¹H and ¹⁵N-edited spectra, and 2D/3D heteronuclear correlation spectra of unstructured proteins. We analyze the impact of protein size and secondary structure on peak overlap and on the accuracy of structure determination based on data of different qualities simulated by VirtualSpectrum.

13.

LDDist: a Perl module for calculating LogDet pair-wise distances for protein and nucleotide sequences

Thollesson M 《Bioinformatics (Oxford, England)》2004,20(3):416-418

LDDist is a Perl module implemented in C++ that allows the user to calculate LogDet pair-wise genetic distances for amino acid as well as nucleotide sequence data. It can handle site-to-site rate variation by treating a proportion of the sites as invariant and/or by assigning sites to different, presumably homogeneous, rate categories. The rate-class assignments and invariant proportion can be set explicitly, or estimated by the program; the latter using either of two different capture-recapture methods. The assignment to rate categories in lieu of a phylogeny can be done using Shannon-Wiener index as a crude token for relative rate.
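The Shannon-Wiener index of an alignment column is H = -Σ p_i ln p_i over the observed character states, so invariant columns score zero and more variable columns score higher. The Python sketch below computes it per column; how LDDist itself treats gaps or bins the scores into rate categories is not stated in the abstract, so the function names and the decision to treat every character (including gaps) as a state are assumptions.

```python
from collections import Counter
from math import log


def shannon_index(column):
    """Shannon-Wiener index (natural log) of the states in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log(c / n) for c in counts.values())


def column_entropies(alignment):
    """Index for every column of an alignment given as equal-length strings."""
    return [shannon_index(col) for col in zip(*alignment)]
```

Sorting or thresholding these per-column values would give one crude way of grouping sites into slower and faster categories.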

14.

Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences (Total citations: 14; self-citations: 0; citations by others: 14)

Small I Peeters N Legeai F Lurin C 《Proteomics》2004,4(6):1581-1590

Probably more than 25% of the proteins encoded by the nuclear genomes of multicellular eukaryotes are targeted to membrane-bound compartments by N-terminal targeting signals. The major signals are those for the endoplasmic reticulum, the mitochondria, and in plants, plastids. The most abundant of these targeted proteins are well-known and well-studied, but a large proportion remain unknown, including most of those involved in regulation of organellar gene expression or regulation of biochemical pathways. The discovery and characterization of these proteins by biochemical means will be long and difficult. An alternative method is to identify candidate organellar proteins via their characteristic N-terminal targeting sequences. We have developed a neural network-based approach (Predotar: Prediction of Organelle Targeting sequences) for identifying genes encoding these proteins amongst eukaryotic genome sequences. The power of this approach for identifying and annotating novel gene families has been illustrated by the discovery of the pentatricopeptide repeat family.

15.

Clustering of MS spectra for improved protein identification rate and screening for protein variants and modifications by MALDI-MS/MS

Granlund I Kieselbach T Alm R Schröder WP Emanuelsson C 《Journal of Proteomics》2011,74(8):1190-1200

It is an established fact that allelic variation and post-translational modifications create different variants of proteins, which are observed as isoelectric and size subspecies in two-dimensional gel based proteomics. Here we explore the stromal proteome of spinach and Arabidopsis chloroplast and show that clustering of mass spectra is a useful tool for investigating such variants and detecting modified peptides with amino acid substitutions or post-translational modifications. This study employs data mining by hierarchical clustering of MALDI-MS spectra, using the web version of the SPECLUST program (http://bioinfo.thep.lu.se/speclust.html). The tool can also be used to remove peaks of contaminating proteins and to improve protein identification, especially for species without a fully sequenced genome. Mutually exclusive peptide peaks within a cluster provide a good starting point for MS/MS investigation of modified peptides, here exemplified by the identification of an A to E substitution that accounts for the isoelectric heterogeneity in protein isoforms.

16.

Regression trees for analysis of mutational spectra in nucleotide sequences.

V B Berikov I B Rogozin 《Bioinformatics (Oxford, England)》1999,15(7-8):553-562

MOTIVATION: The study and comparison of mutational spectra is an important problem in molecular biology, because these spectra often reveal important features of the action of various mutagens and the functioning of repair/replication enzymes. As is known, mutability varies significantly along nucleotide sequences: mutations often concentrate at certain positions in a sequence, otherwise termed 'hotspots'. RESULTS: Herein, we propose a regression analysis method based on the use of regression trees in order to analyse the influence of nucleotide context on the occurrence of such hotspots. The REGRT program developed has been tested on simulated and real mutational spectra. For the G:C→T:A mutational spectra induced by Sn1 alkylating agents (nine spectra), the prediction accuracy was 0.99. AVAILABILITY: The REGRT program is available upon request from V.Berikov.

17.

BLogo: a tool for visualization of bias in biological sequences

Li W Yang B Liang S Wang Y Whiteley C Cao Y Wang X 《Bioinformatics (Oxford, England)》2008,24(19):2254-2255

BLogo is a web-based tool that detects and displays statistically significant position-specific sequence bias with reduced background noise. The over-represented and under-represented symbols in a particular position are shown above and below the zero line. When the sequences are in open reading frames, the background frequency of nucleotides could be calculated separately for the three positions of a codon, thus greatly reducing the background noise. The chi-squared (χ²) test or Fisher's exact test is used to evaluate the statistical significance of every symbol in every position and only those that are significant are highlighted in the resulting logo. The Perl source code of the program is freely available and can be run locally. AVAILABILITY: http://acephpx.cropdb.org/blogo/, http://www.bioinformatics.org/blogo/.

18.

Murlet: a practical multiple alignment tool for structural RNA sequences

Kiryu H Tabei Y Kin T Asai K 《Bioinformatics (Oxford, England)》2007,23(13):1588-1598

MOTIVATION: Structural RNA genes exhibit unique evolutionary patterns that are designed to conserve their secondary structures; these patterns should be taken into account while constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, the extremely high computational cost of the Sankoff algorithm precludes its application to most RNA sequences. RESULTS: We propose an efficient algorithm for the multiple alignment of structural RNA sequences. Our algorithm is a variant of the Sankoff algorithm, and it uses an efficient scoring system that reduces the time and space requirements considerably without compromising on the alignment quality. First, our algorithm computes the match probability matrix that measures the alignability of each position pair between sequences as well as the base pairing probability matrix for each sequence. These probabilities are then combined to score the alignment using the Sankoff algorithm. By itself, our algorithm does not predict the consensus secondary structure of the alignment but uses external programs for the prediction. We demonstrate that both the alignment quality and the accuracy of the consensus secondary structure prediction from our alignment are the highest among the other programs examined. We also demonstrate that our algorithm can align relatively long RNA sequences such as the eukaryotic-type signal recognition particle RNA that is approximately 300 nt in length; multiple alignment of such sequences has not been possible by using other Sankoff-based algorithms. The algorithm is implemented in the software named 'Murlet'. AVAILABILITY: The C++ source code of the Murlet software and the test dataset used in this study are available at http://www.ncrna.org/papers/Murlet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.

A method for constructing maximum parsimony ancestral amino acid sequences on a given network (Total citations: 1; self-citations: 0; citations by others: 1)

G W Moore J Barnabas M Goodman 《Journal of theoretical biology》1973,38(3):459-485

A solution is presented for the problem of how to find ancestral codons which minimize the number of mutations over a given network of species for which character-states of aligned amino acid sequences among the contemporary species are known. Three theorems which allow this “maximum parsimony” problem to be solved are proved; then the use of these theorems in finding maximum parsimony ancestral codons is illustrated on a network of chicken and mammalian alpha globin amino acid sequences at two alignment positions.
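The flavour of a maximum parsimony ancestral-state calculation can be shown with Fitch's algorithm, a later and simpler method than the one in this paper. The Python sketch below handles a single character on a rooted binary tree and returns the candidate root states together with the minimum number of changes. It does not reproduce the paper's theorems, which concern codons on a network; the nested-tuple tree encoding and the function name are assumptions for illustration.

```python
def fitch(tree):
    """Bottom-up pass of Fitch's algorithm for one character.

    `tree` is either a leaf state (a string) or a pair (left, right) of subtrees.
    Returns (candidate_states, minimum_number_of_changes) for the subtree root.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: keep the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1
```

For the tree (("A", "C"), ("A", "G")) this yields the root state set {"A"} with two changes, one on each side of the root.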

20.

A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree

Stadler T Degnan JH 《Algorithms for molecular biology : AMB》2012,7(1):7

ABSTRACT: BACKGROUND: The ancestries of genes form gene trees which do not necessarily have the same topology as the species tree due to incomplete lineage sorting. Available algorithms determining the probability of a gene tree given a species tree require exponential computational runtime. RESULTS: In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown. CONCLUSIONS: Polynomial algorithms for calculating ranked gene tree probabilities may become useful in developing methodology to infer species trees based on a collection of gene trees, leading to a more accurate reconstruction of ancestral species relationships.