
1.

Parallel BLAST on split databases (total citations: 1; self-citations: 0; citations by others: 1)

Mathog DR 《Bioinformatics (Oxford, England)》2003,19(14):1865-1866

SUMMARY: BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal-sized pieces and stored locally on each node. Each query is run on all nodes in parallel, and the resultant BLAST output files from all nodes are merged to yield the final output. AVAILABILITY: Source code is available from ftp://saf.bio.caltech.edu/
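
The split-and-merge idea in this summary can be illustrated with a small Python sketch. It is not the published programs: the function names, the in-memory record format, and the greedy balancing rule are assumptions made for illustration, and the merge step works on already-parsed `(subject_id, score)` pairs rather than on real BLAST output files. Scores are used for ranking because E-values depend on database size; how the original tools handle that is not stated in the abstract.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]   # (header, sequence); hypothetical in-memory format
Hit = Tuple[str, float]    # (subject id, alignment score); hypothetical format


def split_database(records: Iterable[Record], n_pieces: int) -> List[List[Record]]:
    """Split records into n_pieces of roughly equal total sequence length.

    Greedy heuristic (an assumption, not the published method): take the
    longest records first and always add to the currently smallest piece.
    """
    pieces: List[List[Record]] = [[] for _ in range(n_pieces)]
    sizes = [0] * n_pieces
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = sizes.index(min(sizes))      # index of the smallest piece so far
        pieces[i].append((header, seq))
        sizes[i] += len(seq)
    return pieces


def merge_hits(per_node_hits: Iterable[List[Hit]], top_n: int) -> List[Hit]:
    """Combine the hit lists returned by every node into one ranked list."""
    merged = [hit for hits in per_node_hits for hit in hits]
    # Highest score first; ties are broken by subject id for determinism.
    merged.sort(key=lambda h: (-h[1], h[0]))
    return merged[:top_n]
```

In a cluster setting each piece would be written to a node's local disk and searched there; only the small per-node hit lists need to travel back for merging.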

2.

Benchmarking of 16S rRNA gene databases using known strain sequences

Kunal Dixit Dimple Davray Diptaraj Chaudhari Pratik Kadam Rudresh Kshirsagar Yogesh Shouche Dhiraj Dhotre Sunil D Saroj 《Bioinformation》2021,17(3):377

16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains could have deleterious effects as all downstream analyses rely heavily on the accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of the results has been suggested. However, the mock communities used in most studies often represent only a small fraction of taxa and are used mostly as validation of the sequencing run to estimate sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of the 16S rRNA gene makes it challenging to select the best-suited method for a particular dataset. In the present study, we used authentic and validly published 16S rRNA gene type strain sequences (full length, V3-V4 region) and analyzed them using the widely used QIIME pipeline along with different parameters of OTU clustering and QIIME-compatible databases. Data Analysis Measures (DAM) revealed a high discrepancy in ratifying the taxonomy at different taxonomic hierarchies. Beta diversity analysis showed clear segregation of different DAMs. Limited differences were observed in reference data set analysis using partial (V3-V4) and full-length 16S rRNA gene sequences, which signifies the reliability of partial 16S rRNA gene sequences in microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels using various methods and databases.

3.

An automated annotation tool for genomic DNA sequences using GeneScan and BLAST (total citations: 1; self-citations: 0; citations by others: 1)

Lynn AM Jain CK Kosalai K Barman P Thakur N Batra H Bhattacharya A 《Journal of genetics》2001,80(1):9-16

Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

4.

Automated methods of predicting the function of biological sequences using GO and BLAST

Craig E Jones Ute Baumann Alfred L Brown 《BMC bioinformatics》2005,6(1):272

Background

With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating competing designs of biological function predictors. We utilise the Gene Ontology, GO, a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. Initially we examine the effect on accuracy scores of increasing the allowed distance between predicted terms and a test set of curator-assigned terms. Next we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence we use the Basic Local Alignment Search Tool, BLAST, to find similar sequences that have already been assigned GO terms by curators. A number of methods were developed that utilise terms associated with the best five matching sequences. These methods were compared against a benchmark method of simply using terms associated with the best BLAST-matched sequence (best BLAST approach).
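
The contrast between the best-BLAST benchmark and methods that pool terms from the five best hits can be sketched in Python. This is only an illustration: the abstract does not say how the five-hit methods combine terms, so the simple vote with a `min_votes` threshold, the function names, and the hit format (a best-first list of `(subject_id, terms)` pairs) are all assumptions, and the term labels used with it are placeholders rather than real GO identifiers.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

# One BLAST hit: (subject id, GO terms curators assigned to that subject).
# The list passed to the functions below is assumed to be sorted best hit first.
Hit = Tuple[str, Sequence[str]]


def best_blast_terms(hits: List[Hit]) -> Set[str]:
    """Benchmark approach: copy the terms of the single best-matching hit."""
    return set(hits[0][1]) if hits else set()


def top_k_vote_terms(hits: List[Hit], k: int = 5, min_votes: int = 2) -> Set[str]:
    """Hypothetical five-hit method: keep terms seen in at least min_votes of the top k hits."""
    votes = Counter(term for _, terms in hits[:k] for term in set(terms))
    return {term for term, n in votes.items() if n >= min_votes}
```

Benchmarking in the sense of the abstract would then compare each function's predicted set against curator-assigned terms for a held-out test set.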

5.

BLAST2SRS, a web server for flexible retrieval of related protein sequences in the SWISS-PROT and SPTrEMBL databases

Bimpikis K Budd A Linding R Gibson TJ 《Nucleic acids research》2003,31(13):3792-3794

SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.

6.

Genomic BLAST: custom-defined virtual databases for complete and unfinished genomes (total citations: 10; self-citations: 0; citations by others: 10)

Cummings L Riley L Black L Souvorov A Resenchuk S Dondoshansky I Tatusova T 《FEMS microbiology letters》2002,216(2):133-138

BLAST (Basic Local Alignment Search Tool) searches against DNA and protein sequence databases have become an indispensable tool for biomedical research. The proliferation of the genome sequencing projects is steadily increasing the fraction of genome-derived sequences in the public databases and their importance as a public resource. We report here the availability of Genomic BLAST, a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences. This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes. The organisms for such a database can be selected using either a graphic taxonomy-based tree or an alphabetical list of organism-specific sequences. The first option is designed to help explore the evolutionary relationships among organisms within a certain taxonomy group when performing BLAST searches. The use of an alphabetical list allows the user to perform a more elaborate set of selections, assembling any given number of organism-specific databases from unfinished or complete genomes. This tool, available at the NCBI web site http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi, currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

7.

Comparing compressed sequences for faster nucleotide BLAST searches

Cameron M Williams HE 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(3):349-364

8.

Characterization of tomato SSR markers developed using BAC-end and cDNA sequences from genome databases

Akio Ohyama Erika Asamizu Satomi Negoro Koji Miyatake Hirotaka Yamaguchi Satoshi Tabata Hiroyuki Fukuoka 《Molecular breeding : new strategies in plant improvement》2009,23(4):685-691

We developed nearly 700 non-redundant 2- or 3-base simple sequence repeat (SSR) markers from tomato using sequence data obtained from open genome databases. Among various types of core motifs, AT was most abundant in SSRs derived from cDNAs (~53%) and bacterial artificial chromosome (BAC) ends (~72%). There was a positive correlation between the rate of detection of polymorphic alleles (heterozygosity value; Hv) and the repeat number of the core motif in all markers showing polymorphisms among at least one pair of six cultivars or lines tested (r = 0.566**). The average Hv of BAC-end-derived SSR markers (~0.5) was higher than that of cDNA-derived markers (~0.3). These characteristics of BAC-end-derived SSRs are useful for genetic studies using closely related cultivars and lines. However, BAC-end-derived SSRs tended to cluster in centromeric regions (~80%). A scheme for the construction of a high-density linkage map of tomato is discussed.

9.

Contamination of sequence databases with adaptor sequences.

T Yoshikawa A R Sanders S D Detera-Wadleigh 《American journal of human genetics》1997,60(2):463-466

10.

Querying the public databases for sequences using complex keywords contained in the feature lines

Olivier Croce Michaël Lamarre Richard Christen 《BMC bioinformatics》2006,7(1):45

Background

High throughput technologies often require the retrieval of large data sets of sequences. Retrieval of EMBL or GenBank entries using keywords is easy using tools such as ACNUC, Entrez or SRS, but has some limitations, in particular when querying with complex keywords.

11.

cTrans: generating polypeptide databases from cDNA sequences

Xu H Yang L Xu P Tao Y Ma Z 《Proteomics》2007,7(2):177-179

cTrans is a comprehensive utility used to generate polypeptide databases from cDNA sequences. It integrates four main functions: retrieving sequences of species of interest from packages downloaded from dbEST of GenBank; format conversion; checking for and deleting vector and adaptor contamination; and translating the cDNA sequences in all six frames and selecting specific translations for database construction according to a user-defined length threshold. In addition, this utility is also applicable to cDNA sequences produced by users themselves.
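
The six-frame translation and length-threshold steps can be sketched in Python as follows. This is not cTrans itself: the function names are made up for illustration, the standard genetic code is assumed, and splitting each frame at stop codons before applying the threshold is an assumption, since the abstract does not say how translations are selected.

```python
from itertools import product
from typing import Dict, List, Tuple

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> Dict[str, str]:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse strand)."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def select_peptides(frames: Dict[str, str], min_length: int) -> List[Tuple[str, str]]:
    """Assumed selection rule: split at stops, keep stretches of at least min_length residues."""
    return [(name, pep)
            for name, protein in frames.items()
            for pep in protein.split("*")
            if len(pep) >= min_length]
```

For example, `six_frame_translations("ATGGCCTAA")` yields `"MA*"` in frame `+1`, and `select_peptides(..., 3)` on that result keeps only the three-residue stretch from frame `-1`.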

12.

PhyloBLAST: facilitating phylogenetic analysis of BLAST results

Brinkman FS Wan I Hancock RE Rose AM Jones SJ 《Bioinformatics (Oxford, England)》2001,17(4):385-387

PhyloBLAST is an internet-accessed application based on CGI/Perl programming that compares a user's protein sequence to a SwissProt/TREMBL database using BLAST2 and then allows phylogenetic analyses to be performed on selected sequences from the BLAST output. Flexible features, such as the ability to input your own multiple sequence alignment and use PHYLIP program options, provide additional web-based phylogenetic analysis functionality beyond the analysis of a BLAST result.

13.

Comparison of protein expression lists from mass spectrometry of human blood fluids using exact peptide sequences versus BLAST

Peihong Zhu Peter Bowden Voitek Pendrak Herbert Thiele Du Zhang Michael Siu Eleftherios P. Diamandis John Marshall 《Clinical proteomics》2006,2(3-4):185-203

The proteins in blood were all first expressed as mRNAs from genes within cells. There are databases of human proteins that are known to be expressed as mRNA in human cells and tissues. Proteins identified from human blood by the correlation of mass spectra that fail to match human mRNA expression products may not be correct. We compared the proteins identified in human blood by mass spectrometry by 10 different groups by correlation to human and nonhuman nucleic acid sequences. We determined whether the peptides or proteins identified by the different groups mapped to the known human proteins of the Reference Sequence (RefSeq) database. We used Structured Query Language database searches of the peptide sequences correlated to tandem mass spectrometry spectra and basic local alignment search tool analysis of the identified full-length proteins to control for correlation to the wrong peptide sequence or the existence of the same or very similar peptide sequence shared by more than one protein. Mass spectra were correlated against large protein databases that contain many sequences that may not be expressed in human beings, yet the search returned a very high percentage of peptides or proteins that are known to be found in humans. Only about 5% of proteins mapped to hypothetical sequences, which is in agreement with the reported false-positive rate of the searching algorithms. The results were highly enriched in secreted and soluble proteins and diminished in insoluble or membrane proteins. Most of the proteins identified were relatively short and showed a similar size distribution compared to the RefSeq database. At least three groups agree on a nonredundant set of 1671 types of proteins, and a nonredundant set of 3151 proteins was identified by at least three peptides.

14.

Identification of antimicrobial peptides from teleosts and anurans in expressed sequence tag databases using conserved signal sequences

Tessera V Guida F Juretić D Tossi A 《The FEBS journal》2012,279(5):724-736

15.

PatMaN: rapid alignment of short sequences to large databases

Prüfer K Stenzel U Dannemann M Green RE Lachmann M Kelso J 《Bioinformatics (Oxford, England)》2008,24(13):1530-1531

We present a tool suited for searching for many short nucleotide sequences in large databases, allowing for a predefined number of gaps and mismatches. The command-line-driven program implements a non-deterministic automata matching algorithm on a keyword tree of the search strings. Queries both with and without ambiguity codes can be searched. Search time is short for perfect matches, and retrieval time rises exponentially with the number of edits allowed. AVAILABILITY: The C++ source code for PatMaN is distributed under the GNU General Public License and has been tested on the GNU/Linux operating system. It is available from http://bioinf.eva.mpg.de/patman. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
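
The idea of matching on a keyword tree while tolerating a bounded number of edits can be sketched in Python. This is not PatMaN's implementation (which is in C++): the sketch handles mismatches only and leaves out gaps and ambiguity codes, and the function names and the `"$"` terminal marker are assumptions. Each `(node, mismatches_used)` pair is one active state of the non-deterministic automaton, and the number of active states grows with the allowed mismatches, which is consistent with the rising search time the abstract mentions.

```python
from typing import Dict, List, Tuple

END = "$"  # terminal marker; assumed never to occur in nucleotide patterns


def build_trie(patterns: List[str]) -> Dict:
    """Build a keyword tree (trie) as nested dicts; END stores the full pattern."""
    root: Dict = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[END] = pattern
    return root


def search(text: str, trie: Dict, max_mismatches: int) -> List[Tuple[int, str, int]]:
    """Return (start, pattern, mismatches) for every pattern occurrence in text."""
    hits = []
    for start in range(len(text)):
        states = [(trie, 0)]          # active automaton states for this start
        pos = start
        while states and pos < len(text):
            next_states = []
            for node, used in states:
                for ch, child in node.items():
                    if ch == END:
                        continue
                    cost = used + (ch != text[pos])   # a mismatch costs 1
                    if cost <= max_mismatches:
                        next_states.append((child, cost))
            pos += 1
            for node, used in next_states:
                if END in node:       # a whole pattern has been consumed
                    hits.append((start, node[END], used))
            states = next_states
    return hits
```

With `max_mismatches=0` only one branch survives per text position, so the search behaves like plain trie lookup; each additional allowed mismatch lets more branches stay alive.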

16.

Highly clustered zein gene sequences reveal evolutionary history of the multigene family (total citations: 3; self-citations: 0; citations by others: 3)

G Heidecker S Chaudhuri J Messing 《Genomics》1991,10(3):719-732

We have determined the nucleotide sequences of zein cDNA clones ZG14, ZG15, and ZG35. The three clones have 95 to 98% homology to the previously published sequence of clone A20, and 84% homology to sequences of the zein subfamily A30. Comparison of all sequences of the A30 and A20 subfamilies highlights the following features: the 5' nontranslated regions are 68 and 57 nucleotides in length for the A20- and A30-like mRNAs, respectively, and contain at least three repeats of the consensus sequence ACGAACAAta/gG; the majority of these genes are highly clustered as judged from pulsed-field gel electrophoresis of high molecular weight maize DNA. Furthermore, we discuss a model for the evolution of the multigene family which stresses the special importance of unequal crossing-over and gene conversion in this system.

17.

ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user's datasets (total citations: 2; self-citations: 0; citations by others: 2)

Deng W Nickle DC Learn GH Maust B Mullins JI 《Bioinformatics (Oxford, England)》2007,23(17):2334-2336

ViroBLAST is a stand-alone BLAST web interface for nucleotide and amino acid sequence similarity searches. It extends the utility of BLAST to query against multiple sequence databases and user sequence datasets, and provides a friendly output to easily parse and navigate BLAST results. ViroBLAST is readily useful for all research areas that require BLAST functions and is available online and as a downloadable archive for independent installation. Availability: http://indra.mullins.microbiol.washington.edu/blast/viroblast.php.

18.

TreeGeneBrowser: phylogenetic data mining of gene sequences from public databases

Jakobsen IB Saleeba JA Poidinger M Littlejohn TG 《Bioinformatics (Oxford, England)》2001,17(6):535-540

MOTIVATION: Sequence databases represent an enormous resource of phylogenetic information, but there is a lack of tools for accessing that information in order to assess the amount of evolutionary information in these databases that may be suitable for phylogenetic reconstruction and for identifying areas of the taxonomy that are under-represented for specific gene sequences. RESULTS: We have developed TreeGeneBrowser which allows inspection and evaluation of gene sequence data for phylogenetic reconstruction. This program improves the efficiency of identification of genes that may be useful for particular phylogenetic studies and identifies taxa and taxonomic branches that are under-represented in sequence databases.

19.

Identification of tryptic peptides from large databases using multiplexed tandem mass spectrometry: simulations and experimental results

Masselon C Pasa-Tolić L Lee SW Li L Anderson GA Harkewicz R Smith RD 《Proteomics》2003,3(7):1279-1286

Multiplexed tandem mass spectrometry (MS/MS) has recently been demonstrated as a means to increase the throughput of peptide identification in liquid chromatography (LC) MS/MS experiments. In this approach, a set of parent species is dissociated simultaneously and measured in a single spectrum (in the same manner that a single parent ion is conventionally studied), providing a gain in sensitivity and throughput proportional to the number of species that can be simultaneously addressed. In the present work, simulations performed using the Caenorhabditis elegans predicted proteins database show that multiplexed MS/MS data allow the identification of tryptic peptides from mixtures of up to ten peptides from a single dataset with only three "y" or "b" fragments per peptide and a mass accuracy of 2.5 to 5 ppm. At this level of database and data complexity, 98% of the 500 peptides considered in the simulation were correctly identified. This compares favorably with the rates obtained for classical MS/MS at more modest mass measurement accuracy. LC multiplexed Fourier transform-ion cyclotron resonance MS/MS data obtained from a 66 kDa protein (bovine serum albumin) tryptic digest sample are presented to illustrate the approach, and confirm that peptides can be effectively identified from the C. elegans database to which the protein sequence had been appended.

20.

On the reliability of DNA sequences of Ophiocordyceps sinensis in public databases

Shu Zhang Yong-Jie Zhang Xing-Zhong Liu Hong Zhang Dian-Sheng Liu 《Journal of industrial microbiology & biotechnology》2013,40(3-4):365-378

Some DNA sequences in the International Nucleotide Sequence Databases (INSD) are erroneously annotated, which has led to misleading conclusions in publications. Ophiocordyceps sinensis (syn. Cordyceps sinensis) is a fungus endemic to the Tibetan Plateau, and more than 100 populations covering almost its entire distribution area have been examined by us over recent years. In this study, using the data from authentic materials, we have evaluated the reliability of nucleotide sequences annotated as O. sinensis in the INSD. As of October 15, 2012, the INSD contained 874 records annotated as O. sinensis, including 555 records representing nuclear ribosomal DNA (63.5%), 197 representing protein-coding genes (22.5%), 92 representing random markers with unknown functions (10.5%), and 30 representing microsatellite loci (3.5%). Our analysis indicated that 39 of the 397 internal transcribed spacer entries, 27 of the 105 small subunit entries, and five of the 53 large subunit entries were incorrectly annotated as belonging to O. sinensis. For protein-coding sequences, all records of serine protease genes, the mating-type gene MAT1-2-1, the DNA lyase gene, the two largest subunits of RNA polymerase II, and the elongation factor-1α gene were correct, while 14 of the 73 β-tubulin entries were indeterminate. Genetic diversity analyses using those sequences correctly identified as O. sinensis revealed significant genetic differentiation in the fungus, although the extent of genetic differentiation varied with the gene. The relationship between O. sinensis and some other related fungal taxa is also discussed.