1.

An SCI-Based PC Cluster Utilizing Coherent Network Cache

Sang-Hwa Chung, Soo-Cheol Oh 《Cluster computing》2003,6(2):153-159

It is extremely important to minimize network access time when constructing a high-performance PC cluster system. For an SCI-based PC cluster, the network access time can be reduced by maintaining a network cache in each cluster node. This paper presents a Network-Cache-Coherent-NUMA (NCC-NUMA) card that utilizes a network cache for SCI-based PC clustering. The NCC-NUMA card plugs directly into the PCI slot of each node and contains shared memory, a network cache, and interconnection modules. The network cache is maintained for the shared memory on the PCI bus of cluster nodes. The coherency mechanism between the network cache and the shared memory is based on the IEEE SCI standard. Both a simulator and an NCC-NUMA prototype card were developed to evaluate the performance of the system. In the experiments, the cluster system with the NCC-NUMA card showed considerable improvements over an SCI-based cluster without a network cache.

2.

Implementations of BLAST for Parallel Computers (total citations: 2; self-citations: 0; citations by others: 2)

Juulich Anne 《Bioinformatics (Oxford, England)》1995,11(1):3-6

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

3.

ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user's datasets (total citations: 2; self-citations: 0; citations by others: 2)

Deng W Nickle DC Learn GH Maust B Mullins JI 《Bioinformatics (Oxford, England)》2007,23(17):2334-2336

ViroBLAST is a stand-alone BLAST web interface for nucleotide and amino acid sequence similarity searches. It extends the utility of BLAST to query against multiple sequence databases and user sequence datasets, and provides user-friendly output that makes BLAST results easy to parse and navigate. ViroBLAST is readily useful for all research areas that require BLAST functions and is available online and as a downloadable archive for independent installation. Availability: http://indra.mullins.microbiol.washington.edu/blast/viroblast.php.

4.

Zerg: a very fast BLAST parser library

Paquola AC Machado AA Reis EM Da Silva AM Verjovski-Almeida S 《Bioinformatics (Oxford, England)》2003,19(8):1035-1036

Summary: Zerg is a library of subroutines that parses the output from all NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn and Tblastx) and returns the attributes of a BLAST report to the user. It is optimized for speed, which makes it especially useful for large-scale genomic analysis. Benchmark tests show that Zerg is over two orders of magnitude faster than some widely used BLAST parsers. AVAILABILITY: http://bioinfo.iq.usp.br/zerg

5.

Design issues for a high-performance distributed shared memory on symmetrical multiprocessor clusters

Sumit Roy, Vipin Chaudhary 《Cluster computing》1999,2(3):177-186

Clusters of Symmetrical Multiprocessors (SMPs) have recently become the norm for high-performance, economical computing solutions. Multiple nodes in a cluster can be used for parallel programming with a message-passing library. An alternative approach is to use a software Distributed Shared Memory (DSM) to provide a view of shared memory to the application programmer. This paper describes Strings, a high-performance distributed shared memory system designed for such SMP clusters. The distinguishing feature of this system is its fully multi-threaded runtime system, which uses kernel-level threads. Strings allows multiple application threads to run on each node in a cluster. Since most modern UNIX systems can multiplex these threads on kernel-level lightweight processes, applications written using Strings can exploit multiple processors on an SMP machine. This paper describes some of the architectural details of the system and illustrates the performance improvements with benchmark programs from the SPLASH-2 suite, some computational kernels, and a full-fledged application. Using multiple processes on SMP nodes provides good speedups for only a few of the programs. Multiple application threads can improve performance in some cases, but other programs show a slowdown. If kernel threads are used in addition, overall performance improves significantly in all programs tested. Other design decisions also have a beneficial impact, though to a lesser degree. This revised version was published online in July 2006 with corrections to the Cover Date.

6.

Batch Blast Extractor: an automated blastx parser application

Pirooznia M Perkins EJ Deng Y 《BMC genomics》2008,9(Suppl 2):S10

MOTIVATION: BLAST programs are very efficient at finding similarities between sequences. However, for large datasets such as ESTs, the information must be extracted manually from the batch BLAST output. This can be time-consuming, insufficient, and inaccurate. A parser application would therefore be extremely useful for extracting information from BLAST outputs. RESULTS: We have developed a Java application, Batch Blast Extractor, with a user-friendly graphical interface to extract information from BLAST output. The application generates a tab-delimited text file that can be easily imported into any statistical package such as Excel or SPSS for further analysis. For each BLAST hit, the program obtains and saves the essential features from the BLAST output file that would allow further analysis. Because the program is written in Java, it is OS-independent; it works on both Windows and Linux with Java 1.4 and higher. It is freely available from: http://mcbc.usm.edu/BatchBlastExtractor/

7.

BLAST2SRS, a web server for flexible retrieval of related protein sequences in the SWISS-PROT and SPTrEMBL databases

Bimpikis K Budd A Linding R Gibson TJ 《Nucleic acids research》2003,31(13):3792-3794

SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.

8.

MULTBLAST: A web application for multiple BLAST searches

Mittler T Levy M Chad F Karen S 《Bioinformation》2010,5(5):224-226

Basic Local Alignment Search Tool (BLAST) allows the comparison of one or more query sequences to a database of sequences and identifies those sequences that are similar to the query above a user-defined threshold. We have developed a user-friendly web application, MULTBLAST, that runs a series of BLAST searches on a user-supplied list of proteins against one or more target protein or nucleotide databases. The application pre-processes the data, launches each individual BLAST search on the University of Nevada, Reno's TimeLogic DeCypher® system (available from Active Motif, Inc.), and retrieves and combines all the results into a simple, easy-to-read output file. The output file presents the list of query proteins, followed by the BLAST results for the matching sequences from each target database in consecutive columns. This format is especially useful for comparing the results from the different target databases, or for analyzing the results while keeping the identity of each target database separate.

Availability

The application is available at http://blastpipe.biochem.unr.edu/

9.

Saturated BLAST: an automated multiple intermediate sequence search used to detect distant homology

Li W Pio F Pawłowski K Godzik A 《Bioinformatics (Oxford, England)》2000,16(12):1105-1110

MOTIVATION: Two proteins can have a similar three-dimensional structure and biological function but have sequences sufficiently different that traditional protein sequence comparison algorithms do not identify their relationship. The desire to identify such relations has led to the development of more sensitive sequence alignment strategies. One such strategy is the Intermediate Sequence Search (ISS), which connects two proteins through one or more intermediate sequences. In its brute-force implementation, ISS repeatedly uses the results of the previous query as new search seeds, making it time-consuming and difficult to analyze. RESULTS: Saturated BLAST is a package that performs ISS in an efficient and automated manner. It was developed using Perl and Perl/Tk and implemented on the Linux operating system. Starting with a protein sequence, Saturated BLAST runs a BLAST search and identifies representative sequences for the next generation of searches. The procedure is repeated until convergence or until some predefined criteria are met. Saturated BLAST has a friendly graphical user interface, a built-in BLAST result parser, several multiple alignment tools, clustering algorithms, and various filters for eliminating false positives, thereby providing an easy way to edit, visualize, analyze, monitor, and control the search. Besides detecting remote homologies, Saturated BLAST can be used to maintain protein family databases and to search for new genes in genomic databases.
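The generation-by-generation loop behind ISS can be illustrated with a minimal Python sketch. It assumes a hypothetical `search` callable that stands in for one BLAST run and returns the identifiers of the significant hits; unlike Saturated BLAST, every new hit becomes a seed (no selection of representative sequences and no false-positive filters), and iteration stops at convergence or after `max_generations`.

```python
from typing import Callable, Dict, Iterable, Set

def intermediate_sequence_search(start: str,
                                 search: Callable[[str], Iterable[str]],
                                 max_generations: int = 10) -> Dict[str, int]:
    """Map every sequence reached from `start` to the generation in which it was first found."""
    found: Dict[str, int] = {start: 0}
    frontier: Set[str] = {start}
    for gen in range(1, max_generations + 1):
        next_frontier: Set[str] = set()
        for seed in frontier:
            for hit in search(seed):          # `search` is a hypothetical stand-in for one BLAST run
                if hit not in found:          # only unseen sequences seed the next generation
                    found[hit] = gen
                    next_frontier.add(hit)
        if not next_frontier:                 # convergence: no new sequences were found
            break
        frontier = next_frontier
    return found
```

With a toy hit table such as `{"A": ["B"], "B": ["A", "C"], "C": ["D"]}`, sequence D is reached from A only through the intermediates B and C, which is exactly the relationship a single direct search would miss.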

10.

Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm

Zhang H 《Bioinformatics (Oxford, England)》2003,19(11):1391-1396

MOTIVATION: The popular BLAST algorithm is based on a local similarity search strategy, so its high-scoring segment pairs (HSPs) carry no global alignment information. When scientists use BLAST to search for a target protein or DNA sequence in a huge database like the human genome map, repeated fragments, homologues, or pseudogenes in the genome often fill the BLAST result with redundant HSPs. A computational strategy is therefore needed to alleviate this problem. RESULTS: In the gene discovery group of Celera Genomics, I developed a two-step method, i.e. a BLAST step plus an LIS step, to align thousands of cDNA and protein sequences to the human genome map. The LIS step is based on a mature computational algorithm, the Longest Increasing Subsequence (LIS) algorithm. The idea is to use the LIS algorithm to find the longest series of consecutive HSPs in the BLAST output. This BLAST+LIS strategy can be used as an independent alignment tool or as a complementary tool for other alignment programs such as Sim4 and GeneWise. It can also serve as a general-purpose BLAST result processor in all sorts of BLAST searches. Two examples from Celera are shown in this paper.
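The LIS step described above can be sketched in Python, assuming each HSP is reduced to a hypothetical `(query_start, subject_start)` pair on the plus strand and that a consistent chain is one in which both coordinates strictly increase. This is the standard O(n log n) patience-sorting form of LIS, not the author's Celera implementation; it ignores HSP lengths, overlaps, and scores.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical simplified HSP: (query_start, subject_start). Real BLAST HSPs carry more fields.
HSP = Tuple[int, int]

def longest_consistent_chain(hsps: List[HSP]) -> List[HSP]:
    """Return a longest chain of HSPs whose query and subject starts both strictly increase."""
    # Sort by query start; ties get descending subject start so that two HSPs
    # with the same query start can never both join one chain.
    order = sorted(hsps, key=lambda h: (h[0], -h[1]))
    tails: List[int] = []      # tails[k]: smallest subject start ending a chain of length k + 1
    tail_idx: List[int] = []   # index into `order` of that chain-ending HSP
    prev: List[int] = [-1] * len(order)  # predecessor links used to rebuild the chain
    for i, (_, s) in enumerate(order):
        k = bisect_left(tails, s)        # bisect_left enforces a strict increase
        if k == len(tails):
            tails.append(s)
            tail_idx.append(i)
        else:
            tails[k] = s
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    # Walk back from the end of the longest chain.
    chain: List[HSP] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

For example, the HSPs `(10, 100), (50, 900), (30, 300), (70, 500), (90, 700)` yield the chain `(10, 100), (30, 300), (70, 500), (90, 700)`; the out-of-order hit at subject position 900, such as a repeat or pseudogene, is dropped.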

11.

BioParser

Catanho M Mascarenhas D Degrave W de Miranda AB 《Applied bioinformatics》2006,5(1):49-53

The widely used programs BLAST (in this article, 'BLAST' includes both the National Center for Biotechnology Information [NCBI] BLAST and the Washington University version WU BLAST) and FASTA for similarity searches in nucleotide and protein databases usually produce copious output. When large query sets are used, human inspection rapidly becomes impractical. BioParser is a Perl program for parsing BLAST and FASTA reports. Making extensive use of the BioPerl toolkit, the program filters, stores and returns components of these reports in either ASCII or HTML format. BioParser can also automatically populate a local MySQL database with the parsed information, allowing subsequent filtering of hits and/or alignments with specific attributes. BioParser is therefore a valuable tool for large-scale similarity analyses: it improves access to the information present in BLAST or FASTA reports, facilitates extraction of useful information from large sets of sequence alignments, and allows easy handling and processing of the data. AVAILABILITY: BioParser is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 2.0 license terms (http://creativecommons.org/licenses/by-nc-nd/2.0/) and is available upon request. Additional information can be found at the BioParser website (http://www.dbbm.fiocruz.br/BioParser.html).

12.

Protein intrinsic disorder toolbox for comparative analysis of viral proteins

Goh Gerard Kian-Meng, Dunker A Keith, Uversky Vladimir N 《BMC genomics》2008,9(2):1-22

13.

GenBank. (total citations: 2; self-citations: 1; citations by others: 2)

D A Benson, M S Boguski, D J Lipman, J Ostell, B F Ouellette 《Nucleic acids research》1998,26(1):1-7

The GenBank(R) sequence database (http://www.ncbi.nlm.nih.gov/) incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (WWW) or Sequin programs to send their sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE(R) abstracts from published articles describing the sequences are also included as an additional source of biological annotation. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, e-mail and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services of interest to biologists.

14.

BeoBLAST: distributed BLAST and PSI-BLAST on a Beowulf cluster

Grant JD Dunbrack RL Manion FJ Ochs MF 《Bioinformatics (Oxford, England)》2002,18(5):765-766

BeoBLAST is an integrated software package that handles user requests and distributes BLAST and PSI-BLAST searches to nodes of a Beowulf cluster, thus providing a simple way to implement a scalable BLAST system on top of relatively inexpensive computer clusters. Additionally, BeoBLAST offers a number of novel search features through its web interface, including the ability to perform simultaneous searches of multiple databases with multiple queries, and the ability to start a search using the PSSM generated from a previous PSI-BLAST search on a different database. The underlying system can also handle automated querying for high-throughput work. AVAILABILITY: Source code is available under the GNU General Public License at http://bioinformatics.fccc.edu/

15.

Multi-query sequence BLAST output examination with MuSeqBox

Xing L Brendel V 《Bioinformatics (Oxford, England)》2001,17(8):744-745

SUMMARY: MuSeqBox is a program to parse BLAST output and store attributes of BLAST hits in tabular form. The user can apply a number of selection criteria to filter out hits with particular attributes. MuSeqBox provides a powerful annotation tool for large sets of query sequences that are simultaneously compared against a database with any of the standard stand-alone or network-client BLAST programs. We discuss one such application: the annotation and analysis of EST collections. AVAILABILITY: The program was written in standard C++ and is freely available to noncommercial users by request from the authors. The program is also available over the web at http://bioinformatics.iastate.edu/bioinformatics2go/mb/MuSeqBox.html.
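The parse-then-filter workflow shared by MuSeqBox, Batch Blast Extractor, and similar parsers can be sketched in a few lines of Python. This sketch swaps in NCBI BLAST+ tabular output (`-outfmt 6`, default 12 columns) rather than the text reports those tools read, and the `Hit` record and `select` thresholds are hypothetical illustrations, not any tool's actual selection criteria.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Default column order of NCBI BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

@dataclass
class Hit:
    """Hypothetical record keeping a few attributes of one BLAST hit."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float

def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        rec = dict(zip(FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        int(rec["length"]), float(rec["evalue"]), float(rec["bitscore"])))
    return hits

def select(hits: List[Hit], max_evalue: float = 1e-5,
           min_pident: float = 0.0, min_length: int = 0) -> List[Hit]:
    """Keep only hits that pass simple user-chosen thresholds."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_pident and h.length >= min_length]
```

Keeping parsing and selection separate means the parsed table can be stored once and re-filtered with different criteria, which is the kind of interactive filtering the abstract describes.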

16.

CLAST: CUDA implemented large-scale alignment search tool

Masahiro Yano, Hiroshi Mori, Yutaka Akiyama, Takuji Yamada, Ken Kurokawa 《BMC bioinformatics》2014,15(1)

Background

Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets.

Results

We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows–Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node.

Conclusions

CLAST achieved very high speed (similar to the Burrows–Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0406-y) contains supplementary material, which is available to authorized users.
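CLAST runs global alignment by default. The recurrence behind global alignment scoring (Needleman-Wunsch) can be shown with a minimal CPU sketch in Python. The linear gap penalty and the match, mismatch, and gap values are illustrative assumptions; this is not CLAST's CUDA kernel or its scoring scheme.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score with a linear gap penalty, keeping only two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]   # row 0: a prefix of b aligned to gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)            # column 0: a prefix of a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag,                    # align a[i-1] with b[j-1]
                         prev[j] + gap,           # gap in b
                         cur[j - 1] + gap)        # gap in a
        prev = cur
    return prev[-1]
```

Unlike a local alignment, the score always covers both sequences end to end; for example, `"ACGT"` against `"AGT"` scores 1 (three matches and one gap).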

17.

Compressed indexing and local alignment of DNA

Lam TW Sung WK Tam SL Wong CK Yiu SM 《Bioinformatics (Oxford, England)》2008,24(6):791-797

MOTIVATION: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in main memory. For example, a BWT index for the human genome (about 3 billion characters) occupies only around 1 GB. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often for finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all the local alignments between a text T and a pattern P in O(|T||P|) time, but this would be too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but is not guaranteed to find all local alignments. RESULTS: In this article, we show how to build software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3,000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O(|T|^0.628 |P|) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to replace BLAST, as BLAST is still several times faster than BWT-SW for long patterns and is accurate enough in most cases (we have used BWT-SW to check the accuracy of BLAST and found that BLAST only rarely misses significant alignments). AVAILABILITY: www.cs.hku.hk/~ckwong3/bwtsw CONTACT: twlam@cs.hku.hk.
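The exact-matching capability that BWT-SW builds on can be illustrated with a toy FM-index in Python. This sketch covers only BWT construction and backward search for exact pattern counts, the operation the abstract says these indexes were designed for; it does not implement BWT-SW's local-alignment dynamic programming. The naive suffix sort and the full occurrence tables are assumptions suited to short strings, not to a genome.

```python
from typing import Dict, List

def bwt(text: str) -> str:
    """BWT of text + '$', built by sorting suffixes (toy inputs only)."""
    s = text + "$"                                 # '$' sorts before A, C, G, T
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in sa)           # s[-1] is '$' when i == 0

class FMIndex:
    def __init__(self, text: str):
        self.L = bwt(text)
        # C[c]: number of characters in text + '$' that are smaller than c
        self.C: Dict[str, int] = {}
        total = 0
        for c in sorted(set(self.L)):
            self.C[c] = total
            total += self.L.count(c)
        # occ[c][i]: occurrences of c in L[:i]
        self.occ: Dict[str, List[int]] = {c: [0] * (len(self.L) + 1) for c in self.C}
        for i, ch in enumerate(self.L):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count(self, pattern: str) -> int:
        """Number of exact occurrences of pattern, found by backward search."""
        lo, hi = 0, len(self.L)                    # half-open range of matching suffixes
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

Each pattern character narrows a range of sorted suffixes in constant time using `C` and `occ`, so the text itself is never scanned; BWT-SW applies the same range narrowing while exploring alignment paths.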

18.

Parallelisation of the BLAST algorithm

Qi Y Lin F 《Cellular & molecular biology letters》2005,10(2):281-285

Retrieving homologous DNA and protein sequences from existing databases is a fundamental routine in bioinformatics research. Programs of the NCBI BLAST family are widely used for this purpose. We evaluated paraBLAST, a parallelised version of the NCBI BLAST algorithm, using a Message Passing Interface (MPI) on a multi-node compute cluster. Here, we propose static and dynamic database-partitioning schemes based on the availability of the cluster. We evaluated the application of the algorithm in querying nucleotide sequences against a large-scale sequence database with different numbers of database partitions, and hence, different numbers of CPUs. Since the program's tasks are performed independently of each other, each available CPU can run its own copy of BLAST queries, resulting in reduced interference between processes and leading to a highly scalable solution.
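The abstract does not spell out its partitioning schemes, so the following is only a hypothetical static scheme, sketched in Python: database sequences are assigned largest-first to whichever partition currently holds the fewest residues, so that each CPU would search a slice of roughly equal size. How paraBLAST actually splits the database, and how partitions are formatted and dispatched over MPI, is not shown.

```python
import heapq
from typing import Dict, List

def partition_database(seq_lengths: Dict[str, int], n_parts: int) -> List[List[str]]:
    """Hypothetical static split: largest sequences first, each to the lightest partition."""
    heap = [(0, k) for k in range(n_parts)]    # (residues assigned so far, partition index)
    heapq.heapify(heap)
    parts: List[List[str]] = [[] for _ in range(n_parts)]
    # Sort by descending length (ties broken by id) so large sequences are placed first.
    for seq_id, length in sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        load, k = heapq.heappop(heap)          # partition with the fewest residues so far
        parts[k].append(seq_id)
        heapq.heappush(heap, (load + length, k))
    return parts
```

Because each partition is searched independently, the overall wall time is set by the largest partition, which is why this sketch balances residue counts rather than sequence counts.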

19.

CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing

Guoguang Zhao, Dechao Bu, Changning Liu, Jing Li, Jian Yang, Zhiyong Liu, Yi Zhao, Runsheng Chen 《Protein & Cell》2012,3(2):148

Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel LCA algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% of MEGAN's when running on a fat node. CloudLCA can be run on one multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis; it generates the same output as MEGAN, which can be imported directly into MEGAN to complete the subsequent analysis. Moreover, CloudLCA is a general solution for finding the lowest common ancestor and can be applied in other fields that require an LCA algorithm.
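The per-read computation that CloudLCA parallelizes can be sketched in Python: given the taxa of one read's hits and a taxonomy stored as a child-to-parent map, the read is assigned to the lowest taxon that is an ancestor of all of them. The parent-map representation and the convention that the root is its own parent are assumptions of this sketch; CloudLCA's distribution across nodes and MEGAN's hit filters are not shown.

```python
from typing import Dict, Iterable, List, Optional

def path_to_root(taxon: str, parent: Dict[str, str]) -> List[str]:
    """The taxon, its parent, grandparent, ... up to the root (assumed to be its own parent)."""
    path = [taxon]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path

def lowest_common_ancestor(taxa: Iterable[str], parent: Dict[str, str]) -> Optional[str]:
    """Lowest taxon shared by the root paths of all given taxa, or None if taxa is empty."""
    common: Optional[List[str]] = None
    for t in taxa:
        path = path_to_root(t, parent)
        if common is None:
            common = path
        else:
            ancestors = set(path)
            common = [a for a in common if a in ancestors]  # keeps leaf-to-root order
    return common[0] if common else None
```

Each read is handled independently, which is what makes the LCA step straightforward to spread across processors and nodes.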

20.

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. (total citations: 820; self-citations: 54; citations by others: 820)

S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, D J Lipman 《Nucleic acids research》1997,25(17):3389-3402

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
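The idea of a position-specific score matrix can be illustrated with a deliberately simplified Python sketch. It builds log2-odds scores per column from equal-length aligned sequences using a flat pseudocount and a uniform background; PSI-BLAST's actual construction (sequence weighting, data-dependent pseudocounts, real background frequencies) is more elaborate and is not reproduced here.

```python
import math
from typing import Dict, List

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Simplified PSSM: log2-odds per column against a uniform background frequency."""
    background = 1.0 / len(AMINO)
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned), "sequences must be aligned to equal length"
    pssm = []
    for col in range(length):
        counts = {a: pseudocount for a in AMINO}   # pseudocounts keep unseen residues finite
        for s in aligned:
            if s[col] in counts:                   # ignore gaps and non-standard residues
                counts[s[col]] += 1
        total = sum(counts.values())
        pssm.append({a: math.log2((counts[a] / total) / background) for a in AMINO})
    return pssm

def score(pssm: List[Dict[str, float]], segment: str) -> float:
    """Ungapped score of a segment placed at the start of the profile."""
    return sum(col[res] for col, res in zip(pssm, segment))
```

Unlike a single substitution matrix, each column rewards the residues actually observed at that position of the aligned family, which is what lets a profile pick up weak but consistent similarities.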