20 similar documents found (search time: 15 ms)
1.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. (Total citations: 820; self-citations: 54; citations by others: 820)
S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, D J Lipman. Nucleic Acids Research, 1997, 25(17):3389-3402
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
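The idea of turning a set of aligned hits into a position-specific score matrix can be illustrated with a minimal sketch. It assumes an ungapped toy alignment, a uniform pseudocount, and a four-letter toy alphabet; the function names `build_pssm` and `score_sequence` are hypothetical, and the sequence weighting and pseudocount scheme that PSI-BLAST actually uses are not reproduced here.

```python
import math
from collections import Counter


def build_pssm(alignment, background, pseudocount=1.0):
    """Build log-odds scores per column from equal-length aligned sequences.

    `background` maps each residue to its assumed background frequency.
    The uniform pseudocount is a simplification chosen for illustration.
    """
    alphabet = sorted(background)
    pssm = []
    for column in zip(*alignment):
        # Count only residues that belong to the alphabet.
        counts = Counter(c for c in column if c in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        scores = {}
        for residue in alphabet:
            freq = (counts[residue] + pseudocount) / total
            scores[residue] = math.log2(freq / background[residue])
        pssm.append(scores)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the per-position scores of an ungapped sequence against the matrix."""
    return sum(column[residue] for column, residue in zip(pssm, sequence))


# Toy example with a four-letter alphabet (hypothetical data).
toy_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_pssm = build_pssm(["ACA", "ACG", "ACA"], toy_background)
```

A sequence that matches the conserved columns, such as "ACA", scores higher against `toy_pssm` than an unrelated one such as "GTT", which is the property an iterated search relies on.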
2.
Background
The BLAST algorithm compares biological sequences to one another in order to determine shared motifs and common ancestry. However, the comparison of all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST as a cluster computer implementation of the BLAST family of sequence comparison programs for the purpose of generating pre-computed BLAST alignments and neighbour lists of NR sequences.
3.
4.
UniBLAST: a system to filter, cluster, and display BLAST results and assign unique gene annotation (Total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: More and more often, a gene is epitomized by a large number of sequences in GenBank. This high redundancy makes it very difficult to identify a unique best match for a query sequence from its BLAST results. We developed a novel program UniBLAST that filters out uninformative hits, clusters the redundant hits, groups the hits by LocusLink, and graphically displays the results. We also implemented a scoring function in UniBLAST to assign a unique gene name to a query sequence. UniBLAST significantly increases the efficiency of gene annotation. AVAILABILITY: The program is available at http://south.genomics.org.cn/software/uniblast/index.html CONTACT: uniblast@genomics.org.cn; wei@nexusgenomics.com
5.
Subcellular location is an important functional annotation of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is necessary for large-scale genome analysis. This paper describes a protein subcellular localization method which extracts features from protein profiles rather than from amino acid sequences. The protein profile represents a protein family, discards part of the sequence information that is not conserved throughout the family and therefore is more sensitive than the amino acid sequence. The amino acid compositions of the whole profile and of the N-terminus of the profile are extracted, respectively, to train and test the probabilistic neural network classifiers. On two benchmark datasets, the overall accuracies of the proposed method reach 89.1% and 68.9%, respectively. The prediction results show that the proposed method performs better than methods based on amino acid sequences. The prediction results of the proposed method are also compared with Subloc on two redundancy-reduced datasets.
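The feature extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the profile is represented as a list of per-position frequency dictionaries, composition is taken as the average frequency over positions, and the N-terminal window length of 20 is an arbitrary choice. None of these details are given in the abstract, and the names `profile_composition` and `feature_vector` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_composition(profile):
    """Average each amino acid's frequency over all profile positions.

    `profile` is assumed to be a list of dicts mapping residue -> frequency.
    """
    if not profile:
        return [0.0] * len(AMINO_ACIDS)
    return [
        sum(column.get(residue, 0.0) for column in profile) / len(profile)
        for residue in AMINO_ACIDS
    ]


def feature_vector(profile, n_terminal_length=20):
    """Concatenate whole-profile and N-terminal compositions (40 values)."""
    whole = profile_composition(profile)
    n_terminus = profile_composition(profile[:n_terminal_length])
    return whole + n_terminus


# Hypothetical two-position profile; the N-terminal window is set to 1 here.
toy_profile = [{"A": 1.0}, {"A": 0.5, "C": 0.5}]
toy_features = feature_vector(toy_profile, n_terminal_length=1)
```

The resulting 40-dimensional vector is the kind of fixed-length input a classifier such as a probabilistic neural network could be trained on.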
6.
Grayscale electron-beam lithography is a technique widely used in transferring three-dimensional structures onto the resist layer or substrate. The proximity effect caused by electron scattering in the resist imposes a severe limitation on the ultimate spatial resolution attainable by e-beam lithography. Therefore, correction of the proximity effect is essential, particularly for fine-feature, high-density circuit patterns. However, proximity effect correction is very time-consuming due to the intensive computation required in the correction procedure and the large volume of circuit data to be processed. Hence, it is an ideal candidate for distributed computing, where the otherwise-unused CPU cycles of a number of computers on a network (cluster) can be efficiently utilized. One of the characteristics of such a cluster is its heterogeneity, i.e., the available computing power varies with computer and/or time. This variation may degrade the performance of distributed computing significantly. In this paper, efficient distributed implementations of grayscale proximity effect correction on a temporally heterogeneous cluster are described, with the main emphasis on static and dynamic load balancing schemes and their optimization through effective task partitioning methods. The experimental results obtained on a cluster of Sun workstations shared by multiple users are presented with detailed discussion.
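The difference between static and dynamic load balancing on a heterogeneous cluster can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the schemes from the paper: worker speeds are fixed rather than time-varying, the static scheme is plain round-robin, the dynamic scheme hands the next task to whichever worker becomes free first, and the function names are hypothetical.

```python
import heapq


def static_makespan(task_costs, speeds):
    """Round-robin assignment decided up front, ignoring worker speed."""
    finish = [0.0] * len(speeds)
    for index, cost in enumerate(task_costs):
        worker = index % len(speeds)
        finish[worker] += cost / speeds[worker]
    return max(finish)


def dynamic_makespan(task_costs, speeds):
    """Greedy self-scheduling: the next task goes to the earliest-free worker."""
    # Heap entries are (time at which the worker becomes free, worker index).
    heap = [(0.0, worker) for worker in range(len(speeds))]
    heapq.heapify(heap)
    makespan = 0.0
    for cost in task_costs:
        free_at, worker = heapq.heappop(heap)
        done = free_at + cost / speeds[worker]
        makespan = max(makespan, done)
        heapq.heappush(heap, (done, worker))
    return makespan


# Eight equal tasks on one slow and one fast worker (hypothetical numbers).
toy_tasks = [1.0] * 8
toy_speeds = [1.0, 3.0]
static_time = static_makespan(toy_tasks, toy_speeds)
dynamic_time = dynamic_makespan(toy_tasks, toy_speeds)
```

In this toy setting the static split leaves the fast worker idle while the slow one finishes its share, whereas the dynamic scheme lets the fast worker absorb most of the tasks and finishes in roughly half the time.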
7.
8.
9.
A homology-based structure prediction method ideally gives both a correct fold assignment and an accurate query-template alignment. In this article we show that the combination of two existing methods, PSI-BLAST and threading, leads to significant enhancement in the success rate of fold recognition. The combined approach, termed COBLATH, also yields much higher alignment accuracy than found in previous studies. It consists of two-way searches both by PSI-BLAST and by threading. In the PSI-BLAST portion, a query is used to search for hits in a library of potential templates and, conversely, each potential template is used to search for hits in a library of queries. In the threading portion, the scoring function is the sum of a sequence profile and a 6x6 substitution matrix between predicted query and known template secondary structure and solvent exposure. "Two-way" in threading means that the query's sequence profile is used to match the sequences of all potential templates and the sequence profiles of all potential templates are used to match the query's sequence. When tested on a set of 533 nonhomologous proteins, COBLATH was able to assign folds for 390 (73%). Among these 390 queries, 265 (68%) had root-mean-square deviations (RMSDs) of less than 8 Å between predicted and actual structures. Such a high success rate and accuracy make COBLATH an ideal tool for structural genomics.
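The RMSD measure used above to compare predicted and actual structures can be computed as in the following sketch. It assumes the two coordinate sets are already superimposed and paired atom by atom; the optimal superposition step that structure comparisons normally perform first is not included, and the function name `rmsd` is a hypothetical helper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by a 3-4-5 vector (hypothetical coordinates).
toy_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (4, 4, 0)])
```

With every atom displaced by the same vector of length 5, the RMSD is 5 in the same units as the coordinates.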
10.
SUMMARY: BLAST2GENE is a program that allows a detailed analysis of genomic regions containing completely or partially duplicated genes. From a BLAST (or BL2SEQ) comparison of a protein or nucleotide query sequence with any genomic region of interest, BLAST2GENE processes all high-scoring pairwise alignments (HSPs) and provides the disposition of all independent copies along the genomic fragment. The results are provided in text and PostScript formats to allow an automatic and visual evaluation of the respective region. AVAILABILITY: The program is available upon request from the authors. A web server of BLAST2GENE is maintained at http://www.bork.embl.de/blast2gene
11.
A new program, PSI Protein Classifier, which generalizes the results of both successive and independent iterations of the PSI-BLAST program, was developed. The technical capabilities of the program are described and illustrated by two examples. An iterative screening of the amino acid sequence database detected potential evolutionary relationships between the GH5, GH13, GH27, GH31, GH36, GH66, GH101 and GH114 families of glycoside hydrolases. Analysis of the statistically significant sequence similarity (E-value analysis) allowed us to divide the family GH31 into 38 subfamilies.
12.
Wajdi Louati, Ines Houidi, Manel Kharrat, Djamal Zeghlache, Hormuzd M. Khosravi. Cluster Computing, 2008, 11(4):355-372
This paper presents the design, implementation and evaluation of an extensible, scalable and distributed heterogeneous cluster based programmable router, called DHCR (Distributed Heterogeneous Cluster based Router), capable of supporting and deploying network services at run time. DHCR is a software IP router relying on a heterogeneous cluster composed of separate computers with different hardware and software architecture capabilities, running different operating systems and interconnected through a high-speed network connection. The DHCR ensures dynamic deployment of services and distributed control of router components (forwarding and routing elements) over heterogeneous system environments. The DHCR combines the IETF ForCES (Forwarding and Control Element Separation) architecture with software component technologies to meet the requirements of next-generation software routers. To ensure reliable and transparent communication between separated, decentralized and heterogeneous router components, CORBA-based middleware technology is used to support the DHCR internal communication. The paper also explores the use of the CORBA Component Model (CCM) to design and implement a modular, distributed and heterogeneous forwarding path for the DHCR router architecture. The CCM-based forwarding plane ensures dynamic reconfiguration of the data path topology needed for low-level service deployment. Results on achievable performance using the proposed DHCR router are reported.
13.
In this paper, we present a framework that provides users with a simple, convenient and powerful way to deploy multiple message queue systems on demand in a Hadoop cluster. Specifically, we leverage Apache Kafka, one of the state-of-the-art distributed message queue systems, which can achieve high throughput, low latency, and good load balancing. Our framework automates setting up and starting Kafka brokers on the fly, so users can quickly adopt Kafka without spending much effort on installation and configuration. In addition, the framework lets users run their Kafka-based applications without detailed knowledge of the Hadoop YARN APIs and underlying mechanisms. We present a use case of the framework to evaluate Kafka’s performance with various test cases and working scenarios. The experimental results allow Kafka’s potential users to see how different settings influence queuing performance.
14.
Paquola AC, Machado AA, Reis EM, Da Silva AM, Verjovski-Almeida S. Bioinformatics (Oxford, England), 2003, 19(8):1035-1036
SUMMARY: Zerg is a library of sub-routines that parses the output from all NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn and Tblastx) and returns the attributes of a BLAST report to the user. It is optimized for speed, being especially useful for large-scale genomic analysis. Benchmark tests show that Zerg is over two orders of magnitude faster than some widely used BLAST parsers. AVAILABILITY: http://bioinfo.iq.usp.br/zerg
15.
Jones CE, Schwerdt J, Bretag TA, Baumann U, Brown AL. Bioinformatics (Oxford, England), 2008, 24(22):2628-2629
GOSLING is a web-based protein function annotator that uses a decision tree-derived rule set to quickly predict Gene Ontology terms for a protein. A score is assigned to each term prediction that is indicative of the accuracy of the prediction. Due to its speed and accuracy, GOSLING is ideally suited for high-throughput annotation tasks. AVAILABILITY: https://www.sapac.edu.au/gosling
16.
Background
Sequencing of EST and BAC end datasets is no longer limited to large research groups. Drops in per-base pricing have made high-throughput sequencing accessible to individual investigators. However, there are few options available which provide a free and user-friendly solution to the BLAST result storage and data mining needs of biologists.
17.
BLAST+: architecture and applications (Total citations: 5; self-citations: 0; citations by others: 5)
Christiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, Thomas L Madden. BMC Bioinformatics, 2009, 10(1):421
Background
Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.
18.
MOTIVATION: The deluge of biological information from different genomic initiatives and the rapid advancement in biotechnologies have made bioinformatics tools an integral part of modern biology. Among the widely used sequence alignment tools, BLAST and PSI-BLAST are arguably the most popular. PSI-BLAST, which uses an iterative profile position specific score matrix (PSSM)-based search strategy, is more sensitive than BLAST in detecting weak homologies, thus making it suitable for remote homolog detection. Many refinements have been made to improve PSI-BLAST, and its computational efficiency and high specificity have been much touted. Nevertheless, corruption of its profile via the incorporation of false positive sequences remains a major challenge. RESULTS: We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator for true and false hits. Accordingly, we have derived a formula that utilizes the E-values from these two PSI-BLAST iterations to obtain a figure of merit for rank-ordering the hits. Our verification results based on a 'gold-standard' test set indicate that this figure of merit does indeed delineate true positives from false positives better than PSI-BLAST E-values. Perhaps what is most notable about this strategy is that it is simple and straightforward to implement.
19.