Similar Articles
20 similar articles found.
1.
2.
3.
Mastering seeds for genomic size nucleotide BLAST searches   Total citations: 1, self-citations: 0, citations by others: 1
One of the most common activities in bioinformatics is the search for similar sequences. These searches are usually carried out with the help of programs from the NCBI BLAST family. As the majority of searches are routinely performed with default parameters, a question that should be addressed is how reliable the results obtained using the default parameter values are, i.e. what fraction of potential matches have been retrieved by these searches. Our primary focus is on the initial hit parameter, also known as the seed or word, used by the NCBI BLASTn, MegaBLAST and other similar programs in searches for similar nucleotide sequences. We show that the use of default values for the initial hit parameter can have a big negative impact on the proportion of potentially similar sequences that are retrieved. We also show how the hit probability of different seeds varies with the minimum length and similarity of sequences desired to be retrieved and describe methods that help in determining appropriate seeds. The experimental results described in this paper illustrate situations in which these methods are most applicable and also show the relationship between the various BLAST parameters.
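To make the seed-length trade-off concrete, the sketch below estimates the probability that an ungapped alignment contains at least one contiguous exact-match word (seed) of a given size. It assumes every alignment column is identical independently with the same probability, which is a simplification; real BLAST searches also perform gapped extension and, in some modes, use discontiguous seeds. The function name is hypothetical and this is not the method described in the paper.

```python
def contiguous_seed_hit_probability(length: int, identity: float, word_size: int) -> float:
    """Probability that an ungapped alignment of `length` columns contains at
    least one run of `word_size` consecutive identical columns.

    Simplifying assumption: each column is identical independently with
    probability `identity`.
    """
    if word_size <= 0:
        return 1.0
    # state[k] = probability that no seed hit has occurred yet and the
    # current run of identical columns has length k (0 <= k < word_size)
    state = [0.0] * word_size
    state[0] = 1.0
    hit = 0.0
    for _ in range(length):
        new_state = [0.0] * word_size
        for k, prob in enumerate(state):
            if prob == 0.0:
                continue
            # a mismatching column resets the run
            new_state[0] += prob * (1.0 - identity)
            # a matching column extends the run; reaching word_size is a seed hit
            if k + 1 == word_size:
                hit += prob * identity
            else:
                new_state[k + 1] += prob * identity
        state = new_state
    return hit
```

For example, evaluating contiguous_seed_hit_probability(100, 0.8, 11) and contiguous_seed_hit_probability(100, 0.8, 28) contrasts the word sizes commonly used by default in BLASTn (11) and MegaBLAST (28) for a 100-column alignment at 80% identity.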

4.
Mittler T, Levy M, Chad F, Karen S. Bioinformation, 2010, 5(5): 224-226
Basic Local Alignment Search Tool (BLAST) allows the comparison of one or more query sequences to a database of sequences and identifies those sequences that are similar to the query above a user-defined threshold. We have developed a user-friendly web application, MULTBLAST, that runs a series of BLAST searches on a user-supplied list of proteins against one or more target protein or nucleotide databases. The application pre-processes the data, launches each individual BLAST search on the University of Nevada, Reno's TimeLogic DeCypher® system (available from Active Motif, Inc.), and retrieves and combines all the results into a simple, easy-to-read output file. The output file presents the list of query proteins, followed by the BLAST results for the matching sequences from each target database in consecutive columns. This format is especially useful either for comparing the results from different target databases or for analyzing the results while keeping the identity of each target database separate.

Availability

The application is available at http://blastpipe.biochem.unr.edu/
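The combine step described for MULTBLAST (one row per query, then one block of columns per target database) can be sketched as follows. This is not MULTBLAST's code: the input layout (a list of query, subject, E-value hits per database, as might be parsed from BLAST tabular output), the choice of keeping only the lowest-E-value hit, and all function names are assumptions for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical hit record: (query_id, subject_id, e_value).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep the subject with the lowest E-value for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def combine_results(queries: List[str], per_db: Dict[str, List[Hit]]) -> str:
    """Tab-separated table: one row per query and one (subject, E-value)
    column pair per target database, in the order the databases are given."""
    db_names = list(per_db)
    best_by_db = {db: best_hits(hits) for db, hits in per_db.items()}
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = ["query"]
    for db in db_names:
        header += [f"{db}_subject", f"{db}_evalue"]
    writer.writerow(header)
    for query in queries:
        row = [query]
        for db in db_names:
            match = best_by_db[db].get(query)
            # "-" marks a query with no hit in this database
            row += ["-", "-"] if match is None else [match[0], f"{match[1]:.2g}"]
        writer.writerow(row)
    return out.getvalue()
```

Keeping each database in its own column pair mirrors the stated goal of comparing databases side by side while keeping their results separate.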

5.
In order to predict biologically significant attributes such as function from protein sequences, searching large databases for homologous proteins is a common practice. In particular, BLAST and HMMER are widely used in a variety of biological fields. However, sequence-homologous proteins determined by BLAST and proteins having the same domains predicted by HMMER are not always functionally equivalent, even though their sequences align with high similarity. Thus, accurately assigning functionally equivalent proteins from aligned sequences remains a challenge in bioinformatics. We have developed the FEP-BH algorithm to predict functionally equivalent proteins from protein-protein pairs identified by BLAST and from protein-domain pairs predicted by HMMER. When examined against domain classes of the Pfam-A seed database, FEP-BH showed 71.53% accuracy, whereas BLAST and HMMER achieved 57.72% and 36.62%, respectively. We expect that the FEP-BH algorithm will be effective in predicting functionally equivalent proteins from BLAST and HMMER outputs and will also suit biologists who want to identify functionally equivalent proteins among sequence-homologous proteins.

6.
7.
Protein identifications with borderline statistical confidence are typically produced by matching a few MS/MS spectra of marginal quality to database peptide sequences, and they represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often-biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of the corresponding MS/MS spectra by the PepNovo de novo sequencing software, followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that use all candidate peptide sequences, including partially inaccurate, degenerate and redundant ones. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.

8.
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. We propose a new step in the BLAST algorithm that reduces the computational cost of searching with negligible effect on accuracy. This new step, semigapped alignment, is a compromise between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to filter sequences accurately at lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, and combined with an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.
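For orientation, the gapped stages that these techniques accelerate rest on a local alignment recursion. The following is the textbook Smith-Waterman score computation with a linear gap penalty and illustrative scoring parameters. It is not the semigapped or restricted insertion alignment proposed in the paper, and BLAST's own gapped extension differs (it uses affine gap costs and X-drop termination).

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman recursion,
    linear gap penalty; scoring parameters are illustrative)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap   # b[j-1] aligned to a gap in a
            # the 0 floor is what makes the alignment local
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

The quadratic cost of filling this matrix for every candidate is why heuristics that skip or restrict parts of the gapped stage pay off.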

9.
Profile searches using aligned short protein blocks are an effective method for identifying putative protein functions. An algorithm is presented that accelerates block searches by a factor of 2–5 with only a limited loss of sensitivity; this algorithm is particularly suited for application in large-scale genome research.

10.

Background  

BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming.

11.
GOSLING is a web-based protein function annotator that uses a decision tree-derived rule set to quickly predict Gene Ontology terms for a protein. Each term prediction is assigned a score indicative of its accuracy. Due to its speed and accuracy, GOSLING is ideally suited for high-throughput annotation tasks. AVAILABILITY: https://www.sapac.edu.au/gosling

12.
Improved sensitivity of biological sequence database searches   Total citations: 26, self-citations: 0, citations by others: 26
We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples or words instead of matching individual residues in order to speed the search. A matching matrix specifies which k-tuples match each other. The matching matrix can be calculated from a similarity matrix of amino acids and a threshold of similarity required for matching. This permits amino acid similarity matrices or replacement matrices (PAM matrices) to be used in the first step of a sequence comparison rather than in a secondary scoring phase. The concept of matching non-identical k-tuples also increases the power of DNA database searches. For example, a matrix that specifies that any 3-tuple in a DNA sequence can match any other 3-tuple encoding the same amino acid permits a DNA database search using a DNA query sequence for regions that would encode a similar amino acid sequence. Received on October 10, 1989; accepted on May 1, 1990.
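The DNA example in the abstract, where any 3-tuple matches any other 3-tuple encoding the same amino acid, can be sketched as a lookup-based matching matrix plus a word-hit scan. The sketch assumes the standard genetic code and treats stop codons as matching each other (the abstract does not say how stops are handled); the function names are hypothetical and this is not the authors' implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_matching_matrix():
    """Map each 3-tuple to the set of 3-tuples it matches: all codons that
    encode the same amino acid (stop codons '*' match each other)."""
    by_aa = {}
    for codon, aa in CODON_TABLE.items():
        by_aa.setdefault(aa, set()).add(codon)
    return {codon: by_aa[aa] for codon, aa in CODON_TABLE.items()}


def word_hits(query, subject, matrix):
    """Return sorted (query_pos, subject_pos) pairs whose 3-tuples match under `matrix`."""
    index = {}  # subject 3-tuple -> positions where it occurs
    for j in range(len(subject) - 2):
        index.setdefault(subject[j:j + 3], []).append(j)
    hits = []
    for i in range(len(query) - 2):
        for partner in matrix.get(query[i:i + 3], ()):
            for j in index.get(partner, ()):
                hits.append((i, j))
    return sorted(hits)
```

With this matrix, the query word CTG hits the subject word TTA because both encode leucine, whereas an identity-only matrix reports no hit.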

13.
14.
MOTIVATION: Searches of biological sequence databases are usually focused on distinguishing significant from random matches. However, the increasing abundance of related sequences in databases presents a second challenge: distinguishing the evolutionarily most closely related sequences (often orthologues) from more distantly related homologues. This is particularly important when searching a database of partial sequences, where short orthologous sequences from a non-conserved region will score much more poorly than non-orthologous (outgroup) sequences from a conserved region. RESULTS: Such inferences are shown to be improved by conditioning the search results on the scores of an outgroup sequence. The log-odds score of the outgroup sequence is subtracted from the log-odds score of each target sequence identified in the database. A test group of Caenorhabditis elegans kinase sequences and their identified C. elegans outgroups were searched against a test database of human Expressed Sequence Tag (EST) sequences, where the sets of true target sequences were known in advance. The outgroup-conditioned method identified 58% more true positives ahead of the first false positive than the straightforward search without an outgroup. A test dataset of 151 proteins drawn from the C. elegans genome, where the putative 'outgroup' was assigned automatically, similarly yielded 50% more true positives with outgroup conditioning. Thus, outgroup conditioning improves the results of database searching with little increase in search computation time.
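A minimal sketch of outgroup conditioning and of the evaluation metric used above (true positives ranked ahead of the first false positive). It assumes that each target's conditioned score is the query's log-odds score minus the outgroup's log-odds score against the same target, and that a target the outgroup does not hit contributes 0; both are interpretations, and the function names are hypothetical.

```python
from typing import Dict, Set


def outgroup_conditioned_scores(query_scores: Dict[str, float],
                                outgroup_scores: Dict[str, float]) -> Dict[str, float]:
    """Subtract the outgroup's log-odds score for each target from the query's.
    Assumption: targets absent from outgroup_scores get an outgroup score of 0."""
    return {t: s - outgroup_scores.get(t, 0.0) for t, s in query_scores.items()}


def true_positives_before_first_false(scores: Dict[str, float], true_targets: Set[str]) -> int:
    """Count true targets ranked (by descending score) ahead of the first false one."""
    count = 0
    for target in sorted(scores, key=scores.get, reverse=True):
        if target not in true_targets:
            break
        count += 1
    return count
```

As a toy illustration with invented scores (not data from the study): query scores {ortho_short: 40, outgroup_like: 55, ortho_long: 60}, outgroup scores {ortho_short: 10, outgroup_like: 70, ortho_long: 20}, and ortho_short and ortho_long as the true targets. The raw ranking places one true positive ahead of the first false positive; the conditioned ranking places both.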

15.

Background  

The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method used to compute related protein structures in the NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we investigated how alternative domain definitions, derived from conserved sequence alignments in the Conserved Domain Database (CDD), would affect the domain comparisons and structure similarity search performance of VAST.

16.
Cell seeding and attachment in three-dimensional scaffolds is a key step in tissue engineering, with implications for cell differentiation and tissue development. In this work, two new seeding methods were investigated using human chondrocytes and polyglycolic acid (PGA) fibrous mesh scaffolds. A simple semi-static seeding method using culture plates and tissue flasks was developed as an easy-to-perform modification of static seeding. An alginate-loading method was also studied, using alginate hydrogel as an adjuvant for entrapping cells within PGA scaffolds. Both the semi-static and PGA-alginate methods produced more homogeneous cell distributions than conventional static and dynamic seeding. Using 20 × 10⁶ cells, the seeding efficiency for static seeding was only 52%, whereas all other techniques produced seeding efficiencies of ≥ 90%. With 40 × 10⁶ cells, the efficiency of semi-static seeding declined to 74%, while the dynamic and PGA-alginate methods retained their ability to accommodate high cell numbers. The seeded scaffolds were cultured in recirculation bioreactors to determine the effect of seeding method on cartilage production. Statically seeded scaffolds did not survive the 5-week cultivation period. Deposition of extracellular matrix in scaffolds seeded using the semi-static and PGA-alginate methods was more uniform than in scaffolds seeded using the dynamic method. The new semi-static and PGA-alginate seeding methods developed in this work are recommended for tissue engineering because they provide substantial benefits over static seeding in terms of seeding efficiency, cell distribution, and cartilage deposition while remaining simple and easy to execute.

17.
Improved protein refolding using hollow-fibre membrane dialysis   Total citations: 7, self-citations: 0, citations by others: 7
We have used a cellulose acetate, hollow-fibre (HF) ultrafiltration membrane to refold bovine carbonic anhydrase, loaded into the lumen space, by removing the denaturant through controlled dialysis via the shell-side space. When challenged with GdnHCl-denatured carbonic anhydrase, 70% of the loaded protein reptated through the membrane into the circulating dialysis buffer. Reptation occurred because the protein, in its fully unfolded configuration, was able to pass through the pores. The loss of carbonic anhydrase through the membrane was controlled by the dialysis conditions. Dialysis against 0.05 M Tris-HCl for 30 min reduced the denaturant around the protein to a concentration that allowed secondary structure to return, increasing the hydrodynamic radius and thus preventing protein transmission. Under these conditions, a maximum of 42% of the carbonic anhydrase was recovered (from a starting concentration of 5 mg/mL) with 94% activity. This is an improvement over refolding carbonic anhydrase by simple batch dilution, which gave a maximum reactivation of 85% with a 35% soluble protein yield. Batch refolding of carbonic anhydrase is very sensitive to temperature; however, during HF refolding between 0 and 25 °C the temperature sensitivity was considerably reduced. In order to reduce the convection forces that give rise to aggregation and to promote refolding, the dialysate was slowly heated from 4 to 25 °C. This slow, temperature-controlled refolding gave an improved soluble protein recovery of 55% with a reactivation yield of 90%. The effect of a number of additives on refolding performance was also tested: the presence of PEG improved both protein recovery and the activity recovered from the membrane, while the detergents Tween 20 and IGEPAL CA-630 increased only the refolding yield.

18.
Naturally split inteins mediate a traceless protein ligation process known as protein trans‐splicing (PTS). Although PTS is frequently used in protein engineering applications, its efficiency can be reduced by the tendency of some split intein fusion constructs to aggregate, a consequence of the fragmented nature of either the split intein itself or the polypeptide to which it is fused (the extein). Here, we report a strategy to help address this liability. It involves embedding the split intein within a protein sequence designed to stabilize either the intein fragment itself or the appended extein. We expect this approach to increase the scope of PTS‐based protein engineering efforts.

19.
Searching for similar 3D protein structures is one of the primary processes in structural bioinformatics. However, the computational complexity of this process means that new methods that perform it faster and more efficiently are constantly needed. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general-purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold speedup over its single-core CPU implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). We show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce its execution time enough for it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.

20.
In modern biology, there is a critical need to develop a high-throughput, inexpensive platform for DNA sequencing. Pyrosequencing is a non-electrophoretic, single-tube DNA sequencing method that takes advantage of cooperativity between four enzymes to monitor DNA synthesis. In these studies, single-stranded DNA-binding protein (SSB) was added to the primed DNA template prior to the Pyrosequencing reaction. Adding SSB to the Pyrosequencing reaction system resulted in read lengths of more than 30 nucleotides. The observed improvements were: (i) increased efficiency of the enzymes, (ii) reduced mispriming, as measured by nonspecific signals, (iii) increased signal intensity during the reaction, (iv) higher accuracy in reading the number of identical adjacent nucleotides in difficult templates, and (v) longer reads. The usefulness of these results for future Pyrosequencing applications is discussed.
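To illustrate why reading runs of identical adjacent nucleotides depends on signal intensity, here is a naive flowgram decoder: nucleotides are dispensed in a fixed cyclic order, and each dispensation's light signal, divided by the signal of a single incorporation, is rounded to a homopolymer length. The flow order, the unit signal, and the rounding rule are simplifying assumptions, the function name is hypothetical, and this is not the method used in the study; production base callers model noise and signal decay far more carefully.

```python
from typing import List


def call_bases(flow_order: str, signals: List[float], unit_signal: float = 1.0) -> str:
    """Naive flowgram decoding.

    Assumptions: dispensations cycle through flow_order, and each signal is
    proportional to the number of nucleotides incorporated in that flow.
    """
    sequence = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # round the normalized signal to a homopolymer length (never negative)
        run_length = max(0, round(signal / unit_signal))
        sequence.append(nucleotide * run_length)
    return "".join(sequence)
```

A normalized signal of 2.4 decodes as two bases and 2.6 as three, so a modest change in signal quality can change a homopolymer call.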


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417